The a16z Show - Unlocking Creativity with Prompt Engineering

Starting point is 00:00:00 If you think about the next layer, it's still quite hard to describe things with words. Designers, when they do work for clients, like it's one of their pet peeves because clients don't like it, but they can't explain why. With every new technology, some jobs are lost while others are gained. And while people often focus on the former, in this episode we're highlighting the latter, a highly creative role that emerges alongside AI, the prompt engineer. Until AI can close the loop on its own, each tool still requires a set of prompts. And just like a composer feeds an instrument a set of notes to play, a prompt engineer feeds the AI a map of what to produce. And if we know anything from music, it's that composing

Starting point is 00:00:40 great music takes great skill. So in this episode, we dive into the emerging importance of prompting, the early learnings and how to do it effectively, and also where this field might be heading. And we do so with Guy Parsons. Guy has been an early mover in the text-to-image AI space, having written the DALL·E 2 Prompt Book in July of last year. So will the prompt engineer be more like the highly sought-after DevOps engineer or a proficiency like Excel that you find on every resume? Listen in to hear Guy's take. By the way, we're thinking of running a prompt competition coming up.

Starting point is 00:01:12 So if you think you have what it takes, email us at podpitches@a16z.com with the subject, prompt engineer. As a reminder, the content here is for informational purposes only. Should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures. Guy, welcome to the show. Thank you for having me. I'm excited to be here.

Starting point is 00:01:52 When we originally reached out to you, it was around six months ago and you had just written something called your prompt book. Why don't you give everyone a little bit of an idea of what that prompt book was, what it is now, and also what prompted you to want to write it in the first place? This was in the initial heyday of DALL·E 2, which was OpenAI's text-to-image model. When it came out, they rolled it out to a few test people at a time. They were super cautious about how it might be misused, how it could end up having a backlash, all these kinds of things, which then only increased the sense of people wanting to get their hands on this thing.

Starting point is 00:02:30 Because at the time, this was pre-things that you might think of now as Stable Diffusion, Midjourney. It kind of predated those by some small margin and seemed way ahead of anything people had tried using before. So yeah, if you've used a text-to-image AI by now, you know it's basically a text box and it all comes down to what you type in. It doesn't have buttons and all the kind of controls you might expect when you log into something like Photoshop. So the question then becomes like a lot of people,

Starting point is 00:02:58 one's mind goes blank or you don't actually know the name or the words of what you're trying to type in, right? If you've actually been to art school or you're up on your art history or in your design language, then you've probably got a head start on everyone else. But on places like Twitter and Reddit, there are people posting these amazing images. But because of the nature of social media, it's all lost. So I started trying to like collect these cool examples and these cool terms people were using to create these amazing visual effects.

Starting point is 00:03:25 So I started putting everything in, essentially, like a slide deck. By the time I'd copied and pasted all these cool things I'd seen, it was 80, 100 slides long, something like that. So I rather grandly called it a book and shared it online. And it's just a jumping off point for people to realize the kind of stuff at the time that these tools were just about becoming capable of. Obviously, now they're capable of even more advanced things. And we'll get into that because within six months, it's crazy to see how these tools, the way people are using these tools,

Starting point is 00:03:56 how that's all changed in a matter of, again, just six months, it feels like yesterday when we didn't even have access to this. But this idea that these are tools and just like any other tool, person A versus person B may not get the same result. They may not have the same understanding of how to leverage the tool. And so before we get into maybe the tips and tricks that you've learned, I just want to give the audience a broad sense of how much time you've spent within the bowels of Midjourney, DALL·E, Stable Diffusion. Like, if you could give an estimate, how much time do you think you've spent kind of mastering this idea of prompting? I wouldn't say I'm a master in any sense. It's like so engaging and interesting to experiment with these tools. So, you know, like in the last six months, sure, like a couple of hundred hours.

Starting point is 00:04:45 What I really admire is people that are using these tools to create this like real body of work where they can really like pursue a direction to discover what's possible. I think I saw a thread where, I think it's in Midjourney, you can get it to tell you how many prompts you've ever done, and there are people in their thousands, hundreds of thousands. Yeah, and I appreciate how humble you are, but I think it's one of those scenarios where, again, we're six months in. You know, a parallel is when there's a new coding language, and then you see people write job descriptions for developers

Starting point is 00:05:14 looking for someone with five years' experience when that particular language has only been around for six months or a year. And so, yes, I don't think anyone could definitively say they're an expert in prompt engineering, partially because it's only been around for so long. But I do think you've at least shared a lot more than the average person. And given your experience with these tools,

Starting point is 00:05:34 I'm curious if you see a parallel skill set where you can kind of compare prompt engineering to learning to code. Is it similar to being able to storytell effectively? Is it similar to being able to process numbers in an Excel sheet? Is there a parallel skill set where it reminds you of, you know, something you've done before? I think there was an era. I don't know if we're still in it,

Starting point is 00:05:57 where there was a certain category of person who could consider themselves, like, good at Googling stuff? Do you know that kind of like, oh, file type this? And there's this big debate over whether, especially in text-to-image, you know, is there really like any artistry to it? For me, I'm not so sure because I'm no artist. But there's definitely something. It's always about discovering an image that's already out there.

Starting point is 00:06:19 You've just got to find the words that summon it forth as if you're kind of navigating like an infinite Pinterest of things that haven't quite existed until you manifested them. Well, I mean, to that point. We have so much information online. I feel like that is a skill set, even before these AI tools. Like, I used to work on a product called Trends, and that really was about using the right tools like Subreddit Stats or Ahrefs or other data sets online and learning to parse them

Starting point is 00:06:44 and learning to surface what other people find interesting. But let's get into the nitty-gritty. Like, you wrote this prompt book. You've been playing around with these tools for quite some time. Are there certain learnings, maybe the 80-20 approach of becoming a good prompt engineer in terms of things that you think are really valuable to understand. Maybe it's the prompt length. Maybe it's using certain modifiers within your prompt.

Starting point is 00:07:08 Maybe it's just like a framework for thinking about prompting. Is there anything that's surfaced that you think would be really valuable to someone who's just starting out with prompting? Oh, yeah. Like, I think if you've never used one before, like the best way to explain how they work at the moment, which is, again, always shifting and something else we can talk about, is to always like describe something as if it already exists. So imagine that it's an image in some kind of downloadable clip art library or a photography gallery.

Starting point is 00:07:39 And you know, someone's written underneath, oh, this is a fine example of an early modern photography shot. And those are the kind of descriptions that you're trying to kind of mimic to tell these tools what you're looking for. And it also gives it like a natural sense of why these tools are bad at some things, and the kind of prompts that don't really work. Because if there's a like, let's say, some archive image of some women celebrating on the steps of a church in 1972, it would have that kind of caption, where they never go, the woman on the left is wearing a yellow hat. The woman on the right is wearing, you know, they just don't go into that because you can see it.

Starting point is 00:08:20 So ironically, they often describe very generally what the image is about, but not like how you would draw it step by step. That's why these tools are less good at saying, like, I want this thing over here and then that thing next to it and then something on top. And that thing should be much bigger, because in real life, that's not how images are described in language. You'll find yourself, next time you're in like an art museum or in a book, really looking now at those little panels next to it and being like, oh, okay. That's what like acrylic on glass looks like. I'll remember that.
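
The "describe it as if it already exists" advice can be sketched in a few lines of Python. This is a hypothetical illustration only: `build_caption_prompt` and its parameters are invented names, not part of any tool mentioned in the conversation. It joins a subject with optional medium, era, and style terms so the prompt reads like a gallery or archive caption rather than step-by-step drawing instructions.

```python
def build_caption_prompt(subject, medium=None, era=None, style_terms=()):
    """Compose a prompt that reads like a caption for an image that already exists.

    All names here are hypothetical; this is a string helper, not a model API.
    """
    parts = [subject]
    if medium:
        parts.append(medium)   # e.g. "archive photograph", "acrylic on glass"
    if era:
        parts.append(era)      # e.g. "1972"
    parts.extend(style_terms)  # extra descriptive vocabulary, in order
    return ", ".join(parts)


prompt = build_caption_prompt(
    "women celebrating on the steps of a church",
    medium="archive photograph",
    era="1972",
    style_terms=("35mm film",),
)
print(prompt)
```

Note what the caption leaves out: it never says who stands on the left or what colour hat anyone wears, which matches the point above about why spatial, step-by-step instructions tend to work less well.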

Starting point is 00:08:51 Yeah. That's a really good point, though, because that's how these AIs were trained, right? So I think DALL·E trained on 600 plus million images, and they're using that alt text, again, that descriptor. And I've never thought about it that way, but actually training yourself to become a good prompter by reviewing the inputs to the tool, which I've never done this before,

Starting point is 00:09:12 but I can imagine someone literally going online and reading the alt text on different images and going, ah, this is how this was described. This is how an AI might interpret my future prompt. Yeah. And I think to your point also, is something that I've learned from my very limited set of prompting is just the level of detail that you need with your prompt where when I first started I'm like, you know, monkey wearing a hat.

Starting point is 00:09:33 Yeah, yeah. And, you know, there's, you don't even realize until you start prompting the many iterations that could come from that. Like you have one image in your head, but then you get back all of these different results. And then you end up looking on different prompt search engines or libraries and seeing what other people are doing, you're like, this prompt is like 200 words. I would have never thought to do that. And I think there's something to be said. Like, I think the longer they are, there's definitely diminishing returns. But sometimes using a lot of related, almost synonymous terms, just like chucking in loads of like, you know, detail, techie, like photography language is all kind of pushing it

Starting point is 00:10:14 in a direction of like, wow, this really sounds like a kind of a real fancy. As I went through your prompt book, there were so many different ways that you could describe a shot. You could say a different camera angle. You could say a time period, as you just spoke to. You could say a specific type of artistry or even a specific artist. I know there's some controversy around using specific artists' work to train your new images. But let's look forward to today. I feel like, as we talked about, six months later, these tools have evolved a lot. Are there any different ways that you can prompt today or leverage these tools that didn't exist six months ago that are really important

Starting point is 00:10:51 and maybe extending the way that you can use them. 100%. So the main one, and these things are like changing all the time, right? But now there's increasingly tools where you can prompt with an image. Again, that's almost like an entire new field of exploration because it's not combining the image with your words in the way you would expect something like Photoshop to do it.

Starting point is 00:11:18 Like it's not collaging them together. It's almost describing to itself this like source image with words. And then doing the same with like a second image or maybe some additional text you supply. And then being like, okay, now I'm going to make a new picture that somehow represents both these things. So the results can be really surprising, really unexpected, probably quite difficult to control. But then you potentially have interesting opportunities like, okay, I can make a load of kind of abstract stuff using my brand colours or something that's important to me, photos of me, who knows? And then, yeah, and then I'm going to use that and kind of multiply that visual base with

Starting point is 00:11:53 custom other prompts. And then everything will have this kind of likeness. And then, of course, like the big thing that happened since the days of the prompt book and so on was, of course, that huge spike in interest in selfies, right? Like Lensa and the profile picture.com. And there were like a dozen of them, which was just prompting with your face, being like, yeah, I want to see more of this guy, because it's me, obviously. And then within the image-to-image space, you've now got other startups that are doing interesting things where, okay, give us 10 core images and now we'll generate you like infinite versions of

Starting point is 00:12:27 that based on like the modifiers that you want to see. So there's all kinds. So that's a really interesting space that's going to probably power like the next generation of how people, especially consumers, interact with these products. Yeah. One way that maybe you could put it is that when we first got access to these tools. You were really starting from scratch. You didn't even have the prompt libraries available to you. You were just like, okay, I have this image in my head. But today, you not only have those libraries, you also have images that you can input. So you're not starting from scratch. You have a baseline of, as you said, maybe it's brand colors, maybe it's a certain style. And instead of having to articulate that yourself, you can just say, hey, here's what I want.

Starting point is 00:13:05 But to your point, sometimes it's hard to control, right? Because you're trying to say something to the AI, you're trying to say, I want this output. You don't always get it. And so something I want to ask you about is how you've learned to rein that in, to really, you know, on the whole, get a higher throughput of images that you want versus images you don't want over time because these AIs, they are a little bit of a black box, right? You can't understand every little piece that went from your input to your output. And so you can't like fine-tune it in the same ways as maybe some other skills that we've learned in the past. And so how have you learned to actually become a better prompt engineer,

Starting point is 00:13:45 given that black box nature? I mean, I think another aspect is there's also like a random element. So if you and I both type in the same thing, it's not going to make the same picture because it kind of starts from this random cloud of noise and your cloud of noise is different to mine. And then it's slowly turning these clouds more and more into something that looks like an orangutan in a tuxedo. But we're going to end up with different things.
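
The "random cloud of noise" point can be illustrated with a toy sketch. This is not a diffusion model: `starting_noise` is a hypothetical stand-in that only mimics the first step, drawing random starting values. It assumes, as many image tools allow, that the randomness can be pinned with a seed, so the same seed reproduces the same starting point while different seeds diverge.

```python
import random


def starting_noise(seed, size=4):
    """Toy stand-in for the random starting point of an image generator.

    Hypothetical helper: a real model would start from a noise image;
    here we just draw a few Gaussian values from a seeded generator.
    """
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


mine = starting_noise(seed=1)
yours = starting_noise(seed=2)
mine_again = starting_noise(seed=1)

print(mine == mine_again)  # same seed, same starting noise
print(mine == yours)       # different seeds, different starting noise
```

Holding the seed fixed while changing one word of the prompt is one way to separate "the prompt change helped" from "I just got lucky on this roll".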

Starting point is 00:14:09 So that's really frustrating when you're like testing things because was it good or did you just get lucky or alternatively if you're not seeing what you expected, should you just hit it again and again? And then when you see someone else has made something really cool, did they do something really clever or, you know, is it like a persistent thing? I have found myself in that exact spot where I have an idea for what I want. It's not something that is super important where I need to nail it. So I'm just, I just need it close enough. And I'm getting these results and they're getting a little closer and closer and closer.

Starting point is 00:14:40 But I have found myself in that spot where I'm just like, let's just generate it again. Like, if I do this enough times, I'll eventually get to something that's workable. So do you have any thoughts there in terms of, like, how you don't end up in that spot where you're just like hoping for a better image? You're kind of like pulling the AI slot machine, if you will?

Starting point is 00:14:58 No. I mean, I think unless you kind of have evidence. I think it's why some of these like other tools and communities are so important, you know, where you see lots of other people's work. You know, if you can see someone else has done it, ideally you can also see the prompt they use

Starting point is 00:15:11 and work out how they did it. But even if not, then you're like, okay, I can get there. Also, you run into these things where you would think it's like the most simple thing. And then you're like, it doesn't know what a hot dog is. Like it just doesn't understand the rules. Yeah. Of like, you know, physically what can and can't that look like?

Starting point is 00:15:30 And you're like trying and it's like, now the sausage is a right angle. The bun has ears because it's starting to throw in some, like, dachshund-like, you know, aesthetic. And then you're like, minus, minus, no, no dachshund. That's kind of the limitation of where the technology is at the moment, which is it's amazing until you're trying to do something very specific. And especially if you want to do something very specific,

Starting point is 00:15:54 and also to a very high professional standard. Well, I'm glad you even mentioned the negative queries. That's something I think a lot of people don't know, is that you can say, hey, AI, I don't want this. It doesn't always manage to still generate what you're looking for. But there's also almost like these glitches. One of them that is kind of infamous now is hands, right? So you can generate these beautiful images of these Instagram-looking models.

Starting point is 00:16:18 And you can put them in all these different backgrounds. And you're like, wow, this is amazing. And then it's always like, well, look at the hands, which is kind of funny. I feel like it's like the perfect manifestation of how technology always is like much better in one direction when it's invented. But there's always like these things that need to be iterated on. And so are there other things? worth knowing about whether it's these negative prompts, whether it's these glitches that are still in the matrix, what would you call out from your, again, many hours of being deep in these tools?
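
The "minus, minus, no dachshund" idea can be sketched as a small string helper. The syntax below is modeled on Midjourney's `--no` parameter; other tools may expose a separate negative-prompt field instead, so treat the exact format as an assumption. `with_negatives` is a hypothetical name used only for illustration.

```python
def with_negatives(prompt, negatives):
    """Append exclusions to a prompt in a `--no a, b` style.

    Hypothetical helper; the flag format is an assumption modeled on
    Midjourney and will differ between tools.
    """
    negatives = [term.strip() for term in negatives if term.strip()]
    if not negatives:
        return prompt  # nothing to exclude, leave the prompt untouched
    return f"{prompt} --no {', '.join(negatives)}"


hot_dog = with_negatives(
    "a hot dog in a bun, studio food photography",
    ["dachshund", "dog ears"],
)
print(hot_dog)
```

As noted in the conversation, excluding a concept does not guarantee the result still matches what you pictured; it only steers the model away from the named terms.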

Starting point is 00:16:48 I think it depends on the model. One example is when DALL·E came out, and that's still the case as far as I know, it's not very good at understanding that it's drawing things in a square. If you're drawing a person, it's often going to have like its feet and its head cut off because it's seeing those in portrait photos. But one thing you could do with DALL·E is you can actually upload like an image to like do variations of. And if you upload an image that's just like a little white border, then it knows that nothing can go there. That kind of encourages it, forces it to kind of think inside the box, if you will. But then, of course, you have now tools like Midjourney, who've been iterating on their text-to-image model a lot more aggressively than OpenAI, who understandably,

Starting point is 00:17:24 I think maybe have some other things in the cooker, you know, who have built that into the model itself. So when you type things in, it knows it's a square and actually it will sometimes do quite clever things in order to fit it in that space. So if you ask for kind of like a group selfie of three people on something like DALL·E, that's going to be cut off at the end because it's used to seeing someone taking like a disposable camera photo, whereas Midjourney is clever enough to know that one of them kind of needs to be standing behind the other or like leaning in from the side. So it's kind of clever how they've managed to like solve that composition problem within the AI, which then, you know, the prompt engineering thing I think is just understanding the possibilities and the

Starting point is 00:18:01 limitations of where you are at the moment. Meanwhile, there's these other people who are doing some like very technically serious work to kind of make those limitations kind of no longer relevant. Yeah. Well, I'm glad you brought up the differences between these different tools. So if we talk about just Stable Diffusion, Midjourney and DALL·E, I feel like are three that a lot of people are familiar with. Yeah.

Starting point is 00:18:21 Would you liken the ability to prompt within each of these more like the difference between Excel and Google Sheets, where if you know how to use Excel, you really can drop right into Google Sheets and it's relatively straightforward. You might have to switch up your shortcuts a little bit or learn one little thing here and there. But for the most part, you can, again, drop from one to the other. Or would you liken them more to learning to speak different languages? It's not that different. I think the principles are like very similar.

Starting point is 00:18:50 And then the nuances of each are slightly different. So I think now if you went from DALL·E to Midjourney, it would be like amazing. And then if you went back in the other direction, you'd be like, it doesn't do what I want, but that's because Midjourney is doing so much of the heavy lifting to help you make something really good. If you are using the tools to create some very specific effect, imagine that I guess, yeah, like a very complicated Excel formula,

Starting point is 00:19:16 that would not have the exact same output in the other tool, if you know what I mean, because they're trained on like a different set of images. Stable Diffusion, I think it's on 5 billion for the "what things look like" learning, and then like a smaller set of, like, 12 million for the "what does nice look like". And then the fine-tuning that's happened on the top and how they've optimized it in the later phase

Starting point is 00:19:36 is a technical element that escapes me slightly. They have made different creative decisions there. It's maybe like driving a different car. If you floor the accelerator in various different cars, some are going to take off, some are going to trundle along. Good analogy. Do you also find that, I mean, we've talked already about this idea where sometimes it's pretty easy to get to that 80%,

Starting point is 00:19:57 but then that final 20%, the real refinement to get to exactly what you pictured in your head or exactly what you want and didn't picture in your head sometimes requires another tool. And so have you found, I've heard some people are using Facetune or different AIs to take it to the final level, or I guess you could also use inpainting and outpainting a little more discreetly. So how have you found the relationship of maybe one tool to the suite of other tools that exist out there? I think there's lots of exciting crossovers. But actually, I kind of think it's a big opportunity for the Photoshops of this world, because those are tools that presuppose you have some

Starting point is 00:20:33 kind of original image to manipulate. Whereas now there's a huge amount of raw, but maybe not perfect material for people to work with. There's lots of things also that I've been trying to do in prompting that are actually more easily achieved in other tools. So you can, you know, spend ages trying to get this kind of vintage film look. But if you're like an Instagram influencer, which I'm sure you are. Who isn't? But there's loads of iPhone apps, right, that are out there just to like give all your photos that kind of like, dreamy vintage film look. Yeah.

Starting point is 00:21:01 I mean, I think back in July when you first wrote your prompt book, you had a requested feature list for DALL·E 2. But are there things that are on your new list of, hey, these tools are great, but they're missing X, Y, Z, or they're lacking in these areas, this would be top of my list to see improved on. I think we're going to see more models come out. I mean, the fact that Stable Diffusion is open source

Starting point is 00:21:27 means that lots of other things are going to be built on top of that. And I think it's going to be really exciting to see some of the directions that people take that in, either kind of on an individual sort of prosumer level, people building their own models to create their own stuff, or more likely some bigger organizations training it for specific purposes. The whole challenge and the whole opportunity, I think at the moment, is like how do you go beyond the text box? How do you go beyond this like just blank rectangle to create something that is more user-friendly, that's more inspiring, that's more how people think.

Starting point is 00:22:01 Because on the one hand, if you're not an artist, the ability to describe things with words is definitely a big step forward. But if you think about the next layer, it's still quite hard to describe things with words. Designers, when they do work for clients, like it's one of their pet peeves because clients don't like it, but they can't explain why or what they want different. They're like, oh, I want it to be more, do you know what I mean?

Starting point is 00:22:24 Like more, and they're like, I don't know. I don't know what that means. which is basically the position these, you know, AI models are in. So could you see like a conversation where it's face? Can you do the generations fast enough that you're always showing people multiple options, possible new directions? It's almost like in a sort of multi-dimensional space where it's like, do you want to take it more this way or more this way?

Starting point is 00:22:45 You know, part of the prompt book is I didn't know what metaphysical painting or Kodachrome or all these things were, but those at least have names. But there's probably other aesthetics, right? Other styles that we don't actually have words for. It's like, you know, that kind of gritty, but like modern gritty, like, almost like shiny gritty. Like the grit has a shine on it. And probably I can make you a mood board of that and you'd be like, oh yeah, like that's a thing. But there's no word for it. So if you can create ways of unleashing the inexplicable, the undefinable, that's the exciting thing about

Starting point is 00:23:17 visual art, is to express things or moods or things that you can't quite put into words. I totally have my mind spinning, thinking of different ideas. A couple of them that came to mind. One of them is just a better onboarding experience, but one where you're guiding the new prompter to understand how all these things might fit together to your point. Like, try this. Oh, look at what you got here. Oh, did you notice how when you use these two prompts together, this one kind of overshadows the other. Maybe there's a third word that's the synonym of this. And I think you've kind of done this on your own by just going through and prompting like crazy, going through these different prompt libraries and trying to sort through the signal from the noise. But I do think any one of these

Starting point is 00:23:58 models, or maybe the UI built on top, could have just a much better onboarding experience so that people come into the tool, to your point, with just a better understanding of what they should be paying attention to. And then I also, in terms of these visual styles, I mean, it reminds me of a lot of Instagram influencers for a period of time were selling these filters because they had figured out the precise tuning of every little variable, which sounds easy, but I had tried to do it myself. I never managed to create good Lightroom filters, but people had, and they would sell them. And so I wonder if you'll see the same thing where maybe someone creates kind of like a zip file of a mood board, and then they train the AI in some way that does make it, I guess, play nice

Starting point is 00:24:45 with that particular concept that you can't distill necessarily into a single term. Yeah, because you had that breakthrough. Someone did a paper on it. I think it's almost what led to that selfie craze, which was that you don't need to put your

Starting point is 00:24:58 photos and stuff in that original 600 million training data or wait for the next time we do that again for it to teach it what you look like. There's this kind of

Starting point is 00:25:08 embedding trick where you can show it like a bunch of photos of you and then you can refer to you and it knows how to kind of recreate that. And there was also an interesting thing in the same paper

Starting point is 00:25:17 that hasn't really been used or like commercialized in the same way, which is to do that with style. So rather than show it, yeah, this is what this person looks like. It's like, this is what the style of blah blah, blah is called. Here it is. And then off you go, which obviously has all kinds of potentially shady legal complications.

Starting point is 00:25:32 But let's assume this is a lovely art we've made ourselves. Yeah. Well, no, I mean, to the idea of honing in a style, I do wish there was a version of the product where I could go. And like we've talked about, maybe upload certain brand images or certain brand colors, and then have it iterate with me where it shows me a bunch of images and I say, it's okay,

Starting point is 00:25:54 but I want a little more of this color. And then we keep doing that to the point where I get a bunch of images where I'm like, yes, this is the style. You can lock that in. You lock it into a variable that you can then plug into future prompts. I've definitely seen there's some people out there

Starting point is 00:26:07 that have managed to lock in a particular look. And now every blog post they have always the same kind of thing. And that's like pretty cool. But we haven't seen that always built into the like foundation models as like a way of interacting with it. And then there are some startups like Scenario,

Starting point is 00:26:23 which is doing it for game assets, and then Leonardo, which is like more multipurpose, I think, or is just positioning itself that way, which is again all about can you like control things down to like consistent look. Yeah. So what we've talked about so far is this idea of controlling the AI. But I also like to think about the ways that when you work with these different models, you learn more about your own creativity.

Starting point is 00:26:47 the example that it reminds me of is in chess when we finally built the bots that were better than humans in chess, not only were we surprised by the fact that that could happen, but we were also surprised by all of the different openings or moves that humans in their thousands of years playing chess had never considered that were better than some of the moves that even the best chess players in the world had used. And so have you seen any of that,

Starting point is 00:27:15 even from a personal experience level, where you're in these tools and you're playing around and you're learning with the model, if that makes sense, it's almost surfacing things that you had never considered before. I like that. I think whenever you're using these tools, you have these two modes, right?

Starting point is 00:27:31 Where you're either like waiting to see what it shows you or you kind of are visualizing it in your mind and you're like, no, not that, not that. But if you just let it take you where it wants to go, then you're suddenly like, I have no idea what I'm looking at. But apparently I'm here. With DALL-E, there's like this variations tool.

Starting point is 00:27:48 So you just give it, let's say, an image. It'll be like, here's four more that are kind of the same. But obviously over time, if you leap and leap and leap and leap, you end up on this like completely bizarre visual journey, like a psychedelic dream. It's fun to play around in these tools. But ultimately, while there is a market for just interesting art in the world, a lot of this will need to ladder back into, you know,

Starting point is 00:28:12 whether it's blog post sharing images, whether it's creating the next sneaker design that you end up selling. Are there areas that you've seen really emerge from this where people are using these tools today and applying them to, again, what someone might call a practical use case? And in addition to maybe what you've seen so far, are there other areas where you're excited to see this be applied? It's interesting, isn't it? Because I think especially given the tenor of the conversation around these tools and the ethical and legal aspects therein, I suspect that to an extent when you see these things used,

Starting point is 00:28:49 especially in prominent contexts, they might not be advertised as such. Much as like green screen, right? When green screen is used in films, you shouldn't be like, that is an amazing use of green screen. You should just be like, oh my God, like he's dangling off a thing.

Starting point is 00:29:01 Oh, this must have cost millions. So I think, you know, when we see AI tools used in lots of contexts, not that this is covered up, but, you know, they might obviously be just a narrow part of the creative process. They might be all of it, but it's kind of hidden. I've raised this point online, I think, that you were making, which is like, well, where is this all going? Like, will it ever make images good enough?

Starting point is 00:29:22 And will other people want to look at them? Because it's not like we have this huge history of, like, logging in to social media and looking at just like abstract pictures, like, oh, a force or not. Yeah. On a surfboard, I mean, things tend to have like a grounding in reality, right? Like, that's what makes them viral or interesting. But then someone was like, no, like maybe this, it won't be that it's going to make content so good that it's like better than Netflix or like better than Instagram. It's the hobby of doing it.

Starting point is 00:29:50 That's the entertainment. Well, I mean, there are skills out there to your point where writing as an example, some people just like to write to write. And whether other people read it doesn't matter, they actually enjoy the process. And so I definitely could see an entertainment angle. But a lot of people really hate writing. And a lot of people find value in the money that they get paid to write or the writing is used within a script which then is published on Netflix. And so it's like, how is this stuff used in the wider world, whether it's on an e-commerce website, whether it's

Starting point is 00:30:20 one day integrating with 3D printing and like the stuff that you generate in Midjourney then can actually be printed into like a real-life product that you sell? Oh, actually, this isn't just a gimmick. This isn't just a toy. There's this very high level kind of debate around artistry, I suppose, as if everything is either going to be like in the Louvre or, I'm not saying that right, in the Tate, I'm from London, or, you know, in the bin. But ultimately, if you look around just any space that you're in and look at everything that has like a visual component or like a design component,

Starting point is 00:31:00 there's so many different levels at which we engage with art, you know, like the pattern on a cushion, the warning label on the coffee maker, the sausage dog on a card. They're all different things. There's something where the human touch is like literally the point. But other things, it's like a soothing pattern to look at so that your wall isn't just gray. And so there's all kinds of layers in between. And I think we'll see them used in more and more different situations.

Starting point is 00:31:27 The final thing I want to ask you about is how this all fits into the wider skill set that people might have. So on one hand, I can see how there might be an argument that this idea of the prompt engineer is going to be one that only few can do really well, right? People are really going to master this skill set and they're going to be much more valuable than the people who don't know how to prompt well. But then I can also see an argument where, as you said, maybe this gets abstracted and we have great UIs where truly it becomes the type of thing where basically anyone can do it and anyone can do it pretty reasonably well. And it just

Starting point is 00:32:01 becomes, you know, similar to being able to write and read. These are just kind of fundamental, elemental skills that are in everyone's skill sets. They're taught in schools. Where do you sit with that in terms of how you see this progressing? Is it worthwhile, you could also position the question as, to become an excellent top 1% prompt engineer? Or is it like, oh, everyone should kind of have this in their toolbox? Well, that depends.

Starting point is 00:32:26 I think on the one hand, there's obviously every incentive for the people that make these foundational tools to make prompt engineering, for instance, not a thing. Because they want everyone to be able to do it, right? They naturally want to de-complexify the tools that they're offering. Again, if you look at the most recent model of Midjourney, like version four, stuff that would not have been even possible six months ago, you can literally do the thing where you type in.

Starting point is 00:32:55 Like, I remember because I posted one, someone was arguing about it, and I was like, look at this space stuff. I just typed in space stuff. And it's like this amazing astronaut duck. And he said, there's no way you just typed that in. And so I went back and checked and I was like, no, I lied. I actually typed in a really cool statistic. But at the same time, with any material, like artistic or otherwise,

Starting point is 00:33:15 if you push things to the boundary, there's always going to be people, like someone that explores everything that's possible or like just iterates, iterates, iterates or something, they're obviously going to explore further on the map of what's possible than someone that isn't. So I don't think it will become like this necessary skill that everyone needs to have. But I do think it will become, you know, like some people that are expert wood whittlers or really good at animating hair or whatever, you know, the people that develop a real, like, passion or, like, do some of the most amazing things. And then there's also the kind of the secret prompting, I guess, like a copywriting thing would be like the obvious example at the

Starting point is 00:33:51 moment. You think you're typing something into a UX, but really there's something else wrapping that in a prompt and then sending it to like a foundational model. So there's probably going to be some people whose job is to like come up with that layer of thing that the consumer or the average person is never seeing, and they think they're just talking to the AI, but really they're talking to this thing that then adds a little bit of zhuzh to it and then tells the AI that.
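
The hidden prompt layer described here can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the template wording, the name HIDDEN_TEMPLATE, and the function wrap_user_input are all hypothetical and not something the speakers describe, and no real model API is called. It just shows the wrapping step, where what the user types becomes one part of a longer prompt.

```python
# Hypothetical hidden template: the wording is invented for illustration only.
HIDDEN_TEMPLATE = (
    "You are a marketing copywriter. Write punchy, friendly copy.\n"
    "Request: {user_text}\n"
    "Keep it under 50 words."
)


def wrap_user_input(user_text: str) -> str:
    """Wrap what the user typed in the hidden prompt that a model would receive."""
    cleaned = user_text.strip()
    if not cleaned:
        raise ValueError("user_text must not be empty")
    # The user only ever sees their own text; the surrounding instructions
    # are added here, before anything would be sent to a foundation model.
    return HIDDEN_TEMPLATE.format(user_text=cleaned)


if __name__ == "__main__":
    print(wrap_user_input("  a tagline for a sneaker launch  "))
```

In a product built this way, the person writing HIDDEN_TEMPLATE is doing the behind-the-scenes prompting job described above, while the end user only sees the text box.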

Starting point is 00:34:17 This is going to be a tangent, but it reminds me of, I just listened to a Reply All episode where someone had remembered this song from his childhood and they were trying to figure out what it was. You've heard this episode? If people haven't, it's one of the best... That's the only one, but it was so famous. Yeah, of all time.

Starting point is 00:34:31 Isn't it such a good listen? Yes, but it reminds me of... It reminds me of, do you remember in the episode, they find this lady who is a music producer, but she is a music producer for specifically people who want to create music like the Barenaked Ladies. And it's like, you know, people have jobs like this. When you grow up and you're in school and they tell you, you know, you could be a doctor one day, you could be a teacher one day. They don't tell you you could be a music producer for musicians that want to sound like the Barenaked Ladies. And it makes me wonder

Starting point is 00:35:00 or think about, you know, what specific niches are people going to go into within the realm of prompt engineering, right? Like maybe you specialize, as you said, in hair, maybe in hands, maybe in something for enterprise SaaS companies. I don't know. It's kind of hard to predict at this point since we're so early. But yeah, I think you're right that there's going to be, I guess, kind of a bimodal nature to it. It does seem like the kind of tool that's going to be on everyone's desktop. But it does also seem like there is this opportunity to become, as someone might say, like a 10x prompt engineer. Yeah, but I think that's interesting, isn't it? Because that's such a tech world metaphor, like the notion of 10x.

Starting point is 00:35:38 Because it even implies there's a scale where you can have one and therefore you can have 10 of it, which in the record industry, do people talk about being like a 10x recording engineer? Obviously, some recording engineers are like famous and better than others, and there's all this kind of talent. But I don't know if people are like, yeah, like I'm a 10x. But yeah, just like producers and all the kind of people that go into making, I think, music or film, you know, that huge list of people you

Starting point is 00:36:05 see at the end of every movie, where you discover a whole new world of careers that you might have had. I'll unfortunately never be a best boy, but I'm still hoping to be a gaffer. Then, you know, there'll be all those kinds of jobs, I think, in the AI, the creative AI industry. You know, your point on the spectrum of like, what is 1x and what is 10x? What is the most popular piece of, you could say, art or imagery that is shared online? Like, what comes to mind for you there? I don't know. You said that as if you know the answer. Well, I have an answer. What comes to mind for you there? Photos of parties. So I don't know if this is actually the most, but what comes to mind for me, at least as someone who spends a lot of time on Twitter, is memes.

Starting point is 00:36:49 And memes are like the most basic kind of imagery ever. It's like literally an image with like some capitalized text on it. And your point just reminded me of this idea where art especially is subjective and what people like and resonate with is not necessarily the most refined or extravagant, precise type of imagery, which you can generate in some of these text to image tools, but it doesn't necessarily mean that people are going to resonate with it. Exactly. I mean, until they invent an AI that can do 10x memes, which is the last thing we need. This is really fun, Guy. I loved hearing about where you see this industry, the skill set moving. We will definitely share the prompt book link in the show notes because I think

Starting point is 00:37:36 people can benefit from seeing the different types of modifiers that you can include in a prompt and also a link to your social because you're constantly sharing new hacks, new things that you're discovering. But yeah, any other places that people should look to find you or your work? You can find me on Twitter at Guy P, G-U-Y-P, and you can find my substack when I finally post at prompt response.substack.com. Awesome. Well, thanks for doing this. Thank you so much for having me. It was lovely to meet. I'm glad we could do this.

Starting point is 00:38:07 Thanks for listening to the A16Z podcast. If you like this episode, don't forget to subscribe, leave a review, or tell a friend. We also recently launched on YouTube at YouTube.com slash A16Z underscore video, where you'll find exclusive video content. We'll see you next time.
