a16z Podcast - Unlocking Creativity with Prompt Engineering
Episode Date: March 9, 2023

With every new technology, some jobs are lost while others are gained. People often focus on the former, but in this episode we chose to highlight the latter – a highly creative role that emerges alongside AI: the prompt engineer. Until AI can close the loop on its own, each tool still requires a set of prompts. Just like a composer feeds an instrument the notes to play, a prompt engineer feeds an AI a map of what to produce. And if we know anything from music, it's that composing great music takes great skill! In this episode we explore the emerging importance of prompting with Guy Parsons, the early learnings of how to do it effectively, and where this field might be going. Will the prompt engineer be more like the highly sought-after DevOps engineer, or a proficiency like Excel that you find on every resume? Listen in to hear Guy's take.

Interested in the prompt competition? Email us at podpitches@a16z.com.

Resources:
DALL-E 2 Prompt Book: https://dallery.gallery/the-dalle-2-prompt-book/
Find Guy on Twitter: https://twitter.com/GuyP
Guy's combining image experiment: https://twitter.com/GuyP/status/1612880405207580672
Guy's amorphous prompt experiment: https://twitter.com/GuyP/status/1608475973300948993
Guy's space duck: https://twitter.com/GuyP/status/1601342688225525761
PromptBase: https://promptbase.com/
Lexica: https://lexica.art/

Topics Covered:
0:00 - Introduction
01:49 - DALL-E 2 Prompt Book
05:29 - Parallel skills
06:51 - 80/20 prompting
10:16 - New ways of prompting
13:44 - Pulling the AI slot machine
18:09 - Comparing models
21:04 - Requested features
26:34 - Learning with AI
27:58 - Practical use cases
32:08 - A top 1% prompt engineer
36:17 - The most popular images

Stay Updated:
Find us on Twitter: https://twitter.com/a16z
Find us on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. For more details please see a16z.com/disclosures.
Transcript
If you think about the next layer, it's still quite hard to describe things with words.
Designers, when they do work for clients, like it's one of their pet peeves because clients don't like it, but they can't explain why.
With every new technology, some jobs are lost while others are gained.
And while people often focus on the former, in this episode we're highlighting the latter,
a highly creative role that emerges alongside AI, the prompt engineer.
Until AI can close the loop on its own, each tool still requires a set of prompts. And just like a composer feeds an instrument a set of notes to play, a prompt engineer feeds the AI a map of what to produce. And if we know anything from music,
it's that composing great music takes great skill. So in this episode, we dive into the emerging
importance of prompting, the early learnings and how to do it effectively, and also where this
field might be heading. And we do so with Guy Parsons. Guy has been an early mover in the text-to-image AI space, having written the DALL-E 2 Prompt Book in July of last year. So will the prompt engineer be more like the highly sought-after DevOps engineer or a proficiency like Excel that you find on every resume?
Listen in to hear Guy's take.
By the way, we're thinking of running a prompt competition coming up.
So if you think you have what it takes, email us at podpitches@a16z.com with the subject line "prompt engineer."
As a reminder, the content here is for informational purposes only.
Should not be taken as legal business tax or investment advice or be used to evaluate any investment
or security, and is not directed at any investors or potential investors in any A16Z fund.
For more details, please see a16z.com/disclosures.
Guy, welcome to the show.
Thank you for having me.
I'm excited to be here.
When we originally reached out to you, it was around six months ago,
and you had just written something called your prompt book. Why don't you give everyone a little bit of an idea of what that prompt book was, what it
is now, and also what prompted you to want to write it in the first place? This was in the initial heyday of DALL-E 2, which was OpenAI's text-to-image model. When it came out, they rolled
it out to a few test people at a time. They were super cautious about how it might be misused,
how it could end up having a backlash, all these kinds of things, which then only
increased the sense of people wanting to get their hands on this thing because at the time
this was pre the things that you might think of now, like Stable Diffusion and Midjourney. It kind of predated those by some small margin and seemed way ahead of anything people had tried using before. So yeah,
if you've used a text to image AI by now, you know it's basically a text box and it all comes
down to what you type in. It doesn't have buttons and all the kinds of controls you might expect when you log into something like Photoshop. So the question then becomes, for a lot of people, your mind goes blank, or you don't actually know the name or the words
of what you're trying to type in, right? If you've actually been to art school or you're
up on your art history or your design language, then you've probably got a head start on
everyone else. But on places like Twitter and Reddit, there are people posting like these
amazing images, but because of the nature of social media, it's all lost. So I started trying
to like collect these cool examples and these cool terms people were using to create these,
like, amazing visual effects. So I started putting everything into essentially a slide deck. By the time I'd copied and pasted all these cool things I'd seen, it was 80, 100 slides long, something like that. So then I rather grandly called it
a book and shared it online. And it's just a jumping off point for people to realize the kind of
stuff at the time that these tools were just about becoming capable of. Obviously now the
capabilities are even more advanced. And we'll get into that because within six months,
it's crazy to see how these tools, the way people are using these tools, how that's all
changed in a matter of, again, just six months. It feels like yesterday when we didn't even
have access to this. But this idea that these are tools and just like any other tool,
person A versus person B, may not get the same result. They may not have the same understanding
of how to leverage the tool. And so before we get into maybe the tips and tricks that
you've learned, I just want to give the audience a broad sense of how much time you've spent
within the bowels of Midjourney, DALL-E, Stable Diffusion.
Like, if you could give an estimate,
how much time do you think you've spent
kind of mastering this idea of prompting?
I wouldn't say I'm a master in any sense.
It's like so engaging and interesting to experiment with these tools.
So, you know, like in the last six months, sure, like a couple of hundred hours.
What I really admire is people that are using these tools
to create this like real body of work where they can really like,
pursue a direction to discover what's possible.
I think I saw a thread where, in I think it's Midjourney, you can get it to tell you how many prompts you've ever done,
and there are people in the thousands, hundreds of thousands.
Yeah, and I appreciate how humble you are,
but I think it's one of those scenarios where, again, we're six months in.
You know, a parallel is when there's a new coding language,
and then you see people write job descriptions for developers
looking for someone with five years' experience
when that particular language has only been around for six months or a year.
And so, yes, I don't think anyone could definitively say they're an expert in prompt engineering, partially because it's only been around for so long.
But I do think you've at least shared a lot more than the average person.
And given your experience with these tools, I'm curious if you see a parallel skill set where you can kind of compare prompt engineering to learning to code.
Is it similar to being able to storytell effectively?
Is it similar to being able to process numbers in an Excel sheet?
Like, is there a parallel skill set where it reminds you of, you know, something you've done before?
I think there was an era, I don't know if we're still in it, where there was a certain category of person who could consider themselves, like, good at Googling stuff.
Do you know, that kind of like, oh, filetype: this?
And there's this big debate over whether, especially in text-to-image, you know, is there really like any artistry to it?
For me, I'm not so sure because I'm no artist.
But there's definitely something.
It's always about discovering an image that's already
out there. You've just got to find the words that summon it forth as if you're kind of
navigating like an infinite Pinterest of things that haven't quite existed until you manifested
them. Well, I mean, to that point, like we have so much information online. I feel like that is a
skill set. Even before these AI tools, like I used to work on a product called Trends, and that
really was about using the right tools like Subreddit Stats or Ahrefs or other data sets online
and learning to parse them and learning to surface what other people find interesting. But
let's get into the nitty-gritty. You wrote this prompt book. You've been playing around with these tools for quite some time. Are there certain learnings, maybe the 80/20 approach to becoming a good prompt engineer, in terms of things that you think are really valuable to understand? Maybe it's the prompt length. Maybe it's using certain modifiers within your prompt. Maybe it's just like a framework for thinking about prompting. Is there anything that's surfaced that you think would be really valuable to someone who's just starting out with prompting? Oh yeah. Like I think if you've never used
one before, like the best way to explain how they work at the moment, which is, again,
always shifting and something else we can talk about, is to always like describe something as
if it already exists. So imagine that it's an image in some kind of downloadable clip art
library or a photography gallery. And, you know, someone's written underneath, oh, this is a
fine example of an early modern photography shot. And those are the kind of descriptions that you're
trying to kind of mimic to tell these tools what you're looking for.
And it also gives you like a natural sense of why these tools are bad at some things
and the kind of prompts that don't really work.
Because if there's like a, let's say, some archive image of some women celebrating
on the steps of a church in 1972, it will have that kind of caption where they never go,
the woman on the left is wearing a yellow hat.
The woman on the right is wearing, you know, they just don't go into that because you can see it.
So, ironically, they often describe very generally what the image is about, but not like how you would draw it step by step.
And that's why these tools are less good at saying, like, I want this thing over here and then that thing next to it and then something on top.
And that thing should be much bigger because that's how it is in real life.
That's not how images are described in language.
So you'll find yourself, next time you're in like an art museum or in a book, really looking at those little panels next to it and being like, okay, that's what like acrylic on glass looks like, I'll remember that.
Yeah, that's a really good point, though, because that's how these AIs were trained, right?
So I think DALL-E trained on 600-plus million images, and they're using that alt text, again, that descriptor. And I've never thought about it that way, but actually training yourself to become a good prompter by reviewing the inputs to the tool, which I've never done before, but I can imagine someone literally going online and reading the alt text on different images and going,
oh, this is how this was described.
This is how an AI might interpret my future prompt.
Yeah.
And I think to your point also,
it's something that I've learned from my very limited set of prompting,
is just the level of detail that you need with your prompt,
where when I first started, I'm like, you know, monkey wearing a hat.
Yeah, yeah.
And, you know, you don't even realize until you start prompting
the many iterations that could come from that.
Like you have one image in your head,
but then you get back all of these different results.
And then you end up looking on different prompt search engines or libraries
and seeing what other people are doing,
you're like, this prompt is like 200 words.
I would have never thought to do that.
And I think there's something to be said.
I think the longer they are,
there's definitely diminishing returns,
but sometimes using a lot of related, almost synonymous terms,
just like chucking in loads of like, you know,
detail, techie, like photography language
is all kind of pushing it in the direction of,
like, wow, this really sounds like a kind of a real fancy.
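[Editor's note: a minimal sketch, not from the episode, of what that caption-style, modifier-stacked prompting can look like in code, using the open-source Stable Diffusion model via Hugging Face's diffusers library. The model ID, prompt wording, and settings are illustrative, not Guy's.]

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (hypothetical choice for illustration).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Describe the image as if it already exists, then stack related photography terms.
prompt = (
    "a monkey wearing a top hat, editorial portrait photograph, 85mm lens, "
    "shallow depth of field, soft studio lighting, Kodachrome, highly detailed"
)

image = pipe(prompt).images[0]
image.save("monkey_top_hat.png")
```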
As I went through your prompt book, there were so many different ways that you could describe
a shot. You could say a different camera angle. You could say a time period, as you just spoke
to, you could say a specific type of artistry or even a specific artist. I know there's some
controversy around using specific artists' work to train your new images. But let's fast-forward
to today. I feel like, as we talked about, six months later, these tools have evolved a lot.
Are there any different ways that you can prompt today or leverage these tools that didn't exist six months ago that are really important and maybe extending the way that you can use them?
100%. So the main one, and these things are like changing all the time, right?
But now there are increasingly tools where you can prompt with an image. Again, that's almost like an entire new field of exploration because it's not combining the image with your
words in the way you would expect something like Photoshop to do it, like it's not collaging
them together. It's almost describing to itself this like source image with words and then
doing the same with like a second image or maybe some additional text you supply and then being like,
okay, now I'm going to make a new picture that somehow represents both these things. So the results
can be really surprising, really unexpected, probably quite difficult to control. But then you
potentially have interesting opportunities like, okay, I can make a load of kind of abstract stuff.
using my brand colors or something that's important to me, photos of me, who knows?
And then, yeah, and then I'm going to use that and kind of multiply that visual base with
custom other prompts. And then everything will have this kind of likeness.
And then, of course, like the big thing that happened since the days of the prompt book and
so on was, of course, that huge spike in interest in selfies, right?
Like Lensa and ProfilePicture.ai, and there were like a dozen of them, which was just prompting with your face,
being like, yeah, I want to see more of this guy,
because it's me, obviously.
And then within the image-to-image space,
you've now got other startups that are doing interesting things
where, okay, give us 10 core images,
and now we'll generate you, like, infinite versions of that
based on, like, the modifiers that you want to see.
So there's all kinds.
So that's a really interesting space
that's going to probably power, like,
the next generation of how people,
especially consumers, interact with these products.
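[Editor's note: a hypothetical sketch, not Guy's workflow, of image prompting with Stable Diffusion's img2img pipeline from the diffusers library, where a source image plus a text prompt steer the new generation. File names, model ID, and settings are made up for illustration.]

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A source image supplies the visual base (e.g. brand colors, a mood board).
init_image = Image.open("brand_moodboard.png").convert("RGB").resize((512, 512))

# strength controls how far the result is allowed to drift from the source image.
image = pipe(
    prompt="abstract poster in the same palette, clean vector shapes",
    image=init_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
image.save("brand_variation.png")
```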
Yeah, one way that maybe you could put it
is that when we first
got access to these tools. You were really starting from scratch. You didn't even have the prompt
libraries available to you. You were just like, okay, I have this image in my head. But today,
you not only have those libraries, you also have images that you can input. So you're not starting
from scratch. You have a baseline of, as you said, maybe it's brand colors. Maybe it's a certain
style. And instead of having to articulate that yourself, you can just say, hey, here's what I want.
But to your point, sometimes it's hard to control, right? Because you're trying to say something to the
AI, you're trying to say, I want this output. You don't always get it. And so something I want to ask
you about is how you've learned to rein that in, to really, you know, on the whole, get a higher
throughput of images that you want versus images you don't want over time because these AIs,
they are a little bit of a black box, right? You can't understand every little piece that went
from your input to your output. And so you can't, like, fine-tune it in the same ways as maybe some
other skills that we've learned in the past.
And so how have you learned to actually become a better prompt engineer, given that
black box nature?
I mean, I think another aspect is there's also like a random element.
So if you and I both type in the same thing, it's not going to make the same picture
because it kind of starts from this random cloud of noise, and your cloud of noise is different
to mine.
And then it's slowly turning these clouds more and more into something that looks like
an orangutan in a tuxedo, but we're going to end up with different things.
So that's really frustrating when you're, like, testing things, because was it good or did you just get lucky? Or alternatively, if you're not seeing what you expected, should you just hit it again and again?
And then when you see someone else has made something really cool, did they do something really clever or, you know, is it like a persistent thing?
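[Editor's note: a minimal sketch of pinning down the "random cloud of noise" Guy mentions. Fixing the random seed makes a prompt reproducible, so you can tell whether a prompt change, and not luck, improved the result. Model ID and prompts are illustrative.]

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Same seed -> same starting noise.
generator = torch.Generator(device="cuda").manual_seed(42)
image_a = pipe("an orangutan in a tuxedo", generator=generator).images[0]

# Reset to the same seed before trying a revised prompt.
generator = torch.Generator(device="cuda").manual_seed(42)
image_b = pipe("an orangutan in a tuxedo, studio portrait", generator=generator).images[0]

# image_a and image_b now differ only because of the prompt change, not the noise.
image_a.save("orangutan_v1.png")
image_b.save("orangutan_v2.png")
```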
I have found myself in that exact spot where I have an idea for what I want.
It's not something that is super important where I need to nail it.
So I'm just, I just need it close enough.
And I'm getting these results and they're getting a little closer and closer and closer, but I have found myself in that spot where I'm just like,
let's just generate it again. Like, if I do this enough times, I'll eventually get to
something that's workable. So do you have any thoughts there in terms of like how you don't
end up in that spot where you're just like hoping for a better image? You're kind of like
pulling the AI slot machine, if you will? No. I mean, I think unless you kind of have evidence,
I think it's why some of these like other tools and communities are so important, you know,
where you see lots of other people's work, because, you know, if you can see someone else has done it, ideally you can also see the prompt they used and work out how they did it, but even if it's not there, then you're like, okay, I can get there.
Also, you run into these things where you would think it's like the most simple thing.
And then you're like, it doesn't know what a hot dog is.
It just doesn't understand the rules of how, of like what, you know, physically what
can and can't that look like.
And you're like trying and it's like, now the sausage is at a right angle. The bun has ears because it's starting to throw in some like dachshund, like, you know, aesthetic. And then you're like, minus, minus, no, no dachshund. That's kind of the limitation of where the technology is at the moment, which is it's amazing until you're trying to
do something very specific. And especially if you want to do something very specific,
and also to a very high professional standard. Well, I'm glad you even mentioned the negative
queries. That's something I think a lot of people don't know is that you can say, hey, AI,
I don't want this. It doesn't always manage to still generate what you're looking for. But there's
also almost like these glitches. One of them that is kind of infamous now is hands, right? So you can
generate these beautiful images of these Instagram looking models and you can put them in all these
different backgrounds and you're like, wow, this is amazing. And then it's always like, well,
look at the hands, which is kind of funny. I feel like it's, it's like the perfect manifestation of
how technology always is like much better in one direction when it's invented. But there's always like
these things that need to be iterated on. And so are there other things worth
knowing about whether it's these negative prompts, whether it's these glitches that are still in
the matrix, what would you call out from your, again, many hours of being deep in these tools?
I think it depends on the model. One example is when DALL-E came out, and this is still the case
as far as I know, it's not very good at understanding that it's drawing things in a square.
If you're drawing a person, it's often going to have, like, its feet and its head cut off
because it's seeing those in portrait photos. But one thing you could do with DALL-E is you can actually upload, like, an image to, like, do variations of. And if you upload an image that has, like, a little white border, then it knows that nothing can go there.
That kind of encourages it, forces it to kind of think inside the box, if you will.
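[Editor's note: a rough sketch of the white-border trick Guy describes, preparing the source image before it's uploaded to a variations tool. Pad the image with a white margin so the model treats the frame edges as empty and keeps the subject inside the square. File names and sizes are illustrative.]

```python
from PIL import Image, ImageOps

img = Image.open("portrait.png")

# Add a white border, then resize back to the square size the tool expects.
bordered = ImageOps.expand(img, border=64, fill="white").resize((1024, 1024))
bordered.save("portrait_bordered.png")  # upload this to the variations tool
```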
But then, of course, you now have tools like Midjourney, who've been iterating on their text-to-image model a lot more aggressively than OpenAI, who understandably, I think, maybe have some other things in the cooker, you know, and who have grown that into the model
itself.
So when you type things in, it knows it's a square and actually it will sometimes do quite
clever things in order to fit it in that space.
So if you ask for kind of like a group selfie of three people
on something like DALL-E, that's going to be cut off at the end because it's used to seeing someone taking like a disposable camera photo, whereas Midjourney is clever enough to know that one of them
kind of needs to be standing behind the other or like leaning in from the side.
So it's kind of clever how they've managed to like solve that composition problem
within the AI, which then, you know, the prompt engineering thing, I think is
understanding the possibilities and the limitations of where you are at the moment.
Meanwhile, there's these other people who are doing some like very,
very technically serious work to kind of
make those limitations kind of no longer
relevant. Yeah. Well,
I'm glad you brought up the differences between
these different tools. So if we
talk about just Stable Diffusion, Midjourney, and DALL-E, which I feel like are three that a lot of people are
familiar with. Yeah. Would you
liken the ability to prompt within
each of these more like the
difference between Excel and
Google Sheets, where if you know how to
use Excel, you really can drop right into
Google Sheets and it's relatively straightforward.
You might have to switch up your
shortcuts a little bit or learn one little thing here and there, but for the most part, you can again
drop from one to the other, or would you liken them more to learning to speak different languages?
It's not that different. I think the principles are like very similar. And then the nuances
of each are slightly different. So I think now if you went from DALL-E to Midjourney, it would be
like amazing. And then if you went back in the other direction, you'd be like, it doesn't do what
I want. But that's because Midjourney is doing so much of the heavy lifting
to help you make something really good.
If you are using the tools to create some very specific effect,
imagine that I guess, yeah, like a very complicated Excel formula,
that would not have the exact same output in the other tool,
if you know what I mean,
because they're trained on like a different set of images,
Stable Diffusion, I think, is on 5 billion for the what-things-look-like learning, and then like a smaller set of like 12 million for the what-does-nice-look-like.
And then the fine tuning that's happened on the top
and how they've optimized it in the later phases, a technical element that escaped me slightly.
You know, they have made different creative decisions there.
And it's maybe like driving a different car.
Okay.
If you, like, floor the accelerator in various different cars,
some are going to take off, some are going to trundle along.
Good analogy.
Do you also find that, I mean, we've talked already about this idea
where sometimes it's pretty easy to get to that 80%,
but then that final 20%, the real refinement
to get to exactly what you pictured in your head
or exactly what you want and didn't picture in your head,
sometimes requires another tool.
And so have you found,
I've heard some people are using Facetune
or different AIs to take it to the final level
or I guess you could also use in-painting
and outpainting a little more discreetly.
So how have you found the relationship
of maybe one tool to the suite of other tools
that exist out there?
I think there's lots of exciting crossovers.
But actually, I kind of think
it's a big opportunity for the Photoshops of this world
because those are tools that presuppose
you have some kind of original image
to manipulate, whereas now there's a huge amount of raw, but maybe not perfect, material for people to work with. There's lots of things also that I've been trying to do in prompting
that are actually more easily achieved in other tools. So you can, you know, spend ages trying
to get this kind of vintage film look. But if you're like an Instagram influencer,
which I'm sure you are. Who isn't? But there's loads of iPhone apps, right, that are out there
just to like give all your photos that kind of like dreamy vintage film look. Yeah. I mean,
I think back in July when you first wrote your prompt book,
you had a requested feature list for DALL-E 2.
But are there things that are on your new list of,
hey, these tools are great,
but they're missing XYZ or they're lacking in these areas.
This would be top of my list to see improved on.
I think we're going to see more models come out.
I mean, the fact that Stable Diffusion is open source means that lots of other things are going to be built on top of that.
And I think it's going to be really exciting to see some of the directions
that people take that in.
either kind of on an individual sort of prosumer level,
people building their own models to create their own stuff,
more likely some bigger organizations training it for specific purposes.
The whole challenge and the whole opportunity,
I think at the moment, is like how do you go beyond the text box?
How do you go beyond this like just blank rectangle
to create something that is more user-friendly,
that's more inspiring, that's more how people think?
because on the one hand, if you're not an artist,
the ability to describe things with words is definitely a big step forward.
But if you think about the next layer,
it's still quite hard to describe things with words.
Designers, when they do work for clients,
like it's one of their pet peeves because clients don't like it,
but they can't explain why or what they want different.
They're like, oh, I want it to be more, do you know what I mean?
Like more, and they're like, I don't know.
I don't know what that means, which is basically the position these, you know,
AI models are in. So could you see like a conversational interface? Can you do the generations fast enough
that you're always showing people multiple options, possible new directions? It's almost like in a sort of
multidimensional space where it's like, do you want to take it more this way or more this way?
You know, part of the prompt book is I didn't know what metaphysical painting or Kodachrome or all these
things were, but those at least have names. But there's probably other aesthetics, right? Other styles
that we don't actually have words for. It's like, you know, that kind of gritty, but like modern gritty, like almost like shiny gritty. Like the grit has a shine on
it. And probably I can make you a mood board of that. And you'd be like, oh yeah, like that's
a thing. But there's no word for it. So if you can create ways of unleashing the inexplicable,
the undefinable, that's the exciting thing about vision art, is to express things or moods
or things that you can't quite put into words. I totally have my mind spinning, thinking of different
ideas. A couple of them that came to mind. One of them is just a better onboarding experience.
One where you're guiding the new prompter to understand how all these things might fit together, to your point.
Like, try this. Oh, look at what you got here. Oh, did you notice how when you use these two prompts together, this one kind of overshadows the other?
Maybe there's a third word that's a synonym of this. And I think you've kind of done this on your own by just going through and prompting like crazy going through these different prompt libraries and trying to sort through the signal from the noise.
But I do think any one of these models or maybe the UI built on top could have just a much better onboarding experience so that people come into the tool, to your point, with just a better understanding of what they should be paying attention to.
And then I also, in terms of these visual styles, I mean, it reminds me of how a lot of Instagram influencers for a period of time were selling these filters because they had figured out the precise tuning of every little variable, which sounds easy, but I had tried to do it myself. I never managed to create good Lightroom filters,
but people had, and they would sell them.
And so I wonder if you'll see the same thing
where maybe someone creates kind of like a zip file of a mood board,
and then they train the AI in some way
that does make it, I guess, play nice
with that particular concept
that you can't distill necessarily into a single term.
Yeah, because you had that breakthrough.
Someone did a paper on it,
and I think it's almost kind of what led to that selfie craze, which was that you don't need to put your photos of Steph in that original 600 million training data
or wait for the next time we do that again for it to teach it what you look like.
There's this kind of embedding trick where you can show it like a bunch of photos of you
and then you can refer to you and it knows how to kind of recreate that.
And there was also an interesting thing in the same paper that hasn't really been used
or like commercialized in the same way, which is to do that with style.
So rather than show it, yeah, this is what this person looks like.
It's like this is what the style of blah blah, blah is called, here it is, and then off you go,
which obviously has all kinds of potentially shady legal implications.
But let's assume this is a lovely art we've made ourselves.
Yeah.
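[Editor's note: a hypothetical sketch of the "embedding trick" Guy describes, which sounds like textual inversion: a small learned embedding teaches the model a new token for a face or style from a handful of images, without retraining the whole model. It assumes a recent diffusers version; the concept repository and its <cat-toy> token are illustrative examples from community libraries, not from the episode.]

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a pre-trained concept embedding that defines the placeholder token <cat-toy>.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# Refer to the learned concept by its token inside an ordinary prompt.
image = pipe("a photo of <cat-toy> floating in space, studio lighting").images[0]
image.save("learned_concept.png")
```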
Well, no, I mean, to the idea of honing in a style, I do wish there was a version of the product
where I could go and, like we've talked about, maybe upload certain brand images or certain
brand colors, and then have it iterate with me where it shows me a bunch of images.
And I say, it's okay, but I want a little more of this color. And then we keep doing that to the point where I get a bunch of images where I'm like,
yes, this is the style. You can lock that in. You lock it into a variable that you can then
plug into future prompts. I've definitely seen there's some people out there that have managed to
lock in a particular look. And now every blog post they have, it's always the same kind of thing.
And that's like pretty cool. But we haven't seen that always built into the like foundation models
yet as like a way of interacting with it. And then there are some startups like Scenario,
which is doing it for game assets, and then Leonardo, which is like more multi-purpose, I think,
or is just positioning itself that way, which is again all about can you, like, control things down
to, like, a consistent look. Yeah. So what we've talked about so far is this idea of controlling the
AI, but I also like to think about the ways that when you work with these different models,
you learn more about your own creativity. The example that it reminds me of is in chess when we
finally built the bots that were better than humans in chess, not only were we surprised by
the fact that that could happen, but we were also surprised by all of the different openings or
moves that humans in their thousands of years playing chess had never considered that were better
than some of the moves that we, even the best chess players in the world, had used. And so,
have you seen any of that, even from a personal experience level, like where you're in these
tools and you're playing around and you're learning with the model, if that makes sense,
it's almost surfacing things that you had never considered before. I like that. I think whenever
you're using these tools, you have these two modes, right, where you're either like waiting to
see what it shows you or you kind of are visualizing it in your mind and you're like, no,
not that, not that. But if you just let it take you where it wants to go, then you're suddenly like,
I have no idea what I'm looking at, but apparently I'm here. With DALL-E, there's like this variations tool. So you just get it to, like, show it an image, and it'll be like, here's
four more that are kind of the same. But obviously over time, if you leap and leap and leap and
leap, you end up on this like completely bizarre visual journey, like a psychedelic dream.
It's fun to play around in these tools. But ultimately, while there is a market for just
interesting art in the world, a lot of this will need to ladder back into, you know, whether
it's blog post sharing images, whether it's creating the next sneaker design that you end up
selling. Are there areas that you've seen really emerge from this where people are using these
tools today and applying them to, again, what someone might call a practical use case? And in addition
to maybe what you've seen so far, are there other areas where you're excited to see this be applied?
It's interesting, isn't it? Because I think especially given the tenor of the conversation
around these tools and the ethical and legal aspects therein, I suspect that to an extent when you see these things used, especially in prominent contexts,
they might not be advertised as such.
Much as like green screen, right?
When green screen is used in films, you shouldn't be like,
that is an amazing use of green screen.
You should just be like, oh, my God, like he's dangling off a thing.
Oh, this must have cost millions.
So I think, you know, when we see AI tools used in lots of contexts, not that this is covered up, but, you know, they might obviously be just a narrow part of the creative
process.
They might be all of it, but it's kind of hidden.
I raised this point online that I think you were making, which is like, well, where is this all going? Like, will it ever make images good enough, and will other people want to look at them? Because it's not like we have this huge history of, like, logging in to social media and looking at just, like, abstract pictures, like, oh, a force on a surfboard. I mean, things tend to have, like, a grounding in reality, right? That's what makes them viral or interesting. But then someone was like, no, like, maybe this: it won't be that it's going to make content so good that it's, like, better than Netflix or better than Instagram; it's that the hobby of doing it, that's the entertainment.
Well, I mean, there are skills out there to your point where writing, as an example,
some people just like to write to write. And whether other people read it, it doesn't matter,
they actually enjoy the process. And so I definitely could see an entertainment angle.
But a lot of people really hate writing. And a lot of people find value in the money that they
get paid to write or the writing is used within a script, which then is published on Netflix.
And so it's like, how is this stuff used in the wider world, whether it's on an e-commerce website, whether it's one day integrating with 3D printing, and, like, the stuff that you generate in Midjourney can then actually be printed into like a real-life product that you sell?
Oh, actually, this isn't just a gimmick. This isn't just a toy.
There's this very high-level kind of debate around artistry, I suppose, as if everything is either going to be like in the Louvre (I'm not saying that right; in the Tate, I'm from London) or, you know, in the bin.
But ultimately, if you look around just any space that you're in
and look at everything that has like a visual component
or like a design component,
there's so many different levels at which we engage with art,
you know, like the pattern on a cushion,
the warning label on the coffee maker,
the sausage dog on a card.
They're all different things.
For some things, the human touch is like literally the point.
But other things, it's like a soothing pattern to look at so that your wall isn't just gray.
And so there's all kinds of layers in between.
And I think we'll see them used in more and more different situations.
The final thing I want to ask you about is how this all fits into the wider skill set that people might have.
So on one hand, I can see how there might be an argument that this idea of the prompt engineer is going to be one that only few can do really well.
Right? People are really going to master this skill set and they're going to be much more valuable than the people who don't know how to prompt well. But then I can also see an argument where, as you said, maybe this gets abstracted and we have great UIs where truly it becomes the type of thing where basically anyone can do it and anyone can do it pretty reasonably well. And it just becomes, you know, similar to being able to write and read. These are just kind of fundamental, elemental skills that are in everyone's skill sets. They're taught in schools. Where do you sit with that in terms of how you see this
progressing? Like, you could also position the question as: is it worthwhile to become an excellent
top 1% prompt engineer? Or is it like, oh, everyone should kind of have this in their toolbox?
Well, that depends. I think on the one hand, there's obviously every incentive for the people
that make these foundational tools to make prompt engineering, for instance, not a thing.
Because they want everyone to be able to do it, right? They naturally want to de-complexify the tools that they're offering.
Again, if you look at the most recent model of Midjourney,
like version four, stuff that would not have been even possible six months ago,
you can literally do the thing where you type in, like,
I remember because I posted one, someone was arguing about it,
and I was like, look at this space stuff,
I just typed in space stuff.
And it's like this amazing astronaut duck.
And he said, there's no way you just typed that in.
So I went back and checked and I was like, no, I lied.
I actually typed in a really cool space duck.
But at the same time, with any material,
like artistic or otherwise, if you push things to the boundary, there's always going to be people,
like someone that explores everything that's possible or like just iterates, iterates, iterates or
something, they're obviously going to explore further on the map of what's possible than someone
that isn't. So I don't think it will become like this necessary skill that everyone needs to
have, but I do think it will become, you know, like some people that are expert wood whittlers
or really good at animating hair or whatever, you know, the people that develop a real like
passion or do some of the most amazing things. And then there's also the kind of the secret
prompting, I guess, like a copywriting thing would be like the obvious example at the moment.
You think you were typing something into a UX, but really there's something else wrapping that in
a prompt and then sending it to like a foundational model. So there's probably going to be some
people whose job is to like come up with that layer of thing that the consumer or the average
person is never seeing. And they think they're just talking to the AI, but really they're talking to
this thing that then adds a little bit of zhuzh to it and then tells the AI that.
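[Editor's note: a minimal sketch of the hidden "prompt wrapping" layer Guy describes, where the user types a short request and the product silently expands it into a fuller prompt before calling a text-to-image model. The template wording and the placeholder generate_image() function are hypothetical.]

```python
def wrap_prompt(user_input: str) -> str:
    """Wrap the user's short request in a house-style prompt template."""
    return (
        f"{user_input}, professional product photography, soft studio lighting, "
        "high detail, 35mm lens, shallow depth of field"
    )


def generate_image(full_prompt: str) -> None:
    # Placeholder for whatever foundation-model API the product actually calls.
    print(f"Sending to model: {full_prompt}")


# The consumer only ever sees their own words.
generate_image(wrap_prompt("a red running shoe"))
```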
This is going to be a tangent, but it reminds me of, I just listened to a Reply All episode where someone had remembered this song from his childhood and they were trying to figure out what it was. You've heard this episode. If people haven't, it's one of the best. That's the only one, but it was so famous. Yeah, of all time. Isn't it such a good listen? Yes, but it reminds me of, do you remember in the episode,
they find this lady who is a music producer, but she is a music producer for
specifically people who want to create music like the Barenaked Ladies.
And it's like, you know, people have jobs like this when you grow up and you're in school and they tell you, you know, you could be a doctor one day, you could be a teacher one day.
They don't tell you you could be a music producer for musicians that want to sound like the Barenaked Ladies.
And it makes me wonder or think about, you know, what specific niches are people going to go into within the realm of prompt engineering, right?
Like maybe you specialize, as you said, in hair, maybe in hands, maybe in something for enterprise SaaS companies. I don't know. It's kind of hard to predict at this point since we're so
early. But yeah, I think you're right that there's going to be, I guess, kind of a bimodal nature
to it. It does seem like the kind of tool that's going to be on everyone's desktop. But it does
also seem like there is this opportunity to become, as someone might say, like a 10x prompt
engineer. Yeah, but I think that's interesting, isn't it? Because that's such a tech world
metaphor, like the notion of 10x. Because it even implies there's a scale where you can have one, therefore you can have 10 of it. Which, in the record industry, do people talk about being
like a 10x recording engineer? Obviously, some recording engineers are like famous and better than
others. And there's all this kind of talent. But I don't know if people are like, yeah, like I'm a
10x. But yeah, just like producers and all the kind of people that go into making, I think,
music or film, you know, that huge list of people you see at the end of every movie,
where you discover a whole new world of careers that you might have had. I'll unfortunately
never be a best boy, but I'm still hoping to be a gaffer. Then, you know, there'll be all those
kinds of jobs, I think, in the AI, the creative AI industry. You know, your point on the spectrum
of like, what is 1x and what is 10x? What is the most popular piece of, you could say, art
or imagery that is shared online? Like, what comes to mind for you there? I don't know. You said
that as if you know the answer. Well, I have an answer. What comes to mind for you there?
Photos of a party?
So I don't know if this is actually the most,
but what comes to mind for me,
at least as someone who spends a lot of time on Twitter,
is memes.
And memes are like the most basic kind of imagery ever.
It's like literally an image with like some capitalized text tossed on it.
And your point just reminded me of this idea where art especially is subjective
and what people like and resonate with is not necessarily the most refined
or extravagant, precise type of imagery,
which you can generate in some of these text-to-image tools,
but it doesn't necessarily mean that people are going to resonate with it.
Exactly.
I mean, until they invent an AI that can do 10x memes,
which is the last thing we need.
This was really fun, Guy.
I loved hearing about where you see this industry,
this skill set moving.
We will definitely share the prompt book link in the show notes
because I think people can benefit from seeing the different types of
modifiers that you can include in a prompt and also a link to your social because you're
constantly sharing new hacks, new things that you're discovering. But yeah, any other places that
people should look to find you or your work? You can find me on Twitter at GuyP, G-U-Y-P, and you can find my Substack, when I finally post, at promptresponse.com. Awesome. Thanks for doing
this. Thank you so much for having me. It was lovely to meet you. I'm glad we could do this. Thank you for listening to the a16z podcast. If you liked this episode, don't forget to subscribe, leave a review, or tell a friend.
We also recently launched on YouTube at youtube.com/a16z_video, where you'll find
exclusive video content. We'll see you next time.