a16z Podcast - Unlocking Creativity with Prompt Engineering
Episode Date: March 9, 2023

With every new technology, some jobs are lost while others are gained. People often focus on the former, but in this episode we chose to highlight the latter – a highly creative role that emerges alongside AI: the prompt engineer. Until AI can close the loop on its own, each tool still requires a set of prompts. Just like a composer feeds an instrument the notes to play, a prompt engineer feeds an AI a map of what to produce. And if we know anything from music, it's that composing great music takes great skill! In this episode we explore the emerging importance of prompting with Guy Parsons, the early learnings of how to do it effectively, and where this field might be going. Will the prompt engineer be more like the highly sought-after DevOps engineer, or a proficiency like Excel that you find on every resume? Listen in to hear Guy's take.

Interested in the prompt competition? Email us at podpitches@a16z.com.

Resources:
DALL-E 2 Prompt Book: https://dallery.gallery/the-dalle-2-prompt-book/
Find Guy on Twitter: https://twitter.com/GuyP
Guy's combining image experiment: https://twitter.com/GuyP/status/1612880405207580672
Guy's amorphous prompt experiment: https://twitter.com/GuyP/status/1608475973300948993
Guy's space duck: https://twitter.com/GuyP/status/1601342688225525761
PromptBase: https://promptbase.com/
Lexica: https://lexica.art/

Topics Covered:
0:00 - Introduction
01:49 - DALL-E 2 Prompt Book
05:29 - Parallel skills
06:51 - 80/20 prompting
10:16 - New ways of prompting
13:44 - Pulling the AI slot machine
18:09 - Comparing models
21:04 - Requested features
26:34 - Learning with AI
27:58 - Practical use cases
32:08 - A top 1% prompt engineer
36:17 - The most popular images

Stay Updated:
Find us on Twitter: https://twitter.com/a16z
Find us on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. For more details please see a16z.com/disclosures.
Transcript
If you think about the next layer, it's still quite hard to describe things with words.
Designers, when they do work for clients, like it's one of their pet peeves because clients don't like it, but they can't explain why.
With every new technology, some jobs are lost while others are gained.
And while people often focus on the former, in this episode we're highlighting the latter,
a highly creative role that emerges alongside AI, the prompt engineer.
Until AI can close the loop on its own, each tool still requires a set of prompts. And just like a composer feeds an instrument a set of notes to play, a prompt engineer feeds the AI a map of what to produce. And if we know anything from music,
it's that composing great music takes great skill. So in this episode, we dive into the emerging
importance of prompting, the early learnings and how to do it effectively, and also where this
field might be heading. And we do so with Guy Parsons. Guy has been an early mover in the text-to-image AI space, having written the DALL-E 2 Prompt Book in July of last year. So will the prompt engineer be more like the highly sought-after DevOps engineer or a proficiency like Excel that you find on every resume?
Listen in to hear Guy's take.
By the way, we're thinking of running a prompt competition coming up.
So if you think you have what it takes, email us at podpitches@a16z.com with the subject line "prompt engineer."
As a reminder, the content here is for informational purposes only.
Should not be taken as legal business tax or investment advice or be used to evaluate any investment
or security, and is not directed at any investors or potential investors in any A16Z fund.
For more details, please see a16z.com/disclosures.
Guy, welcome to the show.
Thank you for having me.
I'm excited to be here.
When we originally reached out to you, it was around six months ago,
and you had just written something called your prompt book. Why don't you give everyone a little bit of an idea of what that prompt book was, what it
is now, and also what prompted you to want to write it in the first place? This was in the initial heyday of DALL-E 2, which was OpenAI's text-to-image model. When it came out, they rolled
it out to a few test people at a time. They were super cautious about how it might be misused,
how it could end up having a backlash, all these kinds of things, which then only
increased the sense of people wanting to get their hands on this thing because at the time
this was pre the things that you might think of now, like Stable Diffusion and Midjourney. It kind of predated those by some small margin and seemed way ahead of anything people had tried using before. So yeah,
if you've used a text to image AI by now, you know it's basically a text box and it all comes
down to what you type in. It doesn't have buttons and all the kinds of controls you might expect when you log into something like Photoshop. So the question then becomes, for a lot of people, your mind goes blank, or you don't actually know the name or the words
of what you're trying to type in, right? If you've actually been to art school or you're
up on your art history or your design language, then you've probably got a head start on
everyone else. But on places like Twitter and Reddit, there are people posting like these
amazing images, but because of the nature of social media, it's all lost. So I started trying
to like collect these cool examples and these cool terms people were using to create these,
like, amazing visual effects. So I started putting everything into essentially a slide deck. By the time I'd copied and pasted all these cool things I'd seen, it was 80, 100 slides long, something like that. So then I rather grandly called it
a book and shared it online. And it's just a jumping off point for people to realize the kind of
stuff at the time that these tools were just about becoming capable of. Obviously now the
capabilities are even more advanced. And we'll get into that because within six months,
it's crazy to see how these tools, the way people are using these tools, how that's all
changed in a matter of, again, just six months. It feels like yesterday when we didn't even
have access to this. But this idea that these are tools and just like any other tool,
person A versus person B, may not get the same result. They may not have the same understanding
of how to leverage the tool. And so before we get into maybe the tips and tricks that
you've learned, I just want to give the audience a broad sense of how much time you've spent
within the bowels of Midjourney, DALL-E, Stable Diffusion.
Like, if you could give an estimate,
how much time do you think you've spent
kind of mastering this idea of prompting?
I wouldn't say I'm a master in any sense.
It's like so engaging and interesting to experiment with these tools.
So, you know, like in the last six months, sure, like a couple of hundred hours.
What I really admire is people that are using these tools
to create this like real body of work where they can really like,
pursue a direction to discover what's possible.
I think I saw a thread where, in I think it's Midjourney, you can get it to tell you how many prompts you've ever done,
and there are people in the thousands, hundreds of thousands.
Yeah, and I appreciate how humble you are,
but I think it's one of those scenarios where, again, we're six months in.
You know, a parallel is when there's a new coding language,
and then you see people write job descriptions for developers
looking for someone with five years' experience
when that particular language has only been around for six months or a year.
And so, yes, I don't think anyone could definitively say they're an expert in prompt engineering, partially because it's only been around for so long.
But I do think you've at least shared a lot more than the average person.
And given your experience with these tools, I'm curious if you see a parallel skill set where you can kind of compare prompt engineering to learning to code.
Is it similar to being able to storytell effectively?
Is it similar to being able to process numbers in an Excel sheet?
Like, is there a parallel skill set where it reminds you of, you know, something you've done before?
I think there was an era, I don't know if we're still in it, where there was a certain category of person who could consider themselves, like, good at Googling stuff.
Do you know, that kind of like, oh, filetype: this?
And there's this big debate over whether, especially in text-to-image, you know, is there really like any artistry to it?
For me, I'm not so sure because I'm no artist.
But there's definitely something.
It's always about discovering an image that's already
out there. You've just got to find the words that summon it forth as if you're kind of
navigating like an infinite Pinterest of things that haven't quite existed until you manifested
them. Well, I mean, to that point, like we have so much information online. I feel like that is a
skill set. Even before these AI tools, like I used to work on a product called Trends, and that
really was about using the right tools like Subreddit Stats or Ahrefs or other data sets online
and learning to parse them and learning to surface what other people find interesting. But
let's get into the nitty-gritty. You wrote this prompt book. You've been playing around with these tools for quite some time. Are there certain learnings, maybe the 80/20 approach to becoming a good prompt engineer, in terms of things that you think are really valuable to understand? Maybe it's the prompt length. Maybe it's using certain modifiers within your prompt. Maybe it's just like a framework for thinking about prompting. Is there anything that's surfaced that you think would be really valuable to someone who's just starting out with prompting? Oh yeah. Like I think if you've never used
one before, like the best way to explain how they work at the moment, which is, again,
always shifting and something else we can talk about, is to always like describe something as
if it already exists. So imagine that it's an image in some kind of downloadable clip art
library or a photography gallery. And, you know, someone's written underneath, oh, this is a
fine example of an early modern photography shot. And those are the kind of descriptions that you're
trying to kind of mimic to tell these tools what you're looking for.
And it also gives you like a natural sense of why these tools are bad at some things
and the kind of prompts that don't really work.
Because if there's like a, let's say, some archive image of some women celebrating
on the steps of a church in 1972, it will have that kind of caption where they never go,
the woman on the left is wearing a yellow hat.
The woman on the right is wearing, you know, they just don't go into that because you can see it.
So, ironically, they often describe very generally what the image is about, but not like how you would draw it step by step.
And that's why these tools are less good at saying, like, I want this thing over here and then that thing next to it and then something on top.
And that thing should be much bigger because that's how it is in real life.
That's not how images are described in language.
So you'll find yourself, next time you're in like an art museum or in a book, really looking at those little panels next to it and being like, okay, that's what like acrylic on glass looks like, I'll remember that.
Yeah, that's a really good point, though, because that's how these AIs were trained, right?
So I think DALL-E trained on 600-plus million images, and they're using that alt text, again, that descriptor. And I've never thought about it that way, but actually training yourself to become a good prompter by reviewing the inputs to the tool, which I've never done before, but I can imagine someone literally going online and reading the alt text on different images and going,
oh, this is how this was described.
This is how an AI might interpret my future prompt.
Yeah.
And I think to your point also,
it's something that I've learned from my very limited set of prompting,
is just the level of detail that you need with your prompt,
where when I first started, I'm like, you know, monkey wearing a hat.
Yeah, yeah.
And, you know, you don't even realize until you start prompting
the many iterations that could come from that.
Like you have one image in your head,
but then you get back all of these different results.
And then you end up looking on different prompt search engines or libraries
and seeing what other people are doing,
you're like, this prompt is like 200 words.
I would have never thought to do that.
And I think there's something to be said.
I think the longer they are,
there's definitely diminishing returns,
but sometimes using a lot of related, almost synonymous terms,
just like chucking in loads of like, you know,
detail, techie, like photography language
is all kind of pushing it in the direction of,
like, wow, this really sounds like a kind of a real fancy.
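[Editor's note: a minimal sketch, not from the episode, of what that caption-style, modifier-stacked prompting can look like in code, using the open-source Stable Diffusion model via Hugging Face's diffusers library. The model ID, prompt wording, and settings are illustrative, not Guy's.]

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (hypothetical choice for illustration).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Describe the image as if it already exists, then stack related photography terms.
prompt = (
    "a monkey wearing a top hat, editorial portrait photograph, 85mm lens, "
    "shallow depth of field, soft studio lighting, Kodachrome, highly detailed"
)

image = pipe(prompt).images[0]
image.save("monkey_top_hat.png")
```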
As I went through your prompt book, there were so many different ways that you could describe
a shot. You could say a different camera angle. You could say a time period, as you just spoke
to, you could say a specific type of artistry or even a specific artist. I know there's some
controversy around using specific artists' work to train your new images. But let's fast-forward
to today. I feel like, as we talked about, six months later, these tools have evolved a lot.
Are there any different ways that you can prompt today or leverage these tools that didn't exist six months ago that are really important and maybe extending the way that you can use them?
100%. So the main one, and these things are like changing all the time, right?
But now there are increasingly tools where you can prompt with an image. Again, that's almost like an entire new field of exploration because it's not combining the image with your
words in the way you would expect something like Photoshop to do it, like it's not collaging
them together. It's almost describing to itself this like source image with words and then
doing the same with like a second image or maybe some additional text you supply and then being like,
okay, now I'm going to make a new picture that somehow represents both these things. So the results
can be really surprising, really unexpected, probably quite difficult to control. But then you
potentially have interesting opportunities like, okay, I can make a load of kind of abstract stuff.
using my brand colors or something that's important to me, photos of me, who knows?
And then, yeah, and then I'm going to use that and kind of multiply that visual base with
custom other prompts. And then everything will have this kind of likeness.
And then, of course, like the big thing that happened since the days of the prompt book and
so on was, of course, that huge spike in interest in selfies, right?
Like Lensa and ProfilePicture.ai, and there were like a dozen of them, which was just prompting with your face,
being like, yeah, I want to see more of this guy,
because it's me, obviously.
And then within the image-to-image space,
you've now got other startups that are doing interesting things
where, okay, give us 10 core images,
and now we'll generate you, like, infinite versions of that
based on, like, the modifiers that you want to see.
So there's all kinds.
So that's a really interesting space
that's going to probably power, like,
the next generation of how people,
especially consumers, interact with these products.
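[Editor's note: a hypothetical sketch, not Guy's workflow, of image prompting with Stable Diffusion's img2img pipeline from the diffusers library, where a source image plus a text prompt steer the new generation. File names, model ID, and settings are made up for illustration.]

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A source image supplies the visual base (e.g. brand colors, a mood board).
init_image = Image.open("brand_moodboard.png").convert("RGB").resize((512, 512))

# strength controls how far the result is allowed to drift from the source image.
image = pipe(
    prompt="abstract poster in the same palette, clean vector shapes",
    image=init_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
image.save("brand_variation.png")
```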
Yeah, one way that maybe you could put it
is that when we first
got access to these tools. You were really starting from scratch. You didn't even have the prompt
libraries available to you. You were just like, okay, I have this image in my head. But today,
you not only have those libraries, you also have images that you can input. So you're not starting
from scratch. You have a baseline of, as you said, maybe it's brand colors. Maybe it's a certain
style. And instead of having to articulate that yourself, you can just say, hey, here's what I want.
But to your point, sometimes it's hard to control, right? Because you're trying to say something to the
AI, you're trying to say, I want this output. You don't always get it. And so something I want to ask
you about is how you've learned to rein that in, to really, you know, on the whole, get a higher
throughput of images that you want versus images you don't want over time because these AIs,
they are a little bit of a black box, right? You can't understand every little piece that went
from your input to your output. And so you can't, like, fine-tune it in the same ways as maybe some
other skills that we've learned in the past.
And so how have you learned to actually become a better prompt engineer, given that
black box nature?
I mean, I think another aspect is there's also like a random element.
So if you and I both type in the same thing, it's not going to make the same picture
because it kind of starts from this random cloud of noise, and your cloud of noise is different
to mine.
And then it's slowly turning these clouds more and more into something that looks like
an orangutan in a tuxedo, but we're going to end up with different things.
So that's really frustrating when you're, like, testing things, because was it good or did you just get lucky? Or alternatively, if you're not seeing what you expected, should you just hit it again and again?
And then when you see someone else has made something really cool, did they do something really clever or, you know, is it like a persistent thing?
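[Editor's note: a minimal sketch of pinning down the "random cloud of noise" Guy mentions. Fixing the random seed makes a prompt reproducible, so you can tell whether a prompt change, and not luck, improved the result. Model ID and prompts are illustrative.]

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Same seed -> same starting noise.
generator = torch.Generator(device="cuda").manual_seed(42)
image_a = pipe("an orangutan in a tuxedo", generator=generator).images[0]

# Reset to the same seed before trying a revised prompt.
generator = torch.Generator(device="cuda").manual_seed(42)
image_b = pipe("an orangutan in a tuxedo, studio portrait", generator=generator).images[0]

# image_a and image_b now differ only because of the prompt change, not the noise.
image_a.save("orangutan_v1.png")
image_b.save("orangutan_v2.png")
```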
I have found myself in that exact spot where I have an idea for what I want.
It's not something that is super important where I need to nail it.
So I'm just, I just need it close enough.
And I'm getting these results and they're getting a little closer and closer and closer, but I have found myself in that spot where I'm just like,
let's just generate it again. Like, if I do this enough times, I'll eventually get to
something that's workable. So do you have any thoughts there in terms of like how you don't
end up in that spot where you're just like hoping for a better image? You're kind of like
pulling the AI slot machine, if you will? No. I mean, I think unless you kind of have evidence,
I think it's why some of these like other tools and communities are so important, you know,
where you see lots of other people's work, because, you know, if you can see someone else has done it, ideally you can also see the prompt they used and work out how they did it, but even if it's not there, then you're like, okay, I can get there.
Also, you run into these things where you would think it's like the most simple thing.
And then you're like, it doesn't know what a hot dog is.
It just doesn't understand the rules of how, of like what, you know, physically what
can and can't that look like.
And you're like trying and it's like, now the sausage is at a right angle. The bun has ears because it's starting to throw in some like dachshund, like, you know, aesthetic. And then you're like, minus, minus, no, no dachshund. That's kind of the limitation of where the technology is at the moment, which is it's amazing until you're trying to
do something very specific. And especially if you want to do something very specific,
and also to a very high professional standard. Well, I'm glad you even mentioned the negative
queries. That's something I think a lot of people don't know is that you can say, hey, AI,
I don't want this. It doesn't always manage to still generate what you're looking for. But there's
also almost like these glitches. One of them that is kind of infamous now is hands, right? So you can
generate these beautiful images of these Instagram looking models and you can put them in all these
different backgrounds and you're like, wow, this is amazing. And then it's always like, well,
look at the hands, which is kind of funny. I feel like it's, it's like the perfect manifestation of
how technology always is like much better in one direction when it's invented. But there's always like
these things that need to be iterated on. And so are there other things worth
knowing about whether it's these negative prompts, whether it's these glitches that are still in
the matrix, what would you call out from your, again, many hours of being deep in these tools?
I think it depends on the model. One example is when DALL-E came out, and this is still the case
as far as I know, it's not very good at understanding that it's drawing things in a square.
If you're drawing a person, it's often going to have, like, its feet and its head cut off
because it's seeing those in portrait photos. But one thing you could do with DALL-E is you can actually upload, like, an image to, like, do variations of. And if you upload an image that has, like, a little white border, then it knows that nothing can go there.
That kind of encourages it, forces it to kind of think inside the box, if you will.
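[Editor's note: a rough sketch of the white-border trick Guy describes, preparing the source image before it's uploaded to a variations tool. Pad the image with a white margin so the model treats the frame edges as empty and keeps the subject inside the square. File names and sizes are illustrative.]

```python
from PIL import Image, ImageOps

img = Image.open("portrait.png")

# Add a white border, then resize back to the square size the tool expects.
bordered = ImageOps.expand(img, border=64, fill="white").resize((1024, 1024))
bordered.save("portrait_bordered.png")  # upload this to the variations tool
```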
But then, of course, you now have tools like Midjourney, who've been iterating on their text-to-image model a lot more aggressively than OpenAI, who understandably, I think, maybe have some other things in the cooker, you know, and who have grown that into the model
itself.
So when you type things in, it knows it's a square and actually it will sometimes do quite
clever things in order to fit it in that space.
So if you ask for kind of like a group selfie of three people
on something like DALL-E, that's going to be cut off at the end because it's used to seeing someone taking like a disposable camera photo, whereas Midjourney is clever enough to know that one of them
kind of needs to be standing behind the other or like leaning in from the side.
So it's kind of clever how they've managed to like solve that composition problem
within the AI, which then, you know, the prompt engineering thing, I think is
understanding the possibilities and the limitations of where you are at the moment.
Meanwhile, there's these other people who are doing some like very,
very technically serious work to kind of
make those limitations kind of no longer
relevant. Yeah. Well,
I'm glad you brought up the differences between
these different tools. So if we
talk about just Stable Diffusion, Midjourney, and DALL-E, which I feel like are three that a lot of people are
familiar with. Yeah. Would you
liken the ability to prompt within
each of these more like the
difference between Excel and
Google Sheets, where if you know how to
use Excel, you really can drop right into
Google Sheets and it's relatively straightforward.
You might have to switch up your
shortcuts a little bit or learn one little thing here and there, but for the most part, you can again
drop from one to the other, or would you liken them more to learning to speak different languages?
It's not that different. I think the principles are like very similar. And then the nuances
of each are slightly different. So I think now if you went from DALL-E to Midjourney, it would be
like amazing. And then if you went back in the other direction, you'd be like, it doesn't do what
I want. But that's because Midjourney is doing so much of the heavy lifting
to help you make something really good.
If you are using the tools to create some very specific effect,
imagine that I guess, yeah, like a very complicated Excel formula,
that would not have the exact same output in the other tool,
if you know what I mean,
because they're trained on like a different set of images,
Stable Diffusion, I think, is on 5 billion for the what-things-look-like learning, and then like a smaller set of like 12 million for the what-does-nice-look-like.
And then the fine tuning that's happened on the top
and how they've optimized it in the later phases, a technical element that escaped me slightly.
You know, they have made different creative decisions there.
And it's maybe like driving a different car.
Okay.
If you, like, floor the accelerator in various different cars,
some are going to take off, some are going to trundle along.
Good analogy.
Do you also find that, I mean, we've talked already about this idea
where sometimes it's pretty easy to get to that 80%,
but then that final 20%, the real refinement
to get to exactly what you pictured in your head
or exactly what you want and didn't picture in your head,
sometimes requires another tool.
And so have you found,
I've heard some people are using Facetune
or different AIs to take it to the final level
or I guess you could also use in-painting
and outpainting a little more discreetly.
So how have you found the relationship
of maybe one tool to the suite of other tools
that exist out there?
I think there's lots of exciting crossovers.
But actually, I kind of think
it's a big opportunity for the Photoshops of this world
because those are tools that presuppose
you have some kind of original image
to manipulate, whereas now there's a huge amount of raw, but maybe not perfect, material for people to work with. There's lots of things also that I've been trying to do in prompting
that are actually more easily achieved in other tools. So you can, you know, spend ages trying
to get this kind of vintage film look. But if you're like an Instagram influencer,
which I'm sure you are. Who isn't? But there's loads of iPhone apps, right, that are out there
just to like give all your photos that kind of like dreamy vintage film look. Yeah. I mean,
I think back in July when you first wrote your prompt book,
you had a requested feature list for DALL-E 2.
But are there things that are on your new list of,
hey, these tools are great,
but they're missing XYZ or they're lacking in these areas.
This would be top of my list to see improved on.
I think we're going to see more models come out.
I mean, the fact that Stable Diffusion is open source means that lots of other things are going to be built on top of that.
And I think it's going to be really exciting to see some of the directions
that people take that in.
either kind of on an individual sort of prosumer level,
people building their own models to create their own stuff,
more likely some bigger organizations training it for specific purposes.
The whole challenge and the whole opportunity,
I think at the moment, is like how do you go beyond the text box?
How do you go beyond this like just blank rectangle
to create something that is more user-friendly,
that's more inspiring, that's more how people think?
because on the one hand, if you're not an artist,
the ability to describe things with words is definitely a big step forward.
But if you think about the next layer,
it's still quite hard to describe things with words.
Designers, when they do work for clients,
like it's one of their pet peeves because clients don't like it,
but they can't explain why or what they want different.
They're like, oh, I want it to be more, do you know what I mean?
Like more, and they're like, I don't know.
I don't know what that means, which is basically the position these, you know,
AI models are in. So could you see like a conversational interface? Can you do the generations fast enough
that you're always showing people multiple options, possible new directions? It's almost like in a sort of
multidimensional space where it's like, do you want to take it more this way or more this way?
You know, part of the prompt book is I didn't know what metaphysical painting or Kodachrome or all these
things were, but those at least have names. But there's probably other aesthetics, right? Other styles
that we don't actually have words for. It's like, you know, that kind of gritty, but like modern gritty, like almost like shiny gritty. Like the grit has a shine on
it. And probably I can make you a mood board of that. And you'd be like, oh yeah, like that's
a thing. But there's no word for it. So if you can create ways of unleashing the inexplicable,
the undefinable, that's the exciting thing about vision art, is to express things or moods
or things that you can't quite put into words. I totally have my mind spinning, thinking of different
ideas. A couple of them that came to mind. One of them is just a better onboarding experience.
One where you're guiding the new prompter to understand how all these things might fit together, to your point.
Like, try this. Oh, look at what you got here. Oh, did you notice how when you use these two prompts together, this one kind of overshadows the other?
Maybe there's a third word that's a synonym of this. And I think you've kind of done this on your own by just going through and prompting like crazy going through these different prompt libraries and trying to sort through the signal from the noise.
But I do think any one of these models or maybe the UI built on top could have just a much better onboarding experience so that people come into the tool, to your point, with just a better understanding of what they should be paying attention to.
And then I also, in terms of these visual styles, I mean, it reminds me of how a lot of Instagram influencers for a period of time were selling these filters because they had figured out the precise tuning of every little variable, which sounds easy, but I had tried to do it myself. I never managed to create good Lightroom filters,
but people had, and they would sell them.
And so I wonder if you'll see the same thing
where maybe someone creates kind of like a zip file of a mood board,
and then they train the AI in some way
that does make it, I guess, play nice
with that particular concept
that you can't distill necessarily into a single term.
Yeah, because you had that breakthrough.
Someone did a paper on it,
and I think it's almost kind of what led to that selfie craze, which was that you don't need to put your photos of Steph in that original 600 million training data
or wait for the next time we do that again for it to teach it what you look like.
There's this kind of embedding trick where you can show it like a bunch of photos of you
and then you can refer to you and it knows how to kind of recreate that.
And there was also an interesting thing in the same paper that hasn't really been used
or like commercialized in the same way, which is to do that with style.
So rather than show it, yeah, this is what this person looks like.
It's like this is what the style of blah blah, blah is called, here it is, and then off you go,
which obviously has all kinds of potentially shady legal implications.
But let's assume this is a lovely art we've made ourselves.
Yeah.
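[Editor's note: a hypothetical sketch of the "embedding trick" Guy describes, which sounds like textual inversion: a small learned embedding teaches the model a new token for a face or style from a handful of images, without retraining the whole model. It assumes a recent diffusers version; the concept repository and its <cat-toy> token are illustrative examples from community libraries, not from the episode.]

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a pre-trained concept embedding that defines the placeholder token <cat-toy>.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# Refer to the learned concept by its token inside an ordinary prompt.
image = pipe("a photo of <cat-toy> floating in space, studio lighting").images[0]
image.save("learned_concept.png")
```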
Well, no, I mean, to the idea of honing in a style, I do wish there was a version of the product
where I could go and, like we've talked about, maybe upload certain brand images or certain
brand colors, and then have it iterate with me where it shows me a bunch of images.
And I say, it's okay, but I want a little more of this color. And then we keep doing that to the point where I get a bunch of images where I'm like,
yes, this is the style. You can lock that in. You lock it into a variable that you can then
plug into future prompts. I've definitely seen there's some people out there that have managed to
lock in a particular look. And now every blog post they have, it's always the same kind of thing.
And that's like pretty cool. But we haven't seen that always built into the like foundation models
yet as like a way of interacting with it. And then there are some startups like Scenario,
which is doing it for game assets, and then Leonardo, which is like more multi-purpose, I think,
or is just positioning itself that way, which is again all about can you, like, control things down
to, like, a consistent look. Yeah. So what we've talked about so far is this idea of controlling the
AI, but I also like to think about the ways that when you work with these different models,
you learn more about your own creativity. The example that it reminds me of is in chess when we
finally built the bots that were better than humans in chess, not only were we surprised by
the fact that that could happen, but we were also surprised by all of the different openings or
moves that humans in their thousands of years playing chess had never considered that were better
than some of the moves that we, even the best chess players in the world, had used. And so,
have you seen any of that, even from a personal experience level, like where you're in these
tools and you're playing around and you're learning with the model, if that makes sense,
it's almost surfacing things that you had never considered before. I like that. I think whenever
you're using these tools, you have these two modes, right, where you're either like waiting to
see what it shows you or you kind of are visualizing it in your mind and you're like, no,
not that, not that. But if you just let it take you where it wants to go, then you're suddenly like,
I have no idea what I'm looking at, but apparently I'm here. With DALL-E, there's like this variations tool. So you just get it to, like, show it an image, and it'll be like, here's
four more that are kind of the same. But obviously over time, if you leap and leap and leap and
leap, you end up on this like completely bizarre visual journey, like a psychedelic dream.
It's fun to play around in these tools. But ultimately, while there is a market for just
interesting art in the world, a lot of this will need to ladder back into, you know, whether
it's blog post sharing images, whether it's creating the next sneaker design that you end up
selling. Are there areas that you've seen really emerge from this where people are using these
tools today and applying them to, again, what someone might call a practical use case? And in addition
to maybe what you've seen so far, are there other areas where you're excited to see this be applied?
It's interesting, isn't it? Because I think especially given the tenor of the conversation
around these tools and the ethical and legal aspects therein, I suspect that to an extent when you see these things used, especially in prominent contexts,
they might not be advertised as such.
Much as like green screen, right?
When green screen is used in films, you shouldn't be like,
that is an amazing use of green screen.
You should just be like, oh, my God, like he's dangling off a thing.
Oh, this must have cost millions.
So I think, you know, when we see AI tools used in lots of contexts, not that this is covered up, but, you know, they might obviously be just a narrow part of the creative
process.
They might be all of it, but it's kind of hidden.
I raised this point online that I think you were making, which is like, well, where is this all going? Like, will it ever make images good enough, and will other people want to look at them? Because it's not like we have this huge history of, like, logging in to social media and looking at just, like, abstract pictures, like, oh, a force on a surfboard. I mean, things tend to have, like, a grounding in reality, right? That's what makes them viral or interesting. But then someone was like, no, like, maybe this: it won't be that it's going to make content so good that it's, like, better than Netflix or better than Instagram; it's that the hobby of doing it, that's the entertainment.
Well, I mean, there are skills out there to your point where writing, as an example,
some people just like to write to write. And whether other people read it, it doesn't matter,
they actually enjoy the process. And so I definitely could see an entertainment angle.
But a lot of people really hate writing. And a lot of people find value in the money that they
get paid to write or the writing is used within a script, which then is published on Netflix.
And so it's like, how is this stuff used in the wider world, whether it's on an e-commerce website, whether it's one day integrating with 3D printing, and, like, the stuff that you generate in Midjourney can then actually be printed into like a real-life product that you sell?
Oh, actually, this isn't just a gimmick. This isn't just a toy.
There's this very high-level kind of debate around artistry, I suppose, as if everything is either going to be like in the Louvre (I'm not saying that right; in the Tate, I'm from London) or, you know, in the bin.
But ultimately, if you look around just any space that you're in
and look at everything that has like a visual component
or like a design component,
there's so many different levels at which we engage with art,
you know, like the pattern on a cushion,
the warning label on the coffee maker,
the sausage dog on a card.
They're all different things.
For some things, the human touch is like literally the point.
But other things, it's like a soothing pattern to look at so that your wall isn't just gray.
And so there's all kinds of layers in between.
And I think we'll see them used in more and more different situations.
The final thing I want to ask you about is how this all fits into the wider skill set that people might have.
So on one hand, I can see how there might be an argument that this idea of the prompt engineer is going to be one that only few can do really well.
Right? People are really going to master this skill set and they're going to be much more valuable than the people who don't know how to prompt well. But then I can also see an argument where, as you said, maybe this gets abstracted and we have great UIs where truly it becomes the type of thing where basically anyone can do it and anyone can do it pretty reasonably well. And it just becomes, you know, similar to being able to write and read. These are just kind of fundamental, elemental skills that are in everyone's skill sets. They're taught in schools. Where do you sit with that in terms of how you see this
progressing? Like, you could also position the question as: is it worthwhile to become an excellent
top 1% prompt engineer? Or is it like, oh, everyone should kind of have this in their toolbox?
Well, that depends. I think on the one hand, there's obviously every incentive for the people
that make these foundational tools to make prompt engineering, for instance, not a thing.
Because they want everyone to be able to do it, right? They naturally want to de-complexify the tools that they're offering.
Again, if you look at the most recent model of Midjourney,
like version four, stuff that would not have been even possible six months ago,
you can literally do the thing where you type in, like,
I remember because I posted one, someone was arguing about it,
and I was like, look at this space stuff,
I just typed in space stuff.
And it's like this amazing astronaut duck.
And he said, there's no way you just typed that in.
So I went back and checked and I was like, no, I lied.
I actually typed in a really cool space duck.
But at the same time, with any material,
like artistic or otherwise, if you push things to the boundary, there's always going to be people,
like someone that explores everything that's possible or like just iterates, iterates, iterates or
something, they're obviously going to explore further on the map of what's possible than someone
that isn't. So I don't think it will become like this necessary skill that everyone needs to
have, but I do think it will become, you know, like some people that are expert wood whittlers
or really good at animating hair or whatever, you know, the people that develop a real like
passion or do some of the most amazing things. And then there's also the kind of the secret
prompting, I guess, like a copywriting thing would be like the obvious example at the moment.
You think you were typing something into a UX, but really there's something else wrapping that in
a prompt and then sending it to like a foundational model. So there's probably going to be some
people whose job is to like come up with that layer of thing that the consumer or the average
person is never seeing. And they think they're just talking to the AI, but really they're talking to
this thing that then adds a little bit of zhuzh to it and then tells the AI that.
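[Editor's note: a minimal sketch of the hidden "prompt wrapping" layer Guy describes, where the user types a short request and the product silently expands it into a fuller prompt before calling a text-to-image model. The template wording and the placeholder generate_image() function are hypothetical.]

```python
def wrap_prompt(user_input: str) -> str:
    """Wrap the user's short request in a house-style prompt template."""
    return (
        f"{user_input}, professional product photography, soft studio lighting, "
        "high detail, 35mm lens, shallow depth of field"
    )


def generate_image(full_prompt: str) -> None:
    # Placeholder for whatever foundation-model API the product actually calls.
    print(f"Sending to model: {full_prompt}")


# The consumer only ever sees their own words.
generate_image(wrap_prompt("a red running shoe"))
```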
This is going to be a tangent, but it reminds me of, I just listened to a Reply All episode where someone had remembered this song from his childhood and they were trying to figure out what it was. You've heard this episode. If people haven't, it's one of the best. That's the only one, but it was so famous. Yeah, of all time. Isn't it such a good listen? Yes, but it reminds me of, do you remember in the episode,
they find this lady who is a music producer, but she is a music producer for
specifically people who want to create music like the Barenaked Ladies.
And it's like, you know, people have jobs like this when you grow up and you're in school and they tell you, you know, you could be a doctor one day, you could be a teacher one day.
They don't tell you you could be a music producer for musicians that want to sound like the Barenaked Ladies.
And it makes me wonder or think about, you know, what specific niches are people going to go into within the realm of prompt engineering, right?
Like maybe you specialize, as you said, in hair, maybe in hands, maybe in something for enterprise SaaS companies. I don't know. It's kind of hard to predict at this point since we're so
early. But yeah, I think you're right that there's going to be, I guess, kind of a bimodal nature
to it. It does seem like the kind of tool that's going to be on everyone's desktop. But it does
also seem like there is this opportunity to become, as someone might say, like a 10x prompt
engineer. Yeah, but I think that's interesting, isn't it? Because that's such a tech world
metaphor, like the notion of 10x. Because it even implies there's a scale where you can have one, therefore you can have 10 of it. Which, in the record industry, do people talk about being
like a 10x recording engineer? Obviously, some recording engineers are like famous and better than
others. And there's all this kind of talent. But I don't know if people are like, yeah, like I'm a
10x. But yeah, just like producers and all the kind of people that go into making, I think,
music or film, you know, that huge list of people you see at the end of every movie,
where you discover a whole new world of careers that you might have had. I'll unfortunately
never be a best boy, but I'm still hoping to be a gaffer. Then, you know, there'll be all those
kinds of jobs, I think, in the AI, the creative AI industry. You know, your point on the spectrum
of like, what is 1x and what is 10x? What is the most popular piece of, you could say, art
or imagery that is shared online? Like, what comes to mind for you there? I don't know. You said
that as if you know the answer. Well, I have an answer. What comes to mind for you there?
Photos of a party?
So I don't know if this is actually the most,
but what comes to mind for me,
at least as someone who spends a lot of time on Twitter,
is memes.
And memes are like the most basic kind of imagery ever.
It's like literally an image with like some capitalized text tossed on it.
And your point just reminded me of this idea where art especially is subjective
and what people like and resonate with is not necessarily the most refined
or extravagant, precise type of imagery,
which you can generate in some of these text-to-image tools,
but it doesn't necessarily mean that people are going to resonate with it.
Exactly.
I mean, until they invent an AI that can do 10x memes,
which is the last thing we need.
This was really fun, Guy.
I loved hearing about where you see this industry,
this skill set moving.
We will definitely share the prompt book link in the show notes
because I think people can benefit from seeing the different types of
modifiers that you can include in a prompt and also a link to your social because you're
constantly sharing new hacks, new things that you're discovering. But yeah, any other places that
people should look to find you or your work? You can find me on Twitter at GuyP, G-U-Y-P, and you can find my Substack, when I finally post, at promptresponse.com. Awesome. Thanks for doing
this. Thank you so much for having me. It was lovely to meet you. I'm glad we could do this. Thank you for listening to the a16z podcast. If you liked this episode, don't forget to subscribe, leave a review, or tell a friend.
We also recently launched on YouTube at youtube.com/a16z_video, where you'll find
exclusive video content. We'll see you next time.