The a16z Show - Unlocking Creativity with Prompt Engineering
Episode Date: March 9, 2023

With every new technology, some jobs are lost while others are gained. People often focus on the former, but in this episode we chose to highlight the latter – a highly creative role that emerges alongside AI: the prompt engineer.

Until AI can close the loop on its own, each tool still requires a set of prompts. Just like a composer feeds an instrument the notes to play, a prompt engineer feeds an AI a map of what to produce. And if we know anything from music it’s that composing great music takes great skill!

In this episode we explore the emerging importance of prompting with Guy Parsons, the early learnings of how to do it effectively, and where this field might be going.

Will the prompt engineer be more like the highly sought-after DevOps engineer, or a proficiency like Excel that you find on every resume? Listen in to hear Guy’s take.

Interested in the prompt competition? Email us at podpitches@a16z.com.

Resources:
DALL-E 2 Prompt Book: https://dallery.gallery/the-dalle-2-prompt-book/
Find Guy on Twitter: https://twitter.com/GuyP
Guy’s combining image experiment: https://twitter.com/GuyP/status/1612880405207580672
Guy’s amorphous prompt experiment: https://twitter.com/GuyP/status/1608475973300948993
Guy’s space duck: https://twitter.com/GuyP/status/1601342688225525761
PromptBase: https://promptbase.com/
Lexica: https://lexica.art/

Topics Covered:
0:00 - Introduction
01:49 - DALL-E 2 Prompt Book
05:29 - Parallel skills
06:51 - 80/20 prompting
10:16 - New ways of prompting
13:44 - Pulling the AI slot machine
18:09 - Comparing models
21:04 - Requested features
26:34 - Learning with AI
27:58 - Practical use cases
32:08 - A top 1% prompt engineer
36:17 - The most popular images

Stay Updated:
Find us on Twitter: https://twitter.com/a16z
Find us on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. For more details please see a16z.com/disclosures.
Transcript
If you think about the next layer, it's still quite hard to describe things with words.
Designers, when they do work for clients, it's one of their pet peeves: clients don't like it, but they can't explain why.
With every new technology, some jobs are lost while others are gained.
And while people often focus on the former, in this episode we're highlighting the latter,
a highly creative role that emerges alongside AI, the prompt engineer.
Until AI can close the loop on its own, each tool still requires a set of prompts.
And just like a composer feeds an instrument a set of notes to play, a prompt engineer
feeds the AI a map of what to produce. And if we know anything from music, it's that composing
great music takes great skill. So in this episode, we dive into the emerging importance of prompting,
the early learnings and how to do it effectively, and also where this field might be heading. And we do
so with Guy Parsons. Guy has been an early mover in the text-to-image AI space, having written the
DALL-E 2 Prompt Book in July of last year. So will the prompt engineer
be more like the highly sought-after DevOps engineer or a proficiency like Excel that you find on
every resume?
Listen in to hear Guy's take.
By the way, we're thinking of running a prompt competition coming up.
So if you think you have what it takes, email us at podpitches@a16z.com with the subject "prompt engineer."
As a reminder, the content here is for informational purposes only.
It should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment
or security, and is not directed at any investors or potential investors in any a16z fund.
For more details, please see a16z.com/disclosures.
Guy, welcome to the show.
Thank you for having me.
I'm excited to be here.
When we originally reached out to you, it was around six months ago and you had just
written something called your prompt book.
Why don't you give everyone a little bit of an idea of what that prompt book was,
what it is now, and also what prompted you to want to write it in the first place?
This was in the initial heyday of DALL-E 2, which was OpenAI's text-to-image model.
When it came out, they rolled it out to a few test people at a time.
They were super cautious about how it might be misused, how it could end up having a backlash,
all these kinds of things, which then only increased the sense of people wanting to get their hands on this thing.
Because at the time, this was before things you might think of now, like Stable Diffusion and Midjourney.
It kind of predated those by some small margin
and seemed way ahead of anything people had tried using before.
So yeah, if you've used a text-to-image AI by now, you know it's basically a text box
and it all comes down to what you type in.
It doesn't have buttons and all the kind of controls you might expect
when you log into something like Photoshop.
So the problem then becomes, like a lot of people,
your mind goes blank, or you don't actually know the name or the words for what you're trying
to type in, right?
If you've actually been to art school or you're up on your art history or in your design language,
then you've probably got a head start on everyone else.
But on places like Twitter and Reddit, there are people posting these amazing images.
But because of the nature of social media, it's all lost.
So I started trying to like collect these cool examples and these cool terms people were using
to create these amazing visual effects.
So I started putting everything into essentially a slide deck.
By the time I'd copied and pasted all these cool things I'd seen,
it was 80 or 100 slides long, something like that.
So I rather grandly called it a book and shared it online.
And it's just a jumping-off point for people to realize the kind of stuff these tools were, at the time, just becoming capable of.
Obviously, now they're capable of
even more advanced things.
And we'll get into that, because it's crazy to see how these tools, and the way people are using them,
have all changed in a matter of just six months. It feels like yesterday when we didn't even have access to this.
But this idea that these are tools and just like any other tool, person A versus person B may not get the same result.
They may not have the same understanding of how to leverage the tool.
And so before we get into maybe the tips and tricks that you've learned, I just want to give the audience a broad sense of how much time you've spent within the bowels of Midjourney, DALL-E, and Stable Diffusion.
Like, if you could give an estimate, how much time do you think you've spent kind of mastering this idea of prompting?
I wouldn't say I'm a master in any sense.
It's just so engaging and interesting to experiment with these tools.
So, you know, like in the last six months, sure, like a couple of hundred hours.
What I really admire is people that are using these tools to create this like real body of work
where they can really like pursue a direction to discover what's possible.
I think I saw a thread where, I think it's in Midjourney, you can get it to tell you how many prompts you've ever done,
and there are people in there with thousands, hundreds of thousands.
Yeah, and I appreciate how humble you are,
but I think it's one of those scenarios where, again, we're six months in.
You know, a parallel is when there's a new coding language,
and then you see people write job descriptions for developers
looking for someone with five years' experience
when that particular language has only been around for six months or a year.
And so, yes, I don't think anyone could definitively say
they're an expert in prompt engineering,
partially because it's only been around for so long.
But I do think you've at least shared a lot more
than the average person.
And given your experience with these tools,
I'm curious if you see a parallel skill set
where you can kind of compare prompt engineering to learning to code.
Is it similar to being able to storytell effectively?
Is it similar to being able to process numbers in an Excel sheet?
Is there a parallel skill set where it reminds you of, you know,
something you've done before?
I think there was an era,
I don't know if we're still in it,
where there was a certain category of person who could consider themselves
good at Googling stuff.
Do you know that kind of thing, like, oh, filetype: this?
And there's this big debate over whether, especially in text-to-image, you know,
is there really like any artistry to it?
For me, I'm not so sure because I'm no artist.
But there's definitely something.
It's always about discovering an image that's already out there.
You've just got to find the words that summon it forth as if you're kind of navigating
like an infinite Pinterest of things that haven't quite existed until you manifested them.
Well, I mean, to that point.
We have so much information online.
I feel like that is a skill set, even before these AI tools.
Like, I used to work on a product called Trends,
and that really was about using the right tools like subreddit stats or Ahrefs
or other data sets online and learning to parse them
and learning to surface what other people find interesting.
But let's get into the nitty-gritty.
Like, you wrote this prompt book.
You've been playing around with these tools for quite some time.
Are there certain learnings, maybe the 80-20 approach of becoming a good prompt engineer
in terms of things that you think are really valuable to understand.
Maybe it's the prompt length.
Maybe it's using certain modifiers within your prompt.
Maybe it's just like a framework for thinking about prompting.
Is there anything that's surfaced that you think would be really valuable
to someone who's just starting out with prompting?
Oh, yeah.
Like, I think if you've never used one before, like the best way to explain how they work at the moment,
which is, again, always shifting and something else we can talk about,
is to always like describe something as if it already exists.
So imagine that it's an image in some kind of downloadable clip art library or a photography gallery.
And you know, someone's written underneath, oh, this is a fine example of an early modern photography shot.
And those are the kind of descriptions that you're trying to kind of mimic to tell these tools what you're looking for.
And it also gives you a natural sense of why these tools are bad at some things,
and the kinds of prompts that don't really work.
Because if there's a like, let's say, some archive image of some women celebrating
on the steps of a church in 1972, it would have that kind of caption,
where they never go, the woman on the left is wearing a yellow hat,
the woman on the right is wearing, you know. They just don't go into that, because you can see it.
So ironically, they often describe very generally what the image is about,
but not like how you would draw it step by step.
That's why these tools are less good at saying, like, I want this thing over here and then that thing next to it and then something on top,
and that thing should be much bigger, because that's how it is in real life.
That's not how images are described in language.
You'll find yourself, next time you're in an art museum or reading a book, really looking at those little panels next to the work and being like, oh, okay.
That's what like acrylic on glass looks like.
I'll remember that.
Yeah.
That's a really good point, though, because that's how these AIs were trained, right?
So I think DALL-E trained on 600-plus million images,
and they're using that alt text, again, that descriptor.
And I've never thought about it that way,
but actually training yourself to become a good prompter
by reviewing the inputs to the tool,
which I've never done before,
but I can imagine someone literally going online
and reading the alt text on different images
and going, ah, this is how this was described.
This is how an AI might interpret my future prompt.
Yeah.
And I think, to your point also,
something that I've learned from my very limited prompting is just the level of detail
that you need in your prompt, where when I first started, I was like, you know, monkey wearing a hat.
Yeah, yeah.
And you don't even realize, until you start prompting, the many variations that could come from that.
Like you have one image in your head, but then you get back all of these different results.
And then you end up looking on different prompt search engines or libraries, seeing what other
people are doing, and you're like, this prompt is like 200 words. I would have never thought to do that.
And I think there's something to be said for that. Like, I think the longer they are, there are definitely
diminishing returns. But sometimes using a lot of related, almost synonymous terms, just chucking
in loads of, you know, detailed, techie photography language, is all kind of pushing it
in a direction of, like, wow, this really sounds like a real fancy...
As I went through your prompt book, there were so many different ways that you could describe a shot. You could specify a
different camera angle. You could specify a time period, as you just spoke to. You could specify a specific
type of artistry or even a specific artist. I know there's some controversy around using
specific artists' work to create new images. But let's fast-forward to today. I feel like,
as we talked about, six months later, these tools have evolved a lot. Are there any different ways
that you can prompt today or leverage these tools
that didn't exist six months ago, that are really important
and maybe extend the way that you can use them?
100%.
So the main one,
and these things are changing all the time, right?
But now there are increasingly tools where you can prompt with an image.
Again, that's almost like an entirely new field of exploration
because it's not combining the image with your words
in the way you would expect something like Photoshop to do it.
Like it's not collaging them together.
It's almost describing to itself this like source image with words.
And then doing the same with like a second image or maybe some additional text you supply.
And then being like, okay, now I'm going to make a new picture that somehow represents both these things.
So the results can be really surprising, really unexpected, probably quite difficult to control.
But then you potentially have interesting opportunities like, okay, I can make a load of kind of abstract stuff using
my brand colours or something that's important to me, photos of me, who knows?
And then I'm going to use that and kind of multiply that visual base with
other custom prompts. And then everything will have this kind of likeness.
And then, of course, the big thing that happened since the days of the prompt book
was that huge spike in interest in selfies, right?
Like Lensa and the profile picture apps.
And there were like a dozen of them, which were just prompting with your face,
basically being like, yeah, I want to see more of this guy, because it's me, obviously.
And then within the image-to-image space, you've now got other startups that are doing interesting
things where, okay, give us 10 core images and now we'll generate you, like, infinite versions of
that based on the modifiers that you want to see. So there's all kinds. That's a really
interesting space that's probably going to power the next generation of how people, especially
consumers, interact with these products.
Yeah. One way that maybe you could put it is that when we first
got access to these tools. You were really starting from scratch. You didn't even have the prompt
libraries available to you. You were just like, okay, I have this image in my head. But today,
you not only have those libraries, you also have images that you can input. So you're not starting
from scratch. You have a baseline of, as you said, maybe it's brand colors, maybe it's a certain
style. And instead of having to articulate that yourself, you can just say, hey, here's what I want.
But to your point, sometimes it's hard to control, right? Because you're trying to say something to the
AI, you're trying to say, I want this output. You don't always get it. And so something I want to ask
you about is how you've learned to rein that in, to really, you know, on the whole, get a higher
throughput of images that you want versus images you don't want over time because these AIs,
they are a little bit of a black box, right? You can't understand every little piece that went
from your input to your output. And so you can't fine-tune it in the same ways as maybe some
other skills that we've learned in the past.
And so how have you learned to actually become a better prompt engineer,
given that black box nature?
I mean, I think another aspect is there's also like a random element.
So if you and I both type in the same thing,
it's not going to make the same picture because it kind of starts
from this random cloud of noise and your cloud of noise is different to mine.
And then it's slowly turning these clouds more and more
into something that looks like an orangutan in a tuxedo.
But we're going to end up with different things.
So that's really frustrating when you're testing things: was it good, or did you just get lucky? Or, alternatively, if you're not seeing what you expected,
should you just hit it again and again?
And then when you see someone else has made something really cool, did they do something really clever, or, you know, is it a persistent thing?
I have found myself in that exact spot where I have an idea for what I want.
It's not something that is super important where I need to nail it.
So I just need it close enough.
And I'm getting these results and they're getting a little
closer and closer and closer.
But I have found myself in that spot
where I'm just like, let's just generate it again.
Like, if I do this enough times,
I'll eventually get to something that's workable.
So do you have any thoughts there in terms of, like,
how you don't end up in that spot
where you're just like hoping for a better image?
You're kind of like pulling the AI slot machine, if you will?
No.
I mean, I think unless you kind of have evidence...
I think that's why some of these other tools and communities
are so important, you know,
where you see lots of other people's work.
If you can see someone else has done it,
ideally you can also see the prompt they used
and work out how they did it.
But even if not, then you're like,
okay, I can get there.
Also, you run into these things where you would think it's the simplest thing.
And then you're like, it doesn't know what a hot dog is.
Like, it just doesn't understand the rules.
Yeah.
Of, like, you know, physically what it can and can't look like.
And you're trying, and it's like,
now the sausage is at a right angle.
The bun has ears, because it's starting to throw in some,
like, dachshund, you know, aesthetic.
And then you're like, --no dachshund.
That's kind of the limitation of where the technology is at the moment,
which is that it's amazing until you're trying to do something very specific,
and especially if you want to do something very specific
that's also to a very high professional standard.
Well, I'm glad you even mentioned the negative queries.
That's something I think a lot of people don't know,
is that you can say, hey, AI, I don't want this.
It doesn't always manage to still generate what you're looking for.
But there's also almost like these glitches.
One of them that is kind of infamous now is hands, right?
So you can generate these beautiful images of these Instagram looking models.
And you can put them in all these different backgrounds.
And you're like, wow, this is amazing.
And then it's always like, well, look at the hands, which is kind of funny.
I feel like it's like the perfect manifestation of how technology always is like much better in one direction when it's invented.
But there's always like these things that need to be iterated on.
And so are there other things
worth knowing about, whether it's these negative prompts or these glitches that are still
in the matrix? What would you call out from your, again, many hours of being deep in these tools?
I think it depends on the model. One example is when DALL-E came out, and this is still the case as far as I
know, it's not very good at understanding that it's drawing things in a square. If you're drawing
a person, it's often going to have its feet and its head cut off, because it's seen
those in portrait photos. But one thing you could do with DALL-E is you can actually upload an
image to do variations of. And if you upload an image that's just a little white border,
then it knows that nothing can go there. That kind of encourages it, forces it, to think inside
the box, if you will. But then, of course, you now have tools like Midjourney, who've been
iterating on their text-to-image model a lot more aggressively than OpenAI, who understandably,
I think, maybe have some other things in the cooker, you know, and they've built that into the model
itself. So when you type things in, it knows it's a square, and it will actually sometimes do quite
clever things in order to fit everything in that space. So if you ask for a group selfie of
three people on something like DALL-E, that's going to be cut off at the ends, because it's used to
seeing someone taking a disposable camera photo, whereas Midjourney is clever enough to know that one
of them needs to be standing behind the other or leaning in from the side. So it's kind
of clever how they've managed to solve that composition problem within the AI. And then,
you know, the prompt engineering thing, I think, is just understanding the possibilities and the
limitations of where you are at the moment.
Meanwhile, there's these other people who are doing some like very technically serious work
to kind of make those limitations kind of no longer relevant.
Yeah.
Well, I'm glad you brought up the differences between these different tools.
So if we talk about just Stable Diffusion, Midjourney, and DALL-E, I feel like those are three that a
lot of people are familiar with.
Yeah.
Would you liken the ability to prompt within each of these more to the difference between
Excel and Google Sheets, where if you know how to use Excel,
you really can drop right into Google Sheets and it's relatively straightforward?
You might have to switch up your shortcuts a little bit or learn one little thing here and there.
But for the most part, you can, again, move from one to the other.
Or would you liken them more to learning to speak different languages?
It's not that different.
I think the principles are like very similar.
And then the nuances of each are slightly different.
So I think now if you went from DALL-E to Midjourney, it would be amazing,
and then if you went back in the other direction,
you'd be like, it doesn't do what I want,
but that's because Mid-Journey is doing so much of the heavy lifting
to help you make something really good.
If you are using the tools to create some very specific effect,
imagine, I guess, a very complicated Excel formula,
that would not have the exact same output in the other tool,
if you know what I mean,
because they're trained on different sets of images.
Stable Diffusion, I think, is trained on 5 billion images for the learning-what-things-look-like part,
and then a smaller set of, like,
12 million for the what-does-nice-look-like part.
And then the fine tuning that's happened on the top
and how they've optimized it in the later phase
is a technical element that escaped me slightly.
They have made different creative decisions there.
It's maybe like driving a different car.
If you floor the accelerator in various different cars,
some are going to take off, some are going to trundle along.
Good analogy.
Do you also find that, I mean, we've talked already about this idea
where sometimes it's pretty easy to get to that 80%,
but then that final 20%, the real,
the real refinement to get to exactly what you pictured in your head or exactly what you want
and didn't picture in your head sometimes requires another tool. And so have you found,
I've heard some people are using Facetune or different AIs to take it to the final level,
or I guess you could also use inpainting and outpainting a little more discreetly. So how have you
found the relationship of maybe one tool to the suite of other tools that exist out there?
I think there are lots of exciting crossovers. But actually, I kind of think it's a big
opportunity for the Photoshops of this world, because those are tools that presuppose you have some
kind of original image to manipulate. Whereas now there's a huge amount of raw, but maybe not
perfect, material for people to work with. There are also lots of things that I've been trying
to do in prompting that are actually more easily achieved in other tools. So you can, you know,
spend ages trying to get this kind of vintage film look. But if you're an Instagram influencer,
which I'm sure you are...
Who isn't?
But there are loads of iPhone apps, right,
that are out there just to give all your photos that kind of
dreamy vintage film look.
Yeah.
I mean, I think back in July when you first wrote your prompt book,
you had a requested feature list for DALL-E 2.
But are there things that are on your new list of, hey,
these tools are great, but they're missing X, Y, Z,
or they're lacking in these areas,
this would be top of my list to see improved on.
I think we're going to see more models come out.
I mean, the fact that Stable Diffusion is open source
means that lots of other things are going to be built on top of it.
And I think it's going to be really exciting to see some of the directions that people take that in,
either on an individual, prosumer level, people building their own models to create their own stuff,
or, more likely, some bigger organizations training it for specific purposes.
The whole challenge and the whole opportunity, I think at the moment,
is like how do you go beyond the text box?
How do you go beyond this like just blank rectangle to create something that is more user-friendly,
that's more inspiring, that's more how people think.
Because on the one hand, if you're not an artist,
the ability to describe things with words is definitely a big step forward.
But if you think about the next layer,
it's still quite hard to describe things with words.
Designers, when they do work for clients,
it's one of their pet peeves: clients don't like it,
but they can't explain why or what they want done differently.
They're like, oh, I want it to be more, do you know what I mean?
Like more, and they're like, I don't know.
I don't know what that means.
which is basically the position these, you know, AI models are in.
So could you see, like, a conversational interface?
Can you do the generations fast enough that you're always showing people
multiple options, possible new directions?
It's almost like in a sort of multi-dimensional space where it's like,
do you want to take it more this way or more this way?
You know, part of the prompt book is that I didn't know what metaphysical painting
or Kodachrome or all these things were, but those at least have names.
But there are probably other aesthetics, right?
Other styles that we don't actually have
words for. It's like, you know, that kind of gritty, but modern gritty, like, almost
shiny gritty. Like the grit has a shine on it. And I could probably make you a mood board
of that and you'd be like, oh yeah, that's a thing. But there's no word for it. So if you can
create ways of unleashing the inexplicable, the undefinable, that's the exciting thing about
visual art: to express things or moods that you can't quite put into words.
You totally have my mind spinning, thinking of different ideas. A couple of them came to mind. One of
them is just a better onboarding experience, but one where you're guiding the new prompter
to understand how all these things might fit together to your point. Like, try this. Oh, look at what
you got here. Oh, did you notice how when you use these two prompts together, this one kind of overshadows
the other. Maybe there's a third word that's the synonym of this. And I think you've kind of done this
on your own by just going through and prompting like crazy going through these different prompt
libraries and trying to sort through the signal from the noise. But I do think any one of these
models, or maybe the UI built on top, could have just a much better onboarding experience so that
people come into the tool, to your point, with just a better understanding of what they should be
paying attention to. And then I also, in terms of these visual styles, I mean, it reminds me of
a lot of Instagram influencers for a period of time were selling these filters because they had
figured out the precise tuning of every little variable, which sounds easy, but I had tried to do it
myself. I never managed to create good Lightroom filters, but people had, and they would sell
them. And so I wonder if you'll see the same thing where maybe someone creates kind of like a zip file
of a mood board, and then they train the AI in some way that does make it, I guess, play nice
with that particular concept that you can't necessarily distill into a single term.
Yeah, because you had that breakthrough. Someone did a paper on it. I think it's almost what led to that selfie craze, which was that you don't need to put your photos and stuff in that original 600 million training data, or wait for the next time we do that again, for it to learn what you look like. There's this kind of embedding trick where you can show it a bunch of photos of you, and then you can refer to you and it knows how to recreate that. And there was also an interesting thing in the same paper that hasn't really been used or commercialized in the same way, which is to do that with style. So rather than showing it, yeah, this is what this person looks like, it's like, this is what the style of blah, blah, blah is called. Here it is. And then off you go, which obviously has all kinds of potentially shady legal implications. But let's assume this is lovely art we've made ourselves.
Yeah.
Well, no, I mean, to the idea of honing in on a style,
I do wish there was a version of the product where I could go
and, like we've talked about, maybe upload certain brand images or certain brand colors,
and then have it iterate with me
where it shows me a bunch of images
and I say, it's okay,
but I want a little more of this color.
And then we keep doing that to the point
where I get a bunch of images
where I'm like, yes, this is the style.
You can lock that in.
You lock it into a variable
that you can then plug into future prompts.
I've definitely seen there's some people out there
that have managed to lock in a particular look.
And now every blog post they have
always the same kind of thing.
And that's like pretty cool.
But we haven't seen that always built into
the like foundation models
as like a way of interacting with it.
And then there are some startups like Scenario,
which is doing it for game assets,
and then Leonardo, which is like more multipurpose, I think,
or is just positioning itself that way,
which is again all about can you like control things down to like consistent look.
Yeah.
So what we've talked about so far is this idea of controlling the AI. But I also like to think about the ways that, when you work with these different models, you learn more about your own creativity. The example it reminds me of is chess: when we finally built bots that were better than humans at chess, not only were we surprised that that could happen, but we were also surprised by all of the different openings and moves that humans, in their thousands of years of playing chess, had never considered, and that were better than some of the moves even the best chess players in the world had used. And so have you seen any of that, even on a personal level, where you're in these tools, playing around and learning with the model, if that makes sense, and it's surfacing things that you had never considered before?
I like that. I think whenever you're using these tools, you have these two modes, right? Either you're waiting to see what it shows you, or you're visualizing it in your mind and you're like, no, not that, not that. But if you just let it take you where it wants to go, then suddenly you're like, I have no idea what I'm looking at. But apparently I'm here. With DALL-E, there's this variations tool. So you show it an image, and it'll be like, here's four more that are kind of the same. But obviously over time, if you leap and leap and leap and leap, you end up on this completely bizarre visual journey, like a psychedelic dream.
It's fun to play around in these tools. But ultimately, while there is a market for just interesting art in the world, a lot of this will need to ladder back into something, you know, whether it's images for blog posts or creating the next sneaker design that you end up selling. Are there areas you've seen really emerge where people are using these tools today and applying them to, again, what someone might call a practical use case? And beyond what you've seen so far, are there other areas where you're excited to see this applied?
It's interesting, isn't it? Because, especially given the tenor of the conversation around these tools and the ethical and legal questions surrounding them, I suspect that to an extent, when you see these things used, especially in prominent contexts, they might not be advertised as such. Much like green screen, right? When green screen is used in films, you're not supposed to be like, that is an amazing use of green screen. You should just be like, oh my God, he's dangling off a thing. This must have cost millions. So I think, you know, when we see AI tools used in lots of contexts, it's not that this is being covered up, but they might just be a narrow part of the creative process. Or they might be all of it, but it's kind of hidden.
I've raised this point online, the one I think you were making, which is like, well, where is this all going? Will it ever make images good enough that other people will want to look at them? Because it's not like we have this huge history of logging in to social media and looking at just abstract pictures, like, oh, a horse.
Yeah.
On a surfboard. I mean, things tend to have a grounding in reality, right? That's what makes them viral or interesting. But then someone was like, no, maybe it won't be that it makes content so good that it's better than Netflix or better than Instagram. The hobby of doing it is the entertainment.
Well, I mean, there are skills out there, to your point, like writing, where some people just like to write for the sake of writing. Whether other people read it doesn't matter; they actually enjoy the process. And so I could definitely see an entertainment angle. But a lot of people really hate writing, and a lot of people find value in the money they get paid to write, or in their writing being used within a script that is then published on Netflix. And so it's like, how is this stuff used in the wider world, whether it's on an e-commerce website, or whether it's one day integrating with 3D printing, so the stuff you generate in Midjourney can actually be printed into a real-life product that you sell? Oh, actually, this isn't just a gimmick. This isn't just a toy. There's this very high-level kind of debate around artistry, I suppose,
as if everything is either going to be in the Louvre or,
I'm not saying that right, in the Tate, I'm from London,
or, you know, in the bin.
But ultimately, if you look around any space that you're in and look at everything that has a visual or design component, there are so many different levels at which we engage with art, you know, like the pattern on a cushion, the warning label on the coffee maker, the sausage dog on a card. They're all different things. There are some things where the human touch is literally the point. But with other things, it's a soothing pattern to look at so that your wall isn't just gray. And so there are all kinds of layers in between, and I think we'll see these tools used in more and more different situations.
The final thing I want to ask you about is how this all fits into the wider skill set that people might have. On one hand, I can see an argument that the prompt engineer is going to be a role that only a few can do really well, right? People are really going to master this skill set, and they're going to be much more valuable than the people who don't know how to prompt well. But I can also see an argument where, as you said, maybe this gets abstracted and we get great UIs where it truly becomes the type of thing that basically anyone can do, and do reasonably well. And it just becomes, you know, similar to being able to read and write: a fundamental, elemental skill that is in everyone's skill set and is taught in schools. Where do you sit on that, in terms of how you see this progressing? You could also frame the question as: is it worthwhile to become an excellent, top 1% prompt engineer? Or is it more that everyone should kind of have this in their toolbox?
Well, that depends. On the one hand, there's obviously every incentive for the people that make these foundational tools to make prompt engineering, for instance, not a thing, because they want everyone to be able to use them, right? They naturally want to de-complexify the tools that they're offering. Again, if you look at the most recent model of Midjourney, version four, which does stuff that would not even have been possible six months ago, you can literally just type in a couple of words. I remember because I posted one and someone was arguing about it. I was like, look at this space stuff; I just typed in "space stuff," and it's this amazing astronaut duck. And he said, there's no way you just typed that in. And so I went back and checked, and I was like, oh no, I lied. I actually typed in a really cool statistic.
But at the same time, with any material, artistic or otherwise, if you push it to the boundary, there are always going to be people, like someone who explores everything that's possible or just iterates, iterates, iterates, who are obviously going to explore further on the map of what's possible than someone who isn't. So I don't think it will become a necessary skill that everyone needs to have. But I do think it will become like, you know, some people are expert wood whittlers or really good at animating hair or whatever: the people who develop a real passion for it, or who do some of the most amazing things. And then there's also a kind of secret prompting, I guess; a copywriting tool would be the obvious example at the moment. You think you're typing something into an app, but really there's something else wrapping that in a prompt and then sending it to a foundational model. So there are probably going to be some people whose job is to come up with that layer that the consumer, the average person, never sees. They think they're just talking to the AI, but really they're talking to this thing that adds a little bit of juice to it and then tells the AI that.
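The "secret prompting" layer described here can be sketched as a template that wraps whatever the user types before it reaches a model. This is a minimal illustration, not how any particular copywriting product works: the template wording, `wrap_user_input`, and the stubbed `send_to_model` function are all hypothetical, and the stub stands in for a real model API call so the example runs offline.

```python
# Hypothetical hidden template: the end user never sees this text.
HIDDEN_TEMPLATE = (
    "You are a marketing copywriter. Write three short, upbeat taglines "
    "for the following product description.\n\n"
    "Product description: {user_input}\n"
    "Taglines:"
)


def wrap_user_input(user_input: str, template: str = HIDDEN_TEMPLATE) -> str:
    """Embed the user's raw text inside the hidden prompt template."""
    cleaned = user_input.strip()
    if not cleaned:
        raise ValueError("user input must not be empty")
    # str.format only parses braces in the template, so braces typed by the
    # user are inserted literally rather than treated as placeholders.
    return template.format(user_input=cleaned)


def send_to_model(prompt: str) -> str:
    """Hypothetical stub standing in for a call to a foundation model."""
    # A real app would call a model provider's API here; this placeholder
    # just reports the prompt length so the sketch runs without a network.
    return f"<model response to {len(prompt)}-character prompt>"


if __name__ == "__main__":
    user_text = "  A reusable water bottle that keeps drinks cold  "
    full_prompt = wrap_user_input(user_text)
    print(full_prompt)
    print(send_to_model(full_prompt))
```

Keeping the template separate from the user's text is the point of the design: the person typing only ever sees their own input, while whoever writes the template controls the extra "juice" the model receives.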
This is going to be a tangent, but it reminds me of a Reply All episode I just listened to, where someone remembered this song from his childhood and they were trying to figure out what it was.
You've heard this episode.
If people haven't, it's one of the best...
That's the only one, but it was so famous.
Yeah, of all time.
Isn't it such a good listen?
Yes, but it reminds me of... Do you remember in the episode, they find this lady who is a music producer, but she's a music producer specifically for people who want to create music like the Barenaked Ladies? And it's like, you know, people have jobs like this. When you grow up and you're in school, they tell you, you know, you could be a doctor one day, you could be a teacher one day. They don't tell you you could be a music producer for musicians who want to sound like the Barenaked Ladies. And it makes me wonder what specific niches people are going to go into within the realm of prompt engineering, right? Maybe you specialize, as you said, in hair, maybe in hands, maybe in something for enterprise SaaS companies. I don't know. It's kind of hard to predict at this point since we're so early. But yeah, I think you're right that there's going to be a kind of bimodal nature to it. It does seem like the kind of tool that's going to be on everyone's desktop. But it also seems like there is this opportunity to become, as someone might say, a 10x prompt engineer.
Yeah, but I think that's interesting, isn't it? Because that's such a tech-world metaphor, the notion of 10x. It even implies there's a scale where you can have one and therefore you can have 10 of it. In the record industry, do people talk about being a 10x recording engineer? Obviously, some recording engineers are famous and better than others, and there's all this kind of talent. But I don't know if people are like, yeah, I'm a 10x. But yeah, just like with producers and all the people that go into making music or film, you know, that huge list of people you see at the end of every movie, where you discover a whole new world of careers you might have had. I'll unfortunately never be a best boy, but I'm still hoping to be a gaffer. I think there'll be all those kinds of jobs in the creative AI industry.
You know, your point on the spectrum of what is 1x and what is 10x: what is the most popular piece of, you could say, art or imagery that is shared online? What comes to mind for you there? I don't know. You said that as if you know the answer. Well, I have an answer. What comes to mind for you there? Photos of parties. So I don't know if this is actually the most popular, but what comes to mind for me, at least as someone who spends a lot of time on Twitter, is memes. And memes are the most basic kind of imagery ever. It's literally an image with some capitalized text on it. And your point just reminded me of this idea that art especially is subjective, and what people like and resonate with is not necessarily the most refined, extravagant, or precise type of imagery. You can generate that in some of these text-to-image tools, but it doesn't necessarily mean that people are going to resonate with it. Exactly. I mean, until they invent an AI that can do 10x memes, which is the last thing we need.
This was really fun, Guy. I loved hearing about where you see this industry and the skill set moving. We will definitely share the prompt book link in the show notes, because I think people can benefit from seeing the different types of modifiers that you can include in a prompt, and also a link to your social, because you're constantly sharing new hacks and new things that you're discovering. But yeah, are there any other places where people should look to find you or your work?
You can find me on Twitter at GuyP, G-U-Y-P, and you can find my Substack, when I finally post, at promptresponse.substack.com.
Awesome. Well, thanks for doing this.
Thank you so much for having me. It was lovely to meet you.
I'm glad we could do this.
Thanks for listening to the A16Z podcast.
If you like this episode, don't forget to subscribe, leave a review, or tell a friend.
We also recently launched on YouTube at YouTube.com slash A16Z underscore video,
where you'll find exclusive video content.
We'll see you next time.
