a16z Podcast - How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning
Episode Date: November 28, 2025

In this episode, a16z GP Martin Casado sits down with Sherwin Wu, Head of Engineering for the OpenAI Platform, to break down how OpenAI organizes its platform across models, pricing, and infrastructure, and how it is shifting from a single general-purpose model to a portfolio of specialized systems, custom fine-tuning options, and node-based agent workflows.

They get into why developers tend to stick with a trusted model family, what builds that trust, and why the industry moved past the idea of one model that can do everything. Sherwin also explains the evolution from prompt engineering to context design and how companies use OpenAI's fine-tuning and RFT APIs to shape model behavior with their own data.

Highlights from the conversation include:
• How OpenAI balances a horizontal API platform with vertical products like ChatGPT
• The evolution from Codex to the Composer model
• Why usage-based pricing works and where outcome-based pricing breaks
• What the Harmonic Labs and Rockset acquisitions added to OpenAI's agent work
• Why the new agent builder is deterministic, node based, and not free roaming

Resources:
Follow Sherwin on X: https://x.com/sherwinwu
Follow Martin on X: https://x.com/martin_casado

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see http://a16z.com/disclosures
Transcript
We want ChatGPT as a first-party app.
A first-party app is a really great way to get 800 million WAUs or whatever now.
A tenth of the globe, right?
Yeah, yeah, 10% of the globe uses it.
Every week, every week.
Even within OpenAI, the thinking was that there would be, like, one model that rules them all.
It's, like, definitely completely changed.
It's, like, increasingly clear.
There will be room for a bunch of specialized models.
There will likely be a proliferation of other types of models.
Companies just have giant treasure troves of data that they are sitting on.
The big unlock that has happened recently is with the reinforcement fine-tuning.
With that setup, we're now letting you actually run RL, which allows you to leverage your data way more.
OpenAI sells weapons to its own enemies.
Every day, thousands of startups build on OpenAI's API, many trying to compete directly with ChatGPT.
It's the ultimate platform paradox.
Enable your competitors or lose the ecosystem.
Sherwin Wu runs this high-wire act.
He leads engineering for OpenAI's developer platform, the API that powers half of Silicon Valley's AI ambitions.
Before OpenAI, he spent six years at Opendoor teaching machines to price houses, where a single wrong prediction could cost millions.
Today, Sherwin sits down with a16z general partner Martin Casado to explore something nobody expected:
that the models themselves are becoming anti-disintermediation technology.
You can't abstract them away.
And every attempt to hide them behind software fails because users already know and care which model they're using.
It's changing everything about how platforms work.
Sherwin and Martin talk about why OpenAI abandoned the dream of one model to rule them all,
how they price access to intelligence,
and why deterministic workflows might matter more than pure AI agents.
Sherwin, thanks very much for joining.
So we're being joined by Sherwin Wu.
It'd be great, actually, if you provided the long form of your background as we get into this,
just for those that may not know you.
I mean, I view Sherwin as one of the top AI thought leaders, so I've been really looking forward to this.
Yeah, yeah, thanks for having me.
I'm really excited to be on the podcast.
Yeah, so a little bit more of my background.
So maybe we can start from present and go backwards.
So I currently lead the engineering team for OpenAI's developer platform.
So the biggest product in there, of course, is the API.
Is there more to the developer platform than the API?
It's kind of assumed that it's synonymous.
Well, so I also think about other things that we put into our platform side.
So technically our government work is also like offering and deploying this in different areas.
Yeah, like I've talked about.
Oh, like so you have like a local deployment?
Yeah, yeah.
So we actually do have a local deployment at Los Alamos National Labs.
It's super cool.
I went to visit it.
It's very different than what I'm used to.
But yeah, in a classified supercomputer with our model running there.
So there's that, but like mostly at the API.
Did you go to Los Alamos?
We didn't.
Yeah, I did go to Los Alamos.
It's great.
They showed us around.
They showed us all the historic sites.
Real historic.
Yeah.
I just worked at Livermore, man.
So I've got like a...
Oh, yeah, yeah, yeah.
My first time out of college.
Right, right, right.
You sell to them next.
Yeah, well, we hope to.
Yeah.
So I work on the developer platform.
I've been working on it for around three years now.
So I joined in 2022.
I was basically hired to work on the API
product, which at the time was the only product that OpenAI
had. And I've basically just worked on it the
entire time. I've always been super interested in the developer
side and kind of like the startup story of this
technology. And so it's been really
cool to kind of see this evolve. And so
that's my time at OpenAI. Before OpenAI, I was
at Opendoor for around six years.
I was working on the pricing side. My
general background before... That's such a
dissonant jump. Yeah. Yeah. Pricing at
Opendoor to, like, running an API. It's such
a different... It's been fascinating actually for me
to see the differences between the companies.
They run so differently. They both have "Open" in the
names, so there's some overlap, but that's pretty much it. But yeah, I was there for around six years
working on the pricing team. So our team basically would run the ML models. This is actually pricing
the assets at Opendoor? Yeah, yeah, the inventory, exactly. So yeah, Opendoor would buy and sell
homes, and their main product was buying homes directly from people selling them, with all-cash offers.
And so my team was responsible for how much we would pay for them. And so it was a really fun,
like, ML challenge. It had a huge operational element to it as well, because not everything was
automated, obviously. Yeah. But it was a really fascinating technical challenge, and...
Is there any sense of that on the API side, like GPU capacity buying, or is it just totally unrelated?
On the API side, there is a small bit of, like, how we price the models, but I don't think we do anything as sophisticated as open door.
Open door is just, like, such a, like, expensive asset.
The holding costs are very expensive.
You're, like, holding onto it for, like, months at a time.
There's, like, a variability in the holding time.
And there's a long tail of potential things that could go wrong.
Long tail, yes, and, like, try to think about it from a portfolio perspective.
And, like, if one of them just, like, you're holding onto it for
two years, it blows everything, like, goes negative. So it's a very, very different
challenge. Yeah, yeah. Six years there, lots of ups and downs, saw a lot of the booms,
saw a lot of the struggles, and then we IPO'd before I left. But yeah, just in general it was a
great experience. I think for me, it also just had such a, very, like, business-operations, like,
a very by-the-book type of culture, whereas OpenAI is, like, very different.
It was so interesting. I was just thinking about it now. It's like, even for a company like that, like,
you don't think about it as a tech company, but there is a deep technology problem, it actually is one,
right? It's actually an ML problem.
Yeah, that's right. It's not, like, the website. It's not the platform. It's not the API. It's literally that.
Yep, yep, yep. And that's what attracted me to it. I think that was interesting.
It's also a way lower margin business than OpenAI because you're, like, making a tiny spread on these homes.
They would talk about, like, basis points, like eating bips for breakfast and all that.
Anyways, I was at Opendoor for around six years. And then before that was my first job out of college, which was at Quora, Adam D'Angelo's company. Yeah. So I was working on the news feed.
So worked on News Feed ranking for a bit. Worked on the product side. That was actually my first
exposure to, like, actual ML in industry, and I learned a lot from the engineers at Quora.
We basically hired a lot of the early feed engineers from Facebook.
Charlie still there when you were there?
Charlie was not there when I was there right after you.
Yeah, yeah, yeah.
And that was a really legendary team.
It's still known to be kind of this super iconic founding team.
Yeah, yeah.
The early founding team was really solid.
I think even now I'm still amazed at the quality of the talent that we had.
This was, like, when the company was, like, 50 to 100 people.
But yeah, like a bunch of the perplexity team was there.
Denis was on the feed team with me, Johnny Ho.
Jerry Ma.
Yeah, that's right.
And then Alexandr, of Scale, now MSL, you know,
he was there between high school and college.
It was an incredible team.
I think I kind of took it for granted at the time.
It was a good group.
How did you get to Quora?
What did you study in an undergrad?
Yeah, so before that I was at MIT for undergrad.
I studied computer science, did, like, one of those, like,
combined computer science bachelor's and master's degrees, kind of, like, crammed it in.
I ended up at Quora because I got into what they call an externship there.
So at MIT, you actually get January off.
So there's, like, the fall semester, and then January's off.
And then you have the spring semester.
And so it's called independent activities period.
So some people just, like, take classes.
Some people would just do nothing.
But some people will do, like, month-long internships.
And some crazy companies will offer a month-long internship to a college student.
And it really is just kind of like a way to get people.
Did you come out here from Boston?
Yeah, it was crazy.
So you had to apply.
I remember, yeah, this is, I think, 2013 January or something.
You had to apply.
And I remember the Quora externship was the one that just paid the most.
They paid, I think it was like $8,000, $9,000.
And it was like, wow, all that for a month.
And you're kind of ramping up like half the time.
I can eat for a year.
Yeah, yeah, as a college student, it's like great.
And yeah, they would kind of like fly you out here.
So I did the interviews and then luckily got an offer.
And so, yeah, came out for a January that was right when they moved into their new Mountain View office.
And I basically, yeah, honestly just ramped up for, like, two weeks and then had two weeks of good productivity working on the feed team.
So was that, like, user-facing?
Like user-facing product work, yeah.
Yeah, I distinctly remember my externship project for those two weeks was just to, like, add a couple of features to our feature store,
and that would make its way into the model.
I remember my mentor there was, is Tudor,
who's now running, I think it's called Harmonic Labs.
Yeah, yeah, yeah.
Crazy team.
That's unbelievable.
I mean, by the way, I think it's one of the untold stories of Silicon Valley's,
like how good that original Quora team ended up being.
I mean, a lot of them are still there and still good,
but the diaspora from Quora is everywhere.
Yeah, yeah.
That's actually how I ended up at OpenAI, too,
kind of fast-forwarding from there,
because OpenAI kind of kept a quiet-ish profile.
I'd always kind of kept tabs on them
because a bunch of the Quora people I knew kind of, like,
ended up there.
It's kind of like checking in on it
and they were like
something crazy is happening here
you should definitely check it out
so yeah I definitely owe a lot to Quora
but yeah part of the reason why I went there
versus other options as a new grad was
the team was just so incredible
and I just felt like I could learn a ton from them
I didn't think about everything afterwards
I was just like man if I could just absorb
some knowledge from this group of people
it would be great awesome yeah
so one place I wanted to start
is something that I find very unique about
open AI is it's both a pretty
horizontal company like it's got an API
like I would say
we've got this
massive portfolio of companies, right?
And I would say a good fraction of them use the API.
And then it's also a vertical company in that you've got full-on apps, right?
Like everybody uses chat GPT, for example.
And so you're responsible for the API and kind of the dev tool side.
So maybe just to begin with, is there an internal tension between the two?
Like, is that a discussion?
Like the API may, whatever, it may help a competitor to like the vertical version,
or is it not, things are just growing so fast, it's not an issue.
I'd just love to hear how you think about that.
By the way, it's very unusual for companies.
Yeah, doing these two things this early.
It's very unusual.
Yeah, yeah, I completely agree.
I think there is some amount of tension.
I think one thing that really helps here is Sam and Greg,
just from a founder perspective,
have since day one just been very principled
in the way in which we approach this.
They've always kind of told us: we want ChatGPT as a first-party app.
We also want the API.
And the nice thing is I think they're able to do this
because at the end of it kind of comes back to the mission of Open AI,
which is to create AGI and then to distribute the benefits as broadly as possible.
And so if you interpret this, you want it on as many surfaces as you can.
And the first-party app is a really great way to get, you know, it's like 800 million WAUs or whatever now.
800 million WAUs?
Yeah, yeah, it's pretty, it's actually mind-boggling to think about it.
I think many people listening to this don't understand how big that is.
Yeah, it's crazy, yeah.
That's got to be, like, actually historic for the time it's taken to get to 800 million.
It's historic.
It's also just like, yeah, the amount of time and just like how much we've had to scale up.
A tenth of the globe, right?
Yeah, yeah, 10% of the globe uses it.
Every week, every week.
Yeah, yeah.
And it's growing, and it's growing.
So, like, at some point, you know, it'll go even higher than that.
And so, so, yeah, like, obviously the reach there is unmatched.
But then also just, like, being able to have a platform where we can reach even more than just that.
Like, one thing we talk about internally sometimes is, like, what does our end user reach from the API?
Like, it's actually, like, really, really, it's really broad.
It might even, it's hard because ChatGPT is growing so quickly.
But, like, at some points, it was definitely larger than ChatGPT.
And the fact that we're able to tap into all of this
and get the reach that we want, I think is really good.
But yeah, I mean, there's definitely some tension sometimes.
I think it's come up in a couple places.
I think one of them is on the product side.
So as you mentioned, you know,
sometimes there are competitors kind of like building on our platform
who, you know, might not be happy if chat GPT launches something
that competes with them.
Yeah.
I mean, that's a tale as old as the cloud or operating systems or whatever.
So, like, that's, you know, I think it's more like,
does ChatGPT worry about the competitor,
you know, type thing, like, you know, you enabling a competitor?
Yeah, yeah. So, I mean, uh, the interesting thing is, like, I would say
not particularly, mostly just because we've been growing so quickly.
It's, like, just such a, you know, force right now. Yeah, yeah, growth solves so many,
so many different things. And, like, the other way we think about it is, like, everyone's kind
of building around AGI, building towards AGI. Of course there's going to be some
overlap. Yeah. So, yeah, I mean, but I would say, like, at least in my position, I feel
more of this tension from the API customers themselves.
Like, oh my gosh, you know, you're like,
are you going to build this thing that I'm working on?
Yeah, that's that story is as old as computer systems.
There's never not been a computer platform that didn't have that problem.
So, okay, so I kind of go back and forth in this one.
I want to try one out on you, which is the problem historically with, you know,
offering a core services and APIs, you can get disintermediated, right?
And so I can build on top of it, but then, you know, the user doesn't know,
like whatever, I build on top of the cloud,
but I just disintermediate the cloud
and then I can switch to another cloud or whatever.
And it occurs to me that that's kind of hard to do with these models
because the models are so hard to abstract away.
Like, they're just unruly, right?
If you try to, like, have traditional software drive them,
they just don't kind of manage very well.
So part of me thinks that it's almost like this, like,
anti-disintermediation technology
that you kind of have to expose it to the user directly.
Does that make sense?
And so I'm wondering if, even if I think chat GPT is really just trying to expose the model to the user,
the API is kind of just trying to expose the models to the user.
So I think there's almost this argument that's like if the real value is in the models,
it doesn't really matter how you get it to them because it's going to be very tough
for someone's going to abstract it away in the classic sense of computer science,
of like they don't know that they're using the model.
Like you always know you're using GPT5.
Yeah, and the interesting thing is I think like the entire industry kind of has slowly changed their mind around this too.
I think like in the beginning we kind of thought like,
Oh, these are all going to be interchangeable.
It's just like software.
Yeah, yeah, exactly.
So these have been for that you can just swap out.
Yeah.
But I think we're learning this on the product side with like, you know, the GPD5 launch and like 4-0 and like how so many people like 0-3 and 4-0 and all of that.
I felt that.
I felt that when it changed.
I'm like, I'm like, you're not as nice to me.
Like I like the validation.
Yeah.
It's actually fun because I really loved GPD5's personality.
But I think it's like the way I used, you know, chat GPT was very utilitarian.
Oh, so, yeah.
It's like, you know, mostly for work or just like information.
Yeah, I've definitely come around.
But, like, I actually felt a distance when it changed.
It's like, it's like, there's this emotional thing that goes on.
But it's almost like it's an anti-disintermediation technology.
Like, you kind of have to show this to the user.
Yeah, yeah.
And then you see a lot of, like, you know, more successful products, like Cursor, do this directly,
especially the coding products where users want more control.
We've even seen some, like, you know, more general consumer products do this.
And so it's definitely been true on the consumer side.
The interesting thing is I think it's also been true on the API side.
And that's also something that I think...
No, that's exactly what I'm saying.
So, like, the argument could be that I could use the API to disintermediate you.
But, like, you don't see that happening because it's so hard to put a layer of software
between a model and a person.
You almost have to expose the model.
Yes, yes.
And I think, if anything, I think the models are, like, almost, like, diverging in terms
of, like, what they're good at and, like, their specific use case.
And I think there's going to be more and more of this.
But, yeah, basically, it's been surprisingly hard for, or, like, the retention of people
building on our API is, like, surprisingly high.
especially when people thought
you could just kind of swap things around
you might have even tools
that help you swap things around
but yeah the stickiness
of the model itself has been
has been surprising
and do you think that is
because of a relationship
between the user and the model
or do you think it's more of a technical thing
which is like
my evals work for, like, OpenAI,
and, like, the correctness
maintains, or... Yeah, yeah, I think it's both.
so I think there's definitely an end user piece here
which is what we've heard from
some of our customers, like they just get familiar
with the model itself. But I also think
there's a technical piece, which is like the,
also as a developer, especially with startups,
you're like really going deep with these models
and like really like iterating
on it trying to get it really good
within your particular harness. You're iterating on your harness
itself. You're giving it different tools
here and there. And so you really do end up
like building a product around the model.
And so there is a technical piece where
you know, as you kind of keep building with a particular
model like GPT-5,
you're actually, like, building more around it
so that your product worked uniquely well with that model.
So I use Cursor for, like, a lot of things,
like writing blogs and, like, you know, we're investors
and I use it for sometimes for coding.
And it's remarkable how many models I use in cursor.
So, like, literally my go-to model is GPT-5.
I love GPT-5. I think it's phenomenal, like, you know.
And then, like, I use, like, max mode with GPT-5 for planning.
And then, but, you know, like, I mean,
I like the tab complete model that's in cursor
and like, you know, the new model they just dropped
is good for, like, basically, you know, some stuff.
Yeah, the Composer one. Like, yeah, the Composer one's good.
Yeah. And so like, you know...
And I think that like kind of reflects this too
because it's like, it's a particular model for each particular
use case. Yeah, yeah, yeah, yeah. Like, I've talked to a bunch of people who
use the new composer model and it's just really good for
like fast, like first pass.
Exactly, that's right. Keep you in flow kind of thing.
And then you kind of like bubble out to another
model if you want like, you know, deeper thinking or something like that.
I literally sit down with GPT-5 to help
me plan something out, and it's really good at that.
And then, you know, like when I'm coding
and I'm doing like the quick chat thing, then I'll use
Composer. And then if there's like, whatever, there's like
some crazy bug or something like that.
So, you know,
do you remember in like in the early days
of all of this where like there's going to be one model
and I mean like even
like investors, like we will never invest
in a model company because like there will
only be one model and it's going to be AGI
but like the reality it feels like there's this massive
proliferation of models like you said before.
They're doing many things. And so
maybe two questions, maybe too blunt
or too crass, but the first one is, what does that mean
for AGI? And the second
was, what does that mean for open AI?
Like, does that mean that, like,
you end up with a model portfolio?
Do you select a subset?
Do you think this all gets superseded by some God model
in the future? Like, how does that play out?
Because it's against what most people thought.
Most people thought this is all going towards
one large model that does everything.
Yeah, I think the crazy thing about all this is just, like,
how everyone's thinking has just changed over time.
Like the, I distinctly remember this, like,
and the crazy thing,
it's not that long ago. It's just, like, two or three years ago. I remember, like, even within
OpenAI, the thinking was that there would be, like, one model that rules them all. And it's like,
why would you, I mean, like, this kind of goes to the fine-tuning API product. It's like,
why would you even have a fine-tuning product? Why would you even want to, like, iterate on it?
There's going to be this one model that just subsumes everything. And that was also kind of,
that is also, like, the most simplistic, like, view of what the AGI will look like.
And, yeah, it's, like, definitely completely changed since then. But then the other
thing to keep in mind is, like, it might continue to change, like, even from where we are today.
But it's, like, becoming increasingly clear, I think, that there will be room for a bunch of
specialized models. There will likely be a proliferation of other types of models. I mean,
you see us do this with, like, the Codex model itself. We have, like, GPT-4.1 and, like,
4o and, like, 5, and all of this. And so I do think there's room for all of this.
I don't think that's bad for what's worth. Like, if anything, I think, you know, as we've tried
to move towards AGI, things have just been very
unexpected, and I think the market just evolved and the
product portfolio evolves because of that.
So I don't think it's a bad thing at all.
What I do think it means... You can easily argue it's very
good for OpenAI and very good for, like, the
model companies to, like... Yeah, because you
don't have, like, you know, winner-take-all
consolidation dynamics, right? I mean, you just
have a healthier ecosystem, a lot more solutions you can
provide a lot. Yeah. Yeah, and as
the ecosystem grows, it generally is helpful.
Like, this is one thing we actually think about a lot too is
as the general, like, AI ecosystem grows, like, OpenAI
it just stands to benefit a lot from this.
And this is also why we've, like, some of our products,
we've even started opening up to other models, right?
Like our Evals product now allows you to bring in other models.
It's all of this.
We think it's like any rising tide generally helps us here.
But yeah, I think as we move into a world where there will be a bunch of models,
this is why we've kind of invested in our model customization products,
with the fine-tuning API, with reinforcement fine-tuning, opening that up as well.
It's also part of why we open-sourced gpt-oss as well,
because we want to be able to, you know, facilitate that.
Super. I want to talk about that in just a bit because the open source is actually very interesting.
I mean, actually, I thought the open source model was great, but clearly it's something that a company has to be careful with.
But before that, I want to talk a little bit about the fine-tuning API.
So I've noticed that you are moving towards kind of more sophisticated use of things like fine-tuning,
which, you know, in a way you could read that as a bit of a capitulation that, like, you know,
there is product-specific data
and there's product-specific use cases
that a general model won't do, to your point, right?
So, like, as opposed to proliferation of model, you do that.
It seems like a lot of that data
is actually very, very valuable, right?
And so, you know, to what extent
is there, like,
interest in almost a tit for tat
where you can, like, expose,
you know, the ability to get product data
into fine-tuning, and then you also benefit
from that data because
the vendors provide it to you
versus like this is 100%
you know like they keep their own data
and there's kind of no interest in that
because it feels to me like the next level of scaling
this is kind of where we're at and so
I just kind of curious how
Yeah, so, I mean, maybe even, like, taking a step back,
the main reason why we even invested
in a fine-tuning API in the very beginning
is, one, there's been
huge demand from people to be able to customize
the models a bit more. It kind of goes into, like,
prompt engineering, and also, like, I think the industry's
mind on that as well has, like, evolved.
But the second thing is exactly what you said, which is the companies just have giant
treasure troves of data that they are sitting on that they would like to utilize in
some fashion in this AI wave.
And you can, you know, the simple thing is to put it in, like, you know, some, like, vector store, like,
do RAG with it or something.
But there's also, you know, if they have a more technical team, they do want to see
how they can use it to customize the models.
And so that is actually the main reason why we've invested in this.
The interesting thing was way back, kind of back in, like, '22,
'23, our fine-tuning offering was, I'd say, like, too limited, so it was very difficult
for people to tap into and use this data. It was just, like, a supervised fine-tuning
API. And, like, we're like, oh, you can kind of use it, but in practice, it really is only
useful for, like, it's honestly just, like, instruction following plus plus. You, like, kind of
change the tone. You're just, like, instructing it. But I think the big unlock that has happened
recently is with reinforcement fine-tuning, because with that setup, we're now letting
you actually run RL, which is more finicky, and it's, like, harder, and, you know, you
need to invest more in it. But it allows you to leverage your data way more.
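To make the contrast concrete, here is a minimal, hypothetical sketch of kicking off both kinds of jobs with OpenAI's Python SDK. The file IDs, model names, and grader config are illustrative placeholders, not details from the episode, and the exact RFT parameters may differ from what's shown.

```python
# Minimal sketch, not an exact recipe: file IDs, model names, and the grader
# config are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

# Supervised fine-tuning (SFT): good for tone, "instruction following plus plus".
sft_job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",      # hypothetical base model
    training_file="file-abc123",         # JSONL of example conversations
)

# Reinforcement fine-tuning (RFT): a grader scores sampled outputs, and the
# model is updated with RL against that reward signal.
rft_job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",          # hypothetical reasoning base model
    training_file="file-def456",         # prompts plus reference answers
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": {                  # e.g. check output against a reference
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.answer}}",
                "operation": "eq",
            },
        },
    },
)
print(sft_job.id, rft_job.id)
```

The grader is the key difference: SFT just imitates the training examples, while RFT optimizes against a score, which is why it's finicky but can push a model toward state of the art on one narrow task.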
By the way, this is just a naive question from me, which is it feels from just my understanding
from my own portfolio, it feels like there's two modalities of use. One of them is I've got
a treasure trove of data that I've had for a long time, and I create my model on that
treasure trove of data and all that happens offline and then I deploy that.
There's another one which is like I actually have the product being used in real time. I've got a
bunch of users.
Yeah.
And, like, I can actually get much closer to the user.
I can kind of A/B test and decide which data, and, like, it's kind of more of a near-real-time
thing.
Is, like, is this focus on, like, more product stuff or more...
So the dream with the fine-tuning API was that we should be able to handle both, right?
It's like, we actually had this dream, and we have this whole, like, LoRA setup
with the fine-tuning inference where we should just be able to scale to, like, millions
and millions of these fine-tuned models, which is usually what would happen if you
have, like, this online learning thing.
Exactly, yeah.
In practice, it's mostly been the former.
In practice, it's mostly been, like, the offline data that they've, like, already created,
or they are creating with experts or something and, like, using their product that they're able to use here.
But the main thing I was trying to say around the reinforcement fine-tuning API is it kind of changes the paradigm away from just, like, small incremental improvements, like, tone improvements,
which is what SFT did, to actually improving the model to potentially SOTA level on a particular use case that you know about.
Like, that's where people have really started using the reinforcement
fine-tuning API, and that's why it's gotten more uptake. Because if the discussion
is less like, hey, I can make this model, you know, not like speak in a certain way better, it's
less compelling. But if it's like, hey, for like, you know, medical insurance coding or for like
coding planning, agentic planning or something, you can create the world's best model using your
dataset with RFT. And it becomes a lot more compelling. And will you ever, like, or maybe do you,
will you ever like find ways to get access to that data? Like, you know, like, listen, if I, if I had the
data and I wanted cheap GPUs I'd trade you for it.
I don't know.
Yeah, I mean, we've talked about this and we've actually been piloting some pricing here
too where it's like, because this data is like really helpful and it's kind of hard to get.
And if you actually build with the reinforcement fine-tuning API, you can actually get
discounted inference and potentially free training too if you're willing to share the data.
It's always kind of, you know, it's up to the customer there.
But if they do, it is helpful for us and there will be benefits for the customer as well.
That's awesome.
Okay.
You said that views on prompt engineering have changed.
Yeah.
Actually, I wasn't aware of that.
Of all the things, this is one I wasn't aware of.
Yeah, I mean, I think the prevailing view, this is back in 2022.
I remember I was talking to so many people and they're basically, I mean, this is similar
to like the single model AGI view as well, which is like, like, prompt engineering is just
not going to be a thing and you're just not going to have to think about what you're putting
in the context window in the future.
Like, the model would just be good enough and it will just like know, it'll know what you need
to do.
Yeah, that's definitely not a thing.
Yeah, but like that, like, I don't know, maybe people forget it,
but like that was like a very common belief back then
because like the scaling laws
or whatever, something with the scaling laws
and, like, you'll just mind-meld with the model
and like you're just like prompting
and like instruction following will be so good
that you won't really need it.
And if anything like, yeah, it's like clearly been wrong.
Yeah, yeah, yeah.
But it is interesting because I think it's a slightly different world
that we're in now where the models have gotten
really, really good at instruction following
relative to, you know, like, GPT-3.5 or something.
Yeah. But I think the name of the game now
is less on, like, prompt
engineering as we had thought about it two years ago.
It's more of like, it's like the context engineering side
where it's like, what are the tools you give it?
What is like the data that it pulls in?
When does it pull in the right data?
Well, this is very interesting.
I mean, I mean, to reduce it to like an almost absurdly simplistic level.
Like the weird thing about rag, for example, the classic use of rag is like you're using like
cosine similarity to choose something that you're going to feed into a superintelligence.
So like, you know, you're like, I'm like, I'm going to like randomly grab this thing
based on, like, a fucking embedding
space. It doesn't really, you know... And, like, and then, you know, you want the super
intelligence to decide the thing to do. And so pushing intelligence into that retrieval
clearly is something that makes a lot of sense. It's almost like pushing the intelligence out
in a way. Exactly. And, to be fair, I think, like, RAG was kind of introduced when the
models were, like, it's like pre-reasoning models. So you only had kind of, like, one
shot to, like, do this, and it wasn't that smart. But now that we do have the reasoning models, now that
we have, I mean, one of my favorite models is actually o3, because it was, like,
one of the most diligent models. I use o3.
It would just, like, do all these tool calls.
And it's, like, really the intelligence itself
trying to, like, do the, you know,
tool calls or RAG or anything like that,
or write the code to execute.
And so the paradigm has shifted there.
But, yeah, because of that, I think,
like, context engineering, prompt engineering,
what you put, what you give the model is, like,
extra important.
Yeah, yeah.
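As a concrete illustration of the point about classic RAG, here is a toy sketch of the cosine-similarity retrieval step being described; the embed() function and corpus are hypothetical stand-ins, not anything from the episode.

```python
# Toy sketch of classic RAG retrieval: rank chunks by cosine similarity in
# embedding space, then paste the winners into the context window. embed()
# and the corpus are hypothetical stand-ins.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

corpus = ["chunk about pricing", "chunk about LoRA", "chunk about evals"]
corpus_vecs = np.stack([embed(c) for c in corpus])

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = corpus_vecs @ q             # cosine similarity (unit-norm vectors)
    top = np.argsort(scores)[::-1][:k]   # highest-similarity chunks first
    return [corpus[i] for i in top]

# The model only ever sees whatever this one similarity lookup happened to pick.
print(retrieve("how do we price fine-tuned models?"))
```

The contrast with the o3-style pattern is that there, retrieval becomes a tool the reasoning model calls as many times as it needs, instead of a single similarity lookup done before the model sees anything.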
Okay, so you have API,
so the API, which is horizontal.
You've got chat GPT and other products,
which are vertical.
We haven't even talked about pixels.
This is all just language.
Are agents a new modality?
Is that something else?
Like, you know, like a codex or...
What do you mean by modality here?
Like, I mean, they feel both vertical and horizontal to me in a way.
Like, to me, ChatGPT is a product, right?
It's like, it's a product and, like, my mom uses it, right?
Yep.
And an API is a dev thing.
You kind of give it to a developer.
And, like, an agent is kind of somewhere in between to me.
It's like, is it a product?
Is it, like, it is horizontal?
Like, you know, how is it handled internally?
Is it a totally separate team that does agents?
Or no? So, it's, um, yeah, it's interesting, because, like, I think the way that
you framed it just now almost seemed like agents was, like, this singular concept that, like, you know,
might, like, might have its own particular... Maybe a better question is, what is an agent to you?
Yeah, yeah, yeah. It's like, even getting the language right is, like, important for this conversation. Yeah.
So, I actually don't even know. It'd be helpful for me to share my general take on agents, which is it's
an AI that will take actions on your behalf, that can work over long time
horizons. And I think that's the, that's the
pretty general... Pretty utilitarian. Yeah, yeah.
But like, if you think about it that way, yeah, I mean,
maybe this is what you mean by modality, but it is just
a, like, way of, like,
using AI, and it is a, I guess it could be viewed as a
modality, but we don't view it as, like, a separate
thing, separate from
AI, and... Let me just try
and kind of, you know, give you a sense
of where this question's coming from. Like, I know how to build
a product, like, and we know how to do go-to-market
for products. We know how to do, like,
you know, we know the implications
of turning them into platforms.
Like, it's just, we've been doing this for a very long time, right?
We know how to do the same thing for APIs, right?
We know how to do billing, we know, like, the tension of, like, people build on top of it
and all of that stuff.
And, like, what I've been trying to, and this is just maybe a personal inquiry,
it's just not clear for me for an agent if you, if it sits in one of those two camps,
is it more like the product camp?
Is it more like the, or, because it's kind of both.
Like, I could, like, literally give you Codex.
Yeah, yeah.
And, like, as a user, and then you just talk to it, or I could, like, build in a way, kind of embed it in, like, my app.
And so, like, but then that means something to you as far as, like, you know, how do you price it and what does it mean for ecosystem?
Like, for example, like, would you be fine if I started a company and just, like, built it around Codex?
Is that a thing?
Starting a company and building it around Codex, yeah, yeah.
I actually think that would be great.
Like, it's a, we, like, release, like, the Codex SDK and we, like, want people to be able to build it and hack on it.
Yeah.
Actually, I think this might be what you're.
getting at, which is, and this is, like, kind of a unique thing about OpenAI and kind of reflects
on how it's run, which is, at the end of the day, OpenAI is, like, an AGI company.
It's, like, an intelligence company. And so agents are just, like, one way in which this intelligence
can be manifested. And so the way that I'd say we actually think about it internally is all of our
different product lines, Sora, Codex, the API, ChatGPT, are just different interfaces and different ways
of deploying this. And so there's no, like, single team that is, you know,
like thinking about agents.
I would say the way that it manifests itself is more
like each product area thinks about, like, you know,
this intelligence is actually turning into a form
where, like, agentic behavior is more possible.
What would that look like in a first-party product like ChatGPT?
What would that look like?
This is actually why Codex ended up becoming its own products.
Like, what would it look like in a coding style product?
Yeah.
Like, we explored it in ChatGPT, it kind of worked there,
but, like, actually the CLI interface actually makes a lot more sense.
That's another interface to deploy it.
And then if you look at the API itself,
it's like, this is another interface to deploy
it. You're thinking about it in a slightly different way because it's a developer-first mindset,
we're helping other people build with it. The pricing is slightly different. But it's all these
different manifestations of this core, like, intelligence that is the agentic behavior.
It is so remarkable how much of this entire economy is basically just token laundering.
It's literally like anything I can do to get like English in or like a natural language in
and then like, you know, the intelligence is out.
Yeah. And I mean, and it's because these things are so resistant to layering.
It's so hard to layer over a language model.
Like, you know, like, I could even do it pretty easily with, like, Codex.
I could just, like, use it, you know, as a component of a program
and just, you know, basically launder intelligence.
I mean, of course, you know, I'd be charged to do that.
So I actually, my view of this,
and having seen now so many kind of launches of different products,
I've seen agent launches and the definition that you have,
I've definitely seen APIs, and I've seen products on these.
It's like they're actually quite different than, like,
what we're used to. Like, the COGS is different,
the defensibility is different, like, all of it. So we're kind of
rewriting it. And so it's kind of like,
you know, you came from a kind of pricing background,
I mean, you were working on a model for pricing, now you have the
API. So I'd just love your thoughts on, like,
I mean, how have you evolved your thinking, and how do you
price these, you know, access to intelligence,
where, you know, you don't know how many people can use it?
Is it almost certainly usage-based billing, not something else?
Like, can you talk just a bit about, like, philosophy around pricing on these things?
Is it different for product-first API?
Yeah, I think the honest truth there is, like, it's evolved over time as well.
And, like, I actually think the simplest, like, the reason why we've done usage-based pricing on the API, honestly, is because it's been, like, it's closest to how it's actually being used.
And so that's kind of how we started.
I actually think usage-based pricing on the API has, like, surprisingly held strong.
And, like, I actually think this might be something that we'll keep doing for quite
a long time, mostly because
I don't know how you don't
do usage-based. Yeah, yeah, yeah, yeah. I just don't know
how else you would. Yeah, and then there's
also the strategy of, like, how we price it, and
internally one thing we do is
we always make sure that we actually price
our usage-based pricing from a, like, cost-plus
perspective. Like, we're actually just, like,
trying to make sure that we're being responsible
from a margin perspective.
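For intuition, usage-based pricing in this style reduces to metering tokens and multiplying by per-token rates, with a cost-plus margin layered on top. Here is a toy sketch; every number in it is invented for illustration and has nothing to do with OpenAI's actual rates, costs, or margins.

```python
# Toy sketch of usage-based, cost-plus pricing. All numbers are invented for
# illustration; they are not OpenAI's actual rates or costs.
from dataclasses import dataclass

@dataclass
class ModelRates:
    input_per_1m: float   # dollars per 1M input tokens
    output_per_1m: float  # dollars per 1M output tokens

def price_request(rates: ModelRates, input_tokens: int, output_tokens: int) -> float:
    """Meter one request: tokens in and out, times the per-token rates."""
    return (input_tokens / 1e6) * rates.input_per_1m + \
           (output_tokens / 1e6) * rates.output_per_1m

# Cost-plus: start from an estimated serving cost per token, apply a margin.
est_cost = ModelRates(input_per_1m=0.80, output_per_1m=3.20)  # hypothetical
margin = 1.25                                                  # hypothetical
list_price = ModelRates(est_cost.input_per_1m * margin,
                        est_cost.output_per_1m * margin)

print(price_request(list_price, input_tokens=12_000, output_tokens=800))
```

Note how test-time compute shows up naturally here: a request that thinks longer emits more tokens and costs more, which is part of why usage-based billing ends up loosely tracking outcomes, as they discuss later in the conversation.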
By the way, this is a huge shift in the industry
in general just because like I remember the shift
from on-prem to recurring.
Yeah. That was a big, big deal. Like that
created Zuora, like, created a whole company.
There were, like, whole books on it and, like,
a bunch of consultants on how you do this. It changed.
Yeah.
You know, and, like, I think the shift to usage is as big or bigger.
And it's also even a really hard technical problem.
Yeah.
Like, I can't even imagine 800 million WAUs.
Like, how do you build?
Yeah, yeah.
Well, 800 million WAUs is a little easier because it's, it's not usage-based pricing.
It's subscription.
So it's like, that's like, that's way easier.
But, I mean, there's still, like, a lot of users on the API that we need
to, like, you know, manage on the billing side.
There's some, like, overages or stuff
you've got to deal with on that, or?
What do you mean by overages?
Like, I don't know.
I guess I don't know.
I don't know.
Well, kind of, like, you know,
there are, like, max quotas that we don't let people go over.
But, like, in practice,
these quotas are, like, pretty massive.
And that would literally be, like,
one of the most complex systems
somebody's ever built if you did usage-based
at that scale.
I mean, these are very, very, very, very...
And, like, you have to be correct.
Like, these are very hard systems to scale.
Yep, yep, yeah.
Yeah, I mean, we have a whole team thinking about this now, internally.
Yeah, I mean, users for pricing is also interesting.
So there's, we acquired this company called Rockset a while ago, its founder, his name is Venkat.
He's been here for a while now.
Venkat's incredible.
He's one of the best. Like, Venkat, if you're listening, we're a huge fan.
I'm a huge fan.
He's going to love this.
Yeah, he's great, man.
He's a legend.
Anyways, I was talking to him about pricing as well.
And his, his take is that pricing is kind of like a one-way ratchet.
And, like, basically once you get a taste of usage-based pricing,
you're never going to go back to, like, the per-deployment type pricing.
And I think it's definitely true,
and I think it's just because it gets closer and closer
to, like, your true utility you're getting out of the thing.
The main pain point is, like, you have to maintain all this infra,
yeah, to, like, get it to work well. But if you do have it,
he thinks it's, like, a one-way ratchet where, like, there's just, like, no
going back. And then, and I think the hot new thing now is, like,
oh, with AI you can now kind of measure, like, outcomes,
and so that's, like, another, you know, like, step forward.
And if that works, like, maybe it's a one-way ratchet to
that too. So we've thought about that.
Is there some type of outcome-based pricing?
This is more on the first-party side. On an API,
it's kind of hard to measure that.
That's very hard. I mean, that's hard because you end up
having to price and value non-computer science infrastructure,
right? Like, you're literally going into verticalization now.
Yep. You're like, I mean, listen, if it's like porting a code base,
maybe you'd have some expertise, but if it's like,
whatever, like increasing crop yield.
Yeah, like, at some level you need to, like...
But there could be a world,
where the AI is like, you know, where it can actually, you know,
make judgments of these and do it in an accurate enough way
where you can tie it to billing.
I think this is a problem with AI conversations
because, like, at any point in time, you're like,
but it could get good at...
Yeah, yeah, yeah.
It's not a problem anymore.
Yeah, yeah, at some point it'll be solved.
It's so much like the prompt engineering
and the single AGI thing, I think, from before.
Yeah, yeah, yeah.
Yeah, it's like, when you reach that level of...
When you push it that far, everything's kind of solved.
On outcome-based pricing, it sounds very appealing.
Like, if it can work and it can work.
But one thing that,
we've started realizing is it actually ends up correlating quite a bit with
usage-based pricing, especially with test-time compute.
Like, if the thing is just like thinking quite a bit, like, actually, you know,
if you charge just by usage-based and not outcome-based, you're, like, basically
approximating outcome-based at this point.
If the thing is, like, thinking for, like, so long, it's, like, highly correlated with what
it's doing.
It's just adding more value.
Yeah, yeah, exactly, exactly.
And so, like, maybe at the end of the day, like, usage-based pricing is all you need,
and it's like, we're just going to, like, you know, live in this world forever.
but yeah I don't know
it's constantly evolving
I think our thinking has evolved here as well
I personally am like keeping track of
if the outcome based pricing setups can actually work here
but at least on the API side I think
it's such a usage based setup
we have to get infrastructure around this
and so I think we'll probably stay with that for a while
So how do you think about open source?
I mean, you know, I think you're the only
big lab that's releasing open source, is that...
No, Google has some of theirs.
Yeah, it's mostly smaller models on their end.
Yeah, yeah, yeah.
Yeah.
So how do you think about open source vis-a-vis, you know, competition, cannibalization?
You know, like what's the strategic, what's the complexity?
Yeah.
Yeah.
So I personally love open source.
Like I think it's great that there's a...
All of us grew up with it, right?
Yeah, all of us grew up with it.
Like the internet wouldn't exist without it.
Like, you know, so much of the world was built on top of it.
Cloud wouldn't exist without it.
Yeah.
Nothing would exist without it, except for maybe Windows.
And so it was interesting because, like, you know, it was interesting because, like,
I felt like over the last few years, before we launched the open source model,
I know Sam feels this way as well,
it's like there's this, like, weird, like, you know,
uh, mindset where, because OpenAI hadn't launched anything,
it just seemed like OpenAI was, like, super anti open source.
Um, but I'd actually been having conversations with Sam ever since I joined about open sourcing a model.
We were just trying to think about like, how can we sequence it?
Yeah.
And compute is always a hard thing.
It's like, do we have the compute to kind of like train this thing?
So we've always wanted to kind of do this.
I'm really glad that we were able to finally do it.
I think it was earlier this year?
I had lost sense of time.
AI time is so great.
Yeah, was it last year or, no, it was this year, yeah, when gpt-oss came out.
And so I was just really glad that we did that.
The way that I generally think about it is one, I think as a,
this is also particularly true for Open AI,
because as you said, we are a vertical and a horizontal company.
It's like we want to continue investing in the ecosystem.
And just from like brand perspective, I think it's good.
But then also, I think from Open AI's perspective,
if the AI ecosystem grows more and more,
it's like a rising tide raises all ships.
And it's all like really helpful for us.
And if we can launch an open source model
and it helps like unlock a whole bunch of other use cases
in other industries, I think that's, you know,
that's actually good for us.
Also what people talk about a lot
is like how well these open source AI business models
actually work because like this is very like,
like the cannibalization risk is actually very low.
Yeah.
And, like, you don't really enable competitors a lot
because, I mean, when we say open source,
you really mean open weights, right?
It's not like they can recreate it, right?
You know?
And, like, if I can distill your API as well
as I can distill, like, you give me the weights in some way,
like, and so, like, it doesn't really change that dynamic a lot.
But, yeah, I mean, to be clear, like,
we have not seen cannibalization at all from the open source models.
It seems like a very different set of use cases.
The customers tend to be, like, slightly different.
The use cases are very different.
And, by the way, it turns out inference is super hard.
To actually have, like, scalable, fast performance, that's a hard, hard problem. Yeah. So, like, I'd say the way
that I personally think about open source in relation to the API business in particular is, uh, well, one,
it hasn't shown cannibalization risks, so, you know, I'm not particularly worried about that. Yeah. But also, like,
especially for all these major labs, like, there are usually, like, two or three models where, like, that is
where you're making all of your impact, all of your revenue. Yeah. And those are the ones where we're
throwing a bunch of resources into improving the model, and these tend to be the larger ones that are, like,
extremely hard to inference. Yeah. We have a really cracked inference team at
OpenAI, and my sense is, like, even if we just, like, you know, open sourced them, like, if we just
literally open sourced GPT-5 or something, yeah, it would be really, really hard to inference it at the
level that we are able to get it, um, to do. There's also, by the way, like, a feedback loop between the
inference team and, like, the training team too, so, like, we can kind of, like, optimize all of that.
Can you, like, is it possible to verticalize models for products? Like, train models specifically
for products? Yeah, I mean, actually, yeah, uh, I think, I mean, we've kind of done this with GPT-5-Codex, right?
Or do you mean like even more verticalization?
I mean like deep, deep, deep verticalization
where like, you know, like the released model wouldn't, you know,
it's like actually part of a product.
I think we're like basically starting to move in that direction.
I think there's a question of how deeply you verticalize it.
I think most of what we've done is mostly at like the post training,
like the tool use level.
Like Codex is particularly good at using the, sorry,
GPT-5-Codex is particularly good at using the Codex harness.
But there's, like, even deeper verticalization you can do than that, and that one I think is more
of an open question. Yeah. So, like, a lot of my, I mean, a lot of my mental model, this comes from the
pixel space, which is, like, you, you know, um, you can LoRA a bunch of image models, right, and you can
do a bunch of stuff to make them better and more suitable for some products, for example. Um, but, like,
these open source models are really, really good, and, like, you would believe that you
could, like, verticalize a model for, like, editing or cut-and-paste or this or that, you know, like,
that's actually possible, but you actually don't see that happen. Yeah, it's almost always, like,
you're just kind of exposing, like, a model, not something, like, specific to a product. Yeah, I think, I think,
so, I think there's a distinction to be made between, like, the image model space and the text
model space. Yeah. Also because the image models tend to be way smaller, and, like, you can iterate on them
a lot faster. Like, that's why you get that crazy, cool proliferation on, like, the image model side.
Whereas, like, I don't know, for the text models there's
always going to be this, like, really big pre-training step that, like, you have to invest in.
And then even the post-training side is, like, you know, it's not, like, the, it's not, like, the easiest thing.
Like, it's, you know, we, like, just from a compute perspective, obviously it's much smaller,
but, like, it's still pretty heavy to do, like, a full mid-train or, like, a post-training run.
Yeah.
And so, I actually think, like, that's one of the bigger bottlenecks.
Because I think you're, you're, you are right that, like, on the image side.
Yeah, you can, like, fine-tune, like, an image diffusion model to be, like, extremely good at, like, editing faces.
Yeah, like, something very specific.
And then you build a product around that.
And it's like, yeah, you can just kind of put all these resources
and iterate on that one specific model,
whereas it's a much heavier motion.
It seems like that on the tech side.
I've got to say, it is a bit of an anti-pattern to do both, like,
language-based models and diffusion, like, pixel models, in the same company.
Most that have tried have, like, found it very clunky to do.
But, I mean, you and Google are the two kind of counter examples for this.
And so, like, is it possible to even, like,
merge the infrastructures on these things?
Like, I mean, is it totally different orgs?
Is it shared infrastructure?
Like, yeah, how do you operationalize?
Yeah.
I think you're totally right.
It's an anti-pattern.
It's pretty tough to pull off.
I think, honestly, like, props to Mark on our research team for, like, you know,
structuring things in a way we're able to do it.
From my perspective, I think the biggest thing is, I think our, like, image, like our,
I think it's called, like, the World Simulation team, like the team that builds Sora and all
that under Aditya, is
just extremely solid. Like they are probably, it's like the highest
concentration of like talent that I've seen in a while. But is it the same
is it the same? Is it like, are they like totally separate infrastructure? Do they use
the same infrastructure? Yeah, yeah, yeah. So it's actually like pretty separate. So
and I think that's part of the reason why we're able to kind of do this. Well, it's like,
one is like the team needs to be extremely strong, which which they are. And then two is
they're, they're, they're, they're, they're, they're, they're, they're, they're,
thinking about their own particular roadmap. They think about productization very
separately as well, right? Which is how like the SOAR app kind of came
came out of that as well.
And then even the inference stacks are somewhat different. They own a lot more around their inference stack, and they optimize it pretty separately. So I think that contributes to helping us run things in parallel, but it's pretty hard to pull off, for sure.
Maybe you can educate me on this. I think about the APIs from OpenAI as mostly text-based. Do you do actual pixel-based stuff?
Yeah, yeah, we do.
We have a bunch. So DALL-E 2 is in the API.
The OG model.
Yeah, the OG. That was like the first real text-to-image model.
Right, yeah.
That was actually the model that got me to go to OpenAI, because it was the summer when I was thinking about something new, and that's when DALL-E 2 came out. It just completely blew my mind.
Wow.
And I distinctly remember asking it to do the simplest thing, like draw a picture of a duck or something, which is the simplest thing now. And it just generated a picture of, you know, a white duck. That was actually the thing that got me to OpenAI in the first place.
But yeah, we have a bunch in our API: the image gen model is in there, and then Sora 2 is in our API. We launched it at Dev Day, and it's actually been a huge hit. I've been very, very surprised. We need more GPUs for that. But the amount of use cases...
And then from your standpoint, you can converge the API infrastructure, probably?
Yeah, I'd say on the API side, a lot of the infrastructure is shared for those. But once you reach the inference level, they're separate, right? Because you've got to run inference on them differently, and it's that team that has been really laser-focused on making that side particularly efficient and work well, separate from the text models. But yeah, we have image gen, we have video gen, and we'll continue adding more to the API there.
So it feels like we've been evolving our thinking as an industry on a bunch of stuff, right? One of them for sure is the models, like we've talked about. Another one is context engineering. And it seems to me that how you actually build agents and expose them has evolved too, so maybe you can talk a bit about that.
Yeah. So at Dev Day this year, when we launched our Agent Builder, I got a bunch of questions around this, because the Agent Builder is a bunch of nodes, it's this deterministic thing, and people asked, is this really the future of agents? We obviously put a lot of thought into this when we were building that product.
Did the questions come from a point of it being constrained? Like, oh, this is too constraining?
Yeah, I think people are like, it's too constraining, it's not AGI-forward. You know, at the end of the day, AGI will do everything, so why have nodes in this node-builder thing? Just tell it what to do.
And so I think there are two things at play here. One is that there's a practicality component. The other is that there are actually different types of work out there that could be automated into agents. On the practicality side: maybe in some future world, instruction following will be so good that you just ask the model to do a four-step process and it always does that four-step process exactly. We're still not there yet. And in the meantime, this entire industry is being born, and a lot of people still want to use these models, so what can you build for them? So there's a practicality component to it.
When did you launch that?
Dev Day, so it feels like forever ago, but it was earlier this month, October. It was like October 6th or something.
Yeah, so less than a month ago. Okay.
It's been crazy seeing the reception to it, by the way. I think the video where Christina on my team demos Agent Builder is one of the most viewed videos on our YouTube channel now.
I will say, just anecdotally from my perspective, people love it. But I also saw the dissonance too. When it came out, people were like, wait, what is this?
Yeah, exactly.
It's another low-code thing.
Yeah, exactly.
And now people love it.
Yeah, yeah.
Yeah, so there's the practicality piece. There's another piece, which is that when we were talking to our customers, we realized that, at the end of the day, a lot of this agent work is just trying to automate what people do in their day-to-day jobs. And there are actually two different types of work. There's the work that we think about, which is maybe what software engineers do: it's very undirected, there's a high-level goal, and then you have your Cursor and you're writing code, exploring things, and working toward an objective. That's more knowledge-based work; data analysis is maybe like that, and coding is kind of like this. But then there's another type of work, which we realized is maybe even more prevalent in industry than software, we're just not aware of it, where the work tends to be very procedural, very SOP-oriented.
Customer support is a good example of this. In customer support, there's a very clear policy that these agents and people have to follow, and it's actually not great for them to deviate from it and try something else. The people running these teams just really want these SOPs to be followed. And this pattern actually generalizes to a ton of different work.
A standard operating procedure.
Yeah, sorry, a standard operating procedure. It's the way in which you need to operate the support team. But this extends to marketing, this extends to sales, it extends to way more than it has any right to. And what we realized is that there's a huge need on that side to have determinism, for which an agent builder with nodes that helps enforce these things ends up being very, very helpful. But I think a lot of us, especially in Silicon Valley, don't really appreciate that there's a ton of work that actually falls into this camp.
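(As a concrete aside: a minimal sketch of what a deterministic, node-based workflow like this can look like in plain Python. The node names and the refund rule are invented for illustration; this is not OpenAI's Agent Builder implementation.)

```python
# Minimal sketch of a deterministic, node-based agent workflow: each node
# is an explicit step, the edges are fixed, and any model call happens
# inside a node rather than steering the graph itself.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], str]  # mutates state, returns the next node name

def classify(state: dict) -> str:
    # A model call could do this routing; the graph itself stays fixed.
    return "refund" if "refund" in state["message"].lower() else "faq"

def refund(state: dict) -> str:
    # The SOP is enforced in code: large refunds always escalate.
    if state["amount"] > 100:
        state["reply"] = "Escalating to a human agent."
    else:
        state["reply"] = "Refund issued."
    return "done"

def faq(state: dict) -> str:
    state["reply"] = "Here is a link to our FAQ."
    return "done"

NODES = {"classify": Node("classify", classify),
         "refund": Node("refund", refund),
         "faq": Node("faq", faq)}

def run_workflow(state: dict) -> dict:
    current = "classify"
    while current != "done":
        current = NODES[current].run(state)
    return state

print(run_workflow({"message": "I want a refund", "amount": 250})["reply"])
# -> "Escalating to a human agent."
```

Because the graph and the SOP rules live in ordinary code, the workflow behaves the same way every run, even when a model does the classification inside a node.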
I've got to say, there's a pattern similar to this that I've seen, where some regulated industries actually can't let any generated content go to a user.
Yeah, right?
And so what they do, and I think it's so interesting, is they'll pass in something like a conversation tree, and the model can choose something from it.
Yeah, so there's some human element to it.
So as part of the prompt, they're like: here are the viable things you can say, choose which one to say. The language reasoning has happened in the model, but nothing generated comes out.
Interesting, interesting.
Does that make sense?
Yeah, yeah, yeah, yeah.
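(As a concrete aside: a minimal sketch of the response-catalog pattern being described, using the OpenAI Python SDK. The model name, catalog entries, and fallback behavior are all illustrative assumptions.)

```python
# Minimal sketch: the model reasons over the user's message but may only
# select from pre-approved replies, so no generated text reaches the user.
from openai import OpenAI

client = OpenAI()

APPROVED = [
    "I can help you check the status of your claim.",
    "I'm not able to help with that; let me connect you to an agent.",
    "Your request has been received and is under review.",
]

def constrained_reply(user_message: str) -> str:
    catalog = "\n".join(f"{i}: {r}" for i, r in enumerate(APPROVED))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer ONLY with the number of the most appropriate "
                    f"approved response:\n{catalog}"
                ),
            },
            {"role": "user", "content": user_message},
        ],
    )
    choice = resp.choices[0].message.content.strip()
    # Validate the selection; anything off-catalog falls back safely,
    # so nothing model-generated is ever shown to the user.
    if choice.isdigit() and int(choice) < len(APPROVED):
        return APPROVED[int(choice)]
    return APPROVED[1]
```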
And then another one I've seen is actual pseudocode, like a Python function.
And then it'll ask a human to use the pseudocode to write actual code that makes it in?
It actually has a response catalog as part of it, and it has the logic to apply. So the model takes the language in from the human user, and then the logic of how to respond is in Python code, because it just turns out there's been a lot of code written for these types of things, and then it actually includes the responses that you would send out. Does that make sense?
Actually, a lot of NPCs are done this way, like interesting video game NPCs. Because the way that I think about it is, you know...
So with the NPCs, the actual code being generated by the model is not what ends up making it to the end user?
It's not that the code is being generated by the model. The prompt has the code.
So let's say I have an NPC, and let's say you're the gamer. So you come and you talk to my NPC, but my NPC has some logic it needs to follow. Like, if you say a certain thing, I'll give you a key, or maybe a little barter. Describing the game logic in English just doesn't work, actually, if you try to do it. And then scripting the output doesn't work either if you need to use it in a game context, because you would have to give a specific direction or a specific this or that.
Yeah, yeah.
So how do you make these things behave in a more constrained way? People pass in functions. They'll actually describe the logic in Python. So my prompt will be: you're an NPC in a video game, the user just asked you a question, here's the logic you should go through. If the user says this, then do this. It's like the pseudocode: if the user has this in their belt, do this, whatever, whatever. And then here is the set of valid responses. So you're almost constraining it, and when it actually does give a response, you can validate that it's one of those responses.
I see. It's highly structured.
Yeah, yeah.
So the NPC still only exists in, like, the space that it can act in is still only within the space of the program?
Yeah, well, the logic is in there.
So it can have a normal conversation, but only inasmuch as you're trying to guide the logic for game design or game logic.
I see.
So you see this with NPCs, but you also see this with regulated industries, where I literally can't have it generate freely...
Yeah, I was going to say, what you described kind of sounds like giving the SOPs to your set of human operators and telling them to stick to them, please.
Yeah: you must say these three things, here's the discussion, and you cannot give a refund if it's less than this amount.
Yeah, yeah.
I mean, yeah. I don't want to equate them to NPCs, but this is similar. I'm just saying, if you want to really guarantee what happens, there's a set of techniques that you use. And there are situations where you want to constrain what these things do. It could be from a regulatory standpoint. It could be because you want it to run for a long time. And it could also be because I actually have game logic, and my game logic is a traditional program: I have a monetary system, an item system, a battle system. You can't describe that in English. You have to give it to the model so it can behave within that.
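(As a concrete aside: a minimal sketch of the NPC pattern Martin describes, with the game logic expressed as pseudocode inside the prompt plus a fixed response set that the output is validated against. All names, logic, and responses here are invented for illustration.)

```python
# Minimal sketch: the prompt carries the game logic as pseudocode plus a
# fixed set of valid responses, and whatever the model returns is
# validated against that set before anything reaches the player.
RESPONSES = {
    "give_key": "You've earned this. Take the key to the north gate.",
    "offer_barter": "Fifty gold and the lantern is yours.",
    "small_talk": "Strange weather on the moors tonight, traveler.",
}

GAME_LOGIC = """
if the player mentions the password and has the old map: answer give_key
elif the player's gold >= 50:                            answer offer_barter
else:                                                    answer small_talk
"""

PROMPT = (
    "You are an NPC in a video game. The player just spoke to you.\n"
    "Follow this logic exactly and reply with a response key only:\n"
    f"{GAME_LOGIC}\n"
    f"Valid keys: {sorted(RESPONSES)}"
)

def validated_reply(raw_model_output: str) -> str:
    key = raw_model_output.strip().strip('"').lower()
    # Constrain the NPC to the scripted catalog; never ship free-form text.
    return RESPONSES.get(key, RESPONSES["small_talk"])
```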
Yes, and that is exactly the problem I think we were trying to solve here, right? If you do not give it any of this, it can just kind of go off and do whatever, and yet there are regulatory concerns around this. That is the exact use case I think we're trying to target with the Agent Builder.
That's awesome.
Well, listen, we're running out of time, man. There are a million more things I want to ask you. But listen, I really appreciate you taking the time to come in. It was a great kind of survey of what's going on, and particularly teasing apart horizontal versus vertical in this space, which I really wanted to do. So thank you so much.
Yeah, thank you.
Thanks for listening to this episode of the A16Z podcast.
If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family.
For more episodes, go to YouTube, Apple Podcasts, and Spotify, follow us on X at a16z, and subscribe to our Substack at a16z.com.
Thanks again for listening, and I'll see you in the next episode.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.
Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
Thank you.
