The a16z Show - How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning

Starting point is 00:00:00 We want ChatGPT as a first-party app. A first-party app is a really great way to get, like, 800 million WAUs or whatever now. A tenth of the globe, right? Yeah, yeah, 10% of the globe uses it. Every week. Every week. Yeah, even within OpenAI, the thinking was that there would be, like, one model that rules them all.

Starting point is 00:00:14 It's like definitely completely changed. It's becoming increasingly clear there will be room for a bunch of specialized models. There will likely be a proliferation of other types of models. Companies just have giant treasure troves of data that they are sitting on. The big unlock that has happened recently is with the reinforcement fine-tuning. With that setup,

Starting point is 00:00:30 We're now letting you actually run RL, which allows you to leverage your data way more. OpenAI sells weapons to its own enemies. Every day, thousands of startups build on OpenAI's API, many trying to compete directly with ChatGPT. It's the ultimate platform paradox. Enable your competitors or lose the ecosystem. Sherwin Wu runs this high-wire act. He leads engineering for OpenAI's developer platform, the API that powers half of Silicon Valley's AI ambitions. Before OpenAI, he spent six years at

Starting point is 00:01:00 Opendoor, teaching machines to price houses, where a single wrong prediction could cost millions. Today, Sherwin sits down with a16z general partner Martin Casado to explore something nobody expected: that the models themselves are becoming anti-disintermediation technology. You can't abstract them away. And every attempt to hide them behind software fails because users already know and care which model they're using. It's changing everything about how platforms work. Sherwin and Martin talk about why OpenAI abandoned the dream of one model to rule them all, how they price access to intelligence, and why deterministic workflows might matter

Starting point is 00:01:35 more than pure AI agents. Sherwin, thanks very much for joining. So we're being joined by Sherwin Wu. It'd be great, actually, if you provided the long form of your background as we get into this, just for those that may not know you. I mean, I view Sherwin as one of the top AI thought leaders, so I've been really looking forward to this. Yeah, yeah.

Starting point is 00:01:52 Thanks for having me. I'm really excited to be on the podcast. Yeah, so a little bit more of my background. So maybe we can start from present day and go backwards. So I currently lead the engineering team for OpenAI's developer platform. So the biggest product in there, of course, is the API. Is there more to the developer platform than the API? I kind of assumed that it's synonymous.

Starting point is 00:02:07 Well, so I also think about other things that we put into our platform side. So technically our government work is also like offering and deploying this in different areas. Yeah, like I've talked about. Oh, like so you have like a local deployment? Yeah, yeah. So we actually do have a local deployment at Los Alamos National Labs. It's super cool. I went to visit it.

Starting point is 00:02:23 It's very different than what I'm used to. But yeah, in a classified supercomputer with our model running there. So there's that. But like mostly the API. Did you go to Los Alamos? We didn't. Yeah, I did go to Los Alamos. It's great. They showed us around. They showed us the historic sites. Real history. Yeah. I just worked at Livermore, man. So I've got like a Oh, yeah, yeah, yeah. Yeah, yeah. I'm first time out of college. So you saw them next.

Starting point is 00:02:44 Yeah, well, we hope to. Yeah, so I work on the developer platform. I've been working on it for around three years now. So I joined in 2022. I was basically hired to work on the API product, which at the time was the only product that OpenAI had. And I've basically just worked on it the entire time. I've always been super interested in the developer side and kind of like the startup story of this technology. So it's been really cool to kind of see this evolve. And so that's my time at OpenAI. Before OpenAI, I was at Opendoor for around six years.

Starting point is 00:03:08 I was working on the pricing side. My general background before. It's such a difference. Yeah, yeah. Pricing at Opendoor to, like, running the API. It's such a different. It's been fascinating actually for me to see the differences between the companies. Like they run so differently.

Starting point is 00:03:21 They both have "Open" in the name. So you should have some overlap. But that's pretty much it. But yeah, I was there for around six years working on the pricing team. So our team basically would run the ML models. This is actually pricing the assets on Opendoor, the inventory? Exactly. So, yeah, Opendoor would buy and sell homes,

Starting point is 00:03:36 and their main product was buying homes directly from people selling them with all-cash offers. And so my team was responsible for how much we would pay for them. And so it was a really fun, like, ML challenge. It had a huge operational element to it as well because not everything was automated, obviously. But it was a really fascinating technical challenge. Is there any sense of that on the API side, like GPU capacity buying, or is it just totally unrelated? On the API side, there is a small bit of how we price the models, but I don't think we do anything as sophisticated as Opendoor. Opendoor is just like such a hard problem. It's like such an expensive asset. The holding costs are very expensive. You're like holding onto it for like months at a time. There's like a variability in the holding time. And there's a long tail of potential things that could blow up. Long tail. Yes. And like you try to think about it from a portfolio perspective. And like if one of them just like you're holding onto it for two years, it blows everything up, like goes negative. So it's a very, very different. Six years? Different challenge. Yeah. Yeah, six years there.

Starting point is 00:04:28 Wow. Lots of ups and downs, saw a lot of the booms, saw a lot of the struggles. And then we IPOed before I left. But yeah, just in general, it was a very great experience. I think for me it also had such a very like business operations and like a very like by-the-book type of culture, whereas OpenAI is like very different. Well, so interesting. I was just thinking about it now. It's like even for a company like that, like you don't think about it as a tech company.

Starting point is 00:04:49 But if there is a deep technology problem, it actually is the pricing, right? It's actually an ML problem. Yeah, that's not a website. It's not the platform. It's not the API. It's literally that. Yep, yep, yep. And that's what attracted me to it. I think that was interesting. It's also a way lower margin business than OpenAI

Starting point is 00:05:04 because you're making a tiny spread on these homes. They would talk about basis points, like eating bips for breakfast and all that. Anyways, I was at Opendoor for around six years. And then before that was my first job out of college, which was at Quora. Adam D'Angelo's from there. Yeah, so I was working on the news feed. So worked on news feed ranking for a bit, worked on the product side. That was actually my first exposure to, like, actual ML in industry

Starting point is 00:05:24 and learned a lot from the engineers at Quora. We basically hired a lot of the early feed engineers from Facebook. Was Charlie still there when you were there? Charlie was not there when I was there. So you were like right after he was there. Yeah, yeah, yeah. And that was a really legendary team. It's still known to be kind of this super iconic founding team.

Starting point is 00:05:38 Yeah, yeah. The early founding team was really solid. I still think that even while I was there, I was still like amazed at the quality of the talent that we had. I think that was like when the company was like 50 to 100 people. But yeah, like a bunch of the Perplexity team was there. Dennis was on the feed team with me, Johnny Ho, Jerry Ma. That's right.

Starting point is 00:05:53 And then Alexandr, of Scale, now MSL, you know, he was there between high school and college. It was an incredible team. I think I kind of took it for granted. It was a good group. How did you get to Quora? What did you study in undergrad? Yeah, so before that I was at MIT for undergrad.

Starting point is 00:06:08 I studied computer science, did like one of those like computer science and the master's degree, kind of like crammed it in. I ended up at Quora because I got what we call an externship there. So at MIT, you actually get January off. So there's like the fall semester and then January's off. And then you have the spring semester. And so it's called Independent Activities Period. So some people just like take classes. Some people would just do nothing. But some

Starting point is 00:06:29 people will do like month-long internships, and some crazy companies will offer a month-long internship to a college student. And it really is just kind of like a way to get people. Did you come out here from Boston? Yeah. Yeah, it was crazy. So you had to apply. I remember, yeah, this is I think January 2013 or something. You had to apply. And I remember the Quora internship was the one that just paid the most. They paid, I think it was like $8,000, $9,000. And it was like, wow, it's like for a month. And you're just like kind of ramping up like half the time. I can eat for a year. Yeah. Yeah. As a college student, it was like great. And yeah, they would kind of like fly you out here.

Starting point is 00:06:58 So I did the interviews and then luckily got an offer. And so, yeah, came out for a January. That was right when they moved into their new Mountain View office. And I basically, yeah, honestly just ramped up for like two weeks and then had two weeks of good productivity working on the feed team. So was that like user-facing product work? Yeah. Yeah. I distinctly remember my externship project for those two weeks was just to like add a couple features to a feature store. Yeah. And that would make its way into the model. I remember my mentor there was Tudor, who's now running, I think it's called Harmonic Labs. Yeah, yeah.

Starting point is 00:07:28 Crazy team. Crazy team. I mean, by the way, I think it's one of the untold stories of Silicon Valley, like how good that original team at Quora ended up being. I mean, a lot of them are still there and still good, but the diaspora from Quora is everywhere. Yeah, yeah. That's actually how I ended up at OpenAI, too,

Starting point is 00:07:42 kind of fast-forwarding from there, because OpenAI kind of kept a quiet profile-ish. I'd always kind of kept tabs on them because a bunch of the Quora people I knew kind of, like, ended up there. I was kind of like checking in on it, and they were like, yeah, something crazy is happening here. You should definitely check it out. So, yeah, I definitely owe a lot

Starting point is 00:07:55 to Quora. But yeah, part of the reason why I went there versus other options as a new grad was the team was just so incredible and I just felt like I could learn a ton from them. I didn't think about everything afterwards. I was just like, man, if I could just absorb some knowledge from this group of people, it could be great. Awesome. Yeah. So one place I wanted to start is something that I find very unique about OpenAI is it's both a pretty horizontal company. Like it's got an API. Like I would say we've got this massive portfolio of companies, right? And I would say a good fraction of them use the API. And then it's also a vertical company in that you've got full-on apps, right? Like everybody uses ChatGPT, for example. And so you're responsible for the API and kind of the dev tools side. So maybe just to begin with, is there an internal tension between the two?

Starting point is 00:08:41 Like, is that a discussion? Like the API may, whatever, it may help a competitor to, like, the vertical version, or is it not, things are just growing so fast, it's not an issue? I'd just love to hear how you think about that.

Starting point is 00:08:53 By the way, it's very unusual for companies to do both of those. These two things this early is very unusual. Yeah, yeah, I completely agree. I think there is some amount of tension. I think one thing that really helps here is Sam and Greg, just from a founder perspective,

Starting point is 00:09:05 have since day one just been very principled in the way in which we approach this. They've always kind of told us we want ChatGPT as a first-party app. We also want the API. And the nice thing is I think they're able to do this because at the end of the day it kind of comes back to the mission of OpenAI,

Starting point is 00:09:18 which is to create AGI and then to distribute the benefits as broadly as possible. And so if you interpret this, you want it in as many surfaces as you want. And the first-party app is a really great way to get, you know, it's like 800 million WAUs or whatever now. 800 million WAUs? Yeah, yeah, it's pretty, it's actually mind-boggling to think about it.

Starting point is 00:09:35 I don't think many people listening to this understand how big that is. Yeah, it's crazy, yeah. That's got to be, like, actually historic for the time it's taken to get to 800 million. It's historic. It's also just like, yeah, the amount of time and just like how much we've had to scale up. A tenth of the globe, right? Yeah, yeah, 10% of the globe uses it. Every week.

Starting point is 00:09:52 Every week. Yeah, yeah. And it's growing. and it's growing. So like at some point, you know, it'll go even higher than that. And so, so yeah, like, obviously the reach there is unmatched. But then also just, like, being able to have a platform where we can reach even more than just that. Like, one thing we talk about internally sometimes is, like, what does our end user reach from the API? Like, it's actually, like, really, really, it's really broad. It might even, it's hard

Starting point is 00:10:13 because ChatGPT is growing so quickly. But, like, at some point, it was definitely larger than ChatGPT. And the fact that we're able to tap into all of this and get the reach that we want, I think is really good. But yeah, I mean, there's definitely some tension sometimes. I think it's come up in a couple places. I think one of them is on the product side. So as you mentioned, you know, sometimes there are competitors kind of like building on our platform

Starting point is 00:10:35 who, you know, might not be happy if ChatGPT launches something that competes with them. Yeah. I mean, that's a tale as old as the cloud or operating systems or whatever. So like that's, you know, I think it's more like, does ChatGPT worry about the competitor?

Starting point is 00:10:50 Yeah. You know, type thing. Like, you know, you enabling a competitor. Yeah, yeah. So, I mean, the interesting thing is, like, I would say not particularly, mostly just because we've been growing so quickly. It's like, you know, it's such a, you know, force right now. Yeah, yeah.

Starting point is 00:11:03 Growth solves so many, so many different things. And like, and the other way we think about it is like everyone's kind of building, building around AGI, building towards AGI. Of course, there's going to be some overlap here. So, yeah, I mean, but I would say like, at least in my position, I feel more of this tension from the customer, like the API customers themselves. Right. So like, oh my gosh, you know, you're like, are you going to build this thing that I'm working on?

Starting point is 00:11:22 Yeah, that's, that story is as old as computer systems. There's never been a computer platform that didn't have that problem. So, okay, so I kind of go back and forth on this one. I want to try one out on you, which is the problem historically with, you know, offering a core service as an API,

Starting point is 00:11:40 you can get disintermediated, right? And so I can build on top of it, but then, you know, the user doesn't know, like whatever, I build on top of the cloud, but I disintermediate the cloud and then I can switch to another cloud or whatever. And it occurs to me that that's kind of hard to do with these models because the models are so hard to abstract away.

Starting point is 00:11:59 Like, they're just unruly, right? If you try to, like, have traditional software drive them, they just don't kind of manage very well. So part of me thinks that it's almost like this, like, anti-disintermediation technology that you kind of have to expose it to the user directly. Does that make sense? And so I'm wondering, like, so even if I think ChatGPT is really just trying to expose the model to the user, the API is kind of just trying to expose the model to the

Starting point is 00:12:23 user. So I think there's almost this argument that's like, if the real value is in the models, it doesn't really matter how you get it to them, because it's going to be very tough for someone to abstract it away in the classic sense of computer science, of like, they don't know that they're using the model. Like, you always know you're using GPT-5. Yeah, and the interesting thing is I think like the entire industry kind of has slowly changed their mind around this too. I think like in the beginning, we kind of thought like, oh, these are all going to be interchangeable. It's just like software. Yeah, yeah, exactly. So the piece of infra that you swap out. Yeah. But I think we're learning this on the product side with like, you know, the

Starting point is 00:12:53 GPT-5 launch and like 4o and like how so many people liked o3 and 4o and all of that. I felt that. I felt that when it changed. I'm like, I'm like, you're not as nice to me. Like I like I like the validation. Yeah. It's actually funny because I really loved GPT-5's personality, but I think it's like the way I used, you know, ChatGPT was very utilitarian. Like it's like, you know, mostly for work or just like information.

Starting point is 00:13:14 Yeah, I've definitely come around just, you know, but like I actually felt the distance when it changed. It's like, it's like, yeah. Like there's this emotional thing that goes on, but it's almost like it's an anti-, you know, disintermediation technology. Like, you kind of have to show this to the user. Yeah, yeah. And then you see a lot of, like, you know, more successful products like Cursor, like do this directly,

Starting point is 00:13:31 especially the coding products where users want more control. We've even seen some, like, you know, like more general consumer products do this. And so it's definitely been true on the, on the consumer side. The interesting thing is I think it's also been true on the API side. And that's also something that I think. No, exactly. No, that's exactly what I'm saying.

Starting point is 00:13:45 Yeah, it's like... The argument could be that I could use the API to disintermediate you. But, like, you don't see that happening because it's so hard to, put a layer of software between a model and a person. You almost have to expose the model. Yes, yes. And I think, if anything, I think the models are almost like diverging in terms of

Starting point is 00:14:03 what they're good at and like their specific use case. And I think there's going to be more and more of this. But yeah, basically it's been surprisingly hard for, or like the retention of people building on our API is like surprisingly high, especially when people thought you could just kind of swap things around. You might have, you know, like even tools that help you swap things around. But yeah, the stickiness of the model itself has been surprising. And do you think that is because of a relationship between the user and the model,

Starting point is 00:14:29 or do you think it's more of a technical thing, which is like my evals work for, like, OpenAI, and like the correctness maintains? Yeah, yeah. I think it's both. So I think there's definitely an end user piece here, which is what we've heard from some of our customers. Like they just get familiar with the model itself. But I also think there's a technical piece, which is like the... Also, as a developer, especially with startups,

Starting point is 00:14:53 you're really going deep with these models and really, like, really, like, iterating on it, trying to get it really good within your particular harness. You're iterating on your harness itself. You're giving it different tools here and there. And so you really do end up, like, building a product around the model. And so there is a technical piece where, you know,

Starting point is 00:15:09 as you kind of keep building with a particular model like GPT-5, you're actually, like, building more around it so that your product works uniquely well with that model. So I use Cursor. And just for like a lot of things, like writing blogs and like, you know, we're investors. And I use it sometimes for coding. And it's remarkable how many models I use in Cursor. So like literally my go-to model is GPT-5.

Starting point is 00:15:35 I love GPT-5. I think it's a phenomenal, like, you know. And then like I use like max mode with GPT-5 for planning. And then, but, you know, like, I mean, I like the tab complete model that's in Cursor. And like, you know, the new model they just dropped is for like some basically, you know, some stuff is like. Yeah, the Composer one. Like, yeah, the Composer one's good.

Starting point is 00:15:51 Yeah. And so, like, you know. And I think that, like, kind of reflects this, too, because it's like, it's a particular model for each particular use case. Yeah, yeah, yeah, yeah. Like, I've talked to a bunch of people who've used the new composer model, and it's just really good for, like, fast, like, first pass. Exactly, that's right.

Starting point is 00:16:05 Like, keep you in flow kind of thing. And then you kind of, like, bubble out to another model if you want, like, you know, deeper thinking or something. I literally sit down. I literally sit down with GPT-5 to help me plan something out. And it's really good at that. And then, you know, like when I'm coding and I'm doing like the quick chat thing,

Starting point is 00:16:18 then I'll use Composer. And then if there's like, whatever, there's like some crazy bug or something like that. So, you know, do you remember like in the early days of all of this where like there's going to be one model? And I mean like even like investors, like we will never invest in a model company

Starting point is 00:16:34 because like there will only be one model and it's going to be AGI. But like the reality, it feels like there's this massive proliferation of models. Like you said before, they're doing many things. And so maybe two questions, maybe too blunt or too crass.

Starting point is 00:16:45 but the first one is, what does that mean for AGI? And the second one is, what does that mean for OpenAI? Like, does that mean that, like, you end up with a model portfolio? Do you select a subset? Do you think this all gets superseded by some God model in the future? Like, how does that play out? Because it's against what most people thought. Most people thought this is all going towards one large model that does everything.

Starting point is 00:17:04 Yeah, I think the crazy thing about all this is just, like, how everyone's thinking has just changed over time. Totally. Like the, I distinctly remember this, like, and the crazy thing is it's not that long ago. It's just like three, like two or three years ago. I remember, like, even within OpenAI, the thinking was that there would be, like, one model that rules them all. And it's like, why would you, I mean, like, this kind of goes to the fine-tuning API product. It's like, why would you even have a fine-tuning product?

Starting point is 00:17:24 Why would you even want to, like, iterate on it? There's going to be this one model that just subsumes everything. And that was also kind of the, that is also, like, the most simplistic, like, view of what the AGI will look like. And, yeah, it's, like, definitely completely changed since then. I think one. And, but then the other thing to keep in mind is, like, it might continue to change, like, even from where we are today. But it's like becoming increasingly clear, I think, that there will be room for a bunch of specialized models. There will likely be a proliferation of other types of models.

Starting point is 00:17:53 I mean, you see us do this with like the Codex model itself. We have like GPT-4.1 and like 4o and like 5 and all of this. And so I do think there's room for all this. I don't think that's bad, for what it's worth. If anything, I think, you know, as we've tried to move towards AGI, things have just been very unexpected, and I think the market just evolved and the product portfolio evolves because of that.

Starting point is 00:18:16 So I don't think it's a bad thing at all. What I do think it means... You can easily argue it's very good for OpenAI and very good for like the model companies to like... Yeah, because you don't have like, you know, winner-take-all consolidated dynamics, right? I mean, you just have a healthier ecosystem

Starting point is 00:18:29 and a lot more solutions you can provide. Yeah. You know. Yeah, and as the ecosystem grows, it generally is helpful. Like, this is one thing we actually think about a lot, too, is as the general, like, yeah, ecosystem grows, like, OpenAI just stands to benefit a lot from this.

Starting point is 00:18:41 And this is also why we've, like, some of our products we've even started opening up to other models, right? Like our Evals product now allows you to bring in other models. It's all of this. We think it's like any rising tide generally helps us here. But yeah, I think as we move into a world where there would be a bunch of our models, this is why we've kind of invested in our model customization product with the fine-tuning API, with the reinforcement fine-tuning, opening that up as well.

Starting point is 00:19:04 It's also part of why we open-sourced gpt-oss as well because we want to be able to, you know, facilitate. I want to talk about that in just a bit because the open source is actually very interesting. I mean, actually, I thought the open source model was great, but clearly it's something that a company has to be careful with. But before that, I want to talk a little bit

Starting point is 00:19:21 about the fine-tuning API. So I've noticed that you are moving towards kind of more sophisticated use of things like, you know, like fine-tuning, which, you know, in a way you could read that as a bit of a capitulation that, like, you know, there is product-specific data and there's product-specific use cases

Starting point is 00:19:40 that a general model won't do, to your point, right? So, like, as opposed to proliferation of models, you do that. It seems like a lot of that data is actually very, very valuable, right? And so, you know, to what extent is there, like, interest in almost a tit for tat where you can, like, expose, you know, the ability to get product data into fine-tuning, and then you also benefit from that data

Starting point is 00:20:05 because the vendors provide it to you. versus like this is 100% you know like they keep their own data and there's kind of no interest in that because it feels to me like the next level of scaling this is kind of where we're at and so I just kind of curious how yeah so I mean maybe even like taking a step back

Starting point is 00:20:23 the main reason why we even invested in a fine-tuning API in the very beginning is, one, there's been huge demand from people to be able to customize the models a bit more. It kind of goes into like prompt engineering, and also like I think the industry changed its mind on that as well, like it's evolved. But the second thing is exactly what you said, which is the companies just have giant treasure troves of data that they are sitting on

Starting point is 00:20:44 that they would like to utilize in some fashion in this AI wave. And you can, you know, the simple thing is to put it in like, you know, some like vector, like do RAG with it or something. But there's also, you know, if they have a more technical team, they do want to see how they can use it to customize the models. And, and so that is actually the main reason why we've invested in this. The, the interesting thing was way back, kind of back in like '22, '23, our fine-tuning offering was, I'd say like too limited, so that it was very difficult for people to tap into and use this data. So it was just like a supervised fine-tuning

Starting point is 00:21:15 API and like we're like oh you can kind of use it but in practice it really is only useful for like like it's honestly just like instruction following plus plus, you like kind of change the tone and you're just like instructing it. But I think the big unlock that has happened recently is with the reinforcement fine-tuning model because

Starting point is 00:21:31 with that setup we're now letting you actually run RL, which is more finicky and it's like harder and, you know, like you need to invest more in it, but it allows you to leverage your data way more. By the way, this is just a naive question from me, which is it feels from just my understanding from my own portfolio, it feels like there's two modalities of use.

Starting point is 00:21:49 One of them is I've got a treasure trove of data that I've had for a long time, and I create my model on that treasure trove of data, and all that happens offline, and then I deploy that. There's another one, which is like, I actually have the product being used in real time. I've got a bunch of users. Yeah.

Starting point is 00:22:02 And, like, I can actually get much closer to the user. I can kind of A/B test and decide which data, and, like, it's kind of more of a near-real-time thing. Is this focused on, like, more product stuff or more treasure trove? So the dream with the fine-tuning API was that we should be able to handle both, right? It's like, we actually had this dream, and we have this whole, like, LoRA setup with the fine-tuning inference where we should just be able to scale to, like, millions and millions of these fine-tuned models, which is usually what would happen if you have, like, this online

Starting point is 00:22:29 learning thing. Exactly, yeah. In practice, it's mostly been the former, right? In practice, it's mostly been, like, the offline data that they've, like, already created, or they are creating with experts or something and, like, using their product that they're able to use here. But the main thing I was trying to say around the reinforcement fine-tuning API is it kind of changes the paradigm away from just like small incremental, like tone improvements,

Starting point is 00:22:50 which is what SFT did, to actually improving the model to potentially SOTA level on a particular use case that you know about. Like that's where people have really started using the reinforcement fine-tuning API. And that's why it's gotten more, more uptake. Because if the discussion is like, hey, I can make this model, you know, like speak in a certain way better, it's less compelling. But if it's like, hey, for like, you know,

Starting point is 00:23:14 medical insurance coding or for like coding planning, agentic planning or something, you can create the world's best model using your data set with RFT, then it becomes a lot more compelling. And will you, will you ever, like, or maybe do you? Will you ever, like, find ways to get access to that data?

Starting point is 00:23:28 Like, you know, like, listen, if I had the data and I wanted cheap GPUs, I'd trade you for it. I don't know. Yeah, I mean, we've talked about this. And we've actually been piloting some pricing here, too, where it's like, because this data is like really helpful and it's kind of hard to get

Starting point is 00:23:42 and if you actually build with the reinforcement fine-tuning API, you can actually get discounted inference and potentially free training too if you're willing to share the data. It's always kind of, you know, it's up to the customer there. But if they do, it is helpful for us and there will be benefits for the customer as well.
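
As a rough illustration of why a LoRA setup makes "millions and millions" of fine-tuned models plausible to serve, here is a minimal Python sketch. It assumes the standard low-rank adapter idea (a shared base weight matrix plus a small per-customer pair of matrices A and B). The names `BASE_W`, `ADAPTERS`, `run`, and the tiny sizes are hypothetical and do not describe OpenAI's actual serving stack.

```python
# Hypothetical sketch of low-rank adapters (LoRA) sharing one base model.
# All names and sizes are illustrative assumptions.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(base_w, lora_a, lora_b, x, scale=1.0):
    """Compute (W + scale * B @ A) @ x without materializing the sum."""
    base_out = matvec(base_w, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))  # B @ (A @ x)
    return [b + scale * l for b, l in zip(base_out, low_rank)]

def adapter_params(d_out, d_in, rank):
    """Parameters one adapter adds: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank

# One shared base matrix (2 x 3) and two rank-1 adapters, one per customer.
BASE_W = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
ADAPTERS = {
    "customer_a": {"A": [[1.0, 1.0, 1.0]], "B": [[0.5], [0.0]]},
    "customer_b": {"A": [[0.0, 0.0, 2.0]], "B": [[0.0], [1.0]]},
}

def run(customer, x):
    """Serve a request by pairing the shared base with that customer's adapter."""
    adapter = ADAPTERS[customer]
    return lora_forward(BASE_W, adapter["A"], adapter["B"], x)
```

Under these assumptions, a rank-8 adapter on a 4096 x 4096 layer adds 65,536 parameters, versus roughly 16.8 million for a full copy of that layer, which is why many adapters can share one set of base weights.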

Starting point is 00:23:58 That's awesome. Okay, you said that the views on prompt engineering have changed. Yeah. Actually, I wasn't aware of that. All the other things I was aware of; this one, I wasn't. Yeah, I mean, I think the prevailing view, this is back in 2022. I remember I was talking to so many people. And they're basically, I mean, this is similar to, like, the single model AGI view as well,

Starting point is 00:24:15 which is, like, prompt engineering is just not going to be a thing. And you're just not going to have to think about what you're putting in the context window in the future. Like, the model would just be good enough. It'll just, like, know, it'll know what you need to do. Yeah, that's definitely not a thing. Yeah, but, like, I don't know, maybe people forget it. But, like, that was, like, a very common. Yeah, that was a very common thing.

Starting point is 00:24:32 Yeah, because, like, scaling laws or whatever, something with scaling laws. And, like, you'll just mind-meld with the model, and, like, prompting and, like, instruction following will be so good that you won't really need to do it. And if anything, like, yeah, it's clearly been wrong. But it is interesting because

Starting point is 00:24:46 I think it's a slightly different world that we're in now where the models have gotten really really good at instruction following relative to, like, GPT-3.5 or something. But I think the name of the game now is less on like prompt engineering as we had thought about it two years ago. It's more of like, it's like the context engineering side,

Starting point is 00:25:02 where it's like what are the tools you give it, what is like the data that it pulls in, when does it pull in the right data. Well, this is very interesting. I mean, to reduce it to, like, an almost absurdly simplistic level. Like, the weird thing about RAG, for example, the classic use of RAG is, like, you're using, like, cosine similarity to choose something that you're going to feed into a superintelligence. Yeah. So, like, you know, you're like, I'm going to, like, randomly grab this thing based on, like, fucking embedding space.

Starting point is 00:25:28 It doesn't really, you know, and like, and then, you know, you want the superintelligence to decide the thing to do. And so it's, like, pushing intelligence into that retrieval clearly is something that makes a lot of sense. It's almost like pushing the intelligence out in a way. Exactly. And to be fair, I think, like, RAG was kind of introduced when the models were like, it's like pre-reasoning models. So it was like, you only had kind of like one shot to like do this and it wasn't that smart.
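
To make the "classic RAG" step concrete, the following is a minimal Python sketch of picking context by cosine similarity in embedding space. The toy two-dimensional embeddings, the `DOCS` list, and the `retrieve` helper are all assumptions for illustration; a real system would get embeddings from a model and search a vector index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, documents, top_k=1):
    """Return the texts of the top_k documents most similar to the query."""
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return [d["text"] for d in ranked[:top_k]]

# Toy corpus with made-up 2-D embeddings (hypothetical values).
DOCS = [
    {"text": "refund policy", "embedding": [1.0, 0.0]},
    {"text": "shipping times", "embedding": [0.0, 1.0]},
    {"text": "return window", "embedding": [0.8, 0.6]},
]

# The retrieved texts would then be placed into the model's context window.
context = retrieve([1.0, 0.1], DOCS, top_k=2)
```

The point made in the conversation is that this selection happens with a fixed similarity rule before the model sees anything, whereas a reasoning model with tools can decide for itself what to fetch.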

Starting point is 00:25:49 But now that we do have the reasoning models, now that we have, I mean, if you, like, one of my favorite models is actually o3 because it was like one of the most diligent models. It was like, o3. It would just like do all these tool calls. And it's like really the intelligence itself trying to like do the, you know, tool calls or RAG or anything like that, or write the code to execute.

Starting point is 00:26:08 And so the paradigm has shifted there, but yeah, because of that, I think, like, context engineering, prompt engineering, what you put, what you give the model is, like, extra important. Yeah, yeah. Okay, so you have API, so the API, which is horizontal, you've got ChatGPT and other products, which are vertical. We haven't even talked about pixels.

Starting point is 00:26:22 This is all just language. Are agents a new modality? Is that something else? Like, you know, like Codex or... What do you mean by modality here? Like, um, I mean, they feel both vertical and horizontal to me in a way. Like, to me, ChatGPT is a product, right?

Starting point is 00:26:40 It's like it's a product and my mom uses it, right? Yep. And an API is a dev thing. You kind of give it to a developer. And like a CLI is kind of somewhere in between to me. It's like, is it a product? Is it, like, horizontal? Like, how is it handled internally?

Starting point is 00:26:55 Is it a totally separate team that does agents or? No. So it's, yeah, it's interesting because like I think the way that I, the way that you frame it just now almost seemed like agents was like this singular concept that like, you know, or like might have its own particular team. Maybe a better question is, what is an agent to you? Yeah, yeah, yeah, yeah.

Starting point is 00:27:14 Even getting the language right is, like, important for this conversation. So I actually don't even know if it'd be helpful for me to share, but my general take on agents is it's an AI that will take actions on your behalf that can work over long time horizons. And I think that's pretty general. Utilitarian, yeah, yeah. But like if you think about it that way, yeah, I mean, maybe this is what you mean by modality,

Starting point is 00:27:34 but it is just a way of using AI. And it is a, I guess it could be viewed as a modality, but we don't view it as like a separate thing, separate from AI and attach. Let me just try and kind of, you know, give you a sense of where this question is coming from. Like, I know how to build a product, like, and we know how to go to market for products.

Starting point is 00:27:53 We know how to do, like, you know, the implications of turning them into platforms. Like, it's just we've been doing this for a very long time, right? We know how to do the same thing for APIs, right? We know how to do billing, we know, like, the tension of, like, people building on top of it and all of that stuff. And, like, what I've been trying to, and this is just maybe a personal inquiry, it's just not clear to me for an agent if it sits in one of those two camps. Is it more like the product camp? Is it more like the, or is it.

Starting point is 00:28:22 Because it's kind of both. Like, I could, like, literally give you code. Yeah, yeah. And, like, as a user, and then you just talk to it. Or I could, like, build in a way, kind of embed it in. in my app. And so like, but then that means something to you as far as like,

Starting point is 00:28:38 you know, how do you price it and what does it mean for ecosystem? Like, like, for example, like would you be fine if I started a company and just like built it around Codex? Is that a thing?

Starting point is 00:28:46 Starting company and building it around? Correct. Yeah. I actually think that would be great. Like it's a, we like release like the Codex SDK and we like want people to be able to build it and hack on it. Yeah.

Starting point is 00:28:55 Actually, I think this might be what you're getting at, which is, um, and this is like a kind of a unique thing about OpenAI and kind of reflects on how it's run, which is, like, at the end of the day, OpenAI is like an AGI company.

Starting point is 00:29:06 It's like an intelligence company. Yeah, for sure. And so agents are just like one way in which this intelligence can be manifested. And so the way that I'd say we actually think about it internally is all of our different product lines, Sora, Codex, API, ChatGPT, are just different interfaces and different ways of deploying this. So you don't really. So there's no, like, single team that's like, you know, thinking about agents. I would say the way that it manifests itself more is like each product area thinks about, like, what is, you know, this intelligence is actually turning into a form where, like, agentic behavior is more possible.

Starting point is 00:29:35 What would that look like in a first-party product like ChatGPT? What would that look like? This is actually why Codex ended up becoming its own product. What would it look like in a coding style product? Like we explored it and ChatGPT kind of worked there, but actually the CLI interface actually makes a lot more sense. That's another interface to deploy it. And then if you look at the API itself,

Starting point is 00:29:52 it's like this is another interface to deploy it. You're thinking about it in a slightly different way because it's a developer-first mindset. We're helping other people build it. The pricing is slightly different. But it's all these different manifestations of this core, like, intelligence that is the agentic behavior. It is so remarkable how much of this entire economy is basically just token laundering.

Starting point is 00:30:10 It's literally like anything I can do to get like English in or like a natural language in and then like, you know, the intelligence out. Yeah. And I mean, and it's because these things are so resistant to layering, it's so hard to layer language out. Like, you know, I could even do it pretty easily with like Codex. I could just like use it, you know, as a component of a program, and just, you know, basically launder intelligence.

Starting point is 00:30:35 I mean, of course, you know, I'd be charged to do that. So I actually, my view of this, and having seen now so many kind of launches of different products, I've seen agent launches and the definition that you have,

Starting point is 00:30:45 I've definitely seen APIs. And I've seen products on these. It's like, they're actually quite different than, like, what we're used to. Like, the COGS is different. The defensibility is different. So we're kind of rewriting it.

Starting point is 00:31:00 And so it's kind of like, you know, you came from a kind of pricing background. I mean, you're working on a model for pricing. Now you have the API. So I just love your thoughts on, like, I mean, how have you evolved your thinking and how do you price these, you know, access to intelligence where, you know, you don't know how many people can use it.

Starting point is 00:31:20 It's almost certainly usage-based billing, not something else. Like, can you talk just a bit about, like, philosophy around pricing on these things? Is it different for product versus API? Yeah, I think that the honest truth here is, like, it's evolved over time as well. And, like, I actually think the simplest, like, the reason why we've done usage-based pricing on the API, honestly, is because it's been, like, it's closest to how it's actually being used. And so that's kind of how we started. I actually think usage-based pricing on the API has, like, surprisingly held strong.

Starting point is 00:31:50 And, like, I actually think this might be something that we'll keep doing for quite a long time, mostly because, um... The COGS is, so I don't know how you don't do usage-based. Yeah, yeah, yeah, yeah. I just don't know how that... Yeah, and then, and then there's also the strategy of, like, how we price it. And internally, one thing we do is we always make sure that we actually price our usage-based pricing from a cost-plus perspective. Like, we're actually just like trying to make sure that we're being responsible from a margin perspective. By the way, this is a huge shift in the industry in general just because, like, I remember the shift from on-prem to recurring.

Starting point is 00:32:20 Yeah. That was a big, big deal. Like, that created Zuora. Like, it created whole companies. There were, like, whole books on it and, like, a bunch of consultants on how you do this. It changed. Yeah. You know, and like, I think the shift to usage is, it's

Starting point is 00:32:32 as big or bigger. And it's also even a really hard technical problem. Yeah. Like, I can't even imagine 800 million WAU. Like, how do you bill? Yeah, yeah. Well, 800 million WAU is a little easier because it's not usage-based pricing. It's subscription.

Starting point is 00:32:46 So it's like that's way way well. But I mean, there's still like a lot of users on the API that we need to like, you know, manage all the billing side. There's some like overages or stuff you've got to deal with on that or? What do you mean by overages? I don't know. I guess I don't know. I don't know.

Starting point is 00:33:00 I don't know. Oh, I see. Okay. There are like max quotas that we don't let people go over. But, like, in practice, these quotas are, like, pretty massive. And that would literally be, like, one of the most complex systems somebody's ever built if you would do usage-based at that scale. I mean, these are very, very, and like, you have to be correct.
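
The cost-plus and max-quota ideas from this part of the conversation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the per-million-token cost, the margin, the quota, and the function names are all hypothetical, and nothing here reflects OpenAI's real prices or billing system.

```python
from decimal import Decimal

def price_per_million_tokens(cost_per_million, margin):
    """Cost-plus pricing: the serving cost marked up by a fixed margin."""
    return cost_per_million * (Decimal("1") + margin)

def bill(tokens_used, cost_per_million, margin, max_quota_tokens):
    """Return the amount owed for the period, rounded to cents.

    Usage above the maximum quota is rejected rather than billed,
    mirroring the 'max quotas that we don't let people go over' remark.
    """
    if tokens_used > max_quota_tokens:
        raise ValueError("usage exceeds the account's maximum quota")
    price = price_per_million_tokens(cost_per_million, margin)
    amount = price * tokens_used / Decimal(1_000_000)
    return amount.quantize(Decimal("0.01"))

# Hypothetical numbers: $2.00 cost per million tokens, 25% margin.
invoice = bill(
    tokens_used=3_000_000,
    cost_per_million=Decimal("2.00"),
    margin=Decimal("0.25"),
    max_quota_tokens=10_000_000,
)
```

`Decimal` is used instead of floats because, as noted in the conversation, billing has to be correct; the hard part at scale is metering usage reliably, which this sketch does not attempt.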

Starting point is 00:33:14 Like, these are very hard systems to scale. Yep, yep, yeah. Yeah, I mean, we have a whole team thinking about this now internally. Yeah, I mean, usage-based pricing is also interesting. So there's, we acquired this company called Rockset a while ago. A while ago. The founder, his name is Venkat. Yeah, Venkat's incredible. Awesome.

Starting point is 00:33:30 Awesome. Venkat, if you're listening, we're huge fans. I'm a huge fan. He's going to love this. He's great, man.

Starting point is 00:33:37 He's a legend. Anyways, I was talking to him about pricing as well. And his take is that pricing is kind of like a one-way ratchet. And like, basically, once you get a taste of usage-based pricing, you're never going to go back to like the per-deployment type pricing. And I think it's definitely true. And I think it's just because it gets closer and closer to like your true utility. You're getting all this thing. The main point is like you have to maintain all this infra.

Starting point is 00:34:00 Yeah, to like get it to work. But if you do have it, he thinks it's like a one-way ratchet where like there's just like no going back. And then I think the hot new thing now is like, oh, with AI you can now kind of measure like outcomes. And so that's like another, you know, like step forward. And if that works, like maybe it's a one-way ratchet. So we thought about that. It's like, you know, is there some type of like outcome-based pricing. This is more on the first-party side; on the API,

Starting point is 00:34:20 It's kind of hard to measure that. Yeah, that's very hard. I mean, that's hard because you end up having to price and value non-computer science infrastructure, right? Like you're literally going into verticalization now. Yep. You're like, I mean, listen, if it's like porting a code base, maybe you have some expertise, but if it's like whatever, like increasing crop yields.

Starting point is 00:34:40 Like at some level you need to like. But there could be a world where like the AI is like, you know, make judgments of these and do it in an accurate enough way where you can tie it to billing. I think this is a problem with AI conversations because at any point in time you're like, but it could get good at.

Starting point is 00:34:56 It's not a problem anymore. Yeah, yeah. At some point it'll be solved. It's so much like, The prompt engineering and the single AGI, I think, from before. Yeah. Yeah, it's like when you reach that level of, when you push it that far, everything's kind of solved on outcome-based pricing.

Starting point is 00:35:09 It sounds very appealing. Like, if it can work and it can work. But one thing that we've started realizing is it actually ends up correlating quite a bit with usage-based pricing, especially with test-time compute. Like, if the thing is just like thinking quite a bit, like, actually, you know, if you charge just by usage-based and not outcome-based, you're, like, basically approximating outcome-based at this point. If the thing is, like, thinking for, like, so long,

Starting point is 00:35:32 it's, like, highly correlated with what it's doing. It's just adding more value. Yeah, yeah, exactly, exactly. And so, like, maybe at the end of the day, like, usage-based pricing is all you need, and it's like, we're just going to, like, you know, live in this world forever. But, yeah, I don't know.

Starting point is 00:35:45 It's constantly evolving, I think, our thinking has evolved here as well. I personally am, like, keeping track of if the outcome-based pricing setups can actually work here. But at least on the API side, I think, you know, it's such a usage-based setup. We have the get infrastructure around this. I think we'll probably stay with that for a while.

Starting point is 00:36:01 So how do you think about open source? I mean, you know, I think you're the only big lab that's releasing open source. Is that? No, Google has some of theirs. Okay. Yeah, mostly smaller models on their side. Yeah, yeah, yeah, yeah.

Starting point is 00:36:13 So how do you think about open source vis-a-vis, you know, competition, cannibalization, you know, like, what's the strategic, what's the complexity? Yeah, yeah. So I personally love open source. Like, I think it's great. There's a, all of us grew up with it, right? Yeah, all of us grew up with it.

Starting point is 00:36:32 Like, the internet wouldn't exist without it. Like, you know, so much of the world was built on top of it. Cloud wouldn't exist without it. Yeah. Nothing would exist without it, except for maybe Windows. And so it was interesting because, like, I felt like over the last, there was before we launched the open source model. I know Sam feels this way as well.

Starting point is 00:36:46 Yeah. It's like, there's this, like, weird, like, you know, uh, mindset where because OpenAI hadn't launched anything, it just seemed like OpenAI was, like, super anti-open source. But I'd actually been having conversations with Sam ever since I joined about open sourcing a model. We were just trying to think about, like, how can we sequence it?

Starting point is 00:37:04 And compute is always a hard thing. It's like, do we have the compute to kind of, like, train this thing? So we've always wanted to kind of do this. I'm really glad that we were able to finally do it. I think it was earlier this year? I, like, lost time. AI time is so crazy. Yeah, was it last year or, no, it was this year, yeah, when gpt-oss came out.

Starting point is 00:37:21 And so I was just really glad that we did that. The way that I generally think about it is one, I think as a, this is also particularly true for OpenAI because, as you said, we are a vertical and a horizontal company, is like we want to continue investing in the ecosystem, and just from a brand perspective I think it's good. But then also I think from Open

Starting point is 00:37:41 AI's perspective, if the AI ecosystem grows more and more, it's like a rising-tide sort of thing, and like, yeah, it's all like really helpful for us. And if we can launch an open source model and it helps like unlock a whole bunch of other use cases in other industries, I think that's, you know, that's

Starting point is 00:37:56 actually net good for us. I'd also say, what people talk about a lot is, like, how well these open source AI business models actually work, because, like, the cannibalization risk is actually very low. Yeah. And like you don't really enable competitors a lot because I mean when we say open source, you really mean open weights, right? It's not like they can recreate it, right? You know?

Starting point is 00:38:18 And like if I can distill your API as well as I can distill, like, if you give me the weights in some way, like it doesn't really change that dynamic a lot. But yeah, I mean, to be clear, like we have not seen cannibalization at all from the open source models. It seems like a very different set of use cases. The customers tend to be like slightly different. The use cases are very different. And by the way, it turns out inference is super hard.

Starting point is 00:38:39 Like to actually have like scalable, fast, performant. That's a hard, hard problem. Yeah. So like I'd say the way that I personally think about open source in relation to the API business in particular is, well, one, it hasn't shown cannibalization risk. So, you know, I'm not particularly worried about that. But also like, especially for all these major labs, like there are usually like two or three models where, like that is where, you're making all of your impact, all of your revenue. And those are the ones where we're throwing a bunch of resources into improving the model.

Starting point is 00:39:04 And these tend to be the larger ones that are like extremely hard to inference. We have a really cracked inference team at OpenAI. And my sense is like even if we just like open source them, like, if we just literally open sourced GPT-5 or something, it would be really, really hard to inference it at the level that we are able to get it to do. There's also, by the way, like a feedback loop between the inference team and like the training team too. So like we can kind of like optimize all that. Can you, can you like, is it possible to do

Starting point is 00:39:28 verticalized models for products. You know, like, train models specifically for products? Yeah, I mean, to actually, yeah. I think, I mean, we've kind of done this with GPT-5-Codex, right? Or do you mean, like even more verticalization? I mean, like deep, deep, deep verticalization where like, you know, like the, like the released model wouldn't, you know, is like actually part of a product. I think we're like basically starting to move in that direction.

Starting point is 00:39:56 I think there's a question of how deeply you verticalize it. I think most of what we've done is mostly at, like, the post-training, like, the tool use level. Like, Codex is particularly good at using the... Sorry, GPT-5-Codex is particularly good at using the Codex harness. But there's, like, even deeper verticalization you can do. Yeah, so that, and that one, I think is more of an open question. Yeah, so, like, a lot of my mental model, this comes from the pixel space, which is, like, you, you know, you can LoRA a bunch of image models, right? and you can do a bunch of stuff

Starting point is 00:40:27 to make it better and more suitable for some products, for example. But like these open source models are really, really good. And like you would believe that you could like verticalize a model for like editing or cut and paste or this or that. You know, like that's actually part of this. But you actually don't see that happen. Yeah.

Starting point is 00:40:46 It's almost always like you're just kind of exposing like a model, not something like specific to a product. Yeah, I think there is a distinction to be made between the image model space and the text model space. Also because the image models tend to be way smaller and you can iterate on it a lot faster.

Starting point is 00:41:02 That's why you get that crazy, cool proliferation of the image model side. Whereas, like, I don't know, for the text models, there's always going to be this really big, that pre-training step that you have to invest in here. And then even the post-training side

Starting point is 00:41:13 is like, you know, it's not like the easiest thing. Like, it's, you know, we all, like, just from a compute perspective, obviously it's much smaller, but like it's still pretty heavy to do like a full mid-train

Starting point is 00:41:22 or like a post-training run. And so I actually, I actually think that's one of the bigger bottlenecks. Because I think you are right that on the image side. Yeah, you can fine-tune an image diffusion model to be extremely good at like editing faces. Yeah, like something very specific. And then you know, like, yeah, yeah, yeah.

Starting point is 00:41:38 And it's like, yeah, you can just kind of put all these resources into and iterate on that one specific model, whereas it's a much heavier motion, it seems like, on the text side. I got to say it is a bit of an anti-pattern to do both, like language-based models and diffusion, like, pixel models in the same company. Like, most that have tried, like, have found it very clunky to do it.

Starting point is 00:41:59 But, I mean, you and Google are the two kind of counter examples for this. And so, like, is it possible to even, like, converge the infrastructures on these things? Like, I mean, is it totally different orgs? Is it shared infrastructure? Like, how do you operationalize? Yeah. I think you're totally right. It's an anti-pattern.

Starting point is 00:42:17 It's pretty tough to pull off. I think, honestly, like, props to Mark on our research team for, like, you know, structuring things in a way where we're able to do it. From my perspective, I think the biggest thing is I think our image, like our, I think we're called like the world simulation team, like the team that builds Sora and all that under Aditya is just extremely solid. Like they're probably, it's like the highest concentration of like talent that I've seen in a while. But is it the same?

Starting point is 00:42:44 Is it like, are they like totally separate infrastructure? Do they use the same infrastructure? Yeah, yeah, yeah. So it's actually like pretty separate. So and I think that's part of the reason why we're able to kind of do this. Well, it's like, one of the same infrastructure. Well, one is like the team needs to be extremely strong, which they are. And then two is they're, they're run very separately.

Starting point is 00:42:59 They're kind of like thinking about their own particular roadmap. They think about productization very separately as well, right? Which is how like the Sora app kind of came out of that as well. And then, yeah, even like the inference stacks are kind of like different. They own a lot more around their inference stack and they optimize their inference stack pretty separately. And so I think that that contributes to helping us run things on our own, but it's pretty hard to pull off for sure.

Starting point is 00:43:26 Maybe you can educate me on this. So I think about APIs as mostly text-based from OpenAI. Do you do actual, do actual pixel-based stuff? Yeah, yeah, we do. We have a bunch. So DALL-E, DALL-E 2 is in the API. The OG model. DALL-E 2's in the API.

Starting point is 00:43:42 That was like the first real text-to-image model, right? Yeah, yeah, yeah. That was actually the model that got me to go to OpenAI. No kidding. Because it was the summer when I was thinking about something new, it's when DALL-E 2 came out, and it just completely blew my mind. Wow.

Starting point is 00:43:55 And I distinctly remember, I was asking it to do the simplest thing, like draw a picture of a duck or something. And that's like the simplest thing now, and it just like it generated a picture of a, you know, like a white duck. And so that was actually the thing that kind of got me to OpenAI in the first place. But yeah, we have a bunch in our API,

Starting point is 00:44:12 the image gen model as well is in our API. And then Sora 2 is in our API. We launched it at Dev Day. It's actually been a huge hit. I've been very, very surprised. Need more GPUs for that. But the amount of use cases And then from your standpoint,

Starting point is 00:44:24 you can converge that, like the API infrastructure probably like that. Yeah, so there's, yeah, I'd say on the API side, a lot of the infrastructure is shared for those, but once you reach the inference level, they're separate, right? Because you've got to inference them differently.

Starting point is 00:44:36 And it is that team that has just like been really laser-focused on making that side particularly efficient and, yeah, and work well, separate from the text models. But yeah, yeah, we have image gen, we have video gen, and we'll continue adding more. to the API there. So it feels like we've been evolving our thinking as an industry on a bunch of stuff, right?

Starting point is 00:44:57 Like one of them for sure is like the models like we've talked about. The other one is like context engineering. It seems to me that like actually how you build agents and expose them has evolved too. So maybe you can talk a bit about that. Yeah. Yeah. I think so at Dev Day this year when we launched our Agent Builder, I got a bunch of questions around this because the Agent Builder is like, yeah. It's like a bunch of different nodes and it's like a deterministic thing.

Starting point is 00:45:18 And I was like, oh, is this really like the future of agents? And we obviously put a lot of thought into this when we were thinking about building that product. But the way I think about it is... Do you think they came from a point of being constrained? By the way, they're like, oh, this is too constraining. And like... Yeah, I think people are like, it's too constraining.

Starting point is 00:45:32 It's not like AGI forward. You know, like, at the end of the day, AGI will do everything. And so, like, why not... Why have nodes in this, like, node builder thing? Just tell it what to do. Yeah. And so I think there's, like, two things at play here. One of them is, like, there is a, like,

Starting point is 00:45:45 practicality component. And then the other thing is, I think there are actually, like, different types of work that exist out there that could be automated into agents. And so on the practicality side is, yeah, like the models today just like, maybe in some future world, instruction following would be so good

Starting point is 00:45:58 that you just like ask it to do this four-step process and it like always does the four-step process exactly. We're still not there yet. And in the meantime, you know, this entire industry being born and a lot of, you know, people still want to use these models. So what can you build for them? So there's a practicality component of it.

Starting point is 00:46:13 When did you launch that? Dev Day. So it feels like forever ago. Earlier this month, October, it was like October 6th or something. Yeah, yeah, yeah. So less than a month ago, I know. Yeah, okay.

Starting point is 00:46:27 It's been crazy seeing the reception to it, by the way. Like, it's the, I think the video where Christina on my team demos Agent Builder is like one of the most viewed videos on our YouTube channel now. I will say, I will say just anecdotally from kind of my perspective, people love it. That's great. But I also saw the dissonance, too. Like, I saw when it came out, people were like, wait, what is this? Yeah, exactly.

Starting point is 00:46:46 No code, low code. Yeah, exactly. It's another low code thing. And how people love it. Yeah, yeah. Yeah. So there's a practicality piece. There's another piece which is like when we were talking to our customers,

Starting point is 00:46:55 we've realized that there's like, because at the end of the day, a lot of this, the agent work is just trying to automate work and like what people do in their day-to-day jobs. We realized there's like actually like two different types of work. There's the work that we think about, which is like maybe what like software engineers do, which is like it's very undirected. There's like a high-level goal. And then you have like, you know, you have your Cursor and you're just like writing,

Starting point is 00:47:14 writing code. And you're kind of like exploring things and going towards an objective. that's like, I don't know, more like knowledge-based work, like data analysis, maybe like that, like, coding is kind of like this. But then there's another type of work, which is actually what we realize is like maybe even more prevalent in industry than software. We're just not aware of it, which is work tends to be very procedural, very like SOP oriented. Like customer support is a good example of this. Like customer support, there's like very clear policy that these agents and people have to follow. And it is actually not great for them to deviate from this and like try something else.

Starting point is 00:47:45 It's like the team really, the people running these teams just really want the, these SOPs to be followed. And this pattern actually generalizes a ton of different work. A standard operating procedure. Yeah, sorry. So it's like the way in which you need to operate the support team.

Starting point is 00:48:01 But like this extends to like marketing, this extends to like sales, extends to like a bunch, way more than it has any right to. And what we realize is like there's a huge need on that side to have determinism here. Of which an agent builder with nodes that kind of like helps enforce this thing

Starting point is 00:48:15 ends up being very, very helpful. But I think a lot of us, especially in Silicon Valley, don't really appreciate that there's a ton of work that actually falls into this camp. I got to say, like, there's a pattern that's similar to this. I wonder if you've seen it, that I've seen, where some regulated industries actually can't let any generated content go to a user. Yeah, right?

Starting point is 00:48:31 And so what they do is, I think it's so interesting. They'll either pass in like a conversation tree and that you can choose something from here. Yeah. So there's some human element to it. So as part of the prompt, they're like, here are the viable things you can say, choose which one to say. So the language reasoning has happened by the model, but nothing generated comes out. Interesting. Interesting. Does that make sense? Yeah, yeah, yeah, yeah. And then another one I've seen is, like, actual pseudocodes. It'll ask a human to, like, use the pseudocode to write actual code

Starting point is 00:49:01 that makes it in, or? It actually has a response catalog as part of it, and it has, like, the logic to apply. And then... Interesting. And so, like, the model takes the language in from the, it takes language in from the human user. And then, well, like, you know, the logic of how to respond is, like, in Python code, because it just turns out that, like, there's been a lot of code written for these types of things, and then it actually includes the responses that you would send out.

Starting point is 00:49:27 Does that make sense? Actually, a lot of NPCs are done this way, like, interesting video game NPCs. So, yeah. So because the way that I think about it is, like, you know. So that way, with the NPCs, it's the actual code being generated by the model is not what ends up making it to the end user.

Starting point is 00:49:42 Just to the... That's, it's not the... the code is not being generated by the model. It's the prompt has the code. So let's say that I have an NPC, and I want the NPC, like, let's say you're the gamer. And so you're coming and you're talking to my NPC, but my NPC has some logic that it needs to do.

Starting point is 00:49:58 Like, if you say a certain thing, I'll give you a key, or maybe a little barter. Like, describing the game logic in English just doesn't work, actually, if you try and do it. And then, like, actually, scripting the output doesn't work either if you needed to use it in a game context. Like, you would have to know, like, give, like,

Starting point is 00:50:12 a specific direction or specific this or that. So how do you make these things behave in a more constrained way? People pass in functions. They'll, like, describe the logic in Python. So my prompt will be like, you're an NPC in a video game. The user just asked you a question. Here's the logic you should go through. If the user says this, then do this.

Starting point is 00:50:33 It's like the pseudocode. Like if the user has this in the belt, do this, like whatever, whatever, whatever. And then here are the set of valid responses. And so you're almost constraining. I see, I see. And then when it actually does do a response, you can validate that it's one of those responses. I see, it's like highly structured.
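
The pattern described here (logic and a response catalog passed in the prompt, then a check that the reply is one of the allowed responses) might look roughly like the sketch below. This is only an illustration under assumptions, not how any particular game or OpenAI product does it: the `RESPONSES` catalog, the response IDs, the pseudocode rules, and the `call_model` callable (a stand-in for whatever model API is used) are all hypothetical.

```python
from typing import Callable

# Hypothetical response catalog: the only strings allowed to reach the player.
RESPONSES = {
    "give_key": "Take this key. You have earned it.",
    "offer_barter": "I might trade for that. What do you offer?",
    "decline": "I have nothing for you today.",
}


def build_prompt(player_message: str, inventory: list[str]) -> str:
    """Embed the game logic as pseudocode plus the list of valid response IDs."""
    # Hypothetical game rules, written as code-like text rather than English.
    logic = (
        "if 'password' in player_message: choose give_key\n"
        "elif 'gem' in inventory: choose offer_barter\n"
        "else: choose decline"
    )
    return (
        "You are an NPC in a video game.\n"
        f"Player said: {player_message!r}\n"
        f"Player inventory: {inventory}\n"
        f"Follow this logic:\n{logic}\n"
        f"Reply with exactly one of: {', '.join(RESPONSES)}"
    )


def respond(
    player_message: str,
    inventory: list[str],
    call_model: Callable[[str], str],  # assumed stand-in for a real model call
) -> str:
    """Ask the model to pick a response ID, then validate it against the catalog."""
    choice = call_model(build_prompt(player_message, inventory)).strip()
    if choice not in RESPONSES:
        # Anything outside the catalog is discarded; fall back to a safe default.
        choice = "decline"
    # Only pre-written text is returned, never text generated by the model.
    return RESPONSES[choice]
```

The model does the language understanding, but its output is used only as a key into the catalog, so nothing it generates is shown to the user directly.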

Starting point is 00:50:49 Yeah, yeah, okay. So the NPC still only exists in that, like the space that it can act in is still only within the space of the program that you give. Yeah, well, the logic is in there. So it can have a normal conversation, but like in as much as you're trying to guide the logic for like, like game design or game logic.

Starting point is 00:51:04 I see this with NPCs, but you also see this with regulated industries. I literally can't have it like. Yeah, I was going to say what you described kind of sounds like, you know, giving the SOPs to like your set of human operators. Yeah, yeah, yeah. To like, you must say these three things, and here's like the discussion. Like you cannot give a refund if it's like less than this amount. Yeah, yeah, yeah, yeah. Very

Starting point is 00:51:22 interesting. Yeah, yeah. I mean, I don't want to equate them to NPCs, but like this is similar. I'm just saying, if you want to really guarantee what happens, there's like a set of techniques that you do. And like there's some situations where you want to constrain what they do. It could be from a regulatory standpoint. It could be because you want it to run for a long time. And it also could be because I actually have game logic. And my game logic is a traditional program. Like I have like a monetary system.

Starting point is 00:51:48 I have an item system. I have a battle system. Like you can't describe that in English. Like you have to kind of give it to them so it can behave within that. Yes. And that is exactly the problem I think we were trying to solve here. That's just like if you do not give it any of this, like it can just kind of go off and do whatever.
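
The SOP idea from the support example (a refund cannot be given below a certain amount) can be sketched as a deterministic routing step that sits outside the model. This is a minimal illustration under assumptions, not Agent Builder's actual node format: the `Ticket` fields, the intent label, the branch names, and the `MIN_REFUND_AMOUNT` threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical policy threshold; the transcript only says "less than this amount".
MIN_REFUND_AMOUNT = 20.0


@dataclass
class Ticket:
    intent: str         # assumed to come from an earlier model classification step
    order_total: float  # assumed to come from the order system, not the model


def route(ticket: Ticket) -> str:
    """Deterministic routing node: the same ticket always takes the same branch."""
    if ticket.intent != "refund_request":
        return "handoff_to_human"
    if ticket.order_total < MIN_REFUND_AMOUNT:
        # The policy lives in code, so the model cannot talk its way around it.
        return "deny_refund"
    return "approve_refund"
```

A model can still classify the request and phrase the conversation, but which branch runs is decided by ordinary code that follows the procedure.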

Starting point is 00:52:01 And yet there are like regulatory concerns around this. And that is the exact use that I think we're trying to target with Agent Builder. That's awesome. Well, listen, we're running out of time. I mean, a million more things I want to ask you. But listen, I really appreciate your time to come in. It was a great kind of surveying, like, what's going on. And particularly, like, teasing apart, horizontal versus vertical in this space, which I really want to do.

Starting point is 00:52:21 So thank you so much. Yeah, thank you. Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at A16Z, and subscribe to our Substack at A16Z.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only.

Starting point is 00:52:53 Should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
