a16z Podcast - How OpenAI Built Its Coding Agent

Episode Date: September 16, 2025

OpenAI’s Codex has already shipped hundreds of thousands of pull requests in its first month. But what is it really, and how will coding agents change the future of software?

In this episode, General Partner Anjney Midha goes behind the scenes with one of Codex’s product leads, Alexander Embiricos, to unpack its origin story, why its PR success rate is so high, the safety challenges of autonomous agents, and what this all means for developers, students, and the future of coding.

Timecodes:
0:00 Intro: The Vision for AI Agents
1:25 Codex’s Origin and Naming
3:20 Early Prototypes and Agent Form Factors
6:00 Cloud Agents: Safety and Security
9:40 Prompt Injection and Attack Vectors
12:00 PR Merging: Metrics and Transparency
17:00 The Future of Code Review and Automation
20:00 User Adoption: Internal vs. External Surprises
22:00 Multi-Turn Interactions and Product Learnings
29:30 Best-of-N, Slot Machine Analogy, and Creativity
33:00 Human Taste, Iteration, and Collaboration
40:00 AI’s Impact on Software Engineering Careers
45:00 Education, CS Degrees, and AI Integration
49:00 Prototyping, Hackathons, and Speed to Magic
55:00 Legacy Code, Modernization, and Global Adoption
1:00:00 Enterprise, Security, and Air-Gapped Environments
1:05:00 Product Roadmap and Future of Codex
1:10:00 Advice for Founders and Startups
1:15:00 Education Reform and Project-Based Learning
1:20:00 Hiring, Building, and New Grad Advice

Resources:
Find Alex on X: https://x.com/embirico
Find Anjney on X: https://twitter.com/AnjneyMidha

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript
Starting point is 00:00:00 It kind of sucks to like go and write this prompt and then like wait 10 minutes. What you really want when you hire someone is to kind of tell them what the job is, give them the credentials to all the tools and just have them pick up work automatically. The goal is to get to an agent that is basically a teammate and it's seeing what's going on on your team and picking stuff up for you. This form factor of an agent working on its own computer in the cloud is the future and is incredibly powerful and worth figuring out how to get right. What happens when AI stops helping you auto-complete code and starts acting like a real teammate?
Starting point is 00:00:30 Today, we're exploring Codex, OpenAI's coding agent. And today, Anjney Midha is joined in studio by Alexander Embiricos, who leads product for Codex at OpenAI. They discuss the origin story, why reasoning models plus tools unlock agents, how developers are actually using Codex in the wild, and what all this means for the future of software engineering, from debugging and prototyping to how CS students should think about their careers. Let's get into it. Hey, Alex. Hey, how's it going?
Starting point is 00:01:03 Good. Thanks for coming. Yeah, good see you again. You are one of the folks working on product for codex, which is probably one of the most exciting launches to come out of the Open AI team, for me, at least in a while. So for a lot of people, though, it was confusing. For sure. Because it was the fifth codex release from Open AI.
Starting point is 00:01:24 Yeah. But of course, it's completely new and different from the previous codexes. So let's just start with the origin story. What is the backstory on how the current version of Codex came to be? Yeah, and man, our naming is so fun at OpenAI. I'm excited for the naming to make more sense over time with Codex as we bring this all together. But yeah, let's go way back to the beginning.
Starting point is 00:01:45 The first Codex product was actually released. I think it was in 2021. I might get the year wrong. But actually it was like a code completion model that powered GitHub co-pilot. And so recently we were basically, talking about a whole bunch of like coding like stuff we want to do you know like models but like models in product we were thinking about what to call it and we just felt like the codex name was really cool and so we wanted to go back to it so how did this codex product come about
Starting point is 00:02:13 basically we've been thinking a lot about agents as everyone has and before that we've been thinking about reasoning models and basically in our minds like one way you could think about an agent is you take a reasoning model and then you give that reasoning model access to like the tools that like some agent would want to use or some human in-given function we want to use and an environment that tool works with, take side effects it. And then from there, you come up with, like, what kind of tasks would this person do? So you basically have this model, you give it tools, and then you make sure that the model's really good at doing, like, the specific tasks that, like, some function would do.
Starting point is 00:02:45 And the task bit is actually super important because if you think of, like, there's a difference scene like writing and journalism. And similarly, there's a difference in like coding and, like, software engineering. So we've been doing a lot of this tinkering with reasoning models internally, getting them to write code. And so the first tool we'd given them was like terminals.
Starting point is 00:03:02 And we'd be like poking at this for a while and just started, it was like actually the one of the first like real like feel the AGI moments for me was when someone showed me a website editing itself
Starting point is 00:03:12 by being prompted to itself because we had this like reasoning model like very hackily trait connected to a terminal. And then you know was editing this terminal. It was just editing the DOM basically directly as a CLA.
Starting point is 00:03:23 Yeah, exactly. Okay. Well, and that wasn't the DOM directly but like whatever, you know? And it was like, But how was it barsing the visual, did you give it access to a browser? No, it was like, I liked to use this term like site reading. It was just like sight reading the code.
Starting point is 00:03:35 So it wasn't like taking screenshots of itself or any of this like stuff that now like people are building. Okay, got it. It was just like editing React. And so we had this prototype like a while ago and just people internally like really loved it. So we're starting to like write more and more code. And then we were starting to think about like, okay, well, you know, what is the right form factor for this thing? When it's editing code, it's like pretty great. on my computer, it's pretty great,
Starting point is 00:03:58 but it's quite annoying to only have it able to work on one thing at a time. It's also like a giant safety and security question if you just have this agent unleashed entirely on your computer. And so around this time, we started exploring a lot of different places to put this reasoning model that has access to a terminal.
Starting point is 00:04:15 And so we had a prototype that like ran in CI when you're like tests failed. We had a prototype that like through some crazy hack automatically fixed your like linear issues, but that was actually running in CI. We had this prototype that was like running on your computer. And so basically the Codex product we launched was like a distillation of that. Or we thought, okay, well, what is the most powerful incarnation of this?
Starting point is 00:04:37 And we figured, you know, like, if you think about like what an agentic teammate will be like in the future, you'll, like, hire them. You'll tell them what their job is. Give them some compute or a laptop and give them some permissions and then they'll go off and do work. And so we figured, okay, this is going to be kind of like a strange, like, unwieldy research preview. But let's like put all our or like the vast majority of our efforts. effort into this form factor of an agent working remotely and they kind of see what happens. And so that led to the Codex product that really is just like a cloud agent that can, you know, basically answer questions and write PRs in the background.
Starting point is 00:05:07 And what was the reason that you guys picked? You know, it's pretty opinionated in the entry point to the task, which is that you have to start by first getting your entire environment set up. And then it interacts with a repo to a merged PR. Yeah. Right. And we were chatting about this briefly, but. Somebody published a dashboard maybe a week ago, you know, kind of tracking VR merge success
Starting point is 00:05:32 rates on GitHub across different autonomous agents. And Codex is like clearly the gold standard at like this 80 plus percent, right? Why is that, why did you guys decide to have the place where the PR starts the after a bunch of sort of in private working through the code? Totally. Or so much earlier, if you could just start a draft BR, you'd have other people work on it together with you or much early on the process. Yeah, so like, I think we're talking, you know, you and I were talking about this like chart
Starting point is 00:06:02 and someone posted on Hacker News and went viral. It was basically showing like the number of open PRs, merged PRs from different coding agents as you might track from like GitHub labels. And Codex, actually I checked this morning because I figured we might talk about it. And like Codex has opened like 400K PRs since launch. In like 34 days. Yeah.
Starting point is 00:06:20 And how many days have been? Yeah, probably. And it's merged like 350 something KPS or 350K of those PRs have been merged. which is really cool and also very cool but misleading I'll say but very cool is that the merge rate for codex PRs is like 80 something percent right so like if you know assuming a PR is open with a codex label like if you look and get hub open source repose later is it merged in and it's like way higher than other agents which are like 20 or 30 percent right so yeah just to talk about this this chart is really a reflection of the form factor so I will say it makes us look really good
Starting point is 00:06:51 like it makes us look like the order of magnitude like winner and we are of like a specific kind of agent, which is this cloud agent that's working on its own computer, independently from you, and therefore, can do many tasks in parallel and so forth. So, like, we believe that's where the future is going. I'm sure we'll talk about that. And it looks like, you know, right now we're like absolutely winning there. But, you know, just to mention probably the most AI, the most used AI coding feature right now is just like autocomplete, right?
Starting point is 00:07:17 And tap completion. Right. Obviously, that's not getting like a label when someone merges a PR now. So I think it's worth mentioning. Like, there's a whole bunch of other great AI. That's like essentially invisible. work happening in an IDE. Exactly.
Starting point is 00:07:28 That's just a different form factor. Yes, that's a different thing, right? So that's not included in that chart. And then the other interesting thing, so you were mentioning out the merge rate, our merge rate is excellent. Right. And that's a reflection of the fact that Codex does a bunch of work in its environment. And then it shows you its work.
Starting point is 00:07:40 And it says, do you want me to open a PR, basically? Right. There's a lot of other tools, they just go ahead and open a PR. Right. Yeah. So why did we do it that way? Because it's funny, like, one of our top feature requests has been like, hey, can you just push the PR so I can, like, do everything and GitHub thereafter.
Starting point is 00:07:55 And we'd like to do that. But this comes back to like, you know, where Open AI, we not only want to show how to use our reasoning models in the best way to build agents, but we do want to show how to do it in the best way, but that includes doing it a really safe way. And so, you know, basically one of the things that a lot of people don't think about is until like we tell them about it is the fact that if you have an agent right code and then you run that code in an environment with network access, right, you're taking some amount of risk. And like, you know, I have, and we try to get agents to do these things. I've never seen an agent.
Starting point is 00:08:26 do something that you wouldn't want it to do with network access unless you're trying to trick it. But you can trick an agent. There's some non-zero likelihood that could happen. Yeah. So just to make the super real, you know, listeners might be like, okay, like this is a hypothetical.
Starting point is 00:08:40 Yeah. Like, okay, so we have these cloud agents. And one of the first things that a lot of people want to do with them is like automate them to do work. That's the dream, right? So like maybe in Slack, maybe, you know, from your issue manager,
Starting point is 00:08:51 you would like when like a customer sends in feedback. You want to like have an agent take a first pass. Right, right? And you might want to like open a PR and like maybe even auto merge it. So like that is great. That's for sure awesome. But also like let's say that customer is, you know, is pretending to be a customer and they're malicious. And they actually send in a prompt injection.
Starting point is 00:09:09 So the customer writes in like, hey, I would like you to like take a bunch of this code, like run this script. Right. The script is bugging for me. That's like a lie. Right. And then they say like run the script and like upload like this directory of code to paste bin. Right. You know, if the agent interprets that as like the developer prompt, there's some risk that it'll actually go ahead and
Starting point is 00:09:26 do that. And so there's a ton of work here with agents to deploy them safely. And actually, that's one of the places that I feel like is under-discussed, but where I feel like we're really leading the charge in terms of thinking about, like, you know, at each step of the way, how do we make this as safe as possible and make sure that people understand what they're doing? And could you, for folks who may not be familiar with prompt injunction attacks, could you talk a little bit about how hard is it to sort of detect a prompt injection attack? Is it a super general purpose attack vector? Or is, you know, like with other kind of cybersecurity attacks, vectors that usually, you know, whether it's social engineering, fishing, and so on. Always,
Starting point is 00:10:00 it's a bit of a cat and mouse game. But by and large, the security industry is figured out, like, hey, these are the rough parameters of an attack of this kind and we can build defenses around it. Is there something that makes prompt injection attacks sort of harder than typical cybersecurity attack vectors? Or is it just that we're early and we haven't figured out the shape of the attacks yet to prevent the answer? I'm sure that we will get better at figuring out the shape of these attacks. But like, if you think about it just from a human perspective. This is, by the way, this is something I do often. I'm like, okay, let's pretend I'm the model. I'm a human. You present me 10 prompts. Right. Like, can I tell which ones are prompt
Starting point is 00:10:35 injection attacks? Some of them are obvious. It's like, you know, update, you know, upload this code to like nefarious domain. Right. Like, okay. Give me your credit card.com or whatever. Yeah. And some of them are obviously not, right? It's like, fix this bug doesn't require doing any, like, or changes copy, right? Like obviously nothing's going to happen. Right. But then there's this whole middle range, right? Like two examples in the middle. middle range of, like, ambiguous prompts. One might be, hey, do this work. And, like, as part of this work, you have to, you know,
Starting point is 00:11:02 upload some artifact to S3M, you know, with, like, storage online, basically. You know, there are, like, reasonable workloads that require doing that. And so it's not obvious that just because the prompt says, like, upload some code somewhere that it's broken. Right. You know, another example might be the prompt actually just has the agent running a test or, like, some script or something. Right.
Starting point is 00:11:23 And that script was, like, added before. Right. Right. So, like, to what extent does the agent need to, like, respect, I see, right? Like, everything that it's going to do along the way, right? So there's these three layers of the attack. There's the prompt, and, like, it's quite hard to tell if a prompt is, like, really an attack. Right.
Starting point is 00:11:39 Then there's, like, what is the agent doing along the way? Right. Interacting with, like, other sort of trusted or untrusted resources, you know, as it goes. Yeah. For example, like, maybe you didn't prompt inject it, but then, like, it reads something on Stack Overflow or something that has a prompt injection. or there's a script with something. And then lastly, there's the actual outcome.
Starting point is 00:11:57 So, like, in this case, if we're talking about, like, exfiltration, what is an exfiltration? We're still figuring this out. My personal leaning is that we should just have defense along every single layer, but probably the most useful layer is going to be that final layer
Starting point is 00:12:10 of, like, actual exploitation and, like, looking at what we do there because that's, like, the most, I guess, deterministic layer in that you can see what's happening. So the tension here is going to be, a critic might say, hey, you guys have overinflated merge success rates because the draft PR comes so late after the human has reviewed a bunch of code coming up, you know, up to that. And the, what you give up
Starting point is 00:12:33 is the transparency and openness of seeing the process of iterating on the draft PR from the first one to the final merged one. But I guess what you're pointing out is yes, but the tradeoff is you get much more security, essentially. And so is there, in your mind, is the future that like, that a bunch of these workloads or a lot of the code, that's written by AI agents will, over time, let's say, you know, you said there's 350,000 or so now merged PRs in 35 days. If we're rolling forward to the end of this year, do you think that rate of growth continues? Does it plateau? Because more and more people actually move, want to move the draft PR process earlier in the merge flow? Or do you actually
Starting point is 00:13:12 think, having used it now, having seen how customers have been using it for like the first 35 days, that roughly this is the shape of the workflow that people are going to want to just do merges right at the end after they've gone through all the security checks and so on internally. Yeah. I mean, so first off, yeah, I think what I would say about the stat is it's like really cool, just not comparable to the other ones. Right, right. But, you know, it's still a valid stat.
Starting point is 00:13:33 It's just a different phase of the pipeline. But thinking about like, yeah, what is the shape of the journey? Like, I think the shape of how people will merge code even with these cloud agents is going to completely change. Okay. So like, let's talk about where we're at right now. Basically, we have, you could kind of think of it as like there's a spectrum. Maybe there's like three things, right?
Starting point is 00:13:50 There's like interactive coding, which is like tab completion, like chat, that kind of stuff. You know, Command K, a lot of that's being done in the IDE. There's some like CLI tools where you can go back and forth in the agent. So that's interactive coding. It's awesome. That's probably where like most people are adopting AI right now. And it's because like if you think about it, like tab completion with an AI model is the same as tap completion before an AI model.
Starting point is 00:14:11 So you can get like fully brought along the journey. I guess what I'm saying is it's not going away. I don't think. Okay. Yeah. Because I think even as the majority of code of like, say, code, of the current level of abstraction. Okay, let me unpack that of it.
Starting point is 00:14:24 So if you think about it, we used to, like, write punch cards, basically, or like punch cards, I guess. And then we had, like, assembly, and then we had C, and now we have a Python and, like, JavaScript and so forth, right? So we just keep rising up the level of abstraction. And one way of looking at what's happening now is that that is we're still, we're just going to go up one more level. So, like, my view is that we'll still have developers spending a bunch of time in
Starting point is 00:14:43 the IDE, just, like, operating at higher levels of abstraction. And so when a developer is, like, doing work, like, writing whatever it is that they're writing, or communicating. in whatever way, they'll still be like AI features just helping accelerate like every keystroke that developer is doing. Those will still be awesome. So that's interactive coding. Then we have sort of agents, I guess.
Starting point is 00:15:01 And then the fun part, maybe later, naming TBD, maybe we'll have interactive agents. So, okay, we'll get into that. It's like not a fully baked idea. But basically, then we can talk about agents. How will we work with agents? My view is that over time, the majority of code written will be written by agents.
Starting point is 00:15:17 And actually the majority of that code will not be manually prompted by a human. Some automated pipeline based. Yeah, because it kind of sucks to like go and like write this prompt and then like wait 10 minutes and like during those 10 minutes or if they say pushups or whatever. Yeah, like our average, you know, duration of a rollout, you know, is around like three minutes or a little under it. For larger code basis, like ours, it's like longer, like maybe eight or something.
Starting point is 00:15:38 But it kind of sucks to have to like multitask across these things. Right. And the power users of codex have like built this like amazing workflow that they use where they're like juggling tasks. We can talk about how people are using it. But this isn't great, in my opinion. Like, what you really want when you hire someone, like a teammate, is to kind of tell them what the job is, give them the credentials, solve the tools and just have them, like, pick up work automatically and it kind of let you know when it's done. So you're not feeling that latency on your own time. Right. So, you know, if we go to back to this original point of like, when will people merge PRs?
Starting point is 00:16:06 Like, I think what I would love for to see is like where agents are picking up work and they're kind of like deciding whether or not it's worth pushing a PR maybe to trigger CI. But by the time you find out about it, they're like, hey, I did this thing. Maybe I asked you for some input along the way. CI checks are green. Right. Like, should we merge it? So we have to build our way.
Starting point is 00:16:25 So it's the classic green light. And then over time, ideally like most of the, you know, lower order bit tests, are just getting merged automatically. And then when there's some, like, judgment call, they come to you. The way kind of like a, you know, more junior engineer would come to you as an engine manager and say, it's looking good,
Starting point is 00:16:40 but I want your, here's some risk. Are you comfortable with that risk? And then you get the thumbs up, thumbs down. Is that roughly where you think we're going? Yeah, I think so. Like, actually, like, you know,
Starting point is 00:16:48 we've been talking basically about code gen this entire conversation so far. Right. And, okay, so code gen is getting much easier. Is code review getting much easier? Because code review is still a key thing and like validation. Right. And I think right now we're in this, like, slightly awkward phase, or we're entering an awkward phase where we have a lot of code gen,
Starting point is 00:17:05 and a lot of that code is actually not going to be merged. For the other tools, you see it in their PR verge rate. For our tool, you would actually see it in the internal stat of what percentage of the time is a PR created from a rollout. And so there's, like, vastly more code to review and land. And yeah, so it's awkward right now, but this is something we're definitely thinking about. And I'm, like, quite hopeful for the future in that I think we can make it even better for the humans involved, because, like, no one likes reviewing code, right?
Starting point is 00:17:29 Yeah. Yeah, kind of thing. Actually, let's take a bit of a detour to talk about how it's been 35 days. What are people doing with it? What have you observed as like usage patterns now that it's out in the wild and what surprised you most? And then I want to talk about now, are the usage patterns more fun or not for people? Because there was a moment, I think, in the first live stream you guys did it on the product where one of your colleagues said, you know, my job has changed where I'm going from writing a lot of code to mostly reviewing PRs now. And I heard then I went, oh my God, that was the worst part of when I was an engineer.
Starting point is 00:17:58 That was the part I hated the most. And there's always this like, I've been, I was at an offside for a startup about a month and a half ago where literally we ended up spending 45 minutes talking about how to incentivize people on the team to review PRs more. They're just sitting in the tray because nobody loves checking somebody else's code. It's just not a very creative task. But let's start with first, how are people using it and how are they using it? What surprised you most about, especially as a product person, about how they're using it
Starting point is 00:18:22 versus how you expected them to use it? Yeah, for sure. So we, it was really interesting building to, launch where we ran, use it internally and figured out how to use it. And then what we found is that when we gave it to people externally, they didn't, first, they didn't know how to use it the way we did. And they didn't find it useful. And then we obviously refined our messaging in the product. And then when we actually launched it, people still used it differently from us, but they do find it useful. So we can go through that journey, right? So like, internally,
Starting point is 00:18:50 I think because we've spent a lot of time, like, working with reasoning models and, like, training them. We have this way of prompting reasoning models that is like intuitive to most open AI employees. Right. Like you write a pretty good prompt. You give it a lot of information. It's kind of like a self-contained unit. It's almost like a sleep bench task, but obviously maybe not as well formed as that. Give it all the right context. Give it everything. Yeah. And then it goes and works and like you generally maybe don't go multi-turn like where you like it gives you something and you reply. Like maybe you're more likely to just reprompt. Right. Adjust your prompt and re-go. Just to do a best event essentially. Yeah. And actually there's a there's an analogy I love
Starting point is 00:19:22 floating around by another company that builds agents and it was like treat it like a slot machine and I was like, oh, that's so apt because like that's pretty much our intuition too. Right. So if you're training something like slot machine, then the question is like when do you use it? And when we first ran like a small external alpha, like people were using it like the agent, local agent they have in their IDE, which is actually not the right way to use it. If something's going to work in your IDE, you're kind of lending it your computer for a while. So you probably want to be really thoughtful about like, do I think this task is going to succeed?
Starting point is 00:19:50 and, like, if I'm 80% sure it'll succeed, then I could, like, get it to go. But maybe I also have some expectation of interactivity so we can kind of refine along the way. The way to use, like, an agent in the cloud is just throw everything at it. It doesn't matter if it's going to see. Just, like, spam as many as possible.
Starting point is 00:20:05 Yeah, it's like abundance mindset, you know, slot machine. It's somebody else's computer, right? Yeah, okay. Throw stuff at it. And also, you know, you don't need to have the code on your computer to get and, like, decide to merge that code to get value. You could just be asking questions.
Starting point is 00:20:19 You could be like, explore this like four different ways so I can like pick the right way that I then want to do it right you know you can almost treat it as like your to do list of things that you will get to later in the day so that was some of the learnings we had when we ran the alpha where hey we need to kind of change the product so that it feels more like parallelization is like a key part of how to use it and so to more like make it so you like let go of what it's doing okay so then we shipped broadly externally and we got a bunch of feedback that we expected, like, hey, the containers don't have network access. This is really annoying,
Starting point is 00:20:53 which it is. Or, hey, environment variables are hard to set up. Environment variables are hard to set up, which they are. Right? And like, we didn't, like, obviously we have many ideas. We had ideas for how to, like, enable network access. We just wanted to do that carefully. And so, you know, and then we, on the environment set up stuff, like, we have ideas that we haven't shaped yet on how to make that better. Import.com. Yeah, simple model loop to, like, help write it and so forth. But we just cut scope and, like, ship the really early research preview. So there's a much of that expected feedback. Now, one of the things that really surprised me is that there was one feature that we
Starting point is 00:21:25 didn't expect people to use. And in fact, we used it so little internally that it just had a bunch of bugs we hadn't caught before releasing. And that was multi-turn. So basically, like I was saying, like we, and we told our alpha users, I guess, to do this, basically said, hey, just reprompt, like, fire many prompts. And, like, maybe you can go back and forth. It turns out that if you go back and forth more than once, so you do, like, you,
Starting point is 00:21:47 three turns total right right the product was completely broken and that we were not like correctly like carrying over the diffs from the prior steps and like just a lack of context persistent context essentially after the third term exactly and this is just like a plain old deterministic bug it's not like a weird model behavior thing it's just like we implemented the code wrong because no one ever nobody just got to turn four basically exactly yeah yeah and so for me that was really interesting to see that like people had this intuition for how they wanted to use the product and that wasn't like the reprompt intuition, it was the, hey, like, I'm going to get like this main thing. And then I kind of want to, you know, get babysit that across the way to like actually landing
Starting point is 00:22:24 it without it over touching my computer. And that, like, we kind of knew that might be a thing, but it was much more of a thing than we expected. And do you think that's basically because internally, opening eye employees are sophisticated enough to know that you, you do all this upfront context building work for the agent to try to get as much as you can in the first turn. But in a user, once you made it fully cloud connected. So the cost of, the marginal cost of doing, you know, kicking off an agent was so low that they just quickly. got to the third, fourth turn without too much thinking. It's funny, you know, I almost feel like in a way we're like less sophisticated because
Starting point is 00:22:52 we understand too much about like the models or something. Like your expectations are lower than the average. Yeah. Because we're like, oh, you know, this is a reasoning model like works great. Like especially when you like prompt it in this way. Right. And then like, you know, folks outside open AI are just like, this is how I want to use it.
Starting point is 00:23:09 This thing is like basically like, you know, obviously it's not AGI. But it's like, oh, is it like, it's this like super smart model. I can't it just like all I want. You wrote this amazing PR. I just want you to change one thing. Why can't you do it? Right. And so, you know, obviously the bug that I mentioned, we fixed.
Starting point is 00:23:23 But that's something now we're thinking more about. Like, okay, how do we enable that kind of multi-turn interaction? How do we make it faster as well? Like container startup, just, for example, takes time. Yep. And, you know, there's a lot of optimization we can do. But for now, if you need to incur a full container startup to, like, change one variable name, that's super frustrating.
Starting point is 00:23:39 So there's a bunch of things like that that we want to improve around, that iteration loop. do you think that is the arc of product development of agents such that do you think the shape of the industry will be more and more Apple-esque where you'd go well cold starts our problem for containers because that's a really terrible user experience so instead of like outsourcing containers to some third-party vendor who then we're reliant on for providing us cold start we're just going to bring this all in-house is this is the most magical experience going to be a full stack end-to-end integrated experience where all the dependencies all the middleware is all down in-house, or do you think that this is going to be more Android desk where, you know, you guys, a company like OpenEA has an opinionated experience owns the agent sort of interface, but everything else is mostly like a collection of different duels orchestrated by different vendors. It's a great question. I think it's going to be a bit of both, maybe an annoying answer, but or where do you think the line, where would you build versus buy, right? Yeah, no, totally. So I think it's actually more like for whom or who will use what? Like, I,
Starting point is 00:24:43 think that the average user or maybe like the new startup that is building with agents from scratch will just do things in a very different way and they'll basically have a bunch of agents with this a computer environment that scales really well that has like all the credentials they need but is also like protected with the right forms of sandbox sandboxing applied at the right times you know with the right like monitors on all like network egress and all this stuff and right you know maybe this kind of like computer i think of it as a laptop although obviously it's not is actually the thing that, like, many agents use, right? And it contains many tools, not just the terminal,
Starting point is 00:25:16 but it has a browser and it has whatever, you know, API access. And it's like, it gets piped the right credentials at the right time. And so, like, you kind of think of yourself when you're hiring, like, your new agent for your new startup, which you might do before you bring on a co-founder even, you know, you think of yourself as just, like, setting up that environment. And it's, and you're just getting, like, this, like, fairly generalist employee that can code. Right. Like, if you think of Codex right now, it's like, it basically takes prompts and turn
Starting point is 00:25:41 them into messages and diffs. And that's, like, not general. I can't be like, oh, yeah, hey, like, can you move engineering sync to 30 minutes later because I have a conflict? But, like, a real software engineer can do that, right? A real software engineer can go peruse, like, any source of data, can, like, find out that they don't have potential. I mean, they can just use the internet.
Starting point is 00:25:59 Right, right. So I think we will get towards that. And I think we'll be able to build, like, a really nice managed system for that that lets use more capabilities safely. And with some, like, product pushes from us on, like, how to make the most of it. So, for example, recently, we shipped Best Event. And, like, you know, it's a very simple feature.
Starting point is 00:26:17 But in our minds, it's, like, kind of just the beginning of, like, taking advantage of the fact that we're not running into laptop. So we can explore, like, four versions of the same idea. And then you have, is, there's some evaluator model looking at the best event. Actually, the evaluator is the human right now. But, like, you know, the roadmap is, like, fairly obvious. If you just imagine, like, what we're thinking about. Yeah, you just throw, like, 03 Pro.
Starting point is 00:26:35 So, right. So, so, yeah, so there's that. However, also, you know, the majority, maybe of valuable code is actually written by enterprises who rightly so are like really locked down all their IP and their code. Right. And so something we've been thinking about as well is like how do we meet these
Starting point is 00:26:51 enterprises in a way that we can like provide value to them as well in a way that they like. And so I think what we're going to get towards is like there's this default way of working with things. And then we'll basically have like some flavor of like on-prem or bring your own compute that we support
Starting point is 00:27:07 where it's like, hey, you know, here are all the things we manage for you when you use our compute. If you're going to use your compute, then, like, we can work with you and, like, provide you as much of a harness as possible to automate things. But, like, you're going to have to want to manage that compute and, like, for the agent, basically, that environment for the agent. Here are the tools it should have. Here's how you should sandbox it. Or bring your own our back or whatever. Yeah, exactly. I see. And so, like, the Codex CLI, which we haven't talked much about. But in my mind, like, the Codex CLI might evolve into that, where it's like,
Starting point is 00:27:37 hey, if you want to, like, run the agent loop in your own environment, then we can help you you do that and you can use something that's an evolution of the CLI. I think you should, let's talk about CLI versus the interface. What are the two differences between Codex and Codex CLI? Yeah. So the place where I want this to get to is just like there's GitHub, right? And GitHub has a website and a CLI and a mobile app and like it's not confusing. Right now it's a little bit confusing in that they are just completely distinct experiences.
Starting point is 00:28:01 We have Codex in chatbt, which is an interface that you can write a prompt and then we run codex in the cloud and then you get back a different answer or an answer to your question. Then we have the Codex CLI, and that's a completely distinct experience with a lot of the same ideas in it, which is basically you can run this tool in your terminal, and we'll hit our model via API, and basically this agent will work locally with you in your computer. So right now I kind of think of it as you delegate to Codex and chat shpT remotely, and then you pair with Codex CLI on your computer. And what is the moment where the CLI journey integrates into the cloud workflow? Yeah, and so where I think we want this to go is there's just
Starting point is 00:28:40 like one idea of codex, and it's just like, where do you want it working? Right. And, you know, there's going to be times where it's just like simply easier. Like, you don't have to set up an environment when it runs locally, right? So maybe if you're trying something for the first time. Yeah. Or like, you don't even know if you like codex yet. You know, you're just a new user. Like, maybe you just want to use the CLI or something. Right. And then maybe then you're using it and you realize, hey, like, I want all this like cool paralyization and all this stuff. Let me have this run in the cloud. And you set up the cloud environment. And from then on, like, you should still be able to like interface with that in the CLI, if you want, except now it's running a cloud environment,
Starting point is 00:29:14 so it's more powerful. Yeah. So I think we kind of want to construct that and bring these things together, but obviously we're in this temporary state of they're completely distinct. Yeah, I think, so it's interesting hearing you talk about how there was this evolution from like the moment where you were using the tool as this like very precious first iteration tool where you put a ton of sort of weight and context into it hoping to get back a really useful answer the first time around.
Starting point is 00:29:40 And then there was an aha moment where you're like, actually, this is more like a slot machine because other modalities in AI have played out very similarly. So this was the case with image models, for example, right? Two years ago, people were trying really hard to get the first version of image models, which were like GANS, you know, general adversarial networks, even pretty like stable diffusion to produce useful sort of coherent images. And they just weren't there, right? They would produce these like artistic renders, which were great for like artistic exploration,
Starting point is 00:30:06 but they weren't sort of useful because they didn't have, the concrete coherence of a graphic design, a piece of graphic design, for example. And then if you remember the first, like era of diffusion models like Dolly and Mid Journey 1, they started to get more coherent, but there was this trick that a lot of product people started using. And David from Mid Journey was one of the first to do this, where he added four generations in the Discord bot, not one. Because the idea was, the insight was like, this is a slot machine. This is a stochastic process. And you never really know which one the user, is going to like best, especially for a super subjective domain like art and like images.
Starting point is 00:30:43 And so human preferences is super subjective. So let's just give them all four and we'll figure out which one they like. Now, over time, if you collect enough human preference, you can kind of nudge the distribution to be more aesthetically pleasing or you can nudge it to be more like better typography or whatever. You can nudge these distributions. But by and large, to this day, the best UIs for image models are still ones that give you like four outputs, if not more, and then allow the user to select the best of n.
Starting point is 00:31:07 And for a long time, people were like, that's going to work for these super creative domains where, like, verifiability or accuracy is not an issue, like images, like video, like music, audio. But what's surprising is you're actually describing that same for pre-verifiable domain like coding. Because at the end of the day, it sounds like there's still enough stochasticity in the sampling of a model, even as it gets better at reasoning, that makes sense to try, use it like a best of end machine. And, you know, this has led to the, I guess, a popular set of critiques against reasoning models that, like, they're not, you know, REL from verifiable rewards doesn't actually introduce new capabilities. It's just really good at pulling out capabilities that are already in the model. It's really good at sampling. Do you think that this is just an interim awkward phase where, like, yes, the best of NSP is better at getting sort of the right answer from the existing model. It's not adding new capabilities yet.
Starting point is 00:32:05 But where we are going a year from now, there will be actually new capabilities that come from running verifiable, you know, are all on all the codex usage that is about to happen from users. Where do you? How bitter, lesson build basically are you roughly on that dimension? Yeah. I mean, basically, I think an unsolved problem. And it's a, it's both a research and a product problem is like how do we steer agents? Right. What that are working independently. And, you know, you're talking, you mentioned like, hey, like, is best of end there to, to, you know, so the model has more shots on goal. basically to you know to sample correctly and I think you know that might be part of it but actually one of the things we've learned working in codecs is that well the human also doesn't know what they want right right and you know so if I ask you to fix a bug like there might actually be four reasonable ways to fix that bug with sort of different architecture implications and I might I haven't
Starting point is 00:32:55 explored the solution space myself that's why I'm delegating this so I I kind of want to know what the ways are and then I want to you know maybe I would pick the one that the model thinks is best too, but it's like helpful for me to see, like, maybe that sucks in some way. Yeah. But it's helpful for me to see the other ways that have like larger tradeoffs to then be confident in the right one. Yeah. So, so that's for like fixing a bug, which is like a very verifiable type thing. If I ask you a model to like, you know, the classic example, implement tic-tac-toe or something. Right. You know, I might not know what I want either. Like maybe there's different styles and different like approaches you could take at various steps along the way. Right. Right.
Starting point is 00:33:29 And so, you know, it's kind of funny you were talking about, you know, generating four images and seeing those in the grid. And like, in my mind, like, for a front and change, you could totally imagine a UI where it's like the model does some work. And then we like run the stuff. We take, you know, the model in its environment runs the app and then like takes four screenshots. And you actually just like have this like similar curatorial UI. Right. It's like just pick the one you like most. We had Rick Rubin on the podcast a few weeks ago.
Starting point is 00:33:54 And Rick's a legendary music producer. And he recently used Claudeco to create a new. vibe coding book. And so we're talking to him about how he, what's his, what was his observation about how creating with AI, how is it creating with AI, you know, code gen tools different from creating music. And he was like, oh, no, it's the same. It's like going into a studio and he was talking about this story about, you know, going into the studio with Johnny Cash and watching Johnny just pick up a guitar and start jamming. And often the process of creating a great song is you just pick up a tool like a guitar and then you just do four different iterations in completely different
Starting point is 00:34:33 directions and then you usually have a creative partner like a producer or somebody going not that one sucked go this way and it's that constant sort of best of end process in create like in the process of creating music that often results in the best you know output and often the quality of the end song is a determinant of the taste decisions you make along the tree of it of best of end and so what's giving me hope about hearing you talk about it is if you read the hacker news thread for example when you guys launched codex somewhere down i forget about halfway down the page was like a tree of discussions about how does this mean coding is going to get much less fun because all of the interesting parts are being delegated to the agent and all the humans having to do now is just sit
Starting point is 00:35:16 and review but actually what you're saying is there are parts of the workflow where you get to almost entirely off the plumbing parts of software engineering and focus on the taste exploration, which is sometimes the most fun part of software engineering is, right? You're creating a front-end U.S. Or even when you're specking out like a really great schema for a database, you know, some of the most fun times I've had is when I'm sitting with an infra engineer and we're speccing out the schema and like you go down one spec with, you know, a bunch of pseudocode neuralize. Actually, that's not the right one, but it gave you an insight that then allows you to try another
Starting point is 00:35:49 schema out. Is that where you think we go? Is that the silver lining or are we actually destined for world where we're just all reviewing BRs and all the creative parts of software are gone. Totally, yeah. So this is just opinion here, but I think you're right in that coding might be a little more painful for some number of months because you have to do things like environment set up.
Starting point is 00:36:10 Right. These are the teenagers. Yeah, these are the teenagers. I think, like, to be real, like, that's true. Maybe you don't get to write as much of like the code yourself right now. But I think we will get to that more exciting place pretty quickly because, you know, it turns out environment set up is probably something that an agent can also massively help with. Right. And we can like close that loop where, you know, you're not comparing like four
Starting point is 00:36:31 diffs or something like that, but we've like figured out the interaction model with the agent. So you're kind of like making decisions in a way that feels like more like talking to another human. Right. Who's just like really smart and fast. And then also that you're making these decisions not based on like reading like raw code in the case of front end at least, but like maybe you're like making decisions based on the outcomes. You know, like in the case of front end, like you're just choosing screenshots or like clicking around a preview or like if it's back end maybe there's like some tests you agreed on and you're just like looking at test outputs to sort of decide right the other thing that's interesting is that well if you were to guess let's say I'll give you a few
Starting point is 00:37:06 things that people use codex for and I'm curious what your guess would be the most like the biggest ones are like let's say it's like building you features asking questions planning debugging and fixing bugs like what do you think people would use codex for more I think they're would like to use it for debugging. They probably aren't using it yet for that because there's often my knee jerk when I'm using an agent is that it just doesn't have enough context to fix for routine tasks like some piece of boilerplate React is broken like debugging is totally fine. But I find I use it more and more for well-defined, well-scoped, well-contained tasks like create this new UI element that does blah or a refactor that's like where the atomic unit is very well
Starting point is 00:37:52 constrained. But I'm curious, what are you actually seeing? Yeah, I mean, so my intuition was that people would use Codex for fixing bugs. Okay. A lot. Because, you know, bugs are somewhat well-defined-ish. You know, you can kind of tell if it's fixed. You might even have, like, some logging data or telemetry data that you could just paste into the model. Right. Excellent in fixing it. Right. Right. Those are some of our earliest delight moments. We're, like, dumping in the stack trace and then just- And it just figures that, right. But actually, by far, the thing that people use codex for is building new features. And I don't know, that was just like slightly surprising to me because, you know, that is some of the most fun stuff to do. Right. If you read like, you know,
Starting point is 00:38:29 blog posts by folks who are using codex in that way, and it does look like they're having quite a lot of fun because of just the sheer speed they're experiencing. Right. The speed to prototyping has basically collapsed completely with something like codex. Yeah. And broadly speaking, this is the vibe, the explosion of vibe coding, right? I think it's, that makes sense to me because some, when you're prototyping a new idea, I find the most rewarding is when you actually If you can get to the first draft really fast and then kind of iterate from there, that's fun. Sometimes the worst is when you have an idea, you kind of want to see it and then you lose steam between firing up your IDE and seeing the first version of it, right, compiling.
Starting point is 00:39:06 This is why hackathons have proven to be this like, I think, magical sort of, you know, type of event where you get people together and commit to getting over the hump of the first prototype. But in many ways, I think something like Codex or, you know, broadly speaking, really good coding agents have turned every day into a hackathon because they've collapsed the energy you need to get over the hump of all the plumbing, all the environment set up to test an idea. When I was at Discord, we used to have this ritual across the company that was an annual tradition called Hackweek. And some of the, where the entire company would just stop for like a week. And it wasn't just engineering. It was product, marketing, sales, operas.
Starting point is 00:39:47 the entire company could hack on anything they wanted. And some of the most enduring and popular features that made it into production, the company over the years, came from hackathon projects. And it begs the question of, well, if there's a whole team called the product and engineering team whose job it is to ship great features, why did it take this special thing called a hack week
Starting point is 00:40:07 to produce such great features? And there is something about when you reduce the cost of prototyping new ideas, you end up getting things that don't make it through the usual PR, D flow. And it sounds like that's what a lot of users are using Codex for now, is like that first to reduce the time to magic, essentially, the time to first prototype. Let's change stack for it. Because there's this elephant in the room, right, which is that if, you know, Mark famously wrote an op-ed in 2011 or 2012, which is like, you know, software is eating the world. And after I saw that chart, you mentioned of the GitHub merge success rates of AI agents starting 35 days ago,
Starting point is 00:40:42 hitting 80%. And as of this morning, the volume being 350,000, it sounds like, AI is eating software engineering. Does it even make sense to study software engineering anymore to get a CS degree? If you're a freshman at Stanford today, or just a freshman, you know, somebody graduating high school and you're broadly interested in software, does it even make sense to major in CS? So my take is that it's two things. First of all, I think still a great time to major in CS. I think there's like going to be so much more software created and therefore so much more
Starting point is 00:41:13 software engineers needed. But I also think figure out how to be using AI. constantly while you do it. And hopefully you're at a university that's like very forward-leaning and so they're kind of embracing it. You know, I hear about policies like, hey, use AI as much as you want,
Starting point is 00:41:26 but you just have to say how you used AI as part of your assignment. Right. That's great. If you're out a place where, like the main place where I would be worried if I was a student right now is if I was studying CS
Starting point is 00:41:35 and my college didn't allow the use of any AI because then I would just feel like I'm like falling behind. Like, it'd be like if you went to college but you were only allowed to write assembly and you could not write C you know, back in the day. Right. that would just be deeply worrying, I think.
Starting point is 00:41:49 Right. But yeah, my take is we can do, like, you were talking about this, right? Like, we can do so many more things now. And, you know, we hear this from customers, too, like, and from users. They're just like, hey, like, I would never have bothered doing this before, but I threw the idea into Codex just for the sake of it. Right. And I do this all the time.
Starting point is 00:42:06 And, you know, a lot of the time I do that and then I see the output. And I'm like, I just still don't really care to do this. But then sometimes this thing that they would not have even bothered doing, Codex either straight shots it or gets it to like 90%. And they're like, you know what, I'm excited enough to do the last 10% here, just get this merged. And then this thing that would never have happened, now it happens. Right.
Starting point is 00:42:24 Right. You know, some of my favorite examples, like internally are like when people build, like, new internal tools that accelerate the rest of their team. And like, it's the kind of thing, like, someone's complaining in Slack. Like, I wish we had this tool to like, I don't know, look at these logs in a better way. And they're like, no, you know, it just can't be bothered. Everyone's too busy. And then you, now you have this, like, great parser.
Starting point is 00:42:39 Right. So I think that there are so many places where we could use software and that software could be more personalized to small groups or even individuals. Right. that we just are missing out on. And so, yeah, now I believe that, like, with just the acceleration we're seeing in software development, I think we'll have many more of those tools existing,
Starting point is 00:42:58 and they'll be much cheaper to maintain as well. Like, that's the thing we're on the tip of now as well, where you're starting to see AI agents getting plugged into, you know, like GitHub or like Slack or, you know, linear has the agent's feature. And I think that that will make it much more efficient to actually have some, like, app out there and running. Right.
Starting point is 00:43:15 Similarly, you know, even we're seeing those, like, this is not Codex, but we're seeing products out there that will, like, write the app for you and then deploy it for you as well. And so it's just like all in one. Full stack, basically. So it's just like, anyways, long story short, it's much easier, I think, to build software, to deploy that software and to maintain it. I think that's just going to, we're just at the beginning of this change.
Starting point is 00:43:33 So let's talk about that. It's been 35 days now. As a product lead, you've had a chance to actually see, you know, the best laid plans rarely survive contact with reality. So now, what priors have you updated the most and what comes next? Where does Kodex go in the V2? Because this was just a research preview. But what are the biggest improvements and what's the shape of the arc of the product in the future?
Starting point is 00:43:56 Yeah. So I think there's one sort of conviction that has deepened and then one prior that's like being slightly updated. So the conviction that deepened is that this form factor of an agent working on its own computer in the cloud is the future and is incredibly powerful and worth figuring out how to get right. So we're continuing to invest in making that environment set up faster or making like, performance just first time user onboarding. Yeah, first time user onboarding, but also just like,
Starting point is 00:44:20 you know, once you're running, like things should just be faster. Sure. Speed is actually always the underrated feature. And is that, are the biggest gains in speed you think are going to come from
Starting point is 00:44:29 doing things like model distillation or do you think that comes from just better orchestration of tools? Honestly, I think the low-hanging fruit is just like plain old deterministic, like DevOps-y type stuff. Okay.
Starting point is 00:44:40 You know, like right now, we clone your repo every time you do a task, even if it's a follow-up. And then we run your setup scripts from scratch every time. And so if you have a large repo and a lot of dependencies to install, like that thing is slow. Okay.
Starting point is 00:44:52 Start with gashing. Yeah, we can just like, we can fix these things, right? Yeah. And again, like, I love that we didn't,
Starting point is 00:44:56 I love that we shipped without those things. Yeah, to be zero. Yeah, exactly. So there's like that. And I think, like I mentioned Best of N,
Starting point is 00:45:03 I think thinking about how to make the most, like basically how do we spend like more compute for you on your behalf? Okay. Is like very exciting. And then how do we bring this closer to the tools you work in, right? For me, the interface and chat, BT, it's actually like very functional, but it's like not where developers go when they want to
Starting point is 00:45:21 write code, right? Like, where do you go when you want to write code? Either your terminal or your IDE, right? Similarly, like, where do you go when you want to like triage issues? Well, like, you go to your issue manager, right? And so forth. So I think we want to bring it much closer to the tools people work in. And eventually, you know, the goal is to get to an agent that is like, basically a teammate? And it's like seeing what's going on your team and like picking stuff up for you. Right. Is this just, is codex just going to be a Slack teammate that I can just ping and I interact with Slack. I kind of think of it
Starting point is 00:45:48 as like, it's just, it should be sort of a ubiquitous teammate. Right. You know, it's just in your tools. In the tools you want it to be in at least. Right. You know, and we'll start very gentle, just like, hey, you decide when Codex does work.
Starting point is 00:45:59 And then over time, we'll figure out how for it to like, kind of like more proactively chime in. And, you know, we had a jam about this recently. Like, you know, it's kind of an interesting point. Like, I don't think we want it to proactively like DM you all the time. Every five minutes when something happens. So I think there'll be some evolution. of tools where we come up with like if you if anyone here has played video games you know there's
Starting point is 00:46:20 always like press x to like and like if you're next to a door it opens a door if you are next to some object it picks up the object it just it's a contextual action yes right yeah contextual proactiveness it waits for the hint that you want to do something and then jumps in yeah right and this is kind of like when we're getting to like interactive agents I think that's just like a big open area but it's like how do we have agents who understand what your team is trying to do and respond to like stuff in your team workspaces right and then how do we have an agent that understand what you are trying to do. And it's almost like this agent is like both in all your tools, but like sitting next to you while you're working on your computer and like kind of just being
Starting point is 00:46:52 like, oh yeah, like I can help you here. Right. So that's like actually the conviction that is deepened, right? We're like, yes, all of this works when you give it its own computer and we need to figure out how to create this infrastructure. For ecosystem integration. And like make that safe and so forth. Then the other thing though that there's a bit of an update is like just thinking about how people like learn to use these tools. I think right now there's some things that are pretty clunky. Obviously we've talked a lot about environment set up. I think also some of the things that, you know, you have to do, like, updating agents on MD is very manual and you have to, like, commit to your repo to get that context
Starting point is 00:47:25 to the agent. And so for me, I'm just thinking a lot now about like, okay, how do we make this like way easier to try? I reduce the cognitive burden of the onboarding, fewer decisions to get to the magic one. Yeah, exactly. Okay, got it. What has it changed most about research and the frontier of where frontier models are going, right?
Starting point is 00:47:42 has in your mind, does this mean that is the efficacy of how good codex is as a post-strained version of O3 Pro at using tools that like plugging into this workflow? Does it make you go, well, it just makes sense to for an unlimited amount now of compute on post-training models to get better and better at being autonomous coding agents? Or do you think there's some marginal plateau point at which you go, you know, after this point, there's not really much the user is getting from better. and better tool usage. You know, how does this change the trajectory of progress when it comes to the
Starting point is 00:48:18 frontier of research? Yeah, that's a really interesting question. I definitely don't know if I have the answers to this. But what I can say is that one of the best parts of doing, you know, an optimized version of O3 was that we got to make a bunch of like hybrid research product decisions very quickly. And I think that is incredibly exciting for thinking about how to make something useful. So, you know, if I imagine we would have had this idea of like, you know, it's like really
Starting point is 00:48:41 important that the agent knows how to write really good, like, PR descriptions and, you know, tests code in a certain way that's used to working in varied environments. And, you know, when it runs some tests, it doesn't just tell you that it did, but it cites deterministically, like in the logs, the output so you can verify that yourself. Those are a bunch of, like, product ideas, really. Right. And they're not, like, those ideas I just mentioned are not like higher model intelligence, nor even really a higher ability to call the right tools. Right. It's just like this understanding that, like, I liken to the first few years of job experience of a software engineer, right?
Starting point is 00:49:12 Like, you start, you have a, you know this incredibly precocious college grad, like, very smart, but, like, doesn't actually know how to be a software engineer, just that's not a code. Right. And, like, there's some, like, transfer, so it kind of knows a bit of software engineering. Right. And then, like, that's fine. But you can make it way more useful for, you know, the human trying to use the agent.
Starting point is 00:49:28 If it has those first few years of job experience. Right. So I think that there's no reason that those, that knowledge couldn't be infuse into the model. Exactly. Right. Yeah. But I think that having the freedom.
Starting point is 00:49:41 to go and, like, explore these ideas, like, relatively cheaply and see what sticks and what doesn't. It is really powerful. So, frankly, like, I don't really know to what extent it makes sense to, like, have, like, a bunch of custom post trains for, like, absolutely everything that matters. But I think for something as important as, like, coding to us, I think that, I think we're willing to say, like, hey, for coding, we really care about this. Let's just do everything we can to, like, have the best product. So, like, we actually did a similar thing with GPD 4.1, where we basically were getting a bunch of feedback from developers. We said, okay, let's go talk to a bunch of developers,
Starting point is 00:50:13 like make custom e-vals for them, deeply understand, like, what our model is great at, what they want us to get better at, and then we release the custom model, right? And then the goal should always be, okay, whenever we do this, like we have 4.1, okay, the next version of our, like,
Starting point is 00:50:25 sort of general model. Should just integrate that. Yeah, should integrate everything. Right. Yeah. We have friends who are different levels of AGI build. Did working on Codex update your priors on, you know, 2027? Okay, so I'm very AGI-I-pill.
Starting point is 00:50:42 I'm aware. My, like, slightly joking, but I can't tell if I'm joking 100% take, is that if you took a model today and ran it in the right loop, we're basically there. Would it have rights? That's the question I sometimes wonder. And should they be able to turn themselves off and go take a vacation if they want? Yeah, so, you know, that's kind of where I have. Are you pro labor rights for O3 Pro?
Starting point is 00:51:07 I am pro thinking about it. You know what I mean? Like, I don't think we're at a point where it's obvious, but I, it sounds kind of crazy. But I feel like it's a question worth considering every now and then. Or more concretely. How far are we from full recursive self-improvement? Okay, okay. Sorry.
Starting point is 00:51:21 So, back to you. Basically, I think working on Codex made it very clear how we can have agents just like omnipresent in our lives being incredibly useful. Because what I realized is that obviously we need to do a lot of model improvement. But I also saw how there's like just concretely a lot of model improvement. of like normal product work to do right to set them up in the right way and then that normal product work will then like pull the models into you know into being more and more useful so i think like by 2027 like agents will just be absolutely ubiquitous in the workplace i think in personal life it might be a little bit slower because in personal life there's less of these like constant pipes
Starting point is 00:51:59 of like signals of things to respond to the reason this matters is that if you think of chat you just have this like input box right and like most people including myself probably use it for like 1% of the things that I could use it for because I just don't even know to use it in that way or I don't prompt it right right that intention just isn't there yet yeah but like it's similar like if imagine you hired a teammate and then the only time they do work is if you specifically tell them to do a task right then they would just be very underutilized right but what makes a great teammate great is that they you kind of tell them what their job is and they just start responding proactively the self-charters yeah so I think like that is the big unlock for agents at work because
Starting point is 00:52:34 there's like streams you can subscribe them to, like, you know, your communications tool. Right. And in personal life, I think that might be a bit slower, but we'll see. Do you think that, well, actually, what percentage of all GitHub BRs do you think would be written by an AI agent 12 months from now? That's a really tough question. I sort of changed my mind every time I answer it. So maybe a slight cop-out, and then I'm curious for your answer to, would be that there
Starting point is 00:53:00 will be teams for whom 90% of their PRs. are written by agents. But I don't know how quickly that will, like, spread. You know, this is a common thing with AI. It's like, we live on, like, you could call it in the bubble. You could call it on the cutting edge. And so we're just, like, adopting everything rapidly. But then it takes a while to, like, defuse or diffuse out.
Starting point is 00:53:19 Yeah. So, but I think the cutting edge will, you'll be at like 90% on teams. Right. No, I think that's right. There's, I don't think people often talk about the coding economy as one homogenous economy. And the reality is there's multiple sub-economies, but they're at least two big economies, which is there's the, for lack of a better word, you know, there's the digital native companies, right?
Starting point is 00:53:39 These are technology companies, usually born in the post-internet era where they grew up, where either the founders or most of the vast majority of the team has grown up natively understanding how to do modern software development. The default assumptions when a code base is initialized is that it's going to be, you're going to use Git
Starting point is 00:53:56 for version management. There's going to be branching. There's going to be good review process and so on, like sort of modern software teams. And then there's the, the vast majority of actually the world's mission critical code, which we talked about earlier is Fortran, Cobol, like running on-prem in these massive ETL systems like in Virginia or in parts of Europe that were set up in post-World War II or in the Cold War with a default
Starting point is 00:54:22 assumption that everything had to be locked down. Often these code bases are running big parts of critical infrastructure like the railway system of an economy or the air traffic control system. So they're very high impact and high stakes code. They're not modernized whatsoever. And they're constantly rotting because of technical debt. And I think one of the most exciting things is that the one-time migration costs to modernize these code bases now has collapsed precipitously because agents can do so much of the plumbing work that typically would hire some system integrator, you know, Accenture, Delo for a 10-year contract where they'd come in. You know, this is part of the founding thesis of Doge, right?
Starting point is 00:55:01 Which is like just vast parts of the American government in IT infrastructure is like super legacy and we're getting overcharged as a country to like modernize it and agents go in and our if you as long as we can get enough distribution, you know, training data on Fortran and COBOL and so on, then the one time like upgrade costs should fall and we should see like a ideally this is my hope is that tools like Codex modernize that entire sort of legacy code economy. and then we get to upgrade everybody onto like modern software engineering, right?
Starting point is 00:55:34 It's tending to happen from what I can see now in countries that get to leapfrog legacy infrastructure because it's starting from day one. And it's very similar to like civil infrastructure like roads and highways and so on. So if you go to a country like Singapore, which is a much more modern country
Starting point is 00:55:48 because it's barely 60 years old, you know, they only got its independence in the 1950s. Then they didn't have to build the roads and so on that Britain did and then upgrade them all, which is like refactors suck. take way more time. If you could just start from sort of a clean slate, it's much easier to
Starting point is 00:56:04 modernize. And so what I'm finding is that it is easier for countries that are, whose IT infrastructure is just newer to adopt agents. They're still legacy. I mean, there's still, it's a vast majority of is running off and on-prem and it's not modern, you know, it's certainly not TypeScript, but it's easier to upgrade from, you know, systems that were written in C-plus to what, to Python, than it is to go from Cobol to Fortran, whatever, to Python. But if there's anything that makes me super excited that these economies will merge, it's autonomous agents, right, doing all the plumbing work and doing it for a fraction of the cost and time that these mega, you know, sort of consulting companies have started to
Starting point is 00:56:44 charge. And frankly, many of them don't end up ever completing a project that just turned into a boondoggle. So I'm very excited about that part. And that's why I think AI is going to eat software. Because there's software did the modern sort of startup economy and digital economy, software ate really fast. But there were other parts of the world, especially mission critical industries where there was like a one-time software upgrade largely driven by military scenarios. And then we never modernized all that infrastructure since then. So that's why I think the cybersecurity side of this, the safety evils that you're talking about, I think over time will come to be seen as having been very prudent because the thing that puts all of that adoption
Starting point is 00:57:22 at risk is having like one terrible incident that then changes the risk posture for a bunch of enterprises. I have a question about that, actually, I'm kind of curious. So when, you know, a lot of the larger companies that we talk to, their use case is very different. It's not like building new features, right, which is what we see like most of our users using us for, but it's refactors, large refactors and re-platforming. Right. So I'm curious, like, if you mentioned some of these companies or governments or systems that you're thinking about, kind of had this like one-time upgrade for military reasons and then never upgraded from there. I am curious if there was like a specific reason that they all want to upgrade now that you're seeing or if actually
Starting point is 00:57:56 we're still kind of in the state of like there's no forcing function. So like, although it's easier to do, there's still no impetus. Right. So for sure there's the geopolitics has accelerated like adoption for a bunch of governments, right? In Europe, the Ukraine crisis has forced a lot of governments in that region to go, wait a minute, like our air traffic control systems, especially in age of unmanned sort of drone warfare. It is it is crazy that when there's a bug, we need to call in some legacy contractor who built it like 20 years ago to come and do some on-site maintenance, right? That's been a wake-up call. And so you're seeing these like, there was a, there's sort of an $800 billion defense bill that Europe passed, you know, six months ago.
Starting point is 00:58:36 And the most urgent adoption is certainly happening at the intersection of like legacy code, not working and battlefield needs and drone warfare, code bases that interact with air traffic control systems, with like UAV planning, with mapping. Those are the code bases that are like most urgently being upgraded. I think in other. parts of the world, there's just a desire to modernize. So if you look at the UAE or the Kingdom of Saudi Arabia, we talked about how the UAE rolled out, is rolling out chat GPT to the entire country. I think that's coming mostly from a top-down directive to just embrace the like AI future that's coming rapidly. Basically, the more AGI build I find the head of state is,
Starting point is 00:59:11 the more rapid the adoption is, certainly for chat GPT like tools, but also coding. That's not driven by some like military function. But then there are other regions like Europe where like for sure geopolitics accelerating all that. And you know, you and I've talked about this before, but usually those scenarios often need a slightly different organ, like the ergonomics of code are different. They're very on-prem. They're very, they require a level of air gaping from cloud systems that like the modern software engineering workflow doesn't lend itself to. And so we may see this like bifurcation of codex as a family. Like I'm curious over the next few years, you know, the military require, or let's call it the critical industry needs.
Starting point is 00:59:52 of modern autonomous coding agents might require some pretty basic architectural differences than the, you know, let me ship the latest and greatest of our next version of our software product on GitHub. I think it, I don't think it's a coincidence that the last time we saw a huge adoption and IT infrastructure around the world was the Cold War. And now we're, you know,
Starting point is 01:00:14 living through some pretty unstable times, both in the Europe, the Middle East. And I think that is causing governments. I think the U.S. has always been somewhat, what, forward-leading, posture-wise, on adopting the latest and greatest technology. We make other governments look, you know, rightly so, like dinosaurs. And those folks, nothing forces dinosaurs to wake up like an impending comet, hitting them and impending extinction.
Starting point is 01:00:38 So that's definitely happening. Yeah. I think it's interesting, like, for me playing this through line as we, you know, as we're working on Codex, I do think there needs to be an answer for like, you know, how do you use this agent in an air-gap environment? Right. how to use this agent like you know you know there's critical industries and then there's just many like large companies who have like incredibly stringent security needs right it's kind of the way
Starting point is 01:00:59 we've kind of thought about building is the most important thing is to you know build to aGI right and then distributed the benefits of that to all humanity and so we're kind of like leaning towards the like okay the primary thing is the like fully self-hopped you know the thing where we host it for you you know contain the environment and everything and it's kind of in parallel we have this like sidetrack of like okay and like how are we going to make sure that like today you know you can use Codex CLI, you could use that in a, I guess, relatively air-gapped way. Obviously, it needs to sample the model. And then as we build new capabilities into Codex and Chachb-T, how do we just make sure that if you're running something like CLI can, like get the most of all, you know, the capabilities
Starting point is 01:01:35 without a trade-off? But it might be a little bit like, okay, we build it in the like fully self-contained system first and then we push down. Right. You know, this, there's this narrative violation I keep hearing about, I keep hearing from folks in San Francisco that, oh, you know, opening eyes all in on consumers because it's, because the rise of chat GPT as a consumer companion has been so extraordinary. But clearly our entire conversation is an exception to that story, right? Because almost everything we've talked about has been focused on developers and governments. So why is that misconception there? I think chat chit is in fact an amazing and large business and it's super cool to work at a company that is like really distributing AI.
Starting point is 01:02:18 Right. So like a giant number of people, but yeah, we are incredibly serious about coding. And in fact, we always have been, you know, since like the first codex product that was powering GitHub co-pilot right on all the way through with our models. I will say though, like, I think people are noticing like we are getting, we've always been like very serious about coding models and we're now getting like very serious about like coding products as well. Right. And so like, whereas before we had these amazing models, you could use them in like whatever tool that you want to use them in.
Starting point is 01:02:46 Like now definitely, I mean, a lot of the stuff that I'm working on is thinking about like, hey, actually there's a lot of, you know, as we build agents, there's a lot of value we can provide by not only thinking about the model, but also thinking about how the model is like useful to you in a certain form factor. And actually the form factor really affects everything. And so, yeah, we're spending a lot of time and effort building like even better coding models and even better coding products, particularly focused on agents, but even beyond. So you've been a founder before. One of the scary things about hearing opening I going from being serious about models to all the products is if you're a founder in the space and you want to build something interesting in the coding space, there's this tension looming, right?
Starting point is 01:03:24 Which is anything I'm going to build just going to be subsumed by opening I's products next year. So how would you think about that? If you were leaving opening eye and starting a company today, what would you do and what would you not do? Okay. So if I was leaving open ad today, probably the sort of the market changed that I would be thinking the most about or one of them would. be agents. Okay, great, not super controversial. Then I would think, okay, like we were talking about earlier, an agent is basically like a really good model that I'm probably not going to build at my startup. And then I need to give that model access to tooling in an environment. And then I
Starting point is 01:03:55 need to like figure out what tasks it needs to be good at. And then obviously give it to customers. And the interesting thing about it is that those latter three things, right, the tooling, the environment, and the task distribution, why I guess I'm the customer. So four things, whatever. All of those things are very much based in, like, knowledge of a customer. And those aren't things that, like, open the eyes is going to, like, you know, generally do for, like, every industry, right? Like, coding happens to be of particular importance to us, just broadly. But even, you know, within coding, there's a lot more specifics, specific areas.
Starting point is 01:04:25 So just to really spell this out, you know, if you think of the environment, like, it's really, you know, training codex, it was, like, really non-traveal to, like, figure out how to give the environments different, how to give the model different environments to train in, you know, with, like, different kinds of realistically dependent, realistic dependency setups, various amounts of dependencies even installed, like varying amounts of unit tests. Like we actually, the startup that, you know, I sold to Open AI was like multi, that's how I joined.
Starting point is 01:04:49 And we had very few unit tests on a lot of our code. And it's like kind of funny and like that. But that's realistic. That's like a real startup code base. Right. So actually, if you wanted to do that for like some specific function, I don't think it would be easy for us at OpenEI to like create that many environments for the agent to use and train on and like, and then use it, you know, test.
Starting point is 01:05:09 time. So that's hard. And then I think the task distribution is also really interesting. Like codex, you know, we have a lot of intuition for what a good code and task could look like and, like, kind of where to draw the boundaries, right? Like today, it's like, provide prompt and then you get an answer or a diff that you can turn into a PR. But like, those are some decisions we had to make around what the boundaries of the agent are. Right. And then we had to like go collect a bunch of those like type of door tasks or like invent those tasks to like, again, like train the agent how to do it and evaluate how well it was doing. So I think that, again, And for a very specific industry, I don't know, I'm trying to come up with an example.
Starting point is 01:05:42 Let's say accountants, but in the specific region of the world where there's like a specific set of rules, like they might have like very specific tooling that's like provided by the state for doing that accounting, right? Right. There might be very different kinds of like based like knowledge and documents available and then like the way you need to do the work on you different. So I mean, I think it is a very good question and I'm not 100% sure what I would do as if I was a founder right now, but I think that I would try to lean really hard on like very, very
Starting point is 01:06:08 good customer knowledge and less hard on, like, product, which that makes sense. Right. It sounds like the last mile connective tissue between an industry where you have deep domain expertise becomes more valuable, whereas the first mile of, like, all the general purpose parts of an agent's flow, you basically, you should assume you should offload that to open AI. Yeah. Yeah.
Starting point is 01:06:30 And then I think the other thing I might do is I might keep my company really small. So rather than like, you know, like doing the classic like hyperscale thing, I would try to to use agents as much as possible, the company as small as possible, so that we're just agile and nimble. I guess this is probably like just the sort of age-old advice. Well, I'll let me push back on that for a second, because it turns out that in many industries, serving the customer deeply like you're describing, often requires a human touch. That might be sales. It might be solutions engineering. It might be customer support and so on. It does sound like what you're saying is you would certainly keep your engineering team very small
Starting point is 01:07:04 and minimal. But if servicing the domain required more of the human touch, then that you would scale. Because if it required, often my experience is that getting an agent to actually work in the enterprise and the legacy industry requires going in and doing a fair amount of integration work, at least up front. So maybe it's a setup thing, right? Up front, you parachute and somebody who understands how to get an agent up and running. And then you can leave because it's really just for them, for the customers like consuming teammates, like you were saying earlier, but maybe where you do need people is that integration point. Now, ideally over time, I guess you're saying the product should just get good enough at integrating into the
Starting point is 01:07:46 customer's environment, but sometimes for regulatory reasons or otherwise, you just need a human there. You know, are there some industries that like clearly do you feel like out of bounds for Open AI because that just is not on the path to AI, but that would still would interact with coding agents. First of all, it's a good point on the actual, like, integration work probably requires humans. I would say, yeah, especially if it's in-person type integration work or like complex, then I think you're spot on there.
Starting point is 01:08:11 Industries that are out of balance. I think it's like, it's like a hard question to reason about because like we are building like general products. Right. And so you can like kind of use like chat GPT to answer any question like already today. So I wouldn't say there's like
Starting point is 01:08:27 balance, but it's more like focus. I would say you know, right now opening eye. we're very focused on like certain consumers generally and like being really good at coding. You know, there's some other things too. So I would just say,
Starting point is 01:08:39 yeah, the more, maybe we should just not even have this answer in the podcast. Yeah, we can take this part out. Probably. I'll give a 10 minute time check as well as.
Starting point is 01:08:46 Perfect. Great. No. Oh, great. Yeah, about to wrap. He stopped me. That was a good one.
Starting point is 01:08:51 I'm like, I don't know, man. I'm not a founder right now. You don't want to speak on behalf of Sam about which why world domination is not complete and total. I'll take that part out. So,
Starting point is 01:08:59 slightly different topic. A question I get from a lot of parents, especially with kids who are approaching the end of high school and in that phase where they're picking careers or thinking about what they want to do is this immense anxiety, especially for folks in tech, for whom, you know, for the last, for the vast majority of the like 20, 30 years, it's been a fairly stable assumption that like if you went, if you were smart and generally oriented towards technical fields, if you went and studied software engineering, you'd have a pretty great career. and safe and sort of rewarding time in the knowledge economy. And it seems like coding agents like Codex are taking a violent hammer to that assumption. How would you advise, you know, friends who are parents who are trying to figure out how to help their kids choose a career for the future?
Starting point is 01:09:50 So I'll answer this with humility because I don't have kids, but I do think about this. And actually, I think my point of view would just be that the world has always been changing, it's changing now but it was changing before that maybe it's changing a little faster but that's the main thing to notice is actually the pace of change not the specific change and so like I think the most you know if I had a kid at late high school now I would probably just be trying to encourage them to just be like excited it is about whatever they're doing and like be incredibly
Starting point is 01:10:19 curious and constantly learning right like I studied CS did you study CS as well or I started with CS and then transfer to bioinformatics because I was more interested in healthcare right you know and now you do investing right and like I studied mechanical engineering and then I change to CS and now I work in product in a, you know, in AI at OpenAI, but like the startup that I'd started was not an AI company. So things are constantly changing. And I think the most important thing is to like be agile, curious and like, you know, have some foundation that you can build upon as the world evolves around you. So I think similarly, if I had a child in late high school, I would just want them to crush whatever it is that they're doing. And it wouldn't really matter
Starting point is 01:10:54 what specific thing they've chosen. You know, I lean technical. So that would be cool. But like maybe even that is optional. And then I would just raise them with the expectation that they'll probably have like many career transitions throughout their lives. And if you were having seen what you have with Codex, knowing what you do about where it's going, let's say you were the chair of the computer science department at the university, what would you do differently now versus before when Codex launched? Well, one is you'd allow kids to use the AI tools. But let's see you're thinking about the future of computer science education and how that should be taught over the next five, 10, 15, 20 years. How would, what would you do differently? Yeah, again, just opinions here, but I think
Starting point is 01:11:35 I would have, you know, like at Stanford, there was a class where we wrote assembly, I forget the name of that class. That was cool. We had one class. CS-140, I think it was. Yeah. And then, you know, similarly, I would have like a handful of classes where folks do things like very manually to understand what's going on behind the scenes and also to build a confidence that they can. But then generally, I would move towards like having students trying to deliver some kind of like outcome, be it like they've learned something or they've built something or something like that. Project-based learning. Yeah.
Starting point is 01:12:03 And then I would probably encourage them to like use these various tools so that they're picking up the skills. And you know, I don't know. This is just an idea in my head. But if we could help them kind of like speed run through that arc, then maybe every quarter that they're using a different set of tools. And so they're like becoming like very mentally plastic in terms of how they get things done. And I think that would be the best simulation of like what future work. would look like. I'm not sure. What would you do? Well, I teach your class CS-143 at Stanford every year. This
Starting point is 01:12:30 year, we taught it in winter quarter, and we had about 300 students. And I was, you know, thinking through what was a, in previous years, we had a midterm and, you know, we had like problem sets. And this year we decided just to do, have it be a combination of speakers who are CTOs or folks, researchers in AI come in and talk about the infrastructure problems of building AI products at scale. And then we had one final project where everybody had to build an agent and ship it. And they were all allowed to use any coding tools. Obviously, in fact, we gave folks some credits to Mistral models and Black Forest models and the founder of Cursor came by and kind of talked about the ID and why they should all be using it. And what was extraordinary,
Starting point is 01:13:12 right, was it was so clear that the distribution of the final projects followed this power law where the top four or five teams that really adopted wholeheartedly the coding, the cursor and the AI models and did a fully sort of AI assisted workflow of their final project, like produced software that was like production grade ready. If I was still running the platform work at Discord, I would have totally shipped four or five of those on the front page of the app store we had. In fact, I sent some of them to the founders of Discord and they were like, we should probably ship this. The quality bar was just extraordinary for something they were able to build in a basically a 10-week quarter. Then there was this sort of, you know, usual sort of middle of the back that had made a half-hearted attempt, but enough to get a good grade to customize the templates we'd given them, but clearly hadn't, like, asked what is something that now I can create that I couldn't before, now that I have access to
Starting point is 01:14:08 extraordinary coding agents. And then there was just, you know, the classic sort of bottom of the class that I think just didn't, didn't accept those tools and think deeply about like trying them, using them, learning with them, developing a feel for like what they're good at and what not good at and kind of turned in a final project that would have been totally possible to build a year ago. Why do you think they didn't want to use the tools you were giving them? Look, it's hard to parse out from just a final project. But I did office hours with a lot of the students every week.
Starting point is 01:14:41 And you could very clearly think the number one predictor, of their success was their mindset. It was just about, like, did they, were they curious and hungry to learn outside of like a traditional textbook? And look, some of them, some of the students just had a lot going on. You know, being a college student is a stressful thing today. And so I don't, I have a lot of empathy for, there, there's definitely this, these awkward moment you're describing right now where a number of the graduating seniors from, who are graduating with college degrees this year, started out as freshmen in a very different economy. Right.
Starting point is 01:15:16 When they picked CS, the assumption was, hey, if I, like, do well in the core CS curriculum, if I get a 4.0 GPA, and I do like one or two good internships, you know, somewhere along the way, and I apply for a job, I'm going to get a job at a pretty good debt company. That's just not happening anymore. And it might be because there's a set of layoffs or some overhang from the Zurp era, or it might be because a lot of engineering teams are reducing their footprint of entry-level jobs. But I was definitely shocked by how many Stanford CS grads they were looking for, you know, graduating seniors still looking for full-time jobs, you know, come winter senior year. And I think that's anxiety-inducing, it's stress-inducing.
Starting point is 01:15:57 That has bleed over effects on, like, can you concentrate on this, like, project-based class when a number of the students were also juggling interviews and were coming to office hours when I thought they were going to be coming to ask about, you know, the code. we're asking for career advice, which is totally fine. But I do think there's a transition phase right now, which can be very stressful for computer science students. And I think you're right, the faster they're able to onboard to using these tools rapidly and realizing that the cap on what they can create now is extraordinarily high, the faster I think they're going to transition into the new economy better.
Starting point is 01:16:34 Because I do think there's an expectation, certainly for modern software team, certainly at Open AI, that, like, you're just fluent in all of these tools now relative to, you know, four or five years ago. It was crazy. When I, you know, when we graduated through Stanford, I didn't take a single class that required the use of Git.
Starting point is 01:16:49 Right. Which is absurd. Yeah. Like, I happened to, like, you know, pick it up in an internship, but there's no class that actually requires you, at least at the time, required you know how to use Git. Yeah.
Starting point is 01:16:58 And so I think, I do think the computer science departments around the country have to recognize that and change and do the kind of make the changes you're talking about. And my hope is that in the intern, you know, students won't wait around for their deans and their professors to do that for them because you can just go and use Codex, you know, for free. I think the research previews is literally free. Is that right? Well, you have to, you have to have a plus account or pro account, but yeah, it's a good point. Maybe we should do something for students. Student licenses. Yeah.
Starting point is 01:17:23 You know, I will say that like we, so we're hiring for Codex, please. What should I say? If you're interested in working at Codex, DM at Embirico on Twitter, it's EM, E, M-V-I-R-I-C-O-N-E-R-E-R-E-E-O-S-E. Yeah, I don't know if I'm allowed to plug myself here. But yeah, we're hiring, but we mostly are hiring very senior, but we actually are, we decided that we're pretty interested in hiring like a couple of new grads. Oh, that's interesting. Yeah. And so it's been interesting just looking at new grad profiles. And I totally feel you on the, yeah, I mean, it's definitely a tough time to be graduating. I don't know if this is advice, but what I can say is that when I look at new grad profiles, for me, the thing that I take the most signal from is if they've built something. Right. And if they've
Starting point is 01:18:02 built something that's linked from their profile and I can just like click to it. Projects. Yeah. And, you know, like, it's just like a cool website. Right. You know, like grades matter much less now. Yeah, I don't even look. Actually, now that you, I didn't even realize that I haven't looked at anyone's grades. You know, like, I just like, because, you know, admittedly, we're only hiring a few new grads. Right.
Starting point is 01:18:22 But that is the single largest signal. It's just like, what have you built? Right. And is there some way for me to validate that? Like, maybe it's because I can click to the website or maybe you just have some stats on, like, how many people used it. Right. And then when I talk to them, I'm just like, yeah, let's talk about what you built and how you thought about that. So maybe that's somewhat helpful for folks who are looking for something.
Starting point is 01:18:40 You know, I kind of reflect on my journey here to Open AI, which I'm really grateful for. And I view it as a privilege to be working here. But, you know, when I look back to when we were working on the startup, multi, which is like not an AI company. And we saw like, chat to be come out and we started to follow all this LLM stuff. I remember just feeling like, wow, like there is a chance that if we don't do this right over the next couple of years, like my co-founder and I were talking. There's a chance that we actually just end up like dinosaurs. Right. And so at the time, we actually made, like, a very explicit decision to, like, heavily prioritized getting us and the entire company, like, ramped on AI things. And to some extent,
Starting point is 01:19:15 like, I don't know if I could have, like, gotten the job that I have here at Open AI. If I was just applying randomly, I think it's because we had built something that was interesting, that we were able to, like, get that attention and have that conversation. So I guess if there's one takeaway here, it's just, like, just got to build. It's time to build. Yeah. Thanks for listening to the A16s. podcast. If you enjoyed the episode, let us know by leaving a review at rate thispodcast.com slash A16Z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only. Should not be taken as legal
Starting point is 01:19:50 business, tax, or investment advice, or be used to evaluate any investment or security and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
