a16z Podcast - From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki

Episode Date: September 25, 2025

What comes after vibe coding? Maybe vibe researching. OpenAI's Chief Scientist, Jakub Pachocki, and Chief Research Officer, Mark Chen, join a16z general partners Anjney Midha and Sarah Wang to go deep on GPT-5: how they fused fast replies with long-horizon reasoning, how they measure progress once benchmarks saturate, and why reinforcement learning keeps surprising skeptics. They explore agentic systems (and their stability tradeoffs), coding models that change how software gets made, and the bigger bet: an automated researcher that can generate new ideas with real economic impact. Plus: how they prioritize compute, hire "cave-dweller" talent, protect fundamental research inside a product company, and keep pace without chasing every shiny demo.

Timecodes:
0:00 Introduction & Goals of Automated Researcher
0:43 The Evolution of Reasoning in AI
1:46 Evaluations: From Benchmarks to Real-World Impact
5:15 Surprising Capabilities of GPT-5
6:56 The Research Roadmap: Next 1, 2, 5 Years
7:46 Long-Horizon Agency & Model Memory
9:44 Reasoning in Open-Ended Domains
11:18 The Role and Progress of Reinforcement Learning
13:14 Reward Modeling & Best Practices
14:21 The New Codex: Real-World Coding
16:20 AI vs. Human Coding: The New Default
20:07 What Makes a Great Researcher?
21:14 Persistence, Conviction, and Problem Selection
26:00 Building and Sustaining a Winning Research Culture
31:45 Balancing Product and Fundamental Research
39:00 The Importance of Compute and Physical Constraints
45:50 Maintaining Speed and Learning at Scale
47:18 Trust and Collaboration at OpenAI

Resources:
Find Jakub on X: https://x.com/merettm
Find Mark on X: https://x.com/markchen90
Find Sarah on X: https://x.com/sarahdingwang
Find Anjney on X: https://x.com/AnjneyMidha

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript
The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas, the next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant. I was talking to some high schoolers and they're saying, oh, you know, actually the default way to code is vibe coding. I do think, you know, the future hopefully will be vibe researching. What does it take to build an automated researcher
and can AI discover new ideas on its own? OpenAI's chief scientist, Jakub Pachocki, and chief research officer, Mark Chen, joined a16z general partners Anjney Midha and Sarah Wang to unpack GPT-5's reasoning push, why evals must shift to economically meaningful benchmarks, and the march towards an automated researcher. We get into long-horizon agency, why RL keeps working, the new Codex for real-world coding, research culture versus product, and why, for now, compute is destiny. Let's get into it.
Starting point is 00:00:59 Thanks for coming, Jakob and Mark. Jakob, you are the chief scientist at Open AI. Mark, you are the chief research officer at Open AI, and you guys have both the privilege and the stress of running probably one of the most high-profile research teams in AI. And so we're just really stoked to talk with you about a whole bunch of things we've been curious about, including GPD-5, which was one of the most exciting updates to come out of Open Eye in recent times,
Starting point is 00:01:24 and then stepping back how you build a research team that can do not just GPD-5, but codex and chat GPD-T and an API business and can weave all of the many different bets you guys have across modalities, across product form factors into one coherent research culture and story. And so to kick things off, why don't we start with GPD-5? Just tell us a little bit about the GPD-5 launch from your perspective. How did it go? So I think GPD-5 was really our attempt to bring reasoning into the mainstream.
Starting point is 00:01:56 And prior to GPT5, right, we have two different series of models. You had the GPT kind of two, three, four series, which were kind of these instant response models. And then we had an O series, which essentially thought for a very long time and then gave you the best answer that it could give. So tactically, we don't want our users to be puzzled by, you know, which mode should I use, and involves a lot of research in kind of identifying what the right amount of thinking for any particular prompt looks like,
Starting point is 00:02:26 and taking that pain away from the user. So we think the future is about reasoning, more and more about reasoning, more and more about agents, and we think GP-D5 is this step towards delivering reasoning and more agentic behavior by default. There is also a number of improvements across the part in this model
Starting point is 00:02:48 relative to all three other previous models, but our primary thesis for this launch was indeed bringing the reason about more people. Can you say more about how you guys think about e-vals? I noticed even in that launch video, there were a number of e-vals where you're inching up from, you know, 98 to 99 percent, and that's kind of how you know you've saturated the eval. What approach do you guys take to measuring progress and how do you think about it? One thing is that indeed for like these evils that we've been using for the last few years, they're indeed pretty close to saturated. And so, yeah, like
Starting point is 00:03:19 for a lot of them, like, you know, inching from like 96 to 98 percent is not necessarily, the most important thing in the world. I think another thing that's maybe even more important, but a little bit subtler. When we were in this GPT2, GPT3, GD4 era, there was kind of one recipe. You just like pre-trained a model on a lot of data and you kind of like use these evils
Starting point is 00:03:41 as just kind of a yard stick of how this generalizes to like different tasks. Now we have this different ways of training in particular reinforcement learning on like serious reasoning where we can pick a domain and we can really train a model to like become an expert in this domain to reason very hard about it, which lets us target particular kinds of tasks, which will mean that we can get extremely good performance on some evils, but it doesn't
Starting point is 00:04:05 indicate as great generalization to other things. So the way we think about it in this world, we definitely think like we are in a little bit of a deficit, like of great evaluations. And I think the big things that we look at are actual marks of the model being able to discover new things. I think for me, the most exciting trend and actual sign of progress this year has been our models' performance in math and programming competitions, although I think, like, they are also becoming saturated in a sense. And the next set of evils and milestones that we're looking at will involve actual discovery and actual movement on things that are economically relevant. Totally. You guys already got number two in the At-Coder competition, so there's only number one left. Yeah. I mean, I think it is important to note that these e-vals, like, you know, I.O.I. at Coder, IMO, are actually real-world markers for success in future research. I think a lot of, you know, the best researchers in the world have gone through these competitions that have gotten very good results. And yeah, I think we are kind of preparing for this frontier where we're trying to get our models to discover new things. Yeah, very exciting.
Starting point is 00:05:13 Which capability from GPD 5 before the release surprised you the most when you were working through the eval bench or using it internally? Were there any moments where you felt like this was starting to get good enough to release because it was useful in your daily usage? I think one big thing for me was just how much it moved the frontier in very hard sciences. You know, we would try the models with some of our friends who are, you know, professional physicists or professional physicists or professional. mathematicians. And you already saw kind of some instances about this on Twitter where, you know, you can take a problem and have it discover, maybe not like very complicated new mathematics, but, you know, some non-trivial new mathematics. And we see physicists, mathematicians kind of repeating this experience over and over where they're trying Jupy-5 Pro and saying,
Starting point is 00:06:04 wow, this is something that previous version of the models couldn't do. And it is a little bit of a light bulb moment for them. It's like able to automate maybe like what could take one of their students months of time. Well, GP5 is a definite improvement on O3. For me, O3 was definitely like that moment where the reasoning models became like actually very useful on a daily basis. I think especially for, you know, working through a math formula or a derivation. Like it actually got to a level where it is like fairly trustworthy. I can actually use it as a tool for my work. And, yeah, I think it is very exciting to get to that moment, but I expect that, well, now as we're seeing, you know, these models like actually able to automate, well, yes, like we're saying, solving contest problems over longer time horizons. I expect that that was quite small compared to what's coming over the next year.
Starting point is 00:06:55 What is coming in the next one to five years? At whatever level you're comfortable sharing, what does the research roadmap look like? So the big thing that we are targeting with our research is producing an automated researcher. automating the discovery of new ideas. And, you know, of course, like a particular thing we think about a lot is automating our own work, automating ML research, but that can get a little bit self-referential. So we're also thinking about automating progress in other sciences. And I think, like, one good way to measure progress there is looking at, like, what is the time horizon
Starting point is 00:07:29 on which these models actually can reason and make progress. And so now as we get to a level of near mastery of this high school competitions, let's say, say we get to like maybe on the order of one to five hours of reasoning. And so we are focused on extending that horizon, both in terms of like the models, the ability to plan over very long horizons and actually able to retain ability to retain memory. And back to the evals question. That's why I think evils of the form of how long does this model autonomously operate for are of particular interest to us. And actually maybe on that topic, there's been this huge move toward agency and model development. But I think at least the state that
Starting point is 00:08:11 it's in currently users have sort of observed this tradeoff between too many tools or planning hops can result in quality regressions versus something that maybe has a little bit less agency. The quality is at least observed today to be a bit higher. How do you guys think about the tradeoff between stability and depth? The more steps that the model is undertaking, maybe the less likely the 10th step is to be accurate versus you ask it to do one thing, it can do it very, very well. And to have it keep doing that one thing better and better, but more complex things. There's sort of that trade-off. But of course, to get to full autonomy, you are taking multiple steps. You're using multiple tools.
Starting point is 00:08:49 I think actually, like, well, the ability to maintain depth is a lot of it is being consistent over long horizons. So I think there are very related problems. And in fact, I think like with the reasoning models, we have seen the models, like, great. extend the length of which they are able to reason and work reliably without going off track. Yeah, I think this is remain a big area of focus for us. Yeah, and I think reasoning is core to this ability to operate over a long horizon. Because, you know, you imagine kind of yourself solving a math problem, or you try an approach, it doesn't work.
Starting point is 00:09:23 And, you know, you have to think about, you know, what's the next approach I'm going to take? What are the mistakes in the first approach? And then you try another thing. And, you know, the world gives you some hard feedback, right? And then you keep trying different approaches. And the ability to do that over a long period of time is reasoning and gives agents that robustness. We talked a lot about math and science. I was curious to get your take on, do you think some of the progress that we've made can actually extend similarly to domains that are less verifiable?
Starting point is 00:09:50 They're sort of less of an explicit right or wrong. Oh, yeah, this is a question. I really like. I think if you actually truly want to extend to research and, you know, finding, discovering ideas that that meaningfully advanced technology on the, you know, the scale of like months and years, like, I think these questions like stop being so different, right? Like, it is one thing to solve like a very well post-constrained problem on the scale of an hour, right? And there's like kind of a finite amount of ideas
Starting point is 00:10:20 you need to look through and that might feel extremely different from solving something very open-ended. But, you know, even if you want to solve like a very well-defined problem that is on much longer scale, right? Like, you know, prove this millennium price problem. Well, that suddenly requires you to think about, okay, like, what are the fields of mathematics or other sciences that might possibly be relevant? You know, are there inspiration from physics that I must take? Like, what is kind of the entire program that I want to develop around this? And now this become very open-ended questions. And it's actually hard to, you know, for our own research, right? Like, if all we cared about is, you know, reduce the modeling clause on a given data set,
Starting point is 00:10:55 right? Like, measuring the progress on that, like, are we kind of actually ask the right questions in research, like actually becomes like a fairly open-ended affair. Yeah, and I think it also makes sense to think about what the limits of, you know, open-ended means. I think a while back Sam tweeted about some of the improvements that we were making in having our models write more creatively. And you know, we do consider the extremes here as well. Right, right. Let's talk about RL. Because it seems like since 01 came out, RL has been the gift that keeps giving. You know, every couple of months, opening I put some to release, and everyone goes, oh, that's great, but this RL thing is going to plateau.
Starting point is 00:11:34 We're going to saturate the evals. The models won't generalize, or there's going to be mode collapse because of too much synthetic data for whatever. Everybody's got a laundry list of reasons to believe that the gains and performance from RL are going to tap out. And somehow they just don't. You guys just keep coming out and putting out continuous improvements. Why is RL working so well? And what, if anything, has surprised you about how well it works? is a very versatile method, right? And there are a lot of ideas you can explore once you have an RL system working. A long time at Open AI, we started from this before language models, right?
Starting point is 00:12:11 Like we were thinking about like, oh, okay, like RL is this extremely powerful thing, of course, like on top of deep learning, which is that's like incredible general learning method. But the thing that we struggled with for a very long time is like what is the environment, like how do we actually anchor these models to the real world or like, should we, you know, simulate, you know, some island where they all learn to collaborate and compete. And then, you know, of course, came the language modeling breakthrough, right? And we saw that, oh, yeah, if we scale deep learning on modeling natural language, we can create models with this like incredibly new understanding of human language.
Starting point is 00:12:43 And so since then, we've been, you know, seeking how to combine these paradigms and how to get our all to work on natural language. And once you do, right, like, then you kind of have the, well, you have the ability to actually, like, execute on these different ideas and objectives in this, like, extremely robust, rich, given by pre-training. And so, yes, I think it's been perhaps the most exciting period in our research over the last few years where we've really, like, found so many new directions and promising ideas that all seemed to be working out, and we're trying to understand how to compare. One of the hardest things about RL for folks who are not practitioners of RL is the idea
Starting point is 00:13:19 of crafting the right reward model. And so especially if you're a business or an enterprise who wants to harness all this amazing progress you guys are putting out, but doesn't even know. where to start. What do the next few years look like for a company like that? What is the right mindset for somebody who's trying to make sense of RL to craft the right reward model? Is anything you've learned about the best practices or an approach of thinking, of using this latest sort of family of reasoning techniques? What is the right way I should think about even approaching reward modeling as a biologist or a physicist? I expect this will evolve quite
Starting point is 00:13:56 rapidly. I expect it will become simpler, right? Like, I think, you know, maybe like two years ago, we would have been talking about, like, what is the right way to craft my fine-tuning data set? And I don't think we are, like, at the end of that evolution yet. And I think we will be inching towards more and more human-like learning, which, you know, RL is still not quite. So I think maybe the most important part of the mindset is to, like, not assume that, like, what is now will be forever. So I want to bring the conversation back to coding. We would be remiss not to say congrats on GPT-5 Codex, which just dropped today. Can you guys say a little bit more about what's different about it,
Starting point is 00:14:31 how it's trained differently, maybe why you're excited about it? Yeah, so I think one of the big focuses of the Codex team is to just take the raw intelligence that we have from our reasoning models and make it very useful for real-world coding. So a lot of the work they've done is kind of consistent with this. They are working on kind of having the model be able to handle more difficult environments. We know that real-world coding is very messy.
Starting point is 00:14:58 So they're trying to handle all the intricacies there. There's a lot of coding that has to do with style, with just like kind of softer things, like how proactive the model is, how lazy it is. And just being able to define in some sense, like a spec for how a coding model should behave. They do a lot of very strong work there. And as these things, like they're also working
Starting point is 00:15:22 on a lot better presets. you know, coders, they have some kind of notion of this is how long I'm waiting, I'm willing to wait for a particular solution. I think we've done a lot of work to dial in on, you know, for easy problems, being a lot, you know, lower latency. For harder problems, actually, the right thing is to be even higher latency, get you the really best solution. And just being able to find that preset is very important. What's the sweet spot for, if you were to say, like, easier problems versus harder? What we found is the previous generation of the Codex models. They were spending too little time solving the hardest problems and too much time solving the easy problems.
Starting point is 00:15:59 And I think that is actually just probably out of the box what you might get out of 03. Maybe just on the topic of coding, since you guys are both competitive coders in prior lives. I know you've been at opening eye for a decade now, but I was struck by the story of Lee Cidall, the Go player, who kind of famously quit Go after he lost to AlphaGo multiple times. And I think in a recent interview, you guys were both saying that now the coding models are better than your capabilities, and that gets you excited. But say more about that.
Starting point is 00:16:34 And how much would you say you code now? Well, if your hands on keyboard, you can talk about Open AI generally, but how much code is written by AI now? In terms of cutting models being better, I mean, I think, yeah, I think it is extremely exciting to see this progress. I think like the programming competitions have a nice kind of encapsulated test of like ability to come off some new ideas in, you know, in this like boxed environment and time frame. I do think like, you know, if you look at things like, well, I guess the IMO problem
Starting point is 00:17:09 six or maybe some very hardest programming competitions problems, like I think there's still a little bit of headway to go for the models, but I wouldn't expect that to look. last very long. I do go a little bit. Historically, I've been like... He's being humble. Historically, I've actually been extremely reluctant to use any sort of tools. I just
Starting point is 00:17:31 used them pretty much. Oh, yeah. Okay. Old school. Yeah. Yeah. Eventually, I think like, like, especially with this latest calling tools, like GPT5, I've really kind of felt like, okay, like this is, this is no longer
Starting point is 00:17:47 the way. Like, like, you can do a 30-file refactor like pretty much perfectly in like 15 minutes. Like you kind of have to use it. Yeah. And so I've been kind of like learning this new way of coding, which definitely feels a little bit different. I think it is like a little bit of an uncanny valley seal right now where like you kind of have to use it because it is just like accelerating so many things, but it's still like, you know, a little bit like not quite as good as a as a as a as a as a as a as a as a, it's a. worker. So, you know, I think like our priority is getting out of that in County Valley. Yeah. But yeah, it's definitely an interesting time.
Starting point is 00:18:29 Yeah, definitely. To kind of like speak to the recent moment, I think AlphaGo, for both of us was, you know, a very formative milestone in AI development. And at least for me, it was the reason I started working on this in the first place. And maybe partly because of our backgrounds in competitive programming, like I had this affinity. to building these models, which could do very, very well in these forms of contests. And going from, you know, solving eighth grade math problems to a year later, hitting our level of performance in these coding contests, it's crazy to see that progression. And you kind of imagine or like to think that you feel a set of the feelings at least it all felt too,
Starting point is 00:19:14 right? It's like, wow, this is really crazy, right? and what are the possibilities? And this is something that I took decades to do. And it took a lot of hard work to get to the forefront of. So you really do feel an implication of that is these models, what can't they do, right? And I do feel like already it's kind of transferred
Starting point is 00:19:34 the default for coding. This past weekend, I was talking to some high schoolers and they were saying, oh, you know, actually the default way to code is vibe coding. Like I think like they would consider, oh, it's like maybe sometimes for completeness, would go and actually do all of the mechanics of coding it from scratch yourself, but that's just a strange concept to them. Like, why would you do that? You know, you just vibe code by default.
Starting point is 00:19:58 Yeah, yeah. And so, yeah, I mean, I do think, you know, the future hopefully will be vibe researching. Yeah. I have a question about that, which is what makes a great researcher, right? When you say vibe researching. There's a big part of vibe coding is just having good taste in wanting to build something useful and interesting for the world. I think what's so awesome about tools like codex is if you've got a good intuition for what people want, it helps you articulate that and then and then basically actualize a prototype very fast. With research, what's the analog? What makes a great researcher? Persistence is a very key trait, right? I think like what is different about research when you're actually trying to,
Starting point is 00:20:46 I think a special thing about research, right, is you're trying to create something or learn something that is just not known, right? Like, it's not known to work. Like, you don't know whether it will work. And so always trying something that will most likely fail. And I think getting to a place where you are, like, in the minds of, like, being ready to fail
Starting point is 00:21:06 and being ready to learn from these failures. And, you know, so, and, you know, and of course with that comes creating kind of clear hypothesis and being extremely honest with yourself about how you're doing on them, right? I think a trap many people fall into is going out of the way to prove that it works, right? Which is quite different from, you know, like,
Starting point is 00:21:24 I think believing in your idea and sticking with it's extremely important, right? And you want to persist that, but you have to be honest with yourself about when it's working and when it's not so that you can learn and adjust. Yeah, I think there are just very few shortcuts for experience. I think through experience,
Starting point is 00:21:40 you kind of learn, you know, what's the right, horizon to be thinking of a problem, but you can't pick something that's too hard or it's not satisfying to do something that's too easy. And I think a lot of research is managing your own emotions over a long period of time, too. You know, there's just going to be a lot of things you try and they're not going to work. And sometimes you need to know when to persevere to that or sometimes when to kind of switch to a different problem. And I think interestingness is something, you know, you try to fit through reading good papers, talking to your colleagues. And
Starting point is 00:22:12 And you kind of maybe distilled their experience into your own process. When I was in grad school, you know, there's a big part, I'm a failed machine learning research. I was in grad school for bioinformatics. But a big part of my research advisor's thrust was about picking the right problems to work on such that you could then sustain and persist through the hard times. And you said something interesting, which was there's a difference between having conviction in an idea and then being maximally truth-seeking but when it's not working and both those things are sometimes in tension
Starting point is 00:22:45 because you kind of go native on a topic or a problem sometimes that you have deep conviction in. Have you found, is there any sort of heuristics you found are useful at the taste step at the problem picking step that help you arrive at the right set of problems
Starting point is 00:22:58 where that conviction and truth-seeking is not as much in zero-sum tension as other kinds of problems? Yeah, to be clear, I don't think conviction and truth-seeking are really in a zero-sum tension. I think, like, you can be like,
Starting point is 00:23:10 you can be convinced or you know you can have a lot of belief in idea and you can be very persistent in it while it's not working I think it's just important that you're kind of honest with yourself like like how much progress you're making and you're in a mindset where you're able to learn from the failures along the way I think it's important to look for problems that you really care about and you really believe are important right and so I think one one thing I've observed in in in many researchers that inspired me has been really going after the hard problems, like looking at the questions that are, you know, kind of like, you know, wildly known, but like not
Starting point is 00:23:51 really kind of considered tractable and just asking like, you know, why are they not tractable or like, you know, what, like, what about this approach? Like, why does this approach fail? I think you're always like thinking about what is really the barrier for the next step. If you're going after problems that like you really truly believe are important, right? then that makes it so much easier to find the motivation to persist with them over years. And in the development of, like,
Starting point is 00:24:16 during the training phase of GPD-5, for example, with any moments where there was a hard problem, their initial attempts that were being made to crack, that problem weren't working, and yet you found somebody persisted through that. And what was it about any of those stories that comes to mind that worked well, that you wish other people and other researchers did more of? I think on the path there, right, like along the sequence of models, like both the pre-trained models
Starting point is 00:24:52 and the research models. I think one very common theme is bags. And, you know, both like just like, yeah, silly bags in software that can kind of stay in your software for like months and kind of invalidate all your experiments a little bit in a way that you don't know. And identifying them can be a very meaningful breakthrough for your research program. But also kind of bugs in the sense of like, well, you have a particular way of thinking about something. That way is a little bit skewed, which causes you to make the wrong assumptions and identifying those wrong assumptions, rethinking frames from scratch.
Starting point is 00:25:30 I think both for getting the first reasoning models work. or getting the, you know, larger pre-trained models working. I think we've had multiple issues like that we've had to work through. As leaders of the research org, how do you think about what it takes to keep the best talent on your team and on the flip side, creating a very resilient org that doesn't crumble if a key person leaves? The biggest, I think, things that Open AI has going for it
Starting point is 00:26:00 in terms of keeping the best people motivated and exciting, excited, it is like we are in the business of doing fundamental research, right? We aren't the type of company that looks around and says, oh, what model did, you know, company X built or what model did company Y build? You know, we have a fairly clear and crisp definition of what it is we're out to build. We like innovating at the frontier. We really don't like copying. And I think people are inspired by that mission, right?
Starting point is 00:26:30 you are really in the business of discovering new things about the deep learning stack and I think we're kind of building something very exciting together. I think beyond that, a lot of it's creating very good culture. So we want a good pipeline for training up people to become very good researchers. We, I think, historically have hired the best talent and the most innovative talent. So I just think, you know, we have a very deep bench as well. And, yeah, I think most of our leaders are very inspired by the mission, and that's what's kept all of them there.
Starting point is 00:27:11 Like when I look at my direct reports, they haven't been affected by the Talon Moors. I was chatting with a researcher recently, and he was talking about wanting to find the cave dwellers. And these are often the people who are not posting on social media about their work. For whatever reason, they may not even be publishing. They're sort of in the background doing the work. I don't know if you would agree with this concept, but how do you guys hire for researchers?
Starting point is 00:27:38 And are there any non-obvious ways that you look for talent or attributes that you look for that are non-obvious? So I think one thing that we look for is having solved hard problems in any field. A lot of our most successful researchers, have started their journey with deep learning at Open AI and have worked in other fields like physics or computer science, theater science or finance
Starting point is 00:28:09 in the past, strong technical fundamentals coupled with the ability, the intends to work on very ambitious problems and actually stick with them. We don't purely look for who did the most visible work or is the most visible on social media. Yeah. As you were talking, I was thinking back to when I was a founder and I was running my own company and we would recruit for great talent engineers, many of the attributes you described were
Starting point is 00:28:39 ones that were on my mind then. And Elon recently tweeted that he thinks this whole researcher versus engineer distinction is silly. Is that just a semantic? Is he just being semantically nitpicky? Or do you think these two things are more similar than they actually look? Yeah, I mean, I do think they're, like, researchers, they don't just fit one shape. You know, we have certain researchers who are very productive at Open AI who are just so good at idea generation.
Starting point is 00:29:07 And, you know, they don't necessarily need to show great impact through implementing all of their ideas, right? I think there's so much alpha they generate in just kind of coming up with, oh, let's try this or let's try this. Or maybe we're thinking about that. And there's other researchers who, you know, they are just very, very efficient at taking one idea, rigorously exploring, you know, the space of experiments around that idea. So I think, you know, researchers come in very different forms. I think maybe that first type wouldn't necessarily map into the same bucket as a great engineer. But, you know, we do kind of try to have a fairly diverse set of research tastes and styles.
Starting point is 00:29:47 And say a little bit about what it takes to make, like, create a frontier sort of winning culture that can attract all kinds of shapes and researchers and then actually grow them, thrive them, make them win together at scale. What do you think of the most critical ingredients of a winning culture? So I think actually the most important thing is just to make sure you protect fundamental research. Right. I think you can get into this world with so many different companies these days where you're just thinking about, oh, how do I compete on, you know, a chat product or some other kind of product surface? And you need to make sure that you leave space and recognize the research for what it is. And also give them the space to do that. Right. Like you can't have them being pulled in all of these different product directions. So I think that's one thing that we pay attention to within our culture. especially now that there's so much spotlight on open AI, so much spotlight on AI in general, and the competition between different labs, it would be easy to fall into a mindset of,
Starting point is 00:30:57 oh, we're racing to beat this latest release or something. And, you know, there's definitely like areas that people kind of start looking over their shoulder and start thinking about, oh, what are these other things? And I see it as a large part of, our job to make sure that people have this comfort and space to think about, you know, what are things actually going to look like in a year or two, like, what are the actually big research questions that we want to answer and how do we actually get to models that
Starting point is 00:31:30 like vastly outperform what we see currently rather than just like iteratively improving in the current paradigm? Just to pull on that thread more unprotecting fundamental research. You guys are obviously one of the best research organizations in the world, but you're also one of the best product companies in the world. How do you balance, and especially with you've brought on some of the best product execs in the world as well, how do you balance that focus between the two and while protecting fundamental research also continue to move forward the great products
Starting point is 00:31:58 that you have out? Yeah. I mean, I think it's about kind of delineating a set of researchers who really do care about product and who really want to be accountable to the success of the product. And they should, of course, very closely coordinate with the, the research work at large. But I think just kind of people understanding their mandates and what they are rewarded for, that's a very important thing.
Starting point is 00:32:25 One thing, the other thing is also helpful is that our product team and broader company leadership is bought into this vision, right, where we are going with research. And so nobody is assuming that, like, the product we have now is a product we'll have forever and we'll just kind of wait for like new versions from research like we are able to think jointly about what the future looks like one of the things that you guys have done is let such a diversity of different ideas and bets flourish inside of open AI that you then have to figure out some way as research leaders to to make it all make
Starting point is 00:33:06 coherent sense as one part of a roadmap and you got you know people over here investigating the future of diffusion models and visual media. And over here, you've got folks investigating the future of reasoning when it comes to code. How do you paint a coherent picture of all that? How does that all come together? When there might be, at least
Starting point is 00:33:28 naively some tension between giving researchers the independence to go to fundamental research and then somehow making that all fit into one coherent research program. Our state of goal for a research program has been getting to an automated researcher for a couple years now. And so we've been building most of our projects with this goal in mind. And so this still leaves a lot of room for
Starting point is 00:33:57 kind of bottom-up idea generation for fundamental research on various domains. But we are always thinking about how do these ideas come together eventually. We are, you know, we We believe, for example, that reasoning models go much further and we have a lot of explorations on things that are not directly reasoning models, but we are thinking a lot about how they eventually combine and, you know, what does, what will this kind of innovation look like once you have something that is out there and thinking for months about a very hard problem. And so I think this clarity of like our long-term objectives is important, but yeah, but it doesn't mean that we are, you know,
Starting point is 00:34:41 about, like, oh, here are all the little pieces, right? Like, we definitely view this as a question of exploration and learning about these technologies. Yeah, I think you want to be opinionated and prescriptive at their very kind of course level, but, you know, a lot of ideas can bubble up in a finer level. And has there been any moments where those things have been intentioned at all recently? Well, one provocative example could be recently, you know, this new image model came out, which is nanobanana, right, from Google. It's extraordinary value shown.
Starting point is 00:35:11 that like lots of everyday people can unlock a lot of creativity when these models are good at understanding, editing prompts. And I could see how that would create some tension for a research program that may not be prioritizing that as directly.
Starting point is 00:35:28 If one of your, you know, somebody talented on your team came and said, guys, like, this thing is so clearly valuable in the world out there, we should be spending, you know, more effort, more energy on this. How do you reason about that question? I think there's definitely a question that we've been kind of thinking about for quite a while at Open AI.
Starting point is 00:35:43 I mean, if you look at GPT3, right, like once we kind of saw like, oh, like this is kind of where language models are going, we definitely had a lot of discussions about, well, clearly there are going to be so many magical things you can do with AI, right? And you will be able to go to this like extremely smart models that are out there pushing different tiers of science, but you will also have this like incredible media generation and this incredibly, you know, transformative, entertainment applications. And so, like, how do we prioritize among all these directions
Starting point is 00:36:20 has definitely been something we've been thinking about for quite a while. Yeah, absolutely. And the real answer is, like, we don't discourage someone from being really excited by that. And it's just, if we're consistent in the prioritization and our product strategy, then it just won't naturally fall in. And so it's just for us, like, we do encourage a lot of people to be excited about, you know, building this, you know, we're building kind of like agentic products, you know, whatever kind of products that they're excited by. But I think it's important for us to also have a separate group of people who you protect that their goal is to create the algorithmic advances.
Starting point is 00:37:01 How does that translate, just to build on Andre's question, into a concrete framework around resourcing? Like, do you think about, okay, X percent of compute resources will go to longer term, you know, very important, but maybe a bit more pie in the sky exploration versus there's also, you know, obviously current product inference, but sort of this thing in the middle where it's achievable in the short to medium term. Yeah. So I think that's a big part of both of our jobs. Yeah. Just this portfolio management question of how much compute do you give to which project. And I think historically we've put a little bit more on just the core algorithmic advances versus kind of the product research.
Starting point is 00:37:42 But it's something that you have to feel out over time, right? It's dynamic, I think, month to month, there could be different needs. And so I think it's important to stay fairly flexible on that. And if you had 10% more resources, would you put it toward compute, or is it data curation, people? where would you stick that from like a marginal good question
Starting point is 00:38:07 honestly yeah I think compute today fairly reasonable answer yeah yeah I mean honestly I do think kind of your question of prioritization right it's like in a vacuum any of these things you would love to like go and excel and win at I think the danger is
Starting point is 00:38:23 you end up like second place at everything and you know not like you know clearly leading anything. So I think prioritization is important, right? And you need to make sure there's some things you're clear-eyed on. This is the thing that we need to win. Yeah. But I think it makes sense to talk about it for just a little bit more, which is compute sets so much of, compute as destiny in a way, right, at a research organization like Open AI. And so a couple of years ago, I think it became very fashionable to say, oh, okay, we're not going to be compute constrained anytime soon because there's a bunch of CMs that are, you know, people are discovering
Starting point is 00:39:00 and we're going to get more efficient and all the algorithms are going to get better. And then eventually, like, really, we'll just be in a data-constrained regime. And it seems like, you know, a couple of years have come and gone, and we're still, like, this is sort of very compute-constrained environment. Does that change anytime soon, you think? I mean, I think, like, we've seen for long enough, like, how much we can do with compute. But, yeah, I haven't really bought that much into the, like, will-be-data-constrained claim. And, yeah, I don't expect that to change.
Starting point is 00:39:35 Yeah, anyone who says that should just step into my job for a week. There's no one who's like, oh, you know, I have all the compute that I need. Right. Yeah. You know, historically, the job of advancing fundamental research has historically been largely a mandate that universities have had. partly for the compute reasons you just described, that hasn't been the case for Frontier AI. You guys have done such an incredible job
Starting point is 00:39:59 kind of channeling the arc of frontier AI progress to help the sciences out. And I'm wondering when those worlds collide, the fundamental world of university research today and the world of Frontier AI, what comes out? So I guess I personally started as a resident at OpenAI, and it's a program that we had for people in different fields
Starting point is 00:40:21 to come in, you know, learn quickly about AI and become productive as a researcher. And I think there is a lot of really powerful elements in that program. And, you know, the idea is just like, you know, could we accelerate something that looks like a PhD in as little time as possible? And I think a lot of that just looks like
Starting point is 00:40:43 implementing a lot of, you know, very core results. And, you know, through doing that, you're going to make mistakes. You're going to be like, oh, wow, like build intuition for if I, you know, set this wrong, like, that's going to blow up my network in this way. And so you just need a lot of that hands-on experience. I think over time, you know, there have been curriculums developed at probably all of these large labs in, like, optimization, in architecture, in RL, and yeah, probably no better way than
Starting point is 00:41:12 to just kind of try to implement a lot of those things and read about them and think critically about them. Yeah. Yeah, I think maybe like one other nice thing that you get to experience at academia is like, yeah, just like persistence, right? Of like, oh, you know, you have a few years and you're kind of trying to solve a problem and it's a hard problem and you've never dealt with such a hard problem before. And yeah, I do feel like this is a thing that's like, well, currently the pace of progress is very fast. maybe also the ideas tend to work out a little bit more often than they did in the past because deep learning just wants to learn
Starting point is 00:41:54 and getting your hands on a more challenging problem for a little bit maybe being part of a team attacking like an ambitious challenge and getting that feeling of what it feels like to be stacked and what it feels like to finally be making progress I think is also something that's like very useful to learn. How does external perception, reception of a particular product launch, impact how you prioritize something? Is it to the extent where, you know, perception and usage, in the case where they're married, obviously, there's probably a clear directive there, but in a case where maybe they're divorced a bit, does that impact how you think about roadmap or where you emphasize resources?
Starting point is 00:42:42 So we generally have some pretty strong convictions about the future, and so we don't tie them that closely to the short-term reception of our products, right? Like, of course, we learn based on what is going on. We read other papers, and we look at what other labs are working on, but generally, like, we act from a place of a fairly strong belief in... in what we're building. And so, of course, like, you know, that is for like our long-term research program, of course, when it comes to product, right? Like, I think the cycle of iteration is much, much faster. Yeah.
Starting point is 00:43:30 I think, you know, with every launch, you know, we are trying to aim it to be something that's wildly successful on the product side. And, you know, I think from a fundamental research perspective, we're trying to create create models with all of the kind of core capabilities needed to build a very rich set of experiences and products. And there are going to be people who have some vision of like one particular thing they could build and we'll launch it and everything we launch, we really hope it goes wildly successful.
Starting point is 00:44:01 And we get that feedback and if it's not, like we'll kind of shape our product strategy a little bit. But yeah, we are definitely also in the business of launching very useful, wildly successful products. It feels like because of the sort of completely unbridled pace of progress that we've just spent a lot of time talking about, a lot is going to change over the next two years, right? It gets really hard to predict, I imagine, 10 years out, let alone 10 months out.
Starting point is 00:44:31 And so my question, I guess, is through all that change that the frontier of AI is going to bring, what are some priors that you actually think should stay constant? Is there anything? Well, one clearly is that we don't have enough compute. Is there anything else that you think doesn't change that you think would be strong, reasonably held priors as constant? I think more broadly than compute, there is physical constraints of, well, energy,
Starting point is 00:44:59 but also, like, you know, at some point not too far, like robotics will become a major focus. And so I think thinking about the physical constraints is going to remain important. But yeah, I do think on the intelligence front, I would not make too many assumptions. Very few startups can get to the scale that you have, both from a employee perspective but also revenue count
Starting point is 00:45:30 and maintain that breakneck speed that you probably add, I mean, seven, eight years ago when you, when you both joined, what's the secret sauce to doing that? And how do you continue to maintain this pressure almost to ship as quickly as possible, even though, you know, you're kind of on, you know, top now? I think one of the clearest markers that we have really good research culture, at least in my mind, is, you know, I've worked at different companies before. And there is a real thing, which is a learning plateau, right? You go to a company, you learn a lot for the first. one or two years and then you just find kind of like, you know, I know how to be fairly efficient in this framework and my learning kind of stops. And I've really never felt that at OpenAid.
Starting point is 00:46:15 Just like that experience you described of all these really cool results bubbling up. You're just learning so much over week. And it is a full-time job to kind of stay on top of all of it. And that's just been very fulfilling. So yeah, no, I think that's a very accurate design. description. We just want to generate a lot of really high quality research and it's almost a good thing, like if you're generating enough that you're barely able to keep on top of it. Yeah, exactly. I think the develop of technology, I think, is a driving force here where, you know, maybe, yeah, maybe we would kind of become comfortable after like a few years working in a given paradigm,
Starting point is 00:46:57 but we are always on the cusp of the, you know, new thing and, you know, trying to reconfigure are thinking around the kind of new constraints and new possibilities that we're going to be faced with. And so I think that kind of creates this feeling of constant change and the mindset of like always kind of learning the new thing. Well, you know, one thing that came up in our research about things that opening that have not changed through a lot of the change is the trust that the two of you guys have in each other. Because that case, I think there was an article or profile of you guys recently in the MIT Tech Review, and that was also one of the highlight themes that your
Starting point is 00:47:35 chemistry, your trust with each other, your oppose something a lot of the people at opening I've come to treat as a constant. So what's the backstory? How did you guys build trust there? How did that happen? How do that happen? Have you ever seen that when Harry met Sally? I feel like you're on the couch and now you got to talk about... What's your meet cute? Yeah, exactly. Well, I do think, you know,
Starting point is 00:48:05 we started working together a little bit more closely when we kind of had the first seeds of working on reasoning. I think, you know, we... At the time, you know, that wasn't a very popular research direction to work on. And I think both of us kind of saw glimmers of hope there. And, you know, we were kind of pushing in... um in this direction kind of figuring out how to make our all work and um yeah i think over time kind of growing a very small effort into increasing larger effort and um and i think that's kind of where
Starting point is 00:48:42 i um yeah really got to kind of work with yakub in depth i think um i he's just really a phenomenal researcher i think you know any of these rank lists like he should be number one um like Just his ability to take any very difficult technical challenge and almost like personally just kind of think about it for two weeks and just crush it. It's incredible that he has kind of the wide range that he does in terms of understanding, as well as that kind of depth that you can go and just personally solve a lot of these technical challenges. Now you get to say some nice stuff about him. I'm just saying anything nice about me. Thanks, Mark. Yeah, yeah. I think the big kind of the first like big thing that we did together was like we started seeing like, okay, like we think this algorithm is going to work. And so, you know, I was thinking like, okay, like how do we, you know, direct people at this? And we're talking with Mark like, oh, we should establish a team that's actually going to make this work. And then, you know, Mark and Mark went and actually did this, right? Like actually kind of like got a group of like people working on very different things, like got them all together and created a team with like incredible.
Starting point is 00:49:55 chemistry out of like this whole third group and that was like such an impressive thing to me. And yeah, I'm really grateful and as far that I kind of get to, you know, work with Mark and kind of experience that. Yeah, I think this incredible capacity to both, you know, understand and engage and and, you know, think about the technical matter of the research itself, but then coupled with this like great ability to lead and inspire teams and create an organizational structure that, you know, in this whole kind of mess of chaotic directions, actually like, is coherent and able to gel together. Yeah, very, very inspiring.
Starting point is 00:50:39 That's awesome. Well, on that note, yeah. Great note to end on. Yeah. Some of the greatest discoveries in science, especially in physics, have often come from a pair of collaborators, often across universities, across fields. And it seems like you guys have now added to that tradition. And so we're just super grateful that you guys made the time to chat today.
Starting point is 00:51:01 Thanks for coming by. Thank you. Thanks for being with us. Thanks for listening to this episode of the A16Z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcast, and Spotify. Follow us on X at A16Z and subscribe to our substack.
Starting point is 00:51:24 at a16Z.substack.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only. It should not be taken as legal business, tax, or investment advice, or be used to evaluate any investment or security and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16Z.com forward slash disclosures.
