The a16z Show - From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki

Starting point is 00:00:00 The big thing that we are targeting is producing an automated researcher. So automating the discovery of new ideas. The next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant. I was talking to some high schoolers and they were saying, oh, you know, actually the default way to code is vibe coding. I do think, you know, the future hopefully will be vibe researching. What does it take to build an automated researcher and get AI to discover new ideas on its own? OpenAI's chief scientist, Jakub Pachocki, and chief research officer, Mark Chen, joined a16z general partners Anjney Midha and Sarah Wang to unpack GPT-5's reasoning push,

Starting point is 00:00:39 why evals must shift to economically meaningful benchmarks, and the march toward an automated researcher. We get into long-horizon agency, why RL keeps working, the new Codex for real-world coding, research culture versus product, and why, for now, compute is destiny. Let's get into it. Thanks for coming, Jakub and Mark. Jakub, you are the chief scientist at OpenAI. Mark, you are the chief research officer at OpenAI, and you guys have both the privilege and the stress

Starting point is 00:01:09 of running probably one of the most high-profile research teams in AI. And so we're just really stoked to talk with you about a whole bunch of things we've been curious about, including GPT-5, which was one of the most exciting updates to come out of OpenAI in recent times, and then, stepping back, how you build a research team that can do not just GPT-5, but Codex and ChatGPT and an API business, and can weave all of the many different bets you guys have across modalities and product form factors into one coherent research culture and story.

Starting point is 00:01:43 And so to kick things off, why don't we start with GPT-5? Just tell us a little bit about the GPT-5 launch from your perspective. How did it go? So I think GPT-5 was really our attempt to bring reasoning into the mainstream. Prior to GPT-5, we had two different series of models. You had the GPT-2, GPT-3, GPT-4 series, which were these instant-response models. And then we had the o-series, which essentially thought for a very long time

Starting point is 00:02:12 and then gave you the best answer that it could give. So tactically, we don't want our users to be puzzled about which mode they should use, and that involves a lot of research into identifying the right amount of thinking for any particular prompt, and taking that pain away from the user. So we think the future is more and more about reasoning, and more and more about agents.

Starting point is 00:02:35 And we think GPT-5 is a step toward delivering reasoning and more agentic behavior by default. There are also a number of improvements across the board in this model relative to all of our previous models, but our primary thesis for this launch was indeed bringing reasoning to more people. Can you say more about how you guys think about evals? I noticed even in that launch video, there were a number of evals where you're inching up from, you know, 98 to 99 percent,

Starting point is 00:03:05 and that's kind of how you know you've saturated the eval. What approach do you guys take to measuring progress, and how do you think about it? One thing is that, indeed, the evals that we've been using for the last few years are pretty close to saturated. And so for a lot of them, inching from 96 to 98 percent is not necessarily the most important thing in the world. I think another thing is maybe even more important, but a little bit subtler. When we were in this GPT-2,

Starting point is 00:03:32 GPT-3, GPT-4 era, there was kind of one recipe. You just pre-trained a model on a lot of data, and you used these evals as a yardstick of how this generalizes to different tasks. Now we have these different ways of training, in particular reinforcement learning on serious reasoning, where we can pick a domain

Starting point is 00:03:52 and we can really train a model to become an expert in this domain and to reason very hard about it. That lets us target particular kinds of tasks, which means that we can get extremely good performance on some evals, but it doesn't indicate as much generalization to other things. So in this world, we definitely think we are in a bit of a deficit of great evaluations. And I think the big things that we look at are actual marks of the model being able to discover new things. For me, the most exciting trend and actual sign of progress this year has been our models' performance in

Starting point is 00:04:29 math and programming competitions, although I think they are also becoming saturated in a sense. And the next set of evals and milestones that we're looking at will involve actual discovery and actual movement on things that are economically relevant. Totally. You guys already got number two in the AtCoder competition, so there's only number one left. Yeah. Yeah. I mean, I think it is important to note that these evals, like IOI, AtCoder, and IMO, are actually real-world markers for success in future research. A lot of the best researchers in the world have gone through these competitions and gotten very good results. And yeah, I think we are kind of preparing for this frontier where we're trying

Starting point is 00:05:11 to get our models to discover new things. Yeah. Very exciting. Which capability of GPT-5 surprised you the most before the release, when you were working through the eval bench or using it internally? Were there any moments where you felt like this was starting to get good enough to release because it was useful in your daily usage? I think one big thing for me was just how much it moved the frontier in very hard sciences. We would try the models with some of our friends who are professional physicists or professional mathematicians. And you already saw some instances of this on Twitter

Starting point is 00:05:47 where you can take a problem and have it discover, maybe not very complicated new mathematics, but some non-trivial new mathematics. And we see physicists and mathematicians repeating this experience over and over, where they're trying GPT-5 Pro and saying, wow, this is something that previous versions of the models couldn't do.

Starting point is 00:06:08 And it is a little bit of a light-bulb moment for them. It's able to automate what could take one of their students months of time. GPT-5 is a definite improvement on o3. For me, o3 was definitely the moment where the reasoning models became actually very useful on a daily basis, especially for working through a math formula or a derivation.

Starting point is 00:06:34 It actually got to a level where it is fairly trustworthy. I can actually use it as a tool for my work. And yeah, I think it is very exciting to get to that moment. But now, as we're seeing these models actually able to automate, like we're saying, solving contest problems over longer time horizons, I expect that that was quite small compared to what's coming over the next year. What is coming in the next one to five years? At whatever level you're comfortable sharing, what does the research roadmap look like? So the big thing that we are targeting with our research is producing an automated researcher.

Starting point is 00:07:09 So, automating the discovery of new ideas. And of course, a particular thing we think about a lot is automating our own work, automating ML research, but that can get a little bit self-referential. So we're also thinking about automating progress in other sciences. And I think one good way to measure progress there is looking at the time horizon over which these models can actually reason and make progress. Now, as we get to a level of near mastery of these high school competitions, let's say, I would say we get to maybe on the order of one to five hours of reasoning. And so we are focused on extending that horizon, both in terms of the model's capability to plan over very long horizons and its ability to retain memory.

Starting point is 00:07:56 And back to the evals question, that's why I think evals of the form "how long does this model operate autonomously?" are of particular interest to us. And maybe on that topic, there's been this huge move toward agency in model development. But at least in its current state, users have observed a tradeoff: too many tools or planning hops can result in quality regressions, whereas with something that has a little less agency, the quality is, at least today, observed to be a bit higher. How do you guys think about the tradeoff between stability and depth? The more steps the model undertakes, maybe the less likely the 10th step is to be accurate, versus when you ask it to do one thing,

Starting point is 00:08:39 it can do it very, very well. And between getting it to keep doing that one thing better and better versus doing more complex things, there's that tradeoff. But of course, to get to full autonomy, you are taking multiple steps and using multiple tools. I think, actually, the ability to maintain depth

Starting point is 00:08:52 is largely about being consistent over long horizons. Yeah. So I think these are very related problems. And in fact, with the reasoning models, we have seen the models greatly extend the length over which they are able to reason and work reliably without going off track. Yeah, I think this remains a big area of focus for us.

Starting point is 00:09:13 Yeah, and I think reasoning is core to this ability to operate over a long horizon. Because you can imagine yourself solving a math problem, where you try an approach and it doesn't work. And you have to think about what the next approach you're going to take is. What were the mistakes in the first approach? And then you try another thing. And the world gives you some hard feedback, right? And then you keep trying different approaches. The ability to do that over a long period of time is reasoning, and it gives agents that robustness.

Starting point is 00:09:38 We talked a lot about math and science. I was curious to get your take: do you think some of the progress that we've made can extend similarly to domains that are less verifiable, where there's less of an explicit right or wrong? Oh, yeah, this is a question I really like. I think if you truly want to extend to research and discovering ideas that meaningfully advance technology on the scale of months or years, these questions stop being so different, right? It is one thing to solve a very well-posed, constrained problem on the scale

Starting point is 00:10:17 of an hour, right? There's a finite amount of ideas you need to look through, and that might feel extremely different from solving something very open-ended. But even if you want to solve a very well-defined problem that is on a much longer scale, right? Like, prove this Millennium Prize problem. Well, that suddenly requires you to think about which fields of mathematics or other sciences might possibly be relevant. Is there inspiration from physics that I should take? What is the entire program that I want to develop around this?

Starting point is 00:10:44 Now these become very open-ended questions, and it's actually hard even for our own research, right? Even if all we cared about was reducing the modeling loss on a given dataset, measuring whether we're actually asking the right questions in research becomes a fairly open-ended affair. Yeah, and I think it also makes sense to think about what the limits of open-endedness are. A while back, Sam tweeted about some of the improvements that we were making in having our models write more creatively. And we do consider the extremes here as well.

Starting point is 00:11:17 Right, right. Let's talk about RL, because it seems like since o1 came out, RL has been the gift that keeps giving. Every couple of months, OpenAI puts out a release, and everyone goes, oh, that's great, but this RL thing is going to plateau. We're going to saturate the evals, the models won't generalize, or there's going to be mode collapse because of too much synthetic data, or whatever.

Starting point is 00:11:41 Everybody's got a laundry list of reasons to believe that the gains in performance from RL are going to tap out. And somehow they just don't. You guys just keep coming out with continuous improvements. Why is RL working so well? And what, if anything, has surprised you about how well it works? RL is a very versatile method, right? And there are a lot of ideas you can explore once you have an RL system working. A long time ago at OpenAI, we started from this, before language models,

Starting point is 00:12:11 right? We were thinking, oh, okay, RL is this extremely powerful thing, of course, on top of deep learning, which is this incredible general learning method. But the thing that we struggled with for a very long time is: what is the environment? How do we actually anchor these models to the real world? Or should we, you know, simulate some island where they all learn to collaborate and compete? And then, of course, came the language modeling breakthrough, right? And we saw that if we scale deep learning on modeling natural language, we can create models with this incredibly nuanced understanding of human language.

Starting point is 00:12:43 And so since then we've been seeking how to combine these paradigms and how to get RL to work on natural language. And once you do, you have the ability to actually execute on these different ideas and objectives in this extremely robust, rich environment given by pre-training. And so, yes, I think it's been perhaps the most exciting period in our research over the last few years, where we've really found so many new directions and promising ideas that all seem to be working out, and we're trying to understand how to compare them.

Starting point is 00:13:14 One of the hardest things about RL for folks who are not practitioners of RL is the idea of crafting the right reward model. So, especially if you're a business or an enterprise that wants to harness all this amazing progress you guys are putting out but doesn't even know where to start, what do the next few years look like for a company like that?

Starting point is 00:13:32 What is the right mindset for somebody who's trying to make sense of RL and craft the right reward model? Is there anything you've learned about best practices, or an approach to thinking about using this latest family of reasoning techniques? What is the right way I should think about even approaching reward modeling as a biologist or a physicist? I expect this will evolve quite rapidly.

Starting point is 00:13:58 I expect it will become simpler, right? I think maybe two years ago we would have been talking about the right way to craft my fine-tuning dataset, and I don't think we are at the end of that evolution yet. And I think we will be inching toward more and more human-like learning, which RL still isn't quite. So I think maybe the most important part of the mindset is to not assume that what is now will be it forever. So I want to bring the conversation back to coding. We would be remiss not to say

Starting point is 00:14:24 congrats on GPT-5-Codex, which just dropped today. Can you guys say a little bit more about what's different about it, how it's trained differently, and maybe why you're excited about it? So I think one of the big focuses of the Codex team is to take the raw intelligence that we have from our reasoning models and make it very useful for real-world coding. A lot of the work they've done is consistent with this. They are working on having the model be able to handle more difficult environments. We know that real-world coding is very messy, so they're trying to handle all the intricacies there. There's a lot of coding that has to do with style,

Starting point is 00:15:04 with softer things, like how proactive the model is and how lazy it is, and being able to define, in some sense, a spec for how a coding model should behave. They do a lot of very strong work there. And it seems like they're also working on much better presets. Coders have some kind of notion of, this is how long

Starting point is 00:15:28 I'm willing to wait for a particular solution. I think we've done a lot of work to dial in on, you know, being much lower latency for easy problems. For harder problems, actually, the right thing is to be even higher latency, to get you the really best solution. And just being able to find that preset is very important. What's the sweet spot, if you were to say, for easier problems versus harder ones? What we found is that the previous generation of the Codex models were spending too little time solving the hardest problems and too much time solving the easy problems. And I think that is probably just what you might get out of the box from o3.

Starting point is 00:16:06 Maybe just on the topic of coding, since you guys were both competitive coders in prior lives. I know you've been at OpenAI for a decade now, but I was struck by the story of Lee Sedol, the Go player, who famously quit Go after he lost to AlphaGo multiple times. And I think in a recent interview, you guys were both saying that now the coding models are better than you are, and that gets you excited. But say more about that. And how much would you say you code now?

Starting point is 00:16:37 Or, if you're not hands on keyboard, you can talk about OpenAI generally: how much code is written by AI now? In terms of coding models being better, I mean, yeah, I think it is extremely exciting to see this progress. I think the programming competitions are a nice, encapsulated test

Starting point is 00:16:54 of the ability to come up with some new ideas in this boxed environment and time frame. I do think if you look at things like, well, I guess IMO problem six, or maybe some of the very hardest programming competition problems, there's still a little bit of headroom for the models, but I wouldn't expect that to last very long. I do code a little bit.

Starting point is 00:17:22 Historically, I've been like... He's being humble. Historically, I've actually been really reluctant to use any sort of tools. I just used Vim, pretty much. Oh, yeah. Okay. Old school. Yeah. Eventually, I think, especially with the latest coding tools like GPT-5, I've really kind of felt like, okay,

Starting point is 00:17:46 this is no longer the way. You can do a 30-file refactor pretty much perfectly in like 15 minutes; you kind of have to use it. Yeah, and so I've been learning this new way of coding, which definitely feels a little bit different. I think it is still a bit of an uncanny valley right now, where you kind of have to use it because it is accelerating so many things, but it's still not quite as good as a co-worker.

Starting point is 00:18:20 So, you know, I think our priority is getting out of that uncanny valley. But yeah, it's definitely an interesting time. Yeah, definitely. To speak to the Lee Sedol moment, I think AlphaGo for both of us was a very formative milestone in AI development. At least for me, it was the reason I started working on this in the first place. And maybe partly because of our backgrounds in competitive programming, I had this affinity for building models that could do very, very well in these forms of contests.

Starting point is 00:18:55 And going from solving eighth-grade math problems to, a year later, hitting our level of performance in these coding contests, it's crazy to see that progression. And you kind of imagine, or like to think, that you feel some of the same feelings Lee Sedol felt too, right? It's like, wow, this is really crazy, right? What are the possibilities? This is something that took me decades to do, and it took a lot of hard work to get to the forefront of. So you really do feel the implication: what can't these models do?

Starting point is 00:19:31 Right? And I do feel like it's already transformed the default for coding. This past weekend, I was talking to some high schoolers and they were saying, oh, you know, actually the default way to code is vibe coding. I think they would consider that maybe sometimes, for completeness, you would go and actually do all of the mechanics of coding it from scratch yourself, but that's just a strange concept to them. Like, why would you do that? You just vibe code by default. Yeah, yeah.

Starting point is 00:19:59 And so, yeah, I mean, I do think, you know, the future hopefully will be vibe researching. Yeah. I have a question about that, which is: what makes a great researcher, right? When you say vibe researching, a big part of vibe coding is just having good taste and wanting to build something useful and interesting for the world. And I think what's so awesome about tools like Codex is that if you've got a good intuition for what people want, it helps you articulate that and then actualize a prototype very fast. With research, what's the analog? What makes a great researcher?

Starting point is 00:20:36 Persistence is a very key trait, right? I think a special thing about research is that you're trying to create something or learn something that is just not known, right? It's not known to work. You don't know whether it will work. And so you're always trying something that will most likely fail. And I think it's about getting to a place where you are in the mindset of being ready to fail and being ready to learn from these failures.

Starting point is 00:21:08 And of course with that comes creating clear hypotheses and being extremely honest with yourself about how you're doing on them. I think a trap many people fall into is going out of their way to prove that it works, right? Which is quite different from believing in your idea, which I think is extremely important, right? You want to persist with it, but you have to be honest with yourself about when it's working and when it's not

Starting point is 00:21:31 so that you can learn and adjust. Yeah, I think there are just very few shortcuts for experience. Through experience, you learn the right horizon to be thinking about a problem on: you can't pick something that's too hard, but it's not satisfying to do something that's too easy. And I think a lot of research is managing your own emotions over a long period of time, too.

Starting point is 00:21:53 There are just going to be a lot of things you try that aren't going to work. Sometimes you need to know when to persevere through that, and sometimes when to switch to a different problem. And I think interestingness is something you try to develop through reading good papers and talking to your colleagues, and you maybe distill their experience into your own process. When I was in grad school (I'm a failed machine learning researcher; I was in grad school for bioinformatics), a big part of my research advisor's thrust

Starting point is 00:22:29 was about picking the right problems to work on, such that you could then sustain and persist through the hard times. And you said something interesting, which was that there's a difference between having conviction in an idea and being maximally truth-seeking about when it's not working, and both of those things are sometimes in tension

Starting point is 00:22:45 because you kind of go native on a topic or a problem that you have deep conviction in. Are there any heuristics you've found useful at the taste step, the problem-picking step, that help you arrive at the right set of problems, where conviction and truth-seeking are not as much in zero-sum tension as with other kinds of problems? Yeah, to be clear, I don't think conviction and truth-seeking are really in zero-sum tension. I think you can have a lot of belief in an idea, and you can be very persistent with it while it's not working. I think it's just important that you're honest with yourself

Starting point is 00:23:20 about how much progress you're making, and that you're in a mindset where you're able to learn from the failures along the way. I think it's important to look for problems that you really care about and really believe are important, right? One thing I've observed in many researchers who inspired me has been really going after the hard problems: looking at the questions that are widely known but not really considered tractable, and just asking, why are they not tractable? What about this approach? Why does this approach fail? You're always thinking about what is really the barrier for the next step.

Starting point is 00:24:04 If you're going after problems that you truly believe are important, right, then it's so much easier to find the motivation to persist with them over years. And during the training phase of GPT-5, for example, were there any moments where there was a hard problem, the initial attempts to crack that problem weren't working, and yet somebody persisted through it? And what was it about any of those stories

Starting point is 00:24:37 that comes to mind that worked well, that you wish other people and other researchers did more of? I think on the path there, along the sequence of models, both the pre-trained models and the reasoning models, one very common theme is bugs.

Starting point is 00:24:58 Both silly bugs in software that can stay in your code for months and invalidate all your experiments a little bit in a way that you don't know about, and identifying them can be a very meaningful breakthrough for your research program.

Starting point is 00:25:16 But also bugs in the sense that you have a particular way of thinking about something, and that way is a little bit skewed, which causes you to make the wrong assumptions; identifying those wrong assumptions means rethinking frames from scratch. Both for getting the first reasoning models working and for getting the larger pre-trained models working, I think we've had multiple issues like that we've had to work through.

Starting point is 00:25:42 As leaders of the research org, how do you think about what it takes to keep the best talent on your team and, on the flip side, create a very resilient org that doesn't crumble if a key person leaves? I think the biggest thing OpenAI has going for it in terms of keeping the best people motivated and excited is that we are in the business of doing fundamental research, right? We aren't the type of company that looks around and says, oh, what model did company X build, or what model did company Y build? We have a fairly clear and crisp definition of what it is we're out to build. We like innovating at the frontier.

Starting point is 00:26:25 We really don't like copying. And I think people are inspired by that mission, right? You are really in the business of discovering new things about the deep learning stack, and I think we're building something very exciting together. Beyond that, a lot of it is creating a very good culture. We want a good pipeline for training people up to become very good researchers. We have, I think, historically hired the best and most innovative talent. So I just think we have a very deep bench as well.

Starting point is 00:27:03 And yeah, I think most of our leaders are very inspired by the mission, and that's what's kept all of them there. When I look at my direct reports, they haven't been affected by the talent wars. I was chatting with a researcher recently, and he was talking about wanting to find the cave dwellers. These are often the people who are not posting on social media about their work. For whatever reason, they may not even be publishing. They're in the background doing the work. I don't know if you would agree with this concept, but how do you guys hire researchers?

Starting point is 00:27:37 And are there any non-obvious ways you look for talent, or attributes you look for that are non-obvious? So I think one thing that we look for is having solved hard problems in any field. A lot of our most successful researchers started their journey with deep learning at OpenAI and had worked in other fields, like physics, theoretical computer science, or finance, in the past. We look for strong technical fundamentals coupled with the intent to work on very ambitious problems and actually stick with them. We don't purely look for who did the most visible work or who is the most visible on social media. As you were talking, I was thinking back to when I was a founder and I was running my

Starting point is 00:28:33 own company and we would recruit great engineering talent. Many of the attributes you described were ones that were on my mind then. And Elon recently tweeted that he thinks this whole researcher-versus-engineer distinction is silly. Is he just being semantically nitpicky, or do you think these two things are more similar than they look? Yeah, I mean, I do think researchers don't just fit one shape. We have certain researchers at OpenAI who are very productive and who are just so good

Starting point is 00:29:05 at idea generation, and they don't necessarily need to show great impact through implementing all of their ideas, right? I think there's so much alpha they generate in just coming up with, oh, let's try this, or let's try that, or maybe we should think about that. And there are other researchers who are just very, very efficient at taking one idea and rigorously exploring the space of experiments around it. So I think researchers come in very different forms. Maybe that first type wouldn't necessarily map into the same

Starting point is 00:29:36 bucket as a great engineer, but, you know, we do kind of try to have a fairly diverse set of research tastes and styles. Yeah. And say a little bit about what it takes to create a frontier, winning culture that can attract researchers of all kinds of shapes, and then actually grow them, help them thrive, and make them win together at scale. What do you think are the most critical ingredients of a winning culture? So I think actually the most important thing is just to make sure you protect fundamental research, right?

Starting point is 00:30:15 I think, with so many different companies these days, it's easy to get into a world where you're just thinking about, oh, how do I compete on, you know, a chat product or some other kind of product surface? And you need to make sure that you recognize research for what it is and give researchers the space to do it, right? You can't have them being pulled in all of these different product directions. So I think that's one thing that we pay attention to within our culture. Especially now that there's so much spotlight on OpenAI,

Starting point is 00:30:48 so much spotlight on AI in general, and so much competition between different labs, it would be easy to fall into a mindset of, oh, we're racing to beat this latest release or something. And, you know, there are definitely areas where people start looking over their shoulder and thinking about, oh, what are these other things? And I see it as a large part of our job to make sure that people have the comfort and space to think about, you know, what are things actually going to look like in a year or two? What are the actually big research questions that we want to answer?

Starting point is 00:31:28 And how do we actually get to models that vastly outperform what we see currently, rather than just iteratively improving in the current paradigm? Just to pull on that thread of protecting fundamental research: you guys are obviously one of the best research organizations in the world, but you're also one of the best product companies in the world. How do you balance, especially now that you've brought on some of the best product execs in the world as well, that focus between the two

Starting point is 00:31:54 and, while protecting fundamental research, also continue to move forward the great products that you have out? Yeah. I mean, I think it's about delineating a set of researchers who really do care about product and who really want to be accountable for the success of the product. And they should, of course, coordinate very closely with the research work at large. But I think people understanding their mandates and what they are rewarded for is a very important thing. One other thing that is

Starting point is 00:32:26 also helpful is that our product team and broader company leadership are bought into this vision, right, of where we are going with research. And so, you know, nobody is assuming that, oh, the product we have now is the product we'll have forever and we'll just kind of wait for new versions from research. We are able to think jointly about what the future looks like. One of the things that you guys have done is let such a diversity of different ideas and bets flourish inside of OpenAI that you then have to figure out some ways as research leaders

Starting point is 00:33:04 to make it all make coherent sense as one part of a roadmap. You've got people over here investigating the future of diffusion models and visual media. And over here you've got folks investigating the future of reasoning when it comes to code. How do you paint a coherent picture of all that? How does that all come together?

Starting point is 00:33:25 When there might be, at least naively, some tension between giving researchers the independence to go do fundamental research and then somehow making that all fit into one coherent research program. The stated goal of our research program has been getting to an automated researcher for a couple of years now. And so we've been building most of our projects with this goal in mind. This still leaves a lot of room for bottom-up idea generation for fundamental

Starting point is 00:34:01 research in various domains, but we are always thinking about how these ideas come together eventually. We believe, for example, that reasoning models go much further, and we have a lot of explorations on things that are not directly reasoning models, but we are thinking a lot about how they eventually combine and what this kind of innovation will look like once you have something that is out there thinking for months about a very hard problem. So I think this clarity about our long-term objectives is important. But it doesn't mean that we are prescriptive about, oh, here are all the little pieces, right?

Starting point is 00:34:43 Like, we definitely view this as a question of exploration and learning about these technologies. Yeah, I think you want to be opinionated and prescriptive at a very coarse level, but, you know, a lot of ideas can bubble up at a finer level. And have there been any moments where those things have been in tension at all recently? Well, one provocative example could be that recently, you know, this new image model came out, Nano Banana, right, from Google. It's shown extraordinary value: lots of everyday people can unlock a lot of creativity when these models are good at understanding editing prompts. And I could see how that would create some tension for a research

Starting point is 00:35:23 program that may not be prioritizing that as directly. If somebody talented on your team came and said, guys, this thing is so clearly valuable in the world out there, we should be spending more effort, more energy on this, how do you reason about that question? I think that's definitely a question that we've been thinking about for quite a while at OpenAI.

Starting point is 00:35:43 I mean, if you look at GPT-3, right, once we kind of saw, oh, this is where language models are going, we definitely had a lot of discussions about, well, clearly there are going to be so many magical things you can do with AI, right? You will be able to go to these extremely smart models that are out there pushing the frontiers of science, but you will also have this incredible media generation and these incredibly, you know, transformative

Starting point is 00:36:11 entertainment applications. And so how we prioritize among all these directions has definitely been something we've been thinking about for quite a while. Yeah, absolutely. And the real answer is that we don't discourage someone from being really excited by that. It's just that if we're consistent in our prioritization and our product strategy, then it will naturally fall into place. And so for us, we do encourage a lot of people to be excited about, you know, building this, you know, we're building kind

Starting point is 00:36:47 of agentic products, you know, whatever kinds of products they're excited by. But I think it's important for us to also have a separate group of people whom we protect, whose goal is to create the algorithmic advances. How does that translate, just to build on Anjney's question, into a concrete framework around resourcing? Like, do you think about, okay, X percent of compute resources will go to longer-term, you know, very important but maybe a bit more pie-in-the-sky exploration, versus, obviously, current product inference, and then there's sort of this thing in the middle that's achievable in the short to medium term? Yeah. So I think that's a big part of both of our jobs, you know, just this portfolio management question of how much compute you give to which project.

Starting point is 00:37:33 And I think historically we've put a little bit more on the core algorithmic advances versus the product research. But it's something that you have to feel out over time, right? It's dynamic; month to month, there could be different needs. And so I think it's important to stay fairly flexible on that. And if you had 10% more resources,

Starting point is 00:37:55 would you put it toward compute, or is it data curation, people? Where would you stick that, on the margin? Good question, honestly. Yeah, I think compute. Compute today. Fairly reasonable answer. Yeah. Yeah, I mean, honestly,

Starting point is 00:38:15 I do think, to your question of prioritization, right, in a vacuum, any of these things you would love to go and excel and win at. I think the danger is you end up in second place at everything and not clearly leading at anything. So I think prioritization is important, right? And you need to make sure there are some things you're clear-eyed on: this is the thing that we need to win. Yeah. But I think it makes sense to talk about it for just a little bit more, which is that compute sets so much of, compute is

Starting point is 00:38:46 destiny in a way, right, at a research organization like OpenAI. A couple of years ago, I think it became very fashionable to say, oh, okay, we're not going to be compute-constrained anytime soon, because there's a bunch of CMs that, you know, people are discovering, and we're going to get more efficient and all the algorithms are going to get better. And then eventually we'll really just be in a data-constrained regime. And it seems like, you know, a couple of years have come and gone, and we're still in this very compute-constrained environment. Does that change anytime soon, you think? Or... I mean, I think we've seen for long enough how much we can do with compute. Yeah, I think

Starting point is 00:39:26 I haven't really bought that much into the we'll-be-data-constrained claim. And yeah, I don't expect that to change. Yeah, anyone who says that should just step into my job for a week. There's no one who's like, I have all the compute that I need. Right. Yeah. You know, advancing fundamental research has historically been largely a mandate that universities have had. Partly for the compute reasons you just described, that hasn't been the case for

Starting point is 00:39:56 frontier AI. You guys have done such an incredible job channeling the arc of frontier AI progress to help the sciences out. And I'm wondering, when those worlds collide, the fundamental world of university research today and the world of frontier AI, what comes out? So I guess I personally started as a resident at OpenAI, and it's a program that we had for people in different fields to come in, learn quickly about AI, and become productive as researchers. And I think there are a lot of really powerful elements in that program.

Starting point is 00:40:34 And the idea is just: could we compress something that looks like a PhD into as little time as possible? And I think a lot of that just looks like implementing a lot of very core results. Through doing that, you're going to make mistakes. You're going to build intuition for, oh, wow, if I set this wrong, that's going to blow up my network in this way. And so you just need a lot of that

Starting point is 00:40:56 hands-on experience. I think over time, there have been curricula developed at probably all of these large labs in optimization, in architecture, in RL, and there's probably no better way than to just try to implement a lot of those things, read about them, and think critically about them. Yeah. Yeah, I think maybe one other nice thing that you get to experience in academia is persistence, right? You have a few years and you're trying to solve a problem, and it's a hard problem, and you've never dealt with such a hard problem before. And yeah, I do feel like, well, currently the pace of progress is very fast. Maybe also the ideas tend to work out a little bit more often than they did in the past because, yeah, deep learning just wants to learn.

Starting point is 00:41:56 But getting your hands on a more challenging problem for a little bit, maybe being part of a team attacking an ambitious challenge, and getting that feeling of what it feels like to be stuck and what it feels like to finally be making progress, I think is also something that's very useful to learn. How does external perception, the reception of a particular product launch, impact how you prioritize something? Is it to the extent where, you know, perception and usage, in the case where they're married,

Starting point is 00:42:34 obviously there's probably a clear directive there, but in a case where maybe they're divorced a bit, does that impact how you think about the roadmap or where you emphasize resources? So we generally have some pretty strong convictions about the future. And so we don't tie them that closely to the short-term reception of our products, right? Of course, we learn based on what is going on. We read other papers and we look at what other labs are working on. But generally, we act from a place of fairly strong belief in what we're building. And of course, that is for our long-term research program. When it comes to product,

Starting point is 00:43:25 right? Like, I think the cycle of iteration is much faster. Yeah. I think with every launch, we are aiming for something that's wildly successful on the product side. And from a fundamental research perspective, we're trying to create models with all the core capabilities needed to build a very rich set of experiences and products. And there are going to be people who have a vision of one particular thing that can be built, and we'll launch it, and with everything we launch we really hope it is wildly successful, and

Starting point is 00:44:01 we get that feedback, and if it's not, we'll kind of shape our product strategy a little bit. But yeah, we are definitely also in the business of launching very useful, wildly successful products. Yeah, it feels like, because of the completely unbridled pace of progress that we've just spent a lot of time talking about, a lot is going to change over the next two years. It gets really hard to predict, I imagine, 10 years out, let alone 10 months out. And so my question, I guess, is: through all the change that the frontier of AI is going to bring, what are some priors that you think should stay constant?

Starting point is 00:44:42 Is there anything? Well, one clearly is that we don't have enough compute. Is there anything else that you think doesn't change, that would be strong, reasonably held priors as constants? I think more broadly than compute, there are physical constraints of, well, energy, but also, you know, at some point not too far off, robotics will become a major focus. And so I think thinking about the physical constraints is going to remain important. But yeah, on the intelligence front, I would not make too many assumptions. Very few startups can get to the scale that you have, both in employee count and in revenue, and maintain the breakneck speed that you probably had, I mean, seven or eight years ago when you both joined. What's the secret sauce to doing that? And how do you continue to maintain this pressure to ship as quickly as possible, even though, you know, you're kind of on top now? I think one of the clearest markers that we have a really good research culture, at least in my mind, is, you know, I've worked at different companies before.

Starting point is 00:45:57 And there is a real thing, which is a learning plateau, right? You go to a company, you learn a lot for the first one or two years. And then you find that, you know, you know how to be fairly efficient in this framework, and your learning kind of stops. And I've really never felt that at OpenAI. It's just like that experience you described of all these really cool results bubbling up. You're learning so much week over week, and it is a full-time job to stay on top of all of it. And that's just been very fulfilling.

Starting point is 00:46:30 So, yeah, no, I think that's a very accurate description. We just want to generate a lot of really high-quality research, and it's almost a good thing if you're generating enough that you're barely able to keep on top of it. Yeah, exactly. I think the development of the technology is a driving force here, where maybe we would become comfortable after a few years working in a given paradigm, but we are always on the cusp of the new thing and trying to reconfigure our thinking around the new constraints and new possibilities that we're going to be

Starting point is 00:47:07 faced with. And so I think that creates this feeling of constant change and the mindset of always learning the new thing. Well, you know, one thing that came up in our research about things at OpenAI that have not changed through a lot of the change is the trust that the two of you have in each other. I think there was an article or profile of you guys recently in the MIT Tech Review, and one of its highlight themes was that your chemistry and your trust with each other are something a lot of the people at OpenAI have come to treat as a constant. So what's the backstory?

Starting point is 00:47:46 How did you guys build trust there? How did that happen? It's like, have you ever seen When Harry Met Sally? I feel like you're on the couch and now you've got to talk about it. What's your meet-cute? Yeah. Well, I do think, you know, we started working together a little bit more closely when we had the first seeds of working on reasoning. At the time, that wasn't a very popular research direction to work on.

Starting point is 00:48:20 And I think both of us saw glimmers of hope there. And we were pushing in this direction, figuring out how to make it work. And yeah, over time we grew a very small effort into an increasingly large effort. And I think that's where I really got to work with Jakub in depth. I think he's just really, really

Starting point is 00:48:50 really a phenomenal researcher. I think on any of these ranked lists, he should be number one. Just his ability to take any very difficult technical challenge and almost personally think about it for two weeks and just crush it. It's incredible that he has the wide range that he does in terms of understanding, as well as the depth to go and personally solve a lot of these technical challenges. Now you get to say some nice stuff about you. I'm just to say anything nice about me. Thanks, Mark.

Starting point is 00:49:26 Yeah, yeah, I think the first big thing that we did together was when we started seeing, okay, we think this algorithm is going to work. And so I was thinking, okay, how do we direct people at this? And we were talking with Mark, like, oh, we should establish a team that's actually going to make this work. And then Mark went and actually did this, right? He got a group of people working on very different things, got them all together, and created a team with incredible chemistry out of this whole group, and that was such an impressive thing to me. And yeah, I'm really grateful that I get to work with Mark and experience that.

Starting point is 00:50:11 Yeah, I think this incredible capacity to understand, engage with, and think about the technical matter of the research itself, coupled with this great ability to lead and inspire teams and create an organizational structure that, in this whole mess of chaotic directions, is actually coherent and able to gel together. Yeah, very, very inspiring. That's awesome. Well, on that note...

Starting point is 00:50:42 Great note to end on. Yeah. Some of the greatest discoveries in science, especially in physics, have often come from a pair of collaborators, often across universities, across fields. And it seems like you guys have now added to that tradition. And so we're just super grateful that you guys made the time to chat today. Thanks for coming by. Thank you.

Starting point is 00:51:02 Thanks for being with us. Thanks for listening to this episode of the A16Z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at A16Z and subscribe to our Substack at A16Z.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only.

Starting point is 00:51:33 It should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
