The a16z Show - From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki
Episode Date: September 25, 2025

What comes after vibe coding? Maybe vibe researching.

OpenAI’s Chief Scientist, Jakub Pachocki, and Chief Research Officer, Mark Chen, join a16z general partners Anjney Midha and Sarah Wang to go deep on GPT-5: how they fused fast replies with long-horizon reasoning, how they measure progress once benchmarks saturate, and why reinforcement learning keeps surprising skeptics.

They explore agentic systems (and their stability tradeoffs), coding models that change how software gets made, and the bigger bet: an automated researcher that can generate new ideas with real economic impact. Plus: how they prioritize compute, hire “cave-dweller” talent, protect fundamental research inside a product company, and keep pace without chasing every shiny demo.

Timecodes:
0:00 Introduction & Goals of Automated Researcher
0:43 The Evolution of Reasoning in AI
1:46 Evaluations: From Benchmarks to Real-World Impact
5:15 Surprising Capabilities of GPT-5
6:56 The Research Roadmap: Next 1, 2, 5 Years
7:46 Long-Horizon Agency & Model Memory
9:44 Reasoning in Open-Ended Domains
11:18 The Role and Progress of Reinforcement Learning
13:14 Reward Modeling & Best Practices
14:21 The New Codex: Real-World Coding
16:20 AI vs. Human Coding: The New Default
20:07 What Makes a Great Researcher?
21:14 Persistence, Conviction, and Problem Selection
26:00 Building and Sustaining a Winning Research Culture
31:45 Balancing Product and Fundamental Research
39:00 The Importance of Compute and Physical Constraints
45:50 Maintaining Speed and Learning at Scale
47:18 Trust and Collaboration at OpenAI

Resources:
Find Jakub on X: https://x.com/merettm
Find Mark on X: https://x.com/markchen90
Find Sarah on X: https://x.com/sarahdingwang
Find Anjney on X: https://x.com/AnjneyMidha

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
The big thing that we are targeting is producing an automated researcher.
So automating the discovery of new ideas. The next set of evals and milestones that we're looking at
will involve actual movement on things that are economically relevant.
I was talking to some high schoolers and they're saying, oh, you know, actually the default way to code is vibe coding.
I do think, you know, the future hopefully will be vibe researching.
What does it take to build an automated researcher and get AI to discover new ideas on its own?
OpenAI's chief scientist, Jakub Pachocki, and chief research officer, Mark Chen,
join a16z general partners Anjney Midha and Sarah Wang to unpack GPT-5's reasoning push,
why evals must shift to economically meaningful benchmarks, and the march towards an automated researcher.
We get into long-horizon agency, why RL keeps working, the new Codex for real-world coding,
research culture versus product, and why, for now, compute is destiny.
Let's get into it.
Thanks for coming, Jakub and Mark.
Jakub, you are the chief scientist at OpenAI.
Mark, you are the chief research officer at OpenAI,
and you guys have both the privilege and the stress
of running probably one of the most high-profile research teams in AI.
And so we're just really stoked to talk with you about a whole bunch of things
we've been curious about, including GPT-5,
which was one of the most exciting updates to come out of OpenAI in recent times,
and then stepping back how you build a research team
that can do not just GPT-5, but Codex and ChatGPT and an API business
and can weave all of the many different bets you guys have across modalities,
across product form factors into one coherent research culture and story.
And so to kick things off, why don't we start with GPT-5?
Just tell us a little bit about the GPT-5 launch from your perspective.
How did it go?
So I think GPT-5 was really our attempt to bring reasoning into the mainstream.
And prior to GPT-5, right, we had two different series of models.
You had the GPT kind of two, three, four series,
which were kind of these instant response models.
And then we had an O series, which essentially thought for a very long time
and then gave you the best answer that it could give.
So tactically, we don't want our users to be puzzled by, you know,
which mode should I use, and it involves a lot of research
in kind of identifying what the right amount of thinking
for any particular prompt looks like,
and taking that pain away from the user.
So we think the future is about reasoning,
more and more about reasoning, more and more about agents.
And we think GPT-5 is this step
towards delivering reasoning and more agentic behavior by default.
There are also a number of improvements across the board
in this model relative to o3 and our other previous models,
but our primary thesis for this launch
was indeed bringing the reasoning mode
to more people.
Can you say more about how you guys think about evals? I noticed even in that
launch video, there were a number of evals where you're inching up from, you know, 98 to 99 percent,
and that's kind of how you know you've saturated the eval. What approach do you guys take to measuring
progress and how do you think about it?
One thing is that indeed for, like, these evals that we've
been using for the last few years, they're indeed pretty close to saturated. And so, yeah, like,
for a lot of them, like, you know, inching from like 96 to 98 percent is not necessarily
the most important thing in the world.
I think another thing that's maybe even more important,
but a little bit subtler.
When we were in this, like, GPT-2,
GPT-3, GPT-4 era, there was kind of one recipe.
You just, like, pre-train a model on a lot of data
and you kind of, like, use these evals
as just kind of a yardstick of how this generalizes
to, like, different tasks.
Now we have these different ways of training,
in particular reinforcement learning on like serious reasoning
where we can pick a domain
and we can really train a model
to become an expert in this domain to reason very hard about it, which lets us target particular
kinds of tasks, which will mean that we can get extremely good performance on some evals,
but it doesn't indicate as great generalization to other things. So the way we think about it in
this world, we definitely think, like, we are in a little bit of a deficit, like, of great
evaluations. And I think the big things that we look at are actual marks of the model being able
to discover new things. I think for me, the most
exciting trend and actual sign of progress this year has been our models' performance in
math and programming competitions. Although I think they are also becoming saturated in a sense.
And the next set of evals and milestones that we're looking at will involve actual discovery
and actual movement on things that are economically relevant.
Totally. You guys already got number two in the AtCoder competition, so there's only number
one left. Yeah. Yeah. I mean, I think it is important to note that these evals, like, you know, IOI,
AtCoder, IMO are actually real-world markers for success in future research. I think a lot of,
you know, the best researchers in the world have gone through these competitions and have gotten very
good results. And yeah, I think we are kind of preparing for this frontier where we're trying
to get our models to discover new things. Yeah. Very exciting. Which capability from GPT-5 before the
release surprised you the most when you were working through the eval bench or using it internally?
Were there any moments where you felt like this was starting to get good enough to release
because it was useful in your daily usage?
I think one big thing for me was just how much it moved the frontier in very hard sciences.
You know, we would try the models with some of our friends who are, you know, professional
physicists or professional mathematicians.
And you already saw kind of some instances of this on Twitter
where you can take a problem and have it discover,
maybe not like very complicated new mathematics,
but some non-trivial new mathematics.
And we see physicists, mathematicians,
kind of repeating this experience over and over
where they're trying GPT-5 Pro
and saying, wow, this is something that previous versions of the models
couldn't do.
And it is a little bit of a light bulb moment for them.
It's like able to automate maybe like what could take
one of their students months of time.
Well, GPT-5 is a definite improvement on o3.
For me, o3 was definitely like that moment
where the reasoning models became actually very useful
on a daily basis, I think, especially for working
through a math formula or a derivation.
Like, it actually got to a level where it is fairly trustworthy.
I can actually use it as a tool for my work.
And yeah, I think it is very exciting to get to that moment,
But I expect that, well, now as we're seeing, you know, these models like actually able to automate, well, yes, like we're saying, solving contest problems over longer time horizons.
I expect that that was quite small compared to what's coming over the next year.
What is coming in the next one to five years?
At whatever level you're comfortable sharing, what does the research roadmap look like?
So the big thing that we are targeting with our research is producing an automated researcher.
So automating the discovery of new ideas.
And, you know, of course, like a particular thing we think about a lot is automating our own work, automating ML research, but that can get a little bit self-referential.
So we're also thinking about automating progress in other sciences.
And I think, like, one good way to measure progress there is looking at, like, what is the time horizon on which these models actually can reason and make progress.
And so now as we get to a level of near mastery of these high school competitions, let's say,
I would say we get to like maybe on the order of one to five hours of reasoning.
And so we are focused on extending that horizon, both in terms of the models'
capability to plan over very long horizons and their ability to retain memory.
And back to the evals question.
That's why I think evals of the form of how long does this model autonomously operate for
are of particular interest to us.
And actually maybe on that topic, there's been this huge move toward agency in model development.
But I think at least the state that it's in currently, users have sort of observed this tradeoff between too many tools or planning hops can result in quality regressions versus something that maybe has a little bit less agency.
The quality is at least observed today to be a bit higher.
How do you guys think about the tradeoff between stability and depth?
The more steps that the model is undertaking, maybe the less likely the 10th step is to be accurate, versus you ask it to do one thing,
it can do it very, very well.
And to have it keep doing that one thing
better and better, but more complex things,
there's sort of that trade-off.
But of course, to get to full autonomy,
you are taking multiple steps,
you're using multiple tools.
I think actually, like, well, the ability to maintain depth
is a lot of it being consistent over long horizons.
Yeah.
So I think they are very related problems.
And in fact, I think like with the reasoning models,
we have seen the models like greatly extend
the length of which they are able to reason
and work reliably without going off track.
Yeah, I think this has remained a big area of focus for us.
Yeah, and I think reasoning is core to this ability to operate over a long horizon.
Because, you know, you imagine kind of yourself solving a math problem: you try an approach, it doesn't work.
And, you know, you have to think about, you know, what's the next approach I'm going to take?
What are the mistakes in the first approach?
And then you try another thing.
And, you know, the world gives you some hard feedback, right?
And then you keep trying different approaches.
And the ability to do that over a long period of time is reasoning and gives agents that robustness.
We talked a lot about math and science.
I was curious to get your take on, do you think some of the progress that we've made can actually extend similarly to domains that are less verifiable?
Where there's sort of less of an explicit right or wrong?
Oh, yeah, this is a question
I really like.
I think if you actually truly want to extend to research and, you know, discovering ideas that meaningfully advance technology on the, you know, the scale of, like, months,
years, like, I think these questions, like, stop being so different, right?
Like, it is one thing to solve, like, a very well-posed, constrained problem on the scale
of an hour, right? And there's, like, kind of a finite amount of ideas you need to look
through, and that might feel extremely different from solving something very open-ended. But, you know,
even if you want to solve, like, a very well-defined problem that is on much longer scale, right?
Like, you know, prove this Millennium Prize problem.
Well, that suddenly requires you to think about, okay, like, what are the fields of mathematics
or other sciences might possibly be relevant.
You know, are there inspirations from physics that I must take?
Like what is kind of the entire program that I want to develop around this?
Now these become very open-ended questions and it's actually hard to, you know, for our own research, right?
Like if all we cared about is, you know, reducing the modeling loss on a given data set, right?
Like measuring the progress on that, like, are we kind of actually asking the right questions in research, like, actually becomes a fairly open-ended affair.
Yeah, and I think it also makes sense to think about what the limits
of open-ended means.
I think a while back Sam tweeted about some of the improvements
that we were making in having our models write more creatively.
And we do consider the extremes here as well.
Right, right.
Let's talk about RL, because it seems like since o1 came out,
RL has been the gift that keeps giving.
Every couple of months OpenAI puts out a release,
and everyone goes, oh, that's great.
But this RL thing is going to plateau.
We're going to saturate the evals.
the models won't generalize or there's going to be mode collapse because of too much synthetic data or whatever.
Everybody's got a laundry list of reasons to believe that the gains in performance from RL are going to tap out.
And somehow they just don't.
You guys just keep coming out and putting out continuous improvements.
Why is RL working so well?
And what, if anything, has surprised you about how well it works?
RL is a very versatile method, right?
And there are a lot of ideas you can explore once you have an
RL system working. For a long time at OpenAI, we started from this, before language models,
right? Like, we were thinking about like, oh, okay, like RL is this extremely powerful thing,
of course, like, on top of deep learning, which is this, like, incredibly general learning method.
But the thing that we struggled with for a very long time is like, what is the environment,
like how do we actually anchor these models to the real world? Or like, should we, you know,
simulate some island where they all learn to collaborate and compete? And then, you know, of course,
came the language modeling
breakthrough, right? And we saw that, oh, yeah, if we scale deep learning on modeling natural
language, we can create models with this, like, incredibly nuanced understanding of human language.
And so since then we've been, you know, seeking how to combine these paradigms and how to
get RL to work on natural language. And once you do, right, like, then you kind of have the,
well, you have the ability to actually like execute on these different ideas and objectives
in this like extremely robust rich environment given by pre-training. And so, yes, I think it's
been perhaps the most exciting period in our research over the last few years where we've really
found so many new directions and promising ideas
that all seemed to be working out
and we're trying to understand how to compare.
One of the hardest things about RL
for folks who are not practitioners of RL
is the idea of crafting the right reward model.
And so especially if you're a business or an enterprise
who wants to harness all this amazing progress
you guys are putting out,
but doesn't even know where to start.
What do the next few years look like for a company like that?
What is the right mindset for somebody
who's trying to make sense of RL to craft the right reward model.
Is there anything you've learned about the best practices
or an approach of thinking,
of using this latest sort of family of reasoning techniques?
What is the right way I should think about
even approaching reward modeling as a biologist or a physicist?
I expect this will evolve quite rapidly.
I expect it will become simpler, right?
I think maybe like two years ago we would have been talking about,
like, what is the right way to craft my fine-tuning
dataset, and I don't think we are like at the end of that evolution yet.
And I think we will be inching towards more and more human-like learning, which, you know,
RL is still not quite. So I think maybe the most important part of the mindset is to like not
assume that like what is now will be it forever.
So I want to bring the conversation back to coding. We would be remiss not to say
congrats on GPT-5 Codex, which just dropped today. Can you guys say a little bit more about what's
different about it, how it's trained differently, maybe why you're excited about it?
So I think one of the big focuses of the Codex team is to just take the raw intelligence that we have from our reasoning models and make it very useful for real world coding.
So a lot of the work they've done is kind of consistent with this.
They are working on kind of having the model be able to handle more difficult environments.
We know that real world coding is very messy.
So they're trying to handle all the intricacies there.
there's a lot of coding that has to do with style,
with just like kind of softer things,
like how proactive the model is, how lazy it is.
And just being able to define in some sense,
like a spec for how a coding model should behave.
They do a lot of very strong work there.
And as you've seen, they're also working on a lot better presets.
You know, coders, they have some kind of notion of,
this is how long I'm waiting,
I'm willing to wait for a particular solution.
I think we've done a lot of work to dial in on, you know, for easy problems, being a lot, you know, lower latency.
For harder problems, actually, the right thing is to be even higher latency.
Get you the really best solution.
And just being able to find that preset is very important.
What's the sweet spot for, if you were to say, like, easier problems versus harder?
What we found is the previous generation of the Codex models, they were spending too little time solving the hardest problems and too much time solving the easy problems.
And I think that is actually just probably out of the box what you might get out of o3.
Maybe just on the topic of coding, since you guys are both competitive coders in prior lives.
I know you've been at OpenAI for a decade now, but I was struck by the story of Lee Sedol, the Go player,
who kind of famously quit Go after he lost to AlphaGo multiple times.
And I think in a recent interview, you guys were both saying that now the coding models are
better than your capabilities.
And that gets you excited.
But say more about that.
And how much would you say you code now?
Well, if you're hands on keyboard,
you can talk about OpenAI generally,
but how much code is written by AI now?
In terms of coding models being better,
I mean, I think, yeah,
I think it is extremely exciting to see this progress.
I think, like, the programming competitions
have a nice kind of encapsulated test
of, like, ability to come up with some new ideas
in this boxed environment and time frame.
I do think if you look at things like,
well, I guess the IMO problem six
or maybe some of the very hardest programming competition problems.
I think there's still a little bit of headway to go for the models,
but I wouldn't expect that to last very long.
I do code a little bit.
Historically I've been like...
He's being humble.
Historically, I've actually been like...
really reluctant to use any sort of tools.
I just used Vim, pretty much.
Oh, yeah. Okay. Old school.
Yeah. Eventually, I think like, especially with these
latest coding tools like GPT-5, I've really kind of felt like, okay,
like this is no longer the way. Like, you can do a, you know,
a 30-file refactor like pretty much perfectly in like 15 minutes, like you kind of have to
use it.
Yeah, and so I've been kind of like learning this new way of coding, which definitely feels
a little bit different.
I think it is like a little bit of an uncanny valley still right now where like you kind
of have to use it because it is just like accelerating so many things, but it's still like,
you know, a little bit like not quite as good as a co-worker.
So, you know, I think like our priority is getting out of that uncanny valley.
But yeah, it's definitely an interesting time.
Yeah, definitely.
To kind of like speak to the Lee Sedol moment, I think AlphaGo for both of us was, you know,
a very formative milestone in AI development.
And at least for me, it was the reason I started working on this in the first place.
And maybe partly because of our backgrounds in competitive programming, like I had this affinity
to building these models, which could do very, very well in these forms of contests.
And going from, you know, solving eighth grade math problems to a year later,
hitting our level of performance in these coding contests, it's crazy to see that progression.
And you kind of imagine or like to think that you feel some of the feelings that Lee Sedol felt too, right?
It's like, wow, this is really crazy, right?
And what are the possibilities?
And this is something that I took decades to do.
And it took a lot of hard work to get to the forefront of.
So you really do feel an implication of that is these models, what can't they do?
Right?
And I do feel like already it's kind of transformed the default for coding.
This past weekend, I was talking to some high schoolers and they were saying, oh, you know,
actually the default way to code is vibe coding.
Like, you know, I think like they would consider, oh, it's like maybe sometimes for completeness
you would go and like actually do all of the mechanics of coding it from scratch yourself,
but that's just a strange concept.
to them. Like, why would you do that? You know, you just vibe code by default. Yeah, yeah.
And so, yeah, I mean, I do think, you know, the future hopefully will be vibe researching.
Yeah. I have a question about that, which is what makes a great researcher, right? When you say
vibe researching, there's a big part of vibe coding is just having good taste in wanting to build
something useful and interesting for the world. And I think what's so awesome about tools like
Codex is if you've got a good intuition for what people want, it helps you
articulate that and then basically actualize a prototype very fast.
With research, what's the analog?
What makes a great researcher?
Persistence is a very key trait, right?
I think what is different about research when you're actually trying to,
I think a special thing about research, right,
is you're trying to create something or learn something that is just not known, right?
Like it's not known to work.
You don't know whether it will work.
And so you're always trying something that will most likely fail.
And I think getting to a place where you are in the mind of being ready to fail and being ready to learn from these failures.
And of course with that comes creating kind of clear hypotheses and being extremely honest with yourself about how you're doing on them.
I think a trap many people fall into is going out of the way to prove that it works, right?
Which is quite different from, you know, like, I think,
believing in your idea
and thinking that it's extremely important, right?
And you want to persist at that,
but you have to be honest with yourself
about when it's working and when it's not
so that you can learn and adjust.
Yeah, I think there are just very few shortcuts for experience.
I think through experience, you kind of learn,
you know, what's the right horizon to be thinking of a problem,
but you can't pick something that's too hard
or it's not satisfying to do something that's too easy.
And I think a lot of research is managing
your own emotions over a long period of time too.
You know, there's just going to be a lot of things you try and they're not going to work.
And sometimes you need to know when to persevere through that or sometimes when to kind of switch to a different problem.
And I think interestingness is something, you know, you try to fit through reading good papers, talking to your colleagues.
And you kind of maybe distill their experience into your own process.
When I was in grad school, you know, there's a big part...
I'm a failed machine learning researcher.
I was in grad school for bioinformatics,
but a big part of my research advisor's thrust
was about picking the right problems
to work on such that you could then sustain
and persist through the hard times
and you said something interesting which was
there's a difference between having conviction
in an idea and then being maximally truth-seeking
about when it's not working and both those things
are sometimes in tension
because you kind of go native on a topic
or a problem sometimes that you have deep conviction in
Have you found, are there any sort of heuristics you've found useful at the taste step, at the problem-picking step,
that help you arrive at the right set of problems where that conviction and truth-seeking is not as much in zero-sum tension as other kinds of problems?
Yeah, to be clear, I don't think conviction and truth-seeking are really in a zero-sum tension.
I think, like, you can be, like, you can be convinced or, you know, you can have a lot of belief in an idea,
and you can be, you know, very persistent in it while it's not working.
I think it's just important that you're kind of honest with yourself
about how much progress you're making and you're in a mindset where you're able to learn from the failures along the way.
I think it's important to look for problems that you really care about and you really believe are important, right?
And so I think one thing I've observed in many researchers that inspired me has been really going after the hard problems,
like looking at the questions that are, you know, kind of like, you know, widely known,
but not really kind of considered tractable and just asking, like, you know, why are they not tractable?
Or like, you know, what, like, what about this approach?
Like, why does this approach fail?
I think you're always like thinking about what is really the barrier for the next step.
If you're going after problems that like you really truly believe are important, right?
Then that makes it so much easier to find the motivation to persist with them over years.
And in the development of, like, during the training phase of GPT-5, for example,
were there any moments where there was a hard problem,
the initial attempts that were being made to crack
that problem weren't working,
and yet you found somebody persisted through that.
And what was it about any of those stories
that comes to mind that worked well,
that you wish other people and other researchers did more of?
I think on the path there, right,
like along the sequence of models,
like both the pre-trained models
and the reasoning models,
I think one very common theme is
bugs.
And both like, just like,
yeah, silly bugs in software that can kind of
stay in your software for like months
and kind of invalidate all your experiments
a little bit in a way that you don't know.
And, you know, identifying them
can be a very meaningful breakthrough
for your research program.
But also kind of bugs in the sense of like,
well, you have a particular way of thinking
about something.
And that way is a little bit
skewed, which causes you to make the wrong assumptions and identifying those wrong
assumptions, rethinking frames from scratch. I think, you know, both for getting the first
reasoning models working or getting the, you know, larger pre-trained models working. I think
we've had like multiple issues like that we've had to work through.
As leaders of the research org, how do you think about what it takes to keep the best talent
on your team and, on the flip side, creating a very resilient org
that doesn't crumble if a key person leaves?
The biggest, I think, things that OpenAI has going for it in terms of keeping the best people motivated and
excited is, like, we are in the business of doing fundamental research, right?
We aren't the type of company that looks around and says, oh, what model did company X build or what model did company Y build?
We have a fairly clear and crisp definition of what it is we're out to build.
We like innovating at the frontier.
We really don't like copying.
And I think people are inspired by that mission, right?
You are really in the business of discovering new things about the deep learning stack.
And I think we're kind of building something very exciting together.
I think beyond that, a lot of it's creating very good culture.
So we want a good pipeline for training up people to become very good researchers.
We, I think, historically have hired the best talent and the most innovative talent.
So I just think we have a very deep bench as well.
And yeah, I think most of our leaders are very inspired by the mission.
And that's what's kept all of them there.
Like when I look at my direct reports, they haven't been affected by the talent wars.
I was chatting with a researcher recently, and he was talking about wanting to find the cave dwellers.
And these are often the people who are not posting on social media about their work.
For whatever reason, they may not even be publishing.
They're sort of in the background doing the work.
I don't know if you would agree with this concept, but how do you guys hire for researchers?
And are there any non-obvious ways that you look for talent or attributes that you look for that are non-obvious?
So I think one thing that we look for is having solved hard problems in any field.
A lot of our most successful researchers have started their journey with deep learning at OpenAI
and have worked in other fields like physics or
computer science or finance in the past. Strong technical fundamentals coupled with the ability
and intent to work on very ambitious problems and actually stick with them. We don't
purely look for who did the most visible work or is the most visible on social media.
As you were talking, I was thinking back to when I was a founder and I was running my
own company and we would recruit for great engineering talent. Many of the attributes you
described were ones that were on my mind then.
And Elon recently tweeted that he thinks this whole researcher versus engineer distinction is silly.
Is that just a semantic?
Is he just being semantically nitpicky, or do you think these two things are more similar
than they actually look?
Yeah, I mean, I do think they're, like, researchers, they don't just fit one shape.
You know, we have certain researchers who are very productive at OpenAI who are just so good
at idea generation,
and they don't necessarily need to show great impact through implementing all of their ideas, right?
I think there's so much alpha they generate in just kind of coming up with, oh, let's try this or let's try this,
or maybe we're thinking about that.
And there's other researchers who, you know, they are just very, very efficient at taking one idea,
rigorously exploring, you know, the space of experiments around that idea.
So I think, you know, researchers come in very different forms.
I think maybe that first type wouldn't necessarily map into the same
bucket as a great engineer. But, you know, we do kind of try to have a fairly
diverse set of research tastes and styles.
Yeah. And say a little bit about what
it takes to create a frontier, sort of winning culture that can
attract all kinds and shapes of researchers and then actually grow them,
help them thrive, make them win together at scale. What do you think are the
most critical ingredients of a winning culture?
So I think actually the most important thing is just to make sure you protect fundamental
research, right?
I think you can get into this world with so many different companies these days where you're
just thinking about, oh, how do I compete on, you know, a chat product or some other kind
of product surface?
And you need to make sure that you leave space and recognize the research for what it is and
also give them the space to do that, right?
You can't have them being pulled in all of these different product directions.
So I think that's one thing that we pay attention to within our culture.
Especially now that there's so much spotlight on OpenAI,
so much spotlight on AI in general and the competition between different labs,
it would be easy to fall into a mindset of, like,
oh, we're racing to beat this latest release or something.
And, you know, there's definitely like,
areas that people kind of start looking over their shoulder and start thinking about, oh,
what are these other things? And I see it as a large part of our job to make sure that people
have this comfort and space to think about, you know, what are things actually going to look
like in a year or two? Like, what are the actually big research questions that we want to answer?
And how do we actually get to models that like vastly outperform what we see currently,
rather than just like iteratively improving in the current paradigm.
Just to pull on that thread more on protecting fundamental research,
you guys are obviously one of the best research organizations in the world,
but you're also one of the best product companies in the world.
How do you balance, and especially with,
you've brought on some of the best product execs in the world as well,
how do you balance that focus between the two
and while protecting fundamental research also continue to move forward
the great products that you have out?
Yeah.
I mean, I think it's about kind of
delineating a set of researchers who really do care about product and who really want to be
accountable to the success of the product. And they should, of course, very closely coordinate
with the research work at large. But I think just kind of people understanding their mandates
and what they are rewarded for, that's a very important thing. One other thing that is
also helpful is that our product team and
broader company leadership is bought into this vision, right, where we are going with research.
And so, you know, nobody is assuming that, like, oh, the product we have now is the product
we'll have forever and we'll just kind of wait for, like, you know, new versions from research.
Like, we are able to think jointly about what the future looks like.
One of the things that you guys have done is let such a diversity of different ideas and
bets flourish inside of OpenAI that you then
have to figure out some ways as research leaders
to make it all make coherent sense
as one part of a roadmap.
And you've got people over here investigating
the future of diffusion models and visual media.
And over here you've got folks investigating the future
of reasoning when it comes to code.
How do you paint a coherent picture of all that?
How does that all come together?
When there might be at least naively some tension
between giving researchers the independence
to do fundamental research and then somehow making that all fit into one coherent research
program?
Our stated goal for a research program has been getting to an automated researcher for
a couple years now.
And so we've been building most of our projects with this goal in mind.
And so this still leaves a lot of room for bottom-up idea generation for fundamental
research on various domains, but we are always thinking about how do these ideas come
together eventually. We believe, for example, that reasoning models go much further and we
have a lot of explorations on things that are not directly reasoning models, but we are thinking
a lot about how they eventually combine and what will this kind of innovation look
like once you have something that is out there and thinking for months about a very hard problem.
So I think this clarity of like our long-term objectives is important.
But yeah, but it doesn't mean that we are prescriptive about like, oh, here are all the little
pieces, right?
Like we definitely view this as a question of exploration and learning about these technologies.
Yeah, I think you want to be opinionated and prescriptive at a very kind of coarse level,
but, you know, a lot of ideas can bubble up at a finer level.
And have there been any moments where those things have been
in tension at all recently? Well, one provocative example could be recently, you know, this new
image model came out, which is Nano Banana, right, from Google. It's extraordinary. It's shown
that, like, lots of everyday people can unlock a lot of creativity when these models are good
at understanding editing prompts. And I could see how that would create some tension for a research
program that may not be prioritizing that as directly. If one of your, you know, somebody talented on
your team came and said, guys,
like this thing is so clearly valuable in the world out there,
we should be spending more effort, more energy on this.
How do you reason about that question?
I think that's definitely a question
that we've been kind of thinking about for quite a while
at OpenAI.
I mean, if you look at GPT-3, right, like once we kind of saw like,
oh, like this is kind of where language models are going.
We definitely have had a lot of discussions about, well,
clearly there are going to be so many magical things
you can do with AI, right?
And you will be able to go
to these, like, extremely smart models that are out there pushing the frontiers of science,
but you will also have this incredible media generation and these incredibly, you know, transformative
entertainment applications.
And so, like, how do we prioritize among all these directions has definitely been something
we've been thinking about for quite a while.
Yeah, absolutely.
And the real answer is like we don't discourage someone from being
really excited by that. And it's just, if we're consistent in the prioritization and our product
strategy, then it just will naturally fall in. And so it's just for us, like, we do encourage
a lot of people to be excited about, you know, building, you know, kind
of like agentic products, you know, whatever kind of products that they're excited by. But I
think it's important for us to also have a separate group of people who, you know, are protected, and their
goal is to create the algorithmic advances.
How does that translate, just to build on Anj's question, into a concrete framework around resourcing?
Like, do you think about, okay, X percent of compute resources will go to longer term, you know, very important, but maybe a bit more pie in the sky exploration versus there's also, you know, obviously current product inference, but sort of this thing in the middle where it's achievable in the short to medium term?
Yeah.
So I think that's a big part of both of our jobs, you know, just this portfolio management question
of how much compute do you give to which project.
And I think historically we've put a little bit more
on just the core algorithmic advances
versus kind of the product research.
But it's something that you have to feel out over time, right?
It's dynamic, I think, month to month,
there could be different needs.
And so I think it's important to stay fairly flexible on that.
And if you had 10% more resources,
would you put it toward compute,
or is it data curation, people,
where would you stick that from, like, a marginal...
Good question.
Honestly, yeah, I think
compute. Compute, today.
Fairly reasonable answer.
Yeah, yeah. I mean, honestly,
I do think kind of your question of prioritization right
it's like in a vacuum any of these things
you would love to like go and excel and win at
I think the danger is you end up like second place at everything
and you know not like you know clearly leading at
anything. So I think prioritization is important, right? And you need to make sure there's some
things you're clear-eyed on. This is the thing that we need to win. Yeah. But I think it makes
sense to talk about it for just a little bit more, which is compute sets so much of, compute is
destiny in a way, right, at a research organization like OpenAI. And so, would you, a couple of years
ago, I think it became very fashionable to say, oh, okay, we're not going to be compute constrained
anytime soon because there's a bunch of CMs that are, you know, people are discovering and
we're going to get more efficient and all the algorithms are going to get better. And then eventually,
like, really, we'll just be in a data constrained regime. And it seems like, you know,
a couple of years have come and gone, and we're still in, like, this sort of very compute-constrained
environment. Does that change anytime soon, you think? Or... I mean, I think, like, we've seen
for long enough, like, how much we can do with compute. Yeah, I think
I haven't really bought that much into the "we'll be data-constrained" claim.
And yeah, I don't expect that to change.
Yeah, anyone who says that should just step into my job for a week.
There's no one who's like, I have all the compute that I need.
Right.
Yeah.
You know, the job of advancing fundamental research has historically been largely a mandate that universities have had.
Partly for the compute reasons you just described, that hasn't been the case for
frontier AI. You guys have done such an incredible job kind of channeling the arc of frontier
AI progress to help the sciences out. And I'm wondering when those worlds collide, the fundamental
world of university research today and the world of frontier AI, what comes out?
So I guess I personally started as a resident at OpenAI, and it's a program that we had
for people in different fields to come in, you know, learn quickly about AI, and
become productive as a researcher.
And I think there are a lot of really powerful elements
in that program.
And the idea is just like, you know, could we accelerate something
that looks like a PhD in as little time as possible.
And I think a lot of that just looks like implementing a lot of,
you know, very core results.
And you know, through doing that, you're going to make mistakes.
You're going to be like, oh, wow, like build intuition for if I, you know,
set this wrong, like that's going to blow up my network in this way.
And so you just need a lot of that
hands-on experience. I think over time, you know, there have been curriculums developed at
probably all of these large labs in, in like, optimization, in architecture, in RL,
and yeah, probably no better way than to just kind of try to implement a lot of those things
and read about them and think critically about them. Yeah.
Yeah, I think maybe like one other nice thing that you get to experience in academia is like, yeah,
that's like persistence, right, of like, oh, you know, you have a few years and you're kind of trying to solve a problem and it's a hard problem and you've never dealt with such a hard problem before.
And yeah, I do feel like this is a thing that's like, well, currently the pace of progress is very fast.
Maybe also the ideas tend to work out a little bit more often than they did in the past because, yeah, deep learning just wants to learn.
And getting your hands on a more challenging problem for a little bit,
maybe being part of a team, attacking like an ambitious challenge
and getting that feeling of what it feels like to be stuck
and what it feels like to finally be making progress,
I think is also something that's very useful to learn.
How does external perception, reception of a particular product launch,
impact how you prioritize something?
Is it to the extent where, you know, perception and usage, in the case where they're married,
obviously there's probably a clear directive there, but in a case where maybe they're divorced a bit,
does that impact how you think about roadmap or where you emphasize resources?
So we generally, like, have some pretty strong convictions about the future.
And so we don't tie them that closely to, like, the short-term reception of our products, right?
like, of course, we learn based on what is going on.
We read other papers and we look at what other labs are working on.
But generally, like, we act from a place of fairly strong belief in what we're building.
And so, of course, like, that is for our long-term research program. Of course, when it comes to product,
right, like, I think the cycle of iteration is much faster.
Yeah.
I think with every launch, we are trying to aim it to be something that's wildly successful
on the product side.
And I think from a fundamental research perspective, we're trying to create models with all
the kind of core capabilities needed to build a very rich set of experiences and products.
And there are going to be people who have some vision of one particular thing that can
be built, and we'll launch it. And everything we launch, we really hope it goes wildly successful, and
we get that feedback, and if it's not, like, we'll kind of shape our product strategy a little
bit. But yeah, we are definitely also in the business of launching very useful, wildly successful
products.
Yeah, it feels like because of the sort of completely unbridled pace of progress
that we've just spent a lot of time talking about, a lot is going to change over the next two years.
It gets really hard to predict.
I imagine 10 years out, let alone 10 months out.
And so my question, I guess, is through all that change that the frontier of AI is going to bring,
what are some priors that you actually think should stay constant?
Is there anything?
Well, one clearly is that we don't have enough compute.
Is there anything else that you think doesn't change, that you think would be strong, reasonably held priors as constants?
I think more broadly than compute, there are physical constraints of, well, energy, but also, like, you know, at some point not too far off, like, robotics will become a major focus.
And so I think thinking about like the physical constraints is going to remain important.
But yeah, I do think on the intelligence front, I would not make too many assumptions.
Very few startups can get to the scale that you have, both from an employee perspective, but also revenue count and maintain that breakneck speed that you probably had, I mean, seven, eight years ago when you both joined. What's the secret sauce to doing that? And how do you continue to maintain this pressure almost to ship as quickly as possible, even though, you know, you're kind of on, you know, top now?
I think one of the clearest markers that we have really good research culture, at least in my mind, is, you know, I've worked at different companies before.
And there is a real thing, which is a learning plateau, right?
You go to a company, you learn a lot for the first one or two years.
And then you just find kind of like, you know, I know how to be fairly efficient in this framework.
And my learning kind of stops.
And I've really never felt that at OpenAI.
Just like that experience you describe of all these really cool results bubbling up.
You're just learning so much week over week, and it is a full-time job to kind of stay on top of all of it.
And that's just been very fulfilling.
So, yeah, no, I think that's a very accurate description.
We just want to generate a lot of really high-quality research, and it's almost a good thing.
Like, if you're generating enough that you're barely able to keep on top of it.
Yeah, exactly.
I think the development of the technology, I think, is a driving force here,
where maybe we would kind of become comfortable after a few years working in a
given paradigm, but we are always on the cusp of that new thing and trying to reconfigure
our thinking around the kind of new constraints and new possibilities that we're going to be
faced with. And so I think that kind of creates this feeling of constant change and the mindset
of like always kind of learning the new thing. Well, you know, one thing
that came up in our research about things at OpenAI that have not changed through a lot of the change
is the trust that the two of you guys have in each other.
Because I think there was an article or profile of you guys recently in the MIT Tech Review,
and that was also one of the highlight themes, that your chemistry,
your trust with each other, your rapport is something a lot of the people at OpenAI have come to treat as a constant.
So what's the backstory?
How did you guys build trust there?
How did that happen?
It's like asking you to, have you ever seen that, When Harry Met Sally?
I feel like you're on the couch and now you got to talk about,
what's your meet-cute?
Yeah.
Well, I do think, you know, we started working together a little bit more closely when we kind of had the first seeds of working on reasoning.
I think, you know, we at the time, you know, that wasn't a very popular research direction to work on.
And I think both of us kind of saw glimmers of hope there.
And we were kind of pushing in this direction,
kind of figuring out how to make it work.
And yeah, I think over time kind of growing a very small effort
into an increasingly larger effort.
And I think that's kind of where I really got to kind of work
with Jakub in depth.
I think he's just really, really,
really a phenomenal researcher. I think, you know, any of these rank lists, like, he should be number one.
Like, just his ability to, you know, take any very difficult technical challenge and,
and almost like personally just kind of think about it for two weeks and just crush it.
It's incredible that he has kind of the wide range that he does in terms of understanding,
as well as that kind of depth that you can go and just personally solve a lot of these technical challenges.
Now you get to say some nice stuff about you.
I'm just to say anything nice about me.
Thanks, Mark.
Yeah, yeah, I think the big, kind of the first, like, big thing that we did together was, like, we started seeing, like, okay, like, we think this algorithm is going to work.
And so, you know, I was thinking, like, okay, like, how do we, you know, direct people at this?
And we're talking with Mark like, oh, we should establish a team that's actually going to make this work.
And then, you know, Mark went and actually did this, right?
actually kind of like got a group of like people working on very different things, like
got them all together and created a team with like incredible chemistry out of like this whole
this third group and that was like such an impressive thing to me. And yeah, I'm really grateful
and as far that I kind of get to, you know, work with Mark and kind of experience that.
Yeah, I think this incredible capacity to both, you know, understand and engage and
and think about the technical matter of the research itself,
but then coupled with this great ability to lead and inspire teams
and create an organizational structure that in this whole kind of mess of chaotic directions
actually is coherent and able to gel together.
Yeah, very, very inspiring.
That's awesome.
Well, on that note, no.
Great note to end on.
Yeah.
Some of the greatest discoveries in science, especially in physics,
have often come from a pair of collaborators, often across universities, across fields.
And it seems like you guys have now added to that tradition.
And so we're just super grateful that you guys made the time to chat today.
Thanks for coming by.
Thank you.
Thanks for being with us.
Thanks for listening to this episode of the a16z podcast.
If you like this episode, be sure to like, comment, subscribe, leave us a
rating or review and share it with your friends and family.
For more episodes, go to YouTube, Apple Podcasts, and Spotify.
Follow us on X at a16z and subscribe to our Substack at a16z.com.
Thanks again for listening, and I'll see you in the next episode.
As a reminder, the content here is for informational purposes only.
It should not be taken as legal, business, tax, or investment advice,
or be used to evaluate any investment or security
and is not directed at any investors or potential investors in any
a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies
discussed in this podcast. For more details, including a link to our investments, please see
a16z.com/disclosures.
