Latent Space: The AI Engineer Podcast - [State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI
Episode Date: December 31, 2025

From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI’s post-training evolution, from the PPO vs DPO debates of 2023 to today’s RLVR era, where the real innovation isn’t optimization methods but data quality, signal trust, and token efficiency.

We sat down with Josh at NeurIPS 2025 to dig into the state of post-training heading into 2026: why RLHF and RLVR are both just policy gradient methods (the difference is the input data, not the math), how GRPO from DeepSeek Math was underappreciated as a shift toward more trustworthy reward signals (math answers you can verify vs. human preference you can’t), why token efficiency matters more than wall-clock time (GPT-5 to 5.1 bumped evals and slashed tokens), how Codex has changed his workflow so much he feels “trapped” by 40-minute design sessions followed by 15-minute agent sprints, the infrastructure chaos of scaling RL (“way more moving parts than pre-training”), why long context will keep climbing but agents + graph walks might matter more than 10M-token windows, the shopping model as a test bed for interruptibility and chain-of-thought transparency, why personality toggles (Anton vs Clippy) are a real differentiator users care about, and his thesis that the education system isn’t producing enough people who can do both distributed systems and ML research, the exact skill set required to push the frontier when the bottleneck moves every few weeks.

We discuss:
* Josh’s path: pre-training data curation → post-training researcher at OpenAI, shipping GPT-4o, o1, o3, GPT-5 thinking, and the shopping model
* Why he switched from pre-training to post-training: “Do I want to make 3% compute efficiency wins, or change behavior by 40%?”
* The RL infrastructure challenge: way more moving parts than pre-training (tasks, grading setups, external partners), and why babysitting runs at 12:30am means jumping into unfamiliar code constantly
* How Codex has changed his workflow: 40-minute design sessions compressed into 15-minute agent sprints, and the strange “trapped” feeling of waiting for the agent to finish
* The RLHF vs RLVR debate: both are policy gradient methods, the real difference is data quality and signal trust (human preference vs. verifiable correctness)
* Why GRPO (from DeepSeek Math) was underappreciated: not just an optimization trick, but a shift toward reward signals you can actually trust (math answers over human vibes)
* The token efficiency revolution: GPT-5 to 5.1 bumped evals and slashed tokens, and why thinking in tokens (not wall-clock time) unlocks better tool-calling and agent workflows
* Personality toggles: Anton (tool, no warmth) vs Clippy (friendly, helpful), and why Josh uses custom instructions to make his model “just a tool”
* The router problem: having a router at the top (GPT-5 thinking vs non-thinking) and an implicit router (thinking effort slider) creates weird bumps, and why the abstractions will eventually merge
* Long context: climbing Graph Walks evals, the dream of 10M+ token windows, and why agents + graph walks might matter more than raw context length
* Why the education system isn’t producing enough people who can do both distributed systems and ML research, and why that’s the bottleneck for frontier labs
* The 2026 vision: neither pre-training nor post-training is dead, we’re in the fog of war, and the bottleneck will keep moving (so emotional stability helps)

Josh McGrath
* OpenAI: https://openai.com
* X: https://x.com/j_mcgraph

Timestamps
00:00:00 Introduction: Josh McGrath on Post-Training at OpenAI
00:04:37 The Shopping Model: Black Friday Launch and Interruptibility
00:07:11 Model Personality and the Anton vs Clippy Divide
00:08:26 Beyond PPO vs DPO: The Data Quality Spectrum in RL
00:01:40 Infrastructure Challenges: Why Post-Training RL is Harder Than Pre-Training
00:13:12 Token Efficiency: The 2D Plot That Matters Most
00:03:45 Codex Max and the Flow Problem: 40 Minutes of Planning, 15 Minutes of Waiting
00:17:29 Long Context and Graph Walks: Climbing Toward Perfect Context
00:21:23 The ML-Systems Hybrid: What's Hard to Hire For
00:24:50 Pre-Training Isn't Dead: Living Through Technological Revolution

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Latent Space.
Welcome
How would you introduce yourself?
Yeah, I work on a bunch of the thinking models
at OpenAI.
And like recently I've been sort of focused on doing
search related stuff
But yeah, just a post-training researcher
at OpenAI.
Yeah, and you were on with us for GPT-4.1.
We were talking with Michelle, who is on
maternity leave, I didn't know that. And now we're at 5.1. It's been a whole generation.
Yeah, it's been wild. And like, you know, 4.1 was a non-thinking model. And then since then I,
you know, we sort of switched into doing... Was that the last? Was it your last?
No, so we're still releasing non-thinking models. But that one was the one that we did that
was like API-specific non-thinking. So, you know, focus has shifted a little. Yeah. How'd you
get into post-training?
So previously, before OpenAI, I was doing like pre-training data curation stuff.
And I think what I was seeing from like the news and looking at papers is like, oh,
it seems like, not that pre-training is dead, but I was like, oh, there's going to be
so much interesting stuff in post-training.
And at that point, I was like, I really want to like make some contributions there.
And I mean, it's not even necessarily that like pre-training was dead, but it was definitely
changing and like, you know, do I want to make compute efficiency wins of like 3% or do I want to
like change the behavior by 40%? And honestly, it just seemed more exciting to go to post-training
and many late nights later. That's definitely true. It's a different kind of data and engineering
discipline too. It's very strange. Like the kind of work that you need in especially RL,
like scaling it. Yeah, definitely. I think like, for example, the number of moving
parts in an RL run is just a lot higher.
Like, in some ways...
Is it an order of magnitude or...
I don't know if I could say order of magnitude,
but if you think about, like, pre-training,
you know, you're moving tokens to many machines,
and then you're getting, like, basically a scalar from them,
and then you're backpropping.
Yeah.
The issue with RL is, like, you're doing tasks,
and each task could have, like, a different grading setup.
And each one of those different grading setups,
that's, like, more infrastructure.
And so, you know, when I'm staying up late,
trying to figure out what's going on with a run.
It could be way more things than there are in a pre-training run, generally.
Yeah.
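A minimal sketch of the "each task could have a different grading setup" point, assuming a hypothetical registry of graders. The task types, function names, and scoring rules here are illustrative only and are not taken from any OpenAI system; the idea is just that every new task type adds another piece of code that can be the thing that broke.

```python
from typing import Callable, Dict

# Hypothetical grader signature: (model answer, reference) -> score in [0, 1].
Grader = Callable[[str, str], float]


def grade_math(answer: str, reference: str) -> float:
    """Exact numeric match on a final answer; unparseable answers score 0."""
    try:
        return 1.0 if abs(float(answer) - float(reference)) < 1e-9 else 0.0
    except ValueError:
        return 0.0


def grade_contains(answer: str, reference: str) -> float:
    """Crude check that a required string appears somewhere in the answer."""
    return 1.0 if reference.lower() in answer.lower() else 0.0


# Each entry is one more moving part that a pre-training loss does not have.
GRADERS: Dict[str, Grader] = {
    "math": grade_math,
    "contains": grade_contains,
}


def grade(task_type: str, answer: str, reference: str) -> float:
    """Dispatch a rollout's answer to the grader for its task type."""
    if task_type not in GRADERS:
        raise KeyError(f"no grader registered for task type {task_type!r}")
    return GRADERS[task_type](answer, reference)
```

In pre-training the equivalent of `grade` is a single loss over tokens, which is why the space of things that can go wrong is smaller there.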
And does it matter if you own the code of the task or is it an outsourced third party person?
Or, you know, my sense of it and the external sense of it, obviously I don't see it up close,
is that you work with a lot of external partners.
And I'm sure also some internal stuff.
But which is better?
Honestly, I don't think I'll comment too much on like how many external partners
There are some and there's some internal.
Yeah, there's, we do like...
The technical trade-off of like, well, shit, like, I don't own this code.
Oh, okay.
So, well, when it comes to I don't own this code, actually, like, when, you know,
when I'm babysitting a run or something, it doesn't really matter if it's like internal,
external, whatever.
Like, do I understand the system that's going underneath?
And I think you end up having to, like, jump into a lot more code that you're like,
I actually don't know what this does.
because I'll be watching the, you know, I work on my pieces of a run.
And then there's also, you know, other people working on it.
And like, do I understand what their code is doing?
So that way, like, at 12:30 in the morning when I'm like, something looks wrong and I'm like looking at this code, can I like get context fast enough to understand?
Throw Codex at it.
Oh, I use Codex so much.
It's really changed how I work.
I feel like there's a degree to which like sometimes I feel trapped by Codex, because if I'm,
I spend like, you know, 30, 40 minutes writing something that looks like a design doc or something,
Codex can do more work than I can do in a few hours in like 15 minutes.
But then, like, what do I do during those 15 minutes after?
And like the, it's actually just like really changed how the flow of my day goes because
I have to somehow now manage these like 40 minute sessions with like 15 minutes where like
I could do something. But it's actually not nearly as effective as like,
this new flow to the day.
So I think I'm still getting used to that, honestly.
Yeah, yeah.
I think it should be interesting for, like,
also just code-based understanding
when you're encountering unfamiliar code.
Absolutely.
So you,
briefly, before we started,
talked a little bit about the shopping model,
which is like the latest hottest thing.
And obviously, we're just recording it
right after Black Friday and Cyber Monday.
First of all, any interesting findings
from basically releasing shopping in ChatGPT
right into that period?
Okay, well, I think the first
thing is, I don't know why I would say in a meeting in, you know, August or so, like,
oh, hey, Black Friday's coming up. Like, maybe we could, maybe we could do a release by then.
In hindsight, like, wait, why would I say something like that?
They're like, yes. Now you own it. Yeah. Exactly. I guess the most interesting thing to me is
the new interruptibility and like the sort of qualitative experience of using it. And the same thing
happens with Codex, right? Like you, you write a prompt and you can like press escape and say
like, oh, I, like, I messed something up.
And we actually did the same thing in the shopping model.
So it shows you its chain of thought with, like, what products it's looking at.
And you can write it new messages saying, like, oh, you know, I actually wanted
it doing this.
Yeah, like, I wanted USB-C on this or whatever it is.
And, like, I think that's a really new interesting, like, interaction paradigm that we have
in a couple of our different services.
And I'm excited to see how people use it and if they enjoy it.
Yeah.
Why did it have to be its own model and not just, like, a new tool?
Stay tuned. I think, like, there's no reason that we couldn't do it in the same model eventually,
but I think, you know, if we want to try out new things, sometimes it makes sense to make a new model.
And I think it just made sense to this time say, like, can we do a deep research style model, but like for shopping where it's going to look really hard all across the internet for different things?
You know, I think if you look at like deep research, the original one, and GPT-5 thinking on like high reasoning today, I think you'll see that like eventually the models all sort of
converge in their capabilities.
Yeah. Would you say that, this is a
discussion, also a little spicy, that I've kicked off
in the community? There's still
maybe 30% of the community that is still
using deep research. A lot of them have moved
over to just using GPT-5 thinking as deep
research. Is that the spiritual
successor, are they direct replacements, are there
things that we lose in
the original deep research model if we do that?
I mean, I think if you look
at our published evals, they
look like basically on par
if it's not better. So like, I mean, that's
personally what I do. I use thinking on high versus using the deep research model.
But like, you know, as we've learned over the past few months, sometimes
people prefer the quirks of like one model over another. And so if people like the deep research
model, you know, more power to them. People like 4o.
Anything special in the 4o post-training that like, are people like really responding to
personality? Is that like a differentiator that people really care about? And is it a part of your
job to care about personality? Yeah, I mean, definitely people like care quite a bit about
personality. I think like over the past few months, we've been working a lot on giving users
more choice over what personality they want. Right, which is the toggles. Yeah, yeah. So now
we have those toggles. What's your favorite toggle? Honestly, custom instruction for like,
I want, I personally want my model to like be a tool. And so like, I don't, I don't necessarily
like want the warmth or anything. I just want some answers because I'm, you know, mostly using it at work.
Yeah, so I call this the Anton versus Clippy divide. So Anton is from
Silicon Valley, on HBO.
Okay.
It's a machine.
It only does work.
It doesn't try to be helpful or friendly or anything.
It tries to be helpful, but like doesn't try to be cheery.
Whereas Clippy tries to be cheery.
And I'm like, well, stop smiling at me.
I'm like having problems.
So it sounds like you also come down on the side of like using
Anton.
Yeah.
I think a lot of developers want Anton.
Yeah.
They're just like, it just quietly does its work.
And when it's done, it shuts up.
Yeah.
Yeah.
Well, I think like we're doing a lot of work to provide both, like,
Antons and Clippys to people, and I hope they all like it.
Yeah.
So just generally, I was thinking about, like, well, what can we update people on post-training?
You know, what do we know today at NeurIPS 2025 that we didn't know at NeurIPS 2024?
I would say, like, for a lot of people at the time, there was still like this whole PPO versus DPO discussion that was there.
That was the whole era.
Yeah.
And since then, we've moved on to RLVR.
and I think a lot of like agent-specific RL training.
I guess like am I missing any large chunks of the post-training debates that are going on?
Yeah, I mean, so not necessarily debates internal,
but like my read personally from like looking at different papers that are coming out,
when you look at like an RLVR paper or like an RLHF paper,
they read more like an optimization paper.
And to me like the sort of interesting thing that's going on is we have this like spectrum
of how high quality a signal is.
So, like, really, at the end of the day, like, RLHF, RLVR,
they're both policy gradient methods,
but what's different is just, like, the input data.
And it's always interesting to me that we call RLHF non-verifiable,
because we've trained this model to be good at, like,
predicting human feedback.
So in some sense, that's, like, verification.
But obviously...
It's human preference rather than truth.
Yeah, yeah.
But, like, if the...
If, like, your value...
of truth is like does the user like this more? Like there's something
strange, that I think we haven't like looked at that axis of okay well how like sort of clean
is this signal how much do I trust it? And like I totally agree that you know you don't
necessarily trust the RLHF signal as much as like is this the solution to this polynomial.
But I think there's a whole spectrum of like how high quality is a signal what's going to happen
when I like do a lot of optimization against it. And that's very different than I think
worrying about like the variance of different gradients, which I think is what you end up seeing in a lot of the
papers that are currently coming out, rather than being like very data-centric. They're pretty
optimization-centric, even though I think the innovation really is where the data is coming from.
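A minimal sketch of the claim that RLHF and RLVR share the same policy gradient math and differ only in where the reward comes from. This is a toy REINFORCE update over a two-option softmax policy, not any lab's training code; `verifiable_reward` and `preference_reward` are hypothetical stand-ins, and the preference numbers are placeholders rather than outputs of a real reward model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a small list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits: List[float], action: int, reward: float,
                   baseline: float = 0.0, lr: float = 0.5) -> List[float]:
    """One policy gradient step: logits += lr * (r - b) * grad log pi(action).

    For a softmax policy, grad log pi(action) w.r.t. logit i is
    (1 if i == action else 0) - pi(i).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


CANDIDATES = ["x = 2", "x = 3"]  # toy answers to "solve 2x = 4"


def verifiable_reward(answer: str) -> float:
    """RLVR-style signal: the answer is checked against ground truth."""
    return 1.0 if answer == "x = 2" else 0.0


def preference_reward(answer: str) -> float:
    """RLHF-style signal: placeholder for a learned human-preference score."""
    return 0.7 if answer == "x = 2" else 0.4


# The update function is identical; only the reward source changes.
after_rlvr = reinforce_step([0.0, 0.0], 0, verifiable_reward(CANDIDATES[0]))
after_rlhf = reinforce_step([0.0, 0.0], 0, preference_reward(CANDIDATES[0]))
```

The question raised in the conversation is then about the reward functions, not `reinforce_step`: how clean is each signal, and what happens when you optimize hard against it.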
Yeah. And before, I want to go broad before I go deep. Yeah. Any other discussions that you're maybe having
at NeurIPS or sort of around this time on post-training debates? Like when
you meet your peers at Anthropic and DeepMind, what do you talk about?
Well, Anthropic and DeepMind, we're all saying, I'm working on stuff and things.
You know, we're not...
And I think, like, it's more so talking a lot more broadly with my friends there.
Or we're just talking about, man, the infra's so hard to keep up.
We're not necessarily talking too much about methods directly.
Because on one level, it kind of doesn't matter.
Yeah.
And I think also, like, there's something that's very different about academic work where, like,
what really matters is how narrativeizable it is.
And I think that's one of the reasons you see a lot of optimization papers come out
is a lot of the data work, there's a less clear narrative around it.
I think the data and the scaling is actually more important than the specific.
Yeah, but it doesn't have like necessarily the same narrative that you get out of like some of the papers that you see here.
And so like there becomes more of a like given a specific vertical, how do I like understand that?
And I wish there were actually more papers on it here,
but I think it can sometimes be harder to wrap up into a clean story.
Yeah, that's also something that, like, we're actually having a lot of conversations about with other folks as well.
Like, what's next, right?
Like, where do you go from here now that we have, like, some kind of roadmap?
I think what's interesting also for me is,
I guess the innovations that are exposed by the Chinese models
are maybe copies or,
like, discussions of what's going on in the labs.
I think obviously GRPO, you mentioned a lot of these RL optimizations.
They come on as, they present themselves as optimizations.
GRPO came out in the DeepSeek Math paper, which when it came out, I read it and I was like,
okay, this is kind of cool.
It's like a little bit cheaper.
But like it does seem to have more broad impacts, I think, on the industry as a whole
than was initially appreciated.
I just want to, I don't feel like we've processed that enough.
Yeah, definitely.
I mean, like, yeah, as you said, it came out in the DeepSeek Math paper,
and like it's an interesting optimization method,
but the more interesting thing is
that they have a new reward signal
that we can really, really trust.
Like when, you know, you find the answer
to a math problem,
it's a lot less debatable than like,
oh, well, is this thing that the human preferred
actually what we want to do?
Yeah, like, you want to be right at math.
Yeah, yeah.
And so I think in some ways,
that's underappreciated in,
I would say, what's getting published.
Yeah.
Yeah.
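A minimal sketch of the two pieces discussed here, under stated assumptions: a verifiable reward (exact match on a math answer) and the group-relative advantage used by GRPO, where each sampled answer's reward is normalized by the mean and standard deviation of its own group. The use of the population standard deviation and the `eps` term are assumptions of this sketch; implementations differ on both, and `check_answer` is a hypothetical helper.

```python
from statistics import mean, pstdev
from typing import List


def check_answer(answer: str, reference: str) -> float:
    """Verifiable reward: 1.0 if the final answer matches the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the other samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some codebases use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to the same prompt, graded against the reference "12".
samples = ["12", "13", "7", "12"]
rewards = [check_answer(s, "12") for s in samples]
advantages = group_relative_advantages(rewards)
```

Because the baseline comes from the group itself, no separate value model is needed, which is the "a little bit cheaper" part; the point made in the conversation is that the trustworthy `check_answer` signal matters at least as much.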
Let's talk about, I guess, long horizon.
Yeah.
What do people consider in terms of like very long horizon?
Like we're talking like 30 hours, you know, more than more than a day of autonomy.
Is it just more of the same or is there anything like sort of qualitatively different?
Okay.
So first off, what I would first say is I tend to think more in terms of like actual number of tokens than than time.
Because I think.
Yeah.
The human in the loop can take a while.
Yeah.
Well, and also like it gives you a different measure to optimize against, right?
Like as I was saying earlier with, um, when I
use Codex. It does something that would take me much longer. It would take me like four hours, in
10 minutes. What we can actually push on there is token efficiency. So like, yeah, and that is a
huge, huge research area. Yeah. And so you can see like from 5 to 5.1 our overall evals,
you know, we bumped some. But if you look at a 2D plot of how many tokens it takes for us to get
that, it went way down. And so I think that's like a difference.
when you had that? Like, that was such a great chart.
Dude, I live by those charts. Like, that, that was your chart? Okay.
Not necessarily that, but like, that shape of chart. Like, I think that's something that we think
about a lot, just because it contributes so much to your experience, like, how long does it take
to do this task? Yeah. And I think the other thing is, as you're pushing that token efficiency,
it changes, you know, how many tool calls can I make? And, like, how many different things can the agent
do in a reasonable number of tokens that we can actually serve.
And so I personally think in terms of tokens, yeah.
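A minimal sketch of reading the 2D plot described above (eval score on one axis, tokens spent on the other), plus the knock-on effect on tool calls. All names are hypothetical and any numbers fed into these helpers are placeholders, not published results for any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalPoint:
    name: str
    score: float        # eval score, higher is better
    mean_tokens: float  # average tokens spent per task, lower is better


def dominates(a: EvalPoint, b: EvalPoint) -> bool:
    """True if `a` is at least as good as `b` on both axes and better on one."""
    no_worse = a.score >= b.score and a.mean_tokens <= b.mean_tokens
    strictly_better = a.score > b.score or a.mean_tokens < b.mean_tokens
    return no_worse and strictly_better


def tool_call_budget(token_budget: int, tokens_per_call: int) -> int:
    """How many tool calls fit in a fixed, servable token budget."""
    return token_budget // tokens_per_call
```

For example, with placeholder values, `dominates(EvalPoint("new", 0.72, 7000), EvalPoint("old", 0.70, 12000))` is true: a small score bump with far fewer tokens moves the point up and to the left on the plot. Likewise, cutting the tokens per tool call from 2,500 to 1,000 raises `tool_call_budget(100_000, ...)` from 40 to 100 calls.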
I think the interesting thing or the hard to understand thing from the outside
is having an explicit router in GPT-5, but then also basically having an implicit router
in terms of the thinking effort setting, that conflates a little bit, right?
Like at some point, you do kind of need to merge them or else you're just going to get these
weird bumps where sometimes the router at the top
decides something and it's wrong.
And actually, if you just handed it to GPT-5,
it would have figured it out.
Yeah, and I think, you know,
we'll figure out the correct abstractions over time.
I think, like, there's a...
Is the intention still to merge?
Because that's what it was said in the paper.
Yeah, I think, like, eventually, you know,
we'll have AGI and, like,
you're not going to have to worry too much
about how hard to think directly.
It'll just, you know,
we'll have one tool that you always go to,
and it knows how long to think for
and things like that.
I think that the abstractions and the way
that we drive these things today,
it'll change. And like, you know, I think even the amount that we've changed from, you know, having a non-thinking model to you can choose between two. And like, you know, now we can sort of route and how hard do you want to think? We're adding lots of knobs and, you know, eventually it'll probably simplify.
Yeah. Another super interesting knob that everyone is doing is context compaction or memory compaction. What's going on there?
Nothing to share at the moment. Let me share. Clearly an important feature, clearly inspired by Codex usage as well, obviously.
But I think, like, from the engineer's point of view, it feels like I used to do that as part of my harness, and now it's the model doing it for me.
And I don't know how to think about that, like, in terms of, I guess, I'm used to having more control, and now I have less.
Yeah, is there a specific?
Like, there's no specific question.
I'm just getting, like, feedback on, like, well, is this a trend where, like, it's basically a permanent fact of life from here on out.
Oh, I see.
You know, I don't know.
I worked on long context.
That was why I was on last, for 4.1,
where we, you know, I think 10xed the effective context window for 4.1.
And so there'll always be some dance of like, well, if we want to push as much as what we can do,
not only should we increase the length of the context window,
but like we should also have strategies for keeping that context window available for as long as possible.
I'm guessing that both things will sort of happen just because we want to put as much power into the models as possible.
Yeah.
Yeah, I think we're still in a period where we should all be expecting changes in the interfaces that all of the models give to us.
That way we can improve the models.
Because if we lock the interface, I think what would be sad from my perspective is if we lock the interface, and then we discover something new about models,
we might sort of trap that improvement under an interface that needs to change.
Got it.
Talking about long context as well, there is some discussion about, I guess, context rot or like the utilization of the context.
even if you gave us like a million token context,
probably wouldn't use all of it.
What's the recommendation there?
Where are things going?
Are we going to have, I guess, perfect context by next year?
Is that an impossible dream?
I don't know.
No, it's not an impossible dream.
I think I'll give a shout out to some of the evals that we did for 4.1
called Graph Walks.
I love Graph Walks.
We covered this in the podcast.
Yeah, yeah, we did.
I think if you look over time,
all of those evals are climbing.
And I think one of the interesting things about that is you have to do complicated transformations across the entire context window.
Like that's sort of the issue with those heat map plots of those different...
Needle in a haystack there.
Yeah, but the problem is if you only have to sample from one point in the context window, it's like sort of easy.
Whereas with those Graph Walks problems, you're having to do multiple transformations across the entire context window.
And so I think keep watching those.
I think they've been climbing.
They'll continue to climb.
I would say that that's definitely like a temporary issue that we are climbing on over time.
Yeah.
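A minimal sketch of why a graph-walk question is harder than a needle lookup, assuming a breadth-first-search style task over an edge list that has been written into the context. The prompt format and function names are hypothetical and are not the actual eval. A needle task reads one position in the context; here every hop depends on edges that can sit anywhere in it.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def render_context(edges: Iterable[Edge]) -> str:
    """Write the graph as plain text, one directed edge per line."""
    return "\n".join(f"{src} -> {dst}" for src, dst in edges)


def nodes_at_depth(edges: List[Edge], start: str, depth: int) -> Set[str]:
    """Reference answer: nodes first reached after exactly `depth` BFS hops."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    frontier, seen = {start}, {start}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return frontier


EDGES = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")]
context = render_context(EDGES)
answer = nodes_at_depth(EDGES, "a", 2)
```

To answer from `context` alone, a model has to find every line starting with `a`, then every line starting with each of those targets, and so on, which is the "multiple transformations across the entire context window" described above.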
So, and then like, is 10 million tokens realistic?
Is 100 million?
Like, where does, is there a natural end or there's no end and we just are going as far as the eye can see?
Oh, gosh.
I don't know.
Like, what do you think?
Yeah.
I feel like, okay, there are use cases that require billions.
And there are use cases that require many, many billions, maybe trillions.
Yeah.
Out of curiosity, like, what would be billions of tokens?
We just had a context engineering discussion about like a code base or support issues for
a company and it was 100,000 documents totaling about 8 billion tokens. You can't stick that in
a context window for now. That's fair. I guess the, so I would still say like I don't know,
but I think I've been like really surprised. It reminds me of when I was doing like more
information retrieval stuff and like BM25 and these like very simple like n-gram indexes were like
just super hard to beat.
I think the agents with grep are like,
they feel really similar to me
where it's like,
it's just unreasonably effective.
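A minimal sketch of the kind of simple lexical baseline mentioned here: BM25 scoring over whitespace tokens. The tokenizer, the `k1` and `b` defaults, and the smoothed IDF variant are assumptions of this sketch, and the documents are made-up examples.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with BM25 (higher is more relevant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents containing each term at least once.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


DOCS = [
    "reset password email link",
    "usb-c cable not charging",
    "charging cable replacement policy",
]
scores = bm25_scores("usb-c charging", DOCS)
```

There is no learned component at all here, which is what makes baselines like this, and an agent issuing repeated grep-style searches, hard to beat for how little machinery they need.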
So then I will not use your 10 million token context window,
even if you gave it.
Maybe, but like,
what if we're using that context window
in service of like some larger goal
that just has a lot of sub-search calls?
Which is why I'm saying, like, I just don't know.
And I think that's what makes it so exciting.
Yeah, yeah.
I would say also like the other other modalities
like video would eat up a lot and like then obviously the hard sciences have proteins and
all that, where a lot of information is encoded, in physics. So, I mean, yeah, I have mixed feelings
about it just because I'm like well this will never scale not with like full attention and
we we probably just need to invest in systems anyway which means we're good with what we have
I mean like get your graph walks up
But like, I don't know if we need like 10, 100 X,
when actually maybe we need to figure out ways to 1,000, 1 million X.
Yeah.
Right?
Like, these are just different slopes.
I mean, I'm glad that you're happy with the current context windows.
I think my dream would be to push it and see what happens anyway.
But the engineers, the engineer's incentive is always to say,
well, the systems matter more than the models.
And the researchers' incentive is to say, well, screw your systems.
Well, we'll just push the models.
Oh, no.
It's so differently.
Yeah.
I think that's one of the most like sort of beautiful things about post-training at OpenAI, is everyone...
Co-design.
Yeah, it's also co-designed.
Like, you know, I spend a lot of time just doing our system stuff.
And I also do lots of stuff like where I'm making graph walks.
And I'm like doing a lot more like things on the learning side.
And I think it's a great culture to have a place where people just move seamlessly between the two.
Yeah.
What are you guys hiring for?
Presumably you're hiring.
What are you guys hiring for that is hard to hire?
What is the skill set that it's like, we really need this, can't find it, please everyone go skill up on this.
This is definitely my personal opinion here. I think we're still having trouble, not at OpenAI, but I think as a whole, producing lots of people that do want to do lots of both systems work and ML work.
And I think if you're trying to push the frontier, you don't know which place is currently bottlenecking the frontier.
And it changes all the time.
I mean, even within one project, it might change multiple times where the current bottleneck is.
But I think the education system we have right now isn't really optimized for that.
So, like, I personally, I studied math and then I was very, very lucky to have some, like, great mentors after school that, like, taught me to be a good software engineer.
But it seems like if we're going to be in this place for a while, and I think we will be, we should probably be producing more students that are great at doing both, you know, distributed systems and, like,
a lot of core engineering, as well as the statistics and other, like, things that are required
to be a good machine learning researcher.
If we were to throw Codex at it, obviously, we can't throw Codex at everything.
That's why you still, let's say, which will progress faster, which is more solvable by LLMs?
That is a, that's a spicy question.
You can't say they're both equally hard.
I don't know.
Maybe they are.
I mean, they're differently hard.
I think one is more hill climbable than the other, which is it?
Because then we can go do it.
Okay.
I think one thing that's slightly simpler about some of the ML research,
like, you know, ML research is also distributed systems, to be clear.
But like some of the things that I would say like get traditionally called ML research
are things that you can treat a bit more of as a black box,
whereas like, you know, the environment to train on, you know,
building these different systems is actually just like complicated.
engineering problem. And so
theoretically, I would say that they're like
probably roughly equal.
But I think that there's some
there's some amount of effort I feel like to
making the, the environments for it.
Yeah. But they require
yes. Yeah. They require GPUs
in themselves as well.
Yeah. Yeah. I guess they both would. But yeah,
that would be my guess.
But I don't have much confidence
in it. So a lot of people are building
this like AI scientist, right?
They automate research. You guys
have your benchmark, PaperBench, and though that's the one area that, um, like for example at Cognition
we've just decided to not do because it's so hard. Okay, any other people on the post-training team
you want to shout out who have done like interesting work this year? They should get more attention but
they're not getting credit. Well, okay, for sure everyone on the shopping team that I was just
working with, so like Andrew Hoyle, um, Manuka Stratta, John Hallman, all great people. Yeah, Isa
Fulford, obviously the manager for it.
And she was the original
deep research person? Yeah, yeah.
There was like three of them. Yeah, yeah.
And so definitely that part of the team. But I mean,
everyone, everyone is so great. Like, I think it's hard
to make a list. It's a
really fun time
on post-training right now. It's exciting every day.
Yeah, it feels like
we're all enjoying our Diet Coke together
in the office late at night.
Yeah. Oh, I did want to squeeze this
in before we end. Nobody actually
serious is saying that pre-training is dead.
It's just a meme. There's a lot of work going on in pre-training.
And in fact, actually, a lot of my researcher friends are saying too much money is going to post-training.
That's also spicy. I don't know.
One of the charts I hold in memory from this year is the Grok 4 chart.
I don't know if you've seen it.
But it's basically saying, well, we scaled pre-training to here and about this level of compute.
And now we're spending the same level of compute on post-training as well.
That's very controversial, I guess, to me, because we're all used to post-training
taking orders of magnitude less data, compute, whatever. And obviously we're scaling that up now.
Do we get to a point where they're equal? I don't know. But that's a topic for conversation.
I think how much do we invest in this versus more like different pre-training?
Yeah. Yeah. So first off, neither one of those is dead. I think it's really interesting to sort
of be living through something that, you know, all the other like historic or technological revolutions are
things that I read about in history books.
And like, this one is live as it happens.
Yeah, this one, we don't know the end yet.
Yeah, and so there's this almost like fog of war where I'm like, oh, did people think that like,
we got like the steam, like the steam engine and they would have, you know, the factories.
I don't know if you know this, but like the factories, they used to be like very linear
because you had to drive like one motor across it in an entire room.
And it made it so when electricity got developed, they just tried to do the same thing.
And they're like, ah, this isn't all that useful.
And it took, I think, like a couple of decades before.
they realize, wait, if we have electricity, we can move the little, like, stations in whatever
is most ergonomic. And then, you know, manufacturing was transformed by electricity. And I think,
like, it really gives me no confidence in being like, oh, this thing is dead. Yeah, our timelines
are so short. Yeah. Yeah. Yeah. The way, like, good ideas get experimented and funded and
propagated, actually, that's still a human timeline. It's not on an AI timeline. Yeah. Yeah. And so I think,
like things will maybe be like dormant, but it'll be spiky. Like there'll be all some, you know, some,
yeah, yeah. And then we'll all feel different. It's like we're, what's, what's the meme? It's so over.
We're so back. Yeah. It's going to be that many times. And I think having like a, some, some emotional
stability about it is probably going to be good for, for everyone's sanity. Yeah. More sanity. Well,
thank you so much for joining. Thanks for all the great post-training this year. Yeah, thank you.
Yeah, continue giving feedback.
I love to hear what you think.
Yeah, awesome.
