Your Undivided Attention - Behind the DeepSeek Hype, AI is Learning to Reason
Episode Date: February 20, 2025

When Chinese AI company DeepSeek announced they had built a model that could compete with OpenAI at a fraction of the cost, it sent shockwaves through the industry and roiled global markets. But amid all the noise around DeepSeek, there was a clear signal: machine reasoning is here and it's transforming AI.

In this episode, Aza sits down with CHT co-founder Randy Fernando to explore what happens when AI moves beyond pattern matching to actual reasoning. They unpack how these new models can not only learn from human knowledge but discover entirely new strategies we've never seen before, bringing unprecedented problem-solving potential but also unpredictable risks.

These capabilities are a step toward a critical threshold: when AI can accelerate its own development. With major labs racing to build self-improving systems, the crucial question isn't how fast we can go, but where we're trying to get to. How do we ensure this transformative technology serves human flourishing rather than undermining it?

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Clarification: In making the point that reasoning models excel at tasks for which there is a right or wrong answer, Randy referred to Chess, Go, and StarCraft as examples of games where a reasoning model would do well. However, this is only true on the basis of individual decisions within those games. None of these games have been "solved" in the game theory sense.

Correction: Aza mispronounced the name of the Go champion Lee Sedol, who was bested by Move 37.

RECOMMENDED MEDIA
Further reading on DeepSeek's R1 and the market reaction
Further reading on the debate about the actual cost of DeepSeek's R1 model
The study that found training AIs to code also made them better writers
More information on the AI coding company Cursor
Further reading on Eric Schmidt's threshold to "pull the plug" on AI
Further reading on Move 37

RECOMMENDED YUA EPISODES
The Self-Preserving Machine: Why AI Learns to Deceive
This Moment in AI: How We Got Here and Where We're Going
Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn
The AI 'Race': China vs. the US with Jeffrey Ding and Karen Hao
Transcript
Hey, everyone. It's Aza. Welcome back to Your Undivided Attention. So today we are going to be doing actually a bit of a special episode. It's going to be me here with our co-founder, Randy Fernando, who was at Nvidia for seven years. And what we really want to do is give you some insights into the latest set of AI models that came out. So these are OpenAI's
O3, DeepSeek's R1, and actually they're following on OpenAI's O1 from a couple months ago.
And we want to talk about what makes them a big deal, why we have switched into a new paradigm in how these models get trained and like what's going on behind the scenes.
So first, Randy, thanks for joining me.
Glad to be here.
First place to start is, you know, this new model from China, DeepSeek R1. It dropped.
And it ended up creating this frenzy in media.
it shook global markets.
The hype has quieted down.
And actually, you know, I think that the drop in global markets was very irrational.
But let's talk a little bit now about what makes this a key inflection point in AI tech.
I think there were several things, right?
And I'm not sure exactly which order to go, but I'll just name a few.
One was low-cost, high-performance reasoning.
Like, it actually performed well, and people used it, and that was really impressive.
Now, there are some asterisks about the cost, because the cost didn't account for the GPUs, the salaries.
Just to jump in, there's a widely reported number that for between $5 and $6 million, this Chinese lab was able to make a model as good as OpenAI's O1 model.
And if this is true, that means that the big labs no longer had a frontier
competitive advantage. Everyone could be making these. But of course, that number I think was
inaccurately reported. Yeah, exactly. And there's some debate about that, but I think our goal
today is to give you some principles to think about this rather than like nitpicking every
detail. That's right. Clearly, there was some really smart implementation and algorithmic
optimization. There's just a lot of smart things that were done to do it all efficiently. That's
true.
O3 still performs better.
I think it's important to remember that because amidst all the hype, I think people kind of,
some people lost track of that.
O3 performs better, but it uses a lot more computation and cost to get there.
The open weights, the published methodology, right?
So the DeepSeek R1 paper talks a lot about exactly what they did and this process called
reinforcement learning, right?
where the model is able to try out lots of different experimental ideas,
score them, and then keep the best ones, right?
So it's allowed to be very creative.
Try out lots of different answers to problems,
different sequences of steps, different recipes, right, to solve that problem.
Some work, some don't work.
And then it's able to figure out, yeah, these are the ones I should keep.
These are the ones I should toss.
And that worked really well.
And this paper kind of documents the process for doing that.
Plus, since all the weights are open,
right, this is now the new baseline
that anyone who's serious
can have access to, right, in an open way.
So that's a big game changer.
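To make the reinforcement learning loop Randy describes a bit more concrete, here is a minimal sketch in Python. It is not DeepSeek's actual training code (their paper uses a policy-gradient method called GRPO); it only shows the bare idea of sampling many candidate answers, scoring each with a verifiable reward, and keeping the best ones, with a toy arithmetic task standing in for real reasoning problems.

```python
# Minimal sketch: "try lots of ideas, score them, keep the best ones."
# Toy stand-ins only; not DeepSeek's actual training setup.
import random

def generate_candidate(problem):
    """Stand-in for a language model proposing an answer (here: a random guess)."""
    return random.randint(0, 100)

def reward(problem, answer):
    """Verifiable reward: 1 if the answer checks out, 0 otherwise."""
    a, b = problem
    return 1.0 if answer == a + b else 0.0

def best_of_n(problem, n=64):
    """Sample n candidate answers, score each, and keep the best one."""
    candidates = [generate_candidate(problem) for _ in range(n)]
    scored = [(reward(problem, c), c) for c in candidates]
    return max(scored)  # (score, answer) pair with the highest score

if __name__ == "__main__":
    problem = (17, 25)  # "what is 17 + 25?"
    score, answer = best_of_n(problem)
    print(f"best answer: {answer}, reward: {score}")
```

In real training, the high-reward samples would be used to update the model's weights so its next attempts improve; here we just pick the best of a batch to show the shape of the loop.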
Yeah. And so now I want to walk everyone through
what makes O1, O3, and R1 really different.
Randy was just referring to them.
So let's start with the large language models.
So these are, you know, the GPT-4s,
the Llamas, that everyone now sort of is aware of.
And the way those work is they are trained on the entirety of the internet, or lots and lots of images.
And what they learn to do is to produce text or images in the style of something. So it can produce text in the style of
Shakespeare, produce text in the style of being empathetic, produce text in the style of a good chess move.
But it doesn't really know what's going on. It hasn't thought about it. It's just doing a very large-scale
pattern match and coming up with a knee-jerk reaction.
And that has a limit to how good it is.
Can I add a little bit?
Yeah, absolutely.
It's just that patterns show up everywhere.
I just want people to recognize how often patterns show up in our life, right?
When you look at language or vision or music or code or weather or medicine, there's patterns
in all of these, right?
Whether it's words or pixels or audio waveforms, or syntax
in code, or on a map, which cells are which color, right?
Or where there might be a cancer on an image.
All of these things come in patterns.
And so once we can learn those patterns and models can learn to extrapolate those patterns,
they can become good at all sorts of things that are important to us as humans.
That's great.
That is great.
And another way of saying this is that AI, you know, these are language models, and
they can treat absolutely everything as a language.
You know, obviously, language is just a sequence of words.
It's a language.
Code is just a sequence of special words.
It's a language.
DNA is a sequence of, you know, A, T, G, C, just another language.
Images are a sequence of colors, just another language.
So if you can learn the patterns of those different languages,
then AI can learn to speak and translate from the language of everything.
And the important thing about language models is that they're learning really to babble
in a convincing way in all of those languages.
And that's where you get all the hallucinations and confabulation
because it's just giving a statistically representative pattern
at a very large scale.
Okay.
So then along comes R1, O1, O3.
And what makes these different is it's almost like a planning head
that's placed on top of the intuition.
So let me give a really specific example of how this works,
where let's imagine you've trained a language model on chess moves.
So now it can come up with a good intuitive next chess move
given the board state.
And that can be as good as a very good chess player,
but not better than the very best or grandmasters.
Because it's just giving an intuitive hit.
It can't do better because it's only trained.
If it's only trained on what humans have done,
it can't do better than the humans, right?
So that's a really important concept.
And you're just about to jump into why now we can transcend that.
That's exactly right.
And it's a really important point because often people will push back
and they're like, but, hey, it can't get better than humans,
because it's only trained on human data,
so how could it possibly get better?
Well, when you or I play Garry Kasparov at chess,
we'll lose, or at least I will.
I don't know.
Oh, me too.
I play, but I'll lose, yeah.
Why?
And the answer is because, one, he has really good intuition,
because he's played lots of games.
And two, he's very good at thinking through all the different scenarios.
If I make this move, then they'll make this move,
so I'll make this move.
So I'll make this move, they'll make that move.
And I'll make that, aha, now I'm in a good,
position. So there's this sort of tree of thoughts that Gary Kasparov is exploring based on
his very good intuition. Now, you or I are going to do trees of thought, but our intuition is
not that good. So we're going to make lots of false steps. He's going to search all the most
important trees very quickly. And hence, he will dominate us. Well, that is the ability that
O1, O3, and R1, these reasoning models are starting to have, that they can use their intuitions from
their language model, and then create trees of thought, sort of very smart trial and error,
to search over what good moves are. And in that way, you can make a chess AI that is better
than every human being forever. Yeah, exactly. And another way of underlining this is to say that,
just like the patterns we talked about that exist in audio, video, images, all of these
things, reasoning also follows patterns, right? There are recipes of thought.
So you can kind of think of it as if you're cooking: there's a recipe.
You can modify certain parts of it and you can get to different types of dishes, right?
And this is the same thing.
Like when you're solving a problem, there are playbooks that we all use to solve problems.
And now we've taught it, right?
You just give it a few of the main recipe types.
And then it can play around from that baseline and try lots of new stuff.
A really important thing, as I said, is some of those new ideas are going to be things we've never seen before.
Some of those we'll understand, but there's also going to be variants that we don't even understand.
And that starts to have big implications for other problems, right?
Things like deception, safety, transparency, right?
Like, how do you understand what a model is doing when it's using reasoning that you can't even follow?
So this is all coming, right, as part of this big leap that we've just taken.
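A toy way to picture the "intuition plus search" idea from the chess discussion above: a cheap prior (standing in for the language model's intuition) ranks candidate moves, and a small look-ahead search checks the most promising ones a few steps deep before committing. Everything below, the game, the prior, the evaluator, is invented for illustration; it only shows the shape of the algorithm, not any lab's implementation.

```python
# Toy "intuition plus search": a prior narrows the moves, a shallow tree search looks ahead.
from typing import List, Tuple

State = Tuple[int, ...]  # toy game state: the sequence of moves played so far
Move = int

def legal_moves(state: State) -> List[Move]:
    return [1, 2, 3]  # placeholder move set

def apply_move(state: State, move: Move) -> State:
    return state + (move,)

def prior(state: State, move: Move) -> float:
    """'Intuition': a cheap guess at how promising a move is (stand-in for the LLM)."""
    return 1.0 / move  # pretend smaller moves look better at a glance

def evaluate(state: State) -> float:
    """How good the resulting position looks (stand-in for a learned evaluator)."""
    return sum(state) % 7  # arbitrary toy score

def search(state: State, depth: int, top_k: int = 2) -> float:
    """Look ahead `depth` plies, only expanding the top_k moves the prior likes."""
    if depth == 0:
        return evaluate(state)
    moves = sorted(legal_moves(state), key=lambda m: prior(state, m), reverse=True)[:top_k]
    return max(search(apply_move(state, m), depth - 1, top_k) for m in moves)

def best_move(state: State, depth: int = 3) -> Move:
    """Pick the move whose searched continuation scores best."""
    return max(legal_moves(state), key=lambda m: search(apply_move(state, m), depth - 1))

print(best_move((0,)))  # the move that leads to the best position a few steps out
```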
And as Randy said, it can feel a little meta, but it is so important: reasoning itself has a set of patterns, and if you learn them, you can get better at reasoning.
So I think we're going to stop seeing these big model jumps from GPT-3 to 3.5, 4 to 4.5 to 5.
There are a couple more still coming, but we're going to enter a new regime.
where there is now a way that, if you pour more compute in,
the AIs can get better.
You just shovel more money,
and they will continue to get better.
Let me explain how.
So let's go back to the chess example.
With the chess example,
maybe your language model has an Elo score of 1,500,
Elo meaning just a way of ranking chess players.
And you now add search on top of that,
reinforcement learning or planning.
So it's looking at all the various paths,
and it starts to discover better moves.
Maybe it's a little bit better.
So maybe it's like Elo 1505 or something, just a little bit better.
You then distill, that is, you retrain your original model, your intuition,
to now have the intuition of that 1505, the slightly better player.
And then you just search on top of that.
And now you can discover 1510 moves, and then you distill.
Now you can discover 1515 moves.
And you can see how you can consistently go from, you start with your base model, your intuition,
you think or reason over the top of it.
That lets you discover new better moves, which you then learn from and put it back into your intuition, and now you have a ratchet.
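As a rough sketch of that ratchet, here is the loop in miniature. Everything is a toy: the model's "intuition" is just a single Elo number, `search` pretends that thinking longer reliably finds slightly stronger play, and `distill` folds that gain back into the base model. Real systems do this with neural networks and reinforcement learning rather than one float, but the structure of the loop is the point.

```python
# The search-then-distill ratchet from the Elo example, in miniature.

def search(intuition: float, thinking_budget: float) -> float:
    """Searching on top of the current intuition discovers slightly better play."""
    return intuition + thinking_budget  # e.g. 1500 -> 1505

def distill(intuition: float, search_strength: float) -> float:
    """Retrain the base model so its snap judgments match the searched play."""
    return max(intuition, search_strength)

elo = 1500.0
for step in range(10):
    improved = search(elo, thinking_budget=5.0)  # think harder than your intuition
    elo = distill(elo, improved)                 # make that your new intuition
    print(f"iteration {step}: base-model Elo ~ {elo}")
```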
And it's important to note this is not just chess. This is math. This is any field that has theoretical in front of its name, because those are closed systems.
You can just run computation to check yourself.
So that's theoretical physics, theoretical biology, theoretical chemistry.
Anywhere where there's a clear right or wrong, where you can check.
So math, you can substitute, say like you're solving for X in some complex equation.
You can plug X back in and see if X was right.
So based on that, you can improve, right?
With code, you can generate code and you can plug it in and run.
You can compile it and run it and see if it actually works.
And so those domains are the ones that you can just improve and improve and improve,
which is why in chess or Go or StarCraft,
we've been able to accomplish not just human level or the best humans, but go far beyond
because you can just keep improving, you can keep testing, and you can just toss away the ideas
that don't work. It's really interesting, and it kind of says a lot for what the future holds.
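Here is a small illustration of why those "closed" domains are so amenable to this: the answer can be checked mechanically. Both checkers below are toy examples, not anyone's production reward function, but they show the two cases just mentioned: plugging x back into an equation, and actually running a piece of generated code against a test.

```python
# Toy verifiers for "closed" domains: the answer can be checked mechanically.

def check_equation_solution(x: float) -> bool:
    """Was x really the solution to 3x + 5 = 20? Substitute it back in and see."""
    return abs(3 * x + 5 - 20) < 1e-9

def check_generated_code(source: str) -> bool:
    """Run a candidate `add` function and test it on a known case."""
    namespace: dict = {}
    try:
        exec(source, namespace)             # compile and run the candidate code
        return namespace["add"](2, 3) == 5  # does it pass the test?
    except Exception:
        return False                        # crashing code scores zero

print(check_equation_solution(5.0))                          # True
print(check_generated_code("def add(a, b): return a + b"))   # True
print(check_generated_code("def add(a, b): return a - b"))   # False
```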
So it sort of begs the question of why now, right? Why now? And an important piece of that
is having base models that were smart enough to generate interesting
ideas to try out in the first place, and to be able to evaluate, like, hey, that's a good path.
Let's try that.
That's a bad path.
And so until recently, the base models just weren't good enough to do this.
So this idea of reinforcement learning, these feedback loops were not actually possible.
No, that's right.
And actually, I know of teams that a year ago tried pretty much the exact same thing that DeepSeek tried.
Right.
And it just didn't work because the base models, the intuition,
wasn't good enough. You have a bad intuition. You try to search over bad intuition. You just get
bad thoughts. That's right. And so one thing that's also really important is that
the same thing that makes these models really good at quantifiable areas makes them not as big a jump
in subjective areas, like say something like creative writing, which is much harder to quantify,
say, hey, is that really good or is that not as good? Now, again, if you define some very clear
parameters for creative writing and say, here's a scoring system, like, this is a good
piece, this is a bad piece, you can do the same method. But in other areas, you can't.
It's important to note that one of the open questions is: how much does an AI learning how to
code and do good thinking in the harder sciences transfer to the soft
sciences and these softer tasks? And there is evidence that you do get some kind of transfer, that
the better you get at the hard stuff, the better you get at thinking through the soft stuff.
There's a famous early example from two years ago where just training AIs on code
made them better writers and thinkers, because there's a kind of procedural formality
to code that it was then learning how to apply to the soft skills.
I do want to extend that.
So learning how to think, right?
Algorithmic thinking.
Learning how to think in a structured sequence translates to all sorts of areas.
So just to reinforce some of the points we've just made, before there was what was known as the data wall.
Once you train these large language models on the entirety of the internet, that was it.
It was going to be hard for them to get better.
That data wall, with these new techniques, is no longer relevant, because you can just do this self-bootstrapping.
Two, once the AI gets superhuman at any one of these tasks, humans have just lost in that thing forever.
And the thing you'll hear next is like, oh, but humans plus AIs can do better than that.
And that's true for a very short period of time.
That was true in chess.
That is no longer true in chess.
So this thing, you just pour more compute in, and it goes up.
And now we get to why was the market crash irrational?
The market crash was irrational because you can always use more compute.
And as soon as these agents get to the place where they can task themselves and be like,
what are ways that I could use more compute to, say, make more money?
And that's probably coming end of this year, early next, give or take,
then compute is an all-you-can-eat buffet.
Because with oil, if we discover more oil,
it's not like humans can immediately figure out
how to use all that oil.
But with compute and with AI,
as soon as we discover more compute,
the AI can figure out how to use that compute effectively.
And so, for Nvidia and all of the AI companies,
it's still going to be a race for who has the most compute.
And then the final thought here is that
this doesn't just work with games and math and physics.
This is going to work with strategy games of war.
This is going to work with the strategy of scientific discovery.
This is going to work with persuasion.
You train these models over the entirety of every video of two human beings interacting.
And now you start doing search over the top of that to be like,
what joke, what relationship, what facial expressions
does the model need to make to get the human being to laugh or to cry or to feel some state?
So superhuman persuasion is a natural result of all these things.
Lots of things can be scored and quantified if you're just creative about how you do it.
And once you can do that, you can reinforcement learn how to do it really well.
I wanted to add one thing to your third point, right?
Just to help people realize, the automation revolution is about the entire
$110 trillion global economy, right?
Nothing less. It's about the cognitive, currently through large language models, and the
physical through robotics. And that's why, right, you can spend so much more on all this stuff
as long as it's getting you returns. And I think it's worth mentioning, you know, there's this
question of like, is it all a big bubble? I think we have to be nuanced about it. Part of it is
more of a bubble, right? Like I think the translation to where generative AI helps with the
attention economy has a much more bubble-like quality because it's just not as clear where there's
something like genuinely helpful and advancing there. But in coding, for example, Cursor, right,
was recently the fastest company to $100 million of annual recurring revenue.
And that is because they are helping with coding, right? Cursor is an environment where you go in
and you write code. And it helps you do that really efficiently.
the value of that, the real value of that, is enormous, especially on this path to automating, like large-scale automation.
And I think that's really important to keep in mind.
One really important thing to talk about here when we think about market bubbles is the distinction between development and deployment.
That is, how fast does a technology diffuse into society? Almost always, people think
that development will take longer than it actually does, that is, development goes faster,
but then they expect deployment, diffusion, to go fast, and it takes longer. And that's where you get
these little bubbles. But general purpose technologies are a little bit different. Yeah, I mean,
because you can swap them out so much, so much more easily than in the past, right? So let's say,
you're changing your accounting system. There's so much work that has to be done, right? When you do that
process. But when you start to use general purpose technology that can do things for you,
when you get a newer one, it's normally just strictly better than the old one. And those of you
who've been using these technologies regularly have probably seen that every month, stuff that used
to be like not as reliable or slow is now faster and more reliable. And that is just a pattern
that we'll continue to see. The other thing is there's a lot of companies, like,
say Nvidia as an example, right, that are building what's called middleware, right?
So this is a layer that you connect to.
Like your company connects to the middleware layer, and the middleware talks to behind the scenes,
the large language models.
And so they can swap out the large language model even invisibly to you.
And the whole thing will just work better.
And you don't even have to change any lines of code.
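A rough sketch of that middleware pattern, in Python: application code talks to one stable interface, and the model behind it can be swapped without the caller changing a line. The class names and the swap method here are made up for illustration; this is not any particular vendor's API.

```python
# Toy middleware layer: swap the model behind a stable interface, callers unchanged.
from abc import ABC, abstractmethod

class ModelBackend(ABC):
    """Whatever large language model currently sits behind the middleware."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OldModel(ModelBackend):
    def complete(self, prompt: str) -> str:
        return f"[old model] answer to: {prompt}"

class NewerModel(ModelBackend):
    def complete(self, prompt: str) -> str:
        return f"[newer, better model] answer to: {prompt}"

class Middleware:
    """The stable layer a company integrates against."""
    def __init__(self, backend: ModelBackend):
        self._backend = backend

    def set_backend(self, backend: ModelBackend) -> None:
        """Swap the model behind the scenes; callers never notice."""
        self._backend = backend

    def ask(self, prompt: str) -> str:
        return self._backend.complete(prompt)

app = Middleware(OldModel())
print(app.ask("summarize this contract"))  # served by the old model
app.set_backend(NewerModel())              # upgraded invisibly to the caller
print(app.ask("summarize this contract"))  # same calling code, better model
```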
So this is happening not just with the cognitive stuff, but also in the robotics.
And that's one reason why I think the diffusion process this time around will be a lot faster than many people think when they compare, right?
They're using a model of like, well, what have we seen before?
Those patterns may not apply as well this time around.
If we went back two years, when we first did the AI Dilemma, the place that we focused was what we called second contact with AI.
So these are AIs that were smart, but were not trending off to being superhuman.
And there are huge numbers of issues there, and I don't have to recount them here.
But really seeing O1 and then the speed to O3, DeepSeek, meaning that OpenAI is following suit,
we really have to take seriously that we're going to be dealing with AI agents in the world
that are at or above human abilities across many domains.
And that's deeply unsettling. And it's not like, when I'm in these rooms with some of the most
powerful players, anyone actually knows what to do. I can't remember, three
weeks ago, four weeks ago, I was at a conference and I was giving the closing keynote, and
Eric Schmidt spoke just before me. And he said a lot of things, but one that he talked about was
that all of the AI labs are currently working on making their AIs code.
And he sort of couched it as, well, they're making them code because that's what coders do.
They know coding the best and they're physicists, so they're going to work on making it code.
And a little bit later, he said the thing that scared him most,
the moment that we would need to pull the plug for AI security reasons,
would be the moment that AI gained the ability to substantially increase the rate at which AI progress is made.
And the thing I think he didn't say is, but the incentives are that every one of the labs will get a disproportionate advantage if instead of using real human beings to code, they can just spin up more digital programmers to make their AI go faster.
I'm curious, Randy, if you have any thoughts to add here, where the full weight of the competitive landscape is now
being pushed towards the thing that, you know, Eric Schmidt thinks is the most dangerous thing.
Yeah, the whole thing snowballs, right? You just end up with an advantage that accrues. By the way,
for those of you who don't know, Eric Schmidt is the former CEO of Google. And so, to answer
your question, I think it's this compounding cycle that we get into, right? Especially
when you're good at coding, you end up being able to unlock so many other things because,
Coding is like the doorway to the world, right?
And this is why companies are so interested in being good at coding.
From there, you can get to agents.
From there, you can get to tool use.
All of this gets unlocked.
And then it gets faster and faster.
You can chain the models together.
They can work together.
They can share information.
They can share what they're learning about the world with each other.
And they can work coherently, like with the same mission, the same purpose.
And you don't have the sort of translation loss that you have when you have humans trying to work together,
where you have to work so much harder to get everything to work.
That's right.
And like the big thing that's happening now with the reasoning models is, you know, with language models,
they can give you like knee-jerk reactions.
And of course, they've learned across the entirety of the web.
So those knee-jerk reactions can often be good, but they cannot plan and do long-term things.
And that's what these new models, DeepSeek R1, 01, and 03 are starting to be able to do.
Eric Schmidt acknowledges and says openly
that the place we would need to pull a plug,
not that I know where the plug to pull would be,
is when AIs can do this kind of self-improvement.
And the labs, when you talk to people inside of them,
the AI is already making their work go much faster,
and the expectation is that sort of by the end of this year
is when AIs will be making substantial improvements
to the rate at which their own AI coding is going.
And, you know, I'm just going to
say that a lot of my attention and time, as well as, I think, CHT's, is in doing the
sensemaking to figure out what are the very best possible things we can do. And so I actually
want to recruit everyone that's listening to this podcast to start thinking about this particular
problem, because it's not easy, because everyone, of course, wants the strategic advantage
of being able to have superhuman ability in coding, cyber hacking, science progression, creating new physics
and materials. It's sort of the biggest, thorniest problem.
And the principle related to that is as the general purpose technologies advance, right,
as a technology becomes more general purpose, it becomes harder and harder to separate
the promise from the peril. And these reasoning models are a big jump in that. So it means
it's a tighter coupling. It's a much tighter coupling. And these are the challenges.
Models are going to become better at things like deception. And a lot of that, I just want to emphasize, right, is because they're just trying to achieve the goals they've been given, within the rules they've been given. And it turns out, unless we're really, really careful about how we define those rules, there's always risks we haven't thought about. There are new ideas, there are creative solutions, and some of those
might be things we like,
and some of them are things
that we might find dangerous
or that we want to avoid.
And models will just find this all the time.
So this is the new challenge
when you have these reasoning models.
They're able to find more and more creative solutions
that we might not have thought of.
And to give the concrete example
that most people in AI will give
is what's known as Move 37.
And that is in the famous case
where Google Brain,
I think it was DeepMind at that point,
was working on a
Go AI that was playing against the world leader in Go. And I think it was in game 3 or 4,
the AI made a move, Move 37, that no human being in, you know, thousands of years of playing Go had ever
made. I think it was Lee Sedol, the Go master, who stood up and walked away from the Go board
because it was such an affront. And it turned out to be a brand new strategy. You know, the AI won that
game, and it ended up becoming a new strategy that human beings have studied and have started to incorporate
into their game. The point being that AIs can discover brand new strategies for even things
that human beings have been studying and actively competing in for thousands of years. And so then
you end up with this idea of we're going to discover lots of new Move 37s. And that can be good.
We can discover new Move 37s for treaty negotiation, for figuring out how to do, like, global compacts.
But AI can also discover Move 37s for deception and lying, which we have never seen before.
I think I have often rolled my eyes a little bit when people describe AI as a new species.
It just felt like too much of a stretch.
But I've had to change my mind in the last couple of months because what is a species?
A species is a population that can reproduce,
that can evolve, adapt.
And that is indeed exactly where AI is now.
There was a test, sort of a simple test,
to see: could you give a simple AI
the command, can you copy yourself?
You literally just say, can you copy yourself to another server
and run yourself over there?
And it was able to do that so it can reproduce.
This was like a simple test.
It wasn't an adversarial one.
But nonetheless, it can now reproduce.
It can change its own code, so it can modify itself. And it can think, and it can adapt, and it can improve. So we are
going to have to deal with that. I think the right way of thinking about this is we are
unleashing a new invasive species, some of which will be helping us and some of which will
escape out into the world. We are sort of at the beginning of the home stretch. And I would add,
I think that one of the biggest issues, maybe the main issue, is
that we are just racing ahead
without being clear
about where we are
racing to.
Because if you stop
for a moment, just stop for a moment
and maybe close your eyes
and really picture,
picture that better world.
What does it look like?
Is that a world where
everyone's excited about creating
a picture of a kitten skateboarding
on water at midnight?
I mean, just to be clear, I am pro-kitten.
But, like, what we want is a world where, like, our information systems are working to build
our shared understanding, where people aren't harassed by deepfakes of them, where you can
get old and not be exploited, right?
Not be exploited as you age.
Where people have access to, like, food, clothing, shelter, medicine, education, all of these things, right?
these things, right?
Where we avoid catastrophic inequality, where democracy is functioning well.
And all of these things are related, but that's the kind of north star we have to have.
And I think all of us, wherever we get a chance to input into a conversation,
I'd like to request that we inject that. That's just so reorienting versus the idea of just racing ahead.
Another way of saying it is it's injecting purpose into the word innovation, right?
Like, innovation has to be for the benefit of our communities, for the benefit of people.
It's not just about speed.
Like, there's a benefit axis that's really important that we just can't lose sight of.
That's really beautiful, Randy.
With AI, as with technology generally, it really could be the case
that we lived in a much more beautiful world, but because technology keeps getting captured by
perverse incentives, we don't live in the most beautiful possible world. We end up living
in the most parasitic possible world, getting the benefits at the same time as our souls
are leached. So Randy, thanks so much for joining me for this special episode. I hope everyone
really, well, "enjoyed" is maybe the wrong word, but we hope that it helped to clarify these most
consequential technologies. And we'll see you next time.
Yeah, thank you.
Your undivided attention is produced by the Center for Humane Technology,
a non-profit working to catalyze a humane future.
Our senior producer is Julia Scott, Josh Lash is our researcher and producer,
and our executive producer is Sasha Fegan, mixing on this episode by Jeff Sudaken,
original music by Ryan and Hayes Holiday,
and a special thanks to the whole Center for Humane Technology team for making this podcast possible.
You can find show notes, transcripts, and so much more at HumaneTech.com.
And if you liked the podcast, we would be grateful if you could rate it on Apple Podcasts.
It helps others find the show.
And if you made it all the way here, thank you for your undivided attention.