Today, Explained - DeepSeek deepdive
Episode Date: January 30, 2025
The emergence of DeepSeek, a Chinese AI model that was developed for a fraction of the cost of leading Western ones but seems to perform on par with them, caused chaos in the markets and electrified the tech industry. This episode was produced by Miles Bryan and Victoria Chamberlin with help from Amanda Lewellyn, edited by Amina Al-Sadi, fact-checked by Laura Bullard, engineered by Andrea Kristinsdottir and Rob Byers, and hosted by Noel King.
Transcript at vox.com/today-explained-podcast
Support Today, Explained by becoming a Vox Member today: http://www.vox.com/members
The DeepSeek logo on a phone in front of a flag of China. Photo illustration by Anthony Kwan/Getty Images.
Transcript
Everyone's talking about DeepSeek.
Satya Nadella at Microsoft.
I think DeepSeek has had some real innovations.
Mark Zuckerberg at Meta.
There's a number of novel things that they did that I think we're still digesting.
And there are a number of things that they have advances that we will hope to implement in our systems.
The president of America?
The release of DeepSeek AI from a Chinese company
should be a wake-up call for our industries
that we need to be laser-focused on competing to win.
It's a chatbot. It's a white paper.
It could help write code for a Tetris game.
Tetris. It could solve a math theorem.
It's chips, you guys.
On Today Explained, the week the world wilded out over DeepSeek.
And also, what is DeepSeek? Coming up.
It's Donald Trump's first week in office again, and it sure feels like it.
There is a great deal of, as Heather said, a great deal of anger about oligarchs, about
rich people controlling everything.
I'm Preet Bharara.
And this week I'm joined by Kara Swisher, Astead Herndon, and Heather Cox Richardson
for a special Inauguration Week episode of Stay Tuned with Preet.
The episode is out now.
Search and follow Stay Tuned with Preet wherever you get your podcasts.
You're listening to Today Explained.
Eleanor Olcott is the Financial Times China Technology correspondent.
We reached her in Beijing, where she has been following DeepSeek from its very beginnings.
I first heard about this mysterious AI company in early 2023 because one of my contacts said that this hedge fund
silently built one of the largest clusters of Nvidia GPUs in China.
So Nvidia GPUs are graphics processing units. These are basically the AI chips
that you need to power AI model training and inference. They're really, really important for this whole AI race.
And they're in short supply in China.
So somehow this quant fund, that's a hedge fund in China,
had kind of silently built one of the biggest clusters
in the country.
We took notice and they started publishing
more and more advanced models over the past year.
And their work finally pierced through
the Western consciousness when we were all
on Christmas holiday right at the end of 2024
with this new model release, V3.
There is a new model that has all of the Valley buzzing,
and it does not come from OpenAI or Meta or Google or any of those names.
DeepSeek V3 is the first open source model in all of AI history that is better than the closed source models.
DeepSeek version 3 is free and it's absolutely insane.
Later in January it then published another model
which again shocked the world with its sophistication.
And the key thing here, the reason
that they've prompted somewhat of an existential crisis,
especially amongst the US players,
is that they claim to have been doing this
on such a bootstrap budget.
All right, when I learned about DeepSeek, it was because the stock market had
absolutely collapsed amid the news that this Chinese company had made this thing.
What exactly went on earlier this week?
I mean, the stock market is an incredibly mysterious beast, right?
I mean, we at the FT have been writing about how DeepSeek
and other Chinese companies are building really competitive models for months now. But I think
what happened over the past week was we saw all of this frenzied activity on Twitter.
It's not that people want DeepSeek to win. It's that they want OpenAI to lose.
DeepSeek R1 is one of the most amazing and impressive breakthroughs I've ever seen.
DeepSeek this, DeepSeek that.
A profound gift to the world.
How about you seek a deep connection with a woman?
Personally, I'm staying away from DeepSeek.
I don't want the Chinese spying on me and seeing what kind of videos I'm watching on
TikTok.
Wait.
Wait.
Wait.
What came out on Monday was a moment, right? It was really,
really important because DeepSeek, this kind of little known Chinese lab, for the first time,
released a paper with a very, very detailed explanation, a kind of technical recipe, as it were, for building a reasoning model.
Now, reasoning models are important.
It's a fairly new area of AI,
but it basically means models that can teach
themselves and improve themselves without human supervision.
This is really important because if we can use this in practical applications,
it means that AI will be capable of critical thinking and
will be useful in tasks that are vastly more complex than what we currently have on the
market.
The dream, right, is to have an AI, for example, running in the background of your computer
and kind of preempting your needs, like booking travel, doing things that you
haven't even thought of maybe. It's kind of acting as your actual personal
assistant. They don't just respond to demands, they preempt things.
Hello Noel, what can I get for you today?
They make decisions on their own. They might, for example, I mean, figure out
that you have not got
enough groceries in your fridge and think, okay, well, we'll preemptively order
that so you don't even have to do it yourself, right?
I ordered extra Cheetos this week. You deserve them.
It's still very much an open question as to whether or not we're going to get
there. It's important to note as well that this is just like a big marketing strategy
on the part of a lot of AI companies also to justify continuing to raise billions of
dollars. But what I think DeepSeek proved over the past week is actually China is a
viable and competitive player in this field.
So let's talk about where DeepSeek comes from.
Who's behind this?
So unlike other AI companies, AI startups in China, it hasn't raised any external financing.
So you think, okay, how the hell has a company managed to build what we know is a very expensive
endeavor of buying all of these GPUs and also hiring the best
talent.
They're known, along with ByteDance, for paying top dollar for the best AI researchers in
China.
And that's basically a story about the founder, Liang Wenfeng, who has a background as a quant
hedge fund manager.
So he basically made a whole bunch of money trading stocks
and decided to plow some of those resources
into this new pet project.
And he started in 2021 building this large Nvidia cluster
because he recognized the potential for this technology.
And the timing of that is important for two reasons.
The first is that it was really before the world woke up
to the potential of generative AI.
It was before the release of ChatGPT, and the rest of the Chinese players had kind of
neglected generative AI as a field of research. They were much more focused on surveillance
technology, surveillance AI, because it was clear you could make money with that form of AI.
The other reason it's significant is because it was really before the first tranche of
kind of blanket export controls were put in place on China.
The restrictions will limit Chinese companies access to advanced computer chips and slow
their progress in artificial intelligence.
U.S. chipmakers, NVIDIA and AMD, tumbling after the US ramped up its chip export rules.
Washington says the aim is to prevent Beijing using the most advanced semiconductors for
its military modernization.
So really, when the race started in China in early 2023 to replicate, or seek to replicate, OpenAI's success,
Liang and DeepSeek were actually in a pretty good position to get ahead.
Okay.
So a brilliant man made a bunch of money and now presumably will make a trillion more dollars.
Is that the objective here?
That isn't the objective here.
And actually, that's what makes DeepSeek so unique, right?
They have not made any serious moves to commercialize their technology.
They have an AI chatbot.
It's free to use.
What I think he's doing here and from people who know him,
is he wants to just, you know, add to the great canon of LLM research.
He wants to push this technology forward.
And actually, there is also a bit of a national pride element here as well, right?
In interviews with domestic press, he says it's important that China also plays a role
in developing this technology and being a leader.
So I think there's various ambitions at play, but he's a pure technologist.
And actually, because DeepSeek is not interested
in commercializing their technology, right,
it's like a pure research lab.
People have described it to me being like the early days of DeepMind,
where you just have a bunch of engineers,
a bunch of researchers working on whatever they think
is the best technical pathway forward.
But because they don't care about commercialization, that means that they're willing to share the
secrets of how they've done that with the rest of the world and kind of enable the others
to also learn from their learnings.
And for players like OpenAI who are also working on the same research, but not telling the
world how they got there, this is really a bit of a challenge.
Earlier this week, as the stock market was whipsawing about, we heard people asking whether
or not this is AI's Sputnik moment. They're referring to the Soviet Union launching a
satellite into space before the United States back in the 50s, kicked off the space race, a very big deal.
And one of those terms that you don't hear very often because a Sputnik moment is a big
moment.
Do you think that this development just kicked off the AI race in a way?
As a journalist, I'm all for fancy metaphors and comparisons. I think the comparison is not completely correct in this case, right?
DeepSeek is a private company that has just been plugging away on AI research.
It's not building rockets to send to space.
But having said that, you know, US and China undeniably are in a tech war.
We've known this since 2019.
And China is very, very concerned about the US getting ahead on AI.
And it's been providing a huge amount of support
to kind of select players that they think are going to help
it remain competitive and gain an edge.
But really, the Sputnik element,
it's really about the hardware itself, the AI chips.
I think the real race here is on
Chinese companies and the Chinese ecosystem overall,
trying to make Huawei or maybe one of
the other Chinese competitors a true long-term and successful rival to Nvidia.
Eleanor Olcott of the Financial Times in Beijing.
Coming up next, can DeepSeek's competitors, looking at you, OpenAI, compete?
Support for the show today comes from Vanta.
Trust isn't just earned, Vanta says.
It's demanded.
Have you demanded someone's trust lately?
Whether you're a startup founder navigating your first audit
or a seasoned security professional
scaling your governance, risk and compliance program,
proving your commitment to security is critical and complex.
And that's where Vanta comes in.
You know the deal.
Vanta says they can help businesses establish trust
by automating compliance needs across
35 frameworks like SOC 2 and ISO 27001.
They say they can also centralize security workflows, complete questionnaires up to 5
times faster, and proactively manage vendor risks.
You can join over 9,000 global companies
like Atlassian, Quora, and Factory
who use Vanta to manage risk
and prove security in real time.
For a limited time, our audience can get $1,000 off Vanta
at vanta.com slash explained.
That's V-A-N-T-A dot com slash explained
for $1,000 off.
You're listening to Today Explained. Noel King here with Reed Albergotti, technology editor at Semafor.
Reed, the reaction on Monday to DeepSeek was huge.
The markets are swinging around.
People are yelling about Sputnik.
It's on everyone's homepage. What did you think about all that?
Well, I was really kind of slapping my forehead because I think it was a complete overreaction.
People knew that this company existed. And in fact, this whole idea of distilling these
larger models into smaller, more powerful ones that are more efficient, this is something
that had been going on really since ChatGPT came out.
The biggest takeaway for me is that the market really does not
understand the AI industry yet.
What's been going on with all of DeepSeek's Western competitors?
What have they been up to?
They're all investing massively in these huge data centers.
It's my honor to welcome three of the world's leading technology CEOs.
With hundreds of thousands of graphics processors, tens of billions of dollars.
In fact, you probably heard last week there was a deal announced for 500 billion dollars.
Stargate. So put that name down in your books.
With OpenAI and Oracle and MGX and SoftBank,
I mean, massive amounts of money.
I think this will be the most important project
of this era for AGI to get built here,
to create hundreds of thousands of jobs,
to create a new industry centered here.
And what that investment is for is running these models
because there is so much demand.
These companies really can't meet it right now.
And what we're also finding is that inference,
which is just the fancy term for running these models,
actually can now increase capability of the models a lot.
That wasn't the case before when ChatGPT first came out.
It was just you prompt ChatGPT, it comes back with an answer. Now you prompt
the most advanced of these models and they are doing a whole bunch of stuff in the
background. They're running over and over and over again. They're trying to find the
best answer and that is exponentially more expensive and that is just going to continue
and this new R1 model that DeepSeek came out with,
it's an advance, but it is not nearly a big enough
breakthrough to sort of negate those market dynamics.
Can you explain why?
Because yeah, I saw that as well.
DeepSeek did it on the cheap.
All of this money and energy and investment
is for naught because they came up with the little AI that could
and it didn't even cost that much.
Yeah, I mean, they showed that you can do
some of these types of queries at a lower cost,
but it's just not nearly low enough.
You might've seen Microsoft CEO Satya Nadella
talking about Jevons paradox.
Jevons paradox strikes again.
You know, basically that's this, as the technology becomes more efficient and the cost declines,
the paradox is that you would think, well, okay, that just means that it just gets cheaper
and these companies are just not going to make as much money on it.
But actually what happens is it becomes more useful and people want to use it more.
And then there was another wrinkle that appeared a few hours before we're speaking on Wednesday.
There is a suggestion that DeepSeek may have borrowed from OpenAI or stolen from OpenAI.
What is the allegation? What are people seeing and saying?
You know, stolen, I mean, that's a very strong word. We saw David Sacks, who's the incoming AI czar,
sort of accuse DeepSeek of stealing from OpenAI.
And there's substantial evidence that what DeepSeek did here
is they distilled the knowledge out of OpenAI's models.
And I don't think OpenAI is very happy about this.
So you need data to train these AI models. But what you can actually do is you can use the models
themselves to create a very, very specialized kind of data. It's really synthetic data because it's
being generated by an AI model. But you can create exactly the kind of data that you want,
and you can check that over with other AI models.
And what you end up with is,
it's how you make these models much more efficient.
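A toy sketch of that idea, under heavy assumptions: here the "teacher" is just a fixed formula standing in for a large model, the "student" is a two-parameter line, and every name below is hypothetical. It only illustrates the shape of the process described above (query a bigger model, collect its answers as synthetic data, train a smaller model on them), not how DeepSeek or anyone else actually did it.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large model we can only query."""
    return 3.0 * x + 2.0


def make_synthetic_data(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Pick the inputs we care about and label them with the teacher's answers."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in inputs]


def fit_student(data: list[tuple[float, float]]) -> tuple[float, float]:
    """Train the small 'student' (y = w*x + b) by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# The student never sees the teacher's internals, only its outputs.
w, b = fit_student(make_synthetic_data(200))
print(f"student learned w={w:.3f}, b={b:.3f}")
```

The student ends up reproducing the teacher's behavior on the chosen inputs while being far cheaper to run, which is the efficiency gain being described.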
So this is also not surprising,
because that's how all of these models work.
I mean, we've seen lots of companies do this.
So I just, again, the process of distillation
makes total sense.
Whether or not it's stealing, I think that's something that's a gray area in the AI industry
that we really haven't ironed out yet.
I think it's such a new thing that we'll have to sort of come up with the norms and the
rules and regulations, maybe even copyright law around this.
I want to ask you about Nvidia, the company that makes the chips that AI requires.
Nvidia is now basically a household name.
It takes up a big part of the stock market.
And so when Nvidia stock is riding high, so is my 401k.
And when it's tanking, good Lord, I'm going to retire under a bridge.
And on Monday, the bridge was looking like a real possibility.
What exactly happened on Monday with Nvidia
and why did it seem to get hit so hard by this news?
Yeah, well, Nvidia makes these graphics processing units
that are the most powerful, the most advanced in the world.
And they're very expensive.
I mean, the older models, the H100s,
which were state of the art,
they cost about $40,000 a piece.
And these data centers have about 100,000 of those.
So, you know, you do the math there.
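A quick sketch of that math, taking the two spoken figures at face value (about $40,000 per H100 and about 100,000 of them per data center). Both numbers are rough estimates from the conversation, the function name is made up for illustration, and the result covers the GPUs alone, not power, networking, or buildings.

```python
# Rough figures quoted in the conversation; treat both as approximations.
PRICE_PER_H100_USD = 40_000
GPUS_PER_DATA_CENTER = 100_000


def gpu_hardware_cost(price_per_gpu: int, gpu_count: int) -> int:
    """Total spend on GPUs alone for one data center, in US dollars."""
    return price_per_gpu * gpu_count


total = gpu_hardware_cost(PRICE_PER_H100_USD, GPUS_PER_DATA_CENTER)
print(f"${total:,}")  # prints $4,000,000,000
```

That is roughly $4 billion in chips for a single data center of that size.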
NVIDIA is selling a ton of these chips.
Really, they can't sell enough.
There's way more demand than they can even produce.
And it's all because these models take a lot of energy to run.
And so if you can have a more efficient one
that doesn't require these powerful GPUs,
then maybe you don't need to spend $40,000 on a GPU now with OpenAI.
But again, that's not really what happened here.
What happened here is there's a bit of an advance in how efficient these models can
get, but in order to get the most use out of them, you need to run a lot of inference
on those.
You're still going to need really powerful GPUs.
And as there are more advances
in the pre-training part of the models,
they'll get even bigger and more powerful.
So, NVIDIA is not going anywhere.
I mean, certainly they have competition.
There are chip makers that want to build
more efficient inference chips.
There are people who want to get rid of Nvidia's advantage, which is its CUDA software that
the whole AI industry basically runs on right now.
It creates a big moat for them.
Those are the things that I think are the risks for Nvidia, not a company building a
more efficient open source AI model.
I think there was a reason that so many watchers and analysts and reporters framed this as China
catches up to the US and that is because China and the United States are in a quiet war,
cold war, existential struggle.
We compete with the Chinese and it raises some questions, does it not, about how worried the US should be that China
beats us in the artificial intelligence competition?
Yeah.
And this is where, you know, I think there really are national security concerns about
China and AI.
And you know, if China does win the AI war, the AI race, let's say,
it will probably give them a military advantage.
I mean, this is all, this is far into the future.
There's a lot of debate about this, right?
But I mean, I think the conventional wisdom is
if you win the AI race and you get there first to AGI
or superintelligence or whatever you want to call it,
it becomes a military tool
very quickly.
And I think the US, that's the whole reason the US has put so much energy into figuring
out how to curb the exports of the most powerful AI chips to China.
They don't want to see China be able to sort of control its own destiny when it comes to
AI.
After the events of this week, Reed,
is there a sense of renewed competitive energy?
Everybody needs to now go back and work harder, faster, smarter for less money.
Yeah. I think Sam Altman said that it was invigorating on X.
We will obviously deliver much better models.
And also, it's legit invigorating
to have a new competitor.
This is how research works.
Somebody comes out with a new idea,
and it inspires other people both creatively and also
competitively.
This is a dynamic that we've seen for the past couple of years in AI, or even really
more than that.
This is why so many of these tech companies have been publishing their research instead
of keeping it a trade secret for so long, because the genius researchers who write these
papers, they want to present them.
They want bragging rights at the NeurIPS conference. That's the big
AI model conference every year. They want to get pats on their back from their co-workers.
And I think there's actually probably a lot of mutual respect between the AI researchers at OpenAI
and Anthropic and the ones at DeepSeek. In that world, I think it's possible to kind of
put aside all the geopolitics and
just say, hey, nice job, you've created a really interesting model and we're going to
learn from it and try to do better.
I think the other way to look at it is, look, if the US doesn't win the race to AGI, then
what you could see is a Chinese military advantage that leads to something like an invasion of
Taiwan and maybe potentially a hot war between these two superpowers.
And that would be very, very bad. I think people who, you know, the most fervent China hawks, what they really want is a US
military advantage that is so big that, you know, there just will be no war.
And I think if you look at it through that lens, then yes, I mean, this AI race is very,
very consequential geopolitically and really there are dire consequences if the right outcome isn't achieved.
Semafor's Reed Albergotti, thanks to him.
Miles Bryan and Victoria Chamberlin produced today's show
with an assist from Amanda Lewellyn.
Amina Al-Sadi is our editor.
Andrea Kristinsdottir and Rob Byers engineered.
Laura Bullard checks the facts. I'm Noel King.
Team Sink.
It's Today Explained.