Front Burner - DeepSeek and China’s AI power move
Episode Date: January 30, 2025
A small Chinese tech company called DeepSeek has upended the world of AI. DeepSeek recently released a large language model that rivals ChatGPT, called R1, and it shot almost immediately to #1 on the app charts. The interesting thing about it is that the company built their model really cheap, and that has called into question this narrative that you need an endless supply of chips and data centres and money to develop AI. On today's show we're speaking to WIRED's senior tech writer Zeyi Yang about the deepening AI cold war between the US and China and the lingering questions about where AI is headed and what it's good for.
For transcripts of Front Burner, please visit: https://www.cbc.ca/radio/frontburner/transcripts
Transcript
A prime minister resigns, a president returns, a whole world of changes to navigate and understand.
If you're someone trying to sort through what's real and what's relevant from a Canadian perspective,
we're here for you every night of the week.
Your World Tonight is more than just a recap of daily news.
Our award-winning team goes deeper on stories that speak to the moment.
The full picture, context and analysis, all in about 25 minutes.
I'm Tom Harrington.
Find and follow your world tonight from CBC News, wherever you get your podcasts.
This is a CBC Podcast.
Hey, I'm Jamie Poisson, and today on the show, we're going to talk about this small
Chinese company called DeepSeek that has really upended the world of AI.
If you haven't been following, DeepSeek recently released this large language model that rivals ChatGPT.
It shot up to number one on the app charts.
The interesting thing about it is that the company built it
really cheap.
And that has called into question this narrative
that you need an endless supply of chips and data centers
and money to develop AI.
After people started noticing, the biggest market loss in history ensued:
chipmaker Nvidia saw its stock plummet 17% on Monday. Microsoft and Google
also took big hits. They've since recovered somewhat. Anyhow, wrapped up in all of this
is an AI cold war between the US and China. And as always, we still have lots of questions
about where AI is headed and what it's actually good for. I'm going to get into all of that
today with Wired senior tech writer, Zeyi Yang.
Zeyi, thanks so much for coming on to the show.
Hi, glad to be here.
It's great to have you.
So let's start with DeepSeek.
What can you tell me about this company and where it comes from?
I understand it started as like an offshoot of a Chinese hedge fund.
Yeah, that is exactly correct.
So there has been this Chinese quant hedge fund called High-Flyer since, I believe, it was founded in 2015. It was actually one of the best performing quantitative hedge funds in China.
And one thing special about them is that they have been using machine learning to come up
with their trading strategy.
But the founder of this company, Liang Wenfeng, has a master's degree in computer science. And as I understand it, he himself is just very obsessed with AI.
So in 2023, he decided that, well, we already have a lot of experience in AI, but we want to establish a new entity called DeepSeek to really focus on researching large language models and all the kinds of AI technologies out there. So DeepSeek has existed for less than two years so far, and it has released quite a few models open source to the public. And the latest one, R1, released in January, was the one that really set off the discussions.
Yeah, this is the one that came out just very recently, you said, and just tell me what
it is and what it does and then why it is freaking everybody out.
Sure.
What it did with R1 is that it really focused
on reasoning tasks.
And people have been familiar with this because OpenAI released their o1 a little bit earlier last year.
Is that like ChatGPT?
It is a version of ChatGPT, but this version really focuses on showing you the train of thought that goes on within the model for it to come to a conclusion.
So, for example, you give it a math problem, and it's going to tell you every step it takes to get to the answer, which a lot of people really appreciate.
And what we're seeing with R1 is that it kind of replicated that train-of-thought processing. And if you ask R1 a question now, it will also show you the step-by-step calculation or reasoning that gets to the answer.
Could you give me an example of what something like that might
look like if I asked it a question?
Sure. Yeah, I've been poking at the model just now. For
example, if you ask it, please give me a summary of what's
the most important historical events that happened in the 20th century, right?
It will actually give you this very human reasoning process.
It would tell you that, okay, I should start thinking about the first decade of the 20th century, and then the second decade of the 20th century. And then after going through all of that, it will say, oh, maybe I have missed something because I focused on things like wars. What about culture? What about entertainment, other aspects?
So it kind of mimics the process that goes on inside a human brain answering this complex question.
I think before OpenAI's o1, most of the models would hide these processes. Or maybe they didn't go through this process at all.
But then US models started to realize that people actually appreciate seeing what's going on within the models, so they started to show that transparently.
And R1 is an example of doing that really well.
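(For the curious, this is roughly what asking for that visible train of thought looks like in code. A minimal sketch, assuming DeepSeek's OpenAI-compatible API, the "deepseek-reasoner" model name, and the reasoning_content response field described in its public docs; none of these specifics come up in the episode.)

```python
# Minimal sketch: asking a reasoning model for its step-by-step trace.
# The endpoint, model name, and reasoning_content field are assumptions
# based on DeepSeek's public API docs, not details from this episode.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed DeepSeek endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1 served behind the API (assumed name)
    messages=[{
        "role": "user",
        "content": "Summarize the most important events of the 20th century.",
    }],
)

message = response.choices[0].message
print("Train of thought:\n", message.reasoning_content)  # the visible reasoning
print("\nFinal answer:\n", message.content)
```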
As I mentioned in the intro, this shot up to number one on the app charts and people really started to pay attention.
And then we saw this massive tanking of Nvidia.
You're seeing markets being absolutely eviscerated
on news from a company most people have never even heard of.
Let's talk about DeepSeek because it is mind-blowing
and it is shaking this entire industry to its core.
Google and Microsoft take a hit as well.
And like, why? So explain to me that freakout.
Sure. I think the important background here is that, up until DeepSeek R1 was released, most people had believed that there's one sure pathway
for AI to become more powerful and capable.
And that is to buy more computer chips,
to train the AI model on more chips for a longer period
of time, and just through this
scaling effect the AI model is just going to be better and better. The good thing about that kind
of belief is that it's a very reliable pathway. It's like if you just follow the instructions
you're going to get a better AI over time. But what happened with DeepSeek is that, when DeepSeek released its R1 model and people started to realize that, oh, it actually performs really well, it had also released quite a few academic papers explaining how it got there. And that includes explaining how it trained its model on a surprisingly small budget.
And at that point, people started to realize, actually,
there isn't just this one way
of getting a more powerful AI model.
If you focus on, for example,
making the training process more efficient
or some innovative architecture within the model,
then maybe you don't need that many chips
or that long a training time.
So this kind of questions the established belief in the AI industry.
And I think that's what caused a
really large shock in the market.
Right. And is it fair for me to say that they did it a lot cheaper? Like, we're talking about six million, so they say, versus hundreds of millions of dollars,
right? And these chips that we're talking about, they're produced by Nvidia and then
these enormous data centers owned by Microsoft and Google. So that's why those companies
started to see that sell off, right?
That's true. I do think there are some nuances with that 6 million number. A lot of people
just take it for granted. But yeah, the fact is that people realize that you can train the models for a lot
cheaper.
Okay. So does this ultimately upend this billion dollar industry that has been
created around developing AI?
I wouldn't say it completely disrupted it, because the first thing to know is that, well, DeepSeek still had to use these NVIDIA chips themselves to train their R1 model.
So it's not like, oh, with some new innovative techniques,
you can just get rid of the NVIDIA chips completely.
That's not true.
And the other thing is that I've started to see more people reason that what DeepSeek has done is really to propose a more efficient way to use whatever chips you have.
And at that point, if you have a thousand chips, you're still going to have a model that's better
than someone with a hundred chips. So it's not to say that the scaling effect doesn't exist anymore.
It's more that it's not that simple, and we can put more resources into other things to change the efficiency of training. But still, the number of chips you have, or the resources, the money you have, is a very important factor in making a good model.
Hey guys, we're going to be back to the show in a second, but if I could just ask you a favor: if you could hit that follow button.
If you could give us a follow.
I know I've been asking you that a lot lately, but it's super helpful for us and hopefully
it helps you too.
Okay, back in a second.
I'm Natalia Melman Petruzzella and from the BBC, this is Extreme Peak Danger.
The most beautiful mountain in the world.
If you die on the mountain, you stay on the mountain.
This is the story of what happened when 11 climbers died on one of the world's
deadliest mountains, K2, and of the risks it will take to feel truly alive.
If I tell all the details, you won't believe it anymore.
Extreme Peak Danger.
Listen wherever you get your podcasts.
DeepSeek released this latest model as open source, right?
Just explain to me what that is and why it was significant.
Yeah, so I think there has long been this kind of
divergence within the AI industry between closed source and open source.
With closed source, we're mostly looking at companies like OpenAI, Microsoft and Google,
because they invested a lot of resources and talents into making their own models.
They're not going to open the model up to everyone to use it.
They're charging people a fee to use their most advanced models.
On the other hand, we're seeing companies like Meta or
quite a few Chinese companies, including DeepSeek, or Chinese tech giants like Alibaba,
who have chosen this more open source route.
When they come up with a really powerful model,
they will decide to put that model online for anyone to download and test it
and run it by themselves.
In this way, they're giving up a lot of control.
Like they cannot charge you anymore, right?
For using that model.
Yeah, why would they do that?
The reason why they want to do that is that,
first of all, they started a little bit late.
Like, OpenAI was already the leader.
If they want to catch up,
they need to find out some other ways.
And providing your models for free is a really good way to attract more attention from the industry,
and also to get more users to just try to use it,
to test it, and to maybe collaborate with you.
Those kinds of contributions can really help the company itself catch up faster.
That is one way for a lot of companies to decide that,
okay, we're not going to catch up with OpenAI through the traditional commercial way.
So we're going to try to find out another way to have more people just willingly collaborate
with us and help us grow.
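(Concretely, "download it and run it yourself" looks something like the sketch below, using the Hugging Face transformers library. The checkpoint name is an assumption, one of the distilled R1 models DeepSeek published, and isn't mentioned in the episode.)

```python
# Minimal sketch: downloading and running an open-weights model locally with
# Hugging Face transformers. The checkpoint name is an assumption (one of the
# distilled R1 models DeepSeek published); any open model on the Hub works alike.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)   # downloads once, then cached
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain, step by step, why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt")

# Generation happens entirely on your own machine: no fee, no vendor in the loop.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```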
Are there concerns that people could use that, you know, for evil, like, right?
Are there security concerns with that?
Yeah, I think the way that these models have been shared under certain open source license
is that if you are someone who downloaded the model and used it for malicious purposes,
you are the one responsible for it. The companies themselves, they're giving it out there,
but they're also making sure that they're not the one who will be legally responsible for that.
And I think there is kind of an established set of open source community rules, so people won't really go after the company for releasing a model when other people do something wrong with it.
In that way, the companies are not completely responsible for it.
I do think there's another thing to notice here,
which is that there's a separate set of laws in China to regulate AI companies, and they do put these AI companies under very strict regulations.
However, they also sort of carve out some space there.
They're like, if you're releasing a model
for scientific research,
or you're releasing it not to a general public,
but only to people who are very savvy enough in technology,
then you're not going to subject to us treating it scrutiny
or legal responsibility in there.
I guess the question I have is,
could it be bad for humanity?
Well, that's a good question.
Is it?
I feel like...
I don't know.
Yeah.
I think that is actually a very hard question to answer
because a lot of times with this open source model,
what's happening is that there are already a lot of models out there.
Even if this company doesn't release another one,
if you are someone with malicious intent,
you can probably find another one for that.
So I guess the marginal responsibility there is negligible.
I don't know.
But that's what I'm thinking right now.
Yeah.
I also wanted to ask you because you were talking about the Chinese government and the
guardrails that they put on these companies.
I've also heard people concerned about this model in particular.
For example, I saw that if you asked about Tiananmen Square,
it just won't reply to you.
Or I think somebody tried to ask it about the Uyghurs.
And the response was that China is a multi-ethnic society
where its entire population
has equal rights, right?
Obviously that's not true.
So just, are there legitimate concerns about that?
Definitely, I think this applies to every single
Chinese model out there, which is that they are subject
to a set of different content moderation rules that are very specific to the Chinese context.
And a lot of times we're talking about how these models cannot talk about politically sensitive situations. They cannot talk about territorial disputes in China, or political movements, protest movements, all things like that.
So personally, I was not surprised at all
that DeepSeek also has some kind of censorship mechanism when users try to prompt it with these kinds of questions. I do think there's a difference here, though, because DeepSeek is open source: there's a lot more possibility to tweak the model and try to get it to censor less. So I think one thing for the open source AI community to figure out is, there is this model out there, and we know it has this drawback of censoring political information. Can we actually modify it so it doesn't do that anymore?
I think it's a tall task, but it's still worth trying.
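(What "tweaking" an open-weights model typically involves is lightweight fine-tuning. The sketch below is hypothetical: the common community recipe of LoRA adapters via the peft library, not anything DeepSeek or the open source community specifically ships. The checkpoint name and training data are placeholders.)

```python
# Hypothetical sketch: modifying an open-weights model's behavior with LoRA
# fine-tuning (peft). Checkpoint name and data are placeholders/assumptions.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small trainable adapter matrices to the attention layers;
# the base weights stay frozen, so training is cheap.
model = get_peft_model(model, LoraConfig(
    task_type="CAUSAL_LM", r=8, target_modules=["q_proj", "v_proj"],
))

# Toy data: prompts the base model deflects, paired with direct factual answers.
examples = [{"text": "Q: <a filtered question> A: <a direct factual answer>"}]

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=128)
    enc["labels"] = enc["input_ids"].copy()  # causal LM objective
    return enc

train_data = Dataset.from_list(examples).map(tokenize, batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="tweaked-model",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=train_data,
).train()
```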
I just want to situate this conversation in the broader context of this larger global tech war, right?
Largely between the United States and China.
Trump seemed, I think it's fair to say, a bit
blindsided by the rise of DeepSeek. The release of DeepSeek AI from a Chinese
company should be a wake-up call for our industries that we need to be laser
focused on competing to win. You know, just for our listeners, could you just
briefly talk to me a bit about this so-called cold war, as I'd call it, between the US and China when it comes to developing AI, and what we've seen in recent years?
Yeah, I think this cold war dynamic has really defined US-China relations in the past, I would say, two to three years.
We're talking about how both countries feel like they have to come out ahead of the other in terms of key technologies. And AI is definitely the one that both countries are paying the most attention to right now.
And specifically, we're talking about how the US government and US companies really want to make sure
that the American AI models are performing much better
than the Chinese ones.
And so far, up until DeepSeek, I think most people agreed that that's true. Like, Chinese models are just not as good as the Western ones.
And usually, when a Western model first introduces a feature, it will take a few months or even a year for its Chinese counterpart to come up with a similar one. And I think that kind of status quo has made a lot of people comfortable, feeling that the US is still leading in this field.
What's happening with DeepSeek is that DeepSeek R1 really rivals OpenAI's o1 in a lot of the key reasoning and math benchmarks.
And I think a lot of people are just kind of scared and surprised by that.
And that just changes their calculation of how long it will take China to catch up.
Right.
I've seen people make the argument that the breakthrough DeepSeek made was actually an unintended consequence of the trade war. That essentially, DeepSeek was forced to innovate because the US had banned the export of high-end
processing chips, right?
So they couldn't get them, so they had to kind of work around it.
What do you make of that?
I think that's a reasonable explanation of what happened.
Because I think this really happened under the Biden administration, when they started to put these chip export controls on China. So we're talking about restricting the most advanced GPUs from being exported to China, where they would help Chinese companies grow their AI models.
And DeepSeek's innovation is really inspired by that, because DeepSeek as a company knows: I'm not going to get an infinite supply of GPUs like OpenAI has. So what I really need to do is focus on, can I circumvent the obstacle by coming up with software-focused innovations, or with more scientific research that helps my model grow? So it's kind of like an innovation born out of necessity.
That's always been the criticism of the export controls in recent years, because people are saying that, yes, you are cutting off a very important pathway for Chinese advances in AI. But there are other pathways out there, and you're just encouraging them to figure out these other pathways. And I do think DeepSeek's success in some way proved that it's possible to find those alternative pathways.
But we're still waiting to figure out how effective these alternative pathways are.
I also wanted to ask you about Stargate, right? This is this initiative that Trump recently
announced alongside all of these tech CEOs, including OpenAI's Sam Altman.
Together these world-leading technology giants are announcing the formation of Stargate. So put
that name down in your books because I think you're gonna hear a lot about it in the future. A new
American company that will invest $500 billion at least in AI infrastructure in the United States and
very quickly moving very rapidly. The idea being that they want to ramp it up.
And do you think that project Stargate will go ahead as planned? Is it going to change now?
Well, I think even before DeepSeek became the talk of the town, there had already been a lot of suspicion about Stargate, right?
And I think it goes back to this idea
that Stargate is the example, the culmination, of this belief
that the more chips you have,
the more data centers you have,
the more powerful your AI will be.
And up until DeepSeek, there hadn't really been a narrative challenging that.
So that's why I think there is a lot of genuine support for a project like Stargate. It really concentrates the resources, the money, and the talent towards getting more chips and building more powerful AI.
At the same time, I think there is a lot of criticism of that too, because when you are building data centers, when you are piling up these compute resources, it also has environmental effects. It also endangers other research that could use those resources, right?
So I think what DeepSeek has provided at this moment is evidence for these other arguments: well, yes, maybe we can achieve AI supremacy through your Stargate way, but you're asking us to ignore all of these counter-effects out there.
But maybe there's another way.
Maybe if we follow the more model efficiency path,
we don't need to spend so much money and so much, I guess,
resources on a Stargate project like this.
And I think that will prompt more counterarguments out there. Which is what prompts me to believe that it will be really hard for OpenAI to carry out Stargate as it wants to. But still, they have a lot of political support, so maybe they can pull it off.
Okay, and just since I have you here, I wonder if I could end this conversation by asking you a bigger picture question. Because it seems to me, at least, that a Chinese AI company has done something really interesting in terms of process and cost.
But there are these other big questions lingering about whether AI can surpass its current use case,
right? Or current utility. In other words, can it get better? Or I guess worse for people who think
that we're headed towards a dystopia. And does this get us closer to that answer?
So, the founder of DeepSeek, Liang Wenfeng: he's basically the same age as Sam Altman.
And it also strikes me that he probably has very similar beliefs to Sam Altman.
He's a firm believer in AGI, which means that AI at some point will just completely surpass humans in terms of reasoning, in all kinds of abilities.
And the reason why he has dedicated so many resources and built up a team to build DeepSeek
is that he believes that AGI is achievable.
So in some way, the success of DeepSeq
proves that there's still this very strong belief
in the AI industry that we should build
the most powerful AI ever,
and it will solve a lot of the problems in this society. Even though Deep Sea
provides a more efficient way to get there, I don't think it kind of changes the direction that it's
heading to. I feel like if we want more kind of resistance to the narrative or if we want more
kind of other alternatives out there, it might not come from DeepSeek. Okay. Zui, this was really interesting. Thank you so
much for this.
Of course, glad to help.
All right, so before we go today, Trump's Commerce Secretary
pick was asked during a Senate committee hearing on Wednesday if he would maintain US AI leadership.
Howard Lutnick responded by saying that DeepSeek had stolen US technology to create a dirt cheap
model and said that he would impose new restrictions on China. "They stole things, they broke in, they've taken our IP," Lutnick said of China, adding, "I'm going to be rigorous in our pursuit of restrictions and enforcing those restrictions to keep us in the lead, because we must stay in the lead."
All right. That is all for today. I'm Jamie Poisson. Thanks so much for listening.
Talk to you tomorrow. For more CBC podcasts, go to cbc.ca slash podcasts.