Hard Fork - Your Guide to the DeepSeek Freakout: an Emergency Pod
Episode Date: January 28, 2025
A Chinese firm called DeepSeek managed to upend global markets at the start of the week, to drag down chipmaker Nvidia — and to surge to No. 1 in the iPhone app store, all in basically no time at all. Today, in this bonus episode, we talk through the news behind the freakout. How is a new A.I. model from one Chinese A.I. company making such a big splash? And what does it mean for the U.S. artificial intelligence industry? We want to hear from you. Email us at hardfork@nytimes.com. Find "Hard Fork" on YouTube and TikTok. Unlock full access to New York Times podcasts and explore everything from politics to pop culture. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify.
Transcript
Well, Casey, the last time we recorded an emergency podcast, you were at Gate E8 of
the San Francisco airport and we were talking about OpenAI and how Sam Altman had just been
fired.
Are you at the airport today?
And if not, would you like me to mail you an Auntie Anne's pretzel so you feel more comfortable?
Yeah, that'd be, no, all things being equal, Kevin, it's actually much more comfortable
to record here in my home studio and not have to compete
with the PA system announcing flights to Houston.
Casey, we are here today to talk about a little company called DeepSeek, which probably most
people had not heard of, but that is causing a major series of events in the U.S. stock
market and around the U.S. tech industry this week.
That's right. By now our listeners have probably seen that the stock market dipped on Monday and
that some companies whose fortunes are closely tied to AI dipped quite dramatically, but they
also might have just noticed it in the App Store where DeepSeek has hit number one this week, which is a rarity for a Chinese consumer app to do in the United States.
So, yes, suddenly everywhere you look,
there are signs of this DeepSeek affecting the world.
And, you know, we should say to maybe talk directly to the things
some listeners may be thinking about, like,
why we are interrupting our normal production schedule
to do a special emergency episode about DeepSeek.
Like it is not unusual for people in the AI world
to start freaking out about some new development
or breakthrough or some new model that was released.
But I believe that this is the real deal.
I think this is a big moment
in the history of AI development.
And it is really taking a toll on stock markets in ways that I think are really interesting.
I mean, you said dip, but Nvidia stock,
one of the highest-performing stocks on the market
over the past few years, and certainly the one
that is most closely correlated with people's feelings
about AI, is down about 18% today.
That represents hundreds of billions of dollars
wiped off the market cap of just one company
by this announcement from DeepSeek.
So I think this is a broader story
than just the stock market.
I think it has tons of implications
for other companies developing AI
and also for concerns that a lot of people
working on AI safety have about how this technology could get out of hand.
Yeah, I'm excited to get into it too, but I will signal that I think that there are
also some reasons not to freak out.
And so I'm going to be trying to bring some of those to the discussion.
But today, Kevin, I think we just really want to do three things.
One, we want to tell you what DeepSeek is.
Two, we want to give you some insight into why people think
this is such a big deal, and then three,
I think we want to debate a little bit back and forth
just how big a deal this really is.
So yeah, let's get into it.
All right, so let's start with what DeepSeek is, Kevin.
We have mentioned it on the show before,
but tell us a little bit about this new model
and why it has taken the world by storm.
Well let's talk first about DeepSeek itself.
You may remember if you've listened to our episode a couple weeks ago on this, that DeepSeek
is a Chinese AI company.
It is about a year old and it spun out of a hedge fund called High Flyer.
And it was something that I think outside of China,
most people were not paying attention to until late last year
when they released something called V3.
That was an AI model that was, they said, competitive
with some of the leading AI models created by American AI companies.
And it really caught people's attention, not just because it came out of this little-known
Chinese AI startup, but because of what DeepSeek said about how it was trained and how much
it cost to train.
Yeah.
So tell us about what was so interesting about how they did it and what it cost.
Yeah.
So the first interesting thing about DeepSeek
that caught people's attention was that they had managed
to make a good AI model at all from China.
Because for several years now, the availability
of the best and most powerful AI chips has been limited
in China by U.S. export controls.
You are not allowed, if you are Nvidia
or another American company,
to export your most powerful AI chips to China. So DeepSeek came out with this paper and they said,
well, we actually didn't use your fancy AI chips. We used a kind of second-rate AI chip that was
artificially limited so that it could be exported to China. We have a bunch of those, and using just those kinds of lesser AI chips, we were able to get
a model to perform as well as you American tech companies with all your fancy H100s.
And then the second thing that really caught people's attention was about the cost.
DeepSeek claimed that they had spent just $5.5 million training V3,
and I think there's some reasons to take that number with a grain of salt.
But just in terms of the raw cost of the training run for that model,
$5.5 million is peanuts relative to what American AI companies spend
training their leading models, you know,
something on the order of a hundred times cheaper than what something like an
OpenAI model of equivalent performance would cost to train.
Right, and this comes
against a backdrop of all the US tech giants saying we're gonna spend tens of
billions of dollars this year to increase our capacity and data centers
and the amount of compute power that we'll have.
So this really stood in stark contrast to that.
So that tells us a little bit about what V3 is.
Sort of the training and the costs were maybe more interesting than the model itself, which
is just kind of like a chatbot like a lot of us have already used.
But V3 came out around Christmas, Kevin.
So why is the market reacting so strongly now?
So a couple of things happened in the past week or so that have led to the freakout that we're seeing now.
The first is that last week, DeepSeek released another model, R1, which was its attempt at a so-called
reasoning model. Basically, V3, the last model, was kind of similar to things like
Claude or Gemini. It was sort of a basic language model. But R1 was more like OpenAI's
o1 and o3, which are its newest reasoning models. So that happened early last week. And then a
little later, a couple of days later, DeepSeek did something else, which was that it released an app where
anyone could download DeepSeek and go use its model in a very easy way.
And this is when people really started to go from being interested and fascinated by
DeepSeek to really panicking about it.
Because all of a sudden, millions of Americans were downloading this app, using DeepSeek's models, and realizing, oh wait, this is as good as or better
than ChatGPT, it's free, it doesn't have any ads, it seems to be quite smart.
And it does something that OpenAI's models don't do, which is it shows you the
internal thought process that it is going through as it is producing these answers.
That is something that OpenAI's models do not show the user,
but DeepSeek's models do,
and I think people found that really compelling.
Yes, and that last point that you mentioned,
I think is really important because I suspect actually
all the AI companies are going to copy this now,
because the process of using a chatbot today
is you ask a question, I've likened it before
to like throwing a penny in a fountain, right?
You're just sort of making a wish,
you see what you get back.
What's really interesting about the DeepSeek thing
is that as it's answering your question,
you're seeing how the computer understood your query.
And so if you wanna ask a follow-up question,
you now have a much better sense
of how the computer understood you.
And this actually does seem to be
a sort of
conceptual breakthrough in product design, you know, just as much as the underlying science.
All right, so that gives us a sense of what DeepSeek is and what its latest models are. Let's talk about
why everyone is freaking out and maybe more specifically take a look at who is freaking out.
So as we mentioned at the top, one of the big ways people are noticing this
is through the decline in the stock market.
Kevin, give us a sense of the industry reaction
to what the DeepSeek models might mean.
Yeah, so I would say the people who are freaking out the most
are investors in the biggest American AI companies,
as evidenced by all of the tech stocks selling off today.
And I think you could categorize that as a fear of declining margins and commoditization.
Now that sounds very boring, but basically what they're saying is, look, if a Chinese
AI company that no one had ever heard of until a few weeks ago can come along and, for a fraction of our costs, develop a model that is as good as or better than the leading
models on the market with substandard chips, by the way, then the barrier to entry in this
market is just not nearly as high as we thought it was.
One of the fundamental assumptions over the past few years when it came to AI
was that bigger was better, right?
That in order to build the most powerful models,
you needed billions of dollars,
maybe tens or hundreds of billions of dollars
and huge data centers and all of the leading chips.
And that assumption may no longer be true.
If what DeepSeek claims checks out,
then it may mean that it only costs single digit millions
or double digit millions of dollars
to build a leading model,
which would just radically shift
what these companies are able to charge for their models,
as well as the number of competitors in the market.
Yes, I definitely agree,
it changes what companies might be able to charge,
but I would also
just note that nothing that DeepSeek did is possible without American innovation.
One of the reasons that it was cheap for them is that it was expensive for everyone else
and other companies did spend hundreds of billions of dollars figuring this out.
So worth saying.
All right, let's talk about a second group of people who have been really rattled by
this series of announcements, Kevin, and that is
folks who are paying attention to geopolitics.
Yeah, so a lot of people who worry about China
in general are worried about this DeepSeek announcement because DeepSeek is obviously
a Chinese company. If you're a person who's sort of worried about Chinese tech dominance or the
possibility that Chinese firms could eventually get to something like AGI first, I think you are especially worried about what DeepSeek is doing. And I think
we should also say that the models themselves are recognizably Chinese. So people over the
weekend, I saw testing out various queries on DeepSeek R1, including things like, you
know, tell me about what happened at Tiananmen Square.
And the model just refuses to answer them.
And so I think there is a worry that if Chinese companies do take the lead in AI,
then
Chinese values and censorship laws may become embedded into this technology in a way that is very hard to extract.
Yeah, I think that's true.
I also just always urge caution when people try to use the existence of China as a reason
to dramatically accelerate the AI race.
A lot of those people have made investments that will pay off handsomely if we find ourselves
in some sort of protracted and awful conflict with China.
So whenever anyone starts talking about China in the context of AI, my eyebrows arch up a little bit.
All right, now Kevin, there is
one more group of folks that I think is quite justly nervous about what they're
seeing out there with DeepSeek, and who is that?
So the third group of people that I
would say are freaking out about DeepSeek are AI safety experts, people who worry
about the growing capabilities of AI systems and the
potential that they could very soon achieve something like general intelligence or possibly
super intelligence and that that could end badly for all of humanity.
And the reason that they're spooked about DeepSeek is that this technology is open source.
DeepSeek released R1 to the public.
It's an open weights model, meaning that anyone can download it and run their own versions
of it or tweak it to suit their own purposes.
And that goes to one of the main fears that AI safety experts have been sounding the alarms
on for years, which is just that this technology, once it is invented, is very hard
to control.
It is not as easy as stopping something like nuclear weapons from proliferating.
And if future versions of this are quite dangerous, it suggests that it's going to be very hard
to keep that contained to one country or one set of companies.
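To make the open-weights point concrete: because DeepSeek publishes the model weights, running one of its smaller distilled R1 checkpoints locally takes only a few lines of Python. What follows is a minimal sketch assuming the Hugging Face transformers library; the exact checkpoint name is an assumption and may differ from what DeepSeek actually published.

```python
# Minimal sketch: load an openly published DeepSeek R1 distilled checkpoint and generate a reply.
# Assumes the Hugging Face `transformers` library; the repo name below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate; R1-style models print their reasoning
# inside <think>...</think> tags before the final answer.
messages = [{"role": "user", "content": "Explain the Jevons paradox in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The sketch is the proliferation worry in miniature: once the weights are on your machine, nothing stops you from fine-tuning them or stripping out whatever refusal behavior the original model had.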
Yeah.
I mean, say what you will about the American AI labs, but they do have safety
researchers.
They do at least have an ethos around how they're going to try to make these models
safe.
It's not clear to me that DeepSeek has a safety researcher.
Certainly they have not said anything about their approach to safety, right?
As far as we can tell, their approach is, yeah, let's just build AGI, give it to as many
people as possible, maybe for free, and see what happens.
And that is not a very safety forward way of thinking.
So Casey, that is a lot of information that we just dumped on our listeners.
But really what I want to know is like, are you freaked out about this?
Do you think that this is as big a deal as some people seem to think it is?
Mm, I think as I'm doing my reading
and having conversations with folks this morning,
my sense is I am freaking out a bit less
than some other folks that I'm talking to.
I think this is a big deal and merits discussion,
but I also think that people may be getting
a bit over their skis when it comes to thinking through the implications here.
So make that case, because all I'm seeing all over my timelines is people saying,
this is the Sputnik moment for AI. This is the biggest moment in AI since the release of ChatGPT.
Everyone needs to stop what they're doing and pay attention.
So what is the case that you are seeing out there
that people are hyperventilating a bit over nothing?
Sure, so let's take a few different points.
One reason why people are really nervous here
is that DeepSeek was able to train this model very cheaply.
And I wanna be clear,
this is a significant technical achievement.
At the same time, the cost of training and inference
has been falling rapidly in AI for a long time now.
Ethan Mollick, who we've had on the show before,
posted a chart on X that showed this decline.
And in some cases, for example,
running inference on a GPT-4 level model,
the cost of that has fallen a thousand fold
over the past couple of years.
So things have already been moving in this direction.
And I think most people who work in AI expected that it would continue to go there.
So that is the first point that I would make.
Got it.
And if you are Satya Nadella at Microsoft or Sam Altman at OpenAI or Sundar Pichai
at Google, are you worried that you are going to spend tens
or hundreds of billions of dollars building out
new data centers and filling them with all the fanciest GPUs
and that some Chinese startup is going to just take
everything that you do and copy it three months later
for pennies on the dollar?
So this is a great question, which leads me to a second reason why I think at least some
folks may be overreacting here.
And that is that in most cases, the money that is being spent to build out the data
centers that will handle these giant training runs can be repurposed.
The same servers and chips that you would use to do that can also be used to serve what is called
inference, so basically actually answering the questions. So as more and more people start to use AI,
it will be those giants that actually have the capacity to serve those queries.
They will be able to build businesses around that. And by the way,
that is another reason why I don't think
that DeepSeek is evidence that the export controls failed
because the folks over at DeepSeek would love to have
all of these chips, not just to do the big training runs,
but also that they could serve all of the demand
that they are currently generating, right?
So I think that's another important thing to keep in mind
as this discussion moves forward.
Yeah, that makes a lot of sense to me.
I do think the cost dynamics here are very important
because I talked to a person that I know
who works at one of these big companies
and he said that a lot of their customers
are already starting to ask,
well, could we shift over from using the OpenAI APIs
and their models to using DeepSeek
if it saves us 80% of our costs?
And so I think in the short term,
there is reason for the American AI companies to worry
because people want these things to be as cheap as possible.
Yeah, and let me just say, as somebody who spent $200
to upgrade to ChatGPT Pro last week so I could try Operator, I'm looking forward to
that price going down.
But that leads to, I think, maybe the third reason that I think people might be overreacting
a little bit here, which is that a lot of what we are seeing here is just essentially a fancy
ripping off of techniques that were pioneered here in the United States, right? It has long been the case that open source models
were just a little bit behind the models
that are made by the big labs, right?
You look at Meta's Llama models,
which until DeepSeek were seen as the best open-weights models
that were out there, they weren't as good as what OpenAI
or Google or others were doing, right?
Where I do think that this gets super interesting is that DeepSeek is showing us
open source can now catch up faster than it used to, right? That the labs used to have a little bit longer lead
but now people are just getting cleverer and cleverer about these techniques and so it is getting harder to build that defensible moat
because this is just one of those technologies
where once you figure out basically how people are doing it,
you could just get in there and do it too.
Yeah, now Casey, I'm curious what, if anything,
you are hearing from inside Meta specifically,
because I think this is one of the most fascinating angles.
Meta is a company that has spent billions of dollars
developing AI models and, unlike most of its competitors,
has chosen to release those models freely.
And part of what DeepSeek has shown is that you can take
a model like Llama 3 or Llama 4 and you can distill it.
You can make it smaller and cheaper.
You can do that without sacrificing
a lot of the performance.
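For readers who want the mechanics: distillation, in the generic sense Kevin is describing, means training a small "student" model to imitate a large "teacher" rather than learning from raw data alone. The snippet below is a textbook logit-matching version of that idea, not DeepSeek's or Meta's actual training code; in practice, large-model distillation is often done by fine-tuning the student on text the teacher generates, and the temperature and weighting here are hypothetical.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Textbook knowledge-distillation loss (illustrative; hyperparameters are hypothetical).

    Blends ordinary cross-entropy on the true labels with a KL term that pushes the
    student's temperature-softened distribution toward the teacher's.
    """
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)          # teacher's softened probabilities
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)  # student's softened log-probabilities
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)

    ce_term = F.cross_entropy(student_logits, labels)             # standard supervised loss

    return alpha * kd_term + (1 - alpha) * ce_term
```

The appeal is exactly what Kevin describes: the student inherits much of the teacher's behavior at a fraction of the parameter count and serving cost.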
And so there were some reports in recent days
that Meta is basically at DEFCON 1 right now,
that they are, The Information reported,
that they have four war rooms at Meta headquarters
devoted to trying to figure out how to respond
to the DeepSeek threat.
Yeah, and by the way, I hope they were the same war rooms
that Meta used to use to protect America from election interference. They say, hey, get out of here. We got something else
we got to figure out.
So are you hearing anything from Meta? Because I think that is the company
that I would say has the most to worry about when it comes to DeepSeek, because DeepSeek is doing
essentially what they do, but at a fraction of the cost.
Yeah, so I do not have my own original reporting
to share on this yet, but I do trust The Information
that they are freaking out.
And the reason is that Meta is supposed to be
the best company at ripping other people off.
And so when they find out that some Chinese Johnny-come-latelys
are going to be better than they are at ripping things off,
they're going to have something to say about it.
And so nothing could be more poetic now that, you know,
DeepSeek has ripped off all the American companies,
Meta is coming back and they say,
oh, you think you're good at ripping people off?
Just wait until we have plumbed the guts of V3 and R1.
We're going to be back on top sooner or later, Bucko.
Yes. Now, I want to ask you about one other reaction
that I saw on social media, which
was from Satya Nadella, the CEO of Microsoft. He went on his X account late last night
and posted the following: "Jevons paradox strikes again. As AI gets more efficient and accessible,
we will see its use skyrocket, turning into a commodity we just can't get enough of." And then he linked to a Wikipedia article about Jevons paradox.
So Casey, did you understand this?
And if so, what did you make of it?
Well, I did because we had just discussed Jevons Paradox on this very show, Kevin.
It's true.
When Hugging Face's Sasha Luccioni came on and explained Jevons paradox, which is essentially as stuff becomes
more efficient, you simply increase demand for it, thereby canceling out a lot of the
efficiency gains.
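To put made-up numbers on that: if efficiency gains cut the cost per query tenfold but cheaper AI prompts people to run twenty times as many queries, total spending on compute rises rather than falls. A tiny sketch, with every figure hypothetical:

```python
# Hypothetical illustration of Jevons paradox: efficiency gains swamped by induced demand.
cost_per_query_before = 0.010   # dollars per query (made-up)
cost_per_query_after = 0.001    # ten times cheaper after efficiency gains (made-up)

queries_per_day_before = 1_000_000      # made-up baseline usage
queries_per_day_after = 20_000_000      # cheaper AI unlocks 20x more usage (made-up)

print(cost_per_query_before * queries_per_day_before)  # 10000.0 dollars/day before
print(cost_per_query_after * queries_per_day_after)    # 20000.0 dollars/day after
```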
So when I saw Satya tweet Jevons paradox, I said, once again, Hard Fork has set the national
news agenda.
And if you're not listening, fix that.
Yeah, many people are talking about Jevons Paradox.
I predict that this is going to be something I'm going to hear about at every single party
I go to for the next six months.
And, you know, just to connect the dots a little bit, I think what Satya
is trying to say here is that DeepSeek is not actually a threat to companies like Microsoft,
because as the cost of building and using AI models comes way down, people are just going to want to use them
more and more, and so the overall demand
and Microsoft's overall profitability will not change.
Which could be true, but I would also just say
that it is exactly what you would expect the CEO of Microsoft
to say on a day when investors were panicking
and selling their stock.
It is, it is.
All right, well, Kevin,
I think that's a pretty good overview
of what DeepSeek is doing,
why people are freaking out,
and at least some thoughts
about exactly how freaked out you should be.
There is a lot more to say about this subject,
and if you are starving for even more discussion of DeepSeek,
I can promise you that we'll have more to say on our regularly scheduled episode of Hard Fork
this Friday.
Yes.
Casey, I love doing these emergency podcasts.
They fill me with a sense of, you know, danger, excitement, living on the edge.
I love them for that reason.
I love them for a second reason, Kevin, which is that I get paid by the episode.
So here's to many more emergencies in 2025.
["Emergency of Hard Fork"]
This emergency episode of Hard Fork
was produced by Whitney Jones and Rachel Cohn.
This episode was edited by Rachel Dry
and was engineered by Daniel Ramirez.
Original music by Dan Powell.
Our executive producer is Jen Poyant.
Our audience editor is Nell Gallogly.
Special thanks to Paula Szuchman,
Pui-Wing Tam, Dalia Haddad, and Jeffrey Miranda.
You can email us, as always, at hardfork@nytimes.com. Thank you.