Today, Explained - DeepSeek deepdive

Starting point is 00:00:00 Everyone's talking about DeepSeek. Satya Nadella at Microsoft. I think DeepSeek has had some real innovations. Mark Zuckerberg at Meta. There's a number of novel things that they did that I think we're still digesting. And there are a number of things that they have advances that we will hope to implement in our systems. The president of America? The release of DeepSeek AI from a Chinese company

Starting point is 00:00:33 should be a wake-up call for our industries that we need to be laser-focused on competing to win. It's a chatbot. It's a white paper. It could help write code for a Tetris game. Tetris. It could solve a math theorem. It's chips, you guys. On Today Explained, the week the world wilded out over DeepSeek. And also, what is DeepSeek? Coming up.

Starting point is 00:01:00 Hey Spotify, this is Javi. My biggest passion is music. And it's not just sounds and instruments. It's more than that to me. It's a world full of harmonies with chillers. From streaming to shopping, it's on Prime. It's Donald Trump's first week in office again, and it sure feels like it. There is a great deal of, as Heather said, a great deal of anger about oligarchs, about

Starting point is 00:01:24 rich people controlling everything. I'm Preet Bharara. And this week I'm joined by Kara Swisher, Astead Herndon, and Heather Cox Richardson for a special Inauguration Week episode of Stay Tuned with Preet. The episode is out now. Search and follow Stay Tuned with Preet wherever you get your podcasts. You're listening to Today Explained. Eleanor Olcott is the Financial Times China Technology correspondent.

Starting point is 00:01:52 We reached her in Beijing, where she has been following DeepSeek from its very beginnings. I first heard about this mysterious AI company in early 2023 because one of my contacts said that this hedge fund had silently built one of the largest clusters of Nvidia GPUs in China. So Nvidia GPUs are graphics processing units. These are basically the AI chips that you need to power AI model training and inference. They're really, really important for this whole AI race. And they're in short supply in China. So somehow this quant fund, that's a hedge fund in China, had kind of silently built one of the biggest clusters

Starting point is 00:02:39 in the country. We took notice and they started publishing more and more advanced models over the past year. And their work finally pierced through the Western consciousness when we were all on Christmas holiday right at the end of 2024 with this new model release, V3. There is a new model that has all of the Valley buzzing,

Starting point is 00:03:04 and it does not come from OpenAI or Meta or Google or any of those names. DeepSeek V3 is the first open source model in all of AI history that is better than the closed source models. DeepSeek version 3 is free and it's absolutely insane. Later in January it then published another model which again shocked the world with its sophistication. And the key thing here, the reason that they've prompted somewhat of an existential crisis, especially amongst the US players,

Starting point is 00:03:37 is that they claim to have been doing this on such a bootstrap budget. All right, when I learned about DeepSeek, it was because the stock market had absolutely collapsed amid the news that this Chinese company had made this thing. What exactly went on earlier this week? I mean, the stock market is an incredibly mysterious beast, right? I mean, we at the FT have been writing about how DeepSeek and other Chinese companies are building really competitive models for months now. But I think

Starting point is 00:04:12 what happened over the past week was we saw all of this frenzied activity on Twitter. It's not that people want DeepSeek to win. It's that they want OpenAI to lose. DeepSeek R1 is one of the most amazing and impressive breakthroughs I've ever seen. DeepSeek this, DeepSeek that. A profound gift to the world. How about you seek a deep connection with a woman? Personally, I'm staying away from DeepSeek. I don't want the Chinese spying on me and seeing what kind of videos I'm watching on

Starting point is 00:04:40 TikTok. Wait. Wait. Wait. What came out on Monday was a moment, right? It was really, really important because DeepSeek, this kind of little-known Chinese lab, for the first time released a paper with a very, very detailed explanation, a kind of technical recipe, as it were, for building a reasoning model. Now, reasoning models are important. It's a fairly new area of AI,

Starting point is 00:05:10 but it basically means models that can teach themselves and improve themselves without human supervision. This is really important because if we can use this in practical applications, it means that AI will be capable of critical thinking and will be useful in tasks that are vastly more complex than what we currently have on the market. The dream, right, is to have an AI, for example, running in the background of your computer and kind of preempting your needs, like booking travel, doing things that you

Starting point is 00:05:45 haven't even thought of maybe. It's kind of acting as your actual personal assistant. They don't just respond to demands, they preempt things. Hello Noel, what can I get for you today? They make decisions on their own. They might, for example, I mean, figure out that you have not got enough groceries in your fridge and think, okay, well, we'll preemptively order that so you don't even have to do it yourself, right? I ordered extra Cheetos this week. You deserve them.

Starting point is 00:06:16 It's still very much an open question as to whether or not we're going to get there. It's important to note as well that this is just like a big marketing strategy on the part of a lot of AI companies also to justify continuing to raise billions of dollars. But what I think DeepSeek proved over the past week is actually China is a viable and competitive player in this field. So let's talk about where DeepSeek comes from. Who's behind this? So unlike other AI companies, AI startups in China, it hasn't raised any external financing.

Starting point is 00:06:55 So you think, okay, how the hell has a company managed to build what we know is a very expensive endeavor of buying all of these GPUs and also hiring the best talent. They're known, alongside ByteDance, for paying top dollar for the best AI researchers in China. And that's basically a story about the founder Liang Wenfeng, who has a background as a quant hedge fund manager. So he basically made a whole bunch of money trading stocks

Starting point is 00:07:25 and decided to plow some of those resources into this new pet project. And he started in 2021 building this large Nvidia cluster because he recognized the potential for this technology. And the timing of that is important for two reasons. The first is that it was really before the world woke up to the potential of generative AI. It was before the release of ChatGPT and the rest of the Chinese players had kind of

Starting point is 00:07:54 neglected generative AI as a field of research. They were much more focused on surveillance technology, surveillance AI, because it was clear you could make money with that form of AI. The other reason it's significant is because it was really before the first tranche of kind of blanket export controls were put in place on China. The restrictions will limit Chinese companies' access to advanced computer chips and slow their progress in artificial intelligence. U.S. chipmakers, NVIDIA and AMD, tumbling after the US ramped up its chip export rules. Washington says the aim is to prevent Beijing using the most advanced semiconductors for

Starting point is 00:08:33 its military modernization. So really, when the race started in China in early 2023 to replicate, or seek to replicate, OpenAI's success, Liang and DeepSeek were actually in a pretty good position to get ahead. Okay. So a brilliant man made a bunch of money and now presumably will make a trillion more dollars. Is that the objective here? That isn't the objective here. And actually, that's what makes DeepSeek so unique, right?

Starting point is 00:09:06 They have not made any serious moves to commercialize their technology. They have an AI chatbot. It's free to use. What I think he's doing here, and this comes from people who know him, is he wants to just, you know, add to the great canon of LLM research. He wants to push this technology forward. And actually also there is a bit of a national pride element here as well, right? In interviews with domestic press, he says it's important that China also plays a role

Starting point is 00:09:38 in developing this technology and being a leader. So I think there's various ambitions at play, but he's a pure technologist. And actually, because DeepSeek is not interested in commercializing their technology, right, it's like a pure research lab. People have described it to me as being like the early days of DeepMind, where you just have a bunch of engineers, a bunch of researchers working on whatever they think

Starting point is 00:10:04 is the best technical pathway forward. But because they don't care about commercialization, that means that they're willing to share the secrets of how they've done that with the rest of the world and kind of enable the others to also learn from their learnings. And for players like OpenAI who are also working on the same research, but not telling the world how they got there, this is really a bit of a challenge.

Starting point is 00:10:29 Earlier this week, as the stock market was whipsawing about, we heard people asking whether or not this is AI's Sputnik moment. They're referring to the Soviet Union launching a satellite into space before the United States back in the 50s, kicked off the space race, a very big deal. And one of those terms that you don't hear very often because a Sputnik moment is a big moment. Do you think that this development just kicked off the AI race in a way? As a journalist, I'm all for fancy metaphors and comparisons. I think the comparison is not completely correct in this case, right? DeepSeek is a private company that has just been plugging away on AI research.

Starting point is 00:11:15 It's not building rockets to send to space. But having said that, you know, the US and China undeniably are in a tech war. We've known this since 2019. And China is very, very concerned about the US getting ahead on AI. And it's been providing a huge amount of support to kind of select players that they think are going to help it remain competitive and gain an edge. But really, the Sputnik element,

Starting point is 00:11:46 it's really about the hardware itself, the AI chips. I think the real race here is on Chinese companies and the Chinese ecosystem overall, trying to make Huawei or maybe one of the other Chinese competitors a true long-term and successful rival to Nvidia. Eleanor Olcott of the Financial Times in Beijing. Coming up next, can DeepSeek's competitors, looking at you, OpenAI, compete? Support for the show today comes from Vanta.

Starting point is 00:12:35 Trust isn't just earned, Vanta says. It's demanded. Have you demanded someone's trust lately? Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your governance, risk and compliance program, proving your commitment to security is critical and complex. And that's where Vanta comes in.

Starting point is 00:12:59 You know the deal. Vanta says they can help businesses establish trust by automating compliance needs across 35 frameworks like SOC 2 and ISO 27001. They say they can also centralize security workflows, complete questionnaires up to 5 times faster, and proactively manage vendor risks. You can join over 9,000 global companies like Atlassian, Quora, and Factory

Starting point is 00:13:27 who use Vanta to manage risk and improve security in real time. For a limited time, our audience can get $1,000 off Vanta at vanta.com slash explained. That's V-A-N-T-A dot com slash explained for $1,000 off. You're listening to Today Explained. Noel King here with Reed Albergotti, technology editor at Semafor. Reed, the reaction on Monday to DeepSeek was huge.

Starting point is 00:14:00 The markets are swinging around. People are yelling about Sputnik. It's on everyone's homepage. What did you think about all that? Well, I was really kind of slapping my forehead because I think it was a complete overreaction. People knew that this company existed. And in fact, this whole idea of distilling these larger models into smaller, more powerful ones that are more efficient, this is something that had been going on really since ChatGPT came out. The biggest takeaway for me is that the market really does not

Starting point is 00:14:30 understand the AI industry yet. What's been going on with all of DeepSeek's Western competitors? What have they been up to? They're all investing massively in these huge data centers. It's my honor to welcome three of the world's leading technology CEOs. With hundreds of thousands of graphics processors, tens of billions of dollars. In fact, you probably heard last week there was a deal announced for 500 billion dollars. Stargate. So put that name down in your books.

Starting point is 00:15:01 With OpenAI and Oracle and MGX and SoftBank, I mean, massive amounts of money. I think this will be the most important project of this era for AGI to get built here, to create hundreds of thousands of jobs, to create a new industry centered here. And what that investment is for is running these models because there is so much demand.

Starting point is 00:15:25 These companies really can't meet it right now. And what we're also finding is that inference, which is just the fancy term for running these models, actually can now increase the capability of the models a lot. That wasn't the case before, when ChatGPT first came out. It was just you prompt ChatGPT, it comes back with an answer. Now you prompt the most advanced of these models and they are doing a whole bunch of stuff in the background. They're running over and over and over again. They're trying to find the

Starting point is 00:15:55 best answer and that is exponentially more expensive and that is just going to continue and this new R1 model that DeepSeek came out with, it's an advance, but it is not nearly a big enough breakthrough to sort of negate those market dynamics. Can you explain why? Because yeah, I saw that as well. DeepSeek did it on the cheap. All of this money and energy and investment

Starting point is 00:16:22 is for naught because they came up with the little AI that could and it didn't even cost that much. Yeah, I mean, they showed that you can do some of these types of queries at a lower cost, but it's just not nearly low enough. You might've seen Microsoft CEO Satya Nadella talking about Jevons paradox. Jevons paradox strikes again.

Starting point is 00:16:45 You know, basically it's this: as the technology becomes more efficient and the cost declines, the paradox is that you would think, well, okay, that just means that it just gets cheaper and these companies are just not going to make as much money on it. But actually what happens is it becomes more useful and people want to use it more. And then there was another wrinkle that appeared a few hours before we're speaking on Wednesday. There is a suggestion that DeepSeek may have borrowed from OpenAI or stolen from OpenAI. What is the allegation? What are people seeing and saying? You know, it's stolen. I mean, that's a very strong word. We saw David Sacks, who's the incoming AI czar,

Starting point is 00:17:28 sort of accuse DeepSeek of stealing from OpenAI. And there's substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI's models. And I don't think OpenAI is very happy about this. So, you need data to train these AI models. But what you can actually do is you can use the models themselves to create a very, very specialized kind of data. It's really synthetic data because it's being generated by an AI model. But you can create exactly the kind of data that you want, and you can check that over with other AI models.

Starting point is 00:18:08 And what you end up with is, it's how you make these models much more efficient. So this is also not surprising, because that's how all of these models work. I mean, we've seen lots of companies do this. So I think just, again, the process of distillation makes total sense. Whether or not it's stealing, I think that's something that's a gray area in the AI industry

Starting point is 00:18:29 that we really haven't ironed out yet. I think it's such a new thing that we'll have to sort of come up with the norms and the rules and regulations, maybe even copyright law around this. I want to ask you about Nvidia, the company that makes the chips that AI requires. Nvidia is now basically a household name. It takes up a big part of the stock market. And so when Nvidia stock is riding high, so is my 401k. And when it's tanking, good Lord, I'm going to retire under a bridge.

Starting point is 00:18:59 And on Monday, the bridge was looking like a real possibility. What exactly happened on Monday with Nvidia and why did it seem to get hit so hard by this news? Yeah, well, Nvidia makes these graphics processing units that are the most powerful, the most advanced in the world. And they're very expensive. I mean, the older models, the H100s, which were state of the art,

Starting point is 00:19:23 they cost about $40,000 apiece. And these data centers have about 100,000 of those. So, you know, you do the math there. NVIDIA is selling a ton of these chips. Really, they can't sell enough. There's way more demand than they can even produce. And it's all because these models take a lot of energy to run. And so if you can have a more efficient one

Starting point is 00:19:44 that doesn't require these powerful GPUs, then maybe you don't need to spend $40,000 on a GPU now with OpenAI. But again, that's not really what happened here. What happened here is there's a bit of an advance in how efficient these models can get, but in order to get the most use out of them, you need to run a lot of inference on those. You're still going to need really powerful GPUs. And as there are more advances

Starting point is 00:20:11 in the pre-training part of the models, they'll get even bigger and more powerful. So, NVIDIA is not going anywhere. I mean, certainly they have competition. There are chip makers that want to build more efficient inference chips. There are people who want to get rid of Nvidia's advantage, which is its CUDA software that the whole AI industry basically runs on right now.

Starting point is 00:20:34 It creates a big moat for them. Those are the things that I think are the risks for Nvidia, not a company building a more efficient open source AI model. I think there was a reason that so many watchers and analysts and reporters framed this as China catches up to the US and that is because China and the United States are in a quiet war, cold war, existential struggle. We compete with the Chinese and it raises some questions, does it not, about how worried the US should be that China beats us in the artificial intelligence competition?

Starting point is 00:21:12 Yeah. And this is where, you know, I think there really are national security concerns about China and AI. And you know, if China does win the AI war, the AI race, let's say, it will probably give them a military advantage. I mean, this is all, this is far into the future. There's a lot of debate about this, right? But I mean, I think the conventional wisdom is

Starting point is 00:21:37 if you win the AI race and you're first to AGI or superintelligence or whatever you want to call it, it becomes a military tool very quickly. And I think that's the whole reason the US has put so much energy into figuring out how to curb the exports of the most powerful AI chips to China. They don't want to see China be able to sort of control its own destiny when it comes to AI.

Starting point is 00:22:05 After the events of this week, Reid, is there a sense of renewed competitive energy? Everybody needs to now go back and work harder, faster, smarter for less money. Yeah. I think Sam Altman said that it was invigorating on X. We will obviously deliver much better models. And also, it's legit invigorating to have a new competitor. This is how research works.

Starting point is 00:22:32 Somebody comes out with a new idea, and it inspires other people both creatively and also competitively. This is a dynamic that we've seen for the past couple of years in AI, or even really more than that. This is why so many of these tech companies have been publishing their research instead of keeping it a trade secret for so long, because the genius researchers who write these papers, they want to present them.

Starting point is 00:23:01 They want bragging rights at the NeurIPS conference. That's the big AI model conference every year. They want to get pats on their back from their co-workers. And I think there's actually probably a lot of mutual respect between the AI researchers at OpenAI and Anthropic and the ones at DeepSeek. In that world, I think it's possible to kind of put aside all the geopolitics and just say, hey, nice job, you've created a really interesting model and we're going to learn from it and try to do better. I think the other way to look at it is, look, if the US doesn't win the race to AGI, then

Starting point is 00:23:50 what you could see is a Chinese military advantage that leads to something like an invasion of Taiwan and maybe potentially a hot war between these two superpowers. And that would be very, very bad. I think people who, you know, the most fervent China hawks, what they really want is a US military advantage that is so big that, you know, there just will be no war. And I think if you look at it through that lens, then yes, I mean, this AI race is very, very consequential geopolitically and really there are dire consequences if the right outcome isn't achieved. Semafor's Reed Albergotti, thanks to him. Miles Bryan and Victoria Chamberlain produced today's show

Starting point is 00:24:39 with an assist from Amanda Llewellyn. Amina Al-Sadi is our editor. Andrea Kristinsdottir and Rob Byers engineered. Laura Bullard checks the facts. I'm Noel King. Team Sink. It's Today Explained.
