Front Burner - DeepSeek and China’s AI power move
Episode Date: January 30, 2025
A small Chinese tech company called DeepSeek has upended the world of AI. DeepSeek recently released a large language model that rivals ChatGPT, called R1, and it shot almost immediately to #1 on the app charts. The interesting thing about it is that the company built their model really cheap, and that has called into question this narrative that you need an endless supply of chips and data centres and money to develop AI. On today's show we're speaking to WIRED's senior tech writer Zeyi Yang about the deepening AI cold war between the US and China and the lingering questions about where AI is headed and what it's good for.
For transcripts of Front Burner, please visit: https://www.cbc.ca/radio/frontburner/transcripts
Transcript
A prime minister resigns, a president returns, a whole world of changes to navigate and understand.
If you're someone trying to sort through what's real and what's relevant from a Canadian perspective,
we're here for you every night of the week.
Your World Tonight is more than just a recap of daily news.
Our award-winning team goes deeper on stories that speak to the moment.
The full picture, context and analysis, all in about 25 minutes.
I'm Tom Harrington.
Find and follow your world tonight from CBC News, wherever you get your podcasts.
This is a CBC Podcast.
Hey, I'm Jamie Poisson, and today on the show, we're going to talk about this small
Chinese company called DeepSeek that has really upended the world of AI.
If you haven't been following, DeepSeek recently released this large language model that rivals ChatGPT.
It shot up to number one on the app charts.
The interesting thing about it is that the company built it
really cheap.
And that has called into question this narrative
that you need an endless supply of chips and data centers
and money to develop AI.
After people started noticing, the biggest market loss in history ensued:
chipmaker Nvidia saw its stock plummet 17% on Monday. Microsoft and Google
also took big hits. They've since recovered somewhat. Anyhow, wrapped up in all of this
is an AI cold war between the US and China. And as always, we still have lots of questions
about where AI is headed and what it's actually good for. I'm going to get into all of that
today with Wired senior tech writer, Zeyi Yang.
Zeyi, thanks so much for coming on to the show.
Hi, glad to be here.
It's great to have you.
So let's start with DeepSeek.
What can you tell me about this company and where it comes from?
I understand it started as like an offshoot of a Chinese hedge fund.
Yeah, that is exactly correct.
So there has been this Chinese quant hedge fund called High-Flyer since, I believe, it was founded in 2015. It was actually one of the best performing quantitative hedge funds in China.
And one thing special about them is that they have been using machine learning to come up
with their trading strategy.
But the founder of this company, Liang Wenfeng, has a master's degree in computer science. And as I understand it, he himself is just very obsessed with AI.
So in 2023, he decided that, well, we already have a lot of experience in AI, but we want to establish a new entity called DeepSeek to really focus on researching large language models and all the kinds of AI technologies out there. So DeepSeek has existed for less than two years so far, and it has released quite a few models open source to the public. And the latest one, R1, released in January, was the one that really set off the discussions.
Yeah, this is the one that came out just very recently, you said, and just tell me what
it is and what it does and then why it is freaking everybody out.
Sure.
What it did with R1 is that it really focused
on reasoning tasks.
And people have been familiar with this because OpenAI released their o1 a little bit earlier last year.
Is that like ChatGPT?
It is a version of ChatGPT, but this version really focuses on showing you the train of thought that goes on within the model for it to come to a conclusion.
So, for example, you give it a math problem, and it's going to tell you every step it takes to get to the answer, which a lot of people really appreciate.
And what we're seeing with R1 is that it kind of replicated that train-of-thought processing. And if you ask R1 a question now, it will also show you the step-by-step calculation or reasoning that gets to the answer.
Could you give me an example of what something like that might
look like if I asked it a question?
Sure. Yeah, I've been poking at the model just now. For
example, if you ask it, please give me a summary of what's
the most important historical events that happened in the 20th century, right?
It will actually give you this very human reasoning process.
It would tell you that, okay, I should start thinking about the first decade of the 20th century, and then the second decade of the 20th century. And then after going through all of that, it will say, oh, maybe I have missed something because I focused on things like wars. What about culture? What about entertainment, other aspects?
So it kind of mimics the process that goes on inside a human brain answering this complex question.
I think before OpenAI's o1, most of the models would hide these processes. Or maybe they didn't go through this process at all.
But then US models started to realize that people actually appreciate seeing what's going on within the models, so they started to show that transparently.
And R1 is an example of doing that really well.
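(For the curious, this is roughly what asking for that visible train of thought looks like in code. A minimal sketch, assuming DeepSeek's OpenAI-compatible API, the "deepseek-reasoner" model name, and the reasoning_content response field described in its public docs; none of these specifics come up in the episode.)

```python
# Minimal sketch: asking a reasoning model for its step-by-step trace.
# The endpoint, model name, and reasoning_content field are assumptions
# based on DeepSeek's public API docs, not details from this episode.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed DeepSeek endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1 served behind the API (assumed name)
    messages=[{
        "role": "user",
        "content": "Summarize the most important events of the 20th century.",
    }],
)

message = response.choices[0].message
print("Train of thought:\n", message.reasoning_content)  # the visible reasoning
print("\nFinal answer:\n", message.content)
```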
As I mentioned in the intro, this shot up to number one on the app charts and people really started to pay attention.
And then we saw this massive tanking of Nvidia.
You're seeing markets being absolutely eviscerated
on news from a company most people have never even heard of.
Let's talk about DeepSeek because it is mind-blowing
and it is shaking this entire industry to its core.
Google and Microsoft take a hit as well.
And like, why? So explain to me that freakout.
Sure. I think the important background here is that, up until DeepSeek R1 was released, most people had believed that there's one sure pathway
for AI to become more powerful and capable.
And that is to buy more computer chips,
to train the AI model on more chips for a longer period
of time, and just through this
scaling effect the AI model is just going to be better and better. The good thing about that kind
of belief is that it's a very reliable pathway. It's like if you just follow the instructions
you're going to get a better AI over time. But what happened with DeepSeek is that, when DeepSeek released its R1 model and people started to realize that, oh, it actually performs really well, it had also released quite a few academic papers explaining how it got there. And that includes explaining how it trained its model on a surprisingly small budget.
And at that point, people started to realize, actually,
there isn't just this one way
of getting a more powerful AI model.
If you focus on, for example,
making the training process more efficient
or some innovative architecture within the model,
then maybe you don't need that many chips
or that long a training time.
So this kind of questions the established belief in the AI industry.
And I think that's what caused a
really large shock in the market.
Right. And is it fair for me to say that they did it a lot cheaper? Like, we're talking about six million, so they say, versus hundreds of millions of dollars,
right? And these chips that we're talking about, they're produced by Nvidia and then
these enormous data centers owned by Microsoft and Google. So that's why those companies
started to see that sell off, right?
That's true. I do think there are some nuances with that 6 million number. A lot of people
just take it for granted. But yeah, the fact is that people realize that you can train the models for a lot
cheaper.
Okay. So does this ultimately upend this billion dollar industry that has been
created around developing AI?
I wouldn't say it completely disrupted it, because the first thing to know is that, well, DeepSeek still had to use these NVIDIA chips themselves to train their R1 model.
So it's not like, oh, with some new innovative techniques,
you can just get rid of the NVIDIA chips completely.
That's not true.
And the other thing is that I've started to see more people reason that what DeepSeek has done is really to propose a more efficient way to use whatever chips you have.
And at that point, if you have a thousand chips, you're still going to have a model that's better
than someone with a hundred chips. So it's not to say that the scaling effect doesn't exist anymore.
It's more that it's not that simple, and we can put more resources into other things to change the efficiency of training. But still, the number of chips you have, or the resources, the money you have, is a very important factor in making a good model.
Hey guys, we're going to be back to the show in a second, but if I could just ask you a favor: if you could hit that follow button.
If you could give us a follow.
I know I've been asking you that a lot lately, but it's super helpful for us and hopefully
it helps you too.
Okay, back in a second.
I'm Natalia Melman Petruzzella and from the BBC, this is Extreme Peak Danger.
The most beautiful mountain in the world.
If you die on the mountain, you stay on the mountain.
This is the story of what happened when 11 climbers died on one of the world's
deadliest mountains, K2, and of the risks it will take to feel truly alive.
If I tell all the details, you won't believe it anymore.
Extreme Peak Danger.
Listen wherever you get your podcasts.
DeepSeek released this latest model as open source, right?
Just explain to me what that is and why it was significant.
Yeah, so I think there has long been this kind of
divergence within the AI industry between closed source and open source.
With closed source, we're mostly looking at companies like OpenAI, Microsoft and Google,
because they invested a lot of resources and talents into making their own models.
They're not going to open the model up to everyone to use it.
They're charging people a fee to use their most advanced models.
On the other hand, we're seeing companies like Meta or
quite a few Chinese companies, including DeepSeek, or Chinese tech giants like Alibaba,
who have chosen this more open source route.
When they come up with a really powerful model,
they will decide to put that model online for anyone to download and test it
and run it by themselves.
In this way, they're giving up a lot of control.
Like they cannot charge you anymore, right?
For using that model.
Yeah, why would they do that?
The reason why they want to do that is that,
first of all, they started a little bit late.
Like, OpenAI was already the leader.
If they want to catch up,
they need to find out some other ways.
And providing your models for free is a really good way to attract more attention from the industry,
and also to get more users to just try to use it,
to test it, and to maybe collaborate with you.
Those kinds of contributions can really help the company itself catch up faster.
That is one way for a lot of companies to decide that,
okay, we're not going to catch up with OpenAI through the traditional commercial way.
So we're going to try to find out another way to have more people just willingly collaborate
with us and help us grow.
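(Concretely, "download it and run it yourself" looks something like the sketch below, using the Hugging Face transformers library. The checkpoint name is an assumption, one of the distilled R1 models DeepSeek published, and isn't mentioned in the episode.)

```python
# Minimal sketch: downloading and running an open-weights model locally with
# Hugging Face transformers. The checkpoint name is an assumption (one of the
# distilled R1 models DeepSeek published); any open model on the Hub works alike.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)   # downloads once, then cached
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain, step by step, why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt")

# Generation happens entirely on your own machine: no fee, no vendor in the loop.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```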
Are there concerns that people could use that, you know, for evil, like, right?
Are there security concerns with that?
Yeah, I think the way that these models have been shared under certain open source license
is that if you are someone who downloaded the model and used it for malicious purposes,
you are the one responsible for it. The companies themselves, they're giving it out there,
but they're also making sure that they're not the one who will be legally responsible for that.
And I think there is kind of an established set of open source community rules, so people won't really go after the company for releasing a model when other people do something wrong with it.
In that way, the companies are not completely responsible for it.
I do think there's another thing to notice here,
which is that there's a separate set of laws in China to regulate AI companies, and they do put these AI companies under very strict regulations.
However, they also sort of carve out some space there.
They're like, if you're releasing a model
for scientific research,
or you're releasing it not to a general public,
but only to people who are very savvy enough in technology,
then you're not going to subject to us treating it scrutiny
or legal responsibility in there.
I guess the question I have is,
could it be bad for humanity?
Well, that's a good question.
Is it?
I feel like...
I don't know.
Yeah.
I think that is actually a very hard question to answer
because a lot of times with this open source model,
what's happening is that there are already a lot of models out there.
Even if this company doesn't release another one,
if you are someone with malicious intent,
you can probably find another one for that.
So I guess the marginal responsibility there is negligible.
I don't know.
But that's what I'm thinking right now.
Yeah.
I also wanted to ask you because you were talking about the Chinese government and the
guardrails that they put on these companies.
I've also heard people concerned about this model in particular.
For example, I saw that if you asked about Tiananmen Square,
it just won't reply to you.
Or I think somebody tried to ask it about the Uyghurs.
And the response was that China is a multi-ethnic society
where its entire population
has equal rights, right?
Obviously that's not true.
So just, are there legitimate concerns about that?
Definitely, I think this applies to every single
Chinese model out there, which is that they are subject
to a set of different content moderation rules that are very specific to the Chinese context.
And a lot of times we're talking about how these models cannot talk about politically sensitive situations. They cannot talk about territorial disputes in China, or political movements, protest movements, all things like that.
So personally, I was not surprised at all
that DeepSeek also has some kind of censorship mechanism when users try to prompt it with these kinds of questions. I do think there's a difference here, though, because DeepSeek is open source: there's a lot more possibility to tweak the model and try to get it to censor less. So I think one thing for the open source AI community to figure out is, there is this model out there, and we know it has this drawback of censoring political information. Can we actually modify it so it doesn't do that anymore?
I think it's a tall task, but it's still worth trying.
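(What "tweaking" an open-weights model typically involves is lightweight fine-tuning. The sketch below is hypothetical: the common community recipe of LoRA adapters via the peft library, not anything DeepSeek or the open source community specifically ships. The checkpoint name and training data are placeholders.)

```python
# Hypothetical sketch: modifying an open-weights model's behavior with LoRA
# fine-tuning (peft). Checkpoint name and data are placeholders/assumptions.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small trainable adapter matrices to the attention layers;
# the base weights stay frozen, so training is cheap.
model = get_peft_model(model, LoraConfig(
    task_type="CAUSAL_LM", r=8, target_modules=["q_proj", "v_proj"],
))

# Toy data: prompts the base model deflects, paired with direct factual answers.
examples = [{"text": "Q: <a filtered question> A: <a direct factual answer>"}]

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=128)
    enc["labels"] = enc["input_ids"].copy()  # causal LM objective
    return enc

train_data = Dataset.from_list(examples).map(tokenize, batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="tweaked-model",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=train_data,
).train()
```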
I just want to situate this conversation in the broader context of this larger global tech war, right?
Largely between the United States and China.
Trump seemed, I think it's fair to say, a bit
blindsided by the rise of DeepSeek. The release of DeepSeek AI from a Chinese
company should be a wake-up call for our industries that we need to be laser
focused on competing to win. You know, just for our listeners, could you just
briefly talk to me a bit about this so-called cold war, as I'd call it, between the US and China when it comes to developing AI, and what we've seen in recent years?
Yeah, I think this cold war dynamic has really defined US-China relations in the past, I would say, two to three years.
We're talking about how both countries feel like they have to come out ahead of the other in terms of key technologies. And AI is definitely the one that both countries are paying the most attention to right now.
And specifically, we're talking about how the US government and US companies really want to make sure
that the American AI models are performing much better
than the Chinese ones.
And so far, up until DeepSeek, I think most people agreed that that's true. Like, Chinese models are just not as good as the Western ones.
And usually, when a Western model first introduces a feature, it will take a few months or even a year for its Chinese counterpart to come up with a similar one. And I think that kind of status quo has made a lot of people comfortable, feeling that the US is still leading in this field.
What's happening with DeepSeek is that DeepSeek R1 really rivals OpenAI's o1 in a lot of the key reasoning and math benchmarks.
And I think a lot of people are just kind of scared and surprised by that.
And that just changes their calculation of how long it will take China to catch up.
Right.
I've seen people make the argument that the breakthrough DeepSeek made was actually an unintended consequence of the trade war. That essentially, DeepSeek was forced to innovate because the US had banned the export of high-end
processing chips, right?
So they couldn't get them, so they had to kind of work around it.
What do you make of that?
I think that's a reasonable explanation of what happened.
Because I think this really happened under the Biden administration, when they started to put these chip export controls on China. So we're talking about restricting the most advanced GPUs from being exported to China, where they would help Chinese companies grow their AI models.
And DeepSeek's innovation is really inspired by that, because DeepSeek as a company knows: I'm not going to get an infinite supply of GPUs like OpenAI has. So what I really need to do is focus on, can I circumvent the obstacle by coming up with software-focused innovations, or with more scientific research that helps my model grow? So it's kind of like an innovation born out of necessity.
That's always been the criticism of the export controls in recent years, because people are saying that, yes, you are cutting off a very important pathway for Chinese advances in AI. But there are other pathways out there, and you're just encouraging them to figure out these other pathways. And I do think DeepSeek's success in some way proved that it's possible to find those alternative pathways.
But we're still waiting to figure out how effective these alternative pathways are.
I also wanted to ask you about Stargate, right? This is this initiative that Trump recently
announced alongside all of these tech CEOs, including OpenAI's Sam Altman.
Together these world-leading technology giants are announcing the formation of Stargate. So put
that name down in your books because I think you're gonna hear a lot about it in the future. A new
American company that will invest $500 billion at least in AI infrastructure in the United States and
very quickly moving very rapidly. The idea being that they want to ramp it up.
And do you think that project Stargate will go ahead as planned? Is it going to change now?
Well, I think even before DeepSeek became the talk of the town, there had already been a lot of suspicion about Stargate, right?
And I think it goes back to this idea
that Stargate is the example, the culmination, of this belief
that the more chips you have,
the more data centers you have,
the more powerful your AI will be.
And up until DeepSeek, there hadn't really been a narrative challenging that.
So that's why I think there is a lot of genuine support for a project like Stargate. It really concentrates the resources, the money, and the talent towards getting more chips and building more powerful AI.
At the same time, I think there is a lot of criticism of that too, because when you are building data centers, when you are piling up these compute resources, it also has environmental effects. It also endangers other research that could use those resources, right?
So I think what DeepSeek has provided at this moment is evidence for these other arguments: well, yes, maybe we can achieve AI supremacy through your Stargate way, but you're asking us to ignore all of these counter-effects out there.
But maybe there's another way.
Maybe if we follow the more model efficiency path,
we don't need to spend so much money and so much, I guess,
resources on a Stargate project like this.
And I think that will prompt more counterarguments out there. Which is what prompts me to believe that it will be really hard for OpenAI to carry out Stargate as it wants to. But still, they have a lot of political support, so maybe they can pull it off.
Okay, and just since I have you here, I wonder if I could end this conversation by asking you a bigger picture question. Because it seems to me, at least, that a Chinese AI company has done something really interesting in terms of process and cost.
But there are these other big questions lingering about whether AI can surpass its current use case,
right? Or current utility. In other words, can it get better? Or I guess worse for people who think
that we're headed towards a dystopia. And does this get us closer to that answer?
So, the founder of DeepSeek, Liang Wenfeng: he's basically the same age as Sam Altman.
And it also strikes me that he probably has very similar beliefs to Sam Altman.
He's a firm believer in AGI, which means that AI at some point will just completely surpass humans in terms of reasoning, in all kinds of abilities.
And the reason why he has dedicated so many resources and built up a team to build DeepSeek
is that he believes that AGI is achievable.
So in some way, the success of DeepSeq
proves that there's still this very strong belief
in the AI industry that we should build
the most powerful AI ever,
and it will solve a lot of the problems in this society. Even though Deep Sea
provides a more efficient way to get there, I don't think it kind of changes the direction that it's
heading to. I feel like if we want more kind of resistance to the narrative or if we want more
kind of other alternatives out there, it might not come from DeepSeek. Okay. Zui, this was really interesting. Thank you so
much for this.
Of course, glad to help.
All right, so before we go today, Trump's Commerce Secretary
pick was asked during a Senate committee hearing on Wednesday if he would maintain US AI leadership.
Howard Lutnick responded by saying that DeepSeek had stolen US technology to create a dirt cheap
model and said that he would impose new restrictions on China. "They stole things, they broke in, they've taken our IP," Lutnick said of China, adding, "I'm going to be rigorous in our pursuit of restrictions and enforcing those restrictions to keep us in the lead, because we must stay in the lead."
All right. That is all for today. I'm Jamie Poisson. Thanks so much for listening.
Talk to you tomorrow. For more CBC podcasts, go to cbc.ca slash podcasts.