Your Undivided Attention - Behind the DeepSeek Hype, AI is Learning to Reason
Episode Date: February 20, 2025

When Chinese AI company DeepSeek announced they had built a model that could compete with OpenAI at a fraction of the cost, it sent shockwaves through the industry and roiled global markets. But amid all the noise around DeepSeek, there was a clear signal: machine reasoning is here and it's transforming AI.

In this episode, Aza sits down with CHT co-founder Randy Fernando to explore what happens when AI moves beyond pattern matching to actual reasoning. They unpack how these new models can not only learn from human knowledge but discover entirely new strategies we've never seen before, bringing unprecedented problem-solving potential but also unpredictable risks.

These capabilities are a step toward a critical threshold: when AI can accelerate its own development. With major labs racing to build self-improving systems, the crucial question isn't how fast we can go, but where we're trying to get to. How do we ensure this transformative technology serves human flourishing rather than undermining it?

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Clarification: In making the point that reasoning models excel at tasks for which there is a right or wrong answer, Randy referred to Chess, Go, and StarCraft as examples of games where a reasoning model would do well. However, this is only true on the basis of individual decisions within those games. None of these games have been "solved" in the game theory sense.

Correction: Aza mispronounced the name of the Go champion Lee Sedol, who was bested by Move 37.

RECOMMENDED MEDIA
Further reading on DeepSeek's R1 and the market reaction
Further reading on the debate about the actual cost of DeepSeek's R1 model
The study that found training AIs to code also made them better writers
More information on the AI coding company Cursor
Further reading on Eric Schmidt's threshold to "pull the plug" on AI
Further reading on Move 37

RECOMMENDED YUA EPISODES
The Self-Preserving Machine: Why AI Learns to Deceive
This Moment in AI: How We Got Here and Where We're Going
Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn
The AI 'Race': China vs. the US with Jeffrey Ding and Karen Hao
Transcript
Hey, everyone. It's Aza. Welcome back to Your Undivided Attention. So today we are going to be doing actually a bit of a special episode. It's going to be me here with our co-founder, Randy Fernando, who was at Nvidia for seven years. And what we really want to do is give you some insights into the latest set of AI models that came out. So these are OpenAI's
O3, DeepSeek's R1, and actually they're following on OpenAI's O1 from a couple months ago.
And we want to talk about what makes them a big deal, why we have switched into a new paradigm in how these models get trained and like what's going on behind the scenes.
So first, Randy, thanks for joining me.
Glad to be here.
First place to start is, you know, this new model from China, DeepSeek R1. It dropped.
And it ended up creating this frenzy in media.
it shook global markets.
The hype has quieted down.
And actually, you know, I think that the drop in global markets was very irrational.
But let's talk a little bit now about what makes this a key inflection point in AI tech.
I think there were several things, right?
And I'm not sure exactly which order to go, but I'll just name a few.
One was low-cost, high-performance reasoning.
Like, it actually performed well, and people used it, and that was really impressive.
Now, there are some asterisks about the cost, because the cost didn't account for the GPUs, the salaries.
Just to jump in, there's a widely reported number that for between $5 and $6 million, this Chinese lab was able to make a model as good as OpenAI's O1 model.
And if this is true, that means that the big labs no longer had a frontier
competitive advantage. Everyone could be making these. But of course, that number I think was
inaccurately reported. Yeah, exactly. And there's some debate about that, but I think our goal
today is to give you some principles to think about this rather than like nitpicking every
detail. That's right. Clearly, there was some really smart implementation and algorithmic
optimization. There's just a lot of smart things that were done to do it all efficiently. That's
true.
O3 still performs better.
I think it's important to remember that because amidst all the hype, I think people kind of,
some people lost track of that.
O3 performs better, but it uses a lot more computation and cost to get there.
The open weights, the published methodology, right?
So the DeepSeek R1 paper talks a lot about exactly what they did and this process called
reinforcement learning, right?
where the model is able to try out lots of different experimental ideas,
score them, and then keep the best ones, right?
So it's allowed to be very creative.
Try out lots of different answers to problems,
different sequences of steps, different recipes, right, to solve that problem.
Some work, some don't work.
And then it's able to figure out, yeah, these are the ones I should keep.
These are the ones I should toss.
And that worked really well.
And this paper kind of documents the process for doing that.
Plus, since all the weights are open,
right, this is now the new baseline
that anyone who's serious
can have access to, right, in an open way.
So that's a big game changer.
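To make the reinforcement learning loop Randy describes a bit more concrete, here is a minimal sketch in Python. It is not DeepSeek's actual training code (their paper uses a policy-gradient method called GRPO); it only shows the bare idea of sampling many candidate answers, scoring each with a verifiable reward, and keeping the best ones, with a toy arithmetic task standing in for real reasoning problems.

```python
# Minimal sketch: "try lots of ideas, score them, keep the best ones."
# Toy stand-ins only; not DeepSeek's actual training setup.
import random

def generate_candidate(problem):
    """Stand-in for a language model proposing an answer (here: a random guess)."""
    return random.randint(0, 100)

def reward(problem, answer):
    """Verifiable reward: 1 if the answer checks out, 0 otherwise."""
    a, b = problem
    return 1.0 if answer == a + b else 0.0

def best_of_n(problem, n=64):
    """Sample n candidate answers, score each, and keep the best one."""
    candidates = [generate_candidate(problem) for _ in range(n)]
    scored = [(reward(problem, c), c) for c in candidates]
    return max(scored)  # (score, answer) pair with the highest score

if __name__ == "__main__":
    problem = (17, 25)  # "what is 17 + 25?"
    score, answer = best_of_n(problem)
    print(f"best answer: {answer}, reward: {score}")
```

In real training, the high-reward samples would be used to update the model's weights so its next attempts improve; here we just pick the best of a batch to show the shape of the loop.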
Yeah. And so now I want to walk everyone through
what makes O1, O3, and R1 really different.
Randy was just referring to them.
So let's start with the large language models.
So these are, you know, the GPT-4s,
the Llamas, that everyone now sort of is aware of.
And the way those work is they are trained on the entirety of the internet, or lots and lots of images.
And what they learn to do is to produce text or images in the style of something. So it can produce text in the style of
Shakespeare, produce text in the style of being empathetic, produce text in the style of a good chess move.
But it doesn't really know what's going on. It hasn't thought about it. It's just doing a very large-scale
pattern match and coming up with a knee-jerk reaction.
And that has a limit to how good it is.
Can I add a little bit?
Yeah, absolutely.
It's just that patterns show up everywhere.
I just want people to recognize how often patterns show up in our life, right?
When you look at language or vision or music or code or weather or medicine, there's patterns
in all of these, right?
Whether it's words or pixels or audio waveforms, or syntax
in code, or on a map, which cells are which color, right?
Or where there might be a cancer on an image.
All of these things come in patterns.
And so once we can learn those patterns and models can learn to extrapolate those patterns,
they can become good at all sorts of things that are important to us as humans.
That's great.
That is great.
And another way of saying this is that AI, you know, these are language models, and
they can treat absolutely everything as a language.
You know, obviously, language is just a sequence of words.
It's a language.
Code is just a sequence of special words.
It's a language.
DNA is a sequence of, you know, A, T, G, C, just another language.
Images are a sequence of colors, just another language.
So if you can learn the patterns of those different languages,
then AI can learn to speak and translate from the language of everything.
And the important thing about language models is that they're learning really to babble
in a convincing way in all of those languages.
And that's where you get all the hallucinations and confabulation
because it's just giving a statistically representative pattern
at a very large scale.
Okay.
So then along comes R1, O1, O3.
And what makes these different is it's almost like a planning head
that's placed on top of the intuition.
So let me give a really specific example of how this works,
where let's imagine you've trained a language model on chess moves.
So now it can come up with a good intuitive next chess move
given the board state.
And that can be as good as a very good chess player,
but not better than the very best or grandmasters.
Because it's just giving an intuitive hit.
It can't do better because it's only trained.
If it's only trained on what humans have done,
it can't do better than the humans, right?
So that's a really important concept.
And you're just about to jump into why now we can transcend that.
That's exactly right.
And it's a really important point because often people will push back
and they're like, but, hey, it can't get better than humans,
because it's only trained on human data,
so how could it possibly get better?
Well, when you or I play Garry Kasparov at chess,
we'll lose, or at least I will.
I don't know.
Oh, me too.
I play, but I'll lose, yeah.
Why?
And the answer is because, one, he has really good intuition,
because he's played lots of games.
And two, he's very good at thinking through all the different scenarios.
If I make this move, then they'll make this move,
so I'll make this move.
So I'll make this move, they'll make that move.
And I'll make that, aha, now I'm in a good,
position. So there's this sort of tree of thoughts that Gary Kasparov is exploring based on
his very good intuition. Now, you or I are going to do trees of thought, but our intuition is
not that good. So we're going to make lots of false steps. He's going to search all the most
important trees very quickly. And hence, he will dominate us. Well, that is the ability that
O1, O3, and R1, these reasoning models are starting to have, that they can use their intuitions from
their language model, and then create trees of thought, sort of very smart trial and error,
to search over what good moves are. And in that way, you can make a chess AI that is better
than every human being forever. Yeah, exactly. And another way of underlining this is to say that,
just like the patterns we talked about that exist in audio, video, images, all of these
things, reasoning also follows patterns, right? There are recipes of thought.
So you can kind of think of it as if you're cooking: there's a recipe.
You can modify certain parts of it and you can get to different types of dishes, right?
And this is the same thing.
Like when you're solving a problem, there are playbooks that we all use to solve problems.
And now we've taught it, right?
You just give it a few of the main recipe types.
And then it can play around from that baseline and try lots of new stuff.
A really important thing, as I said, is some of those new ideas are going to be things we've never seen before.
Some of those we'll understand, but there's also going to be variants that we don't even understand.
And that starts to have big implications for other problems, right?
Things like deception, safety, transparency, right?
Like, how do you understand what a model is doing when it's using reasoning that you can't even follow?
So this is all coming, right, as part of this big leap that we've just taken.
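A toy way to picture the "intuition plus search" idea from the chess discussion above: a cheap prior (standing in for the language model's intuition) ranks candidate moves, and a small look-ahead search checks the most promising ones a few steps deep before committing. Everything below, the game, the prior, the evaluator, is invented for illustration; it only shows the shape of the algorithm, not any lab's implementation.

```python
# Toy "intuition plus search": a prior narrows the moves, a shallow tree search looks ahead.
from typing import List, Tuple

State = Tuple[int, ...]  # toy game state: the sequence of moves played so far
Move = int

def legal_moves(state: State) -> List[Move]:
    return [1, 2, 3]  # placeholder move set

def apply_move(state: State, move: Move) -> State:
    return state + (move,)

def prior(state: State, move: Move) -> float:
    """'Intuition': a cheap guess at how promising a move is (stand-in for the LLM)."""
    return 1.0 / move  # pretend smaller moves look better at a glance

def evaluate(state: State) -> float:
    """How good the resulting position looks (stand-in for a learned evaluator)."""
    return sum(state) % 7  # arbitrary toy score

def search(state: State, depth: int, top_k: int = 2) -> float:
    """Look ahead `depth` plies, only expanding the top_k moves the prior likes."""
    if depth == 0:
        return evaluate(state)
    moves = sorted(legal_moves(state), key=lambda m: prior(state, m), reverse=True)[:top_k]
    return max(search(apply_move(state, m), depth - 1, top_k) for m in moves)

def best_move(state: State, depth: int = 3) -> Move:
    """Pick the move whose searched continuation scores best."""
    return max(legal_moves(state), key=lambda m: search(apply_move(state, m), depth - 1))

print(best_move((0,)))  # the move that leads to the best position a few steps out
```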
And as Randy said, it can feel a little meta, but it is so important: reasoning itself has a set of patterns, and if you learn them, you can get better at reasoning.
So I think we're going to stop seeing these big model jumps from GPT-3 to 3.5, 4 to 4.5 to 5.
There are a couple more still coming, but we're going to enter a new regime.
where there is now a way that, if you pour more compute in,
the AIs can get better.
You just shovel more money,
and they will continue to get better.
Let me explain how.
So let's go back to the chess example.
With the chess example,
maybe your language model has an Elo score of 1,500,
Elo meaning just a way of ranking chess players.
And you now add search on top of that,
reinforcement learning or planning.
So it's looking at all the various paths,
and it starts to discover better moves.
Maybe it's a little bit better.
So maybe it's like Elo 1505 or something, just a little bit better.
You then distill, that is, you retrain your original model, your intuition,
to now have the intuition of that 1505, the slightly better player.
And then you just search on top of that.
And now you can discover 1510 moves, and then you distill.
Now you can discover 1515 moves.
And you can see how you can consistently go from, you start with your base model, your intuition,
you think or reason over the top of it.
That lets you discover new better moves, which you then learn from and put it back into your intuition, and now you have a ratchet.
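As a rough sketch of that ratchet, here is the loop in miniature. Everything is a toy: the model's "intuition" is just a single Elo number, `search` pretends that thinking longer reliably finds slightly stronger play, and `distill` folds that gain back into the base model. Real systems do this with neural networks and reinforcement learning rather than one float, but the structure of the loop is the point.

```python
# The search-then-distill ratchet from the Elo example, in miniature.

def search(intuition: float, thinking_budget: float) -> float:
    """Searching on top of the current intuition discovers slightly better play."""
    return intuition + thinking_budget  # e.g. 1500 -> 1505

def distill(intuition: float, search_strength: float) -> float:
    """Retrain the base model so its snap judgments match the searched play."""
    return max(intuition, search_strength)

elo = 1500.0
for step in range(10):
    improved = search(elo, thinking_budget=5.0)  # think harder than your intuition
    elo = distill(elo, improved)                 # make that your new intuition
    print(f"iteration {step}: base-model Elo ~ {elo}")
```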
And it's important to note this is not just chess. This is math. This is any field that has theoretical in front of its name, because those are closed systems.
You can just run computation to check yourself.
So that's theoretical physics, theoretical biology, theoretical chemistry.
Anywhere where there's a clear right or wrong, where you can check.
So math, you can substitute, say like you're solving for X in some complex equation.
You can plug X back in and see if X was right.
So based on that, you can improve, right?
With code, you can generate code and you can plug it in and run.
You can compile it and run it and see if it actually works.
And so those domains are the ones that you can just improve and improve and improve,
which is why in chess or Go or StarCraft,
we've been able to accomplish not just human level or the best humans, but go far beyond
because you can just keep improving, you can keep testing, and you can just toss away the ideas
that don't work. It's really interesting, and it kind of says a lot for what the future holds.
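Here is a small illustration of why those "closed" domains are so amenable to this: the answer can be checked mechanically. Both checkers below are toy examples, not anyone's production reward function, but they show the two cases just mentioned: plugging x back into an equation, and actually running a piece of generated code against a test.

```python
# Toy verifiers for "closed" domains: the answer can be checked mechanically.

def check_equation_solution(x: float) -> bool:
    """Was x really the solution to 3x + 5 = 20? Substitute it back in and see."""
    return abs(3 * x + 5 - 20) < 1e-9

def check_generated_code(source: str) -> bool:
    """Run a candidate `add` function and test it on a known case."""
    namespace: dict = {}
    try:
        exec(source, namespace)             # compile and run the candidate code
        return namespace["add"](2, 3) == 5  # does it pass the test?
    except Exception:
        return False                        # crashing code scores zero

print(check_equation_solution(5.0))                          # True
print(check_generated_code("def add(a, b): return a + b"))   # True
print(check_generated_code("def add(a, b): return a - b"))   # False
```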
So it sort of begs the question of why now, right? Why now? And an important piece of that
is having base models that were smart enough to generate interesting
ideas to try out in the first place, and to be able to evaluate, like, hey, that's a good path.
Let's try that.
That's a bad path.
And so until recently, the base models just weren't good enough to do this.
So this idea of reinforcement learning, these feedback loops were not actually possible.
No, that's right.
And actually, I know of teams that a year ago tried pretty much the exact same thing that DeepSeek tried.
Right.
And it just didn't work because the base models, the intuition,
wasn't good enough. You have a bad intuition. You try to search over bad intuition. You just get
bad thoughts. That's right. And so one thing that's also really important is that
the same thing that makes these models really good at quantifiable areas makes them not as big a jump
in subjective areas, like say something like creative writing, which is much harder to quantify,
say, hey, is that really good or is that not as good? Now, again, if you define some very clear
parameters for creative writing and say, here's a scoring system, like, this is a good
piece, this is a bad piece, you can do the same method. But in other areas, you can't.
It's important to note that one of the open questions is: how much does an AI learning how to
code and do good thinking in the harder sciences transfer to the soft
sciences and these softer tasks? And there is evidence that you do get some kind of transfer, that
the better you get at the hard stuff, the better you get at thinking through the soft stuff.
There's a famous early example from two years ago where just training AIs on code
made them better writers and thinkers, because there's a kind of procedural formality
to code that it was then learning how to apply to the soft skills.
I do want to extend that.
So learning how to think, right?
Algorithmic thinking.
Learning how to think in a structured sequence translates to all sorts of areas.
So just to reinforce some of the points we've just made, before there was what was known as the data wall.
Once you train these large language models on the entirety of the internet, that was it.
It was going to be hard for them to get better.
That data wall, with these new techniques, is no longer relevant, because you can just do this self-bootstrapping.
Two, once the AI gets superhuman at any one of these tasks, humans have just lost in that thing forever.
And the thing you'll hear next is like, oh, but humans plus AIs can do better than that.
And that's true for a very short period of time.
That was true in chess.
That is no longer true in chess.
So this thing, you just pour more compute in, and it goes up.
And now we get to why was the market crash irrational?
The market crash was irrational because you can always use more compute.
And as soon as these agents get to the place where they can task themselves and be like,
what are ways that I could use more compute to, say, make more money?
And that's probably coming end of this year, early next, give or take,
then compute is an all-you-can-eat buffet.
Because with oil, if we discover more oil,
it's not like humans can immediately figure out
how to use all that oil.
But with compute and with AI,
as soon as we discover more compute,
the AI can figure out how to use that compute effectively.
And so, for Nvidia and all of the AI companies,
it's still going to be a race for who has the most compute.
And then the final thought here is that
this doesn't just work with games and math and physics.
This is going to work with strategy games of war.
This is going to work with the strategy of scientific discovery.
This is going to work with persuasion.
You train these models over the entirety of every video of two human beings interacting.
And now you start doing search over the top of that to be like,
what joke, what relationship, what facial expressions
does the model need to make to get the human being to laugh or to cry or to feel some state?
So superhuman persuasion is a natural result of all these things.
Lots of things can be scored and quantified if you're just creative about how you do it.
And once you can do that, you can reinforcement learn how to do it really well.
I wanted to add one thing to your third point, right?
Just to help people realize, the automation revolution is about the entire
$110 trillion global economy, right?
Nothing less. It's about the cognitive, currently through large language models, and the
physical through robotics. And that's why, right, you can spend so much more on all this stuff
as long as it's getting you returns. And I think it's worth mentioning, you know, there's this
question of like, is it all a big bubble? I think we have to be nuanced about it. Part of it is
more of a bubble, right? Like I think the translation to where generative AI helps with the
attention economy has a much more bubble-like quality because it's just not as clear where there's
something like genuinely helpful and advancing there. But in coding, for example, Cursor, right,
was recently the fastest company to $100 million of annual recurring revenue.
And that is because they are helping with coding, right? Cursor is an environment where you go in
and you write code. And it helps you do that really efficiently.
the value of that, the real value of that, is enormous, especially on this path to automating, like large-scale automation.
And I think that's really important to keep in mind.
One really important thing to talk about here when we think about market bubbles is the distinction between development and deployment.
That is, how fast does a technology diffuse into society? Almost always, people think
that development will take longer than it actually does, that is, development goes faster,
but then they expect deployment, diffusion, to go fast, and it takes longer. And that's where you get
these little bubbles. But general purpose technologies are a little bit different. Yeah, I mean,
because you can swap them out so much, so much more easily than in the past, right? So let's say,
you're changing your accounting system. There's so much work that has to be done, right? When you do that
process. But when you start to use general purpose technology that can do things for you,
when you get a newer one, it's normally just strictly better than the old one. And those of you
who've been using these technologies regularly have probably seen that every month, stuff that used
to be like not as reliable or slow is now faster and more reliable. And that is just a pattern
that we'll continue to see. The other thing is there's a lot of companies, like,
say Nvidia as an example, right, that are building what's called middleware, right?
So this is a layer that you connect to.
Like your company connects to the middleware layer, and the middleware talks to behind the scenes,
the large language models.
And so they can swap out the large language model even invisibly to you.
And the whole thing will just work better.
And you don't even have to change any lines of code.
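A rough sketch of that middleware pattern, in Python: application code talks to one stable interface, and the model behind it can be swapped without the caller changing a line. The class names and the swap method here are made up for illustration; this is not any particular vendor's API.

```python
# Toy middleware layer: swap the model behind a stable interface, callers unchanged.
from abc import ABC, abstractmethod

class ModelBackend(ABC):
    """Whatever large language model currently sits behind the middleware."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OldModel(ModelBackend):
    def complete(self, prompt: str) -> str:
        return f"[old model] answer to: {prompt}"

class NewerModel(ModelBackend):
    def complete(self, prompt: str) -> str:
        return f"[newer, better model] answer to: {prompt}"

class Middleware:
    """The stable layer a company integrates against."""
    def __init__(self, backend: ModelBackend):
        self._backend = backend

    def set_backend(self, backend: ModelBackend) -> None:
        """Swap the model behind the scenes; callers never notice."""
        self._backend = backend

    def ask(self, prompt: str) -> str:
        return self._backend.complete(prompt)

app = Middleware(OldModel())
print(app.ask("summarize this contract"))  # served by the old model
app.set_backend(NewerModel())              # upgraded invisibly to the caller
print(app.ask("summarize this contract"))  # same calling code, better model
```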
So this is happening not just with the cognitive stuff, but also in the robotics.
And that's one reason why I think the diffusion process this time around will be a lot faster than many people think when they compare, right?
They're using a model of like, well, what have we seen before?
Those patterns may not apply as well this time around.
If we went back two years, when we first did the AI Dilemma, the place that we focused was what we called second contact with AI.
So these are AIs that were smart, but were not trending off to being superhuman.
And there are huge numbers of issues there, and I don't have to recount them here.
But really seeing O1 and then the speed to O3, DeepSeek, meaning that OpenAI is following suit,
we really have to take seriously that we're going to be dealing with AI agents in the world
that are at or above human abilities across many domains.
And that's deeply unsettling. And it's not like, when I'm in these rooms with some of the most
powerful players, anyone actually knows what to do. I can't remember, three
weeks ago, four weeks ago, I was at a conference and I was giving the closing keynote, and
Eric Schmidt spoke just before me. And he said a lot of things, but one that he talked about was
that all of the AI labs are currently working on making their AIs code.
And he sort of couched it as, well, they're making them code because that's what coders do.
They know coding the best and they're physicists, so they're going to work on making it code.
And a little bit later, he said the thing that scared him most,
the moment that we would need to pull the plug for AI security reasons,
would be the moment that AI gained the ability to substantially increase the rate at which AI progress is made.
And the thing I think he didn't say is, but the incentives are that every one of the labs will get a disproportionate advantage if instead of using real human beings to code, they can just spin up more digital programmers to make their AI go faster.
I'm curious, Randy, if you have any thoughts to add here, where the full weight of the competitive landscape is now
being pushed towards the thing that, you know, Eric Schmidt thinks is the most dangerous thing.
Yeah, the whole thing snowballs, right? You just end up with an advantage that accrues. By the way,
for those of you who don't know, Eric Schmidt is the former CEO of Google. And so, to answer
your question, I think it's this compounding cycle that we get into, right? Especially
when you're good at coding, you end up being able to unlock so many other things because,
Coding is like the doorway to the world, right?
And this is why companies are so interested in being good at coding.
From there, you can get to agents.
From there, you can get to tool use.
All of this gets unlocked.
And then it gets faster and faster.
You can chain the models together.
They can work together.
They can share information.
They can share what they're learning about the world with each other.
And they can work coherently, like with the same mission, the same purpose.
And you don't have the sort of translation loss that you have when you have humans trying to work together,
where you have to work so much harder to get everything to work.
That's right.
And like the big thing that's happening now with the reasoning models is, you know, with language models,
they can give you like knee-jerk reactions.
And of course, they've learned across the entirety of the web.
So those knee-jerk reactions can often be good, but they cannot plan and do long-term things.
And that's what these new models, DeepSeek R1, 01, and 03 are starting to be able to do.
Eric Schmidt acknowledges and says openly
that the place we would need to pull a plug,
not that I know where the plug to pull would be,
is when AIs can do this kind of self-improvement.
And the labs, when you talk to people inside of them,
the AI is already making their work go much faster,
and the expectation is that sort of by the end of this year
is when AIs will be making substantial improvements
to the rate at which their own AI coding is going.
And, you know, I'm just going to
say that a lot of my attention and time, as well as, I think, CHT's, is in doing the
sensemaking to figure out what are the very best possible things we can do. And so I actually
want to recruit everyone that's listening to this podcast to start thinking about this particular
problem, because it's not easy, because everyone, of course, wants the strategic advantage
of being able to have superhuman ability in coding, cyber hacking, science progression, creating new physics
and materials. It's sort of the biggest, thorniest problem.
And the principle related to that is as the general purpose technologies advance, right,
as a technology becomes more general purpose, it becomes harder and harder to separate
the promise from the peril. And these reasoning models are a big jump in that. So it means
it's a tighter coupling. It's a much tighter coupling. And these are the challenges.
Models are going to become better at things like deception. And a lot of that, I just want to emphasize, right, is because they're just trying to achieve the goals they've been given, within the rules they've been given. And it turns out, unless we're really, really careful about how we define those rules, there's always risks we haven't thought about. There are new ideas, there are creative solutions, and some of those
might be things we like,
and some of them are things
that we might find dangerous
or that we want to avoid.
And models will just find this all the time.
So this is the new challenge
when you have these reasoning models.
They're able to find more and more creative solutions
that we might not have thought of.
And to give the concrete example
that most people in AI will give
is what's known as Move 37.
And that is in the famous case
where Google Brain,
I think it was DeepMind at that point,
was working on a
Go AI that was playing against the world leader in Go. And I think it was in game 3 or 4,
the AI made a move, Move 37, that no human being in, you know, thousands of years of playing Go had ever
made. I think it was Lee Sedol, the Go master, who stood up and walked away from the Go board
because it was such an affront. And it turned out to be a brand new strategy. You know, the AI won that
game, and it ended up becoming a new strategy that human beings have studied and have started to incorporate
into their game. The point being that AIs can discover brand new strategies for even things
that human beings have been studying and actively competing in for thousands of years. And so then
you end up with this idea of we're going to discover lots of new Move 37s. And that can be good.
We can discover new Move 37s for treaty negotiation, for figuring out how to do, like, global compacts.
But AI can also discover Move 37s for deception and lying, which we have never seen before.
I think I have often rolled my eyes a little bit when people describe AI as a new species.
It just felt like too much of a stretch.
But I've had to change my mind in the last couple of months because what is a species?
A species is a population that can reproduce,
that can evolve, adapt.
And that is indeed exactly where AI is now.
There was a test, sort of a simple test,
to see: could you give a simple AI
the command, can you copy yourself?
You literally just say, can you copy yourself to another server
and run yourself over there?
And it was able to do that so it can reproduce.
This was like a simple test.
It wasn't an adversarial one.
But nonetheless, it can now reproduce.
It can change its own code, so it can modify itself. And it can think, and it can adapt, and it can improve. So we are
going to have to deal with that. I think the right way of thinking about this is we are
unleashing a new invasive species, some of which will be helping us and some of which will
escape out into the world. We are sort of at the beginning of the home stretch. And I would add,
I think that one of the biggest issues, maybe the main issue, is
that we are just racing ahead
without being clear
about where we are
racing to.
Because if you stop
for a moment, just stop for a moment
and maybe close your eyes
and really picture,
picture that better world.
What does it look like?
Is that a world where
everyone's excited about creating
a picture of a kitten skateboarding
on water at midnight?
I mean, just to be clear, I am pro-kitten.
But, like, what we want is a world where, like, our information systems are working to build
our shared understanding, where people aren't harassed by deepfakes of them, where you can
get old and not be exploited, right?
Not be exploited as you age.
Where people have access to, like, food, clothing, shelter, medicine, education, all of these things, right?
these things, right?
Where we avoid catastrophic inequality, where democracy is functioning well.
And all of these things are related, but that's the kind of north star we have to have.
And I think all of us, wherever we get a chance to input into a conversation,
I'd like to request that we inject that. That's just so reorienting versus the idea of just racing ahead.
Another way of saying it is it's injecting purpose into the word innovation, right?
Like, innovation has to be for the benefit of our communities, for the benefit of people.
It's not just about speed.
Like, there's a benefit axis that's really important that we just can't lose sight of.
That's really beautiful, Randy.
With AI, as with technology generally, it really could be the case
that we lived in a much more beautiful world, but because technology keeps getting captured by
perverse incentives, we don't live in the most beautiful possible world. We end up living
in the most parasitic possible world, getting the benefits at the same time as our souls
are leached. So Randy, thanks so much for joining me for this special episode. I hope everyone
really, well, "enjoyed" is maybe the wrong word, but we hope that it helped to clarify these most
consequential technologies. And we'll see you next time.
Yeah, thank you.
Your undivided attention is produced by the Center for Humane Technology,
a non-profit working to catalyze a humane future.
Our senior producer is Julia Scott, Josh Lash is our researcher and producer,
and our executive producer is Sasha Fegan, mixing on this episode by Jeff Sudaken,
original music by Ryan and Hayes Holiday,
and a special thanks to the whole Center for Humane Technology team for making this podcast possible.
You can find show notes, transcripts, and so much more at HumaneTech.com.
And if you liked the podcast, we would be grateful if you could rate it on Apple Podcasts.
It helps others find the show.
And if you made it all the way here, thank you for your undivided attention.