Better Offline - What The Hell Is DeepSeek?

Starting point is 00:00:00 This is an iHeart podcast. Guaranteed human. Run a business and not thinking about podcasting? Think again. More Americans listen to podcasts than ad-supported streaming music from Spotify and Pandora. And as the number one podcaster, iHeart's twice as large as the next two combined.

Starting point is 00:00:15 Learn how podcasting can help your business. Call 844-844-iHeart. Another podcast from some SNL late-night comedy guy? Not quite. On Humor Me with Robert Smigel and Friends, me and hilarious guests from Bob Odenkirk to David Letterman help make you funnier. This week, my guests, SNL's Mikey Day and head writer Streeter Seidell,

Starting point is 00:00:35 help an a cappella band with their between-songs banter. Where does your group perform? We do some retirement homes. Those people are starving for banter. Listen to Humor Me with Robert Smigel and Friends on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. A win is a win. A win is a win.

Starting point is 00:00:52 I don't care which I'm saying. Yep, that's me, Clifford Taylor IV. You might have seen the skits, my basketball and college football journey, or my career in sports media. Well, now I'm bringing all of that excitement to my brand new podcast, The Clifford Show. This is a place for raw, unfiltered conversations with athletes, creators, and voices that not only deserve to be heard, but celebrated.

Starting point is 00:01:14 So let's get to it. Listen to The Clifford Show on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. And for more behind the scenes, follow at Clifford and at TikTok Podcast Network on TikTok. I'm Michelle McPhee, and I've been unraveling the strangest alliance I've ever reported on: a Mormon polygamist and an Armenian businessman. Multimillion-dollar house, Ferraris and Lamborghinis, private jets, a billion-dollar fraud. But how long can this alliance last? Tell me what you know. Is somebody coming after me?

Starting point is 00:01:48 Listen to Kingdom of Fraud on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. Cool Zone Media. Hello and welcome to Better Offline. I'm your host, Ed Zitron. A lot of you have been getting in touch. Yes, you're getting your DeepSeek episode. In fact, this is the first of a two-parter. This will come out on Friday,

Starting point is 00:02:23 which is when you're listening to this, and then it'll follow up on Monday. I apologize. I spent a lot of Monday writing this and also learning about a lot of this stuff in an attempt to distill it as best I could. This situation is extremely weird, and it's developing. And I think even when I put out this episode, there will be new parts of it that I have yet to really get to.

Starting point is 00:02:45 I will do my absolute best to explain in these episodes what is happening with DeepSeek, what it means, what they've built, and what it's going to do in the future. But let's begin. So as January came to a close, the entire generative AI industry found itself in a kind of chaos. In short, the recent AI bubble, and in particular the hundreds of billions of dollars being spent on it, hinged on this big idea: that we need bigger models, which are both trained and run on bigger and even larger GPUs, almost entirely sold by Nvidia. And in turn, they're based in bigger and bigger data centers owned by companies like Microsoft, Oracle, Amazon, and Google.

Starting point is 00:03:24 Now, there was also this expectation that this would always be the case. Hubris within this industry is kind of part of the whole deal. And generative AI was always meant to be this way, at least for the American developers. It was always meant to be energy- and compute-hungry. Throwing in entire zoos' worth of animals and boiling lakes was necessary to do this. There was never any other way to do it. And I thought, at least I've thought for a while, that this was because they tried to make the models more efficient, but they couldn't. There was just something about transformer-based architecture, like the stuff that underpins ChatGPT, so the GPT model under ChatGPT.

Starting point is 00:04:02 It wasn't the case, though. A Chinese artificial intelligence company that few people had really heard of, called DeepSeek, came along a few weeks ago with multiple models that aren't merely competitive with OpenAI's, but actually undercut them in several meaningful ways. DeepSeek's models are both open source, which means that their source code and research are public, and significantly more efficient as well: as much as 30 times cheaper to run in the case of its reasoning model R1, which is competitive with OpenAI's o1, and 50 or more times more efficient than GPT-4o.

Starting point is 00:04:37 It's actually kind of crazy when you think about it. And as you're going to hear, this whole thing has Jokerfied me all over again. And what's crazy is that some of them can be distilled, which I'll get to later, and run on local devices like a laptop. As a result, the markets have kind of panicked, because the entire narrative of the AI bubble has been that these models have to be expensive because they are the future. And that's why hyperscalers had to burn $200 billion in capital expenditures on infrastructure to support this wonderful boom, and specifically the ideas of OpenAI and Anthropic. The idea that there was another way to do this, that in fact we didn't need to spend all

Starting point is 00:05:16 this money and that maybe we could find a more efficient way of doing it, well, that would require them to have an idea other than throwing as much money at the problem as possible. Yeah, they just didn't consider it, it turns out. And now along has come this outsider that's upended the whole conventional understanding and perhaps even dethroned a member of America's tech royalty: Sam Altman, a man who has crafted, if not a cult of personality, some sort of public image of an unassailable visionary who will lead the vanguard in the biggest technological change since the internet.

Starting point is 00:05:48 Yeah, he's wrong. He never was doing that. I've been saying it for a while. But DeepSeek isn't just an outsider. No, they're a company that's emerged as a side project from a tiny, tiny Chinese hedge fund, tiny at least by the standards of hedge funds, with something like $5.5 billion in assets under management, and their founding team has nowhere near the level of fame and celebrity, or even the accolades, of Sam Altman.

Starting point is 00:06:14 It's distinctly humiliating for everyone involved that isn't DeepSeek. And on top of all of that, DeepSeek's biggest, ugliest insult is that its model, DeepSeek R1, is competitive, like I said, with OpenAI's incredibly expensive o1 reasoning model, yet significantly, and I mean 96%, cheaper to run. And it can even be run locally, like I said. Speaking to a few developers I know, one was able to run DeepSeek's R1 model

Starting point is 00:06:39 on their 2021 MacBook Pro with an M1 chip. That is a four-year-old computer. Not a $30,000 GPU in sight. It's kind of crazy. Worse still, DeepSeek's models are made freely available to use, with the source code published under the MIT license, along with the research on how they were made, although not the training data,

Starting point is 00:06:59 which makes some people say it's not really open source, but for the sake of argument, I'm just going to say open source. And this means, by the way, that DeepSeek's models can be adapted and used commercially without the need for royalties or fees. Anyone can take this and build their own. By contrast, OpenAI is anything but open, and its last LLM to be released under the MIT license was 2019's GPT-2.

Starting point is 00:07:24 No, no, wait, shit. Let me correct that. DeepSeek's biggest, ugliest secret is actually that it's obviously taking aim at every element of OpenAI's portfolio. As the company was already dominating headlines this week, it quietly dropped its Janus Pro 7B image generation and analysis model, which the company says outperforms both Stable Diffusion and OpenAI's DALL-E 3. And those are, by the way, image generation things. So you type in something like Garfield with boobs and then out comes a Garfield with juicy cans. And that's probably the first time you've heard that on the podcast, but probably not the last. And as with its other code, DeepSeek has made this freely

Starting point is 00:08:02 available to both commercial and personal users alike, whereas OpenAI has largely paywalled DALL-E 3. It's a truly crazy situation, and it's also this cynical, vulgar version of David and Goliath, where a tech startup backed by a shadowy Chinese hedge fund with $8 billion under management is somehow the plucky upstart against the lumbering, lossy, oafish $150 billion startup backed by multiple public tech companies with a market capitalization of over $3 trillion. I realize, by the way, I said earlier $5.5 billion under management. This is why you check your notes in advance, but I'm not cutting it. This is fresh. I am inside a closet in New York. The content must flow. Anyway, DeepSeek's V3 model, which is comparable and competitive with both

Starting point is 00:08:48 OpenAI's GPT-4o and Anthropic's Claude Sonnet 3.5 models (the latter, by the way, has some reasoning features), is, like I said, 53 times cheaper to run when using the company's own cloud services. And as mentioned earlier, said model is effectively free for anyone to use, locally or on their own cloud instances, and can be taken by any commercial enterprise and turned into a product of their own, should they desire, to, say, compete with OpenAI, the loudest and most annoying startup of all time. In essence, DeepSeek (and I'll get into its background and the concerns people might have about its Chinese origins) released two models that perform competitively with, and even beat, models from both OpenAI and Anthropic,

Starting point is 00:09:28 undercut them in price, and then made them open, undermining not just the economics of the biggest generative AI companies, but laying bare exactly how they work. The magic's gone. There's no more voodoo inside Sam Altman's soul. It's all out there. And that last point is extremely important when it comes to OpenAI's reasoning model, which specifically hid its chain of thought for fear of these "unsafe thoughts" that might manipulate the customer. And then they added, slightly under their breath, that the actual reason they did it was competitive advantage. Now, to explain what that means: when you make a request with OpenAI's o1 model, say, "give me all the states with the letter R in them," it actually shows you, like, the thinking.

Starting point is 00:10:10 And by the way, these things don't fucking think. They're computer bullshit. Like, they don't think at all. But I'm going to use it just for this. So you see it say, okay, here are all the American states. Which ones have that letter? I'm checking all of those. It's effectively having a large language model check a large language model.
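To make the shape of that visible step-by-step trace concrete, here is a minimal Python sketch. It is ordinary deterministic code, not a language model, and it says nothing about how o1 or R1 work internally. The function name `states_with_letter` and the six-state sample list are made up for illustration.

```python
# Hypothetical illustration: a deterministic "show your steps" trace.
# This is NOT a language model; it only mimics the visible list of steps.

# A hand-picked sample, not the full list of 50 states.
SAMPLE_STATES = ["Arizona", "Ohio", "Oregon", "Texas", "Utah", "Vermont"]


def states_with_letter(states, letter):
    """Return (matches, steps): the matching states plus a readable trace."""
    steps = [f"Step 1: list the candidates ({len(states)} states)."]
    matches = []
    for state in states:
        # Case-insensitive check, so capitalisation does not matter.
        found = letter.lower() in state.lower()
        verdict = "contains" if found else "does not contain"
        steps.append(f"Step 2: {state} {verdict} the letter {letter!r}.")
        if found:
            matches.append(state)
    steps.append(f"Step 3: answer with {len(matches)} matches.")
    return matches, steps


matches, steps = states_with_letter(SAMPLE_STATES, "r")
print("\n".join(steps))  # the full trace
print(matches)           # the final answer only
```

Printing only `matches` corresponds to seeing just the answer; printing `steps` corresponds to seeing the whole trace.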

Starting point is 00:10:26 Now, the thing is, the steps they were showing you were all cleaned up. They would look nice, they would be formatted nicely. DeepSeek's chain of thought is completely laid bare, which is very interesting because it really takes the wind out of OpenAI's sails. And on top of that, it allows you to see how these things actually think through things. Again, not really thinking. But still, you can see things about how large language models work that these companies didn't want you to have.

Starting point is 00:10:54 On top of this, OpenAI's o1 model has something even shittier to it, which is that these chain-of-thought steps all cost money. When you see it generate these thoughts, it's actually generating more thoughts than you see, because they're hiding the chain of thought. So OpenAI is just charging you an indeterminate amount of money, an insane amount of money, as I'll get to later, and you don't know what you're being charged for. You don't even know what's really going on under the hood. Or you could use DeepSeek. And let's be completely clear, by the way: OpenAI's literal only competitive advantage against Meta and Anthropic was its reasoning models, o1 and o3, and o3, by the way, is currently in a research preview and is mostly just more

Starting point is 00:11:35 of the same. Although I mentioned earlier in the show that Anthropic's Claude Sonnet 3.5 has some reasoning features, they're comparatively more rudimentary than those in o1 and o3, and, I'd argue, R1, which is DeepSeek's model. In an AI context, reasoning works by breaking down a prompt into a series of different steps, with considerations of different approaches. Like I said earlier, it's effectively a large language model checking its own homework, with no thinking involved, because, like I said, they do not think or know things. And OpenAI rushed to launch its o1 reasoning model last year because, and I quote Fortune from last October, Sam Altman was eager to prove to potential investors in the company's latest funding round that OpenAI remains at the forefront of AI development. And, as I noted in my

Starting point is 00:12:20 newsletter at the time, it was not particularly reliable, failing to accurately count the number of times the letter R appears in the word "strawberry," which was the codename for o1. Very funny stuff. At this point, it's fairly obvious that OpenAI wasn't anywhere near the forefront of AI development, and now that its competitive advantage is effectively gone, there are genuine doubts about what comes next for the company. As I'll go into, there are many questionable parts of DeepSeek's story: its funding, what GPUs it has, and how much it actually spent training these models. But what we definitively understand to be true is bad news for OpenAI.

Starting point is 00:12:56 And, I would argue, for every other large U.S. tech firm that's jumped onto the generative AI bandwagon in the past few years.


Starting point is 00:15:55 DeepSeek's models actually exist. They work, at least by the standards of hallucination-prone LLMs that don't, at the risk of repeating myself, know anything. They've been independently verified to be competitive in performance, and they're magnitudes cheaper in price than those from both hyperscalers (Google's Gemini, Meta's Llama, Amazon Q, and so on and so forth) and from those released by OpenAI and Anthropic. DeepSeek's models don't require massive new data centers.

Starting point is 00:16:40 They run on GPUs currently used to run services like ChatGPT, and even work on more austere hardware. Nor do they require an endless supply of bigger, faster Nvidia GPUs every single year to progress. The entire AI bubble was inflated based on the premise that these models were simply impossible to build without burning massive amounts of cash, straining the power grid, and blowing past emissions goals, and that these costs were both necessary and really good because they'd lead to creating powerful AI, something that's yet to happen. And it's kind of obvious at this point that this wasn't true. Now the markets are sitting around, and they're asking a very reasonable question. Shit, did we just waste $200 billion?

Starting point is 00:17:23 Anyway, let's get into the nitty-gritty. What is DeepSeek? First of all, if you want a super deep dive into what it is, I can't recommend VentureBeat's write-up enough. I'll link to it in the show notes, as I usually do. It's really good, and it goes into a lot more detail than I will. But here's the too-long-didn't-read for you. DeepSeek is a spinoff from a Chinese hedge fund called High-Flyer Quant.

Starting point is 00:17:44 It's a relatively small and young company, and from its inception, it went big on algorithmic and AI-driven trading. Later, it started building its own standalone chatbots, including a ChatGPT equivalent for the Chinese market. This is what we know right now. I'm sure some of you will say, oh, well, who knows if that's really true?

Starting point is 00:18:01 Sure, I think that's fair. I also think that there are parts of Sam Altman's legend that we should question as well. I think the circumstances under which Sam Altman got made head of Y Combinator are extremely questionable. I'm saying you can question DeepSeek, and indeed you should. We should be more critical of these powerful companies.

Starting point is 00:18:18 But don't do it halfway. If we're going to be worried, let's be worried about everyone. Now, DeepSeek did a few things differently, like open-sourcing its models, although it likely built upon tech from other companies, like Meta's Llama and the ML library PyTorch. To train its models, it secured over 10,000 Nvidia GPUs right before the US imposed export restrictions, which sounds like a lot, but it's a fraction of what the big AI labs like Google, OpenAI, and Anthropic have to play with. I think I've heard estimates of like 100,000

Starting point is 00:18:48 to 300,000 each, if not more. Now, you've likely seen or heard that DeepSeek trained its latest model for $5.6 million, as opposed to the insane amounts that I'll get to later. And I want to be clear that any and all mentions of this number are estimates. In fact, the provenance of the $5.58 million number appears to be a citation of a post made by an Nvidia engineer in an article from the South China Morning Post, which links to another article from the South China Morning Post, which simply states that DeepSeek V3 comes with 671 billion parameters

Starting point is 00:19:21 and was trained in around two months at a cost of $5.58 million, with no additional citations of any kind. So you should take it with a pinch of salt, but it's not totally ludicrous. While there are some that have estimated the cost (DeepSeek's V3 model was allegedly trained using 2,048 Nvidia H800 GPUs, according to its paper), Ben Thompson of Stratechery has made clear

Starting point is 00:19:45 that the $5.5 million number only covers the literal training cost of the official training run. And this is made fairly clear in the paper, by the way, for V3, which is the one that's competitive with OpenAI's GPT-4o model. That means any costs related to prior research or experiments on how to build the model were left out. Now, big shout-out to Minimax here, the guy on Bluesky and Twitter. He's great. He is wonderful, and he also added that this is fairly standard for the industry. Again, you choose how you feel about this, but I want to give you the information. And while it's safe to say that DeepSeek's models are cheaper to train, the actual costs,

Starting point is 00:20:20 especially as DeepSeek doesn't share its training data (which, as I said, some might argue means its models are not really open source), get a little harder to guess at. Thompson notes that DeepSeek had to craft a bunch of elegant workarounds to make the model perform, including writing code that ultimately changed how GPUs actually communicated with each other. This functionality isn't otherwise possible using Nvidia's developer tools. They really had to get in there. It's kind of cool. DeepSeek's models, V3 and R1, are more efficient and as a result cheaper to run, and can be accessed via its API at prices that are astronomically cheaper than OpenAI's. DeepSeek Chat, running DeepSeek's GPT-4o-competitive V3 model, costs $0.07 per 1 million input tokens (as in commands given to the model) and $1.10 per 1 million output tokens (as in the resulting output from the model).

Starting point is 00:21:11 I know that these numbers kind of just sound like numbers, like maybe you don't have context. So let me give you some. This is a dramatic price drop from the $2.50 per 1 million input tokens and $10 per 1 million output tokens that OpenAI charges for GPT-4o. This isn't just undercutting. This is a bunker buster. Now, there is an aside that I'll get into a little bit later: if you are using models hosted in a country that you don't know, probably China, there are data concerns.
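For a sense of scale, the per-token prices quoted above can be turned into a per-request cost. The Python sketch below is only arithmetic on the figures as stated in the episode. The model keys, the function names, and the one-million-in, one-million-out workload are assumptions chosen for illustration, not anything taken from either company's API.

```python
# USD per 1 million tokens, as quoted in the episode.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "deepseek-v3": {"input": 0.07, "output": 1.10},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def price_ratio(expensive, cheap, kind):
    """How many times more the expensive model charges for `kind` tokens."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


# Hypothetical workload: 1M tokens in, 1M tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 1_000_000, 1_000_000):.2f}")

for kind in ("input", "output"):
    print(f"{kind}: {price_ratio('gpt-4o', 'deepseek-v3', kind):.1f}x")
```

On these quoted prices the gap is much wider on input tokens (roughly 36 times) than on output tokens (roughly 9 times), so the overall saving depends on how a given workload splits between the two.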

Starting point is 00:21:45 But again, you can put this on your own server. You could put this in Google Cloud. Both Microsoft and Google are apparently thinking about it. Now, The Information reported that Google had added it to Google Cloud. No, they did not. They didn't do that. They allowed you to connect Hugging Face. This is a whole bunch of technical stuff that, if you understand it, you're like, yeah, right, I know.

Starting point is 00:22:03 Long story short, the hyperscalers are already bringing DeepSeek out. And I'll get to why that's bad later in detail, but it's also very funny. Now, here's something else that's funny. DeepSeek Reasoner, its reasoning model, costs 55 cents per 1 million input tokens and $2.19 per 1 million output tokens. Now, that sounds expensive, maybe it is, whatever. That's goddamn nothing compared to the $15 per 1 million input tokens

Starting point is 00:22:32 and $60 per 1 million output tokens that OpenAI charges for o1. Woof. If I'm Sam Altman, I'm shitting myself. But there's an obvious "but" here. We do not know where DeepSeek is hosting its models, who has access to that data, or where that data is coming from or going to. We don't know who funds DeepSeek, other than that it's connected to High-Flyer, the hedge fund I mentioned earlier that it split from in 2023. There are concerns that DeepSeek could be state-funded and that DeepSeek's

Starting point is 00:22:58 low prices are a kind of geopolitical weapon, breaking the back of the generative AI industry in America. I'm not really sure whether that's the case or not. It's certainly true that China has long treated AI as a strategic part of its national industrial policy and is reported to help companies in sectors where it wants to catch up with the Western world. The Made in China 2025 initiative saw a reported hundreds of billions of dollars provided to Chinese firms working in industries like chipmaking, aviation, and, yeah, AI. The extent of that support isn't exactly transparent, surprise, and so it's not entirely out of the realm of possibility that DeepSeek is also the recipient of state aid. The good news is that we're going to find out fairly quickly. American

Starting point is 00:23:38 AI infrastructure company Groq is already bringing DeepSeek's models online, meaning that we'll get at least some sort of confirmation of whether these prices are realistic or whether they're heavily subsidized by whoever it is that backs DeepSeek. It's also true that DeepSeek is owned in part by a hedge fund, which likely isn't short of cash to pump into it. But as an aside, given that OpenAI is the beneficiary of billions of dollars of cloud compute credits and gets reduced pricing on Microsoft's Azure cloud services to run its actual models, it's a bit tough for them to complain about a rival being subsidized by a larger entity with the ability to absorb the costs of doing business, should that be the case. Same goes for Anthropic, by the way. And yes, I know Microsoft isn't a state, but with a market cap of $3.2 trillion and quarterly revenues larger than the combined GDPs of some EU and NATO nations, it's kind of the next best thing. But I digress. Whatever concerns there may be

Starting point is 00:24:32 about malign Chinese influence are bordering on irrelevant, outside of the low prices, of course, offered by DeepSeek itself. And even that is speculative at this point. Once these models are hosted elsewhere, and once DeepSeek's methods, which I'll get to in a little bit, are recreated (and by the way, that's not really going to take very long), I believe we're going to see that these prices are indicative of how cheap these models are to run.
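As a sanity check on the reasoning-model prices quoted earlier (55 cents and $2.19 per million tokens for DeepSeek Reasoner, against $15 and $60 for OpenAI's reasoning model), here is a small Python sketch of the percentage saving. It uses only the figures stated in the episode; the variable and function names are illustrative assumptions.

```python
# USD per 1 million tokens, as quoted in the episode.
R1 = {"input": 0.55, "output": 2.19}    # DeepSeek Reasoner
O1 = {"input": 15.00, "output": 60.00}  # OpenAI's reasoning model


def percent_cheaper(cheap, expensive):
    """Saving from paying `cheap` instead of `expensive`, in percent."""
    return (1 - cheap / expensive) * 100


def times_cheaper(cheap, expensive):
    """How many times larger the expensive price is."""
    return expensive / cheap


for kind in ("input", "output"):
    saving = percent_cheaper(R1[kind], O1[kind])
    factor = times_cheaper(R1[kind], O1[kind])
    print(f"{kind}: {saving:.1f}% cheaper, about {factor:.1f}x")
```

Both token types come out at a little over 96 percent cheaper, or roughly 27 times, which lines up with the "96% cheaper" figure mentioned near the start of the episode.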


Starting point is 00:27:53 So you might be wondering, how the hell is this so much cheaper? And that's a bloody good question. And because I'm me, I have a hypothesis. I do not believe that the companies making these foundation models, such as OpenAI and Anthropic, have actually been incentivized to do more with less. Because their chummy little relationships with hyperscalers like Amazon, Google, and Microsoft were focused almost entirely on making the biggest, most hugest models possible, using the biggest, even huger chips, and because the absence of profitability didn't stop them from raising

Starting point is 00:28:38 more money, well, they've never had to be fucking efficient, have they? They've never had to try. Maybe they should buy less avocado fucking toast anyway. Let me put it in simpler terms. Imagine living on $1,500 a month, and then imagine how you'd live on $150,000 a month, and that you have to, like Brewster's millions, spend as much of it as you can to complete a mission, a very simple mission. Live. In the former example, you concern survival. You have a limited amount of money and must make it go as far as possible with real sacrifices to be made with every dollar you spend. If you want to have fun, you're going to have to eat less potentially, all the food you eat will have to be cheaper. You have to live on a budget. You have to live on a budget.

Starting point is 00:29:19 have to make decisions, and indeed you might learn to cook at home. You might walk more. You might do things that will help you not spend all your money. In the latter example, when you have $150,000 a month that you must spend, you're incentivized to splurge, to lean into excess, to pursue this vague idea of living your life. Your actions are dictated not by any existential threats or indeed any kind of future planning, but by whatever you perceive to be an opportunity to live. OpenAI and Anthropic are emblematic of what happens when survival takes a back seat to living. They have been incentivized by frothy venture capital and public markets desperate for the next big thing, the next big growth, to build bigger models and sell even bigger dreams, like Dario

Starting point is 00:30:03 Amodei of Anthropic saying that AI, and I quote, could surpass almost all human beings at almost everything shortly after 2027. And I just want to take a fucking second. Journalists, if you're listening to this, stop fucking quoting this bullshit. Stop it. You're doing nothing. You are failing at your goddamn job every single time you quote this bullshit, this nonsense. Shortly after 2027, what the fuck does that mean?

Starting point is 00:30:28 2028, 2029, 2030? What does surpassing humans at almost everything even mean? This shit doesn't work. This shit is not good. Oh my God. Anyway, back to the podcast. Ed, calm down. Both OpenAI and Anthropic have effectively lived their existence with the infinite money cheat from The Sims. And I know some of you might say, by the way, it's not an infinite money

Starting point is 00:30:50 cheat, Ed, you go into the console. You get my point. And both companies have been bleeding billions of dollars a year after revenue. And that's, by the way, making billions of dollars and then still losing billions, which is insane. And they still operated as if money would never run out. Because it wouldn't. If they were actually worried about that happening, they would have certainly tried to do what DeepSeek has done, except they didn't have to, because both of them had endless cash and access to GPUs from either Microsoft, Amazon or Google. And the Stargate thing is just, I will mention it later, just long story short, they're not going to put $500 billion into the, it was up to $500 billion.

Starting point is 00:31:27 I'm so tired of this shit. OpenAI and Anthropic have never been made to sweat, unlike me in this closet where I'm recording this. And they've received endless amounts of free marketing from a tech and business media happy to print whatever vapid bullshit they spout. And it's just very frustrating. They've raised money at will. Anthropic, by the way, is currently raising another $2 billion, valuing the company at $60 billion.

Starting point is 00:31:50 And this was, I think, happening while DeepSeek was going on, which is really funny. And they've done all of this off of a narrative of, we need more money than any company has ever needed, ever, because the things we're doing have to cost this much. There is no other way. You must give us more money. My name is Sam Altman. I need more money than has ever been made for my huge, beautiful company that sucks and needs money to train it. Help me, please. My big, beautiful, sick company is dying,

Starting point is 00:32:19 but the best and most important company of all time. It's also normal. Now, do I think that they were aware that there were methods to make their models more efficient? Sure. OpenAI tried and failed in 2023 to deliver a more efficient model to Microsoft called... Arrakis. I'm sure there are teams at both Anthropic and OpenAI that are specifically dedicated to making things kind of more efficient, but they didn't have to do it, and so they didn't. And as I've written before in my newsletter and argued on this very podcast, OpenAI simply burns money and has been allowed to burn money, and up until recently likely would have been allowed to burn even more money, because everybody, all of the American model developers,

Starting point is 00:33:01 appeared to agree that the only way to develop large language models was to make them as big as humanly possible, and work out troublesome stuff like making them profitable or turning them into a useful thing later, which is, I presume, when AGI happens, a thing that they're still in the process of defining, let alone doing. DeepSeek, on the other hand, had to work out a way to make its own large language models within the constraints of the hamstrung Nvidia chips that can be legally sold to China. While there's a whole cottage industry of selling chips in China, using resellers and other parties to get restricted silicon into the country, the entire way in which DeepSeek went about

Starting point is 00:33:37 developing its models suggests that it was working around very specific memory bandwidth constraints, meaning the amount of data that can be fed into and out of the chips. In essence, doing more with less wasn't something it chose, but something it had to do. I've touched already on the technical how of these models in greater depth, and you can really read into that in my newsletter. And you can go to wheresyoured.at, it's at the end of the episode, but I'll also have show notes with articles like Ben Thompson's from Stratechery, because there are lots of things to read here. I know there are some really technical listeners, and I'm sure you're going to flay me in my emails.

Starting point is 00:34:11 Please go and read it. I'm not wrong. I've checked with a lot of people too. And by the way, all of this austerity stuff seems to have worked. There's also the training data situation and another mea culpa. I previously discussed the concept of model collapse and how feeding synthetic data, which is training data created by a generative model, into another model

Starting point is 00:34:32 can end up teaching it bad habits, which in turn would destroy the model. But it seems that DeepSeek succeeded in training its models using generative data. Specifically, though, and I'm quoting GeekWire's Jon Turow, for subjects like mathematics where correctness is unambiguous, and using, and I quote again, highly efficient reward functions that could identify which new training examples would actually improve the model, avoiding wasted compute on redundant data. And it seems to have worked. Though model collapse may still be a possibility, this approach, extremely precise use of synthetic data,

Starting point is 00:35:01 is in line with some of the defenses against model collapse I've heard from LLM developers I've talked to. This is also a situation where we don't know the exact training data, and it doesn't negate any of the previous points I've made about model collapse. Now, we'll see what happens there, but synthetic data might work where the output is something that you could figure out using a calculator, but when you get into anything a bit more fuzzy, like written text or anything with an element of analysis, you'll likely encounter some unhappy side effects. But I don't know if that's really going to change how good these things are. There's also a little scuttlebutt about where DeepSeek got its data.

Starting point is 00:35:36 Ben Thompson at Stratechery suggests that DeepSeek's models are potentially distilling other models' outputs, by which I mean having another model, say, Meta's Llama or OpenAI's GPT-4o, which is why DeepSeek identified itself as ChatGPT at one point, spit out outputs specifically to train parts of DeepSeek. This obviously violates the terms of service of these tools, as OpenAI and its rivals would much rather have you not use their technology to create their next rival. And OpenAI, by the way, has recently, reportedly, found evidence that DeepSeek used OpenAI's models to train its rival, and this is from the Financial Times,

Starting point is 00:36:12 although it failed to make any formal allegations, but it did say that using ChatGPT to train a competing model violates its terms of service, and David Sacks, the investor and Trump administration AI and crypto czar, says it's possible that this occurred, although he failed to provide evidence. I just want to say how fucking funny it is that OpenAI is going,

Starting point is 00:36:31 wah! Wah! You're stealing my stuff! Don't steal my things! We're fucking coward, pansy bastard, bitches. Fucking hell. What a bunch of whiny babies. Oh no, my plagiarism machine got plagiarized. Wah! Kiss my entire asshole, Sam Altman, you little worm. You fucking embarrassment to Silicon Valley. You should be ashamed of yourself for many reasons. But so much this, though. Wah! Oh no, you stole from me. My plagiarism machine that requires me to steal from literally every artist and author on the internet. The thing where we went on YouTube and transcribed everything and fed it into the machine. That's not stealing, that's good, but you using our model to generate answers?

Starting point is 00:37:15 That's just not fair. What a bunch of babies. You guys, Sam Altman's worth billions of dollars. He has a $5 million car. Cry more, you little worm. Personally, I genuinely want OpenAI to point a finger at DeepSeek and accuse it of IP theft, mostly for the yuks, but also for the hypocrisy factor. This is a company that, as I've just very cleanly said, exists purely from the wholesale industrial larceny of content produced by literally fucking everyone. And now they're crying, I'm Sam Altman, I'm a big baby, I've filled my diaper because someone stole from my plagiarism machine. Kiss my ass. Kiss my ass. These companies haven't got shit. OpenAI doesn't have shit. They don't have anything. They don't have a next product. Without reasoning, they haven't got anything.

Starting point is 00:38:03 And now they don't have that disgusting justification, that overspending, the fat, ugly American startup culture of spending as much as you can to build America's next top monopoly. They should be fucking ashamed of themselves. They shouldn't be billionaires. They should be poverty-stricken. They should have to pay everyone they stole from. And it's just, it sickens me seeing the reaction from some people on this, seeing the Sinophobia, but seeing this level of defensiveness of a company like OpenAI or Anthropic. And as I'll get into next episode, we are really running out of time here. And I think DeepSeek is really, I think it could be really the end of days for these companies. I don't know how much they've got left, time-wise or even

Starting point is 00:38:48 money-wise, and I'm not sure how they even raise money. But in the next episode, I'm going to deep dive into DeepSeek, and I'll tell you how they sent the US tech market into a panic and what it actually means for the future of OpenAI, Anthropic and the hyperscalers backing them. This has been a crazy few days. I hope that this has helped. And on Monday, you'll find out more. Thank you so much for listening. The support I've got for the show has been incredible.

Starting point is 00:39:13 And the emails I've got about DeepSeek. I've been trying. Okay? I've really been trying. It's the fastest I could do it. But I'm so happy to do this show, and I'm so grateful for all of you. Thank you for listening to Better Offline.

Starting point is 00:39:34 The editor and composer of the Better Offline theme song is Matt Osowski. You can check out more of his music and audio projects at mattosowski.com. M-A-T-T-O-S-O-W-S-K-I dot com. You can email me at ez@betteroffline.com or visit betteroffline.com to find more podcast links and, of course, my newsletter. I also really recommend you go to chat.wheresyoured.at to visit the Discord and go to r/BetterOffline to check out our Reddit. Thank you so much for listening. Better Offline is a production of Cool Zone Media.

Starting point is 00:40:07 For more from Cool Zone Media, visit our website, coolzonemedia.com, or check us out on the IHeart Radio app, Apple Podcasts, or wherever you get your podcasts.
