Bankless - DeepSeek R1 & The Short Case For Nvidia Stock | Jeffrey Emanuel

Episode Date: January 28, 2025

China’s new DeepSeek AI model, which reportedly matches GPT-4’s performance at 1/45th the cost, has rattled the AI hardware market and contributed to a 20% dip in Nvidia’s stock price. Investor and technologist Jeffrey Emanuel argues that DeepSeek’s efficiency gains aren’t the only story, as his viral 12,000-word article “The Short Case for Nvidia Stock” also catalyzed the market’s panic. In this episode, we explore how these converging factors could unbundle Nvidia’s once-unassailable lead and drastically reshape AI compute economics.

Jeff's Article: https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
Jeff on X: https://x.com/doodlestein

------
BANKLESS SPONSOR TOOLS:
🪙 FRAX | SELF SUFFICIENT DeFi: https://bankless.cc/Frax
🦄 UNISWAP | BUG BOUNTY PROGRAM: https://bankless.cc/Uniswap-Bug-Bounty
⚖️ ARBITRUM | SCALING ETHEREUM: https://bankless.cc/Arbitrum
🛞 MANTLE | MODULAR LAYER 2 NETWORK: https://bankless.cc/Mantle
🌐 CELO | BUILD TOGETHER AND PROSPER: https://bankless.cc/Celo
🎮 RONIN | THE FUTURE OF WEB3 GAMING: https://bankless.cc/Ronin

------
✨ Mint the episode on Zora ✨
https://zora.co/collect/base:0x4be6cd4d402fed49eb2de95fbc8e737e8ffd3e7f/23?referrer=0x077Fe9e96Aa9b20Bd36F1C6290f54F8717C5674E

------
TIMESTAMPS
00:00 Start
05:20 Intro To Jeff
07:27 The Issue With DeepSeek
16:00 Inference Compute Explained
24:51 Nvidia's Competition
29:14 DeepSeek's Impact On Valuations
36:46 Criticism Around Jeff's Thesis
44:17 DeepSeek's 45x Upgrade
51:30 The Transformer Explained
01:03:15 Why Is Everyone Shocked?
01:12:07 Synthetic Data
01:21:30 Why Jeff's Article Went Viral

------
Not financial or tax advice. See our investment disclosures here: https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:00 I basically was trying to help the organic search ranking of my little YouTube tool. And then, in the process, I may have inadvertently contributed to $2 trillion getting wiped off global equity markets. Because, you know, the fact is all of the news headlines came out saying the stock market crashed because of DeepSeek. I'd like to point out that the DeepSeek V3 technical paper came out December 27th. A month ago. That's a month ago.
Starting point is 00:00:28 Even the newer model, the R1 model that does the chain of thought, that paper came out a week ago, and people were all over that. So why suddenly on Monday did everything crash? And I'd like to think it's because, I'm pretty sure it is, that I wrote this article in a way that sort of speaks to hedge fund managers so they can understand it. And I published it, like, in the middle of the night on Friday, and then it started taking off, and then it got shared by Chamath, who has, you know, whatever, 1.8 million followers, right? And it's been viewed over 2 million times.
Starting point is 00:01:07 Naval's account has 2.5 million. And then, like, Gary Tan and the Y Combinator account. Between them, they have millions of followers. And not only did they share it, but they were, like, very effusive in their praise, like, this is really smart. And that kept it spreading.
Starting point is 00:01:28 Everyone is talking about this new DeepSeek AI model from China that is reportedly 45 times more cost efficient than U.S.-based AI models and charges 95% less money to use than ChatGPT. As a result, Nvidia is down 20%, wiping out $600 billion in market value, and both OpenAI's and Meta's AI labs are scrambling to discover how a relatively unheard-of Chinese AI lab was able to outperform their very expensive models with a Chinese-grown model that cost just $6 million to train. The guest on the show today is Jeffrey Emanuel, who actually thinks that this part of the story,
Starting point is 00:02:05 the DeepSeek AI model part, is over indexed on. And it's actually a confluence of other factors that is contributing to the unbundling of Navidia's market share. And it's not the release of DeepSeek that triggered the 20% drawdown, but instead a 12,000 word article that he wrote on his blog that quickly went from just a few handful of readers to over two million readers, over the weekend that actually coincided with the 20% drop in Nvidia price when the market opened on Monday. In this episode, Jeffrey and I go through his article and reasoning behind why Nvidia is under threat of getting unbundled by other chip suppliers in addition to DeepSeek's impact upon the entire resource supply chain of training and inference around LLM models. Let's go
Starting point is 00:02:46 ahead and get right into this episode with Jeffrey. But first, a moment to talk about some of these fantastic sponsors that make this show possible. Are you ready to swap smarter? Uniswap apps are simple, secure, and seamless tools that crypto users trust. The Uniswap protocol has processed more than $2.5 trillion in all-time swap volume, proving it's the go-to liquidity hub for swaps. With support for a growing number of chains, including Ethereum Mainnet, Base, Arbitrum, Polygon, and zkSync, Uniswap apps are built for a multi-chain world. Uniswap syncs your transactions across its web interface, mobile apps, and Chrome browser extension,
Starting point is 00:03:19 so you're never tied to one device. And with self-custody for your funds and MEV protection, Uniswap keeps your cryptocurrency secure while you swap anywhere, anytime. Connect your wallet and swap smarter today with the Uniswap web app, or download the Uniswap wallet, available now on iOS, Android, and Chrome. Uniswap, the simple, secure way to swap in a multi-chain world. With over $1.5 billion in TVL, the mETH Protocol is home to mETH, the fourth largest ETH liquid staking token, offering one of the highest APRs among the top 10 LSTs.
Starting point is 00:03:48 And now, cmETH takes things even further. This restaked version captures multiple yields across Karak, EigenLayer, Symbiotic, and many more, making cmETH the most efficient and most composable LRT solution on the market. Metamorphosis Season One dropped $7.7 million in COOK rewards to mETH holders. Season Two is currently ongoing, allowing users to earn staking, restaking, and AVS yields, plus rewards in COOK, mETH Protocol's governance token, and more. Don't miss out on the opportunity to stake, restake, and shape the future of mETH Protocol with COOK. Participate today at meth.mantle.xyz.
Starting point is 00:04:21 What if the future of Web3 gaming wasn't just a fantasy, but something you could explore today? The blockchain already trusted by millions of players and creators is opening its doors to a new era of innovation starting February 12th. For players and investors, Ronin is home to a thriving ecosystem of games, NFTs, and live projects like Axie and Pixels. With its permissionless expansion, the platform is about to unleash new opportunities in gaming, DeFi, AI agents, and more. Sign up for the Ronin wallet now to join 17 million others exploring the ecosystem. And for developers, Ronin is the platform to build, grow, and scale. With fast transactions, low fees, and proven infrastructure, it's optimized for creativity at scale. Start building on the testnet today and prepare to launch your ideas, whether it's games,
Starting point is 00:05:01 meme coins, or an entirely new Web3 experience. Ronin's millions of active users and wallets mean tapping into a thriving ecosystem, with 3 million monthly active addresses ready to explore your creations. Sign up for Ronin wallet at wallet.roninchain.com and explore the possibilities. Whether you're a player, investor, or builder, the future of Web3 starts on Ronin. Bankless Nation, very excited to introduce Jeffrey Emanuel, both an investor and a technologist. He, however, is a very specific flavor of both of those things. On the tech side, he is deeply informed about the research advances that come out of major AI labs like OpenAI, Meta, and Google. And on the investing side, he plays in the markets as a value investor, one who dares to go short at times. Jeffrey released an article on his blog called
Starting point is 00:05:44 The Short Case for Nvidia Stock, which has been echoing across the tech industry as this new DeepSeek model has fired a shot all the way from China across the bow of the U.S. AI industry and has left the U.S.-based AI companies scrambling, sending both tradfi and crypto markets reeling as everyone learns to digest DeepSeek's impact upon the world. Jeffrey, welcome to Bankless. Thanks for having me. Jeffrey, I really enjoyed your article. I want to kind of start with the punchline.
Starting point is 00:06:11 I want to read one of the last paragraphs in your article that I really felt summed up the entire digestion of everyone's analysis on how the new DeepSeek model has impacted the market. So this is actually the second-to-last paragraph in your article. You wrote: perhaps the most devastating to Nvidia's moat is DeepSeek's recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost. This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics here are compelling. When DeepSeek can match GPT-4-level performance while
Starting point is 00:06:54 charging 95% less for API calls, it suggests either Nvidia's customers are burning cash unnecessarily or margins must come down dramatically. To me, Jeffrey, that was the punchline for, I think, what everyone felt in the market on Monday when Nvidia stock fell 17%. To me, I'm summing this up as: there is a tug-of-war between hardware and software. And with the emergence of DeepSeek, the software side of this tug-of-war got a very large W. That's my interpretation. That's my analysis. Check me on that.
Starting point is 00:07:25 How do you feel about that kind of conclusion? You know, it's funny, because the DeepSeek is the part that everybody's the most focused on, but I actually think the whole short thesis still works pretty well without that, for all the other reasons that we can discuss. And the one issue with the DeepSeek is, it's funny, there's this thing, Jevons paradox, which is like, nobody was talking about this until suddenly now everybody's saying Jevons every other word. And it's something that comes from energy economics, which is like, you think you make things more energy efficient, great, we're going to use less energy.
Starting point is 00:08:06 But then what ends up happening is that the price of energy goes down and everybody wants to use more energy. And so it actually increases demand for energy. And so everyone's saying now that, oh, this DeepSeek thing is wrong because of Jevons. And, you know, I am sympathetic to that, to a degree, but it's not always so clear. And it's not like the Jevons stuff happens immediately. Like, there's often, you know, what causes booms and busts is these sort of temporary dislocations between anticipated demand and realized demand. And really, you know, what I think people miss is that
Starting point is 00:08:48 the big decisions about CAPEX come down to a couple of people, like, you know, Mark Zuckerberg, and a lot of it is sort of gut feel, like Masayoshi Son, like, is this a good time to just push on the accelerator? And I think someone like Zuck has to take a step back and say, listen, I know my guys are really smart, but maybe, you know, the answer is not necessarily to spend another, you know, $3 billion on Nvidia chips that are very expensive, where, I mean, literally, like, they're paying 40 grand for a GPU that's costing Nvidia maybe, what, $3,500 to make. So they're putting a lot of money into Nvidia's pocket. And maybe they can, you know, pump the brakes just a little bit and then see, because they projected that they needed a certain amount
Starting point is 00:09:40 of chips for their forecasted demand. And the DeepSeek stuff is all public, so they can look at the technical report. They can start making these changes themselves internally, theoretically, you know, at least for the next generation of models they're training. And as a result, maybe they can pull back a bit, because I think there is still some skepticism on, you know, Wall Street, like, are they going to see a return on this money? Because it's not like anyone's paying to use all this, you know, Meta AI stuff yet. And so I'm not convinced by the, oh, yeah, well, Jevons.
Starting point is 00:10:23 It's like, okay, well, let's see if that's actually the case. But then, you know, really separately from that, like I was saying, even if you remove DeepSeek entirely, I believe that Nvidia in particular, and I want to clarify, I'm about as bullish on AI, like 99th percentile, as anyone you will ever meet. I live in the AI future all day, every day. I have three Claude accounts. I'm, like, using this stuff nonstop, all day, every day. So I'm a huge believer,
Starting point is 00:10:56 like, but Nvidia as a company, they, you know, this is just goes back to my sort of training and investing is that you see, you see this over and over again. With the one exception of a regulatory, like, enforced monopoly, um, you do not have companies just get to print infinite profits without, you know, with, you know, triple-digit revenue growth with 90% gross margins. You don't get to see, and without having everyone in their brother trying to figure out a way to beat them. And that's what's happening. And so you look at, you know, these companies, Cerebris and Grock with a queue, like these companies already have extremely compelling hardware that, you know, largely does. get around the NVIDIA mode, at least for inference, and, you know, in the case of servers
Starting point is 00:11:51 for, I think, for training too. And, you know, there's all these other short, and I mean, the other thing is like, you know, normal companies of the scale of Nvidia tend to have extremely diversified revenue sources, whereas NVIDIA, all the high margin data center revenues coming from like, you know, five hyperscalers or something. Like, it's a very much power law distribution. And I just, it's funny because when I started writing the article, which I started writing because, you know, my friend who's a hedge fund guy asked me about it on Friday. And I just started writing about it. After, as I was explaining it to him, I realized, like, I should just write this up. And it's funny because it started out as, you know, if I was
Starting point is 00:12:34 forced to make the shortcase for Nvidia, here's what it would be. And by the time I had finished, I was like, shit, this actually is a short just from, because I, I wasn't, I wasn't, Like, I knew there was a lot of custom silicon in the works, but it was kind of eye-opening to me that every single hyperscalor customer is literally making their own custom silicon, in some cases, for both training and inference. So it's like Amazon, Microsoft, OpenAI, meta. It's like, they're all doing this, and it's like, you don't, like,
Starting point is 00:13:06 And as soon as they get this stuff to work, the other thing that's so important to remember is it doesn't necessarily have to be better than Nvidia's stuff, right? Because Nvidia is charging 10x what it costs them. So if you can make it yourself for, you know, 1x what it costs, then you can cut the price by 50% to your end customers and you'll still make a huge margin.
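A rough sketch of the arithmetic Jeff is gesturing at here, with hypothetical numbers (only the ~$40,000 price and ~$3,500 cost estimate come from the conversation; the in-house chip figures are invented for illustration):

```python
# Back-of-the-envelope version of the custom-silicon argument.
# Nvidia figures are the ones mentioned in the episode; the in-house
# chip numbers below are purely hypothetical.

nvidia_price = 40_000        # what a hyperscaler pays per data center GPU
nvidia_cost = 3_500          # Jeff's estimate of Nvidia's cost to make one

inhouse_cost = 3_500         # assume your own chip costs about the same to build
inhouse_relative_perf = 0.5  # ...but is only half as good per chip

# What the hyperscaler actually cares about: dollars per unit of performance.
nvidia_cost_per_perf = nvidia_price / 1.0
inhouse_cost_per_perf = inhouse_cost / inhouse_relative_perf

print(f"Buying Nvidia: ${nvidia_cost_per_perf:,.0f} per unit of performance")
print(f"Building your own: ${inhouse_cost_per_perf:,.0f} per unit of performance")
# Even a chip that is only half as good works out several times cheaper per
# unit of work, because you are no longer paying Nvidia's ~90% gross margin.
```

Under these made-up assumptions, the in-house chip delivers the same work for roughly $7,000 versus $40,000, which is the "it doesn't need to be as good" point in a nutshell.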
Starting point is 00:13:28 And what matters to you as a hyperscaler is, you know, how many requests you can handle through your APIs and stuff per dollar. You know, you don't care if you need more chips. Like, it's fine, as long as you don't have to pay these inflated prices for them. And so, look, I mean, there are other parts of the thesis we can talk about, but I actually think all of that stuff should be just as much of a focal point as the DeepSeek news. Yeah, maybe to go back and trace over your article, I see your article in two parts.
Starting point is 00:14:01 It's the moat of Nvidia and how it's being unbundled at the margins by a various set of companies, some of which you just mentioned. And some of these moats are the fast GPU interconnect. Nvidia has had this amazing ability to make their GPUs talk to each other with extreme bandwidth, as if they are one big unit, like one big GPU. And that is getting unbundled by another company that is just making very large chips that reduce the need. Well, not GPUs exactly. They're making custom, like, it's not really a GPU. It's like this weird mega chip. Like, I mean, it's funny because it's like the, you know, the
Starting point is 00:14:39 H-100 is considered like an absolute unit when it comes to chip size because it's like this massive freaking package. But then the Cerebrous thing is like they literally took an entire 300-millimeter wafer and just made the entire thing one enormous. I mean, these chips are extremely expensive to me. And but yeah, you don't need to worry about wiring things together if they're all on the same wafer, right? And I actually just want to point out to that even in video, didn't make that technology. They bought Melanox, this Israeli company,
Starting point is 00:15:14 the doubles of the side, I think they had 10,000 employees by the time they bought Melanox for $7 billion, and that brought in like another, you know, about the same, so it was a big, really smart thing.
Starting point is 00:15:25 I mean, if they hadn't bought that company, like, they would not be in the dominant position they are today with data center stuff. But yeah, I mean, everyone has been sort of relying on, oh, yeah, but what about interconnect? Even if, like,
Starting point is 00:15:37 even if AMD could get their acting, and come out with a decent driver and come up with some alternative to Kuta, they don't have the interconnect, so you can't use it for this. And you hear that argument a lot. And I think, well, you know, you're starting to see on the training side, this company's Cerebrus with the wafer scale chip. But then also, you know, the other big news that started before DeepSeek was, you know, the 01 model from Open Aon.
Starting point is 00:16:09 and that sort of unlocked this other new scaling law, which is about inference-time compute. It used to be that almost all the processing power was needed on the training side, and then the inference was pretty fast. But nowadays, with these models that do chain of thought, you know, the more they compute at the time you give them a request, the better the answer they can give.
Starting point is 00:16:33 And so people are now saying, whoa, so actually most of the compute might be on the inference side. But the inference side is a very different, you know, compute problem. Like, right now, they use the same GPUs for training and inference. Okay. Can we just quickly define training and inference? Yeah. Training is, like, actually making the model. Like, you have, like, a zillion, you know, gigs worth of data, like text from the Internet, Wikipedia, blah, blah, blah, broken up into these tokens.
Starting point is 00:17:06 DeepSeek used 15 trillion of them. And then you take thousands of GPUs and you basically learn how to condense all that data down to 99% less space in the weights. And in the process, the models learn this, like, coherent model of the world and how to understand things, because the only way to compress stuff that much without losing all the information is to understand it. Whereas inference is, you already have a trained model, and now I want to ask it, you know, to write me an essay or do a logic problem for me. And so inference is a very different problem. Like, you don't need thousands of GPUs to do it, because you've already got the trained model. You just need a couple of GPUs, maybe, and you can get the answers.
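To make that training/inference split concrete, here is a toy sketch in Python. A tiny linear model stands in for an LLM; this is purely illustrative (not how DeepSeek or anyone else actually implements things), but it shows where the compute goes: "training" grinds over the whole dataset many times, while "inference" is one cheap pass per request.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a huge token corpus: 1,000 examples, 8 features each.
X = rng.normal(size=(1000, 8))
true_w = rng.normal(size=8)
y = X @ true_w + rng.normal(scale=0.1, size=1000)

# "Training": many full passes over the data, updating the weights each time.
# This is the expensive phase (thousands of GPUs for weeks, in the real world).
w = np.zeros(8)
for epoch in range(200):
    grad = X.T @ (X @ w - y) / len(X)
    w -= 0.1 * grad

# "Inference": the weights are now frozen. Answering one request is a single
# cheap forward pass, no matter how much was spent on training.
query = rng.normal(size=8)
answer = query @ w
print("recovered the underlying weights:", np.allclose(w, true_w, atol=0.05))
```

The asymmetry is the point: the loop over all the data runs once, up front, and every user request afterward only pays for the one-line forward pass.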
Starting point is 00:17:51 So just to really trace that over one more time. Training is like ChatGPT, OpenAI creating their product, creating their models. Then I go on to ChatGPT, and when I type in a query, I am doing inference. And so there's really a weighting here of, like, a ton of compute up front to make the model once, and then hopefully, like, a little amount of compute to run inference on it, which is just the daily requests. And, like, in theory, there's, like, a tradeoff here between how much compute you do initially to train the model.
Starting point is 00:18:27 And hopefully that just makes all future inferences as efficient as possible. But there's still compute on both sides. It just makes the model smarter. Yeah. And that way you get better answers. But what changed recently, it used to be that basically all the inferences use this sort of moderately, you know, or a fixed compute budget. But now it's like open-ended.
Starting point is 00:18:48 Now, like, you know, o1 is, like, their flagship model from OpenAI. If you pay $20 a month for ChatGPT Plus, you can use o1 for, you know, a certain number of requests per week. If you pay 10 times as much, $200 a month for ChatGPT Pro, which I do, and I recommend to anyone who uses this stuff a lot, you've got o1 Pro. It's the same model as regular o1. The only difference is that
Starting point is 00:19:15 it takes much longer to respond because while it's doing inference, it's using up far more of these intermediate logic tokens, as it were, this chain of thought, which is sort of like the scratch pad of its internal thinking process. And then
Starting point is 00:19:33 it gives you an answer, but the answer is better. Like, your code will work the very first time. You won't have any kind of mistakes in your essay or whatever. Can we go over this one more time? So, like, it's the same model. Pro, the $200-a-month version, and the $20-a-month version are the same model. But there's this extra step, there's an extra layer of things happening, where the Pro version is running that same model over and over and over again in chunks.
Starting point is 00:19:58 And it is able to go back and trace over previous work to, like, check its work before it actually gives you an output. And you're saying that just because of this. It's not an additional layer. It's just, like, they just do it for longer. It's basically a dial. You say, how much money do I want to spend generating tokens before I give the final answer? And with Pro, it's like, it would not be economical for them to use the amount of tokens that they use for Pro for the Plus.
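That "dial" can be sketched with some invented numbers. Only the $20 versus $200 tier pricing comes from the conversation; the token counts and the per-token serving cost below are hypothetical:

```python
# Hypothetical illustration of the inference-time compute "dial".
# Only the $20 vs. $200 tier pricing is from the episode; the token
# counts and per-token serving cost here are made up.

SERVING_COST_PER_1K_TOKENS = 0.06  # assumed cost to the provider, in dollars

def cost_per_request(reasoning_tokens: int, answer_tokens: int = 500) -> float:
    """Serving cost of one request: hidden scratch-pad tokens plus the visible answer."""
    total_tokens = reasoning_tokens + answer_tokens
    return total_tokens / 1000 * SERVING_COST_PER_1K_TOKENS

# Same weights, different budget of chain-of-thought tokens before answering.
plus_cost = cost_per_request(reasoning_tokens=2_000)   # shorter "thinking"
pro_cost = cost_per_request(reasoning_tokens=50_000)   # much longer "thinking"

print(f"Short-budget request: ${plus_cost:.2f}")
print(f"Long-budget request:  ${pro_cost:.2f}")
# With these numbers the long-budget request costs roughly 20x more to serve
# with the exact same model; the dial is just how many tokens it burns
# before giving the final answer.
```

The point of the sketch: inference cost now scales with how long the model is allowed to "think," not just with how many users there are.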
Starting point is 00:20:24 In fact, Sam Altman said, you know, it's funny, because everyone on Hacker News, all these developers in the industry, were like, $200 a month? Get real. How could that make sense? And Sam Altman came out later and said, believe it or not, we're actually losing money charging $200 a month, because people are using it and it just uses, like, insane amounts of compute. And so it really flips
Starting point is 00:20:44 the equation in terms of how much compute is being used for inference versus training. And then this isn't really relevant because like I said, with the Nvidia GPUs, you buy an H-100 GPU, data center GPU for 40 grand from
Starting point is 00:21:01 Nvidia, and you're going to use the same GPU to train the model and do inference on it. But this company Groq, with the Q, everyone gets confused because of Grok with the K. Not the Twitter Grok. Not the Twitter one. Right, exactly. But Groq with the Q should be better known, because this company has really, I mean, unbelievable technology. They basically said, we're not going to try to solve training at all. We only care about inference. So if you want to optimize the entire stack for inference only, how might you approach that?
Starting point is 00:21:40 like the Lama 3.370 billion, which was like until the deep seat came out, it was the sort of leading edge open source model. And, you know, if you get a fancy desktop computer with one, let's say, Nvidia 4090 GPU, which you can get for under $1,000 now, you could get, I don't know, know, maybe 40 tokens per second, which is actually like good enough that you could use that as your sort of home version of chat GPT that works pretty well. When you try it on GROC, and anyone can try this for free, you just sign up with your Google account, and you can do inference from this model, and it's like insane. It's like, instead of like 40 or 50 tokens per second, it's like
Starting point is 00:22:22 1,500 per second. And so you click your thing, and you just, boom, there's your answer. And it's like, whoa, that's pretty interesting. And so even though that GROC hardware costs like millions of for one server, if you have enough demand that you can just keep it busy all the time, it's actually much cheaper to use. And most importantly, you're not giving your money to Nvidia. You're giving it to GROC, you know, so it's just an example of how people manage to, you know, like if you're trying to assault like a castle that has a big moat, instead of trying to cross the mode and get, you know, shut up by arrows, why don't you like dig a hole under the mode
Starting point is 00:22:59 or do, you know, the catapult to go over it. You find creative ways to get around it. And that's what's happening is everybody's been focused on, well, of frontal assault's not going to work. And it's like, okay, but there are other ways to seize your castle. And that's what you're seeing is that all the ingenuity of the market of like, because the, and the reason is because the prize is so big that if you can, you know, you too can make your company worth a trillion dollars if you can take a big piece of this pie. Whereas that was not true in 2016. It was like a backwater, you know. And so it's just, the wheels take a lot of time.
Starting point is 00:23:34 Like, if you want to make your own custom chips, even if you're Amazon with infinite money to spend, what do you know about making silicon? First you have to, you know, poach or hire the really brilliant people. And then it's going to take them probably two or three years to design a really good chip. And then you're going to have to come with giant sacks of cash to TSMC and try to convince them to give you, like, volume at their fabs, because they're already just being, you know, inundated with money from Nvidia and Apple and stuff. And it takes a while to get ramped up. But eventually the chips
Starting point is 00:24:12 start coming out. And, you know, the irony of it is, like, even though, you know, none of these custom silicon chips are going to be as good as the Nvidia chips, the way they're made is pretty similar, in that they're both going to be using TSMC as the fab, and they're both using the same machines from this Dutch company, ASML, that actually, like, does the lithography. So it's like, yeah, they won't have the same brilliant design, maybe. But again, that's the thing people miss: it doesn't need to be as good. It could be one-fifth as good, and it still makes sense for Amazon to use it, because they don't have to pay a 90% gross margin to Nvidia. Because Nvidia has the luxury of having very
Starting point is 00:24:55 high margins. And what that creates is, like, well, if your product is 90% as good, but you only take 10% of the margins, then all of a sudden you're solving, like, a lot of market problems. And I mean, when your margins are so high, like, just to put things into perspective, companies that sell chips, like in the semiconductor industry,
Starting point is 00:25:13 it's generally not such a great industry. It's very subject to boom-and-bust cycles of, like, overcapacity and so on. And so if you look at another area, like memory, DRAM, which, you know, everyone has in their phones and their computers, you might think on the surface that this should be, like, this great business, because there are only basically three companies in the world that do it.
Starting point is 00:25:36 It's, like, Micron, Samsung, and SK Hynix. I mean, there used to be, like, 15 memory companies, but they all either went bust or merged. And so you would think it would be this oligopolistic thing with great pricing and margins. But if you look at the history of it over the last 10, 15 years, it's very cyclical,
Starting point is 00:26:03 and at the very peak, when the supply-demand mismatch is really out of whack and they can charge really high prices, they make, like, a 60% gross margin. But if you take the average over the cycle, it's closer to, like, 20%. And at the bottom of the cycle, gross margins actually turn negative. Negative, right, right, right. And so then you look at Nvidia, and you're like,
Starting point is 00:26:18 you have a 90-plus percent gross margin on data center. Their overall gross margin is more like 75%, because they make much lower margins on the consumer stuff, like for playing video games, and that's because they have competition from AMD. You know, like, that's what happens
Starting point is 00:26:33 in a competitive market. And so, but my point is that when your margins are that high, it doesn't need to be 90% as good. It could be literally like 40% as good and it still is a no-brainer for Amazon to switch as many loads over to their own thing
Starting point is 00:26:48 because it's like, you know, it's like when you buy a handbag from Hermès for, you know, 40 grand. How much do you think it costs them to make? Even though it's made by hand by some French guy, you know, it's probably only, like, two or three thousand bucks, tops,
Starting point is 00:27:04 and then they're charging you $40,000 for the end. And it's like very similar margins for the GPUs from Nvidia. And so it's like you don't, what matters is like, and the users don't care. They're submitting requests. They want to use a model, Lama 3.370, but they don't care if an NVIDIA card is doing,
Starting point is 00:27:25 the inference on it. And so Amazon, you know, Amazon made their own CPUs called Graviton, and they are very aggressive with the pricing of that, to try to switch people over: if you normally use, like, an Intel or AMD CPU, try using one of our things and you'll save a lot of money. And you're going to see that work. They're going to try to push people over to their product by making it more attractive; you know, they're going to basically split the savings with the customers. And so all that stuff, you know,
Starting point is 00:27:59 it's like death by a million cuts, like, the combination of the competition from these different areas. And then, of course, AMD does compete with them effectively in consumer stuff, but they've been completely absent in this whole data center AI stuff,
Starting point is 00:28:14 which is, you know, this crazy thing. I mean, they're going to be writing business school case studies about how they squandered a trillion-dollar opportunity. You can't get too mad at them, because they also, like, managed to kill Intel. So it's like...
Starting point is 00:28:27 It's not like they're not good, too. And it's so funny because Lisa Su, the CEO of AMD, is like first cousins with Jensen Huang of Nvidia. I did not know that. Yeah, which is just like, how good are these genes in this family? But so, yeah, I mean, if they can get their act together... and it's so funny because it's like, they're so out of it. Like, I just don't understand it.
Starting point is 00:28:48 But, like, they were literally... like, people like George Hotz, the guy who's famous for jailbreaking the iPhone and all that stuff. He's literally by himself, without any help from them, writing his own stack that's like, you know, will be able to make these GPUs usable for doing at least, you know, some training and inference. And so you might see even AMD coming up as a real competitor. And yeah. Yeah. So, yeah, going back to tracing over like the broad strokes of your article, I kind of break it out into two parts, two halves. There is the unbundling of Nvidia's moat on the hardware side of things via hardware competitors, as you've kind of just
Starting point is 00:29:29 traced over. But then also the DeepSeek side of things is a rebalancing of the value of software and algorithm design, maybe, is one way to put it. Maybe you can take us to the second half of that equation: how did DeepSeek really impact people's understanding of the value of software and its impact on the value of hardware? Well, so, you know, when you say what is the software side of the thesis, it's not... it actually has very little to do with DeepSeek. What it has to do with is one of the sort of biggest sources of Nvidia's moat, because, you know,
Starting point is 00:30:03 AMD has quite reasonably good, you know, chips. So the reason is that Nvidia basically was very forward thinking. And when they noticed that this deep learning stuff was really taking off, back in like 2012, they really figured out that we need to make it easy to use our chips for this sort of thing. And so they have this system called CUDA, which... because you have to understand,
Starting point is 00:30:32 like, these GPUs are insanely complicated. I mean, in the old days, you'd have one CPU with one core. Now, CPUs are pretty complicated. Like, the CPU in my computer has 32 cores, but these NVIDIA GPUs, they have like thousands of cores. Right, that's their whole deal.
Starting point is 00:30:48 They have lots of cores. And so it's like very... if you were to try to write code naively to take your problem and break it up and send it to thousands of cores and reassemble it, like, no one can do that, you know, basically. And so instead you describe your problem using these much more abstract, high-level concepts, and then CUDA turns that into hyper-optimized code that runs really, really well on NVIDIA GPUs, but not anywhere else.
Starting point is 00:31:16 And CUDA is an NVIDIA-built software package to allow developers to use NVIDIA GPUs to their best degree possible. Yeah, without being, like, you know, Einstein. Like, they can be very smart, but I mean, it's... it's kind of like a driver. Is it a... No, it's like a framework for... yeah, the driver is a sort of separate layer,
Starting point is 00:31:39 but it allows the power of Nvidia GPUs to be expressed to more people without them having to be... yeah, it's like the difference between writing code in Python versus writing code in, like, assembler, which is the lowest level, you know. And so... and then, actually, so CUDA is... even most people
Starting point is 00:32:00 actually don't even write CUDA directly. Most people use machine learning frameworks. Like, you know, it used to be TensorFlow, but it's been sort of totally replaced by something called PyTorch, which is sponsored by Meta. And so that's what most researchers use, is PyTorch, which lets them think in terms of the math. And, you know, as a researcher, say, oh, I have this loss function, I have this optimizer, and everything's modular and plug and play. And then you write high-level Python code, which is like very, very high-level. And then internally, PyTorch can run that on CUDA and then run it on a GPU from Nvidia very, very efficiently. But if you have an AMD GPU, it's not as easy to have your stuff run really, really fast using, like, PyTorch
Starting point is 00:32:52 and stuff. And so a lot of people were saying that it doesn't matter what anyone else does in terms of chips. If they don't have CUDA, you know, it's game over. And there are, I think, two big assaults on that. One is that you're seeing the rise of these sort of even more high-level frameworks for expressing highly parallelized programming. And so you have this one, MLX is one. There's another one called Triton, and these are gaining, you know, momentum. And for those, it's like CUDA is just one target. You can write your stuff in MLX and basically run it on an Nvidia GPU really, really fast. But you could also make another, you know, compilation target of MLX that could run on a completely different chip, like the one, you know, that Amazon is making internally, the Trainium chip.
Starting point is 00:33:51 And it's also a very high-level language. So maybe, instead of writing and, you know, targeting CUDA, maybe you should target MLX or Triton. And then you can still run it using CUDA, but you could also run it using these other things. And then you're not locked into using the really expensive Nvidia chips. So that's one assault. And then the other one, I think, is this idea that,
Starting point is 00:34:15 and I haven't heard a lot of people talk about this, but one thing I'll tell you: I use LLMs all the time for programming, and they're just stunningly good at that now. But what they're really, really good at is if you already have a working prototype of code in Python or JavaScript or whatever, so it can really understand what it is you're trying to do, they're unbelievably good at porting that to another language. So if you have this Python algorithm,
Starting point is 00:34:45 and you want to turn it into, like, Rust or Golang, they do that unbelievably well. Like, maybe not on the first shot, but, you know, with a couple iterations, you can get it all working. And so what that made me realize is that, you know, part of the CUDA thing is that it's become a lingua franca. Like, everyone who's good at this kind of programming knows it. And so they think in terms of CUDA concepts. It's just the fastest way for them to express these algorithms. And so I was thinking that, like, they could write their code in CUDA like they normally
Starting point is 00:35:18 do. But then instead of using it on an Nvidia GPU, they could use it almost as, like, what is called a specification language, where it's just for documenting the algorithm in a very efficient, elegant way. And then they could feed that into an LLM and say, all right,
Starting point is 00:35:34 now port this into this other framework, which will work really well with, you know, AMD GPUs or with, you know, Cerebras or something. And I think you really explained this well in the article when you illustrated, there's like a job market for CUDA engineers.
Starting point is 00:35:49 Yeah. And it's insular to the rest of, like, you know, engineering jobs, engineers out there. It's very special. It's like there's this own independent, like, vertical of job markets and, like, the cost for these engineers. And the way that you illustrated it in the article is like, well, those walls break down. And all of a sudden there's just, like, not really the same monopoly around CUDA. No, no, it's not that. I think they'll still use CUDA.
Starting point is 00:36:09 But the question is, can they use CUDA but then not use an NVIDIA GPU? Right. Which is where the moat comes from. Right. Which Nvidia gets at least part of its value from. Yeah. And now you did bring up a point about, like, so DeepSeek in a sense is software, because by writing smarter training software, they did reduce the demand.
Starting point is 00:36:32 But I'd say that's sort of separate. That's, like, kind of orthogonal, if you will, to this other stuff. Which again, it's like, if you took away the DeepSeek part of it, you can still see the big threats to the moat on both the software and hardware sides. Now, let me just say, I just saw, right before we started talking, somebody said, here's why, you know, my thesis is all wrong. And they're saying that, well, the problem is that TSM, which is Taiwan Semiconductor, which builds all these chips.
Starting point is 00:37:01 And they're basically the only ones that can do it. I mean, not the only ones, because, like, Samsung can also make pretty good chips, but, like, for the most part, yeah. Yeah, they make all the Nvidia stuff and most of the Apple stuff. But by the way, I want to point out, again, it's like, yes, it would be best to do something in a four-nanometer process node, which is the smallest you can do. But, you know, you could use a bigger, older process node, and your chips won't be as fast and they won't be as energy efficient. But you've got a lot of wiggle room, because you just, you don't need
Starting point is 00:37:37 it to be as good. You just need it to be cheap. But, so anyway, the objection to my thesis is that these guys are booked solid. Even if you came to them with, like, you know, giant bags of money, they're booked solid. And the reason is because... The manufacturer is booked solid. They're backed up. They have too many orders. Yeah.
Starting point is 00:37:56 For the next couple years, they don't care how much money you give them because they're all booked solid and they can't, you know, can't just instantly make a new... Although, I will say, like, you know, Taiwan Semi built a fab in Arizona, and there was all this stuff about, oh, it's taking them so long and they can't hire good people. But you know what? They finally did get it all up and running. And they could literally, if there was enough money in it to do it, they could just copy-paste the blueprints, get another big chunk of land, and just replicate what they just did again. And they could do that. And it wouldn't take that long. And so in any case, that's the objection. That even if
Starting point is 00:38:42 everything I said is true, these companies, Cerebras and Groq, and the hyperscalers like Amazon and, you know, Google and, blah, blah, blah, and Meta, won't even be able to make these chips in enough volume that it's going to dent Nvidia. And my response to that is like, okay, your analysis is essentially conceding
Starting point is 00:39:06 that this is a highly sort of transitory circumstance here. That they're just very temporarily going to have this advantage, and then as soon as the additional capacity comes online or opens up, then there's going to be this massive flood of alternative supply, which is going to pressure market share. You know, even if the pie grows, the market share is going to go down. But most importantly, there's some stuff here that has nothing to do with technology. That's just basic, you know, economic, industrial finance kind of thinking about how markets work. And the difference between having basically a monopoly and having even one or two competitors is that the margins really can
Starting point is 00:39:58 fall quickly. Because it's like, you know, if you have two office buildings that are, you know, 98% occupied, nobody's, you know, racing to the bottom to cut their rents. But if both of them start losing tenants, then, you know, every day that goes by that a floor is empty, they're just losing money. And so there's this race to the bottom, and there's this critical threshold where, you know, once, let's say, the occupancy rate in a market for office space dips below, let's say, 80%, the rents... it's very nonlinear, you know. Like, if occupancy falls another 5%, rents are going to fall a hell of a lot more than 5% to make the market clear. And I think you'll see that the margins can fall very, very quickly once there are real competitors.
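To put toy numbers on that nonlinearity (every figure here is invented purely for illustration): if tenant demand is fairly inelastic, re-filling even a few points of vacancy takes a disproportionately large rent cut.

```python
# Toy illustration with made-up numbers: why rents move nonlinearly with
# occupancy. Assume demand is inelastic: a 1% rent cut only wins back
# 0.25 points of occupancy (the elasticity figure is an assumption).
elasticity = 0.25        # occupancy points regained per 1% rent cut (assumed)
occupancy_drop = 5.0     # occupancy slips 5 points, e.g. 85% -> 80%
rent_cut_needed = occupancy_drop / elasticity
print(f"A {occupancy_drop:.0f}-point occupancy drop forces a "
      f"{rent_cut_needed:.0f}% rent cut to clear the market")
```

The same shape of argument is what he is applying to GPU margins: a modest loss of "occupancy" (market share) can force a much larger price move.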
Starting point is 00:40:49 And then the question is, okay, again, this is not about technology. This is about how do you rationally value a stock? And I mean, one of my favorites... I mentioned in my piece that, you know, I once won a prize from the Value Investors Club website for a short idea. This was more than 10 years ago, but I'll quickly tell you the story of it because I think it's so relevant here. Which is that this was a company called PetroLogistics.
Starting point is 00:41:16 PDH was the ticker. And they were a company that just had a single plant that took propane and turned it into propylene, which is a... and through this, like, random... you know, it's basically because the shale play happened. I don't have to get into all the details. Suffice it to say,
Starting point is 00:41:44 they were earning an unbelievably high spread, much, much higher than historical, or, like, what they ever expected to earn when they built the plant. They were earning so much that their profit in one year from running this plant was like 80% of the cost of building a new plant. And it's not like rocket science to build one of these plants. You can just go to a big construction company like Bechtel and say, I want a conversion plant for propane to propylene. And they have off-the-shelf blueprints; they'll make it for you, guaranteed, in a couple of years. And sure enough, this company was earning these high returns,
Starting point is 00:42:17 and people were putting a big multiple on the earnings, because they're like, look at this, the earnings have gone up so much. But you could tell that all these other plants were already under construction, and you actually knew approximately when those plants would come online. And so you could basically figure out, all right, even if I grant you that they're going to continue earning these massive margins,
Starting point is 00:42:38 it's going to start stepping down in like a year, and then in 18 months it's really going to step down, and in 24 months it's going to be right back to normal. So if I want to value this as, let's say, what is the present value of the future cash flows, discounted because of the time value of money, I can do that. I can say big, big profits this year, a little bit less profits next year, and then after that, normal profits. And... add up the discounted cash flows, and you realize you can't put a big multiple on earnings that are not sustainable. And right now, so if you tell me that, oh, well, but you're wrong because Nvidia's going to keep earning these huge profits for the next two or three years, it's like, dude,
Starting point is 00:43:21 you're putting like a 30-40x multiple on that. That's essentially implying that it's going to sustain at this rate, like, indefinitely, and that's just not how, you know, you should think about the value of a stock. And so this is why I want to just say, like, a lot of the Jevons paradox stuff, it's like, yeah, I am bullish on the aggregate. Like, the amount of total demand for inference is going to skyrocket. The pie is going to grow. That's a totally separate question from: will Nvidia be able to continue growing revenues triple-digit percentages year over year at these insanely high margins? That's a completely separate bit. And you need to answer that question if you want to feel comfortable putting such a high multiple on that earning stream. You have to
Starting point is 00:44:13 know that it's going to sustain, and it seems actually quite likely that it won't sustain. I do want to dive headfirst into the DeepSeek efficiency gains part of this conversation, because I think that's kind of where we should go next. One thing that you wrote in your article, you said the sum total of all of these innovations (these are innovations referring to the lab that made DeepSeek), when layered together, has led to the 45x efficiency improvement numbers that have been tossed around online, and I am perfectly willing to believe that these are in the right ballpark. Maybe you can just, like, explain the significance of this new ChatGPT-like model, DeepSeek, and how it got to be 45x more efficient, and what 45x efficiency means when it comes to the
Starting point is 00:44:59 industries that are, like, the supply chain for the usage of these models. Sure. So look, I mean, it's funny. Like, in the West, we have this sort of resource curse, you know, almost, of, like, we have too much money. It's almost easier to just throw money at a problem than to try to be really clever. And so, you know, the joke or the sort of parallel I make is, like, when you look at people's houses in Saudi Arabia, they're not very energy efficient. And that's because they get subsidized power, because they have unlimited energy there. And so there's no, yeah, there's no point in wasting all this extra construction cost on double-pane
Starting point is 00:45:39 glass and blah, blah, blah. And it's a similar thing at, like, Meta and Google. They just have so much operating cash flow hitting, you know, every quarter. They're like, fuck it. Let's just, let's just hire more. Money's not an object. Yeah, yeah. Let's pay our people $5 million a year, or whatever, a million a year,
Starting point is 00:45:58 and let's just, you know, send over to Jensen another $3 billion. Whereas, you know, China... they're not getting paid that much, that's for sure. And, you know, they do have these export controls. Now, I know a lot of people say, oh, they're smuggling them in through Singapore. I'm sure that's happening. But, like... Smuggling chips. Yeah. Because, first of all, under Biden, they basically made a slightly
Starting point is 00:46:28 crippled version of the Nvidia GPU just for the China market, or export market, that's not as good as the H100. But then also, I mean, what people point to, which I think makes a lot of sense, is, like, something between 15 and 20 percent of Nvidia's revenue
Starting point is 00:46:48 comes from the tiny nation-state of Singapore. It's like, really? They're using that many GPUs there? And it's like, because everyone knows that they're somehow getting laundered and smuggled into China. And so the question is, we don't even know how many Nvidia GPUs are in China. And so we don't really know how many DeepSeek used. But the point is they don't have as many as we do. And it's not as easy for them to get
Starting point is 00:47:14 them. And so they have to... Maybe the punchline you're making is, like, that Tony Stark Iron Man meme of Tony Stark was able to build this in a cave. And that's China. They don't have an abundance of chips. They have some chips. But they do have plenty of capital. They don't have the ability to... And by the way, they're quickly... That's a whole other story, but, like, they hired, like, you know,
Starting point is 00:47:37 they poached some of the smartest guys at TSM to make their national champion, SMIC, or whatever it's called. And they're obviously not there yet, but, like, they made a pretty good Huawei CPU. And I wouldn't be surprised if... I mean, that's the other giant
Starting point is 00:48:00 wild card that nobody's really taking into account. Like, don't count them out: they got some of the smartest people from Taiwan Semi over there. And it's like, they'll buy the machines from ASML too, and, you know. Right. So, but anyway, what I wanted to say is that, you know,
Starting point is 00:48:16 their engineers are... hey, necessity is the mother of invention. But also, you know, in the West, we tend to have this sort of bifurcation in the market where you're either in the, like, research track, in which case you have a PhD and you've written these papers and you're, like, a guy who does stuff on the whiteboard or whatever. And often these people are not,
Starting point is 00:48:38 they're not very good engineers. Like, there's a joke that these researchers are actually horrible at programming. They're good at math, horrible at writing optimized code. It's obviously not universally true. There are some people who are great at both. But what happens is usually the researchers think at these high levels, and then they make, like, a prototype, and then they hand it over to these people who are more engineers, like high-performance optimization guys, people like John Carmack, or Jeff Dean at Google,
Starting point is 00:49:10 who, you know, they're not going to invent the new optimizer or, like, you know, some new loss function for AI models. But if you give them an algorithm, they know how to make it run really fast, you know, on a computer. And so the way we do it is this sort of two-step process in the West, where the researchers design the thing and prototype it,
Starting point is 00:49:35 and it gets handed off to the engineering department that says, all right, we have this algorithm, how can we make it go fast? The DeepSeek guys are unbelievably good at both. So it's like, instead of having it be two teams working, one then the other, it's like they kind of inverted it. They started out with: let's start out first with how can we saturate every ounce of performance on these GPUs so that nothing is wasted. Because it almost doesn't matter how fast the GPU can calculate if it's waiting to get data that it needs to do the calculation,
Starting point is 00:50:11 then it's just sitting there idle. Okay. And there's a lot of this interconnect, right? There's a lot of talking to each other. And so normally you have to dedicate a big chunk of your processing power just to handling that communication overhead. So they did a lot of really clever work with making the communication stuff as efficient as possible, so there's very little overhead. So they basically started with, rather than saying, how do I make this algorithm go fast? They said, how can I make a really, really fast algorithm that'll really run these GPUs as much as I can,
Starting point is 00:50:48 and then sort of design a smart training system based on that. So they sort of inverted things. And so there's just this collection of sort of optimization tricks. And by the way, I want to point out, many of these ideas were not invented by them. Many of them were actually published by American and other researchers, like Noam Shazeer, who just got rehired by Google for a zillion dollars. They bought his startup just to get him because he's, like, that smart. I mean... but it's implementing them in a clever way.
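The "GPU sitting idle waiting for data" problem he describes, and the fix of overlapping communication with compute, can be sketched in plain Python, with sleeps standing in for data transfers and math (a toy model, nothing like real GPU code):

```python
# Toy sketch of communication/compute overlap. sleep() stands in for
# "move data between GPUs" and "do the matrix math"; the durations are
# invented. If each step waits for its data first, the "GPU" idles;
# prefetching the next batch in the background hides the transfer time.
import threading
import time

TRANSFER, COMPUTE, STEPS = 0.05, 0.05, 4

def transfer():          # stand-in for inter-GPU data movement
    time.sleep(TRANSFER)

def compute():           # stand-in for the actual calculation
    time.sleep(COMPUTE)

# Naive schedule: transfer, then compute, one step at a time.
t0 = time.perf_counter()
for _ in range(STEPS):
    transfer()
    compute()
naive = time.perf_counter() - t0

# Overlapped schedule: start the next transfer while computing.
t0 = time.perf_counter()
transfer()                       # the first batch still has to arrive
for step in range(STEPS):
    prefetch = None
    if step + 1 < STEPS:
        prefetch = threading.Thread(target=transfer)
        prefetch.start()         # next batch moves in the background
    compute()                    # ...while this batch is being computed
    if prefetch:
        prefetch.join()
overlapped = time.perf_counter() - t0

print(f"naive: {naive:.2f}s, overlapped: {overlapped:.2f}s")
```

With transfer and compute taking equal time, the overlapped schedule roughly halves the wall clock, which is the flavor of win the transcript is describing.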
Starting point is 00:51:24 And so I'll just give you a couple examples of this. It's like, so, you know, this whole ChatGPT thing really exploded because there was this model design called the Transformer, which came out in 2017. This is probably the most cited paper in history now. It's called "Attention Is All You Need." And it combined the sort of regular neural nets that we've been using for a while with something called the attention mechanism, which is this very clever way of, like, kind of contextualizing the information, so that instead of always processing it the same way, it depends on its context. And you sort of automatically learn how to think about that context. And storing all that data while you're training
Starting point is 00:52:14 is, like, one of the major things that uses up memory. And the memory is very important because you can't use, like, the system memory on a computer. You have to do everything on what's called the VRAM, the very fast memory on the GPU itself. And that's pretty limited. And so if you can save on the amount of memory you're using, that's huge, because not only can you do more
Starting point is 00:52:40 with fewer GPUs, but you're also not transferring as much data because it's just smaller. And so anyway, there are these KV (key-value) caches and indices that you need to keep in memory while you're training a transformer model. And they came up with this incredibly smart... I mean, this is probably the coolest thing in the whole paper, the DeepSeek-V3 technical paper, is that they realized that, you know, it's really very wasteful how it's done normally. That you're storing way more data than you need to. That only some very small subset of that data
Starting point is 00:53:19 actually is meaningful. And in fact, by storing more than you need to, you're almost, like, overfitting to noise, and it's not necessary. And so they... Maybe a simple way to explain this for listeners who want some extra help with that is, it's just maybe closer to how your brain works with attention, where when you're applying attention somewhere, you're not thinking about every single thing under the sun all at once. You're kind of focusing on what's necessary.
Starting point is 00:53:35 where when you're applying attention somewhere, you're not thinking about every single thing under the sun all at once. You're kind of focusing on what's necessary. You can't go too far with the anthropomorphizing. Like attention in this context means a very specific thing. And it's not, I don't think it's going to help people understand it. I can't remember if I heard this in your article, maybe a different one. But it's like if a house has, you know, 20 different rooms and lights are on in every single room,
Starting point is 00:54:02 even though a person is only in one room. And this new model only keeps the lights on for the specific room that the person is in at that one given time. It's some loose, broad-stroke pattern like that. Sort of. I mean, it's basically, like, instead of just naively storing this massive amount of, like, key-value data
Starting point is 00:54:20 that shows you, like... it's basically, like, if you have the word "job," it's very different if you say "nice job" versus "I just got a new job" versus, you know, "are you going to be able to handle that job for me?" And it's like, so the word "job" has a certain representation in the model, but that representation has to be altered depending on its context.
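A toy sketch of the compression idea being discussed here (all sizes and matrices are invented, and this is not DeepSeek's actual multi-head latent attention): instead of caching the full key/value vector per token, cache a small latent vector and re-expand it when attention needs it.

```python
# Toy KV-cache compression sketch (invented dimensions, random "learned"
# projections; NOT DeepSeek's real mechanism). Cache an R-dim latent per
# token instead of the full D-dim key/value vector, and re-expand on use.
import random

random.seed(0)
D, R = 64, 8                     # full vector size vs. cached latent size

# In a real model these projections are learned; here they are random.
down = [[random.gauss(0, D ** -0.5) for _ in range(R)] for _ in range(D)]
up = [[random.gauss(0, R ** -0.5) for _ in range(D)] for _ in range(R)]

def matvec(matrix, vec):
    """Multiply a len(vec)-row matrix by a vector, column by column."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(cols)]

kv = [random.gauss(0, 1) for _ in range(D)]   # one token's key/value vector
latent = matvec(down, kv)                     # this is what goes in the cache
restored = matvec(up, latent)                 # re-expanded at attention time

print(f"cached floats per token: {len(latent)} instead of {len(kv)} "
      f"({D // R}x less KV memory)")
```

The point of the sketch is the memory accounting: the cache holds 8 numbers per token instead of 64, at the cost of an extra small matrix multiply when the data is read back.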
Starting point is 00:54:46 That's broadly what attention is about. And that means that for every word, every token, you really have to store lots of different things depending on the context. And that's why it takes up so much memory. And they're able to store that in a very efficient way by, you know, basically just storing this sort of subset of the data in a compressed representation. That's one thing they did that saved a lot. Another thing they did that's very smart
Starting point is 00:55:13 is what's called multi-token prediction. So usually these models predict the next token, the next word, basically, based on the preceding tokens or words. And, you know, one at a time. And so it's kind of this bottleneck. And they're like, well,
Starting point is 00:55:31 what if we try to do, let's say, two or three at a time? Now, the problem with that is you can't really predict the second token without knowing what the first token is. So how can you start on the second token until you know the first token? But you can do what's called speculative decoding. And your speculative decoding might be wrong, in which case you wasted your time computing that second token. But what they did is they got very good at guessing what that second one would be, such that 95% of the time they get it right. And so, basically, just from that, you can sort of double your throughput on inference. And by the way, that's part of the reason why they're
Starting point is 00:56:13 able to charge so little for their API: because that's about inference costs. And so they said that one trick let them almost double throughput for no additional cost. So that's a very clever trick they did. And then they did another very clever trick with... just, you know, these models are basically just a gigantic list of numbers, if you will, called the parameters of the model. And they figured out a way to store those parameters in a much more kind of compressed form.
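Backing up to the multi-token speculation trick for a second, its throughput accounting can be sketched with stand-in "models" (everything here is invented: the "models" are deterministic toy functions, and the ~90% draft accuracy is an assumption; real speculative decoding verifies draft tokens inside one batched forward pass of the big model):

```python
# Toy speculative-decoding sketch (made-up stand-in models, not DeepSeek's
# actual implementation). A cheap draft guesses the next token; the big
# model's verification pass also produces the token after it, so every
# correct guess yields a second token essentially for free.
import random

random.seed(1)

def target_next(context):
    """Stand-in for the big, expensive model: a deterministic toy rule."""
    return (sum(context) * 31 + len(context)) % 50

def draft_next(context):
    """Stand-in for the fast draft model: right about 90% of the time."""
    guess = target_next(context)
    return guess if random.random() < 0.9 else (guess + 1) % 50

context, tokens, target_passes = [7], 0, 0
while tokens < 1000:
    guess = draft_next(context)       # cheap speculation for the next token
    target_passes += 1                # one expensive verification pass
    actual = target_next(context)
    context.append(actual)
    tokens += 1
    if guess == actual:
        # The guess was right, so the expensive pass (which also ran on
        # context + [guess] in the same batch) already produced the token
        # after it too: a second token with no extra expensive pass.
        bonus = target_next(context)
        context.append(bonus)
        tokens += 1

print(f"{tokens} tokens from {target_passes} expensive passes "
      f"(~{tokens / target_passes:.2f}x throughput)")
```

With a 90% hit rate the toy model averages about 1.9 tokens per expensive pass, which matches the "almost double throughput for no additional cost" claim in spirit.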
Starting point is 00:56:47 And, like, normally the way these models are trained is they use more precision. You can think of it almost as, like, more decimal places of accuracy. That's not actually how it works, but it's close enough to understand conceptually. And then often what they do is, once they've trained the model that way, to make it so that it can run on a cheaper GPU,
Starting point is 00:57:08 they do what's called quantization, where they sort of then kind of truncate and round off the numbers a little bit. But that does hurt the accuracy, or the quality, the intelligence of the model. And what the DeepSeek guys did is, instead of having to train at a higher precision and then quantize to a lower precision at the end,
Starting point is 00:57:29 they managed to figure out how to mostly do the entire process end-to-end using the smaller representation. And again, it's one of these things where the efficiency gains pay for themselves so many times. Because not only do you use less memory, but the calculations go faster. And then you don't need to do as much inter-GPU communication, because there's less data.
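A toy illustration of the precision trade-off being described (the Python standard library only exposes half precision via `struct`'s `'e'` format, so this demonstrates FP16 rather than the lower-precision format DeepSeek actually trained in):

```python
# Toy precision sketch: round-trip a number through 2-byte half precision.
# Storing weights this way uses a quarter of the memory of float64; the
# cost is a small rounding error, which is the quantization trade-off.
import struct

def to_fp16_and_back(x):
    """Round-trip a Python float through a 2-byte half-precision encoding."""
    return struct.unpack("e", struct.pack("e", x))[0]

weight = 0.123456789
stored = to_fp16_and_back(weight)
print(f"full precision: {weight!r} (8 bytes)")
print(f"half precision: {stored!r} (2 bytes, {abs(stored - weight):.2e} error)")
```

The error is tiny per weight, but as the transcript notes, whether a whole training run tolerates that rounding end-to-end is the hard part of the trick.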
Starting point is 00:57:58 And so it's like these efficiency gains pay off in multiple different ways. And so that's another thing they did. I mean, there's just this whole laundry list of, like, little tricks and optimizations they did that, when you add them all together... and they're not additive, right? They're multiplicative.
Starting point is 00:58:18 Like each thing, you know, if this thing doubles it and this one increases it by 40%, and this one doubles it also, you're multiplying those multipliers, if you will. And that's how you can get this very big number, like 45 times, which by the way, we don't really know. You know, we don't know for sure. They could have lied about the number of GPU hours they use.
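The "multiplying the multipliers" point is easy to see with made-up factors (the individual numbers below are invented for illustration; only the trick names come from the conversation):

```python
# Made-up per-trick speedup factors: modest-sounding individual gains
# compound multiplicatively into a very large combined number.
import math

factors = {
    "compressed KV cache": 2.0,
    "multi-token prediction": 1.9,
    "lower-precision training": 2.0,
    "communication overlap": 1.5,
    "other kernel tricks": 2.0,
    "smarter training recipe": 2.0,
}
total = math.prod(factors.values())
print(f"combined speedup: {total:.0f}x")
```

Six tricks that each sound like "a 1.5x to 2x win" land in the mid-40s when multiplied out, which is the shape of the 45x ballpark claim.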
Starting point is 00:58:43 One thing is clear, though: they are charging 95% less for inference. So either they're losing money on that, or they really can do at least the inference part much cheaper than, you know, we can here in the West. Because, yeah. That 95% less money for inference, I think, is really the sticker shock number that is sending companies like Meta and OpenAI reeling. Like, Sam Altman had to put out a tweet. No, actually, Meta was up, I think. Because, look, on the one hand, it's bad for Meta in that they have spent so many billions of dollars on GPUs, and they pay so much money to their team to, like, come up with the Llama models and stuff like that. And then it sort of does make them look a little, like, foolish when these guys are able to beat them at their own game, you know, on a shoestring. Right. But at the same time, what they really care about is how much does it cost them to serve AI to all of their billions of users around the world?
Starting point is 00:59:41 And so it's actually good for them if they can cut their costs 95%. That's great. Who it's bad for is OpenAI and Anthropic, because it's going to put more pressure on their pricing. Like, right now OpenAI charges a fortune for the o1 model API. And even, you know, GPT-4o is much more expensive. And so they're going to probably have to respond by cutting, you know, their API prices significantly. Which is where they get their profit from, right? Well, they don't actually have profits.
Starting point is 01:00:18 Both companies are deeply unprofitable at the consolidated level. And I actually suspect even at the incremental marginal level, they're not all that profitable, because they're prioritizing revenue growth above sort of everything else. I don't think it's a case where they lose money on every unit sold on the margin. Any fast-growing company is going to post consolidated losses just because they're always spending on growth and the new model. So the real question is, if OpenAI and Anthropic completely stopped trying to do R&D and making the new model and just tried to milk the business for money with what they have now,
Starting point is 01:01:04 would they be able to eke out a profit? And I think the answer is probably yes. But if they have to cut their pricing by 80%, then it's very unclear. So that's where it starts to be pretty relevant. The Arbitrum Portal is your one-stop hub to entering the Ethereum ecosystem. With over 800 apps, Arbitrum offers something for everyone. Dive into the epicenter of DeFi, where advanced trading, lending, and staking platforms
Starting point is 01:01:29 are redefining how we interact with money. Explore Arbitrum's rapidly growing gaming hub, from immersive role-playing games and fast-paced fantasy MMOs to casual luck-battle mobile games. Move assets effortlessly between chains and access the ecosystem with ease via Arbitrum's expansive network of bridges and on-ramps.
Starting point is 01:01:48 Step into Arbitrum's flourishing NFT and creator space, where artists, collectors, and socialites converge, and support your favorite streamers all on-chain. Find new and trending apps and learn how to earn rewards across the Arbitrum ecosystem with limited-time campaigns from your favorite projects. Empower your future with Arbitrum. Visit portal.arbitrum.io to find out what's next on your web3 journey.
Starting point is 01:02:12 Celo is transitioning from a mobile-first, EVM-compatible Layer 1 blockchain to a high-performance Ethereum Layer 2 built on OP Stack with EigenDA and one-block finality, all happening soon with a hard fork. With over 600 million total transactions, 12 million weekly transactions, and 750,000 daily active users, Celo's meteoric rise would place it among the top Layer 2s. Built for the real world and optimized for fast, low-cost global payments, as the home of stablecoins, Celo hosts 13 native stablecoins across seven different currencies,
Starting point is 01:02:40 including native USDT on Opera MiniPay, and with over 4 million users in Africa alone. In November, stablecoin volumes hit $6.8 billion, made for seamless on-chain FX trading. Plus, users can pay gas with ERC-20 tokens like USDT and USDC and send crypto to phone numbers in seconds. But why should you care about Celo's transition to a Layer 2? Layer 2s unify Ethereum; L1s fragment it. By becoming a Layer 2, Celo leads the way for other EVM-compatible Layer 1s to follow. Follow Celo on X and witness the great Celo happening, where Celo cuts its inflation in half as it enters its Layer 2 era and continues its environmental leadership.
Starting point is 01:03:15 So, Jeffrey, I just want to kind of zoom out and sum everything up. We have this new model, this DeepSeek model, which is 45 times more efficient than, you know, ChatGPT or other competitive models. That's caused a repricing in Nvidia, because people think, like, oh, wow, 45 times more efficient, we just need much less hardware in order to make that outcome happen. We're getting more from less hardware. And so maybe we've been overpricing the hardware. And that's what has shocked the market with a repricing of Nvidia.
Starting point is 01:03:44 And then also now OpenAI and Sam Altman are getting squeezed, because DeepSeek is charging 95% less money for inference requests. But my broad question to you is, well, isn't this the expected outcome? Like, AI and AI technology is on a very steep curve. And we're seeing, you know, breakthrough efficiency gains across the complete tech stack, whether it's hardware or the models. We've always known AI is going to accelerate very, very quickly. And isn't this just what this looks like?
Starting point is 01:04:16 Isn't this kind of the expected outcome here? Like, of course we're going to get more efficient. That's how technology works. Like, why is everyone surprised? I mean, it's clearly not the expected outcome, because the stock wouldn't have moved so much. I mean, it was the expected outcome for me, which is why I wrote my article. But I think the answer is that everyone does expect progress: progress on the hardware front, that every year the chips are going to get faster and bigger, progress on the algorithmic front,
Starting point is 01:04:46 that you're going to come up with a better way to train the models or do inference that's going to make things faster. I mean, when these LLMs first really came out a couple years ago, they had a much more limited context window, like the amount of text you could put into them. That has gone up dramatically. And originally everyone thought that was going to be really hard to scale up, because they thought it was going to dramatically increase the amount of memory required. But people came up with really brilliant inventions, you know, new algorithms, to make it faster. And so people do expect some level of algorithmic improvement,
Starting point is 01:05:23 some level of hardware improvement every year. But they expect it to be, you know, like a Moore's Law-type progression, where it's somewhat predictable. And what really catches people off guard are step-function changes, where overnight it jumps. So like, if the news was that they tripled efficiency, I mean, can you imagine? Like, if you made an air conditioner that was three times more energy efficient, you'd crush the competition, you would get huge market share. Tripling something, in any normal industry, if you had something with triple the mileage for your car, that would do great.
Starting point is 01:06:03 But, like, we've become so used to it in technology that it's like, you know, but 45 times? Okay, now we're talking. That's really crazy. And so when that happens overnight in a way that people didn't anticipate, that's when you get this sort of shocking thing. And, you know, there's this expression of being priced to perfection. The Nvidia share price only looked reasonable to people who extrapolated these curves out. And you have to be very careful when you extrapolate revenue growth that has been going at 120% year over year. And again, it's not just revenues. It's about the margins. And they were basically saying that the margins would maintain and the revenues would keep growing at this incredible rate.
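[Editor's note: the step-function point can be quantified with a back-of-the-envelope calculation. The 1.4x-per-year "predictable" gain below is my illustrative assumption, not a figure from the conversation.]

```python
# Back-of-the-envelope: how many years of steady, Moore's-Law-style
# compounding does a one-off 45x efficiency jump represent? The 1.4x/year
# expectation is an illustrative assumption, not a measured industry figure.
import math

annual_factor = 1.4   # assumed "predictable" yearly efficiency gain
one_off_jump = 45     # the reported DeepSeek-scale gain

years_equivalent = math.log(one_off_jump) / math.log(annual_factor)
print(round(years_equivalent, 1))  # ~11 years of expected progress at once
```

Under that assumption, a 45x jump is roughly a decade of "normal" progress arriving overnight, which is why a market pricing in a smooth curve reacts so violently.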
Starting point is 01:06:57 And as a result, that's why every single investment bank, basically, had a strong buy on Nvidia. All of them. They all got caught completely offsides with this thing. They were all scrambling, honestly, to read my article, and I got inbound requests from some investment banks to, like, help, because nobody even wants to talk to their analysts about this. They want to talk to experts. And so they're scrambling to find experts. Not that I'm even an expert, but compared to the equity analysts on the sell side, apparently I am. And so it was not expected at all that it would happen as a step-function change like this. And that's what is just like the
Starting point is 01:07:41 body blow to the stock, is that, hey, this thing was pricing in, you know, clear skies. And then all of a sudden it's like, oh, there actually are these threats. And again, it's not just DeepSeek. People were ignoring a lot of these other threats. And I don't know why, because these are literally people whose full-time job is to cover Nvidia for Goldman Sachs and Morgan Stanley. And I don't know what the hell they were doing. How come they weren't talking about, you know, the competitive threats to CUDA,
Starting point is 01:08:11 or, like, you know, Cerebras and Groq? And maybe they mentioned it, but they certainly didn't figure out that this actually was going to be really important, and with a step-function change. It's not just a step-function improvement, because it's also a step-function improvement in a slightly different direction
Starting point is 01:08:25 than what the market was thinking, correct? We aren't just skipping ahead on Moore's Law. We're also going in a different direction. Well, it's additive to everything else. It's like, you are going to have faster chips next year. You are going to have more chips next year. You always are going to have other algorithmic improvements on the margin. But on top of that now, every big
Starting point is 01:08:49 AI lab in the world is going to be, you know, the Llama team at Meta, the Anthropic guys. You better believe Zuck has brought these guys into his office and said, we need to use every one of the tricks these guys are using for Llama 4. Yeah. So, like, as a consumer of AI products,
Starting point is 01:09:17 it's great. If you're not exposed to, you know, Nvidia, if you don't have OpenAI equity, private equity, if you are just a consumer, you're stoked. Oh, God. The products coming down the pipeline are going to be sick in very short order. Oh, and not only that, but from the standpoint of, like, you'll be able to run this shit on your own computer. Like, you get a $1,000 Mac laptop, you're going to be able to have, like, AGI
Starting point is 01:09:32 AI on your computer, on tap, privately. And it's the most miraculous thing ever. I mean, no one would have believed this even a few years ago. Is that why Apple is up on the week? Because I think I saw Apple being up three or four percent when Nvidia was down its 20 percent. Apple is one of the guys that, actually, it's so funny, because Amazon and Microsoft and OpenAI, they're all trumpeting these big press releases about the custom chips that they're making. And, you know, Apple's so different. Apple's so secretive. But you know they have, like, one of the best silicon
Starting point is 01:10:07 teams in the world. But they only announce something if they're ready to sell it to consumers. If they're making chips internally for their own uses, no one even freaking knows about it. And all the people who do know about it are signed up with NDAs, and no one talks about it. And for all we know, they have pretty fucking awesome chips already.
Starting point is 01:10:29 And so, but they're essentially users of AI. You know, so it's good for them. It means that they'll be able to use some of these tricks to make some of these models. In fact, there's an app you can get, I think it's called Apollo, on the App Store, that lets you download these models. And if you have, like, an iPhone 16 Pro or something, you can just run this thing. And you could be on an airplane with no internet, in a bunker somewhere, and have essentially, you know, not quite AGI on tap, but certainly something smarter than most college students on a lot of topics. And it's wild to see it go.
Starting point is 01:11:08 You know, you could go into airplane mode and be asking it all these questions about chemistry and physics and history, and it'll give you really good responses at a reasonable pace. And so, yeah, it's good for Apple. It's good for Apple, and I think it's ultimately good for Meta, which is why Meta stock wasn't down.
Starting point is 01:11:25 Right. You know, so it's not a bad thing. It's just bad insofar as, again, it's a recalibration. But, you know, I do think it was excessive that in one day, you know, two trillion of capital got wiped out. But I'm not saying that you should be buying the dip in Nvidia, though, because I think it did get ahead of itself, and look, it could still fall to two trillion.
Starting point is 01:11:55 And, you know, two trillion is still a lot of money. Okay? Like, this is a company that earned, you know, like five billion dollars a few years ago. So that's still quite a big valuation. Jeffrey, there's one last conversation before I let you go, and that's the conversation of synthetic data. And this, I think, comes from just having stronger and better models; that creates this notion of synthetic data. And this is also part of the equation of the rebalancing of how people value things.
Starting point is 01:12:27 Can you just walk us through this synthetic data conversation? What is synthetic data? What do different and stronger models have to do with synthetic data? And what does it mean for the overall supply chain of AI? Well, I'm not so sure that it, I mean, I think it's an important concept. I'm not sure how much it applies to those things. What it is, is that when you're training these models, the pre-training that actually makes the model smart, it's partly a function of how much compute you apply, you know, how many GPUs and how fast they are, but it's also the amount
Starting point is 01:13:03 and quality of the data that you're training on. When DeepSeek says we use 15 trillion tokens in our training set, that's what they're talking about. And the thing is, it's like there's only so much data that's of high enough quality that you'd even want to use it to train a model out there. Like, if you take all of Wikipedia, I don't know how many tokens that is,
Starting point is 01:13:26 It's measured in the low single-digit billions. Not even close, actually. Sorry, maybe billions, yeah. But it's like, if you take all the books out there, we're talking really just a couple trillion. If you talked about all the newspapers that have ever been written, it's a couple trillion. But what you're saying is the quality data that's out there is a processable amount of data.
Starting point is 01:13:57 No, no, I'm saying that we're running out of data. We're running out of data. People write smart books, but they're not writing the books fast enough, basically, to keep supplying us with more and more data. And so that's a big wall that we've been facing. Like, how are we going to keep improving the models if we're not going to be able to scale up the data that they're using? And people say, oh, but you could just take every YouTube video. But it's like, have you seen most YouTube videos? It's not going to make your model smarter.
Starting point is 01:14:24 No, it's going to make it dumber. And so, but there is an exception to this rule. So synthetic data is using an LLM to generate text and then turning around and training a new model on that text. And that sounds very circular. It's like me trying to teach myself in a room without a book or anything, just talking to myself, and I'm going to teach myself. Like, how is that supposed to work in terms of getting new information? Isn't that, in a sense, almost getting high on your own supply, so that it's not going to
Starting point is 01:14:58 help you? And that is sort of true if, let's say, you're talking about, like, the history of the Peloponnesian War or something; you're not going to get anything new by just regurgitating your own output. The exception to all that is if you're talking about logic, math, computer programs, because in those things, the big difference is that you can verify that what you said is correct. So, you know, just like the rules of chess are very simple, but there's almost unlimited complexity in the possible chess games, it's the same thing. There are so many possible simple Python programs that are, like, 100 lines or less that we've only ever seen a tiny subset of them. So you could say, oh, I want to make a Python program that
Starting point is 01:15:44 does XYZ, generate a candidate, and then test it and be like, well, when I run it, did I get that output? And if you did, you know that the program's right. And so now you can say, okay, well, let me add that to the training set. It wasn't in the training set originally, but it is correct and good. And so what you can do is start exploring the world of, like, all possible math theorems, working out all these math proofs, verifying that they're right, and then adding them to the training set. And in that way, you could basically come up with lots of data that's known to be super high quality. And that's why these models are getting better at logic and math
Starting point is 01:16:29 at a much faster rate than they're getting better at anything else. Because you can sort of just keep cranking and getting this synthetic training data, and then the scaling can just keep going forever. So that's why it's sort of funny, when you ask which jobs are most at risk from AI, you know, I think a lot of people thought, well, we're still going to need people who are really, really smart at quantitative stuff.
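[Editor's note: the generate-verify-add loop described above can be sketched roughly as below. The `generate_candidate` stub stands in for an LLM call; everything here is an illustrative assumption, not anyone's actual training pipeline.]

```python
# Sketch of synthetic-data generation for code: propose a program, run it,
# and keep it for the training set only if its output can be verified.
# generate_candidate() is a placeholder for an LLM; a real pipeline would
# also sandbox execution rather than calling exec() directly.

def generate_candidate(task: str) -> str:
    # Stand-in for "ask the LLM to write a program for `task`".
    return "result = sum(range(1, 11))"

def verify(program: str, expected) -> bool:
    scope = {}
    try:
        exec(program, scope)      # run the candidate program
    except Exception:
        return False              # crashing candidates are discarded
    return scope.get("result") == expected

training_set = []
task = "sum the integers 1 through 10"
candidate = generate_candidate(task)
if verify(candidate, 55):         # 1 + 2 + ... + 10 = 55, checkable mechanically
    training_set.append((task, candidate))  # verified, so safe to train on

print(len(training_set))  # 1
```

The key property is exactly the one Jeffrey names: correctness is mechanically checkable, so the model can mint unlimited verified training examples in logic, math, and code in a way it cannot for history prose.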
Starting point is 01:16:54 And it's like, I got news for you. That's the thing that they're going to become superhuman at before anything else. Because you're still going to want to read the history book by a really smart human before you read the AI's history book. But the AI mathematician might be pretty good, you know, two years from now, three years from now. Jeffrey, we talked a lot about your article, which I'll have linked in the show notes, and people can go read it for themselves firsthand. But also just tell us a little bit more about you: where you come from, what you do,
Starting point is 01:17:29 what else you're working on. Sure. So, in my day job for the last couple years, I'm the founder and CEO of Pastel Network, which is actually a crypto project. PSL is our ticker. We trade on a few exchanges like MEXC and Gate. And so we started out as sort of an
Starting point is 01:17:53 NFT platform. It's an interesting project. It's based on the Bitcoin Core proof-of-work concept, but with all these additional layers to it. But in the last year, we've done a big pivot to decentralized AI inference. And so I've written a tremendous amount of code in the last year to essentially let you do inference across all sorts of modalities, all sorts of providers of AI models, including totally uncensored models. And you don't have to dox yourself by giving an email address and a credit card and your IP address. You can just pay with crypto, and it's all pseudonymous. And it's decentralized.
Starting point is 01:18:41 All the inference is being handled by these supernodes that anyone can start up themselves. And, I mean, the example I like to joke about is you can use one of these uncensored versions of the Llama models and say, like, how do I make meth at home? And it will actually just tell you exactly the recipe, whereas, you know, good luck trying that on ChatGPT or Claude. Is this, would you call this the sovereign AI sector? Yeah, yeah. But it's really sort of decentralized. I mean, part of the thinking of it, for me, is not necessarily the consumer level, like ChatGPT, although I did make something like that.
Starting point is 01:19:22 If you go to inference.pastel.network, you can try it all in a browser, and you can do inference across all these models. But it's also meant to be an API, so that if you have another crypto project, let's say you have a prediction market and you want to make it so anyone can, in a decentralized way, create their own prediction event, but you want to have some rules around that. Like, you don't want people to make assassination markets where they're predicting that somebody's going to die by a certain date. So you need to have some kind of moderation. But you don't want it to be that there's a moderator who has the power to delete stuff, right? Because how's that decentralized? So I think the better way to implement something like that is to have an LLM do it in a totally impartial way, where you have this prompt that says
Starting point is 01:20:14 you're not allowed to do an event involving any of these subjects. And then you have the user who wants to create their prediction event. They have to describe what is being predicted. At the time they're trying to create that event in the system, it's going to show it to an LLM. The LLM's going to say yes or no. And based on that, they're going to say,
Starting point is 01:20:37 no, you can't do this. You have to change it. Now, if you have this prediction market that's decentralized, you can't really go and use OpenAI or Claude for this, because that requires an API key hooked up to a credit card, which means it's not decentralized. It can't work like that. It has to actually be decentralized.
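[Editor's note: a toy sketch of the moderation flow described above — a fixed policy prompt, the user's event description, and a yes/no gate. The `ask_llm` function is a keyword-matching stand-in so the example runs; a real system would route the prompt to an actual inference endpoint, and none of these names come from Pastel's real API.]

```python
# Hypothetical sketch of LLM-gated prediction-event creation. ask_llm() is
# a toy stand-in for a real inference call; the policy text and function
# names are illustrative assumptions, not Pastel's actual interface.

BANNED_SUBJECTS = [
    "the death of a specific person",
    "violence against individuals",
]

POLICY_PROMPT = (
    "You moderate a prediction market. Answer only YES or NO.\n"
    "Reject any event involving: " + "; ".join(BANNED_SUBJECTS) + ".\n"
    "May this event be created?\nEvent: {event}"
)

def ask_llm(prompt: str) -> str:
    # Toy stand-in: keyword-match only the event text so the sketch runs
    # end to end. A real deployment would send `prompt` to a model.
    event = prompt.split("Event: ", 1)[-1].lower()
    return "NO" if ("die" in event or "death" in event) else "YES"

def may_create_event(description: str) -> bool:
    answer = ask_llm(POLICY_PROMPT.format(event=description))
    return answer.strip().upper() == "YES"

print(may_create_event("Will ETH close above $4000 on June 1?"))  # True
print(may_create_event("Will person X die before June 1?"))       # False
```

The design point is that the gate is an impartial rule applied at creation time, rather than a human moderator with delete powers, which is what keeps the market credibly decentralized.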
Starting point is 01:20:57 So that's the idea of Pastel: they could use Pastel and say with a straight face, in all honesty, that this is decentralized right down the line, fully decentralized, and that it can never be shut down by just turning off this one API key or credit card. And so that's the basic idea. And then I have some other side projects, like my YouTube transcript optimizer,
Starting point is 01:21:24 which is where I published the article. People are very confused: why is this not on Medium or something, or a Substack? And I'm like, sorry. It's so funny, because I basically was trying to, you know, help the organic search ranking of my little YouTube tool, which has generated, like, $1,000 or something of revenue for me.
Starting point is 01:21:45 And then, in the process, I may have inadvertently contributed to $2 trillion getting wiped off global equity markets. Because, you know, the fact is, look, I don't want people to say that I'm some megalomaniac here, but the fact of the matter is, all of the news headlines came out saying the stock market crashed because of DeepSeek. And I'd like to point out that the DeepSeek V3 technical paper, which talked about the efficiency gains, came out December 27th. A month ago.
Starting point is 01:22:19 That's a month ago. And all these famous people like Andrej Karpathy were all over this, talking about it weeks ago on Twitter. Even the newer model, the R1 model that does the chain of thought, that paper came out a week ago, and people were all over that. So why suddenly on Monday did everything crash? And I'd like to think, in fact I'm pretty sure, it's because I wrote this article in a way that speaks to hedge fund managers, so they can understand it.
Starting point is 01:22:49 And I published it in the middle of the night on Friday. And then it started taking off. And then it got shared by Chamath, who has, you know, whatever, 1.8 million followers, right? And it's been viewed over two million times. Naval.
Starting point is 01:23:06 Naval's account has two and a half million. And then Garry Tan and the Y Combinator account; between them, they have millions of followers. And not only did they share it, but they were very effusive in their praise, saying this is really smart. And that went crazy.
Starting point is 01:23:23 And I can tell you, I have been inundated by requests from huge funds that want to talk to me about this. And I believe that it did, in fact, as crazy as it sounds, precipitate the decline. Obviously, I didn't cause it. It was caused by the underlying situation. But in terms of highlighting it, it didn't come from the investment banks. And I think part of the problem is just that people are talking in different circles. The people who are buying Nvidia, you know, with billions of dollars at a big fund
Starting point is 01:24:01 are not reading the technical papers. And they're not even necessarily reading the tweets from Andrej Karpathy. You know, they're just relying on sort of this consensus of where things are going. And all it took was a really in-depth explanation that made sense to them. And they were like, holy shit, I didn't know this. And, you know, can I say one other funny thing? It's that,
Starting point is 01:24:26 because it's running on my own blog, I have Google Analytics. I can see in real time, like, not who's reading, but where they are. And it's so funny, because when it started going, at first I was so thrilled that 50 people were reading it at once. And then, before I knew it, it was like 1,500 people at any given moment. And it's a 60-minute read. It's 12,000 words, so it's not short. At first it was mostly guys in New York, because that's where all the hedge funds are. But then I noticed, right before I fell asleep on Saturday night, that, you know,
Starting point is 01:24:59 the biggest place where people were reading it was San Jose. And I'm like, who? That sounds like where Nvidia is based. And there were, like, hundreds of people from San Jose reading the thing at the same time. And as of yesterday, when I last checked, over 2,000 people from San Jose had read my article. And, you know, the funny thing about Nvidia is that they've gone up so much that something like 80% of the employees have more than, like, $10 million worth of the stock. And you know it's the main thing that they talk about with their spouse and friends, like, man, I have a lot of the stock. Should I keep staying on for this ride? And they understand the technology, but maybe they
Starting point is 01:25:42 don't understand how to value a company. Right. And they read this, and this thing started passing around like wildfire. And I was like, oh my God, I bet you Jensen's reading this, too. And I think there's a lot of stock that sort of never hit the market because it was awarded as RSUs and options to these people. And it only takes a little bit of that on the margin to start causing imbalances. And so I wouldn't be surprised if a lot of that sell pressure came from Nvidia employees.
Starting point is 01:26:11 But also, these big hedge funds control a lot of this, a lot of fast money players, and they suddenly got spooked. So it's wild to think that, you know, it could have actually been this sort of,
Starting point is 01:26:25 you know, the Reichstag fire, if you will, setting off this whole course of events. But I actually do believe it. You know, I mean, I'm sure people will say, no, this other guy wrote this and this other guy wrote that.
Starting point is 01:26:39 And I was like, yeah, but my thing went pretty freaking viral, and from the right people. And other people's stuff cited yours, your article. Sure. Or, you know, maybe they didn't. I mean, I saw the guy, Ben Thompson from Stratechery; it sort of sounded like he paraphrased my thing without giving me any credit, but whatever.
Starting point is 01:26:55 You know, but I just think it's really funny how there are headline stories today from the New York Times and the Wall Street Journal that both, you know, they're always trying to assign causality to stuff, and they said it was caused by this. And I was like, not really, because it's a ludicrous concept: the 45x efficiency gains were known a month ago. So you have to explain why there was a one-month lag. Okay? Whereas this is very understandable, that it spread like wildfire from thought leaders like Chamath and Naval. I mean, Naval is put on such a pedestal by the VC guys. And, you know, the tech hedge fund guys look up to the VC guys,
Starting point is 01:27:41 like Andreessen Horowitz and all these guys, and the Y Combinator guys. They're the experts, right? And then you have those guys saying that this is a great article, and it's like, well, okay. And so of course that can very quickly convince people. And it's not like you have to convince everyone.
Starting point is 01:27:59 You just have to convince, you know, the guys at, like, Coatue, who are managing $70 billion, that they should maybe sell a little bit to get in front of this. And, you know, that's all you need. And so anyway, I emailed both of the journalists to say, at least, you know, you should be aware that you may have gotten the causality a little wrong on this.
Starting point is 01:28:19 But anyway. Well, Jeffrey, it's an honor to have the original source of the information on the podcast. It was great to have you as a guest. And as these AI wars and Nvidia chip wars progress, and we didn't even get a chance to talk about USA versus China, maybe we can get you back on to just keep on commentating. Yeah, I really appreciate you coming on. Great. Thanks a lot.
Starting point is 01:28:40 Bankless Nation, you guys know the deal. Crypto is risky. You could lose what you put in. But it sounds like the traditional market is also risky. But we're headed west. This is the frontier. It's not for everyone,
Starting point is 01:28:50 but we're glad you are with us on the bankless journey. Thanks a lot.
