Dwarkesh Podcast - Dylan Patel — Deep Dive on the 3 Big Bottlenecks to Scaling AI Compute
Episode Date: March 13, 2026

Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power. And walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy!

Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

* Mercury has already saved me a bunch of time this tax season. Last year, I used Mercury to request W-9s from all the contractors I worked with. Then, when it came time to issue 1099s this year, I literally just clicked a button and Mercury sent them out. Learn more at mercury.com.

* Labelbox noticed that even when voice models appear to take interruptions in stride, their performance degrades. To figure out why, they built a new evaluation pipeline called EchoChain. EchoChain diagnoses voice models' specific failure modes, letting you understand what your model needs to truly handle interruptions. Check it out at labelbox.com/dwarkesh.

* Jane Street is basically a research lab with a trading desk attached – and their infrastructure backs this up. They've got tens of thousands of GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This is what it takes to find subtle signals hidden deep within noisy market data. If this sounds interesting, you can explore open positions at janestreet.com/dwarkesh.

Timestamps

(00:00:00) – Why an H100 is worth more today than 3 years ago
(00:24:52) – Nvidia secured TSMC allocation early; Google is getting squeezed
(00:34:34) – ASML will be the #1 constraint for AI compute scaling by 2030
(00:56:06) – Can't we just use TSMC's older fabs?
(01:05:56) – When will China outscale the West in semis?
(01:16:20) – The enormous incoming memory crunch
(01:42:53) – Scaling power in the US will not be a problem
(01:55:03) – Space GPUs aren't happening this decade
(02:14:26) – Why aren't more hedge funds making the AGI trade?
(02:18:49) – Will TSMC kick Apple out from N2?
(02:24:35) – Robots and Taiwan risk

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Transcript
All right, this is the episode of my roommate teaches me semiconductors.
It's also the sendoff for this current set.
Yeah, you know, after you use it, I'm like, I can't use this again.
I got to get out here.
No sloppy seconds for Dwarkesh.
Okay, Dylan is the CEO of SemiAnalysis.
Dylan, the burning question I have for you: if you add up the big four (Amazon, Meta, Google, Microsoft), their combined forecasted capex that you published recently is $600 billion this year. And given, you know, yearly prices of renting that compute, that would be like close to 50 gigawatts.
Now, obviously, we're not putting on 50 gigawatts this year.
So presumably that's paying for compute that is going to be coming online over the coming years.
So I have a question about how to think about the timeline around when that CAPEX comes online.
Similar question for the labs where, you know, OpenAI just announced that they raised $110 billion.
Anthropic just announced they raised $30 billion.
And if you look at the compute that they have coming online this year (you should tell me how much it is), is it not another four gigawatts total that they'll have this year? It feels like the cost to rent the compute that OpenAI and Anthropic will have this year, to, like, sustain their compute spend at, you know, $10 to $13 billion a gigawatt... those individual raises alone are, like, enough to cover their compute spend for the year.
And then this is not even including the revenue that they're going to earn this year.
So help me understand. First, what is the timescale on which the big tech capex is actually coming online? And two, what are the labs raising all this money for, if, like, the yearly price of a one-gigawatt data center is, like, $13 billion?
So when you talk about the capex of these hyperscalers, right, on the order of $600 billion, and you look across the rest of the supply chain, that gets you to on the order of a trillion dollars. A portion of this is, you know, immediately for compute going online this year, right: the chips and the other parts of capex that do get paid this year. But there's a lot of setup capex as well, right?
So when we have, when we're talking about 20 gigawatts this year in America, roughly, incremental...
Incremental added capacity.
A portion of this is not spent this year. A portion of that capex was actually spent the prior year.
And so when you look at, hey, Google's got $180 billion: actually, a big chunk of that is spent on turbine deposits for '28 and '29. A chunk of that is spent on data center construction for '27. A chunk of that is spent on, you know, power purchase agreements and down payments and all these other things that they're doing for further out into the future, so that they can set up this super fast scaling, right?
And this applies to all the hyperscalers and other people in the supply chain.
And so, you know, 20 gigawatts roughly deployed this year, a big chunk of that being hyperscalers, a chunk of it not.
And all of these companies, their biggest customers are Anthropic and OpenAI. Anthropic and OpenAI are at, you know, roughly two, two and a half gigawatts, and one and a half gigawatts right now. They're trying to scale to much larger, right?
If you look at what Anthropic has done over the last few months, you know, $4 billion, $6 billion of revenue added, and if we just draw a straight line: hey, yeah, they'll add another $6 billion of revenue a month. People would argue that's bearish and that they should go faster. What that implies is that they're going to add $60 billion of revenue across the next 10 months, right? And $60 billion of revenue, at the current gross margins that Anthropic has, at least as last reported by media, would imply that they have, you know, roughly $40 billion of compute spend for the inference behind that $60 billion of revenue. That $40 billion of compute, at roughly $10 billion a gigawatt rental cost, means that they need to add four gigawatts of inference capacity just to grow revenue. And that's saying that their research and development training fleet stays flat, right?
So, you know, in a sense,
Anthropic needs to get to well above
5 gigawatts by the end of this year,
and it's going to be really tough for them to get there,
but it's possible.
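A minimal Python sketch of that arithmetic, using the round numbers quoted in the conversation (the revenue pace, margin split, and rental cost are rough figures from above, not reported financials):

```python
# Back-of-envelope: inference capacity implied by the revenue ramp
# described above. All inputs are rough figures from the conversation.

revenue_added_per_month = 6e9     # ~$6B of new revenue per month
months = 10
new_revenue = revenue_added_per_month * months          # ~$60B

compute_share_of_revenue = 2 / 3  # implied by the quoted gross margins
compute_spend = new_revenue * compute_share_of_revenue  # ~$40B

rental_cost_per_gw_year = 10e9    # ~$10B/year to rent one gigawatt

extra_gigawatts = compute_spend / rental_cost_per_gw_year
print(f"New revenue: ${new_revenue / 1e9:.0f}B")
print(f"Compute spend: ${compute_spend / 1e9:.0f}B")
print(f"Inference capacity needed: ~{extra_gigawatts:.0f} GW")  # ~4 GW
```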
Can I ask a question about that?
So if Anthropic was not on track
to have 5 gigawatts by the end of this year,
but it needs that to serve both
the revenue that's gone crazier than expected,
and maybe it's going to be even more than that,
plus the research and training to make sure its models
are good enough for next year,
how... where is that going to come from?
You know, Dario, when he was on your podcast, was very, very, like, conservative. He's like, you know, I'm not going to go crazy on compute, because if my revenue inflects at a different rate, at a different point, I don't want to go bankrupt. You know, I want to make sure that we're being responsible with this scaling. But in reality, you know, he's definitely missed the boat in terms of, like, going like OpenAI, which was: let's just sign these crazy fucking deals, right? And OpenAI has kind of got way more access to compute than Anthropic by the end of the year.
And so what does Anthropic have to do to get the compute?
Well, they have to go to lower quality providers that they would not have gone to before, right? You know, optimally... Anthropic, at least historically, has had the best quality providers, like Google and Amazon, you know, the biggest companies in the world, and now Microsoft. And now they're expanding across the supply chain and going to other players that are newer.
OpenAI has been, you know, a bit more aggressive on going to many players.
Yes, they have tons of capacity from Microsoft.
They have Google and Amazon as well, but they also have like tons with CoreWeave and Oracle.
And they've gone to, like, random companies, or, you know, one would think random companies, like SB Energy, who has never built a data center in their life, but, you know, they're building data centers now for OpenAI. So they've gone to them and many others, like Nscale and others, that they're getting capacity from. And so there's this, like, conundrum for Anthropic, because they were so conservative on compute, because they didn't want to go crazy, right? And in some sense, a lot of the financial freakouts in the second half of last year were like: OpenAI signed all these deals, but they don't have the money to pay for them.
Okay, Oracle stock's going to tank.
Oh, okay, CoreWeave stock's going to tank. Oh, okay, like, you know, all these companies' stocks tanked.
And credit markets went crazy because people were like, the end buyer can't pay for this.
Now it's like, oh, wait, they raised a ton of money.
Okay, fine, they can pay for it.
But in that sense, Anthropic was a lot more conservative.
They were like, we'll sign contracts, but we'll be principled and we'll purposely
undershoot what we think we can possibly do and be conservative because we don't want to
potentially go bankrupt.
The thing I want to understand is, so, you know: what does it mean to have to acquire compute in a pinch? Is it that you have to go with, like, neoclouds... is it that they have worse computers? Like, in what way is it worse? And is it that you have to pay gross margins to a compute provider that you wouldn't have otherwise had to pay, because you're coming in at the last minute?
Who built the spare capacity such that it's available for Anthropic and OpenAI to get at the last minute?
And, like, basically, what is the concrete advantage that OpenAI has gotten, if they end up at similar compute numbers by 2027? Is it just, like, they're going to end this year with different gigawatts? If so, how many gigawatts are Anthropic and OpenAI going to have by the end of this year?
Yeah.
So, to acquire excess compute... I mean, yes, there is capacity at hyperscalers, and not all contracts for compute are long-term, right, five years. There's compute, H100s in 2023 or 2024 or 2025, that was signed at not-five-year deals, right? OpenAI, the vast majority of their compute is signed at five-year deals. But there were many other customers that had one-year, two-year, three-year deals, six-month deals, on-demand.
And as these contracts roll off, who is the participant in the market most willing to pay the price? And in this sense, right, we've seen H100 prices inflect a lot and go up, and people willing to sign long-term deals at above $2 an hour, even, right?
Like, I've seen deals where certain AI labs (I'm going to be a little bit vague here for a reason) have signed at as high as $2.40 for two to three years for H100s. Which, if you think about the margin: it's roughly $1.40 an hour for Hopper, when you release it, to build it out across five years. And now, two years in, you're signing deals that are two to three years at $2.40. Those margins are way higher.
And so now you can crowd out all of these other suppliers, whether it's Amazon that had these, or CoreWeave had these, or Together AI or Nebius or whoever it is, right? You know, these neoclouds are the firms that had a higher percentage of Hopper in general, because, A, they were more aggressive on it, and, B, they tended to sign shorter-term deals. You know, not CoreWeave, but the others tended to sign shorter-term deals. And so, hey, if I want Hopper, there is some capacity out there. And then also, while most of the capacity at, like, an Oracle or a CoreWeave is signed for long-term deals, in terms of Blackwell, anything that's going online this quarter is already sold. And in some cases, they're not even hitting all the numbers that they promised they would sell, because there are some data center delays, not just those two, but, like, Nebius and all the other folks, Microsoft, Amazon, Google. But there are a lot of neoclouds,
as well as some of the hypers who have capacity they're building that they did not sell yet,
or capacity that they were going to allocate to some internal use that is not necessarily
super AGI focused that they may now turn around and sell. Or they may, you know, in the case of
Anthropic, they don't have to have all the compute directly, right? Amazon can have the compute and serve Bedrock, or Google can have the compute and serve Vertex, or Microsoft can have the compute and serve Foundry, and then do a revenue share with Anthropic, or vice versa.
Basically, you're saying Anthropic is having to pay either this, like, 50% markup, in the sense of the revenue share, or in the sense of last-minute spot compute, that they wouldn't have otherwise had to pay had they bought the compute early.
Right.
And, you know, there's a trade off there.
But also, at the same time, you know, for a solid, like, four months, everyone was like: OpenAI, we're not going to sign deals with you.
Like, that sounds crazy, right?
Because you guys don't have the money.
Now everyone's like: yeah, OpenAI, we believed you the whole time.
We can sign any deal because you've raised all this money.
But in a sense, Anthropic is constrained. There are not that many incremental buyers of compute yet, because Anthropic hit the capabilities here first, where their revenue is booming.
That's interesting.
Like, that's the thing, you know, because otherwise we're like: well, having the best model is an extremely depreciating asset, where, you know, three months later you don't have the best model. But, like, the reason it's important is that you can sign these deals
and then lock in the compute in advance, get better prices. Doesn't this also imply, by the way (and maybe this is an obvious point), but at least until recently, people had made this huge point about: oh, what is the depreciation cycle of a GPU? And the bears, the Michael Burrys or whatever, have said: look, people are saying four or five years for these GPUs, and in fact, maybe because the technology is improving so fast or whatever, it makes sense to have two-year depreciation cycles for these GPUs. Which increases the sort of reported amortized capex in a given year, and so maybe makes it financially less lucrative to build all these clouds.
But in fact, you're pointing at: maybe the depreciation cycle is even longer than five years. Because if we're using Hoppers, and then, especially if AI really takes off, and in 2030 we're like, fuck, we've got to get the seven nanometer fabs up, we've got to go back to the A100s, like, turn on the A100s again... then it's like, actually, the depreciation cycle is incredibly long. And I feel like that's an interesting financial implication of what you're saying.
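As a quick illustration of why the assumed depreciation schedule moves the reported numbers so much, here is a generic straight-line depreciation sketch in Python (illustrative figures only, not anyone's actual accounting):

```python
# Straight-line depreciation: the same $10B GPU fleet looks very
# different on the income statement depending on assumed useful life.
# Illustrative numbers only.

fleet_capex = 10e9

for useful_life_years in (2, 5):
    annual_depreciation = fleet_capex / useful_life_years
    print(f"{useful_life_years}-year life: "
          f"${annual_depreciation / 1e9:.1f}B/year depreciation expense")

# 2-year life: $5.0B/year -> much higher reported cost per year
# 5-year life: $2.0B/year -> the standard assumption the bears dispute
```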
There's a few strings to pull on there.
One is what happens to depreciation of GPUs, right?
And I guess I didn't answer your prior question, which is, like: Anthropic, I think, will be able to get to, like, five gigawatts-ish, maybe a little bit more, by the end of the year, through themselves as well as their product being served through Bedrock or through Vertex or through Foundry. I think they'll be able to get to five or six gigawatts, which is way above their, like, initial plans, right? You know, and anyways, OpenAI will be roughly the same, maybe a little higher.
Actually, a little bit higher based on our numbers.
But anyways, the depreciation cycle of a GPU, right?
Michael Burry was saying it's, you know, three years or less, right? That's, like, sort of his argument.
And there's sort of two ways and lenses to look at this.
Like, mechanically, you know, there's a TCO model, right,
total cost of ownership of a GPU where we sort of project pricing out for GPUs
and build up the total cost of a cluster.
But there's a number of costs, right?
There's your data center cost, right?
There's your networking costs.
There's your smart hands and people in the data center swapping stuff out.
There's your spare parts, right?
There's your actual chip cost.
There's your server costs.
All these various costs get lumped together, and there are some depreciation cycles on them. You know, there are certain credit costs on them.
And you get to, okay, that's how you build up: hey, an H100 costs $1.40 an hour to deploy at volume across five years, if your depreciation
is five years.
And then if you sign a deal at $2 an hour for those five years, your gross margin is
roughly 35%.
It's a little bit above that.
But, you know, if you sign it for $1.90, it's 35% roughly.
And then you assume that at that fifth year, the GPU falls off a bus, right?
It's dead.
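A minimal sketch of that TCO-style arithmetic, using the round numbers above. Note the quoted ~35% could be read either as margin on the rental price or as markup over the hourly cost; the sketch prints both rather than asserting one:

```python
# Rough total-cost-of-ownership (TCO) math for an H100 rental deal,
# using the round numbers from the conversation. Real TCO models break
# out data center, networking, spares, staffing, and financing costs;
# here they are all rolled into the quoted ~$1.40/hour figure.

all_in_cost_per_hour = 1.40   # ~cost to deploy an H100 at volume, 5-yr life

for rental_price in (1.90, 2.00, 2.40):
    gross_margin = (rental_price - all_in_cost_per_hour) / rental_price
    markup = (rental_price - all_in_cost_per_hour) / all_in_cost_per_hour
    print(f"${rental_price:.2f}/hr -> margin on price {gross_margin:.0%}, "
          f"markup over cost {markup:.0%}")

# Either way you slice it, a $2.40/hr deal signed two years into the
# GPU's life is far richer than the original ~$2/hr economics.
```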
And in some cases, you know, the sort of GPU depreciation argument people are making is: well, if you didn't sign a long-term deal, because every two years Nvidia is tripling, quadrupling the performance while only 2x-ing the price, or increasing the price 50%, then the price of an H100... sure, maybe the value in the market was $2 at 35% gross margins in 2024, but in '26, when Blackwell is in super high volume and deploying millions a year, you're actually now worth a dollar an hour. And when Rubin in '27 is in super high volume (even though it starts shipping this year, it isn't in super high volume until next year, doing millions of chips a year deployed into clouds), you've got another 3x in performance and another 50% or 2x in price. Actually, the Hopper's only worth 70 cents an hour.
And so the price of a GPU would continue to fall. That's, like, one lens.
The other lens is: what is the utility you get out of the chip, right? Because if you could build infinite Rubin, or infinite of the newest chip, then yes, that's exactly what would happen: the price of a Hopper would fall, at a spot or short-term contract rate, as the new chips come out and the price per performance goes up.
But because you are so limited on semiconductors
and deployment timelines and all these things,
you end up with actually what prices these chips
is not, hey, what's the comparative thing
I can buy today?
It's actually what is the value
I can derive out of this chip today, right?
And in that sense, let's take GPT-5.4. GPT-5.4 is both way cheaper to run than GPT-4 and has fewer active parameters. It's much smaller, right, in that sense of active parameters, because it's a sparser MoE, versus GPT-4 being a coarser MoE. Plus, there have also been so many other advancements in training, RL, model architecture, et cetera, data quality, all these things, that have made GPT-5.4 way better than GPT-4, and it's cheaper to serve. And so when you look at an H100, it can serve more tokens per GPU of GPT-5.4 than if you had run GPT-4 on it. So in some sense, it's producing more tokens of a model that is of higher quality.
Interesting.
And so in some sense, you know, obviously GPT-4: what was the maximum TAM for its tokens? You know, maybe it was a few billion dollars, maybe it was tens of billions of dollars; adoption takes time. For GPT-5.4, that number is probably north of $100 billion, but there's an adoption lag and there's competition. Other people are getting it, and there are the constant improvements that everyone else is making. So if improvements stopped, you know, here, the value of an H100 is now predicated on the value that GPT-5.4 can get out of it, instead of the value that GPT-4 can get out of it.
And the margins, and all that stuff that these labs are doing... they're in a competitive environment, so their margins can't go to infinity. So you sort of have this, like, dynamic that is quite interesting.
An H100 is worth more today than it was three years ago.
That's crazy.
And, I mean, it's also interesting from the perspective of like, just take that forward.
If we had actual AGI models developed, if we had, like, genuinely a human on a server... On a FLOP basis (these are such hand-wavy numbers about how many FLOPS the brain can do), an H100 is something like 1e15 FLOPS, which is how much some people estimate the human brain does. Obviously, in terms of memory, the human brain has way more: an H100 is, like, 80 gigabytes, and the brain might have petabytes.
Oh, yeah, you've got petabytes?
Name a petabyte of ones and zeros, bro.
Name me a string.
Well, this is actually the point.
Or like actually in...
No, we've just got the best sparse attention techniques ever.
Genuinely, right?
Like, in the sort of like amount of information that is compressed, it might be petabytes.
But like the actual...
You know, it's, like, an extremely sparse MoE.
But anyways, imagine a human knowledge worker can produce six figures a year of value. And so if an H100 can produce something close to that, if we had actual humans on a server, the value of an H100 is... like, it can repay itself in the course of, like, a couple of months.
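A hedged back-of-envelope on that payback claim. The hardware price and worker-value figures here are illustrative assumptions, and this ignores power, data center, and serving costs:

```python
# If an H100 could do the work of a ~$100k/year knowledge worker,
# how fast would it pay for itself? Illustrative assumptions only.

h100_price = 30_000        # ballpark all-in price of one H100, USD (assumed)
value_per_year = 100_000   # "six figures a year of value"

payback_months = h100_price / (value_per_year / 12)
print(f"Payback: ~{payback_months:.1f} months")   # ~3.6 months
```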
So as I've been going through everything to prep for taxes,
I realized that I worked with over 50 different contractors last year,
from cinematographers to audio technicians to editors.
And I owed all of them 1099s.
In the past, I've just used a spreadsheet and a big folder of invoices
to figure out who I need to collect tax forms from.
But with so many contractors, this takes a bunch of time,
and I've almost missed some people.
This year, though, Mercury made my process way more straightforward.
Whenever I pay somebody in 2025, I just hit a toggle to have Mercury request a W-9 from them.
Because of that, everything that I needed to issue 1099s got sent directly to Mercury.
I literally just clicked a button and Mercury generated and sent them all out.
This is just one of the many things that I never would have assumed that a banking platform could just handle for me.
Mercury has a bunch of features like this, which are going to collectively save me multiple days this tax season.
You can learn more at mercury.com.
Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.
So when I interviewed Dario, the point I was trying to make is not that I think the singularity is two years away, and therefore Dario desperately needs to buy more compute. Although the revenue is certainly there that he needs to buy more compute. But the point I was trying to make is that, given what Dario seems to be saying, given his statements that we're two years away from a data center of geniuses, certainly not more than five years away, and a data center of geniuses should be generating trillions upon trillions of dollars of revenue, it just does not make sense why he keeps making these statements about being more conservative on compute, or, to your point, being less aggressive than OpenAI on compute. And I guess that point got lost, because people were, like, roasting me about: oh, this podcast was me trying to convince this, like, multi-hundred-billion-dollar company's CEO... like, why don't you just YOLO it, bro? But no, I was trying to say that internally his statements are inconsistent.
Anyway, so it's good to iron it out.
Yeah, I think, you know, going back to, like, sort of the earlier view that if the models are so powerful, the value of a GPU goes up over time: right now only OpenAI and Anthropic have that viewpoint. As we go further and further out, actually everyone is going to, even with open source models, be able to, like, sort of start to see that value skyrocket per GPU.
And so, in that sense, you should commit now to compute. But interestingly, in, like, Anthropic fashion, right, you know, there's a bit of a meme that they have commitment issues, that they're, like, sort of polyamorous. Not Dario, but this is a bit of a meme.
This explains everything.
By the way, there's this interesting economics effect called Alchian-Allen, which is the idea that
if you increase the fixed cost of two goods, one of which is higher quality and one of which is lower quality, that will make people choose the higher-quality good on the margin.
So to give a specific example: suppose the, you know, better-tasting apple costs $2, and then, like, the shittier apple costs $1. Okay, now suppose you put an import tariff on them. And so now it's $3 and $2 for the great apple and the medium apple, right?
Is that because they both increased by a dollar? Or should it be, like, a 50% increase?
No, no, because they both increased by a dollar. The whole effect is that if there's a fixed cost that's applied to both, the relative price, the price difference between them, the ratio, changes. So previously the more expensive one was 2x more expensive, and now it's just 1.5x more expensive.
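A tiny sketch of that apples arithmetic, just to make the ratio shift explicit (numbers from the example above):

```python
# Alchian-Allen in one loop: a fixed additive cost (a tariff, shipping,
# or, by analogy, compute) shrinks the *relative* premium of the
# higher-quality good, even though both absolute prices rise.

good_apple, cheap_apple = 2.0, 1.0

for fixed_cost in (0.0, 1.0):
    ratio = (good_apple + fixed_cost) / (cheap_apple + fixed_cost)
    print(f"fixed cost ${fixed_cost:.0f}: "
          f"${good_apple + fixed_cost:.0f} vs ${cheap_apple + fixed_cost:.0f} "
          f"-> premium {ratio:.1f}x")

# fixed cost $0: $2 vs $1 -> premium 2.0x
# fixed cost $1: $3 vs $2 -> premium 1.5x
```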
So I wonder if, applied to AI, that would mean that, look, if GPUs are going to get more expensive, there will be a fixed cost increase in the price of compute.
Yes.
And as a result, that will push people to be willing to pay higher margins for slightly better models. Because the calculus is: I'm going to be paying all this money for the compute anyways, so I might as well just pay slightly more to make sure it's, like, the very best model, rather than a model that's slightly worse.
Right. So the Hopper went from $2 to $3. And if a Hopper can make a million tokens of Opus, and it can make two million tokens of Sonnet, the price differential between Opus and Sonnet has decreased, because the price of the GPU has increased by a dollar, from two to three.
Interesting.
I think that makes a ton of sense.
Also, I think we just see that all of the volume is on the best models today; all the revenue is on the best models today.
And in a compute-limited world, there are sort of two things that happen, right? A: companies that have locked up, you know, that don't have commitment issues, that have these five-year contracts for compute, they've kind of locked in a humongous margin advantage, because they've locked in compute for five years at the price it transacted at five years ago, or three years ago, or two years ago, whatever it is. Whereas if you're now three years into that five-year contract, and someone else's two-year contract or three-year contract rolled off, and now you're trying to buy at, you know, modern pricing, when you're priced to the value of models, the price is going to be up a lot more. And so, in a sense, the person who committed early has better margins in general.
And the percentage of the market that is in long-term contracts is much larger than the
percentage of the market in short-term contracts that can be this sort of flex capacity
that you add at the last second.
And at the same time, right, so where does the margin go, right?
Because models get more valuable.
How much can the cloud players flex their pricing?
Well, in fact, like, if you look at CoreWeave, their average term duration is, like, over three years
right now for like 90% plus of their compute.
It's over three years.
And so they end up with this like conundrum of like, well, they can't actually flex price.
But every year they're adding incrementally way more capacity than they had previously, right?
This year alone, right, Meta's adding as much capacity as they had in their entire fleet of compute and data centers in 2022, for all purposes: for serving WhatsApp and Instagram and Facebook, and doing AI, right? They're adding that alone this year.
So in the same sense, you know, you talk about Meta doing that. CoreWeave and Google and Amazon, all these companies are adding insane amounts of compute year on year on year, and that new compute gets transacted at the new price.
So in a sense, yes, you've locked in, as long as we're in a sort of takeoff, right? Oh, OpenAI went from 600 megawatts to two gigawatts last year, and from two gigawatts to, you know, six-plus this year, and, you know, six to 12 next year, right?
The incremental added compute is where all the cost is, not the prior long-term contracts.
So then who holds the cards for charging margin? The infra providers, right? So now the cloud players, the neoclouds or the hypers, can charge the margin... oh, they can't, because... or they can't to some extent. But then, as you go upstream: oh, well, who has access to all the memory and logic capacity? Well, it's Nvidia, for the most part. They've signed a lot of long-term contracts. You know, they've got, like, $90 billion of long-term contracts today, and they're negotiating three-year deals with the memory vendors today. You know, you've got, obviously, Amazon and Google through Broadcom, and, you know, Amazon directly, and all these companies, sort of, AMD. These companies hold all the cards, because they've secured the capacity. And TSMC is not raising prices, but memory vendors are, just, like, sort of, to some extent, raising price a lot, right? So they're going to double or triple price again. But then they're also signing these long-term deals.
So who is able to accrue the margin dollars is actually, you know, potentially the clouds, potentially the chip vendors and the memory vendors, until TSMC or ASML, like, break out and they're like: no, actually, we're going to charge a lot more. But at the same time, do the model vendors get to charge crazy margins?
I think at least this year we're going to see margins for the model vendors go up a lot,
right?
Because they're so capacity constrained, they have to destroy demand, right? There's no way Anthropic can continue at the current pace without destroying demand.
Yeah.
Let's get into logic and memory, and how specifically Nvidia has been able to lock up so much of both. So, I think according to your numbers, by '27 Nvidia is going to have, like, 70-plus percent of N3 wafer capacity, or something like that, around that area. And then I forget what the numbers were for memory at SK Hynix and Samsung and so forth.
But if you look at... so think about how the neocloud business works and how Nvidia works with that, or how the RL environment business works and how Anthropic works with that. In both those cases, Nvidia is purposely trying to fracture the complementary industry, to make sure that they have as much leverage as possible. So they're giving, you know, allocation to random neoclouds to make sure that there's not one person that has all the compute. Similarly, Anthropic or OpenAI, when they're working with the data providers, they say: no, we're going to just seed a huge industry of these things, so that we're not locked
into any one supplier for data environments.
And I wonder why, on the three-nanometer process (that's going to be Trainium 3, that's going to be TPU v7, other accelerators potentially), why is TSMC just giving it all up to Nvidia, rather than trying to fracture the market?
Yeah, so I think there's a couple like points here, right?
On three nanometer, you know, if we go back to last year,
the vast majority of three nanometer was Apple.
Apple's being moved to two nanometer,
memory prices are going up so Apple's volumes may go down, right?
Because as memory prices go up, either they cut margin or they move on.
You know, there's some time lag because they have long-term contracts.
But basically, Apple likely reduces demand slash moves to two nanometer faster, where two nanometers
is only capable of sort of mobile chips today.
And in the future, AI chips will move there.
So sort of Apple has that.
And then Apple is also talking to third-party vendors, because they're getting squeezed out of TSMC a little bit, because TSMC's margins on high-performance computing (HPC, AI chips, etc.) are higher than they are for mobile, because they have a bigger advantage in HPC than they do in mobile.
But anyways, when you look at it, what's the calculus TSMC is running here? Actually, they're providing really good allocations to companies that are doing CPUs, right? So when you think about, hey, Amazon has Trainium and Amazon has Graviton: both of those are on 3 nanometer, Graviton being their CPU, Trainium being their AI chip. TSMC is actually much more excited to give allocation to Graviton than they are to Trainium, because they view the CPU business as more stable, long-term growth, right?
And as a company that is conservative and doesn't want to ride cycles of growth too hard,
you actually want to allocate to the market that is more stable and lower growth rate first
before you allocate all the incremental capacity to the fast growth rate market.
Now, that is the case generally.
And so when you look at, like, hey, same for AMD, right? The allocations they get on, you know, their CPUs: TSMC is much more excited about those than they are about their GPUs.
Likewise for Amazon.
And Nvidia is a bit unique, because, yes, they have CPUs; yes, they make switches; yes, they make networking. They make NVLink. They make all these different InfiniBand and Ethernet products, all these different products, NICs.
By and large, most of these things will be on 3 nanometer by the end of this year with the Rubin launch and all the chips that are in that family.
The GPU being the most important one.
And yet, Nvidia is getting the majority of supply, right?
Part of this is because you look at the market, and, like, sort of, you know, TSMC and others, there are many ways that they forecast market demand.
But also it's market signal, right?
The market signaled, hey, we need this much capacity next year.
We need this much.
We need this much.
We'll sign non-cancelable, non-returnable.
We may even pay deposits, right?
Things like this.
Nvidia just did it way earlier than Google or Amazon. And in some cases, Google and Amazon had stumbling blocks.
You know, there was one... one of the chips got delayed slightly by a couple of quarters, Trainium, and all these sorts of things happened.
And so in that case, there was a huge sort of, like: okay, well, these guys are delaying, but Nvidia is wanting more, more, more, more, and we are checking with the rest of the supply chain.
Is there enough capacity?
Right.
So they're going to all the PCB vendors, and they're saying: hey, is there enough Victory Giant? Is there enough PCB? Victory Giant is, like, one of the largest suppliers of PCBs to Nvidia, and they're a Chinese company. All the PCBs come from China, sort of, from them, or many of them.
And anyways, they're like, do you have enough PCB capacity?
Great.
Oh, hey, memory vendors, who has all the memory capacity?
Oh, okay, Nvidia does.
Great.
So when you look at it, in the same way, you know: who is AGI-pilled enough to buy compute on long timelines, at levels that seem ridiculous to people who aren't AGI-pilled, but nonetheless is willing to pay a pretty good margin and sign it now, because they view that in the future that ratio is screwed up?
The same thing happens with the supply chain for semiconductors, right?
Nvidia was... while I don't think Nvidia is quite AGI-pilled, right? You know, Jensen doesn't believe software is going to be fully automated, and all these things, right?
Accelerated computing, not AI chips, right? That's what he calls it, right?
Yeah, because, I mean, I think there's a broader term, right?
AI is within that, but, like, physics modeling and simulations and like...
Or maybe just like he's not embracing the sort of, like, main use case.
I think he's embracing it.
But, like, I just don't think he's, like, AGI-pilled like Dario, right?
Or Sam.
But he's still way, way more AGI-pilled than Google was at Q3 of last year, or Amazon was at Q3 of last year, and he saw way more demand, right?
And the reason is pretty simple.
You know, you can see all the data center construction.
He's like, okay, I want to have this market share.
You know, we sort of, like, have all the data centers tracked, and, you know, you can see there are a lot of data centers where you could say: well, they could be one or the other, right? And so to some extent, Google and Amazon, you know, Google especially, even though their TPU is just better for them to deploy, they have to deploy a crapload of GPUs, because they don't have enough TPUs to fill up their data centers. They can't get them fast enough.
Wait, can I?
So I have a question about that.
Google sold, I think, a million (was it the v7s, the Ironwoods?) to Anthropic. And you're saying, in general, there's this big bottleneck right now, this year or next year, I mean, I guess going forward forever now, in, you know, logic, memory, the stuff that it takes to build these chips. And Google has DeepMind, the other, third prominent AI lab. And if this is the big bottleneck, why would they sell it rather than just giving it to DeepMind?
Right. So this is, again, like, a problem of, like... you know, DeepMind people were like: this is insane, why did we do this? Right? But then Google Cloud people and Google executives had a different, like, thought process, right? And basically, you know, you and I know the compute team: the main people on the compute team at Anthropic both actually came from Google. They saw this dislocation, they negotiated a deal, and they were able to get access to this compute before Google realized. And so actually, the chain of events, at least from our data, what we found was: in early Q3, over the course of, like, six weeks, we saw capacity on TPUs go up by a significant amount. And it went up, like, multiple times in those six weeks, right? There were multiple requests. Google even had to go to TSMC and explain to them why they needed this increase in capacity, because it was so sudden. But a lot of that capacity increase was for selling to Anthropic.
Yeah.
Because Anthropic saw it before Google.
And then Google had Nano Banana and Gemini 3, which caused their user metrics to skyrocket. And leadership at Google was like: oh. And then they started making the statement of, we have to double compute every... is it six months? I don't remember the exact number that they said.
But they really woke up a lot more.
And then they're like, oh, hey, TSMC, we want more.
We want more.
And it's like: well, sorry, guys, we're sold out for next year. We can work on next year; we can maybe get, like, 5, 10% more for '26, but really we're going to work on '27, right?
There's sort of, like, you know, this, like, information asymmetry between the labs, in my mind, right? I don't know if this is exactly the narrative; I've spun it myself from seeing all the data in the supply chain, on, like, wafer orders and, like, what's going on with the data centers that, you know, Anthropic signed and Fluidstack signed and all this. Like, sort of, it's pretty clear to me that Google screwed up.
And you can see this from Google's Gemini ARRs, right?
They had next to nothing in Q1, Q2, Q3 a little bit, right, once they started inflecting. But Q4, they were at, like, $5 billion ARR exiting, or something like this: $5 billion revenue for Q4 on an ARR basis. And so it's clearly, like... Google didn't see revenue skyrocket.
And in a sense, right, Anthropic was not willing, you know, kind of had, like, a little bit of commitment issues before their ARR exploded, even though they have far more information asymmetry and see what's coming down the pipe. Google is going to be more conservative than Anthropic is, A; and, B, Google had even less ARR. So they sort of were, like, I think, just not willing to, like, sort of do it. And then they realized they should do it.
And so now, since then, Google has gotten absurdly AGI-pilled, right, in terms of, like, what they're doing. They bought an energy company. They're putting deposits down for turbines. They're buying a ridiculous percentage of the powered land. They're going to utilities and negotiating long-term agreements. They're doing this on the data center and power side very, very aggressively, right? So, you know, I think Google woke up towards the end of last year, but it took them some time.
And how many gigawatts do you think Google will have by the end of next year?
Buy my data.
You charge for that kind of information. Yes, yes.
I feel like every year, the bottleneck for what is preventing us from scaling AI compute keeps changing. A couple years ago, it was CoWoS. Last year, it was power. You'll tell me where the bottleneck is this year, but I want to understand: five years out, what will be the thing that is constraining us from deploying the singularity?
Yeah, I think the biggest bottleneck is compute, and for that, the longest lead time supply chains are not power or data centers. They're actually the semiconductor supply chain itself, right? It switches back from power and data centers being the major bottleneck to chips. And in the chip supply chain, there are a number of different bottlenecks, right? There's memory, there's logic wafers from TSMC, there are the fabs themselves. Construction of the fabs takes a couple of years, two to three years, versus a data center, which takes less than a year, right? We've seen Amazon build data centers in as fast as eight months, right? So there's a big difference in lead times, because of the complexity of the building, the fab, that actually makes the chips. And then the tools, right,
those also have really long lead times. And so the bottlenecks as we've scaled have shifted from: hey, what is the supply chain currently not able to do? Which was CoWoS and power and data centers, but those were all shorter lead time items, right? CoWoS is a much simpler process, of packaging chips together. Power and data centers are ultimately way simpler than the actual manufacturing of the chips. And so there's been some sliding of capacity across, you know, mobile or PC to data center chips, and that's been somewhat fungible; whereas CoWoS and power and data centers sort of had to start anew as supply chains. But now there's sort of no more capacity from the mobile and PC industries, which used to be the majority of the semiconductor industry, to shift over to AI, right?
Nvidia is now the largest customer at TSMC, and Nvidia is the largest customer at SK Hynix, the largest memory manufacturer, right? So it's sort of impossible for this scaling, this sliding of resources away from the common person (right, PCs and smartphones), to shift any more towards the AI chips. And so now, how do we scale AI chip production? That's the biggest bottleneck as we go to 2030.
It would be very interesting if there's an absolute gigawatt ceiling that you can project out to 2030, based just on: hey, we can't produce more than this many EUV machines.
Right. So to scale compute further, right, there are some different bottlenecks this year, next year. But ultimately, by '28, '29, the bottleneck falls to the lowest rung on the supply chain, which is ASML, right? ASML makes the world's most complicated machine, i.e. an EUV tool, and the selling price for those is $300 to $400 million. And currently, they can make about 70 a year. Next year, they'll get to 80. Even under very aggressive supply chain expansion, they only get to a little bit over 100 by the end of the decade. And so what does that mean? Okay, they can make
100 of these tools by the end of the decade and, you know, 70 right now. How does that actually
translate to AI compute, right? We see all these numbers from Sam Altman and many others across the supply chain: gigawatts, gigawatts, gigawatts, right? How many gigawatts are we adding? And we see, you know, Elon saying: hey, 100 gigawatts in space.
A year.
A year, right. The problem with any of these numbers, or the challenge to these numbers, is, you know, actually not the power, and not the data center; we can dive into that. It's manufacturing the chips, right? So, a gigawatt of, you know, Nvidia's Rubin chips... Rubin is
announced at GTC, I believe the week this podcast goes live. And to make a gigawatt worth of
data center capacity of Nvidia's latest chip, the one they're releasing towards the end of this year, you need, you know, a few different wafer technologies, right? You need
about 55,000 wafers of 3 nanometer. You need about 6,000 wafers of 5 nanometer. And then you need
about 170,000 wafers of DRAM, right, memory.
And so across these three different buckets, each of these requires different amounts of
EUV.
Right.
So when you manufacture a wafer, there's thousands and thousands of process steps where
you're depositing material, removing them.
But the sort of key critical step, which at least in advanced logic is like 30% of the
cost of the chip, is something that doesn't actually put anything on the wafer.
You take the wafer, you deposit photoresist, which is like a chemical that, basically
chemically changes when you expose it to light, and then you stick it into the EUV tool, which shines light at it in a certain way.
It patterns it, right?
Because there's what's called a mask, which is a stencil effectively for the design.
And so when you look at a wafer: you know, a leading-edge 3-nanometer wafer has 70 or so masks, right? 70 or so layers of lithography, but 20 of them are the most advanced EUV, right? And specifically, you know, if you think about it: okay, well, if I need 55,000 wafers for a gigawatt, and I do 20 EUV passes per wafer, you can then do the math: okay, that's 1.1 million passes of EUV for a single gigawatt. So actually, like, it's pretty simple. And then once you add the rest of the stuff, it ends up being two million, right, across five nanometer and all the memory. You're at roughly two million EUV passes for a single gigawatt.
You know, these tools are very complicated.
So when you think about what it's doing across a wafer: it's taking the wafer, and it's scanning, and it's stepping across, right? It's scanning, stepping across, and it does this dozens of times across the whole wafer. And so when you're talking about, hey, how many EUV passes, that's the entire wafer being exposed at a certain rate. An EUV tool can do roughly 75 wafers per hour. And the tool is up roughly 90% of the time, right? So in the end, you end up with: actually, I need about three and a half EUV tools to do the 2 million EUV wafer passes for the gigawatt.
So three and a half EUV tools satisfies a gigawatt.
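A minimal Python sketch of that EUV math. The 3-nanometer wafer count, layer count, throughput, and uptime are the figures quoted above; the EUV layer counts for 5 nanometer and DRAM are assumptions picked so the total lands near the ~2 million passes mentioned:

```python
# EUV tool-years needed per gigawatt of Rubin-class AI capacity,
# using the wafer counts quoted above. EUV layers for N5 and DRAM
# are assumed values chosen to match the ~2M total passes cited.

wafers = {"N3 logic": 55_000, "N5 logic": 6_000, "DRAM": 170_000}
euv_layers = {"N3 logic": 20, "N5 logic": 15, "DRAM": 5}  # N5/DRAM assumed

total_passes = sum(wafers[k] * euv_layers[k] for k in wafers)  # ~2.0M

wafers_per_hour = 75          # quoted EUV throughput
uptime = 0.90                 # quoted availability
passes_per_tool_year = wafers_per_hour * 24 * 365 * uptime    # ~591k

tools_needed = total_passes / passes_per_tool_year
print(f"Total EUV passes per GW: {total_passes / 1e6:.1f}M")
print(f"EUV tools per GW: ~{tools_needed:.1f}")  # ~3.4, i.e. "three and a half"

# Cost contrast from the conversation: ~3.5 tools at ~$350M each is ~$1.2B
# of lithography underpinning a ~$50B gigawatt-scale data center.
```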
So it's funny to think about the numbers, right? Because we're talking about: oh, what does a gigawatt cost? It costs, like, $50 billion, roughly, right? Whereas what do three and a half EUV tools cost? That's, like, $1.2 billion, right? It's actually, like, quite a lower number. Which is interesting to think about: oh, $50 billion of economic, you know, sort of, capex in the data center, and what gets built on top of that in terms of tokens is even larger, right? It might be $100 billion worth of AI value. And that whole supply chain is held up by this $1.2 billion worth of tooling that simply just cannot expand its supply chain quickly.
In fact, even the intermediate layers are sort of shocking here. So Carl Zeiss, which is, like, the optics supplier that is bottlenecking ASML itself...
I checked its market cap this morning.
You know what it is?
$2.5 billion.
Dude, let's LBO it.
Let's LBO it.
And I think... so you had this article recently where you were saying, over the last three years, TSMC has done $100 billion of capex. So it's like 30, 30, 40. And if you think about it, I mean, a small fraction of that is sort of, like, being used by Nvidia, for the three nanometer that it's moving to, or, you know, previously the four nanometer that it's using for its chips. But Nvidia has turned that into... what were its earnings last quarter, like $40 billion? And so $40 billion times four: $160 billion. So Nvidia alone is turning some small fraction of $100 billion in capex, which is going to be depreciated over many years, not just this one year, into $160 billion in a single year. And then that gets even more intense when you go down the supply chain to ASML, which is taking a billion dollars' worth of machines to produce a gigawatt.
And, of course, those machines last for more than a year, right?
So it's doing more than that.
Okay, so now I want to understand: okay, well, how many such machines will there be by 2030, if you include not just the ones that are sold that year, but the ones that have been accumulating over the previous years?
And what does that imply about the...
Sam Altman says he wants to do a gigawatt a week in 2030.
When you add up those numbers, is that compatible with that?
Right.
That's completely compatible, right?
Because if you think about it, TSMC and the entire ecosystem have something like 250 to 300 EUV tools already. And then you stack on 70 this year, 80 next year, growing to 100 by 2030: you're at, like, 700 EUV tools by the end of the decade. 700 EUV tools, at 3.5 tools per gigawatt (assuming it's all allocated to AI, which it's not), gets you to 200 gigawatts' worth of AI chips for the data centers to deploy, right? So, 200 gigawatts; Sam wants 50 gigawatts, right, 52 gigawatts a year. He's only taking 25% share then, right? Obviously, there's some share given to, you know, mobile and PC, assuming that, you know, for some reason, we're allowed to even have consumer goods still and we don't get priced out of them. But, you know, roughly, he's saying 25% market share of the total chips fabbed. That's kind of, like, very reasonable, given that, you know, this year alone I think he's going to have access to 25% of the Blackwell GPUs that are deployed, right? So it's not that crazy.
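And a hedged sketch of that fleet-level arithmetic. The existing base, the near-term additions, the 2030 rate, and the 3.5-tools-per-gigawatt ratio are the quoted figures; the middle-year additions are interpolated assumptions:

```python
# Rough EUV fleet by 2030 and what it implies in gigawatts of AI chips.
# Yearly additions between "80 next year" and "100 by 2030" are
# interpolated assumptions; the base, endpoints, and tools/GW ratio
# are the quoted figures.

existing_fleet = 275                   # "250 to 300" tools already out
additions = [70, 80, 90, 95, 100]      # 2026..2030; middle years assumed
fleet_2030 = existing_fleet + sum(additions)    # ~700 tools

tools_per_gw = 3.5
max_gw_per_year = fleet_2030 / tools_per_gw     # ~200 GW/year of AI chips

sam_gw_per_year = 52                   # "a gigawatt a week"
print(f"Fleet by 2030: ~{fleet_2030} tools")
print(f"Ceiling: ~{max_gw_per_year:.0f} GW/year if all went to AI")
print(f"Sam's ask: {sam_gw_per_year} GW/year "
      f"= {sam_gw_per_year / max_gw_per_year:.0%} of the ceiling")
```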
I find it surprising that... you know, when did ASML first start shipping EUV tools? When 7 nanometer started? I don't know when that was exactly. But you're saying that in 2030, they're going to be using machines that were initially shipped in 2020. So for 10 years, you're using the same most important machine in the most technologically advanced industry in the world. I find that surprising.
So ASML's been shipping EUV tools now for roughly a decade, but it only entered mass volume
production around 2020. You know, the tool's not the same. You know, back then the tools were
even lower throughput. There are various specifications around them; one's called overlay, right? You know, as I mentioned, you're stacking layers on top of each other, right? You'll do some EUV.
You'll do a bunch of different process steps, depositing stuff, etching stuff, cleaning the wafer,
you know, dozens of those steps before you do another EUV layer. There's a spec called overlay,
right, which is, okay, you did all this work, you drew these lines on the wafer. Now I want
to draw these dots, right? Let's just say I want to draw these dots to connect these lines of metal
and then, you know, holes, and then the next layer up is another set of lines that goes perpendicular,
so now you're connecting wires going perpendicular to each other. You have to be able to land them
on top of each other. So it's called overlay. And overlay is a spec that's been improved rapidly by
ASML. Wafer throughput has been improved rapidly by ASML. And also, the price of the tool has gone up,
but not as much as the capabilities of the tool, right? Initially, the EUV tools were like 150 million,
and over time, they're now like 400 million, you know, as I look out to 2028. But the capabilities
of the tools have more than doubled as well, right? Especially on throughput and overlay accuracy,
which is the ability to, you know, accurately align the subsequent passes on top of each other, even though you do tons of steps in between.
And so this is, you know, ASML is improving super rapidly.
I think it's also something noteworthy to say.
ASML is, you know, maybe one of the most generous companies in the world, right?
They have this linchpin thing.
No one has anything competitive.
Maybe China will have some EUV by the end of the decade.
But no one else, you know, has anything even close to EUV.
And yet they haven't taken price and margins up like crazy, right?
You know, you go ask some other folks that we talk to all the time, like, you know, for example, Leopold, and they're like: you know, let's have the price go up, right?
Because they can.
The margin is there.
You can take the margin.
Like, Nvidia takes the margin.
Memory players are taking the margin.
But ASML has never raised the price more than they've increased the capability of the tool.
And so in a sense, they've always provided net benefit to their customer.
It's not that the tool is stagnant.
It's just that, like, you know, these tools are old.
Yes, you can upgrade them some and the new tools are coming.
And for simplicity's sake, for this podcast, we're kind of ignoring the advances in overlay or throughput per tool.
So you say we're producing 60 of these machines this year and then 70, 80 over subsequent years.
What would happen if ASML just decided to double its capex or triple its capex? What is preventing them from producing more than 100 in 2030? Why are you so confident that, even five years out, you can be relatively sure what their production will be?
So I think a couple factors here, right?
ASML has not decided to just go YOLO, let's expand capacity as fast as possible, right?
In general, the semiconductor supply chain has not, right?
It's lived through the booms and busts, and we can talk a bit more about it.
But basically, no one, you know, some players as of very recently have, like, woken up.
But in general, no one really sees demand for 200 gigawatts a year of AI chips or, you know,
trillions of dollars of spend a year in the semiconductor supply chain. They're just, like... they're not AI-pilled, right? They're not AGI-pilled.
We're going to get to a trillion dollars this year.
Yeah, I feel you, but I'm saying, like, no one really understands this in the supply chain.
Constantly, we're told our numbers are way too high. And then, when they're right, they're like: oh, yeah, yeah, but your next year's numbers are still too high. But anyways: ASML's tool has four major components, right? It has the source, right, which is made by Cymer in San Diego. It has the reticle stage, which is made in Wilton, Connecticut, right? And it has the wafer stage and the optics, right, the lenses and such; those two are made in Europe, right? And so when you look at each of these four, they're tremendously complex supply chains that, A, they have not tried to expand massively, and, B, when they try to expand them, the time lag is quite long, right? And so, again, this is the most complicated machine that humans make, period, at any sort of volume. But, like, let's talk about the source specifically, right? What does the source do? It drops these tin droplets and hits each one with subsequent laser pulses, perfectly. The first pulse hits the tin droplet, and it expands out. It hits it again, so it expands out into this perfect shape, and then it's blasted at super high power, and the tin droplet gets excited enough that it releases EUV light: 13.5 nanometer. And then it's in this thing
that is like basically collecting all the light
and directing it into the lens stack, right?
Then you have the lens stack, which is Carl Zeiss, right,
as you mentioned, and some other folks,
but Zeiss being the most important part of it.
They also have not tried to expand production capacity, because they don't see it, you know. They're like: oh, yeah, yeah, we're growing a lot because of AI; we're growing from 60 to 100, right? And it's like: no, no, no, we need to go to, like, a couple hundred. But it's fine, whatever. Each of these tools has, you know, I think, 18 of these lenses, effectively:
mirrors. They are multi-layer mirrors, perfect layers of molybdenum and ruthenium, if I recall correctly, stacked on top of each other in many layers, and the light bounces off of them perfectly. But it's not just, like... you know, when we think about a lens, it's in a shape and it focuses the light. This is, like, a mirror that's also a lens. And so it's pretty complicated. Any defect in these perfect, super-thinly deposited layer stacks will mess it up; any curvature issues, too. Like, there are a lot of challenges with scaling the production.
It's quite artisanal, right, in this sense, right? Because you're not making tens of thousands
of these a year. You're making hundreds. You're making thousands, right? You know, talk about
60 tools a year, 18 of these per tool. You end up with, you know, you're still in the, you know,
hundreds of tools or a thousand, you're at the thousand number roughly for these lenses and
projection optics per year. So then you step forward to the reticle stage, which is also something really crazy. This thing moves at, I want to say, nine g's; it will shift at nine g's as it steps across a wafer. And the wafer stage is complementary; it's the wafer part. So you line these two things up: you're taking all the light through the lenses, focused; here's the reticle, here's the wafer. The reticle is moving in one direction and the wafer is moving in the other direction as it scans a 26 by 33 millimeter section of the wafer. Then it stops, shifts over to another part of the wafer, and does it again, in just seconds, with each of them moving at nine g's in opposite directions. So each of these things is a wonder and marvel of chemistry, fabrication, mechanical engineering, and optical engineering, because you have to align all these things and make sure they're perfect.
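To make the scanning arithmetic concrete, here is a rough sketch of what a 26 by 33 millimeter field implies for a 300 mm wafer. The per-field exposure time is an assumed figure for illustration, not an ASML spec:

```python
import math

# Rough scanner throughput estimate. The 26 x 33 mm field size comes from the
# conversation; the seconds-per-field figure is an assumption.
wafer_diameter_mm = 300
field_w_mm, field_h_mm = 26, 33

wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2        # ~70,700 mm^2
fields_per_wafer = wafer_area_mm2 / (field_w_mm * field_h_mm)  # ~82 fields
print(f"~{fields_per_wafer:.0f} fields per wafer")

seconds_per_field = 0.3  # assumed expose-and-step time
wafer_seconds = fields_per_wafer * seconds_per_field
print(f"~{wafer_seconds:.0f} s/wafer -> ~{3600 / wafer_seconds:.0f} wafers/hour")
```

Under those assumptions you land in the ~150 wafers-per-hour range, which is the order of magnitude these tools are quoted at, and it shows why both stages have to move at many g's to keep each field's exposure down to fractions of a second.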
All of these things have crazy amounts of metrology, because you have to perfectly test everything; if anything is messed up, the yield goes to zero, because this is such a finely tuned system. And by the way, it's so large that you build it in the factory in Veldhoven, Netherlands, then deconstruct it, ship it on many planes to the customer site, and then reassemble it there and test it again.
And that process takes many, many months. So there are just so many steps in the supply chain, whether it's Zeiss making the lenses and projection optics or Cymer, which is an ASML-owned company, making the EUV source. And each of these has its own complex supply chain. ASML has commented that their supply chain has over 10,000 companies in it.
Like individual suppliers.
Yes. And it might not be direct. It might be through, hey, Zeiss has so many suppliers, and XYZ company has so many suppliers.
But if you just think about it: you're talking about two physically moving objects, one this large and one this large, the size of a wafer, that have to be accurate to the level of single-digit nanometers or even smaller. Because the overlay for the entire system, the layer-to-layer variation, has to be on the order of three nanometers. And if the overlay budget is three nanometers, that means each individual part's physical movement has to be accurate to even less than that, sub one nanometer in most cases, because the errors of these components stack up.
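A toy error budget makes the point. If independent error sources add in quadrature, each one has to fit well under the total overlay spec. The component list here is invented for illustration:

```python
import math

# Toy overlay budget: k independent error sources adding in quadrature (RSS)
# must fit under a 3 nm layer-to-layer overlay spec. Components are illustrative.
overlay_budget_nm = 3.0
components = ["reticle stage", "wafer stage", "projection optics",
              "alignment metrology", "thermal drift"]

per_component_nm = overlay_budget_nm / math.sqrt(len(components))
print(f"{len(components)} sources -> {per_component_nm:.2f} nm each")
# 5 sources -> 1.34 nm each; with ~10 sources each must be under ~0.95 nm,
# which is why individual motions have to be accurate to sub-nanometer levels.
```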
And so there's no way to just snap your fingers and increase production. Take something as simple as power: the U.S. going from 0% power growth to 2% power growth, even though China's already at 30%, was so hard for America to do. And that's a really simple supply chain, with very few people in it who make genuinely difficult things, and probably 100,000 or more electricians and other people working in the electricity supply chain in the U.S. Whereas ASML employs so few people, and Carl Zeiss probably has less than a thousand people working on this, and all of those people are super, super specialized. So you can't just train random people up for this in the snap of a finger, and you can't just get your entire supply chain galvanized overnight.
Nvidia has had to do a lot to get the entire supply chain to even deliver the capacity they're going to make this year. And even so, when you go talk to Anthropic, they're like: well, we're short of TPUs, we're short of Trainium, we're short of GPUs. When you go talk to OpenAI, they're like: we're short of these things too. So OpenAI and Anthropic know they need X. Nvidia is not quite as AGI-pilled, so they're building X minus 1. And as you go down the supply chain, everyone's doing minus 1, and in some cases they're doing divided by 2, because they're just not AGI-pilled. And so the time lag for this whip to react, the AGI-pilledness and the desire to increase production, is so long.
And then once they finally understand, hey, we need to increase production rapidly, they think they understand: oh, AI means we have to go from 60 to 100. In addition, the tools are all just getting better and faster, the source going from 500 watts to 1,000 watts, and all these other aspects of the supply chain advancing technically on top of the increase in production. So they think they're actually increasing production a lot. But if you flow through the numbers: hey, what does Elon want? He wants 100 gigawatts a year in space by 2028, is it? Or 2029? And Sam Altman wants 50, 52 gigawatts a year by the end of the decade. And probably Anthropic needs the same, and then Google needs that. You go across the supply chain and it's like: wait, no, the supply chain can't possibly build enough capacity for everyone to get what they want on the compute side.
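You can see how the under-ordering compounds with a toy model. The tiers and discount factors below are invented, but the multiplication is the point:

```python
# Toy version of the "whip" Dylan describes: each tier plans for a fraction
# of what its customer actually needs. Tier names and factors are made up.
needed = 100.0  # units of capacity the labs actually want

tiers = {
    "chip designer":     0.9,  # builds "X minus 1"
    "foundry":           0.9,
    "tool maker":        0.8,
    "optics and source": 0.5,  # "divided by 2"
}

plan = needed
for tier, factor in tiers.items():
    plan *= factor
    print(f"{tier:>18}: plans for {plan:.0f}")
# The deepest tier ends up tooled for ~32% of true demand, and fixing that
# takes years of lead time, not quarters.
```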
Real conversations are full of fits and starts and pauses and interruptions.
I mean, just listen to this episode.
At least superficially, voice models have gotten pretty good at handling these kinds of things.
But at a deeper level, interruptions can throw off a model's understanding and degrade the quality of its responses.
And it's not always clear why.
Labelbox realized that this was a huge bottleneck for their customers.
So they built an evaluation pipeline called EchoChain to help you diagnose and fix your voice model's specific failure modes.
EchoChain starts by feeding conversations into your voice model.
It then injects interruptions at specific intervals and classifies any failures into one of three different modes.
One, did it acknowledge a correction but keep the old plan?
Two, did it adapt briefly but then slide back to old assumptions?
Or three, did it abandon the old task entirely?
This is extremely useful information because Labelbox can get your model the exact data it needs
to fix whatever issue is preventing it from being a viable and competent voice model.
So if you want to ensure that your voice model stays performant in real conversations, you should reach out to Labelbox at labelbox.com/dwarkesh.
So I feel like in the data center supply chain
for the last few years, people have been making arguments of
this specific thing we are bottlenecked by,
therefore AI compute can't scale more than X.
But then, as you've written about, oh, no, if, you know,
say the grid is a bottleneck, then we just do
behind the meter on the site, we do gas turbines, et cetera.
If that doesn't work, there's like all these other alternatives
that people fall back on.
And I want to ask you a question about whether we can imagine a similar thing happening in the semiconductor supply chain.
So if EUV becomes a bottleneck, well, what if we just went back to 7 nanometer and did what China is doing currently: producing 7 nanometer chips with multi-patterning on DUV machines? And if you look at a 7 nanometer chip like the A100, there's been a lot of progress, obviously, from the A100 to the B100 or B200, but how much of that progress is just numerics? If you hold the numerics constant, say FP16 from A100 to B100, the B100 is a little over one petaflop and the A100 is like 300 teraflops, so you have basically a 3x improvement holding numerics constant. Some of that is the process improvement, and some of that is just the accelerator design improving, which we could replicate again in the future. So it seems like the effect of the process improving from 7 nanometer to 4 nanometer is actually very small. So, I don't know the numbers offhand, but say there's like 150K wafers per month of 3 nanometer, and eventually similar amounts for 2 nanometer. But there's also a similar amount for 7 nanometer, right? If you have all those old wafers, and there's maybe a 50% haircut because the bits per wafer area are, what is it, 50% less or something, then it doesn't seem that bad to just bring on 7 nanometer wafers. And then, oh, that gives you another 50 or another 100 gigawatts.
Yeah, tell me why that's naive.
Yeah.
So I think we potentially do go crazy enough that this happens, because we just need incremental compute, and the compute is worth the higher cost and power of these chips. But it's also unlikely, to a large extent, because some of these are not fair comparisons. For example, going from A100, which is 312 teraflops, to Blackwell, which is like 1,000-ish teraflops of FP16, or maybe it's 2,000, and then Rubin, which is like 5,000 or so of FP16: it's not a fair comparison, because these chips have vastly different design targets. With A100, what Nvidia optimized for was FP16 and BF16 numerics. With Hopper, they didn't care as much about that; they cared about FP8. With Rubin, they don't care about FP16 and BF16 as much; they care mostly about FP4 and FP6. The numerics are what they've designed the chip for.
So, okay, let's say we make a new chip design on 7 nanometer. Sure, we can do that, and it's optimized for the numerics of the modern day. The performance difference is still going to be much larger than the flops difference you mentioned. It's easy to boil things down to flops per watt or flops per dollar, but that's actually not a fair comparison.
And this is where you can bring in, hey, let's look at Kimi (Kimi K2.5, sorry) and DeepSeek. When you look at these two models and their performance on Hopper versus Blackwell, on very optimized software, you get vastly different performance. And most of that is not attributable to flops or numerics, because those models are actually 8-bit: Blackwell and Hopper are both optimized for 8-bit, and Blackwell's not really taking advantage of its 4-bit there. The performance gulf is actually much larger. And the way you can compare them and think about them is: sure, it's one thing to shrink process technology, make the transistors smaller, and give each chip X number of flops. But you forget the big gating factors. These models don't run on a single chip. They run on hundreds of chips at a time. If you look at DeepSeek's production deployment, which is well over a year old now, they were running on 160 GPUs, and that's what they serve production traffic on. So they split the model across 160 GPUs.
Every time you cross the barrier from one chip to another, there is an efficiency loss, because you now have to transmit over high-speed electrical SerDes, and there's a latency cost, a power cost, all these dynamics that hurt. As you shrink and shrink the process node, you've increased the amount of compute in a single chip, and in-chip movement of data runs at at least tens of terabytes a second, if not hundreds of terabytes a second. Between chips that are super close to each other physically, you're on the order of a terabyte a second. And you can only put so many chips close to each other physically, so you have to put chips in different racks, and the data rate between those is on the order of hundreds of gigabits a second, 400 gig or 800 gig, so roughly 100 gigabytes a second. So you've got this huge ladder: on-chip, I can communicate at super fast speeds; within the rack, an order of magnitude lower; outside the rack, another order of magnitude lower than that. And as you break the bounds of a chip, you end up with performance loss.
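As a rough sketch of that ladder, with assumed round numbers (real figures vary by part and generation):

```python
# Time to move a 100 GB payload (weights or KV cache) at each level of the
# communication hierarchy. Bandwidths below are assumed representative values.
ladder_GBps = {
    "on-chip":                      10_000,  # tens of TB/s
    "within rack (NVLink-class)":    1_000,  # ~1 TB/s
    "across racks (400/800G NICs)":    100,  # ~100 GB/s
}

payload_GB = 100
for level, bw in ladder_GBps.items():
    print(f"{level:30s} {payload_GB / bw * 1e3:8.1f} ms")
# Each step down the ladder costs roughly an order of magnitude in time.
```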
So anyways, the reason I explain this is: when you look at Hopper versus Blackwell, even if both of them are using a rack's worth of chips, the Hopper is significantly slower, because the amount of performance you can bring to bear within each communication domain (tens of terabytes a second between processing elements on a chip, terabytes a second between chips) is much, much higher on Blackwell, and therefore the performance is much higher. So when you look at inference at, let's say, 100 tokens a second for DeepSeek and Kimi K2.5, Hopper versus Blackwell, the performance difference is on the order of 20x.
Interesting.
Not two or three x like the flops difference indicates, even though those are on the same process node. There are just differences in networking technologies and what they've worked on.
And so you can translate some of these things back. But when you look at Rubin and what they're doing on 3 nanometer, some of those things are just not possible to port all the way back to an A100-era design, even if you make a new chip on 7 nanometer. There are certain architectural improvements you can port, and certain ones you cannot. So the performance difference is not just the difference in flops. It's in some sense cumulative: the difference in flops per chip, networking speed between chips, how many flops are on a chip versus a system, memory bandwidth on a single chip and on an entire system. All of these things compound.
Can I ask a very naive question? So this year, last year, the B200 has two dies on a single package, so you can get that bandwidth on a single chip without having to go through NVLink or InfiniBand. And then next year, Rubin Ultra will have four dies on one package. What is preventing us from just doing more of that? Like, how many dies could you put on a single package and still get these tens of terabytes a second?
Yeah, so even within Blackwell, there are differences in performance when you're communicating on a die versus across dies. Those bounds are obviously much smaller than when you're going out of the entire package; it's die versus die within the package. So when you scale the number of dies up, there is some performance loss. It's not just perfect, but it is way better than going between entirely different packages. Now, how large can advanced packaging scale? The way Nvidia is doing it, and Google with Broadcom, and MediaTek, and Amazon's Trainium, all these chips, is called CoWoS. But you can actually go look back at what Tesla did with Dojo, which they canceled and restarted. Dojo was a chip that was the size of an entire wafer.
They had 25 chips on it, and there were some tradeoffs (they couldn't put HBM on it), but the positive side was that they had 25 chips on one wafer. And to date, it is still probably the best chip for running convolutional neural networks. It's just not great at transformers, because the shape of the chip, the memory, the arithmetic, all these various specifications, are just not well suited for transformers. They're well suited for CNNs.
And anyway, Dojo chips were optimized around that: they made a bigger package. But as you make packages bigger and bigger, other constraints, like networking speed, memory bandwidth, and cooling capabilities, start to rear their heads. It's not simple. But yes, you will see a trend line of more dies on the package. And yes, you can do that on 7 nanometer. In fact, that's what Huawei did with their Ascend 910C and D: they initially had just one die, then they did two, and they're focusing on scaling the packaging up, because that's an area where they can advance faster than process technology, where they can't shrink. But at the end of the day, that's something you can do on leading-edge chips too. Anything you do on 7 nanometer in terms of packaging, you can probably also do on 3 nanometer.
So if you end up in this world in 2030 where the West has the most advanced process technology but has not ramped it up as much, whereas China (I don't know if you think by 2030 they'd have EUV and, I don't know, 2 nanometer or whatever) is semiconductor-pilled and producing in mass quantity: basically, I'm wondering what the year is where there's a crossover. Where our advantage in process technology has faded enough, and their advantage in scale has increased enough, and also their advantage in having one country with the entire supply chain indigenized, rather than random suppliers in Germany and the Netherlands and wherever, would mean that China is ahead in its ability to produce mass flops.
Yeah, so to date, China still does not have an entire indigenous semiconductor supply chain.
But will they in 2030?
By 2030, it's possible that they do. But to date, all of China's 7 nanometer and 14 nanometer capacity uses ASML DUV tools, and the amount they were able to ship in and import from ASML is large. And the point is that the vast majority of ASML's revenue, and on EUV all of it, is outside of China. So the scale advantage is still in favor of, let's call it, the West plus Taiwan, Japan, etc. But China is trying to make its own DUV and EUV tools; they're trying to do all these things. The question is how fast they can advance and scale up production, as well as quality. And to date, we haven't seen that.
Now, I'm quite bullish that they're going to be able to do these things over the next five to ten years: really scale up production, really kick it into high gear. They have more engineers working on it. They have more desire to throw capital at it. So by 2030, do they have fully indigenized DUV? I think for sure.
DUV, yes. And fully indigenous EUV by 2030?
I think they'll have working tools. I don't think they'll be able to manufacture a bunch of them yet. There's having it work, and then there's production hell. Ultimately, ASML had EUV working in the early 2010s, at some capacity.
Right.
But the tools were not accurate enough. They were not scaled for high-volume manufacturing, not reliable enough. And then they had to ramp production, and that all took time. Production hell takes time, which is why it took another five to seven years to get EUV into mass production at a fab rather than just working in the lab.
So how many DUV tools do they need to manufacture in 2030?
ASML?
No, China.
Oh, that's a great question. It's a bit of a challenge to look into this supply chain especially; we try really hard. In some instances, they're buying stuff from Japanese vendors, and if they want to fully indigenize the supply chain, they can't buy these lenses or projection optics or stages from Japanese vendors. They need to build them internally. So it's really tough to say where they'll be able to get to. Honestly, I think it's a shot in the dark. But it's probably not unlikely that they'll be able to do on the order of 100 DUV tools a year, whereas ASML is doing hundreds of DUV tools a year currently.
You know, no company has a process node where they make a million wafers a month. Elon says he wants to do it, and China's obviously going to try. I don't think TSMC is trying to do that. The memory makers may get to a million wafers a month, but not in a single fab. It's sort of mind-boggling to think of that scale, and challenging to see the supply chain galvanized for it. So I'm not sure. But I don't want to doubt China's capability to scale.
Right.
I guess this is an interesting question that at some point SemiAnalysis will do the deep dive on. But there's this question of: by when would indigenous Chinese production be bigger than the rest of the West combined, if you add up all the inputs to your model, when they'll have DUV machines at scale, when they'll have EUV machines at scale? Because there's this question around: if you have long timelines on AI (by long, meaning 2035, which is not that long in the grand scheme of things), should you expect a world where China is dominating in semiconductors? Which I think doesn't get asked enough. In San Francisco, we're thinking on time scales of, like, weeks, and outside of San Francisco, people aren't thinking about AGI at all. So this question of: okay, what if we have AGI, this transformational thing that is commanding tens of trillions or hundreds of trillions of dollars of economic growth and token output and so forth, but it happens in 2035? What does that imply for the West versus China? SemiAnalysis has got to write the definitive model on this.
Yeah, so I think it's really challenging when you move time scales out that far. What we tend to focus on is tracking every data center, tracking every fab, tracking all the tools and where they're going. But the time lags for these things are relatively short. We can only make reasonably accurate estimates for data center capacity based on land purchasing, permits, turbine purchasing, and all these things. We know where all those things are going, and that's the data we sell. But as you go out to 2035, things are just so radically different and your error bars get so large that it's kind of hard to make an estimate. At the end of the day, if takeoff or timelines are slow enough, then certainly, I don't see why China wouldn't be able to catch up drastically. In some sense, we've got this valley where, call it three to six months ago, or maybe even now, Chinese models were as competitive as they've ever been. I think Opus 4.6 and GPT-5.4 have really pulled away and made the gap a little bit bigger, but I'm sure some new Chinese models will come out.
But as we move from these companies selling tokens, where they provide the entire reasoning chain and all that, to selling automated white-collar work (an automated software engineer: you send them the request, they give you the result back, and there's a bunch of thinking on the back end that they don't show you), the ability to distill from American models into Chinese models gets harder. That's A. B is the scale of compute that the labs have: OpenAI exited last year with roughly two gigawatts, Anthropic will get to two-plus gigawatts this year, and by the end of next year they'll both be at like 10 gigawatts of capacity. China is not scaling their AI lab compute nearly as fast. So at some point, when you can't distill the learnings from these labs into the Chinese models, plus this compute race that OpenAI, Anthropic, Google, Meta, et cetera are all running, the model performance should start to diverge more. And then there's all of this capex being spent on data centers: Amazon $200 billion, Google $180 billion, and so on. All these companies are spending hundreds of billions of dollars of capex. There's nearly a trillion dollars of capex being invested in data centers in America this year,
roughly. You end up with: okay, what's the return on invested capital here? You and I would think that the return on invested capital for data center capex is very high. If we look at Anthropic's revenues: in January they added like $4 billion, and in February, which is a shorter month, they added like six. We'll see what they can do in March and April, given that compute constraints are what's bottlenecking their growth; the reliability of Claude Code is actually quite low because they're so compute constrained. But if this continues, then the ROIC on these data centers is super high. And at some point, the U.S. economy starts growing faster and faster over this year and next year, because of all this capex and all the revenue these models are generating, plus the downstream supply chain, versus China, which doesn't have that yet. They have not built the scale of infrastructure to then invest in models to get to the capabilities, to then deploy those models at such scale.
Because when you look at Anthropic: they're at, call it, $20 billion ARR, and of that, the margins are sub-50%, at least as last reported by The Information. So you're at, okay, that's like $13-14 billion of compute that it's running on, rental-cost-wise, which is something like $50 billion worth of capex that someone laid out for Anthropic to generate their current revenue. And China has just not done this.
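Here is the chain from ARR to capex with the stated figures made explicit. The margin and the capex multiple are assumptions consistent with the quote, not reported numbers:

```python
# Sketch of the revenue -> rental -> capex chain described above.
arr = 20e9               # ~$20B ARR
gross_margin = 0.35      # "sub-50%" margins; 35% assumed for illustration

compute_rental = arr * (1 - gross_margin)  # ~$13B/yr of compute rental
capex_multiple = 3.8     # assumed: capex is roughly 4x the annual rental cost

capex = compute_rental * capex_multiple    # ~$50B of capex laid out
print(f"rental ~${compute_rental/1e9:.0f}B/yr -> capex ~${capex/1e9:.0f}B")
```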
If and when Anthropic 10Xs revenue again (and I think our answer would be when, not if), China doesn't have the compute to deploy at that scale.
And so there is some sense in which, oh, we're in fast takeoff-ish. It's not like we're talking about a Dyson sphere by such-and-such date. It's more that the revenue is compounding at such a rate that it does affect economic growth, and the resources these labs are gathering are growing so fast, and China hasn't done that yet. In that case, the U.S. and the West are actually diverging. The flip side is: maybe these infrastructure investments have middling returns. Maybe they're not as good as hoped. Maybe Google is wrong for wanting to take free cash flow to zero and spend $300 billion on capex next year. Maybe they're just wrong, and the people on Wall Street who are bearish and don't understand AI are correct. In which case, the U.S. builds all this capacity and doesn't get really great returns, and China is able to build the fully vertical, indigenous supply chain, versus the U.S., Japan, Korea, Taiwan, Southeast Asia, Europe, all these countries together building this less vertical supply chain. And in that sense, at some point China is able to scale past us, if AI takes longer to get to certain capability levels than, I would say, the vast majority of your guests on this podcast believe.
It's like: fast timelines, US wins; long timelines, China wins.
Right. But I don't know what fast timelines means. I don't think you have to believe in AGI to have the timelines where the U.S. wins.
Okay, let's go back to memory, because I think maybe people on Wall Street and people in the industry understand how big a deal this is, but people in general don't. So we've got this memory crunch, as you're saying. Earlier I was asking: could we solve the EUV tool shortage by going back to 7 nanometer? So let me ask a similar question about memory. HBM is made of DRAM, but it has 3 to 4x fewer bits per wafer area than the DRAM it's made out of. Is it possible that accelerators in the future could just use commodity DRAM instead of HBM, so we get much more capacity out of the DRAM we make? The reason I think this might be possible is: look, if we're going to have agents that just go off and do work, and it's not a synchronous chatbot application, then you don't necessarily need extremely low latency anymore. And so maybe you can live with lower bandwidth, because the reason you stack DRAM into stacks to make HBM is for higher bandwidth. So is it possible to build non-HBM accelerators, basically the opposite of Claude Code fast, have Claude Code slow, and do that?
I think at the end of the day, the incremental purchaser who's willing to pay the highest price for tokens also ends up being the one that's less price sensitive. And in a capitalist society, compute should be allocated towards the goods that have the highest value, and the private market determines this by willingness to pay. So to some extent, sure, Anthropic could release a slow mode. They could release Claude Slow Mode and increase tokens per dollar by a significant amount; they could probably reduce the price of Opus by 4x or 5x while reducing the speed by maybe just 2x. The curve of inference throughput versus speed is there already, just on HBM. And yet they don't, because no one actually wants to use a slow model. Furthermore, on these agentic tasks, it's great that the model can run at a time horizon of hours. If the model were just running slower, those hours would become a day. Or vice versa: if the model runs faster, those hours become an hour. And no one really wants to move to a day-long wait, because the highest-value tasks also have some time sensitivity to them.
So I struggle to see it. Yes, you could use DDR, regular DRAM, but there are a couple of things that are challenging with this. One is that you're still limited by shoreline: a chip is a certain size, and all of the I/O escapes on the edges of the chip. So oftentimes what you see is that the left and right of the chip are HBM (the I/O from the chip to the HBM is on the sides), and the top and bottom are I/O to other chips. And if you were to change from HBM to DDR, then all of a sudden the I/O on that edge would have significantly less bandwidth, but significantly more capacity per chip.
Yeah. Because the metric that you actually care about is bandwidth per wafer, not bits per wafer. The thing constraining the flops is just getting the next matrix in and out, and for that, you just need more bandwidth.
Yeah, getting out the weights, and getting the KV cache in and out.
Right.
And so in many cases, these GPUs are not running at full memory capacity. It's obviously a system design thing, model-hardware-software co-design: hey, how much KV cache do I keep on the chip? How much do I offload to other chips and call in when I need it, for tool calling or whatever? How many chips do I parallelize this over? The search space here is very broad, which is why we have InferenceMAX: it's open source, and it searches the optimal points on inference for a variety of different chips and models.
Anyways, the point is you're not always necessarily constrained by memory capacity. You can be constrained by flops. You can be constrained by network bandwidth. You can be constrained by memory bandwidth. Or you can be constrained by memory capacity. If you really simplify it down, there are four constraints, and each of these can break out into more. But in this case, if you switch to DDR, yes, you produce 4x the bits per DRAM wafer, but all of a sudden the constraints shift a lot and your system design shifts a lot. You go slower, and yes, maybe the market for that is smaller. But also, now all these flops are wasted, because they're just sitting there waiting for memory. It's like: great, I don't need all that capacity, because I can't really increase batch size, since then the KV cache would take even longer to read.
Interesting.
What is the bandwidth difference between HBM and normal DRAM?
Yeah. So a stack of HBM4 (let's talk about what's in Rubin, because that's what we've been indexing on) is 2048 bits wide, connected in an area that's roughly 11 to 13 millimeters across, and it transfers at around 10 giga-transfers a second. Those 11 to 13 millimeters are the shoreline you're taking up on the chip, and in that shoreline, you have 2048 bits transferring at 10 giga-transfers per second. Multiply those together and divide by 8 to go from bits to bytes, and you're at roughly 2.5 terabytes a second per HBM stack. When you look at DDR, in that same area it's maybe 64 or 128 bits wide, and DDR5 transfers at anywhere from 6.4 to maybe 8 giga-transfers a second. So your bandwidth is significantly lower: 64 bits times 8 giga-transfers divided by 8 is 64 gigabytes a second, and even with a generous interpretation, 128 bits times 8 giga-transfers, you're at 128 gigabytes a second for the same shoreline, versus 2.5 terabytes a second. There's an order of magnitude difference in bandwidth per edge area.
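Redoing that shoreline math in code, using the figures as stated (approximate, and actual HBM4/DDR5 specs vary by vendor):

```python
def bandwidth_GBps(bus_bits: int, giga_transfers: float) -> float:
    """Bandwidth in GB/s from bus width and transfer rate."""
    return bus_bits * giga_transfers / 8  # bits -> bytes

hbm4_stack = bandwidth_GBps(2048, 10)  # ~2,560 GB/s, i.e. ~2.5 TB/s
ddr5_lo    = bandwidth_GBps(64, 8)     # ~64 GB/s
ddr5_hi    = bandwidth_GBps(128, 8)    # ~128 GB/s, the generous case

print(f"HBM4 stack: {hbm4_stack/1000:.1f} TB/s per ~11-13 mm of shoreline")
print(f"DDR5:       {ddr5_lo:.0f}-{ddr5_hi:.0f} GB/s in similar shoreline")
print(f"Ratio:      ~{hbm4_stack/ddr5_hi:.0f}x to ~{hbm4_stack/ddr5_lo:.0f}x")
```

So "order of magnitude" is, if anything, understated: it works out to roughly 20x to 40x per unit of edge.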
And if your chip is a square (26 by 33 millimeters is the maximum size for an individual die), you only have so much edge area, and on the inside of that chip you put all your compute. There are things you can do to try to change this, like more SRAM, more caching, blah blah blah, but at the end of the day, you're very constrained by bandwidth.
Interesting.
So then there's a question of: where can you destroy demand to free up enough for AI? And I guess the picture is especially bad because, as you're saying, if it takes 3-4x more wafer area to get the same byte of HBM, you have to destroy 3-4x as much consumer demand for laptops and phones and whatever in order to free up one byte for AI. So what does this imply for the next year or two? Sorry for the run-on question. I think in your newsletter you said 30% of Big Tech's capex in 2026 is going towards memory?
Yes.
That's insane, right?
Yeah.
Like, of the $600 billion or whatever, you're saying 30% is going just to...
And obviously there's some level of margin stacking that Nvidia does, so you have to separate that out and apply their margin to the memory and the logic. But at the end of the day, yeah, like a third of their capex is going to memory.
That's crazy.
Okay.
So what is the question I'm trying to ask? It's something like: what should we expect over the next year or two as this memory crunch hits?
Yeah, so the memory crunch will continue to get harder and harder, and prices will continue to go up. And this affects different parts of the market differently, which gets to the question of: are people going to hate AI more and more? Yes, because now smartphones and PCs are not going to get incrementally better year on year. In fact, they're going to get incrementally worse.
If you look at the bill of materials of an iPhone, what fraction of it is the memory? Like, how much more expensive does an iPhone get if the memory is 2x more expensive, or whatever it has to be?
So I believe an iPhone has 12 gigabytes of memory. Each gig used to cost roughly $3 or $4, so call it $50. But now the price of memory has roughly tripled; call it $12 per gig for DDR. So now you're talking about $150 versus $50: a $100 increase in cost for Apple, and that's just the DRAM. The NAND has the same sort of market dynamic, so it's probably more like a $150 increase on the iPhone in total. Apple has to either pass that on to the consumer, A, or B, eat it. I don't see Apple reducing their margin too much; maybe they eat a little bit. But Apple applies its own margin on top, so at the end of the day, that means the end consumer is paying something like $250 more for an iPhone.
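Here is the arithmetic with the assumed prices made explicit. These are the illustrative figures from the conversation, not Apple's actual BOM:

```python
# iPhone BOM sketch. All prices and the pass-through multiplier are assumed.
dram_GB = 12
old_per_GB, new_per_GB = 4.0, 12.0  # ~$3-4/GB last year, roughly tripled now

dram_delta = dram_GB * (new_per_GB - old_per_GB)  # ~$96, call it ~$100
nand_delta = 50                                   # assumed NAND contribution
bom_delta = dram_delta + nand_delta               # ~$150 total BOM increase

margin_multiplier = 1.7                           # assumed Apple pass-through
retail_delta = bom_delta * margin_multiplier      # ~$250 to the consumer
print(f"BOM +${bom_delta:.0f} -> retail +${retail_delta:.0f}")
```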
And that's just comparing last year's memory pricing to today's. Now, there is some lag before Apple feels the heat, because they've tended to have three-, six-, or twelve-month contracts for a lot of memory. But at the end of the day, Apple gets hit pretty hard by this; they just won't really adjust until the next iPhone release.
But they won't really adjust until the next iPhone release.
But that's the high end of the market.
Actually, that's only a few hundred million phones a year, right?
Apple sells, what, two, 300 million phones a year?
The bulk of the market is this mid-range low end, right?
Used to be 1.4 million smartphones were sold a year.
Now we're at like 1.1.
But our projections are we maybe get down to like 800 million this year.
And next year are like 600 or 500 million.
because, and we look at like, you know, there's some data points out of China from some of our analysts in Asia and Singapore and Hong Kong and Taiwan.
They've been trekking this and they see Xiaomi and Opo are cutting low end and mid-range smartphone volumes by half.
Because, yes, it's only a $150 price increase on a $1,000 smartphone or $150 bomb increase on $1,000 iPhone where Apple has some larger margin.
But if we look at the smaller phones, the percentage of the bomb that goes to,
memory and storage is much larger and the margins are lower. So there's less capacity to even
eat the margins. And they have like generally tended not to do as long term agreements on
memory. And why this is like a big deal is if smartphone volumes, let's say half, the halving will
frankly happen in the low and mid range, not in the high end. So it's not like the bits released
are halving, right? You know, currently consumers more than half of memory demand. Even if you half the
smartphone volumes because of the shape of the halving, right?
It's like low end gets cut by more than half, high end gets cut by less than half,
because you and I will buy, you know, the high end phones that cost north of $1,000,
we'll buy them, even if they get a little bit more expensive.
And Apple's volumes will not go down as much as like a low-end smartphone provider.
And the same applies to PCs.
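A toy model of that asymmetric halving. Segment sizes, memory content per phone, and cut factors are all invented to show the shape:

```python
# Units halve-ish, but DRAM bits released fall by less, because the cuts
# land on low-memory phones. All numbers are made up for illustration.
segments = {
    #            units (B), GB DRAM/phone, post-crunch volume multiplier
    "low/mid":   (0.9,      6,             0.40),  # cut by more than half
    "high-end":  (0.3,      12,            0.85),  # cut by much less
}

units_before = sum(u for u, _, _ in segments.values())
units_after  = sum(u * m for u, _, m in segments.values())
bits_before  = sum(u * gb for u, gb, _ in segments.values())
bits_after   = sum(u * gb * m for u, gb, m in segments.values())

print(f"units: {units_after/units_before:.0%} of before")  # ~51%
print(f"bits:  {bits_after/bits_before:.0%} of before")    # ~58%
```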
And what this does to the market is quite drastic. DRAM gets freed up and goes to AI chips, whose buyers are willing to do longer-term contracts, willing to pay higher margins, et cetera, because at the end of the day the margin they extract from the end user is much larger. And this probably leads to people hating AI even more. Today you already see all the memes on PC subreddits and gaming-PC Twitter: cat dancing videos captioned, this is why memory prices doubled and you can't get a new gaming GPU or a new desktop. And it's going to be even worse when memory prices double again, especially DRAM.
Another dynamic that's quite interesting: it's not just DRAM, it's also NAND. NAND is also going up in price. Both of these markets have expanded capacity very slowly over the last few years, NAND almost zero. But the percentage of NAND that goes to phones and PCs is larger than the percentage of DRAM that goes to phones and PCs. So as you destroy demand, mostly for the DRAM's sake, you also unlock more NAND that can be allocated to other markets. And so the price increases for DRAM will be larger than those for NAND, because you've released relatively more NAND from the consumer.
Sorry, maybe you just explained it and I missed it: is NAND going up because SSDs are being used in large quantities for data centers?
They are, but not in as large quantities as DRAM.
Okay, but you're saying NAND will also increase because AI is buying some quantity; there's just not as much need for it as there is for HBM and DRAM. Makes sense.
One thing I didn't appreciate until I was reading some of your newsletters is that the constraints limiting logic scaling over the next few years are quite similar to what's preventing us from producing more memory wafers. In fact, literally the same exact machine, this EUV tool, is needed for memory. So maybe the question somebody should be asking right now is: well, why can't we just make more memory? Is that somebody you?
Yeah, who knows?
So I think the constraints, as I was mentioning earlier, are not necessarily EUV tools today or next year; they become that as we get to the latter part of the decade. Currently, the constraint is more that they physically just haven't built fabs. Over the last three to four years, these vendors have just not built new fabs, because memory prices were really low, their margins were low, and in fact they were losing money on memory in 2023. So they said: we're not building new fabs. And then the market slowly recovered over time, but never really got amazing until last year.
In 2024, we were banging the drum that reasoning means long context, which means a large KV cache, which means a lot of memory demand. We've been talking about that for a year and a half, two years, and people who understood AI went really long memory back then. But it only finally played out in pricing now. It took so long for what was obvious: long context means the KV cache gets bigger, so you need more memory, and for accelerators, half their cost is memory, so of course the AI buyers were going to start going crazy on it. It took a year for that to actually reflect in memory prices.
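To see why long context translates directly into memory demand, here is the standard KV cache sizing arithmetic. The model shape below is a hypothetical dense configuration, not any specific model:

```python
# KV cache per token = layers x KV heads x head_dim x 2 (K and V) x bytes.
layers, kv_heads, head_dim = 60, 8, 128  # hypothetical model shape
bytes_per_elem = 2                       # fp16/bf16

kv_per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem  # ~240 KiB
context_len = 128_000
batch = 64  # concurrent long-context requests

total_GB = kv_per_token * context_len * batch / 1e9
print(f"{kv_per_token/1024:.0f} KiB/token -> ~{total_GB:,.0f} GB of KV cache")
# Roughly 2 TB of KV cache for one modest serving batch, on top of weights.
```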
Once memory prices reflected it, it took another three to six months for the memory vendors to start building fabs. And those fabs take two years to build. So we don't have really meaningful fab space to even put these tools in until late '27 or '28. Instead, what you've seen is some really crazy stuff to get capacity. Micron bought a fab from a company in Taiwan that makes lagging-edge chips. Hynix and Samsung are doing some pretty crazy things to try to expand capacity at their existing fabs, which also has very large knock-on effects in the economy.
So, hey, why can't we build more capacity? There's nowhere to put the tools. And it's not just EUV; there are other tools involved in DRAM and logic. For logic at N3, around 28-30% of the cost of the final wafer is EUV. For DRAM, it's in the teens (it's going up, but it's in the teens), so EUV is a much smaller percentage of the cost for DRAM. These other tools are also bottlenecks, although their supply chains are not as complex as ASML's, and so you see Applied Materials and Lam Research and all these other companies also expanding capacity a lot. But anyways, you don't have anywhere to put the tools, because the most complex building that people make is a fab, and fabs take two years to build.
You can think of Jane Street as a research lab with a trading desk attached.
Their infrastructure team has built some of the biggest research clusters in the world with tens of thousands of high-end GPUs and hundreds of thousands of CPU cores and exabytes of storage.
This compute is part of how Jane Street surfaces all the hidden patterns that are embedded in incredibly noisy market data.
Even beyond the noise, the nature of the signal changes constantly in reaction to things like pandemics and elections and new regulations, and even changes in sentiment.
There's this unremitting game of trying to figure out whether your old models still reflect the real world, and if not, what to do about it.
If you're interested in working on this sort of thing, Jane Street is hiring ML researchers and engineers.
They're also accepting applications for their summer ML internship program, with spots in London,
New York and Hong Kong.
And if you happen to find yourself at GTC,
which is happening the week after this episode drops,
Jane Street's GPU performance team is giving a talk.
Go to janestreet.com/dwarkesh to learn more.
I interviewed Elon recently, and his whole plan is that, I guess, they're going to build this gigafab, terafab, some power of ten, and they're going to build the clean rooms. I won't even ask you about the dirty-rooms thing. But let's say they build the clean rooms. Okay, I have a couple of questions. One: do you think this is the kind of thing that Elon and co. could build much faster than people conventionally build it? This is not about building the end tools; this is just about building the facility itself. How complicated is it to just build the clean room, and to do it extremely fast? Is this something that Elon, with his move-fast thing, could do much faster, if that's what we're bottlenecked on this year or next year? And two: does that even matter, if in two years your view is that we're not bottlenecked on clean room space but on the tooling?
So I think, as with any complex supply chain, it takes time, and constraints shift over time. And even if something is no longer a constraint, that doesn't mean that market no longer has margin. For example, energy will not be a big bottleneck a couple of years from now, but that doesn't mean energy isn't growing super fast or that there's no margin there. It's just not the key bottleneck. And in the space of fabs, clean rooms are the biggest bottleneck this year and next year, and as we get to '28, '29, '30, there will still be constraints there.
The thing about Elon is, I think he's had a tremendous capability to garner physical resources and really smart people to build things. And the way he's able to recruit really amazing people is to try to build the craziest stuff. In the case of AI, that hasn't really worked, because everyone's trying to build AGI; everyone's very ambitious. But in the case of: we're going to go to Mars, we're going to make rockets that land themselves, we're going to make fully autonomous electric cars, we're going to make humanoid robots, these are methods of recruiting the people who think that's the most important problem in the world to work on it, because he's the only one trying really hard. In the case of semiconductors: I want to make a fab that's a million wafers per month. No one has a fab that big. That's what he's stated; he wants to make a million wafers a month. It's possible that he's able to recruit a lot of really awesome people and get them onto this herculean, crazy task of trying to build a fab that does a million wafers per month.
Step one is to build the clean room, and I think that he probably can do. There's his mindset of: delete things, it can be dirty, it's fine. Probably not right. Actually, I think it's 100% not right; you need the fab to be very clean. I think all of the air in the fab gets replaced like every three seconds. It's that fast, and so few particles are allowed. But I think he can build the clean room. It'll take a year or two maybe; initially it won't be super fast, but over time he'll get faster and faster at it. The really complex part is developing a process technology and actually building wafers, and I don't think he can develop that quickly. That has a lot of built-up knowledge. It's, again, the most complicated integration of very expensive tools and supply chains there is, done at a TSMC or an Intel or a Samsung. And the other two of those companies aren't even that great at it, and they're tremendously complex operations.
How surprised would you be if, in 2030, there just happened to be some total disruption? We're not using EUV; we're using something with much better physics that's much simpler to produce, and we can produce it in much bigger quantities. I'm sure, to an industry insider, that sounds like a totally naive question. But do you see what I'm asking? What probability should we put on something totally out of left field coming along that makes none of this relevant?
Something that's very simple and easy to scale, I put a very, very low probability on. There are a number of companies working on effectively particle accelerators, or synchrotrons, that generate light that's either 13.5 nanometer like EUV or even X-ray, even narrower wavelengths, like 7 nanometer or whatever, to then use in lithography tools. But those things are massive particle accelerators generating this light, and they're very complicated to build. So there are a couple of companies there, and I think that could be a big disruption to the industry beyond what EUV is. I don't necessarily think we're going to magically build something new that does direct-write, is super simple, and can be manufactured at huge volumes, although there are some attempts at things like that.
Yeah. I ask because, if you think about Elon's companies in the past, rocketry was this thing that, you know...
I mean, it is incredibly complicated. Look, I'm just a naive yapper compared to Elon, right? What have I built? So maybe it's possible.
Yeah, yeah.
In order to be able to build more memory in the future, could we build 3D DRAM the way we do 3D NAND, and then go back to DUV?
This is the hope. Currently, everyone's roadmap for 3D DRAM still uses EUV, because you want that tighter overlay: when you're doing these subsequent processing steps, everything is vertically stacked, you have more layers on top of each other, and you want the pitches to be tighter, and all these things. So generally people are still trying to do it with EUV. But what 3D does is change the answer to: for a single EUV pass, how many bits can it make? If you do that sort of calculation, the number goes up drastically when you go to 3D DRAM. That is the hope. Right now, everyone's roadmap is that you go from the current 6F² cell to a 4F² cell, and then finally to 3D DRAM by the end of the decade or early next decade.
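The density logic, roughly: a DRAM cell's area is some constant times F², so bits per pass scale inversely with that constant, and 3D then multiplies by layer count. The layer count below is a made-up illustration, not a roadmap number:

```python
# Relative bits-per-area for DRAM cell architectures. Cell area = c * F^2,
# so planar density scales as 1/c; 3D multiplies by the layer count.
baseline = 1 / 6  # 6F^2 density, normalized to 1x

for name, c, layers in [("6F2 (today)",        6,  1),
                        ("4F2 (next step)",    4,  1),
                        ("3D DRAM, 32 layers", 4, 32)]:  # layers assumed
    density = layers / c
    print(f"{name:20s} ~{density / baseline:.1f}x bits per area")
# 6F2 -> 1.0x, 4F2 -> 1.5x, 32-layer 3D -> ~48x per lithography pass.
```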
So there's still a lot of R&D and manufacturing and integration work to be done. I wouldn't call it out of the cards; I think it's very likely going to happen. But it's also going to require a huge retooling of fabs. The breakdown of tools in a fab is very different. Actually, the lithography tool is the one thing that isn't that different, but the number of them relative to the different types of chemical vapor deposition, atomic layer deposition, dry etch, and different kinds of etch chambers with different chemistries: you have all these different kinds of tools for different process nodes. You can't just convert a logic fab to a DRAM fab, or vice versa, or a NAND fab, in a short amount of time. In the same way, existing DRAM fabs require a lot of retooling just to go from 1-alpha to 1-beta to 1-gamma process nodes, because now they have to add EUV and change the chemistry stacks for deposition and etch around EUV, and the EUV tool has to be there. And when you change to 3D DRAM, there will be an even larger shift. So there's a lot of retooling of these fabs that needs to happen, and that would be a big disruption. It would make EUV demand generally lower. But as we've seen across time, lithography as a percentage of wafer cost has trended up: in the 2014-ish era it was like 16%, 17% of the wafer cost, and it's gone to 30% over the last 15 years or so. And for DRAM it was in the low-to-mid teens, and now it's trended towards the high teens. Before we get to 3D DRAM, it'll likely cross into the 20s. But then if we get to 3D DRAM, EUV's share of the total end wafer cost tanks again.
Yeah.
I guess you care less about the percentage of cost and more about how much it bottlenecks production.
Right, but the percentage of cost is sort of...
A proxy, yeah.
Yeah.
So if you're Jensen or Sam Altman, or whoever stands to gain a lot from scaling up AI compute, there are these stories of them going to TSMC and saying: hey, why can't we do X and Y and Z? But I think the point you're making is that, in some sense, it doesn't really matter what TSMC does. In fact, even if you have Intel and Samsung building more foundries, in the long run you're going to be bottlenecked by ASML and the other tool makers and material makers. So first, is that the correct interpretation? And second, should Silicon Valley people be going to the Netherlands to pitch ASML? Like, right now, should they be trying to pitch ASML to make more tools, so that in 2030 they can have more AI compute?
You know, it's a funny dynamic we saw in '23, '24, and '25. People who saw the energy bottleneck before others asymmetrically went to Siemens, Mitsubishi, and of course GE Vernova, and bought up turbine capacity, and now they're able to charge excess amounts for deploying those turbines, because of the energy crunch. In the same sense, this could be done for EUV, except ASML is not just going to trust any random bozo who wants to buy EUV tools. Turbines are much cheaper than EUV tools, and there are many more of them produced, especially once you get into industrial gas turbines, not just combined cycle but the cheaper, smaller, less efficient ones.
People put down deposits for these. So in a sense, someone could do this. Someone could go to the Netherlands and say: I'll pay you a billion dollars; you give me the right to purchase 10 EUV tools two years from now, and I'm first in line two years from now. And then over those two years, you go around and wait for everyone to realize: oh, crap, I don't have enough EUV tools. And then you try to sell your option at some premium. But all you're effectively doing is saying: ASML, you're dumb, you weren't making enough margin on these, so I'm going to make the margin. And the question is: would ASML even agree to this? I don't think so.
But there's a world where they at least get the demand signal from that to increase production.
Potentially, potentially. I agree.
But it sounds like you're saying, oh, they couldn't even increase production if they wanted to, given the supply chain.
Right.
But that's exactly the market in which, if they can't increase production, just like TSMC cannot increase production that fast, and yet demand is mooning, then the obvious solution is to arbitrage this, because you and I know demand is way higher than they're projecting and their capability to build. So then you arbitrage this by locking up the capacity and then sort of doing like a forward contract, and then trying to sell it at a later date once other people realize, actually, shit, everything is fucked and we don't have enough capacity.
And then you'll have like this insane margin that ASML and TSMC should have been charging.
But the thing is, I don't know if ASML and TSMC will ever agree to this.
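As a toy model of the arbitrage being described here, under made-up numbers; the deposit, tool price, and scarcity premium are all hypothetical, and ASML sells no such option:

```python
# Toy payoff for the EUV "forward contract" trade sketched above.
deposit = 1.0e9           # hypothetical upfront payment for the option
n_tools = 10              # EUV tools reserved, delivery two years out
list_price = 0.25e9       # assumed price per tool at signing
scarcity_premium = 0.60   # assumed markup once buyers realize the shortage

cost_basis = deposit + n_tools * list_price
resale = n_tools * list_price * (1 + scarcity_premium)
print(f"cost basis ${cost_basis/1e9:.2f}B, resale ${resale/1e9:.2f}B, "
      f"profit ${(resale - cost_basis)/1e9:.2f}B")
# The profit is exactly the margin ASML "should" have charged, which is
# why ASML may simply refuse to sell the option in the first place.
```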
Okay, let me ask about power now.
So it sounds like you think power can be arbitrarily scaled.
Not arbitrarily, but yes.
But beyond these numbers.
And I think, if I remember correctly, in your blog post on power, on how they're increasing power, you were implying that GE Vernova and Mitsubishi and Siemens' capacity to produce gas turbines was like 60 gigawatts a year. And then there's these other sources, but they're less significant than the turbines. And only a fraction of that goes to AI, I assume. So yeah, if in 2030 we have enough logic and memory to do 200 gigawatts a year, do you just think that these things are on a path to ramp up to more than 200 gigawatts a year? Or what do you see?
Yeah. So, I mean, right now we're at 30, right? Or 20. So this is critical IT capacity, by the way, right? This is an important thing to
mention. When I'm talking about these gigawatts, I'm talking about critical IT capacity,
server plugged in, that's how much power it pulls. But there's losses along the chain, right?
There is loss on the transmission. There's losses on the conversion. There's losses on cooling,
etc. And so you should gross this factor up, you know, from 20 gigawatts for this year or 200
gigawatts by the end of the decade to some number 20, 30 percent higher. And then you have
capacity factors, right? Turbines don't run at 100%. In fact, if you look at PJM, which is the largest grid, I think, in America, sort of the Midwest, sort of northeast kind of area-ish, not the full northeast. But anyways, PJM, in their models they say, hey, for turbines, we want to have roughly 20% excess capacity. And within that 20% excess capacity, we're running all the turbines at 90%, because they are derated some for reliability. Oh, things go down, maintenance, et cetera, et cetera. So then in reality, the nameplate capacity for energy is always way higher than the actual end critical IT capacity because of all of these factors.
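The gross-up he's walking through can be made concrete in a few lines; the loss and derating figures below are the rough ones quoted, so treat the output as illustrative.

```python
# Grossing critical IT capacity up to nameplate generation.
critical_it_gw = 20     # servers' actual power draw (today's rough figure)
overhead = 1.25         # ~20-30% for transmission, conversion, cooling
reserve_margin = 1.20   # PJM-style ~20% excess capacity
derate = 0.90           # turbines run ~90% for reliability/maintenance

nameplate_gw = critical_it_gw * overhead * reserve_margin / derate
print(f"{critical_it_gw} GW critical IT -> ~{nameplate_gw:.0f} GW nameplate")
# So 20 GW of critical IT implies roughly 33 GW of nameplate generation,
# and 200 GW by decade's end would imply roughly 330 GW.
```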
But it's not just turbines, right?
If you were just making power from turbines, that's simple, boring, easy, right? But we're humans, and capitalism is far more effective.
And so the whole point of that blog was, yes, there's only three people making combined cycle gas turbines, but there's so much more we can do. We can do aeroderivatives, right? We can take airplane engines and turn them into turbines as well. And there are even new entrants to the market, like Boom Supersonic's trying to do that, right?
And they're working with Crusoe.
And also there's all the other ones that already exist in the market.
There's medium speed reciprocating engines, right?
Engines that spin in circles, right?
So sort of like any diesel engine, right?
There's like 10 people who make engines that way, right?
So Cummins, you know, at least I'm from Georgia, and people used to be like, oh, man, you got a Cummins engine in there, you know, regarding Ram trucks. But it's like, well, actually, automobile manufacturing is going down. These companies all have capacity and could scale and convert that for data center power, right? Stick all these reciprocating engines in. Yes, it's not as clean as combined cycle. Maybe you can convert them from diesel to gas if you want.
But at the end of the day, these spinning engines, oh, what about,
ship engines, right? All of these engines for these massive cargo ships. Those are great. Nebius is doing
that for a data center for Microsoft in New Jersey. They're running these ship engines to generate power.
Oh, there's, you know, Bloom Energy is doing fuel cells. We've been like very positive on them for
like a year and a half now because they have like such a capability to increase their production
and their payback period for production increase is like very fast, even if the cost is a little bit
higher than combined cycle, which is like the best cost and efficiency.
You know, and then there's solar plus battery, which as these cost curves continue to come down, those can come online.
There's wind. And, you know, of course, the derating of those, you know, hey, when you put in a wind turbine, you might say, oh, I'm only going to expect 15% of the maximum power because things oscillate.
But yeah, batteries, there's all these things. And then the other thing is that, like, the grid is scaled for, you know, hey, we're not going to cut off power at peak usage, which is like the hottest day in the summer.
But in reality that's a load spike that is 10, 15, 20% higher than the average. Well, if you just put in enough utility-scale batteries, or you put in peaker plants that only run a small portion of the year, and those could be gas, they could be industrial gas turbines, they could be combined cycle, they could be any of the other sources of power I mentioned, they could be batteries, then all of a sudden you've unlocked 20% of the U.S. grid for data centers, because most of the time that capacity is sitting idle and it's really only there for that peak, right? Which is a day or two, a few hours of maybe a few days of the full year. And so you just have enough capacity to absorb that peak load and all of a sudden you've freed all that up. And today, data centers are only 3, 4% of the power of the U.S. grid. And by '28, it'll be 10%. But if you can just unlock 20% of the U.S. grid like this, it's not that crazy. And the U.S. grid is terawatt level, not hundreds of gigawatts level. Right. So we can add a lot more energy. It's not easy. I'm not saying it's easy. These things are going to be hard.
There's a lot of hard engineering. There's a lot of risks that people have to take. There's a lot of
new technologies people have to use. But Elon was the first to do this behind the meter gas.
And since then, we've seen an explosion of different things that people are doing to get power.
And they're not easy, but people are going to be able to do them. And the supply chains are just way more simple than chips.
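A quick sketch of the peak-shaving arithmetic from a moment ago; the grid size, unlock share, and data center shares are the rough figures quoted in the conversation.

```python
# If batteries/peakers absorb the rare peak, idle capacity is freed up.
us_grid_gw = 1000        # US grid is terawatt-scale
unlock_share = 0.20      # the ~20% of capacity that exists only for peaks
dc_share_today = 0.035   # data centers: ~3-4% of US power today
dc_share_2028 = 0.10     # ~10% by 2028, per the quote

print(f"peak-shaving frees ~{us_grid_gw * unlock_share:.0f} GW")
print(f"data centers today: ~{us_grid_gw * dc_share_today:.0f} GW, "
      f"by '28: ~{us_grid_gw * dc_share_2028:.0f} GW")
```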
Interesting. So I guess he made the point during the interview
that the specific blade for the specific turbine he was looking at,
the lead times for that go out beyond 2030.
And your point is that...
That's great.
There's so many other ways to make energy.
Like, just be inefficient.
Like, it's fine.
Right.
So you're saying right now, I guess, combined cycle gas turbines have capex of $1,500 per kilowatt. And you're saying it would make sense to have either technologies that are much more expensive than that, or other things getting cheap enough to make it competitive.
Exactly.
Exactly.
You know, it can be as high as $3,500 per kilowatt, even.
So it could be twice as much as the cost of combined cycle.
And the total cost of the GPU, you know, on a TCO basis, has gone up a few cents per hour.
Right.
Again, because we've been talking about Hopper pricing: the power price doubles, okay, the Hopper that was $1.40 an hour is now $1.50 in cost. Right. It's like, oh, I don't care, because the models are improving so fast that the marginal utility of them is worth way more than that ten-cent increase in energy.
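That sensitivity is easy to sanity-check; the $0.10/hour power component below is an assumption backed out of the $1.40-to-$1.50 example.

```python
# Per-GPU-hour cost vs. power price, using the Hopper example above.
base_tco = 1.40          # $/GPU-hour, all-in
power_component = 0.10   # assumed power share of that cost, $/GPU-hour

for mult in (1, 2, 3):
    tco = base_tco + power_component * (mult - 1)
    print(f"power price x{mult}: ~${tco:.2f}/GPU-hr")
# Even tripling the power price moves the all-in cost only ~14%, which is
# why expensive behind-the-meter generation can still pencil out.
```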
Okay, and then so you're saying 20% of the grid, or whatever, 20% of that can just come online from utility-scale batteries, increasing what you'd be comfortable putting on the grid.
The regulatory mechanism there is not easy, by the way.
But that's 200 gigawatts, if that hypothetically happens. And you're saying, just from the different sources of gas generation you mentioned, the different kinds of engines and turbines combined, how many gigawatts could they unlock by the end of the decade?
Yeah, so we're tracking this in some of our data; you know, there's over 16 different manufacturers of power-generating equipment just from gas alone, right? So, you know, yes, there's only three turbine manufacturers for combined cycle.
But we're tracking 16 different vendors, and we have all of their orders and things like that.
And it turns out there is just hundreds of gigawatts of orders to various data centers.
As we get to the end of the decade, we think like something like half of the capacity that's being added will be behind the meter.
And when we look at it, a lot of this is actually behind the meter. Behind the meter is almost always more expensive than grid connected, but there's just a lot of problems with getting grid connected, you know, permits and interconnection queues and all this sort of stuff.
So it ends up being even though it's more expensive, people are doing behind the meter.
And then what they're doing behind the meter with ranges widely, right?
It could be reciprocating engines.
It could be ship engines.
It could be aeroderivatives. It could be combined cycle, although combined cycle is not that great for behind the meter.
It could be bloom energy fuel cells.
It could be solar plus battery, right?
Like, it could be any of these things.
You're saying any of these individually could do like tens of gigawatts?
Any of these individually will do tens of gigawatts and in a whole they will do hundreds of gigawatts.
Okay.
So that alone should more than...
I mean, it's going to take, I mean, like, electrician wages probably double or triple again, right?
And like, there's going to be a lot of new people entering that field and there's going to be a ton of people who make money.
But it is something that I don't, like, I don't see that as the main bottleneck, right?
So right now in Abilene, the 1.2 gigawatt data center that Crusoe is building for OpenAI, I think they have like 5,000 people working there, or at peak they did. And if you turn that into 100 gigawatts, and I'm sure things will get more efficient over time, that would be like 400K people it would take to build 100 gigawatts.
And if you think about the U.S. labor force of how many electricians there are, how many construction workers there are,
Yeah, I guess there's like 800K electricians.
I don't know if they're all substitutable in this way.
There's millions of construction workers.
But if we're in a world where we're adding 200 gigawatts a year,
are we going to be crunched on labor eventually,
or do you think that is actually not a real constraint?
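For reference, the Abilene scaling in the question pencils out roughly like this, assuming worker counts per gigawatt stay constant (which, as noted, they probably won't):

```python
# Scaling the Abilene labor figure to a 100 GW buildout.
workers_abilene = 5_000   # ~peak headcount quoted for the site
gw_abilene = 1.2          # site size in gigawatts
target_gw = 100

workers_needed = workers_abilene / gw_abilene * target_gw
print(f"~{workers_needed/1e3:.0f}K construction workers for {target_gw} GW")
# Against ~800K US electricians and millions of construction workers:
# large, but not obviously impossible, hence wages doubling or tripling.
```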
So labor is a humongous constraint in this.
People have to be trained.
Likewise, we probably start importing the highest-skilled labor in this way, right? Because now it makes sense that, you know, hey, a really high-skilled electrician in Europe who was working on decommissioning power plants now comes to America and is building data centers, you know, high-voltage electricity, power moving across the data center, right? Something like this, right? Humanoid robots maybe start to help, or robotics at least start to, but the main factor for reducing the number of people is going to be modularizing things and making them in factories in Asia, unfortunately, at least for America, but, you know, Korea, Southeast Asia,
in many ways China as well. But these areas are going to ship more and more built-out sections of the data center, and those will be shipped in, right? Maybe today you currently ship servers in, or a rack in, and then you plug that into different pieces that you're shipping from different places. But now you'll integrate the entire thing at a factory: hey, maybe this is a two-megawatt block, and this block goes from, you know, high-voltage power down to the voltage, and maybe DC, that you deliver to the rack, instead of it being AC and high voltage, right? Or something like this, right? Or cooling: you ship a fully integrated thing that has a lot of the cooling subsystems already put together.
Or, because plumbers are also a big constraint here.
Or, furthermore, instead of just a single rack, where you have people wiring up all these racks of power and electricity and so on, you take a skid and you put an entire row of servers on it, and that is shipped from the factories. And today a single rack may be 120, 140 kilowatts, but as we get to, you know, next generation, Nvidia Kyber and things like that, it's almost a megawatt. And then in addition, if you do an entire row, it'll have the rack, it'll have the networking, and it'll have the cooling and the power racks all integrated together. So now when you come in, you actually have much less stuff to cable, whether it be networking fiber or power, right? There's fewer power things to connect, and then there's fewer plumbing things to connect, right? And so this can drastically reduce the number of people working in data centers, and therefore the capability to build these will be much larger. And along the way, you know, some people move faster to new things and some people move slower, right? Crusoe and Google have been talking a lot about this modularization, as have people like Meta and many others, and others are going to be slower to doing it. And people who move faster to new things may have more delays, while people who are slower have labor problems. So there will always be dislocations in the market, because this is a very complex supply chain. But at the end of the day, it's still simple enough that we will be able to solve it through capitalism and human ingenuity on the timescales that are required.
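One way to see why megawatt-class racks and factory-integrated rows shrink the on-site labor problem is just to count the discrete units per gigawatt; the rack figures are the ones mentioned above.

```python
# Fewer discrete units to cable, plumb, and power per gigawatt.
rack_kw_today = 130     # ~120-140 kW per rack today
rack_kw_next = 1_000    # ~1 MW for Kyber-class next-gen racks

for name, kw in (("today", rack_kw_today), ("next-gen", rack_kw_next)):
    print(f"{name:8s}: ~{1_000_000 / kw:,.0f} racks per GW to install")
# ~7,700 racks/GW today vs ~1,000 racks/GW at megawatt scale, before even
# counting pre-integrated rows shipped as factory-built skids.
```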
Yeah. Okay. So speaking of big problems to solve: Elon Musk is very bullish on space data centers. If you're right that power is not a constraint on Earth, that there will be enough gas turbines or whatever to build it on Earth, I think Elon's next argument then is that you can't get the permitting to build hundreds of gigawatts on Earth. Do you buy that argument?
Land-wise, America's big. Data centers don't take that much space. You can solve that.
Permitting-wise, air pollution permits are a challenge, but the Trump administration's made it much easier. You go to Texas and you can skip a lot of this red tape.
And so, you know, Elon had to deal with a lot of like this complex stuff in Memphis and then building a power plant across the border and all these things for Colossus 1 and 2.
But at the end of the day, there's a lot more you can get away with in the middle of Texas, right?
Given that Elon lives in Texas, why didn't he just go to Texas?
I think it was partially that they over-indexed on grid power for a temporary period of time, right?
Because that's just what they thought they needed more of.
You said it was an aluminum refinery connected to the grid there?
It was an appliance factory that was idled.
But I think they may have indexed more to what had grid power.
They may have indexed more to like water access and gas access because actually I think
they bought that knowing that the gas line was right there and they were going to tap it.
Same with water.
It was a whole host of different constraints.
It was probably an area where electricians and things like that were easier to find.
But at the end of the day, I'm not exactly sure why they chose that site.
I bet Elon would have chosen somewhere in Texas if he could have gone back. But yeah, regarding the regulatory challenges he's faced: ultimately permitting is a challenge, but America is a big place
and there are 50 states and things will get done. And there are a lot of small jurisdictions where
you can just transport in all the workers that you need for a temporary period of six months to a year, or even three months, depending on the type of contractor that's coming in, and put them in temporary housing, pay out the butt, because labor is very cheap relative to the GPUs and the power, or not the power, but the GPUs and the networking and so on and so forth, and the end value of the tokens it's going to produce.
So all of these things have plenty of room to like be paid for.
And so I think it's fine, right?
And also people are diversifying now, right?
Australia, Malaysia, Indonesia, India.
These are all places where data centers are going up at a much faster pace,
but currently still 70% plus of the AI data centers are in America,
and that continues to be the trend.
And so I think people are figuring out how to build these things and permitting.
Like, I just think ultimately permitting and red tape in the middle of nowhere, Texas, or the middle of nowhere, Wyoming, or the middle of nowhere, New Mexico, is probably a hell of a lot easier than sending stuff into space.
Right.
Well, other than the fact that the economic argument makes less sense once you consider that energy is a small fraction of the total cost of ownership of a data center, what are the other reasons you're skeptical?
Yeah, so obviously power is free in space, basically.
That's the reason to do it.
Yeah, that's the reason to do it.
But then there's all the other counter arguments, right?
Which is: even if power costs double, you're still at a fraction of the total cost of the GPU. The main challenge is what we've seen in the dispersion, right? We have ClusterMAX, which rates all the neoclouds, and we test them. We test over 40 cloud companies, including the hyperscalers and neoclouds. What differentiates some of these clouds the most, outside of software, is their ability to deploy and manage failure, right?
GPUs are horrendously unreliable.
Even today, 15% of Blackwells or so
that get deployed have to be RMA'd.
You have to take them out.
Maybe you just have to unplug them and plug them back in,
but sometimes you have to take them out
and ship them to Nvidia, or rather there are partners
who do these RMAs and such.
What do you make of Elon's argument that once you're past the initial phase, they actually don't fail that much?
Sure, but now you've done this, you've tested them all,
you deconstructed them, put them on a spaceship,
fucking put them into space,
and then put them online again, that's months, right?
And if your argument is that, you know,
hey, GPUs have a useful life of X years, right?
If a GPU has a useful life of five years
and it takes three additional months,
probably six, let's say six additional months,
then that is 10% of your cluster's useful life.
And because we're so capacity constrained,
that compute is most valuable, theoretically, in the first six months you have it, because we're more constrained now than in the future. That compute now can contribute to a better model in the future, or can contribute to revenue now, which you can use to raise more money, and all these sorts of things. Now is always the most important moment. And so you've delayed your compute deployment by six months, potentially. And the thing that separates these clouds is, we see clouds that take six months to deploy GPUs today on Earth, right? We see clouds that take a lot less than six months, right? And so the question is, where does space fit in there?
I don't see how you would test them all on Earth, deconstruct them, ship them and shoot them into space, and have it not take longer than just putting them in the spot where you were testing them.
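A sketch of that useful-life argument: the five-year life and six-month space-deployment penalty are the figures assumed in the exchange, and the monthly discount rate below is a made-up stand-in for "now matters more".

```python
# Deployment delay as a share of a cluster's (discounted) useful life.
useful_life_months = 5 * 12
extra_deploy_months = 6   # assumed space-deployment penalty

print(f"{extra_deploy_months / useful_life_months:.0%} of life lost, undiscounted")

r = 0.03  # assumed monthly decay in the marginal value of compute
def total_value(start):
    # Discounted value of each month of the cluster's life.
    return sum((1 + r) ** -t for t in range(start, start + useful_life_months))

loss = 1 - total_value(extra_deploy_months) / total_value(0)
print(f"value lost vs. immediate deployment: {loss:.0%}")
```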
Yeah.
So the question I wanted to ask is about the topology of space communication.
So right now, Starlink satellites talk to each other at 100 gigabits per second.
And you could imagine that being much higher with optical inter-satellite laser links that are optimized for this.
And that actually ends up being quite close to the InfiniBand bandwidth, which is like 400 gigabits a second, right?
But that's per GPU, not per rack.
I see, okay.
So multiply that by 72.
Also, like, that was Hopper when you go to Blackwell and Rubin,
that 2Xs and 2Xs again.
All right.
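Comparing the quoted link speeds directly, for a Hopper-class rack; the per-GPU and laser figures are the ones just mentioned.

```python
# Inter-satellite lasers vs. the scale-out bandwidth one rack uses.
laser_gbps = 100          # current Starlink-style inter-satellite link
per_gpu_ib_gbps = 400     # InfiniBand per GPU, Hopper generation
gpus_per_rack = 72        # one NVL72-style scale-up domain

rack_egress_gbps = per_gpu_ib_gbps * gpus_per_rack
print(f"rack scale-out egress: {rack_egress_gbps/1000:.1f} Tb/s "
      f"(~{rack_egress_gbps/laser_gbps:.0f} of today's laser links)")
# And per the exchange, that doubles again with Blackwell and Rubin.
```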
But how much communication is happening, like, during inference? Are the different scale-up domains still working together, or is it just happening as a batch within a single scale-up?
A lot of models fit within one scale up domain, but many times you split them across multiple scale up domains.
I think that you really have to, as models become more and more sparse, at least this is like the general trend,
then you want to ping just a couple experts per GPU.
And if leading models today have hundreds, if not thousands of experts, then you'd want to run this across hundreds of chips or thousands of chips,
even as we continue to advance into the future.
And so then you end up with this problem of,
well, now you need to connect all these satellites together comms-wise as well.
Okay, so that would be tough.
Because I was imagining: if there's a world where you could do inference for a batch on a single scale-up, then maybe it's more plausible. But if not, then it's much tougher.
Yeah, I mean, networking these chips together is a problem. And you can't just make the satellite infinitely large, right? Like, there are a lot of challenges with physics to making a satellite really big, right? So that's why you need these interconnects between the satellites. Those interconnects are more expensive, and in a cluster, like 15 or 20% of the cost is networking. All of a sudden, now you're making space lasers instead of pretty simple lasers that are manufactured in millions of volumes with, you know, pluggable transceivers.
And those things are very unreliable as well.
More unreliable than the GPUs, by the way.
Across the life of a cluster, you have to unplug and clean them all the time, right? Unplug, replug, just for random reasons.
These things are just not as reliable.
So you've got that problem as well.
Like, you've got a more expensive, complicated space laser to communicate instead of this pluggable optical transceiver that's been made in super high volume.
Okay, so all in all, what does that imply for space data centers?
So space data centers effectively aren't helped by, you know, hey, we have this energy advantage. They're actually just limited by the same contended resource.
We can only make 200 gigawatts of chips a year by the end of the decade.
So what are we going to do to get that capacity?
It doesn't matter if it's on land or in space.
It doesn't really matter, right?
Because you can build that power. And I think human capabilities and capacity could get to the point where we're adding a terawatt a year globally of various types of power.
At some point, we do cross the chasm where space data centers make sense, but it's not this decade, right?
It is much further out, once you have energy constraints actually being a big bottleneck, once you have space, land, and permitting being a much bigger bottleneck as AI subsumes more and more of the economy, and once chips are no longer the bottleneck. Because right now chips are the biggest bottleneck, and so you want them deployed working on AI the moment they're done being manufactured.
And so there's a lot of things people are doing to increase that speed faster and faster,
whether it be modularizing data centers or even modularizing racks, where you actually put the chip in at the data center, but only the chip, and everything else is already wired up and
ready to go at the data center. So there's things like this that people are doing to decrease that
time, things that you cannot do in space. And at the end of the day, all that matters in a chip-constrained world is getting these chips working on producing tokens ASAP. In a world, you know, maybe 2035, once the semiconductor industry, ASML and Zeiss and all these other suppliers, Lam Research, Applied Materials, the fab manufacturers, once the pendulum swings and they're able to make enough chips, and really we're optimizing every dial, then it makes sense to optimize the 10% or 15% of energy costs. Or, as we move to ASICs potentially and Nvidia's margins aren't 70-plus percent, maybe that energy cost is 30% of the cluster, and fab construction, data center construction, all this; these are the things to optimize. But that's not, you know, Elon doesn't win by doing, you know, 20% gains.
Elon never wins that way.
Elon wins when he swings for the fences and does 10x gains, right?
That's what SpaceX is about.
That's what Tesla was about.
That's what all of his success has been about, right?
It's never been about chasing the 20%.
So I think space data centers will eventually be a 10x gain, potentially, as Earth's resources get more and more contentious.
But that's not this decade.
Yeah.
I mean, I think just to drive some intuition about how much land there is on Earth: obviously the chips themselves, especially if we move to a world where you have racks that are a megawatt each, like, literally, it's not even a real factor.
That's the other thing, right? The power density, you know, if chips and manufacturing are the constraint: right now, roughly, it's one watt per millimeter squared for AI chips and such.
One easy way is to pump that to two watts per millimeter squared. Now, you may not get 2x the performance. You may only get 20% more performance, and that requires much more exotic cooling, right? It requires more complicated cold plates and very complicated liquid cooling, or maybe it requires things like immersion cooling. But in space, higher watts per millimeter is very difficult, whereas on Earth, these are solved problems. And one of these things enables you to get a lot more tokens. Maybe it's 20% more tokens per wafer that's manufactured. And that's a humongous win.
So that millimeter, you mean of die area?
Yeah, of die area. Square millimeters of die area.
I mean, it would be better for space, because if you can run more watts per millimeter, the chip runs hotter, and the hotter the chip... I guess this is a question of chip engineering, but it radiates heat proportional to temperature to the fourth power, by the Stefan-Boltzmann law. So if you can run a very hot chip...
No, no, but you can't run it hotter.
You can only run it denser.
And the problem is getting the heat out of that dense area means you have to move away
from standard air cooling and liquid cooling to more exotic forms of liquid cooling or even
immersion to get to higher power densities.
And that's more difficult in space than it is on Earth.
Yeah.
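For the Stefan-Boltzmann point in this exchange: in vacuum, heat leaves only by radiation, so the required radiator area falls with the fourth power of temperature, but chip junction temperatures cap how hot the cooling loop can run, which is the constraint being pointed at. A rough worked example, with the emissivity and the 1 MW heat load as assumptions:

```latex
% Radiative rejection in vacuum: P = \epsilon \sigma A T^4, so
\[
A = \frac{P}{\epsilon\,\sigma\,T^{4}}, \qquad
\sigma = 5.67\times10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}, \quad \epsilon = 0.9
\]
\[
T = 300\,\mathrm{K}:\quad
A \approx \frac{10^{6}}{0.9\cdot 5.67\times10^{-8}\cdot(300)^{4}}
\approx 2.4\times10^{3}\ \mathrm{m^{2}}
\]
\[
T = 350\,\mathrm{K}:\quad A \approx 1.3\times10^{3}\ \mathrm{m^{2}}
\]
```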
And maybe it's at this point worth explaining what exactly a scale-up is and what it looks
like for Nvidia versus Trainium versus TPUs.
Yeah, so earlier I was mentioning how communication within a chip is super fast. Communication between chips that are in the same rack is fast, but not as fast; it's on the order of terabytes a second. And then communication very far away is on the order of gigabytes, hundreds of gigabytes, right? So the order of magnitude drops as you get to further-distance compute; maybe across the country, it's on the order of gigabytes a second, right? The scale-up domain is this tight domain where the chips are communicating on the order of terabytes a second.
And so for Nvidia, previously this meant an H-100 server had eight GPUs,
and those eight GPUs could talk to each other at terabytes a second.
With Blackwell, NVL-72, they implemented rack scale up.
And that meant all 72 GPUs in the rack could connect to each other at terabytes-a-second speed.
And the speed doubled gen on gen, but also the most important innovation they did was going from
8 to 72 in the domain.
When we look at Google, their scale-up domain is completely different, right?
It has always been on the order of thousands, right? With TPU v4, they had pods the size of 4,000 chips. With v7, they have pods in the, sorry, 8,000, 9,000 range.
And what's relevant here is that it's not the same as Nvidia, it's not like-for-like.
Google has a topology that's a torus, right? So every chip connects to six neighbors, rather than Nvidia, where the 72 GPUs connect all-to-all, right? So they can send terabytes a second to any arbitrary other chip in that scale-up pod, whereas with Google, you have to bounce through chips, right?
So this means if TPU1 needs to talk to TPU 76, then it has to bounce through various chips.
And there is always some blocking of resources when you do that.
Because that one TPU is only connected to six other TPUs.
And so there's a difference in topology and bandwidth, and there are tradeoffs and advantages of both, right?
Google gets to have a massive scale-up domain, but then they have the trade-off of you have to
bounce across chips to get from one chip to another. You can only talk to six direct neighbors.
And so there is this trade-off. And Amazon has sort of mutated their scale-up domain. They're somewhere in between Nvidia and Google effectively, where they're trying to make larger scale-up domains. They try and do all-to-all to some extent with switches, which is what Nvidia does, but also to some extent they use torus topologies like Google does.
And as we advance forward to next generations,
all three of them are moving more and more towards a dragonfly topology,
which means there's sort of like there is some fully connected elements
and there are some elements that are not fully connected.
So you can get the scale up to be hundreds or thousands of chips,
but also have it not contend for resources when you're bouncing through chips.
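A minimal sketch contrasting the two topologies just described; the torus dimensions are illustrative, and the hop function is the standard wraparound distance.

```python
# Switched all-to-all (NVL72-style) vs. a 3D torus (TPU-style).
def torus_hops(a, b, dims=(8, 8, 8)):
    """Minimum hops between chips a and b on a wraparound 3D torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

src, dst = (0, 0, 0), (4, 7, 2)
print("all-to-all via switches: 1 hop between any pair")
print(f"torus {src} -> {dst}: {torus_hops(src, dst)} hops")
# Each extra hop transits an intermediate TPU's links: the "blocking of
# resources" trade-off described above.
```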
Related question.
I heard somebody make the claim that the reason parameter scaling has been slow, and only now are we getting bigger and bigger models from OpenAI and Anthropic, is that original GPT-4 was over a trillion parameters, and only now are models starting to approach that again. And I heard a theory that the reason is that Nvidia's scale-ups have just not had that much memory capacity.
And so what was the claim exactly?
If you have, say, a 5T model running at FP8, that's five terabytes.
Yeah.
And then you have the KV cache. Let's say it's the same size for one batch. So you need 10 terabytes to be able to run...
A single forward pass, yeah.
And then only with the GB200 NVL72 do you have an Nvidia scale-up that has 20 terabytes,
and before that they were much smaller.
Whereas Google, on the other hand,
has had these huge TPU pods that are not all to all,
but still have, I think, hundreds of terabytes
of capacity in a single scale-up.
So does that explain why parameter scaling has been slow?
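The arithmetic in the question, made explicit; the 5T/FP8 and KV-cache sizes are the quoted ones, and the per-GPU HBM figure is an assumption for a GB200-class rack.

```python
# Model footprint vs. the HBM available in one scale-up domain.
params = 5e12               # 5T parameters
bytes_per_param = 1         # FP8
weights_tb = params * bytes_per_param / 1e12
kv_cache_tb = weights_tb    # assumed comparable to weights, per the quote
total_tb = weights_tb + kv_cache_tb

nvl72_hbm_tb = 72 * 192 / 1000   # assuming ~192 GB HBM per GPU
h100_node_tb = 8 * 80 / 1000     # an 8-GPU H100 server

print(f"weights + KV cache: ~{total_tb:.0f} TB")
print(f"NVL72-class rack: ~{nvl72_hbm_tb:.1f} TB; H100 node: ~{h100_node_tb:.2f} TB")
# ~10 TB fits in a rack-scale domain (later rack generations reach the
# ~20 TB quoted above) but nowhere near an 8-GPU H100 scale-up.
```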
I think it's partially the capacity and bandwidth,
but also as you build a larger model,
the ability to deploy it is slower, right?
Like in terms of, like, hey, what is the inference speed for the end user?
That's kind of irrelevant.
What's really relevant is RL.
And what we've seen with these models and the allocation of compute at a lab is, there's a few main ways you can allocate compute. You can allocate it to inference, i.e. revenue; you can allocate it to development, i.e. making the next model; and you can allocate it to research. And in development specifically, you can split it between pre-training and RL, right? And so when you think about, hey, what exactly is happening? Well, the compute efficiency gains you get from research are so large, you actually want most of your compute to go to research, not to development, because, you know, all these researchers are generating new ideas, trying them out, testing them, and continuing to push the Pareto optimal curve of scaling laws further and further. And at least what we've seen empirically is, like, model cost at the same scale gets 10x cheaper every year, or even more than that, while reaching new frontiers costs the same amount or more, right?
So you don't want to allocate too many resources to pre-training and RL; you actually want to allocate most of your resources to research. And then in the middle is this sort of development period. If you pre-train a 5-trillion-parameter model, now you have to spend all this time: how many rollouts do you have to do in RL? And these rollouts for a 5-trillion-parameter model versus a trillion-parameter model are five times larger, which then means it takes longer if you want to do as many rollouts. Maybe the larger model is more sample efficient; let's say it's 2x more sample efficient. Okay, great. Now you need two and a half times as much RL time to get the model equally smart. Or you could RL the smaller model for 2x the time, and the small model, which is a trillion parameters and, although less sample efficient, is doing twice as many rollouts, is still done faster than the big model that's 2x more sample efficient but doing X number of rollouts. And so you get the model sooner, and you've done more RL, and then you can take that model to help you build the next models, help your engineers train, and do all these research ideas.
And so this feedback loop is actually weighted towards smaller models in every case, no matter what your hardware is.
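The rollout arithmetic he just walked through, in a few lines; the 5x cost ratio and the granted 2x sample efficiency are the assumptions from the conversation.

```python
# Big vs. small model: RL wall-clock for the same amount of learning.
small_cost, big_cost = 1.0, 5.0  # relative compute per rollout
big_sample_eff = 2.0             # assumed: big model learns 2x per rollout

rollouts_big = 1 / big_sample_eff   # to match the small model's learning
time_big = big_cost * rollouts_big  # = 2.5x the small model's time
print(f"big-model RL time for equal learning: {time_big:.1f}x the small model")
# So on equal compute the small model finishes RL sooner and gets deployed
# into research earlier; that's the compounding loop being described.
```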
And then as you look to Google, Google does deploy the largest production model of any of the major
labs, right, with Gemini Pro.
It is a larger model than GPT-5. It's a larger model than Opus. And so you end up with, yes, Google does this because they have a unipolar set of compute, right, almost all TPU. Whereas Anthropic is dealing with H100s, H200s, Blackwells, Trainiums, TPUs of various generations, right? And OpenAI is dealing with mostly Nvidia right now, but going towards having AMD and Trainium as well.
The fleets of compute, like Google can just optimize around a larger model,
And they can leverage 1,000 chips in a scale-up domain to get, you know, the RL speed much faster so that you can actually have this feedback loop be fast.
But at the end of the day, in isolation, you almost always want to go with a smaller model that gets RL'd faster and gets deployed into research and development.
Right.
So you can build the next thing and get more compute efficiency wins.
And then this compounding effect of, oh, I made a smaller model that I RL'd more, that I then deployed into research and development earlier,
and I spent less compute on the training itself
because I was able to allocate more compute
to the research.
This compounding effect of being able to do the research
faster and faster and faster is potentially a faster takeoff.
And that's what all these companies want: as fast a takeoff as possible.
Okay, spicy question.
You know, SemiAnalysis sells these spreadsheets, and you're always like, six months ago or a year ago we told people about the memory crunch, or now you're telling people about the clean room crunch, and then in the future the tool crunch.
Why is Leopold the only person that is using your spreadsheets to make outrageous money?
What is everybody else doing?
I think there are a lot of people making money in many ways.
I think obviously Leopold jokes that, you know, he's the only client of mine that tells me our numbers are too low.
Everyone else tells me our numbers are too high, almost ad nauseam, you know, whether it's a hyperscaler saying, hey, that other hyperscaler, their numbers are too high, and we're like, nah, that's right. And they're like, no, no, no, it's impossible, blah, blah, blah. And then you finally have to convince them, through all these facts and data, when we're working with hyperscalers or AI labs, that, in fact, no, that number isn't too high.
That's correct.
But eventually, sometimes it takes them six months or a year later to realize.
I think other clients, like on the trading side also use our data, right?
We sell data to a lot of people; you know, I think roughly 60% of my business is industry. So AI labs, data center companies, hyperscalers, semiconductor companies, the whole supply chain across AI infrastructure.
But then like 40% of our revenue is like hedge funds, right?
And, you know, I'm not going to comment on who our customers are,
but I think a lot of people use the data.
It's just how do you interpret it?
And then what do you like view as beyond it?
And I will say Leopold is pretty much the only person who tells me my numbers are too low always.
And sometimes he's too high.
Sometimes I'm too low, right?
But in general, I think other people are, you know, doing that.
And you can look across the space at hedge funds and look at their 13Fs and see, actually, they own maybe not exactly what Leopold does, because it's always a question
of like, what is the most constrained thing, what's the thing that's going to be, that's most
outside of expectations.
And that's what you're really trying to exploit is inefficiencies in the market.
And in a sense, what our data does is make the market more efficient by making the base data of what's happening more accurate. But in a sense, I think many, many funds do trade on information that is out there. And I don't think Leopold's the only person.
I think he has the most conviction about the AGI takeoff, though, right?
Right.
I mean, but the bets are not about, like, what happens in 2035. The bets that you're making, at least as exemplified by the public returns we can see for different funds, including Leopold's, are about what has happened in the last year. And the last-year stuff could be predicted using your spreadsheets, right? So it's less about that; it's about buying, like, the next year of spreadsheets.
They're not just spreadsheets. You know, there's reports, there's API access to the data, there's a lot of data. But anyways, you know, I think...
Do you see what I mean? Like, it's not about some crazy singularity thing. It's about, like, oh, do you buy the memory crunch?
A simple one, though, is that you only buy the memory crunch if you believe AI is going to take off in a huge way. And the memory crunch, a lot of it was predicated on, like, you know, at least for people in the Bay Area who think about infrastructure, it's obvious: the KV cache explodes as context lengths go longer, so you need more memory. And then you do the math,
and you also have to have a lot of supply chain understanding of like what fabs are being built
and what data centers are being built and how many chips and all these things. And so we,
we track all these different data sets like very tightly. But at the end of the day, it takes,
you know, someone to fully believe that this is going to happen. Like, I think a year ago,
if you told someone memory prices were going to quadruple and smartphone volumes were going to go down 40% over the year or two after that, people were like, you're crazy, that would never happen. Except a few people did believe that, and those people did trade memory, right?
And people did.
I don't think, like, Leopold was the only person buying, like, memory companies.
I think there were a lot of people buying memory companies.
He, of course, sized and positioned and did things in better ways than some, maybe most, right?
I don't want to comment on whose returns or what, but he certainly did well.
But other people also did really well, right?
Trying to be, like... wow, you've made me diplomatic for the first time ever.
No, no, you're fine.
I think it's hilarious, right?
I'm being a diplomat, you know,
whereas usually I'm like spicy.
Yeah.
Okay.
Maybe some rapid fire to close out.
Can TSMC, if you're saying, look, the memory, logic, et cetera, N3 is mostly going to be AI accelerators, but then there's N2, which is mostly Apple now, and then in the future, I guess, AI would also want to go on N2; can they kick out Apple if Nvidia and Amazon and Google say, hey, we're willing to pay a lot of money for N2 capacity?
So I think the challenge of this is chip design timelines take a long while.
And so that's more than a year.
And the designs that are on two nanometer are more than a year out.
And so what would really happen is, Apple, or sorry, Nvidia and all these others will be like, hey, we're going to prepay for the capacity, and you're going to expand it for us. And maybe TSMC takes a little bit of margin, but not a ton; they're not going to kick Apple out entirely, right?
What they're going to do is when Apple orders X, they may say, hey, we project you only need
Y or X minus one.
And so that's what we're going to give you is X minus one.
And then that flex capacity Apple's kind of screwed on.
Whereas traditionally, Apple has always over-ordered by like 10% and cut back by 10% over the course of the year. And some years they hit the entire 10%. Just, you know, volumes vary, right, based on the season, macro, blah, blah, blah.
And so I don't think TSM would kick out Apple.
I think Apple will become a smaller and smaller and smaller percentage of TSM's revenue
and therefore be less relevant for TSM to cater to their demands.
And TSM could eventually start saying, hey, you've got to pre-book your capacity for next year for two years out.
And you have to prepay for the CAPEX because that's what Nvidia and Amazon and Google are doing.
Yeah.
I wonder if it's worth going into specific numbers. I don't have any of them on hand: how many N2 wafers, or what percentage of N2, does Apple have its hands on over the coming years, versus AI?
Yeah, I mean, this year Apple has the majority of N2 that's going to get fabricated. There's a little bit from AMD; they are trying to make some AI chips and CPU chips early. There's a little bit. But for the most part, it's Apple. And as we go forward to the year after that, Apple still gets closer to like half of it as other people start ramping. But then it falls drastically, right? Just like for N3, they were half.
We'll see. And when I say N2, that includes A16, which is a variant of N2; over time, those nodes will be the majority. And what's also interesting is traditionally Apple's been the first to a process node. Two nanometer is actually the first time they're not. Well, besides Huawei, right? Huawei back in 2020 and before was first alongside Apple, but they were both making smartphones.
Now with two nanometer, you've got AMD trying to make a CPU and a GPU chiplet that they use advanced packaging to package together, in the same time frame as Apple.
And this is a big risk for AMD that potentially causes delays, because it's a brand new process technology. It's hard. But at the end of the day, this is a bet that they want to make
to scale faster than Nvidia and try and beat them. As we move forward, actually, when we
move to the A16 node, the first customer there is not even Apple. It's AI. And as we move forward,
that will become more and more prevalent: not only will Apple not be the first to a node, they will also not be the majority of the volume on the new node, and then they'll just be like any old customer. And because the scale of TSMC's capex keeps ballooning, but Apple's business is kind of not growing at the same pace, they become a less and less relevant customer.
And they also will just cut their orders
because things in the supply chain are kicking them out,
whether it be packaging or materials or DRAM or NAND,
these things are increasing in costs.
They can't pass on all the costs to customers, likely, because the consumer is not that strong. And you end up with this conundrum where Apple is just not TSMC's best bud like they have been historically.
Do you think if Huawei had access to three nanometer, they would have a better accelerator than Rubin?
Potentially, yeah.
I think Huawei, they were the first for the seven nanometer AI chip as well.
They were the first with a five nanometer mobile chip, but they were the first with a seven nanometer AI chip.
The Huawei Ascend was like two months before the TPU and like four months before Nvidia's, I want to say, was it V100 or A100?
A100, I think.
And so, you know, I mean, that's just moving to a process node. That doesn't imply software. It doesn't imply hardware design, all these other things.
But Huawei is arguably the only company in the world that has all the legs, right?
Huawei has cracked software engineers.
Huawei has cracked networking technologies.
That's in fact their biggest business historically, right?
And they have cracked AI talent.
But furthermore, beyond Nvidia, they actually have better AI researchers.
And furthermore beyond Nvidia, they have their own fabs.
And furthermore beyond Nvidia, they have their own end market of like selling tokens
and things like that.
And Huawei tends to be able to get the top, top, top talent. Nvidia as well, but not in as much concentration.
And Huawei has a bigger pool in China.
It's very arguable that Huawei, if they had TSMC, would be better than Nvidia.
And there are areas where China has advantages that Nvidia can't access as easily, right, around not just scale, but also some things around certain optical technologies China's actually really good at. So I think it's very reasonable that if Huawei had not been banned from using TSMC in 2019,
Huawei would have already eclipsed Apple as the biggest TSMC customer. Huawei has huge share in networking and compute and CPUs and all these things; they would have kept gaining share, and they'd likely be TSMC's biggest customer.
Wow, that's crazy.
I've got kind of a random final question for you. So the other part of the Elon interview was robots. And so if humanoids take off faster than people expect, if by 2030 there's millions of humanoids running around, which each need local compute, any thoughts on what that implies? What would be required for that?
You know, there's a lot of difficulties with the VLMs and all these things, the VLAs, that people are deploying on robots.
But to some extent you don't need to have all the intelligence in the robot.
And it would be much more efficient to not do that, right?
Because in the server, in cloud, you can batch process and all these things.
So what you may want to do is, hey, a lot of the planning and longer horizon tasks are determined by,
by a much more capable model in the cloud
that runs at very high batch sizes.
And then it pushes those directions to the robots
who then interpolate between each subsequent action
or is given like, hey, pick up that cup
and then the model on the robot can pick up the cup.
And as it's picking up, you know, things like weight and force may have to be determined by the model on the robot, but not everything needs to be, right?
Or, like, hey, that's a headphone. Actually, the model in the cloud knows these headphones are, you know, Sony XM6s, which is not a Dwarkesh ad spot, but, you know.
I'm like, why is this guy plugging this thing so hard? It's on the table. It's on his neck when we're interviewing Satya together.
Like, is he getting paid by Sony?
Unfortunately not.
Unfortunately not.
But anyways, like, you know, it might say, hey, the headband is soft.
And this is the weight of it and all these things.
And then the model on the robot can be less intelligent, take these inputs, and do the actions. And it may get told by the model in the cloud every second, or ten times a second, maybe; you know, it depends on the hertz of the action. But a lot of that can be offloaded to the
cloud, because otherwise: one, if you do all of the processing on the device, I believe it would be more
expensive because you can't batch. Two, you couldn't have as much intelligence as you do in the
cloud because the models will just be bigger in the cloud. And three, we're in a semiconductor
shortage world, and any robot you deploy needs leading edge chips because the power is really bad for
robots, right? You need it to be low power and efficient. And all of a sudden, you're taking power
and chips that would have been for AI data centers and you're putting them in robots.
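A minimal sketch of the cloud-planner/on-robot-policy split being described; every class and method name here is hypothetical, purely for illustration.

```python
import time

class CloudPlanner:
    """Stand-in for a large batched model in a data center (hypothetical)."""
    def plan(self, observation: str) -> str:
        # Slow, smart, amortized across huge batch sizes.
        return f"pick up the cup near {observation}"

class OnRobotPolicy:
    """Stand-in for a small local model doing high-rate control (hypothetical)."""
    def act(self, subgoal: str, sensors: dict) -> str:
        # Handles weight/force locally, interpolating between cloud subgoals.
        return f"executing '{subgoal}' with grip force {sensors['force']:.1f} N"

planner, policy = CloudPlanner(), OnRobotPolicy()
subgoal = planner.plan("the table")   # cloud plans at low rate, e.g. ~1 Hz
for step in range(3):                 # robot acts at high rate, e.g. ~100 Hz
    print(policy.act(subgoal, {"force": 2.0 + 0.1 * step}))
    time.sleep(0.01)
```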
Yeah. So now that 200 gigawatts gets lower if you're deploying millions of humanoids. I think this is very interesting, because something people might not appreciate about the future is how centralized, in a physical sense, intelligence will be. Right now with humans, there's eight billion humans and their compute is in their heads, on their person. And in a future, even with robots that are out physically in the world...
I mean, obviously knowledge work will be done in a centralized way from data centers with
huge, like hundreds of thousands of instances or maybe millions of instances.
But even for robotics, the future you're suggesting is one where there's like more centralized
thinking and centralized computation that's driving, you know, millions of robots out in the world.
And so I think that's just like, yeah, there's an interesting fact about the future that I think
people might not appreciate.
I think Elon recognizes this, which is why he's going to different places for his chips, right? He signed this massive deal with Samsung to make his robot chips in Texas because, you know, I personally think he thinks that Taiwan risk is huge. And because of that and the centralization of resources in Taiwan, he's having his robot chips made in Texas, in a separate supply chain that is not as constrained. No one's really making AI chips on Samsung besides Nvidia's new LPU that they're launching.
They're launching it next week, but we're recording it the week before.
It's coming out this week.
This episode's coming out Friday.
Oh, this episode's coming out before.
Sick.
So they're launching this new AI chip next week, which is built on Samsung, but that's sort of a recent development from Nvidia. And that's the only other AI demand there, whereas on TSMC, everything is competing. So he gets both geopolitical diversification and supply chain diversity for his robots. And he's not competing as much with the effectively infinite willingness to pay of the data center geniuses.
Okay.
Final question.
On Taiwan, if we believe that tools are the ultimate bottleneck, how much of Taiwan's place in the AI semiconductor supply chain could we de-risk simply by having a plan to airlift every single TSMC process engineer out if things come to that, if they get blockaded or something? Or do you actually still need to ship out the EUV tools, which would be multiple plane loads per single tool and would not be practical?
If you ship out all the process engineers, and assuming it's hot enough that you destroy the fabs, now no one has the fabs that were in Taiwan, which is a big risk, right? You know, these tools actually use a lot of semiconductors, which are manufactured in Taiwan. So it's like a snake-eating-its-own-tail sort of meme, because you can't make the tools without the chips from Taiwan, which you can't make without the tools in Taiwan. There's obviously some diversification there, and they don't use super advanced chips in lithography tools, but at the end of the day, there is some of this tail-eating.
Just shipping out all the engineers and blowing up the fabs
means China has a stronger semiconductor supply chain
than the rest of the world, right?
In terms of verticalization, now that you've removed Taiwan.
And now you've got all the know-how,
but you've got to replicate it in, let's say, Arizona,
or wherever for TSM.
And it's going to take a long time
to build all the capacity
that TSM has had built over the years.
And so you've drastically slowed U.S. and global GDP; not just growth, you've shrunk GDP massively.
And you've got a lot bigger problems, and your incremental ability to add compute goes to almost zero, right? Instead of hundreds of gigawatts a year by the end of the decade, let's say by the end of the decade something happens to Taiwan, now you're at maybe 10 or 20 gigawatts across Intel and Samsung. It's like nothing.
Right.
But now all of a sudden you've really caused some crazy dynamics in AI.
Of course, you have all the existing capacity, but that existing capacity pales in comparison
to the capacity that's being expanded.
Yeah.
Okay, Dylan, that was excellent.
Thank you so much for coming on the podcast.
Thank you for having me and see you tonight.
Yes.
