The a16z Show - Dylan Patel on the AI Chip Race - NVIDIA, Intel & the US Government vs. China

Starting point is 00:00:00 How you buy GPUs is like buying cocaine. You call up a couple people, you text a couple of people, you ask, you know, how much you got? What's the price? If your two arch-nemeses suddenly team up, it's the worst possible news you can have. I did not see this coming. I think it's an amazing development. Like a Warren Buffett coming into a stock. Jensen is like the Buffett effect for the semiconductor world.

Starting point is 00:00:20 It's kind of poetic that everything's gone full circle and Intel's sort of crawling to Nvidia. Today, we're talking about one of the biggest surprises in semiconductors in years. NVIDIA just put $5 billion into Intel. Two long-term rivals now teaming up on custom data centers and PC products, a deal nobody saw coming. For NVIDIA, it's the Buffett effect. For Intel, it's a lifeline.

Starting point is 00:00:46 And for AMD, Arm, and the global chip race, the fallout could be massive. To break it all down, I'm joined by Dylan Patel, chief analyst at SemiAnalysis, Sarah Wang, general partner at a16z, and Guido Appenzeller, partner at a16z and former CTO of Intel's

Starting point is 00:01:02 data center and AI business unit. Let's get into it. Dylan, welcome back to the podcast. Thanks for having me, yeah. It just so happens that there's some big news just as we're having you, Nvidia announcing a $5 billion investment in Intel and them teaming up to jointly develop

Starting point is 00:01:20 custom data centers and PC products. What do you think about the collaboration? I think it's hilarious that, like, Nvidia could invest, it gets announced, and their investment's already up 30%. $5 billion investment, $2 billion profit already, right?

Starting point is 00:01:35 I think it's fun because they need their customers to really have big buy-in. So when their potential customers buy in and commit to certain types of products, it makes a lot of sense, right? And it's kind of funny in a way because in the past, there was this whole thing around

Starting point is 00:01:55 how Intel was sued for being anti-competitive with their chipsets, and Nvidia actually got, like, a settlement from Intel, right? Way back when, like, the graphics were separate from the CPU and the graphics were really put on the chipset, which had, like, all this other I/O, like USB and all this stuff. So it's kind of a funny, like, turn of events that now Intel is going to make

Starting point is 00:02:20 like a chiplet and package it alongside a chiplet from Nvidia, and then that's like a PC product. Right. So, you know, it's kind of poetic that everything's gone full circle and Intel's sort of crawling to Nvidia, but actually it might just be the best, like, device, right? I don't want an Arm laptop because it can't do a lot of things. And so an x86 laptop with Nvidia graphics fully integrated would be probably the best product in the market. So you're optimistic? How do you think this will go? I mean, sure. I mean, I

Starting point is 00:02:53 hope. I'm a perpetual optimist on Intel because I have to be. I was thinking that the structure of the deal that at least, like, a lot of the government folks and Intel were sort of trying to go for was people get, you know, big customers and the biggest suppliers directly give capital to Intel. But this is sort of the other way around, where they're buying some of the stock, having some ownership, but they're not really, like, diluting the other shareholders. And then the other shareholders will get diluted, slash everyone will get diluted, when Intel finally does raise the capital from the capital markets. But because they've announced these deals, and they're pretty small, right? 5 billion Nvidia, 2 billion SoftBank. U.S. government was 10.

Starting point is 00:03:37 You know, these are still relatively small. Pretty small, yeah. Yeah, on the nature of things, right? I mean, like, you know, last time I think I said Intel needs like $50 billion, right? Now when they go to the capital markets, it's better. And hopefully they get another, you know, a couple of these announcements.

Starting point is 00:03:52 Maybe, you know, there's all sorts of speculation that Trump is involved in, you know, sort of getting these companies to invest. NVIDIA, and now the government as well, of course, and now is Apple going to come invest, right? And also do something with Intel or who else will come in,

Starting point is 00:04:12 and that'll really boost investor confidence and they can dilute slash go get debt. Like a Warren Buffett coming into a stock. Jensen is like the Buffett effect for the semiconductor world. Guido, you were the CTO of the Intel Data Center and AI BU, so what are your thoughts?

Starting point is 00:04:29 I think it's really good for customers and consumers in the short term. Having both Intel and Nvidia, like, specifically in the laptop market, right? Having them collaborate is amazing. I wonder what's going to happen with any of the internal graphics or AI products at Intel. They might just push a reset and give up on that for now. They currently don't have anything competitive, right? There was the Gaudi effort that's more or less done. There was the internal graphics chips, which never competed really at the high end.

Starting point is 00:04:56 So from that perspective, it makes a lot of sense, right? It's for both sides. Look, I think, for Intel, they needed a breath of fresh air, right? They were sort of desperate. So I think it's a very good thing. I think AMD is fucked. If your two arch-nemeses suddenly team up, it's the worst possible news you can have, right? They were already struggling, right? Their cards are good. Their software stack is not, right? They were getting very limited traction, right? They now have a bigger problem outside. I think Arm is a little bit screwed as well, right? Because their biggest selling point was sort of like, look, we can partner

Starting point is 00:05:31 with everybody that doesn't want to partner with Intel. And that's what they're, in the sense, their number one, you know, like, Nvidia is probably the most dangerous of the future CPU competitors, right? And so they now suddenly have access to Intel technologies and might get in that direction. It remixes the cards, right? I did not see this coming. I think it's an amazing development. Yeah, it'll be very interesting to see this play out. To Eric's point, packed news week. The other thing that we wanted to pick your brain on since we have you here, Dylan, is the other news dropping on Huawei unveiling their kind of AI roadmap. And, you know, obviously they're hyping up the capabilities. I think you guys have

Starting point is 00:06:09 been sort of ahead of the curve of trying to gauge what the 950 supercluster can actually do. But would love your thoughts on everything that's going on from the China front, right? And this is kind of coupled with DeepSeek saying their next models are going to be on domestically produced Chinese chips, the Chinese government kind of banning companies from buying the Nvidia chips produced specifically for China. So there's just sort of a lot of dominoes falling right now in the semi market in China, but would love your take overall and, I mean, drill into some detail. Yeah, I think when you sort of zoom out to even like, you know, let's walk from 2020,

Starting point is 00:06:46 because I think it's really important to recognize how cracked Huawei is, or even just historically, like they've always been really good. Sure, initially they stole like Cisco source code and firmware and all this stuff, but then they rapidly passed them up as well as every other telecom company. In 2020, they released an Ascend chip and submitted it to impartial public benchmarks. And they were the first to bring seven nanometer AI chips to market. They were the first to have that, right? Now, you can still say Nvidia was ahead, but the gap was like nothing, right? And this is when they could access the full foreign supply chain. This was when they just passed Apple to be TSMC's largest customer. They were, you know,

Starting point is 00:07:34 clearly ahead of everyone on a manufacturing supply chain sort of design standpoint in a total basis, right? Now, of course, Nvidia still had higher market share, but it was so nascent then, like it could have, they could have really taken over the market. Huawei got banned by the Trump one administration from accessing, and then it went into effect in 2020, right, the full ban. And so they were only able to make a small volume of these chips, but they had trained significant models on these chips that they made then. And then over the next couple years, right, Nvidia continued to accelerate. Huawei, because they were banned from TSMC, had to go and try and figure out how to manufacture at SMIC, the domestic TSMC. And then they were also in parallel

Starting point is 00:08:17 trying to go through shell companies to manufacture at TSMC and acquire memory from Korea and so on and so forth. So by the end of '24, this had gotten in full swing and it was caught, right? It was caught and they finally shut it down. But they were able to acquire 3 million chips, 2.9 million chips from TSMC through these other entities, right? Roughly $500 million worth of orders, which ends up being a billion dollar fine that the US government gave TSMC, if I recall correctly. At least there was a Reuters article of that.

Starting point is 00:08:54 I don't know if they actually issued it, which is important and interesting to gauge because the number of Ascends floating out there has not consumed this entire capacity yet. So now we get to 2025, right? The H20 got banned in the beginning of the year, Nvidia had to write off huge amounts of money.

Starting point is 00:09:14 Our revenue estimate for Nvidia in China for just H20 was north of 20 billion, because that's what they were booking in capacity slash had to write off. And then it got banned. They cut the supply chain, like they just said, no, we're not doing this anymore. They had their inventory. It gets re-approved, they resell the inventory, but now they're like, do we even

Starting point is 00:09:31 restart production, is Nvidia's question. And now you have China saying, hey, like, we don't need Nvidia, we have domestic alternatives. Whether it be Huawei or Cambricon, these companies have capacity, but most of this capacity is still foreign produced, right? Whether it be wafers from TSMC, memory from Korea,

Starting point is 00:09:57 Samsung and SK Hynix. So the question is sort of like, how much can they do domestically? And there's sort of two fronts there, right? There's the logic, i.e. replacing TSMC, and there's the memory, i.e. replacing Hynix, Samsung, Micron. And on the logic side, they are behind, but they're really ramping there. And I think they can sort of get to the production capacity estimates needed. And the U.S. is still allowing them to import all the equipment necessary, pretty much.

Starting point is 00:10:26 The bans are really for beyond the current generation of technology. Beyond 7 nanometer, the bans are really for 5 nanometer and below. Even though the government says they're for 14 nanometer, the actual equipment that's banned is only for below 7 nanometer. And so they'll be able to make a lot of 7 nanometer AI chips and maybe even get to 5 with using existing equipment for 5 nanometer, rather than like taking the new techniques.

Starting point is 00:10:53 And so like there's the logic side and then there's the memory side. And the aspect of Huawei's announcement that was surprising was that they're doing custom memory, right? Yeah. That's the part that is sort of like, hey, this is really exciting. They announced two different types of chips for next year. one that's focused on recommendation systems and pre-fill and then one that's focused on decode.

Starting point is 00:11:16 That's the trend these days. Yeah. So Nvidia, the same thing. They just announced a pre-fill-specific chip recently. There's numerous AI hardware startups that are really focusing on pre-fill versus decode. And so the sort of split of inference up to two workloads, you know, Huawei's doing the same thing for their next year chip. And what's interesting is the decode one has, you know, custom HBM.

Starting point is 00:11:37 What does that mean? What is the manufacturing supply chain? Because that's the one that's tricky, right? How much can they manufacture of that custom HBM? And Nvidia and others are also adopting custom HBM only starting next year, right? So it's not like, you know, yes, the manufacturing capacity is not there. Maybe it is going to consume a bit more power. It's going to be slightly lower bandwidth.

Starting point is 00:11:59 But the fact that they're able to do, you know, some of the same things that Nvidia plans to do, AMD plans to do in their memory is, you know, evidence that they're catching up. but then the main question that remains is production capacity. So as far as like, hey, Nvidia's banned in China, right? Like they're saying don't buy Nvidia chips. I think for a period of time, that's fine for China, right? From a perspective of, hey, I'm China. That's fine because you have all this capacity that you, you know, shipped in in 2024.

Starting point is 00:12:29 They haven't turned into AI chips. Now you're turning them to AI chips. You're running all that stockpile down. What about the transition from running that stockpile down to ramping your new stuff, right? And that transition is the one that's really tricky. China's either shooting itself in the foot by not purchasing Nvidia chips during that time period or China's able to ramp. I think they'll be able to ramp. I think it'll take a little bit longer. And there will be like a sort of a gap in between where China probably backtracks and says it's fine. Like ByteDance is like

Starting point is 00:13:01 begging for Nvidia chips, right? Like they don't want to use, they use some Cambricon, they use some of Huawei, but they really want to use Nvidia because it's way better. They don't care about the domestic supply chain. They want to make the best models. They want to deploy their AI as efficiently as possible. And so this is like, you know, the government can mandate them to like not do it, right? So it's not that Nvidia is not competitive. It's that the government's sort of

Starting point is 00:13:26 trying to instigate it. And then like I guess the last sort of thing is like, you know, there's always the argument of like, hey, if banning Nvidia chips to China is so good for China, why didn't China do it for itself? And they're finally doing it for themselves.

Starting point is 00:13:44 So again, like it'll be interesting to see. Smuggling is still happening, right? Re-exportation of chips from, you know, other countries to China. That is still happening at some volume, low volume, lower, medium volume, right? But then, you know, the direct

Starting point is 00:14:00 shipments of Nvidia chips that are legally allowed to China are not necessarily happening today, but may have to restart at some point because China won't have the production capacity to, you know, they would just have so many fewer AI chips being deployed domestically versus the U.S. And at some point, you kind of have to pick, like, am I all about the internal supply chain or am I all about chasing, you know, super powerful AI? Yeah. So is there an angle here about a negotiation angle as well? Because currently there's still discussions ongoing, what exactly are the boundaries, what can be

Starting point is 00:14:34 exported to China. So these are well-timed announcements if you want to make a point that the US should allow more exports. Do you think that's a factor or not? Yes. So, you know, in the report we did a few weeks ago about the production capacity of Huawei and the supply chain, there was a bit in there that we wrote about how, you know, honestly, like, if you were China and you do want Nvidia chips, actually, how do you play this, right? And it's by hyping up your domestic supply chain. And it's by, it's like, yes, we can do everything.

Starting point is 00:15:08 It's Huawei announced the most crazy shit possible. Announce the seven years of fucking, or three years of roadmaps. So you said the radio report basically. I think they knew. They were already bid. And then like, say, we're banning Nvidia, right? And then it's like, then the government official is going to think, alongside sort of lobbying from domestic players, like, of course we want to ship them better AI chips.

Starting point is 00:15:30 Like, we're losing this market. We can't lose this market. And it's sort of like, it is 10,000 IQ, right? And we're here playing checkers while they're playing chess. Well, so I guess negotiating chip aside, in that report, you talked about HBM or high bandwidth memory being a bottleneck to Huawei. To your point on one of the surprising aspects of the announcement, do you think it's credible that it's no longer a bottleneck based on what they're saying? Or are they, is it just hype? I think production capacity-wise, it is still absolutely a bottleneck.

Starting point is 00:16:02 Certain types of equipment required for making HBM need to be imported. They're working on domestic solutions, but as far as we know, they have not imported enough equipment for this. Although if you look at Chinese

Starting point is 00:16:16 import data for different types of equipment right there's there's sort of like fabs spend you know roughly it depends on the process technology but fabs spend roughly different amounts of money on lithography etch deposition metrology right like these different steps

Starting point is 00:16:29 And historically lithography has hovered around, you know, 17, 18%. With EUV, it grew to 25%, right? But China, because they sort of like wanted to stockpile lithography and they were worried about the coming ban, they were importing lithography at a much higher rate than that, right? Like 30, 40% of their equipment imports were lithography. And they were just stockpiling lithography equipment.

Starting point is 00:16:55 This is sort of like reversed now in that like, hey, if I want to, and so if you look at the monthly import-export data, both into provinces and in China, but also out of countries, you can see that etch specifically is skyrocketing. And the main thing about stacking HBM is that you have to, you know, when you have each wafer, you have to etch, create this thing called a through-silicon via so it can connect from the top to bottom, and then you stack them on top of each other, right? 12 high, 16 high for HBM. That's how you make super high bandwidth memory.

Starting point is 00:17:25 And their imports for etch are like skyrocketing now. So it's like, they don't have the production capacity yet. How fast they can ramp it is a function of, A, how much equipment can they get, and B, like the yields, right? Improving yields is really hard. On manufacturing, Intel and Samsung are really good and TSMC is just amazing, not that those companies suck, like, I think is a better way to put it. And so, you know, it's those two things. I think yield, they haven't even started production of HBM3, right? They've only done some sampling of HBM2. HBM3 came out like a few years ago. So there's still quite a bit of ways on going up the learning curve.

Starting point is 00:18:04 Obviously, I expect them to catch up faster than it took the technology to be developed because it exists in the world. We know how to do it. It's just a matter of actually doing it versus inventing it. And then the other one is sort of the production capacity. A couple months of import-export data is not enough to set up for years' worth of supply chain buildup, which is what we have today in Korea for the Korean companies. Now Hynix is also investing in the U.S. in Illinois, and then Micron's primarily in Japan, the American memory company's primarily in Japan and Taiwan, but they're also expanding in Singapore and the U.S. now. There's so much capital that's been invested, it would take some time for China to build up that production capacity to actually match the West. And when I say the West, I mean East Asia in production, non-China East Asia in production capacity. So it'll take some time to get there. And I don't think, I think it's like, hey, we can

Starting point is 00:18:59 design this, it's always a question of can we manufacture. And then the thing like that Jensen would say is like, you're betting on China not being able to manufacture. Like, you know, it's a matter of when, not if. And that's the whole calculus that like I think the US government has to be aware of when they're like, hey, what level of AI chips do we sell? Do we sell everything? Probably not because AI is far more powerful. And the end market of AI is going to be way larger than the end market of semiconductors and equipment. Do we sell, you know, what level do we sell at? Well, how much can China make at each specific, you know,

Starting point is 00:19:33 sort of performance tier and then, you know, analyze that and what's the volume and then figure out, like, what is okay, which is like maybe a little bit above or around the same level. Yeah. So if you, to your point on, like, playing chess versus checkers, if you're Jensen, what would your next move be given the situation at hand? It's both like partially, true that he's afraid of Huawei

Starting point is 00:19:57 more than he is like an AMD. Right. He called them formidable. Yeah. Well, like, I mean, like every other like Huawei's beat Apple, right? They passed Apple up in TSMC orders. They passed Apple up in phone market share. Not in the U.S., but like in many parts

Starting point is 00:20:13 of the world before the bans came down. And then even now they're growing back again in market share without like Western supply chains. You know, they've done this to numerous other industries. I would say Huawei is like a formidable competitor, right? Like, they've beaten a lot of industries.

Starting point is 00:20:29 And so it's reasonable that he's afraid of them. It's sort of, you know, and he's not afraid of AMD. So, like, I think, like, the best thing is, like, try and so as much, like, Huawei announced is reality rather than, like, their hope target. Yeah. And so away all doubt on manufacturing capacity, which I think is not fair, right? I think manufacturing capacity is a real bottleneck for them. And then the yield learnings, real bottleneck, like temporary, maybe.

Starting point is 00:21:01 We'll see how long and we'll see how fast the rest of the, you know, the Nvidia technology advances past what Huawei's capable of, right? And how fast Huawei is able to close the gap. But I think his main sort of pitch would be Huawei is real. They're a formidable competitor. They're going to take over not just the Chinese market, but also foreign markets, right, whether it be the Middle East or Southeast Asia or South Asia or Europe or LatAm, right, everywhere besides America. And there's a, I think Noah Smith has this analogy, right? This whole idea is that you should Galapagos China, right? Make them have their own domestic

Starting point is 00:21:45 industry that is so different from the rest of the world, right? Kind of what happened with Japan in the 70s and 80s and 90s. Their PCs were so specific and hyper optimized to the Japanese market with like, you know, the weird, like, I don't know if you've seen the weird scroll wheel on these Japanese PCs. Like, you literally, like, it's like, you go like this and it scrolls, right? And it's like, and then the touchpad is a circle. And then that's around it. It's like, things like that are so weird. Totally. And the rest of the world doesn't care, but Japan market likes it, right? And his whole idea is like, let's Galapagos them, i.e. keep their technology within China. And then that's, like,

Starting point is 00:22:21 dead weight loss and they never expand outside versus that we serve the whole world. But the whole risk is that the opposite can also happen, right? Our technology is hyper optimized to running, you know, language models at this scale and RL and you keep, you know, you keep like hardware-software co-design can take you down a trap path of the tree that like is a dead end. And then China, like, because they're not allowed to access this tree, they're like, oh, okay, then they end up in the like optimal spot, right? We had a local minima. They had a local, a global maxima, right? Like, that's sort of like technological Galapagosing is sort of what Noah Smith's analogy is.

Starting point is 00:22:59 I like it a lot. I don't know if it's accurate, but it's an interesting one. Yeah, I love that. Well, actually, maybe just taking a step back from current events, even though there's so much to talk about right now. Last time you appeared with us, Nvidia came up, obviously. And you talked about a couple of the potential paths forward for NVIDIA. Give us maybe the bull case, the bear case. Fair enough.

Starting point is 00:23:25 There's a lot embedded in their numbers now. But what's interesting is consensus for the banks is like for across like the hyperscalers. So Microsoft, CoreWeave, Amazon, Google, and Oracle, right? Meta, right? So it's the six hyperscalers, right? Who I would consider hyperscalers? The consensus for the banks is $360 billion of spend next year across all of them.

Starting point is 00:23:54 And my number is closer to, like, it's like $450, 500. And that's based on like, you know, all the research we do on, like, data centers and, like, tracking each individual data center in the supply chains, right? So this is just Nvidia spend? This is capex for the hyperscalers. Right? And that capex gets split up across different companies, but the vast, vast majority still goes to Nvidia, right?

Starting point is 00:24:17 And Nvidia is in a position not where they take, they can't take share, right? It's they grow with the market slash defend share. And so the question is like, how fast is the growth rate of capex for hypers and other users, right? And the reason I included Oracle and CoreWeave as hyperscalers, even though they're traditionally not called hyperscalers, is because they are OpenAI's hyperscalers. So, you know, when you look and you look at the Oracle announcement, right? Like, first of all, the Oracle announcement, I don't understand why people don't think this

Starting point is 00:24:48 is crazier. They did the most unprecedented thing in the history of stocks and public companies ever. They gave a four-year guidance. And it made Larry the richest man in the world, you know, like all these things. Anyways, you know, the question is like, how fast does revenue grow, right? Do you think Oracle and OpenAI, which signed a $300 billion plus deal with Oracle, will actually be able to pay $300 billion, right, across raising capital and revenue? And I think most, and it gets to a rate of like over $80 billion a year in just a handful of years, right?

Starting point is 00:25:25 So it's like, do you believe the market will grow that fast? It's very possible, yes. And it's very possible for like, you know, OpenAI, what is their revenue going to be exiting next year? Some people think $35 billion. Some people think $45 billion. You know, ARR, by the end of the year next year. This year, they hit 20, right? ARR. So if that growth rate is maintained, then all of that cost goes to

Starting point is 00:25:53 compute, plus all the capital they continue to raise, right? And again, their financials that they sort of like gave to investors for their last round was like, hey, we're going to burn like $15 billion next year. It's probably more likely going to be like 20. But like, you know, and you stack this on and they're not turning a cash flow, they're not going to be profitable until 2029. So you sort of have like, they're going to continue to burn 15, 20, $25 billion of cash each year plus revenue growth. That's their compute spend. And you do this for Anthropic, you do this for OpenAI, you do this for all the labs. It's very possible that the pie does get to, you know, more than 500, you know, not 360 billion

Starting point is 00:26:31 next year, 500 billion next year, and for total capex. And the pie continues to grow for hypers. Nvidia says, actually, it's going to be multiple trillions a year on AI infrastructure. And he's going to capture a huge portion of it. That's his bull case, right? That's the bull case, is AI is actually so transformative and the world just gets covered in data centers and the majority of your interactions are with AI, whether it's like, you know, business productivity and telling an agent to do some code

Starting point is 00:26:59 or you're just talking to your AI girlfriend Annie, right? Like it doesn't matter. You know, all of this is running on Nvidia for the most part. The bear case is, you know, even if it does grow a lot. Yeah, go ahead. Save the bear case for a second. I think fundamentally the value creation, I think, personally is there, right? I mean,

Starting point is 00:27:15 trillion dollars of value with AI, I can totally see this happen. So assume it's true, where will Nvidia top out? I guess how much do you believe in takeoffs, right? Yes. Yes, so like if there is like a takeoff scenario, right, where like powerful AI builds more powerful AI,

Starting point is 00:27:35 builds more powerful AI, or that creates more and more, you know, each level of intelligence, like, enables more for the economy, right? Like how many, how many, how many monkeys can you employ in your business versus how many, like, humans, right? You know, sort of the same, or how many dogs, right?

Starting point is 00:27:50 Like, you know, there's sort of like, what is the value creation of a human versus a dog? Sort of like the same with AI. So, like, I mean, in this case, the value creation could be hundreds of trillions, if not, you know, the data after that. Do you need this? I mean, if you take every white-collar worker,

Starting point is 00:28:06 make them twice as productive with AI, that's in the hundreds of trillions, isn't it? Yeah, but like, what is twice, you know, like, if you talk to people at the labs, right? Like, twice as productive. What does that even mean? It's replace them. Right? It's be 10 times better than them. Like, I mean, like, I don't know how soon that. If sort of white-collar work is essentially useless without a constant stream of LLM tokens,

Starting point is 00:28:26 right, that make them productive, right? At that point, you basically can tax every single knowledge worker in the world, right? Which is most workers in the world long term. Yeah. So, I don't know. What's your guess? Give us a number. What's the cap? Cap? I mean, like, why aren't we making a Matrioshka brain? Like, I don't know. Like, I mean, at some point, the machine says humans don't need to live and we need even more compute. One step before that, right? Are we colonizing Mars yet? TBD.

Starting point is 00:28:56 I don't know, man. I find it, like, completely, like, impossible to predict anything beyond five years, given how much stuff is changing. Like, that's how, like, I'll leave it to economists, right? Like, you know, like, honestly, like, you know, supply chain stuff is like three, four years out and that's it. And then fifth year is like sort of like YOLO, right? So like I just try and ground myself in the supply chain stuff, right? Like it's like a, you know, supply chain and then like,

Starting point is 00:29:26 what is the adoption of AI? What's the value creation? What's the usage? And you can see that in like a short horizon. Beyond that, like, I don't know, like, are we all going to be connected to computers, like BCIs and stuff? Like, I don't know, dude. Or humanoid robots, are they going to be, you know... I mean, you saw Elon's thing, right? Like, he's like,

Starting point is 00:29:44 yeah, humanoid robots are why Tesla's worth more than 10 trillion. So, okay, hey, great, what is all that being trained on? Great, Nvidia. Okay, awesome. So that's worth also 10 trillion, right? Like, I don't know. It's too out there for me. I don't like the out-there discussions. Very fair. Read some sci-fi books. So just pulling on the thread where you talked about, I mean, this is kind of a throwaway comment, but how market share can't really grow just because it's such a dominant market share. And we talked about, or you guys talked about, the moat of Nvidia last time. And obviously, this moat is tied to maintaining that very high market share that they currently have. And I love this sort of historic journey you took us through with Huawei just earlier.

Starting point is 00:30:29 Can you kind of walk through what Nvidia did throughout history to build their moat? It's super awesome because, you know, they failed multiple times in the beginning, and they bet the whole company multiple times, right? Like, Jensen's just crazy enough to bet the whole company, right? Like, whether it was, like, certain chips ordering volume before he knew it even worked, and it was, like, all the money he had left, or, like, ordering volumes for projects he had not won yet, like, I heard a rumor that, or not a rumor, but, like, a story from someone who's, like, a gray beard in the industry, and I think would know was like,

Starting point is 00:31:05 you know, no, no, no, like, Nvidia ordered the volume for the Xbox before Microsoft gave them the order. They were just like, fuck it, YOLO. Yeah, right? I don't know, like, I don't know how true this is. I'm sure there's more nuance there, like, you know,

Starting point is 00:31:22 verbal indication or whatever, but like the order was placed before he got the order, right? Like, is what he said. You know, there's cases like with the crypto bubbles, right? Like, there was a couple of them, But like, Nvidia did their damn best to convince everyone

Starting point is 00:31:38 on the supply chain that it wasn't crypto and that it was gaming, real demand, it was gaming and data center and professional visualization. And therefore, you guys should ramp your production

Starting point is 00:31:47 and they all ramped production and spent all this CAPEX on increasing production and building out new lines for them. And they pay per item and then they bought them and sold them and made shitloads of money.

Starting point is 00:31:59 And then when it all fell apart, they just had to write down a quarter's worth of inventory. Whatever. Everyone else was like, well, crap, I have all these empty production lines, right? And so it's like, you know, but, but like what did AMD do then, right? Their chips were actually better for crypto mining, right? On a, you know, amount of silicon, uh, cost versus how much you hash, but like they just didn't, they just, AMD was like, ah, we're going to not really raise production, right? Like, as a reasonable, you know, thing, right? It wasn't a, it's sort of like strike while the iron's hot. And so like, you know, the same has happened with Nvidia, right? They've, uh, in recent times, like, sort of, they've ordered capacity that no one believes, right, multiple times. They see the end demand, obviously. But in many cases, they're just like,

Starting point is 00:32:47 their number for, like, Microsoft was higher than Microsoft's internal planning, right? And then Microsoft's internal planning went up, but, like, their number for Microsoft was way higher. And it's like, oh, we just don't think Microsoft's going to need this much, even though they tell us this. It's like, who the heck? It was like, no, no, no, no, customer, you're going to buy more.

Starting point is 00:33:06 Like, and orders, right? And then when the orders come through the supply chain, it's like, I have to pay NCNR, right, non-cancelable, non-returnable, like, you know, this is. Hey. You know, this is, uh, I asked a question in Taiwan once. Uh, there was like a, it was, it was, it was Colette, who is the CFO, and Jensen, CEO. They were, they were both there. Um, and it was, it was a room full of like, mostly finance bros and they were asking stupid finance questions like three days before earnings. So obviously they just could not answer anything

Starting point is 00:33:33 because it's like, you know, SEC regulations. But then my question to them was like, look, Jensen, you're like so vibes, like driven and like very gut feel and like very visionary. And then that's, you know, CFO, like, she's amazing in her own right. But like, you know, those personalities clash, how do you work together? And he's like, I hate spreadsheets. I don't look at them. I just know, right?

Starting point is 00:33:55 Is his response. And it's like, of course, you know, the best innovators in the world have really good gut instinct. Right. Right. And so like the gut instinct to like order with, you know, with non-cancelable, when you don't know, and they've had to write down over their history multiple times, right? Many, many billions of dollars in cumulative orders, right? So cumulative, in total orders. Whether it be, you know, the H20, which is more regulatory, but like other cases they've ordered and had to cancel. Is that many billions? It's many billions. Peanuts. Well, well, it depends, right? The crypto write-down was like multiple billion when their stock was like less than $100 billion, right?

Starting point is 00:34:32 Like, it's like a... That's compared to the upside, right? I think everything Nvidia did was right. I think everything AMD did was wrong, like, you know, in that scenario. But like, it is crazy to... Especially in a cyclical industry like semiconductors where companies go bankrupt all the time,

Starting point is 00:34:50 which is why we have all this consolidation, is every down cycle, companies go bankrupt. I mean, if it's a little bit of risk return perspective, right? These bets were totally worth taking. Yes. If you look at it from, I'm a CEO, I want to have predictable quarters for Wall Street. It's a very different story. I think that's a part of the tensions from now.

Starting point is 00:35:08 Yeah, so we, I don't know if you've seen these like Lee Kuan Yew edits where they're like him like saying some like fiery speech. And then like it's like some cool music at the end and it's like showing different pictures of them. And so we made one of Jensen recently and put it on social media right on like Instagram, TikTok, XHS, Redbook, right? Twitter, of course, right? Like all the different social media. And I really liked it because he's like, he's like, you know, the goal of like playing is to win.

Starting point is 00:35:38 And the goal or sorry, and the reason you win is so you can play again. Right. And he compared it to pinball where like actually you just play all day and you keep getting more rounds. And it's like his whole thing is like, I want to win so I can play the next game. Um, and like it's only about the next generation, right? It's only about now, next generation.

Starting point is 00:35:56 It's not about 15 years from now because that's, it's a whole new playing field every time or five years from now. I think that's, you're right, it's the risk or reward is, is correct. Yeah. But there's few people take these kind of risks. It's the only semiconductor company that's worth, you know, I think even north of $10 billion,

Starting point is 00:36:14 that was founded as late as it was. Like MediaTek was in the early 90s and then, Nvidia and everyone else is like from the 70s mostly. Yeah. From big ones. Yeah. Yeah. Yeah, I think you raised this great point on

Starting point is 00:36:28 these bet-the-farm bets, and he's actually been wrong a couple times, to your point. Mobile, right? Like, what the hell happened with mobile? Exactly. And he still takes them. And I think Mark actually had this great conversation with Eric where he talked about being founder-run, where you have this memory of the risks you took to get to where you are today, right? And so in a lot of cases, if you're a CEO brought on later on, you're sort of like, okay, continue to steer the ship as is. Um, but in this case, he remembers all the times they almost went belly up, and he's like, I've got to bet. Keep making bets like that. How do you think he's changed over? I mean, he's been one of the longest running CEOs over 30.

Starting point is 00:37:06 He's kind of right up there with Larry Ellison now. How do you think he's changed over the last 30 years or so? I mean, obviously, like, I'm 29. I don't forget that what he was like. I've watched a lot of old interviews. I won't say he wasn't... The CEO longer than you've been alive. Yeah, exactly. Exactly. Like, Nvidia was founded before I was born. I'm '96, right? Like, you know?

Starting point is 00:37:32 Yeah, maybe anything over the last couple of years, right? I think even like watching old interviews, right? Like, I watched a lot of old interviews, a lot of old, like, presentations he's given. One thing is that he's just like sauced up and dripped up, like, wait, like, the charisma he's gotten has only gotten stronger.

Starting point is 00:37:48 Right? Yep. Which is an interesting point. I don't know if it's quite relevant. I don't agree with that, yeah. But like, the man, like, has learned to be a rock star more, even though he was always charismatic. It was like, he's a complete rock star now. And he was a rock star, you know, a decade ago, too. It's just people maybe didn't

Starting point is 00:38:09 recognize it. I think, I think the first live presentation that I watched, that was streamed, was like, um, it was, what's the, what's the, what's the, it's CES like 2014 or 2015 or whatever. Um, he's, he's, he's, it's, it's, it's Consumer Electronics Show. I'm, I'm, I'm like, mom. I'm, I'm, I'm, moderating like gaming, gaming hardware subreddits, right? Like at the time, I'm a teenager. And like the dude is like talking only about AI.

Starting point is 00:38:37 He's telling, like, all these gamers about AlexNet and self-driving cars. Right? It's like, know your audience, first of all, but also like, like, it has nothing to do with consumer electronics and gaming. You know, at the time, I was also like,

Starting point is 00:38:53 I was half like, holy crap, this is amazing, but also half like, I want you to announce new gaming GPUs, right? Like, you know, but I know like on the forums, on the forums, quickly everyone was like, you know, screw this, you know. I want to hear about the gaming

Starting point is 00:39:08 GPUs, Nvidia's price gouging. Like, you know, of course, Nvidia's always had the, like, we price to value and, like, plus a little bit, right? Because we're just smart enough to know. You know, I'm guessing Jensen just has the gut feel of how to price things, right? He'll change the price like, at least on

Starting point is 00:39:24 gaming launches, he'll change the price up until like right before the presentation. Wow. So like it really is like a gut feel thing probably. And anyway, so, so he, he had that charisma to know what was right. But I think people, a lot of people were like, oh, no, whatever, Jensen's wrong. He doesn't know what he's talking about. But now, like, he talks, people are like, oh, very, very, you know, so it might just be that he's been right enough.

Starting point is 00:39:49 Yeah, there's a post on X recently that said he had moved up into God mode with a select group of CEOs, but that this was, like, it's exactly. Who's the other gods? It was Zuck. Pretty other gods. Elon. Elon, Zuck, and Jensen. Nice, nice.

Starting point is 00:40:07 Okay. Good crew to be in. So we pray to Silicon Valley. The cult now? Exactly. Just on one last thing on people. You mentioned Colette, his CFO, and there's sort of a famously loyal crew at NVIDIA, even though all of the OGs could retire at this point.

Starting point is 00:40:25 Is there anyone akin to a Gwynne Shotwell at SpaceX or previously a Tim Cook to Steve Jobs at Apple that is at Nvidia today? I mean, he had two co-founders, right? Like, that's, you know, let's not overlook that. One of them's like, you know, not involved and hasn't been for a long time. But the other one was involved up until just a, you know, a few years ago, right? So it's not just Jensen running the show, although he was running the show. there's quite a few people on the hardware side. I've always, there's someone at Nvidia that's like mythical to me.

Starting point is 00:41:03 Like when you talk to the engineering teams, he leads a lot of engineering teams. He's a private person, so I don't want to say his name actually. Fair enough. But, you know, he's like, he's like effectively, like, chief engineering officer is, like, his role. And people within his org will know who he is. And I think there are people like that, but, you know, he's intensely loyal, and there's a number of these types of people.

Starting point is 00:41:31 There's another fella who's like, you know, like there's all these like innovative ideas at Nvidia, and he's the guy who literally is like, we need to get the silicon out now, we're cutting features. And that's like what he's famously known for, and all the technologists in Nvidia hate him. This is like a second guy. This is a second guy.

Starting point is 00:41:48 Also intensely loyal to Nvidia, has been around for a long time, but it's like, you know, it's sort of like, when you have such a visionary company and forward, you know, one problem is that you get lost in the sauce, right? You know, oh, I want to make this. It's got to be perfect, amazing. And it's like, you know, you got to have that sort of like, and these people are like, you know, obviously they're close to Jensen for a reason because Jensen also believes like these things, right, have the visionary future luck egg, but also like screw it, cut it, we'll put it in the next one, ship, right? Like, you know, ship now, ship faster. Like in a space like silicon, which is like really hard to do so.

Starting point is 00:42:26 And sort of like the thing about Nvidia that's always been, you know, super impressive. And it's from the beginning days where he's talked about this before is their first chip, their first successful chip, they were going to run out of money. And he had to go get money from other people to even finish the development. And even then he just had enough money because he'd already had a failed chip before this. The chip came back and it had to work, otherwise it would not, you know. And so they were like, because they could only pay for, it's called a mask set, right? Basically you put these, like, I'll call them stencils, into the lithography tool and then it like says where the patterns are and

Starting point is 00:43:02 you, you know, you put the stencil in, you deposit stuff, you etch stuff, you deposit materials on the wafer, etch away, and you put the stencil in and, like, you tell it where to put stuff, right? And then the deposition and etch keeps happening in those spots and you stack dozens of layers on top of each other, and then you make a chip. These stencils are custom to each chip, right? And they cost today on the order of tens and tens of millions of dollars. But even back then, it was still a lot of money.

Starting point is 00:43:28 It wasn't that much then, of course. They could only pay for one set. But the typical thing with semiconductor manufacturing is, you know, as good as you can simulate it, as good as you can do all the verification, you'll send a design in and you have to change it. there's going to be something. It's so hard to simulate everything perfectly.

Starting point is 00:43:50 And the thing about Nvidia is they tend to just get it right the first time. Even great executing companies like AMD or Broadcom or whoever, they often have to ship, they're denoted in like A and then a number or B and then a number, so it's like two different parts of the masks.

Starting point is 00:44:10 So like, Nvidia always ships A-0. Almost always. They sometimes ship A-1. And a lot of times, even if they'll start production of the, you know, A is basically the transistor layer, then the numbers like the wiring that connects all the transistors together. So, Nvidia will start production of the A and ramp it really high and then just hold it right before you transition to the metal, just in case they do need to change the metal layers. And so, like, the moment they're ready and they've confirmed that it works, they can just, you know, blast through a lot of production.

Starting point is 00:44:38 Whereas everyone else is like, oh, let's get the chip back. Oh, okay, A0 doesn't work. We've got to make this tweak, make this tweak, and get the chip back. It's called a stepping, right? At Intel, we were very jealous of Nvidia at that time, right? They consistently delivered on the first one; we did not.

Starting point is 00:44:52 The data center CPU group, there was one product where, you know, I said A1, A0, A1, or you go to B if it's, you have to change the transistor layer as well. So it's like B. Nvidia, sorry, Intel got to like E2 once. E2.

Starting point is 00:45:09 Like, that's like a 15th revision. This is, this is... It's like the peak of AMD's, like when they went skyrocketing on market share versus Intel was when Intel was at E2, right? Like 15th stepping. Because there's quarters of delay, right? I mean, it's catastrophic for a go to market. Yeah, each time is a quarter of delay or something, right? Yeah. So it's, it's absurd. So I think that's the other thing about Nvidia is like, you know, screw it, let's ship it. Let's get the volume. ASAP. Let's, let's, let's, you know, let's do these things that, you know,
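
The stepping labels described here (a letter for the transistor layers, a number for the metal layers) can be sketched as a small Python helper. It assumes, purely for illustration, that every letter goes through exactly three metal revisions (0, 1, 2) before the next letter; real products skip around, so this is only one way a label like E2 lines up with a count of 15. The function name and the three-per-letter figure are assumptions, not anything stated in the conversation.

```python
def stepping_ordinal(label: str, metal_revisions_per_letter: int = 3) -> int:
    """Return the 1-based position of a stepping label such as 'A0' or 'E2'.

    Assumption (hypothetical): each base-layer letter is followed by a fixed
    number of metal-layer revisions (default 3: 0, 1, 2) before the next letter.
    """
    letter, number = label[0].upper(), int(label[1:])
    if not "A" <= letter <= "Z" or not 0 <= number < metal_revisions_per_letter:
        raise ValueError(f"unsupported stepping label: {label!r}")
    # Full letters already used, plus the position inside the current letter.
    return (ord(letter) - ord("A")) * metal_revisions_per_letter + number + 1


print(stepping_ordinal("A0"))  # the first tape-out
print(stepping_ordinal("E2"))  # 15 under the three-per-letter assumption
```

At roughly a quarter of delay per stepping, as mentioned in the discussion, the gap between shipping A0 and shipping E2 would be measured in years.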

Starting point is 00:45:42 And so anyways, they, like, you know, have some of the best simulation, verification, et cetera, that lets them sort of go from design, you know, from idea to shipment as fast as possible, you know, cutting out any unnecessary features that could delay it, making sure they don't have to do revisions. So they can get, you know, they can respond to the market ASAP. There's a story about how Volta, which was the first Nvidia chip with tensor cores, you know, they saw all the AI stuff on the prior generation P-100.

Starting point is 00:46:12 Pascal, and they decided we should go all in on AI, and they added the tensor cores to Volta only a handful of months before they sent it to the fab. Like they said, screw it, you know, let's change it. And it's crazy. And it's like, if they hadn't done that, who would have, maybe someone else would have taken the AI chip market, right? So there's all these times where they just, and it's, those are major changes, but there's often like minor things that you have to tweak, right?

Starting point is 00:46:40 number formats or like some architectural detail. Nvidia is just so fast. The other crazy thing is they have a software division that can keep up with that, right? I mean, if you come out with the chip, right, and basically no stepping required, it's immediately in the market, then being ready with drivers

Starting point is 00:46:56 and all the infrastructure on top of all that's just super impressive. Yeah, I love that point because you think of Nvidia benefiting from tailwind after tailwind, but I think both of you're saying, you have to move fast enough and execute well enough and take advantage of those tailwinds. And if you think about, and by the way, I loved your CES story. I'm just envisioning him more than 10 years ago talking about self-driving cars.

Starting point is 00:47:18 But, you know, if you think about nailing the video game tailwind, VR, Bitcoin mining, obviously AI now. You know, one thing that, or one of the things that Jensen talks about today is robotics, AI factories. Maybe my last question on Nvidia, what do you think about the next 10 to 15 years? I know calling beyond five is hard, but like what does Nvidia's business look like? It's really a question of, and this is like

Starting point is 00:47:49 I think every time I've talked to, you know, some executives at Nvidia, I've asked this question, because I really want to know, and they won't answer it, obviously. But it's like, what are you going to do with your balance sheet? Like, you are the most high cash flow company, and like,

Starting point is 00:48:06 you have so much cash flow. Now the hyperscalers are all taking their cash flow, like, way down, right? Because they're spending on GPUs. What are you going to do with all this cash flow, right? Like, you know, even before this whole takeoff, he wasn't allowed to buy Arm, right?

Starting point is 00:48:25 So, so what can he do with all this capital and all this cash, right? Even this $5 billion investment in Intel, there's regulatory scrutiny there, right? Like, it's in the announcement, like, yeah, this is subject to review, right? Like, you know, I imagine that it'll get passed, but, like, he can't buy anything big.

Starting point is 00:48:45 He's going to have hundreds of billions of dollars of cash on his balance sheet. What do you do? Is it start to build AI infrastructure and data centers? Maybe. But, like, why would you do that if you can just get other people to do it? And just take the cash.

Starting point is 00:49:01 Well, he's investing those, right? Investing peanuts. Right? You know, like, he gave, recently, like, CoreWeave a backstop, because today it's really hard to find a large number of GPUs for burst capacity, right? Like, hey, I want to train a model for three months, right? I have my base capacity where I do my experiments, but I want to train a big model three months.

Starting point is 00:49:22 We know from our portfolio. Yeah, yeah. So, like, Nvidia sees this issue. They think it's a real problem with startups. It's why the labs have such an advantage. But what if I could, you know, right now, like, you know, most companies in the valley spend, what, 75% of their round on GPUs, right? At least, yeah. Yeah. What if you could do 75% in three months

Starting point is 00:49:43 on one model run, right? You know? Yeah. And really scale and have some sort of like competitive product and then you have the model. Then you raise more capital, right? Or start deploying, right?

Starting point is 00:49:52 What do you do with it? Is it start buying a crap load of humanoid robots and deploying them? But like they don't really make good software. They don't make really that amazing software for them. In terms of the models, right? They make, you know, the layer below is great.

Starting point is 00:50:06 where they deploy their capital is like the question. He has been investing up and down the supply chain a little bit though, right? Investing in the neoclouds, investing in some of the model training companies. Yeah, but again, small fries.

Starting point is 00:50:19 Like, he could have just done the entire Anthropic round if he wanted to. Of course he didn't, right? And then, like, really got them to use GPUs. Or like, he could have done the entire, you know, OpenAI round. Or he could do an xAI round. Do you think these are things he should be doing? Or what's... I mean, like... Yeah, good question. I don't know, right?

Starting point is 00:50:32 Or what's... I mean, like... Yeah, good question. I don't know, right? I think, like, we'll quote you up for the next round that we're raised. But anyways. He could make venture a dead industry. No, she's kidding.

Starting point is 00:50:46 Take all of the best rounds. But it's a lot of business, yeah. You can do the seeds and then have Jensen mark you up. That's why the word. Well, I don't think. I like it. I think picking winners is obviously really tough for him because he has customers all across this ecosystem. If he starts picking winners, then, like, his customers will be even more anxious to leave and give

Starting point is 00:51:01 even more effort to, whether it's AMD or some startup or their internal efforts, et cetera, et cetera, right? Buying TPUs, whatever it is. Like, you know, people will, he can't just like invest in these. Like, you know, he can do a little bit, right? A few hundred million in an OpenAI round is fine.

Starting point is 00:51:24 Or a few hundred million in an xAI round is fine. CoreWeave, right? Like, yeah, everyone's like throwing a fuss about it. But it's like he invested a couple hundred million plus, you know, early on, plus, you know, rented a cluster from them for internal development purposes instead of renting it from a hyperscaler, which is cheaper for Nvidia to do, right?

Starting point is 00:51:43 It's better for them to do it from them than the hyperscalers. It's like, did he really, like, is he really backstopping CoreWeave that much, right? Or, you know, any of the other customers or neoclouds? Like, there's some investment,

Starting point is 00:51:56 but it's more like, this is a good cloud, you know, we'll throw like five or 10% of the round, right? It's not like he's taking 50% plus of the round. Is he also reshaping his market? I mean, look, a couple of years ago, there were four big purchasers of these cards. You just listed six. To what extent is that...

Starting point is 00:52:14 That's him in Nevis and Leibniz. There's a long list there. Of course. For Matt, yeah. Is that a strategy? It is. I think it absolutely is. But he didn't have to put much capital down to do this.

Starting point is 00:52:27 Like, just chip one earlier than the other? I don't know. Yeah, that's... No, but it's like, if you look at the grand amount of capital, spent investing in the neoclouds, it's, it's a few billion dollars. But he has a lot of other levers if he wants to. Right, right.

Starting point is 00:52:41 Allocations, as you mentioned. What's nice is, you know, historically, you gave volume discounts to the hyperscalers. But because he can use the argument of antitrust, he's like, everyone gets the same price. So fair. It's very fair. It's very fair. You know?

Starting point is 00:52:58 So what should he do with the cash? Or what should guide his... I mean, I think like, you know, like there's the argument he should invest in data centers and only the data center layer, not the, not what goes in the data center, so that more people build data centers. And then if the market demand continues to grow up, data centers and power are not the issue, right? Invest in data centers and power. I've said that to them. They should invest in data centers and power, not in the cloud layer, because the cloud layer is quite commoditized, but quite, it's commoditize your complement, right? It's the whole

Starting point is 00:53:28 phrase. And I won't say being a cloud is commoditized, but it's certainly like, you have a lot of competitors who are decent now. And you've educated the commercial real estate and other infrastructure investment firms into going into AI infra as well. So, like, I don't think it's the cloud layer that you invest in, right? Do you invest in data centers and energy? Yeah. Do you invest in it?

Starting point is 00:53:51 Because that's the bottleneck for your growth, really. Is A, how much people want to spend and can spend, and B, the ability to actually put them in data centers. And then like robotics and like, I think there's like areas he could invest in, but nothing requires $300 billion of capital. So what do you do with the capital? Like I really, I really don't know

Starting point is 00:54:11 and I like feel like Jensen has to have some idea, there's some visionary plan here, because that's what shapes the company, right? I mean, they could sell, they could, they could just continue to, you know, I mentioned $200 billion of free cash flow, $250 billion of free cash flow a year. What do they do with it? Like, do they just buy back stock

Starting point is 00:54:28 forever? Like, do they go Apple route? The reason why Apple hasn't done anything interesting in like, you know, nearly a decade is, you know, they've got a not visionary at the head. Tim Cook's great at supply chain. And they're just plowing the money into buybacks. They're not really, you know,

Starting point is 00:54:44 automotive, the self-driving car thing failed. We'll see what happens with AR/VR. You know, we'll see what happens with wearables, right? But like Meta and OpenAI might be even better than them. We'll see, like, and others, right? So, so, what he invests in, I have no clue, but nothing

Starting point is 00:55:00 what requires so much capital is the tough question. It actually gets a return. Because the easy thing is like my cost of equity, right? I just buy back. And this completely change the company culture. I think that's another thing, right? There are probably areas he could invest it in, but you suddenly end up with the company doing

Starting point is 00:55:16 two completely different things, which are very difficult to keep on it. But they do like 10 completely different things, right? I mean, one way to look at is we build AI infrastructure. And in the guise of we build AI infrastructure, robots, humanoid around the world are AI infrastructure or data centers and energy is AI infrastructure, right?

Starting point is 00:55:35 Like, you know, like... So the humanoids would totally work, right? If you're suddenly pouring concrete and building power plants, it has completely different cultures, completely different staff of people. It's very much harder. Okay, agree. But there's different ways to invest in the various companies

Starting point is 00:55:47 or like backstop, like, the building of a power plants, right? Like, you know, there's no one who has to build power plants because they're 30-year underwriting things. You know, there's all these different areas where could use capital to, you know, allow something to happen, right? Not necessarily owning it himself.

Starting point is 00:56:05 And look, look and Barry Maddenthal, one of the biggest problems we had was that our customer base sucked, right? I mean, we were selling to, most of the chips went into the large hyperscalers, you know, which, they're way too concentrated, and they build their own chips, and so they can push down your prices. So, honestly, spending it on diversifying the cloud,

Starting point is 00:56:25 you know, the company was in 2014. You guys should have just charged so much that your margins were 80%. What would the world have done? Nothing. The margins were pretty good, that guy. That wasn't the problem. That was the primary problem. There were 60, 65.

Starting point is 00:56:39 They were 80. Still, yeah. Oh, boy. There was a Jetson. It's the Jetson and a different program. GSD is kicking in here. Well, wait, I think Guido's comment is actually a really good segue

Starting point is 00:56:52 into something else we wanted to talk to you about, which is the hyperscalers. And one of the reasons that I love reading semi-analysis is you guys make these out-of-consensus calls that you're often right about. And one of them recently... Wow, is calling... Only often?

Starting point is 00:57:09 You have a Jensen hit rate. It's very high. Where's my billion-dollar, you know, PV-positive bet? But the one that caught my eye was Amazon's AI. resurgence. So I wanted to talk to you a little bit about that just because, you know, I think we found it pretty interesting being on the ground, helping our portfolio companies pick who their partners are. And so we have some microdata on this. But you sort of walk through why they're

Starting point is 00:57:40 behind. Yeah. So in Q1, 2020, I wrote an article called Amazon's Cloud Crisis. And it was about how all these neoclouds are going to commoditize Amazon. It was about how Amazon's entire infrastructure was really good for the last era of computing, right? What they do with their elastic fabric, ENA and EFA, right, their NICs, and the whole protocol and everything behind them, what they do for custom CPUs, etc., right? Like, it was really good for the last era of scale-out computing and not the era of sort of scale-up AI infra, and how neoclouds are going to commoditize them, and how their silicon teams were focused on, you know, cost optimization,

Starting point is 00:58:24 whereas the name of the game today is max performance per cost, right? But like that often means you just drive up performance like crazy. Even if cost doubles, you drive up performance more, it triples, because then the cost per performance falls still. That's sort of the name of the game today with Nvidia's hardware. And it ended up being like a really good call. Everyone like was calling us out like, no, you're wrong.

Starting point is 00:58:49 And this was like when Amazon was like the best stock. and Microsoft really hadn't started taking off yet, and nor had all these other, you know, Oracle and so on and so forth. And since then, Amazon has been the worst performing hyperscaler. And the call here is that, you know, they still have structural issues, right? They still use elastic fabric, although that's getting better, still behind Nvidia's networking, still behind Broadcoms

Starting point is 00:59:16 slash Arista, like type networking, NICs. They still use, you know, their internal AI chip is, okay, but the main thing is that they're now waking up and being able to actually capture business, right? So the main call here is that since that report, AWS has been decelerating revenue. Year-on-year revenue growth has been falling consistently. And our big call is that it's actually going to start re-accelerating, right?

Starting point is 00:59:44 And that's because of Anthropic. It's because of all the work we do on data centers, right? Tracking every single data center, when that goes online and what's in there, the flow through on costs. If you know how much the chips cost, the networking costs, the power cost, you know how much generally margins are for these things,

Starting point is 01:00:00 then you can sort of start estimating revenue. So when we build all that up, it's very clear to us that they trough on AWS revenue growth at this point. This is the lowest AWS revenue growth will be on a year-on-year basis

Starting point is 01:00:15 for at least the next year. And it's re-accelerating to north of 20% again because of all these massive data centers they have online with Trainium and GPUs, right, depends on which one, it depends on which customer. The experience is not as good as, you know, say, a CoreWeave or wherever, but the name of the game is capacity today.

Starting point is 01:00:38 CoreWeave can only deploy so much. They only can get so much data center capacity, and they're really fast at building. But the company with the most data center capacity in the world, still today, although they may get passed up in the next two years, is Amazon. Actually, they will get passed up, based on what we see. But incrementally, Amazon still has the most spare data center capacity that's going to ramp into AI revenue over the next year. Let me ask a question. Is that the right type

Starting point is 01:01:06 of data center capacity? Like for the high density AI buildouts today, you need massively more cooling, you need to have enough water close by, need to have enough power close by. Is it in the right place or is it the wrong type of data? So data center capacity, in this sense, I mean, all the way from power is secured to substations built, to transformers, to, you can provide the power whips to the racks. Now, obviously, the data center capacity will differ, right? You know, historically, actually, Amazon's had the highest density data centers in the world. Right?

Starting point is 01:01:37 They went to, like, 40-kilowatt racks when everyone was still at 12. And if you've ever stepped foot inside of most data centers, they're, like, pretty cool and dry-ish. If you step inside of an Amazon data center, they feel like a swamp. It feels like where I grew up, right? It's like humid and hot because they're like optimizing every percentage.

Starting point is 01:01:59 And so sort of like, your point in here is that like Amazon's data centers aren't equipped for the new type of infrastructure. But when you compare them to the cost of the GPU, like getting, getting, you know, having a complex cooling arrangement is fine. Right.

Starting point is 01:02:14 You know, we made a call on Astera Labs a few months ago, a couple months ago, when they were like at 90, and it's gone to 250 the month after because of what orders Amazon is placing with them. But there's certain things with Amazon's infrastructure. I won't get too much into it.

Starting point is 01:02:28 But the rack infrastructure requires them using a lot more of, like, Astera Labs' connectivity products. And the same applies to cooling. Right? So it's on the networking and cooling side. They just have to use a lot more of this stuff. But again, this stuff is inconsequential on cost compared to the GPU.

Starting point is 01:02:47 You can build. My question was more like, look, I may need a major river close by for cooling at this point, right? It's in many areas I just can't get enough water. And, you know, it's probably power in the same region. There's two gigawatts scale sites that they have power all secured, wet chillers and dry chillers all secured. Like everything's fine. It's just not as efficient.

Starting point is 01:03:08 But, you know, that's fine, right? Like, you know, they're going to ramp the revenue. They're going to add the revenue. Not that I necessarily think Amazon's internal models are going to be great, or, hey, their internal chip is better than Nvidia's or competitive with TPU. Or their hardware architecture is the best. I don't necessarily think that's the case. But they can build a lot of data centers and they can fill them up with stuff that will be rented out, right?

Starting point is 01:03:35 And it's a pretty simple, it's a pretty simple thesis. How important has Anthropic been to the co-design for Trainium? Because I remember we had a portfolio company. This was summer 2023, they invited them to AWS. They spent, man, I think eight hours with them over the course of a week trying to figure out Trainium back then. It was just impossible to work through.

Starting point is 01:04:01 Is that, you know, obviously that portfolio company hasn't gone back and tried it now, but like how different is it now based on what you're hearing? Oh, it's still bad. Okay, okay. You know, it's tough to use. So there's sort of like This is sort of the argument that every inference company offers

Starting point is 01:04:19 including the AI hardware startups is because I'm only running like three different models at most I can just hand optimize everything and write kernels for everything and even like go down to like an assembly level right How are going to be? It is pretty hard. It is pretty hard. But like you tend to do this for production inference anyways

Starting point is 01:04:40 Like, you aren't using cuDNN, which is Nvidia's like library that's like super easy to generate your, you know, to generate kernels and stuff, right? Like you're not, or not generate kernels, but anyways, you're not using these like ease-of-use libraries. You know, when you're running inference, you're either, you know, using CUTLASS or stamping out your own PTX or, you know, in some cases, people are even going down to the SASS level, right? And like when you look at like, say, an OpenAI or like, you know, an Anthropic, when they run inference on GPUs, they're doing this, right? And the ecosystem is not that amazing. Once you get all the way down to that level, it's not like using Nvidia GPUs is easy now. I mean, you have an intuitive understanding of the hardware architecture because you work on it so much and everyone's worked on it. And you can talk to other people.

Starting point is 01:05:30 But at the end of the day, it's not like easy, right? Whereas, you know, Anthropic, Trainium or TPUs, actually the hardware architecture is a little bit more simple than a GPU. Larger, more simple cores, rather than having all this functionality, you know, less general. So it's a little bit easier to code on. There's tweets from Anthropic people saying, when they are doing that low level, actually they prefer working on Trainium and TPU because of the simplicity.

Starting point is 01:06:00 No. Interesting. To be clear, Trainium and TPU, I mean, Trainium especially is very hard to use. Like, not for the faint of heart. It's very difficult. But you can do it if you're just running, like, if I'm Anthropic and I must only run Claude 4.1 Opus or Sonnet.

Starting point is 01:06:21 And screw it. I won't even run Haiku. I'll just run Haiku on, like, on GPUs or whatever. I'm just going to run two models. And actually, screw it. I'm just going to run Opus on GPUs too, and TPUs. Sonnet is the majority of my traffic anyways. I could spend the time.

Starting point is 01:06:33 And how often am I changing that architecture? Every four or six months, right? Like how much? It's not even changing that much honestly, right? I mean, from three to four definitely did change, right? Yeah, I mean, define architectural change. You know, at a high level, like the primitives are more or less the same across the last couple of generations. I don't know enough about Anthropic's model architecture, to be honest.

Starting point is 01:06:57 But I think, I think from what I've seen at other places, there have been enough changes that it takes time to, you know, program this and really get, the main thing is like, you know, if I'm Anthropic and I have, what, $7 billion ARR now or whatever, north of 10,

Starting point is 01:07:13 you know, by the end of next year, north of 20, right? Like, ARR is like, maybe even 30, is like, that's, and my margins are 50%, 70%, that's $15 billion of Trainium

Starting point is 01:07:26 that I need, right? Then I can run on Sonnet. And most of that's going to be Sonnet 3.5, or sorry, 4.5, right? It's going to be one model serving most of the use cases. So, like, you know, I could, I could spend the time and it'll work on this hardware. Yeah, totally. Maybe on the topic of

Starting point is 01:07:45 non-consensus calls you've made, and maybe I'll move to another cloud, in June, you guys said that Oracle is winning the AI compute market. And then in this pod, we've already referenced the big jump, obviously, that Oracle had. I think it was the single largest gain that a company with over $500 billion of market cap has ever had. So an enormous... Was the 2023 Q1 NVIDIA not bigger? It might have been smaller.

Starting point is 01:08:11 Okay. I think it was maybe close. We'll fact check ourselves. That's amazing. But, you know, obviously this is the massive commitment that was announced. Can you walk us through why you made that call then

Starting point is 01:08:23 and just sort of why Oracle is poised to do so well in such a competitive space? Yeah, so Oracle, the largest balance sheet in the industry that is not dogmatic to any type of hardware, right? They're not dogmatic to any type of networking. They will deploy Ethernet with Arista. They'll deploy Ethernet through their own white boxes. They'll deploy Nvidia networking, InfiniBand or Spectrum-X.

Starting point is 01:08:54 And they have really good network engineers. They have really great software across the board, right, again, like ClusterMAX. They were ClusterMAX Gold because their software is great. There's a couple of things that they needed to add that would take them higher, and they're adding those, right? To Platinum, right, which was where CoreWeave was. And so, like, when you couple two things, right? Like, OpenAI's got insane compute demand.

Starting point is 01:09:18 Microsoft is quite pansy. They're not willing to invest in. They don't believe OpenAI can actually pay the amount of money, right? I mentioned earlier, right? The $300 billion deal, OpenAI doesn't have $300 billion. And Oracle is willing to take the bet. Now, of course, the bet is a bit like, there is a bit more security in the bet in that Oracle really only needs to secure the data center capacity, right?

Starting point is 01:09:43 So this is sort of like how we came across the bet, right? And we've been telling our institutional clients, especially in like a super detailed way, whether it be the hyperscalers or AI labs or semiconductor companies or investors, in our data center model, because we're tracking every single data center in the world. Oracle doesn't build their own data centers either, by the way. They get them from other companies. They co-engineer, but they don't physically build them themselves. And so they're quite nimble in terms of being able to assess new data centers, engineer them. So we saw all these different data centers Oracle is snatching up, in deep

Starting point is 01:10:15 discussions, snatching up, signing, etc. And so we have, you know, hey, gigawatt here, gigawatt there, gigawatt there, right? Abilene, you know, two gigawatts, right? You have all these different sites that they're signing up and in discussions with. And we're, we're noting them. And then we had the timeline because we're tracking the entire supply chain. We're tracking all the permits, regulatory filings, you know, through, you know, language models, using satellite photos constantly. And then supply chain of like chillers, transformer equipment, generators, et cetera. We're able to make a pretty strong estimate, quarter by quarter in our data center model, of how much power there is

Starting point is 01:10:51 for each of these sites. So some of these sites that we know of aren't even ramping until 2027, but we know that Oracle signed it, right? And we have the sort of ramp path. So then it's this question of like, okay, let's say you have a megawatt, right? For a simple sake, simplicity's sake, which is a ton of power,

Starting point is 01:11:09 but now it doesn't feel like much. We're in the gigawatt era. But, you know, if you're talking about a megawatt, right, you fill it up with GPUs. How much do the GPUs for a megawatt cost? Right? Or actually, it's even simpler to do the math, right? If I'm talking about a GB200, right, each individual GPU is 1,200 watts.

Starting point is 01:11:30 But when you talk about the CPU, the whole system, it's roughly 2,000 watts. At the same time, you know, all in everything, simplicity's sake, $50,000 per GPU, right? The GPU doesn't cost that. There's all the peripherals, right? So $50,000 capex for 2,000 watts. So $25,000 for 1,000 watts. And then what's the rental price for a GPU? If you're on a really long-term deal, at volume, $2.70,

Starting point is 01:11:59 right, $2.60, in that range, then you end up with, oh, it costs like $12 million per megawatt to rent a megawatt. And then each chip is different. So we track each chip, what the capex is, so you know what each chip is. You can predict what chips they're putting in which data centers, when those data centers go online,

Starting point is 01:12:18 how many megawatts by quarter, and then you end up with, oh, well, Stargate goes online in this time period. They're going to start renting it this time. It's this many chips. Each Stargate site, right? And so, therefore, this is how much OpenAI would have to spend to rent it. And then you price that out, and we were able to predict Oracle's revenue with pretty high certainty, and we matched pretty dead on what they announced for 25, 26, 27,

Starting point is 01:12:42 and we were pretty close on 28. The surprise for us was that, you know, they announced some stuff for 28, 29, data centers that we haven't found yet, but we'll find them, right, of course. And sort of like this methodology lets you see sort of, hey, what data centers are you getting, how much power,
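The per-megawatt arithmetic in this stretch can be reproduced in a few lines. This is only a sketch using the round numbers quoted in the conversation; the function names are made up for illustration, and two things are assumptions rather than statements from the speaker: that the quoted rental price is per GPU-hour, and that every GPU is rented for every hour of the year.

```python
# Back-of-the-envelope math with the round numbers quoted above.
# Assumptions (not stated explicitly in the transcript): the rental price
# is per GPU-hour, and GPUs are rented 24x7 for a full year.
WATTS_PER_GPU_ALL_IN = 2_000    # GB200 GPU plus its share of CPU and system
CAPEX_PER_GPU_USD = 50_000      # all-in capex per GPU, peripherals included
RENT_PER_GPU_HOUR_USD = 2.70    # long-term, high-volume rental price
HOURS_PER_YEAR = 24 * 365


def gpus_per_megawatt(watts_per_gpu=WATTS_PER_GPU_ALL_IN):
    """How many GPUs fit in one megawatt of IT power."""
    return 1_000_000 / watts_per_gpu


def capex_per_megawatt(capex_per_gpu=CAPEX_PER_GPU_USD):
    """Hardware capex needed to fill one megawatt."""
    return gpus_per_megawatt() * capex_per_gpu


def annual_rent_per_megawatt(rent_per_gpu_hour=RENT_PER_GPU_HOUR_USD):
    """Yearly rental revenue from one fully rented megawatt."""
    return gpus_per_megawatt() * rent_per_gpu_hour * HOURS_PER_YEAR


print(gpus_per_megawatt())          # 500 GPUs
print(capex_per_megawatt())         # $25M of hardware per megawatt
print(annual_rent_per_megawatt())   # roughly $12M per megawatt per year
```

Under those assumptions the numbers line up with the "$12 million per megawatt" figure; multiplying by megawatts coming online per quarter gives the kind of revenue estimate described.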

Starting point is 01:13:00 what are they signing how much incremental revenue that is when that comes online and so that's sort of the basis of our Oracle bet obviously in the newsletter we included a lot less detail but you know you know sort of it was that thesis right that like hey

Starting point is 01:13:18 they have all this capacity, they're going to sign these deals. In our newsletter, we talked about two main things. We talked about the OpenAI business, and then we talked about the ByteDance business. And presumably tomorrow on Friday, there's going to be an announcement about TikTok and all this. But like the ByteDance business,

Starting point is 01:13:36 huge amounts of data center capacity that Oracle is also going to lease out to ByteDance. So we did the same methodology there. With ByteDance, it's pretty certain they'll pay because they're a profitable company. With OpenAI, it's not. And so there's got to be some error bars as you go further out in terms of like

Starting point is 01:13:52 will OpenAI exist in 28, 29, 30, and will they be able to pay the 80 plus billion dollars a year that they've signed up to Oracle with? That's the only like risk here. And if that happens, then Oracle's downside is also somewhat protected because they only signed the data center, which is a minority of the cost, right?

Starting point is 01:14:09 The GPUs are everything. And the GPUs, they purchase one to two quarters before they start renting them. So they're not, you know, the downside risk is pretty low for them in terms of, if they don't get the deal. Well, they don't get the revenue, but it's not like they're stuck with a bunch of assets they bought that are worthless.

Starting point is 01:14:24 Yeah. Yeah. Is that another angle here? I mean, OpenAI and Microsoft were sort of BFFs, and now they've filed divorce papers, and they just want to diversify, and then that's pushing them away towards other providers? Yeah, so Microsoft was the exclusive compute provider. It got reworked to a right of first refusal.

Starting point is 01:14:42 You know, and then Microsoft... Is it no last choice or something like that? No, it's still... It's still right of first refusal, but it's like, Microsoft, those two are not mutually exclusive. Well, if OpenAI is like, we're going to sign a $80 billion contract or a $300 billion contract for the next five years, you guys want it? Or, you know, it's like, and they're like, no, what? Okay, cool, right? And it's like, it's like, and then they go to Oracle, right?

Starting point is 01:15:07 And it's, OpenAI's like sort of like, this is, this is the, you know, OpenAI needs someone with a balance sheet to actually be able to pay for it, right? And then they'll make tons of money off of OpenAI on the margins on the compute and the infra and all these things, but someone's got to have a balance sheet, and OpenAI doesn't have a balance sheet. Oracle does,

Starting point is 01:15:27 although given the scale of what they signed we also had another source of information which was that they were talking to debt markets because Oracle actually just needs to raise debt to pay for this many GPUs overtime now they won't do it like immediately

Starting point is 01:15:44 they can pay for everything this year and next year from their own cash. But like in 27, 28, 29, they'll start to have to use debt to pay for these GPUs, which is what, you know, CoreWeave has done. And many of the neoclouds, most of it's debt financed. Even Meta went and got debt for their Louisiana mega data center. Not because, just because it's cheaper than, it's literally better on a financial basis to do buybacks with your cash and get debt because the debt is cheaper than the return on your stock. Like, it's like a financial engineering thing.

Starting point is 01:16:12 but like, you know, who's out there, right? It could be Amazon, it could be Google, it could be Microsoft. It's a very short list. Or it could be Oracle or Meta, right? Meta's obviously not. Microsoft's chickened out. Amazon, Google, and Oracle, right? That's all that's left.

Starting point is 01:16:31 Google would be an awkward fit. Yeah, Google would be an awkward fit. Amazon would be a fine fit, but, you know, exactly, right? It's like... It's a very... It's a very drop-mic, yeah. Well, I guess maybe on the topic of these giant data center buildouts, you guys just released a piece on xAI and Colossus 2. Do you, are you getting less impressed by these feats of building something this massive in six months?

Starting point is 01:16:57 Or is it still very impressive to you guys? You know, this is the like thing I've said about AI researchers, that they're like the first class of humans to think about things on an order of magnitude scale. Whereas, like, people have always thought about things in terms of, like, percentage growth, like, ever since industrialization. And before that, it was just, like, absolute numbers, right? You know, sort of, like, humanity is evolving in terms of how we think, because things are changing that fast. Everything is an X scale. And so, like, you know, it was, like, really impressive when GPT, you know, 2 was trained on so many chips. And then GPT-3 was trained on, you know, like on 20K A100s and, you know, or sorry,

Starting point is 01:17:40 GPT-4 on 20K A100s, you know, sort of like, it's like, holy crap. And then it was like, oh, the era of 100K GPU clusters, right? And we did some reports around 100K GPU clusters. But now there's like, there are like ten 100K GPU clusters in the world. It's like, okay, this is kind of boring. But it's like, 100K GPUs is like, you know, over 100 megawatts. Now it's like, you know, like literally, you know, in our Slack and some of these channels, like, oh, we found another 200 megawatt data center.

Starting point is 01:18:10 There's someone who like puts the yawning emoji. Every time. And I'm like, dude, what? Like now it's only, it's only exciting if you do gigawatt scale. Like we're in gigawatt era. Yeah. Yeah, yeah. And I'm sure like, you know, and, you know, I'm not sure.

Starting point is 01:18:26 Maybe we'll start yawning to that too. But like, you know, the log scale of this is like. The capital numbers are crazy, right? Like, you know, it's like, it's crazy enough that OpenAI did like a $100 million training run, you know, or, you know, like, then they did a billion dollar training run. Now we're talking about $10 billion training runs, right? Like, you know, it's, it's crazy that we think in log scale.

Starting point is 01:18:47 But yes, things are only impressive. Yeah. When they do, like, what Elon's doing, so what Elon's doing in, in, in, in, in, uh, Tennessee, in Memphis, first time was crazy, right? 100K GPUs in six months. He bought a factory in like February of 24 and had models training within six months, right? And he did liquid cooling, you know,

Starting point is 01:19:12 first large-scale data center at this scale for AI doing liquid cooling. Like all these sorts of crazy firsts, putting generators outside like cat turbines, all these different things to get the power, you know, mobile substations, all these different crazy things, tapping the natural gas line that's like running alongside the factory, So he does this.

Starting point is 01:19:32 It's like, holy crap. And he did it for 100K GPUs. Right. You know, 200, 300 megawatts, right? Now he's doing it for a gigawatt scale, and he's doing it just as fast, right? And so, like, you would think, like, this is obviously way more impressive that he did it again.

Starting point is 01:19:51 Yeah. But, like, like, they have desensitized, but, like, it's like, you know, like, you've given the child too much candy. Yeah. Right? Exactly. And now, like, the,

Starting point is 01:20:00 child has no, you know, is like, you know, doesn't like apples, right? Like, I don't know. So, so like, yeah, a gigawatt data center. There were all these protests around his Memphis facility. People like, oh, you're destroying the air. And it's like, have you looked around that area of Memphis? Like, there is like a gigawatt gas turbine plant that's just powering generally that area. There's a sewage plant that's servicing the entire city of Minnesota, or sorry, city of Memphis. And there's like open air pits of like the, like, like there's open air mining. Like there's all sorts of disgusting shit around there.

Starting point is 01:20:38 Which is needed. Right. We need that stuff to have a country run, right? Like to be clear. And you know, it's like people are complaining about like a couple hundred megawatts in air. Yeah. Of a generation. So he got like protests from all sorts of people.

Starting point is 01:20:53 You know, you got super into the politics side of things. Right. The NAACP even protested him. Like. And so he really got like some local municipalities to be like, oh, I don't like, you know, like this, and so he couldn't do as much

Starting point is 01:21:05 as he wanted to in Memphis, but he still needed the data center to be close because he wanted to connect these data centers super high bandwidth, super close, and he already had a lot of infrastructure set up there. So he bought another distribution center this time, and it's still Memphis, but the cool thing

Starting point is 01:21:21 about Memphis is it's right across the border from Mississippi, right? Now, you know, it's like 10 miles away from his original one, but this facility is like a mile away from Mississippi, and he bought a power plant in Mississippi, and he's putting turbines there. The regulation is completely different, right? And if the question is really like, galvanize resources and build it really fast, maybe Elon is ahead of everyone. You know, he hasn't made the best model yet, or he doesn't have the best model, at least today, I think. You know, you could argue Grok 4 was the best for a little

Starting point is 01:21:54 period of time. But like, you know, it's, it's, it's, it's, it's truly amazing how fast he's able to build these things. And from first principles, it's like, most people are like, fuck, like, you know, they, they, they, we can't, we can't build the power. We can't do power here anymore. I guess we have to find a new site. And it's like, no, no, just go across the border. Like, go to Mississippi. And the, my favorite thing is like, Arkansas's right there. So if Mississippi gets mad, you know, I don't, you know, the regulation, the whole future data centers, you know, built in places where multiple states meet. Is that the...

Starting point is 01:22:28 Four Corners, yeah. The optimal regular... I think there's one... There are you guys. Is there a point in the U.S. with five? I know there's a point with four. Four states intersect. There, yeah.

Starting point is 01:22:38 Maybe that's the corner of a data center. Kind of certain. I'm going to buy real estate in that area of front Reddit. Well, I guess on the topic of just maybe new hardware, you had this piece analyzing TCO for the GB 200s. And I'm kind of going to ask this question on behalf of our portfolio companies, which it sounds like you're helping them already. But one of the findings that I thought was really interesting was TCO was sort of 1.6X H-100s for GB200s.

Starting point is 01:23:10 And so obviously, you know, there's this point on, okay, that's sort of the benchmark for the performance boosts that you're going to need to at least make the sort of performance cost ratio benefit from switching over. maybe just talk about what you've seen from a performance standpoint and what do you recommend to portfolio companies, maybe in a smaller scale than XAI, who are thinking about new hardware, try to get it. There's capacity constraints, obviously. Yeah, I mean, that's a challenge, right? With each generation of GPU, it gets so much faster

Starting point is 01:23:42 that you end up like you want the new one. And in some metrics, you could say GB200 is three times faster than, or two times faster than the prior generation. Other metrics, you can say it's way more than that, right? So if you're doing pre-training versus inference, right? They can run everything for a bit, right? Yeah, if you can run it for a bit or just inference and take advantage of the huge NVLink, NVL72, you know, there's ways you can, you could squint and say GB200 is only 2x faster than H100,

Starting point is 01:24:17 in which case, at 1.6x TCO, it's, you know, it's worthwhile, right? It's worth going to the next gen. But more marginal. It's more marginal. It's not a big deal. Then there's other cases where it's like, well, if you're running DeepSeek inference, the performance difference per GPU is north of like 6, 7x.

Starting point is 01:24:35 And it continues to optimize, you know, for DeepSeek inference. And so the question, you know, then it's like, well, I'm only paying 60% more for 6x. And it's like, it's a 4x or 3x performance per dollar gain. Like, absolutely, right? If you're like running inference of DeepSeek, that can also include RL, right? And so the question is sort of, and then the other question is like, well, the GPU is new. You know, there's B200, there's GB200. B200 is much more simple from a hardware perspective.

Starting point is 01:25:05 It's just eight GPUs in a box. So then it's not as much of a performance gain, especially in inference. But you have, you have all the stability, right? It's an eight GPU box. It's not going to be unreliable. The GB200s are still having some reliability challenges. Those are being worked through. It's getting better and better by the day.

Starting point is 01:25:22 But it's still a challenge. But, you know, when you have a GB2, when you have an H100, right, box, or H200, 8 GPUs, one of them fails. You take the entire server offline, you have to fix it, right? So usually if your cloud's good, they'll swap it in, right? But if it's GB200, what do you now do with 72 GPUs? If one fails, you break the whole thing and get a new 72?
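The upgrade decision described here reduces to one ratio: speedup divided by the TCO multiple. A minimal sketch, using only the figures quoted in the conversation (1.6x TCO, 2x and roughly 6x speedups); the function name is hypothetical.

```python
def perf_per_dollar_gain(speedup, tco_ratio):
    """Performance-per-dollar of the new part relative to the old one.

    speedup:   new performance / old performance, per GPU
    tco_ratio: new TCO / old TCO, per GPU
    A result above 1.0 means the new hardware is cheaper per unit of work.
    """
    return speedup / tco_ratio


GB200_TCO_RATIO = 1.6  # GB200 vs. H100 TCO multiple quoted above

# Pessimistic case from the conversation: only 2x faster.
print(perf_per_dollar_gain(2.0, GB200_TCO_RATIO))

# DeepSeek inference case from the conversation: about 6x faster.
print(perf_per_dollar_gain(6.0, GB200_TCO_RATIO))
```

The first case works out to 1.25x, which matches "worthwhile but marginal"; the second to 3.75x, which is the "4x or 3x performance per dollar" range mentioned. Break-even is simply a speedup equal to the TCO ratio.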

Starting point is 01:25:50 Note, GPU failure rates at best are the same and likely worse, right, gen on gen, because everything's getting hotter, faster, etc. So at best, the failure rates are the same. If you model the failure rates as the exact same because you go from one out of eight to one out of 72, it's a huge problem.

Starting point is 01:26:06 So now what a lot of people are doing is they run a high priority workload on 64 of them and then the other eight you run low priority workloads, which is then like, okay, this is this whole infrastructure challenge. I have to have high priority workloads, like the low priority workloads.

Starting point is 01:26:19 When a high priority workload has a failure, instead of taking the whole rack offline, you just take some of the GPUs from the low priority one, put it in the high priority one, then you just let the dead GPU sit there until you service the rack at a later date. And it's like, there's all these complicated infrastructure things that make it so, oh, wait, actually,

Starting point is 01:26:38 that 3x or 2x performance increase in pre-training is lower because the downtime is higher. slash I'm not using all the GPUs always slash I'm not able to you know I'm not smart enough or I don't have the infra to like have low priority and high priority workloads like it's not impossible yeah the labs are doing it right like it's just

Starting point is 01:26:57 I mean, if I'm running a cloud, it's actually really hard, right? Because I probably have to rent the spares out as a spot instance or something. No, no, no, because it's a coherent domain, it's NVLink. You don't want anyone touching that. So it has to be the end customer. Doesn't have to leave them, because it's empty spares, that's even worse.

Starting point is 01:27:12 The end customer usually would just be like, I want them. And the SLAs and the pricing, everything is, like, accounting for that, right? So, like, generally when you have a cloud, you have an SLA, right? That is, hey, uptime is going to be 99%, you know, blah, blah, blah, right? For this period.

Starting point is 01:27:29 With GB200, it's 99% for 64 GPUs, not 72. And then it's like 95% for 72. Now it differs across every cloud. Every cloud has a different SLA. Got it, yeah. But, like, they've adjusted for this because they're like, look, this hardware is just finicky.

Starting point is 01:27:45 You still want it. You know, we will credit you in that 64 of them will always work, right? Not 72. And so, like, there's this whole, like, finicky nature. And the end customer has to be capable of dealing with the unreliability. And the end customer can just continue to use B200, right? Performance gains, not as much. The whole reason you want this 72 domain is so you can have, you know, some of these gains.

Starting point is 01:28:09 Right. But you have to be smart enough to be able to do it. And that's challenging for small companies. Totally. So the... Nvidia has announced the Rubin pre-fill cards, like CTX? Yeah, CPX, there we go. What's your take on that?

Starting point is 01:28:23 Does it cannibalize? Dude, by the way, I don't know if this is like brain rot or like, I don't know, but like, I can't remember what I had for lunch yesterday. But I know the model number of every fucking chip, like... Haunts you in your dreams. We're broken, we're broken. Living the dream. No, no, no, no.

Starting point is 01:28:43 You know, why do you pre-announce a product that's 5x faster for certain use cases? Is that that much? I think it's got something great. Historically, AI chips were AI chips, right? And then we started getting a lot of

Starting point is 01:28:59 people saying, this is a training chip, this is an inference chip. Actually, training and inference are switching so fast in what they require that now it's still like one chip. Actually, there are still workload level dynamics that differ, but the main workload is inference, even

Starting point is 01:29:15 in training, right? Because of RL, most of that is, you know, generating stuff in an environment and trying to, you know, achieve a reward, right? So it's inference still, right? Training is now becoming mostly dominated by inference as well. But inference has like two main operations, right? There is calculating the KV cache for pre-fill, right? Here's all these documents. Do the attention between all of them, right? Between all the tokens, however, you know, whatever type of attention you use. And then there's decode, which is autoregressively generating each token. These are very, very different workloads.

Starting point is 01:29:50 And so initially, the ideas or infrastructure techniques, the ML systems techniques were, oh, okay, I will just make the batch size every single forward pass this big. And if I make it, let's call it, I'll make it a thousand big. And maybe

Starting point is 01:30:06 I'll run 32 users concurrently. That way, you know, now I still have 900-something left, 960 left, right? That 960 is actually doing the pre-fill. If a request comes in, it chunks it. It's called chunked pre-fill. You pre-fill chunks of it now. You get really good

Starting point is 01:30:22 utilization on GPUs. But then that ends up impacting the decode workers, right? The people who are autoregressively generating each token end up having slower TPS. And tokens per second is really important for user experience and all these other things, right? So then the

Starting point is 01:30:38 idea is like, okay, these two workloads are so different and they are literally different, right? You pre-fill and then you decode. It's not like you're interleaving them. So why don't we split them entirely? And this is done on the same type of chip, right? OpenAI, Anthropic, Google. Pretty much everybody does it. Everyone, everyone good. Everyone's good. Together, Fireworks. All these guys do pre-fill/decode, disaggregated pre-fill/decode. So they run pre-fill on one set of GPUs and decode on another. Why is this beneficial? Because you can auto-scale them. Right? You can, hey, all of a sudden, I have a lot more long context workloads. I allocate more

Starting point is 01:31:12 resources to pre-fill. Oh, all of a sudden I have a, you know, not all of a sudden, but like, you know, over time, my traffic mix is not long input, short output. It's short input, long output. I have more decode workers. This way, now I can auto-scale the resources differently, and I can also guarantee that my pre-fill time is, you know... What's really important in search is how fast you get the page to start loading. Not when does the resource happen. What do people do in games? Like, the loading screen often has some sort of interactive environment, or it blends in over time, or whatever, it has tips and tricks, ways to distract you.

Starting point is 01:31:47 The same thing is, there's like studies and papers out there that users prefer a faster time to first token, right? First token gets streamed to me sooner, even if the total time to get all my tokens is a little bit longer. I can't read that fast anyways, right? So. I mean, I like to skim. Yeah, I like it just good.
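
The chunked pre-fill batching described a little earlier ties directly into time to first token, and the slot arithmetic can be sketched as follows. The function name `chunked_prefill_plan` is hypothetical, and the model is deliberately crude: one batch slot per token, one slot per decoding user, and a fixed token budget per forward pass. With the conversation's numbers (a budget of 1,000 and 32 concurrent users) the exact remainder is 968, which the speaker rounds to about 960.

```python
from math import ceil


def chunked_prefill_plan(
    prompt_tokens: int, batch_budget: int, decode_users: int
) -> tuple[int, int]:
    """Return (pre-fill slots per forward pass, passes needed for the prompt)."""
    # Each decoding user consumes one slot per forward pass (one new token).
    prefill_slots = batch_budget - decode_users
    if prefill_slots <= 0:
        raise ValueError("no room left for pre-fill in this batch")
    return prefill_slots, ceil(prompt_tokens / prefill_slots)


slots, passes = chunked_prefill_plan(
    prompt_tokens=64_000, batch_budget=1_000, decode_users=32
)
print(f"{slots} pre-fill slots per pass, {passes} passes before the first token")
```

Under these assumptions a 64,000-token prompt needs dozens of shared forward passes before its first output token, and every one of those passes is also a decode step for the 32 existing users, which is the interference that motivates splitting the two phases.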

Starting point is 01:32:05 Yeah, I mean, most models return above speed-reading speed. But you need that, right? I think, but like, you know, the idea is that you want to guarantee time to first token is at a certain level for user experience reasons. Otherwise, people are like, screw this, not using AI. The decode speed matters a lot, too, but not as much as time to first token. And so by having separate pre-fill and decode, you do this, right? But now you've already, and this is all in the same infrastructure, you've already done this. So now it's like, what's the next logical step? These workloads are so different. Decode, you have to load all the parameters in and the KV caches to generate a single token. You batch a couple users together, but very quickly you run out of memory capacity or memory bandwidth because everyone's KV cache is different.
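
The auto-scaling benefit of disaggregated pre-fill/decode can be sketched as sizing two worker pools independently from the traffic mix. Everything numeric here is a placeholder: the per-worker throughput constants and the traffic figures are assumptions chosen to make the arithmetic easy, and `size_pools` is a hypothetical helper, not any serving framework's API.

```python
from math import ceil


def size_pools(
    input_tok_per_s: float,
    output_tok_per_s: float,
    prefill_tok_per_s_per_worker: float,
    decode_tok_per_s_per_worker: float,
) -> tuple[int, int]:
    """Return (pre-fill workers, decode workers) needed for the given traffic."""
    prefill_workers = ceil(input_tok_per_s / prefill_tok_per_s_per_worker)
    decode_workers = ceil(output_tok_per_s / decode_tok_per_s_per_worker)
    return prefill_workers, decode_workers


# Hypothetical per-worker throughputs (assumptions, not measured numbers).
PREFILL_RATE = 20_000.0  # input tokens/s one pre-fill worker can absorb
DECODE_RATE = 2_000.0    # output tokens/s one decode worker can generate

# Long input, short output (e.g. document-heavy traffic).
long_in = size_pools(400_000, 20_000, PREFILL_RATE, DECODE_RATE)
# Short input, long output (e.g. reasoning-heavy traffic).
long_out = size_pools(40_000, 100_000, PREFILL_RATE, DECODE_RATE)

print("long input / short output:", long_in)
print("short input / long output:", long_out)
```

Because the pools are sized separately, a shift in the traffic mix moves capacity between pre-fill and decode instead of forcing one shared pool to be provisioned for the worst case of both.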

Starting point is 01:32:48 The attention of all the tokens, right? Whereas on pre-fill, I could even just serve like one or two users at a time because if they send me a 64,000 context request, that is a lot of flops, right? A 64,000 context request. I'll use Llama 70B because it's simple to do math on, like 70 billion parameters.

Starting point is 01:33:07 That's 140 gigaflops per token. 70 times 64,000, that's many, many teraflops. You can use the entire GPU for like a second, right? Like potentially, right? Depending on the GPU, to just do the pre-fill, right? And that's just one forward pass. So I don't necessarily care about, you know, loading all the tokens or all the parameters and KV cache in fast.
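
The back-of-the-envelope math above can be written out. This sketch uses the common approximation of about 2 FLOPs per parameter per token for a dense forward pass and ignores the attention term, which grows with context length. The sustained-throughput values are hypothetical placeholders, not specs of any particular GPU.

```python
PARAMS = 70e9            # Llama 70B, as in the conversation
CONTEXT_TOKENS = 64_000  # prompt length from the conversation

# Roughly one multiply and one add per parameter per token.
flops_per_token = 2 * PARAMS                      # 1.4e11, i.e. 140 GFLOPs
prefill_flops = flops_per_token * CONTEXT_TOKENS  # 8.96e15, i.e. ~9 PFLOPs


def prefill_seconds(sustained_flops_per_s: float) -> float:
    """Time to pre-fill the prompt at a given sustained throughput."""
    return prefill_flops / sustained_flops_per_s


# Hypothetical sustained throughputs (assumptions): 1 and 10 PFLOP/s.
for rate in (1e15, 1e16):
    print(f"{rate:.0e} FLOP/s -> {prefill_seconds(rate):.2f} s of pre-fill")

# For contrast, decode must read every weight for each generated token.
# Assuming 2 bytes per parameter (16-bit weights), that is 140 GB per step.
weight_bytes_per_decode_step = PARAMS * 2
print(f"{weight_bytes_per_decode_step / 1e9:.0f} GB of weights read per decode step")
```

The contrast is the point: pre-fill for one long prompt is on the order of petaflops of compute, while decode is dominated by streaming weights and KV cache from memory, so the two phases stress different parts of the chip.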

Starting point is 01:33:31 All I care about is all the flops. And so that leads us to sort of like, you know, I had to, I think it was a long-winded explanation because it's hard for people to understand what CPX is. I've had a lot of, like, even my own clients, like, we sent, like, multiple notes explaining, and they're like, I still don't understand. I'm like, shit, okay. Send them the Attention Is All You Need paper.

Starting point is 01:33:48 You can't expect. I mean, like, think about, like, a networking person. Like, they're like, no, I don't need to know about this. You know, Attention Is All You Need, right? Or, like, we're thinking about an investor, right? Like, you know, there's all these people. Maybe the data center operator. Like, they're like, oh, there's two chips.

Starting point is 01:34:02 Why? Should I build my data center differently? It's like, you know, I got to explain everything. Or just like, no. You don't have to build differently. But anyways, you get to now... At Stanford, 25% of all students, not CS students, of all students, read the paper.

Starting point is 01:34:18 Read what paper? Attention Is All You Need? That's low. They do majors in, you know, like, philosophy. I'm like, this is amazing. Anyway, sorry. The Middle East, I can't remember what country it is, has AI education starting at, like, age eight,

Starting point is 01:34:31 and in high school, they have to read Attention Is All You Need. Wow. Someone told me that their... son had to read Attention Is All You Need. Which is, I don't know. Look, look, top-down mandates for education, you know, maybe they work, maybe they don't. Like, you know, maybe people like homeschooling their kids. I don't know.

Starting point is 01:34:47 I went to public school, but, like, back to your readers. Yeah, God. Just on the topic of hardware cycles, I wanted to maybe, yeah. I didn't actually explain what CPX is. So CPX is a very, like, compute-optimized chip for pre-fill, whereas decode, just to say it simply, is the normal chips with HBM. HBM is more than half the cost of the GPU. If you strip that out, you end up having a much cheaper chip passed on to the customer. So, or like, you know, if Nvidia takes the same margin, then the cost of this pre-fill chip is much, much lower. And now the whole process is

Starting point is 01:35:24 way cheaper and more efficient. Now long context can be adopted. Right. Yeah. Well, so I love that we're actually going into all this detail, because I had a more 10,000-foot view question for you, which is, I haven't been following the semi market as closely as you have. I probably started with the A100. And I remember helping Noam at Character, this is summer of 2023, chase down GPUs. And the only thing that mattered at that time was delivery date

Starting point is 01:35:54 because there was a huge capacity crunch. And then to see that over the last two years evolve, where, let's say, six to 12 months ago, people were doing these RFPs to 20 neoclouds, right? And the only thing that mattered to some degree was price. Right, people actually do RFPs for GPUs? Yes. So just to be clear, my opinion on how you buy GPUs is that it's like buying cocaine or any other drug.

Starting point is 01:36:20 This was described to me, not me. I don't buy cocaine. Okay, yeah. Right. Someone tells me this. Someone tells me this and I'm like, holy shit, it's right. You call up a couple people. You text a couple people.

Starting point is 01:36:29 You ask, you know, how much you got? What's the price? It's like. Exactly. Exactly. This is fucking like buying drugs. Sorry, sorry. No, I mean, it's the same way. You just send, like, we have Slack Connects with like 30 neoclouds. There you go. As well as, like, some of the major ones. And we just send them a message like, hey, customer wants this much. You know, this is what they're looking for. And then they send quotes. I know this guy.

Starting point is 01:36:55 I know a guy. Well, so I think that's actually a very accurate description. And I've sent countless portcos your original ClusterMAX post, because I thought it did a really good job breaking them down. But maybe one question to end on for me is just, what era are we in now with Blackwells coming online? Are we sort of back to the summer 2023 era? And that's kind of the cycle

Starting point is 01:37:17 that we've just entered? Or what's sort of your view on where we are? So, a very good question. For one of your portcos, we were like, you know, after their difficulties with Amazon, we were like, okay, let's actually

Starting point is 01:37:32 like, get you GPUs. The original deals we got you were gone, but here's some other deals, right? It turned out that multiple major neoclouds had sold out of Hopper capacity. And their Blackwell capacity comes online in a few months. So it's a bit of a challenge, right? Due to inference? Inference demand has been skyrocketing this year, right? Reasoning models.

Starting point is 01:37:54 These reasoning models are revenue. It's been skyrocketing this year. And then also, like, there's a bit of, like, you know, Blackwell comes online, but it's hard to deploy, so it takes a little, you know, there's a learning curve

Starting point is 01:38:06 to deploying it. So whereas, like, you got down to, like, you buy the Hopper, you install it in the data center, it's running within, like, you know, a month or two, right? For Blackwell,

Starting point is 01:38:14 it was like, it's a longer time frame because of reliability challenges, it's a new GPU. I mean, it's just learning pains, right? Learning, learning, growing pains.

Starting point is 01:38:22 So there was, there was, like, this gap of, like, how many GPUs are coming onto the market right as revenue is starting to inflect. And so a lot of capacity got sucked up, right?

Starting point is 01:38:31 And actually, prices for Hopper bottomed like three or four months ago or like five or six months ago. Yeah. And actually they've, like, crept up a little bit now. They're still, like, you know, not, not. So I don't think we're quite in the 2023, 2024 era of GPUs being tight. But certainly if you want, like, just a few GPUs, it's easy.

Starting point is 01:38:52 But if you want a lot, it's, it's hard. Yeah. Like you can't get capacity that instantly. Yeah. Wow. What a time. So we, so we wrap on that. Dylan, this was another instant classic.

Starting point is 01:39:04 Thank you so much for coming to play. It was like two hours, bro. Oh, no. I missed. We couldn't stop. Thanks so much. It's great. Thank you so much for having me.

Starting point is 01:39:14 Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com slash A16Z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors

Starting point is 01:39:38 in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
