a16z Podcast - Dylan Patel on the AI Chip Race - NVIDIA, Intel & the US Government vs. China
Episode Date: September 22, 2025

Nvidia’s $5 billion investment in Intel is one of the biggest surprises in semiconductors in years. Two longtime rivals are now teaming up, and the ripple effects could reshape AI, cloud, and the global chip race. To make sense of it all, Erik Torenberg is joined by Dylan Patel, chief analyst at SemiAnalysis; Sarah Wang, general partner at a16z; and Guido Appenzeller, a16z partner and former CTO of Intel’s Data Center and AI business unit. Together, they dig into what the deal means for Nvidia, Intel, AMD, ARM, and Huawei; the state of US-China tech bans; Nvidia’s moat and Jensen Huang’s leadership; and the future of GPUs, mega data centers, and AI infrastructure.

Resources:
Find Dylan on X: https://x.com/dylan522p
Find Sarah on X: https://x.com/sarahdingwang
Find Guido on X: https://x.com/appenz
Learn more about SemiAnalysis: https://semianalysis.com/dylan-patel/

Stay Updated: If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
How you buy GPUs is like buying cocaine.
You call up a couple people, you text a couple of people, you ask, you know, how much you got, what's the price?
If your two arch-nemeses suddenly team up, it's the worst possible news you can have.
I did not see this coming.
I think it's an amazing development.
Like a Warren Buffett coming into a stock.
Jensen is like the Buffett effect for the semiconductor world.
It's kind of poetic that everything's gone full circle and Intel's sort of crawling to Nvidia.
Today, we're talking about one of the biggest surprises in semiconductors in years.
NVIDIA just put $5 billion into Intel.
Two long-term rivals now teaming up on custom data centers and PC products,
a deal nobody saw coming.
For NVIDIA, it's the Buffett effect.
For Intel, it's a lifeline.
And for AMD, ARM, and the global chip race, the fallout could be massive.
To break it all down, I'm joined by Dylan Patel,
chief analyst at SemiAnalysis,
Sarah Wang, general partner at A16Z,
and Guido Appenzeller, partner at A16Z
and former CTO of Intel's data center and AI business unit.
Let's get into it.
Dylan, welcome back to the podcast.
Thanks for having me, yeah.
It just so happens that there's some big news just as we're having you: NVIDIA announcing a $5 billion investment in Intel, and the two teaming up to jointly develop custom data center and PC products.
What do you think about the collaboration?
I think it's hilarious that, like, Nvidia invests, it gets announced, and their investment's already up 30%.
$5 billion investment, $2 billion profit already, right?
I think it's fun because they need their customers to really have big buy-in.
So when their potential customers buy-in and commit to certain types of products, it makes a lot of sense, right?
And it's kind of funny in a way because in the past, there was this whole thing around how Intel was
sued for being anti-competitive with their chipsets.
And Nvidia actually got like a settlement from Intel, right?
Way back when, like, the graphics were separate from the CPU, and the graphics were being put on the chipset, which had all this other I/O, like USB and all this stuff.
So it's kind of a funny like turn of events that now Intel is going to make like a chiplet
and package it alongside a chiplet from Nvidia.
And then that's like a PC product.
So, you know, it's kind of poetic that everything's gone full circle and Intel sort of crawling to NVIDIA.
But actually, it might just be the best, like, device, right?
I don't want an Arm laptop because it can't do a lot of things. And so an x86 laptop with Nvidia graphics, fully integrated, would probably be the best product in the market.
So, are you optimistic? How do you think this will go?
I mean, sure. I mean, I hope. I hope, right? I'm a perpetual optimist on Intel because I have to be.
I was thinking that the structure of the deal that at least, like, a lot of the government folks and Intel were sort of trying to go for was: big customers and the biggest suppliers directly give capital to Intel. But this is sort of the other way around, where they're buying some of the stock, having some ownership, but they're not really, like, diluting the other shareholders. And then the other shareholders will get diluted, slash everyone will get diluted, when Intel finally does raise the capital from the capital markets. But because they've announced these deals, and they're pretty small, right? $5 billion Nvidia, $2 billion SoftBank, the U.S. government was 10. You know, these are still relatively small.
Pretty small, yeah.
Yeah, in the nature of things, right?
I mean, like, you know, last time I think I said
Intel needs like $50 billion, right?
Now when they go to the capital markets, it's better.
And hopefully they get another, you know,
couple of these announcements.
Maybe, you know, there's all sorts of speculation
that Trump is involved in, you know, sort of getting these companies to invest.
NVIDIA, and now, you know, the government as well, of course,
and now, you know, is Apple going to come invest, right?
And also do something with Intel or who else will come in?
And that'll really boost investor confidence,
then they can dilute slash go get debt.
Like a Warren Buffett coming into a stock.
Jensen is like the Buffett effect for the semiconductor world.
Guido, you were the CTO of the Intel Data Center and AI BU. What are your thoughts?
I think it's really good for customers and consumers in the short term.
In the laptop market specifically, having the two collaborate is amazing.
I wonder what's going to happen with any of the internal graphics or AI products at Intel.
They might just push a reset and give up on that for now.
They currently don't have anything competitive, right?
There was the Gaudi effort that's more or less done, right?
There were the internal graphics chips, which never really competed at the high end, right? So from that perspective, it makes a lot of sense, right? It's good for both sides. Look, I think for Intel, they needed a breath of fresh air, right? They were sort of desperate.
So I think it's a very good thing. I think
AMD is fucked.
I mean, they're just, if you're
two arch nemesis suddenly team up
and it's the worst possible news you can have,
right? They were already struggling, right?
Their cards are good. Their software stack is not,
right? They were getting very limited traction,
right? They now have a bigger problem on their side. I think Arm is a little bit screwed as well, right, because their biggest selling point was sort of like, look, we can partner with everybody that doesn't want to partner with Intel. And Nvidia is, in a sense, the number one, you know, probably the most dangerous of the future CPU competitors, right? And so they now suddenly have access to Intel technologies and might go in that direction. It reshuffles the cards, right? I did not see this coming. I think it's an amazing development. Yeah, it'll be very interesting to see this play out.
To Eric's point, packed news week. The other thing that we wanted to pick your brain on, since we have you here, Dylan, is the other news dropping on Huawei unveiling their kind of AI roadmap, and, you know, obviously they're hyping up the capabilities.
I think you guys have been sort of ahead of the curve of trying to gauge what can the 950 supercluster actually do.
But would love your thoughts on everything that's going on from the China front, right?
And this is kind of coupled with DeepSeek saying their next models are going to be on domestically produced Chinese chips, and the Chinese government kind of banning companies from buying the Nvidia chips produced specifically for China.
So there's just sort of a lot of dominoes falling right now in the semi-market in China, but would love your take overall and, I mean, drill into some detail.
Yeah, I think when you sort of zoom out, you know, let's walk from 2020, because I think it's really important to recognize how cracked Huawei is. Or even just historically, like, they've always been really good. Sure,
initially they stole like Cisco source code and firmware and all this stuff, but then they
rapidly passed them up, as well as every other telecom company. In 2020, they released an Ascend chip and submitted it to impartial public benchmarks. And they were the first to bring seven-nanometer AI chips to market. They were the first to have that, right? Now, you could still say
Nvidia was ahead, but the gap was like nothing, right? And this is when they could access the full
foreign supply chain. This was when they just passed Apple to be TSM's largest customer.
They were, you know, clearly ahead of everyone on a manufacturing supply chain sort of design
standpoint in a total basis, right? Now, of course, Nvidia still had higher market share, but it was
so nascent then. Like, it could have really taken over the market. Huawei got banned by the Trump administration from accessing it, and then it went into effect in 2020, right, the full ban.
And so they were only able to make a small volume of these chips, but they had trained
significant models on these chips that they made then.
And then over the next couple years, right, Nvidia continued to accelerate. Huawei, because they were banned from TSM, had to go and try and figure out how to manufacture at SMIC, the domestic TSMC.
and then they were also in parallel
trying to go through shell companies
to manufacture at TSM
and acquire memory from Korea
and so on and so forth.
So by the end of 24
this had gotten in full swing
and it was caught, right?
It was caught and they finally shut it down.
But they were able to acquire 3 million chips, 2.9 million chips, from TSM through these other entities, right? Roughly $500 million worth of orders, which ends up being a billion-dollar fine that the U.S. government gave TSM, if I recall correctly.
At least there was a Reuters article that. I don't know if they actually issued it.
Which is important and interesting to gauge because
the number of Ascends floating out there has not consumed this entire
capacity yet. So now we get to 2025, right?
The H20 got banned in the beginning of the year.
Nvidia had to write off huge amounts of money. Our revenue estimate for Nvidia in China for just the H20 was north of $20 billion, because that's what they were booking in capacity, slash had to write off. And then it got banned. They cut the supply chain; like, they just said, no, we're not doing this anymore. They had their inventory. It gets re-approved, they resell the inventory, but now they're like, do we even restart production? That is Nvidia's question.
and now you have China saying
hey like we don't need
Nvidia we have domestic alternatives
whether it be Huawei or Cambricon, these companies have capacity.
But most of this capacity is still foreign produced, right?
Whether it be wafers from TSM, memory from Korea, Samsung and SK Hynix.
So the question is sort of like, how much can they do domestically?
And there's sort of two fronts there, right?
There's the logic, i.e. replacing TSM, and there's the memory, i.e. replacing SK Hynix, Samsung, and Micron.
And on the logic side, they are behind,
but they're really ramping there.
And I think they can sort of get to the production capacity
estimates needed.
And the US is still allowing them to import all the equipment
necessary, pretty much.
The bans are really for beyond the current generation of technology, beyond 7 nanometer. The bans are really for 5 nanometer and below. Even though the government says they're for 14 nanometer, the actual equipment that's banned is only for below 7 nanometer.
And so they'll be able to make a lot of 7 nanometer
AI chips and maybe even get to 5
with, you know, using existing equipment for 5 nanometer rather than taking the new techniques.
And so like there's the logic side and then there's the memory
side. And the aspect of
Huawei's announcement that was
surprising was that they're
doing custom memory, right?
Yeah.
That's the part that is sort of like,
hey, this is really exciting.
They announced two different types of chips for next year,
one that's focused on recommendation systems and pre-fill,
and then one that's focused on decode.
That's the trend these days.
Yeah.
So Nvidia's doing the same thing. They just announced a prefill-specific chip recently. There's numerous AI hardware startups that are really focusing on prefill versus decode.
And so the sort of split of inference into two workloads, you know, Huawei's doing the same thing for their next-year chip.
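The prefill/decode split being described comes down to arithmetic intensity: prefill processes a whole prompt's tokens in parallel against each weight load, while decode must stream all the weights for every single generated token. A toy sketch of that reasoning, using an illustrative hypothetical model size rather than the specs of any real chip or model:

```python
# Toy roofline model of why prefill is compute-bound and decode is
# memory-bandwidth-bound for transformer inference. All numbers are
# illustrative assumptions, not specs of any real chip or model.

def arithmetic_intensity(params_bytes, tokens):
    # A forward pass does ~2 FLOPs per parameter per token, and must
    # stream every weight from memory at least once per pass.
    flops = 2 * (params_bytes / 2) * tokens   # fp16: 2 bytes per param
    return flops / params_bytes               # FLOPs per byte moved

PARAMS_BYTES = 70e9 * 2          # hypothetical 70B-param model in fp16

prefill = arithmetic_intensity(PARAMS_BYTES, tokens=2048)  # whole prompt at once
decode  = arithmetic_intensity(PARAMS_BYTES, tokens=1)     # one token per step

print(f"prefill: {prefill:.0f} FLOPs/byte")  # high -> limited by compute
print(f"decode:  {decode:.0f} FLOPs/byte")   # low  -> limited by memory bandwidth
```

Chips aimed at decode therefore prioritize memory bandwidth, which is why the decode part comes with custom HBM, while prefill-oriented parts prioritize raw compute.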
And what's interesting is the decode one has, you know, custom HBM.
What does that mean? What is the manufacturing supply chain? Because that's the one that's tricky, right? How much can they manufacture of that custom HBM? And Nvidia and others are also adopting custom HBM only starting next year, right? So it's not like... yes, the manufacturing capacity is not there, maybe it is going to consume a bit more power, it's going to be slightly lower bandwidth. But the fact that they're able to do, you know, some of the same things that Nvidia plans to do, AMD plans to do in their memory is, you know, evidence that they're catching up. But then, you know, the main question that remains
is production capacity. So as far as like, hey, Nvidia's banned in China, right? Like they're saying
don't buy Nvidia chips. I think for a period of time, that's fine, fine for China, right, from the perspective of, hey, I'm China. That's fine because you have all this capacity that you, you know, shipped in in 2024 that hasn't been turned into AI chips. Now you're turning them into AI chips. You're running all that stockpile down. But what about the transition from running that stockpile down to ramping your new stuff, right? And that transition is the one
that's really tricky. China's either shooting itself in the foot by not purchasing Nvidia chips
during that time period or China's able to ramp. I think they'll be able to ramp. I think it'll
take a little bit longer. And there will be like a sort of a gap in between where China probably
backtracks and says it's fine. Like, ByteDance is, like, begging for Nvidia chips, right? Like, they don't want to use... they use some Cambricon, they use some Huawei, but they really want to use Nvidia because it's way better. They don't care about, like, the domestic supply chain; they want to make the best models, they want to deploy their AI as efficiently as possible. And so this is, like, you know, the government can mandate them to, like, not do it, right? So it's not that Nvidia is not competitive, it's that the government's sort of trying to instigate it. And then, like, I guess the last sort of thing is:
There's always the argument of, like, hey, if banning Nvidia chips to China is so good for China, why didn't China do it for itself? And now they're finally doing it for themselves. So again, like, it'll be interesting to see. Smuggling is still happening, right, re-exportation of chips from other countries to China. That is still happening at some volume, low volume, low-to-medium volume, right? But then, you know, the direct shipments of Nvidia chips that are legally allowed to
China are not necessarily happening today, but may have to restart at some point because
China won't have the production capacity to, you know, they would just have so many fewer
AI chips being deployed domestically versus the U.S.
And at some point, you kind of have to pick, like, am I all about the internal supply chain
or am I all about chasing, you know, super powerful AI?
Yeah.
So is there a negotiation angle here as well? Because currently there are still discussions ongoing about what exactly the boundaries are, what can be exported to China. So these are well-timed announcements if you want to make a point that, you know, the U.S. should allow more exports.
Do you think that's a factor or not?
Yes. So, you know, in the report we did a few weeks ago
about the production capacity of Huawei
and the supply chain,
there was a bit in there that we wrote about how,
you know, honestly, like,
if you were China, and you do want Nvidia chips, actually, how do you play this right? And it's by hyping up your domestic supply chain. It's like, yes, we can do everything. Huawei announced the most crazy shit possible, announced fucking three years of roadmaps.
So they read your report, basically.
I think they knew. They were already at it, and then they say, we're banning Nvidia, right? And then the government official is going to think, alongside sort of, like, lobbying from domestic players, of course we want to ship them better AI chips. Like, we're losing this market. We can't lose this market. And it's sort of like, it is 10,000 IQ, right? And we're here playing checkers while they're playing chess.
Well, so I guess, negotiating chip aside: in that report, you talked about HBM, or high-bandwidth memory, being a bottleneck for Huawei. To your point on one of the surprising aspects of the announcement, do you think it's credible that it's no longer a bottleneck based on what they're saying? Or is it just hype?
I think production capacity-wise, it is still absolutely a bottleneck.
Certain types of equipment required for making HBM need to be imported.
They're working on domestic solutions, but as far as we know, they have not imported enough equipment for this.
Although, if you look at Chinese import data for different types of equipment, right,
there's sort of like fabs spend, you know, roughly, it depends on the process technology,
but fabs spend roughly different amounts of money on lithography, etch, deposition, metrology, right?
like these different steps.
And historically, lithography has hovered around, you know, 17, 18%, with EUV, it grew to 25%.
Right.
But China, because they sort of wanted to stockpile lithography and they were worried about it becoming banned, they were importing lithography at a much higher rate than that, right?
Like 30, 40% of their equipment imports were lithography.
And they were just stockpiling lithography equipment.
This has sort of reversed now. And so if you look at the monthly import-export data, both into provinces in China but also out of countries, you can see that etch specifically is skyrocketing. And the main thing about, you know, stacking HBM is that on each wafer, you have to etch, creating a thing called a through-silicon via, so it can connect from the top to the bottom, and then you stack them on top of each other, right, 12-high, 16-high for HBM. That's how you make super-high-bandwidth memory. And their imports of etch equipment are, like, skyrocketing now. So it's like, they don't have the production capacity yet. How fast they can ramp it is a function of how much equipment they can get, A, and B, like, the yields, right? Improving yields is really hard in manufacturing. Intel and Samsung are really good, and TSMC is just amazing. Not that those companies suck; I think that's a better way to put it. And so, you know, it's those two things.
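For scale, a stack's bandwidth is roughly its interface width times its per-pin data rate; the TSV stacking (12-high, 16-high) is what makes the very wide interface and large capacity feasible. A quick back-of-envelope using the publicly documented baseline HBM2/HBM3 figures (vendors ship a range of speed bins, so treat these as nominal):

```python
# Back-of-envelope HBM stack bandwidth: interface width x per-pin data
# rate. Figures below are the nominal JEDEC HBM2/HBM3 baselines, not
# any specific vendor's shipping speed bin.

def stack_bandwidth_gbps(bus_width_bits, pin_rate_gbps):
    # bits/s across the whole interface, converted to gigabytes/s
    return bus_width_bits * pin_rate_gbps / 8

hbm2 = stack_bandwidth_gbps(1024, 2.0)   # HBM2: 1024-bit bus, 2.0 Gb/s/pin
hbm3 = stack_bandwidth_gbps(1024, 6.4)   # HBM3: 1024-bit bus, 6.4 Gb/s/pin

print(f"HBM2 per stack: {hbm2:.0f} GB/s")
print(f"HBM3 per stack: {hbm3:.0f} GB/s")
```

The stacking height mostly buys capacity per stack; the bandwidth comes from the very wide bus that the through-silicon vias make practical.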
I think yield: they haven't even started production of HBM3, right? They've only done some sampling of HBM2, and HBM3 came out a few years ago. So there's still quite a ways to go going up the learning curve. Obviously I expect them to catch up
faster than it took the technology to be developed because
it exists, right? In the world, we know how to do it. It's just
a matter of actually doing it versus inventing it.
And then the other one is sort of the production capacity.
You know, a couple months of import-export data is not enough to
make up for, you know, years' worth of supply chain build-up, right?
which is what we have today in Korea for the Korean companies.
Now, Hynix is also investing in the U.S., in Indiana, and then Micron, the American memory company, is primarily in Japan and Taiwan, but they're also expanding in Singapore and the U.S. now.
Like, there's so much capital that's been invested,
it would take some time for China to build up that production capacity
to actually match the West.
And when I say the West, I mean non-China East Asia in production capacity.
So it'll take some time to get there
And I don't think... I think it's like, hey, we can design this. It's always a question of, can we manufacture? And then the thing that Jensen would say is, you're betting on China not being able to manufacture, right? You know, it's a matter of when, not if. And that's the whole calculus that I think the US government has to be aware of when they're like, hey, what level of AI chips do we sell? Do we sell everything? Probably not, because AI is far more powerful, and the end market of AI is going to be way larger than the end market of semiconductors and equipment. So what level do we sell at? Well, how much can China make at each specific, you know, sort of performance tier? Then analyze that and what's the volume, and then figure out, like, what is okay,
which is maybe a little bit above or around the same level. Yeah.
So to your point on playing chess versus checkers, if you're Jensen, what would your next move be given the situation at hand?
It's partially true that he's afraid of Huawei more than he is of, like, an AMD.
Right.
He called them formidable.
Yeah.
Well, like, I mean, like every other industry: Huawei's beat Apple, right?
They passed Apple up in TSMC orders.
They passed Apple up in phone market share, not in the U.S.,
but, like, in many parts of the world, before the bands came down.
And then even now they're growing back again in market share without, like, Western supply chains.
You know, they've done this to numerous other industries.
I would say Huawei is, like, a formidable competitor, right?
Like they've beaten a lot of industries.
And so it's reasonable that he's afraid of them.
It's sort of, you know... and he's not afraid of AMD. So I think, like, the best thing is to try and show that what Huawei announced is reality rather than, like, their hoped-for target, and that Huawei isn't held down by manufacturing capacity. Which I think is not fair, right? I think manufacturing capacity is a real bottleneck for them. And then the yield learnings: real bottleneck, but temporary, maybe.
We'll see how long and we'll see how fast the rest of the, you know,
the Nvidia technology advances past what Huawei is capable of, right?
And how fast Huawei is able to close the gap.
But I think his main sort of pitch would be Huawei is real.
They're a formidable competitor.
They're going to take over not just the Chinese.
market, but also foreign markets, whether it be the Middle East or Southeast Asia or South Asia
or Europe or Latam, right, everywhere besides America. And there's a, I think Noah Smith has
this analogy, right? This whole idea is that you should Galapagos China, right? Make them have their
own domestic industry that is so different from the rest of the world, right? Kind of what happened
with Japan in the 70s and 80s, and 90s, their PCs were so specific and hyper optimized
to the Japanese market with like, you know, the weird, like, I don't know if you've seen
the weird scroll wheel on these Japanese PCs. Like, you literally, like, it's like, you go like
this and it scrolls, right? And it's like, and then the touchpad is a circle, and then that's around
it. It's like, things like that are so weird. Totally. And the rest of the world doesn't care,
but Japan market likes it, right? And his whole idea is like, let's Galapagos them, i.e., keep their
technology within China and then that's like dead weight loss and they never expand outside versus
that we serve the whole world. But the whole risk is that the opposite can also happen, right?
Our technology is hyper-optimized to running, you know, language models at this scale and RL, and hardware-software co-design can take you down a path of the tree that is a dead end. And then China, because they're not allowed to access this tree, ends up in the optimal spot, right? We had a local maximum; they had a global maximum, right? That technological Galapagosing is sort of what Noah Smith's analogy is. I like it a lot. I don't know if it's accurate, but it's an interesting one. Yeah, I love that.
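The local-versus-global optimum framing can be made concrete with a toy greedy search: a hill-climber that only ever steps uphill, like hyper-optimized hardware-software co-design, can settle on the smaller peak. The landscape and start points below are made up purely for illustration:

```python
# Toy illustration of the local-vs-global optimum analogy: a greedy
# hill-climber that only takes single uphill steps can get stuck on a
# smaller peak. The landscape and start points are invented for the demo.

def fitness(x):
    # Two peaks: a local maximum at x=2 (height 3) and a global
    # maximum at x=8 (height 10).
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8), 0)

def hill_climb(x, step=1):
    # Greedily move to whichever neighbor improves fitness; stop when
    # no neighbor is better.
    while True:
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            return x
        x = best

print(hill_climb(1))  # starts near the small peak -> stuck at x=2
print(hill_climb(6))  # starts in the other basin  -> reaches x=8
```

Where you start, i.e. which technology tree you are allowed to explore, determines which peak the greedy process ends up on.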
Well, actually, maybe just taking a step back from current events, even though there's so much to
talk about right now, last time you appeared with us, Nvidia came up, obviously, and you talked about
a couple of the potential paths forward for
NVIDIA. Give us maybe the bull case and the bear case.
There's a lot embedded in their numbers now, but what's interesting is the consensus from the banks across the hyperscalers. So Microsoft, CoreWeave, Amazon, Google, Oracle, and Meta, right? So it's the six hyperscalers, right? Companies I would consider hyperscalers. The consensus from the banks is $360 billion of spend next year across all of them. And my number is closer to, like, $450 to $500 billion. And that's based on, like, you know, all the research we do on data centers and, like, tracking each individual data center in the supply chains, right? And this isn't just Nvidia spend; this is capex for the hyperscalers, right? And that capex gets split up across different companies, but the vast, vast majority still goes to Nvidia, right?
And Nvidia is in a position not where they can take share, right? It's that they grow with the market, slash defend share. Yeah. And so the question is, how fast is the growth rate of capex for hyperscalers and other users, right? And the reason I included Oracle and CoreWeave as hyperscalers, even though they're traditionally not called hyperscalers, is because they are OpenAI's hyperscalers, right? So, you know, when you look at the Oracle announcement, right, like, first of all, the Oracle announcement:
I don't understand why people don't think this is crazier.
They did the most unprecedented thing in the history of, like, stocks and public companies ever. They gave a four-year guidance, and it made Larry the richest man in the world, you know, like, all these things.
Anyways, you know, the question is, like, how fast does revenue grow, right?
Do you think Oracle and OpenAI, which signed a $300 billion-plus deal with Oracle, will actually be able to pay $300 billion, right, across raising capital and revenue? And it gets to a rate of, like, over $90 billion a year in just a handful of years, right? So it's like, do you believe the market will grow that fast? It's very possible, yes. And it's very possible for, like, you know, OpenAI: what is their revenue going to be exiting next year? Some people think $35 billion, some people think $40 billion, some people think $45 billion ARR by the end of next year. This year they hit 20, right? ARR.
You know, so if that growth rate is maintained, then all of that cost goes to compute, plus all the capital they continue to raise, right? And again, the financials that they sort of, like, gave to investors for their last round were like, hey, we're going to burn like $15 billion next year. It's probably more likely going to be like 20. But, you know, you stack this on, and they're not turning a cash flow; they're not going to be profitable until 2029. So you sort of have: they're going to continue to burn $15, $20, $25 billion of cash each year, plus revenue growth. That's their compute spend. And you do this for Anthropic, you do this for OpenAI, you do this for all the labs. It's very possible that the pie does get to, you know, not $360 billion next year, but $500 billion next year for total capex, and the pie continues to grow for hyperscalers. Nvidia says actually it's going to be multiple trillions a year on AI infrastructure, and he's going to capture a huge portion of it. That's his bull case, right? That's the bull case
is that AI is actually so transformative that the world just gets covered in data centers and the majority of your interactions are with AI, whether it's, like, you know, business productivity and telling an agent to do some code, or you're just talking to your AI girlfriend Ani, right? Like, it doesn't matter. You know, all of this is running on Nvidia for the most part.
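The spend arithmetic sketched above, where a lab's compute budget roughly equals the revenue it pours back into compute plus the cash it burns, can be written out with the figures quoted in the conversation; this is an illustration of the reasoning, not a forecast:

```python
# Back-of-envelope version of the compute-spend arithmetic in the
# conversation: a lab's compute budget ~ revenue (assumed to go almost
# entirely to compute) + cash burn. Figures are the ranges quoted in
# the episode, applied to a single hypothetical lab.

def compute_spend(revenue_b, burn_b):
    # If essentially all revenue plus all raised-and-burned cash goes
    # to compute, spend is just their sum (in $ billions).
    return revenue_b + burn_b

# OpenAI-style numbers from the discussion: exiting next year at
# $35-45B ARR while burning $15-25B of cash per year.
low  = compute_spend(revenue_b=35, burn_b=15)
high = compute_spend(revenue_b=45, burn_b=25)

print(f"one lab's compute spend: ${low}B - ${high}B per year")
```

Repeat that for each lab and layer on the hyperscalers' own capex, and the gap between the banks' $360 billion consensus and the $450 to $500 billion estimate comes down to how fast these inputs compound.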
The bear case is, even if it does grow a lot...
Yeah, go ahead.
Stay with the bull case for a second. I think fundamentally the value creation, I personally think, is there, right? I mean, creating trillions of dollars of value with AI, I can totally see this happening. So assume it's true: where will Nvidia top out, I guess?
How much do you believe in takeoffs, right?
Yes.
So, like, if there is a takeoff scenario, right, where powerful AI builds more powerful AI builds more powerful AI builds more powerful AI, or, you know, that creates more and more... each level of intelligence enables more for the economy, right? Like, how many monkeys can you employ in your business versus how many humans, right? Or how many dogs, right? Like, what is the value creation of a human versus a dog? It's sort of the same with AI. So, like, in this case, the value creation could be hundreds of trillions, if not more.
And after that, do you need this? I mean, if we take every white-collar worker and make them twice as productive with AI, that's in the hundreds of trillions, isn't it?
Yeah, well, like, what is twice, you know, like, I mean, like, if you talk to people at the labs, right, like, twice as productive, what does that even mean? It's replaced them, right? It's, and it's be ten times better than death.
Like, I mean, like, I don't know how soon that's.
If it's sort of white-collar workers essentially useless, without a constant stream of L-O-M tokens, right, that make them productive, right? At that point, you basically can tax every single knowledge worker in the world, right? Which is most, most workers in the world long term.
Yeah, so, I don't know. I mean, what's your guess? Give us a number. What's the cap?
Cap? I mean, like, why aren't we making a Matrioshka brain? Like, I don't know. Like, uh, I mean, at some point, the machine says humans don't need to live and I need even more compute. One step before that, maybe. Are we, are we colonizing Mars yet?
TBD. I don't know, man. It's, I find it, like, completely impossible to predict anything beyond five years, given how much stuff is changing.
I'll leave it to economists, right?
Like, you know, like, honestly, like, you know,
supply chain stuff is like three, four years out and that's it.
And then five years is, like, sort of, like, YOLO, right?
So, like, I just try and ground myself
in the supply chain stuff, right?
Like, it's like, you know, supply chain,
and then, like, what is the adoption of AI?
What's the value creation?
What's the usage?
Like, and you can see that in, like, a short horizon.
Beyond that, like, I don't know, like,
are we all going to be connected to computers,
like, BCIs and stuff?
Like, I don't know, dude.
are humanoid robots
are they going to be, you know, I mean, you saw
Elon's thing, right? Like he's like, yeah, humanoid robots
are why Tesla's worth more than $10 trillion.
So like, oh, hey, great. What is all
that being trained on? Great, Nvidia. Okay, awesome.
So that's worth also $10 trillion, right?
Like, I don't, I don't know.
Like, it's too out there for me. I don't like
the out there discussions.
Very fair.
Read some sci-fi books.
So just pulling out the thread
where you talked about, I mean, this is kind of a throwaway comment,
but how market share can't really grow
just because it's such a dominant market share
and we talked about
or you guys talked about the moat of Nvidia last time
and obviously this moat is tied to maintaining
that very high market share that they currently have
and I love this sort of historic journey
you took us through with Huawei just earlier
can you kind of walk through what Nvidia did
throughout history to build their moat?
It's super awesome because they failed
multiple times in the beginning
and they bet the whole company multiple times right
like Jensen's just crazy enough to bet the whole company
right like whether it was like
certain chips ordering volume before he knew it even worked
and it was like all the money he had left
or like ordering volumes for projects he had not won yet
like I heard a rumor that or not a rumor but like a story
from someone who's like a gray beard in the industry and I think would know
was like, yeah, no, no, no, like, Nvidia ordered
the volume for the Xbox
before Microsoft
gave them the order
they were just like
they're just like, fuck it, YOLO
right
I don't know like
I don't know how true this is.
I'm sure there's more nuance there
like you know verbal indication
or whatever but like the order was
placed before he got the order, right? Like, that's what he said.
you know there's there's cases like
with the crypto bubbles right
like there was a couple of them
but like
Nvidia did their
damn best to convince everyone
on the supply chain
that it wasn't crypto
and that it was gaming, real demand
and it was gaming and data center
and professional visualization
and therefore you guys should ramp your production
and they all ramped production
and spent all this CAPEX on increasing production
and building out new lines for them
and they pay per item
and then they bought them and sold them
and made shit loads of money
and then when it all fell apart
they then had to write down a quarter's worth of inventory,
whatever
everyone else was like
well crap
I have all these empty production lines, right?
And so it's like, you know, but, but like what did AMD do then, right?
Their chips, they were actually better for crypto mining, right?
On a, you know, silicon cost versus how much you hash basis, but like they
just didn't, they, AMD was like, ah, we're going to not really raise production, right?
Like, as a reasonable, you know, thing, right?
It wasn't a, it's sort of like strike while the iron's hot.
And so, like, you know, the same has happened with Nvidia, right?
They've, in recent times, like, sort of, they've ordered capacity that no one believes, right, multiple times.
They see the end demand, obviously.
But in many cases, they're just like, their number for, like, Microsoft was higher than Microsoft's internal planning, right?
And then Microsoft's internal planning went up, but, like, their number for Microsoft was way higher.
And it's like, oh, we just don't think Microsoft's going to need this much, even though they tell us this.
It's like, who the heck is like, no, no, no, customer, you're going to buy more.
Like, and orders, right?
And then when the orders come through the supply chain, it's like, they have to pay NCNR, right, non-cancelable, non-returnable. Like, you know, I asked a question in Taiwan once.
It was Colette, who is the CFO, and Jensen, the CEO.
They were, they were both there.
And it was a room full of like mostly finance bros and they're asking stupid finance questions like three days
before earnings. So obviously they just could not answer anything because it's like, you know,
SEC regulations. But then my question to them was like, look, Jensen, you're like so vibes
driven and like very gut feel and like very visionary. And then that's, you know, CFO, like,
she's amazing in her own right. But like, you know, those personalities clash, how do you work
together? And he's like, I hate spreadsheets. I don't look at them. I just know. Right. It's like,
is his response. And it's like, of course, you know, the best innovators in the world have
really good gut instinct.
Right.
Right.
And so like the gut instinct to like order with, you know,
with non-cancelable terms, when you don't know,
and they've had to write down over their history multiple times, right?
Many, many billions of dollars in cumulative orders, right?
Cumulative, in total orders.
Whether it be, you know, the H20, which is more regulatory,
but like other cases they've ordered and had to cancel.
Is that many billions?
It's many billions.
Peanuts.
Well, it depends, right?
The crypto writedown was like,
multiple billions when their market cap was, like, less than $100 billion, right?
Like, it's, you know, it's sizable.
That's peanuts compared to the upside, right?
I think, I think everything Nvidia did was right.
Yeah.
And I think everything AMD did was wrong, like, you know, in that scenario.
But like, it is crazy to, especially in a cyclical industry like semiconductors
where companies go bankrupt all the time, which is why we have all this consolidation is
every down cycle, companies go bankrupt.
I mean, from a risk-return perspective, right,
these bets were totally worth taking.
Yes.
If you look at it from, I'm a CEO,
I want to have predictable quarters
for Wall Street. It's a very different story.
I think that's sort of part of the tension there, no?
Yeah, so we, I don't know if
you've seen these, like,
Lee Kuan Yew edits, where they're, like, him
like saying some, like, fiery speech
and then, like, and then it's, like, some cool
music at the end, and it's, like, showing different
pictures of him. And so we made one of Jensen
recently and put it on social media,
right, on, like, Instagram, TikTok,
uh, XHS, Red Book,
right uh twitter of course right like all the different social media uh and i really liked it because
he's like, you know, the goal of, like, playing is to win, and, or sorry, and the reason
you win is so you can play again right and you compared it to pinball where like actually you just play
all day and you keep getting more rounds and it's like his whole thing is like i want to win
so i can play the next game um and like it's only about the next generation right it's only about
now next generation it's not about 15 years from
now because it's a whole new playing field
every time or five years from now.
I think that's, you're right,
it's the risk or reward is, is correct.
Yeah, but there's few people
take these kind of risks.
It's the only semiconductor company that's worth,
you know, I think even north of $10 billion
that was founded as late as it was.
Like, MediaTek was in the early 90s
and then Nvidia and everyone else
is like from the 70s mostly.
Yeah, big ones.
Yeah. Yeah, I think you raised this great point
on the bet the farm.
And he's actually been wrong a couple times, to your point.
Mobile, right?
Like, what the hell will happen with mobile?
Exactly.
And he still takes them.
And I think Mark actually had this great conversation with Eric
where he talked about being founder run,
where you have this memory of the risks you took
to get to where you are today, right?
And so in a lot of cases, if you're a CEO brought on later on,
you're sort of like, okay, continue to steer the ship as is.
But in this case, he remembers all the times
they almost went belly up, and he's like, I've got to bet, keep making bets like that.
How do you think he's changed over?
I mean, he's been one of the longest-running CEOs, over 30 years.
He's kind of right up there with Larry Ellison now.
How do you think he's changed over the last 30 years or so?
I mean, obviously, like, I'm 29.
I can't say I really know what he was like.
I've watched a lot of old interviews.
I won't say he wasn't.
He's been a CEO longer than you've been alive.
Yeah, exactly.
exactly. Like,
Nvidia was founded before I was born.
I was born in '96, right? Like, you know?
Yeah, anything over the last
couple of years, right? I think even
like watching old interviews, right? Like, I watched
a lot of old interviews, a lot of old, like,
presentations he's given.
One thing is that he's just, like, sauced up and
dripped up. Like, the charisma
he's got has only gotten stronger.
Right?
Yep.
Which is an interesting point.
I don't know if it's quite relevant.
I don't agree with that, yeah. But like,
The man has learned to be a rock star. Even though he was always charismatic, he's a complete rock star now. And he was a rock star a decade ago too, it's just people maybe didn't recognize it. I think the first live presentation of his that I watched, it was, what's it called, CES, like 2014 or 2015 or whatever. The Consumer Electronics Show.
I'm like moderating like gaming
gaming hardware subredits right
like at the time I'm a teenager
and like the dude is like
talking only about AI
he's telling he's telling like all these gamers
about AlexNet and self-driving cars
right it's like know your audience
first of all but also like
like
it has nothing to do with consumer electronics
or gaming, you know. At the time
I was also, like, half like,
holy crap, this is amazing,
but the other half was like, I want you to announce a new gaming GPU, right?
Like, you know, but I know, like, on the forums, on the forums, quickly everyone was like, you know,
screw this, you know.
Yeah, yeah.
I want to hear about the gaming GPUs, Nvidia's price gouging.
Like, you know, of course, Nvidia's always had the, like, we price at the value, plus a little bit, right?
Because we're just smart enough to know.
You know, I'm guessing Jensen just has the gut feel of how to price things, right?
He'll change the price, like, at least on gaming launches, he'll change the price up until,
like, right before the presentation.
Wow.
So like it really is like a gut feel thing probably.
And anyway, so, so he had that charisma to know what was right.
But I think people, a lot of people were like, oh, no, whatever, Jensen's wrong.
He doesn't know what he's talking about.
But now like he talks, people are like, oh, very, very, you know, so it might just be that
he's been right enough.
Yeah, there's a post on X recently that said he had moved up into God mode with a select
group of CEOs, and that this was
recent, like, it's exactly...
Who's the other gods?
It was Zuck.
Pretty other gods.
Elon.
Elon, Zuck, and Jensen.
Nice, nice.
Okay.
Good crew to be in.
So we pray to Silicon Valley.
It's sort of the cult now, is it?
Exactly.
Just on one last thing on people.
You mentioned Colette, his CFO,
and, you know, there's
sort of a famously loyal crew
at NVIDIA, even though all of the
OGs could retire at this point.
Is there anyone akin to a Gwynne Shotwell at SpaceX
or previously a Tim Cook to Steve Jobs at Apple
that is at Nvidia today?
I mean, he had two co-founders, right?
Like, that's, you know, let's not overlook that.
One of them's, like, you know, not involved
and hasn't been for a long time.
But the other one was involved up until just a, you know,
a few years ago, right?
So it's not just Jensen running the show.
Totally.
Although he was running the show.
there's quite a few people on the hardware side
I've always
there's someone at Nvidia that's like mythical to me
like when you talk to the engineering teams
he leads a lot of engineering teams
he's a private person so I don't want to say his name
actually fair enough
but you know he's he's like
he's like effectively the chief
engineering officer, like, that's his role
and people within his org will know who he is
And I think there are people like that, but, you know, he's intensely loyal.
And there's a number of these types of people.
There's another fella who's like, you know, like, there's all these, like, innovative ideas at NVIDIA.
And he's the guy who literally is like, we need to get the silicon out now.
We're cutting features.
And that's like what he's famously known for.
And all the technologists in NVIDIA hate him.
This is like a second guy.
This is a second guy.
Also intensely loyal to NVIDIA has been around for a long time.
time, but it's like, you know, it's sort of like when you have such a visionary company
and forward, you know, one problem is that you get lost in the sauce, right? You know, oh, I want
to make this. It's got to be perfect, amazing. And it's like, you know, you got to have that
sort of like, and these people are like, you know, obviously they're close to Jensen for a reason
because Jensen also believes these things, right? He has the visionary, future-looking side, but also
like screw it, cut it, we'll put in the next one, ship, right? Like, you know, ship now, ship
faster, in a space like silicon, which is really hard to do. And sort of the thing
about Nvidia that's always been, you know, super impressive, and it's from the beginning days, where
he's talked about this before, is their first chip, their first successful chip. They were going
to run out of money, and he had to go get money from other people to even finish the development.
And even then he just barely had enough money, because he'd already had a failed chip before. This
chip came back and it had to work, otherwise it would not, you know. And so they were like,
because they could only pay for one, it's called a mask set, right? Basically you put these, like,
I'll call them stencils, into the lithography tool, and then it, like, says where the patterns are,
and you, you know, put the stencil in, you deposit stuff, you etch stuff off, you deposit materials on the
wafer, etch it away, and you put the stencil in, and, like, you tell it where to put stuff, right?
And then the deposition and etch keeps happening in those spots, and you stack dozens of layers
on top of each other, and then you've made a chip.
These stencils are custom to each chip, right?
And they cost today on the order of tens of millions of dollars.
But even back then, it was still a lot of money.
It wasn't that much then, of course.
They could only pay for one set.
But the typical thing with semiconductor manufacturing is,
as good as you can simulate it, as good as you can do all the verification,
you'll send a design in and you have to change it.
There's going to be something.
It's so hard to simulate everything perfectly.
And the thing about Nvidia is they tend to just get it, right, the first time.
Even great executing companies like AMD or Broadcom or whoever, they often have to ship, you know, they're denoted in like A and then a number or B and then a number.
So it's like two different revisions of the masks.
So like, Nvidia always ships A0.
Almost always.
They sometimes ship A1.
And a lot of times, even if they'll start production of the, you know,
the letter is basically the transistor layers, then the number is, like, the wiring that connects all the transistors together.
So, Nvidia will start production of the transistor layers and ramp it really high
and then just hold it right before they transition to the metal, just in case they do need to change the metal layers.
And so, like, the moment they're ready and they've confirmed that it works,
they can just, you know, blast through a lot of production.
Whereas everyone else is like, oh, let's get the chip back.
Oh, okay, A0 doesn't work.
We've got to make this tweak, make this tweak, get the chip back.
It's called a stepping, right?
We were very jealous of Nvidia at that time, right?
They consistently delivered on the first stepping; we did not.
In the data center CPU group, there was one product where, you know, as I said, it's A0, A1,
or you go to B if you have to change the transistor layer as well.
Nvidia, sorry, Intel got to, like, E2 once. E2. Like, that's, like, a 15th revision.
Like that's like a 15 revision.
This is, this is.
It was, like, the peak of AMD's,
like, when they went skyrocketing on market share versus Intel,
was when Intel was at E2,
right, like, a 15th stepping.
Because it's quarters of delay right
I mean it's catastrophic for a go to market
Yeah, each time is a quarter of delay or something, right? Yeah.
So it's absurd.
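As a back-of-the-envelope sketch of why steppings hurt, here's that "each respin costs about a quarter" arithmetic written out; the function and the stepping lists are made up for illustration, not real schedules:

```python
# Rough schedule-slip model for silicon steppings: each stepping after the
# first (e.g. A0 -> A1 -> B0 -> ...) is a respin, and the conversation above
# suggests roughly one quarter of delay per respin. Purely illustrative.
def delay_quarters(steppings, quarters_per_respin=1):
    return (len(steppings) - 1) * quarters_per_respin

print(delay_quarters(["A0"]))                          # ship first silicon: 0 extra quarters
print(delay_quarters(["A0", "A1", "B0", "B1", "C0"]))  # 4 respins -> roughly a year of slip
```

So shipping A0, the way Nvidia usually does, means zero respin delay, while a 15th stepping like the E2 example implies years of cumulative slip under this toy model.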
So I think that's the other thing about in video is like
You know screw it let's ship it
Let's let's get the volume ASAP
Let's, you know, let's do these things that
you know and so anyways they like you know have some of the best simulation verification etc
that lets them sort of go from design uh you know from idea to shipment as fast as possible
um you know cutting out any and the necessary features that could delay it making sure they
don't have to do revisions so they they can get you know they can respond to the market
ASAP. There's a story about how Volta, which was the first Nvidia chip with tensor cores,
you know they saw all the AI stuff on the prior
generation P100 Pascal, and they decided we should go all in on AI, and they added the
tensor cores to Volta only a handful of months before they sent it to the fab.
Like they said, screw it, you know, let's change it. And it's crazy. And it's like, if they
hadn't done that, who would have, maybe someone else would have taken the AI chip market, right?
So there's all these times where they just, and it's, those are major changes, but there's
often, like, minor things that you have to tweak, right? Number formats,
or like some architectural detail.
Invidia is just so fast.
The other crazy thing is they have a software division
that can keep up with that, right?
I mean, if you come out with the chip, right,
and basically no stepping required,
it's immediately in the market,
then being ready with drivers
and all the infrastructure on top.
That's just super impressive.
Yeah.
I love that point because you think of
Nvidia benefiting from tailwind after tailwind,
but I think what both of you are saying is,
you have to move fast enough and execute well enough
and take advantage of those tailwinds.
And if you think about, and by the way, I loved your CES story.
I'm just envisioning him more than 10 years ago talking about self-driving cars.
But, you know, if you think about nailing the video game tailwind, VR, Bitcoin mining, obviously AI now.
You know, one thing that, or one of the things that Jensen talks about today is robotics, AI factories.
Maybe my last question on NVIDIA, what do you think about the next 10 to 15 years?
I know calling Beyond 5 is hard.
but like what does
Nvidia's business look like?
It's really a question of
and this is like
I think every time I've talked to
you know
some executives at Nvidia
have asked this question because I really want to know
and they won't answer it obviously
but it's like what are you going to do with your balance sheet
like you are the most high cash flow company
and like you have so much cash flow
now the hyperscalers are all taking their cash flow
way down right
because they're spending on GPUs
what is what are you going to do
with all this cash flow right
like you know even even before this whole
takeoff, he wasn't allowed to buy Arm,
right
so so what can he do
with all this capital and all this cash
right even this $5 billion
investment Intel is
there's regulatory scrutiny there right
like it's in the announcement
like, yeah, this is subject to review, right?
Like, you know, I imagine that it'll get past,
but, like, he can't buy anything big.
He's going to have hundreds of billions
of dollars of cash on his balance sheet.
What do you do? Is it
start to build AI
infrastructure and data centers? Maybe.
But, like, why would you do that
if you can just get other people to do it?
And just take the cash?
Well, he is investing in those, no?
Investing peanuts.
Right?
You know, like, he gave recently, like,
CoreWeave a backstop
because today it's really hard to find
a large number of GPUs
for burst capacity, right?
Like, hey, I want to train a model for three months,
right? I have my base capacity where I run
my experiments, but I want to train a big model for three months,
done. We know from our portfolio.
Yeah, yeah. So, like, Nvidia sees this issue.
They think it's a real problem with startups.
It's why the labs have such an advantage.
But what if I could?
You know, right now, like, you know, most companies
in the valley spend, what,
75% of their round on GPUs, right?
Or at least, yeah.
Yeah, we see it.
What if you could do 75% in three months on one model run, right?
You know?
Yeah.
And really scale and have some sort of like competitive product.
And then you have the model.
Then you raise more capital, right?
Or start deploying, right?
What do you do with it?
Is it start buying a crap load of humanoid robots and deploying them?
But, like, they don't really make good software,
they don't make really that amazing software for them in terms of the models, right?
They make, you know, the layer below is great.
where they deploy their capital is, like, the question.
He has been investing up and down the supply chain
a little bit though, right? Investing in the neoclouds, investing in some of the model training companies.
Yeah, but again, small fry. Like, he could have just done the entire Anthropic round if he wanted to.
Of course he didn't, right? And then, like, really got them to use GPUs. Or, like, he could have done
the entire, you know, OpenAI round. Yeah, he could have done the entire, like, any xAI round.
Do you think these are things he should be doing, or what? I mean, like, yeah, good question. I don't know, right?
I think, like...
We'll call you up for the next round that we raise.
But anyways.
He could make venture a dead industry.
Take all of the best rounds.
But it's a lot of business, yeah.
You can do the seeds and then have Jensen mark you up.
That's how it can work.
No, I don't think...
I don't think it.
I think picking winners is obviously really tough for him
because he has customers all across this ecosystem.
If he starts picking winners, then, like,
his customers will be even more anxious
to leave and give even more effort to whether it's AMD or, you know, some startup or their
internal efforts, um, et cetera, et cetera, right, uh, buying TPUs, whatever it is. Like, you know, people
will, he can't just like invest in these, like, you know, he can do a little bit, right? A few
hundred million in an OpenAI round is fine, or a few hundred million in the xAI round is
fine. Um, CoreWeave, right? Like, yeah, everyone's like throwing a fuss about it. But it's like,
he invested a couple hundred million plus, you know, early on, plus, um,
you know, rented a cluster from them
for internal development purposes
instead of renting it from a hyperscaler,
which is cheaper for Nvidia to do, right?
It's better for them to do it from them
than the hyperscalers.
It's like, did he really, like,
is he really backstopping
CoreWeave that much, right?
Or, you know, any of the other customers
or Neo-Clouds?
Like, there's some investment, but it's like,
it's more like, this is a good cloud,
you know, we'll throw like
five or 10% of the round, right?
It's not he's taking 50% plus
of the round.
Is he also reshaping his market?
I mean, look, a couple of years ago, there were four big purchasers of these cards.
You just listed six.
To what extent is that...
That's Humain and Nebius and...
There's a long list there.
Of course, yeah.
Is that a strategy?
It is. I think it absolutely is.
But he didn't have to put much capital down to do this.
Just ship one earlier than the other?
I don't know. Yeah, that's...
No, but it's like, if you look at the grand total of capital they spent investing in the Neo-clouds,
it's a few billion, but he has a lot of other levers if he wants to.
Right, right. Allocations, as you mentioned. Um, what's nice is, you know, historically, you gave volume
discounts to the hyperscalers. Uh, but because he can use the argument of antitrust, he's like,
everyone gets the same price. So fair. It's very fair. You know?
So what should he do with the capital, or what would you guide him to? Uh, I mean, I think, like, you know, like, there's
the argument he should invest in data centers, and only the data center layer, not what
goes in the data center, so that more people build data centers, and then if the market demand
continues to grow, data centers and power are not the issue, right? Invest in data centers and power.
Um, I've said that to them, they should invest in data centers and power, not in the cloud layer,
because the cloud layer is quite commoditized. Or, it's commoditize your
complement, right, is the whole phrase. And I won't say being a cloud is commoditized, but it's
certainly, like, you have a lot of competitors who are decent now. Um,
And you've educated the commercial real estate and other infrastructure investment firms into going into AI Infra as well.
So, like, I don't think it's the cloud layer that you invest it, right?
Do you invest in data centers and energy?
Yeah.
Do you invest it?
Because that's the bottleneck for your growth, really,
is, A, how much people want to spend and can spend, and, B, the ability to actually put them in data centers.
And then, like, robotics.
And, like, I think there's, like, areas he could invest in.
Nothing requires $300 billion in capital.
So what do you do with the capital?
Like, I really, I really don't know.
And I, like, feel like Jensen has to have some idea.
There's some visionary plan here, because that's what shapes the company, right?
I mean, they could sell, they could just continue to, you know, I mentioned $200 billion of free cash flow, $250 billion a year.
What do they do? Do they buy back stock forever?
Like, do they go Apple route?
And the reason why Apple hasn't done anything interesting in, like,
you know, nearly a decade is, you know, they've got a non-visionary at the head.
Tim Cook's great at supply chain.
And they're just plowing the money into buybacks.
They're not really, you know, automotive, the self-driving car thing failed.
We'll see what happens with ARVR.
You know, we'll see what happens with wearables, right?
But, like, Meta and OpenAI might be even better than them.
We'll see, like, in others, right?
So what does he invest in?
I have no clue. But, like, what requires so much capital
is the tough question
and it actually gets a return
because the easy thing is
like, my cost of equity, right? I just do buybacks,
and doesn't completely change the company culture
I think that's another thing right
there are probably areas he could invest it in
but you suddenly end up with the company
doing two completely different things
which are very difficult to keep on track
but they do like 10 completely different things right
I mean, one way to look at it is, we build
AI infrastructure. And in the guise of we build
AI infrastructure, humanoid
robots around the world are
AI infrastructure,
or data centers and energy
are AI infrastructure, right?
Like, you know, like...
So the humanoid robots would totally work, right?
If you're suddenly pouring concrete
and building power plants, it has completely
different culture, a completely different set of people, and it's very
much harder.
Okay, agree. There's different ways to do it, like, invest
in the various companies or, like,
backstop, like, the building of a power plants,
right? Like, you know, because no one wants to build
power plants because they're 30-year underwriting things.
You know, there's all these different
areas where could use
capital to, you know,
allow something to happen,
right? Not necessarily owning it in something.
And look, one of
the biggest problems we
had was that our customer base
sucked, right? I mean, we were selling to,
most of the chips went into, the large hyperscalers,
you know,
which they're way too concentrated,
and they build their own chips
and so they can push down your prices. So
honestly, spending it on diversifying the
cloud, you know, the customer base...
When you were at the company in '14, you guys should have just charged so much
that your margins were 80%.
What would the world
have done?
Nothing.
The margins were pretty good back then.
That wasn't the problem.
That was the primary.
They were 60, 65.
They were 80.
Still, yeah.
Oh, boy.
There was Jensen.
UDST is picking in here.
Well, wait, I think Guido's
comment is actually a really good
segue into something else
we wanted to talk to you about,
which is the hypers
and one of the reasons that I love reading semi-analysis
is you guys make these out-of-consensus calls
that you're often right about
and one of them recently
was calling...
Only often?
You have a Jensen hit rate.
It's very high.
Where's my billion-dollar, you know,
PV-positive bet?
But the one that caught my eye
was Amazon's AI resurgence.
So I wanted to talk to you a little bit
about that, just because, you know, I think we found it pretty interesting being on the ground
helping our portfolio companies pick who their partners are. And so we have some microdata on this,
but you sort of walk through why they're behind. Yeah. So in Q1, 2020,
I wrote an article called Amazon's Cloud Crisis. And it was about all these neoclouds are going
to commoditize Amazon. It was about how Amazon's entire infrastructure,
was really good for the last era of computing, right?
What they do with their elastic fabric,
ENA and EFA, right, their NICs,
the whole protocol and everything behind them,
what they do for custom CPUs,
et cetera, right?
It's really good for the last era of scale-out computing
and not the era of sort of scale-up AI-infra
and how Neoclouds are going to commoditize them
and how their silicon teams were focused on, you know,
cost optimization, whereas the name of the game today
is max performance per cost, right?
But that often means you just drive up performance like crazy.
Even if cost doubles, you drive up performance more, say it triples,
because then the cost per performance still falls.
That's sort of the name of the game today with NVIDIA's hardware.
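That perf-per-cost arithmetic can be written out in two lines; the numbers below are purely illustrative, not real chip figures:

```python
# Cost per performance falls whenever performance grows faster than cost,
# even if absolute cost rises. Illustrative numbers only.
old_cost, old_perf = 1.0, 1.0
new_cost, new_perf = 2.0, 3.0      # cost doubles, performance triples

old_cpp = old_cost / old_perf      # 1.00 per unit of performance
new_cpp = new_cost / new_perf      # ~0.67: about a third cheaper per unit
print(new_cpp < old_cpp)           # True
```

That's the whole argument: a chip that costs twice as much but performs three times better is still the cheaper way to buy performance.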
And it ended up being really good call.
Everyone was calling us out like, no, you're wrong.
And this was like when Amazon was like the best stock.
And Microsoft really hadn't started taking off yet.
and nor had, like, all these others, you know, Oracle and so on and so forth.
And since then, Amazon has been the worst performing hyperscaler.
And the call here is that, you know, they still have structural issues, right?
They still use Elastic Fabric, although that's getting better; still behind
NVIDIA's networking, still behind Broadcom/Arista-type networking and NICs.
And, you know, their internal AI chip is okay.
But the main thing is that they're now waking up and being able to actually capture business, right?
So the main call here is that since that report, AWS has been decelerating:
year-on-year revenue growth has been falling consistently.
And our big call is that it's actually going to start re-accelerating, right?
And that's because of anthropic.
It's because of all the work we do on data centers, right?
Tracking every single data center, when that goes online and what's in there.
the flow-through on cost.
If you know how much the chips cost,
the networking costs, the power costs,
and you know generally what margins are
for these things,
then you can sort of start estimating revenue.
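As a toy illustration of that cost build-up, here's a minimal Python sketch: sum what goes into a data center (chips, networking, power), annualize it, and mark it up by an assumed margin to get implied revenue. Every input figure and the margin below are illustrative placeholders, not SemiAnalysis's actual model or numbers.

```python
# Toy version of the cost build-up described: annualize the capex, add power,
# apply an assumed gross margin, and you get a revenue estimate.
# All inputs are hypothetical placeholders.

def estimated_annual_revenue(chip_cost, networking_cost, power_cost_per_year,
                             depreciation_years=4, gross_margin=0.3):
    """Annualized cost of the deployment, marked up by an assumed margin."""
    annual_capex = (chip_cost + networking_cost) / depreciation_years
    annual_cost = annual_capex + power_cost_per_year
    return annual_cost / (1 - gross_margin)

# A hypothetical site: $4B of chips, $500M of networking, $150M/yr of power.
rev = estimated_annual_revenue(4e9, 5e8, 1.5e8)
print(f"${rev / 1e9:.2f}B implied annual revenue")  # -> $1.82B implied annual revenue
```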
So when we build all that up,
it's very clear to us that AWS revenue growth
troughs at this point, right?
This is the lowest AWS revenue growth
will be on a year-on-year basis
for at least the next year, right?
And it's re-accelerating to north of 20% again
because of all these,
massive data centers they have coming online with Trainium and GPUs, right?
Depends on which one. It depends on which customer.
The experience is not as good as, you know, say, a CoreWeave or whatever, but the name
of the game is capacity today.
CoreWeave can only deploy so much.
They can only get so much data center capacity, and they're really fast at building.
But the company with the most data center capacity in the world, still today,
is Amazon, although they may get passed up in the next two years.
Actually, based on what we see, they will get passed up.
But incrementally, Amazon still has the most spare data center capacity
that is going to ramp into AI revenue over the next year.
Let me ask a question. Is that the right type of data center capacity?
Like for the high-density AI buildouts today, you need massively more cooling,
you need to have enough water close by, and you have enough power close by.
Is it in the right place, or is it the wrong type of data center?
So data center capacity, in this sense, I mean, all the way from power is secured to
substations built, to transformers, to
you can provide the
power whips to the racks.
Now, obviously, the data center capacity will
differ, right?
You know, historically, actually, Amazon's had the
highest density data centers in the world.
Right? They went to like 40
kilowatt racks when everyone was
still at 12. And if you've ever stepped
foot inside of most data centers,
they're, like, pretty cool
and dry-ish.
If you step inside an Amazon data center,
it feels like a swamp. It feels like where I grew up,
right? It's, like, humid and hot, because they're optimizing every percentage point.
And so sort of like your point in here is that like Amazon's data centers aren't equipped
for the new type of infrastructure. But when you compare them to the cost of the GPU,
like, having a complex cooling arrangement is fine, right?
You know, we made a call on Astera Labs a few months ago, a couple months ago,
when they were at, like, 90, and it went to 250 the month after because of
the orders Amazon is placing with them.
but there's certain things with Amazon's
infrastructure, I won't get too much into it, but the
rack infrastructure requires them
to use a lot more of, like, Astera Labs connectivity
products,
and the same applies to cooling, right?
So it's on the networking and cooling side.
They just have to use a lot more of this stuff.
But again, this stuff is inconsequential
on cost compared to the GPU.
You can build... My question is more like, look,
I may need a major river close by for
cooling at this point, right?
In many areas, I just can't get enough water.
And, you know, it's probably the same with power in the region.
There are two gigawatt-scale sites where they have power all secured,
wet chillers and dry chillers all secured.
Like, everything's fine.
It's just not as efficient.
But, you know, that's fine, right?
Like, you know, they're going to ramp the revenue.
They're going to add the revenue.
Not that I necessarily think Amazon's internal models are going to be great,
or, hey, that their internal chip is better than NVIDIA's or competitive with TPU,
or their hardware architecture is the best.
I don't necessarily think that's the case.
But they can build a lot of data centers
and they can fill them up with stuff
that will be rented out, right?
And it's a pretty simple thesis.
How important has Anthropic been to the co-design for Trainium?
Because I remember we had a portfolio company.
This was summer 2023.
They were invited to AWS.
They spent, man, I think
eight hours with them over the course of
a week trying to figure
out Trainium back then. It was just
impossible to work through.
Is that, you know,
that, obviously that portfolio company hasn't
gone back and tried it now, but like how
different is it now based on what you're hearing?
Oh, it's still bad.
Okay. Okay.
You know, it's tough to use.
So there's sort of like,
this is sort of the argument that every
inference company offers, right, including
the AI hardware startups, is:
because I'm only running, like, three
different models at most,
I can just hand-optimize everything
and write kernels for everything, and even
go down to, like, an assembly level,
right? How hard can it be?
It is pretty hard.
But like you tend to do this for
production inference anyways.
Like you aren't using
cuDNN, which is NVIDIA's, like, library
that's super easy to use for
kernels and stuff... or, not to generate kernels, but anyways,
you're still...
you're not using these
ease-of-use libraries.
You know, when you're running
inference, you're either
using CUTLASS,
or stamping out your own PTX,
or, you know, in some cases,
people are even going down to the SASS level,
right?
And when you look at, say,
an OpenAI or, you know, an Anthropic,
when they run inference on GPUs, they're doing this,
right?
And the ecosystem is not
that amazing.
Once you get all the way down to that level,
it's not like using NVIDIA GPUs is easy either.
I mean, you have an intuitive understanding of the hardware architecture
because you work on it so much and everyone's worked on it.
And you can talk to other people.
But at the end of the day, it's not like easy, right?
Whereas, you know, Anthropic on Trainium or TPUs...
Actually, the hardware architecture is a little bit simpler than a GPU:
larger, simpler cores,
rather than having all this functionality. You know, less general.
So it's a little bit easier to code on.
There are tweets from Anthropic people saying that when they are working at that low level, they actually prefer working on Trainium and TPU because of the simplicity.
No.
Interesting.
To be clear, Trainium and TPU, I mean, Trainium especially, are very hard to use.
Like, not for the faint of heart.
It's very difficult.
But you can do it if you're just running, like... if I'm Anthropic and I must only run Claude 4.1 Opus and, sorry,
Sonnet. And screw it, I won't even run Haiku; I'll just run Haiku on, like, GPUs or whatever, right?
I'm just going to run two models. And actually, screw it, I'm just going to run Opus on GPUs too,
and Sonnet on Trainium and TPUs. Sonnet is the majority of my traffic anyways. I could spend the time.
And how often am I changing that architecture? Every four or six months, right?
It's not changing that much, honestly. Right? I mean, from three to four it definitely did change, right?
Yeah, I mean, define architectural change. You know, at a high level,
The primitives are more or less the same
across the last couple of generations.
I don't know enough about
Anthropic's model architecture, to be honest.
But I think from what I've seen at other places,
there have been enough changes that it takes time to
program this and really get...
The main thing is, like, you know,
if I'm Anthropic
and I have, what, $7 billion ARR now or whatever,
north of 10...
By the end of next year, north of 20, right?
Maybe even 30.
And if my margins are 50%, 70%,
that's $15 billion of Trainium that I need, right?
That I can run Sonnet on.
And most of that's going to be Sonnet 3.5 or, sorry, 4.5, whatever it is, right?
It's going to be one model serving most of the use cases.
So, like, you know, I could spend the time and it'll work on this hardware.
Yeah, totally.
Maybe on the topic of non-consensus calls you've made, and maybe
I'll move to another cloud.
In June, you guys said that Oracle is winning the AI compute market.
And then in this pod, we've already referenced the big jump, obviously, that Oracle had.
I think it was the single largest gain that a company with over $500 billion of market cap has ever had.
So, an enormous...
Was NVIDIA's Q1 2023 jump not bigger?
It might have been smaller.
Okay.
I think it was maybe close.
We'll fact check ourselves.
That's amazing.
But, you know, obviously, this is the massive...
commitment that was announced, can you walk us through why you made that call then
and just sort of why Oracle is poised to do so well in such a competitive space?
Yeah, so Oracle: they have the largest balance sheet in the industry that is not dogmatic
about any type of hardware, right? They're not dogmatic about any type of networking.
They will deploy Ethernet with Arista. They'll deploy Ethernet through
their own white boxes. They'll deploy
NVIDIA networking, InfiniBand
or Spectrum-X, and they have really good network engineers.
They have really great software across the board, right? Again,
like ClusterMax: they were ClusterMax Gold
because their software is great. There are a couple of things that they needed to add
that would take them higher, and they're adding those, right?
To Platinum, right, which is where CoreWeave was.
And so, when you couple two things, right?
Like, OpenAI's got insane compute demand.
Microsoft is quite pansy.
They're not willing to invest.
They don't believe OpenAI can actually pay the amount of money, right?
I mentioned earlier, right,
the $300 billion deal: OpenAI, you don't have $300 billion.
And Oracle is willing to take the bet.
Now, of course, there's a bit more security in the bet, in that
Oracle really only needs to secure the data center capacity, right?
So this is sort of like how we came across the bet, right?
And we've been telling our institutional clients,
especially in a super detailed way,
whether it be the hyperscalers or AI labs
or semiconductor companies or investors,
in our data center model,
because we're tracking every single data center in the world.
Oracle doesn't build their own data centers either, by the way.
They get them from other companies.
They co-engineer, but they don't physically build them themselves.
And so they're quite nimble in terms of being able to assess
new data centers, engineer them.
So we saw all these different data centers Oracle was snatching up:
in deep discussion,
snatching up, signing, et cetera.
And so we have, you know, hey, a gigawatt here,
a gigawatt there, right?
Abilene, you know, two gigawatts, right?
You have all these different sites
that they're signing up and discussions with,
and we're noting them.
And then we have the timeline
because we're tracking the supply chain:
we're tracking all the permits,
regulatory filings, you know,
through language models,
using satellite photos constantly,
and then the supply chain of, like, chillers,
transformer equipment, generators, et cetera.
We're able to make a pretty strong quarter-by-quarter estimate, in our data center model,
of how much power there is for each of these sites, right?
So some of these sites that we know of aren't even ramping until 2027, but we know that Oracle signed it, right?
And we have the sort of ramp path.
So then it's this question of, like, okay, let's say you have a megawatt, right? For simplicity's sake.
Which is a ton of power, but now it doesn't feel like much;
you know, we're in the gigawatt era.
But, you know, if you're talking about a megawatt, right, you fill it up with GPUs.
How much do the GPUs for a megawatt cost, right?
Or actually, it's even simpler to do the math, right?
If I'm talking about a GB200, right, each individual GPU is 1,200 watts,
but when you add the CPU, the whole system, it's roughly 2,000 watts.
At the same time, you know, all-in, everything, for simplicity's sake, it's $50,000 per GPU, right?
The GPU alone doesn't cost that;
there's all the periphery, right?
So, $50,000 of capex for 2,000 watts.
So $25,000 for 1,000 watts.
And then what's the rental price for GPU?
If you're on a really long-term volume deal, $2.70, right, $2.60 an hour, in that range,
then you end up with: oh, it costs like $12 million a year to rent a megawatt.
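That per-megawatt arithmetic is simple enough to sanity-check. A quick sketch using the figures just stated (roughly 2,000 W per GPU system all-in, $50K of capex per GPU, ~$2.60-2.70/hr long-term rental):

```python
# Quick check of the per-megawatt math from the conversation.
system_watts = 2000      # whole GB200 system per GPU, all-in (as stated)
capex_per_gpu = 50_000   # dollars, all-in including periphery (as stated)
rate_per_hour = 2.70     # dollars per GPU-hour on a long-term volume deal (as stated)

gpus_per_mw = 1_000_000 / system_watts        # 500 GPU systems fit in a megawatt
capex_per_mw = gpus_per_mw * capex_per_gpu    # ~$25M of capex per megawatt
rent_per_mw_year = gpus_per_mw * rate_per_hour * 24 * 365

print(f"{gpus_per_mw:.0f} GPUs/MW, ${capex_per_mw / 1e6:.0f}M capex, "
      f"${rent_per_mw_year / 1e6:.1f}M/yr rent")
# -> 500 GPUs/MW, $25M capex, $11.8M/yr rent
```

Which lands right on the "$12 million per megawatt" figure quoted in the conversation.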
And then each chip is different.
So we track each chip: what the capex is, what the
networking is. So, for each chip, you can predict which chips
they're putting in which data centers, when those data centers go online, how many megawatts
by quarter. And then you end up with: oh, well, Stargate goes online in this time period,
they're going to start renting it at this time, it's this many chips, for each Stargate site, right?
And so therefore, this is how much OpenAI would have to spend to rent it. And then
you price that out. And we were able to predict Oracle's revenue with pretty high certainty,
and we matched pretty dead on what they announced
for 25, 26, 27,
and we were pretty close on 28.
The surprise for us was that, you know,
they announced some stuff in '28, '29,
data centers that we haven't found yet,
but we'll find them, right, of course.
And sort of like, this methodology lets you see
sort of, hey, what data centers are you getting,
how much power, what are they signing,
how much incremental revenue that is,
when that comes online.
And so that's sort of the basis of our
Oracle bet. Obviously in the newsletter we included a lot less detail, but, you know,
sort of, it was that thesis, right, that like, hey, they have all this capacity. They're
going to sign these deals. In our newsletter, we talked about two main things. We talked about
the OpenAI business, and then we talked about the ByteDance business. And presumably
tomorrow, you know, on Friday, there's going to be an announcement about TikTok and all this.
But, like, the ByteDance business: you know,
huge amounts of data center capacity
that Oracle is also going to lease out to ByteDance, right?
And so we did the same methodology there.
You know, with ByteDance, it's pretty certain they'll pay,
because they're a profitable company.
With OpenAI, it's not.
And so there have got to be some error bars
as you go further out, in terms of, like,
will OpenAI exist in '28, '29, '30,
and will they be able to pay the $80-plus billion a year
that they've signed up with Oracle for?
Right. That's the only, like, risk here.
And if that happens, then Oracle's downside is also somewhat protected,
because they only signed the data center, which is a minority of the cost.
The GPUs are everything.
And the GPUs, they purchase one to two quarters before they start renting them out.
So the downside risk is pretty low for them
if they don't get the deal.
Well, they don't get the revenue, but it's not like they're stuck with a bunch of assets
they bought that are worthless.
Yeah.
Is that another angle here?
I mean, OpenAI and Microsoft were BFFs, and now they've filed divorce papers,
and they just want to diversify,
and that's pushing them away towards other providers.
Yeah, so Microsoft was the exclusive compute provider.
It got reworked to a right of first refusal.
You know, and then Microsoft...
Was it now last choice or something like that?
No, it's still right of first refusal, but it's like Microsoft...
Those two are not mutually exclusive.
Well, if OpenAI is like, we're going to sign an $80 billion contract
or a $300 billion contract for the next five years, you guys want it?
Or, you know, it's like...
Yeah, yeah.
And they're like, no.
what?
Okay, cool.
Right?
And then they go to Oracle, right?
And OpenAI is sort of like...
this is, you know, OpenAI needs someone with a balance sheet
to actually be able to pay for it, right?
And then they'll make tons of money, you know, off of OpenAI,
on the margins on the compute and the infra and all these things.
But someone's got to have a balance sheet.
And OpenAI doesn't have a balance sheet.
Oracle does.
Although, given the scale of what they signed... We also had another
source of information, which was
that they were talking to
debt markets, right? Because
Oracle actually just needs to raise debt
to pay for this many GPUs over time.
Now, they won't do it immediately. They can pay for
everything this year and next year from their own
cash. But, like, in '27, '28,
'29, they'll start to have to use debt to pay for
these GPUs, which is what
CoreWeave has done. As with many of the neoclouds, most
of it is debt-financed. Even Meta
went and got debt for
their Louisiana mega data center. Not
just because it's cheaper;
it's literally better on a financial
basis to do buybacks with your
cash and take on debt, because the debt is
cheaper than the return on your stock.
It's a financial engineering thing. But, like,
you know, who's out there, right?
It could be Amazon, it could be Google,
it could be Microsoft. It was a very short list.
Or it could be Oracle
or meta, right? Meta's
obviously not. Microsoft's chickened out.
Amazon, Google, and
Oracle, right? That's all that's left.
Google would be an awkward fit.
Yeah, Google would be an awkward fit.
Amazon would be a fine fit, but, you know, exactly, right?
It's like, they're very Anthropic, yeah.
Well, I guess maybe, you know, on the topic of these giant data center buildouts,
you guys just released a piece on xAI and Colossus 2.
Do you, are you getting less impressed by these feats of building something this massive in six months?
Or is it still very impressive to you guys?
You know, this is the thing I've said about AI researchers
is that they're like the first class of humans
to think about things on an order of magnitude scale
whereas like people have always thought about things
in terms of like percentage growth
like ever since industrialization
and before that it was just like absolute numbers
right?
You know, sort of like, humanity is evolving
in terms of how we think,
because things are changing faster.
Everything is on a log scale.
And so like
You know, it was really impressive
when GPT-2 was trained on so many chips,
and then GPT-3 was trained on...
or, sorry, GPT-4 on 20K A100s.
It's like, holy crap. And then it was like,
oh, the era of 100K GPU clusters,
right? And we did some reports around 100K GPU clusters.
But now there are, like, ten
100K GPU clusters in the world,
and I was like, okay, this is kind of boring.
But 100K GPUs is, like,
you know, over 100 megawatts.
Now it's, like, literally, you know,
in our Slack, in some of these channels,
it's like, oh, we found another 200-megawatt data center,
and there's someone who puts the yawning emoji.
Every time.
And I'm like, dude, what?
Now it's only exciting if you do gigawatt-scale data centers.
Like, we're in the gigawatt era.
Yeah.
Yeah, yeah.
And I'm sure... like, you know, actually, I'm not sure.
Maybe we'll start yawning at that too.
But, like, you know, the log scale of this is, like, the capital numbers are crazy, right?
Like, you know, it's crazy enough that OpenAI did, like, a $100 million
training run, you know, then they did a billion-dollar
training run.
Now we're talking about $10 billion training runs, right?
Like, you know, it's crazy that we think in log scale.
But yes, things are only impressive on a log scale.
Yeah.
What Elon's doing in Tennessee, in Memphis: the first
time was crazy, right?
100K GPUs in
six months. He bought a factory in,
like, February of '24,
and had
models training within six months, right?
And he did liquid cooling, you know,
the first large-scale data center at this scale
for AI doing liquid cooling. All
these sorts of crazy firsts.
Putting generators outside, like Cat
turbines; all these different things to get the power:
mobile substations, all these different
crazy things, tapping the natural
gas line that's running alongside the factory.
All of these. So he does
this, and it's like, holy crap.
And he did it for 100K
GPUs. Right. You know, 200, 300
megawatts, right? Now he's doing it at
gigawatt scale,
and he's doing it
just as fast. Right?
And so, like, you would think
this is obviously way more impressive,
that he did it again. Yeah. But,
like, we've been desensitized. It's like,
you know, you've given the child too much
candy, right? And now the child doesn't like apples, right?
Like, I don't know. So, like, yeah, a gigawatt data center. There were all
these protests around his Memphis facility. People like, oh, you're destroying the air. And it's
like, have you looked around that area of Memphis? There is, like, a gigawatt gas turbine
plant that's just generally powering that area. There's a sewage plant that's servicing the entire
city of Memphis. And there's,
like, open-air pits... like, there's open-air mining.
There's all sorts of disgusting shit
around there, which is needed.
Right, we need that stuff to have a country run,
right, like, to be clear.
And, you know,
it's like, people are complaining about, like, a couple
hundred megawatts... Yeah.
Of generation.
So he got protests from all sorts of people.
You know, it got super into the politics side
of things. The NAACP even protested him.
And so he really got some local municipalities to be like, oh, we don't like this. And so he couldn't do as much as he wanted to in Memphis. But he still needed the data center to be close, because he wanted to connect these data centers super high bandwidth, super close, and he already had a lot of infrastructure set up there. So he bought another distribution center, this time still in Memphis. But the cool thing about Memphis is it's right across the border from Mississippi. Right. So now, you know, it's
like 10 miles away from his original one, but his
facility is like a mile away from Mississippi
and he bought a power plant in Mississippi
and he's putting turbines
there. The regulation is
completely different, right? And if the
question is really like
galvanize resources and build it
really fast, maybe Elon
is ahead of everyone.
You know, he hasn't made the best model yet, or
he doesn't have the best model, at least today, I think.
You know, you could argue
Grok 4 was the best for a little period of time.
But like, you know, it's
it's truly amazing
how fast he's able to build these things
And from first principles, it's like,
most people are like, fuck, you know,
we can't build the power,
we can't do power here anymore,
we have to find a new site. And it's like,
no, no, just go across the border,
go to Mississippi.
And my favorite thing is, like, Arkansas is right there,
so if Mississippi gets mad, you know...
I don't know the regulation... Are all future data centers,
you know, going to be built in places where multiple states
meet? Is that the...
Four Corners, yeah.
The optimal regulatory...
I think there's one...
There are, yeah.
Is there a point in the U.S. with five?
I know there's a point with four.
Four states intersect
there, yeah.
Maybe that's where you put a data center.
Kind of certain.
I'm going to buy real estate in that area
and front-run Reddit.
Well, I guess on the topic of maybe
new hardware: you had this
piece analyzing TCO
for GB200s.
And I'm kind of going to ask this question on behalf
of our portfolio companies, which it sounds like you're helping already. One of the
findings that I thought was really interesting was that TCO was sort of 1.6x H100s for GB200s. And so, obviously,
you know, there's this point on, okay, that's sort of the benchmark for the performance
boost that you're going to need to at least make the performance-per-cost ratio benefit
from switching over. Maybe just talk about what you've seen from a performance
standpoint. And what do you recommend to portfolio companies, maybe at a smaller scale than xAI,
who are thinking about new hardware? Should they try to get it? There's capacity constraints, obviously.
Yeah, I mean, that's a challenge, right? With each generation of GPU, it gets so much faster
that you end up wanting the new one. And in some metrics, you could say GB200 is
three times faster, or two times faster, than the prior generation. In other metrics,
you could say it's way more than that, right?
So if you're doing pre-training versus inference, right?
They can run everything for a bit, right?
Yeah, if you run it for a bit, or just inference,
and take advantage of the huge NVLink, NVL72, you know,
there are ways you can squint and say GB200 is only 2x faster than H100,
in which case, at 1.6x TCO,
it's, you know, worthwhile, right?
It's worth going to the next gen.
More marginal.
It's more marginal. It's not a big deal.
Then there are other cases where it's like, well, if you're running DeepSeek inference,
the performance difference per GPU is north of, like, 6-7x, and it continues to be optimized for DeepSeek inference.
And so then the question is, like, well, I'm only paying 60% more for 6x?
That's a 4x or 3x performance-per-dollar gain.
Like, absolutely, right?
And if you're running inference of DeepSeek, that can also include RL, right?
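The performance-per-dollar arithmetic here is simple enough to sketch. The 1.6x TCO ratio and the 2x/6x speedups are the figures from the conversation:

```python
# Performance per TCO dollar for GB200 vs H100, using the ratios discussed:
# ~1.6x the total cost of ownership, and a speedup anywhere from ~2x
# (squinting at pre-training) to ~6x (DeepSeek-style inference).

def perf_per_dollar_gain(speedup, tco_ratio=1.6):
    """How much more performance you get per dollar of TCO, gen on gen."""
    return speedup / tco_ratio

print(round(perf_per_dollar_gain(2.0), 2))  # 1.25 -> the "more marginal" case
print(round(perf_per_dollar_gain(6.0), 2))  # 3.75 -> the "4x or 3x" gain
```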
And so the question is sort of...
and then the other question is, like,
well, the GPU is new.
You know, there's also B200:
there's GB200 and there's B200.
B200 is much more simple
from a hardware perspective.
It's just eight GPUs in a box, so it's not as much
of a performance gain, especially in inference.
But you have
all the stability,
right? It's an 8-GPU box;
it's not going to be unreliable.
The GB200s are still having some
reliability challenges. Those are being worked through;
it's getting better and better by the day,
but it's still a challenge.
But, you know, when you have an H100 box, or H200, 8 GPUs, and one of them fails, you take the entire server offline and you have to fix it, right? Or usually, if your cloud's good, they'll swap it out, right? But if it's GB200, what do you now do with 72 GPUs? If one fails, do you break the whole thing and get a new 72? The blast radius of a failure, right? Now, GPU failure rates at best are the same,
and likely worse, right,
gen on gen, because everything's getting hotter,
faster, et cetera. So at best, the failure
rates are the same. Even if you model the failure rates as the
exact same, because you go from one
out of eight to one out of 72, it's a huge problem.
So now what a lot of people are doing is,
they run a high priority workload on 64 of them,
and then the other eight,
you run low priority workloads.
Which is then, like, okay, this is this whole
infrastructure challenge: I have to have high-priority
workloads and low-priority workloads.
When a high-priority workload has a failure,
instead of taking the whole rack offline, you just take
some of the GPUs from the low-priority one,
put them in the high-priority one,
and then you just let the dead
GPU sit there until you service the rack
at a later date. And there are all
these complicated infrastructure things that
make it so,
oh wait, actually that
3x or 2x performance
increase in pre-training
is lower, because the downtime is
higher. Slash, I'm not using all the GPUs
always. Slash, I'm not
smart enough, or I don't have the
infra, to have low-priority and high-priority
workloads. Like, it's not impossible. The labs are doing it, right? Like, it's just, I mean,
if I'm running a cloud, it's actually really hard, right? Because I probably have to rent the
spares out as spot instances or something.
No, no, no, because it's a coherent domain. It's NVLink. You don't want anyone touching that. So
it has to be the end customer. Or I have to leave them as empty spares. That's even
worse. The end customer usually would just be like, I want them, and I will, you know... And the
SLAs and the pricing, everything is accounting for that, right? So, like, generally when
you have a cloud, you have an SLA, right? That is, hey, uptime is going to be
99%, you know, blah, blah, blah, right, for this period. With GB200, it's 99% for 64 GPUs,
not 72, and then it's, like, 95% for 72. Now, it differs across every cloud; every cloud has a different
SLA. But they've adjusted for this, because they're like, look, this hardware is just finicky.
Do you still want it? You know, we will credit you such that 64 of them will always work, right?
Not 72. And so there's this whole finicky nature, and the end customer has to be
capable of dealing with the unreliability. And the end customer can just continue
to use B200, right? The performance gain's not as much; the whole reason you want this 72 domain is so you can
have, you know, some of these gains, right? But you have to be smart enough to be able to do it.
And that's challenging for small companies.
Totally.
So then, NVIDIA did announce the Rubin prefill cards, like the CPX... Rubin CPX, there we go.
What's your take on that?
Does it cannibalize?
Dude, by the way, I don't know if this is, like, brain rot or... I don't know.
But, like, I can't remember what I had for lunch yesterday,
but I know the model number of every fucking chip.
It haunts you in your dreams.
We're broken, we're broken.
Living the dream.
No, no, no, no, no.
You know.
Why do you pre-announce a product that's
5x faster for certain use cases?
Is it that much?
I think it's like, historically, AI chips were AI chips, right?
And then we started getting a lot of people saying,
this is a training chip, this is an inference chip.
Actually, training and inference are switching so fast
in what they require that, like, now it's still one chip.
Actually, there are still workload-level dynamics that differ,
but the main workload is inference, even in training, right?
Because of RL, most of that is, you know, generating stuff in an environment and trying to, you know, achieve a reward, right?
So it's inference still, right? Training is now becoming mostly dominated by inference as well.
But inference has, like, two main operations, right? There's calculating the KV cache for prefill, right?
Here's all these documents: do the attention between all of them, right, between all the tokens, whatever type of attention you use.
And then there's decode, which is
auto-aggressively generate each token.
These are very, very different workloads.
And so initially, the ideas
or infrastructure techniques, the ML systems
techniques were, oh, okay,
I will just make the batch size
every single forward pass
this big. And
if I make it, let's call it, I'll make it
a thousand big. And maybe
I'll run 32 users
concurrently. That way, you know, now
I still have 900-something left, 968,
left, right? That 968
is actually doing the prefill.
If a request comes in,
it chunks it. It's called chunked prefill. You
prefill chunks of it now. You get really good
utilization on GPUs.
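One reading of that slot accounting can be sketched as a toy scheduler. The constants, chunk size, and the scheduler itself are hypothetical, just to make the token-budget bookkeeping concrete:

```python
# Toy sketch of chunked prefill: each forward pass has a fixed token budget;
# every decoding user takes one slot, and the leftover budget is filled with
# chunks of pending prefill requests. All numbers are illustrative.

BATCH_TOKENS = 1000   # token budget per forward pass
DECODE_USERS = 32     # each decoding user contributes 1 token per pass

def schedule_pass(pending_prefill: list[int], chunk: int = 256):
    """One pass: return (scheduled prefill chunks, remaining prefill tokens)."""
    budget = BATCH_TOKENS - DECODE_USERS  # 1000 - 32 = 968 slots for prefill
    scheduled, remaining = [], []
    for req in pending_prefill:
        take = min(req, chunk, budget)    # cap by chunk size and leftover budget
        budget -= take
        if take:
            scheduled.append(take)
        if req - take:
            remaining.append(req - take)  # carried over to the next pass
    return scheduled, remaining

# A 64k-token request gets prefilled a chunk at a time across many passes,
# sharing each pass with the 32 decode tokens.
scheduled, rest = schedule_pass([64_000, 500])
print(scheduled)  # [256, 256]
print(rest)       # [63744, 244]
```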
But then that
ends up impacting the decode workers,
right? The users who are autoregressively generating
each token end up having slower TPS.
And tokens per second is
really important for user experience and all these
other things, right? So then the
idea is like, okay, these two workloads are so different
and they are literally different, right? You
pre-fill and then you decode. It's not like you're interleaving them. So why don't we
split them entirely? And this is done on the same type of chip, right? OpenAI, Anthropic, Google.
Pretty much everybody does that. Everyone, everyone good. Together, Fireworks.
All these guys do prefill-decode, disaggregated prefill-decode. So they run prefill on a set of
GPUs. Why is this beneficial? Because you can auto-scale them. Right? You can, hey, all of a sudden,
I have a lot more long-context workloads, I allocate more
resources to prefill. Oh, all of a sudden, not all of a sudden, but like, you know, over time,
my traffic mix is not long input, short output. It's short input, long output. I have more decode workers.
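The autoscaling idea behind disaggregated prefill/decode can be sketched as a pool-sizing rule. This is purely illustrative, not any provider's actual policy; `split_pools` and its proportional sizing rule are assumptions for the sake of the example:

```python
# Hypothetical sketch of sizing prefill vs. decode GPU pools by traffic mix.
# Prefill cost scales with input length, decode cost with output length,
# so split the fleet proportionally. Numbers are made up for illustration.

def split_pools(total_gpus: int, avg_input_tokens: int, avg_output_tokens: int):
    """Split a fleet into (prefill_gpus, decode_gpus) by token-work share."""
    prefill_share = avg_input_tokens / (avg_input_tokens + avg_output_tokens)
    prefill = round(total_gpus * prefill_share)
    return prefill, total_gpus - prefill

# Long-input, short-output traffic -> mostly prefill workers.
print(split_pools(100, 32_000, 2_000))   # (94, 6)
# Traffic shifts to short-input, long-output -> re-allocate toward decode.
print(split_pools(100, 2_000, 32_000))   # (6, 94)
```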
This way I can guarantee, and so now I can auto-scale the resources differently, and I can also
guarantee that my prefill time stays bounded. You know, what's really important
in search is how fast the page starts loading, not when every resource finishes.
What do people do in games? Like, the loading screen often has some sort of interactive
environment, or it blends in over time, or whatever; it has tips and tricks, ways to distract you.
The same thing here. There are, like, studies and papers out there that users prefer a faster time
to first token, right? The first token gets streamed to me sooner, even if the total time to get
all my tokens is a little bit longer. I can't read that fast anyways, right?
I mean, most models return above speed-reading speed.
But you need that, right? I think, but, like, you know,
the idea is that you want to guarantee time to first token is at a certain level for user experience reasons.
Otherwise, people are like, screw this, not using AI.
The decode speed matters a lot, too, but not as much as time to first token.
And so by having separate pre-filled decode, you do this, right?
But now you've already, and this is all in the same infrastructure, you've already done this.
So now it's like, what's the next logical step?
These workloads are so different.
Decode, you have to load all the parameters in and the KV caches to generate a single token.
you batch a couple users together,
but very quickly you run out of memory capacity
or memory bandwidth because everyone's KV cache
is different. The attention
of all the tokens, right? Whereas on pre-fill,
I could even just serve like
one or two users at a time, because
if they send me a 64,000 context
request, that is
a lot of flops, right?
64,000 context requests.
I'll use Llama 70B because it's simple to do math
on, like, 70 billion parameters.
That's 140
gigaflops per token.
And 140 gigaflops times 64,000 tokens, that's many, many teraflops.
You can use the entire GPU for like a second, right?
Like potentially, right?
Depending on the GPU to just do the pre-fill, right?
And that's just one forward pass.
So I don't necessarily care about, you know,
loading all the parameters and KV caches in fast.
All I care about is all the flops.
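Spelled out, that back-of-envelope math looks like this. It assumes the common ~2 FLOPs per parameter per token rule of thumb for a dense transformer and ignores attention FLOPs, so it is a lower bound; the GPU throughput figure is an assumption, not a quoted spec:

```python
# The prefill math from the example above, spelled out.

params = 70e9                 # Llama-70B-class model
flops_per_token = 2 * params  # ~140 GFLOPs per token (matmul rule of thumb)
context = 64_000              # tokens in the long-context request

total_flops = flops_per_token * context
print(f"{total_flops:.2e} FLOPs")  # 8.96e+15, i.e. ~9 petaflops

# At ~1e15 dense FLOP/s (roughly H100-class bf16), that single prefill
# occupies the whole GPU for several seconds of pure compute.
gpu_flops_per_s = 1e15
print(f"~{total_flops / gpu_flops_per_s:.1f} s on one GPU")  # ~9.0 s
```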
And so that leads us to, sort of, you know,
I had to, I think, give a long, drawn-out explanation,
because it's hard for people to understand what CPX is.
I've had a lot of, like, even my own clients,
like, we sent, like, multiple notes
explaining, and they're like, I still don't understand. I'm like,
shit, okay.
Send them the Attention Is All You Need paper.
You can't expect.
I mean, like, think about like a, like a networking person.
Like they're like, no, I don't need to know about this.
You know, attention is all you need, right?
Like it's like, we're thinking about an investor, right?
Like, you know, there's all people.
Maybe the data center operator.
Like, they're like, oh, there's two chips.
Why?
Should I build my data center differently?
It's like, like, you know, I got to explain everything or just like, no.
You don't have to build differently.
But anyways, you get to now...
At Stanford, 25% of all students, not CS students,
all students, have read that paper.
Which paper, Attention Is All You Need?
That's low.
The gym majors and, you know, like, the philosophy guys.
I'm like, this is amazing.
Anyway, sorry.
The Middle East, I can't remember what country it is,
has AI education starting at, like, age eight,
and in high school, they have to read Attention Is All You Need.
Wow.
Someone told me they had to read Attention Is All You Need,
which is, I don't know.
Like, top-down mandates for education, you know, maybe they work, maybe they don't.
Like, you know, maybe people like homeschooling for kids, I don't know.
I went to public school.
But, like, back to your question.
Yeah, just on the topic of hardware cycles, I wanted to maybe...
Yeah.
I didn't actually explain what CPX is.
So CPX is a very, like, compute-optimized chip for prefill,
whereas, you know, decode, to just simply say it, is the rest, the normal chips with
HBM. HBM is more than half the cost of the GPU. If you strip that out, you end up having a
much cheaper chip passed on to the customer. So, like, you know, if Nvidia takes the same
margin, then the cost of this prefill chip is much, much lower. And now the whole process is
way cheaper and more efficient. Now long context can be adopted. Right. Yeah. Well, so I love that
we're actually going into all this detail, because I had a more 10,000-foot-view question for you,
which is, I haven't been following the semi-market
as closely as you have.
I probably started with the A100.
And I remember helping Noam at Character,
this was, like, June, summer of '23,
chase down GPUs.
And the only thing that mattered at that time
was delivery date because there was a huge capacity crunch.
And then to see that over the last two years evolve
where, let's say, six to 12 months ago,
people were doing these RFPs to 20 neoclouds, right?
And the only thing that mattered to some degree was price.
Right, people actually do RFPs for GPUs.
Yes.
So just to be clear, my opinion on how you buy GPUs is that it's like buying cocaine or any other drug.
This was described to me, not by me.
I don't buy cocaine.
Okay, yeah, yeah.
Great.
Someone tells me this.
I'm like, holy shit, it's right.
You call up a couple people.
You text a couple people.
You ask, you know, how much you got.
What's the price?
It's like.
Exactly.
Exactly. This is fucking, like, buying drugs.
Sorry, sorry.
No, I mean, very accurate.
It's the same way. You just send, like, we have Slack connects with, like, 30 neoclouds.
There you go. As well as, like, some of the major ones.
And we just send them a message, like, hey, customer wants this much.
You know, this is what they're looking for. And then they send quotes.
I know this guy.
I know a guy. Well, so I think that's actually a very accurate description.
And I've sent countless portcos your original ClusterMax post, because I thought it did
a really good job breaking them down. But maybe one question to end on for me is just,
what era are we in now with Blackwells coming online? Are we sort of back to the summer
2023 era? And that's kind of the cycle that we've just entered? Or what sort of your view on
where we are? So, very good question. For one of your portcos, we were like, you know,
after their difficulties with Amazon, we tried to, we were like, okay, let's actually, like,
get you GPUs. The original deals we got you
were gone, but, like, here's some other deals, right?
It turned out that
multiple major neoclouds had sold out of
Hopper capacity.
And their blackwell capacity comes online
in a few months. So it's
a bit of a challenge, right?
In that... Due to inference?
Inference demand has been skyrocketing this year,
right? Reasoning models, you know.
These reasoning models are revenue.
It's been
skyrocketing this year. And then also
there's a bit of like the, you know,
Blackwell comes online but it's hard to deploy
so it takes a little, you know, there's a learning curve
to deploying it. So whereas like you got down
to like you buy the hopper, you install the data
center, it's running within like, you know, a month
or two, right? For Blackwell, it's a longer time frame
because of reliability challenges; it's a new
GPU. I mean, it's just learning pains, right?
Growing pains.
So there was like this gap of like
how many GPUs are coming onto the market right as
revenue starting to inflect. And so a lot
of capacity got
sucked up, right? And actually
prices for Hopper bottomed like three or four months ago
or like five or six months ago. Yeah. And actually they've like crept up a little bit now.
They're still, like, you know, not... So, um, I don't think we're quite in the
2023-2024 era of GPUs are tight. But certainly, if you want, like,
just a few GPUs, it's easy. But if you want a lot, it's hard. Yeah. Like, you
can't get capacity that instantly. Yeah. Wow. What a time. So we, uh, so we wrap
on that. Dylan, this was another
instant classic. Thank you so much for coming on the podcast.
It's like two hours, bro.
Oh, no. I didn't know. I couldn't stop.
Thanks so much. It was great.
Thank you so much for having me.
Thanks for listening to the A16Z
podcast. If you enjoyed the episode,
let us know by leaving a review at
ratethispodcast.com/a16z.
We've got more great conversations
coming your way. See you next time.
As a reminder,
the content here is for informational purposes
only. Should not be taken as legal
tax or investment advice or be used to evaluate any investment or security and is not directed
at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates
may also maintain investments in the companies discussed in this podcast. For more details,
including a link to our investments, please see A16Z.com forward slash disclosures.