The a16z Show - Dylan Patel on the AI Chip Race - NVIDIA, Intel & the US Government vs. China
Episode Date: September 22, 2025

Nvidia’s $5 billion investment in Intel is one of the biggest surprises in semiconductors in years. Two longtime rivals are now teaming up, and the ripple effects could reshape AI, cloud, and the global chip race.

To make sense of it all, Erik Torenberg is joined by Dylan Patel, chief analyst at SemiAnalysis; Sarah Wang, general partner at a16z; and Guido Appenzeller, a16z partner and former CTO of Intel’s Data Center and AI business unit. Together, they dig into what the deal means for Nvidia, Intel, AMD, ARM, and Huawei; the state of US-China tech bans; Nvidia’s moat and Jensen Huang’s leadership; and the future of GPUs, mega data centers, and AI infrastructure.

Resources:
Find Dylan on X: https://x.com/dylan522p
Find Sarah on X: https://x.com/sarahdingwang
Find Guido on X: https://x.com/appenz
Learn more about SemiAnalysis: https://semianalysis.com/dylan-patel/

Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
How you buy GPUs is like buying cocaine.
You call up a couple people, you text a couple of people, you ask, you know, how much you got?
What's the price?
If your two arch-nemeses suddenly team up, it's the worst possible news you can have.
I did not see this coming.
I think it's an amazing development.
Like a Warren Buffett coming into a stock.
Jensen is like the Buffett effect for the semiconductor world.
It's kind of poetic that everything's gone full circle and Intel's sort of crawling to Nvidia.
Today, we're talking about one of the biggest surprises
in semiconductors in years.
NVIDIA just put $5 billion into Intel.
Two long-term rivals now teaming up on custom data center and PC products,
a deal nobody saw coming.
For NVIDIA, it's the Buffett effect.
For Intel, it's a lifeline.
And for AMD, ARM, and the global chip race,
the fallout could be massive.
To break it all down, I'm joined by Dylan Patel,
chief analyst at SemiAnalysis,
Sarah Wang, general partner at a16z,
and Guido Appenzeller,
partner at a16z
and former CTO of Intel's
data center and AI business unit.
Let's get into it.
Dylan, welcome back to the podcast.
Thanks for having me, yeah.
It just so happens that there's some big news
just as we're having you,
Nvidia announcing a $5 billion investment in Intel
and them teaming up to jointly develop
custom data center and PC products.
What do you think about the collaboration?
I think it's hilarious that like
Nvidia could invest,
it gets announced,
and their investment's already up 30%.
$5 billion investment,
$2 billion profit already, right?
I think it's fun because they need their customers
to really have big buy-in.
So when their potential customers
buy in and commit to certain types of products,
it makes a lot of sense, right?
And it's kind of funny in a way
because in the past,
there was this whole thing around
how Intel was sued
for being anti-competitive
with their chipsets, and Nvidia actually got, like, a settlement from
Intel, right? Way back when, like, the graphics were
separate from the CPU and the graphics were really put on the
chipset, which had, like, all this other I/O, like USB and all this stuff.
So it's kind of a funny, like, turn of events that now Intel is going to make,
like, a chiplet and package it alongside a chiplet
from Nvidia, and then that's, like, a PC product.
Right? So, you know, it's kind of poetic
that everything's gone full circle and Intel's sort of crawling to
Nvidia. But actually it might just be the best, like, device, right? I don't want an
Arm laptop because it can't do a lot of things. And so an x86 laptop with
Nvidia graphics fully integrated would probably be the best product in the
market.
So you're optimistic? How do you think this will go?
I mean, sure. I mean, I hope. I'm a perpetual optimist on Intel because I have to be.
I was thinking that the structure of the deal that at least like a lot of the government folks and Intel were sort of trying to go for was people get, you know, big customers and the biggest suppliers directly give capital to Intel.
But this is sort of the other way around where they're buying some of the stock, having some ownership, but they're not really like diluting the other shareholders.
And then the other shareholders will get diluted, slash everyone will get diluted when Intel finally does raise the capital from the capital markets.
But because they've announced these deals,
and they're pretty small, right?
5 billion Nvidia, 2 billion SoftBank.
U.S. government was 10.
You know, these are still relatively small.
Pretty small, yeah.
Yeah, on the nature of things, right?
I mean, like, you know, last time I think I said
Intel needs like $50 billion, right?
Now when they go to the capital markets, it's better.
And hopefully they get another, you know,
a couple of these announcements.
Maybe, you know, there's all sorts of speculation
that Trump is involved in, you know,
sort of getting these companies to invest.
NVIDIA,
and now the government as well, of course,
and now is Apple going to come invest, right?
And also do something with Intel
or who else will come in,
and that'll really boost investor confidence
and they can dilute slash go get debt.
Like a Warren Buffett coming into a stock.
Jensen is like the Buffett effect
for the semiconductor world.
Guido, you were the CTO
of the Intel Data Center and AI BU,
So what are your thoughts?
I think it's really good for customers and consumers in the short term.
Having both Intel and Nvidia, like, specifically in the laptop market, right?
Having the two collaborate is amazing.
I wonder what's going to happen with any of the internal graphics or AI products at Intel.
They might just push a reset and give up on that for now.
They currently don't have anything competitive, right?
There was the Gaudi effort that's more or less done.
There was the internal graphics chips, which never competed really at the high end.
So from that perspective, it makes a lot
of sense, right? It's for both sides. Look, I think, for Intel, they needed a breath of fresh air,
right? They were sort of desperate. So I think it's a very good thing. I think AMD is fucked.
If your two arch-nemeses suddenly team up, it's the worst possible news you can
have, right? They were already struggling, right? Their cards are good. Their software stack is not,
right? They were getting very limited traction, right? They now have a bigger problem outside. I think
Arm is a little bit screwed
as well, right? Because their biggest selling point was sort of like, look, we can partner
with everybody that doesn't want to partner with Intel. And that's what they're, in a sense,
their number one... you know, like, Nvidia is probably the most dangerous of the future CPU competitors,
right? And so they now suddenly have access to Intel technologies and might go in that direction.
It remixes the cards, right? I did not see this coming. I think it's an amazing development.
Yeah, it'll be very interesting to see this play out.
To Erik's point, packed news week. The other thing that we wanted to
pick your brain on since we have you here, Dylan, is the other news dropping on Huawei unveiling their
kind of AI roadmap. And, you know, obviously they're hyping up the capabilities. I think you guys have
been sort of ahead of the curve of trying to gauge what the 950 supercluster can actually do.
But would love your thoughts on everything that's going on on the China front, right? And this is kind of
coupled with DeepSeek saying their next models are going to be on domestically produced Chinese
chips, the Chinese government kind of banning companies from buying the Nvidia chips produced specifically
for China.
So there's just sort of a lot of dominoes falling right now in the semi market in China, but
would love your take overall and, I mean, drill into some detail.
Yeah, I think when you sort of zoom out to even like, you know, let's walk from 2020,
because I think it's really important to recognize how cracked Huawei is, or even just
historically, like they've always been really good. Sure, initially they stole, like, Cisco
source code and firmware and all this stuff, but then they rapidly passed them up as well as
every other telecom company. In 2020, they released an Ascend chip and submitted it to impartial
public benchmarks. And they were the first to bring seven nanometer AI chips to market. They were
the first to have that, right? Now, you can still say Nvidia was ahead, but the
gap was like nothing, right? And this is when they could access the full foreign supply chain.
This was when they just passed Apple to be TSM's largest customer. They were, you know,
clearly ahead of everyone on a manufacturing, supply chain, sort of design standpoint on a total
basis, right? Now, of course, Nvidia still had higher market share, but it was so nascent then,
like they could have really taken over the market. Huawei got banned by the first Trump
administration from accessing it, and then it went into effect in 2020, right, the full ban.
And so they were only able to make a small volume of these chips, but they had trained
significant models on these chips that they made then. And then over the next couple years,
right, Nvidia continued to accelerate. Huawei, because they were banned from TSM, had to go and
try and figure out how to manufacture at SMIC, the domestic TSMC. And then they were also in parallel
trying to go through shell companies to manufacture at TSM and acquire memory from Korea and so on and so forth.
So by the end of '24, this had gotten into full swing and it was caught, right?
It was caught and they finally shut it down.
But they were able to acquire 3 million chips, 2.9 million chips from TSM through these other entities, right?
roughly $500 million worth of orders,
which ends up being a billion dollar fine
that the US government gave TSM, if I recall correctly.
At least there was a Reuters article of that.
I don't know if they actually issued it,
which is important and interesting to gauge
because the number of Ascends floating out there
has not consumed this entire capacity yet.
So now we get to 2025, right?
The H20 got banned in the beginning of the year,
Nvidia had to write off
huge amounts of money.
Our revenue estimate for Nvidia in China
for just H20 was north of 20 billion,
because that's what they were booking in capacity
slash had to write off.
And then it got banned. They cut the supply chain.
Like, they just said, no, we're not doing this anymore.
They had their inventory. It gets re-approved,
they resell the inventory, but now they're like, do we even
restart production,
is Nvidia's question.
And now you have China saying,
hey, like, we don't need
Nvidia, we have domestic alternatives.
Whether it be Huawei or Cambricon, these companies have capacity,
but most of this capacity is still foreign produced, right?
Whether it be wafers from TSM, memory from Korea,
Samsung and SK Hynix.
So the question is sort of like, how much can they do domestically?
And there's sort of two fronts there, right?
There's the logic, i.e. replacing TSM,
and there's the memory, i.e. replacing Hynix, Samsung, Micron.
And on the logic side, they are behind, but they're really ramping there.
And I think they can sort of get to the production capacity estimates needed.
And the U.S. is still allowing them to import all the equipment necessary, pretty much.
The bans are really for beyond the current generation of technology.
Beyond 7 nanometer, the bans are really for 5 nanometer and below.
Even though the government says they're for 14 nanometer,
the actual equipment that's banned is only for below 7 nanometer.
And so they'll be able to make a lot of 7 nanometer AI chips
and maybe even get to 5 using existing equipment
for 5 nanometer rather than, like, taking the new techniques.
And so like there's the logic side and then there's the memory side.
And the aspect of Huawei's announcement that was surprising
was that they're doing custom memory, right?
Yeah.
That's the part that is sort of like, hey, this is really exciting.
They announced two different types of chips for next year.
one that's focused on recommendation systems and pre-fill
and then one that's focused on decode.
That's a trend these days.
Yeah.
So Nvidia, the same thing.
They just announced a pre-fill-specific chip recently.
There's numerous AI hardware startups that are really focusing on pre-fill versus decode.
And so the sort of split of inference into two workloads,
you know, Huawei's doing the same thing for their next year chip.
And what's interesting is the decode one has, you know, custom HBM.
What does that mean?
What is the manufacturing supply chain?
because that's the one that's tricky, right?
How much can they manufacture of that custom HBM?
And Nvidia and others are also adopting custom HBM only starting next year, right?
So it's not like, you know... yes, the manufacturing capacity is not there.
Maybe it is going to consume a bit more power.
It's going to be slightly lower bandwidth.
But the fact that they're able to do, you know, some of the same things that
Nvidia plans to do, AMD plans to do in their memory is, you know, evidence that they're catching up.
But then the main question that remains is production capacity.
So as far as like, hey, Nvidia's banned in China, right?
Like they're saying don't buy Nvidia chips.
I think for a period of time, that's fine for China, right?
From a perspective of, hey, I'm China.
That's fine because you have all this capacity that you, you know, shipped in in 2024
that you haven't turned into AI chips.
Now you're turning them into AI chips.
You're running all that stockpile down.
What about the transition from running that stockpile down to ramping your new
stuff, right? And that transition is the one that's really tricky. China's either shooting itself
in the foot by not purchasing Nvidia chips during that time period or China's able to ramp. I think
they'll be able to ramp. I think it'll take a little bit longer. And there will be like a sort of a gap
in between where China probably backtracks and says it's fine. Like, ByteDance is, like,
begging for Nvidia chips, right? Like, they don't want to use... they use some Cambricon, they use some
of Huawei, but they really want to use
Nvidia because it's way better. They don't care about the domestic
supply chain. They want to make the best models. They want to deploy
their AI as efficiently as possible.
And so this is like, you know,
the government can mandate them to, like, not do it, right?
So it's not that Nvidia is not competitive. It's that the government's sort of
trying to instigate it.
And then like I guess the last sort of thing
is like, you know, there's always the argument of like
hey, if
banning
Nvidia chips to China is so good for China,
why didn't China do it for itself?
And now they're finally doing it for themselves.
So again, like it'll be interesting to see.
Smuggling is still happening, right?
Re-exportation of chips from, you know,
other countries to China. That is still happening
at some volume, low volume,
lower,
medium volume, right? But then,
you know, the direct
shipments of Nvidia chips that are legally
allowed to China are not
necessarily happening today, but may have to restart at some point because China won't have
the production capacity to, you know, they would just have so many fewer AI chips being
deployed domestically versus the U.S. And at some point, you kind of have to pick, like,
am I all about the internal supply chain or am I all about chasing, you know, super
powerful AI? Yeah. So is there an angle here about a negotiation angle as well? Because
currently there's still discussions ongoing, what exactly are the boundaries, what can be
exported to China. So these are well-timed announcements if you want to make a point that
the US should allow more exports. Do you think that's a factor or not?
Yes. So, you know, in the report we did a few weeks ago about the production capacity of
Huawei and the supply chain, there was a bit in there that we wrote about how, you know,
honestly, like, if you were China and you do want Nvidia chips, actually, how do you play
this, right? And it's by hyping up your
domestic supply chain.
And it's like, yes, we can do everything.
It's Huawei, announce the most crazy shit possible.
Announce seven years of fucking, or three years of roadmaps.
So you're saying they read your report, basically.
I think they knew.
They were already bid.
And then like, say, we're banning Nvidia, right?
And then it's like, then the government official is going to think, alongside sort of
lobbying from domestic players, like, of course we want to ship them better AI chips.
Like, we're losing this market.
We can't lose this market.
And it's sort of like, it is 10,000 IQ, right?
And we're here playing checkers while they're playing chess.
Well, so I guess negotiating chip aside, in that report, you talked about HBM or high bandwidth memory being a bottleneck to Huawei.
To your point on one of the surprising aspects of the announcement, do you think it's credible that it's no longer a bottleneck based on what they're saying?
Or are they, is it just hype?
I think production capacity-wise, it is still absolutely a bottleneck.
Certain types of equipment
required for making
HBM need to be imported.
They're working on domestic solutions,
but as far as we know, they have not imported
enough equipment for this.
Although if you look at Chinese
import data for different types
of equipment, right, there's sort of like...
fabs spend, you know, roughly,
it depends on the process technology,
but fabs spend roughly different amounts
of money on lithography,
etch, deposition, metrology,
right, like these different steps.
And historically,
lithography has hovered around,
you know, 17, 18%. With EUV, it grew to 25%, right?
But China, because they sort of wanted to stockpile lithography and
they were worried about the coming ban, they were importing lithography at a much higher
rate than that, right?
Like 30, 40% of their equipment imports were lithography.
And they were just stockpiling lithography equipment.
This has sort of, like, reversed now, in that, if you look
at the monthly import-export data, both into provinces
in China, but also out of countries, you can see that etch specifically is skyrocketing.
And the main thing about stacking HBM is that you have to, you know, when you have
each wafer, you have to etch, create this thing called a through-silicon via so it can connect
from the top to bottom, and then you stack them on top of each other, right?
12 high, 16 high for HBM.
That's how you make super high bandwidth memory.
And their imports for etch are, like, skyrocketing now.
So it's like, they don't have the production capacity yet.
How fast they can ramp it is a function of, A, how much equipment they can get, and B, like, the yields, right?
Improving yields is really hard in manufacturing. Intel and Samsung are really good and TSMC is just
amazing. Not that those companies suck, I think, is a better way to put it. And so, you know,
it's those two things. I think yield, they haven't even started production of HBM3, right?
They've only done some sampling of HBM2. HBM3 came out, like, a few years ago. So there's still
quite a ways to go on going up the learning curve.
Obviously, I expect them to catch up faster than it took the technology to be developed
because it exists in the world.
We know how to do it.
It's just a matter of actually doing it versus inventing it.
And then the other one is sort of the production capacity.
A couple months of import-export data is not enough to make up for years' worth of supply chain
build-up, which is what we have today in Korea for the Korean companies.
Now Hynix is also investing in the U.S., in Illinois. And then Micron's primarily in Japan... the American memory company's primarily in Japan and Taiwan, but they're also expanding in Singapore and the U.S. now. There's so much capital that's been invested, it would take some time for China to build up that production capacity to actually match the West. And when I say the West, I mean East Asia in production, non-China East Asia in production capacity. So it'll take some time to get there. And I think it's like, hey, we can
design this; it's always a question of can we manufacture. And then the thing, like, that Jensen
would say is, like, you're betting on China not being able to manufacture. Like, you know, it's a
matter of when, not if. And that's the whole calculus that, like, I think the US government has to
be aware of when they're like, hey, what level of AI chips do we sell? Do we sell everything?
Probably not because AI is far more powerful. And the end market of AI is going to be way larger
than the end market of semiconductors and equipment.
Do we sell, you know, what level do we sell at?
Well, how much can China make at each specific, you know,
sort of performance tier and then, you know, analyze that and what's the volume
and then figure out, like, what is okay, which is like maybe a little bit above
or around the same level.
Yeah.
So if you, to your point on, like, playing chess versus checkers, if you're Jensen,
what would your next move be given the situation at hand?
It's both, like, partially
true that he's afraid of Huawei
more than he is, like, an AMD.
Right. He called them formidable.
Yeah.
Well, like, I mean, like,
Huawei's beat Apple, right?
They passed Apple up in TSM orders.
They passed Apple up in phone market share.
Not in the U.S., but like in many parts
of the world before the bans
came down. And then even now they're
growing back again in market share without like
Western supply chains.
You know, they've done this to numerous other
industries. I would say Huawei is, like, a formidable
competitor, right?
Like, they've beaten a lot of industries.
And so it's reasonable that he's afraid of them.
It's sort of, you know, and he's not afraid of AMD.
So, like, I think, like, the best thing is, like, try and say as much that what
Huawei announced is reality rather than, like, their hoped-for target.
Yeah.
And throw away all doubt on manufacturing capacity, which I think is not fair, right?
I think manufacturing capacity is a real bottleneck for them.
And then the yield learnings, real bottleneck, like temporary, maybe.
We'll see how long and we'll see how fast the rest of the, you know, the Nvidia technology advances past what Huawei's capable of, right?
And how fast Huawei is able to close the gap.
But I think his main sort of pitch would be Huawei is real.
They're a formidable competitor.
They're going to take over not just the Chinese market, but also foreign
markets, right, whether it be the Middle East or Southeast Asia or South Asia or Europe or Latam,
right, everywhere besides America. And there's a, I think, I think Noah Smith has this analogy,
right? This whole idea is that you should Galapagos China, right? Make them have their own domestic
industry that is so different from the rest of the world, right? Kind of what happened with
Japan in the '70s and '80s and '90s. Their
PCs were so specific and hyper-optimized to the Japanese market with, like, you know, the weird,
like, I don't know if you've seen the weird scroll wheel on the, on these Japanese PCs. Like,
you literally, like, it's like, you go like this and it scrolls, right? And it's like, and then the
touchpad is a circle. And then that's around it. It's like, things like that are so weird.
Totally.
And the rest of the world doesn't care, but the Japan market likes it, right? And his whole
idea is like, let's Galapagos them, i.e. keep their technology within China. And then that's, like,
deadweight loss and they never expand outside, versus we serve the whole world. But the whole
risk is that the opposite can also happen, right? Our technology is hyper-optimized to running, you know,
language models at this scale and RL, and, you know, hardware-software
co-design can take you down a path of the tree that, like, is a dead end. And then China, like,
because they're not allowed to access this tree, they're like, oh, okay, and then they end up in the, like,
optimal spot, right? We hit a local maximum. They
hit a global maximum, right?
Like, that's sort of like technological Galapagosing, is sort of what Noah Smith's analogy is.
I like it a lot.
I don't know if it's accurate, but it's an interesting one.
Yeah, I love that.
Well, actually, maybe just taking a step back from current events, even though there's so much to talk about right now.
Last time you appeared with us, Nvidia came up, obviously.
And you talked about a couple of the potential paths forward for NVIDIA.
Give us maybe the bull case, the bear case.
Fair enough.
There's a lot embedded in their numbers now.
But what's interesting is consensus for the banks across, like, the hyperscalers.
So Microsoft, CoreWeave, Amazon, Google, and Oracle, right?
Meta, right?
So it's the six hyperscalers, right?
Who I would consider hyperscalers.
The consensus for the banks is $360 billion of spend next year
across all of them.
And my number is closer to, like, $450 to $500 billion.
And that's based on, like, you know, all the research we do on, like, data centers and, like,
tracking each individual data center in the supply chains, right?
So this is just Nvidia spend?
This is capex for the hyperscalers.
Right?
And that capex gets split up across different companies, but the vast, vast majority still
goes to Nvidia, right?
And Nvidia is in a position where they can't take share, right?
It's they grow with the market slash defend share.
And so the question is like, how fast is the growth rate of capex for hypers
and other users, right?
And the reason I included Oracle and CoreWeave as hyperscalers, even though they're
traditionally not called hyperscalers, is because they are OpenAI's hyperscalers.
So, you know, when you look and you look at the Oracle announcement, right?
Like, first of all, the Oracle announcement, I don't understand why people don't think this
is crazier.
They did the most unprecedented thing in the history of stocks and public companies ever.
They gave a four-year guidance.
And it made Larry the richest man in the world, you know, like all these things.
Anyways, you know, the question is like, how fast does revenue grow, right?
Do you think Oracle and OpenAI, which signed a $300 billion plus deal with Oracle,
will actually be able to pay $300 billion, right, across raising capital and revenue?
And it gets to a rate of, like, over $80 billion a year in just a handful of years, right?
So it's like, do you believe the market will grow that fast?
It's very possible, yes.
And it's very possible for, like, you know, OpenAI, what is their revenue going to be exiting next year?
Some people think $35 billion.
Some people think $45 billion.
You know, ARR, by the end of next year. This year, they have
hit 20, right? ARR. So if that growth rate is maintained, then all of that cost goes to
compute, plus all the capital they continue to raise, right? And again, their financials that
they sort of, like, gave to investors for their last round was like, hey, we're
going to burn, like, $15 billion next year. It's probably more likely going to be like 20. But like,
you know, you stack this on, and they're not turning a cash flow, they're not going to be profitable
until 2029. So you sort of have, like, they're going to continue to
burn 15, 20, $25 billion of cash each year plus revenue growth. That's their compute spend.
And you do this for Anthropic, you do this for OpenAI, you do this for all the labs.
It's very possible that the pie does get to, you know, not 360 billion
next year, 500 billion next year, for total capex. And the pie continues to grow for
hypers. Nvidia says, actually, it's going to be multiple trillions a year on AI infrastructure.
And he's going to capture a huge portion of it. That's his bull case, right? That's the bull case,
is AI is actually so transformative
and the world just gets covered in data centers
and the majority of your interactions are with AI
whether it's like, you know, business productivity
and telling an agent to do some code
or you're just talking to your AI girlfriend Annie, right?
Like it doesn't matter.
You know, all of this is running on Nvidia for the most part.
The bear case is, you know, even if it does grow a lot.
Yeah, go ahead.
Stay on the bull case for a second.
I think fundamentally the value creation, I think,
personally is there, right? I mean,
trillion dollars of value with AI,
I can totally see this happen.
So assume it's true, where will Nvidia top out?
I guess
how much do you believe in takeoffs, right?
Yes.
Yes, so like if there is like a takeoff scenario, right,
where like powerful AI builds more powerful AI,
builds more powerful AI,
or that creates more and more, you know,
each level of intelligence, like,
enables more for the economy, right?
Like how many, how many,
how many monkeys can you employ in your business
versus how many, like, humans, right?
You know, sort of the same, or how many dogs, right?
Like, you know, there's sort of like,
what is the value creation of a human versus a dog?
Sort of like the same with AI.
So, like, I mean, in this case,
the value creation could be hundreds of trillions,
if not, you know, the data after that.
Do you need this?
I mean, if you take every white-collar worker,
make them twice as productive with AI,
that's in the hundreds of trillions, isn't it?
Yeah, but like, what is twice, you know,
like, if you talk to people,
the labs, right? Like, twice as productive. What does that even mean? It's replaced them.
Right? It's, it's be 10 times better than death.
Like, I mean, like, I don't know how soon that.
If it's sort of white color work is essentially useless without a constant stream of LN tokens,
right, that make them productive, right? At that point, you basically can tax every single
knowledge work in the world, right? Which is most workers in the world long term.
Yeah. So, I don't know. What's your guess? Give us a number. What's the cap?
Cap? I mean, like, why aren't we making a Matrioshka brain? Like, I don't know.
Like, I mean, at some point, the machine says humans don't need to live and we need even more compute.
One step before that, right?
Are we colonizing Mars yet?
TBD.
I don't know, man.
I find it, like, completely, like, impossible to predict anything beyond five years, given how much stuff is changing.
Like, that's, like, I'll leave it to economists, right?
Like, you know, like, honestly, like, you know,
supply chain stuff is like three, four years out and that's it.
And then fifth year is like sort of like YOLO, right?
So like I just try and ground myself in the supply chain stuff, right?
Like it's like a, you know, supply chain and then like,
what is the adoption of AI?
What's the value creation?
What's the usage?
And you can see that in like a short horizon.
Beyond that, like, I don't know, like,
are we all going to be connected to computers, like BCIs and stuff?
Like, I don't know, dude.
Or are humanoid robots, are they going to be, you know. I mean, you saw Elon's thing, right? Like, he's like,
yeah, humanoid robots are why Tesla's worth more than 10 trillion. So go, hey, great, what is all that
being trained on? Great, Nvidia. Okay, awesome. So that's worth also 10 trillion, right? Like, I don't,
I don't know. Like, uh, it's too out there for me. I don't like the out-there discussions.
Very fair. Um, read some sci-fi books. So just pulling out the thread where you talked about, I mean,
this is kind of a throwaway comment, but how market share can't really grow just because it's such a dominant market share.
And we talked about, or you guys talked about, the moat of Nvidia last time.
And obviously, this moat is tied to maintaining that very high market share that they currently have.
And I love this sort of historic journey you took us through with Huawei just earlier.
Can you kind of walk through what Nvidia did throughout history to build their moat?
It's super awesome because, you know, they failed multiple times in the beginning,
and they bet the whole company multiple times, right?
Like, Jensen's just crazy enough to bet the whole company, right?
Like, whether it was, like, certain chips ordering volume before he knew it even worked,
and it was, like, all the money he had left, or, like, ordering volumes for projects he had not won yet,
like, I heard a rumor that, or not a rumor, but, like, a story from someone who's, like, a gray beard in the industry,
and I think would know was like,
you know, no, no, no, like,
Nvidia ordered the volume for the Xbox
before Microsoft gave them the order.
They were just like,
he was just like, fuck it, YOLO.
Yeah, right?
I don't know, like, I don't know how true this is.
I'm sure there's more nuance there, like, you know,
verbal indication or whatever,
but like the order was placed before he got the order, right?
Like, is what he said.
You know, there's cases like with the crypto bubbles, right?
Like, there was a couple of them,
But like,
Nvidia did their
damn best to convince everyone
on the supply chain
that it wasn't crypto
and that it was gaming,
real demand,
it was gaming and data center
and professional visualization.
And therefore,
you guys should ramp your production
and they all ramped production
and spent all this CAPEX
on increasing production
and building out new lines for them.
And they pay per item
and then they bought them
and sold them
and made shit loads of money.
And then when it all fell apart,
they just had to write down
a quarter's worth of inventory.
Whatever. Everyone else was like, well, crap, I have all these empty production lines, right? And so it's like, you know, but, but like what did AMD do then, right? Their chips were actually better for crypto mining, right? On a, you know, amount of silicon, uh, cost versus how much you hash, but like they just didn't, they just, AMD was like, ah, we're going to not really raise production, right? Like, as a reasonable, you know, thing, right? It wasn't a, it's sort of like strike while the iron's hot. And so like, you know, the same has happened with Nvidia, right? They've, uh,
in recent times, like, sort of,
they've ordered capacity that no one believes, right, multiple times.
They see the end demand, obviously.
But in many cases, they're just like,
their number for, like, Microsoft was higher
than Microsoft's internal planning, right?
And then Microsoft's internal planning went up,
but, like, their number for Microsoft was way higher.
And it's like, oh, we just don't think Microsoft's going to need this much,
even though they tell us this.
It's like, who the heck?
It was like, no, no, no, no, customer, you're going to buy more.
Like, and orders, right?
And then when the orders come through the supply chain, it's like, I have to pay NCNR, right, non-cancelable, non-returnable, like, you know, this is.
Hey.
You know, this is, uh, I asked a question in Taiwan once.
Uh, there was like a, it was, it was, it was Colette, who is the CFO, and Jensen, the CEO.
They were, they were both there.
Um, and it was, it was a room full of like, mostly finance bros and they were asking stupid finance questions like three days before earnings.
So obviously they just could not answer anything.
because it's like, you know, SEC regulations.
But then my question to them was like, look, Jensen, you're like so vibes,
like driven and like very gut feel and like very visionary.
And then that's, you know, CFO, like, she's amazing in her own right.
But like, you know, those personalities clash, how do you work together?
And he's like, I hate spreadsheets.
I don't look at them.
I just know, right?
Is his response.
And it's like, of course, you know, the best innovators in the world have really good gut instinct.
Right.
Right. And so like the gut instinct to like order with, you know, with non-cancelable, when you don't know, and they've had to write down over their history multiple times, right? Many, many billions of dollars in cumulative orders, right?
So cumulative, in total orders. Whether it be, you know, the H20, which is more regulatory, but like other cases they've ordered and had to cancel.
Is that many billions? It's many billions. Peanuts.
Well, well, it depends, right? The crypto write-down was like multiple billion when their stock
was like less than $100 billion, right?
Like, it's like a...
That's compared to the upside, right?
I think everything they did was right.
I think everything AMD did was wrong,
like, you know, in that scenario.
But like, it is crazy to...
Especially in a cyclical industry like semiconductors
where companies go bankrupt all the time,
which is why we have all this consolidation,
is every down cycle, companies go bankrupt.
I mean, if you look at it from a risk-return perspective, right?
These bets were totally worth taking.
Yes.
If you look at it from, I'm a CEO, I want to have predictable quarters for Wall Street.
It's a very different story.
I think that's a part of the tensions from now.
Yeah, so we, I don't know if you've seen these like Lee Kuan Yew edits where they're like him like saying some like fiery speech.
And then like it's like some cool music at the end and it's like showing different pictures of them.
And so we made one of Jensen recently and put it on social media right on like Instagram, TikTok,
XHS, Redbook, right?
Twitter, of course, right?
like all the different social media.
And I really liked it because he's like,
he's like, you know, the goal of like playing is to win.
And the goal or sorry,
and the reason you win is so you can play again.
Right.
And he compared it to pinball where like actually you just play all day and you keep getting
more rounds.
And it's like his whole thing is like, I want to win so I can play the next game.
Um, and like it's only about the next generation, right?
It's only about now, next generation.
It's not about 15 years from now because that's,
it's a whole new playing field every time or five years from now.
I think that's, you're right,
the risk-reward is correct.
Yeah.
But few people take these kinds of risks.
It's the only semiconductor company that's worth,
you know, I think even north of $10 billion,
that was founded as late as it was.
Like MediaTek was in the early 90s and then,
Nvidia and everyone else is like from the 70s mostly.
Yeah.
From big ones.
Yeah.
Yeah.
Yeah, I think you raised this great point on
these bet-the-farm bets, and he's actually been wrong a couple times, to your point.
Mobile, right? Like, what the hell happened with mobile? Exactly. And he still takes them. And I think
Marc actually had this great conversation with Erik where he talked about being founder-run,
where you have this memory of the risks you took to get to where you are today, right? And so
in a lot of cases, if you're a CEO brought on later on, you're sort of like, okay, continue to steer
the ship as is. Um, but in this case, he remembers all the times they almost went belly
up and he's like, I've got to bet. Keep making bets like that.
How do you think he's changed over? I mean, he's been one of the longest running CEOs, over 30 years.
He's kind of right up there with Larry Ellison now. How do you think he's changed over the last
30 years or so?
I mean, obviously, like, I'm 29. I don't know what he was like.
I've watched a lot of old interviews. I won't say he wasn't...
He's been CEO longer than you've been alive.
Yeah, exactly. Exactly. Like, Nvidia
was founded before I was born. I'm '96, right?
Like, you know?
Yeah, maybe anything over the last
couple of years, right? I think even like
watching old interviews, right? Like, I watched
a lot of old interviews, a lot of old, like,
presentations he's given.
One thing is that he's just like sauced up and
dripped up, like, wait, like, the charisma
he's gotten has only gotten stronger.
Right?
Yep.
Which is an interesting point.
I don't know if it's quite relevant.
I don't agree with that, yeah. But like,
the man, like, has
learned to be a rock star more, even though he was always charismatic. It was like, he's a complete
rock star now. And he was a rock star, you know, a decade ago, too. It's just people maybe didn't
recognize it. I think the first live presentation of his that I watched, it was streamed, was like,
um, it was, what's the, it's CES, like 2014 or 2015 or whatever. Um,
it's the Consumer Electronics Show. I'm, like, a mod. I'm
moderating, like,
gaming hardware subreddits, right?
Like at the time, I'm a teenager.
And like the dude is like
talking only about AI.
He's telling, like, all these gamers
about AlexNet and self-driving cars.
Right?
It's like, know your audience, first of all,
but also like, like,
it has nothing to do with consumer electronics
or gaming.
You know, at the time, I was also like,
I was half like, holy crap, this is amazing,
but also half like,
I want you to
announce new gaming GPUs, right?
Like, you know, but I know like on the forums,
on the forums, quickly everyone
was like, you know, screw this,
you know. I want to hear about the gaming
GPUs, Nvidia's price gouging.
Like, you know, of course, Nvidia's always had this, like,
we price to value and, like,
plus a little bit, right? Because we're
just smart enough to know.
You know, I'm guessing Jensen just has
the gut feel of how to price things, right?
He'll change the price like, at least on
gaming launches, he'll change the price up until like
right before the presentation. Wow.
So like it really is like a gut feel thing probably.
And anyway, so, so he, he had that charisma to know what was right.
But I think people, a lot of people were like, oh, no, whatever, Jensen's wrong.
He doesn't know what he's talking about.
But now, like, he talks, people are like, oh, very, very, you know, so it might just be
that he's been right enough.
Yeah, there's a post on X recently that said he had moved up into God mode with a select
group of CEOs, but that this was, like, it's exactly.
Who's the other gods?
It was Zuck.
Pretty other gods.
Elon.
Elon, Zuck, and Jensen.
Nice, nice.
Okay.
Good crew to be in.
So we pray to Silicon Valley.
The cult now?
Exactly.
Just on one last thing on people.
You mentioned Colette, his CFO, and there's sort of a famously loyal crew at
NVIDIA, even though all of the OGs could retire at this point.
Is there anyone akin to a Gwynne Shotwell at SpaceX or previously a Tim Cook to Steve Jobs at Apple that is at Nvidia today?
I mean, he had two co-founders, right?
Like, that's, you know, let's not overlook that.
One of them's like, you know, not involved and hasn't been for a long time.
But the other one was involved up until just a, you know, a few years ago, right?
So it's not just Jensen running the show, although he was running the show.
there's quite a few people on the hardware side.
I've always, there's someone at Nvidia that's like mythical to me.
Like when you talk to the engineering teams, he leads a lot of engineering teams.
He's a private person, so I don't want to say his name actually.
Fair enough.
But, you know, he's like, he's like effectively, like, chief engineering officer is, like, his role.
And people within his org will know who he is.
and I think there are people like that,
but, you know, he's intensely loyal,
and there's a number of these types of people.
There's another fella who's like, you know,
like there's all these like innovative ideas at Nvidia,
and he's the guy who literally is like,
we need to get the silicon out now, we're cutting features.
And that's like what he's famously known for,
and all the technologists in Nvidia hate him.
This is like a second guy.
This is a second guy.
Also intensely loyal to Nvidia, has been around for a long
time, but it's like, you know, it's sort of like, when you have such a visionary company
and forward, you know, one problem is that you get lost in the sauce, right? You know, oh, I want to
make this. It's got to be perfect, amazing. And it's like, you know, you got to have that sort of
like, and these people are like, you know, obviously they're close to Jensen for a reason because
Jensen also believes like these things, right, have the visionary future outlook, but also like
screw it, cut it, we'll put it in the next one, ship, right? Like, you know, ship now, ship faster.
like in a space like silicon, which is like really hard to do so.
And sort of like the thing about Nvidia that's always been, you know, super impressive.
And it's from the beginning days where he's talked about this before is their first chip,
their first successful chip, they were going to run out of money.
And he had to go get money from other people to even finish the development.
And even then he just had enough money, because he'd already had a failed chip before this,
that when the chip came back, it had to work, otherwise it would not, you know. And so they were like,
because they could only pay for, it's called a mask set, right? Basically you put these, like,
I'll call them stencils, into the lithography tool, and then it, like, says where the patterns are. And
you, you know, you put the stencil in, you deposit stuff, you etch it off, you deposit materials on the
wafer, etch away, and you put the stencil in and, like, you tell it where to put stuff, right?
And then the deposition and etch keeps happening in those spots, and you stack dozens
of layers on top of each other, and then you have a chip.
These stencils are custom to each chip, right?
And they cost today on the order of
tens and tens of millions of dollars.
But even back then, it was still a lot of money.
It wasn't that much then, of course.
They could only pay for one set.
But the typical thing with semiconductor manufacturing
is, you know, as good as you can simulate it,
as good as you can do all the verification,
you'll send a design in and you have to change it.
there's going to be something.
It's so hard to simulate everything perfectly.
And the thing about Nvidia is
they tend to just get it right the first time.
Even great executing companies like
AMD or Broadcom or whoever,
they often have to ship,
they're denoted in like A and then a number
or B and then a number,
so it's like two different parts of the masks.
So like, Nvidia always ships A0.
Almost always. They sometimes ship A1.
And a lot of times, even if they'll start production of the, you know,
A is basically the transistor layer, then the number is like the wiring that connects all the transistors together.
So, Nvidia will start production of the A and ramp it really high
and then just hold it right before you transition to the metal, just in case they do need to change the metal layers.
And so, like, the moment they're ready and they've confirmed that it works,
they can just, you know, blast through a lot of production.
Whereas everyone else is like, oh, let's get the chip back.
Oh, okay, A0 doesn't work.
We've got to make this tweak, make this tweak, and get the chip back.
It's called a stepping, right?
At Intel, we were very jealous of
Nvidia at that time, right?
They consistently delivered
on the first one; we did not.
The data center CPU group,
there was one product where, you know,
I said A0, A1,
or you go to B if it's,
you have to change the transistor layer as well.
So it's like B.
Nvidia, sorry, Intel got to like E2 once.
E2.
Like that's like a 15th revision.
This is, this is.
It was like the peak
of AMD's, like when they went skyrocketing on market share versus Intel was when Intel was at
E2, right? Like 15th stepping. Because there's quarters of delay, right? I mean, it's catastrophic
for a go to market. Yeah, each time is a quarter of delay or something, right? Yeah. So it's, it's
absurd. So I think that's the other thing about Nvidia is like, you know, screw it, let's ship it.
Let's get the volume. ASAP. Let's, let's, let's, you know, let's do these things that, you know,
And so anyways, they, like, you know, have some of the best simulation,
verification, et cetera, that lets them sort of go from design, you know,
from idea to shipment as fast as possible, you know,
cutting out any unnecessary features that could delay it,
making sure they don't have to do revisions.
So they can get, you know, they can respond to the market ASAP.
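The stepping names in this exchange (A0, A1, B0, up to E2) can be put in order with a short sketch. It assumes, purely for illustration, that every base-layer letter is followed by exactly three metal revisions (0, 1, 2) before the next letter, which is what makes E2 land on the 15th stepping mentioned above. Real products skip revisions, and the function name and its parameter are hypothetical.

```python
import string


def stepping_index(name: str, metal_revs_per_base: int = 3) -> int:
    """Return the 1-based position of a stepping such as 'E2'.

    The letter is the base (transistor) layer revision and the number is the
    metal (wiring) layer revision. Assumption: each letter goes through
    `metal_revs_per_base` metal revisions before the next letter starts.
    """
    letter, number = name[0].upper(), int(name[1:])
    if letter not in string.ascii_uppercase:
        raise ValueError(f"unknown base revision: {letter!r}")
    if not 0 <= number < metal_revs_per_base:
        raise ValueError(f"metal revision out of range: {number}")
    base = string.ascii_uppercase.index(letter)  # A -> 0, B -> 1, ...
    return base * metal_revs_per_base + number + 1


for step in ("A0", "A1", "B0", "E2"):
    print(step, stepping_index(step))
```

Under that assumption A0 is the first stepping and E2 is the fifteenth, so at roughly a quarter per stepping the delay compounds into years.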
There's a story about how Volta, which was the first Nvidia chip with tensor cores,
you know, they saw all the AI stuff on the prior generation, P100,
Pascal, and they decided we should go all in on AI, and they added the tensor cores to Volta
only a handful of months before they sent it to the fab.
Like they said, screw it, you know, let's change it.
And it's crazy.
And it's like, if they hadn't done that, who would have, maybe someone else would have taken
the AI chip market, right?
So there's all these times where they just, and it's, those are major changes, but there's
often like minor things that you have to tweak, right?
number formats or like some architectural detail.
Nvidia is just so fast.
The other crazy thing is they have a software division
that can keep up with that, right?
I mean, if you come out with the chip, right,
and basically no stepping required,
it's immediately in the market,
then being ready with drivers
and all the infrastructure on top of that, that's just super impressive.
Yeah, I love that point because you think of
Nvidia benefiting from tailwind after tailwind,
but I think both of you're saying,
you have to move fast enough and execute well enough
and take advantage of those tailwinds.
And if you think about, and by the way, I loved your CES story.
I'm just envisioning him more than 10 years ago talking about self-driving cars.
But, you know, if you think about nailing the video game tailwind, VR, Bitcoin mining, obviously AI now.
You know, one thing that, or one of the things that Jensen talks about today is robotics, AI factories.
Maybe my last question on Nvidia, what do you think about the next 10 to 15 years?
I know calling beyond five is hard,
but like what does
Nvidia's business look like?
It's really a question of
and this is like
I think every time I've talked to
you know
some executives at Nvidia
have asked this question because I really want to know
and they won't answer it obviously
but it's like what are you going to do with your balance sheet
like you are the most high cash flow company
and like
you have so much cash flow
now the hyperscalers are all taking their cash flow
like way down right
because they're spending on GPUs
what is
what are you going to do with all this cash flow right
like you know even even before this whole takeoff
he wasn't allowed to buy Arm, right?
so so what can he do
with all this capital and all this cash
Right? Even this $5 billion investment in Intel,
there's regulatory scrutiny there right
like it's in the announcement
like, yeah, this is subject to review, right?
Like, you know, I imagine that it'll get passed,
but, like, he can't buy anything big.
He's going to have hundreds of billions
of dollars of cash on his balance sheet.
What do you do? Is it
start to build AI
infrastructure and data centers? Maybe.
But, like, why would you do that
if you can just get other people to do it?
And just take the cash.
Well, he's investing those, right?
Investing peanuts.
Right?
You know, like,
he gave recently, like, CoreWeave
a backstop, because today it's really hard to find a large number of GPUs for burst
capacity, right? Like, hey, I want to train a model for three months, right? I have my base
capacity where I do my experiments, but I want to train a big model for three months.
We know from our portfolio. Yeah, yeah. So, like, Nvidia sees this issue. They think it's a real
problem with startups. It's why the labs have such an advantage. But what if I could, you know,
right now, like, you know, most companies in the valley spend, what, 75% of their round on
GPUs, right?
At least, yeah.
Yeah.
What if you could do
75% in three months
on one model run, right?
You know?
Yeah.
And really scale and have some
sort of like competitive product
and then you have the model.
Then you raise more capital, right?
Or start deploying, right?
What do you do with it?
Is it start buying
a crap load of humanoid robots
and deploying them?
But like they don't really make good software.
They don't make really that amazing software for them.
In terms of the models, right?
They make, you know, the layer below is great.
Where they deploy their capital is like
the question.
He has been investing up and down the supply chain
a little bit though, right?
Investing in the neoclouds,
investing in some of the model training companies.
Yeah, but again,
small fries.
Like, he could have just done
the entire Anthropic round if he wanted to.
Of course he didn't, right?
And then, like, really got them to use GPUs.
Or like, he could have done the entire, you know,
OpenAI round.
Is he going to do an xAI round?
Do you think these are things he should be doing?
Or what's...
I mean, like...
Yeah, good question.
I don't know, right?
I think, like, we'll quote you up for the next round that we're raised.
But anyways.
He could make venture a dead industry.
No, she's kidding.
Take all of the best rounds.
But it's a lot of business, yeah.
You can do the seeds and then have Jensen mark you up.
That's why the word.
Well, I don't think.
I like it.
I think picking winners is obviously really tough for him because he has customers all across
this ecosystem.
If he starts picking winners, then, like, his customers will be even more anxious
to leave and give
even more effort to, whether it's AMD or some startup or their internal efforts,
et cetera, et cetera, right?
Buying TPUs, whatever it is.
Like, you know, people will, he can't just like invest in these.
Like, you know, he can do a little bit, right?
A few hundred million in an OpenAI round is fine.
Or a few hundred million in the next xAI round is fine.
CoreWeave, right?
Like, yeah, everyone's like throwing a fuss about it.
But it's like he invested a couple hundred million plus, you know, early on,
plus, you know, rented a cluster from them
for internal development purposes
instead of renting it from a hyperscaler,
which is cheaper for Nvidia to do, right?
It's better for them to do it from them
than the hyperscalers.
It's like, did he really, like,
is he really backstopping
CoreWeave that much, right?
Or, you know, any of the other customers
or neoclouds?
Like, there's some investment,
but it's more like, this is a good cloud,
you know, we'll throw like five or 10% of the round, right?
It's not like he's taking 50% plus of the round.
Is he also reshaping his market?
I mean, look, a couple of years ago,
there were four big purchasers of these cards.
You just listed six.
To what extent is that...
That's him in Nevis and Leibniz.
There's a long list there.
Of course.
For Matt, yeah.
Is that a strategy?
It is.
I think it absolutely is.
But he didn't have to put much capital down to do this.
Like, just ship to one earlier than the other?
I don't know.
Yeah, that's...
No, but it's like, if you look at the grand amount of capital,
spent investing in the neoclouds,
it's, it's
a few billion dollars. But he has
a lot of other levers if he wants to. Right, right.
Allocations, as you mentioned.
What's nice is, you know, historically,
you gave volume discounts to the hyperscalers.
But because he can use the argument
of antitrust, he's like, everyone gets the same
price.
So fair. It's very fair.
It's very fair. You know?
So what should he do with the cash? Or what
should guide his
I mean, I think like, you know, like there's the argument he should invest in data centers
and only the data center layer, not the not what goes in the data center so that more people
build data centers. And then if the market demand continues to grow, data centers
and power are not the issue, right? Invest in data centers and power. I've said that to them.
They should invest in data centers and power, not in the cloud layer, because the cloud layer
is quite commoditized, but, it's commoditize your complement, right? It's the whole
phrase. And I won't say being a cloud is commoditized, but it's certainly like, you have
a lot of competitors who are decent now.
And you've educated the commercial real estate and other infrastructure investment firms
into going into AI Infra as well.
So, like, I don't think it's the cloud layer that you invest in, right?
Do you invest in data centers and energy?
Yeah.
Do you invest in that?
Because that's the bottleneck for your growth, really.
Is A, how much people want to spend and can spend, and B, the ability to actually put them
in data centers.
and then like robotics
and like I think there's like areas he could invest in
but nothing requires $300 billion of capital
So what do you do with the capital?
Like I really I really don't know
and I like feel like Jensen has to have some idea
there's some visionary plan here
because that's what shapes the company right
is I mean they could sell
they could they could just continue to
you know, imagine $200 billion of free cash flow,
$250 billion of free cash flow a year.
what do they do with it like do they just buy back stock
forever? Like, do they go Apple route?
The reason why Apple
hasn't done anything interesting in like,
you know, nearly a decade is,
you know, they've got a non-visionary
at the head. Tim Cook's great at supply chain.
And they're just plowing the money
into buybacks. They're not really, you know,
automotive, the self-driving car thing failed.
We'll see what happens with AR/VR.
You know,
we'll see what happens with wearables, right?
But like Meta and OpenAI
might be even better than them. We'll see, like,
in others, right? So, so, what
he invest in, I have no clue, but nothing
what requires so much capital
is the tough question.
It actually gets a return.
Because the easy thing is like
my cost of equity, right? I just buy back.
And this completely changes the company culture. I think that's another
thing, right? There are probably areas he could invest
it in, but you suddenly end up with the company doing
two completely different things, which are very difficult to
keep on it. But they do like 10 completely
different things, right?
I mean, one way to look at is we build AI
infrastructure. And in the guise of we build
AI infrastructure, robots, humanoids
around the world, are AI infrastructure,
or data centers and energy is AI infrastructure, right?
Like, you know, like...
So the humanoids would totally work, right?
If you're suddenly pouring concrete
and building power plants, it has completely different cultures,
completely different set of people.
It's very much harder.
Okay, agree.
But there's different ways to invest in the various companies
or like backstop, like, the building of power plants,
right?
Like, you know, there's no one who wants to build power plants
because they're 30-year underwriting things.
You know, there's all these different areas
where he could use capital to, you know,
allow something to happen, right?
Not necessarily owning it himself.
And look, back in my time at Intel,
one of the biggest problems we had
was that our customer base sucked, right?
I mean, we were selling to,
most of the chips went into the large hyperscalers,
you know, which, they're way too concentrated,
and they build their own chips, and so they can push down your prices.
So, honestly, spending it on diversifying the cloud,
you know, the company was in 2014.
14, you guys should have just charged so much
that your margins were 80%.
What would the world have done?
Nothing.
The margins were pretty good, that guy.
That wasn't the problem. That was the primary problem.
They were 60, 65.
They were 80.
Still, yeah.
Oh, boy.
There was a Jetson.
It's the Jetson and a different program.
PTSD is kicking in here.
Well, wait, I think Guido's comment
is actually a really good segue
into something else we wanted to talk to you about,
which is the hyperscalers.
And one of the reasons that I love reading SemiAnalysis
is you guys make these out-of-consensus calls
that you're often right about.
And one of them recently...
Wow, is calling...
Only often?
You have a Jensen hit rate.
It's very high.
Where's my billion-dollar, you know, EV-positive bet?
But the one that caught my eye
was Amazon's AI
resurgence. So I wanted to talk to you a little bit about that just because, you know, I think
we found it pretty interesting being on the ground, helping our portfolio companies pick who
their partners are. And so we have some microdata on this. But could you sort of walk through why they're
behind? Yeah. So in Q1 2023, I wrote an article called Amazon's Cloud Crisis. And it was about
all these neoclouds are going to commoditize Amazon. It was about how,
Amazon's entire infrastructure was really good for the last era of computing, right?
What they do with their elastic fabric, ENA and EFA, right, their NICs, what they,
and the whole protocol and everything behind them, what they do for custom CPUs, etc., right?
Like, it was really good for the last era of scale out computing and not the era of sort
of scale up AI infra and how neoclouds are going to commoditize them and how their
silicon teams were focused on, you know, cost optimization,
whereas the name of the game today is
max performance per cost, right?
But like that often means you just drive up performance like crazy.
Even if cost doubles, you drive up performance more, it triples,
because then the cost per performance falls still.
That's sort of the name of the game today with Nvidia's hardware.
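The arithmetic behind "cost doubles, performance triples" can be written out in a few lines. This is a minimal sketch using normalized units rather than real chip prices or benchmarks; the function name and the baseline of 1.0 are assumptions made for illustration.

```python
def cost_per_performance(cost: float, performance: float) -> float:
    """Cost paid per unit of delivered performance (lower is better)."""
    if performance <= 0:
        raise ValueError("performance must be positive")
    return cost / performance


# Normalized baseline generation: cost 1.0, performance 1.0.
baseline = cost_per_performance(cost=1.0, performance=1.0)

# Next generation as described in the conversation: 2x cost, 3x performance.
next_gen = cost_per_performance(cost=2.0, performance=3.0)

# Ratio below 1.0 means each unit of performance got cheaper.
print(round(next_gen / baseline, 3))  # 0.667
```

So a chip that costs twice as much still lowers cost per unit of performance by about a third, as long as performance grows faster than cost.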
And it ended up being like really good call.
Everyone like was calling us out like, no, you're wrong.
And this was like when Amazon was like the best stock.
and Microsoft really hadn't started taking off yet,
and nor had all these other, you know, Oracle and so on and so forth.
And since then, Amazon has been the worst performing hyperscaler.
And the call here is that, you know,
they still have structural issues, right?
They still use elastic fabric, although that's getting better,
still behind Nvidia's networking, still behind Broadcom-
slash Arista-type networking, NICs.
They still use, you know, their internal AI chip is,
okay, but the main thing is that they're now
waking up and being able to actually capture business, right?
So the main call here is that since that
report, AWS has been decelerating revenue.
Year-on-year revenue growth has been falling consistently.
And our big call is that it's actually going to start re-accelerating, right?
And that's because of Anthropic.
It's because of all the work we do on data centers, right?
Tracking every single data center, when that goes online and what's in there.
the flow-through on costs.
If you know how much the chips cost,
the networking costs, the power costs,
you know how much
margins generally are for these things,
then you can sort of start estimating revenue.
So when we build all that up,
it's very clear to us that they've troughed
on
AWS revenue growth at this point.
This is the lowest
AWS revenue growth will be
on a year-over-year basis
for at least the next year.
And it's re-accelerating to north of 20%
again
because of all these massive data centers they have online
with Trainium and GPUs, right, depends on which one,
it depends on which customer.
The experience is not as good as, you know, say, a CoreWeave or wherever,
but the name of the game is capacity today.
CoreWeave can only deploy so much.
They only can get so much data center capacity,
and they're really fast at building.
But the company with the most data center capacity in the world,
still today, although they may
get passed up in the next two years, is Amazon. Actually, they will get passed up, based on what
we see. But incrementally, Amazon still has the most spare data center capacity that's
going to ramp into AI revenue over the next year. Let me ask a question. Is that the right type
of data center capacity? Like for the high density AI buildouts today, you need massively more cooling,
you need to have enough water close by, need to have enough power close by. Is it in the right
place or is it the wrong type of data center? So data center capacity, in this sense, I mean,
all the way from power is secured to substations built,
to transformers, to, you can provide the power whips to the racks.
Now, obviously, the data center capacity will differ, right?
You know, historically, actually, Amazon's had the highest density data centers in the world.
Right?
They went to, like, 40-kilowatt racks when everyone was still at 12.
And if you've ever stepped foot inside of most data centers,
they're, like, pretty cool and dry-ish.
If you step inside of an Amazon data center,
they feel like a swamp.
It feels like where I grew up, right?
It's like humid and hot
because they're like optimizing every percentage.
And so sort of like,
your point here is that like
Amazon's data centers aren't equipped
for the new type of infrastructure.
But when you compare them to the cost of the GPU,
like getting,
getting, you know, having a complex cooling arrangement is fine.
Right.
You know, we made a call on
Astera Labs a few months ago,
a couple months ago,
when they were like at 90, and it's gone to 250
the month after because
of the orders Amazon is
placing with them. But there's certain things with Amazon's
infrastructure. I won't get too much into it.
But the rack infrastructure requires
them using a lot more of, like, Astera Labs'
connectivity products.
And the same applies to cooling.
Right? So it's on the networking and cooling side.
They just have to use a lot more of this stuff.
But again, this stuff is inconsequential on cost
compared to the GPU.
You can build.
My question was more like, look, I may need a major river close by for cooling at this point, right?
It's in many areas I just can't get enough water.
And, you know, it's probably power in the same region.
There's two gigawatt-scale sites where they have power all secured,
wet chillers and dry chillers all secured.
Like everything's fine.
It's just not as efficient.
But, you know, that's fine, right?
Like, you know, they're going to ramp the revenue.
They're going to add the revenue.
Not that I necessarily think Amazon's internal models are going to be great,
or, hey, their internal chip is better than Nvidia's or competitive with TPU.
Or their hardware architecture is the best.
I don't necessarily think that's the case.
But they can build a lot of data centers and they can fill them up with stuff that will be rented out, right?
And it's a pretty simple, it's a pretty simple thesis.
How important has Anthropic been to the co-design for Trainium?
Because I remember we had a portfolio company.
this was summer, 2023, they invited them to AWS.
They spent, man, I think eight hours with them
over the course of a week trying to figure out Trainium
back then.
It was just impossible to work through.
Is that, you know, obviously that portfolio company
hasn't gone back and tried it now,
but like how different is it now based on what you're hearing?
Oh, it's still bad.
Okay, okay.
You know, it's tough to use.
So there's sort of like
This is sort of the argument that every inference company offers
including the AI hardware startups
is because I'm only running like three different models at most
I can just hand optimize everything
and write kernels for everything
and even like go down to like an assembly level right
How hard is that going to be?
It is pretty hard. It is pretty hard.
But like you tend to do this for production inference anyways
Like, you aren't using cuDNN, which is Nvidia's like library that's like super easy to generate your, you know, to generate kernels and stuff, right?
Like you're not, or not generate kernels, but anyways, you're not using these like ease-of-use libraries.
You know, when you're running inference, you're either, you know, using CUTLASS or stamping out your own PTX or, you know, in some cases, people are even going down to the SASS level, right?
And like when you look at like say an OpenAI or like, you know, an Anthropic, when they run inference on GPUs, they're doing this, right?
And the ecosystem is not that amazing.
Once you get all the way down to that level, it's not like using Nvidia GPUs is easy now.
I mean, you have an intuitive understanding of the hardware architecture because you work on it so much and everyone's worked on it.
And you can talk to other people.
But at the end of the day, it's not like easy, right?
Whereas, you know, Anthropic, Trainium or TPUs,
actually the hardware architecture is a little bit more simple than a GPU.
Larger, more simple cores, rather than having all this functionality,
you know, less general.
So it's a little bit easier to code on.
There's tweets from Anthropic people saying that,
when they are doing that low-level work, they actually prefer working on Trainium and TPU because of the simplicity.
No.
Interesting.
To be clear, Trainium and TPU...
I mean, Trainium especially is very hard to use.
Like, not for the faint of heart.
It's very difficult.
But you can do it if you're just running, like,
if I'm Anthropic and I must only run Claude 4.1 Opus or Sonnet.
And screw it.
I won't even run Haiku.
I'll just run Haiku on, like, on GPUs or whatever.
I'm just going to run two models.
And actually, screw it.
I'm just going to run Opus on GPUs too, and TPUs.
Sonnet is the majority of my traffic anyways.
I could spend the time.
And how often am I changing that architecture? Every four or six months, right?
Like how much?
It's not even changing that much honestly, right?
I mean, from three to four definitely did change, right?
Yeah, I mean, define architectural change.
You know, at a high level, like the primitives are more or less the same across the last couple
of generations.
I don't know enough about Anthropic's model architecture, to be honest.
But I think, I think from what I've seen at other places, there have been enough changes
that it takes time to, you know, program this.
and really get,
the main thing is like, you know,
if I'm Anthropic and I have,
what,
$7 billion ARR now or whatever,
north of 10,
you know, by the end of next year,
north of 20, right?
Like, ARR is like,
maybe even 30,
and my margins are 50%,
70%,
that's $15 billion of Trainium
that I need, right?
That I can run Sonnet on.
And most of that's going to be Sonnet
3.5,
or sorry, 4.5,
right? It's going to be one model serving most of the use cases. So, like, you know, I could,
I could spend the time and it'll work on this hardware. Yeah, totally. Maybe on the topic of
non-consensus calls you've made, and maybe I'll move to another cloud, in June, you guys said
that Oracle is winning the AI compute market. And then in this pod, we've already referenced
the big jump, obviously, that Oracle had. I think it was the single largest gain that a company
with over $500 billion of market cap
has ever had.
So an enormous...
Was the 2023 Q1 NVIDIA not bigger?
It might have been smaller.
Okay.
I think it was maybe close.
We'll fact check ourselves.
That's amazing.
But, you know, obviously this is the massive commitment
that was announced.
Can you walk us through
why you made that call then
and just sort of why Oracle is poised to do so well
in such a competitive space?
Yeah, so Oracle,
has the largest balance sheet in the industry that is not dogmatic to any type of hardware, right?
They're not dogmatic to any type of networking.
They will deploy Ethernet with Arista.
They'll deploy Ethernet through their own white boxes.
They'll deploy Nvidia networking, InfiniBand or Spectrum-X.
And they have really good network engineers.
They have really great software across the board, right again, like ClusterMax.
They were ClusterMax Gold because their software is great.
There's a couple of things that they needed to add that would take them higher,
and they're adding those, right?
To Platinum, right, which was where CoreWeave was.
And so, like, when you couple two things, right?
Like, OpenAI's got insane compute demand.
Microsoft is quite pansy.
They're not willing to invest in.
They don't believe OpenAI can actually pay the amount of money, right?
I mentioned earlier, right?
The $300 billion deal, OpenAI doesn't have $300 billion.
And Oracle is willing to take the bet.
Now, of course, there is a bit more security in the bet, in that
Oracle really only needs to secure the data center capacity, right?
So this is sort of like how we came across the bet, right?
And we've been telling our institutional clients, especially in like a super detailed way,
whether it be the hyperscalers or AI labs or semiconductor companies or investors
in our data center model, because we're tracking every single
data center in the world. Oracle doesn't build their own data centers either, by the way.
They get them from other companies. They co-engineer, but they don't physically build them
themselves. And so they're quite nimble in terms of being able to assess new data centers,
engineer them. So we saw all these different data centers Oracle is snatching up in deep
discussions, snatching up, signing, etc. And so we have, you know, hey, a gigawatt here,
a gigawatt there, a gigawatt there, right? Abilene, you know, two gigawatts, right? You have all these
different sites that they're signing up and in discussions with. And we're
noting them. And then we had the timeline because we're tracking entire supply chain.
We're tracking all the permits, regulatory filings, you know, through, you know, language
models, using satellite photos constantly. And then supply chain of like chillers,
transformer equipment, generators, et cetera. We're able to make a pretty strong estimate of
quarter by quarter in our data center model, quarter by quarter, how much power there is
for each of these sites. So some of these sites that we know of aren't even ramping until
2027, but we know that Oracle
signed it, right? And we
have the sort of ramp path. So then it's this
question of like, okay,
let's say you have a
megawatt, right? For
simplicity's sake, which is a ton of power,
but now it doesn't feel like much.
We're in the gigawatt era.
But, you know, if you're talking about a megawatt, right,
you fill it up with GPUs.
How much do the GPUs for a megawatt cost?
Right? Or actually, it's
even simpler to do the math, right?
If I'm talking about a GB200, right, each individual GPU is 1,200 watts.
But when you talk about the CPU, the whole system, it's roughly 2,000 watts.
At the same time, you know, all in, everything, for simplicity's sake, $50,000 per GPU, right?
The GPU itself doesn't cost that.
There's all the peripherals, right?
So $50,000 capex for 2,000 watts.
So $25,000 for 1,000 watts.
And then what's the rental price for a GPU?
If you're on a really long-term deal, volume, $2.70,
right, $2.60, in that range,
then you end up with, oh, it costs like $12 million
to rent a megawatt.
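The back-of-the-envelope math above can be written out as follows. This is a sketch under stated assumptions: the 2,000 watts per GPU (all-in), $50,000 capex per GPU, and $2.60 to $2.70 rental price come from the conversation; reading the rental price as dollars per GPU-hour, annualizing over 8,760 hours, and assuming full utilization are my assumptions, as are the function names.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the GPUs are rented all year


def gpus_per_megawatt(system_watts_per_gpu: float) -> float:
    """How many GPUs (with CPU, networking, etc.) fit in one megawatt."""
    return 1_000_000 / system_watts_per_gpu


def capex_per_megawatt(capex_per_gpu: float, system_watts_per_gpu: float) -> float:
    """All-in hardware capex needed to fill one megawatt."""
    return gpus_per_megawatt(system_watts_per_gpu) * capex_per_gpu


def annual_rent_per_megawatt(price_per_gpu_hour: float,
                             system_watts_per_gpu: float) -> float:
    """Yearly rental bill for one megawatt of GPUs at a flat hourly price."""
    return (gpus_per_megawatt(system_watts_per_gpu)
            * price_per_gpu_hour * HOURS_PER_YEAR)


gpus = gpus_per_megawatt(2_000)
capex = capex_per_megawatt(50_000, 2_000)
rent_high = annual_rent_per_megawatt(2.70, 2_000)
rent_low = annual_rent_per_megawatt(2.60, 2_000)

print(f"GPUs per MW: {gpus:.0f}")
print(f"Capex per MW: ${capex / 1e6:.1f}M")
print(f"Rent per MW per year: ${rent_low / 1e6:.1f}M to ${rent_high / 1e6:.1f}M")
```

Under these assumptions a megawatt holds about 500 GPUs, costs about $25 million in hardware, and rents for a bit under $12 million a year, which lines up with the figure quoted.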
And then each chip is different.
So we track each chip, what the CAPEX is,
so you know what each chip is.
You can predict what chips they're putting in which data centers,
when those data centers go online,
how many megawatts by quarter,
and then you end up with, oh, well, Stargate goes online in this time period.
They're going to start renting it this time.
It's this many chips.
Each Stargate site, right?
And so, therefore, this is how much OpenAI would have to spend to rent it.
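The build-up described here (megawatts online per quarter, times chips per megawatt, times rental price) might be sketched like this. The ramp schedule below is entirely hypothetical and only there to make the example run; the 2,000 watts per GPU and $2.70 figures are reused from the earlier example, and in practice each chip type would have its own numbers, as noted above.

```python
HOURS_PER_QUARTER = 24 * 365 / 4  # 2,190 hours


def quarterly_rental_revenue(online_mw_by_quarter: dict[str, float],
                             system_watts_per_gpu: float = 2_000,
                             price_per_gpu_hour: float = 2.70) -> dict[str, float]:
    """Estimate rental revenue per quarter from megawatts of GPUs online."""
    revenue = {}
    for quarter, megawatts in online_mw_by_quarter.items():
        gpus = megawatts * 1_000_000 / system_watts_per_gpu
        revenue[quarter] = gpus * price_per_gpu_hour * HOURS_PER_QUARTER
    return revenue


# Hypothetical ramp for a single site; NOT real data for any company.
hypothetical_ramp = {"Q1": 100, "Q2": 250, "Q3": 500, "Q4": 1_000}

estimate = quarterly_rental_revenue(hypothetical_ramp)
for quarter, dollars in estimate.items():
    print(f"{quarter}: ${dollars / 1e9:.2f}B")
```

Summing such estimates across every tracked site, each with its own chip mix and go-live date, is one way to arrive at a revenue forecast of the kind described.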
And then you price that out, and we were able to predict Oracle's revenue with pretty
high certainty, and we matched pretty dead-on what they announced for '25, '26, '27,
and we were pretty close on '28.
The surprise for us was that, you know, they announced some stuff for '28, '29,
data centers that
we haven't found yet, but we'll find them, right, of course.
and sort of like this methodology
lets you see sort of
hey what data centers are you getting
how much power
what are they signing
how much incremental revenue that is when that comes online
and so that's sort of the basis of our
Oracle bet
obviously in the newsletter we included a lot
less detail but
you know you know sort of
it was that thesis right that like hey
they have all this capacity
they're going to sign these deals.
In our newsletter, we talked about two main things.
We talked about the OpenAI business,
and then we talked about the ByteDance business.
And presumably tomorrow, on Friday,
there's going to be an announcement about TikTok and all this.
But like the ByteDance business,
huge amounts of data center capacity
that Oracle is also going to lease out to ByteDance.
So we did the same methodology there.
With ByteDance, it's pretty certain they'll pay
because they're a profitable company.
With OpenAI, it's not.
and so there's got to be some error bars
as you go further out in terms of like
will OpenAI exist in '28, '29, '30
and will they be able to pay the 80 plus billion dollars a year
that they've signed up to Oracle with?
That's the only like risk here.
And if that happens, then Oracle's downside
is also somewhat protected
because they only signed the data center,
which is a minority of the cost, right?
The GPUs are everything.
And the GPUs, they purchase one to two quarters
before they start renting them.
So they're not, you know,
the downside risk is pretty low for them
in terms of,
if they don't get the deal. Well, they don't get the revenue,
but it's not like they're stuck with a bunch of assets they bought that are worthless.
Yeah. Yeah.
Is there another angle here? I mean,
OpenAI and Microsoft were sort of BFFs,
and now they've filed divorce papers,
and they just want to diversify,
and then that's pushing them away towards other providers?
Yeah, so Microsoft was exclusive compute provider.
It got reworked to right of first refusal.
You know, and then Microsoft...
Is it no last choice or something like that?
No, it's still...
It's still right of first refusal, but it's like, Microsoft, those two are not mutually exclusive.
Well, if OpenAI is like, we're going to sign an $80 billion contract or a $300 billion contract for the next five years, you guys want it?
Or, you know, it's like, and they're like, no, what?
Okay, cool, right?
And it's like, it's like, and then they go to Oracle, right?
And it's, sort of like, this is the, you know, OpenAI needs someone with a balance sheet to actually be able to pay for it, right?
and then they'll make tons of money
off of OpenAI
on the margins on the compute and the infra
and all these things but
someone's got to have a balance sheet
and OpenAI doesn't have a balance sheet
Oracle does
although given the scale
of what they signed we also
had another source of information
which was that they were
talking to debt markets
because Oracle actually just needs to raise debt
to pay for this many GPUs
over time. Now, they won't do it like immediately;
they can pay for everything this year and next year from their own cash.
But like in '27, '28, '29, they'll start to have to use debt to pay for these GPUs,
which is what, you know, CoreWeave has done.
And many of the neoclouds, most of it's debt financed.
Even Meta went and got debt for their Louisiana mega data center.
Not because, just because it's cheaper than, it's literally better on a financial basis to do buybacks
with your cash and get debt because the debt is cheaper than the return on your stock.
Like, it's like a financial engineering thing.
but like, you know, who's out there, right?
It could be Amazon, it could be Google, it could be Microsoft.
It's a very short list.
Or it could be Oracle or Meta, right?
Meta's obviously not.
Microsoft's chickened out.
Amazon, Google, and Oracle, right?
That's all that's left.
Google would be an awkward fit.
Yeah, Google would be an awkward fit.
Amazon would be a fine fit, but, you know, exactly, right?
It's like...
It's a very...
It's a very drop-mic, yeah.
Well, I guess maybe on the topic of these giant data center buildouts, you guys just released a piece on xAI and Colossus 2.
Do you, are you getting less impressed by these feats of building something this massive in six months?
Or is it still very impressive to you guys?
You know, this is the thing I've said about AI researchers: they're like the first class of humans to think about things on an order-of-magnitude scale.
Whereas, like, people have always thought about things in terms of, like, percentage growth, like, ever since industrialization.
And before that, it was just, like, absolute numbers, right?
You know, sort of, like, humanity is evolving in terms of how we think, because things are changing so fast.
Everything is an X scale.
And so, like, you know, it was, like, really impressive when GPT, you know, 2 was trained on so many chips.
And then GPT-3 was trained on, you know, like on 20K A100s and, you know, or sorry,
GPT-4 on 20K A100s, you know, sort of like, it's like, holy crap.
And then it was like, oh, the era of 100K GPU clusters, right?
And we did some reports around 100K GPU clusters.
But now there are like ten 100K GPU clusters in the world.
I was like, okay, this is kind of boring.
But it's like, 100K GPUs is like, you know, over 100 megawatts.
Now it's like, you know, like literally, you know, in our Slack and some of these channels,
like, oh, we found another 200 megawatt data center.
There's someone who like puts the yawning emoji.
Every time.
And I'm like, dude, what?
Like now it's only, it's only exciting if you do gigawatt scale.
Like we're in gigawatt era.
Yeah.
Yeah, yeah.
And I'm sure like, you know, and, you know, I'm not sure.
Maybe we'll start yawning to that too.
But like, you know, the long scale of this is like.
The capital numbers are crazy, right?
Like, you know, it's like, it's crazy enough that OpenAI did like a $100 million
training run, you know, or, you know, like, then they did a billion dollar
training run.
Now we're talking about $10 billion training runs, right?
Like, you know, it's, it's crazy that we think in log scale.
But yes, things are only impressive.
Yeah.
When they do, like, what Elon's doing. So what Elon's doing in, uh,
Tennessee, in Memphis, the first time was crazy, right?
100K GPUs in six months.
He bought a factory in like February of 24
and had models training within six months, right?
And he did liquid cooling, you know,
first large-scale data center at this scale for AI
doing liquid cooling.
Like all these sorts of crazy firsts,
putting generators outside, like CAT turbines,
all these different things to get the power,
you know, mobile substations, all these different crazy things,
tapping the natural gas line that's like running alongside the factory,
So he does this.
It's like, holy crap.
And he did it for 100K GPUs.
Right.
You know, 200, 300 megawatts, right?
Now he's doing it for a gigawatt scale,
and he's doing it just as fast, right?
And so, like, you would think, like,
this is obviously way more impressive that he did it again.
Yeah.
But, like, we've been desensitized,
but, like, it's like, you know, like,
you've given the child too much candy.
Yeah.
Right?
Exactly.
And now, like, the,
child has no, you know, is like, you know, doesn't like apples, right? Like, I don't know.
So, so like, yeah, a gigawatt data center. There was all these protests around his Memphis
facility. People were like, oh, you're destroying the air. And it's like, have you looked around that
area of Memphis? Like, there is like a gigawatt gas turbine plant that's just powering generally
that area. There's a sewage plant that's servicing the entire city of Minnesota, or sorry,
city of Memphis.
And there's like open air pits of like the like like there's open air mining.
Like there's all sorts of disgusting shit around there.
Which is needed.
Right.
We need that stuff to have a country run, right?
Like to be clear.
And you know, it's like people are complaining about like a couple hundred megawatts in air.
Yeah.
Of a generation.
So he got like protests from all sorts of people.
You know, you got super into the politics side of things.
Right.
And the NAACP even protested him.
Like.
And so he really got
like some local municipalities to be like
oh I don't like you know like this
and so he couldn't do as much
as he wanted to in Memphis
but he still needed the data center to be close
because he wanted to connect these data centers
super high bandwidth, super close
and he already had a lot of infrastructure
set up there. So he bought another
distribution center this time,
and it's still Memphis, but the cool thing
about Memphis is it's right across the border
from Mississippi, right? Now,
you know, it's like 10 miles away
from his original one, but his facility is like a mile away from Mississippi, and he bought a
power plant in Mississippi, and he's putting turbines there, where the regulation is completely
different, right? And if the question is really, like, galvanize resources and build it really
fast, maybe Elon is ahead of everyone. You know, he hasn't made the best model yet, or he doesn't
have the best model, at least today, I think. You know, you could argue Grok 4 was the best for a little
period of time. But like, you know, it's truly amazing how fast he's
able to build these things. And from first principles, it's like, most people are like, fuck, like,
you know, we can't build the power. We can't do power here anymore.
I guess we have to find a new site. And it's like, no, no, just go across the border.
Like, go to Mississippi. And my favorite thing is like, Arkansas's right there.
So if Mississippi gets mad, you know, I don't, you know, the regulation, the whole future of data
centers is, you know, built in places where multiple states meet.
Is that the...
Four Corners, yeah.
The optimal regular...
I think there's one...
There are you guys.
Is there a point in the U.S. with five?
I know there's a point with four.
Four states intersect.
There, yeah.
Maybe that's the corner of a data center.
Kind of certain.
I'm going to buy real estate in that area to front-run it.
Well, I guess on the topic of just maybe new hardware,
you had this piece analyzing TCO for the GB200s.
And I'm kind of going to ask this question on behalf of our portfolio companies, which
it sounds like you're helping them already.
But one of the findings that I thought was really interesting was TCO was sort of 1.6x H100s for GB200s.
And so obviously, you know, there's this point on, okay, that's sort of the benchmark for the performance boosts that you're going to need to at least make the sort of performance cost ratio benefit from switching over.
maybe just talk about what you've seen from a performance standpoint
and what do you recommend to portfolio companies,
maybe in a smaller scale than XAI,
who are thinking about new hardware, try to get it.
There's capacity constraints, obviously.
Yeah, I mean, that's a challenge, right?
With each generation of GPU, it gets so much faster
that you end up like you want the new one.
And in some metrics, you could say GB200 is,
three times faster than, or two times faster than the prior generation.
Other metrics, you can say it's way more than that, right?
So if you're doing pre-training versus inference, right?
They can run everything in 4-bit, right?
Yeah, if you can run it in 4-bit or just inference and take advantage of the huge NVLink,
NVL72, you know, there's ways you could squint and say GB200 is only 2x faster than H100,
in which case, at 1.6x TCO,
It's, you know, it's worthwhile, right?
it's worth going to the next gen.
But more marginal.
It's more marginal.
It's not a big deal.
Then there's other cases where it's like, well, if you're running DeepSeek inference,
the performance difference per GPU is north of like 6-7x.
And it continues to optimize, you know, for DeepSeek inference.
And so the question, you know, then it's like, well, I'm only paying 60% more for 6x.
And it's like, it's a 4x or 3x performance per dollar gain.
Like, absolutely, right?
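To make the performance-per-dollar comparison concrete, here is a small sketch. The 1.6x TCO ratio and the 2x and 6x speedups are the figures quoted in the conversation; the function name is illustrative.

```python
def perf_per_dollar_gain(speedup: float, tco_ratio: float) -> float:
    """Performance per dollar of the new GPU relative to the old one.

    speedup:   new performance / old performance
    tco_ratio: new total cost of ownership / old total cost of ownership
    """
    return speedup / tco_ratio


TCO_RATIO = 1.6  # GB200 vs. H100, as quoted

marginal_case = perf_per_dollar_gain(2.0, TCO_RATIO)  # the "squint" 2x case
strong_case = perf_per_dollar_gain(6.0, TCO_RATIO)    # the 6x inference case

print(f"2x speedup: {marginal_case:.2f}x performance per dollar")
print(f"6x speedup: {strong_case:.2f}x performance per dollar")
```

Any speedup above the TCO ratio (1.6x here) is a net win; at 2x the gain is modest, while at 6x it lands between the 3x and 4x mentioned.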
If you're, like, running inference of DeepSeek, that can also include RL, right?
And so the question is sort of, and then the other question is like, well, the GPU is new.
You know, there's B200, there's GB200, there's B200.
B200 is much more simple from a hardware perspective.
It's just eight GPUs in a box.
So then it's not as much of a performance gain, especially in inference.
But you have, you have all the stability, right?
It's an eight GPU box.
It's not going to be unreliable.
The GB200s are still having some reliability challenges.
Those are being worked through.
It's getting better and better by the day.
But it's still a challenge.
But, you know, when you have an H100 box, right, or H200, eight GPUs,
and one of them fails,
you take the entire server offline to fix it, right?
So usually if your cloud's good, they'll swap it in, right?
But if it's GB200, what do you now do with 72 GPUs?
If one fails, you break the whole thing and get a new 72?
That's the blast radius of a failure, right?
Note, GPU failure rates
at best are the same and
likely worse, right, gen on gen,
because everything's getting hotter, faster, etc.
So at best, the failure rates are the same.
If you model the failure rates as the exact same
because you go from one out of eight
to one out of 72, it's a huge problem.
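A quick way to see why the larger domain hurts even at an identical per-GPU failure rate: the chance that at least one GPU in a domain fails grows with the domain size. The sketch below assumes independent failures and a made-up 1% per-GPU failure probability over some fixed window; neither number comes from the conversation.

```python
def prob_domain_hit(per_gpu_failure_prob: float, gpus_in_domain: int) -> float:
    """Probability that at least one GPU in the domain fails.

    Assumes failures are independent, which is a simplification.
    """
    return 1.0 - (1.0 - per_gpu_failure_prob) ** gpus_in_domain


P_FAIL = 0.01  # hypothetical per-GPU failure probability for one time window

p_8 = prob_domain_hit(P_FAIL, 8)    # eight-GPU server
p_72 = prob_domain_hit(P_FAIL, 72)  # 72-GPU rack-scale domain

print(f"8-GPU box hit by a failure:   {p_8:.1%}")
print(f"72-GPU domain hit by failure: {p_72:.1%}")
```

With the same per-GPU rate, the 72-GPU domain is interrupted far more often, and each interruption takes 72 GPUs out of service instead of 8.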
So now what a lot of people are doing is
they run a high priority workload on 64 of them
and then the other eight
you run low priority workloads,
which is then like, okay, this is this whole
infrastructure challenge.
I have to have high priority workloads
and low priority workloads.
When a high priority workload has a failure,
instead of taking the whole rack offline,
you just take some of the GPUs from the low priority one,
put it in the high priority one,
then you just let the dead GPU sit there
until you service the rack at a later date.
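The 64-plus-8 arrangement can be looked at as a simple availability calculation: the high-priority job only needs 64 healthy GPUs out of 72. The sketch below uses a binomial model with independent failures and a hypothetical 1% per-GPU failure probability; both are assumptions for illustration, not figures from the conversation.

```python
from math import comb


def prob_at_least_healthy(total_gpus: int, needed_gpus: int,
                          per_gpu_failure_prob: float) -> float:
    """Probability that at least `needed_gpus` of `total_gpus` are healthy.

    Binomial model: each GPU fails independently with the given probability.
    """
    max_failures = total_gpus - needed_gpus
    p = per_gpu_failure_prob
    return sum(
        comb(total_gpus, k) * p ** k * (1 - p) ** (total_gpus - k)
        for k in range(max_failures + 1)
    )


P_FAIL = 0.01  # hypothetical per-GPU failure probability for one time window

all_72_needed = prob_at_least_healthy(72, 72, P_FAIL)  # no spares
only_64_needed = prob_at_least_healthy(72, 64, P_FAIL)  # 8 GPUs act as spares

print(f"Job needs all 72 GPUs: {all_72_needed:.1%} chance of no interruption")
print(f"Job needs 64 of 72:    {only_64_needed:.4%} chance of enough healthy GPUs")
```

Under this toy model, treating eight GPUs as a pool that can be swapped in turns a coin-flip into near certainty, at the price of running only lower-priority work on those eight.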
And it's like, there's all these complicated infrastructure things
that make it so, oh, wait, actually,
that 3x or 2x performance increase in pre-training
is lower because the downtime is higher.
slash I'm not using all the GPUs always
slash I'm not able to
you know I'm not smart enough or I don't have the infra
to like have low priority and high priority workloads
like it's not impossible yeah the labs
are doing it right like it's just
I mean if I'm running a cloud it's actually really hard right
because I probably have to rent
the spares out as spot instances or something.
No, no, no, because it's a
coherent domain, it's NVLink,
you don't want anyone touching that. So it has to be
the end customer. You don't have to leave them empty as
spares, that's even worse.
The end customer usually would just be like, I want them,
and the SLAs and the pricing,
everything is like accounting for that, right?
So like generally when you have a cloud,
you have an SLA, right?
That is, hey, it's going to be uptime
is going to be 99% you know, blah, blah, blah, right?
For this period.
With GB200, it's 99% for 64 GPUs, not 72.
And then it's like 95% for 70%.
Now it differs across every cloud.
Every cloud has a different SLA.
Got it, yeah.
But like, they've adjusted for this
because they're like, look,
this hardware is just finicky.
You still want it.
You know, we will credit you in that 64 of them will always work, right?
Not 72.
And so, like, there's this whole, like, finicky nature.
And the end customer has to be capable of dealing with the unreliability.
And it's like, and the end customer can just continue to use B200, right?
Performance gains, not as much.
The whole reason you want this 72 domain is so you can have, you know, some of these gains.
Right.
But you have to be smart enough to be able to do it.
And that's challenging for small companies.
Totally.
So the...
Nvidia has announced the Rubin pre-fill cards, like CTX?
Yeah, CPX, there we go.
What's your take on that?
Does it cannibalize?
Dude, by the way, I don't know if this is like brain rot or like, I don't know, but like,
I can't remember what I had for lunch yesterday.
But I know the model number of every fucking chip, like...
Haunts you in your dreams.
We're broken, we're broken.
Living the dream.
No, no, no, no.
You know, why do you pre-announce
a product that's
5x faster for certain use cases?
Is that that much?
I think it's got something great.
Historically,
AI chips were AI chips, right?
And then we started getting a lot of
people saying, this is a training chip, this is an
inference chip. Actually, training and
inference are switching so fast and what they
require that now it's
still like one chip.
Actually, there are still workload
level dynamics that differ, but
the main workload is inference, even
in training, right? Because of RL, most of that is, you know, generating stuff in an environment and
trying to, you know, achieve a reward, right? So it's inference still, right? Training is now becoming
mostly dominated by inference as well. But inference has like two main operations, right? There is
calculating the KV cache for pre-fill, right? Here's all these documents. Do the attention
between all of them, right? Between all the tokens, however, you know, whatever type of attention you use.
And then there's decode, which is
autoregressively generating each token.
These are very, very different workloads.
And so initially, the ideas
or infrastructure techniques, the ML systems
techniques were, oh, okay,
I will just make the batch size
every single forward pass
this big. And
if I make it, let's call it, I'll make it
a thousand big. And maybe
I'll run 32 users concurrently,
that way, you know, now I still have
900-something left, 960,
left, right? That 960
is actually doing the pre-fill.
If a request comes in,
it chunks it. It's called chunked pre-fill. You
pre-fill chunks of it. Now you get really good
utilization on GPUs.
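The batching idea just described can be sketched as a tiny scheduler. This is an illustration under stated assumptions, not how any particular serving stack implements it: `plan_forward_pass` and the queue layout are hypothetical names, and the budget of 1,000 tokens with 32 decoding users comes from the numbers in the conversation (the exact arithmetic leaves 968 spare slots, in line with the "900-something" mentioned).

```python
from collections import deque


def plan_forward_pass(token_budget, decode_users, pending_prefill):
    """Plan one forward pass: decode tokens first, then pre-fill chunks.

    pending_prefill is a deque of [request_id, tokens_remaining] pairs
    (a hypothetical structure, mutated in place).
    Returns (decode_tokens, [(request_id, chunk_size), ...]).
    """
    decode_tokens = min(decode_users, token_budget)  # one token per decoding user
    spare = token_budget - decode_tokens
    chunks = []
    while spare > 0 and pending_prefill:
        request = pending_prefill[0]
        take = min(spare, request[1])      # fill the leftover slots with a chunk
        chunks.append((request[0], take))
        request[1] -= take
        spare -= take
        if request[1] == 0:                # prompt fully pre-filled
            pending_prefill.popleft()
    return decode_tokens, chunks


queue = deque([["req-a", 1500], ["req-b", 300]])
first = plan_forward_pass(1000, 32, queue)
second = plan_forward_pass(1000, 32, queue)
```

Here the 1,500-token prompt is split across two passes, and the second pass still has room to start and finish the shorter request.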
But then that
ends up impacting the decode workers,
right? The people who are autoregressively generating
each token end up having slower
TPS. And tokens per
second is really important for user experience and all
these other things, right? So then the
idea is like, okay, these two workloads are so different
and they are literally different, right? You
pre-fill and then you decode. It's not like you're interleaving them. So why don't we split them
entirely? And this is done on the same type of chip, right? OpenAI, Anthropic, Google.
Pretty much everybody does it. Everyone, everyone good. Everyone's good. Together, Fireworks.
All these guys do pre-fill/decode, disaggregated pre-fill and decode. So they run pre-fill on a set of
GPUs. Why is this beneficial? Because you can auto-scale them. Right? You can, hey, all of a sudden,
I have a lot more long context workers. I allocate more
resources to pre-fill. Oh, all of a sudden I have a, you know, not all of a sudden, but like,
you know, over time, my traffic mix is not long input, short output. It's short input, long
output. I have more decode workers. This way I can guarantee, and so now I can auto-scale the
resources differently, and I can also guarantee that my pre-fill time is, you know, by the time,
you know, what's really important in search is how fast you get the page to start loading. Not when
does the result happen. What do people do in games? Like the loading screen often has some sort of
interactive environment or it blends in over time or whatever it has tips and tricks,
ways to distract you.
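The auto-scaling point made above, shifting GPUs between the pre-fill pool and the decode pool as the traffic mix changes, can be sketched roughly as follows. The function name and all throughput numbers are hypothetical and chosen only for illustration; a real autoscaler would also track latency targets such as time to first token.

```python
def split_fleet(total_gpus, prefill_tokens_per_s, decode_tokens_per_s,
                prefill_rate_per_gpu, decode_rate_per_gpu):
    """Split a fixed GPU fleet between pre-fill and decode pools,
    in proportion to the GPU-time each phase currently demands."""
    prefill_need = prefill_tokens_per_s / prefill_rate_per_gpu
    decode_need = decode_tokens_per_s / decode_rate_per_gpu
    share = prefill_need / (prefill_need + decode_need)
    prefill_gpus = round(total_gpus * share)
    # Keep at least one GPU in each pool.
    prefill_gpus = max(1, min(total_gpus - 1, prefill_gpus))
    return prefill_gpus, total_gpus - prefill_gpus


# Hypothetical per-GPU rates: 20,000 prompt tokens/s, 1,000 generated tokens/s.
long_input_mix = split_fleet(64, 800_000, 20_000, 20_000, 1_000)
long_output_mix = split_fleet(64, 200_000, 50_000, 20_000, 1_000)
```

With these made-up numbers, the long-input mix puts about two thirds of the fleet on pre-fill, and the long-output mix flips most of it over to decode.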
The same thing is, there's like studies and papers out there that users prefer a faster
time to first token, right?
First token gets streamed to me sooner, even if the total time to get all my tokens is a little
bit longer.
I can't read that fast anyways, right?
So.
I mean, I like to skim.
Yeah, I like it just good.
Yeah, I mean, most models return above speed-reading speed.
But you need that, right?
I think, I think, but like, you know, the idea is that you want to guarantee time to first token is at a certain level for user experience reasons. Otherwise, people are like, screw this, I'm not using AI. The decode speed matters a lot, too, but not as much as time to first token. And so by having separate pre-fill and decode, you do this, right? But now you've already, and this is all in the same infrastructure, you've already done this. So now it's like, what's the next logical step? These workloads are so different. Decode, you have to load all the parameters in and the KV caches
to generate a single token.
You batch a couple users together,
but very quickly you run out of memory capacity
or memory bandwidth
because everyone's KV cache is different.
The attention of all the tokens, right?
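A rough sketch of why decode runs into memory limits: each step reads the weights once for the whole batch, but every user's KV cache is private and has to be read separately. The model shape below (80 layers, 8 KV heads, head dimension 128, 16-bit values) is an assumption for a Llama-70B-class model, and the batch of eight 64,000-token users is made up for illustration.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    # Two tensors (K and V) per layer, per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def decode_step_bytes(param_count, bytes_per_param, context_lengths, kv_per_token):
    """Bytes that must be read from memory to emit one token per user."""
    weight_bytes = param_count * bytes_per_param            # shared by the batch
    kv_bytes = sum(context_lengths) * kv_per_token          # private per user
    return weight_bytes, kv_bytes


# Assumed shape for a 70B-parameter model with grouped-query attention.
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)
weights, kv = decode_step_bytes(70_000_000_000, 2, [64_000] * 8, per_token)
```

Under these assumptions, eight long-context users already carry more KV cache (about 168 GB) than the weights themselves (140 GB), which is why batching more users stops helping quickly.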
Whereas on pre-fill,
I could even just serve like one or two users at a time
because if they send me a 64,000-context request,
that is a lot of flops, right?
A 64,000-context request.
I'll use Llama 70B because it's simple to do math on,
like 70 billion parameters.
That's 140 gigaflops
per token, 70 times 64,000, that's many, many teraflops.
You can use the entire GPU for like a second, right?
Like potentially, right?
Depending on the GPU to just do the pre-fill, right?
And that's just one forward pass.
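The back-of-the-envelope math above can be written out as a short sketch. It uses the common rule of thumb of about 2 FLOPs per parameter per token, which gives the 140 gigaflops per token quoted for a 70-billion-parameter model, and it ignores the attention-score FLOPs that grow with context length. The sustained GPU throughput is an assumed round number, not a spec for any real chip.

```python
def prefill_flops(param_count, prompt_tokens):
    """Approximate FLOPs to pre-fill a prompt (weights only, no attention term)."""
    flops_per_token = 2 * param_count          # one multiply + one add per parameter
    return flops_per_token, flops_per_token * prompt_tokens


per_token_flops, total_flops = prefill_flops(70_000_000_000, 64_000)

# Hypothetical sustained throughput of 1 PFLOP/s, for illustration only.
ASSUMED_GPU_FLOPS_PER_S = 1e15
prefill_seconds = total_flops / ASSUMED_GPU_FLOPS_PER_S

print(f"{per_token_flops / 1e9:.0f} GFLOPs per token")
print(f"{total_flops / 1e12:.0f} TFLOPs for the whole prompt")
print(f"{prefill_seconds:.2f} s at the assumed throughput")
```

Under this assumption a single 64,000-token request keeps the whole GPU busy for seconds of pure compute, which is the sense in which pre-fill cares about flops rather than memory bandwidth.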
So I don't necessarily care about, you know, loading all the tokens or all the parameters
and KV cache in fast.
All I care about is all the flops.
And so that leads us to sort of like, you know, I had to, I think it was a long-winded
explanation because it's hard for people to understand what CPX is.
I've had a lot of like, even my own clients,
like, we sent like multiple notes like
explaining and they're like, I still don't understand.
I'm like, shit, okay.
Send them the Attention Is All You Need paper.
You can't expect.
I mean, like, think about like a, like a networking person.
Like they're like, no, I don't need to know about this.
You know, Attention Is All You Need, right?
Like it's like, or think about an investor, right?
Like, you know, there's all these people.
Maybe the data center operator.
Like they're like, oh, there's two chips.
Why?
Should I build my data center differently?
It's like, like, you know, I got to explain everything.
Or just like, no.
You don't have to build differently.
But anyways, you get to now...
At Stanford, 25% of all students, not CS students,
of all students, read the paper.
Read what paper?
Attention Is All You Need?
That's low.
They do majorists and you don't like the philosophy.
I'm like, this is amazing.
Anyway, sorry.
The Middle East, I can't remember what country it is,
has AI education starting at, like, age eight,
and in high school, they have to read Attention Is All You Need.
Wow.
Someone told me that their...
son had to read Attention Is All You Need.
Which is, I don't know.
Look, look, top-down mandates for education, you know, maybe they work, maybe they don't.
Like, you know, maybe people like homeschooling their kids.
I don't know.
I went to public school, but like, back to your readers.
Yeah, God.
Just on the topic of hardware cycles, I wanted to maybe, yeah.
I didn't actually explain what CPX is.
So CPX is a very, like, compute-optimized chip, you know, for pre-fill, whereas decode is, just to simply say it, like, the rest of it, is the normal chips with
HBM. HBM is more than half the cost of the GPU. If you strip that out, you end up having a
much cheaper chip passed on to the customer. So, or like, you know, if Nvidia takes the same
margin, then the cost of this pre-fill chip is much, much lower. And now the whole process is
way cheaper and more efficient. Now long context can be adopted. Right. Yeah. Well, so I, I love that.
We're actually going with all this detail, because I had a more 10,000-foot view question for you,
which is I haven't been following the semi market as closely as you have.
I probably started with the A100.
And I remember helping Noam at Character,
this is summer, June '23,
chase down GPUs.
And the only thing that mattered at that time was delivery date
because there was a huge capacity crunch.
And then to see that over the last two years evolve,
where, let's say, six to 12 months ago,
people were doing these RFPs to 20 neoclouds, right?
And the only thing that mattered to some degree was price.
Right, people actually do RFPs for GPUs?
Yes.
So just to be clear, my opinion on how you buy GPUs is that it's like buying cocaine or any other drug.
This is described to me, not me.
I don't buy cocaine.
Okay, yeah.
Right.
Someone tells me this.
Someone tells me this I'm like, holy shit, it's right.
You call up a couple people.
You text a couple people.
You ask, you know, how much you got.
What's the price?
It's like.
Exactly.
Exactly. This is fucking like buying drugs.
Sorry, sorry.
No, I mean, it's the same way. You just send like, we have Slack connects with like 30 neoclouds.
There you go. As well as like some of the major ones. And we just send them a message like, hey, customer wants this much. You know, this is what they're looking for. And then they send quotes. I know this guy.
I know a guy. Well, so I think that's actually a very accurate description. And I've sent countless port cos your ClusterMAX original post, because I thought it did
a really good job breaking them down.
But maybe one question to end on for me is just,
what era are we in now
with Blackwells coming online?
Are we sort of back to the summer
2023 era?
And that's kind of the cycle
that we've just entered?
Or what's sort of your view on
where we are?
So, a very good question.
For one of your port cos,
we were like, you know,
after their difficulties with Amazon,
we were like, okay, let's actually
like get you GPUs.
The original deals we got you were gone, but here's some other deals, right?
It turned out that multiple major neoclouds had sold out of Hopper capacity.
And their Blackwell capacity comes online in a few months.
So it's a bit of a challenge, right?
Due to inference?
Inference demand has been skyrocketing this year, right?
Reasoning models.
These reasoning models are revenue.
It's been skyrocketing this year.
And then also, like, there's a bit of like the, you know,
Blackwell comes online,
but it's hard to deploy,
so it takes a little,
you know,
there's a learning curve
to deploying it.
So whereas, like,
you got down to,
like, you buy the Hopper,
you install the data center,
it's running within, like,
you know, a month or two, right?
For Blackwell,
it was like,
it's a longer time frame
because of reliability challenges,
it's a new GPU.
I mean, it's just learning pain, right?
Learning,
learning,
growing pains.
So there was,
there was, like,
this gap of, like,
how many GPUs are coming onto the market
right as revenue's starting to inflect.
And so a lot of capacity
got sucked up,
right?
And actually,
prices for Hopper
bottomed like three or four months ago or like five or six months ago.
Yeah.
And actually they've like crept up a little bit now.
They're still like, you know, not, not.
So I do, I don't think we're quite in the 2023, 2024 era of GPUs being tight.
But certainly if you want to, if you want like just a few GPUs, it's easy.
But if you want a lot, it's, it's hard.
Yeah.
Like you can't get capacity that instantly.
Yeah.
Wow.
What a time.
So we, so we wrap on that.
Dylan, this was another instant classic.
Thank you so much for coming to play.
It was like two hours, bro.
Oh, no.
I missed.
We couldn't stop.
Thanks so much.
It's great.
Thank you so much for having me.
Thanks for listening to the a16z podcast.
If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z.
We've got more great conversations coming your way.
See you next time.
As a reminder, the content here is for informational purposes only,
should not be taken as legal, business, tax, or investment advice, or be used to
evaluate any investment or security, and is not directed at any investors or potential investors
in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies
discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
