a16z Podcast - Dylan Patel on the AI Chip Race - NVIDIA, Intel & the US Government vs. China
Episode Date: September 22, 2025

Nvidia’s $5 billion investment in Intel is one of the biggest surprises in semiconductors in years. Two longtime rivals are now teaming up, and the ripple effects could reshape AI, cloud, and the global chip race. To make sense of it all, Erik Torenberg is joined by Dylan Patel, chief analyst at SemiAnalysis; Sarah Wang, general partner at a16z; and Guido Appenzeller, a16z partner and former CTO of Intel’s Data Center and AI business unit. Together, they dig into what the deal means for Nvidia, Intel, AMD, ARM, and Huawei; the state of US-China tech bans; Nvidia’s moat and Jensen Huang’s leadership; and the future of GPUs, mega data centers, and AI infrastructure.

Resources:
Find Dylan on X: https://x.com/dylan522p
Find Sarah on X: https://x.com/sarahdingwang
Find Guido on X: https://x.com/appenz
Learn more about SemiAnalysis: https://semianalysis.com/dylan-patel/

Stay Updated: If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
How you buy GPUs is like buying cocaine.
You call up a couple people, you text a couple of people, you ask, you know, how much you got, what's the price?
If your two arch-nemeses suddenly team up, it's the worst possible news you can have.
I did not see this coming.
I think it's an amazing development.
Like a Warren Buffett coming into a stock.
Jensen is like the Buffett effect for the semiconductor world.
It's kind of poetic that everything's gone full circle and Intel's sort of crawling to Nvidia.
Today, we're talking about one of the biggest surprises in semiconductors in years.
NVIDIA just put $5 billion into Intel.
Two long-term rivals now teaming up on custom data centers and PC products,
a deal nobody saw coming.
For NVIDIA, it's the Buffett effect.
For Intel, it's a lifeline.
And for AMD, ARM, and the global chip race, the fallout could be massive.
To break it all down, I'm joined by Dylan Patel,
chief analyst at SemiAnalysis,
Sarah Wang, general partner at A16Z,
and Guido Appenzeller, partner at A16Z
and former CTO of Intel's data center and AI business unit.
Let's get into it.
Dylan, welcome back to the podcast.
Thanks for having me, yeah.
It just so happens that there's some big news just as we're having you: NVIDIA announcing a $5 billion investment in Intel, and the two teaming up to jointly develop custom data center and PC products.
What do you think about the collaboration?
I think it's hilarious that, like, Nvidia invests, it gets announced, and their investment's already up 30%.
$5 billion investment, $2 billion profit already, right?
I think it's fun because they need their customers to really have big buy-in.
So when their potential customers buy-in and commit to certain types of products, it makes a lot of sense, right?
And it's kind of funny in a way because in the past, there was this whole thing around how Intel was
sued for being anti-competitive with their chipsets.
And Nvidia actually got like a settlement from Intel, right?
Way back when, like, the graphics were separate from the CPU, and the graphics were being put on the chipset, which had all this other I/O, like USB and all this stuff.
So it's kind of a funny like turn of events that now Intel is going to make like a chiplet
and package it alongside a chiplet from Nvidia.
And then that's like a PC product.
So, you know, it's kind of poetic that everything's gone full circle and Intel sort of crawling to NVIDIA.
But actually, it might just be the best, like, device, right?
I don't want an Arm laptop because it can't do a lot of things. And so an x86 laptop with Nvidia graphics, fully integrated, would probably be the best product in the market.
So, are you optimistic? How do you think this will go?
I mean, sure. I mean, I hope. I hope, right? I'm a perpetual optimist on Intel because I have to be.
I was thinking that the structure of the deal that at least, like, a lot of the government folks and Intel were sort of trying to go for was: big customers and the biggest suppliers directly give capital to Intel. But this is sort of the other way around, where they're buying some of the stock, having some ownership, but they're not really, like, diluting the other shareholders. And then the other shareholders will get diluted, slash everyone will get diluted, when Intel finally does raise the capital from the capital markets. But because they've announced these deals, and they're pretty small, right? $5 billion Nvidia, $2 billion SoftBank, the U.S. government was 10. You know, these are still relatively small.
Pretty small, yeah.
Yeah, in the nature of things, right?
I mean, like, you know, last time I think I said
Intel needs like $50 billion, right?
Now when they go to the capital markets, it's better.
And hopefully they get another, you know,
couple of these announcements.
Maybe, you know, there's all sorts of speculation
that Trump is involved in, you know, sort of getting these companies to invest.
NVIDIA, and now, you know, the government as well, of course,
and now, you know, is Apple going to come invest, right?
And also do something with Intel or who else will come in?
And that'll really boost investor confidence,
then they can dilute slash go get debt.
Like a Warren Buffett coming into a stock.
Jensen is like the Buffett effect for the semiconductor world.
Guido, you were the CTO of the Intel Data Center and AI BU. What are your thoughts?
I think it's really good for customers and consumers in the short term.
In the laptop market specifically, having the two collaborate is amazing.
I wonder what's going to happen with any of the internal graphics or AI products at Intel.
They might just push a reset and give up on that for now.
They currently don't have anything competitive, right?
There was the Gaudi effort that's more or less done, right?
There were the internal graphics chips, which never really competed at the high end, right? So from that perspective, it makes a lot of sense, right? It's good for both sides. Look, I think for Intel, they needed a breath of fresh air, right? They were sort of desperate.
So I think it's a very good thing. I think
AMD is fucked.
I mean, they're just, if you're
two arch nemesis suddenly team up
and it's the worst possible news you can have,
right? They were already struggling, right?
Their cards are good. Their software stack is not,
right? They were getting very limited traction,
right? They now have a bigger problem on their side. I think Arm is a little bit screwed as well, right, because their biggest selling point was sort of like, look, we can partner with everybody that doesn't want to partner with Intel. And Nvidia is, in a sense, the number one, you know, probably the most dangerous of the future CPU competitors, right? And so they now suddenly have access to Intel technologies and might go in that direction. It reshuffles the cards, right? I did not see this coming. I think it's an amazing development. Yeah, it'll be very interesting to see this play out.
To Eric's point, packed news week. The other thing that we wanted to pick your brain on, since we have you here, Dylan, is the other news dropping on Huawei unveiling their kind of AI roadmap, and, you know, obviously they're hyping up the capabilities.
I think you guys have been sort of ahead of the curve of trying to gauge what can the 950 supercluster actually do.
But would love your thoughts on everything that's going on from the China front, right?
And this is kind of coupled with DeepSeek saying their next models are going to be on domestically produced Chinese chips, and the Chinese government kind of banning companies from buying the Nvidia chips produced specifically for China.
So there's just sort of a lot of dominoes falling right now in the semi-market in China, but would love your take overall and, I mean, drill into some detail.
Yeah, I think when you sort of zoom out, you know, let's walk from 2020, because I think it's really important to recognize how cracked Huawei is. Or even just historically, like, they've always been really good. Sure,
initially they stole like Cisco source code and firmware and all this stuff, but then they
rapidly passed them up, as well as every other telecom company. In 2020, they released an Ascend chip and submitted it to impartial public benchmarks. And they were the first to bring seven-nanometer AI chips to market. They were the first to have that, right? Now, you could still say
Nvidia was ahead, but the gap was like nothing, right? And this is when they could access the full
foreign supply chain. This was when they just passed Apple to be TSM's largest customer.
They were, you know, clearly ahead of everyone on a manufacturing supply chain sort of design
standpoint in a total basis, right? Now, of course, Nvidia still had higher market share, but it was
so nascent then. Like, it could have really taken over the market. Huawei got banned by the Trump administration from accessing it, and then it went into effect in 2020, right, the full ban.
And so they were only able to make a small volume of these chips, but they had trained
significant models on these chips that they made then.
And then over the next couple years, right, Nvidia continued to accelerate. Huawei, because they were banned from TSM, had to go and try and figure out how to manufacture at SMIC, the domestic TSMC.
and then they were also in parallel
trying to go through shell companies
to manufacture at TSM
and acquire memory from Korea
and so on and so forth.
So by the end of 24
this had gotten in full swing
and it was caught, right?
It was caught and they finally shut it down.
But they were able to acquire 3 million chips, 2.9 million chips, from TSM through these other entities, right? Roughly $500 million worth of orders, which ends up being a billion-dollar fine that the U.S. government gave TSM, if I recall correctly.
At least there was a Reuters article that. I don't know if they actually issued it.
Which is important and interesting to gauge because
the number of Ascends floating out there has not consumed this entire
capacity yet. So now we get to 2025, right?
The H20 got banned in the beginning of the year.
Nvidia had to write off huge amounts of money. Our revenue estimate for Nvidia in China for just the H20 was north of $20 billion, because that's what they were booking in capacity, slash had to write off. And then it got banned. They cut the supply chain; like, they just said, no, we're not doing this anymore. They had their inventory. It gets re-approved, they resell the inventory, but now they're like, do we even restart production? That is Nvidia's question.
and now you have China saying
hey like we don't need
Nvidia we have domestic alternatives
whether it be Huawei or Cambricon, these companies have capacity.
But most of this capacity is still foreign produced, right?
Whether it be wafers from TSM, memory from Korea, Samsung and SK Hynix.
So the question is sort of like, how much can they do domestically?
And there's sort of two fronts there, right?
There's the logic, i.e. replacing TSM, and there's the memory, i.e. replacing SK Hynix, Samsung, and Micron.
And on the logic side, they are behind,
but they're really ramping there.
And I think they can sort of get to the production capacity
estimates needed.
And the US is still allowing them to import all the equipment
necessary, pretty much.
The bans are really for beyond the current generation of technology, beyond 7 nanometer. The bans are really for 5 nanometer and below. Even though the government says they're for 14 nanometer, the actual equipment that's banned is only for below 7 nanometer.
And so they'll be able to make a lot of 7 nanometer
AI chips and maybe even get to 5
with, you know, using existing equipment for 5 nanometer rather than taking the new techniques.
And so like there's the logic side and then there's the memory
side. And the aspect of
Huawei's announcement that was
surprising was that they're
doing custom memory, right?
Yeah.
That's the part that is sort of like,
hey, this is really exciting.
They announced two different types of chips for next year,
one that's focused on recommendation systems and pre-fill,
and then one that's focused on decode.
That's the trend these days.
Yeah.
So Nvidia's doing the same thing. They just announced a prefill-specific chip recently. There's numerous AI hardware startups that are really focusing on prefill versus decode.
And so the sort of split of inference into two workloads, you know, Huawei's doing the same thing for their next-year chip.
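The prefill/decode split being described comes down to arithmetic intensity: prefill processes a whole prompt's tokens in parallel against each weight load, while decode must stream all the weights for every single generated token. A toy sketch of that reasoning, using an illustrative hypothetical model size rather than the specs of any real chip or model:

```python
# Toy roofline model of why prefill is compute-bound and decode is
# memory-bandwidth-bound for transformer inference. All numbers are
# illustrative assumptions, not specs of any real chip or model.

def arithmetic_intensity(params_bytes, tokens):
    # A forward pass does ~2 FLOPs per parameter per token, and must
    # stream every weight from memory at least once per pass.
    flops = 2 * (params_bytes / 2) * tokens   # fp16: 2 bytes per param
    return flops / params_bytes               # FLOPs per byte moved

PARAMS_BYTES = 70e9 * 2          # hypothetical 70B-param model in fp16

prefill = arithmetic_intensity(PARAMS_BYTES, tokens=2048)  # whole prompt at once
decode  = arithmetic_intensity(PARAMS_BYTES, tokens=1)     # one token per step

print(f"prefill: {prefill:.0f} FLOPs/byte")  # high -> limited by compute
print(f"decode:  {decode:.0f} FLOPs/byte")   # low  -> limited by memory bandwidth
```

Chips aimed at decode therefore prioritize memory bandwidth, which is why the decode part comes with custom HBM, while prefill-oriented parts prioritize raw compute.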
And what's interesting is the decode one has, you know, custom HBM.
What does that mean? What is the manufacturing supply chain? Because that's the one that's tricky, right? How much can they manufacture of that custom HBM? And Nvidia and others are also adopting custom HBM only starting next year, right? So it's not like... yes, the manufacturing capacity is not there, maybe it is going to consume a bit more power, it's going to be slightly lower bandwidth. But the fact that they're able to do, you know, some of the same things that Nvidia plans to do, AMD plans to do in their memory is, you know, evidence that they're catching up. But then, you know, the main question that remains
is production capacity. So as far as like, hey, Nvidia's banned in China, right? Like they're saying
don't buy Nvidia chips. I think for a period of time, that's fine, fine for China, right, from the perspective of, hey, I'm China. That's fine because you have all this capacity that you, you know, shipped in in 2024 that hasn't been turned into AI chips. Now you're turning them into AI chips. You're running all that stockpile down. But what about the transition from running that stockpile down to ramping your new stuff, right? And that transition is the one
that's really tricky. China's either shooting itself in the foot by not purchasing Nvidia chips
during that time period or China's able to ramp. I think they'll be able to ramp. I think it'll
take a little bit longer. And there will be like a sort of a gap in between where China probably
backtracks and says it's fine. Like, ByteDance is, like, begging for Nvidia chips, right? Like, they don't want to use... they use some Cambricon, they use some Huawei, but they really want to use Nvidia because it's way better. They don't care about, like, the domestic supply chain; they want to make the best models, they want to deploy their AI as efficiently as possible. And so this is, like, you know, the government can mandate them to, like, not do it, right? So it's not that Nvidia is not competitive, it's that the government's sort of trying to instigate it. And then, like, I guess the last sort of thing is:
There's always the argument of, like, hey, if banning Nvidia chips to China is so good for China, why didn't China do it for itself? And now they're finally doing it for themselves. So again, like, it'll be interesting to see. Smuggling is still happening, right, re-exportation of chips from other countries to China. That is still happening at some volume, low volume, low-to-medium volume, right? But then, you know, the direct shipments of Nvidia chips that are legally allowed to
China are not necessarily happening today, but may have to restart at some point because
China won't have the production capacity to, you know, they would just have so many fewer
AI chips being deployed domestically versus the U.S.
And at some point, you kind of have to pick, like, am I all about the internal supply chain
or am I all about chasing, you know, super powerful AI?
Yeah.
So is there a negotiation angle here as well? Because currently there are still discussions ongoing about what exactly the boundaries are, what can be exported to China. So these are well-timed announcements if you want to make a point that, you know, the U.S. should allow more exports.
Do you think that's a factor or not?
Yes. So, you know, in the report we did a few weeks ago
about the production capacity of Huawei
and the supply chain,
there was a bit in there that we wrote about how,
you know, honestly, like,
if you were China, and you do want Nvidia chips, actually, how do you play this right? And it's by hyping up your domestic supply chain. It's like, yes, we can do everything. Huawei announced the most crazy shit possible, announced fucking three years of roadmaps.
So they read your report, basically.
I think they knew. They were already at it, and then they say, we're banning Nvidia, right? And then the government official is going to think, alongside sort of, like, lobbying from domestic players, of course we want to ship them better AI chips. Like, we're losing this market. We can't lose this market. And it's sort of like, it is 10,000 IQ, right? And we're here playing checkers while they're playing chess.
Well, so I guess, negotiating chip aside: in that report, you talked about HBM, or high-bandwidth memory, being a bottleneck for Huawei. To your point on one of the surprising aspects of the announcement, do you think it's credible that it's no longer a bottleneck based on what they're saying? Or is it just hype?
I think production capacity-wise, it is still absolutely a bottleneck.
Certain types of equipment required for making HBM need to be imported.
They're working on domestic solutions, but as far as we know, they have not imported enough equipment for this.
Although, if you look at Chinese import data for different types of equipment, right,
there's sort of like fabs spend, you know, roughly, it depends on the process technology,
but fabs spend roughly different amounts of money on lithography, etch, deposition, metrology, right?
like these different steps.
And historically, lithography has hovered around, you know, 17, 18%, with EUV, it grew to 25%.
Right.
But China, because they sort of wanted to stockpile lithography and they were worried about it becoming banned, they were importing lithography at a much higher rate than that, right?
Like 30, 40% of their equipment imports were lithography.
And they were just stockpiling lithography equipment.
This has sort of reversed now. And so if you look at the monthly import-export data, both into provinces in China but also out of countries, you can see that etch specifically is skyrocketing. And the main thing about, you know, stacking HBM is that on each wafer, you have to etch, creating a thing called a through-silicon via, so it can connect from the top to the bottom, and then you stack them on top of each other, right, 12-high, 16-high for HBM. That's how you make super-high-bandwidth memory. And their imports of etch equipment are, like, skyrocketing now. So it's like, they don't have the production capacity yet. How fast they can ramp it is a function of how much equipment they can get, A, and B, like, the yields, right? Improving yields is really hard in manufacturing. Intel and Samsung are really good, and TSMC is just amazing. Not that those companies suck; I think that's a better way to put it. And so, you know, it's those two things.
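For scale, a stack's bandwidth is roughly its interface width times its per-pin data rate; the TSV stacking (12-high, 16-high) is what makes the very wide interface and large capacity feasible. A quick back-of-envelope using the publicly documented baseline HBM2/HBM3 figures (vendors ship a range of speed bins, so treat these as nominal):

```python
# Back-of-envelope HBM stack bandwidth: interface width x per-pin data
# rate. Figures below are the nominal JEDEC HBM2/HBM3 baselines, not
# any specific vendor's shipping speed bin.

def stack_bandwidth_gbps(bus_width_bits, pin_rate_gbps):
    # bits/s across the whole interface, converted to gigabytes/s
    return bus_width_bits * pin_rate_gbps / 8

hbm2 = stack_bandwidth_gbps(1024, 2.0)   # HBM2: 1024-bit bus, 2.0 Gb/s/pin
hbm3 = stack_bandwidth_gbps(1024, 6.4)   # HBM3: 1024-bit bus, 6.4 Gb/s/pin

print(f"HBM2 per stack: {hbm2:.0f} GB/s")
print(f"HBM3 per stack: {hbm3:.0f} GB/s")
```

The stacking height mostly buys capacity per stack; the bandwidth comes from the very wide bus that the through-silicon vias make practical.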
I think yield: they haven't even started production of HBM3, right? They've only done some sampling of HBM2, and HBM3 came out a few years ago. So there's still quite a ways to go going up the learning curve. Obviously I expect them to catch up
faster than it took the technology to be developed because
it exists, right? In the world, we know how to do it. It's just
a matter of actually doing it versus inventing it.
And then the other one is sort of the production capacity.
You know, a couple months of import-export data is not enough to
make up for, you know, years' worth of supply chain build-up, right?
which is what we have today in Korea for the Korean companies.
Now, Hynix is also investing in the U.S., in Indiana, and then Micron, the American memory company, is primarily in Japan and Taiwan, but they're also expanding in Singapore and the U.S. now.
Like, there's so much capital that's been invested,
it would take some time for China to build up that production capacity
to actually match the West.
And when I say the West, I mean non-China East Asia in production capacity.
So it'll take some time to get there
And I don't think... I think it's like, hey, we can design this. It's always a question of, can we manufacture? And then the thing that Jensen would say is, you're betting on China not being able to manufacture, right? You know, it's a matter of when, not if. And that's the whole calculus that I think the US government has to be aware of when they're like, hey, what level of AI chips do we sell? Do we sell everything? Probably not, because AI is far more powerful, and the end market of AI is going to be way larger than the end market of semiconductors and equipment. So what level do we sell at? Well, how much can China make at each specific, you know, sort of performance tier? Then analyze that and what's the volume, and then figure out, like, what is okay,
which is maybe a little bit above or around the same level. Yeah.
So to your point on playing chess versus checkers, if you're Jensen, what would your next move be given the situation at hand?
It's partially true that he's afraid of Huawei more than he is of, like, an AMD.
Right.
He called them formidable.
Yeah.
Well, like, I mean, like every other industry: Huawei's beat Apple, right?
They passed Apple up in TSMC orders.
They passed Apple up in phone market share, not in the U.S.,
but, like, in many parts of the world, before the bands came down.
And then even now they're growing back again in market share without, like, Western supply chains.
You know, they've done this to numerous other industries.
I would say Huawei is, like, a formidable competitor, right?
Like they've beaten a lot of industries.
And so it's reasonable that he's afraid of them.
It's sort of, you know... and he's not afraid of AMD. So I think, like, the best thing is to try and show that what Huawei announced is reality rather than, like, their hoped-for target, and that Huawei isn't held down by manufacturing capacity. Which I think is not fair, right? I think manufacturing capacity is a real bottleneck for them. And then the yield learnings: real bottleneck, but temporary, maybe.
We'll see how long and we'll see how fast the rest of the, you know,
the Nvidia technology advances past what Huawei is capable of, right?
And how fast Huawei is able to close the gap.
But I think his main sort of pitch would be Huawei is real.
They're a formidable competitor.
They're going to take over not just the Chinese.
market, but also foreign markets, whether it be the Middle East or Southeast Asia or South Asia
or Europe or Latam, right, everywhere besides America. And there's a, I think Noah Smith has
this analogy, right? This whole idea is that you should Galapagos China, right? Make them have their
own domestic industry that is so different from the rest of the world, right? Kind of what happened
with Japan in the 70s and 80s, and 90s, their PCs were so specific and hyper optimized
to the Japanese market with like, you know, the weird, like, I don't know if you've seen
the weird scroll wheel on these Japanese PCs. Like, you literally, like, it's like, you go like
this and it scrolls, right? And it's like, and then the touchpad is a circle, and then that's around
it. It's like, things like that are so weird. Totally. And the rest of the world doesn't care,
but Japan market likes it, right? And his whole idea is like, let's Galapagos them, i.e., keep their
technology within China and then that's like dead weight loss and they never expand outside versus
that we serve the whole world. But the whole risk is that the opposite can also happen, right?
Our technology is hyper-optimized to running, you know, language models at this scale and RL, and hardware-software co-design can take you down a path of the tree that is a dead end. And then China, because they're not allowed to access this tree, ends up in the optimal spot, right? We had a local maximum; they had a global maximum, right? That technological Galapagosing is sort of what Noah Smith's analogy is. I like it a lot. I don't know if it's accurate, but it's an interesting one. Yeah, I love that.
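The local-versus-global optimum framing can be made concrete with a toy greedy search: a hill-climber that only ever steps uphill, like hyper-optimized hardware-software co-design, can settle on the smaller peak. The landscape and start points below are made up purely for illustration:

```python
# Toy illustration of the local-vs-global optimum analogy: a greedy
# hill-climber that only takes single uphill steps can get stuck on a
# smaller peak. The landscape and start points are invented for the demo.

def fitness(x):
    # Two peaks: a local maximum at x=2 (height 3) and a global
    # maximum at x=8 (height 10).
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8), 0)

def hill_climb(x, step=1):
    # Greedily move to whichever neighbor improves fitness; stop when
    # no neighbor is better.
    while True:
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            return x
        x = best

print(hill_climb(1))  # starts near the small peak -> stuck at x=2
print(hill_climb(6))  # starts in the other basin  -> reaches x=8
```

Where you start, i.e. which technology tree you are allowed to explore, determines which peak the greedy process ends up on.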
Well, actually, maybe just taking a step back from current events, even though there's so much to
talk about right now, last time you appeared with us, Nvidia came up, obviously, and you talked about
a couple of the potential paths forward for
NVIDIA. Give us maybe the bull case and the bear case.
There's a lot embedded in their numbers now, but what's interesting is the consensus from the banks across the hyperscalers. So Microsoft, CoreWeave, Amazon, Google, Oracle, and Meta, right? So it's the six hyperscalers, right? Companies I would consider hyperscalers. The consensus from the banks is $360 billion of spend next year across all of them. And my number is closer to, like, $450 to $500 billion. And that's based on, like, you know, all the research we do on data centers and, like, tracking each individual data center in the supply chains, right? And this isn't just Nvidia spend; this is capex for the hyperscalers, right? And that capex gets split up across different companies, but the vast, vast majority still goes to Nvidia, right?
And Nvidia is in a position not where they can take share, right? It's that they grow with the market, slash defend share. Yeah. And so the question is, how fast is the growth rate of capex for hyperscalers and other users, right? And the reason I included Oracle and CoreWeave as hyperscalers, even though they're traditionally not called hyperscalers, is because they are OpenAI's hyperscalers, right? So, you know, when you look at the Oracle announcement, right, like, first of all, the Oracle announcement:
I don't understand why people don't think this is crazier.
They did the most unprecedented thing in the history of, like, stocks and public companies ever. They gave a four-year guidance, and it made Larry the richest man in the world, you know, like, all these things.
Anyways, you know, the question is, like, how fast does revenue grow, right?
Do you think Oracle and OpenAI, which signed a $300 billion-plus deal with Oracle, will actually be able to pay $300 billion, right, across raising capital and revenue? And it gets to a rate of, like, over $90 billion a year in just a handful of years, right? So it's like, do you believe the market will grow that fast? It's very possible, yes. And it's very possible for, like, you know, OpenAI: what is their revenue going to be exiting next year? Some people think $35 billion, some people think $40 billion, some people think $45 billion ARR by the end of next year. This year they hit 20, right? ARR.
You know, so if that growth rate is maintained, then all of that cost goes to compute, plus all the capital they continue to raise, right? And again, the financials that they sort of, like, gave to investors for their last round were like, hey, we're going to burn like $15 billion next year. It's probably more likely going to be like 20. But, you know, you stack this on, and they're not turning a cash flow; they're not going to be profitable until 2029. So you sort of have: they're going to continue to burn $15, $20, $25 billion of cash each year, plus revenue growth. That's their compute spend. And you do this for Anthropic, you do this for OpenAI, you do this for all the labs. It's very possible that the pie does get to, you know, not $360 billion next year, but $500 billion next year for total capex, and the pie continues to grow for hyperscalers. Nvidia says actually it's going to be multiple trillions a year on AI infrastructure, and he's going to capture a huge portion of it. That's his bull case, right? That's the bull case
is that AI is actually so transformative that the world just gets covered in data centers and the majority of your interactions are with AI, whether it's, like, you know, business productivity and telling an agent to do some code, or you're just talking to your AI girlfriend Ani, right? Like, it doesn't matter. You know, all of this is running on Nvidia for the most part.
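The spend arithmetic sketched above, where a lab's compute budget roughly equals the revenue it pours back into compute plus the cash it burns, can be written out with the figures quoted in the conversation; this is an illustration of the reasoning, not a forecast:

```python
# Back-of-envelope version of the compute-spend arithmetic in the
# conversation: a lab's compute budget ~ revenue (assumed to go almost
# entirely to compute) + cash burn. Figures are the ranges quoted in
# the episode, applied to a single hypothetical lab.

def compute_spend(revenue_b, burn_b):
    # If essentially all revenue plus all raised-and-burned cash goes
    # to compute, spend is just their sum (in $ billions).
    return revenue_b + burn_b

# OpenAI-style numbers from the discussion: exiting next year at
# $35-45B ARR while burning $15-25B of cash per year.
low  = compute_spend(revenue_b=35, burn_b=15)
high = compute_spend(revenue_b=45, burn_b=25)

print(f"one lab's compute spend: ${low}B - ${high}B per year")
```

Repeat that for each lab and layer on the hyperscalers' own capex, and the gap between the banks' $360 billion consensus and the $450 to $500 billion estimate comes down to how fast these inputs compound.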
The bear case is, even if it does grow a lot...
Yeah, go ahead.
Stay with the bull case for a second. I think fundamentally the value creation, I personally think, is there, right? I mean, creating trillions of dollars of value with AI, I can totally see this happening. So assume it's true: where will Nvidia top out, I guess?
How much do you believe in takeoffs, right?
Yes.
So, like, if there is a takeoff scenario, right, where powerful AI builds more powerful AI builds more powerful AI builds more powerful AI, or, you know, that creates more and more... each level of intelligence enables more for the economy, right? Like, how many monkeys can you employ in your business versus how many humans, right? Or how many dogs, right? Like, what is the value creation of a human versus a dog? It's sort of the same with AI. So, like, in this case, the value creation could be hundreds of trillions, if not more.
And after that, do you need this? I mean, if we take every white-collar worker and make them twice as productive with AI, that's in the hundreds of trillions, isn't it?
Yeah, well, like, what is twice, you know, like, I mean, like, if you talk to people at the labs, right, like, twice as productive, what does that even mean? It's replaced them, right? It's, and it's be ten times better than death.
Like, I mean, like, I don't know how soon that's.
If it's sort of white-collar workers essentially useless, without a constant stream of L-O-M tokens, right, that make them productive, right? At that point, you basically can tax every single knowledge worker in the world, right? Which is most, most workers in the world long term.
Yeah, so, I don't know. I mean, what's your guess? Give us a number. What's the cap?
Cap? I mean, like, why aren't we making a Matrioshka brain? Like, I don't know. Like, uh, I mean, at some point, the machine says humans don't need to live and I need even more compute. One step before that, maybe. Are we, are we colonizing Mars yet?
TBD. I don't know, man. It's, I find it, like, completely impossible to predict anything beyond five years, given how much stuff is changing.
I'll leave it to economists, right?
Like, you know, like, honestly, like, you know,
supply chain stuff is like three, four years out and that's it.
And then five years is, like, sort of, like, YOLO, right?
So, like, I just try and ground myself
in the supply chain stuff, right?
Like, it's like, you know, supply chain,
and then, like, what is the adoption of AI?
What's the value creation?
What's the usage?
Like, and you can see that in, like, a short horizon.
Beyond that, like, I don't know, like,
are we all going to be connected to computers,
like, BCIs and stuff?
Like, I don't know, dude.
are humanoid robots
are they going to be, you know, I mean, you saw
Elon's thing, right? Like he's like, yeah, humanoid robots
are why Tesla's worth more than $10 trillion.
So like, oh, hey, great. What is all
that being trained on? Great, Nvidia. Okay, awesome.
So that's worth also $10 trillion, right?
Like, I don't, I don't know.
Like, it's too out there for me. I don't like
the out there discussions.
Very fair.
Read some sci-fi books.
So just pulling out the thread
where you talked about, I mean, this is kind of a throwaway comment,
but how market share can't really grow
just because it's such a dominant market share
and we talked about
or you guys talked about the moat of Nvidia last time
and obviously this moat is tied to maintaining
that very high market share that they currently have
and I love this sort of historic journey
you took us through with Huawei just earlier
can you kind of walk through what Nvidia did
throughout history to build their moat?
It's super awesome because they failed
multiple times in the beginning
and they bet the whole company multiple times right
like Jensen's just crazy enough to bet the whole company
right like whether it was like
certain chips ordering volume before he knew it even worked
and it was like all the money he had left
or like ordering volumes for projects he had not won yet
like I heard a rumor that or not a rumor but like a story
from someone who's like a gray beard in the industry and I think would know
was like, yeah, no, no, no, like, Nvidia ordered
the volume for the Xbox
before Microsoft
gave them the order
they were just like
they're just like, fuck it, YOLO
right
I don't know like
I don't know how true this is.
I'm sure there's more nuance there
like you know verbal indication
or whatever but like the order was
placed before he got the order, right? Like, that's what he said.
you know there's there's cases like
with the crypto bubbles right
like there was a couple of them
but like
Nvidia did their
damn best to convince everyone
on the supply chain
that it wasn't crypto
and that it was gaming, real demand
and it was gaming and data center
and professional visualization
and therefore you guys should ramp your production
and they all ramped production
and spent all this CAPEX on increasing production
and building out new lines for them
and they pay per item
and then they bought them and sold them
and made shit loads of money
and then when it all fell apart
they then had to write down a quarter's worth of inventory,
whatever
everyone else was like
well crap
I have all these empty production lines, right?
And so it's like, you know, but, but like what did AMD do then, right?
Their chips, they were actually better for crypto mining, right?
On a, you know, silicon cost versus how much you hash basis, but like they
just didn't, they, AMD was like, ah, we're going to not really raise production, right?
Like, as a reasonable, you know, thing, right?
It wasn't a, it's sort of like strike while the iron's hot.
And so, like, you know, the same has happened with Nvidia, right?
They've, in recent times, like, sort of, they've ordered capacity that no one believes, right, multiple times.
They see the end demand, obviously.
But in many cases, they're just like, their number for, like, Microsoft was higher than Microsoft's internal planning, right?
And then Microsoft's internal planning went up, but, like, their number for Microsoft was way higher.
And it's like, oh, we just don't think Microsoft's going to need this much, even though they tell us this.
It's like, who the heck is like, no, no, no, customer, you're going to buy more.
Like, and orders, right?
And then when the orders come through the supply chain, it's like, they have to pay NCNR, right, non-cancelable, non-returnable. Like, you know, I asked a question in Taiwan once.
It was Colette, who is the CFO, and Jensen, the CEO.
They were, they were both there.
And it was a room full of like mostly finance bros and they're asking stupid finance questions like three days
before earnings. So obviously they just could not answer anything because it's like, you know,
SEC regulations. But then my question to them was like, look, Jensen, you're like so vibes
driven and like very gut feel and like very visionary. And then that's, you know, CFO, like,
she's amazing in her own right. But like, you know, those personalities clash, how do you work
together? And he's like, I hate spreadsheets. I don't look at them. I just know. Right. It's like,
is his response. And it's like, of course, you know, the best innovators in the world have
really good gut instinct.
Right.
Right.
And so like the gut instinct to like order with, you know,
with non-cancelable terms, when you don't know,
and they've had to write down over their history multiple times, right?
Many, many billions of dollars in cumulative orders, right?
Cumulative, in total orders.
Whether it be, you know, the H20, which is more regulatory,
but like other cases they've ordered and had to cancel.
Is that many billions?
It's many billions.
Peanuts.
Well, it depends, right?
The crypto writedown was like,
multiple billions when their market cap was, like, less than $100 billion, right?
Like, it's, you know, it's sizable.
That's peanuts compared to the upside, right?
I think, I think everything Nvidia did was right.
Yeah.
And I think everything AMD did was wrong, like, you know, in that scenario.
But like, it is crazy to, especially in a cyclical industry like semiconductors
where companies go bankrupt all the time, which is why we have all this consolidation is
every down cycle, companies go bankrupt.
I mean, from a risk-return perspective, right,
these bets were totally worth taking.
Yes.
If you look at it from, I'm a CEO,
I want to have predictable quarters
for Wall Street. It's a very different story.
I think that's sort of part of the tension there, no?
Yeah, so we, I don't know if
you've seen these, like,
Lee Kuan Yew edits, where they're, like, him
like saying some, like, fiery speech
and then, like, and then it's, like, some cool
music at the end, and it's, like, showing different
pictures of him. And so we made one of Jensen
recently and put it on social media,
right, on, like, Instagram, TikTok,
uh, XHS, Red Book,
right uh twitter of course right like all the different social media uh and i really liked it because
he's like, you know, the goal of, like, playing is to win, and, or sorry, and the reason
you win is so you can play again right and you compared it to pinball where like actually you just play
all day and you keep getting more rounds and it's like his whole thing is like i want to win
so i can play the next game um and like it's only about the next generation right it's only about
now next generation it's not about 15 years from
now because it's a whole new playing field
every time or five years from now.
I think that's, you're right,
it's the risk or reward is, is correct.
Yeah, but there's few people
take these kind of risks.
It's the only semiconductor company that's worth,
you know, I think even north of $10 billion
that was founded as late as it was.
Like, MediaTek was in the early 90s
and then Nvidia and everyone else
is like from the 70s mostly.
Yeah, big ones.
Yeah. Yeah, I think you raised this great point
on the bet the farm.
And he's actually been wrong a couple times, to your point.
Mobile, right?
Like, what the hell will happen with mobile?
Exactly.
And he still takes them.
And I think Mark actually had this great conversation with Eric
where he talked about being founder run,
where you have this memory of the risks you took
to get to where you are today, right?
And so in a lot of cases, if you're a CEO brought on later on,
you're sort of like, okay, continue to steer the ship as is.
But in this case, he remembers all the times
they almost went belly up, and he's like, I've got to bet, keep making bets like that.
How do you think he's changed over?
I mean, he's been one of the longest-running CEOs, over 30 years.
He's kind of right up there with Larry Ellison now.
How do you think he's changed over the last 30 years or so?
I mean, obviously, like, I'm 29.
I can't say I really know what he was like.
I've watched a lot of old interviews.
I won't say he wasn't.
He's been a CEO longer than you've been alive.
Yeah, exactly.
exactly. Like,
Nvidia was founded before I was born.
I was born in '96, right? Like, you know?
Yeah, anything over the last
couple of years, right? I think even
like watching old interviews, right? Like, I watched
a lot of old interviews, a lot of old, like,
presentations he's given.
One thing is that he's just, like, sauced up and
dripped up. Like, the charisma
he's got has only gotten stronger.
Right?
Yep.
Which is an interesting point.
I don't know if it's quite relevant.
I don't agree with that, yeah. But like,
The man has learned to be a rock star. Even though he was always charismatic, he's a complete rock star now. And he was a rock star a decade ago too, it's just people maybe didn't recognize it. I think the first live presentation of his that I watched, it was, what's it called, CES, like 2014 or 2015 or whatever. The Consumer Electronics Show.
I'm like moderating like gaming
gaming hardware subredits right
like at the time I'm a teenager
and like the dude is like
talking only about AI
he's telling he's telling like all these gamers
about AlexNet and self-driving cars
right it's like know your audience
first of all but also like
like
it has nothing to do with consumer electronics
or gaming, you know. At the time
I was also, like, half like,
holy crap, this is amazing,
but the other half was like, I want you to announce a new gaming GPU, right?
Like, you know, but I know, like, on the forums, on the forums, quickly everyone was like, you know,
screw this, you know.
Yeah, yeah.
I want to hear about the gaming GPUs, Nvidia's price gouging.
Like, you know, of course, Nvidia's always had the, like, we price at the value, plus a little bit, right?
Because we're just smart enough to know.
You know, I'm guessing Jensen just has the gut feel of how to price things, right?
He'll change the price, like, at least on gaming launches, he'll change the price up until,
like, right before the presentation.
Wow.
So like it really is like a gut feel thing probably.
And anyway, so, so he had that charisma to know what was right.
But I think people, a lot of people were like, oh, no, whatever, Jensen's wrong.
He doesn't know what he's talking about.
But now like he talks, people are like, oh, very, very, you know, so it might just be that
he's been right enough.
Yeah, there's a post on X recently that said he had moved up into God mode with a select
group of CEOs, and that this was
recent, like, it's exactly...
Who's the other gods?
It was Zuck.
Pretty other gods.
Elon.
Elon, Zuck, and Jensen.
Nice, nice.
Okay.
Good crew to be in.
So we pray to Silicon Valley.
It's sort of the cult now, is it?
Exactly.
Just on one last thing on people.
You mentioned Colette, his CFO,
and, you know, there's
sort of a famously loyal crew
at NVIDIA, even though all of the
OGs could retire at this point.
Is there anyone akin to a Gwynne Shotwell at SpaceX
or previously a Tim Cook to Steve Jobs at Apple
that is at Nvidia today?
I mean, he had two co-founders, right?
Like, that's, you know, let's not overlook that.
One of them's, like, you know, not involved
and hasn't been for a long time.
But the other one was involved up until just a, you know,
a few years ago, right?
So it's not just Jensen running the show.
Totally.
Although he was running the show.
there's quite a few people on the hardware side
I've always
there's someone at Nvidia that's like mythical to me
like when you talk to the engineering teams
he leads a lot of engineering teams
he's a private person so I don't want to say his name
actually fair enough
but you know he's he's like
he's like effectively the chief
engineering officer, like, that's his role
and people within his org will know who he is
And I think there are people like that, but, you know, he's intensely loyal.
And there's a number of these types of people.
There's another fella who's like, you know, like, there's all these, like, innovative ideas at NVIDIA.
And he's the guy who literally is like, we need to get the silicon out now.
We're cutting features.
And that's like what he's famously known for.
And all the technologists in NVIDIA hate him.
This is like a second guy.
This is a second guy.
Also intensely loyal to NVIDIA has been around for a long time.
time, but it's like, you know, it's sort of like when you have such a visionary company
and forward, you know, one problem is that you get lost in the sauce, right? You know, oh, I want
to make this. It's got to be perfect, amazing. And it's like, you know, you got to have that
sort of like, and these people are like, you know, obviously they're close to Jensen for a reason
because Jensen also believes these things, right? He has the visionary, future-looking side, but also
like screw it, cut it, we'll put in the next one, ship, right? Like, you know, ship now, ship
faster, in a space like silicon, which is really hard to do. And sort of the thing
about Nvidia that's always been, you know, super impressive, and it's from the beginning days, where
he's talked about this before, is their first chip, their first successful chip. They were going
to run out of money, and he had to go get money from other people to even finish the development.
And even then he just barely had enough money, because he'd already had a failed chip before. This
chip came back and it had to work, otherwise it would not, you know. And so they were like,
because they could only pay for one, it's called a mask set, right? Basically you put these, like,
I'll call them stencils, into the lithography tool, and then it, like, says where the patterns are,
and you, you know, put the stencil in, you deposit stuff, you etch stuff off, you deposit materials on the
wafer, etch it away, and you put the stencil in, and, like, you tell it where to put stuff, right?
And then the deposition and etch keeps happening in those spots, and you stack dozens of layers
on top of each other, and then you've made a chip.
These stencils are custom to each chip, right?
And they cost today on the order of tens of millions of dollars.
But even back then, it was still a lot of money.
It wasn't that much then, of course.
They could only pay for one set.
But the typical thing with semiconductor manufacturing is,
as good as you can simulate it, as good as you can do all the verification,
you'll send a design in and you have to change it.
There's going to be something.
It's so hard to simulate everything perfectly.
And the thing about Nvidia is they tend to just get it, right, the first time.
Even great executing companies like AMD or Broadcom or whoever, they often have to ship, you know, they're denoted in like A and then a number or B and then a number.
So it's like two different revisions of the masks.
So like, Nvidia always ships A0.
Almost always.
They sometimes ship A1.
And a lot of times, even if they'll start production of the, you know,
the letter is basically the transistor layers, then the number is, like, the wiring that connects all the transistors together.
So, Nvidia will start production of the transistor layers and ramp it really high
and then just hold it right before they transition to the metal, just in case they do need to change the metal layers.
And so, like, the moment they're ready and they've confirmed that it works,
they can just, you know, blast through a lot of production.
Whereas everyone else is like, oh, let's get the chip back.
Oh, okay, A0 doesn't work.
We've got to make this tweak, make this tweak, get the chip back.
It's called a stepping, right?
We were very jealous of Nvidia at that time, right?
They consistently delivered on the first stepping; we did not.
In the data center CPU group, there was one product where, you know, as I said, it's A0, A1,
or you go to B if you have to change the transistor layer as well.
Nvidia, sorry, Intel got to, like, E2 once. E2. Like, that's, like, a 15th revision.
Like that's like a 15 revision.
This is, this is.
It was, like, the peak of AMD's,
like, when they went skyrocketing on market share versus Intel,
was when Intel was at E2,
right, like, a 15th stepping.
Because it's quarters of delay right
I mean it's catastrophic for a go to market
Yeah, each time is a quarter of delay or something, right? Yeah.
So it's absurd.
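As a back-of-the-envelope sketch of why steppings hurt, here's that "each respin costs about a quarter" arithmetic written out; the function and the stepping lists are made up for illustration, not real schedules:

```python
# Rough schedule-slip model for silicon steppings: each stepping after the
# first (e.g. A0 -> A1 -> B0 -> ...) is a respin, and the conversation above
# suggests roughly one quarter of delay per respin. Purely illustrative.
def delay_quarters(steppings, quarters_per_respin=1):
    return (len(steppings) - 1) * quarters_per_respin

print(delay_quarters(["A0"]))                          # ship first silicon: 0 extra quarters
print(delay_quarters(["A0", "A1", "B0", "B1", "C0"]))  # 4 respins -> roughly a year of slip
```

So shipping A0, the way Nvidia usually does, means zero respin delay, while a 15th stepping like the E2 example implies years of cumulative slip under this toy model.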
So I think that's the other thing about in video is like
You know screw it let's ship it
Let's let's get the volume ASAP
Let's, you know, let's do these things that
you know and so anyways they like you know have some of the best simulation verification etc
that lets them sort of go from design uh you know from idea to shipment as fast as possible
um you know cutting out any and the necessary features that could delay it making sure they
don't have to do revisions so they they can get you know they can respond to the market
ASAP. There's a story about how Volta, which was the first Nvidia chip with tensor cores,
you know they saw all the AI stuff on the prior
generation P100 Pascal, and they decided we should go all in on AI, and they added the
tensor cores to Volta only a handful of months before they sent it to the fab.
Like they said, screw it, you know, let's change it. And it's crazy. And it's like, if they
hadn't done that, who would have, maybe someone else would have taken the AI chip market, right?
So there's all these times where they just, and it's, those are major changes, but there's
often, like, minor things that you have to tweak, right? Number formats,
or like some architectural detail.
Invidia is just so fast.
The other crazy thing is they have a software division
that can keep up with that, right?
I mean, if you come out with the chip, right,
and basically no stepping required,
it's immediately in the market,
then being ready with drivers
and all the infrastructure on top.
That's just super impressive.
Yeah.
I love that point because you think of
Nvidia benefiting from tailwind after tailwind,
but I think what both of you are saying is,
you have to move fast enough and execute well enough
and take advantage of those tailwinds.
And if you think about, and by the way, I loved your CES story.
I'm just envisioning him more than 10 years ago talking about self-driving cars.
But, you know, if you think about nailing the video game tailwind, VR, Bitcoin mining, obviously AI now.
You know, one thing that, or one of the things that Jensen talks about today is robotics, AI factories.
Maybe my last question on NVIDIA, what do you think about the next 10 to 15 years?
I know calling Beyond 5 is hard.
but like what does
Nvidia's business look like?
It's really a question of
and this is like
I think every time I've talked to
you know
some executives at Nvidia
have asked this question because I really want to know
and they won't answer it obviously
but it's like what are you going to do with your balance sheet
like you are the most high cash flow company
and like you have so much cash flow
now the hyperscalers are all taking their cash flow
way down right
because they're spending on GPUs
what is what are you going to do
with all this cash flow right
like you know even even before this whole
takeoff, he wasn't allowed to buy Arm,
right
so so what can he do
with all this capital and all this cash
right even this $5 billion
investment Intel is
there's regulatory scrutiny there right
like it's in the announcement
like, yeah, this is subject to review, right?
Like, you know, I imagine that it'll get past,
but, like, he can't buy anything big.
He's going to have hundreds of billions
of dollars of cash on his balance sheet.
What do you do? Is it
start to build AI
infrastructure and data centers? Maybe.
But, like, why would you do that
if you can just get other people to do it?
And just take the cash?
Well, he is investing in those, no?
Investing peanuts.
Right?
You know, like, he gave recently, like,
CoreWeave a backstop
because today it's really hard to find
a large number of GPUs
for burst capacity, right?
Like, hey, I want to train a model for three months,
right? I have my base capacity where I run
my experiments, but I want to train a big model for three months,
done. We know from our portfolio.
Yeah, yeah. So, like, Nvidia sees this issue.
They think it's a real problem with startups.
It's why the labs have such an advantage.
But what if I could?
You know, right now, like, you know, most companies
in the valley spend, what,
75% of their round on GPUs, right?
Or at least, yeah.
Yeah, we see it.
What if you could do 75% in three months on one model run, right?
You know?
Yeah.
And really scale and have some sort of like competitive product.
And then you have the model.
Then you raise more capital, right?
Or start deploying, right?
What do you do with it?
Is it start buying a crap load of humanoid robots and deploying them?
But, like, they don't really make good software,
they don't make really that amazing software for them in terms of the models, right?
They make, you know, the layer below is great.
where they deploy their capital is, like, the question.
He has been investing up and down the supply chain
a little bit though, right? Investing in the neoclouds, investing in some of the model training companies.
Yeah, but again, small fry. Like, he could have just done the entire Anthropic round if he wanted to.
Of course he didn't, right? And then, like, really got them to use GPUs. Or, like, he could have done
the entire, you know, OpenAI round. Yeah, he could have done the entire, like, any xAI round.
Do you think these are things he should be doing, or what? I mean, like, yeah, good question. I don't know, right?
I think, like...
We'll call you up for the next round that we raise.
But anyways.
He could make venture a dead industry.
Take all of the best rounds.
But it's a lot of business, yeah.
You can do the seeds and then have Jensen mark you up.
That's how it can work.
No, I don't think...
I don't think it.
I think picking winners is obviously really tough for him
because he has customers all across this ecosystem.
If he starts picking winners, then, like,
his customers will be even more anxious
to leave and give even more effort to whether it's AMD or, you know, some startup or their
internal efforts, um, et cetera, et cetera, right, uh, buying TPUs, whatever it is. Like, you know, people
will, he can't just like invest in these, like, you know, he can do a little bit, right? A few
hundred million in an OpenAI round is fine, or a few hundred million in the xAI round is
fine. Um, CoreWeave, right? Like, yeah, everyone's like throwing a fuss about it. But it's like,
he invested a couple hundred million plus, you know, early on, plus, um,
you know, rented a cluster from them
for internal development purposes
instead of renting it from a hyperscaler,
which is cheaper for Nvidia to do, right?
It's better for them to do it from them
than the hyperscalers.
It's like, did he really, like,
is he really backstopping
CoreWeave that much, right?
Or, you know, any of the other customers
or Neo-Clouds?
Like, there's some investment, but it's like,
it's more like, this is a good cloud,
you know, we'll throw like
five or 10% of the round, right?
It's not he's taking 50% plus
of the round.
Is he also reshaping his market?
I mean, look, a couple of years ago, there were four big purchasers of these cards.
You just listed six.
To what extent is that...
That's Humain and Nebius and...
There's a long list there.
Of course, yeah.
Is that a strategy?
It is. I think it absolutely is.
But he didn't have to put much capital down to do this.
Just ship one earlier than the other?
I don't know. Yeah, that's...
No, but it's like, if you look at the grand total of capital they spent investing in the Neo-clouds,
it's a few billion, but he has a lot of other levers if he wants to.
Right, right. Allocations, as you mentioned. Um, what's nice is, you know, historically, you gave volume
discounts to the hyperscalers. Uh, but because he can use the argument of antitrust, he's like,
everyone gets the same price. So fair. It's very fair. You know?
So what should he do with the capital, or what would you guide him to? Uh, I mean, I think, like, you know, like, there's
the argument he should invest in data centers, and only the data center layer, not what
goes in the data center, so that more people build data centers, and then if the market demand
continues to grow, data centers and power are not the issue, right? Invest in data centers and power.
Um, I've said that to them, they should invest in data centers and power, not in the cloud layer,
because the cloud layer is quite commoditized. Or, it's commoditize your
complement, right, is the whole phrase. And I won't say being a cloud is commoditized, but it's
certainly, like, you have a lot of competitors who are decent now. Um,
And you've educated the commercial real estate and other infrastructure investment firms into going into AI Infra as well.
So, like, I don't think it's the cloud layer that you invest it, right?
Do you invest in data centers and energy?
Yeah.
Do you invest it?
Because that's the bottleneck for your growth, really,
is, A, how much people want to spend and can spend, and, B, the ability to actually put them in data centers.
And then, like, robotics.
And, like, I think there's, like, areas he could invest in.
Nothing requires $300 billion in capital.
So what do you do with the capital?
Like, I really, I really don't know.
And I, like, feel like Jensen has to have some idea.
There's some visionary plan here, because that's what shapes the company, right?
I mean, they could sell, they could just continue to, you know, I mentioned $200 billion of free cash flow, $250 billion a year.
What do they do? Do they buy back stock forever?
Like, do they go Apple route?
And the reason why Apple hasn't done anything interesting in, like,
you know, nearly a decade is, you know, they've got a non-visionary at the head.
Tim Cook's great at supply chain.
And they're just plowing the money into buybacks.
They're not really, you know, automotive, the self-driving car thing failed.
We'll see what happens with ARVR.
You know, we'll see what happens with wearables, right?
But, like, Meta and OpenAI might be even better than them.
We'll see, like, in others, right?
So what does he invest in?
I have no clue. But, like, what requires so much capital
is the tough question
and it actually gets a return
because the easy thing is
like, my cost of equity, right? I just do buybacks,
and doesn't completely change the company culture
I think that's another thing right
there are probably areas he could invest it in
but you suddenly end up with the company
doing two completely different things
which are very difficult to keep on track
but they do like 10 completely different things right
I mean, one way to look at it is, we build
AI infrastructure. And in the guise of we build
AI infrastructure, humanoid
robots around the world are
AI infrastructure,
or data centers and energy
are AI infrastructure, right?
Like, you know, like...
So the humanoid robots would totally work, right?
If you're suddenly pouring concrete
and building power plants, it has completely
different culture, a completely different set of people, and it's very
much harder.
Okay, agree. There's different ways to do it, like, invest
in the various companies or, like,
backstop, like, the building of a power plants,
right? Like, you know, because no one wants to build
power plants because they're 30-year underwriting things.
You know, there's all these different
areas where could use
capital to, you know,
allow something to happen,
right? Not necessarily owning it in something.
And look, one of
the biggest problems we
had was that our customer base
sucked, right? I mean, we were selling to,
most of the chips went into, the large hyperscalers,
you know,
which they're way too concentrated,
and they build their own chips
and so they can push down your prices. So
honestly, spending it on diversifying the
cloud, you know, the customer base...
When you were at the company in '14, you guys should have just charged so much
that your margins were 80%.
What would the world
have done?
Nothing.
The margins were pretty good back then.
That wasn't the problem.
That was the primary.
They were 60, 65.
They were 80.
Still, yeah.
Oh, boy.
There was Jensen.
UDST is picking in here.
Well, wait, I think Guido's
comment is actually a really good
segue into something else
we wanted to talk to you about,
which is the hypers
and one of the reasons that I love reading semi-analysis
is you guys make these out-of-consensus calls
that you're often right about
and one of them recently
was calling...
Only often?
You have a Jensen hit rate.
It's very high.
Where's my billion-dollar, you know,
PV-positive bet?
But the one that caught my eye
was Amazon's AI resurgence.
So I wanted to talk to you a little bit
about that, just because, you know, I think we found it pretty interesting being on the ground
helping our portfolio companies pick who their partners are. And so we have some microdata on this,
but you sort of walk through why they're behind. Yeah. So in Q1, 2020,
I wrote an article called Amazon's Cloud Crisis. And it was about all these neoclouds are going
to commoditize Amazon. It was about how Amazon's entire infrastructure,
was really good for the last era of computing, right?
What they do with their elastic fabric,
ENA and EFA, right, their NICs,
the whole protocol and everything behind them,
what they do for custom CPUs,
et cetera, right?
It's really good for the last era of scale-out computing
and not the era of sort of scale-up AI-infra
and how Neoclouds are going to commoditize them
and how their silicon teams were focused on, you know,
cost optimization, whereas the name of the game today
is max performance per cost, right?
But that often means you just drive up performance like crazy.
Even if cost doubles, you drive up performance more, say it triples,
because then the cost per performance still falls.
That's sort of the name of the game today with NVIDIA's hardware.
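That perf-per-cost arithmetic can be written out in two lines; the numbers below are purely illustrative, not real chip figures:

```python
# Cost per performance falls whenever performance grows faster than cost,
# even if absolute cost rises. Illustrative numbers only.
old_cost, old_perf = 1.0, 1.0
new_cost, new_perf = 2.0, 3.0      # cost doubles, performance triples

old_cpp = old_cost / old_perf      # 1.00 per unit of performance
new_cpp = new_cost / new_perf      # ~0.67: about a third cheaper per unit
print(new_cpp < old_cpp)           # True
```

That's the whole argument: a chip that costs twice as much but performs three times better is still the cheaper way to buy performance.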
And it ended up being really good call.
Everyone was calling us out like, no, you're wrong.
And this was like when Amazon was like the best stock.
And Microsoft really hadn't started taking off yet.
and nor had, like, all these others, you know, Oracle and so on and so forth.
And since then, Amazon has been the worst performing hyperscaler.
And the call here is that, you know, they still have structural issues, right?
They still use Elastic Fabric, although that's getting better; still behind
NVIDIA's networking, still behind Broadcom/Arista-type networking and NICs.
And, you know, their internal AI chip is okay.
But the main thing is that they're now waking up and being able to actually capture business, right?
So the main call here is that since that report, AWS has been decelerating:
year-on-year revenue growth has been falling consistently.
And our big call is that it's actually going to start re-accelerating, right?
And that's because of anthropic.
It's because of all the work we do on data centers, right?
Tracking every single data center, when that goes online and what's in there.
the flow-through on cost.
If you know how much the chips cost,
the networking costs, the power costs,
and you know generally what margins are
for these things,
then you can sort of start estimating revenue.
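As a toy illustration of that cost build-up, here's a minimal Python sketch: sum what goes into a data center (chips, networking, power), annualize it, and mark it up by an assumed margin to get implied revenue. Every input figure and the margin below are illustrative placeholders, not SemiAnalysis's actual model or numbers.

```python
# Toy version of the cost build-up described: annualize the capex, add power,
# apply an assumed gross margin, and you get a revenue estimate.
# All inputs are hypothetical placeholders.

def estimated_annual_revenue(chip_cost, networking_cost, power_cost_per_year,
                             depreciation_years=4, gross_margin=0.3):
    """Annualized cost of the deployment, marked up by an assumed margin."""
    annual_capex = (chip_cost + networking_cost) / depreciation_years
    annual_cost = annual_capex + power_cost_per_year
    return annual_cost / (1 - gross_margin)

# A hypothetical site: $4B of chips, $500M of networking, $150M/yr of power.
rev = estimated_annual_revenue(4e9, 5e8, 1.5e8)
print(f"${rev / 1e9:.2f}B implied annual revenue")  # -> $1.82B implied annual revenue
```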
So when we build all that up,
it's very clear to us that AWS revenue growth
troughs at this point, right?
This is the lowest AWS revenue growth
will be on a year-on-year basis
for at least the next year, right?
And it's re-accelerating to north of 20% again
because of all these,
massive data centers they have coming online with Trainium and GPUs, right?
Depends on which one. It depends on which customer.
The experience is not as good as, you know, say, a CoreWeave or whatever, but the name
of the game is capacity today.
CoreWeave can only deploy so much.
They can only get so much data center capacity, and they're really fast at building.
But the company with the most data center capacity in the world, still today,
is Amazon, although they may get passed up in the next two years.
Actually, based on what we see, they will get passed up.
But incrementally, Amazon still has the most spare data center capacity
that is going to ramp into AI revenue over the next year.
Let me ask a question. Is that the right type of data center capacity?
Like for the high-density AI buildouts today, you need massively more cooling,
you need to have enough water close by, and you have enough power close by.
Is it in the right place, or is it the wrong type of data center?
So data center capacity, in this sense, I mean, all the way from power is secured to
substations built, to transformers, to
you can provide the
power whips to the racks.
Now, obviously, the data center capacity will
differ, right?
You know, historically, actually, Amazon's had the
highest density data centers in the world.
Right? They went to like 40
kilowatt racks when everyone was
still at 12. And if you've ever stepped
foot inside of most data centers,
they're, like, pretty cool
and dry-ish.
If you step inside an Amazon data center,
it feels like a swamp. It feels like where I grew up,
right? It's, like, humid and hot, because they're optimizing every percentage point.
And so sort of like your point in here is that like Amazon's data centers aren't equipped
for the new type of infrastructure. But when you compare them to the cost of the GPU,
like, having a complex cooling arrangement is fine, right?
You know, we made a call on Astera Labs a few months ago, a couple months ago,
when they were at, like, 90, and it went to 250 the month after because of
the orders Amazon is placing with them.
but there's certain things with Amazon's
infrastructure, I won't get too much into it, but the
rack infrastructure requires them
to use a lot more of, like, Astera Labs connectivity
products,
and the same applies to cooling, right?
So it's on the networking and cooling side.
They just have to use a lot more of this stuff.
But again, this stuff is inconsequential
on cost compared to the GPU.
You can build... My question is more like, look,
I may need a major river close by for
cooling at this point, right?
In many areas, I just can't get enough water.
And, you know, it's probably the same with power in the region.
There are two gigawatt-scale sites where they have power all secured,
wet chillers and dry chillers all secured.
Like, everything's fine.
It's just not as efficient.
But, you know, that's fine, right?
Like, you know, they're going to ramp the revenue.
They're going to add the revenue.
Not that I necessarily think Amazon's internal models are going to be great,
or, hey, that their internal chip is better than NVIDIA's or competitive with TPU,
or their hardware architecture is the best.
I don't necessarily think that's the case.
But they can build a lot of data centers
and they can fill them up with stuff
that will be rented out, right?
And it's a pretty simple thesis.
How important has Anthropic been to the co-design for Trainium?
Because I remember we had a portfolio company.
This was summer 2023.
They were invited to AWS.
They spent, man, I think
eight hours with them over the course of
a week trying to figure
out Trainium back then. It was just
impossible to work through.
Is that, you know,
that, obviously that portfolio company hasn't
gone back and tried it now, but like how
different is it now based on what you're hearing?
Oh, it's still bad.
Okay. Okay.
You know, it's tough to use.
So there's sort of like,
this is sort of the argument that every
inference company offers, right, including
the AI hardware startups, is:
because I'm only running, like, three
different models at most,
I can just hand-optimize everything
and write kernels for everything, and even
go down to, like, an assembly level,
right? How hard can it be?
It is pretty hard.
But like you tend to do this for
production inference anyways.
Like you aren't using
cuDNN, which is NVIDIA's, like, library
that's super easy to use for
kernels and stuff... or, not to generate kernels, but anyways,
you're still...
you're not using these
ease-of-use libraries.
You know, when you're running
inference, you're either
using CUTLASS,
or stamping out your own PTX,
or, you know, in some cases,
people are even going down to the SASS level,
right?
And when you look at, say,
an OpenAI or, you know, an Anthropic,
when they run inference on GPUs, they're doing this,
right?
And the ecosystem is not
that amazing.
Once you get all the way down to that level,
it's not like using NVIDIA GPUs is easy either.
I mean, you have an intuitive understanding of the hardware architecture
because you work on it so much and everyone's worked on it.
And you can talk to other people.
But at the end of the day, it's not like easy, right?
Whereas, you know, Anthropic on Trainium or TPUs...
Actually, the hardware architecture is a little bit simpler than a GPU:
larger, simpler cores,
rather than having all this functionality. You know, less general.
So it's a little bit easier to code on.
There are tweets from Anthropic people saying that when they are working at that low level, they actually prefer working on Trainium and TPU because of the simplicity.
No.
Interesting.
To be clear, Trainium and TPU, I mean, Trainium especially, are very hard to use.
Like, not for the faint of heart.
It's very difficult.
But you can do it if you're just running, like... if I'm Anthropic and I must only run Claude 4.1 Opus and, sorry,
Sonnet. And screw it, I won't even run Haiku; I'll just run Haiku on, like, GPUs or whatever, right?
I'm just going to run two models. And actually, screw it, I'm just going to run Opus on GPUs too,
and Sonnet on Trainium and TPUs. Sonnet is the majority of my traffic anyways. I could spend the time.
And how often am I changing that architecture? Every four or six months, right?
It's not changing that much, honestly. Right? I mean, from three to four it definitely did change, right?
Yeah, I mean, define architectural change. You know, at a high level,
The primitives are more or less the same
across the last couple of generations.
I don't know enough about
Anthropic's model architecture, to be honest.
But I think from what I've seen at other places,
there have been enough changes that it takes time to
program this and really get...
The main thing is, like, you know,
if I'm Anthropic
and I have, what, $7 billion ARR now or whatever,
north of 10...
By the end of next year, north of 20, right?
Maybe even 30.
And if my margins are 50%, 70%,
that's $15 billion of Trainium that I need, right?
That I can run Sonnet on.
And most of that's going to be Sonnet 3.5 or, sorry, 4.5, whatever it is, right?
It's going to be one model serving most of the use cases.
So, like, you know, I could spend the time and it'll work on this hardware.
Yeah, totally.
Maybe on the topic of non-consensus calls you've made, and maybe
I'll move to another cloud.
In June, you guys said that Oracle is winning the AI compute market.
And then in this pod, we've already referenced the big jump, obviously, that Oracle had.
I think it was the single largest gain that a company with over $500 billion of market cap has ever had.
So, an enormous...
Was NVIDIA's Q1 2023 jump not bigger?
It might have been smaller.
Okay.
I think it was maybe close.
We'll fact check ourselves.
That's amazing.
But, you know, obviously, this is the massive...
commitment that was announced, can you walk us through why you made that call then
and just sort of why Oracle is poised to do so well in such a competitive space?
Yeah, so Oracle: they have the largest balance sheet in the industry that is not dogmatic
about any type of hardware, right? They're not dogmatic about any type of networking.
They will deploy Ethernet with Arista. They'll deploy Ethernet through
their own white boxes. They'll deploy
NVIDIA networking, InfiniBand
or Spectrum-X, and they have really good network engineers.
They have really great software across the board, right? Again,
like ClusterMax: they were ClusterMax Gold
because their software is great. There are a couple of things that they needed to add
that would take them higher, and they're adding those, right?
To Platinum, right, which is where CoreWeave was.
And so, when you couple two things, right?
Like, OpenAI's got insane compute demand.
Microsoft is quite pansy.
They're not willing to invest.
They don't believe OpenAI can actually pay the amount of money, right?
I mentioned earlier, right,
the $300 billion deal: OpenAI, you don't have $300 billion.
And Oracle is willing to take the bet.
Now, of course, there's a bit more security in the bet, in that
Oracle really only needs to secure the data center capacity, right?
So this is sort of like how we came across the bet, right?
And we've been telling our institutional clients,
especially in a super detailed way,
whether it be the hyperscalers or AI labs
or semiconductor companies or investors,
in our data center model,
because we're tracking every single data center in the world.
Oracle doesn't build their own data centers either, by the way.
They get them from other companies.
They co-engineer, but they don't physically build them themselves.
And so they're quite nimble in terms of being able to assess
new data centers, engineer them.
So we saw all these different data centers Oracle was snatching up:
in deep discussion,
snatching up, signing, et cetera.
And so we have, you know, hey, a gigawatt here,
a gigawatt there, right?
Abilene, you know, two gigawatts, right?
You have all these different sites
that they're signing up and discussions with,
and we're noting them.
And then we have the timeline
because we're tracking the supply chain:
we're tracking all the permits,
regulatory filings, you know,
through language models,
using satellite photos constantly,
and then the supply chain of, like, chillers,
transformer equipment, generators, et cetera.
We're able to make a pretty strong quarter-by-quarter estimate, in our data center model,
of how much power there is for each of these sites, right?
So some of these sites that we know of aren't even ramping until 2027, but we know that Oracle signed it, right?
And we have the sort of ramp path.
So then it's this question of, like, okay, let's say you have a megawatt, right? For simplicity's sake.
Which is a ton of power, but now it doesn't feel like much;
you know, we're in the gigawatt era.
But, you know, if you're talking about a megawatt, right, you fill it up with GPUs.
How much do the GPUs for a megawatt cost, right?
Or actually, it's even simpler to do the math, right?
If I'm talking about a GB200, right, each individual GPU is 1,200 watts,
but when you add the CPU, the whole system, it's roughly 2,000 watts.
At the same time, you know, all-in, everything, for simplicity's sake, it's $50,000 per GPU, right?
The GPU alone doesn't cost that;
there's all the periphery, right?
So, $50,000 of capex for 2,000 watts.
So $25,000 for 1,000 watts.
And then what's the rental price for GPU?
If you're on a really long-term volume deal, $2.70, right, $2.60 an hour, in that range,
then you end up with: oh, it costs like $12 million a year to rent a megawatt.
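That per-megawatt arithmetic is simple enough to sanity-check. A quick sketch using the figures just stated (roughly 2,000 W per GPU system all-in, $50K of capex per GPU, ~$2.60-2.70/hr long-term rental):

```python
# Quick check of the per-megawatt math from the conversation.
system_watts = 2000      # whole GB200 system per GPU, all-in (as stated)
capex_per_gpu = 50_000   # dollars, all-in including periphery (as stated)
rate_per_hour = 2.70     # dollars per GPU-hour on a long-term volume deal (as stated)

gpus_per_mw = 1_000_000 / system_watts        # 500 GPU systems fit in a megawatt
capex_per_mw = gpus_per_mw * capex_per_gpu    # ~$25M of capex per megawatt
rent_per_mw_year = gpus_per_mw * rate_per_hour * 24 * 365

print(f"{gpus_per_mw:.0f} GPUs/MW, ${capex_per_mw / 1e6:.0f}M capex, "
      f"${rent_per_mw_year / 1e6:.1f}M/yr rent")
# -> 500 GPUs/MW, $25M capex, $11.8M/yr rent
```

Which lands right on the "$12 million per megawatt" figure quoted in the conversation.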
And then each chip is different.
So we track each chip: what the capex is, what the
networking is. So, for each chip, you can predict which chips
they're putting in which data centers, when those data centers go online, how many megawatts
by quarter. And then you end up with: oh, well, Stargate goes online in this time period,
they're going to start renting it at this time, it's this many chips, for each Stargate site, right?
And so therefore, this is how much OpenAI would have to spend to rent it. And then
you price that out. And we were able to predict Oracle's revenue with pretty high certainty,
and we matched pretty dead on what they announced
for 25, 26, 27,
and we were pretty close on 28.
The surprise for us was that, you know,
they announced some stuff in '28, '29,
data centers that we haven't found yet,
but we'll find them, right, of course.
And sort of like, this methodology lets you see
sort of, hey, what data centers are you getting,
how much power, what are they signing,
how much incremental revenue that is,
when that comes online.
And so that's sort of the basis of our
Oracle bet. Obviously in the newsletter we included a lot less detail, but, you know,
sort of, it was that thesis, right, that like, hey, they have all this capacity. They're
going to sign these deals. In our newsletter, we talked about two main things. We talked about
the OpenAI business, and then we talked about the ByteDance business. And presumably
tomorrow, you know, on Friday, there's going to be an announcement about TikTok and all this.
But, like, the ByteDance business: you know,
huge amounts of data center capacity
that Oracle is also going to lease out to ByteDance, right?
And so we did the same methodology there.
You know, with ByteDance, it's pretty certain they'll pay,
because they're a profitable company.
With OpenAI, it's not.
And so there have got to be some error bars
as you go further out, in terms of, like,
will OpenAI exist in '28, '29, '30,
and will they be able to pay the $80-plus billion a year
that they've signed up with Oracle for?
Right. That's the only, like, risk here.
And if that happens, then Oracle's downside is also somewhat protected,
because they only signed the data center, which is a minority of the cost.
The GPUs are everything.
And the GPUs, they purchase one to two quarters before they start renting them out.
So the downside risk is pretty low for them
if they don't get the deal.
Well, they don't get the revenue, but it's not like they're stuck with a bunch of assets
they bought that are worthless.
Yeah.
Is that another angle here?
I mean, OpenAI and Microsoft were BFFs, and now they've filed divorce papers,
and they just want to diversify,
and that's pushing them away towards other providers.
Yeah, so Microsoft was the exclusive compute provider.
It got reworked to a right of first refusal.
You know, and then Microsoft...
Was it now last choice or something like that?
No, it's still right of first refusal, but it's like Microsoft...
Those two are not mutually exclusive.
Well, if OpenAI is like, we're going to sign an $80 billion contract
or a $300 billion contract for the next five years, you guys want it?
Or, you know, it's like...
Yeah, yeah.
And they're like, no.
what?
Okay, cool.
Right?
And then they go to Oracle, right?
And OpenAI is sort of like...
this is, you know, OpenAI needs someone with a balance sheet
to actually be able to pay for it, right?
And then they'll make tons of money, you know, off of OpenAI,
on the margins on the compute and the infra and all these things.
But someone's got to have a balance sheet.
And OpenAI doesn't have a balance sheet.
Oracle does.
Although, given the scale of what they signed... We also had another
source of information, which was
that they were talking to
debt markets, right? Because
Oracle actually just needs to raise debt
to pay for this many GPUs over time.
Now, they won't do it immediately. They can pay for
everything this year and next year from their own
cash. But, like, in '27, '28,
'29, they'll start to have to use debt to pay for
these GPUs, which is what
CoreWeave has done. As with many of the neoclouds, most
of it is debt-financed. Even Meta
went and got debt for
their Louisiana mega data center. Not
just because it's cheaper;
it's literally better on a financial
basis to do buybacks with your
cash and take on debt, because the debt is
cheaper than the return on your stock.
It's a financial engineering thing. But, like,
you know, who's out there, right?
It could be Amazon, it could be Google,
it could be Microsoft. It was a very short list.
Or it could be Oracle
or meta, right? Meta's
obviously not. Microsoft's chickened out.
Amazon, Google, and
Oracle, right? That's all that's left.
Google would be an awkward fit.
Yeah, Google would be an awkward fit.
Amazon would be a fine fit, but, you know, exactly, right?
It's like, they're very Anthropic, yeah.
Well, I guess maybe, you know, on the topic of these giant data center buildouts,
you guys just released a piece on xAI and Colossus 2.
Do you, are you getting less impressed by these feats of building something this massive in six months?
Or is it still very impressive to you guys?
You know, this is the thing I've said about AI researchers
is that they're like the first class of humans
to think about things on an order of magnitude scale
whereas like people have always thought about things
in terms of like percentage growth
like ever since industrialization
and before that it was just like absolute numbers
right?
You know, sort of like, humanity is evolving
in terms of how we think,
because things are changing faster.
Everything is on a log scale.
And so like
You know, it was really impressive
when GPT-2 was trained on so many chips,
and then GPT-3 was trained on...
or, sorry, GPT-4 on 20K A100s.
It's like, holy crap. And then it was like,
oh, the era of 100K GPU clusters,
right? And we did some reports around 100K GPU clusters.
But now there are, like, ten
100K GPU clusters in the world,
and I was like, okay, this is kind of boring.
But 100K GPUs is, like,
you know, over 100 megawatts.
Now it's, like, literally, you know,
in our Slack, in some of these channels,
it's like, oh, we found another 200-megawatt data center,
and there's someone who puts the yawning emoji.
Every time.
And I'm like, dude, what?
Now it's only exciting if you do gigawatt-scale data centers.
Like, we're in the gigawatt era.
Yeah.
Yeah, yeah.
And I'm sure... like, you know, actually, I'm not sure.
Maybe we'll start yawning at that too.
But, like, you know, the log scale of this is, like, the capital numbers are crazy, right?
Like, you know, it's crazy enough that OpenAI did, like, a $100 million
training run, you know, then they did a billion-dollar
training run.
Now we're talking about $10 billion training runs, right?
Like, you know, it's crazy that we think in log scale.
But yes, things are only impressive on a log scale.
Yeah.
What Elon's doing in Tennessee, in Memphis: the first
time was crazy, right?
100K GPUs in
six months. He bought a factory in,
like, February of '24,
and had
models training within six months, right?
And he did liquid cooling, you know,
the first large-scale data center at this scale
for AI doing liquid cooling. All
these sorts of crazy firsts.
Putting generators outside, like Cat
turbines; all these different things to get the power:
mobile substations, all these different
crazy things, tapping the natural
gas line that's running alongside the factory.
All of these. So he does
this, and it's like, holy crap.
And he did it for 100K
GPUs. Right. You know, 200, 300
megawatts, right? Now he's doing it at
gigawatt scale,
and he's doing it
just as fast. Right?
And so, like, you would think
this is obviously way more impressive,
that he did it again. Yeah. But,
like, we've been desensitized. It's like,
you know, you've given the child too much
candy, right? And now the child doesn't like apples, right?
Like, I don't know. So, like, yeah, a gigawatt data center. There were all
these protests around his Memphis facility. People like, oh, you're destroying the air. And it's
like, have you looked around that area of Memphis? There is, like, a gigawatt gas turbine
plant that's just generally powering that area. There's a sewage plant that's servicing the entire
city of Memphis. And there's,
like, open-air pits... like, there's open-air mining.
There's all sorts of disgusting shit
around there, which is needed.
Right, we need that stuff to have a country run,
right, like, to be clear.
And, you know,
it's like, people are complaining about, like, a couple
hundred megawatts... Yeah.
Of generation.
So he got protests from all sorts of people.
You know, it got super into the politics side
of things. The NAACP even protested him.
And so he really got some local municipalities to be like, oh, we don't like this. And so he couldn't do as much as he wanted to in Memphis. But he still needed the data center to be close, because he wanted to connect these data centers super high bandwidth, super close, and he already had a lot of infrastructure set up there. So he bought another distribution center, this time still in Memphis. But the cool thing about Memphis is it's right across the border from Mississippi. Right. So now, you know, it's
like 10 miles away from his original one, but his
facility is like a mile away from Mississippi
and he bought a power plant in Mississippi
and he's putting turbines
there. The regulation is
completely different, right? And if the
question is really like
galvanize resources and build it
really fast, maybe Elon
is ahead of everyone.
You know, he hasn't made the best model yet, or
he doesn't have the best model, at least today, I think.
You know, you could argue
Grok 4 was the best for a little period of time.
But like, you know, it's
it's truly amazing
how fast he's able to build these things
And from first principles, it's like,
most people are like, fuck, you know,
we can't build the power,
we can't do power here anymore,
we have to find a new site. And it's like,
no, no, just go across the border,
go to Mississippi.
And my favorite thing is, like, Arkansas is right there,
so if Mississippi gets mad, you know...
I don't know the regulation... Are all future data centers,
you know, going to be built in places where multiple states
meet? Is that the...
Four Corners, yeah.
The optimal regulatory...
I think there's one...
There are, yeah.
Is there a point in the U.S. with five?
I know there's a point with four.
Four states intersect
there, yeah.
Maybe that's where you put a data center.
Kind of certain.
I'm going to buy real estate in that area
and front-run Reddit.
Well, I guess on the topic of maybe
new hardware: you had this
piece analyzing TCO
for GB200s.
And I'm kind of going to ask this question on behalf
of our portfolio companies, which it sounds like you're helping already. One of the
findings that I thought was really interesting was that TCO was sort of 1.6x H100s for GB200s. And so, obviously,
you know, there's this point on, okay, that's sort of the benchmark for the performance
boost that you're going to need to at least make the performance-per-cost ratio benefit
from switching over. Maybe just talk about what you've seen from a performance
standpoint. And what do you recommend to portfolio companies, maybe at a smaller scale than xAI,
who are thinking about new hardware? Should they try to get it? There's capacity constraints, obviously.
Yeah, I mean, that's a challenge, right? With each generation of GPU, it gets so much faster
that you end up wanting the new one. And in some metrics, you could say GB200 is
three times faster, or two times faster, than the prior generation. In other metrics,
you could say it's way more than that, right?
So if you're doing pre-training versus inference, right?
They can run everything for a bit, right?
Yeah, if you run it for a bit, or just inference,
and take advantage of the huge NVLink, NVL72, you know,
there are ways you can squint and say GB200 is only 2x faster than H100,
in which case, at 1.6x TCO,
it's, you know, worthwhile, right?
It's worth going to the next gen.
More marginal.
It's more marginal. It's not a big deal.
Then there are other cases where it's like, well, if you're running DeepSeek inference,
the performance difference per GPU is north of, like, 6-7x, and it continues to be optimized for DeepSeek inference.
And so then the question is, like, well, I'm only paying 60% more for 6x?
That's a 4x or 3x performance-per-dollar gain.
Like, absolutely, right?
And if you're running inference of DeepSeek, that can also include RL, right?
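The performance-per-dollar arithmetic here is simple enough to sketch. The 1.6x TCO ratio and the 2x/6x speedups are the figures from the conversation:

```python
# Performance per TCO dollar for GB200 vs H100, using the ratios discussed:
# ~1.6x the total cost of ownership, and a speedup anywhere from ~2x
# (squinting at pre-training) to ~6x (DeepSeek-style inference).

def perf_per_dollar_gain(speedup, tco_ratio=1.6):
    """How much more performance you get per dollar of TCO, gen on gen."""
    return speedup / tco_ratio

print(round(perf_per_dollar_gain(2.0), 2))  # 1.25 -> the "more marginal" case
print(round(perf_per_dollar_gain(6.0), 2))  # 3.75 -> the "4x or 3x" gain
```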
And so the question is sort of...
and then the other question is, like,
well, the GPU is new.
You know, there's also B200:
there's GB200 and there's B200.
B200 is much more simple
from a hardware perspective.
It's just eight GPUs in a box, so it's not as much
of a performance gain, especially in inference.
But you have
all the stability,
right? It's an 8-GPU box;
it's not going to be unreliable.
The GB200s are still having some
reliability challenges. Those are being worked through;
it's getting better and better by the day,
but it's still a challenge.
But, you know, when you have an H100 box, or H200, 8 GPUs, and one of them fails, you take the entire server offline and you have to fix it, right? Or usually, if your cloud's good, they'll swap it out, right? But if it's GB200, what do you now do with 72 GPUs? If one fails, do you break the whole thing and get a new 72? The blast radius of a failure, right? Now, GPU failure rates at best are the same,
and likely worse, right,
gen on gen, because everything's getting hotter,
faster, et cetera. So at best, the failure
rates are the same. Even if you model the failure rates as the
exact same, because you go from one
out of eight to one out of 72, it's a huge problem.
So now what a lot of people are doing is,
they run a high priority workload on 64 of them,
and then the other eight,
you run low priority workloads.
Which is then, like, okay, this is this whole
infrastructure challenge: I have to have high-priority
workloads and low-priority workloads.
When a high-priority workload has a failure,
instead of taking the whole rack offline, you just take
some of the GPUs from the low-priority one,
put them in the high-priority one,
and then you just let the dead
GPU sit there until you service the rack
at a later date. And there are all
these complicated infrastructure things that
make it so,
oh wait, actually that
3x or 2x performance
increase in pre-training
is lower, because the downtime is
higher. Slash, I'm not using all the GPUs
always. Slash, I'm not
smart enough, or I don't have the
infra, to have low-priority and high-priority
workloads. Like, it's not impossible. The labs are doing it, right? Like, it's just, I mean,
if I'm running a cloud, it's actually really hard, right? Because I probably have to rent the
spares out as spot instances or something.
No, no, no, because it's a coherent domain. It's NVLink. You don't want anyone touching that. So
it has to be the end customer. Or I have to leave them as empty spares. That's even
worse. The end customer usually would just be like, I want them, and I will, you know... And the
SLAs and the pricing, everything is accounting for that, right? So, like, generally when
you have a cloud, you have an SLA, right? That is, hey, uptime is going to be
99%, you know, blah, blah, blah, right, for this period. With GB200, it's 99% for 64 GPUs,
not 72, and then it's, like, 95% for 72. Now, it differs across every cloud; every cloud has a different
SLA. But they've adjusted for this, because they're like, look, this hardware is just finicky.
Do you still want it? You know, we will credit you such that 64 of them will always work, right?
Not 72. And so there's this whole finicky nature, and the end customer has to be
capable of dealing with the unreliability. And the end customer can just continue
to use B200, right? The performance gain's not as much; the whole reason you want this 72 domain is so you can
have, you know, some of these gains, right? But you have to be smart enough to be able to do it.
And that's challenging for small companies.
Totally.
So then, NVIDIA did announce the Rubin prefill cards, like the CPX... Rubin CPX, there we go.
What's your take on that?
Does it cannibalize?
Dude, by the way, I don't know if this is, like, brain rot or... I don't know.
But, like, I can't remember what I had for lunch yesterday,
but I know the model number of every fucking chip.
It haunts you in your dreams.
We're broken, we're broken.
Living the dream.
No, no, no, no, no.
You know.
Why do you pre-announce a product that's
5x faster for certain use cases?
Is it that much?
I think it's like, historically, AI chips were AI chips, right?
And then we started getting a lot of people saying,
this is a training chip, this is an inference chip.
Actually, training and inference are switching so fast
in what they require that, like, now it's still one chip.
Actually, there are still workload-level dynamics that differ,
but the main workload is inference, even in training, right?
Because of RL, most of that is, you know, generating stuff in an environment and trying to, you know, achieve a reward, right?
So it's inference still, right? Training is now becoming mostly dominated by inference as well.
But inference has, like, two main operations, right? There's calculating the KV cache for prefill, right?
Here's all these documents: do the attention between all of them, right, between all the tokens, whatever type of attention you use.
And then there's decode, which is
auto-aggressively generate each token.
These are very, very different workloads.
And so initially, the ideas
or infrastructure techniques, the ML systems
techniques were, oh, okay,
I will just make the batch size
every single forward pass
this big. And
if I make it, let's call it, I'll make it
a thousand big. And maybe
I'll run 32 users
concurrently. That way, you know, now
I still have 900-something left, 968,
left, right? That 968
is actually doing the prefill.
If a request comes in,
it chunks it. It's called chunked prefill. You
prefill chunks of it now. You get really good
utilization on GPUs.
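One reading of that slot accounting can be sketched as a toy scheduler. The constants, chunk size, and the scheduler itself are hypothetical, just to make the token-budget bookkeeping concrete:

```python
# Toy sketch of chunked prefill: each forward pass has a fixed token budget;
# every decoding user takes one slot, and the leftover budget is filled with
# chunks of pending prefill requests. All numbers are illustrative.

BATCH_TOKENS = 1000   # token budget per forward pass
DECODE_USERS = 32     # each decoding user contributes 1 token per pass

def schedule_pass(pending_prefill: list[int], chunk: int = 256):
    """One pass: return (scheduled prefill chunks, remaining prefill tokens)."""
    budget = BATCH_TOKENS - DECODE_USERS  # 1000 - 32 = 968 slots for prefill
    scheduled, remaining = [], []
    for req in pending_prefill:
        take = min(req, chunk, budget)    # cap by chunk size and leftover budget
        budget -= take
        if take:
            scheduled.append(take)
        if req - take:
            remaining.append(req - take)  # carried over to the next pass
    return scheduled, remaining

# A 64k-token request gets prefilled a chunk at a time across many passes,
# sharing each pass with the 32 decode tokens.
scheduled, rest = schedule_pass([64_000, 500])
print(scheduled)  # [256, 256]
print(rest)       # [63744, 244]
```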
But then that
ends up impacting the decode workers,
right? The users who are autoregressively generating
each token end up having slower TPS.
And tokens per second is
really important for user experience and all these
other things, right? So then the
idea is like, okay, these two workloads are so different
and they are literally different, right? You
pre-fill and then you decode. It's not like you're interleaving them. So why don't we
split them entirely? And this is done on the same type of chip, right? OpenAI, Anthropic, Google.
Pretty much everybody does that. Everyone, everyone good. Together, Fireworks.
All these guys do prefill-decode, disaggregated prefill-decode. So they run prefill on a set of
GPUs. Why is this beneficial? Because you can auto-scale them. Right? You can, hey, all of a sudden,
I have a lot more long-context workloads, I allocate more
resources to prefill. Oh, all of a sudden, not all of a sudden, but like, you know, over time,
my traffic mix is not long input, short output. It's short input, long output. I have more decode workers.
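The autoscaling idea behind disaggregated prefill/decode can be sketched as a pool-sizing rule. This is purely illustrative, not any provider's actual policy; `split_pools` and its proportional sizing rule are assumptions for the sake of the example:

```python
# Hypothetical sketch of sizing prefill vs. decode GPU pools by traffic mix.
# Prefill cost scales with input length, decode cost with output length,
# so split the fleet proportionally. Numbers are made up for illustration.

def split_pools(total_gpus: int, avg_input_tokens: int, avg_output_tokens: int):
    """Split a fleet into (prefill_gpus, decode_gpus) by token-work share."""
    prefill_share = avg_input_tokens / (avg_input_tokens + avg_output_tokens)
    prefill = round(total_gpus * prefill_share)
    return prefill, total_gpus - prefill

# Long-input, short-output traffic -> mostly prefill workers.
print(split_pools(100, 32_000, 2_000))   # (94, 6)
# Traffic shifts to short-input, long-output -> re-allocate toward decode.
print(split_pools(100, 2_000, 32_000))   # (6, 94)
```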
This way I can guarantee, and so now I can auto-scale the resources differently, and I can also
guarantee that my prefill time stays bounded. You know, what's really important
in search is how fast the page starts loading, not when every resource finishes.
What do people do in games? Like, the loading screen often has some sort of interactive
environment, or it blends in over time, or whatever; it has tips and tricks, ways to distract you.
The same thing here. There are, like, studies and papers out there that users prefer a faster time
to first token, right? The first token gets streamed to me sooner, even if the total time to get
all my tokens is a little bit longer. I can't read that fast anyways, right?
I mean, most models return above speed-reading speed.
But you need that, right? I think, but, like, you know,
the idea is that you want to guarantee time to first token is at a certain level for user experience reasons.
Otherwise, people are like, screw this, not using AI.
The decode speed matters a lot, too, but not as much as time to first token.
And so by having separate pre-filled decode, you do this, right?
But now you've already, and this is all in the same infrastructure, you've already done this.
So now it's like, what's the next logical step?
These workloads are so different.
Decode, you have to load all the parameters in and the KV caches to generate a single token.
you batch a couple users together,
but very quickly you run out of memory capacity
or memory bandwidth because everyone's KV cache
is different. The attention
of all the tokens, right? Whereas on pre-fill,
I could even just serve like
one or two users at a time, because
if they send me a 64,000 context
request, that is
a lot of flops, right?
64,000 context requests.
I'll use Llama 70B because it's simple to do math
on, like, 70 billion parameters.
That's 140
gigaflops per token.
And 140 gigaflops times 64,000 tokens, that's many, many teraflops.
You can use the entire GPU for like a second, right?
Like potentially, right?
Depending on the GPU to just do the pre-fill, right?
And that's just one forward pass.
So I don't necessarily care about, you know,
loading all the parameters and KV caches in fast.
All I care about is all the flops.
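Spelled out, that back-of-envelope math looks like this. It assumes the common ~2 FLOPs per parameter per token rule of thumb for a dense transformer and ignores attention FLOPs, so it is a lower bound; the GPU throughput figure is an assumption, not a quoted spec:

```python
# The prefill math from the example above, spelled out.

params = 70e9                 # Llama-70B-class model
flops_per_token = 2 * params  # ~140 GFLOPs per token (matmul rule of thumb)
context = 64_000              # tokens in the long-context request

total_flops = flops_per_token * context
print(f"{total_flops:.2e} FLOPs")  # 8.96e+15, i.e. ~9 petaflops

# At ~1e15 dense FLOP/s (roughly H100-class bf16), that single prefill
# occupies the whole GPU for several seconds of pure compute.
gpu_flops_per_s = 1e15
print(f"~{total_flops / gpu_flops_per_s:.1f} s on one GPU")  # ~9.0 s
```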
And so that leads us to, sort of, you know,
I had to, I think, give a long, drawn-out explanation,
because it's hard for people to understand what CPX is.
I've had a lot of, like, even my own clients,
like, we sent, like, multiple notes
explaining, and they're like, I still don't understand. I'm like,
shit, okay.
Send them the Attention Is All You Need paper.
You can't expect.
I mean, like, think about like a, like a networking person.
Like they're like, no, I don't need to know about this.
You know, attention is all you need, right?
Like it's like, we're thinking about an investor, right?
Like, you know, there's all people.
Maybe the data center operator.
Like, they're like, oh, there's two chips.
Why?
Should I build my data center differently?
It's like, like, you know, I got to explain everything or just like, no.
You don't have to build differently.
But anyways, you get to now...
At Stanford, 25% of all students, not CS students,
all students, have read that paper.
Which paper, Attention Is All You Need?
That's low.
The gym majors and, you know, like, the philosophy guys.
I'm like, this is amazing.
Anyway, sorry.
The Middle East, I can't remember what country it is,
has AI education starting at, like, age eight,
and in high school, they have to read Attention Is All You Need.
Wow.
Someone told me they had to read Attention Is All You Need,
which is, I don't know.
Like, top-down mandates for education, you know, maybe they work, maybe they don't.
Like, you know, maybe people like homeschooling for kids, I don't know.
I went to public school.
But, like, back to your question.
Yeah, just on the topic of hardware cycles, I wanted to maybe...
Yeah.
I didn't actually explain what CPX is.
So CPX is a very, like, compute-optimized chip for prefill,
whereas, you know, decode, to just simply say it, is the rest, the normal chips with
HBM. HBM is more than half the cost of the GPU. If you strip that out, you end up having a
much cheaper chip passed on to the customer. So, like, you know, if Nvidia takes the same
margin, then the cost of this prefill chip is much, much lower. And now the whole process is
way cheaper and more efficient. Now long context can be adopted. Right. Yeah. Well, so I love that
we're actually going into all this detail, because I had a more 10,000-foot-view question for you,
which is, I haven't been following the semi-market
as closely as you have.
I probably started with the A100.
And I remember helping Noam at Character,
this was, like, June, summer of '23,
chase down GPUs.
And the only thing that mattered at that time
was delivery date because there was a huge capacity crunch.
And then to see that over the last two years evolve
where, let's say, six to 12 months ago,
people were doing these RFPs to 20 neoclouds, right?
And the only thing that mattered to some degree was price.
Right, people actually do RFPs for GPUs.
Yes.
So just to be clear, my opinion on how you buy GPUs is that it's like buying cocaine or any other drug.
This was described to me, not by me.
I don't buy cocaine.
Okay, yeah, yeah.
Great.
Someone tells me this.
I'm like, holy shit, it's right.
You call up a couple people.
You text a couple people.
You ask, you know, how much you got.
What's the price?
It's like.
Exactly.
Exactly. This is fucking, like, buying drugs.
Sorry, sorry.
No, I mean, very accurate.
It's the same way. You just send, like, we have Slack connects with, like, 30 neoclouds.
There you go. As well as, like, some of the major ones.
And we just send them a message, like, hey, customer wants this much.
You know, this is what they're looking for. And then they send quotes.
I know this guy.
I know a guy. Well, so I think that's actually a very accurate description.
And I've sent countless portcos your original ClusterMax post, because I thought it did
a really good job breaking them down. But maybe one question to end on for me is just,
what era are we in now with Blackwells coming online? Are we sort of back to the summer
2023 era? And that's kind of the cycle that we've just entered? Or what sort of your view on
where we are? So, very good question. For one of your portcos, we were like, you know,
after their difficulties with Amazon, we tried to, we were like, okay, let's actually, like,
get you GPUs. The original deals we got you
were gone, but, like, here's some other deals, right?
It turned out that
multiple major neoclouds had sold out of
Hopper capacity.
And their blackwell capacity comes online
in a few months. So it's
a bit of a challenge, right?
In that... Due to inference?
Inference demand has been skyrocketing this year,
right? Reasoning models, you know.
These reasoning models are revenue.
It's been
skyrocketing this year. And then also
there's a bit of like the, you know,
Blackwell comes online but it's hard to deploy
so it takes a little, you know, there's a learning curve
to deploying it. So whereas like you got down
to like you buy the hopper, you install the data
center, it's running within like, you know, a month
or two, right? For Blackwell, it's a longer time frame
because of reliability challenges; it's a new
GPU. I mean, it's just learning pains, right?
Growing pains.
So there was like this gap of like
how many GPUs are coming onto the market right as
revenue starting to inflect. And so a lot
of capacity got
sucked up, right? And actually
prices for Hopper bottomed like three or four months ago
or like five or six months ago. Yeah. And actually they've like crept up a little bit now.
They're still, like, you know, not... So, um, I don't think we're quite in the
2023-2024 era of GPUs are tight. But certainly, if you want, like,
just a few GPUs, it's easy. But if you want a lot, it's hard. Yeah. Like, you
can't get capacity that instantly. Yeah. Wow. What a time. So we, uh, so we wrap
on that. Dylan, this was another
instant classic. Thank you so much for coming on the podcast.
It's like two hours, bro.
Oh, no. I didn't know. I couldn't stop.
Thanks so much. It was great.
Thank you so much for having me.
Thanks for listening to the A16Z
podcast. If you enjoyed the episode,
let us know by leaving a review at
ratethispodcast.com/a16z.
We've got more great conversations
coming your way. See you next time.
As a reminder,
the content here is for informational purposes
only. Should not be taken as legal
tax or investment advice or be used to evaluate any investment or security and is not directed
at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates
may also maintain investments in the companies discussed in this podcast. For more details,
including a link to our investments, please see A16Z.com forward slash disclosures.