Dwarkesh Podcast - Dylan Patel — Deep Dive on the 3 Big Bottlenecks to Scaling AI Compute
Episode Date: March 13, 2026

Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power. And walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy!

Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

* Mercury has already saved me a bunch of time this tax season. Last year, I used Mercury to request W-9s from all the contractors I worked with. Then, when it came time to issue 1099s this year, I literally just clicked a button and Mercury sent them out. Learn more at mercury.com.

* Labelbox noticed that even when voice models appear to take interruptions in stride, their performance degrades. To figure out why, they built a new evaluation pipeline called EchoChain. EchoChain diagnoses voice models' specific failure modes, letting you understand what your model needs to truly handle interruptions. Check it out at labelbox.com/dwarkesh.

* Jane Street is basically a research lab with a trading desk attached – and their infrastructure backs this up. They've got tens of thousands of GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This is what it takes to find subtle signals hidden deep within noisy market data. If this sounds interesting, you can explore open positions at janestreet.com/dwarkesh.

Timestamps

(00:00:00) – Why an H100 is worth more today than 3 years ago
(00:24:52) – Nvidia secured TSMC allocation early; Google is getting squeezed
(00:34:34) – ASML will be the #1 constraint for AI compute scaling by 2030
(00:56:06) – Can't we just use TSMC's older fabs?
(01:05:56) – When will China outscale the West in semis?
(01:16:20) – The enormous incoming memory crunch
(01:42:53) – Scaling power in the US will not be a problem
(01:55:03) – Space GPUs aren't happening this decade
(02:14:26) – Why aren't more hedge funds making the AGI trade?
(02:18:49) – Will TSMC kick Apple out from N2?
(02:24:35) – Robots and Taiwan risk

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Transcript
All right, this is the episode of my roommate teaches me semiconductors.
It's also the sendoff for this current set.
Yeah, you know, after you use it, I'm like, I can't use this again.
I got to get out here.
No sloppy seconds for Dwarkesh.
Okay, Dylan is the CEO of SemiAnalysis.
Dylan, the burning question I have for you: if you add up the big four (Amazon, Meta, Google, Microsoft), their combined forecasted capex that you published recently is $600 billion this year. And given, you know, yearly prices of renting that compute, that would be like close to 50 gigawatts.
Now, obviously, we're not putting on 50 gigawatts this year.
So presumably that's paying for compute that is going to be coming online over the coming years.
So I have a question about how to think about the timeline around when that CAPEX comes online.
Similar question for the labs where, you know, OpenAI just announced that they raised $110 billion.
Anthropic just announced they raised $30 billion.
And if you look at the compute that they have coming online this year (you should tell me how much it is), is it not another four gigawatts total that they'll have this year? It feels like the cost to rent the compute that OpenAI and Anthropic will have this year, to, like, sustain their compute spend at, you know, $10 to $13 billion a gigawatt... those individual raises alone are, like, enough to cover their compute spend for the year.
And then this is not even including the revenue that they're going to earn this year.
So help me understand. First, what is the timescale on which the big tech capex is actually coming online? And two, what are the labs raising all this money for, if, like, the yearly price of a one-gigawatt data center is, like, $13 billion?
So when you talk about the capex of these hyperscalers, right, on the order of $600 billion, and you look across the rest of the supply chain, that gets you to on the order of a trillion dollars. A portion of this is, you know, immediately for compute going online this year, right: the chips and the other parts of capex that do get paid this year. But there's a lot of setup capex as well, right?
So when we have, when we're talking about 20 gigawatts this year in America, roughly, incremental...
Incremental added capacity.
A portion of this is not spent this year. A portion of that capex was actually spent the prior year.
And so when you look at, hey, Google's got $180 billion: actually, a big chunk of that is spent on turbine deposits for '28 and '29. A chunk of that is spent on data center construction for '27. A chunk of that is spent on, you know, power purchase agreements and down payments and all these other things that they're doing for further out into the future, so that they can set up this super fast scaling, right?
And this applies to all the hyperscalers and other people in the supply chain.
And so, you know, 20 gigawatts roughly deployed this year, a big chunk of that being hyperscalers, a chunk of it not.
And all of these companies, their biggest customers are Anthropic and OpenAI. Anthropic and OpenAI are at, you know, roughly two, two and a half gigawatts, and one and a half gigawatts right now. They're trying to scale to much larger, right?
If you look at what Anthropic has done over the last few months, you know, $4 billion, $6 billion of revenue added, and if we just draw a straight line: hey, yeah, they'll add another $6 billion of revenue a month. People would argue that's bearish and that they should go faster. What that implies is that they're going to add $60 billion of revenue across the next 10 months, right? And $60 billion of revenue, at the current gross margins that Anthropic has, at least as last reported by media, would imply that they have, you know, roughly $40 billion of compute spend for the inference behind that $60 billion of revenue. That $40 billion of compute, at roughly $10 billion a gigawatt rental cost, means that they need to add four gigawatts of inference capacity just to grow revenue. And that's saying that their research and development training fleet stays flat, right?
So, you know, in a sense,
Anthropic needs to get to well above
5 gigawatts by the end of this year,
and it's going to be really tough for them to get there,
but it's possible.
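A minimal Python sketch of that arithmetic, using the round numbers quoted in the conversation (the revenue pace, margin split, and rental cost are rough figures from above, not reported financials):

```python
# Back-of-envelope: inference capacity implied by the revenue ramp
# described above. All inputs are rough figures from the conversation.

revenue_added_per_month = 6e9     # ~$6B of new revenue per month
months = 10
new_revenue = revenue_added_per_month * months          # ~$60B

compute_share_of_revenue = 2 / 3  # implied by the quoted gross margins
compute_spend = new_revenue * compute_share_of_revenue  # ~$40B

rental_cost_per_gw_year = 10e9    # ~$10B/year to rent one gigawatt

extra_gigawatts = compute_spend / rental_cost_per_gw_year
print(f"New revenue: ${new_revenue / 1e9:.0f}B")
print(f"Compute spend: ${compute_spend / 1e9:.0f}B")
print(f"Inference capacity needed: ~{extra_gigawatts:.0f} GW")  # ~4 GW
```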
Can I ask a question about that?
So if Anthropic was not on track
to have 5 gigawatts by the end of this year,
but it needs that to serve both
the revenue that's gone crazier than expected,
and maybe it's going to be even more than that,
plus the research and training to make sure its models
are good enough for next year,
how... where is that going to come from?
You know, Dario, when he was on your podcast, was very, very, like, conservative. He's like, you know, I'm not going to go crazy on compute, because if my revenue inflects at a different rate, at a different point, I don't want to go bankrupt. You know, I want to make sure that we're being responsible with this scaling. But in reality, you know, he's definitely missed the boat in terms of, like, going like OpenAI, which was: let's just sign these crazy fucking deals, right? And OpenAI has kind of got way more access to compute than Anthropic by the end of the year.
And so what does Anthropic have to do to get the compute?
Well, they have to go to lower quality providers that they would not have gone to before, right? You know, optimally... Anthropic, at least historically, has had the best quality providers, like Google and Amazon, you know, the biggest companies in the world, and now Microsoft. And now they're expanding across the supply chain and going to other players that are newer.
OpenAI has been, you know, a bit more aggressive on going to many players.
Yes, they have tons of capacity from Microsoft.
They have Google and Amazon as well, but they also have like tons with CoreWeave and Oracle.
And they've gone to, like, random companies, or, you know, one would think random companies, like SB Energy, who has never built a data center in their life, but, you know, they're building data centers now for OpenAI. So they've gone to them and many others, like Nscale and others, that they're getting capacity from. And so there's this, like, conundrum for Anthropic, because they were so conservative on compute, because they didn't want to go crazy, right? And in some sense, a lot of the financial freakouts in the second half of last year were like: OpenAI signed all these deals, but they don't have the money to pay for them.
Okay, Oracle stock's going to tank.
Oh, okay, CoreWeave stock's going to tank. Oh, okay, like, you know, all these companies' stocks tanked.
And credit markets went crazy because people were like, the end buyer can't pay for this.
Now it's like, oh, wait, they raised a ton of money.
Okay, fine, they can pay for it.
But in that sense, Anthropic was a lot more conservative.
They were like, we'll sign contracts, but we'll be principled and we'll purposely
undershoot what we think we can possibly do and be conservative because we don't want to
potentially go bankrupt.
The thing I want to understand is, so, you know: what does it mean to have to acquire compute in a pinch? Is it that you have to go with, like, neoclouds... is it that they have worse computers? Like, in what way is it worse? And is it that you have to pay gross margins to a compute provider that you wouldn't have otherwise had to pay, because you're coming in at the last minute?
Who built the spare capacity such that it's available for Anthropic and OpenAI to get at the last minute?
And, like, basically, what is the concrete advantage that OpenAI has gotten, if they end up at similar compute numbers by 2027? Is it just, like, they're going to end this year with different gigawatts? If so, how many gigawatts are Anthropic and OpenAI going to have by the end of this year?
Yeah.
So, to acquire excess compute... I mean, yes, there is capacity at hyperscalers, and not all contracts for compute are long-term, right, five years. There's compute, H100s in 2023 or 2024 or 2025, that was signed at not-five-year deals, right? OpenAI, the vast majority of their compute is signed at five-year deals. But there were many other customers that had one-year, two-year, three-year deals, six-month deals, on-demand.
And as these contracts roll off, who is the participant in the market most willing to pay the price? And in this sense, right, we've seen H100 prices inflect a lot and go up, and people willing to sign long-term deals at above $2 an hour, even, right?
Like, I've seen deals where certain AI labs (I'm going to be a little bit vague here for a reason) have signed at as high as $2.40 for two to three years for H100s. Which, if you think about the margin: it's roughly $1.40 an hour for Hopper, when you release it, to build it out across five years. And now, two years in, you're signing deals that are two to three years at $2.40. Those margins are way higher.
And so now you can crowd out all of these other suppliers, whether it's Amazon that had these, or CoreWeave had these, or Together AI or Nebius or whoever it is, right? You know, these neoclouds are the firms that had a higher percentage of Hopper in general, because, A, they were more aggressive on it, and, B, they tended to sign shorter-term deals. You know, not CoreWeave, but the others tended to sign shorter-term deals. And so, hey, if I want Hopper, there is some capacity out there. And then also, while most of the capacity at, like, an Oracle or a CoreWeave is signed for long-term deals, in terms of Blackwell, anything that's going online this quarter is already sold. And in some cases, they're not even hitting all the numbers that they promised they would sell, because there are some data center delays, not just those two, but, like, Nebius and all the other folks, Microsoft, Amazon, Google. But there are a lot of neoclouds,
as well as some of the hypers who have capacity they're building that they did not sell yet,
or capacity that they were going to allocate to some internal use that is not necessarily
super AGI focused that they may now turn around and sell. Or they may, you know, in the case of
Anthropic, they don't have to have all the compute directly, right? Amazon can have the compute and serve Bedrock, or Google can have the compute and serve Vertex, or Microsoft can have the compute and serve Foundry, and then do a revenue share with Anthropic, or vice versa.
Basically, you're saying Anthropic is having to pay either this, like, 50% markup, in the sense of the revenue share, or in the sense of last-minute spot compute, that they wouldn't have otherwise had to pay had they bought the compute early.
Right.
And, you know, there's a trade off there.
But also, at the same time, you know, for a solid, like, four months, everyone was like: OpenAI, we're not going to sign deals with you.
Like, that sounds crazy, right?
Because you guys don't have the money.
Now everyone's like: yeah, OpenAI, we believed you the whole time.
We can sign any deal because you've raised all this money.
But in a sense, Anthropic is constrained. There are not that many incremental buyers of compute yet, because Anthropic hit the capabilities here first, where their revenue is booming.
That's interesting.
Like, that's the thing, you know, because otherwise we're like: well, having the best model is an extremely depreciating asset, where, you know, three months later you don't have the best model. But, like, the reason it's important is that you can sign these deals
and then lock in the compute in advance, get better prices. Doesn't this also imply, by the way (and maybe this is an obvious point), but at least until recently, people had made this huge point about: oh, what is the depreciation cycle of a GPU? And the bears, the Michael Burrys or whatever, have said: look, people are saying four or five years for these GPUs, and in fact, maybe because the technology is improving so fast or whatever, it makes sense to have two-year depreciation cycles for these GPUs. Which increases the sort of reported amortized capex in a given year, and so maybe makes it financially less lucrative to build all these clouds.
But in fact, you're pointing at: maybe the depreciation cycle is even longer than five years. Because if we're using Hoppers, and then, especially if AI really takes off, and in 2030 we're like, fuck, we've got to get the seven nanometer fabs up, we've got to go back to the A100s, like, turn on the A100s again... then it's like, actually, the depreciation cycle is incredibly long. And I feel like that's an interesting financial implication of what you're saying.
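As a quick illustration of why the assumed depreciation schedule moves the reported numbers so much, here is a generic straight-line depreciation sketch in Python (illustrative figures only, not anyone's actual accounting):

```python
# Straight-line depreciation: the same $10B GPU fleet looks very
# different on the income statement depending on assumed useful life.
# Illustrative numbers only.

fleet_capex = 10e9

for useful_life_years in (2, 5):
    annual_depreciation = fleet_capex / useful_life_years
    print(f"{useful_life_years}-year life: "
          f"${annual_depreciation / 1e9:.1f}B/year depreciation expense")

# 2-year life: $5.0B/year -> much higher reported cost per year
# 5-year life: $2.0B/year -> the standard assumption the bears dispute
```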
There's a few strings to pull on there.
One is what happens to depreciation of GPUs, right?
And I guess I didn't answer your prior question, which is, like: Anthropic, I think, will be able to get to, like, five gigawatts-ish, maybe a little bit more, by the end of the year, through themselves as well as their product being served through Bedrock or through Vertex or through Foundry. I think they'll be able to get to five or six gigawatts, which is way above their, like, initial plans, right? You know, and anyways, OpenAI will be roughly the same, maybe a little higher.
Actually, a little bit higher based on our numbers.
But anyways, the depreciation cycle of a GPU, right?
Michael Burry was saying it's, you know, three years or less, right? That's, like, sort of his argument.
And there's sort of two ways and lenses to look at this.
Like, mechanically, you know, there's a TCO model, right,
total cost of ownership of a GPU where we sort of project pricing out for GPUs
and build up the total cost of a cluster.
But there's a number of costs, right?
There's your data center cost, right?
There's your networking costs.
There's your smart hands and people in the data center swapping stuff out.
There's your spare parts, right?
There's your actual chip cost.
There's your server costs.
All these various costs get lumped together, and there are some depreciation cycles on them. You know, there are certain credit costs on them.
And you get to, okay, that's how you build up: hey, an H100 costs $1.40 an hour to deploy at volume across five years, if your depreciation
is five years.
And then if you sign a deal at $2 an hour for those five years, your gross margin is
roughly 35%.
It's a little bit above that.
But, you know, if you sign it for $1.90, it's 35% roughly.
And then you assume that at that fifth year, the GPU falls off a bus, right?
It's dead.
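A minimal sketch of that TCO-style arithmetic, using the round numbers above. Note the quoted ~35% could be read either as margin on the rental price or as markup over the hourly cost; the sketch prints both rather than asserting one:

```python
# Rough total-cost-of-ownership (TCO) math for an H100 rental deal,
# using the round numbers from the conversation. Real TCO models break
# out data center, networking, spares, staffing, and financing costs;
# here they are all rolled into the quoted ~$1.40/hour figure.

all_in_cost_per_hour = 1.40   # ~cost to deploy an H100 at volume, 5-yr life

for rental_price in (1.90, 2.00, 2.40):
    gross_margin = (rental_price - all_in_cost_per_hour) / rental_price
    markup = (rental_price - all_in_cost_per_hour) / all_in_cost_per_hour
    print(f"${rental_price:.2f}/hr -> margin on price {gross_margin:.0%}, "
          f"markup over cost {markup:.0%}")

# Either way you slice it, a $2.40/hr deal signed two years into the
# GPU's life is far richer than the original ~$2/hr economics.
```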
And in some cases, you know, the sort of GPU depreciation argument people are making is: well, if you didn't sign a long-term deal, because every two years Nvidia is tripling, quadrupling the performance while only 2x-ing the price, or increasing the price 50%, then the price of an H100... sure, maybe the value in the market was $2 at 35% gross margins in 2024, but in '26, when Blackwell is in super high volume and deploying millions a year, you're actually now worth a dollar an hour. And when Rubin in '27 is in super high volume (even though it starts shipping this year, it isn't in super high volume until next year, doing millions of chips a year deployed into clouds), you've got another 3x in performance and another 50% or 2x in price. Actually, the Hopper's only worth 70 cents an hour.
And so the price of a GPU would continue to fall. That's, like, one lens.
The other lens is: what is the utility you get out of the chip, right? Because if you could build infinite Rubin, or infinite of the newest chip, then yes, that's exactly what would happen: the price of a Hopper would fall, at a spot or short-term contract rate, as the new chips come out and the price per performance goes up.
But because you are so limited on semiconductors
and deployment timelines and all these things,
you end up with actually what prices these chips
is not, hey, what's the comparative thing
I can buy today?
It's actually what is the value
I can derive out of this chip today, right?
And in that sense, let's take GPT-5.4. GPT-5.4 is both way cheaper to run than GPT-4 and has fewer active parameters. It's much smaller, right, in that sense of active parameters, because it's a sparser MoE, versus GPT-4 being a coarser MoE. Plus, there have also been so many other advancements in training, RL, model architecture, et cetera, data quality, all these things, that have made GPT-5.4 way better than GPT-4, and it's cheaper to serve. And so when you look at an H100, it can serve more tokens per GPU of GPT-5.4 than if you had run GPT-4 on it. So in some sense, it's producing more tokens of a model that is of higher quality.
Interesting.
And so in some sense, you know, obviously GPT-4: what was the maximum TAM for its tokens? You know, maybe it was a few billion dollars, maybe it was tens of billions of dollars; adoption takes time. For GPT-5.4, that number is probably north of $100 billion, but there's an adoption lag and there's competition. Other people are getting it, and there are the constant improvements that everyone else is making. So if improvements stopped, you know, here, the value of an H100 is now predicated on the value that GPT-5.4 can get out of it, instead of the value that GPT-4 can get out of it.
And the margins, and all that stuff that these labs are doing... they're in a competitive environment, so their margins can't go to infinity. So you sort of have this, like, dynamic that is quite interesting.
An H100 is worth more today than it was three years ago.
That's crazy.
And, I mean, it's also interesting from the perspective of like, just take that forward.
If we had actual AGI models developed, if we had, like, genuinely a human on a server... On a FLOP basis (these are such hand-wavy numbers about how many FLOPS the brain can do), an H100 is something like 1e15 FLOPS, which is how much some people estimate the human brain does. Obviously, in terms of memory, the human brain has way more: an H100 is, like, 80 gigabytes, and the brain might have petabytes.
Oh, yeah, you've got petabytes?
Name a petabyte of ones and zeros, bro.
Name me a string.
Well, this is actually the point.
Or like actually in...
No, we've just got the best sparse attention techniques ever.
Genuinely, right?
Like, in the sort of like amount of information that is compressed, it might be petabytes.
But like the actual...
You know, it's, like, an extremely sparse MoE.
But anyways, imagine a human knowledge worker can produce six figures a year of value. And so if an H100 can produce something close to that, if we had actual humans on a server, the value of an H100 is... like, it can repay itself in the course of, like, a couple of months.
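A hedged back-of-envelope on that payback claim. The hardware price and worker-value figures here are illustrative assumptions, and this ignores power, data center, and serving costs:

```python
# If an H100 could do the work of a ~$100k/year knowledge worker,
# how fast would it pay for itself? Illustrative assumptions only.

h100_price = 30_000        # ballpark all-in price of one H100, USD (assumed)
value_per_year = 100_000   # "six figures a year of value"

payback_months = h100_price / (value_per_year / 12)
print(f"Payback: ~{payback_months:.1f} months")   # ~3.6 months
```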
So as I've been going through everything to prep for taxes,
I realized that I worked with over 50 different contractors last year,
from cinematographers to audio technicians to editors.
And I owed all of them 1099s.
In the past, I've just used a spreadsheet and a big folder of invoices
to figure out who I need to collect tax forms from.
But with so many contractors, this takes a bunch of time,
and I've almost missed some people.
This year, though, Mercury made my process way more straightforward.
Whenever I pay somebody in 2025, I just hit a toggle to have Mercury request a W-9 from them.
Because of that, everything that I needed to issue 1099s got sent directly to Mercury.
I literally just clicked a button and Mercury generated and sent them all out.
This is just one of the many things that I never would have assumed that a banking platform could just handle for me.
Mercury has a bunch of features like this, which are going to collectively save me multiple days this tax season.
You can learn more at mercury.com.
Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.
So when I interviewed Dario, the point I was trying to make is not that I think the singularity is two years away, and therefore Dario desperately needs to buy more compute. Although the revenue is certainly there that he needs to buy more compute. But the point I was trying to make is that, given what Dario seems to be saying, given his statements that we're two years away from a data center of geniuses, certainly not more than five years away, and a data center of geniuses should be generating trillions upon trillions of dollars of revenue, it just does not make sense why he keeps making these statements about being more conservative on compute, or, to your point, being less aggressive than OpenAI on compute. And I guess that point got lost, because people were, like, roasting me about: oh, this podcast was me trying to convince this, like, multi-hundred-billion-dollar company's CEO... like, why don't you just YOLO it, bro? But no, I was trying to say that internally his statements are inconsistent.
Anyway, so it's good to iron it out.
Yeah, I think, you know, going back to, like, sort of the earlier view that if the models are so powerful, the value of a GPU goes up over time: right now only OpenAI and Anthropic have that viewpoint. As we go further and further out, actually everyone is going to, even with open source models, be able to, like, sort of start to see that value skyrocket per GPU.
And so, in that sense, you should commit now to compute. But interestingly, in, like, Anthropic fashion, right, you know, there's a bit of a meme that they have commitment issues, that they're, like, sort of polyamorous. Not Dario, but this is a bit of a meme.
This explains everything.
By the way, there's this interesting economics effect called Alchian-Allen, which is the idea that
if you increase the fixed cost of two goods, one of which is higher quality and one of which is lower quality, that will make people choose the higher-quality good on the margin.
So to give a specific example: suppose the, you know, better-tasting apple costs $2, and then, like, the shittier apple costs $1. Okay, now suppose you put an import tariff on them. And so now it's $3 and $2 for the great apple and the medium apple, right?
Is that because they both increased by a dollar? Or should it be, like, a 50% increase?
No, no, because they both increased by a dollar. The whole effect is that if there's a fixed cost that's applied to both, the relative price, the price difference between them, the ratio, changes. So previously the more expensive one was 2x more expensive, and now it's just 1.5x more expensive.
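A tiny sketch of that apples arithmetic, just to make the ratio shift explicit (numbers from the example above):

```python
# Alchian-Allen in one loop: a fixed additive cost (a tariff, shipping,
# or, by analogy, compute) shrinks the *relative* premium of the
# higher-quality good, even though both absolute prices rise.

good_apple, cheap_apple = 2.0, 1.0

for fixed_cost in (0.0, 1.0):
    ratio = (good_apple + fixed_cost) / (cheap_apple + fixed_cost)
    print(f"fixed cost ${fixed_cost:.0f}: "
          f"${good_apple + fixed_cost:.0f} vs ${cheap_apple + fixed_cost:.0f} "
          f"-> premium {ratio:.1f}x")

# fixed cost $0: $2 vs $1 -> premium 2.0x
# fixed cost $1: $3 vs $2 -> premium 1.5x
```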
So I wonder if, applied to AI, that would mean that, look, if GPUs are going to get more expensive, there will be a fixed cost increase in the price of compute.
Yes.
And as a result, that will push people to be willing to pay higher margins for slightly better models. Because the calculus is: I'm going to be paying all this money for the compute anyways, so I might as well just pay slightly more to make sure it's, like, the very best model, rather than a model that's slightly worse.
Right. So the Hopper went from $2 to $3. And if a Hopper can make a million tokens of Opus, and it can make two million tokens of Sonnet, the price differential between Opus and Sonnet has decreased, because the price of the GPU has increased by a dollar, from two to three.
Interesting.
I think that makes a ton of sense.
Also, I think we just see that all of the volume is on the best models today; all the revenue is on the best models today.
And in a compute-limited world, there are sort of two things that happen, right? A: companies that have locked up, you know, that don't have commitment issues, that have these five-year contracts for compute, they've kind of locked in a humongous margin advantage, because they've locked in compute for five years at the price it transacted at five years ago, or three years ago, or two years ago, whatever it is. Whereas if you're now three years into that five-year contract, and someone else's two-year contract or three-year contract rolled off, and now you're trying to buy at, you know, modern pricing, when you're priced to the value of models, the price is going to be up a lot more. And so, in a sense, the person who committed early has better margins in general.
And the percentage of the market that is in long-term contracts is much larger than the
percentage of the market in short-term contracts that can be this sort of flex capacity
that you add at the last second.
And at the same time, right, so where does the margin go, right?
Because models get more valuable.
How much can the cloud players flex their pricing?
Well, in fact, like, if you look at CoreWeave, their average term duration is, like, over three years
right now for like 90% plus of their compute.
It's over three years.
And so they end up with this like conundrum of like, well, they can't actually flex price.
But every year they're adding incrementally way more capacity than they had previously, right?
This year alone, right, Meta's adding as much capacity as they had in their entire fleet of compute and data centers in 2022, for all purposes: for serving WhatsApp and Instagram and Facebook, and doing AI, right? They're adding that alone this year.
So in the same sense, you know, you talk about Meta doing that. CoreWeave and Google and Amazon, all these companies are adding insane amounts of compute year on year on year, and that new compute gets transacted at the new price.
So in a sense, yes, you've locked in, as long as we're in a sort of takeoff, right? Oh, OpenAI went from 600 megawatts to two gigawatts last year, and from two gigawatts to, you know, six-plus this year, and, you know, six to 12 next year, right?
The incremental added compute is where all the cost is, not the prior long-term contracts.
So then who holds the cards for charging margin? The infra providers, right? So now the cloud players, the neoclouds or the hypers, can charge the margin... oh, they can't, because... or they can't to some extent. But then, as you go upstream: oh, well, who has access to all the memory and logic capacity? Well, it's Nvidia, for the most part. They've signed a lot of long-term contracts. You know, they've got, like, $90 billion of long-term contracts today, and they're negotiating three-year deals with the memory vendors today. You know, you've got, obviously, Amazon and Google through Broadcom, and, you know, Amazon directly, and all these companies, sort of, AMD. These companies hold all the cards, because they've secured the capacity. And TSMC is not raising prices, but memory vendors are, just, like, sort of, to some extent, raising price a lot, right? So they're going to double or triple price again. But then they're also signing these long-term deals.
So who is able to accrue the margin dollars is actually, you know, potentially the clouds, potentially the chip vendors and the memory vendors, until TSMC or ASML, like, break out and they're like: no, actually, we're going to charge a lot more. But at the same time, do the model vendors get to charge crazy margins?
I think at least this year we're going to see margins for the model vendors go up a lot,
right?
Because they're so capacity constrained, they have to destroy demand, right? There's no way Anthropic can continue at the current pace without destroying demand.
Yeah.
Let's get into logic and memory, and how specifically Nvidia has been able to lock up so much of both. So, I think according to your numbers, by '27 Nvidia is going to have, like, 70-plus percent of N3 wafer capacity, or something like that, around that area. And then I forget what the numbers were for memory at SK Hynix and Samsung and so forth.
But if you look at... so think about how the neocloud business works and how Nvidia works with that, or how the RL environment business works and how Anthropic works with that. In both those cases, Nvidia is purposely trying to fracture the complementary industry, to make sure that they have as much leverage as possible. So they're giving, you know, allocation to random neoclouds to make sure that there's not one person that has all the compute. Similarly, Anthropic or OpenAI, when they're working with the data providers, they say: no, we're going to just seed a huge industry of these things, so that we're not locked
into any one supplier for data environments.
And I wonder why, on the three-nanometer process (that's going to be Trainium 3, that's going to be TPU v7, other accelerators potentially), why is TSMC just giving it all up to Nvidia, rather than trying to fracture the market?
Yeah, so I think there's a couple like points here, right?
On three nanometer, you know, if we go back to last year,
the vast majority of three nanometer was Apple.
Apple's being moved to two nanometer,
memory prices are going up so Apple's volumes may go down, right?
Because as memory prices go up, either they cut margin or they move on.
You know, there's some time lag because they have long-term contracts.
But basically, Apple likely reduces demand slash moves to two nanometer faster, where two nanometers
is only capable of sort of mobile chips today.
And in the future, AI chips will move there.
So sort of Apple has that.
And then Apple is also talking to third-party vendors, because they're getting squeezed out of TSMC a little bit, because TSMC's margins on high-performance computing (HPC, AI chips, etc.) are higher than they are for mobile, because they have a bigger advantage in HPC than they do in mobile.
But anyways, when you look at it, what's the calculus TSMC is running here? Actually, they're providing really good allocations to companies that are doing CPUs, right? So when you think about, hey, Amazon has Trainium and Amazon has Graviton: both of those are on 3 nanometer, Graviton being their CPU, Trainium being their AI chip. TSMC is actually much more excited to give allocation to Graviton than they are to Trainium, because they view the CPU business as more stable, long-term growth, right?
And as a company that is conservative and doesn't want to ride cycles of growth too hard,
you actually want to allocate to the market that is more stable and lower growth rate first
before you allocate all the incremental capacity to the fast growth rate market.
Now, that is the case generally.
And so when you look at, like, hey, same for AMD, right? The allocations they get on, you know, their CPUs: TSMC is much more excited about those than they are about their GPUs.
Likewise for Amazon.
And Nvidia is a bit unique, because, yes, they have CPUs; yes, they make switches; yes, they make networking. They make NVLink. They make all these different InfiniBand and Ethernet products, all these different products, NICs.
By and large, most of these things will be on 3 nanometer by the end of this year with the Rubin launch and all the chips that are in that family.
The GPU being the most important one.
And yet, Nvidia is getting the majority of supply, right?
Part of this is because you look at the market, and, like, sort of, you know, TSMC and others, there are many ways that they forecast market demand.
But also it's market signal, right?
The market signaled, hey, we need this much capacity next year.
We need this much.
We need this much.
We'll sign non-cancelable, non-returnable.
We may even pay deposits, right?
Things like this.
Nvidia just did it way earlier than Google or Amazon. And in some cases, Google and Amazon had stumbling blocks.
You know, there was one... one of the chips got delayed slightly by a couple of quarters, Trainium, and all these sorts of things happened.
And so in that case, there was a huge sort of, like: okay, well, these guys are delaying, but Nvidia is wanting more, more, more, more, and we are checking with the rest of the supply chain.
Is there enough capacity?
Right.
So they're going to all the PCB vendors, and they're saying: hey, is there enough Victory Giant? Is there enough PCB? Victory Giant is, like, one of the largest suppliers of PCBs to Nvidia, and they're a Chinese company. All the PCBs come from China, sort of, from them, or many of them.
And anyways, they're like, do you have enough PCB capacity?
Great.
Oh, hey, memory vendors, who has all the memory capacity?
Oh, okay, Nvidia does.
Great.
So when you look at it, in the same way, you know: who is AGI-pilled enough to buy compute on long timelines, at levels that seem ridiculous to people who aren't AGI-pilled, but nonetheless is willing to pay a pretty good margin and sign it now, because they view that in the future that ratio is screwed up?
The same thing happens with the supply chain for semiconductors, right?
Nvidia was... while I don't think Nvidia is quite AGI-pilled, right? You know, Jensen doesn't believe software is going to be fully automated, and all these things, right?
Accelerated computing, not AI chips, right? That's what he calls it, right?
Yeah, because, I mean, I think there's a broader term, right?
AI is within that, but, like, physics modeling and simulations and like...
Or maybe just like he's not embracing the sort of, like, main use case.
I think he's embracing it.
But, like, I just don't think he's, like, AGI-pilled like Dario, right?
Or Sam.
But he's still way, way more AGI-pilled than Google was at Q3 of last year, or Amazon was at Q3 of last year, and he saw way more demand, right?
And the reason is pretty simple.
You know, you can see all the data center construction.
He's like, okay, I want to have this market share.
You know, we sort of, like, have all the data centers tracked, and, you know, you can see there are a lot of data centers where you could say: well, they could be one or the other, right? And so to some extent, Google and Amazon, you know, Google especially, even though their TPU is just better for them to deploy, they have to deploy a crapload of GPUs, because they don't have enough TPUs to fill up their data centers. They can't get them fast enough.
Wait, can I?
So I have a question about that.
Google sold, I think, a million (was it the v7s, the Ironwoods?) to Anthropic. And you're saying, in general, there's this big bottleneck right now, this year or next year, I mean, I guess going forward forever now, in, you know, logic, memory, the stuff that it takes to build these chips. And Google has DeepMind, the other, third prominent AI lab. And if this is the big bottleneck, why would they sell it rather than just giving it to DeepMind?
Right. So this is, again, like, a problem of, like... you know, DeepMind people were like: this is insane, why did we do this? Right? But then Google Cloud people and Google executives had a different, like, thought process, right? And basically, you know, you and I know the compute team: the main people on the compute team at Anthropic both actually came from Google. They saw this dislocation, they negotiated a deal, and they were able to get access to this compute before Google realized. And so actually, the chain of events, at least from our data, what we found was: in early Q3, over the course of, like, six weeks, we saw capacity on TPUs go up by a significant amount. And it went up, like, multiple times in those six weeks, right? There were multiple requests. Google even had to go to TSMC and explain to them why they needed this increase in capacity, because it was so sudden. But a lot of that capacity increase was for selling to Anthropic.
Yeah.
Because Anthropic saw it before Google.
And then Google had Nano Banana and Gemini 3, which caused their user metrics to skyrocket. And leadership at Google was like: oh. And then they started making the statement of, we have to double compute every... is it six months? I don't remember the exact number that they said.
But they really woke up a lot more.
And then they're like, oh, hey, TSMC, we want more.
We want more.
And it's like: well, sorry, guys, we're sold out for next year. We can work on next year; we can maybe get, like, 5, 10% more for '26, but really we're going to work on '27, right?
There's sort of, like, you know, this, like, information asymmetry between the labs, in my mind, right? I don't know if this is exactly the narrative; I've spun it myself from seeing all the data in the supply chain, on, like, wafer orders and, like, what's going on with the data centers that, you know, Anthropic signed and Fluidstack signed and all this. Like, sort of, it's pretty clear to me that Google screwed up.
And you can see this from Google's Gemini ARRs, right?
They had next to nothing in Q1, Q2, Q3 a little bit, right, once they started inflecting. But Q4, they were at, like, $5 billion ARR exiting, or something like this: $5 billion revenue for Q4 on an ARR basis. And so it's clearly, like... Google didn't see revenue skyrocket.
And in a sense, right, Anthropic was not willing, you know, kind of had, like, a little bit of commitment issues before their ARR exploded, even though they have far more information asymmetry and see what's coming down the pipe. Google is going to be more conservative than Anthropic is, A; and, B, Google had even less ARR. So they sort of were, like, I think, just not willing to, like, sort of do it. And then they realized they should do it.
And so now, since then, Google has gotten absurdly AGI-pilled, right, in terms of, like, what they're doing. They bought an energy company. They're putting deposits down for turbines. They're buying a ridiculous percentage of the powered land. They're going to utilities and negotiating long-term agreements. They're doing this on the data center and power side very, very aggressively, right? So, you know, I think Google woke up towards the end of last year, but it took them some time.
And how many gigawatts do you think Google will have by the end of next year?
Buy my data.
You charge for that kind of information. Yes, yes.
I feel like every year, the bottleneck for what is preventing us from scaling AI compute keeps changing. A couple years ago, it was CoWoS. Last year, it was power. You'll tell me where the bottleneck is this year, but I want to understand: five years out, what will be the thing that is constraining us from deploying the singularity?
Yeah, I think the biggest bottleneck is compute, and for that, the longest lead time supply chains are not power or data centers. They're actually the semiconductor supply chain itself, right? It switches back from power and data centers being the major bottleneck to chips. And in the chip supply chain, there are a number of different bottlenecks, right? There's memory, there's logic wafers from TSMC, there are the fabs themselves. Construction of the fabs takes a couple of years, two to three years, versus a data center, which takes less than a year, right? We've seen Amazon build data centers in as fast as eight months, right? So there's a big difference in lead times, because of the complexity of the building, the fab, that actually makes the chips. And then the tools, right,
those also have really long lead times. And so the bottlenecks as we've scaled have shifted from: hey, what is the supply chain currently not able to do? Which was CoWoS and power and data centers, but those were all shorter lead time items, right? CoWoS is a much simpler process, of packaging chips together. Power and data centers are ultimately way simpler than the actual manufacturing of the chips. And so there's been some sliding of capacity across, you know, mobile or PC to data center chips, and that's been somewhat fungible; whereas CoWoS and power and data centers sort of had to start anew as supply chains. But now there's sort of no more capacity from the mobile and PC industries, which used to be the majority of the semiconductor industry, to shift over to AI, right?
Nvidia is now the largest customer at TSMC, and Nvidia is the largest customer at SK Hynix, the largest memory manufacturer, right? So it's sort of impossible for this scaling, this sliding of resources away from the common person (right, PCs and smartphones), to shift any more towards the AI chips. And so now, how do we scale AI chip production? That's the biggest bottleneck as we go to 2030.
It would be very interesting if there's an absolute gigawatt ceiling that you can project out to 2030, based just on: hey, we can't produce more than this many EUV machines.
Right. So to scale compute further, right, there are some different bottlenecks this year, next year. But ultimately, by '28, '29, the bottleneck falls to the lowest rung on the supply chain, which is ASML, right? ASML makes the world's most complicated machine, i.e. an EUV tool, and the selling price for those is $300 to $400 million. And currently, they can make about 70 a year. Next year, they'll get to 80. Even under very aggressive supply chain expansion, they only get to a little bit over 100 by the end of the decade. And so what does that mean? Okay, they can make
100 of these tools by the end of the decade and, you know, 70 right now. How does that actually
translate to AI compute, right? We see all these numbers from Sam Altman and many others across the supply chain: gigawatts, gigawatts, gigawatts, right? How many gigawatts are we adding? And we see, you know, Elon saying: hey, 100 gigawatts in space.
A year.
A year, right. The problem with any of these numbers, or the challenge to these numbers, is, you know, actually not the power, and not the data center; we can dive into that. It's manufacturing the chips, right? So, a gigawatt of, you know, Nvidia's Rubin chips... Rubin is
announced at GTC, I believe the week this podcast goes live. And to make a gigawatt worth of
data center capacity of Nvidia's latest chip, the one they're releasing towards the end of this year, you need, you know, a few different wafer technologies, right? You need
about 55,000 wafers of 3 nanometer. You need about 6,000 wafers of 5 nanometer. And then you need
about 170,000 wafers of DRAM, right, memory.
And so across these three different buckets, each of these requires different amounts of
EUV.
Right.
So when you manufacture a wafer, there's thousands and thousands of process steps where
you're depositing material, removing them.
But the sort of key critical step, which at least in advanced logic is like 30% of the
cost of the chip, is something that doesn't actually put anything on the wafer.
You take the wafer, you deposit photoresist, which is like a chemical that, basically
chemically changes when you expose it to light, and then you stick it into the EUV tool, which shines light at it in a certain way.
It patterns it, right?
Because there's what's called a mask, which is a stencil effectively for the design.
And so when you look at a wafer: you know, a leading-edge 3-nanometer wafer has 70 or so masks, right? 70 or so layers of lithography, but 20 of them are the most advanced EUV, right? And specifically, you know, if you think about it: okay, well, if I need 55,000 wafers for a gigawatt, and I do 20 EUV passes per wafer, you can then do the math: okay, that's 1.1 million passes of EUV for a single gigawatt. So actually, like, it's pretty simple. And then once you add the rest of the stuff, it ends up being two million, right, across five nanometer and all the memory. You're at roughly two million EUV passes for a single gigawatt.
You know, these tools are very complicated.
So when you think about what it's doing across a wafer: it's taking the wafer, and it's scanning, and it's stepping across, right? It's scanning, stepping across, and it does this dozens of times across the whole wafer. And so when you're talking about, hey, how many EUV passes, that's the entire wafer being exposed at a certain rate. An EUV tool can do roughly 75 wafers per hour. And the tool is up roughly 90% of the time, right? So in the end, you end up with: actually, I need about three and a half EUV tools to do the 2 million EUV wafer passes for the gigawatt.
So three and a half EUV tools satisfies a gigawatt.
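A minimal Python sketch of that EUV math. The 3-nanometer wafer count, layer count, throughput, and uptime are the figures quoted above; the EUV layer counts for 5 nanometer and DRAM are assumptions picked so the total lands near the ~2 million passes mentioned:

```python
# EUV tool-years needed per gigawatt of Rubin-class AI capacity,
# using the wafer counts quoted above. EUV layers for N5 and DRAM
# are assumed values chosen to match the ~2M total passes cited.

wafers = {"N3 logic": 55_000, "N5 logic": 6_000, "DRAM": 170_000}
euv_layers = {"N3 logic": 20, "N5 logic": 15, "DRAM": 5}  # N5/DRAM assumed

total_passes = sum(wafers[k] * euv_layers[k] for k in wafers)  # ~2.0M

wafers_per_hour = 75          # quoted EUV throughput
uptime = 0.90                 # quoted availability
passes_per_tool_year = wafers_per_hour * 24 * 365 * uptime    # ~591k

tools_needed = total_passes / passes_per_tool_year
print(f"Total EUV passes per GW: {total_passes / 1e6:.1f}M")
print(f"EUV tools per GW: ~{tools_needed:.1f}")  # ~3.4, i.e. "three and a half"

# Cost contrast from the conversation: ~3.5 tools at ~$350M each is ~$1.2B
# of lithography underpinning a ~$50B gigawatt-scale data center.
```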
So it's funny to think about the numbers, right? Because we're talking about: oh, what does a gigawatt cost? It costs, like, $50 billion, roughly, right? Whereas what do three and a half EUV tools cost? That's, like, $1.2 billion, right? It's actually, like, quite a lower number. Which is interesting to think about: oh, $50 billion of economic, you know, sort of, capex in the data center, and what gets built on top of that in terms of tokens is even larger, right? It might be $100 billion worth of AI value. And that whole supply chain is held up by this $1.2 billion worth of tooling that simply just cannot expand its supply chain quickly.
In fact, even the intermediate layers are sort of shocking here. So Carl Zeiss, which is, like, the optics supplier that is bottlenecking ASML itself...
I checked its market cap this morning.
You know what it is?
$2.5 billion.
Dude, let's LBO it.
Let's LBO it.
And I think... so you had this article recently where you were saying, over the last three years, TSMC has done $100 billion of capex. So it's like 30, 30, 40. And if you think about it, I mean, a small fraction of that is sort of, like, being used by Nvidia, for the three nanometer that it's moving to, or, you know, previously the four nanometer that it's using for its chips. But Nvidia has turned that into... what were its earnings last quarter, like $40 billion? And so $40 billion times four: $160 billion. So Nvidia alone is turning some small fraction of $100 billion in capex, which is going to be depreciated over many years, not just this one year, into $160 billion in a single year. And then that gets even more intense when you go down the supply chain to ASML, which is taking a billion dollars' worth of machines to produce a gigawatt.
And, of course, those machines last for more than a year, right?
So it's doing more than that.
Okay, so now I want to understand: okay, well, how many such machines will there be by 2030, if you include not just the ones that are sold that year, but the ones that have been accumulating over the previous years?
And what does that imply about the...
Sam Altman says he wants to do a gigawatt a week in 2030.
When you add up those numbers, is that compatible with that?
Right.
That's completely compatible, right?
Because if you think about it, TSMC and the entire ecosystem have something like 250 to 300 EUV tools already. And then you stack on 70 this year, 80 next year, growing to 100 by 2030: you're at, like, 700 EUV tools by the end of the decade. 700 EUV tools, at 3.5 tools per gigawatt (assuming it's all allocated to AI, which it's not), gets you to 200 gigawatts' worth of AI chips for the data centers to deploy, right? So, 200 gigawatts; Sam wants 50 gigawatts, right, 52 gigawatts a year. He's only taking 25% share then, right? Obviously, there's some share given to, you know, mobile and PC, assuming that, you know, for some reason, we're allowed to even have consumer goods still and we don't get priced out of them. But, you know, roughly, he's saying 25% market share of the total chips fabbed. That's kind of, like, very reasonable, given that, you know, this year alone I think he's going to have access to 25% of the Blackwell GPUs that are deployed, right? So it's not that crazy.
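And a hedged sketch of that fleet-level arithmetic. The existing base, the near-term additions, the 2030 rate, and the 3.5-tools-per-gigawatt ratio are the quoted figures; the middle-year additions are interpolated assumptions:

```python
# Rough EUV fleet by 2030 and what it implies in gigawatts of AI chips.
# Yearly additions between "80 next year" and "100 by 2030" are
# interpolated assumptions; the base, endpoints, and tools/GW ratio
# are the quoted figures.

existing_fleet = 275                   # "250 to 300" tools already out
additions = [70, 80, 90, 95, 100]      # 2026..2030; middle years assumed
fleet_2030 = existing_fleet + sum(additions)    # ~700 tools

tools_per_gw = 3.5
max_gw_per_year = fleet_2030 / tools_per_gw     # ~200 GW/year of AI chips

sam_gw_per_year = 52                   # "a gigawatt a week"
print(f"Fleet by 2030: ~{fleet_2030} tools")
print(f"Ceiling: ~{max_gw_per_year:.0f} GW/year if all went to AI")
print(f"Sam's ask: {sam_gw_per_year} GW/year "
      f"= {sam_gw_per_year / max_gw_per_year:.0%} of the ceiling")
```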
I find it surprising that... you know, when did ASML first start shipping EUV tools? When 7 nanometer started? I don't know when that was exactly. But you're saying that in 2030, they're going to be using machines that were initially shipped in 2020. So for 10 years, you're using the same most important machine in the most technologically advanced industry in the world. I find that surprising.
So ASML's been shipping EUV tools now for roughly a decade, but it only entered mass volume
production around 2020. You know, the tool's not the same. You know, back then the tools were
even lower throughput. There are various specifications around them; one's called overlay, right? You know, as I mentioned, you're stacking layers on top of each other, right? You'll do some EUV.
You'll do a bunch of different process steps, depositing stuff, etching stuff, cleaning the wafer,
you know, dozens of those steps before you do another EUV layer. There's a spec called overlay,
right, which is, okay, you did all this work, you drew these lines on the wafer. Now I want
to draw these dots, right? Let's just say I want to draw these dots to connect these lines of metal
and then, you know, holes, and then the next layer up is another set of lines that goes perpendicular,
so now you're connecting wires going perpendicular to each other. You have to be able to land them
on top of each other. So it's called overlay. And overlay is a spec that's been improved rapidly by
ASML. Wafer throughput has been improved rapidly by ASML. And also, the price of the tool has gone up,
but not as much as the capabilities of the tool, right? Initially, the EUV tools were like 150 million,
and over time, they're now like 400 million, you know, as I look out to 2028. But the capabilities
of the tools have more than doubled as well, right? Especially on throughput and overlay accuracy,
which is the ability to, you know, accurately align the subsequent passes on top of each other, even though you do tons of steps in between.
And so this is, you know, ASML is improving super rapidly.
I think it's also something noteworthy to say.
ASML is, you know, maybe one of the most generous companies in the world, right?
They have this linchpin thing.
No one has anything competitive.
Maybe China will have some EUV by the end of the decade.
But no one else, you know, has anything even close to EUV.
And yet they haven't taken price and margins up like crazy, right?
You know, you go ask some other folks that we talk to all the time, like, you know, for example, Leopold, and they're like: you know, let's have the price go up, right?
Because they can.
The margin is there.
You can take the margin.
Like, Nvidia takes the margin.
Memory players are taking the margin.
But ASML has never raised the price more than they've increased the capability of the tool.
And so in a sense, they've always provided net benefit to their customer.
It's not that the tool is stagnant.
It's just that, like, you know, these tools are old.
Yes, you can upgrade them some and the new tools are coming.
And for simplicity's sake, for this podcast, we're kind of ignoring the advances in overlay or throughput per tool.
So you say we're producing 60 of these machines this year and then 70, 80 over subsequent years.
What would happen if ASML just decided to double its capex or triple its capex? What is preventing them from producing more than 100 in 2030? Why are you so confident that, even five years out, you can be relatively sure what their production will be?
So I think a couple factors here, right?
ASML has not decided to just go YOLO, let's expand capacity as fast as possible, right?
In general, the semiconductor supply chain has not, right?
It's lived through the booms and busts, and we can talk a bit more about it.
But basically, no one, you know, some players as of very recently have, like, woken up.
But in general, no one really sees demand for 200 gigawatts a year of AI chips or, you know,
trillions of dollars of spend a year in the semiconductor supply chain. They're just, like... they're not AI-pilled, right? They're not AGI-pilled.
We're going to get to a trillion dollars this year.
Yeah, I feel you, but I'm saying, like, no one really understands this in the supply chain.
Constantly, we're told our numbers are way too high. And then, when they're right, they're like: oh, yeah, yeah, but your next year's numbers are still too high. But anyways: ASML's tool has four major components, right? It has the source, right, which is made by Cymer in San Diego. It has the reticle stage, which is made in Wilton, Connecticut, right? And it has the wafer stage and the optics, right, the lenses and such; those two are made in Europe, right? And so when you look at each of these four, they're tremendously complex supply chains that, A, they have not tried to expand massively, and, B, when they try to expand them, the time lag is quite long, right? And so, again, this is the most complicated machine that humans make, period, at any sort of volume. But, like, let's talk about the source specifically, right? What does the source do? It drops these tin droplets and hits each one with subsequent laser pulses, perfectly. The first pulse hits the tin droplet, and it expands out. It hits it again, so it expands out into this perfect shape, and then it's blasted at super high power, and the tin droplet gets excited enough that it releases EUV light: 13.5 nanometer. And then it's in this thing
that is like basically collecting all the light
and directing it into the lens stack, right?
Then you have the lens stack, which is Carl Zeiss, right,
as you mentioned, and some other folks,
but Zeiss being the most important part of it.
They also have not tried to expand production capacity, because they don't see it, you know. They're like: oh, yeah, yeah, we're growing a lot because of AI; we're growing from 60 to 100, right? And it's like: no, no, no, we need to go to, like, a couple hundred. But it's fine, whatever. Each of these tools has, you know, I think, 18 of these lenses, effectively:
mirrors. They are multi-layer mirrors, perfect layers of molybdenum and ruthenium, if I recall correctly, stacked on top of each other in many layers, and the light bounces off of them perfectly. But it's not just, like... you know, when we think about a lens, it's in a shape and it focuses the light. This is, like, a mirror that's also a lens. And so it's pretty complicated. Any defect in these perfect, super-thinly deposited layer stacks will mess it up; any curvature issues, too. Like, there are a lot of challenges with scaling the production.
It's quite artisanal, right, in this sense, right? Because you're not making tens of thousands
of these a year. You're making hundreds. You're making thousands, right? You know, talk about
60 tools a year, 18 of these per tool. You end up with, you know, you're still in the, you know,
hundreds of tools or a thousand, you're at the thousand number roughly for these lenses and
projection optics per year. So then you step forward to the reticle stage, which is also something really crazy. This thing moves at, I want to say, nine g's; it will shift at nine g's as it steps across a wafer. And the wafer stage is complementary; it's the wafer part. So you line these two things up: you're taking all the light through the lenses, focused; here's the reticle, here's the wafer. The reticle is moving in one direction and the wafer is moving in the other direction as it scans a 26 by 33 millimeter section of the wafer. Then it stops, shifts over to another part of the wafer, and does it again, in just seconds, with each of them moving at nine g's in opposite directions. So each of these things is a wonder and marvel of chemistry, fabrication, mechanical engineering, and optical engineering, because you have to align all these things and make sure they're perfect.
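To make the scanning arithmetic concrete, here is a rough sketch of what a 26 by 33 millimeter field implies for a 300 mm wafer. The per-field exposure time is an assumed figure for illustration, not an ASML spec:

```python
import math

# Rough scanner throughput estimate. The 26 x 33 mm field size comes from the
# conversation; the seconds-per-field figure is an assumption.
wafer_diameter_mm = 300
field_w_mm, field_h_mm = 26, 33

wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2        # ~70,700 mm^2
fields_per_wafer = wafer_area_mm2 / (field_w_mm * field_h_mm)  # ~82 fields
print(f"~{fields_per_wafer:.0f} fields per wafer")

seconds_per_field = 0.3  # assumed expose-and-step time
wafer_seconds = fields_per_wafer * seconds_per_field
print(f"~{wafer_seconds:.0f} s/wafer -> ~{3600 / wafer_seconds:.0f} wafers/hour")
```

Under those assumptions you land in the ~150 wafers-per-hour range, which is the order of magnitude these tools are quoted at, and it shows why both stages have to move at many g's to keep each field's exposure down to fractions of a second.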
All of these things have crazy amounts of metrology, because you have to perfectly test everything; if anything is messed up, the yield goes to zero, because this is such a finely tuned system. And by the way, it's so large that you build it in the factory in Veldhoven, Netherlands, then deconstruct it, ship it on many planes to the customer site, and then reassemble it there and test it again.
And that process takes many, many months. So there are just so many steps in the supply chain, whether it's Zeiss making the lenses and projection optics or Cymer, which is an ASML-owned company, making the EUV source. And each of these has its own complex supply chain. ASML has commented that their supply chain has over 10,000 companies in it.
Like individual suppliers.
Yes. And it might not be direct. It might be through, hey, Zeiss has so many suppliers, and XYZ company has so many suppliers.
But if you just think about it: you're talking about two physically moving objects, one this large and one this large, the size of a wafer, that have to be accurate to the level of single-digit nanometers or even smaller. Because the overlay for the entire system, the layer-to-layer variation, has to be on the order of three nanometers. And if the overlay budget is three nanometers, that means each individual part's physical movement has to be accurate to even less than that, sub one nanometer in most cases, because the errors of these components stack up.
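A toy error budget makes the point. If independent error sources add in quadrature, each one has to fit well under the total overlay spec. The component list here is invented for illustration:

```python
import math

# Toy overlay budget: k independent error sources adding in quadrature (RSS)
# must fit under a 3 nm layer-to-layer overlay spec. Components are illustrative.
overlay_budget_nm = 3.0
components = ["reticle stage", "wafer stage", "projection optics",
              "alignment metrology", "thermal drift"]

per_component_nm = overlay_budget_nm / math.sqrt(len(components))
print(f"{len(components)} sources -> {per_component_nm:.2f} nm each")
# 5 sources -> 1.34 nm each; with ~10 sources each must be under ~0.95 nm,
# which is why individual motions have to be accurate to sub-nanometer levels.
```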
And so there's no way to just snap your fingers and increase production. Take something as simple as power: the U.S. going from 0% power growth to 2% power growth, even though China's already at 30%, was so hard for America to do. And that's a really simple supply chain, with very few people in it who make genuinely difficult things, and probably 100,000 or more electricians and other people working in the electricity supply chain in the U.S. Whereas ASML employs so few people, and Carl Zeiss probably has less than a thousand people working on this, and all of those people are super, super specialized. So you can't just train random people up for this in the snap of a finger, and you can't just get your entire supply chain galvanized overnight.
Nvidia has had to do a lot to get the entire supply chain to even deliver the capacity they're going to make this year. And even so, when you go talk to Anthropic, they're like: well, we're short of TPUs, we're short of Trainium, we're short of GPUs. When you go talk to OpenAI, they're like: we're short of these things too. So OpenAI and Anthropic know they need X. Nvidia is not quite as AGI-pilled, so they're building X minus 1. And as you go down the supply chain, everyone's doing minus 1, and in some cases they're doing divided by 2, because they're just not AGI-pilled. And so the time lag for this whip to react, the AGI-pilledness and the desire to increase production, is so long.
And then once they finally understand, hey, we need to increase production rapidly, they think they understand: oh, AI means we have to go from 60 to 100. In addition, the tools are all just getting better and faster, the source going from 500 watts to 1,000 watts, and all these other aspects of the supply chain advancing technically on top of the increase in production. So they think they're actually increasing production a lot. But if you flow through the numbers: hey, what does Elon want? He wants 100 gigawatts a year in space by 2028, is it? Or 2029? And Sam Altman wants 50, 52 gigawatts a year by the end of the decade. And probably Anthropic needs the same, and then Google needs that. You go across the supply chain and it's like: wait, no, the supply chain can't possibly build enough capacity for everyone to get what they want on the compute side.
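You can see how the under-ordering compounds with a toy model. The tiers and discount factors below are invented, but the multiplication is the point:

```python
# Toy version of the "whip" Dylan describes: each tier plans for a fraction
# of what its customer actually needs. Tier names and factors are made up.
needed = 100.0  # units of capacity the labs actually want

tiers = {
    "chip designer":     0.9,  # builds "X minus 1"
    "foundry":           0.9,
    "tool maker":        0.8,
    "optics and source": 0.5,  # "divided by 2"
}

plan = needed
for tier, factor in tiers.items():
    plan *= factor
    print(f"{tier:>18}: plans for {plan:.0f}")
# The deepest tier ends up tooled for ~32% of true demand, and fixing that
# takes years of lead time, not quarters.
```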
Real conversations are full of fits and starts and pauses and interruptions.
I mean, just listen to this episode.
At least superficially, voice models have gotten pretty good at handling these kinds of things.
But at a deeper level, interruptions can throw off a model's understanding and degrade the quality of its responses.
And it's not always clear why.
Labelbox realized that this was a huge bottleneck for their customers.
So they built an evaluation pipeline called EchoChain to help you diagnose and fix your voice model's specific failure modes.
EchoChain starts by feeding conversations into your voice model.
It then injects interruptions at specific intervals and classifies any failures into one of three different modes.
One, did it acknowledge a correction but keep the old plan?
Two, did it adapt briefly but then slide back to old assumptions?
Or three, did it abandon the old task entirely?
This is extremely useful information because Labelbox can get your model the exact data it needs
to fix whatever issue is preventing it from being a viable and competent voice model.
So if you want to ensure that your voice model stays performant in real conversations, you should reach out to Labelbox at labelbox.com/dwarkesh.
So I feel like in the data center supply chain
for the last few years, people have been making arguments of
this specific thing we are bottlenecked by,
therefore AI compute can't scale more than X.
But then, as you've written about, oh, no, if, you know,
say the grid is a bottleneck, then we just do
behind the meter on the site, we do gas turbines, et cetera.
If that doesn't work, there's like all these other alternatives
that people fall back on.
And I want to ask you a question about whether we can imagine a similar thing happening in the semiconductor supply chain.
So if EUV becomes a bottleneck, well, what if we just went back to 7 nanometer and did what China is doing currently: producing 7 nanometer chips with multi-patterning on DUV machines? And if you look at a 7 nanometer chip like the A100, there's been a lot of progress, obviously, from the A100 to the B100 or B200, but how much of that progress is just numerics? If you hold the numerics constant, say FP16 from A100 to B100, the B100 is a little over one petaflop and the A100 is like 300 teraflops, so you have basically a 3x improvement holding numerics constant. Some of that is the process improvement, and some of that is just the accelerator design improving, which we could replicate again in the future. So it seems like the effect of the process improving from 7 nanometer to 4 nanometer is actually very small. So, I don't know the numbers offhand, but say there's like 150K wafers per month of 3 nanometer, and eventually similar amounts for 2 nanometer. But there's also a similar amount for 7 nanometer, right? If you have all those old wafers, and there's maybe a 50% haircut because the bits per wafer area are, what is it, 50% less or something, then it doesn't seem that bad to just bring on 7 nanometer wafers. And then, oh, that gives you another 50 or another 100 gigawatts.
Yeah, tell me why that's naive.
Yeah.
So I think we potentially do go crazy enough that this happens, because we just need incremental compute, and the compute is worth the higher cost and power of these chips. But it's also unlikely, to a large extent, because some of these are not fair comparisons. For example, going from A100, which is 312 teraflops, to Blackwell, which is like 1,000-ish teraflops of FP16, or maybe it's 2,000, and then Rubin, which is like 5,000 or so of FP16: it's not a fair comparison, because these chips have vastly different design targets. With A100, what Nvidia optimized for was FP16 and BF16 numerics. With Hopper, they didn't care as much about that; they cared about FP8. With Rubin, they don't care about FP16 and BF16 as much; they care mostly about FP4 and FP6. The numerics are what they've designed the chip for.
So, okay, let's say we make a new chip design on 7 nanometer. Sure, we can do that, and it's optimized for the numerics of the modern day. The performance difference is still going to be much larger than the flops difference you mentioned. It's easy to boil things down to flops per watt or flops per dollar, but that's actually not a fair comparison.
And this is where you can bring in, hey, let's look at Kimi (Kimi K2.5, sorry) and DeepSeek. When you look at these two models and their performance on Hopper versus Blackwell, on very optimized software, you get vastly different performance. And most of that is not attributable to flops or numerics, because those models are actually 8-bit: Blackwell and Hopper are both optimized for 8-bit, and Blackwell's not really taking advantage of its 4-bit there. The performance gulf is actually much larger. And the way you can compare them and think about them is: sure, it's one thing to shrink process technology, make the transistors smaller, and give each chip X number of flops. But you forget the big gating factors. These models don't run on a single chip. They run on hundreds of chips at a time. If you look at DeepSeek's production deployment, which is well over a year old now, they were running on 160 GPUs, and that's what they serve production traffic on. So they split the model across 160 GPUs.
Every time you cross the barrier from one chip to another, there is an efficiency loss, because you now have to transmit over high-speed electrical SerDes, and there's a latency cost, a power cost, all these dynamics that hurt. As you shrink and shrink the process node, you've increased the amount of compute in a single chip, and in-chip movement of data runs at at least tens of terabytes a second, if not hundreds of terabytes a second. Between chips that are super close to each other physically, you're on the order of a terabyte a second. And you can only put so many chips close to each other physically, so you have to put chips in different racks, and the data rate between those is on the order of hundreds of gigabits a second, 400 gig or 800 gig, so roughly 100 gigabytes a second. So you've got this huge ladder: on-chip, I can communicate at super fast speeds; within the rack, an order of magnitude lower; outside the rack, another order of magnitude lower than that. And as you break the bounds of a chip, you end up with performance loss.
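As a rough sketch of that ladder, with assumed round numbers (real figures vary by part and generation):

```python
# Time to move a 100 GB payload (weights or KV cache) at each level of the
# communication hierarchy. Bandwidths below are assumed representative values.
ladder_GBps = {
    "on-chip":                      10_000,  # tens of TB/s
    "within rack (NVLink-class)":    1_000,  # ~1 TB/s
    "across racks (400/800G NICs)":    100,  # ~100 GB/s
}

payload_GB = 100
for level, bw in ladder_GBps.items():
    print(f"{level:30s} {payload_GB / bw * 1e3:8.1f} ms")
# Each step down the ladder costs roughly an order of magnitude in time.
```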
So anyways, the reason I explain this is: when you look at Hopper versus Blackwell, even if both of them are using a rack's worth of chips, the Hopper is significantly slower, because the amount of performance you can bring to bear within each communication domain (tens of terabytes a second between processing elements on a chip, terabytes a second between chips) is much, much higher on Blackwell, and therefore the performance is much higher. So when you look at inference at, let's say, 100 tokens a second for DeepSeek and Kimi K2.5, Hopper versus Blackwell, the performance difference is on the order of 20x.
Interesting.
Not two or three x like the flops difference indicates, even though those are on the same process node. There are just differences in networking technologies and what they've worked on.
And so you can translate some of these things back. But when you look at Rubin and what they're doing on 3 nanometer, some of those things are just not possible to port all the way back to an A100-era design, even if you make a new chip on 7 nanometer. There are certain architectural improvements you can port, and certain ones you cannot. So the performance difference is not just the difference in flops. It's in some sense cumulative: the difference in flops per chip, networking speed between chips, how many flops are on a chip versus a system, memory bandwidth on a single chip and on an entire system. All of these things compound.
Can I ask a very naive question? So this year, last year, the B200 has two dies on a single package, so you can get that bandwidth on a single chip without having to go through NVLink or InfiniBand. And then next year, Rubin Ultra will have four dies on one package. What is preventing us from just doing more of that? Like, how many dies could you put on a single package and still get these tens of terabytes a second?
Yeah, so even within Blackwell, there are differences in performance when you're communicating on a die versus across dies. Those bounds are obviously much smaller than when you're going out of the entire package; it's die versus die within the package. So when you scale the number of dies up, there is some performance loss. It's not just perfect, but it is way better than going between entirely different packages. Now, how large can advanced packaging scale? The way Nvidia is doing it, and Google with Broadcom, and MediaTek, and Amazon's Trainium, all these chips, is called CoWoS. But you can actually go look back at what Tesla did with Dojo, which they canceled and restarted. Dojo was a chip that was the size of an entire wafer.
They had 25 chips on it, and there were some tradeoffs (they couldn't put HBM on it), but the positive side was that they had 25 chips on one wafer. And to date, it is still probably the best chip for running convolutional neural networks. It's just not great at transformers, because the shape of the chip, the memory, the arithmetic, all these various specifications, are just not well suited for transformers. They're well suited for CNNs.
And anyway, Dojo chips were optimized around that: they made a bigger package. But as you make packages bigger and bigger, other constraints, like networking speed, memory bandwidth, and cooling capabilities, start to rear their heads. It's not simple. But yes, you will see a trend line of more dies on the package. And yes, you can do that on 7 nanometer. In fact, that's what Huawei did with their Ascend 910C and D: they initially had just one die, then they did two, and they're focusing on scaling the packaging up, because that's an area where they can advance faster than process technology, where they can't shrink. But at the end of the day, that's something you can do on leading-edge chips too. Anything you do on 7 nanometer in terms of packaging, you can probably also do on 3 nanometer.
So if you end up in this world in 2030 where the West has the most advanced process technology but has not ramped it up as much, whereas China (I don't know if you think by 2030 they'd have EUV and, I don't know, 2 nanometer or whatever) is semiconductor-pilled and producing in mass quantity: basically, I'm wondering what the year is where there's a crossover. Where our advantage in process technology has faded enough, and their advantage in scale has increased enough, and also their advantage in having one country with the entire supply chain indigenized, rather than random suppliers in Germany and the Netherlands and wherever, would mean that China is ahead in its ability to produce mass flops.
Yeah, so to date, China still does not have an entire indigenous semiconductor supply chain.
But will they in 2030?
By 2030, it's possible that they do. But to date, all of China's 7 nanometer and 14 nanometer capacity uses ASML DUV tools, and the amount they were able to ship in and import from ASML is large. And the point is that the vast majority of ASML's revenue, and on EUV all of it, is outside of China. So the scale advantage is still in favor of, let's call it, the West plus Taiwan, Japan, etc. But China is trying to make its own DUV and EUV tools; they're trying to do all these things. The question is how fast they can advance and scale up production, as well as quality. And to date, we haven't seen that.
Now, I'm quite bullish that they're going to be able to do these things over the next five to ten years: really scale up production, really kick it into high gear. They have more engineers working on it. They have more desire to throw capital at it. So by 2030, do they have fully indigenized DUV? I think for sure.
DUV, yes. And fully indigenous EUV by 2030?
I think they'll have working tools. I don't think they'll be able to manufacture a bunch of them yet. There's having it work, and then there's production hell. Ultimately, ASML had EUV working in the early 2010s, at some capacity.
Right.
But the tools were not accurate enough. They were not scaled for high-volume manufacturing, not reliable enough. And then they had to ramp production, and that all took time. Production hell takes time, which is why it took another five to seven years to get EUV into mass production at a fab rather than just working in the lab.
So how many DUV tools do they need to manufacture in 2030?
ASML?
No, China.
Oh, that's a great question. It's a bit of a challenge to look into this supply chain especially; we try really hard. In some instances, they're buying stuff from Japanese vendors, and if they want to fully indigenize the supply chain, they can't buy these lenses or projection optics or stages from Japanese vendors. They need to build them internally. So it's really tough to say where they'll be able to get to. Honestly, I think it's a shot in the dark. But it's probably not unlikely that they'll be able to do on the order of 100 DUV tools a year, whereas ASML is doing hundreds of DUV tools a year currently.
You know, no company has a process node where they make a million wafers a month. Elon says he wants to do it, and China's obviously going to try. I don't think TSMC is trying to do that. The memory makers may get to a million wafers a month, but not in a single fab. It's sort of mind-boggling to think of that scale, and challenging to see the supply chain galvanized for it. So I'm not sure. But I don't want to doubt China's capability to scale.
Right.
I guess this is an interesting question that at some point SemiAnalysis will do the deep dive on. But there's this question of: by when would indigenous Chinese production be bigger than the rest of the West combined, if you add up all the inputs to your model, when they'll have DUV machines at scale, when they'll have EUV machines at scale? Because there's this question around: if you have long timelines on AI (by long, meaning 2035, which is not that long in the grand scheme of things), should you expect a world where China is dominating in semiconductors? Which I think doesn't get asked enough. In San Francisco, we're thinking on time scales of, like, weeks, and outside of San Francisco, people aren't thinking about AGI at all. So this question of: okay, what if we have AGI, this transformational thing that is commanding tens of trillions or hundreds of trillions of dollars of economic growth and token output and so forth, but it happens in 2035? What does that imply for the West versus China? SemiAnalysis has got to write the definitive model on this.
Yeah, so I think it's really challenging when you move time scales out that far. What we tend to focus on is tracking every data center, tracking every fab, tracking all the tools and where they're going. But the time lags for these things are relatively short. We can only make reasonably accurate estimates for data center capacity based on land purchasing, permits, turbine purchasing, and all these things. We know where all those things are going, and that's the data we sell. But as you go out to 2035, things are just so radically different and your error bars get so large that it's kind of hard to make an estimate. At the end of the day, if takeoff or timelines are slow enough, then certainly, I don't see why China wouldn't be able to catch up drastically. In some sense, we've got this valley where, call it three to six months ago, or maybe even now, Chinese models were as competitive as they've ever been. I think Opus 4.6 and GPT-5.4 have really pulled away and made the gap a little bit bigger, but I'm sure some new Chinese models will come out.
But as we move from these companies selling tokens, where they provide the entire reasoning chain and all that, to selling automated white-collar work (an automated software engineer: you send them the request, they give you the result back, and there's a bunch of thinking on the back end that they don't show you), the ability to distill from American models into Chinese models gets harder. That's A. B is the scale of compute that the labs have: OpenAI exited last year with roughly two gigawatts, Anthropic will get to two-plus gigawatts this year, and by the end of next year they'll both be at like 10 gigawatts of capacity. China is not scaling their AI lab compute nearly as fast. So at some point, when you can't distill the learnings from these labs into the Chinese models, plus this compute race that OpenAI, Anthropic, Google, Meta, et cetera are all running, the model performance should start to diverge more. And then there's all of this capex being spent on data centers: Amazon $200 billion, Google $180 billion, and so on. All these companies are spending hundreds of billions of dollars of capex. There's nearly a trillion dollars of capex being invested in data centers in America this year,
roughly. You end up with: okay, what's the return on invested capital here? You and I would think that the return on invested capital for data center capex is very high. If we look at Anthropic's revenues: in January they added like $4 billion, and in February, which is a shorter month, they added like six. We'll see what they can do in March and April, given that compute constraints are what's bottlenecking their growth; the reliability of Claude Code is actually quite low because they're so compute constrained. But if this continues, then the ROIC on these data centers is super high. And at some point, the U.S. economy starts growing faster and faster over this year and next year, because of all this capex and all the revenue these models are generating, plus the downstream supply chain, versus China, which doesn't have that yet. They have not built the scale of infrastructure to then invest in models to get to the capabilities, to then deploy those models at such scale.
Because when you look at Anthropic: they're at, call it, $20 billion ARR, and of that, the margins are sub-50%, at least as last reported by The Information. So you're at, okay, that's like $13-14 billion of compute that it's running on, rental-cost-wise, which is something like $50 billion worth of capex that someone laid out for Anthropic to generate their current revenue. And China has just not done this.
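Here is the chain from ARR to capex with the stated figures made explicit. The margin and the capex multiple are assumptions consistent with the quote, not reported numbers:

```python
# Sketch of the revenue -> rental -> capex chain described above.
arr = 20e9               # ~$20B ARR
gross_margin = 0.35      # "sub-50%" margins; 35% assumed for illustration

compute_rental = arr * (1 - gross_margin)  # ~$13B/yr of compute rental
capex_multiple = 3.8     # assumed: capex is roughly 4x the annual rental cost

capex = compute_rental * capex_multiple    # ~$50B of capex laid out
print(f"rental ~${compute_rental/1e9:.0f}B/yr -> capex ~${capex/1e9:.0f}B")
```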
If and when Anthropic 10Xs revenue again (and I think our answer would be when, not if), China doesn't have the compute to deploy at that scale.
And so there is some sense in which, oh, we're in fast takeoff-ish. It's not like we're talking about a Dyson sphere by such-and-such date. It's more that the revenue is compounding at such a rate that it does affect economic growth, and the resources these labs are gathering are growing so fast, and China hasn't done that yet. In that case, the U.S. and the West are actually diverging. The flip side is: maybe these infrastructure investments have middling returns. Maybe they're not as good as hoped. Maybe Google is wrong for wanting to take free cash flow to zero and spend $300 billion on capex next year. Maybe they're just wrong, and the people on Wall Street who are bearish and don't understand AI are correct. In which case, the U.S. builds all this capacity and doesn't get really great returns, and China is able to build the fully vertical, indigenous supply chain, versus the U.S., Japan, Korea, Taiwan, Southeast Asia, Europe, all these countries together building this less vertical supply chain. And in that sense, at some point China is able to scale past us, if AI takes longer to get to certain capability levels than, I would say, the vast majority of your guests on this podcast believe.
It's like: fast timelines, US wins; long timelines, China wins.
Right. But I don't know what fast timelines means. I don't think you have to believe in AGI to have the timelines where the U.S. wins.
Okay, let's go back to memory, because I think maybe people on Wall Street and people in the industry understand how big a deal this is, but people in general don't. So we've got this memory crunch, as you're saying. Earlier I was asking: could we solve the EUV tool shortage by going back to 7 nanometer? So let me ask a similar question about memory. HBM is made of DRAM, but it has 3 to 4x fewer bits per wafer area than the DRAM it's made out of. Is it possible that accelerators in the future could just use commodity DRAM instead of HBM, so we get much more capacity out of the DRAM we make? The reason I think this might be possible is: look, if we're going to have agents that just go off and do work, and it's not a synchronous chatbot application, then you don't necessarily need extremely low latency anymore. And so maybe you can live with lower bandwidth, because the reason you stack DRAM into stacks to make HBM is for higher bandwidth. So is it possible to build non-HBM accelerators, basically the opposite of Claude Code fast, have Claude Code slow, and do that?
I think at the end of the day, the incremental purchaser who's willing to pay the highest price for tokens also ends up being the one that's less price sensitive. And in a capitalist society, compute should be allocated towards the goods that have the highest value, and the private market determines this by willingness to pay. So to some extent, sure, Anthropic could release a slow mode. They could release Claude Slow Mode and increase tokens per dollar by a significant amount; they could probably reduce the price of Opus by 4x or 5x while reducing the speed by maybe just 2x. The curve of inference throughput versus speed is there already, just on HBM. And yet they don't, because no one actually wants to use a slow model. Furthermore, on these agentic tasks, it's great that the model can run at a time horizon of hours. If the model were just running slower, those hours would become a day. Or vice versa: if the model runs faster, those hours become an hour. And no one really wants to move to a day-long wait, because the highest-value tasks also have some time sensitivity to them.
So I struggle to see it. Yes, you could use DDR, regular DRAM, but there are a couple of things that are challenging with this. One is that you're still limited by shoreline: a chip is a certain size, and all of the I/O escapes on the edges of the chip. So oftentimes what you see is that the left and right of the chip are HBM (the I/O from the chip to the HBM is on the sides), and the top and bottom are I/O to other chips. And if you were to change from HBM to DDR, then all of a sudden the I/O on that edge would have significantly less bandwidth, but significantly more capacity per chip.
Yeah. Because the metric that you actually care about is bandwidth per wafer, not bits per wafer. The thing constraining the flops is just getting the next matrix in and out, and for that, you just need more bandwidth.
Yeah, getting out the weights, and getting the KV cache in and out.
Right.
And so in many cases, these GPUs are not running at full memory capacity. It's obviously a system design thing, model-hardware-software co-design: hey, how much KV cache do I keep on the chip? How much do I offload to other chips and call in when I need it, for tool calling or whatever? How many chips do I parallelize this over? The search space here is very broad, which is why we have InferenceMAX: it's open source, and it searches the optimal points on inference for a variety of different chips and models.
Anyways, the point is you're not always necessarily constrained by memory capacity. You can be constrained by flops. You can be constrained by network bandwidth. You can be constrained by memory bandwidth. Or you can be constrained by memory capacity. If you really simplify it down, there are four constraints, and each of these can break out into more. But in this case, if you switch to DDR, yes, you produce 4x the bits per DRAM wafer, but all of a sudden the constraints shift a lot and your system design shifts a lot. You go slower, and yes, maybe the market for that is smaller. But also, now all these flops are wasted, because they're just sitting there waiting for memory. It's like: great, I don't need all that capacity, because I can't really increase batch size, since then the KV cache would take even longer to read.
Interesting.
What is the bandwidth difference between HBM and normal DRAM?
Yeah. So a stack of HBM4 (let's talk about what's in Rubin, because that's what we've been indexing on) is 2048 bits wide, connected in an area that's roughly 11 to 13 millimeters across, and it transfers at around 10 giga-transfers a second. Those 11 to 13 millimeters are the shoreline you're taking up on the chip, and in that shoreline, you have 2048 bits transferring at 10 giga-transfers per second. Multiply those together and divide by 8 to go from bits to bytes, and you're at roughly 2.5 terabytes a second per HBM stack. When you look at DDR, in that same area it's maybe 64 or 128 bits wide, and DDR5 transfers at anywhere from 6.4 to maybe 8 giga-transfers a second. So your bandwidth is significantly lower: 64 bits times 8 giga-transfers divided by 8 is 64 gigabytes a second, and even with a generous interpretation, 128 bits times 8 giga-transfers, you're at 128 gigabytes a second for the same shoreline, versus 2.5 terabytes a second. There's an order of magnitude difference in bandwidth per edge area.
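Redoing that shoreline math in code, using the figures as stated (approximate, and actual HBM4/DDR5 specs vary by vendor):

```python
def bandwidth_GBps(bus_bits: int, giga_transfers: float) -> float:
    """Bandwidth in GB/s from bus width and transfer rate."""
    return bus_bits * giga_transfers / 8  # bits -> bytes

hbm4_stack = bandwidth_GBps(2048, 10)  # ~2,560 GB/s, i.e. ~2.5 TB/s
ddr5_lo    = bandwidth_GBps(64, 8)     # ~64 GB/s
ddr5_hi    = bandwidth_GBps(128, 8)    # ~128 GB/s, the generous case

print(f"HBM4 stack: {hbm4_stack/1000:.1f} TB/s per ~11-13 mm of shoreline")
print(f"DDR5:       {ddr5_lo:.0f}-{ddr5_hi:.0f} GB/s in similar shoreline")
print(f"Ratio:      ~{hbm4_stack/ddr5_hi:.0f}x to ~{hbm4_stack/ddr5_lo:.0f}x")
```

So "order of magnitude" is, if anything, understated: it works out to roughly 20x to 40x per unit of edge.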
And if your chip is a square (26 by 33 millimeters is the maximum size for an individual die), you only have so much edge area, and on the inside of that chip you put all your compute. There are things you can do to try to change this, like more SRAM, more caching, blah blah blah, but at the end of the day, you're very constrained by bandwidth.
Interesting.
So then there's a question of: where can you destroy demand to free up enough for AI? And I guess the picture is especially bad because, as you're saying, if it takes 3-4x more wafer area to get the same byte of HBM, you have to destroy 3-4x as much consumer demand for laptops and phones and whatever in order to free up one byte for AI. So what does this imply for the next year or two? Sorry for the run-on question. I think in your newsletter you said 30% of Big Tech's capex in 2026 is going towards memory?
Yes.
That's insane, right?
Yeah.
Like, of the $600 billion or whatever, you're saying 30% is going just to...
And obviously there's some level of margin stacking that Nvidia does, so you have to separate that out and apply their margin to the memory and the logic. But at the end of the day, yeah, like a third of their capex is going to memory.
That's crazy.
Okay.
So what is the question I'm trying to ask? It's something like: what should we expect over the next year or two as this memory crunch hits?
Yeah, so the memory crunch will continue to get harder and harder, and prices will continue to go up. And this affects different parts of the market differently, which gets to the question of: are people going to hate AI more and more? Yes, because now smartphones and PCs are not going to get incrementally better year on year. In fact, they're going to get incrementally worse.
If you look at the bill of materials of an iPhone, what fraction of it is the memory? Like, how much more expensive does an iPhone get if the memory is 2x more expensive, or whatever it has to be?
So I believe an iPhone has 12 gigabytes of memory. Each gig used to cost roughly $3 or $4, so call it $50. But now the price of memory has roughly tripled; call it $12 per gig for DDR. So now you're talking about $150 versus $50: a $100 increase in cost for Apple, and that's just the DRAM. The NAND has the same sort of market dynamic, so it's probably more like a $150 increase on the iPhone in total. Apple has to either pass that on to the consumer, A, or B, eat it. I don't see Apple reducing their margin too much; maybe they eat a little bit. But Apple applies its own margin on top, so at the end of the day, that means the end consumer is paying something like $250 more for an iPhone.
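Here is the arithmetic with the assumed prices made explicit. These are the illustrative figures from the conversation, not Apple's actual BOM:

```python
# iPhone BOM sketch. All prices and the pass-through multiplier are assumed.
dram_GB = 12
old_per_GB, new_per_GB = 4.0, 12.0  # ~$3-4/GB last year, roughly tripled now

dram_delta = dram_GB * (new_per_GB - old_per_GB)  # ~$96, call it ~$100
nand_delta = 50                                   # assumed NAND contribution
bom_delta = dram_delta + nand_delta               # ~$150 total BOM increase

margin_multiplier = 1.7                           # assumed Apple pass-through
retail_delta = bom_delta * margin_multiplier      # ~$250 to the consumer
print(f"BOM +${bom_delta:.0f} -> retail +${retail_delta:.0f}")
```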
And that's just comparing last year's memory pricing to today's. Now, there is some lag before Apple feels the heat, because they've tended to have three-, six-, or twelve-month contracts for a lot of memory. But at the end of the day, Apple gets hit pretty hard by this; they just won't really adjust until the next iPhone release.
But they won't really adjust until the next iPhone release.
But that's the high end of the market.
Actually, that's only a few hundred million phones a year, right?
Apple sells, what, two, 300 million phones a year?
The bulk of the market is this mid-range low end, right?
Used to be 1.4 million smartphones were sold a year.
Now we're at like 1.1.
But our projections are we maybe get down to like 800 million this year.
And next year are like 600 or 500 million.
because, and we look at like, you know, there's some data points out of China from some of our analysts in Asia and Singapore and Hong Kong and Taiwan.
They've been trekking this and they see Xiaomi and Opo are cutting low end and mid-range smartphone volumes by half.
Because, yes, it's only a $150 price increase on a $1,000 smartphone or $150 bomb increase on $1,000 iPhone where Apple has some larger margin.
But if we look at the smaller phones, the percentage of the bomb that goes to,
memory and storage is much larger and the margins are lower. So there's less capacity to even
eat the margins. And they have like generally tended not to do as long term agreements on
memory. And why this is like a big deal is if smartphone volumes, let's say half, the halving will
frankly happen in the low and mid range, not in the high end. So it's not like the bits released
are halving, right? You know, currently consumers more than half of memory demand. Even if you half the
smartphone volumes because of the shape of the halving, right?
It's like low end gets cut by more than half, high end gets cut by less than half,
because you and I will buy, you know, the high end phones that cost north of $1,000,
we'll buy them, even if they get a little bit more expensive.
And Apple's volumes will not go down as much as like a low-end smartphone provider.
And the same applies to PCs.
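A toy model of that asymmetric halving. Segment sizes, memory content per phone, and cut factors are all invented to show the shape:

```python
# Units halve-ish, but DRAM bits released fall by less, because the cuts
# land on low-memory phones. All numbers are made up for illustration.
segments = {
    #            units (B), GB DRAM/phone, post-crunch volume multiplier
    "low/mid":   (0.9,      6,             0.40),  # cut by more than half
    "high-end":  (0.3,      12,            0.85),  # cut by much less
}

units_before = sum(u for u, _, _ in segments.values())
units_after  = sum(u * m for u, _, m in segments.values())
bits_before  = sum(u * gb for u, gb, _ in segments.values())
bits_after   = sum(u * gb * m for u, gb, m in segments.values())

print(f"units: {units_after/units_before:.0%} of before")  # ~51%
print(f"bits:  {bits_after/bits_before:.0%} of before")    # ~58%
```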
And what this does to the market is quite drastic. DRAM gets freed up and goes to AI chips, whose buyers are willing to do longer-term contracts, willing to pay higher margins, et cetera, because at the end of the day the margin they extract from the end user is much larger. And this probably leads to people hating AI even more. Today you already see all the memes on PC subreddits and gaming-PC Twitter: cat dancing videos captioned, this is why memory prices doubled and you can't get a new gaming GPU or a new desktop. And it's going to be even worse when memory prices double again, especially DRAM.
Another dynamic that's quite interesting: it's not just DRAM, it's also NAND. NAND is also going up in price. Both of these markets have expanded capacity very slowly over the last few years, NAND almost zero. But the percentage of NAND that goes to phones and PCs is larger than the percentage of DRAM that goes to phones and PCs. So as you destroy demand, mostly for the DRAM's sake, you also unlock more NAND that can be allocated to other markets. And so the price increases for DRAM will be larger than those for NAND, because you've released relatively more NAND from the consumer.
Sorry, maybe you just explained it and I missed it: is NAND going up because SSDs are being used in large quantities for data centers?
They are, but not in as large quantities as DRAM.
Okay, but you're saying NAND will also increase because AI is buying some quantity; there's just not as much need for it as there is for HBM and DRAM. Makes sense.
One thing I didn't appreciate until I was reading some of your newsletters is that the constraints limiting logic scaling over the next few years are quite similar to what's preventing us from producing more memory wafers. In fact, literally the same exact machine, this EUV tool, is needed for memory. So maybe the question somebody should be asking right now is: well, why can't we just make more memory? Is that somebody you?
Yeah, who knows?
So I think the constraints, as I was mentioning earlier, are not necessarily EUV tools today or next year; they become that as we get to the latter part of the decade. Currently, the constraint is more that they physically just haven't built fabs. Over the last three to four years, these vendors have just not built new fabs, because memory prices were really low, their margins were low, and in fact they were losing money on memory in 2023. So they said: we're not building new fabs. And then the market slowly recovered over time, but never really got amazing until last year.
In 2024, we were banging the drum that reasoning means long context, which means a large KV cache, which means a lot of memory demand. We've been talking about that for a year and a half, two years, and people who understood AI went really long memory back then. But it only finally played out in pricing now. It took so long for what was obvious: long context means the KV cache gets bigger, so you need more memory, and for accelerators, half their cost is memory, so of course the AI buyers were going to start going crazy on it. It took a year for that to actually reflect in memory prices.
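To see why long context translates directly into memory demand, here is the standard KV cache sizing arithmetic. The model shape below is a hypothetical dense configuration, not any specific model:

```python
# KV cache per token = layers x KV heads x head_dim x 2 (K and V) x bytes.
layers, kv_heads, head_dim = 60, 8, 128  # hypothetical model shape
bytes_per_elem = 2                       # fp16/bf16

kv_per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem  # ~240 KiB
context_len = 128_000
batch = 64  # concurrent long-context requests

total_GB = kv_per_token * context_len * batch / 1e9
print(f"{kv_per_token/1024:.0f} KiB/token -> ~{total_GB:,.0f} GB of KV cache")
# Roughly 2 TB of KV cache for one modest serving batch, on top of weights.
```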
Once memory prices reflected it, it took another three to six months for the memory vendors to start building fabs. And those fabs take two years to build. So we don't have really meaningful fab space to even put these tools in until late '27 or '28. Instead, what you've seen is some really crazy stuff to get capacity. Micron bought a fab from a company in Taiwan that makes lagging-edge chips. Hynix and Samsung are doing some pretty crazy things to try to expand capacity at their existing fabs, which also has very large knock-on effects in the economy.
So, hey, why can't we build more capacity? There's nowhere to put the tools. And it's not just EUV; there are other tools involved in DRAM and logic. For logic at N3, around 28-30% of the cost of the final wafer is EUV. For DRAM, it's in the teens (it's going up, but it's in the teens), so EUV is a much smaller percentage of the cost for DRAM. These other tools are also bottlenecks, although their supply chains are not as complex as ASML's, and so you see Applied Materials and Lam Research and all these other companies also expanding capacity a lot. But anyways, you don't have anywhere to put the tools, because the most complex building that people make is a fab, and fabs take two years to build.
You can think of Jane Street as a research lab with a trading desk attached.
Their infrastructure team has built some of the biggest research clusters in the world with tens of thousands of high-end GPUs and hundreds of thousands of CPU cores and exabytes of storage.
This compute is part of how Jane Street surfaces all the hidden patterns that are embedded in incredibly noisy market data.
Even beyond the noise, the nature of the signal changes constantly in reaction to things like pandemics and elections and new regulations, and even changes in sentiment.
There's this unremitting game of trying to figure out whether your old models still reflect the real world, and if not, what to do about it.
If you're interested in working on this sort of thing, Jane Street is hiring ML researchers and engineers.
They're also accepting applications for their summer ML internship program, with spots in London,
New York and Hong Kong.
And if you happen to find yourself at GTC,
which is happening the week after this episode drops,
Jane Street's GPU performance team is giving a talk.
Go to janestreet.com/dwarkesh to learn more.
I interviewed Elon recently, and his whole plan is that, I guess, they're going to build this gigafab, terafab, some power of ten, and they're going to build the clean rooms. I won't even ask you about the dirty-rooms thing. But let's say they build the clean rooms. Okay, I have a couple of questions. One: do you think this is the kind of thing that Elon and co. could build much faster than people conventionally build it? This is not about building the end tools; this is just about building the facility itself. How complicated is it to just build the clean room, and to do it extremely fast? Is this something that Elon, with his move-fast thing, could do much faster, if that's what we're bottlenecked on this year or next year? And two: does that even matter, if in two years your view is that we're not bottlenecked on clean room space but on the tooling?
So I think, as with any complex supply chain, it takes time, and constraints shift over time. And even if something is no longer a constraint, that doesn't mean that market no longer has margin. For example, energy will not be a big bottleneck a couple of years from now, but that doesn't mean energy isn't growing super fast or that there's no margin there. It's just not the key bottleneck. And in the space of fabs, clean rooms are the biggest bottleneck this year and next year, and as we get to '28, '29, '30, there will still be constraints there.
The thing about Elon is, I think he's had a tremendous capability to garner physical resources and really smart people to build things. And the way he's able to recruit really amazing people is to try to build the craziest stuff. In the case of AI, that hasn't really worked, because everyone's trying to build AGI; everyone's very ambitious. But in the case of: we're going to go to Mars, we're going to make rockets that land themselves, we're going to make fully autonomous electric cars, we're going to make humanoid robots, these are methods of recruiting the people who think that's the most important problem in the world to work on it, because he's the only one trying really hard. In the case of semiconductors: I want to make a fab that's a million wafers per month. No one has a fab that big. That's what he's stated; he wants to make a million wafers a month. It's possible that he's able to recruit a lot of really awesome people and get them onto this herculean, crazy task of trying to build a fab that does a million wafers per month.
Step one is to build the clean room, and I think that he probably can do. There's his mindset of: delete things, it can be dirty, it's fine. Probably not right. Actually, I think it's 100% not right; you need the fab to be very clean. I think all of the air in the fab gets replaced like every three seconds. It's that fast, and so few particles are allowed. But I think he can build the clean room. It'll take a year or two maybe; initially it won't be super fast, but over time he'll get faster and faster at it. The really complex part is developing a process technology and actually building wafers, and I don't think he can develop that quickly. That has a lot of built-up knowledge. It's, again, the most complicated integration of very expensive tools and supply chains there is, done at a TSMC or an Intel or a Samsung. And the other two of those companies aren't even that great at it, and they're tremendously complex operations.
How surprised would you be if, in 2030, there just happened to be some total disruption? We're not using EUV; we're using something with much better physics that's much simpler to produce, and we can produce it in much bigger quantities. I'm sure, to an industry insider, that sounds like a totally naive question. But do you see what I'm asking? What probability should we put on something totally out of left field coming along that makes none of this relevant?
Something that's very simple and easy to scale, I put a very, very low probability on. There are a number of companies working on effectively particle accelerators, or synchrotrons, that generate light that's either 13.5 nanometer like EUV or even X-ray, even narrower wavelengths, like 7 nanometer or whatever, to then use in lithography tools. But those things are massive particle accelerators generating this light, and they're very complicated to build. So there are a couple of companies there, and I think that could be a big disruption to the industry beyond what EUV is. I don't necessarily think we're going to magically build something new that does direct-write, is super simple, and can be manufactured at huge volumes, although there are some attempts at things like that.
Yeah. I ask because, if you think about Elon's companies in the past, rocketry was this thing that, you know...
I mean, it is incredibly complicated. Look, I'm just a naive yapper compared to Elon, right? What have I built? So maybe it's possible.
Yeah, yeah.
In order to be able to build more memory in the future, could we build 3D DRAM the way we do 3D NAND, and then go back to DUV?
This is the hope. Currently, everyone's roadmap for 3D DRAM still uses EUV, because you want that tighter overlay: when you're doing these subsequent processing steps, everything is vertically stacked, you have more layers on top of each other, and you want the pitches to be tighter, and all these things. So generally people are still trying to do it with EUV. But what 3D does is change the answer to: for a single EUV pass, how many bits can it make? If you do that sort of calculation, the number goes up drastically when you go to 3D DRAM. That is the hope. Right now, everyone's roadmap is that you go from the current 6F² cell to a 4F² cell, and then finally to 3D DRAM by the end of the decade or early next decade.
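The density logic, roughly: a DRAM cell's area is some constant times F², so bits per pass scale inversely with that constant, and 3D then multiplies by layer count. The layer count below is a made-up illustration, not a roadmap number:

```python
# Relative bits-per-area for DRAM cell architectures. Cell area = c * F^2,
# so planar density scales as 1/c; 3D multiplies by the layer count.
baseline = 1 / 6  # 6F^2 density, normalized to 1x

for name, c, layers in [("6F2 (today)",        6,  1),
                        ("4F2 (next step)",    4,  1),
                        ("3D DRAM, 32 layers", 4, 32)]:  # layers assumed
    density = layers / c
    print(f"{name:20s} ~{density / baseline:.1f}x bits per area")
# 6F2 -> 1.0x, 4F2 -> 1.5x, 32-layer 3D -> ~48x per lithography pass.
```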
So there's still a lot of R&D and manufacturing and integration work to be done. I wouldn't call it out of the cards; I think it's very likely going to happen. But it's also going to require a huge retooling of fabs. The breakdown of tools in a fab is very different. Actually, the lithography tool is the one thing that isn't that different, but the number of them relative to the different types of chemical vapor deposition, atomic layer deposition, dry etch, and different kinds of etch chambers with different chemistries: you have all these different kinds of tools for different process nodes. You can't just convert a logic fab to a DRAM fab, or vice versa, or a NAND fab, in a short amount of time. In the same way, existing DRAM fabs require a lot of retooling just to go from 1-alpha to 1-beta to 1-gamma process nodes, because now they have to add EUV and change the chemistry stacks for deposition and etch around EUV, and the EUV tool has to be there. And when you change to 3D DRAM, there will be an even larger shift. So there's a lot of retooling of these fabs that needs to happen, and that would be a big disruption. It would make EUV demand generally lower. But as we've seen across time, lithography as a percentage of wafer cost has trended up: in the 2014-ish era it was like 16%, 17% of the wafer cost, and it's gone to 30% over the last 15 years or so. And for DRAM it was in the low-to-mid teens, and now it's trended towards the high teens. Before we get to 3D DRAM, it'll likely cross into the 20s. But then if we get to 3D DRAM, EUV's share of the total end wafer cost tanks again.
Yeah.
I guess you care less about the percentage of cost and more about how much it bottlenecks production.
Right, but the percentage of cost is sort of...
A proxy, yeah.
Yeah.
So if you're Jensen or Sam Altman, or whoever stands to gain a lot from scaling up AI compute, there are these stories of them going to TSMC and saying: hey, why can't we do X and Y and Z? But I think the point you're making is that, in some sense, it doesn't really matter what TSMC does. In fact, even if you have Intel and Samsung building more foundries, in the long run you're going to be bottlenecked by ASML and the other tool makers and material makers. So first, is that the correct interpretation? And second, should Silicon Valley people be going to the Netherlands to pitch ASML? Like, right now, should they be trying to pitch ASML to make more tools, so that in 2030 they can have more AI compute?
You know, it's a funny dynamic we saw in '23, '24, and '25. People who saw the energy bottleneck before others asymmetrically went to Siemens, Mitsubishi, and of course GE Vernova, and bought up turbine capacity, and now they're able to charge excess amounts for deploying those turbines, because of the energy crunch. In the same sense, this could be done for EUV, except ASML is not just going to trust any random bozo who wants to buy EUV tools. Turbines are much cheaper than EUV tools, and there are many more of them produced, especially once you get into industrial gas turbines, not just combined cycle but the cheaper, smaller, less efficient ones.
People put down deposits for these. So in a sense, someone could do this. Someone could go to the Netherlands and say: I'll pay you a billion dollars; you give me the right to purchase 10 EUV tools two years from now, and I'm first in line two years from now. And then over those two years, you go around and wait for everyone to realize: oh, crap, I don't have enough EUV tools. And then you try to sell your option at some premium. But all you're effectively doing is saying: ASML, you're dumb, you weren't making enough margin on these, so I'm going to make the margin. And the question is: would ASML even agree to this? I don't think so.
But there's a world where they at least get the demand signal from that to increase production.
Potentially, potentially. I agree.
But it sounds like you're saying, oh, they couldn't even increase production if they wanted to, given the supply chain.
Right.
But that's exactly the market in which, if they can't increase production, just like TSMC cannot increase production that fast, and yet demand is mooning, then the obvious solution is to arbitrage this, because you and I know demand is way higher than they're projecting and their capability to build. So then you arbitrage this by locking up the capacity and then sort of doing like a forward contract, and then trying to sell it at a later date once other people realize, actually, shit, everything is fucked and we don't have enough capacity.
And then you'll have like this insane margin that ASML and TSMC should have been charging.
But the thing is, I don't know if ASML and TSMC will ever agree to this.
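As a toy model of the arbitrage being described here, under made-up numbers; the deposit, tool price, and scarcity premium are all hypothetical, and ASML sells no such option:

```python
# Toy payoff for the EUV "forward contract" trade sketched above.
deposit = 1.0e9           # hypothetical upfront payment for the option
n_tools = 10              # EUV tools reserved, delivery two years out
list_price = 0.25e9       # assumed price per tool at signing
scarcity_premium = 0.60   # assumed markup once buyers realize the shortage

cost_basis = deposit + n_tools * list_price
resale = n_tools * list_price * (1 + scarcity_premium)
print(f"cost basis ${cost_basis/1e9:.2f}B, resale ${resale/1e9:.2f}B, "
      f"profit ${(resale - cost_basis)/1e9:.2f}B")
# The profit is exactly the margin ASML "should" have charged, which is
# why ASML may simply refuse to sell the option in the first place.
```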
Okay, let me ask about power now.
So it sounds like you think power can be arbitrarily scaled.
Not arbitrarily, but yes.
But beyond these numbers.
And I think, if I remember correctly, in your blog post on power, on how they're increasing power, you were implying that GE Vernova and Mitsubishi and Siemens' capacity to produce gas turbines was like 60 gigawatts a year. And then there's these other sources, but they're less significant than the turbines. And only a fraction of that goes to AI, I assume. So yeah, if in 2030 we have enough logic and memory to do 200 gigawatts a year, do you just think that these things are on a path to ramp up to more than 200 gigawatts a year? Or what do you see?
Yeah. So, I mean, right now we're at 30, right? Or 20. So this is critical IT capacity, by the way, right? This is an important thing to
mention. When I'm talking about these gigawatts, I'm talking about critical IT capacity,
server plugged in, that's how much power it pulls. But there's losses along the chain, right?
There is loss on the transmission. There's losses on the conversion. There's losses on cooling,
etc. And so you should gross this factor up, you know, from 20 gigawatts for this year or 200
gigawatts by the end of the decade to some number 20, 30 percent higher. And then you have
capacity factors, right? Turbines don't run at 100%. In fact, if you look at PJM, which is the largest grid, I think, in America, sort of the Midwest, sort of northeast kind of area-ish, not the full northeast. But anyways, PJM, in their models they say, hey, for turbines, we want to have roughly 20% excess capacity. And within that 20% excess capacity, we're running all the turbines at 90%, because they are derated some for reliability. Oh, things go down, maintenance, et cetera, et cetera. So then in reality, the nameplate capacity for energy is always way higher than the actual end critical IT capacity because of all of these factors.
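The gross-up he's walking through can be made concrete in a few lines; the loss and derating figures below are the rough ones quoted, so treat the output as illustrative.

```python
# Grossing critical IT capacity up to nameplate generation.
critical_it_gw = 20     # servers' actual power draw (today's rough figure)
overhead = 1.25         # ~20-30% for transmission, conversion, cooling
reserve_margin = 1.20   # PJM-style ~20% excess capacity
derate = 0.90           # turbines run ~90% for reliability/maintenance

nameplate_gw = critical_it_gw * overhead * reserve_margin / derate
print(f"{critical_it_gw} GW critical IT -> ~{nameplate_gw:.0f} GW nameplate")
# So 20 GW of critical IT implies roughly 33 GW of nameplate generation,
# and 200 GW by decade's end would imply roughly 330 GW.
```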
But it's not just turbines, right?
If you were just making power from turbines, that's simple, boring, easy, right? But we're humans, and capitalism is far more effective.
And so the whole point of that blog was, yes, there's only three people making combined cycle gas turbines, but there's so much more we can do. We can do aeroderivatives, right? We can take airplane engines and turn them into turbines as well. And there are even new entrants to the market, like Boom Supersonic's trying to do that, right?
And they're working with Crusoe.
And also there's all the other ones that already exist in the market.
There's medium speed reciprocating engines, right?
Engines that spin in circles, right?
So sort of like any diesel engine, right?
There's like 10 people who make engines that way, right?
So Cummins, you know, at least I'm from Georgia, and people used to be like, oh, man, you got a Cummins engine in there, you know, regarding Ram trucks. But it's like, well, actually, automobile manufacturing is going down. These companies all have capacity and could scale and convert that for data center power, right? Stick all these reciprocating engines in. Yes, it's not as clean as combined cycle. Maybe you can convert them from diesel to gas if you want.
But at the end of the day, these spinning engines, oh, what about,
ship engines, right? All of these engines for these massive cargo ships. Those are great. Nebius is doing
that for a data center for Microsoft in New Jersey. They're running these ship engines to generate power.
Oh, there's, you know, Bloom Energy is doing fuel cells. We've been like very positive on them for
like a year and a half now because they have like such a capability to increase their production
and their payback period for production increase is like very fast, even if the cost is a little bit
higher than combined cycle, which is like the best cost and efficiency.
You know, and then there's solar plus battery, which as these cost curves continue to come down, those can come online.
There's wind. And, you know, of course, the derating of those, you know, hey, when you put in a wind turbine, you might say, oh, I'm only going to expect 15% of the maximum power because things oscillate.
But yeah, batteries, there's all these things. And then the other thing is that, like, the grid is scaled for, you know, hey, we're not going to cut off power at peak usage, which is like the hottest day in the summer.
But in reality that's a load spike that is 10, 15, 20% higher than the average. Well, if you just put in enough utility-scale batteries, or you put in peaker plants that only run a small portion of the year, and those could be gas, they could be industrial gas turbines, they could be combined cycle, they could be any of the other sources of power I mentioned, they could be batteries, then all of a sudden you've unlocked 20% of the U.S. grid for data centers, because most of the time that capacity is sitting idle and it's really only there for that peak, right? Which is a day or two, a few hours of maybe a few days of the full year. And so you just have enough capacity to absorb that peak load and all of a sudden you've freed all that up. And today, data centers are only 3, 4% of the power of the U.S. grid. And by '28, it'll be 10%. But if you can just unlock 20% of the U.S. grid like this, it's not that crazy. And the U.S. grid is terawatt level, not hundreds of gigawatts level. Right. So we can add a lot more energy. It's not easy. I'm not saying it's easy. These things are going to be hard.
There's a lot of hard engineering. There's a lot of risks that people have to take. There's a lot of
new technologies people have to use. But Elon was the first to do this behind the meter gas.
And since then, we've seen an explosion of different things that people are doing to get power.
And they're not easy, but people are going to be able to do them. And the supply chains are just way more simple than chips.
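A quick sketch of the peak-shaving arithmetic from a moment ago; the grid size, unlock share, and data center shares are the rough figures quoted in the conversation.

```python
# If batteries/peakers absorb the rare peak, idle capacity is freed up.
us_grid_gw = 1000        # US grid is terawatt-scale
unlock_share = 0.20      # the ~20% of capacity that exists only for peaks
dc_share_today = 0.035   # data centers: ~3-4% of US power today
dc_share_2028 = 0.10     # ~10% by 2028, per the quote

print(f"peak-shaving frees ~{us_grid_gw * unlock_share:.0f} GW")
print(f"data centers today: ~{us_grid_gw * dc_share_today:.0f} GW, "
      f"by '28: ~{us_grid_gw * dc_share_2028:.0f} GW")
```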
Interesting. So I guess he made the point during the interview
that the specific blade for the specific turbine he was looking at,
the lead times for that go out beyond 2030.
And your point is that...
That's great.
There's so many other ways to make energy.
Like, just be inefficient.
Like, it's fine.
Right.
So you're saying right now, I guess, combined cycle gas turbines have capex of $1,500 per kilowatt. And you're saying it would make sense to have either technologies that are much more expensive than that, or other things getting cheap enough to make it competitive.
Exactly.
Exactly.
You know, it can be as high as $3,500 per kilowatt, even.
So it could be twice as much as the cost of combined cycle.
And the total cost of the GPU, you know, on a TCO basis, has gone up a few cents per hour.
Right.
Again, because we've been talking about Hopper pricing: the power price doubles, okay, the Hopper that was $1.40 an hour is now $1.50 in cost. Right. It's like, oh, I don't care, because the models are improving so fast that the marginal utility of them is worth way more than that ten-cent increase in energy.
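That sensitivity is easy to sanity-check; the $0.10/hour power component below is an assumption backed out of the $1.40-to-$1.50 example.

```python
# Per-GPU-hour cost vs. power price, using the Hopper example above.
base_tco = 1.40          # $/GPU-hour, all-in
power_component = 0.10   # assumed power share of that cost, $/GPU-hour

for mult in (1, 2, 3):
    tco = base_tco + power_component * (mult - 1)
    print(f"power price x{mult}: ~${tco:.2f}/GPU-hr")
# Even tripling the power price moves the all-in cost only ~14%, which is
# why expensive behind-the-meter generation can still pencil out.
```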
Okay, and then so you're saying 20% of the grid, or whatever, 20% of that can just come online from utility-scale batteries, increasing what you'd be comfortable putting on the grid.
The regulatory mechanism there is not easy, by the way.
But that's 200 gigawatts, if that hypothetically happens. And you're saying, just from the different sources of gas generation you mentioned, the different kinds of engines and turbines combined, how many gigawatts could they unlock by the end of the decade?
Yeah, so we're tracking this in some of our data; you know, there's over 16 different manufacturers of power-generating equipment just from gas alone, right? So, you know, yes, there's only three turbine manufacturers for combined cycle.
But we're tracking 16 different vendors, and we have all of their orders and things like that.
And it turns out there is just hundreds of gigawatts of orders to various data centers.
As we get to the end of the decade, we think like something like half of the capacity that's being added will be behind the meter.
And when we look at it, a lot of this is actually behind the meter. Behind the meter is almost always more expensive than grid connected, but there's just a lot of problems with getting grid connected, you know, permits and interconnection queues and all this sort of stuff.
So it ends up being even though it's more expensive, people are doing behind the meter.
And then what they're doing behind the meter with ranges widely, right?
It could be reciprocating engines.
It could be ship engines.
It could be aeroderivatives. It could be combined cycle, although combined cycle is not that great for behind the meter.
It could be bloom energy fuel cells.
It could be solar plus battery, right?
Like, it could be any of these things.
You're saying any of these individually could do like tens of gigawatts?
Any of these individually will do tens of gigawatts and in a whole they will do hundreds of gigawatts.
Okay.
So that alone should more than...
I mean, it's going to take, I mean, like, electrician wages probably double or triple again, right?
And like, there's going to be a lot of new people entering that field and there's going to be a ton of people who make money.
But it is something that I don't, like, I don't see that as the main bottleneck, right?
So right now in Abilene, the 1.2 gigawatt data center that Crusoe is building for OpenAI, I think they have like 5,000 people working there, or at peak they did. And if you turn that into 100 gigawatts, and I'm sure things will get more efficient over time, that would be like 400K people it would take to build 100 gigawatts.
And if you think about the U.S. labor force of how many electricians there are, how many construction workers there are,
Yeah, I guess there's like 800K electricians.
I don't know if they're all substitutable in this way.
There's millions of construction workers.
But if we're in a world where we're adding 200 gigawatts a year,
are we going to be crunched on labor eventually,
or do you think that is actually not a real constraint?
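For reference, the Abilene scaling in the question pencils out roughly like this, assuming worker counts per gigawatt stay constant (which, as noted, they probably won't):

```python
# Scaling the Abilene labor figure to a 100 GW buildout.
workers_abilene = 5_000   # ~peak headcount quoted for the site
gw_abilene = 1.2          # site size in gigawatts
target_gw = 100

workers_needed = workers_abilene / gw_abilene * target_gw
print(f"~{workers_needed/1e3:.0f}K construction workers for {target_gw} GW")
# Against ~800K US electricians and millions of construction workers:
# large, but not obviously impossible, hence wages doubling or tripling.
```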
So labor is a humongous constraint in this.
People have to be trained.
Likewise, we probably start importing the highest-skilled labor in this way, right? Because now it makes sense that, you know, hey, a really high-skilled electrician in Europe who was working on decommissioning power plants now comes to America and is building data centers, you know, high-voltage electricity, power moving across the data center, right? Something like this, right? Humanoid robots maybe start to help, or robotics at least start to, but the main factor for reducing the number of people is going to be modularizing things and making them in factories in Asia, unfortunately, at least for America, but, you know, Korea, Southeast Asia,
in many ways China as well. But these areas are going to ship more and more built-out sections of the data center, and those will be shipped in, right? Maybe today you currently ship servers in, or a rack in, and then you plug that into different pieces that you're shipping from different places. But now you'll integrate the entire thing at a factory: hey, maybe this is a two-megawatt block, and this block goes from, you know, high-voltage power down to the voltage, and maybe DC, that you deliver to the rack, instead of it being AC and high voltage, right? Or something like this, right? Or cooling: you ship a fully integrated thing that has a lot of the cooling subsystems already put together.
Or, because plumbers are also a big constraint here.
Or, furthermore, instead of just a single rack, where you have people wiring up all these racks of power and electricity and so on, you take a skid and you put an entire row of servers on it, and that is shipped from the factories. And today a single rack may be 120, 140 kilowatts, but as we get to, you know, next generation, Nvidia Kyber and things like that, it's almost a megawatt. And then in addition, if you do an entire row, it'll have the rack, it'll have the networking, and it'll have the cooling and the power racks all integrated together. So now when you come in, you actually have much less stuff to cable, whether it be networking fiber or power, right? There's fewer power things to connect, and then there's fewer plumbing things to connect, right? And so this can drastically reduce the number of people working in data centers, and therefore the capability to build these will be much larger. And along the way, you know, some people move faster to new things and some people move slower, right? Crusoe and Google have been talking a lot about this modularization, as have people like Meta and many others, and others are going to be slower to doing it. And people who move faster to new things may have more delays, while people who are slower have labor problems. So there will always be dislocations in the market, because this is a very complex supply chain. But at the end of the day, it's still simple enough that we will be able to solve it through capitalism and human ingenuity on the timescales that are required.
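One way to see why megawatt-class racks and factory-integrated rows shrink the on-site labor problem is just to count the discrete units per gigawatt; the rack figures are the ones mentioned above.

```python
# Fewer discrete units to cable, plumb, and power per gigawatt.
rack_kw_today = 130     # ~120-140 kW per rack today
rack_kw_next = 1_000    # ~1 MW for Kyber-class next-gen racks

for name, kw in (("today", rack_kw_today), ("next-gen", rack_kw_next)):
    print(f"{name:8s}: ~{1_000_000 / kw:,.0f} racks per GW to install")
# ~7,700 racks/GW today vs ~1,000 racks/GW at megawatt scale, before even
# counting pre-integrated rows shipped as factory-built skids.
```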
Yeah. Okay. So speaking of big problems to solve: Elon Musk is very bullish on space data centers. If you're right that power is not a constraint on Earth, that there will be enough gas turbines or whatever to build it on Earth, I think Elon's next argument then is that you can't get the permitting to build hundreds of gigawatts on Earth. Do you buy that argument?
Land-wise, America's big. Data centers don't take that much space. You can solve that.
Permitting-wise, air pollution permits are a challenge, but the Trump administration's made it much easier. You go to Texas and you can skip a lot of this red tape.
And so, you know, Elon had to deal with a lot of like this complex stuff in Memphis and then building a power plant across the border and all these things for Colossus 1 and 2.
But at the end of the day, there's a lot more you can get away with in the middle of Texas, right?
Given that Elon lives in Texas, why didn't he just go to Texas?
I think it was partially that they over-indexed on grid power for a temporary period of time, right?
Because that's just what they thought they needed more of.
You said it was an aluminum refinery connected to the grid there?
It was an appliance factory that was idled.
But I think they may have indexed more to what had grid power.
They may have indexed more to like water access and gas access because actually I think
they bought that knowing that the gas line was right there and they were going to tap it.
Same with water.
It was a whole host of different constraints.
It was probably an area where electricians and things like that were easier to find.
But at the end of the day, I'm not exactly sure why they chose that site.
I bet Elon would have chosen somewhere in Texas if he could have gone back. But yeah, regarding the regulatory challenges he's faced: ultimately permitting is a challenge, but America is a big place
and there are 50 states and things will get done. And there are a lot of small jurisdictions where
you can just transport in all the workers that you need for a temporary period of six months to a year, or even three months, depending on the type of contractor that's coming in, and put them in temporary housing, pay out the butt, because labor is very cheap relative to the GPUs and the power, or not the power, but the GPUs and the networking and so on and so forth, and the end value of the tokens it's going to produce.
So all of these things have plenty of room to like be paid for.
And so I think it's fine, right?
And also people are diversifying now, right?
Australia, Malaysia, Indonesia, India.
These are all places where data centers are going up at a much faster pace,
but currently still 70% plus of the AI data centers are in America,
and that continues to be the trend.
And so I think people are figuring out how to build these things and permitting.
Like, I just think ultimately permitting and red tape in the middle of nowhere, Texas, or the middle of nowhere, Wyoming, or the middle of nowhere, New Mexico, is probably a hell of a lot easier than sending stuff into space.
Right.
Well, other than the fact that the economic argument makes less sense once you consider that energy is a small fraction of the total cost of ownership of a data center, what are the other reasons you're skeptical?
Yeah, so obviously power is free in space, basically.
That's the reason to do it.
Yeah, that's the reason to do it.
But then there's all the other counter arguments, right?
Which is: even if power costs double, you're still at a fraction of the total cost of the GPU. The main challenge is what we've seen in the dispersion, right? We have ClusterMAX, which rates all the neoclouds, and we test them. We test over 40 cloud companies, including the hyperscalers and neoclouds. What differentiates some of these clouds the most, outside of software, is their ability to deploy and manage failure, right?
GPUs are horrendously unreliable.
Even today, 15% of Blackwells or so
that get deployed have to be RMA'd.
You have to take them out.
Maybe you just have to unplug them and plug them back in,
but sometimes you have to take them out
and ship them to Nvidia, or rather there are partners
who do these RMAs and such.
What do you make of Elon's argument that once you're past the initial phase, they actually don't fail that much?
Sure, but now you've done this, you've tested them all,
you deconstructed them, put them on a spaceship,
fucking put them into space,
and then put them online again, that's months, right?
And if your argument is that, you know,
hey, GPUs have a useful life of X years, right?
If a GPU has a useful life of five years
and it takes three additional months,
probably six, let's say six additional months,
then that is 10% of your cluster's useful life.
And because we're so capacity constrained,
that compute is most valuable, theoretically, in the first six months you have it, because we're more constrained now than in the future. That compute now can contribute to a better model in the future, or can contribute to revenue now, which you can use to raise more money, and all these sorts of things. Now is always the most important moment. And so you've delayed your compute deployment by six months, potentially. And the thing that separates these clouds is, we see clouds that take six months to deploy GPUs today on Earth, right? We see clouds that take a lot less than six months, right? And so the question is, where does space fit in there?
I don't see how you would test them all on Earth, deconstruct them, ship them and shoot them into space, and have it not take longer than just putting them in the spot where you were testing them.
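A sketch of that useful-life argument: the five-year life and six-month space-deployment penalty are the figures assumed in the exchange, and the monthly discount rate below is a made-up stand-in for "now matters more".

```python
# Deployment delay as a share of a cluster's (discounted) useful life.
useful_life_months = 5 * 12
extra_deploy_months = 6   # assumed space-deployment penalty

print(f"{extra_deploy_months / useful_life_months:.0%} of life lost, undiscounted")

r = 0.03  # assumed monthly decay in the marginal value of compute
def total_value(start):
    # Discounted value of each month of the cluster's life.
    return sum((1 + r) ** -t for t in range(start, start + useful_life_months))

loss = 1 - total_value(extra_deploy_months) / total_value(0)
print(f"value lost vs. immediate deployment: {loss:.0%}")
```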
Yeah.
So the question I wanted to ask is about the topology of space communication.
So right now, Starlink satellites talk to each other at 100 gigabits per second.
And you could imagine that being much higher with optical inter-satellite laser links that are optimized for this.
And that actually ends up being quite close to the InfiniBand bandwidth, which is like 400 gigabits a second, right?
But that's per GPU, not per rack.
I see, okay.
So multiply that by 72.
Also, like, that was Hopper when you go to Blackwell and Rubin,
that 2Xs and 2Xs again.
All right.
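Comparing the quoted link speeds directly, for a Hopper-class rack; the per-GPU and laser figures are the ones just mentioned.

```python
# Inter-satellite lasers vs. the scale-out bandwidth one rack uses.
laser_gbps = 100          # current Starlink-style inter-satellite link
per_gpu_ib_gbps = 400     # InfiniBand per GPU, Hopper generation
gpus_per_rack = 72        # one NVL72-style scale-up domain

rack_egress_gbps = per_gpu_ib_gbps * gpus_per_rack
print(f"rack scale-out egress: {rack_egress_gbps/1000:.1f} Tb/s "
      f"(~{rack_egress_gbps/laser_gbps:.0f} of today's laser links)")
# And per the exchange, that doubles again with Blackwell and Rubin.
```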
But how much communication is happening, like, during inference? Are the different scale-up domains still working together, or is it just happening as a batch within a single scale-up?
A lot of models fit within one scale up domain, but many times you split them across multiple scale up domains.
I think that you really have to, as models become more and more sparse, at least this is like the general trend,
then you want to ping just a couple experts per GPU.
And if leading models today have hundreds, if not thousands of experts, then you'd want to run this across hundreds of chips or thousands of chips,
even as we continue to advance into the future.
And so then you end up with this problem of,
well, now you need to connect all these satellites together comms-wise as well.
Okay, so that would be tough.
Because I was imagining: if there's a world where you could do inference for a batch on a single scale-up, then maybe it's more plausible. But if not, then it's much tougher.
Yeah, I mean, networking these chips together is a problem. And you can't just make the satellite infinitely large, right? Like, there are a lot of challenges with physics to making a satellite really big, right? So that's why you need these interconnects between the satellites. Those interconnects are more expensive, and in a cluster, like 15 or 20% of the cost is networking. All of a sudden, now you're making space lasers instead of pretty simple lasers that are manufactured in millions of volumes with, you know, pluggable transceivers.
And those things are very unreliable as well.
More unreliable than the GPUs, by the way.
Across the life of a cluster, you have to unplug and clean them all the time, right? Unplug, replug, just for random reasons.
These things are just not as reliable.
So you've got that problem as well.
Like, you've got a more expensive, complicated space laser to communicate instead of this pluggable optical transceiver that's been made in super high volume.
Okay, so all in all, what does that imply for space data centers?
So space data centers effectively aren't helped by, you know, hey, we have this energy advantage. They're actually just limited by the same contended resource.
We can only make 200 gigawatts of chips a year by the end of the decade.
So what are we going to do to get that capacity?
It doesn't matter if it's on land or in space.
It doesn't really matter, right?
Because you can build that power. And I think human capabilities and capacity could get to the point where we're adding a terawatt a year globally of various types of power.
At some point, we do cross the chasm where space data centers make sense, but it's not this decade, right?
It is much further out, once you have energy constraints actually being a big bottleneck, once you have space, land, and permitting being a much bigger bottleneck as AI subsumes more and more of the economy, and once chips are no longer the bottleneck. Because right now chips are the biggest bottleneck, and so you want them deployed working on AI the moment they're done being manufactured.
And so there's a lot of things people are doing to increase that speed faster and faster,
whether it be modularizing data centers or even modularizing racks, where you actually put the chip in at the data center, but only the chip, and everything else is already wired up and
ready to go at the data center. So there's things like this that people are doing to decrease that
time, things that you cannot do in space. And at the end of the day, all that matters in a chip-constrained world is getting these chips working on producing tokens ASAP. In a world, you know, maybe 2035, once the semiconductor industry, ASML and Zeiss and all these other suppliers, Lam Research, Applied Materials, the fab manufacturers, once the pendulum swings and they're able to make enough chips, and really we're optimizing every dial, then it makes sense to optimize the 10% or 15% of energy costs. Or, as we move to ASICs potentially and Nvidia's margins aren't 70-plus percent, maybe that energy cost is 30% of the cluster, and fab construction, data center construction, all this; these are the things to optimize. But that's not, you know, Elon doesn't win by doing, you know, 20% gains.
Elon never wins that way.
Elon wins when he swings for the fences and does 10x gains, right?
That's what SpaceX is about.
That's what Tesla was about.
That's what all of his success has been about, right?
It's never been about chasing the 20%.
So I think space data centers will eventually be a 10x gain, potentially, as Earth's resources get more and more contentious.
But that's not this decade.
Yeah.
I mean, I think just to drive some intuition about how much land there is on Earth: obviously the chips themselves, especially if we move to a world where you have racks that are a megawatt each, like, literally, it's not even a real factor.
That's the other thing, right? The power density, you know, if chips and manufacturing are the constraint: right now, roughly, it's one watt per millimeter squared for AI chips and such.
One easy way is to pump that to two watts per millimeter squared. Now, you may not get 2x the performance. You may only get 20% more performance, and that requires much more exotic cooling, right? It requires more complicated cold plates and very complicated liquid cooling, or maybe it requires things like immersion cooling. But in space, higher watts per millimeter is very difficult, whereas on Earth, these are solved problems. And one of these things enables you to get a lot more tokens. Maybe it's 20% more tokens per wafer that's manufactured. And that's a humongous win.
So that millimeter, you mean of die area?
Yeah, of die area. Square millimeters of die area.
I mean, it would be better for space, because if you can run more watts per millimeter, the chip runs hotter, and the hotter the chip... I guess this is a question of chip engineering, but it radiates heat proportional to temperature to the fourth power, by the Stefan-Boltzmann law. So if you can run a very hot chip...
No, no, but you can't run it hotter.
You can only run it denser.
And the problem is getting the heat out of that dense area means you have to move away
from standard air cooling and liquid cooling to more exotic forms of liquid cooling or even
immersion to get to higher power densities.
And that's more difficult in space than it is on Earth.
Yeah.
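For the Stefan-Boltzmann point in this exchange: in vacuum, heat leaves only by radiation, so the required radiator area falls with the fourth power of temperature, but chip junction temperatures cap how hot the cooling loop can run, which is the constraint being pointed at. A rough worked example, with the emissivity and the 1 MW heat load as assumptions:

```latex
% Radiative rejection in vacuum: P = \epsilon \sigma A T^4, so
\[
A = \frac{P}{\epsilon\,\sigma\,T^{4}}, \qquad
\sigma = 5.67\times10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}, \quad \epsilon = 0.9
\]
\[
T = 300\,\mathrm{K}:\quad
A \approx \frac{10^{6}}{0.9\cdot 5.67\times10^{-8}\cdot(300)^{4}}
\approx 2.4\times10^{3}\ \mathrm{m^{2}}
\]
\[
T = 350\,\mathrm{K}:\quad A \approx 1.3\times10^{3}\ \mathrm{m^{2}}
\]
```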
And maybe it's at this point worth explaining what exactly a scale-up is and what it looks
like for Nvidia versus Trainium versus TPUs.
Yeah, so earlier I was mentioning how communication within a chip is super fast. Communication between chips that are in the same rack is fast, but not as fast; it's on the order of terabytes a second. And then communication very far away is on the order of gigabytes, hundreds of gigabytes, right? So the order of magnitude drops as you get to further-distance compute; maybe across the country, it's on the order of gigabytes a second, right? The scale-up domain is this tight domain where the chips are communicating on the order of terabytes a second.
And so for Nvidia, previously this meant an H-100 server had eight GPUs,
and those eight GPUs could talk to each other at terabytes a second.
With Blackwell, NVL-72, they implemented rack scale up.
And that meant all 72 GPUs in the rack could connect to each other at terabytes-a-second speed.
And the speed doubled gen on gen, but also the most important innovation they did was going from
8 to 72 in the domain.
When we look at Google, their scale-up domain is completely different, right?
It has always been on the order of thousands, right? With TPU v4, they had pods the size of 4,000 chips. With v7, they have pods in the, sorry, 8,000, 9,000 range.
And what's relevant here is that it's not the same as Nvidia, it's not like-for-like.
Google has a topology that's a torus, right? So every chip connects to six neighbors, rather than Nvidia, where the 72 GPUs connect all-to-all, right? So they can send terabytes a second to any arbitrary other chip in that scale-up pod, whereas with Google, you have to bounce through chips, right?
So this means if TPU1 needs to talk to TPU 76, then it has to bounce through various chips.
And there is always some blocking of resources when you do that.
Because that one TPU is only connected to six other TPUs.
And so there's a difference in topology and bandwidth, and there are tradeoffs and advantages of both, right?
Google gets to have a massive scale-up domain, but then they have the trade-off of you have to
bounce across chips to get from one chip to another. You can only talk to six direct neighbors.
And so there is this trade-off. And Amazon has sort of mutated their scale-up domain. They're somewhere in between Nvidia and Google effectively, where they're trying to make larger scale-up domains. They try and do all-to-all to some extent with switches, which is what Nvidia does, but also to some extent they use torus topologies like Google does.
And as we advance forward to next generations,
all three of them are moving more and more towards a dragonfly topology,
which means there's sort of like there is some fully connected elements
and there are some elements that are not fully connected.
So you can get the scale up to be hundreds or thousands of chips,
but also have it not contend for resources when you're bouncing through chips.
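A minimal sketch contrasting the two topologies just described; the torus dimensions are illustrative, and the hop function is the standard wraparound distance.

```python
# Switched all-to-all (NVL72-style) vs. a 3D torus (TPU-style).
def torus_hops(a, b, dims=(8, 8, 8)):
    """Minimum hops between chips a and b on a wraparound 3D torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

src, dst = (0, 0, 0), (4, 7, 2)
print("all-to-all via switches: 1 hop between any pair")
print(f"torus {src} -> {dst}: {torus_hops(src, dst)} hops")
# Each extra hop transits an intermediate TPU's links: the "blocking of
# resources" trade-off described above.
```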
Related question.
I heard somebody make the claim that the reason parameter scaling has been slow, and only now are we getting bigger and bigger models from OpenAI and Anthropic, is that original GPT-4 was over a trillion parameters, and only now are models starting to approach that again. And I heard a theory that the reason is that Nvidia's scale-ups have just not had that much memory capacity.
And so what was the claim exactly?
If you have, say, a 5T model running at FP8, that's five terabytes.
Yeah.
And then you have the KV cache. Let's say it's the same size for one batch. So you need 10 terabytes to be able to run...
A single forward pass, yeah.
And then only with the GB200 NVL72 do you have an Nvidia scale-up that has 20 terabytes,
and before that they were much smaller.
Whereas Google, on the other hand,
has had these huge TPU pods that are not all to all,
but still have, I think, hundreds of terabytes
of capacity in a single scale-up.
So does that explain why parameter scaling has been slow?
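The arithmetic in the question, made explicit; the 5T/FP8 and KV-cache sizes are the quoted ones, and the per-GPU HBM figure is an assumption for a GB200-class rack.

```python
# Model footprint vs. the HBM available in one scale-up domain.
params = 5e12               # 5T parameters
bytes_per_param = 1         # FP8
weights_tb = params * bytes_per_param / 1e12
kv_cache_tb = weights_tb    # assumed comparable to weights, per the quote
total_tb = weights_tb + kv_cache_tb

nvl72_hbm_tb = 72 * 192 / 1000   # assuming ~192 GB HBM per GPU
h100_node_tb = 8 * 80 / 1000     # an 8-GPU H100 server

print(f"weights + KV cache: ~{total_tb:.0f} TB")
print(f"NVL72-class rack: ~{nvl72_hbm_tb:.1f} TB; H100 node: ~{h100_node_tb:.2f} TB")
# ~10 TB fits in a rack-scale domain (later rack generations reach the
# ~20 TB quoted above) but nowhere near an 8-GPU H100 scale-up.
```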
I think it's partially the capacity and bandwidth,
but also as you build a larger model,
the ability to deploy it is slower, right?
Like in terms of, like, hey, what is the inference speed for the end user?
That's kind of irrelevant.
What's really relevant is RL.
And what we've seen with these models and the allocation of compute at a lab is, there's a few main ways you can allocate compute. You can allocate it to inference, i.e. revenue; you can allocate it to development, i.e. making the next model; and you can allocate it to research. And in development specifically, you can split it between pre-training and RL, right? And so when you think about, hey, what exactly is happening? Well, the compute efficiency gains you get from research are so large, you actually want most of your compute to go to research, not to development, because, you know, all these researchers are generating new ideas, trying them out, testing them, and continuing to push the Pareto optimal curve of scaling laws further and further. And at least what we've seen empirically is, like, model cost at the same scale gets 10x cheaper every year, or even more than that, while reaching new frontiers costs the same amount or more, right?
So you don't want to allocate too many resources to pre-training and RL; you actually want to allocate most of your resources to research. And then in the middle is this sort of development period. If you pre-train a 5-trillion-parameter model, now you have to spend all this time: how many rollouts do you have to do in RL? And these rollouts for a 5-trillion-parameter model versus a trillion-parameter model are five times larger, which then means it takes longer if you want to do as many rollouts. Maybe the larger model is more sample efficient; let's say it's 2x more sample efficient. Okay, great. Now you need two and a half times as much RL time to get the model equally smart. Or you could RL the smaller model for 2x the time, and the small model, which is a trillion parameters and, although less sample efficient, is doing twice as many rollouts, is still done faster than the big model that's 2x more sample efficient but doing X number of rollouts. And so you get the model sooner, and you've done more RL, and then you can take that model to help you build the next models, help your engineers train, and do all these research ideas.
And so this feedback loop is actually weighted towards smaller models in every case, no matter what your hardware is.
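The rollout arithmetic he just walked through, in a few lines; the 5x cost ratio and the granted 2x sample efficiency are the assumptions from the conversation.

```python
# Big vs. small model: RL wall-clock for the same amount of learning.
small_cost, big_cost = 1.0, 5.0  # relative compute per rollout
big_sample_eff = 2.0             # assumed: big model learns 2x per rollout

rollouts_big = 1 / big_sample_eff   # to match the small model's learning
time_big = big_cost * rollouts_big  # = 2.5x the small model's time
print(f"big-model RL time for equal learning: {time_big:.1f}x the small model")
# So on equal compute the small model finishes RL sooner and gets deployed
# into research earlier; that's the compounding loop being described.
```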
And then as you look to Google, Google does deploy the largest production model of any of the major
labs, right, with Gemini Pro.
It is a larger model than GPT-5. It's a larger model than Opus. And so you end up with, yes, Google does this because they have a unipolar set of compute, right, almost all TPU. Whereas Anthropic is dealing with H100s, H200s, Blackwells, Trainiums, TPUs of various generations, right? And OpenAI is dealing with mostly Nvidia right now, but going towards having AMD and Trainium as well.
The fleets of compute, like Google can just optimize around a larger model,
And they can leverage 1,000 chips in a scale-up domain to get, you know, the RL speed much faster so that you can actually have this feedback loop be fast.
But at the end of the day, in isolation, you almost always want to go with a smaller model that gets RL'd faster and gets deployed into research and development.
Right.
So you can build the next thing and get more compute efficiency wins.
And then this compounding effect of, oh, I made a smaller model that I RL'd more, that I then deployed into research and development earlier,
and I spent less compute on the training itself
because I was able to allocate more compute
to the research.
This compounding effect of being able to do the research
faster and faster and faster is potentially a faster takeoff.
And that's what all these companies want: as fast a takeoff as possible.
Okay, spicy question.
You know, SemiAnalysis sells these spreadsheets, and you're always like, six months ago or a year ago we told people about the memory crunch, or now you're telling people about the clean room crunch, and then in the future the tool crunch.
Why is Leopold the only person that is using your spreadsheets to make outrageous money?
What is everybody else doing?
I think there are a lot of people making money in many ways.
I think obviously Leopold jokes that, you know, he's the only client of mine that tells me our numbers are too low.
Everyone else tells me our numbers are too high, almost ad nauseam, you know, whether it's a hyperscaler saying, hey, that other hyperscaler, their numbers are too high, and we're like, nah, that's right. And they're like, no, no, no, it's impossible, blah, blah, blah. And then you finally have to convince them, through all these facts and data, when we're working with hyperscalers or AI labs, that, in fact, no, that number isn't too high.
That's correct.
But eventually, sometimes it takes them six months or a year later to realize.
I think other clients, like on the trading side also use our data, right?
We sell data to a lot of people; you know, I think roughly 60% of my business is industry. So AI labs, data center companies, hyperscalers, semiconductor companies, the whole supply chain across AI infrastructure.
But then like 40% of our revenue is like hedge funds, right?
And, you know, I'm not going to comment on who our customers are,
but I think a lot of people use the data.
It's just how do you interpret it?
And then what do you like view as beyond it?
And I will say Leopold is pretty much the only person who tells me my numbers are too low always.
And sometimes he's too high.
Sometimes I'm too low, right?
But in general, I think other people are, you know, doing that.
And you can look across the space at hedge funds and look at their 13Fs and see, actually, they own maybe not exactly what Leopold does, because it's always a question
of like, what is the most constrained thing, what's the thing that's going to be, that's most
outside of expectations.
And that's what you're really trying to exploit is inefficiencies in the market.
And in a sense, what our data does is make the market more efficient by making the base data of what's happening more accurate. But in a sense, I think many, many funds do trade on information that is out there. And I don't think Leopold's the only person.
I think he has the most conviction about the AGI takeoff, though, right?
Right.
I mean, but the bets are not about, like, what happens in 2035. The bets that you're making, at least as exemplified by the public returns we can see for different funds, including Leopold's, are about what has happened in the last year. And the last-year stuff could be predicted using your spreadsheets, right? So it's less about that; it's about buying, like, the next year of spreadsheets.
They're not just spreadsheets. You know, there's reports, there's API access to the data, there's a lot of data. But anyways, you know, I think...
Do you see what I mean? Like, it's not about some crazy singularity thing. It's about, like, oh, do you buy the memory crunch?
A simple one, though, is that you only buy the memory crunch if you believe AI is going to take off in a huge way. And the memory crunch, a lot of it was predicated on, like, you know, at least for people in the Bay Area who think about infrastructure, it's obvious: the KV cache explodes as context lengths go longer, so you need more memory. And then you do the math,
and you also have to have a lot of supply chain understanding of like what fabs are being built
and what data centers are being built and how many chips and all these things. And so we,
we track all these different data sets like very tightly. But at the end of the day, it takes,
you know, someone to fully believe that this is going to happen. Like, I think a year ago,
if you told someone memory prices were going to quadruple and smartphone volumes were going to go down 40% over the year or two after that, people were like, you're crazy, that would never happen. Except a few people did believe that, and those people did trade memory, right?
And people did.
I don't think, like, Leopold was the only person buying, like, memory companies.
I think there were a lot of people buying memory companies.
He, of course, sized and positioned and did things in better ways than some, maybe most, right?
I don't want to comment on whose returns or what, but he certainly did well.
But other people also did really well, right?
Trying to be, like... wow, you've made me diplomatic for the first time ever.
No, no, you're fine.
I think it's hilarious, right?
I'm being a diplomat, you know,
whereas usually I'm like spicy.
Yeah.
Okay.
Maybe some rapid fire to close out.
Can TSMC, if you're saying, look, the memory, logic, et cetera, N3 is mostly going to be AI accelerators, but then there's N2, which is mostly Apple now, and then in the future, I guess, AI would also want to go on N2; can they kick out Apple if Nvidia and Amazon and Google say, hey, we're willing to pay a lot of money for N2 capacity?
So I think the challenge of this is chip design timelines take a long while.
And so that's more than a year.
And the designs that are on two nanometer are more than a year out.
And so what would really happen is, Apple, or sorry, Nvidia and all these others will be like, hey, we're going to prepay for the capacity, and you're going to expand it for us. And maybe TSMC takes a little bit of margin, but not a ton; they're not going to kick Apple out entirely, right?
What they're going to do is when Apple orders X, they may say, hey, we project you only need
Y or X minus one.
And so that's what we're going to give you is X minus one.
And then that flex capacity Apple's kind of screwed on.
Whereas traditionally, Apple has always over-ordered by like 10% and cut back by 10% over the course of the year. And some years they hit the entire 10%. Just, you know, volumes vary, right, based on the season, macro, blah, blah, blah.
And so I don't think TSM would kick out Apple.
I think Apple will become a smaller and smaller and smaller percentage of TSM's revenue
and therefore be less relevant for TSM to cater to their demands.
And TSM could eventually start saying, hey, you've got to pre-book your capacity for next year for two years out.
And you have to prepay for the CAPEX because that's what Nvidia and Amazon and Google are doing.
Yeah.
I wonder if it's worth going into specific numbers. I don't have any of them on hand: how many N2 wafers, or what percentage of N2, does Apple have its hands on over the coming years, versus AI?
Yeah, I mean, this year Apple has the majority of N2 that's going to get fabricated. There's a little bit from AMD; they are trying to make some AI chips and CPU chips early. There's a little bit. But for the most part, it's Apple. And as we go forward to the year after that, Apple still gets closer to like half of it as other people start ramping. But then it falls drastically, right? Just like for N3, they were half.
We'll see. And when I say N2, that includes A16, which is a variant of N2; over time, those nodes will be the majority. And what's also interesting is traditionally Apple's been the first to a process node. Two nanometer is actually the first time they're not. Well, besides Huawei, right? Huawei back in 2020 and before was first alongside Apple, but they were both making smartphones.
Now with two nanometer, you've got AMD trying to make a CPU and a GPU chiplet that they use advanced packaging to package together, in the same time frame as Apple.
And this is a big risk for AMD that potentially causes delays, because it's a brand new process technology. It's hard. But at the end of the day, this is a bet that they want to make
to scale faster than Nvidia and try and beat them. As we move forward, actually, when we
move to the A16 node, the first customer there is not even Apple. It's AI. And as we move forward,
that will become more and more prevalent: not only will Apple not be the first to a node, they will also not be the majority of the volume on the new node, and then they'll just be like any old customer. And because the scale of TSMC's capex keeps ballooning, but Apple's business is kind of not growing at the same pace, they become a less and less relevant customer.
And they also will just cut their orders
because things in the supply chain are kicking them out,
whether it be packaging or materials or DRAM or NAND,
these things are increasing in costs.
They can't pass on all the costs to customers, likely, because the consumer is not that strong. And you end up with this conundrum where Apple is just not TSMC's best bud like they have been historically.
Do you think if Huawei had access to three nanometer, they would have a better accelerator than Rubin?
Potentially, yeah.
I think Huawei, they were the first for the seven nanometer AI chip as well.
They were the first with a five nanometer mobile chip, but they were the first with a seven nanometer AI chip.
The Huawei Ascend was like two months before the TPU and like four months before Nvidia's, I want to say, was it V100 or A100?
A100, I think.
And so, you know, I mean, that's just moving to a process node. That doesn't imply software. It doesn't imply hardware design, all these other things.
But Huawei is arguably the only company in the world that has all the legs, right?
Huawei has cracked software engineers.
Huawei has cracked networking technologies.
That's in fact their biggest business historically, right?
And they have cracked AI talent.
But furthermore, beyond Nvidia, they actually have better AI researchers.
And furthermore beyond Nvidia, they have their own fabs.
And furthermore beyond Nvidia, they have their own end market of like selling tokens
and things like that.
And Huawei tends to be able to get the top, top, top talent. Nvidia as well, but not in as much concentration.
And Huawei has a bigger pool in China.
It's very arguable that Huawei, if they had TSMC, would be better than Nvidia.
And there are areas where China has advantages that Nvidia can't access as easily, right, around not just scale, but also some things around certain optical technologies China's actually really good at. So I think it's very reasonable that if Huawei had not been banned from using TSMC in 2019,
Huawei would have already eclipsed Apple as the biggest TSMC customer. Huawei has huge share in networking and compute and CPUs and all these things; they would have kept gaining share, and they'd likely be TSMC's biggest customer.
Wow, that's crazy.
I've got kind of a random final question for you. So the other part of the Elon interview was robots. And so if humanoids take off faster than people expect, if by 2030 there's millions of humanoids running around, which each need local compute, any thoughts on what that implies? What would be required for that?
You know, there's a lot of difficulties with the VLMs and all these things, the VLAs, that people are deploying on robots.
But to some extent you don't need to have all the intelligence in the robot.
And it would be much more efficient to not do that, right?
Because in the server, in cloud, you can batch process and all these things.
So what you may want to do is, hey, a lot of the planning and longer horizon tasks are determined by,
by a much more capable model in the cloud
that runs at very high batch sizes.
And then it pushes those directions to the robots
who then interpolate between each subsequent action
or is given like, hey, pick up that cup
and then the model on the robot can pick up the cup.
And as it's picking up, you know, things like weight and force may have to be determined by the model on the robot, but not everything needs to be, right?
Or, like, hey, that's a headphone. Actually, the model in the cloud knows these headphones are, you know, Sony XM6s, which is not a Dwarkesh ad spot, but, you know.
I'm like, why is this guy plugging this thing so hard? It's on the table. It's on his neck when we're interviewing Satya together.
Like, is he getting paid by Sony?
Unfortunately not.
Unfortunately not.
But anyways, like, you know, it might say, hey, the headband is soft.
And this is the weight of it and all these things.
And then the model on the robot can be less intelligent, take these inputs, and do the actions. And it may get told by the model in the cloud every second, or ten times a second, maybe; you know, it depends on the hertz of the action. But a lot of that can be offloaded to the
cloud, because otherwise: one, if you do all of the processing on the device, I believe it would be more
expensive because you can't batch. Two, you couldn't have as much intelligence as you do in the
cloud because the models will just be bigger in the cloud. And three, we're in a semiconductor
shortage world, and any robot you deploy needs leading edge chips because the power is really bad for
robots, right? You need it to be low power and efficient. And all of a sudden, you're taking power
and chips that would have been for AI data centers and you're putting them in robots.
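A minimal sketch of the cloud-planner/on-robot-policy split being described; every class and method name here is hypothetical, purely for illustration.

```python
import time

class CloudPlanner:
    """Stand-in for a large batched model in a data center (hypothetical)."""
    def plan(self, observation: str) -> str:
        # Slow, smart, amortized across huge batch sizes.
        return f"pick up the cup near {observation}"

class OnRobotPolicy:
    """Stand-in for a small local model doing high-rate control (hypothetical)."""
    def act(self, subgoal: str, sensors: dict) -> str:
        # Handles weight/force locally, interpolating between cloud subgoals.
        return f"executing '{subgoal}' with grip force {sensors['force']:.1f} N"

planner, policy = CloudPlanner(), OnRobotPolicy()
subgoal = planner.plan("the table")   # cloud plans at low rate, e.g. ~1 Hz
for step in range(3):                 # robot acts at high rate, e.g. ~100 Hz
    print(policy.act(subgoal, {"force": 2.0 + 0.1 * step}))
    time.sleep(0.01)
```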
Yeah. So now that 200 gigawatts gets lower if you're deploying millions of humanoids. I think this is very interesting, because something people might not appreciate about the future is how centralized, in a physical sense, intelligence will be. Right now with humans, there's eight billion humans and their compute is in their heads, on their person. And in a future, even with robots that are out physically in the world...
I mean, obviously knowledge work will be done in a centralized way from data centers with
huge, like hundreds of thousands of instances or maybe millions of instances.
But even for robotics, the future you're suggesting is one where there's like more centralized
thinking and centralized computation that's driving, you know, millions of robots out in the world.
And so I think that's just like, yeah, there's an interesting fact about the future that I think
people might not appreciate.
I think Elon recognizes this, which is why he's going to different places for his chips, right? He signed this massive deal with Samsung to make his robot chips in Texas because, you know, I personally think he thinks that Taiwan risk is huge. And because of that and the centralization of resources in Taiwan, he's having his robot chips made in Texas, in a separate supply chain that is not as constrained. No one's really making AI chips on Samsung besides Nvidia's new LPU that they're launching.
They're launching it next week, but we're recording it the week before.
It's coming out this week.
This episode's coming out Friday.
Oh, this episode's coming out before.
Sick.
So they're launching this new AI chip next week, which is built on Samsung, but that's sort of a recent development from Nvidia. And that's the only other AI demand there, whereas on TSMC, everything is competing. So he gets both geopolitical diversification and supply chain diversity for his robots. And he's not competing as much with the effectively infinite willingness to pay of the data center geniuses.
Okay.
Final question.
On Taiwan, if we believe that tools are the ultimate bottleneck, how much of Taiwan's place in the AI semiconductor supply chain could we de-risk simply by having a plan to airlift every single TSMC process engineer out if things come to that, if they get blockaded or something? Or do you actually still need to ship out the EUV tools, which would be multiple plane loads per single tool and would not be practical?
If you ship out all the process engineers, and assuming it's hot enough that you destroy the fabs, now no one has the fabs that were in Taiwan, which is a big risk, right? You know, these tools actually use a lot of semiconductors, which are manufactured in Taiwan. So it's like a snake-eating-its-own-tail sort of meme, because you can't make the tools without the chips from Taiwan, which you can't make without the tools in Taiwan. There's obviously some diversification there, and they don't use super advanced chips in lithography tools, but at the end of the day, there is some of this tail-eating.
Just shipping out all the engineers and blowing up the fabs
means China has a stronger semiconductor supply chain
than the rest of the world, right?
In terms of verticalization, now that you've removed Taiwan.
And now you've got all the know-how,
but you've got to replicate it in, let's say, Arizona,
or wherever for TSM.
And it's going to take a long time
to build all the capacity
that TSM has had built over the years.
And so you've drastically slowed U.S. and global GDP; not just growth, you've shrunk GDP massively.
And you've got a lot bigger problems, and your incremental ability to add compute goes to almost zero, right? Instead of hundreds of gigawatts a year by the end of the decade, let's say by the end of the decade something happens to Taiwan, now you're at maybe 10 or 20 gigawatts across Intel and Samsung. It's like nothing.
Right.
But now all of a sudden you've really caused some crazy dynamics in AI.
Of course, you have all the existing capacity, but that existing capacity pales in comparison
to the capacity that's being expanded.
Yeah.
Okay, Dylan, that was excellent.
Thank you so much for coming on the podcast.
Thank you for having me and see you tonight.
Yes.
