@HPC Podcast Archives - OrionX.net - @HPCpodcast-100: Dr. Ian Cutress on the State of AI and Advanced Chips – In Depth
Episode Date: March 12, 2025
Just before the GTC25 conference, and in the 100th episode of the full-format @HPCpodcast, we welcome a very special guest, the great Dr. Ian Cutress, Chief Analyst at More Than Moore and host of the popular video channel TechTechPotato, to discuss the state of AI and advanced chips, new technologies and architectures, the startup scene, and top trends in semiconductor design & manufacturing. Join us! [audio mp3="https://orionx.net/wp-content/uploads/2025/03/100@HPCpodcast_ID_Dr-Ian-Cutress_State-of-AI-Advanced-Chips_20250312.mp3"][/audio]
Transcript
Headed to NVIDIA GTC?
Visit CoolIT Systems at booth 1109 to see how they've been advancing liquid cooling with NVIDIA from 2009 to today and beyond.
Ensure performance and reliability for your next-gen AI systems with the world leader in liquid cooling.
Learn more at CoolITSystems.com.
Last year we had $3 billion of new investment in those startups.
If you want a GPT-3 class model today, the inference cost is 1200 times less than it was a year ago.
Everybody wants a finger in every pie just in case. Like I said, there's analog computing, there's optical computing,
neuromorphic computing, CGRAs, quantum,
adiabatic computing, biological computing.
All of these are in one state or another.
Is the future of the machine learning market
GPU or ASIC?
Because right now we have a lot of the cloud service providers
building their own silicon.
From OrionX in association with InsideHPC,
this is the At HPC podcast.
Join Shaheen Khan and Doug Black as they discuss
supercomputing technologies and the applications,
markets, and policies that shape them.
Thank you for being with us.
Hi, everyone.
This is the At HPC podcast.
I'm Doug Black at Inside HPC.
With me, of course, is Shaheen Khan of OrionX.net.
And our special guest today is Dr. Ian Cutress.
He is Chief Analyst at More Than Moore,
a consulting firm that works with semiconductor companies.
He's also the host of the popular YouTube channel TechTechPotato. Prior to launching More Than Moore, Ian was for 11 years a senior editor at AnandTech, a semiconductor technology review and news site. Ian, welcome.
Thanks, Doug. Thanks, Shaheen. Great to be here.
Yeah, thanks for being here. I have to say that when you were at AnandTech, you were one of the must-read folks that I always tracked. And I was delighted when you went to More Than Moore. We're going to have to talk about Moore's Law, I'm sure. But anyway, delighted
that you're here. Thank you for making it work.
It's been almost three years to the day.
Wow. Congratulations.
Thank you.
Thank you very much. Well done.
So we understand you were just at the big ISSCC conference and there was a lot of discussion
and news coming out of Intel and TSMC about the future of process node technology. What
were some of the trends and news that you picked up there?
Yeah. So ISSCC, the International Solid-State Circuits Conference, is held every year in downtown San Francisco, a chance for industry and academics to show off the latest and greatest in circuit design, as opposed to IEDM, held in December, which is devices.
This year, as is usually the case, the conference as a whole got bigger; these conferences seem to get bigger and bigger each year. Almost 3,000 people, 29 tracks, I think, in the end, over four days. Lots and lots of people discussing analog and SerDes connectivity: how to get it lower power, how to get it faster, or both, or on newer process nodes.
I think the highlight for me was the section entitled SRAM.
Now in my day to day, one of the things I am keenly tracking is how process nodes compete
against each other.
We've got TSMC that have been on a tear in recent years, providing everybody with the
latest and greatest.
Intel, the former incumbent now playing catch-up.
Samsung exists somewhere in the middle, and then we have GlobalFoundries and a few other
players.
But it's really Intel versus TSMC. Intel's previous CEO, Pat Gelsinger, with his famous 'five nodes in four years,' bet the company on 18A, the next-generation process node that's due to enter high-volume manufacturing at the end of this year. There were two papers, one from TSMC and one from Intel, directly one after the other, about their newest process node technology.
So for Intel, that's called 18A.
A means Angstrom, but as I'm sure your listeners know, these aren't real dimensions.
These are just proper nouns, node names.
So that's Intel's 18A versus TSMC N2 or 2 nanometer class design.
Now one of the big discussion points was whether Intel's 18A was gonna catch up
with the performance of TSMC N2,
and we actually got a good hint of that at the conference.
What the two papers showed in this session
is that they were comparing SRAM densities,
so L1, L2, L3 caches on die SRAM densities
for high density and high performance designs.
And to the fourth decimal place, it turns out they're identical.
Wow, that's new.
So both companies were showing off 38.1 megabit per square millimeter density.
And just for comparison, seven nanometer is around about the 20 to 22 megabit per square
millimeter. So despite what everybody is saying about the difference between logic and memory scaling as we advance to new process nodes, according to TSMC, Moore's law is still alive.
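A quick back-of-the-envelope check on those density figures (an editor's sketch using only the numbers quoted above; the 7 nm value is the approximate range mentioned, not an exact spec):

```python
# Rough SRAM density comparison from the figures quoted in the conversation.
density_2nm_class = 38.1            # Mbit/mm^2, reported by both Intel 18A and TSMC N2
density_7nm_class = (20 + 22) / 2   # Mbit/mm^2, midpoint of the quoted 7 nm range

scaling = density_2nm_class / density_7nm_class
print(f"SRAM density gain, 7nm-class -> 2nm-class: {scaling:.2f}x")  # ~1.81x
```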
They've been saying that for a few years.
Yeah. Compared to Jensen who says Moore's law is dead.
Well, I mean, Moore's law was supposed to be about transistor density on a two-dimensional
surface. That does seem to have plateaued if not stopped. And I think just by going
two and a half D, three D, that's how they get more transistors, but it's no longer per
square inch. It's more like per cubic inch. Isn't that true?
Kind of. In this case, these two papers at the conference were specifically talking about a two-dimensional fixed area and an increase in density in SRAM, which historically hasn't scaled, or at least people have promoted it as not scaling as well as it should have done.
But yeah, both companies were showing off some ways in which they've enabled this SRAM.
The key difference between the two is that Intel with its 18A process node
is introducing its PowerVia technology. So this is the technique known as backside power
delivery. Both companies are enabling gate all around transistors, but backside power
delivery will be an Intel only design when they launch 18A. So that was a major difference
between the two. Intel is technically ahead on the technology, even
if both densities are the same.
So Ian, it sounds, I would say my quick impression is you're pretty impressed with what Intel's
doing there.
I am somebody who loves competition. Competition breeds innovation and you put people in a
box and they will innovate, innovate, innovate. The fact that both companies are now squarely equal in their densities for their high density
designs, we just have to wait and make sure that the rest of Intel's process node is similarly
competitive.
That's been the bulk of the post-conference discussion, after we've gone through the nitty-gritty of these two papers: whether the rest of Intel's process node can deliver the next generation of chips, because Intel is promising to come to market about a year sooner with its 18A versus TSMC's N2.
Do you have thoughts on how this comes to bear on this widely rumored talk of breaking
the company up, acquisition of the foundry business, and so forth?
Yeah, I've had for a long time now,
and I know Shaheen has heard this a few times.
Intel can't spin off its foundries right now.
I mean, they could do a partial IPO
to gain some value out of that side of the business,
but they can't split.
And there's a very distinct reason.
Most of Intel's foundries right now
build Intel 14 nanometer, 10 nanometer,
and 7 nanometer chips. In order to build those at Intel, the design tools to design the chips,
typically we refer to this in the industry as the EDA tools, the electronic design automation tools,
rather than using the industry-standard Synopsys and Cadence tools, which everybody else uses,
Intel uses their own for 14, 10, and 7.
And those tools aren't available externally to Foundry customers.
That means in order to keep those fabs up and running and producing chips and profitable,
they can only process Intel's own designs.
So if you split and, say, have a wafer agreement between foundry and product, the value there is next to nil, because if product suddenly didn't want to use 14, 10, and 7, sure, there'd be a wafer agreement to get money back. But then you'd end up with a foundry company that can't produce anything for anyone else either. That's unlike AMD's split with GlobalFoundries; GlobalFoundries very quickly transitioned to industry-standard mechanisms, and they had most of that in place anyway.
Intel's latest process nodes, Intel 3 and Intel 18A, are fully, essentially 99.99 percent, industry standard, so those are open for foundry. But only one or two of Intel's fabs worldwide can manufacture that technology. So the split would only happen post-2027, because that's when we expect an industry-standard design kit for 14, 10, and 7 to be made available. Until then, I don't think the foundry has any value externally.
Yeah, I think they're pretty baked.
Ian, would you be comfortable talking about the departure of Gelsinger? What led to it?
Yeah, I can speak to this, though I should stress that this is mostly personal opinion. I have almost no additional insight beyond what's already out there, if anybody's been following it that closely.
It did seem immediate. I mean, I've known companies who have changed CEOs and internally they've had a day's notice.
I'm not sure that most of Intel had even a day's notice at this point.
When Pat left, I saw the email and I instantly got up and went and live streamed about that
with my colleague, George Cosma from Chips and Cheese, because it's such a monumental
event. He was installed post-Bob
Swan after what was essentially a search that yielded no CEO. I'm under the impression he
was actually offered it after Brian Krzanich left and he said, well, I need to do this.
And here's his five-year plan, his manifesto, and the board said no, though two years later,
they came crawling back, it seems, to say yes.
And Pat was under the impression that he would be able to see out his plan.
Pat is Intel through and through.
He bleeds blue.
Started there at age 18, worked his way all the way up through the company to reach CTO on the product side, helped design USB and Itanium and a number of other things.
So the fact that he left in an instant, in the same way that Lip-Bu Tan of Walden International had left the board almost unexpectedly three months prior, makes me feel like in some way those may be connected.
And here's my conspiracy theory if you'll allow me to indulge.
Please go ahead.
So there is a suggestion, and this hasn't been confirmed publicly by Intel, not by any stretch, that the chairman of the board, Frank Yeary, may be going around Washington, DC, trying to, quote unquote, offload, as you said, Doug, to other companies, or extract value from Foundry and products to other people.
If this is true, I wonder if Frank was doing
that back in August and that's what caused Lip-Bu to leave. Then in September we had a
major board meeting and then come December it gets too much for Pat and Pat says, I don't
want you to split it up. And Frank turns around and says, tough, we need to do this to save
the company. And so Pat says, well, I'm out of here then.
Yeah, that was more or less my guess as well, also not having any info from any source at all.
It feels like he wasn't allowed to execute his vision the way he needed to.
But he was consistent in saying, it's going to take five years and like three
years in, they lost appetite.
Also, the money was starting to run out.
That's also another factor.
But the proof will be in the pudding.
When Intel 18A hits the shelves, the first product
we believe will be Intel's Panther Lake,
which will ramp through the second half of 25.
Then we'll have Intel's Clearwater Forest,
which is a Xeon based product,
e-core based product coming out the first quarter of 2026.
They've announced that these will be the first real-world examples of whether the process node that Pat Gelsinger bet the company on is competitive against the competition.
And there are a lot in the industry who appreciated Pat Gelsinger in his tough position and want to see him proven right.
But how's it different now?
What are they doing now that they couldn't do under Pat?
I don't know, to be honest,
because what they're doing now
feels almost like a holding pattern.
Now they have two interim co-CEOs: Michelle Johnston Holthaus, who's also CEO of Products, and David Zinsner, who's also the CFO.
I know both of them.
Michelle's great.
Both well regarded, yeah.
Yeah, Michelle knows and understands Product almost better than anyone, given how she's come up through the organization. Dave has been criticized for sounding as if he's overworked, but whenever I've had discussions with him during analyst calls, he's always very open. It feels to me he doesn't hide anything when we have those discussions. So while Intel is in good hands, they do need a CEO to right the ship. One question that has been asked is, if the chairman of the board
really is going around trying to partition bits of Intel off, can he do that without
a permanent CEO in place? And I've not found an answer to that.
Well, certainly less drama without one.
That's true. There was a rumor going around that GlobalFoundries CEO, Dr. Tom Caulfield,
might take that position.
He took himself out, right?
Well, he was stepping down from the GlobalFoundries CEO role. I think that takes effect the 1st of April or the end of April. He's moving to chairman of the board, which GlobalFoundries have told me is going to be his full-time role. There is also a stock situation.
Some people noticed that almost $200 million of shares, $176 million, was sold around the time of GlobalFoundries' announcement.
And that is roughly the same amount as Gelsinger's compensation package when he took over the
job. So people are putting two and
two together and getting somewhere between pi and five.
Is it fair to ask five years from now, your sense of Intel and overall picture?
Best case Intel product and Intel Foundry are competitive. TSMC is likely still going
to be the major Foundry leader, but people will have more
confidence in Intel Foundry.
People already have confidence in Intel packaging, advanced packaging.
The question is, can Intel bring that to Foundry customers on a competitive note?
Worst case, we get a split and everything gets mothballed.
It has been suggested that if Intel were to split, or essentially offer a partial IPO of different business units, like what they're doing with Altera and Mobileye, but instead do it for Foundry and Product, nobody's going to buy a Broadcom Core Ultra 9 285K. Moreover, Broadcom wouldn't want a consumer product line, so they would mothball certain product lines even if they are the cash cows. It's that sort of rhetoric that's echoing in these circles right now.
Though, I guess when this podcast goes live, that may have died down a bit if there's no
added fuel to the fire.
What about their 14A fab?
So they are building out some of the fabs to bring high volume to 18A and 14A. 14A is expected to offer a high-NA EUV variant, so using the latest
generation technology from ASML. Right now there have been no serious
announcements or roadmaps on the future of that process node, just that they're going
to be offering differentiated versions of both 18A and 14A and then whatever comes beyond.
I guess one question to ask is, will they keep the same rate of pace of updates in process
node technology?
To be honest, it's hard to do it as fast as they're
doing it. You need the people in place, you need them in there long term, you need consistency.
Kudos on the 18A development. Any new CEO that comes in, I'm not going to say is going to have
to stay in top gear, but is at least going to have to keep their foot down in a lot of aspects
in order to remain competitive for as long as they can. Are we going to see their glass substrate?
Glass substrate, I hope so. You liked it. I know when they announced that that was a
pretty cool thing. So the things that get me up in the morning in this industry is when you can
showcase something like glass core substrate. And I know that there's so much research that goes behind it that
they don't want to talk about it yet.
And I want to tease every little bit of information out of them.
They also talked about this directed self-assembly thing.
I don't know whether that was part of 18 or 14.
Yeah.
DSA has been in the literature for a while.
It gets more complex as you go to smaller process nodes and how you implement it.
Again, it's going to be one of those, you know, gen-on-gen improvements that, if it works and you can do it at scale, provides a benefit to performance or yield or power, you know, just like high-k and strained silicon and moving to FinFET have done back in the day.
Every process node right now, you know, has lots and lots of little
enhancements like that to improve it on the line.
The thing that didn't help Intel during its 10 nanometer run is that
they tried to do too much at once.
TSMC right now is the king of doing it small steps at a time.
And eventually you do enough small steps, you get a big step.
Intel is now following that with its process node flow.
So Ian, we're moving toward GTC, AI and GPUs are, you know, I mean, the big thing going
on these days, maybe sticking with Intel, what's your overall sense of Intel in GPU
technology?
They had Falcon Shores change status recently. Are they righting the ship on GPUs? Is this more of the same? What do you think of the direction they're going?
Yeah, I think there's a delineation here,
the difference between GPUs and AI. Because of Nvidia and AMD, we kind of lump it all into the same bucket.
Intel's past is littered with the corpses of high-end performance silicon, projects
that it's cancelled over time. Falcon Shores, even though it's relegated to test chip status, becomes another one of those announcements, so people look ahead towards their next generation beyond Falcon Shores and whether it's at the right time.
If we specifically speak about the AI business, last year, I think in the last 12 months,
Nvidia has made somewhere close to $100 billion on AI silicon as part of that business.
AMD has made over $5 billion in that business.
Intel last year failed to reach half a billion.
So less than 10% of the revenue of AMD on the AI Silicon through a mixture of
delays or design or inoperability between software packages or just not having a
good proposition or just fighting against
an extremely strong incumbent, whichever way you look at it.
Intel has acquired three AI hardware companies over the years, two for enterprise, one for
consumer.
The consumer one is still going and that's in their consumer products and that's working
fine.
But on the enterprise side, between Nervana, the first acquisition, and Habana, the second one, I made a recommendation in Intel's Q3 financials
last year that given the state the company is in, in the same way that Intel
kind of lost smartphone, maybe they've lost AI in this case, and given how much
they've invested, you know, it's sunk cost at this point.
Maybe it's a case of let's stay away from the hardware, focus on the software,
and then at a point where we become more financially stable on our feet,
then we acquire somebody worth acquiring.
With a technology that's more integrated into the community,
there are other experts who can do this.
Intel disagrees.
That's why Gaudi 3 is ramping, though the appetite for Intel's AI business is relatively
low. They have some installations in the thousands of GPUs, thousands of chips, but yeah, how
are they going to continue that momentum as AMD iterates through MI350, MI400,
as Nvidia iterates through Blackwell, Blackwell Ultra, Rubin, Rubin Ultra, it's unclear right
now.
That being said, Intel's desktop GPU business for gamers is going great.
They've always been strong there, have they not?
An integrated GPU, I mean.
They have hundreds of millions of integrated units, but they had issues when scaling up that design into a more desktop class, sort of in the 150 to 300 watt range, just for graphics.
They did launch their Battlemage GPUs for that market in December last year
at a price point that made them sell out. For the performance per dollar, it's a price point that AMD and Nvidia just aren't focusing on right now. So they're going great guns and that got a great response. Whether they've made any money from it, given the multi-year investment it's taken, it's unclear right now.
So AMD has a 5% market share basically in GPUs for enterprise AI.
Is that a fair, am I capturing that correctly?
Yeah, it's about that.
I did see some analysts claim that that is actually a reduction over the last couple
of quarters simply because Nvidia's sales have been sky high.
They're growing so fast, right?
Yeah, I think I did the math on that. Last year, in '24, Nvidia sold 6 million AI GPUs,
AMD sold about 200,000 max.
Yeah, that's a bigger ratio than the aggregate numbers last year.
Yes.
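For the curious, here is the arithmetic behind that exchange (an editor's sketch using only the round figures quoted in the conversation):

```python
# Units and revenue quoted above (all approximate, calendar 2024).
nvidia_units, amd_units = 6_000_000, 200_000   # AI GPUs sold
nvidia_rev, amd_rev = 100e9, 5e9               # AI silicon revenue, USD

print(f"Unit ratio:    {nvidia_units / amd_units:.0f}x")   # 30x
print(f"Revenue ratio: {nvidia_rev / amd_rev:.0f}x")       # 20x
# The unit ratio (30x) is indeed larger than the revenue ratio (20x).
```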
Yeah. So what do you expect at GTC?
Should we expect the 500 kilowatt rack?
Yeah.
The big thing last year at GTC, wasn't it, was that they announced the 120 kilowatt Blackwell rack.
And everybody turned around and said, well, hang on.
My colo does maximum 12 kilowatts per rack, or 25 if it's AI, 50 if it's a new build. How the hell are you going to do 120 with one design? The thing is, it's been a year since then. We expected Blackwell and the NVL72 to ship four or five months ago. However, it's only in this past week
NVL 72 to ship four months ago, five months ago. However, it's only this week in this past week
that HPE's Trish Damkroger has said that they're finally
shipping NVL72.
We know that they've had some delays in the packaging and in the thermals, and TSMC has even admitted this, in getting that product out the door, essentially. Supermicro also confirmed that they're now shipping.
Yeah, I've seen it this past month.
I think Supermicro and Dell were a little bit ahead of HPE,
perhaps.
But it's all been within the past month, let's say.
So yeah, and the next one on that roadmap
is obviously Blackwell Ultra, the higher memory capacity
version.
That's expected to increase the power per rack as well.
Though what's interesting, and I wonder if Nvidia will talk about this at all, or whether
we're still a year away, is that our friend Dylan Patel at SemiAnalysis posted numbers for how much power a Rubin rack and a Rubin Ultra rack would draw. I believe the numbers were around 303 kilowatts a rack for Rubin and 819 kilowatts for a Rubin Ultra rack.
I see. Okay. So on average 500.
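As a sanity check on that quip, and on why the colo limits mentioned earlier are such a problem (an editor's sketch using only the per-rack figures quoted in the conversation):

```python
# Per-rack power figures quoted in the conversation, in kilowatts.
racks = {"Blackwell NVL72": 120, "Rubin (reported)": 303, "Rubin Ultra (reported)": 819}
typical_colo_kw = 12  # the 'my colo does maximum 12 kilowatts per rack' figure

avg = (racks["Rubin (reported)"] + racks["Rubin Ultra (reported)"]) / 2
print(f"Average of the two Rubin figures: {avg:.0f} kW")  # ~561 kW, i.e. 'on average 500'

for name, kw in racks.items():
    print(f"{name}: {kw / typical_colo_kw:.0f}x a 12 kW colo rack")
```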
It's I mean, Jensen was right last year when he said if you're going to go build an AI
factory today, you're going to populate it with what we have available. And next year,
you'll just buy some more of that. And the year after you buy some more of that, you'll still keep the stuff you have.
However, when it comes to building out the infrastructure, we know that Microsoft and
Meta and Elon and his companies and all the other CSPs, they're looking for power, they're
looking for space.
I believe at one point Meta mothballed a new data center that they were in the process
of building because the architecture of the data center was already too old by the time
the data center was half built.
Interesting.
So, in order to optimize for power and data and throughput and all the other metrics that
data centers need, they decided to pull the building down and start building a new one more designed for the compute of the future.
So do you buy this notion that in a post-Moore's Law world, you just stay with the gear that
you bought and you're not going to salivate over the latest, greatest, even though it
might have more of this and less of that?
Well, I think a limiting factor is can you even get hold of the latest and greatest?
The big companies right now are committing enormous sums: Microsoft is at $80 billion of CapEx, other CSPs are approaching that number, and China's doing what China does, which is another fly in the ointment in terms of what's happening over there. The lead times for Nvidia hardware right now, the latest I heard, are around 52 to 60 weeks.
Yeah, that's for Blackwell.
Yeah, but if you want an AMD system, it's half that.
If you want an Intel system, you can get one today.
The question is, is the market moving quickly enough, and can companies be agile enough to leverage the benefits of this silicon, or will they sit on their hands and wait? And FOMO is massive right now; nobody wants to sit and wait.
Right, but I mean, that's one side, which I agree with, and we see that as well. But there's also the notion of, at some point, where is the beef? Is this stuff making me money? Because presumably AI really isn't making money yet. I think Satya was quoted as striking some kind of a cautionary tone anyway. How do you see that? Is this just going to be FOMO forever?
Yeah, I hear what you're saying. And it's a question that investors ask me from time
to time as well. Does AI as a service really exist?
And is it just a race to the bottom in terms of costs, but then also a race to the top when it comes to compute, with agentic workflows, with larger models and such?
And the way I frame it is in two factors, the main two ways of generating revenue.
One is simply scale.
I mean, we're taking research out of the equation here and just looking at the revenue of what's available.
If you look at OpenAI and their subscription model, it is making money, but the company
is spending so much money in research, it looks like they're not making anything. But
they're making billions of dollars with their infrastructure.
Obviously they've had to build out billions and billions, at least, to get there. But at a point over time, just in the same way as, I guess, the Uber effect, you come in, you undercut the competition, you provide a better service, and then when everything else disappears, you're the only game left in town.
Yeah, it used to be called dumping.
Yeah.
But scale is a wonderful thing when you have a fixed cost enterprise because you
can just leverage your margins and just keep growing and growing.
The other side of the coin is highly focused and targeted AI and machine learning.
And the one I usually quote for this is the consulting method: you go to one of your clients as a consultant and improve their productivity, either by directly increasing throughput or by reducing baseline costs. Say you save a company $5 million in HR, for example. They don't
care whether you're charging them $20 per million tokens or $2,000 per million
tokens.
All they see at the end of the day is they've got a baseline savings of X million dollars.
So in reality, as a consulting company providing that infrastructure and service, you can manufacture those tokens at, say, $4 per million, but then sell them at $2,000 per million tokens, and you've got a massive upside to your business. This is something that IBM and other consulting companies like that are doing really well at.
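A minimal sketch of that margin math, with a hypothetical token volume (the $4 and $2,000 figures are from the conversation; the 1-billion-token volume is invented purely for illustration):

```python
# Hypothetical engagement: the client consumes 1 billion tokens.
token_volume_millions = 1_000   # 1B tokens, in million-token units

cost_per_million = 4            # USD to 'manufacture' a million tokens
price_per_million = 2_000       # USD charged per million tokens

cost = token_volume_millions * cost_per_million      # $4,000
revenue = token_volume_millions * price_per_million  # $2,000,000
margin = (revenue - cost) / revenue

print(f"Cost: ${cost:,}, Revenue: ${revenue:,}, Gross margin: {margin:.1%}")
# The client only sees the multi-million-dollar baseline savings,
# so the per-token markup is effectively invisible to them.
```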
They've got books of revenue in the billions for providing services like these in a very high-margin environment. Now, you have to do the R&D to get there, to keep your costs low, to give you the value add. I think I saw a statistic that if you want a GPT-3 class model today, the inference cost is 1200 times less than it was a year ago.
Wow. Is that software or hardware or both?
So the models are better.
That's a big part of that.
So GPT-3 was what, a 400 billion parameter model? Or they never released a number, but it was up in the hundreds of billions. You can now
get the same performance in certain benchmarks
with a 2 billion parameter model.
So your compute is lower, but also the efficiency cost of your compute is better. So it becomes a multiplicative factor.
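The conversation doesn't give the split between those two factors, but a purely illustrative decomposition, assuming the publicly reported 175-billion-parameter figure for GPT-3, looks like this:

```python
# Illustrative decomposition of the '1200 times less' claim.
total_gain = 1200

gpt3_params = 175e9   # publicly reported GPT-3 parameter count
small_params = 2e9    # the ~2B-parameter class mentioned above

model_factor = gpt3_params / small_params  # ~87.5x from model size alone
remaining = total_gain / model_factor      # ~13.7x left for hardware + software

print(f"From smaller models:          {model_factor:.1f}x")
print(f"From hardware/software gains: {remaining:.1f}x")
# Multiply the two factors and you recover the full 1200x;
# hence 'a multiplicative factor'.
```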
And the thing is, what's going to 1200x in the next 12 months? Is it going to be something like a DeepSeek? Is it going to be something in hardware? Or are we going to have a new software revolution?
Right now it's unclear. I thought of an analogy the other day that I might use in future content.
Back in the eighties, all we had was ASCII gaming. You know, can you do this
little cursor on the screen? Nobody imagined the immersive VR experiences we have today, which came 40 years later. So what's the 40-year horizon for machine
learning in that context?
Oh, yeah, for sure. But I mean, even that sort of a gain within 12 months
is enough to keep the FOMO mood alive.
Oh, of course.
Yes.
If you built a product infrastructure based on GPT-3,
then you could easily be undercut by a new entrant today, unless you also did the transition
and ensured that you kept consistency.
Part of this is if you build a model and a business based on machine learning today,
whatever hardware you use, whatever data centers you use, you have to make sure it's state
of the art and that's what's costing money.
What does 16 years of liquid cooling innovation with NVIDIA look like?
See for yourself at CoolIT's booth 1109 at NVIDIA GTC.
Check out CoolIT Systems'
high-density liquid cooling technologies,
optimized for current and next-gen NVIDIA platforms.
Attend CoolIT's sessions to learn what it takes
to cool today's most compute-intensive workloads
and what's next for AI cooling infrastructure.
Ensure performance and reliability for your next-gen AI systems with the world leader
in liquid cooling.
Visit coolitsystems.com to learn more.
Ian, you mentioned DeepSeek.
What is your take on that?
Yeah.
DeepSeek was a lot of hype and a lot of misunderstanding. And I guess my opinion has been revised somewhat since it was announced. Initially, I was quite downbeat, saying, what is everybody talking about? If this just makes it cheaper, everybody's going to use it more, which is true. It's called Jevons paradox, as opposed to Jensen's paradox, which is the more you buy, the more you save.
That is a paradox.
Yeah. What DeepSeek were able to do, what the team were able to do, regardless of whatever infrastructure they had on the backend, is that they managed to implement several techniques that were in research. They managed to do it at an extremely low level with efficient compute. And they also introduced a few new techniques of their own. I mean, an OpenAI VP or C-suite executive turned around and said, oh yeah, a bunch of the stuff they had, we'd also discovered; we just hadn't implemented it in a public product yet.
Right, so they were the first to market with a few techniques. They made it incredibly optimized.
As a result, we got the benefit that we saw in the papers, this whole lower cost of inference.
There are a couple of things to note here.
The DeepSeek mixture of experts model is the biggest mixture of experts model on the market. A mixture of experts does have an efficiency advantage over a dense transformer like GPT or Llama and some of those other open source models.
So there is a benefit by just simply having a different architecture.
Then there's optimizing it at a super low level.
Then they also, because they were constrained by the networking of the chips they had in
China, they had to rewrite the network stack in order to optimize communications.
Now that was something completely new, but it turns out again, if you put people
in a box, they will innovate and they innovated around the area of GPU to GPU
communication, latency and bandwidth.
And now we can apply that, if they open up and talk about what they did; I don't believe they have yet.
That could provide new skills and new dimensions for us to focus on in the future with new models.
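To put rough numbers on the mixture-of-experts benefit mentioned above (an editor's sketch; the figures come from DeepSeek's published V3 report, not from the conversation, and should be treated as approximate):

```python
# Mixture of experts: only a fraction of the weights are active per token.
total_params = 671e9    # DeepSeek-V3 total parameters (published figure)
active_params = 37e9    # parameters activated per token (published figure)

# Per-token compute scales roughly with active parameters, so versus a
# hypothetical dense model of the same total size:
saving = total_params / active_params
print(f"~{saving:.0f}x less compute per token than an equally sized dense model")
```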
And the thing is, the DeepSeek moment isn't going to be unique.
There's going to be a ton of these going forward.
The question is whether the media and the market are going to react in the same way and each time cut half a trillion dollars off of NVIDIA's market cap.
Part of it is education and part of what I do day to day is try and help not only investors
but also engineers and other analysts understand what's going on such that it isn't just a
one and done situation with things like DeepSeek.
They're still going to overreact.
I think that's another characteristic of the market.
Investors will.
Well, right, because they can't arbitrate. They don't know, and they have to bet.
There's two types of investors.
There's the tech investors who play against the tech and then there's the market investors
who play against the market.
And the market investors will be fooled every time.
And then you buy the dip on the dip and all that stuff. So looking at GTC, being there last year, it was not only just so crowded and oversubscribed, you might say over-attended, but I've never been to a keynote that filled a professional sports arena. And it was almost euphoric. Do you think that same sense will be in place this year? I kind of doubt that. But also, what do you think will be the driving, maybe the big news coming out of the conference?
I expect the same thing to happen this year.
Jensen's profile has only increased.
Last year it was at the San Jose Convention Center,
which doesn't fit 20,000 people.
This year we'll probably get more than 20,000 people attend.
Jensen has done other large keynotes in stadiums of a similar nature.
We saw that at Computex, we saw that at CES,
and they just keep getting bigger.
It reminds me back in, I think 2014, 2015,
I was at a Samsung keynote at Mobile World Congress
where they were doing one of their first virtual reality
or glasses, and they gave everybody in the stadium
one of these headsets to play out part of the presentation.
When everybody took the headset off, Mark Zuckerberg was on stage and immediately a
bunch of photographers, mostly from the Asian press, decided just to bum rush the stage
to get a picture with him.
That's what Jensen has right now.
It's that cult of personality.
It almost doesn't matter what he's going to announce. That being said, one of the big things I think people are looking forward to is actual real-world results on Blackwell, and showing what markets it's going to be used for. I have a strong feeling
this conference is going to be heavily robotics focused. Jensen has been a champion of putting
AI into robotics.
It's the perfect example of an agentic mixture workflow that brings together many, many compute disciplines.
So I expect to see another bump on that side.
Because it's an embedded market, it will be a slow burn compared to some of the other large language model type announcements.
And then it's just going to be a case of building the partnership,
building the ecosystem. As we're recording this, NVIDIA's end-of-year financials are
tomorrow. I was actually going through my numbers and they've been on a revenue increase like no
other. Incredible. But it's also consistent. And if you look at what they're predicting for what they're going to announce tomorrow, as this is recorded, the CAGR is still there, the annual growth
in the revenue is there. So come GTC, there may be a bit more
emphasis on networking. Actually, today we saw an
announcement of Nvidia with Cisco. So there's going to be
some custom Cisco silicon embedded in Nvidia Spectrum-X Ethernet switches.
And Cisco is going to be the only partner
to be able to do that.
At CES early this year, we saw Nvidia Digits,
this small little ARM Core plus Blackwell GPU mini PC
for $3,000, designed to be essentially the Mac mini, but Nvidia.
The Mac mini, that's right.
And that is actually co-designed with MediaTek, funnily enough.
So I expect to hear more about what they're going to be doing for developers because that
may very well be the first time a lot of developers experience Blackwell through one of those
systems.
So there's going to be a lot going on.
I think the venue is easily insufficient for the number of people that will attend.
So it's funny, last year I got a tour of Nvidia's campus and there was plenty of room at Nvidia's
campus.
There was plenty of what?
They also have had a very enlightened view of remote work, unlike many companies that
are repelling their best and brightest by demanding that they show up.
I think they've been pretty relaxed about it.
Well, whenever I hear a CEO or a boss complain about working from home and back to the office,
I just turn around and say, you do realize there's a $3 trillion company that still has
100% work from home policy except for certain specific roles.
But other than that, it's pretty much all 100% work from home. Yeah.
Or exactly, or whatever you want, right?
And that I think has worked,
and they are able to attract the best and brightest,
and they execute like crazy, so clearly that's the case.
I mean, the benefit of working at NVIDIA right now,
I think, is the stock options.
Because when I did have that tour,
I did ask my guide about what benefits
do employees have. And apparently Jensen has this policy, you only get two things for free
at Nvidia, the air and the caffeine. So you can get a coffee anytime you want, but at the canteen you still have to pay.
Yeah, I think their canteen is subsidized but not free. Yeah,
sure, sure. So there's some benefit there. And of course, the office is beautiful and the food is
great and the environment is fantastic. And maybe it's a good time to talk a little bit more specifically about software, both AI software, HPC software, etc. NVIDIA obviously is expanding into software. Initially, it was just CUDA and the libraries that went with it.
Then it was Omniverse.
And then it's now Enterprise AI.
And they've been making acquisitions.
They talked about NIM last year, which was quite fantastic.
And I think it accelerates application development
by quite a lot.
Where do you see that going next?
It's funny, because a friend once told me, or maybe this is a saying in the industry, I can't remember: a hardware company can't sell software, and a software company can't sell hardware. However, Nvidia is the exception. They're building software into their own revenue model.
But I think the big thing here is that it essentially
guarantees vendor lock-in.
This is something we criticized Intel for decades about,
especially when it first introduced AVX-512
into its processors when AMD had no equivalent.
If you can build a software package,
build a set of libraries, build a service
that is reliant purely on your hardware, and then you can go sell that, you can lock in
a customer for five to 10 years because the effort needed to get out of that is double,
triple, 10x.
On the Nvidia side, we've often heard stories that Nvidia is now 60%, 65% software engineers, and I believe it.
I often state that one of the reasons that AMD is playing catch up is the fact that their
portfolio is so much wider from a hardware perspective.
But if you look at Nvidia from a software perspective, they want to play in every vertical,
every layer of the stack.
They want to be your best friend because if they're your best friend, then you'll use their
hardware and that's where a lot of the margin is. Yeah, right on. I mean, they have their own
everything. They have their own driverless car software, their own LLM, their own weather
forecasting. And I think they have a really interesting model
to say that come learn from us.
This is like a reference architecture.
And even their DGX is more of a recipe,
because as far as I know, you can't actually
buy it directly from Nvidia.
Yeah, I mean, name one other company
that feels as vertically integrated as Nvidia.
And you'd probably say Apple, but that's a different market.
Yeah.
I mean, one of my claims to fame is that I took the first ever CUDA course outside the US.
Back in 2009, they gave a bunch of money and a few GPUs to the mathematical finance professor and said, go teach a course. So I went, and the guy was really smart, but it almost felt like he was a couple of chapters ahead of us in the teaching manual every time. When I went to the same course the second year, it was much more refined.
That just goes to show: NVIDIA placed a bet on education, enabling CUDA and, to a certain extent, PTX in the HPC community that long ago. It comes down to, you never know what the tools you create will be used for.
And then this magical workload called AI and machine learning came around and has blown
the top off their market cap
and everybody else is playing catch up while Nvidia is doubling down to stay ahead.
I was actually going through Nvidia's financials in preparation for their new earnings, and as a percentage of revenue, their spending on R&D has gone down considerably over three years.
And that may be because they're making too much money and can't spend it.
Yeah, the revenue has been going up so much. It used to be somewhere in the 30 percent range and now it's about 10%. And yeah, 10% is already higher than the industry average.
Yeah, and it's more revenue than its competitors. In absolute terms, it's huge, of course.
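To see how the percentage can fall while absolute spending still grows, a sketch using the percentages quoted above and approximate public revenue figures (roughly $27B in FY2022 versus roughly $130B in FY2025; ballpark numbers, not exact):

```python
# R&D share of revenue can drop while absolute R&D still rises.
rev_then, pct_then = 27e9, 0.30   # ~three years ago (approximate)
rev_now, pct_now = 130e9, 0.10    # most recent year (approximate)

rnd_then = rev_then * pct_then    # ~$8.1B
rnd_now = rev_now * pct_now       # ~$13B

print(f"R&D then: ${rnd_then / 1e9:.1f}B, R&D now: ${rnd_now / 1e9:.1f}B")
# The share fell from 30% to 10%, yet absolute spending still grew ~1.6x.
```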
That's right. Now, how about the next tier of GPU providers who are doing really cool stuff, like Cerebras and SambaNova?
And we talked about Graphcore having been acquired by SoftBank in our pre-call.
Groq seems to be doing really well.
Yeah.
I've been tracking the AI hardware startups quite a lot over the last few years.
Oh, you kind of come up with vendors that nobody even knows exist, that are doing stuff.
Yeah.
Yeah, I think my list is over 100 these days.
But I track around $17 billion of investment across the companies that I target.
Most of them are AI.
Some of them are very much AI adjacent, like optical connectivity companies.
But last year we had $3 billion of new investment in those startups.
Groq had $690 million in a Series D, Tenstorrent had $693 million in a Series D, Lightmatter had $400 million, Ayar Labs had $155 million, and a bunch of other smaller startups had anywhere between $20 and $50 million.
We've had a couple of exits. As you said, Graphcore was sold to SoftBank; that's effectively an exit. Blaize, which is an edge AI company, did a SPAC, a sort of reverse acquisition, which doesn't tend to go down too well; apparently they only had $15,000, or maybe $50,000, worth of hardware sales the previous year.
We saw two Korean companies merge.
We're starting to see some consolidation basically, but we're also seeing an
increased rate of investment, which is really interesting.
And among those players, we've seen one or two companies start to get revenues
that match their investment rate.
The one I often highlight is Cerebras.
They've essentially generated a billion dollars worth of book revenue, which is going to filter
in over time to go beyond what their investment is.
Now you could argue that purchase was made by one of the investors, but revenue is revenue.
Is that the G42?
Yes, that's the G42.
Groq announced that they've got a $1.5 billion arrangement, again, with another Middle Eastern
deployment company.
Though it's unclear whether that's direct revenue or that's the value of the entire
project for everybody involved.
I haven't spoken to the CEO about that recently.
I need to, but we're getting into it.
We're getting into a spot now where a lot of these hardware startups are in the sort of five-, six-, seven-, eight-year-old range now.
They've had a burn rate.
They need to start producing.
They need to start getting some pickup.
And to be honest, most of the CSPs are engaged with most of
them. Those who are data center focused at least, because everybody wants a finger in every pie,
just in case. If I take the investment into Ayar Labs: Ayar Labs is an optical connectivity company. They're developing point-to-point optics for GPU or AI chip to chip, but also chip to memory or system to system. So, you know, shorter-range optics for the data center, so that instead of having an eight-way GPU cluster, you have a 512-way GPU cluster. In their funding rounds, the biggest investors were AMD, Intel, and Nvidia. For Lightmatter, it was Google Ventures.
Right.
Yeah.
So everybody's getting involved in as much as they can, at least in a super
small way, because even if they're only offering a financial, not a technical
investment, if it gets to the point where it matters, then they can come in and say,
Hey, actually we want to have a technical engagement with you.
And so that's also fueling a lot of the growth in the startup market today.
Exactly who will win?
I mean, I could pick apart pretty much every company we've mentioned, plus a few
others about where their strengths and weaknesses lie.
I think over the next 12 months, we're going to see a number of companies come
out with new chips, which will make or break those companies. So for example, Groq currently has a GlobalFoundries 12 nanometer chip that's quite power hungry, and you need racks and racks to do a 70 billion parameter model. But the next generation is in Samsung 4 nanometer, and I believe will have stacked DRAM.
It's that sort of thing that's going to be coming out.
And then you get folks like SambaNova, Mythic, LightSolver, and that causes me to ask you about FPGAs, and you mentioned analog at the ISSCC conference, and mixed signal. Where do you see that? There seems to be a lot of potential there, especially for the kind of long-pipeline execution that LLMs can be, and some people, Groq for example, are using it really effectively. Let's talk a little bit about that.
Funnily enough, I'm actually meeting up with SambaNova tomorrow, because I haven't actually met their technical team yet.
However, I've attended so many of their presentations over the years.
It all comes down to non-von Neumann architectures, I think.
Right now, with the standard computing paradigm, there are limits on power consumption: how much energy per operation, how much energy per bit in data transfer, especially with the movement to chiplets in the industry. You can't move data between chiplets without taking a power penalty compared to a monolithic design. And so there are a lot of innovative ways in not only the compute but also the memory and the packaging, and that's why we're also getting a number of these startups coming into play.
Analog comes in two forms in this context. What I saw at the conference was mostly connectivity analog: just getting the lowest energy per bit transferred across the connection, whether that's electrical, whether it's optical, whether it's UCIe, but then also maintaining high
bandwidth. So it's not only 50 gigabit per second, but 100 or 200 or beyond.
And whether that's NRZ or PAM4 or PAM6, I saw optical QAM16 as well being trialed at
the conference.
So a lot of that's going on.
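A small sketch of why those modulation schemes matter for the bandwidth numbers quoted above: each scheme packs more bits into every symbol on the wire (the 50 GBaud symbol rate here is illustrative, not from the conversation):

```python
import math

# Bits per symbol = log2(number of signal levels / constellation points).
schemes = {"NRZ": 2, "PAM4": 4, "PAM6": 6, "QAM16": 16}

symbol_rate_gbaud = 50  # illustrative symbol rate
for name, levels in schemes.items():
    bits = math.log2(levels)
    rate = symbol_rate_gbaud * bits
    print(f"{name}: {bits:.2f} bits/symbol -> {rate:.0f} Gb/s at {symbol_rate_gbaud} GBaud")
# NRZ gives 50 Gb/s, PAM4 100 Gb/s, QAM16 200 Gb/s, hence '100 or 200 or
# beyond' without raising the underlying symbol rate.
```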
When it comes to analog compute, I mean, this is a relatively old idea,
older than I've been alive, 40, 50 years.
And one of the key issues with analog
is just maintaining manufacturing consistency.
Analog has had a number of successes in the edge market.
And the failures we hear about are
when people want to do analog in the data center at scale
with these large LLMs because we
haven't figured out how to scale these designs or have the right level of digital to analog or analog
to digital converters. It's like quantum computing. It requires a full ecosystem
in order to support it. So right now we're seeing ideas that have merit but don't have necessarily the infrastructure to scale in the same way a standard GPU or a standard digital computer might.
Now, will that change in the future? I mean, there's analog computing, there's optical computing; all of these are in one state or another, and everybody's trying lots of different things. And a lot of it is in base research right
now. And what happens, like for example, with Mythic is that eventually one of these academics
gets enough money to go and try to make a commercial device. And if there's one thing I've learned about academics, it's that if they don't have the right team around them to go to market, they end up six years into a startup
with nothing to show for it. Great research, no product.
You have to actually engineer it.
You have to actually sell it.
Ian, you mentioned Ayar Labs. We do cover photonic I/O, if that's the right term, silicon photonics.
How far from commercial readiness is that technology?
Yeah, a lot. It's great. I actually had a conversation with AMD's Sam Naffziger about this at the conference. He just happened to be in the demo room and I collared him for 20 minutes, because it's a question that comes up a lot when I speak with investors as well.
I mean, so part of the problem right now is simply scaling up manufacturing.
We have technologies that do point to point optical data transfer.
We have it in chiplet form, we have the lasers, whether you believe we have it at the right
power or the right bandwidth.
For example, Ayar Labs is at four terabits per second, and they're going up to eight terabits per second. Celestial AI believes that they can do hundreds of terabits per second. I believe Lip-Bu just invested in them or joined the board as well.
Though, when I was speaking to these companies, they talk about having an enablement in the sort of late '26 to '28 time frame. So essentially at least two years out from today.
However, one of the big complexity parts of this is that optics is an extreme cost add.
So you're only going to do it to your high margin parts. But if you do it to your high margin parts,
you have to put in a lot of R&D effort long term to make sure that, in trying this new technology, it's stable and it works.
So it came to mind: would a company build both an optical and an electrical version of their design? Is this something that chiplets help enable or hinder?
I don't expect any company to build a 100% optical product without
having an electrical backup, a standard point-to-point electrical backup. I spoke to Sam from AMD
about this and he's of the opinion, yeah, until it becomes a proven mass market adopted
technology, people will hedge their bets. And that makes sense.
When speaking with Ayar Labs, like I say, they believe they've got the technology down pat.
And part of the buildup, this is why they've just done their funding round, is scaling
up the manufacturing and ensuring manufacturing consistency and yield.
I mean, Nvidia is not going to put it on a product if you can't produce a million chiplets
a month.
If you can only manufacture 10,000 today, that's not enough.
Nobody's going to take a gamble.
Jim Keller, our good friend and CEO over at Tenstorrent, told me that when you are a startup, you have to approach the industry knowing you can't just go straight after the million-GPU systems. You have to start at the hundred and say, yeah, can I build a hundred? Let's go after
customers who want a thousand. Can I build a thousand? Let's go after customers who want 10,000.
Optical, because of the cost involved but also the potential upside, seems like something that has to jump up to at least the mid to high level first before it filters back down and gets manufactured at cost.
It drives down the cost over time.
Ayar, aren't they partnered with GlobalFoundries, I believe?
Yes, I believe they're using GlobalFoundries' 45 nanometer optical technology.
To be honest, that's really the only one in the market
that makes sense right now.
TSMC has COUPE, which is their co-packaged optics.
I know imec has like a 200 millimeter optics line that's used by companies like Q.ANT as well.
And Intel also has an optical side to their foundry.
They haven't spoken about it much yet, though.
Well, they had a famous demo that Pat did early on with their silicon photonics.
Yeah, that was an acquisition they made of a company in Scotland showing pluggable optics. Because Intel is an investor in Ayar, I think that may be the same technology that Ayar is using.
I see. Then there's Lightmatter and Avicena and a few others, right?
Yeah, Lightmatter is different. They're going after photonic interposers. So you can take your standard AI chip today and, instead of using a silicon interposer, use an optical interposer: save power, increase bandwidth, make the design, quote unquote, simpler. And apparently they've productized it for one of their customers already. They showed a demo of this at Supercomputing '24.
Avicena is different, because most optic solutions we talk about today multiplex wavelengths of light over a wire. Avicena is going after one specific wavelength of blue micro-LED light and having bundles of hundreds of fibers. So you lose the complexity of having modulation and thermal variance, because all you're relying on is a blue LED and then a detector at the other end. But you have to take into account the fact that it's not as area efficient as...
Interesting.
But it's a simpler design.
But they are years behind the others, as far as I'm aware.
So I have just one more question, but it might involve a long answer and we don't want to take your whole day.
No worries.
So my question is, AMD in the GPU market, what are their objectives looking out three
to five years and do they have a path to get there? What is their path against this juggernaut Nvidia?
Yeah, it's hard to compete against somebody who can both outspend and out-hire you.
So you have to be innovative in every which way you can. Recently, Lisa did a fireside Q&A
with AMD's analysts to answer a bunch of questions, post the financial earnings, but also
on the scope
of 2025 and beyond. Everybody's wondering what AMD's goals are, the basis of your question, really.
She came out with a few things that are worth highlighting. First, I guess, is that
the nature of machine learning is being driven towards system-level design, not chip-level
design. We've heard about design technology co-optimization for designing the chip, but when you have
system technology co-optimization, maybe the design of the system can be optimized for
specific workloads.
So in this case, we've spoken about NVIDIA's Blackwell NVL72 full 120 kilowatt racks; AMD will be moving towards that level of design come, if not MI350, then MI400, over the next year or so.
That's part of the reason why AMD acquired ZT Systems, their $5 billion acquisition last year. Incidentally, once they complete it, they're going to divest the ZT Systems manufacturing facilities, and that will recoup about $4 billion. So they bought it for the expertise and the people and the design capabilities therein, but not necessarily the manufacturing.
That's interesting.
One of the other things for 2025 for AMD: they're going to focus a lot on training.
So 2024, they made big strides in inference workloads in machine learning.
So much so that OpenAI actually uses AMD's hardware for inference on GPT-4,
because it's 10% cheaper than running it on Nvidia.
The problem, however, is that training is where a lot of the sales are today.
This is what we've seen from the CSPs with their big clusters.
So in order to get in on that, AMD needs to make a concerted software effort to support training workloads and optimize it to perform better than the competition.
So 2025 is going to be the year that they're going to do that.
They have customers who have been using the hardware
for inference now for over a year,
and those customers are now confident to turn around to say to AMD, what about training?
AMD is committed to working on training as its main software focus for 2025. Beyond that, obviously, it's a dual-pronged strategy, for as much as training is incredibly relevant.
AMD also committed to providing a developer cloud.
So the ability for developers to go and test the latest and greatest hardware without having to buy it.
Intel has had this for a number of years now,
especially for key customers who want to trial-run their new software built on future architecture designs but need, quote unquote, behind-the-scenes access before installing it.
So AMD is going to commit to going down that route.
They did say it's not going to be necessarily an in-house developer cloud, but they will
leverage their partners to provide resources.
AMD is covering the cost, or will offer it as a service going forward, in the same way that Intel does.
And realistically, Nvidia has just relied on other clouds to do that right now.
In terms of future-looking revenue for AMD, one of the big comments from '24 is that AMD
kept upping their revenue estimate for their AI hardware from one billion to two to four to five,
and it ended up at five plus.
Everybody asked, can you provide us the same number for 2025? And Lisa, AMD's CEO, said no,
which really annoyed all the investors,
but made the rest of us laugh.
AMD's of the opinion now that it has scale,
it has confidence in the market,
and it's going to leverage that
in order to go
after customers.
Lisa actually made a big comment in the back and forth about, is the future of the machine
learning market GPU or ASIC?
Right now, we have a lot of the cloud service providers building their own silicon, not
only ARM-based CPUs for the head nodes of their compute, but also AI accelerators. We've got Microsoft Maia, we've got Graviton 4, we've got Google's Axion, we've got Trainium, Inferentia. I've always said, it doesn't take much to build an average chip,
but if you want a really, really good chip, you need a design company
in charge of doing it. AMD believes they have the expertise, but Lisa cited DeepSeek in this.
She said the first thing that everybody wanted to do when DeepSeek came out, open-sourced and open-weighted, was run it on their hardware. GPUs, from her perspective, are infinitely flexible for doing that. ASICs, not so much, and that's why you saw a lot of DeepSeek runs on GPUs before it got onto ASICs several weeks later.
From her perspective, it's also that the fast iteration the market needs is built upon GPUs. Not only that, but their main customers are the ones making their own silicon. It ends up being entirely workload dependent, from her perspective, but the market will be a rich mix of all these other hardwares and architectures.
And that's also where a lot of the startups we mentioned earlier
are trying to fit in.
Everybody wants a piece of everything right now. But AMD is going to continue on the cadence. They've spoken about essentially their architecture developments through to 2026. Going beyond that, there are questions: will AMD ever use Intel's process or Intel packaging? What's beyond HBM? Where does optics fit into this? Right now, I think beyond '25, beyond mid-'26, is a bit of an unknown.
Fascinating, great discussion.
Excellent, thank you Ian.
Thank you for making it happen.
Thank you for staying late in the evening
with the time difference that we have.
No worries.
Great conversation, looking forward to seeing you in person.
I could talk about this stuff for hours, as you can probably gather.
We too, right?
I mean, we should do one of these like five-hour sessions then.
Very good.
All right, well, thanks so much.
Excellent.
Thank you for having me on.
Take care.
Until next time.
That's it for this episode of the At HPC podcast.
Every episode is featured on insidehpc.com and posted on orionx.net.
Use the comment section or tweet us with any questions or to propose topics of discussion.
If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The At HPC Podcast is a production of OrionX in association with Inside HPC.
Thank you for listening.