@HPC Podcast Archives - OrionX.net - @HPCpodcast-101: Live from Nvidia GTC25
Episode Date: March 21, 2025
In what is becoming an annual tradition, we are "Live from Nvidia GTC25" at the AI-everywhere show. We cover everything from the industry landscape to Hopper to Blackwell to Rubin and Feynman, plus silicon photonics for the cluster interconnect fabric (the star of the show, really), the complexity of inference for customers, low-end systems, power and cooling (did we hear 600 kW per rack?), software including the cluster-level, AI-workload-focused, open-sourced Dynamo "OS", and storage (the semantic kind).
Transcript
Transform the industry and I think they recognize this moment in time that AI is a once-in-a-lifetime
kind of a technology shift and they're in a position to drive it.
Which will mature from generative AI to reasoning models which require more compute of course
and agentic AI which we're moving toward right now,
and so on toward physical AI.
Inference is going to be online transaction processing
for AI.
By 2028, AI CapEx, AI infrastructure CapEx,
will hit $1 trillion.
From OrionX in association with InsideHPC, this is the @HPCpodcast.
Join Shaheen Khan and Doug Black as they discuss supercomputing technologies and the applications,
markets and policies that shape them.
Thank you for being with us.
Hey everybody.
This is the At HPC podcast.
I'm Doug Black at InsideHPC and with me is Shaheen Khan of OrionX.net.
We're speaking to you live from the GTC conference in San Jose, Nvidia's big
conference that's really become one of the most important tech conferences of the year.
This year's conference has an estimated 25,000 attendees and it has really overwhelmed San
Jose's capacity for a conference of this scale.
I mean, try getting an Uber, try moving around the conference floor.
Forget getting a hotel room if you didn't get it more than a month ago.
It's become an amazing annual event in the tech world.
Bursting at the seams, really.
And this year they've taken over a couple of blocks of the city itself and other adjacent
buildings to the convention center.
A lot of the meetings and talks are taking place at hotels around the convention center as well. It's bigger
than SC24. SC24 was a lot bigger than SC23. So between the two of them, we can do a nice coverage
of HPC and AI and quantum computing and other advanced technologies. We're hearing estimates of renting booth space
that are really eye popping.
Yes.
One vendor said $375,000 for their booth.
It's almost unbelievable, but there you have it.
Well, you have to remember, in the end, it is a corporate event.
And if you want to be a platinum sponsor,
there will be a lot of takers for that with very deep pockets.
So that shows.
Well, it just points up that Nvidia and Jensen Huang, the CEO of Nvidia, are really at the
top of the tech pyramid in a lot of ways.
And what they've accomplished is just amazing.
My sense, Shaheen, and this could be very subjective on my part, is that last year
seemed more of a party, more euphoric in some ways.
I mean, ChatGPT, the generative AI revolution
was really getting very wound up. This year, it seems a bit more pragmatic, more moving
toward implementation, deployment, solutions, infrastructure, along with, you know, next
generation chips and compute. I don't know if you shared that perspective.
I believe a lot of that is also because of the business focus of the conference.
In addition to the technology and the algorithms and the models and the bits and bytes, there's
also the business aspect of it.
How do you deploy it?
How do you pay for it?
How do you save money by it?
How do you make money by it?
That is a lot more prominent in a show like this compared to the traditional HPC shows.
That brings a different kind of crowd.
I think it adds a different kind of energy.
Everybody feels like they're at the early days still of the AI global revolution.
I think they've covered a lot of the bases.
They didn't have the eye-popping announcements like they did last year with NIM and with
Enterprise AI and the overall focus on software.
A lot of that is just maturing
now and it's getting better. So it was very much in the background, but it wasn't like
a major announcement.
Yeah, probably the centerpiece of the conference is Jensen Huang's keynote address, which was
yesterday morning, two hours. And by the way, two hours before the event, I was right near
the SAP Center, which is San Jose's professional hockey stadium that holds 17,000 people.
Two hours before the address, just hordes of people streaming toward the stadium.
I can't get over how impressive that was last year and it still is to me this year.
It's Jensen and Taylor Swift.
They're neck and neck.
Yeah.
There we go.
So I guess a headliner statement from Jensen, he's talking about looking ahead up the scale
of future generations of AI innovation that compute will need to improve by a factor of
100 to meet the needs of future workloads, which will mature from generative AI to reasoning
models which require more compute, of course, and agentic AI, which we're
moving toward right now, and so on toward physical AI. And the other kind of eye-popping number he
threw out was that by 2028, AI infrastructure CapEx will hit $1 trillion. Of course, a big part
of that was targeted at Wall Street, explaining what he's doing and why; the mention of a
trillion-dollar investment coming for data centers around the world is obviously aligned with that.
I thought it was nice that they put a roadmap together for AI saying that we've gone from
the early days of pattern recognition and to retrieval-based AI to generative AI to agentic AI and ultimately to
physical AI. And that is a nice way of putting it together. Some of the breakthroughs that will
happen will probably end up on that roadmap as well, though they may not map as neatly as
generative AI matched with large language models. Agentic AI appears to be matching
with reasoning models. And those are some of the points that he made to make the point
of why computational requirements are going to continue to go up. Initially 100x, but
I think he's going to say that you're going to need another 100x after that.
There's no end. And that's the core of their business. I mean, Jensen is broadening the definition of NVIDIA beyond just a chip company, a chip design company. They want to be the
AI factory solutions deployment company supporting customers in a number of different ways. But of
course, citing a factor of 100 times more powerful chips that will be needed gets right to the core
of their business. So it works very well with the CapEx projections. By the way, you can see his entire keynote address on
YouTube, but if you don't have time for that, he noted at the beginning that he
was speaking without a net, as he put it: no script, no teleprompter.
And he spoke for two hours and his energy and his evident passion never flagged.
But it also really showed that his fingers are in the works
throughout this company.
He's fully invested and involved
in product development, product strategy.
He can dig down into any number of different ways
you want to go with him.
It was a very impressive presentation.
That's a really good point,
that he seems very much on top of everything,
from the details of the technology
to the general strategy for the company and where the market is going and staying a few steps ahead of everybody
else because they've had a front row seat to evolving technology and doing it all as
he's become more and more of a celebrity.
And we've talked about that before, including last year.
I think he continues to be a masterclass on how to be a celebrity without loss of authority, without loss of credibility, while keeping his warmth. All of that is
really hard to do. And he projects that quite well.
Yeah, a little bit of self-effacing humor there. He'll poke fun at himself
sometimes. And there's some humility there. So, you know, he strikes a pretty
good balance with his public persona.
Yeah, yeah, I think a lot of it is just being real that comes across as being very important for the
company.
Now, of course, not to exclude a lot of other companies in this business who also do a very,
very good job of that, but not all of them have a $3 trillion valuation.
No, not all of them.
Now, beyond some of those market projection numbers, a major piece of news was the unveiling
of the successor to their Blackwell
flagship chip. This will be the next generation, Rubin. Vera Rubin will be the combination of
an Arm CPU plus the Rubin GPU, scheduled to ship in the second half of 2026 and slated to deliver 14
times the compute power of Blackwell. Pretty impressive. The next generation will also incorporate
silicon photonics, a very important piece of news here. Bottom line is a 10%
power savings, but it's also moving toward a silicon photonic fabric: faster,
more efficient, less cabling, and a major step in the evolution of advanced
computing. As you mentioned, they have successfully transitioned
from being a chip company to a system company
to now what comes across, and I believe they say so,
as an AI infrastructure company.
And that is also testimony to the unbounded ambitions
that they have to transform the industry.
And I think they recognize this moment in time that AI is
a once-in-a-lifetime kind of a technology shift and they're in a
position to drive it and it will impact every aspect of computing and they would
like to lead that and they have been doing a great job of doing that.
The first half of his talk to me was really an explanation of, if not a
justification for, the strategy that
they've had. Nvidia has been unique in building really big fat GPUs that are in
big fat SMP (shared-memory, symmetric multiprocessing) systems and racks and
then combining those into clusters. So he made a point that you want to scale up
before you scale out. He, as you mentioned, talked about how inference is going to require a hundred
times more compute. And unlike what a lot of people thought, that training is really
what's hard and then inference is going to be a lot easier and cheaper to do, he
essentially was saying, no, it's not, because for inference you want to get the right answers,
not just the fast answer.
So you want it to be right and fast.
And to do that you need to look at a lot more options
and reason through them and do it in real time
and 10 times faster.
So if you need to look at 10 times the tokens
and 10 times faster, that gives you the 100.
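(To make the arithmetic he sketched concrete, here is a minimal illustration; the two factors of ten are simply the round numbers quoted on stage, not measurements.)

```python
# The keynote's rough inference arithmetic, as described above.
# Both factors are the round numbers quoted in the talk, not measured values.

tokens_factor = 10    # a reasoning model explores roughly 10x more tokens per answer
latency_factor = 10   # and the answer still has to come back roughly 10x faster

compute_factor = tokens_factor * latency_factor
print(compute_factor)  # -> 100, the "100x more compute for inference" figure
```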
And that's just for
the next step. And then he spent a lot of time talking about tokens, tokens, more tokens.
So really tokenomics, as I would call it, was a big piece of this, that the number of
tokens you generate per second is one metric, but then the accuracy of them and how they're
going to be synthesized for an answer is another, and you'd like to get both of those.
And that entire equation is very consistent with what I've been saying for a long time now,
that inference is going to be online transaction processing for AI.
And online transaction processing, otherwise known as OLTP, is what put mainframes on the map.
It was transaction processing. The economics and the mathematics of transaction processing
is really different from HPC.
And it is harder because it has to do
with the number of simultaneous users
and mission criticality of the application
and response time.
And every click is going to cost you
a certain amount of money.
So it better make that much money.
So what he was describing really is consistent with that.
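(A minimal sketch of that per-click economics; every number below is a made-up assumption purely for illustration, none of them come from the talk.)

```python
# Toy model of "every click costs money, so it better make that much money."
# All inputs are illustrative assumptions, not figures from the keynote.

tokens_per_query = 2_000             # reasoning tokens generated per user request (assumed)
cost_per_million_tokens_usd = 2.00   # blended cost to serve those tokens (assumed)
revenue_per_query_usd = 0.01         # what the click earns back, e.g. ads or subscription share (assumed)

cost_per_query_usd = tokens_per_query / 1_000_000 * cost_per_million_tokens_usd
margin_usd = revenue_per_query_usd - cost_per_query_usd

print(f"cost per query:   ${cost_per_query_usd:.4f}")
print(f"margin per query: ${margin_usd:+.4f}")  # negative means each click loses money
```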
Well, talking with people today about moving into inference, one of the distinctions of
it is it changes dynamically as demands on the system change.
So training is a workload and can be done when you want it to be done.
But once you get into inference, you've got, as you say, an online system that needs to
be there readily available whenever it's needed.
It just introduces a whole different level of complexity.
That's right.
That's another dimension of them explaining why their strategy is the right strategy: it was
to also say that when you do inference, you're liable to do a whole spectrum of things. And he had a slide with a spectrum of applications,
like a rainbow coalition of applications. And each one of them is going to have slightly
different requirements for it to be optimized, and that you need a chip like Blackwell that
can do all of that.
Now, I'm sure the competitors to NVIDIA are going to say, no, I can actually optimize it
for the inference that I want and I can be better, faster, cheaper. They're going to say,
what about power and cooling? All of this stuff is really great. But when I look at the roadmap,
it's going to be a difficult challenge for the customers. So there will be that. But I also
think that they did a great job of explaining why they're pursuing the strategy that
they're pursuing. And then the second half was really a description of the roadmap
and some of the new capabilities that they're launching. Now as you mentioned,
Silicon photonics, now that is really an interesting thing, and I believe it was
probably the star of the show, believe it or not: the idea of using silicon photonics
for scale-out interconnect. Because if you're staying within the rack, copper will do and
it's quite optimal for what you need. And I know that there are a lot of projects with
startups to try to bring optics to shorter and shorter and shorter distances all the
way to the silicon interposer. And we talked a little bit with Ian Cutress
in our last episode.
But the idea of looking at a really large cluster that
requires hierarchical switching and therefore benefits
from really big switches.
And with NVL72, they've got a 72-way switch.
And if you want to do 100,000 GPUs or a million GPUs,
you're going to be pretty hierarchical.
So there's connectivity between the switches and every port of every switch is going to be occupied.
And converting electronics to photons and back, the transceivers that are required,
is both energy consuming and dollar consuming. It's expensive and it's hot.
And if you can eliminate a good bunch of them, you really put a big dent in that. So they now have a way of putting
a transceiver directly connected to the GPU. And that piece of it happens to be also the
part that is liable to show failures over time and is replaceable. And the other end of it goes to the switch
and now you save a bunch of transceivers.
He did a bit of math that a GPU connects to a switch,
a switch connects to the other switch
and that connects to the other switch
and every connection has two transceivers, one on each end.
So for each GPU, you can eliminate six transceivers
and then six times a couple of thousand dollars
each times 10 or 15 watts each. That all adds up to significant savings. And of course,
he's going to want you to put that all towards additional GPUs.
And I think it brings optical interconnects to a new realm.
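(Running that math at cluster scale, as a rough sketch: the per-unit cost and power are the ranges quoted above, and the GPU count is a hypothetical.)

```python
# Back-of-the-envelope version of the transceiver-savings math described above.
# Cluster size is hypothetical; unit cost and power are midpoints of the quoted ranges.

num_gpus = 100_000                  # hypothetical large AI factory
transceivers_saved_per_gpu = 6      # 3 switch hops x 2 transceiver ends per link
cost_per_transceiver_usd = 2_000    # "a couple of thousand dollars each"
power_per_transceiver_w = 12.5      # midpoint of "10 or 15 watts each"

saved_units = num_gpus * transceivers_saved_per_gpu
saved_capex_musd = saved_units * cost_per_transceiver_usd / 1e6
saved_power_mw = saved_units * power_per_transceiver_w / 1e6

print(f"transceivers eliminated: {saved_units:,}")
print(f"capital saved:           ~${saved_capex_musd:,.0f}M")
print(f"power saved:             ~{saved_power_mw:.1f} MW")
```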
Yeah. And this podcast is always interested in silicon photonics as part of the next generation of accelerated computing and a much-needed enabler
of more efficient computing, cooler computing. So right on. It would be interesting
to know who is manufacturing this. And of course, the expectation is that it's TSMC, and it is indeed TSMC.
Now, traditionally for optical components, it's GlobalFoundries that's doing that. We've seen that with some of the
photonics quantum computers as well as some of the photonics companies that we've talked to.
This one is manufactured by TSMC. It is part of what they call COUPE, which stands for
Compact Universal Photonic Engine, and which TSMC went public with about a year ago.
And by the time it shows up in Rubin, it's going to be second half of 2026.
Yeah.
He also moved toward robotics.
Ian Cutress, our guest last week, looking ahead to this GTC, predicted
a strong focus on robotics.
Jensen announced a strategic alliance between NVIDIA and General Motors to support GM's autonomous car development,
really making a car into a robot, as well as a partnership with Disney and Google DeepMind on a physics engine called Newton that can simulate physical systems.
The other thing they talked about was their low-end, workstation-looking box, DGX Spark (formerly Digits),
and they're trying to fill up the entire
product spectrum from small to extra large. So Shaheen, I know you have some
thoughts around Rubin and some of the power requirement numbers involved. Yeah,
so first of all, just to refresh everybody's memory, we have Hopper going
to Blackwell going to Rubin and they actually unveiled the name for the GPU after that, and that's named after Richard Feynman, a very popular and beloved
scientist in our world.
Anyway, so when Blackwell was announced last year, the idea that it would be 120 to 125
kilowatts per rack was a big shock to a lot of people.
And I think that was one of the big domino moves
towards how much power do I need for my data center
and how much cooling do I need for my data center.
Because when you get to that level,
it's not enough to take heat out of the box.
You have to take heat out of the building
and then you have to secure the power.
Well, the roadmap that they're on seems to just be
up and to the right on that scale.
And they unveiled that the Rubin Ultra Rack will be 600 kilowatts per rack.
And then when you go to Feynman, it's going to increase from there, but who knows.
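(To put those rack numbers in facility terms, a quick sketch; only the per-rack figures come from the discussion, while the rack count and PUE are assumptions.)

```python
# Why 600 kW racks force building-level planning: a rough facility-power sketch.
# Only the per-rack figures come from the conversation; rack count and PUE are assumed.

kw_per_rack = {
    "Blackwell rack (last year's figure)": 125,
    "Rubin Ultra rack (per the keynote)": 600,
}
num_racks = 100   # hypothetical AI-factory deployment
pue = 1.25        # assumed power usage effectiveness (cooling and other overhead)

for name, kw in kw_per_rack.items():
    it_load_mw = kw * num_racks / 1_000
    facility_mw = it_load_mw * pue
    print(f"{name}: {it_load_mw:.1f} MW of IT load, ~{facility_mw:.1f} MW from the utility")
```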
He also mentioned that this is one reason why he discloses the roadmap so many years in advance:
so people can plan for it, because the outlay and the investment for these things
is quite significant. The other thing that was very interesting is how he touted Blackwell almost
against Hopper. At some point he called himself chief revenue destroyer because he was saying,
look how much better Blackwell is. And it really is-
Don't buy Hopper, right?
Essentially, though not quite that blatantly; there are a couple of occasions where
Hopper might be a better choice.
But I tell you where it might be a better choice is the air-cooled versus liquid-cooled.
If you are looking at liquid cooling and you can't handle it, then Hopper is a very fine
engine for what you need to do.
But it is definitely the case that if you're building a big AI factory, Blackwell is the
way to go. This also tells me that they are anticipating the lead times and the wait times for Blackwell
to drop dramatically.
I believe he said they're in full production with Blackwell at the beginning of the talk.
He did, except that Ian was saying the wait time for Blackwell right now continues to
be 56 weeks.
Okay.
Well, but anyway, the way they positioned it on stage, it seemed like it was imminent and
that moving customers towards Blackwell is okay.
Now Ian was making the point that enterprises and even clouds are now having indigestion
with this annual cycle and just incessant upgrades to technology, and that they're probably going to just get
whatever they can get their hands on and they may have to skip a turn.
Okay.
All right.
So speaking of competition, it was also interesting that several competitors to Nvidia had side
events organized around GTC.
They had receptions and...
Well, you and I were...
We were at one of them together.
That's right.
We were at the Cerebras event last night, which was well attended.
It was a packed room, and Axelera had one the night before that I also swung by,
and it was also pretty well attended.
That's a Dutch company that does edge AI chips, and they had really nice
demos, and we were able to talk about what they're doing.
Of course, everybody has to compare what they do to what Nvidia does.
So Nvidia is still very much in the ether. They talked about Dynamo, which is their, quote,
"operating system" for a big AI factory. And that handles a lot of the system management and
orchestration and workflow management that these clusters need. And they're open sourcing it.
They talked about working on 6G in a consortium of players
and startups that they've put together. We've talked in this podcast about 5G and then going
to 6G. It actually is computationally quite demanding to look at those beams coming in
and going out, and the so-called MIMO, multiple-in, multiple-out. Some of them require differential
equations because everything in life does.
And for the GPU and the AI to sort of optimize that
is a ripe application.
Likewise with simulating a data center
before you actually implement.
Likewise with chip manufacturing,
how you decide how to choreograph the manufacturing process
and then instructing the equipment to go do it.
So there were points made about how you need AI to figure out what to do and then the thing that
actually does it. And as you mentioned with the driverless car and GM, the AI figures out what the
car needs to do and the car needs to actually do it. And that is going to be a pattern that we will
see. Which kind of makes sense. Very impressive, Shaheen. Very exciting, high energy and a lot of interest.
Very much. I'll close with this comment. There was also mention of semantic storage. We've
talked about that in the past. If you go all the way back to episode 21 of this podcast,
when we met with Gary Grider of Los Alamos National Laboratory, and how the footprint of that lab
on every storage product that is out there
is pretty legendary.
So that is another thing that will benefit from AI.
If you are doing semantic storage,
well, you know, AI can really accelerate that.
What does that mean for the storage of tomorrow?
We're already doing computational storage.
We're doing erasure coding,
which is computational protection of data.
So the more computational something becomes,
the more eligible it is for it to be AI enabled,
and it's going to touch everything.
So Jensen did have an eye towards that. We were talking with Joseph George later on,
and he was saying how that could indicate
future developments.
So something to watch and quite interesting as well.
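(Since erasure coding came up as "computational protection of data," here is a toy single-parity sketch of the idea; real systems use Reed-Solomon codes across many shards, but the principle of recomputing lost data rather than mirroring it is the same.)

```python
# Toy illustration of "computational protection of data": single-parity erasure coding.
# Real storage systems use Reed-Solomon codes over many shards; the idea is the same.

def make_parity(shards: list[bytes]) -> bytes:
    """XOR equal-length shards together to produce one parity shard."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing shard from the survivors plus the parity."""
    return make_parity(surviving + [parity])

data_shards = [b"AAAA", b"BBBB", b"CCCC"]   # three equal-size data shards
parity_shard = make_parity(data_shards)

# Pretend shard 1 is lost, then recompute it from the other shards plus parity:
rebuilt = recover_missing([data_shards[0], data_shards[2]], parity_shard)
assert rebuilt == data_shards[1]
print("recovered:", rebuilt)
```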
Great stuff Shaheen.
Very well.
Thanks everybody for joining us and we'll talk to you again soon.
Alright, take care everybody.
That's it for this episode of the At HPC Podcast.
Every episode is featured on insidehpc.com and posted on orionx.net.
Use the comment section or tweet us with any questions or to propose topics of discussion.
If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The At HPC Podcast is a production of OrionX in association with Inside HPC.
Thank you for listening.