@HPC Podcast Archives - OrionX.net - @HPCpodcast-106: TOP500 at SC25 Conference
Episode Date: November 20, 2025
Join Shahin and Doug as they analyze and discuss the new TOP500 list. They go over HPL, HPL-MxP, HPCG, Green500, geographical distribution, vendor distribution, and other observations. [audio mp3="https://orionx.net/wp-content/uploads/2025/11/106@HPCpodcast_SP_TOP500_SC25_20251119.mp3"][/audio] The post @HPCpodcast-106: TOP500 at SC25 Conference appeared first on OrionX.net.
Transcript
CoolIT Systems cools the world's fastest and most advanced supercomputers on the TOP500 list.
For over 24 years, we've been the leading liquid cooling provider for the world's top HPC and AI systems.
Ensure performance and reliability for your next-gen AI systems with the world's liquid cooling expert.
Explore more at coolitsystems.com.
The first non-U.S. system certified to have reached the exascale class of computing.
That kind of leads me to a discussion of why we are doing this, and the benefits of the TOP500.
I believe it was 2016 that China dropped out of the TOP500, and it's really unfortunate.
Wish they would come back.
The ability to take advantage of what an AI-optimized chip gives you to do traditional HPC.
From OrionX in association with InsideHPC,
this is the at-HPC podcast.
Join Shahin Khan and Doug Black
as they discuss supercomputing technologies
and the applications, markets, and policies that shape them.
Thank you for being with us.
Hi, everyone, I'm Doug Black at InsideHPC.
And with me, of course, is Shahin Khan of OrionX.net.
And we are at the opening day of the
Supercomputing Conference here in St. Louis, SC25. And today, as always happens on the first day of the
conference, was the release of the new Top 500 list of the world's most powerful supercomputers.
Shahin and I both went to the press briefing with some leading HPC luminaries talking about the list and developments in supercomputing, HPC, and AI. And Shahin, probably the most significant development
to the new list is the addition of the Eviden-built Jupiter Booster supercomputer. This is at the Jülich Supercomputing Centre in Germany. This system now becomes the first non-U.S. system certified to have reached the exascale class of computing. So congratulations to them. In fact,
Jupiter Booster's performance improvement is the most significant change from last spring's TOP500 list. Last spring, Jupiter Booster came in at just under half an exaflop of compute power. Now the system has hit the exascale mark on the nose, according to the data on the new list.
Yeah, the other change is El Capitan, the number one system at Lawrence Livermore National Lab. They ran the benchmark again, and they got way closer to two exaflops, at 1.809 exaflops. At the moment, the rankings are therefore Livermore, Oak Ridge at 1.35, Argonne at 1.01, and then EuroHPC just below it, at exactly one exaflop.
So El Capitan, this would be three straight lists as the number one system. It took the number one spot a year ago, at last year's SC, from the Frontier system, which is, as you say, the number two system in the world.
El Capitan is an HPE Cray system with 11 million cores, and it's based on AMD 4th Generation EPYC CPUs and AMD Instinct MI300A GPU accelerators. Now, Frontier is measured at 1.35 exaflops, and the number three system, as you mentioned, was Aurora at just over an exaflop. And looking through the rest of the top
10. After Jupiter Booster, we have the Eagle system. This is a Microsoft Azure system. The legend is that a few guys spent a couple of weekends building this AI supercomputer on Azure. I think it was two years ago, Shahin. Number six is the HPC6 system in Italy. This is the number one ranked commercial system in the world. This is a system owned by Eni, the Italian energy company, and this one comes in at just under half an exaflop of performance.
That's followed at number seven by the supercomputer Fugaku, which was the number one system as of, I guess, more than three years ago. That's an Arm-based system at the RIKEN Center for Computational Science in Japan.
We have the Alps system, another HPE system, in Switzerland. We have LUMI, an HPE AMD system in Finland. And the number 10 system, rounding out the top 10, is a BullSequana system called Leonardo. This is at EuroHPC's CINECA in Italy.
Yeah, so a few notables, with the TOP500 in general but also as one looks at a top 10 with not a lot of movement: number one, China is not participating. Number two, many sites don't actually run HPL, the benchmark. And number three, many chips don't run HPL either. We have a large number of AI-focused chips that in principle could run this, but they don't.
That kind of leads me to a discussion of why we are doing this, and the benefits of the TOP500. Besides the historical perspective, having done it for so many years, there is the large body of data it provides in terms of architectural approaches that have been tried, what sort of performance they got, the software stack, et cetera. There's also a predictive quality to it: if you study it deeply enough, you can start being able to predict how a particular architecture might perform for various applications.
There's definitely a community aspect of Top 500.
It rallies the community behind a really good set of performance benchmarks.
And that leads to number four, which is the evolution of benchmarks.
So there's definitely HPL, but there's also HPCG and HPL-MxP and Green500 and several other derivatives that have come about, which we will cover in this podcast.
Now, this list is quite static, yet regardless of that, looking at the top 10, looking at this list, it does represent a tremendous achievement in advanced computing. And I think we should note, too, Shahin, that recent lists have shown a lot of changes. They've been surprisingly changeable, if you will, with significant additions. And a big thing we've seen on lists in recent years is the rise of HPC in Europe. To the degree we see a change this year, as noted, we continue to see that in a European system joining the exascale club, while Europe is also well represented throughout the top 10 outside of the top three systems, we should say. That's a very good point.
So maybe with that we can move to HPCG. That's the conjugate gradient benchmark. It's a sparse
matrix iterative solver. It's a much, much harder problem to solve than the HPL dense matrix
factorization benchmark is. So you would not expect to get as good a fraction of the peak performance from this as you might from the dense linear solver. And indeed, that's what you get. We've talked about this in previous coverage of the TOP500 numbers. But you're doing well if you get 1% of your peak performance
with HPCG. Whereas for HPL, if you're doing less than 90%, you're probably not doing as well as you
could have. So that's a very sharp contrast between the two benchmarks.
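To make that contrast concrete, here is a minimal Python sketch of the fraction-of-peak arithmetic. The system numbers are hypothetical placeholders, not official list entries; only the rough 90% and 1% fractions come from the discussion above.

```python
# Illustrative fraction-of-peak arithmetic; the numbers below are placeholders,
# not official TOP500 or HPCG results.

def fraction_of_peak(achieved_flops: float, peak_flops: float) -> float:
    """Return achieved performance as a percentage of theoretical peak."""
    return 100.0 * achieved_flops / peak_flops

rpeak = 2.0e18   # hypothetical peak of 2 exaflops
hpl   = 1.8e18   # a dense HPL run often lands near 90% of peak
hpcg  = 2.0e16   # a sparse HPCG run is doing well near 1% of peak

print(f"HPL  efficiency: {fraction_of_peak(hpl, rpeak):.1f}% of peak")
print(f"HPCG efficiency: {fraction_of_peak(hpcg, rpeak):.1f}% of peak")
```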
Okay, so looking at the HPCG top 10, does anything jump out at you? We do see an interesting system at number six on the HPCG top 10 list, Shahin. This is a SoftBank system. I'm wondering if you have any thoughts there.
Yeah, there are two new entrants to the top 10 with HPCG. One is, as you mentioned, number six, and that's a SoftBank system in Japan, and it's a DGX B200 with Xeon Platinum configuration. The notable thing about that is that it gets 2.5% of the peak performance, and it is ranked number 17 on the TOP500 in general, but number 6 on HPCG. There's another new system, and that's number 9, and that's an ExxonMobil system in the U.S. It's an NVIDIA GH200-based system, and it gets to be number 15 on the TOP500 and number 9 on HPCG, and it gets 1.2% of the peak.
Now, the star of this list is really the Fugaku system in Japan, and that continues to be number two here, while it is number seven on the TOP500, and it gets 3% of the peak. So in principle, it's like three, four times more efficient than its general competition.
Shahin, help us understand that number, 3%. As you say, it's exceptional, but what does that mean about the remaining 97%? Is the system unengaged? What does this mean?
Well, in short, yes. It means you've got all that capability, but you're unable to actually take advantage of it. Some of this has to do with the compute intensity of your algorithm: how many pieces of data do you move, and how many computations do you do on them. So if you move a few pieces of data and do a lot of computation on them, that benefits the system and allows it to perform way better. But as you get into more garden-variety, bread-and-butter calculations of HPC, you kind of run out of those. Now, if you're doing deep learning, then you're back in matrix-multiply land, and you're good to go and can get a pretty good fraction of the peak. But for general scientific and engineering work, Fugaku was optimized for that: they built a system that really favors data bandwidth as well as computation, to try to get a more balanced system.
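As a rough illustration of that compute-intensity point, here is a back-of-the-envelope sketch comparing a dense HPL-style factorization with a sparse HPCG-style matrix-vector product. The matrix size and nonzero counts are made up for illustration; only the general shape of the comparison is the point.

```python
# Back-of-the-envelope arithmetic intensity (flops per byte moved), assuming
# 8-byte double-precision values; all sizes here are made-up illustrations.

n   = 10_000        # hypothetical dense matrix dimension
nnz = 27 * n        # hypothetical nonzeros, e.g. a 27-point stencil row pattern

# Dense LU factorization (HPL-like): ~(2/3) n^3 flops over ~n^2 matrix values.
dense_flops = (2.0 / 3.0) * n**3
dense_bytes = 8.0 * n**2
print(f"dense factorization: ~{dense_flops / dense_bytes:,.0f} flops per byte")

# Sparse matrix-vector product (HPCG-like): ~2 flops per nonzero, while each
# nonzero drags along a value plus index traffic from memory.
sparse_flops = 2.0 * nnz
sparse_bytes = 12.0 * nnz   # ~8-byte value + ~4-byte column index per nonzero
print(f"sparse mat-vec:      ~{sparse_flops / sparse_bytes:.2f} flops per byte")
```

The dense case can reuse each matrix entry many times, while the sparse case does only a couple of operations per value moved, which is why memory bandwidth, not peak flops, limits HPCG-style work.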
Now, the counterargument sometimes is that if you reduce the top peak performance, then you are eligible to get a better fraction of it, and that can be true. You can see some correlation that when your peak performance is way high, achieving a higher percentage of it becomes more difficult. But it really comes down to data bandwidth and performance. Now, the RIKEN system has been on the list for years now, five, I want to say, four or five. And I know you're a fan of that architecture. It must be rewarding for you, for that reason. Yeah, exactly. Okay.
Absolutely. Yeah. No, I knew this was going to be a long-lasting system, and it is. And I think it's going to continue to figure prominently until the next generation shows up. Okay. Should we move on to the next benchmark? Let's do that.
Supercomputers are driving the world's most exciting innovations, but all that power generates a ton of heat. For over 24 years, CoolIT has cooled the world's fastest and most advanced supercomputers, including those on the TOP500 list.
With our unrivaled liquid cooling expertise, extensive technology portfolio, and in-house capabilities, CoolIT is enabling what comes next in computing.
Visit coolitsystems.com to check out what it really takes to cool today's most demanding workloads.
Now, this one is the HPL-MxP, meaning mixed precision, top 10. And this is, as I understand it, Shahin, a good reflection of AI performance. Is that the case?
Well, I kind of consider it more as the ability to take advantage of what an AI-optimized chip gives you to do traditional HPC. Because with MxP, you are leveraging mixed precision, but you're getting the exact same answer as you did before.
Basically, you do low precision factorization, and then you iteratively refine it,
with high precision. So you don't use your expensive high precision hardware until you actually need
it. You use low precision to get close to the answer and then use high precision to finalize the
answer. So it turns out that you may have to actually do more computation to get there, but you're
doing it so much faster that you are way, way, way ahead. And the table that Jack Dongarra showed at the press conference demonstrated that really well.
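As an editorial aside, here is a minimal Python/NumPy sketch of that idea: factor and solve in low precision, then refine the answer using high-precision residuals. This illustrates the general technique, not the actual HPL-MxP benchmark code, and it uses float32 as a stand-in for the low precisions real systems use.

```python
# Sketch of mixed-precision iterative refinement (the idea behind HPL-MxP).
# float32 stands in for low precision; float64 is the "expensive" precision.
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)

# Cheap initial solve entirely in low precision.
x = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)

for _ in range(10):
    r = b - A @ x                          # residual computed in high precision
    if np.linalg.norm(r) <= 1e-12 * np.linalg.norm(b):
        break                              # already near double-precision accuracy
    # Correction solved cheaply in low precision, applied in high precision.
    dx = np.linalg.solve(A.astype(np.float32), r.astype(np.float32))
    x += dx.astype(np.float64)

print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

In the actual benchmark, the low-precision factorization is typically done once and reused throughout the refinement; the sketch re-solves each iteration only to stay short.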
Yeah. And as we all know, many forms of AI compute require less precision. So there we are. Now, the top 10 for this benchmark, to a great degree,
reflects the top 10 for the LINPACK benchmark. We have El Capitan, Aurora, and Frontier in the top three positions, and familiar system names like LUMI at number seven, which is number nine for the LINPACK. Fugaku comes in at eight, and as we said, it's seven on the LINPACK, and the Leonardo system. And then we see Jupiter Booster, the new exascale system. It's four on both benchmarks.
And we see the SoftBank system again at number five.
Yeah. So the SoftBank system moves to number five on this benchmark versus number six on the HPCG. And it is a new entrant. So is the Jupiter Booster that you mentioned. That also is a new entrant at number four.
And what is notable with the HPL-MxP benchmark is that the top three are all U.S.-based DOE lab systems, starting with El Capitan at Lawrence Livermore. Aurora is number two this time, at Argonne, and it does particularly well on this benchmark. And then Frontier is at number three. All three are above 10 exaflops. El Capitan is at 16.7 exaflops, and then the other two are hovering around 11.5 exaflops. Yeah. And I think we should note, just in the case of El Capitan, number one for this MxP benchmark: 16 billion billion calculations per second, which is just beyond comprehension. It's pretty mind-boggling.
What is also notable on this benchmark, as we said, is that you kind of leverage what you've got; this is the hallmark of what HPC folks do. Whatever tools you give them, they will try to harness them. And now they've got fast, low-precision arithmetic available to them, so they use it to get speedups. The El Capitan system is nearly 10 times faster, wall-clock faster, doing HPL when it uses mixed precision compared to pure 64-bit. So this stuff really can benefit you if you can take advantage of it. The leader is really SoftBank: the SoftBank system is 24.4 times faster doing mixed precision compared to pure double-precision, 64-bit arithmetic. So the next thing that
they covered was the 10 largest new systems in 2025, including what happened in June as well as now
in November. So the top 10 largest new systems are all in the top 100, really the top 80.
And only two of them are in the U.S.
The other eight are in other countries.
The highest performing one is at number 17, and that's the SoftBank system that we covered.
And number 79 is a TELUS system in Canada.
Sovereign AI Factory is the name of the computer.
Yeah.
After the SoftBank system, we have, at 18, a system in Saudi Arabia, an HPE Cray NVIDIA system. We have systems in India, Taiwan, Brazil, as you say, all over: Israel and, as you mentioned, Canada. So a lot of growth in other regions of the
world. So the other topic that they raised was the adoption of brand new technologies that are
also showing up on this list. The obvious candidate there is the Blackwell chips from
Nvidia. Where do they show up? Yes, and not at very high numbers here: 17, 56, 57, you know, and then in the hundreds and 200s. But I think that's a reflection of the larger systems; you know, it takes time for a new chip like Blackwell to be incorporated into the biggest and most expensive systems, which take the longest time to install and deploy. Would you agree? Yeah, it's interesting to try to figure out what's going
on here because obviously Blackwells have been shipping for a while and you'd expect them to
show up on the list, and indeed they do. But there are only eight new ones, and they're all outside of the U.S. So, like, what's going on? It could be that all the Blackwells are going to AI sites and commercial sites that have no time to actually run HPL and participate. Or it could be that these are all large systems, and it takes a while to build them, deploy them, orchestrate them, as you mentioned. That could be a reason as well. In other words, we're anticipating we'll see a stronger presence of the Blackwell chips at the higher end of the list in the lists to come next year.
So the other thing that was covered, of course, was the Green 500, the benchmark that's been
around for about 20 years now, and a couple of new entrants there as well.
Yeah, we have the Cairos BullSequana XH3000 system, with the Grace Hopper Superchip, and then the Levante GPU extension, another BullSequana system powered by the Grace Hopper Superchip. Yeah, in fact, the top three are all Eviden Grace Hopper systems. Numbers one and two do better than 70 gigaflops per watt. That's pretty amazing, that with one watt of energy you get that many calculations. And then numbers three to 10 are all above 66 gigaflops per watt. That's the threshold before you become interesting. And of the top 10, five of them are Nvidia, four of them are AMD, and one is just pure CPU without a GPU. Yeah, Xeon chips.
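For a sense of what those efficiency numbers mean in practice, here is a quick back-of-the-envelope calculation. The 70 gigaflops per watt figure comes from the discussion above; the one-exaflop system size is just an illustrative assumption.

```python
# Back-of-the-envelope Green500 arithmetic: what would 70 Gflops/W mean for the
# power draw of a hypothetical one-exaflop machine?

gflops_per_watt = 70.0   # efficiency of the top Green500 entries, per the discussion
rmax_gflops = 1.0e9      # hypothetical 1-exaflop system = 1e9 gigaflops

power_megawatts = rmax_gflops / gflops_per_watt / 1.0e6
print(f"~{power_megawatts:.1f} MW to sustain 1 exaflop at 70 Gflops/W")
```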
Okay, so looking at vendor system share, Lenovo actually comes in at number one: 28% of the systems on the TOP500 list
are from Lenovo, although we don't see them toward the top of the list. Number two, not surprisingly,
is HPE at 25.2%, and then Eviden at 11.4%. And I think it's commonly held that HPE and Eviden are the two leading vendors for the largest, most powerful supercomputers in the world.
Yeah. So Lenovo being a Chinese company, they stopped participating with all the systems that they might have in China. And they have traditionally had a lot of systems on the TOP500, as has Inspur, as has Huawei, and they've collectively stopped participating. So the number of systems from Lenovo is kind of gradually going down. They do still submit systems outside of China, but those are not generally in the top 10. So in terms of system count, it's pretty impressive that they're still number one. In terms of performance, HPE wins that hands down, because all the big, big systems, many of them really, are HPE systems, and then those that aren't are mostly Eviden. So in terms of performance share, number one is HPE, number two is Eviden, number three is other, and then others show up in smaller fractions.
Yeah, HPE is number one at 46% of aggregate computing performance on the TOP500 list. Again, as you say, followed by Eviden at 13%. They don't give the percentage for Lenovo, but it's substantially smaller.
My guess, it looks like about 6 or 7%.
Yeah, that's what I recall.
And then if you look at geography, again, kind of similar.
When it comes to performance, Europe wins because they've got quite a few, all boosted by Jupiter, the system in Germany. But in terms of count, it's all about the same: Asia, Europe, and North America are about neck and neck, which sort of says that Europe is getting more of the high-performance ones, the U.S. is getting more systems at lower performance, and Asia is somewhere in between. So, Shahin, possibly not the most
scintillating, exciting list. But I know you're a proponent of the top 500 list. Those who want to
dig into it can get excellent information about trends, developments, and who's doing what. But as far
as newsworthiness, I guess we could say this is not the most exciting list we've ever seen.
Well, it is always exciting and interesting to me. And I really encourage anybody who has a bit
of an interest in this to go to the top500.org site and download the full spreadsheet. It's got obviously at least 500 rows and many, many columns, and you can really go to town analyzing all sorts of different attributes that would be very, very interesting.
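For anyone who wants to do exactly that, here is a rough sketch of slicing the downloaded list with pandas. The filename and column names are assumptions for illustration; check the headers of the actual file from top500.org before running.

```python
# Rough sketch of exploring the TOP500 spreadsheet with pandas.
# The filename and column names below are assumptions; adjust them to match
# the headers in the file you actually download from top500.org.
import pandas as pd

df = pd.read_excel("TOP500_202511.xlsx")   # hypothetical filename

# Systems per country (assumed column name: "Country").
print(df["Country"].value_counts().head(10))

# Aggregate Rmax per vendor (assumed columns: "Manufacturer", "Rmax [TFlop/s]").
by_vendor = df.groupby("Manufacturer")["Rmax [TFlop/s]"].sum()
print(by_vendor.sort_values(ascending=False).head(10))
```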
Now, one thing I should say before we close is that El Capitan is number one
in all three benchmarks that we mentioned: HPL, HPCG, and HPL-MxP, the mixed precision one. So that continues to be the king of the hill. Yeah, a tremendous system. And as you say, they've actually been developing the system and improving performance.
Right. So really, the big news is El Capitan number one in all three. China continues to be
missing in action. We hope that they will participate again and join the community again.
I believe it was 2016 that China dropped out of the TOP500, and it's really unfortunate.
Wish they would come back. Right on.
It's never been known, Shahin, whether they're covering for falling behind the West in supercomputing firepower, or they're covering that they've really exceeded us. We really don't know.
Okay, well, thanks so much, Shahin, as always, for sharing your insights, and I will see you at
the conference this week in St. Louis. Absolutely. Always a pleasure. Thank you all for listening,
and we'll be reporting from SC25. Take care. That's it for this episode of the At-HPC
podcast. Every episode is featured on insidehpc.com and posted on orionx.net. Use the comment section or tweet us with any questions or to propose topics of discussion.
If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The at-HPC podcast is a production of OrionX in association with InsideHPC.
Thank you for listening.
