@HPC Podcast Archives - OrionX.net - @HPCpodcast-102: TOP500 at ISC25 Conference
Episode Date: June 10, 2025
The new TOP500 list of the most powerful supercomputers was released today at the ISC conference, with a new addition to the top 10. Tune in as Shahin and Doug go through the list with their commentary and analysis as they go over the details, key takeaways, how continents, companies, and architectures fare, and cover the full suite of benchmarks: HPL, Green500, HPCG, HPL-MxP (AI), IO500, and MLPerf.
Transcript
CoolIT Systems is proud to cool the world's fastest and most advanced supercomputers on the TOP500 list.
For over 24 years, we've been the leading liquid cooling provider for the world's top HPC and AI systems.
Ensure performance and reliability for your next-gen AI systems with the world leader in liquid cooling.
Explore more at CoolITSystems.com.
There's a serious point here.
And it's that Europe continues to expand its presence
with half of the top 10 and eight of the top 20.
And that's 1.2%.
So not too bad, which is really funny.
When you think of 1.2% of something
and you think it's good.
Based on total aggregate power of the entire top 500 list, the top three have about 20%
of that total.
As well as China not participating and thus we really don't know where they stand, although
they also have been under significant trade restrictions.
And if you go down the list,
if you're basically doing over 60 gigaflops per watt,
you're doing pretty well.
From OrionX in association with InsideHPC,
this is the At HPC podcast.
Join Shahin Khan and Doug Black
as they discuss supercomputing technologies
and the applications, markets, and policies that shape them.
Thank you for being with us.
Hi, everyone. I'm Doug Black at InsideHPC and with me is Shahin Khan at OrionX.net.
And this is the At HPC podcast.
Our topic for today: the TOP500 organization has released its new list
of the world's most powerful supercomputers.
This is done twice annually at the two big HPC conferences
of the year.
ISC is going on in Hamburg, and obviously the other conference is SC in November.
The top-line takeaway for this one, Shahin, I would say, is that the list has kind of reverted to long-term form. Recently, there have been some rapid changes, in the top 10 especially, that were pretty surprising. But this time around, there's just one new system in the top 10 part of the list. So coming in at number one is El Capitan, the exascale supercomputer that is
at Lawrence Livermore National Lab. The number two system is Frontier, which in 2022 came out as the first certified exascale system.
And Aurora at number three is at Argonne National Lab in Illinois.
Those are the top three systems.
Looking at El Capitan, it comes in at 1.7 exaflops.
This is an HPE Cray EX255A system. It's powered by AMD EPYC CPUs and AMD Instinct MI300A GPUs. So a very impressive
system. The other important takeaway is that the US has a somewhat diminishing presence at the top
of the list with six of the top 10 and 11 of the top 20 being from outside of the United States.
And Shahin, that's maybe a parochial way to look at things.
But there's a serious point here, and it's that Europe continues to expand its presence
with half of the top 10 and eight of the top 20 systems.
We also see a top 20 system from South Korea, their system coming in at number 18, and Japan with two top-20 systems: at number 7 is the formerly number-one-ranked Fugaku supercomputer, and there's another at number 15.
Now, skewing the entire picture is that, as we all know,
China long ago, I believe in 2017,
stopped participating in the list.
We've reviewed the top 500 together now, Shahin, you and I, for several years,
and we've often noted that China has a host
of exascale class systems.
We just don't know how many or how they are utilized.
But getting back to the US with a new administration
in place and supercomputing budgets and plans
at the US National Labs in flux,
leadership class supercomputing might continue
to diminish relatively speaking,
at least in sheer numbers. But we should also note that the top three systems are American,
and their aggregate compute power represents a significant portion of the aggregate power of the
entire top 500 list. Now, Shahin, you've done some historical research. I believe currently the US has 174 of the top 500.
This is up pretty significantly from a low point of 109 in 2019.
And in 2006, the US had 309 systems.
So it's fluctuating.
The US now has 35% of all the systems on the list, and 28% of the top 50.
Europe now has 30% of the entire
list.
I think that's the size of it.
The investments in the US primarily through DOE have obviously yielded beautiful fruits.
The top systems are all US systems, and there's a lot of architectural experimentation and implementation that is providing excellent
insight. But the Chinese stopped participating, as you mentioned, and the Europeans started investing in it properly, in a big way, starting with the EuroHPC Joint Undertaking effort that started in 2018. And obviously it is also showing results, albeit their strategy seems to be different: they are distributing their systems a little bit more evenly across their geography.
So both of them are, in my view, challenges to US leadership in their own way, with Europeans
actually investing and providing a lot of resources and performance to a very wide range of scientists,
as well as China not participating, and thus we really don't know where they stand, although
they also have been under significant trade restrictions.
And it's going to be difficult for them to really come up with these big systems, so
who knows exactly what's there.
And the systems that they did have that were homegrown had excellent performance on HPL, but architecturally they looked like they would not perform well on much else. So there is that sort of special-purpose aspect to them.
Yeah. As I was saying, there are serious questions about the utility of those Chinese systems.
By the way, the one new system in the top 10 is from Europe. It's the JUPITER Booster system, a BullSequana machine. This will be part of the JUPITER system being installed in Germany,
which is expected to be Europe's first exascale-class supercomputer. But Europe has not been fixated on achieving that top performance level as much as, I think, what you're talking about: more of a distribution of supercomputing power throughout the continent.
We also note that England or the UK, pardon me, has continued to sort of de-Brexit itself
where HPC is concerned.
They are part of the EuroHPC effort.
Yeah.
The BullSequana system at JUPITER Booster is from Eviden, and as you've heard in our HPC News Bytes, Eviden is being acquired by the French government; more details there. But it's a great system. JUPITER is a modular, heterogeneous system, and Booster is the big computational capability there; it is itself projected to exceed one exaflop in 64-bit performance.
And of course, much more in lower precision.
So Shahin, obviously the list looks at more than LINPACK performance.
And I know you've dug into other aspects of what-
Yes.
What TOP500 announced.
Yes, for sure.
Let me just start with countries
because we already talked about it a little bit.
As you mentioned, the US leads in terms of supercomputer count, and it has about 35% of the total at 174
units. Europe collectively has about 30%. And then China has 9.2%, even now, despite not having
participated for a while. Japan has about 8%. And then we go to South Korea at 3%
and Canada at 2.6%.
And then notably Brazil shows up as 1.8%.
In terms of performance, however, it's
a pretty slam dunk for the US.
It has about 50% of the total performance, 48.4%
to be exact.
Now the total aggregate performance across the entire list
comes in at 13.84 exaflops.
That's up from last time, six months ago, when it was 11.72.
So there's a lot of improvements down in the list as well,
even though we kind of focused on the top 10.
The average concurrency, the average number of cores per system, also went up. Last time around it was about 257,000 cores per system, and this time it's about 275,000 cores per system. So it is high time to start using kilocores instead of just cores; that makes it 257K then compared to 275K now.
All right, and then moving to vendors.
In terms of supercomputing count, Lenovo has 27%,
HPE is just behind them at 26.4%,
and then Eviden follows with 11%, Dell at 8.2%, and then notably again, NVIDIA shows up at 5.4%.
So these are systems branded as NVIDIA systems. In terms of performance, HPE has a commanding lead
at 48% of the total aggregate performance, primarily because the top systems are all HPE systems. And then that is again followed by
Eviden because they also do big systems and they have 12.5% of the total aggregate. I expect that
Eviden's share might improve if the European systems come online, but HPE is winning deals
left and right as well. So it's going to be a nice competition to watch.
Shahin, based on total aggregate power of the entire top 500 list, the top three have about
20% of that total. So yeah. You know, when you have a big one like El Capitan or Frontier or Aurora, it all adds up. So you get the big chunk.
It adds up quickly and hugely.
That's right. That's right. So now we get into accelerators.
Obviously, no surprise, NVIDIA leads
with 39.6% of all the systems and then AMD with 5.2%.
What this also tells you is that there are 265 systems, more
than 50%, that do not have any accelerators. So CPU-only continues to be a very big, important vehicle for HPC, especially if you go down the list
a little bit. In terms of interconnects, InfiniBand has now exceeded 50%
of all systems; it has about 54% of all the interconnects. And Gigabit Ethernet follows at 33%.
Omni-Path carries on at 6.6%.
And it is up from last time.
And they just announced new switches last week.
And that's a pretty interesting development there as well.
And then custom and proprietary is another 6%.
If we go to CPUs, also no surprise, Intel has dropped.
But it still has a pretty big lead at about 59% of all
systems, followed by AMD at 34%, which is obviously
up in a good, healthy way.
So AMD continues to gain, but Intel continues to lead.
That's the story there.
And then finally, we can look at cores per socket. That's kind of a good indication of the kind of CPUs that HPC types tend to pick.
And 106 systems on the list have 64 cores per socket.
What follows after that is 79 systems with 24 cores per socket. So we're not seeing 96 or 128 or some of these other numbers; 64 and 24 appear to be the popular ones.
Supercomputers are driving the world's most exciting innovations,
but all that power generates a ton of heat.
That's where CoolIT Systems comes in.
For over 24 years, we've been cooling the world's
fastest and most advanced supercomputers, including systems on the top 500 list.
We lead the way in liquid cooling for HPC, AI, and next-gen platforms. Heading to ISC 2025?
Come check out our expert presentations to see what it really takes to cool today's most demanding workloads
and explore what's coming next for AI and HPC cooling.
Visit coolitsystems.com to learn more.
So the benchmarks that we just talked about are all HPL,
High Performance LINPACK, which is a dense matrix solver
of giant sizes to get to the performance that is reported.
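For a rough sense of what HPL measures, here is a minimal Python sketch; it is not the real benchmark, which is a distributed MPI code run at enormous problem sizes, just the same idea in miniature: time a dense 64-bit solve and convert the time to flops using the customary LINPACK operation count.

```python
# A toy version of what HPL measures: time a dense 64-bit solve of Ax = b
# and convert to flops. The operation count below is LU plus two triangular
# solves; HPL's own formula differs only in the O(n^2) term. The tiny n is
# purely illustrative; real TOP500 runs use problem sizes in the millions.
import time
import numpy as np

n = 2000
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)  # LU factorization plus triangular solves
elapsed = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
print(f"~{flops / elapsed / 1e9:.1f} gigaflops on this machine")
```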
Other benchmarks are HPCG, for conjugate gradient, which tends to establish a lower end of the performance range. It's more indicative of everyday, garden-variety codes, and it's not as well behaved as HPL.
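For a sense of what a conjugate gradient iteration actually looks like, here is a minimal CG solver on a sparse test matrix; the real HPCG adds a multigrid preconditioner, halo exchanges across nodes, and formal verification, so this is only the kernel idea, sketched with those caveats.

```python
# A minimal conjugate-gradient solver, the kernel family HPCG exercises.
# Sparse matrix-vector products like A @ p are memory-bandwidth bound,
# which is one reason HPCG numbers sit so far below HPL.
import numpy as np
import scipy.sparse as sp

n = 10_000
# Diagonally dominant tridiagonal test matrix: symmetric positive definite
A = sp.diags([-1, 4, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x = np.zeros(n)
r = b - A @ x            # initial residual
p = r.copy()             # initial search direction
rs = r @ r
for i in range(1000):
    Ap = A @ p
    alpha = rs / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = r @ r
    if np.sqrt(rs_new) < 1e-8:
        break
    p = r + (rs_new / rs) * p
    rs = rs_new

print(f"converged in {i + 1} iterations, residual {np.sqrt(rs_new):.1e}")
```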
So the performance that you see on HPCG is dramatically lower than HPL,
as we'll discuss in a second. And then you have Green500, which is the same HPL benchmark but divided by the power that it used, so it's really gigaflops per watt. There are a few other benchmarks in the industry, like IO500, which usually gets announced around the same time. When it comes out, we'll talk about it, but I'll reference a little bit of their previous list.
And then there's MLPerf, which is a suite of AI benchmarks; it was the subject of one of our podcast episodes, when we had David Kanter of MLCommons on as a special guest.
That was episode 91.
Go look it up. It explains a lot of what
they do. So with that introduction, maybe I can talk about Green500 first, because that's just HPL as compared with the wattage that is used. The number one system on that list is JEDI. That's in Germany. JEDI is number 261 on HPL, but it is number one on Green500, and it delivers 72.7 gigaflops per watt.
So that's the gold standard. And if you go down the list, if you're basically doing over 60 gigaflops
per watt, you're doing pretty well. Number two is ROMEO in France, and that's 70.9 gigaflops per watt. Number three is Adastra, also in France,
and that delivers 69 gigaflops per watt.
And then number four is Isambard-AI in the UK, and that's 68.8, almost 69 gigaflops per watt.
Now the number one system is Eviden
and it uses a Grace Hopper 200 system and InfiniBand.
Number two is also Eviden, also Grace Hopper 200. And number three is an HPE AMD MI300A Slingshot system. And number four is also HPE, but again, it's a Grace Hopper 200, except now it's Slingshot instead of InfiniBand.
So basically Grace Hopper, these superchips from NVIDIA, seems to be the sweet spot for energy efficiency.
So I expect MI300A might also perform quite well.
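The Green500 arithmetic itself is simple enough to sketch. The numbers below are hypothetical, chosen only to land near JEDI's reported 72.7 gigaflops per watt; the list publishes the actual measured Rmax and power.

```python
# Green500 efficiency is just HPL Rmax divided by average power during the
# run. The inputs below are hypothetical, picked only to land near JEDI's
# reported 72.7 GFlops/W; the list publishes the measured values.

def gflops_per_watt(rmax_pflops: float, power_kw: float) -> float:
    """Convert petaflops and kilowatts to gigaflops per watt."""
    return (rmax_pflops * 1e6) / (power_kw * 1e3)

score = gflops_per_watt(rmax_pflops=4.5, power_kw=62.0)  # hypothetical system
print(f"{score:.1f} GFlops/W")  # ~72.6; anything over 60 is doing pretty well
```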
So Shahin, as you went through that, I noted all four of the top four systems on the Green500 list are in Europe.
I'm curious where the US shows up on the list for the first time.
Well, the US shows up prominently in terms of the vendor list, but on the list itself,
number 10 actually is the first US system that shows up on the list, and that's Henry
at the Flatiron Institute that we have discussed on this show before.
It was higher up in the list before.
It has now dropped to number 414 on the top 500 itself,
but it shows up as number 10 on Green500.
Okay. Then we go to HPCG. As I said,
HPCG is a really difficult benchmark and the fraction of the performance that you get is
really pretty low. And we'll demonstrate that in a second. So the number one system is El Capitan,
which is also the number one on the top 500.
And whereas on the top 500 it does 1.7 exaflops, on HPCG it's only doing 17 petaflops. So that's 1% of the performance that it gets for LINPACK.
The number two system is Supercomputer Fugaku, one of my favorites, as you know, and you
will see why in a second.
Because while it's doing "only," in quotes, 442 petaflops on LINPACK, it's doing 16 HPCG petaflops, right behind El Capitan. And that's 3.61% of the performance that it gets on LINPACK. And that's the high-water mark. So imagine that you're building a supercomputer for one application, and then you can only get 3%, or 1%, or less than 1%, for some other garden-variety application. So that kind of shows you how you might be able
to architecturally optimize for some apps
rather than others.
And it's also why I was saying that the Chinese systems that were doing really well on LINPACK didn't look like they could do well on much else.
Those guys would have a really difficult time doing this,
like even worse than the ones that we're seeing.
Number three is Frontier,
and it does 1.3 exaflops for LINPACK.
It does 14 petaflops for HPCG.
That's just over 1% of the performance.
Number four is Aurora.
Aurora only gets 0.55% of its LINPACK performance on HPCG, and its HPCG number comes in at 5.6 petaflops.
Number five, and we stop there, is LUMI. LUMI is in Finland. And it is doing 379 petaflops for LINPACK and 4.6 petaflops
in conjugate gradient.
And that's 1.2%. So not too bad, which is really funny, when you think of 1.2% of something and you think it's good.
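Those HPCG-to-HPL ratios can be reproduced from the rounded figures quoted in this discussion; they differ a hair from the official list values, which carry more digits.

```python
# HPCG-to-HPL ratios from the rounded figures quoted above (HPL in
# exaflops, HPCG in petaflops, both FP64). Official list values differ
# slightly because the published numbers carry more digits.
systems = {
    "El Capitan": (1.7,   17.0),
    "Fugaku":     (0.442, 16.0),
    "Frontier":   (1.3,   14.0),
    "Aurora":     (1.0,    5.6),
    "LUMI":       (0.379,  4.6),
}
for name, (hpl_exaflops, hpcg_petaflops) in systems.items():
    ratio = hpcg_petaflops / (hpl_exaflops * 1000.0)  # convert to same units
    print(f"{name:10s} {ratio:6.2%} of HPL")
```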
Well, Shahin, I know you're a big fan of the Fugaku system and you can really see why.
I mean, you extolled the beauty, the virtues of its architecture and it shows in that benchmark.
That benchmark really is where you can see the benefits of that sort of architecture.
Now, of course, it is also physically beautiful and packaged beautifully. I would like one
at home, really. The next one is HPL-MxP, which is the mixed-precision benchmark: it's doing HPL but doing it kind of the, quote, AI way, by using the lower-precision but higher-performance arithmetic that's available and trying to iterate your way to the same exact result. Now here we go way at the other end: you know, with HPCG we could really not do a lot of flops; in this one, we are doing a lot of flops, because we're using different hardware and different metrics. So number one is El Capitan.
It is obviously number one on the top 500.
It does 1.7 exaflops.
For HPL-MxP, it does 16.7 exaflops.
So it's 9.6 times more performance
compared to the 64-bit hardware.
Number two is Aurora.
It's doing one exaflop.
And for HPL-MxP, it does 11.6 exaflops, so that's 11.5 times. And then you go to Frontier; that's 8.4 times. There's a system in Japan, at AIST,
that is an HPE system with NVIDIA H200 GPUs and that's 16.3 times the speed up. Now,
if we're comparing one performance benchmark to the other, a lot has to do with how well the team that did the benchmarks tuned one versus the other.
So the speed ups are kind of indicative,
just a ballpark sort of a thing.
And if you look at the whole top 10 list, you see anywhere from four and a half times to 25 times.
But the typical range seems to be 6 to 10x. So really the takeaway is that if you use the lower-precision arithmetic that's available, doing something like HPL, you should expect to see 6 to 10 times more performance.
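The trick HPL-MxP uses can be sketched in a few lines of NumPy: solve fast in low precision, then apply cheap high-precision residual corrections until you recover the full 64-bit answer. This is only a schematic of the idea; the real benchmark keeps one low-precision LU factorization on tensor-core hardware and typically refines with GMRES.

```python
# A schematic of mixed-precision iterative refinement, the idea behind
# HPL-MxP: solve fast in float32, then iterate residual corrections in
# float64 until the answer matches a full 64-bit solve. (A real code
# keeps one low-precision LU factorization and reuses it; np.linalg.solve
# refactors each time, which is wasteful but fine for a sketch.)
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned on purpose
b = rng.standard_normal(n)
A32 = A.astype(np.float32)

# Step 1: cheap low-precision solve
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

# Step 2: refine in full precision until the residual is tiny
for sweep in range(10):
    r = b - A @ x  # residual computed in float64
    if np.linalg.norm(r) / np.linalg.norm(b) < 1e-12:
        break
    dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    x += dx

print(f"{sweep} corrections applied, relative residual "
      f"{np.linalg.norm(b - A @ x) / np.linalg.norm(b):.1e}")
```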
Interesting stuff. I mean, so much of the spotlight over the last
five months really has been around AI factories, these massive, tens-of-billions-of-dollars data
centers. But by the same token, there'll always be a place for these incredibly high powered
supercomputers with high-precision computing. And really, I think a lot of the movement right now, we have to say, is coming out of Europe. In fact, JUPITER is scheduled to be stood up, installed, and tested next year,
and could be Europe's first exascale system. Yeah, definitely. The time to redouble efforts
on supercomputing in the US is really now. And I know a lot of folks are working on that, as I see
reports from various committees.
So fingers crossed that leadership will be maintained
and spread across all the scientific community.
Yeah, it'll be very interesting to see how long it takes
before El Capitan is displaced by something more powerful.
Now, let me close with IO500.
As I said, the list isn't out yet,
but the last one that they did,
this was all about file systems. So you have obviously Aurora and a few other systems that show up on that list, but the file systems are generally led by DAOS at the top end. DAOS is Distributed Asynchronous Object Storage, an open-source storage software originally done by Intel, and it's an object store: it's got key-value, it's got erasure coding, it scales out really well, it can use NVMe; it's kind of a modern file system, if you will. And then right behind it is Lustre: well-established, also highly scalable, very robust, and you can see variations of that, including the ones from DDN. Also Weka; they have a really good protocol that they use as well. So that's what IO500 looks like.
And then MLPerf: the latest results were announced just a few days ago, with Blackwell leading in MLPerf Training by a good margin. And then on AI inference, there is the data center variety of inference and the edge variety. For AI inference in the data center, the H200 is the current leader from NVIDIA,
and we'll see how that evolves. So that kind of concludes my analysis of this with a little bit
of time that I had, but I'm delighted that this list continues. And it is the 65th edition of the top 500.
So divide that by two, and that's how many years they've been doing it.
It's just over... you can't do the math wrong when you say something like that. I think we should fire up El Capitan and figure out how many years they've been doing it. So it's over 32 years that they've been doing this.
And it's wonderful because it provides
a whole lot of historical data.
It has predictive power if you dig deep
and really understand what the benchmarks
are really trying to do.
It is a good informer of future technologies
and future architectures.
And of course, it's also a good indication
of what's going on in the market.
And a good comparative piece as well.
Absolutely.
Yeah, huge value to the industry.
I think it really binds the industry together quite nicely, and it's a tremendous effort by
the team that does it.
Okay.
Well, as always, Shahin, great to be with you on our long-form podcast, and we look forward
to the entire ISC show this week.
That sounds good.
All right.
Thank you, Doug. Thank you to our listeners. Take care. Until next time.
That's it for this episode of the At HPC Podcast. Every episode is featured on insidehpc.com
and posted on orionx.net. Use the comment section or tweet us with any questions or
to propose topics of discussion. If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The At HPC Podcast is a production of OrionX in association with InsideHPC.
Thank you for listening.