@HPC Podcast Archives - OrionX.net - @HPCpodcast-84: TOP500 at ISC24 Conference
Episode Date: May 14, 2024
The new TOP500 at the ISC24 conference is here, with a new addition to the Exaflop Club, a new HPL-MxP (AI) champion, a new top three in the Green500, and also a look at HPCG, the difficult benchmark that usually sets the lower bound of system performance. Tune in as Shahin and Doug go through the list with their commentary and analysis.
Audio: https://orionx.net/wp-content/uploads/2024/05/084@HPCpodcast_TOP500-ISC24_20240513.mp3
Transcript
Attending ISC 24 in Hamburg, Germany?
While you're there, be sure to stop by Lenovo booth D30 on the ISC exhibitor floor at the Congress Center Hamburg, May 12-16, 2024.
You can also visit lenovo.com slash HPC to learn more about Lenovo's HPC solutions.
We do, Shaheen, have a second exascale system.
That is, a second American exascale machine.
So arguably just as big a news item is Aurora's performance on the HPL-MxP, what used to be called HPL-AI.
The Green 500 results are very interesting on this list.
Major shakeup: the top three are new to the list.
That rubber band stretches even more than it did before.
The gap between the minimum performance
and the maximum performance becomes just huge.
So you really have to start reformulating problems.
From OrionX in association with InsideHPC, this is the @HPC Podcast. Join Shaheen Khan and Doug Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them. Thank you for being with us. Hi, everyone. I'm Doug Black. This is the @HPC Podcast.
I'm with my podcast partner, Shaheen Khan of OrionX.net.
Today's topic is the big news during the ISC conference in Hamburg, which is the new top
500 list.
We do, Shaheen, have a second exascale system.
That is a second American exascale machine. We know that China has
several exascale class systems, though we're not certain how many or what their true capabilities
are. In any case, the Intel Aurora system at the Argonne Leadership Computing Facility is the
number two system, and it squeaked over the exascale finish line with a LINPACK score of 1.012 exaflops.
And that, by the way, is just over half of its projected ultimate performance when installation and tuning at Argonne are completed.
Nevertheless, Aurora's LINPACK benchmark improved by more than 585
petaflops from November's top 500 list. And by the way, those 585 petaflops alone would place
third on this list. So Aurora is an HPE Cray architecture, Intel-powered with Intel Xeon CPUs and Intel GPUs, formerly called Ponte Vecchio, now the Max Series.
But that said, there's not a lot of change in the top 10 of the list, especially compared with last November, which had significant changes.
Remaining at number one on the list is Frontier at Oak Ridge National Lab with a LINPACK of 1.206 exaflops.
This again is HPE Cray EX architecture. It has 8.7 million combined AMD CPU and GPU cores,
and it utilizes the Slingshot fabric. It also has an impressive power efficiency of 52.93 gigaflops per watt, which puts Frontier
at number 13 on the green 500 list. The rest of the top 10 systems remain mostly the same. Number
three is the Eagle system installed on the Microsoft Azure cloud, making it still the highest
ranking cloud system on the top 500. The Arm-based Fugaku system at Japan's RIKEN Center for Computational Science is still at number four. At number five, we have the Lumi system. This again is an
HPE Cray system powered by AMD. This is in Finland at the CSC site in that country.
The one new system in the top 10, at number six, is Alps, the machine from the Swiss National Supercomputing Centre, with a score of 270 petaflops.
The number 7 system, Leonardo, is installed at the CINECA EuroHPC site in Italy.
This is an Atos Bull Sequana system.
We have MareNostrum 5 at the Barcelona Supercomputing Center. This again is a Bull Sequana system, at number eight. Number nine is the IBM Summit,
which has been on the list, I believe, since 2018. It actually is still operational past its planned
retirement date, and it is at number nine. And the Eos system is at number 10. This is an in-house NVIDIA DGX SuperPOD. And Shaheen, I think you wanted to make a comment on the number
11 system. Yeah, so the number 11 system is also new on the list. And that's the Venado system at
Los Alamos National Lab that comes in at 98 petaflops. It just didn't make it to number 10, but it is new, just like Alps is at
the Swiss Center CSCS. That kind of rounds out the top 11 anyway. So arguably just as big a news item is
Aurora's performance on the HPL-MxP, what used to be called HPL-AI. This is the mixed precision benchmark that solves linear equations to 64-bit accuracy but uses lower precision machinery to do it. It does a low precision factorization to get an approximation and then iteratively makes it more accurate. So on the HPL-MxP, Aurora is now number one in the world with 10.6 exaflops, and Frontier is pushed to number two with 10.2 exaflops, up from 9.95 exaflops, and Fugaku stands at number four with 2 exaflops. If you compare the exaflops for MxP with the 64-bit benchmark, you can see that it's a factor of about, oh, 7 to 10, depending on how well you did with each benchmark. For Aurora, it's about 10x. For Frontier, it's about, let's say, 8x. And for Lumi, it's about 7x. But that is really a variation with the kind of benchmarking you did on the 64-bit side compared to the lower precision. And of course, this whole benchmark is useful when you have GPUs that support mixed precision in hardware, like FP16, which is typically what it is. But the GPUs, of course, go way lower than that these days, with FP8, FP4, and even lower.
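To make that mixed-precision idea a bit more tangible, here is a minimal sketch of iterative refinement in Python, with NumPy and SciPy assumed available and float32 standing in for the low-precision factorization that real systems run on FP16 or FP8 tensor hardware. This is just the concept, not the HPL-MxP benchmark code.

```python
# Minimal sketch of the mixed-precision iterative refinement idea behind HPL-MxP.
# float32 stands in for the low-precision factorization (real systems use FP16/FP8
# tensor hardware); the refinement loop runs in float64. Not the benchmark itself.
import numpy as np
import scipy.linalg as la

def solve_mixed_precision(A, b, tol=1e-12, max_iters=50):
    # Factor a low-precision copy of A once; this is the cheap, fast step.
    lu, piv = la.lu_factor(A.astype(np.float32))
    x = la.lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iters):
        r = b - A @ x                      # residual computed in float64
        if np.linalg.norm(r) / np.linalg.norm(b) < tol:
            break
        # Correction solved with the low-precision factors, applied in float64.
        d = la.lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)
x = solve_mixed_precision(A, b)
# Relative residual should land at double-precision level despite the float32 factorization.
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```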
One other observation is that the systems in the top 10 list
get anywhere from 51% of their peak performance. That's for Aurora, as you indicated,
all the way to 82% of the peak performance, and that's Fugaku at RIKEN. When you look at the
entire 500 system list, that range of the fraction of peak that they get is anywhere from 23%
at the low end to 98%. And there are 21 systems that are getting more than 90% of that
peak performance. And those are the systems you want to look at and see how they architected it
or how they benchmarked it and how they managed to extract more performance out of the system. Because
you look at some of the systems in the top 10, and if you improve the fraction of performance,
some of those positions could swap. So a platform that facilitates that becomes really important.
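To see how positions could swap, here is a back-of-the-envelope sketch in Python. The Rmax numbers are the ones quoted in this episode; the peak values are back-derived from the quoted 51% figure for Aurora and an assumed roughly 70% fraction for Frontier, so treat this purely as an illustration.

```python
# Fraction of peak (Rmax / Rpeak) and a hypothetical rank swap, using figures
# quoted in this episode. Rpeak values are back-derived/approximate, not official.
aurora_rmax_pf    = 1012.0                    # 1.012 exaflops HPL
aurora_rpeak_pf   = aurora_rmax_pf / 0.51     # ~51% of peak, as quoted
frontier_rmax_pf  = 1206.0                    # 1.206 exaflops HPL
frontier_rpeak_pf = frontier_rmax_pf / 0.70   # assumed ~70% of peak

print(f"Aurora:   {aurora_rmax_pf / aurora_rpeak_pf:.0%} of peak")
print(f"Frontier: {frontier_rmax_pf / frontier_rpeak_pf:.0%} of peak")

# If Aurora reached the same ~70% fraction of its own peak, its HPL score
# would land ahead of Frontier's current number-one entry.
aurora_if_tuned = 0.70 * aurora_rpeak_pf
print(f"Aurora at 70% of peak: ~{aurora_if_tuned:.0f} PF vs Frontier at {frontier_rmax_pf:.0f} PF")
```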
Now, the Green 500 results are very interesting on this list. Major shakeup: the top three are new to the list. The number one spot
on the Green 500 was claimed by JEDI, the Jupiter Exascale Development Instrument. This is a new system. It's at the Jülich Supercomputing Centre in Germany. This is a EuroHPC site. It's 190th on the top 500, but it achieved an energy efficiency
rating of 72.73 gigaflops per watt while producing an HPL score of 4.5 petaflops. This is a Bull Sequana system powered by Grace Hopper Superchips.
Number two, the Isambard AI machine at the University of Bristol in the UK.
It has an energy efficiency rating of 68.8 gigaflops per watt
and an HPL score of 7 petaflops.
And at number three is the Helios system from Cyfronet out of Poland.
The machine achieved an energy efficiency score of 66.9 gigaflops per watt and an HPL of 19
petaflops. Yeah, that's one that's pretty impressive, getting 19 petaflops while producing
66.9 gigaflops per watt. Number four is Henri at the Flatiron Institute. That's a Lenovo system, also NVIDIA-based, this time on the H100, and that's getting 65.4 gigaflops per watt for an HPL of 2.9 petaflops. Number five is also kind of new. That's pre-ALPS, also at the Swiss Center CSCS.
That's getting 64.38 gigaflops per watt and produces 15.4 petaflops of performance.
If you fast forward to number 10, you're getting 58 gigaflops per watt. So the range
is between 58 to almost 73 gigaflops per watt. And if you're anywhere in that range, you're doing well.
So good for them.
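As a quick sanity check on those numbers, the efficiency figure is just HPL performance divided by measured power, so you can back out roughly how much power each machine drew during its HPL run. A small sketch using the figures quoted above; the derived power numbers are approximations, not reported values.

```python
# Back out approximate power draw from the Green500 figures quoted above.
# efficiency (GF/W) = Rmax (GF) / power (W)  =>  power = Rmax / efficiency.
systems = {
    # name: (HPL Rmax in petaflops, efficiency in gigaflops per watt)
    "JEDI":        (4.5,  72.73),
    "Isambard-AI": (7.0,  68.8),
    "Helios":      (19.0, 66.9),
    "Henri":       (2.9,  65.4),
}

for name, (rmax_pf, gf_per_w) in systems.items():
    power_kw = rmax_pf * 1e6 / gf_per_w / 1e3   # PF -> GF, then W -> kW
    print(f"{name:12s} ~{power_kw:7,.0f} kW for {rmax_pf:5.1f} PF")
```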
And obviously the newer chips are producing better energy efficiency than the older ones.
All the new ones are GH200, and the ones that are right behind them are H100s.
So it basically just shows that technology improves,
and you want to be on the leading edge
to get the energy efficiency. Are you currently at ISC24? If so, stop by booth D30 at the Congress
Center Hamburg to see what Lenovo is up to. This year, Lenovo's theme is transforming HPC and AI
for all, which aligns with Lenovo's commitment to simplifying the technology adoption process with proven expertise and delivering faster insights from the pocket
to the cloud. Visit booth D30 and chat with an expert about Lenovo AI solutions,
Neptune cooling technology, TrueScale for HPC, and more. Learn more at lenovo.com slash HPC.
Now, Shaheen, I know you're a strong advocate of the top 500 list. You believe
it has good value if it's looked at and analyzed in the right way. Share with us some other thoughts
you have on this year's list. Yes, I did spend some time in the spreadsheet. You have to live
inside of it. One thing is that the slope of performance improvement changed in 2013. It became a little
bit harder to get performance after about that installment of this list. The other thing is that
if you aggregate the entire performance of all 500 systems together, we are at 8.2 exaflops,
just starting to approach 10 exaflops, and we may get there by SC24. By comparison, it was 2018 when the total aggregate performance
just passed one exaflop, so about 10x within about six years. And then you have to go back to 2004
for the aggregate to have exceeded one petaflop. So a lot of performance improvement in a relatively
short time, and really a hallmark of HPC that has been improving faster
than Moore's law.
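To put a rough number on that remark, here is the doubling-time arithmetic for the aggregate milestones just mentioned. A small sketch; the milestone years are the ones quoted in this episode, rounded.

```python
# Doubling-time arithmetic for the aggregate TOP500 milestones quoted here:
# roughly 1 PF total in 2004, 1 EF total in 2018, and 8.2 EF on this list.
import math

def doubling_time_years(perf_start, perf_end, years):
    # Solve perf_end = perf_start * 2 ** (years / T) for the doubling time T.
    return years / math.log2(perf_end / perf_start)

print(f"2004 -> 2018 (1 PF -> 1 EF):   doubling every {doubling_time_years(1e15, 1e18, 14):.1f} years")
print(f"2018 -> 2024 (1 EF -> 8.2 EF): doubling every {doubling_time_years(1e18, 8.2e18, 6):.1f} years")
# Moore's law is usually cited as a doubling roughly every two years, so the
# 2004-2018 aggregate grew noticeably faster than that pace.
```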
The other thing that's interesting is the conjugate gradient benchmark, HPCG, that usually sets a lower bound on performance. This is kind of the minimum that you could expect to
get more or less. Of course, it could get even less. So the number one system on HPCG is Fugaku with 16 petaflops. And just for comparison,
it is getting 3% of its peak performance. For HPL, it gets 82%. So it tells you the massive
impact that your particular application can have on what kind of performance you can expect.
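To make that comparison concrete, here is the fraction-of-peak arithmetic for Fugaku under the two benchmarks. A minimal sketch, assuming a peak of roughly 537 petaflops for Fugaku; the HPL and HPCG scores are the ones quoted here.

```python
# Fraction of peak for the same machine under two benchmarks (Fugaku).
# Rpeak is approximate (~537 PF); HPL and HPCG scores are as quoted in the episode.
rpeak_pf = 537.0
hpl_pf   = 442.0   # roughly 82% of peak
hpcg_pf  = 16.0    # about 3% of peak

print(f"HPL:  {hpl_pf / rpeak_pf:.1%} of peak")
print(f"HPCG: {hpcg_pf / rpeak_pf:.1%} of peak")
print(f"Gap:  ~{hpl_pf / hpcg_pf:.0f}x between the two benchmarks on the same hardware")
```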
Number two on that list is Frontier. It's getting 14 petaflops. And get this, that's 0.8% of peak, not even 1%.
Wow.
If you go to Aurora, that's number three with 5.6 petaflops.
Number four is Lumi in Finland.
That's getting 4.5 petaflops.
That's 0.9%.
And number five is Alps, the new system that sits at number six on the top 500.
It's getting 3.7 petaflops,
and that's 1% of peak. So again, the range of performance that you can get from these systems
highly depends on how well behaved your application is. And as we move to accelerators
and GPUs and such, that rubber band stretches even more than it did before. The gap between
the minimum performance and the maximum
performance becomes just huge. So you really have to start reformulating problems into the kind of
instructions and data flow that match the underlying hardware. Another observation just on HPL itself
is what size of matrix you need to get the performance that you reported. Generally, the smaller the matrix size that you need to get the performance,
the better the system is or the better the benchmarking that went into it is.
So on the top 10, number 8 and number 10 have the smallest so-called N max numbers.
That's the size of the matrix.
Both of them are H100 systems.
Now, in our episode with John Shalf last week, he had a passing comment about N-half.
That's the size of the matrix that would give you half the performance that you got at the maximum.
And it sort of indicates the slope of performance as the size of the problem changes.
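As a rough illustration of what N-half captures, here is a small sketch that takes a handful of (problem size, achieved performance) points and interpolates the size at which a system first reaches half of its best measured performance. The numbers are invented purely for illustration; real N-half values come from actual HPL runs at multiple problem sizes.

```python
# Sketch of how N-half can be read off a set of HPL measurements: the problem
# size at which a system first reaches half of its best measured performance.
# The (N, performance) pairs below are made up purely for illustration.
import numpy as np

n_sizes = np.array([50_000, 100_000, 200_000, 400_000, 800_000, 1_600_000])
perf_pf = np.array([0.8,     2.1,     4.6,     7.9,     10.5,    11.8])  # petaflops

r_max  = perf_pf.max()
target = r_max / 2.0
# Linear interpolation of the problem size at the half-performance crossing.
n_half = np.interp(target, perf_pf, n_sizes)
print(f"Rmax ~ {r_max:.1f} PF, N_half ~ {n_half:,.0f}")
# A small N_half relative to N_max means performance ramps up quickly with
# problem size; a large one means you need a huge matrix to see the peak.
```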
Noteworthy that only 19 systems are reporting that.
So it has just fallen out of favor.
It's a lot of work to produce it.
So maybe they don't want to do that.
And that's kind of what we get.
The other things that we usually go through is the vendors and the countries and sort
of the system components.
Yes, let's do it.
So in terms of vendors, Lenovo is still number one
with 161 units. That's 32% of the total. And that's after they basically stopped reporting a
lot of the systems in China, but they only have 7.3% of the total performance. So it indicates
that a lot of the systems are lower down in the 500 list. HPE has 111 units, that's 22% of the total,
but 36% of the performance. And that really reflects their heavy presence in the top 10
and really the upper half of the list. Eviden is advancing. They now have 49 systems, that's
10% of the units and almost 10% of the total performance, also indicating that
they play at the top end. And then after that, you kind of go to Fujitsu with 6.9%, that's probably
entirely Fugaku related. And then Azure with 8.7%, again, because of the seven systems that
Microsoft has on the list. And again, kudos to Microsoft for participating in this program,
and I just want to salute them for doing it. In terms of countries, the US has 169 units.
That's 34% of the total in terms of units, but it's more than half of all the performance at 53.4%.
The EU has 160 systems, and that's 27% of the performance.
And then Japan has 29 systems, and that's 8% of performance.
And Shaheen, I think it's fair to observe that it would be nice if China were participating and we had updated information from there. I hope so.
Yes, I think they can really come back in, please.
And then in terms of system configuration, of course, we moved to GPUs some
years ago, and now you really can't get to the top without them. And when you look at what GPUs
are being used, 134 systems are using various renditions of NVIDIA GPUs, and that gives you 25.8% of all the performance that is there. AMD with MI250X,
because I don't think the MI300X has really made it onto the list yet, has 10 systems, which is 2%
of the total number of systems, but it's 21% of the performance. And that of course means
their presence is really in the top 10. And of
course, Frontier is a big chunk of that right there. Right. Including Frontier.
In fact, yeah. And if you look at interconnects, InfiniBand keeps improving, keeps increasing its
share. It now has 238 units, which is about 48% of total. And Ethernet is 194 systems, and that's 40% of total. But in terms of share
of performance, it kind of flips. Ethernet has 48% of performance, and InfiniBand has 40% of
performance. And you still have OmniPath with 34 systems and Custom, which includes really
Slingshot and Aries and a few other interconnects that have not been available as merchant products. And that's 26 systems. In terms of applications and industries,
it's typically research. There's a big bucket called other. I imagine that's all AI because
it has become even bigger than it was before. You get weather and climate research, cloud and service
providers, energy and geophysics, aerospace, computational chemistry, and a couple of systems in financial services. That basically indicates who is bothering to participate in this.
Participation is really good because it provides a lot of data, a lot of historical data that
allows us to look at system architecture. Again, in the
conversation with John Shalf, we were discussing things that were done years ago that may come back
in vogue. It might come back as another possibility to advance the state of the art. And having that
sort of historical data really helps you navigate through that. So that's some commentary on the list.
Excellent. Thanks for the analysis, Shaheen. I guess we can close by saying the most watched single item from this announcement was the Aurora system. We can congratulate the folks at Argonne,
at HPE, and Intel for getting across the Exaflop line, and good luck the rest of the way to two
Exaflops. Absolutely. And as I've said on this podcast before, I think a major objective of the exascale
programs at the national level is to build the expertise, to build these systems and navigate
through what are obstacles that you may or may not see. So if you don't see obstacles at all,
maybe you're not trying hard enough. And occasionally you're going to have a system
that produces challenges and you learn so much from all of that.
So I think the learnings at Aurora are fantastic.
And of course, it's been a pain all the past several years that they've had to go through.
But crossing through at one exaflop is really pretty amazing.
And onward and upwards, because I think they will do even better, fingers crossed,
by SC24 if they have the bandwidth to actually get back to the benchmark rather than run science.
All right. Very good. Well, good to be with you again. And to our audience, thanks for listening.
All right. Have a great ISC. Talk to you next time. Take care.
That's it for this episode of the @HPC Podcast. Every episode is featured on InsideHPC.com and posted on OrionX.net.
Use the comment section or tweet us with any questions or to propose topics of discussion.
If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The @HPC Podcast is a production of OrionX in association with InsideHPC.
Thank you for listening.