@HPC Podcast Archives - OrionX.net - @HPCpodcast-93: TOP500 at SC24 Conference
Episode Date: November 18, 2024. SC24 is off to a great start with over 17,000 attendees, 480 exhibitors from 29 countries, and a new TOP500 list that features a new champion! "The new El Capitan system at the Lawrence Livermore National Laboratory in California, U.S.A., has debuted as the most powerful system on the list with an HPL score of 1.742 EFlop/s." Join Shahin and Doug as they analyze and discuss the new list. As usual, they go over notable additions, performance efficiency, power efficiency in the Green500 list, and the difficult HPCG benchmark that usually sets the lower bound of system performance. [audio mp3="https://orionx.net/wp-content/uploads/2024/11/093@HPCpodcast_TOP500-SC24_20241118.mp3"][/audio] The post @HPCpodcast-93: TOP500 at SC24 Conference appeared first on OrionX.net.
Transcript
Are you in Atlanta at SC24? Be sure to visit Lenovo in booth 2201 on the show floor of the Georgia World Congress Center through November 21st, 2024.
You can also visit lenovo.com slash HPC to learn more about Lenovo's HPC solutions. SC24, which we hear reports is setting a new record on attendance.
It's an impressive system.
I'm so delighted that it's over one and a half exaflops.
That was really the threshold that I had in my mind.
On the value of the top 500 list, sometimes we hear, if not top 500 being poo-pooed, certainly maybe some of its limitations being pointed out.
Well, in general, AMD and NVIDIA have been ascendant because they have such highly competitive technology out there.
In the top 10, Europe has a very respectable showing.
From OrionX in association with InsideHPC, this is the @HPCpodcast.
Join Shaheen Khan and Doug Black as they discuss supercomputing technologies and the applications,
markets, and policies that shape them.
Thank you for being with us.
Hi, everyone.
I'm Doug Black at InsideHPC, and with me is my podcast partner, Shaheen Khan of OrionX.net.
We're speaking to you from Atlanta, the site of this week's big HPC AI industry conference, SC24,
which we hear reports is setting a new record on attendance, probably a 10 to 15% increase over last year. The number we're hearing is around 16,000. So the show continues to grow and there's quite a bit of excitement around the whole venue.
The focus of this conversation is the top 500, which is always a very big deal at the SC and
ISC conferences, the latter of which happens in the spring, of course. And the headline is that, as generally expected, the El Capitan supercomputer,
which has been under development for about four years at Livermore Labs.
This is an HPE Cray system powered by AMD CPUs and GPUs.
And it's coming in at just under 1.75 exaflops. So it did not achieve two exaflops,
Shaheen, which had been expected by some or discussed by some, but still a very, very
impressive number. I was at a press conference Sunday afternoon in which the three organizations,
HPE, AMD, and the NNSA, which will be the user of this system, all spoke in glowing
terms about the new system. It's an impressive system. I'm so delighted that it's over one and
a half exaflops. That was really the threshold that I had in my mind. And the fact that it is
decisively above one and a half at 1.742, that's really quite nice. Obviously, it did not break
two exaflops. That would have been a lot
more of a momentous occasion. But as it is, it is quite amazing and wonderful to have three of these
now on the list. Yeah, now this system has more than 11 million combined CPU and GPU cores. It is based on AMD fourth generation EPYC CPUs with 24 cores at 1.8 gigahertz and AMD
Instinct MI300A accelerators. It uses the Cray slingshot fabric and achieves an energy efficiency
of 58.89 gigaflops per watt, which puts it at 18th on the Green 500 list. Number two system is Frontier, the former
number one system. Their benchmark improved a bit from 1.2 exaflops to 1.35, and they've
increased the total core count on that system to just over 9 million. The third system is Aurora
at number three, just over one exaflop, and that number has not changed
from its entry onto the list at ISC last May. The next system, the fourth, is the Eagle system
installed on the Microsoft Azure cloud at number four. It's over half an exaflop. And at number five is the HPC6 system installed at the Eni S.p.A. supercomputing center in Italy.
The rest of the top 10 are supercomputer Fugaku in Japan, the Alps HPE Cray system,
Lumi in Finland, another HPE Cray system, and Leonardo in Italy at the CINECA supercomputing center.
In the top 10, there's one other new system, the Tuolumne HPE Cray system. This is an NNSA system
at Livermore Lab that actually complements El Capitan. It's interesting that seven of the top
10 are HPE systems. Also interesting, Shaheen, I think that of the top 10,
three are powered by NVIDIA chips. Although from 11 to 20, there are eight NVIDIA systems. So
if you take the top 20 overall, NVIDIA is certainly very, very strong.
Well, in general, AMD and NVIDIA have been ascendant because they have such highly competitive
technology out there. If you look at CPUs, let's start with CPUs. Intel continues to dominate with
310 entries. Number two is AMD with 162 entries, and they are the fastest rising. And that really
covers almost all of the 500. Of the remaining chips, also rising is NVIDIA with the Grace chip at nine systems,
and then Fujitsu Arm, which has nine entries as well.
When you look at GPUs, interestingly, 290 systems, the majority, do not have an accelerator
at all.
Of the remaining systems, 184 are NVIDIA, 19 are AMD, and that's
really a testimony to the recent additions of the Instinct MI250 and MI300A, and five are by Intel:
one older Xeon Phi system and four GPU Max systems. So really, a demonstration of NVIDIA and AMD gradually taking over the list while
Intel finds its mojo. Now, looking at countries, the US is increasingly dominant, but we should
note, of course, that China no longer participates in the top 500. It does not submit benchmark
results. And that's the situation since 2016, I believe, Shaheen? It's been a while,
that's right. Yeah. And it's unfortunate. And hopefully they will start participating again
at some point. But right now, we really just don't know. And they used to have such a strong position.
So it does skew the results. However, that's what we are analyzing. So certainly, Shaheen,
if this list shows anything, it's the increasing, really amazing
dominance of the U.S. on the top 500 list.
The U.S. has 173 systems on the list with a system share of almost 35 percent.
And China is second.
But again, they're not updating their information.
So that data, we don't know what to make of it.
But after that, it's just a
long list of other countries. In fact, on the pie chart, "other", meaning just a whole list of
countries not big enough to mention by name, had over 20% of systems. So once you get past the US,
there's HPC everywhere, but at pretty low numbers. That said, though, in the top 10, Europe has a very
respectable showing between Lumi, Alps, the systems in Italy, very, very impressive at the high end.
And of course, Eviden is working in Germany to install the first European exascale system. Are you at SC24 right now?
Visit booth 2201 at the Georgia World Congress Center to see what's new with Lenovo.
This year, Lenovo's SC24 theme is Smarter Creates Cooler HPC.
But how?
Experience Lenovo's latest Neptune liquid cooling infrastructure designed for the next decade of technology
and their large language model systems purpose-built for the most compute-intensive AI workloads. Drop by booth 2201
and visit lenovo.com slash Neptune to learn more. Now looking at manufacturers, Lenovo has the
highest count, 162 of the 500 systems. HPE is at 115, but as we've seen, HPE is very strong at the
upper end of the list. And then Eviden comes in at 52, Dell at 37, NVIDIA at 26.
It's really quite nice to see Eviden increasing its count to 52 and Dell showing up very nicely at 37.
NVIDIA with 26 systems is really another reminder
that NVIDIA is a full system manufacturer
as well as a chip provider
and of course a whole suite of software.
And Fujitsu at 15 and NEC at 14,
all a reminder of the power
that the Japanese manufacturers have in this area.
As you mentioned, the Chinese players really aren't playing,
and the systems that are on the list are getting a little bit stale. And also, there appeared
to be a concerted push to count all the systems that they could possibly count. So we got a lot
of systems that were at a, quote, service provider, and it wasn't clear whether they were doing
science or not. Of course, fast forward to today, all
systems are doing some kind of AI. So HPC is showing up under the guise of AI everywhere.
And then also interesting is Microsoft Azure. They have eight systems on the list and one in the top
10, which is quite amazing and a signal of how cloud providers are providing really high-end
systems as well. Let's talk about the conjugate
gradient benchmark. And as a reminder, that's a benchmark that really measures the lower end of
the performance spectrum, not the upper end of the spectrum. You get a significantly higher
fraction of performance out of systems when you're running LINPACK compared to when you're running
HPCG. HPL, high performance LINPACK,
is a very well-behaved benchmark and it allows a system to shine. HPCG, on the other hand,
is a very difficult benchmark. It's almost like you have a sports car and you're running it on
a racing track compared to on a bumpy road, off-road driving, let's say. So HPCG is like that. And as a result, the fraction of performance that you get is really in the 1% to 3% area.
Whereas for LINPACK, you're getting anywhere from 50% to 80%.
And on the top 10, you're anywhere from 50% to 82%.
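To make those fractions concrete, here is a minimal Python sketch that computes fraction of peak for HPL versus HPCG. The system and its numbers are made-up placeholders for illustration, not official TOP500 figures.

```python
# Fraction of theoretical peak achieved under HPL vs. HPCG.
# The figures below are illustrative placeholders, not official TOP500 data.

def fraction_of_peak(measured_pflops: float, peak_pflops: float) -> float:
    """Return measured performance as a percentage of theoretical peak."""
    return 100.0 * measured_pflops / peak_pflops

# Hypothetical system: 2,000 PFlop/s peak, 1,500 PFlop/s on HPL, 15 PFlop/s on HPCG.
peak_pflops = 2000.0
hpl_pflops = 1500.0
hpcg_pflops = 15.0

print(f"HPL  fraction of peak: {fraction_of_peak(hpl_pflops, peak_pflops):.1f}%")   # 75.0%
print(f"HPCG fraction of peak: {fraction_of_peak(hpcg_pflops, peak_pflops):.2f}%")  # 0.75%
```

That spread, tens of percent for HPL versus around one percent for HPCG, is the sports-car-on-a-bumpy-road gap described above.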
So if you look at that, the number one system on HPCG remains supercomputer Fugaku.
It is ranked number six on the top 500, but number one for HPCG. And it is getting the princely
number of 2.98% of its peak, 3%. Frontier, which we love, is getting 0.68% of its peak. Aurora gets 0.28%. Lumi gets
0.86%. Alps, Leonardo, Perlmutter, Sierra, and Selene, they all get anywhere from 0.64% to 2.05%.
So collectively, we just are observing that the performance that you get is highly
dependent on how you formulate the problem. Now, thankfully, as we discussed last time,
nature provides lots and lots and lots of matrix multiplies. So you have a shot at actually getting
a pretty good fraction of the performance. And that's why LINPACK continues to be a very good barometer of performance in general. So the other measure of efficiency is gigaflops per watt, how much
performance you get per unit of energy that you spend. And that leads us to the green 500 list.
In the Green 500 list, the number one system is JEDI. That shows up as number 224 on the top 500. It's an Eviden BullSequana system with a Grace
Hopper Superchip that is at EuroHPC, and it is showing up at 72.73 gigaflops per watt.
The second system is ROMEO-2025. That's at number 122 on the top 500. It's also an Eviden BullSequana XH3000, also with the Grace Hopper Superchip.
And that's coming in as 70.91 gigaflops per watt. And then from then on, you go to 69, 68, 68.
So the top 10 is a range of 62, almost 63 gigaflops per watt, all the way to almost 73 gigaflops per watt.
And those systems are from HPE Cray, Lenovo, and of course, the Eviden BullSequana shows up
properly. Shaheen, noteworthy: of the top six on the Green 500 list, three of them are Eviden systems, two are HPE Cray, and the sixth one is Lenovo. And looking at our
top two systems on the top 500 list, El Capitan is 18th on the green 500 list,
and Frontier is at number 22. It does seem to be more difficult to get energy efficiency
as you add scale. I'm sort of noticing that as the systems get bigger, their gigaflops
per watt drops gently, which makes it really impressive that these big systems do as well as
they do. But it kind of also explains why Frontier would be number 22 and not higher up.
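As a back-of-the-envelope check on those gigaflops-per-watt figures, here is a small Python sketch that backs out the power draw implied by an HPL result and an efficiency number, using the El Capitan figures quoted in this episode (1.742 EFlop/s and 58.89 gigaflops per watt); the arithmetic is illustrative only.

```python
# Green500-style efficiency arithmetic: GFlop/s per watt = HPL Rmax / power draw.
# Figures quoted in this episode for El Capitan: 1.742 EFlop/s and 58.89 GFlops/W.

def implied_power_megawatts(rmax_gflops: float, gflops_per_watt: float) -> float:
    """Back out the power draw (in MW) implied by an Rmax and an efficiency figure."""
    watts = rmax_gflops / gflops_per_watt
    return watts / 1.0e6

rmax_gflops = 1.742e9   # 1.742 EFlop/s expressed in GFlop/s
efficiency = 58.89      # GFlop/s per watt

print(f"Implied power draw: {implied_power_megawatts(rmax_gflops, efficiency):.1f} MW")
# roughly 29.6 MW, which gives a sense of the energy scale of an exascale-class system
```

Roughly thirty megawatts for the number one system also helps explain the observation above that energy efficiency gets harder to sustain as scale grows.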
Let's now finally comment on interconnects. As before, it's generally the bottom half is mostly Ethernet and the upper
half is mostly InfiniBand. To look at the numbers, Ethernet has 187 entries. InfiniBand has 253,
so more than 50% of the entries are InfiniBand. OmniPath, which was a variation of InfiniBand,
one could say, continues to show up with 31 entries. And then custom or
proprietary comprise another 29 systems. So that also indicates that interconnects remain an area where you can add value in special ways, and these guys are doing that, so we expect to see a lot more in the interconnect world.
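For reference, the interconnect counts quoted here can be turned into shares with a few lines of Python; the numbers are the ones from this episode, and the category labels are just illustrative groupings.

```python
# Interconnect family breakdown of the TOP500, using the counts quoted in this episode.

interconnect_counts = {
    "Ethernet": 187,
    "InfiniBand": 253,
    "Omni-Path": 31,
    "Custom/proprietary": 29,
}

total = sum(interconnect_counts.values())  # comes to 500
for family, count in sorted(interconnect_counts.items(), key=lambda kv: -kv[1]):
    print(f"{family:<20} {count:>4}  {100.0 * count / total:5.1f}%")

# InfiniBand at 253 of 500 is just over half the list, consistent with the
# observation that the upper half skews InfiniBand and the lower half Ethernet.
```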
Again, we talked a little bit about all of this with Torsten Hoefler,
our special guest in
our last episode. Please go listen to that. We talked about the Ultra Ethernet Consortium.
And once that technology kicks in, you're going to see a lot more of a competition between Ethernet
and InfiniBand. Yes, that's the general expectation. And Shaheen, why don't we close with,
I think it's an important point to be made. You have, I think, very interesting feelings on the value of the top 500 list.
Sometimes we hear, if not top 500 being poo-pooed, certainly maybe some of its limitations being pointed out.
But you see a very positive benefit out of the list.
I sure do.
And I've been a vocal advocate for the list.
And by the way, it's a lot of work to maintain
this list. If you go look at the spreadsheet, you can sort of imagine just how much work goes into
this to keep it all in there and updated, etc. Now, two things. One is that while it is very true
that not all the systems on the planet are covered, this was never meant to be an exhaustive
list of all the systems out there. It's just those who choose to participate. So that gets pointed out as a deficiency. And the other one is that,
well, you know, LINPACK isn't the right benchmark. Now we get conjugate gradient, you see Green 500,
there are other 500 level benchmarks like IO500. We talked about MLPerf and the MLCommons a couple
of episodes ago with David Kanter of that organization.
So please go listen to that too. So there are like lots of benchmarks that are out there that
together can give you a view of how a system operates. But it is also true that if you've
been tracking HPL and LINPACK over the years, you develop a feel for what it is really measuring and you can extrapolate
from that to other applications. Because if you look at this benchmark and you have a mental
image of what this means in terms of the cache sizes and the cache lines and how the functional
units are operating and what sort of data intensity and interconnect intensity you have in this
benchmark, you can translate that to other benchmarks.
You can say, oh, if it performs like this for this size of a matrix, and if it performs
like this on this other part, I can now combine the two to say, ah, I expect it to perform
like that for like an FFT or for like something else. That definitely happens with those who go deep into this.
And that also provides value.
And then finally, you've got 30 years worth of architectural and performance information.
And as technology shifts, you could go back and you can say,
oh, I wonder how the Thinking Machines CM-5 did on this 10 years ago.
Or how, you know, the Sun E10K did it, or how the
Convex C3 did it.
And you can glean a lot of architectural insights from doing that as well.
Good.
Thank you, Shaheen.
I think the Top 500 needs that kind of endorsement and a reminder of its value as well.
It's the kind of thing that, like so many things in life, what you put into it is what
you get out of it.
If you decide to dig in, really do some analysis, there's some tremendous value there.
So, well, thanks everyone for being with us and we'll be back soon.
All right. Take care, everybody.
That's it for this episode of the At HPC podcast.
Every episode is featured on InsideHPC.com and posted on orionx.net. Use the comment section or tweet us with any questions or to propose topics of
discussion. If you like the show, rate and review it on Apple Podcasts or wherever you listen.
The @HPC podcast is a production of OrionX in association with InsideHPC. Thank you for
listening.