@HPC Podcast Archives - OrionX.net - @HPCpodcast-109: TOP500 at ISC26, China Re-Enters Race in Top Spot – In Depth

Starting point is 00:00:04 They're always interesting, but this one will be regarded as memorable. And I would say, Shaheen, the talk this week at ISC. Some people call it military civil fusion strategy. In terms of country-by-country trends, the U.S. remains significantly ahead of everyone else. Additional and consistent kudos to Microsoft Azure for participating. It's ranked 446th on the top 500 HPL list, but here it's number one. For every byte that you move, you can do 10 to 100 computations on it. From OrionX.net, this is the @HPCpodcast.

Starting point is 00:00:43 Join Shaheen Khan and Doug Black as they discuss supercomputing, AI, quantum technologies, and the applications, markets, and policies that shape them. Thank you for being with us. Hi, everyone. I'm Doug Black, and with me is Shaheen Khan. This is a special edition of the @HPCpodcast. We're looking at the new top 500 list of the world's most powerful supercomputers released at this week's International Supercomputing Conference in Hamburg, Germany. And Shaheen, this list packs significantly more than the usual

Starting point is 00:01:17 news value, as I'm sure you'll agree. And since you quite presciently predicted a big piece of the news this list offers, I think it's fitting that you share it with us. Well, that's very kind of you. Well, China, not only is it back in the top 500 fray for the first time since 2017, but a Chinese system is the new number one on the list. There's been news about this new system reported over the last two or three months, but I heard this from nobody else. And you predicted that China would re-enter, would come back to participate in the top 500. So here we are. Now, before we speculate on why China has chosen to re-enter the top 500, let's first look at the system. It's called LineShine, and it's housed at the National

Starting point is 00:02:04 Supercomputing Center in Shenzhen, China, built by the Shenzhen Cloud Computing Center. LineShine achieved just under 2.2 exaflops on the High-Performance Linpack benchmark, using nearly 14 million cores. That performance is more than 20% ahead of the number two system, which we'll get to in a moment. LineShine is based on the custom LNGC platform with LX2 304C 1.55 gigahertz processors, that's Armv9, along with the proprietary Ling Key Interconnect and Kylin operating system. LineShine also leads the HPCG ranking with 22.0 petaflops. Now, listeners of our weekly podcast may remember that we talked about LineShine in April, at which time China said they had produced a new system that would be 2 exaflops when it ships. Shaheen, maybe you'd like to

Starting point is 00:03:02 take us through LineShine a little bit more. And were you surprised that they delivered the system this quickly? As you mentioned, we covered LineShine when it was first mentioned back in April, towards the end of April. Various media articles started showing up as a result of a presentation that had been made at a conference in Shenzhen. And I'll just read from one of them. It says the entire system will house a total of 92 compute cabinets with a total of 47,000 CPUs, 36 network cabinets with large-scale expansion to hundreds of thousands of nodes, a million-port interconnect, and will all be assembled as China's largest supercomputing storage base and the largest liquid-cooling solution in the world with 67 liquid-cooled storage cabinets, 428 storage

Starting point is 00:03:51 nodes and 10 terabytes per second of bandwidth. Liquid cooling capabilities of the system are on a grand tier as the entire unit will house secondary pipes measuring 3,214 meters, and will have a net weight of 243.9 tons. So this was already announced there, but it was positioned as sort of the second phase of this development. And given that they had done the first phase of it back in April, May time frame, the idea that by middle of June, they would be able to run an actual benchmark on a complete assembly was certainly not an expectation that one would take away. So that is, in fact, surprising and good for them. It's a great achievement. Yeah, I think you thought maybe by 2027 this would be coming on. Certainly the vibes were like that. We'll talk more about

Starting point is 00:04:44 LineShine. We'll also theorize as to why China has returned to the top 500. But let's quickly go through the rest of the top 10, the remaining nine. There are two new top 10 systems, LineShine being, of course, one of them. But at number two is the former number one system, El Capitan at Lawrence Livermore National Lab. This is the HPE Cray system with 11,340,000 cores and a Linpack score of 1.8 exaflops. The Frontier supercomputer, which debuted as the world's first certified exascale system in 2022, is the number three system, which we've talked about extensively in the past. Aurora is number four. This is the HPE Cray Intel Exascale Compute Blade system at the Argonne Leadership Computing Facility. Jupiter Booster is number five at the

Starting point is 00:05:40 Jülich Supercomputing Centre in Germany. Now, new to the top 10, coming in at number six, is HPC7, installed at the ENI Supercomputing Center in Italy, ENI being a major energy company. And I think, Shaheen, very likely, the largest commercial user of HPC systems. This is another HPE Cray system. In fact, we added it up. And it looks as though HPE Cray,

Starting point is 00:06:05 there's six of these systems in the top 10. Eagle is at number seven, installed by Microsoft in its Azure Cloud. HPC6, this is another ENI system, is at number eight. This is an almost 500-petaflop system. Fugaku, a former number one system, is now the number nine system. This is at the RIKEN Center for Computational Science in Japan. And at number 10 is the Alps system installed at the Swiss National Supercomputing Centre in Switzerland, another HPE Cray system. Now, Shaheen, should we jump into why possibly China has come back to the

Starting point is 00:06:45 top 500? We certainly can. And we can talk about the machine too. Okay, great. What's your take? Well, you know, we've been speculating, and this is speculation, about why China has come back to the top 500. A theory, or my thought, is it could be as simple as they want to be more highly regarded on the world stage for their technological prowess, which boosts their standing in trade

Starting point is 00:07:10 negotiations and other diplomatic endeavors. When they quit the top 500 10 years ago, they knew full well, they had to have, that they were within two or three years of standing up the world's first exascale supercomputer. Doing this surreptitiously made sense because the achievement would certainly be a jolt to U.S. policymakers and could jeopardize China's access to American (and, I have to mention, ASML from Holland) chip manufacturing machinery technology. And they were right. China's growing strength in supercomputing was a major factor in the Trump and Biden administrations' Entity List of organizations banned from buying U.S. technology and later restrictions on

Starting point is 00:07:54 GPU and ASML exports to China. And by all accounts, those restrictions have hurt China. The U.S. is seen as leading in AI and in supercomputing. China has had to build up their own supercomputing capabilities with homegrown technology, and they've been seen as years behind the U.S. But now, taking the top spot on the top 500 puts them in, certainly, a perceived position of strength, something they've done on their own and beyond our control. We know the PRC regime wants to be regarded as a first-rate global power economically, militarily, and technologically. They want to be able to throw their weight around as needed regionally and around the world.

Starting point is 00:08:36 They want to be deferred to. Having the top supercomputer supports that goal, certainly. That's my take, Shaheen. Yeah, that's pretty well said. In a very similar way, my formulation is, big countries want global leadership and they want national security. So if bragging about things and disclosing things helps with global leadership, they will do it. But if it causes potential issues with national security, then they keep it secret.

Starting point is 00:09:03 But certainly the way China goes about doing things is substantially different from the way it is done traditionally in the West, with full government control, a push towards total independence from the West, definitely a lot looser connection between civilian and military. I think some people call it military civil fusion strategy. But also at the same time, pretty open collaboration at the basic science stage, where you could argue peer review and collaboration with global scientists can actually accelerate and validate the technology and science before it really becomes something of huge value. So those are all what we see, and it is the times we live in, with the fragmented world supply chain and markets and different regions and different powers optimizing as they see fit for themselves. I know you've been looking at the system, reading into it what you can. We've heard lots of discussions about their existing exascale systems, possibly limited utility,

Starting point is 00:10:15 limited usefulness. Do you have any thoughts on that score about LineShine? Yeah, so let's step back a little bit. In supercomputing, performance at the end of the day is everything. I mean, that's our middle name in HPC. And performance, however, goes against portability and maintainability. But if you hit a wall, then performance takes over and then you take control. So performance benefits from full manual control and the ability to optimize things at a very minute level with special patches and one-off optimizations and doing things just so. So we have seen in the course of the Top 500 list, and even before that, efforts that would absolutely focus on performance. Seymour Cray was the original master of that

Starting point is 00:11:05 skill. He very famously did not like caches. He very famously did not like virtual memory because it would introduce unpredictable, unexpected delays in things. And eliminating them would be the way to go. He even went word-addressable, 64-bit addressable, because then fewer addresses could manage more data points. So there have been examples of that. So if you go back to the initial histories of top 500, you had Cray vector supercomputers,

Starting point is 00:11:36 then you had the NEC Earth Simulator, which was also a vector supercomputer, but also did not have caches. Those vector registers were manually controlled. And then you had the IBM Roadrunner, which was based on the IBM cell technology, something that they did with Sony and Toshiba for actual gaming, but they had a server version that they used in that system.

Starting point is 00:11:56 And then the Sunway system from China achieved the number one spot, Sunway TaihuLight, and then OceanLight. Those systems also provided control without a cache, etc. And then Fugaku was another example that provided CPU-only manual control. And now with LineShine, we have the system that also does that. So in that context, you kind of could see just how general purpose or special purpose these systems are, whereas the TaihuLight appeared to be pretty special purpose, that it was really, it came across as a Linpack machine, that it can do HPL, but not much else.

Starting point is 00:12:38 And that may be good enough, because if you are going to do a really big, highly compute-intensive calculation occasionally, that's fine. I mean, GPUs do that all the time. You know, when they run, they run really, really fast, but you may not get full utilization from them. LineShine, on the other hand, appears to be a little bit more general purpose. And some of the papers that they have published about it have been a lot more focused on AI than HPC, and they're Armv9 cores, and they seem to come with all the accoutrements in terms of tool chain and compilers and libraries and such. So I expect this thing to be more general purpose than previous entrants in that category. And again, that makes it more significant and something that would be interesting to watch.

Starting point is 00:13:24 And I guess my final question, Shaheen, your thoughts on, to what extent is this a significant achievement for them, moving ahead of the top U.S. system? Well, I would like to know more about the details of the system to make that kind of a judgment, but certainly on the face of it, this is a pretty significant achievement, both in terms of just playing with Top 500 again. They had not done that for nearly a decade. That by itself is significant. And the entry is a pretty strong entry.

Starting point is 00:13:56 Over two exaflops, well ahead of the previous number one. All-CPU systems have a lot going for them in terms of general purposeness. In fact, the HBM that they have on the system is also a pointer to the bandwidth that exists. The Wccftech article that I read a portion of talked about 10 terabytes per second bandwidth. So all of that bodes well for the system, but let's see more on what it does, and if it has some kind of a hidden weakness that will come out as they produce more results. All right. Well, on that possibly disheartening note from a U.S. perspective, we should note that in terms of country-by-country trends, the U.S. remains significantly ahead of everyone else.

Starting point is 00:14:45 We have 162 systems on the top 500 list, followed by Japan at 44, Germany at 41, and China is fourth at 30, keeping in mind this is the first time they've participated in the top 500 in 10 years, followed by France, South Korea, Italy, Canada, UK, and then Taiwan, with 11 systems at number 10. So for the U.S., we have a system share of 32 percent. Any other observations on the country-by-country listing? Yeah, so I added up all the European countries, and if I did it right, it starts approaching the United States: it's 154 systems compared to 162. Include the UK, right?

Starting point is 00:15:30 Include UK, that's right. So just calling it Europe for that purpose. Okay. And of course, they continue to collaborate quite closely on science and technology. So that sort of changes the ranking to 162 for the U.S., 154 for Europe, 44 for Japan, 30 for China, some of which are still left over from old times, by the way, and then 19 for South Korea, 17 for Canada, and on and on. All right, and then moving over to the vendor ranking. Lenovo has 129 systems on the top 500, followed by HPE at 124.
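
The system-share arithmetic in this exchange can be checked with a short sketch. It assumes only the counts quoted in the conversation and the list's fixed size of 500; the variable names are illustrative, and "Europe" is the hosts' own sum with the UK included.

```python
# System counts as quoted in the conversation.
TOTAL_SYSTEMS = 500
counts = {
    "United States": 162,
    "Europe": 154,       # hosts' own sum of European countries, UK included
    "Japan": 44,
    "China": 30,
    "South Korea": 19,
    "Canada": 17,
}

# Share of the list held by each region, in percent.
shares = {region: 100 * n / TOTAL_SYSTEMS for region, n in counts.items()}

for region, pct in shares.items():
    print(f"{region:14s} {counts[region]:4d} systems  {pct:5.1f}%")
```

On these figures the U.S. share comes out a little above 32 percent and the combined European share a little under 31 percent, which is the "approaching" relationship described in the conversation.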

Starting point is 00:16:10 Bull, newly renamed to its old name from Eviden, is at 58, Dell at 49, Nvidia at 37, followed by NEC at 15, Fujitsu, Megware, Supermicro at 8, and number 10 is Microsoft Azure. Well, first of all, additional and consistent kudos to Microsoft Azure for participating. Thank you very much. I think that keeps the list valuable. So other cloud providers, please do participate if you'd like my praises. But thank you to Microsoft for doing that.

Starting point is 00:16:45 So eight systems is quite awesome. And that represented a total of 757 petaflops. Now, Lenovo is number one with a system count of 129, but HPE with 124 systems, still pretty significant, is about 10 times more performance because all these super big systems are HPE. So in terms of the amount of computation that is provided on the top 500, HPE by far has the lion's share, followed by Bull, which is approaching two exaflops. Well, and just to point out, Lenovo at number one has a total of 14 million cores, whereas HPE has almost 55 million cores.
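
The gap between system count and system size can be made concrete with a rough sketch. It assumes the rounded figures quoted above (129 systems and 14 million cores for Lenovo, 124 systems and "almost 55 million" cores for HPE), so the results are approximate.

```python
# Rounded vendor totals as quoted in the conversation (approximate).
vendors = {
    "Lenovo": {"systems": 129, "cores": 14_000_000},
    "HPE": {"systems": 124, "cores": 55_000_000},
}

# Average number of cores per listed system for each vendor.
cores_per_system = {
    name: v["cores"] / v["systems"] for name, v in vendors.items()
}

# How much larger the average HPE entry is than the average Lenovo entry.
size_ratio = cores_per_system["HPE"] / cores_per_system["Lenovo"]

for name, avg in cores_per_system.items():
    print(f"{name}: about {avg:,.0f} cores per system")
print(f"Average HPE system is roughly {size_ratio:.1f}x larger by core count")
```

Core count is only a proxy here; the roughly tenfold performance difference mentioned in the conversation also reflects accelerators, which a per-core average does not capture.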

Starting point is 00:17:27 There you go. That's another indication. Now, maybe we can go straight into the MXP benchmark. So this is the same High-Performance Linpack benchmark that is done with mixed precision through a different algorithm that achieves the same accuracy. And because it's using lower precision arithmetic, it comes out ahead in terms of actual wall clock time performance. So for MXP, the number one system continues to be El Capitan, and that provides 16.7 exaflops compared to its 1.89 for HPL. That's 9.2 times faster, so almost 10

Starting point is 00:18:04 times faster. And that's typically what you see in terms of how much faster things are, unless there is some kind of a disparity in terms of how they were optimized or such. The number two is Aurora. That's an Intel-based HPE Cray system, and that comes in at 11.6 exaflops, and that compares to just over one exaflop of HPL, and that's 11.5 times faster. Number three is Frontier. That shows up at 11.4 exaflops for MXP. That compares to 1.3 exaflops for HPL, and that's 8.4 times faster. So you can see 9.2, 11.5, 8.4, that sort of conditions you to expect about 10x faster if you do MXP.
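
The "about 10x" rule of thumb follows from the three multiples just quoted. A minimal sketch, using the speakers' stated multiples rather than recomputing them from the rounded exaflop figures (which would give slightly different values):

```python
# MXP-over-HPL multiples as stated in the conversation.
quoted_speedups = {"El Capitan": 9.2, "Aurora": 11.5, "Frontier": 8.4}

# Simple average and spread of the quoted multiples.
mean_speedup = sum(quoted_speedups.values()) / len(quoted_speedups)
lowest = min(quoted_speedups.values())
highest = max(quoted_speedups.values())

print(f"Mean MXP/HPL multiple: {mean_speedup:.1f}x")
print(f"Range: {lowest}x to {highest}x")
```

The mean lands just under 10, with the individual systems spread a couple of points either side.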

Starting point is 00:18:48 But then LineShine shows up as the new number four. And that's only 7.92 exaflops in MXP. That compares to its 2.2 exaflops for HPL. That's only 3.6 times faster. And that's likely because of the all-CPU nature of it and, I'm sure, the way the mixed-precision arithmetic has been implemented there. But that's an interesting outlier there. It's the lowest multiple among the top six MXP benchmarks. Okay, now looking at the Green 500 list, this is a ranking by how much computational performance they deliver on the HPL benchmark per watt of electrical power consumed. This

Starting point is 00:19:30 electrical power efficiency is measured in gigaflops per watt. Ranked number one is KAIROS. This is a Bull Sequana system at the University of Toulouse in France. It's ranked 446th on the top 500 HPL list, but here it's number one, with an energy efficiency of 73.28 gigaflops per watt. Number two is the Romeo 2025 system. This is 192nd on the top 500 list, as it was in the last edition. The system is installed at the Romeo HPC Center in Champagne-Ardenne in France, has almost 50,000 cores, an HPL benchmark of nearly 10

Starting point is 00:20:16 petaflops, and it achieved an efficiency of 70.91 gigaflops per watt. The third system on the Green 500 is the Levante GPU extension system at the DKRZ Supercomputing Center in Germany. This is a repeat at the number three spot on the Green 500. It has an identical architecture to the number one and two systems and achieved almost 6.8 petaflops of HPL performance and an efficiency of 69.43 gigaflops per watt. So my take about Green 500 is that if you're doing better than 65 gigaflops per watt, you're up there. That would be the thing that I would shoot for if I were running a center. The number one system, like you said, is at 73 gigaflops per watt,
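
Since efficiency is HPL performance divided by power, the quoted numbers imply a power draw for each system. The sketch below is only an estimate: it takes "nearly 10" and "almost 6.8" petaflops as exactly 10 and 6.8, so the true figures would be somewhat lower, and the function name is illustrative.

```python
def implied_power_kw(hpl_petaflops: float, gflops_per_watt: float) -> float:
    """Estimate power in kilowatts from HPL performance and efficiency."""
    gflops = hpl_petaflops * 1e6        # 1 petaflop = 1,000,000 gigaflops
    watts = gflops / gflops_per_watt    # efficiency = gigaflops / watts
    return watts / 1e3                  # watts to kilowatts

# Rounded figures quoted in the conversation (upper-bound approximations).
romeo_kw = implied_power_kw(10.0, 70.91)
levante_kw = implied_power_kw(6.8, 69.43)

print(f"Romeo 2025: roughly {romeo_kw:.0f} kW during the HPL run")
print(f"Levante GPU extension: roughly {levante_kw:.0f} kW during the HPL run")
```

Both come out on the order of 100 kW, which fits the point that the most efficient systems tend to be modest in size and sit well down the main list.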

Starting point is 00:21:04 and the number 10 system is at 66 gigaflops per watt. Now, the top 10 include all permutations. Nvidia, AMD, Intel, Arm, Bull, HPE, Lenovo, all of the above. And as you mentioned also, they show up in all manner of places on the top 500 list itself. So that kind of tells me that it is probably quite possible to build these, and a lot has to do with how you optimize them and how you sort of regulate the system. So it really bodes well in terms of ability to achieve efficiency. Maybe the next one we go through is HPCG.

Starting point is 00:21:41 That's the conjugate gradient benchmark that kind of represents a lower bound on performance. The reason it does is whereas HPL requires something like 10 to 100 flops per byte. For every byte that you move, you can do 10 to 100 computations on it. And that range really depends on how your memory hierarchy is set up, et cetera. For HPCG, it's 400 to 4,000 times worse. You basically do four bytes per flop, kind of flipping the scale. So as a result, systems that do quite well on HPL end up being not so fast on HPCG. And we're talking about like 1 to 4%, 1 to 6% on a good day of the performance.

Starting point is 00:22:30 that they get for HPL. So the number one system on HPCG usually is the number one system on the HPL list too, simply because they're just so big, but not always. So number one is LineShine. They got 2.2 exaflops, 2.198 exaflops, for HPL, and they only get 22 petaflops on HPCG. So that's 1% of what they got with the easier benchmark. El Capitan is number two. It gets 1.8 exaflops for HPL and 17.4 petaflops for HPCG, also just below 1%. Then you get to the system that I admire and like a lot. That's the supercomputer Fugaku from Japan that carries on at number three for HPCG after all these years,

Starting point is 00:23:24 and it is doing 442 petaflops for HPL, but it does 16 petaflops for HPCG. So that gets 3.62% of what it does for HPL, and that's the crowning achievement. So the next one in the top 10 that shows up quite nicely is an Nvidia DGX B200 system that gets 135 petaflops for HPL, and it does 3.76 petaflops for HPCG. And that's 2.78%. A lot of numbers here, but suffice to say that, again, you're going to expect from less than 1% to about 4% of what you get on HPL for HPCG. Now, if your workload is all HPCG, then you're going to want to optimize for that kind of a thing. But if your workload is all AI-oriented, then you're going to optimize for something else. And there lies the complexity of running a system.
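
The percentages in this passage are each system's HPCG result divided by its HPL result. A small sketch reproducing them from the quoted figures, with both benchmarks expressed in petaflops (the dictionary labels are illustrative):

```python
# (HPL petaflops, HPCG petaflops) as quoted in the conversation.
results = {
    "LineShine": (2198.0, 22.0),
    "El Capitan": (1800.0, 17.4),
    "Fugaku": (442.0, 16.0),
    "Nvidia DGX B200 system": (135.0, 3.76),
}

# HPCG performance as a percentage of HPL performance.
hpcg_fraction = {
    name: 100 * hpcg / hpl for name, (hpl, hpcg) in results.items()
}

for name, pct in hpcg_fraction.items():
    print(f"{name:24s} {pct:5.2f}% of HPL")
```

The two exascale systems sit at about 1 percent, while Fugaku and the DGX B200 system land near 3.6 and 2.8 percent, matching the "less than 1% to about 4%" range given above.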

Starting point is 00:24:22 Let's run through other stats in terms of GPUs. Obviously more and more systems in the top 500 have GPU systems. There are 32 of them that run AMD, four that run Intel, but 237 that run NVIDIA, various forms of NVIDIA. And all told, 274 systems out of the 500, that's 55%, have accelerators, of which the vast majority are NVIDIA, with AMD rising, as they continue to do quite well. It's a similar situation with CPUs, with Intel having had the lion's share,

Starting point is 00:25:00 and even to this day, 265 out of the 500 have various forms of Intel chips, and 192 are AMD chips. And again, AMD is the one that is rising. Although as Intel gets its mojo back, it's going to be interesting to see how that battle continues. Eight of them are Fujitsu Arm, and of course there's one that is LineShine LX2 chips. In terms of interconnects, very interestingly, InfiniBand is not only leading, it has actually increased its lead, if I'm not mistaken. So there are 293 systems that have InfiniBand, and that in some ways is also an indication of NVIDIA's leadership and how they can

Starting point is 00:25:42 project their architecture on the top 500. Gigabit Ethernet is number two with 164 systems, and Omni-Path carries on being on the list with 26 systems, which is quite nice. The rest of them are proprietary, custom, or sort of like older generations of Ethernet. Okay, a really interesting list, relatively speaking. They're always interesting, but this one will be regarded as memorable. And I would say, Shaheen, the talk this week at ISC. I would expect so. Yet another reason to love the Top 500 list.

Starting point is 00:26:16 As you know, I continue to consider it a treasure trove of architectural and historical information, and it can also have predictive qualities in terms of how systems and architectures will perform. And I applaud the team that maintains it and has done it for so many, so many years. Okay. Well, thanks everyone for joining with us, and we'll be back soon.

Starting point is 00:26:39 All right. Take care, everybody. That's it for this episode of the @HPCpodcast. If you like the show, please rate and review it on Apple Podcasts or wherever you listen. Every episode is posted on OrionX.net. Contact us with any questions or proposed topics of discussion. The @HPCpodcast is a production of OrionX. Thank you for listening.
