@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20250721

Episode Date: July 21, 2025

- Top-20 AI Supercomputers - 1-million-GPU systems - Rapidus of Japan's 2nm fab - IBM Power11, Spyre accelerator - Hot Chips conference - CUDA for RISC-V

The post HPC News Bytes – 20250721 appeared first on OrionX.net.

Transcript
[00:00:00] Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing, AI, and other advanced technologies. Welcome to HPC News Bytes. This is Shaheen Khan; Doug is away. When we talk about the TOP500 list, we always say that it has rules and history and provides a lot of great data, but it's not meant to be an exhaustive list, especially in recent years as big clouds and AI projects trip over each other to build bigger and bigger systems. Well, the website Visual Capitalist has a list of the world's most powerful AI supercomputers.
[00:00:40] They list 20 systems, nine of which are confirmed and 11 of which are rumored but likely. They are ranked by the number of equivalent H100 GPUs. The top system is a likely system at xAI with 200,000 H100s, followed by three systems with 100,000 GPUs each: a confirmed system at xAI and likely systems at Microsoft and Meta. The smallest system, at number 20, has just over 16,000 GPUs. They list two systems that also appear on the TOP500: the El Capitan system at Lawrence Livermore National Laboratory and the EuroHPC system at Jülich in Germany. Meta has three systems, and Oracle and xAI have two each. Fourteen are in the US, four in China,
Starting point is 00:01:27 all of which are unnamed. One is in Norway and one in Germany. It is interesting but definitely not exhaustive and with a bare minimum metric. It makes me thankful for the top 500 lists and it goes to show how hard it is to have a system in place with reasonable metrics and a process to gather and organize the data and to do it consistently for so many years.
[00:01:51] Speaking of huge systems, OpenAI said they will, quote, cross well over 1 million GPUs brought online by the end of this year, end quote, and envision still larger systems. xAI had said as much last December, also mentioning that they intend to have more than a million GPUs. All of these chips need to get manufactured somewhere,
Starting point is 00:02:12 and that brings us to Rapidus, the well-funded Japanese company formed in 2022 by a Who's Who consortium of Japanese companies and with a strategic collaboration with IBM's research division, which was first to demonstrate a 2 nanometer chip node on a 300 millimeter wafer. Rapid has said that they are on track, if not ahead of schedule, with their 2 nanometer fab having started prototype runs already and plans to ship to advanced customers before middle of next year and to be
Starting point is 00:02:42 in volume production by 2027. This is a big deal in high-end chip manufacturing. First, it expands the small set of companies that can do this, TSMC, Intel, Samsung, and now Rapidus, albeit three out of four are located in a pretty small geographic region in Northeast Asia. Back in the 1980s, Japan had some 50% of the chip market, which shrank to about 10% by 2024. So this also brings Japan back into the club. Speaking of IBM, and it's hard to avoid saying it went to 11, as it
Starting point is 00:03:17 launched its latest power systems based on the Power 11 chip, which is a 7 nanometer chip manufactured by Samsung, but clocked pretty high, operating between 2.4 and 4.2 gigahertz. The chip has 16 cores with 15 active and one hot spare, and with the largest configuration supporting up to 16 CPUs and 64 terabytes of DDR5 memory. There's also an accelerator called the IBM Spire, which will be available for the Power 11 and also the Z17 mainframes. IBM previewed the Power 11 and Spire chips at last year's Hot Chip Conference, so you can find the slides in the Hot Chips archives.
Starting point is 00:03:57 The conference is coming up again in about a month at Stanford University, where it's always been held. We close with news about GPU development environments. As we've covered here, NVIDIA's CUDA is the reigning champion and remains focused and effective. Then there is ROCM, R-O-C-M from AMD, OpenCL, SICKL, S-Y-C-L, and Zaluda, Z-L-U-D-A, which started out as an open source project to build a translation layer that allows CUDA codes to run
Starting point is 00:04:27 as is on other hardware. NVIDIA put a stop to that through their licensing, acting as protective as was expected by many. In a seemingly significant move, NVIDIA said CUDA will support RISC-V in addition to x86 and ARM. In other words, a RISC-V CPU can now drive NVIDIA GPUs and run CUDA software. RISC-V is making rapid progress up the scale ladder,
Starting point is 00:04:52 so this news will be important for edge devices now, and it paves the way for RISC-V CPUs in servers, which are starting to show up in the wild. All right, that's it for this episode. Thank you all for being with us. HPC News Bytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on InsideHPC.com and posted on OrionX.net.
Starting point is 00:05:18 Thank you for listening.
