@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20250721
Episode Date: July 21, 2025 - Top-20 AI Supercomputers - 1-million-GPU systems - Rapidus of Japan's 2nm fab - IBM Power11, Spyre accelerator - HotChips conference - CUDA for RISC-V [audio mp3="https://orionx.net/wp-content/uplo...ads/2025/07/HPCNB_20250721.mp3"][/audio] The post HPC News Bytes – 20250721 appeared first on OrionX.net.
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Welcome to HPC News Bytes.
This is Shaheen Khan, Doug is away.
When we talk about the TOP500 list, we always say that it has rules and history and provides
a lot of great data, but it's not meant to be an exhaustive list, especially in recent years, as big clouds and AI projects trip over
each other to build bigger and bigger systems.
Well, the website Visual Capitalist has a list of the world's most powerful AI supercomputers.
They list 20 systems, nine of which are confirmed and 11 of which are rumored but likely.
They are ranked according to the number of equivalent H100 GPUs. The top system is a
likely system at xAI with 200,000 H100s, followed by three systems with 100,000 GPUs each:
a confirmed system at xAI and likely systems at Microsoft and Meta.
The smallest system at number 20 has just over 16,000 GPUs.
They list two systems that also show up on the TOP500: the El Capitan system at Lawrence Livermore
National Laboratory and the EuroHPC system at Jülich in Germany.
Meta has three systems, and Oracle and xAI have two each. Fourteen are in the US; four are in China,
all of them unnamed;
one is in Norway and one is in Germany.
It is interesting, but definitely not
exhaustive, and it uses a bare-minimum metric.
It makes me thankful for the TOP500 list, and it goes to show how hard it is to have
a system in place with reasonable metrics and a process
to gather and organize the data, and to do it consistently
for so many years.
Speaking of huge systems, OpenAI said they will, quote,
cross well over 1 million GPUs brought online
by the end of this year, end quote,
and envision larger systems yet.
xAI had said as much last December,
also mentioning that they intend
to have more than a million GPUs.
All of these chips need to get manufactured somewhere,
and that brings us to Rapidus,
the well-funded Japanese company formed in 2022
by a Who's Who consortium of Japanese companies
and with a strategic collaboration
with IBM's research division,
which was first to demonstrate a 2 nanometer chip node on a 300 millimeter wafer. Rapidus has said
that they are on track, if not ahead of schedule, with their 2 nanometer fab having started prototype
runs already and plans to ship to advanced customers before middle of next year and to be
in volume production by 2027.
This is a big deal in high-end chip manufacturing.
First, it expands the small set of companies that can do this,
TSMC, Intel, Samsung, and now Rapidus,
albeit three out of four are located in a pretty small geographic region in Northeast Asia.
Back in the 1980s, Japan had some 50% of the chip market,
which shrank to about 10% by 2024. So this also brings Japan back into the
club. Speaking of IBM, it's hard to avoid saying it went to 11, as it
launched its latest Power systems based on the Power 11 chip, which is a 7
nanometer chip manufactured by Samsung but clocked pretty high, operating between 2.4 and 4.2 gigahertz. The chip has 16 cores, with 15 active
and one hot spare, and with the largest configuration supporting up to 16 CPUs and 64 terabytes of
DDR5 memory. There's also an accelerator called the IBM Spyre, which will be available for the Power 11
and also the Z17 mainframes.
IBM previewed the Power 11 and Spyre chips
at last year's Hot Chips conference,
so you can find the slides in the Hot Chips archives.
The conference is coming up again in about a month
at Stanford University, where it has usually been held.
We close with news about GPU development environments.
As we've covered here, NVIDIA's CUDA is the reigning champion
and remains focused and effective.
Then there is ROCm, R-O-C-M, from AMD, OpenCL, SYCL, S-Y-C-L,
and ZLUDA, Z-L-U-D-A, which started out
as an open source project to build a translation layer that allows CUDA codes to run
as is on other hardware.
NVIDIA put a stop to that through their licensing,
acting as protectively as many had expected.
In a seemingly significant move, NVIDIA
said CUDA will support RISC-V in addition to x86 and ARM.
In other words, a RISC-V CPU can now drive NVIDIA GPUs
and run CUDA software.
RISC-V is making rapid progress up the scale ladder,
so this news will be important for edge devices now,
and it paves the way for RISC-V CPUs in servers,
which are starting to show up in the wild.
All right, that's it for this episode.
Thank you all for being with us.
HPC News Bytes is a production of OrionX in association with InsideHPC.
Shaheen Khan and Doug Black host the show.
Every episode is featured on InsideHPC.com and posted on OrionX.net.
Thank you for listening.