HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240603
Episode Date: June 3, 2024 - Computex 2024, Nvidia Rubin, Nvidia Vera, AMD MI325X, AMD Turin - Ultra Accelerator Link (UALink) - EU's 2nd Exascale System at CEA France - Is AI getting ahead of itself?
Audio: https://orionx.net/wp-content/uploads/2024/06/HPCNB_20240603.mp3
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone.
Welcome to HPC News Bytes.
I'm Doug Black with Shaheen Khan.
Let's start with news from the Computex conference in Taipei, where several vendors are making
important announcements.
NVIDIA's Jensen Huang gave a keynote in which he made chip, interconnect, and other announcements.
He announced the general availability of generative AI microservices called ACE, which he said simplify creating lifelike digital humans. NVIDIA also announced the adoption of the NVIDIA Spectrum-X Ethernet networking platform by CoreWeave, Lambda, and Yotta, among other companies.
Huang also announced that a host of systems vendors have unveiled Blackwell architecture-powered systems with NVIDIA Arm-based CPUs, networking, and infrastructure.
The company also made announcements about robotics and AI PCs.
Computex has emerged as a significant global event where vendors feel obligated to make big announcements.
It started out as the Taipei Computer Show in 1981 and became Computex in 1984, I think,
continuing the tech world's tradition of adding an X to names. It now ranks with CES in the US,
CeBIT in Europe, and GITEX in the Middle East as a major global tech event.
NVIDIA's annual refresh cycle seemed alive and well, with Blackwell Ultra coming next year.
After that comes a GPU codenamed Rubin and a CPU called Vera.
This rapid succession seems fine for cloud providers,
but I have to assume it presents adoption challenges for the enterprise, a market they focused on at
their GTC conference in March and really need to enable. AMD, for its part, announced the MI325X
GPU accelerator with 288 gigabytes of high bandwidth memory 3E, HBM3E, and memory bandwidth
of 6 terabytes per second. And the company previewed its next-gen MI350 series, which they
said will deliver up to 35 times better AI inference performance compared to the MI300 series,
their current top line. They also previewed the fifth-gen AMD EPYC CPUs codenamed
Turin, which will be available in the second half of this year with 192 cores and 384 threads.
Remember, many apps still can't use GPUs, so their users will be very interested in Turin and also in
Intel's roadmap. We expect Intel to make big announcements this week as well. Interconnect technologies are
the glue that binds compute resources together. They form a hierarchy where each level represents
a big market and aims to establish a sweet spot. The battleground is really the area between
Ethernet WANs and LANs and the proprietary CPU-specific buses. Until recently, InfiniBand and PCIe occupied that space.
We now have the Ultra Ethernet Consortium for faster Ethernet; Compute Express Link, CXL,
which is a high-end interconnect protocol on top of PCIe
that consolidated several similar projects, namely CCIX (pronounced "C6"), Gen-Z, and OpenCAPI,
and looked set to compete with NVLink from NVIDIA;
and then we have UCIe for chiplet-level interconnect. That all looked like the way the game would be played, but it seems NVLink has moved ahead enough that other vendors believe
they need something more than PCIe plus CXL to compete with it. What's going on here also is a
recognition that AI wants a lot of
memory and preferably shared memory, and that's more complicated and also less scalable. CXL with
PCIe, and NVLink, provide distributed shared memory, DSM. The industry was on a path toward the
Scalable Coherent Interconnect, SCI, an IEEE standard that was approved 32 years ago in 1992,
but the emergence of so-called shared-nothing scale-out applications moved it off the front
burner until now. And last week, AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft
said they have joined together to develop Ultra Accelerator Link, UALink,
a new standard for data center-grade AI and HPC accelerators. They said UALink will interconnect
up to 1,024 accelerators within one pod and compete against NVLink. On the supercomputer
systems front, additional details have emerged regarding Europe's second exascale system called the Jules Verne project,
which apparently will include quantum-classical HPC coupling, a 20-megawatt power envelope,
deployment in 2026, and a five-year TCO of about half a billion euros.
It will be hosted by CEA, the French Alternative Energies and Atomic Energy
Commission. Eviden is on course to deliver Europe's first exascale system later this year
to the Jülich Supercomputing Center in Germany, and last month announced their AI-optimized
offering centered around the BullSequana AI 1200H system with direct liquid cooling,
NVIDIA Grace Hopper superchips,
and software and services. It's hard to say exascale without touching geotechnopolitics.
According to news reports, the French government appears set to invest an estimated 700 million to 1 billion euros in Eviden. This is no surprise, of course. It underscores the
geopolitical importance of HPC that we routinely cover here, and Eviden's position as a top vendor globally, one of the crown jewels of the EU tech scene, along with ASML, Zeiss, and a few others. Meanwhile, there is an undercurrent of concern about another AI winter later this year. Will AI be another dot-com boom and bust,
or will it be like the cloud, going from strength to strength? To be sure, the excitement continues
about AI in general and NVIDIA's continued explosive growth. But here to keep us all in
touch with the counter-argument, the Wall Street Journal tech columnist Christopher Mims has thrown something of a wet blanket on the party.
He wrote a column entitled "The AI Revolution Is Already Losing Steam," raising doubts about
its continued hot growth due to high costs and limitations on its utility.
Mims stated that the pace of innovation in AI is slowing, its usefulness is more limited
than had been claimed, and the cost of running it
is, quote, ruinously expensive. Mims cited a calculation by a venture firm, Sequoia, that the
industry spent $50 billion on chips from NVIDIA for AI training in 2023, but brought in only $3
billion in revenue. The end of the article features a somewhat formulaic hedging statement.
Mims writes that none of this is to say that today's AI won't, in the long run, transform all sorts of jobs and industries. But then he adds, the problem is that the current level of investment
in startups and by big companies seems to be predicated on the idea that AI is going to get so much better so fast and be adopted
so quickly that its impact on our lives and the economy is hard to comprehend. Mounting evidence,
he states, suggests that that won't be the case. All right, that's it for this episode. Thanks so
much for joining us. HPC News Bytes is a production of OrionX in association with Inside HPC. Shaheen Khan and Doug
Black host the show. Every episode is featured on InsideHPC.com and posted on OrionX.net.
Thank you for listening.