@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20250616
Episode Date: June 16, 2025 - AMD MI350X and MI355X new GPUs - AMD ROCm 7.0 software - AMD Helios rackscale system - Fujitsu Monaka chip - SIGHPC Travel Grants for SC25 - HPCGuru signs off [audio mp3="https://orionx.net/wp-con...tent/uploads/2025/06/HPCNB_20250616.mp3"][/audio] The post HPC News Bytes – 20250616 appeared first on OrionX.net.
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Welcome to HPC News Bytes.
This is Shaheen Khan, Doug is away.
The GPU race continues to tighten as AMD upped the ante with three important moves: it launched new GPUs, new software, and an actual server, all while touting openness for AI.
Chip vendors introduce new chips several times these days, drip-feeding more information as they do. So if you're not tracking it closely, it's hard to tell exactly where a product is in its lifecycle. Is it just a concept, is it shipping, or is it somewhere in the middle?
AMD formally launched its MI350X and MI355X GPUs last week, saying deliveries have begun, which means customers can expect them later this year. It projected big architectural performance gains compared to the previous-generation MI300X: four times for AI compute, meaning AI training, and 35 times for inferencing.
When performance gains are that high,
it usually means low-hanging watermelons, as they say.
It means they finally got to optimizing big, obvious things.
But it is good to see anyway,
and AMD said it can now beat NVIDIA in, quote, like-for-like inference benchmarks by up to 1.3x, and up to 1.13x in select training workloads,
end quote. As chips become more and more of a complete system, we see multiple layers and
various chiplets, like a city block with multi-story buildings on it, where each story could be a different fabrication technology.
So the two GPUs use TSMC's 6-nanometer process for a base layer, on top of which they have a 3-nanometer die for the accelerator itself. They both have 288 gigabytes of high-bandwidth memory, HBM3E, with 8 terabytes per second of memory bandwidth.
The MI355X is a bit faster, providing 78.6 teraflops of FP64 performance, 5 petaflops in FP16, 10.1 petaflops in FP8, and 20.1 petaflops in either FP6 or FP4. This is about 2x better than NVIDIA Blackwell in 6-bit and 64-bit performance, but otherwise pretty similar.
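As a back-of-the-envelope illustration (not part of the episode), the quoted MI355X peak figures can be tabulated to see how peak throughput scales as numeric precision drops; the dictionary below simply restates the numbers mentioned above, and the ratios are derived from them:

```python
# Vendor-quoted peak throughput for AMD's MI355X, as cited in this episode.
# These are peak numbers; sustained application performance will be lower.
peak_tflops = {
    "FP64":    78.6,      # 78.6 teraflops
    "FP16":    5000.0,    # 5 petaflops
    "FP8":     10100.0,   # 10.1 petaflops
    "FP6/FP4": 20100.0,   # 20.1 petaflops
}

# FP16 -> FP8 and FP8 -> FP6/FP4 each roughly double peak throughput,
# while FP64 to FP16 is about a 64x jump.
for fmt, tflops in peak_tflops.items():
    ratio = tflops / peak_tflops["FP64"]
    print(f"{fmt:8s} {tflops:>9.1f} TFLOPS  (~{ratio:.0f}x FP64)")
```

The spread between FP64 and the low-precision formats is why vendors quote AI numbers in 4-, 6-, and 8-bit formats: the same silicon delivers hundreds of times more peak operations per second at reduced precision.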
AMD also rattled off a who's-who list of prominent AI players joining its party, including OpenAI, xAI, Oracle, Microsoft, Meta, and several others.
The second thing they launched was a new rev of their developer software, ROCm 7.0, which delivers a 4x inference and 3x training performance improvement over ROCm 6.0. ROCm has been lagging behind NVIDIA's CUDA but has been closing the gap.
And the third item is another step in emulating NVIDIA as they try to catch up, and that is a preview, not an actual launch, of their Helios AI rack infrastructure. This will be based on a future GPU called MI400, together with their latest CPUs and DPUs, to arrive in 2026 and provide a 72-way scale-up system like NVIDIA's NVL72 DGX system.
Now, as we've covered here, the idea of a chip company building a rack-scale system goes back about 15 years, when Intel started a project to do just that. But that vision wasn't realized until NVIDIA introduced its DGX system in 2016.
AMD also talked about MI350 systems that have eight-way scale-up capability and can be networked
to produce much larger clusters.
So AMD becoming a rack-scale systems company is the qualitatively new announcement, though it was sure to arrive after AMD's acquisition of ZT Systems for its rack-scale engineering capabilities, and was expected even before that.
The market is growing fast for all accelerators, but NVIDIA continues its commanding lead and
ability to ship in large volumes.
It's been NVIDIA versus everybody else,
what I call NVIDIA versus OnVIDIA.
So it makes sense that AMD is using this announcement
as an opportunity to tout openness,
even as it emulated NVIDIA
by using all of its own technologies to build a system.
Staying on chips, another notable development is Fujitsu's next-generation Arm-based chip, a follow-on to the A64FX used in the Fugaku supercomputer. Fujitsu has talked about this for a couple of years now, targeting 2027 for introduction. It is called Monaka, M-O-N-A-K-A, presumably named after the Japanese sweet made of often elaborately designed wafers with sweet filling in between.
This one is a 144-core Armv9 CPU with extensions and uses TSMC's 2-nanometer technology on top of a 5-nanometer SRAM and I/O base layer, packaged with Broadcom's 3.5D system-in-package platform. It includes confidential computing and a hardware root of trust. It uses PCIe 6, CXL 3, and DDR5.
So no high-bandwidth memory, and also no GPUs, while projecting GPU-less LLM performance and performance per watt that could be three times better than contemporary CPUs.
But Fujitsu and AMD formed an agreement in 2024 to cooperate on AI and HPC infrastructure, which includes the Fujitsu Monaka chip, AMD Instinct GPUs, and ROCm software.
But arguably the most important part of the Fujitsu design is their focus on energy efficiency
and a design center that would allow fully air-cooled systems, though it's been mentioned
that liquid cooling might be desirable and available to allow higher density systems.
The ISC conference was by all counts another excellent and well-attended show, so now it's
time to look forward to SC25 in St. Louis.
SIGHPC is offering travel grants to undergraduates, graduate students, and early-career professionals to attend SC25. The deadline to apply is September 5th, and you will hear back by September 26th.
Accepted applicants will receive reimbursement of travel expenses up to $800 for travel from North America, or $1,600 for travel from other continents.
The grant also includes conference registration and the assignment to a mentor throughout
the conference if desired.
Please look for it on sighpc.org.
That's S-I-G-H-P-C dot org. We end with a hat tip to our community's mysterious,
capable, always fair, and beloved personality. Yes, I'm talking about HPC Guru. Last week,
after more than 16 years of keeping the HPC community vibrant and informed, a couple of
sabbaticals, and an occasional will-he-won't-he, HPC Guru, our community's anchor, steward, and leader, announced that he was signing off.
His contributions were huge and an incredible amount of work. That he kept it all interesting while successfully keeping his identity secret was no small feat, especially in this community, where people would go to great lengths to uncover his identity, including analyzing the photos he'd post to figure out what angle they were taken from and who was there when they were taken.
I also remember fondly those "I am HPC Guru" pinback buttons. If you have any, let's wear them at SC25.
Here's his last message, quote, all things with one exception have to end.
After years of sharing HPC news and insights,
it's time to log off.
It was fun while it lasted,
and I remain grateful to all of you
who followed and interacted.
Stay curious, stay kind.
@HPCGuru, signing off.
Hashtag HPC, hashtag farewell.
Hear, hear. Farewell from all of us.
Thank you, HPC guru, Godspeed, oh, and CRM.
All right, that's it for this episode.
Thank you all for being with us.
HPC News Bytes is a production of OrionX
in association with Inside HPC.
Shaheen Khan and Doug Black host the show.
Every episode is featured on InsideHPC.com
and posted on OrionX.net. Thank you for listening.