@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20251020
Episode Date: October 20, 2025 - How about an AI system that needs 2x the energy NYC uses? - A GPU for every person - HPC-Quantum hybrid systems - Exascale Day 10/18, ExaFlops or ExaWatts? - Seymour Cray 100th birthday - Cray-1 50th anniversary - US Mint's new dollar coin featuring the Cray-1 - Cray-1 masterclass in... branding! [audio mp3="https://orionx.net/wp-content/uploads/2025/10/HPCNB_20251020.mp3"][/audio]
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone. Welcome to HPC News Bytes. I'm Doug Black of insideHPC, and with me, of course,
is Shaheen Khan of OrionX.net. A few weeks ago, we mentioned data centers the size of Central Park
in New York City. Well, how about
an AI data center with power needs 2x those of New York City?
A story in the Wall Street Journal on Friday looks at OpenAI and, quote, city-sized AI supercomputers.
It's a look at the strategy of OpenAI CEO Sam Altman, and if things unfold the way he expects,
it will be truly astounding.
Now, Shaheen, we've heard vendors use the phrase AI everywhere, but OpenAI is running wild with that idea.
About their new deal with Broadcom, Altman has said that OpenAI data centers will need one AI
inference chip per user, and that means they'll need billions of chips, and it means OpenAI is a
chip market unto itself. The company will use NVIDIA GPUs for model training, but Broadcom will
tailor inference chips for OpenAI's high-bandwidth-memory-intensive workloads, saving OpenAI money
with more efficient chips. This brings us to the scale of AI
data centers Altman envisions. The deal with Broadcom encompasses up to 10 gigawatts of
AI systems by the end of the decade. Combined with the 16 gigawatts of deals OpenAI has announced
since mid-September with AMD and NVIDIA, we're talking about close to a trillion dollars in investment
and two New York Cities' worth of electrical demand. Around the time of their main GTC conference
in March, NVIDIA said it had shipped 3.6 million Blackwell GPUs
just to the top AI cloud vendors, not to mention other chips that they also sell, or other
customers, of which there are many. That was presumably three times the demand they had
experienced with the previous-generation Hopper GPU. Estimates are that NVIDIA will ship over
5 million Blackwell GPUs this year, and probably another 2 million Hoppers. All of these
GPUs are obviously going somewhere and are getting power and cooling. NVIDIA has the
lion's share of the global production of GPUs, so you can do some MBA work and guess what
the total number is and how much electricity they need to run. And that guess comes out to over
10 gigawatts at the data center level, and that puts it very much in the New York City ballpark,
which is estimated to need 5 to 10 gigawatts with a peak of over 20 gigawatts in the summer of this
year. One of the incredible aspects of the AI frenzy is how the numbers hold up and don't seem to
be letting up. So if OpenAI got a good fraction of the GPUs out there, it would be a contender
with a nice-sized city. But the numbers they are talking about will likely take a few years to
materialize for them, and that assumes production continues to increase and demand continues to
multiply. Small hiccups in production or demand will probably be okay, although they might spook
investors. But if a big correction occurs, and that continues to be a big if, that could lead to
big shortages again, or a glut of GPU capacity and a long recovery period. Or maybe we could
see a big breakthrough for high-speed, low-energy chips. Well, let's hope so.
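As a rough illustration of that back-of-envelope estimate, here is a minimal sketch in Python. The shipment counts are the figures quoted above; the per-GPU power draw, system overhead, and PUE multipliers are illustrative assumptions, not reported numbers.

```python
# Back-of-envelope estimate: GPU shipments -> data-center power demand.
# Shipment counts are the estimates quoted in the episode; the power
# figures below are illustrative assumptions only.

blackwell_units = 5_000_000   # estimated Blackwell GPUs shipped this year
hopper_units = 2_000_000      # estimated Hopper GPUs shipped this year

gpu_power_kw = 1.0            # assumed average draw per GPU, in kilowatts
system_overhead = 1.5         # assumed multiplier for CPUs, memory, network, storage
pue = 1.3                     # assumed power usage effectiveness (cooling, distribution)

total_gpus = blackwell_units + hopper_units
it_power_gw = total_gpus * gpu_power_kw * system_overhead / 1_000_000  # kW -> GW
facility_power_gw = it_power_gw * pue

print(f"{total_gpus:,} GPUs -> roughly {facility_power_gw:.1f} GW at the data-center level")
# With these assumptions: 7,000,000 GPUs -> roughly 13.7 GW, i.e. over 10 gigawatts,
# in the same ballpark as New York City's typical 5-10 GW demand.
```

Change any of the assumed multipliers and the total shifts, but it stays in the multi-gigawatt range that puts these deployments in the same conversation as a large city's grid.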
Now, integration of quantum computers with HPC systems is something we have seen since the early days of quantum
computing, and the idea is gaining momentum with quite a bit of work underway in this area.
There was news this week of another hybrid HPC-AI-quantum system, this time driven by Oxford Quantum Circuits, connected to an AI system that uses NVIDIA GH200 Grace Hopper chips.
It provides 16 superconducting qubits with OQC's latest error-correction technology, which increases coherence times for physical qubits to more than 100 microseconds and for logical qubits to more than 1 millisecond.
It's installed in New York City, targeting financial services applications.
The industry is making progress towards usable logical qubits, but to achieve scale,
you ultimately need better physical qubits with very low error rates to begin with,
and then error correction on top of that.
As you know, I see AI as an HPC application and quantum computing as an HPC subroutine call.
Right now, hybrid systems tend to have small quantum computers and are
used mostly for training, proof of concept, or specific kernels.
Really, subroutine calls.
The poster child continues to be the Jülich Supercomputing Centre in Germany,
which has a D-Wave quantum annealer with over 5,000 qubits.
Now, those are a different class of qubits.
They also have a Pasqal neutral-atom system with 100 qubits.
Pasqal is a French company, and it's spelled with a Q, of course.
And there's an entry-level IQM superconducting system with five qubits;
IQM is a Finnish European company.
We should note that Saturday, October 18th,
a date that can be expressed as 10/18, suggesting 10 to the 18th power,
or a billion billion, was Exascale Day.
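Spelled out, the arithmetic behind the name:

\[ 10/18 \;\rightarrow\; 10^{18} = 10^{9} \times 10^{9} \ (\text{a billion billion}); \qquad 1\ \text{exaflops} = 10^{18}\ \text{floating-point operations per second}. \]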
Shaheen, the Frontier exascale system was installed and certified two and a half years ago,
but achieving exascale performance in 64 bits remains a challenging milestone.
There are now so many large AI-focused systems, but they focus on lower-precision arithmetic,
down to four bits and even lower, which means they can deliver multiple exaflops,
but at that lower precision.
Of the known systems that participate in the top 500 benchmark, we note that the three U.S.-based DOE
systems, along with the new Jupiter system in Germany, continue to be formidable, state-of-the-art supercomputers.
Well, given the city-sized ambitions of the AI world, I suggest we change the metric from 64-bit
floating-point operations to watts. That way, exascale would refer to exawatts, and we are just getting
into gigawatts, so we've got a ways to go (an exawatt is a billion gigawatts). It's also the kind of milestone that you try hard
not to reach if you can get it done with lower energy, and it would also bring the Green500
list front and center. We should also note that this year would have been
Seymour Cray's 100th birthday, and it's also the 50th anniversary of the Cray-1 supercomputer.
The Cray-1 ran at 80 megahertz and had a peak performance of 160 megaflops.
That's megaflops.
And it changed the world.
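For context, that peak figure follows directly from the clock rate, assuming the commonly cited detail that the Cray-1's floating-point add and multiply units could each deliver one result per cycle:

\[ 80\ \text{MHz} \times 2\ \text{flops per cycle} = 160\ \text{MFLOPS peak}. \]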
The Cray-1 was a master class in technology, but also in branding.
The founder was Cray, the company was Cray, the system was the Cray-1, and people noted that the shape
of the system spelled the letter C.
Now, they did that to minimize the length of wires, but the alignment in branding is exceptional,
and to this day, the Cray brand is synonymous with high performance.
Seymour was born in Chippewa Falls, Wisconsin, and built his company there,
though it expanded later into Minnesota and got way closer to a big airport, at Mendota Heights
and Eagan, with one of the finest corporate headquarters anywhere.
Well, the U.S. Mint is issuing a dollar coin featuring the C-shaped Cray-1,
saluting Wisconsin as the home of Cray Research and of Seymour Cray.
That is really cool.
All right, that's it for this episode.
Thank you all for being with us.
HPC News Bytes is a production of OrionX in association with insideHPC.
Shaheen Khan and Doug Black host the show.
Every episode is featured on insideHPC.com and posted on OrionX.net.
Thank you for listening.
Thank you.