@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20251110
Episode Date: November 10, 2025 - Quantum-accelerated supercomputing - NVQLink, QCS, QEC - Tesla Intel collaboration? - Is the chip era ending? - SC25 offers new options for those impacted by US Government shutdown [audio mp3="https://orionx.net/wp-content/uploads/2025/11/HPCNB_20251110.mp3"][/audio]
Transcript
Welcome to HPC Newsbytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone.
Welcome to HPC Newsbytes.
I'm Doug Black of InsideHPC, and with me is Shaheen Khan of OrionX.net.
NVIDIA recently announced the NVQLink architecture for coupling GPUs with quantum processors.
And Shaheen, I wrote a piece earlier this year on HPE's daunting work in this area.
Their strategy is to have HPE classical supercomputers ready to integrate with any of the prominent quantum modalities now under development when the time is appropriate.
This is a very, very difficult and challenging area and very important.
As for NVQLink, NVIDIA described it as an open approach to quantum integration supporting 17 quantum processing unit builders, five controller builders, and nine US national labs.
Yeah, just like GPUs, QPUs, or quantum processing units, need tight connections to accelerate a supercomputer. Low latency and high bandwidth are needed for the applications that parcel out heavy tasks to QPUs, but they're also needed for quantum error correction, which spews out a lot of data. By some estimates, and at current rates, a large-scale quantum computer will have to process on the order of 100 terabytes of data per second for quantum error correction to achieve fault tolerance, just to illustrate the point. So the need is expected to be there one way or another.
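To put that 100-terabyte-per-second figure in perspective, here is a rough back-of-envelope sketch in Python. The channel count, sampling rate, and sample size are illustrative assumptions on our part, not numbers from NVIDIA or from any paper.

```python
# Rough, illustrative estimate of raw QEC readout traffic at fault-tolerant
# scale. Every input below is an assumption chosen to make the arithmetic
# concrete, not a vendor figure.

readout_channels   = 50_000   # assumed ancilla readout lines on a large machine
samples_per_second = 1e9      # assumed 1 GS/s digitizer per readout channel
bytes_per_sample   = 2        # assumed 16-bit samples

raw_rate_bytes = readout_channels * samples_per_second * bytes_per_sample
print(f"raw readout stream: {raw_rate_bytes / 1e12:.0f} TB/s")  # -> 100 TB/s

# After on-the-fly state discrimination, each measurement collapses to roughly
# one bit, which is why the decoding pipeline wants to sit as close to the
# control system as possible -- the motivation for links like NVQLink.
```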
That leads to a topology where NVQLink sits between the physical supercomputer and the physical QPU. Inside NVQLink, we would have a real-time host with its own real-time connection to the quantum system controller, not the quantum processing unit itself, but the control system. NVQLink plus the physical QPU would then form the logical QPU that is attached to the supercomputer.
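As a purely conceptual sketch of that composition, and not any actual NVIDIA interface, the pieces can be modeled like this; all class and field names here are invented for illustration.

```python
# Conceptual sketch only: the names below are invented for illustration and
# do not correspond to any NVIDIA or NVQLink API.
from dataclasses import dataclass

@dataclass
class QuantumController:
    """Classical control electronics that drive the physical QPU."""
    qpu_modality: str  # e.g., "superconducting", "trapped ion"

@dataclass
class RealTimeHost:
    """Real-time node inside NVQLink, linked to the controller, not the bare QPU."""
    controller: QuantumController

@dataclass
class LogicalQPU:
    """What the supercomputer sees: the real-time host plus the physical QPU behind it."""
    real_time_host: RealTimeHost

# The supercomputer attaches to the logical QPU, not to the bare device.
logical_qpu = LogicalQPU(RealTimeHost(QuantumController("superconducting")))
```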
There's a very good paper on arXiv titled "Platform Architecture for Tight Coupling of High-Performance Computing with Quantum Processors" that explains the issues very well. It is authored by a large and very strong team from A*STAR in Singapore, the University of Washington, the University of Pittsburgh, DOE labs including Pacific Northwest, Sandia, Lawrence Berkeley, and Oak Ridge, and the DOD-sponsored Lincoln Labs. Separately, in August, IBM and AMD announced something similar, but focused on the QEC, quantum error correction, part, and in October they showed very good results accelerating QEC using AMD's FPGAs.
QEC lends itself nicely to the way FPGAs work.
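To illustrate why, here is a toy sketch using a three-qubit repetition code, which is not the code or the decoder IBM and AMD actually ran: the decode step is a small, fixed-latency, branch-free lookup from syndrome bits to a correction, exactly the kind of logic an FPGA handles well.

```python
# Toy illustration only -- not the IBM/AMD pipeline. Decoding the 3-qubit
# repetition code is a constant-time lookup from two syndrome bits to a
# correction, the kind of simple, fixed-latency logic FPGAs excel at.

# syndrome = (parity of qubits 0 and 1, parity of qubits 1 and 2)
SYNDROME_TO_CORRECTION = {
    (0, 0): None,  # no error detected
    (1, 0): 0,     # flip qubit 0
    (1, 1): 1,     # flip qubit 1
    (0, 1): 2,     # flip qubit 2
}

def decode(bits):
    """Return the index of the qubit to flip, or None if no error is seen."""
    syndrome = (bits[0] ^ bits[1], bits[1] ^ bits[2])
    return SYNDROME_TO_CORRECTION[syndrome]

assert decode([0, 0, 0]) is None  # clean codeword, no correction
assert decode([1, 0, 0]) == 0     # single bit-flip on qubit 0 is identified
```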
Getting a lot of application data into and out of QPUs, however, will remain a challenge.
Quantum computing is still not for big data, so to speak.
It needs a problem that is one, very high in compute intensity, meaning a lot of compute on a small amount of data,
and two, requires what a QPU can actually do,
and three, can be formulated so that it can use the QPU.
The good news is a few of those hat tricks exist and more should exist.
These efforts also show how HPC enables and then uses advanced technologies like AI and quantum.
Last month, I took a tour of one of the most technically formidable facilities I've ever come across,
maybe even more impressive than taking a tour of the Frontier Exascale facility at Oak Ridge Lab
with liquid cooling plumbing that looked like a sci-fi version of a giant's castle.
I'm talking about Fab 52 at Intel's foundry plant outside of Phoenix.
It was enormous and it was endlessly complex.
And my thought was, good luck to anyone who decides to get into the chip fab business.
You need decades of corporate experience and institutional knowledge to operate at the super-high level that the foundry industry requires.
And yet, we have Elon Musk saying last week that Tesla will, quote, probably have to build a gigantic chip fab to make custom AI chips. He added that Tesla might need to work with Intel.
All of this is according to a Reuters article, and the Musk comments came at Tesla's annual meeting.
Well, first off, wow, that's way cool that you saw Fab 52 itself. I don't know if they allowed picture taking, probably not. No, but it must have been quite an experience. Yes. You're right on about the complexity of chip fabs and how long it takes to produce chips efficiently and then keep up over time. It's an expensive, nonstop marathon.
So if you can show the money, you also have to show the patience. So maybe working with existing
players is best. And as you said, it was mentioned that maybe some deal can be worked out with Intel, and Intel has been creative in its deal-making with the U.S. government and then NVIDIA.
We should also point out that Tesla needs chips for inside the cars, and its vertical integration strategy led it to design those in-house, although manufactured by TSMC. Those chips, however, need to be high-end, more to reduce energy requirements than for speed per se, but they are not for data centers.
Tesla abandoned its in-house server-class chip, named D1.
So the record is spotty.
Speaking of that, author and investor George Gilder had an interesting thought piece in the Wall Street Journal last week with a provocative title, "The Microchip Era Is About to End."
Specifically, Gilder is talking about the wafer-scale processor technologies exemplified by Tesla's now-abandoned Dojo supercomputer with, as you mentioned, Shaheen, the D1 chip, and by Cerebras, which uses the whole wafer as the chip and has delivered eye-popping benchmark results. The emergence of the wafer-scale engine, Gilder said, flies in the face of the CHIPS Act and chip export bans to China, which have been put in place to protect traditional chip design and manufacturing in the U.S. and to deny China the latest chip technology. This has taken place, according to Gilder, quote, amid undeniable portents of the end of microchips. Gilder concludes by saying the post-microchip era, with data centers in a box of wafer-scale processors, is coming.
America, not China, should lead the way.
There was a time when one machine, whether a mainframe or a supercomputer, would do.
But the moment workloads exceeded one system, we were in cluster territory.
So the debate has pretty much always been not about whether or not you have a cluster,
but about how big each node is and how many nodes you have.
And bigger nodes have always been more efficient and typically also more expensive.
There's a separate question about what it means for a node to be, quote, bigger.
And that usually boils down to the biggest system that can run a single copy of the operating system,
regardless of how that system is actually packaged.
There's an old story in supercomputing attributed to Seymour Cray himself,
saying, quote, if you were plowing a field, which would you rather use: two strong oxen or 1,024 chickens, end quote.
This also has led to the capacity versus capability characterization of supercomputers,
and it was in support of the parallel vector architecture that Seymour was building at
the time versus the emerging massively parallel processing, or MPP, approach, like Thinking Machines' CM-1 system, which would rely on much, much smaller CPUs.
As technology enabled it, MPPs became better, and clusters of PCs eventually led to the clusters we see today.
Now, chips are about integrating more and more functions,
so inevitably they have got physically larger
and started to look more and more like full computers.
Apple's M1 chip and its follow-ons have pointed the way.
Server motherboards these days don't have a lot more on them
than a big chip in the middle.
The prevailing manufacturing process for wafers
naturally produces cylindrical crystals,
and the current industry standard is 30-centimeter round wafers.
In our June 24th, 2024 episode, we mentioned that TSMC is presumably playing with rectangular wafers of 51 by 51.5 centimeters. 60-centimeter wafers were proposed some years ago but did not catch on, for lack of demand and the need to ripple that size through all the equipment across the supply chain.
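For a sense of scale, here is a quick area comparison of the form factors just mentioned; the dimensions come from the discussion above, and the arithmetic is ours.

```python
import math

# Usable-area comparison of the wafer and panel sizes mentioned above.
wafer_30cm_cm2 = math.pi * (30 / 2) ** 2  # today's standard 30 cm round wafer
wafer_60cm_cm2 = math.pi * (60 / 2) ** 2  # the once-proposed 60 cm round wafer
panel_cm2      = 51 * 51.5                # the reported 51 x 51.5 cm rectangular format

print(f"30 cm wafer   : {wafer_30cm_cm2:6.0f} cm^2")  # ~707 cm^2
print(f"60 cm wafer   : {wafer_60cm_cm2:6.0f} cm^2")  # ~2827 cm^2
print(f"51 x 51.5 cm  : {panel_cm2:6.0f} cm^2")       # ~2626 cm^2, about 3.7x a 30 cm wafer
```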
So a lot of this has been happening, and all vendors are tracking it. We have been a fan of Cerebras's technology and the hard problems it has solved in a number of dimensions to make its wafer-sized chips work, and the path it has blazed. But you have to decouple its architecture from its manufacturing and remember that it's a fabless semiconductor company whose chips are produced by TSMC. And from a fab perspective, a wafer-scale chip is still a chip, subject to the same supply chain issues as other high-end chips and only applicable when you need to shrink a very large number of systems, like you do with the back-end AI data centers.
A bigger chip does not change the geopolitical dynamics
and the U.S. continues to need industrial policies
like the CHIPS Act a few years ago
and the very active all-around push that we see now.
Now, with the continuation, as of Monday morning,
of the federal government shutdown,
we're seeing a growing number of flight cancellations. And I don't know about you, Shaheen,
but I'm starting to get concerned about my flight to St. Louis this coming weekend for SC25.
I've actually toyed with the idea of taking the train from New York to Chicago and continuing
west from there. Oh, that sounds quite nice, actually. Well, I'm sure everyone with flights to
SC25 will be watching airline announcements, and as you mentioned, I know some people have considered
contingency plans, including to drive from other airports as far away as Dallas and Chicago.
For those who are impacted by the shutdown, the SC25 organizers are offering online access, the ability to cancel an existing registration, and the flexibility to decide until the last minute, more or less, by extending the standard-rate deadline for in-person registration through Friday, November 14th. That's very nice. If you need to change to their digital experience or cancel your SC25 registration, you do need to fill out a form as a way of submitting a written request. Look for it on the SC25 website. So safe travels, and I hope we all meet there in person and get back home safe and sound. All right, that's it for this episode. Thank you all for being with us.
HPC Newsbytes is a production of OrionX in association with InsideHPC.
Shaheen Khan and Doug Black host the show.
Every episode is featured on insidehpc.com and posted on OrionX.net.
Thank you for listening.
