@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20251103

Episode Date: November 3, 2025

- New Exa-Class Supercomputers at DOE labs, HPE, AMD, NVIDIA, Oracle, - AI-RAN, Telecoms+AI, Nvidia+Nokia+partners - Intel+SambaNova ? [audio mp3="https://orionx.net/wp-content/uploads/2025/11/HPCNB_...20251103.mp3"][/audio] The post HPC News Bytes – 20251103 appeared first on OrionX.net.

Transcript
Starting point is 00:00:00 Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing, AI, and other advanced technologies. Hi, everyone. Welcome to HPC News Bytes. I'm Doug Black of Inside HPC, and with me is Shaheen Khan of OrionX.net. There's been an embarrassment of news riches in our world of late. The past week began with the announcements of leadership-class supercomputers to be built for the Department of Energy and housed at Oak Ridge and Argonne National Labs.
Starting point is 00:00:36 HPE and AMD will build two systems for Oak Ridge. One is a successor to the Frontier exascale system, along with an AI supercomputer cluster. Press reports estimated the value of the two systems at about a billion dollars. Discovery, the exascale system, is scheduled for 2028. It will be based on the new HPE Cray Supercomputing GX5000 platform, which utilizes a unified AI and HPC architecture and is supported by a new DAOS-based Cray storage system. Discovery will utilize AMD EPYC CPUs, codenamed Venice, and AMD Instinct MI430X GPUs.
Starting point is 00:01:20 Meanwhile, the other system, the AMD-led Lux AI system, will leverage Oracle Cloud Infrastructure and utilize MI355X GPUs. It's scheduled for deployment early next year. We've always needed to make exascale systems common for science. So it's nice to see these big systems get normalized and all these big brands participating. For example, another announcement had NVIDIA and Oracle building a supercomputer for scientific discovery with a mere 100,000 Blackwell GPUs for agentic AI science at Argonne National Lab. They plan to jumpstart that system with yet another system named Equinox that will use 10,000 Blackwells and go live in 2026. The systems will be interconnected and altogether would rate at 2.2 zettaflops of AI performance. Now 110,000 B200
Starting point is 00:02:17 GPUs, just to pick that as the metric, would add up to a peak 64-bit performance of 110,000 times 40 teraflops per chip, or 4.4 exaflops, if my math is right. And if you achieve even 75% of that for the HPL benchmark, it would be over three exaflops. The announcement said something about collaboration across labs and their mission, covering energy and science and national security and next-generation infrastructure, but did not mention IRI, integrated research infrastructure, a concept that we've discussed here in the past with DOE researchers. So we'll see where all that lands as we get more details.
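Shaheen's back-of-the-envelope math can be sketched as a few lines of Python. Note the 40-teraflops-per-chip FP64 figure and the 75% HPL efficiency are the figures quoted in the discussion, not official specifications:

```python
# Back-of-the-envelope check of the peak-FLOPS estimate from the episode.
# Assumed inputs (from the discussion, not vendor specs): 110,000 B200 GPUs
# at roughly 40 FP64 teraflops each, and ~75% HPL efficiency.

gpus = 110_000
fp64_tflops_per_gpu = 40  # teraflops per chip, as quoted on the show

# tera -> exa is a factor of one million
peak_exaflops = gpus * fp64_tflops_per_gpu / 1_000_000

# Assume 75% of peak is sustained on the HPL benchmark
hpl_exaflops = peak_exaflops * 0.75

print(f"Peak FP64: {peak_exaflops:.1f} exaflops")        # 4.4
print(f"At 75% HPL efficiency: {hpl_exaflops:.2f} exaflops")  # 3.30
```

So the 4.4-exaflops peak and "over three exaflops" HPL estimates check out under those assumptions.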
Starting point is 00:02:59 All in all, it's excellent news for science, and it certainly makes the upcoming SC25 conference more exciting. Among NVIDIA's long list of announcements coming out of their GTC conference in Washington last week is that they're taking a $1 billion equity stake in Nokia. Their partnership is all about adding NVIDIA-powered AI-RAN products to Nokia's RAN portfolio, which will enable service providers to launch 5G and 6G networks on NVIDIA platforms. The companies say the partnership marks the beginning of, quote, the AI native wireless era. Shaheen, I'm hoping you can illuminate the significance of this deal.
Starting point is 00:03:39 I thought the U.S. was still trying to absorb 5G, and maybe you can explain to us more about what AI RAN is all about. On the 5G, 6G adoption, there's a lot of benefit to 5G. to the operators and in the back end of the whole systems in terms of provisioning services and the kind of services you can add. So even though the consumer may not see the speed of 5G all the time and there are conditions that need to be in place in terms of frequency and line of sight and density, like how many other devices are in the vicinity and all, it has a positive impact overall. Now, a lot has been brewing in the telecommunications world and this week's news really helps bring them all home.
Starting point is 00:04:24 There are many standards in Telco that enable the system to work for voice and data in a global and multi-vender multi-technology way. That's why we make a phone call and the other side answers regardless of where they are in the world. Just look up 3GPP,
Starting point is 00:04:40 the third generation partnership project, and you will see another half a dozen entire standards groups that are part of that project that come together to unite and align various standards and protocols. It's all a very complex supply chain that enables consumers, equipment providers,
Starting point is 00:04:59 operators, cloud providers, and governments to play their part. The handsets, the Radio Access Network RAN that you mentioned, and backend cores all need to work together. And the RAN part is the key part here. Radio access network uses radio waves to connect endpoint devices to the core network. It consists of telephone poles, base stations, antennas, and if you ask, what if every telephone pole had a bunch of GPUs in it, then it's easy to get excited about the possibilities and why
Starting point is 00:05:32 Wall Street would be excited. About a decade ago, when software defined everything and anything was emerging, the notion of a software defined ran, or SD-d-Ran, gained momentum and helped modularize and serverize some telco tasks that could now be run on traditional computers, for example, in the cloud, instead of specialized equipment. Then we had OpenRAN that modularized the necessary functions for mobile connectivity and accelerated development by building open standards. That way, many vendors could work on various parts of the puzzle. Adding AI to all this is, of course, expected.
Starting point is 00:06:11 And AI-RAN emerged as another consortium and technology to inject AI into wireless networks. If you do that, it also changes RANs from a passive wireless network into an active platform. It's now essentially a real server versus a fixed appliance and all possibilities open up. So now we get close to ignition. Radio access networks themselves can become AI enabled to run their own tasks, but in the process become eligible to run other new services and customer apps too. This is the significance of the Nvidia-Nocchio deal and the work that was announced with a coalition of other vendors.
Starting point is 00:06:54 A lot of nuance in that announcement. Now, I've talked in the past about the evolution of application architecture, from dumb terminals attached to mainframes to client server, multi-tier applications, to mobile cloud, and now moving towards edge to core. These developments are very consistent with that vision as they bring compute to the fabric in a big way. They can also help telecommunications companies, especially in the global market, to drive cloud services and sovereign AI while doing their usual upgrade of their backend operations. Intel continues to make interesting news,
Starting point is 00:07:33 what with investments recently from the U.S. government and from newly minted $5 trillion for a day or two company, NVIDIA, followed by new two nanometer chips produced by Intel's advanced FAB 52. Now there's a Bloomberg report that Intel may be in acquisition talks with Sambanova, which, were that to happen, has some irony because Sampanova chips were built to compete in the AI-compute arena dominated by NVIDIA. But given Intel's long history of problems getting a successful GPU to market, the move makes sense. Sanpanova is grouped with other AI-compute processor companies with novel architectures, companies like Cerebris and GROC, and there have been reports of late that Sanbanova's valuation has been
Starting point is 00:08:20 slipping, that they've had trouble finding venture investors, and that the company was open to acquisition inquiries. We talked at some length about data flow architectures last time, and Sanbenova and GROC incorporate those concepts into their AI chips and systems. inference seems to be the next frontier and entry point for new players. A lot of inference will happen on the endpoint devices, IOT sort of a market, but also a lot in the backend server rooms. Rackskill design is another thing that Intel could get from a deal like that.
Starting point is 00:08:55 Rackskill being a concept that Intel started years ago and then abandoned, but has become critical now with NVIDIA and AMD both spending significant dollars in it. It is clear that Intel needs to get into the server GPU market, and having missed the last window, it can only catch the next one. Imagine if it could participate with Nvidia and AMD in the AI frenzy that occupies so much of the news. It's also a reminder that even as it's had to deal with big challenges, Intel continues to be a huge and important vendor.
Starting point is 00:09:28 All right, that's it for this episode. Thank you all for being with us. HPC Newsbytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on insidehpc.com and posted on OrionX.net. Thank you for listening.
