@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20260518

Starting point is 00:00:04 Welcome to HPC Newsbytes, a weekly show about important news in the world of supercomputing, AI, quantum computing, and other advanced technologies. Hi, everyone. Welcome to HPC Newsbytes. I'm Doug Black, and with me is Shaheen Khan. HPE, Hewlett Packard Enterprise, has collaborated with Broadcom on what the companies say is the first commercial deployment of an AMD Helios-based AI rack-scale system. The new architecture is built on HPE Juniper Networking hardware, which uses Broadcom's Tomahawk 6 networking chip, Juniper's software, and AMD's upcoming MI450 GPU processor series. HPE said the system integrates 72 AMD MI455X GPUs in a single

Starting point is 00:00:52 rack with 260 terabytes per second of bandwidth, which can deliver up to 2.9 AI exaflops of FP4 processing. Yes, it's a pretty nice system, which finally provides an alternative to NVIDIA's NVL72 system. Because it is more of an integration of existing components from multiple vendors, they also mentioned that it is open, or anyway more open, than the vertically integrated system from NVIDIA. Some other notable features: it's got 31 terabytes of fourth-generation high-bandwidth memory

Starting point is 00:01:27 and a corresponding 1.4 petabytes per second of HBM bandwidth. So it's a seriously big system that can handle trillion-parameter training and large-scale inference. It also offers scale-up capabilities on top of Ethernet, as compared to NVLink for NVIDIA, by using the Ultra Accelerator Link over Ethernet, UALoE, standard. So this goes beyond RDMA over Ethernet, MRC, which we covered last week. This one provides cache coherency among GPUs, and because it uses the Tomahawk chips, it can already use MRC too, for example, across racks. It uses the so-called double-wide rack, which is an Open Compute Project standard called OCP Open Rack Wide.
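As a rough back-of-the-envelope check on the rack figures quoted above, the sketch below divides each rack-level number by the 72 GPUs to get per-GPU values. It assumes decimal prefixes and an even split across GPUs, and it treats the 260 terabytes per second as an aggregate rack figure; those are assumptions for illustration, not vendor-confirmed per-GPU specifications, and the variable names are made up for this example.

```python
# Rack-level figures as quoted in the episode (vendor claims, not measurements).
GPUS_PER_RACK = 72
FP4_EXAFLOPS = 2.9            # "AI exaflops" of FP4 processing
HBM_CAPACITY_TB = 31          # terabytes of HBM in the rack
HBM_BANDWIDTH_PBPS = 1.4      # petabytes per second of HBM bandwidth
RACK_BANDWIDTH_TBPS = 260     # "260 terabytes per second of bandwidth"


def per_gpu(rack_total, gpus=GPUS_PER_RACK):
    """Split a rack-level total evenly across GPUs (an assumption)."""
    return rack_total / gpus


# Convert to the next smaller decimal unit (x1000) before dividing.
fp4_pflops_per_gpu = per_gpu(FP4_EXAFLOPS * 1000)        # petaflops per GPU
hbm_gb_per_gpu = per_gpu(HBM_CAPACITY_TB * 1000)         # gigabytes per GPU
hbm_tbps_per_gpu = per_gpu(HBM_BANDWIDTH_PBPS * 1000)    # TB/s per GPU
rack_bw_tbps_per_gpu = per_gpu(RACK_BANDWIDTH_TBPS)      # TB/s per GPU

print(f"FP4 compute per GPU:    {fp4_pflops_per_gpu:.1f} PFLOPS")
print(f"HBM capacity per GPU:   {hbm_gb_per_gpu:.0f} GB")
print(f"HBM bandwidth per GPU:  {hbm_tbps_per_gpu:.1f} TB/s")
print(f"Rack bandwidth per GPU: {rack_bw_tbps_per_gpu:.2f} TB/s")
```

Under these assumptions the quoted totals work out to roughly 40 petaflops of FP4, about 430 gigabytes of HBM, and about 19 terabytes per second of HBM bandwidth per GPU.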

Starting point is 00:02:15 This is a 47.25-inch-wide rack as compared to NVIDIA's NVL72 system, which is 21 inches wide, and the standard rack, which is 19 inches wide. The double-wide rack gives you what I call steric freedom, borrowing from biochemistry. It provides more topological possibilities for thermal, mechanical, and layout, but with operational and logistical downsides. But it is just more space, and that is welcome and makes a bunch of things easier and enables several optimizations too. For example, it lets the switches be in the middle and ends up reducing signal travel lengths and in some ways enables scale-up over Ethernet. It also enables better cooling strategies:

Starting point is 00:03:00 bigger pipes that run water at lower, easier, safer pressure, parallel flow, the ability to have dedicated cooling zones for high-heat areas, etc. AMD Helios racks will be a big deal in the market, as they also build on OpenAI's endorsement of AMD MI450 chips and MRC. It may be telling that news reports of China's latest quantum computer came out last week during President Trump's state visit to Beijing. If there were ever a technology with strategic geopolitical implications, it's quantum. And China says its Jiuzhang 4.0 photonic quantum computing system can outperform the world's most powerful supercomputers by a wide margin. It's yet to be seen if this claim stands up to independent testing, but the system's performance was published last Wednesday in

Starting point is 00:03:56 the journal Nature by scientists from the University of Science and Technology of China, USTC. They said that by manipulating and detecting up to 3,050 photons, the system solved a so-called quantum boson sampling kernel in 25.6 microseconds versus 10 to the power of 42 years on the El Capitan system at Livermore National Lab. Quantum computing benchmarks, especially those comparing with classical systems, are notoriously tricky, so take this with a grain of salt. Shaheen, I'm reminded of a statement I heard several years ago from a senior U.S. National Lab official who said that in the quantum race between the U.S. and China, quote, it's a matter of who shuts down who first. Are we getting close? Well, the industry continues to be years away from that

Starting point is 00:04:47 scenario, but it sure attracts attention and investment, and also you never know. There has been a flurry of announcements from China about advanced technologies in quantum computing and supercomputing. We covered the CPU-only supercomputer a couple of weeks ago. And this reflects what must be a new strategy for China to signal strength versus hiding it, which requires that they disclose more. But there's a lot of curiosity about what exactly they are doing. So even small disclosures get a lot of attention. The Jiuzhang system has been disclosed perhaps more than others. The 1.0 version in 2020 reportedly used 76 photons. The 2.0 version raised that to 113 photons a year later.

Starting point is 00:05:32 The 3.0 version in 2023 upped it to 255 photons and came with similar performance claims. And now this Jiuzhang 4.0 version goes up to 3,050 photons, as you mentioned. The photo of the system in the news article is a floor full of cylinders, sending and receiving red lights from lenses and sources placed at their tops. What we can perhaps infer is more of a focus on specific applications versus doing that and also building general-purpose fault-tolerant quantum computers. Gaussian boson sampling, for example, can be useful in molecular design, vibrational analysis and materials, and even graph analytics,

Starting point is 00:06:15 but the work is still a directional advance and a very specific kernel. What we also seem to know is that quantum computing initiatives in China pursue several modalities, and a lot of it is at the University of Science and Technology of China, USTC, as you mentioned, and Tsinghua University. So, superconducting quantum computers are pursued at USTC, the Chinese Academy of Sciences, CAS, and a company called Origin Quantum, one of the better-known quantum computing companies from China. Optical, photonic quantum computers are pursued at USTC and SJTU, the Shanghai Jiao Tong University. Neutral atoms at USTC and Tsinghua, trapped ion again at USTC and Tsinghua, nuclear magnetic resonance through a company called SpinQ, which announced its funding recently, and silicon spin again at USTC, and then topological approaches at Tsinghua University and CAS. A radiation-hardened processor is at the core of a NASA effort to build faster chips for AI processing for American spacecraft.

Starting point is 00:07:27 Chips built to provide hundreds of times more compute power than current spaceflight computers, quote, while surviving the severe conditions found in space, according to an article in SciTechDaily. In a partnership with Microchip Technology of Chandler, Arizona, engineers at NASA's Jet Propulsion Lab in California have been testing the processor under simulated harsh conditions of outer space. Chips now used for space missions are older processors that have proven to be reliable and durable enough for space missions, but according to the SciTechDaily story,

Starting point is 00:08:02 they, quote, lack the performance needed for the next generation of missions. It seems to come down to how much you can shield processors via the enclosures of the computer and the space vehicle itself, and how much you need to do within a chip. And it depends on weight and the specific chip and its speed and fabrication node density. Remember also that Apollo 11 had three computers on board, so replication and traditional high-availability engineering

Starting point is 00:08:31 are also important paths. Roughly speaking, current low-Earth-orbit satellites, which are partially protected by Earth's magnetic field, use clustering and high-end error-correcting hardware. And the systems that need to go to outer space and would be subject to more serious radiation use hardened processors that are decades old and very slow by today's standards.

Starting point is 00:08:56 So you do need newer, faster chips, but those chips pack things a lot more closely. Imagine two or three nanometer technology versus 100 or 250 nanometers of the hardened chips that they use right now, and they are more vulnerable to disturbances of any kind. Radiation for sure, but also thermal, vibration, impact, et cetera, and they're all important in space.

Starting point is 00:09:20 We've touched on the idea of data centers in space and whether it is a smart and practical idea or misguided and distracting, and we have to delve into it more deeply in future episodes. But NASA does need faster, more hardened chips. All right, that's it for this episode. Thank you all for being with us. HPC Newsbytes is a production of OrionX.

Starting point is 00:09:43 Shaheen Khan and Doug Black host the show. Every episode is posted on OrionX.net. If you like the show, please rate and review it. Thank you for listening.
