@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20251020
Episode Date: October 20, 2025 - How about an AI system that needs 2x the energy NYC uses? - A GPU for every person - HPC-Quantum hybrid systems - Exascale Day 10/18, ExaFlops or ExaWatts? - Seymour Cray 100th birthday - Cray-1 50th anniversary - US Mint's new dollar coin featuring the Cray-1 - Cray-1 masterclass in... branding! [audio mp3="https://orionx.net/wp-content/uploads/2025/10/HPCNB_20251020.mp3"][/audio]
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone. Welcome to HPC News Bytes. I'm Doug Black of insideHPC, and with me, of course,
is Shaheen Khan of OrionX.net. A few weeks ago, we mentioned data centers the size of Central Park
in New York City. Well, how about
an AI data center with power needs 2x those of New York City?
A story in the Wall Street Journal on Friday looks at OpenAI and, quote, city-sized AI supercomputers.
It's a look at the strategy of OpenAI CEO Sam Altman, and if things unfold the way he expects,
it will be truly astounding.
Now, Shaheen, we've heard vendors use the phrase AI everywhere, but OpenAI is running wild with that idea.
About their new deal with Broadcom, Altman has said that OpenAI data centers will need one AI
inference chip per user, and that means they'll need billions of chips, and it means OpenAI is a
chip market unto itself. The company will use NVIDIA GPUs for model training, but Broadcom will
tailor inference chips for OpenAI's high-bandwidth-memory-intensive workloads, saving OpenAI money
with more efficient chips. This brings us to the scale of AI
data centers Altman envisions. The deal with Broadcom encompasses up to 10 gigawatts of
AI systems by the end of the decade. Combined with the 16 gigawatts of deals OpenAI has announced
since mid-September with AMD and NVIDIA, we're talking about close to a trillion dollars in investment
and two New York Cities' worth of electrical demand. Around the time of their main GTC conference
in March, NVIDIA said it had shipped 3.6 million Blackwell GPUs
just to the top AI cloud vendors, not to mention other chips that they also sell, or other
customers, of which there are many. That was presumably three times the demand they had
experienced with the previous-generation Hopper GPU. Estimates are that NVIDIA will ship over
5 million Blackwell GPUs this year, and probably another 2 million Hoppers. All of these
GPUs are obviously going somewhere and are getting power and cooling. NVIDIA has the
lion's share of the global production of GPUs, so you can do some MBA work and guess what
the total number is and how much electricity they need to run. And that guess comes out to over
10 gigawatts at the data center level, and that puts it very much in the New York City ballpark,
which is estimated to need 5 to 10 gigawatts with a peak of over 20 gigawatts in the summer of this
year. One of the incredible aspects of the AI frenzy is how the numbers hold up and don't seem to
be letting up. So if OpenAI got a good fraction of the GPUs out there, it would be a contender
with a nice-sized city. But the numbers they are talking about will likely take a few years to
materialize for them, and that assumes production continues to increase and demand continues to
multiply. Small hiccups in production or demand will probably be okay, although they might spook
investors. But if a big correction occurs, and that continues to be a big if, that could lead to
big shortages again, or a glut of GPU capacity and a long recovery period. Or maybe we could
see a big breakthrough for high-speed, low-energy chips. Well, let's hope so.
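As a rough illustration of that back-of-envelope estimate, here is a minimal sketch in Python. The shipment counts are the figures quoted above; the per-GPU power draw, system overhead, and PUE multipliers are illustrative assumptions, not reported numbers.

```python
# Back-of-envelope estimate: GPU shipments -> data-center power demand.
# Shipment counts are the estimates quoted in the episode; the power
# figures below are illustrative assumptions only.

blackwell_units = 5_000_000   # estimated Blackwell GPUs shipped this year
hopper_units = 2_000_000      # estimated Hopper GPUs shipped this year

gpu_power_kw = 1.0            # assumed average draw per GPU, in kilowatts
system_overhead = 1.5         # assumed multiplier for CPUs, memory, network, storage
pue = 1.3                     # assumed power usage effectiveness (cooling, distribution)

total_gpus = blackwell_units + hopper_units
it_power_gw = total_gpus * gpu_power_kw * system_overhead / 1_000_000  # kW -> GW
facility_power_gw = it_power_gw * pue

print(f"{total_gpus:,} GPUs -> roughly {facility_power_gw:.1f} GW at the data-center level")
# With these assumptions: 7,000,000 GPUs -> roughly 13.7 GW, i.e. over 10 gigawatts,
# in the same ballpark as New York City's typical 5-10 GW demand.
```

Change any of the assumed multipliers and the total shifts, but it stays in the multi-gigawatt range that puts these deployments in the same conversation as a large city's grid.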
Now, integration of quantum computers with HPC systems is something we have seen since the early days of quantum
computing, and the idea is gaining momentum with quite a bit of work underway in this area.
There was news this week of another hybrid HPC-AI-quantum system, this time driven by Oxford Quantum Circuits, connected to an AI system that uses NVIDIA GH200 Grace Hopper chips.
It provides 16 superconducting qubits with OQC's latest error-correction technology, which increases coherence times for physical qubits to more than 100 microseconds and for logical qubits to more than 1 millisecond.
It's installed in New York City, targeting financial services applications.
The industry is making progress towards usable logical qubits, but to achieve scale,
you ultimately need better physical qubits with very low error rates to begin with,
and then error correction on top of that.
As you know, I see AI as an HPC application and quantum computing as an HPC subroutine call.
Right now, hybrid systems tend to have small quantum computers and are
used mostly for training, proof of concept, or specific kernels.
Really, subroutine calls.
The poster child continues to be the Jülich Supercomputing Centre in Germany,
which has a D-Wave quantum annealer with over 5,000 qubits.
Now, those are a different class of qubits.
They also have a Pasqal neutral-atom system with 100 qubits.
Pasqal is a French company, and it's spelled with a Q, of course.
And there's an entry-level IQM superconducting system with five qubits;
IQM is a Finnish European company.
We should note that Saturday, October 18th,
a date that can be expressed as 10/18, suggesting 10 to the 18th power,
or a billion billion, was Exascale Day.
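Spelled out, the arithmetic behind the name:

\[ 10/18 \;\rightarrow\; 10^{18} = 10^{9} \times 10^{9} \ (\text{a billion billion}); \qquad 1\ \text{exaflops} = 10^{18}\ \text{floating-point operations per second}. \]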
Shaheen, the Frontier exascale system was installed and certified two and a half years ago,
but achieving exascale performance in 64 bits remains a challenging milestone.
There are now so many large AI-focused systems, but they focus on lower-precision arithmetic,
down to four bits and even lower, which means they can deliver multiple exaflops,
but at that lower precision.
Of the known systems that participate in the top 500 benchmark, we note that the three U.S.-based DOE
systems, along with the new Jupiter system in Germany, continue to be formidable, state-of-the-art supercomputers.
Well, given the city-sized ambitions of the AI world, I suggest we change the metric from 64-bit
floating-point operations to watts. That way, exascale would refer to exawatts, and we are just getting
into gigawatts, so we've got a ways to go (an exawatt is a billion gigawatts). It's also the kind of milestone that you try hard
not to reach if you can get it done with lower energy, and it would also bring the Green500
list front and center. We should also note that this year would have been
Seymour Cray's 100th birthday, and it's also the 50th anniversary of the Cray-1 supercomputer.
The Cray-1 ran at 80 megahertz and had a peak performance of 160 megaflops.
That's megaflops.
And it changed the world.
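For context, that peak figure follows directly from the clock rate, assuming the commonly cited detail that the Cray-1's floating-point add and multiply units could each deliver one result per cycle:

\[ 80\ \text{MHz} \times 2\ \text{flops per cycle} = 160\ \text{MFLOPS peak}. \]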
The Cray-1 was a master class in technology, but also in branding.
The founder was Cray, the company was Cray, the system was the Cray-1, and people noted that the shape
of the system spelled the letter C.
Now, they did that to minimize the length of wires, but the alignment in branding is exceptional,
and to this day, the Cray brand is synonymous with high performance.
Seymour was born in Chippewa Falls, Wisconsin, and built his company there,
though it expanded later into Minnesota and got way closer to a big airport, at Mendota Heights
and Eagan, with one of the finest corporate headquarters anywhere.
Well, the U.S. Mint is issuing a dollar coin featuring the C-shaped Cray-1,
saluting Wisconsin as the home of Cray Research and of Seymour Cray.
That is really cool.
All right, that's it for this episode.
Thank you all for being with us.
HPC News Bytes is a production of OrionX in association with insideHPC.
Shaheen Khan and Doug Black host the show.
Every episode is featured on insideHPC.com and posted on OrionX.net.
Thank you for listening.
Thank you.