HPC Podcast Archives - OrionX.net - HPC News Bytes – 20250915

Episode Date: September 15, 2025

- Nvidia Rubin CPX - AI Inference, Prefill, Decode, Context - Oracle and OpenAI - Nvidia and OpenAI in the UK - UK MOD Google Cloud - Digital Sovereignty

Transcript
[00:00:00] Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing, AI, and other advanced technologies. Hi, everyone, welcome to HPC News Bytes. I'm Doug Black of InsideHPC, and with me is Shaheen Khan of OrionX.net. Nvidia last week announced the Rubin CPX AI accelerator, which the company calls a new class of GPU built for, quote, massive-context processing, massive context being hyphenated. They say the chip enables efficient processing of massive numbers of tokens
[00:00:38] and speeds up emerging uses like software coding and video generation. Rubin CPX works with Nvidia Vera CPUs and Rubin GPUs inside a new rack architecture. The Vera Rubin NVL144 CPX as a system will provide eight exaflops of AI compute, which would be 7.5 times more AI performance than the current Grace Blackwell GB300 NVL72, as well as 100 terabytes of memory and 1.7 petabytes per second of memory bandwidth. Shaheen, it all sounds pretty impressive, and the new chip is being favorably received by at least one publication. CPX is quite a bombshell of news. It goes to the heart of the emerging AI inference market and is a giant competitive move. The big strategic competition in the data center is between, on one side,
[00:01:32] Nvidia and AMD, and on the other side, AWS, Azure, and Google Cloud, which are building their own chips. Caught in the middle are the very large number of merchant AI chip vendors that must continue to navigate choppy waters and build on their momentum. The prominent ones that have built strength include Groq, SambaNova, and Cerebras, and they might use this news as validation of their strength. The rest might feel most of the heat with this move. There's also Apple, Qualcomm, and Tenstorrent,
[00:02:04] which are more associated with devices and embedded uses. Nvidia's existing Arm-based CPU is called Grace and was released at the same time as the Hopper GPU, completing the Grace Hopper family. Their next-generation CPU will be part of the Vera Rubin family, as you mentioned, and will be coming in 2026, with 88 cores and 176 threads, doubling chip-to-chip bandwidth to 1.8 terabytes per second, and produced in a 3-nanometer fab.
[00:02:34] Rubin will be the GPU, and this new Rubin CPX will be a brand-new personality for Rubin, which forks the architecture into a more general-purpose, learning-optimized Rubin and an inference-optimized Rubin CPX. So at a high level, here's what you need to know. One, the post-Moore's Law period requires new architectures to get performance, which leads to specialized, customized silicon, but only for workloads that represent a big enough market. Two, AI is such a big market. It is in fact so big that its segments are big enough themselves to warrant specialized chips. Three, Nvidia validated that AI inference is such a segment and can have its own accelerators.
[00:03:20] There will be other segments, especially with agentic AI taking shape. The big stages of LLMs, large language models, for learning are typically described as: one, pre-training, which is learning the foundation; two, fine-tuning, adapting for specific tasks; and three, alignment training, which is reinforcement learning with humans in the loop and so on, which is kind of aligning with human values. And the main stages of LLM inference are usually described as the prefill phase, which is processing the prompt and setting the context, and the decode phase, generating the response and refining it. The CPX announcement focuses a lot on prefill and decode and context, all of which are part of inference.
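To make the prefill and decode distinction concrete, here is a minimal illustrative sketch, not from the episode and not Nvidia code: a toy single-layer attention model in Python with made-up sizes, where prefill processes the whole prompt in one large, compute-bound pass and builds the key/value cache, and decode then generates one token at a time while repeatedly re-reading that growing cache, which is why it leans on memory capacity and bandwidth.

```python
# Toy sketch of the two inference phases; sizes and weights are made up,
# there is a single attention layer, and causal masking is omitted for brevity.
import numpy as np

D_MODEL = 64    # hypothetical hidden size
VOCAB = 1000    # hypothetical vocabulary size
rng = np.random.default_rng(0)
W_qkv = rng.standard_normal((D_MODEL, 3 * D_MODEL)) * 0.02
W_out = rng.standard_normal((D_MODEL, VOCAB)) * 0.02
embed = rng.standard_normal((VOCAB, D_MODEL)) * 0.02


def attention_step(x, k_cache, v_cache):
    """Project to Q/K/V, attend over the cache plus the new tokens,
    and return the attention output along with the updated cache."""
    q, k, v = np.split(x @ W_qkv, 3, axis=-1)
    k_all = k if k_cache is None else np.concatenate([k_cache, k], axis=0)
    v_all = v if v_cache is None else np.concatenate([v_cache, v], axis=0)
    scores = q @ k_all.T / np.sqrt(D_MODEL)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_all, k_all, v_all


def prefill(prompt_ids):
    """Prefill: one big pass over the whole prompt (compute-heavy),
    producing the first output token and the KV cache that holds the context."""
    x = embed[prompt_ids]                       # (prompt_len, D_MODEL)
    out, k_cache, v_cache = attention_step(x, None, None)
    logits = out[-1] @ W_out
    return int(np.argmax(logits)), k_cache, v_cache


def decode(first_token, k_cache, v_cache, max_new_tokens=8):
    """Decode: one small step per token; each step re-reads the growing
    KV cache, so this phase stresses memory rather than raw compute."""
    token, generated = first_token, [first_token]
    for _ in range(max_new_tokens - 1):
        x = embed[token][None, :]               # a single token per step
        out, k_cache, v_cache = attention_step(x, k_cache, v_cache)
        token = int(np.argmax(out[-1] @ W_out))
        generated.append(token)
    return generated


prompt = rng.integers(0, VOCAB, size=512)       # stand-in for a long context
first, k_cache, v_cache = prefill(prompt)       # context/prefill phase
print(decode(first, k_cache, v_cache))          # decode/generation phase
```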
[00:04:05] So, CPX is another so-called XPU, like CPU, GPU, DPU, NPU, QPU, et cetera. And the reason inference warrants its own chip is because it runs the black box that was created by learning. So it is more compute-intensive and less memory-intensive compared to learning. And that's exactly what the CPX does: smaller, less expensive memory, less focus on high-end bandwidth, and more focus on compute. That's the trade-off, and it leads to higher performance and lower cost. Inference chip vendors have been saying as much, but are now validated by the 4.5-trillion-pound gorilla that is Nvidia. Now, there was no hint of this at GTC back in March.
[00:04:56] The prevailing view then was that the large variety of apps need to run on AI platforms, and Blackwell was ideal for that sort of environment, to run all AI-infused apps including inference. This announcement says there is a place for accelerators that are optimized for inference, which has been a major reason why so many AI chip startups have been able to raise capital. There was news last week of several massive deals for supercomputers, data centers, and cloud services, all of it AI-related, of course, on the scale of hundreds of billions of dollars. Shaheen, I know you're kind of, quote, Missouri, unquote, about this, as in show me, and you believe that in some cases the deals don't amount to the money initially bandied about when announced. Nevertheless, Oracle and OpenAI inked a five-year supercomputing deal to put in place $300 billion worth of compute infrastructure,
[00:05:47] starting two years from now, according to a story in the Wall Street Journal. It's seen as among the largest cloud deals ever, and it's well above OpenAI's annual revenue of about $10 billion. The news delivered a more than 40% boost to Oracle's share price and added around $100 billion to the paper wealth of Larry Ellison. Well, the need is quite evident, and it's a sign of our times to say it like this,
[00:06:17] but fulfilling the need goes beyond, quote, just money, even in hundreds of billions. We've talked about the required list. You have to have the money, sure, but you also need electricity, real estate, cooling, and allocation of chips and networking and racks, and all that supply chain. In fact, the Journal article also said the infrastructure will need 4.5 gigawatts of power. That is more than two entire Hoover Dams. But yes, right now that's what's needed to keep pace with AI, and OpenAI has the leadership position to bring together the coalition and work on the gaps.
[00:06:53] OpenAI is involved in another major deal reported last week, in which it and Nvidia have pledged billions of investment for UK data centers. The two companies plan to partner with data center provider Nscale, according to a Bloomberg story. Nscale reportedly plans a 50-megawatt UK data center that will come online next year. In total, the company is planning to spend three billion in the UK. Speaking of the UK, its Ministry of Defence, MOD, has awarded Google Cloud a £400 million contract, about $540 million, to help MOD build a sovereign cloud and secure information processing capability.
[00:07:35] Google will implement its air-gapped Google Distributed Cloud, which is on-prem and secure and not sitting in the middle of the internet. The German Army and Singapore's science and tech agency are also mentioned as customers. It is best to see these moves in the context of digital sovereignty, which is the rapidly emerging and very complex requirement for all countries that have the wherewithal to pay attention to it. All right, that's it for this episode. Thank you all for being with us. HPC News Bytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on insideHPC.com and posted on
[00:08:16] OrionX.net. Thank you for listening.
