HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240902

Episode Date: September 2, 2024

- Nvidia Blackwell delays - Inference champions - AI regulations - Supercomputing in Russia

Audio: https://orionx.net/wp-content/uploads/2024/09/HPCNB_20240902.mp3

Transcript
00:00:00 Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing, AI, and other advanced technologies. Hi, everyone. Welcome to HPC News Bytes. I'm Doug Black of InsideHPC, and with me is Shaheen Khan of OrionX.net. So much of the discussion these days around NVIDIA in the business press focuses on their share price, which, since NVIDIA's outstanding quarterly results were announced on Wednesday of last week, has actually declined. But on this podcast, we are, of course, more interested in
00:00:38 their chip technology, which is running into ceilings and walls not of NVIDIA's choosing. There is increasing talk that NVIDIA GPUs, particularly the new Blackwell chip, have run into production delays because they push the limits of technology. The problems reflect Blackwell's incredible complexity, as indicated by its 208 billion transistors, 2.5 times more than previous NVIDIA chips. Consistent with what you have heard here and with various industry reports, including an article in the Wall Street Journal last week, NVIDIA chips keep getting physically larger. They're getting faster, more of them need to talk to each other coherently, and they need a
00:01:20 lot of electricity on top of that. It's hard to design one big piece of silicon, and it gets harder when you have multiple chiplets on a substrate and then want tens of them to work in concert. That's a lot of challenges in thermal design, memory coherence, manufacturing, and packaging. That said, delays in Blackwell were generally shrugged off until NVIDIA projected future financial performance at the lower end of expectations. So the party continues, but maybe not with as much exuberance. That doesn't sound unhealthy, actually. Now, one company that has built super large single chips
00:01:59 is Cerebras, going all the way to the full 30-centimeter wafer. Their Wafer Scale Engine 3, their latest product, provides 4 trillion transistors. This week, Cerebras, Groq, Untether AI, and AMD joined the chorus of chip vendors who see NVIDIA dominating AI training but see opportunities in AI inference. Of course, GPUs are quite good at inference too, but optimizing for inference is a lower barrier to entry for others. Cerebras announced what it said are the fastest AI inference numbers: 1,800 tokens per second for Llama 3.1 8B and 840 tokens per second for Llama 3.1 70B, making it 20 times faster than GPU-based solutions in hyperscale clouds, according to the company. By the way, speaking of AI benchmarks,
00:02:55 we are working to schedule a podcast discussion with MLCommons, which has issued new MLPerf benchmarks. Our intent is for them to help us better understand the numbers they put out several times a year. If, like me, you have trouble with their benchmark tables and data, we hope you'll find the conversation helpful. Meanwhile, storm clouds, or fair-weather clouds, depending on your outlook, are gathering around AI regulation. One aspect here is that even as organizations across the private and public sectors invest heavily in AI strategies, they are also very concerned about the potential limitations that state and federal agencies may impose on the technology. And as we all know,
Starting point is 00:03:36 Shaheen, business hates uncertainty. Yes, a lot going on in this area. Another Wall Street Journal article last week discussed businesses bracing for the impact of AI regulation, noting that nearly 30% of Fortune 500 companies cite AI regulations as a source of risk in their annual reports. But as we've said about new AI laws in Europe, regulations could in fact provide clarity and establish a stable environment for companies. But AI is a new and emerging field, and it's challenging for even the experts to grasp the implications of technology and what a good regulatory regime should look like for individuals, companies, and national competitiveness. And then there's the complexity of what should be
Starting point is 00:04:23 self-regulated and what should be imposed. We'll close out this week with a story about HPC in Russia where a government agency there is reportedly linking six Russian supercomputers together into what they've called the Distributed Scientific Supercomputer Infrastructure Consortium. The concept seems like a low-end version of what you see in the West, and it doesn't look good for Russia. At the very high end, the leading edge is the U.S. Department of Energy plans to build the Integrated Research Infrastructure, IRI, that further integrates its supercomputing and data resources across locations in the U.S.
Starting point is 00:05:04 This is a story that originated from Tom's hardware, and there is a big difference between the IRI and what's happening in Russia. The Russian facilities will combine a relatively meager 1.5 petaflops of compute performance with scientific data storage systems 15-plus petabytes across 900 servers. That aggregate power would come in well below the bottom of the top 500 list. The whole thing shows how far Russia is falling behind in such a critical technology. All right, that's it for this episode. Thanks so much for being with us. HPC Newsbytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show.
Starting point is 00:05:46 Every episode is featured on InsideHPC.com and posted on OrionX.net. Thank you for listening.
