HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240902
Episode Date: September 2, 2024 - Nvidia Blackwell delays - Inference champions - AI regulations - Supercomputing in Russia
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone.
Welcome to HPC News Bytes.
I'm Doug Black of Inside HPC, and with me is Shaheen Khan of OrionX.net.
So much of the discussion these days around NVIDIA in the business press focuses on their share price, which has actually declined since NVIDIA's outstanding quarterly results were announced on Wednesday of last week. But on this podcast, we are, of course, more interested in their chip technology, which is running into ceilings and walls not of NVIDIA's choosing.
There's increasing talk that NVIDIA GPUs, particularly the new Blackwell chip, have run into production delays because they push technology's limits. The problems reflect Blackwell's incredible complexity, as indicated by its 208 billion transistors, which is 2.5 times more than previous NVIDIA chips.
Consistent with what you have heard here and in various industry reports, including an article
in the Wall Street Journal last week, NVIDIA chips keep getting physically larger. They're
getting faster, and then more of them need to talk to each other coherently, and they need a
lot of electricity on top of that. It's hard to design one big piece of silicon,
and it gets harder when you have multiple chiplets on a substrate,
and then you want tens of them to work in concert.
That's a lot of challenges in thermal design, memory coherence, manufacturing, and packaging.
That said, delays in Blackwell were generally shrugged off until NVIDIA projected future financial performance on the lower end of expectations. So the party continues, but maybe not with as much exuberance.
That doesn't sound unhealthy, actually. Now, one company that has built super large single chips
is Cerebras, going all the way to the full 30-centimeter wafer. Their latest product, the Wafer Scale Engine 3 (WSE-3), provides 4 trillion transistors.
This week, Cerebras, Groq, Untether AI, and AMD joined the chorus of chip vendors who see NVIDIA dominating AI training, but see opportunities in AI inference.
Of course, GPUs are quite good at inference too,
but optimizing for inference is a lower barrier to entry for others.
Cerebras announced what it says are the fastest AI inference numbers: 1,800 tokens per second for Llama 3.1 8B and 840 tokens per second for Llama 3.1 70B, making it 20 times faster than GPU-based solutions in hyperscale clouds, according to the company. By the way, speaking of AI benchmarks,
we are working to schedule a podcast discussion with MLCommons, which has issued new MLPerf
benchmarks. Our intent is for them to help us better understand the numbers
they put out several times a year. For those who, like me, have trouble with their benchmark tables and data, we hope you'll find the conversation helpful. Meanwhile, the storm clouds, or fair-weather clouds, depending on your outlook, are gathering around AI regulation. And an aspect
here is that even as organizations across the private and public
sectors invest heavily in AI strategies, they also are very concerned about the potential
limitations that state and federal agencies may impose on the technology. And as we all know,
Shaheen, business hates uncertainty. Yes, a lot going on in this area. Another Wall Street Journal
article last week discussed businesses
bracing for the impact of AI regulation, noting that nearly 30% of Fortune 500 companies
cite AI regulations as a source of risk in their annual reports. But as we've said about new AI
laws in Europe, regulations could in fact provide clarity and establish a stable environment for
companies. But AI is a new and emerging field, and it's challenging for even the experts to grasp the implications of the technology and what a good regulatory regime should look like for individuals,
companies, and national competitiveness. And then there's the complexity of what should be
self-regulated and what should
be imposed. We'll close out this week with a story about HPC in Russia, where a government agency is reportedly linking six Russian supercomputers together into what they've called the Distributed Scientific Supercomputer Infrastructure Consortium. The concept seems like a low-end version of what you see in the West, and it doesn't look good for Russia. At the leading edge, the U.S. Department of Energy plans to build the Integrated Research Infrastructure (IRI), which further integrates its supercomputing and data resources across locations in the U.S.
This is a story that originated from Tom's Hardware,
and there is a big difference between the IRI and what's happening in Russia.
The Russian facilities will combine a relatively meager 1.5 petaflops of compute performance with 15-plus petabytes of scientific data storage across 900 servers. That aggregate power would come in well below the bottom of the TOP500 list. The whole thing shows how far Russia is falling behind in such
a critical technology. All right, that's it for this episode. Thanks so much for being with us.
HPC News Bytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show.
Every episode is featured on InsideHPC.com
and posted on OrionX.net.
Thank you for listening.