@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240429
Episode Date: April 29, 2024
Topics: Infra-Tech; Datacenter Shortage of Parts, Property, and Power; AI Winter, AI Bubble, dot-AI Bust; Trade Bans, Chain of Custody; Sandia National Lab's Hala Point Neuromorphic Supercomputer
Audio: https://orionx.net/wp-content/uploads/2024/04/HPCNB_20240429.mp3
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone.
Welcome to HPC News Bytes.
I'm Doug Black.
Hi, Shaheen.
We begin today with a report from the Wall Street Journal about the challenges of deploying
AI technology. The challenge has now gone past GPU availability to shortages of data center capacity and the power required to run and cool those systems. The article has a nice way of describing the shortages caused by what it calls
exploding demand for AI infrastructure, parts, property, and power. I recall an episode last
year when we talked about modular nuclear reactors as a way to power future data centers.
It was based on a comment from Lisa Su of AMD.
It looks like the market might welcome such a solution.
By the way, you pointed out a website called gplist.ai that shows GPU availability according to type of GPU, duration, interconnect, and location.
It's almost like an Airbnb of GPUs.
Also, it needs to be said that the sums being bandied about for new AI infrastructure are staggering: $40 billion for Meta, $11 billion for a Microsoft data center in Indiana, and a mere $1 billion for Tesla's expanding AI capabilities. Keep in mind that the world's number one supercomputer cost $600 million. We're seeing the biggest tech companies spending at a scale that dwarfs even the U.S. government. Data centers used to be described in square feet; nowadays, it's about megawatts.
So when a rack can use over 100 kilowatts and you need 20,000 or 40,000 GPUs for AI workloads,
it's not hard to get there. And when AI and GPUs are described as critical to national security, you need at least a few data centers per country. So infra-tech, as some call it, is emerging as the next obstacle and market
opportunity. The locations with an advantage are those with a colder climate and low-cost power, such as geothermal or hydropower. Then you
need a building, ability to design and test full cluster configurations, and possibly to help
finance it too. In addition, many data centers were designed for a lot less power density than
AI requires, say 10, 20, or 40 kilowatts per rack versus the 100 or 120 kilowatts per rack that we
can see in high-end systems for AI.
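The rack and GPU counts above imply eye-opening facility totals. Here is a back-of-envelope sketch; the per-GPU wattage and PUE figures are illustrative assumptions, not numbers from the episode:

```python
# Back-of-envelope data center power estimate for an AI cluster.
# Assumptions (illustrative only): ~700 W per high-end GPU,
# and a PUE of 1.3 to account for cooling and facility overhead.
GPU_POWER_W = 700
PUE = 1.3  # power usage effectiveness: total facility power / IT power

def cluster_power_mw(num_gpus, gpu_power_w=GPU_POWER_W, pue=PUE):
    """Total facility power in megawatts for a GPU cluster."""
    return num_gpus * gpu_power_w * pue / 1e6

for n in (20_000, 40_000):
    print(f"{n} GPUs -> ~{cluster_power_mw(n):.1f} MW")
```

Under these assumptions, 20,000 GPUs already lands in the high tens of megawatts, which is why power, not square footage, is now the headline metric.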
All that said, there have been a few articles that take the opposite side, talking about an AI winter
or an AI bubble that will burst in the near term, causing a glut of GPUs and a painful setback up
and down the value chain. The trigger would be insufficient market traction among the many AI software companies that have been around for a couple of years and now face impatient investors. But it would be a market correction, not a slowdown of
the bigger AI trend. It's quite a contrast, and it does feel as though a glut of AI chips,
or at least a glut of vendors trying to cash in on AI, could be in the offing. As someone put it,
would a .AI bust be like the .com bust of the early 2000s? On the
one hand, AI companies command multi-billion dollar valuations with little to no revenue,
and sometimes even without much of a product. And on the other hand, all that exuberance needs to
cope with the realities of the market. However, no one seems to doubt the long-term growth of AI,
again, similar to the .com era of the internet.
Trade wars and geopolitics are a recurring theme for us, and there was news that institutions in China have acquired recently banned NVIDIA AI chips in Supermicro and Dell systems.
There are, after all, many roads to so great a market, wouldn't you say?
Yes. While it is difficult to secure a long
supply chain when systems may change hands several times, securing the chain of custody
is a big deal. Vendors have strict and laborious processes to validate their customers, who in turn commit to restrictions governing those systems. But chain of custody is also chain of
trust and there could be cracks.
The article I read was actually pretty inconclusive in its reporting.
It wasn't clear that it actually was a breach, for example, because the systems in question were acquired before the bans went into effect.
Back in December, we talked about neuromorphic chips for AI, modeled to a degree on the human brain and its low energy use,
and emulating large networks of biological neurons and their synaptic operations. Neuromorphic chips are on the
order of 20 to 30 watts, let's say, as compared to 500 watts or one kilowatt or more for GPUs.
We also covered Deep South, a neuromorphic supercomputer in Australia that was to go
live right around now and seemed to use FPGAs. Several
other projects have also been announced, like IBM's TrueNorth, a startup called Rain, and BrainChip, also in Australia, which announced in February that it is accepting pre-orders for its system called Akida, presented as an edge AI box. Finally, Intel Labs has a project called Loihi. Well, there is news from Intel Labs
and Sandia National Lab that they are using the next rev of that chip, Loihi 2, in a neuromorphic supercomputer called Hala Point. Neuromorphic chips typically perform synaptic operations,
so the metrics are not the same as for GPUs. The Hala Point system is said to be able to perform 20 peta-ops, with an efficiency equivalent to 15 trillion 8-bit operations per second per watt.
This is a very respectable showing in terms of both performance and power efficiency
compared to GPUs and CPUs.
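To put those efficiency claims side by side, here is a rough comparison. The Hala Point figure is the one reported above; the GPU throughput and wattage are illustrative assumptions, not measured specs:

```python
# Rough efficiency comparison in 8-bit operations per second per watt.
HALA_POINT_OPS_PER_W = 15e12  # reported: 15 trillion 8-bit ops/s per watt

# Hypothetical high-end GPU: ~2 peta-ops of int8 throughput at ~700 W
# (assumed figures for illustration only).
gpu_ops_per_w = 2e15 / 700

ratio = HALA_POINT_OPS_PER_W / gpu_ops_per_w
print(f"Hala Point   : {HALA_POINT_OPS_PER_W:.2e} ops/s/W")
print(f"GPU (assumed): {gpu_ops_per_w:.2e} ops/s/W")
print(f"Advantage    : ~{ratio:.1f}x")
```

Even with generous assumptions for the GPU, the neuromorphic design comes out severalfold more efficient on this metric, which is the point of the architecture.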
All right, that's it for this episode.
Thanks so much for being with us.
HPC News Bytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on InsideHPC.com and posted on OrionX.net.
Thank you for listening.