@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240318
Episode Date: March 18, 2024 - European AI Act - 5nm Cerebras Wafer Scale Engine 3 - A faster matrix-multiply algorithm? - Meta's GenAI Infrastructure
Audio: https://orionx.net/wp-content/uploads/2024/03/HPCNB_20240318.mp3
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hey, everyone.
Welcome to HPC News Bytes.
I'm Doug Black.
Hi, Shaheen.
We start with the European Union's passage of the AI Act, which they call the first ever legal framework for AI, aiming to position
Europe as a global leader for trustworthy AI. The AI Act is part of a set of provisions that
include the AI innovation package and the coordinated plan on AI. It's a significant
development in the global adoption of AI and its social and economic impact.
Yeah, and it promises to be a model for other countries to adopt, just like they did GDPR,
the general data protection regulation that the EU passed in 2018. This time, for people,
the AI Act aims to protect the safety and rights of people and businesses. For AI companies,
it goes beyond data usage rights to include a risk factor.
For AI practices and AI applications, they define four levels of risk: minimal, limited, high,
and unacceptable. Unsurprisingly, the unacceptable variety is banned. And for AI investors, the clarity they now have on usage and adoption will, they think, help build confidence and attract more investment.
The biggest chip in the world is designed by Cerebras Systems,
which uses a whole dinner-plate-sized silicon wafer instead of cutting it up into smaller chips.
They naturally call it the Wafer Scale Engine, WSE, and they just unveiled the third generation of that chip. It's built by TSMC on their five-nanometer process. It comes in at 46,000 square millimeters, an eight-and-a-half-inch square, packing a whopping 4 trillion transistors, which form 900,000 cores that together can do 125 petaflops of peak AI performance. That's twice the performance of the previous WSE-2 chip at the same power draw and at the same price, although prices are not disclosed.
The system that uses it is the Cerebras CS-3, and you can cluster multiples of those, of course.
It's quite a beast. Well, you know, Shaheen, of late, I've seen the word whopping show up more frequently in press announcements.
But in this case, whopping seems appropriate.
The CS-3 supercomputer has a huge memory system of up to 1.2 petabytes.
And Cerebras says it can train models 10x larger than GPT-4 and Google Gemini. 24-trillion-parameter models can be stored in a single logical memory space without partitioning or refactoring, which means training a 1-trillion-parameter model on the CS-3 is as straightforward as training a 1-billion-parameter model on GPUs,
they said. It seems obvious Cerebras timed this just ahead of NVIDIA's GTC conference this week,
and you have to wonder, as large language models get bigger, how rapidly their wafer
scale engine technology will be adopted. We go a bit more technical next, talking about matrix
multiplication. A lot of numerical science and AI relies on matrix algebra, the arrangement of data into a table that allows abstraction
and efficient computational models. Well, we got news this week that matrix multiply has seen its
quote, biggest boost in more than a decade, unquote. This is via papers by scientists at
Tsinghua University in China and at UC Berkeley and MIT in the US. Now, as is often the case with news like this,
it will take a while for it to get vetted, used, and translated into real improvements,
but this could be an important advancement. Yes, like you said, it's upstream:
it manifests itself only for sufficiently large matrices, and it will take a while
to work itself into apps, and then only where it can actually help. Now, the traditional way to multiply two matrices of size n on each side
requires n to the power of three multiplications.
Also a lot of additions, but by comparison, they're not too time-consuming.
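To make that count concrete, here is a minimal sketch of the textbook triple-loop multiply in Python (the function name is illustrative): each of the n-squared output entries takes n multiplications, for n-cubed in total.

import numpy as np

def naive_matmul(A, B):
    # Textbook O(n^3) multiply of two n-by-n matrices.
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):              # n rows of the output
        for j in range(n):          # n columns of the output
            for k in range(n):      # n multiply-adds per entry: n^3 overall
                C[i, j] += A[i, k] * B[k, j]
    return C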
The research that has gone into improving this over the years,
well before there were GPUs or AI, has been looking for ways to reduce that power.
What if you could have n to
the power of a number less than three? The most famous of these methods is Strassen's algorithm,
which uses a divide and conquer strategy to bring it down to n to the power of about 2.8,
already a very big improvement. This new research has it at n to the power of about 2.37.
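As a minimal sketch of Strassen's idea, assuming square matrices whose size is a power of two (padding handles the general case, and the function name is illustrative): split each matrix into four blocks and form seven block products instead of the naive eight. The recurrence T(n) = 7 T(n/2) + O(n^2) then gives n to the power log base 2 of 7, about 2.807.

import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                  # small blocks: the ordinary product is faster
        return A @ B
    h = n // 2                       # split each matrix into four h-by-h blocks
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven recursive block products instead of the naive eight
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Reassemble the four blocks of the result
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])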
Remember also that a few years ago,
Google's DeepMind AI found a better matrix-multiply algorithm
for specific matrix sizes.
And then there's the common case when the matrix is sparse,
where a lot of the numbers in it are zero,
and you find ways to avoid multiplying by zero to save time.
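As a minimal sketch of that idea in Python (CSR, compressed sparse row, is one common layout; the names here are illustrative), only the stored nonzeros are ever multiplied:

import numpy as np

def csr_matvec(values, col_idx, row_ptr, x):
    # values[k] is the k-th stored nonzero; col_idx[k] is its column;
    # row i's nonzeros sit at positions row_ptr[i] .. row_ptr[i+1]-1.
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]   # zeros are never touched
    return y

# The 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 0, 0]] in CSR form:
y = csr_matvec([1, 2, 3, 4], [0, 2, 2, 0], [0, 2, 3, 4], np.array([1.0, 1.0, 1.0]))
# y == [3., 3., 4.]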
Let's end with a couple of notes.
One is Meta publishing a blog
about how they built their generative AI infrastructure. A lot of good info and a lot
of focus on open hardware infrastructure, PyTorch, of course, and open software for AI.
By the end of 2024, they expect to have 350,000 NVIDIA H100 GPUs, within an infrastructure whose total compute is equivalent to nearly
600,000 H100s across their whole capacity. Yes, and another note is GTC, which starts today.
We'll both be there, and we'll probably record a special edition of the @HPC podcast for it.
So we'll see what NVIDIA will announce
and how that changes Meta's infrastructure. All right, that's it for this episode. Thanks so much
for being with us. HPC News Bytes is a production of OrionX in association with insideHPC. Shaheen
Khan and Doug Black host the show. Every episode is featured on insideHPC.com and posted on orionx.net. Thank you for listening.