@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240318

Episode Date: March 18, 2024

- European AI Act
- 5nm Cerebras Wafer Scale Engine 3
- A faster matrix-multiply algorithm?
- Meta's GenAI Infrastructure

Transcript
[00:00:00] Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing, AI, and other advanced technologies. Hey, everyone. Welcome to HPC News Bytes. I'm Doug Black. Hi, Shaheen. We start with the European Union's passage of the AI Act, which they call the first-ever legal framework for AI, aiming to position Europe as a global leader in trustworthy AI. The AI Act is part of a set of provisions that include the AI innovation package and the coordinated plan on AI.
[00:00:33] It's a significant development in the global adoption of AI and its social and economic impact. Yeah, and it promises to be a model for other countries to adopt, just like they did GDPR, the General Data Protection Regulation that the EU put into effect in 2018. This time, for people, the AI Act aims to protect the safety and rights of individuals and businesses. For AI companies, it goes beyond data usage rights to include a risk factor. For AI practices and AI applications, it defines four levels of risk: minimal, limited, high, and unacceptable.
[00:01:18] And unsurprisingly, the unacceptable variety is banned. As for AI investors, the clarity they now have on usage and adoption should, the thinking goes, help build confidence and pull in more investment. The biggest chip in the world is designed by Cerebras Systems, which uses a whole dinner-plate-sized silicon wafer instead of cutting it up into smaller chips. They naturally call it the Wafer Scale Engine, WSE, and they just unveiled the third generation of that chip. It's built by TSMC on their five-nanometer process and comes in at 46,000 square millimeters, an eight-and-a-half-inch square, packing a whopping 4 trillion transistors, which form 900,000 cores, which together can do 125 petaflops of peak AI performance. That's twice the performance of the previous WSE-2 chip at the same power draw and at the same price, although the prices are not disclosed. The system that uses it is the Cerebras CS-3, and you can cluster multiples of those, of course.
[00:02:20] It's quite a beast. Well, you know, Shaheen, of late I've seen the word whopping show up more frequently in press announcements, but in this case, whopping seems appropriate. The CS-3 supercomputer has a huge memory system of up to 1.2 petabytes, and Cerebras says it can train models 10x larger than GPT-4 and Google Gemini. 24-trillion-parameter models can be stored in a single logical memory space without partitioning or refactoring, which means training a 1-trillion-parameter model on the CS-3 is as straightforward as training a 1-billion-parameter model on GPUs, they said. It seems obvious Cerebras timed this just ahead of NVIDIA's GTC conference this week, and you have to wonder, as large language models get bigger, how rapidly their wafer scale engine technology will be adopted.
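As a rough sanity check on that memory claim (our own back-of-the-envelope arithmetic, not a Cerebras figure), a common rule of thumb for mixed-precision training with an Adam-style optimizer is roughly 16 bytes of state per parameter, which would indeed put a 24-trillion-parameter model within 1.2 petabytes:

```python
# Back-of-the-envelope check: does a 24-trillion-parameter model fit in 1.2 PB?
# The 16 bytes/parameter figure is a common mixed-precision + Adam estimate
# (fp16 weights, fp32 master weights, two optimizer moments, gradients);
# it is our assumption here, not a number from Cerebras.
params = 24e12            # 24 trillion parameters
bytes_per_param = 16      # assumed training state per parameter
needed_pb = params * bytes_per_param / 1e15
print(f"~{needed_pb:.2f} PB of training state vs 1.2 PB available")
# -> ~0.38 PB of training state vs 1.2 PB available
```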
[00:03:10] We go a bit more technical next, talking about matrix multiplication. A lot of numerical science and AI relies on matrix algebra, the arrangement of data into a table that allows abstraction and efficient computational models. Well, we got news this week that matrix multiply has seen its, quote, biggest boost in more than a decade, unquote. This is via papers by scientists at Tsinghua University in China and at UC Berkeley and MIT in the US. Now, as is often the case with news like this, it will take a while for it to get vetted and used and translated into real improvement, but this could be an important advancement. Yes, like you said, it's upstream, it manifests itself for sufficiently large matrices, and it will take a while to work itself into apps, and then only where it can actually help.
[00:04:00] Now, the traditional way to multiply two matrices of size n on each side requires n to the power of three multiplications. There are also a lot of additions, but by comparison, they're not too time-consuming. The research that has gone into improving this over the years, well before there were GPUs or AI, has been looking for ways to reduce that power. What if you could have n to the power of a number less than three? The most famous of these methods is Strassen's algorithm, which uses a divide-and-conquer strategy to bring it down to n to the power of about 2.8, already a very big improvement.
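For the curious, here is a minimal sketch of Strassen's idea: split each matrix into four quadrants and combine them with seven recursive products instead of the eight the naive method needs, which is where the n^log2(7), roughly n^2.81, exponent comes from. The cutoff value and the power-of-two size assumption are ours, chosen for simplicity:

```python
# A minimal sketch of Strassen's divide-and-conquer multiply, assuming
# square n-by-n matrices where n is a power of two (padding handles the
# general case). Seven recursive products replace the naive eight.
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                 # fall back to the naive product for small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)   # the seven Strassen products
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty((n, n), dtype=M1.dtype)          # reassemble the four quadrants
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

# Quick check against NumPy's own multiply:
A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(strassen(A, B), A @ B)
```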
[00:04:37] This new research has it at n to the power of about 2.37. Remember also that a few years ago, Google's DeepMind AI found a better matrix-multiply algorithm for specific matrix sizes. And then there's the common case when the matrix is sparse, where a lot of the numbers in it are zero, and you find ways to avoid multiplying by zero, saving time.
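As a small illustration of that sparse idea (the sizes and density below are our own made-up example), a compressed sparse row (CSR) representation stores only the nonzero entries, so a multiply never touches the zeros at all:

```python
# Illustrative only: multiply a very sparse matrix by a vector without
# ever touching its zero entries. Sizes and density are example values.
import numpy as np
from scipy.sparse import random as sparse_random

n = 10_000
A = sparse_random(n, n, density=0.001, format="csr")  # ~0.1% nonzero, CSR storage
x = np.random.rand(n)

y = A @ x  # visits only the ~100,000 stored nonzeros, not all 100 million entries
```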
[00:05:04] Let's end with a couple of notes. One is Meta publishing a blog about how they built their generative AI infrastructure. A lot of good info, and a lot of focus on open hardware infrastructure, PyTorch, of course, and open software for AI. By the end of 2024, they expect to have 350,000 NVIDIA H100 GPUs within an overall capacity equivalent to 600,000 H100s. Yes, and another note is GTC, which starts today. We'll both be there, and we'll probably record a special edition of the @HPCpodcast for it. So we'll see what NVIDIA will announce and how that changes Meta's infrastructure. All right, that's it for this episode. Thanks so much for being with us.
[00:05:50] HPC News Bytes is a production of OrionX in association with insideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on insideHPC.com and posted on orionx.net. Thank you for listening.
