@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240610

Episode Date: June 10, 2024

- Clinical trials for cancer-fighting drug discovered by LLNL and BridgeBio
- Sandia and Submer say immersing the whole rack can get big power savings
- In defense of the CHIPS Act
- New paper "Scalable MatMul-free Language Modeling" promises low-memory, low-power AI

[audio mp3="https://orionx.net/wp-content/uploads/2024/06/HPCNB_20240610.mp3"][/audio]

Transcript
Starting point is 00:00:00 Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing, AI, and other advanced technologies. Welcome to HPC News Bytes. This is Shaheen Khan. Doug is away. We start with a positive story out of Lawrence Livermore National Lab, which announced that human clinical trials have begun for a cancer-fighting drug called BBO-8520, which was discovered using the lab's supercomputers. It's a collaboration with
Starting point is 00:00:33 BridgeBio Oncology Therapeutics. This kind of project shows the power of supercomputing when it's available to the industry. It also points to the benefits of public-private partnership, P3, also referred to as multi-sector partnerships, a model to advance society that has been around for many years and is hard to do but can be very effective. Power and cooling have rapidly emerged as the new barrier for AI deployments. Data centers used to be measured by square feet or square meters, but are now described by megawatts and gigawatts. It's a massive challenge that must be solved one way or another. Sandia National Lab, which has been announcing a series of projects focused on promising next-generation technologies in various areas, from reconfigurable chips to neuromorphic
Starting point is 00:01:23 AI clusters, said it has taken immersion cooling to the next level by immersing the entire system, not just the compute boards, but the whole rack, cables and all, in a new cooling fluid developed by Submer, a company based in Spain. They have seen power consumption savings of up to 70%; a case study is expected to be released this fall.
Starting point is 00:01:46 The Wall Street Journal has begun a series of articles on a topic we've discussed here many times, the CHIPS and Science Act. The Act considers the ability to manufacture high-end semiconductors in the U.S. a matter of national security and an area where local manufacturing had fallen behind globally and was on a trajectory to fall behind even further. So the Act, passed two years ago, aims to remedy the situation by accelerating the domestic development of high-end semiconductor manufacturing in the U.S. through very large subsidies. In the first article published recently, the Journal reported that the program's impact is becoming clearer in the sense that big
Starting point is 00:02:26 U.S. companies that are pursuing advanced chips are getting a boost, but there are limits to what the money can do. It also said the $53 billion legislation is challenged by competing subsidies from other countries, political complexity regarding the allotments at home, and the sheer expense of manufacturing chips. They wrote that the program is forecast to triple the number of chips made in the U.S. by the year 2032, according to a Boston Consulting Group study that was commissioned by the Semiconductor Industry Association. But that would boost the U.S. share of global chip production to only about 14% in 2032, compared with 12% in 2020. The modest overall increase partly reflects the
Starting point is 00:03:08 new reality where governments in European countries and in Asia, such as in South Korea, Japan, Taiwan, and China, are either continuing to or starting to allocate tens of billions to help fund local chip capabilities. It's also partly about the segments of the chip building market. The action is really at the very high end where the CHIPS Act is targeted. And without the program, Boston Consulting estimated the U.S. share would have fallen to 8% in 2032. The CHIPS Act is an industrial policy initiative in a very complex technology and supply chain area. It is addressing the significant challenge that is created when organic market forces do not protect national security.
Starting point is 00:03:51 It's a complicated risk management problem because when global leadership in advanced technologies is lost, it is very expensive and time-consuming to regain it. It requires a lot of money, a lot of patience, and a deep understanding of what it takes to regain it and to sustain it. The transformer technology and large language models created a step function for AI. It turned out predicting the next token is surprisingly good at generating content of various formats, hence the word generative in generative AI. What is the next major advance, you might ask? Well, nothing jumps out besides brute force.
Starting point is 00:04:29 Bigger models with many billions and trillions of parameters push towards better accuracy, and longer context windows push towards total recall. No pun intended about Total Recall the movie, but yes, that's another topic. What may just point to a significant advance is work by researchers at the University of California in Santa Cruz, Soochow University, which is 60 miles west of Shanghai, the University of California in Davis, and LuxiTech, an AI firm in China, which, by the way, was co-founded by two women who lead it as CEO and CTO. They published a paper this week titled, quote, Scalable MatMul-Free Language Modeling, end quote. MatMul refers to matrix multiply computations,
Starting point is 00:05:13 which are everywhere in AI and which GPUs are especially good at. Eliminating them is a big deal because matrix multiplication is computationally intensive, needs a lot of memory, and burns a lot of electricity. They've done it by quantizing neural net weights to just minus one, zero, and one. The idea has been around for a few years, but this paper shows its applicability to pretty large generative AI models. You only need two bits for those three numbers, and you can use ternary arithmetic, one value more than binary, with routines that know in advance no other numbers will show up. Their optimized GPU implementation shows 25.6% faster training using 61% less memory and 4.75 times faster inference using 10 times less memory,
Starting point is 00:06:00 with models scaled up to 13 billion parameters. Not so large, but large enough to be notable. They've also built a custom FPGA that points to power savings. Related to this, I invite you to tune in to the latest episode of the @HPC Podcast, in which we speak with Nestor Maslej of the Stanford University Institute for Human-Centered AI, who is also the editor-in-chief of the annual Stanford AI Index Report. We discussed the latest edition issued in April. All right, that's it for this episode. Thank you all for being with us. HPC News Bytes is a production of OrionX in association with InsideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on
Starting point is 00:06:44 InsideHPC.com and posted on OrionX.net. Thank you for listening.
