@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20240610
Episode Date: June 10, 2024. Clinical trials for cancer-fighting drug discovered by LLNL and BridgeBio; Sandia and Submer say immersing the whole rack can get big power savings; In defense of the CHIPS Act; New paper "Scalable MatMul-free Language Modeling" promises low-memory, low-power AI. Audio: https://orionx.net/wp-content/uploads/2024/06/HPCNB_20240610.mp3
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Welcome to HPC News Bytes.
This is Shaheen Khan.
Doug is away.
We start with a positive story out of Lawrence Livermore National Lab, which announced that
human clinical trials have begun for a cancer-fighting drug
called BBO-8520, which was discovered using the lab's supercomputers. It's a collaboration with
Bridge Bio Oncology Therapeutics. This kind of project shows the power of supercomputing when
it's available to the industry. It also points to the benefits of public-private partnerships (P3), also referred to
as multi-sector partnerships, a model for advancing society that has been around for many years
and is hard to do well but can be very effective. Power and cooling have rapidly emerged as the
new barrier for AI deployments. Data centers used to be measured by square feet or square meters, but are now
described by megawatts and gigawatts. It's a massive challenge that must be solved one way
or another. Sandia National Lab, which has been announcing a series of projects focused on
promising next-generation technologies in various areas, from reconfigurable chips to neuromorphic
AI clusters, said it has taken immersion cooling to the next level
by immersing the entire system,
not just the compute boards,
but the whole rack, cables and all,
in a new cooling fluid developed by Submer,
a company based in Spain.
They have seen power consumption savings of up to 70%,
and a case study is expected to be released this fall.
The Wall Street Journal has begun a series of articles on a topic we've discussed here many times, the Chips and Science
Act. The Act considers the ability to manufacture high-end semiconductors in the U.S. a matter of
national security and an area where local manufacturing had fallen behind globally and
was on a trajectory to fall behind
even further. So the act, passed two years ago, aims to remedy the situation by accelerating
the domestic development of high-end semiconductor manufacturing in the U.S. through very large
subsidies. In the first article, published recently, the Journal reported that the program's impact
is becoming clearer: big
U.S. companies that are pursuing advanced chips are getting a boost, but there are limits to what
the money can do. It also said the $53 billion legislation is being challenged by competing programs in other countries,
political complexity regarding the allotments at home, and the sheer expense of
manufacturing chips. The Journal wrote that the program
is forecast to triple the number of chips made in the U.S. by the year 2032, according to a Boston
Consulting Group study that was commissioned by the Semiconductor Industry Association.
But that would boost the U.S. share of global chip production to only about 14% in 2032,
compared with 12% in 2020. The modest overall increase partly reflects the
new reality where governments in European countries and in Asia, such as in South Korea,
Japan, Taiwan, and China, are either continuing to or starting to allocate tens of billions to
help fund local chip capabilities. It's also partly about the segments of the chip
building market. The action is really at the very high end where the CHIPS Act is targeted.
And without the program, Boston Consulting estimated the U.S. share would have fallen to
8% in 2032. The CHIPS Act is an industrial policy initiative in a very complex technology
and supply chain area. It is addressing the
significant challenge that is created when organic market forces do not protect national security.
It's a complicated risk management problem because when global leadership in advanced
technologies is lost, it is very expensive and time-consuming to regain it. It requires a lot
of money, a lot of patience, and a deep understanding of what
it takes to regain it and to sustain it. Transformer technology and large language models
created a step function for AI. It turned out predicting the next token is surprisingly good
at generating content of various formats, hence the word generative in generative AI.
What is the next major advance, you might ask?
Well, nothing jumps out besides brute force.
Bigger models with many billions and trillions of parameters push towards better accuracy,
and longer context windows push towards total recall.
No pun intended about Total Recall the movie, but yes, that's another topic.
What may just point to a significant advance
is work by researchers at the University of California in Santa Cruz, Soochow University,
which is 60 miles west of Shanghai, the University of California in Davis, and LuxiTech, an AI firm in
China, which, by the way, was co-founded by two women who lead it as CEO and CTO. They published a paper this week titled, quote,
Scalable MatMul-Free Language Modeling, end quote. MatMul refers to matrix multiply computations,
which are everywhere in AI and which GPUs are especially good at. Eliminating them is a big
deal because matrix multiplication is computationally intensive, needs a lot of memory, and burns a lot of
electricity. They've done it by quantizing neural net weights to just minus one, zero, and one. The
idea has been around for a few years, but this paper shows its applicability to pretty large
generative AI models. You only need two bits to represent those three values, and you can use ternary
routines, one value more than binary, that know in
advance no other numbers will show up. Their optimized GPU implementation shows 25.6% faster
training using 61% less memory and 4.75 times faster inference using 10 times less memory,
with models scaled up to 13 billion parameters. Not so large, but large enough to be notable.
They've also built a custom FPGA that points to power savings.
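To make the idea concrete, here is a minimal, illustrative Python sketch of ternary weight quantization and a multiplication-free matrix-vector product. The fixed threshold and the function names are assumptions for illustration only; the paper uses a learned, BitNet-style quantization and fused GPU kernels rather than this simple loop.

```python
import numpy as np

def ternarize(weights, threshold=0.05):
    # Quantize real-valued weights to {-1, 0, +1}.
    # The fixed threshold is an assumption for illustration; the paper
    # uses a learned, scale-aware (BitNet-style) quantization instead.
    w = np.zeros(weights.shape, dtype=np.int8)
    w[weights > threshold] = 1
    w[weights < -threshold] = -1
    return w

def ternary_matvec(w_ternary, x):
    # Matrix-vector product with ternary weights and no multiplications:
    # each output element is the sum of inputs whose weight is +1 minus
    # the sum of inputs whose weight is -1; zero weights are skipped.
    out = np.empty(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)
Wq = ternarize(W)
print(ternary_matvec(Wq, x))  # additions and subtractions only
print(Wq @ x)                 # same result via an ordinary matmul
```

Because only the values -1, 0, and +1 can appear, the inner product reduces to selective additions and subtractions, which is where the memory and power savings come from.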
Related to this, I invite you to tune in to the latest episode of the @HPC Podcast,
in which we speak with Nestor Maslej of the Stanford University Institute for Human-Centered AI,
who is also the editor-in-chief of the annual Stanford AI
Index Report. We discussed the latest edition issued in April. All right, that's it for this
episode. Thank you all for being with us. HPC News Bytes is a production of OrionX in
association with InsideHPC. Shaheen Khan and Doug Black host the show. Every episode is featured on
InsideHPC.com and posted on OrionX.net. Thank you for listening.