@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20250929
Episode Date: September 29, 2025
Topics: "World Models" aim for the next big thing in AI; Microsoft's $3.3B datacenter, with a $4B one to follow; OpenAI talks central-park-sized datacenter, times 13; In-chip "microfluidics" cooling; Caltech tames 6,100 neutral atom qubits with 12,000 optical tweezers
Audio: https://orionx.net/wp-content/uploads/2025/09/HPCNB_20250929.mp3
Transcript
Welcome to HPC News Bytes, a weekly show about important news in the world of supercomputing,
AI, and other advanced technologies.
Hi, everyone. Welcome to HPC News Bytes. I'm Doug Black of Inside HPC. And with me, of course,
is Shahin Khan of OrionX.net. Shahin, the Wall Street Journal reporter Christopher Mims
has an interesting article on what could be the key to the next big AI
step, what are called world models. Unlike large language models, which Mims calls book smart,
but constrained to what they've been told from existing content, world models are sort of a
virtual, realistic playground where AIs can behave, act, make mistakes, and learn. To quote from
the article, the key is enabling AI to learn from their environments and faithfully represent
an abstract version of them in their heads the way humans and animals do.
To do it, developers need to train AIs by using simulations of the world.
Think of it like learning to drive by playing Gran Turismo or learning to fly from Microsoft Flight Simulator.
These world models include all the things required to plan, take actions, and make predictions about the future, including physics and time.
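To make the idea a bit more concrete, here is a minimal, purely illustrative sketch: a toy linear "physics" simulator generates rollouts, a model is fit to predict the next state from the current state and an action, and the learned model can then "imagine" ahead without calling the simulator. None of the names, shapes, or numbers come from the article or from any vendor's system.

```python
# Illustrative sketch of the "world model" idea, under made-up assumptions:
# learn a dynamics function f(state, action) -> next_state from simulated rollouts.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

# Stand-in for an HPC physics simulation: a fixed, made-up linear system.
A_TRUE = np.eye(STATE_DIM) * 0.95
B_TRUE = rng.standard_normal((STATE_DIM, ACTION_DIM)) * 0.1

def simulate_step(state, action):
    """One step of the 'real' (simulated) environment."""
    return state @ A_TRUE.T + action @ B_TRUE.T

# 1. Generate rollout data in the simulated environment (the virtual playground).
states = rng.standard_normal((10_000, STATE_DIM))
actions = rng.standard_normal((10_000, ACTION_DIM))
next_states = simulate_step(states, actions)

# 2. Fit a (here, linear) world model: next_state ~ [state, action] @ W.
X = np.hstack([states, actions])
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

# 3. Use the learned model to predict ("imagine") without calling the simulator,
#    which is what planning and making predictions about the future relies on.
def world_model_step(state, action):
    return np.concatenate([state, action]) @ W

s, a = rng.standard_normal(STATE_DIM), rng.standard_normal(ACTION_DIM)
print("simulator  :", simulate_step(s, a))
print("world model:", world_model_step(s, a))
```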
We've all heard of using synthetic data from which AIs can be trained, but world models sound like a more advanced, far-reaching AI learning
environment. Well, the big message, and I hope this is a justified confirmation bias, is the role that
traditional HPC has played and will continue to play in enabling AI and then using AI. The keyword there
is physics. Actual data from the real world obviously reflects the physics of nature. But if that data
is not available or is not sufficient, then you need HPC to simulate reality. And then wrangling that
data, as they call it, and organizing and formatting it also requires HPC. So far, we've had
deep learning, which was a massive breakthrough and ended the multi-decade AI winter. And it took a few
years until generative AI was discovered, which was another big step function. Both of them
exceeded expectations, and surprisingly so. Agentic AI was a natural next step and has led to
agent swarms collaborating on a task, and they're still figuring themselves out.
World models seem like another natural next step.
We probably need three or four other major advances like generative AI
to get to the kind of AI that is envisioned here,
and HPC will continue to be the enabler.
But the market impact grows with every small and large advance
since AI doesn't need to be AGI or superintelligence, SI,
to be useful for a growing number of tasks with social and economic impact.
The tens and hundreds of billions of bucks to be invested in AI data centers that we've been hearing about since the start of this year are taking shape, and what we're seeing is impressive.
Microsoft recently released information about a gigantic data center near completion in Wisconsin, which the company claims is the most powerful in the world.
This is a $3.3 billion facility, and the company is planning to spend an additional $4 billion over the next three years for a second adjoining data center of similar
scale. Shahin, we often hear about the vast water demand of large data centers, but Microsoft
said more than 90 percent of it will rely on a closed loop liquid cooling system that will
recirculate continuously, requiring roughly the amount of water that a typical restaurant uses
annually or what an 18-hole golf course consumes weekly in peak summer. They said the data center
will house hundreds of thousands of Nvidia GPUs operating in clusters connected by enough fiber
to wrap around the Earth four times over, and that the processors will handle training for frontier
AI models, delivering 10 times the performance of today's fastest supercomputers.
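As a quick back-of-envelope check on the scale of those claims, here is some simple arithmetic using round public figures: Earth's equatorial circumference of about 40,075 km, and on the order of 1 to 2 exaflops for today's fastest supercomputer on the HPL benchmark. These are approximations for perspective, not Microsoft's numbers.

```python
# Rough, approximate arithmetic for perspective; not Microsoft's numbers.
EARTH_CIRCUMFERENCE_KM = 40_075          # equatorial circumference, approximate
fiber_km = 4 * EARTH_CIRCUMFERENCE_KM    # "enough fiber to wrap the Earth four times"
print(f"Fiber: ~{fiber_km:,} km")        # ~160,300 km

TOP_HPL_EXAFLOPS = 1.7                   # order of today's fastest system on HPL, approximate
print(f"'10x the fastest supercomputer': on the order of {10 * TOP_HPL_EXAFLOPS:.0f} exaflops")
# Caveat: AI training performance is usually quoted at lower precision than HPL,
# so such comparisons are not apples to apples.
```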
We should expect to see this as long as the only way they can make progress is by having
more and more GPUs. In fact, last week, OpenAI threw around the T-word, as in trillions of
dollars for data centers, and released information about what it described as a Central Park-sized
site with eight data centers on it. New York's Central Park, which is what I assume they're referring
to, is about 843 acres. So that sort of size by itself leads to new considerations, including
critical infrastructure security. And they said they need 13 times that. So let's see how much of it
actually materializes since having the funds is just one requirement. But the plans seem real
and are impacting investment decisions. Everyone is also pursuing algorithmic advances
and new GPU architectures.
That happened most visibly with DeepSeek,
which showed that algorithmic improvements
can reduce the number and power of needed GPUs.
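Before moving on, to put the Central Park comparison in more familiar units, here is the simple conversion; 843 acres is the commonly cited figure for the park, and the rest is arithmetic.

```python
# Simple unit conversion; 843 acres is the commonly cited size of Central Park.
ACRES_PER_KM2 = 247.105
central_park_km2 = 843 / ACRES_PER_KM2
print(f"One Central Park-sized site: ~{central_park_km2:.1f} km^2")   # ~3.4 km^2
print(f"Thirteen of them: ~{13 * central_park_km2:.0f} km^2")         # ~44 km^2
```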
Back to Microsoft,
recirculating water, or using hydrogen power so that water is a byproduct that can be fed into the cooling unit,
as was done by another vendor,
are examples of optimizing the whole thing,
not just at the rack level,
but at the level of the whole building.
Now, recirculating water means you have to cool
it versus discharge it, if you are allowed to, or move the warmed-up water to the city's public
pool, as has been done in some locations. Meanwhile, China reportedly banned its major tech
companies from purchasing Nvidia GPUs. That means they need more of their homegrown chips to
deliver the same power, and that means physically larger systems that are harder to make work.
Is that strategy because the chips they can buy are about the same as what they can build,
is it because they think they can do it without faster GPUs,
or, less likely, is it because they don't see AGI as a necessary objective right now?
It all remains to be seen.
Speaking of Microsoft, they posted an interesting blog on an in-chip liquid cooling technique
that they say they're making significant advances on.
It's called microfluidics, and they report it improves silicon cooling by 3x.
Instead of conventional cold plate cooling, microfluidics uses tiny
channels etched into the back of the silicon chip, creating grooves that allow cooling
liquid to flow directly onto the chip and more efficiently remove heat, according to the company.
Microfluidics is a clever concept, but as with many other things, it is not a new concept,
and getting it to work has been a challenge. It requires the channels to be deep enough
to get fluid in and heat out, but not so deep that they weaken the silicon structure.
They also need to allow the chip density to deliver speed, and they can't be allowed to get clogged.
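As a rough way to see why that geometry tradeoff matters, here is a purely illustrative estimate: etching channels into the die multiplies the surface area the coolant touches, and convective heat removal scales with that area (Newton's law of cooling, q = h * A * dT). The channel dimensions, heat-transfer coefficient, and temperature difference below are hypothetical placeholders, not figures from Microsoft or Corintis.

```python
# Purely illustrative; all dimensions and coefficients are hypothetical placeholders.
def wetted_area_multiplier(width_um: float, depth_um: float, pitch_um: float) -> float:
    # Per channel pitch, coolant touches the channel floor (width) plus two
    # sidewalls (2 * depth); assumes coolant only contacts the inside of each groove.
    return (width_um + 2 * depth_um) / pitch_um

def heat_removed_watts(h: float, area_m2: float, delta_t_k: float) -> float:
    # Newton's law of cooling: q = h * A * dT.
    return h * area_m2 * delta_t_k

FLAT_DIE_AREA_M2 = 0.02 * 0.02   # a 20 mm x 20 mm die, for example
H_COEFF = 20_000                 # assumed convective coefficient, W/(m^2*K)
DELTA_T = 30                     # assumed silicon-to-coolant temperature difference, K

for depth_um in (50, 100, 200):  # deeper grooves add area but weaken the silicon
    mult = wetted_area_multiplier(width_um=50, depth_um=depth_um, pitch_um=100)
    q = heat_removed_watts(H_COEFF, FLAT_DIE_AREA_M2 * mult, DELTA_T)
    print(f"depth {depth_um:>3} um: wetted area x{mult:.1f}, ~{q:.0f} W removable")
```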
Capillary flow is common in biological organisms, of course, so maybe the whole thing should
look like the veins in a leaf or a butterfly's wing. Well, that's exactly what they have done.
Microsoft collaborated with Swiss startup Corintis to optimize a bio-inspired design to cool more
efficiently than straight up-and-down channels in a grid. Taking inspiration from nature is a time-honored
engineering method, but if you look at an airplane wing, which does not flap the way a bird's wing does,
you can see how a design may start by mimicking nature but change before it becomes practical.
There's also news from the quantum world of a new qubit record. Physicists at Caltech
say they have assembled an array of 6,100 neutral-atom qubits trapped in a grid by lasers.
Previous arrays of this kind have contained hundreds of qubits. The team used what they
call, quote, optical tweezers, unquote, highly focused laser beams, to trap thousands of
individual cesium atoms in a grid. To build the array of atoms, the researchers split a laser beam
into 12,000 tweezers, which together held the 6,100 atoms in a vacuum chamber.
Shahin, I know you and other analysts are wary of hype from the quantum world. What do you think
of this? Well, this is a bit different in that the announcement is from a university research lab
versus from a vendor. In fact, I wish more advances were announced by universities and did not come
across as something that is ready for deployment when it is not. The Caltech scientists said their
work showed that the larger qubit scale did not come at the expense of quality. With more than
6,000 qubits in an array, the qubits were in superposition for about 13 seconds, nearly 10 times
longer than previous similar arrays, while providing the ability to manipulate individual
qubits with about 99.99% accuracy, almost four nines. As we've said here, the quantum industry is
making many big advances and many more are needed. The main issues remain: fidelity, connectivity,
speed, and scale, and then programmability, integration, deployability, et cetera.
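As a generic illustration of why fidelity at that level matters as systems scale (this is not a model of the Caltech experiment), if each operation succeeds with probability p, a sequence of n operations succeeds with probability roughly p to the power n, which collapses quickly unless p is very close to 1.

```python
# Generic illustration of compounding per-operation fidelity; not a model
# of the Caltech experiment.
for fidelity in (0.999, 0.9998, 0.9999):
    for n_ops in (1_000, 10_000, 100_000):
        p_no_error = fidelity ** n_ops
        print(f"per-op fidelity {fidelity}: after {n_ops:>7,} ops, "
              f"~{p_no_error:.2%} chance of zero errors")
```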
All right, that's it for this episode. Thank you all for being with us.
HPC News Bytes is a production of OrionX in association with InsideHPC.
Shahin Khan and Doug Black host the show.
Every episode is featured on insidehpc.com and posted on OrionX.net.
Thank you for listening.