In The Arena by TechArena - Iceotope’s Sustainable Cooling Vision for the AI Era
Episode Date: November 3, 2025
With sustainability at the core, Iceotope is pioneering liquid cooling solutions that reduce environmental impact while meeting the demands of AI workloads at scale....
Transcript
Welcome to Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allison Klein.
Now, let's step into the arena.
Welcome in the arena. My name is Allison Klein, and we are coming to you from the AI Infra Conference in Santa Clara.
And we have been having such a fantastic series of Data Insights episodes. It's Data Insights,
So that means Janice Naroski is with me.
Janice, welcome.
Thank you, Allison.
It's great to be back.
So, Janice, we have been talking to practitioners.
We've been talking to folks in the value chain.
Tell me who we're talking about today and who we've got with us.
Yeah, we're hearing a lot about this topic all over AI infrastructure.
And actually, at every conference we go to, the big topic is cooling.
So today we actually have Iceotope joining us: Jonathan Ballon, the CEO of Iceotope.
Welcome.
Thank you.
Thank you for having me.
So, Jonathan, I was so excited when you came up on the list of folks that we were going to
interview at AI Infra. You and I have known each other for a number of years, and I have not talked to
you about your most recent move, which is joining as CEO of Iceotope. Can you just introduce Iceotope
for our listeners? And I want to know why you chose to join as CEO. Sure, absolutely.
Iceotope found me, actually. I left Intel in 2020, and my objective was really to
get involved with as many kind of groundbreaking startups as possible. One of them being
Axelera, which is an edge inference company. It's a chip company out of Europe. And when Iceotope
approached me, I didn't know a lot about what was happening in the data center other than the
huge demand for power. And when I dug into it, I understood that the other major limitation
is the ability to cool all of the heat that that power is generating.
I took a closer look and I realized that there's actually a major barrier that the industry
will face in around the 2027, 2028 timeframe that there really isn't a solution for.
And so I looked at the technology, the products that Iceotope has built over the last
decade that hadn't really come to market yet.
And they're entirely designed and built to solve that problem.
And really, there wasn't any other solution.
So it just seemed like an incredible opportunity.
That's fantastic.
AI is creating new infrastructure challenges all over the industry, right?
But from your perspective, what are the biggest power and thermal challenges that data centers are facing as AI workloads grow?
Well, it's really been accelerating so much over the last few years.
I mean, people talk, of course, about the lead horse in this
race, which is driving all of this, which of course is NVIDIA, and they're setting the standard.
Right now, we've come from a place where racks in the data center were measured in 20, 30 kilowatt
racks, and now people are deploying racks that are 130, 140 kilowatts, and that's kind of
the max of what is being deployed in the industry. But NVIDIA is signaling that in the 2027 time frame,
with the launch of Vera Rubin and Vera Rubin Ultra, we're going to see 600 kilowatt racks.
That's crazy.
We're now looking at one megawatt racks with Google and with NVIDIA, and the challenge with
that is that liquid cooling has entered the cooling infrastructure just incredibly rapidly
over the last year, whereas even when I joined Iceotope back in January, it was a question,
are we going to adopt liquid cooling?
Now it's not even a question.
It is like the de facto, yes, we will do that.
But what people haven't answered is, how do we cool north of 140, north of 200 kilowatts?
And that's where I think some of the things that we're doing beyond the chip come into play.
And that's what's super interesting to me.
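For context on those rack figures, here is a minimal back-of-envelope sketch in Python. It is not something discussed on the show; it only assumes the usual rule of thumb that essentially all of the electrical power a rack draws is dissipated as heat that the cooling system has to reject, and the rack sizes simply restate the ones Jonathan cites.

```python
# Back-of-envelope only: roughly all electrical power drawn by IT gear ends
# up as heat, so cooling must remove about as many kW as the rack consumes.
rack_power_kw = {
    "legacy air-cooled rack": 30,
    "current high-density AI rack": 140,
    "Vera Rubin Ultra era rack (2027 signaling)": 600,
    "megawatt-class rack": 1000,
}

for label, kw in rack_power_kw.items():
    heat_kwh_per_day = kw * 24  # thermal energy to reject each day
    print(f"{label}: ~{kw} kW of heat, ~{heat_kwh_per_day:,} kWh of heat per day")
```

Under that assumption, a 600 kilowatt rack has to shed roughly twenty times the heat of the 30 kilowatt racks that air-cooled facilities were designed around, which is why cooling dominates these conversations.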
AI Infra is so much about not just the hyperscalers, but where enterprises are on their path to AI,
if you want to think about it that way.
And one of the things that I've been hearing a lot is we're going to deploy AI infrastructure
across a continuum. We're going to deploy it with hyperscalers. We're going to keep some of it on-prem.
We're going to run some of it at the edge. How do you see that proliferation shaping your strategy
in terms of knowing that liquid is going to be needed across all these environments?
Yeah, absolutely seeing that trend. We see it every day in our customer base.
People are bringing workloads on site for various reasons. It could be data sovereignty.
It could be that they want to reduce latency.
Sure. They want to bring data closer to the scientists, for example, in genomics research. We see it at the edge for inference. Same reasons. They want to get the compute as close to the data as possible to reduce latency. So we're seeing basically this distributed computing architecture happening in real time. The cloud continues to be important. But this pendulum swing that took place over the last 15 to 20 years where everything got sucked into the cloud is now becoming this truly
distributed computing architecture, and it's fascinating to watch. And the GPU, and the need for it,
whether it's training or inference, is driving these thermal loads. And what's interesting is, let's
take the telco edge. You know, if you're doing inference at the edge for whether it's 5G moving to 6G
or you want to do high bandwidth real-time inference for autonomous driving or you want to do it on a
factory floor, you need GPUs in that environment, and you may not be able to have air and fans
in that environment. So the ability to do that with 100% liquid becomes really important.
I mean, air cooling today, even traditional cooling methods, are having a hard time managing
these high-density infrastructure systems. What limitations do you see here, and why is this
the case? A couple of simple limitations. One is the power.
So one of the key fundamental challenges is that direct-to-chip is the primary solution that people are adopting today.
So you basically put a cold plate on the compute.
You can even put multiple chips under that cold plate.
And that's great.
And the cold plates are very efficient at doing that.
And the cold plates are getting better.
And the material science and all of the techniques for doing that, they can get infinitely better.
But as the compute thermal loads increase, so does the thermal load of
everything around that chip, which is the power supply, the network, the storage, the memory.
And so if you address only the chip and leave all of those other things to air, you're going to reach a limit to the ability to
cool them. You're going to reach a physical limit with the thermodynamics
associated with air cooling those things. And you're going to reach an economic limit where it just
doesn't make economic sense to do that. So the direct-to-chip approach with air is going to reach
that limit probably around 200 kilowatt racks. That's the limit of the rear door heat exchanger.
And so if you just look at direct-to-chip, you're going to hit basically that ceiling. So our approach
is really to take a direct-to-everything approach. We cool everything equally, and that allows
essentially unlimited headroom at the chassis and the rack level. So that's basically the
limitation that people have with air. I would say the other limitation for a lot of our customers,
particularly in enterprise and at the edge, is noise.
And this is something that people don't talk about,
but we have customers that are using unused commercial real estate.
There's a lot of vacant real estate in office, retail, and industrial spaces,
and they want to use it to build data centers, like micro data centers.
And so the ability to do that in a mixed-use facility and operate it silently,
or even to put a micro data center in a genomics lab, and to do it very quickly, not have to
build raised floors, not have to build HVAC systems: you can roll in these systems of racks very fast,
bring them online, and operate them silently. It's just a paradigm shift in terms of how you operate these systems.
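The air-cooling ceiling described above can be illustrated with the standard sensible-heat relation Q = ṁ · cp · ΔT. The sketch below is a textbook-level comparison, not Iceotope data: the 200 kW rack load matches the ceiling Jonathan mentions, but the 15 K coolant temperature rise and the fluid properties are assumed round numbers chosen for illustration.

```python
# Sensible-heat sketch (assumed values, not Iceotope figures):
# Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT)

Q_KW = 200.0       # heat to remove from one rack (the ~200 kW ceiling mentioned)
DELTA_T_K = 15.0   # assumed coolant temperature rise

CP_AIR = 1.005     # kJ/(kg*K), air near room temperature
RHO_AIR = 1.2      # kg/m^3
CP_WATER = 4.18    # kJ/(kg*K)
RHO_WATER = 998.0  # kg/m^3

def mass_flow_kg_s(q_kw: float, cp: float, dt: float) -> float:
    """Mass flow of coolant needed to carry q_kw of heat at a temperature rise dt."""
    return q_kw / (cp * dt)

air_kg_s = mass_flow_kg_s(Q_KW, CP_AIR, DELTA_T_K)
water_kg_s = mass_flow_kg_s(Q_KW, CP_WATER, DELTA_T_K)

air_cfm = air_kg_s / RHO_AIR * 2118.88      # m^3/s -> cubic feet per minute
water_lpm = water_kg_s / RHO_WATER * 60000  # m^3/s -> litres per minute

print(f"Air:   {air_kg_s:.1f} kg/s, roughly {air_cfm:,.0f} CFM through a single rack")
print(f"Water: {water_kg_s:.1f} kg/s, roughly {water_lpm:.0f} litres per minute")
```

Under those assumptions, air needs on the order of 23,000 CFM through one rack while water needs only about 190 litres per minute, which is one way to see the physical and economic wall around 200 kilowatts; liquid coolants in general, dielectrics included, sit far closer to the water end of that range than to air.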
Yeah, anybody who's ever stood next to a server rack understands exactly what you just said.
I love that direct-to-everything. That makes a lot of sense. We talk to a lot of cooling companies
at Tech Arena. Everybody's got their angle, right? What is it that differentiates you? When you look at the challenge, you think about
where we're going with power density. How do you deliver a solution, whether it's the way that
you're delivering cooling to the platform, how you're evolving to address how platforms are changing,
your dielectric, I don't know what it is. How are you actually delivering something that's going
to be differentiated in market, and why will folks want to choose Iceotope? Essentially, the intent is for
Iceotope just to become a standard. USB is a standard. We'd like everyone to be able to adopt it and
incorporate it into their product. It's a technology. It's a method. So we're not trying to compete
necessarily with direct-to-chip or compete with other methods. We cohabitate. In fact, we have
products that we've built that include direct-to-chip methods. So it's really not a substitute or a
choice. You know, there's basically a couple of things that are happening right now. So you've got two-phase
dielectric. Two-phase means that it reaches a certain temperature and it boils and it moves into a
vapor. That vapor then acts as a cooling agent and then returns back into a liquid state. There's two
types of those two-phase fluids. There's ones that have PFAS forever chemicals, which are very
bad. And there's now some emerging two-phase fluids, very early in their life cycle,
that are safer for the environment and for humans. Two-phase is a very efficient way of cooling.
There's also single-phase. This is what we use in our products: fluids with an extremely high boiling point and a very high flash point.
They're non-flammable, very safe.
They're also environmentally safe.
They're human safe.
So you can touch them with your hands.
And then the third category is water.
Water is the most efficient way of cooling thermodynamically.
So different people are now using all of these methods in different ways, and those are
really some of the key considerations. With Iceotope, you don't have to choose. We use all of these
methods interchangeably depending upon what we're trying to solve for. We don't just cool
compute. We also cool power supplies. We cool network devices. We cool storage devices. And so
depending on what we're cooling, we'll adjust the method. So it's basically a toolkit that you're
going to apply based on the customer's requirement. Exactly. That's awesome. It's really great.
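As a purely conceptual illustration of that toolkit idea, here is a hypothetical sketch: the component list, method notes, and mapping below are assumptions made for illustration, not Iceotope's actual products or selection logic.

```python
# Hypothetical "toolkit" sketch: pick a cooling method per component class.
# Illustrative only; does not reflect Iceotope's actual selection criteria.
from dataclasses import dataclass

@dataclass
class CoolingMethod:
    name: str
    notes: str

METHODS = {
    "two_phase": CoolingMethod(
        "two-phase dielectric",
        "boils and recondenses; very efficient, fluid chemistry (PFAS vs. newer fluids) matters"),
    "single_phase": CoolingMethod(
        "single-phase dielectric",
        "high boiling and flash point, non-flammable, safe to handle"),
    "water": CoolingMethod(
        "water loop",
        "thermodynamically most efficient; typically kept away from the electronics themselves"),
}

# Assumed component-to-method mapping, chosen only to illustrate the idea.
PLAN = {
    "compute (GPU/CPU)": "single_phase",
    "power supplies": "single_phase",
    "network devices": "single_phase",
    "storage (JBOD)": "single_phase",
    "facility heat rejection": "water",
}

for component, key in PLAN.items():
    method = METHODS[key]
    print(f"{component:<24} -> {method.name}: {method.notes}")
```

The point of the sketch is simply that the cooling method is a per-component decision driven by what is being cooled, rather than a single choice made once for the whole rack.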
Let's take a look at what this cooling does beyond just efficiency, right?
What impact does cooling have on enabling scalability and reliability for AI workloads,
you know, all the way from edge and into the data center?
Yeah, so as far as scalability and reliability, first of all,
when you think about the idea that you're removing air and the need for air from a density point of view,
when you remove air, you also remove the need for air to circulate.
So we can compress compute very densely.
So in terms of compute per square inch per square meter, everything becomes incredibly dense.
That also means we can move compute much closer together.
So when you're building an AI factory or AI pod, we can reduce latency and bring things much closer together.
When you talk about reliability, we look at utilization.
Because we're cooling everything consistently and at the same temperature,
we have the ability for our customers to operate at 100% utilization.
In fact, we have customers that are overclocking at all times with no risk of frying the compute.
And so when you talk about return on invested capital or return on invested compute,
our customers are getting like incredible return because there's no risk of burnout.
When you talk about reliability, we've found that when you remove the air, you're removing
particulate matter, you're removing vibration, and you're also incorporating this mineral oil
that acts as a preservation agent. So those three things actually decrease component failure
significantly. We did and published a white paper with Meta, where we built with them a JBOD,
a storage device that can be used for inference workloads, where we saw a 30% reduction in
drive failure. We pack these drives super close together, there's no need for air,
and spinning-drive failure was reduced by 30%. There have been other studies by OCP that showed that
equipment that was touching these types of dielectric fluids saw a 90 plus percent reduction
in component failure. So it's a remarkable approach. It makes me wonder why we weren't doing it
all along, but we'll take that for another topic. I want to go somewhere else. Sure. You've really
charted a pretty nice view of innovation, but I'm going to push you a little bit harder on it.
We know that the hyperscalers are pushing at a rate that we've never seen before.
You know, we've seen performance doubling at 1.5x Moore's Law. We've seen industry standards
and technologies across the motherboard trying to keep up. We know that the focus of compute
over the next few years will be growing exponentially. How do you see,
in three years' time, the liquid-cooling market looking?
And can we keep pace with the technologies we have today
with the long-term demands for liquid cooling?
I think there's certainly not a one-size-fits-all,
and I don't think everyone needs the same amount or type of compute.
So what the hyperscalers are doing and what they need
is very different than what an enterprise or a telco customer needs.
Do you think that differentiation is getting more stark over time?
100%.
So, for example, and you saw this with the NVL72 offering from NVIDIA, the number of customers
for that is limited.
You can count them on one hand.
Exactly.
Now, when you look at what's going to be launched in 2027 with Vera Rubin and then Vera
Rubin Ultra, it's going to be even smaller.
And so the distance between those customers and everyone
else is just getting more vast. And that's fine. Yeah. I think it's so interesting. You know,
that's something that we've been tracking: that bifurcation, that growing divide between
AI factories on steroids and "I want to go deploy some AI services on-prem, and I need to start
thinking about liquid for this segment of my data center." How do you think that is actually playing out
within the enterprise space? Is that a different journey, working with a liquid cooling provider,
than "I need to move to all liquid"?
I think enterprises need to think carefully about how they're going to invest in new infrastructure.
What we're seeing is enterprises that are looking for retrofit capability that don't require
forklift upgrades so that they can adopt liquid cooling gradually rather than having to do
forklift changes either in their existing infrastructure or in their new builds.
And so looking for technology that allows you to do that is super important.
Otherwise, they could be making major infrastructure changes that could be outdated in three to five years,
which could be a major risk.
So I think looking for those types of opportunities is super important.
Everyone's always trying to chase what the major hyperscalers are doing or the neoclouds.
I think it's fine to look at them as reference points, but not try to emulate them, because they're going to always be way too scaled and way too advanced.
I think you can find signals, but it's not worth trying to emulate.
Thanks for that insight.
What you said earlier about the deployment of data centers in commercial spaces,
that's so interesting for an enterprise or an edge use case.
I can't see a hyperscaler taking over an office suite to build out something that isn't at the scale of what they're used to.
Maybe at Roe.
No, they won't.
I think it's fascinating.
I think that's something that we're going to have to do some more work to uncover.
You know what?
A retailer would.
Yeah, for sure.
For sure, yeah, absolutely. Why not? And why not find some cheap space to deploy something at the edge, space that, frankly, owners are desperately looking for someone to take on? There are no more workers in those spaces, so why not put data centers in them? So this was a fantastic conversation. I think that Janice has one more question for you.
Yeah, just to kind of wrap it up, and we have the CEO with us, right?
What would you say your vision is for Iceotope in terms of redefining sustainable cooling for the overall AI era?
You hit the right word. It's sustainable. I think, you know, when you look at the amount of resources that are being consumed right now, whether it's power or water, it's unsustainable.
The first question was what drew me to Iceotope. Iceotope uses 96% less water than what is being used in the industry
today. To me, that is hard to ignore. And when you look at the power consumption of the types of
things that we do, it's 80% less power required for cooling. And I'm at the stage of my career where
I think about my kids. I think about the environment and what legacy I want to
leave behind. And those are two things that are important to me.
Jonathan, it's been awesome having you on the show. It's the first time that Iceotope has been here.
So where can folks find out more information about Iceotope and engage with your team?
Of course, it's iceotope.com. It's spelled in an unusual way: ice, as in cold ice, o-tope.
All right. Tech Arena listeners, you heard it here. Thank you so much. And this wraps another
Data Insights episode. Thanks so much, Janice. Thanks, Jonathan, for being here.
Thanks, Allison. Thanks, Janice.
Thank you, Jonathan.
Thanks for joining Tech Arena. Subscribe and engage at our website, techarena.ai. All content is copyright by TechArena.
