Catalyst with Shayle Kann - The mechanics of data center flexibility

Episode Date: August 28, 2025

Adding flexibility to data center loads could ease strain on the grid and reduce the need for costly new generation. And, according to one study, shaving off just a few megawatts during peak hours could also unlock unused capacity (as many as 98 gigawatts in the U.S.) if those facilities reduced load by just 0.5% each year.

The problem: data centers promise near-perfect reliability, often “five nines” (99.999% uptime) in service-level agreements with customers. That leaves little room to adjust something as critical to reliability as power.

But times are changing. The data center market is reckoning with the constraints of the power grid and growing concern about pushing up electricity prices to pay for new generation. In July, the Electric Power Research Institute’s DCFlex demonstration at an Oracle data center in Phoenix, Arizona, reduced load 25% during peak demand. And this month Google expanded its demand response through two new agreements with Michigan Power and the Tennessee Valley Authority.

So what are the actual mechanics of data center flexibility? In this episode, Shayle talks to Varun Sivaram, founder and CEO of Emerald AI. The startup’s data center flexibility platform powered EPRI’s DCFlex demonstration.
Shayle and Varun cover topics like:
- What people often misunderstand about how much of their nameplate capacity data centers actually use
- The distinct load profiles of training, inference, and other workloads
- How data centers can pause, slow, or shift workloads in time or space to reduce demand
- What it will take for flexibility solutions like Emerald AI to earn operator trust
- How much flexibility data centers can realistically achieve
- Varun’s long-term vision for evolving from occasional demand response to weekly or even daily load shifting

Resources:
- Latitude Media: Nvidia and Oracle tapped this startup to flex a Phoenix data center
- Latitude Media: Google expands demand response to target machine learning workloads
- Catalyst: The potential for flexible data centers

Credits: Hosted by Shayle Kann. Produced and edited by Daniel Woldorff. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor.

Catalyst is brought to you by Anza, a solar and energy storage development and procurement platform helping clients make optimal decisions, saving significant time, money, and reducing risk. Subscribers instantly access pricing, product, and supplier data. Learn more at go.anzarenewables.com/latitude.

Catalyst is supported by EnergyHub. EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform by visiting energyhub.com.

Catalyst is brought to you by Antenna Group, the public relations and strategic marketing agency of choice for climate and energy leaders. If you're a startup, investor, or global corporation that's looking to tell your climate story, demonstrate your impact, or accelerate your growth, Antenna Group's team of industry insiders is ready to help. Learn more at antennagroup.com.

Transcript
Starting point is 00:00:02 Latitude Media, covering the new frontiers of the energy transition. I'm Shayle Kann, and this is Catalyst. You might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. You might also go all the way down to the underlying silicon, and you might change what we call the clock frequency of the chip to change the rate at which computations happen.
Starting point is 00:00:28 Coming up, what does it actually look like to make a data center flexible? When utilities need flexible capacity they can count on, they turn to EnergyHub. EnergyHub works with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts of flexibility, built for the moments when utilities can't afford uncertainty. EnergyHub builds and operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs, batteries, thermostats, and more through a single platform built for utility scale: predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com.
Starting point is 00:01:15 Trillions of dollars are flowing into clean and critical infrastructure, but those investments aren't driven by technology alone. They're shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux, and host of a brand new podcast, Critical Capital. Each episode, I talk with people deploying capital, shaping policy and building the clean economy.
Starting point is 00:01:37 Tune in as we unpack how progress is actually made. Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts. Catalyst is supported by FischTank PR, an award-winning PR firm focused on climate and energy tech, renewables, and sustainability. FischTank is known for generating prominent and effective media coverage for the brands they work with. If you want a PR partner that's thoughtful, shoots straight, and gets results, you'll like FischTank PR. To learn more about FischTank's approach, visit fischtankpr.com. That's F-I-S-C-H-tankpr.com.
Starting point is 00:02:15 I'm Shayle Kann. I invest in early-stage companies at Energy Impact Partners. Welcome. So the conventional wisdom about data centers is that from an electricity perspective, they look like totally flat load, i.e. operating 24-7, 365, and without much willingness to change that. But as power increasingly becomes the choke point for more data infrastructure development, the world is waking up to a bunch of ways in which that's not entirely or necessarily true. First, you can put generation or batteries on site to shave peak load.
Starting point is 00:02:50 That's the physical solution. But there are also digital solutions, it appears. First, because data centers aren't actually operating at nameplate peak most of the time anyway, but also, second, because you might actually be able to make the workloads themselves a little bit flexible. Google actually made a big announcement about doing this at their data centers just a few weeks ago. They've announced that they've partnered with two utilities, Michigan Power and TVA, to introduce demand response via workload flexibility in their data centers. But our guest today is my old friend, Varun Sivaram, who's also working on this problem. His company, Emerald AI, is building a software platform that is intended to make data centers flexible.
Starting point is 00:03:31 As with many things in electricity, the devil is in the details. And in this case, the details involve what do we mean by flexibility? How do we actually get it? What are the SLAs between the data center operators and their customers? How are the grid operators going to think about it? There are a lot of nuances to this. So let's get into it. Here's Varun.
Starting point is 00:03:50 Varun, welcome back. Shayle, thanks for having me back. All right, new topic for us to talk about here, which is what you are spending your time on these days, data center flexibility. I want to start by having you kind of walk me through what you understand to be the way that compute translates to electricity load in AI data centers today. I think this is something that is actually commonly misunderstood.
Starting point is 00:04:13 So what does the electricity load profile look like of an actual AI data center today? Yeah, great question. First of all, from a planning perspective, the grid has absolutely no idea what your load profile is going to look like, and that's the way that they study you as a new AI data center load. But let's just back up here. AI data centers, or as NVIDIA CEO Jensen Huang nowadays calls them, AI factories, fundamentally are in the business of transforming electricity into what we call tokens, which are the fundamental input or output unit from AI.
Starting point is 00:04:50 And they're doing it increasingly well. So a data center will try very efficiently to take electricity and turn it into compute outputs, and you'll have losses along the way. You'll have losses because of the load of cooling, for example, all the other non-computational loads in a data center. Historically, a data center might lose 33% of the power, or use 33% of that power for non-IT or information technology uses, and the remaining 66% or 67% goes into actual computations.
Starting point is 00:05:23 Nowadays, with the increasingly customized design of these AI factories and some of the amazing efforts of the hypers, such as Google, these numbers are falling and therefore you can get 80 or 90% of the power being turned directly into AI computations. What does that look like to the grid? Well, if you're running a large language model training run, you might see the power use of that AI data center spike as the training run commences, have brief dips as the AI training run undergoes what's called synchronized checkpointing.
Starting point is 00:05:58 So there's this kind of very difficult to predict transient behavior that's wildly swinging. And then after the training run concludes hours or days later, you might have a large reduction in demand. If you have an AI data center that's fully committed to doing what's called inference or using these AI models, you might see more smooth, but still relatively unpredictable usage patterns from the grid's perspective. So that's one of the reasons that AI data centers appear so scary to grids today. You can't really plan for what you expect to see. And these loads look fundamentally different from anything they've ever seen. They're extraordinarily energy dense. Yeah. And, you know, it's not dissimilar from kind of everything else in electricity, which is the result is you
Starting point is 00:06:44 have to plan for the peak, right? So the data center says, I need, let's invent a number, 400 megawatts of capacity. I think from a grid operator perspective, you basically have to plan for 8760 hours of 400 megawatts. That is essentially what you were planning for, right? You're actually planning for even worse than that, Shayle. You're right. Over 8760 hours, which is one single year, you want to predict or plan for a worst-case scenario
Starting point is 00:07:13 where the data center, let's say, as you suggested, it's a 400-megawatt data center, that 400 megawatts shows up at the absolute worst time of the year. But you're actually planning for even more years than that when you're running this interconnection study to determine can this data center connect to my system, you're saying in the next seven or ten years, in an absolute worst case scenario, so not just 8760, but 8760 times 10, 87,600 hours, when a transmission line
Starting point is 00:07:40 goes down somewhere and it's a record hot day and air conditioning demand is super high, on that particular day, will my 400 megawatt data center request its full 400 megawatts and overload a circuit? And if so, can't connect it today, have to upgrade the system before we do that. So that's how data centers are studied today. Okay, but that is a different question. That's sort of, you said it right. That is how data centers are studied today. There is a separate question of how are they operated, generally speaking, which does not
Starting point is 00:08:10 align perfectly with how they are studied. In other words, it is not always true that they are operating at full 400 megawatt capacity if it's a 400 megawatt rated data center. So what do we know about the actual operational profile from an electricity perspective, assuming you're doing nothing clever like the things we're about to start talking about? And let me say, Shayle, before you do anything clever, I actually don't think it's irresponsible or analytically incorrect for the grid to study these data centers in that extremely risk-averse way that I just described. Because you're right, Shayle, data centers do sometimes take years to ramp up their capacity.
Starting point is 00:08:44 They'll proceed in phases as you build out the buildings, fill the data halls with the equipment, and begin to actually run the workloads that you'd like to run. And there may also be quite a bit of buffer that you'll leave on top. Even if you're running an intensive training run, you may only be utilizing this data hall 75%, let's say. And so it may very well be the case that that 400 megawatt data center in the foreseeable future does not hit 400 megawatts. And yet, I don't think it's incorrect for system operators to plan for a hyperscaler who comes to town and says, I want a 400 megawatt data center to actually use that full entitlement once it's granted. And there are certainly
Starting point is 00:09:23 examples of data centers running absolutely full tilt, large data centers running full tilt, to the point where unless, Shayle, as you mentioned, you do one of these clever things to intelligently control the consumption when the grid needs you to, the grid is, you know, absolutely correct and justified to assume that you may use your full entitlement at the absolute worst time. Yeah, I mean, my understanding of kind of the basic state of affairs is, right, so the grid says, okay, I'm going to plan for worst case scenario as I need to do to deliver reliable service. And so I'm going to assume you need 400 megawatts all the time for 10 years. Meanwhile, the data center actually operates differently from that.
Starting point is 00:10:02 And data center load profiles, AI load profiles, as I understand it, I mean, particularly for training, but in inference as well, at least in the current iteration of inference, they're surprisingly spiky. So loads can go up and down quite a lot. So maybe you're pulling 400 megawatts some of the time. Maybe you're pulling 200 megawatts some of the time. It's kind of a weird load profile. But to the grid operator, it's unpredictable, which is, I guess, the key point here, which is if you don't know when that load is going to spike or not spike, then again, all you can do is operate as if it is 8760, 400 megawatts of load.
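The gap between how the grid studies a data center and how it may actually run can be put into rough numbers. The sketch below is illustrative only: it uses the 400 megawatt figure invented in the conversation, the ten-year study horizon, and the "75%, let's say" utilization mentioned earlier, and it assumes (as a simplification not made in the episode) that the 75% figure applies as a flat average across the whole site.

```python
# Rough arithmetic for the planning-versus-operation mismatch.
# All inputs are illustrative figures from the conversation, not measured data.

HOURS_PER_YEAR = 8760


def planning_hours(years: int) -> int:
    """Hours covered by a worst-case interconnection study spanning `years`."""
    return HOURS_PER_YEAR * years


def annual_energy_mwh(nameplate_mw: float, utilization: float) -> float:
    """Annual energy if the site ran at a flat fraction of nameplate (assumption)."""
    return nameplate_mw * utilization * HOURS_PER_YEAR


nameplate_mw = 400        # the number invented in the conversation
assumed_utilization = 0.75  # "75%, let's say"; applied site-wide as an assumption

study_hours = planning_hours(10)                        # 8760 * 10
planned_mwh = annual_energy_mwh(nameplate_mw, 1.0)      # what the grid plans for
actual_mwh = annual_energy_mwh(nameplate_mw, assumed_utilization)
headroom_mw = nameplate_mw * (1 - assumed_utilization)  # unused capacity on average

print(f"Study horizon: {study_hours:,} hours")
print(f"Planned: {planned_mwh:,.0f} MWh/yr, assumed actual: {actual_mwh:,.0f} MWh/yr")
print(f"Average headroom: {headroom_mw:.0f} MW")
```

The average headroom says nothing about when the load peaks, which is the point made in the conversation: without predictability, the grid operator still has to plan for the full 400 megawatts in every hour.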
Starting point is 00:10:37 And so that's what people are starting to wake up to, is like, wait a second. Like, there is this mismatch here. Clearly there is headroom because the data center does not need to operate all the time at full capacity, but taking advantage of that requires doing some things differently because otherwise the grid operators can't do anything different. Their hands are tied, basically. Yeah, precisely. I think that's really well said.
Starting point is 00:11:00 And if I can just take one more moment to set the table here, Shayle, earlier you said, hey, look, this isn't dissimilar to what we see from other loads. And I think, you know, I don't probably disagree with you fundamentally, but I do think there are some very peculiar things about AI that are truly dissimilar. One is the extraordinary rate of growth. The power demand from data centers has more than doubled every year the last several years,
Starting point is 00:11:27 and that trend shows no sign of abating. A lot of people talk about data center efficiency and the increasing efficiency of the new generations of GPUs, these graphics processing units, Nvidia's Blackwell is much more efficient than Hopper, which is much more efficient than the previous generation, A100s, etc. But that efficiency gain is currently being eaten up by the tremendous growth in computing demand. So even as power demand is more than doubling every year, the reason it's more than doubling is because compute demand is more than quadrupling every year, a 4x increase every year. And the second thing that's truly dissimilar is what I mentioned earlier, the power density.
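The relationship between those two growth rates can be made explicit. If compute demand grows 4x a year while power demand grows 2x a year, the implied gain in compute per unit of power is the ratio of the two. The sketch below treats both figures as exact annual multipliers, which is a simplification: the conversation gives them as "more than" doubling and quadrupling.

```python
# Implied efficiency gain when compute and power grow at different rates.
# The 4x and 2x multipliers are the round figures quoted in the conversation.

def implied_efficiency_gain(compute_growth: float, power_growth: float) -> float:
    """Annual multiplier on compute delivered per unit of power."""
    return compute_growth / power_growth


def project(years: int, compute_growth: float = 4.0, power_growth: float = 2.0):
    """Relative compute, power, and compute-per-watt after each year (year 0 = 1.0)."""
    gain = implied_efficiency_gain(compute_growth, power_growth)
    return [
        (year, compute_growth ** year, power_growth ** year, gain ** year)
        for year in range(years + 1)
    ]


for year, compute, power, efficiency in project(3):
    print(f"year {year}: compute x{compute:g}, power x{power:g}, compute/watt x{efficiency:g}")
```

Under these assumptions, efficiency doubles every year and power demand still doubles, because compute demand outruns the efficiency gain.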
Starting point is 00:12:02 AI's power density is increasing by orders of magnitude, which I don't think any other electricity application has seen in this short span of time, where we went from 5 kilowatt racks. The rack is a set of servers stacked in a single cabinet. That rack might have used five kilowatts just a few years ago. Today, I just was in a data center in Silicon Valley, seeing a brand new deployment of NVIDIA GB200s, the Blackwell generation. The rack is 132 kilowatts. It's liquid cooled, and we're headed toward one-megawatt racks. So think of that. That's two orders of magnitude increase in density. These massive data centers occupy a tiny footprint and look like small cities. So both of these trends, the exponential increase in power demand and the shrinking
Starting point is 00:12:46 footprint of massive power demand are stressing grids out in ways we haven't seen before. Okay, so last question on the current state of affairs before we talk about the clever stuff. I mentioned this, but I'm curious whether you have visibility into actually what it looks like, which is, is there a meaningful distinction in terms of the current operating profile of AI data centers for a training data center versus an inference data center. Do they look different from a load profile perspective? Oh, absolutely. These loads do look different, right? Training loads have a very characteristic profile, and inference workloads have a different characteristic profile. And we talked a little bit about this earlier. A training run looks like, you know, you ramp it up. It can ramp up by tens or hundreds
Starting point is 00:13:28 of megawatts. It will kind of randomly, you'll have dips in the power as checkpoints happen. It'll ramp all the way back down when the synchronized GPUs stop with the end of the training run. Inference, depending on the set of use cases and the diversity of the applications, can look much more smoothed out. It might, in some cases, look more like what you've seen in traditional cloud computing. Like, you've seen, for example, a Meta data center might have a load profile that looks like people open their phones in the morning and go to Instagram. And so you see a spike. Similarly, today, people open their phones and go to ChatGPT. And so that's a more familiar load profile. But nevertheless, you can certainly impute a different kind of workload type from the power signature today.
Starting point is 00:14:09 It's one of the things, by the way, that we at Emerald AI have been training an AI model to do. However, an important distinction here is that a data center will not do a single thing for its lifetime, right? A massive data center, for example, may initially be configured and specified to train a large language model, and then you'll finish training the large language model, and then you'll do other things with those GPUs. Those same Nvidia GPUs can then be used for smaller research training workloads. They can be used for inference and fine-tuning large models for specific applications. A single data center may be used for one model, and then it's separated out into multiple different types of workloads.
Starting point is 00:14:45 So I wouldn't count on any given data center having the same load profile for its lifetime or even more than a year. Which presumably complicates things even a little bit further from the electricity perspective. All right, so let's talk about the clever stuff then, or at least start to talk about the clever stuff. So the key concept here is, can we make data centers look to the grid like flexible assets, which means introducing some measure of predictability and planning into when the load from the data center is below peak, basically.
Starting point is 00:15:18 And there are various ways you could do that from, like, you know, basic demand response that says we will tone down demand, you know, a few hours a year just at peak to, like, daily flexibility where you're shifting intraday all the time. So there's lots of different versions of it, but from like a simple mechanical perspective, just to start, say you want to introduce some measure of load flexibility into an AI data center, what are you actually doing? So you can achieve flexibility through multiple routes. You can, of course, achieve flexibility through what I'll call the physical infrastructure route. If you have a lot of backup generation, you might fire up the backup generation. Often you're not allowed to because your diesel generator will
Starting point is 00:15:57 violate its air permit if you use it regularly. And so what we at Emerald AI, the company I founded to solve this problem of data center flexibility, what we do at Emerald AI is computational workload orchestration. We want to attack the beating heart of AI's energy demand, which as I mentioned increasingly is just the computers as AI factories become much more efficient and honed at converting electricity into tokens. And to do that, to achieve that on-demand flexibility, you take advantage of some of the inherent or latent flexibility that the different AI workloads have. You might, for example, orchestrate a workload that is flexible in time, one that can be slowed down or paused for a certain amount of time, something, for example, that looks like a fine-tuning operation that doesn't
Starting point is 00:16:49 need to terminate immediately on time. If what you're doing is taking a large language model and tuning it to a particular enterprise application, that enterprise might not mind if that model is paused for a minute or an hour. And in other cases, you may be taking a model or an AI use case that has flexibility spatially. You might move it from one location to another to save power in one particular data center location while keeping that application running as you move it to a different location. So there are a lot of different ways within this broad frame of achieving spatiotemporal flexibility. And what Emerald AI takes advantage of is there is inherent workload flexibility in the use cases of AI today.
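To make the idea of flexibility in time and space concrete, here is a minimal sketch of how a scheduler might pick actions to hit a requested power reduction. It is a toy under stated assumptions, not a description of Emerald AI's platform: the `Workload` fields, the preference order (move, then slow, then pause), the workload names and megawatt figures, and the assumption that slowing a job sheds power in proportion to the slowdown are all hypothetical.

```python
# Toy greedy planner for spatiotemporal workload flexibility.
# Every name, number, and rule here is hypothetical and for illustration only.
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_mw: float
    max_slowdown: float = 0.0  # fraction of power that may be shed by slowing
    pausable: bool = False     # flexible in time: can stop and resume later
    movable: bool = False      # flexible in space: can run at another site


# Assumed preference: least disruptive action first.
ACTION_ORDER = {"move": 0, "slow": 1, "pause": 2}


def best_action(w: Workload):
    """Return (action, MW shed at this site), or None if the job is inflexible."""
    if w.movable:
        return ("move", w.power_mw)
    if w.max_slowdown > 0:
        return ("slow", w.power_mw * w.max_slowdown)
    if w.pausable:
        return ("pause", w.power_mw)
    return None


def plan_reduction(workloads, target_mw):
    """Pick actions in preference order until the target reduction is covered."""
    options = []
    for w in workloads:
        action = best_action(w)
        if action is not None:
            options.append((w.name, action[0], action[1]))
    options.sort(key=lambda option: ACTION_ORDER[option[1]])

    plan, shed_mw = [], 0.0
    for option in options:
        if shed_mw >= target_mw:
            break
        plan.append(option)
        shed_mw += option[2]
    return plan, shed_mw


workloads = [
    Workload("llm-training", 40.0, max_slowdown=0.25),
    Workload("fine-tuning", 10.0, pausable=True),
    Workload("batch-inference", 8.0, movable=True),
    Workload("chat-inference", 30.0),  # latency-sensitive, no flexibility
]
plan, shed_mw = plan_reduction(workloads, target_mw=15.0)
print(plan, shed_mw)
```

This greedy version can overshoot the target and ignores how long each action lasts. A real system would need finer-grained control and per-customer constraints.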
Starting point is 00:19:04 Maybe let's walk through that in a little bit more detail. So let's focus on the temporal
Starting point is 00:19:47 component, right? Spatial component, if you have multiple data centers, you shift load from one place to another. Google's actually been talking about doing that for years for the purpose of lower carbon, right? Like they've been saying one of our ways we're going to reduce the carbon intensity of our computation is by shifting location to location. That feels to me like it is more readily available to the hypers who have lots and lots of data centers probably within one region than it is to others. The temporal one is in theory available to anybody. So what does it look like? So you have some workload that the data center is supposed to undertake. Is it as simple as saying we delay this workload by a few hours?
Starting point is 00:20:26 Or presumably there's more to it than that? It absolutely can be as simple. And let me first give credit where credit's due. You mentioned Google. Google also, by the way, has exploited its temporal flexibility. There was a paper or a post they put out a couple years ago. A friend of mine, Varun Mera, wrote it about moving video indexing operations to nighttime in order to reduce load during periods, as you mentioned, Shayle,
Starting point is 00:20:51 when that computation would be not renewables intensive. It would be carbon intensive. So exactly as you said, one simple thing to do would be to simply pause a workload. However, that's not going to work for all workloads. And the reason this is tricky and sophisticated is because there are many things you could do, many different requirements that users are going to have for you and you want to precisely meet a grid target, and you want to make sure that your performance is not sort of approximate, but that you can guarantee to the grid that if they need you to achieve a particular demand reduction, you can certainly do that while respecting the constraints that the users of the AI compute put on you. That dual optimization problem is what makes this
Starting point is 00:21:34 complicated. So in addition to pausing and then resuming later on a job that can tolerate a delay, you might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. Some instances of this are known as auto-scaling, where you scale up and down the resource allocation for particular kinds of queries. You might also go all the way down to the underlying silicon, the, for example, Nvidia chips, and you might change what we call the clock frequency of the chip to change the rate at which computations happen. And so depending on the workload type, a customer may be comfortable with that workload being slowed a little bit, slowed a lot, and there are some other technical limitations as well. And I'll stop talking in a moment about the
Starting point is 00:22:20 complexities because they're fractally complex. But I'll mention, for example, that different workload types can tolerate different amounts of clock frequency changes or power caps. And so you need to know something about these workloads in order to determine, hey, what's the best set of operations that I can do to preserve what the user wants, which is great performance for their AI workload, whether it's training a model, fine-tuning a model, et cetera, and precisely what the grid needs, which is not a megawatt more than this limit that we promise to achieve for them. And that is a non-trivial problem that's far harder than just, I'll just pause a bunch of jobs. Yeah, that differentiation amongst types of workloads I think is sort of important here,
Starting point is 00:23:02 because if you think just historically pre-AI wave, right, there was already the same problem of like lots of data centers, way, way fewer, but lots of data centers that needed what looked to the grid, like 24-7 load, et cetera, et cetera. And the explanation you would always get as to why those loads couldn't or wouldn't be flexible was, well, these are mostly hyperscaler data centers, and the hyperscalers are making a commitment to their customers,
Starting point is 00:23:26 the ones on whose behalf they're doing this work, that they will deliver with low latency or whatever it is. And so it's just not worth it to them to try to shift this stuff around. They just want to deliver as quickly as they possibly can. So I can imagine there being cases here where that's going to be true to certain inference workloads in particular, I can imagine like there isn't really flexibility, but then others may be like training a model, certainly not as time sensitive. So how do you think about like the workloads and types of
Starting point is 00:23:56 compute for which this is especially well suited? Well, first of all, necessity is the mother of invention or changing your business model. And this is one of those cases, Shayle, where, look, we've got 50 to 100 gigawatts of latent AI demand in the pipeline, it's just not going to get built unless you have this capability of flexibility. Tyler Norris's viral paper, he's an advisor to Emerald AI, I should note. Tyler Norris's viral paper said, hey, there's 100 gigawatts of spare capacity lying around on grids if we can just make data centers modestly flexible, up to 200 hours a year. They're able to reduce consumption by around 25% for around two hours on average per event. And so if it weren't the case that there was this extraordinary demand for energy, severe limitations, and kind of this golden ticket to get it, I don't think we would be changing business as usual, which is the last two decades of SLAs or service level agreements is, Shayle, as you said, you simply get 24-7 uptime agreements on your power.
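Those round numbers show how small the ask is relative to a full year. The sketch below is illustrative arithmetic, not the paper's methodology: it applies the quoted figures (up to 200 hours a year, about a 25% reduction, about two hours per event) to the hypothetical 400 megawatt data center from earlier in the conversation, and it assumes the site would otherwise run at full nameplate in every hour, which overstates its baseline energy use.

```python
# Back-of-the-envelope view of "modest" flexibility for one hypothetical site.
# Inputs are the round figures quoted in the conversation; results are upper bounds.

HOURS_PER_YEAR = 8760


def curtailment_summary(nameplate_mw, reduction_fraction,
                        flex_hours_per_year, hours_per_event):
    """Summarize a demand response commitment, assuming a flat nameplate baseline."""
    reduced_mw = nameplate_mw * reduction_fraction
    curtailed_mwh = reduced_mw * flex_hours_per_year
    baseline_mwh = nameplate_mw * HOURS_PER_YEAR
    return {
        "reduced_mw": reduced_mw,
        "events_per_year": flex_hours_per_year / hours_per_event,
        "share_of_hours": flex_hours_per_year / HOURS_PER_YEAR,
        "share_of_energy": curtailed_mwh / baseline_mwh,
    }


summary = curtailment_summary(
    nameplate_mw=400,         # hypothetical site size used earlier in the episode
    reduction_fraction=0.25,  # "around 25%"
    flex_hours_per_year=200,  # "up to 200 hours a year"
    hours_per_event=2,        # "around two hours on average per event"
)

print(f"Reduction per event: {summary['reduced_mw']:.0f} MW")
print(f"Events per year (at most): {summary['events_per_year']:.0f}")
print(f"Share of hours affected: {summary['share_of_hours']:.1%}")
print(f"Share of annual energy curtailed: {summary['share_of_energy']:.2%}")
```

Under these assumptions the site is touched in a little over 2% of hours and gives up well under 1% of its annual energy.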
Starting point is 00:24:58 Given the necessity now, I think there's a range of AI customers, and we've talked to hundreds who are willing to tolerate small levels of changed power availability. You know, today there are different kinds of ways that you can reserve compute capacity. You can have a guaranteed instance where you get that 99.999% uptime guarantee. You can also have a spot instance where you can basically just get kicked out any time
Starting point is 00:25:26 or preempted. What Emerald AI's spatiotemporal flexibility technology offers is an almost firm guarantee. It's a guarantee that, look, 99% of the time, you're going to be left alone. But every so often, up to that 100 hours or 200 hours, there might be a mild power cap in which Emerald Conductor is going to gracefully orchestrate your workloads, and you might have to face a power cap. And based on what kind of workloads you're running, we're going to make sure to protect
Starting point is 00:25:56 the performance and tolerate delays only where you're willing to tolerate them. So that implies then, that sort of answers one of my implicit questions from earlier. So you're focused on the 100, 200 hours. a year. So this is a demand response type application. It's not like a daily load shifting thing. This is like in periods of extreme grid stress, we will dial down
Starting point is 00:26:16 your power consumption a little bit. You know, to be clear, I think that's where we enter. It's the most pressing need of the hour. No pun intended today. But I think that the same toolkit that harnesses spatiotemporal flexibility that allows you to, for those
Starting point is 00:26:32 100 or 200 hours, provide this demand response, is also the same capability set that would allow data centers to flex on a weekly or even daily basis one day, again, if the prices are right, if the incentives are well calibrated. And I think, Shayle, you and I both believe in a grid that is fundamentally abundant, cheap, affordable, and that's going to require a lot of both dispatchable but also intermittent and not dispatchable energy. And I personally view data centers as a potential holy grail if not the silver bullet to enable a generation mix like that,
Starting point is 00:27:11 one that's far more clean and one that's far more intermittent. So down the road, you can imagine that data centers, which today are about 4% of American energy consumption (AI data centers are about 5 gigawatts of load), grow to 12% by the end of the decade (AI data centers could be anywhere up to 50 or even more gigawatts), to 25% of American load by 2035 and beyond. They suddenly become by far the
Starting point is 00:27:36 biggest user of electricity in this country. And if they have this flexibility toolkit, they can be doing all of these operations: the up to 100 hours of demand response and, potentially, daily shifting. That's what a truly co-optimized AI infrastructure and electricity grid infrastructure massive system would look like. And I think step one is solving this 100 to 200 hour problem and just getting data centers onto the grid and getting grids comfortable that they can perform when called upon. So I think the big question then here is like how much flexibility can you actually offer? It's going to vary, I understand. But I don't think anybody's proposing the 400 megawatt nominal data center turns to zero megawatts, 200 hours out of the year,
Starting point is 00:28:21 because you still have HVAC load and all that kind of stuff. And my presumption is you also don't want to, I mean, you mentioned this, right? Some of the techniques that you want to employ are things like slowing down the clock speed of a GPU, that doesn't dial the load down to zero. It just dials it down some. So what do we know about how much flexibility, how much demand response capacity is realistically latent within, say, a 400 megawatt data center? You know, we set out to demonstrate one example of this in Phoenix, Arizona, earlier this summer, and we published the results along with NVIDIA and the Electric Power Research Institute and our partner Salt River Project, at an Oracle data center. And we said, look, let's take a large cluster of GPUs and let's see what we can get.
Starting point is 00:29:08 Can we achieve a 25% demand reduction, which the Tyler Norris Duke paper suggested would be a kind of minimum threshold to achieve this massive amount of headroom? So 25% reduction. Sustain it for what the Arizona grid needed, which was a three-hour demand reduction. And do so with representative AI workloads. And so we worked with our partner, Jonathan Frankle, the chief AI scientist of Databricks, who specified for us, look, this is what a representative set of workloads could look like. It was surprising to me, by the way, to hear that he anticipated that just 10% of the workloads on a representative Databricks cluster were non-preemptible. In other words, they absolutely could not be paused
Starting point is 00:29:51 or delayed in any way. That gives us a lot of flexibility to work with. And so we worked with him to develop four kinds of representative ensembles of workloads, with varying levels of flexibility, some which could be just delayed by a little bit or slowed down a little bit, and some which could be delayed a little more. Using those representative workloads, we've published a preprint of our academic paper on arXiv showing that a 25% reduction is definitely feasible. We even have one of our runs which showed a 40% reduction and still met all of the performance requirements for this representative set of users and AI workloads. So there is, I think, a lot of inherent flexibility in the system.
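To make the idea of meeting a cluster-wide reduction target from workloads of varying flexibility concrete, here is a toy sketch. It is not Emerald AI's method: the greedy ordering, the `Workload` fields, the workload names, and every number except the 25% target are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_kw: float      # current power draw (hypothetical figures)
    max_shed: float      # fraction of its power that may be shed; 0 = non-preemptible


def plan_power_cap(workloads, target_reduction):
    """Toy greedy planner: shed power from the most flexible workloads first
    until the cluster-wide reduction target is met. Returns {name: kW shed}."""
    total_kw = sum(w.power_kw for w in workloads)
    needed_kw = total_kw * target_reduction
    plan = {}
    for w in sorted(workloads, key=lambda w: w.max_shed, reverse=True):
        if needed_kw <= 0:
            break
        shed_kw = min(w.power_kw * w.max_shed, needed_kw)
        if shed_kw > 0:
            plan[w.name] = shed_kw
            needed_kw -= shed_kw
    if needed_kw > 1e-9:
        raise ValueError("target not reachable with the stated flexibility")
    return plan


# Hypothetical 1,000 kW cluster; 10% of the load is non-preemptible.
cluster = [
    Workload("batch_training", 300.0, 0.5),
    Workload("fine_tuning", 600.0, 0.3),
    Workload("live_inference", 100.0, 0.0),
]
plan = plan_power_cap(cluster, target_reduction=0.25)
print(plan)
```

In this example the non-preemptible workload is never touched, and the 25% target is met entirely by slowing the two flexible workloads. A real orchestrator would also have to track per-workload performance requirements over the duration of the event, which this sketch ignores.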
Starting point is 00:30:30 And then, Shayle, you can think about layering on other interventions. You can get computational load flexibility alongside, let's say, some limited deployment of batteries. And together, you can get much of the data center's consumption to go offline for a small amount of time. When you say still met the performance requirements, is that like there's something in the SLA, they're giving you a representative SLA,
Starting point is 00:30:53 and you're saying, okay, I still need to meet this? Or is it, who defines what the, because isn't that the key thing? Obviously, you can get kind of as much as you want, presuming that the performance requirements allow for it. And so a lot of this, to me, seems to come down to like, what is the SLA between the data center operator and the customer? You're nailing it.
Starting point is 00:31:13 This is the key central question going forward, is can we define a new kind of SLA that looks almost like the previous kind of SLA, but has, again, less than 1% of the time, the chance that your workloads might get power capped in the most graceful way possible. And again, in talking with hundreds of AI companies, our conclusion is this is definitely doable. It is definitely possible for us to find
Starting point is 00:31:40 a large set of customers who are willing to tolerate this kind of disruption, especially because, first, AI customers today struggle to get access to compute. You hear OpenAI's Sam Altman talking often about how GPU capacity is a limiting constraint on the expansion of OpenAI's GPT-5 model, for example. And others say, hey, the costs of compute because of the scarcity of compute are really the limiting factor for popularizing and democratizing AI. And even for applications that are extremely time or latency sensitive, you know, I recently talked to
Starting point is 00:32:18 the CEO of a company that makes a very real-time interactive world model. You know, you can step into this world and the data center needs to be quite close to you in order for you to have a good experience at 30 frames per second. Even they can tolerate geoshifting a workload less than 1% of the year, geoshifting some of the workloads within a 500-mile radius because it's only going to incur less than a 50 millisecond latency penalty. That's acceptable if what that leads to is a much larger set of GPU deployments and therefore better access to compute and maybe even cheaper access to compute. So I think, yes, Shayle, the central question is, can there be a new PowerFlex SLA that's slightly different from today's SLAs? And I think the answer is
Starting point is 00:33:02 probably yes. All right. So final question for you then, the Holy Grail here is if you and others can convince the grid operators, you mentioned this before, right, that they can rely upon this type of flexibility, as you said, perhaps in combination with physical flexibility assets as well, such that they know there is a data center that has nominal 400 megawatt capacity, but actually we're going to interconnect it at 300 megawatts or whatever it is. What do you think it's going to take to get that level of comfort from the grid operators? It's been a long road to get traditional demand response there, and this is like a whole level of complexity. Now, as you said, necessity is
Starting point is 00:33:44 the mother of invention, but what's your sense of like what are you going to have to prove to get grid operators to trust it? That's a really great question. To answer it, I recently was invited to speak at the Electric Power Research Institute's summer seminar.
Starting point is 00:33:59 There are 100 utility and grid operator CEOs in the audience. And I asked all of them for the same thing. I said, please participate alongside the AI companies in an escalating series of demonstrations approaching commercial scale. And we at Emerald AI plan to hit commercial scale early next year.
Starting point is 00:34:17 We're very excited to have whole data centers be power flexible in partnership with our collaborators, such as Nvidia, which is our biggest investor. Because that data, that ground truth, reliability information is what's needed for grid operators and utilities to believe that this is actually a thing, that AI, far from being the scariest liability that's getting added to grids
Starting point is 00:34:42 could actually be the most promising asset that we can add to grids. They've got to see it to believe it. So we're working with a range of partners. I mentioned the collaboration with EPRI and Oracle and NVIDIA and SRP in Phoenix, but now we have upcoming demonstrations all over the United States
Starting point is 00:34:58 and increasingly around the world, which I'm very excited about, to showcase that data centers can be flexible and get grid operators very comfortable. One last thing I'll mention is in order for a grid operator or utility to bank on the fact that, hey, when I call this resource, it's actually going to perform the way I need it to, Emerald has developed something called the Emerald Simulator, which is a
Starting point is 00:35:22 digital twin that imagines what would happen if we did certain orchestration operations. We move some workloads around. We paused or slowed workloads. And as we've submitted in our academic paper, it's extremely accurate. And that accuracy built out over many more demonstrations is going to be critical to prove to utilities and grid operators that, in fact, the system is going to work the exact way you expect it to. And if it doesn't, in that absolute worst case, there will be some fail-safe mechanism to make sure that it does work. So there's a lot of convincing work to do, but I sometimes feel we're pushing on an open door. You know, when I talk to the chairman of a regulatory commission, you know, you pick your large East Coast state. That chairman said,
Starting point is 00:36:00 I've got the governor knocking on my door every month and saying, what have you done for me to bring data centers to my state? Because I want to economically compete with all the other states. Regulators, utilities, system operators are all balancing this trade-off between providing reliable and affordable electricity, but also bringing economic development and this extraordinary new source of demand, the greatest economic opportunity humanity's ever seen, to their state. Data center flexibility is a way to end the trade-off between those two halves. You can have it all at the same time. It's the reason I left everything I've been doing in my career and founded this company to do just this for the next decade of my life. So really excited about it. Varun, this was fun. Thank you again for coming back. I really appreciate the time, Shayle. Thank you so much for having me.
Starting point is 00:36:46 Varun Sivaram is the founder and CEO of Emerald AI. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. Latitude is supported by Prelude Ventures. This episode was produced by Daniel Woldorff. Mixing and theme song by Sean Marquand. Stephen Lacey is our executive editor. I'm Shayle Kann, and this is Catalyst.
