In The Arena by TechArena - Sneak peek: What to expect at the OCP conference with Schneider Electric’s Alex Rakow and Intel’s Eric Dahlen

Episode Date: July 26, 2024

Join host Allyson Klein in this insightful episode of Tech Arena, featuring Eric Dahlen from Intel and Alex Rakow from Schneider Electric. As co-chairs of the Compute Sustainability group within the Open Compute Project, Eric and Alex discuss their roles, the initiative's goals, and the impact of AI on data center sustainability. They delve into the challenges and innovations in power and cooling technologies, embodied carbon, and circularity practices. Get a sneak peek into what to expect at the upcoming OCP Summit and how industry leaders are pushing the boundaries of sustainable technology.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein, and I am very excited for today's episode. I've got Eric Dahlen of Intel and Alex Rakow of Schneider Electric in the studio with me. Welcome to the program, guys. Thanks so much for having us. So you both have been on the show before, and you are co-chairs of a very important group within the Open Computing Initiative,
Starting point is 00:00:46 Compute Sustainability. Why don't we just start with introductions about your day jobs, what you do at OCP, and a little bit about the background on the goals of the initiative that you drive together. I lead sustainability for the data center segment at Schneider. So I'm working on our strategy within Schneider for how we partner with our data center customers, our data center partners to advance what at the end of the day are often common sustainability goals and how we overcome sustainability challenges that we share. And we can certainly talk a lot more about what those are through the course of the conversation. With OCP, I have for several years led a sustainability project, which in 2022, sort of the end of 2022, became one of the major projects at OCP and the fifth tenet that drives all the work that OCP does. I can describe more how that works and how we've set it up once Eric
Starting point is 00:01:40 has a chance to introduce himself. All right. Yeah. Thanks, Alex. And I'm Eric Dahlen. I'm a senior principal engineer in the data center and AI group at Intel, and I'm our lead cloud technologist and I'm the primary technical support for the corporate sustainability product officer. So Intel stood up a C-level position for sustainability of Intel products. And I support that person too. I actually came into the sustainability project from the steering committee. I was the original sustainability steering committee rep. And then when I handed that off to Shruti from Microsoft, I was able to come down and actually co-lead the project, the sustainability project.
Starting point is 00:02:16 That was, I think, the original intent, except that it's probably not cool to oversee yourself from a steering committee. When you think about what you've co-led for a significant amount of time now, can you just give us an introduction on what are the objectives of the program and what were they at the start and what are they today? So when I started working with Open Compute Project on my end, which I think is more recent than Eric, but back in 2022, sustainability was a strategic initiative for OCP. So not a named project. There were lots of ways in which sustainability intersected with all of the other
Starting point is 00:02:51 projects within OCP, cooling environments, data center facilities, and so on. And then in 2022, we, along with the foundation, decided that sustainability was important enough to the members, the member organizations, the foundation, and all the projects that we're working on, that it was worth us making it a named project and sort of wrapping it with the organizational trimmings that would allow us to do more work under that umbrella. And in doing so, as I referenced briefly earlier, we made sustainability the fifth tenet at OCP, meaning it's part of how we evaluate new innovations, new contributions to the Open Compute Project. So at that time, we devised some general umbrella criteria for evaluating sustainability contributions. We wanted to make sure that the sustainability attribute that was being described
Starting point is 00:03:36 for whatever new solution was being brought to OCP was meaningful, that it was moving the ball forward in terms of advancing sustainability, that it was measurable, that there was some sort of metric behind whatever it was that you were claiming in terms of sustainability benefit, and that it was relevant to the solution. Software solutions have very low embodied carbon. We don't need to talk about embodied carbon for software, that sort of thing. It needs to be relevant to whatever category it is that the solution provider is bringing that to OCP. So that's part of what we've done. And then as a major project at OCP, we have several sub-projects that we've devoted a lot of resources to and have made a lot of progress on in the past couple of years. And we can dig into
Starting point is 00:04:14 those as the conversation progresses, but let Eric get in as well. Yeah. And I think for me, I'm kind of a dyed-in-the-wool hardware geek. And part of what we were after was trying to enable the ecosystem that generates compute infrastructure to tell where they were on efficiency. There's more to sustainability than just efficiency, but I definitely came at this from a compute-per-watt perspective. What we had in the industry from the Green Grid and others was PUE, which is great at a data center scale, right? How much of the power you draw from the utility grid actually goes into the IT equipment. But it assumes that all IT power is good power. And I'm here to tell you that's not the case.
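As a rough illustration of the point about PUE, here is a minimal sketch. PUE is total facility power divided by IT power; the second function, `useful_fraction`, is a hypothetical metric introduced only for this example, and every number below is made up rather than taken from the discussion.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def useful_fraction(total_facility_kw: float, it_kw: float, it_overhead_kw: float) -> float:
    """Share of facility power that reaches actual compute.

    Hypothetical metric for illustration: it subtracts overhead that sits
    on the IT side of the meter, which PUE counts as "good" power.
    """
    return (it_kw - it_overhead_kw) / total_facility_kw


# Assumed numbers: 1,200 kW at the utility feed, 1,000 kW at the IT load,
# of which 150 kW is assumed to go to server fans and power-conversion losses.
facility_pue = pue(1200.0, 1000.0)
compute_share = useful_fraction(1200.0, 1000.0, 150.0)
print(f"PUE = {facility_pue:.2f}, compute share = {compute_share:.1%}")
```

Under these assumed numbers the facility looks efficient by PUE alone, while a noticeably smaller share of the power actually reaches compute, which is the gap being described.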
Starting point is 00:04:52 There's a lot of power on the IT side that is overhead and certainly could and should be reduced over time. But the real vision that we're after, I think, is to enable someone to make energy efficiency and carbon footprint-based decisions on what to run and where and how to run it. That's the vision. And we talked about that at the Global Summit a year ago. We talked about it in the Regional Summit in Lisbon. The OCP is kind of in a unique position where we have all the right players to go establish and set required profiles for compliant hardware that will make it possible
Starting point is 00:05:26 for a software operator, even someone who doesn't run and operate the equipment themselves, to have enough information to figure out what their dynamic footprint is. And I agree with Alex that the software itself doesn't have a huge footprint. You can store as much software as you want and not necessarily bring any more carbon than what it takes to store it. But we have seen that getting the same work done in software can have two orders of magnitude difference in runtime and thus operational footprint. So you can optimize your software, reduce the energy, reduce the carbon footprint. And then if you had insight into it, you could also run it somewhere with lower intensity and better sustainability. But that's going to take
Starting point is 00:06:02 several steps of progress. We're making progress on all fronts, but I can't give you an ETA yet where every operator everywhere will be able to see the footprint of what they're doing. I think that you guys were a bit prophetic in terms of focusing the organization in this direction and forming a project because AI has come into the fold of everybody's thinking. And all of a sudden, even those people who would poo-poo compute sustainability initiatives are now paying attention just based on the power draw and the constraints that these new platforms represent to data centers and their broad proliferation as our large cloud players seek to advance AI at a frenetic pace. Can you talk a little bit about
Starting point is 00:06:41 what this has meant for the compute sustainability initiative? And Alex, I think you had something else that you wanted to share about Eric's last comment, if you wanted to work that into your answer. Yeah, sure. I mean, I think that this pertains to the question that you just asked, which is a big one. What's happening driven by AI generally is that we're just building more. And Eric can get into the details of how it's changing the architecture within the data center and how that affects environmental impacts. But even just from that high level,
Starting point is 00:07:08 thinking about how much more infrastructure we anticipate building, how much more we're building right now because of AI demands, it completely changes the calculus in terms of environmental impact. You mentioned in your question, Allyson, the power that the data center consumes and that's sort of top of everyone's mind. But there are other impacts as well, you know, certainly to land and local ecosystems. But I think top of mind for many of the biggest data center developers is this concept of embodied carbon, which is all of the carbon that's emitted in the process of extracting raw materials, manufacturing the core and shell of the data center, the equipment that goes into the data center, you know, MEP and IT, all of that carbon that's emitted before those
Starting point is 00:07:49 materials and equipment even reach the gate of the data center. And for many data center operators, that's the vast majority of their carbon footprint. A lot of data center operators have been at the forefront of renewable energy procurement, blunting the impact of their power consumption when it comes to carbon emissions. There are all kinds of other challenges with power consumption. Top of mind is power availability, but just looking at the carbon footprint, this topic of embodied carbon emissions is at the top of the list. And what we've done at Open Compute Project is a couple of things, but one of the projects that we've undertaken is a joint initiative with the iMasons Climate Accord. iMasons is an important industry organization for
Starting point is 00:08:26 digital infrastructure operators and consumers. And we're working with that organization on a joint project around carbon disclosure. So trying to standardize, however we can, the information that vendors to the data center industry provide on the embodied carbon of the product that they're bringing to the market, whether that's raw materials or finished equipment. And so the more that we can recruit suppliers to in some way start to measure and report on those numbers, the bigger the database that we can develop of embodied carbon numbers for different product categories, the more we can learn about how to address that embodied carbon, how to mitigate it over time at a pace that's compliant with our shared carbon emissions goals, which broadly stated are to reach net
Starting point is 00:09:10 zero by mid-century. Yeah, and I think what I would add is the AI frenzy has been largely catalyzed by the growth of generative AI and large language models. And those have in common with HPC the physical size of the cluster. If you have an Amdahl problem, you're trying to scale to thousands of endpoints in a node. The speed of light is actually one of your limiters. And so trying to put these closer together actually has performance value. So AI is both a challenge and an opportunity here: it gives us the opportunity to do much denser racks with much less overhead and loss and less physical material.
Starting point is 00:09:46 So dedicated AI infrastructure actually could have a much more concentrated build-out with much more modernized approaches and way better efficiency and lower overhead. But to the point Alex was on, the AI treadmill is actually much steeper than Moore's law. I don't know if you've seen any articles about that. But a two-year-old AI system is extremely obsolete. The idea that you can reduce your embodied footprint by keeping things in service longer, that's going to be a real challenge. You know, this is where I was going to go next, which is the performance requirements are just, I've never seen anything like it. And I've been in the compute space for quite a long time. Eric, when you look at the performance requirements from the cloud service providers and what
Starting point is 00:10:30 they're demanding for this AI training workload, and you see the power draw that's coming, and I understand that the full carbon footprint needs to be kept in mind, and I get that. But just the power draw alone is leading people to consider just esoteric power generation and new rack cabling and all sorts of different investment in the data center. How does sustainability have a chance when it comes to this? I think it runs the risk of being an afterthought again, where you do whatever you have to do and then clean it up later. We are certainly facing that challenge, particularly on liquid cooling. One of the other overlaps with the sustainability project is the data
Starting point is 00:11:09 center infrastructure and the cooling project, in particular liquid cooling. Because you can imagine if you start to try and generate all this computing in a smaller and smaller volumetric form factor, at some point pretty soon you outstrip the ability of either efficient air cooling, where the overhead for the energy to do air cooling starts to climb non-linearly, effectively a cubic function of case temperature, or you could just outstrip air cooling altogether. The problem being, of course, the liquids that behave well, that are available at a rational cost per gallon, have a low boiling point, low viscosity, so you can use them for two-phase liquid. Those are PFAS chemicals.
Starting point is 00:11:46 With any luck, those will be banned globally very soon. You know, you've got to be a little careful about painting everything with a broad brush, but obviously forever chemicals in particular need to go away. And we all, I think, acknowledge that, but we're also very impatient for very high performance. So on the one hand, AI in the big training clusters and this huge investment that we're on with billions of dollars per year from the big guys will push the envelope of liquid cooling, which is way more efficient than any kind of air cooling. But on the other hand, it gives us the risk that we adopt something we're going to have to replace very quickly in terms of what the liquids are. So there's a lot of tension in the system here. The bigger players in the ecosystem, you know, household names that have gigawatts of data centers based globally, big cloud companies, they're already, like Alex said, pushing the
Starting point is 00:12:34 envelope on efficiency and on sustainable energy. In many cases, they've become energy vendors themselves. There wasn't enough sustainable energy where they operate for them to hit their sustainability goals. So they went and built a farm that's bigger than they need. And they take all of their energy from that to be renewable and then sell the rest to the grid. The sustainability push has actually caused some very good behavior. Most of these companies are trying to be good global citizens, but they are driven by a board of directors and the bottom line.
Starting point is 00:13:03 Now, Alex, you know, I know that you and your day job are all about power delivery in the data center. And one of the things that I see a tremendous amount of opportunity for is the power and cooling technologies that are going into these very specialized buildings. What do you see are the trends in that space? And what do you think the industry can do to help with some of those new challenges? Despite my day job employer, I do think that Eric is better suited to answer this question. So I'm curious as to what Eric would say first. So I think we're going to see, you know, just like we saw the big guys, they figured out that a traditional 12 or 15 kilowatt rack wasn't going to cut it.
Starting point is 00:13:39 In fact, the leading press-gathering system right now, the NVL72 rack from NVIDIA, the building block node is in the 11 kilowatt range. And you want eight of those in a rack. Plus a whole bunch of specialty switching equipment. So these are 100 kilowatt racks already and heading up from there. Now, like I said, I think that has the potential to greatly improve how much power delivery and cooling overhead there is. The fraction of energy that is overhead in that rack is going to get smaller. And there are going to be fewer physical racks. You can imagine a 100 kilowatt rack is going to displace seven 15 kilowatt racks. So the physical
Starting point is 00:14:16 space, the brick and mortar and steel and concrete, can get much smaller. You can have a lower footprint data center in terms of total material. You can have a lower footprint data center in terms of total losses and energy delivery and conditioning equipment. And then liquid cooling will drive the PUE well below 1.1. So the overhead for non-IT power should come way down. And what we've seen from the big guys is once they invest in this highly customized rack, they just use it for everything. So it's happened twice now. Interesting. They had to go build a full custom rack to do the AI thing they wanted to do. And once they had it custom made, well, we went to
Starting point is 00:14:49 all the trouble to build it. Let's just use it everywhere. And so I think it actually can catalyze very rapid progress in terms of modernization of equipment. But I do still hold that reservation that the AI equipment itself needs to find its way to a second life somewhere. Now, Alex, you started this with a conversation around full embodied carbon and circularity. We've discussed in the past modular configurations and other new approaches to sustainability. Has OCP advanced consideration of design of form factors and other things that will help facilitate this adoption? We have a data center facilities project at OCP, which is focused on this,
Starting point is 00:15:27 which honestly would be probably better suited to answer than I would. I don't know, Eric, if you have ideas about overall form factor beyond modularity in terms of what we can do for sustainability. I'd point to two things, I think. The open rack is certainly gaining broad traction, and it's got a couple of things going for it, right? Instead of
Starting point is 00:15:46 silver box power supplies, redundant in every rack-mounted device, it has DC power delivery, and on ORv3 that's 48 volts. It requires less copper, you know, lower current, so lower conduction losses in delivering the power, less material to deliver the power, and the power supplies are actually historically pretty fragile. So probably better reliability, but the jury's out. The open rack is one thing that's modular and changing the industry pretty broadly, gaining a lot of traction. The other thing I would point to, which is probably lesser known, is Alex mentioned the server project, the DCSCM, the server compute module.
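The link between bus voltage, current, and conduction loss can be sketched with basic circuit arithmetic: current is power divided by voltage, and resistive loss is current squared times resistance. The 48 volt figure comes from the discussion; the 12 volt comparison point, the load, and the busbar resistance are assumptions chosen only for illustration.

```python
def conduction_loss_w(load_w: float, bus_v: float, resistance_ohm: float) -> float:
    """Resistive (I^2 * R) loss in a conductor feeding a load at a given bus voltage."""
    current_a = load_w / bus_v          # I = P / V
    return current_a ** 2 * resistance_ohm


# Assumed values: a 15 kW rack load and 0.5 milliohm of distribution resistance.
load_w = 15_000.0
resistance_ohm = 0.0005

loss_12v = conduction_loss_w(load_w, 12.0, resistance_ohm)
loss_48v = conduction_loss_w(load_w, 48.0, resistance_ohm)

print(f"12 V bus: {loss_12v:.1f} W lost, 48 V bus: {loss_48v:.1f} W lost")
print(f"ratio: {loss_12v / loss_48v:.0f}x")
```

Quadrupling the voltage cuts the current by four and the loss in the same conductor by sixteen, which is also why the same loss budget can be met with less copper.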
Starting point is 00:16:20 That form factor is actually being adopted by some of this AI hardware. Because it turns out what we had in the specs, the OAM, the accelerator module, isn't big enough for some of these new AI devices. They needed something the next size up, the level of integration they wanted. And the DCSCM form factor has been taking off as the big bad accelerator for AI. And that's good, right? Because that means that as we push for a second life and other usage for those things, when they can't be the flagship AI training
Starting point is 00:16:49 course anymore, there should be a place to plug them in somewhere else, which is something that wouldn't happen with the full custom design. Traditionally in HPC, if you had a full custom HPC element that only works in this cluster, once this cluster is done with it, it's got nowhere else to go. Yeah, that makes a lot of sense. Now, as we look forward, obviously the entire industry is thinking about how to make data centers more sustainable as greenfield development continues to grow,
Starting point is 00:17:16 as compute density continues to grow. There's so many things that we need to do. What other areas do you see as getting into the forefront of thinking within the OCP project? And is there anything else that you would like to mention that our listeners should be aware of? We have a number of subprojects under the OCP sustainability project. We have projects on sustainability metrics, which is important to Eric's earlier point in terms of PUE having perhaps reached its full usefulness and needing to move on to new sustainability metrics.
Starting point is 00:17:49 We have projects on power telemetry inside the data center, how we're gathering the data we need to understand where power is being used and where there may be opportunities for efficiency. But one that pops out to me in terms of how you framed your question is circularity. So we have a carbon accounting for circularity workstream. It's dedicated to figuring out, once you adopt a circularity practice within the data center, how to account for the decrease in embodied carbon in particular, or operational emissions, depending on what the circularity intervention is, so that we can take credit for those in the right way without double counting. But I think circularity is an important broad topic for the industry as we contemplate this incredible growth, this incredible build-out associated with AI demands. Because the more we can reuse equipment
Starting point is 00:18:35 rather than build from scratch, that has the biggest overall effect on blunting that embodied carbon impact and blunting all of the other associated environmental impacts that come along with building new equipment, extracting new materials. So we're looking as an industry for ways to, of course, take back and recycle and reuse equipment at end of life, but also prolong the life of equipment, doing things like managing spares inside the data center, modernizing equipment to become more digital and connected so that it can be serviced based on need rather than based on schedule, reducing truck rolls, all of that. So the more we can think about extending the useful life of a piece of equipment and extending the life of the materials inside that equipment, we will be both saving costs and saving carbon throughout the life cycle.
Starting point is 00:19:21 That's awesome. Now we know that OCP Summit is coming up and this is a big moment for this project. What should we expect at the conference for sustainability? And is there anything that you would suggest our listeners do to prepare for the conference? We'll get updates on all of those sub-projects that I ran through quite quickly just now. So you'll get to hear what the output of those projects was, what we've developed in terms of our thinking on sustainability metrics, power telemetry, and the rest. On the OCP iMasons collaboration that I referenced earlier, we're going to use the summit in the fall as an opportunity to reveal the standardized carbon disclosure questionnaire that we've been
Starting point is 00:20:01 developing and circulating for feedback. So that'll be an unveiling for the big output from that project and commitments for those involved to start using that disclosure, both as a data center operator from the perspective of an RFP and from the supplier committing to start those embodied carbon measurements so that we can build that database of embodied carbon information. And then the third thing is that we'll be able to report on how sustainability itself is evolving in OCP. So from the perspective of the governance of the foundation, what we're doing to make sure that we are using those criteria I was describing earlier, building on those criteria, making sure that every project that we undertake within OCP and the innovations that we put the OCP stamp on are advancing sustainability at pace, given the great need that we have in our growing industry. Eric, anything to add?
Starting point is 00:20:50 I think the other thing I would add is that one thing the Global Summit is always great at is demos. There'll be a lot of partners with technology and booths on your AI topic, on a liquid cooling topic. In particular, I think a lot of the maybe boring or geekish plumbing sort of chores behind these initiatives, we've made very good progress on that. So the manageability project, we'll be talking about profiles to track and
Starting point is 00:21:13 expose all this information, assuming we can get the ecosystem to make that information exist in the first place. One thing I'd come back to, right, on that kind of vision we talked about at the outset, in order to be able to see the footprint of everything you run on digital infrastructure, you need a way wherever you run. If you own your own equipment or you log into the public cloud, you need to be able to get an effective inventory of what resources are allocated to you and the footprint associated with those resources.
Starting point is 00:21:42 And then you need to know how much energy for how long you consumed in doing what it is you're doing, and then see the carbon intensity of that. And we'll be talking more about that at the summit. The idea that we can get these databases that Alex is talking about and access to them as a standard profile thing. You can imagine, in order to have such information, you need an API or an interface that you call to get this information and a data schema so you can understand the information that's returned.
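The accounting described here can be sketched as a small calculation: operational carbon is energy consumed times grid carbon intensity, and the embodied share is amortized over the equipment's service life. This is only a sketch under stated assumptions. The `ResourceUsage` record, its field names, and the numbers are hypothetical and do not represent any OCP profile or schema.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """Hypothetical record an inventory interface might return for one allocation."""
    energy_kwh: float                 # energy consumed by the workload
    grid_intensity_g_per_kwh: float   # carbon intensity of the supplying grid
    embodied_total_kg: float          # embodied carbon of the underlying hardware
    service_life_hours: float         # assumed service life of that hardware
    hours_used: float                 # how long the allocation was held
    share: float                      # fraction of the hardware allocated (0..1)


def operational_kg(u: ResourceUsage) -> float:
    """Operational emissions: energy times carbon intensity, in kg CO2e."""
    return u.energy_kwh * u.grid_intensity_g_per_kwh / 1000.0


def amortized_embodied_kg(u: ResourceUsage) -> float:
    """Embodied emissions attributed to this allocation by time and share."""
    return u.embodied_total_kg * u.share * u.hours_used / u.service_life_hours


# Assumed example: 50 kWh on a 400 g/kWh grid, a quarter of a server with
# 1,500 kg embodied carbon and a five-year life, held for 100 hours.
usage = ResourceUsage(50.0, 400.0, 1500.0, 5 * 8760.0, 100.0, 0.25)
total_kg = operational_kg(usage) + amortized_embodied_kg(usage)
print(f"operational: {operational_kg(usage):.2f} kg, "
      f"embodied: {amortized_embodied_kg(usage):.2f} kg, total: {total_kg:.2f} kg")
```

Splitting the two terms keeps the inputs separate: the energy and duration come from telemetry, the intensity from the grid, and the embodied figure from a supplier disclosure database of the kind discussed earlier.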
Starting point is 00:22:09 And those profiles, I think we've made very good progress on. I think there's still a long way to go. The Sustainability Project has got lots of players working in lots of different directions. And like Alex said, we'll give an update on all that stuff. A lot going on. And as you've observed, a lot of pressure to do better as soon as we can. Awesome. Well, thanks, guys, for being on the show today. I just have one final thing for you.
Starting point is 00:22:31 Where can folks engage with the project as well as engage with you? Project itself, I think if you go to ocp.org, all the projects are listed there. If you click through to sustainability, embarrassingly enough, you'll see Alex and I right there.
Starting point is 00:22:44 As the leads of the project, it's got contact information. Obviously anyone can join. If you want to contribute or start downloading things, you'll have to actually join somehow to attend meetings and consume collateral. The Open Compute Project is open, right? All the stuff we work on is open. Thank you so much for being on the show today. It was a real pleasure. Thanks, Allyson. Thanks for joining The Tech Arena. Subscribe and engage at our website, thetecharena.net.
Starting point is 00:23:18 All content is copyright by The Tech Arena.
