In The Arena by TechArena - Sneak peek: What to expect at the OCP conference with Schneider Electric’s Alex Rakow and Intel’s Eric Dahlen
Episode Date: July 26, 2024
Join host Allyson Klein in this insightful episode of Tech Arena, featuring Eric Dahlen from Intel and Alex Rakow from Schneider Electric. As co-chairs of the Compute Sustainability group within the Open Compute Project, Eric and Alex discuss their roles, the initiative's goals, and the impact of AI on data center sustainability. They delve into the challenges and innovations in power and cooling technologies, embodied carbon, and circularity practices. Get a sneak peek into what to expect at the upcoming OCP Summit and how industry leaders are pushing the boundaries of sustainable technology.
Transcript
Welcome to the Tech Arena,
featuring authentic discussions between
tech's leading innovators and our host, Allyson Klein.
Now, let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and I am very excited for today's
episode. I've got Eric Dahlen of Intel and Alex Rakow of Schneider Electric in the studio with me.
Welcome to the program, guys. Thanks so much for having us. So you both have been on the show
before, and you are co-chairs of a very important group within the Open Compute Project,
Compute Sustainability.
Why don't we just start with introductions about your day jobs, what you do at OCP, and a little bit about the background on the goals of the initiative that you drive together. I lead sustainability for the data center segment at Schneider. So I'm working on our strategy within Schneider for how we partner with
our data center customers, our data center partners to advance what at the end of the day are often
common sustainability goals and how we overcome sustainability challenges that we share.
And we can certainly talk a lot more about what those are through the course of the conversation.
With OCP, I have for several years led a sustainability project, which in 2022,
sort of the end of 2022, became one of the major projects at OCP and the fifth tenet that drives
all the work that OCP does. I can describe more how that works and how we've set it up once Eric
has a chance to introduce himself. All right. Yeah. Thanks, Alex. And I'm Eric Dahlen. I'm a senior PE in the data center and AI group at Intel, and I'm our lead cloud technologist
and I'm the primary technical support for the corporate sustainability product officer. So
Intel stood up a key level position for sustainability of Intel products. And I support
that person too. I actually came into the sustainability project from the steering
committee. I was the original sustainability steering committee rep.
And then when I handed that off to Shruti from Microsoft, I was able to come down and
actually co-lead the project, the sustainability project.
That was, I think, the original intent, except that it's probably not cool to oversee yourself
from a steering committee.
When you think about what you've co-led for a significant amount
of time now, can you just give us an introduction on what are the objectives of the program and
what were they at the start and what are they today? So when I started working with Open Compute
Project on my end, which I think is more recent than Eric, but back in 2022, sustainability was
a strategic initiative for OCP. So not a
named project. There were lots of ways in which sustainability intersected with all of the other
projects within OCP, cooling environments, data center facilities, and so on. And then in 2022,
we, along with the foundation, decided that sustainability was important enough to the
members, the member organizations, the foundation, and all the projects that we're working on, that it was worth us making it a named project and sort
of wrapping it with the organizational trimmings that would allow us to do more work under that
umbrella. And in doing so, as I referenced briefly earlier, we made sustainability the fifth tenet
at OCP, meaning it's part of how we evaluate new innovations, new contributions to the Open Compute
project. So at that time, we devised some general umbrella criteria for evaluating sustainability
contributions. We wanted to make sure that the sustainability attribute that was being described
for whatever new solution was being brought to OCP was meaningful, that it was moving the ball
forward in terms of advancing sustainability, that it was measurable,
that there was some sort of metric behind whatever it was that you were claiming in
terms of sustainability benefit, and that it was relevant to the solution. Software solutions have
very low embodied carbon. We don't need to talk about embodied carbon for software, that sort of
thing. It needs to be relevant to whatever category it is that the solution provider is
bringing that to OCP. So that's part of what we've done. And then as a major project at OCP, we have several sub-projects that we've devoted a lot of
resources to and have made a lot of progress on in the past couple of years. And we can dig into
those as the conversation progresses, but let Eric get in as well. Yeah. And I think for me,
I'm kind of a dyed-in-the-wool hardware geek. And part of what we were after was trying to enable
the ecosystem that generates compute infrastructure to tell where they were on efficiency. There's
more to sustainability than just efficiency, but I definitely came at this from a compute-per-watt
perspective. What we had in the industry from the Green Grid and others was PUE, which is
great at the data center scale, right? How much of the power you draw from the utility grid actually goes into the IT equipment.
But it assumes that all IT power is good power.
And I'm here to tell you that's not the case.
There's a lot of power on the IT side that is overhead and certainly could and should
be reduced over time.
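The limitation described here can be made concrete with a small sketch. PUE is total facility power divided by IT equipment power, so any overhead inside the IT equipment (fans, power conversion and so on) still counts as "good" power. The function names and all of the numbers below are illustrative assumptions, not figures from the conversation.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_equipment_kw


def useful_fraction(total_facility_kw: float, it_equipment_kw: float,
                    it_overhead_kw: float) -> float:
    """Share of facility power left after removing in-rack overhead
    (fans, power conversion) that PUE counts as IT power."""
    return (it_equipment_kw - it_overhead_kw) / total_facility_kw


# Hypothetical numbers, chosen only for illustration.
total_kw, it_kw, overhead_kw = 1200.0, 1000.0, 150.0
print(f"PUE = {pue(total_kw, it_kw):.2f}")
print(f"Useful fraction = {useful_fraction(total_kw, it_kw, overhead_kw):.2f}")
```

Two facilities with the same PUE can differ in how much of their IT power does useful work, which is the gap a compute-per-watt view tries to expose.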
But the real vision that we're after, I think, is to enable someone to make energy efficiency
and carbon footprint-based
decisions on what to run and where and how to run it. That's the vision. And we talked about that at
the Global Summit a year ago. We talked about it in the Regional Summit in Lisbon. The OCP is kind
of in a unique position where we have all the right players to go establish and set required
profiles for compliant hardware that will make it possible
for a software operator, even someone who doesn't run and operate the equipment themselves,
to have enough information to figure out what their dynamic footprint is.
And I agree with Alex that the software itself doesn't have a huge footprint or anything.
You can store as much software as you want and not necessarily bring any more carbon than what
it takes to store it. But we have seen that getting the same work done in software can have two orders of magnitude
difference in runtime and thus operational footprint. So you can optimize your software,
reduce the energy, reduce the carbon footprint. And then if you had insight into it, you could
also run it somewhere with lower intensity and better sustainability. But that's going to take
several steps of progress. We're making progress on all fronts, but I can't give you an ETA yet where every operator everywhere will be
able to see the footprint of what they're doing. I think that you guys were a bit prophetic in
terms of focusing the organization in this direction and forming a project because AI
has come into the fold of everybody's thinking. And all of a sudden, even those people who would
pooh-pooh compute
sustainability initiatives are now paying attention just based on the power draw and the
constraints that these new platforms represent to data centers and their broad proliferation as
our large cloud players seek to advance AI at a frenetic pace. Can you talk a little bit about
what this has meant for the compute sustainability initiative? And Alex, I think you had something else that you wanted to share about Eric's last comment,
if you wanted to work that into your answer.
Yeah, sure.
I mean, I think that this pertains to the question that you just asked, which is a big
one.
What's happening driven by AI generally is that we're just building more.
And Eric can get into the details of how it's changing the architecture within the data
center and how that affects environmental impacts. But even just from that high level,
thinking about how much more infrastructure we anticipate building, how much more we're building
right now because of AI demands, it completely changes the calculus in terms of environmental
impact. You mentioned in your question, Allyson, the power that the data center consumes and that's
sort of top of everyone's mind. But there are other impacts as well, you know, certainly to land
and local ecosystems. But I think top of mind for many of the biggest data center developers
is this concept of embodied carbon, which is all of the carbon that's emitted in the process of
extracting raw materials, manufacturing the core and shell of the data center, the equipment that
goes into the data center, you know, MEP and IT, all of that carbon that's emitted before those
materials and equipment even reach the gate of the data center. And for many data center operators,
that's the vast majority of their carbon footprint. A lot of data center operators have been at the
forefront of renewable energy procurement, blunting the impact of their power consumption when it
comes to carbon emissions.
There are all kinds of other challenges with power consumption. Top of mind is power availability, but just looking at the carbon footprint, this topic of embodied carbon emissions is at the top
of the list. And what we've done at Open Compute Project is a couple of things, but one of the
projects that we've undertaken is a joint initiative with the iMasons Climate Accord.
iMasons is an important industry organization for
digital infrastructure operators and consumers. And we're working with that organization on a
joint project around carbon disclosure. So trying to standardize, however we can, the information
that vendors to the data center industry provide on the embodied carbon of the product that they're
bringing to the market, whether that's raw materials or finished equipment. And so the more that we can recruit suppliers to
in some way start to measure and report on those numbers, the bigger the database that we can
develop of embodied carbon numbers for different product categories, the more we can learn about
how to address that embodied carbon, how to mitigate it over time at a pace that's
compliant with our shared carbon emissions goals, which broadly stated are to reach net
zero by mid-century.
Yeah, and I think what I would add is the AI frenzy has been largely catalyzed by the
growth of generative AI and large language models.
And those have in common with HPC the physical size of the cluster.
If you have an Amdahl problem, you're trying to scale to thousands of endpoints in a node.
The speed of light is actually one of your limiters.
And so trying to put these closer together actually has performance value.
So AI is both a challenge and an opportunity here: it gives us the opportunity to do much denser racks with much less overhead and loss and less physical material.
So dedicated AI infrastructure actually could have a much more concentrated build out with
much more modernized approaches and way better efficiency and lower overhead. But to the point
Alex was on, the AI treadmill is actually much steeper than Moore's law. I don't know if you've seen any articles about that.
But a two-year-old AI system is extremely obsolete.
The idea that you can reduce your embodied footprint by keeping things in service longer, that's going to be a real challenge.
You know, this is where I was going to go next, which is the performance requirements are just, I've never seen anything like it.
And I've been in the compute space for quite a long time.
Eric, when you look at the performance requirements from the cloud service providers and what
they're demanding for this AI training workload, and you see the power draw that's coming,
and I understand that the full carbon footprint needs to be kept in mind, and I get that.
But just the power draw alone is leading people to consider just
esoteric power generation and new rack cabling and all sorts of different investment in the
data center. How does sustainability have a chance when it comes to this?
I think it runs the risk of being an afterthought again, where you do whatever you have to do and
then clean it up later. We are certainly facing that challenge, particularly on liquid cooling.
One of the other overlaps with the sustainability project is the data
center infrastructure and the cooling project in particular liquid cooling.
Because you can imagine if you start to try and generate all this computing
in a smaller, smaller volumetric form factor, at some point pretty soon you
outstrip the ability of either efficient air cooling, where the overhead for
the energy to do air cooling starts to climb non-linearly, effectively a cubic function of case
temperature, or you just outstrip air cooling altogether. The problem being, of course,
the liquids that behave well, that are available at a rational cost per gallon, have a low boiling
point, low viscosity, so you can use them for two-phase liquid. Those are PFAS chemicals.
With any luck, those will be banned globally very soon.
You've got to be a little careful about painting everything with a broad brush,
but obviously forever chemicals in particular need to go away.
And we all, I think, acknowledge that, but we're also very impatient, very high performance.
So on the one hand, AI in the big training clusters and this huge investment that we're on with billions of dollars per year from the big guys will push the envelope of liquid cooling, which is way more efficient than any kind of air cooling.
But on the other hand, it gives us the risk that we adopt something we're going to have to replace very quickly in terms of what the liquids are.
So there's a lot of tension in the system here.
The bigger players in the ecosystem, you know, household names that have gigawatts of data centers based globally, big cloud companies, they're already, like Alex said, pushing the
envelope on efficiency and on sustainable energy.
In many cases, they've become energy vendors themselves.
There wasn't enough sustainable energy where they operate for them to hit their sustainability
goals.
So they went and built a farm that's bigger than they need. And they take
all of their energy from that to be renewable and then sell the excess to the grid. The sustainability
push has actually caused some very good behavior. Most of these companies are trying to be
good global citizens, but they are driven by a board of directors and the bottom line.
Now, Alex, you know, I know that you in your day job are all about power delivery in the data center.
And one of the things that I see a tremendous amount of opportunity for is the power and cooling technologies that are going into these very specialized buildings.
What do you see are the trends in that space?
And what do you think the industry can do to help with some of those new challenges?
Despite my day job employer, I do think that Eric is better suited to answer this question.
So I'm curious as to what Eric would say first.
So I think we're going to see, you know, just like we saw the big guys, they figured out
that a traditional 12 or 15 kilowatt rack wasn't going to cut it.
In fact, the leading press-gathering system right now, the NVL72 rack from NVIDIA, the building block node is in the 11 kilowatt range.
And you want eight of those in a rack.
Plus a whole bunch of specialty switching equipment.
So these are 100 kilowatt racks already and heading up from there.
Now, like I said, I think that has the potential to greatly improve how much power delivery and cooling overhead there is.
The fraction of energy that
is overhead in that rack is going to get smaller. And there are going to be fewer physical racks.
You can imagine a 100 kilowatt rack is going to displace seven 15 kilowatt racks. So the physical
space, the brick and mortar and steel and concrete, can get much smaller. You can have a lower
footprint data center in terms of total material. You can have a lower footprint data center in terms of total material. You can have a lower footprint data center in terms of total losses and energy delivery and conditioning
equipment. And then liquid cooling will drive the PUE well below 1.1. So the overhead for non-IT
power should come way down. And what we've seen from the big guys is once they invest in this
highly customized rack, they just use it for everything. So it's happened twice now.
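The rack arithmetic in this answer (one 100 kilowatt rack displacing seven 15 kilowatt racks, and PUE heading below 1.1) can be checked with a short sketch. The helper names are hypothetical, and the 1.5 PUE used for comparison is an assumed baseline, not a figure from the conversation.

```python
import math


def racks_needed(total_it_kw: float, rack_kw: float) -> int:
    """Number of racks of a given capacity needed to host an IT load."""
    return math.ceil(total_it_kw / rack_kw)


def non_it_overhead_kw(it_kw: float, pue: float) -> float:
    """Facility power spent on everything except IT, for a given PUE."""
    return it_kw * (pue - 1.0)


it_load_kw = 100.0
print(racks_needed(it_load_kw, 15.0))   # 15 kW racks needed for the same load
print(racks_needed(it_load_kw, 100.0))  # 100 kW racks needed

# Overhead at the PUE mentioned in the conversation versus an assumed 1.5 baseline.
print(f"{non_it_overhead_kw(it_load_kw, 1.1):.1f} kW at PUE 1.1")
print(f"{non_it_overhead_kw(it_load_kw, 1.5):.1f} kW at PUE 1.5 (assumed baseline)")
```

The count rounds up because a partially filled rack still occupies floor space, which is where the building-material savings come from.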
Interesting.
They had to go build a full custom
rack to do the AI thing they wanted to do. And once they had it custom made, well, we went to
all the trouble to build it. Let's just use it everywhere. And so I think it actually can
catalyze very rapid progress in terms of modernization of equipment. But I do still
hold that reservation that the AI equipment itself needs to find its second life somewhere.
Now, Alex, you started this with a conversation around full embodied carbon and circularity.
We've discussed in the past modular configurations and other new approaches to sustainability.
Has OCP advanced consideration of design of form factors and other things that will help facilitate this adoption?
We have a data center facilities project at OCP,
which is focused on this,
which honestly would be probably better suited to answer
than I would.
I don't know, Eric, if you have ideas
about overall form factor beyond modularity
in terms of what we can do for sustainability.
I'd point to two things, I think.
The open rack is certainly gaining broad traction
and it's got a couple things going for it, right? Instead of
silver box power supplies, redundant in every rack-mounted device, it has DC power delivery,
and on ORv3 that's 48 volts. It requires less copper, you know, lower current, so lower
conduction losses in delivering the power, less material to deliver the power. And the power
supply is actually historically pretty fragile. So probably better reliability, but the jury's out.
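The point about 48 volt delivery and conduction losses follows from the resistive loss formula, loss = I² × R, with current I = P / V. The sketch below compares a 48 volt bus against a 12 volt bus; the 12 volt baseline, the 15 kilowatt load, and the 1 milliohm path resistance are all assumptions chosen for illustration.

```python
def conduction_loss_w(power_w: float, bus_voltage_v: float,
                      resistance_ohm: float) -> float:
    """Resistive loss in the delivery path: I^2 * R, where I = P / V."""
    current_a = power_w / bus_voltage_v
    return current_a ** 2 * resistance_ohm


load_w = 15_000.0   # assumed rack load
path_ohm = 0.001    # assumed resistance of the delivery path

loss_12v = conduction_loss_w(load_w, 12.0, path_ohm)
loss_48v = conduction_loss_w(load_w, 48.0, path_ohm)
print(f"12 V bus: {loss_12v:.1f} W lost")
print(f"48 V bus: {loss_48v:.1f} W lost")
print(f"Ratio: {loss_12v / loss_48v:.0f}x")
```

For the same conductor, four times the voltage means one quarter of the current and one sixteenth of the loss. In practice a designer can trade some of that margin for thinner conductors, which is the "less copper" part of the argument.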
The open rack is one thing that's modular and changing the industry pretty
broadly, gaining a lot of traction.
The other thing I would point to, which is probably lesser known, is Alex
mentioned the server project, the DCSCM, the server compute module.
That form factor is actually being adopted by some of this AI hardware,
because it turns out what we had in the specs, the OAM, the accelerator module, isn't big enough for some of these new AI devices.
They needed something the next size up, the level of integration they wanted.
And the DCSCM form factor has been taking off as the big bad accelerator for AI.
And that's good, right?
Because that means that as we push for
a second life and other usage for those things, when they can't be the flagship AI training
course anymore, there should be a place to plug them in somewhere else, which is something that
wouldn't happen with the full custom design. Traditionally in HPC, if you had a full custom
HPC element that only works in this cluster, once this cluster is done with it, it's got nowhere
else to go. Yeah, that makes a lot of sense.
Now, as we look forward,
obviously the entire industry is thinking
about how to make data centers more sustainable
as greenfield development continues to grow,
as compute density continues to grow.
There's so many things that we need to do.
What other areas do you see
as getting into the forefront of thinking within the OCP project?
And is there anything else that you would like to mention that our listeners should be aware of?
We have a number of subprojects under the OCP sustainability project.
We have projects on sustainability metrics, which is important to Eric's earlier point in terms of PUE having
perhaps reached its full usefulness and needing to move on to new sustainability metrics.
We have projects on power telemetry inside the data center, how we're gathering the data we
need to understand where power is being used and where there may be opportunities for efficiency.
But one that pops out to me in terms of how you framed your question is circularity. So we have
a carbon accounting for circularity workstream. It's dedicated to figuring out once you adopt a circularity practice within
the data center, how to account for the decrease in embodied carbon in particular, operational
emissions, depending on what the circularity intervention is, so that we can take credit for
those in the right way without double counting. But I think circularity is an important broad topic for the industry as we contemplate this incredible growth,
this incredible build out associated with AI demands. Because the more we can reuse equipment
rather than build from scratch, that has the biggest overall effect on blunting that embodied
carbon impact and blunting all of the other associated environmental impacts that come along with building new equipment, extracting new materials. So we're looking as
an industry for ways to, of course, take back and recycle and reuse equipment at end of life,
but also prolong the life of equipment, doing things like managing spares inside the data center,
modernizing equipment to become more digital and connected so that it can be serviced based on need
rather than based on schedule, reducing truck rolls, all of that. So the more we can think about
extending the useful life of a piece of equipment and extending the life of the materials inside
that equipment, we will be both saving costs and saving carbon throughout the life cycle.
That's awesome. Now we know that OCP Summit is coming up and this is a big moment for this
project. What should we expect at the conference for sustainability? And is there anything that
you would suggest our listeners do to prepare for the conference? We'll get updates on all of
those sub-projects that I ran through quite quickly just now. So you'll get to hear what
the output of those projects was, what we've developed in terms
of our thinking in terms of sustainability metrics, power telemetry, and the rest. On the
OCP iMasons collaboration that I referenced earlier, we're going to use the summit in the
fall as an opportunity to reveal the standardized carbon disclosure questionnaire that we've been
developing and circulating for feedback. So that'll be an unveiling for the big output from that project and commitments for those
involved to start using that disclosure, both as a data center operator from the perspective
of an RFP and from the supplier committing to starting those embodied carbon measurements so
that we can build that database of embodied carbon information.
And then the third thing is that we'll be able to report on how sustainability itself
is evolving in OCP. So from the perspective of the governance of the foundation, what we're doing to make sure that we are using those criteria I was describing earlier, building on those criteria, making sure that every project that we undertake within OCP and the innovations that we put the OCP stamp on are advancing sustainability at pace, given
the great need that we have in our growing industry.
Eric, anything to add?
I think the other thing I would add is that one thing the Global
Summit is always great at is demos.
There'll be a lot of partners with technology and booths on your AI topic, on a liquid cooling
topic.
In particular, on a lot of the maybe boring or geekish plumbing
sort of chores behind these initiatives,
we've made very good progress.
So the manageability project, we'll be talking about profiles to track and
expose all this information, assuming we can get the ecosystem to make that
information exist in the first place.
One thing I'd come back to, right, is that kind of vision we talked about at
the outset: in order to be able to see the footprint of everything you run on digital infrastructure,
you need a way wherever you run.
If you own your own equipment or you log into the public cloud,
you need to be able to get an effective inventory of what resources are allocated to you
and the footprint associated with those resources.
And then you need to know how much energy for how long you consumed
in doing what it is you're doing, and then see the carbon intensity of that.
And we'll be talking more about that at the summit.
The idea that we can get these databases that Alex is talking about
and access to them as a standard profile thing.
You can imagine, in reality and in hardware, in order to have such information, you need
an API or an interface that you call to get
this information and the data schema so you can understand the information that's returned.
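As a rough illustration of the steps just described (an inventory of allocated resources, the energy consumed, the carbon intensity of that energy, and the embodied footprint of the equipment), here is a minimal sketch. The `ResourceUsage` record and its field names are hypothetical and do not represent any OCP profile or schema, and spreading embodied carbon evenly over an assumed service life is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResourceUsage:
    """Hypothetical record an inventory interface might return for one resource."""
    resource_id: str
    energy_kwh: float                 # energy consumed while running the workload
    grid_intensity_g_per_kwh: float   # carbon intensity of the electricity used
    embodied_kg: float                # embodied carbon of the whole device
    service_life_hours: float         # assumed total service life of the device
    hours_allocated: float            # hours the device was allocated to you


def operational_kg(u: ResourceUsage) -> float:
    """Operational emissions: energy times grid carbon intensity, in kg."""
    return u.energy_kwh * u.grid_intensity_g_per_kwh / 1000.0


def embodied_share_kg(u: ResourceUsage) -> float:
    """Embodied carbon attributed to this use, spread evenly over service life."""
    return u.embodied_kg * u.hours_allocated / u.service_life_hours


def footprint_kg(usages: Iterable[ResourceUsage]) -> float:
    """Total footprint across every resource allocated to the workload."""
    return sum(operational_kg(u) + embodied_share_kg(u) for u in usages)


# Made-up values for a single server, for illustration only.
server = ResourceUsage("server-1", 50.0, 400.0, 1500.0, 40000.0, 100.0)
print(f"{footprint_kg([server]):.2f} kg CO2e")
```

Under this simple amortization, a shorter service life raises the embodied share charged to each hour of use, which ties back to the earlier concern about AI systems becoming obsolete quickly.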
And those profiles, I think we've made very good progress on. I think there's still a long way to
go. The Sustainability Project has got lots of players working in lots of different directions. And
like Alex said, we'll give an update on all that stuff. A lot going on. And as you've observed,
a lot of pressure to do better as soon as we can.
Awesome.
Well, thanks, guys,
for being on the show today.
I just have one final thing for you.
Where can folks engage with the project
as well as engage with you?
Project itself,
I think if you go to ocp.org,
all the projects are listed there.
If you click through to sustainability,
embarrassingly enough,
you'll see Alex and I right there.
We're the leads of the project, and it's got contact information. Obviously anyone can join. If you
want to contribute or start downloading things, you'll have to actually join somehow to attend
meetings and consume collateral. The Open Compute Project is open, right? All the stuff we work
on is open. Thank you so much for being on the show today. It was a real pleasure.
Thanks, Allyson.
Thanks for joining The Tech Arena.
Subscribe and engage at our website,
thetecharena.net.
All content is copyright by The Tech Arena.