In The Arena by TechArena - A Sustainable Data Center Future with Meta and OCP Lead Dharmesh Jani
Episode Date: October 17, 2023
TechArena host Allyson Klein chats with Meta and OCP lead Dharmesh Jani about his vision for sustainability innovation for the data center and how the OCP sustainability initiative came into being....
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein. I'm coming to you from the OCP
Summit in San Jose, and I'm joined by Dharmesh Jani, also known as DJ. He's the lead of ecosystem
and partnerships for infrastructure at Meta, chairs the OCP sustainability initiative, and co-chairs
the OCP Incubation Committee.
I'm so delighted to have you with me, DJ. Welcome to the program.
Thank you, Allyson, for having me, and I'm delighted to join you in this conversation today.
Last year was a huge year for the OCP Summit, and you were a big part of that.
You came on stage and stated,
"There is no army mighty enough to stop the march of an
idea whose time has come." And you proceeded to roll out OCP's tenet on sustainability, which really
shook the industry. Today we're kicking off a series of discussions on OCP's Sustainability
Initiative starting with this interview. And I wanted to take us back to the beginning because you were there from the ground up.
How did the focus on sustainability start in OCP?
It is an interesting journey for sure.
I joined Meta and one of my first responsibilities was to join OCP Incubation Committee.
And at that time, OCP Incubation Committee was playing an active role
in determining contributions, deciding what, you know, specifications will be approved
or rejected, etc. And when I joined, there were around 13 technologists on that
Incubation Committee, with a wealth of industry experience
represented by that set of people. I actually calculated that it was over 100 years of experience
in that combined group. And all we did was decide on the disposition of specifications,
whether they should be approved or not. I felt, being a very strong technologist myself,
that I wanted to do more than just sit and decide on specs.
So one of the first things I did was to change the role of the Incubation Committee.
I wanted it to play a role in determining the future technology direction within OCP,
rather than just being a gatekeeper for whatever companies were bringing in.
I wanted it to play a strategic role.
So with that, I transformed the Incubation Committee's role.
I presented to the board, and I brought in strategic roadmap and technology direction
as part of the role which the Incubation Committee should play.
That got approved.
And then the next step was, so exactly how do we go about doing this?
The first year, we solicited a set of ideas from the group, from the industry, and it led to four initiatives, which included
modular systems and a few other ideas. And it was still something which we were working through to
figure out how to do this. 2019-2020 became a very interesting year. In 2020, we actually went
through this process in a much more systematic manner.
We hired an outside agency to talk more broadly about where the industry was going.
And at the same time, I was settling in at Meta as my tenure grew. I started having more
cross-functional conversations within Meta and I bumped into a group which was trying to drive sustainability
work and they needed some advice from me and I was working with them.
And that's when I had an epiphany that we are doing all this work on sustainability.
I'm part of the Open Compute Project, and there is not a sound or word of sustainability within
Open Compute.
I found that disconnect, that silence,
actually very jarring in my ear.
And one of the first things I did in 2020
was to propose sustainability as a strategic initiative.
In 2020, on December 3rd,
I presented that sustainability should be accepted as one of the strategic initiatives. I championed it within the Incubation Committee and with the OCP board as
well. That was, I calculated, precisely 1,047 days ago, roughly two years, 10 months, and two weeks. People
listened to me, and there were other ideas which they started discussing. There were 14 ideas
presented that year, and we couldn't pursue all of them; the plan was to pursue the top three. So I actively championed it with every member,
explaining why sustainability is important, including to the Meta representatives in the IC, who themselves were
pushing for other areas. I told them that sustainability is very important because in
coming years we are going to put much more emphasis on it, and data center gear accounts for some
of the largest carbon footprint. To my pleasant surprise, after all the championing I had done
with various members, I got massive support from most of them, and sustainability became the number one voted initiative for the 2021 strategic group, which was fantastic.
It was a first step.
And that's how our journey started.
When you talk about this, I can just sense from you what an imperative you felt this was.
And why do you think it's such an imperative that the entire industry focus on sustainable compute and that the entire value chain work together? And when you think about that,
please put some context on where energy consumption trends are today and why this
was something of so much focus for you.
That's a great point. Sustainability is a very complicated subject.
And hence, it requires everyone to communicate in a manner which is easily understandable.
And that's a tall order.
So I always grapple with that as well.
And I use energy as a very simple proxy for sustainability.
There are actually two parameters you can use as a proxy for sustainability.
One is the capital expenditure dollars spent on the hardware that goes into the data center,
or the amount of energy which is consumed by the data center.
In fact, these two, I guarantee you, are correlated.
If your energy consumption is going up,
the dollar amount which is being spent is going up and vice versa.
I decided to use energy as a proxy for a very simple reason.
Actually, two reasons.
One, data on data center energy is publicly available, and predictions have been made.
Number two, this data is published by a very neutral research party, not a commercial party,
which means that you can get people to rally around that data.
So when you look at that data, data center energy usage is growing at a massive rate.
Just to give you an idea,
the Open Compute Project was launched in 2011.
In 2011, data center energy consumption
was roughly 1% to 1.5% of the total energy
consumed in the world.
By 2030, data centers are expected to consume
anywhere between 3% and 13% of the total energy consumed in the world,
which is massive growth.
In fact, the numbers are expected to be anywhere between
3,000 terawatt hours and 7,000 terawatt hours.
So this growth in energy has a direct implication for the carbon
which is going to be associated with this energy.
And for me, being in Open Compute, I felt Open Compute as a community
had a moral obligation to address this
because we were contributing to that energy consumption.
We were building devices which go into data center
which contributes to all of that.
If not us, who would be the community to address this, working on
the data center gear? I don't see any other community doing that. It's just that
immediate problems always seem the biggest. It's something which I call the fish lens effect.
Whatever is in front of us is always going to
be the biggest. Sustainability is never in front of anybody because it's something five years out,
10 years out, 15 years out. I have something immediate right now, which is what is important
to me. You need to get people to raise their heads up beyond the fish lens and look further and see that there's
a freight train coming down the track, and you've got to do something about it.
And that's why I felt it was important for us to work on it.
OCP is known as the organization where the largest data center operators on the planet
congregate and put their collective voices
out there. And those companies also have incredible climate commitments, both from a scope two and
scope three perspective of looking to be carbon neutral and looking in some cases to be carbon
negative. When you started the sustainability effort, you rolled out some BHAGs, or Big Hairy Audacious Goals, that I wanted to discuss with you, associated with those scope two and scope three emissions.
The first is embodied carbon in silicon, and you started there. Why is that so important?
So if you look at the gear which goes into our data center,
silicon is actually the most important component of everything which goes into the data center.
I gave you an earlier example of looking at capital expense spent
as a proxy for identifying carbon.
Actually, you can use the same lens and you'll figure out that when you are putting gear
inside the data center,
the most dollars are spent on silicon,
which is used for building CPUs, GPUs,
memory, and network devices.
These four would, I reckon, account for up to 85% of the capital spent on any
system which goes into the data center. You want to go and address the biggest bottleneck, and this
is where the biggest bottleneck is. The amount of effort required is actually pretty much the same
wherever you put it in, but you want to look for ways
to increase the return on those efforts.
In my opinion, addressing silicon as a place for embodied carbon
has a maximum return for the same effort,
which is why I put this as a number one goal.
And it is interesting that this is now actually globally recognized. IMEC, the Interuniversity
Microelectronics Centre in Belgium, is actively working on how to calculate embodied carbon in silicon at various fabrication
nodes. So this is now becoming universal, and I believe having a common method to calculate this
is going to help all the data center operators, because we all use silicon, whether it is Meta, Microsoft, Google, Amazon,
or anybody else. Every one of them has CPUs, GPUs, memory, and network devices, which are all based on
silicon, and they all have exactly the same problem. So let's solve it together.
When you think of the challenge of embodied carbon in silicon, you think about the complex, energy-intensive manufacturing process that is silicon fabrication.
What do you think are the biggest opportunities for the silicon industry to drop embodied carbon?
Is it the use of renewable energy?
Is it changes in chemistry? What do you see as
the things that are the biggest opportunities for advancement? And then how does OCP contribute to
that?
I would not claim expertise in all of those areas, but I do have ideas about what the industry could do to enhance this effort.
I believe that the first step we have to take is just to find a way to measure, without making
any change.
You don't need to make any change yet, because if you don't know a proper way to measure, how are you going to
determine what to change? I don't know if there are clear, defined methods which can measure
embodied carbon in silicon for a given node, for example. It's actually not easy to come up with a model.
The first thing the industry needs to do is come up with a model.
It could be a black box model with APIs or interfaces.
I put in certain parameters, my clean room is this big, I'm consuming this much, and the model spits out a few parameters in terms of my efficiency: what my carbon effectiveness is, what my water effectiveness is, and so on and so forth.
We don't have a universal method right now.
In fact, a lot of these things are not shared across the industry.
And I can understand why,
because there are business reasons.
But what if everybody starts to think of this
from the perspective of
what could be done
rather than what cannot be done?
Yes, we understand.
You can go through a litany of cases of,
oh, I cannot do this,
I cannot do that.
That's not
what we should focus on. The focus should be: if we had to create a model like this, what could be done?
Can you come up with some answers for that, please? This is where we need to encourage the industry to
lean in with, let me tell you what can be done. You don't understand our business,
but I understand the problem we are trying to solve.
I think this is how we should try to look at it.
I believe that discussion is still very nascent,
because the discussion is still mostly in the first column of,
no, I can't do this, I can't do that.
I always operate with the focus of,
I don't want to talk about what
cannot be done; I'm only interested in what can be done. Out of a hundred things, if there are three
things which can be done, okay, let's just focus on those three things first. That's progress,
and later on we can figure out more. So for me, measuring in a reasonable and consistent manner across the various fabs in the world would be a great start.
Even if everything is inaccurate, as long as it's inaccurate at the same level, it's a great start.
Then you can ratchet it down, improve the accuracy, and get the numbers right.
That would be the next step.
So once you understand and measure, the next step will be, OK, now I need to set a goal
and figure out how best to achieve it.
Maybe for you, it is to change your chemistry.
Maybe for somebody else, it is to change the renewable energy mix going into the fab as step one.
All of those would be valid.
And maybe for somebody else, it is just buying carbon credits, saying, I'm going to take a stopgap measure and compensate for now because I don't have time, and then I'm going to work on something else. So those would be some of the ways in which we
should progress, even when there are challenges ahead of us. These are not very difficult things
to do, and I do know, for instance, that IMEC has a model. I know that some researchers from MIT Lincoln Lab have published models. So I think there are ways to converge on the first step,
which is let's just figure out a mechanism to measure.
We are not commenting on what the number is.
We are not making any judgment call on the number.
We are just figuring out that your method of measurement
and my method of measurement is the same and consistent.
And then I come up with my number, you come up with your number, and then we go to the
next step of improving the accuracy of that number.
Then the next step: okay, how do I start working on reducing that number?
So those are the three steps I would work on.
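The three steps DJ describes (agree on a consistent way to measure, improve its accuracy, then reduce) can be illustrated with a small Python sketch. It is hypothetical throughout: the `FabInputs` fields, the per-die formula, and all sample numbers are assumptions made up for illustration, not the IMEC or MIT Lincoln Lab models mentioned above. The point it shows is that when every fab uses the same formula, the results are comparable even if the absolute values are rough.

```python
from dataclasses import dataclass


@dataclass
class FabInputs:
    """Hypothetical per-wafer figures a fab might report; names and units are assumptions."""
    energy_kwh_per_wafer: float      # electricity used to process one wafer
    grid_kgco2e_per_kwh: float       # carbon intensity of the fab's electricity mix
    process_kgco2e_per_wafer: float  # direct process emissions per wafer, assumed already measured
    good_dies_per_wafer: int         # yield-adjusted dies per wafer


def embodied_kgco2e_per_die(f: FabInputs) -> float:
    """One shared (hypothetical) formula, so every fab computes its number the same way."""
    per_wafer = f.energy_kwh_per_wafer * f.grid_kgco2e_per_kwh + f.process_kgco2e_per_wafer
    return per_wafer / f.good_dies_per_wafer


# Step 1: two fabs apply the same method. The absolute values may be inaccurate,
# but they are comparable because the method is consistent.
fab_a = FabInputs(energy_kwh_per_wafer=1500.0, grid_kgco2e_per_kwh=0.5,
                  process_kgco2e_per_wafer=300.0, good_dies_per_wafer=500)
# Step 3: the same process supplied by a cleaner electricity mix.
fab_b = FabInputs(energy_kwh_per_wafer=1500.0, grid_kgco2e_per_kwh=0.1,
                  process_kgco2e_per_wafer=300.0, good_dies_per_wafer=500)

print(round(embodied_kgco2e_per_die(fab_a), 2))  # 2.1
print(round(embodied_kgco2e_per_die(fab_b), 2))  # 0.9
```

Step 2, improving accuracy, would amount to refining the inputs (or adding terms to the formula) while every fab keeps using the same shared method.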
I am fascinated to see how this progresses in the silicon landscape.
After spending 20 years there, I can't wait to see how the industry responds.
The next BHAG was circularity.
How do we approach this in the data center?
And how is that weighed against keeping data centers competitive?
Circularity is a very different problem compared to embodied carbon in silicon.
Embodied carbon in silicon is a technical problem more than anything else. Circularity,
in my opinion, is less of a technical problem and more of a business problem. Let me elaborate on that. Imagine you have
a product which a company is selling to you, and its revenue depends on selling more of that product
to you and everybody else like you. Now imagine you decide not to buy that product at the same frequency as before, but you delay your purchases.
What does that do to the revenue of the company which is selling that product to you?
Their revenue would go down.
They do not want you to slow down on your purchases.
They want you to accelerate your purchases. This is one reason why every Thanksgiving, on Black Friday,
you have so many gadgets on the market pushing for shelf space and attention, because
everybody wants to buy that new phone which is coming to market, everybody wants to buy that new TV, everybody wants to buy that new gadget, Xbox, whatever it may be.
It's the same in case of data center. People want to have the latest technology which goes
into the data center. However, over the last many years, Moore's law has slowed down in terms of
how fast the technology advances, which has extended the lifecycle of
servers. What it means is that you can get pretty much similar performance for a longer period of
time from the same devices than you could 10, 15, 20 years ago. So this opens up the possibility of reuse.
And that reuse requires a secondary supply chain to be established.
How do you do all of that so that everybody's business interests are aligned?
That is important.
And when it comes to carbon footprint, when the equipment is used for a longer and longer period of time,
its amortized carbon footprint only goes down. Which is why if you have a car today that was,
let's say, built in the 1970s or 80s, it is greener than a brand-new electric vehicle which
you would buy to replace that car. So that's pretty much
what is going on here. How do you solve it as a business? That is non-trivial, and it requires everybody
to come together, recognize each other's business interests, and give and take at the
boundaries where business is conducted, so that nobody profits disproportionately
from circularity or is disproportionately affected in an adverse way by circularity.
And that is an unsolved problem today.
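The amortization point above is simple arithmetic: embodied (manufacturing) carbon is a one-time cost, so keeping hardware in service longer lowers its per-year share. The Python sketch below is illustrative only; the 1,000 kgCO2e embodied figure and the service lifetimes are made-up assumptions. It covers just the amortization argument and does not model operational emissions, so on its own it says nothing about the car comparison.

```python
def amortized_embodied_per_year(embodied_kgco2e: float, service_years: float) -> float:
    """Spread a one-time embodied (manufacturing) carbon cost evenly over the service life."""
    if service_years <= 0:
        raise ValueError("service_years must be positive")
    return embodied_kgco2e / service_years


EMBODIED_KGCO2E = 1000.0  # hypothetical embodied carbon of one server

for years in (3, 4, 6):
    # Longer service life -> smaller yearly share of the same one-time cost.
    print(years, round(amortized_embodied_per_year(EMBODIED_KGCO2E, years), 1))
```

With these assumed inputs it prints 333.3, 250.0, and 166.7 kgCO2e per year for three, four, and six years of service.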
When you look at circularity and you think about those secondary markets, what is the
role of OCP in establishing best practices and establishing standards
so that this business problem becomes a shift in the way the market operates?
I am very optimistic and excited that OCP actually has a very big role to play in establishing a circularity
market. Why is that? It's because the OCP brand is huge in the data center. And if you have gear
which is OCP certified, it comes with a certain amount of guarantee in terms of what the equipment looks like, how it is designed,
what kind of specifications it has, and who has used it in the past. For instance, OCP has an OCP Ready
program. What if we can establish a mechanism where decommissioned equipment from a hyperscaler,
which is OCP-compliant or OCP-certified gear,
goes through a channel where OCP has a role in ensuring that the repair, testing, etc. of that gear
has been done to a certain standard?
And it is all being done by companies which are part of OCP.
So this is all within the OCP value chain.
You don't even have to go outside it.
And OCP also has a marketplace where you can go and buy OCP gear.
Imagine if you were able to buy refurbished OCP gear at much lower cost. I'm
sure the demand for that would be massive, and it would play a role in extending the life of many
of these devices. So I do believe that OCP has a role to play. OCP has not been super active yet in defining what that role
would be, but that is something I'm sure they are working on or thinking about. Like I said, it's
non-trivial, and they need to find a way to get all the right companies in a room, discuss it, and start defining what such a program could look like,
one which is acceptable to everybody in the value chain.
And the final question that I've got for you
about your BHAGs is actually about metrics as well.
Your final BHAG was going beyond PUE,
which is a metric near and dear to my heart
after spending years at The Green Grid.
What does that mean to you going beyond PUE?
And where is OCP focused in terms of additional metrics?
So many years ago, I would say roughly right when I joined Meta, I had the opportunity, as
part of boot camp, to go to one of the Meta data centers, in Prineville, Oregon.
I spent a week there going through the data center,
changing parts and operating as a technician
within the data center.
And every time we would stand in front of the data hall,
there's a lock and key and some sophisticated way to enter the data hall.
I started imagining: think of a case where you have a big display at
the entrance of every data hall in the data center, showing in real time how much carbon is being generated in
that data hall. How cool would that be? And it tells you cumulatively how much has been generated in
the last month and in the last year. I don't know how to do that, but that measure is the real measure. Now I can compare one data hall with
another data hall. And in fact, I even started imagining in this dream world that you could
have a fun competition, even within a company like Meta, where we have hundreds of data halls
across 20 or so data centers all over the world.
And you could start ranking data halls and awarding one for being the most green
because its carbon number is the best.
And then you can make these data halls compete with each other in a fun way.
And then you can have an annual conference where all of these data hall operators come together
and exchange their best practices, saying, I'll help you make your data hall better. And that is how I started
thinking about going beyond PUE. I asked people around, and everybody at Meta was fascinated.
They said that would be pretty cool if we had something like that, but we don't know how to go about doing it. And I said, okay,
so that's something which we should get the industry to solve for. Maybe start with one aisle. Forget
even a big data hall; take one row which has, let's say, 20 racks, and maybe it's a colo, or maybe it's a small, tiny room.
Can we determine, based on the airflow, the energy consumed,
the mix of energy, the gear which is being used, its amortization,
and everything, the amount of workload being run
and how efficient these servers are?
Can you come up with a hypothetical number like this?
And if you can't do it for a row, can you do it for a rack?
Just one rack, and do the calculation on that.
I think that is where I'm imagining
we need to go beyond PUE.
And we are working within the Open Compute Project on this exact problem, where we are starting to look at things which are not just about efficiency, but which also take into account the type of energy being used and the utilization of the servers.
None of that is included in PUE. PUE is awesome, but as I mentioned in my
talk, efficiency in my opinion is table stakes, because we are always efficient or trying to be
efficient because we want to save money. That goes without saying. Every time I would keep my
lights on in my house in India,
my dad would magically appear from nowhere, tap me on my head, and say,
dude, turn the lights off.
He was not trying to increase or lower the PUE of the household.
He just wanted to save money.
And rightfully so.
And guess what?
I do the same with my kids now.
Yeah, we need to start thinking more broadly.
For instance, in the same context, yes, you could save electricity,
but are you extending the lifecycle of the servers?
Are you repairing your parts and reusing them?
If I have a rack which is six years old, with everything in it
running for a longer period of time or being repaired and reused,
that's probably a greener rack
than a fresh, shiny rack
with all new gear
in the same data hall.
That will never be addressed by PUE.
The age of a rack is never addressed by PUE.
So how do you do that?
Those are the ways I'm thinking we need to go beyond PUE.
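For reference, The Green Grid defines PUE as total facility energy divided by the energy delivered to IT equipment, so every rack in a hall shares the same PUE regardless of its energy source, utilization, or age. The Python sketch below puts PUE next to a hypothetical per-rack carbon figure in the spirit of what DJ describes. The metric, the `RackYear` fields, and every number are assumptions made up for illustration; this is not an OCP-defined metric, and it leaves out airflow and other factors he mentions.

```python
from dataclasses import dataclass, replace


def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh


@dataclass(frozen=True)
class RackYear:
    """Hypothetical one-year snapshot of a single rack; all fields are illustrative."""
    it_kwh: float               # energy drawn by the rack's IT gear over the year
    hall_pue: float             # PUE of the data hall the rack sits in
    grid_kgco2e_per_kwh: float  # carbon intensity of the energy mix supplying the hall
    embodied_kgco2e: float      # embodied carbon of the rack's gear
    service_years: float        # how long the gear is kept in service
    utilization: float          # average fraction of capacity doing useful work, in (0, 1]


def kgco2e_per_utilized_rack_year(r: RackYear) -> float:
    """Hypothetical beyond-PUE metric: operational carbon (facility overhead included)
    plus amortized embodied carbon, divided by utilization so idle capacity counts against the rack."""
    operational = r.it_kwh * r.hall_pue * r.grid_kgco2e_per_kwh
    embodied = r.embodied_kgco2e / r.service_years
    return (operational + embodied) / r.utilization


HALL_PUE = pue(total_facility_kwh=110_000, it_equipment_kwh=100_000)  # 1.1 for every variant

baseline = RackYear(it_kwh=100_000, hall_pue=HALL_PUE, grid_kgco2e_per_kwh=0.4,
                    embodied_kgco2e=30_000, service_years=3, utilization=0.6)
cleaner_mix = replace(baseline, grid_kgco2e_per_kwh=0.05)  # same PUE, greener energy
longer_life = replace(baseline, service_years=6)           # same PUE, gear kept twice as long

for name, rack in [("baseline", baseline), ("cleaner mix", cleaner_mix), ("longer life", longer_life)]:
    print(f"{name}: PUE {rack.hall_pue:.2f}, {kgco2e_per_utilized_rack_year(rack):,.0f} kgCO2e")
```

With these made-up inputs, all three variants report the same PUE of 1.10, while the carbon figure drops for the cleaner energy mix and for the longer service life. That is the kind of difference PUE alone cannot show.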
Nice.
I'm moving forward with more interviews going further into all these topics with you and your initiative team members.
And I wanted to talk to you for a second because you've assembled a dream team from across the value chain at OCP.
I know building open community is a passion for you.
Why is it particularly important in this moment?
I would think of a couple of reasons.
I've reached a point in my career,
I've worked in startups,
I've worked in big companies,
I've driven businesses,
which is all very exciting, very gratifying.
But at this point, I find that having a way to inspire people to scale, solve, and address problems which are bigger than you is very deeply moving and
internally rewarding. It's pure selfishness on my part, in some sense, to say this: it gives me a lot of joy to do this, so I do
it. No other reason. Getting people within your company to align to a business goal is the
easy part. Why? Because you're part of one company, and it's easy to say, hey, we're all rowing
the boat in the same direction, let's do that. Doing this across companies is a whole new challenge,
and you cannot use the same skills which you use to operate
within a company to bring people across companies together. It's a completely new set of skills,
and that's why you will find that the people who gravitate toward
these kinds of communities are in some sense wired differently. They think differently than
individuals who are pushing their companies' agendas. Actually, if you try to push your own
agenda in these open communities, it's a non-starter. You've got to
really look for everybody to win. It gives me a lot of joy, Allyson. I cannot tell you how it feels when people
are succeeding left and right around me based on something I said. They are doing all the
work; I just said a few things and set them on the course, and they
are all flying and doing these amazing things. How cool is that? Your ability to take your
ideas and scale them to a level which is beyond your own capabilities is nothing but gratifying. I never started a company. I did try
to start a company many years ago, and I failed miserably. It's somewhat similar. You start
something. Are you able to inspire people to work with you with nothing to give back in return? I'm
not providing any monetary reward or anything. It's just an opportunity to do more work.
In grad school, there was a poster which has always inspired me. It was a poster with
a quote from my childhood hero, Jonas Salk, who invented the polio vaccine. He gave away the polio vaccine for free to the whole world, which is why
every country could afford to inoculate their young ones against polio almost free of cost,
because of the generosity of Jonas Salk. Today, if I don't have polio, it's in great part because of
him and his vaccine. And this quote from him said, the reward for doing good work is more work.
That statement has stayed with me forever. Today, if I get more work from any
corner, I consider that a reward: somebody recognized that I'm doing a good
job, so they're giving me more work. And that gives me a lot of joy because I always frame it with this statement from a hero of mine.
And I say, OK, you know what?
This is how we should frame it.
There should be no other reward.
I don't need any other reward.
Just give me more work.
Tell me what else should I do.
If I'm able to bring coffee for somebody who is tired and thinking of great things that I cannot think of, I'm happy to do that, because
you are helping somebody to scale. At the end of the day, we are always pursuing business all
our lives. At some point we have to step back and ask, what else can I do? And I think doing these kinds of activities is satisfying in a manner
which I do not experience in any other activity. So I keep working in this
community, I keep working in this manner, and I come across super sharp people.
I learn from every one of them.
And all of that, I feel I'm having a dream job.
So I always feel I'm blessed every moment
when people rally around what I have to say
because they have a choice.
They don't need to be where I'm asking them to be.
They have thousands of choices and they choose to do this.
That's, in some sense, an honor for me.
I feel very thankful for every one of these individual volunteers who push, work, and think. So yeah, it's deeply gratifying in a way which nothing else
is, at least for me.
DJ, data centers are strategically important to the planet,
and their sustainable future is critical to all of us. So keep doing the good work. I think that
your more work can be rewarding to all of us.
Thanks so much for spending time with me today.
I know it's OCP week and it's a busy week for you.
I'm looking forward to the conference and looking forward to the sustainability track.
Where can folks connect with you and learn more?
A couple of places.
People can connect with me on my LinkedIn profile,
and my contact information in my
LinkedIn profile has my email and my phone number public. It's my personal phone number; people can
call me at any time, and I always have time for anybody who calls me to discuss anything OCP.
I'm also going to be at the OCP Summit. I'll be on a panel for the OCP sustainability piece. I'm also going to spend
some time in the Meta booth. And if you see me walk across, just stop me, introduce yourself, and
tell me how I can be of help. So that's the main thing. Come to me with something where
I'm able to help
or come to me with an ask.
Like, DJ, I want you to do this
or this is what we should be doing.
Come with ideas and I will engage with you right away.
Sounds great.
Thanks so much for the time today.
It was a real pleasure.
Likewise, Allyson.
Thank you for having me here
and I enjoyed talking to you this morning.
Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by the Tech Arena.