Computer Architecture Podcast - Ep 16: Sustainability in a Post-AI World with Dr. Carole-Jean Wu, Meta
Episode Date: June 19, 2024
Dr. Carole-Jean Wu is a Director of AI Research at Meta. She is a founding member and a Vice President of MLCommons – a non-profit organization that aims to accelerate machine learning innovations for the benefits of all. Dr. Wu also serves on the MLCommons Board as a Director, chaired the MLPerf Recommendation Benchmark Advisory Board, and served as co-chair for MLPerf Inference. Prior to Meta/Facebook, Dr. Wu was a professor at ASU. She earned her M.A. and Ph.D. degrees in Electrical Engineering from Princeton University and a B.Sc. degree in Electrical and Computer Engineering from Cornell University. Dr. Wu’s expertise sits at the intersection of computer architecture and machine learning. Her work spans across datacenter infrastructures and edge systems, such as developing energy- and memory-efficient systems and microarchitectures, optimizing systems for machine learning execution at-scale, and designing learning-based approaches for system design and optimization. She is passionate about pathfinding and tackling system challenges to enable efficient and responsible AI technologies.
Transcript
Hi and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts,
I'm Suvinay Subramanian. And I'm Lisa Hsu. For this episode, we were so thrilled to have Dr.
Carole-Jean Wu as our guest on the show. Dr. Wu is a director of AI research at Meta. She is also
a founding member and a vice president of ML Commons, which
is a non-profit organization that aims to accelerate machine learning innovations for the benefits of
all. Dr. Wu also serves on the ML Commons board as a director. She has chaired the MLPerf Recommendation
Benchmark Advisory Board and she has served as co-chair for MLPerf Inference. Prior to Meta slash Facebook,
Dr. Wu was a professor at Arizona State University.
Dr. Wu's expertise sits at the intersection of computer architecture and machine learning.
Her work spans across data center infrastructures and edge systems, such as developing energy and
memory efficient systems and microarchitectures, optimizing systems for machine learning execution
at scale, and designing learning-based approaches for system design and optimization.
She is passionate about pathfinding and tackling system challenges to enable efficient and responsible AI technologies.
She joined us to discuss the explosion in the utilization of compute for machine learning and AI applications,
and the ramifications for our world. Part of her latest work involved a deep consideration of how to improve the sustainability of all
the electronics in our lives, not just the devices in our homes and our pockets, but
the compute fueling the AI revolution as well.
She shared with us ways to think about the end-to-end carbon footprint of AI and the
work that lies ahead to understand and reduce it.
A quick disclaimer that all views shared on the show are the opinions of individuals and
do not reflect the views of the organizations they work for.
Carole, welcome to the podcast.
We are so excited to have you here.
Long time listeners to the podcast know our first question is always, what is getting you up in the morning these days? It's really a few things. Spring just got here in Boston and that's where I am. And it's usually just a couple of months
that we get to really hang out comfortably outside.
So it's just an amazing time to enjoy, you know,
all the nice things, flowers and trees,
and you know, before the leaves start changing color.
So that's like what gets me up early these days,
but really on a more serious note, what really excites me these days, and also worries me a little bit at the same time, is that we are seeing such a significant growth in our usage of AI, and such exciting application use cases, but at the same time, it's also demanding a lot of energy usage. And so this is the topic
that I've been thinking a lot about. How do we bend the demand growth of AI technologies?
How do we reduce its demand for electricity, natural resources, and all that? And so this is
the research topic that I've been thinking a lot about. And how do we scale this important technology in a more sustainable way?
And there are also all these little human beings in my life that I like to spend some time with before heading out to work.
And then at work, I get to hang out with my colleagues, learn, and really enable and up-level my contributions and impact. And so that's what gets
me up in the morning these days. Awesome, great answer, a very rich answer spanning all
dimensions of life, right? It's funny that you say spring, like we only have a little
bit of time before the leaves change color, and it's April. That's kind of crazy to think about,
but I guess that's if you live in Boston.
Like you said, also on a more serious note, you know, you're at Meta. Recently, there was that
big announcement where Meta committed to buying, I forget the exact number,
but like an absurd number of GPUs from NVIDIA, right? Something to the tune of 600,000 H100 GPUs through the course of this year.
Man, that's good for NVIDIA. And what are the energy implications? Is that the sort of thing
that you're thinking about? We're using this incredible volume of stuff and how are we
managing it? Yeah, so I guess it's a shocking number in the sense that, well, why do we need so much compute?
There's so much technology and advancements that goes into the design of these GPUs.
And really, these GPUs are there to improve the efficiencies of AI technologies.
Power is a limited quantity.
And so if we want to continue to scale AI's capabilities,
we want to make sure that these AI systems that we're using
can bring us much higher energy efficiency,
power efficiency for the compute
that is enabling various different products at Meta.
And so if you think about the products
that are being supported by deep learning in general:
News Feed ranking has been using
deep learning technologies, video recommendations too,
and these days everybody's talking about
large language models and other generative AI technologies.
All of these are being trained on high-performance GPUs.
You may even remember,
I think this was the very first podcast
that you had, with Kim Hazelwood.
There, she was talking about
the various different deep learning technologies
that are happening at Meta
and how they support the different products
in the company.
And so these GPUs are here
to sort of help advance the state of products.
And at the same time, if you look at how AI is being used across the board, AI is
in a lot of scientific domains.
AI is used in discovering electrocatalyst materials, to find how to more efficiently
store renewable energy, for example.
FAIR has a project called Open Catalyst, and that's exactly what its North Star is:
to use AI in the scientific domain to improve the efficiency of energy storage.
And that has a huge potential to tackle climate change challenges.
You've probably also heard Microsoft has this FarmBeats project
that's using AI to improve farming efficiency.
And I think that's really cool.
This is exactly how computing can help
tackle some of the most important
sustainability challenges facing the world.
Yeah, so that's really interesting.
So first, like a side note is AI is everywhere,
right? So I feel like I know all these people who are not technologists, who are not technical
people, not in our field, and they're using AI for whatever. I have people who are doctors,
and they're using AI in their research for helping elderly people know when they need to go see a
doctor or whatever. It's all over the place. And so I remember when I was in grad school,
I had this random idea to do like a wacky paper, which I didn't do. And it was like,
how do we reduce power consumption in computer architecture? And it was like, oh, we can reduce
power consumption in computer architecture by stop running a million simulations every day to,
to do research on how to reduce power consumption. Because like, just that,
we're like, we wasted so many cycles. Oh, send out 10,000 jobs. Oops.
I forgot to put a print here, kill it all. And then do it again. And so,
so there's that. And so what you were just saying just now a little bit was
like,
we're using all this compute to potentially do research on how to improve climate situations, right?
So like, one possible way to do it is to like not do it at all. And I know that that's like
probably not what we're going to do. But you asked the question, why do we need all this
compute? And part of the compute is to figure out how to maybe use less compute or use the
compute more efficiently.
How I would like to think about this problem space is that if climate change is what we recognize as the most important challenge facing the world and facing this century,
then we must look at where carbon emissions are coming from, or where greenhouse gases' global warming potential is being generated.
And if you look at where carbon emissions are coming from, there are two significant contributors when you look at the overall industry sectors.
One is from the agricultural industry. The other is coming from the energy usage of heating and cooling
of buildings in general. And this is a place where I feel like computing can really help
to drive down the carbon dioxide that's being generated, or the other chemical gases that are being generated.
And so a lot of our colleagues are working on these problems.
But in order to advance the state of the solution space,
it will require a lot of compute.
And when it requires a lot of compute, then it's going to demand a lot of electricity or other type of energy.
And I felt like that's where there is a very similar problem that we are facing in the AI industry.
I think there are a lot of problems that can be accelerated by the use of AI, and therefore
it is propelling a lot of research that goes into pushing the frontier of AI capabilities.
But at the same time, the compute demand and the energy demand of this AI research phase
is becoming so significant that it's not something
that we can easily ignore anymore.
And so wouldn't it be nice if we can
make computing a lot more sustainable,
make it a lot more environmentally friendly,
while at the same time letting computing
solve some of the most important challenges that we are facing in the world.
Right, I think you painted a great picture of the capabilities and the promise of AI,
and at the same time it's counterbalanced by the energy demands of AI itself in the near term.
And maybe we can double click a little bit on that. So
perhaps we can paint a picture of what does the energy footprint look like for AI currently?
How has that been trending over the last few years? And what do the trend lines look like
moving forward? And perhaps we can expand on what are the components that go into the energy use
itself? In particular, you've written papers and blog posts on the operational energy
use and there's been a lot of focus on that and then there's the embodied carbon use as well.
So perhaps you can educate our listeners on these various components and how these things are
evolving as time progresses. Yeah, happy to share a little bit about our journey in these green AI studies,
in the sense that about three, four years ago,
a couple of us became really interested in
looking at the big picture
of where the energy of AI
and computing in general is being consumed
and where the carbon footprints of AI and
computing in general are being produced.
And in this particular characterization analysis, we must start with a methodology and we can
take the lifecycle analysis as an example.
So when we start looking at, for example,
the lifecycle emissions of a computing device,
we can use the lifecycle analysis methodology.
We can look at: there are generally four important phases
of computer systems.
Usually the product is produced
in the manufacturing phase,
and then this product will be transported
to a location where it
will be used, and then there is the product use phase. Eventually, at the end of the life cycle of
this product, in the case of a computer, it will be recycled or upcycled for a second life. So when we
look at computing's carbon footprint, I guess the primary phases that we must focus on are the manufacturing
phase as well as the product use phase. The manufacturing phase is where the embodied
carbon or manufacturing emissions of the computer devices come from. This is what we call the
embodied carbon footprint. Now, when the device is designed, manufactured, and installed into,
say, a data center, or shipped into a user's hands, then during the life cycle of that computing device,
there is also the corresponding operational energy consumption, or the operational carbon footprint.
Right, and when we started looking at the breakdown of the embodied carbon and operational
carbon, we were quite shocked to find that the ratio of embodied to operational carbon
footprint for consumer electronics like a smartphone is about an 80 to 20 percent breakdown.
Most of the carbon footprints of your smartphone is coming from manufacturing emissions.
There's so much energy required to manufacture that smartphone that we have in our pocket.
And what does that include? Does that include like the mining of the precious metals or that kind of stuff?
Or is that really just the manufacturing?
Or is it even just the sourcing of the materials?
Well, really good point, Lisa.
So if we start looking at these supply chains,
they're actually really deep.
And so mining the minerals that go into these smartphone SoCs
will be part of the embodied carbon emissions.
So it really depends on what is being captured
in the analysis itself.
But if you look at Apple's sustainability reports
for iPhones, you will find that about 80% of the carbon
is coming from manufacturing or embodied carbon,
and about 33% is coming from semiconductor IC manufacturing.
These are the manufacturing of the application processors,
of the DRAM, of the NAND flash storage devices that goes into the smartphone itself.
There are all the additional components that need to be manufactured,
for example, the display,
the battery, and all of that, which are also accounted for in the embodied carbon of the smartphone.
And the operational usage is less than 20%.
Is that because people only use their phones for two years before they upgrade?
That's part of the question.
Exactly how often we upgrade our phones will have implications on how the upfront embodied carbon cost can be amortized.
And so, in fact, on average, I think each of us will use our phone for two to three years only.
And if we were able to extend the lifetime of the smartphones, it can amortize that upfront embodied carbon cost that goes into the manufacturing.
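To make the amortization arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every constant below is an illustrative assumption for a generic smartphone, not a figure quoted in the episode:

```python
# Back-of-the-envelope lifecycle carbon for a smartphone.
# All constants are illustrative assumptions, not measured data.

EMBODIED_KG_CO2E = 56.0        # assumed manufacturing (embodied) footprint
ANNUAL_CHARGING_KWH = 5.0      # assumed yearly charging energy
GRID_KG_CO2E_PER_KWH = 0.4     # assumed grid carbon intensity

def annual_footprint_kg(lifetime_years: float) -> float:
    """Average kg CO2e per year of ownership: amortized embodied + operational."""
    amortized_embodied = EMBODIED_KG_CO2E / lifetime_years
    operational = ANNUAL_CHARGING_KWH * GRID_KG_CO2E_PER_KWH
    return amortized_embodied + operational

for years in (2, 3, 4):
    print(f"{years}-year lifetime: {annual_footprint_kg(years):.1f} kg CO2e/year")
```

With numbers in this ballpark, the embodied term dominates (roughly the 80/20 split described above), and extending the lifetime from two to four years nearly halves the average annual footprint.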
Gotcha. Well, I'm on the fourth year of my iPhone SE because I do not need the latest
and greatest. So maybe I'm doing my part.
Right. So you talked about operational and embodied carbon use. And as you rightly pointed
out, you can't optimize or improve what you can't measure. So the first step is of course measuring it, and that has shed some light on the
different components of our carbon emissions, across our supply chain plus our operational use.
Maybe switching to the other side of the coin, now that we've done this measurement
and we've done this characterization, how do we think about opportunities to improve this? How do
we improve the state of the world moving forward?
What are opportunities that you see maybe across various layers of the stack or various
parts of the entire pipeline in the computing industry?
Yeah, I have to share a little bit more on this.
So I would say kudos to Apple, who does a really good job of quantifying the embodied
and operational carbon footprints of their own devices.
But really being able to quantify carbon emissions is a really complex process. And I don't know if
our computing industry, or just the computing community, is ready for that. We don't currently
have the tools, we don't have all the metrics there for us to understand, first, what are the carbon emissions
of computing devices, not to even mention hyperscale data centers.
But I think this is something that the computer architecture community has a lot of experience
with.
We are very good at building performance models that help guide design by optimizing
for the performance of our system hardware. We have
a lot of experience building power models, to quantify power and then energy, in order to guide
energy-efficient computing. But carbon emission is a step more complex, in the sense that depending on how energy is being generated, the overall carbon
emissions of the system usage are going to vary.
And so I felt like this is where our community can invest a little bit more: understanding
how to quantify the carbon emissions of computing in a more systematic way, and then building the tools such
that we can replicate and enable others to do the measurements systematically. And as what you say,
Suvinay, the first step to optimize is actually to be able to measure. Once we have the tools, once we
have the methodology, once we have the metrics, then we can go figure out how to optimize our systems for minimizing carbon footprint in this particular case.
What are some challenges and maybe pitfalls that you see in the process of collecting
these metrics, right?
Like both in terms of our methodology or in terms of the metrics that we choose to annotate
in this particular exercise?
Any particular drawbacks that you've seen?
And how can we as an industry improve that?
And what tools do we need or what alignment
do we need across the industry to enable that?
Yeah, happy to share my thoughts on this aspect a little bit
more. We talked about how there
is the operational carbon emission that
gets produced during the product use phase,
or during the operational phase, of a computer. And there is the manufacturing or embodied carbon
that gets produced during the making of the hardware devices themselves. I feel like our community
hasn't agreed on whether the goal should be minimizing operational carbon
footprints or minimizing embodied carbon footprints or minimizing the total of
the two. And even there, I felt like there's a lot of debate going on in our
community. And so this is where I felt like a lot of conversation, a lot more
research, is going to help us decide under what circumstances it makes sense to reduce
total carbon, under what circumstances it makes sense to reduce embodied carbon, and under what
use case it actually makes sense to just focus on reducing operational carbon.
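One way to frame that debate is as a choice of objective function. As an editorial sketch (not a formula from the episode), the total lifecycle footprint of a device can be written as

```latex
C_{\text{total}} = C_{\text{embodied}} + C_{\text{operational}}
                 = C_{\text{embodied}} + \sum_{t} E(t)\,\mathrm{CI}(t)
```

where E(t) is the energy drawn in time interval t and CI(t) is the carbon intensity of the electricity at that time. Minimizing the embodied term, the operational term, or their sum are genuinely different objectives that can favor different designs.

Got it. Can you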
shed some light on what data is actually available right now or what data is reported by companies?
For example, you talked about how Apple does publish their data on the operational versus embodied carbon use.
How about across the industry, across different companies? What is the current state of the world?
What can we improve in terms of reporting and just data that's available for people to look at?
Yeah, so on the consumer electronics side currently, I think the consensus is to use the lifecycle analysis, or LCA, methodology to understand the carbon footprint. If you look at the publicly available sustainability reports by companies like Google, Meta, Microsoft, you will be able to find that these companies use a different methodology that's defined by the greenhouse gas protocol, the GHG Protocol. In the GHG Protocol, there are three different scopes of
emissions that are part of the reporting structure. There are scope 1 emissions, scope 2
emissions, and scope 3 emissions. Scope 1 emissions are the direct emissions
produced by energy use on site. So for example, the diesel generators that are used as backup power in data
centers will be producing scope 1 emissions during a power outage. Scope 2 emissions
are indirect emissions from electricity usage. So for example, the power grid will be generating
electricity that would then eventually be used by the data centers run by Meta, Google, or Microsoft.
And that will be Meta's scope 2 emissions, the indirect emissions.
Finally, there are scope 3 emissions, which account for the upstream and downstream emissions.
Therefore, the semiconductor manufacturing emissions
that go into making any IT equipment that eventually
goes into hyperscale data centers
are scope 3 emissions.
Now, taking Meta's sustainability report
statistics, we'll find that the scope 2 and scope 3 emissions for computing
equipment are about an equal split.
What that's saying is that the embodied carbon for the data center infrastructure is about
the same amount as the operational scope 2 emissions.
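As a quick mental model of the scope accounting just described, here is a hedged sketch: the scope definitions follow the GHG Protocol as explained above, but every quantity is invented for illustration:

```python
# Toy GHG Protocol bookkeeping for a hypothetical data center operator.
# Scope labels follow the GHG Protocol; all quantities are invented.

from collections import defaultdict

ledger_kg_co2e = [
    ("scope1", "diesel backup generators during an outage",      1_000),
    ("scope2", "purchased grid electricity for servers",       500_000),
    ("scope3", "semiconductor manufacturing of IT equipment",  480_000),
    ("scope3", "transport of equipment to the data center",     20_000),
]

totals = defaultdict(int)
for scope, _source, kg in ledger_kg_co2e:
    totals[scope] += kg

for scope in ("scope1", "scope2", "scope3"):
    print(f"{scope}: {totals[scope]:,} kg CO2e")
# With invented numbers like these, scope 2 and scope 3 come out roughly
# equal, mirroring the split described for computing equipment above.
```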
So this aggregated carbon emission data is available.
The challenge comes from: how do we do a better job of breaking it down for the different
processors that we bring into the data centers?
How do we break it down for all the DRAM memory devices?
Can we break it down even further
for all the storage equipment?
If we are able to do so,
if we have high quality data to do so,
then we will be able to think about
how to provision data center computing devices
to minimize carbon footprints
or to figure out how to trade off operational carbon with embodied carbon,
and how to build a data center that comes with the minimum total carbon footprint, for example.
So that's really interesting, Carole. And it's making me think a little bit about the notion
of transparency, except in this case, the scope is super large, right?
So, you know, back when we were coming up in the early days,
you would have the notion of being really transparent
or not about your microarchitecture, right? There's your special sauce, and that's what makes
this particular microarchitecture extra fast, because you did this to the fetch unit, you did
that, and the manufacturer would have to kind of maybe walk this fine line
of being able to say, hey, here's the programmer's manual, here's the thing, here's the way that you
need to write the code in order to make your code faster. And they'd want to do it in a way that
lets people actually run their code faster, but not give away so much that it explains exactly how
the microarchitecture is, you know, better than somebody else's, for example.
And in this case, it seems to me like, you know, of course, in order to have all these tools and
all these metrics, you want radical transparency all the way up and down the supply chain so you
can make these trade-offs appropriately. At the same time, I can imagine many of these companies
up and down the supply chain may or may not want
to open the covers and explain,
oh, this is how we do it, this is how we don't do it.
And that makes it very hard to sort of do
the top level optimization, right?
So like, if you imagine you don't understand anything
about a microarchitecture, you can't tailor your code
to make it better, right?
So in this case, it's like, you don't understand anything
about how the embodied carbon is being produced throughout your supply chain.
It's very hard to make trade-offs.
So how are you thinking about this sort of challenge?
I can imagine.
I mean, even within Meta, like everything you were just saying there, there's scope
two and then there's scope three, which is components that are being sourced from
elsewhere, right?
So can you talk a little bit more about that?
Yeah, this is a really great question, in the sense of how much transparency helps, and really how
much transparency do we need to actually minimize the carbon footprints of computing in
general. I think there is actually a lot more that we can do; what we have now is just non-existent, right? When we make a decision
about what particular SKU of CPUs or systems we want to bring into the data center, we will have
a way to figure out what is the performance improvement, what is the energy efficiency gain
I'm going to get from this next generation of hardware. And then we get to make that decision based on some data, based on some trade-off between
performance, cost, energy efficiency, and others.
But right now, we are missing carbon in this conversation.
And the reason why we are missing the information on carbon footprints, on either the manufacturing side of things or the operational side of things, is that
it is not yet a metric. There isn't a metric that people know and use to report carbon emission
results. It is not something that hardware vendors collect data for. They may know, but they may not measure
that. And therefore, it is not the kind of information that can be shared easily with
users or customers of the hardware devices. So I guess once we start getting into a cadence of
measuring the emissions across various different levels of
the supply chain, then when customers ask about, hey, what is the environmental footprint of this
particular hardware equipment, that answer can be provided. And that answer then can be factored in
in the purchasing decision. It may also be factored in in our design space exploration decisions as well.
But I guess one step at a time is perhaps we should start with understanding how to measure the environmental footprint of computing.
And second, when we are going to procure hardware equipment into data centers, for example,
we will have that piece of information and we get to decide how to use that piece of information.
And then finally, once the information is becoming stable, it can be supplied in a consistent
way, then perhaps we can then start thinking about how to optimize the computing infrastructure
with that information in mind.
Right, makes sense.
I think motivating the entire industry and setting up incentives so that this reporting becomes valuable across the entire chain sounds like a good direction to pursue towards enabling this.
And clearly, there are still gaps in this overall ecosystem that we can work towards as an industry.
While we are articulating the gaps, and it's certainly something to look forward to in terms of improving the state of the world, maybe we can also reflect on some of the wins that we have had, because as an industry, a few years back, for example,
we were focused on operational energy use,
and the conversations in the community overall
has translated into meaningful impact.
So just as a way of inspiring and motivating the community
to continue working towards this and bridging these gaps,
maybe we can reflect and think about what are some of the wins
that we have had over the last few years by shining a spotlight on maybe some aspects of this overall ecosystem.
In particular, I think the operational energy use is something that has received a lot more
attention. And so there's been a deliberate thrust towards improving this from various companies.
Perhaps you can share some success stories or good case studies in this particular realm on how we did that,
what has been the material impact of that maybe over the last few years in improving the energy
efficiency at least in one end of the spectrum across the different components that we have in
this space. Sure, I guess maybe I can start with developing metrics and developing tools to enable the community to be able to measure
and quantify the embodied and operational carbon footprints of AI, or computing systems in general.
So as I mentioned, when we started this green AI journey, this sustainability journey,
we immediately ran into the bottleneck of not having the tools to help us quantify and estimate
where the carbon footprints are, and what's the biggest bottleneck of computing's carbon footprint.
And so we started with the initial work on something we call Chasing Carbon. There,
we started looking at just understanding where the carbon footprint comes from for data center computing.
And there, we also looked at what the carbon footprint looks like for consumer electronics like smartphones.
And we also dug into semiconductor manufacturing's carbon footprint, its carbon emissions in general. And that gave us some rough idea about how significant
semiconductor manufacturing's carbon footprint is. And in order to
equip our community with a tool, we went ahead and developed a first-of-its-kind
carbon modeling tool, which we call ACT,
which stands for Architectural Carbon Modeling Tool. The idea here is that if we can take a look
at what the semiconductor manufacturing emissions are for logic IC manufacturing, for various DRAMs across different technology nodes, and for various different
storage technologies, then perhaps we can build this carbon modeling tool that can be
used to guide the design space exploration for hardware design. And we used AI as an application use case, to look at what happens if I provide this suite
of workloads with some performance target in mind. And then, if I were going to design
a new hardware system to minimize lifecycle emissions, along with other optimization objectives like performance,
power, and cost as the design dimensions, what kind of design would I get if I were putting
carbon in as a first-class design citizen? And it was a very interesting journey as we went through this analysis.
First, we figured out that, well, we don't actually have a carbon metric that our community
has consensus on in order to optimize for.
Should we minimize embodied carbon, or should we figure out how to minimize the total carbon, which in this case could be either the ratio between embodied and operational carbon footprint, or the sum of embodied and operational carbon footprint? Like, what are we reducing? What are we going to optimize for? Regardless, we built the carbon modeling tool. And we describe the tool itself in our ISCA 2022 paper.
We open-sourced the tool, which we hope can help the research community continue pushing these directions forward.
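For a flavor of what such a tool computes, here is a heavily simplified sketch loosely in the spirit of ACT's embodied-carbon model: fab energy, direct fab gas emissions, and raw materials all scale with die area, and the total is amortized over yield. The structure is a simplification of the published formulation, and every constant below is a placeholder, not a value from the tool:

```python
# Simplified embodied-carbon estimate for one good logic die, loosely in
# the spirit of ACT (ISCA 2022). All constants are placeholder assumptions.

def embodied_carbon_kg(area_cm2: float,
                       yield_frac: float,
                       fab_kwh_per_cm2: float,
                       fab_grid_kg_per_kwh: float,
                       gas_kg_per_cm2: float,
                       material_kg_per_cm2: float) -> float:
    """kg CO2e to manufacture one functional die."""
    per_cm2 = (fab_kwh_per_cm2 * fab_grid_kg_per_kwh  # fab electricity
               + gas_kg_per_cm2                        # direct fab gases
               + material_kg_per_cm2)                  # raw materials
    return area_cm2 * per_cm2 / yield_frac             # amortize over yield

# Hypothetical 1 cm^2 die with 87.5% yield on an assumed fab energy mix:
print(f"{embodied_carbon_kg(1.0, 0.875, 2.0, 0.5, 0.3, 0.5):.2f} kg CO2e")
```

Sweeping parameters like the fab's grid carbon intensity or a technology node's per-area constants is what lets a designer ask carbon questions during design space exploration.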
But at the same time, being in industry now, I guess we do have a team of colleagues on the sustainability team, and also others,
who became very interested in understanding and being able to advance the tool itself.
So in parallel, there is this conversation through the imec sustainability program, which
many industry companies are part of, including Google, Microsoft, Amazon,
Apple, and others.
The idea here is that in addition to this research tool, it would be really good if
we have a high-quality production-grade carbon modeling tool that everybody can use to quantify
the embodied carbon of IT equipment that's installed in any data centers.
Right. And so imec then had this net zero program.
The idea for them is to be able to instrument the fabrication process at various different semiconductor technology nodes. And then they will be
able to provide a tool that can be used by all the manufacturers, by all the hyperscale data center
operators, to quantify the embodied carbon that goes into the data center equipment.
And so there is that happening. And then also, through internal collaboration within Meta,
we have colleagues who are building and operationalizing
carbon evaluation methodologies
that can be used for understanding the accounting
and understanding the design space trade-offs internally.
I think tools and methodology are an excellent example of things that, you
know, improve the entire community.
It takes us from zero to one and enables an entire community.
So I'd like to essentially highlight that that's a great contribution.
In particular, I think just like we've had power modeling tools, like Orion or
DSENT or CACTI, I think there
are both components of the equations
that go into various factors, as well as, I think, constants,
which is sometimes underappreciated.
For example, in the case of power modeling,
it could be device factors like capacitances,
frequency of operation of devices,
and so on that help us understand power modeling.
I'm sure in a similar manner, the ACT tool that you've put out has a variety of different
constant factors on what is the carbon footprint during the manufacturing process for certain
components in the ecosystem.
Just providing that range and, hey, these are the different factors that you need to
consider and templatizing that is a great value add to the community because now people
can start reporting those particular metrics.
They have a broad sense of, okay, what's the rough rule of thumb on where does this
particular metric land in terms of range and so on. So I think that's an excellent educational
resource as well, but also helps essentially move the community into exploring this problem really
well. Tools are an excellent avenue to enable this kind of research and further insights. Yeah, I hope that this first step is
helpful, but I also want to double down on our prior conversations about, I think a lot of these
modeling tools are going to require high-quality data. And so I do think that our industry has a very important task at hand,
in the sense of: how do we collect higher-quality data? And how do we measure the carbon
footprints of computing across this entire supply chain? And I think the more that we know how to
measure, and the more data we can provide, the more it's going to help improve the prediction
quality of this carbon modeling tool, or of any modeling tool that we build to guide the
design space exploration. Yeah, that's really interesting because I think, you know, one of
the things that was in vogue when the power tools were sort of new, and a ripe area of research, is people would enjoy finding wrong
assumptions inside of them and saying, like, oh, you know, gosh, there's this thing that's wrong in
whatever tool, CACTI, Wattch, it didn't matter which one. People would love to find those wrong
assumptions. But the fact is, when you take that step, you do need to bake in certain amounts of assumptions.
And like over time, you know, they get found and then the tool improves.
I am curious about this, like some of the assumptions that are currently being baked in into this ACT tool, for example.
Because I know we haven't talked as much about operational carbon right now because of your stat that said, gosh, the carbon is very much
dominated by this embodied carbon. But even in the sense of operational carbon, what are some
of the assumptions about what that usage is? Like, what is the assumption on how much load
is being used over the lifetime, for example? So, you know, if you assume that a server is being
run at 100% load, 100% of the time, or if you assume a server is being run at 10% load, you know, we've had a lot of data at various conferences that show that a lot of servers are running at low load a lot of the time.
So what kind of stuff is baked into the tool operational usage?
Yeah, great question.
Maybe we can use the smartphone example that we talked about previously.
So one of the reasons why most of the carbon is in the hardware for smartphones is because of users' usage patterns.
So imagine: we do not run SPEC workloads on our smartphones;
really, the usage pattern for smartphones is
sporadic user activity. Most of the time, the smartphone is idle. And the smartphone SoC is
designed for maximizing the energy efficiency during those sporadic user activities. And in order to do that, I guess Apple probably devotes tons of accelerators in its SoC in order to maximize that operational efficiency.
And it is that design decision that contributes to 80% of the carbon footprint going into embodied carbon, with only 20% of the carbon footprint
of your smartphone coming from operational carbon.
Just to make sure I understand you there, you're saying that essentially, if we used
our smartphones differently, where we had a constant load, then there would not be the
need to have specific accelerators to handle spikes. That is what leads to more embodied carbon.
So for example, you need an NPU to be able to do this
because when somebody wants something, they want it fast.
Because when the user actually picks up their phone,
they expect an immediate response,
even though most of the time it's not doing anything.
So I think what I heard you say is essentially,
there's all sorts of accelerators to handle
the various little spikes in activity that people use.
And as a result, just having all that kind of componentry, if you just had like one little
CPU on the SoC, it would not be so much embodied carbon, but it's because there has to be
accelerators to handle all the different things that people do with it and want it done fast
because they've picked up their phone. That's what leads to the embodied carbon. Is that fair?
That's one contributing factor. The other is that our smartphones are constrained by battery capacity. And so not only do we want to minimize latency,
we also want to minimize energy use. And therefore, again, a similar situation as what you brought up, Lisa, is that we then incorporate all these
accelerators to maximize energy efficiency, to reduce the latency, to reduce the overall energy usage. And it's that design space
trade-off that contributes to the majority of the carbon footprint coming from embodied carbon, with only a small percentage coming
from operational carbon footprints.
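The trade-off being described here can be phrased as a break-even question: an accelerator-rich SoC pays more embodied carbon up front in exchange for lower operational emissions every year of use. A hedged sketch, with made-up numbers:

```python
# When does an accelerator-rich design win on total lifecycle carbon?
# All numbers are invented for illustration only.

EMBODIED_KG = {"lean_cpu": 40.0, "accel_rich": 60.0}   # at manufacture
ANNUAL_OP_KG = {"lean_cpu": 8.0, "accel_rich": 2.0}    # per year of use

def total_kg(design: str, years: float) -> float:
    """Total lifecycle carbon: embodied plus accumulated operational."""
    return EMBODIED_KG[design] + ANNUAL_OP_KG[design] * years

for years in (1, 2, 3, 5):
    winner = min(EMBODIED_KG, key=lambda d: total_kg(d, years))
    print(f"{years} yr: lean={total_kg('lean_cpu', years):.0f} kg, "
          f"accel={total_kg('accel_rich', years):.0f} kg -> {winner}")

# Break-even: (60 - 40) / (8 - 2) = ~3.3 years. Shorter device lifetimes
# favor the lean design; longer lifetimes favor the accelerator-rich one.
```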
So that's a super interesting trade-off
because for so long, we've been drilling into like,
we wanna reduce our energy,
you just reduce energy usage,
reduce energy usage, reduce energy usage,
because that is the green thing to do.
And what you're saying here is that if you zoom out,
the very fact that now we have to manufacture
four, five, six, seven different accelerators in the SoC,
that is actually potentially less green.
And maybe if we left things well enough alone
and just put in a beefy CPU in there and said,
hey, we'll burn operational power,
but we only have one component.
That's the kind of design space exploration this tool is intended to seek out.
So what have you found?
Is it better to have...
I mean, with the rough assumptions that you've got baked in now
since you don't have good data, what's it finding?
Yeah, so now that we have the tool,
exactly how to put the tool into discovering the
hardware design space for various use cases is exactly what is becoming available right
now. I think the optimal decision obviously is going to be use case dependent.
On one extreme, we can come up with the greenest hardware design that has the least
amount of embodied carbon, but that comes with the cost of perhaps degraded user experience, and
nobody wants to have a phone that doesn't finish any of the tasks that you have in mind. And so there is this trade-off, and we'll have to
figure out how to come up with a design decision that not only
minimizes carbon, but at the same time meets the performance requirements, meets the power
requirements, meets the cost requirements. And now with this particular tool, ACT,
you will be able to do the design space exploration
having carbon as a metric that you can optimize for.
Cool.
Yeah, I think that makes a lot of sense, yeah.
Having carbon as a first-class citizen,
in the same vein as performance or energy consumption,
can potentially lead to different kinds of design
choices and optimizations for different use cases. That's our hope with this particular tool.
Right. So we've spent a lot of time talking about carbon footprint and so on,
and we recently touched upon the trade-off with respect to user experience and performance
in the particular real-world scenario.
You've also done a lot of work on just improving performance of various accelerators, various computing systems.
So maybe this is a good time to switch gears into that component.
You've looked at various optimizations across different categories
of ML models, including recommendation models and more recently, LLMs.
So how do you think about this computational efficiency?
How do you improve the raw performance itself?
What are some themes that you've observed over the last few years?
And what are some avenues that you see for improving computational efficiency and performance
moving forward?
Yeah, happy to share a little bit sort of my observations of where AI, deep learning, and in general, some of the most important workloads are going.
So I would say, as a computer architect by training, one of the things that we do is workload characterization. Let's understand what are the characteristics of the workloads
that we are designing systems for,
and use that observation to guide design decisions,
be it large-scale, cloud-scale computing infrastructures,
or handheld devices, smartphones, AR/VR equipment.
So looking back, I guess in the first podcast
that was carried out, with Kim Hazelwood,
she provided a very interesting stat
that drove all the investments
into the deep learning recommendation model space:
when you look at where compute cycles are being spent at
companies like Meta, back then, more than 80% of the compute cycles for machine learning
inference were coming from deep learning recommendation models.
And as a computer architect, that sounds like, you know, the most important workload
that we should focus on. If we can make this deep learning recommendation model inference
run much more efficiently, then it will translate to lower energy usage.
It will translate to lower capex cost.
It will translate to overall more cost-effective computing infrastructures.
And that's exactly the kind of exercise
I think we should all aspire to do,
is to look at the big picture,
look at where your compute cycle is coming from,
and use that to guide efficiency optimization efforts.
These days, we hear a lot about large language model
or generative AI technologies in general.
And this LLM is
interesting in the sense that it's a very unique type of workload as compared to deep learning
recommendation models. First, the scale of compute requirements for LLM training is much larger as compared to
the amount of training requirements, or training accelerators, required for
DLRM model training. And that comes with a lot of implications for the kind of computer systems that
we would design the training workloads for. So that's number one. Secondly, on the inference side of things,
the compute, memory, and networking requirements
for large language models also look quite different
as compared to DLRM.
If you remember, DLRM is very embedding memory capacity
heavy, whereas for large language models, the trade-off
is different.
And so if you were in the business of designing AI accelerators,
I think that gives you a little bit of a hint on how to balance the compute, memory, and networking ratios
for the systems that you provision for the various different types of deep learning workloads.
And I think as we were talking about the
energy demand of AI and how that's a really significant amount, I think this is where
efficiency optimization can really help. If you think about
how to bend the resource usage curve of AI in general, if we can be more efficient
across the entire system stack, not only will it translate into lower operational energy consumption,
it will also have the impact of much lower operational carbon footprints.
So I think there is a lot that we can do to bend AI's energy demand by just focusing
on making training and inference run more efficiently.
But there's also a lot more that we must do beyond efficiency itself, because
efficiency only addresses the operational phase
of the computing infrastructure.
There's all the additional embodied carbon
that goes into building the hardware itself,
into developing the data center infrastructure itself.
Right.
I think that's a point well taken.
And as you said, it requires the entire industry
to come together.
Now, one of the other efforts that you've
been involved with that involves getting multiple players
from industry to sort of align and move forward
is the MLPerf effort under MLCommons.
I think the original use case was just
to get a standardized set of benchmarks
so that everyone can agree on what they're actually comparing performance for.
And of course, MLPerf has evolved with the years, you know, with introducing like inference
based workloads, training based workloads.
And there are, you know, experimental and research tracks as well these days.
So perhaps you can tell us a little bit about what's happening in the MLPerf community,
what's on the horizon in this
constantly changing landscape of AI and accelerators and compute.
Yeah, happy to touch a little bit on MLPerf also. I felt like there is a lot of synergy between what
we've been talking about here and also with the spirit of MLPerf. So maybe a little bit background on MLPerf itself. This is an effort that started
in around 2017, 2018. At the time, there were just so many different AI accelerators from startups,
and various different companies were developing their own AI accelerators as well. It was really hard to make apples-to-apples comparisons across these different AI accelerators.
I think at the time, I remember Microsoft was looking into an FPGA approach.
Google was developing their TPUs, there were NVIDIA GPUs, and from Meta we heard about the MTIA accelerators more recently. And there wasn't
consensus about how to compare these various different AI accelerators or system offerings
in a systematic way. And the north star of MLPerf is to basically develop a suite of benchmarks that we can use to make apples-to-apples comparisons across all the systems that are out there.
And as Dave Patterson shared, if we can put a competition out there, that's one of the best ways to drive
forward progress for the computing industry.
And therefore, tons of work went into the creation of the MLPerf benchmark.
And today, looking at the number of submissions five, six years later, there have already been six or seven thousand
submission results that went through the MLPerf benchmark suite. And that is a really good way;
open competition is a really good way to advance innovation in the AI system design space.
And so I'm really proud that we've come a really long way to first build the first iteration
of the benchmark. And given the fast evolution of our machine learning industry, there's so much
work that goes into upgrading each iteration of the MLPerf benchmark in order to sort of drive the
innovations in the AI system space.
It's really cool.
So question for you.
You just touched a little bit on the evolution
just during the lifetime of this podcast,
which I guess is now unbelievably four years old.
But in our very first one, you know, we talked to Kim.
She talked about how there was so much deep learning recommendation model work going on at Facebook. And now, obviously, LLMs are the new kid in town. And like you just said, their characteristics are rather different from the characteristics of deep learning recommendation models. So with MLPerf, how do you adapt to these? Because the landscape changes so fast, right?
SPEC doesn't change that fast.
MLPerf, I assume, changes very fast.
That's one question.
And the second question is, even in the sort of classical computing, computer architecture
world, we had things like spec, but then we also had like HPC workloads, where the intent
is these are much larger workloads where there's a lot of intercommunication between different
nodes.
And for things like LLMs and for some of these really large models, I assume they have to have
that sort of scope as well, where it's like the intention of the kind of system that they're
running on is different. Do you accommodate both of those things in MLPerf? How do you
deal with the vast range and scope and size of these different ML models? Yep, great question.
One of the most important features for any benchmark that we will build
is it has to be representative of the current use case,
the use case that's in production.
And this is something that the MLPerf team takes to heart. One example of how this is embedded into the various different iterations of the MLPerf benchmark is DLRM.
The first version of DLRM was brought to MLPerf in the 2019, 2020-ish timeframe. Then there was a second version that was built
as a collaboration between Meta, NVIDIA, Intel, Google,
and others is to incorporate this additional component
called the cross network.
And the whole point is to make sure that the benchmark
is representative of what is being observed
in production environment.
So then, two years down the road, there was the second version of the DLRM benchmark
that got incorporated into MLPerf. And then, just as we speak, we are working on version number three,
and that is incorporating a generative component into this next version of the DLRM benchmark. And Lisa, you mentioned everybody
is talking about large language models. Is MLPerf also incorporating large language models into
its suite? And I guess the TL;DR is yes. The MLPerf training and inference benchmarks had a GPT-3 application as part of the training benchmark.
And also more recently, Llama 2 was incorporated into the inference benchmark suite as well.
To my knowledge, there's a lot of work going on to look at even mixture-of-experts models as part of the training and inference benchmarks.
Cool. That's awesome. I mean, because, you know, when I think
about the early days of my computer architecture career, the iterations of the SPEC benchmarks were
like years and years apart. And so the fact that you're able to put out so many iterations and just
adopt the latest and greatest with each one, that's really, really great for the community.
So kudos to you for doing that
and explaining it so well for our listeners.
And so maybe this would be a good time
to get back to your roots.
I mean, I think, you know,
I used to write profiles of women in computer architecture
and I would always ask everybody,
like, what's your origin story?
Here, let's do that for the podcast audience too. What's your origin story with computer
architecture? How did you get started? You've had quite a varied career, both as a professor and
now at Meta. Yeah, wow, this is going to be from a while back. I was an undergrad at Cornell,
and I remember I actually was going to have my undergrad thesis
be on microelectronics. And at Intel there is this co-op program, which gave me the opportunity to
do two internships with Intel. I remember vividly that after my first Intel internship, I went to Jose Martinez at Cornell, who turned out to be my
undergrad advisor, and I was like, oh, I want to learn more about microprocessor design. And from
then on, I became super interested in just what goes into processor design. And then I went to Princeton for my PhD, where I got to work with
Professor Margaret Martonosi. There, I got to learn about and advance the memory subsystems
for CPUs. And ever since then, I have felt like computer architecture is the bread
and butter of the computing industry, and this is the topic that
I have spent, I guess, almost 20 years of my career looking at. Yeah, that's just a
little bit about how I got started with computer architecture. And you've taken it all the way to
thinking about like embodied carbon for hyperscale
data centers, which I think is super cool.
Like that's what I think is so great about computer architecture.
We all start out, you know, learning Hennessy and Patterson and, you know, about how to
do out-of-order execution.
And then somehow a lot of us end up all over the place, right?
Like Suvinay's working on TPUs, and I did a lot of work on data center type stuff.
And now here you are thinking about both MLPerf
and embodied carbon type work.
It can take you all sorts of places.
It does seem like this carbon stuff is in its early stages.
So presumably you're going to continue working on that.
Like what's exciting you going forward
with respect to that work?
Yeah. So we didn't have a lot of chance to talk about the role of AI. But really, if you think
about what we have talked about, AI is kind of the hero use case. If we were going to
find ways to reduce computing's carbon footprint, we must start with AI's carbon
footprint. That's where a lot of the energy usage is coming from. And as I mentioned, I think
efficiency is more important than ever. And the role of efficiency is to reduce the
resource demand curve, which is increasing very quickly.
And so we must focus on efficiency optimization.
But focusing on efficiency optimization is only one component.
We have a lot of opportunities to go beyond that.
Something we didn't talk about very much is how do we make computing more flexible in order to facilitate better coordination with the power grid?
And I believe this is a new direction that deserves a lot more research efforts.
How do we instrument demand response between data center computing
and the power grid operator, such that compute workloads can happen
either during times of the day with lower carbon intensity of electricity, or computing can happen more
flexibly in order to meet the demands of the power grid operators?
Building that flexibility into the design of computing infrastructure can be very helpful
for various different reasons.
Sustainability is one of those.
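Here is a minimal sketch of the kind of carbon-aware flexibility being described: given a forecast of grid carbon intensity, a deferrable job is shifted to the cleanest window of the day. The forecast values are invented; a real system would consume signals from the grid operator or a third-party carbon-intensity service:

```python
# Carbon-aware scheduling sketch: start a deferrable job at the hour of
# day with the lowest forecast grid carbon intensity. Values are invented.

forecast_g_per_kwh = {  # hour of day -> forecast intensity (g CO2e/kWh)
    0: 200, 3: 150, 6: 300, 9: 450, 12: 400, 15: 420, 18: 500, 21: 250,
}

def cleanest_hour() -> int:
    """Hour with the minimum forecast carbon intensity."""
    return min(forecast_g_per_kwh, key=forecast_g_per_kwh.get)

JOB_ENERGY_KWH = 100.0  # assumed energy of the deferrable job
hour = cleanest_hour()
kg = JOB_ENERGY_KWH * forecast_g_per_kwh[hour] / 1000.0
worst_kg = JOB_ENERGY_KWH * max(forecast_g_per_kwh.values()) / 1000.0
print(f"Run at hour {hour}: ~{kg:.0f} kg CO2e "
      f"(vs ~{worst_kg:.0f} kg at the dirtiest hour)")
```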
I think that's an excellent thought.
I think seemingly simple ideas can have like fairly substantial impact.
For example, even with respect to TPUs, we can think about scheduling large-scale training jobs on data centers that, say, have access to nighttime wind energy. And that reduces the
carbon footprint, or at least the operational footprint, while you're running some of these
large training jobs. So a seemingly simple idea, but it does have manifest implications
on the bottom line.
Yeah, and I think it's really important
because there's a lot of renewable energy
that is being stranded in our power grid right now.
And finding ways to improve that utilization
will go a long way.
I think that's a great point.
You briefly talked about, you know,
the flexibility of our
computing ecosystems against the power grid. Maybe I can switch gears back from the technical into the
more professional trajectory landscape. So you've actually donned multiple hats,
with technical themes ranging from microarchitecture, cache replacement policies,
all the way up to the data center. But you've also straddled multiple worlds, being in academia, being in industry.
So perhaps you can talk about how you have managed to navigate these transitions,
how you have managed to be flexible as the entire ecosystem changes around you.
Not an easy question to respond to,
but if I were summarizing it in a few sentences, I guess I think it's
important to embrace where you are in your life right now.
And I felt like, looking back, I have enjoyed being in academia; having the freedom to explore
is one of the best aspects of my academic career.
And you get to explore and advance technology
with some of the best young minds
that's developing their own career.
And I think that's a really great way
to sort of go through my career.
Now, being in industry, especially at FAIR, I feel like I'm in such an exciting
environment that I get to tackle some of the most important real-world challenges at the
intersection between systems, AI, machine learning, energy, and sustainability. And that is such a
tremendously valuable opportunity that I treasure and I focus on these days.
And I get to do that with world-class researchers as well.
And so recognizing where we are, I guess there is a lot that we can focus on to make impact.
Maybe I can push you to sharpen your message a little bit,
like any words of wisdom that you will give to our listeners.
We have students, industry professionals spanning early career
to further on in their careers.
So any words of wisdom that you would have or advice
to our listeners of the podcast?
I would say once in a while, find something that's new
and that you are passionate about, and spend an extended amount of
time focusing on solving that problem. For me, I guess it was roughly every five to six years:
find a new direction that excites me, and I get to focus on solving that problem for, you know,
five to six years and then new problems will come up,
you'll have the opportunity to figure out
what is that particular topic that you want to focus on.
And I guess as you are doing that,
I think it's really important to keep in mind
that solving these problems with a group of people
that I enjoy working with.
And so definitely take the opportunity to also develop others during the journey
and bring people with you.
And that's something I've been enjoying doing,
I guess, when I was in academia
and also now when I'm in research labs in industry.
And then finally, I felt like time is so precious.
There is a lot happening in every single one of
our lives.
And so spend the time intentionally on what is the most important problem that
deserves your attention, and do that with,
you know, the group of people that you are working with, and move together, and make the impact that
you can on the world. That's something that's very meaningful to me. Yeah, that's wonderful, because
I think a lot of times there are all sorts of discussions about how to get more people
interested in computing. Of course, at this stage of our careers, lots of people are interested in computing, but
part of the question has been, you know, how do we make sure that the general image of what we do
is not like a bunch of people with coke-bottle glasses sitting in basements typing on computers
and just hacking, right? And it
turns out that there are many sorts of real-world problems that can be solved through our
efforts. And the fact that you can find such meaning in your work, your technical work, is
fabulous, alongside the fact that of course you're a leader in the field, and now as a
director at Meta you lead wonderful people and grow their careers, as well as your grad students back when you were in academia.
So it's been really cool to catch up with you, Carole, and hear how things are going.
And this latest project sounds really exciting and really, really impactful.
And thanks so much for having me here and giving me the opportunity to share my two
cents, and thanks for the great conversations with you, Lisa and Suvinay.
Yeah, echoing Lisa, thank you so much for being on the podcast.
It's been wonderful speaking with you about this very important topic.
And to our listeners, thank you for being with us on the Computer Architecture Podcast. Till next time, it's goodbye from us.