Computer Architecture Podcast - Ep 16: Sustainability in a Post-AI World with Dr. Carole-Jean Wu, Meta
Episode Date: June 19, 2024
Dr. Carole-Jean Wu is a Director of AI Research at Meta. She is a founding member and a Vice President of MLCommons – a non-profit organization that aims to accelerate machine learning innovations for the benefits of all. Dr. Wu also serves on the MLCommons Board as a Director, chaired the MLPerf Recommendation Benchmark Advisory Board, and served as co-chair for MLPerf Inference. Prior to Meta/Facebook, Dr. Wu was a professor at ASU. She earned her M.A. and Ph.D. degrees in Electrical Engineering from Princeton University and a B.Sc. degree in Electrical and Computer Engineering from Cornell University. Dr. Wu’s expertise sits at the intersection of computer architecture and machine learning. Her work spans across datacenter infrastructures and edge systems, such as developing energy- and memory-efficient systems and microarchitectures, optimizing systems for machine learning execution at-scale, and designing learning-based approaches for system design and optimization. She is passionate about pathfinding and tackling system challenges to enable efficient and responsible AI technologies.
Transcript
Hi and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts,
I'm Suvinay Subramanian. And I'm Lisa Hsu. For this episode, we were so thrilled to have Dr.
Carole-Jean Wu as our guest on the show. Dr. Wu is a director of AI research at Meta. She is also
a founding member and a vice president of ML Commons, which
is a non-profit organization that aims to accelerate machine learning innovations for the benefits of
all. Dr. Wu also serves on the ML Commons board as a director. She has chaired the MLPerf Recommendation
Benchmark Advisory Board and she has served as co-chair for MLPerf Inference. Prior to Meta slash Facebook,
Dr. Wu was a professor at Arizona State University.
Dr. Wu's expertise sits at the intersection of computer architecture and machine learning.
Her work spans across data center infrastructures and edge systems, such as developing energy and
memory efficient systems and microarchitectures, optimizing systems for machine learning execution
at scale, and designing learning-based approaches for system design and optimization.
She is passionate about pathfinding and tackling system challenges to enable efficient and responsible AI technologies.
She joined us to discuss the explosion in the utilization of compute for machine learning and AI applications,
and the ramifications for our world. Part of her latest work involved a deep consideration of how to improve the sustainability of all
the electronics in our lives, not just the devices in our homes and our pockets, but
the compute fueling the AI revolution as well.
She shared with us ways to think about the end-to-end carbon footprint of AI and the
work that lies ahead to understand and reduce it.
A quick disclaimer that all views shared on the show are the opinions of individuals and
do not reflect the views of the organizations they work for.
Carole, welcome to the podcast.
We are so excited to have you here.
Long time listeners to the podcast know our first question is always, what is getting you up in the morning these days? It's really a few things. Spring just got here in Boston and that's where I am. And it's usually just a couple of months
that we get to really hang out comfortably outside.
So it's just an amazing time to enjoy, you know,
all the nice things, flowers and trees,
and you know, before the leaves start changing color.
So that's like what gets me up early these days,
but really on a more serious note, what really excites me these days, and also worries me a little bit at the same time, is that we are seeing such a significant growth in our usage of AI, and such exciting application use cases, but at the same time, it's also demanding a lot of energy usage. And so this is the topic
that I've been thinking a lot about. How do we bend the demand growth of AI technologies?
How do we reduce its demand for electricity, natural resources, and all that? And so this is
the research topic that I've been thinking a lot about. And how do we scale this important technology in a more sustainable way?
And there are also all these little human beings in my life that I like to spend some time with before heading out to work.
And then at work, I get to hang out with my colleagues, learn, and really enable and up-level my contributions and impact. And so that's what gets
me up in the morning these days. Awesome, great answer, a very rich answer spanning all
dimensions of life, right? It's funny that you say spring, like we only have a little
bit of time before the leaves change color, and it's April. That's kind of crazy to think about,
but I guess that's if you live in Boston.
Like you said, also on a more serious note, you know, you're at Meta. Recently, there was that
big announcement where Meta committed to buying, I forget the exact number,
but like an absurd number of GPUs from NVIDIA, right? Something to the tune of 600,000 H100 GPUs through the course of this year.
Man, that's good for NVIDIA. And what are the energy implications? Is that the sort of thing
that you're thinking about? We're using this incredible volume of stuff and how are we
managing it? Yeah, so I guess it's a shocking number in the sense that, well, why do we need so much compute?
There's so much technology and advancements that goes into the design of these GPUs.
And really, these GPUs are there to improve the efficiencies of AI technologies.
Power is a limited quantity.
And so if we want to continue to scale AI's capabilities,
we want to make sure that these AI systems that we're using
can bring us much higher energy efficiency,
power efficiency for the compute
that is enabling various different products at Meta.
And so if you think about the products
that are being supported by deep learning in general:
News Feed ranking has been using
deep learning technologies, video recommendations too,
and these days everybody's talking about
large language models and other generative AI technologies.
All of these are being trained on high-performance GPUs.
You may even remember,
I think this was the very first podcast
that you had, with Kim Hazelwood.
There, she was talking about
the various different deep learning technologies
that are happening at Meta
and how they support the different products
in the company.
And so these GPUs are here
to sort of help advance the state of products.
And at the same time, if you look at how AI is being used across the board, AI is
in a lot of scientific domains.
AI is used in discovering electrocatalyst materials, to find how to more efficiently
store renewable energy, for example.
FAIR has a project called Open Catalyst, and that's exactly what its North Star is:
to use AI in the scientific domain to improve the efficiency of energy storage.
And that has a huge potential to tackle climate change challenges.
You've probably also heard Microsoft has this FarmBeats project
that's using AI to improve farming efficiency.
And I think that's really cool.
This is exactly how computing can help
tackle some of the most important
sustainability challenges facing the world.
Yeah, so that's really interesting.
So first, like a side note is AI is everywhere,
right? So I feel like I know all these people who are not technologists, who are not technical
people, not in our field, and they're using AI for whatever. I have people who are doctors,
and they're using AI in their research for helping elderly people know when they need to go see a
doctor or whatever. It's all over the place. And so I remember when I was in grad school,
I had this random idea to do like a wacky paper, which I didn't do. And it was like,
how do we reduce power consumption in computer architecture? And it was like, oh, we can reduce
power consumption in computer architecture by stop running a million simulations every day to,
to do research on how to reduce power consumption. Because like, just that,
we're like, we wasted so many cycles. Oh, send out 10,000 jobs. Oops.
I forgot to put a print here, kill it all. And then do it again. And so,
so there's that. And so what you were just saying just now a little bit was
like,
we're using all this compute to potentially do research on how to improve climate situations, right?
So like, one possible way to do it is to like not do it at all. And I know that that's like
probably not what we're going to do. But you asked the question, why do we need all this
compute? And part of the compute is to figure out how to maybe use less compute or use the
compute more efficiently.
How I would like to think about this problem space is that if climate change is what we recognize as the most important challenge facing the world and facing this century,
then we must look at where carbon emissions are coming from, or where greenhouse gases' global warming potential is being generated.
And if you look at where carbon emissions are coming from, there are two significant contributors when you look at the overall industry sectors.
One is from the agricultural industry. The other is coming from the energy usage of heating and cooling
of buildings in general. And this is a place where I feel like computing can really help
to drive down the carbon dioxide that's being generated, or the other chemical gases that are being generated.
And so a lot of our colleagues are working on these problems.
But in order to advance the state of the solution space,
it will require a lot of compute.
And when it requires a lot of compute, then it's going to demand a lot of electricity or other type of energy.
And I felt like that's where there is a very similar problem that we are facing in the AI industry.
I think there are a lot of problems that can be accelerated by the use of AI, and therefore
it is propelling a lot of research that goes into pushing the frontier of AI capabilities.
But at the same time, the compute demand and the energy demand of this AI research phase
is becoming so significant that it's not something
that we can easily ignore anymore.
And so wouldn't it be nice if we can
make computing a lot more sustainable,
make it a lot more environmentally friendly,
while at the same time letting computing
solve some of the most important challenges that we are facing in the world.
Right, I think you painted a great picture of the capabilities and the promise of AI,
and at the same time it's counterbalanced by the energy demands of AI itself in the near term.
And maybe we can double click a little bit on that. So
perhaps we can paint a picture of what does the energy footprint look like for AI currently?
How has that been trending over the last few years? And what do the trend lines look like
moving forward? And perhaps we can expand on what are the components that go into the energy use
itself? In particular, you've written papers and blog posts on the operational energy
use and there's been a lot of focus on that and then there's the embodied carbon use as well.
So perhaps you can educate our listeners on these various components and how these things are
evolving as time progresses. Yeah, happy to share a little bit about our journey in these green AI studies,
in the sense that about three, four years ago,
a couple of us became really interested in
looking at the big picture
of where the energy of AI
and computing in general is being consumed
and where the carbon footprints of AI and
computing in general are being produced.
And in this particular characterization analysis, we must start with a methodology and we can
take the lifecycle analysis as an example.
So when we start looking at, for example,
the lifecycle emissions of a computing device,
we can use the lifecycle analysis methodology.
We can look at: there are generally four important phases
of computer systems.
Usually the product is produced
in the manufacturing phase,
and then this product will be transported
to a location where it
will be used, and then there is the product use phase. Eventually, at the end of the life cycle of
this product, in the case of a computer, it will be recycled or upcycled for a second life. So when we
look at computing's carbon footprint, I guess the primary phases that we must focus on are the manufacturing
phase as well as the product use phase. The manufacturing phase is where the embodied
carbon or manufacturing emissions of the computer devices come from. This is what we call the
embodied carbon footprint. Now, when the device is designed, manufactured, and installed into,
say, a data center, or shipped into a user's hands, then during the life cycle of that computing device,
there is also the corresponding operational energy consumption, or the operational carbon footprint.
Right, and when we started looking at the breakdown of the embodied carbon and operational
carbon, we were quite shocked to find that the ratio of embodied to operational carbon
footprint for consumer electronics like a smartphone is about an 80 to 20 percent breakdown.
Most of the carbon footprints of your smartphone is coming from manufacturing emissions.
There's so much energy required to manufacture that smartphone that we have in our pocket.
And what does that include? Does that include like the mining of the precious metals or that kind of stuff?
Or is that really just the manufacturing?
Or is it even just the sourcing of the materials?
Well, really good point, Lisa.
So if we start looking at these supply chains,
they're actually really deep.
And so mining the minerals that go into these smartphone SoCs
will be part of the embodied carbon emissions.
So it really depends on what is being captured
in the analysis itself.
But if you look at Apple's sustainability reports
for iPhones, you will find that about 80% of the carbon
is coming from manufacturing or embodied carbon,
and about 33% is coming from semiconductor IC manufacturing.
These are the manufacturing of the application processors,
of the DRAM, of the NAND flash storage devices that goes into the smartphone itself.
There are all the additional components that need to be manufactured,
for example, the display,
the battery, and all of that, which are also accounted for in the embodied carbon of the smartphone.
And the operational usage is less than 20%.
Is that because people only use their phones for two years before they upgrade?
That's part of the question.
Exactly how often we upgrade our phones will have implications on how the upfront embodied carbon cost can be amortized.
And so, in fact, on average, I think each of us will use our phone for two to three years only.
And if we were able to extend the lifetime of the smartphones, it can amortize that upfront embodied carbon cost that goes into the manufacturing.
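To make the amortization arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every constant below is an illustrative assumption for a generic smartphone, not a figure quoted in the episode:

```python
# Back-of-the-envelope lifecycle carbon for a smartphone.
# All constants are illustrative assumptions, not measured data.

EMBODIED_KG_CO2E = 56.0        # assumed manufacturing (embodied) footprint
ANNUAL_CHARGING_KWH = 5.0      # assumed yearly charging energy
GRID_KG_CO2E_PER_KWH = 0.4     # assumed grid carbon intensity

def annual_footprint_kg(lifetime_years: float) -> float:
    """Average kg CO2e per year of ownership: amortized embodied + operational."""
    amortized_embodied = EMBODIED_KG_CO2E / lifetime_years
    operational = ANNUAL_CHARGING_KWH * GRID_KG_CO2E_PER_KWH
    return amortized_embodied + operational

for years in (2, 3, 4):
    print(f"{years}-year lifetime: {annual_footprint_kg(years):.1f} kg CO2e/year")
```

With numbers in this ballpark, the embodied term dominates (roughly the 80/20 split described above), and extending the lifetime from two to four years nearly halves the average annual footprint.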
Gotcha. Well, I'm on the fourth year of my iPhone SE because I do not need the latest
and greatest. So maybe I'm doing my part.
Right. So you talked about operational and embodied carbon use. And as you rightly pointed
out, you can't optimize or improve what you can't measure. So the first step is of course measuring it, and that has shed some light on the
different components of our carbon emissions, across our supply chain plus our operational use.
Maybe switching to the other side of the coin, now that we've done this measurement
and we've done this characterization, how do we think about opportunities to improve this? How do
we improve the state of the world moving forward?
What are opportunities that you see maybe across various layers of the stack or various
parts of the entire pipeline in the computing industry?
Yeah, I have to share a little bit more on this.
So I would say kudos to Apple, who does a really good job of quantifying the embodied
and operational carbon footprints of their own devices.
But really being able to quantify carbon emissions is a really complex process. And I don't know if
our computing industry, or just the computing community, is ready for that. We don't currently
have the tools, we don't have all the metrics there for us to understand, first, what are the carbon emissions
of computing devices, not to even mention hyperscale data centers.
But I think this is something that the computer architecture community has a lot of experience
with.
We are very good at building performance models that help guide design by optimizing
for the performance of our system hardware. We have
a lot of experience building power models, to quantify power and then energy, in order to guide
energy-efficient computing. But carbon emission is a step more complex, in the sense that depending on how energy is being generated, the overall carbon
emissions of the system usage are going to vary.
And so I felt like this is where our community can invest a little bit more: understanding
how to quantify the carbon emissions of computing in a more systematic way, and then building the tools such
that we can replicate and enable others to do the measurements systematically. And as what you say,
Suvinay, the first step to optimize is actually to be able to measure. Once we have the tools, once we
have the methodology, once we have the metrics, then we can go figure out how to optimize our systems for minimizing carbon footprint in this particular case.
What are some challenges and maybe pitfalls that you see in the process of collecting
these metrics, right?
Like both in terms of our methodology or in terms of the metrics that we choose to annotate
in this particular exercise?
Any particular drawbacks that you've seen?
And how can we as an industry improve that?
And what tools do we need or what alignment
do we need across the industry to enable that?
Yeah, happy to share my thoughts on this aspect a little bit
more. We talked about how there
is the operational carbon emission that
gets produced during the product use phase,
or during the operational phase, of a computer. And there is the manufacturing or embodied carbon
that gets produced during the making of the hardware devices themselves. I feel like our community
hasn't agreed on whether the goal should be minimizing operational carbon
footprints or minimizing embodied carbon footprints or minimizing the total of
the two. And even there, I felt like there's a lot of debate going on in our
community. And so this is where I felt like a lot of conversation, a lot more
research, is going to help us decide under what circumstances it makes sense to reduce
total carbon, under what circumstances it makes sense to reduce embodied carbon, and under what
use case it actually makes sense to just focus on reducing operational carbon.
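One way to frame that debate is as a choice of objective function. As an editorial sketch (not a formula from the episode), the total lifecycle footprint of a device can be written as

```latex
C_{\text{total}} = C_{\text{embodied}} + C_{\text{operational}}
                 = C_{\text{embodied}} + \sum_{t} E(t)\,\mathrm{CI}(t)
```

where E(t) is the energy drawn in time interval t and CI(t) is the carbon intensity of the electricity at that time. Minimizing the embodied term, the operational term, or their sum are genuinely different objectives that can favor different designs.

Got it. Can you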
shed some light on what data is actually available right now or what data is reported by companies?
For example, you talked about how Apple does publish their data on the operational versus embodied carbon use.
How about across the industry, across different companies? What is the current state of the world?
What can we improve in terms of reporting and just data that's available for people to look at?
Yeah, so on the consumer electronics side currently, I think the consensus is to use the lifecycle analysis, or LCA, methodology to understand the carbon footprint. If you look at the publicly available sustainability reports by companies like Google, Meta, Microsoft, you will be able to find that these companies use a different methodology that's defined by the greenhouse gas protocol, the GHG Protocol. In the GHG Protocol, there are three different scopes of
emissions that are part of the reporting structure. There are scope 1 emissions, scope 2
emissions, and scope 3 emissions. Scope 1 emissions are the direct emissions
produced by energy use on site. So for example, the diesel generators that are used as backup power in data
centers will be producing scope 1 emissions during a power outage. Scope 2 emissions
are indirect emissions from electricity usage. So for example, the power grid will be generating
electricity that would then eventually be used by the data centers run by Meta, Google, or Microsoft.
And that will be Meta's scope 2 emissions, the indirect emissions.
Finally, there are scope 3 emissions, which account for the upstream and downstream emissions.
Therefore, the semiconductor manufacturing emissions
that go into making any IT equipment that eventually
goes into hyperscale data centers
are scope 3 emissions.
Now, taking Meta's sustainability report
statistics, we'll find that the scope 2 and scope 3 emissions for computing
equipment are about an equal split.
What that's saying is that the embodied carbon for the data center infrastructure is about
the same amount as the operational scope 2 emissions.
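As a quick mental model of the scope accounting just described, here is a hedged sketch: the scope definitions follow the GHG Protocol as explained above, but every quantity is invented for illustration:

```python
# Toy GHG Protocol bookkeeping for a hypothetical data center operator.
# Scope labels follow the GHG Protocol; all quantities are invented.

from collections import defaultdict

ledger_kg_co2e = [
    ("scope1", "diesel backup generators during an outage",      1_000),
    ("scope2", "purchased grid electricity for servers",       500_000),
    ("scope3", "semiconductor manufacturing of IT equipment",  480_000),
    ("scope3", "transport of equipment to the data center",     20_000),
]

totals = defaultdict(int)
for scope, _source, kg in ledger_kg_co2e:
    totals[scope] += kg

for scope in ("scope1", "scope2", "scope3"):
    print(f"{scope}: {totals[scope]:,} kg CO2e")
# With invented numbers like these, scope 2 and scope 3 come out roughly
# equal, mirroring the split described for computing equipment above.
```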
So this aggregated carbon emission data is available.
The challenge comes from: how do we do a better job of breaking it down for the different
processors that we bring into the data centers?
How do we break it down for all the DRAM memory devices?
Can we break it down even further
for all the storage equipment?
If we are able to do so,
if we have high quality data to do so,
then we will be able to think about
how to provision data center computing devices
to minimize carbon footprints
or to figure out how to trade off operational carbon with embodied carbon,
and how to build a data center that comes with the minimum total carbon footprint, for example.
So that's really interesting, Carole. And it's making me think a little bit about the notion
of transparency, except in this case, the scope is super large, right?
So, you know, back when we were coming up in the early days,
you would have the notion of being really transparent
or not about your microarchitecture, right? There's your special sauce, and that's what makes
this particular microarchitecture extra fast, because you did this to the fetch unit, you did
that, and the manufacturer would have to kind of maybe walk this fine line
of being able to say, hey, here's the programmer's manual, here's the thing, here's the way that you
need to write the code in order to make your code faster. And they'd want to do it in a way that
lets people actually run their code faster, but not give away so much that it explains exactly how
the microarchitecture is, you know, better than somebody else's, for example.
And in this case, it seems to me like, you know, of course, in order to have all these tools and
all these metrics, you want radical transparency all the way up and down the supply chain so you
can make these trade-offs appropriately. At the same time, I can imagine many of these companies
up and down the supply chain may or may not want
to open the covers and explain,
oh, this is how we do it, this is how we don't do it.
And that makes it very hard to sort of do
the top level optimization, right?
So like, if you imagine you don't understand anything
about a microarchitecture, you can't tailor your code
to make it better, right?
So in this case, it's like, you don't understand anything
about how the embodied carbon is being produced throughout your supply chain.
It's very hard to make trade-offs.
So how are you thinking about this sort of challenge?
I can imagine.
I mean, even within Meta, like everything you were just saying there, there's scope
two and then there's scope three, which is components that are being sourced from
elsewhere, right?
So can you talk a little bit more about that?
Yeah, this is a really great question, in the sense of how much transparency helps, and really how
much transparency do we need to actually minimize the carbon footprints of computing in
general. I think there is actually a lot more that we can do; what we have now is just non-existent, right? When we make a decision
about what particular SKU of CPUs or systems we want to bring into the data center, we will have
a way to figure out what is the performance improvement, what is the energy efficiency gain
I'm going to get from this next generation of hardware. And then we get to make that decision based on some data, based on some trade-off between
performance, cost, energy efficiency, and others.
But right now, we are missing carbon in this conversation.
And the reason why we are missing the information on carbon footprints, on either the manufacturing side of things or the operational side of things, is that
it is not yet a metric. There isn't a metric that people know and use to report carbon emission
results. It is not something that hardware vendors collect data for. They may know, but they may not measure
that. And therefore, it is not the kind of information that can be shared easily with
users or customers of the hardware devices. So I guess once we start getting into a cadence of
measuring the emissions across various different levels of
the supply chain, then when customers ask about, hey, what is the environmental footprint of this
particular hardware equipment, that answer can be provided. And that answer then can be factored in
in the purchasing decision. It may also be factored in in our design space exploration decisions as well.
But I guess one step at a time is perhaps we should start with understanding how to measure the environmental footprint of computing.
And second, when we are going to procure hardware equipment into data centers, for example,
we will have that piece of information and we get to decide how to use that piece of information.
And then finally, once the information is becoming stable, it can be supplied in a consistent
way, then perhaps we can then start thinking about how to optimize the computing infrastructure
with that information in mind.
Right, makes sense.
I think motivating the entire industry and setting up incentives so that this reporting becomes valuable across the entire chain sounds like a good direction to pursue towards enabling this.
And clearly, there are still gaps in this overall ecosystem that we can work towards as an industry.
While we are articulating the gaps, and it's certainly something to look forward to in terms of improving the state of the world, maybe we can also reflect on some of the wins that we have had, because as an industry, a few years back, for example,
we were focused on operational energy use,
and the conversations in the community overall
has translated into meaningful impact.
So just as a way of inspiring and motivating the community
to continue working towards this and bridging these gaps,
maybe we can reflect and think about what are some of the wins
that we have had over the last few years by shining a spotlight on maybe some aspects of this overall ecosystem.
In particular, I think the operational energy use is something that has received a lot more
attention. And so there's been a deliberate thrust towards improving this from various companies.
Perhaps you can share some success stories or good case studies in this particular realm on how we did that,
what has been the material impact of that maybe over the last few years in improving the energy
efficiency at least in one end of the spectrum across the different components that we have in
this space. Sure, I guess maybe I can start with developing metrics and developing tools to enable the community to be able to measure
and quantify the embodied and operational carbon footprints of AI, or computing systems in general.
So as I mentioned, when we started this green AI journey, this sustainability journey,
we immediately ran into the bottleneck of not having the tools to help us quantify and estimate
where the carbon footprints are, and what's the biggest bottleneck of computing's carbon footprint.
And so we started with the initial work on something we call Chasing Carbon. There,
we started looking at just understanding where the carbon footprint comes from for data center computing.
And there, we also looked at what the carbon footprint looks like for consumer electronics like smartphones.
And we also dug into semiconductor manufacturing's carbon footprint, its carbon emissions in general. And that gave us some rough idea about how significant
semiconductor manufacturing's carbon footprint is. And in order to
equip our community with a tool, we went ahead and developed a first-of-its-kind
carbon modeling tool, which we call ACT,
which stands for Architectural Carbon Modeling Tool. The idea here is that if we can take a look
at what the semiconductor manufacturing emissions are for logic IC manufacturing, for various DRAMs across different technology nodes, and for various different
storage technologies, then perhaps we can build this carbon modeling tool that can be
used to guide the design space exploration for hardware design. And we used AI as an application use case, to look at what happens if I provide this suite
of workloads with some performance target in mind. And then, if I were going to design
a new hardware system to minimize lifecycle emissions, along with other optimization objectives like performance,
power, and cost as the design dimensions, what kind of design would I get if I were putting
carbon in as a first-class design citizen? And it was a very interesting journey as we went through this analysis.
First, we figured out that, well, we don't actually have a carbon metric that our community
has consensus on in order to optimize for.
Should we minimize embodied carbon, or should we figure out how to minimize the total carbon, which in this case could be either the ratio between embodied and operational carbon footprint, or the sum of embodied and operational carbon footprint? Like, what are we reducing? What are we going to optimize for? Regardless, we built the carbon modeling tool. And we describe the tool itself in our ISCA 2022 paper.
We open-sourced the tool, which we hope can help the research community continue pushing these directions forward.
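For a flavor of what such a tool computes, here is a heavily simplified sketch loosely in the spirit of ACT's embodied-carbon model: fab energy, direct fab gas emissions, and raw materials all scale with die area, and the total is amortized over yield. The structure is a simplification of the published formulation, and every constant below is a placeholder, not a value from the tool:

```python
# Simplified embodied-carbon estimate for one good logic die, loosely in
# the spirit of ACT (ISCA 2022). All constants are placeholder assumptions.

def embodied_carbon_kg(area_cm2: float,
                       yield_frac: float,
                       fab_kwh_per_cm2: float,
                       fab_grid_kg_per_kwh: float,
                       gas_kg_per_cm2: float,
                       material_kg_per_cm2: float) -> float:
    """kg CO2e to manufacture one functional die."""
    per_cm2 = (fab_kwh_per_cm2 * fab_grid_kg_per_kwh  # fab electricity
               + gas_kg_per_cm2                        # direct fab gases
               + material_kg_per_cm2)                  # raw materials
    return area_cm2 * per_cm2 / yield_frac             # amortize over yield

# Hypothetical 1 cm^2 die with 87.5% yield on an assumed fab energy mix:
print(f"{embodied_carbon_kg(1.0, 0.875, 2.0, 0.5, 0.3, 0.5):.2f} kg CO2e")
```

Sweeping parameters like the fab's grid carbon intensity or a technology node's per-area constants is what lets a designer ask carbon questions during design space exploration.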
But at the same time, being in industry now, I guess we do have a team of colleagues on the sustainability team, and also others,
who became very interested in understanding and being able to advance the tool itself.
So in parallel, there is this conversation through the imec sustainability program, which
many industry companies are part of, including Google, Microsoft, Amazon,
Apple, and others.
The idea here is that in addition to this research tool, it would be really good if
we have a high-quality production-grade carbon modeling tool that everybody can use to quantify
the embodied carbon of IT equipment that's installed in any data centers.
Right. And so imec then had this net zero program.
The idea for them is to be able to instrument the fabrication process at various different semiconductor technology nodes. And then they will be
able to provide a tool that can be used by all the manufacturers, by all the hyperscale data center
operators, to quantify the embodied carbon that goes into the data center equipment.
And so there is that happening. And then also, through internal collaboration within Meta,
we have colleagues who are building and operationalizing
carbon evaluation methodologies
that can be used for understanding the accounting
and understanding the design space trade-offs internally.
I think tools and methodology are an excellent example of things that, you
know, improve the entire community.
It takes us from zero to one and enables an entire community.
So I'd like to essentially highlight that that's a great contribution.
In particular, I think just like we've had power modeling tools, like Orion or
DSENT or CACTI, I think there
are both components of the equations
that go into various factors, as well as, I think, constants,
which is sometimes underappreciated.
For example, in the case of power modeling,
it could be device factors like capacitances,
frequency of operation of devices,
and so on that help us understand power modeling.
I'm sure in a similar manner, the ACT tool that you've put out has a variety of different
constant factors on what is the carbon footprint during the manufacturing process for certain
components in the ecosystem.
Just providing that range and, hey, these are the different factors that you need to
consider and templatizing that is a great value add to the community because now people
can start reporting those particular metrics.
They have a broad sense of, okay, what's the rough rule of thumb on where does this
particular metric land in terms of range and so on. So I think that's an excellent educational
resource as well, but also helps essentially move the community into exploring this problem really
well. Tools are an excellent avenue to enable this kind of research and further insights. Yeah, I hope that this first step is
helpful, but I also want to double down on our prior conversations about, I think a lot of these
modeling tools are going to require high-quality data. And so I do think that our industry has a very important task at hand,
in the sense of: how do we collect higher-quality data? And how do we measure the carbon
footprints of computing across this entire supply chain? And I think the more that we know how to
measure, and the more data we can provide, the more it's going to help improve the prediction
quality of this carbon modeling tool, or of any modeling tool that we build to guide the
design space exploration. Yeah, that's really interesting because I think, you know, one of
the things that was in vogue when the power tools were sort of new, and a ripe area of research, is people would enjoy finding wrong
assumptions inside of them and saying, like, oh, you know, gosh, there's this thing that's wrong in
whatever tool, CACTI, Wattch, it didn't matter which one. People would love to find those wrong
assumptions. But the fact is, when you take that step, you do need to bake in certain amounts of assumptions.
And like over time, you know, they get found and then the tool improves.
I am curious about this, like some of the assumptions that are currently being baked in into this ACT tool, for example.
Because I know we haven't talked as much about operational carbon right now because of your stat that said, gosh, the carbon is very much
dominated by this embodied carbon. But even in the sense of operational carbon, what are some
of the assumptions about what that usage is? Like, what is the assumption on how much load
is being used over the lifetime, for example? So, you know, if you assume that a server is being
run at 100% load, 100% of the time, or if you assume a server is being run at 10% load, you know, we've had a lot of data at various conferences that show that a lot of servers are running at low load a lot of the time.
So what kind of stuff is baked into the tool operational usage?
Yeah, great question.
Maybe we can use the smartphone example that we talked about previously.
So one of the reasons why most of the carbon is in the hardware for smartphones is because of users' usage patterns.
So imagine: we do not run SPEC workloads on our smartphones;
really, the usage pattern for smartphones is
sporadic user activity. Most of the time, the smartphone is idle. And the smartphone SoC is
designed for maximizing the energy efficiency during those sporadic user activities. And in order to do that, I guess Apple probably devotes tons of accelerators in its SoC in order to maximize that operational efficiency.
And it is that design decision that contributes to 80% of the carbon footprint going into embodied carbon, with only 20% of the carbon footprint
of your smartphone coming from operational carbon.
Just to make sure I understand you there, you're saying that essentially, if we used
our smartphones differently, where we had a constant load, then there would not be the
need to have specific accelerators to handle spikes. That is what leads to more embodied carbon.
So for example, you need an NPU to be able to do this
because when somebody wants something, they want it fast.
Because when the user actually picks up their phone,
they expect an immediate response,
even though most of the time it's not doing anything.
So I think what I heard you say is essentially,
there's all sorts of accelerators to handle
the various little spikes in activity that people use.
And as a result, just having all that kind of componentry, if you just had like one little
CPU on the SoC, it would not be so much embodied carbon, but it's because there has to be
accelerators to handle all the different things that people do with it and want it done fast
because they've picked up their phone. That's what leads to the embodied carbon. Is that fair?
That's one contributing factor. The other is that our smartphones are constrained by battery capacity. And so not only do we want to minimize latency,
we also want to minimize energy use. And therefore, again, a similar situation as what you brought up, Lisa, is that we then incorporate all these
accelerators to maximize energy efficiency, to reduce the latency, to reduce the overall energy usage. And it's that design space
trade-off that contributes to the majority of the carbon footprint coming from embodied carbon, with only a small percentage coming
from operational carbon footprints.
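The trade-off being described here can be phrased as a break-even question: an accelerator-rich SoC pays more embodied carbon up front in exchange for lower operational emissions every year of use. A hedged sketch, with made-up numbers:

```python
# When does an accelerator-rich design win on total lifecycle carbon?
# All numbers are invented for illustration only.

EMBODIED_KG = {"lean_cpu": 40.0, "accel_rich": 60.0}   # at manufacture
ANNUAL_OP_KG = {"lean_cpu": 8.0, "accel_rich": 2.0}    # per year of use

def total_kg(design: str, years: float) -> float:
    """Total lifecycle carbon: embodied plus accumulated operational."""
    return EMBODIED_KG[design] + ANNUAL_OP_KG[design] * years

for years in (1, 2, 3, 5):
    winner = min(EMBODIED_KG, key=lambda d: total_kg(d, years))
    print(f"{years} yr: lean={total_kg('lean_cpu', years):.0f} kg, "
          f"accel={total_kg('accel_rich', years):.0f} kg -> {winner}")

# Break-even: (60 - 40) / (8 - 2) = ~3.3 years. Shorter device lifetimes
# favor the lean design; longer lifetimes favor the accelerator-rich one.
```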
So that's a super interesting trade-off
because for so long, we've been drilling into like,
we wanna reduce our energy,
you just reduce energy usage,
reduce energy usage, reduce energy usage,
because that is the green thing to do.
And what you're saying here is that if you zoom out,
the very fact that now we have to manufacture
four, five, six, seven different accelerators in the SoC,
that is actually potentially less green.
And maybe if we left things well enough alone
and just put in a beefy CPU in there and said,
hey, we'll burn operational power,
but we only have one component.
That's the kind of design space exploration this tool is intended to seek out.
So what have you found?
Is it better to have...
I mean, with the rough assumptions that you've got baked in now
since you don't have good data, what's it finding?
Yeah, so now that we have the tool,
exactly how to put the tool into discovering the
hardware design space for various use cases is exactly what is becoming available right
now. I think the optimal decision obviously is going to be use case dependent.
On one extreme, we can come up with the greenest hardware design that has the least
amount of embodied carbon, but that comes with the cost of perhaps degraded user experience, and
nobody wants to have a phone that doesn't finish any of the tasks that you have in mind. And so there is this trade-off, and we'll have to
figure out how to come up with a design decision that not only
minimizes carbon, but at the same time meets the performance requirements, meets the power
requirements, meets the cost requirements. And now with this particular tool, ACT,
you will be able to do the design space exploration
having carbon as a metric that you can optimize for.
Cool.
Yeah, I think that makes a lot of sense, yeah.
Having carbon as a first-class citizen,
in the same vein as performance or energy consumption,
can potentially lead to different kinds of design
choices and optimizations for different use cases. That's our hope with this particular tool.
Right. So we've spent a lot of time talking about carbon footprint and so on,
and we recently touched upon the trade-off with respect to user experience and performance
in the particular real-world scenario.
You've also done a lot of work on just improving performance of various accelerators, various computing systems.
So maybe this is a good time to switch gears into that component.
You've looked at various optimizations across different categories
of ML models, including recommendation models and more recently, LLMs.
So how do you think about this computational efficiency?
How do you improve the raw performance itself?
What are some themes that you've observed over the last few years?
And what are some avenues that you see for improving computational efficiency and performance
moving forward?
Yeah, happy to share a little bit sort of my observations of where AI, deep learning, and in general, some of the most important workloads are going.
So I would say, as a computer architect by training, one of the things that we do is workload characterization. Let's understand what are the characteristics of the workloads
that we are designing systems for,
and use that observation to guide design decisions,
be it large-scale, cloud-scale computing infrastructures,
or handheld devices, smartphones, AR/VR equipment.
So looking back, I guess in the first podcast
that was carried out, with Kim Hazelwood,
she provided a very interesting stat
that drove all the investments
into the deep learning recommendation model space:
when you look at where compute cycles are being spent at
companies like Meta, back then, more than 80% of the compute cycles for machine learning
inference were coming from deep learning recommendation models.
And as a computer architect, that sounds like, you know, the most important workload
that we should focus on. If we can make this deep learning recommendation model inference
run much more efficiently, then it will translate to lower energy usage.
It will translate to lower capex cost.
It will translate to overall more cost-effective computing infrastructures.
And that's exactly the kind of exercise
I think we should all aspire to do,
is to look at the big picture,
look at where your compute cycle is coming from,
and use that to guide efficiency optimization efforts.
These days, we hear a lot about large language model
or generative AI technologies in general.
And this LLM is
interesting in the sense that it's a very unique type of workload as compared to deep learning
recommendation models. First, the scale of compute requirements for LLM training is much larger as compared to
the amount of training requirements, or training accelerators, required for
DLRM model training. And that comes with a lot of implications for the kind of computer systems that
we would design the training workloads for. So that's number one. Secondly, on the inference side of things,
the compute, memory, and networking requirements
for large language models also look quite different
as compared to DLRM.
If you remember, DLRM is very embedding memory capacity
heavy, whereas for large language models, the trade-off
is different.
And so if you were in the business of designing AI accelerators,
I think that gives you a little bit of a hint on how to balance the compute, memory, and networking ratios
for the systems that you provision for the various different types of deep learning workloads.
And I think as we were talking about the
energy demand of AI and how that's a really significant amount, I think this is where
efficiency optimization can really help. If you think about
how to bend the resource usage curve of AI in general, if we can be more efficient
across the entire system stack, not only will it translate into lower operational energy consumption,
it will also have the impact of much lower operational carbon footprints.
So I think there is a lot that we can do to bend AI's energy demand by just focusing
on making training and inference run more efficiently.
But there's also a lot more that we must do beyond efficiency itself, because
efficiency only addresses the operational phase
of the computing infrastructure.
There's all the additional embodied carbon
that goes into building the hardware itself,
into developing the data center infrastructure itself.
Right.
I think that's a point well taken.
And as you said, it requires the entire industry
to come together.
Now, one of the other efforts that you've
been involved with that involves getting multiple players
from industry to sort of align and move forward
is the MLPerf effort under MLCommons.
I think the original use case was just
to get a standardized set of benchmarks
so that everyone can agree on what they're actually comparing performance for.
And of course, MLPerf has evolved with the years, you know, with introducing like inference
based workloads, training based workloads.
And there are, you know, experimental and research tracks as well these days.
So perhaps you can tell us a little bit about what's happening in the MLPerf community,
what's on the horizon in this
constantly changing landscape of AI and accelerators and compute.
Yeah, happy to touch a little bit on MLPerf also. I felt like there is a lot of synergy between what
we've been talking about here and also with the spirit of MLPerf. So maybe a little bit background on MLPerf itself. This is an effort that started
in around 2017, 2018. At the time, there were just so many different AI accelerators from startups,
and various different companies were developing their own AI accelerators as well. It was really hard to make apples-to-apples comparisons across these different AI accelerators.
I think at the time, I remember Microsoft was looking into an FPGA approach.
Google was developing their TPUs, there were NVIDIA GPUs, and from Meta we heard about the MTIA accelerators more recently. And there wasn't
consensus about how to compare these various different AI accelerators or system offerings
in a systematic way. And the north star of MLPerf is to basically develop a suite of benchmarks that we can use to make apples-to-apples comparisons across all the systems that are out there.
And as Dave Patterson shared, if we can put a competition out there, that's one of the best ways to drive
forward progress for the computing industry.
And therefore, tons of work went into the creation of the MLPerf benchmark.
And today, looking at the number of submissions five, six years later, there have already been six or seven thousand
submission results that went through the MLPerf benchmark suite. And that is a really good way;
open competition is a really good way to advance innovation in the AI system design space.
And so I'm really proud that we've come a really long way to first build the first iteration
of the benchmark. And given the fast evolution of our machine learning industry, there's so much
work that goes into upgrading each iteration of the MLPerf benchmark in order to sort of drive the
innovations in the AI system space.
It's really cool.
So question for you.
You just touched a little bit on the evolution
just during the lifetime of this podcast,
which I guess is now unbelievably four years old.
But in our very first one, you know, we talked to Kim.
She talked about how there was so much deep learning recommendation model work going on at Facebook. And now, obviously, LLMs are the new kid in town. And like you just said, their characteristics are rather different from the characteristics of deep learning recommendation models. So with MLPerf, how do you adapt to these? Because the landscape changes so fast, right?
SPEC doesn't change that fast.
MLPerf, I assume, changes very fast.
That's one question.
And the second question is, even in the sort of classical computing, computer architecture
world, we had things like spec, but then we also had like HPC workloads, where the intent
is these are much larger workloads where there's a lot of intercommunication between different
nodes.
And for things like LLMs and for some of these really large models, I assume they have to have
that sort of scope as well, where it's like the intention of the kind of system that they're
running on is different. Do you accommodate both of those things in MLPerf? How do you
deal with the vast range and scope and size of these different ML models? Yep, great question.
One of the most important features for any benchmark that we will build
is it has to be representative of the current use case,
the use case that's in production.
And this is something that the MLPerf team takes to heart. One example of how this is embedded into the various different iterations of the MLPerf benchmark is DLRM.
The first version of DLRM was brought to MLPerf in the 2019, 2020-ish timeframe. Then there was a second version that was built
as a collaboration between Meta, NVIDIA, Intel, Google,
and others is to incorporate this additional component
called the cross network.
And the whole point is to make sure that the benchmark
is representative of what is being observed
in production environment.
So then, two years down the road, there was the second version of the DLRM benchmark
that got incorporated into MLPerf. And then, just as we speak, we are working on version number three,
and that is incorporating a generative component into this next version of the DLRM benchmark. And Lisa, you mentioned everybody
is talking about large language models. Is MLPerf also incorporating large language models into
its suite? And I guess the TL;DR is yes. The MLPerf training and inference benchmarks had a GPT-3 application as part of the training benchmark.
And also more recently, Llama 2 was incorporated into the inference benchmark suite as well.
To my knowledge, there's a lot of work going on to look at even mixture-of-experts models as part of the training and inference benchmarks.
Cool. That's awesome. I mean, because, you know, when I think
about the early days of my computer architecture career, the iterations of the SPEC benchmarks were
like years and years apart. And so the fact that you're able to put out so many iterations and just
adopt the latest and greatest with each one, that's really, really great for the community.
So kudos to you for doing that
and explaining it so well for our listeners.
And so maybe this would be a good time
to get back to your roots.
I mean, I think, you know,
I used to write profiles of women in computer architecture
and I would always ask everybody,
like, what's your origin story?
Here, let's do that for the podcast audience too. What's your origin story with computer
architecture? How did you get started? You've had quite a varied career, both as a professor and
now at Meta. Yeah, wow, this is going to be from a while back. I was an undergrad at Cornell,
and I remember I actually was going to have my undergrad thesis
be on microelectronics. And at Intel there is this co-op program, which gave me the opportunity to
do two internships with Intel. I remember vividly that after my first Intel internship, I went to Jose Martinez at Cornell, who turned out to be my
undergrad advisor, and I was like, oh, I want to learn more about microprocessor design. And from
then on, I became super interested in just what goes into processor design. And then I went to Princeton for my PhD, where I got to work with
Professor Margaret Martonosi. There, I got to learn about and advance the memory subsystems
for CPUs. And ever since then, I have felt like computer architecture is the bread
and butter of the computing industry, and this is the topic that
I have spent, I guess, almost 20 years of my career looking at. Yeah, that's just a
little bit about how I got started with computer architecture. And you've taken it all the way to
thinking about like embodied carbon for hyperscale
data centers, which I think is super cool.
Like that's what I think is so great about computer architecture.
We all start out, you know, learning Hennessy and Patterson and, you know, about how to
do out-of-order execution.
And then somehow a lot of us end up all over the place, right?
Like Suvinay's working on TPUs, and I did a lot of work on data center type stuff.
And now here you are thinking about both MLPerf
and embodied carbon type work.
It can take you all sorts of places.
It does seem like this carbon stuff is in its early stages.
So presumably you're going to continue working on that.
Like what's exciting you going forward
with respect to that work?
Yeah. So we didn't have a lot of chance to talk about the role of AI. But really, if you think
about what we have talked about, AI is kind of the hero use case. If we were going to
find ways to reduce computing's carbon footprint, we must start with AI's carbon
footprint. That's where a lot of the energy usage is coming from. And as I mentioned, I think
efficiency is more important than ever. And the role of efficiency is to reduce the
resource demand curve, which is increasing very quickly.
And so we must focus on efficiency optimization.
But focusing on efficiency optimization is only one component.
We have a lot of opportunities to go beyond that.
Something we didn't talk about very much is how do we make computing more flexible in order to facilitate better coordination with the power grid?
And I believe this is a new direction that deserves a lot more research efforts.
How do we instrument demand response between data center computing
and the power grid operator, such that compute workloads can happen
either during times of the day with lower carbon intensity of electricity, or computing can happen more
flexibly in order to meet the demands of the power grid operators?
Building that flexibility into the design of computing infrastructure can be very helpful
for various different reasons.
Sustainability is one of those.
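Here is a minimal sketch of the kind of carbon-aware flexibility being described: given a forecast of grid carbon intensity, a deferrable job is shifted to the cleanest window of the day. The forecast values are invented; a real system would consume signals from the grid operator or a third-party carbon-intensity service:

```python
# Carbon-aware scheduling sketch: start a deferrable job at the hour of
# day with the lowest forecast grid carbon intensity. Values are invented.

forecast_g_per_kwh = {  # hour of day -> forecast intensity (g CO2e/kWh)
    0: 200, 3: 150, 6: 300, 9: 450, 12: 400, 15: 420, 18: 500, 21: 250,
}

def cleanest_hour() -> int:
    """Hour with the minimum forecast carbon intensity."""
    return min(forecast_g_per_kwh, key=forecast_g_per_kwh.get)

JOB_ENERGY_KWH = 100.0  # assumed energy of the deferrable job
hour = cleanest_hour()
kg = JOB_ENERGY_KWH * forecast_g_per_kwh[hour] / 1000.0
worst_kg = JOB_ENERGY_KWH * max(forecast_g_per_kwh.values()) / 1000.0
print(f"Run at hour {hour}: ~{kg:.0f} kg CO2e "
      f"(vs ~{worst_kg:.0f} kg at the dirtiest hour)")
```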
I think that's an excellent thought.
I think seemingly simple ideas can have like fairly substantial impact.
For example, even with respect to TPUs, we can think about scheduling large-scale training jobs on data centers that, say, have access to nighttime wind energy. And that reduces the
carbon footprint, or at least the operational footprint, while you're running some of these
large training jobs. So a seemingly simple idea, but it does have manifest implications
on the bottom line.
Yeah, and I think it's really important
because there's a lot of renewable energy
that is being stranded in our power grid right now.
And finding ways to improve that utilization
will go a long way.
I think that's a great point.
You briefly talked about, you know,
the flexibility of our
computing ecosystems against the power grid. Maybe I can switch gears back from the technical into the
more professional trajectory landscape. So you've actually donned multiple hats,
with technical themes ranging from microarchitecture, cache replacement policies,
all the way up to the data center. But you've also straddled multiple worlds, being in academia, being in industry.
So perhaps you can talk about how you have managed to navigate these transitions,
how you have managed to be flexible as the entire ecosystem changes around you.
Not an easy question to respond to,
but if I were summarizing it in a few sentences, I guess I think it's
important to embrace where you are in your life right now.
And I felt like, looking back, I have enjoyed being in academia; having the freedom to explore
is one of the best aspects of my academic career.
And you get to explore and advance technology
with some of the best young minds
that's developing their own career.
And I think that's a really great way
to sort of go through my career.
Now, being in industry, especially at FAIR, I feel like I'm in such an exciting
environment that I get to tackle some of the most important real-world challenges at the
intersection between systems, AI, machine learning, energy, and sustainability. And that is such a
tremendously valuable opportunity that I treasure and I focus on these days.
And I get to do that with world-class researchers as well.
And so recognizing where we are, I guess there is a lot that we can focus on to make impact.
Maybe I can push you to sharpen your message a little bit,
like any words of wisdom that you will give to our listeners.
We have students, industry professionals spanning early career
to further on in their careers.
So any words of wisdom that you would have or advice
to our listeners of the podcast?
I would say once in a while, find something that's new
and that you are passionate about, and spend an extended amount of
time focusing on solving that problem. For me, I guess it was roughly every five to six years:
find a new direction that excites me, and I get to focus on solving that problem for, you know,
five to six years and then new problems will come up,
you'll have the opportunity to figure out
what is that particular topic that you want to focus on.
And I guess as you are doing that,
I think it's really important to keep in mind
that solving these problems with a group of people
that I enjoy working with.
And so definitely take the opportunity to also develop others during the journey
and bring people with you.
And that's something I've been enjoying doing,
I guess, when I was in academia
and also now when I'm in research labs in industry.
And then finally, I felt like time is so precious.
There is a lot happening in every single one of
our lives.
And so spend the time intentionally on what is the most important problem that
deserves your attention, and do that with,
you know, the group of people that you are working with, and move together, and make the impact that
you can on the world. That's something that's very meaningful to me. Yeah, that's wonderful, because
I think a lot of times there are all sorts of discussions about how to get more people
interested in computing. Of course, at this stage of our careers, lots of people are interested in computing, but
part of the question has been, you know, how do we make sure that the general image of what we do
is not like a bunch of people with coke-bottle glasses sitting in basements typing on computers
and just hacking, right? And it
turns out that there are many sorts of real-world problems that can be solved through our
efforts. And the fact that you can find such meaning in your work, your technical work, is
fabulous, alongside the fact that of course you're a leader in the field, and now as a
director at Meta you lead wonderful people and grow their careers, as well as your grad students back when you were in academia.
So it's been really cool to catch up with you, Carole, and hear how things are going.
And this latest project sounds really exciting and really, really impactful.
And thanks so much for having me here and giving me the opportunity to share my two
cents, and thanks for the great conversations with you, Lisa and Suvinay.
Yeah, echoing Lisa, thank you so much for being on the podcast.
It's been wonderful speaking with you about this very important topic.
And to our listeners, thank you for being with us on the Computer Architecture Podcast. Till next time, it's goodbye from us.