Computer Architecture Podcast - Ep 13: Energy-efficient Algorithm-hardware Co-design with Dr. Vivienne Sze, MIT

Episode Date: September 27, 2023

Dr. Vivienne Sze is an associate professor in the EECS department at MIT. Vivienne is recognized for her leading work on energy-efficient computing systems spanning a wide range of domains: from video compression, to machine learning, robotics and digital health. She received the DARPA Young Faculty Award, Edgerton Faculty Award, faculty grants from Google, Facebook and Qualcomm, and a Primetime Engineering Emmy as a member of the team that developed the High-Efficiency Video Coding standard.

Transcript
Starting point is 00:00:00 Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts. I'm Suvinay Subramanian. And I'm Lisa Hsu. Today we have with us Vivienne Sze, who is an associate professor in the EECS department at MIT. Vivienne is recognized for her leading work on energy-efficient computing systems spanning a wide
Starting point is 00:00:26 range of domains, from video compression to machine learning, robotics, and digital health. She received the DARPA Young Faculty Award, the Edgerton Faculty Award, faculty grants from Google, Facebook, and Qualcomm, and a Primetime Engineering Emmy as a member of the team that developed the High Efficiency Video Coding standard. Today, she is here to talk to us about energy-efficient algorithm-hardware co-design for compute-intensive applications, including video encoding, deep neural networks, and robotics. A quick disclaimer that all views shared are those of the individuals and do not reflect the views of the organizations they work for. Vivienne, welcome to the podcast. We're so glad to have you here. Thank you for having me.
Starting point is 00:01:21 Yeah, we're excited to talk to you. And what's getting you up in the mornings? Okay, so if you ask literally what's getting me up in the morning, I recently got a milk frother, so I'm making these matcha lattes at home. Very excited about that. And, you know, some nice upbeat music. And I'm super excited about meeting up with my students and collaborators, brainstorming and working together, and learning a lot from them. Also, since we're just kicking off the semester here at MIT, learning a lot from teaching. So this semester I'm teaching with Joel Emer on this hardware for deep learning class. I know we'll get into it, but this space moves very quickly, and so often we want to update our slides. But we try to be very conscientious about identifying principles in this space that
Starting point is 00:02:03 we can distill down to lecture material, so that the material doesn't have to change from year to year. So we just went through that exercise, or are still going through that exercise, together. So that's a lot of fun. And also just learning from thinking through the different challenges that we have these days in society, looking at where we can use whatever skill set we have to address some of those challenges. So most recently, looking at things like sustainable computing, and trying to think about whether the work that we do with energy-efficient systems, at the small scale for these battery-operated
Starting point is 00:02:36 devices, can have impact on these large-scale challenges. And yeah, just learning a lot from the people in the community, like Carole-Jean Wu, Udit Gupta, and Bobby Mann, who've been doing a lot of work in this space, and trying to think about whether we can also make an impact from the academic side, given our lens of energy efficiency. So you were talking about a lot of things right there, and I'm interested in particular about this class you mentioned. Because as it stands, things are developing really, really fast in this space, and one of the things that we know is that hardware develops
Starting point is 00:03:10 really rather slowly but the AI algorithms and things develop really quickly and distilling down to principles like that does seem like the best way to sort of have a structure and going forward how much of those principles do you feel like then carry over into things like energy efficient, uh, hardware design, just because, you know, if things are changing, like, are there things that you can carry over so that the, the principles you carry in the hardware design piece also can hold for at least longer than, you know, six months or something? Yeah. So I think in general, I mean, at least my
Starting point is 00:03:46 perception of our research is always trying to find these principles, because I think it's important to, let's say, build an energy efficient, you know, chip or whatnot. But like, I think it's more what we learn from that process of, you know, developing either the architecture or the chip itself? Are there key ideas that can then be used for other future designs for even other applications? Are there ways to kind of generalize those principles? So I mean, like, maybe I just rattle them off now, some of them are, you know, things like, obviously, we've seen a lot in particular in the
Starting point is 00:04:23 deep learning side of things, you know, things like looking at data movement through efficient data flows and exploiting reuse. Those are some principles that, you know, regardless of the type of DNN you might design, those are things that you might want to, you know, think of in the, you know, in the context of maybe we'll talk about video coding later. But like, you know, thinking through in kind of the co-design space one of the challenges might be things like parallelism and so trying to even though you can do that easily in hardware those are some things that might be challenging on the algorithm side particularly if you're trying to achieve a certain quality of results so trying to i think kind of understanding the principle there but then trying to fit it for a new application but still distilling some key kind of idea there i'm trying to fit it for a new application. But still distilling some key idea there. I'm trying to keep it general, but I
Starting point is 00:05:08 think as we go to the specific applications, it becomes more concrete. No, that's a good overview. Maybe we can expand on one of the themes that you talked about in video encoding, for example. We talked about the trade-offs between parallelism, which, of course, lends itself to better hardware efficiency because you can use many of the cores available on your chip. But in some of your early work on video compression, I think one of your observations was that advanced algorithms that you use in video encoding are typically more challenging to parallelize because they try to remove redundant or wasteful work in the effort to get more work efficient.
Starting point is 00:05:45 But in order to do that, they end up enforcing a sequential dependency. So how do you think about the trade-offs between work efficiency, which typically comes with an accompanying amount of sequential dependencies and so on, and hardware efficiency, where typically, if you have parallel computations, you can execute them in parallel and therefore get better speed
Starting point is 00:06:03 ups or better performance? KURTUL KOTAVANI- Sure, Yeah. And I don't know if I'll exactly answer that question, but I think, let me just take a step back in terms of understanding video coding. So give some context. So the, the way in which we compress videos, the like how a video is actually compressed as you're trying to remove redundant information in the video. So for example, if you took it, look at two frames, they could be very similar, there's a lot of temporal redundancy. And so what you would do is, you know, you could predict some of the pixels in one frame based on pixels in the previous frame. So for example, imagine the background, right, doesn't really change, you can just say, Oh, copy these pixels over. And then similarly,
Starting point is 00:06:37 within an image or frame itself, neighboring pixels within the same image are also very dependent. So if you can imagine like, you know, a white wall, right? So like all the pixels, you can predict it from the neighboring pixels, you don't have to send them. And so as a result, you can compress very well with this, you just have to tell them, oh, where do you predict the pixel from? And if you, if there's a bit of error, what is the error, and that tends to be compressed very well. That's great for compression. And really, you know, the whole goal of video compression is to find all this, you know, redundancy and then do the prediction. Of course, the main challenge is that,
Starting point is 00:07:11 as you just mentioned, Souvene, hardware design parallelism is very important for speed. It's also very important actually for energy efficiency. So how I came about this, my PhD was my PhD advisor, Anantha Chandrakasan was really focused on how do you build like very low voltage systems, low voltage means slow. So in order to, you know, make up for the speed, you want to paralyze things. So if you do parallelism, you can also go be much more energy efficient, nonetheless, anyway, so when we look at video coding, it seemed very exciting. But at the same time, the challenge there was that because the algorithms are getting more and more advanced, there was more and more dependencies that were being introduced in order to do or to achieve the compression.
Starting point is 00:07:54 So it became very, very difficult to parallelize. was trying to find ways to break this depend or you know decouple this dependency without sacrificing compression efficiency so you have to have like a deep understanding of the impact of first of all what part of the system could benefit from paralyzation and then in that particular part could you kind of break some of these dependencies or feedback loops in such a way that you still maintain your ability to compress well, but then also be able to paralyze and run quickly? Yeah, so do you see any parallels between that sort of mentality and a lot of the work that you're doing now with things like DNNs? Because, you know, there are, in many ways, there's a lot of parallelsism but then there's also dependencies that you have to take care of and then you know we're seeing work now in the field about you know how to maybe break some of these backwards loops or do some sort of predictive type stuff so that you don't have to necessarily wait all uh wait such a long time to have
Starting point is 00:08:57 everything come backwards oh and one other thing is that you know back in the days when i was like learning about things like media compression it was all all about memory too. Like in terms of a lot of the power consumption that is being used, the energy that's being used, it's like, okay, I got to move this, ship this data around. So if you can save having to read and write to memory, then you can, like you've basically won. And that we hear a lot of the same sorts of themes happening right now in these kind of DNN and ML and AI type workloads, which is we have to ship all this data around, like let's prefer not to read, write, restore and save all this stuff all the time.
Starting point is 00:09:33 And that's how you, that's like the big low hanging fruit in terms of the energy consumption. And yet it's really hard to do. So how has your work in the compression type things served your work now that you're doing now in these TNN type workloads? Yeah, so I think there's a couple of things.
Starting point is 00:09:55 So one is maybe I should mention like the similarities and the differences between the two domains. I think so the similarity is certainly you're still at least looking from an input perspective. When you look at video, it's a very high dimensional or very heavy input workload. So there's certainly a lot of kind of, you know, data movement in terms of data access there. I think the difference,
Starting point is 00:10:20 so in deep learning, you would also have the same thing if you're, let's say, processing images or video. I think there's a couple of main differences so first in video compression most of it is very standardized so you can very much like very very hard code your hardware so meaning that in these video uh compression and decompression algorithms you have you know sometimes you have filters and transforms and all the weights and the coefficients of these filters and transforms and all be hard coded so it's very um simple like very like it's just like very efficient and so the the main challenge there is you're really trying to because it tends to be that the hardware for these video codecs are very
Starting point is 00:11:01 hardwired because they have to be very efficient if you want to do HD real time, which we're all doing these days. You know, the hardware is very specialized, but then also there's not very much sharing of resources of things like, you know, a lot of the memory sometimes is very dedicated to this hardware. So you still have to think about if you want to even have, you know, a couple kilobytes of memory and so on, it's just gonna eat up hardware space. It's very dedicated. So you do still want to minimize the cost of that chip. And so even internally, you might do some compression
Starting point is 00:11:36 within the compression accelerator. So it's kind of a little bit meta. On the deep learning side of thing, how it's very different is that you require a lot more flexibility and you have things like the weights and stuff that are much more part of the workload as well. It's not necessarily just the incoming data. So as a result, there's a lot more data that's moving around in the system.
Starting point is 00:11:59 And so, as Lisa pointed out, data movement is then really key. You really need to think about how to minimize this data movement in order to achieve both energy efficiency and high speed. Of course, the other thing is that it has to be also very flexible because you want to be able to support a wide range of deep neural nets versus in video compression, you typically have one chip dedicated to one standard and you don't need that amount of flexibility or one choice one ip on one chip sometimes these uh chips have multiple ips for different standards so when you start thinking about both flexibility and data movement i think it becomes an
Starting point is 00:12:36 interesting challenge and that's why uh for the work i've been doing with uh joel emmer and then also our student yushin ch, we were primarily focused on looking at efficient data flows where you can be very, we can exploit a lot of data reuse, but we also wanted to have a solution that was very flexible in the sense that regardless of the shape of your neural network, whether it be number of layers, the shape of each layer,
Starting point is 00:13:03 we should be able to find a way to optimize that data flow on our hardware to minimize the amount of data movement. And we needed to account for all data types, not just the inputs, but also the weights. And so I think that's kind of a little bit of the difference between deep learning and the video coding side. Of course, efficiency and compression, I mean, you can apply compression also in deep learning as well, also play an important role. But I think it's more of a tension of efficiency versus flexibility and the variability in terms of the problem that you're trying to solve. Right. I think balancing the flexibility while getting the performance and
Starting point is 00:13:44 efficiency that you want in DNNs has been one of the challenges, especially since the space also moves very rapidly. Expanding on the compression theme itself. So in video compression, as you said, you know, you have different frames. There's a very well understood theory behind what kind of redundancy are you trying to exploit in the frame? And how do you sort of go about systematically designing a compression algorithm to take advantage of those properties. Now if you context switch to DNNs, there are a variety of compression techniques ranging from Vanilla Huffman encoding style compression to things that are more model specific like quantization and more recently things like sparsity which introduce zeros which you can
Starting point is 00:14:20 eventually compress or use to save in terms of compute as well. You can skip the zero compute and so on. So in the DNN realm, it looks like we don't have that strong a theory on how these techniques actually work, like quantization, compression, and so on. And they can have a material impact on the model quality. So how do you think about these techniques that trade off accuracy against performance in the context of DNNs. Is there any differences from the video compression space? And what are the principles behind how do you exploit these techniques in the DNN context? Any pitfalls, any things that people need to watch out for as they employ these techniques
Starting point is 00:14:58 in the DNN space? KASIMA YUSUFENGALALE- Yeah, I think that's a really great question. It's a very challenging question, actually, because, yes, you're right, in the video compression space, it's much more grounded in theory, a lot of signal processing theory. I think the only thing that's kind of a little bit unsettled and harder to get one's hand around is more, you know, when you look at the quality of the output, there's some, you know, human visual perception aspect of it that's a little bit harder to manage but in general in terms of why we're doing each step there's a principle behind it um we have what i think there's like a lot of challenges in deep learning space that i think people are still trying to figure out like you know why do i think there's a lot of work on like the science of deep learning why does it even work and so on and so if it's hard to understand
Starting point is 00:15:42 why it works or how to kind of debug it it's harder to get like a very grounded approach in terms of determining the implications of how you would change the neural network and how it would impact the accuracy like that relationship is much um weaker or what much less clear and so there's a lot of i guess unfortunately a lot of ad hoc and exploration that has to be done in order to kind of, or it's very empirical in terms of figuring it out. There's been two ways that we have approached this. One is more on just understanding from the efficiency perspective. By efficiency, I mean like energy efficiency and then also speed. of how do you, when you apply these techniques that you've outlined, which is, as you mentioned,
Starting point is 00:16:26 quantization, pruning with sparsity, there's just like a compact network architecture, that was the other one, I could come in with like smaller models, how do those impact energy and latency? I think that's obviously an important thing to look at. I think a very kind of first order that people have done in the past is primarily evaluate a neural network in terms of the number of operations and the number of weights. And I think that gives you some idea of the complexity,
Starting point is 00:16:55 but I think what's more important is to try and actually look at the metrics that we care about from a hardware perspective, so energy and latency, and use those specific metrics to drive the design, at least of the neural network itself. So for example, often one might associate the number of operations with latency, for example, but we know that, you know, as computer architecture designers, you know that, you know, it also matters the utilization of your hardware.
Starting point is 00:17:24 So some types or some shapes of the neural network might not map as well onto the hardware. So even though you have fewer operations, you might not get the speed up that you expect, right. And so really having kind of the hardware in the loop there to really kind of drive the design choices that you might make in terms of, you know, your layer shapes, for example, could really be helpful. We had done some work on this with respect to this work called NetAdapt that really put the hardware in the loop in the sense that you would take a neural network, measure its latency and energy, and then use that as an input to iteratively modify the neural network
Starting point is 00:17:57 so that you would hit those energy latency targets. Another prime example of how people simplify the neural networks is through the process of pruning. And so that's basically you set some weights to zero, you remove some of the weights. Traditionally, the approach there is you try and remove the weights that have small magnitude, but just because the weight has, you know, like the number of weights that you move, for example, or the magnitude weights is actually no indication in terms of the impact on energy. In fact, you also should think about things like, you know, the data movement cost, how often that weight is being reused, and also of course the feature map information. So, you know, some work that we did in that space, primarily energy aware pruning, is trying to
Starting point is 00:18:34 kind of use the energy cost to drive the decision in terms of, for example, which layers to prune. We want to prune the layers that consume the most energy first. And that would allow us to get a better trade-off between energy efficiency or energy consumption and accuracy. And so that at least gives us a better trade-off, but in terms of getting the insights, it's still very challenging. On the accuracy point of view, I think that's kind of what motivated me to look at more, and motivated the collaborations looking more the robotics and autonomy aspects in the healthcare space, because there's also this aspect of, you know, if you achieve a certain accuracy on, you know, let's say ImageNet, and you, let's say you made
Starting point is 00:19:14 this trade off, and you drop the accuracy by 1%, what does that actually mean? Like, is that meaningful a 1% drop? Is that big, not big? Is that, you know, so we wanted to look a little bit deeper in the pipeline. So if I was using this neural network to navigate from point A to point B, then I can actually tell if this tradeoff in terms of accuracy is meaningful or not. Or if I'm using this neural network to do some eye tracking for some neurodegenerative disease tests, then I actually have a very concrete um metric of accuracy because you want to you know know how this might impact you know a certain diagnosis and so on so i think from an accuracy evaluation point of view it's good to look at the very specific applications but i completely agree that it's challenging when it comes to neural networks is not as where we don't have a good enough understanding at this moment still of the relationship between complexity and accuracy
Starting point is 00:20:09 and then the accuracy for various applications i was actually curious about that accuracy friend that you're talking about because you're a little bit transitioning into the robotics and the health space there because you know in some sense um like on ImageNet you know you can achieve accuracy of say you know 99 or 100 percent these days like 100 percent right but then but then for something like a diagnosis and the kinds of work that you're dealing with you know and one sense the diagnosis is binary like yes you know we think you have neurodegenerative disease X or no we don't but then maybe behind it, there's some sort of threshold
Starting point is 00:20:47 that comes to like 96, 97% likely to have it or something like that. So in your mind, when you think about the accuracy on these kinds of domains is the accuracy in the binary side, where obviously a 1% error of yes or no is a really huge sort of actual impact to somebody's life, or perhaps like in the, in the, in the layer underneath, where you might be a little bit off,
Starting point is 00:21:12 but then you're, you're, you still meet the threshold, and you still, the diagnosis stays the same, because it's 96 instead of 95. Yeah, I think it's a really good question. I think, so then the question was like, how is the neural network being used for this particular application? And let me just use the robotics one as an example. First, it's a little bit more fleshed out because there's a lot of complexity in the health space, but then the robotic space, I think, you know, some of the things that you use a neural network for is for perception or for tonic navigation. So can you understand your environment? If you use a neural network to detect whether or not like how far an object is, or if there's an object there, the test there is, you know, how, you know, how likely are you going to crash into something basically, to get from point A to point B. And I think often a lot of these things are probabilistic, or if
Starting point is 00:22:03 you should model model in a probabilistic manner. I don't think it's always yes or no completely. And so in fact, actually, some of the work, this is in collaboration with Sertesh Karaman, who's a roboticist here at MIT. So our student, Samia Sudhakar, has been looking at the implications of uncertainty. So there's actually this whole field of looking at,
Starting point is 00:22:22 a neural network will give you a result. But the question is, like, how confident are you on this result? What is the uncertainty around this result? You can't just say, oh, this is like this object is this far away. Like, are you sure? Are you not sure? We should also know that. So there's a whole field of uncertainty. And so I think that part of trying to also measure the uncertainty of the neural network can then help to inform whether like how seriously you should use the output of the neural network in you know in the given tasks that you're doing. On the healthcare side of things I think that's still obviously these are right when you go into the healthcare space it's much a much more long
Starting point is 00:23:00 term thing and that's in collaboration with Thomas Heldt who's another faculty here at MIT. There we were trying to do some eye tracking work and like you know basically depending how quickly your eye reacts to certain stimuli you know there is a correlation between that and certain neurodegenerative diseases like you know Alzheimer's, Parkinson's and so on. But there I think the aspect is of course you know depending on the lighting and depending on various other factors, your measurement, like the accuracy of your measurement can vary a bit. The key idea there is that, you know, if you can do this, rather than doing this in a clinic with a very expensive machine, if you can do it at home with your iPad, you can collect a lot more measurements. So it's not like, you know, going in to see a specialist once a year to get this measurement, but you can do it more frequently. And though, you know, maybe the measurement is more noisy at home, but you're collecting it
Starting point is 00:23:52 over time, then you can use the longitudinal thing and the multiple collections to kind of address a bit of the noise that you might get in the measurement. But again, it's really like how you use it. So you can imagine, I mean, this of course is still very long term, but like, you know, how you use it, and that might be saying like, oh, if you if you can measure the uncertainty, or if you can collect more data, then you can do a more efficient trade off between or you can trade off a little bit more accuracy, or but if you, you know, if you use it as like, this is the final decision for everything, then yeah, you might have to really increase the complexity and max out the accuracy.
Starting point is 00:24:30 So that's kind of like these are stuff like just using it in the field for these particular tasks gives us a little bit more insight in terms of the tradeoffs. Yes. Yeah. Thank you so much i think that that helps because then now you have a concrete use case and that kind of informs because yeah how much accuracy is okay to lose how much uh parallelism or how much energy you do you really really have you know like what in some of our previous guests with physically constrained systems like this is all the energy you have boom period like that's it so that gives you a hard bound and it sounds like this is by going into a specific use case that helps give you a hard bound to with as a lens to look at your trade off space. Right. And then you could also. But because you're doing something, if you're looking at what you're doing with the results, it could also, I guess, loosen that down, too. Right. So depending on if you're averaging more results or if you have a good measure of uncertainty i think then you could it's not all or nothing it's not it's so it's not ends up being
Starting point is 00:25:30 not all like just based on a data set you like there's a certain task and there's things in a task that you can cut corners on and things you can't and also but in a task you also have many more other options in the space that you can use to address um like the neural network is not the main thing kind of thing. Then you have a better idea of how you can trade off other things to address any computational challenges in your neural network. So you have different design space of neural networks is quite large. You can prune them. You can quantize them. You can choose different architectures and so on.
Starting point is 00:26:07 But moving to the end-to-end stack, it's not just a deep neural network. There are other algorithms that come into the end-to-end application. So how do you think? And in many cases, there can be many different algorithms that you can potentially use to solve the same task. So how do you think about sampling this space?
Starting point is 00:26:23 For example, if you look at autonomy, I'm sure there are many, many algorithms for each individual step in that pipeline. So how do you think about the design process for just pruning those entire space? You have a design space in terms of the hardware parameters. You have a design space in terms of the algorithms that you can potentially use and a design space in terms
Starting point is 00:26:41 of how you want to implement this, what are the different technologies that you can use to implement it. So what does the process look like in any of these cases, right? Like deep neural networks or in particular for end-to-end systems designed for specific applications? Right. I guess there's end-to-end and then there's also full stack. So I'll try and answer both. But I think the way that generally speaking, I approach looking at building energy efficient systems, I guess the first question is always, you know, what what is the
Starting point is 00:27:09 bottleneck? Or what is what is the main driver of the energy consumption? So is it you could start always looking from a hardware perspective? And is it like that existing state of art, say the art solutions for this application, the hardware that they're using, you know, just is not efficient. Like, you know, you can, we have all these low power hardware techniques, so, you know, parallels and memory, like, you know, efficient memory hierarchies, data flows and stuff. Is that something, is that, can that problem be addressed there? But then
Starting point is 00:27:39 sometimes, as I mentioned with the video coding space, and even some of the work in the robotic space, the problem, you're limited by the algorithms that are running that are actually doing the you know the you know completing the tasks and so if you don't have that much parallelism or if your algorithms require you to use like you know a huge amount of data and there's going to be a huge you're going to need huge memory I mean typical thing in the robotic space is you know you're gonna need huge memory. I mean typical thing in the robotics space is you know you're building these maps and these maps can be very huge right and so um is the issue more the representation of that map so maybe then you should address it from an algorithmic standpoint right so can you we have a student Peter Lee who's been looking at oh can we
Starting point is 00:28:19 you know represent a 3D space with these like Gaussian mixture models which is going to be much more compact than, you know, doing like kind of a voxel 3D space type of thing. And then of course, the challenge there is you want a very compact representation, but you don't want the computing to generate that compact representation that also be very costly. So, so that let let's just let leads me to start thinking about the algorithm side of thing. And of course, then when it comes to
Starting point is 00:28:44 algorithms, you're going to impact the quality of results. So there, that's where it's really important to collaborate with domain experts, right? So that's actually what led to the collaboration with Sirtesh, because kind of like we want to see, I think, first of all, at that time, we want to see, you know, is efficient computing really critical in the robotics space? Like, what is the role it can play there? And then, as you mentioned, there's many solutions. I mean, robotics is a huge field, right? To get from point, even like the navigation test, there's many different ways of approaching it.
Starting point is 00:29:13 So kind of learning from and working with Sirtaj trying to understand, you know, what are the different approaches people take? What are the quality metrics that they care about? So like, you know, like what are the environments that are really, you know, stress these type of algorithms. So really understand how they define what is a good quality algorithm.
Starting point is 00:29:33 Cause we don't want, you don't want to, you know, design something that's super efficient, but then it doesn't do anything meaningful. Then it's not very useful. So, and then, you know, given, you know, this range of algorithms, then we start, this is the fun part of learning about the algorithms and thinking through, oh, well, you know, given, you know, this range of algorithms, then we start, this is the fun part of learning about the algorithms and thinking through, oh, well, you know, this algorithm might have a lot of memory or this algorithm might, you know, if we made this small change to the algorithm, it could be much more efficient this way. So it's kind of then, you know, collaborating on trying to learn about the space, but also to educate your collaborators on the space so you can together converge on a set of algorithms
Starting point is 00:30:05 that you think are more efficient. And then actually more recently, it's also important to think also at the systems point of view, because there's always this question on, especially in the robotics space, is it really the computer that's building a lot of energy or is it the sensors or is it the actuation itself?
Starting point is 00:30:20 And then, so understanding kind of the interplay between all those things. So, for instance, on the sensing side of things, often, you know, things like sensing depth is actually very expensive because you can like send a pulse out and wait for it to come back. And so if you can actually pay more compute and do something like a very common thing these days is to take regular RGB camera, run a neural network on it, and you can predict depth based on that. Like it's kind of like monocular depth estimation. Could you use that to replace these, you know, depth sensors? So obviously that's a lot more compute, but then there's interesting trade-off between compute and sensing.
Starting point is 00:30:56 I was just going to make a joke that we should definitely do the less accurate thing for autonomous driving. Get rid of LiDAR and just like using some inocular um uh depth sensing that you were saying that that would that will sell cars i'm sure well actually so okay it's a little bit divergent so speaking of the self-driving cars we actually so as i mentioned earlier we were i was interested in sustainable computing and we had talked to a lot of great people who were looking at that from the perspective of cloud computing.
Starting point is 00:31:29 But we were also just wondering often, so my big thing is I want to make sure our research is helping things, not making things worse, particularly for sustainability. So one question was, for autonomous vehicles, the amount of compute you need to have to do on it, is that also going to have a very large carbon footprint. And so it turns out like so, autonomous vehicles are great, but we have to be mindful of the
Starting point is 00:31:51 compute, because it turns out if you think about the amount of compute that you have to do for an autonomous vehicle, which of course is very far out and the algorithms are still you need to be developed. But if you just think of the number of sensors on a vehicle for to understand its environment, and if we hypothesize that you'd be running DNN3GD sensors and it turns out if you drive you know a vehicle one hour a day and you consider you know the number of vehicles that we have in the world and if a large portion of them are autonomous vehicles the amount of compute that would be necessary would actually be
Starting point is 00:32:23 comparable or more than what you have in data centers today right just the scaling factor and so the main takeaway from that is it's great to enable you know autonomous vehicles i think it's very valid there's a lot of societal benefits for that but it's really important to think about the compute too because even though it doesn't seem like a lot of people get one vehicle as you scale it up, it can be quite substantial. Interesting project we're looking at. So we're trying to look at this because we're really curious to see if our small scale stuff also flies at large scale.
Starting point is 00:32:53 So what I'm hearing you say is we should get rid of the pulses so that we can make it cheaper. I'm just kidding. I'm just kidding. No, no, that is really interesting. Yeah, because in a lot of ways, it's kind of like the reverse of the shift to data center, right?
Starting point is 00:33:10 Because in the old days, you know, each individual person had to decide whether or not the battery on a particular laptop was like good enough for them. And like each individual person doesn't have enough scale to really necessarily like move the industry. But suddenly we start packing thousands and thousands and thousands of them into these buildings and then we're like oh my gosh
Starting point is 00:33:27 we have to think about the individual like the small thing because we're timesing it by a million right and now it's kind of like now you know with the cars we're just like oh it's just a car there's a couple of little chips in there whatever but then what you're saying is like yeah there's there's billions of cars well actually how many cars are in the world yeah so there's 1.2 billion vehicles out there so um you know we were just trying to model some scenarios in one particular scenario if you assume that you know 1 billion of them let's say would be autonomous then yeah you would hit this issue where you're comparable to uh the power of data centers i think if it was something like if the compute per vehicle is around 800 watts.
Starting point is 00:34:08 It's not too much if you think about the power that we take to run deep learning, specifically if you had so many sensors around your vehicle. You just see like 360 right around your vehicle and stuff. So it can be quite significant. And I think the interesting thing about this was also, you know, on the data center side, it's kind of you can swap your, I mean, I'm not a data center expert, so you guys can call me, but like, you know, you can like upgrade your hardware, and so on.
Starting point is 00:34:38 But I think also the challenge there is that you also have like a wide range of workloads. On a self-driving car, I think you can be much more specialized in terms of like the tasks that you're trying to do because you know it's going to be in a vehicle. But at the same time, if I understand correctly, in the data center, people change out their compute every like three, four years or something. But on a vehicle, usually these vehicles are supposed to last like 10 years or more. So it's not as easily upgradable. And so there's a question of how should you design the hardware for a self-driving vehicle because it needs to last longer. And you do have, maybe it is a good match for specialized computing because you do have a narrower set of tasks. But at the same time, if you update the algorithm, you want to be able to push that to the vehicle as well. So
Starting point is 00:35:22 it's an interesting challenge to think about. Really fascinating set of considerations, definitely unique to the self-driving car space. And I think it's interesting to see how this will evolve. I did want to touch upon one of the themes that you brought up, which is you've worked on a wide range of domains and gone into substantial amounts of depth
Starting point is 00:35:43 to figure out how can you find that next domain of efficiency and how do you relate them back into the hardware context. I wanted to ask you, how do you think about problems that span multiple boundaries, maybe things that are outside your immediate field of expertise, and how do you go about learning and collaborating effectively with multiple domain experts in the many projects and the many ideas that you've developed over the years? Yeah, so I guess it's a question more about the process. So certainly, it's fun to do a lot of reading,
Starting point is 00:36:19 learn about the topics. But I actually get the most enjoyment interacting with people. So I like to go learn about a given topic from a particular person right so I think in all of these cases both with Joel so with Joel it was more like okay I want to focus on efficiency I know that he's done a lot of you know coming from the computer architecture community he's you know knows a lot about flexibility and programmability so kind of bringing our expertise together to solve this problem is really fun
Starting point is 00:36:46 because I get to learn more about the architecture space. I hope he gets to learn more about efficiency. In the robotics space, same thing with Sirtesh and the healthcare space thing with Thomas. So I think the high level thing is always, what is important problem to solve? And then, at least for me, it's like, what is my from my skill set? Will that make an impact there? My skill set is first from the lens of energy efficiency. And then afterwards, going a little bit deeper and trying
Starting point is 00:37:20 to think at what level of the stack we should be solving at. And then if I'm not familiar with that level, find people that I would like to work with and are excited to learn from about that level and work together. And we're also equally excited about working together to solve that kind of a problem. So I think that the high level bit is trying to identify the right problem
Starting point is 00:37:43 and then finding the right people to work with on it. And then of course the high level, the challenges are always trying to find a way to efficiently or not effectively communicate across the different domain boundaries. So as we know that in each domain people have their own little language about their topic and sometimes it's expected at the beginning it's going to be kind of a bumpy road to ramp up but I think if both parties are equally engaged in trying to communicate to the other person so that they can understand and trying to you know distill down the concepts then I think it's in you know an effective way to learn but
Starting point is 00:38:23 that is the challenge at the beginning to just try to figure out that you speak the same language, same terminology, define it all very clearly. Yeah, that's definitely a tough one. I've experienced that a lot in any sort of cross-layer collaboration. And I just wanted to point out that throughout this, this discussion that we've been having,
Starting point is 00:38:42 it feels to me like I just jumped out that how many other people's names you've mentioned and so you know it seems pretty clear to me that you would be like a generous and enthusiastic collaborator and that's probably why you have had such fruitful collaborative work in the past because you're just like oh we work with this person and this person that it I don't know at some point it pierced my thick skull just like wow she's mentioned a lot of people um and so so yeah kudos to you because that's not easy. Thank you yeah I think this is what I also tell my students it's good to be generous with credit and also to value your collaborators I
Starting point is 00:39:17 think a lot of these problems and like challenges it's like again like I said for me it's really much the the fun part is with working out with all these people I find actually much more satisfaction saying we solved this problem together than I solved it myself because I just feel like that's the same thing I encourage my good people to collaborate because I think it's a lot more fun like if you bring a lot of different perspectives different ideas and you get both get much more out of it I personally think so but yeah thank you for noticing. That's awesome. And that's a great meta thing to teach to your students too, I think, in my humble opinion.
Starting point is 00:39:49 So kudos again. Maybe this is a good time to wind the clocks back a little bit. Could you tell our listeners how you got interested in energy-efficient computing? How did you get to MIT? What was your journey like? And any highlights from those from the journey
Starting point is 00:40:05 right sure yeah so I guess so I did also do my PhD at MIT so maybe I can start like when you say MIT you mean both times unfortunately I spent a lot of time here or fortunately what happened was after in graduate school I mean undergrads I was not actually to be honest thinking about doing grad school at all, but at the University of Toronto, where I did my undergraduate studies, you could do a 16-month internship to kind of figure out what you want to do after you graduated. So I ended up working at a startup called Snowbush, which was actually started by two professors at the University of Toronto, so David Johns and Ken Martin, and they were looking at, you know,
Starting point is 00:40:45 mixed signal analog design. I had no idea what that was. But I just thought, you know, it's just something cool to do. And as you can expect, a startup by professors is going to be made the employees are all their former grad students. So then I kind of got a what a feel is like for working with fellow graduate students. And I mean, I think also what was very clear to me at the time was if you wanted a design role, at least at that company, you would need some kind of higher level degree, some grad school, versus like as an undergrad. Though I was involved with many,,
Starting point is 00:41:15 but primarily the focus was doing the physical design and layout. So I decided to apply for grad school when I guess was a senior in my fourth year. And I was also like, oh, I might as well apply for MIT in the States as well. So then that's how I kind of ended up at MIT. And then at MIT for my PhD, I worked with Anantha Chandrakasan, who's kind of like the world expert on low power circuit design.
Starting point is 00:41:37 And so he had pioneered a lot of these approaches of aggressively scaling down the supply voltage and then developing circuits that can work at very low voltages, like sub thresholds like down to like 0.2 volts and stuff at very low power but then one strategy to medic like if you as i mentioned before i think if you go very low voltage you're also super slow so that's great for applications like the medical space but not so much for you know applications where you need to have some reasonable amount of throughput. I was also personally okay so I love television so I was also very into video so I was like oh it would be like that would be one topic I would want to work on video and so he was like great so try and think about how you would apply these low power strategies to video processing and so I was you know lucky enough to be able to work on that topic. Of course, as a faculty,
Starting point is 00:42:25 found some funding for that, which is great. And so working with another graduate student, Daniel Finkelstein, we designed this H.264, which was kind of a state of the art at that time, video compression chip that operated, I think it was like 0.7 volts, was very low power. So people were very impressed by that. And I think often a question that comes up in the computer architecture community is, is there value in taping out a chip? And there definitely was because we taped out a chip, we reported it and people were like, oh, I don't believe that it's actually that low power. So we actually brought our chip to, I think the conference was in Japan, like a demo for them to see that it is this
Starting point is 00:43:02 low power. So it's like, it gives like, it's not, there's like, we didn't model it or something incorrectly like it actually like we can design it it's true it's like a proof of concept and it was like um very exciting uh but in that process i realized that you know there were a lot of things that were still kind of limiting what we could do especially if you start wanting starting wanted to like improve the albums for better compression so then even as a grad student start to collaborate with Texas Instruments on looking at new algorithms for video compression but with the lens of making it low power I was also very fortunate at that time that the video standards were just like starting to kick off so these video standards happen every decade or so so like you really
Starting point is 00:43:42 have to be lucky to because I mean if it just finished there's no new nothing new to do. But they were just kicking off. So by the time I graduated 2010 and Anantha was actually very I should say was very supportive of sending me to these like kind of preliminary standards meeting when I was a grad student which was great. So even though there's no publication out of it I learned a lot in terms of what happens in these standards committees. And so when I graduated I joined TI and then so with Madhukar Bhuttagavi and Minhwa Zhou, we worked on like, you know, algorithms for the new standard, everything. So it was great. We designed HEVC and that's very rewarding because it's used in a lot of these Apple products and a lot of devices around the world. You know, after the standard
Starting point is 00:44:19 ended, so it takes three years to do a standard, it was like, oh, you know, what to do next? You can either work on the next standard, you can switch products, or you can consider going back to be faculty at MIT. And I think I really enjoyed working with the under or for the grad students who interned at TI over the summer, and Anantha was really encouraging me to consider faculty. And I was also getting really interested in computer vision, in the sense of like, it's one thing to be very energy efficient when you compress the pixels, but it could also be very interesting to see how energy efficient it is to understand the pixels.
Starting point is 00:44:52 Cause we know there's a wide range of applications where you don't need to look at the video. So then I decided to come back to MIT and then that's where like the focus was then on, you know, efficient video processing and that became computer vision. And then it was of course, say the art there was deep learning.
Starting point is 00:45:11 I had, it was just starting off at that time. And I would say, I should also give credit to Vinay Sharma who was at Texas Instrument, who, you know, he was a computer vision guy. We'd always have these like nice discussions about, you know, what's going on in the computer vision space. Like, oh, this deep learning thing seems to be really ticking off this is like 2011. so then when i came back to mi just like oh we should work on this uh space and so that's kind of how the
Starting point is 00:45:32 deep learning stuff started that's a really cool story and it reminds me that one of the things that we had wanted to ask you about today which we haven't gotten to yet, was how your work from your time at MIT for the first time around NTI led to an Emmy. I don't know how many people in our field have an Emmy. Tell us about that, like specifically what was the Emmy for and did you have to get all dressed up and go to a dinner or whatever like on TV? All right. So first of all, let me, I mean, sounds good. I mean, I clarify a couple of things. So the Emmy is first of all for, it's for the entire standards committee that developed
Starting point is 00:46:14 HEVC, right? So it's not a personal Emmy. It's for like our whole committee together. So that was led by Gary Sullivan and Jens Ohm. Yeah, it was just in recognition of this new technology that we've built. And now that obviously they can do it to use it to distribute a lot of video. I mean, like I mentioned, I love television.
Starting point is 00:46:30 So it was like really great to be like, ooh, I never thought, you know, I like consuming television, not producing. So I was like, oh, this Emmy would never happen to us. But then, oh, actually with an engineering approach, you can actually get these Emmys. I will also say it was an engineering Emmy, not a creative arts Emmy. So they're
Starting point is 00:46:45 different. But nonetheless, it was super exciting to go. So yeah, it was like a whole dinner thing. And I forget who the celebrity was, they had a celebrity like who was the host. And it was just actually really rewarding to see all you know, former collaborators and colleagues because the Emmy thing was 2017. I had left TI at 2013, so it had been like a four-year gap since I saw a lot of these people. So it was just a really nice reunion as well. And in general, it's just like very nice recognition. Like actually developing standards in terms of time and effort is actually can be really tiring. We meet like quarterly,
Starting point is 00:47:23 but then we have these really intense 10-day meetings where most often when the meeting's over, most restaurants are also closed as well. It just runs really late. It's really intense. And then even between the meetings, it's very intense. So it's just nice to get that kind of recognition after all of that hard work. But yeah, it was a lot of fun. It was certainly unexpected, but then when I think back, like, oh, like, yeah, I've always liked television. So this is a good alignment with my other interests. That's amazing. And would you say, I mean, because like, so yeah, like you say, right now, like, we can, like, we as a society consume a lot of video. And, and is was this technology, like, seminal and sort of being able to pack a lot of video
Starting point is 00:48:08 into small pieces so that we can consume it at the rate that we do now? Like what specifically, I mean there's a lot of standards out there but not all of them win Emmys. So maybe you can elaborate a little bit more about what it was exactly. Right and I think you know credit to like the whole standards community and the leadership of the chairs to getting this done. Cause you're right. There's a lot of standards and not that many take off in terms of popularity. So I think the first key thing, obviously for video coding standard is to improve compression.
Starting point is 00:48:37 So every standard, which happens, as I mentioned, every decade or so, the goal is to improve the compression by 2X, meaning same quality of video, and then 2x smaller size. And this is, of course, important, because we're all using a lot more video these days. And even though, you know, bandwidth and memory increases, but like the increase in usage is much higher than that. So that's really critical. But then the second thing, and that was a specific focus in this new standard, or HEVC at the time, there's another one that's coming out. But HEVC was, if you think 264,
Starting point is 00:49:10 which is a prior standard that was developed in 2003, how we use video in 2003 is very different from 2013, right? So these days, a lot of the video is consumed on battery operated devices, like whether it be your phone, your laptop, both consumed and I would say created, right? Because you're taking videos. And so there, I think the key thing to make HEVC
Starting point is 00:49:34 also be an effective standard is that we considered energy consumption. So basically, there were often a lot of things where we had to evaluate the trade-off: like, oh, your new idea can give X amount of compression, but we had to evaluate the complexity of it. How much energy does it also consume? And so is that trade-off really worth it? Or is there a better way to achieve the same, you know, coding efficiency in a different way that's much less energy-heavy or complexity-heavy? So by bringing those two
Starting point is 00:50:02 things in, I think the standard that we came up with was twice as efficient in terms of compression, but also the overhead of getting that compression was very low compared to H.264. So I think those are the kinds of things that really mattered: first, making sure that your standard compresses well over a wide range of content. So you really have to test over a wide range of content and a range of scenarios, because how we do video compression right now for video
Starting point is 00:50:30 conferencing, where low latency matters, is very different from how you would compress, like, cinematic content, where you have a lot of compute time but quality is really key. So you have to consider all scenarios, and then also, these days, compute as well. So I think being able to balance all of that is a key component of what led to the success of the standard. That's a fascinating story, and it also sort of speaks to the changing times, how different standards evolve with them, and the huge amount of work and effort that goes into enabling these technologies for the larger masses as well. Any words of wisdom to our listeners who will be listening to this podcast, based on your experience working on a wide range of domains, collaborating with a wide range of
Starting point is 00:51:15 people, anything that you'd like to share with our listeners? Yeah, so I was trying to think about this before, and I think there's two key things. So one is, if I look at my own trajectory, and this is what I actually tell my own students: I think it's really important to work with people that you like to work with. Who you surround yourself with is, like, so key. Actually, sometimes the problem doesn't even matter so much, although you should all be motivated to solve it. I think I'm so lucky and happy with all the people that I collaborate with. I really enjoy the teamwork and learning from everyone. So I think that's really, you know, very important. And I think the other thing, and this actually comes from, like, so
Starting point is 00:51:57 I recently ran this workshop with another faculty member who teaches a leadership workshop here at MIT. But, you know, there's a lot of interesting things. I think this happens in industry: when you become a manager or leader, you get sent to these leadership workshops, and they teach you things like, you know, how to give feedback, how to lead the team, how to manage conflicts. But I think actually a lot of those skill sets are really important even if you're, quote unquote, not a leader. Even as a graduate student, or any student actually, you don't know if you'll be a leader in the future, but knowing how to give and receive feedback, in particular, I think, receive feedback, is really important. So we ran a workshop on that recently. Okay, so a couple of things. You often feel like these interpersonal,
Starting point is 00:52:44 non-technical challenges are things you're the only one encountering, but it turns out there's a lot of literature and research on this. So even stuff like learning how to give and receive feedback, there's a ton of literature on that, and so we ran a workshop on it. As it turns out, it was actually very timely, because many folks are getting their reviews back right now from various conferences. And so being able to look at that feedback, and of course there's going to be good reviews and bad reviews, but being able to look at it and try and distill the useful feedback from it and not take it so personally. It's hard to not take it personally, because you worked really hard on it. But nonetheless, try
Starting point is 00:53:25 to look at it more like, this is a growth opportunity. You know, actually getting feedback is in some ways a privilege. Nobody has to give you feedback, so being given feedback means someone put the time in to give it to you, and you should try and make the best use of it. I think having that lens for the grad school journey, or any journey, is actually quite useful. And more broadly, even if you don't intend to be a leader,
Starting point is 00:53:53 although you might be a leader, you might. But nonetheless, building those types of skills, being able to manage feedback, managing conflict, just these non-technical skills, is really important at every point of your journey. These are not things that you only deal with after you graduate; you can use them every day. So I think that would be the other thing. You know, we're all excited about technical skills and abilities, but the non-technical is also really critical.
Starting point is 00:54:23 I 100% agree with that. And I think one of the things that I've come to realize at this time in my career, which is currently paused, as some people may know, is that you get to a point where it's like, you know what, the limiting factor... I mean, of course, people can reach a limit in terms of technical capacity, right? It's just like, I just don't understand this stuff, right? Like, I will never be a string theory physicist or whatever, because that's just not how my brain works. So at a certain point, you might cap out on some sort of technical aspect. What I didn't really realize is that you may also cap out on your interpersonal aspect. You might just not be able to communicate your ideas.
Starting point is 00:55:00 You might just not be able to work with enough people. You might just not be able to cope with the stress of having to deal with, I don't know, a $500 million budget or something like that. And I think that is something that a lot of young folks don't necessarily realize. It's just like, if I just break through this technology, if I just do this or that or the other, then I'll be set for life. And that's not the case. So the fact that you're teaching this at such a young age, even running workshops for your students, that's going to set them up
Starting point is 00:55:28 for longer-term success. Because, I mean, coming out of MIT, you're going to have the technical chops, but being able to handle all the other stuff is a really, really meaningful lesson to teach your students. I think so. And I also think these things
Starting point is 00:55:41 actually inhibit your ability to develop your technical chops, because you spend a lot of your time on them. If I reflect back, you know, like 90% of the things that keep you and me up at night are typically these non-technical, interpersonal issues as opposed to the technical issues. So if you could clear these things out, or learn how to manage them, then you could focus more of your energy on the technical aspects of things. And then, yeah, the other thing I really want to emphasize: in computer architecture, there are tons of papers, so when you hit a technical challenge, you can look at papers and read papers, and you're not the first one to hit that challenge.
Starting point is 00:56:16 And for these interpersonal things, there's a huge body of research too. You're also not the first one to have an issue with your advisor, not the first one to feel upset about a negative review. There's tons of people who have gone through this, and there's different ways of dealing with it. So it's good to use that as a resource. Like, here we're all researchers, so just research this part as well. It's all there. I just didn't realize it until, you know, later in life, and I wish I had known earlier. So that's why I kind of feel like, for these younger folks, we should all do it. This actually evokes a little bit, you know, our previous guest Jim Keller, who obviously has had a lot of technical jobs as well as managerial jobs. He talked to us a bit about how he reads a lot of books about management, and that's how he builds these teams that are successful, because he's
Starting point is 00:57:00 read a lot of books specifically about that topic. And so it's interesting that you're doing something similar, but sort of pushing it down to the university level already. So that's pretty awesome. I mean, I wish they would do it at the high school level. I think everyone has to deal with people and conflict and feedback. So you don't have to be just a manager or professor or whatever. You know, it's been wonderful, wonderful chatting with you. We've learned a lot.
Starting point is 00:57:26 You've had a lot of great things to share with us about all sorts of topics, ranging from the technical to the non-technical. So thanks for being here today. Thank you so much for doing this. I think this was a lot of fun, and thanks for your leadership in this whole podcast area
Starting point is 00:57:39 and broadening the reach of this community to the rest of the world. Yeah, echoing Lisa's sentiment here, it's been an absolute delight talking to you today. And to our listeners, thank you for being with us on the Computer Architecture Podcast. Till next time, it's goodbye from us.
