Computer Architecture Podcast - Ep 13: Energy-efficient Algorithm-hardware Co-design with Dr. Vivienne Sze, MIT
Episode Date: September 27, 2023
Dr. Vivienne Sze is an associate professor in the EECS department at MIT. Vivienne is recognized for her leading work on energy-efficient computing systems spanning a wide range of domains: from video compression to machine learning, robotics, and digital health. She received the DARPA Young Faculty Award, the Edgerton Faculty Award, faculty grants from Google, Facebook and Qualcomm, and a Primetime Engineering Emmy as a member of the team that developed the High-Efficiency Video Coding standard.
Transcript
Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting edge work in computer architecture and the remarkable people behind it.
We are your hosts.
I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Today we have with us Vivienne Sze, who is an associate professor in the EECS department
at MIT.
Vivian is recognized for her leading work on energy efficient computing systems spanning a wide
range of domains from video compression to machine learning, robotics, and digital health. She
received the DARPA Young Faculty Award, Edgerton Faculty Award, faculty grants from Google, Facebook,
and Qualcomm, and a Primetime Engineering Emmy as a member of the team that developed the High
Efficiency Video Coding standard.
Today, she is here to talk to us about energy efficient algorithm hardware co-design for compute intensive applications, including video encoding, deep neural networks,
and robotics. A quick disclaimer that all views shared are the opinions of the individuals and do not reflect the views of the organizations they work for.
Vivian, welcome to the podcast. We're so glad to have you here.
Thank you for having me.
Yeah, we're excited to talk to you. And what's getting you up in the mornings?
Okay, so I would say, if you mean literally what's getting me up in the morning, I recently got a milk frother. So I'm making these matcha lattes at home.
Very excited about that. And, you know, nice upbeat music on the way to work. And I'm super excited about,
you know, meeting up with my students and collaborators and brainstorming and working
together and then learning a lot from them. Also, since we're just kicking off the semester here at MIT, learning a lot from teaching. So right now,
this semester, I'm teaching with Joel Emer on this hardware for deep learning class. I know
we'll get into it, but like this space moves very quickly. And so often we want to update our slides,
but we often try to be very conscientious about trying to identify principles in this space that
we can distill down to lecture
material so that the material doesn't have to change from year to year. So we just went through
that exercise or still going through that exercise together. So that's a lot of fun.
And also just learning from, you know, thinking through the different challenges that we have these
days in society looking at, you know, where can we use whatever skill set we have to address some of
the challenges. So most recently,
looking at things like sustainable computing, and trying to think about whether or not, you know,
the work that we do with energy efficient systems, at the small scale for these battery operated
devices can have impact on these large scale challenges. And yeah, just learning a lot from
the people in the community, like Carole-Jean Wu, Udit Gupta,
and Bobbie Manne, who've been doing a lot of work in this space, and trying to think about,
you know, whether or not we can also make an impact from an academic side or academia
side given our lens in energy efficiency.
So you were talking about a lot of things right there, and I'm particularly interested in this class you're teaching, because as it stands, things are developing really, really fast in this space. One of the things that we know is that hardware develops rather slowly, but the AI algorithms and things develop really quickly, and distilling things down to principles like that does seem like the best way to sort of have a structure. Going forward, how much of those principles do you feel carry over into things like energy-efficient hardware design? Just because, you know, if things are changing, are there things that you can carry over so that the principles you carry into the hardware design piece also hold for at least longer than, you know, six months or something?
Yeah. So I think in general, I mean, at least my
perception of our research is always trying to find these
principles, because I think it's important to, let's say, build
an energy efficient, you know, chip or whatnot. But like, I
think it's more what we learn from that process of, you know,
developing either the architecture or the chip itself? Are there
key ideas that can then be used for other future designs for even other applications? Are there
ways to kind of generalize those principles? So I mean, like, maybe I just rattle them off now,
some of them are, you know, things like, obviously, we've seen a lot in particular in the
deep learning side of things, you know, things like looking at data movement through efficient data flows and exploiting reuse.
Those are some principles that, regardless of the type of DNN you might design, you would want to think about. And in the context of, maybe we'll talk about video coding later, but thinking through the co-design space, one of the challenges might be things like parallelism. Even though you can do that easily in hardware, it might be challenging on the algorithm side, particularly if you're trying to achieve a certain quality of results. So it's about understanding the principle there, but then trying to fit it to a new application, while still distilling some key idea.
I'm trying to keep it general, but I
think as we go to the specific applications,
it becomes more concrete.
No, that's a good overview.
Maybe we can expand on one of the themes
that you talked about in video encoding, for example.
We talked about the trade-offs between parallelism, which,
of course, lends itself to better hardware efficiency because you can use many of the cores available on your chip.
But in some of your early work on video compression, I think one of your observations was that advanced algorithms that you use in video encoding are typically more challenging to parallelize because they try to remove redundant or wasteful work in the effort to get more work efficient.
But in order to do that, they end up enforcing
a sequential dependency.
So how do you think about the trade-offs between work
efficiency, which typically comes
with an accompanying amount of sequential dependencies
and so on, and hardware efficiency, where typically,
if you have parallel computations,
you can execute them in parallel and therefore get better speed
ups or better performance? Sure, yeah. And I don't know if I'll exactly answer that question, but I think,
let me just take a step back in terms of understanding video coding, to give some context. The way in which we compress videos, how a video is actually compressed, is you're trying to remove redundant information in the video. So for example, if you took a look at two frames, they could be very similar, there's a lot
of temporal redundancy. And so what you would do is, you know, you could predict some of the pixels
in one frame based on pixels in the previous frame. So for example, imagine the background,
right, doesn't really change, you can just say, Oh, copy these pixels over. And then similarly,
within an image or frame itself, neighboring pixels within the same image are also very
dependent. So if you can imagine like, you
know, a white wall, right? So like all the pixels, you can predict it from the neighboring pixels,
you don't have to send them. And so as a result, you can compress very well with this, you just
have to tell them, oh, where do you predict the pixel from? And if you, if there's a bit of error,
what is the error, and that tends to be compressed very well. That's great for compression. And
really, you know, the whole goal of video compression is to
find all this, you know, redundancy and then do the prediction. Of course, the main challenge is that,
as you just mentioned, Suvinay, in hardware design, parallelism is very important for
speed. It's also very important actually for energy efficiency. So how I came about this: in my PhD, my advisor, Anantha Chandrakasan, was really focused on how do you build very low voltage systems, and low voltage means slow. So in order to, you know, make up for the speed, you want to parallelize things. And if you do parallelism, you can also be much more energy efficient. Anyway, so when we
look at video coding, it seemed very exciting. But at the same time, the challenge there was that because the algorithms were getting more and more advanced, there were more and more dependencies being introduced in order to achieve the compression. So it became very, very difficult to parallelize. A lot of the work was trying to find ways to break, or, you know, decouple, this dependency without sacrificing compression efficiency. So you have to have a deep understanding of, first of all, what part of the system could benefit from parallelization, and then, in that particular part, could you break some of these dependencies or feedback loops in such a way that you still maintain your ability to compress well, but are then also able to parallelize and run quickly?
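To make the prediction idea concrete, here is a minimal Python sketch of the two kinds of prediction described above; the function names and the toy "copy the row above" intra mode are illustrative, not actual H.264/HEVC coding modes.

```python
import numpy as np

def inter_predict_residual(cur_block, ref_frame, top_left, mv):
    """Temporal (inter-frame) prediction: fetch the block that motion vector
    mv = (dy, dx) points to in the previous frame, and code only the residual.
    A good prediction leaves a small residual, which compresses well."""
    y, x = top_left
    dy, dx = mv
    h, w = cur_block.shape
    pred = ref_frame[y + dy:y + dy + h, x + dx:x + dx + w]
    return cur_block - pred

def intra_predict_vertical(row_above, height):
    """Spatial (intra-frame) prediction: repeat the reconstructed row of pixels
    just above the block.  Because it needs the *reconstructed* neighbours,
    each block depends on the ones decoded before it -- the sequential
    dependency that makes these algorithms hard to parallelize."""
    return np.tile(row_above, (height, 1))
```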
Yeah, so do you see any parallels between that sort of mentality and a lot of the work that you're doing now with things like DNNs?
Because, you know, in many ways there's a lot of parallelism, but then there are also dependencies that you have to take care of, and then, you know, we're seeing work now in the field
about you know how to maybe break some of these backwards loops or do some sort of predictive
type stuff so that you don't necessarily have to wait such a long time to have everything come backwards. Oh, and one other thing is that, you know, back in the days when I was learning about things like media compression, it was all about memory too. Like, in terms of a lot of the power consumption that
is being used, the energy that's being used, it's like, okay, I got to move this, ship this data
around. So if you can save having to read and write to memory, then you can, like you've basically
won. And that we hear a lot of the same sorts of themes happening right now in these kind of DNN
and ML and AI type workloads, which is we have to ship all this data around,
like let's prefer not to read, write, restore
and save all this stuff all the time.
And that's how you,
that's like the big low hanging fruit
in terms of the energy consumption.
And yet it's really hard to do.
So how has your work on the compression-type things served the work that you're doing now in these DNN-type workloads?
Yeah, so I think there's a couple of things.
So one is maybe I should mention like the similarities
and the differences between the two domains.
I think so the similarity is certainly you're still at least looking from an input perspective.
When you look at video,
it's a very high dimensional or very heavy input workload.
So there's certainly a lot of kind of, you know,
data movement in terms of data access there.
I think the difference,
so in deep learning, you would also have the same thing
if you're, let's say, processing images or video.
I think there's a couple of main differences. So first, in video compression, most of it is very standardized, so you can very much hard-code your hardware. Meaning that in these video compression and decompression algorithms, you have filters and transforms, and all the weights and coefficients of those filters and transforms can be hard-coded, so it's very simple and very efficient. The hardware for these video codecs tends to be very hardwired, because they have to be very efficient if you want to do HD in real time, which we're all doing these days. You know, the hardware is very specialized, but then
also there's not very much sharing of resources of things like, you know, a lot of the memory
sometimes is very dedicated to this hardware. So you still have to think about if you want to
even have, you know, a couple kilobytes of memory
and so on, it's just gonna eat up hardware space.
It's very dedicated.
So you do still want to minimize the cost of that chip.
And so even internally, you might do some compression
within the compression accelerator.
So it's kind of a little bit meta.
On the deep learning side of thing,
how it's very different is that you require
a lot more flexibility and you have things like the weights and stuff that are much more part
of the workload as well.
It's not necessarily just the incoming data.
So as a result, there's a lot more data that's moving around in the system.
And so, as Lisa pointed out, data movement is then really key.
You really need to think about how to minimize this data movement in order to achieve both
energy efficiency and high speed.
Of course, the other thing is that it has to be also very flexible because you want
to be able to support a wide range of deep neural nets versus in video compression, you
typically have one chip dedicated to one standard, and you don't need that amount of flexibility. Or one IP on one chip, at least; sometimes these chips have multiple IPs for different standards.
So when you start thinking about both flexibility and data movement, I think it becomes an interesting challenge, and that's why, for the work I've been doing with Joel Emer and also our student Yu-Hsin Chen, we were primarily focused on looking
at efficient dataflows where we can exploit a lot of data reuse,
but we also wanted to have a solution that
was very flexible in the sense that regardless
of the shape of your neural network,
whether it be number of layers, the shape of each layer,
we should be able to find a way to optimize that data flow on our
hardware to minimize the amount of data movement. And we needed to account for all data types,
not just the inputs, but also the weights. And so I think that's kind of a little bit of
the difference between deep learning and the video coding side. Of course, efficiency and compression, I mean, you can apply compression in deep learning as well, and it also plays an important role. But I think it's more of a
tension of efficiency versus flexibility and the variability in terms of the problem that you're
trying to solve. Right. I think balancing the flexibility while getting the performance and
efficiency that you want in DNNs has been one of the challenges, especially since the space also moves very rapidly.
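As a rough illustration of why data movement, rather than operation count, tends to dominate in these accelerators, here is a back-of-envelope Python sketch; the "every MAC touches memory three times" worst case and the "each value moved once" ideal are simplifications for intuition, not the Eyeriss row-stationary dataflow itself.

```python
def conv_layer_data_movement(H, W, C, K, R, S):
    """Count MACs and memory accesses for a conv layer with C input channels,
    K filters of size RxS, and an HxW output.  With no on-chip reuse, every
    MAC fetches an input, a weight, and a partial sum; with ideal reuse, each
    value is moved roughly once."""
    macs = H * W * C * K * R * S
    no_reuse = 3 * macs
    weights = K * C * R * S
    inputs = (H + R - 1) * (W + S - 1) * C
    outputs = H * W * K
    ideal_reuse = weights + inputs + outputs
    return macs, no_reuse, ideal_reuse

# Example: a 3x3 layer, 64 -> 64 channels, on a 56x56 feature map.
macs, worst, best = conv_layer_data_movement(56, 56, 64, 64, 3, 3)
print(f"{macs:,} MACs, {worst:,} accesses with no reuse, {best:,} with full reuse")
```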
Expanding on the compression theme itself.
So in video compression, as you said, you know, you have different frames.
There's a very well understood theory behind what kind of redundancy are you trying to exploit in the frame?
And how do you sort of go about systematically designing a compression algorithm to take advantage of those properties.
Now if you context switch to DNNs, there are a variety of compression techniques ranging
from Vanilla Huffman encoding style compression to things that are more model specific like
quantization and more recently things like sparsity which introduce zeros which you can
eventually compress or use to save in terms of compute as well. You can skip the zero compute and so on. So in the DNN realm, it looks like we don't have that strong a theory
on how these techniques actually work, like quantization, compression, and so on. And they
can have a material impact on the model quality. So how do you think about these techniques that
trade off accuracy against performance in the context of DNNs. Is there any differences from the video compression space?
And what are the principles behind how
do you exploit these techniques in the DNN context?
Any pitfalls, any things that people
need to watch out for as they employ these techniques
in the DNN space?
Yeah, I think
that's a really great question.
It's a very challenging question, actually, because, yes,
you're right, in the video compression space, it's much more grounded in theory, a lot of signal processing theory. I think the only thing that's kind of a little bit unsettled and harder to get one's hand around is more, you know, when you look at the quality of the output, there's some, you know, human visual perception aspect of it that's a little bit harder to manage but in general in terms of why we're doing each step there's a principle behind
it. In the deep learning space, I think there are a lot of challenges that people are still trying to figure out. There's a lot of work on the science of deep learning, on why it even works, and so on. And if it's hard to understand why it works, or how to debug it, it's harder to get a very grounded approach to determining the implications of how you would change the neural network and how it would impact the accuracy. That relationship is much weaker, or much less clear. And so, unfortunately, there's a lot of ad hoc exploration that has to be done; it's very empirical in terms of figuring it out.
There's been two ways that we have approached this.
One is more on just understanding from the efficiency perspective.
By efficiency, I mean energy efficiency and then also speed: when you apply these techniques that you've outlined, which are, as you mentioned, quantization, pruning with sparsity, and then compact network architectures, that was the other one, where you come in with smaller models, how do those impact energy and latency? I think that's obviously an important thing to look at. I think a very kind of first-order thing
that people have done in the past
is primarily evaluate a neural network
in terms of the number of operations
and the number of weights.
And I think that gives you some idea of the complexity,
but I think what's more important is to try
and actually look at the metrics that we care about
from a hardware perspective, so energy and latency,
and use
those specific metrics to drive the design, at least of the neural network itself.
So for example, often one might associate the number of operations with latency, but as computer architecture designers, you know that the utilization of your hardware also matters.
So some types or some shapes of the neural network might not map as well onto the hardware. So even though you have fewer operations, you might not get the speed up that you expect, right. And so really having kind of the hardware in the loop there to really kind of drive the design choices that you might make in terms of, you know, your layer shapes, for example, could really be helpful.
We had done some work on this with respect
to this work called NetAdapt
that really put the hardware in the loop
in the sense that you would take a neural network,
measure its latency and energy,
and then use that as an input
to iteratively modify the neural network
so that you would hit those energy latency targets.
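A schematic sketch of that hardware-in-the-loop idea, in the spirit of NetAdapt but not the published algorithm; all the helper functions here are placeholders a designer would supply.

```python
def hardware_in_the_loop_search(model, candidates_fn, measure_latency_fn,
                                finetune_fn, evaluate_fn, target_latency, step):
    """Iteratively simplify a network against *measured* latency on the target
    hardware: tighten the budget a little each round, try per-layer
    simplifications, and keep the most accurate candidate that fits."""
    budget = measure_latency_fn(model)
    while budget > target_latency:
        budget -= step                          # tighten the budget gradually
        best_model, best_acc = None, -1.0
        for candidate in candidates_fn(model):  # e.g. prune one layer a bit more
            if measure_latency_fn(candidate) > budget:
                continue                        # measured on the real hardware, not op counts
            acc = evaluate_fn(finetune_fn(candidate))
            if acc > best_acc:
                best_model, best_acc = candidate, acc
        if best_model is None:
            break                               # nothing fits the tighter budget
        model = best_model
    return model
```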
Another prime example of how people simplify
the neural networks is through the process of pruning. And so that's basically you set some weights to zero, you remove some of
the weights. Traditionally, the approach there is you try and remove the weights that have small magnitude, but the number of weights that you remove, or the magnitude of the weights, is actually no indication of the impact on energy.
In fact, you also should think about things like, you know, the data movement cost, how often that weight is being reused, and also of course the feature map information. So,
you know, some work that we did in that space, primarily energy aware pruning, is trying to
kind of use the energy cost to drive the decision in terms of, for example, which layers to prune.
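A toy illustration of that energy-driven choice, which the next remark spells out; the per-layer energy numbers are assumed inputs from a model or measurement, and this is not the published energy-aware pruning method itself.

```python
def energy_aware_prune_order(layer_energy):
    """Pick which layer to prune next by estimated energy, not by weight count
    or magnitude.  layer_energy maps layer name -> estimated energy."""
    return sorted(layer_energy, key=layer_energy.get, reverse=True)

# Hypothetical per-layer energy estimates:
order = energy_aware_prune_order({"conv1": 4.1, "conv2": 9.3, "fc": 1.2})
print(order)  # ['conv2', 'conv1', 'fc'] -- start where the energy is
```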
We want to prune the layers that consume the most energy first. And that would allow us to get a better trade-off between energy efficiency or energy consumption and
accuracy. And so that at least gives us a better trade-off, but in terms of getting
the insights, it's still very challenging. On the accuracy point of view, I think that's
kind of what motivated me to look more at, and motivated the collaborations looking more at, the robotics and autonomy aspects and the healthcare space, because there's also this aspect of, you know,
if you achieve a certain accuracy on, you know, let's say ImageNet, and you, let's say you made
this trade-off, and you drop the accuracy by 1%, what does that actually mean? Like, is a 1% drop meaningful? Is that big or not big? So we wanted to look a little bit deeper in the pipeline.
So if I was using this neural network to navigate from point A to point B, then I can actually tell if this tradeoff in terms of accuracy is meaningful or not.
Or if I'm using this neural network to do some eye tracking for some neurodegenerative disease test, then I actually have a very concrete metric of accuracy, because you want to know how this might impact a certain diagnosis and so on. So I think from an accuracy evaluation point of view, it's good to look at the very specific applications. But I completely agree that it's challenging when it comes to neural networks, where we don't have a good enough understanding at this moment of the relationship between complexity and accuracy, and then the accuracy for various applications.
I was actually curious about that accuracy front that you're talking about, because you're transitioning a little bit into the robotics and the health space there. Because, you know, in some sense, on ImageNet you can achieve accuracy of, say, 99 or 100 percent these days, right? But then for something like a diagnosis, and the kinds of work that you're dealing with, in one sense the diagnosis is binary, like, yes, we think you have neurodegenerative disease X, or no, we don't. But then maybe behind it there's some sort of threshold that comes to, like, 96, 97% likely to have it, or something like that.
So in your mind,
when you think about the accuracy on these kinds of domains, is the accuracy on the binary side, where obviously a 1% error of yes or no is a really huge actual impact to somebody's life? Or perhaps in the layer underneath, where you might be a little bit off, but you still meet the threshold and the diagnosis stays the same, because it's 96 instead of 95?
Yeah, I think it's a really good question. I think,
so then the question is, how is the neural network being used for this particular application? And let me just use the robotics one as an example first; it's a little bit more fleshed out, because there's a lot of complexity in the health space. In the robotics space, some of the things that you use a neural network for are perception or autonomous navigation. So can you understand your environment? If
you use a neural network to detect how far an object is, or whether there's an object there, the test there is basically, how likely are you to crash into something, to get from point A to point B. And I think often a lot of these things are probabilistic, or you should model them in a probabilistic manner.
I don't think it's always yes or no completely.
And so in fact, actually, some of the work,
this is in collaboration with Sertac Karaman, who's
a roboticist here at MIT.
So our student, Soumya Sudhakar, has
been looking at the implications of uncertainty.
So there's actually this whole field of looking at,
a neural network will give you a result.
But the question is, like, how confident are you on this result? What is the uncertainty around this result?
You can't just say, oh, this is like this object is this far away.
Like, are you sure? Are you not sure? We should also know that.
So there's a whole field of uncertainty.
And so I think that part of trying to also measure the uncertainty of the neural network can then
help to inform whether like how seriously you should use the output of the neural network in
you know, in the given task that you're doing. On the healthcare side of things, obviously when you go into the healthcare space it's a much more long-term thing, and that's in collaboration with Thomas Heldt, who's another faculty member here at MIT.
There we were trying to do some eye tracking work and like you know basically depending how
quickly your eye reacts to certain stimuli you know there is a correlation between that and
certain neurodegenerative diseases like you know Alzheimer's, Parkinson's and so on. But there I
think the aspect is of course you know depending on the lighting and depending on various other factors, your measurement, like the accuracy of your measurement can vary a bit.
The key idea there is that, you know, if you can do this, rather than doing this in a clinic with a very expensive machine, if you can do it at home with your iPad, you can collect a lot more measurements.
So it's not like, you know, going in to see a specialist once a year to get this measurement, but you can do it more frequently.
And though, you know, maybe the measurement is more noisy at home, but you're collecting it
over time, then you can use the longitudinal thing and the multiple collections to kind of address
a bit of the noise that you might get in the measurement. But again, it's really like how
you use it. So you can imagine, I mean, this of course is still very long term, but it depends on how you use it. That might be saying, oh, if you can measure the uncertainty, or if you can collect more data, then you can trade off a little bit more accuracy. But if you use it as, this is the final decision for everything, then yeah, you might have to really increase the complexity and max out the accuracy. So just using it in the field for these particular tasks gives us a little bit more insight in terms of the trade-offs.
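As a sketch of how an uncertainty estimate can gate how much the downstream task trusts the network: illustrative only, the ensemble approach and the safety margin here are assumptions, not the specific method from this group's work.

```python
import numpy as np

def ensemble_depth_with_uncertainty(models, image):
    """Run several independently trained depth estimators and use their
    disagreement as a rough uncertainty estimate."""
    preds = np.stack([m(image) for m in models])   # each model returns a depth map
    mean_depth = preds.mean(axis=0)
    uncertainty = preds.std(axis=0)                # high spread => low confidence
    return mean_depth, uncertainty

def conservative_distance(mean_depth, uncertainty, margin=2.0):
    """Hypothetical downstream use: plan against the predicted distance minus a
    few standard deviations, so the planner is careful where the network is unsure."""
    return np.clip(mean_depth - margin * uncertainty, a_min=0.0, a_max=None)
```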
Yes. Yeah, thank you so much. I think that helps, because now you have a concrete use case, and that informs, yeah, how much accuracy is okay to lose, how much parallelism or how much energy do you really have. With some of our previous guests, with physically constrained systems, it's like, this is all the energy you have, boom, period, that's it, so that gives you a hard bound. And it sounds like going into a specific use case helps give you a hard bound too, as a lens to look at your trade-off space.
Right. And because you're looking at what you're doing with the results, it could also, I guess, loosen that bound, too.
Right. So depending on whether you're averaging more results, or if you have a good measure of uncertainty, then it's not all or nothing, not just based on a data set. There's a certain task, and there are things in a task that you can cut corners on and things you can't. But in a task you also have many more options in the space that you can use, where the neural network is not the main thing. Then you have a better idea of how you can trade off other things to address any computational challenges in your neural network.
So the design space of neural networks is quite large. You can prune them. You can quantize them.
You can choose different architectures and so on.
But moving to the end-to-end stack,
it's not just a deep neural network.
There are other algorithms that come
into the end-to-end application.
So how do you think?
And in many cases, there can be many different algorithms
that you can potentially use to solve the same task.
So how do you think about sampling this space?
For example, if you look at autonomy,
I'm sure there are many, many algorithms
for each individual step in that pipeline.
So how do you think about the design process for pruning that entire space?
You have a design space in terms of the hardware parameters.
You have a design space in terms of the algorithms
that you can potentially use and a design space in terms
of how you want to implement this,
what are the different technologies
that you can use to implement it. So what does the process look like in any of these cases,
right? Like deep neural networks or in particular for end-to-end systems designed for specific
applications? Right. I guess there's end-to-end and then there's also full stack. So I'll try
and answer both. But I think the way that generally speaking, I approach looking at
building energy efficient systems, I guess the
first question is always, what is the bottleneck? Or what is the main driver of the energy consumption? You could always start looking from a hardware perspective: is it that the existing state-of-the-art solutions for this application, the hardware that they're using, is just not efficient? We have all these low-power hardware techniques, you know, parallelism, efficient memory hierarchies, dataflows and stuff. Can the problem be addressed there? But then
sometimes, as I mentioned with the video coding space, and even some of the work in the robotics space, you're limited by the algorithms that are actually completing the task. So if you don't have that much parallelism, or if your algorithms require you to use a huge amount of data, you're going to need huge memory. I mean, a typical thing in the robotics space is, you know, you're building these maps, and these maps can be very huge, right?
And so, is the issue more the representation of that map? So maybe then you should address it from an algorithmic standpoint. We have a student, Peter Li, who's been looking at, oh, can we represent a 3D space with these Gaussian mixture models, which is going to be much more compact than doing kind of a voxel 3D space type of thing. And then of course, the challenge there is you want a very compact representation, but you don't want the computing to generate that compact representation to also be very costly. So that leads me to start thinking about the algorithm side of things. And of course, then when it comes to
algorithms, you're going to impact the quality of results. So there, that's where it's really
important to collaborate with domain experts, right? So that's actually what led to the
collaboration with Sertac, because, first of all, at that time, we wanted to see, you know, is efficient computing really critical in the robotics space? Like,
what is the role it can play there? And then, as you mentioned, there's many solutions.
I mean, robotics is a huge field, right?
To get from point, even like the navigation test,
there's many different ways of approaching it.
So kind of learning from and working with Sertac,
trying to understand, you know,
what are the different approaches people take?
What are the quality metrics that they care about?
So, like, what are the environments that really stress these types of algorithms?
So really understand how they define
what is a good quality algorithm.
Cause we don't want, you don't want to, you know,
design something that's super efficient,
but then it doesn't do anything meaningful.
Then it's not very useful.
So then, given this range of algorithms, we start, and this is the fun part, learning about the algorithms and thinking through, oh, well, this algorithm might need a lot of memory, or, if we made this small change to the algorithm, it could be much more efficient this way.
So it's kind of then, you know, collaborating on trying to learn about the space, but also to educate your collaborators on the space so you can together converge on a set of algorithms
that you think are more efficient.
And then actually more recently,
it's also important to think also
at the systems point of view,
because there's always this question on,
especially in the robotics space,
is it really the compute that's burning a lot of energy,
or is it the sensors or is it the actuation itself?
And then, so understanding kind of the interplay
between all those things.
So, for instance, on the sensing side of things, often things like sensing depth are actually very expensive, because you have to send a pulse out and wait for it to come back.
And so you could instead pay more compute: a very common thing these days is to take a regular RGB camera, run a neural network on it, and predict depth based on that.
Like it's kind of like monocular depth estimation.
Could you use that to replace these, you know, depth sensors?
So obviously that's a lot more compute, but then there's interesting trade-off between
compute and sensing.
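That sensing-versus-compute trade-off can be framed as a simple back-of-envelope check; all the energy numbers below are placeholders one would measure for a specific platform, not real measurements.

```python
def cheaper_to_compute_depth(active_sensor_energy_per_frame_mj,
                             dnn_energy_per_inference_mj,
                             camera_energy_per_frame_mj):
    """Is it cheaper to estimate depth from a passive RGB camera plus a DNN
    than to fire an active depth sensor?  Compares energy per frame."""
    compute_path = camera_energy_per_frame_mj + dnn_energy_per_inference_mj
    return compute_path < active_sensor_energy_per_frame_mj

# Purely illustrative numbers, not measurements:
print(cheaper_to_compute_depth(active_sensor_energy_per_frame_mj=50.0,
                               dnn_energy_per_inference_mj=30.0,
                               camera_energy_per_frame_mj=5.0))
```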
I was just going to make a joke that we should definitely do the less accurate thing for
autonomous driving.
Get rid of LiDAR and just use some monocular depth sensing like you were saying. That will sell cars, I'm sure.
Well, actually, okay, it's a little bit divergent, but speaking of self-driving cars: as I mentioned earlier, I was interested in sustainable computing, and we had talked to a
lot of great people who were looking at that from the perspective of cloud
computing.
But we were also just wondering often,
so my big thing is I want to make sure our research is
helping things, not making things worse, particularly
for sustainability.
So one question was, for autonomous vehicles, with the amount of compute you need to do on them, is that also going to have a very large carbon footprint? And so it turns out autonomous vehicles are great, but we have to be mindful of the compute. If you think about the amount of compute that you have to do for an autonomous vehicle, which of course is very far out and the algorithms still need to be developed.
But if you just think of the number of sensors on a vehicle to understand its environment, and if we hypothesize that you'd be running DNNs on those sensors, it turns out that if you drive, you know, a vehicle one hour a day,
and you consider you know the number of vehicles that we have in the world and if a large portion
of them are autonomous vehicles the amount of compute that would be necessary would actually be
comparable or more than what you have in data
centers today, right, just from the scaling factor. And so the main takeaway from that is, it's great to enable autonomous vehicles, I think it's very valid, there are a lot of societal benefits to that, but it's really important to think about the compute too, because even though it doesn't seem like a lot when each person gets one vehicle, as you scale it up, it can be quite substantial.
Interesting project we're looking at.
So we're trying to look at this because we're really curious
to see if our small scale stuff also flies at large scale.
So what I'm hearing you say is we
should get rid of the pulses so that we can make it cheaper.
I'm just kidding.
I'm just kidding.
No, no, that is really interesting.
Yeah, because in a lot of ways,
it's kind of like the reverse
of the shift to data center, right?
Because in the old days, you know,
each individual person had to decide
whether or not the battery on a particular laptop
was like good enough for them.
And like each individual person doesn't have enough scale
to really necessarily like move the industry.
But suddenly we start packing thousands and thousands
and thousands of them into these buildings and then we're like oh my gosh
we have to think about the individual like the small thing because we're timesing it by a million
right and now it's kind of like now you know with the cars we're just like oh it's just a car
there's a couple of little chips in there, whatever. But then what you're saying is, yeah, there are billions of cars. Well, actually, how many cars are in the world?
Yeah, so there are 1.2 billion vehicles out there. So, you know, we were just trying to model some scenarios. In one particular scenario, if you assume that, you know, 1 billion of
them, let's say, would be autonomous, then yeah, you would hit this issue where you're comparable to the power of data centers. I think it was something like, if the compute per vehicle is around 800 watts. That's not too much if you think about the power that we take to run deep learning, especially if you have so many sensors around your vehicle; you need to see, like, 360 degrees around your vehicle and stuff.
So it can be quite significant.
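The scaling argument is easy to reproduce as back-of-envelope arithmetic; the scenario numbers here (1 billion autonomous vehicles, roughly 800 W of on-board compute, about an hour of driving per day) are the ones mentioned in the conversation, not a forecast.

```python
def fleet_compute_power_gw(num_autonomous_vehicles, watts_per_vehicle,
                           hours_driven_per_day):
    """Average compute power drawn by an autonomous fleet, in gigawatts,
    assuming the compute only runs while the vehicle is driving."""
    duty_cycle = hours_driven_per_day / 24.0
    avg_watts = num_autonomous_vehicles * watts_per_vehicle * duty_cycle
    return avg_watts / 1e9

print(f"{fleet_compute_power_gw(1_000_000_000, 800, 1):.0f} GW average draw")
# ~33 GW -- on the order of today's global data-center power draw
```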
And I think the interesting thing about this was also, you know, on the data center side,
you can swap things out. I mean, I'm not a data center expert, so you guys can correct me, but, you know, you can upgrade your hardware and so on.
But I think also the challenge there is that you also have like a wide range of workloads.
On a self-driving car, I think you can be much more specialized in terms of like the tasks that you're trying to do because you know it's going to be in a vehicle.
But at the same time, if I understand correctly, in the data center, people change out their compute every like three, four years or something.
But on a vehicle, usually these vehicles are supposed to last like 10 years or more.
So it's not as easily upgradable. And so there's a question of how should you design the hardware for a
self-driving vehicle because it needs to last longer. And you do have, maybe it is a good match
for specialized computing because you do have a narrower set of tasks. But at the same time,
if you update the algorithm, you want to be able to push that to the vehicle as well. So
it's an interesting challenge to think about.
Really fascinating set of considerations,
definitely unique to the self-driving car space.
And I think it's interesting to see how this will evolve.
I did want to touch upon one of the themes
that you brought up, which is you've
worked on a wide range of domains
and gone into substantial amounts of depth
to figure out how can you find
that next domain of efficiency and how do you relate them back into the hardware context.
I wanted to ask you, how do you think about problems that span multiple boundaries,
maybe things that are outside your immediate field of expertise, and how do you go about learning and
collaborating effectively with multiple domain experts in the many projects and the many ideas
that you've developed over the years?
Yeah, so I guess it's a question more about the process.
So certainly, it's fun to do a lot of reading,
learn about the topics.
But I actually get the most enjoyment
interacting with people.
So I like to go learn about a given topic from a particular person, right? And I think that's been the case in all of these collaborations. With Joel, it was more like, okay, I want to focus on efficiency, and I know that, coming from the computer architecture community, he knows a lot about flexibility and programmability. So bringing our expertise together
to solve this problem is really fun
because I get to learn more about the architecture space.
I hope he gets to learn more about efficiency.
In the robotics space, it's the same thing with Sertac, and in the healthcare space with Thomas.
So I think the high level thing is always,
what is an important problem to solve? And then, at least for me, it's like, from my skill set, will I make an impact there? My skill set is first from the lens of energy efficiency. And then afterwards, going a little bit deeper and trying
to think at what level of the stack we should be solving at. And then if I'm not familiar with that level,
find people that I would like to work with
and are excited to learn from about that level
and work together.
And we're also equally excited about working together
to solve that kind of a problem.
So I think that the high level bit
is trying to identify the right problem
and then finding the right people to work with on it.
And then of course, at a high level, the challenge is always trying to find a way to effectively communicate across the different domain boundaries.
So as we know that in each domain people have their own little language about their topic
and sometimes it's expected at the
beginning it's going to be kind of a bumpy road to ramp up but I think if both parties are equally
engaged in trying to communicate to the other person so that they can understand, and trying to distill down the concepts, then I think that's an effective way to learn. But that is the challenge at the beginning: to figure out that you speak the same language and the same terminology, and to define it all very clearly.
Yeah, that's definitely a tough one.
I've experienced that a lot
in any sort of cross-layer collaboration.
And I just wanted to point out that throughout this discussion we've been having, it just jumped out at me how many other people's names you've mentioned. And so, you know, it seems pretty clear to me that you would be a generous and enthusiastic collaborator, and that's probably why you have had such fruitful collaborative work in the past, because you're just like, oh, we work with this person and this person. At some point it pierced my thick skull, just like, wow, she's mentioned a lot of people. So yeah, kudos to you, because that's not easy.
Thank you. Yeah, I think this is what I also
tell my students: it's good to be generous with credit and also to value your collaborators. For a lot of these problems and challenges, like I said, for me the fun part is really working them out with all these people. I find actually much more satisfaction in saying we solved this problem together than I solved it myself. That's the same reason I encourage my people to collaborate, because I think it's a lot more fun if you bring a lot of different perspectives and different ideas, and you all get much more out of it, I personally think. But yeah, thank you for noticing.
That's awesome.
And that's a great meta thing to teach to your students too,
I think, in my humble opinion.
So kudos again.
Maybe this is a good time to wind the clocks back
a little bit.
Could you tell our listeners how you got interested
in energy-efficient computing?
How did you get to MIT?
What was your journey like?
And any highlights from the journey?
Right, sure. Yeah, so I did also do my PhD at MIT, so maybe I can start there. When you say MIT, you mean both times; unfortunately, or fortunately, I spent a lot of time here. What happened was, in undergrad, to be honest, I was not actually thinking about
doing grad school at all,
but at the University of Toronto, where I did my undergraduate studies, you could do a 16-month
internship to kind of figure out what you want to do after you graduated. So I ended up working at
a startup called Snowbush, which was actually started by two professors at the University of
Toronto, so David Johns and Ken Martin, and they were looking at, you know,
mixed signal analog design. I had no idea what that was. But I just thought, you know, it's just
something cool to do. And as you can expect, at a startup by professors, the employees are all going to be their former grad students. So then I kind of got a feel for what it's like working with fellow graduate students. And I mean, I think also what was very clear to me
at the time was if you wanted a design role, at least
at that company, you would need some kind of higher level
degree, some grad school, versus like as an undergrad.
Though I was involved with many things, primarily the focus was doing the physical design and layout.
So I decided to apply for grad school
when I guess was a senior in my fourth year.
And I was also like, oh, I might as well apply for MIT in the States as well.
So then that's how I kind of ended up at MIT.
And then at MIT for my PhD, I worked with Anantha Chandrakasan, who's kind of like
the world expert on low power circuit design.
And so he had pioneered a lot of these approaches of aggressively scaling down the supply voltage
and then developing circuits that can work at very low voltages, like subthreshold, down to like 0.2 volts, at very low power. But as I mentioned before, if you go very low voltage, you're also super slow. So that's great for applications like the medical space, but not so much for applications where you need some reasonable amount of throughput. Also, personally, okay, so I love television, so I was very into video. So I was like, oh, video would be one topic I would want to work on. And he was like, great, so try and think about how you would apply these low-power strategies to video processing. And so I was, you know, lucky enough to be able to work on that topic. Of course, he, as a faculty member,
found some funding for that, which is great. And so working with another graduate student,
Daniel Finchelstein, we designed this H.264, which was kind of the state of the art at that time,
video compression chip that operated, I think it was like 0.7 volts, was very low power. So people
were very impressed by that. And I think often
a question that comes up in the computer architecture community is, is there value in taping out
a chip? And there definitely was because we taped out a chip, we reported it and people
were like, oh, I don't believe that it's actually that low power. So we actually brought our
chip to, I think the conference was in Japan, like a demo for them to see that it is this
low power. So it shows we didn't model something incorrectly; we actually can design it, it's true, it's like a proof of concept. And it was very exciting. But in that process, I realized that
you know, there were a lot of things that were still limiting what we could do, especially if you start wanting to improve the algorithms for better compression. So then, even as a grad student, I started to collaborate
with Texas Instruments on looking at new algorithms for video compression but with the lens of
making it low power. I was also very fortunate at that time that the video standards were just starting to kick off. These video standards happen every decade or so, so you really have to be lucky, because if one had just finished, there would be nothing new to do. But they were just kicking off. So by the time I graduated in 2010, and Anantha, I should say, was very supportive of sending me to these preliminary standards meetings when I was a grad student, which was great. So even though there was no publication out of it, I learned a lot in terms of what happens in these standards committees. And so when I graduated, I joined TI, and then with Madhukar
Budagavi and Minhua Zhou, we worked on, you know, algorithms for the new standard,
everything. So it was great. We designed HEVC and that's very rewarding because it's used in
a lot of these Apple products and a lot of devices around the world. You know, after the standard
ended, so it takes three years to do a standard, it was like, oh, you know, what to do next? You
can either work on the next standard, you can switch products, or you can consider going back to be faculty at MIT.
And I really enjoyed working with the undergrad and grad students who interned at TI over the summer, and Anantha was really encouraging me to consider faculty. And I was
also getting really interested in computer vision, in the sense of like, it's one thing to be
very energy efficient when you compress the pixels,
but it could also be very interesting to see
how energy efficient it is to understand the pixels.
Cause we know there's a wide range of applications
where you don't need to look at the video.
So then I decided to come back to MIT
and then that's where like the focus was then on,
you know, efficient video processing
and that became computer vision.
And then, of course, the state of the art there was deep learning. It was just starting off at that time.
And I would say, I should also give credit to Vinay Sharma
who was at Texas Instrument, who, you know,
he was a computer vision guy.
We'd always have these like nice discussions about,
you know, what's going on in the computer vision space.
Like, oh, this deep learning thing seems to be really taking off, and this is like 2011. So then when I came back to MIT, it was just like, oh, we should work on this space. And so that's kind of how the deep learning stuff started.
That's a really cool story, and it reminds me that one of the things
that we had wanted to ask you about today, which we haven't gotten to yet, was how your work from your time at MIT the first time around, and at TI, led to an Emmy. I don't know how many people in our field have an Emmy. Tell us about that,
like specifically what was the Emmy for and did you have to get all dressed up and go to a dinner
or whatever like on TV? All right.
So first of all, sounds good, but let me clarify a couple of things.
So the Emmy is first of all for, it's for the entire standards committee that developed
HEVC, right?
So it's not a personal Emmy.
It's for like our whole committee together.
So that was led by Gary Sullivan and Jens Ohm.
Yeah, it was just in recognition of this new technology that we've built.
And now that obviously they can do it
to use it to distribute a lot of video.
I mean, like I mentioned, I love television.
So it was like really great to be like,
ooh, I never thought, you know,
I like consuming television, not producing.
So I was like, oh, this Emmy would never happen to us.
But then, oh, actually with an engineering approach,
you can actually get these Emmys.
I will also say it was an engineering Emmy,
not a creative arts Emmy. So they're
different. But nonetheless, it was super exciting to go. So
yeah, it was like a whole dinner thing. And I forget who the
celebrity was, they had a celebrity like who was the host.
And it was just actually really rewarding to see all you know,
former collaborators and colleagues because the Emmy thing was 2017. I had left TI at 2013,
so it had been like a four-year gap since I saw a lot of these people. So it was just a really nice
reunion as well. And in general, it's just very nice recognition. Actually developing standards, in terms of time and effort, can be really tiring. We meet like quarterly,
but then we have these really intense 10-day meetings where
most often when the meeting's over, most restaurants are also closed as well. It just
runs really late. It's really intense. And then even between the meetings, it's very intense. So
it's just nice to get that kind of recognition after all of that hard work. But yeah, it was a
lot of fun. It was certainly unexpected, but then when I think back, like, oh, like, yeah, I've always liked television. So this is a good alignment with my
other interests.
That's amazing. And would you say, because, like you say, right now we as a society consume a lot of video, was this technology seminal in sort of being able to pack a lot of video into small pieces so that we can consume it at the rate that we do now? Like what specifically,
I mean there's a lot of standards out there but not all of them win
Emmys. So maybe you can elaborate a little bit more about what it was exactly.
Right. And I think, you know, credit to the whole standards community and the leadership of the chairs for getting this done. Because you're right.
There's a lot of standards and not that many take off in terms of
popularity. So I think the first key thing,
obviously for video coding standard is to improve compression.
So every standard, which happens, as I mentioned, every decade or so,
the goal is to improve the compression by 2X,
meaning same quality of video, and then 2x
smaller size. And this is, of course, important, because we're all using a lot more video these
days. And even though, you know, bandwidth and memory increases, but like the increase in usage
is much higher than that. So that's really critical. But then the second thing, and that was
a specific focus in this new standard, or HEVC at the time, there's another one that's coming out.
But for HEVC, if you think of H.264, which is the prior standard that was developed in 2003,
how we use video in 2003 is very different from 2013, right?
So these days, a lot of the video is consumed
on battery operated devices,
like whether it be your phone, your laptop,
both consumed and I would say created, right?
Because you're taking videos.
And so there, I think the key thing to make HEVC
also be an effective standard
is we considered energy consumption.
So basically often there were a lot of things
where we had to evaluate the trade-off, like, oh, your new idea can give X amount of compression, but we had to evaluate the complexity of it. Does it also consume however much energy? And so is that
trade-off really worth it? Or is there a better way to achieve the same, you know, coding efficiency
in a different way that's much less energy heavy or complexity heavy? So by bringing those two
things in, I think the standard that we came up with
was twice as efficient in terms of compression,
but also the overhead of getting that compression
was very low compared to H.264.
So I think those are the kind of things that really first
making sure that your standard does compress well
over a wide range of content.
So they really have to test for a wide range of content and a range of scenarios, because how we do video compression right now for video conferencing, which is low latency, is very different from how you would compress, like, cinematic content, where you have a lot of compute time but quality is really key. So you have to consider all scenarios, and then also, these days, compute as well. So I think being able to balance all of
that was a key component leading to the success of the standard.
That's a fascinating
story and also sort of talks about the changing times and how different standards sort of evolve
with the changing times, and the huge amount of work and effort that goes into enabling these technologies for the larger masses as well. Any words of wisdom for our listeners who will be listening to this podcast,
based on your experience working on a wide range of domains, collaborating with a wide range of
people, anything that you'd like to share with our listeners?
Yeah, so I was trying to think about this before, and I think there are two key things. So one is, if I look at my own trajectory, and this is what I actually tell my own students, I think it's really important to work with people that you like to work with. Who you surround yourself with is so key. Actually, sometimes the problem doesn't even matter so much, although you should all be motivated to solve it. I think I'm so lucky and happy with all the people that I collaborate with. That's,
you know, I really enjoy the teamwork and learning from everyone. So I think that's really,
you know, very important. And I think the other thing, and this actually comes from like, so
I recently ran this workshop with another faculty member who teaches like a leadership workshop
here at MIT. But you know, there's a lot of interesting things there. I think this happens in industry: when you become a manager or a leader, you get sent to these leadership workshops and they teach you things like, you know, how to give feedback, how to lead a team, how to manage conflicts. But I think actually a lot of those skill sets are really important even if you're, quote unquote, not a leader. Even as a graduate student, or any student actually, you don't know if you'll be a leader in the future, but knowing how to give and receive feedback, and in particular, I think, how to receive feedback, is really important. So we ran a workshop on that recently. And, okay, a couple of things: you often feel like these interpersonal, non-technical challenges are ones you're the only one encountering, but it turns out there's a lot of literature and research on this. So even something like learning to give and receive feedback, there's a ton of literature on that, and so we ran a workshop on it. As it turns out, it was actually very timely, because many folks are getting their reviews back right now from conferences. And so being able to kind of look at that feedback, and
of course, there's going to be good reviews and bad reviews, but being able to look at
it and try and distill the useful feedback from that and not take it so personally, it's
hard to not take it personally, because you work really hard on it. But nonetheless, try to look at it more like this is a growth opportunity. You know, actually getting feedback is, in some ways, a privilege. Nobody has to give you feedback, so when someone puts the time in to give you that feedback, try and make the best use of it. I think having that lens for the grad school journey
or any journey is actually quite useful.
And more broadly, even if you don't intend to be a leader, although you might end up being one, building those types of skills, being able to manage feedback, managing conflict, just these non-technical skills, is really important at every point of your journey. These are not things that you only deal with after you graduate; you can use them every day. So I think that would be the other thing: you know, we're all excited about technical skills and abilities, but the non-technical ones are also really critical.
I 100% agree with that. And I think one of the things that I've come to realize at this point in my career, which is currently paused, as some people may know, is that you get to a point where the limiting factor... I mean, of course, people can reach a limit in terms of their technical capacity, right? It's just like, I just don't understand this stuff, right? Like I will never be a string theory physicist or whatever, because that's just not how my brain works. So at a certain point, you might cap out on some technical aspect. What I didn't really realize is that you may also cap out on your interpersonal aspects. You might just not be able to communicate your ideas. You might just not be able to work with enough people. You might just not be able to cope with the stress of having to deal with, I don't know, a 500 million dollar budget or something like that. And I think that is something that a lot of young folks don't necessarily realize. It's just like, if I just break through this technology, if I just do this or that or the other, then I'll be set for life, and that's not the case. So the fact that you're teaching this at such a young age,
even like running workshops for your students,
that's going to set them up for longer-term success.
Because I mean, coming out of MIT,
you're going to have the technical chops,
but to be able to like handle all the other stuff
is a really, really meaningful lesson
to teach to your students.
I think so.
And I also think these things actually inhibit your ability to develop your technical chops, because you spend a lot of your time on them. If I reflect back, you know, like 90% of the things that keep you and me up are typically these non-technical, interpersonal issues, as opposed to the technical issues. So if you could clear these things out, or learn how to manage them, then you could focus more of your energy on the technical aspects of things.
And then, yeah, the other thing I really want to emphasize: in computer architecture there are tons of papers, so when you hit a technical challenge, you can look at papers and read papers, and you're not the first one to hit that challenge. For these interpersonal things there's also a huge body of research. You're also not the first one to have an issue with your advisor, not the first one to feel upset about a negative review. There are tons of people who have gone through this, and there are different ways of dealing with it. So it's good to use that as a resource. You know, here we're all researchers, so just research this part as well. It's all there. I just didn't realize it until later in life, and I wish I had known earlier. So that's why I kind of feel like, for these younger folks, we should all do it.
This actually evokes a little bit, you know, our previous guest Jim Keller, who obviously
has had a lot of technical jobs as well as managerial jobs. He talked to us a bit about how he reads a lot of books about management, and that's how he builds these teams that are successful, because he's read a lot of books specifically about that topic. And so it's interesting that you're doing something similar, but sort of pushing it down to the university level already. So that's pretty awesome.
I mean, I wish they would do this at the high school level. I think everyone has to deal with people and conflict and feedback. So you don't have to be just a manager or professor or whatever.
You know, it's been wonderful, wonderful chatting with you.
We've learned a lot.
You've had a lot of great things to share with us
about all sorts of topics,
ranging from the technical and the non-technical.
So thanks for being here today.
Thank you so much for doing this.
I think this was a lot of fun, and thanks for your leadership in this podcast area and for broadening the reach of this community to the rest of the world.
Yeah, echoing Lisa's sentiment here, it's been an absolute delight talking to you today.
And to our listeners, thank you for being with us on the Computer Architecture Podcast.
Till next time, it's goodbye from us.