Computer Architecture Podcast - Ep 17: Architecture 2.0 and AI for Computer Systems Design with Dr. Vijay Janapa Reddi, Harvard University
Episode Date: September 3, 2024. Dr. Vijay Janapa Reddi is an Associate Professor at Harvard University, and Vice President and Co-founder of MLCommons. He has made substantial contributions to mobile and edge computing systems, and played a key role in developing the MLPerf benchmarks. Vijay has authored the machine learning systems book mlsysbook.ai, as part of his twin passions of education and outreach. He received the IEEE TCCA Young Computer Architect Award in 2016, has been inducted into the MICRO and HPCA Halls of Fame, and is a recipient of multiple best paper awards.
Transcript
Hi, and welcome to the Computer Architecture Podcast,
a show that brings you closer to cutting-edge work in computer architecture
and the remarkable people behind it.
We are your hosts. I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Our guest on this episode was Vijay Janapa Reddi,
an associate professor at Harvard University
and vice president and co-founder of ML Commons.
He has made substantial contributions to mobile and edge computing systems
and played a key role in developing the MLPerf benchmarks.
Vijay has authored the machine learning systems book,
mlsysbook.ai, as part of his twin passions of education and outreach.
His work has also earned him numerous accolades,
including the IEEE TCCA Young Computer Architect
Award in 2016, induction into the MICRO and HPCA Halls of Fame, and multiple Best Paper
Awards.
On this episode, Vijay discusses Architecture 2.0, a new era of using AI and ML for computer
systems design, exploring the opportunities, challenges, and educational shifts it necessitates. He also delves into his work on TinyML, enabling machine learning
on resource-constrained devices and its potential to transform our technological interactions.
A quick disclaimer that all views shared on this show are the opinions of individuals and do not reflect the views of the organizations they work for.
Vijay, welcome to the podcast. We're so excited to have you here.
Thank you for having me. It's a pleasure being here.
As longtime listeners of the podcast usually know, our first question is often, in broad strokes, what's getting you up in the morning these days?
It is without doubt my four-year-old and my eight-year-old, first thing in the morning at about six o'clock, and I'm sure that's very common for a lot of people.
Six is rough. I gotta tell you, six is rough.
I got seven a.m.-ers, so I got lucky.
Well, we get some midnight wake-up calls, so yeah.
So after they get you up, then what is your day looking like these days?
Most of the time, it's kind of thinking about what's the next big thing that's actually
happening in our field.
That's honestly what kind of really keeps me up, thinking about it quite a bit.
I think because right now, it's such an exciting time when there's so much change going around
and finding the path through this nebulous
cloud that we're looking into is sort of like the most fascinating thing I feel because it is both
an opportunity for a whole bunch of different ideas that we can all explore and research and
education, but at the same time it's also quite a challenge because it's easy to kind of go down, you know, a rabbit hole. So just thinking about that and trying to identify what are the interesting areas is
sort of the most exciting thing right now.
Right.
So one of the themes that you have talked about in recent times is what's called Architecture
2.0, a shift from the traditional paradigm of how we design computing systems.
So what sparked this particular vision and what are the most exciting advancements
that are driving this particular paradigm shift? Yeah, so that's a great question. So architecture
2.0, just for the listeners to kind of be clear about, architecture 2.0 is fundamentally just
thinking about how we use AI ML to help us build better systems in the future as we're starting to
build increasingly more complex systems and to do that very efficiently and to
also do that with extreme sort of, you know, consciousness about like how we reduce time to
market. Because I think as systems get more complex, you know, validating, verifying,
designing, all of that gets inherently more complicated. And so we need new tools. And
that's fundamentally what Architecture 2.0 is really about. And obviously at this given point
of time, you know, it's a super exciting time
because it's like you're in this era
of not only just AI ML,
we're truly in this era of generative AI ML, right?
Which is sort of a very exciting area.
Now, the reason that actually came to light
is honestly, just from reflecting all the work
that's been happening in the community,
the architecture community
has been doing some very interesting work.
Obviously, we build a lot of systems for ML, without doubt.
But in recent years, we've definitely seen the shift towards having AI/ML being used for the systems themselves, right?
Like actually designing the systems, right?
So this could be in the form of like, you know, whether it's Bayesian optimizations
or genetic algorithms or
reinforcement learning, or pick your favorite, you know, bells and whistles that you want to apply, right?
And it's been like reading those papers that have actually been coming out, which I think in all honesty are fantastic and phenomenal, because they're showing sort of, you know, what's possible.
But once you take a step back and you try to think about, like, how do we systematically translate or convert this into something that we can use in a practical sort of sense, you start getting into some really deep questions about what are the challenges around this space.
And from reading those papers is when I kind of realized, oh wow, we don't really have an engineering principle around how we're going to be applying this methodology, you know, in order to accelerate all the traditional stuff that we've been doing.
And I think that's fundamentally what gave birth to going around and talking to a large number of community members around what are the new challenges as we try to use AI/ML for system slash architecture design.
I think there are several themes that you've talked about
within this broad umbrella,
everything from datasets to ML algorithms
to tools and infrastructure that you need.
And as you rightly pointed out, new methodologies and a new way of thinking about designing
these systems.
Can you tell us a little more about these different themes?
What are some of the challenges and opportunities that you see under each of these?
And I think the big and most fascinating element of all of this: as much as, like, you know, whenever we talk about AI/ML, by and large, most of the community is wickedly excited about, oh, I've got this new little model that I'm actually going to put in, and it's going to do blah, blah, blah. Fundamentally, I think that's the most boring aspect of, you know, AI/ML, in all honesty. I think the most fascinating aspect of it is where it all begins, which is the inherent data that we're actually talking about, right? Because data is effectively the new code today. And when we kind of think about how we apply AI/ML methods for system design,
and you kind of go back and you look at like,
okay, what corpuses do we have?
Question is very simple.
What's the ImageNet dataset for computer architects?
That's a very simple question.
And yet we would struggle to answer that question.
Why? Because we have not systematically thought about it. We are the ones who have actually been building the systems that enable, you know, all this AI technology. Yet we ourselves have not thought about how we would be able to build data sets for, you know, architecture design. Now, architecture design is a very complex thing. It spans many layers, right? It goes all the way from talking about high-level design space exploration; in my head it also cuts right through the EDA flows. Because at the end of the day, when we talk about architecture, it's not about just the design that we come up with; it's actually, you know, how it gets taken down to implementation, right? And so everything from that top all the way down is sort of, like, you know, the critical thing that I think is fascinating. And how do we think about data there is one of the first and foremost things we've got to ask ourselves if we want to actually, you know, use this new methodology in our existing workflows.
Well, I'm very curious to hear what you mean specifically by this kind of
data exactly. Because, you know, when I think about sort of maybe architecture 1.0 of how we
would build things, the data would be, say, like some sim points or, you know, spec basically.
That's the data that we use to essentially not train the design, but sort of inform what we want
the design to be good at.
And that's the data that we test against, that's data that we design against, and that's the sort
of performance benchmark metric. In this case, it sounds like, you know, obviously the data would be
slightly different in this world because it's going into these AI ML techniques to try and
inform these designs and do them rapidly and optimize them. So, you know, obviously the word data is extremely broad.
Maybe you can dive down a little bit into what you mean
or what kind of different pillars of data you're talking about.
Yeah, pillars of data, that's a good way of kind of putting it.
I think there are three fundamental ways of kind of bucketing these things, right?
The absolute cutting edge one would be how do we get data in
a format that's actually useful for generation, right? Another pillar is how do we get data in order to do sort of, um, optimizations. And the more basic one, in all honesty, is, like, how do we get some sort of prediction data. So in my head there are these three pillars. You start with, you know, getting data sets where we can make very basic predictions about what's going to happen next.
The next thing would be, how can I get the data in order to actually design the system to be much
more, you know, optimal for some, you know, whatever heuristic you choose to. And the third
one really is the generative aspects. And I think once you kind of bucket things into these three
major pillars, then you can systematically think about, you know, what needs to be done. Now, off these three, obviously,
prediction and optimization are things that we have been doing in the past, right? Because when
we do design space exploration, we are effectively looking at, you know, various design points and
trying to, you know, figure out what's the best optimized method that we have to kind of pick from.
Even if you're looking at prediction, we have done prediction. We look at, you know, prefetchers and branch predictors; they're all looking at, you know, data coming through and making predictions, right? There is, however, a difference when we talk about prediction, optimization, and generation once we start thinking about it in the context of Architecture 2.0. It's an incremental step.
That incremental step in my head is fundamentally
about breaking the abstraction layers, right?
So traditionally what we have done
when we have thought about optimizations is by and large,
we have been kind of focused on these abstraction layers
from the system stack going from the application algorithm
all the way down to the hardware.
We've created these multiple layers of abstraction, you know, the ISA being the most classic version of it, right? We create these nice abstractions between the hardware and the software ecosystems and kind of let each independently evolve. What ends up happening then is that you sort of do these smaller optimizations. And I think as you start getting into the AI space, what's really interesting is it's kind of stepping away from this traditional
paradigm of instruction set architectures to more about parameter set architectures, PSAs,
as I like to think about it. The idea of PSA is that in the future, you still need to be a core
architect. Make no mistake, I'm not saying that suddenly our students don't need to know
anything about architecture. All our people still need to know everything deep inside so we know when models are hallucinating and so forth.
So we take that fundamental understanding, flip that vertical stack into a more horizontal stack,
and then our future architects are really going to be understanding what are the parameters that are actually essential to expose across each of those horizontal layers. Because at that point,
once you expose the parameter space, then you let the AI agents actually get to work.
And at that point, it gets really fun because now you could have an agent that's perhaps just
dedicated to the hardware module, or you could obviously break it down into the individual
microarchitecture components and have multiple agents all kind of working and learning from
each other. But in the end, when you take a step back, they're effectively
learning from each other and exploring that massive design space that we truly have across
the system. And I think that sort of paradigm shift is really what we need to have rather than
thinking about things in a very traditional sort of a perspective about how we have done things
today, right? I think that sort of changes, you know,
what going from Architecture 1.0 to 2.0 is going to be. That sounds very interesting. I think I want to
double click on this idea of this horizontal design space that you were talking about. It
sounds like, and let me make sure I heard it right, that, you know, of course, we have this,
a lot of layers, cross layer, and we often do cross-layer optimizations, but they're usually in adjacent layers. Are you saying then that you turn that on its side, those layers from like
ISA all the way up to, you know, microarchitecture, turn that on its side so that an AI can look at
all of the layers together and essentially optimize the parameters that we decide are good for exposure across what has traditionally been vertical stack, but now it's horizontal
and they have the purview of the whole white space. Okay. Interesting.
And I think that's going to be super fun because you start to kind of understand that
there are going to be differences about how we even expose those parameters, right? There might
be hierarchical parameters. You want one agent, one AI agent to kind of, you know, maybe just work
on the memory subsystem in complete isolation. Or you might actually want to break the memory
controller completely down and say, okay, even within the memory controller and the way it
interacts with the memory subsystem might actually have multiple agents because some of them might be
responsible for very specific parameters that they're playing around with. And so when you kind of think about it, you get into this really interesting design space of how do you get the AI to actually map onto this horizontal parameter space that we're actually exploring.
And those kinds of things have not yet been fully explored because, as you said, Lisa, we have largely been doing co-design between two adjacent layers. We lump it into
hardware and software co-design, which is true, but if you really go into it, it's really just
algorithm and hardware co-design in this very tight binding. But there is so much more of what
an architecture stack really is, right? And there might be optimizations that we would perform
at the highest levels of the stack that are in fact suboptimal when you actually look at it
from a holistic system design,
because sometimes you wanna leave more room
for the system at the lower levels of the stack
to actually make other kinds of opposite decisions
to what we would normally do.
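To make the parameter-set-architecture idea a bit more concrete, here is a minimal, purely illustrative Python sketch; the layers, knob names, ranges, and the evaluate() stand-in are all hypothetical, not anything specified in the conversation. Each per-subsystem "agent" mutates only its own slice of a flat cross-layer knob space, while a shared score, in practice a simulator run or a learned surrogate, arbitrates across the whole stack.

```python
import random

# Hypothetical cross-layer "parameter set" exposed side by side instead of
# being hidden behind per-layer abstractions. Names and ranges are made up.
PARAM_SPACE = {
    "frontend": {"fetch_width": [2, 4, 8], "btb_entries": [512, 1024, 2048]},
    "core":     {"rob_entries": [64, 128, 192, 256], "issue_width": [2, 4, 6]},
    "memory":   {"l2_kb": [256, 512, 1024], "l2_assoc": [4, 8, 16]},
}

def evaluate(config):
    # Placeholder: in practice this would be a simulator run (or a learned
    # surrogate) returning something like IPC per watt for the configuration.
    return random.random()

def random_config():
    return {knob: random.choice(vals)
            for layer in PARAM_SPACE.values()
            for knob, vals in layer.items()}

def agent_step(config, layer):
    # One "agent" owns one horizontal slice and mutates only its own knobs.
    new = dict(config)
    knob, vals = random.choice(list(PARAM_SPACE[layer].items()))
    new[knob] = random.choice(vals)
    return new

best = random_config()
best_score = evaluate(best)
for _ in range(200):  # toy search budget
    layer = random.choice(list(PARAM_SPACE))
    candidate = agent_step(best, layer)
    score = evaluate(candidate)
    if score > best_score:  # a shared score arbitrates across all layers
        best, best_score = candidate, score
print(best, best_score)
```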
Yeah, that's fascinating,
because I feel like a lot of times
when you are a student doing microarchitecture design even.
Maybe you've honed in on some substructure
within the microarchitecture,
whether that be a BTB or a TLB or an L2 cache or whatever.
And you kind of do have to isolate yourself
into looking at that structure in and of itself.
You've got to get yourself a pattern stream
that goes into it.
And then within that pattern stream, you isolate
yourself to figure out, okay, here's what happens here. You know, maybe I need, if I like memory
systems, so like maybe you need an eight way cache, or maybe you need a four way cache, or maybe you
should have. And even with those dimensions of like, how many indices, how many ways,
how many megabytes, gigabytes, whatever, whatever, depending on what level of
the cache you're talking about. That often was a relatively taxing space to look at just because,
you know, you would still have to run lots and lots of jobs, and we didn't have the computational power. And you would wonder sometimes, like, okay, well, what if I change something in the L2 that changes the traffic? You know, like the way that the L1 is filtering to the L2, now the traffic has suddenly changed. Like, there was no way to pop all the way up. And so it sounds like what you're saying is that if we turn everything on its side, and now that we have the massive power of all this AI, we can look at everything potentially all together, although we probably still have to be judicious about what parameters are being exposed. Is that what you mean by the first piece, the prediction
piece and the data piece?
Yes.
And I think it's the architect's job still
to have very deep knowledge about what parameters are
actually critical.
So this by no means undermines what a traditional architect is
doing.
If anything, all we're trying to do
is, for instance, if we go back in time, it's just to kind of help you compute through things faster so you can actually look at more interactions with your fundamental knowledge.
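As a toy illustration of that prediction pillar, here is a minimal sketch of a surrogate model fit on accumulated design-point results; the knobs, the synthetic metric, and the numbers are entirely made up, but the pattern of training a cheap predictor on past simulation data so you can triage a much larger sweep is the basic idea.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical corpus harvested from past simulation runs: each row is a
# design point (issue width, ROB entries, L2 KB) plus a measured figure of
# merit. In practice these would come from simulator logs; here they are
# synthetic, generated by a made-up formula plus noise.
rng = np.random.default_rng(0)
X = rng.integers(low=[2, 64, 256], high=[9, 257, 2049], size=(500, 3))
y = 0.4 * np.log2(X[:, 0]) + 0.002 * X[:, 1] + 0.0003 * X[:, 2] \
    + rng.normal(0, 0.05, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A cheap surrogate that predicts the metric for unseen configurations,
# so the expensive simulator only runs on the most promising ones.
surrogate = GradientBoostingRegressor().fit(X_train, y_train)
print("held-out R^2:", surrogate.score(X_test, y_test))
print("predicted score for [4-wide, 128 ROB, 1 MB L2]:",
      surrogate.predict([[4, 128, 1024]])[0])
```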
I want to circle back to the data itself.
And the quality of the data is very important
for the quality of the AI systems or agents
that we build towards these different tasks.
This is true even in other domains.
So if you look at the toolkit that we have
to create such data, we have simulators on
the one hand, and then we have real world performance profiles on the other hand. So
how do you think we should go about collecting these data sets for architecture research?
What should we be careful about, especially as we try to ground this data in the real world?
We want to ensure that if you're simulating, the quality of the data should be good, which means
that it needs to correlate in some reasonable manner
to what we might expect in a real system.
So what do we think about these different attributes
of the data and the quality of the data?
And what are mechanisms that we need
so that we can create these data sets,
curate these data sets, and then also measure the quality
of the data sets themselves?
Yeah. I'm going to split the data element
up into two fundamental pieces, which you were already alluding to, the first piece being quantity of data, because these are inherently having to be big-data-oriented kinds of problems, right? And the other one is, once you have that big data, then how do you tune it for quality. Both are actually needed. If you kind of look at what's actually happening in the AI community, you know, if you historically look at the size of the data that's been evolving for images, for instance, you start to see that originally, you know, people tried to curate these really high-quality data sets, right? And people said that's the most important thing. But then, over the years, as the models have gotten bigger, we've started creating more noisy data sets.
You start getting noise in the data set because you start pulling the human out of the loop a little bit and you start relying on self-supervised methods or just having the systems effectively just kind of mining for data.
And then you end up with a lot of errors in the labels and so forth.
Now, just because you have a bit of error does not necessarily mean that,
you know, it's actually bad. Sometimes having a little bit of error can actually help the model
not get stuck in certain things, right? And so you do need a large amount of data.
And to that point, I would say, think about the number of simulations that we would all run,
right? Just globally, just think about the number of gem5 simulations alone that you and I probably run. Forget even what's happening in the companies; just the gem5 simulations that are being run academically, and even within my lab right now, probably. Right? What do we do with all that data? We basically get the paper out. I guarantee you the students probably got it in some directory; he or she will forget about it, you know, once the paper is out, right? And then we just kind of, like, you know, at some point, just, you know, archive or erase it. We don't really use it. And I think that's a wasted opportunity, especially for a domain that is quite specialized, right? There's a lot of domain knowledge you need to have to be able to kind of, you know, understand how to work with things. Just, I'll tell you a little bit about a project
that we're actually doing centered around data
in Architecture 2.0 later on.
But one of the very basic questions just yesterday,
one of my students, Shvetank, asked the models was,
is data movement generally more costly than compute?
Now, you can by and large ask anyone starting a PhD in architecture; this is probably one of the first things we try to teach them. And guess what Mistral, Claude, and ChatGPT come back with, right? They say no, data movement is actually not costly. Now, of course, once you start asking them, they'll rationalize this and kind of come up with, you know, all kinds of excuses, like, it really depends on what data you're talking about, oh, it depends on what compute. Yeah. But if you ask the vanilla models, you know, it kind of comes down to this very basic question, right? And these models are not able to get it.
But a lot of that domain knowledge is kind of inherently captured in a lot of the data that we're throwing away today, and I think that's a lost opportunity for us. And so this is where I think a very simple thing for the quantity side of the world would be: what if we could just create a plug-in into gem5, or any of the many other excellent simulators that are out there, right? Like GPGPU-Sim, which does all kinds of different things, even for accelerators, even Timeloop and all these, you know, modeling-based systems. What if we could inherently create, you know, a platform-agnostic back end, and we're able to pull this data into some cloud service provider where, you know, it's open for the architecture community to be able to tap into? That gives us a wealth of data on which we can start training at least open-source models in order to do basic tasks like prediction and optimization, right, and just be able to do it really well.
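A minimal sketch of what such a stats-export hook might look like, assuming gem5's usual stats.txt output format; the metadata schema and the upload endpoint are entirely hypothetical, and no such plug-in is implied to exist today.

```python
import json
import urllib.request

def parse_stats(path):
    """Parse a finished gem5 stats.txt into a flat {name: value} dict."""
    stats = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # gem5 stats lines look like: "<name> <value> # <description>"
            if len(parts) >= 2 and not line.startswith("-"):
                try:
                    stats[parts[0]] = float(parts[1])
                except ValueError:
                    pass  # skip non-numeric entries
    return stats

record = {
    "metadata": {                      # experiment conditions: the "label"
        "simulator": "gem5",           # that makes the raw numbers reusable
        "config": {"cpu": "O3CPU", "l2_kb": 512},
        "workload": "some-benchmark",
    },
    "stats": parse_stats("m5out/stats.txt"),
}

req = urllib.request.Request(
    "https://example.org/arch-data/upload",   # hypothetical community endpoint
    data=json.dumps(record).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)  # or simply archive the JSON locally instead
```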
So that's purely on the quantity side. Then it rolls over to the quality side, of course.
Now, as you kind of make the data sets noisy, and, you know, as you start implicitly injecting errors, then you've got to worry about the quality of the data. I think for regulating the quality of data,
one of the key things you will ultimately need
is some human in the loop element, right?
And I think as a community,
we have to start thinking about training
our next generation of PhD students
and engineers and so forth
to help us kind of get that higher quality
so that we can end up with something
that's like a reasonable, you know,
a reasonable data set with low error, right? And this means labeling the data sets and so forth, right? And I think that gets into, you know, what kind of data set it is. If it's a ginormous gem5 simulation log, yeah, that's very, very hard to, you know, really sort of streamline, right? Because what are you going to label on that thing? It's very hard to label. The best you can do is have some sort of metadata about the conditions of the experiment and so forth, right?
However, we can still curate high-quality data sets about basic information and questions, such as, you know, is data movement more costly than compute? That's a QA pair, right? If you kind of go back and you look at NLP models, by and large, you know, they've worked on these kinds of QA
data sets that we have, right? Question answering pair data sets, which, you know, test the model's
ability to understand the domain. So if we can create those kinds of data sets, which I think
students would be able to help, and so will the community, then I think we can start sort of,
you know, bootstrapping these quality-oriented data sets and start creating benchmarks,
which is a whole other area outside of the data itself. Yeah, I have some follow up questions here. So I guess
what I'm kind of picturing based on what you're saying, because I was one of the early developers of gem5 way back in the day. And one of the things that we had tried to do was, you know,
have a very rigorous set of statistics about all of the major structures and they
just get all spit out at the end. And then there would be maybe a little bit of labeling
about what happened in this particular simulation so that you could distinguish what happened
between this run versus that run or what have you. And so I guess in your mind, are you
imagining something like this where you essentially spit out a bunch
of data saying like, okay, if the ROB size is eight, which is some ridiculous number, and the
L2 cache size is four gigabytes, which is also a ridiculous number, then, you know, then it can
essentially glean out some correlative stuff where when you have maybe a more reasonable number for both of those or you isolate like what is cause and what is effect or what is at least correlated when things are happening.
Is that what you're talking about? Or do you necessarily need some sort of label to say, like, hey, I believe this run is testing new structure A? Because run to run, some of them may have a new structure
that wasn't there before at all
that introduces new relationships,
or some of them might have bugs,
which we would have tons of bugs where like,
oh, these results don't make any sense.
Like somebody had to look at it and say like,
this doesn't make any sense.
I guess what I'm imagining here
with respect to the generation piece,
you know, there is a structured generation of like, what are we spitting out? You know,
what are the, what are the pieces of data? What are the structures? And then there's the,
the sort of description, or I guess, label of it. Like if you invent a new widget,
how does that then get incorporated? Yeah. I mean, at the end of the day, we're always having these unit tests in some capacity, right? So in the case where we end up with something new, I mean, we will still have to continue doing what we are doing today, right? We're just kind of writing these custom unit tests and making sure that we're actually right about them. But on a macro scale, what I would say is that if you're looking at it from the holistic system, then I would very much do whatever we are already doing
in many of the big AI systems, right? So, you know, when you're hitting, let's say, something like DALL-E, for instance, and you're generating an image: given a prompt, you have to generate an image. Well, the prompt doesn't directly go straight to the model, right? It doesn't go straight into the back end. You've got a whole bunch of infrastructure that's actually sitting in the front end that's guarding the prompt and making sure that the prompt is intentionally good and it's well-meaning and so forth. That does not mean that the back end is completely, you know, going to be safe, right? Because you can generate pretty harmful images today with just state-of-the-art models, right? So you still have to, you know, have some checks and, you know, guardrails in place, which is what, you know, the front-end classifiers are typically designed to do.
So in a very similar way, you would train simple classifiers, I would think, that are able to spot anomalies that are happening inside the system. And so you could effectively use that to kind of
have some mechanism of a feedback signal that comes back to the architects who are designing
the system.
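Purely as an illustration of that guardrail idea carried over to system design, here is a minimal sketch of an anomaly check trained on previously vetted results; the features and numbers are synthetic, and in practice the flagged cases would go back to a human architect for review.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features summarizing past, human-vetted simulation results
# (e.g. IPC, L2 miss rate, average memory latency). Synthetic numbers only.
rng = np.random.default_rng(1)
known_good = np.column_stack([
    rng.normal(1.8, 0.3, 1000),    # IPC
    rng.normal(0.12, 0.04, 1000),  # L2 miss rate
    rng.normal(90, 15, 1000),      # avg. memory latency (cycles)
])

# Train a simple anomaly detector on the vetted corpus; anything a generator
# (or a buggy run) produces that looks out of distribution gets flagged.
guard = IsolationForest(random_state=0).fit(known_good)

candidate_runs = np.array([
    [1.9, 0.10, 85],   # plausible result
    [7.5, 0.01, 3],    # suspiciously good: likely a modeling bug
])
print(guard.predict(candidate_runs))  # 1 = looks normal, -1 = flag for review
```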
So I'd still go back again to the human in the loop being the most critical element of all this. In addition to the technical challenges that you
discussed, it looks like there is a big part of the community contributions that's required in
order to bootstrap this entire ecosystem. You've been involved in multiple both open source and
community efforts over the years, including MLPerf as a founding member.
So can you talk about the importance
of such open source contributions
along with industry plus academic collaborations
in advancing this particular field?
And also, how do you think about bootstrapping
this particular ecosystem for Architecture 2.0?
Yeah, that's great.
I'm glad you're asking about that.
Yes, it is true that I'm a super big proponent of doing
community-driven efforts.
And in all honesty, the kudos and the credit
really goes to things that I've learned
when I was a student looking back
on what the community was doing.
I mean, the community built the gem5 simulator.
The community also helped contribute to GPGPU-Sim.
And as a community, every once in a while,
we kind of reach
a point where we really need to come together to create something that will unlock the next
generation of ideas and research that can come out, right. And so that's where I really draw a
lot of the inspiration from is kind of looking at how we have done these big mega projects that are
now like you know sort of the backbone, right. So from that sense like when you talk about
Architecture 2.0 or kind of building this data set,
one of the things that we're actually gonna, you know, talk about at one of the workshops is, we basically, you know, have created a massive corpus of the last 50 years of architecture research.
And we haven't talked about this,
we will be talking about it, but it's coming.
And what we have actually done is we've started creating
a data pipeline where, you know, we have data annotators,
basically undergraduate and graduate students
effectively labeling certain types of questions
because they need domain expertise,
such as the one that I was talking about,
which is data movement related thing.
And we've started creating that data set.
And we originally started with the ISCA 50 retrospectives
where we collected all the retrospectives, which is a very small sample that was put together by José Martínez and Lizy John last year.
And we did a QA data set around that.
And we took that data set and we fine-tuned some of the open-source models, which actually perform badly on architecture.
We immediately saw a spike in the ability, which was a clear signal that even with a little bit of a curated corpus you can actually improve their domain knowledge about architecture. And now we've effectively expanded that to be the last 50 years' worth of, you know, architecture research. Architecture, again, as I said, is kind of encompassing both traditional architecture as well as EDA kinds of flows, so papers in that sort of corpus. And we've created a pipeline which allows us to kind of,
you know, start labeling as a community.
And this is what we're actually hoping to,
you know, announce pretty soon at one of the ISCA workshops
and then write a subsequent blog to engage the community.
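To show the shape of that pipeline, here is a small, purely illustrative sketch of turning a few domain QA pairs into a JSONL file and running a tiny supervised fine-tuning loop over them; the model name, file name, QA content, and hyperparameters are placeholders rather than the project's actual setup.

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A couple of hand-written architecture QA pairs, stored as JSONL.
qa_pairs = [
    {"question": "Is data movement generally more costly than compute?",
     "answer": "Yes. Moving data through the memory hierarchy typically costs "
               "far more time and energy than the arithmetic performed on it."},
    {"question": "Why do we use multi-level caches?",
     "answer": "To bridge the latency and bandwidth gap between fast cores "
               "and slow main memory."},
]
with open("arch_qa.jsonl", "w") as f:
    for ex in qa_pairs:
        f.write(json.dumps(ex) + "\n")

model_name = "gpt2"  # stand-in for whichever open model is being adapted
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # toy number of epochs
    for ex in qa_pairs:
        text = f"Q: {ex['question']}\nA: {ex['answer']}{tok.eos_token}"
        batch = tok(text, return_tensors="pt")
        # Standard next-token loss over the question-plus-answer text.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
```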
And I think this is where, for instance, you know,
for people who do believe, okay,
AI/ML can be a useful tool in our toolkit, it'll be a wonderful opportunity to contribute and help shape it.
In my vision, it's like first we start labeling the datasets.
We start labeling, then we start fine-tuning models.
And I would love for us as a community to have a collection of open-source
models that are actually, you know, domain-specific to us.
And then you can start, you can start trying to improve their knowledge across
various ways. And this is where benchmarks become critical. Because
benchmarks in my head are a way to kind of bring the community together because
everybody has to agree on what are the interesting tasks that we actually want to solve
first. And then you can start creating a roadmap
that allows not only apples-to-apples comparisons with benchmarks, but benchmarks are also sort of the north star. Like, you know, for instance, when we wanted to go to the moon, we weren't necessarily talking about, oh, this is the specific navigation system I have in Apollo, or this is the thruster that I have. No, you actually focus on getting to the moon, and then you work backwards and say, okay, what are all the elements that I need to have in order to be able to get to the moon?
Then you say, okay, well, I need to have a certain amount of, you know, the thrust capability. I need
to have a certain type of navigation capability, right? And you kind of identify all the pieces
and you kind of build a matrix that says, okay, if I can check all these, it tells me that I might
be able to get to the moon. And so in my head, that requires a community effort
because you have to build that complex matrix for one, right?
Which is identifying what are all the tasks
that we would need to be able to do incrementally one by one
so that we can say, okay, someday perhaps
we can ask a large language model to say,
okay, act like an architect, you know,
give me a RISC-V core that's got this sort of, you know,
ISA support and, you know, it's able to really optimize these particular workloads.
I'm not saying that one LLM is magically going to do it all. I'm saying it might actually invoke
other agents or existing traditional non-AI ML tools to actually get the job done, right?
So there's the two parts. One is kind of like getting all the data that needs to actually be put in place,
which is kind of what, you know, I think that's a massive community effort.
There's no way you can do this just by dumping some data set outside and say,
oh, OK, everybody adopt this.
I don't think that's going to work.
We all need to chip in, and I'm just picking gem5 here as opposed to, say, Chipyard.
Much like gem5 is a community project, we all chip in.
When my students find a bug, I say, don't complain about it.
Just go fix it.
It's incredible that you actually have a massive simulator that someone wrote for you.
So just go fix the bug instead of complaining about it.
So if we all kind of contribute in that way, I truly believe that we'll be able to kind of build a new set of tools that will help us with hardware design.
And I think that's where the community aspect kind of comes in with respect to Architecture 2.0. Yeah, I mean, I think that sounds very
interesting. I mean, the way that our community always has worked, it seems, is that there's some
sort of thing happening. There's some turn, there's some change, and then there's a lot of discussion, and then eventually there's a
congealing around some sort of pillar of how we're going to do things as a community.
You know, so we eventually congeal around a benchmark suite, or we eventually congeal around
a simulator. You know, there's a few, but there's not, everybody doesn't come and roll their own, right? Because we sort of realize that collectively, it's better if we all collaborate on a few.
So I think what you're saying, it makes sense.
It seems like a tall endeavor too, but it always is in the early stages.
So for some of this, I mean, I'm imagining this big world of possibility, right? Where let's say one of the parameters that's on the table is, say, the ISA. We've all congealed around a particular set of instructions, more or less, right? We might quibble about whether you need fmul or not, or whatever, depending on the situation, but more or less they look very similar, you know, barring RISC versus CISC. And I guess what I wonder is, if we wanted to, say, explore something different, it seems like what would be necessary is for someone to, say, come up with a new instruction, and then come up with a new compiler that uses that instruction adequately, to come up with the instruction stream that can then be fed into a number of sample machines to be able to produce
the data that would be rich enough for an AI to be able to reason about it, right? Because, you know, if you just say, um, add one new instruction and then compile one program and put it through one run of gem5, there's no way for an AI to be able to
reason about what it was that might change if you put that in the corpus of everything.
So it feels, I guess I'm just thinking through how this would work, and it feels like then
the kind of work that we do now, which is like, Hey, what would happen if
we had this new instruction, then you'd have to do all this work. And you have a sort of a hypothesis
in mind and you set up your, uh, your experiments to be able to figure it out now, sort of
the hypothesis maybe feels even more vague or more like, what would happen? Like,
would this instruction be a good idea? And you do all this work and then let the AI say,
yes, it would be a good idea under these circumstances for these types of instructions,
but you still have to run all the simulations. So I guess, I guess I'm just thinking about that process where, as a student, if you have a hypothesis, you sort of have to come up with your experiment set.
And now what you're trying to do is come up with an experiment set that is wide and varied enough to produce enough data so that an AI can draw a conclusion.
Is that sort of how you picture it? Yeah, I think like this, there's an
aspect of, yes, we might have to build all the tools and so forth to kind of get to that, you
know, evaluating that hypothesis. Now, I think like, I'd like to think, as you're kind of mentioning
that I was translating this into a visual in my head where I'm like, okay, if I want to, I'm
sitting in a room and I'm trying to think, okay, what's the next set of optimizations to perform, right?
In my head, I would assume that given all the simulation data that's kind of sitting in,
for instance, right, from, you know, whatever simulators, you know, pick your favorite company
and all the tools that are internal, I would assume that I should be able to ask, like,
what are the common bottlenecks that I'm actually seeing and what aspects should I really focus on optimizing?
And as an architect in my head, the architect of maybe like 2030 or 2040
would be like kind of interacting with an AI agent
that's kind of asking probing questions.
The AI is really kind of just looking through the mines of data
and making connections that you and I normally would not make.
And I don't think it's going to necessarily, we don't necessarily have to push it to the point
where, okay, just give me the chip, but it's more of an interactive feedback loop, right?
That allows your architects to very intelligently brainstorm things because often architects are
kind of doing this today anyway, right? Chief architects kind of sit around and like talking
to all the, you know, IP modules that are getting integrated into the SoC.
And I would think that, you think that that feedback loop is very slow
today.
And I would assume that in the future,
the feedback loop is going to be extremely fast,
because the AI agent is effectively synthesizing
all this data and comes prepped for the meeting,
much like any other person.
And you can just ask the AI agent,
what would it likely be if I had, you know, this sort of,
you know, configuration, right? Which would be the notion of taking the prediction data,
looking at optimizations, right, that have been performed in the past, and then potentially kind
of making, you know, some sort of generative sort of an idea of like, okay, this is how I would
retweak your design. And so I agree with you, it's inherently nebulous, and I don't have all
the answers around this. But my hope is not so much that you and I honestly figure it out,
but my hope is that we get the next generation to fire up, because they're likely going to think
about these things in a very unorthodox manner that you and I probably don't think about,
because we're very much stuck in a certain box, given the rules and things that we have,
we ourselves broke, you know, in order to be who we are.
That's right.
They're going to be AI native, unlike us.
Right.
Yeah, that's a fascinating discussion.
And also you've painted an exciting vision for the possibilities in the future.
So I'm hoping a bunch of our listeners are geared up towards this particular challenge.
Switching gears a little bit to another thrust
in your research, you've worked on enabling ML
in resource-constrained devices, like edge devices,
mobile devices, and so on.
I think you've christened it TinyML.
Can you tell us a little bit about the unique challenges
in designing both efficient algorithms and hardware
for TinyML applications?
And how would you sort of compare and contrast it against, you know,
large-scale machine learning deployments?
Why is it exciting?
What is different about it?
What are some unique challenges in that particular space?
Yeah, so TinyML is effectively, you know, really talking about embedded machine learning.
And for folks who like typically when we talk about on-device machine learning,
you know, most people traditionally in the industry will say that, okay, that's kind of more talking about mobile devices, right?
Our smartphones are effectively the on-device element.
TinyML is really not about that.
It's really about pushing ML onto, you know, hundreds of kilobytes of memory capacity, right?
And so you're really talking about, you know, milliwatt-level power consumption, always-on ML, specifically in IoT kinds of devices, or even smartphones. It's always on; you know, some element is constantly listening because it has to, in order to detect a keyword. Like when you say "Hey Siri", it's not like the whole system wakes up; a submodule wakes up, right? So certain aspects always have to be on, and the question is, can I fit neural networks into a few hundred kilobytes or, you know, one or two megabytes of flash storage that I actually have?
And so that's what TinyML really is about.
And it's a vastly different ecosystem from the rest of the big ML stuff that's happening.
And I would say that it's quite fascinating because it's the perfect melding of hardware, software, and ML.
It's that blending of all three that I think is truly what TinyML sort of,
you know, is all about.
And that kind of opens up the space
to many interesting challenges.
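For a rough sense of the scale involved, here is an illustrative sketch of a keyword-spotting-sized model, with made-up layer sizes and keyword classes, using standard TensorFlow Lite post-training quantization to get it toward microcontroller-sized flash budgets.

```python
import tensorflow as tf

# A tiny keyword-spotting style model over MFCC features (49 frames x 10
# coefficients). Architecture and sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49, 10, 1)),
    tf.keras.layers.Conv2D(8, (3, 3), strides=2, activation="relu"),
    tf.keras.layers.DepthwiseConv2D((3, 3), activation="relu"),
    tf.keras.layers.Conv2D(16, (1, 1), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. yes/no/silence/unknown
])
model.summary()  # on the order of a few thousand parameters

# Post-training quantization shrinks it further so it can sit in a small
# slice of microcontroller flash, TFLite-Micro style.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print(f"quantized model size: {len(tflite_model)} bytes")
```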
I mean, this whole ecosystem kind of started
about five years ago,
maybe five to six years ago, I would say.
And, you know, back then it was just an idea
and there was the tinyML Foundation that got formed around this, where we were all kind of just thinking about what would it be if we could enable speech recognition on a coin-cell-battery-operated device. I mean, that's a pretty damn far shot; that's quite the moonshot if you think about it, and this is still five, six years ago, right? And so that's where, and this was also during my sabbatical at Google, where there was a skunkworks project on, hey, can we actually do this? Can we adapt tools like TensorFlow to be able to run on microcontrollers, much the same way they were adapted from running on big servers and workstations onto mobile devices? Things needed to be stripped down and so forth. And of course, you know, today, if you kind of look at it, there's an entire world of, you know, tiny models all over the place, lots of, you know, optimizations that are specialized for these embedded devices. And this is a space that I think is fascinating both for research and education. I'd say in research it's especially fascinating because, talk about co-design, this is one place people could really use co-design, because they're highly bespoke applications. You know, typically when we talk about co-design, I often kind of, you know, skirt it a bit, because I'm kind of worried that co-design looks, you know, complicated on paper; it's lots of innovation, technical innovation, but from a practical standpoint I'm always like, how is anyone going to take any of this and make it make sense in a company?
I'm not saying that they have to do it today, but even like seven years from now, where you're
asking them to rip apart the algorithms, you're asking them to rip apart the runtime, you're
asking them to rip apart the architecture slash microarchitecture. I'm like, it's an intellectual
exercise. That is awesome. But there's an aspect of it which just seems completely imbalanced, right? Because often when you're building systems at large scale, you need them to kind of be general-purpose to an extent, okay? And, you know, when you're talking about TinyML, it's very different, because it's highly bespoke. Your Ring doorbell does nothing but pretty much just image classification. It does not have to listen to you. Like, there's lots of very simple things that it can do. Alexa does not need to see you, necessarily. It's mostly just trying to listen
to the sounds. Now, in the future, I think speech is going to become so common. I think this notion
of touch, I bet my daughter is going to be like, what? Why do you use or touch the screens? That's
so yucky. That's probably what she's going to say when she's, you know,
a couple of years older because she's just going to probably be talking to
every single thing.
It's going to be like, hey, widget, toast my bread for two minutes.
Bake for 325, right?
And that's a limited vocabulary space where you don't necessarily need big
models.
You can really get away with highly bespoke models that are highly
specialized.
And, of course, if you can do that, that's pretty incredible.
I mean, just think about the world where today you think about AI,
you still physically have to interact with it in the real world, right?
Like you kind of, you know, you're interacting with some sort of entity.
That's pretty clumsy and clunky if you kind of think about it,
because you are working around, you know, the machine and reality.
We are having to adapt to what the machine is. If it's truly embedded, you don't notice it. And that's the beauty of Mark Weiser's vision about, like, ubiquitous computing way back at Xerox PARC. He had this idea that you're going
to have intelligence spread across everywhere. And I think that's where we're getting to,
which is ultra low power consumption, specialized intelligence for specific things
that the devices need, and then being able to seamlessly interact with us.
And that's essentially why I'm super excited about the TinyML ecosystem.
Right.
I think we have a long way to go to get to that ambient computing where everything just
disappears into the background.
Truly magical technology is something that you don't even notice exists.
You briefly touched upon how TinyML also
has a space in education, because for a lot
of the other models, especially the largest models,
you need industrial scale machinery
to be able to interact and iterate with it
at multiple scales.
Can you expand a little bit on how
you think this is going to be useful in education?
Very broadly, I've thought a lot about how do you teach students about computer architecture,
especially in the current times, given that the space is evolving so rapidly.
You tell our listeners, how do you think about teaching computer architecture?
How do you think ecosystems like TinyML or the associated tooling and infrastructure
would be helpful and beneficial
in teaching students about the different concepts
in this particular space?
Yeah, I'm going to talk a little bit more broadly
than architecture, because I think architecture's scope is
also expanding, especially as we look in this domain of ML.
I think an architect who wants to play in space,
for better or worse, needs to understand the ML ecosystem,
the ML systems ecosystem.
So I'm definitely very passionate about education
in this space
because I think it, like, breeds new life into traditional embedded systems that have been taught forever in universities worldwide, right? I still remember the first time, when I was a professor at UT Austin, you know, I went into the classroom. I was about to start teaching embedded systems, and before leaving my home, I remember I grabbed the garage door opener, because I was like, oh, this is a very basic embedded system.
It's amazing. It's like everywhere. It's like, you know, I'm going to use this to inspire students.
And I still remember, you know, holding it up and I'm like asking the kids and they were all like,
oh, it's a garage door opener. And I was like, this is an amazing piece of technology because
it's got all this stuff. And as I was getting excited, I saw their faces going really dull. And I was very perplexed by that as to what was going on.
And one of the kids at the front said, I really don't want to do engineering so I can build garage door openers for the rest of my life.
I was like, damn.
Kids got a point.
Two weeks later, I got my Google Glass and I took that in.
And suddenly everybody wanted to kind of work on like embedded systems and stuff.
They're like, oh yeah, this is super cool.
And I want to do this stuff.
I think like for education, I realized that it's really about kind of making it relevant
with the times where we are.
And so like obviously today when we look at AI systems and so forth, there's a lot of
excitement.
We've built lots of incredible hardware for this, but it's often very inaccessible, right?
Think about how often we can actually, you know, come up with a design that we can actually take, you know, all the way through the lifecycle of the chip, right, from concept to tape-out. It doesn't quite happen.
But TinyML kind of opens it up to a very exciting space, right?
For one, there's a lot of open source ecosystem tools that are kind of coming up.
And because the designs are highly bespoke, you can actually do a lot of specialization.
And because they're also small designs,
you can actually completely go from your concept
to kind of taping it out, or whether you're doing it on an FPGA.
It's so much more practical.
And I think the timing is sort of very interesting for education
because you have these open source tools
that are just mature enough to be able to pull this off, and then you've got an interesting educational area, which is around AI. Everybody wants to do AI, but then often it's, like, some big model, some big data set. You know, I can go around asking how many people have actually built a data set, and I can guarantee you, whenever I'm asking this question, probably, you know, one or two people will raise their hands out of 30 or 50 people, right? Very few people touch that.
But when it comes to this sort of embedded ecosystem space, the data set is highly bespoke.
So you can actually get them to go all the way from understanding how you collect the data, how you pre-process the data.
Imagine if I have to just kind of wake up a machine, you know, when I say, "OK, OK Vijay", that's what I want. Well, you can go easily collect all of that data, and you can build the pipeline out and actually train the model, do your optimizations, and actually deploy it and get it to close the loop, right? And these widgets today are literally five bucks a pop. I mean, the ones that I have, like, you know, folks can't see this, but, you know, we actually kind of build these things, right? And they're really, really cheap. And of course, you know, Arduino, Seeed, all these folks have started putting out, you know, these MCUs. And these MCUs, these
microcontrollers are completely capable of running, you know, these models. So students have an
incredible opportunity to deeply understand whichever layer of the stack that they're quite
interested in, right? If you kind of look at Song Han's papers, he has also been doing some pretty amazing work in TinyML, for instance. You know, they've been able to build an entire runtime engine that sort of, you know, optimizes it. How often would you ever go out and build a big TensorFlow-like engine to show some incredible capability that you can unlock? You can't do that on a big system. However, in their case, they were able to kind of build a custom runtime, right? So that kind of opens it up, really. And of course, there's lots of hardware solutions, you know, um, but that's like preaching to the choir on, like, you know, what it means to build hardware, so I'm not going to dabble into that. But that's a really exciting space, right? The one thing, though, that's certainly missing in this ecosystem is that there aren't enough educational resources around this, right? This is one of the reasons, I think folks
know about this, but I started putting together my own class notes, and in fact, Suvinay actually
knows about this, where I started writing a machine learning systems book that talks about
what it means. Originally, it was a tiny ML specific book, but as I started writing my notes
in that, I started realizing it doesn't matter if it's tiny ML or big ML. Fundamentals are fundamentals, right? When you do operating systems, yes,
they're distributed operating systems and all kinds of crazy stuff when you go to RTOSs and so
forth, but you still have to learn your Operating Systems 101. So when it comes to ML systems
and architecture, it doesn't matter if you're building a big ML or a small ML. The fundamentals
are still the same around all the nuances you need to understand
about what happens in an ML pipeline from the point when data comes in to the point the data goes out, right? And so I ended up creating this, you know, ML systems book, and, you know, it's an open-source project where people have actually been contributing back. So this goes back to my whole passion about community involvement and so forth. In fact, just this morning I was actually working on getting the release out, because I've been spending an insane amount of time on it. I feel like until I get the release out, I can't actually rest, because there's always something more to do when it comes to these educational things.
So, a question about the open-source notes that you're talking about. That sounds really, really interesting. I'm just wondering, is the model sort of a Wikipedia model, where everybody can just put their stuff in, or is it a Linux model, where you need a pull request and Linus needs to say okay? It's definitely a pull request model.
So yeah, someone has to be, you know, involved in curating it. Of course, there are a couple of people that are, you know, and I certainly have been talking
to multiple faculty members.
And as much as I kind of do
the initial drafts and my students,
you know, my research lab is very active.
Every time I teach it,
my students kind of contribute,
oh, these are interesting seminal references
because the field is moving very fast, right?
So the question always is like,
how do you sort of keep up with it?
And that's the whole reason for making it
an open source sort of a project
where people can issue pull requests
and kind of keep it updated. That said, though, I was still struggling with it, and that's when I kind of reached out to Dave Patterson to ask for a bit of advice. You know, when they wrote the book, you know, the computer organization book, things were evolving rapidly back then too, right? Like, today we kind of look at it as the holy bible, but when they were writing it, there were heated debates going on about what's the right thing to do, what's not the right thing to do, and so forth. And I think the advice that he gave me is kind of what I follow, which is: if a company has started putting it into practice, that could be a nice litmus test for whether a concept should be in an educational resource, because it means that there's community wisdom that, yes, this makes sense. The nuances, of course, will be different, but that's sort of a way of proofing it against the rapid change that's actually happening in the ecosystem. And that's actually worked out quite well.
I think that's a wonderful initiative, and also a great resource not just for students but also for practitioners in this field. Because the space is evolving pretty rapidly, even once you get into industry or are working in a particular area, it's hard to keep track of all the different developments, number one; but also, as you mentioned, having someone curate it and say, these are the essential ideas that you actually need to pay attention to, I think that signal is quite useful. And the process of doing this and curating it is incredibly valuable to the entire field. So I highly encourage the listeners to also go and check out the book.
Maybe this is a good time to wind the clocks back
a little bit.
You're clearly very passionate about teaching.
Maybe you can tell our audience, how
did you get interested in computer architecture?
What was your journey like on the way to Harvard,
where you are currently? Yeah, I'd say that it sounds a bit tacky, but in all honesty,
I think I got interested in computer architecture because when I was reading Patterson and Hennessy's book, I remember, and I kid you not, this sounds really weird, but I read it like it was a storybook, like it was a novel, because it was so accessible. I mean, when I first picked it up, I was like, who's going to read this massive book? I still remember when they gave it to me, it was this big fat book. I'd gone and picked it up at the National University of Singapore, because that's where I started my undergrad, and it came with a CD, and I was like, what, who knows all this stuff, am I going to have to memorize all of it? So anyway, I still remember just sitting down and reading through it, and I found it fascinating that it was so accessible to learn something that seemed so complicated, something you would normally think, oh, I have to go to class, and that's really where you pick it up. And to this day, that left an impression on me. When you have a good educational resource, you can learn; you might not be able to master it, certainly you need mentors to help you master it, but a good educational resource can really spur you on. And it's more than just the material, I think it's also the community aspect. Some folks in our community are very approachable and accessible, and you look at them as mentors and think, oh, maybe someday I can be like that. For me, I honestly feel that that has a bigger impact on you than the actual technical material.
And that's honestly how I ended up becoming a professor.
I never thought I was going to be a professor, to be honest.
I was so inspired by my own mentors.
I was like, wow, these people are so incredibly smart,
yet they're so humble and so nice and so forth.
And they were so invested in me, even though I didn't even know the ABCs of the stuff, right? And that, I think, for me over time has translated into: we're all technical people, we're all smart people and so forth, but at the end of the day, we're humans first. And it's all about relationships; just being nice and taking care of one another, I think, is far more important than all the nitty gritty. One of the best pieces of advice that I was given, which I
take to heart, came from one of my colleagues, Gustavo, at UT Austin back when I was there: if you can't have a cup of coffee with your colleague and just hang out, forget writing a $20 million proposal or whatever it is, it will never work. If you can't just hang out with a person, like the way I'm hanging out with Suvini and Lisa here, there's no way you're going to have fun doing whatever it is, right?
And so I really feel like as much as we wanna invest
in technical things and always debate things
very technically, I think it's very important to remember
that we're all just trying to learn from one another
as researchers, and we're always a learner first
in our community. So I think that's really what inspired me. I know it's not the classic "I did this, then that, then that" story; for me, it's really just the incredible mentors and people that I've seen.
That was awesome. I have never heard anybody say that they read Patterson and Hennessy like a novel. That is incredible. I think one of the great things about doing this podcast, I'm sure you agree, Suvini, is that when we ask this question, we usually lead with what gets you up in the morning and we usually end with, you know, how did you become a computer architect? And the ways that people became computer architects are very varied, they run the gamut, but yours is quite singular.
I'm quite amazed. I mean, I enjoyed the book as well, and I remember reading it and thinking, oh wow, this is the first textbook I've ever read where, for several chapters, I didn't have to really reread anything. You read it and it just gets in there, basically at line speed. I was like, wow. So I had similar feelings, although I don't think I blitzed through it the way you did; I didn't consume it like candy.
But it was fascinating, because we actually run a Rising Stars program at MLCommons to recognize outstanding junior students in ML and systems, and Dave graciously agreed to talk to the students. When I was introducing him, I mentioned this because it left such a positive mark. And I could see he was very happy to hear that, because he said, a lot of people don't realize how much time and effort we put into the writing, trying to make sure it's actually accessible, that it's really something people can consume. And I think it shows the amount of effort they must have put into just making it available to us, right?
Sure, yeah, for sure. And I think that also touches on how important it is. You know, our guests time and time again have talked about how important it is to have good relationships and collaborations and to communicate effectively. That's always top of everybody's mind, and it's how you do as well as our guest list has done, yourself included. And so I hope that maybe this is one of those things where repetition will get into our audience's brains: you want to do the technical stuff, but at the same time, you really have to learn how to work with people, be able to communicate effectively, maintain relationships, and that's how you get bigger things done. Because we are long past the age of being able to do anything of rich value to everybody on your own, right?
I think it's a new generation. Cliff Young was actually recently visiting Harvard, and he made this really astute observation in a casual conversation. We were saying, you know, when we were building the MLPerf benchmarks, Cliff and Dave were among the original pillars there, and it worked because we were able to bring the whole community together and work collectively, with a lot of grudging consensus. And he said, maybe the reason we nowadays have to do these bigger community kinds of things is that the cohort of people who are actually doing things now has been deeply influenced by social media. If you think about the generational changes that we've come through as individuals, right, the current era of people are people who are deeply influenced by social media, which is a community kind of thing. Everything is shared, everything is discussed, everything is debated, and we do it collectively, and we agree to disagree and so forth. And I think that was a very interesting observation he made: oh, we live in a different world, and people think differently today. So maybe as we move forward, we should work on projects more holistically and more collectively, rather than the way we used to do things back in the day.
For one, systems are much more complicated today, right? On top of that, you also need bigger teams. And so I thought that was a very interesting observation he made about how times have changed and how our culture has evolved, and how that possibly is changing the way we actually work together too.
Yeah, I look forward to the day where we can work together only through Instagram DMs.
No, I don't. I really don't.
Yeah. Well, Vijay, I think this was a really, really interesting conversation. We covered a lot of different topics, from Architecture 2.0 to MLPerf and MLCommons and teaching and TinyML. I feel very stimulated right now. So thanks so much for joining us today.
Yeah, thank you so much for having me. Super fun.
Yeah, thank you so much. It was a fascinating conversation. This is the Computer Architecture Podcast; till next time, it's goodbye from us.