Computer Architecture Podcast - Ep 17: Architecture 2.0 and AI for Computer Systems Design with Dr. Vijay Janapa Reddi, Harvard University
Episode Date: September 3, 2024. Dr. Vijay Janapa Reddi is an Associate Professor at Harvard University, and Vice President and Co-founder of MLCommons. He has made substantial contributions to mobile and edge computing systems, and played a key role in developing the MLPerf benchmarks. Vijay has authored the machine learning systems book mlsysbook.ai, as part of his twin passions of education and outreach. He received the IEEE TCCA Young Computer Architect Award in 2016, has been inducted into the MICRO and HPCA Halls of Fame, and is a recipient of multiple best paper awards.
Transcript
Hi, and welcome to the Computer Architecture Podcast,
a show that brings you closer to cutting-edge work in computer architecture
and the remarkable people behind it.
We are your hosts. I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Our guest on this episode was Vijay Janapa Reddi,
an associate professor at Harvard University
and vice president and co-founder of ML Commons.
He has made substantial contributions to mobile and edge computing systems
and played a key role in developing the MLPerf benchmarks.
Vijay has authored the machine learning systems book,
mlsysbook.ai, as part of his twin passions of education and outreach.
His work has also earned him numerous accolades,
including the IEEE TCCA Young Computer Architect
Award in 2016, induction into the MICRO and HPCA Halls of Fame, and multiple Best Paper
Awards.
On this episode, Vijay discusses Architecture 2.0, a new era of using AI and ML for computer
systems design, exploring the opportunities, challenges, and educational shifts it necessitates. He also delves into his work on TinyML, enabling machine learning
on resource-constrained devices and its potential to transform our technological interactions.
A quick disclaimer that all views shared on this show are the opinions of individuals and do not reflect the views of the organizations they work for.
Vijay, welcome to the podcast. We're so excited to have you here.
Thank you for having me. It's a pleasure being here.
As longtime listeners of the podcast usually know, our first question is often, in broad strokes, what's getting you up in the morning these days?
It is without doubt my four-year-old and my eight-year-old, first thing in the morning at about six o'clock, and I'm sure that's very common for a lot of people.
Six is rough. I gotta tell you, six is rough.
I got seven a.m.-ers, so I got lucky.
Well, we get some midnight wake-up calls, so yeah.
So after they get you up, then what is your day looking like these days?
Most of the time, it's kind of thinking about what's the next big thing that's actually
happening in our field.
That's honestly what kind of really keeps me up, thinking about it quite a bit.
I think because right now, it's such an exciting time when there's so much change going around
and finding the path through this nebulous
cloud that we're looking into is sort of like the most fascinating thing I feel because it is both
an opportunity for a whole bunch of different ideas that we can all explore and research and
education, but at the same time it's also quite a challenge because it's easy to kind of go down, you know, a rabbit hole. So just thinking about that and trying to identify what are the interesting areas is
sort of the most exciting thing right now.
Right.
So one of the themes that you have talked about in recent times is what's called Architecture
2.0, a shift from the traditional paradigm of how we design computing systems.
So what sparked this particular vision and what are the most exciting advancements
that are driving this particular paradigm shift? Yeah, so that's a great question. So architecture
2.0, just for the listeners to kind of be clear about, architecture 2.0 is fundamentally just
thinking about how we use AI ML to help us build better systems in the future as we're starting to
build increasingly more complex systems and to do that very efficiently and to
also do that with extreme sort of, you know, consciousness about like how we reduce time to
market. Because I think as systems get more complex, you know, validating, verifying,
designing, all of that gets inherently more complicated. And so we need new tools. And
that's fundamentally what Architecture 2.0 is really about. And obviously at this given point
of time, you know, it's a super exciting time
because it's like you're in this era
of not only just AI ML,
we're truly in this era of generative AI ML, right?
Which is sort of a very exciting area.
Now, the reason that actually came to light
is honestly, just from reflecting all the work
that's been happening in the community,
the architecture community
has been doing some very interesting work.
Obviously, we build a lot of systems for ML, without doubt.
But in recent years, we've definitely seen the shift towards having AI/ML being used for the systems themselves, right?
Like actually designing the systems, right?
So this could be in the form of like, you know, whether it's Bayesian optimizations
or genetic algorithms or
reinforcement learning, or pick your favorite, you know, bells and whistles that you want to apply, right?
And it's been like reading those papers that have actually been coming out, which I think in all honesty are fantastic and phenomenal, because they're showing sort of, you know, what's possible.
But once you take a step back and you try to think about, like, how do we systematically translate or convert this into something that we can use in a practical sort of sense, you start getting into some really deep questions about what are the challenges around this space.
And from reading those papers is when I kind of realized, oh wow, we don't really have an engineering principle around how we're going to be applying this methodology, you know, in order to accelerate all the traditional stuff that we've been doing.
And I think that's fundamentally what gave birth to going around and talking to a large number of community members around what are the new challenges as we try to use AI/ML for system slash architecture design.
I think there are several themes that you've talked about
within this broad umbrella,
everything from datasets to ML algorithms
to tools and infrastructure that you need.
And as you rightly pointed out, new methodologies and a new way of thinking about designing
these systems.
Can you tell us a little more about these different themes?
What are some of the challenges and opportunities that you see under each of these?
And I think the big and most fascinating element of all of this: as much as, like, you know, whenever we talk about AI/ML, by and large, most of the community is wickedly excited about, oh, I've got this new little model that I'm actually going to put in, and it's going to do blah, blah, blah. Fundamentally, I think that's the most boring aspect of, you know, AI/ML, in all honesty. I think the most fascinating aspect of it is where it all begins, which is the inherent data that we're actually talking about, right? Because data is effectively the new code today. And when we kind of think about how we apply AI/ML methods for system design,
and you kind of go back and you look at like,
okay, what corpuses do we have?
Question is very simple.
What's the ImageNet dataset for computer architects?
That's a very simple question.
And yet we would struggle to answer that question.
Why? Because we have not systematically thought about it. We are the ones who have actually been building the systems that enable, you know, all this AI technology. Yet we ourselves have not thought about how we would be able to build data sets for, you know, architecture design. Now, architecture design is a very complex thing. It spans many layers, right? It goes all the way from talking about high-level design space exploration; in my head it also cuts right through the EDA flows. Because at the end of the day, when we talk about architecture, it's not about just the design that we come up with; it's actually, you know, how it gets taken down to implementation, right? And so everything from that top all the way down is sort of, like, you know, the critical thing that I think is fascinating. And how do we think about data there is one of the first and foremost things we've got to ask ourselves if we want to actually, you know, use this new methodology in our existing workflows.
Well, I'm very curious to hear what you mean specifically by this kind of
data exactly. Because, you know, when I think about sort of maybe architecture 1.0 of how we
would build things, the data would be, say, like some sim points or, you know, spec basically.
That's the data that we use to essentially not train the design, but sort of inform what we want
the design to be good at.
And that's the data that we test against, that's data that we design against, and that's the sort
of performance benchmark metric. In this case, it sounds like, you know, obviously the data would be
slightly different in this world because it's going into these AI ML techniques to try and
inform these designs and do them rapidly and optimize them. So, you know, obviously the word data is extremely broad.
Maybe you can dive down a little bit into what you mean
or what kind of different pillars of data you're talking about.
Yeah, pillars of data, that's a good way of kind of putting it.
I think there are three fundamental ways of kind of bucketing these things, right?
The absolute cutting edge one would be how do we get data in
a format that's actually useful for generation, right? Another pillar is how do we get data in order to do sort of, um, optimizations. And the more basic one, in all honesty, is, like, how do we get some sort of prediction data. So in my head there are these three pillars. You start with, you know, getting data sets where we can make very basic predictions about what's going to happen next.
The next thing would be, how can I get the data in order to actually design the system to be much
more, you know, optimal for some, you know, whatever heuristic you choose to. And the third
one really is the generative aspects. And I think once you kind of bucket things into these three
major pillars, then you can systematically think about, you know, what needs to be done. Now, off these three, obviously,
prediction and optimization are things that we have been doing in the past, right? Because when
we do design space exploration, we are effectively looking at, you know, various design points and
trying to, you know, figure out what's the best optimized method that we have to kind of pick from.
Even if you're looking at prediction, we have done prediction. We look at, you know, prefetchers and branch predictors; they're all looking at, you know, data coming through and making predictions, right? There is, however, a difference when we talk about prediction, optimization, and generation once we start thinking about it in the context of Architecture 2.0. It's an incremental step.
That incremental step in my head is fundamentally
about breaking the abstraction layers, right?
So traditionally what we have done
when we have thought about optimizations is by and large,
we have been kind of focused on these abstraction layers
from the system stack going from the application algorithm
all the way down to the hardware.
We've created these multiple layers of abstraction, you know, the ISA being the most classic version of it, right? We create these nice abstractions between the hardware and the software ecosystems and kind of let each independently evolve. What ends up happening then is that you sort of do these smaller optimizations. And I think as you start getting into the AI space, what's really interesting is it's kind of stepping away from this traditional
paradigm of instruction set architectures to more about parameter set architectures, PSAs,
as I like to think about it. The idea of PSA is that in the future, you still need to be a core
architect. Make no mistake, I'm not saying that suddenly our students don't need to know
anything about architecture. All our people still need to know everything deep inside so we know when models are hallucinating and so forth.
So we take that fundamental understanding, flip that vertical stack into a more horizontal stack,
and then our future architects are really going to be understanding what are the parameters that are actually essential to expose across each of those horizontal layers. Because at that point,
once you expose the parameter space, then you let the AI agents actually get to work.
And at that point, it gets really fun because now you could have an agent that's perhaps just
dedicated to the hardware module, or you could obviously break it down into the individual
microarchitecture components and have multiple agents all kind of working and learning from
each other. But in the end, when you take a step back, they're effectively
learning from each other and exploring that massive design space that we truly have across
the system. And I think that sort of paradigm shift is really what we need to have rather than
thinking about things in a very traditional sort of a perspective about how we have done things
today, right? I think that sort of changes, you know,
what going from Architecture 1.0 to 2.0 is going to be. That sounds very interesting. I think I want to
double click on this idea of this horizontal design space that you were talking about. It
sounds like, and let me make sure I heard it right, that, you know, of course, we have this,
a lot of layers, cross layer, and we often do cross-layer optimizations, but they're usually in adjacent layers. Are you saying then that you turn that on its side, those layers from like
ISA all the way up to, you know, microarchitecture, turn that on its side so that an AI can look at
all of the layers together and essentially optimize the parameters that we decide are good for exposure across what has traditionally been vertical stack, but now it's horizontal
and they have the purview of the whole white space. Okay. Interesting.
And I think that's going to be super fun because you start to kind of understand that
there are going to be differences about how we even expose those parameters, right? There might
be hierarchical parameters. You want one agent, one AI agent to kind of, you know, maybe just work
on the memory subsystem in complete isolation. Or you might actually want to break the memory
controller completely down and say, okay, even within the memory controller and the way it
interacts with the memory subsystem might actually have multiple agents because some of them might be
responsible for very specific parameters that they're playing around with. And so when you kind of think about it, you get into this really interesting design space of how do you get the AI to actually map onto this horizontal parameter space that we're actually exploring.
And those kinds of things have not yet been fully explored because, as you said, Lisa, we have largely been doing co-design between two adjacent layers. We lump it into
hardware and software co-design, which is true, but if you really go into it, it's really just
algorithm and hardware co-design in this very tight binding. But there is so much more of what
an architecture stack really is, right? And there might be optimizations that we would perform
at the highest levels of the stack that are in fact suboptimal when you actually look at it
from a holistic system design,
because sometimes you wanna leave more room
for the system at the lower levels of the stack
to actually make other kinds of opposite decisions
to what we would normally do.
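To make the parameter-set-architecture idea a bit more concrete, here is a minimal, purely illustrative Python sketch; the layers, knob names, ranges, and the evaluate() stand-in are all hypothetical, not anything specified in the conversation. Each per-subsystem "agent" mutates only its own slice of a flat cross-layer knob space, while a shared score, in practice a simulator run or a learned surrogate, arbitrates across the whole stack.

```python
import random

# Hypothetical cross-layer "parameter set" exposed side by side instead of
# being hidden behind per-layer abstractions. Names and ranges are made up.
PARAM_SPACE = {
    "frontend": {"fetch_width": [2, 4, 8], "btb_entries": [512, 1024, 2048]},
    "core":     {"rob_entries": [64, 128, 192, 256], "issue_width": [2, 4, 6]},
    "memory":   {"l2_kb": [256, 512, 1024], "l2_assoc": [4, 8, 16]},
}

def evaluate(config):
    # Placeholder: in practice this would be a simulator run (or a learned
    # surrogate) returning something like IPC per watt for the configuration.
    return random.random()

def random_config():
    return {knob: random.choice(vals)
            for layer in PARAM_SPACE.values()
            for knob, vals in layer.items()}

def agent_step(config, layer):
    # One "agent" owns one horizontal slice and mutates only its own knobs.
    new = dict(config)
    knob, vals = random.choice(list(PARAM_SPACE[layer].items()))
    new[knob] = random.choice(vals)
    return new

best = random_config()
best_score = evaluate(best)
for _ in range(200):  # toy search budget
    layer = random.choice(list(PARAM_SPACE))
    candidate = agent_step(best, layer)
    score = evaluate(candidate)
    if score > best_score:  # a shared score arbitrates across all layers
        best, best_score = candidate, score
print(best, best_score)
```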
Yeah, that's fascinating,
because I feel like a lot of times
when you are a student doing microarchitecture design even.
Maybe you've honed in on some substructure
within the microarchitecture,
whether that be a BTB or a TLB or an L2 cache or whatever.
And you kind of do have to isolate yourself
into looking at that structure in and of itself.
You've got to get yourself a pattern stream
that goes into it.
And then within that pattern stream, you isolate
yourself to figure out, okay, here's what happens here. You know, maybe I need, if I like memory
systems, so like maybe you need an eight way cache, or maybe you need a four way cache, or maybe you
should have. And even with those dimensions of like, how many indices, how many ways,
how many megabytes, gigabytes, whatever, whatever, depending on what level of
the cache you're talking about. That often was a relatively taxing space to look at just because,
you know, you would still have to run lots and lots of jobs, and we didn't have the computational power. And you would wonder sometimes, like, okay, well, what if I change something in the L2 that changes the traffic? You know, like the way that the L1 is filtering to the L2, now the traffic has suddenly changed. Like, there was no way to pop all the way up. And so it sounds like what you're saying is that if we turn everything on its side, and now that we have the massive power of all this AI, we can look at everything potentially all together, although we probably still have to be judicious about what parameters are being exposed. Is that what you mean by the first piece, the prediction
piece and the data piece?
Yes.
And I think it's the architect's job still
to have very deep knowledge about what parameters are
actually critical.
So this by no means undermines what a traditional architect is
doing.
If anything, all we're trying to do
is, for instance, if we go back in time, it's just to kind of help you compute through things faster so you can actually look at more interactions with your fundamental knowledge.
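As a toy illustration of that prediction pillar, here is a minimal sketch of a surrogate model fit on accumulated design-point results; the knobs, the synthetic metric, and the numbers are entirely made up, but the pattern of training a cheap predictor on past simulation data so you can triage a much larger sweep is the basic idea.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical corpus harvested from past simulation runs: each row is a
# design point (issue width, ROB entries, L2 KB) plus a measured figure of
# merit. In practice these would come from simulator logs; here they are
# synthetic, generated by a made-up formula plus noise.
rng = np.random.default_rng(0)
X = rng.integers(low=[2, 64, 256], high=[9, 257, 2049], size=(500, 3))
y = 0.4 * np.log2(X[:, 0]) + 0.002 * X[:, 1] + 0.0003 * X[:, 2] \
    + rng.normal(0, 0.05, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A cheap surrogate that predicts the metric for unseen configurations,
# so the expensive simulator only runs on the most promising ones.
surrogate = GradientBoostingRegressor().fit(X_train, y_train)
print("held-out R^2:", surrogate.score(X_test, y_test))
print("predicted score for [4-wide, 128 ROB, 1 MB L2]:",
      surrogate.predict([[4, 128, 1024]])[0])
```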
I want to circle back to the data itself.
And the quality of the data is very important
for the quality of the AI systems or agents
that we build towards these different tasks.
This is true even in other domains.
So if you look at the toolkit that we have
to create such data, we have simulators on
the one hand, and then we have real world performance profiles on the other hand. So
how do you think we should go about collecting these data sets for architecture research?
What should we be careful about, especially as we try to ground this data in the real world?
We want to ensure that if you're simulating, the quality of the data should be good, which means
that it needs to correlate in some reasonable manner
to what we might expect in a real system.
So what do we think about these different attributes
of the data and the quality of the data?
And what are mechanisms that we need
so that we can create these data sets,
curate these data sets, and then also measure the quality
of the data sets themselves?
Yeah. I'm going to split the data element
up into two fundamental pieces, which you were already alluding to, the first piece being quantity of data, because these are inherently having to be big-data-oriented kinds of problems, right? And the other one is, once you have that big data, then how do you tune it for quality. Both are actually needed. If you kind of look at what's actually happening in the AI community, you know, if you historically look at the size of the data that's been evolving for images, for instance, you start to see that originally, you know, people tried to curate these really high-quality data sets, right? And people said that's the most important thing. But then, over the years, as the models have gotten bigger, we've started creating more noisy data sets.
You start getting noise in the data set because you start pulling the human out of the loop a little bit and you start relying on self-supervised methods or just having the systems effectively just kind of mining for data.
And then you end up with a lot of errors in the labels and so forth.
Now, just because you have a bit of error does not necessarily mean that,
you know, it's actually bad. Sometimes having a little bit of error can actually help the model
not get stuck in certain things, right? And so you do need a large amount of data.
And to that point, I would say, think about the number of simulations that we would all run,
right? Just globally, just think about the number of gem5 simulations alone that you and I probably run. Forget even what's happening in the companies; just the gem5 simulations that are being run academically, and even within my lab right now, probably. Right? What do we do with all that data? We basically get the paper out. I guarantee you the students probably got it in some directory; he or she will forget about it, you know, once the paper is out, right? And then we just kind of, like, you know, at some point, just, you know, archive or erase it. We don't really use it. And I think that's a wasted opportunity, especially for a domain that is quite specialized, right? There's a lot of domain knowledge you need to have to be able to kind of, you know, understand how to work with things. Just, I'll tell you a little bit about a project
that we're actually doing centered around data
in Architecture 2.0 later on.
But one of the very basic questions just yesterday,
one of my students, Shvetank, asked the models was,
is data movement generally more costly than compute?
Now, you can by and large ask anyone starting a PhD in architecture; this is probably one of the first things we try to teach them. And guess what Mistral, Claude, and ChatGPT come back with, right? They say no, data movement is actually not costly. Now, of course, once you start asking them, they'll rationalize this and kind of come up with, you know, all kinds of excuses, like, it really depends on what data you're talking about, oh, it depends on what compute. Yeah. But if you ask the vanilla models, you know, it kind of comes down to this very basic question, right? And these models are not able to get it.
But a lot of that domain knowledge is kind of inherently captured in a lot of the data that we're throwing away today, and I think that's a lost opportunity for us. And so this is where I think a very simple thing for the quantity side of the world would be: what if we could just create a plug-in into gem5, or any of the many other excellent simulators that are out there, right? Like GPGPU-Sim, which does all kinds of different things, even for accelerators, even Timeloop and all these, you know, modeling-based systems. What if we could inherently create, you know, a platform-agnostic back end, and we're able to pull this data into some cloud service provider where, you know, it's open for the architecture community to be able to tap into? That gives us a wealth of data on which we can start training at least open-source models in order to do basic tasks like prediction and optimization, right, and just be able to do it really well.
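A minimal sketch of what such a stats-export hook might look like, assuming gem5's usual stats.txt output format; the metadata schema and the upload endpoint are entirely hypothetical, and no such plug-in is implied to exist today.

```python
import json
import urllib.request

def parse_stats(path):
    """Parse a finished gem5 stats.txt into a flat {name: value} dict."""
    stats = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # gem5 stats lines look like: "<name> <value> # <description>"
            if len(parts) >= 2 and not line.startswith("-"):
                try:
                    stats[parts[0]] = float(parts[1])
                except ValueError:
                    pass  # skip non-numeric entries
    return stats

record = {
    "metadata": {                      # experiment conditions: the "label"
        "simulator": "gem5",           # that makes the raw numbers reusable
        "config": {"cpu": "O3CPU", "l2_kb": 512},
        "workload": "some-benchmark",
    },
    "stats": parse_stats("m5out/stats.txt"),
}

req = urllib.request.Request(
    "https://example.org/arch-data/upload",   # hypothetical community endpoint
    data=json.dumps(record).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)  # or simply archive the JSON locally instead
```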
So that's purely on the quantity side. Then it rolls over to the quality side, of course.
Now, as you kind of make the data sets noisy, and, you know, as you start implicitly injecting errors, then you've got to worry about the quality of the data. I think for regulating the quality of data,
one of the key things you will ultimately need
is some human in the loop element, right?
And I think as a community,
we have to start thinking about training
our next generation of PhD students
and engineers and so forth
to help us kind of get that higher quality
so that we can end up with something
that's like a reasonable, you know,
a reasonable data set with low error, right? And this means labeling the data sets and so forth, right? And I think that gets into, you know, what kind of data set it is. If it's a ginormous gem5 simulation log, yeah, that's very, very hard to, you know, really sort of streamline, right? Because what are you going to label on that thing? It's very hard to label. The best you can do is have some sort of metadata about the conditions of the experiment and so forth, right?
However, we can still curate high-quality data sets about basic information and questions, such as, you know, is data movement more costly than compute? That's a QA pair, right? If you kind of go back and you look at NLP models, by and large, you know, they've worked on these kinds of QA
data sets that we have, right? Question answering pair data sets, which, you know, test the model's
ability to understand the domain. So if we can create those kinds of data sets, which I think
students would be able to help, and so will the community, then I think we can start sort of,
you know, bootstrapping these quality-oriented data sets and start creating benchmarks,
which is a whole other area outside of the data itself. Yeah, I have some follow up questions here. So I guess
what I'm kind of picturing based on what you're saying, because I was one of the early developers of gem5 way back in the day. And one of the things that we had tried to do was, you know,
have a very rigorous set of statistics about all of the major structures and they
just get all spit out at the end. And then there would be maybe a little bit of labeling
about what happened in this particular simulation so that you could distinguish what happened
between this run versus that run or what have you. And so I guess in your mind, are you
imagining something like this where you essentially spit out a bunch
of data saying like, okay, if the ROB size is eight, which is some ridiculous number, and the
L2 cache size is four gigabytes, which is also a ridiculous number, then, you know, then it can
essentially glean out some correlative stuff where when you have maybe a more reasonable number for both of those or you isolate like what is cause and what is effect or what is at least correlated when things are happening.
Is that what you're talking about? Or do you necessarily need some sort of label to say, like, hey, I believe this run is testing new structure A? Because run to run, some of them may have a new structure
that wasn't there before at all
that introduces new relationships,
or some of them might have bugs,
which we would have tons of bugs where like,
oh, these results don't make any sense.
Like somebody had to look at it and say like,
this doesn't make any sense.
I guess what I'm imagining here
with respect to the generation piece,
you know, there is a structured generation of like, what are we spitting out? You know,
what are the, what are the pieces of data? What are the structures? And then there's the,
the sort of description, or I guess, label of it. Like if you invent a new widget,
how does that then get incorporated? Yeah. I mean, at the end of the day, we're always having these unit tests in some capacity, right? So in the case where we end up with something new, I mean, we will still have to continue doing what we are doing today, right? We're just kind of writing these custom unit tests and making sure that we're actually right about them. But on a macro scale, what I would say is that if you're looking at it from the holistic system, then I would very much do whatever we are already doing
in many of the big AI systems, right? So, you know, when you're hitting, let's say, something like DALL-E, for instance, and you're generating an image: given a prompt, you have to generate an image. Well, the prompt doesn't directly go straight to the model, right? It doesn't go straight into the back end. You've got a whole bunch of infrastructure that's actually sitting in the front end that's guarding the prompt and making sure that the prompt is intentionally good and it's well-meaning and so forth. That does not mean that the back end is completely, you know, going to be safe, right? Because you can generate pretty harmful images today with just state-of-the-art models, right? So you still have to, you know, have some checks and, you know, guardrails in place, which is what, you know, the front-end classifiers are typically designed to do.
So in a very similar way, you would train simple classifiers, I would think, that are able to spot anomalies that are happening inside the system. And so you could effectively use that to kind of
have some mechanism of a feedback signal that comes back to the architects who are designing
the system.
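Purely as an illustration of that guardrail idea carried over to system design, here is a minimal sketch of an anomaly check trained on previously vetted results; the features and numbers are synthetic, and in practice the flagged cases would go back to a human architect for review.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features summarizing past, human-vetted simulation results
# (e.g. IPC, L2 miss rate, average memory latency). Synthetic numbers only.
rng = np.random.default_rng(1)
known_good = np.column_stack([
    rng.normal(1.8, 0.3, 1000),    # IPC
    rng.normal(0.12, 0.04, 1000),  # L2 miss rate
    rng.normal(90, 15, 1000),      # avg. memory latency (cycles)
])

# Train a simple anomaly detector on the vetted corpus; anything a generator
# (or a buggy run) produces that looks out of distribution gets flagged.
guard = IsolationForest(random_state=0).fit(known_good)

candidate_runs = np.array([
    [1.9, 0.10, 85],   # plausible result
    [7.5, 0.01, 3],    # suspiciously good: likely a modeling bug
])
print(guard.predict(candidate_runs))  # 1 = looks normal, -1 = flag for review
```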
So I'd still go back again to the human in the loop being the most critical element of all this. In addition to the technical challenges that you
discussed, it looks like there is a big part of the community contributions that's required in
order to bootstrap this entire ecosystem. You've been involved in multiple both open source and
community efforts over the years, including MLPerf as a founding member.
So can you talk about the importance
of such open source contributions
along with industry plus academic collaborations
in advancing this particular field?
And also, how do you think about bootstrapping
this particular ecosystem for Architecture 2.0?
Yeah, that's great.
I'm glad you're asking about that.
Yes, it is true that I'm a super big proponent of doing
community-driven efforts.
And in all honesty, the kudos and the credit
really goes to things that I've learned
when I was a student looking back
on what the community was doing.
I mean, the community built the gem5 simulator.
The community also helped contribute to GPGPU-Sim.
And as a community, every once in a while,
we kind of reach
a point where we really need to come together to create something that will unlock the next
generation of ideas and research that can come out, right. And so that's where I really draw a
lot of the inspiration from is kind of looking at how we have done these big mega projects that are
now like you know sort of the backbone, right. So from that sense like when you talk about
Architecture 2.0 or kind of building this data set,
one of the things that we're actually gonna, you know, talk about at one of the workshops is, we basically, you know, have created a massive corpus of the last 50 years of architecture research.
And we haven't talked about this,
we will be talking about it, but it's coming.
And what we have actually done is we've started creating
a data pipeline where, you know, we have data annotators,
basically undergraduate and graduate students
effectively labeling certain types of questions
because they need domain expertise,
such as the one that I was talking about,
which is data movement related thing.
And we've started creating that data set.
And we originally started with the ISCA 50 retrospectives
where we collected all the retrospectives, which is a very small sample that was put together by José Martínez and Lizy John last year.
And we did a QA data set around that.
And we took that data set and we fine-tuned some of the open-source models, which actually perform badly on architecture.
We immediately saw a spike in the ability, which was a clear signal that even with a little bit of a curated corpus you can actually improve their domain knowledge about architecture. And now we've effectively expanded that to be the last 50 years' worth of, you know, architecture research. Architecture, again, as I said, is kind of encompassing both traditional architecture as well as EDA kinds of flows, so papers in that sort of corpus. And we've created a pipeline which allows us to kind of,
you know, start labeling as a community.
And this is what we're actually hoping to,
you know, announce pretty soon at one of the ISCA workshops
and then write a subsequent blog to engage the community.
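To show the shape of that pipeline, here is a small, purely illustrative sketch of turning a few domain QA pairs into a JSONL file and running a tiny supervised fine-tuning loop over them; the model name, file name, QA content, and hyperparameters are placeholders rather than the project's actual setup.

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A couple of hand-written architecture QA pairs, stored as JSONL.
qa_pairs = [
    {"question": "Is data movement generally more costly than compute?",
     "answer": "Yes. Moving data through the memory hierarchy typically costs "
               "far more time and energy than the arithmetic performed on it."},
    {"question": "Why do we use multi-level caches?",
     "answer": "To bridge the latency and bandwidth gap between fast cores "
               "and slow main memory."},
]
with open("arch_qa.jsonl", "w") as f:
    for ex in qa_pairs:
        f.write(json.dumps(ex) + "\n")

model_name = "gpt2"  # stand-in for whichever open model is being adapted
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # toy number of epochs
    for ex in qa_pairs:
        text = f"Q: {ex['question']}\nA: {ex['answer']}{tok.eos_token}"
        batch = tok(text, return_tensors="pt")
        # Standard next-token loss over the question-plus-answer text.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
```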
And I think this is where, for instance, you know,
for people who do believe, okay,
AI/ML can be a useful tool in our toolkit, it'll be a wonderful opportunity to contribute and help shape it.
In my vision, it's like first we start labeling the datasets.
We start labeling, then we start fine-tuning models.
And I would love for us as a community to have a collection of open-source
models that are actually, you know, domain-specific to us.
And then you can start, you can start trying to improve their knowledge across
various ways. And this is where benchmarks become critical. Because
benchmarks in my head are a way to kind of bring the community together because
everybody has to agree on what are the interesting tasks that we actually want to solve
first. And then you can start creating a roadmap
that allows not only apples-to-apples comparisons with benchmarks, but benchmarks are also sort of the north star. Like, you know, for instance, when we wanted to go to the moon, we weren't necessarily talking about, oh, this is the specific navigation system I have in Apollo, or this is the thruster that I have. No, you actually focus on getting to the moon, and then you work backwards and say, okay, what are all the elements that I need to have in order to be able to get to the moon?
Then you say, okay, well, I need to have a certain amount of, you know, the thrust capability. I need
to have a certain type of navigation capability, right? And you kind of identify all the pieces
and you kind of build a matrix that says, okay, if I can check all these, it tells me that I might
be able to get to the moon. And so in my head, that requires a community effort
because you have to build that complex matrix for one, right?
Which is identifying what are all the tasks
that we would need to be able to do incrementally one by one
so that we can say, okay, someday perhaps
we can ask a large language model to say,
okay, act like an architect, you know,
give me a RISC-V core that's got this sort of, you know,
ISA support and, you know, it's able to really optimize these particular workloads.
I'm not saying that one LLM is magically going to do it all. I'm saying it might actually invoke
other agents or existing traditional non-AI ML tools to actually get the job done, right?
So there's the two parts. One is kind of like getting all the data that needs to actually be put in place,
which is kind of what, you know, I think that's a massive community effort.
There's no way you can do this just by dumping some data set outside and say,
oh, OK, everybody adopt this.
I don't think that's going to work.
We all need to chip in, and I'm just picking gem5 here as opposed to, say, Chipyard.
Much like gem5 is a community project, we all chip in.
When my students find a bug, I say, don't complain about it.
Just go fix it.
It's incredible that you actually have a massive simulator that someone wrote for you.
So just go fix the bug instead of complaining about it.
So if we all kind of contribute in that way, I truly believe that we'll be able to kind of build a new set of tools that will help us with hardware design.
And I think that's where the community aspect kind of comes in with respect to Architecture 2.0. Yeah, I mean, I think that sounds very
interesting. I mean, the way that our community always has worked, it seems, is that there's some
sort of thing happening. There's some turn, there's some change, and then there's a lot of discussion, and then eventually there's a
congealing around some sort of pillar of how we're going to do things as a community.
You know, so we eventually congeal around a benchmark suite, or we eventually congeal around
a simulator. You know, there's a few, but there's not, everybody doesn't come and roll their own, right? Because we sort of realize that collectively, it's better if we all collaborate on a few.
So I think what you're saying, it makes sense.
It seems like a tall endeavor too, but it always is in the early stages.
So for some of this, I mean, I'm imagining this big world of possibility, right? Where let's say one of the parameters that's on the table is, say, the ISA. We've all congealed around a particular set of instructions, more or less, right? We might quibble about whether you need fmul or not, or whatever, depending on the situation, but more or less they look very similar, you know, barring RISC versus CISC. And I guess what I wonder is, if we wanted to, say, explore something different, it seems like what would be necessary is for someone to, say, come up with a new instruction, and then come up with a new compiler that uses that instruction adequately, to come up with the instruction stream that can then be fed into a number of sample machines to be able to produce
the data that would be rich enough for an AI to be able to reason about it, right? Because, you know, if you just say, um, add one new instruction and then compile one program and put it through one run of gem5, there's no way for an AI to be able to
reason about what it was that might change if you put that in the corpus of everything.
So it feels, I guess I'm just thinking through how this would work, and it feels like then
the kind of work that we do now, which is like, Hey, what would happen if
we had this new instruction, then you'd have to do all this work. And you have a sort of a hypothesis
in mind and you set up your, uh, your experiments to be able to figure it out now, sort of
the hypothesis maybe feels even more vague or more like, what would happen? Like,
would this instruction be a good idea? And you do all this work and then let the AI say,
yes, it would be a good idea under these circumstances for these types of instructions,
but you still have to run all the simulations. So I guess, I guess I'm just thinking about that process where, as a student, if you have a hypothesis, you sort of have to come up with your experiment set.
And now what you're trying to do is come up with an experiment set that is wide and varied enough to produce enough data so that an AI can draw a conclusion.
Is that sort of how you picture it? Yeah, I think like this, there's an
aspect of, yes, we might have to build all the tools and so forth to kind of get to that, you
know, evaluating that hypothesis. Now, I think like, I'd like to think, as you're kind of mentioning
that I was translating this into a visual in my head where I'm like, okay, if I want to, I'm
sitting in a room and I'm trying to think, okay, what's the next set of optimizations to perform, right?
In my head, I would assume that given all the simulation data that's kind of sitting in,
for instance, right, from, you know, whatever simulators, you know, pick your favorite company
and all the tools that are internal, I would assume that I should be able to ask, like,
what are the common bottlenecks that I'm actually seeing and what aspects should I really focus on optimizing?
And as an architect in my head, the architect of maybe like 2030 or 2040
would be like kind of interacting with an AI agent
that's kind of asking probing questions.
The AI is really kind of just looking through the mines of data
and making connections that you and I normally would not make.
And I don't think it's going to necessarily, we don't necessarily have to push it to the point
where, okay, just give me the chip, but it's more of an interactive feedback loop, right?
That allows your architects to very intelligently brainstorm things because often architects are
kind of doing this today anyway, right? Chief architects kind of sit around and like talking
to all the, you know, IP modules that are getting integrated into the SoC.
And I would think that, you think that that feedback loop is very slow
today.
And I would assume that in the future,
the feedback loop is going to be extremely fast,
because the AI agent is effectively synthesizing
all this data and comes prepped for the meeting,
much like any other person.
And you can just ask the AI agent,
what would it likely be if I had, you know, this sort of,
you know, configuration, right? Which would be the notion of taking the prediction data,
looking at optimizations, right, that have been performed in the past, and then potentially kind
of making, you know, some sort of generative sort of an idea of like, okay, this is how I would
retweak your design. And so I agree with you, it's inherently nebulous, and I don't have all
the answers around this. But my hope is not so much that you and I honestly figure it out,
but my hope is that we get the next generation to fire up, because they're likely going to think
about these things in a very unorthodox manner that you and I probably don't think about,
because we're very much stuck in a certain box, given the rules and things that we have,
we ourselves broke, you know, in order to be who we are.
That's right.
They're going to be AI native, unlike us.
Right.
Yeah, that's a fascinating discussion.
And also you've painted an exciting vision for the possibilities in the future.
So I'm hoping a bunch of our listeners are geared up towards this particular challenge.
Switching gears a little bit to another thrust
in your research, you've worked on enabling ML
in resource-constrained devices, like edge devices,
mobile devices, and so on.
I think you've christened it TinyML.
Can you tell us a little bit about the unique challenges
in designing both efficient algorithms and hardware
for TinyML applications?
And how would you sort of compare and contrast it against, you know,
large-scale machine learning deployments?
Why is it exciting?
What is different about it?
What are some unique challenges in that particular space?
Yeah, so TinyML is effectively, you know, really talking about embedded machine learning.
And for folks who like typically when we talk about on-device machine learning,
you know, most people traditionally in the industry will say that, okay, that's kind of more talking about mobile devices, right?
Our smartphones are effectively the on-device element.
TinyML is really not about that.
It's really about pushing ML onto, you know, hundreds of kilobytes of memory capacity, right?
And so you're really talking about, you know, milliwatt-level power consumption, always-on ML, specifically in IoT kinds of devices, or even smartphones. It's always on; you know, some element is constantly listening because it has to, in order to detect a keyword. Like when you say "Hey Siri", it's not like the whole system wakes up; a submodule wakes up, right? So certain aspects always have to be on, and the question is, can I fit neural networks into a few hundred kilobytes or, you know, one or two megabytes of flash storage that I actually have?
And so that's what TinyML really is about.
And it's a vastly different ecosystem from the rest of the big ML stuff that's happening.
And I would say that it's quite fascinating because it's the perfect melding of hardware, software, and ML.
It's that blending of all three that I think is truly what TinyML sort of,
you know, is all about.
And that kind of opens up the space
to many interesting challenges.
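For a rough sense of the scale involved, here is an illustrative sketch of a keyword-spotting-sized model, with made-up layer sizes and keyword classes, using standard TensorFlow Lite post-training quantization to get it toward microcontroller-sized flash budgets.

```python
import tensorflow as tf

# A tiny keyword-spotting style model over MFCC features (49 frames x 10
# coefficients). Architecture and sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49, 10, 1)),
    tf.keras.layers.Conv2D(8, (3, 3), strides=2, activation="relu"),
    tf.keras.layers.DepthwiseConv2D((3, 3), activation="relu"),
    tf.keras.layers.Conv2D(16, (1, 1), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. yes/no/silence/unknown
])
model.summary()  # on the order of a few thousand parameters

# Post-training quantization shrinks it further so it can sit in a small
# slice of microcontroller flash, TFLite-Micro style.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print(f"quantized model size: {len(tflite_model)} bytes")
```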
I mean, this whole ecosystem kind of started
about five years ago,
maybe five to six years ago, I would say.
And, you know, back then it was just an idea
and there was the tinyML Foundation that got formed around this, where we were all kind of just thinking about what would it be if we could enable speech recognition on a coin-cell-battery-operated device. I mean, that's a pretty damn far shot; that's quite the moonshot if you think about it, and this is still five, six years ago, right? And so that's where, and this was also during my sabbatical at Google, where there was a skunkworks project on, hey, can we actually do this? Can we adapt tools like TensorFlow to be able to run on microcontrollers, much the same way they were adapted from running on big servers and workstations onto mobile devices? Things needed to be stripped down and so forth. And of course, you know, today, if you kind of look at it, there's an entire world of, you know, tiny models all over the place, lots of, you know, optimizations that are specialized for these embedded devices. And this is a space that I think is fascinating both for research and education. I'd say in research it's especially fascinating because, talk about co-design, this is one place people could really use co-design, because they're highly bespoke applications. You know, typically when we talk about co-design, I often kind of, you know, skirt it a bit, because I'm kind of worried that co-design looks, you know, complicated on paper; it's lots of innovation, technical innovation, but from a practical standpoint I'm always like, how is anyone going to take any of this and make it make sense in a company?
I'm not saying that they have to do it today, but even like seven years from now, where you're
asking them to rip apart the algorithms, you're asking them to rip apart the runtime, you're
asking them to rip apart the architecture slash microarchitecture. I'm like, it's an intellectual
exercise. That is awesome. But there's an aspect of it which just seems completely imbalanced, right? Because often when you're building systems at large scale, you need them to kind of be general-purpose to an extent, okay? And, you know, when you're talking about TinyML, it's very different, because it's highly bespoke. Your Ring doorbell does nothing but pretty much just image classification. It does not have to listen to you. Like, there's lots of very simple things that it can do. Alexa does not need to see you, necessarily. It's mostly just trying to listen
to the sounds. Now, in the future, I think speech is going to become so common. I think this notion
of touch, I bet my daughter is going to be like, what? Why do you use or touch the screens? That's
so yucky. That's probably what she's going to say when she's, you know,
a couple of years older because she's just going to probably be talking to
every single thing.
It's going to be like, hey, widget, toast my bread for two minutes.
Bake for 325, right?
And that's a limited vocabulary space where you don't necessarily need big
models.
You can really get away with highly bespoke models that are highly
specialized.
And, of course, if you can do that, that's pretty incredible.
I mean, just think about the world where today you think about AI,
you still physically have to interact with it in the real world, right?
Like you kind of, you know, you're interacting with some sort of entity.
That's pretty clumsy and clunky if you kind of think about it,
because you are working around, you know, the machine and reality.
We are having to adapt to what the machine is. If it's truly embedded, you don't notice it. And that's the beauty of Mark Weiser's vision about, like, ubiquitous computing way back at Xerox PARC. He had this idea that you're going
to have intelligence spread across everywhere. And I think that's where we're getting to,
which is ultra low power consumption, specialized intelligence for specific things
that the devices need, and then being able to seamlessly interact with us.
And that's essentially why I'm super excited about the TinyML ecosystem.
Right.
I think we have a long way to go to get to that ambient computing where everything just
disappears into the background.
Truly magical technology is something that you don't even notice exists.
You briefly touched upon how TinyML also
has a space in education, because for a lot
of the other models, especially the largest models,
you need industrial scale machinery
to be able to interact and iterate with it
at multiple scales.
Can you expand a little bit on how
you think this is going to be useful in education?
Very broadly, I've thought a lot about how do you teach students about computer architecture,
especially in the current times, given that the space is evolving so rapidly.
You tell our listeners, how do you think about teaching computer architecture?
How do you think ecosystems like TinyML or the associated tooling and infrastructure
would be helpful and beneficial
in teaching students about the different concepts
in this particular space?
Yeah, I'm going to talk a little bit more broadly
than architecture, because I think architecture's scope is
also expanding, especially as we look in this domain of ML.
I think an architect who wants to play in space,
for better or worse, needs to understand the ML ecosystem,
the ML systems ecosystem.
So I'm definitely very passionate about education
in this space
because I think it, like, breeds new life into traditional embedded systems that have been taught forever in universities worldwide, right? I still remember the first time, when I was a professor at UT Austin, you know, I went into the classroom. I was about to start teaching embedded systems, and before leaving my home, I remember I grabbed the garage door opener, because I was like, oh, this is a very basic embedded system.
It's amazing. It's like everywhere. It's like, you know, I'm going to use this to inspire students.
And I still remember, you know, holding it up and I'm like asking the kids and they were all like,
oh, it's a garage door opener. And I was like, this is an amazing piece of technology because
it's got all this stuff. And as I was getting excited, I saw their faces going really dull. And I was very perplexed by that as to what was going on.
And one of the kids at the front said, I really don't want to do engineering so I can build garage door openers for the rest of my life.
I was like, damn.
Kids got a point.
Two weeks later, I got my Google Glass and I took that in.
And suddenly everybody wanted to kind of work on like embedded systems and stuff.
They're like, oh yeah, this is super cool.
And I want to do this stuff.
I think like for education, I realized that it's really about kind of making it relevant
with the times where we are.
And so like obviously today when we look at AI systems and so forth, there's a lot of
excitement.
We've built lots of incredible hardware for this, but it's often very inaccessible, right?
Think about how often we can actually, you know, come up with a design that we can actually take, you know, all the way through the lifecycle of the chip, right, from concept to tape-out. It doesn't quite happen.
But TinyML kind of opens it up to a very exciting space, right?
For one, there's a lot of open source ecosystem tools that are kind of coming up.
And because the designs are highly bespoke, you can actually do a lot of specialization.
And because they're also small designs,
you can actually completely go from your concept
to kind of taping it out, or whether you're doing it on an FPGA.
It's so much more practical.
And I think the timing is sort of very interesting for education
because you have these open source tools
that are just mature enough to be able to pull this off, and then you've got an interesting educational area, which is around AI. Everybody wants to do AI, but then often it's, like, some big model, some big data set. You know, I can go around asking how many people have actually built a data set, and I can guarantee you, whenever I'm asking this question, probably, you know, one or two people will raise their hands out of 30 or 50 people, right? Very few people touch that.
But when it comes to this sort of embedded ecosystem space, the data set is highly bespoke.
So you can actually get them to go all the way from understanding how you collect the data, how you pre-process the data.
Imagine if I have to just kind of wake up a machine, you know, when I say, "OK, OK Vijay", that's what I want. Well, you can go easily collect all of that data, and you can build the pipeline out and actually train the model, do your optimizations, and actually deploy it and get it to close the loop, right? And these widgets today are literally five bucks a pop. I mean, the ones that I have, like, you know, folks can't see this, but, you know, we actually kind of build these things, right? And they're really, really cheap. And of course, you know, Arduino, Seeed, all these folks have started putting out, you know, these MCUs. And these MCUs, these
microcontrollers are completely capable of running, you know, these models. So students have an
incredible opportunity to deeply understand whichever layer of the stack that they're quite
interested in, right? If you kind of look at Song Han's papers, he has also been doing some pretty amazing work in TinyML, for instance. You know, they've been able to build an entire runtime engine that sort of, you know, optimizes it. How often would you ever go out and build a big TensorFlow-like engine to show some incredible capability that you can unlock? You can't do that on a big system. However, in their case, they were able to kind of build a custom runtime, right? So that kind of opens it up, really. And of course, there's lots of hardware solutions, you know, um, but that's like preaching to the choir on, like, you know, what it means to build hardware, so I'm not going to dabble into that. But that's a really exciting space, right? The one thing, though, that's certainly missing in this ecosystem is that there aren't enough educational resources around this, right? This is one of the reasons, I think folks
know about this, but I started putting together my own class notes, and in fact, Suvinay actually
knows about this, where I started writing a machine learning systems book that talks about
what it means. Originally, it was a tiny ML specific book, but as I started writing my notes
in that, I started realizing it doesn't matter if it's tiny ML or big ML. Fundamentals are fundamentals, right? When you do operating systems, yes,
they're distributed operating systems and all kinds of crazy stuff when you go to RTOSs and so
forth, but you still have to learn your Operating Systems 101. So when it comes to ML systems
and architecture, it doesn't matter if you're building a big ML or a small ML. The fundamentals
are still the same around all the nuances you need to understand
about what happens in an ML pipeline from the point when data comes in to the point the data goes out, right? And so I ended up creating this, you know, ML systems book, and, you know, it's an open-source project where people have actually been contributing back. So this goes back to my whole passion about community involvement and so forth. In fact, just this morning I was actually working on getting the release out, because I've been spending an insane amount of time on it. I feel like until I get the release out, I can't actually rest, because there's always something more to do when it comes to these educational things.
So, a question about the open-source notes that you're talking about. That sounds really, really interesting. I'm just wondering, is the model sort of a Wikipedia model, where everybody can just put their stuff in, or is it a Linux model, where you need a pull request and Linus needs to say okay? It's definitely a pull request model.
So yeah, someone has to be, you know, involved in curating it. Of course, there are a couple of people that are, you know, and I certainly have been talking
to multiple faculty members.
And as much as I kind of do
the initial drafts and my students,
you know, my research lab is very active.
Every time I teach it,
my students kind of contribute,
oh, these are interesting seminal references
because the field is moving very fast, right?
So the question always is like,
how do you sort of keep up with it?
And that's the whole reason for making it
an open source sort of a project
where people can issue pull requests
and kind of keep it updated. That said, though, I was still struggling with it, and that's when I kind of reached out to Dave Patterson to ask for a bit of advice. You know, when they wrote the book, you know, the computer organization book, things were evolving rapidly back then too, right? Like, today we kind of look at it as the holy bible, but when they were writing it, there were heated debates going on about what's the right thing to do, what's not the right thing to do, and so forth. And I think the advice that he gave me is kind of what I follow, which is: if a company has started putting it into practice, that could be a nice litmus test for whether a concept should be in an educational resource, because it means that there's community wisdom that, yes, this makes sense. The nuances, of course, will be different, but that's sort of a way of proofing it against the rapid change that's actually happening in the ecosystem. And that's actually worked out quite well.
I think that's a wonderful initiative, and also a great resource not just for students but also for practitioners in this field. Because the space is evolving pretty rapidly, even once you get into industry or are working in a particular area, it's hard to keep track of all the different developments, number one; but also, as you mentioned, having someone curate it and say, these are the essential ideas that you actually need to pay attention to, I think that signal is quite useful. And the process of doing this and curating it is incredibly valuable to the entire field. So I highly encourage the listeners to also go and check out the book.
Maybe this is a good time to wind the clocks back
a little bit.
You're clearly very passionate about teaching.
Maybe you can tell our audience, how
did you get interested in computer architecture?
What was your journey like on the way to Harvard,
where you are currently? Yeah, I'd say that it sounds a bit tacky, but in all honesty,
I think I got interested in computer architecture because when I was reading Patterson and Hennessy's book, I remember, and I kid you not, this sounds really weird, but I read it like it was a storybook, like it was a novel, because it was so accessible. I mean, when I first picked it up, I was like, who's going to read this massive book? I still remember when they gave it to me, it was this big fat book. I'd gone and picked it up at the National University of Singapore, because that's where I started my undergrad, and it came with a CD, and I was like, what, who knows all this stuff, am I going to have to memorize all of it? So anyway, I still remember just sitting down and reading through it, and I found it fascinating that it was so accessible to learn something that seemed so complicated, something you would normally think, oh, I have to go to class, and that's really where you pick it up. And to this day, that left an impression on me. When you have a good educational resource, you can learn; you might not be able to master it, certainly you need mentors to help you master it, but a good educational resource can really spur you on. And it's more than just the material, I think it's also the community aspect. Some folks in our community are very approachable and accessible, and you look at them as mentors and think, oh, maybe someday I can be like that. For me, I honestly feel that that has a bigger impact on you than the actual technical material.
And that's honestly how I ended up becoming a professor.
I never thought I was going to be a professor, to be honest.
I was so inspired by my own mentors.
I was like, wow, these people are so incredibly smart,
yet they're so humble and so nice and so forth.
And they were so invested in me, even though I didn't even know the ABCs of the stuff, right? And that, I think, for me over time has translated into: we're all technical people, we're all smart people and so forth, but at the end of the day, we're humans first. And it's all about relationships; just being nice and taking care of one another, I think, is far more important than all the nitty gritty. One of the best pieces of advice that I was given, which I
take to heart, came from one of my colleagues, Gustavo, at UT Austin back when I was there: if you can't have a cup of coffee with your colleague and just hang out, forget writing a $20 million proposal or whatever it is, it will never work. If you can't just hang out with a person, like the way I'm hanging out with Suvini and Lisa here, there's no way you're going to have fun doing whatever it is, right?
And so I really feel like as much as we wanna invest
in technical things and always debate things
very technically, I think it's very important to remember
that we're all just trying to learn from one another
as researchers, and we're always a learner first
in our community. So I think that's really what inspired me. I know it's not the classic "I did this, then that, then that" story; for me, it's really just the incredible mentors and people that I've seen.
That was awesome. I have never heard anybody say that they read Patterson and Hennessy like a novel. That is incredible. I think one of the great things about doing this podcast, I'm sure you agree, Suvini, is that when we ask this question, we usually lead with what gets you up in the morning and we usually end with, you know, how did you become a computer architect? And the ways that people became computer architects are very varied, they run the gamut, but yours is quite singular.
I'm quite amazed. I mean, I enjoyed the book as well, and I remember reading it and thinking, oh wow, this is the first textbook I've ever read where, for several chapters, I didn't have to really reread anything. You read it and it just gets in there, basically at line speed. I was like, wow. So I had similar feelings, although I don't think I blitzed through it the way you did; I didn't consume it like candy.
But it was fascinating, because we actually run a Rising Stars program at MLCommons to recognize outstanding junior students in ML and systems, and Dave graciously agreed to talk to the students. When I was introducing him, I mentioned this because it left such a positive mark. And I could see he was very happy to hear that, because he said, a lot of people don't realize how much time and effort we put into the writing, trying to make sure it's actually accessible, that it's really something people can consume. And I think it shows the amount of effort they must have put into just making it available to us, right?
Sure, yeah, for sure. And I think that also touches on how important it is. You know, our guests time and time again have talked about how important it is to have good relationships and collaborations and to communicate effectively. That's always top of everybody's mind, and it's how you do as well as our guest list has done, yourself included. And so I hope that maybe this is one of those things where repetition will get into our audience's brains: you want to do the technical stuff, but at the same time, you really have to learn how to work with people, be able to communicate effectively, maintain relationships, and that's how you get bigger things done. Because we are long past the age of being able to do anything of rich value to everybody on your own, right?
I think it's a new generation. Cliff Young was actually recently visiting Harvard, and he made this really astute observation in a casual conversation. We were saying, you know, when we were building the MLPerf benchmarks, Cliff and Dave were among the original pillars there, and it worked because we were able to bring the whole community together and work collectively, with a lot of grudging consensus. And he said, maybe the reason we nowadays have to do these bigger community kinds of things is that the cohort of people who are actually doing things now has been deeply influenced by social media. If you think about the generational changes that we've come through as individuals, right, the current era of people are people who are deeply influenced by social media, which is a community kind of thing. Everything is shared, everything is discussed, everything is debated, and we do it collectively, and we agree to disagree and so forth. And I think that was a very interesting observation he made: oh, we live in a different world, and people think differently today. So maybe as we move forward, we should work on projects more holistically and more collectively, rather than the way we used to do things back in the day.
For one, systems are much more complicated today, right? On top of that, you also need bigger teams. And so I thought that was a very interesting observation he made about how times have changed and how our culture has evolved, and how that possibly is changing the way we actually work together too.
Yeah, I look forward to the day where we can work together only through Instagram DMs.
No, I don't. I really don't.
Yeah. Well, Vijay, I think this was a really, really interesting conversation. We covered a lot of different topics, from Architecture 2.0 to MLPerf and MLCommons and teaching and TinyML. I feel very stimulated right now. So thanks so much for joining us today.
Yeah, thank you so much for having me. Super fun.
Yeah, thank you so much. It was a fascinating conversation. This is the Computer Architecture Podcast; till next time, it's goodbye from us.