Computer Architecture Podcast - Ep 7: Domain-specific Systems for AR/VR and Extended Reality with Dr. Sarita Adve, University of Illinois at Urbana-Champaign
Episode Date: February 9, 2022
Dr. Sarita Adve is the Richard T. Cheng Professor of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests span the system stack, including hardware, programming languages, operating systems, and applications. She co-developed the memory consistency models for the C++ and Java programming languages, based on her early work on data-race-free (DRF) models, and has made innovative contributions to heterogeneous computing and software-driven approaches to resiliency. Her group recently released the Illinois Extended Reality testbed (ILLIXR), the first fully open source extended reality system to democratize XR systems research and development. She is a fellow of the American Academy of Arts and Sciences, IEEE, and ACM, and a recipient of the ACM SIGARCH Maurice Wilkes Award. As ACM SIGARCH chair, she co-founded the CARES movement, and is a winner of the CRA Distinguished Service Award.
Transcript
Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it.
We are your hosts.
I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Today we have with us Professor Sarita Adve, who is the Richard T. Cheng Professor of Computer
Science at the University of Illinois at Urbana-Champaign.
Her research interests span the systems stack, including hardware, programming
languages, operating systems, and applications. She co-developed the memory consistency models
for the C++ and Java programming languages based on her early work on data race-free models.
Her group recently released the Illinois Extended Reality Testbed, or ILLIXR, the first fully
open-source extended reality system to democratize XR systems research
and development. She has also made innovative contributions to heterogeneous computing,
software-driven approaches to resiliency, and approximate computing. She's a fellow of the
American Academy of Arts and Sciences, IEEE, ACM, and a recipient of the ACM SIGARCH Maurice Wilkes
Award. As ACM SIGARCH chair, she co-founded the CARES movement and is a winner of the CRA Distinguished
Service Award.
Today, she's here to talk with us about her most recent work on domain-specific systems
for AR, VR, and extended reality, and how it ties to her long line of work on scalable
specialization and heterogeneous computing.
A quick disclaimer that all views shared on the show are the opinions of
individuals and do not reflect the views of the organizations they work for.
Sarita, welcome to the podcast. We're so happy to have you here today.
Thank you. Tell us, Sarita, what's getting you
up in the morning these days? So I think a lot of people at this moment of time are coming out of
a lot of adversity. And I just wanted to take a moment to share my story a little bit. In May,
both my parents in India caught COVID and I lost my mom. And I spent
four months in India, helping my dad recover from this loss and helping him recover from COVID too.
I returned back to the US just about a month ago. And this whole thing has taken a huge toll on me.
But what wakes me up in the morning is the thought of calling my dad up.
I do that every morning and every evening.
And he is just so chirpy and,
and positive and his attitude just inspires me.
What he's done is try to continue my mother's legacy;
she was a bonsai artist.
And he started teaching her classes, her bonsai classes,
workshops, etc. He's totally immersed himself in this work. And so I'm really grateful that I share
his gift of, you know, having work that I love tremendously, and people that I love tremendously.
Without either of those, I don't think I would be able to wake up in the morning at this time.
Oh, Sarita, I'm so sorry to hear about your mom, but thank you so much for sharing that with us.
My condolences to you, but I'm so glad that your dad has the fortitude to keep going and that you also have the fortitude to keep going and that you have such a wonderful supporting cast around
you to help you through this.
And I'm happy to help you talk today about the work that you love to do.
Yeah, yeah, yeah. I, you know, when it comes to work, I, these days, I almost feel like a child
in a candy store. There's just so much interesting stuff going on in our area. I mean, what an
amazing time to be doing research in architecture
and systems, more broadly speaking. I think my biggest problem is just not enough hours in the
day to do everything that I want to do. So, you know, I mean, who can be luckier than that, right?
So let me tell you a little bit about what I'm doing these days, which is very different,
actually, from what I've done before.
I can start from, you know, the beginnings of the ILLIXR project, which goes back about five or six years ago when, you know, it was clear that heterogeneity was here, specialization
was here, with the end of Moore's law and Dennard scaling, etc.
We needed to build accelerators, et cetera. You know, not surprisingly, I focused my attention on memory systems for heterogeneous systems, how to build heterogeneous
memory systems. And we did a fair bit of work on, you know, specialized memory systems,
coherence for heterogeneity, memory models, et cetera, et cetera. But very soon I got really
frustrated doing this work. We were getting papers accepted,
you know, lots of awards and this and that, but it was really frustrating because the applications
and benchmarks that we were running on these systems were really toy things. And, you know,
I mean, breadth-first search and PageRank and these small kernels that were not appropriate
for the kind of technology we were building. So my interests have always been at the system level,
looking at hardware and software together,
looking at the big picture of how the whole thing needs to operate together.
And thinking about accelerators, you know,
my interest was in how do I build a system
that has many different accelerators working together?
How do I build the software for the system?
What are the programming models, the runtime, et cetera?
And these toy applications just didn't cut it, right?
They just don't lead to work of the type
that I wanted to do.
And the other frustrating thing was
that I was tired of doing simulation.
So again, for the type of things I wanted to do,
I really needed to build real systems.
So lucky for me, I was part of a research center.
This is in a long line of research centers that DARPA and SRC, which is the Semiconductor Research Corporation, have jointly been funding.
And the charter for these centers is to look about 10 years out.
And there's a broad goal of what we want to
achieve, but there's a lot of flexibility in how we use the funding. So there was a call for seed
proposals to come up with ideas for the next round of these centers. And I happened to put in one.
And the way that happened was, you know, literally thinking, okay, we really need to find some
applications, you know, what applications?
And it so happened that Steve LaValle, who was the founding chief scientist of Oculus, he was my colleague.
He had just returned from his very successful stint at Oculus.
And, you know, AR/VR was sort of in the air.
And it was clear to me this was really going to be very important.
And so I had this student, Muhammad Huzaifa, who was looking for another topic. And he and I walked up to Steve's office and said, hey, you know, what are the hardware problems in this area?
And Steve is just this amazing, you know, super positive, encouraging, inspiring person, and he just started talking
to us in his office for a long time. I didn't know anything about AR/VR at that time, and most
of what he said went over my head, but, you know, his enthusiasm is so contagious,
and I just came out thinking, you know, this is super important, I know nothing about it, but
we're going to do something in this area, right? And so we put in a proposal to come up with benchmarks for AR/VR.
And so that's how this whole thing started. We literally knew nothing about extended reality.
So by extended reality, I include augmented, virtual, and mixed reality, or XR. So we knew
nothing about XR, you know, didn't know how we were going to do this,
but somehow this was really important, needed to be done. And I didn't want to do my research
with toy applications anymore. So that's how this whole thing started. You know, I thought,
okay, benchmarks, right? How difficult is it going to be? Talk to a few people, find some,
you know, benchmarks, slap the suite together and do research. Well, it turned out to be a huge focus
of my life. In the beginning it was slow, but then the last
two, three years or so, it's been just the dominant thing. And it's just been so much fun,
but so hard. It's probably the hardest thing I've done, mainly because this is a real system,
right? This is a real, I call it, we call it an
application, but it's not an application. It's a full system. And so here I want to point out,
you know, we talk about domain specific accelerators and domain specific architectures,
but what's truly exciting to me about this new age we are in is that we have to worry about
domain specific systems, right? And it's not just
AR, VR, but it's robotics, it's autonomous vehicles, right? You look around you and we're
talking about real specialized systems. That's a whole new way of doing research, right? A whole
plethora of research problems shows up. And all of these systems that I talked
about are on the edge, which means huge resource constraints and, at the same time, huge performance requirements, right? And the other thing is that it's the end
to end user experience that matters. So with AR, VR, it's, you know, I mean, is the user going to
puke, right? I mean, that's sort of your quality of experience. And we don't have metrics; the
metrics for measuring the goodness of these systems are, you know, still a topic of research.
So first thing, XR is this great playground for us architects, because there's just an orders-of-magnitude difference between what we can do in the systems that we have today and what we desire in the future.
So orders of magnitude in performance,
power and quality of experience.
The second thing is that XR touches so many domains.
There's graphics, there's computer vision, robotics,
haptics, optics, and what have you. And so to do research in
this area, at least the initial people have to be experts, or have to gain
expertise, or have people involved who work in all of these areas, which is super challenging but
super fun. The third thing is, because of this large gap, right? And because we're thinking about a whole system,
this is truly a co-design cross layer optimization problem.
You cannot solve this problem by just working
in the hardware or working in the compiler programming stack
or the runtime or the application.
It really has to all come together,
which is the kind of work I love to do.
And it's just been incredible.
And then the fourth thing is the metrics; as I said, it's the end-to-end user metrics,
and you can't do research where end-to-end user metrics are so important while, you know,
you're just looking at one algorithm. Yeah, that's important. You want to be looking at
one algorithm, but then how does that algorithm fit into this entire system is what you have to ask, right? And the biggest problem in this area
for us when we started was that everything was closed, meaning the companies, you know,
there was no open source anything, right? We didn't even know how an overall XR workflow
was supposed to work and, you know, what the smarts in there were, et cetera. And so it was, you know,
slowly, slowly talking to people, working with companies,
working with academics who were working in their own silos.
We finally realized that, hey,
what we really needed was to build this open source XR system so that we could
democratize research in this area, reduce the barrier to entry.
And now we have that.
And we hope people will use ILLIXR and work in this super exciting field.
Thank you so much for that, you know, a really great origin story as well as an articulation
of why this particular domain is really exciting and challenging from a systems
researcher's perspective.
You know, there are several themes that you touched upon over there, how this is a very challenging domain in terms of the performance,
power and other requirements, how it's very diverse. You have a wide range of different
kinds of applications that you need to sort of accelerate, spanning the entire stack.
There are a lot of trade-offs to sort of deal with. Maybe we can expand on some of these themes
one at a time. There are a lot of different topics here that we can sort of double click on. I'll just start with the first one. As you said, this is a very diverse
space. So AR, VR, XR, you have audio, you have video, you have compute, graphics, you have many
other different things. And we have sort of touched upon domain-specific accelerators for things like
machine learning. But when you have this really, really diverse set of applications, how do you
sort of think about accelerating this particular domain or this particular system?
Like do you build accelerators for each individual component?
Do you sort of chain them together?
Like how do you even sort of grapple with this entire complexity?
So yeah, you can do the bottom up research, which is, you know, look at individual components,
figure out how to accelerate them, et cetera, et cetera.
But then, and maybe that's
the right way to go, right? But our view is that you really need to look at this top down
and think about the entire system. And so some of the research we are doing right now
is looking at cross-component co-design, building an accelerator for each component
for the systems we desire, right? Looking
out five to ten years, right? What are the capabilities we want? I think building, you know, a
monolithic accelerator per component is just not going to be scalable, right? It's just not
going to work. And so, we are really more after the science, okay, of how do you build specialized systems when there is so
much diversity in the tasks, and yet each task is important. So one of the results that we have
that is really very interesting to me is that there's really no one component here that completely dominates on all metrics,
on performance and quality of experience, right?
So there'll be some component,
there's, like, SLAM, or finding the pose of the user.
That's a lot of computation.
Okay, so performance-wise, it's super important.
But then there's a really small component
called reprojection that needs to be invoked very, very often
so that it can compensate for the latency that might be incurred by the rendering process,
which is, again, a time hog.
But this reprojection can actually compensate for that latency, compensate for some frame
drops, and it's super important from a quality point of view.
And so you cannot just ignore it just because, oh, it didn't show up in my profile of execution time,
because, you know, each invocation of reprojection takes such a small time, right?
And so what we are doing is actually, I mean, there is huge research to be done here still,
but our idea is to look, do a very systematic study of figuring out, right, what are the key primitives
that we want to accelerate, not what are the components, okay, what are the primitives that
we want to accelerate, how do we share these acceleration primitives across these components,
you know, what is the science behind identifying these primitives, identifying the communication
architecture, identifying how to schedule these, both from a hardware perspective, as well as the entire programming and runtime stack, and so on.
And so some of the most interesting results relate to observations that say, oh, even though these tasks look very different, there is this large portion that is really common to them. And so if we pull this out, right, and if we reuse the results from this particular component, because it's common
to the other one, hey, immediately you can reduce the amount of time that you're spending in
this component, because we reuse not just the hardware, but the actual results that were produced
by this other component. And of course, there's the hardware sharing aspect as well.
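To make the result-reuse idea concrete, here is a toy Python sketch. The component names, the shared feature-extraction primitive, and all the numbers are invented for illustration; this is not taken from ILLIXR itself.

```python
# Toy sketch: two XR components (here, a fake "tracking" and "mapping"
# step) both need the same image features. A small memo table lets the
# second component reuse the first one's results instead of recomputing.

compute_calls = 0

def extract_features(frame_id):
    """Stand-in for an expensive shared primitive (e.g., feature extraction)."""
    global compute_calls
    compute_calls += 1
    return [frame_id * 10 + i for i in range(3)]  # fake feature vector

feature_cache = {}

def shared_features(frame_id):
    # Reuse the result if another component already produced it.
    if frame_id not in feature_cache:
        feature_cache[frame_id] = extract_features(frame_id)
    return feature_cache[frame_id]

def tracking_step(frame_id):
    return sum(shared_features(frame_id))

def mapping_step(frame_id):
    return max(shared_features(frame_id))

tracking_step(7)      # computes features for frame 7
mapping_step(7)       # reuses them: no second computation
print(compute_calls)  # -> 1
```

The same idea applies at the hardware level: once the common primitive is identified, both the computation and its results can be shared rather than duplicated per component.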
So it's a
very different way of thinking about system design than, you know, how am I going to accelerate,
you know, this particular neural network? That was super interesting, Sarita. Initially,
when you were describing all of this, and you said that this is the hardest thing you've ever
done, I was like, really? Compared to consistency models? I mean, like, come on. But then as you
described the whole flow, I was like, oh, I can see how this is hard. The computer architecture community felt
like it really grew up or like the big heyday was when you could say like, okay, everything is the
same. I'm going to vary one thing. And you have all the structure in place. You have a lot of
harnesses in place. You vary the one thing. And then you say like, okay, but what you're talking
about is like a lot of moving parts, all new and all putting all that infrastructure together from the very beginning.
And one thing that just stuck out from what you were talking about just now is it almost
sounds like you're trying to put together a general purpose framework for this domain,
meaning that each of these domains now, you know, kind of, we're all standing on the shoulders
of giants, right?
As we gain more and more knowledge about what we can build. And as the kinds of things we
want to build become bigger and bigger and bigger. Now we have to like keep, you know,
partitioning the tree to make bigger and bigger sub-pieces. And so it almost sounded like
from the XRARVR standpoint, you know, there's a lot of things that these systems have to do.
It's an end-to-end system. You're trying to put together the constructs of, like, okay, here's how you build it.
Because now in, you know, classical computer architecture, we have memory, we have, you know,
load store, we have branch predictors, we have caches. Like there's a kind of,
a sort of thrust of how we generally do things. We need to be able to handle control flow. We
need to be able to handle, you know, memory and like a few basic primitives.
So is that how you sort of think of it? And do you think the kinds of things that you would determine for AR, XR, VR might actually translate to other kinds of general purpose domain specific fields? Yeah, absolutely. That is absolutely right. In fact,
the way, you know, the talk that I give on Elixir when it's to systems audiences, as opposed to XR
audiences, my initial motivation is to build scalable and generalizable specialized-system design techniques, right? You know, we started with XR as an example domain,
as a driver for the technologies that we want to build.
It just so happened that this is an all-encompassing domain.
So now I'm an XR person.
But no, we also work in robotics, in autonomous vehicles,
and we are trying to use similar kinds of technologies in those areas as well.
And these technologies, so, you know, just to give you concrete ideas, they include things like how do you automatically identify what you should accelerate, right?
That has nothing to do with XR per se.
This is a generalizable concept that you can apply to other systems.
How do you use approximations?
All of these domains have such stringent requirements.
And because of these human level kind of end-to-end constraints, there's a lot of play that needs to be done in terms of how do you
selectively, judiciously use approximations and trade off quality, you know, for resource usage,
right? And that's a problem that's generic, right? It's not something that's specific to XR.
You know, I've collaborated with others to solve this problem, again in a, you know, cross-layer kind of way, in the
context of machine learning, the autonomous vehicle stack in the context of robotics,
right. And now, in fact, we're going to bring that work to ILLIXR. So yeah, I mean, ILLIXR is
sort of a passion for me, but I also have a finger in, you know, these other pies so that we don't just get carried away by just, you know,
core XR stuff. Yeah, it sounds like it. Maybe you can spend a minute telling us exactly what
ILLIXR is, what it consists of. Yeah, yeah, so ILLIXR is a full-blown, open-source XR system.
I wish this was video, I would show you our demo.
So you can go to illixr.org and see our demo.
So a generic XR system has three main components.
So there's the perception pipeline, which is,
so you're wearing a headset, right?
And the system needs to know where the user is, what the world looks like
from the user's perspective. So that's the perception component. Then there's the visual
pipeline, which is responsible for using all this information from perception and then generating
the actual pixels. Using this user's information and the application, right? If it's a gaming
application or a virtual reality for surgery, et cetera,
the application, you know,
figures out what needs to be done.
And then the system needs to generate the pixels
to actually display, right?
And then there's audio where, you know,
if you have audio in your application to deal with that,
again, spatial audio,
which where the audio engine also needs to communicate
with the perception pipeline to figure out where the user is so that the audio can be spatially localized, et cetera.
And then there's the runtime that takes all of these three components and interfaces them.
There's a lot of communication that happens between them.
There are sensors, right?
The perception component needs to work with cameras, IMUs, LIDARs, whatever, and so on.
And so there's this runtime and communication framework
that you need to put together
for all of these components to communicate with each other
while maintaining all of the dependencies
that need to be maintained and preserving
and giving you the QoE, right?
And then you have the application on top that's running,
be it your game engine or your virtual surgery
or whatever it is, right?
And then this whole assembly runs on some hardware.
So what ILLIXR is, is the entire runtime
that runs on the headset, okay?
The perception component, the visual component,
the audio component, the runtime that's orchestrating all of this,
shooting pixels onto the headset.
So there's an open source headset company.
Well, now they're commercializing it.
So it's called Northstar.
They are amazing.
We've been able to interface to the Northstar displays.
You can wear this headset on your face and, you know,
walk around and you're running ILLIXR,
a completely open source system, right?
Completely open source.
You can measure everything in it pretty much, right?
Do whatever research you want to do.
Okay.
Yeah.
And then, of course, you don't need the headset.
You can render the displays to your monitor on the desktop; you won't get the full immersive experience, but you can do research that way. We've taken a lot of trouble to make this super modular. So the
idea is that you can swap in and out components. So if you're running SLAM, which is visual
inertial odometry, which is what you need to figure out where you are in the world, you can
bring in your own SLAM component and put it in. And if you're a SLAM researcher, you can see what is the impact
of this on the end-to-end, right? You're no longer restricted to just looking at how your SLAM does,
but you can actually see what is the benefit of getting better pose accuracy on the images that
the end users see. So it's designed in a very, very modular way. And in fact, we already have
multiple implementations of several components that we can swap in and out.
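A minimal Python sketch of what such swappable components might look like. The interface, the class names, and the registry here are invented for illustration; they are not the actual ILLIXR APIs.

```python
# Toy sketch of a swappable-component design: every implementation
# satisfies one small, fixed interface, so the runtime can load any of
# them interchangeably and researchers can drop in their own SLAM.

class SlamComponent:
    """Interface every SLAM implementation must satisfy (hypothetical)."""
    def estimate_pose(self, imu_sample, camera_frame):
        raise NotImplementedError

class BaselineSlam(SlamComponent):
    def estimate_pose(self, imu_sample, camera_frame):
        return ("baseline", imu_sample + camera_frame)   # fake pose math

class MySlam(SlamComponent):
    def estimate_pose(self, imu_sample, camera_frame):
        return ("my-slam", imu_sample * camera_frame)    # fake pose math

REGISTRY = {"baseline": BaselineSlam, "mine": MySlam}

def run_pipeline(slam_name):
    slam = REGISTRY[slam_name]()     # swap implementations by name
    return slam.estimate_pose(2, 3)  # fake sensor inputs

print(run_pipeline("baseline"))  # -> ('baseline', 5)
print(run_pipeline("mine"))      # -> ('my-slam', 6)
```

The point of the clean interface is exactly what is described above: a SLAM researcher swaps in their implementation and then measures its effect on the end-to-end experience, not just on SLAM accuracy in isolation.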
And a lot of the work, there was a lot of engineering that went in to make sure we
have clean interfaces to be able to do this. I mean, lots of flexibility in the system,
lots of ways, lots of things you can measure. But the thing I'm most excited about is that
we recently launched the ILLIXR Consortium, which is, you know, we're not going to solve this
problem, right? There's no way, okay? There's just so much to be done. Neither are we going to
build the final infrastructure for people to use. So what we've done is launched a consortium where
this is going to be an open system, right, where the community can
contribute, and, you know, we're inviting people to come and contribute their components to this testbed
so that it can continue to grow. And literally, you know, our goal is to democratize
XR research; that's what we're after. So that's one thing: to have a reference testbed,
a testbed that grows so the community can do research on it.
The second thing is to standardize, so, standardize benchmarking.
There's actually no standard ways to compare XR systems today, even in industry.
And so that's why we have several industrial partners who are interested in that aspect of it.
And even from a research point of view, you need to be able to have standard applications
and data sets to be able to run on this system
so that people can compare results and reproduce results.
So that's what we would like to do through this consortium
is to build a consensus around this.
And the third is to just create a community.
So this is such a broad community
and it's so fragmented right now.
We'd like to bring people together. So we have, you know, suggestions for working groups,
you know, open meetings where people can join in, talk about their work related to XR or just
hang out and, you know, see, you know, what's going on and hopefully we make progress.
That's great.
Yeah.
Thanks for a great overview, and the ILLIXR project and the platform itself sounds really
exciting that a lot of people can contribute to.
And the community is also like a very important aspect.
I wanted to expand on one of the themes that you talked about, like for AR, VR, XR, there
is no clear or established set of benchmarks or metrics that people can use to compare
different systems,
different approaches and things like that. Could you maybe expand on that and tell our audience,
so what are the metrics that are of interest for an AR, VR, and XR system, both on the quality side
and I guess on the performance side as well? So one of the key metrics is motion to photon
latency, which is basically the user, let's say the user moved
their head, right?
And so then now the system needs to project the visuals that the user is supposed to see
with this new orientation.
So what is the latency from the time that the user moved their head to the time that
they saw what they were supposed
to see based on this new perspective.
So that's motion, right?
User's movement to photon when you actually flash the photon on the user's screen.
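As a back-of-the-envelope sketch of the metric just described, motion-to-photon latency is simply the elapsed time from the motion sample to the display flip that first reflects it. The timestamps below are invented for illustration.

```python
# Toy sketch: motion-to-photon latency is the time from the IMU sample
# that captured the head motion to the display flip ("photon") that
# first reflects it. Timestamps are in milliseconds and made up here.

def motion_to_photon_ms(imu_timestamp_ms, display_flip_ms):
    return display_flip_ms - imu_timestamp_ms

t_imu = 1000.0    # head motion sampled by the IMU
t_flip = 1016.7   # vsync at which the corresponding pixels appear

latency = motion_to_photon_ms(t_imu, t_flip)
print(round(latency, 1))  # -> 16.7
```

Commonly cited VR targets put this end-to-end number at roughly 20 ms or less, which is what makes the per-component latency budget so tight.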
Oh, right.
And does this sort of depend on the scene or what's in the scene and things like that?
So is there like variability in terms of depending on the richness of the scene or how far away the user is from a
given object, things like that. Exactly, exactly, right. So we have a paper where we talk about,
you know, all of these things, right, the whole end-to-end system, you know, it covers metrics,
etc., variability, all of the questions you're asking. And the answer is yes. The motion to
photon latency actually relates to a lot that's going on in the system.
It relates to how quickly I can figure out the user's pose in the system, right?
Because how do you know what to project unless you know where the user's looking, right?
So there's that component, the perception component that I talked about.
It relates to how quickly you're able to render the scene, right, that you need to render.
It relates to the graphics of, you know, the display part, right,
how quickly I'm able to display this.
And then there's also smarts in the system which try to compensate
for the rendering latency.
So we try to sort of predict ahead of time, right,
where the user might actually be looking.
You know, the pose estimation algorithm determined where the user's looking now, but from that
time to the time that the display actually got the rendered scene, time has elapsed,
and the user may be looking elsewhere by that time.
And so there's a mechanism in the system to predict where the user might be looking at this
moment when the display is actually going to show the pixels. So there's that aspect as well.
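A hedged sketch of that prediction step: extrapolate the latest pose estimate to the moment the display will actually flash. Real predictors are more sophisticated than this constant-angular-velocity model, and all names and numbers below are invented.

```python
# Toy sketch: predict where the user will be looking at display time by
# extrapolating the latest pose estimate with its angular velocity.
# (Constant-velocity model; production predictors are fancier.)

def predict_yaw(yaw_now_deg, yaw_rate_deg_per_s, t_now_s, t_display_s):
    dt = t_display_s - t_now_s          # how far ahead we must predict
    return yaw_now_deg + yaw_rate_deg_per_s * dt

# Pose estimated at t = 0 s; the frame will hit the display 16 ms later.
predicted = predict_yaw(yaw_now_deg=30.0,
                        yaw_rate_deg_per_s=90.0,  # briskly turning head
                        t_now_s=0.0,
                        t_display_s=0.016)
print(round(predicted, 2))  # -> 31.44
```

Reprojection then warps the already-rendered frame toward this predicted viewpoint, which is how a tiny amount of compute can hide rendering latency and the occasional dropped frame.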
And then based on the prediction, you reproject the scene using some math so that, you know,
hopefully you're showing exactly the right scene. So the part where you're doing pose estimation,
there's a lot of variability there,
depending on how quickly the users are moving their head,
what is around the user in the physical space
so that you can figure out,
there's many different ways to solve this problem,
but there's variability in all of them.
There's variability on the rendering part, right? Where the application is actually, you know, running
the physics, let's say, or, you know, whatever it is that, and that's totally scene dependent.
So there's a lot of variability there. On the backend, where you're doing the reprojection
and where you're actually doing the display, there's less variability today,
right? And so, you know, one of the big parts of our work
is how do you schedule hardware
given that there is so much variability
in what's happening around you.
So the runtime and the scheduling
is a huge component of this whole thing.
And then if you're going to share accelerators, you have to keep this variability in mind.
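To illustrate why that variability matters, here is a toy simulation; all task names, runtimes, and deadlines are invented. A non-preemptive earliest-deadline-first scheduler on one shared accelerator commits to a long render job, so a late-arriving, short reprojection job misses its tight deadline, the kind of case that pushes these runtimes toward preemption or reserved slots.

```python
# Toy sketch: tasks of very different granularities sharing a single
# accelerator, scheduled non-preemptively by earliest deadline first.
import heapq

def schedule(tasks):
    """Non-preemptive EDF on one shared accelerator.
    tasks: list of (name, release_ms, runtime_ms, deadline_ms)."""
    time = 0.0
    order, missed = [], []
    ready = []                                  # heap of (deadline, name, runtime)
    pending = sorted(tasks, key=lambda t: t[1]) # by release time
    i = 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i][1] <= time:
            name, _, run, dl = pending[i]
            heapq.heappush(ready, (dl, name, run))
            i += 1
        if not ready:                 # idle until the next release
            time = pending[i][1]
            continue
        dl, name, run = heapq.heappop(ready)
        time += run                   # run the chosen task to completion
        order.append(name)
        if time > dl:
            missed.append(name)
    return order, missed

tasks = [("render",    0.0, 9.0, 16.0),   # long, variable runtime
         ("reproject", 1.0, 0.5,  4.0)]   # tiny, but a tight deadline
order, missed = schedule(tasks)
print(order)   # -> ['render', 'reproject']
print(missed)  # -> ['reproject']: it arrived while render held the accelerator
```

Because the scheduler has already committed the accelerator to the 9 ms render job when reprojection arrives at t = 1 ms, reprojection cannot start until t = 9 ms and blows its 4 ms deadline, even though it needs only 0.5 ms of compute.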
Right. That sounds like a really challenging problem because even if you did not have to
share an accelerator, let's say you just had individual components that were all accelerating
this, in the presence of variability, that itself becomes a very difficult problem. And I'm sure so
the moment you have a shared resource,
whether it's hardware or any other component of the stack,
it becomes even more challenging.
So how do you build an accelerator?
Like how do you decide which components of your hardware
can actually be shared or are amenable to sharing?
So there could be compute resources that you could share.
There could be memory resources that you can share.
There are communication resources that you can share.
And of course, all of this also depends on the task data flow itself. Like maybe there are two portions of your data flow
graph that are communicating to each other that you could co-locate and things like that. So
how do you sort of wrangle this big problem space? Yeah, so a lot of this is work in progress.
You know, the papers are just being submitted and coming out because, again, it was a
conscious choice on our part to spend a lot of time trying to understand the whole system before
we started, you know, coming up with any kind of siloed systems. And also,
another thing we try to do is, again, you know, going back to the previous part of the conversation,
come up with generalizable techniques.
So for example, there's a stack, right?
We don't want to get into the business of saying,
here's a piece of code, map it to an accelerator.
That's sort of doable, like in a one-off kind of thing, right? What we want to do is build the stack that will be able to automatically look at the different aspects of this code. How are you going to represent the software in a way
that you can easily then map to a diverse set of hardware
or even think about how you would design the hardware
for this software?
And for that, you need an intermediate representation.
And that is work where I've drawn on Vikram Adve and Sasha Misailovic, and we have a collaboration.
So Vikram's been driving this for a while,
intermediate representations for heterogeneous computing.
And so there's a system called HPVM,
heterogeneous parallel virtual machine.
So this is, again, joint work led by Vikram.
So this is an IR that is based on data flow graphs,
and these are hierarchical data flow graphs, where the key is that parallelism is represented at various levels. So since it's
a hierarchical data flow graph, I mean, obviously, we can represent thread-level parallelism. These
are the different parallel nodes in the graph. We can represent nested parallelism. We can
represent streaming parallelism,
you know, SIMD parallelism, et cetera. And communication is explicitly represented as the edges in the data flow graph. So we use that as our representation, right? And again, this has nothing
to do with XR, this is just a general sort of, you know, the science of developing heterogeneous
systems. So we use that as our representation. And then we've collaborated with folks at Harvard, David Brooks and Gu-Yeon Wei's group; they were looking at building automated techniques. So they published some
papers on automated techniques for generating accelerators, but they were not looking at
parallelism in that much detail. So they were just looking at basic-block-level parallelism. But if you want to accelerate systems, you know, parallelism is super important. And so we combined the HPVM representation with
their tools to figure out what is the level of parallelism that makes sense, right? Running a
search algorithm to figure out what is the right level of parallelism
that should be exploited for this code
that's represented in HPVM.
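As a rough illustration of what a hierarchical dataflow graph looks like, here is a toy of my own construction, not HPVM's actual IR or API: nodes carry a parallelism factor, may contain subgraphs, and communicate only through explicit edges.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    replicas: int = 1                             # data-parallel instances of this node
    children: list = field(default_factory=list)  # hierarchy: a node may be a subgraph
    edges: list = field(default_factory=list)     # explicit communication between children

def leaf_parallelism(node):
    """Count leaf-level parallel instances, multiplying replication down the hierarchy."""
    if not node.children:
        return node.replicas
    return node.replicas * sum(leaf_parallelism(c) for c in node.children)

# Toy pipeline: feature extraction (4-way parallel) feeds a pose solver
# inside a "tracking" subgraph, which feeds a 2-way parallel renderer.
feat = Node("feature_extract", replicas=4)
solve = Node("pose_solve")
tracking = Node("tracking", children=[feat, solve],
                edges=[("feature_extract", "pose_solve")])
render = Node("render", replicas=2)
app = Node("app", children=[tracking, render], edges=[("tracking", "render")])
print(leaf_parallelism(app))  # 7 parallel leaf instances: 4 + 1 + 2
```

The search described above would then sweep choices like each node's replica count and score the accelerator configurations generated from the graph, rather than fixing the parallelism by hand.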
So we ran that on some components of ILLIXR
and we have results that show
that using these automated tools,
we get much better performance, et cetera,
than using the tools that we had before. It's really trying to divide up these tasks into pieces and solve them in a generalizable way. So this was just one example
of what we've done in terms of automated accelerator generation. I mean, this is just
one small piece, right? This is just looking at the parallelism. It's not yet looking at the memory
system. That's our next step, because HPVM has the ability to make communication explicit.
And we've done a bunch of work in the past on our Spandex coherence protocol
where we have a very clear interface about communication, et cetera.
We want to put these two things together and, you know, generate automatable ways of expressing the type of communication that makes sense for these data flow graph nodes, et cetera.
So that sounds like it spans like many layers of the stack.
So one of the recurring themes through your career has been pulling folks from multiple layers of the stack to talk the same language, and reconciling different notions or different paradigms. This has happened on the memory consistency model side. And it looks like even in the AR/VR space,
you know, you have folks all the way from hardware
to software to, you know, application developers
and users and so on.
Do you have any broad words of wisdom for our listeners
on how do you sort of bring together different communities?
It's obviously a lengthy process.
People talk different languages and things like that.
Yeah, definitely.
And it is true that my career has been about this
type of work, because, you know, I've always thought that this is the world that I feel is
really important and excites me the most. You know, this didn't happen overnight. I like to say this,
right? So most people from the outside, they will see the big successes and ask, how do you do that, right? But behind each big success is a long story
of trials that fail, right? But they failed only to the extent that maybe you have not heard about
that paper. But for me, it was a success because it gave me the experience to do, you know, the
thing that finally succeeded, right? And that's really important.
So in my case, for example, this whole business of ILLIXR, right?
You might think, oh, you know, there's this cross layer and, you know, all this stuff,
right?
But, you know, I was doing this cross layer stuff in, I think it was 1999.
I mean, it wasn't even a thing.
When I moved to Illinois, one of the things that was really cool,
I was at Rice before I came to Illinois. And Rice was a great place. I really thrived there and did
a lot of really interesting work, awesome students, etc. But the thing about Illinois was it was a
bigger place. And there were people doing things that weren't around at Rice. And so I wanted to
take advantage of that.
And I put together a small team of people. In this case, it was just four: architecture (myself), operating systems, signal processing and multimedia, and networking.
And I really thought that multimedia was the way to go at that time.
Again, that whole full stack thinking and cross-layer optimizations
were really necessary. And so we built a system that, I think, may have been the first, at least in our closely related communities, that was actually a cross-layer co-design for performance, power, and quality of service at that time. We found it very hard to get the papers in; that's a whole other story. And so the multimedia community has seen that work, but I think it's not that well known in the architecture community. But it was a huge,
huge experience for me because I learned to work with these people who spoke different languages
and we actually put together, again, a science, right? So, principles that I use even now, right? And so even though those papers did not appear in ISCA or ASPLOS, those principles stood the test of time, and I use them now for ILLIXR. So going back to your
question of how do you bring people together: it's slowly, you know, piece by piece, right? It doesn't just happen that one fine day you find yourself doing something like ILLIXR, right? There's a history here. Same thing with memory models. You know, when I was a grad student doing my PhD, a lot of the work on memory models was hardware-centric, right? Just mostly hardware people. You know, for whatever reason, we understood soon enough that this was not just a hardware problem, but a hardware-software problem.
And at that time, I wasn't savvy enough to go to the software people.
We did do some collaboration.
And in fact, the inspiration for this came from work by colleagues of mine at that time: Bart Miller, who was an OS professor, and Rob Netzer.
They were working on data race debugging, et cetera.
And sort of, you know, we got some inspiration from them and the data race free model was
born.
But again, the key thing being the realization that this was a hardware software interface
problem and not just a hardware problem.
And so, you know, that was already the start of hardware software work, right?
It came about because of, you know, talking to people in a very narrow kind of a way.
Nobody in industry bought our idea. We had a very hard time, quote unquote, selling this work to people. You know, the hardware people almost didn't understand what was going on.
And in fact, although the ISCA paper got in,
the journal paper we wrote was rejected multiple times.
It was a, you know, cause of great grief for me. But I kept at it, kept at it, kept at it,
knowing that this was really the right way,
the right solution.
And then finally found people who got it
from the software community.
And that's a whole other story.
And then magic happened and things came together.
So it doesn't happen overnight.
That's a great story for everybody, Sarita. I
think, you know, a lot of times people do look at some of the leading lights in our community and
just think like, oh man, you know, how do I follow that formula? And you're right. A lot of it is
really, really hidden behind the scenes. And there's a lot of experience that goes along
with building the kind of eventual successes. I kind of wanted to double-click on something you were saying, you know, maybe a little bit all the way back, because I have so many things built up from all the wonderful things that you've been saying. So one was, earlier you were saying, you know, that at the moment your main top-level metric is just: is the user going to barf? And that's,
but it does seem like a real one, right? Because, you know, this is something that you wear on your head and now you somehow
have to translate this kind of qualitative thing.
Is the user going to barf?
In the end, you have to get it down to something that is more measurable than, perhaps, you know, putting these things on people and actually measuring whether they barf, right?
So earlier you were talking about the metric of this like motion to photon latency.
And with things like this,
it seems like there's probably a notion
similar to uncanny valleys
where I can imagine putting something on,
I'm looking at a blank white wall.
I turn my head, it's still more blank white wall.
Has it actually changed?
Has it not changed?
Am I gonna get dizzy?
Because maybe it detected that it's still white wall, and it looks the same, but somehow we know that we've moved, even though the image hasn't changed.
So are there key things that you've been able to determine that lead to barfing, I suppose,
that are measurable beyond this latency thing?
This latency thing clearly must be part of it, but are there others?
Yeah. I mean, the latency thing is a big part of it.
And then there's, you know, the variability in it. Let's say, in an augmented reality situation, you're moving one thing from here to there and, you know, you see the judder in there, and that doesn't quite work. There's the whole image quality thing.
You know, did I render my image correctly, appropriately or not?
There's a whole range of metrics that honestly,
I'm not even an expert on yet.
I mean, there's a bunch of stuff that the XR community
has still not figured out as to how to measure all of this.
Another point, I think as architects and systems designers,
we tend to go after quantitative metrics and we must continue to do that.
There is no question about that. But in this area,
I think we are going to have to bite the bullet and start to think about user
studies. And that's something we don't know how to do in this community, right?
So that's a new thing coming up. And we literally had to do that for this paper that we first wrote about ILLIXR, because the quantitative metrics that we were using, which are all state of the art, were not showing us the differences that we were seeing ourselves when we used the system, right? So the image quality versus ground truth, from this version of the system to that version of the system: quantitatively it was just a little bit off, but qualitatively you could see, are you kidding me, this is so much worse than that, right? And so, you know, we have a section in the paper that talks about how we did a user study in a very informal way, just asking my students in the lab, right, saying this is better than that. But we're going to have to do this a lot more.
And I think this is just a new frontier for systems researchers.
We don't know how to do these things.
Yeah, that's very interesting because I feel like I've read a number of papers where they
present a new tool and then they'll say like, and then we gave it to some undergrads and they were able to do this, you know, in 10 lines of code in a week without a problem.
And that's kind of like our version of a user study, right?
So that's very interesting.
Yeah, because once you start building full systems, whether they be robotic systems or whatever, I guess this is kind of the uncanny valley question. There's something in our brains that knows, and we don't necessarily know how to translate that knowing into, like, pure quantitative pieces. And maybe the answer, as you say, is that we just have to have user studies.
Yeah. Well, I think it's both, right? So we are very interested in research
on metrics. We are not experts in that area, right? We know a lot about XR now, but we are not experts in terms of actually figuring out how to measure these things. Or maybe, because of the system, this is something where we will make a contribution, who knows. But it is certainly something that, as part of
the consortium, we're trying to push to get more consensus on what these quantitative metrics should be. Because,
I mean, we need both. Yeah. And then the other thing is that to do this kind of end-to-end
user study, you do need to build systems, which is another thing in our community that, you know,
it's changing. Again, that hard but fun part. I've just been fortunate that the last few years,
I've had the opportunity to be involved in two big systems building things.
I talked a lot about ILLIXR, but we have another project that's also big. It's DARPA-funded, a big collaboration with IBM and Harvard and Columbia, where we are building an SoC for
the application of autonomous vehicles. But, you know, the goal there is to build a full stack
of a heterogeneous system with many accelerators, compilation stack, the scheduler, and then,
of course, the application. So that's another big project where I'm actually, you know, for the
first time involved in an actual chip-building exercise. That's been, again, amazing. And you know, in my career I had not built a real system, right? I mean, I'd built simulators and stuff, but that's different. Not a real system, until, you know, recently. It's just a whole different ball game. And our community does not have enough respect, I feel, for this type of work, and it needs to. We've reached an era where simulators are just not going to be enough. FPGAs
are great, but, you know, even there, you're limited. And actually taping out a chip is
super hard. So we have a lot of questions in front of our community to figure out how are we going to
evaluate these systems that we need to design? I think that's one very, very interesting question.
Yeah, I agree with you. As someone who's, you know, sitting on the industrial side, you know,
I think there's long been this potential disconnect between academia and industry where
the perception from the academia side is that industry just says no to everything, and then industry feels of academia, oh, they've just got their heads in the sky. And it's because there's this disconnect of realizing that the last mile of building something is potentially the hardest mile, and there's lots and lots of things to think about. And from an industrial perspective, it can't ever not work.
Otherwise you have the whole floating point fiasco
of the 90s or what have you.
And that was exactly why.
So we did this work on coherence
that I think is the right solution
for a heterogeneous system,
but to get industry to buy into something like this,
which really affects the entire system, right? It's not one part of one accelerator; it's really what binds everything together. We felt the need to build it and to show that we can make this work. So yeah, I think the kind of work that we are doing now, it's hard, but fun. Yeah. And I think that the thing about building the whole system
too, as well as making it into a science, I think that's important as well, because
there's also a difference from making a system work through hacks, like just like actually going
to hack it together and making a system work through principles that are only deviated from
when you really have to, because you're deep in that last mile. And so from an industrial perspective, a lot of times you want to try and get to something that's
mostly adhering to principles so that when they take it for themselves, they have principles
rather than just like, well, they made it work and it's because they spaghetti this and duct
taped that or what have you. Okay. Well then maybe at this point, we want to transition a little bit
to some of your sort of extracurricular activities, you know, that you've done.
They're curricular.
Let's say, maybe not directly technical problem solving, then, but other kinds of problem-solving activities.
We all know you as very involved in our community, not only as a technical leading light, but also as a leader of the community itself. And, you know, you've had a lot of service roles and stuff,
but one of the things that has been kind of perhaps different in recent years is your
co-founding of this CARES initiative. Would you maybe take a minute to describe for folks who
don't know what it is, what it is, and then, you know, how it's been going and how you feel,
whether you feel like it's been making an impact on our community?
So the goal of the CARES Committee is to ensure that we don't have discrimination and harassment
in our conferences and other professional events. The mandate has recently been broadened to also include ethical violations. So things like
reviewer ethics, et cetera. ACM, which is the parent organization under which the CARES
program exists, has policies that say explicitly that discrimination and harassment have no place
in our communities. And these you know, these are the
ethics and values by which we want to conduct ourselves. But the problem has been that
reporting violations is hard for people. The investigations take a long time. And,
you know, there's a general sense of nervousness, I think, and lack of trust as to what happens, you know, with these reports, etc.
Because often the people who are targets of such bad behavior are, you know, on the lower rungs of the seniority ladder. So the goal of the CARES Committee is to provide a set of friendly, respected, trusted members of the community, who are well known, who are not strangers, that people can come to for advice, as a resource to help them navigate situations that they may find themselves in that are clearly in violation of the policies and values
that we all hold dear as a community and that are officially part of the ACM policies. So the goal
of CARES is not to investigate the problem because that requires a level of skill that,
you know, the CARES committee members may not necessarily have. But the goal is to be there for our community to listen,
to be a sounding board for questions, reporting of incidents,
and help navigate the ACM processes to file the reports
and to be there for you while things are being investigated.
And so now CARES is several years old, right, as an initiative.
Have you, from your vantage point, seen or do you feel like it's making a difference?
Have you seen an impact it's had on our community?
I think so.
The thing about the CARES committee is that confidentiality is a big part of it.
Right. And so you will never hear from the CARES committee that so-and-so problem was solved.
Right. They're not allowed to talk about it. And because I was the SIGARCH chair, I deliberately kept myself distant from the actual inner workings because, you know, while I'm a sympathizer, I didn't want to create a precedent where the chair of SIGARCH felt that they could be privy to all of CARES' workings, right? So what I observed is an observation from the community's viewpoint,
okay, in terms of the impact.
And I believe that it has made an impact.
I see people standing up more,
you know, against behavior that they see
that is unacceptable.
This was not true in the past.
I see more people coming and talking about their problems, which was not the case in the past. And in general, I see a sensitivity towards issues
related to diversity and inclusion, which is really, you know, a core tenet behind CARES
that was not there in the past. So whether this came about from CARES or not, I don't know.
But definitely there was a huge burst of activity that, you know, Lisa, you were a part of as well
during that time that has sensitized our community. There's no question about that
and moved us forward. There's a lot of work to be done still. I do know of cases that are not being resolved in the way I would like them to be, so the work continues. We also went to the other SIGs and the broader meetings and pushed for this.
A lot of us did a lot of evangelizing.
And very soon, many of the ACM SIGs started their own CARES committees, patterned after ours.
SIGMICRO joined us very soon, then SIGGRAPH, SIGMOD, SIGCOMM, and there's a whole host of other SIGs that are documented
that now have a CARES committee.
And this happened quickly, which is just, it just goes to show that there was a need
for something like this.
Yeah, it's not solving all the problems, but it's definitely filling a need.
So yeah, I'm really happy that this has taken off the way it
has. And another thing I'd like to share with our listeners is that we recently launched CARES in my
academic, in my department, CS at Illinois CARES. And the goals are very similar. It goes beyond
just discrimination and harassment, but other colleagues of mine wrote up a clear code of conduct, our values and code of conduct document.
And the charter of our CARES committee is to ensure that our department holds up those values.
And it's the same thing. We don't investigate.
It's patterned on SIGARCH CARES. You go to our website, you'll see a lot of the same wording as SIGARCH CARES.
You know, we don't investigate. We're there as a sounding board to help our community navigate these issues so that everybody
understands that we care and these types of behaviors are completely unacceptable.
Yeah, I think that's wonderful.
And I feel like as a community member that CARES has made a difference.
And it's very interesting to me that it has gone essentially viral, insofar as something like this can go viral. And it's probably the most positive thing that we can say has gone viral in our community. So I think, you know, I probably speak on behalf of our whole community in saying that this has been a marvelous initiative.
And the fact that it has spread the way it has is,
as you say, reflective of a need.
So that's great.
I want to also sort of, you know,
circle back to a point you made earlier.
So this did not happen overnight either.
It was not easy.
A lot of people were involved, and it was definitely not easy to get this going.
Yeah. So I think one meta point to maybe draw away from this is, you know, one thing that's
clear when speaking with you is that you are guided by a North Star on multiple fronts,
both technically and morally, ethically, and all that kind of stuff. And once that North Star is
there, you just go towards it. Right. And I think a lot
of times it's very easy to try and do whatever is expedient or do whatever everybody else is doing.
You know, I think in Kim Hazelwood's episode, she also talked about this from a research perspective: everybody goes over here, everybody goes over there, just looking for the hot new place so that you can, you know, get your publications out or what have you. But across life and tech, it does seem very clear that if you have a North Star to kind of guide what you try and remain tenacious about, then eventually something does come out at the end that is positive. And so on that note, maybe we can turn around and ask you for general life and career advice
for some of our listeners, because I think it kind of spans career stage and industry
and academia.
Okay.
So I gave a talk recently at the Young Architects Workshop and tried to synthesize some of this. And just speaking to what you just said, which, you know, thank you for: I have done that many times, right? And sometimes you just fall flat, right? And I try to talk about those moments too, so that people realize that
it doesn't always work out, right? And then the other thing was to believe in yourself and to be
passionate. So you have to have the combination of all of these things and you don't always succeed.
And that's okay. If you truly believe in what you're doing and are passionate about what you're doing you will do it
you will do it and you may not succeed now but you will eventually succeed and there's then then
there's that you know believe in yourself thing right so you have to believe in yourself and just
keep going and then the last part of that talk was about the people.
And I started off this whole thing with people.
For me, people are very important.
Surround yourself with people who make you fly.
Just ditch those who bring you down.
Don't spend your time on them.
They're not worth it.
And there will be such people.
We haven't talked about gender issues much here.
You know, if you are in any sort of underrepresented part of the population, you've probably had
people who are not the best for you.
Ditch them.
Surround yourself with people who make you fly.
And then pay it forward.
Don't forget that.
You know, we started this whole thing with how, you know, there was a perceived dichotomy between service and fun, but really this is not service. You know, you talked about my extracurricular activities; I don't view these activities as extracurricular. I think it's part of being in the community. I want a community
that is a kind and encouraging community because that's
what makes me fly, right? So it's very self-serving. Wonderful. That's such great advice. And I think
a lot of times that is one thing that is good to kind of get out there where when you do things
that serve the community, often it does turn around to make it better for you.
And so that's one great way to kind of participate and motivate that kind of work. So this has been
such a wonderful conversation. Professor Sarita Adve, thank you so much for joining us today.
It's been so fun talking to you. I learned a lot of stuff and we're so glad you were able to join
us today. Yeah, it was a wonderful conversation.
Thank you so much for joining us.
And to our listeners, thank you for being with us on the Computer Architecture Podcast.
Till next time, it's goodbye from us.