Computer Architecture Podcast - Ep 7: Domain-specific Systems for AR/VR and Extended Reality with Dr. Sarita Adve, University of Illinois at Urbana-Champaign
Episode Date: February 9, 2022
Dr. Sarita Adve is the Richard T. Cheng Professor of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests span the system stack, including hardware, programming languages, operating systems, and applications. She co-developed the memory consistency models for the C++ and Java programming languages, based on her early work on data-race-free (DRF) models, and has made innovative contributions to heterogeneous computing and software-driven approaches to resiliency. Her group recently released the Illinois Extended Reality testbed (ILLIXR), the first fully open source extended reality system to democratize XR systems research and development. She is a fellow of the American Academy of Arts and Sciences, IEEE, and ACM, and a recipient of the ACM SIGARCH Maurice Wilkes Award. As ACM SIGARCH chair, she co-founded the CARES movement, and is a winner of the CRA Distinguished Service Award.
Transcript
Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it.
We are your hosts.
I'm Suvinay Subramanian.
And I'm Lisa Hsu.
Today we have with us Professor Sarita Adve, who is the Richard T. Cheng Professor of Computer
Science at the University of Illinois at Urbana-Champaign.
Her research interests span the systems stack, including hardware, programming
languages, operating systems, and applications. She co-developed the memory consistency models
for the C++ and Java programming languages based on her early work on data race-free models.
Her group recently released the Illinois Extended Reality Testbed, or ILLIXR, the first fully
open-source extended reality system to democratize XR systems research
and development. She has also made innovative contributions to heterogeneous computing,
software-driven approaches to resiliency, and approximate computing. She's a fellow of the
American Academy of Arts and Sciences, IEEE, ACM, and a recipient of the ACM SIGARCH Maurice Wilkes
Award. As ACM SIGARCH chair, she co-founded the CARES movement and is a winner of the CRA Distinguished
Service Award.
Today, she's here to talk with us about her most recent work on domain-specific systems
for AR, VR, and extended reality, and how it ties to her long line of work on scalable
specialization and heterogeneous computing.
A quick disclaimer that all views shared on the show are the opinions of
individuals and do not reflect the views of the organizations they work for.
Sarita, welcome to the podcast. We're so happy to have you here today.
Thank you. Tell us, Sarita, what's getting you
up in the morning these days? So I think a lot of people at this moment of time are coming out of
a lot of adversity. And I just wanted to take a moment to share my story a little bit. In May,
both my parents in India caught COVID and I lost my mom. And I spent
four months in India, helping my dad recover from this loss and helping him recover from COVID too.
I returned back to the US just about a month ago. And this whole thing has taken a huge toll on me.
But what wakes me up in the morning is the thought of calling my dad up.
I do that every morning and every evening.
And he is just so chirpy and,
and positive and his attitude just inspires me.
What he's done is try to continue my mother's legacy;
she was a bonsai artist.
And he started teaching her classes, her bonsai classes,
workshops, etc. He's totally immersed himself in this work. And so I'm really grateful that I share
his gift of, you know, having work that I love tremendously, and people that I love tremendously.
Without either of those, I don't think I would be able to wake up in the morning at this time.
Oh, Sarita, I'm so sorry to hear about your mom, but thank you so much for sharing that with us.
My condolences to you, but I'm so glad that your dad has the fortitude to keep going and that you also have the fortitude to keep going and that you have such a wonderful supporting cast around
you to help you through this.
And I'm happy to help you talk today about the work that you love to do.
Yeah, yeah, yeah. I, you know, when it comes to work, I, these days, I almost feel like a child
in a candy store. There's just so much interesting stuff going on in our area. I mean, what an
amazing time to be doing research in architecture
and systems, more broadly speaking. I think my biggest problem is just not enough hours in the
day to do everything that I want to do. So, you know, I mean, who can be luckier than that, right?
So let me tell you a little bit about what I'm doing these days, which is very different,
actually, from what I've done before.
I can start from, you know, the beginnings of the ILLIXR project, which goes back about five or six years ago when, you know, it was clear that heterogeneity was here, specialization
was here, with the end of Moore's law and Dennard scaling, etc.
We needed to build accelerators, et cetera. You know, not surprisingly, I focused my attention on memory systems for heterogeneous systems, how to build heterogeneous
memory systems. And we did a fair bit of work on, you know, specialized memory systems,
coherence for heterogeneity, memory models, et cetera, et cetera. But very soon I got really
frustrated doing this work. We were getting papers accepted,
you know, lots of awards and this and that, but it was really frustrating because the applications
and benchmarks that we were running on these systems were really toy things. And, you know,
I mean, breadth-first search and PageRank and these small kernels that were not appropriate
for the kind of technology we were building. So my interests have always been at the system level,
looking at hardware and software together,
looking at the big picture of how the whole thing needs to operate together.
And thinking about accelerators, you know,
my interest was in how do I build a system
that has many different accelerators working together?
How do I build the software for the system?
What are the programming models, the runtime, et cetera?
And these toy applications just didn't cut it, right?
They just don't lead to work of the type
that I wanted to do.
And the other frustrating thing was
that I was tired of doing simulation.
So again, for the type of things I wanted to do,
I really needed to build real systems.
So lucky for me, I was part of a research center.
This is in a long line of research centers that DARPA and SRC, which is the Semiconductor Research Corporation, have jointly been funding.
And the charter for these centers is to look about 10 years out.
And there's a broad goal of what we want to
achieve, but there's a lot of flexibility in how we use the funding. So there was a call for seed
proposals to come up with ideas for the next round of these centers. And I happened to put in one.
And the way that happened was, you know, literally thinking, okay, we really need to find some
applications, you know, what applications?
And it so happened that Steve LaValle, who was the founding chief scientist of Oculus, he was my colleague.
He had just returned from his very successful stint at Oculus.
And, you know, AR/VR was sort of in the air.
And it was clear to me this was really going to be very important.
And so I had this student, Muhammad Huzaifa, who was looking for another topic. And he and I walked up to Steve's office and said, hey, you know, what are the hardware problems in this area?
And Steve is just this amazing, you know, super positive, encouraging, inspiring person, and he just started talking
to us in his office for a long time. I didn't know anything about AR/VR at that time, and most
of what he said went over my head, but, you know, his enthusiasm is so contagious,
and I just came out thinking, you know, this is super important, I know nothing about it, but
we're going to do something in this area, right? And so we put in a proposal to come up with benchmarks for AR/VR.
And so that's how this whole thing started. We literally knew nothing about extended reality.
So by extended reality, I include augmented, virtual, and mixed reality, or XR. So we knew
nothing about XR, you know, didn't know how we were going to do this,
but somehow this was really important, needed to be done. And I didn't want to do my research
with toy applications anymore. So that's how this whole thing started. You know, I thought,
okay, benchmarks, right? How difficult is it going to be? Talk to a few people, find some,
you know, benchmarks, slap the suite together and do research. Well, it turned out to be a huge focus
of my life. In the beginning it was slow, but then the last
two, three years or so, it's been just the dominant thing. And it's just been so much fun,
but so hard. It's probably the hardest thing I've done, mainly because this is a real system,
right? This is a real, I call it, we call it an
application, but it's not an application. It's a full system. And so here I want to point out,
you know, we talk about domain specific accelerators and domain specific architectures,
but what's truly exciting to me about this new age we are in is that we have to worry about
domain specific systems, right? And it's not just
AR, VR, but it's robotics, it's autonomous vehicles, right? You look around you and we're
talking about real specialized systems. That's a whole new way of doing research, right? A whole
plethora of research problems shows up. And all of these systems that I talked
about are on the edge, which means huge resource constraints and, at the same time, huge performance requirements, right? And the other thing is that it's the end
to end user experience that matters. So with AR, VR, it's, you know, I mean, is the user going to
puke, right? I mean, that's sort of your quality of experience. And we don't have metrics; the
metrics for measuring the goodness of these systems are, you know, still a topic of research.
So first thing, XR is this great playground for us architects, because there's just an orders-of-magnitude difference between what we can do in the systems that we have today and what we desire in the future.
So orders of magnitude in performance,
power and quality of experience.
The second thing is that XR touches so many domains.
There's graphics, there's computer vision, robotics,
haptics, optics, and what have you. And so to do research in
this area, at least the initial people have to be experts, or have to gain
expertise, or have people involved who work in all of these areas, which is super challenging but
super fun. The third thing is, because of this large gap, right? And because we're thinking about a whole system,
this is truly a co-design cross layer optimization problem.
You cannot solve this problem by just working
in the hardware or working in the compiler programming stack
or the runtime or the application.
It really has to all come together,
which is the kind of work I love to do.
And it's just been incredible.
And then the fourth thing is the metrics; as I said, it's the end-to-end user metrics,
and you can't do research where end-to-end user metrics are so important while, you know,
you're just looking at one algorithm. Yeah, that's important. You want to be looking at
one algorithm, but then how does that algorithm fit into this entire system is what you have to ask, right? And the biggest problem in this area
for us when we started was that everything was closed, meaning the companies, you know,
there was no open source anything, right? We didn't even know how an overall XR workflow
was supposed to work and, you know, what the smarts in there were, et cetera. And so it was, you know,
slowly, slowly talking to people, working with companies,
working with academics who were working in their own silos.
We finally realized that, hey,
what we really needed was to build this open source XR system so that we could
democratize research in this area, reduce the barrier to entry.
And now we have that.
And we hope people will use ILLIXR and work in this super exciting field.
Thank you so much for that, you know, a really great origin story as well as an articulation
of why this particular domain is really exciting and challenging from a systems
researcher's perspective.
You know, there are several themes that you touched upon over there, how this is a very challenging domain in terms of the performance,
power and other requirements, how it's very diverse. You have a wide range of different
kinds of applications that you need to sort of accelerate, spanning the entire stack.
There are a lot of trade-offs to sort of deal with. Maybe we can expand on some of these themes
one at a time. There are a lot of different topics here that we can sort of double click on. I'll just start with the first one. As you said, this is a very diverse
space. So AR, VR, XR, you have audio, you have video, you have compute, graphics, you have many
other different things. And we have sort of touched upon domain-specific accelerators for things like
machine learning. But when you have this really, really diverse set of applications, how do you
sort of think about accelerating this particular domain or this particular system?
Like do you build accelerators for each individual component?
Do you sort of chain them together?
Like how do you even sort of grapple with this entire complexity?
So yeah, you can do the bottom up research, which is, you know, look at individual components,
figure out how to accelerate them, et cetera, et cetera.
But then, and maybe that's
the right way to go, right? But our view is that you really need to look at this top down
and think about the entire system. And so some of the research we are doing right now
is looking at cross-component co-design, building an accelerator for each component
for the systems we desire, right? Looking
out five to ten years, right? What are the capabilities we want? I think building, you know, a
monolithic accelerator per component is just not going to be scalable, right? It's just not
going to work. And so, we are really more after the science, okay, of how do you build specialized systems when there is so
much diversity in the tasks, and yet each task is important. So one of the results that we have
that is really very interesting to me is that there's really no one component here that completely dominates on all metrics,
on performance and quality of experience, right?
So there'll be some component,
there's, like, SLAM, or finding the pose of the user.
That's a lot of computation.
Okay, so performance-wise, it's super important.
But then there's a really small component
called reprojection that needs to be invoked very, very often
so that it can compensate for the latency that might be incurred by the rendering process,
which is, again, a time hog.
But this reprojection can actually compensate for that latency, compensate for some frame
drops, and it's super important from a quality point of view.
And so you cannot just ignore it just because, oh, it didn't show up in my profile of execution time,
because, you know, each invocation of reprojection takes such a small time, right?
And so what we are doing is actually, I mean, there is huge research to be done here still,
but our idea is to look, do a very systematic study of figuring out, right, what are the key primitives
that we want to accelerate, not what are the components, okay, what are the primitives that
we want to accelerate, how do we share these acceleration primitives across these components,
you know, what is the science behind identifying these primitives, identifying the communication
architecture, identifying how to schedule these, both from a hardware perspective, as well as the entire programming and runtime stack, and so on.
And so some of the most interesting results relate to observations that say, oh, even though these tasks look very different, there is this large portion that is really common to them. And so if we pull this out, right, and if we reuse the results from this particular component, because it's common
to the other one, hey, immediately you can reduce the amount of time that you're spending in
this component, because we reuse not just the hardware, but the actual results that were produced
by this other component. And of course, there's the hardware sharing aspect as well.
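To make the result-reuse idea concrete, here is a toy Python sketch. The component names, the shared feature-extraction primitive, and all the numbers are invented for illustration; this is not taken from ILLIXR itself.

```python
# Toy sketch: two XR components (here, a fake "tracking" and "mapping"
# step) both need the same image features. A small memo table lets the
# second component reuse the first one's results instead of recomputing.

compute_calls = 0

def extract_features(frame_id):
    """Stand-in for an expensive shared primitive (e.g., feature extraction)."""
    global compute_calls
    compute_calls += 1
    return [frame_id * 10 + i for i in range(3)]  # fake feature vector

feature_cache = {}

def shared_features(frame_id):
    # Reuse the result if another component already produced it.
    if frame_id not in feature_cache:
        feature_cache[frame_id] = extract_features(frame_id)
    return feature_cache[frame_id]

def tracking_step(frame_id):
    return sum(shared_features(frame_id))

def mapping_step(frame_id):
    return max(shared_features(frame_id))

tracking_step(7)      # computes features for frame 7
mapping_step(7)       # reuses them: no second computation
print(compute_calls)  # -> 1
```

The same idea applies at the hardware level: once the common primitive is identified, both the computation and its results can be shared rather than duplicated per component.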
So it's a
very different way of thinking about system design than, you know, how am I going to accelerate,
you know, this particular neural network? That was super interesting, Sarita. Initially,
when you were describing all of this, and you said that this is the hardest thing you've ever
done, I was like, really? Compared to consistency models? I mean, like, come on. But then as you
described the whole flow, I was like, oh, I can see how this is hard. The computer architecture community felt
like it really grew up or like the big heyday was when you could say like, okay, everything is the
same. I'm going to vary one thing. And you have all the structure in place. You have a lot of
harnesses in place. You vary the one thing. And then you say like, okay, but what you're talking
about is like a lot of moving parts, all new and all putting all that infrastructure together from the very beginning.
And one thing that just stuck out from what you were talking about just now is it almost
sounds like you're trying to put together a general purpose framework for this domain,
meaning that each of these domains now, you know, kind of, we're all standing on the shoulders
of giants, right?
As we gain more and more knowledge about what we can build. And as the kinds of things we
want to build become bigger and bigger and bigger. Now we have to like keep, you know,
partitioning the tree to make bigger and bigger sub-pieces. And so it almost sounded like
from the XRARVR standpoint, you know, there's a lot of things that these systems have to do.
It's an end-to-end system. You're trying to put together the constructs of, like, okay, here's how you build it.
Because now in, you know, classical computer architecture, we have memory, we have, you know,
load store, we have branch predictors, we have caches. Like there's a kind of,
a sort of thrust of how we generally do things. We need to be able to handle control flow. We
need to be able to handle, you know, memory and like a few basic primitives.
So is that how you sort of think of it? And do you think the kinds of things that you would determine for AR, XR, VR might actually translate to other kinds of general purpose domain specific fields? Yeah, absolutely. That is absolutely right. In fact,
the way, you know, the talk that I give on Elixir when it's to systems audiences, as opposed to XR
audiences, my initial motivation is to build scalable and generalizable specialized-system design techniques, right? You know, we started with XR as an example domain,
as a driver for the technologies that we want to build.
It just so happened that this is an all-encompassing domain.
So now I'm an XR person.
But no, we also work in robotics, in autonomous vehicles,
and we are trying to use similar kinds of technologies in those areas as well.
And these technologies, so, you know, just to give you concrete ideas, they include things like how do you automatically identify what you should accelerate, right?
That has nothing to do with XR per se.
This is a generalizable concept that you can apply to other systems.
How do you use approximations?
All of these domains have such stringent requirements.
And because of these human level kind of end-to-end constraints, there's a lot of play that needs to be done in terms of how do you
selectively, judiciously use approximations and trade off quality, you know, for resource usage,
right? And that's a problem that's generic, right? It's not something that's specific to XR.
You know, I've collaborated with others to solve this problem, again in a, you know, cross-layer kind of way, in the
context of machine learning, the autonomous vehicle stack in the context of robotics,
right. And now, in fact, we're going to bring that work to ILLIXR. So yeah, I mean, ILLIXR is
sort of a passion for me, but I also have a finger in, you know, these other pies so that we don't just get carried away by just, you know,
core XR stuff. Yeah, it sounds like it. Maybe you can spend a minute telling us exactly what
ILLIXR is, what it consists of. Yeah, yeah, so ILLIXR is a full-blown, open-source XR system.
I wish this was video, I would show you our demo.
So you can go to illixr.org and see our demo.
So a generic XR system has three main components.
So there's the perception pipeline, which is,
so you're wearing a headset, right?
And the system needs to know where the user is, what the world looks like
from the user's perspective. So that's the perception component. Then there's the visual
pipeline, which is responsible for using all this information from perception and then generating
the actual pixels. Using this user's information and the application, right? If it's a gaming
application or a virtual reality for surgery, et cetera,
the application, you know,
figures out what needs to be done.
And then the system needs to generate the pixels
to actually display, right?
And then there's audio where, you know,
if you have audio in your application to deal with that,
again, spatial audio,
which where the audio engine also needs to communicate
with the perception pipeline to figure out where the user is so that the audio can be spatially localized, et cetera.
And then there's the runtime that takes all of these three components and interfaces them.
There's a lot of communication that happens between them.
There are sensors, right?
The perception component needs to work with cameras, IMUs, LIDARs, whatever, and so on.
And so there's this runtime and communication framework
that you need to put together
for all of these components to communicate with each other
while maintaining all of the dependencies
that need to be maintained and preserving
and giving you the QoE, right?
And then you have the application on top that's running,
be it your game engine or your virtual surgery
or whatever it is, right?
And then this whole assembly runs on some hardware.
So what ILLIXR is, is the entire runtime
that runs on the headset, okay?
The perception component, the visual component,
the audio component, the runtime that's orchestrating all of this,
shooting pixels onto the headset.
So there's an open source headset company.
Well, now they're commercializing it.
So it's called Northstar.
They are amazing.
We've been able to interface to the Northstar displays.
You can wear this headset on your face and, you know,
walk around and you're running ILLIXR,
a completely open source system, right?
Completely open source.
You can measure everything in it pretty much, right?
Do whatever research you want to do.
Okay.
Yeah.
And then, of course, you don't need the headset.
You can render the displays to your monitor on the desktop; you won't get the full immersive experience, but you can do research that way. We've taken a lot of trouble to make this super modular. So the
idea is that you can swap in and out components. So if you're running SLAM, which is visual
inertial odometry, which is what you need to figure out where you are in the world, you can
bring in your own SLAM component and put it in. And if you're a SLAM researcher, you can see what is the impact
of this on the end-to-end, right? You're no longer restricted to just looking at how your SLAM does,
but you can actually see what is the benefit of getting better pose accuracy on the images that
the end users see. So it's designed in a very, very modular way. And in fact, we already have
multiple implementations of several components that we can swap in and out.
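A minimal Python sketch of what such swappable components might look like. The interface, the class names, and the registry here are invented for illustration; they are not the actual ILLIXR APIs.

```python
# Toy sketch of a swappable-component design: every implementation
# satisfies one small, fixed interface, so the runtime can load any of
# them interchangeably and researchers can drop in their own SLAM.

class SlamComponent:
    """Interface every SLAM implementation must satisfy (hypothetical)."""
    def estimate_pose(self, imu_sample, camera_frame):
        raise NotImplementedError

class BaselineSlam(SlamComponent):
    def estimate_pose(self, imu_sample, camera_frame):
        return ("baseline", imu_sample + camera_frame)   # fake pose math

class MySlam(SlamComponent):
    def estimate_pose(self, imu_sample, camera_frame):
        return ("my-slam", imu_sample * camera_frame)    # fake pose math

REGISTRY = {"baseline": BaselineSlam, "mine": MySlam}

def run_pipeline(slam_name):
    slam = REGISTRY[slam_name]()     # swap implementations by name
    return slam.estimate_pose(2, 3)  # fake sensor inputs

print(run_pipeline("baseline"))  # -> ('baseline', 5)
print(run_pipeline("mine"))      # -> ('my-slam', 6)
```

The point of the clean interface is exactly what is described above: a SLAM researcher swaps in their implementation and then measures its effect on the end-to-end experience, not just on SLAM accuracy in isolation.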
And a lot of the work, there was a lot of engineering that went in to make sure we
have clean interfaces to be able to do this. I mean, lots of flexibility in the system,
lots of ways, lots of things you can measure. But the thing I'm most excited about is that
we recently launched the ILLIXR Consortium, which is, you know, we're not going to solve this
problem, right? There's no way, okay? There's just so much to be done. Neither are we going to
build the final infrastructure for people to use. So what we've done is launched a consortium where
this is going to be an open system, right, where the community can
contribute, and, you know, we're inviting people to come and contribute their components to this testbed
so that it can continue to grow. And literally, you know, our goal is to democratize
XR research; that's what we're after. So that's one thing: to have a reference testbed,
a testbed that grows so the community can do research on it.
The second thing is to standardize, so, standardize benchmarking.
There's actually no standard ways to compare XR systems today, even in industry.
And so that's why we have several industrial partners who are interested in that aspect of it.
And even from a research point of view, you need to be able to have standard applications
and data sets to be able to run on this system
so that people can compare results and reproduce results.
So that's what we would like to do through this consortium
is to build a consensus around this.
And the third is to just create a community.
So this is such a broad community
and it's so fragmented right now.
We'd like to bring people together. So we have, you know, suggestions for working groups,
you know, open meetings where people can join in, talk about their work related to XR or just
hang out and, you know, see, you know, what's going on and hopefully we make progress.
That's great.
Yeah.
Thanks for a great overview, and the ILLIXR project and the platform itself sounds really
exciting that a lot of people can contribute to.
And the community is also like a very important aspect.
I wanted to expand on one of the themes that you talked about, like for AR, VR, XR, there
is no clear or established set of benchmarks or metrics that people can use to compare
different systems,
different approaches and things like that. Could you maybe expand on that and tell our audience,
so what are the metrics that are of interest for an AR, VR, and XR system, both on the quality side
and I guess on the performance side as well? So one of the key metrics is motion to photon
latency, which is basically the user, let's say the user moved
their head, right?
And so then now the system needs to project the visuals that the user is supposed to see
with this new orientation.
So what is the latency from the time that the user moved their head to the time that
they saw what they were supposed
to see based on this new perspective.
So that's motion, right?
User's movement to photon when you actually flash the photon on the user's screen.
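As a back-of-the-envelope sketch of the metric just described, motion-to-photon latency is simply the elapsed time from the motion sample to the display flip that first reflects it. The timestamps below are invented for illustration.

```python
# Toy sketch: motion-to-photon latency is the time from the IMU sample
# that captured the head motion to the display flip ("photon") that
# first reflects it. Timestamps are in milliseconds and made up here.

def motion_to_photon_ms(imu_timestamp_ms, display_flip_ms):
    return display_flip_ms - imu_timestamp_ms

t_imu = 1000.0    # head motion sampled by the IMU
t_flip = 1016.7   # vsync at which the corresponding pixels appear

latency = motion_to_photon_ms(t_imu, t_flip)
print(round(latency, 1))  # -> 16.7
```

Commonly cited VR targets put this end-to-end number at roughly 20 ms or less, which is what makes the per-component latency budget so tight.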
Oh, right.
And does this sort of depend on the scene or what's in the scene and things like that?
So is there like variability in terms of depending on the richness of the scene or how far away the user is from a
given object, things like that. Exactly, exactly, right. So we have a paper where we talk about,
you know, all of these things, right, the whole end-to-end system, you know, it covers metrics,
etc., variability, all of the questions you're asking. And the answer is yes. The motion to
photon latency actually relates to a lot that's going on in the system.
It relates to how quickly I can figure out the user's pose in the system, right?
Because how do you know what to project unless you know where the user's looking, right?
So there's that component, the perception component that I talked about.
It relates to how quickly you're able to render the scene, right, that you need to render.
It relates to the graphics of, you know, the display part, right,
how quickly I'm able to display this.
And then there's also smarts in the system which try to compensate
for the rendering latency.
So we try to sort of predict ahead of time, right,
where the user might actually be looking.
You know, the pose estimation algorithm determined where the user's looking now, but from that
time to the time that the display actually got the rendered scene, time has elapsed,
and the user may be looking elsewhere by that time.
And so there's a mechanism in the system to predict where the user might be looking at this
moment when the display is actually going to show the pixels. So there's that aspect as well.
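A hedged sketch of that prediction step: extrapolate the latest pose estimate to the moment the display will actually flash. Real predictors are more sophisticated than this constant-angular-velocity model, and all names and numbers below are invented.

```python
# Toy sketch: predict where the user will be looking at display time by
# extrapolating the latest pose estimate with its angular velocity.
# (Constant-velocity model; production predictors are fancier.)

def predict_yaw(yaw_now_deg, yaw_rate_deg_per_s, t_now_s, t_display_s):
    dt = t_display_s - t_now_s          # how far ahead we must predict
    return yaw_now_deg + yaw_rate_deg_per_s * dt

# Pose estimated at t = 0 s; the frame will hit the display 16 ms later.
predicted = predict_yaw(yaw_now_deg=30.0,
                        yaw_rate_deg_per_s=90.0,  # briskly turning head
                        t_now_s=0.0,
                        t_display_s=0.016)
print(round(predicted, 2))  # -> 31.44
```

Reprojection then warps the already-rendered frame toward this predicted viewpoint, which is how a tiny amount of compute can hide rendering latency and the occasional dropped frame.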
And then based on the prediction, you reproject the scene using some math so that, you know,
hopefully you're showing exactly the right scene. So the part where you're doing pose estimation,
there's a lot of variability there,
depending on how quickly the users are moving their head,
what is around the user in the physical space
so that you can figure out,
there's many different ways to solve this problem,
but there's variability in all of them.
There's variability on the rendering part, right? Where the application is actually, you know, running
the physics, let's say, or, you know, whatever it is that, and that's totally scene dependent.
So there's a lot of variability there. On the backend, where you're doing the reprojection
and where you're actually doing the display, there's less variability today,
right? And so, you know, one of the big parts of our work
is how do you schedule hardware
given that there is so much variability
in what's happening around you.
So the runtime and the scheduling
is a huge component of this whole thing.
And then if you're going to share accelerators, you have to keep this variability in mind.
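To illustrate why that variability matters, here is a toy simulation; all task names, runtimes, and deadlines are invented. A non-preemptive earliest-deadline-first scheduler on one shared accelerator commits to a long render job, so a late-arriving, short reprojection job misses its tight deadline, the kind of case that pushes these runtimes toward preemption or reserved slots.

```python
# Toy sketch: tasks of very different granularities sharing a single
# accelerator, scheduled non-preemptively by earliest deadline first.
import heapq

def schedule(tasks):
    """Non-preemptive EDF on one shared accelerator.
    tasks: list of (name, release_ms, runtime_ms, deadline_ms)."""
    time = 0.0
    order, missed = [], []
    ready = []                                  # heap of (deadline, name, runtime)
    pending = sorted(tasks, key=lambda t: t[1]) # by release time
    i = 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i][1] <= time:
            name, _, run, dl = pending[i]
            heapq.heappush(ready, (dl, name, run))
            i += 1
        if not ready:                 # idle until the next release
            time = pending[i][1]
            continue
        dl, name, run = heapq.heappop(ready)
        time += run                   # run the chosen task to completion
        order.append(name)
        if time > dl:
            missed.append(name)
    return order, missed

tasks = [("render",    0.0, 9.0, 16.0),   # long, variable runtime
         ("reproject", 1.0, 0.5,  4.0)]   # tiny, but a tight deadline
order, missed = schedule(tasks)
print(order)   # -> ['render', 'reproject']
print(missed)  # -> ['reproject']: it arrived while render held the accelerator
```

Because the scheduler has already committed the accelerator to the 9 ms render job when reprojection arrives at t = 1 ms, reprojection cannot start until t = 9 ms and blows its 4 ms deadline, even though it needs only 0.5 ms of compute.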
Right. That sounds like a really challenging problem because even if you did not have to
share an accelerator, let's say you just had individual components that were all accelerating
this, in the presence of variability, that itself becomes a very difficult problem. And I'm sure so
the moment you have a shared resource,
whether it's hardware or any other component of the stack,
it becomes even more challenging.
So how do you build an accelerator?
Like how do you decide which components of your hardware
can actually be shared or are amenable to sharing?
So there could be compute resources that you could share.
There could be memory resources that you can share.
There are communication resources that you can share.
And of course, all of this also depends on the task data flow itself. Like maybe there are two portions of your data flow
graph that are communicating to each other that you could co-locate and things like that. So
how do you sort of wrangle this big problem space? Yeah, so a lot of this is work in progress.
You know, the papers are just being submitted and coming out because, again, it was a
conscious choice on our part to spend a lot of time trying to understand the whole system before
we started, you know, coming up with any kind of siloed systems. And also,
another thing we try to do is, again, you know, going back to the previous part of the conversation,
come up with generalizable techniques.
So for example, there's a stack, right?
We don't want to get into the business of saying,
here's a piece of code, map it to an accelerator.
That's sort of doable, like in a one-off kind of thing, right? What we want to do is build the stack that will be able to automatically look at the different aspects of this code. How are you going to represent the software in a way
that you can easily then map to a diverse set of hardware
or even think about how you would design the hardware
for this software?
And for that, you need an intermediate representation.
And that is work where I've drawn on Vikram Adve and Sasha Misailovic, and we have a collaboration.
So Vikram's been driving this for a while,
intermediate representations for heterogeneous computing.
And so there's a system called HPVM,
heterogeneous parallel virtual machine.
So this is, again, joint work led by Vikram.
So this is an IR that is based on data flow graphs,
and these are hierarchical data flow graphs, where the key is that parallelism is represented at various levels. So since it's
a hierarchical data flow graph, I mean, obviously, we can represent thread-level parallelism. These
are the different parallel nodes in the graph. We can represent nested parallelism. We can
represent streaming parallelism,
you know, SIMD parallelism, et cetera. And communication is explicitly represented as the edges in the data flow graph. So we use that as our representation, right? And again, this has nothing
to do with XR, this is just a general sort of, you know, the science of developing heterogeneous
systems. So we use that as our representation. And then we've collaborated with folks at Harvard, David Brooks and Gu-Yeon Wei's group; they were looking at building automated techniques. So they published some
papers on automated techniques for generating accelerators, but they were not looking at
parallelism in that much detail. So they were just looking at basic-block-level parallelism. But if you want to accelerate systems, you know, parallelism is super important. And so we combined the HPVM representation with
their tools to figure out what is the level of parallelism that makes sense, right? Running a
search algorithm to figure out what is the right level of parallelism
that should be exploited for this code
that's represented in HPVM.
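As a rough illustration of what a hierarchical dataflow graph looks like, here is a toy of my own construction, not HPVM's actual IR or API: nodes carry a parallelism factor, may contain subgraphs, and communicate only through explicit edges.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    replicas: int = 1                             # data-parallel instances of this node
    children: list = field(default_factory=list)  # hierarchy: a node may be a subgraph
    edges: list = field(default_factory=list)     # explicit communication between children

def leaf_parallelism(node):
    """Count leaf-level parallel instances, multiplying replication down the hierarchy."""
    if not node.children:
        return node.replicas
    return node.replicas * sum(leaf_parallelism(c) for c in node.children)

# Toy pipeline: feature extraction (4-way parallel) feeds a pose solver
# inside a "tracking" subgraph, which feeds a 2-way parallel renderer.
feat = Node("feature_extract", replicas=4)
solve = Node("pose_solve")
tracking = Node("tracking", children=[feat, solve],
                edges=[("feature_extract", "pose_solve")])
render = Node("render", replicas=2)
app = Node("app", children=[tracking, render], edges=[("tracking", "render")])
print(leaf_parallelism(app))  # 7 parallel leaf instances: 4 + 1 + 2
```

The search described above would then sweep choices like each node's replica count and score the accelerator configurations generated from the graph, rather than fixing the parallelism by hand.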
So we ran that on some components of ILLIXR
and we have results that show
that using these automated tools,
we get much better performance, et cetera,
than using the tools that we had before. It's really trying to divide up these tasks into pieces and solve them in a generalizable way. So this was just one example
of what we've done in terms of automated accelerator generation. I mean, this is just
one small piece, right? This is just looking at the parallelism. It's not yet looking at the memory
system. That's our next step, because HPVM has the ability to make communication explicit.
And we've done a bunch of work in the past on our Spandex coherence protocol
where we have a very clear interface about communication, et cetera.
We want to put these two things together and, you know, generate automatable ways of expressing the type of communication that makes sense for these data flow graph nodes, et cetera.
So that sounds like it spans like many layers of the stack.
So one of the recurring themes through your career has been pulling folks from multiple layers of the stack to talk the same language, and reconciling different notions or different paradigms. This has happened on the memory consistency model side. And it looks like even in the AR/VR space,
you know, you have folks all the way from hardware
to software to, you know, application developers
and users and so on.
Do you have any broad words of wisdom for our listeners
on how do you sort of bring together different communities?
It's obviously a lengthy process.
People talk different languages and things like that.
Yeah, definitely.
And it is true that my career has been about this
type of work, because, you know, I've always thought that this is the world that I feel is
really important and excites me the most. You know, this didn't happen overnight. I like to say this,
right? So most people from the outside, they will see the big successes and ask, how do you do that, right? But behind each big success is a long story
of trials that fail, right? But they failed only to the extent that maybe you have not heard about
that paper. But for me, it was a success because it gave me the experience to do, you know, the
thing that finally succeeded, right? And that's really important.
So in my case, for example, this whole business of ILLIXR, right?
You might think, oh, you know, there's this cross layer and, you know, all this stuff,
right?
But, you know, I was doing this cross layer stuff in, I think it was 1999.
I mean, it wasn't even a thing.
When I moved to Illinois, one of the things that was really cool,
I was at Rice before I came to Illinois. And Rice was a great place. I really thrived there and did
a lot of really interesting work, awesome students, etc. But the thing about Illinois was it was a
bigger place. And there were people doing things that weren't around at Rice. And so I wanted to
take advantage of that.
And I put together a small team of people. In this case, it was just four: architecture (myself), operating systems, signal processing and multimedia, and networking.
And I really thought that multimedia was the way to go at that time.
Again, that whole full stack thinking and cross-layer optimizations
were really necessary. And so we built a system that, I think, may have been the first, at least in our closely related communities, that was actually a cross-layer co-design for performance, power, and quality of service at that time. We found it very hard to get the papers in; that's a whole other story. And so the multimedia community has seen that work, but I think it's not that well known in the architecture community. But it was a huge,
huge experience for me because I learned to work with these people who spoke different languages
and we actually put together, again, a science, right? So, principles that I use even now, right? And so even though those papers did not appear in ISCA or ASPLOS, those principles stood the test of time, and I use them now for ILLIXR. So going back to your
question of how do you bring people together: it's slowly, you know, piece by piece, right? It doesn't just happen that one fine day you find yourself doing something like ILLIXR, right? There's a history here. Same thing with memory models. You know, when I was a grad student doing my PhD, a lot of the work on memory models was hardware-centric, right? Just mostly hardware people. You know, for whatever reason, we understood soon enough that this was not just a hardware problem, but a hardware-software problem.
And at that time, I wasn't savvy enough to go to the software people.
We did do some collaboration.
And in fact, the inspiration for this came from work by colleagues of mine at that time: Bart Miller, who was an OS professor, and Rob Netzer.
They were working on data race debugging, et cetera.
And sort of, you know, we got some inspiration from them and the data race free model was
born.
But again, the key thing being the realization that this was a hardware software interface
problem and not just a hardware problem.
And so, you know, that was already the start of hardware software work, right?
It came about because of, you know, talking to people in a very narrow kind of a way.
Nobody in industry bought our idea. We had a very hard time, quote unquote, selling this work to people. You know, the hardware people almost didn't understand what was going on.
And in fact, although the ISCA paper got in,
the journal paper we wrote was rejected multiple times.
It was a, you know, cause of great grief for me. But I kept at it, kept at it, kept at it,
knowing that this was really the right way,
the right solution.
And then finally found people who got it
from the software community.
And that's a whole other story.
And then magic happened and things came together.
So it doesn't happen overnight.
That's a great story for everybody, Sarita. I
think, you know, a lot of times people do look at some of the leading lights in our community and
just think like, oh man, you know, how do I follow that formula? And you're right. A lot of it is
really, really hidden behind the scenes. And there's a lot of experience that goes along
with building the kind of eventual successes. I kind of wanted to double-click on something you were saying, you know, maybe a little bit all the way back, because I have so many things built up from all the wonderful things that you've been saying. So one was, earlier you were saying, you know, that at the moment your main top-level metric is just: is the user going to barf? And that's,
but it does seem like a real one, right? Because, you know, this is something that you wear on your head and now you somehow
have to translate this kind of qualitative thing.
Is the user going to barf?
In the end, you have to get it down to something that is more measurable than, perhaps, you know, putting these things on people and actually measuring whether they barf, right?
So earlier you were talking about the metric of this like motion to photon latency.
And with things like this,
it seems like there's probably a notion
similar to uncanny valleys
where I can imagine putting something on,
I'm looking at a blank white wall.
I turn my head, it's still more blank white wall.
Has it actually changed?
Has it not changed?
Am I gonna get dizzy?
Because maybe it detected that it's still white wall, and it looks the same, but somehow we know that we've moved, even though the image hasn't changed.
So are there key things that you've been able to determine that lead to barfing, I suppose,
that are measurable beyond this latency thing?
This latency thing clearly must be part of it, but are there others?
Yeah. I mean, the latency thing is a big part of it.
And then there's, you know, the variability in it. Let's say, in an augmented reality situation, you're moving one thing from here to there and, you know, you see the judder in there, and that doesn't quite work. There's the whole image quality thing.
You know, did I render my image correctly, appropriately or not?
There's a whole range of metrics that honestly,
I'm not even an expert on yet.
I mean, there's a bunch of stuff that the XR community
has still not figured out as to how to measure all of this.
Another point, I think as architects and systems designers,
we tend to go after quantitative metrics and we must continue to do that.
There is no question about that. But in this area,
I think we are going to have to bite the bullet and start to think about user
studies. And that's something we don't know how to do in this community, right?
So that's a new thing coming up. And we literally had to do that for this paper that we first wrote about ILLIXR, because the quantitative metrics that we were using, which are all state of the art, were not showing us the differences that we were seeing ourselves when we used the system, right? So the image quality versus ground truth, from this version of the system to that version of the system: quantitatively it was just a little bit off, but qualitatively you could see, are you kidding me, this is so much worse than that, right? And so, you know, we have a section in the paper that talks about how we did a user study in a very informal way, just asking my students in the lab, right, saying this is better than that. But we're going to have to do this a lot more.
And I think this is just a new frontier for systems researchers.
We don't know how to do these things.
Yeah, that's very interesting because I feel like I've read a number of papers where they
present a new tool and then they'll say like, and then we gave it to some undergrads and they were able to do this, you know, in 10 lines of code in a week without a problem.
And that's kind of like our version of a user study, right?
So that's very interesting.
Yeah, because once you start building full systems, whether they be robotic systems or whatever, I guess this is kind of the uncanny valley question. There's something in our brains that knows, and we don't necessarily know how to translate that knowing into, like, pure quantitative pieces. And maybe the answer, as you say, is that we just have to have user studies.
Yeah. Well, I think it's both, right? So we are very interested in research
on metrics. We are not experts in that area, right? We know a lot about XR now, but we are not experts in terms of actually figuring out how to measure these things. Or maybe, because of the system, this is something where we will make a contribution, who knows. But it is certainly something that, as part of
the consortium, we're trying to push to get more consensus on what these quantitative metrics should be. Because,
I mean, we need both. Yeah. And then the other thing is that to do this kind of end-to-end
user study, you do need to build systems, which is another thing in our community that, you know,
it's changing. Again, that hard but fun part. I've just been fortunate that the last few years,
I've had the opportunity to be involved in two big systems building things.
I talked a lot about ILLIXR, but we have another project that's also big. It's DARPA-funded, a big collaboration with IBM and Harvard and Columbia, where we are building an SoC for
the application of autonomous vehicles. But, you know, the goal there is to build a full stack
of a heterogeneous system with many accelerators, compilation stack, the scheduler, and then,
of course, the application. So that's another big project where I'm actually, you know, for the
first time involved in an actual chip-building exercise. That's been, again, amazing. And you know, in my career I had not built a real system, right? I mean, I'd built simulators and stuff, but that's different. Not a real system, until, you know, recently. It's just a whole different ball game. And our community does not have enough respect, I feel, for this type of work, and it needs to. We've reached an era where simulators are just not going to be enough. FPGAs
are great, but, you know, even there, you're limited. And actually taping out a chip is
super hard. So we have a lot of questions in front of our community to figure out how are we going to
evaluate these systems that we need to design? I think that's one very, very interesting question.
Yeah, I agree with you. As someone who's, you know, sitting on the industrial side, you know,
I think there's long been this potential disconnect between academia and industry where
the perception from the academia side is that industry just says no to everything, and then industry feels of academia, oh, they've just got their heads in the sky. And it's because there's this disconnect of realizing that the last mile of building something is potentially the hardest mile, and there's lots and lots of things to think about. And from an industrial perspective, it can't ever not work.
Otherwise you have the whole floating point fiasco
of the 90s or what have you.
And that was exactly why.
So we did this work on coherence
that I think is the right solution
for a heterogeneous system,
but to get industry to buy into something like this,
which really affects the entire system, right? It's not one part of one accelerator; it's really what binds everything together. We felt the need to build it and to show that we can make this work. So yeah, I think the kind of work that we are doing now, it's hard, but fun. Yeah. And I think that the thing about building the whole system
too, as well as making it into a science, I think that's important as well, because
there's also a difference from making a system work through hacks, like just like actually going
to hack it together and making a system work through principles that are only deviated from
when you really have to, because you're deep in that last mile. And so from an industrial perspective, a lot of times you want to try and get to something that's
mostly adhering to principles so that when they take it for themselves, they have principles
rather than just like, well, they made it work and it's because they spaghetti this and duct
taped that or what have you. Okay. Well then maybe at this point, we want to transition a little bit
to some of your sort of extracurricular activities, you know, that you've done.
They're curricular.
Let's say, maybe not directly technical problem solving, then, but other kinds of problem-solving activities.
We all know you as very involved in our community, not only as a technical leading light, but also as a leader of the community itself. And, you know, you've had a lot of service roles and stuff,
but one of the things that has been kind of perhaps different in recent years is your
co-founding of this CARES initiative. Would you maybe take a minute to describe for folks who
don't know what it is, what it is, and then, you know, how it's been going and how you feel,
whether you feel like it's been making an impact on our community?
So the goal of the CARES Committee is to ensure that we don't have discrimination and harassment
in our conferences and other professional events. The mandate has recently been broadened to also include ethical violations. So things like
reviewer ethics, et cetera. ACM, which is the parent organization under which the CARES
program exists, has policies that say explicitly that discrimination and harassment have no place
in our communities. And these you know, these are the
ethics and values by which we want to conduct ourselves. But the problem has been that
reporting violations is hard for people. The investigations take a long time. And,
you know, there's a general sense of nervousness, I think, and lack of trust as to what happens, you know, with these reports, etc.
Because often the people who are targets of such bad behavior are, you know, on the lower rungs of the seniority ladder. So the goal of the CARES Committee is to provide a set of friendly, respected, trusted members of the community, who are well known, who are not strangers, that people can come to for advice, as a resource to help them navigate situations that they may find themselves in that are clearly in violation of the policies and values
that we all hold dear as a community and that are officially part of the ACM policies. So the goal
of CARES is not to investigate the problem because that requires a level of skill that,
you know, the CARES committee members may not necessarily have. But the goal is to be there for our community to listen,
to be a sounding board for questions, reporting of incidents,
and help navigate the ACM processes to file the reports
and to be there for you while things are being investigated.
And so now CARES is several years old, right, as an initiative.
Have you, from your vantage point, seen or do you feel like it's making a difference?
Have you seen an impact it's had on our community?
I think so.
The thing about the CARES committee is that confidentiality is a big part of it.
Right. And so you will never hear from the CARES committee that so-and-so problem was solved.
Right. They're not allowed to talk about it. And because I was the SIGARCH chair, I deliberately kept myself distant from the actual inner workings because, you know, while I'm a sympathizer, I didn't want to create a precedent where the chair of SIGARCH felt that they could be privy to all of CARES' workings, right? So what I observed is an observation from the community's viewpoint,
okay, in terms of the impact.
And I believe that it has made an impact.
I see people standing up more,
you know, against behavior that they see
that is unacceptable.
This was not true in the past.
I see more people coming and talking about their problems, which was not the case in the past. And in general, I see a sensitivity towards issues
related to diversity and inclusion, which is really, you know, a core tenet behind CARES
that was not there in the past. So whether this came about from CARES or not, I don't know.
But definitely there was a huge burst of activity that, you know, Lisa, you were a part of as well
during that time that has sensitized our community. There's no question about that
and moved us forward. There's a lot of work to be done still. I do know of cases that are not being resolved in the way I would like them to be, so the work continues. We also went to the other SIGs and the broader meetings and pushed for this.
A lot of us did a lot of evangelizing.
And very soon, many of the ACM SIGs started their own CARES committees, patterned after ours.
SIGMICRO joined us very soon, then SIGGRAPH, SIGMOD, SIGCOMM, and there's a whole host of other SIGs that are documented
that now have a CARES committee.
And this happened quickly, which is just, it just goes to show that there was a need
for something like this.
Yeah, it's not solving all the problems, but it's definitely filling a need.
So yeah, I'm really happy that this has taken off the way it
has. And another thing I'd like to share with our listeners is that we recently launched CARES in my
academic, in my department, CS at Illinois CARES. And the goals are very similar. It goes beyond
just discrimination and harassment, but other colleagues of mine wrote up a clear code of conduct, our values and code of conduct document.
And the charter of our CARES committee is to ensure that our department holds up those values.
And it's the same thing. We don't investigate.
It's patterned on SIGARCH CARES. You go to our website, you'll see a lot of the same wording as SIGARCH CARES.
You know, we don't investigate. We're there as a sounding board to help our community navigate these issues so that everybody
understands that we care and these types of behaviors are completely unacceptable.
Yeah, I think that's wonderful.
And I feel like as a community member that CARES has made a difference.
And it's very interesting to me that it has gone essentially viral, insofar as something like this can go viral. And it's probably the most positive thing that we can say has gone viral in our community. So I think, you know, I probably speak on behalf of our whole community in saying that this has been a marvelous initiative.
And the fact that it has spread the way it has is,
as you say, reflective of a need.
So that's great.
I want to also sort of, you know,
circle back to a point you made earlier.
So this did not happen overnight either.
It was not easy.
A lot of people were involved, and it was definitely not easy to get this going.
Yeah. So I think one meta point to maybe draw away from this is, you know, one thing that's
clear when speaking with you is that you are guided by a North Star on multiple fronts,
both technically and morally, ethically, and all that kind of stuff. And once that North Star is
there, you just go towards it. Right. And I think a lot
of times it's very easy to try and do whatever is expedient or do whatever everybody else is doing.
You know, I think in Kim Hazelwood's episode, she also talked about this from a research perspective: everybody goes over here, everybody goes over there, just looking for the hot new place so that you can, you know, get your publications out or what have you. But across life and tech, it does seem very clear that if you have a North Star to kind of guide what you try and remain tenacious about, then eventually something does come out at the end that is positive. And so on that note, maybe we can turn around and ask you for general life and career advice
for some of our listeners, because I think it kind of spans career stage and industry
and academia.
Okay.
So I gave a talk recently at the Young Architects Workshop and tried to synthesize some of this. And just speaking to what you just said, which, you know, thank you for: I have done that many times, right? And sometimes you just fall flat, right? And I try to talk about those moments too, so that people realize that
it doesn't always work out, right? And then the other thing was to believe in yourself and to be
passionate. So you have to have the combination of all of these things and you don't always succeed.
And that's okay. If you truly believe in what you're doing and are passionate about what you're doing you will do it
you will do it and you may not succeed now but you will eventually succeed and there's then then
there's that you know believe in yourself thing right so you have to believe in yourself and just
keep going and then the last part of that talk was about the people.
And I started off this whole thing with people.
For me, people are very important.
Surround yourself with people who make you fly.
Just ditch those who bring you down.
Don't spend your time on them.
They're not worth it.
And there will be such people.
We haven't talked about gender issues much here.
You know, if you are in any sort of underrepresented part of the population, you've probably had
people who are not the best for you.
Ditch them.
Surround yourself with people who make you fly.
And then pay it forward.
Don't forget that.
You know, we started this whole thing with how, you know, there was a perceived dichotomy between service and fun, but really this is not service. You know, you talked about my extracurricular activities; I don't view these activities as extracurricular. I think it's part of being in the community. I want a community
that is a kind and encouraging community because that's
what makes me fly, right? So it's very self-serving. Wonderful. That's such great advice. And I think
a lot of times that is one thing that is good to kind of get out there where when you do things
that serve the community, often it does turn around to make it better for you.
And so that's one great way to kind of participate and motivate that kind of work. So this has been
such a wonderful conversation. Professor Sarita Adve, thank you so much for joining us today.
It's been so fun talking to you. I learned a lot of stuff and we're so glad you were able to join
us today. Yeah, it was a wonderful conversation.
Thank you so much for joining us.
And to our listeners, thank you for being with us on the Computer Architecture Podcast.
Till next time, it's goodbye from us.