Computer Architecture Podcast - Ep 10: Physically-constrained Computing Systems with Dr. Brandon Lucia, Carnegie Mellon University
Episode Date: November 19, 2022
Dr. Brandon Lucia is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. Prof. Lucia has made significant contributions to enabling capable and reliable intermittent computing systems, developing techniques that span the hardware-software stack, from novel microarchitectures to programming models and tools. He is a recipient of the IEEE TCCA Young Computer Architect Award, the Sloan Research Fellowship, and several best paper awards.
Transcript
Hi and welcome to the Computer Architecture Podcast, a show that brings you closer to
cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts,
I'm Suvinay Subramanian. And I'm Lisa Hsu. Today we have with us Professor Brandon Lucia,
who is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon
University. Professor Lucia has made significant contributions to enabling
capable and reliable intermittent computing systems, developing techniques that span the
hardware-software stack from novel microarchitectures to programming models and tools. He is a recipient
of the IEEE TCCA Young Computer Architect Award, the Sloan Research Fellowship, and several Best
Paper Awards. Today, we're really excited to have him here to talk to us about physically constrained computing systems, including
intermittent and orbital edge computing. A quick disclaimer that all views shared
on the show are the opinions of individuals and do not reflect the views of their organizations.
Brandon, welcome to the podcast. We're really happy to have you here.
Yeah, it's really wonderful to be here. Thanks for having me on today.
Yeah, we're super excited to talk with you. And so what's getting you up in the morning these days?
What's getting me up in the morning these days? Well, generally in my life, what's getting me up in the morning is doing yoga, which I've started doing basically every day.
And sometimes it's like the best part of my day. And I think professionally,
what's getting me up in the morning is everyone being back in physical spaces again and having
the ability to work with my students in person again. It was a long couple of years where
we weren't doing that. And it's really awesome to be back in front of like a whiteboard and like
doing research now that everyone's kind of back in physical spaces again. So that's been really,
really good. It's getting me feeling excited about getting up in the morning every day.
Good to hear. So did you have any students that came on to your team, your group rather,
in the middle of the pandemic that you didn't get to meet until recently, just out of curiosity?
I didn't have any students that joined and I was unable to meet in general. We've been
doing things like during the pandemic, we got kind of creative. We would have meetings outdoors in
the park. Pittsburgh has really good parkland right around CMU. So we would, you know, sort of work
in our offices alone with closed doors and masks and windows open and ventilation and all that
stuff, or just stay home for a long time. But we were, yeah, we were
having meetings outdoors. We got creative with it. There were a lot of tents up around campus. So
I was able to, you know, more or less interact with the
students that joined my lab, even the ones during the pandemic. That's good to hear. That's good to
hear. So speaking of getting creative, it seems like, you know, what we'd really love to talk
to you about is some of the stuff that you've been working on. They're kind of unified by this
notion of being physically constrained systems where, you know, you have really, really, really
limited physical resources that you have
to work with, which I assume requires getting really creative with how you utilize those
resources in order to get your goals accomplished and what you want the devices to do. So maybe you
can start us off a little bit by just telling us what it means to be physically constrained,
and then these two examples that you've been working on specifically like intermittent and orbital space-based
systems?
Yeah, yeah.
So physically constrained systems are ones where something about the environment means
you can't have more resources.
So in an intermittent system, that's a system that's extremely physically constrained by
the amount of energy that's available.
So in an intermittent system, we assume that you have some geometric constraints.
So the device has to be physically pretty small.
So, you know, think like sensor systems
with computers attached to them.
And that you're collecting your energy
from the environment.
And if you're very small
and trying to collect energy from the environment,
you might use a solar panel,
but your solar panel is also really small.
So you're not getting that much power from it.
Other examples exist where you can, you know, harvest radio waves and use that to power the computer system intermittently. And so the physical constraints of the environment, the need to be physically and geometrically small, and then the consequence of that, which is the inability to get lots of energy into the system, make for some really interesting computer systems problems.
That problem in the small, in intermittent systems, really shows up again if you shoot
your computer systems into space.
Things are bigger, things are geometrically bigger, but the types of computations that
we want to do are also bigger.
You have a similar problem where you're constrained in size.
Take the CubeSat, which is a kind of standard unit of small satellite that's getting sent up into
space these days. You can cover it with solar panels, but you can only get so many solar panels
on there without having some complicated mechanical apparatus. So we kind of have an
incarnation of the same problem where we're sort of geometrically constrained. And that constrains
the amount of power, which constrains, essentially it constrains the efficiency of the system. You
need to be so efficient if you want to do some useful quantum of work
that's dictated by your application.
So I'm really drawn to these problems because it's like you have a new set
of constraints that aren't maybe the typical ones.
Like we think a lot about finish within a deadline or finish without consuming
too much memory in the system, things like that that are, I don't know,
kind of computery constraints. And these physical constraints are something about the world that says, you know,
you have to deal with this, otherwise you're not solving the problem. So I think it's really
fascinating. And it's really like inspiring to work on stuff where something about the world
just says, this is how you have to do it. So that's what I think is really interesting about
these physically constrained devices, like tiny intermittent systems and little satellites.
That's quite exciting.
And you talked about a few different characteristics of these systems that are physically constrained.
Intermittence is one of the themes that shows up.
Can you maybe tell our listeners about how do these differ from conventional computing?
How do these constraints affect the way you think about how you design systems for these
particular applications?
Yeah, that's a really good question.
I guess the first most important thing,
given what I just described these as,
is usually we have to think about energy first.
And so in a lot of systems, the system designer
figures out the application requirements
and says, we're going to make this go super fast.
And also, if we can save some energy, that might be nice too. And then, so that's the sort of structure of the optimization
process for your application. You know, you get your requirements and you figure out how to make
it go fast or use less memory or whatever. And then all the way at the end, a lot of times it's like, well, it'd be nice if it didn't use so much energy. And that's changing.
There are more examples of systems today where people are putting energy first. And that,
you know, happens in the data center and everywhere else.
For a long time in my career, that wasn't really true.
Everything was about bottom line performance and then, oh, energy is nice too.
The big thing that we've been focused on, and this is one of the main physical constraints
that you put on a system is energy efficiency.
Thinking about that first, and it's not just because using less energy is intellectually
pleasing or we can pat ourselves on the back for consuming a little bit less of the battery or something like that.
In a setup like an intermittent system, if you're not energy efficient, the application doesn't work, because you're so heavily constrained on the availability of energy that if you're not efficient, you just can't do the work at all.
Which is fundamentally different from building large scale systems where you want
to decrease your power bill in the data center or something.
That's an admirable goal.
I think we should decrease the power bill in the data center.
I think that's another very good area of research.
But in these cases, it's not that the power bill goes down or the battery lasts another
40 minutes or something.
It's actually that if you're not energy efficient, the application just doesn't work. The same goes for satellites, really. If you're orbiting Earth, you can think
through the setup of the problem and see that this is true. You're orbiting Earth, so you have
about 45 minutes on the sunny side while you're flying around Earth, little cube satellite.
And if you're a very small device like a pocket cube satellite, you get hundreds of milliwatts,
like a watt on a good day kind of thing.
Depends on the orientation of the satellite solar panels and how they're pointing at the
sun.
And so you have only that much energy integrated over the 45 minute period of the orbit.
And you might have a little bit of energy storage.
And the systems we've been building, we've tried to get rid of using batteries.
There's some complexity in building systems around batteries.
And so generally, I like to not use batteries.
And so you have a limited energy storage reservoir.
So now you sort of have to optimize your system design under that constraint.
You have so much incoming power.
You have so much power consumption.
And you know that you're only going to be getting power for 45 minutes. And then, during one orbital period, you might want to be capturing images of the atmosphere, capturing images of clouds or the oceans or things like that, and piling up a bunch of data or processing it on orbit. That's what we've been looking at, how to process it on orbit. And so that imposes some amount of power consumption.
There's also, in order to make the system useful, some kind of minimum amount of computing you have to do along with that. It's like, if we can't process the entire image,
then the system won't be useful.
So you have to balance those constraints
against one another.
And we do have to think about sort of minimum
acceptable level of performance,
but we have to do that under this strict energy constraint,
where if we're not efficient, then the system doesn't work. We can't do the work
because we're so constrained by power and energy. So that, I think, is the biggest fundamental difference: if we don't think about energy first, our systems just won't work at all. And I think that's a really interesting constraint to work under, and it's really motivating for me.
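To make that orbital energy budget concrete, here's a back-of-the-envelope sketch. The numbers are illustrative assumptions drawn loosely from figures mentioned in the conversation (roughly 45 sunlit minutes per orbit and on the order of a watt of harvested power), not the parameters of any actual deployed system.

```python
# Back-of-the-envelope orbital energy budget (illustrative numbers only).

SUNLIT_SECONDS = 45 * 60    # ~45 minutes of sun per low-Earth orbit
HARVEST_WATTS = 1.0         # ~1 W from small solar panels "on a good day"
ORBIT_SECONDS = 90 * 60     # ~90-minute orbital period (sun + eclipse)

# Total energy collected while on the sunny side of Earth.
energy_per_orbit_j = HARVEST_WATTS * SUNLIT_SECONDS

# If compute must run over the whole orbit, the sustainable average power
# is the harvested energy spread across the full orbital period.
avg_power_w = energy_per_orbit_j / ORBIT_SECONDS

print(f"Energy per orbit: {energy_per_orbit_j:.0f} J")    # 2700 J
print(f"Sustainable average power: {avg_power_w:.2f} W")  # 0.50 W
```

The point of the exercise is how small the sustainable power figure comes out: the whole compute payload has to live inside a fraction-of-a-watt average budget.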
Yeah, that's a great overview
of the different kinds of problems and challenges.
You touched upon several different themes here.
I'll just sort of paraphrase them.
So you talked about energy first.
There were considerations around both power and the energy
cadence and the total amount of energy that's available.
I think there's an interesting aspect of the delivery system
itself.
You talked about batteries versus solar panels, or not having batteries at all, and that
impacts how you sort of deliver and store energy for these computing applications. So there are a
few different themes. Maybe we can sort of click on some of these things and see what that means
for the system design. So obviously in the age of accelerators, one ready thing that people think
about is specialization
techniques all the way from microarchitecture all the way up to programming models and so on.
So maybe that's one particular aspect, but it sounds like the energy system, the harvesting
system, the storage system, and the distributed system is a pretty unique consideration given
the landscape where these systems are deployed. So how does that sort of intersect with the system
design space?
Outside of specialization, what other things
do you have to think about in terms of,
do you have solar panels?
What cadence do they actually supply energy at?
Do you have a regular cadence?
Is it bursty?
And how does that affect the way you organize your tasks
and programs and the considerations
when you're running a particular application?
Because you have to do some quantum of work to make it useful before you can actually move on.
So how do you think about that very broadly? Yeah, there's a lot there. I'm glad you mentioned
specialization. That's a topic that we've been thinking about, you know, in these constrained
systems, what is the role of specialization recently? And you're right to point out everyone
in the architecture community is I'm sure aware of this,
the era of accelerators we're inhabiting right now,
everything is turning into an accelerator
and for each different variety of machine learning kernel
that people have decided is important,
there's another accelerator out there that's available.
I think that's really cool.
And I think it's been a really interesting time
to watch the computer architecture community evolve.
And I can think back to like, oh, maybe 2012, 2013.
I forget when the first round
of these deep neural net accelerator papers
was really hitting the scene.
It was right, I think right at the end of grad school,
something like that.
And it was really exciting to see these new kinds of architectures, where it's like, what is this? It doesn't have an ISA. How do I think about this? And that was a really interesting thing to see happen.
We have been kind of steering away from that a little bit. I think there is a role for
specialization and I think specialization is an important way for systems that are designed to
do one task really well to get a lot of performance and to get a lot of efficiency.
So specialization is a really important tool for architects, and I think it needs to be
in everyone's toolbox.
We've been looking at, in these systems especially where you're highly energy constrained, and
in some cases you have these performance requirements, like in the satellite, we've been looking at how we can not rely only on specialization to make systems energy efficient.
And we've got a group of collaborators.
This is work I'm doing with my collaborator, Nathan Beckmann, here at CMU.
We've been leading an effort around a new architecture.
It's a CGRA, a coarse-grained reconfigurable array architecture, and it's designed to be extremely energy efficient.
And so we sort of applied this energy first design principle to putting together a CGRA.
And so there's lots of funny choices since we were thinking about energy first. There's lots
of choices that you probably wouldn't make in a larger scale CGRA system where you're looking for
high performance. Just as a simple example, we don't do any fine-grained time multiplexing
on the processing elements inside of our CGRA
architecture.
That's an odd choice if you're looking at high performance,
because in high performance design,
you'd want to be multiplexing lots of operations
on your processing elements.
And so that's just one small example
of where we're making choices that are guided by optimizing
for energy first.
We make choices in the microarchitecture that you probably wouldn't make in a larger-scale design.
These choices, this whole design exercise around this architecture was informed by physically
constrained deployment scenarios.
So it's like, if you want to be processing images as they come in off of a camera, and you want to go 10 years on a AA battery, or you want to be operating off of a solar panel or very small amounts of energy, then you have to put this efficiency first.
And that's literally the thread we followed to get to the design that we landed on is looking at the set of physical constraints up front.
We looked at the data rate. We looked at the power consumption of what's up there now.
And we said, here's how far off we are, and here's how much we need to optimize. And then we moved forward through a series of designs; we learned lessons from each one and moved on to the next one. And we have a prototype CGRA now that is extremely efficient but avoids getting overly specialized for any of the workloads that it might execute. So you mentioned specialization. I think specialization is great, but I think that if we can get the level of efficiency that we need
with something like a CGRA that supports a general purpose class of workloads without being overly
specialized, I think that that's a good thing because we don't know what the next important
computation is going to be. So we want to support everything.
So Brandon, what you just said there, when you started talking about specialization versus not, it made me almost feel like this is a little bit tongue
in cheek that, you know, if you're going to send something up into space, you want to
give them like a buck knife and some duct tape.
That's what you need, right?
You could have something that like maybe makes your toast really, you know, perfectly crisp
or whatever, one for your toast,
one for your bagel, one for your coffee or whatever. But in the end, in that really, really constrained world, it almost sounds to me like you're trying to build the duct tape and the buck knife, so that you can be really generic, really efficient, really simple. And so that is an interesting choice. But
at the same time, you said the CGRA doesn't multiplex operations, and then you immediately started talking about image processing.
You know, in some of my previous lives, when we talk about image processing, there's a certain frame rate that you have to meet, right?
So like you were saying, if you don't meet it, you might as well not do it. And image processing is one of those where it seems like there's got to be a certain amount of processing done in order to make the image processing useful. And there's definitely a certain number of pixels that you have to get through, and to do that without any multiplexing at all seems to me like a potentially difficult prospect. You could have a four-pixel image, maybe, but maybe that's not very interesting. Do you ever get to the point where you're like, we just can't do this, or we can only do it if we make this CubeSat, I don't know, 30% bigger, and we just can't do that? Have you faced a problem like that, or have you always been able to sort of get creative your way around it?
Yeah, that's a really good question.
So I want to go back to something you said, this buck knife and duct tape thing.
I want the buck knife, the duct tape, and a compiler: the system will live and die by the ability to compile this broad set of programs down to it without having to go through a lot of system-specific pain to do that. The other thing you asked about is a really apt question. You do need
to hit some minimum level of performance, whether that's in an intermittent camera system that you hang on a tree to look for hedgehogs, or in a satellite where you're collecting a new frame, an image of Earth, every 1.7 seconds, which is actually a number we have to deal with.
You look down at Earth and you're flying around at 450 kilometers up from the surface of Earth and every 1.7 seconds you get a new frame. In the first case, in intermittent systems,
one trick we can play, because this is a unique domain,
a constrained domain, where we can actually
decrease the frame rate.
And in a lot of cases, the applications
that we would support still work with a decreased frame rate.
So a lot of camera systems would go, you know, 30 or 60 frames per second, depending on what you're trying to do. But if you're monitoring traffic on a road, or if you're,
you know, looking for flooding in crawl spaces, or if you're trying to spot rodents in a warehouse,
or whatever, like these kinds of applications where you might do like pervasive long lived
deployments of cameras, once a second is okay, once every five seconds might be okay.
Then, you know, you start to hit some thresholds. So say you're looking for birds on your rooftop. That's
just a random example that you might care about. You might want to know if there's pests in or
around your house. If you take an image every 45 seconds, that might not be so useful. This
is something you have to distill out from whoever is defining your application.
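The frame-rate thresholds being described fall out of a simple feasibility calculation: the harvested power divided by the energy one frame costs bounds the sustainable frame rate. Here's a sketch with invented numbers (not measurements from any real deployment):

```python
# Max sustainable frame rate under an energy-harvesting budget (toy numbers).

def max_frame_rate_hz(harvest_mw, energy_per_frame_mj):
    """Frames per second sustainable on average: power in / energy per frame.
    mW divided by mJ yields 1/s, so the units work out directly."""
    return harvest_mw / energy_per_frame_mj

# Illustrative: 5 mW harvested, 20 mJ to capture and process one frame.
rate = max_frame_rate_hz(5.0, 20.0)
print(rate)             # 0.25 Hz, i.e. one frame every 4 seconds
print(1.0 / rate)       # 4.0 second period
```

A traffic-monitoring deployment that tolerates one frame every few seconds fits this budget; a 30 fps requirement clearly does not, which is the wall being described.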
It's good to fit a broad range of applications.
And that means you have to have as much efficiency as possible
to make it feasible to even run these programs at all.
But then also, you have to deliver performance
that's acceptable to the applications.
And some of them just won't work.
Some of them, you will hit a wall.
You'll need to change something.
It could mean that you need to figure out a way to scale. That's a tricky thing to scale up when you're in this
extremely low power, power constrained and energy constrained operating regime. Scaling up is tough
because we don't want to increase the power consumption. And that's usually what happens
when you add more resources. Another thing we can do, I mean, we can increase the amount
of energy that we let the system store and then
run as like a burst.
And so this is like another strategy.
We can't do 30 frames per second all the time,
but we can do 30 frames per second for a little while,
for two seconds.
We found something really, really important
we want to look for.
But in order to do that, you need to actually imagine
a system where you have an energy
harvester and a capacitor.
You want to fill up this whole capacitor,
and then you can just slam through all the energy
all at once doing high frame rate processing.
If you do that, then you have to turn the whole system off
and charge up again, or you have to heavily duty-cycle, and you can only do a very low average power operation in the wake of that burst of operations that you did.
Those are both tricks that you can play. I think in general, scale is tough, though, because you generally add more resources, and the more resources we add, the higher the power consumption. And so it becomes difficult to maintain the level
of efficiency that we need.
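The burst-then-recharge trick can be sketched with the standard capacitor energy formula, E = ½CV². All the component values below are invented for illustration; a real design would pick them from the harvester and workload at hand.

```python
# How long can a capacitor-powered burst last, and how long to recharge?
# All values are made up for illustration.

def usable_joules(capacitance_f, v_full, v_min):
    """Energy available between full charge and minimum operating voltage,
    from E = 1/2 * C * V^2 evaluated at both voltages."""
    return 0.5 * capacitance_f * (v_full**2 - v_min**2)

def burst_seconds(capacitance_f, v_full, v_min, burst_power_w):
    """Duration of a high-power burst before the capacitor is drained."""
    return usable_joules(capacitance_f, v_full, v_min) / burst_power_w

def recharge_seconds(capacitance_f, v_full, v_min, harvest_power_w):
    """Time to refill the same energy at the (much lower) harvested power."""
    return usable_joules(capacitance_f, v_full, v_min) / harvest_power_w

# Illustrative: 1 F supercap, 5 V -> 3 V swing, 2 W burst, 50 mW harvesting.
print(burst_seconds(1.0, 5.0, 3.0, 2.0))      # ~4 s of high-rate processing
print(recharge_seconds(1.0, 5.0, 3.0, 0.05))  # ~160 s charging back up
```

The asymmetry between the two numbers is the duty cycle Brandon describes: a short burst of 30 fps work, then a long quiet period while the harvester refills the capacitor.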
So another thing that we can do,
if you look at the satellite use case, is to use the design of a distributed system: if you have deficient performance on a single satellite, we can build out a distributed system that can share the work.
So something we looked at in our 2020 paper
on orbital edge computing is you
have data arriving and there's way more data than we could feasibly send to
the ground. So that's not an option. And as the system was when we designed it, we had matched the power consumption of our compute to the input power.
So we were just able to, you know, just use up our energy when we went into the eclipse. And if you have just that amount of compute, it's actually more data that we collect in each frame
than we can process before we get to the next frame.
So you only have like two-ish seconds to process each frame.
And so it's really hard to keep up.
And so if you just launch one satellite,
you actually end up missing some of the frames,
some of the images.
And what we did instead of that is we assume that you have 10 or 50 or 100 small satellites,
which is, by the way, way cheaper than launching one really big satellite.
Like orders of magnitude less money to launch 10 or 50 small satellites.
So it's still a good thing to do, even though you're increasing the amount of hardware
that you're putting into orbit.
But if you get those satellites up there
and you tell them to work together,
each of them can, for example,
each of them can grab the same frame.
You tile the image up, and the first one in the group says, I'll handle the first few tiles, the first four or whatever. And the second one in line says, okay, I'll take the same image and grab the second few tiles.
And so as long as there's not something happening on the ground
that changes very quickly, like it would change significantly
in the time between when satellites
were looking at that same spot, they can actually
distribute the work without having to communicate at all
even, which is a really cool aspect of this.
Normally, you think of a distributed system
having some communication medium so
that the satellites or distributed components
can coordinate.
Here, we don't even need to have that.
We just say: when you hit this GPS coordinate, grab another frame. And you, the satellite, know that you're always responsible for this set of pixels.
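That coordination-free tile assignment can be sketched in a few lines. The constellation size, tile counts, and function names here are illustrative assumptions, not the actual orbital edge computing implementation:

```python
# Sketch: communication-free work sharing across a satellite constellation.
# Every satellite captures the same frame when it passes a trigger GPS
# coordinate, but each one processes only the contiguous slice of tiles
# that its position in line assigns to it -- no messages exchanged.

def my_tiles(sat_index, num_sats, num_tiles):
    """Tiles this satellite owns, computable locally from its index alone."""
    per_sat = num_tiles // num_sats
    start = sat_index * per_sat
    return list(range(start, start + per_sat))

# Illustrative: 4 satellites, a frame tiled into 16 pieces.
for sat in range(4):
    print(sat, my_tiles(sat, 4, 16))
# Satellite 0 takes tiles 0-3, satellite 1 takes 4-7, and so on:
# together they cover the whole frame exactly once.
```

Because the assignment is a pure function of each satellite's own index, the constellation covers every tile without any radio coordination, which is the property being described.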
And so one of the common things that we would do to images is try and segment out the parts of the images that are clouds and then look for objects on the ground. Like, you know,
look for, we always think of, you know, sort of search and rescue mission, look for the plane
wreckage at sea or something like that, or look for the signs of wildfires. And so for that,
you're running kind of, you know, multiple neural nets or other kinds of models on these images.
And so the compute there can be pretty heavy lifting.
And so we find that you actually, to get coverage of your entire orbit, you need to distribute
the computation in this way.
There's not really a way to do it locally on a single satellite.
The level of efficiency would have to be extremely, extremely high, and you'd still have to be
hitting a good performance target.
So it's just a really tough problem to solve
inside of a single satellite.
We haven't given up on that.
We're still optimizing for the single satellite case
to get as far as we can,
because that benefits the distributed case anyway.
But doing it in a distributed system
in the way that I just described
is a way of kind of getting around the problem
by using system design
instead of just microarchitecture design
to solve the problem,
which is, I think that's in general,
a good thing to do.
Think about how you can attack the problem
from a different layer of the system stack.
And it might actually be easier if you do that.
So that's how we deal with these kinds of performance challenges
that you get in these highly constrained systems.
Yeah, the energy performance and capability trade-off curve
is definitely really interesting.
I wanted to circle back to one of the additions that you made to Lisa's buck knife analogy, which is having a compiler as well. So maybe we
can double click on the compiler and associated tooling for these kind of systems that you build.
So in an energy-first world, what tools do you need to equip a programmer with? What sort of
guidance do you need to give to a programmer so that they can construct their programs, or break up a program into tasks and things like that, in a way that matches the constraints available in the system? For example, if you're coming from a
performance-first world, you could annotate the total number of FLOPs, or how many bytes you're accessing in a particular kernel, and you could use that to determine, okay, what's the performance
that you can hit? And that's feedback that you can give to the programmer; that's one way the programmer can reason about things. In an energy-first world, where you know you only have this much energy to work with, these many joules or this many milliwatts of power available, what sort of feedback does your compiler need to provide? What sort of tools do you need to provide to the programmer so that they can structure their computations and know that, OK, this unit of work
can be completed within some period of time?
Or if not, like, I need to break it up into further tasks
or annotate it in some other way.
What sort of feedback loop do you need?
Like, what do the tools look like here?
What does the compiler need to do?
How do you think about static program analysis
and other kinds of techniques in this particular space?
This is a really cool question.
And there's some things that we've done and some things that I wish someone would do.
And maybe we'll do them eventually.
The question about what tools does a programmer need?
It really depends on what the programmer is trying to do.
If the solution to the problem is like I described before, you know, this is one solution to
the problem of dealing with highly power constrained
and energy constrained systems.
If the solution is to develop a new coarse-grained
reconfigurable array architecture that
has this funny hardware software interface,
the compiler needs to exist.
And that's like sort of step zero.
We need to have a compiler that lets you write some Rust code
or whatever the fashionable language in five years
is going to be,
and then target that to your system.
And if you don't have that, then the thing
won't get out of the starting gate.
So just as a minimum, having the ability
to compile general purpose code.
And that's hard, actually.
Compiling very general outer loops with irregular control flow and irregular, sparse memory access patterns to any old architecture, CGRA or otherwise, that's a tough problem.
That's an open problem, actually.
And I think that it's a really important one
if we care about energy efficiency.
Because being able to target the system at all
and then being able to generate code that will execute
efficiently on the system, I think
that that's something we need to be thinking about as a research
community.
And people are.
Absolutely, people are thinking about this problem
in the research community.
So the first thing is just having a compiler that works. The second thing is maybe having a compiler that helps the programmer understand their use of energy. And this is a problem I feel like we've come back to every six months for, like, five years or more. The problem is: I have an arbitrary block of code
in my program, and I would really
like to know how much energy will that consume on my system.
And that's a pretty general question.
And of course, you're probably thinking right now,
well, that depends on the microarchitecture,
and it depends on the state of the cache
when I start executing this part of the program.
And so of course, any model that's
going to be useful to make that kind of estimate
needs to account for lots of things that
are within the microarchitecture.
But in the systems that we look at,
that kind of tool also needs to think about externalities,
like what's the state of all of the peripheral sensor devices
when we start executing this code?
And what is the state of the environment? Because some power systems will vary in their efficiency, their ability to deliver power to the system, based on the amount of incoming power in the environment. And so you have sort of funny effects where you have more loss through the power system depending on environmental conditions. So building a software analysis, and exposing the right amount of information from the system to make the analysis possible, that's a really tricky problem to solve, because we don't know exactly what information we need to expose. And then operationalizing it inside of a compiler to produce a useful estimate, you know, "this block of code will take two millijoules, done," that's a really tough thing.
My view on this, and this is, we had a paper in 2018
on this topic, but this is kind of my relatively unproven view,
I guess, is that we have to think about this problem,
I want to say probabilistically, or in terms
of the distribution of behavior that we might see at runtime.
And that may be enough to help the programmer understand how they need to change their program so that it's either more efficient or that it works within the constraints of their system.
I don't think we'll ever have a perfect tool that you give it a piece of C code and it says for that system, it'll take two millijoules.
I just think that's an unrealistic goal.
But I think a more refined view, and this might be possible, we're working on this problem.
We don't have a solution yet, but I think it's a very fascinating space to be working in,
is to build up a distributional model. Here's the set of behavior that we might see in the actual environment. And here's the set of environmental characteristics that we care about.
And you put this all together and it might become part of a programming model or become part of a
tool flow that
goes all the way from the language
down to the hardware software interface.
And it may include microarchitectural changes
where we need more information from the system
to sort of buttress the software pieces
to make all this possible.
So I think that's some of the compiler challenges
that I see that are unique to this space.
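To make the distributional idea concrete, here is a hedged sketch, not any real tool's analysis, of how a Monte Carlo approach might combine a per-block energy cost model with a model of environmental variation to produce a distribution rather than a single number. All function names, cost constants, and the shape of the environment model are invented for illustration.

```python
import random
import statistics

def estimate_energy_distribution(block_cost_fn, env_sampler, n_samples=10_000):
    """Sample the energy cost of a code block under randomly drawn
    environmental conditions, returning the empirical distribution."""
    return [block_cost_fn(env_sampler()) for _ in range(n_samples)]

# Hypothetical cost model: energy scales with loop iterations, and the
# harvester's conversion efficiency varies with incoming power.
def block_cost(env):
    iterations, efficiency = env
    raw_mj = 0.002 * iterations      # 2 microjoules per iteration (invented)
    return raw_mj / efficiency       # losses through the power system

def sample_env():
    iterations = random.randint(100, 1000)     # data-dependent loop count
    efficiency = random.uniform(0.5, 0.9)      # environment-dependent losses
    return iterations, efficiency

random.seed(0)
dist = estimate_energy_distribution(block_cost, sample_env)
p50 = statistics.median(dist)
p99 = statistics.quantiles(dist, n=100)[98]
print(f"median ~{p50:.3f} mJ, p99 ~{p99:.3f} mJ")
```

A tool built on this idea could flag blocks whose tail (the p99 here) exceeds the energy buffer even when the median looks safe.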
We do care a lot about energy.
And we have some peculiar constraints
that require information to flow from the language
to the architecture, from the architecture to the language.
And we need to match all of that stuff
together with analyses that give the programmer something useful
so they can make their program better.
Or, better than that even, analyses that allow the compiler to automatically make your program better. Wouldn't that be nice? And so there are ways to do that that I think are going to be important going forward, as these systems find more users in the world.
That was really interesting, Brandon. So when you're talking about this block of code, and coming up with a statistical view of how much energy this particular block of code takes, depending on the state of the system, which is now much bigger than what computer architects have historically thought about, which is just the microarchitecture itself, but now the system is everything, you know, is it cloudy or whatever, all that kind of stuff, do you think,
though, that in this particular domain, in how we reason about the model, we can say, oh, this code is good enough? You know, a lot of the things that we think about in the data center are things like the p95 or the p98 or the p99.5 or what have you. But here, if it's too much energy, the thing does not work. Do you think you'll have to go so extreme as to say, okay, it may be probabilistic, and here's maybe an average-case piece, but I need to understand exactly what the very, very worst case is and try and avoid that worst case? Because whether you design around p50, whether you design around p99, or whether you design around the absolute worst case are very different ways to think about it. And what you've said earlier is that if you run out of energy, it doesn't work. It seems like you have to pull yourself all the way to the extreme, in which case is a probabilistic model helpful?
Great question.
The state of the world today
is that a programmer writes a program
and then they say, well, I hope this works.
And then they deploy it in 10,000 devices, and hopefully it works. So they have no way of predicting this at all. I think getting a precise estimate is hard, and in a lot of cases it's infeasible. When you have a non-trivially complex system, I think it's an infeasible analysis, because there are too many factors, and they all sort of crash into each other and make it really difficult to produce single-number estimates. When the question is energy, you are right to observe that if you run out of energy, then the system will have to stop and wait for a long time, which might not be catastrophic, or, if you're in a tiny intermittent system, it might just mean power off and hopefully we get more energy later.
So there's a few things you can do.
My student, Kiwan Maeng, had a paper on this, on what to do if something seems like it's not going to make any more progress because you keep running out of energy.
So the setup here is you have some block of code
and you know that it needs to execute atomically.
So it needs to happen all at once.
And hopefully your system is provisioned with enough energy to run that block all at once. Well, if it
seems like it's not making any progress, then you have to do something else. That was the
realization that we had. And so one thing you can do is sort of the default, which is
just keep banging your head against the wall and it'll run and be stuck forever. That's
not a very good way to design the system though. So another thing you can do is have an approximation procedure
for your algorithm.
So you can say if you're doing, I don't know, for example,
if you're processing an image, you
might subsample the pixels in the image
and process effectively a smaller image.
But you can make progress through this iteration
of the loop.
And then maybe the next time around the loop,
the power conditions improve.
And so you have more energy at your disposal to do the work. So that kind of thing is thinking about energy, but then also thinking about the software runtime system. That's something where the programmer would need to know what amount of energy I expect this piece of the program to consume, in order to make choices about those refinements, those sort of degradation options where you would tune down the quality of the result to decrease the amount of energy that it would consume.
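A minimal sketch of that kind of degradation decision, under an invented per-pixel energy cost model (the numbers and names are hypothetical, not from any real system): the runtime picks a subsampling stride so the estimated cost of one loop iteration fits in the energy currently available.

```python
def process_image(pixels, stride=1):
    """Process every stride-th pixel; stride > 1 trades quality for energy."""
    sampled = pixels[::stride]
    return sum(sampled) / len(sampled)   # stand-in for real image processing

def run_with_degradation(pixels, energy_budget_mj, cost_per_pixel_mj=0.01):
    """Pick the smallest stride whose estimated cost fits the budget,
    so the iteration can complete instead of stalling until power returns."""
    stride = 1
    while len(pixels[::stride]) * cost_per_pixel_mj > energy_budget_mj:
        stride *= 2
    return stride, process_image(pixels, stride)

pixels = list(range(1024))
stride, result = run_with_degradation(pixels, energy_budget_mj=3.0)
print(stride)  # 4: 256 pixels * 0.01 mJ = 2.56 mJ fits the 3 mJ budget
```

The next time around the loop, if harvesting conditions improve, the same check would naturally select a smaller stride and a higher-quality result.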
If you have no estimate of how much energy the thing is going to consume, though, then
it's very hard to make those kinds of judgments.
And that really is the state of the world today.
There's just not good tools for making any kind of estimate at all.
But I think that having the distributional behavior
will help understand if there is maybe an outlier case
that you should be aware of,
even if you don't see it during your testing,
when you have the thing on your lab bench,
that's when you can make changes to it.
If you know that there are modes in your distribution, and you know there is this one mode that's way over there on the right, and it's a very high energy consumer, even if it's rare, you probably want to have some kind of mitigation baked into your system. It could just be a watchdog timer that notices that
you're stuck and then restarts the whole thing periodically. But that's not very sophisticated
and it means you're giving up. So it's kind of interesting to think about how you can identify
those modes in advance, even if they're rare, and then accommodate them in the runtime system.
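As a toy sketch of that kind of mitigation, simulated in plain Python rather than on a real MCU, and with all names invented: a counter in non-volatile memory tracks failed attempts at a region across reboots, and after a limit the runtime switches to a cheaper fallback instead of retrying forever.

```python
class NonVolatile:
    """Stand-in for non-volatile memory that survives power failures."""
    def __init__(self):
        self.attempts = 0
        self.degraded = False

class PowerFailure(Exception):
    """Models the device browning out mid-region."""

ATTEMPT_LIMIT = 3

def try_atomic_region(nv, work, degraded_work):
    """If the same region keeps failing (a rare high-energy mode),
    fall back to a cheaper variant rather than being stuck forever."""
    if nv.attempts >= ATTEMPT_LIMIT:
        nv.degraded = True
    nv.attempts += 1
    result = degraded_work() if nv.degraded else work()
    nv.attempts = 0  # reached only if we didn't lose power mid-region
    return result

def full_work():
    raise PowerFailure  # pretend this region always exhausts the buffer

def cheap_work():
    return "degraded result"

nv = NonVolatile()
for _ in range(10):  # each iteration models one reboot
    try:
        out = try_atomic_region(nv, full_work, cheap_work)
        break
    except PowerFailure:
        continue
print(out)  # "degraded result" after ATTEMPT_LIMIT failed attempts
```

This is the "notice you're stuck" idea without giving up entirely: instead of a watchdog restart, the system degrades and keeps making progress.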
And again, this is kind of an open problem,
because producing these sort of distributional estimates
is something that we can't really do with the tools today.
And then reacting to them is only something
that people are beginning to build systems that
can handle that kind of work.
Yeah, it sounds like approximation
is a useful tool in your toolkit for these applications.
Another technique that comes to mind
is checkpointing and then continuing execution from there
once you have the energy.
Is that something that's commonly deployed?
Are there any unique challenges for checkpointing and recovery
for these kind of systems?
Because it sounds like it's bread and butter for systems
where you're energy constrained.
Yeah, so checkpointing is actually a part of a lot of intermittent systems. In order to make progress
when you're energy constrained like this, if you ever anticipate the system will run out of energy,
there needs to be some strategy in place to maintain just basic forward progress in your
application. And in order to do that, there are lots of different ways to do it. People have been studying this for a long time now. But looking at the pieces of state that you need to capture: sometimes it's just the register file, and sometimes it's, you know, the registers and the stack and whatever. You can do it with software techniques, and you can do it with little hardware widgets that we can add to the microarchitecture.
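As a toy illustration of the software side (a hypothetical model, not any particular system's mechanism): a checkpoint copies the volatile state, here registers and a stack, into memory that survives power failure, and a reboot restores from it instead of starting from scratch.

```python
import copy

class MCU:
    """Toy machine state: registers and a stack, all volatile."""
    def __init__(self):
        self.registers = {"pc": 0, "r0": 0}
        self.stack = []

nv_checkpoint = None  # models FRAM/flash that survives power loss

def checkpoint(mcu):
    """Persist the volatile state (registers + stack) to non-volatile memory."""
    global nv_checkpoint
    nv_checkpoint = copy.deepcopy((mcu.registers, mcu.stack))

def restore():
    """On reboot, resume from the last checkpoint rather than from reset."""
    mcu = MCU()
    if nv_checkpoint is not None:
        mcu.registers, mcu.stack = copy.deepcopy(nv_checkpoint)
    return mcu

m = MCU()
m.registers["pc"] = 42
m.stack.append("frame0")
checkpoint(m)
# ...power failure wipes all volatile state...
m2 = restore()
print(m2.registers["pc"], m2.stack)  # 42 ['frame0']
```

Real systems differ in what they capture and when, which is exactly the design space the discussion above describes.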
The ISCA best paper this year actually was a mechanism for better architectural support for backups.
That wasn't my work.
That was Joshua San Miguel's lab produced that work,
and I think it's very interesting stuff.
You know, checkpointing is really a foundational technique
in intermittent systems.
If you don't have some mechanism for
checkpointing, then you can't make any forward progress. Things get a little more interesting
when you need to do a whole bunch of work atomically. And atomically here means you
can't take a checkpoint in the middle of it. You have to do it all at once while the system is
turned on. And so you can think about reasons you might want to do that. So you're grabbing data
from multiple sensors. You want to process it and then spit something out the radio to send an alert because those sensors said there's something that's interesting here.
So if you need things to happen atomically like that, then checkpointing actually doesn't work.
If you take a checkpoint, it might be the worst case scenario where you collect your sensor readings and then take a checkpoint.
And then maybe the device turns off at that point. And maybe it's off for like 10 minutes.
And maybe those sensor readings are now totally outdated
and don't mean anything.
Well, if you've checkpointed them,
when the device turns back on, what happens
is you run into the end of that region that really
should have been atomic.
And so you send your radio message,
and that doesn't correspond to reality anymore
because you have this long delay in the middle.
So there, checkpointing is actually a way to break the system. The model that we've converged on
in a lot of the systems that we're building these days, and I'm not saying this is the right or the
only model, this is just the one that we keep coming back to because it's useful for us,
is to do checkpoints when the programmer allows us to. So as long as something hasn't been marked as
an atomic region, we can grab a checkpoint right before the system turns off. And that's usually
sufficient. You can make sure that everything is, you know, all the states lined up and correct and
your non-volatile memory is correctly persisted because you treat the checkpoint as a persist
point. But then when you have work that needs to happen atomically like that,
the programmer needs to annotate those regions into their program.
There's not really any way to infer that
because that's something the programmer says is a property of their application.
So when they have those kinds of regions,
then we need to be more careful with how we match the application to the power system.
If the atomic region will consume too much energy, or if the atomic region may consume too much energy, those are cases we need to look out for.
The latter case, where it may consume too much energy,
is actually a pretty interesting problem.
You might see it work every single time on your lab bench
using your benchtop energy harvesting setup.
But then when you deploy the system,
maybe some input is a little bit different, and things are consuming a different amount of power, and now your atomic region, which must execute to completion, requires more energy than you'll ever have stored in your energy buffer on your system. So you're kind of stuck, and that's a tricky problem for an analysis to help the programmer get around. I think it's an interesting one, the problem of needing to do atomic work like that, because it comes right back around to what got me interested in this whole area to begin with. Usually those things are tied to a physical constraint of the system. Those two sensors that I mentioned, maybe you have to always collect those two sensor values together,
and that's because of something in the environment. If you're monitoring temperature and pressure together,
the reason you need to collect those together is because those properties are coupled by the
environment in which you deploy the system. So it's a physical constraint that entails the atomic
region to begin with. And that's what gives us this problem to solve. So I think that's an
interesting aspect of this problem too. It's usually because of some environmental physical
constraint that we have to do atomic work like that.
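A hedged sketch of that model (the API names are invented, not from any real runtime): the programmer marks atomic regions, checkpoints are only legal outside them, and the runtime refuses to enter a region whose worst-case energy exceeds what the energy buffer can ever store, since powering off mid-region would leave stale sensor readings behind.

```python
import contextlib

class IntermittentRuntime:
    def __init__(self, capacitor_mj):
        self.energy = capacitor_mj   # max energy the buffer can hold
        self.in_atomic = False

    def maybe_checkpoint(self):
        # Checkpoints are only taken outside programmer-marked atomic regions.
        if not self.in_atomic:
            pass  # persist volatile state here (elided)

    @contextlib.contextmanager
    def atomic(self, worst_case_mj):
        """Programmer annotation: this block must run to completion
        on one charge of the energy buffer."""
        if worst_case_mj > self.energy:
            raise RuntimeError("atomic region cannot fit in energy buffer")
        self.in_atomic = True
        try:
            yield
        finally:
            self.in_atomic = False

rt = IntermittentRuntime(capacitor_mj=5.0)
with rt.atomic(worst_case_mj=2.0):
    readings = ("temp", "pressure")  # coupled values, sampled together
print(readings)
```

The hard part the discussion highlights is the `worst_case_mj` number itself: the runtime check is easy, but producing a trustworthy worst-case estimate is the open problem.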
So I was sort of interested in what the debug cycle looks like.
You were talking about how currently the state of the world
is programmers just write an application, deploy it,
and hope that it works.
What does the debug cycle look like today?
And ideally, what would it look like?
I would say that the debug cycle for intermittent embedded systems,
it has all the punishment of the debug cycle for embedded systems and then it has the additional
level of punishment of having to deal with energy and the power system and the environment on top of that.
So there's this additional complexity
that you're unfortunately forced to face.
And sometimes things just work.
Sometimes you don't have these killer bugs, and then sometimes it's not that way.
And you have a program that takes too much energy
and you don't know why.
And so you have to go through and figure out how did my sensors get misconfigured at this point in the
code? How did I end up in this loop for more iterations than I anticipated? Reasoning about
my program in advance, I thought that we wouldn't be able to get stuck here and consume this much
energy. Like I was saying before, the tools for understanding energy consumption are fairly
rudimentary and there really isn't a good standard way to debug the energy consumption of a program,
although there are lots of people working on that problem. I don't mean to take anything away from
that because there's an active area of research in the compilers and systems community and there's a
lot of good work going on there. But there's not a standard way to debug the energy consumption or the power consumption of a system today, and I think it will be useful to have tools in the future that let you do things like an energy efficiency regression test. You know, we don't have that concept now, but it would be nice if, when I change my program, I could know, well, hey, something just got less efficient. Are we badly using microarchitectural resources in a way that we didn't anticipate? Or did you just misconfigure
sensors? Or, you know, even which level of abstraction to look at is a question there
for that kind of thing. But having some framework to think about this would be very useful. And I
don't think there is a standard way of doing that right now. I think it's interesting that you
mentioned that because a parallel on the other end of the spectrum, which is like high performance computing or data centers, is that energy efficiency is becoming increasingly important.
Then you have things like machine learning computing, which does consume a lot of power.
And there are concerns around like, can we keep going at this rate forever?
And there's a renewed push towards sort of measuring the power consumption of the models
that you train, for example, right?
And accounting for those things very carefully.
Of course, it's not the same set of parameters as on the edge side, but the broad strokes
are similar.
Like think about energy, think about power very carefully from the get-go, have tools
that enable you to reason about the power consumption of your system, and so on and so forth. So there might be some interesting convergence of themes over here, and I'm very curious to see how this shapes up.
Yeah, I think generally it's nice to imagine a tool that helps you understand: if I make this change in my program, what happens to the total amount of energy that the system will consume?
And that applies very generally. That's a very broad problem statement.
I would like it if there was a tool that solved that for all domains.
I have a feeling that there will be different constraints depending on which domain you look at.
You know, the data center, or the kind of conventional edge devices, which are a little larger, or tiny beyond-the-edge little sensor devices: all of those are going to have different constraints. And I would expect that there might be some similarities, but we need to think about different things to answer that question at each of those different levels.
So along the lines of Suvinay's question here about the debug cycle, something you said, Suvinay, made me think of this. Earlier, Brandon, you said that the way of the world right now is a programmer writes the code, deploys it to 10,000 devices, and hopes that it works. So presumably you've done a lot of benchtop testing to think about the energy consumption. In the sense that, let's say you are sending out 10,000 mini satellites into space, do they come preloaded with the program? Is it a deploy once and you're done? Or can you debug live and say, oh, actually, I want to make a change? In which case, I can't imagine it's low energy to then send a new program up. Or do you collect them back? How does this whole process work?
Yeah, that's a, that's an awesome question. And a really interesting part of designing
an orbital computing system stack. For our latest satellite design, the Tartan-Artibeus satellite,
we actually built in the ability for, so the system has a receive chain, we can talk to it from Earth.
And we built in the ability to replace part of the program using a sort of protected region of code that functions like a bootloader.
So we can do dynamic software updating from Earth. Our thinking there was we might want to change what the thing does over the course of its lifetime, to vary the workload from one experiment to another, or update our...
You think about how this would be used in a user model.
You want to update the neural network model that you're running or something.
You want to send up a new batch of weights and those get plugged into the memory in the right spot.
We have the support in our platform, in our satellite bus to do that.
That's good because these satellites might be deployed for five years.
They might be up there for five years before their orbit decays and they fall into the
atmosphere and they become unusable.
I mean, they turn into dust, I guess.
But they're up there for a while.
And so it's nice to deploy with the ability to do this updating loop.
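One hedged way to picture that protected update path (purely illustrative; the real Tartan-Artibeus mechanism surely differs in detail): the bootloader verifies an uplinked image's checksum before overwriting the application region, so a corrupted radio frame can never replace the working code.

```python
import zlib

def make_update_packet(new_app: bytes) -> bytes:
    """Uplink frame: a 4-byte CRC32 followed by the application image."""
    return zlib.crc32(new_app).to_bytes(4, "big") + new_app

def bootloader_apply(packet: bytes, app_region: bytearray) -> bool:
    """Protected bootloader: only overwrite the app region if the image
    survived the radio link intact; otherwise keep the old application."""
    crc, image = int.from_bytes(packet[:4], "big"), packet[4:]
    if zlib.crc32(image) != crc:
        return False  # corrupted uplink; keep running the old code
    app_region[:] = image
    return True

# E.g. swapping in a new batch of neural network weights:
app = bytearray(b"old-weights")
pkt = make_update_packet(b"new-weights")
assert bootloader_apply(pkt, app)
print(app.decode())  # new-weights
```

The essential property is that the bootloader itself stays immutable and small, so an update that goes wrong degrades to "keep the old application" rather than "brick the satellite."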
There are things that are a property of being a physically constrained system that can be
very difficult to deal with that aren't really related to this, you know, updating the workload,
updating the application part of the system.
So for example, if you have a bootloader or like a privileged, I don't want to say operating
system because it's not quite an operating system.
I don't think there's a good example of one of those for satellites in general yet,
although that's an interesting problem too. But the sort of core software that manages all of
the devices and lets the different subsystems communicate to one another. If you have bugs
in that part of your satellite system, or you have unanticipated conditions that could lead
to increased energy consumption, or you have some condition in that part of the code that doesn't
deal well with fluctuations in power that you didn't anticipate, if any of those lead to a failure, a hard-stop failure, then you can't use your application updating mechanism, because the satellite in a lot of cases won't turn on, can't use its radio, can't point its camera, whatever. Whatever subsystem you want to use, you can't use it if there's a problem in that core of software that drives the whole thing. So you can do benchtop testing, and we did a lot of testing. Usually you build out what's called a flat sat, which is the satellite all taken apart, with all the different boards connected by wires on the benchtop. It's a very precise prototype: it's exactly the boards that you're going to deploy, but just laid out on the table. And so you do a lot of debugging there.
You can do power cycling experiments. We have a big lamp in the lab that
is a good approximation of sun on orbit.
So we can use that to illuminate our solar panels
and watch how the system behaves under different power
conditions.
And then a lot of just functional testing too.
You want to see what happens if you receive a radio packet
at the same time that the system is shutting down because you're going into
eclipse or something.
You want to test all of these conditions.
And they're really tricky things to test because this is a fairly involved and fairly complex
embedded system.
And we have to think about these environmental concerns.
So you do a lot of this upfront testing.
And that's the model that we use right now.
You do as much testing as you can
and try and validate your design assumptions
and show that things are as robust as they can be.
And then you hope for the best when they deploy.
In large-scale satellites, they'll use redundancy a lot.
So on larger satellites, where the whole thing is on the order of a billion dollars to put together and launch, you don't launch one computer. You launch multiple, and then you have failover. And of course you do it that way, right? But with tiny satellites, the whole satellite build is 500 bucks or something. It's very inexpensive, in some cases even less than that. So doing redundancy is at odds with the cheap, small, almost disposable ideology behind doing small sats to begin with. What it ends up being, actually, is that you get your redundancy by launching more satellites, because they're so cheap to just build out entirely.
Very cool, Brandon. Listening to you talk about atomics at, you know, sort of the PL level,
you know, like concurrency issues, PL type stuff, you know, very high level programming type things.
And then now you're at a stage in your career where you've turned around and built real systems
that are being deployed into some of the harshest environments that you could imagine, a very harsh environment like outer space. Whereas a lot of computer architects proper, who dealt with microarchitecture stuff, have actually not built systems at all. So I'm just very curious to hear you talk about your trajectory: A, how you went from these high-level atomics, and what got you into this? Because I remember the first time I saw you give a presentation on these space systems, I was like, when did Brandon get into this? And how did this happen?
And so maybe you can talk a little bit about A, that trajectory, and then B, a little bit about
just the notion of building real systems. Because I think there's a lot of computer architects,
we churn out a lot of them who have never built anything that you can touch. And so how did you get to where you are?
That's a cool question. I was just thinking of how long ago it actually was when I was in grad
school. And I think it shocks me. It makes me feel very strange to say that 15 years ago,
I was in grad school, which seems like, I don't know, 15 years seems like a long time.
I know. I know what you mean.
So when I started my career as a PhD student at the University of Washington,
I was enthusiastically interested in programming languages, and I realized while I was there that I was actually not just interested in programming languages, but I was interested in what's behind it that makes it run.
It was sort of the programming language was this nice abstraction.
And it's fun to learn about how those work and to show how you can solve important problems by defining them precisely using a language.
But I got very interested in what's underneath. And so I think my career trajectory has been, to some degree,
it's been a series of trying to understand what's underneath.
And so that led me from PL into, you know, my PhD dissertation work was on tooling and systems and architecture.
There's a thread of concurrency
throughout all of that stuff.
That was what I was really focused on.
But I found it very interesting to see
how the layers interact with one another
and what kinds of problems fall out
from correct and unintended interactions
between the different layers in the system.
I love memory consistency models because they
are that kind of cross-layer puzzle.
It's a topic I like to think about.
So that kind of led me down into the lower levels of the system
stack.
At the end of my PhD, it also got
me interested in embedded systems
because it's not just abstractions that end up in an architectural simulator. Architectural simulators are fine, but I wanted to work with something that I could actually put my hands on. And so embedded systems were nice, because now you have the whole system right in front of you, and there's nothing about it that is modeled or simulated or anything else. I'm just looking at this little PCB with some stuff attached to it, and it's doing what I told it to do. And I thought that was really fascinating.
Around that time I met Ben Ransford, who worked in the early days on batteryless systems, and the Mementos system was highly influential to my getting into intermittent systems.
And I remember we got together and we were having pizza and beers one time in the kind of the Ave
near University of Washington.
And we were talking about this Mementos paper.
And we realized that if you run on a system
with non-volatile memory,
then that algorithm doesn't work out anymore.
And so I remember that's one of the things
that I was thinking, huh,
that's kind of like a memory consistency model problem,
but it has this neat twist
where you're doing energy harvesting now too.
That's cool.
So that's what pushed me toward getting into this.
And then I realized that it was actually a really deep well of problems
that we haven't, I don't even think we've named all the problems yet,
let alone solve them in this area of intermittent computing.
That's what led me to start working on these extreme low power
constrained systems. And it's fun to keep peeling back the
layers. So for example, I've continued to do that in my career, definitely with building out more
real intermittent systems and building satellites as an example of that. You can hold the satellite
in your hand. And it's really interesting to think that's going to go inside of a rocket,
and it's going to go up through the atmosphere, and it's going to be flying around Earth.
And that's like a really interesting thing. You know, you build the mechanical parts of it.
You build the abstractions and software that you need and you build a communication stack and you actually put it to use.
I think that that's a really fun way to do research, because you can see the result of what you did in a concrete way. It's a very satisfying way to do work.
Another example of this that I haven't mentioned is in my lab and actually with some support
from a really cool class that we have at CMU in the ECE department, we've been trying to
tape out chips recently.
I say trying to, I also say my students have been taping out chips through this class. And it's an
amazing process where you go from basically a text description of what you want the world
to be like, and then someone has a very fancy 3D printer and they can build you a chip.
And that is a really amazing process to see. And I'm really proud of my students for figuring
out all the low level nuts and bolts of how to do that. And we've got back a couple of test chips,
and it's been a really interesting process to see them go from high-level simulations
all the way to silicon for the same reason that I find it interesting to work on embedded systems,
because you can hold it in your hand.
This is a concrete manifestation of what we've been working on.
There's a lot of effort and a lot of time that you put into making a chip.
So I don't think every project needs to go all the way to a tapeout.
That would be an egregious waste of resources.
But I think that it is interesting if you have an idea
that would benefit from especially precise power
and energy characterization.
It's kind of a fun thing to do to push all the way down to silicon.
The satellite work that you've done, something you said triggered my recollection of ZebraNet from when I was quite youthful.
I remember at the time being like, oh, this is interesting,
because we were thinking about caches and replacement
policies.
And then suddenly there's this thing,
and it just felt so out of left field.
But now I'm suddenly remembering that that's sort of how I felt when I first heard your space presentations, like, where did this come from? But now I'm realizing that there are quite probably some parallels between the two projects. You've got these sensors deployed in some relatively harsh environment where you want regular information, but you don't know where your next burst of energy is going to come from. Did that affect your work at all, too?
Yeah, there were a few projects at the beginning of my work in intermittent computing that were very, very influential. Margaret Martonosi, I've always had a ton of respect for the work that she does. She's an incredible researcher and a really great person.
And I think that her work on ZebraNet was really cool.
It was like an outside-the-box idea.
It was a real system deployment.
Again, you can see the devices working in the way that they were intended to work in the environment.
I think that's really cool.
It's really interesting to see work get out into the world and produce a real result.
And again, I don't think every project should or could go all the way to a fully built system with a deployment. But I think there are some ideas that warrant that level of investment of time: to really push to a deployment and see what the difficult things are. Where are the hard parts?
Where are the easy parts?
And why are we doing this, too?
I think it can be very satisfying for students
to see a good answer to why are we doing this.
And why are we doing this might be to do a deployment
and see the devices actually in the field.
I remember Emery Berger had a paper where, I think, they sort of glued computers to turtles.
And I think they had something called TurtleNet.
I think it was Emery Berger, anyway.
And I remember having the same feeling about that work, where it's like, this is such a strange and interesting application of the idea.
And you can see it work end to end.
It's just very cool. And it's satisfying to see the system actually do what you hoped that it would do, especially because it's this tower of technology pieces that you end up putting together.
And it's this incredible amount of complexity and it's all research stuff.
So it's not just complexity, but it's new things that have never seen light in the world
up to this point.
And you put them all together and you have a system that actually does something.
I find that really exciting. And that's why we want to shoot things into space. We're building architectures and building systems, and it's kind of a thrill to see it go inside of a satellite and then go into space. And I think students probably really appreciate this too, because they're seeing their work come together and do something that's, I think, objectively pretty cool. That's a good part about it: it just feels cool, and it feels important to put pieces together and to do something like that.

It's been a fascinating journey, from grad school through embedded systems to low-power systems to physically constrained systems. So in the mode of reflection, any words of wisdom to our listeners, students, researchers, others in the community, very broadly, based on your experiences?
Oh, I don't know if I have any wisdom. One thing that I've always found useful is to
be comfortable being naive about things. Go into a new area.
I mean, for me, space, I was completely naive.
As a sidebar, this is maybe of some interest.
I got interested in doing space computer systems in 2014, 2015, because a friend of mine who plays the cello, I was in a band with him, he found some random guy on Twitter
that played the upright bass
and also had a crowdfunded project
to build tiny satellites.
And so my cello playing friend said to me,
hey, you might be interested in talking to this guy.
Isn't that kind of like intermittent computing?
Because I told him about we were working on that.
And yada, yada, yada.
The guy that plays the bass
is Zach Manchester, who's a close collaborator of mine
on all of our space stuff.
And we're actually working with other collaborators on building out an NSF center right now, which we just got funded, to work on space computer systems that we're very excited about.
The reason we met is because of a random connection
through the music Twitter world,
which I love that that's how things originated here.
And now it's so many years later, we're working together to build computational satellite systems.
When I started, though, I feel like Zach must have been annoyed after a while
because I kept asking these naive questions about how things work in space.
And I didn't know anything about it.
And we just, you know, we just kind of pushed on it and kept learning as we went along and
failing often and, you know, crashing into things that just didn't work.
And eventually you have something that does work.
And now I know more about space systems than I did when I started.
And I think that there's a real value to going into a new area with just being a little bit naive
and being OK with being a little bit naive
and knowing that there's a lot to learn.
And going and trying something anyway,
even if you're not sure it's the right thing to do
because you don't have the experience in the area yet.
I also think it's fun to just go and take an approach, even though you don't know if it's the right one, because you don't know the area well enough to make a good judgment yet. Just take an approach, go try and do something. I think that's a good way to get started, and it's a good way to learn, at least.
And sometimes you end up doing something that ends up being useful even if you did approach the problem naively.
Maybe that's a good way to get out of a rut: if the area that you're getting into is stuck in one lane of solutions, going in naively and trying something that's out of that lane could be a good thing. So I think being comfortable with being naive in an area, that's been helpful for me, along with being okay with frequently being wrong. It's fine to be wrong all the time. You learn from it. I don't know if that's wisdom, but that's something that's been helpful for me.
That's really interesting because what you just said now actually made me think about something
that I wonder about a fair amount. And you're a professor, you're a teacher, right? You're a teacher of students, both undergrad and grad students. And so in my career so far, I have also found a lot
of value in just doing things, right? You've got to do it in order to see it, because there are definitely times where someone older and wiser than I am has told me something, and I didn't really grok it until I tried doing it myself.
And I was like, oh, that's what they meant. I get it now. This is exactly what they said: watch out for that corner, you're going to bump your head. And I couldn't see it, and then I bumped my head. So in your position as a professor, a teacher of both undergrads and grad students, how do you set that balance of letting them just go bump their heads, but also being like, okay, I want to teach them, so I want to tell them that there's a thing there and they're going to bump their head? It's a tricky balance. And I think you mentioned that you recently had a child, so I guess this applies to teaching children too: do you let them fall down, or do you tell them not to fall down?
Yeah, I think teaching is a lot like parenting. And my son, Remy, he doesn't know about the existence of corners and he doesn't know about the existence of bumping.
And you sort of have to figure out how much bumping do you let them do?
That's a really interesting way that you said that, because I think it applies to students as well, and maybe to everyone. Maybe it's not just students; in some sense, everyone is a student in that same way. How much do you just let yourself crash through things and hope for the best, and how much do you try to, I guess, coach students and help them to understand what they're getting into?
I think there are definitely times where proximity to the deadline on the calendar is a factor. You know, if it needs to really get done now, it's like, we could do the teaching moment right now, but we have seven hours until the paper needs to be submitted. Jokes aside, there are practical considerations.
I love teaching undergrads.
I just made up a new undergrad computer architecture
and systems course that we offer in ECE at CMU.
And it was really fun to teach it for the first time
and to see which things I assume everyone knows that undergrads
in this course did not know.
And there were plenty of those moments where I had to first figure out even what the question was. And then once I had figured that out, I could help the students understand, you know, what misconceptions they had and help to fix those. So I think, really, it depends on the situation, and it depends on the person, how much coaching to apply. And I think, you know, I was just describing approaching everything naively,
and I think that that is something that we can uniformly apply, and it's good to come in
to an area ready to learn. I hope that students do that when they get into a new area of research,
and whether that's taking a class or getting into, you know, PhD research or something,
just kind of approach it with an open mind and be ready to crash into it a little bit. So I think, yeah, there needs to be some coaching, some
guidance, but I think it's good to just have some latitude to flop around and be ready to be wrong.
Well, with that, I think I would like to say, Professor Brandon Lucia, thank you so much for
joining us today. It's been a total delight talking to you. It was a really fun conversation,
learned a lot, and I'm sure it made both of us, Suvinay and me, think about things in a different way.
So thank you for being here.
Yeah, thanks very much for having me.
I really enjoyed this conversation
with the two of you today.
Yep, thanks a lot, Brandon.
It was a fascinating conversation.
And to our listeners, thank you for being with us
on the Computer Architecture Podcast.
Till next time, it's goodbye from us.