Computer Architecture Podcast - Ep 10: Physically-constrained Computing Systems with Dr. Brandon Lucia, Carnegie Mellon University

Episode Date: November 19, 2022

Dr. Brandon Lucia is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. Prof. Lucia has made significant contributions to enabling capable and reliable intermittent computing systems, developing techniques that span the hardware-software stack from novel microarchitectures, to programming models and tools. He is a recipient of the IEEE TCCA Young Computer Architect Award, the Sloan Research Fellowship, and several best paper awards.

Transcript
Starting point is 00:00:00 Hi and welcome to the Computer Architecture Podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts. I'm Suvinay Subramanian. And I'm Lisa Hsu. Today we have with us Professor Brandon Lucia, who is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. Professor Lucia has made significant contributions to enabling capable and reliable intermittent computing systems, developing techniques that span the hardware-software stack from novel microarchitectures to programming models and tools. He is a recipient of the IEEE TCCA Young Computer Architect Award, the Sloan Research Fellowship, and several Best
Starting point is 00:00:42 Paper Awards. Today, we're really excited to have him here to talk to us about physically constrained computing systems, including intermittent and orbital edge computing. A quick disclaimer that all views shared on the show are the opinions of individuals and do not reflect the views of the organizations they are affiliated with. Brandon, welcome to the podcast. We're really happy to have you here. Yeah, it's really wonderful to be here. Thanks for having me on today. Yeah, we're super excited to talk with you. And so what's getting you up in the morning these days? What's getting me up in the morning these days? Well, generally in my life, what's getting me up in the morning is doing yoga, which I've started doing basically every day. And sometimes it's like the best part of my day. And I think professionally, what's getting me up in the morning is everyone being back in physical spaces again and having
Starting point is 00:01:40 having the ability to work with my students in person again. It was a long couple of years where we weren't doing that. And it's really awesome to be back in front of like a whiteboard and like doing research now that everyone's kind of back in physical spaces again. So that's been really, really good. It's getting me feeling excited about getting up in the morning every day. Good to hear. So did you have any students that came on to your team, your group rather, in the middle of the pandemic that you didn't get to meet until recently, just out of curiosity? I didn't have any students that joined and I was unable to meet in general. We've been doing things like during the pandemic, we got kind of creative. We would have meetings outdoors in
Starting point is 00:02:22 the park. There's like really, Pittsburgh has like really good parklands right around CMU. So we would, you know, sort of work in our offices alone with closed doors and masks and windows open and ventilation and all that stuff, or just stay home for a long time. We're just staying home. But we were, yeah, we were having meetings outdoors. We got creative with it. There were a lot of tents up around campus. So I was able to, I was able to, you know, more or less interact with the students that joined my lab, even the ones during the pandemic. That's good to hear. That's good to hear. So speaking of getting creative, it seems like, you know, what we'd really love to talk to you about is some of the stuff that you've been working on. They're kind of unified by this
Starting point is 00:02:58 notion of being physically constrained systems where, you know, you have really, really, really limited physical resources that you have to work with, which I assume requires getting really creative with how you utilize those resources in order to get your goals accomplished and what you want the devices to do. So maybe you can start us off a little bit by just telling us what it means to be physically constrained, and then these two examples that you've been working on specifically like intermittent and orbital space-based systems? Yeah, yeah.
Starting point is 00:03:27 So physically constrained systems are ones where something about the environment means you can't have more resources. So in an intermittent system, that's a system that's extremely physically constrained by the amount of energy that's available. So in an intermittent system, we assume that you have some geometric constraints. So the device has to be physically pretty small. So, you know, think like sensor systems with computers attached to them.
Starting point is 00:03:51 And that you're collecting your energy from the environment. And if you're very small and trying to collect energy from the environment, you might use a solar panel, but your solar panels also like, it's a really small solar panel. So you're not getting that much power from it.
Starting point is 00:04:03 Other examples exist where you can, you you know harvest radio waves and use that to power the computer system intermittent intermittently and so the physical constraints of the environment the need to be physically and geometrically small and then the consequence of that which is the inability to get lots of energy into the system that makes for some really interesting computer systems problems. That problem in the small in intermittent systems systems, really shows up again if you shoot your computer systems into space. Things are bigger, things are geometrically bigger, but the types of computations that we want to do are also bigger.
Starting point is 00:04:36 You have a similar problem where you're constrained in size. On the CubeSat, which is a kind of standard unit of small satellites that are getting sent up into space these days. You can cover it with solar panels, but you can only get so many solar panels on there without having some complicated mechanical apparatus. So we kind of have an incarnation of the same problem where we're sort of geometrically constrained. And that constrains the amount of power, which constrains, essentially it constrains the efficiency of the system. You need to be so efficient if you want to do some useful quantum of work that's dictated by your application.
Starting point is 00:05:08 So I'm really drawn to these problems because it's like you have a new set of constraints that aren't maybe the typical ones. Like we think a lot about finish within a deadline or finish without consuming too much memory in the system, things like that that are, I don't know, kind of computery constraints. And these physical constraints are something about the world that says, you know, you have to deal with this, otherwise you're not solving the problem. So I think it's really fascinating. And it's really like inspiring to work on stuff where something about the world just says, this is how you have to do it. So that's what I think is really interesting about
Starting point is 00:05:41 these physically constrained devices, like tiny intermittent systems and little satellites. That's quite exciting. And you talked about a few different characteristics of these systems who are physically constrained. Intermittence is one of the themes that shows up. Can you maybe tell our listeners about how do these differ from conventional computing? How do these constraints affect the way you think about how you design systems for these particular applications? Yeah, that's a really good question.
Starting point is 00:06:09 I guess the first most important thing, given what I just described these as, is usually we have to think about energy first. And so in a lot of systems, the system designer figures out the application requirements and says, we're going to make this go super fast. And also, if we can save some energy, that might be nice too. And then, so that's the sort of structure of the optimization process for your application. You know, you get your requirements and you figure out how to make
Starting point is 00:06:33 it go fast or use less memory or whatever. And then all the way at the end, a lot of times it's like, well, it'd be nice if they use so much energy and that's changing. That's changing. There are more examples of systems today where people are putting energy first. And that, you know, happens in the data center and everywhere else. For a long time in my career, that wasn't really true. Everything was about bottom line performance and then, oh, energy is nice too. The big thing that we've been focused on, and this is one of the main physical constraints that you put on a system is energy efficiency.
Starting point is 00:06:59 Thinking about that first, and it's not just because using less energy is intellectually pleasing or we can pat ourselves on the back for consuming a little bit less of the battery or something like that. In a setup like an internet system, if you're not energy efficient, the application doesn't work because you're so heavily constrained on the availability of energy that if you're not efficient, you just can't do the work at all. Which is fundamentally different from building large scale systems where you want to decrease your power bill in the data center or something. That's an admirable goal. I think we should decrease the power bill in the data center. I think that's another very good area of research.
Starting point is 00:07:36 But in these cases, it's not that the power bill goes down or the battery lasts another 40 minutes or something. It's actually that if you're not energy efficient, the application just doesn't work. The same goes for satellites, really. If you're orbiting Earth, you can think through the setup of the problem and see that this is true. You're orbiting Earth, so you have about 45 minutes on the sunny side while you're flying around Earth, little cube satellite. And if you're a very small device like a pocket cube satellite, you get hundreds of milliwatts, like a watt on a good day kind of thing. Depends on the orientation of the satellite solar panels and how they're pointing at the
Starting point is 00:08:12 sun. And so you have only that much energy integrated over the 45 minute period of the orbit. And you might have a little bit of energy storage. And the systems we've been building, we've tried to get rid of using batteries. There's some complexity in building systems around batteries. And so generally, I like to not use batteries. And so you have a limited energy storage reservoir. So now you sort of have to optimize your system design under that constraint.
Starting point is 00:08:38 You have so much income power. You have so much power consumption. And you know that you're only going to be getting power for 45 minutes um and then you know during one orbital period you might want to be capturing images of you know the atmosphere capturing images of clouds or the oceans or things like that and uh you know piling up a bunch of data or processing it on orbit that's what we've been looking at is how to process it on orbit and so that imposes like some amount of power consumption um there's also in order to make the system useful, there's some kind of minimum amount of computing
Starting point is 00:09:08 you have to do along with that. So it's not, it's like if we can't process the entire image, then the system won't be useful. So you have to balance those constraints against one another. And we do have to think about sort of minimum acceptable level of performance, but we have to do that under this strict energy constraint,
Starting point is 00:09:24 where if we're not efficient, then the system doesn't work. We can't do the work because we're so constrained by power and energy. So that I think is really, that's the biggest fundamental difference is that if we don't think about energy first, our systems just won't work at all. And I think that that's a really interesting constraint to work under, and it's really motivating for me. Yeah, that's a great overview constraint to work under. And it's really motivating for me. SIDDHARTH RAO, Yeah, that's a great overview of the different kinds of problems and challenges. You touched upon several different themes here.
Starting point is 00:09:51 I'll just sort of paraphrase them. So you talked about energy first. There were considerations around both power and the energy cadence and the total amount of energy that's available. I think there's an interesting aspect of the delivery system itself. You talked about batteries versus solar panels are not having batteries at all and that impacts how you sort of deliver and store energy for these computing applications. So there are a
Starting point is 00:10:14 few different themes. Maybe we can sort of click on some of these things and see what that means for the system design. So obviously in the age of accelerators, one ready thing that people think about is specialization techniques all the way from microarchitecture all the way up to programming models and so on. So maybe that's one particular aspect, but it sounds like the energy system, the harvesting system, the storage system, and the distributed system is a pretty unique consideration given the landscape where these systems are deployed. So how does that sort of intersect with the system design space?
Starting point is 00:10:45 Outside of specialization, what other things do you have to think about in terms of, do you have solar panels? What cadence do they actually supply energy at? Do you have a regular cadence? Is it like a bursty? And how does that affect the way you organize your tasks and programs and the considerations
Starting point is 00:11:02 when you're running a particular application? Because you have to do some quantum of work to make it useful before you can actually move on. So how do you think about that very broadly? Yeah, there's a lot there. I'm glad you mentioned specialization. That's a topic that we've been thinking about, you know, in these constrained systems, what is the role of specialization recently? And you're right to point out everyone in the architecture community is I'm sure aware of this, the era accelerators we're inhabiting right now, everything is turning into an accelerator
Starting point is 00:11:33 and for each different variety of machine learning kernel that people have decided is important, there's another accelerator out there that's available. I think that's really cool. And I think it's been a really interesting time to watch the computer architecture community evolve. And I can think back to like, oh, maybe 2012, 2013. I forget when the first round
Starting point is 00:11:52 of these deep neural net accelerator papers was really hitting the scene. It was right, I think right at the end of grad school, something like that. And it was really exciting to see these new kinds of architectures that they don't, it's like, what is this? It doesn't have an ISA. How do I think about this? And that was a really interesting thing to see these new kinds of architectures that they don't it's like what is this it doesn't have an ISA how do I think about this and that was a really interesting thing to see happen
Starting point is 00:12:10 we have been kind of steering away from that a little bit I think there is a role for specialization and I think specialization is an important way for systems that are designed to do one task really well to get a lot of performance and to get a lot of efficiency. So specialization is a really important tool for architects, and I think it needs to be in everyone's toolbox. We've been looking at, in these systems especially where you're highly energy constrained, and in some cases you have these performance requirements like in the satellite, we've been looking how we can not rely only on specialization to make systems energy efficient. And we've got a group of collaborators.
Starting point is 00:12:53 This is work I'm doing with my collaborator, Nathan Beckman, here at CMU. We've been leading an effort around a new architecture. It's a CGRA architecture, and it's designed to be extremely energy efficient. And so we sort of applied this energy first design principle to putting together a CGRA. And so there's lots of funny choices since we were thinking about energy first. There's lots of choices that you probably wouldn't make in a larger scale CGRA system where you're looking for high performance. Just as a simple example, we don't do any fine-grained time multiplexing on the processing elements inside of our CGRA
Starting point is 00:13:28 architecture. That's an odd choice if you're looking at high performance, because in high performance design, you'd want to be multiplexing lots of operations on your processing elements. And so that's just one small example of where we're making choices that are guided by optimizing for energy first.
Starting point is 00:13:45 We actually change things in the microarchitecture that you... We make choices in the microarchitecture you probably wouldn't make in a larger scale design. These choices, this whole design exercise around this architecture was informed by physically constrained deployment scenarios. So it's like if you want to be processing images as they come in off of a camera and you are... You want to go 10 years on a AA battery battery or you want to be operating off of a solar panel or very small amounts of energy, then you have to put this efficiency first. And that's literally the thread we followed to get to the design that we landed on is looking at the set of physical constraints up front. We looked at the data rate. We looked at the power consumption of what's up there now.
Starting point is 00:14:24 And we said, here's how far off we are how much do we need to optimize and then we sort of moved forward through a series of designs that you know we learned lessons from each one and we moved on to the next one and we have a prototype cgra now that that is uh is extremely efficient but it avoids getting uh overly specialized for any of the workloads that it might execute so you mentioned specialization i think specialization is great but i think that if we can get the level of efficiency that we need with something like a CGRA that supports a general purpose class of workloads without being overly specialized, I think that that's a good thing because we don't know what the next important computation is going to be. So we want to support everything. So Brandon, what you just said there,
Starting point is 00:15:04 like it almost made, like when you started talking about specialization versus not, it made me almost feel like this is a little bit tongue in cheek that, you know, if you're going to send something up into space, you want to give them like a buck knife and some duct tape. That's, that's what you need, right? You could have something that like maybe makes your toast really, you know, perfectly crisp or whatever, one for your toast, one for your bagel, one for your coffee or whatever. But in the end, like in that really,
Starting point is 00:15:30 really constrained world, it's, it almost sounds like to me, like you're trying to build the duct tape and the, and the buck knife so that, so that you can be like really, really generic, really efficient, really simple. And so that is an interesting choice. So, but at the same time, the thing that you said about this, the CGRA, where you don't multiply the multiplex operations, and then you immediately started talking about image processing. You know, so in, in some of my previous lives, you know, when we talk about image processing, there's a certain frame rate that you have to, that you have to meet, right? So like you were saying, like, if you don't't meet it you might as well not do it and image processing is one of those where it seems like there's got to be a certain amount of processing done in order to to make the image processing useful and then there's definitely a certain number of pixels that have to get you have to get through and to do that without any multiplexing at
Starting point is 00:16:17 all seems to me like a potentially difficult prospect because you could you could have a four pixel image maybe but maybe that's not a very interesting like do you ever get to the point where you're like you know we just can't we just can't do this or we can only do it if we make this i don't know cube salad light uh 30 bigger and we just we can't do that like have you faced a problem like that or can you have you always been able to sort of creative your way around? Yeah, that's a really good question. So I want to go back to something you said, this buck knife and duct tape thing. I want buck knife, duct tape, and system will live and die by the ability to compile this broad set of programs down to it without having to go through a lot of system
Starting point is 00:17:10 specific pain to do that. The other thing you asked about is a really apt question. You do need to hit some minimum level of performance and that whether that's in an intermittent camera system that you hang on a tree and look for hedgehogs, or if it's in a satellite where you're collecting a new frame that's an image of Earth every 1.7 seconds, which is that's actually a number we have to deal with. You look down at Earth and you're flying around at 450 kilometers up from the surface of Earth and every 1.7 seconds you get a new frame. In the first case, in intermittent systems, one trick we can play, because this is a unique domain, a constrained domain, where we can actually decrease the frame rate. And in a lot of cases, the applications
Starting point is 00:17:57 that we would support still work with a decreased frame rate. So a lot of camera systems would go 30 or 60 frames per second, depending on what you're trying to do. But if you're monitoring traffic on a road, frame rate. So a lot of camera systems would go, you know, 30 or 60 frames per second, depending on what you're trying to do. But if you're monitoring traffic on a road, or if you're, you know, looking for flooding in crawl spaces, or if you're trying to spot rodents in a warehouse, or whatever, like these kinds of applications where you might do like pervasive long lived deployments of cameras, once a second is okay, once every five seconds might be okay. Then, you know, you start to hit some thresholds. So say you're looking for birds on your rooftop. That's
Starting point is 00:18:30 just a random example that you might care about. You might want to know if there's pests in or around your house. If you take an image every 45 seconds, that might not be so useful. This is something you have to distill out from whoever is defining your application. It's good to fit a broad range of applications. And that means you have to have as much efficiency as possible to make it feasible to even run these programs at all. But then also, you have to deliver performance that's acceptable to the applications.
Starting point is 00:18:59 And some of them just won't work. Some of them, you will hit a wall. You'll need to change something. It could mean that you need to figure out a way to scale. That's a tricky thing to scale up when you're in this extremely low power, power constrained and energy constrained operating regime. Scaling up is tough because we don't want to increase the power consumption. And that's usually what happens when you add more resources. Another thing we can do, I mean, we can increase the amount of energy that we let the system store and then
Starting point is 00:19:32 run as like a burst. And so this is like another strategy. We can't do 30 frames per second all the time, but we can do 30 frames per second for a little while, for two seconds. We found something really, really important we want to look for. But in order to do that, you need to actually imagine
Starting point is 00:19:48 a system where you have an energy harvester and a capacitor. You want to fill up this whole capacitor, and then you can just slam through all the energy all at once doing high frame rate processing. If you do that, then you have to turn the whole system off and charge up again, or you have to heavily duty cycle, and you can only do a very low average power operating in the wake of that burst of operations that you did. Those are both tricks that you can play. I think in general, scale is tough, though, because you generally add more resources, and the more resources we add, the higher the power consumption. And so it becomes difficult to maintain the level
Starting point is 00:20:25 of efficiency that we need. So another thing that we can do, if you look at the satellite use case is, we can actually use the design of a distributed system to make up for the, if you have deficient performance on a single satellite, we can build out a distributed system that can share the work.
Starting point is 00:20:42 So something we looked at in our 2020 paper on orbital edge computing is you have data arriving and there's way more data than we could feasibly send to the ground. So that's not an option. And there's actually, as the system was when we designed it, we have matched the power consumption of our compute to the input power. So we were just able to, you know, just use up our energy when we went into the eclipse.
Starting point is 00:21:04 And if you have just that amount of compute, to the input power. So we were just able to, you know, just use up our energy when we went into the eclipse. And if you have just that amount of compute, it's actually more data that we collect in each frame than we can process before we get to the next frame. So you only have like two-ish seconds to process each frame. And so it's really hard to keep up. And so if you just launch one satellite, you actually end up missing some of the frames, some of the images.
Starting point is 00:21:25 And what we did instead of that is we assume that you have 10 or 50 or 100 small satellites, which is, by the way, way cheaper than launching one really big satellite. Like orders of magnitude less money to launch 10 or 50 small satellites. So it's still a good thing to do, even though you're increasing the amount of hardware that you're putting into orbit. But if you get those satellites up there and you tell them to work together, each of them can, for example,
Starting point is 00:21:53 each of them can grab the same frame. And the first one in the group says, I'll handle, you know, tile image up. And the first one, I'll handle the first few tiles, the first four or whatever. And the second one in line says, okay, I'll take the same image and grab the second few tiles. And so four or whatever. And the second one in line says, OK, I'll take the same image and grab the second few tiles.
Starting point is 00:22:07 And so as long as there's not something happening on the ground that changes very quickly, like it would change significantly in the time between when satellites were looking at that same spot, they can actually distribute the work without having to communicate at all even, which is a really cool aspect of this. Normally, you think of a distributed system having some communication medium so
Starting point is 00:22:27 that the satellites or distributed components can coordinate. Here, we don't even need to have that. We just say, when you hit this GPS coordinate, well, grab another frame. And you know, you, the satellite, know that you're always responsible for these set of pixels.
Starting point is 00:22:41 And so one of the common things that we would do to images is try and segment out parts of the common things that we would do to images is try and segment out parts of the images that are clouds and then look for objects on the ground. Like, you know, look for, we always think of, you know, sort of search and rescue mission, look for the plane wreckage at sea or something like that, or look for the signs of wildfires. And so for that, you're running kind of, you know, multiple neural nets or other kinds of models on these images. And so the compute there can be pretty heavy lifting. And so we find that you actually, to get coverage of your entire orbit, you need to distribute
Starting point is 00:23:15 the computation in this way. There's not really a way to do it locally on a single satellite. The level of efficiency would have to be extremely, extremely high, and you'd still have to be hitting a good performance target. So it's just a really tough problem to solve inside of a single satellite. We haven't given up on that. We're still optimizing for the single satellite case
Starting point is 00:23:30 to get as far as we can, because that benefits the distributed case anyway. But doing it in a distributed system in the way that I just described is a way of kind of getting around the problem by using system design instead of just microarchitecture design to solve the problem,
Starting point is 00:23:42 which is, I think that's in general, a good thing to do. Think about how you can attack the problem from a different layer of the system stack. And it might actually be easier if you do that. So that's how we deal with these kinds of performance challenges that you get in these highly constrained systems. Yeah, the energy performance and capability trade-off curve
Starting point is 00:24:00 is definitely really interesting. I wanted to circle back to one of the annotations that you made to Lisa's Buck and Knife analogy, which is having a compiler as well. So maybe we can double click on the compiler and associated tooling for these kind of systems that you build. So in an energy-first world, what tools do you need to equip a programmer with? What sort of guidance do you need to give to a programmer so that they can construct their programs or break up a program into tasks and things like that, that matches the constraints that's available in the system. For example, if you're coming from a performance-first world, you could annotate the total number of flops, how much bytes are you
Starting point is 00:24:39 accessing in a particular kernel, and you could use that to determine, okay, what's the performance that you can hit? And that's feedback that you can give to the programmer that's one way that the programmer can reason about things in an energy first world where you know that you know you only have this much energy to work with these many jewels or this many milliwatts of power that's available uh what sort of feedback does your compiler need to provide like what sort of tools do you need to provide to the programmer so that they can structure their computations and know that, OK, this unit of work can be completed within some period of time? Or if not, like, I need to break it up into further tasks
Starting point is 00:25:12 or annotate it in some other way. What sort of feedback loop do you need? Like, what do the tools look like here? What does the compiler need to do? How do you think about static program analysis and other kinds of techniques in this particular space? This is a really cool question. And there's some things that we've done and some things that I wish someone would do.
Starting point is 00:25:29 And maybe we'll do them eventually. The question about what tools does a programmer need? It really depends on what the programmer is trying to do. If the solution to the problem is like I described before, you know, this is one solution to the problem of dealing with highly power constrained and energy constrained systems. If the solution is to develop a new coarse-grained reconfigurable array architecture that
Starting point is 00:25:52 has this funny hardware software interface, the compiler needs to exist. And that's like sort of step zero. We need to have a compiler that lets you write some Rust code or whatever the fashionable language in five years is going to be, and then target that to your system. And if you don't have that, then the thing
Starting point is 00:26:08 won't get out of the starting gate. So just as a minimum, having the ability to compile general-purpose code. And that's hard, actually. Compiling very general outer loops, with irregular control flow and irregular, sparse memory access patterns, to any old architecture, CGRA or otherwise, that's a tough problem. That's an open problem, actually.
Starting point is 00:26:28 And I think that it's a really important one if we care about energy efficiency. Because being able to target the system at all and then being able to generate code that will execute efficiently on the system, I think that that's something we need to be thinking about as a research community. And people are.
Starting point is 00:26:41 Absolutely, people are thinking about this problem in the research community. So the first thing is just having a compiler that works. The second thing is maybe having a compiler that helps the programmer understand their use of energy. And this is a problem I feel like we've come back to every six months for like five years or more. The problem is: I have an arbitrary block of code in my program, and I would really like to know how much energy that will consume on my system.
Starting point is 00:27:12 And that's a pretty general question. And of course, you're probably thinking right now, well, that depends on the microarchitecture, and it depends on the state of the cache when I start executing this part of the program. And so of course, any model that's going to be useful to make that kind of estimate needs to account for lots of things that
Starting point is 00:27:28 are within the microarchitecture. But in the systems that we look at, that kind of tool also needs to think about externalities, like what's the state of all of the peripheral sensor devices when we start executing this code? And what is the state of the environment? Because some power systems will vary in their efficiency, their ability to deliver power to the system, based on the amount of incoming power in the environment. And so you have sort of funny effects where
Starting point is 00:27:59 you have more loss through the power system depending on environmental conditions. So building a software analysis, and exposing the right amount of information from the system to make the analysis possible, that's a really tricky problem to solve, because we don't know exactly what information we need to expose, and then operationalizing it inside of a compiler to produce a useful estimate. You know, this block of code will take two millijoules. Done. That's a really tough thing. My view on this, and this is, we had a paper in 2018
Starting point is 00:28:32 on this topic, but this is kind of my relatively unproven view, I guess, is that we have to think about this problem, I want to say probabilistically, or in terms of the distribution of behavior that we might see at runtime. And that may be enough to help the programmer understand how they need to change their program so that it's either more efficient or that it works within the constraints of their system. I don't think we'll ever have a perfect tool that you give it a piece of C code and it says for that system, it'll take two millijoules. I just think that's an unrealistic goal. But I think a more refined view, and this might be possible, we're working on this problem.
Starting point is 00:29:12 We don't have a solution yet, but I think it's a very fascinating space to be working in, is to build up a distributional model. Here's the set of behavior that we might see in the actual environment. And here's the set of environmental characteristics that we care about. And you put this all together and it might become part of a programming model or become part of a tool flow that goes all the way from the language down to the hardware software interface. And it may include microarchitectural changes where we need more information from the system
Starting point is 00:29:34 to sort of buttress the software pieces to make all this possible. So I think that's some of the compiler challenges that I see that are unique to this space. We do care a lot about energy. And we have some peculiar constraints that require information to flow from the language to the architecture, from the architecture to the language.
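To make that distributional view concrete, here is a toy sketch, not anything from the paper Brandon mentions: the energy model, the loss factors, and all the numbers are invented for illustration. Instead of producing one number per code block, you sample environmental conditions and report percentiles of the resulting energy costs.

```python
import random

def estimate_energy_distribution(base_mj, n_samples=10000, seed=0):
    """Hypothetical distributional energy estimate for a code block.

    base_mj: nominal energy cost of the block in millijoules.
    Returns a sorted list of sampled costs under varying (made-up)
    environmental conditions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Harvester efficiency varies with (hypothetical) incoming power.
        harvester_loss = rng.uniform(1.0, 1.3)
        # Peripheral state occasionally forces a costly reconfiguration.
        peripheral_penalty = 1.5 if rng.random() < 0.02 else 1.0
        samples.append(base_mj * harvester_loss * peripheral_penalty)
    return sorted(samples)

def percentile(sorted_samples, p):
    """Return the p-th percentile of an already-sorted sample list."""
    idx = min(len(sorted_samples) - 1, int(p / 100 * len(sorted_samples)))
    return sorted_samples[idx]

dist = estimate_energy_distribution(2.0)
p50 = percentile(dist, 50)
p99 = percentile(dist, 99)
```

The point of the sketch is the shape of the output: a programmer sees that the typical cost sits near the nominal value, while a rare mode (here, the fake peripheral penalty) pushes the tail much higher.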
Starting point is 00:29:54 And we need to match all of that stuff together with analyses that give the programmer something useful so they can make their program better. Or, better than that even, analyses that allow the compiler to automatically make your program better. Wouldn't that be nice? And so there's ways to do that that I think are going to be important going forward, as these systems find more users in the world. That was really interesting, Brandon. So when you're talking about
Starting point is 00:30:21 this block of code, and coming up with a statistical view of how much energy this particular block of code takes, depending on the state of the system, which is now much bigger than what computer architects have historically thought about, which is just the microarchitecture itself, but now the system is everything, you know, is it cloudy or whatever, all that kind of stuff. Do you think, though, that in this particular world, in how we reason about the model, we'd say, oh, this code is good enough? A lot of the things that we think about in the data center are things like the P95 or the P98 or P99.5 or what have you, for this kind of thing where if it's too much energy, the thing does not work.
Starting point is 00:31:01 Do you think you'll have to go so extreme to say, okay, I need this to be 100, it may be probabilistic and here's maybe an average-case piece, but I need to understand exactly what the max, the very, very worst case is, and try and avoid that worst case? Because whether you design around P50, whether you design around P99, or whether you design around the absolute worst case are very different ways to think about it. And what you've said earlier is that if you run out of energy, it doesn't work. It seems like you have to pull yourself
Starting point is 00:31:30 all the way to the extreme, in which case is a probabilistic model helpful? Great question. The state of the world today is that a programmer writes a program and then they say, well, I hope this works. And then they deploy it in 10,000 devices and hopefully it works. So they have no way of predicting this at all. I think getting a
Starting point is 00:31:53 precise estimate is hard, and in a lot of cases it's infeasible. When you have a non-trivially complex system, I think it's an infeasible analysis, because there are too many factors, and they just sort of all crash into each other and make it really difficult to produce single-number estimates. When the question is energy, you are right to observe that if you run out of energy, then the system will have to stop and wait for a long time, which might not be catastrophic. Or if you're in a tiny intermittent system, it might just mean power off, and hopefully we get more energy later. So there's a few things you can do. My student, Kiwan Maeng, had a paper on this,
Starting point is 00:32:32 where if something seems like it's not going to make any more progress because you keep running out of energy. So the setup here is you have some block of code and you know that it needs to execute atomically. So it needs to happen all at once. And hopefully your system is provisioned with enough energy to run that block all at once. Well, if it seems like it's not making any progress, then you have to do something else. That was the
Starting point is 00:32:53 realization that we had. And so one thing you can do is sort of the default, which is just keep banging your head against the wall and it'll run and be stuck forever. That's not a very good way to design the system though. So another thing you can do is have an approximation procedure for your algorithm. So you can say if you're doing, I don't know, for example, if you're processing an image, you might subsample the pixels in the image and process effectively a smaller image.
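As a rough sketch of that subsampling idea, with entirely made-up costs and budget numbers, a runtime could pick a subsampling stride so that the estimated cost of processing fits the energy currently available, and still make progress this iteration:

```python
def process_image(pixels, energy_budget_mj, cost_per_pixel_mj=0.001):
    """Sketch of energy-aware degradation (numbers are illustrative).

    If processing the full image would exceed the energy budget,
    increase the sampling stride until the estimated cost fits, then
    process the subsampled image so this iteration still completes.
    """
    stride = 1
    # Double the stride until the estimated cost fits the budget.
    while (len(pixels) // stride) * cost_per_pixel_mj > energy_budget_mj:
        stride *= 2
    subsampled = pixels[::stride]
    # Stand-in for real processing: mean brightness of sampled pixels.
    return stride, sum(subsampled) / len(subsampled)

# 1000 "pixels" at 0.001 mJ each would cost 1.0 mJ; with only 0.3 mJ
# available, the stride grows until the estimate fits the budget.
stride, result = process_image(list(range(1000)), energy_budget_mj=0.3)
```

The next time around the loop, if harvesting conditions improve, the same code naturally falls back to a stride of 1 and full quality.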
Starting point is 00:33:18 But you can make progress through this iteration of the loop. And then maybe the next time around the loop, the power conditions improve, and so you have more energy at your disposal to do the work. So that kind of thing is thinking about energy, but then also thinking about the software runtime system. That's something where the programmer would need to know, what is the amount of energy that I expect this piece of the program to consume, in order to make choices about those refinements, those sort of degradation options where you would tune down the quality of the result to decrease
Starting point is 00:33:50 the amount of energy that it would consume. If you have no estimate of how much energy the thing is going to consume, though, then it's very hard to make those kinds of judgments. And that really is the state of the world today. There's just not good tools for making any kind of estimate at all. But I think that having the distributional behavior will help understand if there is maybe an outlier case that you should be aware of,
Starting point is 00:34:09 even if you don't see it during your testing. When you have the thing on your lab bench, that's when you can make changes to it. If you know that there are modes in your distribution, and you know there is this one mode that's way over there on the right, and it's a very high energy consumer,
Starting point is 00:34:23 even if it's rare, you probably want to have some kind of mitigation baked into your system. It could just be a watchdog timer that notices that you're stuck and then restarts the whole thing periodically. But that's not very sophisticated and it means you're giving up. So it's kind of interesting to think about how you can identify those modes in advance, even if they're rare, and then accommodate them in the runtime system. And again, this is kind of an open problem, because producing these sort of distributional estimates
Starting point is 00:34:49 is something that we can't really do with the tools today. And then reacting to them is something that people are only beginning to build systems to handle. Yeah, it sounds like approximation is a useful tool in your toolkit for these applications. Another technique that comes to mind is checkpointing and then continuing execution from there
Starting point is 00:35:08 once you have the energy. Is that something that's commonly deployed? Are there any unique challenges for checkpointing and recovery for these kind of systems? Because it sounds like it's bread and butter for systems where you're energy constrained. Yeah, so checkpointing is actually a part of a lot of intermittent systems. In order to make progress when you're energy constrained like this, if you ever anticipate the system will run out of energy,
Starting point is 00:35:34 there needs to be some strategy in place to maintain just basic forward progress in your application. And in order to do that, there's lots of different ways to do it. People have been studying this for a long time now. But looking at the pieces of state that you need to capture, sometimes it's just the register file, and sometimes it's, you know, the registers and the stack and whatever. You can do it with software techniques, and you can do it with little hardware widgets that we can add to the microarchitecture. The ISCA best paper this year actually was a mechanism on making better architectural support for backups. That wasn't my work. That was Joshua San Miguel's lab that produced that work,
Starting point is 00:36:17 and I think it's very interesting stuff. You know, checkpointing is really a foundational technique in intermittent systems. If you don't have some mechanism for checkpointing, then you can't make any forward progress. Things get a little more interesting when you need to do a whole bunch of work atomically. And atomically here means you can't take a checkpoint in the middle of it. You have to do it all at once while the system is turned on. And so you can think about reasons you might want to do that. So you're grabbing data
Starting point is 00:36:43 from multiple sensors. You want to process it and then spit something out the radio to send an alert because those sensors said there's something that's interesting here. So if you need things to happen atomically like that, then checkpointing actually doesn't work. If you take a checkpoint, it might be the worst case scenario where you collect your sensor readings and then take a checkpoint. And then maybe the device turns off at that point. And maybe it's off for like 10 minutes. And maybe those sensor readings are now totally outdated and don't mean anything. Well, if you've checkpointed them, when the device turns back on, what happens
Starting point is 00:37:14 is you run into the end of that region that really should have been atomic. And so you send your radio message, and that doesn't correspond to reality anymore because you have this long delay in the middle. So there, checkpointing is actually a way to break the system. The model that we've converged on in a lot of the systems that we're building these days, and I'm not saying this is the right or the only model, this is just the one that we keep coming back to because it's useful for us,
Starting point is 00:37:39 is to do checkpoints when the programmer allows us to. So as long as something hasn't been marked as an atomic region, we can grab a checkpoint right before the system turns off. And that's usually sufficient. You can make sure that everything is, you know, all the states lined up and correct and your non-volatile memory is correctly persisted because you treat the checkpoint as a persist point. But then when you have work that needs to happen atomically like that, the programmer needs to annotate those regions into their program. There's not really any way to infer that because that's something the programmer says is a property of their application.
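A toy model of that "checkpoint only outside atomic regions" policy might look like the following. The class and method names are invented for illustration, not from any real intermittent runtime; the key behavior is that a power failure rolls execution back to the last checkpoint, so a partially executed atomic region re-runs in full.

```python
class IntermittentRuntime:
    """Toy model of checkpointing with programmer-marked atomic regions."""

    def __init__(self):
        self.in_atomic = False
        self.checkpointed_step = 0  # stand-in for state in non-volatile memory
        self.current_step = 0       # stand-in for volatile execution state

    def step(self):
        """Execute one unit of work; checkpoint if we're at a safe point."""
        self.current_step += 1
        if not self.in_atomic:
            # Safe point: persist progress to non-volatile memory.
            self.checkpointed_step = self.current_step

    def begin_atomic(self):
        self.in_atomic = True

    def end_atomic(self):
        # The region completed, so its effects may now be persisted.
        self.in_atomic = False
        self.checkpointed_step = self.current_step

    def power_failure(self):
        # On reboot, resume from the last checkpoint: any partially
        # executed atomic region is rolled back and re-run in full.
        self.current_step = self.checkpointed_step
        self.in_atomic = False

rt = IntermittentRuntime()
rt.step()            # step 1 completes outside any region: checkpointed
rt.begin_atomic()
rt.step()            # step 2 happens inside the region: no checkpoint
rt.power_failure()   # power loss mid-region rolls back to step 1
resumed_at = rt.current_step
```

This is why the stale-sensor-readings problem Brandon describes goes away under this policy: no checkpoint can land between collecting the readings and acting on them.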
Starting point is 00:38:14 So when they have those kinds of regions, then we need to be more careful with how we match the application to the power system. If the atomic region will consume too much energy, or if the atomic region may consume too much energy, those are cases we need to look out for. The latter case, where it may consume too much energy, is actually a pretty interesting problem. You might see it work every single time on your lab bench
Starting point is 00:38:39 using your benchtop energy harvesting setup. But then when you deploy the system, maybe some input is a little bit different, and things are consuming a different amount of power, and now your atomic region, which must execute to completion, requires more energy than you'll ever have stored in your energy buffer on your system. So you're kind of stuck, and that's a tricky problem for an analysis to help the programmer get around. I think it's an interesting one, the problem of the need to do atomic work like that, because it comes right back around
Starting point is 00:39:12 to what got me interested in this whole area to begin with. Usually those things are tied to a physical constraint of the system. Those two sensors that I mentioned, maybe you have to always collect those two sensor values together, and that's because of something in the environment. If you're monitoring temperature and pressure together, the reason you need to collect those together is because those properties are coupled by the environment in which you deploy the system. So it's a physical constraint that entails the atomic region to begin with. And that's what gives us this problem to solve. So I think that's an interesting aspect of this problem too. It's usually because of some environmental physical
Starting point is 00:39:47 constraint that we have to do atomic work like that. So I was sort of interested in what the debug cycle looks like. You were talking about how currently the state of the world is programmers just write an application, deploy it, and hope that it works. What does the debug cycle look like today? And ideally, what would it look like? I would say that the debug cycle for intermittent embedded systems,
Starting point is 00:40:12 it has all the punishment of the debug cycle for embedded systems, and then it has the additional level of punishment of having to deal with energy and the power system and the environment on top of that. So there's this additional complexity that you're unfortunately forced to face. And sometimes things just work. Sometimes you don't have these killer bugs. And then sometimes it's not that way, and you have a program that takes too much energy
Starting point is 00:40:43 and you don't know why. And so you have to go through and figure out how did my sensors get misconfigured at this point in the code? How did I end up in this loop for more iterations than I anticipated? Reasoning about my program in advance, I thought that we wouldn't be able to get stuck here and consume this much energy. Like I was saying before, the tools for understanding energy consumption are fairly rudimentary and there really isn't a good standard way to debug the energy consumption of a program, although there are lots of people working on that problem. I don't mean to take anything away from that because there's an active area of research in the compilers and systems community and there's a
Starting point is 00:41:23 lot of good work going on there. But there's not a standard way to debug the energy consumption or the power consumption of a system today, and I think it will be useful to have tools in the future that let you do things like an energy-efficiency regression test. You know, we don't have that concept now, but it would be nice, if I change my program, to know, well, hey, something just got less efficient. Are we badly using microarchitectural resources in a way that we didn't anticipate? Or did you just misconfigure sensors? Or, you know, even which level of abstraction to look at is a question there for that kind of thing. But having some framework to think about this would be very useful. And I
Starting point is 00:42:00 don't think there is a standard way of doing that right now. I think it's interesting that you mentioned that, because a parallel on the other end of the spectrum, which is high-performance computing or data centers, is that energy efficiency is becoming increasingly important. Then you have things like machine learning computing, which does consume a lot of power. And there are concerns around, like, can we keep going at this rate forever? And there's a renewed push towards measuring the power consumption of the models that you train, for example, right? And accounting for those things very carefully. Of course, it's not the same set of parameters as on the edge side, but the broad strokes
Starting point is 00:42:36 are similar. Like, think about energy, think about power very carefully from the get-go, have tools that sort of enable you to reason about the power consumption of your system, and so on and so forth. So there might be some interesting convergence of themes over here, and I'm very curious to see how this shapes up. Yeah, I think generally it's nice to imagine a tool that helps you understand, if I make this change in my program, what happens to the total amount of energy that the system will consume? And that applies very generally. That's a very broad problem statement.
Starting point is 00:43:10 I would like it if there was a tool that solved that for all domains. I have a feeling that there will be different constraints depending on which domain you look at. You know, the data center, or the kind of conventional edge devices, which are a little larger, or tiny beyond-the-edge little sensor devices. All of those are going to have different constraints, and I would expect that there might be some similarities, but we need to think about different things to answer that question at each of those different levels. So along the lines of Suvinay's question here about the debug cycle, something you said, Suvinay, made me think of this. So earlier, Brandon, you said that the way of the world right now is a programmer writes the code, deploys it to 10,000 devices in hopes that it works. So presumably you've done a lot of benchtop testing to think about the energy consumption. In the sense that, let's say you are sending
Starting point is 00:44:05 out 10,000 mini satellites into space. Do they come preloaded with the program? And so is it a deploy-once and you're done? Or can you debug live and say, oh, actually, I want to make a change? In which case, how do you, I can't imagine it's low-energy to then send a new program up, or do you collect them back? Like, how does this whole process work? Yeah, that's an awesome question, and a really interesting part of designing an orbital computing system stack. For our latest satellite design, the Tartan Artibeus satellite, we actually built in the ability for this. So the system has a receive chain; we can talk to it from Earth.
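One way to picture such an uplink update path is the sketch below. Everything here is invented for illustration, including the packet format, the CRC scheme, and the idea of a byte-addressed application region; the point is simply that an uplinked patch gets integrity-checked before it is ever applied, so a corrupted upload leaves the running program untouched.

```python
import zlib

# Hypothetical mutable "application region" that an uplinked patch may
# overwrite in place (e.g., a table of neural network weights).
app_region = bytearray(b"\x00" * 16)

def make_patch(offset, payload):
    """Build a patch packet: 2-byte offset, payload, CRC32 trailer."""
    body = offset.to_bytes(2, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")

def apply_patch(packet):
    """Verify the CRC and apply the patch; return True on success."""
    body, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    if zlib.crc32(body) != crc:
        return False  # corrupted upload: leave the region untouched
    offset = int.from_bytes(body[:2], "big")
    payload = body[2:]
    app_region[offset:offset + len(payload)] = payload
    return True

ok = apply_patch(make_patch(4, b"\x2a\x2b"))   # clean patch applies
corrupted = bytearray(make_patch(0, b"\xff"))
corrupted[-1] ^= 0x01                           # flip one CRC bit
bad = apply_patch(bytes(corrupted))             # rejected, region intact
```

A real satellite updater would layer much more on top (sequence numbers, authentication, a protected fallback image), but the verify-before-apply shape is the core of it.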
Starting point is 00:44:47 And we built in the ability to replace part of the program using a sort of protected region of code that functions like a bootloader. So we can do, like, dynamic software updating from Earth. Our thinking there was we might want to change what the thing does over the course of its lifetime, to vary the workload from one experiment to another, or update our... You think about how this would be used in a user model. You want to update the neural network model that you're running or something. You want to send up a new batch of weights, and those get plugged into the memory in the right spot. We have the support in our platform, in our satellite bus, to do that. That's good because these satellites might be deployed for five years. They might be up there for five years before their orbit decays and they fall into the
Starting point is 00:45:35 atmosphere and they become unusable. I mean, they turn into dust, I guess. But they're up there for a while. And so it's nice to deploy with the ability to do this updating loop. There are things that are a property of being a physically constrained system that can be very difficult to deal with that aren't really related to this, you know, updating the workload, updating the application part of the system. So for example, if you have a bootloader or like a privileged, I don't want to say operating
Starting point is 00:46:02 system because it's not quite an operating system. I don't think there's a good example of one of those for satellites in general yet, although that's an interesting problem too. But the sort of core software that manages all of the devices and lets the different subsystems communicate with one another. If you have bugs in that part of your satellite system, or you have unanticipated conditions that could lead to increased energy consumption, or you have some condition in that part of the code that doesn't deal well with fluctuations in power that you didn't anticipate, if any of those lead to a failure, a hard-stop failure, then you can't use your
Starting point is 00:46:39 application updating mechanism, because the satellite in a lot of cases won't turn on, can't use its radio, can't point its camera, whatever. Whatever subsystem you want to use, you can't use it if there's a problem in that core of software that drives the whole thing. So you can do benchtop testing, and we did a lot of testing. Usually you build out what's called a flatsat, which is the satellite all taken apart, all the different boards connected by wires on the benchtop. It's a very precise prototype. It's exactly the boards that you're going to deploy, but it's just laid out on the table. And so you do a lot of debugging there.
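In the same spirit as that benchtop debugging, a power-cycling test can be simulated in a few lines. The workload and failure model below are made up: the test interrupts a checkpointed workload at random points and checks that it still runs to completion, which is the basic invariant you would want such an experiment to validate.

```python
import random

def run_with_power_cycling(total_steps, failure_prob, seed=0):
    """Toy benchtop-style test: execute a checkpointed workload under
    random power interruptions and confirm it still completes."""
    rng = random.Random(seed)
    checkpoint = 0   # last step persisted to (simulated) non-volatile memory
    progress = 0     # volatile progress, lost on each power failure
    reboots = 0
    while checkpoint < total_steps:
        if rng.random() < failure_prob:
            progress = checkpoint  # volatile state lost; resume from NVM
            reboots += 1
            continue
        progress += 1
        checkpoint = progress      # persist after every completed step
    return checkpoint, reboots

# Even with power failing ~30% of the time, forward progress is maintained.
done, reboots = run_with_power_cycling(total_steps=100, failure_prob=0.3)
```

A harness like this is also where an "atomic region starvation" check could live: if the reboot count blows up without the checkpoint advancing, the provisioned energy buffer is too small for some region.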
Starting point is 00:47:19 You can do power cycling experiments. We have a big lamp in the lab that is a good approximation of sun on orbit. So we can use that to illuminate our solar panels and watch how the system behaves under different power conditions. And then a lot of just functional testing too. You want to see what happens if you receive a radio packet at the same time that the system is shutting down because you're going into
Starting point is 00:47:46 eclipse or something. You want to test all of these conditions. And they're really tricky things to test because this is a fairly involved and fairly complex embedded system. And we have to think about these environmental concerns. So you do a lot of this upfront testing. And that's the model that we use right now. You do as much testing as you can
Starting point is 00:48:06 and try and validate your design assumptions and show that things are as robust as they can be. And then you hope for the best when they deploy. In large-scale satellites, they'll use redundancy a lot. So if they have computer components on larger satellites, where the whole thing is on the order of a billion dollars to put together and launch, you don't launch one computer. You launch multiple, and then you have failover. And of course you do it that way, right? But with tiny satellites, the whole
Starting point is 00:48:36 satellite build is 500 bucks or something. It's very inexpensive. In some cases, even less than that. So doing redundancy is at odds with the cheap, small, almost disposable ideology behind doing small sats to begin with. You get your redundancy by launching more satellites, is actually what it ends up being, because they're so cheap to just build out entirely. Very cool, Brandon. Listening to you talk about atomics at, you know, sort of the PL level, concurrency issues, PL-type stuff, very high-level programming-type things. And now you're at a stage in your career where you've turned around and built real systems that are being deployed into some of the harshest environments that you could imagine, such as outer space. Whereas a lot of computer architects proper, you know, who dealt with microarchitecture stuff,
Starting point is 00:49:49 have actually not built systems at all. So I'm just very curious to hear you talk about your trajectory: A, how you went from these high-level atomics, and what got you into this? Because I remember the first time I saw you give a presentation on these space systems, I was like, when did Brandon get into this? And how did this happen? So maybe you can talk a little bit about, A, that trajectory, and then, B, a little bit about just the notion of building real systems. Because I think there's a lot of computer architects, we churn out a lot of them, who have never built anything that you can touch. And so how did you get to where you are? That's a cool question. I was just thinking of how long ago it actually was when I was in grad
Starting point is 00:50:33 school. And I think it shocks me. It makes me feel very strange to say that 15 years ago, I was in grad school, which seems like, I don't know, 15 years seems like a long time. I know. I know what you mean. So when I started my career as a PhD student at the University of Washington, I was enthusiastically interested in programming languages, and I realized while I was there that I was actually not just interested in programming languages, but I was interested in what's behind it that makes it run. It was sort of the programming language was this nice abstraction. And it's fun to learn about how those work and to show how you can solve important problems by defining them precisely using a language. But I got very interested in what's underneath. And so I think my career trajectory has been, to some degree,
Starting point is 00:51:28 it's been a series of trying to understand what's underneath. And so that led me from PL into, you know, my PhD dissertation work was on tooling and systems and architecture. There's a thread of concurrency throughout all of that stuff. That was what I was really focused on. But I found it very interesting to see how the layers interact with one another and what kinds of problems fall out
Starting point is 00:51:54 from correct and unintended interactions between the different layers in the system. I love memory consistency models because they are that kind of cross-layer puzzle. It's a topic I like to think about. So that kind of led me down into the lower levels of the system stack. At the end of my PhD, it also got
Starting point is 00:52:16 me interested in embedded systems, because it's not just abstractions that end up in an architectural simulator, and architectural simulators are fine, but I wanted to work with something that I could actually put my hands on. And so embedded systems were nice, because now you have the whole system right in front of you, and there's nothing about it that is modeled or simulated or anything else. I'm just looking at this little PCB with some stuff attached to it, and it's doing what I told
Starting point is 00:52:43 it to do, and I thought that was really fascinating. Around that time I met Ben Ransford, who worked in the early days on batteryless systems, and the Mementos system was highly influential to my getting into intermittent systems. And I remember we got together and we were having pizza and beers one time around the Ave near the University of Washington, and we were talking about this Mementos paper. And we realized that if you run on a system with non-volatile memory,
Starting point is 00:53:15 then that algorithm doesn't work out anymore. And so I remember that's one of the things that I was thinking, huh, that's kind of like a memory consistency model problem, but it has this neat twist where you're doing energy harvesting now too. That's cool. So that's what pushed me toward getting into this.
Starting point is 00:53:29 And then I realized that it was actually a really deep well of problems that we haven't, I don't even think we've named all the problems yet, let alone solve them in this area of intermittent computing. That's what led me to start working on these extreme low power constrained systems. And it's fun to keep peeling back the layers. So for example, I've continued to do that in my career, definitely with building out more real intermittent systems and building satellites as an example of that. You can hold the satellite in your hand. And it's really interesting to think that's going to go inside of a rocket,
Starting point is 00:54:03 and it's going to go up through the atmosphere and it's going to be flying around Earth. And that's a really interesting thing. You know, you build the mechanical parts of it, you build the abstractions and software that you need, you build a communication stack, and you actually put it to use. I think that's a really fun way to do research, because you can see the result of what you did in a concrete way. It's a very satisfying way to do work. Another example of this that I haven't mentioned is in my lab, and actually with some support from a really cool class that we have at CMU in the ECE department, we've been trying to tape out chips recently.
Starting point is 00:54:39 I say trying to, I also say my students have been taping out chips through this class. And it's an amazing process where you go from basically a text description of what you want the world to be like, and then someone has a very fancy 3D printer and they can build you a chip. And that is a really amazing process to see. And I'm really proud of my students for figuring out all the low level nuts and bolts of how to do that. And we've got back a couple of test chips, and it's been a really interesting process to see them go from high-level simulations all the way to silicon for the same reason that I find it interesting to work on embedded systems, because you can hold it in your hand.
Starting point is 00:55:17 This is a concrete manifestation of what we've been working on. There's a lot of effort and a lot of time that you put into making a chip, so I don't think every project needs to go all the way to a tape-out. That would be an egregious waste of resources. But I think it is interesting if you have an idea that would benefit from especially precise power and energy characterization. It's kind of a fun thing to do, to push all the way down to silicon.
Starting point is 00:55:43 The satellite work that you've done... something you said triggered my recollection of ZebraNet, from when I was quite youthful. I remember at the time being like, oh, this is interesting, because we were thinking about caches and replacement policies. And then suddenly there's this thing, and it just felt so out of left field.
Starting point is 00:56:03 But now I'm suddenly remembering that that's sort of how I felt when I first heard your space presentations, like, where did this come from? But now I'm realizing that there are quite probably some parallels between the two projects. You've got these sensors deployed in some relatively harsh environment where you want regular information, but you don't know where your next burst of energy is going to come from. Did that affect your work at all, too? Yeah, yeah, there were a few projects at the beginning of my work in intermittent computing that were very, very influential. Margaret Martonosi, I've always had a ton of respect for
Starting point is 00:56:43 the work that she does. She's an incredible researcher and a really great person. And I think that her work on ZebraNet was really cool. It was an outside-the-box idea. It was a real system deployment. Again, you can see the devices working in the way that they were intended to work, in the environment. I think that's really cool. It's really interesting to see work get out into the world and produce a real result. And again, I don't think every project should or could go all the way to a fully built system with a deployment. But I think there are some ideas, among those every person is working on, that warrant that level of investment of time to really push to a deployment and see what the difficult things are, and to see, you know, where are the hard parts?
Starting point is 00:57:23 Where are the easy parts? And why are we doing this, too? I think it can be very satisfying for students to see a good answer to why are we doing this. And the answer might be to do a deployment and see the devices actually in the field. I remember Emery Berger had a paper where, I think, they sort of glued computers to turtles.
Starting point is 00:57:43 And I think they had something called TurtleNet. I think it was Emery Berger, anyway. And I remember having the same feeling about that work, where it's like, this is such a strange and interesting application of the idea, and you can see it work end to end. It's just very cool to see the system actually do what you hoped that it would do, especially because it's this tower of technology pieces that you end up putting together. And it's this incredible amount of complexity, and it's all research stuff. So it's not just complexity, but it's new things that have never seen light in the world
Starting point is 00:58:19 up to this point. And you put them all together and you have a system that actually does something. I find that really exciting. And that's why we want to shoot things into space. We're building architectures and building systems, and it's kind of a thrill to see it go inside of a satellite and then go into space. And I think that students probably really appreciate this too, because they're seeing their work come together and do something that's, I think, objectively pretty cool. And that's a good part about it: it just feels cool, and it feels important, to put pieces
Starting point is 00:58:53 together and to do something like that. And it's a fascinating journey, from grad school through embedded systems to low-power systems to physically constrained systems. So, in the mode of reflection, any words of wisdom to our listeners, students, researchers, others in the community, very broadly, based on your experiences? Oh, I don't know if I have any wisdom. One thing that I've always found useful is to be comfortable being naive about things. Go into a new area. I mean, for me, with space, I was completely naive. As a sidebar, this is maybe of some interest: I got interested in doing space computer systems in 2014, 2015, because a friend of mine who plays the cello, I was in a band with him, he found some random guy on Twitter
Starting point is 00:59:46 that played the upright bass and also had a crowdfunded project to build tiny satellites. And so my cello-playing friend said to me, hey, you might be interested in talking to this guy. Isn't that kind of like intermittent computing? Because I had told him about what we were working on. And yada, yada, yada.
Starting point is 01:00:03 The guy that plays the bass is Zach Manchester, who's a close collaborator of mine on all of our space stuff. And we're actually working, with other collaborators, on building out an NSF center right now, which we just got funded, to work on space computer systems that we're very excited about.
Starting point is 01:00:20 The reason we met is because of a random connection through the music Twitter world, and I love that that's how things originated here. And now, so many years later, we're working together to build computational satellite systems. When I started, though, I feel like Zach must have been annoyed after a while, because I kept asking these naive questions about how things work in space. And I didn't know anything about it. And we just kind of pushed on it and kept learning as we went along, and
Starting point is 01:00:50 failing often and crashing into things that just didn't work. And eventually you have something that does work. And now I know more about space systems than I did when I started. And I think there's real value in going into a new area being a little bit naive, being OK with that, and knowing that there's a lot to learn. And going and trying something anyway, even if you're not sure it's the right thing to do,
Starting point is 01:01:17 because you don't have the experience in the area yet. I also think it's fun to just go and take an approach. You don't know if it's the right one; you don't know the area well enough to make a good judgment yet. But just take an approach, go try and do something. I think that's a good way to get started.
Starting point is 01:01:34 And I think it's a good way to learn, at least. And sometimes you end up doing something that ends up being useful, even if you did approach the problem naively. Maybe that's a good way to get out of a rut: if the area that you're getting into is stuck in one lane of solutions, going in naively and trying something that's out of that lane could be a good thing. So I think going in and being comfortable with being naive in an area, that's been helpful for me. Being okay with frequently being wrong, too. It's fine to be wrong all the time; you learn from it. I don't know if that's wisdom, but that's something that's been helpful for me. That's really interesting, because what you just said now actually made me think about something
Starting point is 01:02:18 that I wonder about a fair amount. And you're a professor, you're a teacher, right? A teacher of students, both undergrads and grad students. And in my career so far, I have also found a lot of value in just doing things, right? You've got to do it in order to see. Because there are definitely times where someone older and wiser than I am has told me something, and I didn't really grok it until I tried doing something myself. And I was like, oh, that's what they meant, I get it now. This is exactly what they said: watch out for that corner, you're going to bump your head. And I couldn't see it, and then I bumped my head. So in your position as a professor, a teacher of both undergrads and grad students, how do you set that balance of letting them
Starting point is 01:03:05 just go bump their heads and also, but also being like, okay, you know, I want to teach them. So I want to tell them that there's a thing there and they're going to bump their head. It's a tricky balance. And I think you mentioned that you, you know, you recently had a, I mean, I guess like teaching children too, like, do you let them fall down or you tell them don't fall down? Yeah, I think teaching is a lot like parenting. And my son, Remy, he doesn't know about the existence of corners and he doesn't know about the existence of bumping. And you sort of have to figure out how much bumping do you let them do? That's a really interesting way that you said that, because i think it applies to students as well and and maybe to everyone i mean maybe it's not just students or in some sense everyone is a student in that same way how how much do you
Starting point is 01:03:53 just let yourself crash through things and hope for the best and and how much do you try and uh you know i guess yeah how much do you coach students students and help them to understand what they're getting into? I think there was definitely there's times where proximity to the deadline on the calendar is a factor. You know, if it needs to really get done now, it's like we could do the teaching moment right now. But we have seven hours until the paper needs to be submitted. That's I mean, jokes aside, there is practical considerations. I love teaching undergrads.
Starting point is 01:04:29 I just made up a new undergrad computer architecture and systems course that we offer in ECE at CMU. And it was really fun to teach it for the first time and to see which things I assume everyone knows that undergrads in this course did not know. And there were plenty of those moments where it was, I had to first figure out even what the question was. And then once I had figured that out, I could help the students understand, you know, what
Starting point is 01:04:53 misconceptions they had and help to help to fix those. So I think, yeah, really, it depends on the situation. And it depends on the person, though, how much, how much coaching, how much coaching to apply. And I think, you know, I was just describing approach everything naively, and I think that that is something that we can uniformly apply, and it's good to come in to an area ready to learn. I hope that students do that when they get into a new area of research, and whether that's taking a class or getting into, you know, PhD research or something, just kind of approach it with an open mind and be ready to crash into it a little bit. So I think, yeah, there needs to be some coaching, some guidance, but I think it's good to just have some latitude to flop around and be ready to be wrong.
Starting point is 01:05:34 Well, with that, I think I would like to say, Professor Brandon Lucia, thank you so much for joining us today. It's been a total delight talking to you. It was a really fun conversation, learned a lot, made us, made me, I'm sure both of us, Souveney and I, think about things in a different way. So thank you for being here. Yeah, thanks very much for having me. I really enjoyed this conversation
Starting point is 01:05:55 with the two of you today. Yep, thanks a lot, Brandon. It was a fascinating conversation. And to our listeners, thank you for being with us on the Computer Architecture Podcast. Till next time, it's goodbye from us.
