Embedded - 326: Wrong in the Right Way
Episode Date: April 2, 2020
Erin Talvitie of Harvey Mudd College spoke with us about machine learning, hallucinating data, and making good decisions based on imperfect predictions.
Paper we discussed: Self-Correcting Models for Model-Based Reinforcement Learning
Erin's grant: Using Imperfect Predictions to Make Good Decisions
For a reinforcement learning book, Erin suggests Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, or the lecture series by David Silver.
For a machine learning book, Elecia likes Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron.
Transcript
Welcome to Embedded. I'm Elecia White, here with Christopher White. Our guest this week
is Erin Talvitie, and we're going to talk about learning as a computer.
Hi, Erin. Thank you for joining us.
Hi. Thank you for having me.
Could you tell us about yourself as though you were introducing yourself on a panel?
Sure.
I am an associate professor at Harvey Mudd College in the computer science department.
Before that, I worked for nine years at Franklin and Marshall College, where I helped to found
their computer science department.
And before that, I did my PhD at University of Michigan.
My research interests are in machine learning
and specifically in the sub area of reinforcement learning.
Yes, we've heard about reinforcement learning
from Patrick Pilarski a few years ago
where he was applying it to robotics
and making smart prosthetics.
Yeah, Patrick's work is really cool.
Okay, so now we want to do lightning round
where we will ask you short questions
and we want short answers.
And if we're behaving ourselves,
we won't ask why and are you sure?
All right, I'm ready.
Favorite restaurant in Claremont?
Let's say Monankoya Ramen.
When do you think the singularity will happen and will our robot overlords have mercy?
I'm skeptical that the singularity will happen.
I don't think our robot overlords will be capable of mercy.
Favorite video game, if you have one?
Oh, no. I'm really terrible at favorites.
Favorite video game.
Okay, well, Shadow of the Colossus is a longtime favorite.
Favorite video game for training models in machine learning.
Oh, right now I'm spending a lot of time with Pong.
Really?
Okay, we'll get back to that.
Which make better students, computers or humans?
Oh, humans for sure.
You have to say that.
Yeah. But the robot overlords won't have mercy.
Well, I just think it may be inapt to apply the word mercy to them.
Do you have a tip everyone should know?
That everyone should know? I don't know.
The thing I've been thinking about lately, just because of everything that's happening, I've been
rapidly trying to put my courses online. So a new tip that I'm enjoying is that Zoom,
which is normally like a conference call software,
actually has a lot of the functionality of a good lecture recording software.
Like you can just make a conference call with just yourself
and then share your screen, annotate things.
It does automatic transcription.
So that's been pretty handy to me.
If there are any professors out there who are in a similar boat, give it a try.
I can write that down for my brother.
Yeah, that's pretty cool.
I've used Zoom for years and didn't know you could do that.
That's great.
Okay, so now some longer questions.
And I think we're going to start with one of the shorter ones.
Pong?
Really?
Yeah, really.
It's hard to emphasize how bad we are at making computers learn to play video games.
I mean, why?
It's a deterministic system.
It shouldn't matter.
I mean, why is it so hard?
So I should be clear.
Pong is actually one of the easier games for reinforcement learning systems to learn to play. But most reinforcement learning agents
that have been successful have used an approach that's maybe unfamiliar and maybe counterintuitive
to most people. I think when people think about playing a video game, they think, oh, I'm gonna
like, you know, press some buttons, move the joystick, and then see what happens on the screen. And then I'll be able to make predictions. Like if I do this, then this
will happen. If I do this, then this will happen. And then you can use that to make decisions,
right? You can say, I want this to happen, so I'm going to do this. Or, oh, I don't want that
to happen, so I'm going to do this to prevent it or something like that. So that's what I would
call model-based learning, where you're learning to make predictions about the world, so it's like a little version of the world in your imagination.
And you can interact with that imaginary world instead of the real world to plan.
I think I understand that because there are some games that I play that it took me a long time to even learn the rules.
Right. And so it's more about figuring out what the goals are than, you know, actually...
The mechanics?
The mechanics of it.
And then there are some games that I play that once I figure out the mechanics, I'm totally bored by the game.
Right.
And you can think about, you know, just like classic board games like Checkers. You can describe the rules of checkers in just a few minutes. And now you have an imaginary version of the checkers board in your mind that you can play forward. And so now you can imagine possibilities as you imagine possible moves you could take without having to actually interact with the checkers board.
Yes. I mean, that's as long as you are willing to do the tactical thinking.
Exactly.
Right.
So in contrast, the most successful reinforcement learning agents use what I call model-free
learning, where they don't even attempt to learn, to make predictions about what's going
to happen next in the world.
They only try to learn, like, how good, how well is this action going to work out for me
in this situation? So here's one example that I think a lot of humans experience.
Say every day you drive to work and you
use the same route every single day. So every day you go through this intersection and you turn left.
And then one day you're at that intersection and you turn left. And then a couple of minutes later,
you're like, oh no, I wasn't going to work. I was going to the party. I should have gone
straight instead of going left. Right? It's never happened to me.
So there's two behavioral systems in play there.
There's one that some behavioral scientists would call the habitual system that I would call the model-free system that is just like, oh, when we're at this intersection, it's a good idea to turn left.
It works out for us to turn left at this intersection, but it's not thinking about the consequences of turning left. And then there's another system, which would sometimes be called the goal-based system or the model-based system, that says, oh, I'm trying to achieve this goal, so I need to do this and this and this.
Like, I need to make this happen and then make this happen, change the world in this way.
And so that system catches up with you and is like, oh, no, we weren't trying to go here.
We were trying to go there.
We made a mistake.
And the first system is much faster.
Oh, yes. Much faster.
But the second system requires, well, I mean, in humans, it requires effort,
which is why we often don't do it.
That's right. I mean, they're both very important tools.
But interestingly, in the successful, you know, reinforcement learning applications that have been applied to, you know, big, complicated things, and when I say big, complicated things, I mean like Pong, they use only model-free approaches.
And so when I say it's hard to learn Pong, it's because I am interested in model-based approaches, and they are still not robust enough to even be applied to something like Pong.
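(Editor's note: a minimal sketch of the model-free vs. model-based distinction, in Python, using the turn-left-at-the-intersection example from a moment ago. All states, actions, and numbers are invented for illustration; this is not code from Erin's work.)

```python
# Toy illustration of model-free ("habitual") vs. model-based ("goal-directed")
# decision making. All states, actions, and numbers here are made up.

# Model-free: a table of how well each action has tended to work out.
action_values = {
    ("intersection", "turn_left"): 0.9,    # usually we're headed to work
    ("intersection", "go_straight"): 0.2,
}

def habitual_choice(state, actions):
    """Pick the action with the highest learned value; no prediction of outcomes."""
    return max(actions, key=lambda a: action_values.get((state, a), 0.0))

# Model-based: a learned model predicts where each action leads,
# so we can check the imagined outcome against today's goal.
transition_model = {
    ("intersection", "turn_left"): "work",
    ("intersection", "go_straight"): "party",
}

def goal_directed_choice(state, actions, goal):
    """Imagine the next state for each action and pick one that reaches the goal."""
    for a in actions:
        if transition_model.get((state, a)) == goal:
            return a
    return habitual_choice(state, actions)  # fall back on habit if nothing matches

actions = ["turn_left", "go_straight"]
print(habitual_choice("intersection", actions))                # turn_left (habit)
print(goal_directed_choice("intersection", actions, "party"))  # go_straight
```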
So this is not something that's well applied to, say, chess at this point?
Oh, interestingly, it is well applied to chess.
Oh, okay.
So chess is state-based.
You look at a board and you can predict things based on that board.
But Pong, you have
to look at the previous image and the next image in order to predict the future.
That's right. Okay.
So that's part of it is you have to remember part of the past, but actually it's even more
complicated than you're imagining. So in the version of pong, everyone thinks that pong is
just like, oh, when I go up, my paddle goes up. When I go down, my paddle goes down.
And the ball just travels linearly and bounces in the normal bouncy way. In the Atari version of Pong,
the movement of the paddle depends on the last 17 actions that you took.
Really? That's why I'm so bad at it.
Exactly. It's actually very hard. If anyone, like, go back,
find an emulator, and try
the Atari 2600 version of Pong.
It's actually very difficult.
And the
ball does move linearly
normally, but when it interacts with the
paddle, the direction in which it will
bounce off depends on all sorts of complicated things, particularly
similar features of what the paddle's been up to.
That's why when you're moving fast, it goes at a different angle than if you were
just sitting there while it bounced off you. That's right. And it also depends
on where in the paddle it bounces off.
And so one of the things that's interesting to me about Pong as a microcosm is that it has lots of aspects that are really easy to predict.
Like when the ball's just in the middle of nowhere, it's just linear movement.
Like that's super easy.
But the paddle itself is very difficult to predict.
And the ball's interaction with the paddle is very difficult to predict.
And so if you're just naively learning a model of this system, it's probably going to get those two things wrong.
It's going to get a lot of things right, but those two things are very complicated, so it'll approximate them and there will be some error. And the interesting thing is that even if it's a relatively innocuous seeming
amount of error, it's enough to totally ruin the ability to plan. Because if your model is wrong
enough that you can't tell the difference between the ball bouncing off the paddle or going through and losing a point, then you can't make decisions.
You can't tell whether an action is good or bad because your model can't tell you whether it's
going to work out or not. Okay. So there's the model-based, which is trying to learn the rules.
What was the word for the other one? Model-free. Model-free. And did the model-free versions play Pong okay?
Oh yeah, they blow it away. They have no problem.
What are they learning?
So they're learning like, oh, the kind of thing that you imagine your strategy is in Pong, right? It's like, oh, the ball is kind of above my paddle. So maybe I better move my paddle upwards.
And of course, in Pong, this is not important. But in the real world, what are the applications?
Yeah, so obviously, Pong itself is not an application with inherent value.
But what I'm trying to get at is this issue, which is going to be true of the broader
world, which is there are lots of parts of the world that are simple and easy to predict. And
there are some parts of the world that are going to be really complicated, very difficult to make
predictions about. And somehow, we want our agents to be able to navigate in this space.
We want them to be able to benefit from learning to make predictions about the future, but
not be completely, um, ruined by errors in their model, which are inevitable because
parts of the world are very complicated.
I remember actually when I went to Mudd,
I did a thesis on intelligent tutoring systems.
And we were talking about, or I was working on expert models.
Sure.
And it feels like this is sort of the same,
except these are machine learning expert models
versus the sort of expert models that were used in the 90s.
I think that's a reasonable comparison.
And it actually gets at the core issue, right?
So there are lots of fields where model-based methods are applied.
So, for instance, in robotics, it's very common to have a model of your robot's dynamics.
Like if we apply this torque, then it will move this joint and this angle, you know, that kind of stuff.
But those models are human designed and usually iterated over a whole lot in engineering processes, like, oh, let's get
this model as good as we can. Sometimes there are learned aspects, but usually they're sort of
limited within a broader framework that's constraining that learning problem.
Yeah, we call those empirically determined aspects. Sure.
And so it's building in a whole lot of human effort to make the model very high quality, which means that it can then be used effectively for planning.
So in the AI realm, I think just aesthetically, we're often more interested in autonomy. We want to think about systems that can cope with the world without a whole lot of human intervention,
that can encounter a new phenomenon and not have a designer come in and hand it a whole bunch of
information about that new phenomenon, but have it just deal with it in the moment. And so that's where the challenge comes in, where we have to have
model classes that are robust enough that can handle lots of different things that might happen,
but are constrained enough that you can actually learn a pretty good model.
And then somehow, even in that space, figure out how to deal with the errors that are definitely going to be there
because we're trying to learn models of basically anything.
How is it different?
I mean, I know machine learning has to be like deep neural networks,
and you just throw a bunch of data at them, and you tell them the right answer.
I mean, that's supervised learning, and you're doing reinforcement.
But even with reinforcement, at the end, there is a, I won the game or not, or what, there's some reward function eventually.
How do you, how do you tell it you want to do models instead?
I mean, how do you organize this such that you get rules based?
I see.
So usually we think of it as being separated into two learning problems. So there's the problem of learning the model, which is more like supervised learning. It's more like, oh, here's what I've been observing in the world, some stuff from history. And then that's the input to my supervised learning problem, and then the output
would be: here's what's going to happen next.
Cat pictures in, cat word out.
Sure, for instance. Or, like, you know, the ball's velocity is this and the paddle's velocity is this, and then here's what I
think the ball's position and the paddle's position are going to be next, something like that. So
that's one learning problem. And then the other learning
problem is the more traditional reinforcement learning problem where we're trying to learn
a way of behaving to maximize this reward signal and do a good job, whatever that means.
And so the interface between these two problems is the planning where somehow we want to use the
model to augment what the reinforcement learning process is doing so that it can do it more quickly or more effectively or something
like that.
But how?
I mean, how do you make it more robust like that?
How do you tell it what the rules are?
Oh, well, the hope is that it will learn what the rules are.
So that's what the what's going to happen next function is supposed to provide.
Okay. So in model-free, the thing we're trying to learn is something called the value function. And the value function's job is to tell you, like,
here I am in this, like, the world is in this particular state. And now I'm going to think
about this state, and I'm going to think about taking this particular action, like I'm going to push the joystick up, for instance, the value function tells you what is the long term benefit of pushing
the joystick up, and then behaving ideally optimally after that, or maybe, you know,
at least as well as I've been behaving recently after that. So the value functions job is to sort of measure the long-term benefit of
taking a particular action right now.
Okay.
So sometimes like in control theory,
if this connection helps,
sometimes that's called the cost to go function.
Okay.
Where it's similarly,
it's like,
Oh,
here's the long-term cost of doing the thing that I'm about to do.
So that value function is like, the purpose of it is to tell you, like, how well is it going to work out for me if I make this particular decision in this particular situation? And it
doesn't predict what's going to happen next. Like where's the ball going to be? Okay. In a model
based system, we'll still probably try to learn some kind of value function, but we're also going to learn where's the ball going to be next.
Okay.
So we're not limited to interacting with the world in order to learn about the value of taking this action or that action.
We can also interact with the model, our imagined system,
to figure out whether this action or that action is going to work out.
So we need less data.
Okay.
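(Editor's note: a hedged sketch of the "interact with the model so you need less real data" idea, loosely in the style of Dyna-Q from Sutton and Barto's book rather than a claim about Erin's own method. The tiny chain environment and all constants are invented.)

```python
import random

# Q-learning on real transitions, plus extra "imagined" updates replayed from a
# learned model, so fewer real interactions are needed. Everything here is a toy.

ALPHA, GAMMA, PLANNING_STEPS = 0.1, 0.95, 10
q = {}      # (state, action) -> estimated long-term value
model = {}  # (state, action) -> (reward, next_state) most recently observed

ACTIONS = ["left", "right"]
CHAIN = ["s0", "s1", "s2"]  # keep choosing "right" to get from s0 to the reward at s2

def q_update(s, a, r, s2):
    best_next = max(q.get((s2, a2), 0.0) for a2 in ACTIONS)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

def learn_from(s, a, r, s2):
    q_update(s, a, r, s2)            # learn from the real transition
    model[(s, a)] = (r, s2)          # remember it as a simple model
    for _ in range(PLANNING_STEPS):  # replay imagined transitions from the model
        ps, pa = random.choice(list(model))
        pr, ps2 = model[(ps, pa)]
        q_update(ps, pa, pr, ps2)

def step(s, a):
    i = CHAIN.index(s)
    i = min(i + 1, 2) if a == "right" else max(i - 1, 0)
    return (1.0 if CHAIN[i] == "s2" else 0.0), CHAIN[i]

state = "s0"
for _ in range(100):
    action = random.choice(ACTIONS)
    reward, next_state = step(state, action)
    learn_from(state, action, reward, next_state)
    state = "s0" if next_state == "s2" else next_state

print(sorted(q.items()))  # "right" ends up valued higher than "left" in each state
```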
Where do the hallucinations come in?
What?
All right.
So... You weren't expecting that question, were you?
No, I wasn't.
That was great though.
So let's think about the planning process.
There's lots of things you could do.
But the main thing is that the planning process, we're trying to get our hand on the long-term
benefit of taking a particular action. So if we're going to use our model
to do that, we have to kind of imagine, well, if I take this action, what will happen next?
And then once I'm there and I take some other action, what will happen after that?
And then after that, and after that, and after that, and after that.
Then you can play chess.
Then you can play chess. Or you can imagine, so if you're playing pong, right, that's how you tell I should start moving the
paddle up now. Otherwise I'm going to miss the ball in 10 steps. For instance. Okay.
So that process is called iterating the model, right? We're like taking the model,
making a prediction and then taking its prediction and feeding it back in to get another prediction, and then feeding that back
in and to get another prediction and so on. So the further out you go, the less likely
it is to be real. Yeah. So what happens is even little tiny innocuous errors in your model
that you might not even notice on a one-step level expand exponentially as you go out,
because you're feeding wrong things in and getting even wronger things out
in this cycle that just keeps producing more and more garbage.
Yeah. Tiny perturbations lead to junk.
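(Editor's note: a tiny numerical illustration of this compounding, with made-up dynamics. A model that is only slightly wrong about one step drifts further and further from reality once its own outputs are fed back in.)

```python
# Hedged sketch: iterate the true dynamics and a slightly-wrong learned model
# from the same start, and watch the multi-step predictions diverge.
# The dynamics and the small one-step error are invented for illustration.

def true_step(x):
    return 1.05 * x + 1.0   # "real world" one-step dynamics

def model_step(x):
    return 1.06 * x + 1.0   # learned model with a small one-step error

real, imagined = 1.0, 1.0
for t in range(1, 21):
    real = true_step(real)
    imagined = model_step(imagined)   # the model's own output fed back in
    if t in (1, 5, 10, 20):
        print(f"step {t:2d}: real={real:8.2f}  imagined={imagined:8.2f}  "
              f"error={abs(imagined - real):7.2f}")   # the gap keeps growing
```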
That's right. So yeah, I did this piece of work that was
at least an attempt to mitigate this issue. So the problem was to think about,
well, all right, so why do we get garbage from garbage? And one of the reasons is
that the model, like if say you're training it like a normal supervised learning model,
so you've got a bunch of data from the world and you're like, here's what the situation was,
and then here's what happened next. And here's what the situation was, and here's what happened
next. And you feed that into whatever your favorite model class, your favorite optimizer.
And you get out some function that takes here's what happened
and gives you what's going to happen next.
So the issue is, if it produces output with some error,
then maybe that output could never possibly have happened in the real world.
Or maybe it never would have happened.
And so the model will not have been trained on any inputs that look anything like that.
Oh, okay.
So in supervised learning, that's called a training set test set mismatch.
That's where the distribution of examples that we're going to use our model on
is different from the distribution of examples that we trained it on.
Yes, I'm very, very familiar with that problem.
Painfully familiar.
So this is, I think this is really interesting.
It sort of means that some of the, that's like a really basic assumption of supervised learning, and it just doesn't apply in this setting.
So the attempt to get around it was to say, well, all right, well, then maybe we should make the training distribution more like the test distribution.
So let's take actual outputs from the model and put them in the training set so that it will be trained on the garbage that it produces.
So in reinforcement learning, you don't have the supervised data. You end up creating it
yourself as part, or the machine learning widget ends up creating it as part of its process.
Sure. Yeah. And so if I had a system where I'm trying to drive a car,
I could be saying, you know, I'm on the first step of reinforcement learning.
I think the throttle should be a million. And I think the steering angle should be,
I don't know, let's just go for 2 pi.
Sure.
And that's, I mean, maybe the 2 pi is valid, sort of.
But why, I mean, in the real world, we couldn't go a million whatever.
Right.
And so how would you get a training set of completely implausible data?
Oh, how do you get the implausible data? You get it from the model itself.
So here's the...
Well, I mean, so if I say I want to go a million miles an hour, how do...
Oh, and then it feeds back in.
I should let you talk.
All right.
So the model is like its function, right?
So it thinks something is going to happen.
Right?
Yes.
So you can ask it what's going to happen.
And now the point is, so let's say it's, okay, here's an example from Pong.
It's like a real thing that really happens with some models. So sometimes you get a small pixel error where like there's some pixel that's
maybe white that wasn't supposed to be white. Well, that doesn't seem like a big deal. It's
very small, little speck. But the consequence of it is that sometimes it means that it looks to
the model that that white speck is not usually there. And so then in the next step, if you take that slightly erroneous frame
and feed it into the model, in the next step,
it might think that that white speck is a ball.
And now there are two balls on the screen,
as far as the prediction is concerned.
Oh, okay.
So that's terrible, right?
Like a tiny little pixel error has resulted in this huge
semantic error, and now this frame is worthless.
Yeah. Okay. Yeah, no, I got that. I'm still trying to figure out how to connect things together.
But yeah. Yeah. So let me see if I can do it. So now I've
got this garbagey frame. What we'd really rather happen is if the
model saw this little white speck, which is not really ball-shaped, and looked at it and said,
oh, you know what happens? Sometimes when I generate images, I generate a little white speck,
but that's not the ball. I should predict that that white speck is just going to go away.
That's what we would like to have happen.
Yeah, yeah.
So here's how we're going to accomplish it.
We're going to take the frames that the model produces.
Like here's the frame with the white speck that the model has generated.
It's got this error.
And we're going to line it up with what really is
happening in the world. So in the real frame, there's no white speck. And so we're going to
take the white speck frame and treat it as if we did really see that happen. But then the real
next thing happened. The real next thing doesn't have a white speck.
And it certainly doesn't have two balls.
Okay.
So we're adding, we're augmenting our training set to now also train the model to kind of try to veer back to reality whenever it starts to deviate.
Did that make sense?
Yes. Although I missed, I didn't,
I realized just now that the white speck came because the model created it, because the model is dumb.
Because it's dumb, right? It's an approximation. It's going to make little errors
sometimes. Yeah. I mean, it doesn't know at the beginning of training, it knows nothing.
Right.
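(Editor's note: a rough sketch of the training-set augmentation just described, as this editor understands it: take a frame the model itself generated, errors and all, and pair it with the real next frame, so the model is trained to veer back toward reality after its own mistakes. The shapes, the toy "model," and the function names are placeholders, not code from the paper.)

```python
import numpy as np

# real_frames[t] is what actually happened at time t; the model's job is to map
# (frame[t], action[t]) -> frame[t+1].

def augment_with_model_outputs(model_predict, real_frames, actions):
    """Build (input frame, action, target frame) pairs from real data plus model-generated inputs."""
    dataset = []
    for t in range(len(real_frames) - 1):
        # Ordinary supervised pair: real frame -> real next frame.
        dataset.append((real_frames[t], actions[t], real_frames[t + 1]))

        # Augmented pair: the model's (imperfect) prediction of frame t, generated
        # from the previous real frame, paired with the REAL next frame.
        if t > 0:
            imagined = model_predict(real_frames[t - 1], actions[t - 1])
            dataset.append((imagined, actions[t], real_frames[t + 1]))
    return dataset

# Toy stand-ins so the sketch runs: 4x4 "frames" and a model that adds pixel noise.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 2, size=(4, 4)).astype(float) for _ in range(5)]
acts = [0, 1, 0, 1]
noisy_model = lambda frame, act: np.clip(frame + rng.normal(0, 0.1, frame.shape), 0, 1)

pairs = augment_with_model_outputs(noisy_model, frames, acts)
print(f"{len(pairs)} training pairs "
      f"({len(frames) - 1} real, {len(pairs) - (len(frames) - 1)} model-generated)")
```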
Okay. Where do the hallucinations come in?
So I...
Sticking doggedly to this point.
So I called this type of training... well, there's a venerable training technique called experience replay, where you remember things that the agent has encountered before, and then you just reach back into that sometimes and replay that experience as if it's happening right now, as a way to just, like, get more data. And so this idea, instead of reaching back into things that
have actually happened, it reaches into things that the agent has imagined. And so I just in a
sort of punny kind of way called that instead of experience replay, called it hallucinated replay.
And that's where the hallucinations come in.
Okay. When you write about this, you provide what is, to me, an abnormal level of mathematical proof.
Sure.
I find it kind of reassuring because so much of machine learning is like, I don't know, I tried it. Kind of worked for me, but, you know, may not work for you.
Made up a new activation function. It seems to work better.
Why is there so much math in yours and why isn't there math in other ones?
All right. Well, I can talk about myself and then maybe I can share some opinions about the rest. So, well, you know, like I think that math is a way to like expose the truth about the universe in a way that's harder to do through empiricism.
I don't exclusively work in theory.
I certainly know people who are just theoreticians.
So I'm attracted to trying to tackle problems that really affect practice.
Right now, people just don't apply model-based reinforcement learning to interesting problems
because it just doesn't work.
And so if we could handle that, then that
would be huge. We would all of a sudden be able to benefit from all of these approaches more
robustly. So practice matters to me. I want to go after problems that I can see when I try to do
things. But then if I can, if I have something that seems to be working, I like to prove something about why it works,
so that I feel like I've actually learned something instead of just kind of stumbled
on something that was working in a few examples. So this piece of work is actually a really good
example of that. I have a paper from a while ago that is just like, I don't know, let's try it and
see. And then it works surprisingly well. And I'm sort of like, oh, wow, it works. And so then I went back to my
cave and started doing math to see if I could figure out whether it really works or whether
just like sometimes kind of works. And it does turn out that you can prove some things about
why this is a good idea, which was really exciting to me. So I think it's more
just an aesthetic way, an aesthetic part of how I structure my work that I like to
go after fundamental questions. And then if I can prove something about those issues and about
possible solutions.
Why doesn't everybody do that?
Well, I mean, listen, not everyone has to. I
think that there is real value in just like running ahead and being like, look at this and look at
this and look at this. As long as somebody is coming up behind and saying, all right, well,
that's a real thing. That's not really a real thing. And sort of cleaning it up. We're certainly
in a moment in machine learning where we just got a
whole bunch of shiny new toys. And so there's a lot of people who are mostly doing the exploratory
work of finding out what we can do with the new toys. And that's important. We need to know
kind of how far we can stretch this stuff.
I do think that eventually we'll hit some limits. We'll find some problems that we wish we could tackle and find that we're not really able to practically do it with deep nets, for instance.
And then we'll have a new explosion of more fundamental, more theoretical work. I mean, this is a cycle
that's gone on for decades in AI. Yes. Although this is the biggest boom we've ever had.
I think that's true. Yeah. Certainly the most economically impactful,
which does change the dynamics a little bit. But learning it has been really hard for me, partially because it is growing so fast.
And half the time I read a paper, I get through it and I realize that I've already heard about this, but it was with totally different terms.
Is there formalization coming soon? Is it here but poorly distributed?
Oh, that's a great question. When I teach machine learning, I don't try to teach to where all the furious activity is. Instead, I try to teach my students about the fundamental
principles that underlie basically all of this stuff. So let's learn about Bayesian reasoning and let's learn about least squares optimization and let's learn about likelihood maximization, like different foundations that you could build an algorithm upon, gradient descent, all of these things. And so then, with those tools in mind, it's easier to,
I think, pick out the signal from the noise, right? So a lot of, for instance, deep network research
is just tweaks to network architectures. Deep down inside, it's all just gradient descent on
a really complicated function.
And whether this tweak or that tweak is going to be helpful in your problem,
I mean, it's pretty up in the air right now, unless you can find someone who's done some example that's really similar to what you want to do, then you can maybe just import their stuff.
Otherwise, it really just does, I think that's what grad students are for right now,
is just to provide the hours of tweaking and exploration and experimentation until you find
a functional form that works. It's not very satisfying. It's certainly why I, in my research,
I tend not to focus too much on deep nets because
you always run this risk that like, oh, I tried something and it didn't work. And it's like,
well, now what? I don't know if it like fundamentally doesn't work, if it's because
I just need to add a max pooling layer, if it's, you know, what's happening here.
Yes. The model parameters, tweaking them is, I mean, it can be a lifetime.
Yeah.
But those things that you mentioned, the, I want to say K nearest neighbors, but you didn't say that one.
Oh, that's a good one too.
The techniques you mentioned, they're all math heavy.
That's true.
Do I have to learn all the math or can I just use a cookbook?
Well, I think it depends on what you want to do, right? So certainly there are lots of really cool
software tools out there that let you just mess with these things and build them into your systems.
And I think sometimes that's totally appropriate. The main risk you run, though, is that every single one of these approaches has limitations, has built-in assumptions.
And I think it is important, at least eventually, to know what some of those are so that you don't find that you're misapplying something. You know, like, as we were just talking about
train set and test set, you know, I think it would be very easy for a novice who hasn't
really looked into any of the theory, you know, they could just apply whatever their favorite
thing is to some training set and get zero error and be super thrilled. Whereas somebody who spent a little bit
of time studying machine learning might understand, oh, you know what, really what we care about is
generalization. Zero error on the training set is probably a bad sign. Maybe. Because it probably
means that our function was so complicated that it just memorized the entire data set. And then if we throw new data at it, it will have no idea what to do.
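(Editor's note: a small, hedged illustration of the zero-training-error point, using scikit-learn, which the book recommended later in the episode also covers. A 1-nearest-neighbor classifier memorizes the training set, so its perfect training score says nothing about generalization; only held-out data does. The dataset here is synthetic.)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with some label noise, so perfect accuracy is impossible on new data.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1-nearest-neighbor effectively memorizes the training set.
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))  # ~1.0: memorized
print("held-out accuracy:", model.score(X_test, y_test))    # noticeably lower
```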
How do I learn it? I mean, it is really complicated. And it's one of those things that I considered trying to go take some graduate-level classes, but that doesn't work out for me. Do you have recommended books or sites or methods?
You know, I looked at a lot of books as possible resources for my students. And I eventually just gave up
and just gave them my lecture notes.
I don't think,
I think that there are books out there.
What I found was that there are books out there
that are targeted to a lay audience,
but that they tend to de-emphasize
some of the things that I think are really important.
And they're more just like, oh, here's the code for, you know, whatever,
K-Nearest Neighbors.
Now you can implement it too.
But instead of talking about, oh, here's when it's good and here's when it's bad
and here are the core issues with it.
So here's when you should apply it and not.
I don't know that there's,
and then there are resources that are, that do cover that stuff, but that I find to be
nearly unreadable because they're, they're mainly targeted as reference material for,
for already sophisticated machine learning folks. So I haven't personally found that nice middle ground that's like, oh,
here's at the level that I would want to explain it to a new person that here's the fundamental
principles that we want to apply. And then, but we're starting from ground up as not assuming
that you're really sophisticated with matrix math and really sophisticated with, you know,
multivariate
calculus and all that stuff. So I don't, I'm sorry, I don't have that resource.
In reinforcement learning in particular, though, I will say that I think the, like, the book,
which is called Reinforcement Learning: An Introduction, by Sutton and Barto,
is actually a pretty accessible resource. It does get pretty mathy at times, but I think
that if you're willing to sit with it, it is relatively self-contained and doesn't require
you to take a whole curriculum in order to crack the book open.
To answer some of my own questions, the book that I like the most right now
is Géron's Hands-On Machine Learning
with Scikit-Learn, Keras, and TensorFlow.
It actually did start with
the lower level statistics techniques,
but it didn't have a lot of math.
I failed out of Ian Goodfellow's book, Deep
Learning, which I would really like to learn, but yeah, it was just too much math and not
enough how-do-I-use-this.
Right. And maybe not enough intuition. I think that's a big part that's missing
in a lot of things. It's like, oh, here, look, let's just do a bunch of matrix transformations and see what happens.
And the author, of course,
can see all that in their head.
But often the reader is at sea
about why this is all happening.
That's a good tip.
I will write that one down.
Like I said, it's been a few years
since I taught machine learning,
so I'm sure there have been new books.
And that's good information.
I think I remember that book also has some of the things you were talking about, like this is good for this, but not for this.
Yeah, that's exactly what we need.
Yeah, yeah.
That was, it has been a really good book for me.
Putting together a lot of the random assorted, I picked it up here and there techniques and ordering them.
That's awesome.
Yeah.
I mean,
it's like I said,
I think if you want to just apply things,
it's not necessary that you dive deep into all the optimization stuff.
That's more like,
if you want to be capable of inventing new algorithms,
you're like,
Oh,
you know what?
No one can solve problems like
this. Like, let's figure out how to, you know, change an algorithm or make a new algorithm that
can tackle this. Then you need more theoretical base. But I think if you want to apply machine
learning approaches to some data that you have, I'd say mostly what you need to know is,
how do I do these things? And then what are their limitations and built in assumptions?
So you can do a good job of choosing what to do.
Cool.
What about the ethics of AI?
We've been hearing a lot about this in the news.
Well, before the current news, um, do you, do you worry about it?
Do you teach it in the classroom?
Sure. So one thing I will say is I will admit that I wish I taught more of it in the classroom,
and I'm still trying to figure out how to fold it all in. But I do absolutely
try to make sure to keep connecting when we're learning about, you know, AI techniques or
machine learning techniques to keep connecting it to what is really happening in the world.
AI is creating all kinds of ethical concerns. And I think for the most part, as a society, we're not super prepared to handle them.
But it's not, I mean, so a lot of times people bring up the ethical concern, like, well,
what if we make a computer that destroys us? And listen, I mean, sure, of course we should worry about destroying
ourselves, but at least from my
experience in the field, I don't see that as an imminent threat.
It can't even play Pong.
It can't even play Pong, exactly.
That's what it wants us to think.
Andrew Ng
is a pretty well-known researcher in the field, and he has said in the past something like, I don't study evil AI for the same reason that I don't study overpopulation on Mars.
It may be a problem eventually, but it's probably not the most immediate thing to worry about. So the more
immediate concerns are to do with people and what they do with the power that these systems give
them. So that includes, you know, the like vastly increased ability to surveil everyday goings on, right? If you can now take this mountain of data and
actually pull meaningful information out of it, that's a new capability. Things like manipulating
the behavior of people, right? So that's what targeted ads are at their core. It's a data-oriented way to figure out what kinds of levers and buttons
you can press to affect a person's behavior. And like advertising has been around forever,
but the capacity to do it at a larger scale, to fine-tune those approaches, I think that's new. So there's just all kinds of stuff going on
that does feel a little bit like playing with fire. And I don't think that everyone is fully
on board. I think a lot of people tend to think, oh, well, a computer's doing it, so it must be
fair or it must be rational.
You know, like people are applying machine learning techniques to policing,
to criminal justice stuff.
There's no reason to believe that those algorithms
are producing fair outcomes.
They were all designed by humans
and they're all being applied by humans.
So they're going to encode all the human garbage,
including bias. So it matters to think, you know, not just how do I do the thing that I want to do, but also to think past that to what are the long-range consequences of this thing existing?
And with that math that you're
so happy with, do you also worry about the applications and the long-term?
Yeah, I suppose I do. Maybe it doesn't creep in too much because my work tends to be
more conceptual and theoretical. So pretty much nothing I've
ever published is deployable in any sense. So it's not an immediate concern, but absolutely it matters to me that the problems that I'm working on,
at least as far as I can predict, are more likely to benefit us all than they're going to hurt us all.
I tend to agree with you. It's good to hear you say that.
So going back actually to the question I have written that I should ask you
first is that your bio says you apply machine learning to artificial
intelligence.
Aren't those the same thing?
Oh,
I wouldn't say that at all.
I think that there,
I think of them as being, and I guess different people will give you different answers.
So I have an office neighbor who would answer this in a totally different way.
But for me, I think of them as being intersecting, but not fully intersecting fields. So machine learning, I think of as being the field of taking data and turning it into
useful computational artifacts. You could do that for all kinds of reasons, with all kinds of goals
and with all kinds of sensibilities. People apply machine learning stuff directly to, say, biological problems, and their entire goal is like, I just want a piece of software that takes the amino acids in this protein and tells me its shape.
And so I'm going to use data-driven methods to do it, but I have no bigger ambitions than to just get a piece of software that answers this question.
Machine learning is only one way that I could do it, right?
Whereas I think of artificial intelligence as typically being not so much defined by how you're doing something or what you're doing, but as a set of goals or a set of aesthetic concerns about how you do things.
There's a long history of AI problems becoming not AI as soon as they're solved.
Yes.
Which, of course, isn't going to happen to machine learning, right? Machine learning
doesn't become not machine learning once you have a successful approach. It stays machine learning.
So in AI, I think of it as being more of a set of philosophical positions and goals. It's like, we're trying to maybe achieve some similarity to natural intelligence, or we're trying to understand
some aspects of natural intelligence. And that affects how we choose our problems,
what kinds of approaches we consider to be acceptable. So we go back to Pong, right? Again, as a problem, it's not interesting at all. It would not be hard for just a sophomore to program a Pong player that would win every time.
So it's interesting as a problem only when we limit our approaches to things that seem like they might be plausibly part of an autonomous system.
So absolutely, I think you can use machine learning techniques in service of AI goals.
But of course, lots of people have used non-learning techniques in service of AI goals.
There's a long history of, for instance, logical reasoning being applied to answer
questions about how do we plan? How do we reason about the world? These are questions about
cognition, not so much about solving a problem. I like that. I like that philosophy because
so much of the discourse right now about AI is about machine learning and deep learning.
Then you hear about self-driving
cars and everybody, you know, conflates that with AI and the cars are thinking about how to drive
and doing all this stuff and they're becoming, you know, sentient robots and everybody treats
it that way in the popular discussion. But that's not the case at all. It's just
there's very specific things they're doing. They're not doing general AI kinds of things, right?
And once we have reliable self-driving cars that everyone just uses,
no one's going to think of it as thinking. Just the same exact way that no one thinks of their fuel injection system as thinking. There's a computer. It's making decisions.
It's making decisions I wouldn't know how to make.
Sure, so therefore it must be smarter, it must be smart.
Right. I mean, this goes back centuries, right?
Like the first mechanical calculators, people were saying, oh, it's like it's thinking, because it can
do division. Right? There had never been a thing that could do division other than a human before. And now calculators are just calculators.
Of course, they're just dumb little calculators.
That's right. I think it's fine. I think that moving line makes sense because I don't think
AI is a set of techniques. I think it's a horizon.
Looking at your paper titles, I keep being struck by how often I'm like,
well, yes, of course. For example, you have a grant right now,
using imperfect predictions to make good decisions. Sure. And this seems like an important life skill.
Yeah, I agree.
How does it work with your research?
I mean, how do I make imperfect predictions and good decisions, but really this is about your work.
What does that mean?
Oh, yeah.
So you said, how do you do it?
I would love to know how you do it.
Well, I mean, as I walk around the house, I, you know, try to predict where the dog will be in order not to trip over them.
And yet I still manage to trip over one dog or the other at least once a week.
So they're definitely imperfect predictions.
They're mostly good decisions.
Right.
Isn't that amazing?
Like we live, just think about the world.
Like we live in this incredibly complicated world.
We don't even appreciate it most of the time.
Like if you go outside, you're going to see a tree.
And it's going to have thousands of leaves on it. And you have no capacity if you look away from the tree and then look back to know where
all those leaves are going to be. Or if you look at your carpet, it's got patterns and things are
swirled around and someone stepped there. So there's a footprint and all kinds of things.
But you look away from your carpet, maybe you remember some things for a short period, but you're not going to be able to model like, oh, how are all the fibers of the carpet going to change?
So the world is incredibly complicated.
Somehow you and I have this.
Oh, and Chris, too, I suppose. This amazing ability to filter out unnecessary detail and also to cope with uncertainty and also to know when we're in situations where we have a hard time predicting.
And somehow we're all still like walking around and driving cars and eating sandwiches and stuff like we're totally functioning in this incredibly complicated world.
I'm gobsmacked. I have no idea how you and I do it, which is why I study what I do.
And so what are the methods you use to investigate this? I mean, this is part of the reinforcement
learning. That's right. So there's sort of a few principles that I'm trying to get at and understand how they can become more integrated into practice.
So we talked before about hallucinated replay. That gets at one kind of principle, which is that when we learn a model, there's kind of a tradition in
model-based reinforcement learning to think of there's two problems, right? There's let's learn
to predict the next thing in the world. And another problem, which is let's try to predict
how good it is to take this action or that action. And those two are thought of as totally separate.
Like you could just, oh, let's just do supervised learning for one part and let's just do reinforcement learning or traditional planning
approaches for the other part. And then we'll just glue those things together. And it turns
out that doesn't work. Like that's the sort of the lesson from hallucinated replay. And so the
one idea there is when we're learning a model of the world, we should know
that it's for planning. And then we should adjust the way we optimize our model so that it's wrong
in the right way, so that whatever errors it has are not catastrophic for the purposes of planning.
Because of course, when we're approximating,
we get to decide, at least to some degree, how do we apportion our resources? What are we going to
be accurate about? What are we going to be wrong about? And so we want as much as we can to
shape our approximation so that its predictions, its wrong predictions are somehow
still not terrible for the purpose of planning. So that's one thing that I'm trying to understand.
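(Editor's note: a toy, editor-invented illustration of "apportioning" a limited model's accuracy, not Erin's actual method: fit the same simple model class twice, once with a uniform loss and once with the loss weighted toward the region that matters for decisions, then compare the error where it counts. All data and weights are made up.)

```python
import numpy as np

rng = np.random.default_rng(1)
states = rng.uniform(0, 1, size=200)                        # made-up 1-D state feature
targets = np.sin(6 * states) + 0.1 * rng.normal(size=200)   # made-up dynamics to predict
importance = np.exp(-((states - 0.9) ** 2) / 0.01)          # planning cares most near 0.9

# The model class (a cubic polynomial) is too simple to fit everything, so the
# optimizer has to choose where to be wrong.
uniform_fit = np.polyfit(states, targets, deg=3)
weighted_fit = np.polyfit(states, targets, deg=3, w=np.sqrt(importance) + 1e-3)

near = np.abs(states - 0.9) < 0.1   # the region where errors would change decisions
for name, coeffs in [("uniform loss ", uniform_fit), ("weighted loss", weighted_fit)]:
    err = np.mean((np.polyval(coeffs, states[near]) - targets[near]) ** 2)
    # The weighted fit should usually be more accurate in the region that matters,
    # at the cost of being more wrong elsewhere.
    print(name, "error in the region that matters:", round(float(err), 4))
```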
Another thing is the other side, right? It's like, oh, say we're going to plan. So we're
going to assume that we have some wrong model of the world. How do we use the model in a way that we get good advice from it
based on stuff that it knows and can reliably predict, but we are not totally ruined by the
stuff that it doesn't know or the errors that it has? So can we detect what it's wrong about? And can we find ways to use it
selectively so that we get the juice but not the rind? This sounds, I mean, going back to what I
do know about, it sounds like Kalman filters, where they can track error estimates of their inputs and provide error estimates of their outputs.
Is that a reasonable comparison or is it totally different?
Yeah, I think it's reasonable. So Kalman filters, well, one thing I should say is that Kalman
filters are primarily for the filtering problem, which is a little bit distinct from what we need
in this application. So like filtering is like, oh, here's what's
happened to me so far. What is the world like right now? Where am I, for instance? Whereas
for planning, we need to imagine the future. So here's what's happened so far. What's the
world gonna be like if I behave like this for a few steps? So the other thing I
will say is that Kalman filters are measuring some level of uncertainty. And that's helpful.
But there's this interesting sticky wicket that I'm still trying to like tease apart and figure
out what it means, which is that, so say there's uncertainty
in your world, like say there's Gaussian noise in your sensors, or, you know, you're rolling a die
or flipping a coin, like there's some actual uncertainty. So your model would totally
reasonably express some uncertainty, which is what a Kalman filter would do, right? It would say,
the variance is like this. It's very difficult to tell the
difference between just that, like a legitimately random phenomenon, and a situation where your model is wrong.
Oh, this is so true.
Yes. Where it's like, oh, you know, I'm trying to model, there's a
blinking light and I'm trying to model it as Markov, so where I only pay attention to its
current state in order to predict its next state. And it turns out it's not Markov. It's
like, you know, two blinks and then two,
like two on and two off, or two on and two off. So my model is just legit wrong.
But in its wrongness, it's just going to say, oh, it's random, right? 50-50.
And so it's really hard to tell the difference between a coin flip and actual
error. And so that's one of the challenges that we face is we want to know when the model is
wrong, but maybe we still want to let it be right about things that are really uncertain.
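(Editor's note: a concrete version of the blinking-light example. The light follows a deterministic two-on/two-off cycle, but a model that conditions only on the current state can do no better than 50/50, which looks exactly like a fair coin unless you look further back in history. The code and counts are purely illustrative.)

```python
from collections import Counter

pattern = [1, 1, 0, 0] * 25   # two on, two off, repeated: fully deterministic

# One-step ("Markov") statistics: given only the current state, what comes next?
one_step = {0: Counter(), 1: Counter()}
for prev, nxt in zip(pattern, pattern[1:]):
    one_step[prev][nxt] += 1
print("after a 1:", dict(one_step[1]))   # roughly half 1s, half 0s -> looks random
print("after a 0:", dict(one_step[0]))

# Two-step statistics: given the last TWO states, the next state is fully determined.
two_step = {}
for a, b, c in zip(pattern, pattern[1:], pattern[2:]):
    two_step.setdefault((a, b), Counter())[c] += 1
print({k: dict(v) for k, v in two_step.items()})   # each history has one outcome
```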
Okay. I think I'm getting it, but I also think maybe my brain is starting to expand or explode. So I have more questions, but I think
actually we should probably
ask easier questions.
Christopher, do you have any easier questions?
Yes. Is this something that would be
fun to play with for a
somewhat tuned-in person
who's done some machine learning,
like training to play
simple video games, or is this like don't bother, it's too hard.
Leave it to the professionals.
Oh, I think reinforcement learning,
just some basic reinforcement learning stuff
is totally accessible,
especially to someone who's got a little bit
of machine learning experience.
For instance, in my undergraduate AI class,
I mean, of course, there's some lead up to this, but we spend maybe two or three weeks
just like getting the basics of reinforcement learning down. So I think, you know, getting to
a core algorithm like Q-learning is totally within reach for someone who wants to do it. And it is super fun
to just apply it to some simple problems and see it learn to do things. It never stops being
fascinating. If you want to apply it to video games like Atari, then the most successful methods
use deep nets. So it's, again, totally within reach.
There are existing implementations out there that you can just mess with.
But it becomes a lot more computationally intensive to run that stuff.
And not everyone's going to have the patience to train it for days in order to see the results.
Okay.
So don't try to apply it to Animal Crossing to start with.
Oh, don't do that.
Are there any resources for starting to look at that?
I mean, I know there's a big gaming or video game community around reinforcement learning because it's a small-ish problem.
But I don't, are there, where would we get started?
With reinforcement learning in general?
Yeah, if we wanted to do something small and fun.
So like I said, I think Sutton and Barto is a great resource, that Reinforcement Learning: An Introduction.
There are also some really good online lectures out there.
Um, David Silver put out a sort of free set of video lectures about reinforcement learning
that a lot of people like. So those are, I think, both good places to start.
Cool. Are you enjoying your time at Mudd?
I really am. It's a great place.
Cool. Of course, that was the only answer. We would have cut anything else.
Of course, yeah.
In fact, I already said the other thing and you cut it out and put that in.
Erin, do you have any thoughts you'd like to leave us with?
Oh, I don't know.
I think this conversation is just making me revel in how weird and complicated the universe is.
And so, like, let's just go all out and try to understand it.
It's a big order, but we can try.
We can try. That's all we can do.
Our guest has been Erin Talvitie, Associate Professor of Computer Science at Harvey Mudd College.
Thanks, Erin.
Thank you. Thank you.
Thank you to Christopher for producing and co-hosting.
Thank you to Daryl Young for introducing me to Erin.
Also, thank you to our Patreon supporters for Erin's mic.
And of course, thank you for listening.
You can always contact us at show at embedded.fm or hit the contact link on Embedded FM.
And now a thought to leave you with.
We didn't talk about it today, but it's still scary times in the world.
Take care of yourself.
I really mean it.
Embedded is an independently produced radio show that focuses on the many aspects of engineering.
It is a production of Logical
Elegance, an embedded software consulting company in California. If there are advertisements in the
show, we did not put them there and do not receive money from them. At this time, our sponsors are
Logical Elegance and listeners like you.