ACM ByteCast - Andrew Barto and Richard Sutton - Episode 80
Episode Date: January 14, 2026

In this episode of ACM ByteCast, Rashmi Mohan hosts 2024 ACM A.M. Turing Award laureates Andrew Barto and Richard Sutton. They received the Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning, a computational framework that underpins modern AI systems such as AlphaGo and ChatGPT. Barto is Professor Emeritus in the Department of Information and Computer Sciences at the University of Massachusetts, Amherst. His honors include the UMass Neurosciences Lifetime Achievement Award, the IJCAI Award for Research Excellence, and the IEEE Neural Network Society Pioneer Award. He is a Fellow of IEEE and AAAS. Sutton is a Professor in Computing Science at the University of Alberta, a Research Scientist at Keen Technologies (an artificial general intelligence company), and Chief Scientific Advisor of the Alberta Machine Intelligence Institute (Amii). In the past he was a Distinguished Research Scientist at DeepMind and served as a Principal Technical Staff Member in the AI Department at the AT&T Shannon Laboratory. His honors include the IJCAI Research Excellence Award, a Lifetime Achievement Award from the Canadian Artificial Intelligence Association, and an Outstanding Achievement in Research Award from the University of Massachusetts at Amherst. Sutton is a Fellow of the Royal Society of London, AAAI, and the Royal Society of Canada. In the interview, Andrew and Richard reflect on their long collaboration and the personal and intellectual paths that led both researchers into CS and reinforcement learning (RL), a field that was once largely neglected. They touch on interdisciplinary explorations across psychology (animal learning), control theory, operations research, and cybernetics, and how these inspired their computational models.
They also explain some of their key contributions to RL, such as temporal difference (TD) learning and how their ideas were validated biologically with observations of dopamine neurons. Barto and Sutton trace their early research to later systems such as TD-Gammon, Q-learning, and AlphaGo and consider the broader relationship between humans and reinforcement learning-based AI, and how theoretical explorations have evolved into impactful applications in games, robotics, and beyond.
Transcript
This is ACM Bytecast, a podcast series from the Association for Computing Machinery,
the world's largest education and scientific computing society.
We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned, and their visions for the future of computing.
I am your host, Rashmi Mohan.
In a world where machines are beginning to learn, not
just from data, but from experience, two scientists quietly wrote the playbook that made it all
possible. They didn't just invent algorithms. They explored one of the oldest questions in both
science and philosophy. How does learning happen? Their work laid the foundation for reinforcement
learning, the idea that intelligence, at its core, emerges from trial, error, and reward.
I'm sitting with two of the most thoughtful minds in AI history, whose ideas have shaped everything from AlphaGo to
ChatGPT. Please help me welcome the 2024 Turing Award winners, who need no other introduction.
Richard Sutton and Andrew Barto, welcome to ACM ByteCast.
Thank you very much.
Thank you.
Wonderful.
So I'd like to lead with a simple question that I ask all my guests.
If you could please introduce yourself and talk about what you currently do, as well as give us some insight into what drew you into this field of work.
I'm Richard Sutton.
I have always done research in reinforcement learning.
I still am a professor at the University of Alberta,
and I'm a research scientist at a startup called Keen Technologies.
I'm still trying to fully understand how learning happens
and how learning can play a role in the larger architecture of a mind.
Maybe I'll stop there.
Right, and I'll take a turn.
I'm Andy Barto. I am a professor emeritus at the University of Massachusetts in Amherst. I've been
retired for about 13 or 14 years. Rich was my first PhD student. We worked together for a number of
years, and I currently try to keep up a little bit with what's going on, but even for young people,
it's impossible to keep up with the pace of development in artificial intelligence. In the old days,
I was current in what was going on, but I have not been able to do that. So I spend my time,
well, doing whatever I want, basically, because I don't have to go to faculty meetings. I don't have
to write proposals. And I'm enjoying being retired. So I have given a few talks lately, and the
title has been, what is so interesting about reinforcement learning? And the reason I came up with
that title is that the basic idea of reinforcement learning is very, very old and common sense,
learning from the consequences of our behavior, of our actions, good or bad, and so on. So the
question is, why is it playing such a role in AI these days? And why did Rich and I get this
award because this is a very old subject, and I guess we sort of rediscovered it. I'm not quite sure
what the correct word would be, but we've created a way of thinking about it computationally.
I think that's kind of the essence of it. Thank you. That's a wonderful summary. And
when you say that you do whatever you want, it sounds like a great place to be. We're all aspiring
to get there. Andy, let me ask you one more question before I go back to Rich. I mean,
if you go back all the way, what drew you into computer science to start with? When you were a student,
what was it that you were aspiring to study and how did you stumble upon or was it a very conscious
decision to pick computer science? Well, I started out as an undergraduate. My father was a
mechanical engineer. And my mother was an artist and a registered nurse. And so I really wanted
to do something that combined art and engineering. So I went into
naval architecture and marine engineering. And I lasted a year in that and became less interested in it
and switched to mathematics. So I basically was raised to be an engineer, but I liked math,
was reasonably good at it. And so my undergraduate degree from University of Michigan was in math.
And along the way, I stumbled on things that really interested me, in particular the idea of
cybernetics, which was introduced back in the 50s, and the idea of system theory, studying societies
or organizations as systems, in particular mathematical systems. So I really was motivated
by looking at systems from a mathematical point of view.
And of course, that led to being interested in what we now call computational neuroscience
because the models that people were creating were mechanisms and systems.
So I got very interested in that.
My master's degree and PhD are in computer science,
but I was atypical as a computer scientist.
I just took two courses in programming.
I never liked to program.
I could never sleep after coding.
But I was more interested in more biologically directed applications of math and computing as well.
But I was not interested in a lot of the typical subjects that computer scientists were doing.
So basically, engineering, I'm kind of an old-school engineer in the way I
think about things. And I've been so fortunate to have so many wonderful students along the way that,
you know, this area of reinforcement learning really came into its own because of the efforts
of a lot of very talented students and colleagues. Fascinating. Thank you for sharing that. I do
want to come back to that. But Rich, I'd ask the same question of you. What brought you into computer science?
Was that a very conscious choice?
Well, I think from as early as I can remember, I've been fascinated by the puzzle of the mind
and how it works and how my mind works, kind of being centered on oneself, introspecting when
you're growing up. What am I? How does it work? How is it that I can see and get a sense of
the world? This is, I think, an obvious intellectual puzzle for everyone as a
try to understand their world.
Then I went to school, of course,
and I heard about computers,
and computers were often described as,
like, you know, giant brains.
So I wanted to find out about them,
and then when I did find out about them,
they were not at all like brains or minds,
because they only would do exactly what you told them to do.
And nothing more, and that could never be a mind, I thought.
But then I thought, again, you know,
maybe if you programmed it the right way,
it could be a mind.
And after all, what could a mind be?
A machine that's processing its input and producing decisions? It couldn't possibly be a machine,
and yet what else could it be? So it was like it has to be true, but it couldn't be true. And I was
always struck by that puzzle. And then I found, you know, various fields that would look more deeply
in psychology and in artificial intelligence. And so when I went to college, I was trying to
figure out, you know, shall I be a brain scientist or shall I be an AI researcher? And I don't know,
maybe I got a disappointing grade in one of my neuroscience courses. And so I decided that, oh,
that wasn't any good. I'm just going to try to do it from the AI side.
That's a, yeah, good, good decision. Obviously, it led to a very fulfilling and successful career.
I'm curious. I know that, Rich, you were a student of Andy's. You collaborated together, of course.
And it sounds like, you know, at the time when you both decided to focus on reinforcement learning, I don't know how that decision came about.
Was it, you know, something that you picked when the rest of the world was kind of not giving it the attention that it deserved?
What made you sort of stay embedded in that field?
And the question is to both of you, whoever wants to go first.
It's a lovely question.
It's very, very appropriate.
I feel that we did, what did Andy say, rediscover the
topic. We had some help. And it's really what we did together. Like when we both went to the,
I guess we did both go to the University of Massachusetts. Andy went a good year or so before me.
When we arrived... I was a postdoc when you arrived. That's right. So we engaged with this puzzle.
Something was missing and we eventually came to call it reinforcement learning. Why don't you explain
what that journey was like, Andy?
Okay.
Well, I began as a postdoc,
and the year was 1977 at the University of Massachusetts,
and I was hired for this project
whose objective was to evaluate
and assess this very unorthodox idea
that neurons, the basic computational units of brains,
were themselves
goal-directed organisms, hedonists, for example, that they actually would learn on the basis of the
consequences of their activity. So it was as if they were little microscopic animals that behaved in
order to achieve reward or to avoid penalty. It was
an idea put forward by someone else. And the challenge of our project was: is this crazy idea worth studying,
either from a scientific point of view or from a technological point of view? And we delved into it.
And it was a wonderful period. I didn't have to teach. I didn't have to go to faculty meetings.
And we really were able to look at the history of these ideas. And we
concluded, I guess it's not surprising, but we concluded that it is interesting and it really
has been neglected. There were a lot of reasons it was neglected. In AI, the idea of a neural network back
then was just, you know, considered to be a dead end and reviled, actually. And from a
neuroscience point of view, I mean, a key idea for synaptic learning in the brain was
Hebb's hypothesis about how synapses change their efficacies, or weights.
And what we were proposing was that neurons are much more complicated than that.
They actually have local memory.
They can remember what they did until some consequences come back through various feedback loops.
And that could take several seconds, perhaps, and not instantaneous.
So it was a wonderful period of exploration.
And it eventually led to mathematical and computational implementations of these ideas,
and then lots of very bright people got involved in it.
So it evolved over the years, and I'm amazed now that it's actually playing a role in current AI.
But I was mostly driven because nobody else was studying this.
So I felt kind of like a pioneer, even though the subject had been discussed intensely over many, many years, but it really wasn't a current subject of exploration.
And so I was motivated by the fact that it was not a common direction of study.
And actually, I guess as I've retired, you know, I'm less motivated because now so many other people are doing it.
So it's not so exciting as it had been in the early days.
That is an incredible insight.
And maybe I'll ask Rich this follow-up question.
I have two follow-up questions, actually.
One is, it sounds like the nature of the work also required interdisciplinary
collaboration with folks who were studying neuroscience or biology or the human mind.
So did you get those opportunities?
Did you seek out those opportunities at that time to kind of further your research?
And the second question is around research itself: the fact that you pursue a topic because, hey, nobody else is looking at this, so I want to pursue it, is that fundamentally the way that a researcher thinks about a problem?
Or is it usually with like, you know, some sort of line of sight into how this can be applied or used or there's a problem to be solved?
Let's take that.
I think it is maybe the way a researcher should think, but it's not the way we usually
do it. We usually have something particular in mind and we're usually within our discipline.
But first: back then, we had this idea that came to us
from this fellow Harry Klopf, who brought the idea to us and set us to work on it.
We were trying to make sense of it. We were trying to find out if it had already been explored.
And we eventually decided it hadn't been previously explored. To decide
a thing like that, you can't just go into one field. You have to go into all the fields that
might have explored it. So it was sort of necessary, just from the topic and the conclusion we
were reaching that it hadn't been studied, that we had to go study many different fields. So many
different fields means like control theory and like operations research, AI, of course,
cybernetics, any mathematics of sequential decision making. Now,
we made the brief statement that it hadn't been explored.
Well, of course, nothing is totally new under the sun,
as Andy taught me and still tells me all the time.
Nothing is totally new under the sun.
There are always precursors.
So we had to find the precursors and put them in the proper place
and then remark and set in prominence
what had been insufficiently studied.
So it was very interdisciplinary.
In particular, I brought the element of psychology in.
So psychology, you know, the animal learning theorists have thought probably more than anybody
about how learning works.
And it was already out of fashion when I was studying.
But it was very relevant and it is very relevant.
And it's clear if you look at animals that animals do learn by trial and error.
Animals and people learn by trial and error, by trying things
and seeing what is most pleasurable and most rewarding, and avoiding pain.
So I think it's a very strong interplay between the psychology and the neuroscience
and the control theory of its various forms and the artificial intelligence.
Got it, thank you.
Andy, did you have something to add?
Let me add to it.
Yeah, so I studied psychology as an undergraduate, but it was not
animal learning theory. It was social psychology, I guess, and I really didn't pursue it because it
didn't really grab me. Since then, I've talked to many psychologists, and Rich came to the project
with a background in psychology, and I'm not a neuroscientist either. I'm not a psychologist or a
neuroscientist; I'm not sure how you would characterize me. But I've been fascinated by mechanisms, I guess.
You know, as a mechanical engineer's son, I always was dealing with motors and pumps and devices,
mechanical devices.
And then eventually, in graduate school, I studied abstract machines, automata theory.
Then, of course, neuroscience is about machines.
And so, you know, that all kind of fit together for me.
I think also I should add that I'm not very competitive.
So getting into a field that nobody else is in,
it relieves that problem of having to compete.
So when things get competitive, I tend to go in a different direction.
Before we leave the subject of the interdisciplinary things,
this might be a good chance to remark on the enormous
impact, or interchange, there's been between the neuroscience and reinforcement learning as a
computational endeavor. There's a striking parallel between some of the reinforcement learning
algorithms and the apparent workings of the brain: the brain reward systems and the computer
reward systems work very, very similarly. I'll actually let Andy speak to this, because he's developed
this more than I have. But these are striking parallels. Well, yeah, so that to me is one of the
most exciting things. Your thesis actually developed an algorithm, or a class of algorithms,
called temporal difference algorithms that are used in computational systems, and then later,
a neuroscientist found data that really fit that algorithm very closely. And this is Wolfram Schultz's
recording from dopamine neurons, from awake-behaving monkeys. And it was an uncanny correspondence.
And when Rich developed that algorithm, those data didn't exist. This was a computational creation,
and it turned out that it matched quite closely to what Schultz was recording when recording
from dopamine cells. And so this has changed the neuroscience of
reward systems. And it's quite exciting that that confluence happened. So that was one of the most
motivating things that I came across in all of those years. I think this correspondence is not
completely unexpected, because the problems that we were looking at, problems of learning,
animals had to deal with those same problems.
And so the solutions, it's not completely far-fetched
that the solutions were somewhat closely related.
So all of that was very exciting.
Yeah, so, yeah, go ahead.
Sorry, you know, I was going to say, I mean,
it's great serendipity that that data fit the model so well.
ACM ByteCast is available on Apple Podcasts,
Google Podcasts, Podbean, Spotify,
Stitcher, and TuneIn. If you're enjoying this episode, please subscribe and leave us a review on your
favorite platform. Before we go further, I was wondering if, for our audience, you could maybe
briefly explain what temporal difference learning is, in simple terms. Why was it such a breakthrough?
That would be for Rich, I think. Temporal difference learning, it's a little bit of a fancy name.
Temporal means time, and a difference over time is just a change.
And so the idea of temporal difference learning is that if you look at your predictions,
you're predicting that something is going to happen, and how does your prediction change over time?
And then you use the change in your prediction as an error.
So if you're playing, I don't know, you're playing chess and you think you're going to win,
and then a little bit later, a few moves later, you no longer think you're
going to win, you can learn from that temporal difference, even though the game isn't over yet.
Maybe you will end up winning, but still you can learn as you go along, and that's a key:
being able to learn without waiting for the final thing. And we showed that in many cases you
can actually learn better if you don't wait for the final result, but just learn from the change in
your predictions. And what they found in the brain is that there's a signal in the brain that
corresponds to the surprise of the prediction.
You know, if you are surprised because you changed your mind,
you thought it was something, one thing would happen,
and then a little bit later you thought it would come out differently.
You can learn from the surprise of that change.
That's the basic idea of temporal difference.
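[Editor's note: the idea Rich describes can be sketched in a few lines of code. The following is an illustrative sketch only, not code from the interview and not Sutton and Barto's own implementation: a tabular TD(0) learner estimating state values on a simple five-state random walk, where each step updates the current prediction from the change to the next prediction rather than waiting for the final reward.]

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, seed=0):
    """Tabular TD(0) prediction on a 5-state random walk.

    States 0..4 form a chain; from the middle state the walker moves
    left or right at random. Falling off the left end yields reward 0,
    off the right end reward 1. Each step moves the value estimate
    V(s) toward r + V(s'), i.e. it learns from the *change* in its own
    prediction instead of waiting for the episode's final outcome.
    """
    rng = random.Random(seed)
    V = [0.5] * 5                      # initial value guesses
    for _ in range(episodes):
        s = 2                          # every walk starts in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:             # terminated left: reward 0
                target = 0.0
            elif s_next > 4:           # terminated right: reward 1
                target = 1.0
            else:                      # bootstrap from the next prediction
                target = V[s_next]
            V[s] += alpha * (target - V[s])   # TD error drives the update
            if s_next < 0 or s_next > 4:
                break
            s = s_next
    return V
```

With these (arbitrary) settings the estimates end up close to the true values 1/6 through 5/6 from left to right, even though no single episode ever tells the learner those numbers directly.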
Let me add something.
So I tend to think of that method.
It's an error correction method.
So in some sense, it's not per se reinforcement learning;
it's a component of a reinforcement learning system.
It's not learning from consequences, pluses or minuses.
It's learning from errors, and that's a supervised process.
But it works together with a reinforcement learning system,
and it's turned out to be quite effective in a number of applications.
Yeah, so one way to express that, a simple way to understand it,
is that life and a mind involve both prediction and control.
Maybe the ultimate thing is we're interested in control.
We want to do things that have good outcomes.
But in order to have good control, you have to make good predictions; they help you to control well.
So this was also, I think maybe you can speak to this.
Rich was motivated also by animal learning, the idea of a secondary reinforcement system.
Is that correct?
Yeah, that's exactly right.
You can kind of almost see the algorithms in the animal learning ideas from way back.
And I think it's a very intuitive thing.
It's like, let's say you're driving your car and you do something and then you almost have a crash.
But you don't.
Now, if you can learn from your predictions, you can say, oh, I did something and then I thought maybe it was going to crash and then I got out of it.
And so you can learn two things.
You can learn not to do what got you into that fearful state,
and you can learn to reinforce and repeat what got you out of it.
But if you're waiting for the final outcome,
the final outcome is, well, you didn't have a crash.
So maybe it was just fine, you know?
This is a very common-sensical idea, really.
I mean, that's actually a great example.
Going back to the chess example, Rich, you know,
I understand what you mean by saying with every move that you make,
you're learning something, you possibly made an error, and you know, you're correcting your
path. So the path that you take could vary based on what you've learned so far. How do you measure
the efficacy of learning? Is it that, hey, this first chess game, I played, I lost, but I learned
an amount, and then the next chess game, I'm doing slightly better. Like, what is that measure to say
that that person is actually learning and learning more and getting better outcomes?
Well, we learn all the time. I think it's sort of obvious, although modern AI practice doesn't work that way. Modern AI practice often separates learning from behavior. But I think it's clear that animals and people, we learn all the time. And it doesn't mean that, you know, it's monotonic improvement. Different things can happen. And you can't reduce it to a single number, like how much you know at a particular moment in time. You
can just keep learning and hope that you are getting, you know, on average and long run
better and better.
Got it.
That's a fair point, especially with learning, because there's no sort of end to it.
You continue to learn every day that you live.
So there's definitely that aspect.
Going back to one other point, is there, say, an insight from neuroscience or biology or
by observing animals that we as computer scientists underestimate today?
I think so.
Yeah, I agree.
I think we've just scratched the surface of what animals are able to do.
Is that work that's actively happening, do you think, in the research community?
I think it's strange how little direct overlap there is between the computational work
and the psychological and neuroscience work.
Okay. Looking back at your research career, was there a single moment, like an aha moment, or a surprising discovery that you made that stands out for you?
Hmm. Thinking back. Well, this connection between TD learning and the dopamine system, but I think I had retired by that time. I don't remember. So it wasn't really motivating for me as an
active researcher. But in those early days, I think what was interesting to me was finding that
there was a lot of confusion, thinking that error correction is the same as reinforcement learning:
learning from errors, which is supervised learning, which is the most prevalent sort of machine
learning being used, equating that with reinforcement learning. And even, you know,
renowned researchers, I'd seen that mistake. And so trying to explain why this is a mistake and what is
the correct way of thinking about these things was, you know, motivating for me. Maybe not the most outstanding;
I'm not sure I can put my finger on one. It was a long period of time. I think just getting a new
grant was a great moment. We got several new grants,
mostly from the National Science Foundation.
It's basic research.
Yeah, but what were the most outstanding moments?
I can't say; it was all so long ago.
I think our work was slow and we were gradually figuring out
and it was just, it was mostly a deepening of understanding.
Things would fall into place, and we'd say, yeah, that's what's going on,
or it's only this and nothing else.
Perhaps that's right,
and then that would get confirmed over time
as we didn't find other things.
Now, there were events that were done mainly by others
where, like, we all know about AlphaGo.
That was so striking.
That was a big step.
Well, I think before that was the backgammon system
that Gerald Tesauro did at IBM.
Yeah, that was very impressive.
I'm tending to think of things that other people have done.
You know, even Q-learning was just very nice.
It was consistent with what we were doing, but it was a nice development.
I think it also aligns with the spirit of what you do.
I mean, learning happens over a period of time,
and that also explains why there is so much longevity
in the relevance of the research even today,
which leads me to my next question,
which is, is there a modern application of reinforcement learning
that you think
sort of best represents
the spirit of your early research?
It would be like AlphaZero.
AlphaZero was, you know,
a game learning system
that would play any of a wide variety
of two-player games,
and it didn't require much help
other than knowledge of the rules of the game.
Well, I tend to think of robot applications.
Mm, that's good.
Now we're beginning to see
very impressive
movement abilities of often humanoid robots, but other kinds of robots too. And I think in many cases,
they're using reinforcement learning to adjust the movement algorithms that are implemented by robots.
I've always been interested in motor control and how the brain does that. And I think reinforcement
learning is involved in how we learn to move. And it is being used by a,
people developing robots.
That's a really good one.
Yeah, that's pretty fascinating.
Do you have any thoughts on how you imagine the relationship between, you know,
humans and reinforcement learning-based systems or products or robots, for example,
will evolve, say, over the next 20 years?
Any thoughts around that at all?
We each have thoughts, maybe a bit different.
I think it's, you know, it will be a fulfillment of the
eternal striving of people to understand themselves and to make themselves work better.
And so this is what will happen.
This will continue, and in the next 20 years, within that time frame, there's a very good chance
that we will understand our minds to be able to recreate the basic functions.
And I just find it so exciting.
It also makes people concerned or worried, but, I don't know, I'm sitting here
having just brought up that understanding is good,
and if we understand our minds,
that will be a good thing.
Some people might do bad things with it,
but it's basically a good thing
if we understand our minds.
It will cause us to have to rethink a lot of things,
who we are, what we can do,
what we want to do.
And I think this could be very exciting.
So my views are a little bit different.
You know, there are concerns about dangers
of applying some of these algorithms.
And I guess for me, the dangers of reinforcement learning go way, way back.
They've been talked about with regard to algorithms that optimize.
So there is a concern that if you ask a system to maximize some measure,
you don't know a priori what it's going to come up with.
And so you have to do it with care.
But on the other hand, optimization has been extremely beneficial
for lots and lots of applications.
And it just needs to be,
if you release an agent,
a self-improving agent that uses reinforcement learning,
you need to have constraints
that prevent it from doing things you don't want it to do.
So that's a concern.
And I know people are working to deal with these issues,
but then in terms of what's going to happen,
I have no idea.
I mean, the field is moving so quickly
and I'm not connected to it in an intimate way,
but I do have concerns that good engineering practices
are not necessarily being followed.
That's a whole other subject.
I certainly agree with Rich about understanding the mind, the brain.
I mean, that's a useful thing,
but not everyone is headed toward that objective.
Yeah, no, I think it's a fair point.
I think having those guardrails in place
to ensure that we're using
or building products and technology that are for the general benefit of society.
But you're right.
I don't know if there is a body out there that is setting policy and then there is adherence to it.
So that's certainly something for all the researchers and practitioners out there to think about.
What would you tell a young researcher maybe who's starting out in AI,
like how should they be thinking about, you know, failure, persistence, curiosity and, you know, safety,
ethics, etc.? Do you have specific thoughts? Like, what guidance would you give a young researcher in this field?
Well, let me tackle that. I think in science, and actually especially in AI research, fashion is really quite
prevalent. So currently now large language models are very fashionable. Lots and lots of people
pouring effort into that. But I've seen fashions arise and decay over the years in artificial
intelligence. And so I advise students not to follow the fashions, but to follow their passion,
their interests. I mean, we hear from Rich that he's been interested in understanding the mind
better. I think intrinsic interests should be driving one's choices of what to do rather than
fashion. I'm not a fashionable guy.
Good advice, though.
So I agree completely.
And I like to think that the most important things a young person will ever do
will be things that are really almost obvious to them now,
but they don't realize that other people don't see them.
That's how I felt about reinforcement learning.
Obviously, you know, animals have to try things and do what feels good
and avoid what feels bad.
Such an obvious thing, and yet it was not a field,
and maybe our contribution was just to recognize that obvious thing and develop it.
And so I think that's a general strategy.
You can always look for the obvious thing and develop it.
The obvious thing that people are not paying attention to.
So like what did Charles Darwin did?
He just looked around and said, oh, look, basically people are just animals.
And I mean, which is obvious if you look at them; we all have, you know, hearts and lungs and skin and muscles.
So we are obviously animals, and we evolved from them, and things like that.
So you want to see the obvious that is not recognized in your field,
that other people aren't seeing, and stick to it.
This makes me think of, I mean, Rich has heard me talk about this,
but if you look at the history of machine learning,
even before computers existed in the 30s,
they were building these electromechanical systems
that learned using reinforcement learning.
And in the early days of implementing learning systems on computers, they were reinforcement learning systems.
And then at a certain point, it switched to doing what we now call supervised learning.
And there are lots of reasons for that.
But one was that the cognitive revolution in psychology kind of made this basic type of learning really not something that was worthy of study anymore.
So I think the history of psychology had a lot to do with extinguishing computational efforts
to implement reinforcement learning systems.
So it's fashions, again.
Fashions have a very big effect.
And so like in psychology, animal learning theory was fashionable for some years.
And then, like in the 60s and the 70s, it went out of fashion.
Reinforcement learning has varied in fashions.
Neural networks have tremendously varied up and down at least three times.
You have to look past that.
And so, learning from experience, you know, that's the quickest way to understand
what reinforcement learning is about: a system that can just interact with the world and learn
from that without getting instructions from a teacher, but just getting experiences from the world,
things that feel good and things that feel bad, and learning from them.
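[Editor's note: the trial-and-error loop Sutton describes here can be sketched in a few lines of code. This is a toy illustration only, a multi-armed bandit with invented reward probabilities and an epsilon-greedy rule, not anything discussed in the episode:]

```python
import random

# Toy sketch of learning from experience: an epsilon-greedy bandit.
# The agent tries actions, observes rewards ("feels good" / "feels bad"),
# and drifts toward the actions that worked. The reward probabilities
# below are made up for the example.

def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(true_probs)
    values = [0.0] * n   # estimated value of each action
    counts = [0] * n     # how often each action was tried
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: values[i])
        # The "world" responds: reward 1 with the arm's true probability.
        reward = 1.0 if rng.random() < true_probs[a] else 0.0
        counts[a] += 1
        # Incremental average: update the estimate from this one experience.
        values[a] += (reward - values[a]) / counts[a]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", values)
```

No teacher ever tells the agent which arm is correct; it discovers the best one purely from the rewards it experiences, which is the distinction from supervised learning that Barto and Sutton draw above.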
And so this is an obvious idea.
The first mention of it I like to cite was Alan Turing in 1947,
before there even was a field of artificial intelligence.
He gave a talk and spoke the words that what we really want is a machine that can learn from experience.
So that was at the very beginning, and then, you know, Andy and I kind of reawakened it around 1980.
And then it's, you know, things swing forward and back.
Right now, reinforcement learning is only sort of half in favor because the most important things are large language models, and they don't learn from experience. They don't really have experience.
A lot of people are using reinforcement learning to improve large language models.
Yeah, they're using it inside it, and very often it's not like an evaluation. It's like they will do things and then a person will say that's good.
So it's not the world telling you that had a good outcome.
It's just they require a person to be there.
Well, that's one form.
There are others that, well... no.
You're right.
I mean, they're doing lots and lots of different things.
But one thing they're not doing is they're not learning from the experience
of answering questions and interacting with people.
They're not learning from their ordinary life.
They learn from a pre-training phase and from a fine-tuning phase,
and then when the systems are deployed and interacting with people,
they're no longer learning.
So they really do not learn from experience.
And so that feels a little bit of a shame to me,
and I think that will be fixed over the years.
Yeah, so I mean, if you want to look for something
that is obvious and needs to be done,
you know, it's still learning from experience
because today's most refined AI systems
do not learn from their experience.
Thank you.
Yeah, that sounds,
I mean, that's super insightful. I think going back to what both of you said, at the crux of it was, you know, to follow your interest, follow your passion, not necessarily just the flavor of the day. And I actually spent some amount of time listening to your interviews in preparation for this conversation. And so many of your students and colleagues speaking about the time that you've spent with them, helping them understand concepts, uncover the next question that they should pursue, just speaks volumes of the kind of influence
and impact that you've had on so many young researchers.
So I would just want to ask you sort of one last question for our final bite,
which is who are Andy and Rich when they're not computer scientists?
Tell us one thing that you've not shared before.
Well, I haven't been a computer scientist for quite a few years.
Carpenter. Carpenter.
I'm still a mechanical engineer, I guess.
I'm not officially an engineer, but I like mechanical things.
And carpentry, I collect power tools and try not to injure myself with them.
And otherwise, I read all the time.
I read novels.
I am trying to understand some things about physics that I've never understood.
I find that nobody understands them, so it's not such a problem.
I guess now that I'm retired, I'm not constrained and I'm not doing forced reading.
I make my own choices as to what I read.
And I can download books.
It's very easy.
But besides being a computer scientist, I had two older sisters.
They didn't have PhDs, but they were both doing computing.
So I think there's a genetic component in me and my sisters that somehow makes dealing with computing
interesting. So I think I'm a slave to my genetics, I guess. I like your answers, Andy. And I feel a little bit
like that myself. I try to pursue broader studies. You know, maybe I like to think I'm a little bit
of a philosopher, because I like all those questions, although I don't think I've
memorized enough of the names to be a proper philosopher. But I like the ideas. And I toy with that. And sometimes
I do speak about it because what we're doing is a great thing of understanding the mind and that has so many
dimensions and impacts so many different parts of philosophy. We're all trying to understand the
world and find simple ways to communicate it to others and think about it. And I try to get that from
the books I read, whether they're physicists like David Deutsch or science fiction writers like Iain Banks.
It's all trying to understand our place in the universe.
Yeah, it's been an absolute pleasure talking to both of you.
Thank you so much for taking the time to speak with us at ACM ByteCast.
Well, thank you very much.
Thank you. It's been a pleasure.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioner Board.
To learn more about ACM and its activities, visit acm.org.
For more information about this and other episodes, please visit our website at learning.acm.org
slash B-Y-T-E-C-A-S-T.
That's learning.acm.org slash bytecast.
This podcast was edited by Resonate Recordings.
