ACM ByteCast - Andrew Barto and Richard Sutton - Episode 80
Episode Date: January 14, 2026

In this episode of ACM ByteCast, Rashmi Mohan hosts 2024 ACM A.M. Turing Award laureates Andrew Barto and Richard Sutton. They received the Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning, a computational framework that underpins modern AI systems such as AlphaGo and ChatGPT. Barto is Professor Emeritus in the Department of Information and Computer Sciences at the University of Massachusetts, Amherst. His honors include the UMass Neurosciences Lifetime Achievement Award, the IJCAI Award for Research Excellence, and the IEEE Neural Network Society Pioneer Award. He is a Fellow of IEEE and AAAS. Sutton is a Professor in Computing Science at the University of Alberta, a Research Scientist at Keen Technologies (an artificial general intelligence company), and Chief Scientific Advisor of the Alberta Machine Intelligence Institute (Amii). In the past he was a Distinguished Research Scientist at DeepMind and served as a Principal Technical Staff Member in the AI Department at the AT&T Shannon Laboratory. His honors include the IJCAI Research Excellence Award, a Lifetime Achievement Award from the Canadian Artificial Intelligence Association, and an Outstanding Achievement in Research Award from the University of Massachusetts at Amherst. Sutton is a Fellow of the Royal Society of London, AAAI, and the Royal Society of Canada. In the interview, Andrew and Richard reflect on their long collaboration and the personal and intellectual paths that led both researchers into CS and reinforcement learning (RL), a field that was once largely neglected. They touch on interdisciplinary explorations across psychology (animal learning), control theory, operations research, and cybernetics, and how these inspired their computational models.
They also explain some of their key contributions to RL, such as temporal difference (TD) learning and how their ideas were validated biologically with observations of dopamine neurons. Barto and Sutton trace their early research to later systems such as TD-Gammon, Q-learning, and AlphaGo and consider the broader relationship between humans and reinforcement learning-based AI, and how theoretical explorations have evolved into impactful applications in games, robotics, and beyond.
Transcript
This is ACM Bytecast, a podcast series from the Association for Computing Machinery,
the world's largest education and scientific computing society.
We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice.
They share their experiences, the lessons they've learned, and their visions for the future of computing.
I am your host, Rashmi Mohan.
In a world where machines are beginning to learn, not
just from data, but from experience, two scientists quietly wrote the playbook that made it all
possible. They didn't just invent algorithms. They explored one of the oldest questions in both
science and philosophy. How does learning happen? Their work laid the foundation for reinforcement
learning, the idea that intelligence, at its core, emerges from trial, error, and reward.
I'm sitting with two of the most thoughtful minds in AI history, whose ideas have shaped everything from AlphaGo to
ChatGPT. Please help me welcome the 2024 Turing Award winners, who need no other introduction.
Richard Sutton and Andrew Barto, welcome to ACM ByteCast.
Thank you very much.
Thank you.
Wonderful.
So I'd like to lead with a simple question that I ask all my guests.
If you could please introduce yourself and talk about what you currently do, as well as give us some insight into what drew you into this field of work.
I'm Richard Sutton.
I have always done research in reinforcement learning.
I still am a professor at the University of Alberta,
and I'm a research scientist at a startup called Keen Technologies.
I'm still trying to fully understand how learning happens
and how learning can play a role in the larger architecture of a mind.
Maybe I'll stop there.
Right, and I'll take a turn.
I'm Andy Barto. I am a professor emeritus at the University of Massachusetts in Amherst. I've been
retired for about 13 or 14 years. Rich was my first PhD student. We worked together for a number of
years, and I currently try to keep up a little bit with what's going on, but even for young people,
it's impossible to keep up with the pace of development in artificial intelligence. In the old days,
I was current in what was going on, but I have not been able to do that. So I spend my time,
well, doing whatever I want, basically, because I don't have to go to faculty meetings. I don't have
to write proposals. And I'm enjoying being retired. So I have given a few talks lately, and the
title has been, what is so interesting about reinforcement learning? And the reason I came up with
that title is that the basic idea of reinforcement learning is very, very old and common sense,
learning from the consequences of our behavior, of our actions, good or bad, and so on. So the
question is, why is it playing such a role in AI these days? And why did Rich and I get this
award because this is a very old subject, and I guess we sort of rediscovered it. I'm not quite sure
what the correct word would be, but we've created a way of thinking about it computationally.
I think that's kind of the essence of it. Thank you. That's a wonderful summary. And
when you say that you do whatever you want, it sounds like a great place to be. We're all aspiring
to get there. Andy, let me ask you one more question before I go back to Rich. I mean,
if you go back all the way, what drew you into computer science to start with? When you were a student,
what was it that you were aspiring to study and how did you stumble upon or was it a very conscious
decision to pick computer science? Well, I started out as an undergraduate. My father was a
mechanical engineer. And my mother was an artist and a registered nurse. And so I really wanted
to do something that combined art and engineering. So I went into
naval architecture and marine engineering. And I lasted a year in that and became less interested in it
and switched to mathematics. So I basically was raised to be an engineer, but I liked math,
was reasonably good at it. And so my undergraduate degree from University of Michigan was in math.
And along the way, I stumbled on things that really interested me, in particular the idea of
cybernetics, which was introduced back in the 50s, and the idea of system theory, studying societies
or organizations as systems, in particular mathematical systems. So I really was motivated
by looking at systems from a mathematical point of view.
And of course, that led to being interested in what we now call computational neuroscience
because the models that people were creating were mechanisms and systems.
So I got very interested in that.
My master's degree and PhD are in computer science,
but I was atypical as a computer scientist.
I just took two courses in programming.
I never liked to program.
I could never sleep after coding.
But I was more interested in more biologically directed applications of math and computing as well.
But I was not interested in a lot of the typical subjects that computer scientists were doing.
So basically, engineering, I'm kind of an old-school engineer in the way I
think about things. And I've been so fortunate to have so many wonderful students along the way that,
you know, this area of reinforcement learning really came into its own because of the efforts
of a lot of very talented students and colleagues. Fascinating. Thank you for sharing that. I do
want to come back to that. But Rich, I'd ask the same question of you. What brought you into computer science?
Was that a very conscious choice?
Well, I think from as early as I can remember, I've been fascinated by the puzzle of the mind
and how it works and how my mind works, kind of being centered on oneself, introspecting when
you're growing up. What am I? How does it work? How is it that I can see and get a sense of
the world? This is, I think, an obvious intellectual puzzle for everyone as a
try to understand their world.
Then I went to school, of course,
and I heard about computers,
and computers were often described as,
like, you know, giant brains.
So I wanted to find out about them,
and then when I did find out about them,
they were not at all like brains or minds,
because they only would do exactly what you told them to do.
And nothing more, and that could never be a mind, I thought.
But then I thought, again, you know,
maybe if you programmed it the right way,
it could be a mind.
And after all, what could a mind be?
A machine that's processing its input and producing decisions? It couldn't possibly be a machine,
and yet what else could it be? So it was like it has to be true, but it couldn't be true. And I was
always struck by that puzzle. And then I found, you know, various fields that would look more deeply
in psychology and in artificial intelligence. And so when I went to college, I was trying to
figure out, you know, shall I be a brain scientist or shall I be an AI researcher? And I don't know,
maybe I got a disappointing grade in one of my neuroscience courses. And so I decided that, oh,
that wasn't any good. I'm just going to try to do it from the AI side.
That's a, yeah, good, good decision. Obviously, it led to a very fulfilling and successful career.
I'm curious. I know that, Rich, you were a student of Andy's. You collaborated together, of course.
And it sounds like, you know, at the time when you both decided to focus on reinforcement learning, I don't know how that decision came about.
Was it, you know, something that you picked when the rest of the world was kind of not giving it the attention that it deserved?
What made you sort of stay embedded in that field?
And the question is to both of you, whoever wants to go first.
It's a lovely question.
It's very, very appropriate.
I feel that we did, what did Andy say, rediscover the
topic. We had some help. And it's really what we did together. Like when we both went to the,
I guess we did both go to the University of Massachusetts. Andy went a good year or so before me.
When we arrived... I was a postdoc when you arrived. That's right. So we engaged with this puzzle.
Something was missing and we eventually came to call it reinforcement learning. Why don't you explain
what that journey was like, Andy?
Okay.
Well, I began as a postdoc,
and the year was 1977 at the University of Massachusetts,
and I was hired for this project
whose objective was to evaluate
and assess this very unorthodox idea
that neurons, the basic computational units of brains,
were themselves
goal-directed organisms, hedonists, for example, that they actually would learn on the basis of the
consequences of their activity. So it was as if they were little microscopic animals that behaved in
order to achieve reward or to avoid penalty. It was
an idea put forward by someone else. And the challenge of our project was: is this crazy idea worth studying,
either from a scientific point of view or from a technological point of view? And we delved into it.
And it was a wonderful period. I didn't have to teach. I didn't have to go to faculty meetings.
And we really were able to look at the history of these ideas. And we
concluded, I guess it's not surprising, but we concluded that it is interesting and it really
has been neglected. There were a lot of reasons it was neglected. In AI, the idea of a neural network back
then was just, you know, considered to be a dead end and reviled, actually. And from a
neuroscience point of view, I mean, a key idea for synaptic learning in the brain was
Hebb's hypothesis about how synapses change their efficacies, or weights.
And what we were proposing was that neurons are much more complicated than that.
They actually have local memory.
They can remember what they did until some consequences come back through various feedback loops.
And that could take several seconds, perhaps, and not instantaneous.
So it was a wonderful period of exploration.
And it eventually led to mathematical and computational implementations of these ideas,
and then lots of very bright people got involved in it.
So it evolved over the years, and I'm amazed now that it's actually playing a role in current AI.
But I was mostly driven because nobody else was studying this.
So I felt kind of like a pioneer, even though the subject had been discussed intensely over many, many years, but it really wasn't a current subject of exploration.
And so I was motivated by the fact that it was not a common direction of study.
And actually, I guess as I've retired, you know, I'm less motivated because now so many other people are doing it.
So it's not so exciting as it had been in the early days.
That is an incredible insight.
And maybe I'll ask Rich this follow-up question.
I have two follow-up questions, actually.
One is, it sounds like the nature of the work also required interdisciplinary
collaboration with folks who were studying neuroscience or biology or the human mind.
So did you get those opportunities?
Did you seek out those opportunities at that time to kind of further your research?
And the second question is around research itself: the fact that you pursue a topic because, hey, nobody else is looking at this, so I want to pursue it, is that fundamentally the way that a researcher thinks about a problem?
Or is it usually with like, you know, some sort of line of sight into how this can be applied or used or there's a problem to be solved?
Let's take that.
I think it is maybe the way a researcher should think, but it's not the way we usually
do it. We usually have something particular in mind and we're usually within our discipline.
But first: back then, we had this idea that came to us
from this fellow Harry Klopf, who brought the idea to us and set us to work on it.
We were trying to make sense of it. We were trying to find out if it had already been explored.
And we eventually decided it hadn't been previously explored. To decide
a thing like that, you can't just go into one field. You have to go into all the fields that
might have explored it. So it was sort of necessary, just from the topic and the conclusion we
were reaching that it hadn't been studied, that we had to go study many different fields. So many
different fields means like control theory and like operations research, AI, of course,
cybernetics, any mathematics of sequential decision making. Now,
we made the brief statement that it hadn't been explored.
Well, of course, nothing is totally new under the sun,
as Andy taught me and still tells me all the time.
Nothing is totally new under the sun.
There are always precursors.
So we had to find the precursors and put them in the proper place
and then remark and set in prominence
what had been insufficiently studied.
So it was very interdisciplinary.
In particular, I brought the element of psychology in.
So psychology, you know, the animal learning theorists have thought probably more than anybody
about how learning works.
And it was already out of fashion when I was studying.
But it was very relevant and it is very relevant.
And it's clear if you look at animals that animals do learn by trial and error.
Animals and people learn by trial and error, by trying things
and seeing what is most pleasurable and most rewarding, and avoiding pain.
So I think it's a very strong interplay between the psychology and the neuroscience
and the control theory of its various forms and the artificial intelligence.
Got it, thank you.
Andy, did you have something to add?
Let me add to it.
Yeah, so I studied psychology as an undergraduate, but it was not
animal learning theory. It was social psychology, I guess, and I really didn't pursue it because it
didn't really grab me. Since then, I've talked to many psychologists, and Rich came to the project
with a background in psychology, and I'm not a neuroscientist either. I'm not a psychologist or a
neuroscientist; I'm not sure how you would characterize me. But I've been fascinated by mechanisms, I guess.
You know, as a mechanical engineer's son, I always was dealing with motors and pumps and devices,
mechanical devices.
And then eventually, in graduate school, I studied abstract machines, automata theory.
Then, of course, neuroscience is about machines.
And so, you know, that all kind of fit together for me.
I think also I should add that I'm not very competitive.
So getting into a field that nobody else is in,
it relieves that problem of having to compete.
So when things get competitive, I tend to go in a different direction.
Before we leave the subject of the interdisciplinary things,
this might be a good chance to remark on the enormous
impact, or interchange, there's been between the neuroscience and reinforcement learning as a
computational endeavor. There's a striking parallel between some of the reinforcement learning
algorithms and the apparent workings of the brain: the brain reward systems and the computer
reward systems work very, very similarly. I'll actually let Andy speak to this, because he's developed
this more than I have. But these are striking parallels. Well, yeah, so that to me is one of the
most exciting things. Your thesis actually developed an algorithm, or a class of algorithms,
called temporal difference algorithms that are used in computational systems, and then later,
a neuroscientist found data that really fit that algorithm very closely. And this is Wolfram Schultz's
recording from dopamine neurons, from awake-behaving monkeys. And it was an uncanny correspondence.
And when Rich developed that algorithm, those data didn't exist. This was a computational creation,
and it turned out that it matched quite closely to what Schultz was recording when recording
from dopamine cells. And so this has changed the neuroscience of
reward systems. And it's quite exciting that that confluence happened. So that was one of the most
motivating things that I came across in all of those years. I think this correspondence is not
completely unexpected, because the problems that we were looking at, problems of learning,
animals had to deal with those same problems.
And so the solutions, it's not completely far-fetched
that the solutions were somewhat closely related.
So all of that was very exciting.
Yeah, so, yeah, go ahead.
Sorry, you know, I was going to say, I mean,
it's great serendipity that that data fit the model so well.
ACM ByteCast is available on Apple Podcasts,
Google Podcasts, Podbean, Spotify,
Stitcher, and TuneIn. If you're enjoying this episode, please subscribe and leave us a review on your
favorite platform. Before we go further, I was wondering if, for our audience, you could maybe
briefly explain what temporal difference learning is, in simple terms. Why was it such a breakthrough?
That would be for Rich, I think. Temporal difference learning, it's a little bit of a fancy name.
Temporal means time, and a difference over time is just a change.
And so the idea of temporal difference learning is that if you look at your predictions,
you're predicting that something is going to happen, and how does your prediction change over time?
And then you use the change in your prediction as an error.
So if you're playing, I don't know, you're playing chess and you think you're going to win,
and then a little bit later, a few moves later, you no longer think you're
going to win, you can learn from that temporal difference, even though the game isn't over yet.
Maybe you will end up winning, but still you can learn as you go along, and that's a key:
being able to learn without waiting for the final thing. And we showed that in many cases you
can actually learn better if you don't wait for the final result, but just learn from the change in
your predictions. And what they found in the brain is that there's a signal in the brain that
corresponds to the surprise of the prediction.
You know, if you are surprised because you changed your mind,
you thought it was something, one thing would happen,
and then a little bit later you thought it would come out differently.
You can learn from the surprise of that change.
That's the basic idea of temporal difference.
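[Editor's note: the idea Rich describes can be sketched in a few lines of code. The following is an illustrative sketch only, not code from the interview and not Sutton and Barto's own implementation: a tabular TD(0) learner estimating state values on a simple five-state random walk, where each step updates the current prediction from the change to the next prediction rather than waiting for the final reward.]

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, seed=0):
    """Tabular TD(0) prediction on a 5-state random walk.

    States 0..4 form a chain; from the middle state the walker moves
    left or right at random. Falling off the left end yields reward 0,
    off the right end reward 1. Each step moves the value estimate
    V(s) toward r + V(s'), i.e. it learns from the *change* in its own
    prediction instead of waiting for the episode's final outcome.
    """
    rng = random.Random(seed)
    V = [0.5] * 5                      # initial value guesses
    for _ in range(episodes):
        s = 2                          # every walk starts in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:             # terminated left: reward 0
                target = 0.0
            elif s_next > 4:           # terminated right: reward 1
                target = 1.0
            else:                      # bootstrap from the next prediction
                target = V[s_next]
            V[s] += alpha * (target - V[s])   # TD error drives the update
            if s_next < 0 or s_next > 4:
                break
            s = s_next
    return V
```

With these (arbitrary) settings the estimates end up close to the true values 1/6 through 5/6 from left to right, even though no single episode ever tells the learner those numbers directly.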
Let me add something.
So I tend to think of that method.
It's an error correction method.
So in some sense, it's not per se reinforcement learning;
it's a component of a reinforcement learning system.
It's not learning from consequences, pluses or minuses.
It's learning from errors, and that's a supervised process.
But it works together with a reinforcement learning system,
and it's turned out to be quite effective in a number of applications.
Yeah, so one way to express that, a simple way to understand it,
is that life and a mind involve both prediction and control.
Maybe the ultimate thing is we're interested in control.
We want to do things that have good outcomes.
But in order to have good control, you have to make good predictions; they help you to control well.
So this was also, I think maybe you can speak to this.
Rich was motivated also by animal learning, the idea of a secondary reinforcement system.
Is that correct?
Yeah, that's exactly right.
You can kind of almost see the algorithms in the animal learning ideas from way back.
And I think it's a very intuitive thing.
It's like, let's say you're driving your car and you do something and then you almost have a crash.
But you don't.
Now, if you can learn from your predictions, you can say, oh, I did something and then I thought maybe it was going to crash and then I got out of it.
And so you can learn two things.
You can learn not to do what got you into that fearful state,
and you can learn to reinforce and repeat what got you out of it.
But if you're waiting for the final outcome,
the final outcome is, well, you didn't have a crash.
So maybe it was just fine, you know?
This is a very common-sensical idea, really.
I mean, that's actually a great example.
Going back to the chess example, Rich, you know,
I understand what you mean by saying with every move that you make,
you're learning something, you possibly made an error, and you know, you're correcting your
path. So the path that you take could vary based on what you've learned so far. How do you measure
the efficacy of learning? Is it that, hey, this first chess game, I played, I lost, but I learned
an amount, and then the next chess game, I'm doing slightly better. Like, what is that measure to say
that that person is actually learning and learning more and getting better outcomes?
Well, we learn all the time. I think it's sort of obvious, although modern AI practice doesn't work that way. Modern AI practice often separates learning from behavior. But I think it's clear that animals and people, we learn all the time. And it doesn't mean that, you know, it's monotonic improvement. Different things can happen. And you can't reduce it to a single number, like how much you know at a particular moment in time. You
can just keep learning and hope that you are getting, you know, on average and long run
better and better.
Got it.
That's a fair point, especially with learning, because there's no sort of end to it.
You continue to learn every day that you live.
So there's definitely that aspect.
Going back to one other point, is there, say, an insight from neuroscience or biology or
by observing animals that we as computer scientists underestimate today?
I think so.
Yeah, I agree.
I think we've just scratched the surface of what animals are able to do.
Is that work that's actively happening, do you think, in the research community?
I think it's strange how little direct overlap there is between the computational work
and the psychological and neuroscience work.
Okay. Looking back at your research career, was there a single moment, like an aha moment, or a surprising discovery that you made that stands out for you?
Hmm. Thinking back. Well, this connection between TD learning and the dopamine system, but I think I had retired by that time. I don't remember. So it wasn't really motivating for me as an
active researcher. But in those early days, I think what was interesting to me was finding that
there was a lot of confusion, thinking that error correction is the same as reinforcement learning:
learning from errors, which is supervised learning, which is the most prevalent sort of machine
learning being used, equating that with reinforcement learning. And even, you know,
renowned researchers, I'd seen that mistake. And so trying to explain why this is a mistake and what is
the correct way of thinking about these things was, you know, motivating for me. Maybe not the most outstanding;
I'm not sure I can put my finger on one. It was a long period of time. I think just getting a new
grant was a great moment. We got several new grants,
mostly from the National Science Foundation.
It's basic research.
Yeah, but what were the most outstanding moments?
I can't say; it was all so long ago.
I think our work was slow and we were gradually figuring out
and it was just, it was mostly a deepening of understanding.
Things would fall into place, and we'd say, yeah, that's what's going on,
or it's only this and nothing else.
Perhaps that's right,
and then that would get confirmed over time
as we didn't find other things.
Now, there were events that were done mainly by others
where, like, we all know about AlphaGo.
That was so striking.
That was a big step.
Well, I think before that was the backgammon system
that Gerald Tesauro did at IBM.
Yeah, that was very impressive.
I'm tending to think of things that other people have done.
You know, even Q-learning was just very nice.
It was consistent with what we were doing, but it was a nice development.
I think it also aligns with the spirit of what you do.
I mean, learning happens over a period of time,
and that also explains why there is so much longevity
in the relevance of the research even today,
which leads me to my next question,
which is, is there a modern application of reinforcement learning
that you think
sort of best represents
the spirit of your early research?
It would be like AlphaZero.
AlphaZero was, you know,
a game learning system
that would play any of a wide variety
of two-player games,
and it didn't require much help
other than knowledge of the rules of the game.
Well, I tend to think of robot applications.
Mm, that's good.
Now we're beginning to see
very impressive
movement abilities of often humanoid robots, but other kinds of robots too. And I think in many cases,
they're using reinforcement learning to adjust the movement algorithms that are implemented by robots.
I've always been interested in motor control and how the brain does that. And I think reinforcement
learning is involved in how we learn to move. And it is being used by a,
people developing robots.
That's a really good one.
Yeah, that's pretty fascinating.
Do you have any thoughts on how you imagine the relationship between, you know,
humans and reinforcement learning-based systems or products or robots, for example,
will evolve, say, over the next 20 years?
Any thoughts around that at all?
We each have thoughts, maybe a bit different.
I think it's, you know, it will be a fulfillment of the
eternal striving of people to understand themselves and to make themselves work better.
And so this is what will happen.
This will continue, and in the next 20 years, within that time frame, there's a very good chance
that we will understand our minds to be able to recreate the basic functions.
And I just find it so exciting.
It also makes people concerned or worried, but, I don't know, I'm sitting here
having just brought up that understanding is good,
and if we understand our minds,
that will be a good thing.
Some people might do bad things with it,
but it's basically a good thing
if we understand our minds.
It will cause us to have to rethink a lot of things,
who we are, what we can do,
what we want to do.
And I think this could be very exciting.
So my views are a little bit different.
You know, there are concerns about dangers
of applying some of these algorithms.
And I guess for me, the dangers of reinforcement learning go way, way back.
They've been talked about with regard to algorithms that optimize.
So there is a concern that if you ask a system to maximize some measure,
you don't know a priori what it's going to come up with.
And so you have to do it with care.
But on the other hand, optimization has been extremely beneficial
for lots and lots of applications.
And it just needs to be,
if you release an agent,
a self-improving agent that uses reinforcement learning,
you need to have constraints
that prevent it from doing things you don't want it to do.
So that's a concern.
And I know people are working to deal with these issues,
but then in terms of what's going to happen,
I have no idea.
I mean, the field is moving so quickly
and I'm not connected to it in an intimate way,
but I do have concerns that good engineering practices
are not necessarily being followed.
That's a whole other subject.
I certainly agree with Rich about understanding the mind, the brain.
I mean, that's a useful thing,
but not everyone is headed toward that objective.
Yeah, no, I think it's a fair point.
I think having those guardrails in place
to ensure that we're using
or building products and technology that are for the general benefit of society.
But you're right.
I don't know if there is a body out there that is setting policy and then there is adherence to it.
So that's certainly something for all the researchers and practitioners out there to think about.
What would you tell a young researcher maybe who's starting out in AI,
like how should they be thinking about, you know, failure, persistence, curiosity and, you know, safety,
ethics, etc.? Do you have specific thoughts? Like, what guidance would you give a young researcher in this field?
Well, let me tackle that. I think in science, and actually especially in AI research, fashion is really quite
prevalent. So currently now large language models are very fashionable. Lots and lots of people
pouring effort into that. But I've seen fashions arise and decay over the years in artificial
intelligence. And so I advise students not to follow the fashions, but to follow their passion,
their interests. I mean, we hear from Rich that he's been interested in understanding the mind
better. I think intrinsic interests should be driving one's choices of what to do rather than
fashion. I'm not a fashionable guy.
Good advice, though.
So I agree completely.
And I like to think that the most important things a young person will ever do
will be things that are really almost obvious to them now,
but they don't realize that other people don't see them.
That's how I felt about reinforcement learning.
Obviously, you know, animals have to try things and do what feels good
and avoid what feels bad.
Such an obvious thing, and yet it was not a field,
and maybe our contribution was just to recognize that obvious thing and develop it.
And so I think that's a general strategy.
You can always look for the obvious thing and develop it.
The obvious thing that people are not paying attention to.
So like what did Charles Darwin did?
He just looked around and said, oh, look, basically people are just animals.
And I mean, which is obvious if you look at them; we all have, you know, hearts and lungs and skin and muscles.
So we are obviously animals, and we evolved from them, and things like that.
So you want to see the obvious that is not recognized in your field,
that other people aren't seeing, and stick to it.
This makes me think of, I mean, Rich has heard me talk about this,
but if you look at the history of machine learning,
even before computers existed in the 30s,
they were building these electromechanical systems
that learned using reinforcement learning.
And in the early days of implementing learning systems on computers, they were reinforcement learning systems.
And then at a certain point, it switched to doing what we now call supervised learning.
And there are lots of reasons for that.
But one was that the cognitive revolution in psychology kind of made this basic type of learning really not something that was worthy of study anymore.
So I think the history of psychology had a lot to do with extinguishing computational efforts
to implement reinforcement learning systems.
So it's fashions, again.
Fashions have a very big effect.
And so like in psychology, animal learning theory was fashionable for some years.
And then, like in the 60s and the 70s, it went out of fashion.
Reinforcement learning has varied in fashions.
Neural networks have tremendously varied up and down at least three times.
You have to look past that.
And so, learning from experience, you know, that's the quickest way to understand
what reinforcement learning is about: a system that can just interact with the world and learn
from that without getting instructions from a teacher, but just getting experiences from the world,
things that feel good and things that feel bad, and learning from them.
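[Editor's note: the trial-and-error loop Sutton describes here can be sketched in a few lines of code. This is a toy illustration only, a multi-armed bandit with invented reward probabilities and an epsilon-greedy rule, not anything discussed in the episode:]

```python
import random

# Toy sketch of learning from experience: an epsilon-greedy bandit.
# The agent tries actions, observes rewards ("feels good" / "feels bad"),
# and drifts toward the actions that worked. The reward probabilities
# below are made up for the example.

def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(true_probs)
    values = [0.0] * n   # estimated value of each action
    counts = [0] * n     # how often each action was tried
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: values[i])
        # The "world" responds: reward 1 with the arm's true probability.
        reward = 1.0 if rng.random() < true_probs[a] else 0.0
        counts[a] += 1
        # Incremental average: update the estimate from this one experience.
        values[a] += (reward - values[a]) / counts[a]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", values)
```

No teacher ever tells the agent which arm is correct; it discovers the best one purely from the rewards it experiences, which is the distinction from supervised learning that Barto and Sutton draw above.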
And so this is an obvious idea.
The first mention of it I like to cite was Alan Turing in 1947,
before there even was a field of artificial intelligence.
He gave a talk and spoke the words that what we really want is a machine that can learn from experience.
So that was at the very beginning, and then, you know, Andy and I kind of reawakened it around 1980.
And then it's, you know, things swing forward and back.
Right now, reinforcement learning is only sort of half in favor because the most important things are large language models, and they don't learn from experience. They don't really have experience.
A lot of people are using reinforcement learning to improve large language models.
Yeah, they're using it inside it, and very often it's not like an evaluation. It's like they will do things and then a person will say that's good.
So it's not the world telling you that had a good outcome.
It's just they require a person to be there.
Well, that's one form.
There are others that, well... no.
You're right.
I mean, they're doing lots and lots of different things.
But one thing they're not doing is they're not learning from the experience
of answering questions and interacting with people.
They're not learning from their ordinary life.
They learn from a pre-training phase and from a fine-tuning phase,
and then when the systems are deployed and interacting with people,
they're no longer learning.
So they really do not learn from experience.
And so that feels a little bit of a shame to me,
and I think that will be fixed over the years.
Yeah, so I mean, if you want to look for something
that is obvious and needs to be done,
you know, it's still learning from experience
because today's most refined AI systems
do not learn from their experience.
Thank you.
Yeah, that sounds,
I mean, that's super insightful. I think going back to what both of you said, at the crux of it was, you know, to follow your interest, follow your passion, not necessarily just the flavor of the day. And I actually spent some amount of time listening to your interviews in preparation for this conversation. And so many of your students and colleagues speaking about the time that you've spent with them, helping them understand concepts, uncover the next question that they should pursue, just speaks volumes of the kind of influence
and impact that you've had on so many young researchers.
So I would just want to ask you sort of one last question for our final bite,
which is who are Andy and Rich when they're not computer scientists?
Tell us one thing that you've not shared before.
Well, I haven't been a computer scientist for quite a few years.
Carpenter. Carpenter.
I'm still a mechanical engineer, I guess.
I'm not officially an engineer, but I like mechanical things.
And carpentry, I collect power tools and try not to injure myself with them.
And otherwise, I read all the time.
I read novels.
I am trying to understand some things about physics that I've never understood.
I find that nobody understands them, so it's not such a problem.
I guess now that I'm retired, I'm not constrained and I'm not doing forced reading.
I make my own choices as to what I read.
And I can download books.
It's very easy.
But besides being a computer scientist, I had two older sisters.
They didn't have PhDs, but they were both doing computing.
So I think there's a genetic component in me and my sisters that somehow makes dealing with computing
interesting. So I think I'm a slave to my genetics, I guess. I like your answers, Andy. And I feel a little bit
like that myself. I try to pursue broader studies. You know, maybe I like to think I'm a little bit
of a philosopher, because I like all those questions, although I don't think I've
memorized enough of the names to be a proper philosopher. But I like the ideas. And I toy with that. And sometimes
I do speak about it because what we're doing is a great thing of understanding the mind and that has so many
dimensions and impacts so many different parts of philosophy. We're all trying to understand the
world and find simple ways to communicate it to others and think about it. And I try to get that from
the books I read, whether they're physicists like David Deutsch or science fiction writers like Iain Banks.
It's all trying to understand our place in the universe.
Yeah, it's been an absolute pleasure talking to both of you.
Thank you so much for taking the time to speak with us at ACM ByteCast.
Well, thank you very much.
Thank you. It's been a pleasure.
ACM ByteCast is a production of the Association for Computing Machinery's Practitioner Board.
To learn more about ACM and its activities, visit acm.org.
For more information about this and other episodes, please visit our website at learning.acm.org
slash B-Y-T-E-C-A-S-T.
That's learning.acm.org slash bytecast.
This podcast was edited by Resonate Recordings.
