Lex Fridman Podcast - #144 – Michael Littman: Reinforcement Learning and the Future of AI
Episode Date: December 13, 2020

Michael Littman is a computer scientist at Brown University. Please support this podcast by checking out our sponsors:
- SimpliSafe: https://simplisafe.com/lex and use code LEX to get a free security camera
- ExpressVPN: https://expressvpn.com/lexpod and use code LexPod to get 3 months free
- MasterClass: https://masterclass.com/lex to get 2 for price of 1
- BetterHelp: https://betterhelp.com/lex to get 10% off

EPISODE LINKS:
Michael's Twitter: https://twitter.com/mlittmancs
Michael's Website: https://www.littmania.com/
Michael's YouTube: https://www.youtube.com/user/mlittman

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
- Check out the sponsors above, it's the best way to support this podcast
- Support on Patreon: https://www.patreon.com/lexfridman
- Twitter: https://twitter.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Medium: https://medium.com/@lexfridman

OUTLINE:
Here are the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) - Introduction
(07:43) - Robot and Frank
(10:02) - Music
(13:13) - Starring in a TurboTax commercial
(23:26) - Existential risks of AI
(41:48) - Reinforcement learning
(1:07:36) - AlphaGo and David Silver
(1:17:15) - Will neural networks achieve AGI?
(1:29:42) - Bitter Lesson
(1:42:32) - Does driving require a theory of mind?
(1:51:58) - Book recommendations
(1:57:20) - Meaning of life
Transcript
The following is a conversation with Michael Littman, a computer science professor at Brown University
doing research on and teaching machine learning, reinforcement learning, and artificial intelligence.
He enjoys being silly and light-hearted in conversation, so this was definitely a fun one.
Quick mention of each sponsor, followed by some thoughts related to the episode.
Thank you to SimpliSafe, a home security company I use to monitor and protect my apartment;
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet; MasterClass,
online courses that I enjoy from some of the most amazing humans in history; and BetterHelp,
online therapy with a licensed professional.
Please check out these sponsors in the description to get a discount and to support this podcast.
As a side note, let me say that I may experiment with doing some solo episodes in the coming month or two.
The three ideas I have floating in my head currently are to use: one, a particular moment in history; two, a particular movie; or
three, a book to drive a conversation about a set of related concepts. For example,
I could use 2001: A Space Odyssey or Ex Machina to talk about AGI for one, two, three
hours. Or I could do an episode on the rise and fall of Hitler and
Stalin, each in a separate episode, using relevant books and historical moments for reference.
I find the format of a solo episode very uncomfortable and challenging, but that just tells me
that it's something I definitely need to do and learn from the experience. Of course I hope you come along for the ride.
Also, since we have all this momentum built up, announcements: I'm giving a few lectures
on machine learning at MIT this January.
In general if you have ideas for the episodes, for the lectures, or for just short videos
on YouTube, let me know in the comments that I still definitely read despite my better judgment and the wise
sage advice of the great Joe Rogan.
If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcast,
follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. As usual, I'll do a few minutes of ads now and no ads in the
middle. I try to make these interesting, but I give you time stamps, so if you skip,
please still check out the sponsors by clicking the links in the description. It is
the best way to support this podcast. This show is sponsored by SimpliSafe, a
home security company.
Everyone wants to keep their home and family safe.
That's what they told me to say.
So it must be true.
Whether it's from a break-in, a fire, flooding, or a medical emergency, SimpliSafe home security
has got your back day and night, ready to send police, fire, or EMTs when you need them most, straight to
your door. I'm pretty sure that if you suffer an AGI robot takeover, they will also,
allegedly, send Spot robots from Boston Dynamics for a full-on robot-on-robot
battle. However, small caveat: I haven't tried this aspect of the service yet
myself, so I can't tell
you if it's a good idea or not.
They have sensors and cameras that protect every inch of your home.
All it takes is a simple 30-minute setup.
I have it set up in my apartment, but unfortunately anyone who tries to break in will be very
disappointed by the lack of interesting or valuable stuff to take.
Some dumbbells, a pull-up bar,
and some suits and shirts. That's about it. You get a free security camera and a 60-day
risk-free trial when you go to simplisafe.com slash Lex. Again, that's simplisafe.com slash
Lex. This episode is also sponsored by ExpressVPN. Earlier this year, more than 100 Twitter users got their accounts hacked into.
Passwords, email addresses, phone numbers, and more.
The list included Elon Musk and Kanye West.
Can't believe they gave me those two options.
ExpressVPN can help avoid that.
I use it to safeguard my personal data online. Did you know that for
20 years, the permissive action link PAL access control security device that controls access
to the United States nuclear weapons had a password of just eight zeros. That's it. Apparently
this was a protest by the military to say that PAL systems are generally a bad idea because they are hackable and so on.
Also, the most popular leaked passwords of 2020 are 123456, 123456789,
picture1, password, and 12345678.
If you have one of these passwords, please perhaps make it a New Year's resolution to change them.
Anyway, ExpressVPN encrypts your data and lets you surf the web safely and anonymously.
Get it at expressvpn.com slash lexpod to get an extra three months free. That's expressvpn.com slash
lexpod.
This show is also sponsored by MasterClass:
$180 a year for an all-access pass
to watch courses from literally the best people
in the world on a bunch of different topics.
Let me list some I've enjoyed watching in part or in whole.
Chris Hadfield on space exploration,
Neil deGrasse Tyson on scientific thinking
and communication, Will Wright, creator
of SimCity and The Sims, on game design, Carlos Santana on guitar, Garry Kasparov on chess,
Daniel Negreanu on poker, Neil Gaiman on storytelling, Martin Scorsese on filmmaking, Jane Goodall
on conservation, and many more.
By the way, you can watch it on basically any device.
Sign up at masterclass.com slash lex to get 15% off the first year of an annual subscription.
That's masterclass.com slash lex.
This episode is also sponsored by BetterHelp.
Spelled HELP.
They figure out what you need and match it with a licensed professional therapist in
under 48 hours.
I've chatted with a person on there and enjoyed it.
Of course, I also have been talking to David Goggins over the past few months, who is
definitely not a licensed professional therapist, but he does help me meet his and my demons
and become comfortable to exist in their presence. Everyone is different, but for me,
I think suffering is essential for creation, but you can
suffer beautifully in a way that doesn't destroy you.
Therapy can help in whatever form that therapy takes.
Better help is an option worth trying.
They're easy, private, affordable, available worldwide.
You can communicate by text anytime and schedule weekly, audio and video sessions.
You didn't ask me, but my two favorite psychiatrists are Sigmund Freud and Carl Jung.
Their work was important in my intellectual development.
Anyway, check out betterhelp.com slash Lex.
That's betterhelp.com slash Lex.
And now, here's my conversation with Michael Littman.
I saw a video of you talking to Charles Isbell about Westworld, the TV series.
You guys were doing a kind of thing where you're watching new things together.
But let's rewind back.
Is there a sci-fi movie or book or show that was profound, that had an impact
on you philosophically, or just, like, something
you enjoyed nerding out about?
Yeah, interesting.
I think a lot of us have been inspired
by robots and movies.
But one that I really like is,
there's a movie called Robot and Frank,
which I think is really interesting
because it's very near-term future,
where robots are being deployed as helpers in people's homes. And we don't know how to make robots like that at this point,
but it seemed very plausible, it seemed very
realistic or imaginable. I thought that was really cool, because they're awkward,
they do funny things that raise some interesting issues,
but it seemed like something that would ultimately be helpful and good if we could do it right.
Yeah, it was an older, cranky gentleman.
He was an older, cranky jewel thief, yeah.
It's kind of a funny little thing, which is, you know, he's a jewel thief, and so he pulls the robot
into his life, which is something you could imagine: taking a home robotics thing
and pulling it into whatever quirky thing that's involved in your
existence. Yeah, whatever is meaningful to you. Exactly. Yeah. And I think from that perspective,
I mean, not all of us are jewel thieves. And so when we bring our robots into our
life, yeah, it explains a lot about this apartment, actually. But no, the idea that people should
have the ability to make this technology their own, that
it becomes part of their lives.
And I think that's hard for us as technologists to make that kind of technology.
It's easier to mold people into what we need them to be.
And just that opposite vision, I think, is really inspiring.
And then there's an anthropomorphization where we project certain things onto them, because
I think the robot was kind of dumb.
But I have a bunch of Roombas that I play with, and you immediately project stuff onto them,
a much greater level of intelligence. We'll probably do that with each other too, a much greater degree of compassion.
One of the things we're learning from AI is where we are smart and where we are not smart. Yeah.
You also enjoy, as people can see,
and I enjoyed myself watching you sing
and even dance a little bit, a little bit of dancing.
A little bit of dancing, it's not quite my thing.
As a method of education, or just in life,
you know, in general. So, easy question:
what are the definitive, objectively speaking, top three
songs of all time? Maybe, to walk that back a little bit, maybe
something that others might be surprised by, the three songs that you kind of enjoy.
That is a great question that I cannot answer, but instead let me tell you a story.
So pick a question you do want to answer.
That's right.
I've been watching the presidential debates
and vice presidential debates, and it turns out,
yeah, you can just answer any question you want.
So, on to unrelated questions.
Let me interrupt you.
Yeah, well said.
I really like pop music.
I've enjoyed pop music ever since I was very young.
So 60s music, 70s music, 80s music,
this is all awesome. And then I had kids, and I think I stopped listening to music, and I was
starting to realize that, like, my musical taste had sort of frozen. And so I decided
in 2011, I think, to start listening to the top 10 Billboard songs each week. So I'd be on the
treadmill and I would listen to that week's top 10 songs. So I could find out what was popular now. And what I discovered is that I have no musical taste whatsoever.
I like what I'm familiar with. And so the first time I'd hear a song was the first week it was on
the charts, and I'd be like..., and then the second week I was into it a little bit, and the third week I was
loving it. And by the fourth week it's like just part of me. And so I'm afraid that I can't tell you
the most, my favorite song of all time,
because it's whatever I heard most recently.
Yeah, that's interesting.
People have told me that there's an art
to listening to music as well.
And you can start to, if you listen to a song,
just carefully, like explicitly,
just force yourself to really listen.
You start
to, I did this when I was part of a jazz band and fusion band in college. You start
to hear the layers of the instruments. You start to hear the individual instruments. You
can listen to classical music this way, you can listen to jazz
this way. I mean, it's funny to imagine you now, listening to pop hits as, like, a scholar,
listening to, like, Cardi B or something like that, or Justin Timberlake.
No, not Timberlake, Bieber.
I get them mixed up.
They both, they've both been in the top 10 since I've been listening.
They're still, they're still up there.
Oh my God, I'm so cool.
If you haven't heard, Justin Timberlake's been in the top 10 in the last few years;
there was one song that he did where the music video was set at, essentially,
NeurIPS. Oh, the one with the robotics. Yeah, yeah, yeah.
Yeah, yeah, it's like at an academic conference and he's doing it.
He was presenting. It was sort of a cross between the Apple, like,
Steve Jobs kind of talk, and NeurIPS.
Yeah.
So it's always fun when AI shows up in pop culture.
I wonder if he consulted somebody for that.
That's really interesting.
So maybe on that topic: your celebrity has
multiple dimensions, but one of them is you've done cameos
in different places.
I've seen you in a TurboTax commercial as, like, I guess,
the brilliant Einstein character, and the point is that TurboTax doesn't need somebody like you.
Doesn't need a brilliant person. Very few things need someone like me, but yes, they were
specifically emphasizing the idea that you don't need to be a computer expert to be able to use
their software.
How'd you end up in that world?
I think it's an interesting story.
So I was teaching my class.
It was an intro computer science class for non-concentrators, non-majors.
And sometimes when people would visit campus, they would check in to say,
hey, we want to see what a class is like, can we sit on your class?
So a person came to my class who was the daughter
of the brother of the husband of the best friend of my wife.
Anyway, basically a family friend came to campus
to check out Brown and asked to come to my class and came
with her dad.
Her dad, who I've known from various kinds of family events and so forth, also
does advertising.
And he said that he was recruiting scientists for this ad, this TurboTax set of ads.
And he said, we wrote the ad with the idea
that we get the most brilliant researchers,
but they all said no.
So can you help us find the, like,
B level scientists?
I'm like, sure, that's who I hang out with.
So that should be fine.
So I put together a list, and I did what some people
call the Dick Cheney.
So I included myself on the list of possible candidates.
You know, with a little blurb about each one and why I thought that would make sense for them to do it.
And they reached out to a handful of them, but then they ultimately,
they YouTube-stalked me a little bit and they thought, oh, I think he could do this.
And they said, okay, we're going to offer you the commercial.
I'm like, what?
So it was, it was such an interesting experience
because they have another world,
the people who do nationwide kind of ad campaigns
and television shows and movies and so forth,
it's quite a remarkable system that they have going.
Because they-
Yeah, so I went to, it was just somebody's house that they rented in New Jersey.
But in the commercial, it's just me and this other woman. In reality, there were 50 people
in that room and another, I don't know, half a dozen kind of spread out around the house
in various ways. There were people whose job it was to control the sun. They were in the backyard
on ladders putting filters up to try to make sure
that the sun didn't glare off the window
in a way that would wreck the shot.
So there was like six people out there doing that.
There was three people out there giving snacks,
the craft table.
There was another three people giving healthy snacks
because that was a separate craft table.
There was one person whose job it was
to keep me from getting lost.
And I think the reason for all this is because so many people are in one place at one time,
they have to be time efficient, they have to get it done.
In the morning they were going to do my commercial; in the afternoon,
they were going to do a commercial with a mathematics professor from Princeton.
They had to get it done.
No, you know, no wasted time or energy.
And so there's just a fleet of people all working as an organism and it was fascinating.
I was just the whole time,
I was just looking around like,
this is so neat.
Like one person whose job it was
to take the camera off of the cameraman
so that someone else whose job it was
to remove the film canister
because every couple of takes,
they had to replace the film because, you know,
film gets used up.
It was just, I don't know, I was geeking out
the whole time, it was so fun.
How many takes did it take?
It looked, the opposite, like there weren't more
than two people there.
It was very relaxed.
Right, yeah. The person
who I was in the scene with is a professional,
she's, you know, an improv comedian
from the city. And then, when I got there,
they had given me a script, such as it was, and then I got there and they said, we're gonna do this as improv. I'm like, I don't know how to improv. Like, this is not, I don't know what this,
I don't know what you're telling me to do here.
Don't worry, she knows.
I'm like, okay.
I'm gonna see how this goes.
I guess I got pulled into the story,
because, like, where the heck did you come from?
I guess, in the scene.
Like, how did you show up in this random person's house?
I don't know.
Yeah, well, I mean, the reality of it is
I stood outside in the blazing sun.
There was someone whose job it was to keep an umbrella over me,
because I started to sweat,
and so I would wreck the shot
because my face was all shiny with sweat.
So there was one person who would dab me off
and hold an umbrella.
But yeah, like, the reality of it,
like, why is this strange,
stalkery person hanging around outside somebody's house?
And we're not sure. We're not sure.
But, so you make, like you said, YouTube videos
yourself, you make awesome parody songs that kind of focus
in on a particular aspect of computer science.
Those seem really natural.
How much production value goes into that?
Do you also have a team of 50 people?
The videos, almost all the videos except for the ones that people would have actually
seen were just me.
I write the lyrics, I sing the song, I generally find a backing track online, because I,
like, can't really play an instrument.
And then I do, in some cases,
I'll do visuals using just PowerPoint.
Lots and lots of PowerPoint
to make it sort of like an animation.
The most produced one is the one that people might have seen,
which is the overfitting video that I did with Charles,
as well.
And that was produced by the Georgia Tech and Udacity people
because we were doing a class together.
It was kind of, I usually do parody songs
kind of to cap off a class at the end of a class.
So for that one you were, so it was, like, Thriller?
Yeah.
You were, like, Michael Jackson, in the red leather jacket.
The interesting thing with podcasting, that you're also into, that I really enjoy,
is that there's not a team of people, because, you know, there's something
that happens when there's more people involved than just one person; it changes the way you start acting.
I don't know, there's a censorship.
Especially for, like, slow thinkers like me,
and I think most of us are,
if we're trying to actually think,
we're a little bit slow and careful,
and kind of large teams get in the way of that.
And I don't know what to do with that.
Like, to me, it's very popular
to criticize, quote unquote, mainstream media,
but there is legitimacy to criticizing them.
The same is even true of NPR, for example; it's clear that there's a
team behind it.
There are the commercial breaks.
There's this kind of, like, rush of, like,
okay, I have to interrupt you now because we have to go to commercial. This whole thing, it creates,
it destroys the possibility of nuance in the conversation.
Yeah, exactly.
Evian, which Charles Isbell, who I talked to yesterday, told me is naive spelled backwards,
which, the fact that his mind thinks this way, is just quite brilliant.
Anyway, there's a freedom to this podcast.
He's Dr. Awkward, which, by the way, is a palindrome.
That's a palindrome that I happened to know from other parts of my life,
and I just thought of him.
Well, you know, it sticks to Charles.
Dr. Awkward.
So what was the most challenging parody song to make?
Was it the thriller one?
No, that one was really fun.
I wrote the lyrics really quickly.
And then I gave it over to the production team.
They recruited an Acapella group to sing.
That one, it went really smoothly.
It's great having a team.
Because then you can just focus on the part
that you really love,
which in my case is writing the lyrics.
For me, the most challenging one,
not challenging in a bad way,
but challenging in a really fun way was,
I did one of the parody songs I did
is about the halting problem in computer science,
the fact that you can't create a program
that can tell, for any other arbitrary
program, whether it's actually going to get stuck in a loop or whether it's going to eventually
stop.
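For readers who want the gist of the argument being referenced here, below is a minimal sketch in Python of the classic contradiction. The would_halt oracle is a hypothetical function, an assumption introduced only to be refuted; it is not real code anywhere.

```python
# Hypothetical oracle: returns True if program(data) eventually halts.
# The halting theorem says no total, always-correct version can exist.
def would_halt(program, data):
    ...  # assume, for contradiction, that this always answers correctly

def paradox(program):
    # Do the opposite of whatever would_halt predicts about
    # running 'program' on its own source.
    if would_halt(program, program):
        while True:   # predicted to halt -> loop forever
            pass
    else:
        return        # predicted to loop -> halt immediately

# Does paradox(paradox) halt? If would_halt says yes, paradox loops;
# if it says no, paradox halts. Either way the oracle is wrong,
# so it cannot exist.
```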
And so I did it to an 80s song, because that's, I hadn't started my new thing of learning
current songs.
And it was Billy Joel's Piano Man.
Nice.
Which is a great song.
Great song, that's it.
Yeah.
And...
Sing me a song, you're the piano man.
Mm-hmm.
Yeah.
So the lyrics are great because, first of all, it rhymes.
Not all songs rhyme.
I've done Rolling Stones songs, which turn out to have no rhyme scheme whatsoever.
They're just sort of yelling and having a good time.
Which makes it not fun from a parody perspective, because, like, you can say anything. But here the
lines rhyme, and there were a lot of internal rhymes as well. And so figuring out how to
sing, with internal rhymes, a proof of the halting problem was really challenging, and
I really enjoyed that process.
Last question on this topic: what about the dancing in the Thriller video? How many takes did that take?
So I wasn't planning to dance, they had me in the studio and they gave me the jacket and
it's like, well, you can't, if you have the jacket and the glove, like there's not much
you can do.
So I, I think I just danced around and then they said, why don't you dance a little bit.
We, there was a scene with me and Charles dancing together.
They did not use it in the video, but we recorded it. It was pretty funny. And Charles, who has this beautiful, wonderful voice,
doesn't really sing. He's not really a singer. And so that was why I designed the song with him
doing a spoken section and me doing the singing. It's very, like, Barry White. Yeah, smooth baritone.
Yeah, yeah, it's great. That was awesome.
So one of the other things Charles said is that, you know,
everyone knows you as like a super nice guy, super passionate about teaching and so on.
What he said, I don't know if it's true,
is that despite the fact that you are...
Cold-blooded?
Okay, I will admit this finally,
for the first time: that was
me. It's the Johnny Cash song,
really: shot a man in Reno just to watch
him die.
That you actually do have some strong opinions
on some topics.
So if this in fact is true, what
strong opinions would you say you have?
Are there ideas, you think, maybe in artificial intelligence,
machine learning, maybe in life, that you believe are true
that others might, you know,
some number of people, might disagree with you on?
So I try very hard to see things from multiple perspectives.
There's this great Calvin and Hobbes cartoon
where, okay, so Calvin's dad is always
kind of a bit of a foil, and Calvin had done something wrong.
The dad talks him into seeing it from another perspective,
and this breaks Calvin,
because he's like, oh my gosh, now I can see
the opposite sides of things, and so it becomes like a cubist cartoon where there is no front and back.
Everything's just exposed.
And it really freaks him out.
And finally, he settles back down.
It's like, oh, good, no, I can make that go away.
But, like, I'm that.
I live in that world where I'm trying to see everything from every perspective all the
time.
So there are some things that I've formed opinions about that it would be harder, I think,
to disabuse me of. One is the superintelligence argument and the
existential threat of AI is one where I feel pretty confident in my
feeling about that one. Like, I'm willing to hear other arguments, but like, I am
not particularly moved by the idea that if we're not careful, we will
accidentally create a super intelligence
that will destroy human life.
Let's talk about that.
Let's get you in trouble and record your own video.
It's like Bill Gates.
I think he said, like, some quote about the internet,
that it's just gonna be a small thing.
It's not gonna really go anywhere.
And then I think Steve Ballmer,
I don't know why I'm sticking on Microsoft,
said something like, smartphones are useless.
There's no reason why Microsoft should get into smartphones.
So let's talk about AGI.
As AGI is destroying the world, we'll look back at this video and see.
No, I think it's really interesting to actually talk about because nobody really knows the
future.
So use your best intuition. It's very difficult to predict the future, but you have spoken about AGI and the existential risks
around it, and, sort of based on your intuition, that we're quite far away from that being a serious
concern relative to the other concerns we have. Can you maybe unpack that a little bit? Yeah, sure, sure.
So as I understand it, for example, I read Bostrom's book
and a bunch of other reading material
about this sort of general way of thinking about the world.
And I think the story goes something like this,
that we will, at some point, create computers
that are smart enough
that they can help design the next version of themselves,
which itself will be smarter than the previous version
of themselves and eventually bootstrapped up
to being smarter than us, at which point,
we are essentially at the mercy
of this sort of more powerful intellect,
which in principle, we don't have any control
over what its goals are. And so if its goals are at all out of sync with our goals, for example,
the continued existence of humanity, we won't be able to stop it. It'll be way more powerful than us
and we will be toast. So there's some, I don't know, very smart people
who have signed on to that story.
And it's a compelling story.
I want to, now I can really get myself in trouble.
I once wrote an op-ed about this, specifically responding
to some quotes from Elon Musk,
who has been on this very podcast more than once.
And... AI is summoning the demon, I forget.
I think he said, but then he came to Providence, Rhode Island,
which is where I live, and said to the governors of all the states,
you're worried about entirely the wrong thing.
You need to be worried about AI.
You need to be very, very worried about AI.
So the journalists kind of reacted to that,
and they wanted to get
people's take, and I was like,
okay, my belief is that
one of the things that makes Elon Musk so successful
and so remarkable as an individual
is that he believes in the power of ideas.
He believes that,
you know, if you have a really good idea for getting into space,
if you have a really good idea for getting into space,
you can get into space. If you have a really good idea for a company or for how to change the way that people drive,
you just have to do it and it can happen. It's really natural to apply that same idea to AI.
You see these systems that are doing some pretty remarkable computational tricks,
demonstrations, and then to take that idea and just push it all the way to the limit
and think, okay, where does this go?
Where is this going to take us next?
And if you're a deep believer in the power of ideas, then it's really natural to believe
that those ideas could be taken to the extreme and kill us.
So I think, you know, his strength is also his undoing because that doesn't mean it's true.
Like, it doesn't mean it's true.
It doesn't mean that that has to happen, but it's natural for him to think that.
So that's another way to phrase the way he thinks, and I find it very difficult to argue with
that line of thinking.
Sam Harris is another person, from a neuroscience perspective, who thinks like that, saying, well, is there something fundamental in the physics of the universe that prevents this from eventually happening?
And Nick Bostrom thinks about things in the same way, that kind of zooming out: yeah, okay, we humans now are existing in this, like, time scale of minutes and days. And so our intuition is in this time scale of minutes,
hours and days.
But if you look at the span of human history,
is there any reason we can't see this in 100 years?
And like, is there something fundamental
about the laws of physics that prevent this?
And if it doesn't, then it eventually will happen.
Or we will destroy ourselves in some other way.
And it's very difficult, I find,
to actually argue against that.
Yeah.
Me too.
And not sound like, not sound like you're just,
like, rolling your eyes, like...
It's science fiction, we don't have to think about it.
But even worse than that, which is like,
I don't know, yes, but, like, I gotta pick up my kids now.
Like, this, okay,
I see, there's more pressing short-term...
Yeah, there's more pressing short-term things
that, like, stop it with the existential crisis,
there's much shorter-term things, like now,
especially this year, there's COVID.
So, like, any kind of discussion like that is,
like, there's, you know, pressing things today.
And then, sort of, the Sam Harris argument,
well, like, any day the exponential singularity can occur,
is very difficult to argue against.
I mean, I don't know.
Part of his story is also, he's not gonna put a date on it.
It could be in a thousand years, it could be in a hundred years,
it could be in two years.
It's just that as long as we keep making this kind of progress, it's ultimately has to
become a concern.
I'm kind of on board with that, but the piece that I feel like is missing
from that way of extrapolating from the moment that we're in is that I believe that, in the
process of actually developing technology that can really get around in the world and really process and do things in the world in a sophisticated way.
We're going to learn a lot about what that means, which we don't know now, because we don't know how to do this right now.
If you believe that you can just turn on a deep learning network and give it enough compute and it will eventually get there, well, sure, that seems really scary, because we won't be in the loop at all.
We won't be helping to design or target these kinds of systems.
But I don't see that. That feels like it is against the laws of physics, because these systems need
help, right?
They need help to surpass the difficulty, the wall of complexity that happens in arranging something
into that form.
Yeah. I believe in evolution. I mean, there's an argument, right? There's another argument,
just to look at it from a different perspective, that people who doubt evolution make: how could
evolution, it's sort of like a random set of parts assembling themselves into a 747,
and that could just never happen.
So it's like, okay, that's maybe hard to argue against,
but clearly, 747s do get assembled,
they get assembled by us,
basically the idea being that there's a process
by which we will get to the point
of making technology that has that kind of awareness.
And in that process, we're gonna learn a lot
about that process, and we'll have more
ability to control it or to shape it or to build it in our own image.
It's not something that is going to spring into existence like that 747.
And we're just going to have to contend with it completely unprepared.
It's very possible that, in the context of the long arc of human history, it will in fact spring into existence,
but that springing might take, like if you look at nuclear weapons, like even 20 years
is a springing in the context of human history, and it's very possible just like with nuclear
weapons that we could have, I don't know what percentage you want to put at it, but the
possibility of...
Of us knocking ourselves out?
Yeah, the possibility of human beings destroying themselves in the 20th century.
With nuclear, I don't know, if you really think through it, you could really
put it close to, like, I don't know, 30, 40 percent, given, like, the certain moments of crisis
that happened. So, like, I think one, like, fear in the shadows
that's not being acknowledged is, it's not so much that the AI will run away, it's that as it's
running away, we won't have enough time to think through how to stop it.
Right, fast takeoff, or foom.
Yeah. I mean, my much bigger concern, I wonder what you think about it, is that we won't know
it's happening.
So I kind of think that there is an AGI situation already happening with social media
that our minds, our collective intelligence, of human civilizations are already being controlled
by an algorithm.
And, like, we're already super, like, at the level of collective intelligence,
thanks to Wikipedia. People should donate to Wikipedia
to feed the AGI.
Man, if we had a super intelligence
that was in line with Wikipedia's values,
that's a lot better than a lot of other things I can imagine.
I trust Wikipedia more than I trust Facebook or YouTube
as far as trying to do the right thing
from a rational perspective.
Now that's not where you were going, I understand that.
But it does strike me that there's sort of
smarter and less smart ways of exposing ourselves
to each other on the internet.
Yeah, the interesting thing is that Wikipedia and social media
have very different forces.
You're right.
I mean, if AGI was Wikipedia,
it'd be just like this cranky, overly competent editor
of articles.
You know, there's something to that.
But the social media aspect is not.
So the vision of AGI is as a separate system that's superintelligent.
That's one key little thing.
I mean, there's the paperclip argument: super dumb but super powerful systems.
But with social media, you have relatively, like, algorithms we may talk about today,
very simple algorithms, that, when, something Charles talks a lot about,
which is interactive AI, when they start having, at scale, tiny little interactions with human beings,
they can start controlling these human beings. So a single algorithm can control the minds of
human beings slowly, to where we might not realize it. It can start wars, it can change the way we
think about things. It feels like, in the long arc of history, if I were to sort of zoom out
from all the outrage and all the tension on social media, that it's progressing us towards
better and better things. It feels like chaos and toxicity and all that kind of stuff,
but it feels like actually the chaos and toxicity is similar to the kind of debates we had
from the founding of this country. There was a civil war that happened over that period.
And ultimately it was all about this tension of something that doesn't feel right about
our implementation of the core values
we hold as human beings, and we're constantly struggling with this, and that results in people
calling each other names, just being shady to each other on Twitter. But ultimately the algorithm
is managing all that, and it feels like there's a possible future in which that algorithm
controls us in the direction of self-destruction, and what that looks like.
Yeah, so, all right. I do believe in the power of social media to
screw us up royally. I do believe in the power of social media to benefit us, too. I do think that,
yeah, it sort of almost got dropped on top of us, and now we're trying, as a culture, to figure out how to cope with it.
There's a sense in which, I don't know, there's some arguments that say that,
for example, I guess college-age students now, late college-age students now,
people who were in middle school when social media started to really take off,
may be really damaged.
This may have really hurt their development
in a way that we don't have all the implications
of quite yet.
That's the generation who, and I hate to make it
somebody else's responsibility,
but they're the ones who can fix it.
They're the ones who can figure out,
how do we keep the good of this kind of technology without letting
it eat us alive?
And if they're successful, we move on to the next phase, the next level of the game.
If they're not successful, then we're going to wreck each other, we're going to destroy
society.
So you're going to, in your old age, sit on a porch and watch the world burn because of the
TikTok generation that...
I believe, well, they're my kids' age, right?
And certainly my daughter's age,
and she's very tapped in to social stuff,
but she's also, she's trying to find that balance, right?
Of participating in it and getting the positives of it,
but without letting it eat her alive.
And I think sometimes she ventures,
I hope she doesn't watch this, sometimes I think
she ventures a little too far and is consumed by it. And other times she gets a little distance.
And if there's enough people like her out there, they're going to navigate these choppy waters.
That's an interesting skill actually to develop. I talked to my dad about it.
I've now somehow this podcast in particular, but other reasons has received a little bit of attention.
And with that, apparently, in this world, even though I don't shut up about love and I'm just all about kindness,
I have now a little mini army of trolls.
It's kind of hilarious actually,
but it also doesn't feel good,
but it's a skill to learn, to not look at that.
Like, to moderate actually how much you look at that.
The discussion I have with my dad,
it's similar to, it doesn't have to be about trolls,
it could be about checking email,
which is, like, if you're anticipating... you know,
my dad runs a large institute at Drexel University,
and there could be stressful, like, emails you're waiting on, like there's drama of some kind.
And so, like, there's a temptation to check the email, and if you send an email and you check it,
that pulls you in, and it doesn't feel good.
And it's a skill that he actually complains
that he hasn't learned, I mean, he grew up without it.
So he hasn't learned the skill of how to shut off the internet
and walk away.
And I think young people, while they're also being
quote unquote damaged by like,
being bullied online, all of those stories,
which are very like horrific,
you basically can't escape your bullies these days
when you're growing up,
but at the same time,
they're also learning that skill of how to be able to shut it off,
like, disconnect from it,
be able to laugh at it,
not take it too seriously.
It's a fascinating,
like we're all trying to figure this out,
just like you said,
it's been dropped on us and we're trying to figure it out.
Yeah, I think that's really interesting.
I guess I've become a believer in the human design, which I feel like I don't completely
understand.
How do you make something as robust as us?
We're so flawed in so many ways, and yet, and yet, we dominate the planet. And we do seem to manage to get ourselves out of scrapes,
eventually, not necessarily the most elegant possible way,
but somehow we get to the next step.
And I don't know how I make a machine do that.
Generally speaking, like, if I train one
of my reinforcement learning agents to play a video game,
it works really hard on that first stage, over and over and over again, and it makes it through,
it succeeds on that first level. And then the new level comes, and it's just like, okay,
I'm back to the drawing board. And somehow humanity, we keep leveling up and then somehow managing
to put together the skills necessary to achieve some semblance of success in that next level too.
And, you know, I hope we can keep doing that.
You mentioned reinforcement learning. So you've had a couple of years in the field, no,
quite a few, quite a long career in artificial intelligence broadly, but reinforcement learning specifically. Can you maybe give a hint
about your sense of the history of the field? In some ways it's changed with the
advent of deep learning, but it has long roots. Like, how, as you've lived it in your own life,
have you seen the community change, or maybe the ideas that it's playing with change?
I've had the privilege, the pleasure of being, of having almost a front row seat
to a lot of this stuff and it's been really, really fun and interesting.
So when I was in college in the 80s, early 80s, the neural net thing was
starting to happen.
And I was taking a lot of psychology classes and a lot of computer science
classes as a college student, and I thought, you know, something that can play
tic-tac-toe and just like learn to get better at it, that ought to be a really easy thing.
So I spent almost all of my, what would have been vacations during college, like hacking
on my home computer, trying to teach it how to play tic-tac-toe.
In which programming language?
Basic.
Oh yeah, that's my first language.
That's my native language.
Is that when you first fell in love with computer science?
Just like programming basic on that.
What was the computer, do you remember?
I had a TRS-80, Model I, before they were called Model I's, because there was nothing
else.
I got my computer in 1979.
So I would have been Bar Mitzvahed,
but instead of having a big party
that my parents threw on my behalf,
they just got me a computer
because that's what I really, really, really wanted.
I saw them in the mall at Radio Shack,
and I thought, what, how are they doing that?
I would try to stump them.
I would give them math problems. Like one plus and then it prints these two plus doing that? I would try to stump them. I would give them math problems.
Like, one plus two, and then it prints three.
And it would always get it right.
I'm like, how do you know so much?
I'm like, I've had to go to algebra class
for the last few years to learn this stuff.
And you just seem to know.
So, hey, I was smitten, and got a computer.
And I think, ages 13 to 15,
I have no memory of those years.
I think I just was in my room with the computer.
Just communing with Billy Joel?
Communing, possibly listening to the radio,
listening to Billy Joel.
That was the one album I had on vinyl at that time.
And then I got it on cassette tape
and that was really helpful.
Because then I could play it.
I didn't have to go down to my parents' wi-fi,
or, hi-fi, sorry.
And at age 15, I remember kind of walking out and like,
okay, I'm ready to talk to people again.
Like I've learned what I need to learn here.
And so yeah, so that was my home computer.
And so I went to college and I was like,
oh, totally gonna study computer science.
And the college I chose specifically
had a computer science major.
The one that I really wanted, the college I really wanted to go to, didn't, so bye-bye to them.
Which college did you go to?
So I went to Yale.
Princeton would have been way more convenient, and it was just a beautiful campus, and it was
close enough to home, and I was really excited about Princeton, and I visited.
I said, so, computer science major? They're like, well, we have computer engineering.
I'm like, well, I don't like that word, engineering.
I like computer science.
I really, I want to do, like, they're saying, hardware and software?
They're like, yeah. I'm like, I just want to do software.
I couldn't care less about hardware.
You grew up in Philadelphia?
I grew up outside Philly, yeah.
Yeah.
So the, you know, local schools were, like, Penn and Drexel and Temple.
Like, everyone in my family went to Temple, at least at one point
in their lives, except for me.
So yeah, Philly family.
Yale had a computer science department, and that's where you went. It's kind of interesting,
you said the 80s and neural networks.
That's when neural networks were a hot new thing, or a hot thing, period.
So was that in college when you first learned about neural networks?
Yeah, yeah.
And it was in a psychology class, not in a CS class.
Yeah.
Was it psychology or cognitive science?
Or like, do you remember like what context?
It was, yeah, yeah.
So, I've always been a bit of a cognitive psychology
groupie.
So like I study computer science,
but I like to hang around where the cognitive scientists are
because I don't know, brains, man, they're like, they're wacky, cool.
And they have a bigger-picture view of things.
They're a little less engineering, I would say.
They're more, they're more interested in
the nature of cognition and intelligence
and perception and how like the vision system works.
They're always asking bigger questions.
Now, with the deep learning community, I think
there's a lot more intersection,
but I do find that the neuroscience folks, and the psychology, cognitive science
folks, are starting to learn how to program, how to use artificial neural networks.
And they are actually approaching problems in, like, totally new, interesting ways.
It's fun to watch grad students from those departments approach a problem in machine
learning.
Right.
They come in with a different perspective.
Yeah.
They don't care about your ImageNet dataset or whatever.
They want to understand the basic mechanisms at the neuronal level, at the functional level
of intelligence.
It's kind of cool to see them work.
So you were always a groupie of cognitive psychology.
Yeah.
It was in a class by Richard Gerrig.
He was kind of my favorite psych professor in college.
I took three different classes with him.
Yes, they were talking specifically,
the class, I think, was kind of a,
there was a big paper that was written by Steven Pinker
and Prince, I'm blanking on Prince's first name,
but Pinker and Prince,
they wrote kind of a,
they were at that time kind of like,
I'm blanking on the names of the current people, the cognitive
scientists who are complaining a lot about deep networks. Oh, Gary...
Gary Marcus.
Marcus. And who else? I mean, there's a few, but Gary is the
most feisty. Sure, Gary's very feisty. And with his co-author, they, you know, they're kind
of doing these kinds of takedowns where they say, okay, well, yeah, it does all these amazing things, but here's a shortcoming,
here's a shortcoming, here's a shortcoming.
And so the Pinker Prince paper is kind of like that generation's version of Marcus and
Davis, right, where they're trained as cognitive scientists, but they're looking skeptically
at the results in the artificial intelligence neural net kind of world
and saying, yeah, it can do this and this and this,
but like, it can't do that and it can't do that
and it can't do that.
Maybe in principle, or maybe just in practice at this point,
but the fact of the matter is you've narrowed your focus
too far to be impressed, you know,
you're impressed with the things within that circle,
but you need to broaden that circle a little bit.
You need to look at a wider set of problems.
And so, in this seminar in college, we did basically a close reading of the Pinker-Prince paper,
which was really thick. There was a lot going on in there.
And it talked about the reinforcement learning idea a little bit.
I'm like, oh, that sounds really cool because behavior is what is really interesting to me about psychology anyway.
So making programs that, I mean, programs are things that behave.
People are things that behave.
Like, I want to make a learner that learns to behave.
In which way was reinforcement learning presented?
Is this talking about human and animal behavior?
Or are we talking about actual mathematical constructs?
That's a good question.
So this is, I think it wasn't actually
talked about as behavior in the paper that I was reading.
I think that it just talked about learning.
And to me, learning is about learning to behave,
but really, neural nets at that point
were about learning, like supervised learning,
so learning to produce outputs from inputs.
So I kind of tried to invent reinforcement learning. When I graduated, I joined a research group at Bellcore, which had spun
out of Bell Labs recently at that time because of the divestiture of long-distance and local
phone service, in 1984. And I was in a group with Dave Ackley, who was the first author of the Boltzmann machine paper,
so the very first neural net paper
that could handle XOR.
Right, so XOR sort of killed neural nets,
the very first, the zeroth winter, or the...
The first winter.
Yeah, the Perceptrons paper,
and Hinton, along with his student Dave Ackley,
and I think there was other authors as well,
showed that, no, no, no, with Boltzmann machines,
we can actually learn non-linear concepts.
And so everything's back on the table again.
And that kind of started that second wave
of neural networks.
So Dave Ackley was, he became my mentor at Bellcore,
and we talked a lot about learning and life
and computation and how all these things get together.
Now Dave and I have a podcast together.
So I get to kind of enjoy his perspective once again, even all these years later.
And so I said I was really interested in learning, but in the context of behavior.
And he's like, oh, well, that's reinforcement learning. Here.
And he gave me Rich Sutton's 1984 TD paper. So I read that paper. I honestly didn't get all of it, but I got the
idea. I got that he was using ideas that I was familiar with in the context
of neural nets, and, like, sort of backprop. But with this idea of making predictions over time,
I'm like, this is so interesting, but I don't really get all the details
I said that to Dave, and Dave said, oh, well, why don't we have him come and give a talk?
And I was like,
wait, what? You can do that? Like, these are real people. I thought they were just words.
I thought it was just, like, ideas that somehow magically seeped into paper. He's like, no, I
know Rich. Like, we'll just have him come down and he'll give a talk.
And so I was, my mind was blown.
And so Rich came and he gave a talk at Bellcore.
And he talked about what he was super excited about,
which was, they had just figured out, at the time, Q-learning.
So Watkins had visited Rich Sutton's lab at UMass,
or Andy Barto's lab, that Rich was
a part of.
And he was really excited about this because it resolved a whole bunch of problems that
he didn't know how to resolve in the earlier paper.
And so, for people who don't know: TD, temporal difference.
These are all just algorithms for reinforcement learning.
Right.
And TD, temporal difference in particular, is about making predictions over time.
And you can try to use it for making decisions, right?
Because if you can predict how good an action's outcomes will be in the future,
you can choose the one that has the better outcome.
But the theory didn't really support changing your behavior.
Like the predictions had to be of a consistent process if you really wanted it to work.
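For readers unfamiliar with it, the prediction idea being summarized here boils down to roughly the following update, shown as a minimal sketch in Python; the value table, step size, and discount are assumptions for illustration, not details from the conversation.

```python
# Minimal TD(0) prediction sketch: learn V(s), the long-term value of a
# state, by nudging today's prediction toward the observed reward plus
# the discounted prediction for the next state ("predictions over time").
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * V.get(s_next, 0.0)
    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V
```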
And one of the things that was really cool about Q-learning, another algorithm for reinforcement learning, is it was off-policy, which meant that you could actually be learning about
the environment, and what the value of different actions would be, while actually figuring out how
to behave optimally. So that was a revelation. And the proof of that is kind of interesting. I mean,
that was really surprising to me when I first read it, and then when I reread it in Rich
Sutton's book on the matter.
It's kind of beautiful that a single equation can capture all of that.
One equation, one line of code, and, like, you can learn anything.
Yeah, like, given enough time.
So, equation and code, you're right.
Like, arguably, at least if you, like, squint your eyes, you can
say, this is all of intelligence,
that you can implement it in a single one.
I think I started with Lisp, which is, shout out to Lisp, with, like, a single line of code,
a key piece of code, maybe a couple, that could do that, which is kind of magical.
It feels too good to be true.
Well, sort of.
Yeah, kind of.
It seems to require an awful lot of extra stuff supporting it.
But nonetheless, the idea is really good.
And as far as we know, it is a very reasonable way of trying to create adaptive behavior,
behavior that gets better
at something over time.
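For reference, the one-line update being alluded to in this exchange is the standard tabular Q-learning rule; below is a minimal, illustrative sketch in Python. The toy environment interface (reset, step) and the hyperparameters are assumptions for the example, not details from the episode.

```python
import random

# Minimal tabular Q-learning sketch. Assumes a toy environment with
# reset() -> state and step(action) -> (next_state, reward, done),
# plus a finite list of actions.
def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = {}  # (state, action) -> estimated long-term value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q.get((s, x), 0.0))
            s2, r, done = env.step(a)
            # The "one equation": move Q(s,a) toward reward plus the
            # discounted value of the best next action. Taking the max over
            # next actions is what makes Q-learning off-policy, as discussed above.
            best_next = 0.0 if done else max(Q.get((s2, x), 0.0) for x in actions)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return Q
```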
Did you find the idea of optimality at all compelling?
That you can prove that it's optimal.
So, like, one part of computer science that makes people feel warm and fuzzy inside
is when you can prove something, like that a sorting algorithm worst-case runs in n log n,
and it makes everybody feel so good.
Even though in reality, it doesn't really matter what the worst case is, what matters
is like, does this thing actually work in practice on this particular actual set of data
that I enjoy?
Did you?
So here's a place where I have maybe a strong opinion, which is like, you're right, of
course, but no, no, like
so, so what makes worst case so great, right? If you have a worst case analysis, so great,
is that you get modularity. You can take that thing and plug it into another thing and
still have some understanding of what's going to happen when you click them together,
right? If it just works well in practice, in other words, with respect to some distribution
that you care about, when you go plug it into another thing, that distribution can shift,
it can change, and your thing may not work well anymore. And you want it to, and you wish
it does, and you hope that it will, but it might not. And then, ah.
So you're saying you don't like machine learning.
But we have some positive theoretical results for these things.
You know, you can come back at me with, yeah, but they're really weak and, yeah,
they're really weak. And you can even say that, you know, sorting algorithms, like if you do
the optimal sorting algorithm, it's not really the one that you want. And that might be true as well.
But it is, the modularity is a really powerful statement.
I really like that.
As an engineer, you can then assemble different things.
You can combine them. I mean, it's interesting.
It's a balance, like with everything else in life.
You don't want to get too obsessed.
I mean, this is what computer scientists do: they tend
to get obsessed, and they over-optimize things,
or they start by optimizing them, and they over-optimize.
So it's easy to, like, get really granular about this thing, but, like, the step from an
n-squared to an n log n sorting algorithm is a big leap for most real-world systems.
No matter what the actual behavior of the system
is, that's a big leap.
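To put rough numbers on that leap, an illustrative back-of-the-envelope calculation rather than anything from the conversation: for a million items, an n log n algorithm does on the order of fifty thousand times fewer steps than an n-squared one.

import math

n = 1_000_000
quadratic = n * n                # e.g. a naive n-squared sort, worst case
linearithmic = n * math.log2(n)  # e.g. mergesort

print(f"speedup: ~{quadratic / linearithmic:,.0f}x")  # roughly 50,000x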
And the same can probably be said
for other kind of first leaps
that you would take on a particular problem.
Like it's the picking the low hanging fruit
or whatever the equivalent of doing the,
not the dumbest thing, but the next to the dumbest thing.
It's all picking the most delicious, reachable fruit. Yeah, most delicious, reachable fruit. I don't know why that's not a saying.
Okay, so then, this is the '80s, and this kind of idea of learning especially percolates.
Yeah, I mean, at that point I got to meet Rich Sutton, so everything was sort of downhill from there. That was really the pinnacle of everything.
But then I felt like I was kind of on the inside.
So then as interesting results were happening,
I could like check in with Rich,
or with Gerry Tesauro, who had a huge impact on kind of early thinking in temporal difference learning and reinforcement learning, and showed that you could solve problems that we didn't know how to solve any other way. And so that was really cool. Good things were happening. I would hear about it from either the people who were doing it, or the people who were talking to the people who were doing it, and so I was able to track things pretty well through the '90s.
So, wasn't most of the excitement on reinforcement learning in the '90s era with TD-Gammon? Like, what's the role of these kind of little fun game-playing things, and breakthroughs, in how exciting the community was? Because you've also built, or were part of building, a crossword puzzle solver, a program called Proverb. So you were interested in this as a problem, in using games to understand how to build intelligent systems. So, like, what did you think about TD-Gammon?
Like, what did you think about that whole thing in the 90s?
Yeah, I mean, I found the TD-Gammon result really just remarkable. So I had known about some of Gerry's stuff before he did TD-Gammon. He did a system, just more of a vanilla, not entirely vanilla, but a more classical backprop kind of network for playing backgammon, where he was training it on expert moves.
So it was kind of supervised.
But the way that it worked was not to mimic the actions, but to learn internally an evaluation function.
So to learn, well, if the expert chose this over this,
that must mean that the expert values this more than this.
And so let me adjust my weights to make it so that the network evaluates this as being
better than this.
So it could learn from human preferences, it could learn its own preferences.
And then when he took the step from that to actually doing it as a full-on reinforcement
learning problem where you didn't need a trainer, you could just let it play. That was remarkable, right?
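A rough sketch of that comparison-training idea, with a simple linear scorer standing in for Tesauro's network and a made-up feature size; only the "prefer the position the expert chose" update is shown:

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64) * 0.01       # 64 board features: an invented number

def evaluate(position_features):
    return w @ position_features     # higher score means "better for us"

def train_on_expert_choice(chosen, rejected, lr=0.01):
    """Nudge the evaluator so the expert's chosen position scores higher."""
    global w
    # Probability the model already agrees with the expert (logistic form).
    p_agree = 1.0 / (1.0 + np.exp(evaluate(rejected) - evaluate(chosen)))
    # Gradient step on the preference loss: raise chosen, lower rejected.
    w += lr * (1.0 - p_agree) * (chosen - rejected)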
And so I think as humans often do, as we've done in the recent past as well, people extrapolate
and it's like, oh, well, if you can do that, which is obviously very hard, then obviously
you could do all these other problems that we want to solve that we know are also really hard.
And it turned out very few of them ended up being practical,
partly because I think neural nets,
certainly at the time, were struggling to be consistent
and reliable.
And so training them in a reinforcement learning setting
was a bit of a mess.
I had, I don't know, generation after generation
of, like, master's students who wanted to do value
function approximation, basically reinforcement learning with neural nets.
And over and over again, we were failing.
We couldn't get the good results that Gerry Tesauro got.
I now believe that Gerry is a neural net whisperer. He has a particular ability to get neural networks to do things that other people would find impossible. And it's not the technology, it's the technology and Gerry together.
Yeah, which I think speaks to the role
of the human expert in the process of machine learning.
Right, it's so easy.
We're so drawn to the idea that it's the technology
that is where the power is coming from,
that I think we lose sight of the fact
that sometimes you need a really good person. Just like, I mean, no one would think, hey, here's this great piece of software, here's, like, I don't know, GNU Emacs or whatever, and doesn't that prove
that computers are super powerful
and basically gonna take over the world.
It's like, no, Stallman is a hell of a hacker, right?
So he was able to make the code do these amazing things.
He couldn't have done it without the computer,
but the computer couldn't have done it without him.
And so I think people discount the role of people like Gerry
who have just a particular set of skills.
On that topic, by the way, as a small side note: I tweeted "Emacs is greater than Vim" yesterday, and deleted the tweet 10 minutes later when I realized it started a war. I was like, oh, I was just kidding, and I'm walking it back. And I saw people still feel passionately about that, particularly the Vim people.
Yeah, I don't get that,
because Emacs is clearly so much better.
I don't understand.
But you know, why do I say that?
Because, like, I spent a block of time in the '80s making my fingers know the Emacs keys, and now, like, that's part of the thought process for me, like how I need to express things. And if you take my Emacs key bindings away, I become mute.
Yeah.
Yeah.
I can't express myself.
I'm the same way with, I don't know if you know what it is,
but a Kinesis keyboard, which is the bowl-shaped keyboard.
Yes, I've seen them.
Yeah.
And they're very, I don't know, sexy, elegant.
They're just beautiful.
Yeah, they're gorgeous, way too expensive.
But the problem with them, similar with Emacs, is once you learn to use it, it's harder to use other things.
There's this absurd thing where I have like small, elegant,
lightweight, beautiful little laptops.
And I'm sitting there in a coffee shop with a giant Kinesis keyboard and the sexy little laptop. It's absurd.
But, you know, like, I used to feel bad about it, but at the same time you just kind of have to. Sometimes, it's back to the Billy Joel thing you said: to throw out that Billy Joel record, and throw Taylor Swift and Justin Bieber to the wind.
So see, but I like them now because I,
because again, I have no musical taste.
Like now that I've heard Justin Bieber enough,
I like, I really like his songs and Taylor Swift.
Not only do I like her songs,
but my daughter's convinced that she's a genius.
And so now I basically have, I'm signed onto that.
So.
So, yeah, that speaks back to the robustness of the human brain. That speaks to the neuroplasticity, that you can just, like, teach yourself to probably love anything, teach yourself to enjoy Taylor Swift.
I'll try it out. I don't know. You know what it has to do with? Just, like, acclimation, right? Just like you said, a couple of weeks.
Yeah. That's an interesting experiment. I'll actually try that.
Like I'll listen to it. That wasn't the intent of the experiment, just like social media.
It wasn't intended as an experiment to see what we can take as a society, but it turned
out that way.
I don't think I'll be the same person on the other side of the week listening to Taylor
Swift, but let's try.
You know, it's compartmentalized. Don't be so worried. Like, I get that you can be worried, but don't be so worried, because we compartmentalize really well.
And so it won't bleed into other parts of your life.
You won't start, I don't know, wearing red lipstick or whatever. Like, it's fine, it's fine. You're not going to change your fashion or anything.
It's fun, but you know what,
the thing you have to watch out for
is you'll walk into a coffee shop
once we can do that again.
And recognize the song.
And you'll be singing along, and you won't know that you're singing along until everybody in the coffee shop is looking at you.
And then you're like, that wasn't me.
Yeah, that's the, you know, people are afraid of AGI.
I'm afraid of the tail.
The Taylor Swift takeover.
Yeah.
And, I mean, people should know that TD-Gammon was, I guess, would you call it, do you like the terminology of self-play, by any chance?
Sure.
So, like, systems that learn by playing themselves. I just don't know if it's the best word, but...
So what's the problem with that term?
Okay, so it's like the Big Bang. Like, it's like talking to serious physicists: do you like the term Big Bang? And we're early, I feel like it's the early days of self-play. I don't know, maybe it was used previously, but I think it's been used by only a small group of people.
And so I think we're still deciding,
is this ridiculously silly name a good name for potentially one of the most important concepts in artificial intelligence?
Well, okay, depends how broadly you apply the term.
So I used the term in my 1996 PhD dissertation.
Oh, wow, the actual term itself.
Yeah, because Tesauro's paper was something like "training up an expert backgammon player through self-play". So I think it was in the title of his paper.
Oh, okay. If not in the title, it was definitely a term that he used.
There's another term that we got from that work: rollout. I don't know, do you ever hear the term rollout? That's a backgammon term that has now been applied generally in computers, well, at least in AI, because of TD-Gammon.
That's fascinating.
So how is self-play being used now?
And like why is it, does it feel like
a more general powerful concept?
It's sort of the idea of, well, the machine's just gonna teach itself to be smart.
Yeah, so that's where, maybe you can correct me, but that's where, you know, the continuation of the spirit, and actually, like, literally the exact algorithms of TD-Gammon, are applied by DeepMind and OpenAI to learn games that are a little bit more complex. When I was learning artificial intelligence, Go was presented to me, with Artificial Intelligence: A Modern Approach, I don't know if they explicitly pointed to Go in those books as, like, an unsolvable kind of thing, implying that these approaches hit their limit in this particular kind of game. I don't remember if the book said it or not, but something, whether it was in my head or it was the professors, instilled in me the idea: these are the limits of artificial intelligence, of the field. Like, it instilled in me the idea that if we can create a system that can solve the game of Go, we've achieved AGI. No one explicitly, like, said this, but that was the feeling. And so I was one of the people for whom it seemed magical when a learning system was able to beat a human world champion at the game of Go. And even more so, that was AlphaGo, even more so with AlphaGo Zero, then kind of renamed and advanced into AlphaZero: beating a world champion, or world-class player, without any supervised learning on expert games, doing it only by playing itself. So I don't know what to make of it. I think it would be interesting to hear what your opinions are on just how exciting, surprising, profound, interesting, or boring the breakthrough performance of AlphaZero was.
Okay, so AlphaGo knocked my socks off.
That was so remarkable.
Which aspect of it?
That they got it to work, that they actually were able to leverage a whole bunch of different ideas and integrate them into one giant system.
Just the software engineering aspect of it is mind blowing.
I've never been a part of a program as complicated as the program that they built for that.
And just like Gerry Tesauro is a neural net whisperer, David Silver is a kind of neural net whisperer too.
He was able to coax these networks, and these new, way-out-there architectures, to solve these problems that, as you say, when we were learning AI, no one had any idea how to make work.
It was remarkable that these techniques
that were so good at playing chess
and that could beat the world champion in chess
couldn't beat your typical Go-playing teenager at Go.
So the fact that in a very short number of years
we kind of ramped up to trouncing people at Go just blew me away.
So you're kind of focusing on the engineering aspect, which is also very surprising.
I mean, there's something different about large well-funded companies.
I mean, there's a compute aspect to it too.
Sure.
Like, that's, of course, I mean, that's similar to Deep Blue, right? With IBM.
Like, there's something important to be learned and remembered about a large company
taking the ideas that are already out there and investing a few million dollars into it,
or more.
And so you're kind of saying the engineering is kind of fascinating, both, with AlphaGo, probably just gathering all the data of the expert games, like organizing everything, actually doing distributed supervised learning. And to me, see, the engineering I kind of took for granted. To me, philosophically, it's being able to persist in the face of, like, long odds, because it feels like, for me, I would be one of the skeptical people in the room thinking that you can't learn your way to beating Go.
Like, especially with David Silver, it sounded like David was not confident at all. It's funny how confidence works. It's like, you're not, like, cocky about it, but...
Right, because if you're cocky about it, you kind of stop and stall and don't get anywhere.
But there's, like, a hope that's unbreakable. Maybe that's better than confidence. It's a kind of wishful hope and a little dream, and you almost don't want to do anything else. You kind of keep doing it.
That seems to be the story.
And with enough skepticism that you're looking for where the problems are and fighting through them.
Yeah, because you know there's got to be a way out of this thing.
Yeah. And for him, it was probably, there's a bunch of little factors that come into play. It's funny how these stories just all come together. Like, everything he did in his life came into play, which is, like, a love for video games and also a connection to, so the '90s had to happen, with TD-Gammon and so on.
Yeah. In some ways it's surprising, and maybe you can provide some intuition to it, that not much more than TD-Gammon was done for quite a long time on the reinforcement learning front.
Is that weird to you?
I mean, like I said, with the students who I worked with, we tried to basically apply that architecture to other problems.
And we consistently failed.
There were a couple, a couple of really nice demonstrations
that ended up being in the literature.
There was a paper about controlling elevators, right?
Where it's like, okay, can we modify the heuristic that elevators use, like a bank of elevators, for deciding which floors we should be stopping on to maximize throughput, essentially?
And you can set that up as a reinforcement learning problem
and you can have a neural net represent the value function
so that it's taking where all the elevators are, where the button pushes are, a high-dimensional, well, at the time high-dimensional, input, a couple dozen dimensions,
and turn that into a prediction as to,
oh, is it gonna be better if I stop at this floor or not?
And ultimately, it appeared as though for the
standard simulation distribution for people trying to leave the building at the end of the day,
that the neural net learned a better strategy than the standard one that's implemented in elevator
controllers. So that was nice. There was some work that Satinder Singh et al. did on handoffs with cell phones, deciding when you should hand off from this cell
tower to this cell tower?
Oh, okay.
Communication, I don't know exactly.
Yeah, and so a couple things seemed like they were really promising.
None of them made it into production that I'm aware of.
And neural nets as a whole started to kind of implode around then.
And so there just wasn't a lot of air in the room
for people to try to figure out, okay, how do we get this to work in the RL setting?
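For a sense of what those student projects were attempting, here is a sketch of TD(0) with a tiny two-layer network as the value function. The input size echoes the "couple dozen dimensions" above; everything else, sizes, constants, and the missing environment hookup, is assumed for illustration:

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 24, 16
W1 = rng.normal(size=(n_hidden, n_inputs)) * 0.1
w2 = rng.normal(size=n_hidden) * 0.1

def value(x):
    h = np.tanh(W1 @ x)          # hidden layer
    return w2 @ h, h             # scalar value estimate plus activations

def td_update(x, reward, x_next, alpha=0.01, gamma=0.99):
    global W1, w2
    v, h = value(x)
    v_next, _ = value(x_next)
    delta = reward + gamma * v_next - v          # the TD error
    grad_w2 = h                                  # dV/dw2
    grad_W1 = np.outer(w2 * (1.0 - h ** 2), x)   # backprop through tanh
    w2 = w2 + alpha * delta * grad_w2
    W1 = W1 + alpha * delta * grad_W1

Getting this recipe to behave stably was exactly the hard part being described.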
And then they found their way back, 10-plus years later.
So you said AlphaGo was impressive, like, it's a big spectacle. What about AlphaZero?
So I think I may have a slightly different opinion on this than some people. So I talked to Satinder Singh in particular about this. So Satinder was, like Rich Sutton, a student of Andy Barto. So they came out of the same lab, a very influential machine learning, reinforcement learning researcher. He's now at DeepMind, just as is Rich, though at different sites, the two of them. Rich is in Alberta, and Satinder would be in England, but I think he's doing England from Michigan at the moment.
But he was, yes, much more impressed with AlphaGo Zero, which didn't get a kind of bootstrap in the beginning with human-trained games; it was just purely self-play.
Though the first one, AlphaGo, was also a tremendous amount of self-play.
They started off, they kick-started the action network that was making decisions, but then
they trained it for a really long time using more traditional temporal difference methods.
So as a result, it didn't seem that different to me.
It seems like, yeah, why wouldn't that work?
Like, once it works, it works.
So, but he found the removal of that extra information
to be breathtaking.
Like, that's a game changer.
To me, the first thing was more of a game changer.
But the open question, I mean, I guess the assumption is that the expert games might contain within them a kind of human ceiling of information. But we know that it went beyond that, right? We know that it
somehow got away from that information because it was learning strategies. I don't think AlphaGo is
just better at implementing human strategies. I think it actually developed its own strategies that were more effective.
And so from that perspective,
okay, well, so it made at least one quantum leap
in terms of strategic knowledge.
Okay, so now maybe it makes three, like, okay,
but that first one is the doozy, right?
Getting it to work reliably
and for the networks to hold on to the value well enough.
That was a big step.
Well, maybe you could speak to this on the reinforcement learning front.
So, starting from scratch and learning to do something, like going from random behavior to crappy behavior to, like, somewhat okay behavior. It's not obvious to me that that's not, like, impossible, to take those steps. If you just think about the intuition: like, how the heck does random behavior become somewhat basic intelligent behavior?
Not human level, not super human level, but just basic.
But you're saying, to you, the intuition is, like, if you can go from human to superhuman-level intelligence
on this particular task of game playing,
then you're good at taking leaps.
So you can take many of them.
That the system, I believe that the system
can take that kind of leap.
Yeah, and also I think that beginner knowledge in Go, like, you can start to get a feel really quickly for the idea that certain parts of the board seem to be more associated with winning. Because it's not stumbling upon the concept of winning; it's told that it wins or that it loses. Well, it's self-play, so it both wins and loses. It's told which side won. And the information is kind of there to start
percolating around to make a difference as to, well, these things have a better chance of
helping you win. And these things have a worse chance of helping you win. And so, you know,
it can get to basic play, I think, pretty quickly. Then once it has basic play, well, now it's kind of forced to do some search, to actually experiment with: okay, what gets me that next increment of improvement?
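Here is a sketch of that percolation in self-play form, on a deliberately tiny game, since Go obviously doesn't fit in a few lines: Nim with 7 stones, take 1 or 2 per turn, taking the last stone wins. The game is invented for illustration; the only training signal, as described, is who won.

import random

V = {}  # stones remaining -> estimated chance the player to move wins

def moves(stones):
    return [m for m in (1, 2) if m <= stones]

def pick(stones, eps=0.1):
    if random.random() < eps:            # explore a little
        return random.choice(moves(stones))
    # The position we leave behind belongs to the opponent, so prefer moves
    # that leave them the lowest estimated winning chance.
    return min(moves(stones), key=lambda m: V.get(stones - m, 0.5))

def self_play_episode():
    stones, player, visited = 7, 0, []
    while stones > 0:
        visited.append((stones, player))
        stones -= pick(stones)
        player = 1 - player
    winner = 1 - player                  # whoever took the last stone won
    for seen, p in visited:              # percolate the outcome backwards
        target = 1.0 if p == winner else 0.0
        v = V.get(seen, 0.5)
        V[seen] = v + 0.1 * (target - v)

for _ in range(5000):
    self_play_episode()
print({s: round(v, 2) for s, v in sorted(V.items())})  # multiples of 3 score low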
How far do you think, okay, this is where you kind of bring up the Elon Musks and the Sam Harrises, right, how far is your intuition about these kinds of self-play mechanisms being able to take us?
Because one of the ominous, but calmly stated, things that David Silver said when I talked to him is that they have not yet discovered a ceiling for AlphaZero, for example, on the game of Go or chess.
Oh, it's okay. No matter how much compute they throw at it, it keeps improving. So it's very possible that if you throw some 10x compute at it, it will improve by 5x, or something like that.
And when stated calmly, it's, like, oh yeah, I guess so. But then you think, like, well, can we potentially have, like, continuations of Moore's Law in a totally different way, like a broadly defined Moore's Law, not the literal one, exponential improvement, like, are we going to have an AlphaZero that swallows the world?
But notice it's not getting better at other things; it's getting better at Go.
Yeah, and I think it's a big leap to say, okay, well, therefore it's better at other things.
Well, I mean, the question is how much of the game of life can be turned into Go, right?
So that, I think, is a really good question, and I don't think we as a, I don't know, community really know the answer to this.
But, okay, so I went to a talk by some experts on computer chess. Computer chess is really interesting, because, of course, for a thousand years humans were the best chess-playing things on the planet, and then computers, like, edged ahead of the best person, and they've been ahead ever since. It's not like people have overtaken computers. But computers and people together have overtaken computers.
Right. So at least last time I checked, I don't know what the very latest is, but last time
I checked that there were teams of people who could work with computer programs to defeat the best
computer programs.
In the game of chess.
In the game of chess.
Right.
And so, using these things called Elo scores, this sort of notion of how strong a player you are, there's kind of a range of possible scores, and you increment in score, basically, if you can beat another player of that lower score 62% of the time, or something like that. Like, there's some threshold of, if you can somewhat consistently beat someone, then you are of a higher score than that person. And there's a question as to how many times you can do that in chess, right? And so we know that there's a range of human ability levels that cap out with the best-playing
humans. And the computers went a step beyond that. And computers and people together have
not gone, I think, a full step beyond that. It feels like the estimate is that it's starting to asymptote, that we've reached kind of the maximum, the best possible chess playing.
And so that means that there's kind of a finite strategic depth, right?
At some point, you just can't get any better at this game.
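For reference, the incrementing being described follows the standard Elo expected-score formula; Michael's 62% is an approximate threshold, which in Elo terms corresponds to roughly an 85-point gap:

def expected_score(rating_a, rating_b):
    """Standard Elo: expected score (roughly, win probability) for player A."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(round(expected_score(1500, 1415), 2))  # ~0.62 for an 85-point gap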
Yeah, I mean, I don't, so I'd like to check that.
I think it's interesting, because you have somebody like Magnus Carlsen who's using these chess programs to train his mind, like, to learn about chess.
To become a better chess player, yeah.
And so like, that's a very interesting thing because we're not static creatures.
We're learning together.
I mean, just like we're talking about social networks, those algorithms are teaching us
just like we're teaching those algorithms.
So that's the fascinating thing, you think. But I think the best chess-playing programs are now better than the pairs. Like, they have competitions between pairs. But it's still, even if they weren't, it's an interesting question: where is the ceiling? So the ominous David Silver kind of statement is, like, we have not found the ceiling.
Right. But the question is,
okay, so I don't know his analysis on that.
My, from talking to Go experts,
the depth, the strategic depth of Go
seems to be substantially greater than that of chess,
that there's more kind of steps of improvement
that you can make getting better and better and better.
But there's no reason to think that it's infinite.
Infinite, yeah. And so it could be that what David is seeing is a kind of asymptoting, that you can keep getting better, but with diminishing returns.
And at some point, you hit optimal play.
Like in theory, all these finite games, they're finite.
They have an optimal strategy.
There's a strategy that is the minimax optimal strategy.
And so at that point, you can't get any better. You can't beat that strategy. Now, that strategy
may be, from an information-processing perspective, intractable, right? Maybe all the situations are sufficiently different that you can't compress it at all; it's this giant mess of hard-coded rules, and we can never achieve that. But
that still puts a cap on how many levels of improvement that we can actually make.
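That in-theory optimal strategy is what minimax computes. Here it is on a tiny Nim-like game (7 stones, take 1 or 2, last stone wins), invented so that exhaustive search is actually tractable; for chess or Go the same recursion is hopelessly large, which is the intractability point above:

from functools import lru_cache

@lru_cache(maxsize=None)
def minimax(stones, player):
    """Exact game value from player 0's point of view: +1 win, -1 loss."""
    if stones == 0:
        return 1 if player == 1 else -1  # whoever just moved took the last stone
    values = [minimax(stones - m, 1 - player) for m in (1, 2) if m <= stones]
    return max(values) if player == 0 else min(values)

print(minimax(7, 0))  # 1: the first player can force a win here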
But the thing about self-play, if you put it, although I don't like doing that, in the broader category of self-supervised learning, is that it doesn't require too much or any human input.
Human labeling, yeah.
Yeah, human labeling, or just human effort, human involvement, past a certain point. And the same thing, you could argue, is true for the recent breakthroughs in natural language processing with language models.
Oh, this is how you get to GPT-3?
Yeah, that is a good transition.
Yeah, you practiced for days leading up to this. Good, good.
But, like, that's one of the questions: can we find ways to formulate problems in this world that are important to us humans, like, more important than the game of chess, to which self-supervised kinds of approaches could be applied? Whether it's self-play, for example, for, like, maybe you could think of autonomous vehicles in simulation, that kind of stuff, or just robotics applications in simulation. Or, in the self-supervised learning case, where unannotated data, or data that's generated by humans naturally without extra cost, like Wikipedia, or, like, all of the internet, can be used to create intelligent systems that do something really powerful, that pass the Turing test, or that do some kind of superhuman-level performance.
So what's your intuition?
They're trying to stitch all of it together about our discussion of AGI, the limits of
self-play, and your thoughts about maybe the limits of neural networks in the context
of language models. Is there
some intuition in there that might be useful to think about?
Yeah, yeah. So first of all, the whole transformer network family of things is really cool.
It's really, really cool. I mean, if you've ever, back in the day, played with, I don't know, Markov models for generating text, and you've seen the kind of text that they spit out, and you compare it to what's happening now, it's amazing.
It's so amazing.
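For contrast, here is roughly what that older baseline looks like: a bigram Markov model that picks each next word given only the current one. The toy corpus is made up.

import random
from collections import defaultdict

def train(words):
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)   # remember every observed follower
    return model

def generate(model, seed, length=20):
    out = [seed]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the dog chased the cat and the cat chased the dog".split()
print(generate(train(corpus), "the"))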
Now, it doesn't take very long interacting with one of these systems before you find the
holes, right?
It's not smart in any kind of general way. It's really good at a bunch of things. And it does
seem to understand a lot of the statistics of language extremely well. And that turns
out to be very powerful. You can answer many questions with that. But it doesn't make
it a good conversationalist, right? And it doesn't make it a good storyteller. It just makes
it good at imitating things that it has seen in the past.
The exact same thing could be said by people voting for Donald Trump about Joe Biden supporters, and by people voting for Joe Biden about Donald Trump supporters: you know, that they're not intelligent, they're just following things they've seen in the past. And so it doesn't take long to find the flaws in their, like, natural language generation abilities.
Yes. Yeah. So we're being, interestingly, critical of us.
Right. So, so I've had a similar thought, which was that the stories that GPT-3 spits out are
amazing and very human like. And it doesn't mean that computers are smarter
than we realize necessarily. It partly means that people are dumber than we realize, or
that much of what we do day to day is not that deep. Like we're just, we're just kind of
going with the flow, we're saying whatever feels like the natural thing to say next, not a lot of it is creative or meaningful or intentional.
But enough is that we actually get by, right?
And we do come up with new ideas sometimes
and we do manage to talk each other into things sometimes
and we do sometimes vote for reasonable people sometimes.
But it's really hard to see it in the statistics, because so much of what we're saying is kind of rote.
And so the metrics that we use to measure how these systems are doing don't reveal that, because the interesting part is very hard to detect.
But do you have an intuition about these language models, if they grow in size? It's already surprising that when you go from GPT-2 to GPT-3 there is a noticeable improvement.
So the question now goes back to the ominous David Silver and the ceiling.
Right.
So maybe there's just no ceiling.
We just need more compute.
Now, I mean, okay, so now I'm speculating.
Yes.
As opposed to before when I was completely on firm ground.
All right.
I don't believe that you can get something that really can do language and use language
as a thing that doesn't interact with people.
Like I think that it's not enough to just take everything that we've said written down
and just say, that's enough.
You can just learn from that and you can be intelligent.
I think you really need to be pushed back at.
I think that conversations, even people who are pretty smart,
maybe the smartest thing that we know,
maybe not the smartest thing we can imagine,
but we get so much benefit out of talking to each other
and interacting.
That's presumably why you have conversations live with guests
is that there's something in that interaction that would not be exposed by, oh, I'll just write you a story and then
you can read it later.
And I think because these systems are just learning from our stories, they're not learning
from being pushed back at by us, that they're fundamentally limited into what they could actually
become on this route.
They have to get shut down. Like, they have to have an argument with us and lose a couple of times before they start to realize, oh, okay, wait, there's some nuance here that actually matters.
Yeah, that's actually subtle sounding, but quite profound that the interaction with humans
is essential. And the limitation within that is profound as well,
because the time scale, like the bandwidth at which you can really interact with humans is very low.
So it's costly.
So you can't. One of the underlying things about self-play is it has to do, you know, a very large number of interactions. And so you can't really deploy a reinforcement learning system into the real world to interact. Like, you couldn't deploy a language model into the real world to interact with humans, because it would just not get enough data relative to the cost it takes to interact. Like, the time of humans is expensive.
Which is really interesting. That takes us back to reinforcement learning, and trying to figure out if there are ways to make algorithms that are more efficient at learning, that keep the spirit of reinforcement learning and become more efficient. In some sense, this seems to be the goal. I'd love to hear what your thoughts are, I don't know if you got a chance to see it, on the blog post called "The Bitter Lesson."
Oh, yes.
By Rich Sutton, something that makes an argument, and hopefully I can summarize it. Perhaps you can.
Yeah.
Oh, good. Okay.
So, I mean, I could try, and you can correct me. He makes an argument that, if we look at the long arc of the history of the artificial intelligence field, across, you know, 70 years, the algorithms from which we've
seen the biggest improvements in practice are the very simple, like, dumb algorithms that
are able to leverage computation.
And you just wait for the computation to improve.
Like, all of us academics and so on have fun
by finding all the tricks and congratulating themselves
on those tricks and sometimes those tricks can be like
big, they feel in the moment like big spikes
and breakthroughs, but in reality over the decades,
it's still the same dumb algorithm
that just waits for the computer to get faster and faster.
Do you find that to be an interesting argument? That, I guess, the entirety of the field of machine learning as an academic discipline...
We're really just a subfield of computer architecture.
Yeah, we're just kind of waiting around
for them to do that.
We really don't want to do hardware work. So, like...
That's right, we're just kind of procrastinating.
Yes, that's right. Just waiting for them to do their job, so that we can pretend to have done ours.
Yeah, I mean, the argument reminds me a lot of, I think it was a Fred Jelinek quote, an early computational linguist, who said: we're building these computational linguistic systems, and every time we fire a linguist, performance goes up by 10 percent.
Something like that.
The idea of us building the knowledge in, in that case, he was finding it to be much less successful than getting rid of the people who know about language from a kind of scholastic, academic perspective, and replacing them with more compute. I think this is a modern version of that story, which is, okay, we want to do better on machine vision. You could build in all these motivated, part-based models that, you know, just feel like obviously the right thing that you have to have, or we can throw a lot of data at it, and guess what, we're doing better with a lot of data.
So I hadn't thought about it until this moment in this way, but what I believe, well, I've thought about what I believe, what I believe is that, you know, compositionality and, what's the right way to say it, the complexity grows rapidly as you consider more and more possibilities, like, explosively.
And so far, Moore's Law has also been growing
explosively, exponentially.
And so it really does seem like,
well, we don't have to think really hard about the algorithm design or the way that we build the systems, because the best benefit we could get from that is exponential, and the best benefit that we can get from waiting is also exponential, so we can just wait. That's got to end, right? And there are hints now that Moore's Law is starting to feel some friction, starting to, the world is pushing back a little bit.
One thing, I don't know if lots of people know this, but I didn't know it: I was trying to write an essay.
And yeah, Moore's Law has been amazing and it's enabled all sorts of things, but there's
also a kind of counter Moore's Law, which is that the development cost for each successive
generation of chips also is doubling.
So it's costing twice as much money.
So the amount of development money per cycle or whatever is actually sort of constant. And at some point, we run out of money. So, or we have to come
up with an entirely different way of doing the development process. So, like, I guess I'm always a bit skeptical of the "look, it's an exponential curve, therefore it has no end." Soon the number of people going to NeurIPS will be greater than the population of the Earth. That means we're going to discover life on other planets. No, it doesn't. It means that we're in a
sigmoid curve on the front half, which looks a lot like an exponential. The second half is going
to look a lot like diminishing returns. Yeah, I mean, but the interesting thing about Moore's law,
if you actually look at the technologies involved, it's hundreds, if not thousands, of S curves stacked on top of each other.
It's not actually an exponential curve.
It's constant breakthroughs.
And then what becomes useful to think about, which is exactly what you're saying, the
cost of development, like the size of teams, the amount of resources that are invested in
continuing to find new S curves, new breakthroughs.
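A quick numerical illustration of that point, with an arbitrary capacity and midpoint chosen for the example: the front half of a logistic curve tracks an exponential closely, and only near the ceiling do they part ways.

import math

for t in range(0, 11, 2):
    exponential = math.exp(t)
    logistic = 1000.0 / (1.0 + math.exp(-(t - 7)))  # capacity 1000, midpoint 7
    print(f"t={t:2d}  exp={exponential:10.1f}  logistic={logistic:7.1f}")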
Yeah, it's an interesting idea. If we live in the moment, if we sit here today,
it seems to be the reasonable thing to say that exponentials end.
And yet, in the software realm, they just keep appearing to be happening. And, I mean, it's so hard to disagree with Elon Musk on this, because, like, I used to be one of those folks, I'm still one of those folks, I studied autonomous vehicles, this is what I worked on. And it's like, you look at what Elon Musk is saying about autonomous vehicles: well, obviously, in a couple of years, or in a year, or next month, we'll have fully autonomous vehicles. There's no reason why we can't. Driving is pretty simple. It's just a learning problem, and you just need to convert all the driving that we're doing into data, and just have a neural network train on that data.
And, like, we use only our eyes, so you can use cameras and you can train on them. And it's like, yeah, that should work. And then you put on, like, the philosophical hat, and then you put on the pragmatic hat, and it's like, these are the flaws of computer vision, this is what it means to do this at scale. And then you put the human factors, the psychology hat on, which is like, driving is actually, the cognitive science or cognitive whatever the heck you call it, it's really hard. It's much harder to drive than we realize. There's a much larger number of edge cases. So building up an intuition around this,
around exponentials, is really difficult. And on top of that, the pandemic is making us think about exponentials, making us realize that we don't understand anything about them. We're not able to intuit exponentials. We're either ultra-terrified, some part of the population, and some part is, like, the opposite, whatever, carefree. And we're not managing it well.
I'd say blasé.
Blasé, wow, that's French, with an accent. So it's fascinating to think about what the limits of this exponential growth of technology are, not just Moore's Law, technology, and how that rubs up against the bitter lesson, and GPT-3, and self-play mechanisms.
It's not obvious. I used to be much more skeptical about neural networks. Now I at least give it a sliver of possibility that we'll all, that we'll be very much surprised, and also caught off guard in a way that, like, we are not prepared for.
Like in applications of social networks, for example,
because it feels like really good transformer models
that are able to do some kind of,
like very good natural language generation
are the same kind of models that could be used
to learn human behavior
and then manipulate that human behavior to gain advertiser dollars and all those kinds of things, in the capitalist system. And they arguably already are manipulating human behavior.
Yeah. Yeah. So, but not for self-preservation, which I think is a big, that would be a big step.
Like, if they were trying to manipulate us to convince us not to shut them off, I would
be very freaked out.
But I don't see a path to that from where we are now.
They don't have any of those abilities.
That's not what they're trying to do.
They're trying to keep people on the site.
But see the thing is, this is the thing about life on earth,
is they might be borrowing our consciousness
and sentience.
Like, so in a sense they do, because the creators of the algorithms, like, you know, if you look at our body, we're not a single organism, we're a huge number of organisms with tiny little motivations, built on top of each other. In the same sense, the AI algorithms are not, they're not alone...
It's a system that includes human companies and corporations.
Right?
Right?
Because corporations are funny organisms in and of themselves that really do seem to have
self-preservation built in.
And I think that's at the design level.
I think they're designed to have self-preservation be a focus.
So you're right.
In that broader system that we're also a part of and can have some influence on, it is
much more complicated, much more powerful.
Yeah, I agree with that.
So, people really love it when I ask: what three books, technical, philosophical, fiction, had a big impact on your life, that maybe you can recommend? We went with movies, we went with Billy Joel, and I forgot what music you recommended, but...
I didn't. I just said I have no taste in music. I just like pop music.
That was actually really skillful, the way you avoided that question.
I'm gonna try to do the same with the books.
So do you have a skillful way to avoid answering the question about three books you would recommend?
I'd like to tell you a story.
So my first job out of college was at Bellcore, I mentioned that before, where I worked with Dave Ackley. The head of the group was a guy named Tom Landauer. And I don't know how well he's known now, but arguably he's the inventor, and the first proselytizer, of word embeddings. So they developed a system, shortly before I got to the group,
called latent semantic analysis, that would take words of English and embed them in a multi-hundred-dimensional space, and then use that as a way of assessing similarity and basically doing, sorry, not reinforcement learning, information retrieval, sort of pre-Google information retrieval.
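For the curious, here is latent semantic analysis in miniature, with a toy term-document count matrix standing in for the real corpora; the heart of the method is just a truncated SVD:

import numpy as np

counts = np.array([
    [2, 0, 1, 0],   # "dog"
    [1, 0, 2, 0],   # "cat"
    [0, 3, 0, 1],   # "stock"
    [0, 1, 0, 2],   # "market"
], dtype=float)     # rows are words, columns are documents (toy numbers)

U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2                            # keep the top k components
word_vecs = U[:, :k] * S[:k]     # one k-dimensional embedding per word

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(word_vecs[0], word_vecs[1]))  # dog vs cat: high similarity
print(cosine(word_vecs[0], word_vecs[2]))  # dog vs stock: near zero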
He was trained as an anthropologist, but then became a cognitive scientist. I was in the cognitive science research group. Like I said, I'm a cognitive science groupie. At the time, I thought I'd become a cognitive scientist,
but then I realized in that group,
no, I'm a computer scientist,
but I'm a computer scientist who really loves
to hang out with cognitive scientists.
And he studied language acquisition in particular. He said humans have about this number of words of vocabulary, and most of that is learned from reading.
And I said, that can't be true, because I have a really big vocabulary, and I don't read.
He's like, you must.
I'm like, I don't think I do.
I mean, like, stop signs.
I definitely read stop signs.
But like, reading books is not a thing that I do.
Do you really, though? It might be just, maybe, the red color.
Do I read stop signs? No, it's just pattern recognition at this point. I don't sound it out.
I do. I wonder what that says... oh yeah, "stop".
So that's fascinating.
So you don't read much.
So I don't read very much. I mean, obviously I read, and I've read plenty of books.
But like some people like Charles, my friend Charles,
and others, like a lot of people in my field,
a lot of academics, like, reading was really central to them in their development.
And I'm not that guy.
In fact, I used to joke that when I got into college,
that it was on kind of a help-out-the-illiterate kind of program, because, like, in my house, I wasn't a particularly bad or good reader, but when I got to college,
I was surrounded by these people
that were just voracious in their reading appetite.
And they were like, have you read this, have you read this,
have you read this, and I'd be like,
no, I'm clearly not qualified to be at this school.
Like there's no way I should be here.
Now I've discovered books on tape, like audiobooks, and so I'm much better. I'm more caught up. I read a lot of books.
A small tangent on that: it is a fascinating open question to me, on the topic of driving, whether, you know, supervised learning people, machine learning people, think you have to, like, drive to learn how to drive. To me, it's very possible that just by us humans, first of all, walking, but also watching other people drive, not even being inside cars as a passenger, but even just being a pedestrian and crossing the road, you learn so much about driving from that. It's very possible that you can, without ever being inside of a car, be okay at driving once you get in it. Or, like, from watching a movie, for example, I don't know, something like that.
Have you taught anyone to drive?
No.
So I have, and I learned a lot about car driving, because my wife doesn't want to be the one in the car while they're learning, so that's my job. I sit in the passenger seat, and it's really scary. I have wishes to live, and they're figuring things out. Now, they start off very much better than I imagine
like a neural network would, right?
They get that they're seeing the world,
they get that there's a road that they're trying to be on,
they get that there's a relationship
between the angle of the steering,
but it takes a while to not be very jerky.
And so that happens pretty quickly.
Like the ability to stay in lane at speed,
that happens relatively fast.
It's not zero shot learning, but it's pretty fast.
The thing that's remarkably hard,
and this is, I think, partly why self-driving cars
are really hard, is the degree to which driving
is a social interaction activity.
And that blew me away.
I was completely unaware of it
until I watched my son learning to drive.
And I was realizing that he was sending signals to all the cars around him. And, in his case, he's always had social communication challenges, so he was sending very mixed, confusing signals to the
other cars. And that was causing the other cars to drive weirdly and erratically. And there was no question in my mind that he would have an accident because they didn't know how
to read him. There's things you do with the speed that you drive, the positioning of your car
that you're constantly like in the head of the other drivers and seeing him not knowing how to do
that, and having to be taught explicitly, okay, you have to be thinking about what the other driver is thinking, was a revelation to me.
I was stunned.
So, creating kind of theories of mind of the others.
Theories of mind of the other cars, yeah.
Yeah, yeah, which I just hadn't heard discussed
in the self-driving car talks that I've been to.
Since then, there's some people who do consider
those kinds of issues, but it's way more subtle
than I think.
There's a little bit of work involved with that
when you realize, like when you especially focus
not on other cars, but on pedestrians, for example,
it's literally staring you in the face.
Yeah, yeah, yeah.
So then, when you're just, like, how do I interact with pedestrians?
With pedestrians, you're practically talking to an octopus at that point. They've got all these weird degrees of freedom. You don't
know what they're going to do. They can turn around any second.
But the point is, we humans know what they're going to do. Like, we have a good theory of mind, we have a good mental model of what they're doing, and we have a good model of the model they have of us, and the model of the model of the model. Like, we're able to kind of reason about this kind of social game of it.
But the hope is that it's quite simple, actually, that it could be learned. I just talked to Waymo, I don't know if you know that company, it's Google's self-driving car effort. I talked to their CTO on this podcast. I rode in their car, and it's quite aggressive, and it's quite fast, and it's good. And, just like Tesla, Waymo made me change my mind about, like, maybe driving is easier than I thought. Maybe it's a speciesist argument.
Yes, I don't know.
But it's fascinating to think about, like, the same as with reading, which I think you just said, you avoided the question, but I still hope you answer it somewhat. You avoided it brilliantly. There are blind spots that artificial intelligence researchers have about what it actually takes to learn to solve a problem.
That's a fact.
You had Anca Dragan on, right?
Yeah.
Okay. She's one of my favorites. So much energy.
She's great.
Oh, she's amazing. Fantastic.
And in particular, she thinks a lot about this kind of I know that you know that I know
kind of planning. And the last
time I spoke with her, she was very articulate about the ways in which self-driving cars
are not solved, like what's still really, really hard.
But even her intuition is limited. We're all, like, new to this. So in some sense, the Elon Musk approach of being ultra-confident and just, like, plowing ahead, putting it out there. Like, some people say it's reckless and dangerous and so on,
but like partly it seems to be one of the only ways
to make progress in artificial intelligence.
So it's, you know, these are difficult things,
you know, democracy is messy, implementation
of artificial intelligence systems in the real world is messy.
So many years ago, before self-driving cars
were an actual thing you could have a discussion about,
somebody asked me, like, what if we could use
that robotic technology and use it to drive cars around?
Like, aren't people gonna be killed
and then it's, you know, blah, blah, blah.
And like, that's not what's gonna happen.
I said, with confidence, incorrectly, obviously.
What I think is gonna happen is we're gonna have a more, like, a very gradual kind of rollout where people have these cars in, like, closed communities, right, where
it's somewhat realistic, but it's still in a box, right, so that we can really get a
sense of what are the weird things that can happen?
How do we, how do we have to change the way we behave
around these vehicles?
Like it's obviously requires a kind of co-evolution
that you can't just plop them in and see what happens.
But of course, we're basically plopping them in and seeing what happens.
So I was wrong, but I do think
that would have been a better plan.
That's funny, but your intuition, just zooming out and looking at the forces of capitalism: it seems that capitalism rewards risk-takers, it rewards and punishes risk-takers. Like, try it out. The academic approach is, let's try a small thing and try to understand slowly the fundamentals of the problem, and let's start with one, and do two, and then see that, and then do three. The capitalist, like, startup entrepreneurial dream is, let's build a thousand, and, right, 500 of them fail, but whatever, the other 500, we learn from them.
But if you're good enough, I mean, one thing is, your intuition would say, like, that's gonna be hugely destructive to everything. But actually, it's kind of the forces of capitalism. Like, it's easy to be critical, but if you actually look at the data, at the way our world has progressed in terms of the quality of life, it seems like the competent, good people rise to the top. This is coming from me, from the Soviet Union, and so on. It's interesting that somebody like Elon Musk is the way you push progress in artificial intelligence: he's forcing Waymo to step their stuff up, and Waymo is forcing Elon Musk to step up. It's fascinating.
Because I have this tension in my heart, being upset by the lack of progress in autonomous vehicles within academia. There was huge progress in the early days of the DARPA challenges, and then it just kind of stopped. Like, at MIT, but it's true everywhere else, with an exception of a few places here and there, it's not seen as a sexy problem. Like, the moment artificial intelligence starts approaching the problems of the real world, academics are kind of like, eh, all right, the problems get really hard in a different way.
In a different way.
And that's right.
I think, yeah, right.
Some of us are not excited about that other way.
But I still think there's fundamental problems to be solved in those difficult things.
It's not, it's still publishable, I think.
We just need to, it's the same criticism you could have of all these conferences, NeurIPS, CVPR, where application papers are often as powerful and as important as theory papers, even though, like, theory just seems much more respectable and so on. I mean, the machine learning community is changing that a little bit, at least in statements, but it's still not seen as the sexiest of pursuits, which is, like: how do I actually make this thing work in practice, as opposed to on this toy dataset?
All that to say, are you still avoiding the three books question?
Is there something in audiobook form that you can recommend?
Oh, I've, yeah, I mean, I've read a lot of really fun stuff.
In terms of books that I find myself thinking back on that I read a while ago, like that
have stood the test of time to some degree, I find myself thinking a lot of Program or Be Programmed, by Douglas Rushkoff. It basically put out the premise that we all need to become programmers
in one form or another.
And it was in analogy to, once upon a time,
we all had to become readers.
We had to become literate.
And there was a time before that
when not everybody was literate,
but once literacy was possible,
the people who were literate had more of a say in society than the people who weren't.
And so we made a big effort to get everybody up to speed.
And now it's, it's not 100% universal, but it's quite widespread.
The assumption is generally that people can read.
The analogy that he makes is that programming is a similar kind of thing, that we need to have a say in, right? So being a reader, being literate, means that you can receive all this information, but you don't get to put it out there, and programming is the way that we get to put it out there.
And that was the argument that he made. I think he specifically has now backed away from this idea; he doesn't think it's happening quite this way. And that might be true, that society didn't sort of play forward quite that way. But I still believe in the premise. I still believe that at some point, the relationship that we have to these machines and these networks has to be one where each individual
has the wherewithal to make the machines help them do the things that that person wants done. And, you know, us software people, we know how to do that. If we have a problem, we're like, okay, I'll just hack up a Perl script or something and make it so.
If we lived in a world where everybody could do that, that would be a better world. And computers
would have, I think, less sway over us. And other people's software would
have less sway over us as a group.
Yeah, in some sense, software engineering, programming, is power.
Programming is power, right? Yeah, it's like magic. It's like magic spells. And it's not out of reach of everyone, but at the moment it's just a sliver of the population who can commune with machines in this way. So, I don't know, that book had a big impact on me. Currently, I'm reading The Alignment Problem, actually, by Brian Christian. I don't know if you've seen it out there yet.
Is this similar to Stuart Russell's work on the control problem?
It's in that same general neighborhood. I mean, they have different emphases that they're concentrating on. I think Stuart's book did a remarkably good job, just a celebratory good job, at describing AI technology and sort of how it works. I thought that was
great. It was really cool to see that in a book.
Yeah. I think he has some experience writing some books.
That's, you know, that's probably a possible thing. He's maybe thought a thing or two
about how to explain AI to people. Yeah, that's a really good point. This book so far has been
remarkably good at telling the story of the sort of the history, the recent history of some of
the things that have happened. I'm in the first third. He said, the book is in three thirds. The
first third is essentially
AI fairness and implications of AI on society that we're seeing right now.
And that's been great. I mean, he's telling those stories really well. He went out and talked to the frontline people whose names were associated with some of these ideas and it's been
terrific. He says, the second half of the book is on reinforcement learning. So maybe that'll be fun.
And then the third third is on the superintelligence alignment problem, and I suspect that part will be less fun for me to read.
Yeah, it's an interesting problem to talk about. I find it to be the most interesting, just like thinking about whether we live in a simulation: a thought experiment for thinking about our own existence. In the same way, talking about the alignment problem with AGI is a good way to think about these things. It's like the trolley problem with autonomous vehicles: a useless thing for engineering, but a nice little thought experiment for actually thinking about our own human ethical systems, our moral systems. By thinking about how we engineer these things, you start to understand yourself.
So sci-fi can be good at that too.
So one sci-fi book to recommend is Exhalation by Ted Chiang, a bunch of short stories. Ted Chiang is the guy who wrote the short story that became the movie Arrival. He was a computer scientist, actually; he studied at Brown. All of his stories have this sort of really insightful bit of science or computer science that drives them. And so it's just a romp, right? He creates these artificial worlds by extrapolating on ideas that we know about but hadn't really thought through to this kind of conclusion. His stuff is really fun to read, mind-warping.
So, I'm not sure if you're familiar, I seem to mention this every other word, but I'm from the Soviet Union and I'm Russian. I think somewhere in the conversation you mentioned that you pretty much don't like dying. I forget in which context; it might have been from a reinforcement learning perspective. I don't know.
Oh, you know what it was? It was in teaching my kids to drive. That's how you face your mortality.
Yes. From a human being's perspective, or from a reinforcement learning researcher's perspective, let me ask you the most absurd question: what do you think is the meaning of this whole thing? The meaning of life on this spinning rock.
I mean, I think reinforcement learning researchers maybe think about this from a science perspective more often than a lot of other people, right? As a supervised learning person, you're probably not thinking about the sweep of a lifetime, but reinforcement learning agents are having little lifetimes, weird little lifetimes, and it's hard not to project yourself into their world sometimes. But, you know, as far as the meaning of life: when I turned 42, you may know from, that's a book I read, the...
The Hitchhiker's Guide...
The Hitchhiker's Guide to the Galaxy.
That is the meaning of life.
So when I turned 42, I had a meaning of life party where I invited people over and everyone
shared their meaning of life.
We had slides made up, and so we all sat down and did a slide presentation to each other about the meaning of life.
And mine was balance.
I think that life is balance.
And so the activity at the party, for a 42-year-old, maybe this is a little bit non-standard, but I found all the little toys and devices that I had where you had to balance on them. You had to, like, stand on it and balance. I brought a pogo stick, a RipStik, which is like a weird two-wheeled skateboard. I got a unicycle, but I didn't know how to ride it. I now can do it.
I'd love to watch you try.
Yeah, I'll show you a video. I'm not great, but I manage.
And so balance, yeah.
So my wife has a really good one that she sticks to
and is probably pretty accurate.
And it has to do with healthy relationships
with people that you love and working hard for good causes.
But to me, yeah, balance. Balance, in a word; that works for me. Not too much of anything, because too much of anything is iffy.
That feels like a Rolling Stones song. I feel like there must be one. You can't always get what you want, but if you try sometimes, you can strike a balance.
Yeah, I think that's how it goes. Maybe close.
I'll write you a parody.
It's a huge honor to talk to you. This was really fun.
I've been a big fan of yours, so I can't wait to see what you do next in the world of education,
in the world of parody, in the world of reinforcement learning. Thanks for talking today.
My pleasure. Thank you for listening to this conversation with Michael Littman,
and thank you to our sponsors.
SimpliSafe, a home security company I use to monitor and protect my apartment; ExpressVPN, the VPN I've used for many years to protect my privacy on the internet; MasterClass, online courses that I enjoy from some of the most amazing humans in history; and BetterHelp, online therapy with a licensed professional.
Please check out these sponsors in the description to get a discount and to support this podcast.
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support it on Patreon, or connect with me on Twitter @lexfridman.
And now, let me leave you with some words from Groucho Marx.
If you're not having fun, you're doing something wrong.
Thank you.