Lex Fridman Podcast - #81 – Anca Dragan: Human-Robot Interaction and Reward Engineering
Episode Date: March 19, 2020

Anca Dragan is a professor at Berkeley, working on human-robot interaction -- algorithms that look beyond the robot's function in isolation, and generate robot behavior that accounts for interaction and coordination with human beings. Support this podcast by supporting the sponsors and using the special code: Download Cash App on the App Store or Google Play & use code "LexPodcast".

EPISODE LINKS:
Anca's Twitter: https://twitter.com/ancadianadragan
Anca's Website: https://people.eecs.berkeley.edu/~anca/

This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast, go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube, where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.

Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

OUTLINE:
00:00 - Introduction
02:26 - Interest in robotics
05:32 - Computer science
07:32 - Favorite robot
13:25 - How difficult is human-robot interaction?
32:01 - HRI application domains
34:24 - Optimizing the beliefs of humans
45:59 - Difficulty of driving when humans are involved
1:05:02 - Semi-autonomous driving
1:10:39 - How do we specify good rewards?
1:17:30 - Leaked information from human behavior
1:21:59 - Three laws of robotics
1:26:31 - Book recommendation
1:29:02 - If a doctor gave you 5 years to live...
1:32:48 - Small act of kindness
1:34:31 - Meaning of life
Transcript
The following is a conversation with Anca Dragan, a professor at Berkeley working on human-robot interaction:
Algorithms that look beyond the robot's function in isolation and generate robot behavior
that accounts for interaction and coordination with human beings.
She also consults at Waymo, the autonomous vehicle company, but in this conversation, she
is 100% wearing her Berkeley hat.
She's one of the most brilliant and fun roboticists in the world to talk with. I had a tough and crazy day leading up to this conversation, so I was a bit tired, even more so than usual, but almost immediately as she walked in, her energy, passion, and excitement for human-robot interaction was contagious, so I had a lot of fun and really enjoyed this conversation.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter, Lex Fridman, spelled F-R-I-D-M-A-N.
As usual, I'll do one or two minutes of ads now and never any ads in the middle that
can break the flow of the conversation.
I hope that works for you and doesn't hurt the listening experience.
This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for solving a hard problem that in the end provides
an easy interface that takes a step up to the next layer of abstraction over the stock market,
making trading more accessible for new investors and diversification much easier.
So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.
And now, here's my conversation with Anca Dragan. When did you first fall in love with robotics?
I think it was a very gradual process and it was somewhat accidental actually because
I first started getting into programming when I was a kid, and then into math, and then computer science was the thing I was going to do.
And then in college I got into AI.
And then I applied to the robotics institute at Carnegie Mellon.
And I was coming from this little school in Germany that nobody had heard of.
But I had spent an exchange semester at Carnegie Mellon.
So I had letters from Carnegie Mellon.
So that was the only, you know, MIT said no, Berkeley said no, Stanford said no.
That was the only place I got into.
So I went there to the robotics institute.
And I thought that robotics is a really cool way
to actually apply the stuff that I knew and love,
like optimization.
So that's how I got into robotics.
I have a better story of how I got into cars, which is, I used to do mostly manipulation in my PhD, but now I do kind of a bit of everything application-wise, including cars. And I got into cars because I was here in Berkeley, while I was still a PhD student, for RSS 2014. Pieter Abbeel organized it, and he arranged for, it was Google at the time, to give us rides in self-driving cars.
And I was in this car, and it was just making decision after decision, the right call, and it was so amazing. It was a whole different experience, right? I mean, manipulation is so hard, you can't do anything, and there it was.
Was it the most magical robot you've ever met? Like, for me, to meet the Google self-driving car for the first time was a transformative moment. I had two moments like that: that, and Spot Mini. I don't know if you've met Spot Mini from Boston Dynamics. I felt like I fell in love, or something like it, because I thought, I know how Spot Mini works, right? There's nothing truly special, it's great engineering work, but the anthropomorphism that went on in my brain, it came to life. It had a little arm and it looked at me. He, she, looked at me. You know, I don't know, there's a magical connection
there and it made me realize, wow robots can be so much more than things that manipulate
objects, they can be things that have a human connection.
Was the self-driving car a moment like that for you? Was there a robot that truly sort of inspired you?
I remember that experience very viscerally, riding in that car and being just wowed. They gave us a sticker that said "I rode in a self-driving car," and it had this cute little Firefly on it, the logo or something.
Oh, that was like the smaller one, the Firefly?
Yeah, the really cute one. And I put it on my laptop, and I had that for years until I finally changed my laptop out, and you know.
What about if we walk back, you mentioned
optimization, like what beautiful ideas inspired you in math, computer science early on? Like why
get into this field? Seems like a cold and boring field of math. Like, what was exciting to you about it?
The thing is I liked math from very early on, from fifth grade.
Is when I got into the Math Olympiad and all of that.
Oh, you competed too.
Yeah.
In Romania, it's like our national sport.
You got to understand.
So I got into that fairly early, and it was a little, maybe, too much just theory, you know. I didn't really have a goal. I always liked learning and understanding, which was cool, but there was no, what am I applying this understanding to?
And so I think that's how I got into,
more heavily into computer science,
because it was kind of math meets
something you can do tangibly in the world.
Do you remember the first program you ever wrote?
Okay, the first program I've written, I kind of do. It was in QBasic, in fourth grade.
Wow.
And it was drawing like a circle.
A graphic?
Yeah, it was drawing a circle. I don't know how to do that anymore, but in fourth grade, that's the first thing that they taught me. You could take, I wouldn't say it was an extracurricular, but it was in a sense an extracurricular.
So you could sign up for dance or music or programming.
And I did the programming thing and my mom was like, what?
Why?
Did you compete in programming? Like, these days in Romania, that's probably a big thing, there are programming competitions. Did that touch you at all?
I did a little bit of the computer science Olympiad, but not as seriously as I did the math Olympiad.
So it was programming?
Yeah, it's basically, here's a hard math problem, solve it with a computer, that's kind of the deal.
Okay, so it's more like algorithms.
Exactly, it's always algorithmic.
So, you kind of mentioned the Google self-driving car, but outside of that, who or what is your favorite robot, real or fictional, that captivated your imagination? I mean, I guess you kind of alluded to the Google self-driving car, the Firefly, that was a magical moment, but is there something else?
It wasn't the Firefly there, I think it was the Lexus, by the way, back then. But yeah, good question. Okay, my favorite fictional robot is WALL-E.
And I love how amazingly expressive it is.
I personally think a little bit about expressive motion, the kinds of things you're saying: you can do this, and it's a head, and it's a manipulator, and what does it all mean?
I like to think about that stuff.
I love Pixar, I love animation.
I love how it's-
WALL-E has two big eyes, I think, or no?
Yeah, it has these cameras, and they move.
So, yeah, it goes woo, and then it's super cute.
Yeah, it's, you know, the way it moves,
it's just so expressive, the timing of that motion,
what it's doing with its arms,
and what it's doing with these lenses is amazing.
And so I've really liked that from the start.
And then on top of that, sometimes I share this, it's a personal story I share with people when I teach about AI or whatnot. My husband proposed to me by building a WALL-E, and he actuated it. It had a bunch of degrees of freedom, including the lens thing, and it kind of came in, and he made it have, like, you know, the belly box opening thing. So it did that, and then it spewed out this box made out of Legos that opens slowly, and then, bam.
No.
Yeah.
It was quite, it set a bar.
That might be the most impressive thing I've ever heard.
Okay. So that was the special connection to WALL-E, long story short.
I like WALL-E because I like animation and I like robots, and I like, you know, the fact that we still have this robot to this day.
How hard is that problem, do you think, the expressivity of robots? With Boston Dynamics, I never talked to those folks about this particular element. I've talked to them a lot, but it seems to be almost an accidental side effect for them. I don't know if they're faking it; they weren't trying to.
Okay.
They do say that the gripper on it was not intended to be a face.
I don't know if that's an honest statement, but I think they're legitimate.
Probably.
But so do we automatically just anthropomorphize anything we can see about a robot? So the question is, how hard is it to create a WALL-E-type robot that connects so deeply with us humans?
What do you think?
It's really hard, right?
So it depends on what setting.
So if you wanna do it in this very particular narrow setting
where it does only one thing and it's expressive,
then you can get an animator. You can have Pixar on call, come in, design some trajectories. Anki had a robot called Cozmo, where they put in some of these animations.
That part is easy, right? The hard part is doing it not via these kind of handcrafted
behaviors, but doing it generally, autonomously.
I kind of want robots, and just to clarify, I used to work a lot on this, I don't work on it quite as much these days, but the notion of having robots that, when they pick something up and put it in a place, they can do that with various forms of style. Or you can say, well, this robot is succeeding at this task and is confident, versus it's hesitant, versus maybe it's happy or it's disappointed about something, some failure that it had.
I think that when robots move, they can communicate so much about internal states or perceived internal states that they have.
And I think that's really useful in an element that we'll want in the future because I was reading
this article about how kids are being rude to Alexa, because they can be rude to it and it doesn't really get angry, right? It doesn't reply in any way; it just says the same thing. So I think, at least for the correct development of children, you kind of want these things to react differently.
I also think that you walk in your home and you have a personal robot.
And if you're really pissed, presumably robot should behave slightly differently than when
you're super happy and excited.
But it's really hard because it's, I don't know, the way I would think about it and the
way I thought about it when it came to expressing goals or intentions for robots, it's, well,
what's really happening is that instead of doing robotics, where you have your state, your action space, and your reward function that you're trying to optimize.
Now you kind of have to expand the notion of state to include this human internal state.
What is the person actually perceiving?
What do they think about the robot's goals, or whatnot? And then you have to optimize in that system.
And so that means they have to understand how your motion,
your actions end up sort of influencing the observers,
kind of perception of you.
And it's very hard to write math about that.
Right.
So when you start to think about incorporating the human into the state model, and I apologize for the philosophical question, how complicated are human beings, do you think? Can they be reduced to a kind of object that moves and maybe has some basic intents, or do we have to model things like mood and general aggressiveness, all these kinds of human qualities, game-theoretic qualities? What's your sense? How complicated is it? How hard is the problem of human-robot interaction?
Yeah.
Should we talk about what the problem of human-robot interaction is?
Yeah, let's talk about what it is, and then talk about how hard it is. So,
and by the way, I'm going to talk about this very particular view of human robot
interaction, right, which is not so much on the social side or on the side of
how do you have a good conversation with the robot? What should the robot's
appearance be? It turns out that if you make robots taller versus shorter, this has an effect on how people act with them.
So I'm not talking about that.
But I'm talking about this very kind of narrow thing,
which is, you take a task that a robot can do in isolation, in a lab or out there in the world, but in isolation. And now you're asking, what does it mean for the robot to be able to do this task for what presumably its actual end goal is, which is to help some person.
That ends up changing the problem in two ways.
The first way it changes the problem is that the robot is no longer
the single agent acting.
You have humans who also take actions in that same space. You know, cars navigate around people; robots around the office navigate around the people in that office. If I send the robot over there to the cafeteria to get me a coffee, then there's probably other people reaching for stuff in the same space. And so now you have your robot, and you're in charge
of the actions that the robot is taking.
And you have these people who are also
making decisions and taking actions in that same space.
And even if the robot knows what it should do and all of that,
just co-existing with these people,
kind of getting the actions to gel well, to mesh well together,
that's sort of the kind of problem number one.
And then there's problem number two, which is,
goes back to this notion of,
if I'm a programmer, I can specify some objective
for the robot to go off and optimize,
and specify the task.
But if I put the robot in your home,
presumably you might have your own opinions
about, well, okay, I want my house clean, but how do I want it cleaned, and how should the robot move, how close to me should it come, and all of that. And so I think those are the two differences that you have. You're acting around people, and what you should be optimizing for should satisfy the preferences of that end user, not of the programmer who programmed you.
Yeah, and the preferences thing is tricky. So figuring out those preferences, being able to interactively adjust, to understand what the human wants. So really, it boils down to understanding humans in order to interact with them, in order to please them.
Right.
So why is this hard? Why is understanding humans hard?
So I think there are two tasks about understanding humans that in my mind are very, very similar, but not everyone agrees. There is the task of being able to just anticipate what people will do. We all know that cars need to do this, right? We all know that, well, if I navigate around some people,
the robot has to get some notion of, okay,
where is this person gonna be?
So that's kind of the prediction side.
And then there's what you are saying,
satisfying the preferences, right?
So adapting to the person's preference
is knowing what to optimize for,
which is more this inference side.
This, what does this person want?
What is their intent?
What are their preferences?
And to me, those kind of go together
because I think that at the very least,
if you can understand, if you can look at human behavior
and understand what it is that they want,
then that's sort of the key enabler to being able to
anticipate what they'll do in the future, because I think that, you know, we're not arbitrary,
we make these decisions that we make, we act in the way we do, because we're trying to achieve
certain things. And so I think that's the relationship between them. Now, how complicated do these models need to be in order to be able to understand what people want?
So we've gotten a long way in robotics with something called
Inverse Reinforcement Learning, which is the notion of someone acts, demonstrates what
how they want the thing done. What is inverse reinforcement learning? You briefly mentioned it.
Right. So it's the problem of taking human behavior and inferring a reward function from it.
So figure out what it is that that behavior is optimal with respect to.
And it's a great way to think about learning human preferences in the sense of, you know,
you have a car and the person can drive it and then you can say, well, okay, I can actually learn what the person is
optimizing for.
I can learn their driving style or you can have people
demonstrate how they want the house clean and then you can say,
okay, this is, I'm getting the trade-offs that they're making,
I'm getting the preferences that they want out of this.
And so we've been successful in robotics somewhat with this.
And it's based on a very simple model of human behavior, a remarkably simple one, which is that human behavior is optimal with respect to whatever it is that people want, right? So you make that assumption, and now you can kind of invert through it. That's why it's called inverse, well, really inverse optimal control, but also inverse reinforcement learning.
So this is based on utility maximization in economics. Back in the '40s, von Neumann and Morgenstern said, okay, people are making choices by maximizing utility. And then in the late '50s, we had Luce and Shepard come in and say, people are a little bit noisy and approximate in that process.
So they might choose something kind of stochastically
with probability proportional to how much utility something
has.
There's a bit of noise in there.
This has translated into robotics and something that we call
Boltzmann rationality.
So it's a kind of an evolution of
inverse reinforcement learning that accounts for human noise.
We've had some success with that too. For these tasks, it turns out people behave well enough that you can just do the vanilla version, or you can account for noise and still infer what they seem to want based on this. But now we're hitting tasks where that's not enough.
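For concreteness, here is a minimal sketch of the Boltzmann-rational observation model and the Bayesian flavor of inverse reward inference being described. The trajectories, feature values, candidate reward weights, and the rationality coefficient beta below are toy values invented for illustration, not anything from her actual work.

```python
import numpy as np

# Toy setup: each candidate trajectory is summarized by a feature vector
# (e.g., [speed, clearance from obstacles]); reward is linear in these
# features with unknown weights theta.
trajectories = np.array([
    [1.0, 0.2],   # fast but close to obstacles
    [0.5, 0.8],   # slower, keeps more distance
    [0.7, 0.5],   # in between
])
candidate_thetas = [np.array([1.0, 0.0]),   # cares only about speed
                    np.array([0.0, 1.0]),   # cares only about clearance
                    np.array([0.5, 0.5])]   # balanced
beta = 5.0  # rationality coefficient: higher means closer to perfectly optimal

def boltzmann_likelihood(traj_idx, theta):
    """P(trajectory | theta) proportional to exp(beta * reward), normalized."""
    rewards = trajectories @ theta
    logits = beta * rewards
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[traj_idx]

# Inverse reinforcement learning as Bayesian inference: observe a human
# demonstration (say they chose trajectory 1) and update a uniform prior
# over which reward weights best explain it.
demo = 1
prior = np.ones(len(candidate_thetas)) / len(candidate_thetas)
likelihoods = np.array([boltzmann_likelihood(demo, th) for th in candidate_thetas])
posterior = prior * likelihoods
posterior /= posterior.sum()
print(posterior)  # most mass lands on the clearance-loving theta
```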
What are examples of the basic tasks?
So imagine you're trying to control some robot that's fairly complex. Say you're trying to control a robot arm.
Because maybe you're a patient with a motor impairment,
and you have this wheelchair mounted arm,
and you're trying to control it around.
Or one task that we've looked at with Sergey and our students is a lunar lander. So I don't know if you know this Atari game, it's called Lunar Lander. It's really hard. People really suck at landing the thing; mostly they just crash it left and right.
Okay, so this is the kind of task we imagine. You're trying to provide some assistance to
a person operating such a robot where you want the kind of the autonomy to kick in. Figure
out what it is that you're trying to do and help you do it. It's really hard to do that for,
say, lunar lander because people are all over the place. And so they seem much more noisy than
really rational. That's an example of a task where these models are kind of failing us.
And it's not surprising, because, so we talked about the '40s, utility; the '50s, sort of noisy. Then the '70s came, and behavioral economics started being a thing, where people are like, no, no, no, people are not rational. People are messy and emotional and irrational and have all sorts of heuristics that might be domain-specific. They're just a mess.
So what does my robot do to understand what you want?
And that's why it's complicated.
For the most part, we get away with pretty simple models until we don't.
And then the question is, what do you do then? And I have days when I want to pack my bags and go home and change jobs, because it feels really daunting to make sense of human behavior enough that you can reliably understand what people want, especially as robot capabilities continue to get developed. You'll get these systems that are more and more capable of all sorts of things, and then you really want to make sure that you're telling them the right thing to do. What is that thing? Well, read it from human behavior.
So if I just sat here quietly and tried to understand you just by listening to you talk, it would be harder than if I got to say something, ask you questions, and interact. Can the robot help its understanding of the human by influencing the behavior, by actually acting?
Yeah, absolutely.
So one of the things that's been exciting to me lately is this notion that when you try to think of the robotics problem as, okay, I have
a robot and it needs to optimize for whatever it is that a person wants it to optimize, as
opposed to maybe what a programmer said, that problem, we think of as a human robot collaboration
problem in which both agents get to act, in
which the robot knows less than the human because the human actually has access to, you
know, at least implicitly, to what it is that they want.
They can't write it down, but they can talk about it, they can give all sorts of signals,
they can demonstrate.
But the robot doesn't need to sit there and passively observe human behavior
and try to make sense of it.
The robot can act too.
And so there's these information gathering actions
that the robot can take to sort of solicit responses
that are actually informative.
So for instance, this is not for the purpose
of assisting people, but with kind of back to coordinating
with people in cars and all of that.
One thing that Dorsa did was, so we were looking at cars being able to navigate around people.
And you might not know exactly the driving style of a particular individual that's next to you,
but you want to change lanes in front of them.
Navigating around other humans inside cars?
Yeah, good clarification question.
So you have an autonomous car and is trying to navigate
the road around human driven vehicles.
Similar things, ideas applied to pedestrians as well,
but let's just take human driven vehicles.
So now you're trying to change a lane.
Well, you could be trying to infer the
driving style of this person next to you. You'd like to know if they're in particular, if they're
sort of aggressive or defensive, if they're going to let you kind of go in or if they're going to
not. And it's very difficult, because if you want to hedge your bets and say, oh, maybe they're actually pretty aggressive, I shouldn't try this, you kind of end up driving next to them and driving next to them, right? And then you don't know, because you're not actually getting the observations you need. The way someone drives when they're next to you and they just need to go straight is kind of the same regardless of whether they're aggressive or defensive.
And so you need to enable the robot to reason about how it might actually be able to gather information by changing the actions that it's taking.
And then the robot comes up with these cool things
where it kind of nudges towards you
and then sees if you're gonna slow down or not.
And if you slow down, it sort of updates its model of you and says,
okay, you're more on the defensive side, so now I can actually like...
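A bare-bones sketch of the belief update behind that nudge-and-observe idea, assuming just two driver types and made-up response probabilities; the real work models continuous driving styles and plans over trajectories, so this only shows the core Bayesian step.

```python
import numpy as np

# Two hypotheses about the neighboring driver; start uncertain.
styles = ["aggressive", "defensive"]
belief = np.array([0.5, 0.5])

# Assumed observation model: probability the driver slows down when the
# autonomous car nudges toward their lane, for each style. (Invented numbers.)
p_slow_given_style = np.array([0.2, 0.9])

def update(belief, slowed_down):
    """Bayes update of the style belief after observing the response to a nudge."""
    likelihood = p_slow_given_style if slowed_down else 1.0 - p_slow_given_style
    posterior = belief * likelihood
    return posterior / posterior.sum()

# The nudge is informative precisely because the two styles respond differently;
# driving straight next to them would give nearly identical observations.
belief = update(belief, slowed_down=True)
print(dict(zip(styles, belief)))  # belief shifts toward "defensive"
```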
That's a fascinating dance. That's so cool that you can use your own actions to gather information.
That feels like a totally open, exciting new world of robotics. I mean, how many people are even thinking about that kind of thing?
A handful of us.
It's rare because it's actually leveraging humans. I mean, most roboticists, I've talked to a lot of colleagues and so on, and being honest, they're kind of afraid of humans.
Because they're messy and complicated, right? I understand.
Going back to what we were talking about earlier,
right now, we're kind of in this dilemma of, OK,
there are tasks that we can just assume people are
approximately rational for, and we can figure out what they want.
We can figure out their goals.
We can figure out their driving styles, whatever.
Cool.
There are these tasks that we can't.
So what do we do?
Do we pack our bags and go home?
And on this one, I've had a little bit of hope recently. And I'm kind of doubting myself, because what do I know that 50 years of behavioral economics hasn't figured out?
But maybe it's not really in contradiction with the way that field is headed. Basically, one thing that we've been thinking about is, instead of kind of giving up and saying people are too crazy and irrational for us to make sense of them, maybe we can give them a bit of the benefit of the doubt. Maybe we can think of them as actually being relatively rational, but just under different assumptions about the world, about how the world works. When we think about rationality, the implicit assumption is, oh, they're rational under all the same assumptions and constraints as the robot: this is the state of the world, that's what they know; this is the transition function, that's what they know; this is the horizon, that's what they know. But maybe the reason they can seem a little messy and hectic, especially to robots, is that perhaps they just make different assumptions or have different beliefs.
That's another fascinating idea, that this kind of anecdotal desire to say that humans are irrational, perhaps grounded in behavioral economics, is that we just don't understand the constraints and the rewards under which they operate. And so our goal shouldn't be to throw our hands up and say they're irrational; it should be to say, let's try to understand what the constraints are, what it is that they must be assuming that makes this behavior make sense.
A good life lesson, right?
Good life lesson, yes, true.
Even just outside of robotics, it's good for communicating with humans. It's a good assumption to make, sort of empathy, right? Maybe there's something you're missing.
And it especially happens with robots, because they're kind of dumb and they don't know things, and oftentimes people can seem super irrational when actually they know a lot of things that robots don't. Sometimes, like with the lunar lander, the robot knows much more. So it turns out that if you try to say, look, maybe people are operating this thing but assuming a much more simplified physics model, because they don't get the complexity of this kind of craft, or of a robot arm with seven degrees of freedom, with its inertia and whatever. So maybe they have this intuitive physics model, which is not the real physics. This notion of intuitive physics is something that's actually studied in cognitive science; Josh Tenenbaum and Tom Griffiths work on this stuff.
And what we found is that you can actually
try to figure out what physics model kind of best
explains human actions.
And then you can use that to sort of correct
what it is that they're commanding the craft to do.
So they might be sending the craft somewhere, but instead of executing that action, you can take a step back and say, if the world worked according to their intuitive physics model, where do they think the craft is going? Where are they trying to send it? And then you can use the real physics, the inverse of that, to figure out what you should actually do, so that the craft does that instead of going where their command would actually send it in the real world.
And I kid you not, it worked.
People land the damn thing in between the two flags
and all that.
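A rough sketch of the correction described here: take the person's command, figure out where they intended the craft to go under their simpler internal dynamics, then pick the action under the true dynamics that gets closest to that intended state. The one-dimensional "craft," the drag term, and the action grid are made up for illustration; the actual lunar lander setup is richer.

```python
import numpy as np

# Toy 1-D "craft": state = [position, velocity].
def true_dynamics(state, action, dt=0.1, drag=0.5):
    """Real physics: thrust plus a drag term the person may not account for."""
    pos, vel = state
    vel = vel + (action - drag * vel) * dt
    return np.array([pos + vel * dt, vel])

def intuitive_dynamics(state, action, dt=0.1):
    """Simpler model we assume the person is using: no drag."""
    pos, vel = state
    vel = vel + action * dt
    return np.array([pos + vel * dt, vel])

def corrected_action(state, human_action, candidate_actions):
    """Execute the action that, under true physics, best reaches the state the
    person intended under their intuitive model."""
    intended_next = intuitive_dynamics(state, human_action)
    errors = [np.linalg.norm(true_dynamics(state, a) - intended_next)
              for a in candidate_actions]
    return candidate_actions[int(np.argmin(errors))]

state = np.array([0.0, 2.0])
human_cmd = 1.0
actions = np.linspace(-3, 3, 61)
# Picks about 2.0 here, compensating for the drag the person's model ignores.
print(corrected_action(state, human_cmd, actions))
```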
So it's not conclusive in any way, but I'd say it's evidence that maybe we're kind of underestimating humans in some ways when we give up and say, yeah, they're just crazy and noisy.
When you explicitly try to model the kind of worldview that they have.
That's right. There are things in behavioral economics too that have, for instance, touched on the planning horizon. So there's this idea of bounded rationality, essentially, and the idea that maybe people work under computational constraints. And I think our view recently has been to take the Bellman update in AI and just break it in all sorts of ways. The state? No, no, no, the person doesn't get to see the real state; maybe they're estimating it somehow. The transition function? No, no, no.
Even the actual reward evaluation,
maybe they're still learning about what it is that they want.
Like, when you watch Netflix and you have all the things
and then you have to pick something,
imagine that the AI system interpreted that choice
as this is the thing you prefer to see.
How are you going to know? You're still trying to figure out what you like,
what you don't like, etc. So I think it's important to also account for that.
So it's not irrationality, because I was doing the right thing under the things that they know.
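One way to write down the "break the Bellman update" idea: keep the recursion, but let every ingredient be the human's estimate or belief rather than the ground truth. The hatted, H-subscripted quantities are my notation for this sketch, not hers.

```latex
% The human plans (approximately) optimally, but with respect to their own
% estimated state \hat{s}, believed transition model \hat{T}_H, believed
% reward \hat{R}_H, and discounting/horizon, rather than the true s, T, R:
V_H(\hat{s}) \;=\; \max_{a}\Big[\hat{R}_H(\hat{s},a)
  \;+\; \gamma \sum_{\hat{s}'} \hat{T}_H(\hat{s}'\mid \hat{s},a)\, V_H(\hat{s}')\Big]
```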
Yeah, that's brilliant. You mentioned recommender systems. We were talking about human-robot interaction, so what kinds of problem spaces are you thinking about? Is it robots with wheels, autonomous vehicles? Is it object manipulation? When you think about human-robot interaction in your mind, and maybe, I'm sure you can't speak for the entire community of human-robot interaction, but what are the problems of interest here? And does it...
I kind of think of open domain dialogue as human-robot interaction.
And that happens not in the physical space, but it could just happen in the virtual space.
So where's the boundaries of this field for you
when you're thinking about the things we've been talking about?
Yeah, so I try to find kind of underlying, I don't know what to even call them. I might call what I do working on the foundations of algorithmic human-robot interaction, and try to make contributions there. And it's important to me that whatever we do is actually somewhat domain-agnostic. Whether it's about autonomous cars, or quadrotors, or something else, the same underlying principles apply. Of course, when you're trying to get a particular domain to work, you usually have to do some extra work to adapt to that particular domain.
But these things that we were talking about around, well, you know, how do you model
humans?
It turns out that a lot of systems could benefit from a better understanding of how human behavior relates to what people want, and need to predict human behavior: physical robots of all sorts, and beyond that. And so I used to do manipulation, I used to be picking up stuff, and then I was picking up stuff with people around. And now it's very broad when it comes to the application level, but in a sense very focused on, okay, how does the problem need to change, how do the algorithms need to change, when we're not doing a robot by itself, you know, doing the dishes, but we're stepping outside of that.
A thought that popped into my head just now, on the game theory. I think you said this really interesting idea of using actions to gain more information. But if we think of it as game theory, the humans that are interacting with you, with you the robot, while I'm thinking that, I'm taking the identity of the robot.
Yeah, I do that all the time.
They also have a world model of you. And you can manipulate that.
I mean, if we look at autonomous vehicles,
people have a certain viewpoint.
You said with the kids,
people see Alexa in a certain way,
is there some value in trying to also optimize how people see you as a robot?
Is that a little too far away from the specifics of what we can solve right now?
So both, right? So it's really interesting. And we've seen a little bit of progress on this problem,
on pieces of this problem.
So again, it kind of comes down to how complicated the human model needs to be. But in one piece of work that we were looking at, we just said, okay, there are these parameters that are internal to the robot: what the robot is about to do, or maybe what objective or driving style the robot has, something like that.
And what we're gonna do is we're gonna set up a system
where part of the state is the person's belief
over those parameters.
And now when the robot acts, the person gets new evidence about this internal state of the robot, and so they're updating their mental model of the robot, right? So if they see a car that sort of cuts someone off, oh god, that's an aggressive car, they know more, right? And if they see sort of a robot head towards a particular door, they're like, oh yeah, the robot's trying to get to that door.
So this thing that we have to do with humans to try and understand their goals and intentions,
humans are inevitably going to do that to robots. And then that raises this interesting
question that you asked, which is, can we do something about that? This is going to happen
inevitably, but we can sort of be more confusing or less confusing to people. And it turns out
you can optimize for being more informative and less confusing.
If you have an understanding of how your actions are being interpreted by the human, how they're using these actions to update their belief. And honestly, all we did is just Bayes' rule. Basically, okay, the person has a belief; they see an action; they make some assumptions about how the robot generates its actions, presumably as being rational, because robots are rational, there's nothing noisy about them; and then they incorporate that new piece of evidence, in a Bayesian sense, into their belief and they obtain a posterior.
And now the robot is trying to figure out what actions to take such that it steers the
person's belief to put as much probability mass as possible on the correct parameters.
So that's kind of a mathematical formalization of that.
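A bare-bones version of that formalization, assuming a discrete set of robot actions and candidate goals, a human who models the robot as noisily rational, and a robot that picks the action putting the most posterior mass on its true goal. All numbers are invented for illustration.

```python
import numpy as np

# Robot's true internal parameter: which of two doors it is heading to.
goals = ["door_A", "door_B"]
true_goal = 0

# Candidate robot actions and how much progress each makes toward each goal
# (rows: actions, columns: goals). Made-up values.
progress = np.array([
    [1.0, 1.0],   # ambiguous: equally good for both doors
    [1.0, 0.2],   # clearly heads toward door A
    [0.3, 1.0],   # clearly heads toward door B
])
beta = 3.0
human_prior = np.array([0.5, 0.5])

def human_posterior(action_idx, prior):
    """The human assumes the robot acts noisily-rationally for its goal and
    applies Bayes' rule after seeing one action."""
    # P(action | goal) proportional to exp(beta * progress toward that goal)
    logits = beta * progress                       # shape (actions, goals)
    p_action_given_goal = np.exp(logits) / np.exp(logits).sum(axis=0)
    post = prior * p_action_given_goal[action_idx]
    return post / post.sum()

# The robot chooses the action that makes its true goal most apparent.
scores = [human_posterior(i, human_prior)[true_goal] for i in range(len(progress))]
best = int(np.argmax(scores))
print(best, scores)  # picks the unambiguous "heads toward door A" action
```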
But my worry, and I don't know if you want to go there with me, but I talk about this
quite a bit.
The kids talking to Alexa disrespectfully worries me. I worry in general about human nature. I guess I grew up in the Soviet Union, World War II, and I'm Jewish too, so with the Holocaust and everything. I just worry about how we humans sometimes treat the other, the group that we call the other, whatever it is. Through human history, the group we call the other has changed faces, but it seems like the robot will be the next other. And one thing is, it feels to me that robots don't get no respect. They get shoved around.
Shoved around.
And I wonder, even at a shallow level, for a better experience, it seems that robots need to talk back a little bit. My intuition says, I mean, most companies, from Roomba to autonomous vehicle companies, might not be so happy with the idea that a robot has a little bit of an attitude, but it feels to me that that's necessary to create a compelling experience. We humans don't seem to respect anything that doesn't give us some attitude, or like a mix of mystery and attitude and anger, something that threatens us subtly, maybe passive-aggressively. I don't know. It seems like we humans need that. Do you have thoughts on this?
I'll give you two thoughts on it.
Okay, sure.
One is, we respond to someone being assertive, but we also respond to someone being vulnerable. So my first thought is that robots get shoved around and bullied a lot because they're sort of tempting, and they're sort of showing off, or they appear to be showing off. And so I think, going back to these things we were talking about in the beginning, making robots a little more expressive, a little bit more like, "hey, that wasn't cool to do, and now I'm bummed," right?
I think that that can actually help,
because people can't help but anthropomorphize and respond to that.
Even though the emotion being communicated is not in any way a real thing, and people know that it's not a real thing, because they know it's just a machine.
We still interpret it. There's this famous psychology experiment with little triangles and dots on a screen, and a triangle is chasing the square, and you get really angry at the darn triangle, because why is it not leaving the square alone? So yeah, we can't help it. That was the first thought.
The vulnerability is really interesting. I tend to think of pushing back, being assertive, as the only mechanism of forming a connection, of getting respect, but perhaps vulnerability works too. Perhaps there are other mechanisms that are less threatening.
Yeah.
So, that was a little bit on the vulnerability side, yes. But then this other thing that we can think about goes back to what you were saying, that interaction is really game-theoretic. So the moment you're taking actions in a space,
the humans are taking actions in that same space, but you have your own objective, which is you're
a car, you need to get your passenger to the destination. And then the human nearby has their
own objective, which somewhat overlaps with you, but not entirely. You're not interested in getting into an accident
with each other, but you have different destinations
and you want to get home faster
and they want to get home faster.
And that's a general sum game at that point.
And so I think treating it as such is a way we can step outside of this mode where you try to anticipate what people do and don't realize you have any influence over it, while still protecting yourself, because you understand that people also understand that they can influence you. And it's just this back and forth, this negotiation, which is really talking about different equilibria of a game.
The very basic way to solve coordination is to just make predictions about what people
will do and then stay out of their way.
And that's hard for the reasons we talked about, which is that you have to understand people's intentions, implicitly, explicitly, who knows, but somehow you have to get enough of an understanding of that to be able to anticipate what happens next.
And so that's challenging, but then it's further challenged by the fact that people change
what they do based on what you do, because they don't plan an isolation either, right?
So when you see cars trying to merge on a highway and not succeeding, one of the reasons this can be is because they look at the traffic that keeps coming, they predict what those people are planning on doing, which is to just keep going, and then they stay out of the way, because there's no feasible plan, right? Any plan would actually intersect with one of these other people. So that's bad, so you get stuck there.
So now, if you start thinking about it as, no, no, no, actually, these people change what they do
depending on what the car does. If the car actually tries to kind of inch itself forward,
they might actually slow down and let the car in.
And now taking advantage of that, well, that's kind of the next level. We call this the underactuated system idea, where it's like an underactuated system in robotics: you influence these other degrees of freedom, but you don't get to decide what they do.
I've seen you somewhere describe the human element in this picture as underactuated. So, you know, underactuated robotics means that you can't fully control the system, you can't go in arbitrary directions in the configuration space under your control.
Yeah, it's a very simple version of underactuation, where basically there are literally these degrees of freedom that you can control, and these degrees of freedom that you can't control but you influence. And I think that's the important part: they don't just do whatever regardless of what you do; what you do influences what they end up doing.
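A compact way to write that underactuated view (my notation, a sketch rather than her formulation):

```latex
% Joint state x includes both the robot's and the human's degrees of freedom.
% The robot picks only its own control u_R; the human's control u_H is
% (a model of) a response that depends on the state and on u_R:
x_{t+1} = f(x_t,\, u_{R,t},\, u_{H,t}), \qquad u_{H,t} = \pi_H(x_t,\, u_{R,t})
```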
I just also like the poetry of calling human-robot interaction an underactuated robotics problem.
And you also mentioned sort of nudging.
It seems that, I don't know, I think about this a lot in the case of pedestrians. I've collected hundreds of hours of videos. I like to just watch pedestrians. I know it seems like a funny hobby.
Yeah, it's weird.
Because I learn a lot. I learn a lot about myself, about our human behavior, from watching pedestrians, watching people in their environment. Basically, crossing the street is like putting your life on the line, you know, tens of millions of times in America every day.
As people are just like playing this weird game of chicken
when they cross the street, especially when there is some ambiguity about the right of
way that has to do either with the rules of the road or with the general personality of
the intersection based on the time of day and so on.
And this nudging idea, you know, it seems that people don't even nudge. They just aggressively make a decision. There's a runner that gave me this advice. I sometimes run in the street, well, not in the street, on the sidewalk, and they said that if you don't make eye contact with people when you're running, they will all move out of your way.
It's called civil inattention.
Civil inattention. That's the thing.
Oh, wow, I need to look this up.
But it works.
What is that?
My sense was, if you communicate confidence in your actions, that you're unlikely to deviate from the action that you're taking, that's a really powerful signal to others that they need to plan around your actions, as opposed to nudging, where you're sort of hesitant. The hesitation might communicate that you're still in a dance, in the game, that they can influence with their own actions.
I've recently had a conversation with Jim Keller, who's sort of this legendary chip architect, but he also led the Autopilot team for a while. And his intuition is that driving is fundamentally still like a ballistics problem: you can ignore the human element, it's just about not hitting things, and you can kind of learn the right dynamics required to do the merging and all those kinds of things. And then my sense is, and I don't know if I can provide definitive proof of this, but my sense is that it's an order of magnitude or more more difficult when humans are involved. It's not simply an object collision avoidance problem. Of course, nobody knows the right answer here, but where does your intuition fall on the fundamental difficulty of the driving problem when humans are involved?
Yeah, good question. I have many opinions on this. Imagine downtown San Francisco.
Yeah, it's crazy busy, everything.
Okay, now take all the humans out. No pedestrians, no human-driven vehicles, no cyclists, no people on little electric scooters zipping around, nothing. I think we're done. I think driving at that point is done.
We're done?
There's nothing really that still needs to be solved about that.
Well, let's pause there. I think I agree with you, and I think a lot of people that hear this will agree with that, but we need to sort of internalize that idea. So what's the problem there? Because we may not be quite done with that yet, because a lot of people kind of focus on the perception problem. A lot of people kind of map autonomous driving onto how close we are to solving being able to detect, you know, the drivable area, the objects in the scene. Do you see that as, how hard is that problem? So your intuition there behind your statement was, we might not have solved it yet, but we're
close to solving basically the perception problem.
I think the perception problem, I mean, and by the way, a bunch of years ago this would not have been true, and a lot of issues in the space were coming from the fact that, oh, we don't really know what's where. But I think it's fairly safe to say that at this point, although you could always improve on things and all of that, you can drive through downtown San Francisco if there are no people around; there are no real perception issues standing in your way. And that's not to say perception isn't hard, but we've made a lot of progress on the perception systems, and I don't mean to undermine the difficulty of the problem.
I think everything about robotics is really difficult.
Of course, the planning problem, the control problem, all very difficult, but I think what
makes it really...
It's really...
Yeah.
It might be, well, I picked downtown San Francisco. It's adapting to, now it's snowing, now it's no longer snowing, now it's slippery in this way. The dynamics part I could imagine being still somewhat challenging.
But no, the thing that I think worries us, where our intuition is not good, is the perception problem at the edge cases. Downtown San Francisco, the nice thing, well, it may not actually be a good example, because you kind of know what you're getting into, other than, you know, crazy construction zones and all that.
Yeah, but the thing is you're traveling at slow speed, so it doesn't feel dangerous. To me, what feels dangerous is highway speeds, when everything is, to us humans, super clear.
Yeah.
I'm assuming LiDAR here, by the way. I think it's kind of irresponsible to not use LiDAR. That's just my personal opinion.
That's a pretty...
I mean, it depends, in this case, but I think, like, you know, if you have the opportunity to use LiDAR; in a lot of cases you might not.
Okay, that makes more sense now. So you wouldn't go with vision alone?
I really just don't know enough to say. Vision alone, what's, you know, how many cameras do you have? How are you using them? I don't know. There are all sorts of details. I imagine there's stuff that's really hard to actually see, that I don't know how you'd deal with, like exactly what you were saying, stuff that people would see that you don't. I think more of my intuition comes from systems that can actually use LiDAR as well.
Yeah, until we know for sure,
it makes sense to be using LiDAR.
That's kind of the safety focus.
But then, I also sympathize with the Elon Musk statement that LiDAR is a crutch. It's a fun notion to think that the things that work today are a crutch for the invention of the things that will work tomorrow, right? It's kind of true in the sense that, if we want to stick with what's comfortable, you see this in academic and research settings all the time, the things that work force you to not explore outside, to not think outside the box. I mean, that happens all the time. The problem is, in safety-critical systems, you kind of want to stick with the things that work. So it's an interesting and difficult trade-off in the case of real-world, safety-critical robotic systems. But just to clarify your intuition: how hard is this human element? How hard is driving when the human element is involved? Are we years, decades away from solving it? Perhaps, actually, the thing I'm asking, it doesn't matter what the timeline is, but how many breakthroughs are we away from solving the human interaction problem to get this right?
I think in a sense that really depends. We were talking about how, look, it's really hard because anticipating what people do is hard, and on top of that, playing the game is hard. But I think we sort of have some of the fundamental understanding for that. And you already see that these systems are being deployed in the real world, even driverless. I think there are now a few companies that don't have a driver in the car, in small areas.
I got a chance to, I went to Phoenix
and I shot a video with Waymo.
And I need to get that video out. People can cut me some slack, but there's incredible engineering work being done there. And it's one of those other seminal moments for me in my life, to be able to, it sounds silly, but to be able to ride without a driver in the seat. I mean, that was incredible robotics. I was driven by a robot, you know, without being able to take over, without being able to take the steering wheel. That's a magical moment. So in that regard, in those domains, at least for Waymo, they're solving that human element. I mean, they're going fast. It felt fast because you're freaking out at first, this being my first experience, but it's going like the speed limit, right, 30, 40, whatever it is. And there are humans, and it deals with them quite well. It detects them and then negotiates the intersections, left turns and all that.
The open question for me is like,
how quickly can we expand?
That's the, you know, outside of the weather conditions, all those kinds of things, how quickly can we expand? That's the, outside of the weather conditions,
all those kinds of things,
how quickly can we expand to cities like San Francisco?
Yeah, and I wouldn't say that it's just pure engineering now. I mean, by the way, I'm speaking kind of very generally here, hypothesizing, but I think that there are successes, and no one is everywhere out there, so that seems to suggest that things can be expanded and scaled, and we know how to do a lot of things, but there are still probably new algorithms or modified algorithms that you need to put in there as you learn more and more about the new challenges you get faced with.
How much of this problem do you think can be learned end-to-end, given the success of machine learning and reinforcement learning? How much of it can be learned from data, from scratch? Because most of the successful autonomous vehicle systems have a lot of heuristics and rule-based stuff on top, human expertise injected, forced into the system to make it work. What's your sense? How much will be the role of learning in the near term?
I think on the one hand that learning is inevitable here. I think on the other hand that when people characterize the problem as "it's a bunch of rules that some people wrote down" versus "it's an end-to-end learned system or imitation learning," then maybe there's something missing, or maybe it's more than that. So for instance, I think a very, very useful tool in this sort of problem, both in how to generate the car's behavior, and robot behavior in general, and in how to model human beings, is actually planning, search, optimization. Robotics is a sequential decision-making problem, and when a robot can figure out on its own how to achieve its goal without hitting stuff and all that, all the good stuff from motion planning 101, I think of that as very much AI. There's nothing rule-based around that, right? You're just searching through a space, optimizing through a space, and figuring out what seems to be the right thing to do. And I think it's hard to just do that, because you need to learn models of the world.
And I think it's hard to just do the learning part where you don't bother with any of that
because then you're saying, well, I could do imitation, but then when I go off distribution,
I'm really screwed.
Or you can say, I can do reinforcement learning, which
adds a lot of robustness, but then you have to do either reinforcement learning in the
real world, which sounds a little challenging, or that trial and error, you know. Or you
have to do reinforcement learning in simulation, and then that means, well, guess what? You
need to model things, at least model people, model the world enough that whatever policy you get out of that is actually fine to roll out in the world and do some additional learning there.
So do you think simulation, by the way, just a quick tangent, has a role in the human-robot interaction space? Like, is it useful? It seems like humans, from everything we've been talking about, are difficult to model and simulate. Do you think simulation has a role in this space?
I do.
I think so because you can take models
and train with them ahead of time, for instance.
You can.
But the models, sorry to interrupt, are the models sort of human-constructed or learned?
I think they have to be a combination, because if you get some human data and then you say, this is going to be my model of the person, whether for simulation and training or for deployment time, and that's what I'm planning with as my model of how people work...
Regardless, if you take some data and you don't assume anything else and you just say,
okay, this is some data that I've collected.
Let me fit a policy to how people work based on that.
What tends to happen is you collected some data in some distribution, and now your robot sort of computes a best response to that, right?
It's sort of like, what should I do if this is how people work?
And it easily goes off distribution, where that model that you've built of the human completely sucks, because out of distribution you have no idea, right? If you think of all the possible policies and then you take only the ones that are consistent with the human data that you've observed, that still leaves a lot of room: a lot of things could happen outside of that distribution, where you're not confident and you don't know what's going on.
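To make that failure mode concrete, here is a minimal sketch, assuming a made-up one-dimensional "driving" dataset and a simple curve fit standing in for imitation learning; none of the numbers or feature names come from the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "human driving" data: state = gap to the car ahead (m),
# action = braking intensity in [0, 1]. All of it comes from a narrow regime.
states_train = rng.uniform(10.0, 30.0, size=200)
actions_train = np.clip(1.0 - states_train / 30.0, 0.0, 1.0)   # brake harder at small gaps
actions_train += rng.normal(0.0, 0.02, size=actions_train.shape)

# "Imitation": fit a cubic polynomial mapping state -> action to the observed data.
policy = np.poly1d(np.polyfit(states_train, actions_train, deg=3))

# In-distribution query: behaves sensibly, close to what the humans did.
print("gap =  15.0 m ->", round(float(policy(15.0)), 3))

# Out-of-distribution queries: nothing in the data constrains the fit here,
# so the extrapolated "policy" is essentially arbitrary.
for gap in [2.0, 80.0, 200.0]:
    print(f"gap = {gap:5.1f} m ->", round(float(policy(gap)), 3))
```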
By the way, I've gotten used to this terminology of out of distribution, but it's such a machine learning terminology, because it kind of assumes — so distribution is referring to the data that you've seen?
The set of states that you encounter.
That you've encountered so far, at training time.
Yeah.
But it kind of also implies that there's a nice statistical model that represents that
data. So out of distribution feels like, I don't know,
it raises to me philosophical questions
of how we humans reason out of distribution.
Reason about things that are completely,
we haven't seen before.
And so, and what we're talking about here
is how do we reason about what other people do in,
you know, situations where we haven't seen them.
And somehow we just magically navigate that.
I, you know, I can anticipate what will happen in situations that are even novel in many
ways.
And I have a pretty good intuition for it — I don't always get it right, and I might be a little uncertain and so on.
And I think the thing is that if you just rely on data, there are just too many possibilities, too many policies out there that fit the data.
And by the way, it's not just state, it's really kind of history of state
because to really be able to have this be what the person will do, it kind of depends
on what they've been doing so far because that's the information you need to kind of, at least implicitly, sort of say,
oh, this is the kind of person that this is,
this is probably what they're trying to do.
So anyway, it's like you're trying to map
history of states to actions, there's many mappings.
And history, meaning like the last few seconds
or the last few minutes or the last few months.
Who knows?
Who knows how much you need, right? In terms of, if your state is really the positions of everything and the velocities, who knows how much history you need?
And then there's so many mappings.
And so now you're talking about how do you regularize that space, what priors do you impose, or what's the inductive bias — those are all very related things to think about.
Basically what are assumptions that we should be making?
Such that these models actually generalize outside of the data
that we've seen.
And now you're talking about, well, I don't know.
What can you assume?
Maybe you can assume that people actually have intentions,
and that's what drives their actions.
Maybe that's the right thing to do
when you haven't seen data
very nearby that tells you otherwise. I don't know. It's a very open question.
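One common way to encode the "people have intentions" assumption is a noisily rational (Boltzmann) model of behavior: the person is assumed to pick actions with probability proportional to the exponentiated value of the action under some goal, and observed actions then become evidence about which goal they are pursuing. Below is a toy sketch of that idea; the one-dimensional setup, candidate goals, and rationality coefficient are illustrative assumptions, not anything from the episode.

```python
import numpy as np

# Candidate intentions (goal positions on a line) and a uniform prior over them.
goals = np.array([0.0, 5.0, 10.0])
posterior = np.ones(len(goals)) / len(goals)
beta = 2.0  # rationality coefficient: higher means closer to perfectly rational

def action_likelihood(state, action, goal):
    """P(action | state, goal) under a Boltzmann (noisily rational) model.
    Actions are steps in {-1, 0, +1}; an action's value is how close it
    leaves the person to their goal."""
    actions = np.array([-1.0, 0.0, 1.0])
    values = -np.abs((state + actions) - goal)
    probs = np.exp(beta * values)
    probs /= probs.sum()
    return probs[np.argmin(np.abs(actions - action))]

# Observe the person starting at x = 3 and repeatedly stepping to the right.
state = 3.0
for action in [1.0, 1.0, 1.0]:
    likelihoods = np.array([action_likelihood(state, action, g) for g in goals])
    posterior *= likelihoods
    posterior /= posterior.sum()
    state += action

print({float(g): round(float(p), 3) for g, p in zip(goals, posterior)})
# The posterior concentrates on the goals to the right: assuming an intention
# lets the robot generalize from just a few observed actions.
```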
Do you think, sort of, that one of the dreams of artificial intelligence was to solve
common sense reasoning? Whatever the heck that means. Do you think something like common
sense reasoning has to be solved, in part, to be able to solve this dance of human interaction, in the driving space or in human-robot interaction in general?
You have to be able to reason about these kinds of common sense concepts of physics, of
all the things we've been talking about humans, I don't even know how to express
them with words, but the basics of human behavior, a fear of death. So like, to me, it's really
important to encode in some kind of sense, maybe not, maybe it's implicit, but it feels
that it's important to explicitly encode the fear of death that people don't want to die.
It seems silly, but the game of chicken that goes on with a pedestrian crossing the street is playing with the idea of mortality. We really don't want to die. It's not just a negative reward.
I don't know. It just feels like all these human concepts have to be encoded.
Do you share that sense, or is this a lot simpler than I'm making it out to be?
I think it might be simpler. And I'm usually a person who likes to complicate things, but I think it might be simpler than that.
Because it turns out, for instance, if you model people in the — I won't call it traditional, I don't know if it's fair to look at it as the traditional way — but if you model people as, okay, they're rational somehow, the utilitarian perspective, well, once you say that, you automatically capture that they have an incentive to keep on being.
Stuart likes to say, you can't fetch the coffee if you're dead.
Stuart Russell, by the way.
That's a good line.
So when you're sort of treating agents as having these objectives,
these incentives, humans or artificial,
you're kind of implicitly modeling that they'd like to stick around
so that they can accomplish those goals.
So I think, in a sense, maybe that's what draws me so much to the rationality framework: even though it's so broken, it's been such a useful perspective.
And like we were talking about earlier,
what's the alternative?
I give up and go home or, you know,
I just use complete black boxes,
but then I don't know what to assume out of distribution.
I come back to this.
It's just, it's been a very fruitful way to think about the problem
and a very more positive way, right?
These people aren't just crazy.
Maybe they make more sense than we think.
But I think we also have to somehow be ready for it to be wrong, be able to detect when these assumptions aren't holding, all of that stuff.
Let me ask about sort of another small side of this. We've been talking about the pure autonomous driving problem, but there are also relatively successful systems already deployed out there, in what you may call level two autonomy, or semi-autonomous vehicles — whether that's Tesla Autopilot, or — we work quite a bit with the Cadillac Super Cruise system, which has a driver-facing camera that detects your state. There's a bunch of basically lane-centering systems. What's your sense about this kind of way of dealing with the human-robot interaction problem — by having a really dumb robot and relying on the human to help the robot out, to keep them both alive?
From the research perspective, how difficult is that problem, and from a practical deployment perspective, is that a fruitful way to approach this human-robot interaction problem?
I think what we have to be careful about there is to not... it seems like some of these systems,
not all, are making this underlying assumption that if... so I'm a driver and I'm now really
not driving but supervising and my job is to intervene, right?
And so we have to be careful with this assumption that when I'm, if I'm supervising,
I will be just as safe as when I'm driving.
Like that I will, you know, if I, if I wouldn't get into some kind of accident, if I'm driving, I will be able to avoid that accident
when I'm supervising too.
And I think I'm concerned about this assumption
from a few perspectives.
So from a technical perspective,
it's that when you let something kind of take control
and do its thing, and it depends on what that thing is,
obviously, and how much it's taking control,
and what things you're trusting it to do. But if you let it do its thing and take control, it will go to what we might call off-policy, from the person's perspective — states that the person wouldn't actually find themselves in if they were the ones driving.
And the assumption that the person functions just as well there as they function in the
states that they would normally encounter is a little questionable.
Now another part is the kind of the human factor side of this, which is that I don't know about you, but I think I definitely feel like I'm experiencing things very differently when I'm actively engaged in the task, versus when I'm a passive observer. Like, even if I try to stay engaged, right, it's very different than when I'm actually
actively making decisions.
And you see this in life in general: students who are actively trying to come up with the answer learn better than when they're passively told the answer.
I think that's somewhat related.
And I think people have studied this in human factors for airplanes.
And I think it's actually fairly established that these two are not the same.
So on that point, because I've gotten a huge amount of heat on this, and I stand by it.
Okay.
Because I know the human factors community well, and the work here is really strong.
And there's many decades of work showing exactly what you're saying.
Nevertheless, I've been continuously surprised that many of the predictions of that work have been wrong in what I've seen.
So what we have to do, I still agree with everything you said, we have to be a little bit more
open-minded.
So I'll tell you, there are a few surprising things. Everything you said, to the word, is actually exactly correct. But what you didn't say is that these systems are unsafe — you said you can't assume a bunch of things. We don't know yet whether these systems are fundamentally unsafe. That's still unknown.
There's a lot of interesting things. I'm surprised by what seems to be the case, anecdotally, from a large data collection that we've done, but also from just talking to a lot of people: when in the supervisory role of semi-autonomous systems that are sufficiently dumb — at least, and that's the key element, the systems have to be dumb — people are actually more energized as observers. So they're actually better at observing the situation. So there might be cases, in systems where you get the interaction right, where you as a supervisor will do a better job with the system together.
I agree.
I think that is actually really possible.
I guess mainly I'm pointing out that if you do it naively, you're implicitly assuming something, and that assumption might actually really be wrong.
But I do think that if you explicitly think about what the agent should do such that the person still stays engaged — so that you essentially empower the person to be more, that's really the goal. You still have a driver, so you want to empower them to be so much better than they would be by themselves. And that's a very different mindset than "I want you to basically not drive, but be ready to sort of take over."
So one of the interesting things we've been talking about is rewards — they seem to be fundamental to the way robots behave. So, broadly speaking, we've been talking about utility functions. How do we approach the design of reward functions? Like, how do we come up with good reward functions?
Well, really good question, because the answer is, we don't. You know, I used to think, well, it's actually really hard to specify rewards for interaction
because, you know, it's really supposed to be what the people want, and, you know, we talked about how you have to customize what you want to do to the end user.
But I kind of realized that even if you take the interactive component away, it's still
really hard to design reward functions.
So what do I mean by that?
I mean, if we assume this sort of AI paradigm in which there's an agent and its job is to optimize some objective — some reward, utility, loss, whatever cost — if you write it out, maybe it's sort of dependent on the situation, or whatever it is.
If you write it out and then you deploy the agent, you'd wanna make sure that whatever you specified
incentivizes the behavior you want from the agent
in any situation that the agent will be faced with.
So I do motion planning on my robot arm. I specify some cost function, like: this is how much it matters to stay away from people, and this is how much it matters to be efficient, and blah, blah, blah, right?
I need to make sure that whatever I specify — those constraints or trade-offs or whatever they are — when the robot goes and solves that problem in every new situation, that behavior is the behavior that I want to see.
And what I've been finding is that we have no idea how to do that.
Basically, what I can do is I can sample, I can think of some situations that I think
are representative of what the robot will face.
And I can tune, and add, and tune some reward function until the optimal behavior is what I want in those situations. Which, first of all, is super frustrating, because through the miracle of AI we don't have to specify rules for behavior anymore — like we were saying before, the robot comes up with the right thing to do: you plug in a situation, it optimizes; you plug in another situation, it optimizes. But you still have to spend a lot of time actually defining what that criterion should be, making sure you didn't forget about fifty bazillion things that are important, and how they should all combine to tell the robot what's good and what's bad, and how good and how bad.
And so I think this is a lesson that, I don't know, I guess I closed my eyes to for a while, because I've been tuning cost functions for ten years now. But it really strikes me that, yeah, we've moved the tuning and the designing of features or whatever from the behavior side into the reward side.
And yes, I agree that there's way less of it, but it still seems really hard to anticipate
any possible situation and make sure you specify a reward function
that when optimized will work well in every possible situation.
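As a concrete picture of what that tuning looks like, here is a minimal sketch in which trajectories are scored by a weighted sum of hand-designed features and the designer's whole job is choosing the weights; the features, weights, and toy trajectories are my own illustrative assumptions, not anything from her lab's actual planners.

```python
import numpy as np

def features(traj, person_pos):
    """Hand-designed features of a trajectory (a sequence of 2-D waypoints)."""
    traj = np.asarray(traj, dtype=float)
    path_length = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))   # efficiency
    min_clearance = np.min(np.linalg.norm(traj - person_pos, axis=1))     # safety margin
    return np.array([path_length, -min_clearance])   # lower is better for both entries

# The designer's actual job, as described above: picking these trade-offs.
weights = np.array([1.0, 5.0])   # efficiency vs. staying away from people

def cost(traj, person_pos):
    return float(weights @ features(traj, person_pos))

# Two candidate trajectories past a person standing at (1, 0.2).
person = np.array([1.0, 0.2])
direct = [(0, 0), (1, 0), (2, 0)]        # short, but passes very close
detour = [(0, 0), (1, -0.8), (2, 0)]     # longer, but keeps clearance

print("direct:", round(cost(direct, person), 3))
print("detour:", round(cost(detour, person), 3))
# With these weights the detour scores lower, so a planner would pick it.
# But nothing guarantees the same weights produce behavior we like in
# situations we never thought to check, which is exactly the worry here.
```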
So you're kind of referring to unintended consequences, or just in general any kind of suboptimal behavior that emerges outside of the things you specified — out of distribution.
Suboptimal behavior that is actually optimal — I mean, I guess that's the idea of unintended consequences. It's optimal with respect to what you specified, but it's not what you want, and there's a difference between those.
But that's not fundamentally a robotics problem, right?
That's a human problem.
So like, that's the thing.
Yeah.
Right. So there's this thing called Goodhart's law, which is: you set a metric for an organization, and the moment it becomes a target that people actually optimize for, it's no longer a good metric.
Oh, what's it called? Goodhart's law?
Goodhart's law. So the moment you specify a metric, it stops doing its job.
Yeah, it stops doing its job.
So yeah, there's such a thing as over-optimizing for things, and failing to think ahead of time of all the possible things that might be important. So that's interesting, because historically I've worked a lot on reward learning from the perspective of customizing to the end user, but it really seems like it's not just the interaction with the end user that's a problem of the human and the robot collaborating so that the robot can do what the human wants — right, this kind of back and forth, the robot probing, the person being informative, all of that stuff — it might actually be just as applicable to this kind of maybe new form of human-robot interaction, which is the interaction between the robot and the expert programmer, the roboticist, the designer in charge of actually specifying what the heck the robot should do, specifying the task for the robot.
That's so cool, like collaborating on the reward design.
Right, collaborating on the reward design.
And so what does it mean, right? What does it mean when we think about the problem not as "someone specifies everything and your job is to optimize it," and we start thinking about it as an interaction, a collaboration? The first thing that comes up is that when the person specifies a reward, it's not, you know, gospel; it's not the letter of the law. It's not the definition of the reward function you should be optimizing, because they're doing their best, but they're not some magic, perfect oracle. And the sooner we start understanding that, I think the sooner we'll get to more robust robots that function better in different situations.
And then you have to kind of say, okay, well, it's almost like robots are over-learning — they're putting too much weight on the reward specified by definition, and maybe leaving a lot of other information on the table. Like, what other things could we do to actually communicate to the robot what we want it to do, besides attempting to specify a reward function?
Yeah, you have this awesome — again, I love the poetry of it — notion of leaked information. You mentioned that humans leak information about what they want, you know, leak reward signal for the robot. So how do we detect these leaks?
Like that, yeah. What are these leaks?
I don't know, I recently saw it, read it somewhere from you, and it's going to stick with me for a while for some reason, because it's not explicitly expressed — it kind of leaks indirectly from our behavior.
We do, yeah, absolutely.
So I think maybe some surprising bits, right?
So we were talking before about our robot arm.
It needs to move around people, carry stuff, put stuff away, all of that.
And now imagine that the robot has some initial objective that the programmer gave it, so it can do all these things functionally; it's capable of doing that.
Now, I noticed that it's doing something and maybe it's coming too close to me.
Maybe I'm the designer, maybe I'm the end user, and this robot is now in my home.
I push it away, because it's a reaction to what the robot is currently doing.
And this is what we call physical human robot interaction.
And now there's a lot of interesting work on how do you respond to physical human robot
interaction?
What should the robot do if such an event occurs?
And there's different schools of thought.
Well, you can sort of treat it the control-theoretic way and say, this is a disturbance that you must reject. Or you can treat it more heuristically and say, I'm going to go into some gravity-compensation mode so that I'm easily maneuverable, I'm going to go in the direction that the person pushed me. And to us, part of the realization has been that that is a signal that communicates about the reward,
because if my robot was moving in an optimal way and I intervened, that means that I disagree with its notion of optimality — whatever it thinks is optimal is not actually optimal. And optimization problems aside, that means that the cost function, the reward function, is incorrect; at least, it's not what I want it to be.
How difficult is that signal to interpret and make actionable?
So like, because this connects to our autonomous vehicle discussion, whether in the semi-autonomous
vehicle or autonomous vehicle, when the safety driver disengages the car, a car or a car or a car or a car or a car.
But they could have disengaged it for a million reasons.
Yeah.
So that's true.
Again, it comes back to: can you structure, a little bit, your assumptions about how human behavior relates to what they want? And one thing we've done is literally just treat this external torque that they apply as follows: when you take that and you add it to the torque the robot was already applying, that overall action is probably relatively optimal with respect to whatever it is that the person wants. And then that gives you information about what it is that they want. So you can learn, for instance, that people want you to stay further away from them.
them. Now, you're right that there might be many things
that explain just that one signal
that you might need much more data than that
for the person to be able to shape your reward function
over time.
You can also do this information-gathering stuff that we were talking about. And I don't know that we've done that in this context, just to clarify, but it's definitely something we've thought about, where you can have the robot start acting in a way that — if there are a bunch of different explanations, right — it moves in a way where it sees whether you correct it in some other way or not, and then it actually plans its motion so that it can disambiguate and collect information about what you want.
Anyway, so that's one form of leaked information. Maybe an even more subtle form of leaked information is if I just press the e-stop, right?
I'm doing it out of panic,
because the robot is about to do something bad.
There's again information there, right?
Okay, the robot should definitely stop,
but it should also figure out
that whatever it was about to do was not good.
And in fact, it was so not good that stopping and remaining stopped for a while was a better trajectory for it than whatever it was about to do.
And that again is information about what are my preferences? What do I want?
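The "push as evidence about the reward" idea she describes can be sketched very roughly as: compare the features of the trajectory the robot was executing with the features of the trajectory implied by the human's correction, and shift the reward weights toward the correction. The sketch below is a toy, single-update version under my own assumptions (the feature definitions, weights, and step size are made up); it is not the actual algorithm from her group's papers.

```python
import numpy as np

def traj_features(traj, person_pos):
    """Features the reward is defined over: path length and closeness to the person."""
    traj = np.asarray(traj, dtype=float)
    length = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    closeness = np.mean(np.exp(-np.linalg.norm(traj - person_pos, axis=1)))
    return np.array([length, closeness])

# The robot's current guess of the reward weights (reward = weights . features):
# it thinks proximity to the person barely matters.
weights = np.array([-1.0, -0.1])
person = np.array([1.0, 0.0])

planned   = [(0, 0), (1, 0.1), (2, 0)]   # the trajectory the robot was executing
corrected = [(0, 0), (1, 0.9), (2, 0)]   # roughly where the human's push deflected it

# Interpret the push as "the corrected trajectory should score at least as well
# as the planned one": nudge the weights along the feature difference
# (a single online, gradient-style update with a made-up step size).
alpha = 0.5
weights = weights + alpha * (traj_features(corrected, person) - traj_features(planned, person))
print("updated weights:", weights.round(3))
# The weight on closeness-to-person becomes more negative: the robot infers
# that staying near people is worse than it previously assumed.
```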
Speaking of e-stops, what are your expert opinions on the three laws of robotics from Isaac Asimov: don't harm humans, obey orders, protect yourself? I mean, it's such a silly notion, but I speak to so many people these days,
just regular folks, just I don't know, my parents and so on about robotics.
And they kind of operate in that space of, you know, imagining our future with robots and
thinking, what are the ethics, how do we get that dance right?
I know the three laws might be a silly notion, but do you think about what universal reward functions there might be that we should enforce on the robots of the future? Or is that a little too far out? Or is it the mechanism that you just described — that it shouldn't be three laws, it should be a constantly adjusting kind of thing?
I think it should be a constantly adjusting kind of thing. The issue with the laws is, I don't even — you know, they're words, and I have to write math, I have to translate them into math. What does it mean to...? What does harm mean? What does...
Obey?
Obey — it's not math.
Yeah. Right?
Because we just talked about how you try to say what you want, but you don't always get
it right, and you want these machines to do what you want, not necessarily exactly what
you literally said.
You don't want them to take you literally.
You want them to take what you say and interpret it in context.
And that's what we do with the specified rewards.
We don't take them literally anymore from the designer.
We — not we as a community, we as in some members of my group and some of our collaborators, like Pieter Abbeel and Stuart Russell — what we've said is: okay, the designer specified this thing, but I'm going to interpret it not as "this is the universal reward function that I shall always optimize, always and forever," but as "this is good evidence about what the person wants, and I should interpret that evidence in the context of the situations it was specified for."
Because ultimately, that's what the designer
thought about.
That's what they had in mind.
And really, the specified reward function that works in all these situations is kind of telling me that whatever behavior it incentivizes must be good behavior with respect to the thing that I should actually be optimizing for.
And so now the robot has uncertainty about what its reward function is. And then there are all these additional signals we've been finding that it can continually learn from, and use to adapt its understanding of what people want: every time the person corrects it, maybe they demonstrate, maybe they hit the e-stop — hopefully not, right?
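The "specified reward as evidence, not gospel" idea (inverse reward design, from the collaboration she mentions) can be sketched as a Bayesian update: how likely is it that the designer would have written down this proxy reward, given each candidate true reward, judging only by how the proxy performs in the training environments? Below is a heavily simplified toy version with made-up features and an unnormalized likelihood; it illustrates the flavor of the idea, not the published algorithm.

```python
import numpy as np

# Trajectory features: [time_taken, lava_crossed]. In the TRAINING environment
# there is no lava, so the designer's proxy reward never bothers to penalize it.
candidate_true_w = [np.array([-1.0,   0.0]),    # true reward: only time matters
                    np.array([-1.0, -10.0])]    # true reward: time matters, lava is terrible
prior = np.array([0.5, 0.5])
proxy_w = np.array([-1.0, 0.0])                 # what the designer actually wrote down
beta = 1.0

# Trajectories available in the training environment (no lava anywhere),
# described by their feature vectors.
train_trajs = np.array([[3.0, 0.0],
                        [5.0, 0.0]])

def best_traj(w, trajs):
    """The trajectory a planner would pick when optimizing reward weights w."""
    return trajs[np.argmax(trajs @ w)]

# Likelihood of the proxy: the designer chose proxy_w because optimizing it in
# the training environment produced good behavior under the TRUE reward.
likelihood = np.array([
    np.exp(beta * float(best_traj(proxy_w, train_trajs) @ w_true))
    for w_true in candidate_true_w
])
posterior = prior * likelihood
posterior /= posterior.sum()
print("posterior over candidate true rewards:", posterior.round(3))
# Both candidates explain the proxy equally well (there was no lava at training
# time), so the robot should stay uncertain about lava -- and be cautious about
# it at test time rather than treating the proxy as gospel.
```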
One really, really crazy one is the environment itself.
Like our world — you observe our world and the state of it, and it's not that you're seeing behavior and saying people are making decisions that are rational, blah, blah, blah. But our world is something that we've been acting in, according to our preferences.
So I have this example where like the robot walks into my home and my shoes are laid down
on the floor kind of in a line, right?
It took effort to do that.
So even though the robot doesn't see me doing this, you know, actually aligning the shoes,
it should still be able to figure out
that I want the shoes aligned,
because there's no way for them to have magically, you know, arranged themselves in that way. Someone must have actually taken the time to do that. So it must be important.
So the environment actually tells us — the environment...
Leaks information.
Leaks information.
I mean, the environment is the way it is
because humans somehow manipulated it.
So you have to kind of reverse engineer
the narrative that happened to create the environments
it is and that leaks the preference information.
Yeah, and you have to be careful, right?
Because people don't have the bandwidth to do everything. So just because, you know, my house is messy doesn't mean that I want it to be messy, right? It's just that I didn't put the effort into that; I put the effort into something else.
So the robot should figure out, well, that something else was more important. It doesn't mean that, you know, the house being messy is what I want. So it's a little subtle, but yeah, we really think of the state itself as kind of a choice that people implicitly made about how they want their world to be.
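The shoes example corresponds roughly to a likelihood-ratio argument: the observed state is much more probable if someone put effort into it than if nobody cared, so the state alone is soft evidence about preferences. Here is a toy sketch with entirely made-up numbers (the "tidiness" score and the two likelihood models are my own illustration):

```python
# A toy Bayesian reading of the "shoes in a line" example: how likely is the
# observed state if nobody cared about tidiness, versus if someone spent effort?
observed_tidiness = 0.95          # made-up score of the shoe arrangement, in [0, 1]
prior_cares = 0.5                 # before seeing the room, 50/50 on whether they care

def likelihood(tidiness, cares):
    # If someone cares, tidy states are much more probable (density 3 * t^2 on [0, 1]);
    # if nobody cares, arrangements are roughly uniform in tidiness.
    return 3.0 * tidiness ** 2 if cares else 1.0

num = prior_cares * likelihood(observed_tidiness, True)
den = num + (1 - prior_cares) * likelihood(observed_tidiness, False)
print("P(person cares about aligned shoes | state) =", round(num / den, 3))
# ~0.73: the state alone, without ever watching the person act, already leaks
# preference information -- but it's soft evidence, not proof, which matches
# the bandwidth caveat above ("messy" does not imply "wants messy").
```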
What book or books — technical, fiction, or philosophical — had a big impact on you when you look back at your life? Maybe it was a turning point, maybe it was inspiring in some way. Maybe we're talking about some silly book that nobody in their right mind would want to read. Or maybe it's a book that you would recommend to others to read. Or maybe those could be two different recommendations, of books that could be useful for people on their journey.
It's kind of a personal story. When I was in 12th grade, I got my hands on a PDF copy, in Romania, of Russell and Norvig's AI: A Modern Approach. I didn't know anything about AI at that point — I was in school, you know; I had watched the movie The Matrix, and that was my exposure. And so I started going through this thing.
And you know, you were asking in the beginning what I find interesting — you know, it's math and it's algorithms.
It was so captivating.
This notion that you could just have a goal
and figure out your way through a kind of a messy
complicated situation — what sequence of decisions you should make, autonomously, to achieve that goal.
That was so cool.
I'm biased, but that's a cool book.
Yeah, to take the process of intelligence and mechanize it.
I had the same experience.
I was really interested in psychiatry
and trying to understand human behavior.
And then AI: A Modern Approach is like, wait, you can just reduce it all to—
We can write math about human behavior, right?
Yeah, exactly — and I think that stuck with me.
Because a lot of what I do,
a lot of what we do in my lab is write math about human behavior combined with data
and learning, put it all together, give it to robots to plan with, and hope that instead
of writing rules for the robots, writing heuristics, designing behavior, they can actually autonomously
come up with the right thing to do around people.
That's our signature move.
We wrote some math, and then instead of kind of hand-crafting this and that, the robot figured stuff out — and isn't that cool? And I think that's the same enthusiasm I got from "the robot figured out how to reach that goal in that graph — isn't that cool?"
So, apologies for the romanticized questions and the silly ones. If a doctor gave you five years to live, sort of emphasizing the finiteness of our existence, what would you try to accomplish?
It's like my biggest nightmare, by the way. I really like living.
Actually, I really don't like the idea of being told that I'm going to die.
Well, sorry to dwell on that for a second.
Do you — I mean, do you meditate or ponder on your mortality, on the fact that this thing ends? It seems to be a fundamental feature. Do you think of it as a feature or a bug?
You said you don't like the idea of dying, but if I were to give you a choice of living forever — like, you're not allowed to die?
Yeah — I'd say I'd want to live forever, but I watched this show, it's very silly, it's called The Good Place, and they reflect a lot on this. And, you know, the moral of the story is that you have to make the afterlife be finite too, because otherwise people just kind of — it's like WALL-E, it's like, yeah, whatever. So I think the finiteness helps. But yeah, I'm not a religious person, I don't think that there's something after,
and so I think it just ends and you stop existing. And I really like existing. It's such a great privilege to exist
that, yeah, I think that's the scary part. I still think that we like existing so much because it ends.
And that's so sad. It's so sad to me every time. I find almost everything about this life beautiful.
Like, the silliest, most mundane things are just beautiful.
I think I'm cognizant of the fact that I find it beautiful because it ends.
It's so, I don't know, I don't know how to feel about that.
I also feel like there's a lesson in there for robotics and AI: the finiteness of things seems to be a fundamental part of human existence. I think some people accuse me of just being Russian and melancholic and romantic or something, but that seems to be a fundamental part of our existence that should be incorporated in our reward functions. But anyway, speaking of reward functions, if you only had five years, what would you try to accomplish?
This is the thing: I'm thinking about this question, and it's a pretty joyous moment, because I don't know that I would change my life.
I'm trying to make some contributions
to how we understand human AI interaction.
I don't think I would change that.
Maybe I'll take more trips to the Caribbean or something, but I try to do that already from time to time.
So yeah, I mean, I try to do the things that bring me joy.
And thinking about these things brings me joy — it's the Marie Kondo thing: don't do stuff that doesn't spark joy.
For the most part, I do things that spark joy.
Maybe I'll do less service in the department or something.
I mean, like that's hard, I'm not dealing with admissions
anymore. But no, I mean, I think I have amazing colleagues and amazing students
and amazing family and friends and kind of spending time and some balance with all of
them is what I do. And that's what I'm doing already. So I don't know that I would really
change anything.
So in the spirit of positiveness, what small act of kindness, if one pops to mind, were you once shown that you will never forget?
When I was in high school, my friends, my classmates did some tutoring. We were gearing up for our baccalaureate exam, and they did some tutoring — some on math, some on whatever.
I was comfortable enough with some of those subjects, but, uh, physics was
something that I hadn't focused on in a while.
And so, um, they were all working with this one teacher.
And I started working with that teacher — her name is Nicole Becano.
And she was the one who kind of opened up
this whole world for me.
Because she sort of told me that I should take the SATs
and apply to go to college abroad and do better on
my English and all of that.
And when it came to, well, financially, I couldn't, my parents couldn't really afford to do
all these things.
She started tutoring me on physics for free.
And on top of that, sitting down with me to kind of train me for SATs and all that jazz that she had
experience with.
Wow.
So, and obviously that has taken you to where you are today, sort of one of the world experts in robotics. It's funny how people do these little acts of kindness, small or large.
For no reason, really.
That's a kindness.
Just out of karma.
Wanting to support someone. Yeah.
So we talked a ton about reward functions. Let me ask the most ridiculous, big question: what is the meaning of life? What's the reward function under which we humans operate? Maybe for your life, maybe broader, for human life in general — what do you think gives life fulfillment, purpose, happiness, meaning?
You can't even ask that question with a straight face, that's how ridiculous it is.
I can't, I can't. Okay, so...
You know you're going to try to answer it anyway, though.
So I was in a planetarium once.
And they show you the thing, and they zoom out and zoom out, and it's all like you're a speck of dust kind of thing. I think I was conceptualizing that we're kind of, you know — what are humans? We're just on this little planet, whatever.
We're just on this little planet, whatever.
We don't matter much in the grand scheme of things.
And then my mind got really blown, because in this talk they talked about this multiverse theory, where they kind of zoomed out and were like, this is our universe, and then there's a bazillion other ones, and they pop in and out of existence. So our whole thing, which we can't even fathom how big it is, was like a blip that went in and out. And I thought, okay, I'm done. There is no meaning.
And clearly what we should be doing is try to impact whatever local thing we can impact — our communities, leave a little bit behind there, our friends, our family, our local communities — and just try to be there for other humans, because everything beyond that seems ridiculous.
I mean, how do you make sense of this multiverse? Like, are you inspired by the immensity of it? Is it amazing to you, or is it almost paralyzing in the mystery of it?
It's frustrating.
I'm frustrated by my inability to comprehend.
It feels very frustrating.
Look, there's some stuff — you know, time, blah, blah, blah — that we should really be understanding. I definitely don't understand it, and the amazing physicists of the world have a much better understanding than me, but it still feels out of reach. The grand scheme of things — it's very frustrating. It just feels like our brains don't have some fundamental capacity yet. Well, yet or ever, I don't know.
Well, one of the dreams of artificial intelligence is to create systems that will aid us, expand our cognitive capacity, in order to understand and build the theory of everything in physics, and understand what the heck these multiverses are.
So I think there's no better way to end it than talking about the meaning
of life and the fundamental nature of the universe and the multiverse.
And the multiverse. So, Anca, it's a huge honor. This was one of my favorite conversations I've had. I really appreciate your time. Thank you for talking to me.
Thank you for coming. Come back again.
Thanks for listening to this conversation with Anca Dragan, and thank you to our presenting sponsor, Cash App. If you enjoy this podcast, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman. And now, let me leave you with some words from Isaac Asimov.
Your assumptions are your windows on the world. Scrub them off every once in a while, or the light won't come in.
Thank you for listening and hope to see you next time.