Microsoft Research Podcast - Abstracts: NeurIPS 2024 with Dylan Foster
Episode Date: December 6, 2024

Can existing algorithms designed for simple reinforcement learning problems be used to solve more complex RL problems? Researcher Dylan Foster discusses the modular approach he and his coauthors explored in their 2024 NeurIPS paper on RL under latent dynamics.
Transcript
Welcome to Abstracts,
a Microsoft Research podcast that puts
the spotlight on world-class research in brief.
I'm Amber Tingle.
In this series, members of the research community at Microsoft give us
a quick snapshot or a podcast abstract of their new and noteworthy papers. Our guest today is Dylan Foster. He is a principal
researcher at Microsoft Research and co-author of a paper called Reinforcement Learning Under
Latent Dynamics: Toward Statistical and Algorithmic Modularity. The work is among the oral presentations at this year's
Conference on Neural Information Processing Systems, or NeurIPS, in Vancouver. Dylan,
welcome and thank you for joining us on the podcast. Thanks for having me. Let's start with
a brief overview of this paper. Tell us about the problem this work addresses and why the research community should
know about it. So this is a theoretical work on reinforcement learning, or RL. When I say
reinforcement learning, broadly speaking, this is talking about the question of how can we design
AI agents that are capable of interacting with unknown environments and learning how to solve
problems through trial and error. So this is part of a broader agenda we've been pursuing on
theoretical foundations of RL. And the key questions we're looking at here are what are
called exploration and sample efficiency. So this just means we're trying to understand what are the
algorithm design principles that can allow you to explore an unknown environment and learn as quickly as possible.
What we're doing in this paper is we're kind of looking at how can you most efficiently solve
reinforcement learning problems where you're faced with very high dimensional observations,
but the underlying dynamics of the system you're interacting with
are simple. So this is a setting that occurs in a lot of natural reinforcement learning and
control problems, especially in the context of, say, embodied decision making. So if you
think about, say, games like Pong, the state of the game is extremely simple: it's just the position and velocity of the ball and where the paddles are. But what we'd like to be able to do is learn to control, or solve, games like this from raw pixels or images, kind of in the same way a human would just solve them from vision. So if you look at these types of problems, we call these RL with rich observations, or RL with latent dynamics. These are interesting because they require you to explore the system, but they also require representation learning: you want to be able
to use neural nets to learn a mapping from,
say, the images you see to the latent state of the system. This is a pretty interesting and
non-trivial algorithmic problem. And what we do in this work is we take a first step towards
something like a unified understanding for how to solve these sorts of rich-observation or latent-dynamics RL problems.
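To make the setting Foster describes concrete, here is a rough illustrative sketch, not code from the paper: a toy environment whose latent state is just a handful of numbers (a Pong-like ball and paddle), but whose observations are high-dimensional pixel images generated from that state. All names here (LatentPong, render_pixels, and so on) are invented for the example.

```python
# Illustrative sketch only, not code from the paper: a "rich observation" /
# latent-dynamics environment. The latent state is simple (Pong-like ball and
# paddle coordinates), but the agent only ever sees a pixel image generated
# from that state.

import numpy as np


class LatentPong:
    """Toy latent dynamics: a handful of scalars evolving under simple rules."""

    def __init__(self):
        self.ball_x, self.ball_y = 0.5, 0.5      # ball position
        self.vel_x, self.vel_y = 0.02, 0.013     # ball velocity
        self.paddle_y = 0.5                      # paddle position

    def step(self, action: int) -> float:
        """action in {-1, 0, +1}: move the paddle down, stay put, or move up."""
        self.paddle_y = float(np.clip(self.paddle_y + 0.05 * action, 0.0, 1.0))
        self.ball_x += self.vel_x
        self.ball_y += self.vel_y
        if not 0.0 <= self.ball_y <= 1.0:        # bounce off top/bottom walls
            self.vel_y = -self.vel_y
        # reward for having the paddle meet the ball at the right edge
        hit = abs(self.ball_x - 1.0) < 0.02 and abs(self.ball_y - self.paddle_y) < 0.1
        return 1.0 if hit else 0.0

    def render_pixels(self) -> np.ndarray:
        """Emission function: the observation the agent actually receives."""
        img = np.zeros((84, 84), dtype=np.float32)
        bx = int(np.clip(self.ball_x, 0.0, 1.0) * 83)
        by = int(np.clip(self.ball_y, 0.0, 1.0) * 83)
        img[by, bx] = 1.0                                    # ball
        py = int(self.paddle_y * 83)
        img[max(py - 4, 0): py + 4, 82] = 1.0                # paddle
        return img


env = LatentPong()
obs = env.render_pixels()   # a 7,056-dimensional observation ...
# ... driven by a latent state that is only five numbers.
```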
So how did you go about developing this theoretical framework?

Yeah, so if you look at these sorts of RL problems with latent dynamics, this is something that's actually received a lot of investigation in theory. A lot of this goes back to early work from our lab from around 2016, 2017 or so. There are some really interesting results here, but progress was largely on a case-by-case basis, meaning there are many different ways you can try to model the latent dynamics of your problem, and each of these somehow leads to a different algorithm. So you think very hard about this modeling assumption, you think about what an optimal algorithm would look like, and you end up writing an entire paper about it.
And there's nothing wrong with that per se, but if you want to be able to iterate quickly
and kind of try different modeling assumptions
and see what works in practice, this is not really tenable. It's just too slow.
And so the starting point for this work was to kind of try to take a different and more modular
approach. So the idea is, there are many, many different types of systems, or modeling assumptions for the dynamics, that have already been studied extensively, and have entire papers about them, for the simpler setting in which you can directly see
the state of the system. And so what we wanted to ask here is, is it possible to use these existing
results in more of a modular fashion? If someone has already written a paper on how to optimally solve a particular type of MDP, or Markov decision process, can we just take their algorithm as is and plug it into some kind of meta-algorithm that combines it with representation learning and uses it to solve the corresponding rich-observation or latent-dynamics RL problem?
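As a rough schematic of the modular interface Foster describes, not the actual construction from the paper, the pieces might be separated like this; the BaseMDPSolver, Decoder, and gym-style env names are invented for illustration.

```python
# Rough schematic only, not the construction from the paper: reuse an
# off-the-shelf solver for the simple latent MDP, and wrap it with a learned
# decoder that maps rich observations back to estimated latent states.

from typing import Any, List, Protocol, Tuple


class BaseMDPSolver(Protocol):
    """An existing algorithm for the simpler setting where the latent state is observed."""

    def select_action(self, latent_state: Any) -> int: ...

    def update(self, latent_state: Any, action: int, reward: float,
               next_latent_state: Any) -> None: ...


class Decoder(Protocol):
    """Representation learner: maps a rich observation (e.g., pixels) to an
    estimate of the underlying latent state."""

    def decode(self, observation: Any) -> Any: ...

    def fit(self, trajectories: List[List[Tuple[Any, int, float, Any]]]) -> None: ...


def modular_episode(env, solver: BaseMDPSolver, decoder: Decoder,
                    horizon: int) -> List[Tuple[Any, int, float, Any]]:
    """One episode of the generic recipe: decode each observation, let the base
    solver act and learn on the decoded states, and log the raw trajectory so
    the decoder can be refit later. The paper's point is that making a loop
    like this provably sample-efficient is the hard part and, in general,
    requires assumptions beyond what the base solver needs."""
    trajectory = []
    obs = env.reset()  # assumed gym-style environment returning an observation
    for _ in range(horizon):
        z = decoder.decode(obs)
        action = solver.select_action(z)
        next_obs, reward, done = env.step(action)  # assumed (obs, reward, done) return
        solver.update(z, action, reward, decoder.decode(next_obs))
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if done:
            break
    return trajectory
```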
What were your major findings? What did you learn during this process?

We started by asking the question exactly in the way that I just posed it, right? Like,
can we take existing algorithms and use them to solve rich-observation RL problems in a modular fashion? And this turned out to be really tricky. There are a lot of natural algorithms you might try that seem promising at first but don't exactly work out. And what this led us to, and the first main result in this paper, is actually a negative result. What we showed is that for most well-studied types of systems, or MDPs, from the prior literature on RL, even if they're tractable when you're able to directly see the state of the system, they can become statistically intractable once you add high-dimensional observations on top: the number of attempts to explore the system that you need in order to learn a good decision-making policy becomes very, very large, much larger than the corresponding complexity if you were able to directly see the states of the system. You could look at this and say, I guess we're out of luck, maybe there's just no hope of solving these sorts of
problems. But that's perhaps a little too pessimistic, you know, really, the way you
should interpret this result is just that you need more assumptions. And that's precisely what the second result in this paper is about. Our second result shows that you can bypass this impossibility result and achieve truly modular algorithms under a couple of different types of
additional assumptions. Dylan, I'd like to know, and I'm sure our audience would too,
what this work means when it comes to real-world applications. What impact will this have on the research community?
Yeah, so maybe I'll answer that with two different points.
The first one is a broader point, which is why it's important to understand this problem of exploration and sample efficiency in reinforcement learning. If you look at the setting we study in this paper, RL or decision making with high-dimensional observations, on the empirical side, people have
made a huge amount of progress on this problem through deep reinforcement learning. This was
what kind of led to these amazing breakthroughs in solving games like Atari in the last decade.
But if you look at these results, the gains are somehow coming more from the inductive bias, or the generalization abilities, of deep learning, and not necessarily from the specific algorithms. Current algorithms do not actually explore very deliberately, and so their sample efficiency is quite poor. It's hard to draw a one-to-one comparison, but you can argue they need far more experience than a human would to solve these
sorts of problems. So it's not clear that we're really anywhere near the ceiling of what can be
achieved in terms of how efficiently you can have an agent learn to solve new problems from trial and error. And I think better algorithms here could potentially be transformative in a lot of different domains. To get into this
specific work, I think there are a couple of important takeaways for researchers. One is that
by giving this impossibility result that shows that RL with latent dynamics is impossible without further assumptions,
we're kind of narrowing down the search space where other researchers can look for efficient
algorithms. The second takeaway is, you know, we are showing that this problem becomes tractable
when you make additional assumptions, but I view these more as like a proof of concept.
Like we're kind of showing for the first time that it is possible to do something non-trivial, but I think a lot more
work and research will be required in order to build on this and take it to
something that can lead to practical algorithms. Well, Dylan Foster, thank you for
joining us today to discuss your paper on reinforcement learning
under latent dynamics. We certainly appreciate it. Thanks a lot. Thanks for having me.
And to our listeners, thank you all for tuning in. If you'd like to read Dylan's paper,
you may find a link at aka.ms/abstracts. You can also find the paper on arXiv and on the NeurIPS conference website.
I'm Amber Tingle from Microsoft Research, and we hope you'll join us next time on Abstracts.