Microsoft Research Podcast - 039 - Malmo, Minecraft and Machine Learning with Dr. Katja Hofmann
Episode Date: August 29, 2018
The wildly popular video game, Minecraft, might appear to be an unlikely candidate for machine learning research, but to Dr. Katja Hofmann, the research lead of Project Malmo in the Machine Intelligence and Perception Group at Microsoft Research in Cambridge, England, it’s the perfect environment for teaching AI agents, via reinforcement learning, to act intelligently – and cooperatively – in the open world. Today, Dr. Hofmann talks about her vision of a future where machines learn to collaborate with people and empower them to help solve complex, real-world problems. She also shares the story of how her early years in East Germany, behind the Iron Curtain, shaped her both personally and professionally, and ultimately facilitated a creative, exploratory mindset about computing that informs her work to this day.
Transcript
What we really designed Malmo for was for this broad exchange between industry and the academic community.
AI and reinforcement learning are fascinating techniques and this whole area is developing very, very quickly,
but it's not quite clear where the next new insight is going to come from.
So we were really envisioning this as a meta-platform that others could be using to start
to compare, start to integrate the different approaches, and really generate new insights
and understanding how to push this area forward.
You're listening to the Microsoft Research
Podcast, a show that brings you closer to the cutting edge of technology research
and the scientists behind it. I'm your host, Gretchen Huizinga.
The wildly popular video game Minecraft might appear to be an unlikely candidate
for machine learning research,
but to Dr. Katja Hofmann,
the research lead of Project Malmo
in the Machine Intelligence and Perception Group
at Microsoft Research in Cambridge, England,
it's the perfect environment for teaching AI agents, via reinforcement learning,
to act intelligently and cooperatively in the open world.
Today, Dr. Hofmann talks about her vision of a future where machines learn to collaborate
with people and empower them to help solve complex real-world problems.
She also shares the story of how her early years in East Germany,
behind the Iron Curtain,
shaped her both personally and professionally,
and ultimately facilitated a creative, exploratory mindset about computing
that informs her work to this day.
That and much more on this episode of the Microsoft Research Podcast.
Katja Hofmann, welcome to the podcast.
Thanks for having me.
You're a researcher in the Machine Intelligence Group in Cambridge.
Give us a brief description of the work you do
and the things you're working on.
In broad strokes, what gets you up in the morning?
I broadly work in the area
of multi-agent reinforcement learning.
So I look at how artificial agents
can learn to interact with complex environments.
And I'm particularly excited about possibilities
of those environments being ones
where they interact with humans.
So one area is, for example, in video games, where AI agents that learn to interact intelligently
could really enrich video games and create new types of experiences. For example, learn directly
from their interactions with players, remember what kinds of interactions they've had, and be
really more relatable and more responsive to what is actually going on in the game and how they're interacting with the player.
So let's drill in there just a little bit. What you're talking about is what I think you call collaborative AI. Tell us a little bit more about what this is and why it's an important line of inquiry in the broader field of AI research?
I think of collaborative AI as
really one of the key questions in artificial intelligence. So when we think about machines
that learn or machines that perform certain tasks, the end goal of this is always, in my mind,
a machine that is better at helping us achieve what we want to achieve. So I really think about AI research as coming up with new ways to enable this collaboration between machines and humans
in a huge variety of different applications. I think that general techniques like machine
learning and reinforcement learning are particularly promising for pushing that
forward and enabling new kinds of collaborations. But what I envision in the future are machines that understand what we're trying to do
and that can really reason about what it is that is most helpful to us to support us in achieving more.
Yes, this kind of resonates with a lot of research that's going on here at Microsoft Research,
specifically this idea of augment versus replace. Talk to me a little bit more about the
augmenting, where this application is most fruitful for us, do you think?
The application area that I think about the most is video games. And I see video games as an
important stepping stone towards enabling more general applications of collaborative AI.
I think in the long term, there are a huge number of different
applications, starting from health to being creative. But I think for many of those applications,
some of the key research questions we're still trying to address are very, very hard to address
in these open-ended real-world environments. And video games form this really interesting
intermediate stage
where we have very complex worlds that are extremely rich,
that are really engaging for people.
And we have a lot of scenarios where people interact
in small or large communities within those fantastic worlds.
So there is this space for introducing new technology,
for understanding how agents within video games could learn to
collaborate with people. And then we can, once well understood, take this technology and apply
it to new application areas.
I want to talk about this exciting application of collaborative AI that
you've really poured a lot of time and effort into called Project Malmo. Tell us about it.
Absolutely. So Project Malmo is an AI experimentation platform that my team and I
built on top of the popular video game Minecraft. And when we started developing this project,
we were really thinking about what would the future platform for AI research look like? So
what features, what capabilities would we need to support not just addressing the
next research questions that we're immediately focusing on, but that would enable a huge number
of researchers, enthusiasts to explore the space and push AI research forward for the next 10,
15 years to come. So we built a very generic platform on top of this Minecraft game to allow
researchers to create different
new tasks, to put different kinds of agents into the game, and to really push the state
of the art forward.
So why did you choose Minecraft to launch this platform?
Minecraft seemed just the perfect fit for a project like this. If you've played Minecraft
before, you'll know that it's kind of the sandbox game. It's almost like a meta game where different communities, different players go in
and create amazing artifacts and new games within this game. There's the concept of parkour races,
where people set up kind of race courses to race against their friends. And there's build battles
where you have to construct creative new structures. So people are using this sandbox game to come up with all these different ways of playing
and interacting with each other.
And if you think about a general purpose platform for AI evaluation, then this is exactly what
we need.
We need to be able to have a platform that is general enough so that we can create some
initial tasks that push the current state
of the art in reinforcement learning or AI more generally, and then be able to expand
that, build on that to create more and more complex, more and more challenging tasks to
throw at our agents to really push them moving forward.
There are tasks in there around navigation that are at the level that can be addressed
by current state-of-the-art
approaches, all the way to a task that would require complex communication interaction in
natural language. And we can support all those different scenarios within the platform.
Let's get into the weeds a little bit about the technology behind Malmo and what kinds of
methodologies, approaches, techniques are you using to do this research?
So my team here is particularly interested in this area called reinforcement learning, where an agent starts with a clean slate or very little initial knowledge about the world, but it has to learn from interaction with its environment.
So, for example, try a certain action and then learn
about the consequences of that action in that world. But within the Malmo platform, not only
work on reinforcement learning is supported, but within Malmo, we aim to support all types of
artificial intelligence research. So we provide opportunities for more symbolic reasoning
approaches all the way to the reinforcement
learning type of approaches that I mentioned earlier.
So we've talked about researchers
using deep learning for exploratory AI. Why is Malmo a good platform for this?
So reinforcement learning is the general technique, and then deep reinforcement learning
is a specific part of that where you learn from very high dimensional observations.
So, for example, if you wanted to directly learn how to interpret visual signals that come from the environment, then you would [...]
of the key challenges in reinforcement learning to understand how an agent that is thrown into some arbitrary complex world can collect new experiences or can collect data about this world in such a way that it learns to understand what kind of tasks, what kinds of goals it could achieve within that world.
Talk a little bit about the importance of simulation when we're
working with AI agents before they hit the open world.
So AI agents, and specifically when we're
talking about reinforcement learning agents that learn from direct interaction with an environment,
they essentially learn from trial and error. So they need to try some action and some of those
might fail and they need to observe the negative consequences of those actions in order to form a good understanding about how the world works.
Now, if we were to think about safety critical applications like flight or self-driving cars or maybe the health space, then we want to make sure that if agents explore, they only try those actions that are actually
sensible and have a good chance of success within those environments. Just like people do, we
wouldn't try arbitrary random things. We would try to address a problem by taking a path forward
that has a good chance of giving us a good outcome. So we would like those agents to be
pre-trained as much as possible in a simulated environment.
Then the key question is how to transfer that to the real environment. And there's actually a lot
of work, many other colleagues are focusing on that, to look at how well learned behavior can
be translated into real-world situations. And in many cases, it's surprisingly effective to
actually pre-train in simulation and then perform the task in a real-world environment.
So let's talk a bit about this reinforcement learning.
We've talked to other researchers about the rewards and I don't want to call them punishments, but how does an algorithm deal with rewards?
It's different from a human, right?
We would feel embarrassed or ashamed or whatever that we made a mistake. An algorithm doesn't have those feelings. What is the mechanism that you build
into the machine learning techniques for reinforcement learning?
That's a fantastic
question. And there are two ways in which I think about how those agents handle rewards.
One side of this is kind of the reward structure that is imposed on the agent, in this
case often by the experimenter or by the person designing a system. So if you play a game, let's
say a parkour race in Minecraft, then the experimenter could look at this and say, well,
if you win the race, then you get a positive reward. So we want to encourage that behavior. We give a plus one. If you lose, then you get a minus one. So this is very much kind of a hand-tuned reward
structure that would be application dependent. In some situations, there might be a reward
structure that is very natural. So when you play Atari games, the score was a pretty good proxy
reward structure. Or if you learn to play chess, then
winning or losing the game is a good one. But there are many application areas where there's
not an obvious reward structure. If you wanted to train an agent to help a human user perform
whatever the user is trying to do, then you would need to think more generally about what is a good
reward structure for such an agent that learns to cooperate with people.
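Illustrative sketch (not from the conversation): the hand-specified reward described above, plus one for a win and minus one for a loss, can be written down in a few lines of Python. The function names, the zero reward before the race ends, and the discount factor are assumptions made for illustration; they do not come from the Malmo platform.

```python
from typing import Optional, Sequence


def race_reward(finished: bool, won: Optional[bool]) -> float:
    """Hand-tuned reward for a race episode: +1 for a win, -1 for a loss."""
    if not finished:
        return 0.0  # assumption: no reward signal until the race is over
    return 1.0 if won else -1.0


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Sum of rewards, with later rewards weighted down by gamma per step."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three steps of racing, winning on the last step.
episode = [race_reward(False, None), race_reward(False, None), race_reward(True, True)]
episode_return = discounted_return(episode, gamma=0.5)
```

A reward like this is easy to state when the task has a clear winner. The cooperative settings discussed here are harder precisely because no such obvious signal exists.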
And this is actually one of the key directions that we're focusing on within my group to understand how to create reward structures
that would be useful for learning this kind of cooperative behavior or the supportive behavior in agents.
Your team has intentionally made the Malmo platform open source and independent of
or agnostic to the variety of methodologies and programming languages that researchers or developers might bring to the table.
Why have you worked so hard to make this project so open?
That's a fantastic question.
And if you ask my team, it's quite painful to develop for three different operating systems and at least five different programming languages.
But what we really designed Malmo for was for this broad exchange between industry
and the academic community. AI and reinforcement learning are fascinating techniques and this whole
area is developing very, very quickly, but it's not quite clear where the next new insight is
going to come from. There are thousands of people, maybe hundreds of thousands of people working on tackling some of those really hard challenges.
And it is often hard to compare very different approaches with each other because different communities might have different tools, might be using [...]
So we were really envisioning this as a meta-platform that others could be using to start to compare, start to integrate the different approaches, and really generate new insights and understanding how to push this area forward.
So you alluded just now to collaboration with academia, industry and academia collaborating.
What does each party bring to the party, so to speak, in this back and forth between applied and pure research?
I think that's a great question.
And it's one where I kind of think of myself as a little bit in the middle, as kind of having a foot in both.
Because at Microsoft Research, we are really in the research area.
So some of our work very much looks like what would happen at an
academic institution. But at the same time, we have access to this huge company where there's
a lot of interesting problems in product groups and looking at people who are solving some of
the really hard challenges that are experienced in industry. And that gives us a great source of
both collaboration and inspiration of what's
coming, maybe what key challenges need to be addressed and how we can frame our research or
maybe think about our research in such a way that it can achieve maximal impact. So really thinking
about, well, if I focus on this area and answering those questions, how is this going to change the
world? How is this going to change how we look at things and what kind of real world impact could
that have? And I think this real-world perspective is something really
valuable that the industry side brings to the table. On the academic side, you have the opportunity
to take a longer-term view. I mean, some of the breakthrough technologies that we're using today,
deep learning, has roots that go several decades back and have taken a huge amount of
dedication and energy of really a sustained research program on the academic side. And I
think this long-term view, as well as the huge variety of different opinions and ideas that we
see in the academic community, are extremely valuable and absolutely necessary for pushing
the field forward. By bringing those two sides together, I think really interesting things can happen.
When you launched Project Malmo a year
ago, you had a really overwhelming response. And since then, there have been some exciting
new developments with the project. Tell us what happened at the outset, what's happened since,
and what you're seeing on the horizon in the
future.
So we were really excited more than a year ago and launching the platform and really seeing
how the community would respond and what they would do with it. And it was really crazy and
exciting to see how many people would pick up this platform and use it for a huge variety of
different purposes. Some of those we had never thought about or never anticipated.
There was quite some uptake in class projects to learn about concepts in AI and different
approaches to AI.
There are different enthusiasts that just love interacting with the platform and seeing
what they can come up with.
And there's a huge variety in terms of the research directions that people are establishing
on top of it.
At the same time, people asked about using specific benchmarks within the platform. And this is what motivated us to really start looking at what benchmarks we would like to create in order to facilitate research in some of the key research areas that we think of as the most challenging at this point in time.
So out of that discussion initially came the first Malmo Collaborative AI Challenge, which
we ran last year.
Maybe some of the listeners remember the pig chase task that we ran there, which was a
fantastic experience.
And we had very motivated participants from all over the world, but it also gave us a lot of new insights, new learnings about what went well, what could be improved,
and how to move into a next round of creating a challenge that could have a more targeted
impact on the research community. So since then, we started reaching out. We've built up a network
of academic collaborators, which we're very proud to work with.
And those are teams at Queen Mary University in London, as well as EPFL, the university
in Lausanne.
And we put our heads together and looked at specifically the question of generality.
So how can we create a benchmark that would push the research towards learning approaches that would learn not
just to perform well on a single task with maybe a single opponent in a multi-agent scenario,
but that would really be pushing those approaches towards multi-task, multi-agent
learning in this video game setting that we're providing here.
That's actually quite exciting and challenging, right?
It's very challenging, yes.
Do you have another challenge coming up or have you already launched another challenge?
Absolutely. We're just about to launch the MARLO competition, which is the multi-agent reinforcement learning competition in Malmo.
I believe by the time this podcast comes out, it will already have been launched.
But we're just preparing the platform for actually releasing the competition.
How did you come up with the name Malmo?
So this was geographically inspired.
As you may know, home of Minecraft is Stockholm, where the game was originally developed and initiated.
And my team is here in Cambridge in the UK.
And Malmo is almost in the middle between
those. It's not a precise fit, but it was the closest kind of large city that we found. And
I hear that Malmo is a very energetic young city with a lot of exciting things happening. So it
seems like a very good fit.
That's great. And just sort of a side question, how would you define who your audience is for the Malmo challenges?
So for the competitions, we are targeting students in particular.
So we think students who have maybe had some experience with reinforcement learning or machine learning and are trying to test out their skills.
This would be a fantastic competition to try out what they've
learned and maybe push things forward. We think that the benchmark is a serious one for the
academic community. So we'd love to see exciting new research in multi-agent, multi-task learning
to be inspired by this.
So certain techniques in AI research like current exploration and probabilistic modeling
are tackling the big problems of ambiguity, complexity, and uncertainty, and a lot of
interesting work in this area is coming out of the Cambridge lab. How are you seeing these
probabilistic models affect the work that you're doing?
I think this is a fantastic area where two
research areas that used to be quite distinct from each other are starting to move more closely together.
If you think about reinforcement learning as learning from trial and error, then you need to think very deeply about uncertainty.
So if you already know what effects your actions will have in some environment, then there's no need to explore. You already know everything you need to know, and you can just compute or develop an optimal
strategy for acting in this environment.
But a lot of times, the experience of the agent will be very limited.
And then it's crucial to have a good estimate or be able to quantify that uncertainty and
whether uncertainty is due to not having seen enough data in a particular
part of the environment or whether there's true stochasticity. So, for example, the difference
between playing a slot machine and exploring a dark cave. In one example, you see there's a lot
of uncertainty about the outcome, but it's because the slot machine is really random.
In the other one, there is a lot of uncertainty just because you may be at the entrance of the cave and you just haven't explored
the space yet. And a lot of recent work in probabilistic modeling, especially around
stochastic neural networks, is taking very rapid steps forward towards capturing those different
kinds of uncertainty. And there is a very clear application area in reinforcement learning,
some key questions on how to effectively use those models to inform exploration and acting in uncertain environments.
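Illustrative sketch (not from the conversation): the slot machine versus dark cave distinction can be made concrete with a simple Beta-Bernoulli model. This is a deliberately small stand-in chosen for illustration, not the stochastic neural networks mentioned above. It assumes a binary outcome and a uniform Beta(1, 1) prior; the function names are hypothetical.

```python
def beta_posterior(successes: int, failures: int,
                   prior_a: float = 1.0, prior_b: float = 1.0):
    """Posterior Beta(a, b) over an unknown success probability p."""
    return prior_a + successes, prior_b + failures


def epistemic_variance(a: float, b: float) -> float:
    """Variance of the posterior over p: shrinks as more data is observed."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def aleatoric_variance(a: float, b: float) -> float:
    """Expected outcome noise E[p * (1 - p)]: stays large if the world is random."""
    return a * b / ((a + b) * (a + b + 1))


# Slot machine: pulled 10,000 times, pays out about half the time.
slot = beta_posterior(successes=5000, failures=5000)
# Cave entrance: nothing observed yet.
cave = beta_posterior(successes=0, failures=0)

slot_epistemic = epistemic_variance(*slot)   # tiny: the payout rate is well known
slot_aleatoric = aleatoric_variance(*slot)   # still close to 0.25: outcomes stay random
cave_epistemic = epistemic_variance(*cave)   # large: simply not explored yet
```

Under these assumptions, further exploration pays off at the cave, where uncertainty comes from missing data, and not at the slot machine, where the remaining uncertainty is inherent randomness.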
So talk a bit more. I know you have a working paper on stochastic neural nets and generative models.
Talk a little bit more about what you're exploring here, you personally.
One working paper we have on arXiv right now is looking at variational models
for model-based reinforcement learning. So the question we ask there is whether,
starting with some initial set of data, it would be more effective and efficient to learn a model
of that environment and then compute or derive a structure of how to act, or whether what's called model-free approaches
that directly learn a mapping to a policy,
so directly map to an agent's behavior,
would be more effective.
And there we found, first of all,
that these types of models can very effectively learn
the structure of the environment,
but also that we're particularly able to use those models
in situations where the environment might be changing.
So one scenario is if you get to initially explore the environment,
but you may not have any information about the task or the reward structure,
then we're able to show that we can leverage these models to learn effectively even before we see the reward structure
and that the agent can then effectively combine this with new information
and perform the task very well, even with a limited amount of data.
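Illustrative sketch (not from the conversation): the scenario described above, exploring first without any reward and then planning once the reward is revealed, can be shown with a tabular model in a tiny corridor world. This swaps in a simple lookup-table model and value iteration for the variational models in the working paper. The environment, names, and parameters are all hypothetical.

```python
import random

N_STATES = 5          # hypothetical 1-D corridor with states 0..4
ACTIONS = (-1, +1)    # step left or step right


def step(state: int, action: int) -> int:
    """True dynamics, unknown to the agent: move and stay inside the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def explore(n_steps: int = 500, seed: int = 0) -> dict:
    """Reward-free exploration: record where each (state, action) pair leads."""
    rng = random.Random(seed)
    model, state = {}, 0
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        nxt = step(state, action)
        model[(state, action)] = nxt  # deterministic world: one sample is enough
        state = nxt
    return model


def plan(model: dict, reward, gamma: float = 0.9, sweeps: int = 100) -> dict:
    """Value iteration on the learned model after the reward is revealed."""
    def predict(s, a):
        return model.get((s, a), s)  # assumption: unseen transitions stay in place

    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            values[s] = max(reward(predict(s, a)) + gamma * values[predict(s, a)]
                            for a in ACTIONS)
    return {s: max(ACTIONS,
                   key=lambda a: reward(predict(s, a)) + gamma * values[predict(s, a)])
            for s in range(N_STATES)}


model = explore()                                            # no reward seen so far
policy = plan(model, reward=lambda s: 1.0 if s == N_STATES - 1 else 0.0)
```

Because the model was learned before any reward was known, the same `model` could be reused with a different reward function without collecting new experience. A model-free learner would instead have to relearn its policy from fresh interaction.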
We're told over and over that AI has great potential to benefit the world,
but there's a lot of speculation about what it will actually look like,
especially if we enable agents on a large scale to learn and make decisions,
maybe even independent of humans.
And you're doing work that enables AI agents to learn and make decisions.
So is there anything about that that keeps you up at night?
And if so, what are you doing about it?
With all new technology that is initially poorly understood, there are a lot of open
questions and a lot of uncertainty.
As you note, especially with reinforcement learning technology, we still don't have a
lot of good examples of actually seeing this kind of technology deployed in real-world
applications. And there's key questions around how we can make sure that all those decisions that the
agent learns to make are warranted and that we can explain and understand why they made
those decisions.
If those are agents that learn to make decisions in a way that allows them to interact quite
directly with humans or much more flexibly than we're currently used to,
how can we make sure that those decisions and those interactions are actually intelligible to the users
and that we have some way of understanding when an agent is learning something new
and how our interaction with it affects what it's learning?
Those are really hard questions.
And I think this is one reason why video games are such a fantastic platform for studying those kinds of technologies.
They provide the sandbox where we can safely explore what it means to interact with something that is learning to interact with you.
And how to frame that whole learning process and that interaction and make sure that we can understand what's going on there.
You have an incredibly interesting personal life story that includes firsthand experience of what I would say is one of the biggest events of the 20th century.
Tell us a bit about your life from your beginnings and what got you interested in computer science
and how did you come to be doing research at Microsoft Research in England?
So I was born in East Germany. So I spent the first years of my life actually behind the Iron Curtain.
And I am old enough to remember some of this.
And I think it quite shaped who I am as a person.
Initially, there was this conviction that there's all these countries that I would never
travel to.
And my mom is a geography teacher.
So she used to tell us about all these countries in the world. And it was just completely infeasible for us to even think about ever visiting them.
And then once the wall fell, it opened up all those opportunities.
And I've traveled to all continents except for Antarctica.
I've studied in three different countries,
worked in four.
So, you know, it's just created all this life story that,
you know, when I was a little girl,
I could have never imagined
anything like this.
So it's quite interesting
to reflect on this.
I think it's also impacted
how I came to encounter computers.
In the GDR,
there weren't a lot of computers.
So personal computers only became
available after the wall fell. And that means that when I was growing up, when I was a teenager and
my parents bought a computer because they heard that that could be something useful,
there were no preconceptions about what you can and cannot do using a computer.
There weren't any role models that would have influenced me.
And it was this box that stood in my room and that I could just use completely freely and figure out what it can do.
I taught myself how to program and started exploring what's possible there.
And I think this kind of freedom and sense of creativity and being able to explore really shaped my perspective of computing
and really influenced this decision to pursue computer science later on.
So from that room and your three countries of study,
how did you end up at Microsoft Research?
It almost feels like that was by accident.
I, again, wouldn't have anticipated this. But during my PhD,
I was working in an area called information retrieval. So I was looking at how to make
search engines more intelligent. And one of the threads that was already present was to understand
how search engines could learn from the users to be better able to surface exactly what the user
is looking for. So it had this theme of interactive learning combined with search applications.
One of the people that I really admired was working here at Microsoft Research in Cambridge.
And I introduced myself at a conference and we chatted about this work.
That led to an internship at Microsoft Research and Bing in Redmond at the time and gave me
this opportunity to explore
what possibilities there were and what it meant to be working at Microsoft Research.
That was a fantastic experience. So when I was asked whether I wanted to apply for a
postdoc position here, I was thrilled and that was the job choice that I had.
Well, as we close, Katja, I like to ask researchers what advice they would give to
other aspiring researchers,
especially those who might be interested in the kind of research you're doing in machine intelligence and perception.
And I'm especially interested in what you'd say to young women who might be considering following in your footsteps.
That's a very, very big question.
Give us a big answer.
What I'd like to see more in researchers is thinking more
about why we do what we do. Academic and intellectual curiosity is one thing, but
the research that we do in AI and machine learning has such a huge potential to change the world.
Anyone, be that an expert in the field or someone
on the street, will tell you that AI will have a fundamental impact on people's lives.
So I'd like to encourage people to think more about what matters about their research,
why they're doing this research, and how we should be using it to influence and make sure
that the world we create is the one we actually want to live in in the future.
Just drilling in a little bit, you don't see a lot of women proportionally in machine learning research.
What do you think was instrumental in getting you interested in going in that field?
I think what played a big part was that I had this opportunity to just completely freely discover this area and figure out what it meant for me before being confronted with
maybe preconceptions and ideas that people had about what computing was and what it was
for and who it was for.
And I see a lot of the imbalance that we're seeing today very much reflecting these preconceptions that people have.
I'd love to play a part in changing those preconceptions.
I think they are incredibly hurtful and prevent very bright, very talented people who would work really hard to make an impact on this field and sometimes prevent them from entering the field.
It's hard to derive specific advice from that.
You can say, try to isolate yourself from what everyone thinks, and that's not an easy thing to do.
No, but you know what? Right now, I think you are an inspiration on what can be done and what you can accomplish. So Katja Hofmann, thank you so much for taking time
at the end of your day and the beginning of mine
to come on the podcast today.
Thank you so much for having me.
It was a great pleasure.
To learn more about Dr. Katja Hofmann
and how Project Malmo is pushing the state of the art
in reinforcement learning,
visit Microsoft.com/research.