Microsoft Research Podcast - 039 - Malmo, Minecraft and Machine Learning with Dr. Katja Hofmann

Starting point is 00:00:00 What we really designed Malmo for was for this broad exchange between industry and the academic community. AI and reinforcement learning are fascinating techniques and this whole area is developing very, very quickly, but it's not quite clear where the next new insight is going to come from. So we were really envisioning this as a meta-platform that others could be using to start to compare, start to integrate the different approaches, and really generate new insights and understanding how to push this area forward. You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga.

Starting point is 00:00:51 The wildly popular video game Minecraft might appear to be an unlikely candidate for machine learning research, but to Dr. Katja Hofmann, the research lead of Project Malmo in the Machine Intelligence and Perception Group at Microsoft Research in Cambridge, England, it's the perfect environment for teaching AI agents, via reinforcement learning, to act intelligently and cooperatively in the open world.

Starting point is 00:01:14 Today, Dr. Hofmann talks about her vision of a future where machines learn to collaborate with people and empower them to help solve complex real-world problems. She also shares the story of how her early years in East Germany, behind the Iron Curtain, shaped her both personally and professionally, and ultimately facilitated a creative, exploratory mindset about computing that informs her work to this day. That and much more on this episode of the Microsoft Research Podcast.

Starting point is 00:01:54 Katja Hofmann, welcome to the podcast. Thanks for having me. You're a researcher in the Machine Intelligence Group in Cambridge. Give us a brief description of the work you do and the things you're working on. In broad strokes, what gets you up in the morning? I broadly work in the area of multi-agent reinforcement learning.

Starting point is 00:02:12 So I look at how artificial agents can learn to interact with complex environments. And I'm particularly excited about possibilities of those environments being ones where they interact with humans. So one area is, for example, in video games, where AI agents that learn to interact intelligently could really enrich video games and create new types of experiences. For example, learn directly from their interactions with players, remember what kinds of interactions they've had, and be

Starting point is 00:02:43 really more relatable and more responsive to what is actually going on in the game and how they're interacting with the player. So let's drill in there just a little bit. What you're talking about is what I think you call collaborative AI. Tell us a little bit more about what this is and why it's an important line of inquiry in the broader field of AI research? I think of collaborative AI as really one of the key questions in artificial intelligence. So when we think about machines that learn or machines that perform certain tasks, the end goal of this is always, in my mind, a machine that is better at helping us achieve what we want to achieve. So I really think about AI research as coming up with new ways to enable this collaboration between machines and humans in a huge variety of different applications. I think that general techniques like machine learning and reinforcement learning are particularly promising for pushing that forward and enabling new kinds of collaborations. But what I envision in the future are machines that understand what we're trying to do

Starting point is 00:03:47 and that can really reason about what it is that is most helpful to us to support us in achieving more. Yes, this kind of resonates with a lot of research that's going on here at Microsoft Research, specifically this idea of augment versus replace. Talk to me a little bit more about the augmenting, where this application is most fruitful for us, do you think? The application area that I think about the most is video games. And I see video games as an important stepping stone towards enabling more general applications of collaborative AI. I think in the long term, there are a huge number of different applications, starting from health to being creative. But I think for many of those applications,

Starting point is 00:04:33 some of the key research questions we're still trying to address are very, very hard to address in these open-ended real-world environments. And video games form this really interesting intermediate stage where we have very complex worlds that are extremely rich, that are really engaging for people. And we have a lot of scenarios where people interact in small or large communities within those fantastic worlds. So there is this space for introducing new technology,

Starting point is 00:05:02 for understanding how agents within video games could learn to collaborate with people. And then we can, once well understood, take this technology and apply it to new application areas. I want to talk about this exciting application of collaborative AI that you've really poured a lot of time and effort into called Project Malmo. Tell us about it. Absolutely. So Project Malmo is an AI experimentation platform that my team and I built on top of the popular video game Minecraft. And when we started developing this project, we were really thinking about what would the future platform for AI research look like? So what features, what capabilities would we need to support not just addressing the next research questions that we're immediately focusing on, but that would enable a huge number

Starting point is 00:05:50 of researchers, enthusiasts to explore the space and push AI research forward for the next 10, 15 years to come. So we built a very generic platform on top of this Minecraft game to allow researchers to create different new tasks, to put different kinds of agents into the game, and to really push the state of the art forward. So why did you choose Minecraft to launch this platform? Minecraft seemed just the perfect fit for a project like this. If you've played Minecraft before, you'll know that it's kind of the sandbox game. It's almost like a meta game where different communities, different players go in and create amazing artifacts and new games within this game. There's the concept of parkour races,

Starting point is 00:06:36 where people set up kind of race courses to race against their friends. And there's build battles where you have to construct creative new structures. So people are using this sandbox game to come up with all these different ways of playing and interacting with each other. And if you think about a general purpose platform for AI evaluation, then this is exactly what we need. We need to be able to have a platform that is general enough so that we can create some initial tasks that push the current state of the art in reinforcement learning or AI more generally, and then be able to expand

Starting point is 00:07:10 that, build on that to create more and more complex, more and more challenging tasks to throw at our agents to really push them moving forward. There are tasks in there around navigation that are at the level that can be addressed by current state-of-the-art approaches, all the way to a task that would require complex communication interaction in natural language. And we can support all those different scenarios within the platform. Let's get into the weeds a little bit about the technology behind Malmo and what kinds of methodologies, approaches, techniques are you using to do this research?

Starting point is 00:07:46 So my team here is particularly interested in this area called reinforcement learning, where an agent starts with a clean slate or very little initial knowledge about the world, but it has to learn from interaction with its environment. So, for example, try a certain action and then learn about the consequences of that action in that world. But within the Malmo platform, not only work on reinforcement learning is supported, but within Malmo, we aim to support all types of artificial intelligence research. So we provide opportunities for more symbolic reasoning approaches all the way to the reinforcement learning type of approaches that I mentioned earlier. So we've talked about researchers using deep learning for exploratory AI. Why is Malmo a good platform for this?

Starting point is 00:08:37 So reinforcement learning is the general technique, and then deep reinforcement learning is a specific part of that where you learn from very high dimensional observations. So, for example, if you wanted to directly learn how to interpret visual signals that come from the environment, then you would of the key challenges in reinforcement learning to understand how an agent that is thrown into some arbitrary complex world can collect new experiences or can collect data about this world in such a way that it learns to understand what kind of tasks, what kinds of goals it could achieve within that world. Talk a little bit about the importance of simulation when we're working with AI agents before they hit the open world. So AI agents, and specifically when we're talking about reinforcement learning agents that learn from direct interaction with an environment, they essentially learn from trial and error. So they need to try some action and some of those might fail and they need to observe the negative consequences of those actions in order to form a good understanding about how the world works.
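The trial-and-error learning described here can be illustrated with a small sketch. What follows is not Malmo code and not the team's method: it is plain tabular Q-learning on a hypothetical one-dimensional corridor, with a pit at one end (reward -1) and a goal at the other (reward +1). The corridor, the function names, and all parameter values are assumptions chosen only for illustration.

```python
import random


def train_corridor_agent(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1. State 0 is a pit (reward -1) and the last
    state is the goal (reward +1); both end the episode.
    Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    # q[state][action] starts at zero: the agent knows nothing about the world.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        # Start each episode in a random non-terminal state.
        state = rng.randrange(1, n_states - 1)
        while 0 < state < n_states - 1:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random action
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state = state + (1 if action == 1 else -1)
            if next_state == n_states - 1:
                reward, done = 1.0, True   # reached the goal
            elif next_state == 0:
                reward, done = -1.0, True  # fell into the pit
            else:
                reward, done = 0.0, False
            # Move the estimate toward the observed consequence of the action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Preferred action for each non-terminal state after training."""
    return ["right" if right > left else "left" for left, right in q[1:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

The agent only learns that stepping into the pit is bad by occasionally doing it, which is exactly why failures are cheap in a simulated world and costly in a safety-critical one.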

Starting point is 00:09:51 Now, if we were to think about safety critical applications like flight or self-driving cars or maybe the health space, then we want to make sure that if agents explore, they only try those actions that are actually sensible and have a good chance of success within those environments. Just like people do, we wouldn't try arbitrary random things. We would try to address a problem by taking a path forward that has a good chance of giving us a good outcome. So we would like those agents to be pre-trained as much as possible in a simulated environment, then the key question is how to transfer that to the real environment. And there's actually a lot of work, many other colleagues are focusing on that, to look at how well learned behavior can be translated into real-world situations. And in many cases, it's surprisingly effective to

Starting point is 00:11:01 actually pre-train in simulation and then perform the task in a real-world environment. So let's talk a bit about this reinforcement learning. We've talked to other researchers about the rewards and I don't want to call them punishments, but how does an algorithm deal with rewards? It's different from a human, right? We would feel embarrassed or ashamed or whatever that we made a mistake. An algorithm doesn't have those feelings. What is the mechanism that you build into the machine learning techniques for reinforcement learning? That's a fantastic question. And there are two ways in which I think about how those agents handle rewards. One side of this is kind of the reward structure that is imposed on the agent, in this

Starting point is 00:11:46 case often by the experimenter or by the person designing a system. So if you play a game, let's say a parkour race in Minecraft, then the experimenter could look at this and say, well, if you win the race, then you get a positive reward. So we want to encourage that behavior. We give a plus one. If you lose, then you get a minus one. So this is very much kind of a hand-tuned reward structure that would be application dependent. In some situations, there might be a reward structure that is very natural. So when you play Atari games, the score was a pretty good proxy reward structure. Or if you learn to play chess, then winning or losing the game is a good one. But there are many application areas where there's not an obvious reward structure. If you wanted to train an agent to help a human user perform

Starting point is 00:12:37 whatever the user is trying to do, then you would need to think more generally about what is a good reward structure for such an agent that learns to cooperate with people. And this is actually one of the key directions that we're focusing on within my group to understand how to create reward structures that would be useful for learning this kind of cooperative behavior or the supportive behavior in agents. Your team has intentionally made the Malmo platform open source and independent of or agnostic to the variety of methodologies and programming languages that researchers or developers might bring to the table. Why have you worked so hard to make this project so open? That's a fantastic question. And if you ask my team, it's quite painful to develop for three different operating systems and at least five different programming languages.

Starting point is 00:13:41 But what we really designed Malmo for was for this broad exchange between industry and the academic community. AI and reinforcement learning are fascinating techniques and this whole area is developing very, very quickly, but it's not quite clear where the next new insight is going to come from. There are thousands of people, maybe hundreds of thousands of people working on tackling some of those really hard challenges. And it is often hard to compare very different approaches with each other because different communities might have different tools, might be using... So we were really envisioning this as a meta-platform that others could be using to start to compare, start to integrate the different approaches and really generate new insights and understanding how to push this area forward. So you alluded just now to collaboration with academia, industry and academia collaborating. What does each party bring to the party, so to speak, in this back and forth between applied and pure research? I think that's a great question.

Starting point is 00:14:49 And it's one where I kind of think of myself as a little bit in the middle, as kind of having a foot in both. Because at Microsoft Research, we are really in the research area. So some of our work very much looks like what would happen at an academic institution. But at the same time, we have access to this huge company where there's a lot of interesting problems in product groups and looking at people who are solving some of the really hard challenges that are experienced in industry. And that gives us a great source of both collaboration and inspiration of what's coming, maybe what key challenges need to be addressed and how we can frame our research or

Starting point is 00:15:30 maybe think about our research in such a way that it can achieve maximal impact. So really thinking about, well, if I focus on this area and answering those questions, how is this going to change the world? How is this going to change how we look at things and what kind of real world impact could that have? And I think this real-world perspective is something really valuable that the industry side brings to the table. On the academic side, you have the opportunity to take a longer-term view. I mean, some of the breakthrough technologies that we're using today, deep learning, has roots that go several decades back and have taken a huge amount of dedication and energy of really a sustained research program on the academic side. And I

Starting point is 00:16:12 think this long-term view, as well as the huge variety of different opinions and ideas that we see in the academic community, are extremely valuable and absolutely necessary for pushing the field forward. By bringing those two sides together, I think really interesting things can happen. When you launched Project Malmo a year ago, you had a really overwhelming response. And since then, there have been some exciting new developments with the project. Tell us what happened at the outset, what's happened since, and what you're seeing on the horizon in the future. So we were really excited more than a year ago about launching the platform and really seeing how the community would respond and what they would do with it. And it was really crazy and

Starting point is 00:16:56 exciting to see how many people would pick up this platform and use it for a huge variety of different purposes. Some of those we had never thought about or never anticipated. There was quite some uptake in class projects to learn about concepts in AI and different approaches to AI. There are different enthusiasts that just love interacting with the platform and seeing what they can come up with. And there's a huge variety in terms of the research directions that people are establishing on top of it.

Starting point is 00:17:30 At the same time, people asked about using specific benchmarks within the platform. And this is what motivated us to really start looking at what benchmarks we would like to create in order to facilitate research in some of the key research areas that we think of as the most challenging at this point in time. So out of that discussion initially came the first Malmo Collaborative AI Challenge, which we ran last year. Maybe some of the listeners remember the pig chase task that we ran there, which was a fantastic experience. And we had very motivated participants from all over the world, but also gave us a lot of new insights, new learnings about what went well, what could be improved, and how to move into a next round of creating a challenge that could have a more targeted impact on the research community. So since then, we started reaching out. We've built up a network

Starting point is 00:18:23 of academic collaborators, which we're very proud to work with. And those are teams at Queen Mary University in London, as well as EPFL, the university in Lausanne. And we put our heads together and looked at specifically the question of generality. So how can we create a benchmark that would push the research towards learning approaches that would learn not just to perform well on a single task with maybe a single opponent in a multi-agent scenario, but that would really be pushing those approaches towards multi-task, multi-agent learning in this video game setting that we're providing here.

Starting point is 00:19:00 That's actually quite exciting and challenging, right? It's very challenging, yes. Do you have another challenge coming up or have you already launched another challenge? Absolutely. We're just about to launch the MARLO competition, which is the multi-agent reinforcement learning competition in Malmo. I believe by the time this podcast comes out, it will already have been launched. But we're just preparing the platform for actually releasing the competition. How did you come up with the name Malmo? So this was geographically inspired.

Starting point is 00:19:34 As you may know, home of Minecraft is Stockholm, where the game was originally developed and initiated. And my team is here in Cambridge in the UK. And Malmo is almost in the middle between those. It's not a precise fit, but it was the closest kind of large city that we found. And I hear that Malmo is a very energetic young city with a lot of exciting things happening. So it seems like a very good fit. That's great. And just sort of a side question, how would you define who your audience is for the Malmo challenges? So for the competitions, we are targeting students in particular. So we think students who have maybe had some experience with reinforcement learning or machine learning and are trying to test out their skills.

Starting point is 00:20:22 This would be a fantastic competition to try out what they've learned and maybe push things forward. We think that the benchmark is a serious one for the academic community. So we'd love to see exciting new research in multi-agent, multi-task learning to be inspired by this. So certain techniques in AI research like current exploration and probabilistic modeling are tackling the big problems of ambiguity, complexity, and uncertainty, and a lot of interesting work in this area is coming out of the Cambridge lab. How are you seeing these probabilistic models affect the work that you're doing? I think this is a fantastic area where two research areas that used to be quite distinct from each other are starting to move more closely together.

Starting point is 00:21:08 If you think about reinforcement learning as learning from trial and error, then you need to think very deeply about uncertainty. So if you already know what effects your actions will have in some environment, then there's no need to explore. You already know everything you need to know, and you can just compute or develop an optimal strategy for acting in this environment. But a lot of times, the experience of the agent will be very limited. And then it's crucial to have a good estimate or be able to quantify that uncertainty and whether uncertainty is due to not having seen enough data in a particular part of the environment or whether there's true stochasticity. So, for example, the difference between playing a slot machine and exploring a dark cave. In one example, you see there's a lot

Starting point is 00:21:59 of uncertainty about the outcome, but it's because the slot machine is really random. In the other one, there is a lot of uncertainty just because you may be at the entrance of the cave and you just haven't explored the space yet. And a lot of recent work in probabilistic modeling, especially around stochastic neural networks, is taking very rapid steps forward towards capturing those different kinds of uncertainty. And there is a very clear application area in reinforcement learning, some key questions on how to effectively use those models to inform exploration and acting in uncertain environments. So talk a bit more. I know you have a working paper on stochastic neural nets and generative models. Talk a little bit more about what you're exploring here, you personally.
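The slot machine and dark cave contrast can be made concrete with a small sketch. It does not use the stochastic neural networks mentioned in the conversation; as a deliberately simpler stand-in, it assumes a single win/lose outcome with a Beta(1, 1) prior over the unknown success probability. The function name and the example counts are hypothetical. Total predictive variance is split into a part that is true randomness (it stays no matter how much data arrives) and a part that comes from not having seen enough data (it shrinks with exploration).

```python
def beta_bernoulli_uncertainty(successes, failures):
    """Split predictive variance under an assumed Beta(1, 1) prior.

    Returns (aleatoric, epistemic):
      aleatoric = E[p * (1 - p)], the expected outcome noise,
      epistemic = Var[p], the uncertainty about the success probability p.
    By the law of total variance they sum to mean * (1 - mean).
    """
    a = 1 + successes
    b = 1 + failures
    n = a + b
    mean = a / n
    epistemic = a * b / (n * n * (n + 1))       # variance of a Beta(a, b)
    aleatoric = mean * (1.0 - mean) - epistemic
    return aleatoric, epistemic


if __name__ == "__main__":
    # A slot machine pulled 1000 times: still noisy, but well understood.
    print("slot machine:", beta_bernoulli_uncertainty(500, 500))
    # A cave nobody has entered yet: nothing is known.
    print("cave, unexplored:", beta_bernoulli_uncertainty(0, 0))
    # The same cave after 50 uneventful steps inside.
    print("cave, explored:", beta_bernoulli_uncertainty(50, 0))
```

Under these assumptions, more pulls leave the slot machine's outcome noise in place, while exploring the cave drives its uncertainty down. An exploration strategy would want to tell the two apart, because only the second kind rewards gathering more data.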

Starting point is 00:22:40 One working paper we have on arXiv right now is looking at variational models for model-based reinforcement learning. So the question we ask there is whether, starting with some initial set of data, it would be more effective and efficient to learn a model of that environment and then compute or derive a structure of how to act, or whether what's called model-free approaches that directly learn a mapping to a policy, so directly map to an agent's behavior, would be more effective. And there we found, first of all,

Starting point is 00:23:15 that these types of models can very effectively learn the structure of the environment, but also that we're particularly able to use those models in situations where the environment might be changing. So one scenario is if you get to initially explore the environment, but you may not have any information about the task or the reward structure, then we're able to show that we can leverage these models to learn effectively even before we see the reward structure and that the agent can then effectively combine this with new information

Starting point is 00:23:45 and perform the task very well, even with a limited amount of data. We're told over and over that AI has great potential to benefit the world, but there's a lot of speculation about what it will actually look like, especially if we enable agents on a large scale to learn and make decisions, maybe even independent of humans. And you're doing work that enables AI agents to learn and make decisions. So is there anything about that that keeps you up at night? And if so, what are you doing about it?

Starting point is 00:24:26 With all new technology that is initially poorly understood, there are a lot of open questions and a lot of uncertainty. As you note, especially with reinforcement learning technology, we still don't have a lot of good examples of actually seeing this kind of technology deployed in real-world applications. And there's key questions around how we can make sure that all those decisions that the agent learns to make are warranted and that we can explain and understand why they made those decisions. If those are agents that learn to make decisions in a way that allows them to interact quite

Starting point is 00:25:02 directly with humans or much more flexibly than we're currently used to? How can we make sure that those decisions and those interactions are actually intelligible to the users and that we have some way of understanding when an agent is learning something new and how our interaction with it affects what it's learning? Those are really hard questions. And I think this is one reason why video games are such a fantastic platform for studying those kinds of technologies. They provide the sandbox where we can safely explore what it means to interact with something that is learning to interact with you. And how to frame that whole learning process and that interaction and make sure that we can understand what's going on there.

Starting point is 00:25:41 You have an incredibly interesting personal life story that includes firsthand experience of what I would say is one of the biggest events of the 20th century. Tell us a bit about your life from your beginnings and what got you interested in computer science and how did you come to be doing research at Microsoft Research in England? So I was born in East Germany. So I spent the first years of my life actually behind the Iron Curtain. And I am old enough to remember some of this. And I think it quite shaped who I am as a person. Initially, there was this conviction that there's all these countries that I would never travel to.

Starting point is 00:26:19 And my mom is a geography teacher. So she used to tell us about all these countries in the world. And it was just completely infeasible for us to even think about ever visiting them. And then once the wall fell, it opened up all those opportunities. And I've traveled to all continents except for Antarctica. I've studied in three different countries, worked in four. So, you know, it's just created all this life story that, you know, when I was a little girl,

Starting point is 00:26:51 I could have never imagined anything like this. So it's quite interesting to reflect on this. I think it's also impacted how I came to encounter computers. In the GDR, there weren't a lot of computers.

Starting point is 00:27:04 So personal computers only became available after the wall fell. And that means that when I was growing up, when I was a teenager and my parents bought a computer because they heard that that could be something useful, there were no preconceptions about what you can and cannot do using a computer. There weren't any role models that would have influenced me. And it was this box that stood in my room and that I could just use completely freely and figure out what it can do. I taught myself how to program and started exploring what's possible there. And I think this kind of freedom and sense of creativity and being able to explore really shaped my perspective of computing

Starting point is 00:27:46 and really influenced this decision to pursue computer science later on. So from that room and your three countries of study, how did you end up at Microsoft Research? It almost feels like that was by accident. I, again, wouldn't have anticipated this. But during my PhD, I was working in an area called information retrieval. So I was looking at how to make search engines more intelligent. And one of the threads that was already present was to understand how search engines could learn from the users to be better able to surface exactly what the user

Starting point is 00:28:22 is looking for. So it had this theme of interactive learning combined with search applications. One of the people that I really admired was working here at Microsoft Research in Cambridge. And I introduced myself at a conference and we chatted about this work. That led to an internship at Microsoft Research and Bing in Redmond at the time and gave me this opportunity to explore what possibilities there were and what it meant to be working at Microsoft Research. That was a fantastic experience. So when I was asked whether I wanted to apply for a postdoc position here, I was thrilled and that was the job choice that I had.

Starting point is 00:28:59 Well, as we close, Katja, I like to ask researchers what advice they would give to other aspiring researchers, especially those who might be interested in the kind of research you're doing in machine intelligence and perception. And I'm especially interested in what you'd say to young women who might be considering following in your footsteps. That's a very, very big question. Give us a big answer. What I'd like to see more in researchers is thinking more about why we do what we do. Academic and intellectual curiosity is one thing, but the research that we do in AI and machine learning has such a huge potential to change the world.

Starting point is 00:29:43 Anyone, be that an expert in the field or someone on the street, will tell you that AI will have a fundamental impact on people's lives. So I'd like to encourage people to think more about what matters about their research, why they're doing this research, and how we should be using it to influence and make sure that the world we create is the one we actually want to live in in the future. Just drilling in a little bit, you don't see a lot of women proportionally in machine learning research. What do you think was instrumental in getting you interested in going in that field? I think what played a big part was that I had this opportunity to just completely freely discover this area and figure out what it meant for me before being confronted with

Starting point is 00:30:31 maybe preconceptions and ideas that people had about what computing was and what it was for and who it was for. And I see a lot of the imbalance that we're seeing today very much reflecting these preconceptions that people have. I'd love to play a part in changing those preconceptions. I think they are incredibly hurtful and prevent very bright, very talented people who would work really hard to make an impact on this field and sometimes prevent them from entering the field. It's hard to derive specific advice from that. You can say, try to isolate yourself from what everyone thinks, and that's not an easy thing to do.

Starting point is 00:31:21 No, but you know what? Right now, I think you are an inspiration on what can be done and what you can accomplish. So Katja Hofmann, thank you so much for taking time at the end of your day and the beginning of mine to come on the podcast today. Thank you so much for having me. It was a great pleasure. To learn more about Dr. Katja Hofmann and how Project Malmo is pushing the state of the art in reinforcement learning,

Starting point is 00:31:43 visit Microsoft.com slash research.
