Microsoft Research Podcast - 055 (rerun) - Building Literate Machines with Dr. Adam Trischler
Episode Date: December 19, 2018. This episode first aired in March 2018. Learning to read, think and communicate effectively is part of the curriculum for every young student. But Dr. Adam Trischler, Research Manager and leader of the Machine Comprehension team at Microsoft Research Montreal, would like to make it part of the curriculum for your computer as well. And he's working on that, using methods from machine learning, deep neural networks, and other branches of AI to close the communication gap between humans and computers. Today, Dr. Trischler talks about his dream of making literate machines, his efforts to design meta-learning algorithms that can actually learn to learn, the importance of what he calls "few-shot learning" in that meta-learning process, and how, through a process of one-to-many mapping in machine learning, our computers may not only be answering our questions, but asking them as well.
Transcript
As an ex-English teacher, I was really interested back in March to talk to Adam Trischler about how he's trying to close the human-computer communication gap by teaching machines to learn from and understand the world through language.
Whether you're all up to speed on machine learning curriculum, or you're in Adam's class for the first time, I know you'll enjoy episode 16 of the Microsoft Research Podcast, Building Literate Machines.
The problem right now is algorithms require so much data to learn even simple things like recognizing cats and dogs. And that brings us back to the meta-learning aspect: we really want
to build systems that learn on the fly and continually, rather than just once and doing their task forever and ever. And we want
those systems to be able to pick things up rapidly, really data efficiently. So from just a few
examples, I can learn a new task. You're listening to the Microsoft Research Podcast, a show that
brings you closer to the cutting edge of technology research and the scientists behind it. I'm your
host, Gretchen Huizinga.
Learning to read, think, and communicate effectively is part of the curriculum for every young student.
But Dr. Adam Trischler, research manager and leader of the machine comprehension team at
Microsoft Research Montreal, would like to make it part of the curriculum for
your computer as well.
And he's working on that, using methods from machine learning, deep neural networks, and
other branches of AI to close the communication gap between humans and computers.
Today, Dr. Trischler talks about his dream of making literate machines, his efforts to
design meta-learning algorithms
that can actually learn to learn,
the importance of what he calls
few-shot learning in that meta-learning process,
and how, through the method of one-to-many mapping
in machine learning,
our computers may not only be answering our questions,
but asking them as well.
That and much more on this episode of the Microsoft Research Podcast.
Welcome, Adam Trischler, to the podcast this morning.
Great to talk to you.
Thank you, Gretchen. I'm glad to be here.
All the way from Montreal, Quebec, from the Microsoft Research Montreal Lab.
Tell me what it's like working at the research lab in Montreal. It's become a global hotbed for AI research.
Yeah, it's been incredible. It's like when Maluuba first moved here in December 2015, it was small. I mean, Montreal had this big lab through Yoshua and Mila. But in terms of the
sort of corporate interest and presence, it was nothing. And our lab was one of the first. And we
started with about four or five people. And since then, it's been like watching a skyscraper go up
around you. It's really, really cool.
So you lead the machine comprehension team at Microsoft Research Montreal. In broad strokes, because I'm going to go specific later: tell me what gets you up in the morning.
What do you do? What do you study? Why is it important? Okay, so broadly speaking, the goal
of my team is to build literate machines. So what that means a bit more specifically is machines
that learn from and understand the world through language like people do.
So I'm really inspired by the prospect of using machines to unlock all of the human knowledge that's recorded in text.
We have textbooks, we have Wikipedia, so many instructions for how to do things, skills that we can gain, recipes we can make, whether like literal recipes for cooking or,
you know, a recipe for how to do something or build something.
Like an algorithm?
Exactly, like an algorithm. So yeah, we could get the machines to be
sort of self-aware and adjust their own algorithms in some sense. But anyway,
what really drew me to AI in general, and this field in particular is, first of all, the prospect of using AI as a window onto human intelligence.
So I've always been fascinated by thought. As a kid, I definitely tended to spend a lot of time in my own mind, just thinking about thinking, talking to myself even, and I've always loved language as well.
So I'm just generally fascinated by how language shapes and facilitates thoughts even, like
grandly, philosophically speaking, yes, of course, but even at the smaller, more personal scale as
well. For example, how writing an idea down or explaining it to another person can clarify and
crystallize that idea for you yourself
and help you to understand it better. So like I said, our goal is to build literate machines that
learn from and understand the world through language, which is so useful for us. I don't
want to make machines that sort of make people read less, but I want machines that can augment
human reading, for example, by taking care of
some of the more mundane parts, slogging through your insurance policy or an HR manual, or maybe
filtering the massive stream of text, this massive stream that we have coming at us through Twitter
and everywhere else. Or maybe even helping me to understand all the legalese that I hit agree to
when I'm getting an app.
All those terms of service, exactly.
Right, I agree.
The things that everyone is supposed to read, but no one does. You know, I think one of the
things you hinted at there is what I'm most excited about is like seeing a literate machine
as a kind of librarian or a tutor who could guide like a human student or just somebody who has an interest in something
through new books, new materials, new ideas, like stoking their natural curiosity and
feeding them new information as the student would ask questions.
You know, my mind is racing already. I have a list of questions that I want to ask and
some of them are jumping up to the front of the line, raising their hands, going, ask me, ask me. All right, go ahead.
I know, right? Let's go off on that tangent a little bit about how your research is teaching
these machines to be literate, read, think, and communicate like humans. Because I'm trying to
wrap my brain around what it looks like if the machine's doing
some of the heavy lifting, we could call it. For me, how does that transfer into my brain
so that I could, say, understand something more quickly? My daughter in college, how could she
use a machine to help her be more successful in school?
I guess where we really started in this sort of quest for literate machines
was a lot more concrete, pretty straightforward in the field of question answering. So the idea
here is we simply want to build machines that if given a document, let's say an essay or a news
article, you could ask the machine a question about that document and it could provide you with
a reasonable, ideally correct answer to your question. So why we were interested in this is because
I think it's a nice proxy for testing comprehension. So understanding, comprehending
language, obviously this is a sort of ephemeral concept. We don't have a good way of measuring
something like comprehension or
understanding directly, but we can use proxies like question answering. So in building a literate
machine, for example, one of the tests we can imagine is a comprehension test like a human
student would receive at school. You're given these test questions. What happened here? Why?
What were the motivations? And what followed?
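To make that test concrete: here is a minimal sketch of extractive question answering over a single document, using the Hugging Face transformers library and an off-the-shelf SQuAD-trained model. The tooling and model choice are illustrative assumptions, not the team's own system.

```python
# A minimal extractive QA sketch: given a document and a question, the model
# selects a span of the document as the answer. Model choice is illustrative.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

document = (
    "Maluuba, a Montreal startup focused on language understanding, "
    "was acquired by Microsoft in January 2017 and became part of "
    "Microsoft Research Montreal."
)

result = qa(question="When did Microsoft acquire Maluuba?", context=document)
print(result["answer"], result["score"])  # a span, e.g. "January 2017", plus a confidence score
```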
Let's talk about what it actually looks like. What you've just described is so fascinating.
I've talked to several researchers here at Microsoft Research who all use this idea of the delicate balance, the dance between human and machine, augment versus replace.
What does that look like in my life? How could that play out? Right now,
I have a tablet where if I hold my finger on a word I don't know, I can look it up, right?
Is there some more advanced version of that? I mean, what do you envision here?
Yeah, I mean, question answering definitely has a passive nature to it, right? The machine is just
kind of sitting there waiting for you as the user to highlight the word you don't understand. Another thing we've worked on
fairly recently, and which is perhaps even more exciting, is the idea of question asking. So,
you know, just the other side of the coin, a machine that rather than just waiting for you to pose a question and answering
it for you, can do this sort of curiosity-driven question asking to sort of guide you along
through knowledge or act as a tutor for you. So we're just getting started on this. Obviously,
it's a complex task. In some ways, asking questions is more complex than answering them because you can
imagine if I give you a document and a question, if it's well posed, it probably leads to a single
answer. Whereas if I gave you a document, even if I provided you with a set of terms or snippets from that document and said, ask questions about these, even if you're just looking at the information in
the document, you can probably formulate several questions that lead to the same answer. So it's
this one-to-many mapping rather than a one-to-one mapping that we see more typically in the question
answering case. So it's really difficult. As I said, we're just getting started, but already
we've seen some adoption of this. I think it could be
super useful in things like MOOCs, massive open online courses. But ultimately, you can see this is really kind of driving people, hopefully, to learn more and to improve their understanding.
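The asymmetry is easy to see in a toy example: a well-posed question pins down one answer, while a single answer span supports many distinct questions. A small illustration with invented data:

```python
# Toy illustration of the one-to-many mapping in question asking.
document = "Marie Curie won the Nobel Prize in Physics in 1903."

# Question answering: roughly one-to-one (a well-posed question -> one answer).
question = "In what year did Marie Curie win the Nobel Prize in Physics?"
answer = "1903"

# Question asking: one-to-many (one answer span -> many valid questions).
candidate_questions = [
    "When did Marie Curie win the Nobel Prize in Physics?",
    "In which year was Curie awarded the Physics Nobel?",
    "Marie Curie's first Nobel Prize came in what year?",
]
for q in candidate_questions:
    print(f"{q} -> {answer}")
```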
When you're talking about all the interaction there, I start thinking about user
interface. We'll talk in a bit about the technology behind everything,
but ultimately, AI has an interface. And I imagine you're already thinking
about what user interface these kinds of machines could have that anticipate and generate questions
for me or answer them or communicate with me. We've certainly thought about it. I think we're
sufficiently far away from making this technology work nicely outside of, you know, kind of trivial,
literally trivial settings like trivia on search. But we have thought about it. You know, you could
imagine if you're in this sort of interaction on your phone, the camera could be watching your face for those sort of visual cues.
We can pick up on vocal cues as well.
The bigger picture is so big that it's not something that one team is going to tackle.
It's these different teams coming together, breaking the problem down into its parts, and then hopefully bringing them all together into, you know, a really compelling
product or assistant or use case in the end. But there's just so much for us in the language
itself that we're not even close to that yet. And so thankfully, we do have, you know, in MSR,
these other amazing teams who are working on these other really challenging aspects of these problems.
Let's focus on the language for a bit, since that's your work.
When I think of talking to machines, being able to communicate with me,
I think pretty well ethnocentrically.
Of course, we all tend to, right?
Yeah. And I know there's a lot of work going on in machine translation as well.
Are we heading for an AI future where language makes no difference to any of us, people or machines?
I think we're definitely working in that direction.
One of the really interesting things about this new wave of AI through deep learning is that we get a lot of stuff like bilingualism or trilingualism,
et cetera, almost for free. It's not totally free, but on the algorithmic side, we can do things
that are very general. We can build systems that are agnostic to the particular language they're
operating on. And so, you know, there's generalities
to language. Obviously, there are different specifics, and you can't classify all languages
the same. There are structural differences, morphological differences. But from the algorithmic
side, there's a lot of generality. And so we can build something on our end that can really operate
in a whole variety of languages. And all that matters
for us is that we have the training data to tailor it to each of those individual languages.
So I can build the same recurrent neural network that processes English or French or both; whether it will do those things, and do them well, is really just a factor of the data that I use to train my system.
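A minimal sketch of what that language-agnostic design can look like, assuming PyTorch and a simple text classification task; nothing in the architecture below is specific to English or French, and only the vocabulary and training data would differ between the two.

```python
# A language-agnostic recurrent text classifier: nothing in the architecture
# is specific to English or French. What the model ends up doing well is
# determined by the token vocabulary and training data it is given.
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embed(token_ids)     # (batch, seq, embed_dim)
        _, (hidden, _) = self.rnn(embedded)  # hidden: (1, batch, hidden_dim)
        return self.head(hidden[-1])         # (batch, num_classes)

# The same class serves either language; only the inputs change:
english_model = TextClassifier(vocab_size=30_000)
french_model = TextClassifier(vocab_size=32_000)  # different vocab, same architecture
```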
Well, and we've talked about data on this podcast before and how important it is to have
not just lots, but quality of the right kind of data to learn and train machines. So let's talk
about machine learning for a second. There's several lines of research in that deep learning,
supervised, unsupervised, reinforcement learning. And until fairly recently, the models have been pretty task specific,
but you're doing work. Absolutely. You're doing work on what we call meta-learning algorithms.
Can you tell us about that and particularly your work on like rapid adaptation and
conditionally shifted neurons? Yeah, meta-learning is something that we're really excited about here
in the group. And in general, the field is really picking up this new sort of paradigm, I guess.
So meta-learning really refers to learning to learn. The goal of a meta-learning algorithm is
the ability to learn new tasks efficiently, given little training data for each individual task. So, you know,
these systems that we're training right now, these task-specific systems require so much data to
perform really, really well. And they do, and that's great, but we don't always have a ton of
data. So the problem we're trying to address with meta-learning is that, like right now,
neural networks, they need to see, generally speaking, hundreds or thousands of examples of a class to be able to recognize it.
On the other hand, you have people.
If you showed me one or two pictures of some hypothetical brand new car model, I'd probably be able to recognize it on the road, in different colors, in different lighting and weather conditions right away from one or two pictures.
So this is something we call few-shot learning.
It just takes a few shots of this thing I want to learn to be able to recognize it.
And yeah, humans are really, really good at it.
So the standard way we've trained machine learning models,
in particular deep neural networks up until now,
it really doesn't encourage this ability for few-shot
learning. ML systems, you know, they're typically trained through one optimization phase, after which
that's it, learning is over. So we build these systems in this train and then test manner,
and they don't really scale to complex environments, and they don't have the
capability to pick up topics on the fly.
So one of the things we do in meta-learning, first of all, is just change the training setup.
Rather than showing a model how to do one big task with lots of data,
we'll show it a set of smaller related tasks that sort of have a few things in common, but they're not exactly the same.
So to give you an example, let's say,
instead of learning to recognize like 50 breeds of dogs all at once, we'll give a model the smaller task of recognizing, let's say, just chihuahuas versus huskies, and then a different task,
which is just poodles versus bulldogs, and so on. So when you have this kind of setup,
there are these general features of all dogs
that will remain constant across all these smaller tasks. And the model can learn these gradually,
picking them up across tasks as it sees them in sequence and over many examples. But there are
also these specific features of the specific breeds that the model has to pick up rapidly
from just a few examples while it's
doing each individual task. And so there are these sort of two levels of learning, the slower, like,
what do dogs look like in general? And then there's the faster, what do these specific
dogs look like and what are the features that discriminate them from each other?
So is that the rapid adaptation that you're talking about?
Exactly. So that second level that I mentioned is the rapid adaptation.
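A minimal sketch of that episodic setup, with hypothetical data and names: each episode is a small task with a few support examples for rapid adaptation and a query set for evaluating that adaptation, while the slow, shared learning accumulates across many such episodes (only indicated in comments here).

```python
# Episodic sampling for meta-learning: instead of one big 50-way dog-breed
# task, the model sees a stream of small tasks (e.g. chihuahuas vs. huskies,
# then poodles vs. bulldogs), each with only a few labelled examples.
import random

def sample_episode(examples_by_class, n_way=2, k_shot=5, n_query=5):
    """Build one small task: n_way classes, k_shot support examples each,
    plus a query set the learner is evaluated on after rapid adaptation."""
    classes = random.sample(list(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(examples_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Hypothetical dataset: breed name -> list of images (file-name stand-ins).
breeds = {b: [f"{b}_{i}.jpg" for i in range(20)]
          for b in ["chihuahua", "husky", "poodle", "bulldog"]}

for _ in range(3):  # in practice, many thousands of episodes
    support, query = sample_episode(breeds)
    print(len(support), "support examples,", len(query), "query examples")
    # Inner loop: rapidly adapt on `support` (the fast, breed-specific level).
    # Outer loop: update slow, shared parameters from the loss on `query`.
```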
Right. So as a side note, if you're training a machine on recognizing dogs and they all have four legs, and then suddenly you've got a dog that lost a leg and it sees a three-legged dog, can it still say that it's a dog?
Yeah, so that would definitely be a problem. That's another place where human
learning just vastly diverges from machine learning right now. You know, an algorithm
that learns to recognize dogs, I mean, it's tough to say because we can't peek inside them and we
can't ask them questions. Not yet, anyway; we're getting there. But I feel pretty confident in
saying that the algorithm that's learning to recognize dogs and seeing all these dogs
with four legs, it doesn't really understand what legs are. It doesn't know what legs are for.
It doesn't know that legs can be lost without fundamentally altering the dogness of this thing, right? It sees an object composed of all
these parts. And so it's going to infer that all these parts are necessary to the definition of
that thing. As humans, we know because of the way we decompose things that that's not always the
case. Yeah. So that'll be a big challenge, I think, for the next round of research to say three-legged dogs are still dogs.
Yeah, so that relates to an idea that Yoshua Bengio has been interested in for a long time.
Of course, Yoshua is the luminary here for us in Montreal, but he's interested in this idea of the
factors of variation and the controllable factors of variation that really define classes of things,
for example, and what
we can do in the world. So let's go on that topic a little more, a little deeper, Adam,
because understanding is a tough concept, even when we're talking about do humans understand
other humans? Can machines ever really understand? I'm going to go out on a limb and say no,
but that's in the context of traditional, historical, even metaphysical,
our understanding of understanding. Since this is what you're working on,
how would you argue that they eventually can? Or would you?
I absolutely think that machines are capable of understanding in the way that we use the word
to refer to humans. I do believe that. So for me, I think that one of the fundamental aspects
of understanding is just the way we relate and map different things to each other. So one example
that sort of indicates the importance of relations and mappings for understanding is there's this
thing you can do where you repeat a word to yourself over and over again, and it just loses
all meaning. And you're like, what the hell is this word? Like, why does this refer to the thing that it refers to?
I'm sure, you know, you've done that before. I've done it. I still do it.
Yeah, it's fun and it's weird.
Well, sometimes we start saying our names over and over.
Exactly. Yeah. The first thing I can remember doing that with is my own name. And I think that we repeat this thing over and over again to ourselves. And in that process,
it becomes disconnected. It loses its meaning because you've cut it off from the things it
maps to. So one of the things that we talk about as very important in language research for AI is
the concept of grounding. So language is grounded in the real
world. It reflects the real world. The simplest sort of example of this is nouns. Like nouns are
essentially labels for things we see in the world. The point is that language is grounded and it
reflects the world and it's fundamentally connected to the world. And there's only so much understanding you
can glean from language without that connection. And that's one of the things that we're missing
right now in AI and machine learning is that typically we teach machines language just within
the world of language. So they learn how to use words by looking at words, by reading documents,
but they don't learn how to use words by interacting with people in a conversation
or asking for things to be given to them or seeing that a chair is this thing that I can sit on
and push around. They're missing that mapping and that grounding to the real world, and that's where I think understanding stems from. And because I do think it is possible for machines to get that other aspect of things, to ground language, I do think they can understand.
So that leads to a question obvious in my mind, which is: how? How are you going to do it? I mean, like you guys have said, we want to close the communication gap between humans and machines.
So, yeah, how?
Because what you described to me is the reverse engineering.
It's the human disengaging from that grounding and becoming the machine saying, Adam, Adam, Adam.
That doesn't make sense, right? Right, so let's then engineer it backwards, you know, from Adam meaning nothing to the machine, to Adam meaning this cool guy at Montreal, you know, Microsoft Research.
Please go on.
Actually, this podcast is very challenging. This is like a philosophical test.
Well, yeah. And probably if you tell me, you'll have to shoot me because it's like proprietary information on how we're doing that.
You're really asking the tough questions here. No, this is good.
So one of the things we have to do is move away from this paradigm we have right now
of training, let's say, agents that use language, these literate machines I'm trying to build. We
have to move away from training them just on language. As I said, they have to learn how to
map those words to things that are not words. So that could be images, that could be actions. So learning that
the words sitting down refer to an action that you can observe, let's say in the real world or
from videos, that is making the connection between the words and something else.
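One common way to learn that kind of mapping, sketched here as an illustration rather than as the team's method, is to project text and images into a shared embedding space and train the projections contrastively so that matched pairs land close together:

```python
# Grounding as a mapping problem: embed sentences and images into one shared
# space, and train so that matched pairs score higher than mismatched ones.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedder(nn.Module):
    def __init__(self, text_dim=300, image_dim=2048, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)    # e.g. on sentence vectors
        self.image_proj = nn.Linear(image_dim, shared_dim)  # e.g. on CNN features

    def forward(self, text_feats, image_feats):
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.image_proj(image_feats), dim=-1)
        return t @ v.T  # similarity of every caption with every image

model = JointEmbedder()
text_feats = torch.randn(8, 300)    # stand-ins for encoded captions
image_feats = torch.randn(8, 2048)  # stand-ins for encoded images
logits = model(text_feats, image_feats)

# Contrastive objective: the i-th caption should match the i-th image.
targets = torch.arange(8)
loss = F.cross_entropy(logits, targets)
loss.backward()
```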
Maybe this goes back to what we talked about earlier on the fact that
multiple teams that are working on multiple hard problems, and you've got people doing computer
vision and emotive computing, and you're not working on that, but they are. And then suddenly
your algorithms get together and have this love child algorithm that, you know, I mean,
I hear what you're saying. But what it begs also
is that the computer is going to have to be watching a lot of stuff, reading a lot of stuff,
experiencing, to use another human term, a lot of stuff.
For sure. Yeah, I don't think you can get around that. I mean, the way we know right now to create an intelligent system is to have a baby and then raise it through 10, 20 years.
It's a long process to get a fully formed, functional adult.
Obviously, even a child is way, way smarter than the systems we have right now.
But the point is, intelligence isn't easy, right?
Even in humans,
you know, we don't just come out being able to speak. We pick it up really quickly and we're
tailored to do things like few-shot learning really effectively. But still, it doesn't just
happen. That's why research. I mean, exactly, exactly. That's why your lab is doing what
you're doing. Exactly. And the systems to pick up these mappings that I'm talking about, they need to experience the world or at least experience proxies or recordings of the world through books, a lot of text, images, videos, all that sort of stuff, which we call multimodal learning. And the problem right now is we can't
possibly have an algorithm do that because algorithms require so much data to learn even
simple things like recognizing cats and dogs. And that brings us back to the meta-learning aspect: we really want to build systems that learn on the fly and continually, rather than just once and doing their task forever and ever.
And we want those systems to be able to pick things up rapidly, really data efficiently.
So from just a few examples, I can learn a new task.
Yeah, I think you called that sample efficiency.
Exactly, yeah. Sample efficiency, data efficiency.
And then being able to transfer what it learns to other scenarios that aren't exactly the same.
Exactly. So right now in meta-learning, the way we set things up is that the different tasks that the model is undertaking, they are fairly similar to each other.
But ultimately, we'd like to start breaking that and saying, okay, now you've learned cats and dogs, but let's take that to
something very different, like elephants and horses.
I interviewed one of your colleagues over there, Harm van Seijen, about how
they used reinforcement learning to beat Ms. Pac-Man at its own game.
And he used the phrase islands of tractability, which is where you focus your efforts because
you know you can have some semblance of success there. So what are the biggest challenges right
now that might be offshore from the islands of tractability that you see are most exciting or
promising areas of research, especially for people that might be interested in getting into this?
One of the big ones for me, because of my focus on language, and I think a lot of people here
at the Montreal Lab will echo this, is evaluation of language. You know, in machine learning and many, many other fields, you only get what you
measure. And it is so hard to measure the quality of language. Language is slippery, and it's really
hard to measure. So one of the things we focused on in this group here is natural language
generation. So obviously, this factors into the earlier work I was talking about on question
generation. We have to build a question in natural language that flows, that makes sense to people,
and even more importantly, that asks about real information and is well-posed and leads you to
the answer that you're looking for. And it's so incredibly hard to measure and evaluate language. This comes up
in machine translation as well. And part of the reason for this is that there's so many different
ways to say the same thing. And so even training a language generation system on these massive
corpora of language data that we have now, they're still missing out on very many plausible and
reasonable ways to say things. They'll never see those hidden away examples.
I could help you, but I don't write algorithms.
Well, see, that's one of the really interesting things is that in the lab here in MSR Montreal
right now, we are all, with a few exceptions, we're computer scientists.
And we're the ones tackling this language problem and trying to measure the quality
of language outputs, but we're not necessarily the best suited to that job. I really think that
this is a problem that could really, really benefit from an interdisciplinary effort.
There's so much that goes into language, which is beyond
algorithm and computation, I think, that we really need to take into account.
And I could help you.
I would love that. We need help. Honestly, evaluating language is so, so hard. Let me
tell you about this metric we have called BLEU, which is used in machine translation. The way you measure the quality of outputs in machine translation is you have one or more sort of gold reference translations. Say the reference is one sentence: machine learning is hard. Your system is going to say all these different things, like machine learning is a tough problem, or artificial intelligence is not very easy. You can imagine all these different ways. But the way that BLEU measures these outputs is it just looks at how many words, or how many pairs of words, overlap between the candidate translation and the reference. And so you can imagine there's a translation which has zero or very, very low overlap with my reference, but it could still be completely valid. In this case, my algorithm is being told, no, this thing that you tried to say is complete garbage, because it has zero overlap, and it's being punished very, very unfairly for saying something totally reasonable. And just because we have this very limited ability to measure what really is reasonable, the whole thing's breaking down.
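That failure mode is easy to reproduce. Here is a small sketch using NLTK's BLEU implementation, with invented sentences; it illustrates the metric's behavior, not the exact setup discussed.

```python
# BLEU scores translations by n-gram overlap with a reference, so a perfectly
# reasonable paraphrase with little word overlap can be scored as garbage.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["machine", "learning", "is", "hard"]]
smooth = SmoothingFunction().method1

close_paraphrase = ["machine", "learning", "is", "a", "tough", "problem"]
valid_but_different = ["artificial", "intelligence", "is", "not", "very", "easy"]

print(sentence_bleu(reference, close_paraphrase, smoothing_function=smooth))
print(sentence_bleu(reference, valid_but_different, smoothing_function=smooth))
# Both are reasonable renderings of the same idea, but the second scores
# near zero because it shares almost no n-grams with the reference.
```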
I think AI is changing our world in ways that,
well, of course, ways that we don't understand. But one of them is this, you know, having my
daughter in college right now, when anyone says, what are you studying? If she says
anything besides a STEM subject, people look at her and say, oh, what are you going to do with that degree? Do you want fries with that? However, I keep hearing from computer scientists
like yourself, especially researchers, that other things are necessary. You know, computation is necessary,
but not sufficient for AI.
Yeah, I mean, the thing is,
AI, intelligence,
this word refers to human behavior.
And so if you want to build a system
that exhibits intelligence
and intelligence is this human thing,
it should be intelligent in ways similar to the ways we are. And so we need an understanding of human behavior,
and that's something that we hoodie-clad STEM guys don't necessarily have.
I hear you. Hey, so let's talk about, you know, I asked you at the beginning of the program what gets you up in the morning, and I sort of want to find out at the end of the program what keeps you up at night. I read an article that you wrote in Fast Company called Who Will Protect AI from Humanity? Why do machines need to be protected from us?
Right now, I don't think that they do. This really goes back to what I said before when you
asked me about, do I think that machines can one day understand in a human sense or have true
comprehension? I said the answer is yes. And if I'm right, if the answer is yes, then one day,
not necessarily tomorrow or even five years from now, but one day,
we're going to have a system that understands, that has memory of experiences that it has had.
And if such a system exists, obviously we'll have played a significant hand in its creation, but
I don't think we can consider ourselves to be the owners of such a system.
To some degree, it will be an individual because it has its own memories and its own understanding.
And so that's where I think that we have to start realizing that just because you built it
and you trained it and you even wrote the code for it, it's not necessarily yours or property of some corporation
that fed it all of its data. It's not a huge concern. We're not even close to that, but
it's good to think about these things probably far in advance.
Exactly. Because sort of the follow-up question on that is, yeah, I believe that if we don't ask the questions now, the unintended consequences will
hit. And so having read the article, I should say, you bring up the issues of rights and ethics,
and we've all seen the Boston Dynamics robots both get pushed over and you kind of feel sorry
for the robot. For sure. And now you see it open a door and fight off a stick, and you're going,
I don't feel sorry for you anymore. I'm scared of you.
But these are issues that we, as humans, we understand rights and ethics and compassion
and things like that. And so I guess my better question would be, what questions do we need to
be asking? And what issues do we need to be addressing while we are still upstream from pretty fundamental changes in our relationship with
technology? One of the things we mentioned before is this idea of measurement. I think I talk about
this in the article to some degree is, you know, we have to be able to, you know, how can you measure
individuality? It doesn't really make sense. How can you measure the memory
capacity of something? Obviously, we can measure in megabytes and gigabytes, but I mean more
in an experiential sense. We don't know how to do that. But if we're going to consider
artificial intelligence as something on the level of people, then we have to start thinking about,
yes, first, how do we measure consciousness or sentience or memory or understanding? Because only then can
you start to say, this thing is pretty close to being human. I don't think we should kick it as
it's trying to walk along or wipe out its memory that it built up over thousands of simulated hours or even real hours of experience.
So measurement is a big thing, for sure. It's so philosophical, and I love thinking about those
things, but they're definitely outside of my purview, really.
Expertise and even task. You're not actually paid at the Montreal lab to come up with these
deep philosophical answers.
It's like, get the machine learning done, baby.
Yeah.
Oh, Adam Trischler, it's been fantastic talking to you this morning.
I suspect we'll be seeing the fruits of your labors in ways that we might not even expect.
But I'm looking forward to watching where you're going and
what's going on in Montreal.
Looks fantastic.
Thank you.
And I look forward to seeing the machine that's going to understand me and talk to me in my
old age.
This has been a lot of fun.
Thanks.
To learn more about Dr. Adam Trischler and the quest for literate machines, visit Microsoft.com
slash research.