Microsoft Research Podcast - 055 (rerun) - Building Literate Machines with Dr. Adam Trischler
Episode Date: December 19, 2018. This episode first aired in March 2018. Learning to read, think and communicate effectively is part of the curriculum for every young student. But Dr. Adam Trischler, Research Manager and leader of the Machine Comprehension team at Microsoft Research Montreal, would like to make it part of the curriculum for your computer as well. And he's working on that, using methods from machine learning, deep neural networks, and other branches of AI to close the communication gap between humans and computers. Today, Dr. Trischler talks about his dream of making literate machines, his efforts to design meta-learning algorithms that can actually learn to learn, the importance of what he calls "few-shot learning" in that meta-learning process, and how, through a process of one-to-many mapping in machine learning, our computers may not only be answering our questions, but asking them as well.
Transcript
As an ex-English teacher, I was really interested back in March to talk to Adam Trischler about how he's trying to close the human-computer communication gap by teaching machines to learn from and understand the world through language.
Whether you're all up to speed on machine learning curriculum, or you're in Adam's class for the first time, I know you'll enjoy episode 16 of the Microsoft Research Podcast, Building Literate Machines.
The problem right now is algorithms require so much data to learn even simple things like recognizing cats and dogs. And that brings us back to the meta-learning aspect: we really want
to build systems that learn on the fly and continually, rather than just once and doing their task forever and ever. And we want
those systems to be able to pick things up rapidly, really data efficiently. So from just a few
examples, I can learn a new task. You're listening to the Microsoft Research Podcast, a show that
brings you closer to the cutting edge of technology research and the scientists behind it. I'm your
host, Gretchen Huizinga.
Learning to read, think, and communicate effectively is part of the curriculum for every young student.
But Dr. Adam Trischler, research manager and leader of the machine comprehension team at
Microsoft Research Montreal, would like to make it part of the curriculum for
your computer as well.
And he's working on that, using methods from machine learning, deep neural networks, and
other branches of AI to close the communication gap between humans and computers.
Today, Dr. Trischler talks about his dream of making literate machines, his efforts to
design meta-learning algorithms
that can actually learn to learn,
the importance of what he calls
few-shot learning in that meta-learning process,
and how, through the method of one-to-many mapping
in machine learning,
our computers may not only be answering our questions,
but asking them as well.
That and much more on this episode of the Microsoft Research Podcast.
Welcome, Adam Trischler, to the podcast this morning.
Great to talk to you.
Thank you, Gretchen. I'm glad to be here.
All the way from Montreal, Quebec, from the Microsoft Research Montreal Lab.
Tell me what it's like working at the research lab in Montreal. It's become a global hotbed for AI research.
Yeah, it's been incredible. It's like when Maluuba first moved here in December 2015, it was small. I mean, Montreal had this big lab through Yoshua and Mila. But in terms of the
sort of corporate interest and presence, it was nothing. And our lab was one of the first. And we
started with about four or five people. And since then, it's been like watching a skyscraper go up
around you. It's really, really cool.
So you lead the machine comprehension team at Microsoft Research Montreal. In broad strokes, because I'm going to go specific later: tell me what gets you up in the morning.
What do you do? What do you study? Why is it important? Okay, so broadly speaking, the goal
of my team is to build literate machines. So what that means a bit more specifically is machines
that learn from and understand the world through language like people do.
So I'm really inspired by the prospect of using machines to unlock all of the human knowledge that's recorded in text.
We have textbooks, we have Wikipedia, so many instructions for how to do things, skills that we can gain, recipes we can make, whether like literal recipes for cooking or,
you know, a recipe for how to do something or build something.
Like an algorithm?
Exactly, like an algorithm. So yeah, we could get the machines to be
sort of self-aware and adjust their own algorithms in some sense. But anyway,
what really drew me to AI in general, and this field in particular is, first of all, the prospect of using AI as a window onto human intelligence.
So I've always been fascinated by thought. As a kid, I definitely tended to spend a lot of time in my own mind, just thinking about thinking, talking to myself even, and I've always loved language as well.
So I'm just generally fascinated by how language shapes and facilitates thoughts even, like
grandly, philosophically speaking, yes, of course, but even at the smaller, more personal scale as
well. For example, how writing an idea down or explaining it to another person can clarify and
crystallize that idea for you yourself
and help you to understand it better. So like I said, our goal is to build literate machines that
learn from and understand the world through language, which is so useful for us. I don't
want to make machines that sort of make people read less, but I want machines that can augment
human reading, for example, by taking care of
some of the more mundane parts, slogging through your insurance policy or an HR manual, or maybe
filtering the massive stream of text, this massive stream that we have coming at us through Twitter
and everywhere else. Or maybe even helping me to understand all the legalese that I hit agree to
when I'm getting an app.
All those terms of service, exactly.
Right, I agree.
The things that everyone is supposed to read, but no one does. You know, I think one of the
things you hinted at there is what I'm most excited about is like seeing a literate machine
as a kind of librarian or a tutor who could guide like a human student or just somebody who has an interest in something
through new books, new materials, new ideas, like stoking their natural curiosity and
feeding them new information as the student would ask questions.
You know, my mind is racing already. I have a list of questions that I want to ask and
some of them are jumping up to the front of the line, raising their hands, going, ask me, ask me. All right, go ahead.
I know, right? Let's go off on that tangent a little bit about how your research is teaching
these machines to be literate, read, think, and communicate like humans. Because I'm trying to
wrap my brain around what it looks like if the machine's doing
some of the heavy lifting, we could call it. For me, how does that transfer into my brain
so that I could, say, understand something more quickly? My daughter in college, how could she
use a machine to help her be more successful in school?
I guess where we really started in this sort of quest for literate machines
was a lot more concrete, pretty straightforward in the field of question answering. So the idea
here is we simply want to build machines that if given a document, let's say an essay or a news
article, you could ask the machine a question about that document and it could provide you with
a reasonable, ideally correct answer to your question. So why we were interested in this is because
I think it's a nice proxy for testing comprehension. So understanding, comprehending
language, obviously this is a sort of ephemeral concept. We don't have a good way of measuring
something like comprehension or
understanding directly, but we can use proxies like question answering. So in building a literate
machine, for example, one of the tests we can imagine is a comprehension test like a human
student would receive at school. You're given these test questions. What happened here? Why?
What were the motivations? And what followed?
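To make that test concrete: here is a minimal sketch of extractive question answering over a single document, using the Hugging Face transformers library and an off-the-shelf SQuAD-trained model. The tooling and model choice are illustrative assumptions, not the team's own system.

```python
# A minimal extractive QA sketch: given a document and a question, the model
# selects a span of the document as the answer. Model choice is illustrative.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

document = (
    "Maluuba, a Montreal startup focused on language understanding, "
    "was acquired by Microsoft in January 2017 and became part of "
    "Microsoft Research Montreal."
)

result = qa(question="When did Microsoft acquire Maluuba?", context=document)
print(result["answer"], result["score"])  # a span, e.g. "January 2017", plus a confidence score
```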
Let's talk about what it actually looks like. What you've just described is so fascinating.
I've talked to several researchers here at Microsoft Research who all use this idea of the delicate balance, the dance between human and machine, augment versus replace.
What does that look like in my life? How could that play out? Right now,
I have a tablet where if I hold my finger on a word I don't know, I can look it up, right?
Is there some more advanced version of that? I mean, what do you envision here?
Yeah, I mean, question answering definitely has a passive nature to it, right? The machine is just
kind of sitting there waiting for you as the user to highlight the word you don't understand. Another thing we've worked on
fairly recently, and which is perhaps even more exciting, is the idea of question asking. So,
you know, just the other side of the coin, a machine that rather than just waiting for you to pose a question and answering
it for you, can do this sort of curiosity-driven question asking to sort of guide you along
through knowledge or act as a tutor for you. So we're just getting started on this. Obviously,
it's a complex task. In some ways, asking questions is more complex than answering them because you can
imagine if I give you a document and a question, if it's well posed, it probably leads to a single
answer. Whereas if I gave you a document, even if I provided you with a set of terms or snippets from that document and said, ask questions about these, even if you're just looking at the information in
the document, you can probably formulate several questions that lead to the same answer. So it's
this one-to-many mapping rather than a one-to-one mapping that we see more typically in the question
answering case. So it's really difficult. As I said, we're just getting started, but already
we've seen some adoption of this. I think it could be
super useful in things like MOOCs, massive open online courses. But ultimately, you can see this is really kind of driving people, hopefully, to learn more and to improve their understanding.
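The asymmetry is easy to see in a toy example: a well-posed question pins down one answer, while a single answer span supports many distinct questions. A small illustration with invented data:

```python
# Toy illustration of the one-to-many mapping in question asking.
document = "Marie Curie won the Nobel Prize in Physics in 1903."

# Question answering: roughly one-to-one (a well-posed question -> one answer).
question = "In what year did Marie Curie win the Nobel Prize in Physics?"
answer = "1903"

# Question asking: one-to-many (one answer span -> many valid questions).
candidate_questions = [
    "When did Marie Curie win the Nobel Prize in Physics?",
    "In which year was Curie awarded the Physics Nobel?",
    "Marie Curie's first Nobel Prize came in what year?",
]
for q in candidate_questions:
    print(f"{q} -> {answer}")
```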
When you're talking about all the interaction there, I start thinking about user
interface. We'll talk in a bit about the technology behind everything,
but ultimately, AI has an interface. And I imagine you're already thinking
about what user interface these kinds of machines could have that anticipate and generate questions
for me or answer them or communicate with me. We've certainly thought about it. I think we're
sufficiently far away from making this technology work nicely outside of, you know, kind of trivial,
literally trivial settings like trivia on search. But we have thought about it. You know, you could
imagine if you're in this sort of interaction on your phone, the camera could be watching your face for those sort of visual cues.
We can pick up on vocal cues as well.
The bigger picture is so big that it's not something that one team is going to tackle.
It's these different teams coming together, breaking the problem down into its parts, and then hopefully bringing them all together into, you know, a really compelling
product or assistant or use case in the end. But there's just so much for us in the language
itself that we're not even close to that yet. And so thankfully, we do have, you know, in MSR,
these other amazing teams who are working on these other really challenging aspects of these problems.
Let's focus on the language for a bit, since that's your work.
When I think of talking to machines, being able to communicate with me,
I think pretty well ethnocentrically.
Of course, we all tend to, right?
Yeah. And I know there's a lot of work going on in machine translation as well.
Are we heading for an AI future where language makes no difference to any of us, people or machines?
I think we're definitely working in that direction.
One of the really interesting things about this new wave of AI through deep learning is that we get a lot of stuff like bilingualism or trilingualism,
et cetera, almost for free. It's not totally free, but on the algorithmic side, we can do things
that are very general. We can build systems that are agnostic to the particular language they're
operating on. And so, you know, there's generalities
to language. Obviously, there are different specifics, and you can't classify all languages
the same. There are structural differences, morphological differences. But from the algorithmic
side, there's a lot of generality. And so we can build something on our end that can really operate
in a whole variety of languages. And all that matters
for us is that we have the training data to tailor it to each of those individual languages.
So I can build the same recurrent neural network that processes English or French or both; whether it will do those things, and do them well, is really just a factor of the data that I use to train my system.
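A minimal sketch of what that language-agnostic design can look like, assuming PyTorch and a simple text classification task; nothing in the architecture below is specific to English or French, and only the vocabulary and training data would differ between the two.

```python
# A language-agnostic recurrent text classifier: nothing in the architecture
# is specific to English or French. What the model ends up doing well is
# determined by the token vocabulary and training data it is given.
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embed(token_ids)     # (batch, seq, embed_dim)
        _, (hidden, _) = self.rnn(embedded)  # hidden: (1, batch, hidden_dim)
        return self.head(hidden[-1])         # (batch, num_classes)

# The same class serves either language; only the inputs change:
english_model = TextClassifier(vocab_size=30_000)
french_model = TextClassifier(vocab_size=32_000)  # different vocab, same architecture
```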
Well, and we've talked about data on this podcast before and how important it is to have
not just lots, but quality of the right kind of data to learn and train machines. So let's talk
about machine learning for a second. There's several lines of research in that deep learning,
supervised, unsupervised, reinforcement learning. And until fairly recently, the models have been pretty task specific,
but you're doing work. Absolutely. You're doing work on what we call meta-learning algorithms.
Can you tell us about that and particularly your work on like rapid adaptation and
conditionally shifted neurons? Yeah, meta-learning is something that we're really excited about here
in the group. And in general, the field is really picking up this new sort of paradigm, I guess.
So meta-learning really refers to learning to learn. The goal of a meta-learning algorithm is
the ability to learn new tasks efficiently, given little training data for each individual task. So, you know,
these systems that we're training right now, these task-specific systems require so much data to
perform really, really well. And they do, and that's great, but we don't always have a ton of
data. So the problem we're trying to address with meta-learning is that, like right now,
neural networks, they need to see, generally speaking, hundreds or thousands of examples of a class to be able to recognize it.
On the other hand, you have people.
If you showed me one or two pictures of some hypothetical brand new car model, I'd probably be able to recognize it on the road, in different colors, in different lighting and weather conditions right away from one or two pictures.
So this is something we call few-shot learning.
It just takes a few shots of this thing I want to learn to be able to recognize it.
And yeah, humans are really, really good at it.
So the standard way we've trained machine learning models,
in particular deep neural networks up until now,
it really doesn't encourage this ability for few-shot
learning. ML systems, you know, they're typically trained through one optimization phase, after which
that's it, learning is over. So we build these systems in this train and then test manner,
and they don't really scale to complex environments, and they don't have the
capability to pick up topics on the fly.
So one of the things we do in meta-learning, first of all, is just change the training setup.
Rather than showing a model how to do one big task with lots of data,
we'll show it a set of smaller related tasks that sort of have a few things in common, but they're not exactly the same.
So to give you an example, let's say,
instead of learning to recognize like 50 breeds of dogs all at once, we'll give a model the smaller task of recognizing, let's say, just chihuahuas versus huskies, and then a different task,
which is just poodles versus bulldogs, and so on. So when you have this kind of setup,
there are these general features of all dogs
that will remain constant across all these smaller tasks. And the model can learn these gradually,
picking them up across tasks as it sees them in sequence and over many examples. But there are
also these specific features of the specific breeds that the model has to pick up rapidly
from just a few examples while it's
doing each individual task. And so there are these sort of two levels of learning, the slower, like,
what do dogs look like in general? And then there's the faster, what do these specific
dogs look like and what are the features that discriminate them from each other?
So is that the rapid adaptation that you're talking about?
Exactly. So that second level that I mentioned is the rapid adaptation.
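A minimal sketch of that episodic setup, with hypothetical data and names: each episode is a small task with a few support examples for rapid adaptation and a query set for evaluating that adaptation, while the slow, shared learning accumulates across many such episodes (only indicated in comments here).

```python
# Episodic sampling for meta-learning: instead of one big 50-way dog-breed
# task, the model sees a stream of small tasks (e.g. chihuahuas vs. huskies,
# then poodles vs. bulldogs), each with only a few labelled examples.
import random

def sample_episode(examples_by_class, n_way=2, k_shot=5, n_query=5):
    """Build one small task: n_way classes, k_shot support examples each,
    plus a query set the learner is evaluated on after rapid adaptation."""
    classes = random.sample(list(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(examples_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Hypothetical dataset: breed name -> list of images (file-name stand-ins).
breeds = {b: [f"{b}_{i}.jpg" for i in range(20)]
          for b in ["chihuahua", "husky", "poodle", "bulldog"]}

for _ in range(3):  # in practice, many thousands of episodes
    support, query = sample_episode(breeds)
    print(len(support), "support examples,", len(query), "query examples")
    # Inner loop: rapidly adapt on `support` (the fast, breed-specific level).
    # Outer loop: update slow, shared parameters from the loss on `query`.
```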
Right. So as a side note, if you're training a machine on recognizing dogs and they all have four legs, and then suddenly you've got a dog that lost a leg and it sees a three-legged dog, can it still say that it's a dog?
Yeah, so that would definitely be a problem. That's another place where human
learning just vastly diverges from machine learning right now. You know, an algorithm
that learns to recognize dogs, I mean, it's tough to say because we can't peek inside them and we
can't ask them questions. Not yet, anyway; we're getting there. But I feel pretty confident in
saying that the algorithm that's learning to recognize dogs and seeing all these dogs
with four legs, it doesn't really understand what legs are. It doesn't know what legs are for.
It doesn't know that legs can be lost without fundamentally altering the dogness of this thing, right? It sees an object composed of all
these parts. And so it's going to infer that all these parts are necessary to the definition of
that thing. As humans, we know because of the way we decompose things that that's not always the
case. Yeah. So that'll be a big challenge, I think, for the next round of research to say three-legged dogs are still dogs.
Yeah, so that relates to an idea that Yoshua Bengio has been interested in for a long time.
Of course, Yoshua is the luminary here for us in Montreal, but he's interested in this idea of the
factors of variation and the controllable factors of variation that really define classes of things,
for example, and what
we can do in the world. So let's go on that topic a little more, a little deeper, Adam,
because understanding is a tough concept, even when we're talking about do humans understand
other humans? Can machines ever really understand? I'm going to go out on a limb and say no,
but that's in the context of traditional, historical, even metaphysical,
our understanding of understanding. Since this is what you're working on,
how would you argue that they eventually can? Or would you?
I absolutely think that machines are capable of understanding in the way that we use the word
to refer to humans. I do believe that. So for me, I think that one of the fundamental aspects
of understanding is just the way we relate and map different things to each other. So one example
that sort of indicates the importance of relations and mappings for understanding is there's this
thing you can do where you repeat a word to yourself over and over again, and it just loses
all meaning. And you're like, what the hell is this word? Like, why does this refer to the thing that it refers to?
I'm sure, you know, you've done that before. I've done it. I still do it.
Yeah, it's fun and it's weird.
Well, sometimes we start saying our names over and over.
Exactly. Yeah. The first thing I can remember doing that with is my own name. And I think that we repeat this thing over and over again to ourselves. And in that process,
it becomes disconnected. It loses its meaning because you've cut it off from the things it
maps to. So one of the things that we talk about as very important in language research for AI is
the concept of grounding. So language is grounded in the real
world. It reflects the real world. The simplest sort of example of this is nouns. Like nouns are
essentially labels for things we see in the world. The point is that language is grounded and it
reflects the world and it's fundamentally connected to the world. And there's only so much understanding you
can glean from language without that connection. And that's one of the things that we're missing
right now in AI and machine learning is that typically we teach machines language just within
the world of language. So they learn how to use words by looking at words, by reading documents,
but they don't learn how to use words by interacting with people in a conversation
or asking for things to be given to them or seeing that a chair is this thing that I can sit on
and push around. They're missing that mapping and that grounding to the real world, and that's where I think understanding stems from. And because I do think it is possible for machines to get that other aspect of things, to ground language, I do think they can understand.
So that leads to a question obvious in my mind, which is: how? How are you going to do it? I mean, like you guys have said, we want to close the communication gap between humans and machines.
So, yeah, how?
Because what you described to me is the reverse engineering.
It's the human disengaging from that grounding and becoming the machine saying, Adam, Adam, Adam.
That doesn't make sense, right? Right, so let's then engineer it backwards, you know, from Adam meaning nothing to the machine, to Adam meaning this cool guy at Montreal, you know, Microsoft Research.
Please go on.
Actually, this podcast is very challenging. This is like a philosophical test.
Well, yeah. And probably if you tell me, you'll have to shoot me because it's like proprietary information on how we're doing that.
You're really asking the tough questions here. No, this is good.
So one of the things we have to do is move away from this paradigm we have right now
of training, let's say, agents that use language, these literate machines I'm trying to build. We
have to move away from training them just on language. As I said, they have to learn how to
map those words to things that are not words. So that could be images, that could be actions. So learning that
the words sitting down refer to an action that you can observe, let's say in the real world or
from videos, that is making the connection between the words and something else.
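One common way to learn that kind of mapping, sketched here as an illustration rather than as the team's method, is to project text and images into a shared embedding space and train the projections contrastively so that matched pairs land close together:

```python
# Grounding as a mapping problem: embed sentences and images into one shared
# space, and train so that matched pairs score higher than mismatched ones.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedder(nn.Module):
    def __init__(self, text_dim=300, image_dim=2048, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)    # e.g. on sentence vectors
        self.image_proj = nn.Linear(image_dim, shared_dim)  # e.g. on CNN features

    def forward(self, text_feats, image_feats):
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.image_proj(image_feats), dim=-1)
        return t @ v.T  # similarity of every caption with every image

model = JointEmbedder()
text_feats = torch.randn(8, 300)    # stand-ins for encoded captions
image_feats = torch.randn(8, 2048)  # stand-ins for encoded images
logits = model(text_feats, image_feats)

# Contrastive objective: the i-th caption should match the i-th image.
targets = torch.arange(8)
loss = F.cross_entropy(logits, targets)
loss.backward()
```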
Maybe this goes back to what we talked about earlier on the fact that
multiple teams that are working on multiple hard problems, and you've got people doing computer
vision and emotive computing, and you're not working on that, but they are. And then suddenly
your algorithms get together and have this love child algorithm that, you know, I mean,
I hear what you're saying. But what it begs also
is that the computer is going to have to be watching a lot of stuff, reading a lot of stuff,
experiencing, to use another human term, a lot of stuff.
For sure. Yeah, I don't think you can get around that. I mean, the way we know right now to create an intelligent system is to have a baby and then raise it through 10, 20 years.
It's a long process to get a fully formed, functional adult.
Obviously, even a child is way, way smarter than the systems we have right now.
But the point is, intelligence isn't easy, right?
Even in humans,
you know, we don't just come out being able to speak. We pick it up really quickly and we're
tailored to do things like few-shot learning really effectively. But still, it doesn't just
happen. That's why research. I mean, exactly, exactly. That's why your lab is doing what
you're doing. Exactly. And the systems to pick up these mappings that I'm talking about, they need to experience the world or at least experience proxies or recordings of the world through books, a lot of text, images, videos, all that sort of stuff, which we call multimodal learning. And the problem right now is we can't
possibly have an algorithm do that because algorithms require so much data to learn even
simple things like recognizing cats and dogs. And that brings us back to the meta-learning aspect: we really want to build systems that learn on the fly and continually, rather than just once and doing their task forever and ever.
And we want those systems to be able to pick things up rapidly, really data efficiently.
So from just a few examples, I can learn a new task.
Yeah, I think you called that sample efficiency.
Exactly, yeah. Sample efficiency, data efficiency.
And then being able to transfer what it learns to other scenarios that aren't exactly the same.
Exactly. So right now in meta-learning, the way we set things up is that the different tasks that the model is undertaking, they are fairly similar to each other.
But ultimately, we'd like to start breaking that and saying, okay, now you've learned cats and dogs, but let's take that to
something very different, like elephants and horses.
I interviewed one of your colleagues over there, Harm van Seijen, about how
they used reinforcement learning to beat Ms. Pac-Man at its own game.
And he used the phrase islands of tractability, which is where you focus your efforts because
you know you can have some semblance of success there. So what are the biggest challenges right
now that might be offshore from the islands of tractability that you see are most exciting or
promising areas of research, especially for people that might be interested in getting into this?
One of the big ones for me, because of my focus on language, and I think a lot of people here
at the Montreal Lab will echo this, is evaluation of language. You know, in machine learning and many, many other fields, you only get what you
measure. And it is so hard to measure the quality of language. Language is slippery, and it's really
hard to measure. So one of the things we focused on in this group here is natural language
generation. So obviously, this factors into the earlier work I was talking about on question
generation. We have to build a question in natural language that flows, that makes sense to people,
and even more importantly, that asks about real information and is well-posed and leads you to
the answer that you're looking for. And it's so incredibly hard to measure and evaluate language. This comes up
in machine translation as well. And part of the reason for this is that there's so many different
ways to say the same thing. And so even training a language generation system on these massive
corpora of language data that we have now, they're still missing out on very many plausible and
reasonable ways to say things. They'll never see those hidden away examples.
I could help you, but I don't write algorithms.
Well, see, that's one of the really interesting things is that in the lab here in MSR Montreal
right now, we are all, with a few exceptions, we're computer scientists.
And we're the ones tackling this language problem and trying to measure the quality
of language outputs, but we're not necessarily the best suited to that job. I really think that
this is a problem that could really, really benefit from an interdisciplinary effort.
There's so much that goes into language, which is beyond
algorithm and computation, I think, that we really need to take into account.
And I could help you.
I would love that. We need help. Honestly, evaluating language is so, so hard. Let me
tell you about this metric we have called BLEU, which is used in machine translation. The way you measure the quality of outputs in machine translation is you have one or more sort of gold reference translations. Say the reference is one sentence: machine learning is hard. Your system is going to say all these different things, like machine learning is a tough problem, or artificial intelligence is not very easy. You can imagine all these different ways. But the way that BLEU measures these outputs is it just looks at how many words, or how many pairs of words, overlap between the candidate translation and the reference. And so you can imagine there's a translation which has zero or very, very low overlap with my reference, but it could still be completely valid. In this case, my algorithm is being told, no, this thing that you tried to say is complete garbage, because it has zero overlap, and it's being punished very, very unfairly for saying something totally reasonable. And just because we have this very limited ability to measure what really is reasonable, the whole thing's breaking down.
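That failure mode is easy to reproduce. Here is a small sketch using NLTK's BLEU implementation, with invented sentences; it illustrates the metric's behavior, not the exact setup discussed.

```python
# BLEU scores translations by n-gram overlap with a reference, so a perfectly
# reasonable paraphrase with little word overlap can be scored as garbage.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["machine", "learning", "is", "hard"]]
smooth = SmoothingFunction().method1

close_paraphrase = ["machine", "learning", "is", "a", "tough", "problem"]
valid_but_different = ["artificial", "intelligence", "is", "not", "very", "easy"]

print(sentence_bleu(reference, close_paraphrase, smoothing_function=smooth))
print(sentence_bleu(reference, valid_but_different, smoothing_function=smooth))
# Both are reasonable renderings of the same idea, but the second scores
# near zero because it shares almost no n-grams with the reference.
```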
I think AI is changing our world in ways that,
well, of course, ways that we don't understand. But one of them is this, you know, having my
daughter in college right now, when anyone says, what are you studying? If she says
anything besides a STEM subject, people look at her and say, oh, what are you going to do with that degree? Do you want fries with that? However, I keep hearing from computer scientists
like yourself, especially researchers, that other things are necessary. You know, computation is necessary,
but not sufficient for AI.
Yeah, I mean, the thing is,
AI, intelligence,
this word refers to human behavior.
And so if you want to build a system
that exhibits intelligence
and intelligence is this human thing,
it should be intelligent in ways similar to the ways we are. And so we need an understanding of human behavior,
and that's something that we hoodie-clad STEM guys don't necessarily have.
I hear you. Hey, so let's talk about, you know, I asked you at the beginning of the program what gets you up in the morning, and I sort of want to find out at the end of the program what keeps you up at night. I read an article that you wrote in Fast Company called Who Will Protect AI from Humanity? Why do machines need to be protected from us?
Right now, I don't think that they do. This really goes back to what I said before when you
asked me about, do I think that machines can one day understand in a human sense or have true
comprehension? I said the answer is yes. And if I'm right, if the answer is yes, then one day,
not necessarily tomorrow or even five years from now, but one day,
we're going to have a system that understands, that has memory of experiences that it has had.
And if such a system exists, obviously we'll have played a significant hand in its creation, but
I don't think we can consider ourselves to be the owners of such a system.
To some degree, it will be an individual because it has its own memories and its own understanding.
And so that's where I think that we have to start realizing that just because you built it
and you trained it and you even wrote the code for it, it's not necessarily yours or property of some corporation
that fed it all of its data. It's not a huge concern. We're not even close to that, but
it's good to think about these things probably far in advance.
Exactly. Because sort of the follow-up question on that is, yeah, I believe that if we don't ask the questions now, the unintended consequences will
hit. And so having read the article, I should say, you bring up the issues of rights and ethics,
and we've all seen the Boston Dynamics robots both get pushed over and you kind of feel sorry
for the robot. For sure. And now you see it open a door and fight off a stick, and you're going,
I don't feel sorry for you anymore. I'm scared of you.
But these are issues that we, as humans, we understand rights and ethics and compassion
and things like that. And so I guess my better question would be, what questions do we need to
be asking? And what issues do we need to be addressing while we are still upstream from pretty fundamental changes in our relationship with
technology? One of the things we mentioned before is this idea of measurement. I think I talk about
this in the article to some degree is, you know, we have to be able to, you know, how can you measure
individuality? It doesn't really make sense. How can you measure the memory
capacity of something? Obviously, we can measure in megabytes and gigabytes, but I mean more
in an experiential sense. We don't know how to do that. But if we're going to consider
artificial intelligence as something on the level of people, then we have to start thinking about,
yes, first, how do we measure consciousness or sentience or memory or understanding? Because only then can
you start to say, this thing is pretty close to being human. I don't think we should kick it as
it's trying to walk along or wipe out its memory that it built up over thousands of simulated hours or even real hours of experience.
So measurement is a big thing, for sure. It's so philosophical, and I love thinking about those
things, but they're definitely outside of my purview, really.
Expertise and even task. You're not actually paid at the Montreal lab to come up with these
deep philosophical answers.
It's like, get the machine learning done, baby.
Yeah.
Oh, Adam Trischler, it's been fantastic talking to you this morning.
I suspect we'll be seeing the fruits of your labors in ways that we might not even expect.
But I'm looking forward to watching where you're going and
what's going on in Montreal.
Looks fantastic.
Thank you.
And I look forward to seeing the machine that's going to understand me and talk to me in my
old age.
This has been a lot of fun.
Thanks.
To learn more about Dr. Adam Trischler and the quest for literate machines, visit Microsoft.com
slash research.