Microsoft Research Podcast - 016 - Building Literate Machines with Dr. Adam Trischler
Episode Date: March 21, 2018
Learning to read, think and communicate effectively is part of the curriculum for every young student. But Dr. Adam Trischler, Research Manager and leader of the Machine Comprehension team at Microsoft Research Montreal, would like to make it part of the curriculum for your computer as well. And he’s working on that, using methods from machine learning, deep neural networks, and other branches of AI to close the communication gap between humans and computers. Today, Dr. Trischler talks about his dream of making literate machines, his efforts to design meta-learning algorithms that can actually learn to learn, the importance of what he calls “few-shot learning” in that meta-learning process, and how, through a process of one-to-many mapping in machine learning, our computers may not only be answering our questions, but asking them as well.
Transcript
The problem right now is algorithms require so much data to learn even simple things like
recognizing cats and dogs. And that brings us back to the meta-learning aspect is we really want to
build systems that learn on the fly and continually rather than just once and doing
their task forever and ever. And we want those systems to be able to pick things up rapidly, really data efficiently.
So from just a few examples, I can learn a new task.
You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it.
I'm your host, Gretchen Huizinga.
Learning to read, think, and communicate effectively is part of the curriculum for every young student.
But Dr. Adam Trischler, research manager and leader of the machine comprehension team at Microsoft Research Montreal, would like to make it part of the curriculum for your computer as well.
And he's working on that, using methods from machine learning, deep neural networks, and other branches of AI to close the communication gap between humans and computers.
Today, Dr. Trischler talks about his dream of making literate machines, his efforts to design meta-learning algorithms that can actually learn to learn, the importance of what he calls few-shot learning in that meta-learning process,
and how, through the method of one-to-many mapping in machine learning, our computers may not only be answering our questions, but asking them as well.
That and much more on this episode of the Microsoft Research Podcast.
Welcome, Adam Trischler, to the podcast this morning. Great to talk to you.
Thank you, Gretchen. I'm glad to be here.
All the way from Montreal, Quebec, from the Microsoft Research Montreal Lab. Tell me what
it's like working at the research lab in Montreal. It's become a global hotbed for AI research.
Yeah, it's been incredible. It's like, when
Maluuba first moved here in December 2015, it was small. I mean, Montreal had this big lab
through Yoshua and Mila, but in terms of the sort of corporate interest and presence, it was nothing. And our lab was one of the first.
And we started with about four or five people.
And since then, it's been like watching a skyscraper go up around you.
It's really, really cool.
So you lead the machine comprehension team at Microsoft Research Montreal. In broad strokes,
because I'm going to go specific,
tell me what gets you up in the morning.
What do you do? What do you study? Why is it important?
Okay, so broadly speaking, the goal of my team is to build literate machines. So what that means a bit more specifically is machines
that learn from and understand the world through language like people do. So I'm really inspired by
the prospect of using machines to
unlock all of the human knowledge that's recorded in text. We have textbooks, we have Wikipedia,
so many instructions of how to do things, skills that we can gain, recipes we can make,
whether like literal recipes for cooking or a recipe for how to do something or build something.
Like an algorithm? Exactly, like an algorithm.
So yeah, we could get the machines to be sort of self-aware
and adjust their own algorithms in some sense.
But anyway, what really drew me to AI in general,
and this field in particular,
is first of all, the prospect of using AI as a window onto human intelligence. So I've always
been fascinated by thought. As a kid, I definitely tended to spend a lot of time in my own mind,
just thinking about thinking, talking to myself even, and I've always loved language as well. So
I'm just generally fascinated by how language shapes and facilitates thoughts even like grandly philosophically speaking, yes, of course, but even at the smaller, like more personal scale as well.
For example, how writing an idea down or explaining it to another person can clarify and crystallize that idea for you yourself and help you to understand it better. So like I said, our goal is to build literate
machines that learn from and understand the world through language, which is so useful for us.
I don't want to make machines that sort of make people read less, but I want machines that can
augment human reading, for example, by taking care of some of the more mundane parts, you know,
slogging through your insurance policy or an HR manual,
or maybe filtering the massive stream of text, this massive stream that we have coming at us
through Twitter and everywhere else. Or maybe even helping me to understand
all the legalese that I hit agree to when I'm getting an app.
All those terms of service, exactly. Right, I agree.
The things that everyone is supposed to read, but no one does. interest in something through new books, new materials, new ideas, like stoking their natural
curiosity and feeding them new information as the student would ask questions.
You know, my mind is racing already. I have a list of questions that I want to ask, and
some of them are jumping up to the front of the line, raising their hands, going,
ask me, ask me. All right, go ahead. I know, right? Let's go off on that tangent a little bit
about how your research is teaching these machines to be literate, read, think, and communicate like
humans. Because I'm trying to wrap my brain around what it looks like if the machine's doing some of
the heavy lifting, we could call it, for me. How does that transfer into my brain
so that I could say, understand something more quickly? My daughter in college, how could she
use a machine to help her be more successful in school?
I guess where we really started in this sort of quest for literate machines was a lot more
concrete, pretty straightforward in the field of question answering. So the idea
here is we simply want to build machines that if given a document, let's say an essay or a news
article, you could ask the machine a question about that document and it could provide you with
a reasonable, hopefully ideally correct answer to your question. So why we were interested in this is because I think it's a nice
proxy for testing comprehension. So understanding, comprehending language, obviously this is a sort
of ephemeral concept. We don't have a good way of measuring something like comprehension or
understanding directly, but we can use proxies like question answering. So in building a literate machine,
for example, one of the tests we can imagine is a comprehension test like a human student would
receive at school. You're given these test questions. What happened here? Why? What were
the motivations? And what followed?
Let's talk about what it actually looks like. What you've
just described is so fascinating. I've
talked to several researchers here at Microsoft Research who all use this idea of the delicate
balance, the dance between human and machine, augment versus replace. What does that look like
in my life? How could that play out? Right now I have a tablet where if I hold my finger on a word I don't know, I can look it up, right?
Is there some more advanced version of that? I mean, what do you envision here?
Yeah, I mean, question answering definitely has a passive nature to it, right? The machine is just
kind of sitting there waiting for you as the user to highlight the word you don't understand.
Another thing we've worked on fairly recently,
which is perhaps even more exciting,
is the idea of question asking.
So, you know, just the other side of the coin,
a machine that rather than just waiting for you
to pose a question and answering it for you
can do this sort of curiosity-driven question asking to sort of guide you along
through knowledge or act as a tutor for you. So we're just getting started on this. Obviously,
it's a complex task. In some ways, asking questions is more complex than answering them,
because you can imagine if I give you a document and a question, if it's well posed, it probably leads to a single answer. Whereas if I gave you a document, even if I provided you with a set of terms or snippets from that document that I said, ask questions about these, even if you're just looking at the information in the document, you can probably formulate several questions that lead to the same answer.
So it's this one-to-many mapping rather than a one-to-one mapping that we see more typically
in the question answering case. So it's really difficult. As I said, we're just getting started,
but already we've seen some adoption of this. I think it could be super useful in things like
MOOCs, massive open online courses. But ultimately, you can
see this is really kind of driving people hopefully to learn more and to improve their
understanding.
When you're talking about all the interaction there, I start thinking about user
interface. We'll talk in a bit about the technology behind everything, but ultimately,
AI is going to have an interface. And I imagine you're already thinking about what user interface these kinds of machines could have that anticipate and generate questions for me or answer them or communicate with me.
We've certainly thought about it. I think we're sufficiently far away from making this technology work nicely outside of, you know, kind of trivial, literally trivial settings like trivia on search.
But we have thought about it.
You know, you could imagine if you're in this sort of interaction on your phone, the camera could be watching your face for those sort of visual cues.
We can pick up on vocal cues as well.
The bigger picture is so big that, you know, it's not something that one team is going to tackle. It's these different teams coming together, breaking the problem down into its parts, and then hopefully bringing them all together into, you know, a really compelling product or assistant or use case in the end.
But there's just so much for us in the language itself
that we're not even close to that yet. And so thankfully, we do have, you know, in MSR,
these other amazing teams who are working on these other really challenging aspects of these problems.
Let's focus on the language for a bit since that's your work.
When I think of talking to machines, being able to communicate with me,
I think pretty well ethnocentrically.
Of course, we all tend to, right?
Yeah, and I know there's a lot of work going on in machine translation as well.
Are we heading for an AI future where language makes no difference to any of us, people or machines?
I think we're definitely working in
that direction. One of the really interesting things about this new wave of AI through deep
learning is that we get a lot of stuff like bilingualism or trilingualism, etc., almost for free. It's not totally free,
but on the algorithmic side, we can do things that are very general. We can build systems
that are agnostic to the particular language they're operating on. And so, you know,
there's generalities to language. Obviously, there are different specifics, and you can't
classify all languages the same.
There are structural differences, morphological differences.
But from the algorithmic side, there's a lot of generality.
And so we can build something on our end that can really operate in a whole variety of languages.
And all that matters for us is that we have the training data to tailor it to each of those individual languages. So I can build the same recurrent neural network that processes English or French or both. Whether it will do those things, and do them well, is really just a factor of the data that I use to train my system.
Well, and we've talked about data on this podcast before and how important it is to
have not just lots, but quality of the right kind of data to learn and train machines. So let's talk
about machine learning for a second. There are several lines of research in that: deep learning,
supervised, unsupervised, reinforcement learning. And until fairly recently, the models have been
pretty task-specific, but you're doing work...
Absolutely.
You're doing work on what we call meta-learning algorithms. Can you tell us
about that and particularly your work on rapid adaptation and conditionally shifted neurons?
Yeah, meta-learning is something that we're really excited about here in the group. And in general,
the field is really picking up this new sort of paradigm, I guess.
So meta-learning really refers to learning to learn. The goal of a meta-learning algorithm is
the ability to learn new tasks efficiently, given little training data for each individual task. So
these systems that we're training right now, these task-specific systems require so much data to perform really, really well.
And they do, and that's great, but we don't always have a ton of data.
So the problem we're trying to address with meta-learning is that, like right now, neural networks, they need to see, generally speaking, hundreds or thousands of examples of a class to be able to recognize it.
On the other hand, you have people. If you showed me one or
two pictures of some hypothetical brand new car model, I'd probably be able to recognize it on
the road in different colors, in different lighting and weather conditions right away from one or two
pictures. So this is something we call few shot learning. It just takes a few shots of this thing
I want to learn to be able to recognize it.
And yeah, humans are really, really good at it.
So the standard way we've trained machine learning models, in particular deep neural
networks up until now, it really doesn't encourage this ability for few-shot learning.
ML systems, you know, they're typically trained through one optimization
phase, after which that's it, learning is over. So we build these systems in this train and then
test manner, and they don't really scale to complex environments, and they don't have the
capability to pick up topics on the fly. So one of the things we do in meta-learning, first of all,
is just change the training setup.
Rather than showing a model how to do one big task with lots of data, we'll show it a set of smaller related tasks that sort of have a few things in common, but they're not exactly the
same. So to give you an example, let's say instead of learning to recognize like 50 breeds of dogs all at once, we'll give a model the smaller task of recognizing, let's say, just chihuahuas versus huskies,
and then a different task, which is just poodles versus bulldogs, and so on.
So when you have this kind of setup, there are these general features of all dogs
that will remain constant across all these smaller tasks.
And the model can learn these
gradually, picking them up across tasks as it sees them in sequence and over many examples.
But there are also these specific features of the specific breeds that the model has to pick up
rapidly from just a few examples while it's doing each individual task. And so there are these sort
of two levels of learning, the slower, like,
what do dogs look like in general? And then there's the faster, what do these specific dogs
look like? And what are the features that discriminate them from each other?
So is that the rapid adaptation that you're talking about?
Exactly. So that second level that I mentioned is the rapid adaptation.
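The training setup described here, many small related tasks instead of one big one, is often organized as "episodes". The following is only a minimal sketch of how such episodes might be sampled, assuming a toy dataset of named examples; the helper names `make_toy_dataset` and `sample_episode`, and the parameters `n_way`, `k_shot`, and `n_query`, are hypothetical illustrations and not the team's actual code.

```python
import random


def make_toy_dataset(breeds, examples_per_breed=20):
    """Hypothetical stand-in for real images: breed name -> list of example ids."""
    return {b: [f"{b}_{i}" for i in range(examples_per_breed)] for b in breeds}


def sample_episode(dataset, n_way=2, k_shot=3, n_query=2, rng=random):
    """Draw one small task, such as chihuahuas versus huskies.

    Returns the chosen classes, a support set with k_shot labelled examples
    per class (the "few shots"), and a disjoint query set used to check
    whether the model adapted to this particular task.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        items = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return classes, support, query


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_toy_dataset(["chihuahua", "husky", "poodle", "bulldog"])
    # A sequence of small related tasks. In a real system, slow learning
    # would accumulate across episodes and fast adaptation would happen
    # within each one; this sketch only shows the task sampling.
    for _ in range(3):
        classes, support, query = sample_episode(data, rng=rng)
        print(classes, len(support), len(query))
```

In this sketch, the slow "what do dogs look like in general" learning would correspond to whatever is kept between calls to `sample_episode`, and the rapid adaptation to what is learned from each episode's small support set.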
Right. So as a side note, if you're training a machine on
recognizing dogs and they all have four legs and then suddenly you've got a dog that lost a leg
and it sees a three-legged dog, can it still say that it's a dog?
Yeah, so that would definitely
be a problem. That's another place where human learning just vastly diverges from machine learning right now. You know, an algorithm that
learns to recognize dogs, I mean, it's tough to say because we can't peek inside them and we can't
ask them questions, but I feel pretty confident. Yeah, not yet. We're getting there. But I feel
pretty confident in saying that the algorithm that's learning to recognize dogs and seeing all these dogs with four legs,
it doesn't really understand what legs are. It doesn't know what legs are for. It doesn't know
that legs can be lost without fundamentally altering the dogness of this thing, right?
It sees an object composed of all these parts. And so it's going to infer that all these parts are necessary
to the definition of that thing. As humans, we know because of the way we decompose things that
that's not always the case.
Yeah, so that'll be a big challenge, I think, for the next round of
research to say three-legged dogs are still dogs.
Yeah, so that relates to an idea that
Yoshua Bengio has been interested in for a long time. Of course, Yoshua is the luminary here for us in Montreal, but he's interested in this idea of the factors of variation and the controllable factors of variation that really define classes of things, for example, and what we can do in the world.
So let's go on that topic a little more, a little deeper, Adam, because
understanding is a tough concept, even when we're talking about do humans understand other humans?
Can machines ever really understand? I'm going to go out on a limb and say no,
but that's in the context of traditional, historical, even metaphysical,
our understanding of understanding. Since this is what you're working on,
how would you argue that they eventually can? Or would you?
I absolutely think that machines are capable of understanding
in the way that we use the word to refer to humans. I do believe that. So for me, I think that
one of the fundamental aspects of understanding is just the way we relate and map different things to each other.
So one example that sort of indicates the importance of relations and mappings for
understanding is there's this thing you can do where you repeat a word to yourself over and
over again, and it just loses all meaning. And you're like, what the hell is this word? Like,
why does this refer to the thing that it refers to? I'm sure, you know,
you've done that before. I've done it. I still do it. Yeah, it's fun and it's weird.
Well, sometimes we start saying our names over and over.
Exactly. Yeah. That's the first thing I can remember doing that with is my own name.
And I think that we repeat this thing over and over again to ourselves. And in that process, it becomes
disconnected. It loses its meaning because you've cut it off from the things it maps to.
So one of the things that we talk about as very important in language research for AI is the
concept of grounding. So language is grounded in the real world. It reflects the real world. The simplest sort of example of this is nouns.
Like nouns are essentially labels for things we see in the world.
The point is that language is grounded and it reflects the world and it's fundamentally
connected to the world.
And there's only so much understanding you can glean from language without that connection.
And that's one of the things
that we're missing right now in AI and machine learning is that typically we teach machines
language just within the world of language. So they learn how to use words by looking at words,
by reading documents, but they don't learn how to use words by interacting with people in a conversation
or asking for things to be given to them or seeing that, you know,
a chair is this thing that I can sit on and push around.
They're missing that mapping and that grounding to the real world.
And that's where I think understanding stems from.
And because I do think it is possible for machines to get that
other aspect of things to ground language, I do think they can understand.
So that leads to a question, obvious in my mind, which is: how? How are you going to do that? Yeah, I mean, like you guys have
said, we want to close the communication gap between humans and machines. So, yeah, how? Because what you described to me
is the reverse engineering.
It's the human disengaging
from that grounding
and becoming the machine,
saying, Adam, Adam, Adam,
that doesn't make any sense.
Right, right.
So let's then engineer the backwards, you know? So from Adam meaning nothing
to the machine to Adam meaning this cool guy at Montreal, you know, Microsoft Research. Please go on.
No, this actually, this podcast is very challenging. This is like a philosophical test.
Well, yeah.
And probably if you tell me, you'll have to shoot me because it's like proprietary information on how we're doing that.
Oh, you're really asking the tough questions here.
No, this is good.
So one of the things we have to do is move away from this paradigm we have right now of training, let's say, agents that use language, these
literate machines I'm trying to build, we have to move away from training them just
on language.
As I said, they have to learn how to map those words to things that are not words.
So that could be images, that could be actions.
So learning that the words sitting down refer to an action that you can observe, let's say, in the real world or from videos, that is making the connection between the words and something else.
Maybe this goes back to what we talked about earlier, the fact that multiple teams are working on multiple hard problems.
And you've got people
doing computer vision and emotive computing, and you're not working on that, but they are.
And then suddenly your algorithms get together and have this love child algorithm that, you know,
I mean, I hear what you're saying, but what it begs also is that the computer is going to have
to be watching a lot of stuff, reading a lot of stuff, experiencing, to use another human term, a lot of stuff.
Yeah, I don't think you can get around that.
I mean, the way we know right now to create an intelligent system is to have a baby and then raise it through the 10, 20 years. It's a long process to get, you know,
a fully formed, functional adult. Obviously, even a child is way, way smarter than the systems we
have right now. But the point is, intelligence isn't easy, right? Even in humans, you know,
we don't just come out being able to speak. We pick it up really quickly and we're tailored to do things like few shot learning really effectively.
But still, it doesn't just happen.
That's why research.
I mean, exactly, exactly.
That's why your lab is doing what you're doing.
Exactly.
And for the systems to pick up these mappings that I'm talking about, they need to experience the world,
or at least experience proxies or recordings of the world through books, a lot of text,
images, videos, all that sort of stuff, which we call multimodal learning.
And the problem right now is we can't possibly have an algorithm do that because algorithms
require so much data to learn even simple things like recognizing cats and dogs.
And that brings us back to the meta-learning aspect: we really want to build systems that learn on the fly and continually rather than just once and doing their task forever and ever.
And we want those systems to be able to pick things up rapidly,
really data efficiently.
So from just a few examples, I can learn a new task.
Yeah, I think you called that sample efficiency.
Exactly, yeah.
Sample efficiency, data efficiency.
And then being able to transfer what it learns
to other scenarios that aren't exactly the same.
Exactly.
So right now in meta-learning,
the way we set things up is that the different tasks
that the model is undertaking,
they are fairly similar to each other.
But ultimately, we'd like to start breaking that
and saying, okay, now you've learned cats and dogs,
but let's take that to something very different,
like elephants and horses.
I interviewed one of your colleagues over there, Harm van Seijen,
about how they used reinforcement learning to beat Ms. Pac-Man at its own game. And he used the phrase islands of tractability,
which is where you focus your efforts because you know you can have some semblance of success there.
So what are the biggest challenges right now that might be offshore from the islands of
tractability that you see are most exciting or promising areas of research, especially for
people that might be interested in getting into this?
One of the big ones for me, because of my focus on language, and I think a lot of people here
at the Montreal Lab will echo this, is evaluation of language. In machine learning and many,
many other fields, you only get what you measure. And it is so hard to measure the quality of language. Language is
slippery, and it's really hard to measure. So one of the things we focused on in this group here is
natural language generation. So obviously, this factors into the earlier work I was talking about
on question generation. We have to build a question in natural language that flows, that makes sense to people, and even more importantly, that asks about real information and is well-posed and leads you to the answer that you're looking for.
And it's so incredibly hard to measure and evaluate language.
This comes up in machine translation as well.
And part of the reason for this is that there's so many different ways to say the same thing. And so even training a language generation system on these massive corpora of language data that we have now, they're still missing out on very many plausible and reasonable ways to say things. They'll never see those hidden away examples.
I could help you, but I don't write algorithms.
Well, see, that's one of the really interesting things is that, so in the lab here in MSR
Montreal right now, we are all, with a few exceptions, we're computer scientists.
And we're the ones tackling this language problem and trying to measure the quality
of language outputs, but we're not necessarily the best suited to that job. I really think that
this is a problem that could really, really benefit from an interdisciplinary effort. There's so much
that goes into language which is beyond algorithm and computation, I think, that we really need to take into account.
And I could help you.
I would love that. We need help. Honestly, evaluating language is so, so hard. Let me
tell you about this metric we have called BLEU, which is used in machine translation.
So the way you measure the quality of outputs in machine translation is you have one or more sort of gold reference translations.
It's one sentence that says maybe machine learning is hard.
So your system is going to say all these different things like machine learning is a tough problem.
Artificial intelligence is not very easy.
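BLEU-style metrics score a candidate translation by counting the n-grams (single words, word pairs, and so on) it shares with a reference. The sketch below is a deliberately simplified illustration of that counting, using the example sentences from this conversation; it is not the full BLEU metric, which also combines several n-gram orders and applies a brevity penalty. The function names `ngrams` and `overlap_precision` are assumptions made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_precision(candidate, reference, n=1):
    """Fraction of the candidate's n-grams that also appear in the reference.

    Each reference n-gram can be matched at most as often as it occurs
    there, so repeating a word does not inflate the score.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


if __name__ == "__main__":
    reference = "machine learning is hard"
    candidates = [
        "machine learning is a tough problem",
        "artificial intelligence is not very easy",
    ]
    for cand in candidates:
        print(cand, overlap_precision(cand, reference, 1), overlap_precision(cand, reference, 2))
```

Under this simplified score, the second candidate shares only the word "is" with the reference and no word pairs at all, even though a human reader would accept it as a reasonable way of saying roughly the same thing.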
You know, you can imagine all these different ways. But the way that BLEU measures these outputs is it just looks at how many words overlap or how many pairs of words overlap between
the candidate and the reference translations. And so you can obviously imagine there's a translation which has
zero or very, very low overlap with my reference, but it could still be completely valid. And so
in this case, my algorithm is being told,
no, this thing that you tried to say is complete garbage because it has zero overlap. And it's
being punished very, very unfairly for saying something totally reasonable. But just because
we have this very limited ability to measure what really is reasonable, the whole thing's breaking down.
I think AI is changing our world in ways that,
well, of course, ways that we don't understand. But one of them is this, you know, having my
daughter in college right now, when anyone says, what are you studying? If she says anything besides
a STEM subject, people look at her and say, oh, what are you going to ask with that degree?
Do you want fries with that? However, I keep hearing from computer scientists like yourself,
especially researchers, that other things are necessary, you know, for AI.
Yeah, I mean, the thing is, AI, intelligence,
this word refers to human behavior. And so if you want to build a system that exhibits intelligence
and intelligence is this human thing, it should be intelligent in ways similar to the ways we are. And so we need an understanding of human behavior.
And that's something that we hoodie-clad STEM guys don't necessarily have.
I hear you.
Hey, so let's talk about, you know, I asked you at the beginning of the program, what
gets you up in the morning?
And I sort of want to find out at the end of the
program, what keeps you up at night. And I read an article that you wrote in Fast Company called
Who Will Protect AI from Humanity? Why do machines need to be protected from us?
Right now, I don't think that they do. This really goes back to what I said before,
when you asked me about, do I think that machines can one day understand in a human
sense or have true comprehension? You know, I said the answer is yes. And if I'm right,
if the answer is yes, then one day, not necessarily tomorrow or even five years from now, but one day
we're going to have a system that understands, that has memory of experiences that it has had. And if such a system exists, obviously, we'll have played a significant hand in its creation. But I don't think we can consider ourselves to be the owners of such a system. To some degree, it will be an individual because it has its own memories and its own
understanding. And so that's where I think that we have to start realizing that just because
you built it and you trained it and you even wrote the code for it, it's not necessarily
yours or property of some corporation that fed it all of its data.
It's not a huge concern.
We're not even close to that,
but it's good to think about these things probably far in advance.
Exactly, because sort of the follow-up question on that is,
yeah, I believe that if we don't ask the questions now,
the unintended consequences will hit.
And so having read the article, I should say,
you bring up the issues of rights and ethics, and we've all seen the Boston Dynamics robots both get pushed over and you kind of feel sorry for the robot.
For sure.
And now you see it open a door and fighting off a stick and you're going,
I don't feel sorry for you anymore. I'm scared of you. But these are issues that we, as humans, we understand rights and ethics and compassion
and things like that. And so I guess my better question would be, what questions do we need to
be asking and what issues do we need to be addressing while we are still upstream from
pretty fundamental changes in our relationship with technology.
One of the things we mentioned before is this idea of measurement. I think I talk about this in the article to some degree is, you know, we have to be able to, you know, how can you measure
individuality? It doesn't really make sense. How can you measure the memory capacity of something?
Obviously, we can measure in megabytes and gigabytes,
but I mean more in an experiential sense. We don't know how to do that. But if we're going to consider artificial intelligence as something on the level of people, then we have to start
thinking about, yes, first, how do we measure consciousness or sentience or memory or
understanding? Because only then can you start
to say, you know, this thing is pretty close to being human. I don't think we should, you know,
kick it as it's trying to walk along or wipe out its memory that it built up over
thousands of simulated hours or even real hours of experience. So measurement is a big thing, for sure.
It's so philosophical, and I love thinking
about those things, but they're definitely outside of my purview, really.
Expertise and even task. You're not actually paid at the Montreal Lab to come up with these
deep philosophical answers. It's like, get the machine learning done, baby.
Yes. Like, get the machine learning done, baby.
Oh, Adam Trischler, it's been fantastic talking to
you this morning. I suspect we'll be seeing the fruits of your labors in ways that we might not
even expect, but I'm looking forward to watching where you're going and what's going on in Montreal.
Looks fantastic. And I look forward to seeing the machine
that's going to understand me and talk to me in my old age.
This has been a lot of fun. Thanks.
To learn more about Dr. Adam Trischler
and the quest for literate machines,
visit microsoft.com slash research.