Microsoft Research Podcast - 024 - Not Lost in Translation with Arul Menezes
Episode Date: May 16, 2018
Humans are wired to communicate, but we don’t always understand each other. Especially when we don’t speak the same language. But Arul Menezes, the Partner Research Manager who heads MSR’s Machine Translation team, is working to remove language barriers to help people communicate better. And with the help of some innovative machine learning techniques, and the combined brainpower of machine translation, natural language and machine learning teams in Redmond and Beijing, it’s happening sooner than anyone expected. Today, Menezes talks about how the advent of deep learning has enabled exciting advances in machine translation, including applications for people with disabilities, and gives us an inside look at the recent “human parity” milestone at Microsoft Research, where machines translated a news dataset from Chinese to English with the same accuracy and quality as a person.
Transcript
The thing about research is you never know when those breakthroughs are going to come through.
So when we started this project last year, we thought it would take a couple of years.
But we made faster progress than we expected.
And then sometime last month, we were like, looks like we're there.
We should just publish.
And that's what we did.
You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga.
Humans are wired to communicate, but we don't always understand each other,
especially when we don't speak the same language. But Arul Menezes, the partner research manager who heads MSR's machine translation team,
is working to remove language barriers to help people communicate better.
And with the help of some innovative machine learning techniques
and the combined brainpower of machine translation, natural language, and machine learning teams
in Redmond and Beijing, it's happening sooner than anyone expected.
Today, Arul Menezes talks about how the advent of deep learning
has enabled exciting advances in machine translation,
including applications for people with disabilities,
and gives us an inside look at the recent human parity milestone at Microsoft Research,
where machines translated a news dataset
from Chinese to English
with the same accuracy and quality as a person.
That and much more on this episode
of the Microsoft Research Podcast.
Arul Menezes, welcome to the podcast today.
Thank you. I'm delighted to be here.
So you're a partner research manager at Microsoft Research, and you head the machine translation team,
which, if I'm not wrong, falls under the umbrella of human language technologies.
Yes.
What gets you up in the morning? What's the big goal of your team?
Well, translation is just a fascinating problem, right? I've been working on it for almost two decades now, and it never gets old because there's always something
interesting or unusual or unique about getting the translations right. The nice thing is that
we've been getting steadily better over the
last few years. So it's not a solved problem, but we're making great progress. So it's sort of like
a perfect problem to work on. So it's enough to get you out of bed and, you know, keep you going.
Yeah. It's not so hard that you give up and it's not solved yet.
You don't want to go back to bed.
Yeah. So your team has just made a major breakthrough in machine translation,
and we'll get into the technical weeds about how you did it in a bit.
But for now, tell us what you achieved and why it's noteworthy.
So the result we showed was that our latest research system
is essentially at parity with professional human translators.
And the way we showed that is that we got a public test set of Chinese-English news that's generally used in the research community.
We had it translated by professional translators,
and we also had it translated by our latest research systems.
And then we gave it to some evaluators who are bilingual speakers. And of course,
it's a blind test, so they couldn't tell which was which. And at the end of the evaluation,
our system and the humans scored essentially the same. So, you know, the result is that for the
first time, really, we have a credible result that says that humans and machines are at parity
for machine translation. Now, of course,
keep in mind, this is a very specific domain. This is news, and it's one language pair. So,
you know, we don't want to sort of oversell it, but it is exciting.
What about the timing of it? You had had plans to do this, but did it come
when you expected?
The thing about research is you never know when those breakthroughs are going to come through, you know. So when we started this project
last year, we thought it would take a couple of years. But, you know, we made faster progress than
we expected. And then sometime last month, we were like, looks like we're there. We should just
publish. And that's what we did. Is this sort of like a Turing test for machine translation?
Which one did it, a computer or a human?
In a limited sense.
We didn't ask people to actually detect which was the human and which was the machine,
because there may be little tells like, you know,
maybe there's a certain capitalization pattern or whatever.
What we did was we had people just score the translation on a scale, like just a slider
really, and tell us how good the translation was.
So it was a very simple set of instructions that the evaluators got.
And the reason we do that is so that we can get very consistent results and people can
understand the instructions.
And so you score the translations sentence by
sentence, and then you take the averages across the different humans and the machine,
and it turned out they basically were the same score.
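To make the arithmetic behind that comparison concrete, here is a bare-bones sketch: blind slider scores per sentence, averaged per system, then compared. The numbers below are invented for illustration, not the study's data.

```python
# Invented slider scores for the same sentences, one list per "system".
human_scores = [92, 85, 78, 96, 88]
machine_scores = [90, 87, 80, 94, 89]

mean = lambda xs: sum(xs) / len(xs)
# The parity claim: the two averages are statistically indistinguishable.
print(mean(human_scores), mean(machine_scores))  # 87.8 vs. 88.0
```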
Why did you choose Chinese-English translation first?
So we wanted to pick a publicly used test set because, you know, we're trying to engage with
the research community here, and we wanted to have a test set that other people have worked on, so that we could release all of our findings and results and evaluations.
There's an annual workshop in machine translation
that's been going on for the last 10 or more years called the WMT.
And so we use the same methodology that they use for evaluation.
And we also wanted to use the same test set.
And they recently added Chinese.
They used to be focused more on European languages, but they added Chinese. And so we thought that
would be a good one to tackle, especially because it's an important language pair. And, you know,
it's hard, but not too hard, obviously, at least as it turned out. You've had another very impressive
announcement recently, just this month even, that impacts what I can do with machine translation on my phone. And I'm all ears. What is it and why is it different from other
machine translation apps? Yeah, so we're super excited about that because, you know, we've had
a translator app for Android and Apple phones for a while. And one of the common use cases is,
of course, when people are traveling. And the number one request we get from users is, can I do translation on my phone, even though I'm not connected? Because when I'm
traveling, I don't always have a data plan. I'm not always connected with Wi-Fi at the point when
I'm trying to communicate with someone like a taxi driver or a waiter or a receptionist at a hotel.
And so we've had for a while what we call an offline pack. You can download this pack before
you travel. And then once you have that, you can do translations
on your phone without being connected to the cloud.
But the thing about these packs is that they haven't been using the latest neural net technology
because neural nets are very expensive, take a lot of computation, and no one's been able
to really run a neural machine translation on a phone before.
And so last year, we started working with a major phone manufacturer and they had a phone that had
a special neural chip. And we thought it would be super exciting to run neural translation offline
on the phone using this chip. And since then, we have been working to improve the efficiency, doing a lot of careful engineering, and we managed to get it working on any phone without relying on the special hardware.
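He doesn't spell out that engineering here, but one standard trick for shrinking neural nets to phone scale is 8-bit weight quantization. This generic sketch illustrates that idea only; it is not Microsoft's actual approach.

```python
# Generic illustration of 8-bit weight quantization, one common way to make
# neural nets cheap enough for phones. Not Microsoft's actual technique.

def quantize(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

w = [0.31, -1.27, 0.05, 0.88]
q, s = quantize(w)  # ints in [-127, 127]: 4x smaller than 32-bit floats
print(q, [round(x, 2) for x in dequantize(q, s)])  # close to the originals
```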
So what we released this month was that anyone who has an Android or iPhone can download these packs
and then they'll have neural translation on their phone. So that means even if they're not connected
to the cloud, they're going to get really fluent translations. So it's the latest cutting edge translation technology.
Yeah. Running right on your phone. Yeah. Super exciting.
I wish I had that last summer.
Me too, actually. Yeah. It's, you know, it's a very useful app when you travel. Yeah.
Is it unique to Microsoft Research and Microsoft in general?
Yeah. As far as I know, nobody else has neural translation running on the phone.
Now, this is only text translation. We don't yet have the speech recognition.
Are you working on that?
We are. We don't really have a date for that yet, but something that we're interested in.
I'll postpone my next trip so you've got it done.
Let's get specific about the technology behind MSR's research in machine translation.
You told me that neural network architectures are the foundation for the AI training systems. But your team used some additional training methods to help boost
your efforts to achieve human parity in Chinese English news translation. So let me ask you about
each one in turn, and let's start with a round-trip translation technique called dual learning. What
is it? How did it help? Right. So one of the techniques we used to improve the quality of our research system that reached
the human parity was what we call dual learning.
The way you train a regular machine translation system is typically with parallel data.
So these are previously translated documents in, say, Chinese and English that are aligned
at the sentence level.
And then the neural net model essentially learns to translate the sentence from Chinese into English,
and that's the signal that we use to train the models.
Now you can do the same thing in the opposite direction in English-Chinese.
So what we do with dual learning is now we couple those two systems and we train them jointly.
So you use the signal from the English to Chinese translation to improve the Chinese
to English and vice versa.
So it literally is very much like what a human would do where you might do a round-trip translation
where you translate from English to Chinese, but you're not sure if it's good.
So you translate back into English and see how it went.
And if you get it consistent, you have some faith that translation may be good.
And so this is
what we did. So it's basically a joint loss function for the two systems. And then there's
another thing you can do when you have this dual learning working, which is that in addition to the
parallel data, you can also use monolingual data. Let's say you have Chinese text, you can send it
through the Chinese to English system and then the English to Chinese system and then compare the results. And that's a signal you can use to train both systems.
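To make that joint objective concrete, here is a minimal sketch of the dual-learning idea with toy stand-in models. Every class, table, and scoring function here is a hypothetical illustration, not the team's actual code.

```python
# Minimal sketch of dual learning with toy stand-in models.

class ToyTranslator:
    """Stand-in for a neural MT model that can translate and score."""
    def __init__(self, table):
        self.table = table  # toy word-for-word dictionary

    def translate(self, sentence):
        return [self.table.get(w, w) for w in sentence]

    def log_prob(self, src, tgt):
        # Toy log-probability: 0 when the model fully agrees with tgt,
        # negative otherwise (one unit per disagreeing word).
        hyp = self.translate(src)
        matches = sum(1 for a, b in zip(hyp, tgt) if a == b)
        return matches - len(tgt)

zh2en = ToyTranslator({"你好": "hello", "世界": "world"})
en2zh = ToyTranslator({"hello": "你好", "world": "世界"})

def dual_loss(zh, en):
    # Supervised terms: each direction is scored on the parallel pair...
    forward = -zh2en.log_prob(zh, en)
    backward = -en2zh.log_prob(en, zh)
    # ...plus the round-trip term on monolingual Chinese: translate to
    # English, translate back, and penalize disagreement with the input.
    round_trip = -en2zh.log_prob(zh2en.translate(zh), zh)
    return forward + backward + round_trip

print(dual_loss(["你好", "世界"], ["hello", "world"]))  # 0 for this toy pair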
So another technique you used is called deliberation networks. What is that and how
does that add to the translation accuracy? Right. So the other thing that our team did,
and I should say that both the dual learning and the deliberation network work was actually done by our partners in Microsoft Research Asia. The effort was a joint
effort of my team here in Redmond, which is the machine translation team, and the two teams in
Microsoft Research Beijing, the natural language group and the machine learning team there.
Both the dual learning and the deliberation network came out of the machine learning team
in MSR Beijing. The way deliberation networks work is essentially it's a two-pass translation
process. So you can think about it as creating a rough translation and then refining it. And, you know, a human might do the same thing, where you essentially create a first draft and then you edit it. So the architecture of the deliberation network is that you have a
first pass neural network encoder decoder that produces the first translation. Then you have a
second pass, which takes both the original input in Chinese, as well as the first pass output.
And it takes both of those as inputs in parallel
and then produces a translation
by looking over both the original input
as well as the first pass input.
It's essentially learning which parts of the first-pass translation to copy over and which parts maybe need to be changed, and for the parts it changes, it decides by looking at the original.
So the output of the second pass is our final translation.
I mean, in theory, you could keep doing this,
but we just do two passes and that seems to be enough.
Yeah. I was actually going to ask that.
It's like how many passes is enough before you kind of land on it? I would imagine that after like two passes, you're likely to converge.
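Here is a rough two-pass sketch of that deliberation idea, where the second pass sees both the source and the first-pass draft. Both passes are toy stand-ins for neural encoder-decoders, and the vocabularies are invented.

```python
# Rough sketch of two-pass deliberation with hypothetical stand-ins.

def first_pass(source):
    """Draft translation (stand-in for the first encoder-decoder)."""
    rough = {"你好": "hi", "世界": "world"}
    return [rough.get(w, w) for w in source]

def second_pass(source, draft):
    """Refine the draft while attending to BOTH source and first pass."""
    preferred = {"你好": "hello"}  # the refiner knows a better rendering
    final = []
    for src_word, draft_word in zip(source, draft):
        # Keep the draft word unless the source suggests a revision.
        final.append(preferred.get(src_word, draft_word))
    return final

src = ["你好", "世界"]
draft = first_pass(src)                      # ['hi', 'world']
print(draft, "->", second_pass(src, draft))  # refined to ['hello', 'world']
```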
So the third tool that we talked about is called joint training of left-to-right and right-to-left translation. What is that?
Right. So a typical neural system produces the translation one word at a time, from left to right. But if you train a different system that produces the translation, again, one word at a time, but from right to left, you actually get different translations. And the idea was, if you could make these two translations consistent, you might get better translation. And the reason is,
if you think about it, in many languages, when you produce a sentence, there are later parts of the sentence that need to be consistent, say grammatically, or in terms of gender or number or pronoun
with something earlier in the sentence.
But sometimes you need something early in the sentence to be consistent with something
later in the sentence, but you haven't produced that yet.
So you don't know what to produce.
Whereas if you did it right to left, you would be able to get that right. So by forcing the left-to-right system and the right-to-left system to be consistent with each
other, we could improve the quality of the translation. And again, this is a very similar
iterative process to what we were talking about with dual learning, except that instead of the
consistency being Chinese to English and English to Chinese, it's left-to-right and right-to-left.
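A toy illustration of that left-to-right / right-to-left consistency idea: decode the same source both ways and penalize disagreement. The decoders and the source sentence here are all hypothetical stand-ins.

```python
# Toy sketch of the L2R/R2L agreement term.

def decode_l2r(source):
    return ["the", "cat", "sat"]      # stand-in left-to-right decoder

def decode_r2l(source):
    hyp = ["sat", "cat", "the"]       # generated right to left...
    return list(reversed(hyp))        # ...reversed so the two are comparable

def disagreement(a, b):
    """Count positions where the two decoders differ."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

src = ["猫", "坐", "了"]
# Training would drive this consistency term toward zero for both systems.
print(disagreement(decode_l2r(src), decode_r2l(src)))  # 0 in this toy case
```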
So what was the fourth technique that you added to the mix to get this human parity in the Chinese to English translation? Yeah, so we also did what's called system combination.
So we trained a number of different systems with different techniques, with variations on different techniques, with different initializations.
And then we took our best six systems and did a combination. In our case, it was what's called a sentence-level combination, so it really is just picking, of the six, which one is the best. So essentially, each of the six systems produces an n-best list, say their 10 best candidates for translation. So now you've got 60 translations, and you rescore them and pick the best. People have done system combination at the word level before, where you take part of a translation from one and part of a translation from the other.
But that doesn't work very well with neural translation because you can really destroy the fluency of the sentence by just sort of cutting and pasting the 10 pieces from here and there.
Yeah, we've seen that done without machines.
Yeah.
It gets butchered in translation.
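A small sketch of the sentence-level combination Arul describes: pool the n-best lists from several systems, rescore every whole sentence with one shared scorer, and keep the single best. The candidates and scorer below are invented.

```python
# Sentence-level system combination, in miniature.

def rescore(candidate):
    """Stand-in scorer; a real one might be an ensemble log-probability."""
    return -abs(len(candidate.split()) - 4)  # toy: prefer ~4-word outputs

nbest_lists = [  # imagine six systems contributing ten candidates each
    ["the cat sat down", "a cat sat", "the cat is sitting down now"],
    ["the cat sat down quietly", "the cat sat down"],
]
pool = {c for nbest in nbest_lists for c in nbest}  # dedupe the pooled set
print(max(pool, key=rescore))  # "the cat sat down"
```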
Most of us have fairly general machine translation needs,
but your research has addressed some of the needs
in a very domain-specific arena in the
form of Presentation Translator. Can you tell us more about that? Right. So, Presentation Translator
is this unique add-in that we have developed for PowerPoint, where when you are giving a
presentation, you just click the button and you can get transcripts of your lecture displayed on screen so that people
in the audience can follow along. In addition, the transcripts are made available to audience
members on their own phone. So they use our app and just enter a code and then they can connect
to the same transcription feed and they can get it either in the language of the speaker or in their own language.
And so essentially, with this one add-in, we're addressing two real needs.
One is for people who are deaf or hard of hearing, where the transcript can help them
follow along with what's going on in the classroom or in a lecture, and also language
learners, foreign students who can follow along in their
own language if they are not that familiar with the language of the speaker. And so we've had a
lot of excitement about this in the educational field with both school districts as well as
colleges. And in particular, the Rochester Institute of Technology, which has, as one of its colleges, the National Technical Institute for the Deaf. And so they have a very large
student body of deaf students. And so they have been providing sign language interpretation.
This gave them an opportunity to expand the coverage by providing this transcription
in the classroom. So is it from text to text on the PowerPoint presentation to...
So it's the user speaking?
It is.
Yeah, so the professor is lecturing and everything that they say is transcribed both on screen and on people's phones.
Oh my gosh.
And then because it's on their phone, they can also save the transcript and then that becomes class notes.
And the other thing that's really cool about Presentation Translator is that it uses the content of your PowerPoint.
This is why it's connected to PowerPoint.
It uses the content of your slides to customize the speech recognition system so that you can actually use the specialized terminology of the class and it'll be recognized.
So, you know, if someone's teaching a biology class, it'll recognize things like mitochondria or ribosome, which in other contexts would not be recognized.
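One way to picture that customization is the hypothetical sketch below, not the product's internals: harvest rare terms from the deck and boost them when the recognizer scores competing transcription hypotheses.

```python
# Hypothetical sketch of biasing speech recognition with slide content.

SLIDE_TEXT = "The mitochondria and the ribosomes are organelles."
# Harvest longer, domain-flavored words from the deck (crude heuristic).
slide_terms = {w.strip(".,").lower() for w in SLIDE_TEXT.split() if len(w) > 8}

def biased_score(hypothesis, acoustic_score):
    """Add a bonus for every slide term a hypothesis contains."""
    bonus = sum(1.0 for w in hypothesis.lower().split() if w in slide_terms)
    return acoustic_score + bonus

# Two acoustically similar hypotheses from an imagined recognizer:
candidates = {"my two conned rhea are organelles": -4.0,
              "mitochondria are organelles": -4.5}
print(max(candidates, key=lambda h: biased_score(h, candidates[h])))
# -> "mitochondria are organelles", thanks to the slide-derived bonus
```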
So you told me about how you can use this with domain-specific or business-specific needs as well. So tell us about that.
One of the things we're super excited about is that we have the ability to customize our machine translation system for the domain and the
terminology of specific companies.
We have a lot of customers who use translation to translate their documentation, their internal
communications, product listings.
And the way to get really high quality translation for all of these scenarios is to customize the translation to the terminology that's being used by that business.
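As a toy picture of what terminology customization buys you: real systems adapt the model itself, but this hypothetical sketch just post-edits a generic translation against a company glossary, which is enough to show the idea.

```python
# Hypothetical sketch: enforce a company glossary on a generic translation.

def apply_glossary(translation, replacements):
    """Swap generic renderings for the company's preferred terms."""
    for generic, preferred in replacements.items():
        translation = translation.replace(generic, preferred)
    return translation

# Imagined German style rule: this company writes "Notizbuch", never "Heft".
print(apply_glossary("das Heft ist bereit", {"Heft": "Notizbuch"}))
```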
Part of the challenge of machine translation is that human language can't be reduced to ones and zeros.
It's got nuance, it's got richness and fluidity.
And so there are detractors that start to criticize how unsophisticated machine
translation is. But you said that they're sort of missing the point of what the goal is. Talk about
that a little bit and how should we manage our expectations around machine translation?
Yeah. So, I mean, the kind of scenarios that we're focused on with machine translation today have to
do with sort of everyday needs that people have, whether you're a traveler or you maybe
want to read a website or a news article or a newspaper, or you're a company where you're
communicating with customers who speak a different language or you're communicating between different
branches of the enterprise that speak different languages.
Most of the language that is being
translated today is pretty prosaic. I mean, it's not that hard. Well, it is hard, but we've gotten
to the point where we can do a pretty good job of translating that kind of text. Of course,
you know, if you start getting into fiction and poetry, it is very hard and we're nowhere,
obviously, with that kind of text. But that's not our
goal at this point. So how would you define your goal? I think the goal for translation today
is to make the language barrier disappear for people in everyday contexts, you know,
at work, when they're traveling, so that they can communicate without a language barrier.
Right.
So that kind of leads into the idea that every language has its own formal grammar and semantics,
and it also has local patois, as it were.
And it often leads to humorous mistranslations. So how are machine learning researchers tackling that lost-in-translation problem
so machines don't end up making that classic video game mistranslation, "all your base are belong to us"?
There's two things.
With better techniques, we have gotten a lot better at producing fluent translation.
So we would not produce something like that today.
But it is still the case that we're very dependent on the data we have available.
So in the languages where we have sufficient data, we can do a really good job.
When you get to languages where there's not that much data, or you get to dialects or
variations of language where there's not that much data, it becomes a lot tougher.
And I think this is something machine translation shares with all AI and machine learning fields
is that, you know, we're very dependent on the data. There are ways to get iteratively better
by continually learning based on how people use your product, right? How much are you dealing interdisciplinarily with other fields?
You are computer scientists, right?
And your data is language, which is human and expressive all over the world.
Who do you bring in to help you?
So we have linguists on our team that, you know, make sure that we're translating things correctly. So for example,
one of the linguists on our team looks for things that our automatic metrics don't catch. So,
you know, every time we produce a new version of our translation system, we have various scoring functions. The one that we use, which is a very popular metric, is called BLEU. And so that gives
you a single number that says, how well is your system doing? So, you know,
in principle, if the version of your system this month has, you know, a slightly better BLEU score
than the version last month, you're like, great, it's better, you know, ship it. But then what
Lee, who's the linguist on my team, does is she looks at it and tries to spot things that may
not be caught by that score. So for example, how are we doing with names?
How are we doing with capitalization?
How are we doing with dates and times and numbers?
There are a lot of phenomena that are very noticeable to humans that are not necessarily picked up by the automatic metric.
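For reference, BLEU boils n-gram overlap with a reference translation down to one number. Below is a simplified sentence-level variant for illustration, not the exact setup the team runs.

```python
# Simplified sentence-level BLEU: geometric mean of n-gram precisions,
# times a brevity penalty. Illustrative only.

import math
from collections import Counter

def bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    # Brevity penalty discourages short outputs that game precision.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 3))  # 1.0
print(round(bleu("the cat sat", "the cat sat on the mat"), 3))  # ~0.002, penalized
```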
So let's talk about you for a second. How did you end up doing machine translation research and at Microsoft Research?
Yes, I was in a PhD program in sort of the systems area in the
computer science department at Stanford. And I
spent a couple of summers up here. And at the end of my second summer, I decided I wanted to stay.
And so I did. I just never went back. And I worked on a number of products at Microsoft.
But at some point, I wanted to get back into research. And so I moved to Microsoft Research and I started the translation project actually
in about the year 2000. So basically the year my daughter was born and now she's going off to
college. And you've watched her language grow over the years.
Actually, when you're studying language, listening to how kids learn language is fascinating. It's just wonderful.
There's a spectrum here at Microsoft of,
you know, pure research to applied research, stuff that ends up in products. And you seem to straddle that, with your work being in products, but also in the research phase.
Yeah, one of the things that's super exciting about our team is that, and it makes us somewhat
unique, I think, is we have everything from the basic research in translation to the web service that serves up the APIs, the cloud service that people
call to the apps that we have on the phone. So we have everything from the things that users are
directly using down to the basic research. It's all in one team. So when somebody comes up with
something cool, we can get it out to users very quickly. And that's very exciting.
I always ask my podcast guests my version of the what could possibly go wrong question,
which is, is there anything about your work in machine translation that keeps you up at night?
Well, so we always have this challenge that we are learning from the data and the data is
sometimes misleading. And so we have things that we do to try and clean up the data. We do have a
mechanism, for example, to be able to respond to those kinds of issues quickly. And it has happened.
We've had situations where somebody discovered a translation that we produced that was offensive
and posted it on Twitter.
And, you know, it kind of went viral and some people were upset about it.
And so we had to respond quickly and fix it.
And so we have people who are on call 24 hours a day to fix any issue that arises like that.
So it's a thing that literally does
keep somebody up at night. Definitely. At least doing the night shift version of it.
As we wrap up, Arul, what advice would you give to aspiring researchers that might be interested
in working in human language technologies? And why would someone want to come to Microsoft
Research to work on those problems?
So I think we live in an absolutely fascinating time, right? Like people have been working on AI for, or machine translation for that matter, for 50, 60 years. And for decades, there was a real
struggle. And I would say just in the last 10 years with the advent of deep learning, we're
making amazing progress towards these really, really
hard tasks that people at some point had almost given up hope, you know, that we would ever be
successful at recognizing speech or translating anywhere close to the level that a human can.
But here we are. It's a super exciting time. What's even more exciting is not only have we
made tremendous progress on the research side, but now all of those techniques are being put into products and they're impacting people
on a daily basis. And I think Microsoft's an amazing place to be doing this because
we have such a breadth. We have a range of products that go all the way from individual
users in their homes to multinational companies. And so we have just so many places that our technology can be used in.
The range of opportunity you have at Microsoft, I think, is incredible.
Arul Menezes, thank you for taking time to come out and talk to us today.
It's been really interesting.
Thank you.
Thank you.
To learn more about Arul Menezes and the exciting advances in machine translation,
visit microsoft.com slash research.