Microsoft Research Podcast - 024 - Not Lost in Translation with Arul Menezes

Episode Date: May 16, 2018

Humans are wired to communicate, but we don’t always understand each other. Especially when we don’t speak the same language. But Arul Menezes, the Partner Research Manager who heads MSR’s Machine Translation team, is working to remove language barriers to help people communicate better. And with the help of some innovative machine learning techniques, and the combined brainpower of machine translation, natural language and machine learning teams in Redmond and Beijing, it’s happening sooner than anyone expected. Today, Menezes talks about how the advent of deep learning has enabled exciting advances in machine translation, including applications for people with disabilities, and gives us an inside look at the recent “human parity” milestone at Microsoft Research, where machines translated a news dataset from Chinese to English with the same accuracy and quality as a person.

Transcript
Starting point is 00:00:00 The thing about research is you never know when those breakthroughs are going to come through. So when we started this project last year, we thought it would take a couple of years. But we made faster progress than we expected. And then sometime last month, we were like, looks like we're there. We should just publish. And that's what we did. You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it.
Starting point is 00:00:25 I'm your host, Gretchen Huizinga. Humans are wired to communicate, but we don't always understand each other, especially when we don't speak the same language. But Arul Menezes, the partner research manager who heads MSR's machine translation team, is working to remove language barriers to help people communicate better. And with the help of some innovative machine learning techniques and the combined brainpower of machine translation, natural language, and machine learning teams in Redmond and Beijing, it's happening sooner than anyone expected. Today, Arul Menezes talks about how the advent of deep learning
Starting point is 00:01:11 has enabled exciting advances in machine translation, including applications for people with disabilities, and gives us an inside look at the recent human parity milestone at Microsoft Research, where machines translated a news dataset from Chinese to English with the same accuracy and quality as a person. That and much more on this episode of the Microsoft Research Podcast.
Starting point is 00:01:48 Arul Menezes, welcome to the podcast today. Thank you. I'm delighted to be here. So you're a partner research manager at Microsoft Research, and you head the machine translation team, which, if I'm not wrong, falls under the umbrella of human language technologies. Yes. What gets you up in the morning? What's the big goal of your team? Well, translation is just a fascinating problem, right? I've been working on it for almost two decades now, and it never gets old because there's always something interesting or unusual or unique about getting the translations right. The nice thing is that
Starting point is 00:02:23 we've been getting steadily better over the last few years. So it's not a solved problem, but we're making great progress. So it's sort of like a perfect problem to work on. So it's enough to get you out of bed and, you know, keep you going. Yeah. It's not so hard that you give up and it's not solved yet. You don't want to go back to bed. Yeah. So your team has just made a major breakthrough in machine translation, and we'll get into the technical weeds about how you did it in a bit. But for now, tell us what you achieved and why it's noteworthy.
Starting point is 00:02:55 So the result we showed was that our latest research system is essentially at parity with professional human translators. And the way we showed that is that we got a public test set that's generally used in the research community of Chinese English news. We had it translated by professional translators, and we also had it translated by our latest research systems. And then we gave it to some evaluators who are bilingual speakers. And of course, it's a blind test, so they couldn't tell which was which. And at the end of the evaluation, our system and the humans scored essentially the same. So, you know, the result is that for the
Starting point is 00:03:37 first time, really, we have a credible result that says that humans and machines are at parity for machine translation. Now, of course, keep in mind, this is a very specific domain. This is news, and it's one language pair. So, you know, we don't want to sort of oversell it, but it is exciting. What about the timing of it? You had had plans to do this, but did it come when you expected? The thing about research is you never know when those breakthroughs are going to come through, you know. So when we started this project last year, we thought it would take a couple of years. But, you know, we made faster progress than
Starting point is 00:04:17 we expected. And then sometime last month, we were like, looks like we're there. We should just publish. And that's what we did. Is this sort of like a Turing test for machine translation? Which one did it, a computer or a human? In a limited sense. We didn't ask people to actually detect which was the human and which was the machine, because there may be little tells like, you know, maybe there's a certain capitalization pattern or whatever. What we did was we had people just score the translation on a scale, like just a slider
Starting point is 00:04:50 really, and tell us how good the translation was. So it was a very simple set of instructions that the evaluators got. And the reason we do that is so that we can get very consistent results and people can understand the instructions. And so you score the translations sentence by sentence, and then you take the averages across the different humans and the machine, and it turned out they basically were the same score. Why did you choose Chinese-English translation first?
Starting point is 00:05:16 So we wanted to pick a publicly used test set because, you know, we're trying to engage with the research community here, and we wanted to have a test set that other people have worked on that we could release all of our findings and results and evaluations. There's an annual workshop in machine translation that's been going on for the last 10 or more years called the WMT. And so we use the same methodology that they use for evaluation. And we also wanted to use the same test set. And they recently added Chinese.
Starting point is 00:05:44 They used to be focused more on European languages, but they added Chinese. And so we thought that would be a good one to tackle, especially because it's an important language pair. And, you know, it's hard, but not too hard, obviously, at least as it turned out. You've had another very impressive announcement recently, just this month even, that impacts what I can do with machine translation on my phone. And I'm all ears. What is it and why is it different from other machine translation apps? Yeah, so we're super excited about that because, you know, we've had a translator app for Android and Apple phones for a while. And one of the common use cases is, of course, when people are traveling. And the number one request we get from users is, can I do translation on my phone, even though I'm not connected? Because when I'm traveling, I don't always have a data plan. I'm not always connected with Wi-Fi at the point when
Starting point is 00:06:33 I'm trying to communicate with someone like a taxi driver or a waiter or a reception at a hotel. And so we've had for a while what we call an offline pack. You can download this pack before you travel. And then once you have that, you can do translations on your phone without being connected to the cloud. But the thing about these packs is that they haven't been using the latest neural net technology because neural nets are very expensive, take a lot of computation, and no one's been able to really run a neural machine translation on a phone before. And so last year, we started working with a major phone manufacturer and they had a phone that had
Starting point is 00:07:13 a special neural chip. And we thought it would be super exciting to run neural translation offline on the phone using this chip. And so this month, we have been working to improve the efficiency, do a lot of careful engineering, and we managed to get it working on any phone without relying on the special hardware. So what we released this month was that anyone who has an Android or iPhone can download these packs and then they'll have neural translation on their phone. So that means even if they're not connected to the cloud, they're going to get really fluent translations. So it's the latest cutting edge translation technology. Yeah. Running right on your phone. Yeah. Super exciting. I wish I had that last summer.
Starting point is 00:07:56 Me too, actually. Yeah. It's, you know, it's a very useful app when you travel. Yeah. Is it unique to Microsoft Research and Microsoft in general? Yeah. As far as I know, nobody else has neural translation running on the phone. Now, this is only text translation. We don't yet have the speech recognition. Are you working on that? We are. We don't really have a date for that yet, but something that we're interested in. I'll postpone my next trip so you've got it done. Let's get specific about the technology behind MSR's research in machine translation.
Starting point is 00:08:41 You told me that neural network architectures are the foundation for the AI training systems. But your team used some additional training methods to help boost your efforts to achieve human parity in Chinese English news translation. So let me ask you about each one in turn, and let's start with a round-trip translation technique called dual learning. What is it? How did it help? Right. So one of the techniques we used to improve the quality of our research system that reached the human parity was what we call dual learning. The way you train a regular machine translation system is typically with parallel data. So these are previously translated documents in, say, Chinese and English that are aligned at the sentence level.
Starting point is 00:09:20 And then the neural net model essentially learns to translate the sentence from Chinese into English, and that's the signal that we use to train the models. Now you can do the same thing in the opposite direction in English-Chinese. So what we do with dual learning is now we couple those two systems and we train them jointly. So you use the signal from the English to Chinese translation to improve the Chinese to English and vice versa. So it literally is very much like what a human would do where you might do a round-trip translation where you translate from English to Chinese, but you're not sure if it's good.
Starting point is 00:09:58 So you translate back into English and see how it went. And if you get it consistent, you have some faith that translation may be good. And so this is what we did. So it's basically a joint loss function for the two systems. And then there's another thing you can do when you have this dual learning working, which is that in addition to the parallel data, you can also use monolingual data. Let's say you have Chinese text, you can send it through the Chinese to English system and then the English to Chinese system and then compare the results. And that's a signal you can use to train both systems. So another technique you used is called deliberation networks. What is that and how
Starting point is 00:10:35 does that add to the translation accuracy? Right. So the other thing that our team did, and I should say that both the dual learning and the deliberation network work was actually done by our partners in Microsoft Research Asia. The effort was a joint effort of my team here in Redmond, which is the machine translation team, and the two teams in Microsoft Research Beijing, the natural language group and the machine learning team there. Both the dual learning and the deliberation network came out of the machine learning team in MSR Beijing. The way deliberation networks work is essentially it's a two-pass translation process. So you can think about it as creating a rough translation and then refining it. And, you know, a human might do the same thing is where you essentially create a first draft and then you edit it. So the architecture of the deliberation network is that you have a
Starting point is 00:11:30 first pass neural network encoder decoder that produces the first translation. Then you have a second pass, which takes both the original input in Chinese, as well as the first pass output. And he takes both of those as inputs in parallel and then produces a translation by looking over both the original input as well as the first pass input. It's essentially learning, let's say, which parts of the first pass translation to copy over, say,
Starting point is 00:12:00 and which parts maybe need to be changed and the parts that it changed, it would decide to look at the original. So the output of the second pass is our final translation. I mean, in theory, you could keep doing this, but we just do two passes and that seems to be enough. Yeah. I was actually going to ask that. It's like how many passes is enough before you kind of land on it? I would imagine that after like two passes, you're likely to converge.
Starting point is 00:12:48 So the third tool that we talked about is called joint training or left to right, right to train a different system that produces the translation, again, one word at a time, but from right to left, you actually get different translations. And the idea was, if you could make these true translations consistent, you might get better translation. And the reason is, if you think about in many languages, when you produce a sentence, there's later parts of the sentence that need to be consistent, say grammatically, or in terms of gender or number or pronoun with something earlier in the sentence. But sometimes you need something early in the sentence to be consistent with something later in the sentence, but you haven't produced that yet. So you don't know what to produce.
Starting point is 00:13:22 Whereas if you did it right to left, you would be able to get that right. So by forcing the left-to-right system and the right-to-left system to be consistent with each other, we could improve the quality of the translation. And again, this is a very similar iterative process to what we were talking about with dual learning, except that instead of the consistency being Chinese to English and English to Chinese, it's left-to-right and right-to-left. So what was the fourth technique that you added to the mix to get this human parity in the Consistency being Chinese to English and English to Chinese, it's left to right and right to left. So what was the fourth technique that you added to the mix to get this human parity in the Chinese to English translation? Yeah, so we also did what's called system combination. So we trained a number of different systems with different techniques, with variations on different techniques, with different initializations. And then we took our best six systems and
Starting point is 00:14:06 did a combination in our case it was what's called a sentence level combination so it really is just picking off the six which one is the best so essentially each of the six systems produce an n best list like say they're 10 best candidates for translation so now you've got 60 translations and you rescore them and pick the best. People have done system combination at the word level before where you take part of a translation from one and part of a translation from the other. But that doesn't work very well with neural translation because you can really destroy the fluency of the sentence by just sort of cutting and pasting the 10 pieces from here and there. Yeah, we've seen that done without machines. Yeah. It gets butchered in translation.
Starting point is 00:14:57 Most of us have fairly general machine translation needs, but your research has addressed some of the needs in a very domain-specific arena in the form of Presentation Translator. Can you tell us more about that? Right. So, Presentation Translator is this unique add-in that we have developed for PowerPoint, where when you are giving a presentation, you just click the button and you can get transcripts of your lecture displayed on screen so that people in the audience can follow along. In addition, the transcripts are made available to audience members on their own phone. So they use our app and just enter a code and then they can connect
Starting point is 00:15:39 to the same transcription feed and they can get it either in the language of the speaker or in their own language. And so essentially, with this one add-in, we're addressing two real needs. One is for people who are deaf or hard of hearing, where the transcript can help them follow along with what's going on in the classroom or in a lecture, and also language learners, foreign students who can follow along in their own language if they are not that familiar with the language of the speaker. And so we've had a lot of excitement about this in the educational field with both school districts as well as colleges. And in particular, the Rochester Institute of Technology, which has one of the
Starting point is 00:16:23 colleges in the universities called the National Technical Institute for the Deaf. And so they have a very large student body of deaf students. And so they have been providing sign language interpretation. This gave them an opportunity to expand the coverage by providing this transcription in the classroom. So is it from text to text on the PowerPoint presentation to... So it's the user speak. It is. Yeah, so the professor is lecturing and everything that they say is transcribed both on screen and on people's phones. Oh my gosh.
Starting point is 00:16:56 And then because it's on their phone, they can also save the transcript and then that becomes class notes. And the other thing that's really cool about Presentation Translator is that it uses the content of your PowerPoint. This is why it's connected to PowerPoint. It uses the content of your slides to customize the speech recognition system so that you can actually use the specialized terminology of the class and it'll be recognized. So, you know, if someone's teaching a biology class, it'll recognize things like mitochondria or ribosome, which in other contexts would not be recognized. So you told me about how you can use this with domain-specific or business-specific needs as well. So tell us about that. One of the things we're super excited about is that we have the ability to customize our machine translation system for the domain and the
Starting point is 00:17:47 terminology of specific companies. We have a lot of customers who use translation to translate their documentation, their internal communications, product listings. And the way to get really high quality translation for all of these scenarios is to customize the translation to the terminology that's being used by that business. Part of the challenge of machine translation is that human language can't be reduced to ones and zeros. It's got nuance, it's got richness and fluidity. And so there are detractors that start to criticize how unsophisticated machine translation is. But you said that they're missing the point sort of of what the goal is. Talk about
Starting point is 00:18:33 that a little bit and how should we manage our expectations around machine translation? Yeah. So, I mean, the kind of scenarios that we're focused on with machine translation today have to do with sort of everyday needs that people have, whether you're a traveler or you maybe want to read a website or a news article or a newspaper, or you're a company where you're communicating with customers who speak a different language or you're communicating between different branches of the enterprise that speak different languages. Most of the language that is being translated today is pretty prosaic. I mean, it's not that hard. Well, it is hard, but we've gotten
Starting point is 00:19:11 to the point where we can do a pretty good job of translating that kind of text. Of course, you know, if you start getting into fiction and poetry, it is very hard and we're nowhere, obviously, with that kind. But that's not our goal at this point. So how would you define your goal? I think the goal for translation today is to make the language barrier disappear for people in everyday contexts, you know, at work, when they're traveling, so that they can communicate without a language barrier. Right. So that kind of leads into the idea that every language has its own formal grammar and semantics,
Starting point is 00:19:55 and it also has local patois, as it were. And it often leads to humorous mistranslations. So how are machine learning researchers tackling that lost-in-translation problem so machines don't end up making that classic video game, mistranslation, all your base are belong to us? There's two things. With better techniques, we have gotten a lot better at producing fluent translation. So we would not produce something like that today. But it is still the case that we're very dependent on the data we have available.
Starting point is 00:20:26 So in the languages where we have sufficient data, we can do a really good job. When you get to languages where there's not that much data, or you get to dialects or variations of language where there's not that much data, it becomes a lot tougher. And I think this is something machine translation shares with all AI and machine learning fields is that, you know, we're very dependent on the data. There are ways to get iteratively better by continually learning based on how people use your product, right? How much are you dealing interdisciplinarily with other fields? You are computer scientists, right? And your data is language, which is human and expressive all over the world.
Starting point is 00:21:16 Who do you bring in to help you? So we have linguists on our team that make sure that we're translating things correctly. So for example, one of the linguists on our team that, you know, make sure that we're translating things correctly. So for example, one of the linguists on our team looks for things that our automatic metrics don't catch. So, you know, every time we produce a new version of our translation system, we have various scoring functions. The one that we use, which is a very popular metric is called blur. And so that gives you a single number that says, how well is your system doing? So, you know, in principle, if the version of your system this month is, you know, slightly better blue score than the version last month, you're like, great, it's better, you know, ship it. But then what
Starting point is 00:21:54 Lee, who's the linguist on my team does is she looks at it and tries to spot things that may not be caught by that score. So for example, how are we doing with names? How are we doing with capitalization? How are we doing with dates and times and numbers? There's a lot of phenomenon that are very noticeable to humans that are not necessarily picked up by the automatic metric. trick. So let's talk about you for a second. How did you end up doing machine translation research and Microsoft research? Yes, I was in a PhD program in sort of the systems area in the computer science department at Stanford. And I
Starting point is 00:22:46 spent a couple of summers up here. And at the end of my second summer, I decided I wanted to stay. And so I did. I just never went back. And I worked on a number of products at Microsoft. But at some point, I wanted to get back into research. And so I moved to Microsoft Research and I started the translation project actually in about the year 2000. So basically the year my daughter was born and now she's going off to college. And you've watched her language grow over the years. It's actually when you're studying language, listening to how kids learn language is fascinating. It's just wonderful. There's a spectrum here at Microsoft of, you know, pure research to applied research, stuff that ends up in products. And you seem to straddle what your work is about being in products, but also in the research phase.
Starting point is 00:23:37 Yeah, one of the things that's super exciting about our team is that, and it makes us somewhat unique, I think, is we have everything from the basic research and translation to the web service that serves up the APIs, the cloud service that people call to the apps that we have on the phone. So we have everything from the things that users are directly using down to the basic research. It's all in one team. So when somebody comes up with something cool, we can get it out to users very quickly. And that's very exciting. I always ask my podcast guests my version of the what could possibly go wrong question, which is, is there anything about your work in machine translation that keeps you up at night? Well, so we always have this challenge that we are learning from the data and the data is
Starting point is 00:24:26 sometimes misleading. And so we have things that we do to try and clean up the data. We do have a mechanism, for example, to be able to respond to those kinds of issues quickly. And it has happened. We've had situations where somebody discovered a translation that we produced that was offensive and posted it on Twitter. And, you know, it kind of went viral and some people were upset about it. And so we had to respond quickly and fix it. And so we have people who are on call 24 hours a day to fix any issue that arises like that. So it's a thing that literally does
Starting point is 00:25:06 keep somebody up at night. Definitely. At least doing the night shift version of it. As we wrap up, Arul, what advice would you give to aspiring researchers that might be interested in working in human language technologies? And why would someone want to come to Microsoft Research to work on those problems? So I think we live in an absolutely fascinating time, right? Like people have been working on AI for, or machine translation for that matter, for 50, 60 years. And for decades, there was a real struggle. And I would say just in the last 10 years with the advent of deep learning, we're making amazing progress towards these really, really hard tasks that people at some point had almost given up hope, you know, that we would ever be
Starting point is 00:25:50 successful at recognizing speech or translating anywhere close to the level that a human can. But here we are. It's a super exciting time. What's even more exciting is not only have we made tremendous progress on the research side, but now all of those techniques are being put into products and they're impacting people on a daily basis. And I think Microsoft's an amazing place to be doing this because we have such a breadth. We have a range of products that go all the way from individual users in their homes to multinational companies. And so we have just so many places that our technology can be used in. The range of opportunity you have at Microsoft, I think, is incredible. Arul Manazis, thank you for taking time to come out and talk to us today.
Starting point is 00:26:38 It's been really interesting. Thank you. Thank you. To learn more about Arul Menezes and the exciting advances in machine translation, visit microsoft.com slash research.
