Microsoft Research Podcast - 104 - Going deep on deep learning with Dr. Jianfeng Gao

Episode Date: January 29, 2020

Dr. Jianfeng Gao is a veteran computer scientist, an IEEE Fellow and the current head of the Deep Learning Group at Microsoft Research. He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning in areas like NLP, computer vision, multi-modal intelligence and conversational AI. Today, Dr. Gao gives us an overview of the deep learning landscape and talks about his latest work on Multi-task Deep Neural Networks, Unified Language Modeling and vision-language pre-training. He also unpacks the science behind task-oriented dialog systems as well as social chatbots like Microsoft Xiaoice, and gives us some great book recommendations along the way! https://www.microsoft.com/research

Transcript
Starting point is 00:00:00 Historically, there are two approaches to achieve the goal. One is to use large data. The idea is that if I can collect all the data in the world, then I believe the representation learned from this data is universal, because I see all of them. The other approach is that, since the goal of this representation is to serve different applications, how about I train the model using application-specific objective functions across many, many different applications? You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga. Dr. Jianfeng Gao is a veteran computer scientist, an IEEE fellow,
Starting point is 00:00:54 and the current head of the Deep Learning Group at Microsoft Research. He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning in areas like NLP, computer vision, multimodal intelligence, and conversational AI. Today, Dr. Gao gives us an overview of the deep learning landscape and talks about his latest work on multitask deep neural networks,
Starting point is 00:01:16 unified language modeling, and vision-language pre-training. He also unpacks the science behind task-oriented dialogue systems, as well as social chatbots like Microsoft Xiaoice, and gives us some great book recommendations along the way. That and much more on this episode of the Microsoft Research Podcast. Jianfeng Gao, welcome to the podcast. Thank you.
Starting point is 00:01:46 So you're a partner research manager of the Deep Learning Group at MSR. What's your big goal as a researcher yourself? And what's the big goal of your group? What gets you up in the morning? Like all world-class research teams, our ultimate goal is to advance the state of the art, and we want to push the AI frontiers by using deep learning technology or developing new deep learning technologies. That's the goal I think every group has. But for us, because we
Starting point is 00:02:19 are a group at Microsoft, we also have a mission to transfer the latest deep learning and AI technologies into Microsoft products so that we can benefit millions of Microsoft users. Well, interestingly, as you talk about the deep learning group, as I understand it, that's a relatively new group here at Microsoft Research, but deep learning is not a new thing here. So tell us how and why this group actually came about. Yeah, deep learning has a long history. I think the first deep learning models, called neural network models at that time, were developed half a century ago. Right. But back then, the training data needed for large-scale model learning was not available, so the
Starting point is 00:03:05 performance of these neural net models was not as good as the state-of-the-art models of that era. Deep learning only took off in the last decade, when large amounts of training data and large-scale computing infrastructure for training became available. Deep learning at Microsoft also has a long history. I remember, back in 2012, the speech group at Microsoft Research already demonstrated the power of deep learning by applying it to acoustic modeling. They were able to reduce the error rate of the speech recognition system
Starting point is 00:03:47 by about 10% to 15%. That was considered a very significant milestone at the time. After almost ten years of hard work without any significant improvement, they used deep learning to raise the bar. Then, a couple of years later, the computer vision team at Microsoft developed an extremely deep model called ResNet. They reached human parity and won a lot of competitions. I think the first deep learning group at Microsoft Research was founded back in 2014. At that time, our focus was to develop new deep learning technologies for natural language
Starting point is 00:04:28 processing, web search, and a lot of business applications. In the beginning, we believed that deep learning could be used not only to push the frontier of AI, but also to benefit Microsoft products. So there were two parts to the deep learning group. One was the research part; the other was the incubation part. I was managing the incubation part, and Dr. Li Deng was managing the research part. Then, after two or three years, the incubation started to show very promising business results internally, so they moved the team to an independent business incubation division.
Starting point is 00:05:08 So, in some sense, the big deep learning team was split into two parts. Later on, they moved my team to Dynamics, asking me to build real products for customers. At that time, I had to make a choice.
Starting point is 00:05:22 I could either stay there as a general manager of the new product team or move back to MSR. I decided to move back last year. So last year, we built a new deep learning group. This is probably the biggest research team at MSR AI. Talk a little bit more granularly about deep learning itself
Starting point is 00:05:44 and how your particular career has ebbed and flowed in the deep learning world. I joined Microsoft almost 20 years ago. The speech group was my first team. I worked on speech, then natural language processing, web search, machine translation, statistical machine learning, and even intelligent sales and marketing. I first touched deep learning back in 2012, when Li Deng introduced me to the deep learning model for speech. I remember he was super excited and ran into my office saying, oh, we should build a deep learning model for natural language processing. I didn't believe it.
Starting point is 00:06:25 But anyway, we tried it. The first deep learning model we developed is called DSSM. It stands for Deep Structured Semantic Model. The idea is very simple. We took the web search scenario as a test case. You have a query, and you want to identify relevant documents. But unfortunately, documents are written by their authors while queries are issued by users, using very,
Starting point is 00:06:52 very different vocabulary and language. There's a mismatch. So the deep learning idea is to map both query and document into a common vector space we call the semantic space. In that space, all these concepts are represented using vectors, and the distance between vectors measures their semantic similarity. The idea is very straightforward. Fortunately, we had a lot of Bing click data:
Starting point is 00:07:21 a user's query and the clicked document. These are weak supervision training data, and we have tons of it. And then we trained a deep learning model, DSSM. It was fantastic. Encouraged by this result, we decided to form a deep learning team. The key concept of deep learning is representation learning. Let's take a natural language example. A natural language sentence consists of words and phrases. These are symbolic tokens. The good thing about symbolic tokens is that people can understand them easily, but they are discrete, meaning that if you're given
Starting point is 00:08:02 two words, you can't easily say how similar they are. Deep learning tries to map all these words into semantic representations so that you can measure the semantic similarity. This mapping is done through a nonlinear function, and the deep learning model, in some sense, is the implementation of this nonlinear function. It is a very effective implementation in the sense that you can add more and more layers, make the model very deep, and use different model architectures to capture different aspects of the input and even identify features at different levels of abstraction. Then this model needs a large amount of data to train.
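To make the mapping idea concrete, here is a minimal sketch of a DSSM-style two-tower model, assuming bag-of-words token ids as input: a query encoder and a document encoder project text into a shared semantic space, and relevance is scored by cosine similarity between the resulting vectors. The architecture and dimensions are illustrative placeholders, not the actual DSSM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerModel(nn.Module):
    """Illustrative two-tower model in the spirit of DSSM: map queries and
    documents into one semantic space; score relevance by cosine similarity."""
    def __init__(self, vocab_size=30000, hidden=300, dim=128):
        super().__init__()
        # Separate nonlinear encoders ("towers") for queries and documents.
        self.query_tower = nn.Sequential(
            nn.EmbeddingBag(vocab_size, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.doc_tower = nn.Sequential(
            nn.EmbeddingBag(vocab_size, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, query_ids, doc_ids):
        q = self.query_tower(query_ids)  # (batch, dim) semantic vectors
        d = self.doc_tower(doc_ids)
        # Distance in the shared space measures semantic similarity.
        return F.cosine_similarity(q, d, dim=-1)

model = TwoTowerModel()
query = torch.randint(0, 30000, (2, 5))   # stand-in token ids for 2 queries
doc = torch.randint(0, 30000, (2, 50))    # stand-in token ids for 2 documents
print(model(query, doc))                  # similarity scores in [-1, 1]
```

With weak supervision from click logs, training would push the similarity of a query to its clicked document above its similarity to non-clicked candidates, for example with a softmax over the candidate scores.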
Starting point is 00:08:53 Half a century ago, we didn't have the computing power to train such models. Now we do, and we also have large amounts of training data. That's why deep learning took off. Okay. Well, let's talk a little bit about these representations and some of the latest research that's going on today. In terms of the kinds of representations you're dealing with, we've been talking about symbolic representations, both in language and mathematics. And you're moving into a space where you're dealing more with neural representations. And those two things, that architecture, is going to kind of set the stage
Starting point is 00:09:32 for the work that we're going to talk about in a minute. But I would like you to talk a little bit about both the definitions of symbolic representations and neural representations and why these neural representations represent an interesting and possibly fruitful line of research. Let's talk about two different spaces. One is called the symbolic space. The other is the neural space. They have different characteristics. The symbolic space, take natural language as an example, is what we are familiar with, where the concepts are
Starting point is 00:10:06 represented using words, phrases, and sentences. These are discrete. The problem with this space is that natural language is highly ambiguous. The same concept can be represented using very different words and phrases, and the same word or sentence can mean totally different things depending on the context. But in the symbolic space, it's hard to tell. In the neural space, it's different. All the concepts are going to be represented
Starting point is 00:10:34 using vectors, and the distance between vectors measures the relationship at the semantic level. So we already talked about representation learning, which is the major task of deep learning. Deep learning, in some sense,
Starting point is 00:10:50 is to map all the knowledge from the symbolic space to the neural space. In the neural space, all the concepts are represented using continuous vectors. It's a continuous space, with a lot of very nice math properties, and it's very easy to train in. That's why, if you have a large amount of data and you want to train a highly
Starting point is 00:11:14 nonlinear function, it's much easier to do so in the neural space than in the symbolic space. But the disadvantage of the neural space is that it's not human comprehensible. If I tell you, okay, these two concepts are similar because the vectors representing them are close to each other, how close are they? I don't know.
Starting point is 00:11:38 It's hard to explain. It's not interpretable at all. That's why people believe that the neural network model is like a black box. It can give you very precise predictions, but it's hard to explain how the model came up with the prediction. For some tasks, like image recognition, the deep learning model does a great job. But take a different kind of task, like a math task. If I give you
Starting point is 00:12:11 a problem statement like, say, the population of a city is 5,000 and it increases by 10% every year, what's the population after 10 years? Deep learning will try to just map this text into a number, without knowing how the number was arrived at. But in this particular case, we need neurosymbolic computing.
Starting point is 00:12:38 Ideally, you need to identify how many steps you need to take to generate the result, and for each step, what the functions are. So this is a much tougher task, one that I don't think the current deep learning models can solve. All right. But that is something you're working on. Yes.
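As a concrete illustration, here is the population problem worked as explicit steps. The function below is hand-written, but it is the kind of small, inspectable program a neurosymbolic system would be expected to produce, rather than mapping the text directly to a number.

```python
def solve_population(initial=5000, rate=0.10, years=10):
    """Intermediate steps for: a city of 5,000 grows 10% per year for 10 years."""
    # Step 1: identify the operation (compound growth) and its arguments.
    # Step 2: apply it year by year -- every intermediate value is checkable.
    population = initial
    for _ in range(years):
        population = population * (1 + rate)
    return population

print(round(solve_population()))  # 12969, i.e. 5000 * 1.1**10 ≈ 12,968.7
```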
Starting point is 00:12:57 You're trying to figure out how you can move from symbolic representations to neural representations and also have them be interpretable. Yeah, exactly. Big task. Yeah, yeah. There's a book called Thinking, Fast and Slow. In that book, the author describes two different systems that drive the way we think. He calls them system one and system two. System one is very intuitive, fast, and emotional. It's like you ask me something and I don't need to think.
Starting point is 00:13:30 I give you an answer immediately because I have already answered similar questions many, many times. System two is slower, more logical, more deliberative. You need some reasoning, such as for the question I just asked, the math problem about the population of the city. You need to think harder.
Starting point is 00:13:55 I think most of the state-of-the-art deep learning models are like system one. They're trained on large amounts of training data, where each training sample is an input-output pair. The model learns the mapping between input and output by fitting a nonlinear function to the data. That's it, without knowing how exactly the results are generated.
Starting point is 00:14:19 But now we are working on, in some sense, system two. That's neurosymbolic. You not only need to generate the answer, but you also need to figure out the intermediate steps you follow to generate it. Your group has several areas of research interest, and I want you to be our tour guide today and take us on a couple of excursions to explore these areas. And let's start with an area called neural language modeling. So talk about some promising projects and lines of inquiry, particularly as they relate to neural symbolic reasoning and computing. The neural language model is not a new topic. It has
Starting point is 00:15:17 been around for many years. Only recently, Google proposed a neural language model called BERT, which achieved state-of-the-art results on many NLP tasks because it used a new neural network architecture called the transformer. The idea of this model is representation learning: whatever text it takes, it will represent using vectors. We are working on the same problem, but we are taking a different approach. We also want to learn a representation and try to make the representation as universal as possible, in the sense that the same representation can be used by many different applications. Historically, there are two approaches to achieve the goal.
Starting point is 00:16:07 One is to use large data. The idea is that if I can collect all the data in the world, then I believe the representation learned from this data is universal, because I see all of them. The other approach is that since the goal of this representation is to serve different applications, how about I train the model using application-specific objective functions across many, many different applications? So this is called multitask learning. So Microsoft
Starting point is 00:16:41 research is taking the multitask learning approach. We have two models: MT-DNN and the Unified Language Model. And that's MT-DNN, so multitask... It stands for Multi-Task Deep Neural Network. For these two models, multitask learning is applied at different stages: the pre-training stage and the fine-tuning stage. So this is the neural language model part. Mainly, I would say, this is still like system one. Still back to the thinking fast.
Starting point is 00:17:12 Thinking fast. Gotcha. Fast thinking. That's a good anchor.
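As a rough sketch of the multitask idea behind MT-DNN: one shared text encoder feeds several task-specific heads, and losses from different tasks are combined so that every task shapes the shared representation. The GRU encoder, the two heads, and the random batches below are stand-ins for illustration; the actual MT-DNN builds on a BERT-style transformer encoder.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Toy multitask setup: a shared encoder with per-task output heads."""
    def __init__(self, dim=128):
        super().__init__()
        self.shared_encoder = nn.GRU(dim, dim, batch_first=True)
        # One lightweight head per task, all pulling on the same encoder.
        self.heads = nn.ModuleDict({
            "classification": nn.Linear(dim, 3),  # e.g. entailment labels
            "similarity": nn.Linear(dim, 1),      # e.g. relevance scoring
        })

    def forward(self, x, task):
        _, h = self.shared_encoder(x)   # shared representation
        return self.heads[task](h[-1])

model = MultiTaskModel()
batches = {
    "classification": (torch.randn(4, 10, 128), torch.randint(0, 3, (4,)),
                       nn.CrossEntropyLoss()),
    "similarity": (torch.randn(4, 10, 128), torch.randn(4, 1),
                   nn.MSELoss()),
}
losses = [loss_fn(model(x, task), target)
          for task, (x, target, loss_fn) in batches.items()]
sum(losses).backward()  # joint update: every task shapes the shared encoder
```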
Starting point is 00:17:23 Well, let's talk about an important line of work that you're tackling, and it falls under the umbrella of vision and language. You call it VL. Vision language. Give us a snapshot of the current VL landscape in terms of progress in the field, and then tell us what you're doing to advance the state of the art. This is called vision language. The idea is the same. We still learn the representation.
Starting point is 00:17:39 Now we are learning a hidden semantic space where all the objects will be represented as vectors, no matter the original medium of the object; it could be text, an image, or a video. Remember we talked about representation learning for natural language. Now we extend the concept from the single modality of natural language to multi-modality, to handle natural language, vision, and video. The idea is, okay, give me a video or an image or text, and I will represent it using vectors. If we do this correctly, it leads to many, many interesting applications.
Starting point is 00:18:30 For example, you can do image search. You just put in a query, say, I want an image of someone sleeping, and it will return all these images. See, that's cross-modality, because the query is in natural language and the returned results are images. You can also do image captioning: given an image, the system will generate a description of the image automatically. This is very useful for, let's say, blind people.
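Here is a toy sketch of that cross-modal image search, assuming two pretrained encoders that already map text and images into the same semantic space; the linear layers below are placeholders for real text and image towers.

```python
import torch
import torch.nn.functional as F

# Placeholder "towers": in a real vision-language model these would be
# pretrained networks whose outputs live in one shared space.
text_encoder = torch.nn.Linear(300, 128)    # stand-in for a text encoder
image_encoder = torch.nn.Linear(2048, 128)  # stand-in for an image encoder

query_vec = text_encoder(torch.randn(1, 300))        # e.g. "someone sleeping"
image_vecs = image_encoder(torch.randn(1000, 2048))  # an indexed image library

# Because both modalities live in one vector space, cross-modal retrieval
# reduces to nearest-neighbor search by cosine similarity.
scores = F.cosine_similarity(query_vec, image_vecs)
print(scores.topk(5).indices)  # indices of the top-5 matching images
```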
Starting point is 00:19:05 Yeah. Well, help me think, though, about other applications. Other applications, as I said, for blind people. We have a big project called Seeing AI. Right. The idea is, let's say you're blind, you're walking on the street, and you're wearing glasses. The glasses will take pictures of the surroundings for you and immediately tell you, oh, there's a car, there's a boy. So captioning and audio. Audio. It tells you what's happening around you. Another project we are working on is called vision-language navigation. The idea is that we build a 3D environment. It's a simulation, but it's a 3D environment, and we put a robot there.
Starting point is 00:19:45 It's an agent, and you can ask the agent to achieve a task by giving the agent natural language instructions. Okay, go upstairs, turn left, open the door, grab a cup of coffee for me. Something like that. This is going to be very, very useful for scenarios like mixed reality, like HoloLens. I was just going to say, you must be working with a lot of the researchers in VR and AR. Yes. These are sort of potential applications,
Starting point is 00:20:24 but we are at the early stage of developing this core technology in a simulated environment. So you're upstream in the VL category, and as it trickles down into the various other applications, people can adapt the technology to what they're working on. Let's talk about a third area, and I think this is one of the most fascinating right now, and that's conversational AI. I've had a couple of people on the podcast already who've talked a little bit about this, Riham Mansour and Patrice Simard, who's head of the Machine Teaching group. Talk about the work you're doing there, which you're instantiating in the form of question answering agents, task-oriented dialogue systems or what we might call bespoke AI, and bots, chatbots. Yeah, these are all obviously different types of dialogues. Social chatbots are extremely interesting. Do you know Microsoft Xiaoice? I know of it. Yeah, it's a very popular social chatbot.
Starting point is 00:21:25 It has attracted more than 600 million users worldwide. Is this in China or worldwide? It's deployed in five different countries. So there's a Chinese version, a Japanese version, an English version; it's in five different languages. Yeah, it's very interesting. Do you have it? I have it on my WeChat.
Starting point is 00:21:48 All right, so tell me about it. Yeah, it's an AI agent, but the design goal of this social chatbot is different from, let's say, a task-oriented bot. A task-oriented bot mainly helps you accomplish a particular task. For example, you can use it to book a movie ticket or reserve a table at a restaurant. Get directions. Yeah, get directions. The social chatbot is designed as an AI companion, which can eventually establish emotional connections with the user.
Starting point is 00:22:24 Wow. So you can treat it as a friend, as your friend. So an AI friend instead of an imaginary friend. Yeah, it's an AI friend. It can chat with you about all sorts of topics. It can also help you accomplish a few tasks, if they're simple enough. Right now, I want to dive a little deeper on the topic of neurosymbolic AI. And this is proposing an approach to AI that borrows from mathematical theory
Starting point is 00:22:50 And we've talked about it a little bit, but what are you hoping that you'll accomplish with neurosymbolic AI that we aren't accomplishing now? As I said, the key difference between this approach and the regular deep learning model is the capability of reasoning. The deep learning model is like a black box. You cannot open it; you take input and get output. This model can, on the fly,
Starting point is 00:23:21 identify the necessary components and assemble them. That's the key difference. The older deep learning model is just one model, a black box. Now it's not a black box. It's actually much like the way people think. When you face a problem, first of all, you divide and conquer, right? You divide the complex problem into smaller ones. Then, for each smaller one, you search your memory and identify the solution. You assemble all these solutions together to solve the problem. This problem could be
Starting point is 00:23:56 unseen before. It could be a new problem. That's the power of the neurosymbolic approach. So it sounds like, and I think this kind of goes back to the mission statement of your group, that you are working with deep learning toward artificial general intelligence. This is a very significant step toward that. And it's about knowledge reusability, right? By learning the capability of decomposing a complex problem into simpler ones, you know how to solve
Starting point is 00:24:31 a new complex problem and reuse the existing technologies. This is the way we solve problems. I think the neurosymbolic approach tries to mimic the way people solve problems. Right.
Starting point is 00:24:45 People, as I said, use system one and system two. For these sophisticated problems, people use system two. Right. You need to analyze the problem and then find the key steps.
Starting point is 00:25:00 And for each step, you need to find the solution. All right. So our audience is very technical, and I wonder if you could go into a bit of a deeper dive on how you're doing this computationally and mathematically to construct these neurosymbolic architectures. Yeah, there are many different ways. The learning challenge is that we have a lot of data, but we don't have labels for the intermediate steps.
Starting point is 00:25:33 So the model needs to learn these intermediate steps automatically. In some sense, these are hidden variables, and there are many different ways of learning them. One approach is called reinforcement learning: you try assembling different components to generate the answer, and if it doesn't give you the right answer, you backtrack and try different combinations. So that's one way of learning this. As long as the model has the capability of exploring all sorts of combinations in a very efficient way, we can solve this problem.
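A brute-force caricature of that setup, with invented primitives: given only an input and the final answer, search over compositions of basic skills until some program reproduces the answer; the discovered sequence plays the role of the unlabeled intermediate steps. A real system would guide this search with reinforcement learning rather than exhaustive enumeration.

```python
from itertools import product

# Invented primitive "skills" for illustration only.
PRIMITIVES = {
    "add10%": lambda x: x * 1.10,
    "double": lambda x: x * 2,
    "halve":  lambda x: x / 2,
}

def find_program(x, answer, max_steps=3):
    """Search for a sequence of primitives mapping x to answer.
    Supervision is only (input, answer); the steps are hidden variables."""
    for length in range(1, max_steps + 1):
        for steps in product(PRIMITIVES, repeat=length):
            value = x
            for name in steps:
                value = PRIMITIVES[name](value)
            if abs(value - answer) < 1e-6:
                return steps  # the discovered intermediate steps
    return None

print(find_program(100, 110))  # ('add10%',)
print(find_program(100, 220))  # ('add10%', 'double')
```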
Starting point is 00:26:13 Think about how people solve sophisticated problems. When we are young, we learn to solve simple problems. We learn the skills, and then we combine these basic skills to solve more sophisticated tasks. We try to mimic this human learning pattern using neurosymbolic models. In that case, you don't need to label a lot of data. You label some. Eventually, the model learns two things. One is that it learns to solve
Starting point is 00:26:47 all these basic tasks. And more importantly, the model learns how to assemble these basic skills to solve more sophisticated tasks. The idea of pre-training models is getting a lot of attention right now and has been framed as AI in the big leagues or a new AI paradigm. So talk about the work going on across the industry in pre-trained models and what MSR is bringing to the game. The goal of these pre-training models is to learn a universal representation of natural language. There are two strategies for learning this universal representation. One is to train the model on large amounts
Starting point is 00:27:34 of data. If you get all the data in the world, you can be pretty sure that the model you train is universal. The other is multitask learning, and the Unified Language Model uses multitask learning in the pre-training stage. Okay. We grouped language modeling tasks into three different categories. Given the left and right context, predict the word in the middle. That's one task. The other task is, given an input sentence, produce an output sentence. Second.
Starting point is 00:28:07 The third task is, given a sequence, predict the next word based on the history. So these are three very different tasks that cover a lot of natural language processing scenarios. And we use multitask learning for this Unified Language Model: given the training data, we use three different objective functions to jointly learn the model parameters.
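The three objectives can be pictured as three self-attention masks over one shared model, roughly in the spirit of the Unified Language Model, though the details here are simplified for illustration. A 1 means a position may be attended to; a 0 means it is masked out.

```python
import torch

n = 5  # toy sequence length

# 1) Bidirectional (BERT-like): every token sees left and right context,
#    used to predict a masked word in the middle.
bidirectional = torch.ones(n, n)

# 2) Left-to-right (GPT-like): each token sees only the history,
#    used to predict the next word.
left_to_right = torch.tril(torch.ones(n, n))

# 3) Sequence-to-sequence: source tokens (here the first 3) see the whole
#    source; target tokens see the source plus previously generated targets.
src = 3
seq2seq = torch.zeros(n, n)
seq2seq[:src, :src] = 1                                   # source sees source
seq2seq[src:, :] = torch.tril(torch.ones(n, n))[src:, :]  # causal target side

print(left_to_right)
print(seq2seq)
```

Training on all three objectives jointly, with one shared set of parameters, is what lets a single model serve both understanding and generation.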
Starting point is 00:28:34 The main advantage of the Unified Language Model is that it can be applied to both natural language understanding tasks and natural language generation tasks. AI is arguably the most powerful technology to emerge in the last century and is becoming ubiquitous in this century. Given the nature of the work you do and the potential to cause big disruptions, both in technology and in the culture or society, is there anything that keeps you up at night?
Starting point is 00:29:15 And if so, how are you working to anticipate and mitigate the negative consequences that might result from any of the work you're putting out? Yeah, there are a lot of open questions, especially at Microsoft, where we are building AI products for millions of users. All users are very different. Take Microsoft's Xiaoice, the chatbot system, as an example. In order to have a very engaging conversation, sometimes
Starting point is 00:29:46 the Xiaoice system will tell you a joke. You may find the joke very interesting, funny. But other people may find the joke offensive. So it's about culture. It's very difficult
Starting point is 00:30:02 to find the trade-off. You want the conversation interesting enough that people stay engaged, but you also don't want to offend people. So there is a lot of guidance about who is in control. For example, if you want to switch a topic, do you allow your agent to switch the topic, or should the agent always follow the topic of the user? Generally, people agree that for all human-machine systems, the human
Starting point is 00:30:35 needs to be in control all the time. But in reality, there are a lot of exceptions, like what happens if the agent notices that the user is going to hurt herself. For example, in one situation we found, the user had talked to Xiaoice for seven hours and it was already 2 a.m. Xiaoice would force the user to take a break. We have a lot of rules embedded into the system to make sure that we build a system for good.
Starting point is 00:31:12 People are not going to misuse the AI technology for something that's not good. So are those, like you say, you're actually building those kinds of things in, like go to bed, it's past your bedtime?
Starting point is 00:31:25 Something like that. Yeah, it just reminds you. Right. So let's drill in a little on this topic, just because I think one of the things that we think of when we think of dystopic manifestations of a technology that could convince us that it's human, where does the psychological... I think the entire research community is working together to set up some rules, to set up the right expectations for our users. For example, one rule that I believe is true is that you should never confuse users about whether they're talking to a bot or a real human.
Starting point is 00:32:09 Forget about Xiaoice for now and just talk about the other stuff you're working on. Are there any sort of big issues in your mind that don't have to do with, you know, users spending too long with a chatbot or whatever, but kinds of unintended consequences that might occur from any of the other work? Well, for example, let's go back to the deep learning model, right? The deep learning model is very powerful at predicting things. People use deep learning models for recommendation all the time. But there's a very serious limitation of these models: the model can learn correlation, but not causation. For
Starting point is 00:32:54 example, if I want to hire a software developer, I get a lot of candidates and I ask the system to give me a recommendation. The deep learning model gives me a recommendation: you know, this guy is good. Then I ask the system, why? Because the candidate is male. Then people will say, your system is wrong, it's biased. But actually, the system is not wrong.
Starting point is 00:33:20 The way we use the system is wrong, because the system learns the strong correlation between gender and the job title, but there's no causality there. The system does not capture causality at all. A famous example is that there's a strong correlation between the rooster's crow and the sunrise, but the crow does not cause the sunrise at all. These are the problems of these models. People need to be aware of the limitations of the models so that they do not misuse them.
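A tiny numeric version of the rooster example, with invented data just to make the point: both series are driven by the hour of the day, so they correlate almost perfectly even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(0)
hour = np.arange(24 * 365) % 24      # hour of day over one year
sunrise = (hour == 6).astype(float)  # say the sun rises at 6 a.m.
crow = ((hour == 6) & (rng.random(hour.size) < 0.9)).astype(float)  # usually crows at 6

# A purely correlational model (like a deep net fit to input-output pairs)
# would happily use the crow to "predict" the sunrise.
print(np.corrcoef(crow, sunrise)[0, 1])  # close to 1.0, yet no causation
```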
Starting point is 00:33:58 So one step further, are there ways that you can move towards causality? Yeah, there's a lot of ongoing work. There's a recent book called The Book of Why. The Book of Why. Yeah, The Book of Why by Judea Pearl. There are a lot of new models he's developing. One of the well-known models is the Bayesian network. Of course,
Starting point is 00:34:26 the Bayesian network can be used in many applications, but he believes it is at least a promising tool for implementing causal models. I'm getting a reading list from this podcast. It's awesome. Well, we've talked about your professional path, Jianfeng. Tell us a little bit about your personal history. Where did you grow up? Where did you get interested in computer science? And how did you end up in AI research? I was born in Shanghai.
Starting point is 00:34:56 I grew up in Shanghai, and I studied design in college, so I was not a computer science student at all. I learned to program only because I wanted to date a girl at that time, so I needed money. You learned to code so you could date a girl. I love it.
Starting point is 00:35:27 Then, when I was graduating in 1999, Microsoft Research had just founded a lab in China. I sent them a resume and got a chance to interview, and they accepted my application. That's it. After that, I started to work on AI. Before that, I knew only a little about AI. Okay, back up a little. What was your degree in design?
Starting point is 00:35:59 I got an undergraduate degree, a bachelor's degree, in design. Then I got electronic engineering, double E. Electronic engineering. Yeah, then computer science a little bit later, because I got interested in computer science after that. Finally, I got a computer science degree. A PhD? PhD, yeah. Did you do that in Shanghai or Beijing?
Starting point is 00:36:18 Shanghai. Shanghai. So in 1999, you came to Microsoft Research. Yeah, in China. Okay, and then you came over here, or? Then, in 2005, I moved to Redmond and joined a product group. My mission at that time was to build the first natural user interface for Microsoft Windows Vista. And we couldn't make it.
Starting point is 00:36:44 After one year, I joined Microsoft Research here. I thought there was a lot more fundamental work to do before we could build a real system for users. Let's go upstream a little. Yeah.
Starting point is 00:36:57 Okay. Then I worked for eight years in Microsoft Research in an NLP group. And now you're partner research manager for the Deep Learning group. Yeah, yeah. What's one interesting thing
Starting point is 00:37:09 that people don't know about you? Maybe it's a personal trait or a hobby or a side quest that may have influenced your career as a researcher. I remember when I interviewed at Microsoft Research, during the interview, I failed almost all the questions. Finally, I said, okay, it's hopeless.
Starting point is 00:37:30 I went home, and the next day I got a phone call saying, you're hired. In retrospect, I think that although I did not give the correct answers, I asked the right questions during the interview. I think it's very important for researchers to learn how to ask the right questions.
Starting point is 00:37:50 That's funny. How do you get a wrong answer in an interview? Because I was asked all these questions about speech and natural language, and I had no idea at all. I remember one time the interviewer asked me to figure out an algorithm called Viterbi. I had never heard of it. So I actually asked a lot of questions, and he answered part of them. Then later on he said, I cannot answer more questions, because if I answer this question, you will get the answer.
Starting point is 00:38:22 That shows that I asked the right questions. Let's close with some thoughts on the potential ahead. And here's your chance to talk to would-be researchers out there who will take the AI baton and run with it for the next couple decades. What advice or direction would you give to your future colleagues or even your future successors? I think first of all, you need to be passionate about research.
Starting point is 00:38:49 It's critical to identify the problem you really want to devote your lifetime to work on. That's number one. Number two, after you identify this problem you want to work on, stay focused. Number three,
Starting point is 00:39:02 keep your eyes open. That's my advice. Is that how you did yours? I think so. Jianfeng Gao, thank you for joining us today. Thanks for having me. To learn more about Dr. Jianfeng Gao and how researchers are going deeper on deep learning,
Starting point is 00:39:26 visit microsoft.com/research.
