Microsoft Research Podcast - 104 - Going deep on deep learning with Dr. Jianfeng Gao
Episode Date: January 29, 2020
Dr. Jianfeng Gao is a veteran computer scientist, an IEEE Fellow and the current head of the Deep Learning Group at Microsoft Research. He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning in areas like NLP, computer vision, multi-modal intelligence and conversational AI. Today, Dr. Gao gives us an overview of the deep learning landscape and talks about his latest work on Multi-task Deep Neural Networks, Unified Language Modeling and vision-language pre-training. He also unpacks the science behind task-oriented dialog systems as well as social chatbots like Microsoft Xiaoice, and gives us some great book recommendations along the way! https://www.microsoft.com/research
Transcript
Historically, there are two approaches to achieve the goal. One is to use large data.
The idea is that if I can collect all the data in the world, then I believe the representation
learned from this data is universal, because I see all of them. The other approach is that
since the goal of this representation is to serve different applications, how about I train the model using
application-specific objective functions across many, many different applications?
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizenga.
Dr. Jianfeng Gao is a veteran computer scientist, an IEEE fellow,
and the current head of the Deep Learning Group at Microsoft Research.
He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning
in areas like NLP, computer vision,
multimodal intelligence, and conversational AI.
Today, Dr. Gao gives us an overview
of the deep learning landscape
and talks about his latest work
on multitask deep neural networks,
unified language modeling, and vision language pre-training.
He also unpacks the science
behind task-oriented dialogue systems,
as well as social chatbots like Microsoft
Xiaoice, and gives us some great book recommendations along the way.
That and much more on this episode of the Microsoft Research Podcast.
Jianfeng Gao, welcome to the podcast.
Thank you.
So you're a partner research manager of the Deep Learning Group at MSR.
What's your big goal as a researcher yourself?
And what's the big goal of your group?
What gets you up in the morning?
Like all the world-class research teams,
our ultimate goal is to advance the state of the art.
And we want to push the AI frontiers by using deep learning technology or developing new
deep learning technologies. That's the goal I think every group has. But for us, because we
are a group at Microsoft, we also have a mission to transfer the latest deep learning and AI
technologies into Microsoft products so that we can benefit millions of Microsoft users.
Well, interestingly, as you talk about the deep learning group, as I understand it,
that's a relatively new group here at Microsoft Research, but deep learning
is not a new thing here. So tell us how and why this
group actually came about. Yeah, deep learning has a long history. I think the first deep learning
models, at that time, were called neural network models. They were developed half a century ago.
Right. But at that time, the training data needed for larger-scale model learning was not available, so the
performance of these neural net models was not as good as the state-of-the-art
models of the day. Deep learning only took off in the last
decade, when large amounts of training data
and larger-scale computing infrastructure for training became available.
And deep learning at Microsoft also has a long history.
I remember back to 2012, the speech group at Microsoft Research already demonstrated
the power of deep learning by applying it to acoustic modeling.
They were able to reduce the error rate of the speech recognition system
by about 10% to 15%.
That was considered a very significant milestone at that time.
After almost ten years of hard work without any significant improvement, they used deep learning
to make a breakthrough.
Then, a couple of years later, the computer vision team at Microsoft developed an extremely deep model called ResNet.
And they reached human parity and won a lot of competitions.
And I think the first deep learning group at Microsoft Research was founded back in 2014.
At that time, our focus was to develop new deep learning technologies for natural language
processing and web search and a lot of business applications.
In the beginning, we thought that deep learning could be used not only to push the frontier
of AI, but also to benefit Microsoft products.
So there are two parts in the deep learning group.
One is the research part, the other is the incubation part. I was managing the
incubation part and Dr. Li Deng was managing the research part. Then after
two or three years, the incubation starts to show very promising business results
internally. So they moved the team to an independent business incubation division.
Then in some sense,
the big deep learning team
is split into two parts.
Then later on,
they moved my team to Dynamics,
asking me to build the real products for customers.
And at that time,
I had to make a choice.
So I could either stay there
to be a general manager of the new product team
or move back to MSR.
And so I decided to move back last year.
So last year, we built a new deep learning group.
This is probably the biggest research team at MSR AI.
Talk a little bit more granularly
about deep learning itself
and how your particular career has ebbed and flowed in the deep learning world.
I joined Microsoft almost 20 years ago.
Speech group was my first team.
I worked on speech, then I worked on natural language processing, web search, machine translation, statistical machine learning, and even intelligent sales and marketing.
I first touched deep learning back in 2012, when Li Deng introduced me to the speech deep learning model.
At that time, I remember he was super excited and ran into my office saying,
oh, we should build a deep learning model for natural language processing.
I didn't believe it.
But anyway, we tried it.
The first deep learning model we developed is called DSSM.
It stands for Deep Structured Semantic Model.
The idea is very simple.
We take the web search scenario as a test case.
The idea is that you have a query.
You want to identify relevant documents,
but unfortunately, documents are written by authors while queries are issued by users, using very,
very different vocabulary and language. There's a mismatch. So the deep learning idea is to map
both query and document into a common vector space we call semantic space.
In that space, all these concepts
are represented using vectors.
And the distance between vectors measures
the semantic similarity.
The idea is very straightforward.
Fortunately, we got a lot of Bing click data.
Each sample is a user query and a clicked document.
These are weak supervision
training data. We have tons of this. And then we train a deep learning model called DSSM.
It's fantastic. Encouraged by this result, we decided to form a deep learning team.
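To make the DSSM idea concrete, here is a toy sketch: text is hashed into character-trigram counts (a stand-in for DSSM's word hashing), projected through one nonlinear layer into a shared semantic space, and relevance is scored by cosine distance. The dimensions, the random untrained weights, and the example strings are all invented for illustration; the real model has several trained layers.

```python
import numpy as np

def char_trigrams(text, dim=512):
    """Hash character trigrams into a fixed-size count vector
    (a stand-in for DSSM's word hashing; dim is arbitrary here)."""
    v = np.zeros(dim)
    padded = "#" + text.lower() + "#"
    for i in range(len(padded) - 2):
        v[hash(padded[i:i + 3]) % dim] += 1.0
    return v

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(512, 128))  # one projection layer, untrained

def embed(text):
    """Nonlinear map from symbolic text into the common semantic space."""
    return np.tanh(char_trigrams(text) @ W)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Relevance of a document to a query = closeness in the shared vector space.
score = cosine(embed("best seattle coffee"), embed("top coffee shops in seattle"))
```

In the actual system, the projection weights would be trained on Bing click pairs so that clicked documents land close to the queries that led to them.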
The key concept of deep learning is representation learning. Let's take a natural language example.
Let's say natural language sentence consists of words and phrases. These are
symbolic tokens. The good thing about these symbolic tokens is that people can
understand them easily, but they are discrete, meaning that if you're given
two words, you can't easily measure how similar they are.
Deep learning tries to map all these words into semantic representations so that you
can measure the semantic similarity. And this mapping is done through a nonlinear function.
The deep learning model, in some sense, is the implementation of this nonlinear
function. And it is a very effective implementation in the sense that you can add
more and more layers, make them very deep, and you have different model architecture to capture
different aspects of the input and even identify the features at a different abstract level.
Then this model needs a large amount of data to train.
I think half a century ago, we didn't have the computing power to do this.
Now we do, and we also have a large amount of training data for this.
That's why I think deep learning took off.
Okay. Well, let's talk a little bit about these representations and some of the latest research
that's going on today. In terms of the kinds of representations you're dealing with, we've been
talking about symbolic representations, both in language and mathematics. And you're moving into a space where you're dealing more with neural representations.
And those two things,
that architecture is going to kind of set the stage
for the work that we're going to talk about in a minute.
But I would like you to talk a little bit about
both the definitions of symbolic representations
and neural representations
and why these neural representations represent an interesting and
possibly fruitful line of research. Let's talk about two different spaces. One is called the
symbolic space. The other is the neural space. They have different characteristics. The symbolic
space, take natural language as an example, is what we are familiar with, where the concepts are
represented using words, phrases, and sentences. These are discrete. The problem of this space
is that natural language is highly ambiguous. So the same concept can be represented using
very different words and phrases, and the same words or sentence can mean totally different things
given the context.
But in the symbolic space,
it's hard to tell.
In the neural space, it's different.
All the concepts are going to be represented
using vectors.
And the distance between vectors
measures the relationship
at the semantic level.
So we already talked about
representation learning,
which is the major task of deep learning.
Deep learning, in some senses,
is to map all the knowledge
from the symbolic space to neural space.
Because in the neural space,
all the concepts are represented
using continuous vectors.
It's a continuous space.
It has a lot of very nice math properties.
It's very easy to train. That's why if you have a large amount of data and you want to train a highly
nonlinear function, it's much easier to do so in the neural space than in the symbolic space.
But the disadvantage of the neural space is that it is not human-comprehensible.
Because if I give you, say that,
okay, these two concepts are similar
because the vectors of their representation
are close to each other.
How close are they?
I don't know.
It's hard to explain.
It's uninterpretable.
It's not interpretable at all.
That's why people believe that the neural network model is like a black box.
It can give you a very precise prediction, but it's hard to explain how the model comes up
with the prediction.
This applies to some tasks like image recognition. The deep learning model does a great job for tasks like this,
but given a different task, like a math task, it's harder. If I give you
a problem statement like, let's say
the population of a city is 5,000, it
increases by 10% every year. What's the population
after 10 years?
The deep learning model will try to just map this text
into a number, without knowing how the number was arrived at.
But in this particular case,
we need the neurosymbolic computing.
Ideally, you need to identify how many steps
you need to take to generate the result.
And for each step, what are the functions?
So this is a much tougher task.
I don't think the current deep learning model can solve it.
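The population example can be worked in a system-2 style by making the intermediate steps explicit rather than mapping text straight to a number. This is only an editorial illustration of the idea, not the group's actual neuro-symbolic model:

```python
def solve_population(initial, growth_rate, years):
    """Compound growth solved step by step, recording each intermediate result."""
    steps = []
    population = float(initial)
    for year in range(1, years + 1):
        population *= (1 + growth_rate)          # one explicit reasoning step
        steps.append((year, round(population, 2)))
    return population, steps

final, steps = solve_population(5000, 0.10, 10)
# final is about 12968.71, and `steps` exposes the chain of reasoning,
# which is exactly what an end-to-end black-box prediction would not give you.
```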
All right.
But that is something you're working on.
Yes.
You're trying to figure out how you can move from symbolic representations to neural representations
and also have them be interpretable.
Yeah, exactly.
Big task.
Yeah, yeah. There's a book called Thinking Fast and Slow. In that book, the author
described two different systems that drive the way we think. They call them system one and system two. System one is very intuitive, fast, and emotional.
It's like you ask me something.
I don't need to think.
I give you answer immediately
because I already answered the similar questions many, many times.
System two is slower, more logical, more deliberative.
It's like you need some reasoning,
such as the question I just asked,
like the math problem about the population of the city.
You need to think harder.
I think that most of the state-of-the-art
deep learning models are like system one.
It's trained on large amounts of training data.
Each training sample is an input-output pair.
So the model learns the mapping between input and output
by fitting a nonlinear function on the data.
That's it.
Without knowing how exactly the results are generated.
But now we are working on, in some sense, system 2.
That's neuro-symbolic. You not only need to
generate the answer, but also need to figure out the intermediate steps you follow to generate the answer. Your group has several areas of research interest, and I want you to be our tour guide
today and take us on a couple of excursions to explore these areas.
And let's start with an area called neural language modeling.
So talk about some promising
projects and lines of inquiry, particularly as they relate to neural
symbolic reasoning and computing. Neural language model is not a new topic. It has
been around for many years. Only recently, Google proposed a neural language model called BERT, which
achieved state-of-the-art results on many NLP tasks because they used a new neural network architecture called the Transformer.
So the idea of this model is the representation learning.
Whatever text they take, they will represent it using vectors.
And we are working
on the same problem, but we are taking a different approach. So we also want to learn a representation
and try to make the representation as universal as possible, in the sense that the same representation
can be used by many different applications. Historically, there are two approaches to achieve the goal.
One is to use large data.
The idea is that if I can collect all the data in the world,
then I believe the representation learned from this data is universal,
because I see all of them.
The other approach is that since the goal of this representation
is to serve different
applications, how about I train the model using application-specific objective functions
across many, many different applications? So this is called multitask learning. So Microsoft
research is taking the multitask learning approach. So we have two models, called MT-DNN and the Unified Language Model.
And that's MT-DNN, so multitask...
It stands for Multitask Deep Neural Network.
For these two models, multitask learning is applied at different stages:
at the pre-training stage and the fine-tuning stage.
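The multitask setup can be sketched as one shared encoder feeding several task-specific heads, with a joint loss summed over tasks. The layer sizes, the task names, and the squared-error stand-in loss below are invented for the sketch and are not MT-DNN's real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W_shared = rng.normal(scale=0.1, size=(64, 32))   # shared text encoder (toy)
heads = {                                          # task-specific output layers
    "sentiment":  rng.normal(scale=0.1, size=(32, 2)),
    "similarity": rng.normal(scale=0.1, size=(32, 1)),
}

def encode(x):
    """Shared representation reused by every task."""
    return np.tanh(x @ W_shared)

def task_output(x, task):
    return encode(x) @ heads[task]

# Joint training sums the per-task losses, so gradients from every task
# shape the shared representation, pushing it toward universality.
x = rng.normal(size=(4, 64))                       # toy batch of 4 "sentences"
loss = sum(float((task_output(x, t) ** 2).mean()) for t in heads)
```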
So this is the neural language model part.
Mainly, I would say this is still like system one.
Still back to the thinking fast.
Thinking fast.
Gotcha.
Fast thinking.
That's a good anchor.
Well, let's talk about an important line of work that you're tackling,
and it falls under the umbrella of vision and language.
You call it VL.
Vision language.
Give us a snapshot of the current VL landscape
in terms of progress in the field,
and then tell us what you're doing
to advance the state of the art.
This is called vision language.
The idea is the same.
We still learn the representation.
Now, since we are learning
a hidden semantic space
where all the objects will be represented
as vectors, no matter the original medium of the object. It could be text, could be an
image, could be a video.
So remember we talked about the representation learning for natural language. Now we extend the concept, extend the modality
from natural language to multi-modality, to handle natural language, vision, and video.
The idea is, okay, give me a video or image or text. I will represent them using vectors. By doing so, if we do it correctly, then this leads to many, many interesting applications.
For example, you can do image search.
You just put a query, say, I want the image of sleeping.
It will return all these images.
See, that's cross-modality, because the query is in natural language and the returned
results are images. And you can also do image captioning, for example, give you an
image and the system will generate a description of the image automatically. This is very useful
for, let's say, blind people.
Yeah. Well, help me think though about other applications.
Other applications, as I said, for blind people. We have a big project called Seeing AI.
Right.
The idea is, let's say you're blind, you're walking on the street and you're wearing glasses. The glasses will take pictures of the surroundings for you. And immediately tell you, oh, there's a car,
there's a boy.
So captioning and audio.
Audio. Then tell you what happens around you. And another project we are working on
is called vision-language navigation. The idea is we build a 3D environment. It's a
simulation, but it's a 3D environment, and we put a robot there.
It's an agent.
And you can ask the agent to achieve a task by giving the agent natural language instructions.
Okay, go upstairs, turn left, open the door, grab a cup of coffee for me.
Something like that. This is going to be very, very useful for scenarios like mixed reality,
like HoloLens.
I was just going to say you must be working with a lot of the researchers in VR and AR.
Yes.
These are sort of potential applications,
but we are at the early stage of developing this core technology in a simulated environment.
So you're upstream in the VL category.
And as it trickles down into the various other applications, people can adapt the technology to what they're working on.
Let's talk about a third area.
And I think this is one of the most fascinating right now, and that's conversational AI. I've had a couple people on the podcast already who've talked a little bit about this, Riham Mansour and Patrice Simard, who's head of the Machine Teaching Group. It's being instantiated in the form of question answering agents, task-oriented dialogue systems
or what we might call bespoke AI, and bots, chatbots. Yeah, these are all obviously different
types of dialogues. Social chatbots are extremely interesting. Do you know Microsoft Xiaoice?
I know of it. Yeah, it's a very popular social chatbot.
It has attracted more than 600 million users worldwide.
Is this in China or worldwide?
It's deployed in five different countries.
So it has Chinese version, has Japanese version, English version.
It has five different languages.
Yeah, it's very interesting.
Do you have it?
I have it on my WeChat.
All right, so tell me about it.
Yeah, it's an AI agent.
But the design goal of this social chatbot is different from, let's say, task-oriented bot.
Task-oriented mainly helps you accomplish a particular task.
For example, you can use it to book a movie ticket, reserve a table in the restaurant.
Get directions.
Yeah, get directions.
And the social chatbot is designed as an AI companion, which can eventually establish emotional connections with the user.
Wow.
So you can treat it as a friend, as your friend.
So an AI friend instead of an imaginary friend.
Yeah, it's an AI friend.
It can chat with you about all sorts of topics.
It can also help you accomplish a few tasks, if they're simple enough.
Right now, I want to dive a little deeper on the topic of neurosymbolic AI.
And this is proposing an approach to AI that borrows from mathematical theory
on how the human brain encodes and processes symbols. And we've talked about it a little bit,
but what are you hoping that you'll accomplish with neurosymbolic AI that we aren't accomplishing now? As I said, the key difference between this approach
versus the regular deep learning model
is the capability of reasoning.
The deep learning model is like a black box.
You cannot open it.
So you take input and get output.
This model can, on the fly,
identify the necessary components
and assemble them. That's
the key difference. In the older deep learning model, it's just one model, black box. Now
it's not a black box. It's actually exactly like what people are thinking. When you face
a problem, first of all, you divide and conquer, right? You divide the complex problem
into smaller ones. Then
for each smaller one, you search in your memory and identify the solution. And
you assemble all these solutions together to solve the problem. This problem could be
unseen before. It could be a new problem. That's the power of the neuro-symbolic approach.
So it sounds like, and I think this kind of goes back to the mission statement of your
group is that you are working with deep learning toward artificial general intelligence.
This is a very significant step toward that.
And it's about the knowledge reusability, right? By learning the capability of decomposing
the complex problem
into simpler ones,
you know how to solve
a new complex problem and reuse
the existing technologies.
This is the way we solve
the problem. I think
the neurosymbolic approach
tries to mimic the way people
solve problems.
Right.
People, as I said,
it's like system one, system two.
For these sophisticated problems,
people use
system two.
Right.
You need to analyze the problem
and then find the key steps.
And for each step,
I need to find the solution.
All right.
So our audience is very technical.
And I don't know if you could go in to a bit of a deeper dive on how you're doing this
computationally, mathematically, to construct these neuro-symbolic architectures.
Yeah, there are many different ways. The learning challenge is that we have a lot of data,
but we don't have the labels for the intermediate steps.
So the model needs to learn these intermediate steps automatically.
In some sense, these are hidden variables.
There are many different ways of learning this.
So there are different approaches. One approach is called reinforcement
learning. You try assembling components in different ways to generate the answer, and if it doesn't
give you the right answer, you trace back and try different combinations. So yeah, that's
one way of learning this. As long as the model has the capability of learning all sorts of combinations in a very
efficient way, we can solve this problem.
The idea is that, think about how people solve sophisticated problems.
When we are young, we learn to solve these simple problems.
Then we learn the skills. And we combine these basic skills to solve more sophisticated problems.
We try to mimic the human learning pattern using the neurosymbolic models.
So in that case, you don't need to label a lot of data.
You label some.
Eventually, the model learns two things.
One is it learns to solve
all these basic tasks. And more importantly, the model is going to learn how to assemble
these basic skills to solve more sophisticated tasks.
The idea of pre-training models is getting a lot of attention right now
and has been framed as AI in the big leagues or a new AI paradigm.
So talk about the work going on across the industry in pre-trained models
and what MSR is bringing to the game.
The goal of these pre-training models is to learn a universal representation of the natural language.
Then there are two strategies of learning this universal representation. One is to train the model on large amounts
of data. If you get all the data in the world, you can be pretty sure that the model you train
is universal. The other is multitask learning. And the Unified Language Model uses multitask learning in the training stage.
Okay.
We grouped the language model tasks into three different categories.
Given the left and right context, predict the word in the middle.
That's one task.
The second task is, given an input sentence, produce the output sentence.
Second.
The third task is, given a sequence,
you always want to predict the next word based on the history.
So these are three very different tasks
that cover a lot of natural language processing scenarios.
And we use multitask learning for this unified language model. Given the training data,
we use three different objective functions
to learn jointly the model parameters.
The main advantage of the Unified Language Model
is that it can be applied
to both natural language understanding tasks
and natural language generation tasks.
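The three objectives can share one Transformer simply by switching the self-attention mask per training batch, which is the trick behind unified language modeling. Here is a sketch of the three masks; the sizes and the source/target split are chosen arbitrarily for illustration:

```python
import numpy as np

def lm_mask(n, kind, n_src=None):
    """Self-attention masks for the three objectives (1 = may attend)."""
    if kind == "bidirectional":      # cloze-style: see left and right context
        return np.ones((n, n), dtype=int)
    if kind == "left-to-right":      # predict the next word from history only
        return np.tril(np.ones((n, n), dtype=int))
    if kind == "seq2seq":            # source is bidirectional within itself
        m = np.zeros((n, n), dtype=int)   # and never peeks at the target
        m[:, :n_src] = 1             # every position attends to the source
        m[n_src:, n_src:] = np.tril(      # target sees only its own history
            np.ones((n - n_src, n - n_src), dtype=int))
        return m
    raise ValueError(kind)
```

One set of model parameters is then trained jointly under all three masking regimes, one objective per batch.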
AI is arguably the most powerful technology to emerge in the last century and is becoming ubiquitous in this century.
Given the nature of the work you do
and the potential to cause big disruptions,
both in technology and in the culture or society,
is there anything that keeps you up at night?
And if so, how are you working to anticipate
and mitigate the negative consequences
that might result from any of the work you're putting out?
Yeah, there are a lot of open questions,
especially since, at Microsoft, we are building AI products for millions of users.
All users are very different.
Take Microsoft Xiaoice, the chatbot system, as an example.
In order to have a very engaging conversation, sometimes
the Xiaoice system will
tell you some joke.
You may find the joke very interesting,
funny.
But other people may find the joke
offensive.
So it's about culture.
It's very difficult
to find the trade-off.
You want the conversation interesting enough so that you engage with the people, but you
also don't want to offend people.
So there is a lot of guidance about who is in control.
For example, if you want to switch a topic, do you allow your agent to switch the topic,
or does the agent always follow the topic of the user?
And generally, people agree that
for all the human-machine systems, humans
need to be in control all the time.
But in reality, there are a lot of exceptions. What happens
if the agent notices that the user is going to hurt herself?
For example, in one situation that we found, the user talked to Xiaoice for seven hours.
And it was already 2 a.m.
Xiaoice would force the user to take a break. We have a lot of rules embedded into the system
to make sure that
we build a system for good.
People are not going to misuse
the AI technology
for something that's not good.
So are those, like you say,
you're actually building
those kinds of things in,
like go to bed,
it's past your bedtime?
Something like that. Yeah, it just reminds you.
Right. So let's drill in a little on this topic, just because I think one of the things that we
think of when we think of dystopic manifestations of a technology that could convince us that it's
human, where does the psychological...
I think the entire research community is working together to set up some rules, to set up the
right expectations for our users. For example, one rule I think, I believe it's true, is
that you should never confuse users about whether they're talking to a bot or a real human.
Forget about Xiaoice for now and just talk about the other stuff you're working on.
Are there any sort of big issues in your mind that don't have to do with, you know, users being too long with a chatbot or whatever,
but kinds of unintended consequences that might occur from any of the other work?
Well, for example, let's go back to the deep learning model, right?
The deep learning model is very powerful at predicting things.
People use deep learning model for recommendation all the time.
But there's a very serious limitation of
these models: the model can learn correlation but not causation. For
example, if I want to hire software developer, then I got a lot of candidates.
I asked the system to give me a recommendation. The deep learning model
gave me a recommendation.
You know, this guy is good.
And then I ask the system, why?
Because the candidate is a male.
Then people will say, your system is wrong, it's biased.
But actually, the system is not wrong.
The way we use the system is wrong.
Because the system learns the strong correlation between
the gender and the job title, but there's no causality. The system does not have the
causality at all. A famous example is that there's a strong correlation between the rooster's
crow and the sunrise,
but the crow does not cause the sunrise at all.
But these are the problems of these deep learning models.
People need to be aware of the limitations of the models so that they do not misuse them.
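The rooster example can even be simulated in a few lines: the crow is a near-perfect predictor of the sunrise, yet intervening on the rooster would change nothing. The times and noise levels below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
days = 1000
sunrise = rng.normal(6.0, 0.2, days)              # sunrise around 6:00 a.m.
crow = sunrise - rng.normal(0.1, 0.02, days)      # rooster crows just before

# A purely predictive model sees an almost perfect correlation...
corr = np.corrcoef(crow, sunrise)[0, 1]

# ...but the data generation above makes the causal direction obvious:
# silencing the rooster (deleting `crow`) would leave `sunrise` untouched,
# a distinction a correlation-based model cannot make on its own.
```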
So one step further,
are there ways that you can move towards causality?
Yeah, there are a lot of ongoing works.
There's a recent book called The Book of Why.
The Book of Why.
Yeah, The Book of Why by Judea Pearl.
There are a lot of new models he's developing.
One of the popular models is called the Bayesian network. Of course,
the Bayesian network can be used in many applications, but he believes this at least
is a promising tool to implement the causal models. I'm getting a reading list from this podcast.
It's awesome. Well, we've talked about your professional path, Jianfeng.
Tell us a little bit about your personal history.
Where did you grow up?
Where did you get interested in computer science?
And how did you end up in AI research?
I was born in Shanghai.
I grew up in Shanghai.
And I studied design back in college.
So I was not a computer science student at all.
I learned to program only because I wanted to date a girl at that time.
So I needed money.
You learned to code so you could date a girl.
So you could get that part.
I love it.
Then, when I was graduating in 1999,
Microsoft Research had just founded a lab in China.
I sent them a resume and got a chance to interview.
And they accepted my application.
That's it.
After that, I started to work on AI.
Before that, I knew only a little about AI.
Okay, back up a little. What was your degree in design?
I got a bachelor's degree in design. Then I got a degree in electrical engineering,
double E.
Electrical engineering. Yeah, then computer science a little bit later
because I got interested in computer science after.
Finally, I got a computer science degree.
A PhD?
PhD, yeah.
Did you do that in Shanghai or Beijing?
Shanghai.
Shanghai.
So 1999, you came to Microsoft Research.
Yeah, in China.
Okay, and then you came over here or?
Then in 2005, I moved to Redmond and joined a product group.
My mission at that time was to build the first natural user interface for Microsoft Windows Vista.
And we couldn't make it.
And after one year,
I joined the Microsoft Research here.
I thought there was a lot more
fundamental work to do
before we can build a real system
for users.
Let's go upstream a little.
Yeah.
Okay.
Then I worked for eight years
in Microsoft Research
in an NLP group.
And now you're partner research manager
for the Deep Learning group.
Yeah, yeah.
What's one interesting thing
that people don't know about you?
Maybe it's a personal trait or a hobby or a side quest
that may have influenced your career as a researcher.
I remember when I interviewed at Microsoft Research,
during the interview,
I failed almost all the questions.
And finally, I said,
okay, it's hopeless.
I went home,
and the next day I got a phone call
saying, you're hired.
In retrospect,
I think I did not give the correct answers,
but I asked the right questions during the interview.
I think it's very important for
researchers to learn how to ask the right questions.
That's funny. How do you get a wrong answer in an interview?
Because I was asked all the questions about speech and natural language. I had
no idea at all. I remember at that time he asked me to figure out an algorithm called Viterbi.
I had never heard of it.
Then I actually asked a lot of questions.
And he answered part of it.
Then later on he said, I cannot answer more questions, because if I answer this question,
you will get the answer.
That shows that I asked the right questions.
Let's close with some thoughts on the potential ahead.
And here's your chance to talk to would-be researchers out there
who will take the AI baton and run with it for the next couple decades.
What advice or direction would you give to your future colleagues
or even your future successors?
I think first of all,
you need to be passionate about research.
It's critical to identify the problem
you really want to devote your lifetime to work on.
That's number one.
Number two,
after you identify this problem
you want to work on,
stay focused.
Number three,
keep your eyes open.
That's my advice.
Is that how you did yours?
I think so.
Jianfeng Gao, thank you for joining us today.
Thanks for having me.
To learn more about Dr. Jianfeng Gao
and how researchers are going deeper on deep learning,
visit Microsoft.com slash research.