Microsoft Research Podcast - 101 - Going meta: learning algorithms and the self-supervised machine with Dr. Philip Bachman
Episode Date: December 4, 2019
Deep learning methodologies like supervised learning have been very successful in training machines to make predictions about the world. But because they're so dependent upon large amounts of human-annotated data, they've been difficult to scale. Dr. Phil Bachman, a researcher at MSR Montreal, would like to change that, and he's working to train machines to collect, sort and label their own data, so people don't have to. Today, Dr. Bachman gives us an overview of the machine learning landscape and tells us why it's been so difficult to sort through noise and get to useful information. He also talks about his ongoing work on Deep InfoMax, a novel approach to self-supervised learning, and reveals what a conversation about ML classification problems has to do with Harrison Ford's face. https://www.microsoft.com/research
Transcript
Training a machine to look at a large amount of unannotated data and point to specific examples and say,
well, I think if a human comes in and tells me exactly what that thing is, I'll learn a lot about the problem that I'm trying to solve.
So this general notion of carefully selecting which of those examples you want to spend the money or spend the time to get a human to go in and provide the annotations for those examples. That's this idea of active learning.
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga.
Deep learning methodologies like supervised learning have been very successful in training
machines to make predictions about the world.
But because they're so dependent on large amounts of human-annotated data, they've been
difficult to scale.
Dr. Phil Bachman, a researcher at MSR Montreal, would like to change that, and he's working
to train machines to collect, sort, and label their own data so people don't have to.
Today, Dr. Bachman gives us an overview of the machine learning landscape
and tells us why it's been so difficult to sort through noise and get to useful information.
He also talks about his ongoing work on Deep InfoMax, a novel approach to self-supervised learning,
and reveals what a conversation about ML classification problems
has to do with Harrison Ford's face. That and much more on this episode of the Microsoft Research
Podcast. Phil Bachman, welcome to the podcast. Hi, thanks for having me.
So as a researcher at MSR Montreal, you've got a lot going on.
Let's start macro and then get micro.
And we'll start with a little phrase that I like in your bio that says you want to understand the ways in which actionable information can be distilled from raw data.
Unpack it for us. What big problem or
problems are you working on? What gets you up in the morning? So I'd say the key here is to sort
of understand the distinction between information in general and let's say information that might
be useful. So for example, if the images are coming from the camera that you're using to pilot a
self-driving car,
then low-level sensor noise probably doesn't provide you useful information for deciding whether to stop the car or whether to turn or make other sorts of decisions that are useful
for driving. So what I'm interested in, sort of this phrase actionable information here,
it's referring specifically to trying to focus on getting our
models to capture the information content in the data that we're looking at that is actually going
to be useful in the future for making some sorts of decisions. So if we're training a model that's
processing the video data that's being used to drive this car, then perhaps we don't want to
waste the effort of the model on trying
to represent this low-level information about small variations in pixel intensity.
And we'd rather have the model focus its capacity for representing information on the information
that corresponds to sort of higher-level structure in the image, so things like the presence
or absence of a pedestrian or another car in front of it.
So that's kind of what I mean with this phrase actionable information.
So this distillation from raw data is about learning from data that hasn't been manually curated or that doesn't have a lot of information injected into it by a human who's doing the
data collection process.
So going back to the self-driving car example, I'd like to have a system where we could allow the computer just to watch thousands of hours
of video that's captured from a bunch of cars driving around. Then what I want to be able to
do is have a system that's just watching all of that video and doesn't require that much input
from a person who's pointing to the video and saying specifically what's going to be interesting
or useful in the future. So this information that's going to be useful for
performing the types of tasks that we want our model to do eventually. Before we get specific,
give us a short historical tour of the deep learning methodologies as a level set,
and then tell us why we need a methodology for learning representations from unlabeled data.
Okay, so in the context of machine learning, people often break it down into three categories.
So there will be supervised learning, unsupervised learning, and reinforcement learning.
And it's not always clear what the distinctions between the methods are, but supervised learning is sort of what's had the most immediate success and what's driving a lot of the deep learning-powered technologies that are being used for doing things like speech recognition in phones or doing
automated question answering for chatbots and stuff like that. So supervised learning refers to
kind of a subset of the techniques that people apply when they have access to a large amount of
data and they have a specific type of action that they want a model to perform when it processes
that data.
And what they do is they get a person to go and label all the data and say, okay, well, this is the input to the model at this point in time.
And given this input, this is what the model should output.
So you're putting a lot of constraints on what the model is doing and constructing those constraints manually by having a person looking at a set of a million images.
And for each image, they say, oh, this is a cat, this is a dog, this is a person, this is a car. So after having done that for thousands of hours, you now have a large data set where you have a bunch of different images
and each of those images has an associated tag. And so now the kind of techniques that we work
with and the optimization methods that we use for training our models are very effective at fitting really large, powerful models
to large amounts of this sort of annotated data.
So that's kind of the traditional supervised learning.
But the major downside there is that the process of providing all of those annotations
can be very expensive.
So that process of supervised learning has a lot of issues with scalability. What we'd like to do ideally is make use of a lot of that unlabeled data and figure out what kinds of information are actionable. So finding the information that seems like it will
be useful for making decisions. So that's getting into a contrast between supervised learning and
unsupervised learning. Then there's also reinforcement learning, which is a slightly
different set of techniques where you actually allow a model to go out and kind of perform
experiments or try to do things. And then somehow it receives feedback about the things that it's
doing that says, oh, what you just did, that was a good thing or that was a bad thing. And then it
learns by kind of a process of trial and error. So that's a general idea of reinforcement learning.
Okay. We mentioned two flavors of this, unsupervised and then self-supervised. Is that another differentiation there? So the self-supervised learning, it's not a completely
different thing, but it's a sort of subset of those types of techniques. So the general idea
behind self-supervised learning is that we try
to design procedures that will generate little supervised learning problems for a model to solve,
where the process of generating those little supervised learning problems is kind of automatic.
And the hope here is that the kind of procedurally generated supervised learning problems that our
little algorithm is generating based on the unlabeled data will force the model to capture some useful information about the structure of that data
that will allow it to answer more sort of human-oriented questions more easily in the future.
So just to clarify this concept of procedurally generating supervised learning problems,
one really simple example would be that you could try to train a model to have some understanding of the statistical
structure of visual data by showing a model a bunch of images. But what you do is you take
each image and you split it into a left half and a right half. So now what you do is you take your
model and all the model is allowed to see is the left half of the image. And then you have another model that sort of tries to form a
representation of the right half of the image. And so the model that looked at the left half
of the image, you present it with representations of the right halves of like, let's say 10,000
images, one of which is the image that it looked at. So it's kind of got like a partner that it's looking for
in this big bag of encoded right halves of images.
And the job of the encoder that's processing the left half of the image
is to be able to look in that bag and pick out the right half
that actually corresponds to the image that it originally came from.
So in this case, we're taking something that looks like unsupervised learning,
but instead here what we're doing is treating it more like a supervised learning problem.
So the model that looks at the left half of the image,
its task is to solve something that looks like just a simple classification problem, in this case something like a 10,000-way classification problem.
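To make that split-image setup a bit more concrete, here is a minimal sketch in PyTorch of the kind of procedurally generated classification problem being described. The encoder architectures, image sizes, and batch size here are placeholder assumptions for illustration, not the exact models Bachman uses; the point is just that each batch of unlabeled images yields its own matching problem, where the left-half encoding has to pick out its true right-half partner from a bag of candidates.

```python
import torch
import torch.nn.functional as F

# Hypothetical encoders: any networks mapping image halves to vectors would do.
left_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 32, 128))
right_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 32, 128))

def split_halves(images):
    # images: (batch, 3, 64, 64) -> left and right halves, each (batch, 3, 64, 32)
    return images[..., :32], images[..., 32:]

def matching_loss(images):
    left, right = split_halves(images)
    z_left = left_encoder(left)        # (batch, 128)
    z_right = right_encoder(right)     # (batch, 128) -- the "bag" of candidate right halves
    # Score every left half against every right half in the batch.
    scores = z_left @ z_right.t()      # (batch, batch)
    # For row i, the correct "partner" is column i: a batch-way classification problem.
    targets = torch.arange(images.size(0))
    return F.cross_entropy(scores, targets)

# Usage sketch: each batch of unlabeled images generates its own supervised problem.
images = torch.randn(256, 3, 64, 64)   # stand-in for real unlabeled images
loss = matching_loss(images)
loss.backward()
```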
The other thing that comes to my mind is there's this weird thing on the internet where,
like Harrison Ford, you see half of his face and the other half of his face, and they're
completely different. Like, if you put two of the same half together, it wouldn't look like Harrison Ford, but together, the different halves look like him. So, that would really trick the machine, I would think. Actually, I wouldn't be so confident about that.
I would think. Actually, I wouldn't be so confident about that.
Really? Yeah. The question that you're sort of training the machine to answer is,
which of these possible things do you think is most likely associated with the thing that you're
currently looking at? So, unless there was somebody else's right face half that looked
significantly more Harrison Ford-ish than his own right face half,
then the model actually could do pretty reasonably, I'd expect.
That's hilarious.
So unless you had somebody where it was like this really strict dichotomous separation of
the halves of their face, like Two-Face from Batman or something.
Right. That's another one.
In which case maybe the model would fail. But if it's within the standard realm of human
variability, I think it would be okay.
Well, that's good. So let's move ahead to the algorithms that we're talking about here. And
you call them learning algorithms. And you've described your goal for learning algorithms in
some intriguing ways. You want to train machines to go out and fetch data for themselves and
actively find out about the world. And you want to get the machine
to ask itself interesting questions. So it begins to build up its own knowledge base.
Tell us about these learning algorithms for active learning and what it takes
to turn a machine into an information seeking missile.
Yeah. So this kind of overall objective there that you've described
is targeted at kind of expanding the scope
of which parts of the problems that we're currently trying to solve are solved by the
machine rather than by a person who is acting as a shepherd for the machine or as a teacher
or something along those lines. So right now, the machine learning component of most systems is
a very important part of the system, but there's a whole lot of human effort that surrounds the production and use of something
like a practical image classifier or a practical machine translation system. So that's one part of
the effort that's required for getting an automated system out there in the world. So part of the
process is just the initial decision, like the thing that we want to do is machine translation. Here's a way of formalizing that problem and specifying it such that we can
go out and now perform another part of the process. So this other part of the process
is data collection. So you'd have to go out and you'd have to explicitly collect a lot of data
that is relevant to the task that you're trying to solve. And then you have to take that data and you maybe have to have somebody curate it
to make that data more directly useful
or more immediately useful for the kinds of algorithms
that we tend to use right now.
So a lot of the work that I want to do
is about trying to reduce the amount of human effort
that's required on those two fronts
and trying to get as much of those two parts of the problem
automated and built into the models that we're training
so that we don't have to go out and manually annotate all the data.
Talk to me about the technical end of that. Our listeners are pretty sophisticated
and you're talking about algorithms that are training a machine to do something for itself.
Go a little deeper there. Okay, yeah. I'll kind of jump into the learning algorithms for active learning part, which I guess
I actually completely skipped over as I was answering the question before. So, training a
machine to go out and collect its own data and point to specific examples and say, well, I think
if a human comes in and tells me exactly what that thing is, I'll learn a lot about the problem that I'm trying to solve.
So this general notion of carefully selecting which of those examples you want to spend the money
or spend the time to get a human to go in and provide the annotations for those examples,
that's this idea of active learning.
So rather than just assuming that you have a huge batch of data and all of the data is labeled, a lot of practical problems are structured more like you have a lot
of unlabeled data and you have to decide how to collect data and apply labels to it so that you
can then train a model. So to do this efficiently, you take some of the data, you train a model,
and then you look at what the model is doing and you try to figure out where it's weak and where
it's strong. And based on where it's weak and where it's strong, you use that to try and decide how to go out and pick other examples specifically so that you can minimize the amount of data that you have to collect and provide annotations to such that you end up with a model that makes good predictions at the end.
So that's active learning. A lot of existing approaches to it revolve around assumptions about what kind of classifier
or what kind of decision function you're going to train on that data that you're collecting the
labels for. So there might be assumptions that all of the data already has some sort of fixed
representation, and then you're going to feed that representation into a linear classifier,
for example. And if you make that kind of assumption, then there might be very good
heuristics for going out and deciding which particular sets of examples you want to apply labels to, so that you can minimize the uncertainty and minimize the number of errors that's made by this linear classifier.
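As a rough illustration of that select-label-retrain loop, here is a minimal uncertainty-sampling sketch, one common active learning heuristic rather than any specific method discussed here. The toy dataset, the logistic regression model, the label_by_human stand-in, and the query budget are all hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder pools: a few labeled seed examples plus a large unlabeled pool.
rng = np.random.default_rng(0)
X_labeled, y_labeled = rng.normal(size=(20, 5)), rng.integers(0, 2, size=20)
X_unlabeled = rng.normal(size=(1000, 5))

def label_by_human(x):
    # Stand-in for the expensive step: asking a person to annotate one example.
    return int(x.sum() > 0)

for _ in range(10):                           # query budget: 10 human annotations
    model = LogisticRegression().fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_unlabeled)  # model's confidence on the unlabeled pool
    # Pick the example the model is least sure about (closest to 50/50).
    uncertainty = 1.0 - probs.max(axis=1)
    idx = int(uncertainty.argmax())
    x_new = X_unlabeled[idx]
    y_new = label_by_human(x_new)             # the only human effort in the loop
    X_labeled = np.vstack([X_labeled, x_new])
    y_labeled = np.append(y_labeled, y_new)
    X_unlabeled = np.delete(X_unlabeled, idx, axis=0)
```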
But for working with more complicated data, or in scenarios where you also want to learn a powerful representation of the data at the same time that you're collecting the data and applying labels, you might want to sort of transform this process, where you decide on what the model is going to be and then you sit down for weeks or years and come up with a very clever heuristic for how to collect data efficiently to make that model succeed when it has a small amount of labeled data. And you'd like to replace some of those more effort-intensive parts of the
process with a machine that can kind of train itself to learn what kinds of data
it's going to need at the same time that you're also training the model that's
making the prediction. Let's spend some time talking about your current research, and there's a lot of flavors to it.
Let's start with what you're calling Deep InfoMax, or DIM. But I want to point out, too,
that in addition to Deep InfoMax, you have augmented multi-scale Deep InfoMax or AMDIM,
spatio-temporal Deep InfoMax, Deep Graph InfoMax. There's a lot of sort of offshoots, I guess you
might call it. So I'm going to go sort of free range here, because you'll be able to give us a better guided tour of the main idea and all the offshoots than I will. Tell us about the Deep InfoMax research family
and what you're up to. Okay. So the kind of higher level idea that ties these things together
is the idea that we want to learn to represent the data that we're looking at. So sometimes that data
might be text, sometimes it might be images, or in the case, for example, of the Deep Graph InfoMax, it might be a graph.
So the overall higher level idea of Deep InfoMax is that we want to form representations that
act a little bit like an associative memory.
Kind of going back to what I was saying about the thing with the split faces before, we
can think of the left half of a face and the right half of
a face sort of as random variables. So you can think of just sampling the left half of a face
and there might be slightly different versions of the right half of that face that are all sort of
valid. So looking at the left half, I guess as you're getting at with the Harrison Ford thing,
the right half isn't always perfectly determined.
But you can think of the distribution
of all possible right half faces,
and the variability there is much broader
than the variability that you have
if you're just looking at
what is the right half of Harrison Ford's face
given that we're looking at the left half.
So the mutual information between our representation of the left half of the face and the right half of the face is high, when our ability
to predict what the right half of the face looks like is very good relative to how well we could
predict what the right half of the face looks like in the case where we didn't get to see the left
half. If we were just looking at a bunch of different images that had the same shape as the images of the right halves of a face,
these images have a lot of variability in their structure.
Like some of them, it might be the back half or the front half of a car
or something like that, and it looks very different from faces.
So in principle, we can sort of make a reasonable prediction,
for example, of whether or not the image that we're looking at right now
encodes the right half of a face, but there's still some uncertainty there.
And then when we add in the left half of Harrison Ford's face, and we're trying to say, okay,
well, out of the distribution of things that look like the right halves of faces, which
ones correspond to Harrison Ford, the more precisely we can make that guess, the higher
the mutual information is between our representation of the left half and the right half of the
face.
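To put that intuition in symbols, using the standard definition of mutual information (not anything specific to Deep InfoMax), with L and R standing for the representations of the left and right halves of a face:

```latex
I(L; R) \;=\; H(R) - H(R \mid L) \;=\; \mathbb{E}_{p(l,\, r)}\!\left[\log \frac{p(l, r)}{p(l)\, p(r)}\right]
```

The mutual information is high exactly when seeing the left half sharply reduces our uncertainty about what the right half looks like, relative to guessing from the broad distribution over all plausible right halves.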
Well, let me ask you to go a little deeper on the technical side of this.
You sent me a slide that has a lot of algorithmic explanation of Deep InfoMax, and then how you kind of take that further with augmented multi-scale Deep InfoMax?
So the actual mutual information aspect sort of formally, the way it shows up here is that we sample this kind of true pair of corresponding image and audio sample.
And then we have a distribution from which we can sample just another random audio sample.
And we can sample maybe, say, 1,000 of those other random audio samples,
and we can encode them with our audio encoder.
And then we can sort of present a little classification problem
to the model that looked at the image,
where that classification problem is telling the model
that looked at the image to identify which among,
let's say, 1,001 audio recordings
is the audio recording that comes
from that same point in time. So the mutual information here, what we're doing kind of
more technically is we're constructing a lower bound on the mutual information between the
random variables corresponding to the representation of the image and the representation of the audio
modality. So we first draw a sample from the joint distribution
of the representations of those two modalities.
And then we also have to sample a lot of samples
from what's called the marginal distribution
of that second random variable,
which is the representations of the audio modality.
So we draw, say, a thousand samples
from that marginal distribution.
And then we construct this little classification problem
where the model is trying to identify
which of the audio samples
was the sample from the true joint distribution
over audio and visual data,
and which of the samples just came from
random samples from the marginal distribution.
So this is a technique called noise contrastive estimation
that's been developed and applied
in a lot of different scenarios.
So a good example of this is techniques that have been used for training word vectors.
But in the case where we're using it, it's a technique that can be used for constructing kind of a formally correct lower bound on the mutual information between these two random variables, one of which corresponds to samples of visual data and one of which corresponds to samples of audio data.
And the joint distribution over those two kind of random variables is constructed by
just going around the world with a camera and a microphone and just taking little snippets
of visual and audio information from different points in time and in different scenes.
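Here is a minimal sketch of that noise contrastive construction in PyTorch, with one sample from the joint distribution of co-recorded image and audio plus 1,000 samples from the audio marginal. The encoders and feature sizes are placeholder assumptions; the comment notes the standard InfoNCE-style relationship between this classification loss and the mutual information lower bound.

```python
import torch
import torch.nn.functional as F

# Hypothetical encoders; architectures and feature sizes are placeholders.
image_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
audio_encoder = torch.nn.Linear(160, 128)   # assumes 160 audio features per clip

def nce_bound_loss(image, true_audio, marginal_audio):
    # image, true_audio: one sample from the joint distribution (same time and scene).
    # marginal_audio: (K, 160) audio clips drawn independently, i.e., from the marginal.
    z_img = image_encoder(image.unsqueeze(0))                          # (1, 128)
    candidates = torch.cat([true_audio.unsqueeze(0), marginal_audio])  # (K + 1, 160)
    z_aud = audio_encoder(candidates)                                  # (K + 1, 128)
    scores = (z_img @ z_aud.t()).squeeze(0)                            # (K + 1,) compatibility scores
    # A (K + 1)-way classification problem: index 0 is the clip from the joint distribution.
    loss = F.cross_entropy(scores.unsqueeze(0), torch.tensor([0]))
    # Noise contrastive estimation gives I(image repr; audio repr) >= log(K + 1) - loss,
    # so driving this classification loss down pushes up a lower bound on the mutual information.
    return loss

# Usage sketch with stand-in data: one true pair plus 1,000 marginal audio samples.
image = torch.randn(3, 32, 32)
true_audio = torch.randn(160)
marginal_audio = torch.randn(1000, 160)
nce_bound_loss(image, true_audio, marginal_audio).backward()
```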
All right.
Well, as you describe Deep InfoMax, and then you have augmented multi-scale Deep InfoMax, you call that improving Deep InfoMax based on some limitations in the prior. How would you differentiate how the augmented multi-scale Deep InfoMax is better than the original? So the original idea, depending on specifically how you implement it, has some significant downsides in some sense.
The original Deep InfoMax was just looking at a single version of a single image.
And in this case, there's sort of an issue where if you're just looking at this single image
and let's say encoding all of the little patches in the image,
the way that the original Deep InfoMax presentation kind of goes is that you
take that image, you encode each of the patches, and you also encode the whole image. And so here
we're going to sort of train the representation of the whole image such that it can look at all
of these patches and say that, oh, those are patches that came from my image. So this is a
little bit like that idea of associative memory, but it's applied on sort of a single input.
So kind of procedurally how you would do this
is that you would take an image, you would encode it,
you get representations of all the little patches,
and you get a representation of the whole image.
And now you're gonna construct
a little classification problem
where you take a thousand other images
and you also encode their patches.
And you sort of mix them into a bucket
with all the patches from the original image
that you computed a full image encoding for.
And the job of the full image encoding
is to look in that bucket
and essentially pick out all the patches
that are part of its image.
So one of the difficulties here,
like one of the shortcomings
of that particular way of formulating it,
if you take that more restrictive interpretation,
the main downside is that the encoder that's processing the full image can basically just memorize the content that's there. And it's fairly
easy for the model to just copy that information into the representation of the whole image.
And essentially, it's just a memory that stores the representations of all the little patches.
There might be some areas in which this is useful, but for some types of predictive tasks, it might not be so useful, because you're not really asking the representation of the whole image to answer sort of interesting predictive problems about what kinds of other things you might see that weren't explicitly in the image that you're looking at now. So going back to the left halves and right halves of faces, if instead of looking at the left half and the right half, all you did was you showed your encoder
this left half of the face, you encoded it to a small vector, and then you showed it the same
left half again. And you said, is this the one that you looked at before? The model might not
actually have to learn that much to be able to solve that task really well. But if you take it
and you change it to a task where the kinds of predictions that you're forcing that representation to make are a little bit more interesting, you can ask a more
interesting question, which is like, did this eye come from the right half of the face whose left
half you looked at? So here now the model is answering kind of a more challenging question.
This is one of the main changes that we make when we go from the original formulation of Deep InfoMax to this augmented Deep InfoMax. So this is the augmented part, not the multi-scale part. That's another thing, where we're also looking at multiple scales of representation. But if we just look at the
augmented part, kind of the big improvement there is that we're forcing the model to answer questions
or form an associative memory where the associations that we're forcing it to make
are more challenging to make so that the model has to put a little more effort into how it's going to represent the data.
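As a rough sketch of that augmented change (the multi-scale part is omitted), the key difference is that the global summary comes from one randomly augmented view of an image while the local patch features come from a second, independently augmented view, so the model can no longer solve the task by simply copying pixels into memory. The toy augmentation, the single convolutional encoder, and the choice of one positive patch per image are all simplifying assumptions for illustration, not the actual AMDIM architecture.

```python
import torch
import torch.nn.functional as F

def augment(images):
    # Toy augmentation: a random horizontal flip plus slight intensity jitter, so the
    # two views of each image are never identical (real AMDIM uses heavier augmentation).
    if torch.rand(1) < 0.5:
        images = torch.flip(images, dims=[3])
    return images + 0.05 * torch.randn_like(images)

# Hypothetical conv encoder producing a grid of local features plus a global summary vector.
conv = torch.nn.Conv2d(3, 128, kernel_size=8, stride=8)        # (N, 128, 8, 8) local features

def encode(images):
    local = conv(images)                                        # (N, 128, 8, 8)
    global_vec = local.mean(dim=(2, 3))                         # (N, 128) global summary
    return local.flatten(2), global_vec                         # locals as (N, 128, 64)

def amdim_style_loss(images):
    view_a, view_b = augment(images), augment(images)           # two different views
    _, global_a = encode(view_a)                                 # global code from view A
    local_b, _ = encode(view_b)                                  # local patch codes from view B
    n, d, p = local_b.shape
    # Score the global code of each A-view against every patch of every B-view.
    scores = torch.einsum('nd,mdp->nmp', global_a, local_b).reshape(n, n * p)
    # Positive target: a patch belonging to the matching image; for simplicity we use
    # just patch 0 of the corresponding B-view as the correct answer.
    targets = torch.arange(n) * p
    return F.cross_entropy(scores, targets)

images = torch.randn(32, 3, 64, 64)    # stand-in for a batch of unlabeled images
amdim_style_loss(images).backward()
```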
I like to explore consequences, Phil, both intended and otherwise, that new technologies inevitably have on society.
And this is the part of the podcast where I always ask, what can possibly go wrong?
So you're working in a lab that has a stated aim of teaching machines to read, think, and communicate like humans.
Is there anything about that that keeps you up at night?
And if so, what is it?
And more importantly, what are you doing to address it?
So we do have a group here that's working on what we call
fairness, accountability, transparency, and ethics.
So it's the FATE group.
So they're working on a lot of questions that are, let's say,
immediately relevant as opposed to questions that are kind of long-term relevant or irrelevant, depending on your perspective.
So there's this idea of existential risk, which is more of a long-term question.
So this is the kind of question like, well, if we develop a superhuman AI, is it going to care about us and take care of us or is it going to consume us in its quest for more resources?
So we'll set that aside, and so the more immediately salient one is the kinds of things that the FATE group is looking at. And so these are things like, well, if we're training a system that's going to sit at a bank and analyze people's credit history, are there historical trends in the
data that might be due to systemic discrimination
or systemic disadvantaging of particular groups of people that are going to be reflected in the
data that we use to train our system such that then when the system goes to make decisions,
it's kind of implicitly or accidentally discriminating against these groups just due
to the fact that they were also historically discriminated against, and that's reflected in the data that we're using to train the system. So for me personally, a great thing that I could do would be to create something that's like the internal combustion
engine of machine learning or even like the steam engine. Those things have had an incredible effect
on society, and that's been very empowering and it's helped with a lot of progress, but it also makes it easier for people to do bad things at scale. So I'm kind of more worried about that type of problem, and I think that that type of problem isn't necessarily a technological problem. It's a little bit more of a system or social problem, because I think the technology is going to happen. And so kind of the things that
worry me there are along the lines of like seeing the technology and the way in which it increases
people's leverage over the world and ability to affect it kind of at scale. I guess for me on a
day-to-day basis, like I don't think about it too much as I'm doing research because to me, again,
it's not really so much of a technical problem. I think it would be very hard to design the technology so that it can't do
bad things. Well, listen, I happen to know you didn't start out in Montreal. So tell us a little
bit about yourself. What got a young Phil Bachman interested in computer science and how did he land
at Microsoft Research in Montreal? I kind of always grew up with a computer in the home.
I was fortunate in that sense
that I was always around computers
and I could use them for playing games
and I could do a little bit of programming.
And I'm not old,
but I'm not in the youngest demographic
that you would see working in tech.
And one of the things that I really liked
when I was in high school,
I started playing a lot of these first-person games where you kind of run around and you shoot things.
For better or worse, it was fun.
So one of the things that was a challenge at first for me was I didn't have great internet.
So what I would do is go to the school library and look around.
And it turned out that you could download some bots that people had made.
So you could sort of fake the multiplayer kind of experience.
So I thought that was really cool, and one of the things I had, you know, started thinking about there was, okay, well, you know, what is it that these bots are actually doing? So I'd been doing a little bit of coding and, like, making some little simple games. So thinking about that, like, how would we automate
this little thing that kind of is fairly simple at its core, but that when we let it loose in this
environment, so like when we let it run around and compete with the other players, it does something
interesting and fun. And so that was sort of always at the back of my mind a bit, I guess.
And I bounced around a little bit academically and started doing research in a slightly different
field. But then eventually I kind of sat around and watched a bunch of online
lectures. And there were a couple areas of machine learning, like reinforcement learning, for example,
that really started to click with me and that I was excited about because it was getting back to
those kinds of questions I'd asked myself about before, like, how do we get this little bot to
do interesting things? So that brought me from Texas, because I was in grad school in Texas after having done my undergraduate studies in New York. But then I found this group that was in Montreal doing reinforcement learning. So I came and I worked with that group, and then instead of the jobs that were available elsewhere, an exciting opportunity popped up here. There was a startup called Maluuba that was based out of Waterloo, and it was developing kind of technology and software for doing virtual personal assistants, and the company wanted to sort of start getting more aggressive about pushing their technology forward. So they came to Montreal, because there was a lot of cool machine learning stuff happening in Montreal, and then opened a research lab. And basically, as those
lab doors were opening, I walked in and joined the company. And then about a year later,
we were actually acquired by Microsoft. So that's how I ended up in MSR.
Well, at the risk of heading into uncomfortable icebreaker question territory, Phil, tell us one interesting thing about yourself that people might not know and how has it influenced your career as a researcher?
And even if it didn't, tell us something interesting about yourself anyway.
Personally, I'd say one thing that I've always enjoyed is being fairly involved in at least one type of, let's say, goal-oriented physical activity. That's a
super weird sounding description. But for example, as an undergrad, I did a lot of rock climbing.
So having that as a thing where I could just really be focused and apply myself to solving
problems in some sense. A lot of climbing is about kind of planning out what you're going to do. And
it's a little bit like solving a puzzle sometimes. And having that as a thing that's sort of separate
from the work I do, but that still is kind of mentally and also physically active and being
able to kind of apply myself to that strongly. I don't do the rock climbing specifically anymore,
but what I do now is I play a lot of soccer. So I really enjoy the combination of the physical
aspect as well as the mental aspect of the game.
So there's a lot of extemporaneous kind of inventive thinking, and it can be very satisfying
when you kind of do something that's exactly right at exactly the right time, especially when you
realize later that you didn't really even think about it. It just sort of happened. I guess that
might be related to some of the better moments as a researcher that you have when you're trying to solve a problem
and you're just kind of messing around and then something just sort of clicks
and you just kind of see how you should do it.
At the end of every podcast, I give my guests the proverbial last word.
So tell our listeners, from your perspective, what are the big challenges out there right now, and research directions that might address them, when we're talking about machine learning research,
and what's hype and what's hope and what's the future? I guess one that I would say is filtering
through all the different things that people are writing and saying and trying to figure out which
parts of what they're saying seem new, but they're really just kind of a rewording of some concept that
you're familiar with. And you just kind of have to rephrase it a little bit and then see how it
fits into your existing internal framework and being able to use that ability to figure out
what's new and what's different and figure out how it differs from what people were trying before. And that allows you to
be kind of more precise in your guesses about what is actually important. But a lot of that
sort of washes out in the end and it doesn't really survive that long. Sort of at the beginning
as a researcher, you have to rely on other people because you don't really know where you're going
yet. But over time, taking those training wheels off a little bit
and developing your own personal internal framework
for how you think about problems
so that when you get new information,
you can kind of quickly contextualize it
and figure out which are the new bits
that are actually going to change the way that you look at things
and which bits are sort of just a different version
of something that you already have.
Phil Bachman, thanks for joining us from Montreal today.
Yeah, thanks for having me.
To learn more about Dr. Phil Bachman and the latest research in machine learning,
visit Microsoft.com slash research.