Microsoft Research Podcast - 080 - All Data AI with Dr. Andrew Fitzgibbon
Episode Date: June 12, 2019
You may not know who Dr. Andrew Fitzgibbon is, but if you’ve watched a TV show or movie in the last two decades, you’ve probably seen some of his work. An expert in 3D computer vision and graphics, and head of the new All Data AI group at Microsoft Research Cambridge, Dr. Fitzgibbon was instrumental in the development of Boujou, an Emmy Award-winning 3D camera tracker that lets filmmakers place virtual props, like the floating candles in Hogwarts School for Witchcraft and Wizardry, into live-action footage. But that was just his warm-up act. On today’s podcast, Dr. Fitzgibbon tells us what he’s been working on since the Emmys in 2002, including body- and hand-tracking for powerhouse Microsoft technologies like Kinect for Xbox 360 and HoloLens, explains how research on dolphins helped build mathematical models for the human hand, and reminds us, once again, that the “secret sauce” to most innovation is often just good, old-fashioned hard work.
Transcript
I do believe that there will be a future where we find it weird that we used to
carry a flat screen in our pocket and pull out this flat screen to look at it
in order to do our digital work. I think that sometime soon when I refit my
office instead of setting up a bunch of LCD panels I'll just put a large black
curved piece of plywood in front of me,
and I'll wear a HoloLens and all my documents will appear in the real world in front of me.
I absolutely believe that the screen in the pocket or the screen attached to the desk
is going to be as weird as the phone attached to the building.
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen
Huizinga. You may not know who Dr. Andrew Fitzgibbon is, but if you've watched a TV show or movie in
the last two decades, you've probably seen some of his work.
An expert in 3D computer vision and graphics, and head of the new All Data AI group at Microsoft Research Cambridge,
Dr. Fitzgibbon was instrumental in the development of Boujou, an Emmy Award-winning 3D camera tracker
that lets filmmakers place virtual props, like the floating candles in Hogwarts School for Witchcraft and Wizardry,
into live-action
footage. But that was just his warm-up act. On today's podcast, Dr. Fitzgibbon tells us what
he's been working on since the Emmys in 2002, including body and hand tracking for powerhouse
Microsoft technologies like Kinect for Xbox 360 and HoloLens, explains how research on dolphins
helped build mathematical models for the human hand,
and reminds us once again that the secret sauce to most innovation
is often just good old-fashioned hard work.
That and much more on this episode of the Microsoft Research Podcast.
Andrew Fitzgibbon, welcome to the podcast.
Hi, Gretchen. Great to be here.
So I usually start my podcasts by way of introduction, but I want to go a little off script with you because you're funny.
You've said that you situate your research at the intersection of computer vision and computer graphics with excursions into neuroscience. And at a more subatomic level, you characterize your work as, at its core,
extracting information about the world from photons. So give us a little more context about these excursions and extractions. How in general would you describe what you do, Andrew? What gets you up in the morning?
So what I want to do is make computers help us change the world. And we can change the world
in a bunch of ways that are useful to humans. I like to think that I work on the technologies
that underpin the ways we can make computers make the world better for us. A lot of what we do in what we now call AI,
the combination of machine learning, computer vision, natural language processing,
it's all about taking information from the world,
and that information comes through sensors somehow,
whether the sensors are the pixels in a camera,
or the sensors in a microphone,
or even the sensors in your
keyboard when you type a tweet. These are all sources of information from the real world that
we would like to do something good with. And for me, something good might be improving the computer
graphics in Harry Potter. And I consider that good because it makes lots of humans happy.
And in some sense, everything we do in life is about making humans happy.
Well, tell me a little more about the excursions into neuroscience when you're talking about
computer vision and computer graphics.
It might not seem a natural leap, but I think it kind of is.
Can you unpack that a little?
Yes, I started collaborating when I was in Oxford with some of the neuroscientists who
worked there. And we were interested in a fairly simple question, which is, roughly, can humans
point at things? Now, you think, obviously, humans can point at things. But we don't have a good
theory for how the brain integrates 3D information. So I was interested in purely providing the neuroscientists with a
sort of a mathematical backup. I was saying, no one thinks the brain does it by multiplying
matrices and vectors. But if it did, you would see these sort of error patterns. And then we
went to the real world and looked at people wearing virtual reality headsets and the kind
of pointing mistakes they make. And those are different from the patterns that a computer would make using today's technology.
So we still don't know what the brain does, but we have more evidence that it's not the same as what the computer does.
All right, so let's talk a little more about extracting information about the world from photons.
That's a very granular way of describing the computer vision work that you do. Could you
explain that technically a little better?
So, regular listeners to your podcast will know that
computer vision is one of those problems that's sort of easy to state, you know, does this picture
contain a cat or not, but has turned out to be incredibly hard. And even in this age of deep learning
successes with computer vision, we still know that it's incredibly hard. So my abstraction of
gather information from photons is just to kind of stand back and say, why are we so excited about
computer vision? Why are we so excited about these capabilities? And one of the things that computer vision does is
it allows us to acquire information from far away. And of course, one can hear information from far
away, but it's a capability that allows us to do things like recognize people far away or to drive
a car. And I want to think about it at that level of abstraction because I want to always understand that my end goal is to do something real. Think of deep neural networks: I absolutely love trying to
understand deep neural networks, and I do so at a theoretical level, but also I want to know what
the practical consequence of that understanding will be.
Well, now that we've situated your
research, let's situate you. Until this week, I would have introduced you as a partner scientist
in HoloLens, but you've recently been tapped to lead a group at MSR Cambridge called All Data AI,
or ADA, which turns out to be an acronym that references Ada Lovelace. Was that intentional
at all, and why, if it was?
So, yes, it was intentional in the sense that Ada Lovelace is
sometimes considered the world's first computer programmer. Whether or not she programmed computers, she certainly was the first person to observe that the computational powers of the analytical engine or the difference engine could be applied to quantities that were not numbers. Now, of course, since the dawn of computers, we've represented quantities
in the computer, like strings of characters, like words. They're all numbers. So what's new
about saying that it operates on something beyond numbers? Well, the thing that's new is not so much
that the computer understands the sequence of numbers, but that it can understand the
interconnections between them. And that is represented fundamentally by a computer science concept called a graph.
And a graph is something that we can use to represent a wide variety of computer science concepts.
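As an editorial illustration of the graph idea described above (not code from the episode), here is a minimal Python sketch. It assumes a graph stored as a dictionary mapping each node to the nodes it links to; the node names and the `reachable` helper are hypothetical and chosen only for the example.

```python
from collections import deque

# Hypothetical example data: each key is a node, each value lists the nodes it links to.
graph = {
    "chalet": ["slope", "village"],
    "slope": ["ski lift"],
    "village": ["ski lift"],
    "ski lift": [],
}


def reachable(graph, start):
    """Return the set of nodes reachable from `start` by following edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


connected = reachable(graph, "chalet")
```

The point of the sketch is that the values stored are still just strings, but the interconnections between them are what the program reasons about.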
Okay. So why now with this group at MSR Cambridge?
Has there been a confluence of things that exist now that didn't before?
I think, of course, we have all observed the advances in AI due to the advances in machine
learning, due to the advances in deep learning. And I think now there's a golden opportunity to
apply that to a broader range of areas. It's also a fantastic
opportunity here and now for us to think about a new generation of AI programming. Until today,
AI programming has been the domain of high priests and priestesses who have PhDs in machine learning
and who understand linear algebra. And yet, we believe that actually a lot of AI programming could be a lot simpler.
So we're looking at how we might think about a third generation of AI programming.
The first generation would have been the raw work of Hinton and LeCun,
who just wrote the code.
You know, it was difficult code to write.
The second generation is this set of tools which go by the name of TensorFlow and PyTorch,
again, which lots of listeners will have used or heard of.
And these tools have been, again, fantastic for democratizing AI,
but they make it hard for somebody who's just a great computer programmer to understand.
In my view, they somewhat hide the beauty of the AI models that are underneath.
Neural network models are actually relatively simple. They're simple models with complex consequences. And the research world is increasing its understanding of these complex consequences.
But it's also nice to just see the code unadorned and see how simple these models are.
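To make the "simple models" remark concrete, here is an editorial sketch (not code from the episode or from any ADA project) of a two-layer neural network forward pass in plain Python, with no framework. The function names and the hand-picked weights are assumptions made purely for illustration.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise nonlinearity: negative values become zero."""
    return [max(0.0, v) for v in values]


def forward(x, params):
    """Two layers: dense -> ReLU -> dense."""
    hidden = relu(dense(x, params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])


# Hypothetical hand-picked weights, used only to show the arithmetic.
params = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, 2.0]],
    "b2": [0.1],
}
output = forward([2.0, 1.0], params)
```

The whole model is a few weighted sums and a threshold; the complexity discussed above comes from training such models at scale, which this sketch does not attempt.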
Well, let's go back a ways before we talk about more current research, because you have some what I would call greatest hits in your earlier work.
One is called Boujou, a camera tracker that actually won an Emmy, and it's been used in computer graphics and live action footage.
And what I've heard is pretty much every movie made since it was released in 2002.
Tell us more about Boujou. How and where did it come about?
What does it do and how does it work technically?
Boujou was a great fun project.
I moved from the University of Edinburgh,
where I did my PhD,
to Oxford University in the mid-90s.
And I was working there with some amazing people
who were interested in the question
of how a robot
might navigate its way around some building or around some environment.
So we worked hard on the how does a robot navigate problem,
and we discovered that one of the things the robot has to do in order to know where it's been in three dimensions
is to build a three-dimensional model of the world.
And we worked hard on making a beautiful 3D model because we figured this would be useful maybe for those industries where even today it's common, for example, to sculpt a car out of clay before
building the computer model. Certainly in those days, if you were going to have an alien in the
movie, I think Pitch Black was one of the first aliens we looked at, they would make the alien out of clay and then try to scan it into
the computer. So we thought we had a great product. We were going to use our robot navigator,
spin the camera around the alien model, pull that into the computer, and then give people
a computer model of the 3D object. It turned out nobody in the movie industry was interested.
Dang it.
You had to, yeah, yeah.
You had to spray paint the model with a toothbrush to get texture on it.
It didn't work in little corners.
It was terrible.
And they could just do better by using, you know,
existing artists to create the models.
But somebody at one of the effects companies
kind of thought through
how we must be getting this model and said, how do you know where the cameras are? We said, you know,
well, it works it out. That's part of how we get the 3D model. We work out how the cameras are.
So it turns out something you really need in movies that's really hard to do is to figure
out where the cameras are. Now, what does that mean? That means maybe I have a camera mounted in a boat,
and the boat contains some hobbits and is sailing down a river towards, you know, some impressive
mountain. It would be just great if the impressive mountain had two huge statues pointing at the
hobbits. But it turns out that nobody built those 400-meter statues. So what we have to do is build some statues back in the studio,
and we built some 40-centimeter statues back in the studio.
And if only we knew what motion the camera had undergone
while it rocked in that boat moving down the river,
we'd be able to make a robot do the same motion,
and then we could superimpose the images.
So in those days, the only way to find out where the camera was,
was you could imagine
trying to use markers or trying to use GPS. None of these things work. So you would have to just
sort of manually position the camera for every single shot. It was incredibly expensive.
So somebody who had worked on that kind of footage realised that our algorithms were
producing this as a kind of unwanted side effect. We flipped, I guess nowadays we would say we pivoted.
There was a startup and Boujou was launched.
And I'm super proud of it because it was one of the first products
which did kind of 3D computer vision.
There were products available then that would do character recognition,
that would do number plate recognition, but this really did 3D vision.
And we had to go beyond the academic state of the art to deliver it. And we learned about
delivering computer vision to the real world. Some of what we learned was you just have to
type in lots of code. People would ask me, you know, what's the secret sauce in Boujou? And I'd
say, well, you know, have you read all the papers on 3D structure from motion? They would say, yeah, I'm pretty sure of them.
I would say, yeah, that's the secret sauce.
We implemented all of them.
And, you know, it's an attitude that helps us today.
Sometimes in research, you're assessed by how beautiful and clean your algorithm is.
Sometimes in the real world, you have to implement all the dirty algorithms until you find the beautiful one.
But with Boujou, it just worked and we had something that was actually useful.
You know, on that note, I was going to ask you a question further down in the podcast,
but I'm going to bump it up because it's sort of tied in. I imagine you said it about this.
You once said, if I had to nominate one key to success, it's a focus on everything. Is
that what you're talking about?
That's exactly what I'm talking about. Yeah, exactly. There is no silver bullet. You know,
you really have to focus on building a real thing. And that just means something that other humans would be happy to use. So remember, in Boujou, we didn't focus on the right thing,
but the real thing we wanted to build was this 3D modeler. And we knew we would
make a sort of a, what's called a Wizard of Oz demo. We'd say, if this thing worked, it would
make you a model that looks like this. Would you like it? And then when the humans agree that that's
what they would like, then you have a target. And then of course you may have to pivot or you may
deliver something that's only half as good and discover that, hey, that's still useful. Or you may achieve everything you thought you needed to achieve and discover that
actually it needs to be twice as good. But I really like having a concrete goal to aim for.
Well, that's a beautiful segue into the next topic I want to talk about because Microsoft's
Kinect technology has been dubbed a failure that became a great success.
And the science behind it has a fascinating history.
You were there early on, and I'd love you to share some stories about Kinect with us.
What's your particular perspective on this technology?
How did it come about? How has it evolved?
And how has it impacted other areas of research you're involved in?
I love the Kinect story because in some sense, it's a classic example of when academic style
researchers meet engineers who really want to change the world.
So at Microsoft Research, we were looking at whether we could make computer vision algorithms
that would be able to follow the movements of the human body.
And we were working with the academic research field and we were doing pretty well and we had good results.
So one day, the Xbox people, Alex Kipman, who was working in Xbox then, came to us and said,
we've had this great idea for a video game where you're going to like recognize the motion of a human in a camera
and then it's going to control the games, and it's going to be amazing. And we said, I'm glad you asked us that because we're
actually the world experts on this, and I can tell you it's not going to work.
So then they said, oh yeah, it's funny you say that because look at this program we wrote.
And they showed us their version of it, and their version was
better than anything in the academic literature.
Oh my gosh.
Some genius programmers had put
together an amazing demo of how it would work. What was amazing was it was using an idea from
the academic literature, but they had engineered it so well and made it so effective and really
worked hard on it. The reason that idea wasn't very popular with academics at the time
was that it's pretty hard to take a single image from a camera, identify the human in the image,
and then list off where the hands are, where the elbows are. But supposing you already had an image
from, let's say, 30 milliseconds ago, because it's a video sequence, and you already knew in the
image from 30 milliseconds ago where everything was, then you could just simply say to yourself, well, they can't have gone far,
they were there 30 milliseconds ago, you know, and find where everything was. So we knew how to do
that in academia, but what no one had done is really, really worked hard on that, really kind
of tried to moonshot that and make it really work. And our contribution was to observe that, okay,
you've got a system that works 99% of the time, assuming it was right 30 milliseconds ago. There's a calculation you can do,
which tells you that system will definitely fail after five minutes. And this is what they observed
and they knew that that would happen. And you could design a game around that, that, you know,
ran in sort of three minute sections and then reset itself. But ideally you would have the
system just not
make mistakes like that. So our first contribution was to say, what we need to do here is just
basically have the system every couple of seconds kind of reset itself. And we were using machine
learning. And this was kind of an early instance of machine learning really being applied to one
of these hard computer vision problems. So we said to them, okay, we could maybe do it, maybe. But,
you know, we would need real world examples of this thing running in 10 different living rooms
across the planet in order to even know if we're doing well, not to mind train our machine learning
algorithm. And then the horrifying moment two weeks later when we were on a call and they said,
yep, we've got 10 people in living rooms across the planet. Our Japan people are finishing up there tomorrow and then they're
moving over to China. So suddenly we realized, okay, these people are really serious. And then
when we needed to hire a Hollywood studio to generate training data, they hired a Hollywood
studio to generate training data. So there was just a huge amount of vision there, which
we were saying this stuff because it seemed right, but really no one had done it to that level before.
And the reason was they understood what the machine learning was doing.
They figured it works from examples.
If you don't have enough examples, no brainer, let's get the examples.
Whereas academics would always be, I'm happy to spend a year building a better theory.
Whereas with Kinect, the partnership with people who really wanted to get stuff done, maybe that's where I've inherited some of this, allowed us to make
really fantastic progress.
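The calculation mentioned above, for a tracker that is right 99% of the time given that it was right 30 milliseconds earlier, can be sketched as follows. This is an editorial illustration, not the team's actual analysis: it assumes each frame succeeds independently with the same probability, treats the 99% and 30 ms figures as the round numbers quoted in the conversation, and uses a hypothetical function name.

```python
def survival_probability(per_frame_success, frame_interval_s, duration_s):
    """Probability that a frame-to-frame tracker never fails over `duration_s` seconds,
    assuming every frame succeeds independently with probability `per_frame_success`."""
    frames = round(duration_s / frame_interval_s)
    return per_frame_success ** frames


# Chance of an unbroken track after 3 seconds (about 100 frames) and after 5 minutes.
p_3s = survival_probability(0.99, 0.030, 3)
p_5min = survival_probability(0.99, 0.030, 300)
```

Under these assumptions the chance of an unbroken track is already well below one half after a few seconds and is vanishingly small by five minutes, which is why a periodic reset that does not depend on the previous frame matters.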
Microsoft's HoloLens is another computer vision technology that you've had a lot to do with.
So give us the Andrew Fitzgibbon take on HoloLens and its journey from birth until now with the release of HoloLens 2.
What have you discovered about its capabilities over the years?
And what do you think HoloLens has contributed to the computer vision research community?
HoloLens is an amazing device.
It came out of the same team that we worked with on Kinect,
Alex Kipman's team.
And at one level, HoloLens is exactly what Kipman said it was
when they announced it first, three, maybe four years ago.
He said, this is the future of the PC.
And in one sense, I do believe that there will be a future
where we find it weird that we used to carry a flat screen in our pocket
and pull out this flat screen to look at it in order to do our digital work.
I think that sometime soon when I refit my office,
instead of setting up a bunch of LCD panels, I'll just put
a large black curved piece of plywood in front of me and I'll wear a HoloLens and all my documents
will appear in the real world in front of me. I absolutely believe that the screen in the pocket
or the screen attached to the desk is going to be as weird as the phone attached to the building.
So that's far future.
That's when HoloLens has the form factor of a small set of glasses.
But towards that future,
HoloLens today is amazingly valuable for real people doing real work.
Because HoloLens lives in a 3D world, it has lots of 3D vision in it.
One of the pieces that I'm incredibly impressed by in HoloLens,
because I didn't work on it,
does the job of figuring out where your head is in the 3D world.
This is related to work I did, you know, on Boujou many years ago,
but on HoloLens it does it all the time, in real time,
on an incredibly low-powered device.
So that's a beautiful piece of technology that, again,
I think very few people other than Microsoft could have put together.
My work on HoloLens stemmed from a piece of very blue skies research we did almost 10 years ago now.
We, that's me and a friend called Tom Cashman, who arrived as an intern,
we decided we want to learn about the 3D structure of stuff that moves. What's stuff that moves? Well,
the human body is something that moves, and we knew a bit about the 3D structure of the human
body from Kinect. But we wanted to learn about this 3D structure just from still images. So we
had to think of something that's kind of bendy and movey,
but that is somehow not too bendy and not too movey.
So we realised that dolphins were the ideal thing to work on.
So we decided to write a paper called What Shape Are Dolphins?
Now, we didn't really care about dolphins,
but we cared about bendy, movey 3D stuff.
And we worked on that paper in order to build mathematical models
of 3D. We thought, well, why did we want to know about 3D? Well, when you're interacting with the
virtual world or the mixed reality world in the HoloLens, one of the important 3D objects is the
human hand. If the system can look at your hands and fully accurately determine the position of every bone and knuckle in the hand, then you can use your hands in the virtual world to pick up virtual objects.
And the virtual objects behave exactly as you would expect from real world physics.
So the research on dolphins became a research on the slightly more bendy object that is the human hand.
And then we knew that this technology was useful for something. HoloLens was being developed. So we thought, let's see if we can deliver this dream of real-world physics to the HoloLens. And
we were extremely happy to see announced recently at Mobile World Congress in Barcelona, HoloLens 2
with fully articulated
hand tracking.
Let's shift over and talk about Ada a little bit more deeply and talk about specific
problems you want to tackle and what technical ground you hope to break in the projects you
want to take on. What's on your roadmap, Andrew?
So Ada is about permeating AI into the parts of the world where we almost don't yet know we need it.
Today, we still can't ask questions of the internet like, find me all the ski chalets within 100 metres of the slopes, right? That's a hard question.
Why is it a hard question? Because all the ski chalets have their own little website. It's not necessarily aggregated.
And the way to answer that question is very easy. You should just read and understand all
the webpages in the world and be able to answer any questions about them. So a computer that can
really read and understand all the webpages in the world is clearly, in some sense, you know, sort of infinitely far away.
So the idea is that throughout computer science, there are areas where we can permeate using AI and machine learning to deliver systems that work better. So for example, on the HoloLens, we had to take our hand tracking
code, which we wrote as standard, you know, computer vision researcher code, and make it
maybe 500 times more efficient in order to run on that low power device. Now, if we could achieve
that for general code, then maybe we could make, you know, 500 times more efficient the code we run in data centres around the world.
Maybe we could achieve 500 times as much or maybe we could save energy.
If we had capabilities in user interfaces that allowed us to adapt to what the user is doing but not be annoying,
and this is a crucial combination, then of course we would have much
better capabilities. Of course we would have happier users, happier humans. If we think about
the challenge of maybe adapting user interfaces so that my user interface works better for me and
yours works better for you, one of the things that we want to do is learn from a very small number of signals.
So maybe every time you switch on your computer, you know, you arrange the windows in a certain
order. We already have systems that will try to learn your preferences and try to do that for you.
But what we know today is that a lot of the times the systems that do that are a bit annoying.
Why are they a bit annoying?
Because they don't have the human level understanding.
Oh, this person's in a hurry.
This person's using a different machine.
They're in a different context.
They're at home, not work.
How could a system really understand from small amounts of training data what it should do and when it should do nothing?
And these are areas which I think we're sort of touching on today.
There are whole areas of AI research that our team looks at,
and they're different questions.
In all of them, again, there's this trade-off between a fundamental research angle
and how are we actually going to demonstrate our AI to the world.
So, Andrew, I've heard you use the phrase, change the world or a better world, and these are the sort of end goals.
So there are a lot of things that could go right if you're successful, but we have to talk at least a little bit about
what could possibly go wrong. So with all of the possibilities that you're describing,
they seem to pose new kinds of social protocols that we'll have to develop and adapt with things
like advanced 3D computer vision
and head-mounted computers and cameras galore.
Is there anything that keeps you up at night?
There are things where I think I have some idea
what the answer is going to be.
And then there are things where I hope I have some idea
what the answer is going to be.
Let me do one of the first ones.
When I describe a world where everybody's glasses
have the possibility of
projecting 3D content into the world in front of them, you can immediately think, but this is
terrible. People are going to just, you know, spend their entire time looking at the content
and not interacting with the other person in front of them. But I think we already have social
protocols that solve that today. So if you and I are talking and there's a TV over to
the right, maybe I'll turn to look at the TV. It's rude, but maybe you understand something
interesting must have happened on the TV and we both turn and maybe it's something we'll discuss
and we'll get back to chatting. We also have sort of protocols. It's rude for me to take out my phone
and look at it when I'm talking with you, but you understand if I tell you, well, I'm just going to
look that up or I'm just going to figure out what's happening in that meeting. So I'm not
worried with something like HoloLens that we won't easily develop these protocols. We might need some
technical solutions like when there's something in my 3D world, maybe it's, you know, my email
client is maybe sitting on the desk in front of me. You can see that it's my email client, but of
course you can't read the content within it. So, but these are technical problems we can solve. The social protocols, I'm confident
there we will develop. And I'm not concerned about that aspect. But you might well say,
okay, but you should be concerned about maybe a world where the cameras on my device are revealing information that I wouldn't like
revealed to other people in the world. And I'm happy that we, the AI community essentially,
are talking about this and thinking about ways in which we can ensure the security of our data
so that any person can have a concrete understanding about what information is leaving the device, how
well encrypted information is used, and that we can
all have protocols for how we dispose of information when we don't need it. Another exciting thing about
the HoloLens is that, for example, in hand tracking, none of the images that the cameras take leave the
device. And today we are looking at a number of ways in which I can securely send your information
to the cloud, securely do machine learning on it and send the answer back, and then delete the information and guarantee that to the
customer. So that's one aspect where we as Microsoft have just devoted an awful lot of effort to
thinking about security and privacy of that information. Another side of security is trust.
We earn that trust obviously by having behaved well in the
past, by having strong statements about what we're going to keep and what we're not going to keep,
and also by the strength of our research in aspects like secure computation and secure
machine learning.
It's story time, and you've got a good one. Tell us what got a young Andrew Fitzgibbon
interested in computer science and what was your path to Microsoft research?
Gosh, it's a long one. I grew up in the 80s. I liked mathematics. I liked messing around with
electronics because we didn't really have computers then. We had
computers at school. One of my summer jobs was as a water taxi driver. And being a water taxi
driver means you're incredibly busy from about 9am to 11am when all the boats go out to the sea.
And then you're incredibly busy from about 3pm to about 6pm or maybe 7pm when everyone comes back.
So you've got a huge amount of time sitting
out in the sun or maybe the rain during the day where you're just kind of killing time. And one
of the things you can do is read. And of course, if, like many of your listeners, you love mathematics,
you can play around with mathematics. But what was also great back then was because you didn't
have a computer beside you, I would write programs that I would then try and type in the following
day when I got to school. And that was just a great way to kind of, in a very casual way,
learn about computer programming. So I would scratch out a program on paper, line 10, do this,
line 20, do that. Or later I would write in assembly language the little bits that needed
doing. And it's kind of therapeutic. It's like solving mathematical puzzles, right? Mathematical puzzles are great, but they're hard to invent for yourself. So you might buy a book
of mathematical puzzles and then you solve them all after a few weeks. Whereas with a computer
program, you're of course continually inventing the puzzles for yourself.
Right. So from that job as a water taxi driver. Then what? Then where?
So I did mathematics and computer science at university. I wanted to do physics,
but the physics department told me it would be much too hard to do physics with computer science.
And someone told me later that also they weren't that keen on people with blue hair.
So I ended up doing mathematics and computer science. The mathematics department were much more enlightened. And I was sitting in a topology lecture one day and thinking,
what would this stuff be useful for? And I thought, oh, maybe it'll be useful for recognising the shapes of letters. And basically, I went and found myself a master's course that
had something to do with computer vision. And I went to Heriot-Watt University in Edinburgh,
which was running a fantastic course on sort of, they called it knowledge-based systems. Nowadays, you would
think of it as an introductory AI course. So I finished my master's, and then I got a job as a
programmer in Edinburgh University. I got that job in 1989. But then somehow by 1997, somewhat
accidentally, I ended up with a PhD. I remember now
how that happened. I was sitting in some meeting as the programmer and they were talking about
their research. And I said something like, well, has anybody tried just like rendering 500 views
of the thing and then using that? And someone said, oh, you should write a paper on that.
And I thought it was a joke. And of course, it turned out that that was my first paper. It was
one of these things where let's use brute force instead of mathematics and see
what happens.
And from then on, you know, I hope I've managed to do elegant mathematics as well as brute
force.
But I think a practical mindset was there even from then.
Well, sometimes you need tweezers and sometimes you need a hammer.
Right.
Exactly.
Exactly.
All right.
So that's your accidental PhD, and then what?
So the PhD was part-time with a research assistant job at Edinburgh.
And that job was coming to an end, or at least I thought it was coming to an end
because the contract was due to end at a certain date.
And of course, like a very clever person, I forgot to ask anybody whether that was true.
So I started looking for another job. I ended up in Oxford working with the great computer vision researcher of the UK,
Andrew Zisserman, and a bunch of other amazing people. And there again, I learned the interaction
of mathematics and code. I worked there, and that led to Boujou, which we have mentioned earlier.
And then about mid 2000s, I moved to Microsoft Research.
Right. You did a bit of a drive-by on reference to blue hair.
It was the 80s, but did you indeed have blue hair?
Oh, yes, I indeed had blue hair for much of my undergraduate career.
It was varying degrees of blue and sometimes would
turn into green when the bleach wore out.
Was it ever smurf blue?
No, it was kind of a dark electric blue. At least that was the aim.
Well, at the end of every podcast, I give my guests the chance to say anything they want
to our listeners. And sometimes it's general advice or wisdom or inspiration.
Other times it's specific challenges or open problems in the field.
So what would you like to say to emerging researchers?
People sometimes ask, what problem should I research? And, you know, there's a sort of a
simple general answer: solve important problems that will change the world. But sometimes that's
too big, and you don't really have an idea what the big important problems are. A signal that I found
valuable is when you're listening to a talk or reading a paper, find something that annoys you.
Find something where you think, really, that can't be the right way to do this.
Then, and this is crucial, ask yourself, really, why is this annoying me? And then find a real world
example where the thing that's annoying you is going to go wrong. A real world example just needs
to be something that you know we should be able to do. None of the existing technologies can do it,
and you've got an idea. And I think that has been a very valuable source of inspiration
for the times when you don't have a big idea.
Sometimes you just have a big idea and that's great. Go for it.
And don't forget to focus on everything.
Focus on everything. That's the best idea.
Andrew Fitzgibbon, thank you for joining us from Cambridge today.
Thank you. It's an honor to have been here.
To learn more about Dr. Andrew Fitzgibbon and the latest in 3D computer vision and
All Data AI, visit Microsoft.com slash research.