Microsoft Research Podcast - 080r - All Data AI with Dr. Andrew Fitzgibbon
Episode Date: May 6, 2020
This episode originally aired in June 2019.
You may not know who Dr. Andrew Fitzgibbon is, but if you’ve watched a TV show or movie in the last two decades, you’ve probably seen some of his work. An expert in 3D computer vision and graphics, and head of the new All Data AI group at Microsoft Research Cambridge, Dr. Fitzgibbon was instrumental in the development of Boujou, an Emmy Award-winning 3D camera tracker that lets filmmakers place virtual props, like the floating candles in Hogwarts School of Witchcraft and Wizardry, into live-action footage. But that was just his warm-up act. On today’s podcast, Dr. Fitzgibbon tells us what he’s been working on since the Emmys in 2002, including body- and hand-tracking for powerhouse Microsoft technologies like Kinect for Xbox 360 and HoloLens, explains how research on dolphins helped build mathematical models for the human hand, and reminds us, once again, that the “secret sauce” to most innovation is often just good, old-fashioned hard work.
https://www.microsoft.com/research
Transcript
When I spoke with Andrew Fitzgibbon last summer, he introduced us to his all-new All Data AI group,
a team that studies AI models and techniques that apply to complex real-world data,
from big to small, structured to unstructured, and everything in between. Whether you got the
411 on the Machine Learning Dataverse last June, or you're ready to go beyond the numbers today,
I know you'll enjoy episode 80 of the Microsoft Research Podcast,
All Data AI.
I do believe that there will be a future where we find it weird that we used to carry a flat
screen in our pocket and pull out this flat screen to look at it in order to do our digital
work.
I think that sometime soon when I refit my office,
instead of setting up a bunch of LCD panels,
I'll just put a large black curved piece of plywood in front of me
and I'll wear a hololens and all my documents will appear in the real world in front of me.
I absolutely believe that the screen in the pocket or the screen attached to the desk is going to be as
weird as the phone attached to the building. You're listening to the Microsoft Research
Podcast, a show that brings you closer to the cutting edge of technology research
and the scientists behind it. I'm your host, Gretchen Huizenga. You may not know who Dr. Andrew Fitzgibbon is, but if you've watched a TV
show or movie in the last two decades, you've probably seen some of his work. An expert in 3D
computer vision and graphics, and head of the new All Data AI group at Microsoft Research Cambridge,
Dr. Fitzgibbon was instrumental in the development of Boujou, an Emmy Award-winning
3D camera tracker that lets filmmakers place virtual props, like the floating candles in
Hogwarts School of Witchcraft and Wizardry, into live-action footage. But that was just his warm-up
act. On today's podcast, Dr. Fitzgibbon tells us what he's been working on since the Emmys in 2002,
including body- and hand-tracking for powerhouse Microsoft technologies
like Kinect for Xbox 360 and HoloLens,
explains how research on dolphins
helped build mathematical models for the human hand,
and reminds us once again
that the secret sauce to most innovation
is often just good old-fashioned hard work.
That and much more on this episode of the Microsoft Research Podcast.
Andrew Fitzgibbon, welcome to the podcast.
Hi, Gretchen. Great to be here.
So I usually start my podcasts by way of introduction, but I want to go a little
off script
with you because you're funny. You've said that you situate your research at the intersection of
computer vision and computer graphics with excursions into neuroscience. And at a more
subatomic level, you characterize your work as, at its core, extracting information about the world
from photons. So give us a little more context about these excursions and extractions.
How in general would you describe what you do, Andrew?
What gets you up in the morning?
So what I want to do is make computers help us change the world.
And we can change the world in a bunch of ways that are useful to humans.
I like to think that I work on the technologies that underpin
the ways we can make computers make the world better for us.
A lot of what we do in what we now call AI,
the combination of machine learning, computer vision, and natural language processing,
is all about taking information from the world,
and that information comes through sensors somehow, whether the sensors are the pixels in a camera,
or the sensors in a microphone, or even the sensors in your keyboard when you type a tweet.
These are all sources of information from the real world that we would like to do something
good with. And for me, something good might be improving the computer graphics in Harry Potter. And I consider that good because it makes
lots of humans happy. And in some sense, everything we do in life is about making humans happy.
Well, tell me a little more about the excursions into neuroscience when you're talking about
computer vision and computer graphics.
It might not seem a natural leap, but I think it kind of is. Can you unpack that a little?
Yes, I started collaborating when I was in Oxford with some of the neuroscientists who worked there.
And we were interested in a fairly simple question, which is, roughly, can humans point at things? Now, you think, obviously, humans can point at things, but we don't have a good
theory for how the brain integrates 3D information. So, I was interested in purely providing the
neuroscientists with a sort of a mathematical backup. I was saying, no one thinks the brain
does it by multiplying matrices and vectors, but if it did, you would see these sorts of error patterns.
And then we went to the real world and looked at people wearing virtual reality headsets and
the kind of pointing mistakes they make. And those are different from the patterns that a computer
would make using today's technology. So we still don't know what the brain does,
but we have more evidence that it's not the same as what the computer does.
Hmm. All right. So let's talk a little more about extracting information about the world
from photons. That's a very granular way of describing the computer vision work that you do.
Could you explain that technically a little better?
So, regular listeners to your podcast will know that computer vision is one of those problems that's sort of easy to state, you know, does this picture contain a cat or not, but has turned out to be incredibly hard.
And even in this age of deep learning successes with computer vision, we still know that it's incredibly hard. My abstraction of gathering information from photons is just to kind of stand back and say,
why are we so excited about computer vision? Why are we so excited about these capabilities?
And one of the things that computer vision does is it allows us to acquire information from far away.
And of course, one can hear information from far away, but it's a capability that allows us to do things like
recognise people far away or to drive a car. And I want to think about it at that level of
abstraction because I want to always understand that my end goal is to do something real.
If we think always about the end goal, I think we do a much better job of making progress on the fundamental research.
The alternative is to say, my interest is in understanding deep neural networks.
I absolutely love trying to understand deep neural networks, and I do so at a theoretical level, but also I want to know what the practical consequence of that understanding will be. Well, now that we've situated your research, let's situate you. Until this week,
I would have introduced you as a partner scientist in HoloLens, but you've recently
been tapped to lead a group at MSR Cambridge called All Data AI, or ADA, which turns
out to be an acronym that references Ada Lovelace. Was that intentional
at all? And why, if it was? So, yes, it was intentional in the sense that Ada Lovelace is
sometimes considered the world's first computer programmer. Whether or not she programmed
computers, she certainly was the first person to observe that the computational powers of the Analytical Engine
or the Difference Engine could be applied to quantities that were not numbers. Now, of course,
since the dawn of computers, we've represented quantities in the computer, like strings of
characters, like words. They're all numbers. So what's new about saying that it operates on
something beyond numbers? Well, the thing that's new is not so much that the computer understands a sequence of numbers,
but that it can understand the interconnections between them.
And that is represented fundamentally by a computer science concept called a graph.
And a graph is something that we can use to represent a wide variety of computer science concepts. Okay. So why now with this group at MSR Cambridge? Has there been a confluence of
things that exist now that didn't before?
I think, of course, we have all observed the advances in AI due to the advances in machine learning, due to the advances
in deep learning. And I think now there's a golden opportunity to apply that to a broader range of
areas. It's also a fantastic opportunity here and now for us to think about a new generation of AI
programming. Until today, AI programming has been the domain of high priests and priestesses
who have PhDs in machine learning and who understand linear algebra.
And yet, we believe that actually a lot of AI programming could be a lot simpler.
So we're looking at how we might think about a third generation of AI programming. The first generation would have been the raw work of Hinton and LeCun, and then came [...], which lots of listeners will have used or heard of. And these tools have been, again, fantastic for democratizing AI,
but they make it hard for somebody who's just a great computer programmer to understand.
In my view, they somewhat hide the beauty of the AI models that are underneath.
Neural network models are actually relatively simple. They're simple
models with complex consequences, and the research world is increasing its understanding of these
complex consequences. But it's also nice to just see the code unadorned and see how simple these
models are. Well, let's go back a ways before we talk about more current research, because you have some what I would call greatest hits in your earlier work.
One is called Boujou, a camera tracker that actually won an Emmy. And it's been used to combine computer graphics with live-action
footage in, from what I've heard, pretty much every movie made since it was released in 2002.
Tell us more about Boujou. How and where did it come about? What does it do and how does it work
technically? Boujou was a great fun project. I moved from the University of Edinburgh, where I did my PhD, to Oxford
University in the mid-90s. And I was working there with some amazing people who were interested in
the question of how a robot might navigate its way around some building or around some environment.
So we worked hard on the how does a robot navigate problem and we discovered that one
of the things the robot has to do in order to know where it's been in three dimensions is to build a
three-dimensional model of the world. And we worked hard on making a beautiful 3D model because we
figured this would be useful maybe for those industries where, even today, it's common, for example, to sculpt a car out
of clay before building the computer model. Certainly in those days, if you were going to
have an alien in the movie (I think Pitch Black was one of the first aliens we looked at), they would
make the alien out of clay and then try to scan it into the computer. So we thought we had a great
product. We were going to use our robot navigator, spin the camera around the alien model, pull that into the computer, and then give people a computer model of the 3D object. It turned out nobody in the movie industry was interested.
Dang it. It didn't work in little corners. It was terrible. And they could just do better by using, you know, existing artists to create the models.
But somebody at one of the effects companies kind of thought through how we must be getting this
model and said, how do you know where the cameras are? We said, you know, well, it works it out.
That's part of how we get the 3D model. We work out how the cameras are. So it turns out something you really need in movies that's really hard to do is to figure
out where the cameras are. Now, what does that mean? That means maybe I have a camera mounted
in a boat and the boat contains some hobbits and is sailing down a river towards, you know,
some impressive mountain. It would be just great if the impressive mountain had two
huge statues pointing at the hobbits. But it turns out that nobody built those 400-meter statues.
So what we have to do is build some statues back in the studio, and we built some 40-centimeter
statues back in the studio. And if only we knew what motion the camera had undergone while it
rocked in that boat moving down the river,
we'd be able to make a robot do the same motion and then we could superimpose the images. So in
those days, how would you find out where the camera was? You could imagine trying to use
markers or trying to use GPS, but none of these things worked. So you would have to just sort of manually
position the camera for every single shot. It was incredibly expensive. So somebody who had worked on that kind of footage realized that our algorithms were
producing this as a kind of unwanted side effect. We flipped or, I guess nowadays we would say, we
pivoted. There was a startup, and Boujou was launched. And I'm super proud of it because it was one of
the first products which did kind of 3D computer vision.
There were products available then that would do character recognition, that would do number plate recognition.
But this really did 3D vision.
And we had to go beyond the academic state of the art to deliver it.
And we learned about delivering computer vision to the real world.
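Knowing "where the cameras are" matters because a virtual statue can only be drawn at the right pixel if each frame's camera pose is known. The sketch below is a hypothetical, minimal pinhole-camera model (no lens distortion, yaw-only rotation, world y axis pointing down to match image rows); the `Camera` class and `project` function are illustrative names, not Boujou's code. It shows only the forward problem, from a known pose to a pixel. A camera tracker solves the much harder inverse problem of recovering the pose for every frame from the footage itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera: position, yaw, and intrinsics (no lens distortion)."""
    x: float
    y: float
    z: float
    yaw: float       # radians about the vertical axis; 0 looks along +z, positive turns toward +x
    focal_px: float  # focal length in pixels
    cx: float        # principal point (image centre) in pixels
    cy: float


def project(cam: Camera, point):
    """Return the (u, v) pixel where a 3D world point appears, or None if it is behind the camera.

    World y points down, matching image rows, so no axis flip is needed.
    """
    # Express the point relative to the camera centre.
    dx, dy, dz = point[0] - cam.x, point[1] - cam.y, point[2] - cam.z
    # Undo the camera's yaw so that its viewing direction becomes +z.
    c, s = math.cos(cam.yaw), math.sin(cam.yaw)
    xc = c * dx - s * dz
    zc = s * dx + c * dz
    yc = dy
    if zc <= 0:
        return None
    # Perspective division: distant points shift less on screen when the camera moves.
    return (cam.focal_px * xc / zc + cam.cx, cam.focal_px * yc / zc + cam.cy)


# The tip of a virtual statue, 50 units in front of the starting camera position.
statue_tip = (0.0, -5.0, 50.0)
frame_a = Camera(0.0, 0.0, 0.0, 0.0, 1000.0, 960.0, 540.0)
frame_b = Camera(1.0, 0.2, 2.0, math.radians(3.0), 1000.0, 960.0, 540.0)  # boat has drifted and turned
print(project(frame_a, statue_tip))  # (960.0, 440.0)
print(project(frame_b, statue_tip))  # a different pixel, so the pose must be known per frame
```

The same 3D point lands on different pixels as the boat rocks, which is why every frame's pose has to be recovered before the studio statues can be composited in.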
Some of what we learned was you just have to type in lots of code. People would ask me, you know, what's the secret sauce in Boujou? And I'd say,
well, you know, have you read all the papers on 3D structure and motion? They would say, yeah,
I'm pretty sure I have. I would say, yeah, that's the secret sauce. We implemented all of them.
And, you know, it's an attitude that helps us today. Sometimes in research,
you're assessed by how beautiful and clean your algorithm is. Sometimes in the real world, you have to implement all the dirty algorithms until it all sort of ties in. I imagine you were thinking of this when you once said,
if I had to nominate one key to success, it's a focus on everything. Is that what you're talking
about? That's exactly what I'm talking about. Yeah, exactly. There is no silver bullet. You
really have to focus on building a real thing. And that just means something that other humans would be happy to use.
So remember, in Boujou, we didn't focus on the right thing, but the real thing we wanted to
build was this 3D modeler. And we knew we would make a sort of a, what's called a Wizard of Oz
demo. We'd say, if this thing worked, it would make you a model that looks like this. Would you
like it? And then when the humans agree
that that's what they would like, then you have a target. And then of course, you may have to pivot
or you may deliver something that's only half as good and discover that, hey, that's still useful.
Or you may achieve everything you thought you needed to achieve and discover that actually
it needs to be twice as good. But I really like having a concrete goal to aim for.
Well, that's a beautiful segue into the next topic I want to talk about,
because Microsoft's Kinect technology has been dubbed a failure that became a great success.
And the science behind it has a fascinating history. You were there early on,
and I'd love you to share some stories about Kinect with us. What's your particular perspective on this technology? How did it come about? How has it evolved? And how has it impacted other
areas of research you're involved in? I love the Kinect story because in some sense,
it's a classic example of when academic style researchers meet engineers who really want to change the world. So at Microsoft
Research, we were looking at whether we could make computer vision algorithms that would be
able to follow the movements of the human body. And we were working with the academic research
field, and we were doing pretty well, and we had good results. So one day, the Xbox people,
Alex Kipman, who was working in Xbox then, came to us and said,
we've had this great idea for a video game where you're going to like recognise the motion of a human in a camera
and then it's going to control the games and it's going to be amazing.
And we said, I'm glad you asked us that because we're actually the world experts on this and I can tell you it's not going to work.
So then they said, oh yeah, it's funny you say that because look at this program we wrote.
And they showed us their version of it.
And their version was better than anything in the academic literature.
Oh, my gosh.
Some genius programmers had put together an amazing demo of how it would work.
What was amazing was it was using an idea from the academic literature, but they
had engineered it so well and made it so effective and really worked hard on it. The reason that
idea wasn't very popular with academics at the time was that it's pretty hard to take
a single image from a camera, identify the human in the image and then list off where
the hands are, where the elbows are.
But supposing you already had an image from, let's say, 30 milliseconds ago,
because it's a video sequence, and you already knew in the image from 30 milliseconds ago where everything was, then you could just simply say to yourself, well,
they can't have gone far. They were there 30 milliseconds ago, you know, and find where
everything is now. So we knew how to do that in academia, but what no
one had done is really, really worked hard on that, really kind of tried to moonshot that and
make it really work. And our contribution was to observe that, okay, you've got a system that works
99% of the time, assuming it was right 30 milliseconds ago. There's a calculation you
can do, which tells you that system will definitely fail after five minutes. And this is what they
observed and they knew that that would happen. And you could design a game around that,
that, you know, ran in sort of three-minute sections and then reset itself. But ideally,
you would have the system just not make mistakes like that. So our first contribution was to say,
what we need to do here is just basically have the system every couple of seconds kind of reset
itself. And we were using machine learning.
And this was kind of an early instance of machine learning
really being applied to one of these hard computer vision problems.
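The "calculation you can do" for frame-to-frame tracking is compounding probability. Assuming, purely for illustration, that "works 99% of the time" means each 30 ms frame independently succeeds with probability 0.99 given a correct previous frame, and that a single failure is never recovered, the chance of never losing track over n frames is 0.99^n. The function names below are hypothetical, and this is a back-of-the-envelope model rather than Kinect's actual failure analysis.

```python
def survival_probability(per_frame_success: float, seconds: float, frame_ms: float = 30.0) -> float:
    """Probability that a frame-to-frame tracker never fails over `seconds`.

    Assumes each frame succeeds independently with `per_frame_success`, provided the
    previous frame was correct, and that one failure is never recovered from.
    """
    frames = round(seconds * 1000.0 / frame_ms)
    return per_frame_success ** frames


def expected_seconds_to_failure(per_frame_success: float, frame_ms: float = 30.0) -> float:
    """Mean time to first failure: a geometric distribution has mean 1 / p_fail frames."""
    return (1.0 / (1.0 - per_frame_success)) * frame_ms / 1000.0


for minutes in (0.1, 1, 5):
    p = survival_probability(0.99, minutes * 60)
    print(f"{minutes:>4} min: P(still tracking) = {p:.3g}")
print(f"mean time to first failure: {expected_seconds_to_failure(0.99):.1f} s")
```

With these illustrative numbers the mean time to first failure is about 3 seconds and the chance of surviving five minutes is effectively zero. The conversation doesn't say what the 99% was measured over, so treat the figures as showing the shape rather than reproducing the real system: survival decays exponentially with running time, which is why periodically resetting the tracker bounds how long any single error can persist.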
So we said to them, OK, we could maybe do it, maybe.
But, you know, we would need real world examples of this thing running
in 10 different living rooms across the planet
in order to even know if we're doing well,
not to mind train our machine learning algorithm. And then the horrifying moment two weeks later,
when we were on a call and they said, yep, we've got 10 people in living rooms across the planet.
Our Japan people are finishing up there tomorrow and then they're moving over to China. So suddenly
we realized, okay, these people are really serious. And then when we said we needed to hire a
Hollywood studio to generate training data, they hired a Hollywood studio to generate training data. So there was
just a huge amount of vision there. We were saying this stuff because it seemed right,
but really no one had done it to that level before. And the reason was they understood what
the machine learning was doing. They figured it works from examples. If you don't have enough
examples, no brainer, let's get the examples.
Whereas academics would always be,
I'm happy to spend a year building a better theory.
Whereas with Kinect,
the partnership with people
who really wanted to get stuff done,
maybe that's where I've inherited some of this,
allowed us to make really fantastic progress.
Microsoft's HoloLens is another computer vision technology
that you've had a lot to do with.
So give us the Andrew Fitzgibbon take on HoloLens and its journey from birth until now with the release of HoloLens 2.
What have you discovered about its capabilities over the years?
And what do you think HoloLens has contributed to the computer vision research community?
HoloLens is an amazing device.
It came out of the same team that we worked with on Kinect, Alex Kipman's team.
And at one level, HoloLens is exactly what Kipman said it was when they announced it
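first, three, maybe four years ago.
He said [...]. I do believe that there will be a future where we find it weird that we used to carry a flat screen in our pocket and pull out this flat screen to look at it in order to do our digital work. I think that sometime soon, when I refit my office, instead of setting up a bunch of LCD panels, I'll just put a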
large black curved piece of plywood in front of me and I'll wear a HoloLens and all my
documents will appear in the real world in front of me.
I absolutely believe that the screen in the pocket or the screen attached to the desk
is going to be as weird as the phone attached to the building.
So that's far future. That's when HoloLens has the form factor of a small set of glasses.
But towards that future, HoloLens today is amazingly valuable for real people doing real work. Because HoloLens lives in a 3D world, it has lots of 3D vision in it. One of the pieces of HoloLens that I'm incredibly impressed by,
because I didn't work on it,
is the one that figures out where your head is in the 3D world.
This is related to work I did, you know, on Boujou many years ago.
But on HoloLens, it does it all the time in real time
on an incredibly low-powered device.
So that's a beautiful piece of technology that, again,
I think very few people other than Microsoft could have put together.
My work on HoloLens stemmed from a piece of very blue skies research we did almost 10 years ago now.
We, that's me and a friend called Tom Cashman, who arrived as an intern, decided we wanted to learn about the 3D structure of stuff that moves. What's stuff that moves? Well, the human body is something that moves, and we knew a bit about something that's kind of bendy and move-y, but that is
somehow not too bendy and not too move-y. So we realised that dolphins were the ideal thing to
work on. So we decided to write a paper called What Shape Are Dolphins? Now, we didn't really
care about dolphins, but we cared about bendy, move-y 3D stuff. And we worked on that paper in order to build mathematical models of 3D. We thought,
well, why did we want to know about 3D? Well, when you're interacting with the virtual world
or the mixed reality world in the HoloLens, one of the important 3D objects is the human hand.
If the system can look at your hands and fully and accurately determine the position of every bone and knuckle in the hand, then you can use your hands in the virtual
world to pick up virtual objects and the virtual objects behave exactly as you
would expect from real-world physics. So the research on dolphins became
research on the slightly more bendy object that is the human hand, and then
we knew that this technology was useful for something.
HoloLens was being developed.
So we thought, let's see if we can deliver this dream of real-world physics to the HoloLens.
And we were extremely happy to see announced recently at Mobile World Congress in Barcelona,
HoloLens 2 with fully articulated hand tracking.
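"Fully articulated" hand tracking means estimating an angle for every knuckle, not just where the hand is. As a hedged illustration only (the HoloLens 2 hand model is a full 3D model and is not described in this conversation), the sketch below uses planar forward kinematics for a single finger: bone lengths plus flex angles determine where every knuckle sits. Tracking runs this in reverse, searching for the angles whose predicted positions best match what the cameras see; here that search is a deliberately brute-force grid over one flex angle shared by all joints. The bone lengths, the shared-angle simplification, and the function names are hypothetical.

```python
import math


def finger_joint_positions(bone_lengths, flex_angles, base=(0.0, 0.0)):
    """Planar forward kinematics for one finger.

    Each flex angle (radians) is measured relative to the previous bone, so bending
    one knuckle moves every joint further along the chain. Returns the knuckle
    positions from the base to the fingertip.
    """
    x, y = base
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(bone_lengths, flex_angles):
        heading += angle  # joint angles accumulate along the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def fit_shared_flex(bone_lengths, observed_tip, steps=901):
    """Brute-force search for one flex angle (0 to 90 degrees), shared by all joints,
    that best explains an observed fingertip position (least squared distance)."""
    best_err, best_angle = float("inf"), 0.0
    for i in range(steps):
        angle = math.radians(90.0 * i / (steps - 1))
        tip = finger_joint_positions(bone_lengths, [angle] * len(bone_lengths))[-1]
        err = (tip[0] - observed_tip[0]) ** 2 + (tip[1] - observed_tip[1]) ** 2
        if err < best_err:
            best_err, best_angle = err, angle
    return best_angle


bones_mm = [45.0, 25.0, 20.0]  # hypothetical proximal, middle, and distal bone lengths
print(finger_joint_positions(bones_mm, [0.0, 0.0, 0.0])[-1])  # straight finger: (90.0, 0.0)
observed = finger_joint_positions(bones_mm, [math.radians(30.0)] * 3)[-1]
print(math.degrees(fit_shared_flex(bones_mm, observed)))  # recovers roughly 30 degrees
```

A real tracker fits many more parameters at once against noisy depth or image data, typically with gradient-based optimization rather than a grid, but the structure is the same: a parametric model predicts where the bones should be, and fitting adjusts the parameters until prediction and observation agree.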
Let's shift over and talk about Ada a little bit more deeply and talk about specific problems you want to tackle across disciplines and what technical ground you hope to break in the
projects you want to take on. What's on your roadmap, Andrew? So ADA, All Data AI, is about permeating AI into the parts of the world where
we almost don't yet know we need it. Today, we still can't ask questions of the internet like,
find me all the ski chalets within 100 meters of the slopes, right? That's a hard question.
Why is it a hard question? Because all the ski chalets have their own little website. It's not necessarily aggregated.
And the way to answer that question is very easy. You should just read and understand all the web
pages in the world and be able to answer any questions about them. So a computer that can
really read and understand all the web pages in the world
is clearly in some sense, you know, sort of infinitely far away. So the idea of AI across
disciplines is that throughout computer science, there are areas where we can permeate using AI
and machine learning to deliver systems that work better. So for example,
on the HoloLens, we had to take our hand tracking code, which we wrote as standard, you know,
computer vision researcher code, and make it maybe 500 times more efficient in order to run
on that low power device. Now, if we could achieve that for general code, then maybe we could make,
you know, 500 times more efficient the code we run in data centers around the world. Maybe we
could achieve 500 times as much or maybe we could save energy. If we had capabilities in user
interfaces that allowed us to adapt to what the user is doing, but not be annoying,
and this is a crucial combination, then of course we would have much better capabilities. Of course
we would have happier users, happier humans. If we think about the challenge of maybe adapting
user interfaces so that my user interface works better for me and yours works better for you,
one of the things that we want to do is learn from a very small number of signals.
So maybe every time you switch on your computer, you know, you arrange the windows in a certain order.
We already have systems that will try to learn your preferences and try to do that for you.
But what we know today is that a lot of the times
the systems that do that are a bit annoying.
Why are they a bit annoying?
Because they don't have the human level understanding.
Oh, this person's in a hurry.
This person's using a different machine.
They're in a different context.
They're at home, not work.
How could a system really understand
from small amounts of training data
what it should do and when it should do nothing?
And these are areas which I think we're sort of touching on today. There are whole areas of AI research that
our team looks at, and they're different questions. In all of them, again, there's this trade-off
between a fundamental research angle and how are we actually going to demonstrate our AI to the world?
So, Andrew, I've heard you use the phrase, change the world or a better world. And these are the
sort of end goals. So there are a lot of things that could go right if you're successful.
But we have to talk at least a little bit about what could possibly go wrong.
So with all of the possibilities that
you're describing, they seem to call for new kinds of social protocols that we'll have to develop
and adapt to things like advanced 3D computer vision and head-mounted computers and cameras
galore. Is there anything that keeps you up at night? There are things where I think I have
some idea what the answer is going to be. And then there are things where I hope I have some idea what the answer is going to be. Let me do one of the first ones. When I describe a world where everybody's glasses have the possibility of projecting 3D content into the world in front of them, you can immediately think, but this is terrible. People are going to just, you know, spend their entire time looking at the content and not interacting with the other person in front of them. But I think we already
have social protocols that solve that today. So if you and I are talking and there's a TV over to
the right, maybe I'll turn to look at the TV. It's rude, but maybe you understand something
interesting must have happened on the TV and we both turn and maybe it's something we'll discuss and we'll get back to chatting.
We also have sort of protocols.
It's rude for me to take out my phone and look at it when I'm talking with you.
But you understand if I tell you, well, I'm just going to look that up or I'm just going to figure out what's happening in that meeting.
So I'm not worried with something like HoloLens that we won't easily develop these protocols.
We might need some technical solutions like when there's something in my 3D world, maybe it's, you know, my email client is maybe sitting
on the desk in front of me. You can see that it's my email client, but of course you can't read the
content within it. But these are technical problems we can solve. The social protocols,
I'm confident we will develop, and I'm not concerned about that aspect. But you might well say,
okay, but you should be concerned about maybe a world where the cameras on my device are revealing
information that I wouldn't like revealed to other people in the world. And I'm happy that we,
the AI community, essentially, are talking about this and thinking about ways
in which we can ensure the security of our data
so that any person can have a concrete understanding
about what information is leaving the device,
how well encrypted it is, how it is used,
and that we can all have protocols
for how we dispose of information when we don't need it.
Another exciting thing about the HoloLens
is that, for example, in hand tracking,
none of the images that the cameras take
leave the device. And today, we are looking at a number of ways in which we can securely send your
information to the cloud, securely do machine learning on it and send the answer back, and then
delete the information and guarantee that to the customers. So that's one aspect where we, as
Microsoft, have just devoted an awful lot of effort to thinking about security and privacy of that information.
Another side of security is trust.
We earn that trust obviously by having behaved well in the past,
by having strong statements about what we're going to keep and what we're not going to keep,
and also by the strength of our research
in aspects like secure computation and secure machine learning.
It's story time, and you've got a good one. Tell us what got a young Andrew Fitzgibbon
interested in computer science and what was your path to Microsoft research? Gosh, it's a long one. I grew up in the 80s. I liked mathematics. I liked messing around
with electronics because we didn't really have computers then. We had computers at school.
One of my summer jobs was as a water taxi driver. And being a water taxi driver means you're
incredibly busy from about 9am to 11am when all the boats go out to the sea. And then you're incredibly busy from about 3pm to
about 6pm or maybe 7pm when everyone comes back. So you've got a huge amount of time sitting out
in the sun or maybe the rain during the day where you're just kind of killing time. And one of the
things you can do is read. And of course, as many of your listeners will know, if you love mathematics, you can play around with mathematics.
But what was also great back then was because you didn't have a computer beside you, I would write programs that I would then try and type in the following day when I got to school.
And that was just a great way to kind of, in a very casual way, learn about computer programming. So I would scratch out a program on paper,
line 10, do this, line 20, do that. Or later I would write in assembly language the little bits that needed doing. And it's kind of therapeutic. It's like solving mathematical
puzzles, right? Mathematical puzzles are great, but they're hard to invent for yourself.
So you might buy a book of mathematical puzzles and then you solve them all after a few weeks.
Whereas with a computer program, you're of course continually inventing the puzzles for yourself.
Right. So from that job as a water taxi driver, then what? Then where?
So I did mathematics and computer science at university. I wanted to do physics,
but the physics department told me it would be much too hard to do physics with computer science.
And someone told me later that also they weren't that keen on people with blue hair.
So I ended up doing mathematics and computer science.
The mathematics department were much more enlightened.
And I was sitting in a topology lecture one day and thinking, what would this stuff be useful for?
And I thought, oh, maybe it'll be useful for recognising the shapes of letters.
And basically, I went and found myself
a master's course that had something to do with computer vision.
And I went to Heriot-Watt University in Edinburgh, which was running a fantastic
course on sort of, they called it knowledge-based systems.
Nowadays, you would think of it as an introductory AI course.
So I finished my master's and then I got a job as a programmer in Edinburgh
University. I got that job in 1989, but then somehow by 1997, somewhat accidentally, I ended
up with a PhD. I remember now why that happened. I was sitting in some meeting as the programmer
and they were talking about their research. And I said something like, well, has anybody tried just like rendering 500 views of the thing
and then using that?
And someone said, oh, you should write a paper on that.
And I thought it was a joke.
And of course, it turned out that that was my first paper.
It was one of these things where let's use brute force instead of mathematics
and see what happens.
And from then on, you know, I hope I've managed to do elegant mathematics as well as
brute force, but I think the practical mindset was there even from then.
Well, sometimes you need tweezers and sometimes you need a hammer.
Right, exactly, exactly.
All right. So, that was your accidental PhD. And then what?
So, the PhD was part-time with a research assistant job at Edinburgh. And that job was
coming to an end, or at least I thought it was coming to an end because the contract was due to end at a certain
date. And of course, like a very clever person, I forgot to ask anybody whether that was true.
So I started looking for another job. I ended up in Oxford working with the great computer
vision researcher of the UK, Andrew Zisserman, and a bunch of other amazing people. And there again, I learned the
interaction of mathematics and code. The work I did there led to Boujou, which we mentioned
earlier. And then in the mid-2000s, I moved to Microsoft Research.
Right. You did a bit of a drive-by on a reference to blue hair. It was the 80s, but did you indeed have blue hair?
Oh, yes, I indeed had blue hair for much of my undergraduate career.
It was varying degrees of blue, and sometimes it would turn into green when the bleach wore out.
Was it ever Smurf blue?
No, it was kind of a dark electric blue.
At least that was the aim.
Well, at the end of every podcast, I give my guests the chance to say anything they want to our listeners. And sometimes it's general advice or wisdom or inspiration.
Other times it's specific challenges or open problems in the field.
So what would you like to say to emerging researchers across disciplines?
People sometimes ask, what problem should I research?
And, you know, there's a sort of a simple general answer,
solve important problems that will change the world.
But sometimes that's too big and you don't really have an idea what the big important problems are.
A signal that I found valuable is when you're listening to a talk or reading a paper,
find something that annoys you.
Find something where you think, really, that can't be the right way to do this.
Then, and this is crucial, ask yourself, really, why is this annoying me?
And then find a real world example where the thing that's annoying you is going to go wrong.
A real-world example just needs to be something that you know we should be able to do, that none of the existing technologies can do, and that you've got an idea about. And I think that
has been a very valuable source of inspiration for the times when you don't have a big idea.
Sometimes you just have a big idea and that's great. Go for it.
And don't forget to focus on everything.
Focus on everything. That's the best idea.
Andrew Fitzgibbon, thank you for joining us from Cambridge today.
Thank you. It's an honor to have been here.
To learn more about Dr. Andrew Fitzgibbon and the latest in 3D computer vision and All
Data AI, visit Microsoft.com slash research.