Microsoft Research Podcast - 080r - All Data AI with Dr. Andrew Fitzgibbon

When I spoke with Andrew Fitzgibbon last summer, he introduced us to his all-new All Data AI group, a team that studies AI models and techniques that apply to complex real-world data, from big to small, structured to unstructured, and everything in between. Whether you got the 411 on the Machine Learning Dataverse last June, or you're ready to go beyond the numbers today, I know you'll enjoy episode 80 of the Microsoft Research Podcast, All Data AI.

I do believe that there will be a future where we find it weird that we used to carry a flat screen in our pocket and pull out this flat screen to look at it in order to do our digital work. I think that sometime soon, when I refit my office, instead of setting up a bunch of LCD panels, I'll just put a large black curved piece of plywood in front of me and I'll wear a HoloLens, and all my documents will appear in the real world in front of me. I absolutely believe that the screen in the pocket or the screen attached to the desk is going to be as weird as the phone attached to the building.

You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga.

You may not know who Dr. Andrew Fitzgibbon is, but if you've watched a TV show or movie in the last two decades, you've probably seen some of his work. An expert in 3D computer vision and graphics, and head of the new All Data AI group at Microsoft Research Cambridge, Dr. Fitzgibbon was instrumental in the development of Boujou, an Emmy Award-winning 3D camera tracker that lets filmmakers place virtual props, like the floating candles in Hogwarts School of Witchcraft and Wizardry, into live-action footage. But that was just his warm-up act. On today's podcast, Dr. Fitzgibbon tells us what he's been working on since the Emmys in 2002, including body and hand tracking for powerhouse Microsoft technologies like Kinect for Xbox 360 and HoloLens, explains how research on dolphins helped build mathematical models for the human hand, and reminds us once again that the secret sauce to most innovation is often just good old-fashioned hard work. That and much more on this episode of the Microsoft Research Podcast.

Andrew Fitzgibbon, welcome to the podcast.

Hi, Gretchen. Great to be here.

So I usually start my podcasts by way of introduction, but I want to go a little off script with you because you're funny. You've said that you situate your research at the intersection of computer vision and computer graphics, with excursions into neuroscience. And at a more subatomic level, you characterize your work as, at its core, extracting information about the world from photons. So give us a little more context about these excursions and extractions. How, in general, would you describe what you do, Andrew?

What gets me up in the morning? What I want to do is make computers help us change the world. And we can change the world in a bunch of ways that are useful to humans. I like to think that I work on the technologies that underpin the ways we can make computers make the world better for us. A lot of what we do in what we now call AI, the combination of machine learning, computer vision, and natural language processing, is all about taking information from the world, and that information comes through sensors somehow, whether the sensors are the pixels in a camera, the sensors in a microphone, or even the sensors in your keyboard when you type a tweet. These are all sources of information from the real world that we would like to do something good with. And for me, something good might be improving the computer graphics in Harry Potter. And I consider that good because it makes lots of humans happy. And in some sense, everything we do in life is about making humans happy.

Well, tell me a little more about the excursions into neuroscience when you're talking about computer vision and computer graphics. It might not seem a natural leap, but I think it kind of is. Can you unpack that a little?

Yes. When I was in Oxford, I started collaborating with some of the neuroscientists who worked there. And we were interested in a fairly simple question, which is, roughly, can humans point at things? Now, you think, obviously humans can point at things, but we don't have a good theory for how the brain integrates 3D information. So my interest was purely in providing the neuroscientists with a sort of mathematical backup. I was saying, no one thinks the brain does it by multiplying matrices and vectors, but if it did, you would see these sorts of error patterns. And then we went to the real world and looked at people wearing virtual reality headsets and the kind of pointing mistakes they make. And those are different from the patterns that a computer would make using today's technology. So we still don't know what the brain does, but we have more evidence that it's not the same as what the computer does.

Hmm. All right. So let's talk a little more about extracting information about the world from photons. That's a very granular way of describing the computer vision work that you do. Could you explain that technically a little better?

So, regular listeners to your podcast will know that computer vision is one of those problems that's sort of easy to state, you know, does this picture contain a cat or not, but has turned out to be incredibly hard. And even in this age of deep learning successes with computer vision, we still know that it's incredibly hard. My abstraction of gathering information from photons is just to kind of stand back and say, why are we so excited about computer vision? Why are we so excited about these capabilities? And one of the things that computer vision does is it allows us to acquire information from far away.

And of course, one can hear information from far away too, but vision is a capability that allows us to do things like recognise people far away or drive a car. And I want to think about it at that level of abstraction because I want to always understand that my end goal is to do something real. If we always think about the end goal, I think we do a much better job of making progress on the fundamental research. The alternative is to say, my interest is in understanding deep neural networks. I absolutely love trying to understand deep neural networks, and I do so at a theoretical level, but I also want to know what the practical consequence of that understanding will be.

Well, now that we've situated your research, let's situate you. Until this week, I would have introduced you as a partner scientist in HoloLens, but you've recently been tapped to lead a group at MSR Cambridge called All Data AI, or ADA, which turns out to be an acronym that references Ada Lovelace. Was that intentional at all? And why, if it was?

So, yes, it was intentional in the sense that Ada Lovelace is sometimes considered the world's first computer programmer. Whether or not she programmed computers, she certainly was the first person to observe that the computational powers of the Analytical Engine or the Difference Engine could be applied to quantities that were not numbers. Now, of course, since the dawn of computers, we've represented quantities in the computer, like strings of characters, like words, and they're all numbers. So what's new about saying that it operates on something beyond numbers? Well, the thing that's new is not so much that the computer understands a sequence of numbers, but that it can understand the interconnections between them. And that is represented fundamentally by a computer science concept called a graph. And a graph is something that we can use to represent a wide variety of computer science concepts.

Okay. So why now with this group at MSR Cambridge? Has there been a confluence of things that exist now that didn't before?

I think, of course, we have all observed the advances in AI due to the advances in machine learning, due to the advances in deep learning. And I think now there's a golden opportunity to apply that to a broader range of areas. It's also a fantastic opportunity here and now for us to think about a new generation of AI programming. Until today, AI programming has been the domain of high priests and priestesses who have PhDs in machine learning and who understand linear algebra. And yet we believe that actually a lot of AI programming could be a lot simpler. So we're looking at how we might think about a third generation of AI programming. The first generation would have been the raw work of Hinton and LeCun, and the second the tools which lots of listeners will have used or heard of. And these tools have been, again, fantastic for democratizing AI, but they make it hard for somebody who's just a great computer programmer to understand. In my view, they somewhat hide the beauty of the AI models that are underneath. Neural network models are actually relatively simple. They're simple models with complex consequences, and the research world is increasing its understanding of these complex consequences. But it's also nice to just see the code unadorned and see how simple these models are.

Well, let's go back a ways before we talk about more current research, because you have some of what I would call greatest hits in your earlier work. One is called Boujou, a camera tracker that actually won an Emmy. It's been used to combine computer graphics with live-action footage in, from what I've heard, pretty much every movie made since it was released in 2002. Tell us more about Boujou. How and where did it come about? What does it do, and how does it work technically?

Boujou was a great fun project. I moved from the University of Edinburgh, where I did my PhD, to Oxford University in the mid-90s. And I was working there with some amazing people who were interested in the question of how a robot might navigate its way around some building or around some environment. So we worked hard on the how-does-a-robot-navigate problem, and we discovered that one of the things the robot has to do in order to know where it's been in three dimensions is to build a three-dimensional model of the world. And we worked hard on making a beautiful 3D model, because we figured this would be useful maybe for those industries where, even today, it's common, for example, to sculpt a car out of clay before building the computer model. Certainly in those days, if you were going to have an alien in the movie (I think Pitch Black was one of the first aliens we looked at), they would make the alien out of clay and then try to scan it into the computer. So we thought we had a great product. We were going to use our robot navigator, spin the camera around the alien model, pull that into the computer, and then give people a computer model of the 3D object. It turned out nobody in the movie industry was interested.

Dang it.

It didn't work in little corners. It was terrible. And they could just do better by using, you know, existing artists to create the models.

But somebody at one of the effects companies kind of thought through how we must be getting this model and said, how do you know where the cameras are? We said, you know, well, it works it out. That's part of how we get the 3D model: we work out where the cameras are. So it turns out something you really need in movies, and that's really hard to do, is to figure out where the cameras are. Now, what does that mean? That means maybe I have a camera mounted in a boat, and the boat contains some hobbits and is sailing down a river towards, you know, some impressive mountain. It would be just great if the impressive mountain had two huge statues pointing at the hobbits. But it turns out that nobody built those 400-meter statues. So what we have to do is build some statues back in the studio, and we built some 40-centimeter statues back in the studio. And if only we knew what motion the camera had undergone while it rocked in that boat moving down the river, we'd be able to make a robot do the same motion, and then we could superimpose the images. In those days, you could imagine trying to use markers or GPS to find out where the camera was, but none of these things worked. So you would have to just sort of manually position the camera for every single shot. It was incredibly expensive. So somebody who had worked on that kind of footage realized that our algorithms were producing this as a kind of unwanted side effect. We flipped, or, as we would say nowadays, pivoted. There was a startup, and Boujou was launched. And I'm super proud of it because it was one of the first products which did kind of 3D computer vision. There were products available then that would do character recognition, that would do number plate recognition. But this really did 3D vision. And we had to go beyond the academic state of the art to deliver it. And we learned about delivering computer vision to the real world. Some of what we learned was you just have to type in lots of code. People would ask me, you know, what's the secret sauce in Boujou? And I'd say, well, you know, have you read all the papers on 3D structure and motion? They would say, yeah, I'm pretty sure I have. I would say, yeah, that's the secret sauce. We implemented all of them.

And, you know, it's an attitude that helps us today. Sometimes in research, you're assessed by how beautiful and clean your algorithm is. Sometimes in the real world, you have to implement all the dirty algorithms and tie them together.

I imagine that's what you were getting at when you once said, if I had to nominate one key to success, it's a focus on everything. Is that what you're talking about?

That's exactly what I'm talking about. Yeah, exactly. There is no silver bullet. You really have to focus on building a real thing. And that just means something that other humans would be happy to use. So remember, in Boujou, we didn't focus on the right thing, but the real thing we wanted to build was this 3D modeler. And we knew we would make what's called a Wizard of Oz demo. We'd say, if this thing worked, it would make you a model that looks like this. Would you like it? And then when the humans agree that that's what they would like, then you have a target. And then, of course, you may have to pivot, or you may deliver something that's only half as good and discover that, hey, that's still useful. Or you may achieve everything you thought you needed to achieve and discover that actually it needs to be twice as good. But I really like having a concrete goal to aim for.

Well, that's a beautiful segue into the next topic I want to talk about, because Microsoft's Kinect technology has been dubbed a failure that became a great success. And the science behind it has a fascinating history. You were there early on, and I'd love you to share some stories about Kinect with us. What's your particular perspective on this technology? How did it come about? How has it evolved? And how has it impacted other areas of research you're involved in?

I love the Kinect story because, in some sense, it's a classic example of when academic-style researchers meet engineers who really want to change the world. So at Microsoft Research, we were looking at whether we could make computer vision algorithms that would be able to follow the movements of the human body. And we were working with the academic research field, and we were doing pretty well, and we had good results. So one day, the Xbox people (Alex Kipman was working in Xbox then) came to us and said, we've had this great idea for a video game where you're going to, like, recognise the motion of a human in a camera, and then it's going to control the games, and it's going to be amazing. And we said, I'm glad you asked us that, because we're actually the world experts on this, and I can tell you it's not going to work. So then they said, oh yeah, it's funny you say that, because look at this program we wrote. And they showed us their version of it. And their version was better than anything in the academic literature.

Oh, my gosh.

Some genius programmers had put together an amazing demo of how it would work. What was amazing was it was using an idea from the academic literature, but they had engineered it so well and made it so effective and really worked hard on it. The reason that idea wasn't very popular with academics at the time was this. It's pretty hard to take a single image from a camera, identify the human in the image, and then list off where the hands are and where the elbows are. But suppose you already had an image from, let's say, 30 milliseconds ago, because it's a video sequence, and you already knew where everything was in that earlier image. Then you could just simply say to yourself, well, they can't have gone far. They were there 30 milliseconds ago, so look nearby and find where everything is now. So we knew how to do that in academia, but what no one had done was really, really work hard on it, really kind of moonshot it and make it really work. And our contribution was to observe that, okay, you've got a system that works 99% of the time, assuming it was right 30 milliseconds ago. There's a calculation you can do which tells you that system will definitely fail after five minutes. And this is what they observed, and they knew that that would happen. And you could design a game around that which, you know, ran in sort of three-minute sections and then reset itself. But ideally, you would have the system just not make mistakes like that. So our first contribution was to say, what we need to do here is just basically have the system reset itself every couple of seconds. And we were using machine learning. And this was kind of an early instance of machine learning really being applied to one of these hard computer vision problems. So we said to them, okay, we could maybe do it, maybe. But, you know, we would need real-world examples of this thing running in 10 different living rooms across the planet in order to even know if we're doing well, not to mind train our machine learning algorithm. And then came the horrifying moment two weeks later, when we were on a call and they said, yep, we've got 10 people in living rooms across the planet. Our Japan people are finishing up there tomorrow, and then they're moving over to China. So suddenly we realized, okay, these people are really serious. And then when we needed a Hollywood studio to generate training data, they hired a Hollywood studio to generate training data. So there was just a huge amount of vision there. We were saying this stuff because it seemed right, but really no one had done it to that level before. And the reason was they understood what the machine learning was doing. They figured it works from examples. If you don't have enough examples, no brainer, let's get the examples. Whereas academics would always be, I'm happy to spend a year building a better theory. With Kinect, the partnership with people who really wanted to get stuff done, and maybe that's where I've inherited some of this, allowed us to make really fantastic progress.
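The "definitely fail after five minutes" remark follows from compounding per-frame reliability. The sketch below is a minimal illustration under stated assumptions, not the Kinect team's actual calculation: it assumes each frame succeeds independently with probability `p` given that the previous frame was right, a 30 fps camera (roughly the 30 milliseconds mentioned above), and a frame-to-frame tracker that never recovers once it loses lock unless it is reset. All function names are hypothetical.

```python
FPS = 30  # assumption: ~33 ms between frames, close to the "30 milliseconds" above


def survival_probability(p: float, seconds: float, fps: int = FPS) -> float:
    """Chance a frame-to-frame tracker never loses lock over `seconds`,
    assuming each frame independently succeeds with probability p."""
    frames = round(seconds * fps)
    return p ** frames


def expected_frames_to_failure(p: float) -> float:
    """Mean of a geometric distribution: expected frames until the first failure."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


def expected_tracked_fraction(p: float, window_frames: int) -> float:
    """Expected fraction of frames tracked correctly in a window that starts
    from a correct (re)initialization. Frame k is right only if frames 1..k
    all succeeded, so E[correct frames] = p + p**2 + ... + p**N."""
    if p >= 1.0:
        return 1.0
    correct = p * (1.0 - p ** window_frames) / (1.0 - p)
    return correct / window_frames


if __name__ == "__main__":
    p = 0.99
    print(f"expected time to first failure: {expected_frames_to_failure(p) / FPS:.1f} s")
    print(f"P(no failure in 5 minutes): {survival_probability(p, 5 * 60):.1e}")
    # Without resets, one failure persists for the rest of the session;
    # resetting every ~2 s caps the damage at one window.
    print(f"tracked fraction, no reset over 5 min: {expected_tracked_fraction(p, 5 * 60 * FPS):.3f}")
    print(f"tracked fraction, reset every 2 s: {expected_tracked_fraction(p, 2 * FPS):.3f}")
```

Under these toy numbers, 99% per-frame reliability gives an expected time to first failure of about 100 frames (a few seconds) and a vanishingly small chance of surviving five minutes. Periodic re-initialization from a single frame does not make failures rarer, but it bounds how long any one failure can last.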

Microsoft's HoloLens is another computer vision technology that you've had a lot to do with. So give us the Andrew Fitzgibbon take on HoloLens and its journey from birth until now, with the release of HoloLens 2. What have you discovered about its capabilities over the years? And what do you think HoloLens has contributed to the computer vision research community?

HoloLens is an amazing device. It came out of the same team that we worked with on Kinect, Alex Kipman's team. And at one level, HoloLens is exactly what Kipman said it was when they first announced it, three, maybe four years ago. I think that sometime soon, when I refit my office, instead of setting up a bunch of LCD panels, I'll just put a large black curved piece of plywood in front of me and I'll wear a HoloLens, and all my documents will appear in the real world in front of me. I absolutely believe that the screen in the pocket or the screen attached to the desk is going to be as weird as the phone attached to the building. So that's the far future. That's when HoloLens has the form factor of a small set of glasses. But on the way to that future, HoloLens today is amazingly valuable for real people doing real work. Because HoloLens lives in a 3D world, it has lots of 3D vision in it. One of the pieces I'm incredibly impressed by in HoloLens, because I didn't work on it, is the part that figures out where your head is in the 3D world. This is related to work I did, you know, on Boujou many years ago. But on HoloLens, it does it all the time, in real time, on an incredibly low-powered device. So that's a beautiful piece of technology that, again, I think very few people other than Microsoft could have put together.

My work on HoloLens stemmed from a piece of very blue-skies research we did almost 10 years ago now. We, that's me and a friend called Tom Cashman, who arrived as an intern, decided we wanted to learn about the 3D structure of stuff that moves. What's stuff that moves? Well, the human body is something that moves, and we wanted to know a bit about something that's kind of bendy and movie, but that is somehow not too bendy and not too movie. So we realised that dolphins were the ideal thing to work on. So we decided to write a paper called What Shape Are Dolphins? Now, we didn't really care about dolphins, but we cared about bendy, movie 3D stuff. And we worked on that paper in order to build mathematical models of 3D. Why did we want to know about 3D? Well, when you're interacting with the virtual world or the mixed reality world in the HoloLens, one of the important 3D objects is the human hand. If the system can look at your hands and fully accurately determine the position of every bone and knuckle in the hand, then you can use your hands in the virtual world to pick up virtual objects, and the virtual objects behave exactly as you would expect from real-world physics. So the research on dolphins became research on the slightly more bendy object that is the human hand, and then we knew that this technology was useful for something. HoloLens was being developed. So we thought, let's see if we can deliver this dream of real-world physics to the HoloLens. And we were extremely happy to see HoloLens 2, with fully articulated hand tracking, announced recently at Mobile World Congress in Barcelona.
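Determining "every bone and knuckle" is commonly framed as fitting a parametric model to observations. The toy below is a hedged sketch of that model-fitting idea only, not the HoloLens hand tracker or the method from the dolphin paper: it assumes a hypothetical planar finger with two rigid segments of made-up lengths, observes only the fingertip position, and recovers the two joint angles by damped Gauss-Newton on the forward-kinematics residual. A real hand model has many more joints, a deformable surface, and image-based observations.

```python
import math
from typing import Tuple

# Hypothetical segment lengths (cm) for a two-segment planar "finger".
SEG1, SEG2 = 4.0, 3.0


def fingertip(theta1: float, theta2: float) -> Tuple[float, float]:
    """Forward kinematics: knuckle at the origin; theta2 is measured
    relative to the first segment."""
    x = SEG1 * math.cos(theta1) + SEG2 * math.cos(theta1 + theta2)
    y = SEG1 * math.sin(theta1) + SEG2 * math.sin(theta1 + theta2)
    return x, y


def fit_angles(target, init=(0.3, 0.3), iters=100, damping=1e-9, max_step=0.5):
    """Damped Gauss-Newton fit of joint angles so the model's fingertip
    matches an observed 2D fingertip position."""
    t1, t2 = init
    tx, ty = target
    for _ in range(iters):
        x, y = fingertip(t1, t2)
        rx, ry = x - tx, y - ty  # residual r = model - observation
        # Jacobian J = d(x, y) / d(t1, t2)
        s1, c1 = math.sin(t1), math.cos(t1)
        s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
        j11, j12 = -SEG1 * s1 - SEG2 * s12, -SEG2 * s12
        j21, j22 = SEG1 * c1 + SEG2 * c12, SEG2 * c12
        # Normal equations (J^T J + damping * I) delta = -J^T r, solved in closed form (2x2).
        a = j11 * j11 + j21 * j21 + damping
        b = j11 * j12 + j21 * j22
        d = j12 * j12 + j22 * j22 + damping
        g1 = j11 * rx + j21 * ry
        g2 = j12 * rx + j22 * ry
        det = a * d - b * b
        delta1 = -(d * g1 - b * g2) / det
        delta2 = -(a * g2 - b * g1) / det
        # Clamp large steps so the iteration stays stable far from the solution.
        scale = min(1.0, max_step / max(abs(delta1), abs(delta2), 1e-12))
        t1 += scale * delta1
        t2 += scale * delta2
        if abs(delta1) + abs(delta2) < 1e-12:
            break
    return t1, t2
```

To try it, synthesize an observation from known angles, for example `observed = fingertip(0.6, 0.9)`, then check that `fingertip(*fit_angles(observed))` reproduces it. The fit can land on the mirror-image ("bent the other way") solution, since both reproduce the same fingertip; that ambiguity is one reason practical trackers add joint limits and temporal priors.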

Let's shift over and talk about ADA a little bit more deeply, and talk about specific problems you want to tackle across disciplines and what technical ground you hope to break in the projects you want to take on. What's on your roadmap, Andrew?

So ADA, All Data AI, is about permeating AI into the parts of the world where we almost don't yet know we need it. Today, we still can't ask questions of the internet like, find me all the ski chalets within 100 meters of the slopes, right? That's a hard question. Why is it a hard question? Because all the ski chalets have their own little website. It's not necessarily aggregated. And the way to answer that question is very easy: you should just read and understand all the web pages in the world and be able to answer any questions about them. So a computer that can really read and understand all the web pages in the world is clearly, in some sense, you know, sort of infinitely far away. So the idea of AI across disciplines is that throughout computer science, there are areas we can permeate, using AI and machine learning to deliver systems that work better. So, for example, on the HoloLens, we had to take our hand tracking code, which we wrote as standard, you know, computer vision researcher code, and make it maybe 500 times more efficient in order to run on that low-power device. Now, if we could achieve that for general code, then maybe we could make the code we run in data centers around the world, you know, 500 times more efficient. Maybe we could achieve 500 times as much, or maybe we could save energy. If we had capabilities in user interfaces that allowed us to adapt to what the user is doing, but not be annoying, and this is a crucial combination, then of course we would have much better capabilities. Of course we would have happier users, happier humans. If we think about the challenge of maybe adapting user interfaces so that my user interface works better for me and yours works better for you, one of the things that we want to do is learn from a very small number of signals. So maybe every time you switch on your computer, you know, you arrange the windows in a certain order. We already have systems that will try to learn your preferences and try to do that for you. But what we know today is that a lot of the time the systems that do that are a bit annoying. Why are they a bit annoying? Because they don't have the human-level understanding. Oh, this person's in a hurry. This person's using a different machine. They're in a different context. They're at home, not work. How could a system really understand from small amounts of training data what it should do and when it should do nothing? And these are areas which I think we're sort of touching on today. There are whole areas of AI research that our team looks at, and they're different questions. In all of them, again, there's this trade-off between a fundamental research angle and how we are actually going to demonstrate our AI to the world.

So, Andrew, I've heard you use the phrase change the world, or a better world. And these are the sort of end goals. So there are a lot of things that could go right if you're successful. But we have to talk at least a little bit about what could possibly go wrong.

So with all of the possibilities that you're describing, it seems we'll have to develop new kinds of social protocols and adapt to things like advanced 3D computer vision and head-mounted computers and cameras galore. Is there anything that keeps you up at night?

There are things where I think I have some idea what the answer is going to be. And then there are things where I hope I have some idea what the answer is going to be. Let me do one of the first ones. When I describe a world where everybody's glasses have the possibility of projecting 3D content into the world in front of them, you can immediately think, but this is terrible. People are going to just, you know, spend their entire time looking at the content and not interacting with the other person in front of them. But I think we already have social protocols that solve that today. So if you and I are talking and there's a TV over to the right, maybe I'll turn to look at the TV. It's rude, but maybe you understand something interesting must have happened on the TV, and we both turn, and maybe it's something we'll discuss, and we'll get back to chatting.

We also have other protocols. It's rude for me to take out my phone and look at it when I'm talking with you. But you understand if I tell you, well, I'm just going to look that up, or I'm just going to figure out what's happening in that meeting. So with something like HoloLens, I'm not worried that we won't easily develop these protocols. We might need some technical solutions: when there's something in my 3D world, say my email client sitting on the desk in front of me, you can see that it's my email client, but of course you can't read the content within it. But these are technical problems we can solve. The social protocols, I'm confident we will develop, and I'm not concerned about that aspect. But you might well say, okay, but you should be concerned about a world where the cameras on my device are revealing information that I wouldn't like revealed to other people in the world. And I'm happy that we, the AI community, essentially, are talking about this and thinking about ways in which we can ensure the security of our data, so that any person can have a concrete understanding of what information is leaving the device, how well it is encrypted, and how it is used, and so that we can all have protocols for how we dispose of information when we don't need it. Another exciting thing about the HoloLens is that, for example, in hand tracking, none of the images that the cameras take leave the device. And today, we are looking at a number of ways in which we can securely send your information to the cloud, securely do machine learning on it, send the answer back, then delete the information, and guarantee that to the customers. So that's one aspect where we, as Microsoft, have just devoted an awful lot of effort to thinking about security and privacy of that information.

Another side of security is trust. We earn that trust, obviously, by having behaved well in the past, by having strong statements about what we're going to keep and what we're not going to keep, and also by the strength of our research in aspects like secure computation and secure machine learning.

It's story time, and you've got a good one. Tell us what got a young Andrew Fitzgibbon interested in computer science, and what was your path to Microsoft Research?

Gosh, it's a long one. I grew up in the 80s. I liked mathematics. I liked messing around with electronics, because we didn't really have computers at home then. We had computers at school.

One of my summer jobs was as a water taxi driver. And being a water taxi driver means you're incredibly busy from about 9am to 11am, when all the boats go out to sea. And then you're incredibly busy from about 3pm to about 6pm, or maybe 7pm, when everyone comes back. So you've got a huge amount of time sitting out in the sun, or maybe the rain, during the day, where you're just kind of killing time. And one of the things you can do is read. And of course, as many of your listeners who love mathematics will know, you can play around with mathematics. But what was also great back then was that, because you didn't have a computer beside you, I would write programs that I would then try and type in the following day when I got to school. And that was just a great way to kind of, in a very casual way, learn about computer programming. So I would scratch out a program on paper, line 10, do this, line 20, do that. Or later I would write in assembly language the little bits that needed doing. And it's kind of therapeutic. It's like solving mathematical puzzles, right? Mathematical puzzles are great, but they're hard to invent for yourself. So you might buy a book of mathematical puzzles, and then you solve them all after a few weeks. Whereas with a computer program, you're of course continually inventing the puzzles for yourself.

Right. So from that job as a water taxi driver, then what? Then where?

So I did mathematics and computer science at university. I wanted to do physics, but the physics department told me it would be much too hard to do physics with computer science. And someone told me later that they also weren't that keen on people with blue hair.

Starting point is 00:33:11 So I ended up doing mathematics and computer science. The mathematics department were much more enlightened. And I was sitting in a topology lecture one day and thinking, what would this stuff be useful for? And I thought, oh, maybe it'll be useful for recognising the shapes of letters. And basically, I went and found myself a master's course that had something to do with computer vision. And I went to Heriot-Watt University in Edinburgh, which was running a fantastic course on sort of, they called it knowledge-based systems.

Starting point is 00:33:38 Nowadays, you would think of it as an introductory AI course. So I finished my master's and then I got a job as a programmer in Edinburgh University. I got that job in 1989, but then somehow by 1997, somewhat accidentally, I ended up with a PhD. I remember now why that happened. I was sitting in some meeting as the programmer and they were talking about their research. And I said something like, well, has anybody tried just like rendering 500 views of the thing and then using that? And someone said, oh, you should write a paper on that. And I thought it was a joke.

Starting point is 00:34:13 And of course, it turned out that that was my first paper. It was one of these things where let's use brute force instead of mathematics and see what happens. And from then on, you know, I hope I've managed to do elegant mathematics as well as brute force, but I think the practical mindset was there even from then. Well, sometimes you need tweezers and sometimes you need a hammer. Right, exactly, exactly. All right. So, was that your accidental PhD and then what?

Starting point is 00:34:38 So, the PhD was part-time with a research assistant job at Edinburgh. And that job was coming to an end, or at least I thought it was coming to an end because the contract was due to end at a certain date. And of course, like a very clever person, I forgot to ask anybody whether that was true. So I started looking for another job. I ended up in Oxford working with the great computer vision researcher of the UK, Andrew Zisserman, and a bunch of other amazing people. And there again, I learned the interaction of mathematics and code. I worked there, and that led to boujou, which we mentioned earlier. And then in the mid 2000s, I moved to Microsoft Research.

Starting point is 00:35:18 Right. You did a bit of a drive-by on a reference to blue hair. It was the 80s, but did you indeed have blue hair? Oh, yes, I indeed had blue hair for much of my undergraduate career. It was varying degrees of blue, and sometimes it would turn into green when the bleach wore out. Was it ever Smurf blue? No, it was kind of a dark electric blue. At least that was the aim. Well, at the end of every podcast, I give my guests the chance to say anything they want to our listeners. And sometimes it's general advice or wisdom or inspiration. Other times it's specific challenges or open problems in the field.

Starting point is 00:35:59 So what would you like to say to emerging researchers across disciplines? People sometimes ask, what problem should I research? And, you know, there's a sort of a simple general answer, solve important problems that will change the world. But sometimes that's too big and you don't really have an idea what the big important problems are. A signal that I found valuable is when you're listening to a talk or reading a paper, find something that annoys you. Find something where you think, really, that can't be the right way to do this.

Starting point is 00:36:35 Then, and this is crucial, ask yourself, really, why is this annoying me? And then find a real-world example where the thing that's annoying you is going to go wrong. A real-world example just needs to be something that you know we should be able to do. None of the existing technologies can do it and you've got an idea. And I think that has been a very valuable source of inspiration for the times when you don't have a big idea. Sometimes you just have a big idea and that's great. Go for it. And don't forget to focus on everything. Focus on everything. That's the best idea. Andrew Fitzgibbon, thank you for joining us from Cambridge today.

Starting point is 00:37:16 Thank you. It's an honor to have been here. To learn more about Dr. Andrew Fitzgibbon and the latest in 3D computer vision and all data AI, visit Microsoft.com slash research.
