Microsoft Research Podcast - 080 - All Data AI with Dr. Andrew Fitzgibbon

Starting point is 00:00:00 I do believe that there will be a future where we find it weird that we used to carry a flat screen in our pocket and pull out this flat screen to look at it in order to do our digital work. I think that sometime soon when I refit my office, instead of setting up a bunch of LCD panels, I'll just put a large black curved piece of plywood in front of me, and I'll wear a HoloLens and all my documents will appear in the real world in front of me. I absolutely believe that the screen in the pocket or the screen attached to the desk is going to be as weird as the phone attached to the building.

Starting point is 00:00:41 You're listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga. You may not know who Dr. Andrew Fitzgibbon is, but if you've watched a TV show or movie in the last two decades, you've probably seen some of his work. An expert in 3D computer vision and graphics, and head of the new All Data AI group at Microsoft Research Cambridge, Dr. Fitzgibbon was instrumental in the development of Boujou, an Emmy Award-winning 3D camera tracker that lets filmmakers place virtual props, like the floating candles in Hogwarts School of Witchcraft and Wizardry, into live-action

Starting point is 00:01:25 footage. But that was just his warm-up act. On today's podcast, Dr. Fitzgibbon tells us what he's been working on since the Emmys in 2002, including body and hand tracking for powerhouse Microsoft technologies like Kinect for Xbox 360 and HoloLens, explains how research on dolphins helped build mathematical models for the human hand, and reminds us once again that the secret sauce to most innovation is often just good old-fashioned hard work. That and much more on this episode of the Microsoft Research Podcast. Andrew Fitzgibbon, welcome to the podcast.

Starting point is 00:02:11 Hi, Gretchen. Great to be here. So I usually start my podcasts by way of introduction, but I want to go a little off script with you because you're funny. You've said that you situate your research at the intersection of computer vision and computer graphics with excursions into neuroscience. And at a more subatomic level, you characterize your work as, at its core, extracting information about the world from photons. So give us a little more context about these excursions and extractions. How in general would you describe what you do, Andrew? What gets you up in the morning? So what I want to do is make computers help us change the world. And we can change the world in a bunch of ways that are useful to humans. I like to think that I work on the technologies that underpin the ways we can make computers make the world better for us. A lot of what we do in what we now call AI, the combination of machine learning, computer vision, natural language processing, it's all about taking information from the world,

Starting point is 00:03:16 and that information comes through sensors somehow, whether the sensors are the pixels in a camera, or the sensors in a microphone, or even the sensors in your keyboard when you type a tweet. These are all sources of information from the real world that we would like to do something good with. And for me, something good might be improving the computer graphics in Harry Potter. And I consider that good because it makes lots of humans happy. And in some sense, everything we do in life is about making humans happy.

Starting point is 00:03:48 Well, tell me a little more about the excursions into neuroscience when you're talking about computer vision and computer graphics. It might not seem a natural leap, but I think it kind of is. Can you unpack that a little? Yes, I started collaborating when I was in Oxford with some of the neuroscientists who worked there. And we were interested in a fairly simple question, which is, roughly, can humans point at things? Now, you think, obviously, humans can point at things. But we don't have a good theory for how the brain integrates 3D information. So I was interested in purely providing the neuroscientists with a

Starting point is 00:04:26 sort of a mathematical backup. I was saying, no one thinks the brain does it by multiplying matrices and vectors. But if it did, you would see these sort of error patterns. And then we went to the real world and looked at people wearing virtual reality headsets and the kind of pointing mistakes they make. And those are different from the patterns that a computer would make using today's technology. So we still don't know what the brain does, but we have more evidence that it's not the same as what the computer does. All right, so let's talk a little more about extracting information about the world from photons. That's a very granular way of describing the computer vision work that you do. Could you explain that technically a little better? So, regular listeners to your podcast will know that

Starting point is 00:05:15 computer vision is one of those problems that's sort of easy to state, you know, does this picture contain a cat or not, but has turned out to be incredibly hard. And even in this age of deep learning successes with computer vision, we still know that it's incredibly hard. So my abstraction of gathering information from photons is just to kind of stand back and say, why are we so excited about computer vision? Why are we so excited about these capabilities? And one of the things that computer vision does is it allows us to acquire information from far away. And of course, one can hear information from far away, but it's a capability that allows us to do things like recognize people far away or to drive a car. And I want to think about it at that level of abstraction because I want to always understand that my end goal is to do something real. If we think about deep neural networks, I absolutely love trying to

Starting point is 00:06:28 understand deep neural networks, and I do so at a theoretical level, but also I want to know what the practical consequence of that understanding will be. Well, now that we've situated your research, let's situate you. Until this week, I would have introduced you as a partner scientist in HoloLens, but you've recently been tapped to lead a group at MSR Cambridge called All Data AI, or ADA, which turns out to be an acronym that references Ada Lovelace. Was that intentional at all, and why, if it was? So, yes, it was intentional in the sense that Ada Lovelace is sometimes considered the world's first computer programmer. Whether or not she programmed computers, she certainly was the first person to observe that the computational powers of the analytical engine or the difference engine could be applied to quantities that were not numbers. Now, of course, since the dawn of computers, we've represented quantities in the computer, like strings of characters, like words. They're all numbers. So what's new

Starting point is 00:07:31 about saying that it operates on something beyond numbers? Well, the thing that's new is not so much that the computer understands the sequence of numbers, but that it can understand the interconnections between them. And that is represented fundamentally by a computer science concept called a graph. And a graph is something that we can use to represent a wide variety of computer science concepts. Okay. So why now with this group at MSR Cambridge? Has there been a confluence of things that exist now that didn't before? I think, of course, we have all observed the advances in AI due to the advances in machine learning, due to the advances in deep learning. And I think now there's a golden opportunity to

Starting point is 00:08:22 apply that to a broader range of areas. It's also a fantastic opportunity here and now for us to think about a new generation of AI programming. Until today, AI programming has been the domain of high priests and priestesses who have PhDs in machine learning and who understand linear algebra. And yet, we believe that actually a lot of AI programming could be a lot simpler. So we're looking at how we might think about a third generation of AI programming. The first generation would have been the raw work of Hinton and LeCun, who just wrote the code. You know, it was difficult code to write.

Starting point is 00:09:08 The second generation is this set of tools which go by the name of TensorFlow and PyTorch, again, which lots of listeners will have used or heard of. And these tools have been, again, fantastic for democratizing AI, but they make it hard for somebody who's just a great computer programmer to understand. In my view, they somewhat hide the beauty of the AI models that are underneath. Neural network models are actually relatively simple. They're simple models with complex consequences. And the research world is increasing its understanding of these complex consequences. But it's also nice to just see the code unadorned and see how simple these models are. Well, let's go back a ways before we talk about more current research, because you have some what I would call greatest hits in your earlier work.

Starting point is 00:10:09 One is called Boujou, a camera tracker that actually won an Emmy, and it's been used in computer graphics and live action footage in, from what I've heard, pretty much every movie made since it was released in 2002. Tell us more about Boujou. How and where did it come about? What does it do and how does it work technically? Boujou was a great fun project. I moved from the University of Edinburgh, where I did my PhD, to Oxford University in the mid-90s.

Starting point is 00:10:40 And I was working there with some amazing people who were interested in the question of how a robot might navigate its way around some building or around some environment. So we worked hard on the how does a robot navigate problem, and we discovered that one of the things the robot has to do in order to know where it's been in three dimensions is to build a three-dimensional model of the world. And we worked hard on making a beautiful 3D model because we figured this would be useful maybe for those industries where even today it's common, for example, to sculpt a car out of clay before

Starting point is 00:11:17 building the computer model. Certainly in those days, if you were going to have an alien in the movie, I think Pitch Black was one of the first aliens we looked at, they would make the alien out of clay and then try to scan it into the computer. So we thought we had a great product. We were going to use our robot navigator, spin the camera around the alien model, pull that into the computer, and then give people a computer model of the 3D object. It turned out nobody in the movie industry was interested. Dang it. You had to, yeah, yeah. You had to spray paint the model with a toothbrush to get texture on it.

Starting point is 00:11:53 It didn't work in little corners. It was terrible. And they could just do better by using, you know, existing artists to create the models. But somebody at one of the effects companies kind of thought through how we must be getting this model and said, how do you know where the cameras are? We said, you know, well, it works it out. That's part of how we get the 3D model. We work out how the cameras are.

Starting point is 00:12:14 So it turns out something you really need in movies that's really hard to do is to figure out where the cameras are. Now, what does that mean? That means maybe I have a camera mounted in a boat, and the boat contains some hobbits and is sailing down a river towards, you know, some impressive mountain. It would be just great if the impressive mountain had two huge statues pointing at the hobbits. But it turns out that nobody built those 400-meter statues. So what we have to do is build some statues back in the studio, and we built some 40-centimeter statues back in the studio. And if only we knew what motion the camera had undergone while it rocked in that boat moving down the river,

Starting point is 00:12:56 we'd be able to make a robot do the same motion, and then we could superimpose the images. So in those days, the only way to find out where the camera was, was you could imagine trying to use markers or trying to use GPS. None of these things work. So you would have to just sort of manually position the camera for every single shot. It was incredibly expensive. So somebody who had worked on that kind of footage realised that our algorithms were producing this as a kind of unwanted side effect. We flipped, I guess nowadays we would say we pivoted.

Starting point is 00:13:26 There was a startup and Boujou was launched. And I'm super proud of it because it was one of the first products which did kind of 3D computer vision. There were products available then that would do character recognition, that would do number plate recognition, but this really did 3D vision. And we had to go beyond the academic state of the art to deliver it. And we learned about delivering computer vision to the real world. Some of what we learned was you just have to type in lots of code. People would ask me, you know, what's the secret sauce in Boujou? And I'd

Starting point is 00:14:00 say, well, you know, have you read all the papers on 3D structure from motion? They would say, yeah, pretty much all of them. I would say, yeah, that's the secret sauce. We implemented all of them. And, you know, it's an attitude that helps us today. Sometimes in research, you're assessed by how beautiful and clean your algorithm is. Sometimes in the real world, you have to implement all the dirty algorithms until you find the beautiful one. But with Boujou, it just worked and we had something that was actually useful. You know, on that note, I was going to ask you a question further down in the podcast,

Starting point is 00:14:33 but I'm going to bump it up because it's sort of tied in. I imagine you said it about this. You once said, if I had to nominate one key to success, it's a focus on everything. Is that what you're talking about? That's exactly what I'm talking about. Yeah, exactly. There is no silver bullet. You know, you really have to focus on building a real thing. And that just means something that other humans would be happy to use. So remember, in Boujou, we didn't focus on the right thing, but the real thing we wanted to build was this 3D modeler. And we knew we would make a sort of a, what's called a Wizard of Oz demo. We'd say, if this thing worked, it would

Starting point is 00:15:10 make you a model that looks like this. Would you like it? And then when the humans agree that that's what they would like, then you have a target. And then of course you may have to pivot or you may deliver something that's only half as good and discover that, hey, that's still useful. Or you may achieve everything you thought you needed to achieve and discover that actually it needs to be twice as good. But I really like having a concrete goal to aim for. Well, that's a beautiful segue into the next topic I want to talk about because Microsoft's Kinect technology has been dubbed a failure that became a great success. And the science behind it has a fascinating history. You were there early on, and I'd love you to share some stories about Kinect with us.

Starting point is 00:15:55 What's your particular perspective on this technology? How did it come about? How has it evolved? And how has it impacted other areas of research you're involved in? I love the Kinect story because in some sense, it's a classic example of when academic-style researchers meet engineers who really want to change the world. So at Microsoft Research, we were looking at whether we could make computer vision algorithms that would be able to follow the movements of the human body. And we were working with the academic research field and we were doing pretty well and we had good results.

Starting point is 00:16:30 So one day, the Xbox people, Alex Kipman, who was working in Xbox then, came to us and said, we've had this great idea for a video game where you're going to like recognize the motion of a human in a camera and then it's going to control the games, and it's going to be amazing. And we said, I'm glad you asked us that because we're actually the world experts on this, and I can tell you it's not going to work. So then they said, oh yeah, it's funny you say that because look at this program we wrote. And they showed us their version of it, and their version was better than anything in the academic literature. Oh my gosh. Some genius programmers had put

Starting point is 00:17:07 together an amazing demo of how it would work. What was amazing was it was using an idea from the academic literature, but they had engineered it so well and made it so effective and really worked hard on it. The reason that idea wasn't very popular with academics at the time was that it's pretty hard to take a single image from a camera, identify the human in the image, and then list off where the hands are, where the elbows are. But supposing you already had an image from, let's say, 30 milliseconds ago, because it's a video sequence, and you already knew in the image from 30 milliseconds ago where everything was, then you could just simply say to yourself, well, they can't have gone far, they were there 30 milliseconds ago, you know, and find where everything was. So we knew how to do

Starting point is 00:17:54 that in academia, but what no one had done is really, really worked hard on that, really kind of tried to moonshot that and make it really work. And our contribution was to observe that, okay, you've got a system that works 99% of the time, assuming it was right 30 milliseconds ago. There's a calculation you can do, which tells you that system will definitely fail after five minutes. And this is what they observed and they knew that that would happen. And you could design a game around that, that, you know, ran in sort of three-minute sections and then reset itself. But ideally you would have the system just not make mistakes like that. So our first contribution was to say, what we need to do here is just

Starting point is 00:18:30 basically have the system every couple of seconds kind of reset itself. And we were using machine learning. And this was kind of an early instance of machine learning really being applied to one of these hard computer vision problems. So we said to them, okay, we could maybe do it, maybe. But, you know, we would need real world examples of this thing running in 10 different living rooms across the planet in order to even know if we're doing well, not to mind train our machine learning algorithm. And then the horrifying moment two weeks later when we were on a call and they said, yep, we've got 10 people in living rooms across the planet. Our Japan people are finishing up there tomorrow and then they're moving over to China. So suddenly we realized, okay, these people are really serious. And then

Starting point is 00:19:13 when we needed to hire a Hollywood studio to generate training data, they hired a Hollywood studio to generate training data. So there was just a huge amount of vision there. We were saying this stuff because it seemed right, but really no one had done it to that level before. And the reason was they understood what the machine learning was doing. They figured it works from examples. If you don't have enough examples, no brainer, let's get the examples. Whereas academics would always be, I'm happy to spend a year building a better theory. Whereas with Kinect, the partnership with people who really wanted to get stuff done, maybe that's where I've inherited some of this, allowed us to make

Starting point is 00:19:48 really fantastic progress. Microsoft's HoloLens is another computer vision technology that you've had a lot to do with. So give us the Andrew Fitzgibbon take on HoloLens and its journey from birth until now with the release of HoloLens 2. What have you discovered about its capabilities over the years? And what do you think HoloLens has contributed to the computer vision research community? HoloLens is an amazing device. It came out of the same team that we worked with on Kinect, Alex Kipman's team.

Starting point is 00:20:31 And at one level, HoloLens is exactly what Kipman said it was when they announced it first, three, maybe four years ago. He said, this is the future of the PC. And in one sense, I do believe that there will be a future where we find it weird that we used to carry a flat screen in our pocket and pull out this flat screen to look at it in order to do our digital work. I think that sometime soon when I refit my office, instead of setting up a bunch of LCD panels, I'll just put

Starting point is 00:21:07 a large black curved piece of plywood in front of me and I'll wear a HoloLens and all my documents will appear in the real world in front of me. I absolutely believe that the screen in the pocket or the screen attached to the desk is going to be as weird as the phone attached to the building. So that's far future. That's when HoloLens has the form factor of a small set of glasses. But towards that future, HoloLens today is amazingly valuable for real people doing real work. Because HoloLens lives in a 3D world,

Starting point is 00:21:43 it has lots of 3D vision in it. One of the pieces that I'm incredibly impressed by in HoloLens, because I didn't work on it, does the job of figuring out where your head is in the 3D world. This is related to work I did, you know, on Boujou many years ago,

Starting point is 00:22:00 but on HoloLens it does it all the time in real time on an incredibly low-powered device. So that's a beautiful piece of technology that, again, I think very few people other than Microsoft could have put together. My work on HoloLens stemmed from a piece of very blue skies research we did almost 10 years ago now. We, that's me and a friend called Tom Cashman, who arrived as an intern,

Starting point is 00:22:26 we decided we wanted to learn about the 3D structure of stuff that moves. What's stuff that moves? Well, the human body is something that moves, and we knew a bit about the 3D structure of the human body from Kinect. But we wanted to learn about this 3D structure just from still images. So we had to think of something that's kind of bendy and movey, but that is somehow not too bendy and not too movey. So we realised that dolphins were the ideal thing to work on. So we decided to write a paper called What Shape Are Dolphins? Now, we didn't really care about dolphins,

Starting point is 00:22:59 but we cared about bendy, movey 3D stuff. And we worked on that paper in order to build mathematical models of 3D. We thought, well, why did we want to know about 3D? Well, when you're interacting with the virtual world or the mixed reality world in the HoloLens, one of the important 3D objects is the human hand. If the system can look at your hands and fully accurately determine the position of every bone and knuckle in the hand, then you can use your hands in the virtual world to pick up virtual objects. And the virtual objects behave exactly as you would expect from real world physics. So the research on dolphins became research on the slightly more bendy object that is the human hand. And then we knew that this technology was useful for something. HoloLens was being developed. So we thought, let's see if we can deliver this dream of real-world physics to the HoloLens. And

Starting point is 00:23:57 we were extremely happy to see announced recently at Mobile World Congress in Barcelona, HoloLens 2 with fully articulated hand tracking. Let's shift over and talk about Ada a little bit more deeply and talk about specific problems you want to tackle and what technical ground you hope to break in the projects you want to take on. What's on your roadmap, Andrew? So Ada is about permeating AI into the parts of the world where we almost don't yet know we need it. Today, we still can't ask questions of the internet like, find me all the ski chalets within 100 metres of the slopes, right? That's a hard question. Why is it a hard question? Because all the ski chalets have their own little website. It's not necessarily aggregated. And the way to answer that question is very easy. You should just read and understand all

Starting point is 00:24:54 the webpages in the world and be able to answer any questions about them. So a computer that can really read and understand all the webpages in the world is clearly, in some sense, you know, sort of infinitely far away. So the idea is that throughout computer science, there are areas where we can permeate using AI and machine learning to deliver systems that work better. So for example, on the HoloLens, we had to take our hand tracking code, which we wrote as standard, you know, computer vision researcher code, and make it maybe 500 times more efficient in order to run on that low power device. Now, if we could achieve that for general code, then maybe we could make, you know, 500 times more efficient the code we run in data centres around the world. Maybe we could achieve 500 times as much or maybe we could save energy. If we had capabilities in user interfaces that allowed us to adapt to what the user is doing but not be annoying,

Starting point is 00:26:02 and this is a crucial combination, then of course we would have much better capabilities. Of course we would have happier users, happier humans. If we think about the challenge of maybe adapting user interfaces so that my user interface works better for me and yours works better for you, one of the things that we want to do is learn from a very small number of signals. So maybe every time you switch on your computer, you know, you arrange the windows in a certain order. We already have systems that will try to learn your preferences and try to do that for you. But what we know today is that a lot of the times the systems that do that are a bit annoying. Why are they a bit annoying?

Starting point is 00:26:47 Because they don't have the human level understanding. Oh, this person's in a hurry. This person's using a different machine. They're in a different context. They're at home, not work. How could a system really understand from small amounts of training data what it should do and when it should do nothing? And these are areas which I think we're sort of touching on today. There are whole areas of AI research that our team looks at,

Starting point is 00:27:08 and they're different questions. In all of them, again, there's this trade-off between a fundamental research angle and how are we actually going to demonstrate our AI to the world. So, Andrew, I've heard you use the phrase, change the world or a better world, and these are the sort of end goals. So there are a lot of things that could go right if you're successful, but we have to talk at least a little bit about what could possibly go wrong. So with all of the possibilities that you're describing, they seem to pose new kinds of social protocols that we'll have to develop and adapt with things like advanced 3D computer vision

Starting point is 00:28:05 and head-mounted computers and cameras galore. Is there anything that keeps you up at night? There are things where I think I have some idea what the answer is going to be. And then there are things where I hope I have some idea what the answer is going to be. Let me do one of the first ones. When I describe a world where everybody's glasses

Starting point is 00:28:24 have the possibility of projecting 3D content into the world in front of them, you can immediately think, but this is terrible. People are going to just, you know, spend their entire time looking at the content and not interacting with the other person in front of them. But I think we already have social protocols that solve that today. So if you and I are talking and there's a TV over to the right, maybe I'll turn to look at the TV. It's rude, but maybe you understand something interesting must have happened on the TV and we both turn and maybe it's something we'll discuss and we'll get back to chatting. We also have sort of protocols. It's rude for me to take out my phone

Starting point is 00:29:00 and look at it when I'm talking with you, but you understand if I tell you, well, I'm just going to look that up or I'm just going to figure out what's happening in that meeting. So I'm not worried with something like HoloLens that we won't easily develop these protocols. We might need some technical solutions like when there's something in my 3D world, maybe it's, you know, my email client is maybe sitting on the desk in front of me. You can see that it's my email client, but of course you can't read the content within it. So, but these are technical problems we can solve. The social protocols, I'm confident there we will develop. And I'm not concerned about that aspect. But you might well say, okay, but you should be concerned about maybe a world where the cameras on my device are revealing information that I wouldn't like

Starting point is 00:29:45 revealed to other people in the world. And I'm happy that we, the AI community essentially, are talking about this and thinking about ways in which we can ensure the security of our data so that any person can have a concrete understanding about what information is leaving the device, how well encrypted information is used, and that we can all have protocols for how we dispose of information when we don't need it. Another exciting thing about the HoloLens is that, for example, in hand tracking, none of the images that the cameras take leave the device. And today we are looking at a number of ways in which I can securely send your information to the cloud, securely do machine learning on it and send the answer back, and then delete the information and guarantee that to the

Starting point is 00:30:28 customer. So that's one aspect where we as Microsoft have just devoted an awful lot of effort to thinking about security and privacy of that information. Another side of security is trust. We earn that trust obviously by having behaved well in the past, by having strong statements about what we're going to keep and what we're not going to keep, and also by the strength of our research in aspects like secure computation and secure machine learning. It's story time, and you've got a good one. Tell us what got a young Andrew Fitzgibbon interested in computer science and what was your path to Microsoft Research? Gosh, it's a long one. I grew up in the 80s. I liked mathematics. I liked messing around with

Starting point is 00:31:23 electronics because we didn't really have computers then. We had computers at school. One of my summer jobs was as a water taxi driver. And being a water taxi driver means you're incredibly busy from about 9am to 11am when all the boats go out to the sea. And then you're incredibly busy from about 3pm to about 6pm or maybe 7pm when everyone comes back. So you've got a huge amount of time sitting out in the sun or maybe the rain during the day where you're just kind of killing time. And one of the things you can do is read. And of course, like many of your listeners, if you love mathematics, you can play around with mathematics. But what was also great back then was because you didn't

Starting point is 00:31:59 have a computer beside you, I would write programs that I would then try and type in the following day when I got to school. And that was just a great way to kind of, in a very casual way, learn about computer programming. So I would scratch out a program on paper, line 10, do this, line 20, do that. Or later I would write in assembly language the little bits that needed doing. And it's kind of therapeutic. It's like solving mathematical puzzles, right? Mathematical puzzles are great, but they're hard to invent for yourself. So you might buy a book of mathematical puzzles and then you solve them all after a few weeks. Whereas with a computer program, you're of course continually inventing the puzzles for yourself. Right. So from that job as a water taxi driver. Then what? Then where?

Starting point is 00:32:51 So I did mathematics and computer science at university. I wanted to do physics, but the physics department told me it would be much too hard to do physics with computer science. And someone told me later that also they weren't that keen on people with blue hair. So I ended up doing mathematics and computer science. The mathematics department were much more enlightened. And I was sitting in a topology lecture one day and thinking, what would this stuff be useful for? And I thought, oh, maybe it'll be useful for recognising the shapes of letters. And basically, I went and found myself a master's course that had something to do with computer vision. And I went to Heriot-Watt University in Edinburgh, which was running a fantastic course on sort of, they called it knowledge-based systems. Nowadays, you would think of it as an introductory AI course. So I finished my master's, and then I got a job as a

Starting point is 00:33:36 programmer in Edinburgh University. I got that job in 1989. But then somehow by 1997, somewhat accidentally, I ended up with a PhD. I remember now how that happened. I was sitting in some meeting as the programmer and they were talking about their research. And I said something like, well, has anybody tried just like rendering 500 views of the thing and then using that? And someone said, oh, you should write a paper on that. And I thought it was a joke. And of course, it turned out that that was my first paper. It was one of these things where let's use brute force instead of mathematics and see what happens.

Starting point is 00:34:10 And from then on, you know, I hope I've managed to do elegant mathematics as well as brute force. But I think a practical mindset was there even from then. Well, sometimes you need tweezers and sometimes you need a hammer. Right. Exactly. Exactly. All right.

Starting point is 00:34:24 So is that your accidental PhD and then what? So the PhD was part-time with a research assistant job at Edinburgh. And that job was coming to an end, or at least I thought it was coming to an end because the contract was due to end at a certain date. And of course, like a very clever person, I forgot to ask anybody whether that was true. So I started looking for another job. I ended up in Oxford working with the great computer vision researcher of the UK, Andrew Zisserman, and a bunch of other amazing people. And there again, I learned the interaction of mathematics and code. I worked there, and that led to Boujou, which we mentioned earlier.

Starting point is 00:35:04 And then about mid 2000s, I moved to Microsoft Research. Right. You did a bit of a drive-by on reference to blue hair. It was the 80s, but did you indeed have blue hair? Oh, yes, I indeed had blue hair for much of my undergraduate career. It was varying degrees of blue and sometimes would turn into green when the bleach wore out. Was it ever smurf blue? No, it was kind of a dark electric blue. At least that was the aim.

Starting point is 00:35:36 Well, at the end of every podcast, I give my guests the chance to say anything they want to our listeners. And sometimes it's general advice or wisdom or inspiration. Other times it's specific challenges or open problems in the field. So what would you like to say to emerging researchers? People sometimes ask, what problem should I research? And, you know, there's a sort of a simple general answer, solve important problems that will change the world. But sometimes that's too big, and you don't really have an idea what the big important problems are. A signal that I found

Starting point is 00:36:09 valuable is when you're listening to a talk or reading a paper, find something that annoys you. Find something where you think, really, that can't be the right way to do this. Then, and this is crucial, ask yourself, really, why is this annoying me? And then find a real world example where the thing that's annoying you is going to go wrong. A real world example just needs to be something that you know we should be able to do. None of the existing technologies can do it, and you've got an idea. And I think that has been a very valuable source of inspiration for the times when you don't have a big idea. Sometimes you just have a big idea and that's great. Go for it.

Starting point is 00:36:52 And don't forget to focus on everything. Focus on everything. That's the best idea. Andrew Fitzgibbon, thank you for joining us from Cambridge today. Thank you. It's an honor to have been here. To learn more about Dr. Andrew Fitzgibbon and the latest in 3D computer vision and All Data AI, visit Microsoft.com slash research.
