Microsoft Research Podcast - 071 - Holograms, spatial anchors and the future of computer vision with Dr. Marc Pollefeys
Episode Date: April 10, 2019 Dr. Marc Pollefeys is a Professor of Computer Science at ETH Zurich, a Partner Director of Science for Microsoft, and the Director of a new Microsoft Mixed Reality and AI lab in Switzerland. He’s... a leader in the field of computer vision research, but it’s hard to pin down whether his work is really about the future of computer vision, or about a vision of future computers. Arguably, it’s both! On today’s podcast, Dr. Pollefeys brings us up to speed on the latest in computer vision research, including his innovative work with Azure Spatial Anchors, tells us how devices like Kinect and HoloLens may have cut their teeth in gaming, but turned out to be game changers for both research and industrial applications, and explains how, while it’s still early days now, in the future, you’re much more likely to put your computer on your head than on your desk or your lap.
Transcript
So instead of carrying a small device with you or having a computer screen in front of you,
the computer or the device will not anymore be a physical thing that you look at. It will be
something that can place information anywhere in the world. And so you can have screens that
move with you. You can choose how big or how many screens you want to place around you as you walk.
The difference with now is that you have to take out your phone, and if you want to see one of those holograms that we would share, you have to actively look for it.
If you have these glasses on, you will just be walking around and if there's something
relevant for you, it will just appear in front of your eyes.
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga.
Dr. Marc Pollefeys is a professor of computer science at ETH Zurich, a partner director of science for Microsoft, and the director of a new Microsoft Mixed Reality and AI lab in Switzerland.
He's a leader in the field of computer vision research,
but it's hard to pin down whether his work is really about the future of computer vision or about a vision of future computers.
Arguably, it's both.
On today's podcast, Dr. Pollefeys brings us up to speed on the latest in computer vision research,
including his innovative work
with Azure Spatial Anchors, tells us how devices like Kinect and HoloLens may have cut their teeth
in gaming, but turned out to be game changers for both research and industrial applications,
and explains how, while it's still early days now, in the future, you're much more likely to
put your computer on your head than on your desk or your lap. That and much more on this episode of the Microsoft Research Podcast.
Marc Pollefeys, welcome to the podcast.
Thank you.
So you wear a few hats.
You're a professor of computer science at ETH Zurich, a partner director of science for Microsoft, and now you're overseeing the creation of a new Microsoft Mixed Reality and AI lab in Switzerland.
So it's pretty obvious what gets you up in the morning.
Tell us a little bit about each of your roles and how you managed to switch hats and work everything into a day.
Sure. I've been a professor for quite a while here in Switzerland and before that in the US.
Then almost three years ago, I joined Microsoft to work with Alex Kipman on mixed reality.
I spent two years in Redmond working with a large team of scientists and engineers on moving computer
vision technology on HoloLens forward. In particular, we worked on HoloLens 2, which was
recently announced. I told Alex that I was going to do this for two years, and then I wanted to
come back to Zurich, back to being a professor at ETH. But after those two years, I realized that
it was in a sense very complementary. So on the one hand, I'm really excited about doing academic basic research,
but I was also always very interested
in doing more applied research.
And so I can partially do that at ETH,
but of course there's no place to have a bigger impact
with that applied research than Microsoft.
And so I realized that I wanted to continue doing that.
And so at that point, I discussed with Alex what we could do, what would make sense.
And I realized that there was something really interesting that could be done,
which was to set up a lab here in Zurich.
ETH being one of the top schools to recruit talent for the type of research
that we need to do for mixed reality.
At the same time, from the side of ETH, with my ETH hat on, it's a great opportunity to provide opportunities for students to work with Microsoft, to get access to devices, to resources that we would not necessarily have at ETH, and a lot of exciting projects to propose to the students. And so I really essentially saw that there was a real win-win between, you know, what ETH can offer and what Microsoft can offer. And so both for myself, but actually for everybody involved, being able to kind of find all those different elements and bring them together and have something really nice come out of that. So there are synergies. It is a lot of work, but there's also a lot of nice synergies.
I want you to talk a little bit about how collaboration happens, particularly with what you've got going on.
Microsoft researchers collaborate with ETH Zurich and EPFL researchers in the Swiss Joint Research Center, or the JRC.
Tell us more about the Swiss JRC and how it's helped you bridge the gap between your ETH Zurich role and your Microsoft role? Yes. So actually the JRC is a program that has been going on for about 10 years
to stimulate collaboration between Microsoft, ETH and EPFL.
And I was actually involved, I think in 2008 or so,
this was one of the first grants that I got from Microsoft then.
So I worked on the ETH side as a PI.
But then over the years, the scheme became much more collaborative
with PIs on both sides.
So in the second generation
of these collaborative projects,
I had a project with a colleague here,
Otmar Hilliges,
who was actually a former Microsoft researcher
and then collaborating with researchers
at Microsoft Research in Cambridge
and then in Redmond.
So that's more like the project in the past.
Now, currently, we have also great synergies with the JRC
because having that framework in place that facilitates collaborative projects
between the different schools and Microsoft,
it means that we can also, in the area of mixed reality,
where we want to foster more collaboration,
also in the context of the lab here,
this gives us a framework and a way to facilitate extra collaborations.
Your primary area of research is computer vision and you've specialized in 3D computer vision.
So give us an overview of your work in the field in general, just to set the stage for our conversation today.
In broad strokes, what are the big questions you're asking and the big problems you're trying to solve?
So essentially computer vision is about extracting information from image data,
from images or video. This can be information like recognizing things in the scene,
or it can also be geometric information. I'm much more focused on extracting geometric
information from images. As an example, this can be as you move with a camera through a scene,
being able to recover the three-dimensional surfaces in the scene, a three-dimensional reconstruction of the scene. This can also be recovering the motion of the camera through the scene, so in which way the camera is moving through that scene. It can also be finding back the location, it can be figuring out the calibration of the camera, it can be many different things related to this geometry. More and more in recent years, I've been also combining that with, at the same time, also extracting semantic information from the scene, and so having both geometry and semantic information, so that, for example, when we do reconstruction, we don't only reconstruct surfaces, but then we know that part of the surface is a table, another part of the surface is the floor, for example, and a third part of the surface could be the wall surfaces, and so on. And so by doing both at the same time,
we can achieve more and get better results for the different tasks at once.
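To make that concrete, here is a minimal Python sketch of what a joint geometric and semantic representation can look like: every reconstructed 3D point carries a class label such as floor, wall, or table. The class list and the probabilities below are invented for illustration; a real pipeline would obtain them from a trained segmentation network projected onto the reconstruction.

```python
import numpy as np

# Hypothetical class list; a real system would learn these with a segmentation network.
CLASSES = ["floor", "wall", "table", "other"]

def label_reconstruction(points, class_probs):
    """Attach a semantic label to each reconstructed 3D point.

    points      : (N, 3) array of reconstructed surface points.
    class_probs : (N, len(CLASSES)) per-point class probabilities.
    Returns a dict mapping class name -> the points assigned to that class.
    """
    labels = np.argmax(class_probs, axis=1)          # most likely class per point
    return {name: points[labels == i] for i, name in enumerate(CLASSES)}

# Toy usage: three reconstructed points with hand-made probabilities.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 2.5], [0.5, 0.9, 0.8]])
probs = np.array([[0.90, 0.05, 0.03, 0.02],   # mostly "floor"
                  [0.10, 0.80, 0.05, 0.05],   # mostly "wall"
                  [0.10, 0.10, 0.70, 0.10]])  # mostly "table"
semantic_map = label_reconstruction(pts, probs)
print({name: len(p) for name, p in semantic_map.items()})
```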
Well, let's talk a little bit more deeply about the different kinds of computer vision work going
on. Microsoft has a lot of research going on in computer vision and from a variety of angles.
And when you and I talked before,
you sort of differentiated what you're doing
with what some of the other research areas in Microsoft are doing.
So tell us how the work you're doing is different.
Yes, so there's a lot of research in computer vision that will look at an image and extract semantic information from that.
It will recognize those objects
or it will be able to label different parts of the scene.
This is an area that for a long time
was struggling to get really good results.
But now with the advent of deep learning,
there's been tremendous progress in that area.
When you look at what we are doing for mixed reality
and also what I'm doing here in my lab,
extracting this geometric information from images is not something that you can as simply tackle with these methods that are now
very popular in computer vision of using deep learning. A lot of the kind of geometric
computations are ill-suited to be, you know, just easily tackled with deep learning, convolutional
neural networks or that type of approach. The classical methods that leverage our strong understanding of geometry
are still very strongly needed to get good quality results.
So for example, for a device like HoloLens,
where it is critically important to be able to know
exactly how the device moves through the environment,
because it's actually what's needed to give the impression of, like, a hologram being static in the environment. So if I move my head, I need to somehow be able to fake the
impression that what I display on the screen is static in the world as opposed to being static on
the screen. To do that, I need to know very, very precisely how I'm moving my head through the
environment. We do that by combination of
inertial sensors together with the camera image data. And so we analyze the image data to compute
how the headset is moving through the world. That's why HoloLens has a whole set of cameras
that observe the world. It's really to be able to track its position all the time.
That's interesting because you take for granted that when I move my head, like if I'm talking to you and I move my head, you're not going to move with me.
That's right.
But with a device, it's going to be a little bit of a different experience unless you guys fix the technical aspects of it.
So how are you tackling that technically?
On HoloLens, you know, these are techniques that work very much like we do with our own eyes, where we combine our visual sensing together with our inner ear, which does inertial sensing.
Combining those two, we can get a very good impression of how we move through space.
So we're actually doing roughly the same for mixed reality.
It's actually also very similar to what is being used in robotics.
You can call it visual inertial odometry, which is determining your own motion from visual inertial data.
Or even if you go
beyond that, people often call it SLAM. This stands for simultaneous localization and mapping.
It means that while you are localizing yourself, your relative motion in the environment,
you're at the same time building up a map of the environment so that if you revisit it,
you can actually recognize and realize that you have already seen part of the scene and correct
your position or take that into account to kind of continue to estimate your position.
So this is used in robotics, in self-driving cars, and also very much in mixed reality and also for augmented reality on phones.
The same techniques are being used.
So this is a key element in HoloLens.
And so this device needs to be able to track itself through the environment.
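That predict-and-correct structure can be sketched in a few lines of Python: the inertial data gives a fast prediction of motion, and the camera-based estimate keeps the prediction from drifting. This is only an illustrative toy under simplifying assumptions (gravity-compensated accelerations, a fixed blending weight); a real tracker such as the one on HoloLens fuses the two with a proper filter or optimization over many visual features.

```python
import numpy as np

def integrate_imu(position, velocity, accel_world, dt):
    """Predict position and velocity from one acceleration sample (gravity already removed)."""
    velocity = velocity + accel_world * dt
    position = position + velocity * dt
    return position, velocity

def visual_correction(predicted_pos, visual_pos, blend=0.1):
    """Nudge the inertial prediction toward the camera-based estimate.
    A real system would use an extended Kalman filter or bundle adjustment instead."""
    return (1.0 - blend) * predicted_pos + blend * visual_pos

# Toy loop: the headset accelerates gently forward while vision keeps drift in check.
pos, vel = np.zeros(3), np.zeros(3)
for _ in range(100):
    accel = np.array([0.2, 0.0, 0.0])                  # m/s^2, from the IMU
    pos, vel = integrate_imu(pos, vel, accel, dt=0.01)
    visual_pos = pos + np.random.normal(0.0, 1e-3, 3)  # stand-in for a visual pose estimate
    pos = visual_correction(pos, visual_pos)
print(pos)
```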
So, Mark, I'll argue that you're doing the science behind what our future life with computers
will look like. And a lot of it has to do with how we experience reality. And you've alluded to that
just recently here. Currently, I'll say we have real life reality. I put that in air quotes because that in itself is, you know,
arguable and also what most people refer to as virtual reality. But we're increasingly headed
toward a world of augmented and mixed reality where the lines are a bit more blurred. So
tell us about your vision for this world. And I don't want you to be sensational about it. I mean, for real,
we're heading into a different kind of paradigm here. How does it work and why do we need it even?
Okay, so if you look at computers and mobile phones, for example, there's a lot of information
that relates to the real world that's available. The first generation of computing essentially was,
you know, it's a computer on your desk and you might use mapping or other tools to predetermine the route to a particular place of interest.
But then you still have to take that with you.
That's what changed completely once we went to mobile phones, which essentially is a mobile computer in your pocket.
So you always have it with you.
That computer actually knows about its approximate position in space.
And so a lot of things became possible, like for navigation, for example,
or also applications like Uber or other ride-sharing services and so on,
because you have now information that is spatially kind of with you in the real world.
The next generation going beyond that, and where we go with mixed reality,
is really about having information not only, you know, broadly contextually placed in space, but now actually going to very precise positioning in space. So meaning that, in general, you can expect not only to know this is roughly where you have to go, or look at your phone and get instructions, but you can imagine now really
to mix the digital information
that you currently see on the screen of your phone
and then the real-world information that you see in front of you,
you can expect those to essentially be merged,
to be mixed together in one reality,
that's the reality in front of you.
And so the information that you need to do the task,
which could be a navigation task,
or could be, if you're a technician,
a complicated task to repair a machine, needing to carefully know which part you have to take out first and which, you know, button you have to press, etc. All those things
can now be communicated to you. Or, you know, in the future, as we're going already now in a number
of contexts, with mixed reality can be communicated to you in a very intuitive way, just by overlaying
the instructions on top of the world visually.
So you can just see things in front of you: press this button, and it just shows you with an arrow, like, the button, or it shows you an example of exactly what to do. So it makes it a lot simpler
to process the information and be able to tackle more complicated tasks in that sense.
Let's talk about Microsoft's HoloLens.
Many people think of it as merely another device or maybe a pair of virtual reality glasses,
but it's actually, as you start to describe it, the world's first self-contained holographic computer.
Give us a brief history of HoloLens to clear up any misperceptions we might have,
and then tell us where and how HoloLens is primarily being used today,
and then we'll start talking about some of the specific work you're doing.
So, if you look at the first generation HoloLens,
it was developed as a development kit, essentially,
for Microsoft to learn about where this could be used.
The initial concept was already going all the way to the long-term vision.
And so you can see that on the first generation HoloLens,
it wasn't clear if this was going to be something for consumers or something for industry or where exactly it would be applied.
And it was put out there as, you know, in a sense, this magical device of the future to see where it could be useful to learn what type of tools and what type of solutions could be implemented on it.
So we learned a lot.
I joined, you know, between HoloLens 1 and HoloLens 2. I joined the
team and it became clear very quickly that we still have a long way to go in terms of form factor, in terms of a number of aspects, to get to something that makes sense to use in your daily
life all the time, you know, as you currently use your cell phone. The device is still too bulky.
It's too expensive for consumers, et cetera. So it's too early for that. However, it's not at all too early to use a device like HoloLens 1 or now
obviously HoloLens 2 in settings where you have task workers, people that have to do often
complicated tasks in the real world, repair machines, or it can also be a surgeon, for example,
who's also a first-line worker, a person that's out in the real world having to do a complicated
operation and essentially needs access to information in as seamless a way as possible
to help him do that task. That's where HoloLens turned out to be incredibly valuable because
it's a full computer that you wear on your head.
It allows you to place as much information
as you want around you.
You can, in very natural ways,
interface with it by speaking to it,
by doing gestures.
So you can interact with the device
without even having to touch anything.
You can use your hands to actually do the task
you're supposed to do in the real world
and still get access to all the information
you need to help you do that.
The magic of HoloLens is that you have this full computer, but you can still use your hands to do the task that you have to do. HoloLens has some really interesting
applications, but right now it's mainly an enterprise tool and it's particularly useful
for researchers. So tell us about this thing called research mode in HoloLens. What is it
and why is it such a powerful tool, especially for computer vision researchers?
So HoloLens has all of these sensors built in.
So you have this device on your head that has sensors that look at the world from the same viewpoint as you are looking at the world.
We have four cameras tracking the environment.
We have this depth camera, which can be used in two different modes, one for tracking the hands, and then a second mode, which is a smaller field of view,
but a more powerful signal out to sense further away, which we use for doing reconstruction of
the 3D environment. So essentially we have all these different imaging sensors moving with you
through the world. So this is perfect for all types of computer vision research,
potentially also for robotics research or other applications.
You now have this device that can collect data in real time.
You can either process it on device or over Wi-Fi,
send it to a more beefy computer outside
to do more expensive computations if you want to,
or you can just store the data to do experiments.
But you can collect this very rich data from all these different sensors to do all types of
computer vision experiments with. In particular, if you're thinking of doing things like trying to
understand what the user is doing from a first-person point of view, you can actually use
these different sensor streams to then develop computer vision algorithms to understand what a person is doing.
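As a rough picture of what experimenting with such recordings can look like, here is an illustrative Python sketch that walks through a saved depth stream offline and applies a simple first-person heuristic. The folder layout and the "depth_ahat" stream name are assumptions made up for this example; the actual research mode interface is exposed on the device itself rather than through files like these.

```python
from pathlib import Path
import numpy as np

# Hypothetical layout of an offline recording: one .npy file per frame, per sensor stream.
RECORDING = Path("hololens_recording")

def frames(sensor_name):
    """Yield (name, frame) pairs for one sensor stream, sorted by filename."""
    for f in sorted((RECORDING / sensor_name).glob("*.npy")):
        yield f.stem, np.load(f)

def hands_near_camera(depth_mm, threshold_mm=600):
    """Crude first-person cue: a noticeable patch of pixels closer than ~60 cm."""
    valid = depth_mm > 0
    near = np.logical_and(valid, depth_mm < threshold_mm)
    return near.mean() > 0.05

for stamp, depth in frames("depth_ahat"):   # "depth_ahat" is an assumed stream name
    if hands_near_camera(depth):
        print(f"frame {stamp}: the user's hands appear to be in front of the device")
```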
You just mentioned a more beefy computer, which raises the question, what kind of a computer or a processing unit, shall we say? I've heard you refer to it as an HPU or holographic processing
unit. How big is the processing unit? What are we talking about here? Well, it's a small coprocessor for
HoloLens, so you can very much compare it to a state-of-the-art cell phone in terms of the
general purpose computing. But then on top of that, because it needs to continuously run all
of these computer vision tasks to be able to track itself in the environment, to be able to track the
hands, to be able in HoloLens 2 to also track
your eyes, to know where you're looking, or to recognize you based on your iris.
So all of these different tasks, most of them need to run all the time.
Yeah.
This means that this runs for hours at a time.
If you look at a cell phone and you take your cell phone and you run one of those fancy
augmented reality apps, for example, you will notice that after a few minutes, your phone
is running extremely hot
because it's consuming a lot of power to do all those computations, and your battery is draining very quickly. We cannot afford that on HoloLens. So if you would just have this general purpose processor, you could run HoloLens for 10 minutes, and your battery would be empty.
And your head would be hot.
Exactly. So this is exactly why Microsoft had to develop its own ASIC,
so its own chip, which is the HPU,
which is a chip that's dedicated to do all of these computer vision tasks
and other signal processing tasks very efficiently at very low power
and can sustain that all along.
If you look at HoloLens, the whole system is below 10 watts of power consumption.
If you actually look carefully, the whole design of HoloLens is really done around being
able to consume as little power as possible and be able to stay passively cooled so that
it doesn't heat up your forehead and so on.
Okay, so how are you doing that?
Is it algorithms that help you out?
What is the technical approach to solving those problems?
Well, in every algorithm, in everything you do, you really have to be very careful and think, right from the beginning, about how much power it's going to consume. It's a lot of engineering effort to get to a system that consumes that little power and does that amount of computer vision processing all the time. It means that you put in some very efficient processing units that are well suited to do all these image processing operations. It means that some things need to happen to hide the latency in the rendering.
All of these tasks are hardware accelerated with some dedicated hardware in the HPU to
make them run very efficiently.
And it also means that you have to be smart how you use the algorithms
and code things very efficiently.
When you don't need all the sensors all the time,
you don't use them all the time.
It means that you really try to,
at every point, everywhere you can,
you try to save energy
and just do what you need to do, but not more.
Sounds like a teenager.
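That "only do what you need" principle can be illustrated with a toy duty-cycling model: a sensor is woken up only when a task actually needs a reading, and dropped back to a low-power state otherwise. The sensor name, wake-up cost, and power numbers below are invented for the example and do not reflect the actual HoloLens power management.

```python
class DutyCycledSensor:
    """Toy model of gating a sensor so it only draws power when needed."""

    def __init__(self, name, wake_cost_mj, active_power_mw):
        self.name = name
        self.wake_cost_mj = wake_cost_mj        # one-off cost of powering the sensor up
        self.active_power_mw = active_power_mw  # draw while capturing
        self.powered = False
        self.energy_mj = 0.0

    def read(self, duration_s):
        if not self.powered:                    # pay the wake-up cost only when needed
            self.energy_mj += self.wake_cost_mj
            self.powered = True
        self.energy_mj += self.active_power_mw * duration_s

    def sleep(self):
        self.powered = False                    # near-zero draw while idle

depth = DutyCycledSensor("depth", wake_cost_mj=0.5, active_power_mw=300)
for frame in range(100):
    if frame % 10 == 0:                         # the application only needs depth occasionally
        depth.read(duration_s=0.033)
    else:
        depth.sleep()
print(f"approximate energy spent: {depth.energy_mj:.1f} mJ")
```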
One of the coolest things you're working on is something you call spatial anchors.
Talk about the science behind spatial anchors. How do they work? What are they actually?
And what are some compelling virtual content in the real world use cases for spatial anchors?
So spatial anchors are a type of visual anchoring in the real world. So as you move your device to a particular location,
and this can be both a HoloLens or it can be a cell phone
that runs one of the ARKit or ARCore applications,
you are essentially always generating a little map of the environment.
And when you want to place information in the world,
you will attach it to this local little map of the environment.
Currently with HoloLens, you can place holograms in the world, and on HoloLens itself, it will
be attached to a little map so that HoloLens is continuously building a map.
And so you can place it there, then it knows in the map where that hologram is placed.
And then with HoloLens, you can see again that hologram when you walk by the same place.
Now, what Azure Spatial Anchors is doing is allowing you to extract a local little
map and share that with others in the cloud so that they can also access your hologram if you
want to share it with them. So that means that I can, for example, put on HoloLens and place
a hologram somewhere in the world. And then you could come by with your mobile phone and use ARKit
or ARCore to find back this hologram and see it at the same place
where I placed it in the world.
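The coordinate-frame math behind that sharing can be sketched without any SDK: the hologram's pose is stored relative to the anchor, the anchor (not raw imagery) is what gets shared, and a second device that relocalizes the same anchor can recover where to render the hologram in its own frame. The poses below are made up purely for illustration.

```python
import numpy as np

def make_pose(yaw_rad, translation):
    """4x4 rigid transform: a rotation about the vertical axis plus a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T[:3, 3] = translation
    return T

# Device A localizes an anchor and places a hologram; only the relative pose is shared.
worldA_T_anchor = make_pose(0.3, [2.0, 0.0, 1.0])
worldA_T_hologram = make_pose(0.3, [2.5, 1.2, 1.0])
anchor_T_hologram = np.linalg.inv(worldA_T_anchor) @ worldA_T_hologram   # goes to the cloud

# Device B relocalizes the same anchor in its own coordinate frame...
worldB_T_anchor = make_pose(-0.1, [0.4, 0.0, -0.7])
# ...and recovers where to render the hologram, consistent with where A placed it.
worldB_T_hologram = worldB_T_anchor @ anchor_T_hologram
print(worldB_T_hologram[:3, 3])
```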
That means that now you can start thinking of applications
where, for example, I can put virtual breadcrumbs in the world
and allow you to navigate.
So these are for more consumer-oriented applications.
But if you look at applications like indoor navigation
or if you think of applications in an enterprise
where there's all types of machinery,
there's all types of sensors,
there's things that we call digital twins.
This means that you have,
for a real machine in the real world,
you also have somewhere a digital representation of it
in your servers in the cloud.
That information that is available in the cloud, you would like to be able to align it also in the real world, so that if you walk around with your holographic computer, with your HoloLens on, you can actually see it on top of the real world, on top of the real machine; you can actually also access, in context, all of the information that relates to it. Now, to do that, you need to know where you are
and where it is in the world.
And so that's where technology like Azure Spatial Anchors
can essentially allow you to recover that.
Now, we are currently just at the beginning of this technology.
There's a lot of things we still have to work out,
but the basics are there.
And you can see online a lot of people are giving it a try
and are having fun with it.
Can you remove them?
Of course, you can remove them.
You can move them around.
And everybody can have their own view of the world,
meaning a service technician might want to see very different things
than, you know, like a random person walking through a mall.
This all becomes possible and different people would have,
you know, different filters in a sense on all this virtual information.
And you would have different levels of access and privacy, on all this virtual information. And you would have
different levels of access and privacy and all these different things would come into play to
let you see the things that are relevant to you. So Microsoft Kinect, let's talk about that for
a second, because it has an interesting past. Some have characterized it as a failure that
turned out to be a huge success. Again, I think a short history might be in order. And I'm not
sure that you're the historian of Kinect, but I know you've had a lot of connection with the people who are
connected to Kinect. Let's say that. Yes. Tell us how Kinect has impacted the work you're doing
today and how advances in cloud computing, cameras, microphone arrays, deep neural nets, etc.
have facilitated that. So, yes. So if you look back, Kinect was introduced as a device for gaming for the Xbox.
It had initially a great success there,
but it was something more for the casual gamer than for the hardcore gamer.
But at the same time, there also, a little bit like with research mode,
Kinect got opened up and people could access this 3D
sensing data that Kinect was producing. And this was something that created quite a revolution
in robotics and in computer vision, where people suddenly had access to a very cheap, standardized,
very powerful 3D camera. And so this stimulated all types of research, both at Microsoft Research
and everywhere in every vision and robotics lab around the world,
people had Kinect.
What's interesting is to see that then a lot of this research
that was developed on that, for example,
one of those is Kinect Fusion,
which consisted of taking single Kinect depth images.
But as you move through the environment,
having many of those images, you know, aligned to each other and creating a more global 3D reconstruction of the environment.
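A heavily simplified sketch of that idea, assuming known camera intrinsics and poses coming from a tracker, is to back-project each depth image into 3D and accumulate the points into a shared voxel grid. The real KinectFusion pipeline uses a signed-distance volume and estimates the camera poses itself, so this is only a toy illustration of the fusion step.

```python
import numpy as np

VOXEL_SIZE = 0.05        # 5 cm voxels
grid = {}                # sparse map: voxel index -> number of depth samples landing in it

def back_project(depth_m, fx, fy, cx, cy):
    """Turn a depth image (in meters) into 3D points in the camera frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.stack([x[valid], y[valid], depth_m[valid]], axis=1)

def fuse(depth_m, world_T_camera, fx, fy, cx, cy):
    """Move one frame's points into the world frame and accumulate them into the grid."""
    pts_cam = back_project(depth_m, fx, fy, cx, cy)
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    pts_world = (world_T_camera @ pts_h.T).T[:, :3]
    for idx in map(tuple, np.floor(pts_world / VOXEL_SIZE).astype(int)):
        grid[idx] = grid.get(idx, 0) + 1

# Toy usage: a flat surface one meter away, seen from the identity pose.
depth = np.full((48, 64), 1.0)
fuse(depth, np.eye(4), fx=60.0, fy=60.0, cx=32.0, cy=24.0)
print(f"{len(grid)} occupied voxels")
```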
So people developed all kinds
of very interesting techniques.
Many of those came back
and enabled Microsoft to develop
much more efficiently
and already have an idea
of what was possible
with the camera that was going
to be integrated in HoloLens.
What could all be put on it in terms of algorithms, what was possible, and so on,
because they could just see what happened with Kinect
in the research world.
This was actually one of the big reasons also there
why I pushed for having a research mode
made available on HoloLens,
because I think there also we can both provide a great tool
to the research community to work with
in terms of using HoloLens as a computer vision device,
but at the same time
also really leverage that and benefit from it by learning all the amazing things that are possible
that we might not think of ourselves. And this more or less coincided with the time where we
were already with our second generation sensor in HoloLens, which is an amazing sensor that was
built for HoloLens 2. And so, you know, putting things together,
it became clear that it made a lot of sense to reuse that sensor
and put it in a third-generation Kinect
that now is clearly not made for gaming,
but is really directly targeted at intelligent cloud type of scenarios
where you can have a device in the world sensing
with the best-in-class depth sensor. It can essentially do one megapixel, one million separate depth measurements, at 30 frames per second, and do all of that below a watt of power consumption. So, an amazing sensor, combined with a color camera and a state-of-the-art microphone array to bring back the Kinect sensor package,
but in a much, much higher quality setting now.
And that's what Azure Kinect is.
It seems like we're heading for a huge paradigm shift, moving from a computer screen world to a holographic world. I mean, maybe that's an oversimplification, but I'm going to go there.
And people have suggested that holograms have what it takes to be the next internet,
the next big technical paradigm shift going beyond individual devices
and manufacturers. So what does the world look like if you're wildly successful?
Well, essentially, it means that people will wear a device like HoloLens or actually think more of
a device like normal glasses, maybe a little bit beefy glasses, but much closer to today's glasses
than to today's HoloLens, that will enable them to see
information in a natural way on top of the world. So instead of carrying a small device with you or
having a computer screen in front of you, the computer or the device will not anymore be a
physical thing that you look at. It will be something that can place information anywhere
in the world. And so you can have screens that move with you. You can choose how big or how many screens you want to place around you as you walk. The difference with now
having to take out your phone and if you want to see one of those holograms that we would share,
you have to actively look for it. If you have these glasses on, you will just be walking around
and if there's something relevant for you, it will just appear in front of your eyes.
The information will just kind of always be there in context, just what you need, ideally.
So hopefully we can have some good AI help with that and moderate and just pick up the right things.
Right.
And show you what's helpful to you.
It will be very different from where we are now.
Well, that leads right into the time on the podcast where I ask the "what could possibly go wrong" question. I actually get kind of excited about what I could do with a holographic computer on my head. But to be honest, I'm not sure I'm as excited about what other people could do, and I'll call them bad actors or advertisers, given the future that you're literally inventing, or you're working on inventing. Does anything keep you up at night
about that? I certainly care a lot about the impact on privacy for this. So I think there's
challenges, but there's also opportunities. I think it will be very important as we will more
and more have sensors continuously move and sense the environment, be it on a HoloLens or on your
car, be it self-driving or just with driver assistance systems,
be it any other systems or robots that will roam around maybe your living room,
all of those will have a lot of sensors.
So even more than your cell phone in your pocket now,
you will have sensors including cameras and so on, all the time sensing.
And so it's really important to figure out how to build systems
that tackle the problems that we want to tackle, like being able to know where you are in the world so that you can see the holograms and the things that are relevant to you.
But at the same time, not expose that data, in a way that still allows you to do relocalization, to be able to retrieve your
holograms where they should be, but doesn't allow others to look at the inside of your house or
inside of spaces that they're not supposed to look into. So these are actually active research
topics that we're working on today, also at the same time as we're pushing the technology forward.
Right. Well, and two other things pop into my head, like advertisers and spatial anchor spam and transparency concerns.
Like if I'm wearing glasses that look pretty natural, how do I signal or how do we signal
to other people, hey, we have a lot of information about you that you might not be aware of?
I think you're exactly right. So those are all really important issues and it's really important to think about them.
As an example, the first generation HoloLens was designed in a way that, for all of those sensors that need to run continuously just to be able to operate HoloLens, all this data would not be accessible to applications, but just be accessible to the operating system and be isolated in the HPU, actually, and not be exposed on the general purpose processor
where the applications live as a way to ensure privacy
by designing the hardware there.
So it's clear that these type of things are very important.
It's often a trade-off, or at least let's say the easy solution
is a trade-off between privacy and functionality.
But I think that's where we have to be smart
and where we have to start already doing research
in that space to work towards a future
that we can live in.
Marc, talk a little bit about yourself,
your background and your journey.
What got you started in computer science?
How did you end up doing computer vision research?
And what was your path to Microsoft,
not to mention all your other destinations?
Well, you know, so I come from Belgium.
Long ago, when I was like 12 years old or so, I wanted to get a game computer.
And my father suggested it was maybe a better idea to get a computer on which I could program a bit.
That's just like a dad.
Yeah.
And so, you know, the result was that I kind of thought it was pretty cool to program and all.
And then I actually didn't study computer science eventually.
I thought I was going to study computer science, but I ended up studying electrical engineering, which is close enough.
And one of the last exams of my studies, I had a computer vision exam.
And the professor there asked if I was interested in maybe doing a PhD with him.
And so that's how I got started in computer vision. And in particular, I picked a topic that was really about 3D reconstruction from images and had a lot of geometry in there.
And from Belgium, from Leuven, University of Leuven, I then moved to the University of
North Carolina in Chapel Hill, where I had a lot of colleagues doing computer graphics. Computer
vision was very complementary to that. After a number of years, I got the opportunity to move
back to ETH Zurich, which is really one of the top schools worldwide.
So I decided to move to Switzerland.
And then in 2015, I guess, I got approached by Alex Kipman and the mixed reality team, the HoloLens team at Microsoft.
And I hesitated for a while, but then I realized that there was really an opportunity to have a big impact. You know,
at some point, even I had a conversation with Satya that kind of helped convince me to
come over and, you know, help realize the vision of mixed reality.
I hear he's very persuasive.
He is. And I was really impressed. We were very aligned on the vision and on where we
could go with this technology and what we could do. And so this was actually a very good conversation.
So you packed up and moved to Redmond.
That's right.
And now you're back in Zurich.
That's right.
Actually, before I decided to join, I told Alex,
I said, I'm going to come for two years
and you have to be okay with that.
Otherwise, I'm not coming at all.
He convinced me that he was okay with it.
And so, you know, the end result is now
that eventually I didn't really want to fully leave Microsoft.
I wanted actually to both continue having impact on Microsoft,
but also felt that I could really do something in between
that, you know, and have in some way the best of both worlds.
In a sense, two half jobs turn out to be more than one job.
Yeah.
But I'm really excited about the opportunity I got to do this.
Well, maybe we can have a Marc Pollefeys spatial anchor here in Redmond and work with you there.
As we close, what advice or wisdom or even direction would you give to any of our listeners who might be interested in the work you're doing?
Where would you advise people to dig in research-wise in this world of the holographic computer?
I think at this point, I really care about figuring out how to get all of this amazing sensing technology out in the world.
But at the same time, make sure that we have systems that preserve privacy.
Figuring out how to marry those things, I think, is really exciting.
And so that's one of the areas I'm really working on.
And I hope a lot of people are going to work on that.
Marc Pollefeys, great to have you with us from Zurich today.
Thanks for coming on the podcast.
Yeah, thank you for having me. It was great.
To learn more about Dr. Marc Pollefeys and Microsoft's vision for the future of computer vision, visit Microsoft.com slash research.