Microsoft Research Podcast - 071 - Holograms, spatial anchors and the future of computer vision with Dr. Marc Pollefeys
Episode Date: April 10, 2019 Dr. Marc Pollefeys is a Professor of Computer Science at ETH Zurich, a Partner Director of Science for Microsoft, and the Director of a new Microsoft Mixed Reality and AI lab in Switzerland. He’s... a leader in the field of computer vision research, but it’s hard to pin down whether his work is really about the future of computer vision, or about a vision of future computers. Arguably, it’s both! On today’s podcast, Dr. Pollefeys brings us up to speed on the latest in computer vision research, including his innovative work with Azure Spatial Anchors, tells us how devices like Kinect and HoloLens may have cut their teeth in gaming, but turned out to be game changers for both research and industrial applications, and explains how, while it’s still early days now, in the future, you’re much more likely to put your computer on your head than on your desk or your lap.
Transcript
So instead of carrying a small device with you or having a computer screen in front of you,
the computer or the device will not anymore be a physical thing that you look at. It will be
something that can place information anywhere in the world. And so you can have screens that
move with you. You can choose how big or how many screens you want to place around you as you walk.
The difference with now is that you have to take out your phone, and if you want to see one of those holograms that we would share, you have to actively look for it.
If you have these glasses on, you will just be walking around and if there's something
relevant for you, it will just appear in front of your eyes.
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen Huizinga.
Dr. Marc Pollefeys is a professor of computer science at ETH Zurich, a partner director of science for Microsoft, and the director of a new Microsoft Mixed Reality and AI lab in Switzerland.
He's a leader in the field of computer vision research,
but it's hard to pin down whether his work is really about the future of computer vision or about a vision of future computers.
Arguably, it's both.
On today's podcast, Dr. Pollefeys brings us up to speed on the latest in computer vision research,
including his innovative work
with Azure Spatial Anchors, tells us how devices like Kinect and HoloLens may have cut their teeth
in gaming, but turned out to be game changers for both research and industrial applications,
and explains how, while it's still early days now, in the future, you're much more likely to
put your computer on your head than on your desk or your lap. That and much more on this episode of the Microsoft Research Podcast.
Marc Pollefeys, welcome to the podcast.
Thank you.
So you wear a few hats.
You're a professor of computer science at ETH Zurich, a partner director of science for Microsoft, and now you're overseeing the creation of a new Microsoft Mixed Reality and AI lab in Switzerland.
So it's pretty obvious what gets you up in the morning.
Tell us a little bit about each of your roles and how you managed to switch hats and work everything into a day.
Sure. I've been a professor for quite a while here in Switzerland and before that in the US.
Then almost three years ago, I joined Microsoft to work with Alex Kipman on mixed reality.
I spent two years in Redmond working with a large team of scientists and engineers on moving computer
vision technology on HoloLens forward. In particular, we worked on HoloLens 2, which was
recently announced. I told Alex that I was going to do this for two years, and then I wanted to
come back to Zurich, back to being a professor at ETH. But after those two years, I realized that
it was in a sense very complementary. So on the one hand, I'm really excited about doing academic basic research,
but I was also always very interested
in doing more applied research.
And so I can partially do that at ETH,
but of course there's no place to have a bigger impact
with that applied research than Microsoft.
And so I realized that I wanted to continue doing that.
And so at that point, I discussed with Alex what we could do, what would make sense.
And I realized that there was something really interesting that could be done,
which was to set up a lab here in Zurich.
ETH being one of the top schools to recruit talent for the type of research
that we need to do for mixed reality.
At the same time, from the side of ETH, with my ETH hat on, it's a great opportunity to provide opportunities for students to work with Microsoft, to get access to devices, to resources that we would not necessarily have at ETH, and a lot of exciting projects to propose to the students. And so I really essentially saw that there was a real win-win between, you know, what ETH can offer and what Microsoft can offer. And so both for myself, but actually for everybody involved, being able to kind of find all those different elements and bring them together and have something really nice come out of that. So there are synergies. It is a lot of work, but there's also a lot of nice synergies.
I want you to talk a little bit about how collaboration happens, particularly with what you've got going on.
Microsoft researchers collaborate with ETH Zurich and EPFL researchers in the Swiss Joint Research Center, or the JRC.
Tell us more about the Swiss JRC and how it's helped you bridge the gap between your ETH Zurich role and your Microsoft role? Yes. So actually the JRC is a program that has been going on for about 10 years
to stimulate collaboration between Microsoft, ETH and EPFL.
And I was actually involved, I think in 2008 or so,
this was one of the first grants that I got from Microsoft then.
So I worked on the ETH side as a PI.
But then over the years, the scheme became much more collaborative
with PIs on both sides.
So in the second generation
of these collaborative projects,
I had a project with a colleague here,
Otmar Hilliges,
who was actually a former Microsoft researcher
and then collaborating with researchers
at Microsoft Research in Cambridge
and then in Redmond.
So that's more like the project in the past.
Now, currently, we have also great synergies with the JRC
because having that framework in place that facilitates collaborative projects
between the different schools and Microsoft,
it means that we can also, in the area of mixed reality,
where we want to foster more collaboration,
also in the context of the lab here,
this gives us a framework and a way to facilitate extra collaborations.
Your primary area of research is computer vision and you've specialized in 3D computer vision.
So give us an overview of your work in the field in general, just to set the stage for our conversation today.
In broad strokes, what are the big questions you're asking and the big problems you're trying to solve?
So essentially computer vision is about extracting information from image data,
from images or video. This can be information like recognizing things in the scene,
or it can also be geometric information. I'm much more focused on extracting geometric
information from images. As an example, this can be as you move with a camera through a scene,
being able to recover the three-dimensional surfaces in the scene, a three-dimensional reconstruction of the scene. This can also be recovering the motion of the camera through the scene, so in which way the camera is moving through that scene. It can also be finding back the location, it can be figuring out the calibration of the camera, it can be many different things related to this geometry. More and more in recent years, I've been also combining that with, at the same time, also extracting semantic information from the scene, and so having both geometry and semantic information, so that, for example, when we do reconstruction, we don't only reconstruct surfaces, but then we know that part of the surface is a table, another part of the surface is the floor, for example, and a third part of the surface could be the wall surfaces, and so on. And so by doing both at the same time,
we can achieve more and get better results for the different tasks at once.
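To make that concrete, here is a minimal Python sketch of what a joint geometric and semantic representation can look like: every reconstructed 3D point carries a class label such as floor, wall, or table. The class list and the probabilities below are invented for illustration; a real pipeline would obtain them from a trained segmentation network projected onto the reconstruction.

```python
import numpy as np

# Hypothetical class list; a real system would learn these with a segmentation network.
CLASSES = ["floor", "wall", "table", "other"]

def label_reconstruction(points, class_probs):
    """Attach a semantic label to each reconstructed 3D point.

    points      : (N, 3) array of reconstructed surface points.
    class_probs : (N, len(CLASSES)) per-point class probabilities.
    Returns a dict mapping class name -> the points assigned to that class.
    """
    labels = np.argmax(class_probs, axis=1)          # most likely class per point
    return {name: points[labels == i] for i, name in enumerate(CLASSES)}

# Toy usage: three reconstructed points with hand-made probabilities.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 2.5], [0.5, 0.9, 0.8]])
probs = np.array([[0.90, 0.05, 0.03, 0.02],   # mostly "floor"
                  [0.10, 0.80, 0.05, 0.05],   # mostly "wall"
                  [0.10, 0.10, 0.70, 0.10]])  # mostly "table"
semantic_map = label_reconstruction(pts, probs)
print({name: len(p) for name, p in semantic_map.items()})
```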
Well, let's talk a little bit more deeply about the different kinds of computer vision work going
on. Microsoft has a lot of research going on in computer vision and from a variety of angles.
And when you and I talked before,
you sort of differentiated what you're doing
with what some of the other research areas in Microsoft are doing.
So tell us how the work you're doing is different.
Yes, so there's a lot of research in computer vision that will look at an image and extract semantic information from that.
It will recognize those objects
or it will be able to label different parts of the scene.
This is an area that for a long time
was struggling to get really good results.
But now with the advent of deep learning,
there's been tremendous progress in that area.
When you look at what we are doing for mixed reality
and also what I'm doing here in my lab,
extracting this geometric information from images is not something that you can as simply tackle with these methods that are now
very popular in computer vision of using deep learning. A lot of the kind of geometric
computations are ill-suited to be, you know, just easily tackled with deep learning, convolutional
neural networks or that type of approach. The classical methods that leverage our strong understanding of geometry
are still very strongly needed to get good quality results.
So for example, for a device like HoloLens,
where it is critically important to be able to know
exactly how the device moves through the environment,
because it's actually what's needed to give the impression of, like, a hologram being static in the environment. So if I move my head, I need to somehow be able to fake the
impression that what I display on the screen is static in the world as opposed to being static on
the screen. To do that, I need to know very, very precisely how I'm moving my head through the
environment. We do that by combination of
inertial sensors together with the camera image data. And so we analyze the image data to compute
how the headset is moving through the world. That's why HoloLens has a whole set of cameras
that observe the world. It's really to be able to track its position all the time.
That's interesting because you take for granted that when I move my head, like if I'm talking to you and I move my head, you're not going to move with me.
That's right.
But with a device, it's going to be a little bit of a different experience unless you guys fix the technical aspects of it.
So how are you tackling that technically?
On HoloLens, you know, these are techniques that work very much like we do with our own eyes, where we combine our visual sensing together with our inner ear, which does inertial sensing.
Combining those two, we can get a very good impression of how we move through space.
So we're actually doing roughly the same for mixed reality.
It's actually also very similar to what is being used in robotics.
You can call it visual inertial odometry, which is determining your own motion from visual inertial data.
Or even if you go
beyond that, people often call it SLAM. This stands for simultaneous localization and mapping.
It means that while you are localizing yourself, your relative motion in the environment,
you're at the same time building up a map of the environment so that if you revisit it,
you can actually recognize and realize that you have already seen part of the scene and correct
your position or take that into account to kind of continue to estimate your position.
So this is used in robotics, in self-driving cars, and also very much in mixed reality and also for augmented reality on phones.
The same techniques are being used.
So this is a key element in HoloLens.
And so this device needs to be able to track itself through the environment.
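That predict-and-correct structure can be sketched in a few lines of Python: the inertial data gives a fast prediction of motion, and the camera-based estimate keeps the prediction from drifting. This is only an illustrative toy under simplifying assumptions (gravity-compensated accelerations, a fixed blending weight); a real tracker such as the one on HoloLens fuses the two with a proper filter or optimization over many visual features.

```python
import numpy as np

def integrate_imu(position, velocity, accel_world, dt):
    """Predict position and velocity from one acceleration sample (gravity already removed)."""
    velocity = velocity + accel_world * dt
    position = position + velocity * dt
    return position, velocity

def visual_correction(predicted_pos, visual_pos, blend=0.1):
    """Nudge the inertial prediction toward the camera-based estimate.
    A real system would use an extended Kalman filter or bundle adjustment instead."""
    return (1.0 - blend) * predicted_pos + blend * visual_pos

# Toy loop: the headset accelerates gently forward while vision keeps drift in check.
pos, vel = np.zeros(3), np.zeros(3)
for _ in range(100):
    accel = np.array([0.2, 0.0, 0.0])                  # m/s^2, from the IMU
    pos, vel = integrate_imu(pos, vel, accel, dt=0.01)
    visual_pos = pos + np.random.normal(0.0, 1e-3, 3)  # stand-in for a visual pose estimate
    pos = visual_correction(pos, visual_pos)
print(pos)
```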
So, Mark, I'll argue that you're doing the science behind what our future life with computers
will look like. And a lot of it has to do with how we experience reality. And you've alluded to that
just recently here. Currently, I'll say we have real life reality. I put that in air quotes because that in itself is, you know,
arguable and also what most people refer to as virtual reality. But we're increasingly headed
toward a world of augmented and mixed reality where the lines are a bit more blurred. So
tell us about your vision for this world. And I don't want you to be sensational about it. I mean, for real,
we're heading into a different kind of paradigm here. How does it work and why do we need it even?
Okay, so if you look at computers and mobile phones, for example, there's a lot of information
that relates to the real world that's available. The first generation of computing essentially was,
you know, it's a computer on your desk and you might use mapping or other tools to predetermine the route to a particular place of interest.
But then you still have to take that with you.
That's what changed completely once we went to mobile phones, which essentially is a mobile computer in your pocket.
So you always have it with you.
That computer actually knows about its approximate position in space.
And so a lot of things became possible, like for navigation, for example,
or also applications like Uber or other ride-sharing services and so on,
because you have now information that is spatially kind of with you in the real world.
The next generation going beyond that, and where we go with mixed reality,
is really about having information not only, you know, broadly contextually placed in space, but now actually going to very precise positioning in space. So meaning that, in general, you can expect not only to know this is roughly where you have to go, or look at your phone and get instructions, but you can imagine now really
to mix the digital information
that you currently see on the screen of your phone
and then the real-world information that you see in front of you,
you can expect those to essentially be merged,
to be mixed together in one reality,
that's the reality in front of you.
And so the information that you need to do the task,
which could be a navigation task,
or could be, if you're a technician,
a complicated task to repair a machine, needing to carefully know which part you have to take out first and which, you know, button you have to press, etc. All those things
can now be communicated to you. Or, you know, in the future, as we're going already now in a number
of contexts, with mixed reality can be communicated to you in a very intuitive way, just by overlaying
the instructions on top of the world visually.
So you can just see things in front of you: press this button, and it just shows you with an arrow, like, the button, or it shows you an example of exactly what to do. So it makes it a lot simpler
to process the information and be able to tackle more complicated tasks in that sense.
Let's talk about Microsoft's HoloLens.
Many people think of it as merely another device or maybe a pair of virtual reality glasses,
but it's actually, as you start to describe it, the world's first self-contained holographic computer.
Give us a brief history of HoloLens to clear up any misperceptions we might have,
and then tell us where and how HoloLens is primarily being used today,
and then we'll start talking about some of the specific work you're doing.
So, if you look at the first generation HoloLens,
it was developed as a development kit, essentially,
for Microsoft to learn about where this could be used.
The initial concept was already going all the way to the long-term vision.
And so you can see that on the first generation HoloLens,
it wasn't clear if this was going to be something for consumers or something for industry or where exactly it would be applied.
And it was put out there as, you know, in a sense, this magical device of the future to see where it could be useful to learn what type of tools and what type of solutions could be implemented on it.
So we learned a lot.
I joined, you know, between HoloLens 1 and HoloLens 2. I joined the
team and it became clear very quickly that we still have a long way to go in terms of form factor, in terms of a number of aspects, to get to something that makes sense to use in your daily
life all the time, you know, as you currently use your cell phone. The device is still too bulky.
It's too expensive for consumers, et cetera. So it's too early for that. However, it's not at all too early to use a device like HoloLens 1 or now
obviously HoloLens 2 in settings where you have task workers, people that have to do often
complicated tasks in the real world, repair machines, or it can also be a surgeon, for example,
who's also a first-line worker, a person that's out in the real world having to do a complicated
operation and essentially needs access to information in as seamless a way as possible
to help him do that task. That's where HoloLens turned out to be incredibly valuable because
it's a full computer that you wear on your head.
It allows you to place as much information
as you want around you.
You can, in very natural ways,
interface with it by speaking to it,
by doing gestures.
So you can interact with the device
without even having to touch anything.
You can use your hands to actually do the task
you're supposed to do in the real world
and still get access to all the information
you need to help you do that.
The magic of HoloLens is that you have this full computer, but you can still use your hands to do the task that you have to do. HoloLens has some really interesting
applications, but right now it's mainly an enterprise tool and it's particularly useful
for researchers. So tell us about this thing called research mode in HoloLens. What is it
and why is it such a powerful tool, especially for computer vision researchers?
So HoloLens has all of these sensors built in.
So you have this device on your head that has sensors that look at the world from the same viewpoint as you are looking at the world.
We have four cameras tracking the environment.
We have this depth camera, which can be used in two different modes, one for tracking the hands, and then a second mode, which is a smaller field of view,
but a more powerful signal out to sense further away, which we use for doing reconstruction of
the 3D environment. So essentially we have all these different imaging sensors moving with you
through the world. So this is perfect for all types of computer vision research,
potentially also for robotics research or other applications.
You now have this device that can collect data in real time.
You can either process it on device or over Wi-Fi,
send it to a more beefy computer outside
to do more expensive computations if you want to,
or you can just store the data to do experiments.
But you can collect this very rich data from all these different sensors to do all types of
computer vision experiments with. In particular, if you're thinking of doing things like trying to
understand what the user is doing from a first-person point of view, you can actually use
these different sensor streams to then develop computer vision algorithms to understand what a person is doing.
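As a rough picture of what experimenting with such recordings can look like, here is an illustrative Python sketch that walks through a saved depth stream offline and applies a simple first-person heuristic. The folder layout and the "depth_ahat" stream name are assumptions made up for this example; the actual research mode interface is exposed on the device itself rather than through files like these.

```python
from pathlib import Path
import numpy as np

# Hypothetical layout of an offline recording: one .npy file per frame, per sensor stream.
RECORDING = Path("hololens_recording")

def frames(sensor_name):
    """Yield (name, frame) pairs for one sensor stream, sorted by filename."""
    for f in sorted((RECORDING / sensor_name).glob("*.npy")):
        yield f.stem, np.load(f)

def hands_near_camera(depth_mm, threshold_mm=600):
    """Crude first-person cue: a noticeable patch of pixels closer than ~60 cm."""
    valid = depth_mm > 0
    near = np.logical_and(valid, depth_mm < threshold_mm)
    return near.mean() > 0.05

for stamp, depth in frames("depth_ahat"):   # "depth_ahat" is an assumed stream name
    if hands_near_camera(depth):
        print(f"frame {stamp}: the user's hands appear to be in front of the device")
```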
You just mentioned a more beefy computer, which raises the question, what kind of a computer or a processing unit, shall we say? I've heard you refer to it as an HPU or holographic processing
unit. How big is the processing unit? What are we talking about here? Well, it's a small coprocessor for
HoloLens, so you can very much compare it to a state-of-the-art cell phone in terms of the
general purpose computing. But then on top of that, because it needs to continuously run all
of these computer vision tasks to be able to track itself in the environment, to be able to track the
hands, to be able in HoloLens 2 to also track
your eyes, to know where you're looking, or to recognize you based on your iris.
So all of these different tasks, most of them need to run all the time.
Yeah.
This means that this runs for hours at a time.
If you look at a cell phone and you take your cell phone and you run one of those fancy
augmented reality apps, for example, you will notice that after a few minutes, your phone
is running extremely hot
because it's consuming a lot of power to do all those computations, and your battery is draining very quickly. We cannot afford that on HoloLens. So if you would just have this general purpose processor, you could run HoloLens for 10 minutes, and your battery would be empty.
And your head would be hot.
Exactly. So this is exactly why Microsoft had to develop its own ASIC,
so its own chip, which is the HPU,
which is a chip that's dedicated to do all of these computer vision tasks
and other signal processing tasks very efficiently at very low power
and can sustain that all along.
If you look at HoloLens, the whole system is below 10 watts of power consumption.
If you actually look carefully, the whole design of HoloLens is really done around being
able to consume as little power as possible and be able to stay passively cooled so that
it doesn't heat up your forehead and so on.
Okay, so how are you doing that?
Is it algorithms that help you out?
What is the technical approach to solving those problems?
Well, in every algorithm, in everything you do, you really have to be very careful and think, right from the beginning, about how much power it's going to consume. It's a lot of engineering effort to get to a system that consumes that little power and does that amount of computer vision processing all the time. It means that you put in some very efficient processing units that are well suited to do all these image processing operations. It means that some things need to happen to hide the latency in the rendering.
All of these tasks are hardware accelerated with some dedicated hardware in the HPU to
make them run very efficiently.
And it also means that you have to be smart how you use the algorithms
and code things very efficiently.
When you don't need all the sensors all the time,
you don't use them all the time.
It means that you really try to,
at every point, everywhere you can,
you try to save energy
and just do what you need to do, but not more.
Sounds like a teenager.
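That "only do what you need" principle can be illustrated with a toy duty-cycling model: a sensor is woken up only when a task actually needs a reading, and dropped back to a low-power state otherwise. The sensor name, wake-up cost, and power numbers below are invented for the example and do not reflect the actual HoloLens power management.

```python
class DutyCycledSensor:
    """Toy model of gating a sensor so it only draws power when needed."""

    def __init__(self, name, wake_cost_mj, active_power_mw):
        self.name = name
        self.wake_cost_mj = wake_cost_mj        # one-off cost of powering the sensor up
        self.active_power_mw = active_power_mw  # draw while capturing
        self.powered = False
        self.energy_mj = 0.0

    def read(self, duration_s):
        if not self.powered:                    # pay the wake-up cost only when needed
            self.energy_mj += self.wake_cost_mj
            self.powered = True
        self.energy_mj += self.active_power_mw * duration_s

    def sleep(self):
        self.powered = False                    # near-zero draw while idle

depth = DutyCycledSensor("depth", wake_cost_mj=0.5, active_power_mw=300)
for frame in range(100):
    if frame % 10 == 0:                         # the application only needs depth occasionally
        depth.read(duration_s=0.033)
    else:
        depth.sleep()
print(f"approximate energy spent: {depth.energy_mj:.1f} mJ")
```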
One of the coolest things you're working on is something you call spatial anchors.
Talk about the science behind spatial anchors. How do they work? What are they actually?
And what are some compelling virtual content in the real world use cases for spatial anchors?
So spatial anchors are a type of visual anchoring in the real world. So as you move your device to a particular location,
and this can be both a HoloLens or it can be a cell phone
that runs one of the ARKit or ARCore applications,
you are essentially always generating a little map of the environment.
And when you want to place information in the world,
you will attach it to this local little map of the environment.
Currently with HoloLens, you can place holograms in the world, and on HoloLens itself, it will
be attached to a little map so that HoloLens is continuously building a map.
And so you can place it there, then it knows in the map where that hologram is placed.
And then with HoloLens, you can see again that hologram when you walk by the same place.
Now, what Azure Spatial Anchors is doing is allowing you to extract a local little
map and share that with others in the cloud so that they can also access your hologram if you
want to share it with them. So that means that I can, for example, put on HoloLens and place
a hologram somewhere in the world. And then you could come by with your mobile phone and use ARKit
or ARCore to find back this hologram and see it at the same place
where I placed it in the world.
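The coordinate-frame math behind that sharing can be sketched without any SDK: the hologram's pose is stored relative to the anchor, the anchor (not raw imagery) is what gets shared, and a second device that relocalizes the same anchor can recover where to render the hologram in its own frame. The poses below are made up purely for illustration.

```python
import numpy as np

def make_pose(yaw_rad, translation):
    """4x4 rigid transform: a rotation about the vertical axis plus a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T[:3, 3] = translation
    return T

# Device A localizes an anchor and places a hologram; only the relative pose is shared.
worldA_T_anchor = make_pose(0.3, [2.0, 0.0, 1.0])
worldA_T_hologram = make_pose(0.3, [2.5, 1.2, 1.0])
anchor_T_hologram = np.linalg.inv(worldA_T_anchor) @ worldA_T_hologram   # goes to the cloud

# Device B relocalizes the same anchor in its own coordinate frame...
worldB_T_anchor = make_pose(-0.1, [0.4, 0.0, -0.7])
# ...and recovers where to render the hologram, consistent with where A placed it.
worldB_T_hologram = worldB_T_anchor @ anchor_T_hologram
print(worldB_T_hologram[:3, 3])
```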
That means that now you can start thinking of applications
where, for example, I can put virtual breadcrumbs in the world
and allow you to navigate.
So these are for more consumer-oriented applications.
But if you look at applications like indoor navigation
or if you think of applications in an enterprise
where there's all types of machinery,
there's all types of sensors,
there's things that we call digital twins.
This means that you have,
for a real machine in the real world,
you also have somewhere a digital representation of it
in your servers in the cloud.
That information that is available in the cloud, you would like to be able to align it also in the real world, so that if you walk around with your holographic computer, with your HoloLens on, you can actually see it on top of the real world, on top of the real machine; you can actually also access, in context, all of the information that relates to it. Now, to do that, you need to know where you are
and where it is in the world.
And so that's where technology like Azure Spatial Anchors
can essentially allow you to recover that.
Now, we are currently just at the beginning of this technology.
There's a lot of things we still have to work out,
but the basics are there.
And you can see online a lot of people are giving it a try
and are having fun with it.
Can you remove them?
Of course, you can remove them.
You can move them around.
And everybody can have their own view of the world,
meaning a service technician might want to see very different things
than, you know, like a random person walking through a mall.
This all becomes possible and different people would have,
you know, different filters in a sense on all this virtual information.
And you would have different levels of access and privacy, on all this virtual information. And you would have
different levels of access and privacy and all these different things would come into play to
let you see the things that are relevant to you. So Microsoft Kinect, let's talk about that for
a second, because it has an interesting past. Some have characterized it as a failure that
turned out to be a huge success. Again, I think a short history might be in order. And I'm not
sure that you're the historian of Kinect, but I know you've had a lot of connection with the people who are
connected to Kinect. Let's say that. Yes. Tell us how Kinect has impacted the work you're doing
today and how advances in cloud computing, cameras, microphone arrays, deep neural nets, etc.
have facilitated that. So, yes. So if you look back, Kinect was introduced as a device for gaming for the Xbox.
It had initially a great success there,
but it was something more for the casual gamer than for the hardcore gamer.
But at the same time, there also, a little bit like with research mode,
Kinect got opened up and people could access this 3D
sensing data that Kinect was producing. And this was something that created quite a revolution
in robotics and in computer vision, where people suddenly had access to a very cheap, standardized,
very powerful 3D camera. And so this stimulated all types of research, both at Microsoft Research
and everywhere in every vision and robotics lab around the world,
people had Kinect.
What's interesting is to see that then a lot of this research
that was developed on that, for example,
one of those is Kinect Fusion,
which consisted of taking single Kinect depth images.
But as you move through the environment,
having many of those images, you know, aligned to each other and creating a more global 3D reconstruction of the environment.
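A heavily simplified sketch of that idea, assuming known camera intrinsics and poses coming from a tracker, is to back-project each depth image into 3D and accumulate the points into a shared voxel grid. The real KinectFusion pipeline uses a signed-distance volume and estimates the camera poses itself, so this is only a toy illustration of the fusion step.

```python
import numpy as np

VOXEL_SIZE = 0.05        # 5 cm voxels
grid = {}                # sparse map: voxel index -> number of depth samples landing in it

def back_project(depth_m, fx, fy, cx, cy):
    """Turn a depth image (in meters) into 3D points in the camera frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.stack([x[valid], y[valid], depth_m[valid]], axis=1)

def fuse(depth_m, world_T_camera, fx, fy, cx, cy):
    """Move one frame's points into the world frame and accumulate them into the grid."""
    pts_cam = back_project(depth_m, fx, fy, cx, cy)
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    pts_world = (world_T_camera @ pts_h.T).T[:, :3]
    for idx in map(tuple, np.floor(pts_world / VOXEL_SIZE).astype(int)):
        grid[idx] = grid.get(idx, 0) + 1

# Toy usage: a flat surface one meter away, seen from the identity pose.
depth = np.full((48, 64), 1.0)
fuse(depth, np.eye(4), fx=60.0, fy=60.0, cx=32.0, cy=24.0)
print(f"{len(grid)} occupied voxels")
```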
So people developed all kinds
of very interesting techniques.
Many of those came back
and enabled Microsoft to develop
much more efficiently
and already have an idea
of what was possible
with the camera that was going
to be integrated in HoloLens.
What could all be put on it in terms of algorithms, what was possible, and so on,
because they could just see what happened with Kinect
in the research world.
This was actually one of the big reasons also there
why I pushed for having a research mode
made available on HoloLens,
because I think there also we can both provide a great tool
to the research community to work with
in terms of using HoloLens as a computer vision device,
but at the same time
also really leverage that and benefit from it by learning all the amazing things that are possible
that we might not think of ourselves. And this more or less coincided with the time where we
were already with our second generation sensor in HoloLens, which is an amazing sensor that was
built for HoloLens 2. And so, you know, putting things together,
it became clear that it made a lot of sense to reuse that sensor
and put it in a third-generation Kinect
that now is clearly not made for gaming,
but is really directly targeted at intelligent cloud type of scenarios
where you can have a device in the world sensing
with the best-in-class depth sensor. It can essentially do one megapixel, one million separate depth measurements, at 30 frames per second, and do all of that below a watt of power consumption. So, an amazing sensor, combined with a color camera and a state-of-the-art microphone array to bring back the Kinect sensor package,
but in a much, much higher quality setting now.
And that's what Azure Kinect is.
It seems like we're heading for a huge paradigm shift, moving from a computer screen world to a holographic world. I mean, maybe that's an oversimplification, but I'm going to go there.
And people have suggested that holograms have what it takes to be the next internet,
the next big technical paradigm shift going beyond individual devices
and manufacturers. So what does the world look like if you're wildly successful?
Well, essentially, it means that people will wear a device like HoloLens or actually think more of
a device like normal glasses, maybe a little bit beefy glasses, but much closer to today's glasses
than to today's HoloLens, that will enable them to see
information in a natural way on top of the world. So instead of carrying a small device with you or
having a computer screen in front of you, the computer or the device will not anymore be a
physical thing that you look at. It will be something that can place information anywhere
in the world. And so you can have screens that move with you. You can choose how big or how many screens you want to place around you as you walk. The difference with now
having to take out your phone and if you want to see one of those holograms that we would share,
you have to actively look for it. If you have these glasses on, you will just be walking around
and if there's something relevant for you, it will just appear in front of your eyes.
The information will just kind of always be there in context, just what you need, ideally.
So hopefully we can have some good AI help with that and moderate and just pick up the right things.
Right.
And show you what's helpful to you.
It will be very different from where we are now.
Well, that leads right into the time on the podcast where I ask the "what could possibly go wrong" question. I actually get kind of excited about what I could do with a holographic computer on my head. But to be honest, I'm not sure I'm as excited about what other people could do, and I'll call them bad actors or advertisers, given the future that you're literally inventing, or you're working on inventing. Does anything keep you up at night
about that? I certainly care a lot about the impact on privacy for this. So I think there's
challenges, but there's also opportunities. I think it will be very important as we will more
and more have sensors continuously move and sense the environment, be it on a HoloLens or on your
car, be it self-driving or just with driver assistance systems,
be it any other systems or robots that will roam around maybe your living room,
all of those will have a lot of sensors.
So even more than your cell phone in your pocket now,
you will have sensors including cameras and so on, all the time sensing.
And so it's really important to figure out how to build systems
that tackle the problems that we want to tackle, like being able to know where you are in the world so that you can see the holograms and the things that are relevant to you.
But at the same time, not expose that data, in a way that still allows you to do relocalization, to be able to retrieve your
holograms where they should be, but doesn't allow others to look at the inside of your house or
inside of spaces that they're not supposed to look into. So these are actually active research
topics that we're working on today, also at the same time as we're pushing the technology forward.
Right. Well, and two other things pop into my head, like advertisers and spatial anchor spam and transparency concerns.
Like if I'm wearing glasses that look pretty natural, how do I signal or how do we signal
to other people, hey, we have a lot of information about you that you might not be aware of?
I think you're exactly right. So those are all really important issues and it's really important to think about them.
As an example, the first generation HoloLens was designed in a way that, for all of those sensors that need to run continuously just to be able to operate HoloLens, all this data would not be accessible to applications, but just be accessible to the operating system and be isolated in the HPU, actually, and not be exposed on the general purpose processor
where the applications live as a way to ensure privacy
by designing the hardware there.
So it's clear that these type of things are very important.
It's often a trade-off, or at least let's say the easy solution
is a trade-off between privacy and functionality.
But I think that's where we have to be smart
and where we have to start already doing research
in that space to work towards a future
that we can live in.
Marc, talk a little bit about yourself,
your background and your journey.
What got you started in computer science?
How did you end up doing computer vision research?
And what was your path to Microsoft,
not to mention all your other destinations?
Well, you know, so I come from Belgium.
Long ago, when I was like 12 years old or so, I wanted to get a game computer.
And my father suggested it was maybe a better idea to get a computer on which I could program a bit.
That's just like a dad.
Yeah.
And so, you know, the result was that I kind of thought it was pretty cool to program and all.
And then I actually didn't study computer science eventually.
I thought I was going to study computer science, but I ended up studying electrical engineering, which is close enough.
And one of the last exams of my studies, I had a computer vision exam.
And the professor there asked if I was interested in maybe doing a PhD with him.
And so that's how I got started in computer vision. And in particular, I picked a topic that was really about 3D reconstruction from images and had a lot of geometry in there.
And from Belgium, from Leuven, University of Leuven, I then moved to the University of
North Carolina in Chapel Hill, where I had a lot of colleagues doing computer graphics. Computer
vision was very complementary to that. After a number of years, I got the opportunity to move
back to ETH Zurich, which is really one of the top schools worldwide.
So I decided to move to Switzerland.
And then in 2015, I guess, I got approached by Alex Kipman and the mixed reality team, the HoloLens team at Microsoft.
And I hesitated for a while, but then I realized that there was really an opportunity to have a big impact. You know,
at some point, even I had a conversation with Satya that kind of helped convince me to
come over and, you know, help realize the vision of mixed reality.
I hear he's very persuasive.
He is. And I was really impressed. We were very aligned on the vision and on where we
could go with this technology and what we could do. And so this was actually a very good conversation.
So you packed up and moved to Redmond.
That's right.
And now you're back in Zurich.
That's right.
Actually, before I decided to join, I told Alex,
I said, I'm going to come for two years
and you have to be okay with that.
Otherwise, I'm not coming at all.
He convinced me that he was okay with it.
And so, you know, the end result is now
that eventually I didn't really want to fully leave Microsoft.
I wanted actually to both continue having impact on Microsoft,
but also felt that I could really do something in between
that, you know, and have in some way the best of both worlds.
In a sense, two half jobs turn out to be more than one job.
Yeah.
But I'm really excited about the opportunity I got to do this.
Well, maybe we can have a Marc Pollefeys spatial anchor here in Redmond and work with you there.
As we close, what advice or wisdom or even direction would you give to any of our listeners who might be interested in the work you're doing?
Where would you advise people to dig in research-wise in this world of the holographic computer?
I think at this point, I really care about figuring out how to get all of this amazing sensing technology out in the world.
But at the same time, make sure that we have systems that preserve privacy.
Figuring out how to marry those things, I think, is really exciting.
And so that's one of the areas I'm really working on.
And I hope a lot of people are going to work on that.
Marc Pollefeys, great to have you with us from Zurich today.
Thanks for coming on the podcast.
Yeah, thank you for having me. It was great.
To learn more about Dr. Marc Pollefeys and Microsoft's vision for the future of computer vision, visit Microsoft.com slash research.