Microsoft Research Podcast - 017 - When Psychology Meets Technology with Dr. Daniel McDuff
Episode Date: March 28, 2018
One of the most intriguing areas of machine learning research is affective computing, where scientists are working to bridge the gap between human emotions and computers. It is here, at the intersection of psychology and computer science, that we find Dr. Daniel McDuff, who has been designing systems, from hardware to algorithms, that can sense human behavior and respond to human emotions. Today, Dr. McDuff talks about why we need computers to understand us, outlines the pros and cons of designing emotionally sentient agents, explains the technology behind CardioLens, a pair of augmented reality glasses that can take your heart rate by looking at your face, and addresses the challenges of maintaining trust and privacy when we’re surrounded by devices that want to know not just what we’re doing, but how we’re feeling.
Transcript
We've developed a system that allows people to look at another individual and see physiological
responses of that person. So it's data they wouldn't normally be able to see,
but it's superimposed onto that other person so they can actually see their heart beating,
they can see changes in stress based on heart rate variability. And that's all sensed remotely.
But you're giving the individual a new sensory channel that they can leverage.
You're listening to the Microsoft Research Podcast,
a show that brings you closer to the cutting edge of technology research and the scientists behind it.
I'm your host, Gretchen Huizenga. One of the most intriguing areas of machine learning
research is affective computing, where scientists are working to bridge the gap between human
emotions and computers. It is here at the intersection of psychology and computer science
that we find Dr. Daniel McDuff, who has been designing systems from
hardware to algorithms that can sense human behavior and respond to human emotions.
Today, Dr. McDuff talks about why we need computers to understand us, outlines the pros
and cons of designing emotionally sentient agents, explains the technology behind CardioLens,
a pair of augmented reality glasses that can take your heart rate
by looking at your face,
and addresses the challenges of maintaining trust and privacy
when we're surrounded by devices that want to know
not just what we're doing, but how we're feeling.
That and much more on this episode of the Microsoft Research Podcast.
Daniel McDuff, welcome to the show today. Great to have you with us. It's great to be here.
So you're in Human-Computer Interaction, or HCI, and you situate your research at the intersection of computer science and psychology.
So tell us in broad strokes about HCI and what you do. So the crux of what I do is teaching machines to understand
people in a deeper way, and that involves capturing and responding to their emotional state.
So can we design a machine that really understands people, not just what they're saying, but how they're saying it, how they're behaving?
And I think that's really fundamental to human-computer interaction
because so much of what we do as people is nonverbal.
It's not described in language.
And a lot of computer systems don't understand that,
and that's the focus of my work: bringing that EQ to technology.
EQ meaning emotional quotient?
Yeah, that's a somewhat slang term, and it's used frequently in contrast with IQ, which is something
that technology has a lot of. Technology can answer lots of questions very quickly because
it has access to all of the information on the internet, but not much technology has EQ.
No. Does any?
I think we're starting to see the beginning of this. So you see social
robots as a great example of systems which have some kind of personality they
can express visually, some basic facial expressions on a screen or using sounds or lights.
Movies are a great example.
So R2-D2 is a great example of a system that doesn't have a face but can still communicate emotions.
Although that's fictional, we are starting to see systems in the real world that kind of behave in a somewhat similar way.
That's fascinating. I even think of the Wallace
and Gromit animations, where Gromit only communicates with his eyes and his eyebrows,
and yet you get almost everything that he wants to say through his eyes. Exactly. And we take a
lot of inspiration from animations and animators. Because I study facial expressions, it's magical how some creators can
show so much rich emotion just through a facial expression. And as we design systems that
recognize those and exhibit them, there's a lot we can learn from that side of the world.
I'm intrigued by the field of affective computing, and I understand it aims to bridge the gap between human emotions and computational technology.
So what is affective computing? What does it promise, and why do we need computers to understand us as human beings?
At a high level, affective computing is designing systems that can read, interpret, and respond to human emotion. And that sounds like a daunting
task. There's a lot more we need to do in research, but we're starting to see real-world systems where
this is true. So systems where they can read facial expressions, for instance, or understand
the voice tone of someone, or look at sentiment, for instance, on Facebook posts or Twitter to understand the emotions that are being expressed. And this is kind of where
the world is now, but in the future, we can imagine systems that use multimodal data,
robotic systems that interact with us in an embodied way, that also sense this type of
information. And that's kind of the target we're focused on.
So why do you think we need computers to understand us?
I think it's fundamental to how we interact as human beings.
And so when we interact with a system that doesn't do those things that we take for granted,
it can be off-putting. For instance, if a system doesn't realize that I'm getting
frustrated with it, it can be more frustrating. It can even be upsetting. There's research showing
that robots that can apologize are liked a lot more than robots that don't, even if they're
better at completing the tasks they were intending to complete. So it can really improve our
relationship and our well-being because it fundamentally
improves the interaction we have with the devices around us.
So you co-authored an article called Designing Emotionally Sentient Agents. And aside from the
Hollywood connotations that phrase brings to mind, what should we understand about this research,
Daniel? And what should we be excited about or concerned about? I think there's a lot to be excited about in home care, in health care, in understanding just human interaction even more.
If we can design systems to mimic some of those things, it will deepen our understanding of how we as humans behave.
There are a number of challenges that we need to overcome.
One is how do we sense this information? And sensors
can be intrusive. You know, devices that are around us that are listening for commands all the time
are starting to appear. And you can imagine in future there could be camera systems as well.
So we need to think about the social norms that exist around the sensing side of things. Where
does that data go? How is it stored?
How do we know that the sensor's on?
How do we control it?
How do we stop it from recording when we want to?
And then how do we allow other people who are in our lives
to not be sensed even if we're being sensed?
Or if I invite someone into my home
and I've got a device that's always listening
or always watching,
what does that mean for our social interaction? So I think there's some challenges to overcome
there, but there's also more philosophical challenges about how much do we teach computers
of human emotion? Is it possible for a machine ever to feel emotion? What does that mean? And
how should machines express emotion or respond to this information? We
definitely don't want to design systems that are manipulative, or that make people feel like
those systems are more intelligent than they are. If someone sees a system that appears emotional,
they might think, wow, this is really, really intelligent, even if it's only expressing very
basic behaviors. And that can be challenging
because some of the other abilities of that system might be quite weak. And so people
might trust it, even if it can't actually perform accurately the tasks it's trying to
do.
You're using terms that are interesting, and interesting is a kind of placeholder
word for other words that I'm actually thinking, like sentient and understanding regarding a machine.
And I wonder how I should interpret that.
What do people like you and your colleagues really believe about what you just addressed?
Can a machine ever feel?
Can it really understand?
Can it become sentient?
I think machines are fundamentally different to humans.
Machines can recognize some expressions of emotion.
They can respond to them.
But I don't think that that constitutes feeling and emotion.
Feeling and emotion requires experience.
It requires a reward and a cost associated with
different actions. It's much, much more complex than that. So I don't think machines will ever
experience emotion in the way that we do, but they will have many of the fundamental skills that we have. What can you tell us about the emerging field of what I would call artificial
emotional intelligence or emotional technology? You use an example of a bathroom mirror that has
ambient intelligence and can tell whether I've slept well. Why do I need that? It's a good question. I think it's important that we design systems that are ultimately
beneficial to people. And one of the roadblocks, especially in healthcare,
is that there's so much rich data out there, but it's very hard to understand it,
or it's cumbersome to monitor it. And so designing systems that
make it seamless to be able to collect and understand that type of data is
really important. So at MIT, when I was a graduate student, we built a mirror that
had a camera embedded in it. The camera was actually hidden behind the glass, so all it
looked like was just a regular mirror. But when you looked in the mirror, the camera was using some remote sensing technology we
built to measure the heart rate of the person, and we can also measure things
like heart rate variability, which is correlated with stress. And so the mirror
could then display that information back to the user, so it's not just reflecting
their outward appearance but their, sort of, physiological
state as well. And I found that really compelling because in many cases, we want to know that
information, but we might not want to strap on a sensor or have to go out of our way to collect it.
And if it can be digitally captured by the devices we already use,
there's something quite compelling about that.
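Heart rate variability refers to the beat-to-beat variation in the time between heartbeats, and time-domain summaries such as RMSSD are commonly used as stress-related indices. Here is a minimal illustrative sketch, assuming the inter-beat intervals have already been recovered from the pulse signal; the interval values below are made up.

```python
import math

def rmssd(ibi_ms):
    """Root mean square of successive differences between inter-beat
    intervals (in milliseconds), a common time-domain HRV metric."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical inter-beat intervals recovered from a remotely sensed pulse signal.
intervals_ms = [820, 810, 845, 790, 860, 805]
print(f"RMSSD: {rmssd(intervals_ms):.1f} ms")
```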
Let's talk about the technical aspects of your work for a bit. Much of it's centered
on computer vision technologies and involves webcams and algorithms that aim to understand
emotional states. What's the field of computer vision founded on technically,
and what new developments are we seeing?
So computer vision is exploding, actually. The past 10 years have been some of the most exciting in this domain
with the invention of what's commonly called deep learning.
And so this is the ability to leverage huge amounts of data to train systems that are
much more accurate than previous systems were.
So, for instance, we have object recognition, text recognition, scene understanding that's way more accurate than it used to be because we have these systems that capture lots of complexities of the data.
And because there's so much data they can leverage, this technology has advanced enormously, as have a lot of other areas of affective computing, whether it's speech recognition or understanding vocal prosody and things like that.
So there's a lot of advances that have happened that basically improve the underlying sensing. And I don't think up until this point we've really had the volume of data about emotions
to go to the next level where we can really understand, okay, how do we build a system
that actually knows what to do with these sensor inputs with something that's as amorphous as
emotion is and as hard to define. So the basis of what you're doing is deep neural networks and machine learning models that you're then applying to the affective domain. And then that feeds into a system which is taking sort of intermediate-level information.
For instance, does my facial expression appear positive or negative?
Is my voice tone high energy or low energy?
Is the language I'm using hostile or serene?
And then those intermediate states feed into a higher level understanding, which is combined
with context.
So we need to
know what's happening to interpret emotion. We can't just observe the person. We need to know
the situation, the social context. And so that's kind of where we're moving is really to combine
these sensor observations with more contextual information.
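To make that pipeline concrete, here is a minimal, hypothetical late-fusion sketch: per-modality intermediate estimates (face, voice, language) are combined, weighted by their confidences, and then scaled by a context term. The class, weights, and numbers are illustrative assumptions, not the actual system.

```python
from dataclasses import dataclass

@dataclass
class ModalityEstimate:
    valence: float     # -1 (negative) .. +1 (positive)
    arousal: float     #  0 (low energy) .. 1 (high energy)
    confidence: float  #  0 .. 1

def fuse(face: ModalityEstimate,
         voice: ModalityEstimate,
         text: ModalityEstimate,
         context_weight: float = 1.0) -> dict:
    """Toy late fusion: a confidence-weighted average of per-modality
    estimates, with valence scaled by a context weight (for example,
    down-weighting a smile in a context where polite masking is common)."""
    estimates = [face, voice, text]
    total_conf = sum(e.confidence for e in estimates) or 1.0
    valence = sum(e.valence * e.confidence for e in estimates) / total_conf
    arousal = sum(e.arousal * e.confidence for e in estimates) / total_conf
    return {"valence": valence * context_weight, "arousal": arousal}

# Example: positive face, high-energy voice, neutral language.
print(fuse(ModalityEstimate(0.7, 0.4, 0.9),
           ModalityEstimate(0.2, 0.8, 0.6),
           ModalityEstimate(0.0, 0.5, 0.5)))
```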
So I wasn't going to ask you this. It wasn't on my list, but how do you
gather data on emotions? Do you have to bring people in and make them angry? I mean, it's a
serious question in a funny way. Yeah, in the past, that was how it was often done. But a lot of my
work in the last few years has been focusing on in situ large scale data collection. So we always ask people if they want to opt in
and if they do then we enable them to use a system which is part of their everyday life.
So this might be a system that runs on their computer or runs on their cell phone and collects
this data over time. Often we might prompt them throughout the day, "So how are you feeling?" Or we
might say, you know, "Is this feeling that we think you're feeling correct?" in order to get some kind of ground truth.
But ultimately, we want to be able to collect real life data about people's emotional experiences,
because we know if they come to a lab, it's not exactly the same as how it would be in the real world. One of the applications of this emotional
technology is the workplace. In an MIT Sloan Management Review article, you claim that
emotion-sensing technologies could help employees make better decisions, improve concentration,
and alleviate stress. So tell us how this works and give us some examples of what it looks like.
And then maybe tell me why I would want my boss
to monitor my eye movements,
my facial expressions and my skin conductance.
So one example we give in that article
is about a trader in Japan
who unfortunately swapped the number of shares
they were selling and the price of the shares
and got those two numbers the wrong way around.
And that ended up being a huge
financial loss. And in high stress situations, that can be really problematic. Another example
would be air traffic control, a very high stress job where people have to be performing at a high
level for the whole of the duration of their shift. And so if we can design technology that is able to sense
when people are becoming overloaded,
too stressed to perform
at the level that they need to,
we could give them that feedback.
So for individuals,
that could be very helpful
for knowing when they need to take a break.
For myself, in my job,
you know, on an average day,
it would be great
if my computer knew when I was in flow
and stopped interrupting
me with email notifications. Or if I needed to take a break, it could suggest things that would
help me relax and make me more productive when I came back to my desk. And then I think it will
also benefit teams and organizations. Knowing the well-being of your company is a really important thing. And we're starting to see the development of a real science around organizations, and
particularly focused on the social components.
Social capital is really important and emotion plays a big role in that.
Tell me what safeguards a designer or a developer might think about so that this technology doesn't become a nanny cam in the workplace?
That's a really, really important question.
And I think as we design this technology, it's important that we design social norms around how it's used.
Ultimately, technology will advance.
That's somewhat inevitable.
But how we use technology and the social norms that
we design around it are not inevitable. So to give an example, one of the
practices we follow is always opt-in, so we always make sure that people choose
to switch on sensors rather than having it imposed upon them. Another example is,
as we mentioned before,
allowing people to turn off sensors.
And that's really important that people have that.
It increases their trust and comfort with the system a lot.
So these are a couple of examples about
the kind of social norms we can design around this technology.
And I think there are many more that will develop
as we kind of advance the technology
and think about use cases.
Let's talk about reality for a bit. There's actual reality, which I have a passing familiarity with,
but also virtual reality, augmented reality, mixed reality. There's so many realities.
Give us a baseline definition of each of those different realities so we have a frame of reference for what I want to talk about next.
Great. So virtual reality is a completely alternative environment.
So this is where most people will probably be familiar with virtual reality
in terms of the headsets, with a screen where all the information that you see is displayed on that
screen.
Then augmented reality is usually when you can see the real world, but there's some
augmentation of what you see.
So there might be a transparent screen, which is actually displaying certain objects which
are superimposed on the real world.
And then there's this idea of mixed reality, which is really blurring the boundaries between
virtual and augmented reality. So you are leveraging much deeper understanding about
the environment, as well as incorporating a lot more augmentation.
So let's go along that thread for a second here, because when you talk about augmenting human
perception through mixed or virtual reality, you suggest that VR might be able to help people
develop superhuman senses. So what are the possibilities and challenges even of advancing
human senses in this way? Yeah, so I mean, one of the things I find most fascinating
about other areas of science like neuroscience
is how adaptable we are,
and particularly the brain is,
at being able to learn new things based on sensory input.
So we have a panel at South by Southwest where we're discussing some
of the ways that sensor inputs can influence people's perception. And one example that we've
developed is a system that allows people to look at another individual and see physiological
responses of that person. So it's data they wouldn't normally be able to see,
but it's superimposed onto that other person
so that they can actually see their heart beating,
they can see changes in stress
based on heart rate variability,
and that's all sensed remotely.
But you're giving the individual a new sensory channel
that they can leverage, something that they wouldn't normally have.
So this is like x-ray vision.
In a sense, yes. That's a good analogy.
I mean, from the superhero realm.
Yeah, exactly.
So the idea of superhuman senses would be sensing physiological signals that you wouldn't normally be able to see. Aside from somebody sweating
or blushing or, you know, their facial expressions, it's inside their bodies.
Exactly. Yeah. It's hidden information that wouldn't normally be accessible,
but using the new technology like, you know, high definition cameras and this augmented
experience that we can create through the HoloLens headset,
we can allow you to see that information in real time.
So maybe now junior high kids can actually find out if someone's in love with
them just by putting on these glasses and they don't have to ask their friend to go
ask their other friend if he likes me.
I've always wanted to build that demo and just see how badly it fails.
That would actually be a really compelling application of the technology,
just to help the junior high kids. So you're one of the creators of an application of this technology called CardioLens. And while it's still in the early stages of research,
and it's not being used in any real life situations right now, you're actually able
to read my heart rate by looking at my face through a pair of augmented reality glasses.
Tell me more about this. What are the possibilities of this research down the road?
Yeah, so I've been working on this area of remote or non-contact physiological measurement for a
while, and this is the idea that a regular webcam, just a camera that
might be on your cell phone or on your laptop, has the sensitivity to pick up very small changes in
the color of your skin (or, to be more accurate, in the light reflected from your skin), which are related
to blood flow. So actually, by analyzing the video stream from that camera, we can pick up your
pulse, we can pick up your respiration rate and your heart rate variability. And there's new work
showing you can measure blood oxygenation and other things. And people are trying to get towards
things like blood pressure. So just using a regular device, no adaptation to the hardware
and some software, we can recover this information.
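As a rough illustration of the signal processing behind this kind of remote photoplethysmography (a simplified sketch, not the actual CardioLens or MSR code), the mean green-channel value over a skin region can be tracked frame by frame, band-pass filtered to the plausible heart-rate range, and the dominant frequency read off as the pulse rate:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_heart_rate(green_means, fps):
    """Estimate pulse rate (beats per minute) from the per-frame mean
    green-channel intensity of a skin region of interest.

    green_means: 1-D sequence, one value per video frame
    fps: camera frame rate in frames per second
    """
    signal = np.asarray(green_means, dtype=float)
    signal -= signal.mean()                      # remove the DC component

    # Band-pass filter to the plausible heart-rate band (0.7-4 Hz = 42-240 bpm).
    nyquist = fps / 2.0
    b, a = butter(3, [0.7 / nyquist, 4.0 / nyquist], btype="band")
    filtered = filtfilt(b, a, signal)

    # The dominant frequency of the filtered signal gives the pulse rate.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0                        # Hz -> beats per minute
```

Published methods add more robust steps, such as chrominance-based or learned signal separation and careful skin segmentation, but the core idea is recovering a periodic blood-volume signal from subtle color changes.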
So what we did was to put the algorithm on the HoloLens, which has a camera that faces forward.
And so when you look at someone, it detects their face, it segments the skin,
it analyzes the color change and recovers the physiological information and then displays that
back in real time,
superimposed onto their appearance.
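The per-frame flow he describes might look roughly like the sketch below. The detector, segmenter, and overlay callbacks are hypothetical placeholders supplied by the caller, not real HoloLens APIs, and estimate_heart_rate is the function sketched above.

```python
from collections import deque

def run_pipeline(frames, fps, detect_face, segment_skin, overlay,
                 window_seconds=10):
    """Illustrative real-time loop: face detection -> skin segmentation ->
    color signal -> heart-rate estimate -> overlay on the wearer's view."""
    greens = deque(maxlen=int(window_seconds * fps))
    for frame in frames:                     # forward-facing camera frames
        face = detect_face(frame)            # placeholder face detector
        if face is None:
            continue
        skin = segment_skin(frame, face)     # array of skin-pixel RGB values
        greens.append(skin[:, 1].mean())     # spatial mean of the green channel
        if len(greens) == greens.maxlen:
            bpm = estimate_heart_rate(list(greens), fps)  # see sketch above
            overlay(face, f"{bpm:.0f} bpm")  # superimpose near the person's face
```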
How accurate is it?
So the technology can be very accurate. So we've done a lot of validation of this.
We can measure heart rate to within one or two beats per minute, typically on a regular video.
We're using deep learning to address this problem. And we've got really, really good results
on some of the hardest data sets that we've tried it on. When you take this out into the real world
where people are moving around and the lighting's changing and you can't control if they're making
facial expressions or speaking or, you know, it starts to become more challenging. But that's the
type of data that we're pushing towards addressing. So we want to
start designing methods that are robust to all of those variations that you'd actually see in a real
life application. But normally you would measure someone's blood pressure or their pulse in a
clinical setting. I mean, you wouldn't necessarily. It would be a tool maybe for the medical community
first or the junior high boy who needs to know.
Exactly. That might be the biggest market. No, I think one application I am particularly excited
about is in medical applications, for instance, surgery. So you could see if a particular part
of the body has good or poor blood flow. And that could be important in transplant operations
where you're attaching a new part of the body, a new organ,
and you need to know if blood is flowing to that particular part of the body.
And with a heads-up display,
a surgeon could potentially look at that region
and see if there is a blood flow signal.
But there are other applications too. Another
example would be, for instance, being able to scan a scene and identify if there's someone who's
alive, for instance, in a search and rescue application. And this also works with infrared
cameras. So even if it's dark, we can still measure the signal. And, you know, there are
other things like baby monitors, or ICUs in hospitals
monitoring physiological information without having to have people wired up to lots of
different sensors. We can just use a camera to do that. Every single show, I end up shaking my head.
No one can see it happening, but it's like, really, this is happening? I can't believe it.
It's amazing. Talk about the trade-offs between the
promises these technologies make and some of the concerns, very real concerns about privacy
of the data. Yeah, as I mentioned before, I think it's really important that we design this
technology appropriately. And I think that's where we'll see the biggest benefits. The benefits are
when people recognize, oh, this is something that
actually helps me in my everyday life or helps in a specific application like healthcare.
There are definitely big challenges to privacy because a lot of what we need to do to deploy
this technology is to be able to sense information longitudinally on a large scale, because everyone experiences emotion differently.
You can't just take 10 people and train a system on 10 people
that will generalize to the whole population.
And so we do need to overcome that challenge of, you know,
how do we make this technology such that people feel comfortable with it,
they trust it, and it doesn't lead them to feel as though their privacy is violated or that it's too obtrusive. And so I think, in terms of design challenges, it's about designing ways for people to be aware that the technology's on, that it's there, what it's measuring, what it's doing with that data. And these are problems that are, as of yet, unsolved.
You said you prefer social norms over governmental regulations or legal remedies. So what's the
balance between the responsibilities of scientists, engineers, and programmers here
versus big regulatory initiatives like GDPR in Europe and other things that might be coming down
the pike? I think both are important, but the reason I prefer focusing on social norms is because as
a designer, as an engineer, that's something I can actively influence every day in my job.
So I can think about, okay, I'm going to design this sensor system that people are going to choose
to use to capture their emotions, and it's going to create an experience that adapts to how they're feeling. I can choose how to design that, and I can
influence the social norms around that technology. So being kind of a leader in the research space
allows me to do that actively, regularly. And I don't think we can necessarily rely 100% on
government or regulation to solve that piece of the puzzle.
A good part of being at MSR is that we're very involved with the academic community.
I'm involved with the Future of Computing Academy at the ACM. And our task group within that
organization is to think about the ethical questions around AI, not just in affective
computing technology,
but just broadly with machine learning and AI technology that can make decisions about
important things like, for instance, healthcare or justice.
I think social norms and governmental regulation both serve a purpose there.
But one of the things I personally can actively work towards on a daily basis is thinking through what do I ask people to give up in terms of data and what do they get back for
that and how is that data used? And that's something I'm really, really interested in. Let's talk about you for a second.
I'm curious how you got interested in the emotional side of technology and how you ended up at MSR.
Who were your influences, your inspirations, your mentors?
So I did my master's at Cambridge University and was focused on machine learning.
But I was very interested in how I could address more social problems with that technology, not just focus on predicting
stock market prices or some of the sort of numerical analyses that are often solved using
machine learning algorithms. I wanted to see how this technology could actually help people.
And at the time, my advisor for my PhD, Rosalind Picard, who's one of the founders of this
field, was working a lot with applications for people on the autism spectrum, for whom
understanding emotions is a complex task and often a big challenge in social situations.
And that was one of the reasons I joined that lab: I really believed in the potential benefits of affective computing technology, not just to one portion of the population, but to everyone. I could see how it could benefit my life as well. So that's how I got into it. And, you know, it's becoming more true now, but certainly 10 years ago, there was no technology you could really think of that responded to or understood human emotion.
No, even now.
Even now, I mean, yeah, we're getting there in research, but there's not many real-life applications you could point towards and say, oh, this is an example of a system that really understands nonverbal or emotional cues.
Right. So what was your path from Cambridge and Rosalind Picard to here?
So I went to the MIT Media Lab, where I did my PhD. And there I worked a lot on large-scale
data analysis to do with understanding emotions in real-world contexts. And then I worked for a
couple of years at a startup and joined MSR out of that and now lead the affective computing technology development within research here.
That's really cool.
So as we wrap up, Daniel, what thoughts or advice would you leave with our listeners, many of whom are aspiring researchers who might have an interest in human-computer interaction or affective computing?
What lines of research are interesting right now?
What might augment, to use an industry term, the field?
If I were to summarize the areas that I think are most important,
the first would be multimodal understanding.
So in the past, a lot of the systems that have been built
have focused just on one piece of information,
like, for instance, facial expressions or voice tone or text.
But to really understand emotions, you have to integrate all that information together.
Because if I just look at facial expressions, you know, if I were to show you a video of someone without the audio
and without the information about what they were saying, it would be hard to interpret exactly how they felt. Or many people have probably experienced
being on a phone call where they haven't been able to exactly understand how someone was feeling
because you've only got the voice tone and language to rely on. You don't have all of that
visual information about their gestures and facial expressions and body posture.
So I think multimodal understanding
is really important.
Another area that I'm particularly interested in
is something we've touched on already,
which is deploying this in the real world.
So how do we take these experiments that have typically
been performed in labs in research environments
where you bring 10 or 20 people in,
and you get them to experience a system, and you evaluate it, which is fine for controlled studies. But ultimately, if we're going to evaluate
the real system and how people actually respond to it in their everyday lives, we need to deploy it.
And so that's something we're focused on is really designing things that are so seamless that people
can use them without them being a burden. And we can start to mine this data that occurs in everyday contexts.
Daniel McDuff, it's been fascinating talking to you.
I wish there was more time, but thanks for coming in.
Thank you very much. It's a pleasure to be here.
To learn more about Dr. Daniel McDuff's work
and find out how machine learning can help
you improve your relationship with your computer, visit Microsoft.com slash research.