Microsoft Research Podcast - 051 (rerun) - When Psychology Meets Technology with Dr. Daniel McDuff
Episode Date: November 21, 2018. This episode first aired in March 2018. One of the most intriguing areas of machine learning research is affective computing, where scientists are working to bridge the gap between human emotions and computers. It is here, at the intersection of psychology and computer science, that we find Dr. Daniel McDuff, who has been designing systems, from hardware to algorithms, that can sense human behavior and respond to human emotions. Today, Dr. McDuff talks about why we need computers to understand us, outlines the pros and cons of designing emotionally sentient agents, explains the technology behind CardioLens, a pair of augmented reality glasses that can take your heart rate by looking at your face, and addresses the challenges of maintaining trust and privacy when we're surrounded by devices that want to know not just what we're doing, but how we're feeling.
Transcript
When I interviewed Daniel McDuff earlier this year, we talked about how his research is enabling machines to read human emotions by sensing physiological changes.
Turns out this work has implications beyond our feelings.
A new study on the role of emotion in problem solving suggests even the coldest of human calculations is tempered by emotion. Whether you heard Daniel's podcast back in March, or you're getting in touch with emotive computing
for the first time,
I know you'll enjoy episode 17
of the Microsoft Research Podcast,
When Psychology Meets Technology.
We've developed a system that allows people
to look at another individual
and see physiological responses of that person. So it's data they
wouldn't normally be able to see, but it's superimposed onto that other person so they
can actually see their heart beating. They can see changes in stress based on heart rate
variability. And that's all sensed remotely. But you're giving the individual a new sensory channel that they can leverage.
You're listening to the Microsoft Research Podcast, a show that brings you closer
to the cutting edge of technology research and the scientists behind it. I'm your host,
Gretchen Huizenga.
One of the most intriguing areas of machine learning research is affective computing,
where scientists are working to bridge the gap between human emotions and computers.
It is here at the intersection of psychology and computer science that we find Dr. Daniel
McDuff, who has been designing systems from hardware to algorithms that can sense human
behavior and respond to human emotions.
Today, Dr. McDuff talks about why we need computers to understand us,
outlines the pros and cons of designing emotionally sentient agents, explains the
technology behind CardioLens, a pair of augmented reality glasses that can take your heart rate
by looking at your face, and addresses
the challenges of maintaining trust and privacy when we're surrounded by devices that want
to know not just what we're doing, but how we're feeling.
That and much more on this episode of the Microsoft Research Podcast.
Daniel McDuff, welcome to the show today. Great to have you with us. It's great to be here.
So you're in Human-Computer Interaction, or HCI, and you situate your research at the intersection of computer science and psychology.
So tell us in broad strokes about HCI and what you do. So the crux of what I do is teaching machines to
understand people in a deeper way, and that involves capturing and responding to their
emotional state. So can we design a machine that really understands people, not just what they're
saying, but how they're saying it, how they're behaving. And I think that's really fundamental to human-computer interaction because so much of what we do as people is non-verbal,
it's not described in language. And a lot of computer systems don't understand that,
and that's the focus of my work, is bringing that EQ to technology.
EQ meaning emotional quotient. Yeah, that's a somewhat slang term, and it's used
frequently to contrast IQ, which is something that technology has a lot of. Technology can answer
lots of questions very quickly because it has access to all of the information on the internet,
but not much technology has EQ. No. Does any?
I think we're starting to see the beginning of this.
So you see social robots as a great example of systems which have some kind of personality.
They can express visually some basic facial expressions on a screen
or using sounds or lights.
Movies are a great example.
So R2-D2 is a great example of a system that doesn't have a face but can still communicate emotions. Although that's fictional,
we are starting to see systems in the real world that kind of behave in a somewhat similar way.
That's fascinating. I even think of Wallace and Gromit animation, where Gromit only communicates with his eyes and his
eyebrows, and yet you get almost everything that he wants to say through his eyes.
Exactly. And we take a lot of inspiration from animations and animators. Because I study facial
expressions, it's magical how some creators can show so much rich emotion just through a facial expression.
And as we design systems that recognize those and exhibit them, there's a lot we can learn
from that side of the world. I'm intrigued by the field of affective computing,
and I understand it aims to bridge the gap between human emotions and computational technology. So what
is affective computing? What does it promise? And what do we need computers to understand
us as human beings for? At a high level, affective computing is designing systems that
can read, interpret, and respond to human emotion. And that sounds like a daunting task.
There's a lot more we need to do in research
but we're starting to see real world systems
where this is true.
So systems where they can read facial expressions
for instance or understand the voice tone of someone
or look at sentiment for instance on Facebook posts
or Twitter to understand the emotions
that are being expressed.
And this is kind of where
the world is now, but in the future, we can imagine systems that use multimodal data,
robotic systems that interact with us in an embodied way that also sense this type of
information. And that's kind of the target we're focused on.
So why do you think we need computers to understand us? I think it's fundamental to how
we interact as human beings. And so when we interact with a system that doesn't do those
things that we take for granted, it can be off-putting. For instance, if a system doesn't
realize that I'm getting frustrated with it, it can be more frustrating. It can even be
upsetting. There's research showing that robots that can apologize are liked a lot more than
robots that don't, even if they're no better at completing the tasks they were intending to
complete. So it can really improve our relationship and our well-being because it fundamentally
improves the interaction we have with the devices around us.
So you co-authored an article called Designing Emotionally Sentient Agents.
And aside from the Hollywood connotations that phrase brings to mind,
what should we understand about this research, Daniel?
And what should we be excited about or concerned about?
I think there's a lot to be excited about in home care, in health care,
in understanding just human interaction even more.
If we can design systems to mimic some of those things, it will deepen our understanding of how we as humans behave.
There are a number of challenges that we need to overcome.
One is how do we sense this information?
And sensors can be intrusive.
Devices that are around us that are listening
for commands all the time are starting to appear, and you can imagine in future there could be camera systems as well. So we need to think about the social norms that exist around the sensing side of things: where does that data go? How is it stored? How do we know that the sensor's on? How do we control it? How do we stop
it from recording when we want to? And then how do we allow other people who are in our lives to
not be sensed, even if we're being sensed? Or if I invite someone into my home and I've got a device
that's always listening or always watching, what does that mean for our social interaction? So I
think there's some challenges to overcome there,
but there's also more philosophical challenges about how much do we teach computers of human
emotion? Is it possible for a machine ever to feel emotion? What does that mean? And how should
machines express emotion or respond to this information? We definitely don't want to design
systems that are manipulative
or that make people feel like the systems are more intelligent than they really are.
If someone sees a system that appears emotional,
they might think, wow, this is really, really intelligent,
even if it's only expressing very basic behaviors.
And that can be challenging because some of the other abilities of that system
might be quite weak.
And so people might trust it, even if it can't actually perform accurately the tasks it's trying
to do. You're using terms that are interesting, and interesting is a kind of placeholder word
for other words that I'm actually thinking, like sentient and understanding regarding a machine. And I wonder how I should interpret
that. What do people like you and your colleagues really believe about what you just addressed?
Can a machine ever feel? Can it really understand? Can it become sentient?
I think machines are fundamentally different to humans. Machines can recognize
some expressions of emotion. They can respond to them. But I don't think that that constitutes
feeling and emotion. Feeling and emotion requires experience. It requires a reward and a cost
associated with different actions. It's much, much more complex than that.
So I don't think machines will ever experience emotion
in the way that we do,
but they will have many of the sort of fundamental skills
that we have.
What can you tell us about the emerging field of what I would call artificial emotional intelligence or emotional technology? You use an example of a bathroom mirror that has
ambient intelligence and can tell whether I've slept well. Why do I need that?
That's a good question. I think it's important that we design systems that are ultimately beneficial to people.
And one of the roadblocks, especially in healthcare, is that there's so much rich data out there, but it's very hard to understand it or it's cumbersome to monitor it.
And so designing systems that make it seamless to be able to collect and understand that type of data is really important.
So at MIT, when I was a graduate student, we built a mirror that had a camera embedded in it.
It was actually hidden behind two-way glass, so all it looked like was just a regular mirror.
But when you looked in the mirror, the camera was using some remote sensing technology we built to measure the
heart rate of the person. And we can also measure things like heart rate variability, which is
correlated with stress. And so the mirror could then display that information back to the user.
So it's not just reflecting their outward appearance, but their sort of inner physiological state as well. And I found that really compelling because in many cases
we want to know that information, but we might not want to strap on a sensor or have to go out
of our way to collect it. And if it can be digitally captured by the devices we already use,
there's something quite compelling about that. Let's talk about the technical aspects of your
work for a bit. Much of it's centered on computer vision technologies and involves webcams and algorithms that aim to understand emotional states.
What's the field of computer vision founded on technically and what new developments are we seeing?
So computer vision is exploding.
Actually, you know, the past 10 years have been some of the most exciting in this domain with the invention of what's commonly called deep learning.
And so this is the ability to leverage huge amounts of data
to train systems that are much more accurate
than previous systems were.
So, for instance, we have object recognition,
text recognition, scene understanding
that's way more accurate than it used to be
because we have these systems that capture lots of complexities of the data
and because there's so much data they can learn from they get a really good
representation. And understanding facial expressions has also benefited from the
advances in this technology, as have a lot of other areas of affective computing, whether it's speech recognition or understanding vocal prosody and things like that. So there's a lot of advances that have happened that basically improve the underlying sensing. And I don't think, up until this point, we've really had the volume of data about emotions to go to the next level, where we can really understand,
okay, how do we build a system that actually knows what to do with these sensor inputs
with something that's as amorphous as emotion is and hard to define.
So the basis of what you're doing is on deep neural networks and machine learning models
that you're then applying to the affective domain.
Exactly, yeah. So we use deep learning for almost all of the sensing modalities we use,
whether it's vision-based or audio-based or language-based. And then that feeds into a
system which is taking sort of intermediate level information. For instance, does my facial
expression appear
positive or negative? Is my voice tone high energy or low energy? Is the language I'm using
hostile or serene? And then those intermediate states feed into a higher level understanding,
which is combined with context. So we need to know what's happening to interpret emotion. We can't just observe the person. We need to know the situation, the social context.
And so that's kind of where we're moving.
It's really to combine these sensor observations with more contextual information.
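For a concrete sense of the pipeline Daniel describes, here is a minimal sketch in Python. It is purely illustrative, not Microsoft's system: per-modality models (not shown here) would produce the intermediate-level signals, and the fusion rule and all names below are hypothetical.

```python
# A minimal, illustrative sketch of the pipeline described above: per-modality models
# produce intermediate-level signals, which are then combined with context to form a
# higher-level estimate of emotional state. All names and rules here are hypothetical.
from dataclasses import dataclass

@dataclass
class IntermediateSignals:
    face_valence: float    # -1 (negative expression) .. +1 (positive expression)
    voice_arousal: float   #  0 (low energy) .. 1 (high energy)
    text_hostility: float  #  0 (serene language) .. 1 (hostile language)

@dataclass
class Context:
    situation: str         # e.g. "watching a film", "in a meeting"
    social_setting: str    # e.g. "alone", "with colleagues"

def interpret(signals: IntermediateSignals, context: Context) -> str:
    """Toy fusion rule: combine modality-level cues, then qualify them with context.
    A real system would learn this mapping from labeled multimodal data."""
    valence = signals.face_valence - signals.text_hostility
    arousal = signals.voice_arousal

    if valence < -0.3 and arousal > 0.6:
        label = "frustrated or angry"
    elif valence < -0.3:
        label = "unhappy"
    elif valence > 0.3 and arousal > 0.6:
        label = "excited"
    elif valence > 0.3:
        label = "content"
    else:
        label = "neutral"

    # Context changes interpretation: the same cues can mean different things
    # in a meeting versus at home watching sports, for example.
    return f"{label} (while {context.situation}, {context.social_setting})"

if __name__ == "__main__":
    obs = IntermediateSignals(face_valence=-0.5, voice_arousal=0.8, text_hostility=0.7)
    ctx = Context(situation="using a help-desk chatbot", social_setting="alone")
    print(interpret(obs, ctx))
```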
So I wasn't going to ask you this.
It wasn't on my list.
But how do you gather data on emotions?
Do you have to bring people in and make them angry? I mean, it's a serious question in a funny way.
Yeah, in the past, that was how it was often done. But a lot of my work in the last few years
has been focusing on in situ large scale data collection. So we always ask people if they want
to opt in. And if they do, then we enable them to use a system which
is part of their everyday life. So this might be a system that runs on their computer or runs on
their cell phone and collects this data over time. Often we might prompt them throughout the day,
sort of, how are you feeling? Or we might say, you know, is this feeling that we think you're
feeling correct in order to get some kind of ground truth.
But ultimately, we want to be able to collect real life data about people's emotional experiences, because we know if they come to a lab, it's not exactly the same as how it would be in the real world.
One of the applications of this emotional technology is the workplace.
In an MIT Sloan Management Review
article, it claims that emotion sensing technologies could help employees make better
decisions, improve concentration, alleviate stress. So tell us how this works and give us
some examples of what it looks like. And then maybe tell me why I would want my boss to monitor
my eye movements, my facial expressions and my skin conductance?
So one example we give in that article is about a trader in Japan who unfortunately swapped the
number of shares they were selling and the price of the shares and got those two numbers the wrong
way around. And that ended up being a huge financial loss. And in high stress situations that can be really problematic.
Another example would be air traffic control, a very high stress job where people have to be
performing at a high level for the whole of the duration of their shift. And so if we can design
technology that is able to sense when people are becoming overloaded, too stressed to perform at the
level that they need to, we could give them that feedback. So for individuals, that could be very
helpful for knowing when they need to take a break. I myself in a job, you know, an average day,
it would be great if my computer knew when I was in flow and stopped interrupting me with
email notifications. Or if I needed to take a break,
it could suggest things that would help me relax
and make me more productive when I came back to my desk.
And then I think it will also benefit teams and organizations.
Knowing the well-being of your company
is a really important thing.
And we're starting to see the development of a real science around organizations, particularly focused on the social components.
Social capital is really important and emotion plays a big role in that.
Tell me what safeguards a designer or a developer might think about so that this technology doesn't become nanny cam in the workplace.
That's a really, really important
question. And I think as we design this technology, it's important that we design social norms around
how they're used. Ultimately, technology will advance. That's somewhat inevitable. But how we
use technology and the social norms that we design around it are not inevitable. So to give an
example, one of the practices we follow is always opt-in. So we always make sure that people choose
to switch on sensors rather than having it imposed upon them. Another example is, as we mentioned
before, allowing people to turn off sensors. And that's really important that people have that.
It increases their trust and comfort with the system a lot.
So these are a couple of examples
about social norms we can design around this technology.
And I think there are many more that will develop
as we advance the technology and think about use cases. Let's talk about reality for a bit. There's actual reality, which I have a passing
familiarity with, but also virtual reality, augmented reality, mixed reality. There's so
many realities. Give us a baseline definition of each of those different realities so we have a frame of reference for what I want to talk about next.
Great. So virtual reality is a completely alternative environment.
So this is where most people will probably be familiar with virtual reality in terms of the headsets with a screen where all the information that you see is displayed on that screen.
Then augmented reality is usually when you can see the real world, but there's some augmentation
of what you see. So there might be a transparent screen, which is actually displaying certain
objects which are superimposed on the real world. And then there's this idea of mixed reality,
which is really blurring the boundaries between virtual and augmented reality. So you are
leveraging much deeper understanding about the environment, as well as incorporating a lot
more augmentation. So let's go along that thread for a second here, because when you talk about
augmenting human perception through mixed or virtual reality, you suggest that VR might be
able to help people develop superhuman senses. So what are the possibilities and challenges even
of advancing human senses in this way? Yeah, so I mean, one of the things I find most fascinating about other areas of science like neuroscience is how adaptable we are, some of the ways that sensor inputs can influence
people's perception. And one example that we've developed is a system that allows people
to look at another individual and see physiological responses of that person. So it's data they
wouldn't normally be able to see, but it's superimposed onto that other person so they
can actually see their heart beating, they can see changes in stress based on heart rate
variability and that's all sensed remotely. But you're giving the individual a new sensory
channel that they can leverage, something that they wouldn't normally have.
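As an aside, the "stress based on heart rate variability" that Daniel mentions is typically summarized from the intervals between successive heartbeats. The sketch below shows one standard time-domain HRV metric, RMSSD, computed on synthetic beat times; it is a minimal illustration, not the CardioLens implementation.

```python
# A minimal illustration of a heart rate variability (HRV) summary. Given the times
# of successive heartbeats (in seconds), RMSSD is a standard time-domain HRV metric;
# lower HRV is commonly associated with higher stress. Not the CardioLens code.
import numpy as np

def rmssd_ms(beat_times_s: np.ndarray) -> float:
    """Root mean square of successive differences of inter-beat intervals, in ms."""
    ibis = np.diff(beat_times_s)   # inter-beat intervals (seconds)
    diffs = np.diff(ibis)          # successive differences
    return float(np.sqrt(np.mean(diffs ** 2)) * 1000.0)

def mean_heart_rate_bpm(beat_times_s: np.ndarray) -> float:
    """Average heart rate implied by the inter-beat intervals."""
    ibis = np.diff(beat_times_s)
    return float(60.0 / np.mean(ibis))

if __name__ == "__main__":
    # Synthetic beat times: roughly 70 bpm with a little beat-to-beat variability.
    rng = np.random.default_rng(0)
    ibis = 60.0 / 70.0 + rng.normal(0.0, 0.03, size=60)
    beats = np.cumsum(ibis)
    print(f"Heart rate: {mean_heart_rate_bpm(beats):.1f} bpm")
    print(f"RMSSD: {rmssd_ms(beats):.1f} ms")
```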
So this is like x-ray vision.
In a sense, yes. That's a good analogy. I mean, from the superhero realm.
Yeah, exactly.
So the idea of superhuman senses would be physiological senses that you wouldn't
normally be able to see, aside from somebody sweating or blushing or,
you know, their facial expressions, it's inside their bodies.
Exactly. Yeah. It's hidden information that wouldn't normally be accessible.
But using the new technology like, you know, high definition cameras and this augmented experience that we can create through the HoloLens headset,
we can allow you to see that information in real time.
So maybe now junior high kids can actually find out if someone's in love with them
just by putting on these glasses and they don't have to ask their friend to go ask
their other friend if he likes me. I've always wanted to build that demo
and just see how badly it fails. That would actually be a really
compelling application of the technology just to help the junior high kids.
So you're one of the creators of an application of this technology called CardioLens.
And while it's still in the early stages of research, and it's not being used in any real-life situations right now,
you're actually able to read my heart rate by looking at my face through a pair of augmented reality glasses.
Tell me more about this. What are the possibilities of this research down the road?
Yeah, so I've been working on this area of remote or non-contact physiological measurement for a
while. And this is the idea that a regular webcam, just a camera that might be on your cell phone or on your laptop, has the
sensitivity to pick up very small changes in the color of your skin or light reflected from your
skin, to be more accurate, which are related to blood flow. So actually, by analyzing the video stream to that camera, we can pick up your pulse, we can pick up your respiration rate and your heart rate variability. And there's new work showing you can measure blood oxygenation and other things, and people are trying to get towards things like blood pressure. So just using a regular device, no adaptation to the hardware, and some software, we can recover this information. So what we did was
to put the algorithm on the HoloLens,
which has a camera that faces forward.
And so when you look at someone, it detects their face,
it segments the skin, it analyzes the color change
and recovers the physiological information
and then displays that back in real time,
superimposed onto their appearance.
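The core signal-processing idea behind this kind of remote measurement can be sketched simply: average the skin pixels in each frame, band-pass filter that time series to plausible heart-rate frequencies, and take the dominant frequency as the pulse. The Python below is a simplified illustration of that description; face detection, skin segmentation, and the deep learning models Daniel mentions are omitted, the skin region is assumed to be given, and none of this is the actual CardioLens code.

```python
# A simplified sketch of remote photoplethysmography as described above: average the
# skin pixels in each frame, band-pass filter that time series to the plausible
# heart-rate band, and read off the dominant frequency as the pulse.
import numpy as np
from scipy.signal import butter, filtfilt

def pulse_from_roi_means(green_means: np.ndarray, fps: float) -> float:
    """Estimate heart rate (bpm) from the per-frame mean green value of a skin ROI."""
    signal = green_means - np.mean(green_means)   # remove the DC component

    # Band-pass to 0.7-4.0 Hz (roughly 42-240 bpm), where a pulse can plausibly lie.
    low, high = 0.7, 4.0
    b, a = butter(3, [low / (fps / 2), high / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, signal)

    # Dominant frequency via FFT -> heart rate in beats per minute.
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    power = np.abs(np.fft.rfft(filtered)) ** 2
    band = (freqs >= low) & (freqs <= high)
    return float(freqs[band][np.argmax(power[band])] * 60.0)

if __name__ == "__main__":
    # Synthetic example: a 72 bpm pulse buried in noise, sampled at 30 frames per second.
    fps, seconds, hr_hz = 30.0, 20, 1.2
    t = np.arange(int(fps * seconds)) / fps
    roi_green = 120 + 0.5 * np.sin(2 * np.pi * hr_hz * t) + np.random.normal(0, 0.3, t.size)
    print(f"Estimated heart rate: {pulse_from_roi_means(roi_green, fps):.1f} bpm")
```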
How accurate is it?
So the technology can be very accurate.
So we've done a lot of validation of this.
We can measure heart rate to within one or two beats per minute,
typically on a regular video.
We're using deep learning to address this problem.
And we've got really, really good results
on some of the hardest data sets that we've tried it on.
When you take this out into the real world, where people are moving around and the lighting's changing and you can't control if they're making facial expressions or speaking or, you know, it starts to become more challenging.
But that's the type of data that we're pushing towards addressing.
So we want to start designing methods that are robust to all of those variations that you'd actually see in a real life application. Junior high boy who needs to know. Exactly. Do you love me? That might be the biggest market.
No, I think one application I am particularly excited about is in medical applications, for instance, surgery.
So you could see if a particular part of the body has good or poor blood flow.
And that could be important in transplant operations where you're attaching a new part of the body, a new organ,
and you need to know if blood is flowing to that particular part of the body. And with a
heads-up display, a surgeon could potentially look at that region and see if there is a blood flow
signal. But there are other applications too. Another example would be, for instance, being able to scan a scene and identify if there's someone who's alive, in a search and rescue application. And this also
works with infrared cameras. So even if it's dark, we can still measure the signal. And, you know,
there are other things like baby monitors or in-hospital ICU units monitoring physiological
information without having to have people wired up to lots
of different sensors. We can just use a camera to do that. Every single show, I end up shaking my
head. No one can see it happening, but it's like, really, this is happening? I can't believe it.
It's amazing. Talk about the trade-offs between the promises these technologies make and some of the concerns, very real concerns
about privacy of the data? Yeah, as I mentioned before, I think it's really important that we
design this technology appropriately. And I think that's where we'll see the biggest benefits.
The benefits are when people recognize, oh, this is something that actually helps me in my everyday life or helps in a specific application like healthcare.
There are definitely big challenges to privacy because a lot of what we need to do to deploy this technology
is to be able to sense information longitudinally on a large scale because everyone experiences emotion differently.
You can't just take 10 people and train a system on 10 people
that will generalize to the whole population.
And so we do need to overcome that challenge of, you know,
how do we make this technology such that people feel comfortable with it,
they trust it, and not lead them to feel as though their privacy is violated
or that it's too obtrusive. And so
I think, in terms of design challenges, it's about designing ways for people to be aware that the technology's on, that it's there, what it's measuring, what it's doing with that data. And these are, as of yet, some unsolved problems.
You said you prefer social norms over governmental regulations or legal remedies.
So what's the balance between the responsibilities of scientists, engineers and programmers here
versus big regulatory initiatives like GDPR in Europe and other things that might be coming
down the pike?
I think both are important, but the reason I prefer focusing on social norms is because as a designer,
as an engineer, that's something I can actively influence every day in my job.
So I can think about, okay, so I'm going to design this sensor system that people are
going to choose to use and capture their emotions, and it's going to create an experience that
adapts to how they're feeling.
I can choose how to design that and I can
influence the social norms around that technology. So being kind of a leader in the research space
allows me to do that actively, regularly. And I don't think we can necessarily rely 100% on
government or regulation to solve that piece of the puzzle. A good part of being part of MSR is that we're very
involved with the academic community. I'm involved with the Future of Computing Academy at the ACM.
And our task group within that organization is to think about the ethical questions around AI,
not just in affective computing technology, but just broadly with machine learning and AI
technology that can
make decisions about important things like, for instance, healthcare or justice.
And I think social norms and governmental regulation both serve a purpose there.
But one of the things I personally can actively work towards on a daily basis is thinking
through what do I ask people to give up in terms of data and what do they get back for
that and how is that data used and that's something I'm really really interested in.
Let's talk about you for a second. I'm curious how you got interested in the emotional side of technology, not just the focus on predicting stock market prices or some of the sort of numerical analyses that are often solved using machine learning algorithms. I wanted to see how this technology could actually help people.
And at the time, my advisor for my PhD, Rosalind Picard, who's one of the founders of this field, was working a
lot with applications for people on the autism spectrum for whom understanding
emotions is a complex task and often a big challenge in social situations. And
that was one of the reasons that I joined that lab is because I really
believed in the potential benefits of affective computing technology, not just to one portion of the population, but to everyone.
I could see how it could benefit my life as well.
So that's how I got into it.
And, you know, it's becoming more true now, but certainly 10 years ago, there was no technology you could really think of that responded to or understood human emotion.
No, even now.
Even now, I mean, yeah, we're getting there in research, but there's not many
real-life applications you could point towards and say, oh, this is an example of a system that
really understands nonverbal or emotional cues. Right. So what was your path from
Cambridge and Rosalind Picard to here? So I went to the MIT Media Lab, where I did my PhD,
and there I worked a lot on large-scale data analysis to do with understanding emotions in
real-world contexts. And then I worked for a couple of years at a startup and joined MSR
out of that and now lead the affective computing technology development within research here.
That's really cool. So as we wrap up, Daniel, what thoughts or advice would you leave with
our listeners, many of whom are aspiring researchers who might have an interest in
human-computer interaction or affective computing? What lines of research are interesting right now?
What might augment, to use an industry term, the field?
If I were to summarize the areas that I think are most important, the first would be multimodal understanding. A lot of systems today rely on a single piece of information, like, for instance, facial expressions or voice tone or text, but to really understand emotions you have to integrate all that information together. Because if I just look at facial expressions, you know, if I were to show you a video of someone without the audio and without the information about what they were saying, it would be hard to interpret exactly how they felt. Or many people have probably experienced being on a phone call where they haven't been able to exactly understand how someone was feeling, because you've only got the voice tone and language to rely on; you don't have all of that visual information about their gestures and facial expressions and body posture. So I think multimodal understanding is really important.
Another area that I'm particularly interested in
is something we've touched on already, which is kind of deploying this in the real world. So how
do we take these experiments that have typically been performed in labs in research environments
where you bring 10 or 20 people in and you get them to experience a system and you evaluate it,
which is fine for controlled studies. But ultimately, if we're
going to evaluate the real system and how people actually respond to it in their everyday lives,
we need to deploy it. And so that's something we're focused on is really designing things
that are so seamless that people can use them without them being a burden. And we can start to
mine this data that occurs in everyday contexts.
Daniel McDuff, it's been fascinating talking to you.
I wish there was more time, but thanks for coming in.
Thank you very much. It's a pleasure to be here.
To learn more about Dr. Daniel McDuff's work
and find out how machine learning can help you improve your relationship with your computer,
visit microsoft.com slash research.