Microsoft Research Podcast - 051 (rerun) - When Psychology Meets Technology with Dr. Daniel McDuff
Episode Date: November 21, 2018. This episode first aired in March 2018. One of the most intriguing areas of machine learning research is affective computing, where scientists are working to bridge the gap between human emotions and computers. It is here, at the intersection of psychology and computer science, that we find Dr. Daniel McDuff, who has been designing systems, from hardware to algorithms, that can sense human behavior and respond to human emotions. Today, Dr. McDuff talks about why we need computers to understand us, outlines the pros and cons of designing emotionally sentient agents, explains the technology behind CardioLens, a pair of augmented reality glasses that can take your heart rate by looking at your face, and addresses the challenges of maintaining trust and privacy when we're surrounded by devices that want to know not just what we're doing, but how we're feeling.
Transcript
When I interviewed Daniel McDuff earlier this year, we talked about how his research is enabling machines to read human emotions by sensing physiological changes.
Turns out this work has implications beyond our feelings.
A new study on the role of emotion in problem solving suggests even the coldest of human calculations is tempered by emotion. Whether you heard Daniel's podcast back in March, or you're getting in touch with emotive computing
for the first time,
I know you'll enjoy episode 17
of the Microsoft Research Podcast,
When Psychology Meets Technology.
We've developed a system that allows people
to look at another individual
and see physiological responses of that person. So it's data they
wouldn't normally be able to see, but it's superimposed onto that other person so they
can actually see their heart beating. They can see changes in stress based on heart rate
variability. And that's all sensed remotely. But you're giving the individual a new sensory channel that they can leverage.
You're listening to the Microsoft Research Podcast, a show that brings you closer
to the cutting edge of technology research and the scientists behind it. I'm your host,
Gretchen Huizenga.
One of the most intriguing areas of machine learning research is affective computing,
where scientists are working to bridge the gap between human emotions and computers.
It is here at the intersection of psychology and computer science that we find Dr. Daniel
McDuff, who has been designing systems from hardware to algorithms that can sense human
behavior and respond to human emotions.
Today, Dr. McDuff talks about why we need computers to understand us,
outlines the pros and cons of designing emotionally sentient agents, explains the
technology behind CardioLens, a pair of augmented reality glasses that can take your heart rate
by looking at your face, and addresses
the challenges of maintaining trust and privacy when we're surrounded by devices that want
to know not just what we're doing, but how we're feeling.
That and much more on this episode of the Microsoft Research Podcast.
Daniel McDuff, welcome to the show today. Great to have you with us. It's great to be here.
So you're in Human-Computer Interaction, or HCI, and you situate your research at the intersection of computer science and psychology.
So tell us in broad strokes about HCI and what you do. So the crux of what I do is teaching machines to
understand people in a deeper way, and that involves capturing and responding to their
emotional state. So can we design a machine that really understands people, not just what they're
saying, but how they're saying it, how they're behaving. And I think that's really fundamental to human-computer interaction because so much of what we do as people is non-verbal,
it's not described in language. And a lot of computer systems don't understand that,
and that's the focus of my work, is bringing that EQ to technology.
EQ meaning emotional quotient. Yeah, that's a somewhat slang term, and it's used
frequently to contrast IQ, which is something that technology has a lot of. Technology can answer
lots of questions very quickly because it has access to all of the information on the internet,
but not much technology has EQ. No. Does any?
I think we're starting to see the beginning of this.
So you see social robots as a great example of systems which have some kind of personality.
They can express visually some basic facial expressions on a screen
or using sounds or lights.
Movies are a great example.
So R2-D2 is a great example of a system that doesn't have a face but can still communicate emotions. Although that's fictional,
we are starting to see systems in the real world that kind of behave in a somewhat similar way.
That's fascinating. I even think of Wallace and Gromit animation, where Gromit only communicates with his eyes and his
eyebrows, and yet you get almost everything that he wants to say through his eyes.
Exactly. And we take a lot of inspiration from animations and animators. Because I study facial
expressions, it's magical how some creators can show so much rich emotion just through a facial expression.
And as we design systems that recognize those and exhibit them, there's a lot we can learn
from that side of the world. I'm intrigued by the field of affective computing,
and I understand it aims to bridge the gap between human emotions and computational technology. So what
is affective computing? What does it promise? And what do we need computers to understand
us as human beings for? At a high level, affective computing is designing systems that
can read, interpret, and respond to human emotion. And that sounds like a daunting task.
There's a lot more we need to do in research
but we're starting to see real world systems
where this is true.
So systems where they can read facial expressions
for instance or understand the voice tone of someone
or look at sentiment for instance on Facebook posts
or Twitter to understand the emotions
that are being expressed.
And this is kind of where
the world is now, but in the future, we can imagine systems that use multimodal data,
robotic systems that interact with us in an embodied way that also sense this type of
information. And that's kind of the target we're focused on.
So why do you think we need computers to understand us? I think it's fundamental to how
we interact as human beings. And so when we interact with a system that doesn't do those
things that we take for granted, it can be off-putting. For instance, if a system doesn't
realize that I'm getting frustrated with it, it can be more frustrating. It can even be
upsetting. There's research showing that robots that can apologize are liked a lot more than
robots that don't, even if they're no better at completing the tasks they were intending to
complete. So it can really improve our relationship and our well-being because it fundamentally
improves the interaction we have with the devices around us.
So you co-authored an article called Designing Emotionally Sentient Agents.
And aside from the Hollywood connotations that phrase brings to mind,
what should we understand about this research, Daniel?
And what should we be excited about or concerned about?
I think there's a lot to be excited about in home care, in health care,
in understanding just human interaction even more.
If we can design systems to mimic some of those things, it will deepen our understanding of how we as humans behave.
There are a number of challenges that we need to overcome.
One is how do we sense this information?
And sensors can be intrusive.
Devices that are around us that are listening
for commands all the time are starting to appear, and you can imagine in future there could be camera systems as well. So we need to think about the social norms that exist around the sensing side of things: where does that data go? How is it stored? How do we know that the sensor's on? How do we control it? How do we stop
it from recording when we want to? And then how do we allow other people who are in our lives to
not be sensed, even if we're being sensed? Or if I invite someone into my home and I've got a device
that's always listening or always watching, what does that mean for our social interaction? So I
think there's some challenges to overcome there,
but there's also more philosophical challenges about how much do we teach computers of human
emotion? Is it possible for a machine ever to feel emotion? What does that mean? And how should
machines express emotion or respond to this information? We definitely don't want to design
systems that are manipulative
or that make people feel like the systems are more intelligent than they really are.
If someone sees a system that appears emotional,
they might think, wow, this is really, really intelligent,
even if it's only expressing very basic behaviors.
And that can be challenging because some of the other abilities of that system
might be quite weak.
And so people might trust it, even if it can't actually perform accurately the tasks it's trying
to do. You're using terms that are interesting, and interesting is a kind of placeholder word
for other words that I'm actually thinking, like sentient and understanding regarding a machine. And I wonder how I should interpret
that. What do people like you and your colleagues really believe about what you just addressed?
Can a machine ever feel? Can it really understand? Can it become sentient?
I think machines are fundamentally different to humans. Machines can recognize
some expressions of emotion. They can respond to them. But I don't think that that constitutes
feeling and emotion. Feeling and emotion requires experience. It requires a reward and a cost
associated with different actions. It's much, much more complex than that.
So I don't think machines will ever experience emotion
in the way that we do,
but they will have many of the sort of fundamental skills
that we have.
What can you tell us about the emerging field of what I would call artificial emotional intelligence or emotional technology? You use an example of a bathroom mirror that has
ambient intelligence and can tell whether I've slept well. Why do I need that?
That's a good question. I think it's important that we design systems that are ultimately beneficial to people.
And one of the roadblocks, especially in healthcare, is that there's so much rich data out there, but it's very hard to understand it or it's cumbersome to monitor it.
And so designing systems that make it seamless to be able to collect and understand that type of data is really important.
So at MIT, when I was a graduate student, we built a mirror that had a camera embedded in it.
It was actually hidden behind two-way glass, so all it looked like was just a regular mirror.
But when you looked in the mirror, the camera was using some remote sensing technology we built to measure the
heart rate of the person. And we can also measure things like heart rate variability, which is
correlated with stress. And so the mirror could then display that information back to the user.
So it's not just reflecting their outward appearance, but their sort of inner physiological state as well. And I found that really compelling because in many cases
we want to know that information, but we might not want to strap on a sensor or have to go out
of our way to collect it. And if it can be digitally captured by the devices we already use,
there's something quite compelling about that. Let's talk about the technical aspects of your
work for a bit. Much of it's centered on computer vision technologies and involves webcams and algorithms that aim to understand emotional states.
What's the field of computer vision founded on technically and what new developments are we seeing?
So computer vision is exploding.
Actually, you know, the past 10 years have been some of the most exciting in this domain with the invention of what's commonly called deep learning.
And so this is the ability to leverage huge amounts of data
to train systems that are much more accurate
than previous systems were.
So, for instance, we have object recognition,
text recognition, scene understanding
that's way more accurate than it used to be
because we have these systems that capture lots of complexities of the data
and because there's so much data they can learn from they get a really good
representation. And understanding facial expressions has also benefited from the
advances in this technology, as have a lot of other areas of affective computing, whether it's speech recognition or understanding vocal prosody and things like that. So there's a lot of advances that have happened that basically improve the underlying sensing. And I don't think, up until this point, we've really had the volume of data about emotions to go to the next level, where we can really understand,
okay, how do we build a system that actually knows what to do with these sensor inputs
with something that's as amorphous as emotion is and hard to define.
So the basis of what you're doing is on deep neural networks and machine learning models
that you're then applying to the affective domain.
Exactly, yeah. So we use deep learning for almost all of the sensing modalities we use,
whether it's vision-based or audio-based or language-based. And then that feeds into a
system which is taking sort of intermediate level information. For instance, does my facial
expression appear
positive or negative? Is my voice tone high energy or low energy? Is the language I'm using
hostile or serene? And then those intermediate states feed into a higher level understanding,
which is combined with context. So we need to know what's happening to interpret emotion. We can't just observe the person. We need to know the situation, the social context.
And so that's kind of where we're moving.
It's really to combine these sensor observations with more contextual information.
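For a concrete sense of the pipeline Daniel describes, here is a minimal sketch in Python. It is purely illustrative, not Microsoft's system: per-modality models (not shown here) would produce the intermediate-level signals, and the fusion rule and all names below are hypothetical.

```python
# A minimal, illustrative sketch of the pipeline described above: per-modality models
# produce intermediate-level signals, which are then combined with context to form a
# higher-level estimate of emotional state. All names and rules here are hypothetical.
from dataclasses import dataclass

@dataclass
class IntermediateSignals:
    face_valence: float    # -1 (negative expression) .. +1 (positive expression)
    voice_arousal: float   #  0 (low energy) .. 1 (high energy)
    text_hostility: float  #  0 (serene language) .. 1 (hostile language)

@dataclass
class Context:
    situation: str         # e.g. "watching a film", "in a meeting"
    social_setting: str    # e.g. "alone", "with colleagues"

def interpret(signals: IntermediateSignals, context: Context) -> str:
    """Toy fusion rule: combine modality-level cues, then qualify them with context.
    A real system would learn this mapping from labeled multimodal data."""
    valence = signals.face_valence - signals.text_hostility
    arousal = signals.voice_arousal

    if valence < -0.3 and arousal > 0.6:
        label = "frustrated or angry"
    elif valence < -0.3:
        label = "unhappy"
    elif valence > 0.3 and arousal > 0.6:
        label = "excited"
    elif valence > 0.3:
        label = "content"
    else:
        label = "neutral"

    # Context changes interpretation: the same cues can mean different things
    # in a meeting versus at home watching sports, for example.
    return f"{label} (while {context.situation}, {context.social_setting})"

if __name__ == "__main__":
    obs = IntermediateSignals(face_valence=-0.5, voice_arousal=0.8, text_hostility=0.7)
    ctx = Context(situation="using a help-desk chatbot", social_setting="alone")
    print(interpret(obs, ctx))
```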
So I wasn't going to ask you this.
It wasn't on my list.
But how do you gather data on emotions?
Do you have to bring people in and make them angry? I mean, it's a serious question in a funny way.
Yeah, in the past, that was how it was often done. But a lot of my work in the last few years
has been focusing on in situ large scale data collection. So we always ask people if they want
to opt in. And if they do, then we enable them to use a system which
is part of their everyday life. So this might be a system that runs on their computer or runs on
their cell phone and collects this data over time. Often we might prompt them throughout the day,
sort of, how are you feeling? Or we might say, you know, is this feeling that we think you're
feeling correct in order to get some kind of ground truth.
But ultimately, we want to be able to collect real life data about people's emotional experiences, because we know if they come to a lab, it's not exactly the same as how it would be in the real world.
One of the applications of this emotional technology is the workplace.
In an MIT Sloan Management Review
article, it claims that emotion sensing technologies could help employees make better
decisions, improve concentration, alleviate stress. So tell us how this works and give us
some examples of what it looks like. And then maybe tell me why I would want my boss to monitor
my eye movements, my facial expressions and my skin conductance?
So one example we give in that article is about a trader in Japan who unfortunately swapped the
number of shares they were selling and the price of the shares and got those two numbers the wrong
way around. And that ended up being a huge financial loss. And in high stress situations that can be really problematic.
Another example would be air traffic control, a very high stress job where people have to be
performing at a high level for the whole of the duration of their shift. And so if we can design
technology that is able to sense when people are becoming overloaded, too stressed to perform at the
level that they need to, we could give them that feedback. So for individuals, that could be very
helpful for knowing when they need to take a break. I myself in a job, you know, an average day,
it would be great if my computer knew when I was in flow and stopped interrupting me with
email notifications. Or if I needed to take a break,
it could suggest things that would help me relax
and make me more productive when I came back to my desk.
And then I think it will also benefit teams and organizations.
Knowing the well-being of your company
is a really important thing.
And we're starting to see the development of a real science around organizations, particularly focused on the social components.
Social capital is really important and emotion plays a big role in that.
Tell me what safeguards a designer or a developer might think about so that this technology doesn't become nanny cam in the workplace.
That's a really, really important
question. And I think as we design this technology, it's important that we design social norms around
how they're used. Ultimately, technology will advance. That's somewhat inevitable. But how we
use technology and the social norms that we design around it are not inevitable. So to give an
example, one of the practices we follow is always opt-in. So we always make sure that people choose
to switch on sensors rather than having it imposed upon them. Another example is, as we mentioned
before, allowing people to turn off sensors. And that's really important that people have that.
It increases their trust and comfort with the system a lot.
So these are a couple of examples
about social norms we can design around this technology.
And I think there are many more that will develop
as we advance the technology and think about use cases. Let's talk about reality for a bit. There's actual reality, which I have a passing
familiarity with, but also virtual reality, augmented reality, mixed reality. There's so
many realities. Give us a baseline definition of each of those different realities so we have a frame of reference for what I want to talk about next.
Great. So virtual reality is a completely alternative environment.
So this is where most people will probably be familiar with virtual reality in terms of the headsets with a screen where all the information that you see is displayed on that screen.
Then augmented reality is usually when you can see the real world, but there's some augmentation
of what you see. So there might be a transparent screen, which is actually displaying certain
objects which are superimposed on the real world. And then there's this idea of mixed reality,
which is really blurring the boundaries between virtual and augmented reality. So you are
leveraging much deeper understanding about the environment, as well as incorporating a lot
more augmentation. So let's go along that thread for a second here, because when you talk about
augmenting human perception through mixed or virtual reality, you suggest that VR might be
able to help people develop superhuman senses. So what are the possibilities and challenges even
of advancing human senses in this way? Yeah, so I mean, one of the things I find most fascinating about other areas of science like neuroscience is how adaptable we are, some of the ways that sensor inputs can influence
people's perception. And one example that we've developed is a system that allows people
to look at another individual and see physiological responses of that person. So it's data they
wouldn't normally be able to see, but it's superimposed onto that other person so they
can actually see their heart beating, they can see changes in stress based on heart rate
variability and that's all sensed remotely. But you're giving the individual a new sensory
channel that they can leverage, something that they wouldn't normally have.
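As an aside, the "stress based on heart rate variability" that Daniel mentions is typically summarized from the intervals between successive heartbeats. The sketch below shows one standard time-domain HRV metric, RMSSD, computed on synthetic beat times; it is a minimal illustration, not the CardioLens implementation.

```python
# A minimal illustration of a heart rate variability (HRV) summary. Given the times
# of successive heartbeats (in seconds), RMSSD is a standard time-domain HRV metric;
# lower HRV is commonly associated with higher stress. Not the CardioLens code.
import numpy as np

def rmssd_ms(beat_times_s: np.ndarray) -> float:
    """Root mean square of successive differences of inter-beat intervals, in ms."""
    ibis = np.diff(beat_times_s)   # inter-beat intervals (seconds)
    diffs = np.diff(ibis)          # successive differences
    return float(np.sqrt(np.mean(diffs ** 2)) * 1000.0)

def mean_heart_rate_bpm(beat_times_s: np.ndarray) -> float:
    """Average heart rate implied by the inter-beat intervals."""
    ibis = np.diff(beat_times_s)
    return float(60.0 / np.mean(ibis))

if __name__ == "__main__":
    # Synthetic beat times: roughly 70 bpm with a little beat-to-beat variability.
    rng = np.random.default_rng(0)
    ibis = 60.0 / 70.0 + rng.normal(0.0, 0.03, size=60)
    beats = np.cumsum(ibis)
    print(f"Heart rate: {mean_heart_rate_bpm(beats):.1f} bpm")
    print(f"RMSSD: {rmssd_ms(beats):.1f} ms")
```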
So this is like x-ray vision.
In a sense, yes. That's a good analogy. I mean, from the superhero realm.
Yeah, exactly.
So the idea of superhuman senses would be physiological senses that you wouldn't
normally be able to see, aside from somebody sweating or blushing or,
you know, their facial expressions, it's inside their bodies.
Exactly. Yeah. It's hidden information that wouldn't normally be accessible.
But using the new technology like, you know, high definition cameras and this augmented experience that we can create through the HoloLens headset,
we can allow you to see that information in real time.
So maybe now junior high kids can actually find out if someone's in love with them
just by putting on these glasses and they don't have to ask their friend to go ask
their other friend if he likes me. I've always wanted to build that demo
and just see how badly it fails. That would actually be a really
compelling application of the technology just to help the junior high kids.
So you're one of the creators of an application of this technology called CardioLens.
And while it's still in the early stages of research, and it's not being used in any real-life situations right now,
you're actually able to read my heart rate by looking at my face through a pair of augmented reality glasses.
Tell me more about this. What are the possibilities of this research down the road?
Yeah, so I've been working on this area of remote or non-contact physiological measurement for a
while. And this is the idea that a regular webcam, just a camera that might be on your cell phone or on your laptop, has the
sensitivity to pick up very small changes in the color of your skin or light reflected from your
skin, to be more accurate, which are related to blood flow. So actually, by analyzing the video stream to that camera, we can pick up your pulse, we can pick up your respiration rate and your heart rate variability. And there's new work showing you can measure blood oxygenation and other things, and people are trying to get towards things like blood pressure. So just using a regular device, no adaptation to the hardware, and some software, we can recover this information. So what we did was
to put the algorithm on the HoloLens,
which has a camera that faces forward.
And so when you look at someone, it detects their face,
it segments the skin, it analyzes the color change
and recovers the physiological information
and then displays that back in real time,
superimposed onto their appearance.
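The core signal-processing idea behind this kind of remote measurement can be sketched simply: average the skin pixels in each frame, band-pass filter that time series to plausible heart-rate frequencies, and take the dominant frequency as the pulse. The Python below is a simplified illustration of that description; face detection, skin segmentation, and the deep learning models Daniel mentions are omitted, the skin region is assumed to be given, and none of this is the actual CardioLens code.

```python
# A simplified sketch of remote photoplethysmography as described above: average the
# skin pixels in each frame, band-pass filter that time series to the plausible
# heart-rate band, and read off the dominant frequency as the pulse.
import numpy as np
from scipy.signal import butter, filtfilt

def pulse_from_roi_means(green_means: np.ndarray, fps: float) -> float:
    """Estimate heart rate (bpm) from the per-frame mean green value of a skin ROI."""
    signal = green_means - np.mean(green_means)   # remove the DC component

    # Band-pass to 0.7-4.0 Hz (roughly 42-240 bpm), where a pulse can plausibly lie.
    low, high = 0.7, 4.0
    b, a = butter(3, [low / (fps / 2), high / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, signal)

    # Dominant frequency via FFT -> heart rate in beats per minute.
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    power = np.abs(np.fft.rfft(filtered)) ** 2
    band = (freqs >= low) & (freqs <= high)
    return float(freqs[band][np.argmax(power[band])] * 60.0)

if __name__ == "__main__":
    # Synthetic example: a 72 bpm pulse buried in noise, sampled at 30 frames per second.
    fps, seconds, hr_hz = 30.0, 20, 1.2
    t = np.arange(int(fps * seconds)) / fps
    roi_green = 120 + 0.5 * np.sin(2 * np.pi * hr_hz * t) + np.random.normal(0, 0.3, t.size)
    print(f"Estimated heart rate: {pulse_from_roi_means(roi_green, fps):.1f} bpm")
```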
How accurate is it?
So the technology can be very accurate.
So we've done a lot of validation of this.
We can measure heart rate to within one or two beats per minute,
typically on a regular video.
We're using deep learning to address this problem.
And we've got really, really good results
on some of the hardest data sets that we've tried it on.
When you take this out into the real world, where people are moving around and the lighting's changing and you can't control if they're making facial expressions or speaking or, you know, it starts to become more challenging.
But that's the type of data that we're pushing towards addressing.
So we want to start designing methods that are robust to all of those variations that you'd actually see in a real life application. Junior high boy who needs to know. Exactly. Do you love me? That might be the biggest market.
No, I think one application I am particularly excited about is in medical applications, for instance, surgery.
So you could see if a particular part of the body has good or poor blood flow.
And that could be important in transplant operations where you're attaching a new part of the body, a new organ,
and you need to know if blood is flowing to that particular part of the body. And with a
heads-up display, a surgeon could potentially look at that region and see if there is a blood flow
signal. But there are other applications too. Another example would be, for instance, being able to scan a scene and identify if there's someone who's alive, in a search and rescue application. And this also
works with infrared cameras. So even if it's dark, we can still measure the signal. And, you know,
there are other things like baby monitors or in-hospital ICU units monitoring physiological
information without having to have people wired up to lots
of different sensors. We can just use a camera to do that. Every single show, I end up shaking my
head. No one can see it happening, but it's like, really, this is happening? I can't believe it.
It's amazing. Talk about the trade-offs between the promises these technologies make and some of the concerns, very real concerns
about privacy of the data? Yeah, as I mentioned before, I think it's really important that we
design this technology appropriately. And I think that's where we'll see the biggest benefits.
The benefits are when people recognize, oh, this is something that actually helps me in my everyday life or helps in a specific application like healthcare.
There are definitely big challenges to privacy because a lot of what we need to do to deploy this technology
is to be able to sense information longitudinally on a large scale because everyone experiences emotion differently.
You can't just take 10 people and train a system on 10 people
that will generalize to the whole population.
And so we do need to overcome that challenge of, you know,
how do we make this technology such that people feel comfortable with it,
they trust it, and not lead them to feel as though their privacy is violated
or that it's too obtrusive. And so
I think, in terms of design challenges, it's about designing ways for people to be aware that the technology's on, that it's there, what it's measuring, what it's doing with that data. And these are, as of yet, some unsolved problems.
You said you prefer social norms over governmental regulations or legal remedies.
So what's the balance between the responsibilities of scientists, engineers and programmers here
versus big regulatory initiatives like GDPR in Europe and other things that might be coming
down the pike?
I think both are important, but the reason I prefer focusing on social norms is because as a designer,
as an engineer, that's something I can actively influence every day in my job.
So I can think about, okay, so I'm going to design this sensor system that people are
going to choose to use and capture their emotions, and it's going to create an experience that
adapts to how they're feeling.
I can choose how to design that and I can
influence the social norms around that technology. So being kind of a leader in the research space
allows me to do that actively, regularly. And I don't think we can necessarily rely 100% on
government or regulation to solve that piece of the puzzle. A good part of being part of MSR is that we're very
involved with the academic community. I'm involved with the Future of Computing Academy at the ACM.
And our task group within that organization is to think about the ethical questions around AI,
not just in affective computing technology, but just broadly with machine learning and AI
technology that can
make decisions about important things like, for instance, healthcare or justice.
And I think social norms and governmental regulation both serve a purpose there.
But one of the things I personally can actively work towards on a daily basis is thinking
through what do I ask people to give up in terms of data and what do they get back for
that and how is that data used and that's something I'm really really interested in.
Let's talk about you for a second. I'm curious how you got interested in the emotional side of technology, not just the focus on predicting stock market prices or some of the sort of numerical analyses that are often solved using machine learning algorithms. I wanted to see how this technology could actually help people.
And at the time, my advisor for my PhD, Rosalind Picard, who's one of the founders of this field, was working a
lot with applications for people on the autism spectrum for whom understanding
emotions is a complex task and often a big challenge in social situations. And
that was one of the reasons that I joined that lab is because I really
believed in the potential benefits of affective computing technology, not just to one portion of the population, but to everyone.
I could see how it could benefit my life as well.
So that's how I got into it.
And, you know, it's becoming more true now, but certainly 10 years ago, there was no technology you could really think of that responded to or understood human emotion.
No, even now.
Even now, I mean, yeah, we're getting there in research, but there's not many
real-life applications you could point towards and say, oh, this is an example of a system that
really understands nonverbal or emotional cues. Right. So what was your path from
Cambridge and Rosalind Picard to here? So I went to the MIT Media Lab, where I did my PhD,
and there I worked a lot on large-scale data analysis to do with understanding emotions in
real-world contexts. And then I worked for a couple of years at a startup and joined MSR
out of that and now lead the affective computing technology development within research here.
That's really cool. So as we wrap up, Daniel, what thoughts or advice would you leave with
our listeners, many of whom are aspiring researchers who might have an interest in
human-computer interaction or affective computing? What lines of research are interesting right now?
What might augment, to use an industry term, the field?
If I were to summarize the areas that I think are most important, the first would be multimodal understanding. A lot of systems today rely on a single piece of information, like, for instance, facial expressions or voice tone or text, but to really understand emotions you have to integrate all that information together. Because if I just look at facial expressions, you know, if I were to show you a video of someone without the audio and without the information about what they were saying, it would be hard to interpret exactly how they felt. Or many people have probably experienced being on a phone call where they haven't been able to exactly understand how someone was feeling, because you've only got the voice tone and language to rely on; you don't have all of that visual information about their gestures and facial expressions and body posture. So I think multimodal understanding is really important.
Another area that I'm particularly interested in
is something we've touched on already, which is kind of deploying this in the real world. So how
do we take these experiments that have typically been performed in labs in research environments
where you bring 10 or 20 people in and you get them to experience a system and you evaluate it,
which is fine for controlled studies. But ultimately, if we're
going to evaluate the real system and how people actually respond to it in their everyday lives,
we need to deploy it. And so that's something we're focused on is really designing things
that are so seamless that people can use them without them being a burden. And we can start to
mine this data that occurs in everyday contexts.
Daniel McDuff, it's been fascinating talking to you.
I wish there was more time, but thanks for coming in.
Thank you very much. It's a pleasure to be here.
To learn more about Dr. Daniel McDuff's work
and find out how machine learning can help you improve your relationship with your computer,
visit microsoft.com slash research.