Microsoft Research Podcast - 030 - Examining the Social Impacts of Artificial Intelligence with Dr. Fernando Diaz
Episode Date: June 27, 2018
Developing complex artificial intelligence systems in a lab is a challenging task, but what happens when they go into production and interact with real humans? That's what researchers like Dr. Fernando Diaz, a Principal Research Manager at Microsoft Research Montreal, want to know. He and his colleagues are trying to understand – and address – the social implications of these systems as they enter the open world. Today, Dr. Diaz shares his insights on the kinds of questions we need to be asking about artificial intelligence and its impact on society. He also talks about how algorithms can affect your taste in music, and why now, more than ever, computer science education needs to teach ethics along with algorithms.
Transcript
If I'm running an experiment, what is an ethical experiment?
What's an unethical experiment to run on users?
To what extent should users be aware of the fact that they're in experiments?
How am I going to recognize and address biases in the data that my machine's learning from?
And that's just scratching the surface.
There's going to be plenty of other questions that are going to be raised in the next few years
about how we design these systems in a way that is respectful of our users.
You're listening to the Microsoft Research Podcast, a show that brings you closer to
the cutting edge of technology research and the scientists behind it. I'm your host, Gretchen
Huizinga. Developing complex artificial intelligence systems in a lab is a challenging task.
But what happens when they go into production and interact with real humans?
That's what researchers like Dr. Fernando Diaz, a principal research manager at Microsoft
Research Montreal, want to know.
He and his colleagues are trying to understand and address the social implications of these
systems as they enter the open world.
Today, Dr. Diaz shares his insights on the kinds of questions we need to be asking about artificial
intelligence and its impact on society. He also talks about how algorithms can affect your taste
in music and why now more than ever, computer science education needs to teach ethics along with algorithms.
That and much more on this episode of the Microsoft Research Podcast.
Fernando Diaz, welcome to the podcast.
Thank you.
You're a principal research manager at Microsoft Research Montreal, and you work in artificial intelligence and search and information retrieval, but you're also really involved in fairness, accountability, transparency, and ethics, or FATE, research. So in broad strokes, and we'll get specific in a bit, what gets you up in the morning?
What are the big questions you want answers to and the big problems you'd like to solve?
Well, a lot of these systems that we're building are extremely successful.
So information retrieval or web search, computer vision, these have all been developed over the course of many, many decades.
And they're starting to get productionized and users are starting to see them on a daily basis.
What we haven't thought so much about in designing these systems is, as computer scientists, what is the social context in which these things are being used?
And so my concern here is better understanding, you know, what are the social implications of these systems we're building? How will the social context in which these systems are used affect not just our own metrics like precision or recall, but also society at large?
And I think this is something that's really coming to the fore for computer scientists because
a lot of these techniques that have been developed in isolation are just starting to be
commercialized right now. So you're a computer scientist by training, and you've done research
in information retrieval, statistical methods, machine learning. But you've lately become very
interested in the interface of these AI systems and society, particularly what happens when AI
hits production, or as some people say, the open world. Why this interest for you now?
What questions are you asking? What piqued your interest in going this direction? That's a great question. So I obviously went to graduate school and got a PhD
and was studying these systems at a pretty abstract level and really conducting experiments
with data that was static and collected offline. And soon after graduate school, I came into
an industrial research lab and was working with production teams that were implementing the techniques that I was studying in graduate school.
And you begin to realize that when you take these algorithms and you scale them out and you put them in front of real users,
a lot of the basic assumptions that you were making in the laboratory were not well supported in reality. And so that was sort of a check for me in terms of my research
agenda, really coming back to the first principles and trying to understand better, okay, what is the
problem here? What exactly do I need to be measuring? And how do I optimize for this right
metric, whatever that might be? So you were at Microsoft Research before, and then you took a
bit of a break, and then you came back. You started in
New York, and now you're in Montreal. What brought you back? So I had actually, after graduate school,
started my industrial research career in Montreal. And for various reasons, I had to relocate out of
Montreal. But even while I lived here, I recognized that this city itself, and Canada in general,
has a pretty rich tradition
of very strong computer science research, machine learning research. And in the back of my head,
like I'd always wanted to come back to participate in that. And so when the opportunity arose to
rejoin Microsoft Research in a new Montreal lab, it was the perfect fit, especially since
the lab itself focuses on artificial intelligence.
The city itself is really going through a blossoming of AI research and being part of that and contributing my voice to that broader conversation is something that just made sense to me.
Let's talk about Montreal for a minute.
It's become a global hotspot in artificial intelligence research.
And the goal of the MSR Montreal lab is very specific.
They want to teach machines to read, think, and communicate like humans.
Give us your particular take on where we are on this quest and how your research interests align with what the Montreal lab is doing.
Well, I think part of the reason there's a research lab in that area is the fact that there are still a lot of unanswered questions with respect to how to engineer these systems.
And I think it will require more than just folks from natural language processing or more than just folks from dialogue or reinforcement learning.
It really requires the combination of those.
And I think that's one of the things that makes this lab especially unique. Now, my role is to come into this lab and hopefully add to that conversation by providing a perspective of, well, how will these systems actually behave when they're in a human context?
Because, like I said before, it's very easy to design these systems in isolation.
And then when you deploy them, realize that there were some very strong assumptions in
the experiments that you were conducting. Now, the role of the group that I'm building is to try to
sort of anticipate some of these questions and better engineer our systems to be robust to, say,
differences in the populations that I might be interacting with or in the corpora that I might be
gaining knowledge from. What is the group you're building there?
So the group that I'm building is sort of a sibling organization
of the Fairness, Accountability, Transparency, and Ethics group
that we started in New York City a few years ago.
The focus will be on studying the social implications
of artificial intelligence in society.
And so that group will be composed of folks with a technical
computer science background, but also a technical background from related disciplines such as
sociology. And so the idea here being that in order for computer scientists to better
understand and address the social implications, they really need experts in, you know, sociology. They need experts in anthropology,
et cetera, to give us the insight into those things that we have not been measuring so well so far.
Yeah, let's talk about that. The fairness, accountability, transparency, and ethics
application to a variety of artificial intelligence and machine learning technology
research is super important right now.
And as you bring up, this is because not all corner cases can be explored in the lab,
and there are some unintended consequences, along with the intended ones, that can surprise people.
And so this community doing research in this area is quite diverse in terms of academic training.
What do each of these experts bring to the mix when they're looking at fairness,
accountability, transparency, and ethics? So somebody from a social science background
will have a better understanding of technology use in general, how it's used or misused,
and how people just react to certain tools that we provide them. Somebody from a legal background
might be able to better comment on the policy implications
of certain technologies that are being developed or really give us a deeper understanding of what
we mean when we talk about something like fairness. And then folks from the computer
science community really understand the systems that are being developed and may be able to
conceptualize some of the constraints such as fairness and incorporate
them into the system. But it really requires these multiple perspectives to come up with
better approaches to designing these systems. Let's go back to some stuff you've done in the
past and are still working on now, information access systems and search engine information
retrieval. And in a paper you wrote, you suggested there's
a gap between studying these systems and implementing them, but you also make a sort
of provocative statement that there are open problems that are better addressed by academia
than industry. What kinds of problems are those, and why do you make that statement?
One of the things that happens in information access research is you have academics who have contributed to the community a lot, but these days a lot of the, say, web search research is happening at the big web search companies where they have the data, they have the users, etc.
And a lot of times the academics don't have access to the experimentation platforms or the data.
And so there's a disparity in the amount of rigor you can bring to your research.
So what I was claiming in that article was that, you know, well, academics, even though they don't
have that amount of data, they do have a broad set of collaborators that you may not find at
the bigger search engine companies. So at a university, you have access to sociologists
in other departments, you have access to economics professors. All of these are potential collaborators who will help you understand
the problem from multiple different perspectives, instead of perhaps one very specific perspective,
which you might have in a web search company. I think dataset releases are one strategy. Another type of scientific platform that one would not have in academia is experimentation. In industry, I can actually run A/B tests, controlled experiments with large populations of users, which doesn't really exist in a dataset release.
No, that's true.
And so one of the things that I think is worth exploring is how do we actually provide access to academics for doing that sort of controlled experimentation.
Interesting. That's happened in bits and pieces here and there, but I think this is really something that we as industrial researchers can think about providing.
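To give a sense of what the controlled experimentation Diaz mentions involves, here is a minimal, hypothetical sketch of the basic analysis behind an A/B test: comparing click-through rates between a control and a treatment ranking with a two-proportion z-test. All counts and names are invented for illustration; this is not a description of any particular production system.

```python
# Sketch of analyzing a hypothetical A/B test: did the treatment ranking
# change the click-through rate relative to control?
import math

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """z-statistic for H0: the two click-through rates are equal."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical numbers: 10,000 users randomly assigned to each arm.
z = two_proportion_ztest(clicks_a=1200, n_a=10_000, clicks_b=1320, n_b=10_000)
# |z| > 1.96 corresponds to significance at the 5% level (two-sided).
print(f"z = {z:.2f}, significant: {abs(z) > 1.96}")
```

The statistics are simple; what academics typically lack is the platform: randomized assignment over a live population at this scale.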
Okay, let's go back to data. Let's talk about data for a minute. In machine learning circles,
there seems to be some consensus that it's no longer good enough just to have big data,
and I use that with air quotes around it, but it also has to be good data or unbiased data,
which is part of that mix. So we know that there's probable bias in a lot of these big data sets,
and we need to fix that. People are now talking about trying to remove bias through things like
search engine audits and fairness-aware algorithms and that kind of thing. How do you do that?
One of the reasons we're concerned about bias in data is that the trained model will be biased when it's deployed. And so step one is to
be able to detect whether or not the actions of artificial intelligence are biased themselves.
And if they are, how do I go back and retrain that algorithm or add constraints to the algorithm so it doesn't learn the biases from the data? My work to date has focused primarily on the measurement side of things, which has more to do with understanding the users that are coming
into the system, what they're asking for, and whether or not the system, by virtue of the fact
of who the user is or what population they're coming from, is behaving in a way that you would consider biased.
And that requires a lot of the expertise from the information retrieval community, which has been thinking about measurement and evaluation almost since the beginning of its research agenda in the 50s.
And so that's what makes auditing and measurement such a natural fit with information retrieval.
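To make that measurement idea concrete, here is a minimal, hypothetical sketch of a per-population audit: it compares a standard retrieval metric, mean reciprocal rank, across user groups. The group labels, result lists, and relevance judgments are all invented placeholders, not taken from Diaz's work.

```python
# Sketch of auditing a retrieval system for per-group disparities in
# result quality, using mean reciprocal rank (MRR) as the metric.
from collections import defaultdict

def reciprocal_rank(ranked_results, relevant):
    """Reciprocal rank of the first relevant result (0.0 if none)."""
    for rank, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def per_group_mrr(sessions):
    """Mean reciprocal rank per user group.

    `sessions` is a list of (group, ranked_results, relevant_set) tuples.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for group, ranked, relevant in sessions:
        totals[group] += reciprocal_rank(ranked, relevant)
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Hypothetical search sessions from two user populations.
sessions = [
    ("group_a", ["d1", "d2", "d3"], {"d1"}),  # relevant doc ranked first
    ("group_a", ["d4", "d5", "d6"], {"d5"}),  # relevant doc ranked second
    ("group_b", ["d7", "d8", "d9"], {"d9"}),  # relevant doc ranked third
]
mrr = per_group_mrr(sessions)
print(mrr, "gap:", max(mrr.values()) - min(mrr.values()))
```

A persistent gap like this across populations is the kind of signal such an audit would surface before deciding whether to retrain or constrain the model.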
So as we've discussed, bias has become a bit of a buzzword in data science and AI research.
But you've suggested there are other social issues besides bias that also need to be addressed. What
are some of these issues and how can your research help?
Yeah, I do think that bias is a very important problem.
But I think one of the reasons why I talk about the social implications of AI is because I think bias is just one of the social implications that we can detect.
There are certainly others.
So a pretty clear one is transparency. So how do I make the algorithm's decisions about what it's doing transparent to the user so that the user feels a little bit more in control of the situation when they're actually trying to cooperate with an algorithm?
A second one would be sort of the cultural implications of algorithms.
So this happens more in context of, say, a movie or music recommendation.
So I'm building this big system to recommend music to individuals.
What are the longer-term cultural implications of deploying these recommendation algorithms if I
know that recommending certain musicians will push somebody's musical taste in certain directions
in the long term? What does this mean for the creation or curation of culture?
The other side of that problem is that music recommendation
algorithms can really have profound effects on the creators and musicians themselves.
And so I might, as a computer scientist, say, well, this is the best algorithm for music
recommendation. I'm just going to deploy it. But as computer scientists, we haven't really
thought about, well, what are the effects on the actual creators? I think that, for me,
is one that is especially salient. So on that thread then, how might you craft a research
project that would ask these questions and how would you do your research there? Right. So let's
take this example of music recommendation. So you can imagine sitting down with musicians and
better understanding what is
important to them. What do they feel like they're getting out of a system? How do they feel like
they're in control or out of control in a recommendation system? And sitting down with
folks who come more from the sociology or anthropology backgrounds, media studies,
to help me as a computer scientist understand what that big population of musicians looks like. And then I can, as a computer scientist, sit down and try to
better understand, well, how do I design an algorithm which will both satisfy my listeners
as well as satisfy my musicians? Now, I think even posing it that way is extremely reductive.
And so that's why I wish there was somebody in the room from one of these other disciplines to point out and say, well, Fernando, you know,
you haven't thought about this. So given the nature of the research you do and all the things
you see as a result,
is there anything we should be concerned about, anything that keeps you up at night?
One of the things that does concern me is the fact that a lot of these techniques that we're developing as a research community are being put out and then deployed into production within a
matter of days or weeks by people who perhaps were not the original experimenters. Now,
there's nothing wrong with open science, but I do think that we need to be a bit more cognizant
and aware of the effects that our algorithms will have before we rush to deploy them.
And what I'm concerned about is that we're sort of quickly pushing out newer and newer algorithms
without having a deeper understanding of those implications. Microsoft Research has a reputation for working closely with academic institutions,
and I know that education is something you're passionate about. So talk about what's going on
in FATE, or Fairness, Accountability, Transparency, and Ethics, education, and tell us your vision for
the future of education in this area. So I think when I went to graduate school or even undergraduate for computer science,
one of the things that was not really taught so well was ethics or the social implications of the
technologies that we're developing. And I think part of the reason for that is that, you know,
you're studying operating systems or you're studying information retrieval at an abstract
level. You're not really thinking about the context in which they're going to be deployed. And to be honest, the class is hard
enough having to just think about the core algorithms. And so I think at least when I was
trained and even now, like understanding the social implications and social responsibility
that you have as an engineer or scientist did not really make it into the conversation. And so I think that's
being recognized now. I think that computer science departments are starting to develop
curricula around ethics and computer science. And so I think students are starting to be trained.
My concern, though, is that, you know, we already have a lot of people out there developing systems
that have not gone through that training. On top of that, you don't need a computer science PhD in order to, you know, start up a company. So that question of who's not covered by education
is another thing that keeps me up at night. But in terms of the education side of things,
I think as somebody who's been in an industrial research lab, I can provide that perspective to students when I'm in the classroom.
So to better understand not just the practical implications of deploying a machine learning system at scale, but also the social and ethical implications of deploying a machine learning system at scale.
And this is exactly the sorts of questions I alluded to before.
If I'm running an experiment, what is an ethical experiment?
What's an unethical experiment to run on users? To what extent should users be aware of the fact that they're in experiments? How am I going to recognize and address biases in the data
that my machine's learning from? And that's just scratching the surface. There's going to be plenty
of other questions that are going to be raised in the next few years about how we design these systems in a way that is respectful of our
users. You know, you see the history of other areas, for example, medicine and the Hippocratic
Oath, first do no harm, and also the systems that are put in place to help ensure that people aren't harmed. We think naturally about that with clinical trials. And
it's just really encouraging to know that we're starting to think about that in artificial
intelligence deployment at large scale, even though we don't have like an FDA for AI, there's movement.
Yeah. And I think it's important to understand that that movement has happened in
other disciplines, as you said. And so it's not like we're going through this for the first time.
So to the extent that we can reach out to folks in the medical ethics community or folks in the
social sciences, they can help us develop what it means to be a responsible computer scientist.
I mean, if we were starting from ground zero, that would be especially daunting. But I think we do have collaborators across the disciplines that
can help us with this problem. So as we close, Fernando, what's on the horizon for the next
generation of researchers in your area? I know that's a big question, but at least what thoughts
or advice might you leave with our listeners, many of whom are aspiring researchers, who might
have an interest in the social impact of AI? So I think for folks interested in that area
specifically, I think a lot of the research that's happening right now is really day one of a much,
much longer research agenda. I think it's an extremely exciting area to be involved in because
while we're starting to ask the fundamental questions now,
I also think that there's a lot of additional fundamental questions that have yet to be asked by exactly those young researchers right now.
I think in terms of the social impact of that research, it's potentially very large because there are a lot of these systems that are already deployed. And so your ability to be involved in, say, a product like a search engine that has social
impacts and to correct those things is extremely powerful.
Yeah.
So it's a great time to be a young researcher.
Fernando Diaz, it's been great talking to you and super encouraging.
So thanks for taking time to talk to us today.
Thank you very much.
To learn more about Dr. Fernando Diaz and his research on fairness, accountability,
transparency, and ethics in computer science, visit microsoft.com slash research.