Microsoft Research Podcast - Ideas: Community building, machine learning, and the future of AI

Episode Date: December 1, 2025

As the Women in Machine Learning Workshop (WiML) marks its 20th annual gathering, cofounders, friends, and collaborators Jenn Wortman Vaughan and Hanna Wallach reflect on WiML's evolution, navigating the field of ML, and their work in responsible AI.

Show notes

Transcript
Starting point is 00:00:00 You're listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward. Hello and welcome. I'm Jenn Wortman Vaughan. This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS. I am especially excited about NeurIPS this year because of a co-located event, the 20th annual Workshop for Women in Machine Learning, or WiML, which I am going to be attending both as a mentor and as a keynote speaker. So to celebrate 20 years of WiML, I'm here today with my long-term collaborator, colleague, close friend, and my cofounder
Starting point is 00:01:00 of the Workshop for Women in Machine Learning, Hanna Wallach. You know, you and I have known each other for a very long time at this point. And in many ways, we followed very parallel and often intersecting paths before we both ended up here working in responsible AI at Microsoft. So I thought it might be fun to kick off this podcast with a bit of the story of our interleaving trajectories. So let's start way back 20 years ago, around the time we first had the idea for WiML. Where were you? And what were you up to? Yeah. So I was a PhD student at the University of Cambridge, and I was working with the late David MacKay. I was focusing on machine learning for analyzing text. And at that point in time, I'd actually just begun working on
Starting point is 00:01:49 Bayesian latent variable models for text analysis. And my research was really focusing on trying to combine ideas from n-gram language modeling with statistical topic modeling in order to come up with models that just did a better job at modeling text. I was also doing this super weird two-country thing. So I was doing my PhD at Cambridge, but at the end of the first year of my PhD, I spent three months as a visiting graduate student at the University of Pennsylvania, and I loved it so much so that at the end of the three months, I said, can I have. extend for a full year. Cambridge said yes, Penn said yes, so I did that, and actually ended up then
Starting point is 00:02:31 extending another year and then another year and another year and so on and so forth. But during my first full year at Penn, that was when I met you. And it was at the visiting students weekend, and I had been told by the faculty and the department that I had to work really hard on recruiting you. I had no idea that that was actually going to be the start of a 20-plus year. friendship. Yeah. I still remember that visiting weekend very well. I actually met you. I met my husband, Jeff, and I met my PhD advisor, Michael Kerns, all on the same day at that visiting student weekend. So I didn't know it at the time, but it was a very big day for me. So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic
Starting point is 00:03:19 economics. So even then, you know, just like I am now, I was interested in the intersection of people in AI systems. But since my training was in theory, my people tended to be these mathematically ideal people with these well-defined preferences and beliefs who behaved in very well-defined ways. Working in learning theory like this was appealing to me because it was very neat and precise.
Starting point is 00:03:47 There was just none of the mess of the real world. You could just write down your model, which contained all of your assumptions, and everything else that followed from there was in some sense objective. So I was really enjoying this work. And I was also so excited to have you around the department at the time. You know, honestly, I also loved Penn. It was just such a great environment.
Starting point is 00:04:12 I was just actually back there a few weeks ago, visiting to give a talk. I had an amazing time. But it was, I will say, very male-dominated in the computer science department at the time. In my incoming class of PhD students, we had 20 incoming PhDs, and I was the only woman there. But we managed to build a community. We had our weekly ladies' brunch, which I loved, and things like that really kept me going during my PhD. Yeah, I love that, ladies' brunch. That made a huge difference to me and kind of kept me going through the PhD as well.
Starting point is 00:04:48 And like you, I'd always been interested in people. And during the course of my PhD, I realized that I wasn't interested in analyzing text for the sake of text, right? I was interested because text is one of these ways that people communicate with each other. You know, people don't write text for the sake of writing text. They write it because they're trying to convey something. And it was really that that I was interested in. It was these kind of social aspects of text that I found super interesting. So coming out of the PhD, I then got a post.
Starting point is 00:05:21 stop job, focused on analyzing texts as part of these sort of broader social processes. From there, I ended up getting a faculty job, also at UMass, as one of four founding members of UMass's Computational Social Science Institute. So there was me in computer science, then there was another assistant professor in statistics, another in political science, and another in sociology. And in many ways, this was my dream job. I was being paid to develop and use machine learning methods to study social processes and answer questions that social scientists wanted to study. It was pretty awesome. You, I think, started a faculty position at the same time, right?
Starting point is 00:06:09 Yeah. So I also did a postdoc. First, I spent a year as a postdoc at Harvard, which was super fun. And then I started a 10-year track position in computer science at UCLA in 2010. Again, you know, it was a very male-dominated environment. My department was mostly men. But even more importantly than this, I just didn't really have a network there. You know, it was lonely.
Starting point is 00:06:36 One exception to this was Mihaila Vandershaw. She was at UCLA in the time, though not in my department. And she kind of took me under her wing. So I'm very grateful that I had that support. But overall, this position just wasn't a great fit for me. And I was under more stress than I think I have been at any other point in my life that I could really remember. Yeah. So at that point, then, you ended up transitioning to Microsoft Research, right?
Starting point is 00:07:04 Why did you end up choosing MSR? Yeah. So this was back in 2012. MSR had just opened up this new New York City lab at the time, and working in this lab was basically my dream job. I think I actually tried to apply before they had even officially opened the lab, like when I just heard it was happening. So this lab focused in three areas at the time.
Starting point is 00:07:31 It focused in machine learning, algorithmic economics, and computational social science. And my research at the time cut across all three of these areas. So it felt just like this purpose. perfect opportunity to work in the space where my work would fit in so well and be really appreciated. The algorithmic economics group at the time actually was working on building prediction markets to aggregate information about future events, and they were already in doing this building on top of some of my theoretical research, which is just super cool to see. So that was exciting. And I already knew a couple of people here. I knew John Langford,
Starting point is 00:08:11 and Dave Pennick, who was in the economics group at the time, because I'd done an internship, actually, with the two of them at Yahoo Research before they came to Microsoft. And I was really excited to come back and work with them again as well. You know, even here at the time that I joined the lab, it was 13 men and me. So, once again, not great numbers. And I think that in some ways this was especially hard on me because I was just naturally, like, a very shy person. and I hadn't really built up the confidence that I should have at that point in my career.
Starting point is 00:08:44 But on the other hand, I found the research fit just so spot on that I couldn't say no. And I suspect that this is something that you understand yourself because you actually came and joined me here in the New York lab a year or two later. So why did you make this switch? Yeah. So I anticipated that I was going to love my faculty job. It was focusing on all this stuff that I was so excited about. And much to my surprise, though, I kind of didn't.
Starting point is 00:09:17 And it wasn't like there was any one particular thing that I didn't like. It was more of a mixture of things. I did love my research, though. That was pretty clear to me. But I wasn't happy. So I spent a summer talking to as many people as possible in all different kinds of jobs, really just with the goal of figuring out what their day-to-day lives looked like. You were one of the people I spoke to, but I spoke to a ton of other people as well.
Starting point is 00:09:45 And from doing that, at the end of that summer, I ended up deciding to apply to industry jobs. And I applied to a bunch of places and got a bunch of authors. But I ended up deciding to join Microsoft Research, New York City, because of all the places I was considering going, they were the only place that said, we love your research. we love what you do, do you want to come here and do that same research? And that was really appealing to me because I loved my research. Of course, I wanted to come there and do my same research, and especially with all of these amazing people like you, Duncan Watts, who for many years, been somebody I'd really looked up to. He was there as well at that point in time.
Starting point is 00:10:28 There was this real focus on computational social science, but with a little bit more of an industry perspective. There are also these amazing machine learning researchers. just for many of the same reasons as you, I was just really excited to join that lab, and particularly excited to be working in the same organization as you again. Yeah, I'm happy to take at least a little bit of the credit for recruiting you to Microsoft here many years ago. Oh, yeah.
Starting point is 00:10:56 Yeah. I was really excited to have you join, too, though. I think the timing actually worked out so that I missed your first couple of months because I was on maternity leave with my first daughter at the time. I should say, I've got two daughters, and I'm very proud to share in the context of this podcast, that they're both very interested in math and reading as well. Yeah, they're both great. So then we ended up working in the same place, but despite that, it still took us several years to end up actually collaborating on research.
Starting point is 00:11:29 Do you remember how we ended up working together? Yeah, so I used to tell this story a lot. Actually, I was at this panel on AI and society back in, I think it was probably 2016. It was taking place in D.C. And someone on this panel made this statement that sooner AI systems are just going to be so good that all of the uncertainty is going to be taken out of our decision making. And something about this statement just like really set me off. I got so mad about it because I thought it was just such irresponsible. thing to be saying. So I came back to New York and I think I was ranting to you about this in the
Starting point is 00:12:11 lab and this conversation ended up getting us started on this whole longer discussion about the importance of communicating uncertainty and about explaining the assumptions that are behind the predictions that you're making and all of this. So this was something, I was really excited about this because this was something that had really been drummed into me for years as a Bayesian. So Bayesian statistics, which forms a lot of the foundation of the type of machine learning that I was doing, is all about explicitly stating assumptions and quantifying uncertainty. So I just felt super strongly about this stuff. Yeah. So somehow all of these discussions we are having led us to read up on this literature that was coming out of the machine learning community
Starting point is 00:12:58 on interpretability at the time. There were a bunch of these papers coming out that were making claims about models being interpretable without stopping to define who they were interpretable to or for what purpose, never actually taking these models and putting them down in front of real people. And we wanted to do something about this. So we started running controlled experiments with real people
Starting point is 00:13:24 and found that we often can't trust our intuition about what makes a model interpretable. Yeah, one of the things that came up a lot in that work was sort of how to measure the squishy, abstract human concepts, light interpretability that are really hard to define, let alone quantify and measure and stuff like that. Absolutely. So I think one of the first things that we really struggled with
Starting point is 00:13:51 in this line of work was what it even means to be interpretable or intelligible or intelligible or anything. of these terms that we're getting thrown around at the time. We ended up doing some research, which is still one of my favorite papers, with our colleagues, Fruke Porsabzi, Jay Kaufman, and Dan Goldstein. And in this work, we found it really useful to think about interpretability as a latent property that can be kind of influenced by different properties of a model or systems design. So things like the number of features the model has, or whether the model's linear, or even things like the user interface of the model. This was kind of a gateway project for me in the sense that it's one of
Starting point is 00:14:38 the first projects that I got really excited about that was more of a human computer interaction or HCI project rather than a theory project like I'd been working on in the past. And it just set off this huge spark of excitement in me. It felt to me at the time more important than other things that I was doing, and I just wanted to do more and more of this work. I would say the other project that had a really similar effect on me, which we also worked on together right around the same time, was our work with Ken Holstein mapping out challenges that industry practitioners were facing in the space of AI fairness. Oh, yeah. Okay. Yeah. That project, that was so fun, and I learned so much from it.
Starting point is 00:15:27 If I recall correctly, we originally hired Ken, who I think was an HCI PhD student at CMU at the time, as an intern to work with us on creating sort of user experiences for fairness tools like the Fair Learn Toolkit. And we started that project. So that was in collaboration with Mero Doodick and Haldame. We started that project by having Ken talk to a whole bunch of practitioners at Microsoft, but at other organizations. as well to get a sense for how they were and weren't using fairness toolkits like Fair Learn. And I want to point out that at that point in time, the academic research community was super focused on all of these simple quantitative metrics for assessing the fairness in the context of predictions and predictive machine learning models with this kind of understanding that these
Starting point is 00:16:21 tools could then be built to help practitioners assess the fairness of their predictive models. maybe even make fairer predictions. And so that's the kind of stuff that this Fair Learn Toolkit was originally developed to do. So we ended up asking all of these practitioners originally just as sort of the precursor to what we thought we were going to end up doing with this project. We also these practitioners about their current practices and challenges around fairness in their work and about their additional needs for support. So where do they feel like they have the right tools and processes and practices and
Starting point is 00:16:57 practices and where did they feel like they were missing stuff? And this was really eye-opening because what we found was so different than what we were expecting. And there's two things that really stood out to us. So the first thing was that we found a much, much wider range of applications beyond prediction. So we'd come into this assuming that all these practitioners were doing stuff with predictive machine learning models. But in fact, we were fighting, they would do all kinds of stuff. There was a bunch of unsupervised stuff. There was a bunch of, you know, language-based stuff, all of this kind of thing.
Starting point is 00:17:30 And in hindsight, that probably doesn't sound very surprising nowadays because of the rise of generative AI and really the entire machine learning in AI field is much less focused on prediction in that kind of narrow kind of classification regression kind of way. But at the time, this was really surprising, especially in light of the academic literature's focus on predictions when thinking about fairness. The second thing that we found was that practitioners often struggled to use existing fairness research,
Starting point is 00:18:03 in part because these quantitative metrics that were all the rage at that point in time just weren't really amenable to the types of real-world complex scenarios that these practitioners were facing. And there was a bunch of different reasons for this, but one of the things that really stood out to us was that this wasn't so much about the underlying models
Starting point is 00:18:22 and stuff like that. But it was actually that there were a variety of data challenges involved here, around things like data collection, collection of sensitive attributes, which we need in order to actually use these fairness metrics. So putting all this together, the upshot of all this was that we never did what we originally set out to do with that internship project. Because we uncovered this really large gap between research and practice, we ended up publishing this. paper that characterized this gap and then surfaced important directions for future research. The other thing that the paper did was emphasize the importance of doing this kind of qualitative work to actually understand what's happening in practice rather than just making assumptions about what practitioners are and aren't doing. The other thing that came out of that,
Starting point is 00:19:17 of course, was that the four of us, so you, me, Mero, and Howell learned a ton of about HCI and about qualitative research from Ken, which was just so fun. Yeah. And I started to be confronted with the fact that I could no longer reasonably ignore all of these messes of the real world because, you know, in some ways, responsibly I is really all about the messes. So I think this project was really a big shift for both of us.
Starting point is 00:19:51 And in some ways, working on this and the interpretability work really led us to be active in these early efforts that were happening within Microsoft in the responsibly I space. The research that we were doing was feeding directly into company policy. And it felt like it was just like a huge place where we could have some impact. So it's very exciting. So switching gears a bit, Hanna, do you remember how we first got the idea for WIMO? Yes, I do. So we were at Newyip's. This was back in 2005.
Starting point is 00:20:28 It was a, so Newark's was a very different conference back then. Now it's like tens of thousands of people. It's held in a massive convention center. Yes, there are researchers there, but there's a variety of people from across the tech industry who attend. But that is not what it was like back then. So in around, in 2005, it was more like 600 people thereabouts in terms. hotel. And the main conference would be held every year in Vancouver. And then everybody at the conference would pile onto these buses and we would all head up to Whistler for the workshops. So
Starting point is 00:21:04 super different to what's happening nowadays. It was my third time, I think that's right, I think it was my third time attending the conference. But it was my first time sharing a hotel room with other women. And I remember up at the workshops, up in Whistler, there were five of us sitting around in a hotel room, and we were talking about how amazing it was that there were five of us sitting around talking women. And we kind of couldn't believe there were five of us. We're all PhD students at the time. And so we decided to make this list, and we started trying to figure out who the other women in machine learning were. And we came up with about 10 names, and we were kind of amazed that there were even 10 women in machine learning. We thought this was a huge number. We
Starting point is 00:21:48 were very excited, and we started talking about how it might be really fun to just bring them all together sometime. So we returned from Murricks, and you and I ended up getting lunch to strategize. I still remember walking out of the department together to go get lunch, and you were walking ahead of me. I can visualize the coat you were wearing as you were walking in front of me. And so we strategized a bit, and ended up deciding, along with one of the other women, Lisa Weiner, to submit a proposal to the Grace Hopper conference for a session in which women in machine learning would give short talks about their research. We reached out to the 10 names that we had, that we'd written down in the hotel room, and through that process,
Starting point is 00:22:32 I actually ended up finding out about more women in machine learning, and eventually had something like 25 women listed on the final proposal. I think there's an email somewhere where one or other of us is saying to the other one, oh my gosh, it can't. believe there are so many women in the scene learning. So we submitted this proposal, and ultimately, the proposal was rejected by the Grace Harper Conference, but we was so excited about the idea and just really invested in it by that point, that we decided to hold our own co-located event the day before the Grace Harper Conference.
Starting point is 00:23:07 And I got to say, you know, 20 years later, I don't know what we were thinking. Like, that was a bold move on the part of three PhD students. And it turned out to be a huge amount of work that we had to do entirely ourselves as well. We had no idea what we were doing. But the Grace Hopper folks very nicely connected us with the venue that the conference was going to be held at. And somehow we managed to pull it off. Ultimately, that first workshop had around 100 women. And there was this, rather than just like a single short session, which was what we'd originally had in mind,
Starting point is 00:23:45 we had this full days worth of talks. I actually have the booklet of abstracts from all of those talks at my desk in the office. I still have that today. And it was just an amazing experience. Yeah, it was. And, you know, you mentioned how bold we were. I just, I really don't think that any of us at the time realized how bold we were being here. Getting this workshop rejected and then saying,
Starting point is 00:24:15 know, no, we think this is important. We're going to do it anyway on our own as grad students. So I've already talked a little bit about some of the spaces that I was in throughout my career where there just weren't a lot of women around in the room with me. How had you experienced a lack of community or network of women in machine learning before the founding of WIML? And, you know, why do you think it's important to have that kind of community? So I felt it in a number of different ways. I think I mentioned a few minutes ago that it was my third time at New York's, but my first time
Starting point is 00:24:51 sharing a hotel room with another woman. But there were many places over the years where I felt this. So first as an undergraduate, then I did a lot of free and open source software development, and I was pretty involved in stuff to do with the Debbie and Linux distribution.
Starting point is 00:25:07 And back then, the percentage of women involved in free and open source software development was about one and a half percent. and the percentage involved actually in Debian was even less than that. So that had led me and some others to start this Debian women project. And then again, of course, I faced this in machine learning. I just didn't know that many other women in machine learning.
Starting point is 00:25:30 I didn't, there weren't a large number of senior women, for example, to look up to his role models. There weren't a large number of female PhD students. And this kind of made me sad because I was really excited about machine learning. and I hope to spend my entire career in it, but because I didn't see so many other women around, particularly more senior women, that really made me question whether that would even be possible, and I just didn't know.
Starting point is 00:25:58 I think, you know, thinking about this, and I've obviously reflected on this a lot over the years, but I think having a diverse community in any area, be it free and open source software development, be it machine learning, any of these kinds of things, is just so important for so much. many reasons. And some of those reasons are little things like finding people that you would feel comfortable sharing a hotel room with. But many of these things are bigger things that can then
Starting point is 00:26:23 have like even kind of knock on cumulative effect, like feeling valued in the community, feeling welcome in the community, having role models, being able to sort of see people and say, oh, I want to be kind of like that person when I grow up. I could do this. And then even just representation of different perspectives in the work itself is so important. important. The flip side of that is that there are a whole bunch of things that can go wrong if you don't have a diverse community. You can end up with gatekeeping, with toxic or unsafe cultures. Obviously, attrition is people just leave these kinds of spaces because they feel that they're not welcomed there and won't be valued there. And then to that point of having representation of different perspectives, with a really homogenous community, you can end up with kind of blind spots around the technology itself, which can then lead to harms. 100%. So did you ever imagine during all of this that Wimel would still be around 20 years later
Starting point is 00:27:22 and we would be sitting here on a podcast talking about this? No, absolutely not. I didn't even think that Wimel would necessarily be around for a second year. I thought it was probably going to be like a one-off event. And I certainly don't think that I thought that I would still be involved in the machine learning community 20 years later as well. so very unexpected. I've got a question for you, though.
Starting point is 00:27:47 What do you remember most about that first workshop? I remember a lot of things. I remember that, you know, when we were planning this, we always really wanted the focus to be the research. And, you know, if you think back to what this first workshop looked like, it was a lot of us just giving talks or presenting posters about our own research to other people. And, you know, I remember thinking at the poster session,
Starting point is 00:28:11 And, like, the vibe was just so much different and better, healthier, really, than other poster sessions I had bid to. Everyone was so supportive and encouraging, but it really was all about the research. I also remember being blown away, just walking into that conference room in the morning and seeing all of these women gathered in one place and knowing that somehow we had actually made this happen. I remember we also faced some challenges with the workshop early on. What are the challenges that stand out to you most? Yeah. So a lot of people really got it, right? And they were super supportive.
Starting point is 00:28:49 So, for example, folks at Penn totally got it. And they actually funded a bunch of that first workshop. But others in the community didn't get it. Didn't see the point, didn't see why it was necessary. I remember having dinner with one machine learning researcher and him telling me that he didn't think this kind of workshop was necessary because women's experiences were no different to men's experiences. And then later on in the conversation, he talked about, like, you know, this is like an hour and a half later or something.
Starting point is 00:29:16 He talked about how he and a friend of his had gone to the bar at all women's college, and he felt so awkward and out of place. And I ended up pointing out to him that he just kind of explained to himself why we needed Wimble. So, yeah, there was some people who didn't get it, and it took a lot of sort of talking to people and kind of explaining. Another challenge was figuring out how to fund it in an ongoing manner once we decided that we wanted to do this more than once. So, as I said, Penn funded a lot of the first workshop, but that wasn't a sustainable model, and it wasn't going to be realistic for Penn to keep funding it. So in the end, we worked with Amy Greenwald to obtain a National Science Foundation grant that would cover a lot of costs. And we also received donations from other organizations. A third challenge was figuring out where to hold the workshop, given that we did want that focus to be on research.
Starting point is 00:30:13 So the first two times we held the workshop at the Grace Hopper conference, but we started to feel that that wasn't really the right venue, given that we wanted that focus to be on research. So we ended up moving it to Nureps, and this had a bunch of benefits, some of which I don't think we'd even fully thought through when we made that decision. So one of the benefits was that attendees WIML travel funding, so we would give them this travel funding to enable them to pay the cost of attending WIML, stay in hotel rooms, all this kind of stuff. This would actually enable them to attend NUIPS as well if we co-located with NURIPs. Another main benefit was that we held WIML on the day before NURIP. So then throughout the rest of the conference, Wimel attendees, would see familiar faces throughout the crowd and wouldn't necessarily feel so alone. So you're talking about these challenges. How have these challenges changed over time? Or, you know, more broadly, can you talk about how the workshop and women in machine learning as an organization as a whole kind of evolved over the years? I know that you served the term as the Wimel president.
Starting point is 00:31:25 Yeah. So it's changed. a lot. So first, obviously, most importantly, it evolved from being kind of this one-off event where we were just seeing what would happen to be in really a robust organization. And the first step in that was creating the Wimel Board. And as you just said, I served as a first president of that. But there have been a bunch of other steps since then. And one of the things I want to flag about the Wimel Board was that this was really important because the board members could focus on the long-term health of the organization and these sort of, like, you know, things that spanned multiple years, like how to get sustainable funding sources, this kind of thing,
Starting point is 00:32:07 versus the actual workshop organizers who would focus on things like running the call for submissions and stuff like that. And being able to separate those roles made it really just reduced the burden on the workshop organizers meant that we could take this kind of longer-term perspective. another really important step was becoming officially becoming a non-profit so that happened that happened a few years ago and again it just it was the natural thing to do at that point in time and just another step towards creating this sort of durable robust organization but it's really taken on a life of its own I'm honestly not super actively involved nowadays
Starting point is 00:32:46 which I think is fantastic the organization doesn't need me that's great it's also wild to me that because it's been a around for 20 years at this point, that there are women in the field who don't know what it's like to not have Wimel. So a bunch of other affinity groups got created. So to meet Ghebrou, co-founded Black in AI when she was actually a postdoc at Microsoft Research New York City. So you and I got to actually see the founding of that affinity group up close. And then now there are a ton of other affinity groups. So there's Latinx in AI, there's queer in AI, Muslims in ML, indigenous in AI and ML, new in ML, just to name a few.
Starting point is 00:33:29 Yeah. And all of these are growing, too, every year. You know, this year, Wimel had over 400 submissions. They accepted 250 to be presented. It's amazing. Yeah, yep. And there's going to be a Wimel presence this year, actually, at all three of the Nurep's venues. So there's going to be a presence in Mexico City, in Copenhagen, and, of course, in San Diego for the main workshop.
Starting point is 00:33:52 So it's pretty great. And, you know, on top of that, I think the organization now, as you were saying, is able to do so much more than just the workshop alone. So, for instance, Wimel now runs this worldwide mentorship program for women and non-binary individuals and machine learning where they're matched with a mentor and they can participate in these one-to-one mentoring meetings and seminars and panel discussions, which happens all throughout the, year. I think they have about 50 mentors signing up each year, but I'm sure they could always use more. So it's just really amazing to look back and see how much the Wimel community has done and how much it's grown. And, you know, on the one hand, I think that, honestly, like, founding Wimel was one of the things that I have done over the course of my career, if not the thing that I am most proud of to this day. But at the same time, like, we can't take credit for any of us.
Starting point is 00:34:57 It's like a community effort. It's been just the community has really kept this going for the last 20 years. So it's great. I'm going to stop gushing now, but it's amazing. And it's not just Wimel that's changed over the years. The entire industry has changed a ton as well. How has your research evolved as a result of these changes to the entire field of AI and machine learning? and also from your own change from academia to industry. It's a great question. You know, we've touched on this a little bit, but our research paths really evolved differently,
Starting point is 00:35:31 but ended up in these very similar places. We're working on responsible AI, we're advocating for interdisciplinary approaches, incorporating techniques from HCI and so on. And I think that part of this was because of shifts of the community and also what's happening in industry. Working in responsible AI in industry, there's definitely not ever a shortage of interesting problems to solve, right?
Starting point is 00:35:54 And I think that for both of us, our research interests in recent years really have been driven by these really practical challenges that we're seeing. We are both involved early on in defining what responsible AI means within Microsoft, shaping our internal responsible AI standard. I led this internal company-wide working group on AI transparency, which was focused both on model interpretability, like we were talking about earlier, but also other forms of transparency,
Starting point is 00:36:24 like data sheets for data sets, and that transparency notes that Microsoft now releases with all of our products. And at the same time, you were leading this internal working group on fairness. Yeah, taking on that internal working group was kind of a big transition point in my career. You know, when I joined Microsoft,
Starting point is 00:36:45 I was focusing on computational social science, and I was also entirely doing research and wasn't really that involved in stuff in the rest of the company. Then at the end of my first year at Microsoft, I attended the first fairness accountability and transparency in machine learning workshop, which was co-located with Nureps. It was one of the Nureps workshop. And I got really excited about that and thought, great, I'm going to spend like 20% of my time, maybe one day a week, doing research on topics in the space of fairness and accountability. accountability and transparency. That is not what ended up happening. Over the next couple of years, I ended up doing more and more research on
Starting point is 00:37:26 responsible AI, you know, as you said, on topics to do with fairness, to do with interpretability. And then in early 2018, I was asked to co-chair this internal working group on fairness. And that was the point where I started getting much more involved in responsible AI stuff across Microsoft, so outside of just Microsoft research. And this was really exciting to me because Responsible AI was so new, which meant that research had a really big role to play. It wasn't like this was kind of an established area where folks in engineering and policy knew exactly what they were doing.
Starting point is 00:38:01 And so that meant that I got to branch out from this very sort of research-focused work into much more applied work in collaboration with folks from policy, from engineering, and so on. Now, in fact, as well as being a researcher, I actually run a small applied science team, the Sociotechnical Alignment Center, or Stack, for short, within Microsoft Research, that focuses specifically on bridging research and practice in responsible AI. Yeah. Do you think that your involvement in WIML has played a role in this work? Yes, definitely. Yeah, without a doubt. So particularly when working on topics related to fairness, I ended up focusing a bunch on stuff to do with marginalized groups
Starting point is 00:38:50 as part of my responsible AI work. So there's been this sort of focus on marginalized groups, particularly women in the context of machine learning, with my WIML kind of work, and then in my research work, thinking about fairness as well. Um, the other way that it's that Wimel has really sort of affected, affected what I do is that I work with a much more varied group of people nowadays than I did back when I was just focusing on kind of machine learning computational social science and stuff like that. And many of my collaborators of people that I've met through Wimel over the years. And of course, there has been, um, another big shift within industry recently with just all the excitement around generative AI. Can you say a bit about how that has changed your research? Okay, yeah. So this is another big one. There are so many ways that this change my work. One of the biggest ways, though, is that generative AI systems are now everywhere. They're being used all over the place for all kinds of things. And, you know, you see all these news headlines about gen AI systems, you know, diagnosing illnesses, solving math problems, and writing code, stuff like that. And also headlines about various different risks that can occur. when you're using generative AI, so fabricating facts, memorizing copyrighted data, generating
Starting point is 00:40:12 harmful content, you know, these kinds of things. And with all this attention, it's really natural to ask, what is the evidence behind these claims? So where is this evidence coming from? And should we trust it? It turns out that much of the evidence comes from Gen. AI evaluations that involve measuring the capabilities, the behaviors, and the impact of Gen. AI systems. But the current evaluation practices that are often used in the space don't really have as much scientific rigor as we would like. And that's kind of a problem. So one of the biggest challenges is that the concepts of interest when people are sort of doing these Gen AI evaluations, so things like diagnostic ability, memorization, harmful content, concepts like
Starting point is 00:41:01 a much more abstract the concepts like prediction accuracy, the underpinned machine learning evaluations before the generative AI era. And when we look at these new concepts that we need to be able to focus on in order to evaluate gen AI systems, we see that they're actually much more reminiscent of these abstract contested concepts,
Starting point is 00:41:24 these kind of fuzzy, squishy concepts that are studied in the social sciences. So things like democracy and political science or personality traits and psychometrics. So there's really that sort of connection there to these kind of squishier things. So when I was focusing primarily on computational social science, most of my work was focused on developing machine learning methods
Starting point is 00:41:45 to help social scientists measure abstract contested concepts. So then when Gen. I started to be a big thing, and I saw all of these evaluative claims involving measurements of abstract concepts, it seemed super clear to me that if we were going to actually be able to make meaningful claims about what AI can do and can't do, we're going to need to take a different approach to Gen AI evaluation. And so I ended up sort of drawing on my computational
Starting point is 00:42:12 social science work around measurement. And I started advocating for adopting a variant of the framework that social scientists use for measuring abstract contested concepts. And my reason for doing this was that I believe, I still believe that this is an important way to improve the scientific rigor of Gen. AI evaluations. You know all of this, of course,
Starting point is 00:42:39 because you and I, along with a bunch of other collaborators at Microsoft Research and Stanford and the University of Michigan, published a position paper on this framework entitled Evaluating Gen. Gen. A.I. Systems is a social science measurement challenge at
Starting point is 00:42:55 ICML this past summer. What are you excited about at the moment? Yeah, so lately I have been spending a lot of time thinking about AI and critical thought. How can we design AI systems to support appropriate reliance, preserve human agency, and really encourage critical engagement on the part of the human, right? So this is an area where I think we actually have a huge opportunity, but there are also huge risks. If I think about my most optimistic possible vision of the future of AI, which is not something that it's easy for me to do, as I'm not a natural optimist, as you know, it would be a future in which AI helps people grow and flourish, in which it kind of enriches our own human capabilities and deepens our own human thinking and safeguards our own agency. So in this future, you know, we could build AI systems that actually help us brainstorm and learn new knowledge and skills, both in formal educational settings and in our day-to-day work as well.
Starting point is 00:44:05 But I think we're not going to achieve this future by default. It's something that we really need to design for if we want to get there. You mentioned that there are risks. What are the risks that you can see here? Yeah, there's so much at stake here. You know, in the short term, there are things like overreliable. alliance, depending on the output of an AI system, even when the system is wrong. This is something that I've worked on a bunch myself. There's a risk of loss of agency or the ability to make and execute independent decisions and to ensure that our outcomes of AI systems are aligned with personal or professional values of the humans who are using those systems.
Starting point is 00:44:45 This is something that I've been looking out recently in the context of AI tools for journalism. There's diminished innovation, by which I mean a loss of creativity or diversity of ideas. You know, longer term, we risk atrophied skills, people just losing or simply never developing helpful skills for their career or their life because of prolonged use of AI systems. The famous example that people often bring up here is pilots losing the ability to perform certain actions in flight because of dependence on autopilot systems. And I think we're already starting to see the same sort of thing happen across all sorts of fields because of AI.
Starting point is 00:45:27 And, you know, finally, another risk that I'll mention that seems to resonate with a lot of folks I talk to is what I would just call loss of joy, right? What happens when we are delegating to AI systems, the parts of our activities that we really take pleasure and find this satisfaction in doing ourselves? So then as a community, what should we be doing if we're worried about these risks? Yeah, I mean, I think this is going to have to be a big community effort if we want to achieve this. This is a big goal. But there are a few places I think we especially need work. So I think we need generalized principles and practices for AI system builders, for how they can build AI systems in ways that promote human agency and encourage critical thought. We also need principles and practices for system users.
Starting point is 00:46:19 So how do we teach the general population to use AI in ways that amplify their skills and capabilities and help them learn new things? And then, you know, close to your heart, I'm sure. I think that we need more work on measurement and evaluation, right? We are once again back to these squishy human properties. You know, I mentioned I've done some work on over-reliance and generative AI systems, and I started there because on the grand scale of risks here, over-reliance is something that is relatively easy to measure, at least in the short term.
Starting point is 00:46:58 But how do we start thinking about measuring people's critical thinking when using AI across all sorts of contexts and at scale and over long-time horizons? How do we measure this sort of, constitutional effect of AI systems just on our critical thought as a population. And by the way, if anyone listening is going to be at the Wimel Workshop, I'll actually be giving a keynote on this topic. And this is something I'm just incredibly excited about because, first, I'm incredibly excited about this topic, but also in the whole 20 years of WIML, I've given opening remarks in similar several times, but this is actually the very first time that I
Starting point is 00:47:37 will be talking about my own research there. So this is like my dream. dream. I'm thrilled that this is happening. That's awesome. Oh, that's so exciting. Excellent. So, one last question for you. If you could go back and talk to yourself 20 years ago and give yourself some advice, what would you say? Yeah, okay. I've thought about this one a bit over the past week, and there are three things here I want to mention. So first, I would tell myself to be brave about speaking up. You know, I'm about as introverted as it gets that I am naturally very shy. And this has always held me back. It still holds me back now. It was really
Starting point is 00:48:19 embarrassingly late in my career that I decided to do something about this and start to develop strategies to help myself speak up more. And eventually, it started to grow into something that's a little bit more natural. What kind of strategies? Yeah. So, You know, one example is I use a lot of notes for this podcast. I have a lot of notes here. I'm a big notes person. And things like that really help me. The second thing that I would tell myself is to, you know, work on the problems that you
Starting point is 00:48:51 really want to see solved. As researchers, we have this amazing freedom to choose our own direction. And early on, you know, a lot of the problems that I worked on were problems that I really enjoyed thinking about on a day-to-day basis. It was a lot of fun. They were like little math puzzles to me. But I often found that, you know, when I would be at conferences and people would ask me about my work, I didn't really want to talk about these problems. I just, in some sense, you know, I had fun doing it, but I didn't really care.
Starting point is 00:49:23 I wasn't passionate about it. I didn't care that I had solved the problem. And so once, many years ago now, when I was thinking about my research agenda, I got some good advice from our former lab director, Jennifer Chase, who suggested that I go through my recent projects and sort them into projects where I really liked working on them. It was a fun experience day-to-day. And projects that I like talking about after the fact and kind of felt good about the results. And then see where the overlap is. And this is something that, like, it kind of sounds kind of obvious when I say it now, but at the time, it was really eye-opening for me. That's so cool. And now I kind of want to do
Starting point is 00:50:03 that with all of my projects. Particularly at the moment, I actually just took five months, as you know, five months off of work for parental leave because I just had a baby. And so I'm sort of taking a big kind of inventory of everything as I get back into all of this now. And I love this idea. I think this is really cool. It's changed really my whole approach to research. Like, you know, we were talking about this, but most of the work I do now is more HCI than machine learning because I found that the problems that really motivate me that I want to be talking to people about at conferences are the people problems.
Starting point is 00:50:38 The third piece of advice I would give myself is that you should bring more people into your work, right? So there's this kind of vision on the outside of research being this solo endeavor, and it can feel so competitive at times, right? We all feel this. But time and time again, I've seen that the best research comes from collaborations and from bringing people together with diverse perspectives who can challenge each other in a way that is respectful, but makes the work better. Is there advice that you would give to your former self of 20 years ago? Yeah.
Starting point is 00:51:18 Okay. So I've also been thinking about this a bunch over the past week. There's actually a lot of advice of things I would just my former self. But there are three things that I keep coming back to. Okay. So first, and this is similar to your second point, push for doing the work that you find to be most fulfilling, even if that means taking a non-traditional path.
Starting point is 00:51:38 So in my case, I've always been interested in the social sciences. Back when I was a student, you know, even when I was a PhD student, doing research that combined computer science and the social sciences just wasn't really a thing. And so as a result, it would have been really easy for me to just be like, oh, well, I guess that isn't possible. I'll just focus on traditional computer science. science problems. But that's not what I ended up doing. Instead, and often in ways that made my career kind of harder than it probably would have been otherwise, I ended up pushing, I kept pushing,
Starting point is 00:52:10 and in fact, I keep pushing even nowadays, to bring these things together, computer science and the social sciences, in an interdisciplinary fashion. And this hasn't been easy, but cumulatively, the effect has been, that I've been able to do much more impactful work than I think I would have been able to do otherwise. And the work I've done, I've just enjoyed so much more than what otherwise have been the case. Okay, so second, be brave and share your work. So this is actually advice for my current self and my former self, as this is something that I definitely still struggle with. As do I, you know, and actually I think it's funny to hear you say this because I would say that you are much better at this than I am. I still think I have a lot of work to do on this one.
Starting point is 00:52:56 Yeah, it's hard. It's really hard. As you know, I am a perfectionist, and this is good in some ways, but this is also bad in other ways. And one way in which this is bad is that I tend to be really anxious about sharing and publicizing my work, especially when I feel it's not perfect. So as an example, I wrote this massive tutorial on computational social science for ICML in 2015, but I never put the slides, and I wrote a whole script for it. I never put the slides or the script online as a resource for others, because I have felt it needed more work. And I actually went back and looked at it earlier this year when we were working on the ICML paper. And I was stunned because it's great. Why didn't I put this online? All these things that I thought were problems 10 years ago. No, they're not a big deal.
Starting point is 00:53:41 I should have just shared it. As another example, Stack, my applied science team, was using LLMs as part of our approach to GenAI evaluation back in 2022, way before the sort of LLM as a judge paradigm was widespread. But I was really worried that other. would think negatively of us for doing this, so we didn't share that much about what we were doing. And I regret that because we missed out on an opportunity to kick off an industry-wide discussion about this of LLM as a judge paradigm.
Starting point is 00:54:11 Okay, so then my third point is that the social side of research is just as valuable as the technical side. And by this, I'm actually not talking about social science and computer science. I actually think that the how of doing research, including who you talk to, who you collaborate, and how you approach those interactions is just as important as the research itself. As a PhD student, I felt really bad about spending time socializing with other researchers, especially at conferences, because I thought that I was supposed to be listening to talks, reading papers, and discussing technical topics with researchers, and not socializing. But in hindsight, I think that was wrong. Many of those social connections have ended up being incredibly valuable through my research, both because I've ended up. collaborating with, and in some cases even hiring the people who I first got to know socially,
Starting point is 00:55:02 but also because the friendships that I've built, like our friendship, for example, have served as a crucial support network over the years, especially when things have felt particularly challenging. Yeah, absolutely. I agree with all of that so much. And with that, I will say, thank you so much for doing this podcast with me today. It was a lot of fun to reflect on the last 20 years of Wimel, but also the last 20 years of our careers and friendship and all of this. So it's great. And I never would have agreed to do this if it had been with anyone but you. Likewise.
Starting point is 00:55:39 So thank you everybody for listening to us. And hopefully some of you will join for the Greeningiff Annual Workshop for Women in Machine Learning, which is taking place on December 2nd. And of course, Jen and I will both be there in person. and we'll also be at Europe's afterwards. So feel free to reach out to us if you want to chat with us or to learn more about anything that we covered there today. You've been listening to Ideas, a Microsoft Research Podcast.
Starting point is 00:56:16 Find more episodes of the podcast at AKA.m.s.
