Microsoft Research Podcast - Ideas: Community building, machine learning, and the future of AI
Episode Date: December 1, 2025
As the Women in Machine Learning Workshop (WiML) marks its 20th annual gathering, cofounders, friends, and collaborators Jenn Wortman Vaughan and Hanna Wallach reflect on WiML's evolution, navigating the field of ML, and their work in responsible AI.
Show notes
Transcript
You're listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code.
In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.
Hello and welcome. I'm Jenn Wortman Vaughan. This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS. I am especially excited about NeurIPS this year because of a co-located event, the 20th annual Workshop for Women in Machine Learning, or WiML, which I am going to be attending both as a mentor and as a keynote speaker. So to celebrate 20 years of WiML, I'm here today with my long-term collaborator, colleague, close friend, and my co-founder of the Workshop for Women in Machine Learning, Hanna Wallach. You know, you and I have
known each other for a very long time at this point. And in many ways, we followed very
parallel and often intersecting paths before we both ended up here working in Responsible AI at
Microsoft. So I thought it might be fun to kick off this podcast with a bit of the story of our
interleaving trajectories. So let's start way back 20 years ago, around the time we first had
the idea for WiML. Where were you? And what were you up to? Yeah. So I was a PhD student at the
University of Cambridge, and I was working with the late David MacKay. I was focusing on machine
learning for analyzing text. And at that point in time, I'd actually just begun working on
Bayesian latent variable models for text analysis. And my research was really focusing on trying
to combine ideas from n-gram language modeling with statistical topic modeling in order to come up
with models that just did a better job at modeling text.
I was also doing this super weird two-country thing.
So I was doing my PhD at Cambridge, but at the end of the first year of my PhD, I spent
three months as a visiting graduate student at the University of Pennsylvania, and I loved it
so much so that at the end of the three months, I said, can I extend for a full year? Cambridge said yes, Penn said yes, so I did that, and actually ended up then
extending another year and then another year and another year and so on and so forth. But during
my first full year at Penn, that was when I met you. And it was at the visiting students weekend,
and I had been told by the faculty and the department that I had to work really hard on recruiting
you. I had no idea that that was actually going to be the start of a 20-plus-year friendship. Yeah. I still remember that visiting weekend very well. I actually met you. I met my
husband, Jeff, and I met my PhD advisor, Michael Kearns, all on the same day at that visiting
student weekend. So I didn't know it at the time, but it was a very big day for me. So around that
time when I started my PhD at Penn, I was working in machine learning theory and algorithmic
economics. So even then, you know, just like I am now, I was interested in the intersection
of people and AI systems.
But since my training was in theory,
my people tended to be these mathematically ideal people
with these well-defined preferences and beliefs
who behaved in very well-defined ways.
Working in learning theory like this was appealing to me
because it was very neat and precise.
There was just none of the mess of the real world.
You could just write down your model,
which contained all of your assumptions,
and everything else that followed from there was in some sense objective.
So I was really enjoying this work.
And I was also so excited to have you around the department at the time.
You know, honestly, I also loved Penn.
It was just such a great environment.
I was just actually back there a few weeks ago, visiting to give a talk.
I had an amazing time.
But it was, I will say, very male-dominated in the computer science department at the time.
In my incoming class of PhD students, we had 20 incoming PhDs, and I was the only woman there.
But we managed to build a community.
We had our weekly ladies' brunch, which I loved, and things like that really kept me going during my PhD.
Yeah, I love that, ladies' brunch.
That made a huge difference to me and kind of kept me going through the PhD as well.
And like you, I'd always been interested in people.
And during the course of my PhD, I realized that I wasn't interested in analyzing text for the sake of text, right?
I was interested because text is one of these ways that people communicate with each other.
You know, people don't write text for the sake of writing text.
They write it because they're trying to convey something.
And it was really that that I was interested in.
It was these kind of social aspects of text that I found super interesting.
So coming out of the PhD, I then got a postdoc position, focused on analyzing text as part of these sort of broader social processes.
From there, I ended up getting a faculty job, also at UMass, as one of four founding members of
UMass's Computational Social Science Institute. So there was me in computer science, then there
was another assistant professor in statistics, another in political science, and another in sociology.
And in many ways, this was my dream job.
I was being paid to develop and use machine learning methods to study social processes and answer questions that social scientists wanted to study.
It was pretty awesome.
You, I think, started a faculty position at the same time, right?
Yeah.
So I also did a postdoc.
First, I spent a year as a postdoc at Harvard, which was super fun.
And then I started a tenure-track position in computer science at UCLA in 2010.
Again, you know, it was a very male-dominated environment.
My department was mostly men.
But even more importantly than this, I just didn't really have a network there.
You know, it was lonely.
One exception to this was Mihaela van der Schaar. She was at UCLA at the time, though not in my department.
And she kind of took me under her wing.
So I'm very grateful that I had that support.
But overall, this position just wasn't a great fit for me.
And I was under more stress than I think I have been at any other point in my life that I could really remember.
Yeah.
So at that point, then, you ended up transitioning to Microsoft Research, right?
Why did you end up choosing MSR?
Yeah.
So this was back in 2012.
MSR had just opened up this new New York City lab at the time,
and working in this lab was basically my dream job.
I think I actually tried to apply before they had even officially opened the lab,
like when I just heard it was happening.
So this lab focused in three areas at the time.
It focused in machine learning, algorithmic economics, and computational social science.
And my research at the time cut across all three of these areas.
So it felt just like this perfect opportunity to work in a space where my work would fit in so well and be really appreciated. The algorithmic economics group at the time actually was working on building prediction markets to aggregate information about future events, and in doing this they were already building on top of some of my theoretical research, which was just super cool to see. So that was exciting. And I already knew a couple of people here. I knew John Langford,
and Dave Pennock, who was in the economics group at the time,
because I'd done an internship, actually, with the two of them at Yahoo Research
before they came to Microsoft.
And I was really excited to come back and work with them again as well.
You know, even here at the time that I joined the lab, it was 13 men and me.
So, once again, not great numbers.
And I think that in some ways this was especially hard on me because I was just naturally, like, a very shy person, and I hadn't really built up the confidence that I should have had at that point in my career.
But on the other hand, I found the research fit just so spot on that I couldn't say no.
And I suspect that this is something that you understand yourself because you actually came and
joined me here in the New York lab a year or two later.
So why did you make this switch?
Yeah.
So I anticipated that I was going to love my faculty job.
It was focusing on all this stuff that I was so excited about.
And much to my surprise, though, I kind of didn't.
And it wasn't like there was any one particular thing that I didn't like.
It was more of a mixture of things.
I did love my research, though.
That was pretty clear to me.
But I wasn't happy.
So I spent a summer talking to as many people as possible in all different kinds of jobs,
really just with the goal of figuring out what their day-to-day lives looked like.
You were one of the people I spoke to, but I spoke to a ton of other people as well.
And from doing that, at the end of that summer, I ended up deciding to apply to industry jobs.
And I applied to a bunch of places and got a bunch of offers.
But I ended up deciding to join Microsoft Research, New York City, because of all the places I was considering going,
they were the only place that said, we love your research, we love what you do, do you want to come here and do that same research? And that was really
appealing to me because I loved my research. Of course, I wanted to come there and do my same
research, and especially with all of these amazing people like you, Duncan Watts, who for many
years, been somebody I'd really looked up to. He was there as well at that point in time.
There was this real focus on computational social science, but with a little bit more of an
industry perspective. There are also these amazing machine learning researchers.
just for many of the same reasons as you,
I was just really excited to join that lab,
and particularly excited to be working in the same organization as you again.
Yeah, I'm happy to take at least a little bit of the credit
for recruiting you to Microsoft here many years ago.
Oh, yeah.
Yeah. I was really excited to have you join, too, though.
I think the timing actually worked out so that I missed your first couple of months
because I was on maternity leave with my first daughter at the time.
I should say, I've got two daughters, and I'm very proud to share in the context of this podcast,
that they're both very interested in math and reading as well.
Yeah, they're both great.
So then we ended up working in the same place, but despite that, it still took us several years
to end up actually collaborating on research.
Do you remember how we ended up working together?
Yeah, so I used to tell this story a lot.
Actually, I was at this panel on AI and society back in, I think it was probably 2016.
It was taking place in D.C.
And someone on this panel made this statement that soon AI systems are just going to be so good that all of the uncertainty is going to be taken out of our decision making.
And something about this statement just like really set me off.
I got so mad about it because I thought it was just such an irresponsible thing to be saying. So I came back to New York and I think I was ranting to you about this in the
lab and this conversation ended up getting us started on this whole longer discussion about
the importance of communicating uncertainty and about explaining the assumptions that are behind
the predictions that you're making and all of this. So this was something, I was really
excited about this because this was something that had really been drummed into me for years as a
Bayesian. So Bayesian statistics, which forms a lot of the foundation of the type of machine learning
that I was doing, is all about explicitly stating assumptions and quantifying uncertainty.
So I just felt super strongly about this stuff. Yeah. So somehow all of these discussions we are
having led us to read up on this literature that was coming out of the machine learning community
on interpretability at the time.
There were a bunch of these papers coming out
that were making claims about models being interpretable
without stopping to define who they were interpretable to
or for what purpose, never actually taking these models
and putting them down in front of real people.
And we wanted to do something about this.
So we started running controlled experiments with real people
and found that we often can't trust our intuition
about what makes a model interpretable.
Yeah, one of the things that came up a lot in that work
was sort of how to measure these squishy, abstract human concepts, like interpretability, that are really hard to define,
let alone quantify and measure and stuff like that.
Absolutely.
So I think one of the first things that we really struggled with
in this line of work was what it even means to be interpretable
or intelligible or any of these terms that were getting thrown around at the time. We ended up doing some research,
which is still one of my favorite papers, with our colleagues, Forough Poursabzi-Sangdeh, Jake Hofman, and Dan Goldstein.
And in this work, we found it really useful to think about interpretability as a latent property
that can be kind of influenced by different properties of a model or system's design. So things like
the number of features the model has, or whether the model's linear, or even things like the user
interface of the model. This was kind of a gateway project for me in the sense that it's one of
the first projects that I got really excited about that was more of a human computer interaction
or HCI project rather than a theory project like I'd been working on in the past. And it just
set off this huge spark of excitement in me. It felt to me at the time
more important than other things that I was doing, and I just wanted to do more and more of this
work. I would say the other project that had a really similar effect on me, which we also
worked on together right around the same time, was our work with Ken Holstein mapping out challenges
that industry practitioners were facing in the space of AI fairness.
Oh, yeah. Okay. Yeah. That project, that was so fun, and I learned so much from it.
If I recall correctly, we originally hired Ken, who I think was an HCI PhD student at CMU at the time, as an intern to work with us on creating sort of user experiences for fairness tools like the Fairlearn toolkit.
And we started that project.
So that was in collaboration with Miro Dudík and Hal Daumé.
We started that project by having Ken talk to a whole bunch of practitioners at Microsoft, but at other organizations as well, to get a sense for how they were and weren't using fairness toolkits like Fairlearn.
And I want to point out that at that point in time, the academic research community was super
focused on all of these simple quantitative metrics for assessing fairness in the context of predictions and predictive machine learning models, with this kind of understanding that these tools could then be built to help practitioners assess the fairness of their predictive models and maybe even make fairer predictions. And so that's the kind of stuff that the Fairlearn toolkit was originally developed to do.
So, originally just as sort of a precursor to what we thought we were going to end up doing with this project, we also asked these practitioners about their current practices and challenges around fairness in their work and about their additional needs for support. So where did they feel like they had the right tools and processes and practices, and where did they feel like they were missing stuff?
And this was really eye-opening because what we found was so different than what we were expecting.
And there's two things that really stood out to us.
So the first thing was that we found a much, much wider range of applications beyond prediction.
So we'd come into this assuming that all these practitioners were doing stuff with predictive machine learning models.
But in fact, we were finding they were doing all kinds of stuff.
There was a bunch of unsupervised stuff.
There was a bunch of, you know, language-based stuff, all of this kind of thing.
And in hindsight, that probably doesn't sound very surprising nowadays because of the rise of generative AI
and really the entire machine learning and AI field is much less focused on prediction in that kind of narrow classification-and-regression way.
But at the time, this was really surprising, especially in light of the academic literature's focus on predictions
when thinking about fairness.
The second thing that we found
was that practitioners often struggled
to use existing fairness research,
in part because these quantitative metrics
that were all the rage at that point in time
just weren't really amenable
to the types of real-world complex scenarios
that these practitioners were facing.
And there was a bunch of different reasons for this,
but one of the things that really stood out to us
was that this wasn't so much about the underlying models
and stuff like that.
But it was actually that there were a variety of data challenges involved here, around things like data collection and the collection of sensitive attributes, which we need in order to actually use these fairness metrics.
So putting all this together, the upshot of all this was that we never did what we originally set out to do with that internship project.
Because we uncovered this really large gap between research and practice, we ended up publishing this paper that characterized the gap and then surfaced important directions for future research.
The other thing that the paper did was emphasize the importance of doing this kind of qualitative
work to actually understand what's happening in practice rather than just making assumptions
about what practitioners are and aren't doing. The other thing that came out of that,
of course, was that the four of us, so you, me, Miro, and Hal, learned a ton about HCI and about qualitative research from Ken,
which was just so fun.
Yeah.
And I started to be confronted with the fact that I could no longer reasonably ignore
all of these messes of the real world because, you know, in some ways,
responsible AI is really all about the messes.
So I think this project was really a big shift for both of us.
And in some ways, working on this and the interpretability work really led us to be active in these early efforts that were happening within Microsoft in the responsible AI space.
The research that we were doing was feeding directly into company policy.
And it felt like it was just like a huge place where we could have some impact.
So it's very exciting.
So switching gears a bit, Hanna, do you remember how we first got the idea for WiML?
Yes, I do.
So we were at NeurIPS.
This was back in 2005.
So NeurIPS was a very different conference back then.
Now it's like tens of thousands of people.
It's held in a massive convention center.
Yes, there are researchers there, but there's a variety of people from across the tech industry who attend.
But that is not what it was like back then.
So in 2005, it was more like 600 people or thereabouts, in a hotel. And the main conference would be held every year in Vancouver. And then everybody at the
conference would pile onto these buses and we would all head up to Whistler for the workshops. So
super different to what's happening nowadays. It was my third time, I think that's right, I think
it was my third time attending the conference. But it was my first time sharing a hotel room
with other women. And I remember up at the workshops, up in Whistler, there were five of us sitting
around in a hotel room, and we were talking about how amazing it was that there were five of us women sitting around talking. And we kind of couldn't believe there were five of us. We were all PhD
students at the time. And so we decided to make this list, and we started trying to figure out
who the other women in machine learning were. And we came up with about 10 names, and we were kind of
amazed that there were even 10 women in machine learning. We thought this was a huge number. We
were very excited, and we started talking about how it might be really fun to just bring
them all together sometime. So we returned from NeurIPS, and you and I ended up getting
lunch to strategize. I still remember walking out of the department together to go get lunch,
and you were walking ahead of me. I can visualize the coat you were wearing as you were walking
in front of me. And so we strategized a bit, and ended up deciding, along with one of the other
women, Lisa Wainer, to submit a proposal to the Grace Hopper conference for a session in which
women in machine learning would give short talks about their research. We reached out to the 10
names that we had, that we'd written down in the hotel room, and through that process,
I actually ended up finding out about more women in machine learning, and eventually had
something like 25 women listed on the final proposal. I think there's an email somewhere
where one or other of us is saying to the other one, oh my gosh, I can't believe there are so many women in machine learning.
So we submitted this proposal, and ultimately, the proposal was rejected by the Grace Hopper Conference, but we were so excited about the idea and just really invested in it by that
point, that we decided to hold our own co-located event the day before the Grace Harper
Conference.
And I got to say, you know, 20 years later, I don't know what we were thinking.
Like, that was a bold move on the part of three PhD students.
And it turned out to be a huge amount of work that we had to do entirely ourselves as well.
We had no idea what we were doing.
But the Grace Hopper folks very nicely connected us with the venue that the conference was going to be held at.
And somehow we managed to pull it off.
Ultimately, that first workshop had around 100 women.
And rather than just like a single short session, which was what we'd originally had in mind, we had this full day's worth of talks.
I actually have the booklet of abstracts from all of those talks at my desk in the office.
I still have that today.
And it was just an amazing experience.
Yeah, it was.
And, you know, you mentioned how bold we were.
I just, I really don't think that any of us at the time realized how bold we were being here.
Getting this workshop rejected and then saying,
you know, no, we think this is important. We're going to do it anyway on our own as grad students.
So I've already talked a little bit about some of the spaces that I was in throughout my career
where there just weren't a lot of women around in the room with me. How had you experienced
a lack of community or network of women in machine learning before the founding of WiML? And,
you know, why do you think it's important to have that kind of community? So I felt it in a number
of different ways. I think I mentioned
a few minutes ago that it was my
third time at NeurIPS, but my first time
sharing a hotel room with another woman.
But there were many
places over the years where I felt
this. So first as an undergraduate,
then I did a lot
of free and open source software
development, and I was pretty involved in stuff
to do with the Debian Linux distribution.
And back then, the
percentage of women involved in free and
open source software development was about
one and a half percent.
and the percentage involved actually in Debian was even less than that.
So that had led me and some others to start this Debian Women project.
And then again, of course, I faced this in machine learning.
I just didn't know that many other women in machine learning.
I didn't, there weren't a large number of senior women, for example, to look up to as role models.
There weren't a large number of female PhD students.
And this kind of made me sad because I was really excited about machine learning, and I hoped to spend my entire career in it,
but because I didn't see so many other women around,
particularly more senior women,
that really made me question whether that would even be possible,
and I just didn't know.
I think, you know, thinking about this,
and I've obviously reflected on this a lot over the years,
but I think having a diverse community in any area,
be it free and open source software development,
be it machine learning, any of these kinds of things,
is just so important for so many reasons. And some of those reasons are little things like finding people that you would feel
comfortable sharing a hotel room with. But many of these things are bigger things that can then
have like even kind of a knock-on, cumulative effect, like feeling valued in the community,
feeling welcome in the community, having role models, being able to sort of see people and
say, oh, I want to be kind of like that person when I grow up. I could do this. And then even
just representation of different perspectives in the work itself is so important. The flip side of that is that there are a whole bunch of things that can go wrong if you don't have a diverse community. You can end up with gatekeeping, with toxic or unsafe cultures. Obviously, attrition: people just leave these kinds of spaces because they feel that they're not welcome there and won't be valued there. And then to that point of having representation of different perspectives, with a really homogenous community, you can end up with kind of blind spots around the technology itself, which can then
lead to harms.
100%.
So did you ever imagine during all of this that WiML would still be around 20 years later
and we would be sitting here on a podcast talking about this?
No, absolutely not.
I didn't even think that WiML would necessarily be around for a second year.
I thought it was probably going to be like a one-off event.
And I certainly don't think that I thought that I would still be involved in the machine
learning community 20 years later as well.
So very unexpected.
I've got a question for you, though.
What do you remember most about that first workshop?
I remember a lot of things.
I remember that, you know, when we were planning this,
we always really wanted the focus to be the research.
And, you know, if you think back to what this first workshop looked like,
it was a lot of us just giving talks or presenting posters
about our own research to other people.
And, you know, I remember thinking at the poster session that the vibe was just so much different and better, healthier, really, than other poster sessions I had been to.
Everyone was so supportive and encouraging, but it really was all about the research.
I also remember being blown away, just walking into that conference room in the morning and seeing all of these women gathered in one place and knowing that somehow we had actually made this happen.
I remember we also faced some challenges with the workshop early on.
What are the challenges that stand out to you most?
Yeah.
So a lot of people really got it, right?
And they were super supportive.
So, for example, folks at Penn totally got it.
And they actually funded a bunch of that first workshop.
But others in the community didn't get it.
Didn't see the point, didn't see why it was necessary.
I remember having dinner with one machine learning researcher and him telling me that he didn't
think this kind of workshop was necessary because women's experiences were no different
to men's experiences.
And then later on in the conversation, he talked about, like, you know, this is like an hour and a half later or something.
He talked about how he and a friend of his had gone to the bar at an all-women's college, and he felt so awkward and out of place. And I ended up pointing out to him that he had just kind of explained himself why we needed WiML.
So, yeah, there were some people who didn't get it, and it took a lot of sort of talking to people and kind of explaining.
Another challenge was figuring out how to fund it in an ongoing manner once we decided that we wanted to do this more than once.
So, as I said, Penn funded a lot of the first workshop, but that wasn't a sustainable model, and it wasn't going to be realistic for Penn to keep funding it.
So in the end, we worked with Amy Greenwald to obtain a National Science Foundation grant that would cover a lot of costs.
And we also received donations from other organizations.
A third challenge was figuring out where to hold the workshop, given that we did want that focus to be on research.
So the first two times we held the workshop at the Grace Hopper conference, but we started to feel that that wasn't really the right venue, given that we wanted that focus to be on research.
So we ended up moving it to NeurIPS, and this had a bunch of benefits, some of which I don't think we'd even fully thought through when we made that decision.
So one of the benefits was WiML travel funding: we would give attendees this travel funding to enable them to pay the cost of attending WiML, stay in hotel rooms, all this kind of stuff. This would actually enable them to attend NeurIPS as well if we co-located with NeurIPS. Another main benefit was that we held WiML on the day before NeurIPS. So then throughout the rest of the conference, WiML attendees would see familiar faces throughout the crowd and wouldn't necessarily feel so alone.
So you're talking about these challenges. How have these challenges changed over time? Or, you know, more broadly, can you talk about how the workshop and women in machine learning as an organization as a whole kind of evolved over the years?
I know that you served a term as the WiML president.
Yeah. So it's changed a lot. So first, obviously, most importantly, it evolved from being kind of this one-off event where we were just seeing what would happen to being really a robust organization. And the first step in that was creating the WiML Board. And as you just said, I served as the first president of that. But there have been a bunch of other steps since then. And one of the things I want to flag about the WiML Board was that this was really important because the board members could
focus on the long-term health of the organization and these sort of, like, you know, things that
spanned multiple years, like how to get sustainable funding sources, this kind of thing,
versus the actual workshop organizers who would focus on things like running the call for
submissions and stuff like that. And being able to separate those roles really just reduced the burden on the workshop organizers and meant that we could take this kind of longer-term
perspective.
Another really important step was officially becoming a nonprofit. That happened a few years ago, and again, it just was the natural thing to do at that point in time and just another step towards creating this sort of durable, robust organization. But it's really taken on a life of its own. I'm honestly not super actively involved nowadays, which I think is fantastic. The organization doesn't need me. That's great.
It's also wild to me that, because it's been around for 20 years at this point, there are women in the field who don't know what it's like to not have WiML. So a bunch of other affinity groups got created. So Timnit Gebru
co-founded Black in AI when she was actually a postdoc at Microsoft Research New York City.
So you and I got to actually see the founding of that affinity group up close. And then now there
are a ton of other affinity groups. So there's LatinX in AI, there's Queer in AI, Muslims in ML, Indigenous in AI and ML, New in ML, just to name a few.
Yeah.
And all of these are growing, too, every year.
You know, this year, WiML had over 400 submissions.
They accepted 250 to be presented.
It's amazing.
Yeah, yep.
And there's going to be a WiML presence this year, actually, at all three of the NeurIPS venues.
So there's going to be a presence in Mexico City, in Copenhagen, and, of course, in San Diego for the main workshop.
So it's pretty great.
And, you know, on top of that, I think the organization now, as you were saying, is able to do so much more than just the workshop alone.
So, for instance, WiML now runs this worldwide mentorship program for women and non-binary individuals in machine learning, where they're matched with a mentor and they can participate in these one-to-one mentoring meetings and seminars and panel discussions, which happen all throughout the year. I think they have about 50 mentors signing up each year, but I'm sure they could always use
more. So it's just really amazing to look back and see how much the WiML community has done and how much it's grown. And, you know, on the one hand, I think that, honestly, founding WiML is one of the things I have done over the course of my career that I am most proud of, if not the thing that I am most proud of to this day. But at the same time, we can't take credit for all of it. It's been a community effort, and the community has really kept this going for the last 20 years. So it's great. I'm going to stop gushing now, but it's amazing. And it's not just WiML that's changed over the years. The entire industry has changed a ton as well.
research evolved as a result of these changes to the entire field of AI and machine learning?
and also from your own change from academia to industry.
It's a great question.
You know, we've touched on this a little bit,
but our research paths really evolved differently,
but ended up in these very similar places.
We're working on responsible AI,
we're advocating for interdisciplinary approaches,
incorporating techniques from HCI and so on.
And I think that part of this was because of shifts in the community
and also what's happening in industry.
Working in responsible AI in industry,
there's definitely not ever a shortage of interesting problems to solve, right?
And I think that for both of us, our research interests in recent years really have been driven
by these really practical challenges that we're seeing.
We were both involved early on in defining what responsible AI means within Microsoft,
shaping our internal responsible AI standard.
I led this internal company-wide working group on AI transparency,
which was focused both on model interpretability,
like we were talking about earlier,
but also other forms of transparency,
like data sheets for data sets,
and the transparency notes that Microsoft now releases
with all of our products.
And at the same time,
you were leading this internal working group on fairness.
Yeah, taking on that internal working group
was kind of a big transition point in my career.
You know, when I joined Microsoft,
I was focusing on computational social science,
and I was also entirely doing research and wasn't really that involved in stuff in the rest of the company.
Then at the end of my first year at Microsoft, I attended the first Fairness, Accountability, and Transparency in Machine Learning workshop, which was co-located with NeurIPS. It was one of the NeurIPS workshops. And I got really excited about that and thought, great, I'm going to spend like 20% of my time, maybe one day a week, doing research on topics in the space of fairness, accountability, and transparency.
That is not what ended up happening.
Over the next couple of years, I ended up doing more and more research on
responsible AI, you know, as you said, on topics to do with fairness, to do with interpretability.
And then in early 2018, I was asked to co-chair this internal working group on fairness.
And that was the point where I started getting much more involved in responsible AI stuff
across Microsoft, so outside of just Microsoft Research.
And this was really exciting to me because Responsible AI was so new,
which meant that research had a really big role to play.
It wasn't like this was kind of an established area
where folks in engineering and policy knew exactly what they were doing.
And so that meant that I got to branch out from this very sort of research-focused work
into much more applied work in collaboration with folks from policy, from engineering, and so on.
Now, in fact, as well as being a researcher, I actually run a small applied science team, the Sociotechnical Alignment Center, or STAC for short, within Microsoft Research, that focuses specifically on bridging research and practice in responsible AI.
Yeah. Do you think that your involvement in WiML has played a role in this work?
Yes, definitely.
Yeah, without a doubt.
So particularly when working on topics related to fairness,
I ended up focusing a bunch on stuff to do with marginalized groups
as part of my responsible AI work.
So there's been this sort of focus on marginalized groups,
particularly women in the context of machine learning,
with my WiML kind of work,
and then in my research work, thinking about fairness as well.
The other way that WiML has really sort of affected what I do is that I work with a much more varied group of people nowadays than I did back when I was just focusing on kind of machine learning and computational social science and stuff like that. And many of my collaborators are people that I've met through WiML over the years. And of course, there has been another big shift within industry recently with just all the excitement around generative AI. Can you say a bit about how that has changed your research?
Okay, yeah. So this is another big one. There are so many ways that this has changed my work. One of the biggest ways, though, is that generative AI systems are now everywhere. They're being used all over the place for all kinds of things. And, you know, you see all these news headlines about GenAI systems, you know, diagnosing illnesses, solving math problems, and writing code, stuff like that. And also headlines about various different risks that can occur
when you're using generative AI, so fabricating facts, memorizing copyrighted data, generating
harmful content, you know, these kinds of things. And with all this attention, it's really natural
to ask, what is the evidence behind these claims? So where is this evidence coming from? And should
we trust it? It turns out that much of the evidence comes from GenAI evaluations that involve measuring the capabilities, the behaviors, and the impact of GenAI systems. But the current
evaluation practices that are often used in the space don't really have as much scientific rigor
as we would like. And that's kind of a problem. So one of the biggest challenges is that the
concepts of interest when people are doing these GenAI evaluations, so things like diagnostic ability, memorization, or harmful content, are much more abstract than concepts like prediction accuracy, which underpinned machine learning evaluations before the generative AI era.
And when we look at these new concepts
that we need to be able to focus on
in order to evaluate gen AI systems,
we see that they're actually much more reminiscent
of these abstract contested concepts,
these kind of fuzzy, squishy concepts
that are studied in the social sciences.
So things like democracy in political science or personality traits in psychometrics.
So there's really that sort of connection there
to these kind of squishier things.
So when I was focusing primarily on computational social science,
most of my work was focused on developing machine learning methods
to help social scientists measure abstract contested concepts.
So then when GenAI started to be a big thing,
and I saw all of these evaluative claims
involving measurements of abstract concepts,
it seemed super clear to me that if we were going to actually
be able to make meaningful claims about what AI can and can't do, we were going to need to take a different approach to GenAI evaluation. And so I ended up sort of drawing on my computational
social science work around measurement. And I started advocating for adopting a variant of the
framework that social scientists use for measuring abstract contested concepts. And my reason
for doing this was that I believed, and I still believe, that this is an important way to improve the scientific rigor of GenAI evaluations.
You know all of this, of course,
because you and I, along with a bunch
of other collaborators at Microsoft Research
and Stanford and the
University of Michigan, published a
position paper on this framework, entitled "Evaluating Generative AI Systems Is a Social Science Measurement Challenge," at ICML this past summer.
What are you excited about at the moment?
Yeah, so lately I have been spending a lot of time thinking about AI and critical thought.
How can we design AI systems to support appropriate reliance, preserve human agency,
and really encourage critical engagement on the part of the human, right?
So this is an area where I think we actually have a huge opportunity, but there are also huge risks.
If I think about my most optimistic possible vision of the future of AI, which is not something that it's easy for me to do, as I'm not a natural optimist, as you know, it would be a future in which AI helps people grow and flourish, in which it kind of enriches our own human capabilities and deepens our own human thinking and safeguards our own agency.
So in this future, you know, we could build AI systems that actually help us brainstorm and learn new knowledge and skills, both in formal educational settings and in our day-to-day work as well.
But I think we're not going to achieve this future by default. It's something that we really need to design for if we want to get there.
You mentioned that there are risks. What are the risks that you can see here?
Yeah, there's so much at stake here. You know, in the short term, there are things like overreliance, depending on the output of an AI system even when the system is wrong.
This is something that I've worked on a bunch myself.
There's a risk of loss of agency or the ability to make and execute independent decisions
and to ensure that the outcomes of AI systems are aligned with the personal or professional
values of the humans who are using those systems.
This is something that I've been looking at recently in the context of AI tools for
journalism.
There's diminished innovation, by which I mean a loss of creativity or diversity of ideas.
You know, longer term, we risk atrophied skills, people just losing or simply never developing
helpful skills for their career or their life because of prolonged use of AI systems.
The famous example that people often bring up here is pilots losing the ability to perform
certain actions in flight because of dependence on autopilot systems.
And I think we're already starting to see the same sort of thing happen across all sorts of fields because of AI.
And, you know, finally, another risk that I'll mention that seems to resonate with a lot of folks I talk to is what I would just call loss of joy, right?
What happens when we are delegating to AI systems the parts of our activities that we really take pleasure in and find satisfaction in doing ourselves?
So then as a community, what should we be doing if we're worried about these risks?
Yeah, I mean, I think this is going to have to be a big community effort if we want to achieve this.
This is a big goal.
But there are a few places I think we especially need work.
So I think we need generalized principles and practices for AI system builders, for how they can build AI systems in ways that promote human agency and encourage critical thought.
We also need principles and practices for system users.
So how do we teach the general population to use AI in ways that amplify their skills and capabilities and help them learn new things?
And then, you know, close to your heart, I'm sure.
I think that we need more work on measurement and evaluation, right?
We are once again back to these squishy human properties.
You know, I mentioned I've done some work on over-reliance on generative AI systems,
and I started there because on the grand scale of risks here,
over-reliance is something that is relatively easy to measure,
at least in the short term.
But how do we start thinking about measuring people's critical thinking
when using AI across all sorts of contexts and at scale and over long time horizons? How do we measure this sort of cumulative effect of AI systems on our critical thought as a population?
And by the way, if anyone listening is going to be at the WiML Workshop, I'll actually be giving
a keynote on this topic. And this is something I'm just incredibly excited about because, first,
I'm incredibly excited about this topic, but also, in the whole 20 years of WiML, I've given opening remarks and similar several times, but this is actually the very first time that I will be talking about my own research there. So this is like my dream. I'm thrilled that this is happening.
That's awesome. Oh, that's so exciting. Excellent.
So, one last question for you. If you could go back and talk to yourself 20 years ago and give
yourself some advice, what would you say? Yeah, okay. I've thought about this one a bit over
the past week, and there are three things here I want to mention. So first, I would tell
myself to be brave about speaking up. You know, I'm about as introverted as it gets, and I am
naturally very shy. And this has always held me back. It still holds me back now. It was really
embarrassingly late in my career that I decided to do something about this and start to develop
strategies to help myself speak up more. And eventually, it started to grow into something
that's a little bit more natural. What kind of strategies? Yeah. So,
You know, one example is I use a lot of notes for this podcast.
I have a lot of notes here.
I'm a big notes person.
And things like that really help me.
The second thing that I would tell myself is to, you know, work on the problems that you
really want to see solved.
As researchers, we have this amazing freedom to choose our own direction.
And early on, you know, a lot of the problems that I worked on were problems that I really
enjoyed thinking about on a day-to-day basis.
It was a lot of fun.
They were like little math puzzles to me.
But I often found that, you know, when I would be at conferences and people would ask me about my work, I didn't really want to talk about these problems.
I just, in some sense, you know, I had fun doing it, but I didn't really care.
I wasn't passionate about it.
I didn't care that I had solved the problem.
And so once, many years ago now, when I was thinking about my research agenda, I got some good advice from our former lab director, Jennifer
Chayes, who suggested that I go through my recent projects and sort them into projects that I really liked working on, where it was a fun experience day-to-day, and projects that I liked talking about after the fact and kind of felt good about the results, and then see where the
overlap is. And this is something that, like, it kind of sounds kind of obvious when I say it now,
but at the time, it was really eye-opening for me. That's so cool. And now I kind of want to do
that with all of my projects. Particularly at the moment, I actually just took five months,
as you know, five months off of work for parental leave because I just had a baby. And so I'm
sort of taking a big kind of inventory of everything as I get back into all of this now. And I love
this idea. I think this is really cool. It's changed really my whole approach to research. Like,
you know, we were talking about this, but most of the work I do now is more HCI than machine learning
because I found that the problems that really motivate me
that I want to be talking to people about at conferences
are the people problems.
The third piece of advice I would give myself
is that you should bring more people into your work, right?
So there's this kind of vision on the outside of research
being this solo endeavor, and it can feel so competitive at times, right?
We all feel this.
But time and time again, I've seen that the best research comes from collaborations and from bringing people together with diverse perspectives who can challenge each other in a way that is respectful, but makes the work better.
Is there advice that you would give to your former self of 20 years ago?
Yeah.
Okay.
So I've also been thinking about this a bunch over the past week.
There's actually a lot of advice I would give my former self.
But there are three things that I keep coming back to.
Okay.
So first, and this is similar to your second point,
push for doing the work that you find to be most fulfilling,
even if that means taking a non-traditional path.
So in my case, I've always been interested in the social sciences.
Back when I was a student, you know, even when I was a PhD student,
doing research that combined computer science and the social sciences just wasn't really a thing.
And so as a result, it would have been really easy for me to just be like,
oh, well, I guess that isn't possible.
I'll just focus on traditional computer science.
science problems. But that's not what I ended up doing. Instead, and often in ways that made my
career kind of harder than it probably would have been otherwise, I ended up pushing, I kept pushing,
and in fact, I keep pushing even nowadays, to bring these things together, computer science and the
social sciences, in an interdisciplinary fashion. And this hasn't been easy, but cumulatively, the effect
has been that I've been able to do much more impactful work than I think I would have been able to do otherwise. And the work I've done, I've just enjoyed so much more than what would otherwise have been the case. Okay, so second, be brave and share your work. So this is actually
advice for my current self and my former self, as this is something that I definitely still struggle
with. As do I, you know, and actually I think it's funny to hear you say this because I would say
that you are much better at this than I am. I still think I have a lot of work to do on this one.
Yeah, it's hard. It's really hard. As you know, I am a perfectionist, and this is good in some ways, but this is also bad in other ways. And one way in which this is bad is that I tend to be really anxious about sharing and publicizing my work, especially when I feel it's not perfect. So as an example, I wrote this massive tutorial on computational social science for ICML in 2015, and I wrote a whole script for it, but I never put the slides or the script online as a resource for others, because I felt it needed more work.
And I actually went back and looked at it earlier this year
when we were working on the ICML paper.
And I was stunned because it's great.
Why didn't I put this online?
All these things that I thought were problems 10 years ago.
No, they're not a big deal.
I should have just shared it.
As another example, STAC, my applied science team,
was using LLMs as part of our approach to GenAI evaluation back in 2022,
way before the sort of LLM-as-a-judge paradigm was widespread. But I was really worried that others would think negatively of us for doing this, so we didn't share that much about what we were doing.
And I regret that because we missed out on an opportunity to kick off an industry-wide discussion
about this LLM-as-a-judge paradigm.
Okay, so then my third point is that the social side of research is just as valuable as the technical side.
And by this, I'm actually not talking about social science and computer science.
I actually think that the how of doing research, including who you talk to, who you collaborate with,
and how you approach those interactions is just as important as the research itself.
As a PhD student, I felt really bad about spending time socializing with other researchers, especially at conferences,
because I thought that I was supposed to be listening to talks, reading papers, and discussing technical topics with researchers, and not socializing.
But in hindsight, I think that was wrong. Many of those social connections have ended up being incredibly valuable throughout my research, both because I've ended up collaborating with, and in some cases even hiring, the people who I first got to know socially,
but also because the friendships that I've built, like our friendship, for example, have served
as a crucial support network over the years, especially when things have felt particularly
challenging. Yeah, absolutely. I agree with all of that so much. And with that, I will say,
thank you so much for doing this podcast with me today. It was a lot of fun to reflect on the last 20 years
of Wimel, but also the last 20 years of our careers and friendship and all of this.
So it's great.
And I never would have agreed to do this if it had been with anyone but you.
Likewise.
So thank you everybody for listening to us.
And hopefully some of you will join us for the 20th Annual Workshop for Women in Machine Learning,
which is taking place on December 2nd.
And of course, Jenn and I will both be there in person, and we'll also be at NeurIPS afterwards.
So feel free to reach out to us if you want to chat with us
or to learn more about anything that we covered there today.
You've been listening to Ideas, a Microsoft Research Podcast.
Find more episodes of the podcast at aka.ms.
