Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 156 | Catherine D'Ignazio on Data, Objectivity, and Bias

Episode Date: July 19, 2021

How can data be biased? Isn't it supposed to be an objective reflection of the real world? We all know that these are somewhat naive rhetorical questions, since data can easily inherit bias from the people who collect and analyze it, just as an algorithm can make biased suggestions if it's trained on biased datasets. A better question is, how do biases creep in, and what can we do about them? Catherine D'Ignazio is an MIT professor who has studied how biases creep into our data and algorithms, and even into the expression of values that purport to protect objective analysis. We discuss examples of these processes and how to use data to make things better. Catherine D'Ignazio received a Master of Fine Arts from Maine College of Art and a Master of Science in Media Arts and Sciences from the MIT Media Lab. She is currently an assistant professor of Urban Science and Planning and Director of the Data+Feminism Lab at MIT. She is the co-author, with Lauren F. Klein, of the book Data Feminism.

Transcript
Starting point is 00:00:32 Hello everyone, welcome to the Mindscape Podcast.
Starting point is 00:01:05 I'm your host, Sean Carroll. Everyone knows, I think, that even though words like data and algorithm carry a certain patina of objectivity with them, in the real world, it's often the case that neither the collection of data nor the analysis of data nor the use of algorithms is completely objective. They have biases built into them because all of these facts about the world, and the ideas about how data should be interpreted, are created by human beings, and human beings have their foibles, right? And we see this in action in ways both sort of profound and trivial. There are algorithms that decide who people should hire, who should be suspected of committing crimes. Something we'll talk about in this podcast is crash test dummies.
Starting point is 00:01:53 When crash tests are done by car companies to test cars for safety, it used to be that all of the crash test dummies were modeled after men. None of them were in the shapes or sizes of women, and as a result, you could actually figure out ex post facto that the designs of seatbelts and things like that were noticeably less effective for women than for men. So as objective as we might try to be, we're going to fall a little bit short. Think of it this way. This is one of the ways I like to think about it. You're standing somewhere right now: you're in a room, or you're outside, or you're in your car. Look around, and imagine trying to describe your immediate environment to somebody else in a completely objective way.
Starting point is 00:02:36 You can imagine doing that. Maybe you think you can do that. But the fact is you can't. You can say objectively true things, right? There are true things to say about the world. You're in a car. It's a Toyota, whatever it is. But you're making choices along the way. There are an infinite number of things you could say that are objectively true. But it's you, who are always going to be a little bit fallible and have your biases, your history, your interests and so forth, who choose what features of the environment matter, right? How to divide up the environment into the interesting facts, the uninteresting facts, etc.
Starting point is 00:03:16 That right there is a way that non-objectivity creeps into how we characterize the world around us. So Catherine D'Ignazio, today's guest, is a graduate of the MIT Media Lab and is currently an assistant professor in MIT's urban science and planning department. And she's written a book with Lauren Klein called Data Feminism. And it's not just feminism. It's really about the intersection of data and algorithms, how we are biased, and how we can fight it.
Starting point is 00:03:44 So Catherine is someone who is pro-data, pro-algorithms. Her message is not that, you know, science and technology are tools of the oppressor or anything like that. They can be used to make the world a better place, but they're not always used in that way, and not necessarily even for pernicious reasons, right? You know, bias is sort of a negative-connotation word, but our individuality, who we are and therefore how we see and conceptualize the world, creeps into how we talk about it and what we do about it, whether we like it or not. So being a little bit more conscious, a little bit more cognizant, a little bit more aware of what's going on can help us understand the world better, and that's something we all want to do. So let's go.
Starting point is 00:04:44 Catherine D'Ignazio, welcome to the Mindscape Podcast. Thank you. Thank you for having me. So, Data Feminism is the title of your book, and I'm sure we'll get into both of those words. And I'm also sure that most of the audience's eyes focus on the word feminism, right? That's the thing that is going to get people vibrating with either positive or negative valence. But I'd like to start with the word data a little bit because, you know, I'm a physicist. We have something in mind when we talk about data coming from experiments, finding the Higgs boson. But in the modern world,
Starting point is 00:05:15 the big data world where we're constantly being surveilled and tracked and things like that, data means something a little bit different, or at least the connotations are a little bit different. So what do you, in your idea, should our audience have in mind when you just say data something? What are the things that are flying around in this sphere of ideas? Sure. Yeah, yeah. Yeah, no, great question. So the data that we are, are referring to can include, you know, data from scientific experiments, of course. And obviously, you know, most people when you say data, their minds go to quantitative information. And so, of course, data includes that. You know, our definition is pretty expansive. It's just,
Starting point is 00:05:59 it's information that's collected in a systematic way. And so it's like it's a collection of similar things at some level. So, you know, if you think about just anything that you can put in a spreadsheet, basically, and also includes things you can't put in a spreadsheet. So, I mean, many of the most interesting sort of big data things are image data, for example, right? And so thinking about things that are not necessarily just rows and columns as we encountered them in databases, but also images, videos,
Starting point is 00:06:35 videos, audio, different kinds of things like that, which ultimately can be sort of analyzed, both quantitatively and qualitatively and decomposed into various kinds of parts and then deployed to do different things and even create new things. So like images that are generated from other images and things like that. And then, of course, in data feminism, when we're bringing a feminist lens, we also argue for thinking about qualitative data, including qualitative data, is a very equal counterpart to quantitative data, so not kind of creating a hierarchy out of quantitative and qualitative data. And also really arguing for the value, for really valuing lived experience as a form
Starting point is 00:07:28 of empirical data. And that's something that really comes from feminist theory and thinking and thinking about like, well, for people that have been excluded from, you know, the historical canon, how do we share our stories and our data and our kind of evidence that we bring to the table? It's often been through stories and personal experiences and lived experiences. So really kind of thinking about those as empirical data, obviously of a different nature. Like it's not the same thing to say like, this is my story and like that's your physics experiment. two things are different, you know. But yet at the same time, sort of not like denigrating it because it's not some kind of like generalizable thing that everyone in the world has also
Starting point is 00:08:15 experienced. And if I'm remembering correctly, you work at the MIT Media Lab. Is that right? So I graduated from the MIT Media Lab, but I'm actually now a professor in the Department of Urban Studies and planning at MIT. So different department. Are you a data scientist? There are people who call themselves data scientists. It's almost seems redundant to me, but it's clearly a growing field. Yeah, I would say I'm a data scientist. I would also say, you know, my, I'm really trained less as a data scientist and more as a software programmer. So I come out of software and database programming. And so more from the realm of like systems development and application development is where I spent a long, long time and sort of came into data.
Starting point is 00:09:03 science through doing all this database programming, but also always being interested in art and design. And so for me, those two things always came together actually in maps. And so I'm a long time sort of cartographer and map maker. And in fact, that's the course that I teach for our, that's the main course I teach in urban studies and planning is our GIS and spatial analysis course. We actually, I just had a podcast a couple weeks ago with Jordan Ellenberg, who is a high powered mathematician, geometer, and we talked about gerrymandering and the mathematics of figuring out whether a map is gerrymandered or not. Maps are surprisingly science-y, I think. It's a very important topic. I love maps for exactly this reason. They bring together science. They bring
Starting point is 00:09:50 together art and design. They bring together. I'm a very visual person, so they bring together the visual side. But yeah, so that's why I love maps is because I feel like they're like the integration of these two sides of the brain that they don't often talk to each other, they don't often encounter each other professionally, but then we find them together in maps. And so that for me is very exciting because it can cause both like, I don't know, friction, but also brilliance, I think. But so when we say the word data as an adjective, I mean, there's sort of data as a noun which you helped define, but I guess what I'm getting at is, because you know it already, I'm trying to get for the non-expert, what are the steps?
Starting point is 00:10:31 involved in collecting, analyzing, presenting data. Like, I think you got into the whole data feminism thing through the issues surrounding data visualization, if that's not wrong. So, like, what does a data scientist do to go from the raw stuff of reality to some presented data? Sure. Yeah. So, you know, there's this exercise I do with students, where we start off, And it's an exercise that's about thinking carefully about the ways that we take the world and sort of capture the world is what Joanna Drucker would say.
Starting point is 00:11:09 Instead of taking data, we actually are capturing data. It's super simple. And so I go out and I tell them to just go walk around for like 15 minutes. And your job is that you're going to classify people's shoes. And so you have to develop some taxonomy, our classification scheme, for shoes. and you have to collect at least 10 or 15 rows of data about those shoes. And so the interesting thing is, I mean, it's ostensibly a very simple topic. Like most people in the American universities have shoes, like you can find lots of them.
Starting point is 00:11:46 They come back with their data. But everybody has a different categorization scheme. They have different ways that they've classified shoes. And so this starts to open up some of the complexities. I think once we start to look at the ways in which we count things, the ways in which we aggregate things, the things that we find as being meaningful are not the same as the things that another person would find as being meaningful.
Starting point is 00:12:12 So this idea, you know, I often find myself in the position of teaching newcomers about data science and about data analysis methods and trying to help them. Like, I'm a really, you know, the whole thread of my work is about data literacy. So trying to expose people who don't consider themselves to be technical to start to understand how to use some of these methods. And in fact, they are not that complicated.
Starting point is 00:12:35 There's just, as my colleague Rahul Barkoff says, they're fancy ways of counting. And so, but so this is what becomes interesting, though, is that just the humanness of data, right? So even when we think that we're being so precise and so objective, there's still an infinite number of ways to classify shoes. You know what I mean? And that's not to say we should never classify shoes, right? And that's never useful to classify shoes because it certainly is. And, you know, in particular if you're a shoe company. Right.
Starting point is 00:13:09 But it's more just so that we can start to be aware of some of the limitations of what we can and can't do with data and understand that data in their essence are really, they are a reduction of the world. Like no row of data about a shoe is that. never going to describe the rich rich complexity of a shoe. And that's okay. Like, we don't need data to do that. But it's just important that we remember that it's a reduction in complexity and that we don't confuse the data for the thing itself. Because that's sort of where we can get into all sorts of troubles is when we think like, oh, these data that we just went out and downloaded from
Starting point is 00:13:52 the web, but just represent these raw facts because, in fact, they're not. Like, they've been shaped and formed by the institutions that have set out resources and ways of collecting them, that had developed some classification scheme, that may have done their own analysis and aggregation on them and so on. I love this example of the shoes, because it's exactly what got me interested in your book in the first place, the recognition that there's this tension between a discourse of describing the world with perfect objectivity and rigor so that none of our biases creep in with the reality that that's not a possible thing to do, right? And so we, you know, we might strive for,
Starting point is 00:14:30 might aspire to it. But even when we do something as innocent as collecting data on shoes, well, you know, who classifies what shoe as what is an immediate choice. The choice to look at shoes rather than socks was a choice. And so there's all these choices that are flavoring what we are choosing to talk about, what of the infinite number of facts about the world we're choosing to collect and then display. Exactly. That's exactly right. It's exactly right. And it's like, it's sort of like, with any classification scheme, like, it could have been done differently. And so, like, the important thing is to think about, well, like, how could it have been done differently and why was it done the way that it was and understanding some of those
Starting point is 00:15:07 motivations, not necessarily because those motivations are going to always be nefarious or something. Like, often it's not that, like, there's some evil institutional person behind it. It's just that it was done in a certain way for a certain purpose, which means there were things that were left out, right? And there were these paths that weren't pursued. there were these other data that were not collected. And so it's sort of drawing our attention to, in a way, like, the paths not taken.
Starting point is 00:15:33 And we talk a lot in data feminism about missing data, for example, as a way of reflecting on what are the structures and the powers that are shaping the data that we inherit and the data that we ourselves collect. Well, and one of the points you make is that not only does this human choice about what to do come into the data we collect, But then how to analyze the data, what to do about the data, how to visualize the data, right? These are all involving human choices. Yes, exactly. Yeah, at every stage of the process. It's like all of these, you know, the pipeline, as it were, at all of these stages of that process, there's these sort of very intentional choices.
Starting point is 00:16:14 There's this kind of particular set of actors that are working with the data. There's certain goals. There are certain audiences that they want to reach. So, yeah, sort of like a lot of the book is sort of deconstructed. I think some of the myths about data science, which often are things that like data scientists themselves, like they know this really well. Yeah, right? It's more like in the popular perception of data science or of statistics or of algorithms that you have to like deconstruct that. That they're these like kind of perfect black box systems that are going to always perfectly predict certain things.
Starting point is 00:16:50 It's almost more the popular narratives that we are. sort of trying to deconstruct because any data scientists, I feel like, who's worth their thought, like they know, they know their data intimately, and they also know the limitations of their data intimately. And if they are responsible data scientists, they're not going to be going out and, like, making these wild claims with data that are, have all these limitations. There's actually a joke within physics that nobody believes a theory that comes from a theoretical physical physicist except the person who proposed it and nobody and everybody believes an experimental result except the person who did it because they're very familiar with all of the funny things
Starting point is 00:17:34 that came in along the way. That's fabulous. Do you think it's generally true that people on the street are a little bit overly trusting of the data they see presented to them? Yeah. Yes, I think so. And in fact, you know, I countered this in my own classes. So prior to arriving at MIT, I taught at Emerson College.
Starting point is 00:17:53 in the journalism department. And so I taught data analysis and data visualization to journalists. And they, journalists are a group of folks. Like, they would come into the class saying, like, I'm not good at math. Like, I can't do numbers,
Starting point is 00:18:10 which I actually looked at their standardized test scores, was completely alive. They all did great on standardized tests. They're fine. But, you know, they had this image of themselves as like, I'm a word person and not a math person. And one of the, having that image, honestly, of themselves, I think, inhibited their ability to be skeptical of numbers because they were overly, you know, they'd like download some data on the internet and immediately just believe that the data were true, you know, first of all, like not to kind of understand that like they need to do in the same way journalists are taught to do a verification process for what kind of quotes and facts that they put in their articles. the same process needs to be done on a data set that you're inheriting from another actor or institution that you're using.
Starting point is 00:19:01 So a lot of it was like kind of teaching them about that process. And just having not worked with numbers intimately, like there was a kind of over trust or like an over placing of confidence in the numbers that they inherited. And a kind of slippage where they imagined that I could see it in their writing, when they would write with numbers, they were often throwing in like, you know, if there's like a decimal point, they would give all of the decimals, you know, like eight places. Too many to get to get you. You really don't need to know.
Starting point is 00:19:34 It was like 54.3.756%, whatever. But, you know, like there's this like need they felt to assert themselves with numbers and precision and things like that, but it was precisely because they were insecure about it. So, so I mean, I think these are ways that, yes, I think there is this kind of placement of faith, particularly when people feel a kind of level of insecurity or under exposure to data or to math or statistical ways of thinking, that like they're like, oh, it triggers this like, oh, I'm not that kind of person. I need to place my trust in the people that are that kind of person. And so a lot of what I did was sort of breaking that down and saying, no, in fact, like you can use, you know, you can,
Starting point is 00:20:16 you can do this work. And, you know, particularly like basic descriptive statistics or within everybody's reach. And, you know, teaching a kind of a skepticism, which is a healthy skepticism, but a deeply important one I think right now, particularly in the climate of misinformation and sort of, I don't know, sort of bad information actors on the internet, that is also happening with data as well. So thinking about how do we have a healthy skepticism and a kind of a citizen interrogation of data sets and being able to do a kind of a power analysis of a particular issue to understand how data may have been impacted by structural bias and by issues of power and things like that. So, and that's, that's knowledge they can draw from.
Starting point is 00:21:05 Like, that's worlds they've been exposed to, but those are pathways that they're not just going to come naturally. I think those are sort of muscles and skills that need to be taught. There are streaming services that turn our brains off, mindless entertainment. And then there's Wondrium, the streaming service that blows your mind. Wondrium has thousands of audio and visual learning experiences to feed your curiosity that goes so much farther than what you'd find searching the web. All of your favorites from the great courses are there, plus collections from Kino Lorber, Magellan TV, Craftsy, and more. For example, I loved the life and works of Jane Austen.
Starting point is 00:21:41 You can find out so many facts. Like one episode is completely devoted to Lady Susan, which wasn't even discovered until several decades after Jane Austen. Austin's death. After watching this course, I'm thinking of Austin's novels in new ways. So join me and experience your own mind-blowing moments with Wondrium. And right now, Minescape listeners get a special offer, a free month of unlimited access to the entire library. Go now to Wondrium.com slash Minescape to sign up today. That's W-O-N-D-R-I-U-M-com slash Minescape. Wondrium.com slash Minescape. Well, and you have examples in the book of famous mistakes that people made, you know, in the media with data, for example.
Starting point is 00:22:23 The one that struck me was 538 doing a story. I think it was on, you know, mass shootings in Nigeria. They plotted the number of them. But secretly what they were plotting was just the number of stories about mass shootings in Nigeria, even if they were about the same shooting. But I guess everyone makes mistakes. That's fine. You know, we all misread things. But there's something about putting it in a chart and making it.
Starting point is 00:22:46 look all objective that makes us just more likely to say, well, that's just the facts. There's no mistakes there. Exactly. Exactly. And so this is sort of one of the things we talk about in regards to data visualization, which is that our methods for data visualization are almost like dangerously seductive, right? Because, you know, we can do these really dazzling things. We use these very precise lines. We use these geometric shapes. And then, you know, it looks, by all accounts, to be quite true. And it's like, how could we ever question that? But in fact, exactly this.
Starting point is 00:23:22 You know, there's this kind of fundamental misstep in that project where the, the journalist sort of confused the object of analysis, basically, is like instead of actual kidnappings, we were actually talking about media reports of kidnappings, which are really quite different things. And unfortunately led to, yeah, like retraction of the article and things like that. So let's move our way into slightly less innocent mistakes about, or not mistakes, but ways in which we think about and present the data. You have a wonderful quote that you sort of secretly agree with yourselves, both of you, which is that data is the new oil. Explain what that means to the people who generally say it and also to you. Sure. Yeah. So this is such an interesting metaphor. And this metaphor was circupe.
Starting point is 00:24:16 a lot when we first started writing the book. And I feel like I hear it now less. So I don't know. I should do some Google trends thing on it or something. But I probably, anyone who's following conversations about big data has probably heard this phrase. It was first said, we actually traced its history. So it was first said in the 2000, like mid-2000s, like around 2006, seven. And then it was really boosted. I think of it as a meme. So it was boosted as this like sort of meme by the economist magazine. in I think 2011, 2012 type time, when they did this whole issue on data and all of the kind of great profits that can result from extracting data, using data from social media to infer various things to kind of fuel what you might call business intelligence these days in the corporate world. And the metaphor is really interesting because we can think about oil and we can think about data. in fact, the verbs that we use about both of those things really line up. So we extract, we mine, we clean, we refine, we process. Like all of these are metaphors of for oil. They're not
Starting point is 00:25:32 metaphors. They're things we do with oil. And there are metaphors that we use for working with data. Which is very interesting because oil is quite extractive, right? Like oil is an industry of extraction. And so the interesting thing with the metaphor when folks like the economists are using it is like they're using it in a very positive way, meaning profit. Like we can extract this natural resource, sort of quote unquote natural resource in the case of data. And then, you know, kind of clean process, analyze, deploy whatever. And it will yield great riches and profits. But then, you know, the question is always sort of like for whom. We raise that a lot in the book, we call them who questions.
Starting point is 00:26:18 And that's often what feminism is good for. It's good for asking who questions. So it's like even in the case of oil, you look at like, well, who benefits from oil? Like if we're saying data is the new oil and that's a good thing, who has benefited from oil in the past? And what kind of sort of externalities have been sort of created in the process and who has borne the brunt of those? And so, yeah, we sort of unpack that metaphor and don't disagree in the sense that, like, I think it's an apt metaphor to describe what a lot of corporations are doing with data. If you look around the companies that are making the most money with data right now are the companies that are, you know, they have the resources to collect, store, analyze, deploy, etc., these data-driven products and services. But it's, again, I think that metaphor of extraction is something we really need to think about in the process.
Starting point is 00:27:17 Well, I guess, I mean, to me, it's quite obvious that oil as a concept is a double-edged sword, right? As many good things about it has helped industrialize the world, increase the standards of living. It's pretty clear that it's also bad things about it, climate change. And I would argue that it also has led to massive inequality. Some people get rich, others don't. And the data is the new oil metaphor lines those up. And it sounds exactly right. I mean, is it exactly right?
Starting point is 00:27:47 I mean, do you think that the data has this property that it will have just as many bad? I shouldn't even say just as many. I don't know what the metric or the measure is. But very noticeable bad properties, bad effects, just like it will have good effects. Yeah. Yeah, I think so. I mean, I think in the sense of thinking about like what are the negative externality created by this the data economy,
Starting point is 00:28:13 which some people, I don't know if you've heard this, the other metaphor people have been using is the fourth industrial revolution around the sort of big data economy. I don't know. I scoff a little bit at that, I have to say. Because again, it's like, you know, for who? Like, who is benefiting? And, you know, one of the interesting things with data
Starting point is 00:28:34 and the negative externalities that these, that are created by the kind of big data economy, is that the externalities are very similar in a sense. Like the negative externalities are very similar to the ones that oil create. So like if you think about like even just one of Facebook's data centers that was located in New Mexico, cost something like $30 million a month in electricity to operate. Like this is a tremendous sort of ecological energy intensive sort of earth-executive sort of earth-executive. sort of earth exploiting thing to be basically just fueling us to talk to each other.
Starting point is 00:29:17 So thinking about like there are environmental sort of costs to the cloud, as it were. But then there's also all of these social and sort of inequality generating costs as well. Let me go into some of those in the book, too, wherein if the power of data is really centralized in the hands of large actors, because that's really who have the resources to be able to mobilize it, they're going to use it to their benefit, and they're not necessarily going to use it to the benefit of sort of human health and wellness, more generally speaking, right?
Starting point is 00:30:00 And so I think that's a big problem if it's really the corporations, and to some extent the elite, governments and universities that can mobilize data, and the rest of us are just kind of left in the dust. And that's a concern centered on, I guess, data plutocrats, if that's a category of people we can point to. But you also make the point in the book that even among data scientists, people who we would imagine are trying to be objective in finding true things about the world, there are
Starting point is 00:30:31 value systems that have sort of sneaky choices built into them. So there are values that are valorized, I guess I'm running out of vocabulary words here, but things like ethics and fairness and accountability and all of these sound great, but they can be used to actually silence voices that are being slightly contrary and so forth. Whereas other values like fairness or equality, sorry, fairness within the other category, justice or equity are not talked about as much. So maybe say something about how it's not just the Pluto, it's even the scientists who sort of maybe even innocently stumble their way into putting these
Starting point is 00:31:11 values in a hierarchy. Totally. Yeah, thanks for bringing that up. First of all, I think it's very encouraging that a lot of computer scientists and a lot of folks in the technical community are right now talking about what commonly goes by fairness, accountability and transparency. There's a whole set of conferences that are organized on these topics.
Starting point is 00:31:40 That work tends to look at algorithmic discrimination, discrimination and bias in large data sets and training data sets, running from text and natural language processing to images and things like this, and then at how we make these things more fair. But there's been some interesting pushback on that work, and we're not the only ones doing that, saying, well, why these particular values, fairness, accountability and transparency? Why are these the set of things that we're organizing around? That's in particular because of how a lot of the work operates. Not all the work in the space, and I don't want to caricature it.
Starting point is 00:32:28 But in a good amount of the work in the space, when you take a concept like fairness, what these approaches often try to do is say, okay, let's say we have some discriminatory system, like credit lending. What we're going to do with our speculative algorithm is just tune these knobs and levers so that any kind of racial bias or gender bias is excluded from the system. You get the bias out of the system. It's not to say that those approaches have no value. But one of the things we would say in response is that they don't come with a full conception of the problem. They're not coming with a root cause analysis of how we got to this point. How did we get to the point that lending to black folks in the United States is perceived as higher risk?
Starting point is 00:33:32 That is the result of centuries of really deliberate design and discrimination that's been built in from the very beginning, post-Civil War. We've had all sorts of, in a way, big data tools, like redlining and racial covenants, both informal and formal legal tools that have worked to keep that kind of system in place. And so just looking at it, let's say, mathematically within a closed system unfortunately fails to account for these hundreds of years of history, right? Almost invariably, the work that starts with this fairness lens is starting from time T equals zero or something, rather than saying, actually, we need to take the past 200 years into account.
Starting point is 00:34:25 That needs to be part of the model. That's part of the data. And that would be the approach that's about equity and justice. In a way, the challenge is how do we do that? We can't do that in a race-blind and gender-blind way. It just doesn't work like that, because history is not race-blind and gender-blind, right? We have to acknowledge that past and we have to account for it somehow in our systems. So, yeah. I mean, maybe we can get some good examples on the table here to ground people's thoughts. Because I think that there is
Starting point is 00:34:59 an extremely naive point of view, which would just say, look, if you have an algorithm or if you have a data set, by definition it can't be biased. It's just a computer. I mean, I say that's a very naive point of view, but I know people who have it. Of course, in the real world, algorithms are trained on data sets and who
Starting point is 00:35:16 chooses which data set, and what history is the data set reflecting? One that struck me very vividly recently was a fun thing going around Twitter. I think it came out after your book, but take a sentence like "She won the Nobel Prize," put it into Google Translate into a language that does not have gendered pronouns, then translate it back into English, and it's always "He won the Nobel Prize"
Starting point is 00:35:39 when it comes back, right? Yes, exactly. I mean, maybe explain, with some examples, how these biases from history, or from ignoring history, creep into purportedly objective algorithms. Exactly. Yeah. See, this is great, because I think this goes back to the other topic we were talking about, that kind of faith, an overplaced faith in these systems, that they are objective
Starting point is 00:36:03 and that they are going to work like that. So these things happen because, again, it goes back to the human-constructed nature of the data sets, right? Let's take facial recognition, which we talk about in the book. If we're going to make a system that recognizes a face and is able to distinguish a face from the background in a photograph, we need training data, because the system needs to be able to look at a bunch of images
Starting point is 00:36:36 that have already been annotated to say, oh, this one's a face, and here's the exact location of the face. Looking at many, many, many of those is going to teach the system where the face is and what faces look like and so on. But, for example, look at the work of Joy Buolamwini, who's a colleague at MIT. She's Ghanaian American. She has very dark skin.
Starting point is 00:37:06 She had just gotten an off-the-shelf facial detection library that she wanted to use in a class project. She sat down in front of it, and it just wouldn't see her. It wouldn't put that little box around her head, you know? The invisible man. Yeah. She got her white friend, and it recognized the white friend.
Starting point is 00:37:26 She got an Asian friend, and it recognized the Asian friend. And then she had this white theater mask, and she put it on, and then it recognized her face. Right. So we have to think about what's happening behind that system. Is it that the engineers are racist? No. The engineers were not sitting there saying, yeah, we're definitely going to discriminate against black women.
Starting point is 00:37:50 But what happened is, she's a very technical person, so she dug into the guts of the system, and she ended up writing a really important paper called Gender Shades with Timnit Gebru, where they audited the training data libraries that are used to train these kinds of systems. In the resulting analysis, the problem of what data are available is often the biggest source of bias. In the case of face data, these are often profiles of celebrities and political figures, and those are the data that are then used to train the systems. But then we can think about, well, who's a celebrity, who's a political figure, and what kinds of racial and gender biases are in that? It's going to be mostly white people, mostly men. And in fact, what they found is something like, and I'm forgetting the exact figure, but something around
Starting point is 00:38:47 88% of the faces in this benchmark training data set were what they called "pale and male." They didn't actually classify people by race; they were looking just at skin color. And so of course, a system that's trained with that kind of training data
Starting point is 00:39:05 is going to fail really badly. It works great for white men, because that's the user group that's really centered in that kind of system. And it works really terribly for black women, because there are so few images of black women in the training data.
Starting point is 00:39:21 So it's not really that the algorithm is biased; it's that we are biased, right? And even in building the system, the engineers weren't sitting there saying, ha, ha, ha, let's be racist and sexist. But they also didn't have mechanisms to seek out the racism and sexism that will inevitably show up. They didn't have the tools to look for it. They didn't have the checks and balances to be able to check for it
Starting point is 00:39:55 before a black woman discovers it at the tail end of things and exposes it from there. So how we end up with these things is that, as we build human systems, we're pulling from the human and social world to train those systems. Hopefully one day we'll live in a world where we are not racist and sexist. But right now the world is racist and sexist. That's the garbage in, garbage out.
Starting point is 00:40:30 Exactly right. So we have to deal with it. We have to look for the garbage, basically.
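A minimal Python sketch of the kind of disaggregated audit described above: reporting accuracy per subgroup instead of one overall number. Everything here is an assumption for illustration. The `accuracy_by_group` helper is hypothetical, and the 100-image test set, its 88/12 split, and its error pattern are invented; they are not the Gender Shades data or the authors' actual method.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall accuracy, {group: accuracy}).

    `records` is a list of (group, correct) pairs, where `group` is a
    hypothetical (skin type, gender) label and `correct` says whether
    the detector handled that image correctly.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group


# Invented test set skewed toward lighter-skinned men (88 of 100 images).
# Not real data; the numbers only illustrate how an aggregate can mislead.
records = (
    [(("lighter", "male"), True)] * 88
    + [(("darker", "female"), True)] * 6
    + [(("darker", "female"), False)] * 6
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")  # overall: 0.94
for group, acc in sorted(per_group.items()):
    print(group, f"{acc:.2f}")
# ('darker', 'female') 0.50
# ('lighter', 'male') 1.00
```

The 0.94 headline accuracy looks respectable, yet the per-group breakdown shows the hypothetical detector failing on half the images in the underrepresented group, the kind of failure a single aggregate score hides.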
Starting point is 00:41:29 Well, I mean, this really highlights, I think, what to me is the
Starting point is 00:41:45 big, looming philosophy question here, epistemology question, or whatever you want to call it, about objectivity. I mean, I can very crudely distinguish between three attitudes that we might have towards objectivity. One is, objectivity is good and we have it, basically. Like, the science and computers and data are pretty objective, even if we humans are flawed. Another attitude is, you know, objectivity is something we should aspire to, but we don't have it. We should be extra careful in trying to get there.
Starting point is 00:42:14 And a third attitude is that the goal of being objective is just misplaced in the first place. We shouldn't even try. We should recognize our individual, non-objective goals. So where do you sit in that classification scheme? Yeah, that's helpful. So this is where I think some really interesting feminist theory comes in. Folks like Donna Haraway and Sandra Harding have thought through these questions really deliberately and specifically, and we can draw a lot from them. They have the idea of something called feminist objectivity. Donna Haraway specifically talks about situated knowledge.
Starting point is 00:43:02 And so, I think where we can't end up, and in fact, I don't think any, well, I shouldn't say that. I don't think any feminists would say this, but I guess I won't speak for all feminists, because maybe some of them would say this. Exactly. But I think where we can't end up, and what I find untenable, is a position of, oh, to everyone their own individualistic truth, truth is super subjective and everything is relative. I think that's untenable because we have to be able to arrive at some
Starting point is 00:43:35 collective, shared understanding about the world. And I think it's very dangerous when truth is just fiction. We've seen that happen recently, and that's a very dangerous world to me. And yet, at the same time, I think we have to recognize that the current conception of objectivity as it exists has also been exclusive. It hasn't included all people, all genders, all races, all communities in the fold of objectivity. And this is something feminists have critiqued for ages. A more recent critique comes from Ruha Benjamin, who calls this imagined objectivity.
Starting point is 00:44:25 And it's not to deny that we've made some incredible scientific achievements following these tenets of the scientific method. And yet, in particular when it comes to how science and objectivity meet the human and social world, our disciplinarity is really constraining for us. And this is what's leading, I think, to some of the confusion on the part of the technical community about how we do this better. Often when you're trained in computer science, you haven't really
Starting point is 00:45:00 been trained in gender studies. You know what I mean? You haven't been trained in the history of race in the United States. And so we're finding these places where these bodies of knowledge need to meet each other, and they haven't met each other and been incorporated. So I guess, in your scheme, I would be somewhere in the middle,
Starting point is 00:45:27 and I think one of the things that Haraway would tell us is that all knowledge is situated. We're all in a particular context. We're in a particular country, a particular geography. We have a particular value system. And so the strategy that feminists would advocate for, what they would call strong objectivity or feminist objectivity, is one where we pull in more people from more places, and we try to recognize more of our
Starting point is 00:45:57 own blind spots, more of the ways in which our rigid conception of objectivity may be excluding people. How do we bring people to the table to understand the cultural and social boundaries of our knowledge better? So maybe that's a little bit long-winded. That's what we're here for. My feminist heritage would say we don't throw objectivity out the window. We definitely don't throw away collective knowledge-making.
Starting point is 00:46:28 But we do have to include more people, and we do have to break down some of the exclusionary norms that end up pushing people out of, you know, quote-unquote objectivity. I guess my temptation upon hearing things like that, which was actually very eloquently said, and a lot of me wants to agree with it, is that part of me wants to make it even more complicated by saying, look, the charge of the electron is equal in magnitude and opposite in sign to the charge of the proton. And that's not situated anywhere in particular. That's absolutely universal. So I guess I want to imagine a spectrum of situatedness of knowledge, where there are some raw physical facts that are pretty much universal
Starting point is 00:47:13 and some social statements we could make that are pretty obviously situated. Is that a fair way of thinking? Yeah, I think so. I think the closer we get to the human and social world, the more this matters: when we try to take physical science laws, for example, and apply those to human circumstances, I just feel like it's a recipe for disaster. And people do try to do that. They try to find, well, what's the underlying grammar of a city, or what's the underlying law that guides this particular kind of human behavior? And the thing is that humans are
Starting point is 00:47:57 just more complex than electrons. I mean, maybe one day we could get there, you know what I mean? In terms of having some laws that are going to say, oh, right after this podcast Catherine is going to go pat her cat; we just know that about her. But I think what happens in the human world is that things are just messy, and there are so many variables. Consider the scientific-method way of saying, well, let's take this one model, exclude all this other stuff, and just pursue this one model of reality down this particular path.
Starting point is 00:48:34 I think that can work for certain kinds of things, but it really can't generalize and apply as a method to all these different human and social circumstances. Those are things that we really need to understand as being contingent. Or we may discover some laws, as it were, facts that apply in more contingent circumstances: this applies, generally speaking, in Western cities that have this particular mode of transportation. You know what I mean? And that's still really useful knowledge. But it's also really useful to know that it's only true in this kind of city.
Starting point is 00:49:11 It's not true of all kinds of cities. So I think that's where the proponents of unqualified objectivity can be irritating to feminists: when you're trying to make these somewhat absurd universalist statements or generalizations, and imagining that it's always about generalizing and universalizing, right? So it's about understanding when universalization and generalization are appropriate, and when it's more appropriate to understand something that's deeply situated and contingent in its own circumstances. And that's where a lot of, you know,
Starting point is 00:49:52 more qualitative social science research comes in, and things like that. I guess my response to that, and I apologize for talking too much myself, but I'm working through this in real time because it's interesting, goes back to the shoes that we started with, right? I guess one could say, even with the electron and the proton, you claim that's an objective fact about the world, but someone chose to describe the electron as one particle and the proton as one particle, rather than as different combinations or different agglomerations. There was some human carving of nature that we showed was useful after the fact. It's not completely arbitrary. There are good reasons to do it. But it's still done
Starting point is 00:50:38 by human beings. The difference being that when it comes to fundamental physics, it's pretty easy to do that, right? It's pretty straightforward to agree on how to carve up nature that way. When it comes to human beings, or even shoes, that's going to be much less easy and much more prone to sneaking in our biases as some objective measure of something. Yeah. Yeah. And I think it comes down to variation, you know. With the proton and the electron, it's going to behave in a particular way, and that's super repeatable. You can demonstrate, just through repetition, that this is pretty much always how this particular thing works. Whereas with shoes, if I do the shoe experiment in my neighborhood versus in a different place versus with a different group of students, it's going to be different every single time.
Starting point is 00:51:27 You know. And it's going to be different in a similar way, actually, I guess I should say, because that's why I use it as a learning exercise. That's what gets interesting to talk about, that sort of variation. But it's not repeatable. You don't always come back with the same result. And for me, that's what makes data in the social and political world the most interesting thing: they are contingent, and yet they are still useful.
Starting point is 00:51:56 We can still do meaningful things with them. Like I said, they're a reduction of the world, but hopefully they're a healthy reduction that we can still use in some meaningful way. Well, thank you for indulging my descent into the philosophical rabbit hole there, but I do want to come back to the data and the feminism. Let's get some more examples on the board, because we talked about how, when you make an algorithm for facial recognition or for translation, it can be biased.
Starting point is 00:52:21 You make the point in your book that even the choice of which data to collect smuggles in all sorts of presuppositions, right? There's a whole bunch of data sets that either should exist and don't, or do exist but don't include data that they should. I mean, how should people collecting the data be thinking about these issues? Yeah, absolutely. I think people collecting data have almost a special kind of responsibility, right? Because you really do have to be thinking about not only
Starting point is 00:52:55 how are we going to collect the data and store the data, but how are we going to steward the data? How are we going to provide information, like metadata about the data, for whoever's going to come later? One of the really complicated things: I find things like open data really interesting and useful, and then also complicated, because you have governments that are collecting data for a really specific institutional purpose, and then they're publishing it, often with really, really bad metadata.
Starting point is 00:53:29 So you don't know where it's coming from or what the columns represent. You have to do all sorts of legwork to figure that out. And then we imagine that we can use that data to do something else that it was not intended to do. In fact, this is why I have a lot of admiration for people in library sciences and in fields that are about stewarding information, because it's really about thinking about how we become good caretakers of information while not knowing, at the time of collection, all of the possible ways that the data
Starting point is 00:54:07 might be used in the future. So I think that's what is complicated about it, because of course you can't anticipate all those possible future uses, and other data sets that people might want to combine it with to infer something else. But that's not to say we shouldn't collect data. It's just to say that we have to do it with some of those caveats in mind. And then, in particular for data feminism, we think about ways that structural forces of inequality like sexism and racism can enter that process. So we really tune into what those ways are. A really obvious one is when you collect gender data, for example.
Starting point is 00:54:58 When we are collecting gender data, we should think about, first of all, why are we collecting gender data? But then also, 95% of the time when I'm filling out a form that is asking for my gender, there's only a binary choice, but there are far more than two genders. So think very carefully about some of the categories that we've naturalized. We inherit this received wisdom that there are two genders. Well, in fact, there aren't.
Starting point is 00:55:32 Empirically speaking, there are not two genders. Same with race: how do we ask people to enter their race, or how do we racially identify people? All of these are, I think, really fraught. But then even data that are not overtly related to identity categories like gender and race can often be used to infer those things. So, for example, in the United States, take things like zip code:
Starting point is 00:56:03 you can infer somebody's race with something like 80% accuracy from their zip code, because our country is so segregated. So think about ways that even collecting something like a zip code can be racialized, right? Somebody, or a system, not even intentionally, could end up differentiating and sorting people racially by proxy through the zip code. So these are the things to think about as we're collecting data: what data may reveal about us, or about these different group-based identity categories, that may have implications downstream for whatever system you're building. On the specific issue of the two genders, I just want to say two things for the audience's benefit. Number one,
Starting point is 00:56:52 one of the very first podcasts I did was with Alice Dreger, and we talked about all the different ways one can fail to fit into the natural categories of the two genders that we're most familiar with. But then also, in your book you have this lovely chart that I had never seen before, I don't know if it was brand new with you or if you got it from somewhere, that traces all the different ways you can end up in between the traditional poles of biologically male and female. And this is not even about gender, right? This is about sex. It's just biology; forget about psychology. I love, yeah, I love that piece. It's a piece from Scientific American, and it's called something like "Beyond XX and XY." It's a beautiful flow chart. The designers and the research team looked across all the most
Starting point is 00:57:42 recent literature in sex differentiation and showed how, again, our received wisdom is that there are two, maybe three sexes, you know, male, female, and intersex, and that it's biologically determined at birth and then it's just set, right? But it's a beautiful flow chart visualization that shows how, no, in fact, sex differentiation unfolds. It's dynamic. Over time.
Starting point is 00:58:12 Yeah. I just love the piece because it really complexifies things, even in, I would say, gender studies, because in gender studies there is also received wisdom that, okay, gender is more your identity and sex is a biological thing. So even in gender studies, that's the received wisdom. And I love that that piece is complexifying and challenging that received wisdom as well. Well, the word complexifying is great here, because that's a lot of the work that you've
Starting point is 00:58:44 set for yourself, right? Taking clear and easy distinctions and saying, well, it's not always quite that simple. And the other thing I love about the chart is that there's nothing in there about your feelings. It's about how a mutation in this gene leads to a decrement in this particular hormone. It's hard to look at the chart and say, oh, you're denying the science. It's exactly the opposite of that. Yeah. No, it's showing the science. Yeah, it's just being honest. It's bringing us up to date about the science, actually. That's right. And one other example I love, because it's just so direct and relatable, is the
Starting point is 00:59:18 crash test dummies and how they've affected ideas about safety in cars. So I'll let you tell that story if you want to. Sure. Yeah. So for years and years, we only used male-sized crash test dummies. These were based on, you know,
Starting point is 00:59:37 the statistical average of the male body. And in fact, what this led to is that pregnant people and women were 40% more likely to be injured or die in a car crash, because we hadn't been basing it on women's bodies. So this is just such a clear and also such a harmful example of what happens when you don't consider gender differences in bodies. Right? Just such a clear example of that.
Starting point is 01:00:12 It's very similar to how it was only fairly recently that the NIH mandated that you have to have equal numbers of women and men participants in health research studies, because for many years women were excluded from scientific research, evidently due to their menstrual cycles. I'm just imagining a bunch of men saying, oh, I don't know, they're weird, they have menstrual cycles, you know? We can't include that, because that might mess everything up. But I just find that so interesting, because men have hormonal cycles too. So their hormones could mess things up just as much as women's. But anyway. So, yeah, I think that's a really clear example of this sort of thing where, again,
Starting point is 01:01:04 it goes back to this objectivity question: true objectivity would maybe be great, but we don't have it yet. There are all these ways that we have excluded people, and we are still pretty far from including them. And so we have to work on that.
Starting point is 01:03:13 That's great. Because it served the story. People will say like, oh my God, I cried at the end. It's like, yeah, dude, me too. Listen to Eursay, the Audible and IHeart Audio Book Club on the IHart Radio app or wherever you get your podcasts. Well, the crash test dummy example is a good one because it seems like such an obvious mistake. Like if you're trying to be objective and you know that different people come with different size and shaped bodies, isn't the most objective thing to do to try to get the cross-section
Starting point is 01:03:45 that is as fair as possible of bodies. But these human flaws, and again, sometimes they might be pernicious, but sometimes they might just be, as human beings, we're finite and fallible. They get in the way. I remember the story, and I'm not going to get the numbers right, so I'm sure someone on the Internet will correct me. But Sally Ride, when she was the first female astronaut on the space shuttle,
Starting point is 01:04:09 Like they packed something like for a week-long trip, thousands and thousands of tampons because they're like, well, who knows? And all you had to do was ask somebody, right? Even if it was all men doing the planning, how hard would it have been to ask? And that got in the way. Right. I love this story. Yeah, I know. I think this story's hilarious.
Starting point is 01:04:30 But it shows you how it's like things like social stigma, how they shape our inability to even have these conversations. Right? Like the fact that they couldn't ask her until she, I don't know, opened up some cabinet and a bunch fell on her head or something. They just had to ask any woman in the world. Yeah. So, so yeah. So I think, I mean, I think that's the thing too is sort of like if you just think about that as a bias, but then like magnified up into like what do we like take on as a research subject in terms of like, you know, who's going to work on menstruation, for example? Like if, if the majority of scientists in health, let's say, think that menstruation is weird, like they're not going to work on it, you know, or who's going to work on transgender health or who's going to work on segregation, you know? So it's sort of thinking about like how these taboos, norms, stigmas, values, sort of human and social things, they creep in and they affect things.
Starting point is 01:05:34 And they are pernicious in ways, but they're not often, you know, I would say the majority of the time, they're not from like some one person's brain being like, I'm going to do this. They're systemic. And they're part of this sort of, you know, again, this systemic inequality that we swim in every day. And so that's why it's important to have both a kind of theoretical and a kind of material understanding of like what is, what is this water that we swim in? How do we look out for those things? And how do we ultimately create better work and better science by being able to recognize those things and, you know, take steps to avoid them in our own work? Well, this is, I might be going, I might be generalizing too glibly here.
Starting point is 01:06:21 But I think that this is one of the reasons why the tension about feminism is particularly sharp in my own field of, you know, physics and closely related fields like computer science and philosophy and things like that, because on the one hand, these are fields that valorize objectivity and, you know, treating everyone equally would be part of that goal. But on the other hand, they make all their money out of simplifying things, right? And, you know, boiling things down to the essence and treating things like spherical cows. So this messiness of saying, well, you know, there's actually a lot of heterogeneity in the sample and things like that is anathema a little bit to the method that has been so empirically successful in these fields. So there's, there's a mismatch there,
Starting point is 01:07:09 which, you know, maybe shining some light on it will make it a little bit better. I don't know. Yeah. No, I think that's a really important observation. And I, and I still, I hold hope for sort of like the quantitative fields. I mean, in a way, what I hold hope for is when I see really excellent mixed methods research, you know, because I think we do get somewhere by simplifying, we just have to remember that we simplified. You know what I mean? You know, it's like we just, again, we can't confuse the representation of reality for the reality itself. We can't confuse the spreadsheet for the, the truth out there.
Starting point is 01:07:47 Even though they are interconnected, they have a relation there that's important. And so I think for me, I get excited when I see sort of the mixed methods. And one example, like a concrete example I'll give of this: Mary Gray has a book called Ghost Work that she wrote collaboratively. Her co-author is a computer scientist; she's an ethnographer and qualitative researcher. And it's all about the sort of behind-the-scenes labor of the platform economy. Like Uber, basically.
Starting point is 01:08:23 All the people that work behind the scenes: they showcase a lot of folks in India, for example, who are kind of like helping the algorithm along the way as it does things. They do human-in-the-loop sort of approvals of things and checks on people's licenses and stuff like that. And it's this really lovely study because, you know, Mary will be doing ethnography and interviewing people like in their houses in India. And she'll surface an interesting question. And then her co-author will go test that computationally and quantitatively across a bunch of network data. And then he'll surface some interesting questions.
Starting point is 01:09:00 And then she'll kind of go try to validate those in interviews. And I get really excited by this kind of work because I feel like it's building on sort of the strengths and the limitations of both things. Because like the qualitative work is like, okay, you can interview like, even if you interview 100 people in India, it's not like you've interviewed the population. And then for the quantitative work, it's like, okay, you've got all this really great coverage, like in terms of scope and scale, but then what explains the variation, right?
Starting point is 01:09:29 And so I love how they can kind of go back and forth and make it really sort of multi-scalar. And so that for me, like it's a little bit closer, at least when we're talking about things in the kind of human and social and political realm, to being able to use these different fields' methods to their best advantage and in a complementary way. Again, to get further to like kind of what is really explaining some of this
Starting point is 01:09:53 variation on the ground. Yeah. And again, if you put it that way, who can object to the idea that, you know, we should have these richer methodologies with which we might find something out? I don't know, but people can be a little bit resistant, which I guess brings me to this question I probably should have asked within the first five minutes, but how do you define the word feminism? What does that, what does that mean to you? That's one of the points of contention, which maybe it shouldn't be, right? Like whether or not you think it has a positive or negative connotation. Sure, yeah. So for us, and I should say there are many feminisms, and Lauren and I in the book are specifically pulling from intersectional feminism.
Starting point is 01:10:32 But at the very basic level, if you are a person who believes in equality for all genders, you're a feminist. And that's kind of like the basic definition. And that is feminism: it's a belief that all genders are equal. And there's a kind of a corollary to that because if you believe that all genders are equal and then you look at the world around you, you can see that that equality has not been realized in the world. And so feminism compels you to take action to realize the world in which all genders are equal. And then I would say those are two kind of like indisputable things about feminism. And then the sort of specific type of feminism that we draw from is
Starting point is 01:11:20 called intersectional feminism. And this is an idea out of black feminism in the U.S., where black feminists said, you know, we cannot consider, like, gender inequality can't explain our reality, and it can't explain social inequality because we have to take race into consideration. And so that would be the basic idea of intersectionality.
Starting point is 01:11:47 And so this idea that we include both sexism and racism, and then since then, you know, other sort of forces of oppression, so thinking about like classism or colonialism and so on, so that we can try to think simultaneously and comprehensively about how those forces intersect to produce social inequality. So that's the, and I should say intersectionality is pretty well established at this point. That idea has been around since the late 80s, early 90s. And that is, I would say, the dominant kind of sort of feminist thought today.
Starting point is 01:12:20 I think physicists would be a lot more in favor of the idea of intersectionality if it were just labeled nonlinearity. I think that's what we would call it. That's great. Okay. Maybe I'm going to say that next time I speak to physicists. If I understand it correctly, it's just the idea that whatever discrimination you face for being, let's say, a woman and black is not just the discrimination you face for being a woman plus the discrimination you face for being black. They can interact nonlinearly. Yeah, they can enhance each other. Exactly. Yes, that's exactly right. Yeah, yeah, yeah. Okay, I'm going to totally remember this when I'm talking like to, is this just in physics? Anything, anything, any field that likes to have equations. Nonlinearity is literally a feature of equations that just says you can't just add things up: the effect of the causes acting together is not the sum of their separate effects.
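The nonlinearity analogy can be made concrete with a toy calculation. This is an editorial sketch, not something from the episode or from the book Data Feminism: the function names and every number are arbitrary placeholders, and the "interaction term" is simply the standard way a regression-style model lets two factors combine non-additively.

```python
# Toy illustration of additive vs. nonlinear (interacting) effects.
# All coefficients are arbitrary placeholders, not real measurements.

def additive_model(is_woman: bool, is_black: bool) -> float:
    """Linear model: the combined effect is just the sum of the separate effects."""
    return 1.0 * is_woman + 1.5 * is_black


def interaction_model(is_woman: bool, is_black: bool) -> float:
    """Same separate effects, plus an extra term that appears only when both apply."""
    return 1.0 * is_woman + 1.5 * is_black + 2.0 * (is_woman and is_black)


for model in (additive_model, interaction_model):
    both = model(True, True)
    sum_of_parts = model(True, False) + model(False, True)
    print(f"{model.__name__}: both={both}, sum of separate effects={sum_of_parts}")
# additive_model: both=2.5, sum of separate effects=2.5
# interaction_model: both=4.5, sum of separate effects=2.5
```

In the additive model, knowing the two separate effects tells you everything about the combined case; in the interacting model it does not, which is the sense in which the conversation describes intersectionality as "not reducible to two separate things."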
Starting point is 01:13:12 Exactly. Kimberlé Crenshaw's point is that it's not reducible to these two separate things; they combine and interact. The Schrödinger equation is linear, but human beings are absolutely not. That's a pretty obvious thing. But the less obvious thing, I mean, the thing that, you know, your definitions of feminism seem pretty transparent and unassailable, but the tricky part is that you use the word equality without quite digging into what that means. And if I pretend to be, you know, hard-nosed about this, I say, well, what do you mean? I mean, the average height of women throughout the world is not equal to the average height of men. They're not equal in that sense.
Starting point is 01:13:51 So what do you mean? That men and women, all genders, should be treated equally? Yeah. I mean, equality of access to services, to happiness, equality of opportunity, to promotions, et cetera, and equality of power. I mean, I think, like, this is when we'll know that we have achieved feminist goals: when we look at who's in power and it's representative of the population, you know. And, you know, we're still so far behind on so many things, like in terms of both, if you just look at like political representation or if you look at like the wage gap,
Starting point is 01:14:39 just all of these different markers and measures. And so I think that to me would be like, this is when we'll have achieved the world in which we have equality of genders: when all those measures of power have equal representation. Yeah. Yeah, I mean, this is tough for me because, well, I had Elizabeth Anderson on the podcast, and equality is her thing. I'm not sure if you're familiar with her work, but she's a leading theorist of equality.
Starting point is 01:15:08 And people also, you know, the audience members, I can tell from YouTube comments, et cetera, some of them get their hackles up because they hear the word equality and they instantly read that as equality of, I don't know, wealth, right? Like everyone has the same amount of wealth or the same amount of anything. And Anderson's point of view is much more nuanced and it's more about, you know, the equality of the ability to become who you want to be or something like that, right? Yeah. I can at least imagine a world where men and women have exactly equal ability to become who they want
Starting point is 01:15:42 to be and a bunch of women decide that they don't want to be in positions of power, right? I mean, maybe that's not the actual world because we're very far from this thought experiment of equal opportunity. But I worry about measuring it or judging it that way because that's an outcome-based measure. And I want the, it's like, as a poker player, I know that you can play the perfect strategy and the outcome might not be what you want or what you anticipate, right? So maybe we have to be a little bit more nuanced about the kind of equality that we're shooting for here. Yeah. I mean, I like Anderson's definition. It's this sort of equality as equal opportunity to become who you want to be or something like this. I might have mangled it. So sorry, Elizabeth, but yes, I think that's the thrust of the idea. No, and I like that. And I do think like as more women come into power, we might have, we might see a transformation of the institutions themselves. Like, yeah, maybe the institutions themselves will change and look,
Starting point is 01:16:44 really different from what they looked like before. Because I do think there is a, you know, if you look at, you know, other strains of feminism sort of that are like kind of pushing women into the corporate workforce or like, being like, hey, be more like men. Like, you know, wear your power suit and whatnot. And so I certainly don't think the goal is just to, like, fill the pipeline, which is sometimes where I get frustrated with that line of research, particularly in STEM, which is not to say it shouldn't exist.
Starting point is 01:17:24 Like, I'm glad people are thinking about this. But it's always framed as like, where are the women, not like, why are the men taking up so much space? Like, it's always like the women's problem that they're missing. And so, yeah, I mean, I think, like, institutions themselves will change as different people, both women and people of color, come into power. And that's possibly a really awesome thing. So for example, I'll say, like, I run a research lab at MIT, and I'm trying really carefully to run it in a feminist way, in an inclusive way.
Starting point is 01:18:06 We have a kind of a handbook. We have a set of values and norms. We try to do our best within our little scope and sphere of operation to push back against some of the sort of toxic academic culture that exists at universities, both for students and faculty and staff and so on. And so thinking about the ways in which, and those are small interventions, but I think they are potentially transformative. So I think it's right that institutions themselves may transform in the process of realizing this sort of like equality of access, opportunity, sort of freedom to become the person that you want to be, and so on. Well, and this leads in very well to sort of what I wanted to ask for the last big topic here, which is we talked a lot about how not taking feminism seriously, or not taking fairness in some broad sense, justice, seriously can lead to these unintended biases in data analysis and processing and so forth.
Starting point is 01:19:14 So what about the flip side? I mean, can we use data or the analysis of data or the collection of data to bring about a better world? Can we, you know, be proactive data feminists in that sense? Absolutely. So, yeah, and I'll give you a really concrete example, which is my next project. I'm working on it now and it's going to be my next book. And in fact, I sent the book proposal today.
Starting point is 01:19:41 Oh, congratulations. So one of the things I'm looking at, you may remember from the book, we talked about the case of María Salguero, who is collecting data about feminicide in Mexico. So this is gender-based violence. This is when women are killed basically for being women. And this could be what we call in the United States, like, intimate partner violence, but it also happens not only with intimate partners but with other family members. It happens with drug trafficking, et cetera. And this is a big topic in Latin America.
Starting point is 01:20:19 Governments are passing laws about feminicide, and yet they're not, governments are not systematically tracking feminicides themselves, and they're not kind of putting into place the apparatus that would enable them to, like, get medical examiners to label people's deaths as feminicides and things like that. And so María Salguero sort of steps into this. And for five years, she's been collecting data about feminicide from news reports. And she's ended up with the largest public database of feminicide in Mexico. And she's ended up sharing data with journalists, NGOs. She's actually testified in front of Mexico's Congress a couple times. And, you know, we talked about this as this really interesting example of sort of feminist counterdata, you know, so like kind of collecting data in order to hold institutions and societies accountable for things that they are missing, that they are not doing.
Starting point is 01:21:18 And since then, I ended up, well, I was living in Argentina, in fact, on sabbatical and ended up, I was really interested in this. And I started asking around. I started meeting other groups. It turns out she is not at all the only one who's doing this. There are many groups across Latin America who are doing this work. They range in size from people like María who are individuals to large nonprofit organizations who are mapping and monitoring gender-based violence. And so I've been interviewing them.
Starting point is 01:21:49 And in fact, my lab has started building technology to help them detect feminicide better from news reports, because invariably they're using news and social media to tally these killings. And so this is a case that is, I think, really interesting for how we can collect back or like monitor back, or really think about how we use data, aggregating these statistics as a way of holding governments accountable and building political will towards social change,
Starting point is 01:22:23 to really, like, make a case and build an evidence base for changing policy. And so, yeah. So I think that's one example. I think there's many other examples out there. And in fact, especially, and that's one of the reasons I'm also so committed to data literacy, because I think, you know, these are tools that can and should be in the hands of, you know, human rights groups, journalists who are doing some of the most interesting sort of accountability work with data right now, social movements, community-based organizations and so on. So I think there's a lot of power and potential of data science really for good. Like that's what we're talking about here is like data science for justice.
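D'Ignazio doesn't describe how her lab's detection tools work, so what follows is only a hypothetical sketch of the simplest version of the idea she mentions: screening news text and surfacing candidate reports for a human to verify before anything is tallied. The word lists and function names are illustrative assumptions, not her lab's method. A real system for Latin American news would need Spanish and Portuguese vocabulary and would likely use a trained classifier, since keyword matching both misses cases and flags irrelevant ones.

```python
# Hypothetical sketch: flag news snippets that *might* report the killing of a woman,
# so a human reviewer can confirm or reject each one before it is tallied.
# The word lists below are illustrative placeholders, not a validated lexicon.
from __future__ import annotations

import re

VIOLENCE_TERMS = {"killed", "murdered", "stabbed", "shot", "strangled"}
VICTIM_TERMS = {"woman", "girl", "wife", "girlfriend", "mother", "daughter"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def flag_for_review(text: str) -> bool:
    """True if the text mentions both a violence term and a victim term."""
    words = tokenize(text)
    return bool(words & VIOLENCE_TERMS) and bool(words & VICTIM_TERMS)


headlines = [
    "Woman killed by former partner, police say",
    "City council approves new budget",
    "Man shot during robbery downtown",
]
review_queue = [h for h in headlines if flag_for_review(h)]
print(review_queue)
# ['Woman killed by former partner, police say']
```

The filter only narrows the reading load; the counting itself still depends on the human review step, which is where the credibility of a counterdata set like the ones described here comes from.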
Starting point is 01:23:05 So, yeah. And I think along those lines, there was one of the points you made in the book that, you know, your sort of suggestions to do better, which really struck me was the blending of reason with emotion. Or you know, like let emotion exist. Don't be, don't be afraid of it. And you can elaborate on why we should do that and what it means. It reminded me of a talk I heard at MIT by Evelyn Fox Keller, your MIT colleague, about the dawn of the scientific revolution and how it wasn't completely organic. Francis Bacon and friends would sit down in the coffee shops and decide what was science and how we do science and how we go about it.
Starting point is 01:23:47 And they made up a bunch of rules for sounding more objective than we really were, right? Like, you know, write in the second person, you know, to explain that things couldn't have been any other way. Don't talk about all the mistakes you made. Just tell the successes. And it's not necessary or inevitable. It was a choice that was made along the way. And so, I mean, maybe give your point of view of why it's okay or even good to not always try to sound as objective as we could possibly be. Totally.
Starting point is 01:24:21 Yeah, no, I love that. And this goes back to the objectivity thing, in fact, right? Because it's sort of like when we have to put on that cloak of objectivity, right? We are sort of trying to push that other stuff under the rug. Like the mistakes that we made, the ways in which our own blind spots were maybe uncovered during the process. And what a kind of feminist approach would say is, in fact, we want to see that. Like, part of being transparent, in fact, is not only telling this heroic genius story of some kind of like discovery or new knowledge that we made,
Starting point is 01:24:59 but also showing those bumps in the road, showing the blind spots, showing how we got help and support, like that is in fact much more transparent along the way. And then, you know, one of the things that we say about reason and emotion is that, you know, a kind of feminist maxim is whenever you encounter a binary, you should be deeply skeptical of it. Like, because usually it's hiding a hierarchy. And usually it's empirically false, right? So, like, if there's a gender binary, men and women, like, that's hiding a, like, it's just, like, false.
Starting point is 01:25:36 There's more than two genders, right? And we say this about reason and emotion as well. Like, that's often a binary that we encounter of, like, reason, on one hand, emotion, on the other hand, it's hiding a hierarchy. Reason is usually seen to be on top. And emotion is this, like, messy thing that's gendered, like women are more emotional or whatever. And yet, there's this wonderful line from Patricia Hill Collins,
Starting point is 01:26:03 where she says like a kind of feminist knowledge would be about valuing reason, emotion, and ethics sort of equally on an equal playing field. And the reason to do that is like, again, sort of acknowledging the situatedness of our own knowledge and our own selves. Like it's about being transparent about our own limitations and the fact that we have emotional motivations for studying things, for doing things. And then when we communicate our knowledge, communicating with emotion is a way to make it accessible to other people as well. So using emotion in communication is a way of being inclusive. And this, I think, comes in particular with data visualization, which can be seen as a very, you know,
Starting point is 01:26:50 It can be seen as a very scientific, technical, specialized way of communicating, which is fine if what you're doing is in a scientific journal or something. But when you're doing it for a general public, it can be sort of, it's like a gated community. It's like letting certain people in and it's prohibiting access by other people. So thinking through about like what are ways that we engage emotion, that we engage like multimodal sensibilities for people to access whatever sort of messages are in the data. So emotion brings all these things into our toolkit of communication that are lost if you're like, no, it must be neutral, objective and appear scientific or whatever. Well, you know, David Hume,
Starting point is 01:27:39 one of my favorites, said that reason is and should be the slave of the passions, right? I mean, reason gets us what our passions tell us we want, in some sense. So they need to go hand in hand. What you just said reminds me a little bit of controversies in news reporting from wars and things like that, where you're allowed to say how many people died, but the government will try to prevent you from showing pictures of those dead people. I mean, in principle, no extra information is conveyed, but the emotional resonance is a very
Starting point is 01:28:10 different thing, and that matters, right? Yes, absolutely. Yeah, there's a really interesting piece, I'm forgetting the author, but it's something like by numbing with numbers or something. It's this really interesting piece about how often we are moved by the story of one sort of unjustified death, but not by numbers describing kind of mass death. It's a really interesting philosophical reflection on that. And then like sort of also how do you, how do we communicate the scale of things like a genocide or these kind of mass extermination events at a scale that is meaningful to us as human beings
Starting point is 01:28:52 and doesn't just numb us into feeling disempowered by the scale of it or something like that. But I always like to end the podcasts on a more or less optimistic note if possible. So, I mean, maybe say one more thing about how great it is that we can use data and analysis and all these things to make the world a better place. Yeah, not end on death. Yeah, so I think Lauren and I talk a lot about, you know, in Data Feminism, we try to describe some of the ways that our data are infected and polluted by the inequalities that we encounter in the world. But at the same time, we do posit and advance the idea that data are also part of the solution. And so really thinking about ways in which we put that power of data into the hands of the people who can really make social change and use data to challenge those same structural inequalities that keep showing up over and over again in our data sets and in our systems.
Starting point is 01:29:58 And did I hear correctly there? This is the most important controversy of all that we haven't even mentioned. You treat data as plural, not singular? Well, we flip back and forth. So like when we talk about big data, we tend to use singular because it's like big data is blah, blah. But if I'm talking about a data set, like a collection of things, then I tend to use plural. But I probably slip and go back and forth. You know, there you go, complexifying the world for us once again.
Starting point is 01:30:34 I feel like physicists should like complexity, right? You don't know about complexity. No, no, no, kicking and screaming. They would like everything to be a perfect sphere as far as this is concerned, honestly. But complexifying is a good thing for human beings when they're not doing physics. And with that in mind, Catherine D'Ignazio, thanks very much for complexifying our worldview here on the Mindscape podcast. Lovely. Thank you. It was a pleasure.
