Big Technology Podcast - AI’s Drawbacks: Environmental Damage, Bad Benchmarks, Outsourcing Thinking — With Emily M. Bender and Alex Hanna
Episode Date: May 14, 2025. Emily Bender is a computational linguistics professor at the University of Washington. Alex Hanna is the Director of Research at the Distributed AI Research Institute. Bender and Hanna join Big Technology to discuss their new book, "The AI Con," which describes the layered ways today's language-model boom obscures environmental costs, labor harms, and shaky science. Tune in to hear a lively back-and-forth on whether chatbots are useful tools or polished parlor tricks. We also cover benchmark gaming, data-center water use, doomerism, and more. Hit play for a candid debate that will leave you smarter about where generative AI really stands — and what comes next. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack? Here's 25% off for the first year, which includes membership to our subscriber Discord: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Transcript
Two of AI's most vociferous critics join us for a discussion of the technology's weaknesses and liabilities, and a debate on the finer points of their arguments.
We'll talk about it all after this.
Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond.
We're joined today by the authors of The AI Con.
Professor Emily M. Bender is here. She's a professor of linguistics at the University of Washington. Emily, welcome.
I'm glad to be here. Thank you for having us on your show.
My pleasure. And we're also joined by Alex Hanna, the Director of Research at the Distributed AI Research Institute. Alex, welcome.
Thanks for having us, Alex.
Always good to have another Alex on the show.
So, look, we try to get the full story on AI here.
And so today we're going to bring in, I think, two of the most vocal critics on the technology.
They're going to state their case.
And you at home can decide whether you agree or not, but it's great to have you both here.
So let's start with the premise of the book.
What is the AI con?
Emily, do you want to begin?
Sure.
So the AI con is actually a nesting doll situation of cons.
Right down at the bottom, you've got the fact that especially large language models are a technology that's a parlor trick.
It plays on our ability to make sense of language and makes it very easy to believe there's a thinking entity inside of there.
This parlor trick is enhanced by various UI decisions.
There's absolutely no reason that a chatbot should be using I, me pronouns because there's no I inside of it.
but they're set up to do that. So you've got that sort of base level con. But then on top of that,
you've got lots of people selling technology built on chatbots to, you know, be a legal assistant,
to be a diagnostic system in a medical situation, to be a personalized tutor and to displace
workers, but also put a band-aid over large holes in our social safety net and social services.
So it's cons from the bottom to the top.
Okay. I definitely have things that I disagree with you in a few places on, and we will definitely get into that in the second half, especially about the usefulness of these bots and whether they should be using I/me pronouns, and the whole consciousness debate. We're going to get into that. I don't think any of us think that these things are conscious. I just think we have a disagreement on how much the industry has played that up. But let's start with what we agree on. And I think that from the very beginning, Emily,
you were the lead author on this very famous paper calling large language models stochastic parrots, and at the very beginning of that paper there is concern about the environmental issues that large language models might bring about. So on this show we talk all the time about the size of the data centers, the size of the models, and of course there is an associated energy cost that must be paid to use these things.
And so I'm curious if you, Emily, or Alex, you worked at Google, right?
So you probably have a good sense of this.
Can you both share, like quantify how much energy is being used to run these models?
So part of the problem is that even, you know, even if you're working at Google, unless you are directly working on this, there are not very public estimates of how much cost there is.
I mean, the costs vary quite widely.
And the only cost I think that we know was an estimate made by folks at Hugging Face that worked on the BLOOM model, because they were able to actually have some kind of insight
into the energy consumption of these models.
So part of the problem is the transparency of companies on this.
You know, as a response at Google, after the Stochastic Parrots paper was published, one of the complaints from people like Jeff Dean, the SVP of research at Google, and David Patterson, who's the lead author of Google's kind of rebuttal to that, was that, you know, well, you didn't factor in X, Y, Z, you didn't factor in the renewables that only we talk about at this one data center in Iowa, you didn't factor in off-peak training.
And so it's part of the problem.
I mean, we could try to put numbers on it, but there's so much guardedness about what's actually happening here. We can't quantify it. We don't know when it comes to model training. I mean, we might know something like the number of parameters that are in a new model or in an open-weights model like Llama, but we don't know how many kinds of fits and starts there were with stopping training and restarting or experimenting. So, you know, we could speculate, but we know it's a lot, because there are real effects in the world right now.
What are those effects?
So you see communities losing access to water sources. You see electrical grids becoming less stable. And this is starting
to be, I think, very well documented. There's a lot of journalists who are on the beat doing a lot of
good work. And I also want to shout out the work of Dr. Sasha Luccioni, who's been looking at this
from an academic perspective. And one of the points that she brings in is that it's not just the
training of the models, but of course also the use. And especially,
if you're looking at the use of chatbots in search, instead of getting back a set of links,
which may well have been cached, if you're getting back an AI overview, which happens non-consensually
when you try Google searches these days, right? Each of those tokens has to be calculated
individually. And so it's coming out one word at a time, and that is far more expensive. I think
her number is somewhere between 30 and 60 times more expensive, just in terms of the compute,
which then scales up for electricity, carbon, and water, than an old-fashioned search.
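For a rough sense of what that kind of multiplier means in practice, here is a back-of-the-envelope sketch in Python. The baseline per-search energy figure is a placeholder assumption for illustration only; the 30x to 60x range is the one mentioned above, not an independent measurement.

# Back-of-the-envelope comparison of per-query energy use, illustrating the
# "30 to 60 times more" range discussed above. The baseline figure below is a
# placeholder assumption, not a measured value.
CACHED_SEARCH_WH = 0.3             # assumed watt-hours for a classic cached search
GENERATIVE_MULTIPLIERS = (30, 60)  # range cited in the conversation

for multiplier in GENERATIVE_MULTIPLIERS:
    generative_wh = CACHED_SEARCH_WH * multiplier
    print(f"~{multiplier}x: {generative_wh:.1f} Wh per generative answer "
          f"vs {CACHED_SEARCH_WH} Wh per plain search")

# Because each output token requires its own forward pass through the model,
# the cost grows with response length instead of being a single cached lookup.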
I would also say that speaking about existing effects, there's also a lot of reporting coming
out of Memphis right now, especially around the methane generators that xAI has been using to power a particular supercomputer there called Colossus, specifically around emissions affecting Southwest Memphis, traditionally a Black and impoverished community.
There's also reporting on, well, actually research from UC Irvine, looking at backup generators and diesel emissions from facilities that are connected to the grid; just because the SLAs on data centers are, you know, incredibly high,
you effectively need some kind of a backup
to kick in at some time
and that's going to contribute to air pollution.
And which communities have been affected by the loss of water due to AI data centers?
So I think the best reported one is The Dalles in Oregon.
I mean, I think that's the one that is the best known.
That is kind of pre-AI, when we're focusing on the development of Google's hyperscaling.
And it wasn't until The Oregonian sued the city that we knew that half of the water consumption in the city was going to Google's data center.
That was before generative AI.
That was before generative AI.
I mean, we have to imagine the problem is probably exacerbated right now.
But do we know that?
I mean, you both wrote the book on this.
So we have, we certainly point to environmental impacts as a really important factor.
It is not the main focus of the book.
I would refer people to the reporting of people like Paris Marx over at Tech Won't Save Us, who did a wonderful series called Data Vampires, looking at, I think there were stories in Spain and in Chile.
And yeah, so this is, you know, we are looking at the overall con and the environmental impacts come in
because it is something we should always be thinking about.
And also because it is very hidden, right?
When you access these technologies, you're probably sitting, you know, looking at them through your mobile
device or through your computer and the compute and its environmental footprint and the noise
and everything else is hidden from you in the immateriality of the cloud.
I would also say that, I mean, on the reporting on Memphis, I want to give a shout out to the reporting in Prism by Ray Levy Uyeda, I don't know if I'm saying their surname correctly, but they have an extensive amount about the kind of water consumption of this, saying that this would take about, I think, a million gallons. I'm checking it, I'm looking at the reporting on it. Yeah, a million gallons of water a day to cool computers. They're saying that they need to build a gray water facility to do it, so these facilities don't exist yet and would have to be built, but I mean, the thing is already being constructed and is using water. So I don't think it's a far cry to say that what was already happening in the hyperscaling era, pre the generative AI era, is happening now.
I mean, the unfortunate fact about it is that a lot of these community groups are fighting this on a very local level, and a lot of these things are going to be underreported on. But from what we know from the fights in The Dalles and Loudoun County and parts of rural Texas, I mean, we'd be surprised if similar kinds of battles weren't being fought.
I agree with the underreporting, and that's why we're leading with it here, and we're
going to go through a list of some of the things that might be wrong with generative AI.
I think it is an issue.
I think, Emily, you basically hit on it, right, where you're producing all these tokens when you're going to generate an AI overview, and I checked, you cannot opt out of it.
You're correct. Uh, well, you can if you add minus AI to the query.
Okay, but you have to do that each time. You can't, like, put a setting somewhere. That's interesting, I didn't know about that. Okay, so you can opt out with minus AI, but these things do take more computing than traditional Google search. I guess the argument from these companies would be that they're just going to make their models more efficient. I mean, we see increasing amounts of efficiency over time. And, you know, there might be a big upfront energy cost to train, but inference might end up being not that energy intensive. What would you say to that?
I would say that we've got Brad Smith
at Microsoft giving up on the plans to become net zero carbon since the beginning of Microsoft. And he said
this ridiculous thing about we had a moon shot to get there. And it turns out with generative AI,
the moon is five times further away, which is just an absurd abuse of that metaphor. But yeah, and you see
just, you know, Google similarly also backing off of their environmental goals. And so if there really
was all these efficiencies to be had, I think they wouldn't be doing that backing off. And I want to
also add, I mean, I think this argument about the large amount of carbon use in training on the front end and then it tapering off with the inference, I mean, this is an argument that came straight from Google. This was, again, in the same paper by David Patterson. I think the title of the paper, which I'm not going to get exactly right, was, you know, the cost of training, probably not generative AI, I think it was the cost of language models, will plateau and then decrease, or the training costs will. And effectively, the argument being that you have this large investment that we can offset with renewables and then it's going to decrease.
But you have to also consider that, given the economics surrounding it, it's not one company training these, right? I mean, it's multiple different companies training these and multiple different companies providing inference. And so as long as there's some kind of
incentive to keep on putting this in products, then they're going to proliferate. So if it was
just Google, sure, maybe that might be a case in which there was some kind of planning and there
was some kind of way to measure and focus on that and then it actually tapering down. But you have
Google, Anthropic, xAI, of course OpenAI, Microsoft, Amazon, everyone trying to get a piece,
doing both training and doing inference. So I think that's, again, you know, like it's hard to put
numbers on it, but what we see in this is just a massive investment in this, and that gives a good
signal to say that the carbon costs have to be incredibly high.
Look, I think it's important for us, again, to lead here. It's clear that there are some real environmental impacts. And, I mean, we have Jensen Huang, the CEO of NVIDIA, saying inference is going to take 100 times more compute than traditional LLM inference. And every top executive from these firms that I've asked, well, is inference going to take more compute? It's not exactly as much as Jensen is saying, but there is a spectrum. So these things are going to be more energy intensive. And for everybody listening out there, I do think, you know, this is an important
context to take in that when we talk about AI, there's an environmental cost out there. It's not
fully clear what that is, although there is one. And I agree with the authors here that more
transparency makes a lot of sense. Now, let's talk about another issue that you bring up in the
book, which is benchmark gaming. It's been a hot topic in our big technology discord over the
past couple weeks that we see these research labs keep telling us that they have reached a new
benchmark or beat a certain level on a new test. And we're all trying to figure out what that
means because it does seem like a lot of them are training to the test. And you have some point
of criticism in the book about the gaming of benchmarks and what that's meant to tell us. So just lay
it out for us what's going on with benchmarks and tell us about the gaming, Emily. So, yes.
So when you say the gaming of benchmarks, that makes it sound like the benchmarks are reasonable and they're being misused.
But I think actually most of the benchmarks that are out there are not reasonable.
They lack what's called construct validity.
And construct validity is this two-part test: that the thing we are trying to measure is a real thing, and that this measurement correlates with it interestingly.
But nobody actually establishes that what these things are meant to measure is a real thing, let alone that second part.
And so they are useful sales figures, right, to say, hey, we now have state of the art, SOTA, on whatever.
But it is not interestingly related to what it's named as measuring, let alone what the systems are actually meant to be for.
Yeah, and I would just add that, I mean, there's a lot of work.
I mean, prior to the book, Emily and I spent a lot of time writing on benchmark data sets. And so this has been, you know, like, I'm personally obsessed with the ImageNet data set. I'm thinking of another book on the MSDET data set, just from what it entails. But, I mean, you know, the benchmarks, what they purport, there's a lot of different problems in the benchmarks, right? So the construct validity is probably first and foremost. And when you have something like Med-PaLM 1 and 2 being measured on the U.S. medical licensing exam, that's not really a test that determines whether one is sufficiently, you know, prepared to be a medical practitioner.
There's so much more involved with being a medical practitioner
above and beyond taking the U.S. medical licensing exam.
You can't take the bar and say you're ready to be a lawyer, right?
I mean, there's so much more that has to do with relationships and training
and other types of professionalization.
There's huge literature in sociology and sociology of occupations
on what professionalization looks like and what it entails.
and what kinds of social skills are involved and what that means and how to be adept at being in the discipline.
But then with the different benchmarks, there's so many different problems just in terms of the way that companies are doing science themselves.
They're releasing these benchmarks,
and often these are benchmarks that they themselves
have created and released.
So it may be the fact that they are, quote, unquote,
teaching to the exam, but they're also, they have no kind of external validity in terms of what
they're trying to do. So Open AI is saying, we had a model that did so well, we had to create a
new benchmark for it. Well, who's validating that, right? I mean, even the old benchmarking culture,
you had external benchmarks and multiple people would be going to it and saying, oh, we've done better
in this benchmark. Now OpenAI is saying, we have our own benchmarks because we did so well. Not that the old system was any better, but with this new system, well, where's the independent validation that it can do this thing that it's purported to do?
What do you think about the ARC-AGI test?
Yeah, well, I mean, we spent some time focusing on the ARC-AGI test, right? The ARC-AGI test, it is independent, at least that.
It is, it is from the French researcher François Chollet, yeah.
And by the way, for everybody who's listening, it basically asks, let me see if I get this right, it asks the models to generalize their ability to understand patterns and put shapes together. I think that's the best way to explain it.
Yeah. So it's a bunch of visual puzzles, where, I think, they're all in 2D grids. And in order to make this something that a large language model can handle, those 2D colorful things are turned into just sequences of letters. And the idea is that you have, I think, sort of a few-shot learning setup, where you have a few exemplars and then an input, and the thing is, can you find an output like that?
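To make the setup just described concrete, here is a minimal sketch in Python of how a puzzle like that might be flattened into plain text for a language model. The grids, the toy rule, and the prompt format are invented for illustration; they are not the actual ARC-AGI encoding.

def grid_to_text(grid):
    # Turn a 2D grid of small integers (standing in for colors) into one line per row.
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

# Two toy exemplars; the hidden rule here is simply "double every non-zero cell".
exemplars = [
    ([[0, 1], [2, 0]], [[0, 2], [4, 0]]),
    ([[3, 0], [0, 1]], [[6, 0], [0, 2]]),
]
test_input = [[1, 2], [0, 3]]

parts = []
for inp, out in exemplars:
    parts.append(f"Input:\n{grid_to_text(inp)}\nOutput:\n{grid_to_text(out)}\n")
parts.append(f"Input:\n{grid_to_text(test_input)}\nOutput:")

prompt = "\n".join(parts)
print(prompt)  # this text, not the pictures, is what the model is asked to complete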
And it is, when we want to talk about how the names of the benchmarks are already misleading, the fact that that's called ARC AGI, right?
That suggests that it's testing for AGI.
It's not.
It's one specific thing.
And I think Chollet's point is that this is something that is a very different kind of task than what people are usually using language models for. And so the sort of gesture is towards generalization. That if you can do
this, even though you weren't trained for it, then that's evidence of something. But if you look at the
OpenAI paper-shaped object about this, they used a bunch of them as training data in order to
like tune the system to be able to do the thing. So all right, fine. Supervised machine learning
kind of works. Right. And then there was ARC-AGI-2 that came out with a whole bunch
of new problems and instantly all the models started doing poorly on those. So let me let me just
ask this. Is there a measure that would allow the two of you to assess whether these AI models
are useful or have you just written off their ability to be useful completely? So useful for what?
I mean, you tell me.
Well, that's sort of my point, is that I think it's perfectly fine to use
machine learning to do specific tasks and then you set up a measurement that has to do with the task in
context. I'm a computational linguist, so things like automatic transcription are very much in my
area. If I were going to evaluate an automatic transcription system, I would say, okay, who am I using
it for? What kind of speech varieties? I'm going to collect some data, people speaking, have someone
transcribe it for me, a person, and then evaluate how well the various models work on doing that
transcription. And if they work well enough, and it is within the tolerances of the use case for me, then great. That's good.
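One standard way to score that kind of task-scoped evaluation is word error rate, comparing each model's transcript against a human reference. The metric isn't named in the conversation, so treat this Python sketch as one illustrative choice rather than the speakers' own method.

def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + insertions + deletions) / number of reference words,
    # computed with a standard edit-distance dynamic program over words.
    ref, hyp = reference.split(), hypothesis.split()
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

reference = "the patient reports mild chest pain since tuesday"
hypothesis = "the patient reports mild chess pain since"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # 2 errors over 8 words = 0.25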
Do you believe in the ability of these systems to be general?
So the ability to be general, and here I'm thinking of the work of Dr. Timnit Gebru, is not an engineering practice.
That's an unscoped system.
So what Dr. Gebru says is the first step in engineering is your specifications.
What is it that you're building?
If what you're building is general, you're off on the wrong path.
That's not something that you can test for, and it is not well-scoped technology.
Yeah, I mean, this notion of generality has always had some specificity in AI, too. I mean, we mention in the book this idea, and this is a word I struggle with, I've taken so many takes, but I'm just going to say fruit flies, right? So, right, the drosophila, the kind of fruit fly model of genomics, this idea that you have some kind of sequencing that's very common to this one very specific species, right? And in the past, what that's become in AI is the game of chess.
It's been game playing, right? I mean, these are very specific tasks, and those don't generalize to something called general intelligence, as if something like that actually exists. I mean, one of the problems in AI research is that the notion of intelligence is very, very poorly defined, and the notion of generality is very poorly defined, or is scoped to what the actual benchmark or the task is that is being attempted. So this notion of generality is very poorly understood, and it is deployed in a way that makes it sound like there is a notion of kind of general intelligence, as if it were a fact.
I mean, and there's, you know, a great paper that we point to in the footnotes of the book, this paper by Nathan Ensmenger, which is talking about how chess became the drosophila of AI research in the prior AI hype cycle in the 60s and 70s. And it just happened to be that you had a lot of guys that liked chess, and they wanted to compete with the Soviets, who had chess dominance, right? And so those tasks become, like, well, these are the things we kind of like. And we're actually seeing some of that again. It's like, well, these are tasks that we think are suitable, these are scoped in a way that we think are the most worthwhile problems. But they're not tasks that think about, well, what exists in the world that is going to be helpful and useful, and scoped to specific execution, right? This notion of an everything system is wildly unscoped.
But okay, so it is
unscoped. But I think everybody listening or watching right now would probably say, well, just in my basic use of ChatGPT, it can tell me about history, it can write a poem, it can create a game, okay, I see Emily reacting already, it can search the web and give me plans,
it can do all these different things in these different disciplines. So there is, I think for people
listening, there would be a sense that there is an ability to go into various
different disciplines and perform.
And whether you say it's a magic trick or not, it's clear that it can.
And so what I'm, I guess I'm trying to get at is, I mean, is there a way to measure that
or do you think that that is in itself a wrong assertion?
So, yes, I think it's a wrong assertion.
What ChatGPT can do is mimic human language use across many different domains.
And so it can produce the form of a poem.
It can produce the form of a travel itinerary.
It can produce the form of a Wikipedia page on the history of some event.
It is an extremely bad idea to use it if you actually have an information need,
setting aside the environmental impacts of using ChatGPT,
and setting aside the terrible labor practices behind it,
and the awful exploitation of data workers who have to look at the terrible outputs
so that the consumer sees fewer of them.
And by terrible outputs, I mean violence and racism and all kinds of sort of psychologically harmful stuff.
Yes.
What's that?
No, we've had one of the people who've been rating this content on the show. For folks who are interested, I'll link it in the show notes. Richard Mathenge was here to talk about what that experience was like.
So setting aside all of that, if you have an information need, so something you generally don't know,
then taking the output of the synthetic text extruding machine
doesn't set you up to actually learn more on a few levels, right?
Because you don't already know, you can't necessarily quickly check,
except maybe doing an additional search without ChatGPT,
at which point why not just do that search.
But also, it is poor information practices to assume that the world is set up
so that if I have a question, there is a machine that can give me the answer.
When I'm doing information access, instead what I'm doing is understanding the
sources that that kind of information comes from, how they're situated with respect to each
other, how they land in the world. And this is some work I've done with Chirag Shah on information
behavior and why chatbots, even if they were extremely accurate, would actually be a bad
way to do these practices. So just to, you know, back to your point, yes, this system is set up
to output plausible-looking text on a wide variety of topics. And therein lies the danger,
because it seems like we are almost there to the robo-doctor, the robo-lawyer, the robo-tutor.
And in fact, not only is that not true, not only is it environmentally ruinous, etc., but
that is not a good world to live in.
And thinking about, thinking about...
I just want to hit on this point.
Yeah.
This is one where I disagree with you.
I do think that some of the points that you're making are well-founded.
We don't want these things to be lawyers right away.
But let me at least point you to one use that I've had recently, and you could tell me where I'm going wrong if you think I am.
I mean, I'm in Paris now, a little work, little vacation at the same time.
And what I've done is I've taken two documents from friends who have been here often. They put together documents that they send to friends when they come here. I've uploaded those into ChatGPT.
And then I have ChatGPT, like, search the web and give me ideas of what to do.
I tell it where I am.
I tell it where I'm going.
And it searches through, like, for instance, like all the museums, the art galleries,
the festivals, the concerts, and it brings it into one place.
And that's been extremely useful to me to find new cultural events, concerts.
There's even a bread festival going on here that I had no idea about.
And now I'm going to go because it's found it for me.
So there's a link.
When it comes to this stuff, there's a link that you can go out and double check the work.
But as far as finding information on the web, the fact that it's able to go and comb the internet for these events and then take into account some of the context that I've given it with these documents, I think, is very impressive. And that's just one use case. So I'm not asking it to be a lawyer, I'm kind of asking it to be, what you said, an itinerary planner. What's wrong with that?
So, I mean, first of all, you have these lovely documents from your friends, and I guess what you're saying is missing is whatever the current events are. So they've given you some sort of, like, these are general things to look for, but they haven't looked into what's going on right now. What's wrong with that? You know, on several levels. What would we do in a prior age,
like even pre-internet? The local newspapers would list current events. Here's what's going on.
If you landed in a city, you would go find the local, probably local indie newspaper and look up
the events page. And that system was based on a series of relationships within the community
between the people putting on festivals and the newspaper writers. And it helps support probably
the local news information ecosystem, which was a good thing.
But on top of that, if something wasn't listed, you could think about why is this not listed?
What's the relationship that's missing?
Your ChatGPT output is going to give you some nonsense, and you're right, this is a use case
where you can verify whether this is real or not.
It is also likely going to miss some things, and the things that are not surfaced for you are not surfaced because of the complex set of biases that got rolled into the system, plus whatever the roll of the die was this time.
And anytime someone says, well, I need ChatGPT for this, usually one of two things is going on. Either there's another way of doing that that gives you more opportunities to be in community with people, to make connections,
or there is some serious unmet need,
which doesn't sound like it's this case.
And if we sort of pull the frame back a little bit,
we can say, why is it that someone felt like
the only option was a synthetic text extruding machine?
And here I think you've fallen into the former of these,
which is what are you missing out on by doing it this way?
What are the connections you could be making
to the people around you?
If you're staying in an Airbnb,
maybe the Airbnb host,
if you're in a hotel, the concierge,
to get answers to these questions
when you're looking to the machine instead.
I would also say, I just want to add, you know, this is a pretty low-stakes scenario, right? You can go out, you can verify these things. You can go to existing resources, you know, event calendars that people also spend a lot of time curating online. I mean, there's a lot of stuff that's already curated online. And I mean, these issues exist in prior instances of technology. I mean, you know, one of the people that we cite in the book
and talk a lot about is Dr. Safiya Noble, whose work on Google looks at the kind of way that Google results, you know, present very violent content with regards to racial minorities. One of the parts of the book that I like to reference, and that a lot of people don't reference initially, is the part where she talks about Yelp specifically, and what it was doing to a Black hairdresser, and the way that Yelp effectively was shutting this person out of business, because there was a specific need that she served for Black residents of the city that she was studying, braiding hair and doing other Black hairstyles, right? And so this is kind of a
function of all kinds of information retrieval systems, right? You think about what they're including, what they're excluding, right? So this is not very consequential here, but in any kind of area of, say, summarization or any kind of retrieval, you do need to have some kind of expertise where you can verify that and ensure that what's getting in there is not missing something huge. And it's going to basically take this set of information access resources or systems, in this case crawling the web, knowing that that's going to miss something, and then it's going to exacerbate that, because then you cannot situate those sources in context.
Okay. Let me just give my counterargument, and then we can move on from this. My counterargument would be a couple things. First of all, I don't speak French, so the local newspaper would kind of be lost on me. And I am staying at a residential place. We swapped apartments, so she's in my New York apartment and I'm here. So maybe she and I could have gone over that newspaper together, that's fair. But the newspaper, speaking of things that leave stuff out, the newspaper leaves stuff out all the time. It exercises editorial judgment. So it is swapping bot editorial judgment for newspaper editorial judgment, but the bot can be in some ways more comprehensive because it's searching the entire web. And I'll just say one last thing about this. I didn't feel the need to use it, I didn't say I need to use it to figure out what's going on. Like, again, I had these documents. What's useful about it is, speaking of making connections with the local community, if I'm able to, here's the word, be efficient in my research through using it, I could spend much more time out in the community versus searching the web or reading the newspaper. So what's your thought on those arguments?
Sorry, so I'm getting distracted by Alex's
cat walking around. So listeners, Alex's cat is here. Alex, what's your cat's name? This is Clara
and I'd lift her up, but I have a shoulder injury. But she is, she's knocking the mic around, so I'm just trying to keep her off the mic.
Yeah, yeah, thank you.
So, you know, the efficiency argument. So this is the efficiency argument in the context of leisure activities as opposed to in the context of work.
You mentioned along the way that it is searching the whole web for you.
You don't know that, actually.
That's right, yeah.
And also the whole web includes a lot of stuff that you don't actually want. Like, lots and lots and lots of the web is just garbage SEO stuff, and maybe you're seeing more of that in your ChatGPT output than you would with a search engine, which, as Alex mentioned, also has issues. And then finally,
I'm going to take issue with you.
SEO garbage is made for the search engine.
So it is. But the search engines also, in order to stay in business, have to be fighting back against the SEO garbage. It's a constant battle.
Probably for the chatbots as well. Yeah. So you mentioned newspaper editorial judgment versus
bot editorial judgment. And I'm going to take issue there because a bot is not the kind of thing that
can have judgment, nor is it the kind of thing that can have accountability for exercising
judgment. And so I think that, yes, this, Alex is saying this is low stakes, but if you're using it
as sort of a motivation for these things being useful in the world, then you have to deal with
the fact that the useful in the world is going to entail many more higher stakes things, and then we
really have to worry about accountability.
I would also want to say, too, I mean, there's a lot of, I think, this argument from, like, quote unquote, capabilities, which, I don't really know what that term means, and that's another poorly defined term, I think, especially when it comes to AGI.
But I mean, this argument from kind of like, well, I find it useful.
I don't find terribly convincing, right?
I mean, it's sort of like, well, okay, you have found it useful either in a situation in which there is a way to have some kind of verification, something that you know about and have some kind of ground truth about, or you found it useful in a variety of these different situations. But if I'm asking a chatbot about something in
an area I know quite a lot about, say, sociology or the social movements literature, I then have that knowledge to verify it, just from my social skill in that area, and this is a term I'm kind of borrowing from a sociologist, Neil Fligstein, and my knowledge of how to navigate those areas in my professionalization as a sociologist. But then once it gets into those areas in which verifiability just escapes me, which is most areas, because we're not professionals in most areas, and although a lot of us want to be jacks and jills of all trades, then we lose that ability, and we don't have the social skill or depth of knowledge to verify it in the same way.
And so I'm really not convinced by those arguments that, well, these are useful for me in these pretty low-stakes contexts, because that slippage then means that we're going to miss some pretty big things in some really dire contexts.
Okay. Well, let's turn it up a notch when we come back because we're going to talk about
AI at work and AI in the medical context. And maybe we can even touch a little bit on
doomerism, which you write about in the book.
And there's plenty else on the agenda.
So we'll be back right after this.
And we're back here on Big Technology Podcast with Professor Emily M. Bender and Alex Hanna.
They are the authors of The AI Con: How to Fight Big Tech's Hype and Create the Future We Want.
Here it is.
So let's go to usefulness.
And we'll start with generative AI in the medical context.
Because why don't we just go straight for the example that we'll probably have the biggest disagreement on here.
And I'm not saying that I think generative AI should play the role of a doctor.
In fact, when I wrote my list of things I agree with you both on,
I don't think that AI should be a therapist, at least not yet.
And we know now that the number one use of AI, according to a recent study, is companionship and therapy.
And the therapy side really scares me.
And I think the companionship isn't the best thing in the world either.
But in medicine, I do find that there is some use for it.
Medicine is a field overrun by paperwork and insurance requirements that I think have ruined the health care system
because they keep doctors effectively tied to their computers writing notes as opposed to seeing patients or living their lives.
And Alex, before the break, you mentioned that one of the areas that this stuff is useful
is when it starts to operate in your area of expertise because you're able to verify that.
So, I mean, we're going to go with one use that I find to be pretty good here
and would sort of, to me, doesn't make a generative AI feel like a con.
It's when a doctor is seeing a patient and they can take a transcription of the conversation that they have with the patient, and then have AI synthesize what they talked about, summarize it, and put it into the systems that they have for electronic medical records, and then verify it, so they don't have to spend the time writing those summaries up and can actually go and spend some more time with patients.
So what's the problem with that?
There are so many problems with that.
And the first thing I want to say is that you named the underlying problem when you talked
about insurance requiring so much paperwork. So this is one of those situations where there's a
real problem here. It's not that doctors shouldn't be writing clinical notes. That is actually
part of the care. But there is a lot of additional paperwork that is required because of the way
insurance systems, and especially the one in the United States, are set up. And so we could work
on solving that problem. And this is a case where sort of the turn towards large language
models, so-called generative AI as an approach to this is showing us the existence of an issue.
But that doesn't mean it is a good solution. So many problems. One is writing the clinical note is actually part of the process of care. It is the doctor reflecting on what came out of that conversation with the patient and thinking it through, writing it down, plans for next treatment. That is not something that I want doctors to get out of the habit of doing as part of the care. Now, they might feel like they don't have time for it. That's also a systemic issue. Secondly, these things are set up as like ambient listeners, which is a huge
privacy issue. As soon as you've collected that data, it becomes sort of this like radioactive
pile of danger. Thirdly, you've got the fact that automatic transcription systems, which are the
first step in this, do not work equally well for different language varieties. So think about somebody
who's speaking a second language. Think about somebody who's got dysarthria. So an older person
whose speech isn't very clear. Think about a doctor who is an immigrant to the community that they're
working in who's got extra work to do now because their words are not well transcribed. And
So the clinical notes thing doesn't work well for them, but the system is set up where there's
these expectations that they can see more patients because the AI, in quotes, is taking care of all of
this for them. And there's a beautiful essay that came out recently, I think in STAT News, and I was looking for the name of the author, didn't find it real quick, really reflecting on how important it
is to her that the doctor do that part of the care of actually pulling out from the conversation,
this is what matters. And it's not just simple summarization. It is actually part of the medical work to go from the back and forth had with the patient, and all of the doctor's expertise, to what goes into that note.
Yeah. So I want to add on,
Emily has said so much of what I want to get at, but I have, I think, three or four separate points in addition to that. So first off is the technical point. So there are tools that are purported to do summarization. There's some great reporting by Garance Burke and Hilke Schellmann at the AP from last October that was looking at Whisper specifically. So that's OpenAI's ASR system, automatic speech recognition system. It said that the medical transcription was basically making up a lot of shit. And then we knew that this, they had, quote unquote, hallucinations. Again, this is not a term that we use in the book. I say it's making shit up, but that is maybe even granting too much anthropomorphizing of the system for me.
And so, but there is a lot of these things.
Quoting from that reporting, some of that invented text includes racial commentary,
violent rhetoric, and even imagined medical treatments.
So that's one major problem.
The second problem is that medical transcription has been this area, which has been an area
in which medicine has been forcing kind of this casualization of work for years, right?
So medical note taking that exists in hospitals now, much of that is done remotely.
So it's gone and taken this work that has been seen as kind of like busy work, or this thing of, I don't want to write up my medical notes, and made it this type of work that needs to be foisted on someone else. So prior to this kind of ASR element of it, we've had these, oh, thanks for linking to that, and I'll link the AP article that I was looking at too. Part of that work has actually been offshored a lot, into this kind of movement of outsourcing. So a lot of that is done remotely as a part of this casualization. And I want to point out the gendered notion of this.
This is very much seen as, like, women's work. And that reflects a lot of the ways in which so much of, quote, AI technology wants to basically take the work that has been traditionally the domain of women and is saying, well, we can automate that, or we can casualize that in different ways. And that's important because it sees this work as not actually part of, quote unquote, the work. It is seen as work that ought to be casualized and offshored. And so I appreciate the essay that Emily mentioned, because that essay is saying, like, no, this is actually part of the element of doctoring.
And then I want to also just kind of couch all of this stuff in the political economy of the medical industry, thinking about what does it mean to rush and have more and more remote medicine and have more and more doctors see more patients.
And these efficiency gains from doctors isn't going to like make their jobs necessarily
easier.
It's going to put more of a pressure on them.
Now that you're in a position where you don't have to take medical notes, you're going to be running from appointment to appointment to appointment. My sister's a nurse, she's a nurse practitioner, and she's basically seeing this in her job right now at her clinic. It's, well, now we have these things, so I have to see more patients now, you know. And it's not like I'm going to go and be on a beach anywhere; it means that I'm going to have, you know, nine or ten 15-minute appointments a day. I'm not going to have enough proper time to spend with patients. So the coda to all
of this is that if AI boosters could really offshore all of doctoring to chatbots, they would. And this
is one case in which Bill Gates has said, you know, in 10 years, we're not going to have teachers and doctors. What a nightmare scenario, to have non-teachers and non-doctors. And Greg Corrado really gives it away. We cite him in the book, where he says of Med-PaLM 2, you know, this thing is really efficient, we're going to increase tenfold our medical ability, but I wouldn't want this to be part of my family's medical journey.
Okay, but here again you're picking out what is, like,
some of the most extreme statements. And I started my question saying it's Bill Gates, and Bill Gates can make extreme statements. I don't think he's the guy, and I think that doesn't reflect the broad consensus here, and definitely not the question that I asked, which again was about using this to take some of the time that the doctors are spending, you know, on paperwork and give that back to either the doctors or to have them be able to see more patients.
So we very much addressed that point.
First of all, I want to name the author of that essay.
Her name is Alia Barakat, and it's a beautiful essay.
She's a mathematician and also a patient with a chronic condition.
Wonderful essay.
But, yeah, you said give that time back to the doctors or have them see more patients, right?
It is not going to be going back to the doctors.
That's not how our health care system works.
And it's also going to, therefore, decrease quality of patient care.
It is lose-lose, except for the hospitals maybe getting more money
and certainly the tech companies that are selling this to the hospitals.
Okay.
I'm also curious, in terms of thinking about the more nuanced position, like, who's the reference here that you're thinking of, Alex? What's the consensus on this? Because, yeah, we see the egregious, you know, elements of this, and I'm wondering what the medical consensus is. You know, like, what's an example, you know, just to switch positions, now I'm interviewing you, but who is someone that you think is doing this very well?
Well, I mean, someone doing this well, like, again, I don't think that this stuff is well developed yet, but I've definitely seen enough doctors just buried in paperwork. And we started this whole segment talking about how this is, I guess, an insurance-driven thing. And so it's interesting. I guess, do you both not like the way that the insurance companies are guiding the system, but also think that it's good practice to have doctors write those notes themselves?
Hold on. There's two use cases for doctor's notes, right? There is actually documenting for the
patient and for the rest of the care team what has happened in this session. And that I think is a super
important part of the work of doctoring. I believe you that there's a lot of additional paperwork
that has to do with getting the insurance companies to pay back. And no, I don't like that system
at all. The insurance companies are not providing any value. They are just vampires on our
health care system in the U.S.
Okay. I think we can agree on that front. Anyway, I do think that as this stuff gets better, and I understand a patient wants this to happen, do I think a doctor would be giving them worse care if they allowed the AI to summarize the notes or to pick out the more important parts, if this stuff was working well? Not necessarily.
So that's a big if, you know. What does it mean when this stuff is getting better and this stuff is working well? Do you mean, kind of, like, the absence of making shit up?
Right, definitely. I mean, but we all agree that the doctor will have to verify and check this information after.
Well, I guess the problem there is, then why are we having the doctor double-check that to begin with, right? In an area where the doctor has 15 minutes to see every patient, and there is an AI, quote unquote, scribe doing,
or, quote unquote, I don't want to call it an AI scribe, there's an automatic speech recognition tool, right, doing automatic speech recognition on these things, in what space or with what time does the doctor have to verify those?
I mean, like, well, the time that they would be spending writing those notes in the first place.
Is verification an easier task than transcription? I guess that's my question. I would proffer no. I mean, just from my experience using these systems. And I mean, I'm not a doctor, thank God, although I've thought about it. Not that kind of doctor, to the chagrin of my parents. But then I guess the question is,
In my experience, I've used these tools for interviews, specifically in kind of qualitative interviews with data workers and have spent time with these tools and have just had such an awful time trying to think about this, especially with regards to, you know, this isn't, you know, this isn't medical terminology, but it's terminology around doing data work.
We're talking about training AI systems.
And it just, it is such a terrible job.
And at one point, I threw it all out.
And I said, okay, I just am sending this to somebody to actually transcribe because this is not helpful for me.
And it was taking me more time starting with the machine transcript than doing it from scratch.
And I've transcribed, you know, I'm not a primarily qualitative interviewer, but I've spent time, you know,
transcribing dozens of interviews in my research career.
And I've found it just very difficult.
So, I mean, I guess the question is, is that verification, you know, taking the time that could just be used for the doctoring and working with patients? And, I mean, holding everything about the insurance industry stable, you know, that notion of thinking about how the patient presents, how the patient is describing their, you know, how they're presenting, that is often the work of doing it. And the medical training I do have is that I was at one point a licensed EMT, and writing up PCRs is, like, you know, no one wants to write up the PCRs. At the same time, you're spending time taking note of how a patient is presenting. The patient is, you know, arrhythmic, just bringing it back to the Alexes, or, you know, the patient is cyanotic around their lips. Like, these are things that, you know, a health care professional would be paying attention to and making notes of, maybe because they're writing it up later. So I'm thinking about this process of writing and what it does to our own practice of viewing and aiding and administering medical care.
Okay. I mean, we'll agree to disagree on this front, but again, I think
we are all on the same page that insurance companies requiring additional writing just because they hope you don't ever get to the claim, or that you don't file it, that's probably bad. And we don't think that there should be AI doctors, at least not yet. That's what I say. I think you guys would probably say never.
So, all right, I want to end on this, which is, or maybe we can do two more topics.
I guess, like, here's my question for you.
A lot of the discussion of AI's usefulness in jobs in the book discusses these tools
being imposed top down. But what if they come bottom up? Like, what if a worker can find use for
them and actually make their job easier by getting good at using something like a ChatGPT or a
Claude? Or if, you know, again, we like kind of talk through the medical use case. If a doctor
does find that this is useful for them, are you opposed to that? So yes, and I think that actually
Cadbury, of all people, put it best. There's this hilarious commercial that was for the Indian
market, sort of showing how the supposed efficiencies that you're getting out of this
just ramps up the speed of things and doesn't leave you time to really get into the work
that you're doing and be there. I think that the most credible use cases I've heard for these
things are, first of all, as coding assistants. So that's sort of a machine translation problem
between natural language and some programming language. And there I really worry about technical
debt, where you have, you know, output code that was not written by a person, that's not well documented, that becomes someone else's problem to debug down the line. But also, in writing
emails. People hate writing emails and people hate reading emails. So you get these scenarios where
somebody writes bullet points, uses ChatGPT to turn it into an email, and the person on the other end might use ChatGPT to summarize it. And it's like, okay, so what are we doing here? And again,
taking a step back and saying, what are the systems that are requiring all of this writing that everyone finds a nuisance to write and to read, and can we rethink those
systems? And also, I just have to say that whenever I'm on the receiving end of synthetic
text, I am hugely offended. And one of the things that we put in the book is, if you couldn't be bothered...
I definitely got one of those emails yesterday. And I was like, you used ChatGPT for this. I know you did.
Yeah. If you couldn't be bothered to write it, why should I bother to read it?
Right. Yeah. And I mean, it's very interesting putting this in terms of thinking about cases in which workers are using this kind of organically. First off, I've heard very little of that personally, especially from professionals. I mean, I think there's plenty of workers that are finding a lot of uses, but I would say the analog that I find, where it's not top down, is in education. And to that degree, I think that's kind of a failure in thinking about what education is, right? I mean, in that case, it's...
Well, for students to be using this to get through their classes?
Yeah, right. Exactly.
Are you talking about teachers putting stuff together?
Well, I'm not thinking about teachers at all. I'm thinking about the students, right? And I'm just thinking
about areas in which, but I'm using that as sort of an analog and then thinking about what are
the conditions that are forcing students to use this, right? If there's kind of cases in which
this seems to be sort of useful, okay, what are the cases in which what does that say about the job?
What does it say about like how the work is oriented, right?
In that case, then maybe there might be, needs to be kind of different efficiencies or thinking about how the job is operating, right?
I then worry then that these things become mandated in work environments.
And you're saying, well, people are using this, and so everybody's using this. And then where does that leave the people who are resistors, thinking, well, I know this can't do a good job, so where's that putting me? And I think we've already seen such a justification for this, where employers have been reducing positions by the score because there's a notion that these tools can do these jobs suitably and to a certain degree of proficiency, which is just not the case.
That has me worried down the line about these areas that Emily has mentioned, the kind of technical debt area, the kind of, how do we know?
And there's kind of an overestimation of capabilities of these tools in that case.
Okay, I know we're at time or close to time.
Can I ask you one question about Doomers before we get out of here?
Sure.
Let's end by dogging on doomers.
Okay.
So I definitely saw that there was a chapter about doomers here, and I was excited to read it, because my position has been largely that those who are worried that large language models are going to turn us into paper clips are either marketing what they're selling or just very into, I don't know, they like the smell of their own body odor. I mean, I guess it's not a terrible thing to be worried about, but there's so much more, and it seems so unlikely that this is going to hurt us. So I definitely wanted to get your take on why you're down on doomerism. And let me just give my one caveat here. There's a line in your book that says that AI safety is just doomerism, and it's only about these long-term problems. But I've definitely heard some of the AI safety folks, like Dan Hendrycks from the Center for AI Safety, talking about really important near-term issues, like whether this technology could help virologists with bad intent. So I wouldn't malign the entire AI safety field. But the doomer stuff, I hear your point.
All right. So attack that and then we'll get out of here. So I just want to put in a shout out for
a new book by Adam Becker called More Everything Forever, which really goes deep into the connections between the sort of doomer thought and these more palatable-looking sides of what's called effective altruism. And also in that context, there's a wonderful paper by Timnit Gebru and Émile Torres on what they call the TESCREAL bundle of ideologies.
And I think that if you're, if your concern about the systems is not rooted in real people
and real communities and things that are actually happening, like even this like, oh, but
bad actors could use it to, you know, more quickly design, you know, viruses and stuff like that.
That's still speculative, right? So anytime we are taking the focus away, it's like, has that
happened? Right? This is, this is still people writing science fiction, fan fiction for themselves.
And, you know, it's not, it's based on these jumped up ideas about what the technology can do
and taking the focus away from the actual harms that are happening now, including the environmental
stuff we started with. Right. But, yeah.
And I mean, I will, I will say.
You want to get ahead of that, right? Like we had it with social media. There were some issues with social media, but there was not a focus on some of, like, the potential long-term issues. And that only came up later on, at least in the beginning. You don't agree?
Wait.
There are problems with social media, for sure.
Yeah.
And some of those problems were documented and explained early on and people were not paying
attention. But they were real problems that were being documented as they were happening, as opposed to
imaginaries about, well, someone's going to use this in Dr. Evil up a bad virus. Okay. Yeah. Go ahead, Alex.
For the sake of time, I think that's fine. I don't have much to add. All right. Well, look,
the book is called The AI Con: How to Fight Big Tech's Hype and Create the Future We Want. The authors are Emily M. Bender and Alex Hanna. Emily and Alex, I've been
reading your work for a long time. And it's great to have a chance to speak with you. Like I said
at the top, you know, for those who are listening or watching, you may not agree with everything, either everything I said or everything our guests said. Hey, at least now you know these arguments, and you know the arguments for and against. And we trust you to make up your own opinion and do further research. And we've definitely had plenty of good stuff to keep digging into shouted out over the course of this conversation. So Emily and Alex, great to see you. Thank you so much for joining the show.
Thank you for this conversation, and enjoy Paris.
Thanks, Alex.
Have a great time. Thank you both. Thank you, everybody, for listening. We'll see you on Friday
for our news recap. Until then, we'll see you next time on Big Technology Podcast.