Big Technology Podcast - AI’s Drawbacks: Environmental Damage, Bad Benchmarks, Outsourcing Thinking — With Emily M. Bender and Alex Hanna
Episode Date: May 14, 2025. Emily Bender is a computational linguistics professor at the University of Washington. Alex Hanna is the Director of Research at the Distributed AI Research Institute. Bender and Hanna join Big Technology to discuss their new book, "The AI Con," which describes the layered ways today's language-model boom obscures environmental costs, labor harms, and shaky science. Tune in to hear a lively back-and-forth on whether chatbots are useful tools or polished parlor tricks. We also cover benchmark gaming, data-center water use, doomerism, and more. Hit play for a candid debate that will leave you smarter about where generative AI really stands — and what comes next. --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack? Here's 25% off for the first year, which includes membership to our subscriber Discord: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Transcript
Two of AI's most vociferous critics join us for a discussion of the technology's weaknesses and liabilities, and a debate on the finer points of their arguments.
We'll talk about it all after this.
Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond.
We're joined today by the authors of The AI Con.
Professor Emily M. Bender is here. She's a professor of linguistics at the University of Washington. Emily, welcome.
I'm glad to be here. Thank you for having us on your show.
My pleasure. And we're also joined by Alex Hanna, the Director of Research at the Distributed AI Research Institute. Alex, welcome.
Thanks for having us, Alex.
Always good to have another Alex on the show.
So, look, we try to get the full story on AI here.
And so today we're going to bring in, I think, two of the most vocal critics on the technology.
They're going to state their case.
And you at home can decide whether you agree or not, but it's great to have you both here.
So let's start with the premise of the book.
What is the AI con?
Emily, do you want to begin?
Sure.
So the AI con is actually a nesting doll situation of cons.
Right down at the bottom, you've got the fact that especially large language models are a technology that's a parlor trick.
It plays on our ability to make sense of language and makes it very easy to believe there's a thinking entity inside of there.
This parlor trick is enhanced by various UI decisions.
There's absolutely no reason that a chatbot should be using I, me pronouns because there's no I inside of it.
but they're set up to do that. So you've got that sort of base level con. But then on top of that,
you've got lots of people selling technology built on chatbots to, you know, be a legal assistant,
to be a diagnostic system in a medical situation, to be a personalized tutor and to displace
workers, but also put a band-aid over large holes in our social safety net and social services.
So it's cons from the bottom to the top.
Okay. I definitely have things that I disagree with you in a few places on, and we will definitely get into that in the second half, especially about the usefulness of these bots and whether they should be using I/me pronouns, and the whole consciousness debate. We're going to get into that. I don't think any of us think that these things are conscious. I just think we have a disagreement on how much the industry has played that up. But let's start with what we agree on. And I think that from the very beginning, Emily,
you were the lead author on this very famous paper calling large language models stochastic parrots, and at the very beginning of that paper there is concern about the environmental issues that large language models might bring about. So on this show we talk all the time about the size of the data centers, the size of the models, and of course there is an associated energy cost that must be paid to use these things.
And so I'm curious if you, Emily, or Alex, you worked at Google, right?
So you probably have a good sense of this.
Can you both share, like quantify how much energy is being used to run these models?
So part of the problem is that even, you know, even if you're working at Google, unless you are directly working on this, there are not very public estimates of how much cost there is.
I mean, the costs vary quite widely.
And the only cost I think that we know was an estimate made by folks at Hugging Face that worked on the BLOOM model, because they were able to actually have some kind of insight
into the energy consumption of these models.
So part of the problem is the transparency of companies on this.
You know, as a response at Google, after the Stochastic Parrots paper was published, one of the complaints from people like Jeff Dean, the SVP of research at Google, and David Patterson, who's the lead author of Google's kind of rebuttal to that, was that, you know, well, you didn't factor in X, Y, Z, you didn't factor in the renewables that only we talk about at this one data center in Iowa, you didn't factor in off-peak training.
And so it's part of the problem.
I mean, we could try to put numbers on it, but there's so much guardedness about what's actually happening here. We can't quantify it. We don't know when it comes to model training. I mean, we might know something like the number of parameters that are in a new model or in an open-weights model like Llama, but we don't know how many kinds of fits and starts there were with stopping training and restarting or experimenting. So, you know, we could speculate, but we know it's a lot, because there are real effects in the world right now.
What are those effects?
So you see communities losing access to water sources. You see electrical grids becoming less stable. And this is starting
to be, I think, very well documented. There's a lot of journalists who are on the beat doing a lot of
good work. And I also want to shout out the work of Dr. Sasha Luccioni, who's been looking at this
from an academic perspective. And one of the points that she brings in is that it's not just the
training of the models, but of course also the use. And especially,
if you're looking at the use of chatbots in search, instead of getting back a set of links,
which may well have been cached, if you're getting back an AI overview, which happens non-consensually
when you try Google searches these days, right? Each of those tokens has to be calculated
individually. And so it's coming out one word at a time, and that is far more expensive. I think
her number is somewhere between 30 and 60 times more expensive, just in terms of the compute,
which then scales up for electricity, carbon, and water, than an old-fashioned search.
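For a rough sense of what that kind of multiplier means in practice, here is a back-of-the-envelope sketch in Python. The baseline per-search energy figure is a placeholder assumption for illustration only; the 30x to 60x range is the one mentioned above, not an independent measurement.

# Back-of-the-envelope comparison of per-query energy use, illustrating the
# "30 to 60 times more" range discussed above. The baseline figure below is a
# placeholder assumption, not a measured value.
CACHED_SEARCH_WH = 0.3             # assumed watt-hours for a classic cached search
GENERATIVE_MULTIPLIERS = (30, 60)  # range cited in the conversation

for multiplier in GENERATIVE_MULTIPLIERS:
    generative_wh = CACHED_SEARCH_WH * multiplier
    print(f"~{multiplier}x: {generative_wh:.1f} Wh per generative answer "
          f"vs {CACHED_SEARCH_WH} Wh per plain search")

# Because each output token requires its own forward pass through the model,
# the cost grows with response length instead of being a single cached lookup.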
I would also say that speaking about existing effects, there's also a lot of reporting coming
out of Memphis right now, especially around the methane generators that xAI has been using to power a particular supercomputer there called Colossus, specifically around emissions affecting Southwest Memphis, traditionally a Black and impoverished community.
There's also reporting on, well, actually research from UC Irvine, looking at backup generators and diesel emissions from facilities that are connected to the grid; just because the SLAs on data centers are, you know, incredibly high,
you effectively need some kind of a backup
to kick in at some time
and that's going to contribute to air pollution.
And which communities have been affected by the loss of water due to AI data centers?
So I think the best reported one is The Dalles in Oregon.
I mean, I think that's the one that is the best known.
That is kind of pre-AI, when we're focusing on the development of Google's hyperscaling.
And it wasn't until The Oregonian sued the city that we knew that half of the water consumption in the city was going to Google's data center.
That was before generative AI.
That was before generative AI.
I mean, we have to imagine the problem is probably exacerbated right now.
But do we know that?
I mean, you both wrote the book on this.
So we have, we certainly point to environmental impacts as a really important factor.
It is not the main focus of the book.
I would refer people to the reporting of people like Paris Marx over at Tech Won't Save Us, who did a wonderful series called Data Vampires, looking at, I think there were stories in Spain and in Chile.
And yeah, so this is, you know, we are looking at the overall con and the environmental impacts come in
because it is something we should always be thinking about.
And also because it is very hidden, right?
When you access these technologies, you're probably sitting, you know, looking at them through your mobile
device or through your computer and the compute and its environmental footprint and the noise
and everything else is hidden from you in the immateriality of the cloud.
I would also say that, I mean, on the reporting on Memphis, I want to give a shout out to the reporting in Prism by Ray Levy Uyeda, I don't know if I'm saying their surname correctly, but they have an extensive amount about the kind of water consumption of this, saying that this would take about, I think, a million gallons. I'm checking it, I'm looking at the reporting on it. Yeah, a million gallons of water a day to cool computers. They're saying that they need to build a gray water facility to do it, so these facilities don't exist yet and would have to be built, but I mean, the thing is already being constructed and is using water. So I don't think it's a far cry to say that what was already happening in the hyperscaling era, pre the generative AI era, is happening now.
I mean, the unfortunate fact about it is that a lot of these community groups are fighting this on a very local level, and a lot of these things are going to be underreported on. But from what we know from the fights in The Dalles and Loudoun County and parts of rural Texas, I mean, we'd be surprised if similar kinds of battles weren't being fought.
I agree with the underreporting, and that's why we're leading with it here, and we're
going to go through a list of some of the things that might be wrong with generative AI.
I think it is an issue.
I think, Emily, you basically hit on it, right, where you're producing all these tokens when you're going to generate an AI overview, and I checked, you cannot opt out of it.
You're correct. Uh, well, you can if you add minus AI to the query.
Okay, but you have to do that each time. You can't, like, put a setting somewhere. That's interesting, I didn't know about that. Okay, so you can opt out with minus AI, but these things do take more computing than traditional Google search. I guess the argument from these companies would be that they're just going to make their models more efficient. I mean, we see increasing amounts of efficiency over time. And, you know, there might be a big upfront energy cost to train, but inference might end up being not that energy intensive. What would you say to that?
I would say that we've got Brad Smith
at Microsoft giving up on the plans to become net zero carbon since the beginning of Microsoft. And he said
this ridiculous thing about we had a moon shot to get there. And it turns out with generative AI,
the moon is five times further away, which is just an absurd abuse of that metaphor. But yeah, and you see
just, you know, Google similarly also backing off of their environmental goals. And so if there really
was all these efficiencies to be had, I think they wouldn't be doing that backing off. And I want to
also add, I mean, I think this argument about the large amount of carbon use in training on the front end and then it tapering off with the inference, I mean, this is an argument that came straight from Google. This was, again, in the same paper by David Patterson. I think the title of the paper, which I'm not going to get exactly right, was, you know, the cost of training, probably not generative AI, I think it was the cost of language models, will plateau and then decrease, or the training costs will. And effectively, the argument being that you have this large investment that we can offset with renewables and then it's going to decrease.
But you have to also consider that, given the economics surrounding it, it's not one company training these, right? I mean, it's multiple different companies training these and multiple different companies providing inference. And so as long as there's some kind of
incentive to keep on putting this in products, then they're going to proliferate. So if it was
just Google, sure, maybe that might be a case in which there was some kind of planning and there
was some kind of way to measure and focus on that and then it actually tapering down. But you have
Google, Anthropic, xAI, of course OpenAI, Microsoft, Amazon, everyone trying to get a piece,
doing both training and doing inference. So I think that's, again, you know, like it's hard to put
numbers on it, but what we see in this is just a massive investment in this, and that gives a good
signal to say that the carbon costs have to be incredibly high.
Look, I think it's important for us, again, to lead here. It's clear that there are some real environmental impacts. And, I mean, we have Jensen Huang, the CEO of NVIDIA, saying inference is going to take 100 times more compute than traditional LLM inference. And every top executive from these firms that I've asked, well, is inference going to take more compute? It's not exactly as much as Jensen is saying, but there is a spectrum. So these things are going to be more energy intensive. And for everybody listening out there, I do think, you know, this is an important
context to take in that when we talk about AI, there's an environmental cost out there. It's not
fully clear what that is, although there is one. And I agree with the authors here that more
transparency makes a lot of sense. Now, let's talk about another issue that you bring up in the
book, which is benchmark gaming. It's been a hot topic in our big technology discord over the
past couple weeks that we see these research labs keep telling us that they have reached a new
benchmark or beat a certain level on a new test. And we're all trying to figure out what that
means because it does seem like a lot of them are training to the test. And you have some point
of criticism in the book about the gaming of benchmarks and what that's meant to tell us. So just lay
it out for us what's going on with benchmarks and tell us about the gaming, Emily. So, yes.
So when you say the gaming of benchmarks, that makes it sound like the benchmarks are reasonable and they're being misused.
But I think actually most of the benchmarks that are out there are not reasonable.
They lack what's called construct validity.
And construct validity is this two-part test: that the thing we are trying to measure is a real thing, and that this measurement correlates with it interestingly.
But nobody actually establishes that what these things are meant to measure is a real thing, let alone that second part.
And so they are useful sales figures, right, to say, hey, we now have state of the art, SOTA, on whatever.
But it is not interestingly related to what it's named as measuring, let alone what the systems are actually meant to be for.
Yeah, and I would just add that, I mean, there's a lot of work.
I mean, prior to the book, Emily and I spent a lot of time writing on benchmark data sets. And so this has been, you know, like, I'm personally obsessed with the ImageNet data set. I'm thinking of another book on the MSDET data set, just from what it entails. But, I mean, you know, the benchmarks, what they purport, there's a lot of different problems in the benchmarks, right? So the construct validity is probably first and foremost. And when you have something like Med-PaLM 1 and 2 being measured on the U.S. medical licensing exam, that's not really a test that determines whether one is sufficiently, you know, prepared to be a medical practitioner.
There's so much more involved with being a medical practitioner
above and beyond taking the U.S. medical licensing exam.
You can't take the bar and say you're ready to be a lawyer, right?
I mean, there's so much more that has to do with relationships and training
and other types of professionalization.
There's huge literature in sociology and sociology of occupations
on what professionalization looks like and what it entails.
and what kinds of social skills are involved and what that means and how to be adept at being in the discipline.
But then with the different benchmarks, there's so many different problems just in terms of the way that companies are doing science themselves.
They're releasing these benchmarks,
and often these are benchmarks that they themselves
have created and released.
So it may be the fact that they are, quote, unquote,
teaching to the exam, but they're also, they have no kind of external validity in terms of what
they're trying to do. So Open AI is saying, we had a model that did so well, we had to create a
new benchmark for it. Well, who's validating that, right? I mean, even the old benchmarking culture,
you had external benchmarks and multiple people would be going to it and saying, oh, we've done better
in this benchmark. Now OpenAI is saying, we have our own benchmarks because we did so well. Not that the old system was any better, but with this new system, well, where's the independent validation that it can do this thing that it's purported to do?
What do you think about the ARC-AGI test?
Yeah, well, I mean, we spent some time focusing on the ARC-AGI test, right? The ARC-AGI test, it is independent, at least that.
It is, it is from the French researcher François Chollet, yeah.
And by the way, for everybody who's listening, it basically asks, let me see if I get this right, it asks the models to generalize their ability to understand patterns and put shapes together. I think that's the best way to explain it.
Yeah. So it's a bunch of visual puzzles, where, I think, they're all in 2D grids. And in order to make this something that a large language model can handle, those 2D colorful things are turned into just sequences of letters. And the idea is that you have, I think, sort of a few-shot learning setup, where you have a few exemplars and then an input, and the thing is, can you find an output like that?
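To make the setup just described concrete, here is a minimal sketch in Python of how a puzzle like that might be flattened into plain text for a language model. The grids, the toy rule, and the prompt format are invented for illustration; they are not the actual ARC-AGI encoding.

def grid_to_text(grid):
    # Turn a 2D grid of small integers (standing in for colors) into one line per row.
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

# Two toy exemplars; the hidden rule here is simply "double every non-zero cell".
exemplars = [
    ([[0, 1], [2, 0]], [[0, 2], [4, 0]]),
    ([[3, 0], [0, 1]], [[6, 0], [0, 2]]),
]
test_input = [[1, 2], [0, 3]]

parts = []
for inp, out in exemplars:
    parts.append(f"Input:\n{grid_to_text(inp)}\nOutput:\n{grid_to_text(out)}\n")
parts.append(f"Input:\n{grid_to_text(test_input)}\nOutput:")

prompt = "\n".join(parts)
print(prompt)  # this text, not the pictures, is what the model is asked to complete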
And it is, when we want to talk about how the names of the benchmarks are already misleading, the fact that that's called ARC AGI, right?
That suggests that it's testing for AGI.
It's not.
It's one specific thing.
And I think Chollet's point is that this is something that is a very different kind of task than what people are usually using language models for. And so the sort of gesture is towards generalization. That if you can do
this, even though you weren't trained for it, then that's evidence of something. But if you look at the
OpenAI paper-shaped object about this, they used a bunch of them as training data in order to
like tune the system to be able to do the thing. So all right, fine. Supervised machine learning
kind of works. Right. And then there was ARC-AGI-2 that came out with a whole bunch
of new problems and instantly all the models started doing poorly on those. So let me let me just
ask this. Is there a measure that would allow the two of you to assess whether these AI models
are useful or have you just written off their ability to be useful completely? So useful for what?
I mean, you tell me.
Well, that's sort of my point, is that I think it's perfectly fine to use
machine learning to do specific tasks and then you set up a measurement that has to do with the task in
context. I'm a computational linguist, so things like automatic transcription are very much in my
area. If I were going to evaluate an automatic transcription system, I would say, okay, who am I using
it for? What kind of speech varieties? I'm going to collect some data, people speaking, have someone
transcribe it for me, a person, and then evaluate how well the various models work on doing that
transcription. And if they work well enough, and it is within the tolerances of the use case for me, then great. That's good.
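One standard way to score that kind of task-scoped evaluation is word error rate, comparing each model's transcript against a human reference. The metric isn't named in the conversation, so treat this Python sketch as one illustrative choice rather than the speakers' own method.

def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + insertions + deletions) / number of reference words,
    # computed with a standard edit-distance dynamic program over words.
    ref, hyp = reference.split(), hypothesis.split()
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

reference = "the patient reports mild chest pain since tuesday"
hypothesis = "the patient reports mild chess pain since"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # 2 errors over 8 words = 0.25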
Do you believe in the ability of these systems to be general?
So the ability to be general, and here I'm thinking of the work of Dr. Timnit Gebru, is not an engineering practice.
That's an unscoped system.
So what Dr. Gebru says is the first step in engineering is your specifications.
What is it that you're building?
If what you're building is general, you're off on the wrong path.
That's not something that you can test for, and it is not well-scoped technology.
Yeah, I mean, this notion of generality has always had some specificity in AI, too. I mean, we mention in the book this idea, and this is a word I struggle with, I've taken so many takes, but I'm just going to say fruit flies, right? So, right, the drosophila, the kind of fruit fly model of genomics, this idea that you have some kind of sequencing that's very common to this one very specific species, right? And in the past, what that's become in AI is the game of chess.
It's been game playing, right? I mean, these are very specific tasks, and those don't generalize to something called general intelligence, as if something like that actually exists. I mean, one of the problems in AI research is that the notion of intelligence is very, very poorly defined, and the notion of generality is very poorly defined, or is scoped to what the actual benchmark or the task is that is being attempted. So this notion of generality is very poorly understood, and it is deployed in a way that makes it sound like there is a notion of kind of general intelligence, as if it were a fact.
I mean, and there's, you know, a great paper that we point to in the footnotes of the book, this paper by Nathan Ensmenger, which is talking about how chess became the drosophila of AI research in the prior AI hype cycle in the 60s and 70s. And it just happened to be that you had a lot of guys that liked chess, and they wanted to compete with the Soviets, who had chess dominance, right? And so those tasks become, like, well, these are the things we kind of like. And we're actually seeing some of that again. It's like, well, these are tasks that we think are suitable, these are scoped in a way that we think are the most worthwhile problems. But they're not tasks that think about, well, what exists in the world that is going to be helpful and useful, and scoped to specific execution, right? This notion of an everything system is wildly unscoped.
But okay, so it is
unscoped. But I think everybody listening or watching right now would probably say, well, just in my basic use of ChatGPT, it can tell me about history, it can write a poem, it can create a game, okay, I see Emily reacting already, it can search the web and give me plans,
it can do all these different things in these different disciplines. So there is, I think for people
listening, there would be a sense that there is an ability to go into various
different disciplines and perform.
And whether you say it's a magic trick or not, it's clear that it can.
And so what I'm, I guess I'm trying to get at is, I mean, is there a way to measure that
or do you think that that is in itself a wrong assertion?
So, yes, I think it's a wrong assertion.
What ChatGPT can do is mimic human language use across many different domains.
And so it can produce the form of a poem.
It can produce the form of a travel itinerary.
It can produce the form of a Wikipedia page on the history of some event.
It is an extremely bad idea to use it if you actually have an information need,
setting aside the environmental impacts of using ChatGPT,
and setting aside the terrible labor practices behind it,
and the awful exploitation of data workers who have to look at the terrible outputs
so that the consumer sees fewer of them.
And by terrible outputs, I mean violence and racism and all kinds of sort of psychologically harmful stuff.
Yes.
What's that?
No, we've had one of the people who've been rating this content on the show. For folks who are interested, I'll link it in the show notes. Richard Mathenge was here to talk about what that experience was like.
So setting aside all of that, if you have an information need, so something you generally don't know,
then taking the output of the synthetic text extruding machine
doesn't set you up to actually learn more on a few levels, right?
Because you don't already know, you can't necessarily quickly check,
except maybe doing an additional search without ChatGPT,
at which point why not just do that search.
But also, it is poor information practices to assume that the world is set up
so that if I have a question, there is a machine that can give me the answer.
When I'm doing information access, instead what I'm doing is understanding the
sources that that kind of information comes from, how they're situated with respect to each
other, how they land in the world. And this is some work I've done with Chirag Shah on information
behavior and why chatbots, even if they were extremely accurate, would actually be a bad
way to do these practices. So just to, you know, back to your point, yes, this system is set up
to output plausible-looking text on a wide variety of topics. And therein lies the danger,
because it seems like we are almost there to the robo-doctor, the robo-lawyer, the robo-tutor.
And in fact, not only is that not true, not only is it environmentally ruinous, etc., but
that is not a good world to live in.
And thinking about, thinking about...
I just want to hit on this point.
Yeah.
This is one where I disagree with you.
I do think that some of the points that you're making are well-founded.
We don't want these things to be lawyers right away.
But let me at least point you to one use that I've had recently, and you could tell me where I'm going wrong if you think I am.
I mean, I'm in Paris now, a little work, little vacation at the same time.
And what I've done is I've taken two documents from friends who have been here often. They put together documents that they send to friends when they come here. I've uploaded those into ChatGPT.
And then I have ChatGPT, like, search the web and give me ideas of what to do.
I tell it where I am.
I tell it where I'm going.
And it searches through, like, for instance, like all the museums, the art galleries,
the festivals, the concerts, and it brings it into one place.
And that's been extremely useful to me to find new cultural events, concerts.
There's even a bread festival going on here that I had no idea about.
And now I'm going to go because it's found it for me.
So there's a link.
When it comes to this stuff, there's a link that you can go out and double check the work.
But as far as finding information on the web, the fact that it's able to go and comb the internet for these events and then take into account some of the context that I've given it with these documents, I think, is very impressive. And that's just one use case. So I'm not asking it to be a lawyer, I'm kind of asking it to be, what you said, an itinerary planner. What's wrong with that?
So, I mean, first of all, you have these lovely documents from your friends, and I guess what you're saying is missing is whatever the current events are. So they've given you some sort of, like, these are general things to look for, but they haven't looked into what's going on right now. What's wrong with that? You know, on several levels. What would we do in a prior age,
like even pre-internet? The local newspapers would list current events. Here's what's going on.
If you landed in a city, you would go find the local, probably local indie newspaper and look up
the events page. And that system was based on a series of relationships within the community
between the people putting on festivals and the newspaper writers. And it helps support probably
the local news information ecosystem, which was a good thing.
But on top of that, if something wasn't listed, you could think about why is this not listed?
What's the relationship that's missing?
Your ChatGPT output is going to give you some nonsense, and you're right, this is a use case
where you can verify whether this is real or not.
It is also likely going to miss some things, and the things that are not surfaced for you are not surfaced because of the complex set of biases that got rolled into the system, plus whatever the roll of the die was this time.
And anytime someone says, well, I need ChatGPT for this, usually one of two things is going on. Either there's another way of doing that that gives you more opportunities to be in community with people, to make connections,
or there is some serious unmet need,
which doesn't sound like it's this case.
And if we sort of pull the frame back a little bit,
we can say, why is it that someone felt like
the only option was a synthetic text extruding machine?
And here I think you've fallen into the former of these,
which is what are you missing out on by doing it this way?
What are the connections you could be making
to the people around you?
If you're staying in an Airbnb,
maybe the Airbnb host,
if you're in a hotel, the concierge,
to get answers to these questions
when you're looking to the machine instead.
I would also say, I just want to add, you know, this is a pretty low-stakes scenario, right? You can go out, you can verify these things. You can go to existing resources, you know, event calendars that people also spend a lot of time curating online. I mean, there's a lot of stuff that's already curated online. And I mean, these issues exist in prior instances of technology. I mean, you know, one of the people that we cite in the book
and talk a lot about is Dr. Safiya Noble, whose work on Google looks at the kind of way that Google results, you know, present very violent content with regards to racial minorities. One of the parts of the book that I like to reference, and that a lot of people don't reference initially, is the part where she talks about Yelp specifically, and what it was doing to a Black hairdresser, and the way that Yelp effectively was shutting this person out of business, because there was a specific need that she served for Black residents of the city that she was studying, braiding hair and doing other Black hairstyles, right? And so this is kind of a
function of all kinds of information retrieval systems, right? You think about what they're including, what they're excluding, right? So this is not very consequential here, but in any kind of area of, say, summarization or any kind of retrieval, you do need to have some kind of expertise where you can verify that and ensure that what's getting in there is not missing something huge. And it's going to basically take this set of information access resources or systems, in this case crawling the web, knowing that that's going to miss something, and then it's going to exacerbate that, because then you cannot situate those sources in context.
Okay. Let me just give my counterargument, and then we can move on from this. My counterargument would be a couple things. First of all, I don't speak French, so the local newspaper would kind of be lost on me. And I am staying at a residential place. We swapped apartments, so she's in my New York apartment and I'm here. So maybe she and I could have gone over that newspaper together, that's fair. But the newspaper, speaking of things that leave stuff out, the newspaper leaves stuff out all the time. It exercises editorial judgment. So it is swapping bot editorial judgment for newspaper editorial judgment, but the bot can be in some ways more comprehensive because it's searching the entire web. And I'll just say one last thing about this. I didn't feel the need to use it, I didn't say I need to use it to figure out what's going on. Like, again, I had these documents. What's useful about it is, speaking of making connections with the local community, if I'm able to, here's the word, be efficient in my research through using it, I could spend much more time out in the community versus searching the web or reading the newspaper. So what's your thought on those arguments?
Sorry, so I'm getting distracted by Alex's
cat walking around. So listeners, Alex's cat is here. Alex, what's your cat's name? This is Clara
and I'd lift her up, but I have a shoulder injury. But she is, she's knocking the mic around, so I'm just trying to keep her off the mic.
Yeah, yeah, thank you.
So, you know, the efficiency argument. So this is the efficiency argument in the context of leisure activities as opposed to in the context of work.
You mentioned along the way that it is searching the whole web for you.
You don't know that, actually.
That's right, yeah.
And also the whole web includes a lot of stuff that you don't actually want. Like, lots and lots and lots of the web is just garbage SEO stuff, and maybe you're seeing more of that in your ChatGPT output than you would with a search engine, which, as Alex mentioned, also has issues. And then finally,
I'm going to take issue with you.
SEO garbage is made for the search engine.
So it is. But the search engines also, in order to stay in business, have to be fighting back against the SEO garbage. It's a constant battle.
Probably for the chatbots as well. Yeah. So you mentioned newspaper editorial judgment versus
bot editorial judgment. And I'm going to take issue there because a bot is not the kind of thing that
can have judgment, nor is it the kind of thing that can have accountability for exercising
judgment. And so I think that, yes, this, Alex is saying this is low stakes, but if you're using it
as sort of a motivation for these things being useful in the world, then you have to deal with
the fact that the useful in the world is going to entail many more higher stakes things, and then we
really have to worry about accountability.
I would also want to say, too, I mean, there's a lot of, I think, this argument from, like, quote unquote, capabilities, which, I don't really know what that term means, and that's another poorly defined term, I think, especially when it comes to AGI.
But I mean, this argument from kind of like, well, I find it useful.
I don't find terribly convincing, right?
I mean, it's sort of like, well, okay, you have found it useful either in a situation in which there is a way to have some kind of verification, something that you know about and have some kind of ground truth about, or you found it useful in a variety of these different situations. But if I'm asking a chatbot about something in
an area I know quite a lot about, say, sociology or the social movements literature, I then have that knowledge to verify it, just from my social skill in that area, and this is a term I'm kind of borrowing from a sociologist, Neil Fligstein, and my knowledge of how to navigate those areas in my professionalization as a sociologist. But then once it gets into those areas in which verifiability just escapes me, which is most areas, because we're not professionals in most areas, and although a lot of us want to be jacks and jills of all trades, then we lose that ability, and we don't have the social skill or depth of knowledge to verify it in the same way.
And so I'm really not convinced by those arguments that, well, these are useful for me in these pretty low-stakes contexts, because that slippage then means that we're going to miss some pretty big things in some really dire contexts.
Okay. Well, let's turn it up a notch when we come back because we're going to talk about
AI at work and AI in the medical context. And maybe we can even touch a little bit on
doomerism, which you write about in the book.
And there's plenty else on the agenda.
So we'll be back right after this.
And we're back here on Big Technology Podcast with Professor Emily M. Bender and Alex Hanna.
They are the authors of The AI Con: How to Fight Big Tech's Hype and Create the Future We Want.
Here it is.
So let's go to usefulness.
And we'll start with generative AI in the medical context.
Because why don't we just go straight for the example that we'll probably have the biggest disagreement on here.
And I'm not saying that I think generative AI should play the role of a doctor.
In fact, when I wrote my list of things I agree with you both on,
I don't think that AI should be a therapist, at least not yet.
And we know now that the number one use of AI, according to a recent study, is companionship and therapy.
And the therapy side really scares me.
And I think the companionship isn't the best thing in the world either.
But in medicine, I do find that there is some use for it.
Medicine is a field overrun by paperwork and insurance requirements that I think have ruined the health care system
because they keep doctors effectively tied to their computers writing notes as opposed to seeing patients or living their lives.
And Alex, before the break, you mentioned that one of the areas that this stuff is useful
is when it starts to operate in your area of expertise because you're able to verify that.
So, I mean, we're going to go with one use that I find to be pretty good here
and would sort of, to me, doesn't make a generative AI feel like a con.
It's when a doctor is seeing a patient and they can take a transcription of the conversation that they have with the patient, and then have AI synthesize what they talked about, summarize it, and put it into the systems that they have for electronic medical records, and then verify it, so they don't have to spend the time writing those summaries up and can actually go and spend some more time with patients.
So what's the problem with that?
There are so many problems with that.
And the first thing I want to say is that you named the underlying problem when you talked
about insurance requiring so much paperwork. So this is one of those situations where there's a
real problem here. It's not that doctors shouldn't be writing clinical notes. That is actually
part of the care. But there is a lot of additional paperwork that is required because of the way
insurance systems, and especially the one in the United States, are set up. And so we could work
on solving that problem. And this is a case where sort of the turn towards large language
models, so-called generative AI as an approach to this is showing us the existence of an issue.
But that doesn't mean it is a good solution. So many problems. One is writing the clinical note is actually part of the process of care. It is the doctor reflecting on what came out of that conversation with the patient and thinking it through, writing it down, plans for next treatment. That is not something that I want doctors to get out of the habit of doing as part of the care. Now, they might feel like they don't have time for it. That's also a systemic issue. Secondly, these things are set up as like ambient listeners, which is a huge
privacy issue. As soon as you've collected that data, it becomes sort of this like radioactive
pile of danger. Thirdly, you've got the fact that automatic transcription systems, which are the
first step in this, do not work equally well for different language varieties. So think about somebody
who's speaking a second language. Think about somebody who's got dysarthria. So an older person
whose speech isn't very clear. Think about a doctor who is an immigrant to the community that they're
working in who's got extra work to do now because their words are not well transcribed. And
So the clinical notes thing doesn't work well for them, but the system is set up where there's
these expectations that they can see more patients because the AI, in quotes, is taking care of all of
this for them. And there's a beautiful essay that came out recently, I think in STAT News, and I was looking for the name of the author, didn't find it real quick, really reflecting on how important it
is to her that the doctor do that part of the care of actually pulling out from the conversation,
this is what matters. And it's not just simple summarization. It is actually part of the medical work to go from the back and forth had with the patient, and all of the doctor's expertise, to what goes into that note.
Yeah. So I want to add on,
Emily has said so much of what I want to get at, but I have, I think, three or four separate points in addition to that. So first off is the technical point. So there are tools that are purported to do summarization. There's some great reporting by Garance Burke and Hilke Schellmann at the AP from last October that was looking at Whisper specifically. So that's OpenAI's ASR system, automatic speech recognition system. It said that the medical transcription was basically making up a lot of shit. And then we knew that this, they had, quote unquote, hallucinations. Again, this is not a term that we use in the book. I say it's making shit up, but that is maybe even granting too much anthropomorphizing of the system for me.
And so, but there is a lot of these things.
Quoting from that reporting, some of that invented text includes racial commentary,
violent rhetoric, and even imagined medical treatments.
So that's one major problem.
The second problem is that medical transcription has been this area, which has been an area
in which medicine has been forcing kind of this casualization of work for years, right?
So medical note taking that exists in hospitals now, much of that is done remotely.
So it's gone and taken this work that has been seen as kind of like busy work, or this thing of, I don't want to write up my medical notes, and made it this type of work that needs to be foisted on someone else. So prior to this kind of ASR element of it, we've had these, oh, thanks for linking to that, and I'll link the AP article that I was looking at too. Part of that work has actually been offshored a lot, into this kind of movement of outsourcing. So a lot of that is done remotely as a part of this casualization. And I want to point out the gendered notion of this.
This is very much seen as, like, women's work. And that reflects a lot of the ways in which so much of, quote, AI technology wants to basically take the work that has been traditionally the domain of women and is saying, well, we can automate that, or we can casualize that in different ways. And that's important because it sees this work as not actually part of, quote unquote, the work. It is seen as work that ought to be casualized and offshored. And so I appreciate the essay that Emily mentioned, because that essay is saying, like, no, this is actually part of the element of doctoring.
And then I want to also just kind of couch all of this stuff in the political economy of the medical industry, thinking about what does it mean to rush and have more and more remote medicine and have more and more doctors see more patients.
And these efficiency gains from doctors isn't going to like make their jobs necessarily
easier.
It's going to put more of a pressure on them.
Now that you're in a position where you don't have to take medical notes, you're going to be running from appointment to appointment to appointment. My sister's a nurse, she's a nurse practitioner, and she's basically seeing this in her job right now at her clinic. It's, well, now we have these things, so I have to see more patients now, you know. And it's not like I'm going to go and be on a beach anywhere; it means that I'm going to have, you know, nine or ten 15-minute appointments a day. I'm not going to have enough proper time to spend with patients. So the coda to all
of this is that if AI boosters could really offshore all of doctoring to chatbots, they would. And this
is one case in which Bill Gates has said, you know, in 10 years, we're not going to have teachers and doctors. What a nightmare scenario, to have non-teachers and non-doctors. And Greg Corrado really gives it away. We cite him in the book, where he says of Med-PaLM 2, you know, this thing is really efficient, we're going to increase tenfold our medical ability, but I wouldn't want this to be part of my family's medical journey.
Okay, but here again you're picking out what is, like,
some of the most extreme statements. And I started my question saying it's Bill Gates, and Bill Gates can make extreme statements. I don't think he's the guy, and I think that doesn't reflect the broad consensus here, and definitely not the question that I asked, which again was about using this to take some of the time that the doctors are spending, you know, on paperwork and give that back to either the doctors or to have them be able to see more patients.
So we very much addressed that point.
First of all, I want to name the author of that essay.
Her name is Alia Barakat, and it's a beautiful essay.
She's a mathematician and also a patient with a chronic condition.
Wonderful essay.
But, yeah, you said give that time back to the doctors or have them see more patients, right?
It is not going to be going back to the doctors.
That's not how our health care system works.
And it's also going to, therefore, decrease quality of patient care.
It is lose-lose, except for the hospitals maybe getting more money
and certainly the tech companies that are selling this to the hospitals.
Okay.
I'm also curious, in terms of thinking about the more nuanced position, like, who's the reference here that you're thinking of, Alex? What's the consensus on this? Because, yeah, we see the egregious, you know, elements of this, and I'm wondering what the medical consensus is. You know, like, what's an example, you know, just to switch positions, now I'm interviewing you, but who is someone that you think is doing this very well?
Well, I mean, someone doing this well, like, again, I don't think that this stuff is well developed yet, but I've definitely seen enough doctors just buried in paperwork. And we started this whole segment talking about how this is, I guess, an insurance-driven thing. And so it's interesting. I guess, do you both not like the way that the insurance companies are guiding the system, but also think that it's good practice to have doctors write those notes themselves?
Hold on. There's two use cases for doctor's notes, right? There is actually documenting for the
patient and for the rest of the care team what has happened in this session. And that I think is a super
important part of the work of doctoring. I believe you that there's a lot of additional paperwork
that has to do with getting the insurance companies to pay back. And no, I don't like that system
at all. The insurance companies are not providing any value. They are just vampires on our
health care system in the U.S.
Okay. I think we can agree on that front. Anyway, I do think that as this stuff gets better, and I understand a patient wants this to happen, do I think a doctor would be giving them worse care if they allowed the AI to summarize the notes or to pick out the more important parts, if this stuff was working well? Not necessarily.
So that's a big if, you know. What does it mean when this stuff is getting better and this stuff is working well? Do you mean, kind of, like, the absence of making shit up?
Right, definitely. I mean, but we all agree that the doctor will have to verify and check this information after.
Well, I guess the problem there is, then why are we having the doctor double-check that to begin with, right? In an area where the doctor has 15 minutes to see every patient, and there is an AI, quote unquote, scribe doing,
or, quote unquote, I don't want to call it an AI scribe, there's an automatic speech recognition tool, right, doing automatic speech recognition on these things, in what space or with what time does the doctor have to verify those?
I mean, like, well, the time that they would be spending writing those notes in the first place.
Is verification an easier task than transcription? I guess that's my question. I would proffer no. I mean, just from my experience using these systems. And I mean, I'm not a doctor, thank God, although I've thought about it. Not that kind of doctor, to the chagrin of my parents. But then I guess the question is,
In my experience, I've used these tools for interviews, specifically in kind of qualitative interviews with data workers and have spent time with these tools and have just had such an awful time trying to think about this, especially with regards to, you know, this isn't, you know, this isn't medical terminology, but it's terminology around doing data work.
We're talking about training AI systems.
And it just, it is such a terrible job.
And at one point, I threw it all out.
And I said, okay, I just am sending this to somebody to actually transcribe because this is not helpful for me.
And it was taking me more time starting with the machine transcript than doing it from scratch.
And I've transcribed, you know, I'm not a primarily qualitative interviewer, but I've spent time, you know,
transcribing dozens of interviews in my research career.
And I've found it just very difficult.
So, I mean, I guess the question is, is that verification, you know, taking the time that could just be used for the doctoring and working with patients? And, I mean, holding everything about the insurance industry stable, you know, that notion of thinking about how the patient presents, how the patient is describing their, you know, how they're presenting, that is often the work of doing it. And the medical training I do have is that I was at one point a licensed EMT, and writing up PCRs is, like, you know, no one wants to write up the PCRs. At the same time, you're spending time taking note of how a patient is presenting. The patient is, you know, arrhythmic, just bringing it back to the Alexes, or, you know, the patient is cyanotic around their lips. Like, these are things that, you know, a health care professional would be paying attention to and making notes of, maybe because they're writing it up later. So I'm thinking about this process of writing and what it does to our own practice of viewing and aiding and administering medical care.
Okay. I mean, we'll agree to disagree on this front, but again, I think
we are all on the same page that insurance companies requiring additional writing just because they hope you don't ever get to the claim, or that you don't file it, that's probably bad. And we don't think that there should be AI doctors, at least not yet. That's what I say. I think you guys would probably say never.
So, all right, I want to end on this, which is, or maybe we can do two more topics.
I guess, like, here's my question for you.
A lot of the discussion of AI's usefulness in jobs in the book discusses these tools
being imposed top down. But what if they come bottom up? Like, what if a worker can find use for
them and actually make their job easier by getting good at using something like a ChatGPT or a
Claude? Or if, you know, again, we like kind of talk through the medical use case. If a doctor
does find that this is useful for them, are you opposed to that? So yes, and I think that actually
Cadbury, of all people, put it best. There's this hilarious commercial that was for the Indian
market, sort of showing how the supposed efficiencies that you're getting out of this
just ramps up the speed of things and doesn't leave you time to really get into the work
that you're doing and be there. I think that the most credible use cases I've heard for these
things are, first of all, as coding assistants. So that's sort of a machine translation problem
between natural language and some programming language. And there I really worry about technical
debt, where you have, you know, output code that was not written by a person, that's not well documented, that becomes someone else's problem to debug down the line. But also, in writing
emails. People hate writing emails and people hate reading emails. So you get these scenarios where
somebody writes bullet points, uses ChatGPT to turn it into an email, and the person on the other end might use ChatGPT to summarize it. And it's like, okay, so what are we doing here? And again,
taking a step back and saying, what are the systems that are requiring all of this writing that everyone finds a nuisance to write and to read, and can we rethink those
systems? And also, I just have to say that whenever I'm on the receiving end of synthetic
text, I am hugely offended. And one of the things that we put in the book is, if you couldn't be bothered...
I definitely got one of those emails yesterday. And I was like, you used ChatGPT for this. I know you did.
Yeah. If you couldn't be bothered to write it, why should I bother to read it?
Right. Yeah. And I mean, it's very interesting putting this in terms of thinking about cases in which workers are using this kind of organically. First off, I've heard very little of that personally, especially from professionals. I mean, I think there's plenty of workers that are finding a lot of uses, but I would say the analog that I find, where it's not top down, is in education. And to that degree, I think that's kind of a failure in thinking about what education is, right? I mean, in that case, it's...
Well, for students to be using this to get through their classes?
Yeah, right. Exactly.
Are you talking about teachers putting stuff together?
Well, I'm not thinking about teachers at all. I'm thinking about the students, right? And I'm just thinking
about areas in which, but I'm using that as sort of an analog and then thinking about what are
the conditions that are forcing students to use this, right? If there's kind of cases in which
this seems to be sort of useful, okay, what are the cases in which what does that say about the job?
What does it say about like how the work is oriented, right?
In that case, then maybe there might be, needs to be kind of different efficiencies or thinking about how the job is operating, right?
I then worry then that these things become mandated in work environments.
And you're saying, well, people are using this, and so everybody's using this. And then where does that leave the people who are resistors, thinking, well, I know this can't do a good job, so where's that putting me? And I think we've already seen such a justification for this, where employers have been reducing positions by the score because there's a notion that these tools can do these jobs suitably and to a certain degree of proficiency, which is just not the case.
That has me worried down the line about these areas that Emily has mentioned, the kind of technical debt area, the kind of, how do we know?
And there's kind of an overestimation of capabilities of these tools in that case.
Okay, I know we're at time or close to time.
Can I ask you one question about Doomers before we get out of here?
Sure.
Let's end by dogging on doomers.
Okay.
So I definitely saw that there was a chapter about doomers here, and I was excited to read it, because my position has been largely that those who are worried that large language models are going to turn us into paper clips are either marketing what they're selling or just very into, I don't know, they like the smell of their own body odor. I mean, I guess it's not a terrible thing to be worried about, but there's so much more, and it seems so unlikely that this is going to hurt us. So I definitely wanted to get your take on why you're down on doomerism. And let me just give my one caveat here. There's a line in your book that says that AI safety is just doomerism, and it's only about these long-term problems. But I've definitely heard some of the AI safety folks, like Dan Hendrycks from the Center for AI Safety, talking about really important near-term issues, like whether this technology could help virologists with bad intent. So I wouldn't malign the entire AI safety field. But the doomer stuff, I hear your point.
All right. So attack that and then we'll get out of here. So I just want to put in a shout out for
a new book by Adam Becker called More Everything Forever, which really goes deep into the connections between the sort of doomer thought and these more palatable-looking sides of what's called effective altruism. And also in that context, there's a wonderful paper by Timnit Gebru and Émile Torres on what they call the TESCREAL bundle of ideologies.
And I think that if you're, if your concern about the systems is not rooted in real people
and real communities and things that are actually happening, like even this like, oh, but
bad actors could use it to, you know, more quickly design, you know, viruses and stuff like that.
That's still speculative, right? So anytime we are taking the focus away, it's like, has that
happened? Right? This is, this is still people writing science fiction, fan fiction for themselves.
And, you know, it's not, it's based on these jumped up ideas about what the technology can do
and taking the focus away from the actual harms that are happening now, including the environmental
stuff we started with. Right. But, yeah.
And I mean, I will, I will say.
You want to get ahead of that, right? Like we had it with social media. There were some issues with social media, but there was not a focus on some of, like, the potential long-term issues. And that only came up later on, at least in the beginning. You don't agree?
Wait.
There are problems with social media, for sure.
Yeah.
And some of those problems were documented and explained early on and people were not paying
attention. But they were real problems that were being documented as they were happening, as opposed to
imaginaries about, well, someone's going to use this in Dr. Evil up a bad virus. Okay. Yeah. Go ahead, Alex.
For the sake of time, I think that's fine. I don't have much to add. All right. Well, look,
the book is called The AI Con: How to Fight Big Tech's Hype and Create the Future We Want. The authors are Emily M. Bender and Alex Hanna. Emily and Alex, I've been
reading your work for a long time. And it's great to have a chance to speak with you. Like I said
at the top, you know, for those who are listening or watching, you may not agree with everything, either everything I said or everything our guests said. Hey, at least now you know these arguments, and you know the arguments for and against. And we trust you to make up your own opinion and do further research. And we've definitely had plenty of good stuff to keep digging into shouted out over the course of this conversation. So Emily and Alex, great to see you. Thank you so much for joining the show.
Thank you for this conversation, and enjoy Paris.
Thanks, Alex.
Have a great time. Thank you both. Thank you, everybody, for listening. We'll see you on Friday
for our news recap. Until then, we'll see you next time on Big Technology Podcast.