Today, Explained - AI is killing the internet
Episode Date: July 30, 2025
In a first-of-its-kind decision, an AI company wins a copyright infringement lawsuit brought by authors. It's part of a larger fight that is remaking the internet. This episode was produced by Gabrielle Berbey, edited by Amina Al-Sadi, fact-checked by Rebeca Ibarra, engineered by Patrick Boyd and Andrea Kristinsdottir, and hosted by Sean Rameswaram. Vox's Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic; they don't have any editorial input into our content. Listen to Today, Explained ad-free by becoming a Vox Member: vox.com/members. Transcript at vox.com/today-explained-podcast.
Transcript
Artificial intelligence is scraping the internet.
It's gorging all the websites to give you what you want.
It's actually kind of gorging everything
to give you what you want.
And the makers of everything are not very happy about it.
Sarah Silverman is suing, Sony is suing,
Dow Jones is suing, The New York Times is suing, authors are suing. But in one author
lawsuit, AI kind of won. Specifically, Anthropic's AI, who goes by Claude. Well, Claude's not cool,
but Claude's uncool the same way I'm uncool, see? So. Claude's win in court is scaring the makers of everything.
And we're going to talk about why on Today Explained.
Today Explained from Vox. I'm Sean Rameswaram, here with Jason Koebler, tech reporter and
co-founder of 404 Media.
I am a journalist who covers AI, but I'm also a business owner because we have our own small
publication and so I'm very interested in what is going to happen with all of these
AI companies getting sued on copyright grounds.
There's dozens of lawsuits at this point and I'm concerned about it both as a journalist
who has had my work scraped but also as someone who has like a direct financial interest in
it.
And so about a month ago there was this decision in a case against Anthropic,
which makes the AI tool called Claude.
And it's not necessarily that this is
the biggest AI copyright case,
but is the first real major decision where we get
a judge pointing at how he is thinking about
these issues of massive AI companies scraping authors' work, scraping
artists' work, scraping musicians' work. And who sued Anthropic? Yeah, so it's three authors.
Their names are Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. Three authors claim
Anthropic built a multi-billion dollar business by misusing copyrighted works and pirated writings without permission and without paying the authors for their work. This lawsuit
is really just the latest as many other authors, journalists, record labels,
artists, creators, they try to wrestle back control of their work. To be totally
honest I didn't know them before this lawsuit. To be totally honest I still
don't. They sued them because they learned that their books
were included in this data set called Books 3,
which is this really controversial at this point data set
that contains a few hundred thousand books.
And the Atlantic at one point got a copy of Books 3
and then published like this search tool that allowed authors to see,
is your book in this data set?
Author Drew Hayden Taylor had no idea
Wow!
that nine of his works were part of Books 3,
a massive data set used by tech companies to train artificial intelligence.
Well, it's a combination of being flattered and being concerned.
We're all just like little ants who don't mean anything to the big billionaires.
They don't want to pay us for our words. They'd rather just take it.
I'm so mad. If your book is on here, I'm so sorry.
I'm just like so sad for so many authors today.
These authors learned that their books were in Books 3,
Anthropic trained on Books 3,
and therefore Anthropic trained on their copyrighted works,
and so that formed the basis of this lawsuit.
So the really interesting thing is that in the early days of this debate,
and it's one of the hottest debates at the moment between artists,
journalists, authors,
and like the AI boosters and companies and maximalists
is, is it fair use to scrape this stuff en masse,
run it through a large language model,
like turn it into a huge data set
and then use large language model technology
to create these tools.
And at first the AI companies were very skittish about saying that they had
trained on copyrighted work at all.
AI should be allowed to read the internet and learn.
Shouldn't be regurgitating.
Shouldn't be, you know, violating any copyright laws, but on
individual's private work, yeah, we try not to train on that stuff.
We really don't want to be here upsetting people.
But as these cases started going to court and as they entered discovery and as it became
clear that every major AI company was training on copyrighted work, their argument went from being, well we can't
say what we trained on because it's proprietary, to of course we trained on copyrighted work,
we had to, and it's legal. And it's legal because our use of it is transformative and
therefore is protected by the fair use tenet of copyright law.
Section 107 of the Copyright Act reads,
transformative uses are more likely to be considered fair.
Transformative uses are those that add something new with a further purpose or different character
and do not substitute for the original use of the work.
That's what they argued and that's what the judge ultimately decided.
What he decided in this case was the scraping of these three authors' books was considered
fair use under copyright law.
But there is a huge caveat here where he decided that the way that Anthropic went about acquiring
the books in the first place was piracy. Okay, so the judge essentially hands down a split decision saying that, yes, this is
fair use to use these authors' work this way, but also it wasn't totally fair how you got
this stuff because it was pirated.
So I don't know, what does that mean?
Does everyone go home unhappy or was this like a huge win for Anthropic?
Doesn't feel like a huge win for the authors.
Yeah, I mean, I don't think it's a huge win for anyone yet.
And I think that the people who are saying this is a slam dunk for Anthropic,
which many people in the AI world are saying, that
it's a huge win for Anthropic, I think they're wrong.
And the reason that I think they're wrong is because the judge determined essentially
that it was not copyright infringement to train Claude on copyrighted material that
was legally obtained. But then they also downloaded books from this website called LibGen,
which is a piracy website that has millions of books on it.
And then also from a website called Pirate Library Mirror,
which is another piracy site that has millions of books on it.
And the judge said that obtaining the books in this way
was pretty much like cut and dry copyright infringement.
And I think the really important thing to note is that every major AI company has trained on copyrighted works that they obtained in a similar fashion.
We have done reporting at 404 Media where entire YouTube channels were scraped, you know, Netflix,
like the entirety of Netflix was scraped.
And so the specifics about how these companies obtained these works is potentially going
to be really important.
And a lot of that scraping has already been done.
A lot of that piracy has already been done.
These companies are literally some of the richest companies on earth, are affiliated
with some of the richest people on earth.
Did they really just steal all these books?
Could they not have just gone to Amazon and bought like some books or is that just too
much work for them?
Well, so the super interesting thing about this lawsuit and something that like really
like, I was like, holy shit,
like, how did they do this? Why did this happen? Is in the beginning, Anthropic pirated all
these books. They downloaded huge amounts of torrents, they scraped these piracy websites,
and they did that specifically because they didn't want to slow down. Like there's an
email that is part of this lawsuit where the CEO,
Dario Amodei, says, you know, we don't want to get into, he calls them legal slash practice
slash business slog. And so they were basically like, let's do all of this. Let's pirate all the
books. Let's put it into our model, and then let's go buy copies of a
lot of other books. And so what Anthropic did was they had a whole team of people who was dedicated
to buying used books from used bookstores that were going out of business, from eBay, from these
online marketplaces, and they bought a huge, huge number of books, like physical books. They tore
the covers off of them
and they had this like giant scanning operation
where they would scan the books
and then create a digital copy of the books
and then fed that into their model.
And the judge said that all of those books
that were bought from used bookstores, no problem.
And I think that goes to show that
these AI companies are grabbing data from wherever they can find it.
It's a huge arms race to see who can get
the most data from the most number of places.
So they're doing the low-hanging fruit,
which is downloading massive data sets.
Yeah. But then they're scouring the planet,
looking for bookstores that are going
out of business.
Like I've heard of AI companies looking for like huge physical archives of like VHS movies
and things like that and then digitizing those.
And so really they're just trying to find data wherever they can.
And it seems like when they're able to get it legally by purchasing a copy,
they're willing to do so but they're also willing to take it for free when they can.
Did we learn anything from this lawsuit that might implicate those other ones?
Yeah. I mean, I think that the piracy aspect of this is really important. And we've seen in the past, like, if you are a 13-year-old kid
who's pirating Metallica songs on Napster,
like, you can be liable for hundreds of thousands of dollars worth of damage.
Rock to Never Never Land
Lars will find you.
For just, like, a few songs.
And, like, in this case, you have seven million books.
And so, um...
Like, it will be very interesting to see whether a judge
levies like a huge financial penalty here or whether it's more of a slap on the wrist.
And I tend to think it will probably be more of a slap on the wrist
because all of Silicon Valley, all of America's largest companies
sort of have a huge amount of investment riding on
the widespread adoption of AI. And AI is now a huge part of the American economy. It's become
part of like geopolitics as well, where you have the Trump administration and really the Biden
administration was saying the same thing, saying that the United States can't fall behind China in the quest to innovate in AI and to have like widespread AI
adoption. I'll be very curious to see whether there are like actual like
serious punishments for these companies that have scraped all this data or
whether they you know wiggle out of it with a slap on the wrist
or get out of it with a series of settlements or what have you.
But I tend to think that there's probably no stopping this industry from a legal perspective.
I think that it feels too big to fail to me at this point.
404media.co is where you can find and support Jason Koebler's work instead of, you know, just stealing it.
AI companies aren't just stealing everyone's intellectual property.
They're also kind of killing the internet as we know it
right before our eyes.
We're gonna talk about that when we're back
on Today Explained.
["Today Explained Theme"]
Today Explained is back with John Herrman now.
He's a tech columnist at New York Magazine.
John, in the first half of the show we're talking about how this Anthropic case and
judgment may or may not change the extent to which these big AI models can scrape the
internet.
But I want to talk to you about how all this scraping has already in some ways broken the
internet as we know it and how we use it.
You wrote about how AI has broken maybe like, you know, the front page of the internet for
a lot of people.
Google.com.
Tell us how.
Google could not be closer to the center of this recent AI
boom.
On one hand, they are a company that has really deep roots
in that space.
They published the foundational research
for what then became generative AI as we know it.
They've put it in all their products.
If you use any Google thing, you are seeing chatbots everywhere.
Take notes with Gemini.
Summarize this file.
Summarize a folder.
Refine this document.
Find inspiration easily.
Fresh ideas.
Elevate your writing.
Get clear, constructive.
Improve sentence flow, word choice.
They are all in on AI.
Google Search in particular has AI overviews at the top.
There's a new AI search mode that works like a chatbot instead of a search engine.
Google making a rare change to its homepage, the most visited website in the world,
pushing its AI mode tool directly into the hands of its billions of users.
With this latest move, it is changing what billions of people see when they open their
browser, still the on-ramp for the entire internet.
Meet AI mode. Ask detailed questions for better responses. AI on
Google search can provide information. While that was all happening, AI was also
sort of accelerating this feeling of decline in the Google product, which over
the years through this back-and-forth battle between the company and search
engine optimizers and companies trying to get an edge on Google
and this sort of long running dynamic had become a little spammy, a little overloaded
with ads.
Have you noticed that Google sucks lately?
I'm talking about their search.
It sucks.
Why is it so hard to find anything on Google search?
Google search is terrible.
It's bought and sold. Five or six links up top, all paid for.
It's just garbage, pure unadulterated garbage.
But I think a lot of people would agree that using Google in say 2023 was a kind of a degraded
experience compared to 10 years prior. It was kind of cluttered. There was more just junk in it.
There were more ads all over the interface, but also the stuff you were getting in search was a lot of low quality,
cheaply made aggregated content, stuff that was taken from somewhere else in an effort to sell a product or just serve up some ads.
The arrival of generative AI tools, which enable the creation of basically infinite passable content almost for free really accelerated
that issue.
So on one side, you have the big ecosystem that Google guides people to that is in a
sort of collapse because of this massive shock of new AI-generated content.
On the other side, you have Google, the product, becoming more and more AI-centric. And in the middle, you have kind of a complicated story.
And honestly, for search users and regular people, kind of a strange experience.
Do they have a plan to make money off of this?
Obviously, they want to make money.
Has anyone asked what their long-term plan is?
So there are obvious risks to throwing away this like cluttered but lucrative product and
replacing it with a totally clean chatbot or whatever. That's not what they're doing. They are
incorporating AI answers into the main search page, which they say people like quite a bit.
So this last quarter has been really good for them. It also arrived in the context of lots of really strong data
suggesting that the way people use Google Search now
with these AI tools means that they don't really
leave it anymore.
They don't really click out and go to anything.
An AI overview might summarize three articles,
archival resource, some expert opinions,
but the number of people that actually then click
through to those opinions or to those articles is minuscule.
So Google's relationship to the web around it is pretty dramatically different.
If Google's eating up the rest of the internet, if Gemini is eating up the rest of the internet right now, and companies like ours, let's say, are no longer meeting their traffic goals, are no
longer getting any traffic from Google at all.
Does Gemini have nothing to eat?
You know what I mean?
Because everything dies?
Who's going to be feeding Gemini all the right answers in like 10 years?
We're sort of like glorifying the web a bit in this conversation.
No matter how great and incredible it is as this big resource, it really doesn't
go that deep.
And the idea that it is now being sort of like trawled and overfished and just sort
of consumed like a resource by these AI companies really does, I think, raise the specter of collapse.
I do think that they could find that their products are being made worse by this dynamic
and by their relationship with the web. I do think that's a real problem. And you can see this in
some of the deals that these companies make with publishers, including our parent company,
which has a deal with OpenAI, for example.
Remind people out there, or me, why companies like ours make deals with companies like ChatGPT.
The context is every media company is struggling for visitors.
Even before the Google traffic really started to
collapse it was sort of unstable.
And so in addition to like a weak advertising market, every media company is looking for
any sort of additional source of revenue.
And if you're a media executive, OpenAI showing up and saying, here is this many millions
of dollars for this many years. It looks like free money.
Of course, if you're like producing the content
or if you're even just thinking longer term
about how a media company or a website
fits into this AI picture,
you recognize that you're sort of, you know,
giving access away to something that these companies
are explicitly trying to automate.
You know, you're sort of like, in an institutional sense, training your replacement.
You're listening to AI Explains today.
But it is a deal made not quite under duress, but something close to that.
For people who miss that old version of the internet,
who miss going to Google, typing
in a query, getting a bunch of results, clicking on a few of them, getting answers that felt
credible, where do they go for that experience now?
I think there's like a funny polarized answer to this.
I just did a story on Reddit, which is having a huge moment right now.
It's been around for 20 years.
It's growing hugely.
Part of it is just a response to social media fatigue, the sense that other communities
on the web don't really exist anymore, that everything else on the web is too commercial
and whatever.
Also, a huge part of that growth is just traffic from Google.
They're having the fastest growth
they've had in almost their entire existence
because Google is just shoveling so many people into Reddit
because everything else is not really working.
So you have that.
You have a community of communities.
You have something that feels kind
of like it's of the old web.
It seems like eventually we're going
to get to the point where it's like you either want
to talk to one of these large language models,
or you just go back to calling up your friend.
Ah, I don't even know where it gets.
You just walk into the street and yell, does anyone
know of a good barber?
Yeah, I mean, it's like a real mutual suspicion
about who's using AI is really pervasive, especially online, but
also in person.
But yeah, I do think that the way that the AI training paradigm and some of the stuff
that you were talking about with Anthropic, but also just the way that Google incorporates
all this stuff, it really does kind of break the deal with the whole idea of the public
web.
Like, all right, we'll all just do this stuff in public.
We'll talk to each other.
People will build all these businesses around this to sort of connect everything and it'll
all sort of work together and whatever.
When you have like these massive sort of predatory companies just consuming all of that, harvesting
all of that and saying, all right, we are no longer part of this arrangement, we are doing something else.
More people are on Discord, more people are in group chats, more people are
either just purely consuming on social networks and not posting or just talking
privately with their friends. And I do think that this fits quite well with
that trend and probably accelerates it.
John Herrman, you can read and subscribe
to New York Magazine at nymag.com.
Gabrielle Berbey produced, Amina Al-Sadi edited,
Rebeca Ibarra fact-checked,
Patrick Boyd and Andrea Kristinsdottir mixed.
And by the way, Vox's Future Perfect is funded in part by the BEMC Foundation, whose major
funder was also an early investor in Anthropic, and none of them have any editorial input
into the stuff we make here at Vox.
Speaking of stuff, we hope you enjoyed the 1,700th episode.
If you did, you can say something nice about us most anywhere you listen.
And if you didn't, well, there's always episode 1701 tomorrow.