Today, Explained - AI is killing the internet

Episode Date: July 30, 2025

In a first-of-its-kind decision, an AI company wins a copyright infringement lawsuit brought by authors. It's part of a larger fight that is remaking the internet. This episode was produced by Gabrielle Berbey, edited by Amina Al-Sadi, fact-checked by Rebeca Ibarra, engineered by Patrick Boyd and Andrea Kristinsdottir, and hosted by Sean Rameswaram. Vox's Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic; they don't have any editorial input into our content. Listen to Today, Explained ad-free by becoming a Vox Member: vox.com/members. Transcript at vox.com/today-explained-podcast.

Transcript
Starting point is 00:00:00 Artificial intelligence is scraping the internet. It's gorging on all the websites to give you what you want. It's actually kind of gorging on everything to give you what you want. And the makers of everything are not very happy about it. Sarah Silverman is suing, Sony is suing, Dow Jones is suing, The New York Times is suing, authors are suing. But in one author lawsuit, AI kind of won. Specifically, Anthropic's AI, which goes by Claude. Well, Claude's not cool,
Starting point is 00:00:37 but Claude's uncool the same way I'm uncool, see? So. Claude's win in court is scaring the makers of everything. And we're going to talk about why on Today Explained.
Starting point is 00:01:03 Today Explained from Vox. I'm Sean Rameswaram, here with Jason Koebler, tech reporter and co-founder of 404 Media. I am a journalist who covers AI, but I'm also a business owner because we have our own small publication, and so I'm very interested in what is going to happen with all of these AI companies getting sued on copyright grounds. There are dozens of lawsuits at this point, and I'm concerned about it both as a journalist who has had my work scraped but also as someone who has like a direct financial interest in
Starting point is 00:01:38 it. And so about a month ago there was this decision in a case against Anthropic, which makes the AI tool called Claude. And it's not necessarily that this is the biggest AI copyright case, but it is the first real major decision where we get a judge pointing at how he is thinking about these issues of massive AI companies scraping authors' work, scraping
Starting point is 00:02:06 artists' work, scraping musicians' work. And who sued Anthropic? Yeah, so it's three authors. Their names are Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. Three authors claim Anthropic built a multi-billion dollar business by misusing copyrighted works and pirated writings without permission and without paying the authors for their work. This lawsuit is really just the latest as many other authors, journalists, record labels, artists, creators, they try to wrestle back control of their work. To be totally honest, I didn't know them before this lawsuit. To be totally honest, I still don't. They sued Anthropic because they learned that their books were included in this data set called Books3,
Starting point is 00:02:52 which is this really controversial at this point data set that contains a few hundred thousand books. And The Atlantic at one point got a copy of Books3 and then published like this search tool that allowed authors to see, is your book in this data set? Author Drew Hayden Taylor had no idea (Wow!) that nine of his works were part of Books3,
Starting point is 00:03:17 a massive data set used by tech companies to train artificial intelligence. Well, it's a combination of being flattered and being concerned. We're all just like little ants who don't mean anything to the big billionaires. They don't want to pay us for our words. They'd rather just take it. I'm so mad. If your book is on here, I'm so sorry. I'm just like so sad for so many authors today. These authors learned that their books were in Books3, Anthropic trained on Books3,
Starting point is 00:03:49 and therefore Anthropic trained on their copyrighted works, and so that formed the basis of this lawsuit. So the really interesting thing is that in the early days of this debate, and it's one of the hottest debates at the moment between artists, journalists, authors, and like the AI boosters and companies and maximalists is, is it fair use to scrape this stuff en masse, run it through a large language model,
Starting point is 00:04:18 like turn it into a huge data set and then use large language model technology to create these tools. And at first the AI companies were very skittish about saying that they had trained on copyrighted work at all. AI should be allowed to read the internet and learn. Shouldn't be regurgitating. Shouldn't be, you know, violating any copyright laws, but on
Starting point is 00:04:41 individuals' private work, yeah, we try not to train on that stuff. We really don't want to be here upsetting people. But as these cases started going to court and as they entered discovery and as it became clear that every major AI company was training on copyrighted work, their argument went from being, "Well, we can't say what we trained on because it's proprietary," to "Of course we trained on copyrighted work, we had to, and it's legal." And it's legal because our use of it is transformative and therefore is protected by the fair use tenet of copyright law. Section 107 of the Copyright Act reads,
Starting point is 00:05:26 transformative uses are more likely to be considered fair. Transformative uses are those that add something new with a further purpose or different character and do not substitute for the original use of the work. That's what they argued and that's what the judge ultimately decided. What he decided in this case was the scraping of these three authors' books was considered fair use under copyright law. But there is a huge caveat here where he decided that the way that Anthropic went about acquiring the books in the first place was piracy. Okay, so the judge essentially hands down a split decision saying that, yes, this is
Starting point is 00:06:13 fair use to use these authors' work this way, but also it wasn't totally fair how you got this stuff because it was pirated. So I don't know, what does that mean? Does everyone go home unhappy or was this like a huge win for Anthropic? Doesn't feel like a huge win for the authors. Yeah, I mean, I don't think it's a huge win for anyone yet. And I think that the people who are saying this is a slam dunk for Anthropic, which many people in the AI world are saying,
Starting point is 00:06:45 I think they're wrong. And the reason that I think they're wrong is because the judge determined essentially that it was not copyright infringement to train Claude on copyrighted material that was legally obtained. But then they also downloaded books from this website called LibGen, which is a piracy website that has millions of books on it. And then also from a website called Pirate Library Mirror, which is another piracy site that has millions of books on it. And the judge said that obtaining the books in this way
Starting point is 00:07:24 was pretty much like cut and dry copyright infringement. And I think the really important thing to note is that every major AI company has trained on copyrighted works that they obtained in a similar fashion. We have done reporting at 404 Media where entire YouTube channels were scraped, you know, Netflix, like the entirety of Netflix was scraped. And so the specifics about how these companies obtained these works is potentially going to be really important. And a lot of that scraping has already been done. A lot of that piracy has already been done.
Starting point is 00:08:03 These companies are literally some of the richest companies on earth, are affiliated with some of the richest people on earth. Did they really just steal all these books? Could they not have just gone to Amazon and bought like some books or is that just too much work for them? Well, so the super interesting thing about this lawsuit and something that like really like, I was like, holy shit, like, how did they do this? Why did this happen? Is in the beginning, Anthropic pirated all
Starting point is 00:08:31 these books. They downloaded huge amounts of torrents, they scraped these piracy websites, and they did that specifically because they didn't want to slow down. Like there's an email that is part of this lawsuit where the CEO, Dario Amodei, says, you know, we don't want to get into, he calls it a "legal slash practice slash business slog." And so they were basically like, let's do all of this. Let's pirate all the books. Let's put it into our model, and then let's go buy copies of a lot of other books. And so what Anthropic did was they had a whole team of people that was dedicated to buying used books from used bookstores that were going out of business, from eBay, from these
Starting point is 00:09:17 online marketplaces, and they bought a huge, huge number of books, like physical books. They tore the covers off of them and they had this like giant scanning operation where they would scan the books and then create a digital copy of the books and then fed that into their model. And the judge said that all of those books that were bought from used bookstores, no problem.
Starting point is 00:09:41 And I think that goes to show that these AI companies are grabbing data from wherever they can find it. It's a huge arms race to see who can get the most data from the most number of places. So they're doing the low-hanging fruit, which is downloading massive data sets. Yeah. But then they're scouring the planet, looking for bookstores that are going
Starting point is 00:10:06 out of business. Like I've heard of AI companies looking for like huge physical archives of like VHS movies and things like that and then digitizing those. And so really they're just trying to find data wherever they can. And it seems like when they're able to get it legally by purchasing a copy, they're willing to do so but they're also willing to take it for free when they can. Did we learn anything from this lawsuit that might implicate those other ones? Yeah. I mean, I think that the piracy aspect of this is really important. And we've seen in the past, like, if you are a 13-year-old kid
Starting point is 00:10:47 who's pirating Metallica songs on Napster, like, you can be liable for hundreds of thousands of dollars worth of damage. Rock to Never Never Land Lars will find you. For just, like, a few songs. And, like, in this case, you have seven million books. And so, um... Like, it will be very interesting to see whether a judge
Starting point is 00:11:07 levies like a huge financial penalty here or whether it's more of a slap on the wrist. And I tend to think it will probably be more of a slap on the wrist because all of Silicon Valley, all of America's largest companies sort of have a huge amount of investment riding on the widespread adoption of AI. And AI is now a huge part of the American economy. It's become part of like geopolitics as well, where you have the Trump administration and really the Biden administration was saying the same thing, saying that the United States can't fall behind China in the quest to innovate in AI and to have like widespread AI adoption. I'll be very curious to see whether there are like actual like
Starting point is 00:11:57 serious punishments for these companies that have scraped all this data or whether they, you know, wiggle out of it with a slap on the wrist or get out of it with a series of settlements or what have you. But I tend to think that there's probably no stopping this industry from a legal perspective. I think that it feels too big to fail to me at this point. 404media.co is where you can find and support Jason Koebler's work instead of, you know, just stealing it. AI companies aren't just stealing everyone's intellectual property. They're also kind of killing the internet as we know it
Starting point is 00:12:45 right before our eyes. We're gonna talk about that when we're back on Today Explained. ["Today Explained Theme"] Hey, this is Peter Kafka, the host of Channels, a show about media and tech and what happens when they collide. And this may be hard to remember, but not very long ago, magazines were a really big
Starting point is 00:13:11 deal. And the most important magazines were owned by Condé Nast, the glitzy publishing empire that's the focus of a new book by New York Times reporter Michael Grynbaum. The way Condé Nast elevated its editors, the way they paid for their mortgages so they could live in beautiful homes, there was a logic to it, which was that Condé Nast itself became seen as this kind of enchanted land. You can hear the rest of our chat on Channels wherever you listen to your favorite media podcast. Today Explained is back with John Herrman now.
Starting point is 00:13:49 He's a tech columnist at New York Magazine. John, in the first half of the show we were talking about how this Anthropic case and judgment may or may not change the extent to which these big AI models can scrape the internet. But I want to talk to you about how all this scraping has already in some ways broken the internet as we know it and how we use it. You wrote about how AI has broken maybe like, you know, the front page of the internet for a lot of people.
Starting point is 00:14:21 Google.com. Tell us how. Google could not be closer to the center of this recent AI boom. On one hand, they are a company that has really deep roots in that space. They published the foundational research for what then became generative AI as we know it.
Starting point is 00:14:38 They've put it in all their products. If you use any Google thing, you are seeing chatbots everywhere. Take notes with Gemini. Summarize this file. Summarize a folder. Refine this document. Find inspiration easily. Fresh ideas.
Starting point is 00:14:50 Elevate your writing. Get clear, constructive. Improve sentence flow, word choice. They are all in on AI. Google Search in particular has AI overviews at the top. There's a new AI search mode that works like a chatbot instead of a search engine. Google making a rare change to its homepage, the most visited website in the world, pushing its AI mode tool directly into the hands of its billions of users.
Starting point is 00:15:14 With this latest move, it is changing what billions of people see when they open their browser, still the on-ramp for the entire internet. Meet AI mode. Ask detailed questions for better responses. AI on Google search can provide information. While that was all happening, AI was also sort of accelerating this feeling of decline in the Google product, which over the years through this back-and-forth battle between the company and search engine optimizers and companies trying to get an edge on Google and this sort of long running dynamic had become a little spammy, a little overloaded
Starting point is 00:15:50 with ads. Have you noticed that Google sucks lately? I'm talking about their search. It sucks. Why is it so hard to find anything on Google search? Google search is terrible. It's bought and it sold five or six links up top all paid for. It's just garbage, pure unadulterated garbage.
Starting point is 00:16:05 But I think a lot of people would agree that using Google in say 2023 was a kind of a degraded experience compared to 10 years prior. It was kind of cluttered. There was more just junk in it. There were more ads all over the interface, but also the stuff you were getting in search was a lot of low quality, cheaply made aggregated content, stuff that was taken from somewhere else in an effort to sell a product or just serve up some ads. The arrival of generative AI tools, which enable the creation of basically infinite passable content almost for free really accelerated that issue. So on one side, you have the big ecosystem that Google guides people to that is in a sort of collapse because of this massive shock of new AI-generated content.
Starting point is 00:17:00 On the other side, you have Google, the product, becoming more and more AI-centric. And in the middle, you have kind of a complicated story. And honestly, for search users and regular people, kind of a strange experience. Do they have a plan to make money off of this? Obviously, they want to make money. Has anyone asked what their long-term plan is? So there are obvious risks to throwing away this like cluttered but lucrative product and replacing it with a totally clean chatbot or whatever. That's not what they're doing. They are incorporating AI answers into the main search page, which they say people like quite a bit.
Starting point is 00:17:38 So this last quarter has been really good for them. It also arrived in the context of lots of really strong data suggesting that the way people use Google Search now with these AI tools means that they don't really leave it anymore. They don't really click out and go to anything. An AI overview might summarize three articles, archival resource, some expert opinions, but the number of people that actually then click
Starting point is 00:18:07 through to those opinions or to those articles is minuscule. So Google's relationship to the web around it is pretty dramatically different. If Google's eating up the rest of the internet, if Gemini is eating up the rest of the internet right now. And companies like ours, let's say, are no longer meeting their traffic goals, are no longer getting any traffic from Google at all. Does Gemini have nothing to eat? You know what I mean? Because everything dies? Who's going to be feeding Gemini all the right answers in like 10 years?
Starting point is 00:18:45 We're sort of like glorifying the web a bit in this conversation. No matter how great and incredible it is as this big resource, it really doesn't go that deep. And the idea that it is now being sort of like trawled and overfished and just sort of consumed like a resource by these AI companies really does, I think, raise the specter of collapse. I do think that they could find that their products are being made worse by this dynamic and by their relationship with the web. I do think that's a real problem. And you can see this in some of the deals that these companies make with publishers, including our parent company,
Starting point is 00:19:24 which has a deal with OpenAI, for example. Remind people out there, or me, why companies like ours make deals with companies like ChatGPT. The context is every media company is struggling for visitors. Even before the Google traffic really started to collapse it was sort of unstable. And so in addition to like a weak advertising market, every media company is looking for any sort of additional source of revenue. And if you're a media executive, OpenAI showing up and saying, here is this many millions
Starting point is 00:20:01 of dollars for this many years. It looks like free money. Of course, if you're like producing the content or if you're even just thinking longer term about how a media company or a website fits into this AI picture, you recognize that you're sort of, you know, giving access away to something that these companies are explicitly trying to automate.
Starting point is 00:20:25 You know, you're sort of like, in an institutional sense, training your replacement. You're listening to AI Explains today. But it is a deal made not quite under duress, but something close to that. For people who miss that old version of the internet, who miss going to Google, typing in a query, getting a bunch of results, clicking on a few of them, getting answers that felt credible, where do they go for that experience now? I think there's like a funny polarized answer to this.
Starting point is 00:20:59 I just did a story on Reddit, which is having a huge moment right now. It's been around for 20 years. It's growing hugely. Part of it is just a response to social media fatigue, the sense that other communities on the web don't really exist anymore, that everything else on the web is too commercial and whatever. Also, a huge part of that growth is just traffic from Google. They're having the fastest growth
Starting point is 00:21:25 they've had in almost their entire existence because Google is just shoveling so many people into Reddit because everything else is not really working. So you have that. You have a community of communities. You have something that feels kind of like it's of the old web. It seems like eventually we're going
Starting point is 00:21:43 to get to the point where it's like you either want to talk to one of these large language models, or you just go back to calling up your friend. Ah, I don't even know where it gets. You just walk into the street and yell, does anyone know of a good barber? Yeah, I mean, it's like a real mutual suspicion about who's using AI is really pervasive, especially online, but
Starting point is 00:22:05 also in person. But yeah, I do think that the way that the AI training paradigm and some of the stuff that you were talking about with Anthropic, but also just the way that Google incorporates all this stuff, it really does kind of break the deal with the whole idea of the public web. Like, all right, we'll all just do this stuff in public. We'll talk to each other. People will build all these businesses around this to sort of connect everything and it'll
Starting point is 00:22:34 all sort of work together and whatever. When you have like these massive sort of predatory companies just consuming all of that, harvesting all of that and saying, all right, we are no longer part of this arrangement, we are doing something else. More people are on Discord, more people are in group chats, more people are either just purely consuming on social networks and not posting or just talking privately with their friends. And I do think that this fits quite well with that trend and probably accelerates it. John Herman, you can read and subscribe
Starting point is 00:23:12 to New York Magazine at nymag.com. Gabrielle Berbey produced, Amina Al-Sadi edited, Rebeca Ibarra fact-checked, Patrick Boyd and Andrea Kristinsdottir mixed. And by the way, Vox's Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic, and none of them have any editorial input into the stuff we make here at Vox. Speaking of stuff, we hope you enjoyed the 1,700th episode.
Starting point is 00:23:46 If you did, you can say something nice about us most anywhere you listen. And if you didn't, well, there's always episode 1701 tomorrow. I'm Tefi. Maybe you've seen me on TikTok or TV or interviewing celebrities on the red carpet. But before all that, I was just another girl running late to her desk job, transferring calls, ordering printer ink. I don't miss that. But I do miss not working at work, gossiping with my co-workers about celebrities. What's the latest with Bieber? Where's Britney? And which Jonas brother is which?
Starting point is 00:24:48 That's what I want my new podcast to feel like. Like you and I are work besties. We'll chat about celebrities we're obsessed with. How could you be registered to vote and not know who Jennifer Aniston is? Look up their star charts. Sagittarius and the Capricorn, they do clash and have so much fun avoiding real work together. I'm having a silly goose of a time.
Starting point is 00:25:11 Tefi runs, Tefi laughs, Tefi overshares. Tefi explains, but most of all, Tefi talks. From me, The Cut, and Vox Media Podcasts, this is Tefi Talks. Let's go.
