The Journal. - How a $1.5 Billion Settlement Could Alter the Course of AI

Starting point is 00:00:00 The boom in AI over the last few years has also led to a boom in lawsuits. As artificial intelligent chat bots are growing more and more popular, a number of authors are now suing the companies behind them. Two major studios have sued an AI startup claiming it as, quote, blatantly copied famous movie characters. Sarah Silverman is going into battle against the chatbot's George R.R. Martin, and more than a dozen other authors, now suing. The New York Times becoming the first major media company to sue over AI. So we've got some movie studios suing.

Starting point is 00:00:44 We've got book publishers and authors. We've got newspaper publishers, including two subsidiaries of News Corp, our parent company. There's a lot of litigation right now over how AI companies are kind of coming into the material that they're using to train their large language models and then also what they're doing with that material once it is in their system. Companies like OpenAI have argued that training AI models amounts to fair use. Last year, a group of authors filed one of those lawsuits against the AI company Anthropic. They alleged that Anthropic, the AI startup,

Starting point is 00:01:30 infringed their copyrights for a number of books because of the way Anthropic uploaded and then used the material for training and other purposes. Last week, there was a big development in the case. Artificial Intelligence Company Anthropic has agreed to pay $1.5 billion to settle a class action lawsuit. It's big. If it's approved, it will still help influence how AI companies think about taking content, how publishers or other media companies might want to try to strike

Starting point is 00:02:07 deals with AI companies, and it starts to put a bit of a price tag on what that material is worth to the AI company. Welcome to The Journal, our show about money, business, and power. I'm Ryan Knutzen. It's Wednesday, September 10th. Coming up on the show, Anthropics proposed settlement and what it means for the future of AI. This episode is brought to you by Nespresso Professional. Designed to create meaningful moments of connection at work,

Starting point is 00:02:55 Nespresso Professional delivers exceptional coffee every time. Variety at your fingertips, you are sure to satisfy every taste. Discover the right solution for your workplace today. Unforgettable coffee, meaningful connections. Nespresso professional. Visit nespresso.com to learn more. All right. Well, so first of all, can you introduce yourself?

Starting point is 00:03:25 My name is Melissa Korn, and I'm an editor on the tech and media team. And are you a human or a robot? I am a human. Are you sure? Fairly certain. I can tell you look like a human. It's natural intelligence, not artificial. Okay, so we'll now introduce us to Anthropic, which makes the Claude Chatbot, whose intelligence is artificial.

Starting point is 00:03:47 Anthropic is a very fast-growing AI company. I mean, its valuation has tripled in just the last six or seven months. It is now valued at $183 billion. It is one of the kind of white-hot AI companies. There's quite a few of them these days. But it is really one of the giants in this space. And how does Anthropic train Claude? So Claude, we're acting as if Claude is a guy sitting in the room with us, right?

Starting point is 00:04:21 So Anthropic and all the other AI companies feed tons and tons of content into these computers. to train them, to make them learn, to make Claude learn how to put a sentence together, what's happened in history, how to do math, all of that. It's sort of like the underlying principle, right, is just sort of the same way human learns. Would you just expose it to enough material and then it'll identify the patterns and then it can start kind of, I guess, arguably, thinking for itself? Right.

Starting point is 00:04:52 And it's not quite thinking for itself yet, but it is learning how to assess information that it's given. So some of these computer models, unlike you and me, it ingests everything really, really fast, right? It takes me a few seconds to read a sentence. Cloud can read an entire book extremely quickly, so it can ingest millions of books very quickly and then get to training and get to learning. Are you saying you haven't read millions of books? I mean, I have a pretty good library at home, but I don't think it's quite millions now. And I certainly haven't done it in a matter of, you know, minutes or hours. Oh, God, that would be so cool, wouldn't it?

Starting point is 00:05:29 Just to be, like, to read every book that it came out? I don't know. I kind of enjoy the process. I relish the experience. I don't want it to go too fast. Okay, human. Last year, three humans. The author's Andrea Barts, Charles Graber, and Kirk Wallace Johnson filed a lawsuit

Starting point is 00:05:48 against Anthropic, arguing the company improperly used their copyrighted works to train his AI models. They said that they think this should be a class action lawsuit because they are representing a number of other authors. So they filed the suit last summer. Things started to move through the court system. Obviously, that's a slow, laborious process. It's not millions of books at second speed. It's a judicial process speed.

Starting point is 00:06:13 Correct, correct. So according to the lawsuit, how did Anthropic allegedly acquire the books that it used to train its models? So there were a few different ways that Anthropic got the books. The lawsuit says that Anthropic stole a lot. So they use these kind of shadow libraries, places like LibGen and there's a few others. Almost if you remember Napster back in our teen days. Limewire. Lime wire, exactly, right? Like it just people had uploaded these things and they downloaded them. That simple. They also bought books. and had some other access to books that are, you know, out of copyright. So they got the books through a few different means, and in some cases got the same book a few different ways. But they did not go out and purchase a copy of every single book

Starting point is 00:07:09 that they used to train Claude. And that's according to the lawsuit. So in the court case, what did Anthropic argue, and what did the authors argue? So Anthropic asked for this summary judgment on the fair use argument. Meaning they wanted the case dismissed. Exactly.

Starting point is 00:07:26 And just help me understand what is the fair use argument they're making here? If you wrote a book and I copied your book word for word and published it as my own, that would be problematic.

Starting point is 00:07:37 If you wrote a book and I wrote a book report about it, an analysis of it, or I used it to inspire another book that maybe has some similar characters but otherwise quite different. Or I wrote a song

Starting point is 00:07:51 about your book. That's fair use because I'm not just copying what you already did and have protected under the law. You're reading my book and then you are creating some new work out of it as transforming it into something else. Exactly. So that word transformative is important here. That's generally the bar. That it needs to be transformative or it needs to be, have been accessed in a fair way that there is, you know, compensation for the artist. And so Anthropic is arguing, yeah, we're reading the book, but we're also reading millions of books. It's just making our AI really smart. We're not just, like, stealing the book and then distributing it to other people to read.

Starting point is 00:08:27 Exactly. The authors, meanwhile, argue that Anthropics stole their books and that they deserve to be compensated. And earlier this summer, the judge reached an important decision. So in June, the judge issued summary judgment for Anthropic on the fair use argument, saying it's okay for AI companies or for Anthropics specifically here to use this content in this way, it's fair use, that part of the argument, the authors are not going to win.

Starting point is 00:09:00 And that seems like kind of a big deal here because if a judge does issue a summary judgment, that is a court precedent where a judge is saying, yes, reading all these books, using it to train your model, that's an okay thing to do. That is fair use. Right, exactly. So this was seen as a big win for Anthropic, for AI companies in general. But came with a bit of an asterisk with the judge saying, yes, the way you've used this stuff is fair use, but the way you got some of it is not okay. And that asterisk is a very expensive asterisk.

Starting point is 00:09:38 That's next. Wendy's most important deal of the day has a fresh lineup. Pick any two breakfast items for $4. New four-piece French toast sticks, bacon or sausage wrap, biscuit or English muffin sandwiches, small hot coffee, and more. Limited time only at participating Wendy's taxes extra. The judge's ruling came with some good news for Anthropic. He said the company's fair use claim was valid. It could use copyrighted books to train its AI models.

Starting point is 00:10:17 But the judge also gave Anthropics. some bad news. What he did say was Anthropic had no entitlement to use pirated copies for its library. So how they got the books became the thing at issue. In other words, it's okay to use the books,

Starting point is 00:10:34 it's not okay to steal the books. Exactly. Violating copyright law can carry a penalty of anywhere between $750 and $150,000 per book. And how many books are we talking here? The initial number was that there were 7 million works that were reused by Anthropic. The numbers, the dollar signs could start to add up very, very quickly if this class action lawsuit went forward and a jury found in favor of this class of authors against Anthropic.

Starting point is 00:11:07 It could be existential. This brings us to the proposed settlement that was announced last week. So the headline number of the proposed settlement is one. $1.5 billion, which is how much Anthropic would pay to the class to compensate them for the stolen materials, for the pirated materials. As part of the settlement, the number of impacted books got whittled down from over $7 million to around $500,000. And you get to that $500,000 from that initial $7 million because some works were ultimately purchased. There's some duplication, a lot of duplication in the initial list of what was downloaded. Some books were not actually registered

Starting point is 00:11:53 with the Copyright Office, the way one would assume they are right away. So we get to about roughly 500,000. We do not know what books, what works are included in that list yet. We do not know exactly how the payouts will come, right? Authors, publishers, who gets what portion of what money. How much money do we think will go to each author? So it's roughly $3,000 per work. But again, we don't know if that's to the author, a portion to the author, a portion to a publisher, how that gets worked out. And one and a half billion dollars, is that a lot of money to Anthropic? Anthropics valuation is $183 billion. So it's, you know, not going to bankrupt the company, which is, you know, part of why they wanted to settle.

Starting point is 00:12:38 But it's not nothing, but it is not going to debilitate them. While the authors and anthropic have agreed to the terms of the settlement, there's still one more hurdle. The judge has to approve it, and he's expressed some skepticism. So over the weekend, the judge issued an order saying, we're going to have a hearing to discuss the proposed settlement, but it was very, like, I'm not angry, I'm disappointed vibes from the letter. I'm not mad at you. I'm mad at your behavior. Pretty much. And it used the word disappointed, right?

Starting point is 00:13:16 The judge said, like, there's still a lot of unanswered questions here. And I am not totally convinced that you have enough of it worked out yet for me to approve the settlement. I see. So not necessarily it should be more money or less money, just more like there's not really enough detail here yet. Right. So he says, we don't know what the list of works is yet. We don't know what the list of authors is yet. How do you make sure everyone knows that they're eligible?

Starting point is 00:13:43 for a potential payout. How are you going to do the payouts? So they held a hearing earlier this week in San Francisco in court, and the judge was pretty harsh in laying out, especially to the plaintiff's lawyers. I don't think you have it all there yet. It's not fully baked. And he also said that he was concerned that there would be some more lawsuits,

Starting point is 00:14:11 like people coming out of the woodwork, after this, right, saying, wait, I wasn't included. I should have been included. I wasn't notified fairly. I should have been notified. The goal of a settlement is to end litigation, not launch a whole bunch of new stuff. He set a schedule of deadlines over the next few weeks. You know, you have to have a list of the works that are involved by X state. You have to have the class list by X date. They're going to convene again in a few weeks in early October, and he will presumably then approve or reject the proposed settlement. So what might this mean for all those other cases that are playing out in court right now

Starting point is 00:14:50 between creators and AI companies? It will have an impact. Again, it's not legal precedent, so it won't immediately dismiss everyone else's cases or anything like that. But the other cases are still moving forward in a lot of instances, including there was a new case filed last week, right? The day that Anthropic and the authors announced their proposed settlement, there was another lawsuit filed by some authors against Apple

Starting point is 00:15:17 about how they ingested books for their training models for AI. So there's still a lot of litigation action coming down the pike and not just in books, right? We're talking movies, we're talking songs, we're talking all sorts of creative content. It feels like as a society and as a legal system, We're feeling our way through the dark to try to figure out and find an answer to this really big question. Yes, and I think we're starting to get a few answers, right, that the AI companies can use this content.

Starting point is 00:15:53 It is fair use, but there's still a lot of unknowns about what a compensation model should look like or could look like and how that's going to differ by medium, how that's going to differ by AI company, and, And so you're seeing this in litigation, you're also seeing this in discussions between AI companies and media companies about what can they do on the front end so that they don't end up in litigation on the back end. Has anybody asked Claude, Anthropics AI machine? What does it think the answer should be for how authors and creators should be compensated? I have not, but I have a feeling I know what you're going to be doing later this morning. Certainly. Hey, Claude, uh, mind if I ask you a question? Of course, go ahead and ask me anything. How much do you think authors should be compensated for having their work used to train AI models?

Starting point is 00:16:52 That's a really complex question with valid arguments on multiple sides. Authors deserve fair compensation when their creative work contributes value to AI systems. But the mechanics of how that would work are challenging to figure out. What's your take on it? I'm asking you. I think authors should receive some form of compensation, especially when their work directly contributes to commercial AI systems. The challenge is creating a fair system that recognizes their contribution

Starting point is 00:17:22 without making AI development prohibitively expensive. Okay. Thanks, Claude. You're welcome. Before we go, I want to make sure you heard that the journal is hosting our first ever live show in New York City next month. We'll be at the Gramercy Theater on Tuesday, October 7th. There are literally only a handful of tickets left. We are almost sold out.

Starting point is 00:17:45 So if you want to be there, head to bit.ly slash the journal Live 25 for tickets and more information. You can find the link in our show notes. We'd love to see you there. That's all for today. Wednesday, September 10th. A quick note, the Wall Street Journal's parent company, News Corp, has a content. deal with Open AI. Two of News Corp's subsidiary

Starting point is 00:18:10 have also sued perplexity. The Journal is a co-production of Spotify and the Wall Street Journal. Additional reporting in this episode by Jeffrey Tractonberg and Angel Al Young. Thanks for listening. See you tomorrow.

The Journal. - How a $1.5 Billion Settlement Could Alter the Course of AI

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.