The Daily - The Writers’ Revolt Against A.I. Companies

Episode Date: July 18, 2023

To refine their popular technology, new artificial intelligence platforms like ChatGPT are gobbling up the work of authors, poets, comedians and actors, without their consent. Sheera Frenkel, a technology correspondent for The Times, explains why a rebellion is brewing. Guest: Sheera Frenkel, a technology correspondent for The New York Times. Background reading: Fed up with A.I. companies consuming online content without consent, fan fiction writers, actors, social media companies and news organizations are among those rebelling. The comedian and actress Sarah Silverman has joined two lawsuits accusing the companies of training A.I. models using her writing without permission. For more information on today’s episode, visit nytimes.com/thedaily. Transcripts of each episode will be made available by the next workday.

Transcript
Starting point is 00:00:01 From The New York Times, I'm Michael Barbaro. This is The Daily. Today, to refine their popular technology, new artificial intelligence platforms like ChatGPT are gobbling up the work of authors, poets, comedians, and actors without their consent. As my colleague Sheera Frenkel found, a rebellion is brewing. It's Tuesday, July 18th. Sheera, it is really nice to have you back. It has been far too long.
Starting point is 00:00:46 I agree. It's great to be back here. We are turning to you in our ongoing and very diligent efforts to understand this new era in artificial intelligence and the debate that is raging over sites like ChatGPT, which have put artificial intelligence really at everyone's fingertips. And correct me if I'm wrong, but it really feels like this is shaping up as a clash between those who are really excited about the capabilities of sites like ChatGPT. You know, what can it do next? This is so interesting. And this huge group of people who are like just freaked out about it, right? And it's going too far. It's too scary. And we've done a lot of episodes about this, right? A recent one looked at students who love ChatGPT
Starting point is 00:01:28 because it can do their homework for them. And their teachers and professors were like, wait a minute, you're basically cheating. And you have been reporting on the latest chapter of this clash between human and machine. So tell us about that. Well, what I've been focused on is really just all the battles
Starting point is 00:01:44 over what goes into these AI machines. And what makes them powerful, what makes them able to sort of imitate human voice is all of the content that we've put online over all these years. It's the poems and the blogs and the photographs and the illustrations that are then copied and scraped and fed into these AI machines. It's what teaches them to imitate human behavior. And in the past six months, as this software has become really powerful and very popular, more and more people have started asking questions about whether they want their content fed into AI machines. And if they don't want it there, if there's, you know, really anything they can do about it.
Starting point is 00:02:30 So tell us who exactly these people are who are asking these questions. They kind of fall into two groups. There are the people who are doing this because it's their livelihood. They publish books or articles. They have a commercial interest in the protection of their work. And then there's the other group, the hobbyists. They're the people who are writing stories for the fun of it, that are just creating art because they're passionate about something. And they're putting stuff out into the ether of the internet. They love it. They want to share it with the world. This is just a true sort of moment of human creativity.
Starting point is 00:02:59 But, you know, both of these groups are kind of seeing ChatGPT. They're seeing these AI companies valued at hundreds of millions of dollars. And they're realizing that their creativity is making someone else a lot of money. And they're feeling exploited. They're feeling like their creativity, their moment of inspiration, is being used. So let's start with the first group of creatives, the professionals who actually make their money from this kind of work. Right. These are people like actors, animators, writers, people who make their livelihood by uploading what they're doing to the Internet. And so they've been really alarmed when ChatGPT comes along and can produce art in their style or can write a paragraph in their style.
Starting point is 00:03:45 So I go to the hotel, super fancy. You know, take for instance Sarah Silverman. And I go to check in. Oh, and the lady at the front desk recognized me and she was like, oh my God, I love you. You are in my top four all-time favorite comedians. She is a comedian, an actor, a writer who has honed an incredibly distinctive style over decades of working. And I was like, you know that I know that means I'm fourth, right? I'm not walking away from this like, ooh, maybe I'm second, you know?
Starting point is 00:04:18 And, you know, if you go to ChatGPT and say, tell me a joke in the style of Sarah Silverman, which is something I tried out, it really gets her spot on. And she argues that not only has it read her jokes and read her, you know, other sort of online comedy, but that it's even read this book that she wrote, The Bedwetter, which has been uploaded online and exists in online versions. She appears pretty convinced that it's essentially learned who she is and what her comedic style is. It can mimic her to the degree that you think it is her writing. And I'm guessing ChatGPT likely did that without asking her permission. Right. There are no permissions being asked here because, essentially, imagine these giant machines that are crawling the internet at all times. And any data they come across, they collect, they scrape it. They don't know what they're doing. They're converting it into numbers. It's basically become math in these systems. And so anything online is constantly being hoovered up
Starting point is 00:05:19 by these machines and fed into AI systems. And so for a creative professional like Sarah Silverman, that feels incredibly threatening in terms of their livelihood, their ability to make a living off of being an artist with an independent voice that's been honed over decades. If a machine can do that, why would you need to pay Sarah Silverman to come and write your script or pen a comedy special? But how can you be so sure that ChatGPT is really stealing from you if you're someone like Sarah Silverman,
Starting point is 00:05:51 or if you're anybody, really, whose work has been uploaded by ChatGPT? Because my guess is that this gets a little bit sticky, right? Like, how much of it is just guesswork, and how much of it is really being derived from essentially borrowed or stolen material? Definitely. I mean, well, to begin with, there are these repositories where you can go and check and see if your work has been scraped. A lot of artists do this. A lot of writers do it. And they'll see there that, you know, your name will appear, your piece of art will appear. And so for someone like Sarah Silverman, she can go and see, okay, right, my book has been scraped. But, you know, more specifically, it's really just playing with this and testing it out for yourself and toying with it.
Starting point is 00:06:30 You know, I was curious as I was reporting the story. So I went online and I typed in my name and I said, can you write a paragraph about the danger of online extremism in the voice of Sheera Frenkel? Which is the subject you cover, of course, over and over and over again for The Times. Yes, exactly. And I've written books. I've been writing about it for over a decade. It's out there. My material is out there. And when I tell you that even after knowing about AI for a year, even after covering this topic for a year, I was creeped out at how close it got my voice. I could have easily written this sentence. Okay, explain that. I mean, I'm not on ChatGPT right now, so I can't replicate the search, but what happened when you said, you know,
Starting point is 00:07:10 write a paragraph about the subject I know best, Sheera Frenkel, and it spit out the results? What about it felt so distinctively Sheera Frenkel-esque? So I will read it to you. So it writes this one opening sentence, which is, "Online platforms, once hailed as bastions of free expression, have become breeding grounds for hate, radicalization, and the propagation of dangerous ideologies." That expression, "bastions of free expression," I've used that. I googled it. I used that in an article less than a year ago, and again in an article three years ago. I didn't even realize that that was a phrase or a turn of phrase that I often used until ChatGPT repeated it back to me. Right. It's kind of a distinctive set of words you're saying that you turned to and it just borrowed from you in replicating your work.
Starting point is 00:07:54 Well, it knew my brain better than I did. I didn't realize that that was phrasing I frequently used. And I had to go into the New York Times archives to figure out, oh, yeah, it's right. I do use that. And oh, my God, I probably use it too often because this machine has learned it about me. Right. And suffice it to say, you were not, like Sarah Silverman, consulted about your work being scraped by ChatGPT. Nope.
Starting point is 00:08:16 We were never consulted. And no one at the New York Times was consulted. Got it. So Sarah Silverman did not like that experience. You described it as a little bit eerie. I'm wondering if it's a touch flattering to have ChatGPT borrow your stuff, or if you worry about the long-term economic consequences of it all, which is to say that someday ChatGPT might be able to replicate your journalism so brilliantly that maybe the Times doesn't need you on the beat anymore. Yeah. You know, I've actually spent so much time thinking about this. And there's a part of me that was thinking, oh, God, it'd kind of be nice when I was done with reporting to plug in my notes to a machine and have it. There are days where it would be nice to have a machine write my article for me. But no, no, I thought about it more. And then I was
Starting point is 00:09:00 like, yeah, it can imitate what I've already done. But the whole point of news is that what we're bringing you is fresh and based on new reporting. So the conclusions we're drawing for readers are constantly changing. And AI can't do that. It can only repeat and regurgitate what's already been given to it, what's already in the system. And so whatever answer it gives you might be what Sheera Frenkel thought about something two years ago or five years ago. But it won't be what the newest idea is or what the freshest reporting has brought readers. Right. It might just keep telling the world that you think something's a bastion of free expression when you think it's a bastion of not free expression.
Starting point is 00:09:34 Exactly. If it's become a bastion of hate speech and extremism. Okay. So what can creatives do about essentially this theft, right? What can the Sheera Frenkels and the Sarah Silvermans of the world do? And what are they doing about this problem? So the creatives with the copyright and the resources to do it can file lawsuits. And that's exactly what we're seeing happening. There have been nearly a dozen lawsuits filed against AI companies by everybody from book publishers to individuals who have copyright protections. And one of them was Sarah Silverman, who got together with another two authors to sue several AI companies, including Meta, which is the parent company of Facebook, and OpenAI, the company behind ChatGPT, to say that their work was illegally
Starting point is 00:10:26 scraped and downloaded and uploaded into these AI systems. And Sheera, what do legal experts think are the chances that this kind of a lawsuit from a Sarah Silverman will prevail against a company like ChatGPT? Well, you know, this is all brand new. It's brand new case law, but they know they have to establish some kind of law or precedent going forward because this is material with a copyright. For a lot of artists, books they wrote 10 years ago or 15 years ago, that's going to continue to make them money throughout their lives. And yes, they're evolving constantly as artists. They want to think about their material going forward. They don't want ChatGPT to write their
Starting point is 00:11:03 jokes for them going forward. But they also want to be paid for the books that have already been published and are already out there. And so is the feeling that old school copyright law will provide someone like Sarah Silverman with the legal protections that she's seeking? Well, the legal experts seem incredibly hopeful that there is some kind of copyright protection here, and that, you know, essentially some kind of financial damages will be awarded and that some kind of monetary value will be placed on these very strong copyright protections that creative professionals get. The problem is that even if they get that money, even if, you know, Meta or OpenAI are forced to pay some kind of damages to Sarah Silverman, the content, the data, it can never be retrieved. It's out there. It's become numbers and ones and zeros fed into a
Starting point is 00:11:51 machine. There is no way to go into it, into that machine and get that data back out. Once it's in, it's in forever. And so ultimately, even if they do get that financial reward through the court system and they get, you know, the copyright protection to their material affirmed by the courts, their data is gone. Their data is there forever. In other words, the horse is out of the stable. I assume that's true, Sheera, for us, for The Times, for the work of people like you.
Starting point is 00:12:17 We can't retrieve your journalism back from ChatGPT. So is The Times, like Sarah Silverman, thinking of suing these AI companies? So the New York Times, to the best of our knowledge, is not looking at a lawsuit. What we've seen the New York Times and other news publishers do is start to think about how to start charging for this data going forward. I mean, the New York Times is creating tons of content every single day that these machines want in order to stay up to date. And so they're really trying to figure out if there's some kind of financial arrangement that can be put into place where these AI companies pay us.
Starting point is 00:12:54 And it's not just news publishers. Websites like Reddit, they're looking at licensing their data as well. They're saying this data is inherently valuable and we want you to pay us for it. But look, that's going to take a long time for them to come to an agreement. This is a brand new model. This is a brand new technology. And in the meantime, the data is still being scraped. In the meantime, these systems have all the articles that have already been published. They probably are being updated on new articles as they come out. And so all this material is still being fed into these machines as these talks are ongoing. Right. And will be, it sounds like, for some time. So
Starting point is 00:13:33 you're saying these professionals who have the resources to fight back are not really mounting that forceful a pushback despite the anger that you are describing here. Right. I mean, they are using the courts. They are using the legal system. But as we know, that's slow. You know, waiting for the courts to take action, waiting for lawyers to hammer out a financial agreement between a massive news organization and an AI company is a slow process. And what's interesting to me is that this other group of people I mentioned,
Starting point is 00:14:06 the hobbyists, the enthusiasts, the people that are posting to the internet for the fun of it, they're the ones that are leading the most creative type of rebellion. It's almost like a revolt against AI. We'll be right back. So, Sheera, tell us more about this second group of creators who are fighting back against AI platforms. These hobbyists and amateurs, as you describe them, who aren't doing this work for a living. So, you know, actually a group I spent a lot of time interviewing and thinking about was fan fiction writers. And, you know, for anyone who's not familiar, because before I reported this story, I had never actually read any fanfic.
Starting point is 00:15:00 Fanfic. Now you're an authority. I'm now deep in the fandoms. These are people who watch a movie like Star Wars and love it, but walk away from it thinking, what would happen if the main characters at the end, Kylo Ren and Rey, didn't die? What if they fell in love and got married? And I'm really sorry if I'm having movie spoilers in my answers here. What would happen if in Buffy the Vampire Slayer, two of the vampires met in high school and had a gay romance? I mean, they take these popular movies and TV shows, and they let their imaginations roam. And they publish these incredible, I mean, book-length pieces of content about their favorite movies and TV shows and books.
Starting point is 00:15:42 And what's interesting and perhaps kind of ironic is that they're using other people's art to create this. They're borrowing from ideas and characters and story arcs that are already out there, but they're doing it for the love of the characters and the storylines that they want to explore in their own writing. And how did writers of fan fiction discover that their work was being sucked up by platforms like ChatGPT? So, you know, they actually discovered this in a really, really funny way, which is that in fan fiction, you create your own characters.
Starting point is 00:16:13 And some of them have names like Bucky, which is a combination of several characters from Buffy the Vampire Slayer. Right. And so they go to ChatGPT and they type in, write me a story about Bucky falling in love with a vampire. Write me a story about Bucky on a summer day eating a popsicle. And ChatGPT knew who that character was with great specificity. And the real giveaway, as far as the fan fiction writers were concerned, is that there's this sexual trope called the Omegaverse, which fan fiction writers really like to explore. And it's very, very specific to fan fiction.
Starting point is 00:16:50 And ChatGPT knew all about it. You're not going to tell us any more about it? I'm very confused about it myself, and I'm pretty new to fan fiction. From what I've read, there's like tentacles involved, and I don't think it's appropriate for the podcast. Sure, sure, sure. But the point is, there's no reason an AI machine would know about it unless it had read and ingested their fan fiction writing.
Starting point is 00:17:14 Right. Why would ChatGPT know anything about this unless it was scraping fan fiction? But Sheera, aren't these fan fiction writers posting their stuff out there in the world for free, right? Out in public for everyone to access, including, it turns out, these AI platforms. They are. I mean, they're not necessarily expecting payment, but they feel really, you know, wounded. They're affronted at the idea that these machines have scraped what is, for them, very much a labor of love. You know, some of the fan fiction writers I spoke to had spent decades doing this. They had done it as a form of therapy. They had done it as
Starting point is 00:17:50 an act of love towards the movies and television shows that they felt creatively inspired by. And they felt like these machines are essentially attacking the very spirit of human creativity that they had been prolific about online. Right. This is not necessarily their full-time job. This is not how they make money. But you're saying it taps into, in some ways, kind of who they are. It's part of their identity. And so, on a moral level, they can't tolerate ChatGPT just like stealing it. Right. Many of them have day jobs where they make money. But their love, their passion is going into this fan fiction. And they've established these really interesting
Starting point is 00:18:28 internal rules on these forums where they post where if one person imitates another or borrows from another without attribution, it's considered outrageous. They are booted from the community. They are excommunicated. They really have a lot of honor in how they operate online, and they want that to be respected by these machines. And so in lieu of that, they've had to get really, really creative about how they rebel or how they revolt against these AI systems. And how creative have they gotten? What are they doing? Right. So, I mean, for instance, one of the first sort of protests they launched is they got together and started feeding just absolute nonsense into ChatGPT. They figured if you're going to scrape our material, we're going to give you total irreverent nonsense to confuse you so you don't understand our characters, you don't understand our storyline, and you cannot mimic what we do.
Starting point is 00:19:20 So they can't sue, but they can gum up the works. They can basically just like shove sticks and stones into this machine and try to like grind out the gears. Exactly. I mean, they had to think creatively. And so they thought, if you want our material, here's our material. We're going to give you nonsense. And does that tactic work? I mean, it can if they do enough of it and they seem pretty committed and passionate about doing this.
Starting point is 00:19:44 But, you know, they haven't stopped there. A lot of them have also started making their content private or removing it from the web entirely. And so their thinking is, you know, until now they've shared all this freely. But if the machines are going to come in and scrape what they do, they're going to start locking it down. Okay. Both of these techniques, though, would seem to be pretty counterproductive if you're in the business of creating fan fiction. The first one just creates a bunch of gibberish, which no one wants to read if you like fan fiction. The second one would limit who can view the fan fiction you write. So aren't these rebellions against places like ChatGPT just hurting the fan fiction community? Aren't they just kind of shooting themselves in the foot? I mean, to a certain degree.
Starting point is 00:20:31 And a lot of them say they kind of feel like they are in the short term doing something that goes against the entire spirit of fan fiction, but they feel helpless. And so I think at this point, they're just kind of throwing darts at the wall to see what sticks and what works. A lot of them have been in these communities for a long time. And so their thinking is like, well, I might only get to share my writing with a couple dozen people instead of a couple thousand. But at least the machines won't get me. So it doesn't really feel like either of the two groups we have been talking about here, the professionals who do this for a living and hobbyists who are doing this for fun.
Starting point is 00:20:59 Neither of them seems to be much of a match for these AI platforms when it comes to stopping them from scraping their work. So that makes me think that if you're one of these writers, the better solution for you would be for the U.S. government to step in with some simple regulation that says, ChatGPT, for example, you can't upload this work unless you pay for it. And I know the government is having these conversations. There have been lots of congressional hearings about it, speeches by lawmakers. So is that a possibility? The government saying no uploading unless you pay the writers and the creators? Well, it's true that the U.S. government is having talks with all these AI companies. But we have to remember that we're still waiting for the U.S. government to take action and come up with some kind of regulation about social media that's been around for over a
Starting point is 00:21:49 decade. The U.S. government is very, very slow acting, and the vast majority of members of Congress are still wrapping their heads around how this technology even works. And so whatever they do is years away. And it's going to be tempered by the fact that they don't want to hold back these companies too much. Explain that. The U.S. government sees itself in an arms race at the moment against China when it comes to AI. Both China and the United States have a lot of scientists that are invested in this.
Starting point is 00:22:19 They have a lot of interest in being the world leaders in artificial intelligence. And so they know that every bit of regulation they put in place potentially holds back those U.S. companies, as opposed to China, where there's very little regulation on data and where there's a ton of data online that the Chinese government can easily access
Starting point is 00:22:37 and even give to Chinese AI companies if they want to speed ahead in what's considered the sort of AI arms race between the U.S. and China. So the U.S. government might have an interest in actually siding with the AI platforms over the creators because it makes us more competitive against our rivals. Yeah. I mean, they don't want to hamper U.S. AI companies to the point where they fall behind China. In which case, it feels like the only way for creative types and for publishing platforms
Starting point is 00:23:07 like The Times to fight back is to fight back really on their own for the next however many years. And I'm curious, Sheera, if in your reporting, you think the complaints, you know, of the Sarah Silvermans and these publishing platforms are actually going to make the general public sympathetic and lead to a larger scale pushback against these ChatGPT-like platforms, or if the reality is just that people like these things, they're excited about them, and that's going to override any of these worries that we're talking about. I mean, look, right now, the people that are angry are the people that can see that their work has been copied or scraped and regurgitated. It's people who are already seeing that the machines have ingested their work and
Starting point is 00:23:54 can copy their voices in a really realistic way. And we don't know what's going to happen going forward. I mean, companies like Google and Facebook are still deciding on how they're going to train their AI. And what happens if Facebook's AI decides to train on your data and can find posts that you wrote 10 years ago when you were in college and sound just like you? Or if Google decides to read your email and your Google Docs and it can say, hey, this is what Michael sounds like when he's planning a vacation with his family. You know, is the creepiness factor then that much greater, that all of us feel like our souls are being replicated by machines? Right. And does that mean that suddenly we are all the fan fiction writers? We are all Sarah Silverman.
Starting point is 00:24:38 We are all suddenly seeing these platforms slowly sucking a version of us out and in and giving it to the world in a way that is very weird. Right. Does it become a Black Mirror episode? It's unclear. But, you know, it could be that these AI systems are so extremely useful and beneficial for our lives that none of us care. Because at one point, people were really mad about Facebook sucking up their data and serving them ads and Google doing the same. And then we ultimately decided that they provided such a useful service that it was okay with us that they sucked up our data. And we're really just in the beginning of this technology. And so we don't know yet. You know, one of the fanfic writers I spoke to actually put it in a really lovely way
Starting point is 00:25:22 in that she was in the middle of writing this new piece of fiction. And it happened to be about AI robots versus humans. And she stopped midway through writing it because she didn't want to post it online and feed more into the machine. But she said that where she's stuck and the thought she's really stuck on is that in this piece of fiction she was writing, not every AI robot was bad. Some of them were helpful, some of them were nice, some of them were good, and some of them were evil. And it was really about how the corporations behind those robots used them
Starting point is 00:25:52 that decided whether they were good or evil. And so she felt like she just had so many questions about the companies running these AI and how they're using the data and how they're going to license it and what value is going to be, all these questions are swirling around in our head. And it's like, are the robots good or bad?
Starting point is 00:26:08 We don't know yet. And so we don't feel comfortable with them. And so did she end up posting any of the story online? No, she has not posted it yet. I think she still has too many questions. Well, Sheera, thank you very much. We appreciate it. Thank you for having me.
Starting point is 00:26:39 We'll be right back. Here's what else you need to know today. On Monday, Russia said it would end an agreement that had allowed Ukraine to export millions of tons of grain to the rest of the world, threatening global food prices and the food supply in dozens of countries that rely on the grain. The year-old agreement, known as the Black Sea Grain Initiative, was a successful attempt to limit the global repercussions of Russia's war on Ukraine. But Russia has repeatedly complained that the agreement favored Ukraine over its own people. I deeply regret the decision by the Russian Federation
Starting point is 00:27:26 to terminate the implementation of the Black Sea Initiative. During a news conference, the head of the United Nations said that Russia's decision would cause unnecessary suffering across the world. Today's decision by the Russian Federation will strike a blow to people in need everywhere. And smoke from wildfires in Canada is returning to the U.S. this week. By Monday afternoon, it was affecting about 72 million Americans across 29 states, from the Dakotas to New York. Nearly 900 wildfires are burning across Canada. Of those, the Canadian government says that more than 500 of them are burning out of its control. Today's episode was produced by Clare Toeniskoetter, Rob Szypko, and Muj Zaidi. It was edited by Devon Taylor, with help from Lisa Chow.
Starting point is 00:28:29 Contains original music by Alisha Ba'itu, and was engineered by Alyssa Moxley. Our theme music is by Jim Brunberg and Ben Landsverk of Wonderly. That's it for The Daily. I'm Michael Barbaro. See you tomorrow.
