The AI Daily Brief: Artificial Intelligence News and Analysis - How The AI Backlash Killed This Literary Startup

Episode Date: August 14, 2023

Prosecraft is the first casualty of the AI backlash. Was it justified, or simply inevitable? Before that on the Brief: Amazon announces generative AI summaries of reviews and previews its chip production facility, and OpenAI is *not* going bankrupt. Today's Sponsor: NetSuite | The leading business management software | Get no interest and no payments for 6 months https://netsuite.com/breakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on The AI Breakdown, we're discussing how the dust-up around AI killed one startup. Before that on the Brief, Amazon's latest AI investments, Anthropic scores some more dough, and a very silly story about OpenAI going bankrupt. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. We kick off today with a set of stories surrounding Amazon and its latest moves in the AI race. This morning, the company announced that it would be bringing generative AI to the customer review experience.
Starting point is 00:00:43 Now, credit to Amazon for trying to turn something that is completely commonplace and normal for consumers into a feature worthy of note. Their announcement blog post reads, Customer reviews are one of the oldest and most important features on Amazon. When we first launched reviews in 1995, the idea was radical. People scratched their heads when we said we wanted to give customers the opportunity to voice their honest opinions on products, the good, the bad, and everything in between. While the idea wasn't universally embraced, it was embraced by our customers. The post then goes through all the updates the company has made over the years as reviews
Starting point is 00:01:15 like those on Amazon have become commonplace across the web. They point to features like the ability to include a review title, photos, and videos, or to leave a star rating as an alternative to a written review. Ultimately, they say, we obsess over helping customers feel confident in their purchase decisions. And that brings them to their new AI features. The simple core of the idea is that rather than having to read through dozens or even hundreds of reviews, shoppers can rely on generative AI to summarize the key themes across those reviews, giving a potential buyer a high-level view of what other customers generally think about a product.
Starting point is 00:01:48 In addition to the high-level summary, the new feature also lets users drill down with a single click on a product attribute they're interested in. For example, they say, a customer looking to understand whether a product is easy to use can easily surface reviews mentioning ease of use by tapping on that product attribute under the review highlight. Initially, these review tools will be rolled out to a subset of U.S. shoppers, specifically those using mobile devices, on what Amazon calls a broad selection of products. During this testing phase, the company is hoping to fine-tune its AI models
Starting point is 00:02:17 to improve how they work and figure out what changes need to be made before they are rolled out more generally across all products and all users. Now, this is hardly the sort of earth-shaking, revolutionary announcement that we sometimes have here on the AI Breakdown Brief. But on the flip side, it represents a very clear and frankly somewhat obvious use case where generative AI seems very likely to significantly improve a core consumer experience on the web. For those trying to understand how much of what people are saying and thinking and doing with AI right now is just hype versus
Starting point is 00:02:47 real, the longer this goes, the more we're inevitably going to see these types of stories, which, while not as exciting as some of the most hyped announcements, actually push the field toward utility. Now, of course, this isn't without challenges. Amazon has to block hundreds of millions of fake reviews every single year, but ultimately that's a big game of whack-a-mole. And just as with any other type of generative AI, it's garbage in, garbage out. In fact, some worry that with these types of tools, fake reviews may be even harder to spot, which would ultimately make these summaries a lot less helpful. As a first step, Amazon is saying that it will only summarize reviews from verified purchases, but this seems likely to be an ongoing
Starting point is 00:03:25 battle as the company tries to make this feature the best that it can be. One more bit of news from Amazon. Last week, we discussed several times the larger battle surrounding advanced AI chips. Specifically, we looked at all the different competitors to Nvidia, the dominant force in that field, including efforts by the big tech companies themselves to build their own custom chips. Amazon is certainly in on this particular fight. AWS CEO Adam Selipsky told CNBC in an interview in June, quote, the entire world would like more chips for doing generative AI, whether that's GPUs or whether that's Amazon's own chips that we're designing. I think that we're in a better position than anybody else on Earth to supply the capacity that our customers
Starting point is 00:04:03 collectively are going to want. A new CNBC piece discusses Amazon's effort in this field. AWS now has more than 20 million chips in use. In 2015, Amazon bought an Israeli chip startup, Annapurna Labs. In 2018, it launched its Arm-based server chip, Graviton. That same year, it also launched its first AI-focused chips, two years after Google first announced its Tensor Processing Unit. Now, interestingly, this piece wasn't really based on any new news, but on CNBC getting what they called a behind-the-scenes tour of Amazon's chip lab in Austin, Texas. What I take from that is, one, Amazon is definitely trying to use its position in the cloud computing space, where it held 40% of global market share in 2022, as a lever to get people
Starting point is 00:04:46 using whatever AI solutions it comes up with, be it chips or something else. And second, the fact that they invited CNBC to their lab in the first place suggests that this is a battle they increasingly see as one that needs to be fought. In other words, they've been building this stuff for quite some time, but now they're doing active PR around it. Of course, given how much generative AI came up on every earnings call over the last few weeks, it could just be that every Fortune 500 company is basically forced right now to clearly articulate its AI strategy, and that pressure is only heightened for a tech giant like Amazon. Moving over into the world of startups in the AI space, Anthropic, the creator of Claude,
Starting point is 00:05:24 has taken on a new $100 million investment from Korean telco giant SK Telecom. Now, Anthropic has a number of different points of differentiation. They were first to the field with a 100K-token context window, and even more than that, they've tried to position themselves as having a different approach to AI safety. Earlier this year, they announced their Constitutional AI approach, an alternative to alignment methods like RLHF, or reinforcement learning from human feedback, which instead trains the AI on a foundational, or constitutional, set of principles from which it can infer how to handle particular areas of ethical or legal complexity.
Starting point is 00:06:02 Now, SK Telecom had participated in Anthropic's $450 million raise just a few months ago, and it appears that this new money is part and parcel of a larger partnership, through which the companies plan to, as TechCrunch puts it, co-develop a multilingual large language model customized for global telco firms. Quote, the LLM, which SKT and Anthropic will jointly develop, will allow the four Global Telco AI Alliance members, including Deutsche Telekom, e&, and Singtel, to offer AI developments customized to their users in each market. The LLM would support English, Korean, German, Japanese, Arabic, and Spanish. Another one that I wanted to flag briefly is the announcement by PlayHT of their new PlayHT 2.0 model. Now, I have used a couple of different voice-cloning models on this show. The first one that I
Starting point is 00:06:46 used was PlayHT's first model, and more recently I've been using the professional version of ElevenLabs. Well, PlayHT just released their 2.0 conversational text-to-voice AI model, and it looks really promising. One of the most interesting things is that it lets you control the output based on the emotion you're trying to convey. This seems like a super cool feature that is going to be highly useful for expanding the range of use cases for this sort of voice-cloning tool, so I wanted to highlight it even though I'll likely do a broader comparison on a future show. Lastly, today, one story that I want to flag just because it's gotten so much traction on Twitter, even though it's incredibly stupid, is that some random global
Starting point is 00:07:25 publication published a piece warning that OpenAI could go out of business by 2024, given how much it was spending on daily operations. It was from Analytics India Magazine, and frankly, it was positioned less as reporting than as a thought piece. Everyone from AI influencers looking for a new narrative to AI haters who revel in any idea that there's a bubble here starting to burst posted it breathlessly to Twitter in every version you can imagine. This is, to be clear, a preposterous story. Not because it's untrue that ChatGPT is extraordinarily expensive to run, but because there is effectively an endless bucket of money available to OpenAI in perpetuity, as long as people continue to use ChatGPT in anything
Starting point is 00:08:06 resembling the way that they have so far. Stability AI's Emad Mostaque put it really crisply when he said, what is this rubbish? OpenAI lost twice that a day last year, and this year raised $10 billion from Microsoft, enough to maintain that burn for 37 years. This is cheap R&D relative to impact, way more bang for buck than Web3, Metaverse, or whatever. I bring it up not because I think a ton of people are taking the OpenAI-may-go-bankrupt story seriously, but as a reminder of how important it is to be skeptical of any story you see out there right now. Like I said, whether it's someone who's just looking for any excuse to tear AI down or someone who's all in on AI because it juices their engagement,
Starting point is 00:08:45 keeping a critical eye is incredibly important. That's going to do it for today's AI Breakdown Brief. Thanks for listening or watching, as always, and I'll be back soon with the main AI Breakdown. Before we get to the main episode, I want to tell you about today's sponsor, NetSuite. Now, I know from interacting with you guys that so many of you are executives, managers, business leaders, and entrepreneurs, and all of you are basically trying to figure out how technology is changing the world and how it can change your business. On that journey, I think NetSuite can be a really valuable partner for you. NetSuite gives you the visibility and control you need to make better decisions faster. It's the software superpower behind so many of the
Starting point is 00:09:25 world's most successful businesses. And for the first time in NetSuite's 25 years as the number one cloud financial system, you can defer payments on a full NetSuite implementation for six months. That's no payment and no interest for six months, and you can take advantage of that special financing offer today. NetSuite is number one because they give your business everything you need in real time, all in one place, to reduce manual processes, boost efficiency, build forecasts, and increase productivity across every department. If you are listening to The AI Breakdown, you have a keen sense of just how important data is to any modern business.
Starting point is 00:10:00 Having all of your information in a single place can be the difference between making the right decisions and the wrong ones. I think it's great that they've created this offer to make their service more accessible to any business that needs it. If you've been sizing NetSuite up to make the switch, then you know that this deal, no interest, no payments, is unprecedented. Take advantage of this special financing offer at netsuite.com slash breakdown. That's netsuite.com slash breakdown to get the visibility and control you need to
Starting point is 00:10:26 weather any storm. One more time, netsuite.com slash breakdown. Thanks to NetSuite for supporting the show, and now let's get on to the main episode. Welcome back to The AI Breakdown. Today, we are talking about an AI backlash that killed a startup and why it's likely not the only casualty in the wars to come. Now, over the last couple of days, there's been a lot of context for talking about AI backlash. Last week, OpenAI shared more information about something called GPTBot. GPTBot is OpenAI's web crawler, which goes around collecting data from the open web to be used in training future AI models. Think GPT-5 and beyond.
Starting point is 00:11:03 Now importantly, as part of the announcement, OpenAI also shared how to disallow GPTBot. In other words, how to prevent GPTBot from accessing and scraping the data on one's website. In the first couple of days after the announcement, sites, as Ars Technica put it, scrambled to block the web crawler. Ars Technica writes, while wildly successful from a tech point of view, ChatGPT has also been controversial for how it scraped copyrighted data without permission and concentrated that value into a commercial product that circumvents the typical online publication model. OpenAI has been accused of and sued for plagiarism along these lines. In the hours
Starting point is 00:11:38 following the release of the instructions on how to block GPTBot, publications including The Verge, Substack writer Casey Newton, and others had said that they would immediately block GPTBot. At the same time, Ars points out that it's not necessarily a totally cut-and-dried decision. They write, For large website operators, the choice to block large language model crawlers isn't as easy as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge that could serve some sites very well, but it may also hurt others. For example, blocking content from future AI models could decrease a site's or brand's cultural footprint if AI chatbots become a primary user interface in the future. That, however, has not stopped the biggest
Starting point is 00:12:15 publication so far from announcing that it would block GPTBot and indeed AI models in general. As first reported by Adweek, the New York Times has updated its terms of service to prohibit its content, including text, photographs, images, audio and video clips, quote-unquote look and feel, metadata, or compilations, from being used in any AI training. The terms specify that automated tools like the GPTBot crawler cannot be used to access its content. However, as The Verge pointed out, the NYT doesn't yet appear to have made any changes to its robots.txt file, which is how it would actually block GPTBot itself.
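For readers curious about the mechanics: OpenAI's documented opt-out is a two-line robots.txt rule, `User-agent: GPTBot` followed by `Disallow: /`. The minimal sketch below uses only Python's standard-library `urllib.robotparser` and a hypothetical example.com URL to check whether a given robots.txt would turn GPTBot away while leaving other crawlers alone. It only models what a well-behaved crawler is expected to do; robots.txt is advisory, and a terms-of-service change like the Times' does not appear in it at all.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt following the opt-out pattern OpenAI published for GPTBot,
# plus a catch-all rule that leaves every other crawler unrestricted.
BLOCK_GPTBOT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# A robots.txt with no GPTBot-specific rule, only a generic private path.
NO_GPTBOT_RULE = """\
User-agent: *
Disallow: /private/
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler named `user_agent` may fetch `url`.

    Pass the crawler's short product token (e.g. "GPTBot") rather than
    its full User-Agent header string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # hypothetical URL
    for label, rules in (("block GPTBot", BLOCK_GPTBOT), ("no GPTBot rule", NO_GPTBOT_RULE)):
        for agent in ("GPTBot", "OtherBot"):
            print(f"{label}: {agent} allowed = {crawler_allowed(rules, agent, url)}")
```

A file like the second one, with no GPTBot-specific rule, leaves a compliant GPTBot free to crawl public pages, regardless of what a site's terms of service say.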
Starting point is 00:12:49 Now, another big backlash story came up last week when an author discovered that AI-generated books were being sold under her name on Amazon. Author Jane Friedman, who writes about working in the writing and publishing industry, learned from one of her fans that a number of titles, on topics similar to the ones she normally writes about, had been published under her name but were definitely not her words. Friedman said, when I started looking at these books, looking at the opening pages, looking at the bio, it was just obvious to me that it had been mostly, if not entirely, AI generated. I have so much content available online for free because I've been blogging forever, so it wouldn't be hard to get an AI to mimic me. Now, initially, Friedman reported that Amazon was not particularly helpful and didn't immediately take the titles down. However, after her story went viral, the books disappeared, and Amazon said that the books sold under her name, and the type of imitation they represented, were prohibited by its terms. A spokesperson said, we have clear content guidelines governing which books can
Starting point is 00:13:49 be listed for sale and promptly investigate any book when a concern is raised. Now, as this story was picking up, another story about AI and authors was also starting to get traction. Novelist Hari Kunzru tweeted on August 7th, this company, Prosecraft, appears to have stolen a lot of books, trained an AI, and are now offering a service based on the data. The site in question, Prosecraft.io, had been created in 2017 and was actually just a side project. It came out of founder Benji Smith's habit of counting words and phrases in books that he liked to determine things like how many adverbs there were. The Prosecraft site took that to a new level, analyzing lots and lots of different works and comparing them to one another to create scores for things like vividness. However, authors like
Starting point is 00:14:33 Hari never gave permission for their books to be used in that way. And so when they discovered Prosecraft, there was an almighty uproar. Numerous authors started posting about how people could check whether their works had been included in Prosecraft. Benjamin BLM tweeted, oh, hi, didn't see you there. Are you upset about Prosecraft? Are you tired of AI companies taking everything you've ever made and put online, even books you haven't put online, and feeding it to a meat grinder? Sign the Authors Guild open letter. Eventually, the uproar reached such a fever pitch that Benji Smith took down the website entirely. He wrote, Today I'm taking down the Prosecraft.io website, which had been previously dedicated to the
Starting point is 00:15:09 linguistic analysis of literature, including more than 25,000 books by thousands of different authors. Benji continues, I originally started working on this project more than 10 years ago when I began writing a memoir about a difficult time in my life. It was my first book, and I didn't know how many words I should write. I had heard that real books should be about 100,000 words. I searched the internet for guidance, but I didn't find much. So I pulled a few paperbacks off my own shelves, books by authors I admired, and counted by hand how many words were on the first few pages. Then I counted the total number of pages and multiplied the two numbers to get an estimate. I kept a little spreadsheet and it was precious to me. Precious guidance from authors whose books I
Starting point is 00:15:42 adored when I was struggling to tell my own story. Benji then explains that this led him to create his own company to make tools for authors and to start expanding the range of works he was analyzing. Benji said, I researched copyright laws, mindful of not wanting to hurt or offend the community of authors that I cared so much about. Since I was only publishing summary statistics and small snippets from the text of those books, I believed I was honoring the spirit of the fair use doctrine, which doesn't require the consent of the original author. Since I never share the text that I acquired by crawling the internet, I believe that I was in compliance with the relevant laws. And now this is the part where it gets really interesting, and Benji starts to
Starting point is 00:16:15 look a little bit less like the big tech villain he's been painted as, and a little bit more like a guy whose passion project got caught up in the larger tides of history. He writes, I launched the Prosecraft website in the summer of 2017, and I started showing it off to authors at writers' conferences. The response was universally positive. I've spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things. A small handful of authors have even reached out to me asking to have their books added to the website. I was grateful for their enthusiasm. But in the meantime, AI became a thing. And the arrival of AI on the scene had been tainted by early use cases that allow anyone to create zero-effort
Starting point is 00:16:50 impersonations of artists, cutting those creators out of their own creative process. That's not something I ever wanted to participate in. Today, the community of authors has spoken out, and I'm listening. I care about you and I hear your objections. Your feelings are legitimate, and I hope you'll accept my sincerest apologies. In the future, I would love to rebuild this library with the consent of authors and publishers. I truly believe these tools are useful for creative people. But now is not the right time, I understand, and I'm sorry. And so to be clear, Prosecraft was not an a16z-backed startup or anything like that, nor was Shaxpir, the associated desktop app that plugged into it. This was a two-person side project. And I think that's relevant to
Starting point is 00:17:28 keep in mind as you read the visceral responses from authors once it was shut down. Ewan Morrison writes, A victory for authors. A website that used AI to analyze thousands of novels without authors' consent has been shut down by protesting authors. Warning to big tech, no, you cannot use our books to grow your AI. Don't even try it. Gabino Iglesias writes,
Starting point is 00:17:48 Prosecraft is no more. Why? Because writers got pissed. Because we won't take shit and let our work be fed into machines. The fight is just starting, but we're in this for the long haul. Now, of course, as tempting and emotionally fulfilling as it might be to view this as the David of authors fighting the Goliath of Big Tech, Kate Knibbs at Wired has the right of it in her piece, Why the Great AI Backlash Came for a Tiny Startup You've Probably Never
Starting point is 00:18:11 Heard Of. Jane Friedman, the author we were just discussing whose name was attached to those fake books on Amazon, called it the most thoughtful coverage she's seen yet, adding that experts in copyright and fair use are consulted and quoted, thankfully. One of the things the Wired piece points out is that this was hardly as cut-and-dried as some of these authors made it seem. Quote, publishing industry analyst Thad McIlroy doesn't approve of data scraping, but he sees the backlash against Prosecraft as majorly misguided, his term, shrieking hysteria, and some copyright experts have watched the furor with their jaws near the ground. While the argument against piracy is simple to follow, they are skeptical that Prosecraft could have been taken to court successfully. Matthew
Starting point is 00:18:49 Sag, a law professor at Emory University, thinks Smith could have mounted a successful defense of his project by invoking fair use. Sag, along with several other experts I spoke with, pointed to the Google Books and HathiTrust cases as precedent, two examples of courts ruling in favor of projects that uploaded snippets of books online without obtaining the copyright holders' permission, determining that they constituted fair use. Sag said, I think that the reasons people are upset really don't have anything to do with this poor guy.
Starting point is 00:19:14 I think it has to do with everything else going on. Techdirt's Mike Masnick wrote something similar, in a post titled The Fear of AI Just Killed a Very Useful Tool. He points to one of the tweets that went viral around this, from Zach Rosenberg, who wrote, How dare you, Benji? I demand you take my book off your site immediately. I do not consent to this and never did. And I know my publisher never would. Masnick writes, I'm still perplexed at what the complaint is here. You don't need to consent to someone putting up statistics about their analysis of your book. But Zach's tweet went viral with a bunch of folks ready to blow up at anything that smacks of tech bro AI and
Starting point is 00:19:47 lots of authors started yelling at Smith. Masnick then points out, The Gizmodo article has a ridiculously wrong fair use analysis, saying fair use does not by any stretch of the imagination allow you to use an author's entire copyrighted work without permission as part of a data training program that feeds into your own AI algorithm. Except... it almost certainly does. Again, we've gone through this with the Google Books scanning case, and the court said that you absolutely can do that because it's transformative. Masnick continues, it seems that what really tripped people up here was the AI part of it, and the fear that this was just another VC-funded tech bro exercise of building something to get rich by using the works of creatives. Except none of that is accurate. Masnick concludes, I find all of this really unfortunate.
Starting point is 00:20:25 Smith built something really cool, really amazing, that does not in any way infringe on anyone's rights. I get the knee-jerk reaction from some authors who feared that this was some obnoxious project, but couldn't they have taken 10 minutes to look at the details of what it was they were killing? I know we live in an outrage era where the immediate reaction is to turn the outrage meter up to 11. I'm certainly guilty of that at times myself, but this whole incident is just sad.
Starting point is 00:20:46 It was an overreaction from the start, destroying what had been a clear labor of love and a useful project through misleading and misguided attacks from authors. Now, a number of other folks reached a similar analysis, getting into the details of fair use, but ultimately that probably doesn't matter so much. What this is really a story about is not fair use and whether Prosecraft was in the right, but a much larger cultural and economic battle that is not just brewing, but frankly here and out in the open right now. Interestingly, I think even the appeal to precedent in cases like Google Books is not going to satisfy people who view AI as a fundamental threat
Starting point is 00:21:23 to what they do and what they make their living from. The concern is completely understandable. For many, the fear of change is very legitimate, but ultimately these technologies are here and they're not just going to be screamed away. It seems to me that a far more effective approach would be to engage with some semblance of openness and nuance. Not because one has to compromise one's core principles, but because, frankly, the technology is a genie that's not going back in the bottle, the legal system will inevitably deal with nuance even if people don't want it to, and having a positive hand in shaping how all that plays out is probably going to be better than being locked in a perpetual battle. But then again, it's very easy for me to say this
Starting point is 00:22:05 from the sidelines. All I know is that we're going to have a lot more battles like this. So being able to categorize and understand them, and even rank-order them by importance, feels like it will be an essential skill. And it's hard for me to see how killing Prosecraft specifically, whose creator is, A, just a guy, and B, so clearly sympathetic to the movement that aligned against him, is really a big win. Anyways, guys, that is going to do it for today's AI Breakdown. This is an important and contentious one, so let me know what you think in the comments. Come join us on the AI Breakdown Discord, which you can find at bit.ly slash AI breakdown.
Starting point is 00:22:39 And until next time, Peace.
