The AI Daily Brief: Artificial Intelligence News and Analysis - How The AI Backlash Killed This Literary Startup
Episode Date: August 14, 2023
Prosecraft is the first casualty of the AI backlash. Was it justified, or simply inevitable? Before that on the Brief: Amazon announces generative AI summaries of reviews as well as previews their chi...p production facility, and OpenAI is *not* going bankrupt. Today's Sponsor Netsuite | The leading business management software | Get no interest and no payments for 6 months https://netsuite.com/breakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're discussing how the dust-up around AI killed one startup.
Before that on the Brief, Amazon's latest AI investments, Anthropic scores some more dough,
and a very silly story about OpenAI going bankrupt.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
We kick off today with a set of stories surrounding Amazon and their latest moves in the AI race.
This morning, the company announced that they would be bringing generative AI to the customer review experience.
Now, credit to Amazon for trying to turn something that is completely commonplace and normal for consumers into a feature worthy of note.
Their announcement blog post reads,
Customer reviews are one of the oldest and most important features on Amazon.
When we first launched reviews in 1995, the idea was radical.
People scratched their heads when we said we wanted to give customers the opportunity
to voice their honest opinions on products, the good, the bad, and everything in between.
While the idea wasn't universally embraced, it was embraced by our customers.
The post then goes through all the updates the company has made over the years as reviews
like those on Amazon have become commonplace across the web.
They point to features like the ability to include a review title, photos, and videos,
or leaving a number of stars instead of a review as an alternative.
Ultimately, they say, we obsess over helping customers feel confident in their purchase decisions.
And that brings them to their new AI features.
The simple core of the idea is that rather than having to read through dozens and dozens or even hundreds of reviews,
generative AI can help summarize the key themes from across the reviews,
giving the potential buyer of a product a high-level view of in general what other customers think about it.
In addition to just a high-level summary, the new feature also allows users to single-click drill down on a type of attribute that they're interested in.
For example, they say,
A customer looking to understand whether a product is easy to use can easily surface reviews mentioning ease of use
by tapping on that product attribute under the review highlight.
Initially, these review tools will be rolled out to a part of the U.S. audience,
specifically those using mobile devices,
and on what Amazon calls a broad selection of products.
During this testing phase, the company is hoping to fine-tune its AI models
to improve how they work and figure out what changes need to be made
before they get rolled out more generally across all products and all users.
Now, this is hardly the sort of earth-shaking, revolutionary announcement
that we sometimes have here on the AI Breakdown Brief.
But on the flip side, what it represents is a very,
very clear and frankly sort of obvious use case where generative AI does seem very likely to be
able to significantly improve a core consumer experience on the web. For those trying to understand
how much of what people are saying and thinking and doing with AI right now is just hype versus
real, inevitably the longer it goes, the more we're going to have these types of stories,
which, if not necessarily as exciting as some of the most hyped announcements, still actually
push the field in general more towards utility. Now, of course, this isn't without challenge. Amazon has to
block hundreds of millions of fake reviews every single year. But ultimately, that's a big game of
whack-a-mole. And just as with any other type of generative AI, garbage in, garbage out. In fact, with
these types of tools, some are worried that fake reviews may be even harder to distinguish, which would
ultimately make these summaries a lot less helpful. As a first step, Amazon is saying that it will only
summarize reviews from verified purchases, but it seems likely that this is going to be an ongoing
battle as the company tries to make this feature really the best that it can be. One more bit of news
from Amazon. Last week, we discussed a number of times the larger battle surrounding advanced
AI chips. Specifically, we looked at all the different competitors for Nvidia, the dominant
force in that field, including efforts from the big tech companies themselves to build their own
custom chip solutions. Amazon is certainly in on this particular fight. AWS CEO Adam Selipsky
told CNBC in an interview in June, quote, the entire world would like more chips for doing generative
AI, whether that's GPUs or whether that's Amazon's own chips that we're designing. I think
that we're in a better position than anybody else on Earth to supply the capacity that our customers
collectively are going to want. A new CNBC piece discusses Amazon's effort in this field. And now
AWS has more than 20 million chips in use. In 2015, they bought an Israeli chip startup, Annapurna Labs.
In 2018, Amazon launched its ARM-based server chip, Graviton. That same year, they also launched their
first AI-focused chips, which was, of course, two years after Google first announced its
Tensor Processing Unit. Now, interestingly, this piece wasn't really based on some new news,
but on CNBC getting what they called a behind-the-scenes tour of Amazon's chip lab in Austin, Texas.
What I take from that is one, Amazon is definitely trying to use its position in the cloud
computing space, in which it held 40% of global market share in 2022, as a lever to get people
using whatever AI solutions they come up with, be it chips or something else. And second,
the fact that they invited CNBC to their lab in the first place, suggests that this is a battle
that they are increasingly seeing as one that needs to be fought. In other words, they've been building
this stuff for quite some time, but now they're doing active PR around it. Of course, given how much
generative AI came up on every earnings call over the last few weeks, it could just be that
every Fortune 500 company is basically forced right now to make sure that they are clearly articulating
what their strategy is, and that pressure is just heightened for a big tech giant like Amazon.
Moving over into the world of startups in the AI space, Anthropic, the creator of course of Claude,
has taken on a new nine-figure $100 million investment from Korean telco giant SK Telecom.
Now, Anthropic has a number of different points of differentiation.
They were first to the field with a 100K context window, and even more than that, they've
tried to position themselves as having a different approach to AI safety.
Earlier this year, they announced their constitutional AI model, which was an alternative
to different approaches to alignment like RLHF, or reinforcement learning from human feedback,
instead trying to train the AI on a foundational set or constitutional set of principles,
from which it could infer and make decisions about how to engage with particular areas of ethical or legal complexity based on those underlying ideals.
Now, SK Telecom had participated in Anthropic's $450 million raise just a few months ago,
and it appears that this new money is part and parcel of a larger partnership, through which the companies plan to, as TechCrunch puts it,
co-develop a multilingual large language model customized for global telco firms.
Quote, the LLM, which SKT and Anthropic will jointly develop, will allow four Global Telco AI Alliance members,
including Deutsche Telekom, e&, and Singtel, to offer AI developments customized to their users in
each market. The LLM would support English, Korean, German, Japanese, Arabic, and Spanish languages.
Another one that I wanted to flag briefly is the announcement by PlayHT of their new PlayHT2.0 model.
Now, I have used a couple of different models of voice cloning on this show. The first one that I
used was PlayHT's first model, and then more recently I've been using ElevenLabs' professional version.
Well, PlayHT just released their 2.0 conversational text-to-voice AI model, and it looks really promising.
One of the things that's most interesting is that they allow you to control it based on a certain
emotion you're trying to convey. This seems like a super cool feature that is going to be highly
useful for expanding the number of use cases that this sort of voice cloning tool can be used for,
and so I wanted to highlight it even though I'll likely do a broader comparison on some show
in the future. Lastly, today, one story that I want to flag just because it's
gotten so much traction on Twitter, even though it's incredibly stupid, is that some random global
publication published a piece warning that OpenAI could go out of business by 2024, given how
much it was spending per day of operations. It was from Analytics India Magazine, and frankly,
wasn't even really positioned as reporting as much as a thought piece. Everyone from AI influencers
looking for a new narrative to AI haters who revel in any idea that there is a bubble here that
is starting to burst, published it breathlessly to Twitter in every version that you can imagine.
This is, to be clear, a preposterous story. Not because it's not true that it's extraordinarily
expensive to run ChatGPT, but because there is effectively an endless bucket of money available
to OpenAI in perpetuity, as long as people continue to use ChatGPT in anything
resembling the way that they have so far. Stability AI's Emad Mostaque put it really
crisply when he said, what is this rubbish? OpenAI lost twice that a day last year, and
this year raised $10 billion from Microsoft, enough to maintain that burn for 37 years.
This is cheap R&D relative to impact, way more bang for buck than Web3, Metaverse, or whatever.
I bring it up not because I think that a ton of people are taking the OpenAI may go bankrupt
story seriously, but just as a reminder of how important it is to be skeptical with any
story you see out there right now. Like I said, whether it's someone who's just looking for any
excuse to tear AI down or someone who's all in on AI because it juices their engagement,
keeping a critical eye is incredibly important. That's going to do it for today's AI Breakdown Brief.
Thanks for listening or watching as always, and I'll be back soon with the main AI Breakdown.
Before we get to the main episode, I want to tell you about today's sponsor, NetSuite.
Now, I know from interacting with you guys that so many of you are executives, managers,
business leaders, entrepreneurs, and all of you are basically trying to figure out how
technology is changing the world and how it can change your business. On that journey,
I think NetSuite can be a really valuable partner for you. NetSuite gives you the visibility and
control you need to make better decisions faster. It's the software superpower behind so many of the
world's most successful businesses. And for the first time in NetSuite's 25 years as the number one
cloud financial system, you can defer payments of a full NetSuite implementation for six months.
That's no payment and no interest for six months, and you can take advantage of that special
financing offer today. NetSuite is number one because they give your business everything you need
in real time all in one place to reduce manual processes, boost efficiency, build forecasts,
and increase productivity across every department.
If you are listening to the AI breakdown, you have a keen sense of just how important data is
to any modern business.
Having all of your information in a single place can be the difference between making the
right decisions and the wrong ones.
I think it's great that they've created this offer to make their service more accessible
to any business that needs it.
If you've been sizing NetSuite up to make the switch, then you know that this deal,
no interest, no payments is unprecedented.
Take advantage of this special financing offer at
netsuite.com/breakdown. That's netsuite.com/breakdown to get the visibility and control you need to
weather any storm. One more time, netsuite.com/breakdown. Thanks to NetSuite for supporting the show,
and now let's get on to the main episode.
Welcome back to the AI Breakdown. Today, we are talking about
an AI backlash that killed a startup and why it's likely not the only casualty in the wars to come.
Now, over the last couple days, we've had a lot of context to talk about AI backlash. Last week,
OpenAI shared more information about something called GPTBot.
GPTBot is OpenAI's web crawler that goes around collecting data from the open web
and scraping it to be used in the training of future AI models.
Think GPT-5 and beyond.
Now importantly, as part of the announcement, OpenAI also shared how to disallow the GPTBot.
In other words, how to make it so that GPTBot is not able to access and scrape the data
from one's website.
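As a rough illustration of what that opt-out looks like in practice: the published instructions amount to a two-line robots.txt rule naming the `GPTBot` user agent. The sketch below uses Python's standard `urllib.robotparser` to check how such a rule would be interpreted. The `example.com` URL, the `SomeOtherBot` name, and the `is_allowed` helper are made up for illustration, and this only shows how a well-behaved crawler should read the file, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that opts the whole site out of GPTBot crawling.
# Other crawlers are not mentioned, so they remain allowed by default.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the text locally
    instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders, not real targets.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/articles/1")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/articles/1")

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Note that robots.txt is a request, not an enforcement mechanism: it only works to the extent that the crawler chooses to honor it.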
In the first couple days after the announcement, sites, as Ars Technica put it,
scrambled to block the web crawler. Ars Technica writes, while wildly successful from a tech point of
view, ChatGPT has also been controversial by how it scraped copyrighted data without permission,
and concentrated that value into a commercial product that circumvents a typical online publication
model. OpenAI has been accused of and sued for plagiarism along these lines. In the hours
following the release of the instructions on how to block GPTBot, publications including The Verge,
Substack writer Casey Newton, and others had said that they would immediately block GPTBot. At the same time,
Ars points out that it's not necessarily a totally cut-and-dried decision. They write,
For large website operators, the choice to block large language model crawlers isn't as easy
as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge
that could serve some sites very well, but it may also hurt others. For example, blocking content
from future AI models could decrease a site's or brand's cultural footprint if AI chatbots
become a primary user interface for the future. That, however, has not stopped the biggest
publication so far from announcing that it would block GPTBot and indeed AI models in general.
As first reported by Adweek, the New York Times has updated its terms of service to prohibit
content that includes text, photographs, images, audio, and video clips, quote-unquote,
look-and-feel, metadata, or compilations from being used in any AI training.
The terms specify that automated tools like the GPTBot crawler cannot be used to access its content.
However, as The Verge pointed out, the NYT doesn't yet appear to have made any changes
to its robots.txt file,
which would be the way that it actually actively blocked GPTBot itself.
Now, another big backlash story that came up last week came when an author discovered that AI-generated
books were being sold under her name on Amazon.
When one of her fans discovered fake titles under her name, author Jane Friedman, who writes
about working in the writing and publishing industry, discovered that there were a number of titles,
similar to the topics that she normally writes about, and printed under her name but which
were definitely not her words.
Friedman said, when I started looking at these books, looking at the opening pages, looking at the bio, it was just obvious to me that it had been mostly, if not entirely, AI generated. I have so much content available online for free because I've been blogging forever, so it wouldn't be hard to get an AI to mimic me. Now, initially, Friedman reported that Amazon was not particularly helpful and didn't immediately take the titles down. However, after her consternation and story went viral, the books disappeared, and Amazon said that the books sold under her name, and the type of imitation that they represented, were
prohibited by their terms. A spokesperson said, we have clear content guidelines governing which books can
be listed for sale and promptly investigate any book when a concern is raised. Now, as this story was
picking up, another story about AI and authors was also starting to get traction. Novelist Hari Kunzru
tweeted on August 7th, this company, Prosecraft, appears to have stolen a lot of books, trained an AI,
and are now offering a service based on the data. The site in question, prosecraft.io, had been created in
2017, and was actually just a side project. It came out of founder Benji Smith's tendency to
count words and phrases in books that he liked and try to determine things like how many adverbs
there were. The Prosecraft site took that to a new level, analyzing lots and lots of different
works compared to one another to create scores for things like vividness. However, authors like
Hari never gave their permission for their books to be used in that way. And so when they
discovered Prosecraft, there was an almighty uproar. Numerous
authors started posting about how people could check to see if their works had been included
in Prosecraft. Benjamin BLM tweeted, oh, hi, didn't see you there. Are you upset about Prosecraft? Are you
tired of AI companies taking everything you've ever made and put online? Even books you haven't put
online and feeding it to a meat grinder? Sign the Authors Guild open letter. Eventually, the uproar reached
such a fever pitch that Benji Smith took down the website entirely. He wrote,
Today I'm taking down the prosecraft.io website, which had been previously dedicated to the
linguistic analysis of literature, including more than 25,000 books by thousands of different authors.
Benji continues, I originally started working on this project more than 10 years ago when I began
writing a memoir about a difficult time in my life. It was my first book, and I didn't know
how many words I should write. I had heard that real books should be about 100,000 words. I
searched the internet for guidance, but I didn't find much. So I pulled a few paperbacks off my
own shelves, books by authors I admired, and counted by hand how many words were on the first
few pages. Then I counted the total number of pages and multiplied the two numbers to get an estimate.
I kept a little spreadsheet and it was precious to me. Precious guidance from authors whose books I
adored when I was struggling to tell my own story. Benji then says that this led him to
create his own company to make tools for authors and to start expanding the set of works that he
was analyzing. Benji said, I researched copyright laws, mindful of not wanting to hurt or offend
the community of authors that I cared so much about. Since I was only publishing summary statistics
and small snippets from the text of those books, I believed I was honoring the spirit of the
fair use doctrine, which doesn't require the consent of the original author. Since I never
share the text that I acquired by crawling the internet, I believe that I was in compliance with
the relevant laws. And now this is the part where it gets really interesting. And Benji starts to
look a little bit less like the big tech villain that he's been painted as, and a little
bit more like a guy whose passion project got caught up in larger tides of history. He writes,
I launched the Prosecraft website in the summer of 2017, and I started showing it off to authors at
writers' conferences. The response was universally positive. I've spent thousands of
hours working on this project, cleaning up and annotating text, organizing and tweaking things.
A small handful of authors have even reached out to me asking to have their books added to the
website. I was grateful for their enthusiasm. But in the meantime, AI became a thing. And the arrival
of AI on the scene had been tainted by early use cases that allow anyone to create zero effort
impersonations of artists, cutting those creators out of their own creative process. That's not
something I ever wanted to participate in. Today, the community of authors has spoken out,
and I'm listening. I care about you and I hear your objections. Your feelings are legitimate,
and I hope you'll accept my sincerest apologies. In the future, I would love to rebuild this library
with the consent of authors and publishers. I truly believe these tools are useful for creative people.
But now is not the right time, I understand, and I'm sorry. And so to be clear, Prosecraft was not an
a16z-backed startup or something like that, nor was the associated company Shaxpir, the desktop
app that plugged into it. This is a two-person side project. And I think that's relevant to
keep in mind as you read the visceral responses from authors once it was shut down.
Ewan Morrison writes,
A victory for authors.
A website that used AI to analyze thousands of novels without authors' consent has been shut
down by protesting authors.
Warning to big tech, no, you cannot use our books to grow your AI.
Don't even try it.
Gabino Iglesias writes,
Prosecraft is no more.
Why?
Because writers got pissed.
Because we won't take shit and let our work be fed into machines.
The fight is just starting, but we're in this for the long haul.
Now, of course, as tempting and emotionally fulfilling as it might be to view this as
the Davids of the authors fighting the Goliaths of Big Tech, Kate Knibbs at Wired kind of has the right of it
in her piece, Why the Great AI Backlash Came for a Tiny Startup You've Probably Never
Heard Of. Jane Friedman, the author we were just discussing, whose name was
used on AI-generated books on Amazon, called the piece the most thoughtful coverage she's seen yet, and says
experts in copyright and fair use are consulted and quoted, thankfully. One of the things that the
Wired piece points out is that it was hardly as cut-and-dried as some of these authors made it seem.
Quote, publishing industry analyst Thad McIlroy doesn't approve of data scraping, but he sees the backlash
against Prosecraft as majorly misguided. His term, shrieking hysteria. And some copyright experts
have watched the furor with their jaws near the ground. While the argument against piracy is simple
to follow, they are skeptical that Prosecraft could have been taken to court successfully. Matthew
Sag, a law professor at Emory University, thinks Smith could have mounted a successful defense of his
project by invoking fair use.
Sag, along with several other experts I spoke with, pointed to the Google Books and HathiTrust
cases as precedent, two examples of the courts ruling in favor of projects that
uploaded snippets of books online without obtaining the copyright holder's permission,
determining that they constituted fair use.
Sag said, I think that the reasons that people are upset really don't have anything to do
with this poor guy.
I think it has to do with everything else going on.
Techdirt's Mike Masnick wrote something similar.
He wrote a post, The Fear of AI Just Killed a Very Useful Tool.
He points to one of the tweets that went viral around this from Zach Rosenberg who wrote,
How dare you, Benji? I demand you take my book off your site immediately. I do not consent to this
and never did. And I know my publisher never would. Masnick writes, I'm still perplexed at what the
complaint is here. You don't need to consent to someone putting up statistics about their analysis of your book.
But Zach's tweet went viral with a bunch of folks ready to blow up at anything that smacks of tech bro AI and
lots of authors started yelling at Smith. Masnick then points out,
The Gizmodo article has a ridiculously wrong fair use analysis, saying fair use does not by any stretch of the imagination allow you to use an author's entire copyrighted work without permission as part of a data training program that feeds into your own AI algorithm.
Except dot dot dot, it almost certainly does.
Again, we've gone through this with the Google Books scanning case, and the court said that you absolutely can do that because it's transformative.
Masnick continues, it seems that what really tripped people up here was the AI part of it, and the fear that this was just another VC-funded tech bro exercise of building something to get rich by using the works of creatives.
Except none of that is accurate.
Masnick concludes,
I find all of this really unfortunate.
Smith built something really cool, really amazing,
that does not in any way infringe on anyone's rights.
I get the knee-jerk reaction from some authors
who feared that this was some obnoxious project,
but couldn't they have taken 10 minutes to look at the details of what it was they were killing?
I know we live in an outrage era where the immediate reaction is to turn the outrage meter up to 11.
I'm certainly guilty of that at times myself,
but this whole incident is just sad.
It was an overreaction from the start,
destroying what had been a clear labor of love and a useful project through misleading and misguided
attacks from authors. Now, there are also a number of other folks who sort of had this similar
analysis, getting into the details of fair use, but ultimately it probably doesn't matter so much.
Ultimately, what this is a story about is not really fair use and whether Prosecraft was in the right,
but instead about a much larger cultural and economic battle that is not just brewing, but frankly
here and out in the open right now. I think interestingly, even the appeal to precedent,
in cases like Google Books is not going to satisfy people who view AI as a fundamental threat
to what they do and what they make their living from. The concern is completely understandable.
The fear of change is for many very legitimate, but ultimately, these technologies are here and they're
not just going to be screamed away. It seems to me that a far more effective approach ultimately
would be to engage with some semblance of openness and nuance. Not because one has to compromise
their core principles, but because frankly the technology is a genie that's not being put back in a
bottle, that inevitably the legal system will deal with nuance even if people don't want it to,
and that having a positive hand in shaping how all that plays out is probably going to be better
ultimately than being locked in a perpetual battle. But then again, it's very easy for me to say this
from the sidelines. All I know ultimately is that we're going to have a lot more battles like this.
So being able to categorize and understand them, even to rank-order them in terms of their importance,
feels like it will be a very essential skill.
And it's hard for me to see how killing Prosecraft specifically, whose creator is, A, just a guy,
and B, so clearly sympathetic to the movement that aligned against him, is really a big win.
Anyways, guys, that is going to do it for today's AI breakdown.
This is an important and contentious one, so let me know what you think in the comments.
Come join us on the AI Breakdown Discord, which you can find at bit.ly/aibreakdown.
And until next time,
Peace.
