This Week in Startups - AI on Trial: Inside the NY Times vs. OpenAI Lawsuit with Cecilia Ziniti | E1874
Episode Date: January 4, 2024
This Week in Startups is brought to you by…
MEV. Tired of the dev shop rollercoaster? Mev is your reliable technical partner, offering a well-established software development process designed to consistently deliver unparalleled value to their clients. Get $30,000 off your first three months at http://www.mev.com/twist
Northwest Registered Agent. When starting your business, it's important to use a service that will actually help you. Northwest Registered Agent is that service. They'll form your company fast, give you the documents you need to open a business bank account, and even provide you with mail scanning and a business address to keep your personal privacy intact. Visit http://www.northwestregisteredagent.com/twist to get a 60% discount on your next LLC.
The Paintbrush Loan is the earliest startup financing on the internet. No pitch deck, no business plan, no minimum time in business, and no warm intros. Plus, you get to keep your equity. Visit http://www.getpaintbrush.com to see if you qualify for a $50K startup loan in less than 2 minutes.
*
Today’s show: Cecilia joins Jason for an in-depth discussion about the New York Times versus OpenAI case, delving into the intricacies of fair use and analyzing the Fair Use Test (6:07), examining the legal complexities surrounding data scraping for Large Language Models (28:21), exploring possible ramifications of this legal confrontation between these titans (40:06), and more!
*
Timestamps:
(0:00) Cecilia Joins Jason
(2:44) Cecilia’s Background in Law
(3:44) Jumping into the case of NY Times vs OpenAI.
(6:07) Exploring Fair Use legal tests
(11:57) MEV - Get $30,000 off your first three months at http://www.mev.com/twist
(13:50) The case of Roy Orbison vs Two Live Crew and the music industry’s rules on fair use.
(19:04) Picking apart the defense of attribution.
(22:22) Northwest Registered Agent - Get a 60% discount on your next LLC at http://www.northwestregisteredagent.com/twist
(24:43) Fair Use Test: Factors two and three
(28:21) Legal challenges in data scraping for LLMs
(32:40) Paintbrush - Visit http://www.getpaintbrush.com to see if you qualify for a $50K startup loan in less than 2 minutes
(34:19) The fourth and final factor in the Fair Use Test.
(40:06) Potential outcomes of NY Times vs OpenAI case
(47:04) Google vs Java and legal discussions on digital platforms
(51:49) Jason shares a possible solution to this case and how the subscription wall could change things.
(57:54) Cecilia’s Grinch images on X
(1:02:22) Legal viewpoint regarding commercial vs non-commercial use.
(1:04:14) Summarizing where all this is going with the NY Times and OpenAI trial.
(1:12:18) Reviewing the market-based solution to this case.
*
Check out GC AI: https://getgc.ai
Website: ceciliaziniti.com
Check out Ziniti Law: https://www.zinitilaw.com/
Check out Cecilia’s Maven Course here: https://maven.com/ceciliaz
*
Thanks to our partners:
(11:57) MEV - Get $30,000 off your first three months at http://www.mev.com/twist
(22:22) Northwest Registered Agent - Get a 60% discount on your next LLC at http://www.northwestregisteredagent.com/twist
(32:40) Paintbrush - Visit http://www.getpaintbrush.com to see if you qualify for a $50K startup loan in less than 2 minutes
*
Follow Cecilia
X: https://twitter.com/CeciliaZin
LinkedIn: https://www.linkedin.com/in/ceciliaziniti/
*
Follow Jason:
X: https://twitter.com/jason
Instagram: https://www.instagram.com/jason
LinkedIn: https://www.linkedin.com/in/jasoncalacanis
*
Great 2023 interviews: Steve Huffman, Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland
*
Check out Jason’s suite of newsletters: https://substack.com/@calacanis
*
Follow TWiST:
Substack: https://twistartups.substack.com
Twitter: https://twitter.com/TWiStartups
YouTube: https://www.youtube.com/thisweekin
*
Subscribe to the Founder University Podcast: https://www.founder.university/podcast
Transcript
The music industry is a great example of really the market wins.
And like, that's one of the points I made in the tweet.
And I think it's important to think about when you think about this case, that I'm not a doomer in the sense of like, this isn't going to end AI.
Like, there's no universe where this case would end AI.
And so the result is, do we end up with a licensing scheme?
Like, is this Napster to iTunes, right?
But to your point that this is like, it's going to be a fight and it's going to be a lot of discovery, I would predict that.
This Week in Startups is brought to you by
Mev. Tired of the dev shop rollercoaster?
Mev is your reliable technical partner
offering a well-established software development process
designed to consistently deliver unparalleled value to their clients.
Get $30,000 off your first three months
at mev.com slash twist.
Northwest Registered Agent.
When starting your business,
it's important to use a service that will actually help you.
Northwest Registered Agent is that service. They'll form your company fast, give you the documents you need to open a
business bank account, and even provide you with mail scanning and a business address to keep
your personal privacy intact. Visit northwestregisteredagent.com slash twist to get a 60% discount
on your next LLC. And the paintbrush loan is the earliest startup financing on the internet.
No pitch deck, no business plan, no minimum time in business, and no warm intros.
Plus you get to keep your equity.
Visit getpaintbrush.com to see if you qualify for a $50,000 startup loan in less than two minutes.
All right, everybody, welcome back to This Week in Startups.
You probably heard about this major New York Times lawsuit against OpenAI, you know, the makers of ChatGPT.
This is really a groundbreaking lawsuit here.
I think this is going to be the most important lawsuit that we've seen in AI, perhaps in technology,
ever. And so I wrote a blog post about it. Some of you may have read it at my Substack,
calacanis.substack.com. One of the great things about the X platform,
formerly known as Twitter, is that you meet interesting new people. Well, one of those new people I met was
Cecilia Ziniti, and she is an actual lawyer. And she did an incredible breakdown on her
Twitter while I was writing my Substack. So I invited her to come here on This Week in Startups
so that we can break down what is happening in this lawsuit.
And this is an absolutely critical episode for all founders because you can get yourself in a lot of trouble if you don't follow the rules. And this is uncharted territory. I think you would agree. Welcome to the program, Cecilia.
Thank you. Yeah. I'm excited to be here. Thanks for having me.
So just your bona fides, as it were, you wrote a great tweet storm, by the way, and you have a background in legal. So maybe just share with the audience, you know, who you are and why you're taking the time to comment on this issue.
I'm a lawyer for tech companies, been in tech since I joined Yahoo in the early 2000s when they were still competing with Google and always been interested in the legal side.
And over the years, that's taken me different places.
I was at Morrison & Foerster, a big law firm, and represented Apple in Apple v. Samsung, which was a huge case of the day.
From there, I joined Amazon and they said, you have all this mobile phone experience.
I thought, surely I'll be working on the Fire Phone.
I get there, they're like, no, we're going to have the more experienced
attorneys on that. You're going to work on this device. It doesn't really work. It's called Doppler.
And that turned out to be Alexa. And it was a great career move. So I was the first lawyer on Alexa,
had a great experience there. And then went on to be a GC of different tech companies. You might
have heard of Anki, which was an Andreessen Horowitz company. It was an early robotics company. Spent some time at Cruise.
And then most recently, I was the general counsel for Replit.
Oh, wow. So what an incredible career thus far. Let's get into this case,
because this is a very unique case in the history, I think, of copyright.
And correct me if I'm wrong, having been in content my whole career as a journalist, publisher, Silicon Alley
Reporter, blogs at Weblogs, Inc.,
I've dealt with a lot of these fair use claims.
And I've dealt with a lot of copyright claims.
I've dealt with cell phone manufacturers, you know, emailing us, oh, my God, you have a leak.
That's our copyrighted information, all this stuff.
And so there's a lot to unpack here.
But when you saw this lawsuit drop and you started unpacking it, how important is this lawsuit?
And what is the nature of the lawsuit?
For people, you know, who, you know, maybe are new to this, just briefly, what is the nature of this lawsuit?
What is the New York Times claiming here?
Yeah.
So New York Times has a content library.
One of the few content holders, more prolific than you, Jason, perhaps, going back to 1851, right?
So they reported on literally the Civil War, right?
So that amount of content, millions of articles, the allegation is that those articles were used in a couple of ways by OpenAI without consent.
So one way is training, right?
So in the complaint, New York Times actually breaks down that it was a decent percentage of the articles used to train OpenAI.
I think, you know, in the like one or two percent, something where it's actually measurable.
You know, one random blog post that I wrote, you know, not going to move the needle.
But the entire New York Times archive, you know, maybe it does and that's the allegation.
So that's one.
The second theory is more on the output side.
So when you go to ChatGPT and you ask for an article, they've got this exhibit, New York Times made this exhibit, Exhibit J.
If you look it up, it's great.
But essentially, it has 100 instances of somebody putting the first paragraph of an article in,
and ChatGPT gives you the rest.
Verbatim, you know, like almost, you know, one or two word changes.
But that is kind of a different, a different theory.
And it triggers the law differently.
I can get into that if it's of interest.
But that's really the core of this.
Yeah.
And so the nature of fair use, I am very familiar with because I've had many people
claim that we use their content, let's say, in a blog post or in this very podcast,
where we might use a short snippet of a song where I'm doing commentary on it,
or a clip of a news event that occurs.
And so I'm pretty familiar with the four-part test,
but maybe you could run our audience through the four-part test
because OpenAI, I think, believes that what they're doing is fair use.
And then as part of that, I don't know that training as a concept has existed in the
copyright law.
This idea of training something, I believe, is novel to copyright law.
Am I correct in that one?
That's right.
There hasn't been, at least, an adjudicated case on training yet.
There have been a lot of fair use cases in technology, and I think OpenAI and the New York Times will each point to the ones that go their way,
but there hasn't been one on training that I'm aware of that's gotten that far.
But in terms of the fair use test, it's a super fun one.
It's four factors, as you said, but they are non-exclusive, and it's very squishy.
Judges apply it, not juries. Courts are directed to balance the interests, and they can consider other
factors, and no one factor is, fancy word, dispositive. No one factor decides. So it really is something
where there's a lot of discretion, and the optics of it, like whether the judge wants to
rule your way, tend to matter.
So the four-part test is not you can take 5%. It's not you can take 12%.
It's not you can monetize it a little bit over here.
It's open for interpretation.
Exactly.
And you have to as a judge when you make these decisions look at the totality of those four parts.
So let me get into those four parts and then go into some examples.
Let's step in.
I have a, I actually have a slide.
Should I pull that up?
All right.
Awesome.
Yeah.
Let's do it.
I mean, wow.
I love a guest with a slide deck.
I love it.
I wanted to be a law professor and then decided other things were more lucrative.
So this is my law professor side dying to be free. But essentially, fun thing about fair use: the original fair use
case was in the 1800s, and it was about writings that George Washington had. And another biographer
copied 353 pages of Washington's original writing and lost. It was not fair use. 350 pages was too
much. And then that opinion from the 1800s got codified into the Copyright Act. So you're
there. Let's go through the four factors. So I used emoji because this is a new generation. The Zoomers
will do that. But essentially the first one is the purpose and character of the use.
And this is really where all the play is in technology cases. So I've got here, I've got the emoji
for theater for like, how are you using it? The emoji for a video game controller, because
video game cases are actually pretty instructive here. And then the emoji for the web, right? So this is
where what the court considers here is: how is the infringer using it?
Are they making a commentary?
Are they making a joke?
Are they making a parody?
Famous case: Perfect 10 versus Google.
Perfect 10 was a pornographer and said to Google,
hey, your thumbnails are infringing our content
because they're literal copies that people see.
Google defended, saying,
we're using this for a different purpose.
You're not trying to be pornographically
entertained when you're doing a thumbnail search. Maybe you are, but it's not a good substitute.
Google won that case on this factor.
So that was for Google search. Now, let's go through some of those
cases. If you were doing commentary, there have been many cases where people will take a movie or
there'll be a documentary film about a movie or might use movie clips. And if you're doing
commentary on that, even if it's commercial, there's some leeway allowed for that. And then there's
parody. So if you made a parody movie like Spaceballs, the famous Mel Brooks parody of Star Wars,
you can make a parody, you can make a joke of something. And the test, I believe, like the
subtest here is the confusion of the audience. Does the audience know who the original author is or not?
So if Saturday Night Live does a parody of Harry Potter for two or three minutes,
nobody is confused that that's actually Harry Potter. I mean, it's pretty obvious, right? So
this is part of it. Whereas if I wrote my own fan fiction of Harry Potter and it was really good and it was a full book, you might be like, wait a second, I can't tell if J.K. Rowling did this or not. So there's something about the audience that matters here in this purpose as well, correct?
Yeah. So fanfic is a great example. Actually, the examples that you gave implicate not just the audience's view, but really the full four-factor test.
And the factors are kind of like an inverse scale, you know, where one goes up, another one goes down.
But in the case of Harry Potter, great example.
So J.K. Rowling sues fan sites.
And she wins because her stuff is so creative that, you know, if you have a fan site that says,
okay, this is Hagrid and has big chunks of paragraphs.
And they're getting all this revenue, lots of clicks, you know, SEO optimized website.
That's a fan site.
J.K. Rowling testified and she said, I mean, so creative
to even testify this way,
she's like, it's as if someone came into the plum pie
I had cooked and picked the great plums out.
And so it was like the creative aspects.
Those were kind of what triggered the case.
A lot of founders are great at going from zero to one.
This takes vision, creativity, hustle, all that great stuff.
But those same people often struggle with going from one to 100.
If you want to scale and you want to do it efficiently,
you're going to need process and you need structure.
And that starts with your product.
So if your startup needs a more structured engineering approach, you need to check out MEV.
Mev helps businesses build and maintain their products faster and more effectively.
They'll make your product more stable, scalable, and secure.
They'll build custom infrastructure that scales, and they can help build additional features for your product and more.
For each of your needs, Mev organizes an entire tech team comprised of senior engineers, delivery managers, DevOps, QA, and designers.
And they've been in business for 17 years,
and they've helped the following companies build complex tech products: Cartier, Toot,
and Ozempic maker Novo Nordisk, my favorite.
So let Mev help you increase product velocity and make product engineering more sustainable.
Mev is going to give you $30,000 off the first three months.
That's right. Get $10,000 off per month right now at mev.com slash twist.
That's mev.com slash twist for $30,000 off your first three months.
Interesting on the parody side, you know, the Supreme Court weighed in.
I have a sound clip if you're interested.
Let's do it.
Sure.
Let's do it.
Because I remember when I was coming up in the industry, I always found this one, fascinating.
There was a game called Myst.
It was a famous game.
And then somebody made a parody of Myst and they sold it as packaged software.
And people got really, they weren't confused by it, but they had to make some concessions, I think.
And most of these lawsuits, am I correct, are settled out of court.
They don't go all the way.
People just say like, hey, this is not reasonable.
This feels unfair.
And then the other party says, okay, well, if we put parody on it and we made these changes, would that be okay with you? And they kind of negotiate their way out of it.
Yeah, exactly. It's pretty rare for these to go fully to the Supreme Court, or even just to court in general, because usually the parties work it out. They're expensive, unpredictable, et cetera. This one did go all the way. Commonly, it's record companies or your Sonys of the world, your New York Times, that are pretty big and have kind of the pockets to do it or a big financial reason to push it.
Right. They have a lot at stake.
Exactly. Exactly.
So in this case, they have to hold the line in some ways, right? If you don't defend yourself in an instance, then the next instance it becomes harder to defend yourself. Is that correct?
That's right. Also, commonly people who are trying to claim fair use sort of know in advance. And that's certainly the case with OpenAI. Like, they knew the copyright issue was coming. You know, one of the things the complaint says is that
their board member, you know, Helen Toner, the one who departed, one of her issues with Sam
Altman was not addressing copyright properly. So if you know that, then you hire people like
me to help you, like, okay, where are the edges? How can we win on these different factors?
And so, yeah, so in this case, this one is Pretty Woman. So the classic Roy Orbison song,
the guitar riff is pretty recognizable. And 2 Live Crew made a version of it that I can play part of.
Sure. Let me see if you can get the audio. Let's see. Yeah. Is that coming through?
Yeah, it's coming through. And now we're going to get a copyright claim here on YouTube.
Exactly. Exactly. No, I'll just play just enough to understand. We will defend it as fair use because we're doing commentary on this.
Exactly. Yeah. Right. That is, I mean, it's literally Pretty Woman. It's literally the same, right? You know, it's a cover song in a way. It's like a cover song. Yeah. So what they did, and this is kind of interesting,
is, you know, instead of sort of the "oh, pretty woman" lyric repeated, they made it "oh, hairy woman," "oh, bald woman."
Like, they made it.
It's kind of a raunchy song.
But the point is, is like, it's different enough that the argument was made that, like, this is a commentary on, you know, sort of society and wealth.
It talks about, you know, there's different, you know, you could argue that 2 Live Crew was just in a different societal place than Roy Orbison in the 60s or whenever he wrote the song.
And so it went to the Supreme Court.
And this case stands for the proposition that a use can be fair.
It can be parody or commentary, even if you're making money.
Like, you know, this is 2 Live Crew.
They were not professors.
They were not, you know, just like writing a blog no one would read.
They were selling music.
And so that case was important.
And it got really into the four factors.
So it's considered one of like the canonical cases.
And how did that case work out?
Did it wind up in a settlement, I would assume?
You know, after the Supreme Court ruled in 2 Live Crew's favor, I'm not sure what happened.
I think the Roy Orbison estate lost rights to the work or something like that, but it turned out to be a sad thing for Roy Orbison in the end.
And now when people do do samples, there is, and the music industry is the toughest.
They're the most hardcore because it's a small group of people and they work together in unison.
You know, let's be honest, they're just super sharp elbowed people.
They've always been in terms of IP.
So they have said, like, hey, listen, you want
to do a cover, here's the mechanicals and the licensing for that. Hey, you want to use a sample,
you have to have permission in advance. And then, hey, you can do a sample. And then Kanye just did a
Backstreet Boys cover. And people were wondering how he got the rights to the sample. It wasn't a
sample. It was a cover. And so they have their own little mechanics and traditions in the music
industry, or standards, right, that they've established for this.
That, I mean, the music
industry is a great example of really the market wins. And like, that's one of the points I made in the
tweet. And I think it's important to think about when you think about this case, that I'm not a
doomer in the sense of like, this isn't going to end AI. Like, there's no universe where this case
would end AI. And so the result is, do we end up with a licensing scheme? Like, is this Napster to
iTunes, right? So Napster comes out, you know, in the time I was in college. It was like,
you could get the entire Beatles library from somebody in the dorm next door.
Like, you know, it was clearly like it felt sort of bad.
Yeah, it felt like stealing.
It felt like stealing, right?
Well, because there was no difference between downloading on Napster or downloading
on iTunes or buying a CD.
It was this, you did one in place of the other.
That's exactly right.
And iTunes came up after, right?
It was like, okay, this is a legitimate way to pay for digital music.
and people like, you know, me or you or whomever,
it didn't feel like stealing when you paid 99 cents.
And it wasn't when you paid for it on iTunes or Spotify or wherever now.
And that industry has come out of that.
So you can see, you know, with OpenAI,
they could have a system where they figure out kind of the provenance of different
outputs and pay in some way.
Or, okay, you want me to do a verbatim Luigi, all right, there's your five cents to
Nintendo or whatever.
Like, this is not the end.
Tech will find a way. I'm very confident of that.
This argument by technologists that it's too hard to do attribution is nonsense.
I mean, if you can...
Honestly, I agree with you.
Yeah, I agree.
I mean, if you can create this incredible AI that's able to make images,
you should be able to figure out what was the source of those images.
And if you can go find these libraries of content to train it on,
and then train these very sophisticated things and set up 10,000 computers or 100,000
computers and billions of dollars worth of computers with thousands of engineers, I think you can figure
out attribution. It's not that hard. And in fact, there are services that are already out, that are in the
ChatGPT mode, which actually do do a citation. So the market has already proven it's possible.
Let's talk about this one piece of the four-part test. You've got the purpose and the character of your
use. Is it parody? Is it education? In education, if you're not making money or in society,
if you're doing commentary, you get a bit of protection. We want that in society. We want Mel Brooks to be able to
make jokes. Got it. We want a professor to be able to show, you know, Star Wars and give commentary
in a class in a non-commercial setting for people to learn. It's not going to compete with people,
right? So that's all really good stuff. That's good stuff that we want in society. We also want
people to be able to make fun of things and do commentary. So if Jon Stewart or John Oliver
want to take, I don't know, a talk that some, you know, President Trump or President Biden did,
and they want to make fun of it and use parts of it,
well, we want them to be able to be mocked in a free society,
and that doesn't kind of conflict with any.
So we understand those.
Yeah, kind of a fun, one fun little point on that is that it comes from the Constitution.
So copyright law is actually, it's federal law,
and the IP clause is Article I, Section 8, Clause 8.
And it says to promote the progress of science and the useful arts,
Congress can secure limited monopolies for authors and inventors.
And that initial to promote the progress of science and the arts, that's been used by courts
to limit copyright.
So copyright could go really, really far.
Like you could allow copying never.
But that idea that it really is about societal progress, that's also what helps tech
companies, right?
So that's also why Google won the thumbnails case because they're like, look, you know,
it's super useful to have search.
How else are you going to have image search
if you don't know what image is actually in the results?
Right.
And they also had the argument,
I think in that case,
that they were doing very tiny images,
smaller percentage of the original work,
and that they weren't taking every image.
I believe there was like,
we're only taking a small amount of it.
And then they also, I think,
had the sort of ultimate rebuttal,
which was you can also,
I think they created robots.txt
around that time,
where you could just say,
you know what,
I don't want my site indexed.
And Google was like,
if you don't want to be in the index,
you don't have to be.
Exactly.
Perfect 10 then could just not be in the index and problem solved.
So then they had to make the trade of, okay, give a little bit of my content, a thumbnail
image of, you know, some photo, uh, of an adult nature.
And then I, but I get some traffic.
So maybe it's worth it.
And then the copyright holder can make that decision.
Just like I think Star Wars, Lucas was very cool with fan fiction as long as it, and fan
movies even, as long as you didn't try to monetize it.
Starting a business used to be a pain. You needed a lawyer. There were hidden fees. It was a mess.
Now, with Northwest Registered Agent, it only takes 10 clicks and 10 minutes.
Northwest provides everything you need to start and maintain your business.
Every LLC, corporation, or nonprofit that Northwest forms comes equipped with registered agent service,
a business address, a website, and hosting, email, a phone number,
and this is all covered by Northwest's privacy by default.
Again, your full business identity will be live in 10 minutes and in 10 clicks.
So here's your call to action for $39 plus state fees.
They'll form your LLC, corporation, or nonprofit, and launch your business in just minutes.
Visit northwestregisteredagent.com slash twist today.
That's northwestregisteredagent.com slash twist today.
So if you go on to YouTube right now, you can watch all these really creative kids running around dressed as
Jedi fighting each other and releasing episodes, and
they don't get cease and desists.
But J.K. Rowling might say,
hey, with my art, I want a different standard.
Exactly. Exactly.
And, you know, OpenAI, to the point that, you know,
you get lawyered up and you kind of realize what you have to fight about,
OpenAI has been savvy about this.
And they announced in the summertime that they'll respect robots.txt going forward.
And so these kinds of systems where you're giving the owner control,
that's going to be the kind of thing that OpenAI will argue, you know, matters here.
And they're doing that, you know, partly informed by precedent, but partly also because from an economic standpoint, it's the right thing to do, okay? You own your content. You have this bundle of rights. You want to license your content to make a Harry Potter restaurant. Fine. I mean, that was another case, actually, funny enough, somebody tried to do a, it was a Costco. I think it was a Casa. Yeah. So it was a restaurant actually of SpongeBob. So SpongeBob, there's the Krusty Krab, which is a SpongeBob character. And
there was a restaurant in Houston called the Krusty Krab.
And they tried to say, oh, this is social commentary, but it really wasn't.
It was just a SpongeBob restaurant.
And so Viacom went after them and won.
So now the percentage of the work matters in this fair use test as well, correct?
That's right.
Yeah.
So it's called the second factor is the amount and substantiality of the portion used.
Okay.
And I can pull up.
I can pull up the slide.
That's the third factor technically.
I got confused here.
I'm also not an AI.
One second.
We've proven it.
Exactly.
Like lots always.
So nature of the copyrighted work, that's factor two.
It doesn't get a ton of play because it's, you know, most work is creative.
But in this case, New York Times anticipated this issue and they've got a long thing in the
complaint about how creative their journalism is.
And they're right.
Like, you know, they spend a lot of time.
I mean, obviously, you know, you've been a journalist.
It's not just pure facts.
There's a lot of ways.
And funny enough, I didn't expect this,
but in response to my tweet thread,
there was a lot of political things.
It was like, oh, my goodness,
I can't believe the New York Times
is such a chunk of OpenAI's training data.
That's why, you know,
ChatGPT is so woke.
Exactly.
Yeah, exactly.
But this is interesting.
Facts and data points are very hard to copyright.
So if you, there's like a website I use often,
which makes beautiful graphs.
I forgot the name of it, but it comes up all the time.
It's like Our World in Data or something like that.
There's Our World in Data, and then there's another one that comes up in SEO.
And all this company does, and they charge like a subscription for it,
is take other people's data and make a very beautiful standardized chart.
You know, I was looking for some market maps for one of my investments or some market
sizing and it had, you know, it was like something super obscure.
It was like the world button market or something.
And it had like all the countries.
Yeah, yeah.
Yeah.
And then you look in the credit.
It says source.
you know, this is, you know, Pew research, Pew data.
This is from this data.
So you can literally make any chart you want on anybody else's data.
As long as, and I think in terms of fairness, you just put that that's the source of the data.
But data is not copyrightable.
Is this correct?
Like facts and data are not copyrightable.
Yeah, so a couple of big cases on that one.
One was Fice versus rural telephone.
And it's kind of a little antique, but essentially one telephone maker.
took the phone numbers and names from another, made their own phone book, put their own ads in it.
And essentially that case was pretty important because the Supreme Court said, look, copyright is not about labor.
It's not about the work you put in. It's about the creativity. Remember, to promote progress of
science and the useful arts? Is this really about creative progress? Is copying of a telephone book?
Now, there's other ways, maybe contract or other ways that you could go after. But in your scenario,
Pew, you know, if it's reported as a fact, you know, percentage of Americans on the internet every day or whatever it is, then you would be able to use that compilation. Now, there's some nuance around creativity in the compilation. So the other big, big case on facts is Oracle versus Google, right? So that went to the Supreme Court. Google copied Oracle, I think it was declaring code. And so essentially in order to be able to have Java on Chrome, they did that copying. And the
Supreme Court that was heavily litigated over years.
Yeah, I think 10 or 11 years, but any event, the Supreme Court found for Google in that
case.
So the nature of the work, yes, Statista is the name of the website that we sometimes use.
That's exactly what it is.
I've used it before.
And then there was another one, eMarketer, and they've gotten in all kinds of like legal
letter kind of trouble, I believe.
I remember seeing it.
I'm not sure which site had that.
But then like, you know, other kind of re-blogging sites started doing the same thing.
So if you want to make a great business, you can just
take other people's facts and make beautiful graphs out of it.
You see people do that all the time.
But that makes sense.
And then scraping data, there was an Israeli company that was scraping LinkedIn data.
And they were saying, hey, this is just facts.
That's another area, scraping and fair use there.
I don't know if you've seen many cases there, but they, yeah, then you get into international
jurisdictions, like what people think in Japan, India, you know, the Middle East and Europe
could be very different.
The jurisdiction could be very different in how you use data.
I think LinkedIn and Microsoft
sued this Israeli company and lost.
Yeah. Yeah. It's interesting. I mean, with scraping, you know, thinking about sort of the
startup angle, some of it is also contract law. Like I've seen scraping cases get on like
you're literally trespassing. And this is why also, you know, some of the technical means,
like, you know, if you're scraping in such a way that you're like DDoSing the site or you're
hitting it so much that, of course, there's other claims against you. And, you know, every
website's terms of use has an anti-scraping clause. And I've started to see in my practice,
a lot of companies now are putting, you know, it's against our terms of use for you to use our data for training. And, you know, you can imagine, you know, in vertical AI, you know, people doing, I don't know, AI for doctors. There's a website called Doximity, which is like, it's the LinkedIn for doctors. Okay, if you scrape that content, you know, maybe you're individually doing it. You're breaking the terms of use, especially if you're doing it, you know, logged in. So there's other theories. But yeah, the LinkedIn
case was a big one and scraping overall, you know, certainly in e-commerce, it's, everybody does
it.
Yeah.
But knowing the price of a product, the price of a product across 10 different websites
across 100 different days doesn't feel like the nature of that copyrighted work is not like
some artists invested a lot of time in it.
Now, if you sent 10 people to the front in the Ukraine, or in Ukraine rather, sorry,
and, you know, you spent a million dollars putting them there for six months, you know,
this is a whole different ball of wax. There's a lot of work. And that's what the New York Times is claiming here. The amount and substantiality of the portion used, that's the third part of the test. What does that mean?
Yeah. So this is getting at, are particular phrases copyrightable? So interesting one here, Taylor Swift with Shake It Off. She said something like, players gonna play. And there was a rap song called Players Gonna Play some years prior. And they sued Taylor, but she won, or at least the case went away, they agreed to drop it.
Because there's not that many ways to say that concept.
So there's this thing called the merger doctrine.
And this is actually an issue where if you,
if there's only so many ways to do something,
then you can't copyright that thing.
Yes.
But this is a fun one.
And I have a visual on this one that I think is funny.
And it's actually a doll.
It was a Seventh Circuit case.
So that's the circuit over Chicago.
And there was this company talking about e-commerce that made apparently very
lucrative to the surprise of the court, which they say in the opinion, but basically farting dolls.
Like you buy it at, you know, at the mall or wherever, and you get a doll and it makes a rude noise.
The doll on the right was basically that the makers of that doll had gone to like a toy show or something, seen it and copied it.
So, copyright suit brought by the makers of the guy in the green chair.
And the court said, look, the concept of a farting doll, that's not copyrightable.
No. I can go make one. But the court has this amazing paragraph in the opinion that's like, they could have given him a mullet, they could have given him flannel, they could have done it, they could have put him standing up. They could have had him wearing boxer shorts, you know, whatever. Like the point is these little details, too substantial of a portion of the original was used and it was not associated with the idea. Like had nothing, like the idea itself, you can express it a bunch of ways. So people in this case, in the OpenAI case, you know, the
art stuff is super fun because it's visual. So there's been a whole meme and I did another tweet
on about it about Super Mario and Luigi. And, you know, if you ask for an Italian plumber,
that's what you're getting. Yes. You know, there's other ways to have an Italian plumber, right?
Maybe he's, maybe he's really stylish and wears Prada, you know? So you could make an anime version of it.
But the fact is, the most iconic one that has had a lot of money invested in it was by Nintendo.
Listen, not every business is venture scale. If you're not,
you won't be able to raise money from VCs.
We all know that.
And not everybody has a rich family member to do their friends and family round.
So if you want to jumpstart your business with $50,000, let me tell you about Paintbrush loans.
Paintbrush has created a new kind of loan product.
They connect idea-stage startups with bank capital.
So you don't need to give up any equity, and there's no pitch deck or revenue required.
And the Paintbrush loan is available at the idea stage.
In fact, you can apply the moment you incorporate your company.
Monthly repayment is a flat, predictable amount, which makes cash flow planning really simple.
So here's your call to action.
If you're a founder in the U.S., go to getpaintbrush.com to see if you qualify for a $50,000
startup loan in less than two minutes.
That's getpaintbrush.com to see if you qualify in less than two minutes.
One of the things I tell young founders or people in content is like, if it feels unfair,
then perhaps it is.
And you have to have empathy and take into account what the other party is going to think,
their opportunity would be.
And I think this gets us to the fourth part of the test,
which is if a new product or service is going to be made from J.K. Rowling's books
or from the New York Times Archive, who deserves that opportunity?
Am I correct?
That's the fourth part of this test?
Exactly.
And you're correct in two ways, both on the test and on the feeling.
Like, you know, I've been practicing a long time.
And a lot of these cases really are, like you said,
Napster felt kind of wrong.
And it kind of was, right?
And so it does turn on that.
But in terms of the factors, let me pull that back up.
And I can, I was very proud of my emoji.
So I can.
Emojis are great.
I could show you the emojis here.
But yeah, so basically like, you know, you see the flying money, but it's literally like,
what is the market for the original?
Yes.
And the value of that market.
And who gets to, to exploit that, right?
So intellectual property is similar to regular property, right?
If you have a piece of land who gets to put a hotel on it, right?
If you have this really juicy piece of land, right?
So similarly,
here, New York Times got at this factor by saying, OpenAI has already made deals for this. They know how to license data. Like they've already done it with Politico. They've already done it with the AP. It's not like there's no market for this. And so, you know, others in the thumbnail case, that was harder to prove. There was evidence put forward, oh, you can use the thumbnail for, you know, at the time we had those Nokia flip phones with the tiny little lock screen. And it was like, okay, there's a market for that. But evidence came out in that case that those were fake licensing deals done just
for the litigation.
Here, it's clearly not.
So, yeah, so that's another factor.
And then, but it's not, like I said, it's very squishy four factors, kind of, and
they're not exhaustive either.
So, you know, a court could say, you know, there's four factors.
Then I'm going to introduce a public good factor, you know, like basically make one of.
For the effect of the use on market and original value, this is where I'm going to do a
follow up post to my original post, which is I pay as a user, 20 bucks a month or
so, 20, 30 bucks a month for New York Times, and I pay 20, 30 bucks a month for ChatGPT-4.
I'm paying for both.
And I recently was going to, and I'm a huge fan of the Wirecutter, and I am like a crazy
product research guy.
I just love researching products, restaurants, et cetera.
I use Yelp, I use everything.
And I love Wirecutter.
In fact, I tried to buy Wirecutter, invest in it, back in the day before they sold it to the New York
Times.
And so I did a search for coffee grinders and some other stuff.
And I actually did it.
I believe ChatGPT and, call it,
Claude and a couple other ones, and I was just testing it.
And it was pretty clear that they got their information from Wirecutter because, you know,
it was kind of like the answers were very similar.
I am like, I think, the tip of the spear here. If I get my New York Times and my Wirecutter
from ChatGPT-4, I might cancel my New York Times over time.
Is it not the case that the product OpenAI built, the ability to use a chatbot to talk
to the archive of the New York Times, that is the New York Times' opportunity, not
OpenAI's? Yeah. So, you know, the example I use, you know, Martha Stewart, great media
conglomerate, and she's very, very savvy, very tech forward. She was talking a year ago,
right after ChatGPT came out about creating Martha AI that you could talk with. Because she's
similar to the Times. She's got decades of really high-quality content that's in a particular
voice, right? And so, yeah, that's one way. Another way is New York Times made the case in the
complaint that they calibrate very carefully what's free versus paid, right? You know, the amount of
gift things that you have. You know, if you click from Instagram, sometimes you'll get the gift
version. Basically, like, that's their, like, the rights holder's, essentially, property to exploit is
how they would say and how they subdivide it and where they put that line, how much admission they
charge, all of those things they would say are within their rights. Now, OpenAI would probably
make similar arguments to you that, okay, under that first
similar arguments to what they make out of the first factor, which is, you know, it's a different
thing. It's a new, you know, having an LLM specifically a very large language model trained on that
number of parameters, the amount of investment they've made, they've changed it into something
different to where it has a different purpose. You're going to chat GPT to have generation as opposed
to have, you know, pre-made research on a particular thing. Their claim would be, and I'm trying to
take their claim seriously here, in fairness. Their claim is, hey, we did this first. We made a
language model first.
Therefore, since the New York Times, they didn't get to it yet, this is new because doesn't
the New York Times have an unlimited amount of time to exploit their own content?
Yeah, and they could choose not to.
Or they could choose not to.
But so if Disney said, you know what, we bought Marvel and, you know, we haven't made a
Marvel theme park ride yet.
That doesn't mean somebody else gets to make the Marvel theme park ride.
Exactly.
Yeah, no.
So the time aspect, if I, if I mentioned a time aspect, that, um, that,
would be I misspoke, but essentially-
No, no, you didn't mention it. I was just building
on your thoughts on it, which is they're saying,
hey, we spent all this money to build this thing.
It's like, yeah, you did.
We are planning on building it as well
at some point. Therefore, it's our
opportunity. So I think that one fails.
That one may fail. Where I think they could say is
like, we're not trying to replace
the content. They could say
our aim is
to just merely have
the best language model possible.
And therefore, you know, the literally more, the higher volume of text that we use kind of the better.
And, you know, it's going to be more like the Java, the Oracle case that I mentioned where, you know,
Google's argument was like there's only, you know, Java programmers already know how to declare these variables.
We're going to copy the declaring code to literally advance the progress of engineering.
And so OpenAI could say something like, it's a different thing.
You know, we are, we use the New York Times content and other content for training
to advance kind of the state of the art of the actual LLM, how it generates words, how it's better.
And people can kind of see this. And what they would say is like, look, GPT-3.5 and GPT-4 are very different.
GPT-4 is much better because we did more training on more content.
Ergo, it's not the actual content itself or the creativity of the content.
It's just the fact of having content.
So that's another way they could take it.
Based on your gut, let's say this goes to the
mat and we went through this four-part test. Based on your gut, percentage-wise, New York Times wins
their argument that you can't train on our data and they have to get an injunction. What are the chances
that happens? I mean, I'm really putting you on the spot here. The odds of an injunction are very
slim. The standard for an injunction is that it causes irreparable harm to whoever it is, the plaintiff.
And it's harm that cannot be fixed with money. And so there's very few harms, really, that can't
be fixed with money. And so that and then the test for an injunction is another four-factor test.
And so when it's sort of close and when you have a technology that definitely has societal
benefits, so, you know, OpenAI will say, look, we've got people, you know, diagnosing things with
ChatGPT, we've, you know, saved marriages, whatever, all the, all the amazing stories about
which honestly, like, it's an incredible productivity, but, you know, I use it every day. Like, I'm a
huge user of it. And so just on that societal benefit, I would say very, very
unlikely. Injunction, very unlikely. So let's work backwards from that. Injunction,
less than 10% chance. But I would say less than 10%. I think that's probably right. But not zero.
Non-zero. Not zero. I mean, you could have something very strange like, you know, like in the Apple
case, the Apple patent case, it went sort of all the way to Biden and stuff. So you could maybe,
but I think it's unlikely. So if they were to lose, then you would be
in the damages, but then they would also have to remove it, right?
That's a possibility, is that they have to retrain,
the settlement could be that they have to retrain things and take the New York Times out of it.
Yeah, I mean, it could be, that being said, we, you know, 4.5, GPT-4.5 is rumored to be coming out.
And so it could be that, you know, they sort of skip straight to that.
And they've known about this case for a while, like the complaint says they've been
negotiating since April. So my guess is, OpenAI has probably already
kind of firewalled off the New York Times content.
Got it.
New York Times, I think, 535 other journalism publications,
everything from, you know, down to like the St. Louis Post-Dispatch have put themselves
on that do not train list.
And so I don't know, but it's very tough for an LLM already developed, right?
It's back to the J.K. Rowling example.
Like, you can't put the plums out of the case.
In this case, it's almost like, I don't know, it's a baked cake and it's like the vanilla.
Like, how are you going to get the vanilla out of a baked cake?
It's not a thing.
If we know that they trained it on one or two percent in the first versions,
and they should be able to determine that because there's going to be discovery in this case.
And this case is going to keep going.
I don't think there's a set.
I don't believe there's going to be a settlement.
I think they're going to take this to the mat, the New York Times.
Because I think they regret not taking to the mat with Google back in the day.
So this is, I think, existential for them where they view it as such.
Therefore, they're going to go to the mat.
Therefore, there will be discovery.
And in discovery, there will be Slack messages or emails or conversations about
what are we going to include?
And they're going to have that open crawl
and that open crawl is going to be plain as day.
What they put in there is going to be in a hard drive somewhere.
And then there's people talking about it saying,
the New York Times is really high quality.
We should move their weight up.
And we should make this more important than, say,
Business Insider, which is a lot of, like, fakakta nonsense.
And then, you know, oh, and then there's like 4chan or Reddit.
Like maybe we'll make those a little bit, you know, less valid or maybe make them more
valid, who knows, for, in the case of Reddit. So that's all going to exist in discovery. And that's
going to be super damaging. Is it not? And the discovery part of this could be explosive.
Yeah, I mean, it could be super damaging, but it also could be helpful, right? So, I mean,
OpenAI, they went for the, the nonprofit model, in part because they saw, I mean, copyright is
one flavor of issues, but they saw certainly societal issues. And so, you know, I've done, you know,
interacted with OpenAI.
Replit had a deal with OpenAI going back to 2020.
So they're pretty thoughtful.
So I mean, yes, you could get,
but any discovery is always a wildcard.
You could get crazy emails.
In the Google case that I mentioned,
the Oracle Google case early on,
there was like $150,000 worth of litigation
over one email from an engineer saying,
hey, I don't see any way how to get out of this
without licensing from Oracle.
And he just sent that email.
He literally put it in there, yeah.
Exactly.
And it was an email
to, you know, like Larry and Sergey, and it was like the Lindholm email and it was like famous.
And this guy who was like, you know, a director of engineering was like, had his moment in the sun from that email.
So yeah, this is a reminder.
Never put, never discuss legal topics on electronic communication.
Exactly.
Or without a lawyer on the thread.
Like I actually use it to teach privilege because he, um, he CC'd a lawyer, but it wasn't to a lawyer.
He wasn't asking for legal advice.
He was declaring.
So it's a fun one.
But yeah, it's, um, it's, um,
I actually could show the email if you want.
Oh, yeah, that'd be great.
All right.
So the issue here, though, is OpenAI can't have their cake and eat it too.
They can't be selling billions of dollars in secondary
and claiming the nonprofit for the good of the world
when literally the same executives who,
I'm going to use the word liberated or took without permission,
the New York Times, took without permission,
are the ones cashing in their shares at a $100 billion valuation.
And pretty logical, if this thing is worth,
if the New York Times was 2% of the training data,
and if it was, let's say, the best of the training data,
and they said this is five times better than anything else,
okay, that's 10% of the good stuff.
Okay, 10% of $100 billion is $10 billion.
So we want $10 billion.
Or if this thing's going to grow to a trillion,
we want 10% of the value of the company,
and when it becomes worth a trillion,
you know, we're going to $100 billion.
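The back-of-the-envelope math in that turn can be written out as a short sketch. The 2% training share, the 5x quality multiplier, and the $100 billion and $1 trillion valuations are the hypothetical figures from the conversation, not numbers from the complaint, and the function name `implied_claim_billions` is made up for illustration.

```python
def implied_claim_billions(training_share_pct, quality_multiplier, valuation_billions):
    """Value attributed to one source under the rough reasoning above.

    All inputs are hypothetical figures from the conversation:
    a share of the training data (in percent), a multiplier for how much
    better that data is than the rest, and a company valuation in billions.
    """
    # 2% of the data, weighted 5x, is treated as 10% of "the good stuff".
    weighted_share_pct = training_share_pct * quality_multiplier
    return valuation_billions * weighted_share_pct / 100


# Hypothetical valuation of $100 billion
print(implied_claim_billions(2, 5, 100))    # 10.0 (billion dollars)
# Hypothetical valuation of $1 trillion
print(implied_claim_billions(2, 5, 1000))   # 100.0 (billion dollars)
```

This only restates the arithmetic as spoken; it is not a damages model, and a court would not be bound to anything like it.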
Yeah, I mean, and these kinds of cases, like, you know,
it's always, and this is where, you know,
I love being a lawyer.
Like I genuinely think, yeah, right?
I love you being a lawyer.
Oh, thanks.
No, it's like, I think it's like where the advocacy really matters.
So, you know, one of the things I worked on early in my career was Apple Samsung case and I was the associate on damages and figuring out, okay, what's the value of a rounded corner on a phone?
Like that was like, how do you assess that?
And so similarly here, there's a lot of unknowns.
We don't know how valuable OpenAI is going to be.
We all think it's going to be worth trillions, but we don't really know.
There could be some, Meta could break out or one of the others could break out.
Yeah, it can become worthless.
It can become necessary.
Exactly.
Exactly.
Or, you know, a lot of the research I've seen in the last maybe a couple of months is that AI can
generate its own training data.
So there's people literally saying that we don't even need the New York Times anymore.
We can use the AI we have to write the New York Times.
Exactly.
And so that's like another thing.
So it's really, you know, it definitely is not for the kind of faint of heart or
stomach, there's a billion ways starting anything. But here, let me show you this, this is the
Lindholm email, because it's so fun, one second. There's always somebody on the staff, while you pull it up, that thinks
they're an attorney, uh, like me, because I'm sitting here with my non-legal degree, but I've got a lot of
experience. And, uh, I always tell my team members, like, you're not an attorney. Do not talk about any legal
issues ever. We can have a phone call and talk about them with an attorney, but be careful. Okay, here we go.
Exactly. So this one. So this is from Tim Lindholm who was, um, I believe he was a, um,
an engineering director, and he sends it to Andy Rubin,
and then Ben Lee was a lawyer at Google.
But he says, context for discussion,
what we're trying to do.
And he calls it attorney work product,
which again, he's not an attorney.
So he tried.
He calls it confidential.
And then he says,
this is a short pre-read for her call.
And then this is the famous line that got a lot of play in litigation here in San Francisco.
What we've actually been asked to do by Larry and Sergey
is to investigate what technical alternatives exist
to Java for Android and Chrome.
We've been over a bunch of these
and think they all suck.
We conclude that we need to negotiate
a license for Java under the terms we need.
And that was the key issue in the case.
Funny enough, to your point of going to the mat,
Google ended up losing on this at the trial level,
but went up to the Supreme Court
and ended up winning over the needed copying.
But to your point that this is like
it's going to be a fight and it's going to be a lot of discovery,
I would predict that.
All right.
So what else are we missing here?
because you in your deck had some of the examples, I think,
is super compelling.
And because technologists, you know, you work with technologists,
they tend to, a portion of them think,
if I can technically figure out how to do something,
it's legal or it should be.
I don't know what to call this,
but like it's sort of might is right.
If I can technically figure out how to scrape your website
and create this or create that,
well, then it should be legal,
which is how the Napster folks felt.
Like, well, we tech,
and there's also the technical inevitability argument.
Well, it's going to happen.
And so we might as well do it.
Yeah.
Yeah.
There is something to that.
So one of the cases is, was an emulator.
So Sony is a very common, either plaintiff or defendant in IP cases because they have a lot of valuable IP.
And basically someone made an emulator of a PlayStation, an early PlayStation on a PC.
And the graphics were actually technically better on the on the PC.
And that case went to the Ninth Circuit and the emulator maker won.
Wow.
And there was a lot of copying involved.
They had to have, they had to basically reverse engineer the entire PlayStation to be able to do it.
And of course, they copied it.
Like the literal bits and bytes of the code were put onto, into the emulator.
And so there is something to that.
I mean, well, this is also the great irony of this is that while OpenAI is an organization you can sue, because it exists as an entity.
The open source community is a little bit harder to sue because they
don't exist as an entity, you have contributors.
So maybe you could speak to that, because if, let's say, OpenAI does lose this case or settle,
which I believe is what it's going to be one of those two things.
Massive settlement, nine figures, minimum is my prediction.
And, but it will not be disclosed, but it'll be at least nine figures and with some kind
of licensing going forward.
But even if you were to do that, what's to stop as, you know, all these open source
projects come out there and somebody decides they're going to roll their own model
as hardware gets better and better,
that they just rip the New York Times.
And you could buy the New York Times archive
probably from somebody in India,
in Manila,
in Israel.
There are scraping companies
that sell these things
on what I'll call the gray market.
Maybe illegal here,
maybe legal there,
maybe there's no laws there.
So maybe you could speak to that.
Do you think all this is moot?
Open source is an interesting angle.
I mean,
I think open source had its own,
you know,
one of the things that was interesting
in the wave of AI
regulation we've seen, you know, from the EU and others, was, you know, open source had a lot of
the same objections of like, you know, people had t-shirts with algorithms printed on.
They're like, okay, if we open source, then, you know, all these bad guys will get, we'll get the
code. But sort of the market worked out. Here, I think it's tougher. I think, you know, it's going
to involve calls by the rights holders. And then what I think will happen is what we talked about
at the top of the hour, which is like, as the tech emerges, a market, like, tech for the market
for it will emerge. We're going to get the iTunes equivalent. And I think there are some startups
being funded in there. There's still at this point. I don't think it's a, it's not a,
before this case, I don't think it was being talked about enough to be a problem with a big
enough market, but now it is. Here's a possible solution. Let me see what you think of this.
I buy ChatGPT for 20 bucks. And it says, um, if
you authenticate with your New York Times subscription,
so your ChatGPT, and I authenticate with my New York Times subscription.
Then it says, okay, you're going to use ChatGPT 4.5T for New York Times.
Yeah.
And so, but if you don't have the T and you do it on 4.5 and you say,
hey, Wirecutter, what are the best things?
It says, hey, you need to have a New York Times subscription.
So authenticate with that.
And then you say, hey, I want to make Star Wars characters, it says, oh, you know what?
You have to use OpenAI.
You have to use ChatGPT with Disney Plus.
So authenticate your Disney Plus, and now you can start to have fun with the Disney characters in DALL-E or whatever it is.
And then they could license that to the highest bidder.
Because when I, you know, if you use Hulu and you have HBO Max or NBA or use Apple TV, they just authenticate each other's subscriptions.
You have this sort of subscription, death by a thousand subscriptions kind of concept.
What do you think of this concept?
I mean, I think it's certainly that that shape of a solution sounds right to me.
Like, I think it's...
Technically viable, too, right?
I mean, it's technically doable.
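As a rough illustration of why this shape of solution is technically doable, here is a minimal sketch of subscription-gated routing. Everything in it is hypothetical: the `REQUIRED_SUBSCRIPTION` mapping, the `User` class, and the `route_request` function are invented for illustration and do not describe how ChatGPT or any publisher's login actually works.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: which subscription unlocks which licensed content.
REQUIRED_SUBSCRIPTION = {
    "wirecutter": "nytimes",
    "star_wars": "disney_plus",
}


@dataclass
class User:
    name: str
    # Subscriptions the user has authenticated, e.g. {"nytimes"}.
    subscriptions: set = field(default_factory=set)


def route_request(user: User, source: str) -> str:
    """Decide how to handle a request that touches a given content source."""
    required = REQUIRED_SUBSCRIPTION.get(source)
    if required is None:
        # Nothing licensed is involved, so the base model answers.
        return "answer from base model"
    if required in user.subscriptions:
        # The user has authenticated the matching subscription.
        return f"answer using licensed {source} content"
    # Otherwise prompt the user to authenticate, as in the example above.
    return f"please authenticate your {required} subscription"


subscriber = User("subscriber", {"nytimes"})
print(route_request(subscriber, "wirecutter"))
print(route_request(subscriber, "star_wars"))
```

The sketch leaves out the hard parts, such as verifying a subscription with the publisher and deciding which requests touch which source, and it says nothing about how revenue would be shared.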
The other interesting thing is there are a bunch of startups trying to do sort of like your digital life, right?
Where Twitter search is like notoriously terrible.
You literally can't find anything on Twitter.
And how often does it happen to me that I'm like, oh, I remember there was a tweet about that.
And then like, I can't find it.
So you can imagine an LLM that's actually trained on your entire everything you've ever consumed.
And then by the nature of if you've consumed it, then presumably at some point along the way you have the rights to.
Yeah, that does number two.
There's a million ways to do it.
And that's kind of the why I characterize the lawsuit as historic is that we're at this
moment where we don't know what the tech and the market solution to this is.
And it'll emerge.
It's just, you know, maybe not in the exact way we did it.
So another fun thing, and Andreessen Horowitz, there's an investing partner that she writes,
I think it's Connie Chan.
She writes a lot about China and media in China.
In China, when you buy a Kindle book or any kind of book digitally,
you pay by the page, right?
So it's not actually a thing.
So the fact that we happen to buy whole books here in the U.S.,
that's the market that emerged,
not necessarily a foregone conclusion.
So in your example, you could have,
you know, that you're like,
do you want Wirecutter?
And it could literally just be like the Wirecutter slice
of the New York Times thing.
Or it could be like by the query
or it could be some kind of rev share.
Like, you know, as you said,
music industry is super sophisticated on this.
I think, you know,
the words and kind of digital print.
Publishers, let's be honest.
Publishers are kind of dopey.
They've been dopey historically.
They've never really been smart about their approach legally.
They've never held the line.
They let Google run amok.
And, you know, Rupert Murdoch got it right.
He's like, Google's nothing without us.
If those publications had grouped together in that era and told Google, listen, you know,
the top thousand publications are going to no index unless you pay us a licensing fee.
and here's what we want.
Google would have paid it, I'm sure,
and they just never had the coordination
or the chutzpah that they needed to.
I think the New York Times today is so sophisticated
because they're a subscription-based business.
The move to subscription-based
makes them understand the value of their content,
and because it's subscription,
doesn't that change everything
on a legal and technical basis about this case?
The fact that there's a firewall,
maybe you could explain how the subscription wall changes this a bit.
Yeah, so strong plus one on New York Times having kind of jumped the digital divide or jump the digital, you know, evolution.
There, the New York Times food app would be, you know, it's a startup in its own right in the hundreds of millions in terms of revenue.
And, you know, recipes themselves are not copyrightable.
Obviously, the rest of it is.
But I pay for New York Times food and I have for since it came out because it's so nicely compiled and they do the, you know, 10 recipes to make for the new year.
and whatever.
Beautiful.
It's worth it.
It's a thousand percent worth it to you.
It's a thousand percent worth it.
But publishers and another example you could look to here is Kindle, right?
So they did one of the things that I point out in the thread and I think is, or, you know,
in the responses to my thread was so Amazon Kindle had the guy who's now the chairman of
Coatue Ventures, Dan Rose, was the head of business development for Kindle.
And he basically, oh, good.
Yeah.
He's done a bunch of, you know, did a bunch of deals with the publishers.
and initially he's public about this, Bezos said, don't tell them, we're making an e-reader.
And he's like, well, how am I going to get them to do deals with me if I can't actually say?
And so eventually they did.
But you're right that that was a moment where, you know, the tech company kind of had this power.
But what was different about Kindle was Kindle was still kind of unproven at the time.
Versus here we've got ChatGPT.
Clearly, it's a runaway success.
It was the fastest-
100 million people using it.
Yeah.
Exactly.
You know, they're at, I think it was 160 or 1.6 billion ARR now.
Yeah.
And so they can't claim poverty or this isn't a real business.
This is not a student project.
So, I mean, even if you, you know, made the argument, I don't, I don't know enough about publishers to know if they're dopey.
But even if they were, like, you can see the money.
I knew them.
They were for 20 years when they, the digital, but now they're super sophisticated.
The ones that survived, it's kind of like a Darwin thing.
Like, if you survived as a publisher, you're savvy.
Yeah, makes sense.
Very full stop.
Okay.
Now, in your deck, you had some other examples.
Is there anything else in the deck that's super compelling we should rip through here
as we wrap up?
Let's take a look.
I love a, I love a guest showing up with a deck.
Amazing.
You've turned out to be a great guest.
Thank you for coming on the program.
Yeah, super fun.
Happy, any time you have anything legal, happy to dive in.
There was a fun part of the thread where people were making Luigi fan art.
And you could kind of tell when the model was getting,
was being aware of copyright.
So this is kind of a fun one.
So I started getting errors that said,
okay,
put Luigi in the background of my ChatGPT.
It says,
I'm unable to create an image with Luigi
as it doesn't align with the content policy
for image generation.
Okay,
so this was like yesterday.
Yeah.
And then,
okay,
but clearly they didn't care about the Grinch
and Blue from Blue's Clues,
Coca-Cola,
and then I threw in in the background.
Hogwarts?
No, it's the castle from,
Downton Abbey. Oh, Downton. You're right. Yes.
That is the Downton. Yeah, yeah. So, I'm sorry, I've
heard. I didn't watch Downton Abbey.
Oh, it's amazing. No, I did.
It's so good. I'm being cheeky. I did watch it.
Okay, yeah. No, the movie itself,
if you just want the movie, it's pretty good.
So four copyright violations in one. Exactly. I mean,
you got trademark there with the Coke and everything. So yeah. And then,
you know, with the Grinch, it tried to actually
at different points. This is kind of funny. It would
try to actually do different things. I can see if I can
find you the, let me see if I can find the Grinch that it did. It did a Grinch that was,
let me pull up my GPT history. Always scary on a live demo. Yeah, always scary to pull it up.
You could have all kinds of interesting things. Exactly. No, no, no. I will pull it up because it is
funny. I think I asked for a green character that hates Christmas or something like that.
And what it did, this was really fun. So I was with my three-year-old and she wasn't fooled.
Like this one, she did not think was the Grinch. Like she said Grinch, but he's,
He's kind of different.
He's got like clothes on.
It was like a Sesame Street character.
Yeah.
You know, this is clearly not the Grinch.
But later in the thread, I'm like, no, make it mean.
That's like so clearly the Grinch, right?
Jim Carrey, yeah.
Yeah, and then, you know, also the Grinch.
But then this one, it kind of goes back to...
That's a Pixar Grinch.
It goes back to a Pixar Grinch, so it's like not really right.
And then this one is like a wizard.
Like, what even is this?
Disney, maybe a Disney Grinch?
I don't know.
Yeah.
So I asked for Nordic Princess Sisters, obviously,
the Anna and Elsa, you know, so kind of did that with the braids, you know, so.
Well, this is the thing.
You can, you can know the keywords very easily of these are IP.
So if you just said, hey, give me all the Disney characters, all the Marvel characters, put their names in here.
People ask for that.
Just tell them it's against the policy.
It won't do it.
Yeah.
Moana, it just says no.
If you just say, make me Moana, it won't do it.
Yeah.
And so I did this where I was trying to make my bulldog into Darth Vader.
And then it says, we can't, not going to, can't do that.
I said, make a Sith Lord bulldog.
And it's like, yeah, of course.
here you go. So I think they're trying to get this copyright thing under control, but the truth is,
especially for images, there's a finite number of styles in the world. And so it's very clear
that they have a Pixar style and they have a Marvel style. And they have stolen those styles.
Those are not their styles. So maybe you could speak to the concept of a theme or a style.
The Pixar style is unique to them. Is that defensible? And if you,
If you say, I want to make this in the style of Pixar,
should a language model that makes images be able to make you a Pixar character?
Should they be able to do that?
Yeah, I think the idea of the Pixar style,
should they be able to do in that style or inspired by?
I think so.
I mean, this is like, okay, you know, a round face.
And, you know, I actually represent, this is fun.
This actually came out.
One of the cases I worked on at my firm,
was Barbie versus Bratz.
So the founder of MGA, which makes Bratz,
worked at Mattel,
which is a very active rights holder.
They sued people for a lot of Barbie things.
And then obviously the Barbie IP is super valuable,
billion-dollar movie this summer.
So he worked there during,
and one of the defenses was that,
you know, it wasn't infringement
because there's only so many ways to make a doll.
So in the office,
really fun, we had these big dollheads everywhere,
and we looked at anime,
we looked at whatever.
And the case, also 10 years of litigation,
but should you be able to make a big-headed doll, like in the style of a Bratz doll or, you know, whatever?
Probably.
I mean, so I don't know.
I think it'll be tough where, you know, you can ask GPT now to give you a Taylor Swift-style song.
And it does a pretty good job.
So where it's something new, you know, a little bit better.
That's why the Exhibit J with a hundred verbatim things is so important.
So copyright law isn't going to stop, you know, make me a Pixar style character.
of, you know, jar of mustard or whatever.
Like, you know, like, whatever you want to pick.
It does seem that some people are confusing non-commercial use with commercial use.
So they're like, well, I could draw a Jedi bulldog.
Is that illegal if my daughter makes one?
Versus I'm charging $19.95 for a product to do this, and at scale, with $1.6 billion and growing in revenue.
So can you explain to people why, you know, these are two different things?
in the eyes of the law. Yeah, so that was actually a big issue in the Betamax case. So the Betamax case was
VTRs or what is now VCRs. And the funny thing happened, which is Disney was one of the groups that sued
Sony. And the Supreme Court held, you know, there is a substantial non-infringing use. That's the
language, which is time shifting. So you want to watch the game. You use your VCR, you record it,
and then you watch it later.
And there was all kinds of evidence.
This is how people were using VCR.
Nevertheless, Disney was one of the petitioners in that case.
I think it was less than six years later.
Disney was the single biggest seller of VCR tapes.
And so literally, like, the tech finds a way, right?
And the market finds a way.
And so in terms of commercial use, you know,
there were a bunch of people in the comments
and some beautiful article in, I think it was The Guardian,
about, I can't remember his name,
the guy who came up with Mario, the game designer.
The famous guy, yeah, I know who you're talking about from Nintendo.
Shigeru, I think.
Anyways, saying that he was the architect of children's dreams for a generation,
which is like a beautiful quote.
But essentially, like, if people, you know, making actual Marios,
okay, I think that Nintendo should be able to go after that.
But making Mario-style video games, no, I don't think they should be able to go after that.
try to inform the audience of where we think this is going,
the prediction for what happens in the long term here with this case
and then how it affects the wider industry.
So it does seem the number one possibility in all cases of a copyright claim is settlement.
So I guess that would be one possibility, settlement.
Then there's go to the mat and take many years and then get a judgment, right?
That's the second possibility here.
So, and then I guess there's the courts throwing this out or dismissing it, right?
Or something.
So are those the three buckets we should be looking at here, like either New York Times wins or loses or a settlement happens?
Those are the three possibilities, broadly speaking.
Yeah, I mean, winning and losing is like, you know, even in some of the famous cases, there's a process where it gets remanded, so sent back.
So some of these things for the infringement that, or, you know, if it is infringement, the things that have already happened or Exhibit J's type of examples, you know, New York Times will preserve claims on that. But as OpenAI makes changes, I think you're right. I think it's either the case gets settled. And one way that could happen is, you know, they announced some kind of copyright holder symposium or something. And New York Times is like the head of this consortium. And it's like some kind of opt-in
where, you know, New York Times content is part of it and publishers can go there and maybe
they get a little royalty. You know, something accelerated similar to what the music industry has,
where there's a very sophisticated thing where you need to get the mechanicals and you need to get
the performance rights. And it's, it's, there's like a known system and libraries for that.
We'd call that a marketplace solution emerges.
Exactly. So like one possibility is like a marketplace solution.
Another possibility is, you know, the court comes out with a ruling that says,
LLMs, just the fact of developing an LLM, is not copyright infringement,
provided you have some kind of substantial protections.
And it could come out with a test that says,
okay, if somebody asks for Luigi or Moana,
you know, anything that's like very obvious,
that should be fixed.
And there should be measures taken to address that.
Another possibility, which wasn't on the table,
is congressional action.
That's very possible.
So we actually have that,
that has happened.
where courts have, I'm sorry, Congress has codified things.
I mentioned fair use in the 1976 Copyright Act.
With the internet, we have the DMCA, and it's a very robust system, right?
Somebody asks you to take something down.
You know, they can contest it.
So an amendment to the DMCA also possible.
There's a California senator who, he was a CS major, super cool.
So California, I think senator or a U.S. senator from California,
who is proposing AI regulation.
and that could be a possibility.
So that would mean,
whoever gives the most money
to be a bit cynical here,
whoever gives the most money
to their senators,
congressmen,
whatever, politicians,
and has the most influence
and the deepest pockets
for these old people,
the gerontocracy,
is that what they call it?
Gerontocracy or something
that run Washington,
you know,
that would kind of feel like
it would be in favor of the
copyright holders
because copyright holders
in the United States,
we really do protect them in a major way.
So they could just say, listen, you got to get permission, full stop.
You know, that's possible.
I do think, you know, like I said,
OpenAI has been very savvy and thoughtful in a lot of ways.
And, you know, part of Sam Altman's sort of charm tour last fall was on this.
Like, they know who he is.
And he's like, look, I'm not Zuck.
And I think he was very successful in showing that he wasn't, that he's not
Zuck.
And so, you know, that's another possibility.
I think, you know, the Europe.
Europe did regulate AI and they were very proud of that.
So far it has not been regulated here.
But I think there will be a situation.
And this is, Google made a bunch of good law on that sponsored search.
So initially, if you search for a term on Google, if you search for, you know, Acme.
Acme's competitor can buy that.
Or if you search for Ford, you can get a Chevy ad.
And that was actually, it wasn't clear that that wasn't trademark infringement.
Google spent about 10 years litigating that
issue and just won. And I've seen legal theorists make the case that part of the reason Google won
is that judges love Google because it's so useful. And so I think similarly for ChatGPT, it's so
useful for us lawyers in particular, because our stock-in-trade is words, that I don't
think that OpenAI will lose here. I actually, I'm going to disagree with you. I think that
OpenAI will win some key points. You know, they'll probably have to make some kind of
concessions, they'll have to have copyright front and center, and they'll have to, you know, have
DMCA-style things. But I don't think they're going to straight lose on the fair use, or at least
not without going all the way to the Supreme Court.
That's fascinating. I'm taking the other side of that. I think it's going to be,
we're going to come down in favor of for people who have at-scale copyright libraries,
you're going to need their permission ahead of time. And if you've used it, I think you're
going to have to unwind it, which is what I agree with you that they're probably in the process
of doing that. Sam's pretty smart. And I think it's easier to just be like, you know what,
We took it out. We redid it. It's no big deal.
Did we train our model without New York Times?
They could do that. There's been some memes on that, on the, you know, the square jaw meme.
And it's like, take my content out. And the guy's like, fine.
Yeah, okay. Sure. And then that could be possible.
Yeah, I think that's a distinct possibility. I do think you bring up a really good point,
which is having seen up close and personal what happened with Uber and also Airbnb,
I wasn't an investor in that one, unfortunately. Because people
loved the service so much and became addicted to it. By the time the lawsuits started to pile up,
when Austin got rid of, the city of Austin got rid of Uber and Lyft at one point, people went nuts.
And then when people-
Campaigns, I mean, it was amazing. Yeah. And their public policy, it was incredible, like,
doing that. The one distinction I would make, so Uber is a great example where, like, I even tell,
I advise my clients, like, product market fit is an incredible drug, right? It really makes your
lawsuits better. And honestly, Uber, every time they had a lawsuit, their usage just went up.
So it's like this ironic thing. And by the way, also for Uber. Yeah. This one, what's slightly different is Uber was like in the trenches city by
city versus this is federal. So you run into some of the gerontocracy or whatever issues that you mentioned.
Airbnb, the same thing. You know, people were like, well, I want to have choices of where to stay.
And I want to be able to monetize my home or my second home or my guest house. And it just felt like,
those companies were on the right side of history,
vis-a-vis consumer choice, lower prices, et cetera.
And I think that's what ChatGPT really has going for them,
which is we all want to be able to make Luigi characters
and make a birthday card for our family
or make a party invite that has the Silver Surfer
and Marvel characters on it.
So if that's the case, we're kind of like,
well, that's kind of the world we want
is where we get to use your copyrights without your permission.
No, it totally is.
But I mean, this really gets like, it goes back like kind of way back machine.
Like, I remember when I was at Yahoo and Yahoo was trying to launch like a subscription or a paid e-card service.
And I was like, I'm not going to pay for that.
I can get that free on Blue Mountain.
And it just happened to be that ads is what emerged.
But one of the things the Europeans point out, and I think a lot of thinkers point out, is that ad-supported tech is not necessarily how it had to be.
No.
It could have been another way.
And so, you know, similarly here, like, it could be a subscription, it could be a licensing.
Like, there's many different ways.
And it'll be interesting to see, like, eventually the law catches up.
It just, like you said, takes time.
I think it's really very clarifying to have you on the program because the market-based solution is the likely case here.
So I think that's where I come to after an hour with you.
Market-based solution, always the best solution.
Parties get around a table and hash it out.
And then there's, of course, some liability for the mistake.
that OpenAI made, so they pay a speeding ticket, they give them $100 million as part of this new thing.
No harm, no foul. They can afford it. They got $10 billion laying around. It's all good.
But I like the market-based solution, and I think there's something very interesting in how the cable TV
system worked, or how bundling and subscriptions work now in authentication. Because ChatGPT knows how to
do that. Sam Altman and the team over there know how to do that. Like, they already have API
keys. So the New York Times
subscription is like an API key to unlock some things
in the New York Times, right? It could be like
a really cool feature. Like maybe the
OpenAI markets to people,
hey, if you have a New York Times subscription, this is going to get a lot
better for you because when you ask your queries,
it's going to give you a bunch of stuff and say, and also
from the New York Times for further reading, boom, boom, boom, boom.
And would you like us to bookmark this? A world of possibilities
of how OpenAI could work
with, you know, New York
Times to make interesting stuff.
Zavi recipes. Hey, I want to ask it about recipes. Here's the, I took a picture of my
refrigerator and then it went to the New York Times Food app and told me possibilities of what I can make
based on my spice drawer. That's kind of interesting. It's amazing. And that was one of the things,
you know, I mentioned that I spent, you know, three and a half years at Amazon. And that's kind of
how Bezos thinks? And it was woven into everything where it was like, okay, how can we have this
tech work together? How can we get paid for one piece of content multiple times? How can we turn
something into self-serve. Like, you know, I loved Marc Andreessen's AI piece. Obviously, he's super in
the AI optimism phase, but I'm very optimistic too for all these like daily life, fun use cases.
So, yeah, good stuff. As career technologists, you and I, it's pretty clear that this is the one.
This is the chosen one. This technology is the manifestation of everything that's come before it,
from the PC revolution to the internet to mobile and cloud and all this.
and then big data.
All of this is built up to this moment in time.
And so it's really important we get it right.
Cecilia, you are amazing.
Where can people find more of you?
Yeah.
So you can follow me on Twitter.
It's CeciliaZin.
Or I'm pretty active on LinkedIn as well.
I'm launching my own startup,
which is AI for lawyers.
Yeah.
Oh, I know an angel investor.
I know an angel investor.
Yeah.
He's really good at getting you your first 100 customers.
Amazing.
Yeah, no.
Does it have a name yet?
Yeah.
it's going to be general counsel AI. So that's who I am. And I thought, okay, GCAI. But
love it. Essentially still sort of, I guess, you know, stealth because we're developing the
product. But I have an engineering co-founder and we're pretty well there. But to the point that
we talked about, like LLMs are wordsmiths, right? So what are lawyers? Also wordsmiths. So I've had,
you know, when you said that this is the chosen technology, I had that feeling very strongly. I've
never been excited about legal tech before. It's like, you know, CLM snooze. But this was actually like,
document management, snooze. You know, but like this is something where to the C&D point, I could write a
cease and desist letter and I just say like, here's the, here's the infringement, here's the whatever,
and make very light edits. And a one hour task becomes a five minute task, not even. And I give training
classes for lawyers. You can find me on Maven. I know you're friends with Gagan and them too. So I teach on
Maven because I just have so much energy that I got to get out, right?
Fantastic.
Everybody check out Maven.
And we'll put some links in the show notes.
You are awesome.
Please come back.
We should do a check-in when we on this case and more.
Yeah, let's do it.
This is so fun.
Okay, to show you.
Happy to do it.
Have a good one, Jason.
Thanks a lot.
And we'll see you all next time on This Week in Startups.
Bye-bye.
