The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Data Wars: Elon Threatens to Sue Microsoft While Viral AI Drake Track Stuns Music Industry

Episode Date: April 20, 2023

We're seeing more and more instances of industries, companies and individuals fighting back against their data and IP being used to train AI. On this episode, we look at Elon threatening to sue Microsoft, Reddit changing their terms for AI companies, and the music industry losing their ever-loving mind about a viral AI-produced track featuring the synthetic voices of Drake and The Weeknd. Subscribe to our YouTube: https://www.youtube.com/@TheAIBreakdown

Transcript
Starting point is 00:00:00 This episode of The AI Breakdown originally premiered as a YouTube video on Thursday, April 20th. In it, we discuss Elon Musk threatening to sue Microsoft and OpenAI over their use of Twitter data to train their models, the music industry freaking out at the viral AI track featuring Drake and The Weeknd that didn't feature them at all, and the broader question of the emerging AI data wars. Welcome back to The AI Breakdown. Today we are talking Elon Musk suing Microsoft, Reddit charging AI companies for access to its data. It's the Data Wars, baby. All right, guys.
Starting point is 00:00:35 Well, yesterday you might have been one of the 13 million people to see this little interaction. Twitter Daily News writes, "News: Microsoft drops Twitter from its advertising platform as they refuse to pay Twitter's API fees." Elon Musk, not being one to let something like that go, writes, "They trained illegally using Twitter data. Lawsuit time." Now, again, this has 13 million views, 207,000 likes, over 1,100 bookmarks, 19,000 retweets, 1,700 quote tweets. So what's going on and what does it have to do with a larger
Starting point is 00:01:10 phenomenon happening in AI? Well, the specific offense from Microsoft was this notice. It's on the Microsoft Advertising website and it says, "Starting on April 25th, smart campaigns with multi-platform will no longer support Twitter. As of April 25th, 2023, you'll be unable to access your Twitter account through our social management tool, create and manage drafts or tweets, view past tweets and engagements, schedule tweets. Other social media channels such as Facebook, Instagram, and LinkedIn will continue to be available." However, that was clearly not what Elon was focused on. Elon shifted the conversation to be about AI.
Starting point is 00:01:48 So specifically, Elon is accusing, or at least implying, that Microsoft and presumably OpenAI used Twitter data to train the large language models behind things like ChatGPT. Now, OpenAI and Microsoft, for the purposes of this tweet, I think, were assumed to be one and the same, even though technically OpenAI remains an independent company, albeit one that Microsoft owns a big chunk of based on their last couple of investments. Now, holding aside the specifics of OpenAI or Microsoft, this is an issue that is starting to come up more and more and more. You might have heard of a Getty lawsuit last summer where Getty Images set out to sue Stable Diffusion for copyright infringement. What happened was that people started noticing that in the images that they were generating with the Stability AI-based tool, there were little sections that looked an awful lot like the Getty Images logo that's on unauthorized Getty images, or rather Getty images that haven't been licensed yet.
Starting point is 00:02:50 So you'll know if you've ever Googled an image and seen exactly the perfect one, but then it's a Getty image. So there's a little bar down at the bottom that says Getty Images with the name of the photographer. And that is a protection so that people can't use these images unlicensed, right? They can't just copy-paste them from Google search. Now, the way that Stable Diffusion was pulling images back, it showed something like that little patch. And that's because the AI wasn't able to understand that that wasn't supposed to be a part of the image.
Starting point is 00:03:19 When it was searching or trying to draw inspiration to create its AI version of the image, it was looking at all the available materials, all the available images that it had been trained on, and if a meaningful portion of them were Getty images, then it would just assume that that little box was supposed to be there. In the suit, Getty is accusing Stability AI of, quote, brazen infringement of Getty Images' intellectual property on a staggering scale. It claims that Stability AI copied more than 12 million images. Now, interestingly, in this lawsuit, not only was Getty focused on the fact that Stability AI had used their data set to train their model, but also that they presented the watermark in a way that is
Starting point is 00:04:03 bizarre or grotesque, their words, and thus dilutes the quality of the Getty Images mark by blurring or tarnishment. So they were really hitting them from both sides. Now, fast forward, this issue is coming up more and more. Obviously, we see Elon talking about it, but he could just be talking, right? It's not for sure that he's going to actually follow through and sue Microsoft or go after OpenAI, but a company that has actually made moves specifically in this area already is Reddit. So Reddit is starting to charge AI models for learning from its huge data set. An Ars Technica article says LLMs can no longer lurk, learn, and profit from 18 years of links and chatter. This came out first in the New York Times, where Reddit founder and CEO Steve Huffman
Starting point is 00:04:49 said that it basically knew that the 18 years' worth of content that it had that was generated by humans was immensely powerful and that it wasn't going to just allow companies training their large language models to access it for free. So the Reddit API will remain free for developers working on bots or other Reddit tools, as well as for researchers, but there is a new premium category for those companies that need more information. So this is from the Reddit announcement that followed that news story. They write, we are introducing a new premium access point for third parties who require additional capabilities, higher usage limits, and broader usage rights.
Starting point is 00:05:27 Now, what's interesting about this Reddit story is that there has been some speculation that one of the reasons that Elon really wanted to get his hands on Twitter was to access the huge, huge amount of human language data that the platform represents. As far as I've seen, Elon hasn't even really hinted at that, but it's an interesting theory. The question of data and intellectual property that underlies AI training models has also recently been in the news in the context of the music world. Over the weekend, an anonymous TikTok user posted a track called Heart on My Sleeve. It was a new Drake and The Weeknd track, except it wasn't. It used AI to put Drake and The Weeknd on this song, and it was a banger.
Starting point is 00:06:11 Frankly, it went extraordinarily viral. McKay Wrigley says AI music is here. This is the first example of AI-generated music that really wowed me. This guy, Ghostwriter977 on TikTok, made a Drake x The Weeknd track that's actually kind of insane. You'll soon be able to make unlimited music by your favorite artists on demand with AI. Roberto Nixon wrote this extremely viral tweet, probably viral because it was attached to this track, and says, listen to this AI-generated song featuring Drake and The Weeknd. It goes so damn hard. It's by Ghostwriter977 on TikTok, and it's blowing up
Starting point is 00:06:47 on socials and streaming platforms. UMG, which controls around one-third of the global music market, has already asked streaming platforms to ban AI. A modern Napster moment. Will be fascinating to watch this all unfold in real time. Now, very soon after he posted this, Twitter has the little, this media has been disabled in response to a report
Starting point is 00:07:08 by the copyright owner post, and this track was really eliminated from a lot of different parts of the internet. Although, of course, people downloaded it, and so it still exists. On April 17th, Rob Abelow tweeted, This AI-generated song featuring Drake and The Weeknd, trading lines about Selena Gomez, dropped on Saturday. It now has 20 million streams in under 48 hours.
Starting point is 00:07:30 TikTok, 13 million; Twitter, 5.3 million; Spotify, 254,000; YouTube, 144,000; SoundCloud, 84,000. The track is impressive and the artist leans hard into AI in the branding. The title says it's AI. Artist's name is Ghostwriter. They wear a cloth like a ghost in the videos. The fact it's AI is a feature. It's why everyone wants to talk about it. What about when that's not interesting anymore? Will AI mashups of celebrity artists become a new art form like
Starting point is 00:07:58 some mutant child of the remix? Troy Carter, who was formerly Lady Gaga's manager and who now runs Venice Music, wrote, Ghostwriter is the new Banksy. He should never reveal his identity. Now, there are a lot of opinions around this. Some people feel like this is a clear infringement of artists' IP and copyrights and agree with the industry that this shouldn't be allowed. Others, whatever they think about whether it should or shouldn't be allowed, feel like it's entirely inevitable, which is obviously exactly how so many of these conversations in AI go. Then there's also the set of people who think that there is a new thing coming for creators
Starting point is 00:08:36 in this new quote-unquote art form. Joey Politano tweets, that sound you hear is the assembling of an army of copyright lawyers, the largest the world has ever seen, who have been in cryosleep since the Napster lawsuits, and have just now been awakened for one last job. Traffic Jam Master Jay writes something similar. Only Gen Z would dare to wake the sleeping giant that is the RIAA in the battle over AI. Y'all don't remember how they absolutely beat the brakes off of Napster and LimeWire. The RIAA is undefeated.
Starting point is 00:09:06 We are exiting the F Around phase and speedrunning towards Find Out. Now, all joking aside, I don't think that that's wrong. This is going to force a confrontation in the legal system around AI training and data and IP, at least in the context of music, but I wouldn't be surprised if the implications go a lot farther. Last year, even before ChatGPT came out, The Verge wrote an article called The Scary Truth About AI Copyright Is Nobody Knows What Will Happen Next. The last year has seen a boom in AI models that create art, music, and code by learning from others' work, but as these tools become more prominent, unanswered legal questions could shape the future of the field.
Starting point is 00:09:46 This is really the point of this. We don't know what courts are going to decide. We don't know, on top of that, how people are going to be able to enforce it. Ryan Hoover from Product Hunt writes, free startup idea that will likely get you sued: AI Spotify. How it works: AI Spotify hosts AI-generated music of your favorite artists. Anyone can submit music and the best songs surface based on listens and likes.
Starting point is 00:10:08 Music with the most listens earns a pro-rata share of subscription revenue reserved for the original artists. For example, Drake could claim money generated from his likeness on the platform. Artists that do not want to participate can opt out entirely, banning any music that uses their likeness, or individually allow songs they endorse. Of course, there are many ethical and legal issues with this model, especially with labels. But maybe this is a germ of a shower thought that has potential. I like this view from Ryan. I think that it splits the difference between the take that this is going to be prevented activity,
Starting point is 00:10:38 but also the reality that this is likely inevitable activity and that no amount of legal pressure and court decisions can really stop technology like this once it's out of the bottle. This at least would be a way for the AI-generated set of music that's built on artist catalogs to, A, benefit them in some way, B, be clearly segmented from other types of music, and C, still not undermine all the incredible creativity that's poised to go into this. Anyways, this is the beginning of an issue that is going to do nothing but grow more and more intense. And frankly, we could see a very different landscape for AI startups in a few months, to say nothing of a few years. But anyways, guys,
Starting point is 00:11:19 the data wars, they are here, they have begun. That's the story for today. It's a story that we will hear lots more about. Until next time, peace.
