The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Data Wars: Why Elon Musk's Rate Limits Are About More Than Twitter

Episode Date: July 3, 2023

The AI Data Wars come to Twitter as Elon Musk rate limits users in an attempt to block AI data scraping. The move follows big changes to the Reddit API that some have called the end of the internet as we know it. Before that on The Brief: Valve has said they won't approve games for Steam that use AI art that might have copyright issues; Humane shares more information about its Ai Pin wearable; AI enthusiasm in markets is causing some people to worry. Today's Sponsor: Supermanage - AI for 1-on-1's - https://supermanage.ai/breakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we are talking about the latest front in the AI Data Wars. Before that on the brief, Valve says no to Steam games that use AI art. The AI breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information. Hello, friends, welcome back to the AI breakdown. We are back with our regularly scheduled format. I did want to let you guys know that because tomorrow is a holiday in America, it's July 4th, Independence Day, there will be an AI breakdown episode, but it will not be the normal format of a brief first
Starting point is 00:00:33 followed by a main episode. It will just be one topic around whether AI is more suited for authoritarianism or whether it is a tool for freedom. I thought that was pretty appropriate for July 4th. Now, when it comes to today's episode, before we dig in, I wanted to tell you about Supermanage. Supermanage AI is the type of company that is using AI to actually change how we work right now in ways that I think are hugely beneficial. For Supermanage AI specifically, they are working to make one-on-ones, a key part of pretty much every business at this point, work much better. Supermanage's AI distills teams' public Slack channels into a real-time brief on any employee. That means that managers can see contributions, work in progress, challenges they're facing,
Starting point is 00:01:15 sentiment, and more, and that allows them to show up for a much more meaningful conversation. The time spent in one-on-ones then is much more productive, leading to better outcomes and a more positive work experience overall. Supermanage is completely free. You can check it out. Go to supermanage.ai slash breakdown and check out their tool.
Starting point is 00:01:33 All right, with that, let's dive into today's AI breakdown. Welcome back to the AI breakdown brief. All the AI headline news you need in five minutes or less. Today we kick off with a topic that is doing nothing but growing in importance, and that is, of course, questions around copyright when it comes to AI artwork.
Starting point is 00:01:52 Now, the specific context we're going to talk about is Steam and Valve. However, it's useful to look at the state of the conversation more broadly a little bit as well. In the U.S., one of the big battles on this front has been Getty Images' lawsuit against Stability AI, saying that it used its images inappropriately to train their Stable Diffusion model. Getty, clearly being pretty serious with this challenge, also brought that same complaint to the courts in the UK.
Starting point is 00:02:14 Meanwhile, taking a very different approach, reports suggest that Japan is saying that AI model training doesn't violate copyright. During a meeting of Japan's Financial Oversight Committee, Takashi Kii, a member of the House of Representatives for the Constitutional Democratic Party, said, we asked questions about generative AI from two perspectives, copyright protection and utilization in educational settings. In Japan, works for information analysis can be used regardless of the method, whether for non-profit purposes, for profit, for acts other than reproduction, or for content obtained from illegal sites. So, bringing it back to today's particular context,
Starting point is 00:02:47 Valve has said that it will not approve Steam games that use AI artwork which could be seen as copyright infringing. Now, this actually started with rumors that Valve was taking an even harder-line stance, saying that Steam would no longer publish games with any AI-generated content. However, in a statement that they sent to The Verge, Valve said that the company's goal is, quote, not to discourage the use of AI on Steam, but to be cautious when it comes to existing copyrighted artworks. Now, to me, this doesn't read as Valve taking some hardline stance one way or another, but instead just looks like a business that doesn't want to be on the front lines of a new legal battle, and is just covering itself as it lets this battle play out in the political sphere.
Starting point is 00:03:24 Now, other companies are addressing these same concerns, but doing so in a different way. You'll remember that last month when Adobe launched its Firefly generative AI suite, it came with a promise that it will cover legal bills related to copyright challenges for enterprises that use the Firefly product. Now let's move to a very different topic, which has gotten a lot of people hyped, and that is Humane's AI Pin. Now, you might have seen this demo back in April from the TED conference. It shows a person using a wearable device that has, importantly, no screen to do a variety of interesting things. They take a phone call, projecting the relevant information on their hand. They translate a sentence into a different language that still comes out in their voice,
Starting point is 00:04:02 and there were various other aspects of the demo as well. Let me show you something. Invisible devices should feel so natural to use that you almost forget about their existence. You'll note that's me and my voice speaking fluent French, using an AI speech model that's part of my own AI. The future will not be held in your hand, and it won't be on your face either. The future of technology might almost be invisible. Thank you.
Starting point is 00:04:52 Now, more information is just slowly starting to roll out about this company and its product, and people are pretty excited. Humane was started by former Apple employees and has raised $200 million over the last couple of years, and so expectations are really high. Still, there remain a lot of questions. As The Verge put it, other than the name, the only revealing thing about Humane's release today is that it used AI 22 times and that the Pin, quote, uses a range of sensors that enable contextual and ambient compute interactions.
Starting point is 00:05:19 But ultimately, The Verge remains, quote, unabashedly intrigued. They say, it's a huge swing at a new form factor and potentially a whole new idea about how we're supposed to interact with technology. In a world increasingly full of screens, in our hands, on our bodies, even on our faces, Humane's going the other way, and it's going to be fascinating to watch. And in fact, that's one of the things that I think is really interesting about this. On the one hand, you have Apple putting this new form factor over our eyes more directly, and on the other hand, you have Humane running in the other direction, getting rid of screens entirely.
Starting point is 00:05:51 It does feel in some ways like a battle for the future of how we interact with digital experiences, and I'm not sure that anyone knows yet which will win out. Moving over to markets for a moment, the big theme in many ways of the first half of 2023, has been the tension between divergent forces. On the one hand, we've had ever-present warnings of looming recession. We've had a Federal Reserve, which up until very recently, has continued to hike interest rates even in the face of a banking crisis. And yet, at the same time, enthusiasm, particularly around artificial intelligence,
Starting point is 00:06:22 has led markets to have a very good year. NPR writes, It's been a hell of a year so far. Three regional banks collapsed. The United States came close to defaulting on its debt for the first time in history, and the Federal Reserve continued to hike interest rates aggressively. But despite all that, the stock market surged in the first half of the year. What gives?
Starting point is 00:06:41 NPR points to two different possible explanations. The first is that the longer there has been the promise of things like a recession, the less investors are behaving like it's going to happen. But at the same time, it really points to AI as the big driving force. The issue, of course, that they point out is that the stock market's gains have not been broad-based. They've been highly concentrated in tech stocks that relate in some way or another to AI. That creates a fragility if market narratives move in a different direction, and the AI enthusiasm bubble starts to pop. Now, speaking of technology and Wall Street,
Starting point is 00:07:13 Cathie Wood from ARK raised eyebrows in May when she said that the, quote, most impactful AI project might be Tesla's self-driving technology. This week in San Francisco, a battle looms over full self-driving technology, as California is voting whether to allow 24-7 driverless cabs from the companies Waymo and Cruise. The California Public Utilities Commission will vote on July 13th, and they are widely expected to approve both companies' permit requests. Now, if full self-driving cars are an example of AI in practice, so too are the algorithms that dictate which content we get on platforms like TikTok, Facebook, and Instagram. In a bid for more transparency, Meta has released some amount of information on how AI is used in those algorithms. Last week, Meta released two dozen
Starting point is 00:07:58 explainers that focus on various features of those platforms, including Instagram Stories, Facebook's News Feed, and more, that describe how the company determines which content to recommend to users. In a blog post from last Thursday, Meta's President of Global Affairs, Nick Clegg, wrote, with rapid advances taking place with powerful technologies like generative AI, it's understandable that people are both excited by the possibilities and concerned about the risks. We believe that the best way to respond to those concerns is with openness. And lastly today, speaking of concern, the Arnold himself, in a recent speech, said that the AI future that Terminator had imagined is, quote, here today. Today, he said,
Starting point is 00:08:34 everyone is frightened of it, of where this is going to go. If that is not tailor-made for headlines, I don't know what is. That's going to do it for today's AI breakdown brief. I'll be back soon with the main AI breakdown. Over the weekend, perennial main character Elon Musk became the main character again when he tweeted this message. To address extreme levels of data scraping and system manipulation, we've applied the following temporary limits. Verified accounts are limited to reading 6,000 posts per day, unverified accounts to 600 posts per day, new unverified accounts to 300 per day. So what's going on and what does this have to do with artificial intelligence? Welcome back to the AI breakdown. Today we are talking about the AI Data Wars. And for this,
Starting point is 00:09:19 we need to go back to April. In that month, Reddit made news by changing its policies around how third parties could use data from its site. Now, what's important to understand is that Reddit is an incredible trove of natural language data. 57 million people every day go to Reddit to engage in conversations around basically every topic you can think of. That has made it a honeypot of data for AI training. Companies like Google, OpenAI, and Microsoft have all used Reddit conversations in the development of their foundation models, and Reddit this year finally said, enough is enough. Founder and CEO, Steve Huffman, said in an interview,
Starting point is 00:09:56 the Reddit corpus of data is really valuable, but we don't need to give all of that value to some of the largest companies in the world for free. Now, an important context for Reddit is that this isn't just about them being a little peeved that other big companies got all this information for free, but also that it's preparing for a potential IPO in the next year or so,
Starting point is 00:10:14 meaning that it needs to increase its profitability, and so is likely looking at charging for API access as a way to appeal to Wall Street. However, as much as Reddit tried to position this as something that was really about AI training, that didn't stop there from being a huge community backlash. The problem was that although the API changes were nominally meant to stop companies like Google and Meta from scraping the site for data, they had big impacts on many third-party apps,
Starting point is 00:10:37 which were from smaller developers and companies that weren't abusing the API in the way that Reddit was concerned with. In the middle of June then, more than 8,000 Reddit communities went dark in protest, and that number increased after an internal memo from Steve Huffman was reported on by outlets like The Verge. In that memo, Huffman wrote, There's a lot of noise with this one, among the noisiest we've seen. Please know that our teams are on it, and like all blowups on Reddit, this one will pass as well. He even warned employees about wearing Reddit gear in public, saying, Some folks are really upset and we don't want you to be the object of their frustrations.
Starting point is 00:11:10 Now, coming into this weekend, the community was somewhere between still angry and sad. Wired wrote, The Magic of Reddit is gone. As of today, June 30th, 2023, several mobile apps for browsing the platform are closing up shop ahead of a new initiative from Reddit to charge for access to its API. The Wired piece was titled, Reddit won't be the same. Neither will the Internet. It's the latest front in the labor battle
Starting point is 00:11:32 between algorithms and the humans who feed them. The piece writes, If all of this sounds like a lot of fretting over something as wonky as an API change, it's not. It's indicative of a growing new awareness of what constitutes labor on the Internet
Starting point is 00:11:43 and how communities can have their work mined for money-making ventures, specifically ones powered by artificial intelligence. If all of this is starting to sound like a labor movement, that's because it is. AI's rise has caused a reevaluation of what people put on the internet. Artists who feel their work was scraped by AI without credit or compensation are seeking recourse. Fan fiction writers who shared their work freely to entertain fellow fans now find their niche
Starting point is 00:12:06 sex tropes on AI-assisted writing tools. Hollywood screenwriters are currently on strike to make sure AI systems aren't enlisted to do their work for them. And this brings us back then to this weekend and Elon Musk announcing that there would be new limits to how many posts people could read on the site each day. Elon was clearly trying to position it in a similar way as having to do with AI. That's why he wrote to address extreme levels of data scraping. Now, this isn't the first time Elon has made hay when it comes to AI data. In April, right around the time that Reddit announced its new API changes,
Starting point is 00:12:36 he actually threatened to sue Microsoft over Twitter data in one of his errant tweets, saying they trained illegally using Twitter data. Lawsuit time. That was in response to Microsoft announcing that they had dropped Twitter from its advertising platform because they refused to pay Twitter's API fees. When someone asked Elon if he had a long-term plan here, Elon wrote, I'm open to ideas, but ripping off the Twitter database, demonetizing it, and then selling our data to others, isn't a winning solution. Elon continued to explain this issue as relating to AI.
Starting point is 00:13:05 In a separate tweet, he wrote, drastic and immediate action was necessary due to extreme levels of data scraping. Almost every company doing AI, from startup to some of the biggest corporations on Earth, was scraping vast amounts of data. It's rather galling to have to bring large numbers of servers online on an emergency basis just to facilitate some AI startup's outrageous valuation. Now, when it comes to the specific numbers, Elon did make some changes pretty soon. From that initial 6,000 number, he increased it to 8,000 tweets for verified users
Starting point is 00:13:32 and 800 for unverified, and then increased it to 10K for verified users and 1,000 for unverified users. Investor Adam Cochran wrote, slowly he's caving. Probably in the next 12 hours, he says, oh, we magically solved scraping problem and turned off caps. Once he finally gets it through his head
Starting point is 00:13:46 that this is dumb. Now, there are a lot of different discussions that this has brought up. For some, it has put a fine point on the need for Web3. Alex Valaitis writes, AI, centralizing technology versus Web3, decentralizing technology. After OpenAI, it's become clear that no company wants to let their data be scraped for free. This is leading many large platforms like Twitter and Reddit to start putting up walled gardens. In the future, the most popular LLMs will be built by companies who have the most training data available. If we continue down this path,
Starting point is 00:14:16 the only companies that will be able to gather and afford enough data will be big tech monopolies. AI will become stronger the more centralized it gets, which is exactly why we need Web3 as a counterweight. Think of public blockchains as the last bastion of the open internet. These are permissionless databases that anyone can access regardless of whether you are a tech monopoly or an indie hacker. Far into the future, these public blockchains won't be viewed as silly speculative bubbles. Instead, they will be the digital versions of the Library of Alexandria. Now, whether that's right or wrong, the discussion is certainly increasing, and it's a discussion that's not just about business models, but, as you've seen in so many of these articles and tweets, about values and what
Starting point is 00:14:52 we want the fundamentals of the internet to feel like. I'm not sure how it all plays out, but I do know that the AI Data Wars have just opened up a major new front, and it feels much more like the beginning than a conclusion. Anyways, guys, we'll wrap there. This is a situation that we'll obviously keep track of. If you enjoyed this, do me a favor. Go check out the podcast version of the show. You can find a link to it at breakdown.network. And for those of you who are already subscribed to the podcast, thanks so much. Until next time, guys, peace.
