The AI Daily Brief: Artificial Intelligence News and Analysis - Duolingo Replaces 10% of Contractors With AI

Starting point is 00:00:00 Today on The AI Breakdown, we're looking at OpenAI's response to the New York Times lawsuit. Before that, on the Brief, Duolingo lays off 10% of its contractors because of AI. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our newsletter, and our Discord. Welcome back to The AI Breakdown Brief, all the AI headline news you need in around five minutes. One of the things people will be watching extremely closely in 2024 is the extent to which artificial intelligence actually starts to displace people in their jobs, be they blue-collar jobs or, more likely, it seems, white-collar knowledge-worker jobs.

Starting point is 00:00:46 Well, the first bit of evidence of that sort of impact has arrived in the form of cuts at Duolingo. Duolingo is, of course, one of the best-known, if not the best-known, apps for language learning, and according to reports it has let go of around 10% of its contractors. Now, the company has been in serious damage-control mode around this. First of all, it said that no full-time employees were impacted and that these were not layoffs. It said the contractors had been, quote, offboarded after finishing their projects at the end of 2023. At the same time, the company did acknowledge that AI productivity gains were part of the

Starting point is 00:01:23 reason it didn't need as many people to work on these particular tasks. A spokesperson told Bloomberg, we just no longer need as many people to do the type of work some of these contractors were doing. Part of that could be attributed to artificial intelligence. Now, so far, there hasn't been a ton of information about exactly what these people were working on. We got one report from someone affected on the r/duolingo subreddit who wrote, I worked there for five years. Our team had four core members, and two of us just got the boot. The two who remained will just review AI content to make sure it's acceptable. Now, Duolingo's AI plans were well telegraphed. Back in November, the CEO told shareholders

Starting point is 00:01:59 in a letter that the company was using AI to more quickly create text, speech, and images, and to produce, quote, new content dramatically faster. The company said it was also using AI to generate voices within the app and had introduced a new premium tier featuring AI-generated feedback and conversation in additional languages. Now, something we talk about on this show is that there's probably going to be a large-scale conversation this year and beyond about what lines we want to draw around AI and how it benefits us or doesn't. One of the interesting things about the subreddit discussion was how it was framed:

Starting point is 00:02:32 In December 2023, Duolingo offboarded a huge percentage of their contractors who did translations. Of course, this is because they figured out that AI can do these translations in a fraction of the time. Plus, it saves them money. I'm just curious, as a user, how do you feel knowing that sentences and translations are coming from AI instead of human beings? Does it matter? A lot of the answers were pretty nuanced. For example, Say No To Pudding says, I like and value the human aspect of language exchange and learning, and I think there are nuances in language that AI can't fully replicate, at least as of now. Even if these nuances might not necessarily be reflected in Duolingo's content, I still can't help but feel a little bit sad. Kit and Laser Fist writes,

Starting point is 00:03:06 their whole sales pitch was having native speakers cultivate content. It definitely undercuts that message. The flip side, of course, is that one of the big impacts of AI could be a total transformation in how people interact across language barriers. The question the world faces is: if language is no longer a barrier, is that worth the cost of translators' jobs? That question is going to play out over and over again in the coming years, which is why I think it's so valuable to actually talk about. Now, moving on, we have a follow-up to yesterday's main story. You'll remember that we talked about G42, an Emirati company that has been

Starting point is 00:03:42 at the very center of U.S.-China tensions when it comes to artificial intelligence. The company has been doing its level best to play both sides and stay on good terms with both the U.S. and China, but it came under increasing pressure at the end of last year and actually started withdrawing from its Chinese relationships in favor of its U.S. partnerships. Well, now the bipartisan House Select Committee on the Chinese Communist Party has identified G42 as a company that works extensively with China's military, intelligence services, and state-owned entities, and has asked the Commerce Department to look into whether it should be put under trade restrictions because of those ties. Basically, this committee has asked the Commerce Department

Starting point is 00:04:16 to consider imposing export restrictions not only on G42 but on 13 companies that are either owned by or linked to it. In other words, whatever scrambling G42 is doing to get out ahead of these restrictions, it may not be moving fast enough. Adding a bit of intrigue to the situation is the fact that back in October, OpenAI and G42 announced a partnership. While it wasn't exactly clear what that partnership entailed, it shows just how densely connected this world really is. Next up, a couple of pieces of fundraising news. Luma is a company you might have seen in relation to NeRFs. Basically, they're creating models that allow you to capture 3D images and models with your smartphone. The company

Starting point is 00:04:53 has just raised $43 million at a valuation between $200 and $300 million. Now, of course, the creation of 3D models is going to open up entirely new vectors of content and creativity. It's relevant for gaming and for next-generation video and content creation. And this is a space that people expect to be hotly contested. Another big fundraise is that of Parag Agrawal, who was the CEO of Twitter before Elon took over. According to The Information, Parag's new company, whose name they couldn't figure out,

Starting point is 00:05:20 is building software for LLM developers and has raised $30 million from backers including Khosla Ventures, Index, and First Round Capital. Finally, one announcement that I'm watching closely is slated to go live around the time this video comes out, at about 1 p.m. Eastern time today, January 9th. Rabbit appears to be a new hardware device in the personal AI assistant space, along the lines of the Rewind Pendant, the Humane AI Pin, or, of course, the Tab. And the question is, with all of these companies competing in this sort of wearable hardware space, is there really a there there?

Starting point is 00:05:53 It's not just a question of which of these companies can compete, but whether any of them actually becomes a form factor that matters to the future usage of humanity. Consider me skeptical but intrigued. That's going to do it for today's AI Breakdown Brief. Up next, the main AI Breakdown. Welcome back to The AI Breakdown. Today we are looking at OpenAI's response to a recent lawsuit from the New York Times that many consider the most significant threat to the LLM training approach that we've yet seen.

Starting point is 00:06:23 To understand OpenAI's response, let's go back to the New York Times' own announcement of its lawsuit at the end of December. There were a couple of notable things about the New York Times suit. First of all, apparently it was something the New York Times had tried to resolve with Microsoft and OpenAI earlier in 2023. It had approached Microsoft and OpenAI, effectively trying to license its intellectual property as well as create, quote, technological guardrails around their products, but the parties didn't come to any agreement. Now, OpenAI, for its part, said that those conversations had been going well and that it was somewhat blindsided by this lawsuit. The complaint says, OpenAI seeks to free-ride on the Times' massive investment in its journalism. Importantly, it accuses

Starting point is 00:07:01 OpenAI and its partners at Microsoft of, quote, using the Times' content without payment to create products that substitute for the Times and steal audiences away from it. In other words, the Times isn't just alleging that OpenAI is training its LLMs on copyrighted material, but that it is reproducing that material in such a way that someone would plausibly use ChatGPT instead of paying for a subscription to the New York Times. This will be a key part of the case. So let's look at OpenAI's blog post to get some further color. They actually break it into four sections. The first section is, we collaborate with news organizations and are creating new opportunities. This isn't really a legal argument so much as an attempt to establish their bona fides, that it is

Starting point is 00:07:40 important to them to actually be partners with media organizations rather than non-contributors or thieves. They write, our goals are to support a healthy news ecosystem, be a good partner, and create mutually beneficial opportunities. With this in mind, we have pursued partnerships with news organizations to achieve these objectives: deploying our products to benefit and support reporters and editors; teaching our AI models about the world by training on additional historical, non-publicly available content; displaying real-time content with attribution in ChatGPT; and providing new ways for news publishers to connect with readers. They point to partnerships with the AP, Axel Springer, and the American Journalism Project and NYU,

Starting point is 00:08:13 as examples of how they're approaching that. Now, where their legal arguments start is in section two. They write, training is fair use, but we provide an opt-out because it's the right thing to do. They argue, training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for U.S. competitiveness. Now, this section is basically just a list of links to their own arguments or to precedents supporting why they believe this.

Starting point is 00:08:41 But obviously, this is going to be the very crux of this case and of any other case that eventually makes it to the Supreme Court. The key overarching question of all of this AI training is whether training is actually fair use. Now, as for why they would allow an opt-out, which you'll remember they started doing last year, if they believe training is fair use, their argument is, quote, legal right is less important to us than being good citizens. What about the idea that ChatGPT reproduced something from Wirecutter in almost exact detail? Well, they write, regurgitation is a rare bug that we are working to drive to zero.

Starting point is 00:09:14 They say, our models were designed and trained to learn concepts in order to apply them to new problems. Memorization is a rare failure of the learning process that we are continually making progress on, but it's more common when particular content appears more than once in training data, like if pieces of it appear on lots of different public websites. So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.
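To make "regurgitation" concrete, here is a minimal, hypothetical sketch of one simple way to flag verbatim reproduction: counting how many of a model output's word-level n-grams also appear in a source article. This is an illustrative assumption only, not OpenAI's actual method, which the blog post does not describe; the function names, the 8-word window, and the 0.5 threshold are all made up for the example.

```python
# Hypothetical regurgitation check: NOT OpenAI's method, just an illustration
# of measuring verbatim overlap between a model output and a source text.


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercased, whitespace-split word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also appear verbatim in source."""
    out_grams = ngrams(output, n)
    if not out_grams:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(out_grams & ngrams(source, n)) / len(out_grams)


def looks_regurgitated(output: str, source: str, n: int = 8,
                       threshold: float = 0.5) -> bool:
    """Flag output whose verbatim n-gram overlap with source meets threshold."""
    return overlap_ratio(output, source, n) >= threshold
```

For example, an article compared against itself with `n=4` scores 1.0, while a loose paraphrase that shares no four-word run with it scores 0.0. A real system would also need tokenization, normalization, and an efficient index over a large corpus, none of which this sketch attempts.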

Starting point is 00:09:42 Because models learn from the enormous aggregate of human knowledge, any one sector, including news, is a tiny slice of overall training data, and any single data source, including the New York Times, is not significant for the model's intended learning. Now, this is obviously where they start to get more forceful. They're hinting here that there has been some amount of intentional manipulation to induce regurgitation. And that brings them to bullet four: the New York Times is not telling the full story. Basically, OpenAI says the conversations had been going well all the way up to their last interaction, on December 19th. But then on December 27th, they learned about the lawsuit by reading about it in the New York Times.

Starting point is 00:10:18 They said the conversation had focused on a partnership around real-time display with attribution, and that it wasn't about paying solely for access to New York Times data, as, quote, like any single source, their content didn't meaningfully contribute to the training of our existing models and also wouldn't be sufficiently impactful for future training. But here's where they say things get fishy. Quote, along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We've demonstrated how seriously we treat this as a priority, such as in July, when we took down a ChatGPT feature immediately after

Starting point is 00:10:50 we learned it could reproduce real-time content in unintended ways. Interestingly, the regurgitations the New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don't typically behave the way the New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts. Despite their claims, this misuse is not typical or allowed user activity and is not a substitute for the New York Times. Regardless, we are continually making our systems more resistant to adversarial

Starting point is 00:11:27 attacks to regurgitate training data and have already made much progress in our recent models. So basically, they're saying that this Wirecutter example was some combination of, one, cherry-picked out of many, many attempts, and two, something the New York Times had to work really hard to get ChatGPT to produce, which undercuts the claim that ChatGPT is reasonably a substitute for the New York Times, which is, of course, part of its copyright claim. Now, the New York Times' only comment in response came from its lead counsel, Ian Crosby, who wrote, the blog concedes that OpenAI used the Times' work, along with the work of many

Starting point is 00:11:59 others, to build ChatGPT. As the Times' complaint states, through Microsoft's Bing Chat, recently rebranded as Copilot, and OpenAI's ChatGPT, defendants seek to free-ride on the Times' massive investment in its journalism by using it to build substitutive products without permission or payment. That's not fair use by any measure. So effectively, both sides are just entrenching here. So what did the community think? Well, opinions are mixed.

Starting point is 00:12:23 Brian Roemmele writes, this is a well-thought-out response. Matthew Berman says, OpenAI just dropped a bold response to the New York Times copyright lawsuit. They directly hit back with the claim that the New York Times is not telling the whole story. OpenAI even says NYT manipulated prompts, including lengthy excerpts of articles, in order to get our model to regurgitate. Andrew Ng, the co-founder of Coursera, wrote, after reading the New York Times lawsuit against OpenAI and Microsoft, I find my sympathies more with OpenAI and Microsoft than with the New York Times.

Starting point is 00:12:48 The suit, one, claims among other things that OpenAI and Microsoft used millions of copyrighted NYT articles to train their models, and two, gives examples in which OpenAI models regurgitated NYT articles almost verbatim. But the presentation muddies one and two, and I saw a lot of commentary on social media that, because of what I believe is a muddied presentation, draws a link between them that I'm not sure is really there. On one, I understand why media companies don't like people training on their documents, but I believe that just as humans are allowed to read documents on the open internet, learn from

Starting point is 00:13:16 them, and synthesize brand-new ideas, AI should be allowed to do so too. I would like to see training on the public internet covered under fair use. Society will be better off this way. Whether it actually is will ultimately be up to legislators and the courts. On two, I suspect a lot of the examples of ChatGPT regurgitating articles nearly verbatim were due to a RAG-like (retrieval-augmented generation) mechanism, where the user's prompt causes the system to browse the web, retrieve a specific article, and then print it out. If this is the case, then to OpenAI's credit, they seem to have already updated their software to make this much less likely, and this is a much

Starting point is 00:13:45 easier problem to fix than if an LLM were to regurgitate text using only its pre-trained weights, which, as far as I know, very rarely happens. To be clear, I believe independent media is important for democracy and must be protected. I also sympathize with media businesses worried about generative AI disrupting their business, but I'm not convinced the New York Times lawsuit is the right way to do this. Usual caveat: I am not a lawyer and am not giving legal advice or any other form of advice here. Now, lawyer Cecilia Ziniti was less impressed. She writes, TL;DR, the blog post is weak: little data and odd citations, a missed opportunity for OpenAI, which has a good fair use case. To start, two odd choices by OpenAI. One, they use a DALL-E image for the blog icon.

Starting point is 00:14:22 It looks like an indie artist's work on Facebook. Why remind the reader about generative art, too? Second, the blog post's author is listed as OpenAI. Better to have a person sign it and humanize OpenAI. Hundreds of OpenAI employees signed the letter for Sam to stay. Not one signed this. Maybe they didn't want to be deposed. The biggest issue she finds, though, beyond style, is the substance of the fair use section. Cecilia writes, the topic is fair use. OpenAI has a great chance to win here. LLMs literally transform what they're trained on into new words. Transformative use is fair use per lots of great cases. But OpenAI skips any mention of actual fair use cases. Instead, OpenAI cites support from Adobe, IBM, and Grammarly, who all support GenAI because

Starting point is 00:14:59 they do it; surprise, creators no one has ever heard of; authors who are... lawyers in Berkeley. Where are any big names? OpenAI could have gotten, say, their investor Reid Hoffman, author of four books and 250 podcasts, to sign. Instead, they got no one. Why? In the end, though, Cecilia points out that this post is really just a PR battle. She writes, so what will happen from this blog post? Substantively, we can expect regurgitation to be the new hallucination, meaning OpenAI has named the behavior the New York Times is complaining about and framed it as a problem that can be solved. Legally, however, she says, nothing. OpenAI's court response isn't due for some weeks. We'll have to wait for the court to decide. Ultimately, I think Cecilia is right, but I think there really are two battles happening simultaneously. One is a public opinion

Starting point is 00:15:43 battle and the second is a legal battle. I think it could be a split decision, and that split decision could have a lot of impacts. Ultimately, I can't envision any scenario where this doesn't make it all the way to the Supreme Court. So to some extent, everything before then is just prelude. Anyway, friends, that is the story for today. This is a battle that is coming big time in 2024. Until next time, peace.
