The AI Daily Brief: Artificial Intelligence News and Analysis - Automated AI Translations Come to Spotify, Undermining Numerous AI Startups

Episode Date: September 26, 2023

Last week YouTube announced automated dubbing features and now Spotify has unveiled AI translation for podcasts. NLW explores the news, as well as what it says about the challenge of being a startup. ...Before that on the Brief: Getty Images launches a rights-ready image generation platform. ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we're looking at Spotify's new AI dubbing feature and what it means for startups in the AI space. Before that on the Brief, Getty releases a new copyright-safe image generator that may be appealing to corporations. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our Discord, our YouTube channel, and our newsletter. Welcome back to the AI Breakdown Brief. All the AI headline news you need in around five minutes. We start today with an announcement from Getty Images of the very unwieldily named Generative AI by Getty Images, which is a new tool built in collaboration with Nvidia that promises to do image generation in a way that is copyright protected.
Starting point is 00:00:46 Now, Getty Images is, of course, one of the biggest stock image services in the world. So it is a natural extension then to build an AI image generation model that is trained on top of those images. Presumably, because Getty owns the copyright to those images, the generative AI works that are created on top of them are much more likely to be commercially safe and have less risk of copyright issues in the future than are, say, images created by a model that has a different set of sources. Now, it should, of course, be noted that copyright protection does not apply to the training of AI models in the way that some people think it might. It is almost assured that that will become the subject of Supreme Court battles in the near
Starting point is 00:01:24 future. However, in the meantime, businesses have to make decisions about which platforms they are or aren't going to use. And you have to think that a lot of corporate legal offices are going to be much more comfortable saying yes to something created by Getty trained on their own image database than on the corporate use of, say, Midjourney or DALL-E 3. In terms of the quality, I haven't had a chance to see it for myself yet. The Verge reports the photos look better than expected. Stock photos already have an artificial, soulless quality to them, and I was not surprised that some of the first few images the tool generated also felt devoid of feeling. But Getty's tool did well at rendering realistic-feeling human figures. Indeed, this author writes, the photos I got felt more human than when I
Starting point is 00:02:04 try the same prompt with Stable Diffusion, and the Getty image fooled my friends when I texted it to them. At the same time, it also appears that there are much stricter limitations on what type of images the Getty tool will produce. For example, any prompt with the name of an actual person was prohibited, at least in this test. That includes not only producing images of people, but also doing art in the style of a well-known artist. Now, Getty isn't the only company that's trying to cozy up to corporate customers by offering a version of a generative AI platform that is potentially more legally in the clear. Back in June, when Adobe started to reveal more details of its Firefly platform,
Starting point is 00:02:38 that company's enterprise offering came with a guarantee to cover legal bills in the case of claims of copyright. As the VP of Digital Media at Adobe called it, this meant full indemnification for the content created through the Firefly features. Now, part of this is that Adobe, like Getty, apparently trained its image generation model on its portfolio of rights-approved images, as well as public domain content and openly licensed content such as Creative Commons images. Ultimately, I think that if you are a business or an enterprise who wants to be using generative AI images, but whose legal office was very
Starting point is 00:03:09 nervous about it, this is likely to be a very welcome service. Next, we move on to the writers' strike. Now, the more than 150 days of this writers' strike have seemed to many watching from the perspective of the artificial intelligence space to be not only a question of Hollywood and the relationship between content creators and the studios, but also a bit of a preview of what it looks like to try to use collective bargaining and a traditional labor movement structure to win protections from the consequences of artificial intelligence. Now, there are a lot of op-eds going around, like this one from Reason that's called Win for Writers Guild, Loss for AI. And so far as I can tell, we don't have enough information to really understand exactly what the wording was around artificial
Starting point is 00:03:53 intelligence in this tentative agreement. We knew going into these negotiations that writers wanted to prohibit the use of AI in the creative process, while studios wanted to be able to experiment with it as much as they wanted. What we don't know is where those details landed. It sounds from some reports that I've read, however, that this was one of the major last outstanding details and that figuring out the nitty-gritty of AI in the context of this three-year agreement was one of the final steps to actually getting it out the door. To the extent that we get more information later this week about what the actual deal terms were, we will certainly analyze it to see how relevant it is for other areas beyond just Hollywood. Now, speaking of some skepticism of AI,
Starting point is 00:04:32 Signal's President Meredith Whittaker had very strong comments about AI at the recent TechCrunch Disrupt event. Whittaker called AI basically a surveillance technology and argued that it was going to exacerbate the problems of big tech that we face already. Whittaker said it requires the surveillance business model. It's an exacerbation of what we've seen since the late 90s and the development of surveillance advertising. AI is a way, I think, to entrench and expand the surveillance business model. The Venn diagram is a circle. She continued, the use of AI is also surveillance, right? You know, you walk past a facial recognition camera that's instrumented with pseudoscientific emotion recognition, and it produces data about you, right or wrong, that says you are happy,
Starting point is 00:05:08 you are sad, you have a bad character, you're a liar, whatever. These are ultimately surveillance systems that are being marketed to those who have power over us generally. Our employers, governments, border control, et cetera, to make determinations and predictions that will shape our access to resources and opportunities. Now, I think this is a hugely important conversation. It does not need to be a priori true, but the implications of AI as a surveillance technology are very, very clear. Indeed, if you go look at, for example, the EU's recent AI Act, one of the big things it takes on is this exact question. It takes a harms or risks-based approach to regulation, meaning that they identify certain use cases of AI as potentially more dangerous than others. For example, the use
Starting point is 00:05:48 by authorities to profile potential criminals is a lot more severe a use case than, for example, creating some images online. That means that it comes with harsher restrictions and more difficult compliance. And of course, in the EU framing, some things are beyond the pale such that they're just not approved at all, not even with any amount of compliance. One included in that, for example, is prediction of crimes based on AI crunching huge amounts of data. Bringing it back to this surveillance question, however, and more specifically the question of the extent to which this reinforces the ad-based model of technology, Snapchat has announced that Microsoft is a new partner for the ads that are going into its AI product, My AI. Basically, sponsored links
Starting point is 00:06:27 are finding their way into AI chatbot conversations, as they, of course, always were going to do. Announced at Microsoft's advertiser event last week, Snap will be creating new sponsored links, powered by Microsoft Advertising's Ads for Chat API. Now, one of the things that I'm interested in is the extent to which I've seen AI companies attempt to get people to pay directly for access to AI tools, rather than just sponsoring them with advertisements, as has been the model of the internet for so long. Is that a trend that might actually continue? Or is it just the short-term byproduct of compute being so expensive and scarce right now, but that will ultimately be arbitraged away by these big companies that use free but ad-supported access to hook users
Starting point is 00:07:05 who aren't sure they otherwise would use AI tools, because, hey, they get to do it for free, right? Anyways, I think that's going to be an interesting trend to watch and see how it shapes up. Now, speaking of that difficulty accessing compute, yet another AI chip company has raised a bunch of millions of dollars to be even more competitive in a very competitive space. This time, it's 2015-founded Kneron, who has raised $49 million as a Series B extension to bring its total raise to $190 million. Now, Kneron is a little bit different than a company like Nvidia. What Kneron does is provide lower-powered, reconfigurable AI chips that are
Starting point is 00:07:39 designed to go into and work with existing systems, such as those that operate inside driverless vehicles. So in many ways, this is part of a trend. And interestingly, when you dig into what's actually going on in the AI chip space, a lot of it isn't just companies that are trying to compete with Nvidia, although there are some of them. And indeed, many of those efforts are coming from within the big enterprises themselves, the Amazons and Googles of the world. But more companies like this that are focusing on some specific use case or some custom-purpose chip or something like inference-focused chips, basically things that are a different part of the value stack that are still important, but not necessarily those big data center chips that Nvidia seems to have such a lock on.
Starting point is 00:08:16 Lastly, today, in what I think has to be the least surprising news article I've read for the last couple of weeks, Gizmodo, with a very cynical tone, has declared that Coca-Cola's new AI-generated soda flavor falls flat. Coke GPT, they say, is about as half-baked as all of the hype surrounding AI. Now, you can probably tell from that subtitle the perspective that these particular authors are bringing to the piece, but there is sort of a shark-jumping phenomenon that was always going to happen as soon as big marketers started to slap AI on labels as a way to attract consumer attention. We've seen this countless times with new technologies. It never works for long, and it appears at least initially that Coke's efforts here might have lasted even less time
Starting point is 00:08:58 than some other recent experiments. In any case, I will certainly keep my eyes out for one of these Y3000 models to have as an artifact, if nothing else. But for now, that is going to do it for today's AI Breakdown Brief. Thanks as always for listening or watching, and I'll be back soon with the main AI Breakdown. Hey, guys, one more quick thing before we get into the main episode. If you subscribe to the newsletter, you've seen this, and you might have heard it on an earlier episode. But right now, I am getting information from you guys, the listeners, about what you are looking
Starting point is 00:09:27 for in terms of AI educational resources. A bunch of you have filled out the survey already, and it's so helpful. But if you would take the about one minute and go to bit.ly slash AI breakdown survey, I would love to know what type of online courses you might need, what you're trying to learn more about, whether you'd be interested in a community of learners. I'm getting really close to making some decisions about what we're going to do next, and I really want all of your input.
Starting point is 00:09:50 Again, it'll take about one minute and you can find it at bit.ly slash AI breakdown survey. Thanks so much, and now on with the show. Welcome back to the AI Breakdown. Well, AI Product Fall continues, and this time it is Spotify with the announcement of a new feature through which podcasts can offer listeners anywhere in the world the ability to listen in their own language. The feature is called Voice Translation and isn't just overdubbing, but is overdubbing in the podcaster's own voice. Now, so far, the program has just worked with a small group of podcasters, including Dax Shepard and Monica Padman from Armchair Expert, Lex Fridman,
Starting point is 00:10:28 The Ringer's Bill Simmons, and Steven Bartlett, who does The Diary of a CEO. So far, they've only released translations into Spanish, but are planning to launch French and German translations in the coming weeks. Now, Spotify developed this tool, but they did it based on OpenAI's technology. Said Ziad Sultan, Spotify's VP of Personalization, by matching the creator's own voice, Voice Translation gives listeners around the world the power to discover and be inspired by new podcasters in a more authentic way than ever before. Yesterday, Lex Fridman tweeted, this is me speaking Spanish, thanks to amazing work by Spotify AI engineers. The translation and voice cloning are fully done by AI. Language can create barriers of understanding and thus fuel division. I can't wait for AI to
Starting point is 00:11:07 break down this barrier and reveal our common humanity. [Clip plays: Lex Fridman's AI-generated Spanish voice translation.] Now, this news may strike you as somewhat familiar to something we talked about last week. Last week, YouTube held their own event called Made on YouTube that was once again full of AI announcements. At Made on YouTube, the company announced Dream Screen,
Starting point is 00:11:33 which is a new AI-generated image or video background creator for short videos. They announced AI Insights, which gives creators suggestions for what they should do videos on based on who's been watching their videos and what other types of videos they're watching around the site. They also even announced an AI assistive search for Creator Music, basically giving people the ability to use a few words to describe the video and have AI recommend a rights-approved soundtrack that would work.
Starting point is 00:11:56 But buried in all of these announcements was automatic dubbing with Aloud. YouTube writes, One way creators look to expand their audiences is through dubbing their content into languages beyond their own. But not all creators have the resources to dub their content professionally. So we're bringing Aloud into YouTube, an AI-powered dubbing tool that will help creators open up their content to the world. Now, Aloud was first announced by Google back in March of 2022. On their blog, they wrote a post called Overcoming the Language Barrier in videos with Aloud. Aloud makes video dubbing easy and cost-effective, getting us one step closer towards overcoming the language barrier in videos.
Starting point is 00:12:32 Aloud was a product that came out of Google's in-house incubator, which they call Area 120. The announcement post pointed out a few realities of modern video consumption. First, that even though 80% of the world doesn't speak English, a significant majority of videos created were in English. Second, that although subtitles can be useful, they're not necessarily ideal, especially for mobile, on-the-go types of usage. Now, back in June, we heard that YouTube was testing this AI-powered dub tool to translate creator videos, and at that time we heard that the company had been testing out Aloud's AI dubbing with, quote, hundreds of creators. At that time, at least, Aloud could only dub into Spanish and Portuguese. However, it's not clear if that's been updated with this latest Made on YouTube announcement.
Starting point is 00:13:12 Now, even before this in February, YouTube had created an option to upload multiple audio tracks for an individual video that could be sorted by language. So for creators that had taken the time to actually create a second dubbed version of their video, they at least had the ability to supply that audio and have YouTube serve up the right language for the audience that was watching the video at the time. Of course, an automated AI-driven service is much more likely to mainstream this feature, given how much time and effort and cost there would be associated with this type of dubbing in the past. So I think one dimension of this story is the extent to which AI is rapidly ripping down linguistic barriers. This is, of course, what Lex called out in his tweet about it.
Starting point is 00:13:49 And I do think it's fascinating to think about how different the world might be if everyone can consume all of the content without having to worry about what original language it was created in. We're obviously still a fairly far way away from that, given that the parameters of these tests are still fairly limited, but the trajectory here is quite clear. Now, another interesting dimension of this, however, is a little bit more on the inside baseball side about how the AI industry is developing. AI entrepreneur David Park quote-tweeted Lex Fridman and said, this is heartbreaking for the founders who have been tagging Lex every day with pretty much the exact same demo. Now that Spotify and YouTube are adding AI voice cloning and translation to their
Starting point is 00:14:27 platforms, I'm curious to see how these founders pivot. This space is moving so fast. Now, by way of example, Wondercraft just announced dubbing as well. Wondercraft is a YC-backed company that is focused on creating AI tools for content creators, particularly podcasters, and for them dubbing into multiple languages was an obvious next step. Now, right now, Wondercraft seems to be a little bit ahead of Spotify and YouTube, at least based on what's available at the moment. As they wrote earlier today, Spotify announced yesterday their AI-translated podcasts using Lex Fridman speaking Spanish as a demo. Lex in Spanish is great, but what's better? Lex in 13 languages. The company says Wondercraft's dubbing engine is perfect for podcasts, YouTube videos, e-learning videos, lectures, interviews, meditation tracks, and documentaries.
Starting point is 00:15:11 And the languages available include Spanish, German, French, Arabic, and about eight others as well. And yet, of course, the question going back to David Park's post is what can a company like Wondercraft do to stay ahead of the curve? Or put differently, how much does the advantage of content creators already being locked into the Spotify platform or the YouTube platform mean that they're going to inevitably choose those automated, integrated tools over even a theoretically more powerful tool from a third-party service? Now, this is a new dimension of challenge to the startup ecosystem that has been coming up a lot more now. We've talked a lot about the ways in which the startup ecosystem around artificial intelligence has evolved in unexpected ways that perhaps challenged some assumptions that people had six or 12 months ago. We've talked a number of times about Sam Hogan's Twitter essay in which he argued that the companies who had put a UX
Starting point is 00:16:02 or UI wrapper around OpenAI technology in an attempt to appeal to the enterprise were some of the biggest initial losers in the space, because ultimately they just didn't have a real source of differentiation. They didn't really have a strong moat. And as companies like OpenAI have launched their own enterprise tools, those startups have found themselves without a really good argument for why them, and not just ChatGPT Enterprise. As Nvidia's Dr. Jim Fan wrote, ChatGPT Enterprise, the beginning of the end of many B2B thin wrapper startups. Now, the other piece of that story is that in addition to startups often finding that they
Starting point is 00:16:34 don't really have strong moats when it comes to the enterprise, another issue has been concern among enterprises about sharing proprietary data that could be used to customize and improve AI models with untested, unproven companies. This creates another reason why incumbents, be they big tech platforms or existing consultant relationships, perhaps have a leg up more than they might in another technology area that doesn't benefit so much from deep customization based on proprietary information. Now, the other dimension of this competition is how much integration into platforms that people are already using will box out options from the outside. That's what we were just asking in the context of Spotify versus something like Wondercraft. But another area in which
Starting point is 00:17:15 that came up recently was, of course, the announcement of DALL-E 3. Dr. Jim Fan again wrote a Twitter essay called Here's why DALL-E 3, once deployed, will improve at a faster rate than Midjourney. His third point was called ecosystem. Integration with ChatGPT is such a killer move. Jim writes it's almost trivial to add the existing puzzle pieces to DALL-E 3, such as the code interpreter and browser. Want to apply a filter? Just call the OpenCV API instead of running the model. Want a reference image? Call the search plugin to emulate Bard with Google Lens integration. What's more, he writes, there is the existing user base.
Starting point is 00:17:46 Midjourney has 16 million users. ChatGPT's got 100 million. Now, I think Midjourney and DALL-E 3 will be a particularly telling battle. Midjourney is not just some brand new, wet-behind-the-ears Y Combinator startup. They are a young startup, yes, but one that has quickly jumped into the pole position in their particular area and won a huge number of devotees, as well as people who have spent a lot of time learning how to work their system for maximum results. 16 million users might be less than 100 million users, but 16 million is nothing to sneeze at,
Starting point is 00:18:15 and each of those 16 million users is there for exactly the purpose that Midjourney is there for. Now, of course, for Midjourney to have any hope, they will likely need not only to have feature parity when it comes to the quality of images generated, but actually be better in some distinct ways in order to overcome what DALL-E 3 is offering, especially in the integrated ChatGPT package. But the point of this, of course, is that the landscape of competition when it comes to AI has these really interesting dynamics that challenge assumptions from previous technology areas and that I think are having a pretty big impact on how things develop. I think it's certainly too early to count out companies like Wondercraft and there are many
Starting point is 00:18:52 peers who are working on this type of dubbing. HeyGen is another one that has had a ton of viral content on Twitter doing language translation and dubbing for video. I hope that companies like this can stay competitive with the giants because I think that they have such a stronger incentive to innovate, to be efficient, to add new features, to expand the languages available. But I also think at the end of the day, I'm glad that these platforms are understanding how valuable these tools are going to be for creators and integrating them directly as well. AI Product Fall continues to be a fascinating, competitive, accelerating kind of time. And I appreciate you hanging out here as we go through it together. But that is going to do it for this episode of the AI Breakdown. Thanks as always for
Starting point is 00:19:32 hanging out, and until next time, peace.
