The Vergecast - AI might help edit the next generation of blockbusters

Episode Date: September 21, 2021

For the next four Tuesdays, Verge senior reporter Ashley Carman will explore how artificial intelligence and machine learning are shaping the future of a variety of industries. In this episode, Ashley... explores how AI is being used to streamline video creation. Guests include VP of Adobe Sensei Scott Prevost, co-founder and co-CEO of Flawless Scott Mann, and Verge senior reporter James Vincent. This podcast was made by producer Liam James, senior audio director Andru Marino, senior reporter James Vincent, and senior reporter Ashley Carman.

Transcript
Starting point is 00:00:00 Support for the show comes from Retool. Too many companies run critical operations on duct-taped spreadsheets, Slack workflows, and whatever else they could cobble together. Not because they want to, but because building internal tools means weeks of waiting on someone else's backlog. That's where Retool comes in. Build custom internal tools just by describing what you need. Prompt something like,
Starting point is 00:00:22 build me a revenue dashboard on our Salesforce data. And Retool actually builds it on your company's data, in your cloud with enterprise security built in. Go to retool.com slash vergecast. We all need to retool how we build software. Hey, Vergecast listeners, it's Nilay. For the next few Tuesdays in the Vergecast feed, we're running a little mini series we made
Starting point is 00:00:47 about the many uses of artificial intelligence and machine learning across a variety of contexts. It's all hosted by Verge Senior Reporter Ashley Carman. We had episode one last week. Ashley's back for episode two. Hey, Ashley, how's it going? I'm back. Hello. So what terrifying things is artificial intelligence going to do this week for us?
Starting point is 00:01:06 This week we're talking about AI and how it's going to affect the video business. On an audio show. On an audio show. It's bold, but we're crossing new frontiers. How is AI impacting the video business? Well, there's actually tools being used already. You might not even realize what you're looking at has been touched by AI, Nilay. Is it all Tom Cruise deepfakes?
Starting point is 00:01:25 We do talk about deepfakes a little bit, but it's not about deepfakes. It's not all deepfakes. All right. Well, I'm very excited for this episode. Episode two, the Vergecast AI series. Let's listen to it. When I say artificial intelligence for video, what do you think about? We're all probably aware of how social video apps like TikTok, Instagram, and YouTube
Starting point is 00:01:49 use AI and machine learning for recommendations, moderation, and ad targeting, and the all-powerful algorithm is in everyone's lexicon at this point. Or maybe you think of deepfakes, those scary videos that put someone's face on another person's body to make them look like they're doing something they otherwise wouldn't. Today, we're instead going to focus on how visual AI is being used for good and as a tool to help people streamline their creative process. Yes, that might mean AI taking on a bigger role in the very human act of being creative. But what if instead, the AI just assisted us or guided our hand? Sensei was founded on this firm belief that we have that AI is going to democratize and amplify human creativity,
Starting point is 00:02:35 but not replace it. That's Scott Prevost, VP of Adobe Sensei. Sensei is Adobe's platform for integrating artificial intelligence into Adobe's consumer products like Photoshop and Premiere. Adobe's stance is Sensei shouldn't make the media for you, but rather it should make it easier and less time-consuming for you to make the work you want. With Sensei able to automate so much of that production work, it means that whereas before you might have been able to work up two ideas,
Starting point is 00:03:03 now maybe you can work up 10 different ideas. And it's through that sort of expansion of your own ability to create that you might find that sort of outlier idea that's the true, true creative spark, the one that really shines. Let's use Photoshop as an example. Last fall, Adobe released a feature called neural filters, which you can guess are filters in Photoshop that use neural networks to edit photos.
Starting point is 00:03:32 These filters do things like remove artifacts from compressed images, add makeup, or smooth a person's face. They can also change the direction of lighting in a room. With tools like these, work that used to take an editor hours to do ends up taking only seconds. There's one particular neural filter called Smart Portrait, which allows you to import a photo of a face and then very easily edit the expression. You can make the person smile or frown or angry.
Starting point is 00:03:59 You can change the age of the face. You can change the hair. We can change where the eyes are looking, the tilt of the head. And all of these things can be done by just moving a slider. Editing a still image is one thing. But think about video. Thousands of frames that need to be adjusted or altered. Adobe has built features into Premiere Pro, its video editing software, that utilize machine
Starting point is 00:04:22 learning to fix or edit objects in video that would take hours or even days to do manually. There was a small team of documentary filmmakers who shot a ton of footage, and when they got back to edit it, they realized that there were some specks on the camera lens that ruined all of the footage. It was across the entire footage, but we happened to have a feature in Adobe Premiere Pro called Content-Aware Fill for video that lets you remove objects from the video. So you can identify the object in one frame, and then it uses the knowledge of all the other frames and the motion to be able to understand what was actually behind that object and fill it in. And so this team of documentary filmmakers was able to remove all of those spots from the footage
Starting point is 00:05:16 that they had shot and literally saved the day. Otherwise, they would have had to reshoot everything. You know, instead of having to edit frame by frame by frame to remove it from every frame, they basically push the button once. Adobe also makes tools for later in the creative process, like for when you're ready to publish your work. We have things like Auto Reframe, right, which intelligently reframes and reformats video content for different aspect ratios. So say you have a video that was shot vertically and you want to change it to square or a video that was shot in landscape and you want to change it to vertical. You know, in the past, you would have to go and edit frame by frame to make that adjustment in order to keep all the important stuff in view at any particular time.
Starting point is 00:06:05 But Sensei does that automatically in a matter of seconds, which is just game-changing for being able to take a video and then publish across various different social media outlets that have different formats. Other elements of the creative process, such as searching stock images, become less time-consuming when AI understands what's in the pictures and helps you narrow down what you're looking for. We have some very powerful image similarity tools that let you start with one image and then find images that have similar content, similar compositions, similar colors. You can even pick an object from an image and then say, I want to find other images that have this object but in a different location.
Starting point is 00:06:46 We can literally drag it to a different part of the canvas and it will search for other images of that object in that location. Those kinds of tools are beneficial when creating marketing assets like email campaigns, social media video, and other advertising media, anything that requires a quick turnaround. Overall, though, Adobe says AI should play a very specific role in the creative process. Maybe just knowing these tools exist and are easy to use is enough to inspire you to try something new. We think of it as sort of part teacher, part muse, and part assistant.
Starting point is 00:07:22 We don't think of it as the person and the AI being separate. We really think of it more and more as a collaboration between them. You know, the AI can help to teach the student in some sense. And, of course, everything that our customers are doing is helping to train the AI. So this sort of mutually beneficial kind of relationship. This field is advancing quickly for consumers. Techniques like these were previously only available to professionals with large budgets and specific training and resources.
Starting point is 00:07:52 Now AI is creating alternative and easier ways for everyone to produce the work they want to make. But we also wanted to check in with the big budget professionals too. Where are we seeing AI being implemented in Hollywood and the big screen? While deepfakes haven't really made it onto the big screen just yet, most studios are actually just relying on traditional CGI for now. The place where directors and Hollywood studios are on the way to using AI is for dubbing. My name is Scott Mann. I'm a co-CEO and co-founder of a company called Flawless,
Starting point is 00:08:25 which specializes in cutting-edge VFX and filmmaking tools that use AI in particular. The product that Flawless is currently working on is what they're calling TrueSync, which uses machine learning to create realistic, lip-synced visualizations on actors for multiple languages. In our last episode, we talked about using AI to create synthetic voices that speak in multiple languages that are foreign to the original voice talent. But what if you could also make it look like an actor in a movie is speaking that language, their lips moving synchronously with the dubbed version of the film? Scott Mann understands why the film industry would want this.
Starting point is 00:09:03 He's a director himself. I'd done a film back in 2015, I think, called Heist. I'm here to ask a favor. How big? $300,000 pay. Get out of here. Time's up. Go!
Starting point is 00:09:14 It had an amazing cast, including Robert De Niro. And so I finished the film in its home language, as you usually do. And then it's when I saw a foreign dub of the same film. And I realized how different it was, not just in terms of like other voices playing the parts, but the words were different. Like the script had been altered. The performances were very different. And I kind of was heartbroken watching it really that so much changes once you hand it off. And I kind of discovered that that was this 100-year-old problem, that dubbing has kind of accepted that the mouth movements
Starting point is 00:09:44 do not marry. When an American film is brought to non-English-speaking countries, the script is often rewritten and re-performed to try and sync with the timing of the original film. Because of this, the translation is not always exact. And you're in this kind of horrible wrestle that despite however good anyone is doing that process, it's kind of trying to break everything to fit it into a broken image. And I think that film really kind of set me off on looking for a solution to that problem.
Starting point is 00:10:15 Scott quickly realized that the technology available in the industry at the time, using CGI to reconstruct an actor's mouth and move it to the translated dialogue, was not going to offer the solution he wanted. Even the very best artists, doing the very best methods of traditional VFX where you're creating a model, you're lighting it, you're matching it, you do all these huge efforts and huge layers of work. It just doesn't hold up to the human eye because we've studied faces for our entire lifetimes and we know the subtleties. Instead, Scott found that using neural networks with tons of data of facial expressions and mouth movements made his idea a reality. You're training a network to understand how one person speaks, so the mouth movements of an ooh and an ah, the
Starting point is 00:10:56 different visemes and phonemes that make up our language, are very specific, very person-specific. And that's why it requires such kind of detail in the process to really get something authentic that speaks like that person spoke. And it's really about retiming mouth shapes and movements from different places that were recorded earlier in the movie into later places. It's kind of like very slight and very subtle, deep editing of the mouth movements.
Starting point is 00:11:22 This would be pretty easily implemented in movies and TV in theory, because there are already many scene takes from various angles that are captured during production, so the team wouldn't need any additional footage. Now, I know this is an audio-only podcast, and we're talking about something inherently visual, so it's hard to actually show a demo here. But from what Flawless has shared so far,
Starting point is 00:11:45 which you can see on their website, I wouldn't say it's flawless, but it's pretty impressive. There are moments that may look off, like when Robert De Niro's lips rarely touch when he's speaking, but it's not totally distracting, and when it works, it works well. Especially a scene the company shares from Forrest Gump speaking Japanese. The emotion of the character is still there
Starting point is 00:12:15 and makes for a more believable dub. You sort of forget that it's another voice actor behind the scenes. What's interesting about using AI to go down this path is its incentives are not necessarily efficiency or saving money and time, but immersion. You're not trying to save time, you're not trying to reduce costs or replace someone, really. That's not the end of our business.
Starting point is 00:12:40 The streamers and the studios have been building a global distribution network, but they don't have global content. People do not tend to watch subbed and dubbed material, and that's reflected, I would say, in the value of that content when it's sold internationally, as in it's exceptionally low, typically under 5% of what its normal value is. Eyeballs on content, essentially, is where the value is,
Starting point is 00:13:02 and people are not putting their eyeballs on that content. So maybe in the long run you are making more money by building a larger catalogue of foreign films. But Scott believes the more content the world shares with each other, the better. Films that we didn't even know existed, that were made in other places that we are just not exposed to and vice versa around the world, and we've got all these different languages we speak.
Starting point is 00:13:23 And I think through that, we'll get to understand culture better because currently, if there's a film in a different language, what's typically happening is it gets remade and it gets remade into the different languages. And when that happens, it kind of culturally is changed as a film. And it's kind of retold through a different lens. And I think the best way of kind of humankind to come together is to have a better true understanding
Starting point is 00:13:47 and being able to empathize with our kind of neighbors, and that's going to be the great benefit of being able to access global films and content. Scott says Flawless is currently working on a couple productions implementing its TrueSync technology and will have a worldwide release in early 2022. But as with any AI changing an industry, we have to think about job replacement. With most Adobe products, sure, if you alone create, edit, and publish the projects you work on, AI tools will save a ton of time for you. But in larger production houses, where each role is delegated to a specific specialist, retouchers, colorists, editors, social media managers, those teams might end up downsizing.
Starting point is 00:14:36 We asked Adobe about this. Anytime technology comes along, people have said it's going to destroy jobs. And it certainly does shift some of the jobs. You know, we think some of the work that creatives used to do in production, they're not going to do as much of that anymore. They may become more like art directors. And what we think is that it actually allows the humans to focus more on the creative aspects of their work and to explore this broader creative space. Scott of Flawless maintains a similar sentiment.
Starting point is 00:15:07 Their TrueSync technology might end up shrinking the number of translators needed for film dubbing. That's fair. And I would say, look, that is obviously still to some degree necessary in some places. Like truthfully, that role is kind of a director, right? It's like what you're doing there, you're trying to kind of convey that performance. But you're right, that is one aspect of it that is kind of, will be reduced on. And it's kind of taking that side of the industry and growing that side of the industry. So will script supervisors end up becoming directors, or photo retouchers end up becoming art directors?
Starting point is 00:15:40 Maybe. But what we are seeing today is that a lot of these tools are already combining workflows from various points of the creative process: audio mixing, coloring, graphics, all used in one piece of video software. So if you're working in the visual media space, instead of specializing in specific creative talents, maybe your job is going to require you to be more of a generalist. The boundaries between images and videos and audio and 3D and augmented reality are going to start to blur.
Starting point is 00:16:11 It used to be that there were people who specialized in images and people who specialized in video. And now you see people working across all of these mediums. And so, you know, we think that Sensei will have a big role in basically helping to kind of connect these things together in meaningful ways.
Starting point is 00:16:32 Before we get too far into the future, I want to take what we've learned here and talk about it with a colleague of mine, James Vincent, who you are well aware of. He's our London-based reporter who covers AI and machine learning and he's reported on AI in this specific industry quite a bit. How is AI going to shape the way
Starting point is 00:16:50 we make, consume, and sell visual mediums going forward. We're going to take a break, but when we're back, I'll be asking James to level out the hype for artificial intelligence in video, movies, TV, photo, and of course, deepfakes. Support for this show comes from Shopify. Starting something new isn't just hard.
Starting point is 00:17:14 It can be really scary, too. So much work goes into this thing that you're not entirely sure will even work. But here's a better thought. What if it did all work? What if your instincts were actually right all along? Shopify wants to help you get there. They're the commerce platform behind millions of businesses worldwide
Starting point is 00:17:31 and nearly 10% of all e-commerce in the U.S. From established brands like Allbirds and Heinz to companies just getting started. Their design tools make it simple to create the exact online presence you're envisioning with hundreds of ready-to-use templates available. And with built-in marketing tools, you can launch full email and social campaigns in just a few clicks. So you can connect with customers wherever they are.
Starting point is 00:17:56 It's time to turn those what-ifs into with Shopify today. You can sign up for your $1 per month trial today at Shopify.com slash vergecast. You can go to shopify.com slash vergecast. That's Shopify.com slash vergecast. So we're back with James Vincent, a senior reporter at The Verge who specializes in AI and machine learning. Hello. Hi, Ashley. How you doing? It's always such a treat to get to talk to you.
Starting point is 00:18:33 It's always lovely to be a talking head. I love it. So as you know, on this episode, we're talking about AI in the visual medium. Yeah. One question that has really been sticking out to me throughout this, and I just need you to, like, level set for me. You know, we're talking to Adobe about how their effects and their different tools could be used to kind of like standardize how people treat their visual content. So I'm just curious, like, does this mean we're leading up to a world in which content could end up looking the same? Yeah, I mean, I think it's a really, really interesting question. And it sort of is one that isn't entirely unique to AI.
Starting point is 00:19:10 I feel like if you think about when Instagram filters first became a thing, and everyone started putting the same filters on their photos. And there was this sort of like, you know, it became a fashion. It became a glut of this one thing. And then people got bored of it and they moved on to the next thing. And I think the thing that really gets overlooked in AI sometimes, which, you know, obviously you bring up here is that it sort of is backward facing sometimes in that it learns from data that is in the past and some people think that
Starting point is 00:19:38 that means it's not so good at creating novelty, as it were. So I think it's kind of entirely plausible that you get these set looks perhaps that come in with AI filters. But I feel that people will get bored of them and they will move on to the next thing very quickly in the same way that you see Snapchat augmented reality filters, for example, come and go. And they become the hot new thing for a couple of weeks and then they find something else instead. Fashion will move on. It'll find new things to do. And then one of the other things that came up during this episode was sort of this
Starting point is 00:20:12 job replacement, job loss theme. Yeah. Do you buy that integrating this sort of AI into the visual medium? We can escape it with basically zero job loss. I don't know. I don't know. Honestly. But I have been writing about like AI and
Starting point is 00:20:31 robotics and automation job losses for years and years and years now. And actually my experience is that it comes down to the choices made by the companies. If they want to keep on the same amount of people, they will do that. If they want to say, well, you know, we can do the same amount of work with fewer staff, then they will make those cuts. It really comes down to what the companies want to do and what the macro trends in the economy are. If you have a recession, then you're going to lose jobs. If there's less people buying advertising, there's less people, you know, there's less demand for this content, then absolutely you're going to lose jobs. I would say that automation doesn't necessarily mean you have to. I would say that that comes down to the company's choices. So if jobs are lost
Starting point is 00:21:12 because of this, I don't think AI will be to blame. I think managers and bosses will be to blame. What are you watching for? What are you interested to see? Where do you think this is all heading? Well, the big next thing, which is still filtering into the industry and we haven't got there yet, but it's happening quite, it's happening, is deepfakes for non-shady, non-horrible stuff. And now I want to be clear, you know, deepfakes have a bad reputation for a good reason. It's not because they've been used for political misinformation. I feel that was something that people were worried about and it hasn't really happened. But they have definitely been used for non-consensual pornography, essentially.
Starting point is 00:21:49 That's the big horrible use case for them at the moment. So that's a real problem. Companies need to do more to deal with it. Law needs to be more aware of it, there need to be better ways to address it. But there are other uses coming in now, which are just adding deepfakes into the usual set of tools used by creators. I think one really odd thing I saw recently, I don't know if you saw this at all, was Bruce Willis did a deepfake of himself in a series of Russian mobile phone adverts. Did you see that at all?
Starting point is 00:22:19 I didn't see it. Mississippi. Mississippi, very interestingly, it is not Bruce Willis as he looks today. It is Bruce Willis, as he looked 20, 30 years ago. It's Bruce Willis in his sort of heyday, and they did one interview with the people who made the deepfake. And they were like, yeah, who wants to see Bruce Willis now? He's old. We gave the people what they want. And basically they have these series of little vignettes, and he appears alongside a sort of very famous Russian comedian.
Starting point is 00:22:48 They get into all sorts of scrapes, and then they get saved by their fabulously well-priced mobile phone data plan. That's the script each time. And Bruce Willis just says like one or two things. He has one or two lines. But I just love the idea someone went up to Bruce Willis and was like, Bruce, how would you fancy making money without doing any work at all? All you need to do is sign this paper. And then we're going to take your image from all these old films and we're going to put you into these Russian TV adverts that you now have to worry about. You know, you just get a check.
Starting point is 00:23:21 Who cares? And I think that that is something that's really going to change a lot of the economics. And I think it's something you talked about with Veritone, right, with the voice actors hiring out their voices. Right, in our voice synthesis episode. Yeah, I feel it's like a similar thing in that if you have built up an image, you will then in future be able to rent that image out in a way that is very cheap for you. And I think that would change how people think about celebrity endorsements instead because, you know, it becomes not something where that person has had some involvement, but literally all they've done is sign a piece of paper that says, yes, you can use my likeness for this, this and that. I mean, would that, would you be freaked out
Starting point is 00:24:01 by that if you were getting sold deep fake endorsements by, you know, your favorite celebrities or most relatable celebrities? I don't know. I don't know if I'd be freaked out, but it is interesting because I think we hinted at this in the previous episode where it was sort of this idea that obviously your voice changes over time. So if you have your prime voice, the voice that made you famous or whatever it is. If you could preserve that voice and continue to sell it, that's like a really lucrative opportunity. So it's interesting to think that like Bruce Willis, because he filmed so much as a younger person, is now able to monetize his young self that he never would be able to monetize now. It's like kind of tragic in some ways because I do feel like older actors receive
Starting point is 00:24:42 less work or they get cast as like the grandpa or whatever it is. And now it's like, oh, you can still be that cool action hero that you have sort of potentially lost out on, which doesn't freak me out. It's just like interesting how that might change who we see and how we see them. That's actually really depressing, Ashley. Thank you for that. Welcome to my mind. But you're right, because it'll totally lock in some of the worst trends we see with celebrity now, which is like you only have value if you're young, you know, you have value if you're beautiful, you have value for a very small window in your career and then you're forever trying to recapture that specific moment you had.
Starting point is 00:25:20 And with the power of AI, you can just lock that person in time in amber for that one specific point and sell it over and over and over again. And I mean, it also, though, ensures that even after they die, they could still continue to be a cultural icon. Yes. Which is interesting, too. Like, we see the holographic, you know, holograms or whatever sometimes come out. But I feel like the deepfake technology actually could ensure that like Arnold Schwarzenegger,
Starting point is 00:25:45 I'm just going with action heroes now, could play the Terminator. Over and over. A hundred years from now or whatever it's going to be. But I feel we are already on that path. If you think, you know, the Star Wars stuff that they've done. And like they had, oh, spoilers for the end of season two of The Mandalorian. But, you know, Luke Skywalker comes back and it's a young Mark Hamill.
Starting point is 00:26:08 Are you a Jedi? I am. I read an article. They tested deepfakes for doing that. It was not good enough and they went with the old-school CGI. But again, you have this like, it becomes less about the person and it becomes all about the intellectual property.
Starting point is 00:26:26 Yep. And therefore, it's about what does Disney own and obviously Disney now owns this entire vast universe and the Marvel universe and, you know, and they can just keep pumping stuff out. And I wonder if that's going to change how we think about celebrities and actors and favorite films and I don't know. Is it good? Is it bad? I mean, I love this as a place to end. but let's end on a high note here. So can you give us your utopia version of AI in the future with visual medium?
Starting point is 00:26:54 So the utopian version is, again, it comes down to economics, right? I feel like one version we've talked about now is dominated by big corporate behemoths like Marvel and Disney, who are sort of churning out the same stuff. And the other version is that these tools are super simple and easy to use. Everyone gets them and it sort of unlocks new access and creativity. I feel like the conversation you had with Flawless was really about that sort of barrier to entry and access and I really like I love what they do
Starting point is 00:27:24 because I feel that they have this idea that AI can help break down borders and God, I cringe just saying that. I'm sorry, oh my God. But you know, if you can have a thing where you take a foreign film and it wouldn't get the audience, it would, but it's an amazing film.
Starting point is 00:27:41 It's the greatest piece of art to come out of the world in 20 years but it's in Swedish. So, you know, it's not going to get seen by American audiences. But if you can take that film, press a button that dubs it seamlessly into familiar voices and actors. I kind of think that's a win for humanity, isn't it? Sort of? No, yeah, for sure.
Starting point is 00:28:00 Like, the access. So I feel that's a positive vision, definitely. And just to finish on our Luke Skywalker example here, theoretically, if Luke Skywalker was cheap enough that you could rent him as a fan and create your own movie featuring Luke Skywalker. Yeah. There we go. Fandoms.
Starting point is 00:28:19 A whole new thing, new fan fiction, except the movie medium. I feel like this is, we're trending towards the metaverse in this conversation, which is another big topic. Because yeah, then it becomes like who has access to the intellectual property, who gets to play as it, and who gets to use these characters. Yeah. And so possibly there is this future bursting with creativity and ideas and access. and the sort of universal fun and there's the world we live in at the moment. Which way do we think it's going to go?
Starting point is 00:28:49 We will let the audience determine for themselves. They can only take what we say and do what they will with it. But thank you as always, James. This was amazing. No problem at all, Ashley. Absolutely, absolutely my pleasure to chat. Thanks again for listening to this Vergecast AI mini-series.
Starting point is 00:29:12 This podcast is made by producer Liam James, Senior Audio Director Andru Marino, senior reporter James Vincent and me, senior reporter Ashley Carman. Talk soon.
