The Vergecast - AI might help edit the next generation of blockbusters
Episode Date: September 21, 2021
For the next four Tuesdays, Verge senior reporter Ashley Carman will explore how artificial intelligence and machine learning are shaping the future of a variety of industries. In this episode, Ashley explores how AI is being used to streamline video creation. Guests include VP of Adobe Sensei Scott Prevost, co-founder and co-CEO of Flawless Scott Mann, and Verge senior reporter James Vincent. This podcast was made by producer Liam James, senior audio director Andru Marino, senior reporter James Vincent, and senior reporter Ashley Carman.
Transcript
Support for the show comes from Retool.
Too many companies run critical operations on duct taped spreadsheets,
Slack workflows, and whatever else they could cobble together.
Not because they want to, but because building internal tools
means weeks of waiting on someone else's backlog.
That's where Retool comes in.
Build custom internal tools just by describing what you need.
Prompt something like,
"Build me a revenue dashboard on our Salesforce data."
And Retool actually builds it on your company's data,
in your cloud with enterprise security built in.
Go to retool.com slash Vergecast.
We all need to retool how we build software.
Hey, Vergecast listeners, it's Nilay.
For the next few Tuesdays in the Vergecast feed,
we're running a little mini series we made
about the many uses of artificial intelligence
and machine learning across a variety of contexts.
It's all hosted by Verge Senior Reporter Ashley Carman.
We had episode one last week.
Ashley's back for episode two.
Hey, Ashley, how's it going?
I'm back. Hello.
So what terrifying things is artificial intelligence going to do this week for us?
This week we're talking about AI and how it's going to affect the video business.
On an audio show.
On an audio show.
It's bold, but we're crossing new frontiers.
How is AI impacting the video business?
Well, there's actually tools being used already.
You might not even realize what you're looking at has been touched by AI, Nilay.
Is it all Tom Cruise deepfakes?
We do talk about deepfakes a little bit, but it's not about deepfakes.
It's not all deepfakes.
All right.
Well, I'm very excited for this episode.
Episode two, the Vergecast AI series.
Let's listen to it.
When I say artificial intelligence for video, what do you think about?
We're all probably aware of how social video apps like TikTok, Instagram, and YouTube
use AI and machine learning for recommendations, moderation, and ad targeting, and the all-powerful
algorithm is in everyone's lexicon at this point.
Or maybe you think of deepfakes, those scary videos that put someone's face on another
person's body to make them look like they're doing something they otherwise wouldn't.
Today, we're instead going to focus on how visual AI is being used for good and as a tool to help people streamline their creative process.
Yes, that might mean AI taking on a bigger role in the very human act of being creative.
But what if instead, the AI just assisted us or guided our hand?
Sensei was founded on this firm belief that we have that AI is going to democratize and amplify human creativity,
but not replace it.
That's Scott Prevost, VP of Adobe Sensei.
Sensei is Adobe's platform for integrating artificial intelligence
into Adobe's consumer products like Photoshop and Premiere.
Adobe's stance is that Sensei shouldn't make the media for you,
but rather it should make it easier and less time-consuming for you to make the work you want.
With Sensei able to automate so much of that production work,
it means that whereas before you might have been able to work up two ideas,
now maybe you can work up 10 different ideas.
And it's through that sort of expansion of your own ability to create
that you might find that sort of outlier idea
that's the true, true creative spark, the one that really shines.
Let's use Photoshop as an example.
Last fall, Adobe released a feature called neural filters,
which you can guess are filters in Photoshop
that use neural networks to edit photos.
These filters do things like
remove artifacts from compressed images, add makeup, or smooth a person's face.
They can also change the direction of lighting in a room.
With tools like these, work that used to take an editor hours to do ends up taking
only seconds.
There's one particular neural filter called Smart Portrait, which allows you to import
a photo of the face and then very easily edit the expression.
You can make the person smile or frown or angry.
You can change the age of the face.
You can change the hair.
We can change where the eyes are looking, the tilt of the head.
And all of these things can be done by just moving a slider.
Editing a still image is one thing.
But think about video.
Thousands of frames that need to be adjusted or altered.
Adobe has built features into Premiere Pro, its video editing software, that utilize machine
learning to fix or edit objects in video that would take hours or even days to do manually.
There was a small team of documentary filmmakers who shot a ton of
footage, and when they got back to edit it, they realized that there were some specks on the camera
lens that ruined all of the footage. It was across the entire footage, but we happened to have
a feature in Adobe Premiere Pro called Content-Aware Fill for video that lets you remove objects
from the video. So you can identify the object in one frame, and then it uses the knowledge of all the
other frames and the motion to be able to understand what was actually behind that object and fill it in.
And so this team of documentary filmmakers was able to remove all of those spots from the footage
that they had shot and literally saved the day. Otherwise, they would have had to reshoot everything.
You know, instead of having to edit frame by frame by frame to remove it from every frame,
they basically push the button once.
Adobe also makes tools for later in the creative process, like for when you're ready to publish your work.
We have things like auto reframe, right, which intelligently reframes and reformats video content for different aspect ratios.
So say you have a video that was shot vertically and you want to change it to square or a video that was shot in landscape and you want to change it to vertical.
You know, in the past, you would have to go and edit frame by frame
to make that adjustment in order to keep all the important stuff in view at any particular time.
But Sensei does that automatically in a matter of seconds, which is just game-changing for being
able to take a video and then publish across various different social media outlets that have
different formats.
Other elements of the creative process, such as searching stock images, become less time-consuming
when AI understands what's in the pictures and helps you narrow down what you're looking
for. We have some very powerful image similarity tools that let you start with one image and then find
images that have similar content, similar compositions, similar colors. You can even pick an object from an
image and then say, I want to find other images that have this object but in a different location.
We can literally drag it to a different part of the canvas and it will search for other images
of that object in that location. Those kinds of tools are beneficial when creating marketing
assets like email campaigns, social media video, and other advertising media, anything that requires
a quick turnaround.
Overall, though, Adobe says AI should play a very specific role in the creative process.
Maybe just knowing these tools exist and are easy to use is enough to inspire you to try
something new.
We think of it as sort of part teacher, part muse, and part assistant.
We don't think of it as the person and the AI being separate.
We really think of it more and more as a collaboration between them.
You know, the AI can help to teach the student in some sense.
And, of course, everything that our customers are doing is helping to train the AI.
So this sort of mutually beneficial kind of relationship.
This field is advancing quickly for consumers.
Techniques like these were previously only available to professionals with large budgets
and specific training and resources.
Now AI is creating alternative and easier ways for everyone to produce the work they want to make.
But we also wanted to check in with the big budget professionals too.
Where are we seeing AI being implemented in Hollywood and the big screen?
While deepfakes haven't really made it onto the big screen just yet,
most studios are actually just relying on traditional CGI for now.
The place where directors and Hollywood studios are on the way to using AI is for dubbing.
My name is Scott Mann.
I'm a co-CEO and co-founder of a company called Flawless,
which specializes in cutting-edge VFX and filmmaking tools that use AI in particular.
The product that Flawless is currently working on is what they're calling TrueSync,
which uses machine learning to create realistic, lip-synced visualizations on actors for multiple languages.
In our last episode, we talked about using AI to create synthetic voices
that speak in multiple languages that are foreign to the original voice talent.
But what if you could also make it look like an actor in a movie is speaking that language,
their lips moving synchronously with the dubbed version of the film?
Scott Mann understands why the film industry would want this.
He's a director himself.
I'd done a film back in 2015, I think, called Heist.
I'm here to ask a favor.
How big?
$300,000 pay.
Get out of here.
Time's up.
Go!
I had an amazing cast, including Robert De Niro.
And so I finished the film in its home language, as you usually do.
And then it's when I saw a foreign dub of the same film.
And I realized how
different it was, not just in terms of like other voices playing the parts, but the words were different.
Like the script had been altered. The performances were very different. And I kind of was heartbroken
watching it really that so much changes once you hand it off. And I kind of discovered that
that that was this 100-year-old problem, that dubbing has kind of accepted that the mouth movements
do not marry. When an American film is brought to non-English-speaking countries, the script is often
rewritten and re-performed to try and sync with the timing of the original film. Because of this,
the translation is not always exact.
And you're in this kind of horrible wrestle
that despite however good anyone is doing that process,
it's kind of trying to break everything to fit it into a broken image.
And I think that film really kind of set me off
on looking for a solution to that problem.
Scott quickly realized that the technology available in the industry at the time,
using CGI to reconstruct an actor's mouth
and move it to the translated dialogue,
was not going to offer the solution he wanted.
Even the very best artists, doing the very best methods of traditional VFX where you're creating a model, you're lighting it, you're matching it, you do all these huge efforts and huge layers of work.
It just doesn't hold up to the human eye because we've studied faces for our entire lifetimes and we know the subtleties.
Instead, Scott found that using neural networks with tons of data of facial expressions and mouth movements made his idea a reality.
You're training a network to understand how one person speaks, so the mouth movements of an "ooh" and an "ah," the
different visemes and phonemes that make up our language,
are very specific, very person-specific.
And that's why it requires such kind of detail in the process
to really get something authentic that speaks like that person spoke.
And it's really about retiming mouth shapes and movements
from different places that were recorded earlier in the movie
into later places.
It's kind of like very slight and very subtle, deep editing of the mouth movements.
This would be pretty easily implemented in movies and TV in theory,
because there are already many scene takes from various angles
that are captured during production,
so the team wouldn't need any additional footage.
Now, I know this is an audio-only podcast,
and we're talking about something inherently visual,
so it's hard to actually show a demo here.
But from what Flawless has shared so far,
which you can see on their website,
I wouldn't say it's flawless, but it's pretty impressive.
There are moments that may look off,
like when Robert De Niro's lips rarely touch when he's speaking,
but it's not totally distracting,
and when it works, it works well.
Especially a scene the company shares from Forrest Gump speaking Japanese.
The emotion of the character is still there
and makes for a more believable dub.
You sort of forget that it's another voice actor behind the scenes.
What's interesting about using AI to go down this path
is its incentives are not necessarily efficiency
or saving money in time, but immersion.
You're not trying to save time,
you're not trying to reduce costs or replace someone, really.
That's not the end of our business.
The streamers and the studios have been building
a global distribution network,
but they don't have global content.
People do not tend to watch subbed and dubbed material,
and that's reflected,
I would say, in the value of that content when it's sold internationally,
as in it's exceptionally low, typically under 5% of what its normal value is.
Eyeballs on content, essentially, is where the value is,
and people are not putting their eyeballs on that content.
So maybe in the long run you are making more money
by building a larger catalogue of foreign films.
But Scott believes the more content the world shares with each other, the better.
Films that we didn't even know existed,
that were made in other places that we are just not exposed to
and vice versa around the world,
and we've got all these different languages we speak.
And I think through that, we'll get to understand culture better
because currently, if there's a film in a different language,
what's typically happening is it gets remade
and it gets remade into the different languages.
And when that happens, it kind of culturally is changed as a film.
And it's kind of retold through a different lens.
And I think the best way of kind of humankind to come together
is to have a better true understanding
and being able to empathize with our kind of neighbors.
and that's going to be the great benefit of being able to access global films and content.
Scott says Flawless is currently working on a couple productions implementing its TrueSync technology
and will have a worldwide release in early 2022.
But as with any AI changing an industry, we have to think about job replacement.
With most Adobe products, sure, if you alone create, edit, and publish the projects you work on,
AI tools will save a ton of time for you.
But in larger production houses, where each role is delegated to a specific specialist, retouchers, colorists, editors, social media managers, those teams might end up downsizing.
We asked Adobe about this.
Anytime technology comes along, people have said it's going to destroy jobs.
And it certainly does shift some of the jobs.
You know, we think some of the work that creatives used to do in production, they're not going to do as much of that anymore.
They may become more like art directors.
And what we think is that it actually allows the humans to focus more on the creative aspects of their work
and to explore this broader creative space.
Scott of Flawless maintains a similar sentiment.
Their TrueSync technology might end up shrinking the number of translators needed for film dubbing.
That's fair.
And I would say, look, that is obviously still to some degree necessary in some places.
Like truthfully, that role is kind of a director, right?
It's like what you're doing there, you're trying to kind of convey that performance.
But you're right, that is one aspect of it that is kind of, will be reduced on.
And it's kind of taking that side of the industry and growing that side of the industry.
So will script supervisors end up becoming directors, or photo retouchers end up becoming art directors?
Maybe.
But what we are seeing today is that a lot of these tools are already combining workflows from various points of the creative process:
audio mixing, coloring, graphics, all used in one piece of video software.
So if you're working in the visual media space,
instead of specializing in specific creative talents,
maybe your job is going to require you to be more of a generalist.
The boundaries between images and videos and audio and 3D and augmented reality
are going to start to blur.
It used to be that there were people who specialized in images
and people who specialized in video.
And now you see people working across
all of these mediums.
And so, you know, we think that
Sensei will have a big role
in basically helping to kind of connect
these things together in meaningful ways.
Before we get too far into the future,
I want to take what we've learned here
and talk about it with a colleague of mine,
James Vincent, who you are well aware of.
He's our London-based reporter
who covers AI and machine learning
and he's reported on AI in this specific industry quite a bit.
How is AI going to shape the way
we make, consume,
and sell visual mediums going forward.
We're going to take a break, but when we're back,
I'll be asking James to level out the hype
for artificial intelligence in video, movies, TV, photo,
and of course, deepfakes.
Support for this show comes from Shopify.
Starting something new isn't just hard.
It can be really scary, too.
So much work goes into this thing
that you're not entirely sure will even work.
But here's a better thought.
What if it did all work?
What if your instincts were actually right all along?
Shopify wants to help you get there.
They're the commerce platform behind millions of businesses worldwide
and nearly 10% of all e-commerce in the U.S.
From established brands like Allbirds and Heinz
to companies just getting started.
Their design tools make it simple to create the exact online presence
you're envisioning with hundreds of ready-to-use templates available.
And with built-in marketing tools,
you can launch full email and social campaigns in just a few clicks.
So you can connect with customers wherever they are.
It's time to turn those what-ifs into with Shopify today.
You can sign up for your $1 per month trial today at Shopify.com slash vergecast.
You can go to shopify.com slash vergecast.
That's Shopify.com slash vergecast.
So we're back with James Vincent, a senior reporter at The Verge who specializes in AI and machine learning.
Hello.
Hi, Ashley. How you doing?
It's always such a treat to get to talk to you.
It's always lovely to be a talking head. I love it.
So as you know, on this episode, we're talking about AI in the visual medium.
Yeah.
One question that has really been sticking out to me throughout this, and I just need you to, like, level set for me.
You know, we're talking to Adobe about how their effects and their different tools could be used to kind of like standardize how people treat their visual content.
So I'm just curious, like, does this mean we're leading up to a world in which content could end up looking the same?
Yeah, I mean, I think it's a really, really interesting question.
And it sort of is one that isn't entirely unique to AI.
I feel like if you think about when Instagram filters first became a thing,
and everyone started putting the same filters on their photos.
And there was this sort of like, you know, it became a fashion.
It became a glut of this one thing.
And then people got bored of it and they moved on to the next thing.
And I think the real thing that gets overlooked in AI sometimes,
which, you know, obviously you bring up here is that it sort of is backward
facing sometimes in that it learns from data that is in the past and some people think that
that means it's not so good at creating novelty, as it were. So I think it's kind of entirely
plausible that you get these set looks perhaps that come in with AI filters. But I feel that
people will get bored of them and they will move on to the next thing very quickly in the same way
that you see Snapchat augmented reality filters, for example, come and go. And they become the hot
new thing for a couple of weeks and then they find something else instead.
Fashion will move on.
It'll find new things to do.
And then one of the other things that came up during this episode was sort of this
job replacement, job loss theme.
Yeah.
Do you buy that, in integrating this sort of AI into the visual medium,
we can escape it with basically zero job loss?
I don't know.
I don't know.
Honestly.
But I have been writing about like AI and
robotics and automation job losses for years and years and years now. And actually my experience is that
it comes down to the choices made by the companies. If they want to keep on the same amount of people,
they will do that. If they want to say, well, you know, we can do the same amount of work with fewer
staff, then they will make those cuts. It really comes down to what the companies want to do and what the
macro trends in the economy are. If you have a recession, then you're going to lose jobs. If there's
less people buying advertising, there's less people, you know, there's less demand for this content,
then absolutely you're going to lose jobs. I would say that automation doesn't necessarily mean
you have to. I would say that that comes down to the company's choices. So if jobs are lost
because of this, I don't think AI will be to blame. I think managers and bosses will be to blame.
What are you watching for? What are you interested to see? Where do you think this is all
heading? Well, the big next thing, which is still filtering into the industry and we haven't got there yet,
but it's happening quite, it's happening, is deepfakes for non-shady, non-horrible stuff.
And now I want to be clear, you know, deepfakes have a bad reputation for a good reason.
It's not because they've been used for political misinformation.
I feel that was something that people were worried about and it hasn't really happened.
But they have definitely been used for non-consensual pornography, essentially.
That's the big horrible use case for them at the moment.
So that's a real problem.
Companies need to do more to deal with it.
The law needs to be more aware of it, there need to be better ways to address it. But there are
other uses coming in now, which are just adding deepfakes into the usual set of tools used by
creators. I think one really odd thing I saw recently, I don't know if you saw this at all, was Bruce
Willis did a deepfake of himself in a series of Russian mobile phone adverts. Did you see that
at all?
I didn't see it.
Very interestingly, it is not Bruce Willis as he looks today.
It is Bruce Willis, as he looked 20, 30 years ago.
It's Bruce Willis in his sort of heyday, and they did one interview with the people who made the deepfake.
And they were like, yeah, who wants to see Bruce Willis now?
He's old.
We gave the people what they want.
And basically they have these series of little vignettes, and he appears alongside a sort of very famous Russian comedian.
They get into all sorts of scrapes, and then they get saved by their fabulously well-priced mobile phone data plan.
That's the script each time.
And Bruce Willis just says like one or two things.
He has one or two lines.
But I just love the idea someone went up to Bruce Willis and was like, Bruce, how would you fancy making money without doing any work at all?
All you need to do is sign this paper.
And then we're going to take your image from all these old films and we're going to put you into these Russian TV adverts that you don't have to worry about.
You know, you just get a check.
Who cares?
And I think that that is something that's really going to change a lot of the economics.
And I think it's something you talked about with Veritone, right, with the voice actors hiring out their voices.
Right, in our voice synthesis episode.
Yeah, I feel it's like a similar thing in that if you have built up an image, you will then in future be able to rent that image out in a way that is very cheap for you.
And I think that would change how people think about celebrity endorsements instead because, you know, it becomes not something where that
person has had some involvement, but literally all they've done is sign a piece of paper that says,
yes, you can use my likeness for this, this and that. I mean, would that, would you be freaked out
by that if you were getting sold deepfake endorsements by, you know, your favorite celebrities or
most relatable celebrities? I don't know. I don't know if I'd be freaked out, but it is interesting
because I think we hinted at this in the previous episode where it was sort of this idea that obviously
your voice changes over time. So if you have your prime voice, the voice that made you famous or
whatever it is. If you could preserve that voice and continue to sell it, that's like a really
lucrative opportunity. So it's interesting to think that like Bruce Willis, because he filmed so
much as a younger person, is now able to monetize his young self that he never would be able to
monetize now. It's like kind of tragic in some ways because I do feel like older actors receive
less work or they get casted as like the grandpa or whatever it is. And now it's like, oh, you can
still be that cool action hero that you have sort of potentially lost out
on, which doesn't freak me out. It's just like interesting how that might change who we see
and how we see them. That's actually really depressing, Ashley. Thank you for that.
Welcome to my mind. But you're right, because it'll totally lock in some of the worst trends we
see with celebrity now, which is like you only have value if you're young, you know, you have
value if you're beautiful, you have value for a very small window in your career and then you're
forever trying to recapture that specific moment you had.
And with the power of AI, you can just lock that person in time in amber for that one
specific point and sell it over and over and over again.
And I mean, it also, though, ensures that even after they die, they could still continue
to be a cultural icon.
Yes.
Which is interesting, too.
Like, we see the holographic, you know, holograms or whatever sometimes come out.
But I feel like the deepfake technology actually could ensure that like Arnold Schwarzenegger,
I'm just going with action heroes now,
could play the Terminator.
Over and over.
A hundred years from now or whatever it's going to be.
But I feel we are already on that path.
If you think, you know, the Star Wars stuff that they've done.
And like they had, oh, spoilers for the end of season two of the Mandalorian.
But, you know, Luke Skywalker comes back and it's a young Mark Hamill.
Are you a Jedi?
I am.
I read an article.
They tested deepfakes for doing that.
It was not good
enough and they went with the old-school CGI.
But again, you have this like, it becomes less about the person and it becomes all about
the intellectual property.
Yep.
And therefore, it's about what does Disney own and obviously Disney now owns this entire vast
universe and the Marvel universe and, you know, and they can just keep pumping stuff out.
And I wonder if that's going to change how we think about celebrities and actors and favorite
films and I don't know. Is it good? Is it bad?
I mean, I love this as a place to end,
but let's end on a high note here.
So can you give us your utopia version of AI in the future with visual medium?
So the utopian version is, again, it comes down to economics, right?
I feel like one version we've talked about now is dominated by big corporate behemoths like Marvel and Disney,
who are sort of churning out the same stuff.
And the other version is that these tools are super simple and easy to use.
Everyone gets them and it sort of unlocks new access and creativity.
I feel like the conversation you had with Flawless
was really about that sort of barrier to entry and access
and I really like I love what they do
because I feel that they have this idea
that AI can help break down borders
and God, I cringe just saying that.
I'm sorry, oh my God.
But you know, if you can have a thing
where you take a foreign film
and it wouldn't get the audience, it would,
but it's an amazing film.
It's the greatest piece of art
to come out of the world in 20 years
but it's in Swedish.
So, you know, it's not going to get seen by American audiences.
But if you can take that film, press a button that dubs it seamlessly into familiar voices and actors.
I kind of think that's a win for humanity, isn't it?
Sort of?
No, yeah, for sure.
Like, the access.
So I feel that's a positive vision, definitely.
And just to finish on our Luke Skywalker example here,
theoretically, if Luke Skywalker was cheap enough that you could rent him as a fan
and create your own movie featuring Luke Skywalker.
Yeah.
There we go.
Fandoms.
A whole new thing, new fan fiction, except the movie medium.
I feel like this is, we're trending towards the metaverse in this conversation, which is another big topic.
Because yeah, then it becomes like who has access to the intellectual property, who gets to play as it, and who gets to use these characters.
Yeah.
And so possibly there is this future bursting with creativity and ideas and access.
and the sort of universal fun
and there's the world we live in at the moment.
Which way do we think it's going to go?
We will let the audience determine for themselves.
They can only take what we say
and do what they will with it.
But thank you as always, James.
This was amazing.
No problem at all, Ashley.
Absolutely, absolutely my pleasure to chat.
Thanks again for listening to this Vergecast AI mini-series.
This podcast is made by producer Liam James,
Senior Audio Director Andru Marino,
senior reporter James Vincent and me, senior reporter Ashley Carman. Talk soon.
