Instant Genius - Deepfakes, with Sam Gregory
Episode Date: July 25, 2022
Technologist Sam Gregory explains what deepfakes are and why they have seen a sudden rise.
Transcript
BBC Science Focus magazine. This is Instant Genius, a bite-sized masterclass in podcast form.
I'm Alex Hughes, staff writer at BBC Science Focus magazine. This week, I'm joined by the
technologist Sam Gregory. He is the program director at Witness, an organization that uses video
to promote human rights. He tells me about the rise of deepfakes, their use, and why we should
be worried about them.
So I think the best place to start here is: what is a deepfake?
What is a deepfake? So a deepfake is a way in which you make someone look like they said or did
something they never did. And we often use deep fakes to describe a whole range of ways in which
it's become easier to manipulate video and to edit video and images more seamlessly.
And what is the purpose of them?
That's a good question, because deep fakes are a technical process, right? They're a way in which
we use new forms of machine learning or artificial intelligence to create these more
realistic fakes of people, of events, of, indeed, of faces of people who never existed, right?
So the original purpose that brought them to the public eye was, in fact, one of the areas that
has been one of the most destructive uses of deep fakes, which was in 2017, a Reddit user started
using these tools, these new forms of machine learning, to place the faces of actresses and celebrities into pornographic videos, replacing the faces of the actresses in the porn videos with the faces of the celebrities.
And so when we think about deepfakes, the first place that they were used in the public eye
was really in these non-consensual sexual images.
Are there other ways they have been used?
I mean, I've seen them quite heavily used for actors.
There was a famous one around Tom Cruise, or politicians.
Yeah, so again, it's really the key thing is to think, what do we mean by deepfakes, right?
So if we're thinking about face swaps, right, you have all these non-consensual sexual images
that involved swapping the faces of celebrities and actresses into porn
movies, as well as, of course, ordinary women who are targeted by this. We also had, right at the
start, all those Nicolas Cage face swaps. So, you know, there was just a proliferation of people
swapping Nicolas Cage's face into many other movies, right? So it became a meme, the Nicolas Cage
deepfake. And I think, you know, what we've seen is that there are, you know, a range of ways people use
deepfakes. And there's also a range of deepfake techniques. And so I think it's worth noting that, often,
people working in this space talk about synthetic media, not just deepfakes, because what we're
talking about here is a whole range of things. So we're talking about the face swap, right, which is this
classic deep fake, where you take someone's face and you swap it with someone else's face. But we're
also talking about things like what's known as lip sync dubbing. That's when you make someone's
lips move to a different soundtrack. And we're also talking about what people talk about as
puppetry, which is when you make someone's body or face or facial expressions move
based on the actions of someone else, right?
So essentially the idea of a puppet.
And so when you look across that range of ways
in which you can use these synthetic media tools,
you see a range of usages.
So you've got the usages that are in the public eye, right,
like the Nicolas Cage deepfakes: you take Nicolas Cage's face
and you put him in another movie.
Or the ones that were really prominent,
and I think people saw them last year,
like the Tom Cruise deepfakes,
Deep Tom Cruise, right? These were ones where, you know, Tom Cruise, or someone who looked like
Tom Cruise, looked at the camera and, you know, said something that sounded exactly like him. And,
you know, everyone was puzzled, right, because they emerged on TikTok, and it wasn't clear that
Tom Cruise, the real Tom Cruise, had a TikTok account. And so this is sort of the really
professional end: you know, the Tom Cruise deepfake on TikTok, or the way in which
they made the young Luke Skywalker in The Mandalorian, or the Nicolas Cage
face swaps, or Sassy Justice, which was a TV series from the South Park creators in the US,
right? So you've got that part of it, which is the really commercial end. And then in the
middle, you've got people using things like this lip sync dubbing, right, the ability to put
words in someone's mouth, to do things like make David Beckham speak in multiple languages,
right? So an advocacy group that was campaigning against malaria made David Beckham appear
to, you know, give an advocacy message about malaria
in multiple languages.
And of course, David Beckham is a tremendous footballer.
He is not a talented linguist, right?
So they made him look as if he could speak in seven or eight languages
and made it look as if his lips matched those languages.
So you have that lip sync dubbing, and that area is really growing, right?
So it's really growing as an area that people think about for, you know,
dubbing movies, for making personalized messaging and for, you know, doing things like the David
Beckham ad.
And then the really simple end of deepfakes and synthetic media is the stuff you can do in an app like, you know, Reface or FaceApp, right, or Wombo, right?
Like make yourself appear to sing a song, or swap yourself with a celebrity in a pre-programmed video.
Or even in something like the Deep Nostalgia app, where you're just reanimating the face of a long-dead relative and maybe making them say a few words.
So the real key with deepfakes is to recognize there's such a range, right, between,
you know, the really complicated and in fact hard-to-make ones,
And I think that's really important for people to know, like the Tom Cruise one,
all the way through to the, you know,
simple one-click reanimating your grandmother type ones you can do online.
And I'm not going to make you go through all of these,
but I think especially the face swapping,
maybe being the most prominent one.
What's the process for those being made?
Yeah, so face swaps, and again, deepfake face swaps are sort of where this really came
into the public eye.
They're based on a form of machine learning, typically called deep learning, which is where computers essentially learn by example, right?
So they learn how to create a face of someone based on a range of examples, what's known as training data.
So you feed lots of examples of, say, someone's face into these algorithms, and they progressively improve in developing a version of that face, right?
And there's a range of different scientific methods underlying it.
Most of the sort of face swap techniques are built on what are known as neural networks,
which are modeled on the human brain, the way we learn by example.
And you are essentially training one of these algorithms, or giving it the data to train itself,
to make a better and better version of something it has been given
an example of, like a face.
Now, the way that people often think about the face swaps, and the most prominent technique, has been something called a generative adversarial network, or people often hear the word GAN.
And what you have there is, in fact, two of these networks, two of these neural networks, competing against each other.
One is trying to create, for example, really good fakes of my face, and the other one is trying to detect those.
They're essentially competing.
And so, you know, the forger of my face creates a better and better version of my face.
The other network is trying to detect it, essentially feeding back into the system,
and you basically have this sort of competition to improve the faking of the face.
So that's, you know, the underlying technology here.
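The forger-versus-detector competition described here can be illustrated with a very small sketch in Python. This is a toy under stated assumptions, not how real deepfake tools are built: the "faces" are single numbers drawn around `real_mean`, the forger can only learn one shift value (`theta`), and the detector is a one-line logistic model rather than a deep neural network. All names and settings (`train_toy_gan`, `theta`, `w`, `c`, the learning rate, the batch size) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(t):
    """Squash any number into (0, 1): the detector's 'probability it is real'."""
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=2000, batch=64, lr=0.05, real_mean=0.0, start=3.0, seed=0):
    """Toy 1-D adversarial training loop (illustrative assumption, not a real deepfake model)."""
    rng = random.Random(seed)
    theta = start        # forger: a fake sample is theta + random noise
    w, c = 0.0, 0.0      # detector: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Detector step: raise log D(real) + log(1 - D(fake)),
        # i.e. get better at telling real samples from forged ones.
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Forger step: raise log D(fake), i.e. shift the fakes so that
        # the freshly updated detector is more likely to call them real.
        grad_theta = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        theta += lr * grad_theta
    return theta, w, c


if __name__ == "__main__":
    theta, w, c = train_toy_gan()
    print("forger shift after training:", round(theta, 3))
    print("detector weights:", round(w, 3), round(c, 3))
```

In this sketch the forger starts far from the real data (`start=3.0` against `real_mean=0.0`), and the detector's feedback is the only signal that pulls it toward producing samples that resemble the real ones. Real systems replace both sides with deep networks and images, and, as the conversation goes on to note, usually combine them with other techniques.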
There are many different ways to build those networks.
And typically also, I think this is really important when we're thinking about those face swaps.
People are using multiple methods, right?
So they may be using one of these deep learning methods,
maybe a generative adversarial network or something called a CNN,
a convolutional neural network.
But often they're also adding on kind of CGI effects, right?
So this builds on techniques that we've had for 30 years in the film industry, right?
And so take a look at that Tom Cruise deep fake.
That required lots of this data of Tom Cruise,
probably gathered in a range of different circumstances.
It required a really talented impersonator,
who's the underlying person who then has their face replaced.
And undoubtedly, it had also some more traditional CGI visual effects happening on it.
So there's not one deep fake creation method,
and certainly when it comes to the really sophisticated deep fakes,
it takes real work to get them right, right?
You need to keep shifting the parameters on these algorithms.
You need to think about the data you fed it,
and you may need to do this additional CGI visual effects work at the end.
And something that I think is quite interesting with a lot of this is that a lot of the
futuristic technologies that are seen online now use a tremendous amount of energy and are quite costly
to produce. Is that the same with these kinds of technologies?
Yeah. So, you know, they require
computing power, right? So they're not blockchain, right? They're not, they're not, you know,
taking up the energy of a, you know, of a small country in order to generate
Bitcoin or something like that.
They are computationally intensive.
And that's been one of the big sort of races, right,
from companies like Nvidia and others is to develop the GPUs,
the computational power that you can use to do this.
And so really, deepfakes in their current form
come from the intersection of two things.
They come from these advances in deep learning,
these advances in artificial intelligence and machine learning,
and they come from the fact that we have more computational power available to us to do it.
But it's not cheap to do a really good deep fake.
And I think, you know, as we also look ahead to what the threats are, you know,
it's worth remembering that at least for the moment to do the really good face swap deepfakes
is still computationally intensive, takes some investment in those GPUs, takes some money.
Something that, for me at least, I don't think deepfakes have been around that long,
or at least have been recognizable in the public eye for that long.
Is it a new issue, or is it something that has maybe only recently been identified
by the mass public?
Yeah, I guess the question is, you know, media manipulation,
like the ability to edit videos, edit photos, manipulate videos.
We've had that for a long time.
Actually, Daniel, just actually let me loop back to the question.
Are you asking more about, like, literally deepfakes,
or are you asking about kind of the ability to manipulate
video and photos? I can go either way.
I think more on the side of deepfakes.
So really the advances that allowed us to have the deep learning side of this
are really in the last eight or nine years, 10 years, right?
So the ability to do this, to use these algorithms that learn from the data you feed them
in order to then build, for example, a fake face of you or I,
that's really a technical advance of the last 10 years.
So really, you know, the public eye has seen deepfakes really in real time as they've been developed, right?
And I think that's partly a function, of course, of the horrible origins of deep fakes as we first learned about them, really,
because they were being used in what continues to be one of their most pervasive and malicious usages to make these, you know, non-consensual sexual images of women.
So, you know, very quickly they jumped into the public eye because of this malicious usage in 2017, 2018.
And now they're highly visible.
And, you know, I've been leading this project for about five years called Prepare,
Don't Panic, about deepfakes.
And one of the things I, you know, I try and emphasize to folks is, you know,
there are very clear malicious usages now, like the non-consensual sexual images.
But there's also been a lot of hype, right?
So if you, you know, you think about the headlines in, say, 2018, 2020, you know,
deep fakes will disrupt, you know, the elections, pick an election anywhere globally.
I think those have done a disservice to us actually really trying to
work out what the technical underpinnings are here. What are the real threats? What are the real
things we need to focus on? So they've been in the public eye for five years, but often in this
rather distorted way that doesn't capture the real threats that exist and overhypes the reality
of what you can actually do with them.
I mean, you've touched on this quite a lot, I think, already,
but it's not just video then. A deepfake can be in multiple different formats, but when it's,
say, a voice manipulation or something in a different form,
is that still what you'd consider a deepfake,
or is that purely when it's using the faces and swapping faces?
Yeah, it's a great question, you know,
what deepfakes means.
And we're actually sort of trapped in the first word that was used, literally, by the creator
who was making these non-consensual sexual images.
The Reddit user was called deepfakes, right?
And, you know, and so that's tough.
Like I think a lot of folks prefer to use terms like synthetic media because it allows us to include the face swaps.
It allows us to include the lip sync dubbing.
It allows us to include the ability to make someone's face or body move based on another source.
It allows us to include the ability to create events and faces that never existed, the so-called This Person Does Not Exist, right?
Where you see a face that looks hyper-realistic, but it is someone who never existed.
And it also allows us to include things like audio and image generation, right,
like the ability to make someone's voice sound like someone else.
And indeed, something like DALL-E or Imagen, these tools we see that allow us to generate,
you know, images based on text.
So we're sort of trapped by the word deepfakes.
I think it's often more helpful to say synthetic media
and recognize that allows us to talk about this whole range that cuts across video,
images and audio.
So do you think maybe the word deepfakes gives it bad PR, and that we're
pinning it down to something that isn't entirely accurate for the entire field?
Yeah, the word deepfakes absolutely does, you know, because of its origins in
creating non-consensual sexual images, because of its implications that it's trying to fool us,
right, that these are fakes, I think it does a disservice and it also means that actually
there's lots of legitimate usages that people are trying to pursue in the commercial field for this,
really trying to diversify consumer creativity,
allow you to make more complicated videos,
improve dubbing, make better personalized messaging within a business.
All of those companies definitely push back on the word deepfakes
because of these negative ideas that it's about forgery and faking,
not just improving video creativity and because of its origins
in these non-consensual sexual images.
You touched on it a little bit just there.
Are there ways to use this kind of technology positively?
Is it something that we should be developing
more for positive uses?
Yeah, so there are five or six really positive usages
that we've seen already, right?
So first of all, obviously people use this
for really powerful satire and parody and creative potential.
I love the work of folks like Bruno Sartori, a satirical maker in Brazil
who makes these fabulous kind of soap opera-derived deepfakes of the president
there, President Bolsonaro, and the former President Lula.
And they're just hilarious, they're funny, he captures their movements and their faces.
You know it's a deep fake, but it's tremendously powerful satire and parody, right?
So, you know, creative satire and parody is one very powerful usage we see already.
You know, a second is just you or I being able to do this, right?
If you use an app like Reface or FaceApp, you know, it can be just pure fun, right?
Like consumer fun.
It can also have some really interesting usages. Like, Reface is a Ukrainian app.
And they allowed you to swap your face with Ukrainian president Zelensky, right?
And sort of express your solidarity with him and be part of the, you know, the movement for solidarity around the invasion of Ukraine.
Right.
So there's consumer creativity.
There's this satirical power.
And then there are business usages, right?
You can dub much more easily.
You know, you don't have to have subtitles on a movie if you can
make the lips move in a different language in a much more realistic way.
You know, and so that has real power.
You can create AI-enabled voice, right?
So people may remember the voice of Val Kilmer in the new Top Gun movie, right,
generated with these AI-derived techniques to make him sound the way his voice
would have sounded if he hadn't had an illness that affected his vocal cords.
So you have these really powerful usages,
and there's definitely an explosion of businesses also saying,
you know, we're moving into a video culture, right?
Like, you know, young people learn by searching YouTube,
by using TikTok as a search engine or Instagram.
So instead of having messages sent, you know, in text and an email,
why not have either a person or a computer-generated avatar be able to say those messages in a realistic way?
So there's a big growth of the sector of companies who are saying, you know, we can make, you know, someone in the company or more often a computer-generated avatar, you know, say things rather than, you know, putting them in text or in another format.
And then, you know, I come out of the human rights space and, you know, I'm worried about the ways people get targeted by deep fakes and have done a lot of work on kind of the misinformation and disinformation side.
But there have also been these amazing examples of people using these types of tools to do things like protect vulnerable individuals.
There was a movie called Welcome to Chechnya that featured very vulnerable LGBTQ activists in Chechnya.
And they did an amazing thing.
They recruited volunteers outside the country.
And they created, essentially, deepfake faces of the volunteers.
And they swapped them with the vulnerable activists in the film in Chechnya.
You see these life-like people, but the faces are in fact the faces of the activists outside the country.
So, you know, we've got this explosion of ways in which people are using synthetic media tools for creativity, for positive social usages, and for just plain business functions.
Thank you for listening to this episode of Instant Genius.
That was Sam Gregory.
To hear him tell me about how to fight deepfakes and the future of the technology, head over to Instant Genius
Extra, available only on Apple Podcasts. The new issue of BBC Science Focus magazine is out now.
Pick up a copy in store or visit sciencefocus.com.
