Radiolab - Breaking News
Episode Date: November 19, 2019
Today, two new technological tricks that together could invade our most deeply held beliefs and rewrite the rules of credibility. Also, we release something terrible into the world. Support Radiolab by becoming a member today at Radiolab.org/donate.
Transcript
Hey, Simon Adler here, a producer at the show.
Today, we're going to play you an episode we made a while back, back in 2017,
and it's about how hard it's getting to decipher fact from fiction.
And we wanted to do that because, well, next week we'll be putting out a story
showing the real-world political consequences of this new reality.
What happens when these tools you're about to hear about get released into the wild?
And so, without further ado, I give you Breaking News.
Wait, you're listening.
Okay.
All right.
Okay.
All right.
You're listening to Radiolab.
Radiolab.
From
WNYC.
C?
Yeah.
All right.
Hello.
Hello.
Hello.
I can hear you.
They can't hear me.
No, we can hear you.
We can hear you.
Yeah, you can.
What we can also hear is us twice.
Us twice.
Yeah.
Hey, I'm Jad Abumrad.
I'm Robert Krulwich.
This is Radiolab.
And today,
Oops.
Oops.
You don't hear us.
We have a story about how the echoes of you can go out into the world and come back and bite you, and all of us really, in the butt.
Oh, wait, maybe we're fine now.
Is my echo gone?
Yes.
Okay.
Okay, we're good.
We're good.
And it comes to us from our producer, Simon Adler.
Yeah, Nick, hello.
I'm sorry.
Okay, so this is Nick.
My name is Nick Bilton.
I'm a special correspondent for Vanity Fair.
And his beat, you could say, is trying to predict the future of technology.
To look into the future into this kind of crystal ball and try to predict what the next five, 10, 15 years would look like for the media industry.
Do you have a good batting record?
Like, did you call some big ones?
Oh, yeah.
You know, phones in our pockets that would be like supercomputers that social media would drive news, not newspapers and so on and things like that.
So it's been pretty good.
I reached out to you because I came across this article that you wrote, an article that sent
shivers down my spine, and I'm not one to typically be given shivers by articles. So I guess
how did you stumble into all of this, and where does this start for you? So I was sitting
around with some friends in my living room, and a friend of mine mentioned, oh, did you see this thing
that Adobe put out recently? We live in a time when more people than ever before believe that they
can change the world.
And that conversation led Nick to a video, a video online of the Adobe Max 2016 conference.
There are tons and tons of people in the audience.
And up in front of them, it looks like the stage of an Apple product launch, but sort of beach-themed.
Why beach?
I have absolutely no idea.
It's a little TMI, don't you think?
There are two hosts that are sitting in these, like, lifeguard chairs.
One of them is the comedian Jordan Peele.
Jordan Peele, as in Key & Peele, Jordan Peele?
Yes.
And then the other host is this woman,
Kim Chambers, who is a marathon swimmer and an Adobe employee.
Ooh.
And then...
Please welcome to the stage.
On walks.
Zeyu.
Zeyu Jin.
Hello, everyone.
Young guy.
Glasses.
You guys have been making weird stuff online.
With photo editing.
And he says Adobe is known for Photoshop.
We're known for editing photos and doing magical things visually.
Well, we'll do the next thing today.
Let's do something to human speech.
Pulls up a screen on a Mac computer.
Well, I have obtained this piece of audio
where there's Keegan-Michael Key talking to Peele
about his feeling after getting nominated.
Keegan-Michael Key had been nominated for an Emmy,
and he and Jordan Peele were talking about it.
There's a pretty interesting joke here.
so let's just hear it.
I jumped on the bed, and I kissed my dogs and my wife, in that order.
Not a bad joke.
So let's do something here.
Okay, so suppose Keegan-Michael Key wanted to send this audio to his wife.
In other words, what if Keegan-Michael Key was feeling like,
that was a little bit rough on my wife?
That was a little bit mean.
You know, maybe he wanted to go and rewrite history
and say that he kissed his wife before the dogs.
So he actually wants his wife to go before the dogs.
So, okay, so what do we do easily?
So Zeyu clicks a button, and the program automatically generates a transcript of the audio and
projects it up on the screen behind him, you know, just text of what Keegan-Michael Key said.
Okay, let me zoom in a little bit.
And then...
Copy, paste.
He just highlights the word wife and pastes it over in front of dogs.
Okay, let's listen to it.
clicks play. And I kissed my wife and my dogs.
Whoa.
Oh, so he was able to
move the, edit the audio by moving
the text around in the text box. Yes, exactly.
Okay, well, that's kind of cool. Kind of impressive.
Wait.
But then... Here's more. Here's more.
We can actually type something
that's not here.
Wait, wait, what?
Just hang on, just hang on.
I heard that, actually on that day,
Michael actually kissed our Jordan.
So to recover the truth, let's do it.
He goes back into that little word box.
So let's remove the word my here.
Your secrets out, Jordan.
And just type the word Jordan.
So he types it out, J-O-R-D-A-N.
And just to be clear, Keegan Michael Key did not say Jordan anywhere in this clip.
And here we go.
And I kissed Jordan and my dogs.
Wait, he just typed in a word that the guy never said,
and it made the guy say the word that he never said,
as if he actually said it?
Exactly.
You're a witch!
Jordan jumps out of his lifeguard chair,
starts sort of stomping around the stage.
You're a demon!
Oh, yeah.
I have magic.
And the last magic I'm going to show you guys is we can actually type small phrases.
So let's say, okay, so we're moving.
He deletes the words "my dogs," and types "three times."
Oh.
And playback.
And I kissed Jordan three times.
All right, wait, you're saying that Keegan-Michael Key never, ever said Jordan, never said three, never said times, never said any of those words.
And somehow just from the typing in of it, the guy is now saying them and we're hearing them in his voice.
That's what just happened.
Yep, that is exactly what the demo claims.
It's essentially Photoshop for audio.
Nick Bilton again.
You could take as little as 20 minutes of someone's voice,
type the words, and it creates in that voice that sentence.
With just 20 minutes of the guy talking.
Yes.
But how? How in heaven do you do this?
Okay, and so we're here at Adobe.
What exactly do you do here?
Sure. I'm the product manager for audio.
This is Durin Gleaves.
I flew out to Seattle and tracked him down to ask him exactly that question.
So essentially what it does is it does an analysis of the speech, and it creates models.
And it basically...
And he explained to me that this program, which they call VoCo, by the way,
what it does is it takes 20 minutes, or actually 40, if you want the best results,
of you talking, and it figures out all of the phonetics of your speech, all of the sounds that you make.
Finds each little block of sound and speech that is in the recordings.
Chops them all up, and then when you go and type things,
in, it will recombine those
into that new word. But what
if it encounters a sound that I've never made?
Well, the theory
is, in 40 minutes of speech,
which is the amount they recommend you feed in,
you're going to probably say
just about every sound
in the English language. So if
really, so like phonetically I go
I run through the gamut in 40 minutes?
Yes. Wow.
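To make that coverage claim concrete, here's a minimal sketch of how you could check it yourself, in Python, using the CMU Pronouncing Dictionary via NLTK. This is our illustration, not anything Adobe has published about how VoCo builds its phoneme inventory.

```python
# Minimal sketch: which English phonemes does a transcript cover?
# Assumes `nltk` is installed and the corpus has been fetched once
# with nltk.download("cmudict").
import re
from nltk.corpus import cmudict

PRONUNCIATIONS = cmudict.dict()  # word -> list of phoneme sequences

# The 39 ARPAbet phonemes CMUdict uses (stress digits stripped).
ALL_PHONEMES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER",
    "EY", "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG",
    "OW", "OY", "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W",
    "Y", "Z", "ZH",
}

def phonemes_in(transcript):
    """Return the set of phonemes covered by the words in a transcript."""
    covered = set()
    for word in re.findall(r"[a-z']+", transcript.lower()):
        for pronunciation in PRONUNCIATIONS.get(word, []):
            covered.update(p.rstrip("012") for p in pronunciation)  # "AH0" -> "AH"
    return covered

covered = phonemes_in("I jumped on the bed and I kissed my dogs and my wife")
print(f"covered {len(covered)} of {len(ALL_PHONEMES)} phonemes")
print("still missing:", sorted(ALL_PHONEMES - covered))
```

Run this over 40 minutes of ordinary speech and that "still missing" set tends to empty out, which is the bet VoCo is making.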
Well, and like
what would you or what are you hoping
people would use
a product like VoCo for?
So for the video production tools
and for what audition is used for a lot
is dialogue editing. The whole idea, Durin said,
is to help people that work in movies and TV.
A lot of our customers
record great audio on set,
the actors and the dialogues and everything.
And when they come back,
if sometimes there's a mistake or they make a change.
Like the actor on set said shoe,
but what he was pointing at was obviously a boot.
And right now, they do what's called ADR.
They'll bring the actor in.
They'll re-record some lines, and they'll try and drop that into the video.
But you're not using the same microphones.
You're not in the same location.
The actor might be sick that day, so his voice sounds different.
And a lot of times you can really hear that stand out in productions if they don't get it just right.
But with VoCo, you just delete the word shoe, type in boot, and boom.
There it is.
Using the same source media and the same characteristics and have it just sound seamless
and natural. And so it's going to be
a sort of, the hope is that it will make
the lives of professional post-production editors easier the world over.
That's our hope right now, yeah.
But that's not exactly...
Well, it's, I mean, it's...
What Nick Bilton thought when he saw this video.
It could be Donald Trump's voice or Vladimir Putin.
So I saw that, and I thought, wow,
imagine if audio clips start getting shared around the internet
as fake news
of a fake conversation between
Vladimir Putin and Paul Manafort
about trying to get Trump into the White House
or something like that.
Right.
And I was like, whoa, this is scary stuff.
But we're just getting started.
In the words of John Raymond Arnold,
played by Samuel L. Jackson in the movie Jurassic Park
in his own voice.
Hold on to your butts.
Things are about to get a lot crazier.
So, forget voices for a second, because now...
One, two, three, four, five. One, two, three, four, five.
It's face time.
All right, we are at the Paul G. Allen Center at the University of Washington in Seattle.
So I left Adobe and went across town to talk to the head of the GRAIL lab.
Hello.
Hello.
Yeah.
Hi.
Hi.
Nice to meet you.
Dr. Ira Kemelmacher-Shlizerman.
So I'm a professor in the computer science department at the University of Washington and also work at Facebook.
Can I just have you come a little closer? There we go.
Okay, just to back up for a second.
When Nick first saw the VOCO demonstration, he started to wonder, okay, like, how could this be used down the road?
My original thesis was, oh, well, maybe what will happen is that you will be able to create 3D actors, just like you did in Star Wars.
Then join it with the VoCo stuff to create a fake Hillary Clinton and, you know, Donald Trump having a conversation or making out or whatever it is you want to do.
And that led him to investigate the type of work that Ira does.
So I've been using these terms like facial reenactment and facial manipulation.
Are those the right words?
And then what the hell do these words mean?
Yeah, so, I mean, it's all a way of animating faces.
And it started from the movies, right?
The concept is to drive these remotely controlled bodies called avatars.
Think like the aptly named movie Avatar.
Or...
Sergeant, yes.
Going a little further back.
There's no sign of intelligent life anywhere.
Toy Story.
And to make the characters come alive, what you need is the expressions of the actors playing them.
So in the movie space, that means you bring a person to a studio.
Then you cover their face with these sticky sensor marker things.
And then you spend hours upon hours capturing the person's little dynamics.
Like smile.
Open mouth, teeth.
No teeth.
Sad. Surprised, surprised, disturbed.
Things like that.
Angry, bloated.
And from that, they create a virtual character capable of emoting all those expressions.
And to make that character believable, the animators sometimes have to model a bone structure and muscles.
And as you can imagine, this can get very, very expensive.
And so what people like Ira started to wonder was, like, can this be done on a budget?
So she and others in the field started feeding videos of faces into computers and trained those computers to break down the face into a series of points.
Our models are about 250 by 250.
That is 62,500 points on one human face.
And once we know that, right, we can track the points.
So once you can track how my face moves through a video clip by these 250 by 250 points,
what can you then do with that information?
Well, I can apply the points on the face on a different model of a different person.
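To give a rough sense of what "tracking points and re-applying them" looks like in code, here's a sketch using MediaPipe's off-the-shelf FaceMesh detector. It tracks 468 landmarks rather than the 250-by-250 grid Ira describes, and it is emphatically not her lab's pipeline; it just illustrates the step.

```python
# Conceptual sketch: track face landmarks frame by frame and express
# each frame's expression as an offset from a neutral frame. A
# reenactment system would add that offset to a *different* person's
# neutral face and re-render. Assumes `mediapipe`, `opencv-python`,
# and `numpy` are installed.
import cv2
import mediapipe as mp
import numpy as np

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)

def landmarks(frame_bgr):
    """Return an (N, 3) array of landmark coordinates, or None."""
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    pts = results.multi_face_landmarks[0].landmark
    return np.array([(p.x, p.y, p.z) for p in pts])

cap = cv2.VideoCapture(0)        # the "source actor" is your webcam
ok, first_frame = cap.read()
neutral = landmarks(first_frame) if ok else None  # frame one = neutral pose

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    current = landmarks(frame)
    if current is None or neutral is None:
        continue
    # The "expression" is the deviation from neutral. Applying this same
    # offset to a target person's neutral landmarks, then re-rendering
    # their face, is the retargeting step systems like Face2Face perform.
    offset = current - neutral
    print("expression magnitude:", float(np.abs(offset).mean()))
```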
Now this is where things get quite strange
because instead of being able to map all of your facial movements
onto a computer-generated virtual character or person,
what Ira and others in this field of facial reenactment
have figured out how to do is to map your facial movements
onto a real person, a pre-recorded real person.
What?
What does that even mean?
Yeah, how does that work?
The best example of this is this piece of software that Nick showed us.
This software that I found from these university students called Face2Face.
We present a novel real-time facial reenactment method that works with any commodity webcam.
There's a video demo of this, and when you open it up, this very monotone voice comes in saying...
Since our method only uses RGB data for both the source and target actor.
And you're like, what the heck is this? And the screen pops up.
Here we demonstrate our method in a live setup.
On the right, you've got this heavyset man,
goatee spiked hair.
On the right, a source actor is captured with a standard webcam.
He's arching his eyebrows, he's pursing his lips, he's opening his mouth widely.
Sort of like if you're making funny faces for a two-year-old kind of thing?
Yeah, and then...
This input drives the animation of the face in the video shown on the monitor to the left.
On the left, you've got this Dell computer screen displaying a CNN clip of George Bush.
This is a real clip of Bush back from 2013, and his face is there looking right at the camera, occupies most of that screen.
A significant difference to previous methods.
And what you start to notice is when the man with the goatee smiles,
George Bush in the CNN clip also smiles.
And when the man raises his eyebrows, George Bush raises his eyebrows.
And you realize this man is controlling George Bush's face.
Wait, so this is a guy in the present controlling a past George Bush,
a real George Bush from an old video clip?
Yeah.
Okay, I pulled up a video for you here.
Okay, cool.
And a little while back
when we were just learning
about this,
we happened to have
our friend Andrew Marantz
who writes for the
New Yorker in the studio.
So that is
George Bush's face.
Mm-hmm.
What?
Oh, God.
Oh, God.
That's terrifying.
His...
Okay, so, yeah,
I cannot stop watching
George Bush's face.
Oh, they're doing it with Putin now.
Holy God.
So I just have a guy
just sort of going,
and then that's what Putin is doing.
Yeah.
Uh-oh, now it's Trump.
You know, I mean, those videos online had my mouth agape.
Again, Nick Bilton.
This is a form of puppetry where...
Your face is the puppeteer.
And the only thing is that George W. Bush is the puppet.
So I sit in front of a camera.
I smile and the business is taken care of?
That's real time.
This isn't like you have to render some software
on your computer.
It's literally you download a clip
or you take a clip from cable news
and you turn on your webcam
and however long it takes you to do it
and you're done.
It's the same as shooting a video on your phone.
What is this for?
So what are the applications of this?
I want to be able to help develop telepresence.
This is Ira again.
So telepresence.
Yeah.
So for example,
so my mom lives in Israel
and I'm here.
And wouldn't it be cool, it's kind of crazy, right,
but if I could have some kind of hologram of her sitting on my couch here
and we can have a conversation.
And going one step further, one of your colleagues,
a guy by the name of Steve Seitz.
I'm a professor at the University of Washington,
and I also work part-time at Google.
He told me that they see this technology as like a building block
that could one day be used to essentially, virtually, bring someone back from the dead.
I just think this technology combined with virtual reality and other innovations
could help me just be there in the room with Albert Einstein or Carl Sagan.
That's sort of the motivation.
That's what they want to do.
That's the motivation.
Talk to ghosts?
Well, for them, yes.
And when I was talking to some folks who work in commercials,
they're developing their own version of this.
And the idea is that they're going to make
a million or a billion dollars off of this, because say you bring, I don't know, Jennifer
Aniston in to film some makeup commercial.
And in the makeup commercial, in English, she says, so come and buy this product.
This is the best sort of whatever product around.
Right now you've got China, which is a booming market.
You maybe want to market things to China, and you'd really like to be able to use
Jennifer Aniston.
Problem is, Jennifer Aniston doesn't speak Mandarin.
So either you use the same audio clip and you have someone come in and speak Mandarin over her and the lips don't line up,
or you have to hire a Mandarin speaking actor to come in and do the part of Jennifer Aniston.
With this technology, all you have to do is record Jennifer Aniston once.
You can hire a Mandarin speaker, and the Mandarin speaker's voice will be coming out of Jennifer Aniston's mouth as if she had said it in front of the camera.
Her lips would be moving as if she were a perfect Mandarin speaker.
Exactly, exactly.
Wow.
I think that part of it is actually incredible.
That's amazing.
Yeah.
Oh, my God.
I'm amazed and completely frightened by what you're telling me.
And that's the whole point of what Nick was writing about that gave me shivers.
That someday, if you join the video manipulation with the VoCo voice manipulation...
You're the ultimate puppet.
You can create anyone talking about anything that you want.
In their own voice.
And having any kind of emotion around it.
And you'd have it right there for everyone to see in video.
And all you need to do is take that and put it on Twitter or Facebook.
And if it's shocking enough, minutes later, it's everywhere.
Like the timing of you guys making this thing and then this explosion of fake news.
Like, how do you guys think
about how it could be used for nefarious purposes?
Yeah, it's a good question.
Again, Ira Kemelmacher-Shlizerman.
I feel like when every technology is developed,
then there is this danger of with our technology,
you can create fake videos and so on.
Or I don't want to call it fake videos,
but to create video from audio, right?
But they are fake videos.
Yeah, yeah.
But the way that I think about it is that
like scientists are doing their job
in showing, like, inventing the technology.
and showing it off, and then we all need to think about the next steps, obviously.
I mean, people should work on that.
And the answer is not clear.
Maybe it's education.
Maybe it's every video should come up with some code now that this is, like, authentic video or authentic text, and you don't believe anything else.
I mean, yeah.
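Ira's "some code that this is authentic video" idea maps onto an existing cryptographic tool: digital signatures. Here's a minimal sketch of the camera-signs-its-own-footage version, using the Python `cryptography` package. This is our gloss, not a scheme she proposed in detail, and the hard part in practice is distributing trusted keys, not the math.

```python
# Minimal sketch: a capture device signs video bytes; anyone with the
# device's public key can later verify the file is untouched.
# Assumes the `cryptography` package (pip install cryptography).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

camera_key = Ed25519PrivateKey.generate()   # lives inside the camera
public_key = camera_key.public_key()        # published by the maker

video_bytes = b"...raw video file bytes..." # stand-in for a real file
signature = camera_key.sign(video_bytes)    # shipped alongside the video

def is_authentic(data, sig):
    """True if `data` is byte-for-byte what the camera signed."""
    try:
        public_key.verify(sig, data)
        return True
    except InvalidSignature:
        return False

print(is_authentic(video_bytes, signature))             # True
print(is_authentic(b"X" + video_bytes[1:], signature))  # False: one byte flipped
```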
But, like, maybe it was the timing more than anything, but I saw this video, and it really felt like, oh, my God, like America can't handle this right now.
Like, we're in a moment where truth seems to be sort of an open, what is true has become
an open discussion.
And this seems to be adding fuel on the fire of sort of competing narratives in a way that
I find troubling.
And I'm just curious why you aren't.
I think that people, if people know that such technology exists, then they will be more
skeptical, my guess. I don't know. But if people know that fake news exists, if they know that fake
texts exist, fake videos exist, fake photos exist, then everyone is more skeptical in what they read
and see. But like a man in North Carolina, I think he was from North Carolina, believed from a fake
print article that Hillary Clinton was running a sex ring out of a pizza parlor in D.C.,
which is like insane. This man believed it and showed up with a gun. And if people are at a moment
where they are willing to believe stories as ludicrous as that,
like I don't expect them to wonder if this video is real or not.
So what are you asking?
Well, I'm asking, do you, are you afraid of the power of this?
And if not, why?
I'm just giving my opinion.
I don't know.
I'm answering your questions, but I'm a technologist.
I'm a computer scientist.
So not really, because I know that,
because I know that this technology is reversible.
I mean, nobody, well, there is no need to worry too much.
Have you seen these videos? Otherwise I can send them to you.
Yeah.
Okay.
Yeah, I have.
And as we were feeling worried and more than that, surprised that the folks making these technologies weren't,
we decided to do a sort of gut check, see if we were totally off base
and get in touch with one of the guys who's on the front lines of this.
Can you describe what was going through your head when you were watching Bush's face?
I can tell you exactly what I was thinking.
I was thinking, how are we going to develop a forensic technique to detect this?
This is Hani Farid.
I am a professor of computer science at Dartmouth College.
He's sort of like a Sherlock Holmes of digital misdeeds,
which means that he spends a lot of time sitting around, looking at pictures and videos.
Trying to understand where has this come from, has it been manipulated, and should we trust it?
He's done work for all sorts of organizations.
AP, the Times, Reuters,
who want to know if, say, a picture is fake or not?
They often will ask me, particularly when, like, this just happened actually yesterday,
images came out of North Korea.
And every time images come out of these regimes where there's a history of photo manipulation,
there are real concerns about this.
So I was asked to determine if they've been manipulated in some way,
and if so, how had they been manipulated?
And how the heck would you do that?
Well, every time you manipulate data, you're going to leave something behind.
So let's say you do some funny business to a photo.
You might create some noticeable distortion in the picture itself,
but you also might distort the data.
And we're in the business of basically finding those distortions in the data.
For example, imagine he gets sent a photo.
It's probably a JPEG.
JPEG, which now is 99% of the image formats that we see out there,
is what is called a lossy compression scheme.
Just a fancy way to say that when a photo is taken and stored as a JPEG,
the camera, you know, just to save space, throws a little bit of the data away.
So, for example, if I went out to the Dartmouth green right now,
took a picture of the grass.
The camera isn't going to store all those millions of little variations of green hidden in the grass
because that would be just a huge file.
It's going to save space by throwing some of those greens away.
You just don't notice if it changes like a lot or a little bit less than that.
It's just grass as far as you can tell.
Now, here's Hany's trick.
Every camera has a subtly different palette of greens it's going to keep and greens it's going to throw away.
This varies tremendously from device to device.
An iPhone compresses the image much more.
So less greens.
And a high-end Nikon or a high-end Canon.
Which would keep more of those variations of green.
Now, if you hold these two pictures side by side, you might not be able to tell the difference.
But, Hany says, when you look at the underlying pixels, there are different recognizable patterns.
If you take an image off of your iPhone,
I should be able to go into that JPEG and look at the packaging and say,
ah, yes, this should have come out of an iPhone.
But if that image is uploaded to Facebook and then redownloaded
or put into Photoshop and resaved,
it will not look like JPEG consistent with an iPhone.
So basically he can see at the level of the pixels or data
whether the picture has been messed with in any way.
Huh.
And this is, of course, just one of many different ways that Hany can spot a fake.
Yeah.
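As a flavor of what "looking at the packaging" can mean, here's a toy sketch using Pillow, which exposes the quantization tables a JPEG was saved with. The device profiles below are invented placeholders; real forensic analysis, like Hany's, examines far more than this one number.

```python
# Toy sketch: summarize a JPEG's luminance quantization table and compare
# it against (hypothetical) device profiles. Re-saving in Photoshop or
# re-uploading to Facebook replaces these tables, which is exactly the
# kind of inconsistency a forensic analyst looks for.
# Assumes Pillow (pip install Pillow).
from PIL import Image

# Invented profiles: ranges of table sums; higher sum = heavier compression.
KNOWN_PROFILES = {
    "phone-like (heavier compression)": range(800, 4000),
    "high-end camera (lighter compression)": range(64, 800),
}

def inspect(path):
    img = Image.open(path)
    if img.format != "JPEG":
        print(f"{path}: not a JPEG (format={img.format})")
        return
    tables = img.quantization          # dict: table id -> 64 coefficients
    luma_sum = sum(tables[0])          # crude one-number fingerprint
    print(f"{path}: luminance table sum = {luma_sum}")
    for name, band in KNOWN_PROFILES.items():
        if luma_sum in band:
            print(f"  consistent with: {name}")

inspect("photo.jpg")
```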
Let me ask, like, if you could go up against the top 100 best
counterfeiters, do you think you'd catch them 10% of the time, 50% of the time? Just out of
curiosity, what's your sense? I would say we could probably catch 75% of the fakes. But I would say
that would take a long time to do. This is not an easy task. And so, you know, the pace at which
the media moves does not lend itself to careful forensic analysis of images. I'm always
amazed that, you know, I get these emails, like, you've got 20 minutes. And you would need, you
know, half a day, a day, per image.
Oh.
Still a very manual and a very human process.
So is this video editing and this audio editing that's coming down the pipeline here?
Yeah.
I guess should I be terrified?
Yes, you should.
Oh, no.
Did you really mean that?
Yeah, I think it's going to raise the fake news thing to a whole new level.
I did see some artifacts, by the way, in the videos.
They are not perfect.
But that's neither here nor there because the ability of technology to manipulate and alter reality is growing at a breakneck speed.
And the ability to disseminate that information is phenomenal.
So I can't stop that, by the way, because at the end of the day, it's always going to be easier to create a fake than to detect a fake.
Thank you very much.
Jad himself just handed me a cup of water, which shows none of you have gotten too big for your britches.
And that could be a serious problem.
I would like to have seen Peter Jennings do that ever for this guy.
My name is Jon Klein, co-founder and CEO of TAPP Media.
Before that, president of CNN/U.S.
Before that, I was executive vice president of CBS News,
where I was executive in charge of 60 minutes, 48 hours, and a bunch of other things.
And he's had to react to some serious evolutions in the media industry.
He was manning the helm as social media exploded,
as smartphones became ubiquitous,
And consequently, he had to deal with figuring out how and if to trust thousands of hours of video taken on these smartphones and sent in by viewers, what to broadcast and what not to.
And so we wanted to know how someone in his position would think about these fake videos.
So we sent him all of the different demos and videos we'd come across just to see what he thought.
First thought was that this is the kind of thing that a James Bond villain would put to use or the Joker in Batman.
or an eighth-grade girl who, right, wants to be most popular.
Yeah, exactly.
Yeah.
You know, I mean, this is, there's so many ways to abuse this.
Blows your mind.
I mean, it goes to the very core of communication, of any sort, whether it's television or radio or interpersonal.
Is what I'm seeing true?
Is what I'm hearing real?
Over the course of your career,
you've seen multiple technological developments
that have impacted the media
in rather profound ways.
Where is your terror level right now
or your fear level caused by this
relative to all of the other sort of advancements
that have occurred over your career?
It's terrifying.
And it hurtles us even faster
toward that point where no one believes anything.
How do you have a
democracy in a country where people can't trust anything that they see or read anymore?
What we saw happen with the fake news during the election cycle was that all the, it didn't even need to matter if anyone, you know, would rebut it afterwards.
This is Nick Bilton again.
It would reach millions and millions of people in mere seconds.
And that was it.
It had done its job.
And I think that with this audio stuff and the video stuff that's going to come down online in the next few years, it's going to do the same thing.
But no one's going to know what's real and what's not.
I moved on her, actually.
You know, she was on Palm Beach.
I moved on her.
And I failed.
And what's more, Nick says, if you think about the video that came out of Donald Trump from Access Hollywood.
I'm automatically attracted to beautiful.
I just started kissing.
The thing that was really interesting about that video.
And when you're a star, they let you do it.
You can do anything.
Whatever you want.
Grab him by the .
You don't actually see Donald Trump
until the very last second
when he gets off the bus.
How are you?
Hi.
You only hear him.
How about a little hug for the Donald?
And so if that technology existed today,
I can guarantee you
that Donald Trump would have responded
by saying, oh, it's fake news,
it's fake audio, you can't see me,
I didn't say that.
And it would just be this video's word
against his.
Actually, that's kind of like,
for me, that's sort of the real problem here. Like, you create this possibility for, like, plausible deniability that's so broad. You know what I mean? It's like the tobacco industry in the 60s and 70s. I was just reading this great article by the writer Tim Harford about this. In the 60s and 70s, the tobacco industry led this very calculated effort to sort of push back against cancer science by, you know, just injecting a little bit of doubt here, a little
bit of doubt there. Right, but on the other hand, this, and on the other hand, that. And the idea
was to create just enough wiggle room that nothing happens. They do that with climate change too.
Exactly. And it's that little bit of doubt that creates paralysis. And is that what's going
to happen that like there's going to be paralysis now writ large? Because now we're talking about
the very things we see, the very things we hear. But wait, but don't you think that before we get
completely carried away with the threat of this technology?
Maybe we should just find out literally where we are now.
Yeah.
We should give it a spin.
Yeah.
Mm-hmm.
So at this moment, do you think making one of these clips is possible?
Yeah, I think it's entirely possible.
I would be careful what it is.
After the break, things get fake.
Howdy, everyone.
It's Angela, calling from Dallas, Texas.
Radiolab is supported in part by the Alfred P. Sloan Foundation,
enhancing public understanding of science and technology in the modern world.
More information about Sloan at www.sloan.org.
Thanks, Radiolab.
Jad. Robert.
Radiolab.
So we're back.
We're going to now fake something.
We're going to build our own video from scratch.
Fake words, fake faces.
Because we want to know, like, in use, how dangerous are these technologies really?
Can they make a convincing fake?
Are they as easy as advertised?
So we will find out by giving the assignment, as always,
to our long-suffering, Simon Adler.
So while I was in Seattle talking to Durin Gleaves,
I not so subtly hinted that I would really like to give VoCo a whirl.
Let's say I had my hands on it somehow.
What can I do with it?
Well, right now nothing, because we haven't shared it with anybody.
At first, I just thought he didn't want me to be able to play around with it,
but then I realized.
And I don't even have a personal copy for myself yet.
Oh, so it's not even on the premises here?
No, it's still very much contained to research.
Oh.
But...
Hi, yeah, hi, you there.
Hey, Matt.
Oh, yeah, I'm here.
Great.
Eventually, I got in touch with this guy.
So I'm Dr. Matthew Aylett.
I'm the chief science officer at CereProc Limited.
Which is a speech synthesis research company based in Edinburgh.
Yeah.
Okay, so I called you up because I was
hoping that you could help me to make a video clip that has, I don't know, like George
Bush or Barack Obama saying things that they have never said.
Yeah, that sounds great.
That's it?
He was just game?
Yeah.
Now, see, the thing is, what his company does is not quite the same as VoCo.
What they do is, like, for a client, they'll create a voice that you can then just type in
words or sentences and make that voice say whatever you want it to say.
I feel sad.
That's an interesting.
They've created voices with a variety of accents.
Great roots at Blossamer.
In a variety of languages.
And in their spare time when they're not making voices for clients,
they're building celebrity voices.
And it just so happens they've got a Barack Obama and a George Bush bot.
Yeah.
How did you create a George Bush robot?
Well, a great thing about George Bush is that he was president of the United States for some time.
Good morning.
Good morning.
Which means he had to give the weekly presidential address.
A week ago today, I received a great honor.
And the other great thing about the address is it's completely copyright free.
So we're allowed to do anything we like with that audio.
For the people of America.
Maybe things that they haven't envisaged that we're going to do with it.
Real quick digression here, just because it's absolutely fascinating.
It looks like we're actually about to enter this really sticky gray area when it comes to voice ownership.
For example, an audiobook.
So if you record an audiobook and you've signed over a little,
the rights to those audio files, to the publisher.
The publisher has the copyright.
You don't own it. You do not own
your own voice. Is that really true?
Yeah.
Anyway, back to Bush.
So I took all those weekly addresses.
About six hours' worth, which is a lot more tape than VoCo's
20 minutes, but what he did with it is
pretty similar. Right. He fed them into this
machine learning algorithm along with their
transcripts, and then what the
program will do...
It will take the text, and it will analyze it
in terms of the linguistics.
So say this is the word "social security."
Social.
The word social is made out of the sounds s, oh, sh, uh, l, right?
Mm-hmm.
And so we'll cut those sounds up
into lots of little tiny pieces.
And it did that for all of the words
in all of these addresses.
Around 80,000 in total.
Put them all in this database
with tons of info about what sound came before it,
after, et cetera.
And...
Once that database is built,
all that's left to do?
I type in some text, and then I push go, and it tries to find a set of little sounds which will
join together really nicely.
And then I push play and see how well they came out.
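What Matt is describing is classic unit-selection synthesis. Here's a toy Python sketch of the selection step, with a five-unit "database" standing in for the roughly 80,000 words of recorded speech; CereProc's real system scores many more features and smooths the joins between units.

```python
# Toy unit selection: for each target phoneme, pick the recorded snippet
# whose original neighbors best match the new context, then concatenate.
from dataclasses import dataclass

@dataclass
class Unit:
    phoneme: str
    prev: str     # phoneme before it in the original recording
    next: str     # phoneme after it
    audio: bytes  # the snippet itself

def select_units(phonemes, database):
    """Greedy selection by context match; real systems optimize globally."""
    out = b""
    padded = ["#", *phonemes, "#"]   # '#' marks an utterance boundary
    for i, target in enumerate(phonemes, start=1):
        candidates = [u for u in database if u.phoneme == target]
        if not candidates:
            raise ValueError(f"no recording of phoneme {target!r}")
        # Context cost: 0 if both neighbors match, 2 if neither does.
        best = min(candidates,
                   key=lambda u: (u.prev != padded[i - 1]) + (u.next != padded[i + 1]))
        out += best.audio
    return out

# "social" as rough phonemes; real units come from hours of tape.
db = [Unit("S", "#", "OW", b"s"), Unit("OW", "S", "SH", b"o"),
      Unit("SH", "OW", "AH", b"sh"), Unit("AH", "SH", "L", b"a"),
      Unit("L", "AH", "#", b"l")]
print(select_units(["S", "OW", "SH", "AH", "L"], db))  # b'soshal'
```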
So what we did was we found an old video of former presidents George Bush and Barack Obama together.
They're shaking hands, making generic statements.
The exact clip isn't important.
But we wondered, could we turn that clip from a boring
meet-and-greet into a scenario where Bush is telling Obama a joke? So we convinced a comedy writer,
Rachel Axler, who works for the show Veep, to write us a few jokes and sent them off to Matt,
and this is what the computer spat out. And well, it goes something like, knock, knock, who's there?
Oval. Oval, who? Oval. I think it's something about the Oval Office, probably.
That was a very good joke, Mr. President. My wife Laura tells it better.
What the hell is that?
Wait, what?
That was terrible.
That was not, technically that was like...
I don't even get...
A, I don't understand that joke at all.
And that's literally what the computer spat out?
That is what the computer spat out.
And truth be told, I don't think it's anywhere...
It is not worthy of the negative response that you are giving it here.
That's terrible.
Let me show you another one.
So happy to be joining forces with this good man.
To put cortisol in your drinking water.
What?
Corpacol.
It's a...
help protect people's teeth so they don't get fillings.
Isn't that fluoride?
Oh, shoot. I think I signed the wrong bill.
That's a legit.
Pretty good. That's a legitimate. No, the robots are terrible.
I couldn't hear cortisol. But the joke is funny. I like the joke.
But the robots just massacred that joke, which is in itself kind of a joke.
Well, I do think, yeah, let me get into it. Well, I think that you two are far more critical
than you should be and you are far more critical than the average listener. However, Matt,
you're so wrong about that. But anyway.
Matt did tell me that conversations, getting people to talk back and forth to each other are still really difficult for a synthesizer to do.
So, you know, conversational stuff is always difficult.
And in fact, it's going to be a long time before we get really, really easy conversational synthesis.
There's all sorts of barriers to that.
There's a human quality to a conversation that the synthesizers can't quite capture yet.
But he also told us that, you know, once we add the video, or if we add a video to this,
it will smooth out a lot of the problems.
When you have the faces as well speaking,
people are not focusing on the audio,
and you can't hear the errors in the same way.
So...
Hello?
Hey, is this Kyle?
Oh, yeah.
Great, great.
I found these two grad students.
My name is Shunsuke Saito.
My name's Kyle Olszewski.
From the University of Southern California,
USC.
They also do a lot of facial reenactment research
and agreed to help us.
But making these visuals also turned out to be way
harder than we thought. Turned out the clip we chose posed some serious challenges. There were too many
side shots of Obama's face. The lighting was all wrong. And eventually, I got an email one late
Sunday night saying, it's not going to work. Okay. So now I think I can draw a line here,
and I can point out that we maybe got over-excited about this technology. It is not yet ready
for true deceit. You have been fumbling and fumbling and fumbling here.
I have not been fumbling.
I am not the running back here.
I find it interesting psychologically that Simon feels like it's a personal failure.
I don't like to fail.
You should.
This is failure.
So, okay, just on Simon's behalf, on the behalf of actually trying to answer the question,
we felt like, okay, maybe we should try this one last time.
Let's find a simpler Obama video and with the audio rather than like whole phrases.
Let's just do a couple of word replacements here or there.
By the way, the only reason we're using Obama is that he seems to be the guy all these technologies are built around.
In any case, we chose the video of Obama's last weekly address, and we chose the audio from a talk he'd given in Chicago after he'd left office.
So, uh, what's been going on while I've been gone?
In this speech, he sort of talks about what he's going to do next, how he's still going to keep fighting for what he believes is right.
Filled with idealism and absolutely certain that somehow I was going to change the world.
But we thought, what if in an alternate reality he didn't want to keep fighting?
What if he could at that moment see the divisions ahead?
And he was just like, that's too much, I give up.
Now, truth is, we didn't think too hard about this because we didn't have much time.
We just whipped it together, did a script based on words Obama used with a few changes.
Sent it off to the guys at USC.
And I videotaped myself saying this new script so that we could use that video of my face to puppetize the former president.
and when we got the final video back,
I have to say,
I was expecting it to be horrible
and we were going to have a good laugh,
but it went from, like, laughy, giggly
to, oh,
wait, this is creepy.
I had been gangbusters,
we've got to release this thing
and not tell anybody and try to fake out the entire world,
but when I saw it, there was a
reluctance.
You mean you went, oh no?
I went, oh, God.
Yeah, yeah, I thought, oh, this, this.
You know, my personal thought was like it was convincing enough that I got genuinely spooked.
But, you know, just in fairness, we shouldn't sit around talking about something people can't see.
Go to futureoffakenews.com and check it out for yourself.
It's all one word, futureoffakenews.com.
And it'll pop right up.
You can see.
Tell us what you think.
You can see how Simon made the video.
Check it out.
anyhow, the whole process
got us all thinking
like, oh, wow, if we
a bunch of idiots can do this
for no money
very, very quickly,
what will this mean
to like a newsroom, for example,
just to start there?
We're at the level now
with this kind of thing
where we need
technologists
to verify or knock down.
Again, news executive
Jon Klein.
I don't think journalists
English majors are going to be the ones to solve this.
You may have been editor of your school paper,
but this is beyond your capability.
But if you're good at collaborating with engineers and scientists,
you'll have a good chance of working together to figure it out.
So we need technical expertise more than we ever have.
Can I ask you, in your heart, let me compare your heart to my heart for a second.
In my heart, I want somebody to tell the researchers, yes, sorry, you can't do that.
Sorry, you know, I know it's really cool.
I know you probably are really proud of that algorithm, but some men in black are going to walk in right now,
and they're going to take your computers away, and you just can't.
Sorry, society is going to overrule you right now.
Is there a part of you that just dictatorially wants to just squash this?
Well, sure.
But wouldn't you still have the, what are they, the FSB in Moscow or the CIA utilizing this and developing it anyway, weaponizing it, so to speak?
Probably.
I think that the top down model could never contain that.
Jon says ultimately what's happening is probably going to be bigger than anything any one organization
or any one newsroom can solve.
He said it'll probably end up coming down to the 14 and 15-year-olds of tomorrow
who will grow up using this technology, making fake videos,
being the victims of fake videos,
and that maybe in the maze of them having to parse truth from fiction
in such a personal way, some kind of code will develop.
I'm an optimist by nature.
I look at this and I say, well, somebody's going to figure it out.
What worries me is the larger context within which this takes place.
This is all occurring within a context of massive news illiteracy.
And the consumers seem to be just throwing their hands up and tiring of trying to even figure it out.
And so just the work involved in getting to the bottom of the truth is unappealing to a growing
percentage of the audience. And I'm not sure where Gen Z, the teenagers of today, come out on this,
let's hope that they are more willing to do the work, maybe out of self-interest, maybe so that they're
not dissed by the girl in social studies. But that's our best hope for overcoming it, because
everybody else seems to be sick of trying.
Reporter Simon Adler, this piece was produced by
Simon and Annie McEwen.
Very special thanks to Kyle Olszewski and the entire team at USC's Institute for Creative Technologies
for all their work manipulating that video of President Obama.
And thanks to Matthew Aylett for synthesizing so, so many words for us.
Rachel Axler for writing us the jokes that we tried to use.
Sohum Pawar for building us an amazing website, Angus Neal, Amy Parle, everybody in the WNYC
newsroom for advising us and giving us reaction shots to the Face2Face video.
And to David Carroll for putting us in touch with Nick Bilton in the first place.
And to Nick Bilton for inspiring this whole story with his article.
He's got a new one, a book, actually, American Kingpin about the founder of a black market
website called The Silk Road.
And to Supasorn Suwajanakorn, a computer scientist who works in Ira's lab, who helped us understand
what the heck was going on.
And finally, you can see the video that we created, as well as a bunch of other kind of
crazy clips that we mentioned throughout this episode.
It's at futureoffakenews.com.
It's all one word.
Futureoffakenews.com.
And with that,
my real co-host and I will bid you adieu.
I'm Jad Abumrad.
I'm Robert Krulwich.
That's who we really are.
I'm glad we could finally be honest about that.
Yeah, all these years.
Message to news from an external number.
This is Jon Klein, calling from the frontiers of media.
My name is Dr. Matthew Aylett, and I am the chief science officer of CereProc.
I am Hany Farid, professor of computer science at Dartmouth College.
Radiolab was created by Jad Abumrad.
It is produced by Soren Wheeler.
Dylan Keefe is our director of sound design.
Our staff includes Simon Adler, David Gebel, Tracie Hunte, Matt Kielty, Robert Krulwich,
Annie McEwen, Latif Nasser, Malissa O'Donnell, Arianne Wack, and Molly Webster.
With help from Sohum Pawar, Rebecca Chesson, Nigel Batali, Citi Wang, and Katie Ferguson.
Our fact checker, Michelle Harris.
End of message. To hear the message again, press 2. To delete it, press 7.
