Your Undivided Attention - Is It AI? One Tool to Tell What’s Real with Truemedia.org CEO Oren Etzioni
Episode Date: October 10, 2024
Social media disinformation did enormous damage to our shared idea of reality. Now, the rise of generative AI has unleashed a flood of high-quality synthetic media into the digital ecosystem. As a result, it's more difficult than ever to tell what's real and what's not, a problem with profound implications for the health of our society and democracy. So how do we fix this critical issue?
As it turns out, there's a whole ecosystem of folks working to answer that question. One is computer scientist Oren Etzioni, the CEO of TrueMedia.org, a free, non-partisan, non-profit tool that is able to detect AI-generated content with a high degree of accuracy. Oren joins the show this week to talk about the problem of deepfakes and disinformation and what he sees as the best solutions.
Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_
RECOMMENDED MEDIA
TrueMedia.org
Further reading on the deepfaked image of an explosion near the Pentagon
Further reading on the deepfaked robocall pretending to be President Biden
Further reading on the election deepfake in Slovakia
Further reading on the President Obama lip-syncing deepfake from 2017
One of several deepfake quizzes from the New York Times, test yourself!
The Partnership on AI
C2PA
Witness.org
Truepic
RECOMMENDED YUA EPISODES
'We Have to Get It Right': Gary Marcus On Untamed AI
Taylor Swift is Not Alone: The Deepfake Nightmare Sweeping the Internet
Synthetic Humanity: AI & What's At Stake
CLARIFICATION: Oren said that the largest social media platforms "don't see a responsibility to let the public know this was manipulated by AI." Meta has made a public commitment to flagging AI-generated or -manipulated content, whereas other platforms like TikTok and Snapchat rely on users to flag.
Transcript
Hey everyone, this is Tristan.
We have a special episode for you today with computer scientist Oren Etzioni
to talk about a new tool to detect AI-generated content.
As AI technology gets better,
telling reality from unreality will only get harder.
Already deepfakes are being used to scam people, extort them, influence elections,
and there are lots of sites out there that claim to be able to detect
if a piece of content was created using AI.
But if you've used these sites,
you know that they're unreliable, to say the least.
There are also folks out there
who are working to build better tools,
people who understand the science of artificial intelligence,
and want to see a future where we can actually know what's real on the internet.
And one of those folks is Oren Etzioni,
who's the founding CEO of the Allen Institute for Artificial Intelligence,
and his non-profit TrueMedia.org
has just launched their state-of-the-art AI detection tool.
And I'll just say that here at the Center for Humane Technology,
we think it's critical not just to point out problems, but to highlight the important work that's being done to help address these problems.
So I'm super excited to have Oren on the show today to talk about that work.
Oren, welcome to Your Undivided Attention.
Thank you, Tristan.
It's a real pleasure to be here.
And as I'll explain in a minute, it's particularly meaningful to have this conversation with you personally.
Well, let's get right into that, because I believe you and I first met, actually, at that meeting in July 2023 with President Biden about artificial intelligence. Could you tell us that story?
With pleasure. I suddenly got an email, and there was an invitation to join President Biden,
Governor Newsom, and some key members of his staff in a small meeting in San Francisco.
And the idea was for a few of us to get together and share with him our thoughts and ideas about AI,
to give him a sense of what is most important.
And I'm probably one of the more optimistic AI folks that you would have on the podcast.
So I came in to this small meeting, all kind of, I wouldn't say guns blazing, but all
bright-eyed.
And we had a really interesting conversation.
A number of us spoke about moonshot projects, but also about concerns.
The thing that is amazing is I came out of the meeting, particularly worried about the
scenario that you brought up.
The concern that you highlighted was the potential of deepfakes to suddenly and sharply affect our society, whether it's economically, where there was a sudden drop in the markets with the picture of the Pentagon being bombed, which you highlighted, and other potential economic scenarios.
And in the context of meeting with the president, I naturally thought of the political scenarios
where we've seen things like the fake robocall from President Biden in the New Hampshire primary.
We've seen an example in Slovakia that was released two days before the election.
And I became obsessed with the concern.
What happens if in the 48 hours, 24 hours before the election, somebody releases a fake that
will tip the scale, particularly in our highly contested,
very narrowly divided electorate and election that we're having in November?
Let's actually just talk about the tool that you are launching
and what the process was actually like to build it,
because you're a nonprofit and you had to raise independent money to do this,
which is kind of one of the interesting things about AI,
there's trillions going into increasing how powerful AI is
and all the things that it can do,
but there's not trillions going into making it safe.
That is very, very true.
I think our investment in AI is really quite unbalanced.
And I came out of our meeting, as I mentioned, with this concern in my head.
And I said, OK, let's see what the available tools can do.
And the first thing I find out is there basically aren't any.
So my first realization was that there's a huge gap in the market.
I was very fortunate to be able to meet with Garrett Camp, the co-founder of Uber,
who funded us out of Camp.org, his philanthropic organization.
And then we set out to build this tool to make it available to media organizations,
to fact-checkers, but ultimately to the general public,
to any concerned citizen, to enable you to do a very simple thing.
Take a social media URL from Facebook, from X, what have you, TikTok,
paste it into our search box, and say analyze:
assess whether it contains something that's fake,
that's been manipulated by AI, or that's real.
We have this tool.
It's really available at truemedia.org.
Welcome everybody to check it out.
How actually does this work, Oren?
What is the way that you train a model to detect a deep fake?
I think there are two really important points to explain.
The first one is just the mechanics.
And I want to highlight that interacting with our tool is super simple.
And you can do it either by taking an image that's on your computer or video, audio, and just uploading it to our site.
And within a minute or so, you'll get our assessment.
Or you can just take a social media URL from TikTok, from Facebook, Instagram, all these places.
even Truth Social, Trump's network, which has seen its share of fakes.
You just paste the URL into our search box on our homepage at truemedia.org.
You hit analyze and you'll get the results back.
So the user interaction is very simple.
But now let's go under the hood and talk about what happens when you do that.
So what happens when you do that is, first of all, conceptually two things.
First of all, we extract the media.
then we send it to a whole bunch of vendors.
We just send it to them and say,
what do you think, Reality Defender, Hive?
So these are existing deep fake detectors,
and you're trying to get a kind of a mixture of experts,
a mixture of detectors synthesis?
Exactly, exactly.
And we want to be as comprehensive as possible in doing that.
While they're doing the analysis,
we also have our own models that look at various technical characteristics.
For example, the distribution of visual noise,
or areas of blurriness, all kinds of telltale signs that we've developed that assess this.
I'll tell you one very cool idea, just to give you a sense of how deep this goes.
And by the way, we don't just analyze the signal.
We analyze the semantics.
We get transcripts of the videos and assess them.
We look all over the web using something called reverse search to say, has this image occurred elsewhere before?
Is it a modified version of something that we've seen?
So we use a lot of tricks under the hood because there's no silver bullet.
We use every trick that we can find to do that.
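To make the pipeline Oren describes a bit more concrete, here is a minimal Python sketch: query several third-party detectors in parallel, add in-house signals, and combine the results. The vendor names come from the conversation, but every function, score, and the simple averaging below are hypothetical stand-ins, not TrueMedia.org's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Everything below is a hypothetical stand-in. The vendors are named in the
# conversation, but their real APIs, and TrueMedia.org's in-house models and
# aggregation logic, are not public, so each step is stubbed.

def query_vendor(vendor: str, media: bytes) -> float:
    """Ask one third-party detector for P(AI-generated). Stubbed."""
    return 0.5

def artifact_score(media: bytes) -> float:
    """In-house model looking at low-level signals such as noise and blur. Stubbed."""
    return 0.5

def reverse_search_score(media: bytes) -> float:
    """Check whether this media, or an edited version, appeared elsewhere before. Stubbed."""
    return 0.5

def analyze(media: bytes) -> float:
    vendors = ["reality_defender", "hive", "sensity", "pindrop"]
    # Consult all the "experts" in parallel while our own models run.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda v: query_vendor(v, media), vendors))
    # Add in-house technical and semantic signals.
    scores += [artifact_score(media), reverse_search_score(media)]
    # A plain average stands in for whatever learned combination is really used.
    return mean(scores)  # closer to 1.0 means more likely AI-generated
```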
But I want to share with you something that just gives you a sense of how involved this can get.
So there's a technology called lip syncing, which is where you take a video of a person.
Then you lay down a different audio track.
So now they're saying things they didn't actually say.
And that's really weird, right, because their lips aren't aligned, right?
That's like very bad dubbing in the old days.
But now with lip syncing technology, they can actually modify the lips so that it looks
like the person, there's a famous example of this with Obama a few years back, it's gotten
much better since then.
It looks like the person is actually saying what you're making them say, right?
This is terrible.
It turns out that there are subtle discrepancies between the audio track and the video track.
So actually, Hany Farid and one of his students, right, he's a Berkeley professor, a major authority in the field of forensics, had the idea: well, what if we analyze the transcript, right? We record the audio and transcribe it. And then we use lip-reading software to analyze what the lips are saying. Now, because of these discrepancies, what you see visually with the lip reading and what you hear in the transcript is going to be quite different. And when that's the case, that's
a hint that this is fake.
So it shows you the creativity and the great lengths
that we can go to to try and find this smoking gun
that tells you, aha, this is a fake.
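To illustrate the lip-sync cross-check Oren attributes to Hany Farid's group, here is a toy sketch: transcribe the audio, "read" the lips, and measure how much the two disagree. The speech-to-text and lip-reading steps are stubbed with fixed strings, so this only shows the shape of the idea, not the real models or TrueMedia.org's code.

```python
import difflib

def transcribe_audio(video_path: str) -> str:
    """Speech-to-text on the audio track. Stubbed; a real system would run an ASR model."""
    return "we will raise taxes next year"

def read_lips(video_path: str) -> str:
    """Lip reading on the video track. Stubbed; a real system would run a visual model."""
    return "we will lower taxes next year"

def lipsync_discrepancy(video_path: str) -> float:
    """Return a 0-1 score for how much what is heard disagrees with what the lips appear to say."""
    heard = transcribe_audio(video_path).split()
    seen = read_lips(video_path).split()
    similarity = difflib.SequenceMatcher(a=heard, b=seen).ratio()
    return 1.0 - similarity  # high disagreement hints at a lip-synced fake

if __name__ == "__main__":
    print(f"lip-sync discrepancy: {lipsync_discrepancy('suspect_clip.mp4'):.2f}")  # ~0.17 with the stubs above
```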
Could you talk a little bit about the performance
or the sort of accuracy of the existing systems
and why the system might be more accurate?
Sure.
So first let's clarify what we're talking about very specifically,
And that's images, videos, and audio.
We don't deal with factual questions, which can be subject to interpretation, or with text,
which can definitely be faked.
That's a whole other arena.
But in these three things, there are not really tools available where you can do this.
And actually, we've gone even further and put a bot on X, right, where a lot of this stuff is rampant,
where you can just tag @TrueMediaBot, and it'll take
what's in the thread, the media that's in the thread,
analyze it and post its analysis in response.
So we are democratizing the ability of anyone
and everyone to use this technology.
But now to go to your question about quality,
it's extremely important.
So as an academic, I started looking at different models
and different vendors and assessing the tools.
I very quickly determined that there's no silver bullet here.
There's some very strong claims made by different people.
And there are also some very high quality technologies.
We have a number of partners,
including Pindrop on the audio side,
Reality Defender, Hive, Sensity in Europe,
a number of organizations that do a good job
in doing the analysis.
The first thing we did is we said,
when we get a query, why don't we send it to all of them
in parallel to hit their APIs and collect the responses
and form an analysis?
So we did that.
And naturally, when you can consult all the experts simultaneously,
you tend to get a better result.
We then went further and used open source models,
ones from academia, ones that we've developed ourselves,
and we fine-tuned them on the data that we find in social media.
But the bottom line is we sit at comfortably above 90% accuracy,
which is very good, but also, you know, full disclosure, error prone, right?
That means if you do 100 queries, 10 of them,
we can make mistakes on, and we do various things that I could talk about in the user interface
to address that. So you don't get the wrong impression of what our assessment says.
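One small, hypothetical illustration of that user-interface point: a detector that is wrong roughly one time in ten should present hedged labels rather than a flat real/fake verdict. The thresholds and wording below are invented for illustration and are not TrueMedia.org's actual UI logic.

```python
def label(score: float) -> str:
    """Map an ensemble score (0 = looks real, 1 = looks AI-generated) to a hedged label
    rather than a flat real/fake verdict. Thresholds are invented for illustration."""
    if score >= 0.9:
        return "Substantial evidence of manipulation"
    if score >= 0.6:
        return "Some evidence of manipulation"
    if score <= 0.1:
        return "Little evidence of manipulation"
    return "Uncertain: take a moment before sharing"

print(label(0.93))  # -> Substantial evidence of manipulation
```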
So obviously there's kind of a cat and mouse game, because people remember famously that when you
generated images of people in the last few years, what it was bad at was getting the hands right.
So if you looked closely, there were always more fingers on the hands than there would be
on a normal human being. And that's a signal that a human being can detect. But as AI gets better and
better, those signals that are visible to human beings go away. And instead you have to look for
more of these invisible signals. I'm just curious. Was there anything surprising about, you know,
what you discovered about the signals that a machine can pick up that a human eye or human ear
cannot? Yes, we found that the technology advanced to a very key point. And in fact, now people
can no longer tell. Actually, a lot of people think, oh, whether I can see the hands or not,
I can squint and glance, and I can tell.
So we launched a quiz taking social media items only, political deep fakes.
They've been posted on social media.
And we found that people typically cannot tell.
The New York Times did multiple quizzes, a very recent one with videos, previous one with faces.
When you take these quizzes, you are quickly humbled.
You cannot tell.
So the fact of the matter is, even in the current state of technology, and as you pointed out,
it keeps getting better, people are deluding themselves
if they think they can tell.
Yeah, and I think it's just so important for people to remember.
I remember the days when I would see deepfakes
and, you know, it causes alarm when you see where it goes,
but you would always say,
but at the end of the day, I can still tell
that this is still generated by a computer.
And I think in many areas of AI,
whether it's, you know, AI capabilities in biology
and chemistry and math and science
and, you know, generating fake media,
we look at the capabilities today
and we say, oh, but see it fails here, here,
and then we say, so see, there's nothing really to be worried about.
But if you look at the speed at which this is going,
we don't want to have guardrails after the capabilities are so scaled up.
We want to really get those guardrails in now.
So I'm so grateful that you've been doing this work.
I thought for a moment what we'd do is just set the table a little bit,
because there's a whole ecosystem of players in this space.
And there's different terms people throw around.
Watermarking media, the provenance of media, disclosure, direct disclosure,
indirect disclosure versus detection of things.
Could you just sort of give us a little lay of the land
of the different approaches that are in this space?
I think President Biden's executive order
called for watermarking of media.
So all these terms like provenance
and watermarking and others
refer to technologies that attempt
to stamp, to track the origin of a media item,
I'll just use image for simplicity,
and to track changes to it,
and to give you that information up front.
That's very important technology, and it has only one major Achilles heel, which is that currently it's completely impractical. And it's impractical for two key reasons. The first one is, it turns out that these watermarks, and there are visible ones and invisible ones, are relatively easy to remove. The second one is, even if somehow we were able to successfully insist on watermarks on all AI-generated media, and that's, as you point out, a big if, it makes no difference unless the app that you're using to consume the media looks for it,
right? So if it's your browser, if it's your Facebook app, if it's your TikTok app, if it doesn't
insist on detecting it, then it doesn't matter, right? Because nobody is going to go through
a bunch of machinations, not nobody, but most people just consume what they're given. They're not
going to turn various things on. So unless the app that you're using as a consumer reflects the
watermark, it doesn't do anything whether it's there or not. And for that reason, right, if you
remove it, nobody would even notice. And if it's there, nobody would notice either. We have to
reach the point, for this to be practical, we have to reach the point where the way that we consume
media tells us always, right, whether this is fake or real.
Shouldn't it be the case that Facebook and Twitter and even dating apps like Tinder
sort of implement this deepfake detection directly into the way that all these products work?
Like, wouldn't they, shouldn't they just have TrueMedia embedded, checking all the media that's running through them?
Absolutely.
If you want assurance that what you see is real,
that content that you record isn't stolen, basically, in various ways,
then we need to do something like that.
I did have conversations with the major providers,
as you'd call them, social media networks.
Generally speaking, I wouldn't say that they're rushing to do this.
They don't have the economic incentive.
And that's why legislation is appropriate.
Of course, there are First Amendment issues and so on.
And so starting with certain cases, child pornography is actually one that we've done better on than others.
And certainly politics is a key one.
You can't just have political ads, we're such visual creatures, all kinds of political ads or social media posts that appeal to people's baser instincts, that confuse people, all of that without any kind of appropriate analysis and enforcement.
Oftentimes in our work, we repeat the line from Charlie Munger, who is Warren Buffett's business partner.
If you show me the incentive, I will show you the outcome.
And as you said, the companies don't have an incentive to implement this, especially if it costs the money.
And that's one of the questions I wanted to ask you. I'm assuming one of the reasons that they don't just implement deepfake detection is that it involves running more servers, more compute, you know, every piece of content that gets uploaded costs them some server costs to, like, process the image, process the tweet, process the TikTok video, put it onto a server.
And what this would involve is doing an extra check that would cost them some extra money.
So what is this cost that's getting in the way of that?
There is a cost here, but really, given the tremendous abilities of these organizations, the cost is very small. I think the concern is of a different sort. And you're
absolutely right, there are disincentives to do this, but it isn't so much that cost. So what are the
disincentives? First of all, the stuff, kind of the worst stuff in some sense plays very well.
As you've documented, right, the algorithms spread that stuff. People click on it. So the first and foremost
incentive is revenue, not cost. This stuff makes money. And we have a kind of tragedy of the commons here. They don't see that it's their responsibility to make sure that we have access to the truth, or even information about it, right? Something could be true, but still
manipulated, right? It doesn't mean that it's wrong, but they don't see a responsibility to let the
public know this was manipulated by AI. Yeah, well, I can also see how with the 10% failure rate,
if you were to have the tech platforms like Facebook or TikTok, you know, be forced to
implement this deep fake detection, and they got it wrong 10% of the time that would leave them
victim to all sorts of rightful attacks, that they're dismissing content that's actually real.
I think the concern about accuracy is a very fair concern, but I also think that having information,
particularly educated information, is always a good thing. So I would not suggest that anybody
toss anything out, take down requests, and so on. That's where we get into First Amendment
issues. What I would suggest is adding a label that this seems to be automatically synthesized.
This may be suspicious according to an assessment created by TrueMedia.org, which is, of course,
a nonprofit, nonpartisan body, or by some other one. And I do think that we can drive up the
accuracy above 90%, but it will always make mistakes, which is why I actually think that the most important thing is: when something may be suspicious, and that may be due to its origin, or due to our analysis or a different tool, you just need to take an extra minute before you forward it to 100,000 followers, or they forward it and it spreads so virally. And the biggest
thing we can do as this technology plays out is to take a moment and say, wait a minute,
am I sure this is true? Where did this actually come from?
You're highlighting a really important point that I really want listeners to get,
which is something in our work we call perverse asymmetries.
You can create a deep fake 100% of the time.
You cannot accurately detect a deep fake 100% of the time.
And to get it to 100% takes years and years and years of research
that now you and your organization and nonprofit have signed up for, having to do
all this work to get closer and closer and closer.
So wherever there are these asymmetries, we should be investing in the defense first
rather than just proliferating
all these new AI offensive capabilities into society.
So, Oren, I think what we really want to do in this podcast
is paint a picture of how we get to the ideal future.
And you're working hard on building one tool.
But as you said, it's not a silver bullet
and it's just one tool of a whole ecosystem of solutions.
Do you have kind of an integrated idea
of the suite or ecosystem of things
that you would love to see
if we treated this problem as kind of the Y2K for reality breaking down?
I think ecosystem is exactly the right phrase here
because I think that we all have a role to play.
I think regulators, as we just saw in California,
and hopefully it'll become federal, or many states will follow California,
and then when you have a patchwork of regulations at the state level,
sometimes it's elevated to federal and even international regulation.
So I think that's an important component.
It needs to be done right. There's a balance here between the burden on the companies
and protecting free speech rights, and at the same time creating an appropriate world for us,
particularly in these extreme harm cases like politics, non-consensual pornography, etc.
So I think that's incredibly important. Once you have those, you need the tools to then detect
that something is fake, whether it's watermarking, post hoc detection like we do at TrueMedia.org,
or a combination of both, you can't have regulations on the books without enforcement.
So I think regulation and enforcement go hand in hand.
I really hope that in that context, the social media companies will step up and realize
that we're encountering what I think of as really an Achilles heel for democracy,
the combination of AI and social media in a way that can disrupt elections.
And I really believe that they can step up more.
And then the last, and in some ways the most important thing, is what we're doing today in this conversation: raising awareness, increasing media literacy, making sure that everybody exercises common sense, that everybody is appropriately skeptical. If you see something, are you sure that it's real?
And even today, we do have organizations like Reuters, AP News, and others who have extensive fact-checking operations.
And sometimes we just need to take a little bit of time to make sure that what we're seeing and having an emotional response to really is real.
If we work together across all these different elements of the ecosystem, I do think that things will improve.
And I am worried that it's going to get worse before it gets better.
Well, with that somber note, that was a good note to end on.
Thank you, Oren, for coming on Your Undivided Attention.
Thank you, Tristan.
Thank you for inspiring me to engage in this work.
I'm both inspired, terrified, but also pleased that we are giving it our best shot in 2024.
So before we go, I just want to say this is obviously a massive problem.
And it's going to take a whole suite, an ecosystem, of solutions and new tools,
from things like proof of humanity verification
to requiring that we have whole new cameras
that have cryptographic signatures embedded within every photo that we take.
And these things would require some new laws that are being proposed right now.
And I want to give a shout out to the huge ecosystem of people
who've been hard at work on this problem for a very long time.
From the Partnership on AI and its ontology of the different sorts of issues in the space,
to the Coalition for Content Provenance and Authenticity, or C2PA,
which is a collection of companies that have been working on these issues
for a while. Nonprofits like Witness.org and companies like Truepic. I want to make sure people
go and check out their work because we need all of these initiatives to be successful. And one last
thing, please don't forget to send us your questions. You can email us at undivided at humanetech.com
or tape a voice memo on your phone and then send it to us. One of the weird things about recording
this podcast is our little team sits on a Zoom call and we do these episodes and we can't feel
the millions of you that are out there,
for whom, you know,
you want to go deeper into these topics, you have questions,
you enjoy certain episodes, you don't like certain
episodes, and we really do want to hear from you.
So please send us your feedback
and your questions, and we can incorporate
them into a future Ask Us Anything.
Your Undivided Attention
is produced by the Center for Humane Technology,
a non-profit working to catalyze
a humane future.
Our senior producer is Julia Scott.
Josh Lash is our researcher and producer,
and our executive producer is Sasha Fegan.
Mixing on this episode by Jeff Sudakein,
original music by Ryan and Hayes Holiday.
And a special thanks to the whole Center for Humane Technology team
for making this podcast possible.
You can find show notes, transcripts,
and much more at HumaneTech.com.
And if you like the podcast,
we'd be grateful if you could rate it on Apple Podcasts,
because it helps other people find the show.
And if you made it all the way here,
let me give one more thank you to you
for giving us your undivided attention.
Thank you.
