Your Undivided Attention - Is It AI? One Tool to Tell What’s Real with Truemedia.org CEO Oren Etzioni
Episode Date: October 10, 2024
Social media disinformation did enormous damage to our shared idea of reality. Now, the rise of generative AI has unleashed a flood of high-quality synthetic media into the digital ecosystem. As a result, it's more difficult than ever to tell what's real and what's not, a problem with profound implications for the health of our society and democracy. So how do we fix this critical issue?
As it turns out, there's a whole ecosystem of folks working to answer that question. One is computer scientist Oren Etzioni, the CEO of TrueMedia.org, a free, non-partisan, non-profit tool that is able to detect AI-generated content with a high degree of accuracy. Oren joins the show this week to talk about the problem of deepfakes and disinformation and what he sees as the best solutions.
Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_
RECOMMENDED MEDIA
TrueMedia.org
Further reading on the deepfaked image of an explosion near the Pentagon
Further reading on the deepfaked robocall pretending to be President Biden
Further reading on the election deepfake in Slovakia
Further reading on the President Obama lip-syncing deepfake from 2017
One of several deepfake quizzes from the New York Times, test yourself!
The Partnership on AI
C2PA
Witness.org
Truepic
RECOMMENDED YUA EPISODES
'We Have to Get It Right': Gary Marcus On Untamed AI
Taylor Swift is Not Alone: The Deepfake Nightmare Sweeping the Internet
Synthetic Humanity: AI & What's At Stake
CLARIFICATION: Oren said that the largest social media platforms "don't see a responsibility to let the public know this was manipulated by AI." Meta has made a public commitment to flagging AI-generated or -manipulated content, whereas other platforms like TikTok and Snapchat rely on users to flag.
Transcript
Hey everyone, this is Tristan.
We have a special episode for you today with computer scientist Oren Etzioni
to talk about a new tool to detect AI-generated content.
As AI technology gets better,
telling reality from unreality will only get harder.
Already deepfakes are being used to scam people, extort them, influence elections,
and there are lots of sites out there that claim to be able to detect
if a piece of content was created using AI.
But if you've used these sites,
you know that they're unreliable, to say the least.
There are also folks out there
who are working to build better tools,
people who understand the science of artificial intelligence,
and want to see a future where we can actually know what's real on the internet.
And one of those folks is Oren Etzioni,
who's the founding CEO of the Allen Institute for Artificial Intelligence,
and his non-profit TrueMedia.org
has just launched their state-of-the-art AI detection tool.
And I'll just say that here at the Center for Humane Technology,
we think it's critical not just to point out problems, but to highlight the important work that's being done to help address these problems.
So I'm super excited to have Oren on the show today to talk about that work.
Oren, welcome to Your Undivided Attention.
Thank you, Tristan.
It's a real pleasure to be here.
And as I'll explain in a minute, it's particularly meaningful to have this conversation with you personally.
Well, let's get right into that, because I believe you and I first met, actually, at that meeting in July 2023 with President Biden about artificial intelligence. Could you tell us that story?
With pleasure. I suddenly got an email, and there was an invitation to join President Biden,
Governor Newsom, and some key members of his staff in a small meeting in San Francisco.
And the idea was for a few of us to get together and share with him our thoughts and ideas about AI,
to give him a sense of what is most important.
And I'm probably one of the more optimistic AI folks that you would have on the podcast.
So I came in to this small meeting, all kind of, I wouldn't say guns blazing, but all
bright-eyed.
And we had a really interesting conversation.
A number of us spoke about moonshot projects, but also about concerns.
The thing that is amazing is I came out of the meeting, particularly worried about the
scenario that you brought up.
The concern that you highlighted was the potential of deepfakes to suddenly and sharply affect our society, whether it's economically, where there was a sudden drop in the markets with the picture of the Pentagon being bombed, which you highlighted, and other potential economic scenarios.
And in the context of meeting with the president, I naturally thought of the political scenarios
where we've seen things like the fake robocall from President Biden in the New Hampshire primary.
We've seen an example in Slovakia that was released two days before the election.
And I became obsessed with the concern.
What happens if in the 48 hours, 24 hours before the election, somebody releases a fake that
will tip the scale, particularly in our highly contested,
very narrowly divided electorate and election that we're having in November?
Let's actually just talk about the tool that you are launching
and what the process was actually like to build it,
because you're a nonprofit and you had to raise independent money to do this,
which is kind of one of the interesting things about AI,
there's trillions going into increasing how powerful AI is
and all the things that it can do,
but there's not trillions going into making it safe.
That is very, very true.
I think our investment in AI is really quite unbalanced.
And I came out of our meeting, as I mentioned, with this concern in my head.
And I said, OK, let's see what the available tools can do.
And the first thing I find out is there basically aren't any.
So my first realization was that there's a huge gap in the market.
I was very fortunate to be able to meet with Garrett Camp, the co-founder of Uber,
who funded us out of Camp.org, his philanthropic organization.
And then we set out to build this tool to make it available to media organizations,
to fact-checkers, but ultimately to the general public,
to any concerned citizen, to enable you to do a very simple thing.
Take a social media URL from Facebook, from X, what have you, TikTok,
paste it into our search box, and say analyze:
assess whether it contains something that's fake,
that's been manipulated by AI, or that's real.
We have this tool.
It's really available at truemedia.org.
Welcome everybody to check it out.
How actually does this work, Oren?
What is the way that you train a model to detect a deep fake?
I think there are two really important points to explain.
The first one is just the mechanics.
And I want to highlight that interacting with our tool is super simple.
And you can do it either by taking an image that's on your computer or video, audio, and just uploading it to our site.
And within a minute or so, you'll get our assessment.
Or you can just take a social media URL from TikTok, from Facebook, Instagram, all these places.
even Truth Social, Trump's network, which has seen its share of fakes.
You just paste the URL into our search box on our homepage at truemedia.org.
You hit analyze and you'll get the results back.
So the user interaction is very simple.
But now let's go under the hood and talk about what happens when you do that.
So what happens when you do that is, first of all, conceptually two things.
First of all, we extract the media.
then we send it to a whole bunch of vendors.
We just send it to them and say,
what do you think, Reality Defender, Hive?
So these are existing deep fake detectors,
and you're trying to get a kind of a mixture of experts,
a mixture of detectors synthesis?
Exactly, exactly.
And we want to be as comprehensive as possible in doing that.
While they're doing the analysis,
we also have our own models that look at various technical characteristics.
For example, the distribution of visual noise,
or areas of blurriness, all kinds of telltale signs that we've developed that assess this.
I'll tell you one very cool idea, just to give you a sense of how deep this goes.
And by the way, we don't just analyze the signal.
We analyze the semantics.
We get transcripts of the videos and assess them.
We look all over the web using something called reverse search to say, has this image occurred elsewhere before?
Is it a modified version of something that we've seen?
So we use a lot of tricks under the hood because there's no silver bullet.
We use every trick that we can find to do that.
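To make the pipeline Oren describes a bit more concrete, here is a minimal Python sketch: query several third-party detectors in parallel, add in-house signals, and combine the results. The vendor names come from the conversation, but every function, score, and the simple averaging below are hypothetical stand-ins, not TrueMedia.org's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Everything below is a hypothetical stand-in. The vendors are named in the
# conversation, but their real APIs, and TrueMedia.org's in-house models and
# aggregation logic, are not public, so each step is stubbed.

def query_vendor(vendor: str, media: bytes) -> float:
    """Ask one third-party detector for P(AI-generated). Stubbed."""
    return 0.5

def artifact_score(media: bytes) -> float:
    """In-house model looking at low-level signals such as noise and blur. Stubbed."""
    return 0.5

def reverse_search_score(media: bytes) -> float:
    """Check whether this media, or an edited version, appeared elsewhere before. Stubbed."""
    return 0.5

def analyze(media: bytes) -> float:
    vendors = ["reality_defender", "hive", "sensity", "pindrop"]
    # Consult all the "experts" in parallel while our own models run.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda v: query_vendor(v, media), vendors))
    # Add in-house technical and semantic signals.
    scores += [artifact_score(media), reverse_search_score(media)]
    # A plain average stands in for whatever learned combination is really used.
    return mean(scores)  # closer to 1.0 means more likely AI-generated
```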
But I want to share with you something that just gives you a sense of how involved this can get.
So there's a technology called lip syncing, which is where you take a video of a person.
Then you lay down a different audio track.
So now they're saying things they didn't actually say.
And that's really weird, right, because their lips aren't aligned, right?
That's like very bad dubbing in the old days.
But now with lip syncing technology, they can actually modify the lips so that it looks
like the person, there's a famous example of this with Obama a few years back, it's gotten
much better since then.
It looks like the person is actually saying what you're making them say, right?
This is terrible.
It turns out that there are subtle discrepancies between the audio track and the video track.
So actually, Hany Farid and one of his students, right, he's a Berkeley professor, a major authority in the field of forensics, had the idea: well, what if we analyze the transcript, right? We record the audio and transcribe it. And then we use lip-reading software to analyze what the lips are saying. Now, because of these discrepancies, what you see visually with the lip reading and what you hear in the transcript is going to be quite different. And when that's the case, that's
a hint that this is fake.
So it shows you the creativity and the great lengths
that we can go to to try and find this smoking gun
that tells you, aha, this is a fake.
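To illustrate the lip-sync cross-check Oren attributes to Hany Farid's group, here is a toy sketch: transcribe the audio, "read" the lips, and measure how much the two disagree. The speech-to-text and lip-reading steps are stubbed with fixed strings, so this only shows the shape of the idea, not the real models or TrueMedia.org's code.

```python
import difflib

def transcribe_audio(video_path: str) -> str:
    """Speech-to-text on the audio track. Stubbed; a real system would run an ASR model."""
    return "we will raise taxes next year"

def read_lips(video_path: str) -> str:
    """Lip reading on the video track. Stubbed; a real system would run a visual model."""
    return "we will lower taxes next year"

def lipsync_discrepancy(video_path: str) -> float:
    """Return a 0-1 score for how much what is heard disagrees with what the lips appear to say."""
    heard = transcribe_audio(video_path).split()
    seen = read_lips(video_path).split()
    similarity = difflib.SequenceMatcher(a=heard, b=seen).ratio()
    return 1.0 - similarity  # high disagreement hints at a lip-synced fake

if __name__ == "__main__":
    print(f"lip-sync discrepancy: {lipsync_discrepancy('suspect_clip.mp4'):.2f}")  # ~0.17 with the stubs above
```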
Could you talk a little bit about the performance
or the sort of accuracy of the existing systems
and why the system might be more accurate?
Sure.
So first let's clarify what we're talking about very specifically,
And that's images, videos, and audio.
We don't deal with factual questions, which can be subject to interpretation, or with text,
which can definitely be faked.
That's a whole other arena.
But in these three things, there are not really tools available where you can do this.
And actually, we've gone even further and put a bot on X, right, where a lot of this stuff is rampant,
where you can just tag @TrueMediaBot, and it'll take
what's in the thread, the media that's in the thread,
analyze it and post its analysis in response.
So we are democratizing the ability of anyone
and everyone to use this technology.
But now to go to your question about quality,
it's extremely important.
So as an academic, I started looking at different models
and different vendors and assessing the tools.
I very quickly determined that there's no silver bullet here.
There's some very strong claims made by different people.
And there are also some very high quality technologies.
We have a number of partners,
including Pindrop on the audio side,
Reality Defender, Hive, Sensity in Europe,
a number of organizations that do a good job
in doing the analysis.
The first thing we did is we said,
when we get a query, why don't we send it to all of them
in parallel to hit their APIs and collect the responses
and form an analysis?
So we did that.
And naturally, when you can consult all the experts simultaneously,
you tend to get a better result.
We then went further and used open source models,
ones from academia, ones that we've developed ourselves,
and we fine-tuned them on the data that we find in social media.
But the bottom line is we sit at comfortably above 90% accuracy,
which is very good, but also, you know, full disclosure, error prone, right?
That means if you do 100 queries, 10 of them,
we can make mistakes on, and we do various things that I could talk about in the user interface
to address that. So you don't get the wrong impression of what our assessment says.
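One small, hypothetical illustration of that user-interface point: a detector that is wrong roughly one time in ten should present hedged labels rather than a flat real/fake verdict. The thresholds and wording below are invented for illustration and are not TrueMedia.org's actual UI logic.

```python
def label(score: float) -> str:
    """Map an ensemble score (0 = looks real, 1 = looks AI-generated) to a hedged label
    rather than a flat real/fake verdict. Thresholds are invented for illustration."""
    if score >= 0.9:
        return "Substantial evidence of manipulation"
    if score >= 0.6:
        return "Some evidence of manipulation"
    if score <= 0.1:
        return "Little evidence of manipulation"
    return "Uncertain: take a moment before sharing"

print(label(0.93))  # -> Substantial evidence of manipulation
```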
So obviously there's kind of a cat and mouse game, because people remember famously that when you
generated images of people in the last few years, what it was bad at was getting the hands right.
So if you looked closely, there were always more fingers on the hands than there would be
on a normal human being. And that's a signal that a human being can detect. But as AI gets better and
better, those signals that are visible to human beings go away. And instead you have to look for
more of these invisible signals. I'm just curious. Was there anything surprising about, you know,
what you discovered about the signals that a machine can pick up that a human eye or human ear
cannot? Yes, we found that the technology advanced to a very key point. And in fact, now people
can no longer tell. Actually, a lot of people think, oh, whether I can see the hands or not,
I can squint and glance, and I can tell.
So we launched a quiz taking social media items only, political deep fakes.
They've been posted on social media.
And we found that people typically cannot tell.
The New York Times did multiple quizzes, a very recent one with videos, previous one with faces.
When you take these quizzes, you are quickly humbled.
You cannot tell.
So the fact of the matter is, even in the current state of technology, and as you pointed out,
it keeps getting better, people are deluding themselves
if they think they can tell.
Yeah, and I think it's just so important for people to remember.
I remember the days when I would see deepfakes
and, you know, it causes alarm when you see where it goes,
but you would always say,
but at the end of the day, I can still tell
that this is still generated by a computer.
And I think in many areas of AI,
whether it's, you know, AI capabilities in biology
and chemistry and math and science
and, you know, generating fake media,
we look at the capabilities today
and we say, oh, but see it fails here, here,
and then we say, so see, there's nothing really to be worried about.
But if you look at the speed at which this is going,
we don't want to have guardrails after the capabilities are so scaled up.
We want to really get those guardrails in now.
So I'm so grateful that you've been doing this work.
I thought for a moment what we'd do is just set the table a little bit,
because there's a whole ecosystem of players in this space.
And there's different terms people throw around.
Watermarking media, the provenance of media, disclosure, direct disclosure,
indirect disclosure versus detection of things.
Could you just sort of give us a little lay of the land
of the different approaches that are in this space?
I think President Biden's executive order
called for watermarking of media.
So all these terms like provenance
and watermarking and others
refer to technologies that attempt
to stamp, to track the origin of a media item,
I'll just use image for simplicity,
and to track changes to it,
and to give you that information up front.
That's very important technology, and it has only one major Achilles heel, which is that currently it's completely impractical. And it's impractical for two key reasons. The first one is, it turns out that these watermarks, and there are visible ones and invisible ones, are relatively easy to remove. The second one is, even if somehow we were able to successfully insist on watermarks on all AI-generated media, and that's, as you point out, a big if, it makes no difference unless the app that you're using to consume the media looks for it,
right? So if it's your browser, if it's your Facebook app, if it's your TikTok app, if it doesn't
insist on detecting it, then it doesn't matter, right? Because nobody is going to go through
a bunch of machinations, not nobody, but most people just consume what they're given. They're not
going to turn various things on. So unless the app that you're using as a consumer reflects the
watermark, it doesn't do anything whether it's there or not. And for that reason, right, if you
remove it, nobody would even notice. And if it's there, nobody would notice either. We have to
reach the point, for this to be practical, we have to reach the point where the way that we consume
media tells us always, right, whether this is fake or real.
Shouldn't it be the case that Facebook and Twitter and even dating apps like Tinder
sort of implement this deepfake detection directly into the way that all these products work?
Like, wouldn't they, shouldn't they just have TrueMedia embedded, checking all the media that's running through them?
Absolutely.
If you want assurance that what you see is real,
that content that you record isn't stolen, basically, in various ways,
then we need to do something like that.
I did have conversations with the major providers,
as you'd call them, social media networks.
Generally speaking, I wouldn't say that they're rushing to do this.
They don't have the economic incentive.
And that's why legislation is appropriate.
Of course, there are First Amendment issues and so on.
And so starting with certain cases, child pornography is actually one that we've done better on than others.
And certainly politics is a key one.
You can't just have political ads, we're such visual creatures, all kinds of political ads or social media posts that appeal to people's baser instincts, that confuse people, all of that without any kind of appropriate analysis and enforcement.
Oftentimes in our work, we repeat the line from Charlie Munger, who is Warren Buffett's business partner.
If you show me the incentive, I will show you the outcome.
And as you said, the companies don't have an incentive to implement this, especially if it costs the money.
And that's one of the questions I wanted to ask you. I'm assuming one of the reasons that they don't just implement deepfake detection is that it involves running more servers, more compute, you know, every piece of content that gets uploaded costs them some server costs to, like, process the image, process the tweet, process the TikTok video, put it onto a server.
And what this would involve is doing an extra check that would cost them some extra money.
So what is this cost that's getting in the way of that?
There is a cost here, but really, given the tremendous abilities of these organizations, the cost is very small. I think the concern is of a different sort. And you're
absolutely right, there are disincentives to do this, but it isn't so much that cost. So what are the
disincentives? First of all, the stuff, kind of the worst stuff in some sense plays very well.
As you've documented, right, the algorithms spread that stuff. People click on it. So the first and foremost
incentive is revenue, not cost. This stuff makes money. And we have a kind of tragedy of the commons here. They don't see that it's their responsibility to make sure that we have access to the truth, or even information about it, right? Something could be true, but still
manipulated, right? It doesn't mean that it's wrong, but they don't see a responsibility to let the
public know this was manipulated by AI. Yeah, well, I can also see how with the 10% failure rate,
if you were to have the tech platforms like Facebook or TikTok, you know, be forced to
implement this deep fake detection, and they got it wrong 10% of the time that would leave them
victim to all sorts of rightful attacks, that they're dismissing content that's actually real.
I think the concern about accuracy is a very fair concern, but I also think that having information,
particularly educated information, is always a good thing. So I would not suggest that anybody
toss anything out, take down requests, and so on. That's where we get into First Amendment
issues. What I would suggest is adding a label that this seems to be automatically synthesized.
This may be suspicious according to an assessment created by TrueMedia.org, which is, of course,
a nonprofit, nonpartisan body, or by some other one. And I do think that we can drive up the
accuracy above 90%, but it will always make mistakes, which is why I actually think that the most important thing is: when something may be suspicious, and that may be due to its origin, or due to our analysis or a different tool, you just need to take an extra minute before you forward it to 100,000 followers, or they forward it and it spreads so virally. And the biggest
thing we can do as this technology plays out is to take a moment and say, wait a minute,
am I sure this is true? Where did this actually come from?
You're highlighting a really important point that I really want listeners to get,
which is something in our work we call perverse asymmetries.
You can create a deep fake 100% of the time.
You cannot accurately detect a deep fake 100% of the time.
And to get it to 100% takes years and years and years of research
that now you and your organization and nonprofit have signed up for, having to do
all this work to get closer and closer and closer.
So wherever there are these asymmetries, we should be investing in the defense first
rather than just proliferating
all these new AI offensive capabilities into society.
So, Oren, I think what we really want to do in this podcast
is paint a picture of how we get to the ideal future.
And you're working hard on building one tool.
But as you said, it's not a silver bullet
and it's just one tool of a whole ecosystem of solutions.
Do you have kind of an integrated idea
of the suite or ecosystem of things
that you would love to see
if we treated this problem as kind of the Y2K for reality breaking down?
I think ecosystem is exactly the right phrase here
because I think that we all have a role to play.
I think regulators, as we just saw in California,
and hopefully it'll become federal, or many states will follow California,
and then when you have a patchwork of regulations at the state level,
sometimes it's elevated to federal and even international regulation.
So I think that's an important component.
It needs to be done right. There's a balance here between the burden on the companies
and protecting free speech rights, and at the same time creating an appropriate world for us,
particularly in these extreme harm cases like politics, non-consensual pornography, etc.
So I think that's incredibly important. Once you have those, you need the tools to then detect
that something is fake, whether it's watermarking, post hoc detection like we do at TrueMedia.org,
or a combination of both, you can't have regulations on the books without enforcement.
So I think regulation and enforcement go hand in hand.
I really hope that in that context, the social media companies will step up and realize
that we're encountering what I think of as really an Achilles heel for democracy,
the combination of AI and social media in a way that can disrupt elections.
And I really believe that they can step up more.
And then the last, and in some ways the most important thing, is what we're doing today in this conversation: raising awareness, increasing media literacy, making sure that everybody exercises common sense, that everybody is appropriately skeptical. If you see something, are you sure that it's real?
And even today, we do have organizations like Reuters, AP News, and others who have extensive fact-checking operations.
And sometimes we just need to take a little bit of time to make sure that what we're seeing and having an emotional response to really is real.
If we work together across all these different elements of the ecosystem, I do think that things will improve.
And I am worried that it's going to get worse before it gets better.
Well, with that somber note, that was a good note to end on.
Thank you, Oren, for coming on Your Undivided Attention.
Thank you, Tristan.
Thank you for inspiring me to engage in this work.
I'm both inspired, terrified, but also pleased that we are giving it our best shot in 2024.
So before we go, I just want to say this is obviously a massive problem.
And it's going to take a whole suite, an ecosystem, of solutions and new tools,
from things like proof of humanity verification
to requiring that we have whole new cameras
that have cryptographic signatures embedded within every photo that we take.
And these things would require some new laws that are being proposed right now.
And I want to give a shout out to the huge ecosystem of people
who've been hard at work on this problem for a very long time.
From the Partnership on AI and its ontology of the different sorts of issues in the space,
to the Coalition for Content Provenance and Authenticity, or C2PA,
which is a collection of companies that have been working on these issues
for a while. Nonprofits like Witness.org and companies like Truepic. I want to make sure people
go and check out their work because we need all of these initiatives to be successful. And one last
thing, please don't forget to send us your questions. You can email us at undivided at humanetech.com
or tape a voice memo on your phone and then send it to us. One of the weird things about recording
this podcast is our little team sits on a Zoom call and we do these episodes and we can't feel
the millions of you that are out there,
for whom, you know,
you want to go deeper into these topics, you have questions,
you enjoy certain episodes, you don't like certain
episodes, and we really do want to hear from you.
So please send us your feedback
and your questions, and we can incorporate
them into a future Ask Us Anything.
Your Undivided Attention
is produced by the Center for Humane Technology,
a non-profit working to catalyze
a humane future.
Our senior producer is Julia Scott.
Josh Lash is our researcher and producer,
and our executive producer is Sasha Fegan.
Mixing on this episode by Jeff Sudakein,
original music by Ryan and Hayes Holiday.
And a special thanks to the whole Center for Humane Technology team
for making this podcast possible.
You can find show notes, transcripts,
and much more at HumaneTech.com.
And if you like the podcast,
we'd be grateful if you could rate it on Apple Podcasts,
because it helps other people find the show.
And if you made it all the way here,
let me give one more thank you to you
for giving us your undivided attention.
Thank you.
