Short Wave - Adversarial AI

Starting point is 00:00:00 You're listening to Short Wave from NPR. Hey everybody, Maddie Sofia here again. This time with NPR's special correspondent, Dina Temple-Raston. Hey, Dina. Hey there. So you're here because you've been doing some really cool reporting about artificial intelligence as part of your special series, I'll Be Seeing You. Yeah, we did a story explaining how AI works and how it's finding its way into everything from refrigerators to insurance, even conservation.

Starting point is 00:00:29 But you also found out that for all of its potential, there are some real concerns about hacking into AI. There's actually a whole field of study that is focused on this. It's called adversarial or evil AI. Ooh, evil. And it's a big enough concern that DARPA, the military's research arm, has created this whole program to study it. And it's called Guaranteeing AI Robustness against Deception. Or, luckily, it has a short name: GARD. The government is so good at naming things, Dina. It is quite the name. So DARPA's really good at creating tongue twisters. But basically what they're trying to do is imagine adversaries hacking into AI systems. And as they see it, it can affect everything from public opinion to driverless cars. So it has huge implications. Today on Short Wave, adversarial AI. How does it work and how can we stop it?

Starting point is 00:01:28 Okay, Dina, let's start with the basics. What makes AI so vulnerable to hacking? It's the way it makes decisions. It's a bit of a black box. Humans look at the totality of something. And AI, what it does is it just ingests millions and millions of data points to categorize things, to learn about them, to find patterns. And then once it finds those patterns, it kind of finds shortcuts to get to those patterns quicker. And that's where the vulnerability lies. Here's an example. So let's say you have AI and you want it to identify a particular kind of music, say, to identify disco.

Starting point is 00:02:01 Oh, okay, Dina. Bump that disco? I know. You didn't see that coming, did you? So you train the AI system with tons and tons of disco music. That feels like abuse, but keep going. Yes, but it's AI so it can't feel. Yes, so it finds 10 things that are always in disco music, but never in orchestral music.

Starting point is 00:02:22 Let's say it figures out that disco always has a certain number of beats per minute, or it calculates how many horns are in a piece of disco music. And then let's say the AI notices that there's never an oboe in disco music. Right. But here's something that's interesting. In that piece of music you were just dancing to? Yeah. There's an oboe. No, there was not. Yeah, there was. Here, let me bump up the oboe for you. You snuck that oboe in on me. That's right. Your human ears can't recognize the oboe, but the AI can. So if I were an adversarial AI person, I would sneak

Starting point is 00:03:01 that oboe in there so that the AI notices it, but you don't. Right. That oboe is enough for AI to decide, that's it, not disco. So that's the way you fool AI. Exactly. That feels like disturbingly simple, Dina. It's an incredibly oversimplified example and a good excuse to use disco. So here's a real world experiment out of Carnegie Mellon University. They trained a computer to use facial recognition to identify different people. And the computer dutifully ingests all these different pictures and identifies them every time, exactly right. And then they put these big, colorful glasses on a subject who didn't have glasses before and the computer completely misidentifies him. Just because he put on a pair of big colorful glasses? Well, they're not ordinary glasses, to be fair. They were sort of like oversized clown glasses. But yes, basically, that's how they fooled it. But the experiment that changed everything involved driverless cars. Okay, tell me about it. So first, let me introduce you to the lead scientist of the experiment. She's a UC Berkeley professor named Dawn Song, and I met up with her in San Francisco. Wow, this is quite a view.
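The disco example above can be written out as a few lines of code. This is only a minimal sketch of the idea as described in the conversation: the feature names (`beats_per_minute`, `horn_count`, `oboe_level`), the tempo range, and the threshold are all hypothetical, and a real classifier would learn its shortcuts from data rather than have them written by hand.

```python
# Hypothetical toy classifier: every feature name and threshold here is
# invented for illustration and does not come from a real system.

def classify(track):
    """Label a track 'disco' or 'not disco' using two learned shortcuts."""
    # Shortcut 1: disco is assumed to sit in a narrow tempo band.
    if not 110 <= track["beats_per_minute"] <= 130:
        return "not disco"
    # Shortcut 2: the training data never had an oboe in a disco track,
    # so any detectable oboe rules disco out.
    if track["oboe_level"] > 0.01:
        return "not disco"
    return "disco"


def sneak_in_oboe(track, level=0.02):
    """Return a copy of the track with a faint oboe mixed in."""
    altered = dict(track)
    altered["oboe_level"] = level
    return altered


original = {"beats_per_minute": 120, "horn_count": 4, "oboe_level": 0.0}
attacked = sneak_in_oboe(original)

print(classify(original))  # disco
print(classify(attacked))  # not disco
```

The tempo and the horns are unchanged, and a listener would hear the same song. The only thing that moved is the one feature the classifier treats as decisive.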

Starting point is 00:04:06 Yeah, the view is nice. For the greater good, I tore myself away from the view, and I asked her if she'd show me the short video she made with colleagues from Berkeley, the University of Michigan, University of Washington, and Stony Brook. And this video went viral. So what's on the video? So the video doesn't have any sound, and it's less than a minute long. Perfect for podcast, Dina. Exactly. Welcome to picking an experiment that is completely unhelpful on radio.

Starting point is 00:04:29 But it did rock the AI community because it showed how vulnerable AI can be. So in the video, you'll see two frames side by side. There's like a split screen. In both frames, you'll see the vehicle is driving towards the end of the road, where there's a person holding a stop sign. And each one of these screens is subtitled. So instead of French, what you're seeing is AI making its decisions with words down at the bottom of the screen. And it's making its decisions in a subset of AI called image classification.

Starting point is 00:05:01 You'll see the prediction given by the image classification system to try to predict what the traffic sign is. So it's sort of like the car starting to think, hmm, a sign is coming, I'm going to have to make a decision. Right. So the way you have to visualize this is that one of these stop signs is completely untouched. And one stop sign is altered. Song has put stickers on it, one just below the S and one above the O. Okay, so what happens?

Starting point is 00:05:27 Well, so as the car gets close to the sign, the subtitles on the screen are telling you what it's deciding to do. And when it gets close to the regular stop sign, it says, prepare to stop. But when it gets to that other sign, the one with the stickers on it, it thinks the sign is saying speed limit 45 miles an hour. That is not what it's saying. It's not what it's saying, but it blows right through the intersection. Now, this is an experiment, so nobody got hurt. But those carefully placed stickers were all it took to fool the AI. That feels like too simple, like scarily simple to trick it.

Starting point is 00:05:57 Wild, right? Yes. So to be fair, it was an incredibly long and sophisticated process to figure out where the stickers should go, so they didn't just sort of slap a couple stickers on and hope for the best. And the research team knew exactly which pixels the AI was looking at, and those are the very ones they altered. But like, Dina, these are also two signs that are totally different shapes. And colors. Yes. Yes. So I asked that question, too. And it's because the AI is not seeing the sign in the way that we think of seeing a sign with our eyes. It sees each sign as a mathematical

Starting point is 00:06:29 equation, not a shape. So what the experiment showed is that while AI has come a really long way, it's far from having the performance that you and I have as humans. And because it doesn't look at the totality of things, like color, shape, that sort of thing, there are really easy ways to fool it. We need to understand that the machine learning system is not as powerful as what people think. We still have a lot of work to do. Decades before there's a safe self-driving car? We do really need new and more breakthroughs before we can really get there. So would you ride in a driverless car? Not today. I mean, I'll enjoy having a test ride. Dina, if she's not getting in one, I'm not getting in one. Yeah, well, let's put it this way.
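The point about the sign being "a mathematical equation" can be made concrete with a toy model. The sketch below is not the researchers' procedure, which the conversation does not describe in detail. It assumes a made-up linear classifier over six pixel values and applies a sign-of-the-gradient nudge (in the spirit of the fast gradient sign method) to each pixel. The weights, pixel values, and the `epsilon` budget are all hypothetical.

```python
# Toy linear "sign classifier". Weights, bias, and pixels are invented
# for illustration; a real system would be a deep network over an image.
WEIGHTS = [0.9, -0.4, 0.7, -0.8, 0.5, -0.6]
BIAS = -0.1


def score(pixels):
    """The 'equation' the model sees: a weighted sum of pixel values."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS


def predict(pixels):
    """Positive score means 'stop'; anything else means 'speed limit'."""
    return "stop" if score(pixels) > 0 else "speed limit"


def perturb(pixels, epsilon):
    """Move each pixel by at most epsilon in the direction that lowers the score."""
    adversarial = []
    for w, p in zip(WEIGHTS, pixels):
        # For a linear model, the gradient of the score with respect to
        # a pixel is just that pixel's weight, so step against its sign.
        step = -epsilon if w > 0 else epsilon
        # Keep the result a valid pixel intensity in [0, 1].
        adversarial.append(min(1.0, max(0.0, p + step)))
    return adversarial


clean = [0.6, 0.4, 0.6, 0.4, 0.6, 0.4]
altered = perturb(clean, epsilon=0.15)

print(predict(clean))    # stop
print(predict(altered))  # speed limit
```

No single pixel moves by more than 0.15, yet the small changes all push the sum the same way, which is enough to flip the label. An attacker who knows which inputs the model weighs most heavily knows exactly where to push.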

Starting point is 00:07:20 I drive a clutch, so I like a lot of control in my car. So I'm not getting into one either. So how do we prevent people from hacking into AI? Well, Song is just one researcher who's working on all this adversarial or evil AI research. DARPA, the military's top research arm, is working this problem too with something called GARD that we mentioned before, Guaranteeing AI Robustness against Deception. So have they come up with any solutions? They more came up with some broad aims. So, for example, we talked about how AI is a bit of a black box. We don't know how it's making decisions. But imagine if you could train it to tell you how it's making decisions. Then you know where someone might find a vulnerability to fool it and you can make that

Starting point is 00:08:01 part more robust. You can imagine from a battlefield perspective, because there's so much AI that's being sort of injected into different weapons that we have, they have a huge concern that an adversary can fool AI and make whatever the weapon is do the wrong thing. So what's the bottom line here? Like, should we in general be worried about AI? Yes, but not in the way you are in the movies. It's not going to be machines that take over. It's more going to be AI making innocent mistakes that an adversary is sort of teaching it to make. You know, DARPA is thinking about this because back in the day, they actually introduced ARPANET, which is the basis of the internet. And they rather naively thought that no bad guys would be doing something. They thought

Starting point is 00:08:42 that there would be researchers exchanging information, everybody would be happy, and then we get hackers. So they've learned a lesson from that. And the lesson is, look, as we develop AI, let's make it stronger from the outset. Let's figure out how people could misuse it and put in systems that make it more resilient so they can't do that. Dina Temple-Raston is a special correspondent for NPR. And you can check out her series, I'll Be Seeing You, on our website and on NPR One. I'm Maddie Sofia. Come back tomorrow for a story about Nazi Germany's attempt to build a nuclear reactor.

Starting point is 00:09:21 And how that story was almost lost to history. That's tomorrow on Short Wave from NPR. I have a what? A grace? Wow. Wow. Am I too bro? Do you think I went for us?

Starting point is 00:09:36 I think that is the opposite of bro. Grace is the opposite of bro. I just want to point out that nobody's talking about my grace. Clearly, I have none. No. Fair.
