Short Wave - How Hackers Could Fool Artificial Intelligence

Starting point is 00:00:00 Hey, everybody, Maddie Sofia here. Today, we've got one for you that we first published last year when Short Wave was just two weeks old, barely lifting our little baby podcast head up on our own for the first time. The episode's all about how artificial intelligence works and how it can be hacked, plus disco music. We're back tomorrow with a new episode. And don't forget to subscribe to or follow Short Wave wherever you get your podcasts. You're listening to Short Wave from NPR.

Starting point is 00:00:33 Hey, everybody, Maddie Sofia here again, this time with NPR's special correspondent, Dina Temple-Raston. Hey, Dina. Hey there. So you're here because you've been doing some really cool reporting about artificial intelligence as part of your special series, I'll Be Seeing You. Yeah, we did a story that was explaining how AI works and how it's finding its way into everything,

Starting point is 00:00:54 from refrigerators to insurance, even conservation. But you also found out that for all of its potential, there are some real concerns about hacking into AI. There's actually a whole field of study that is focused on this. It's called adversarial or evil AI. Ooh, evil. And it's a big enough concern that DARPA, the military's research arm, has created this whole program to study it. And it's called Guaranteeing AI Robustness Against Deception. Or, luckily, it has a short name, GARD. The government is so good at naming things, Dina. It is quite the name.

Starting point is 00:01:28 So DARPA is really good at creating tongue twisters. But basically what they're trying to do is imagine adversaries hacking into AI systems. And as they see it, it could affect everything from public opinion to driverless cars. So it has huge implications. Today on Short Wave, adversarial AI. How does it work and how can we stop it? Okay, Dina, let's start with the basics. What makes AI so vulnerable to hacking?

Starting point is 00:02:01 It's the way it makes decisions. It's a bit of a black box. Humans look at the totality of something. And AI, what it does is it's ingesting just millions and millions of data points to categorize things, to learn about them, to find patterns. And then once it finds those patterns, it kind of finds shortcuts to get to those patterns quicker. And that's where the vulnerability lies. Here's an example.

Starting point is 00:02:22 So let's say you have AI and you want it to identify a particular kind of music, say, to identify disco. Oh, okay, Dina. Bump that disco? I know. You didn't see that coming, did you? So you train the AI system with tons and tons of disco music. That feels like abuse, but keep going. Yes, but it's AI so it can't feel.

Starting point is 00:02:44 Yeah, so it finds 10 things that are always in disco music, but never in orchestral music. Let's say it figures out that disco always has a certain number of beats per minute, or it calculates how many horns are in a piece of disco music. And then let's say the AI notices that there's never an oboe in disco music. Right. But here's something that's interesting. In that piece of music you were just dancing to? Yeah.

Starting point is 00:03:12 There's an oboe. No, there was not. Yeah, there was. Here, let me bump up the oboe for you. You snuck that oboe in on me. That's right. Your human ears can't recognize the oboe, but the AI can. So if I were an adversarial AI person, I would sneak that oboe in there so that the

Starting point is 00:03:31 AI noticed it, but you don't. Right. That oboe is enough for AI to decide that's it, not disco. So that's the way you fool AI. Exactly. That feels like disturbingly simple, Dina. It's an incredibly oversimplified example. Okay, okay.
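
The disco example above can be sketched in a few lines of Python. This is only a toy illustration of the "shortcut" idea described in the episode: the `Track` fields, the tempo range, and the horn threshold are all invented for the sketch, and a real music classifier would learn far subtler patterns.

```python
# Toy illustration only: the features, thresholds, and rule below are invented.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Track:
    beats_per_minute: int
    horn_count: int
    has_oboe: bool


def looks_like_disco(track: Track) -> bool:
    """Hypothetical learned shortcut: disco tempo, some horns, never an oboe."""
    tempo_ok = 110 <= track.beats_per_minute <= 130
    horns_ok = track.horn_count >= 2
    return tempo_ok and horns_ok and not track.has_oboe


original = Track(beats_per_minute=120, horn_count=4, has_oboe=False)
# The adversary mixes in an oboe too quiet for a listener to notice.
tampered = replace(original, has_oboe=True)

print(looks_like_disco(original))  # True
print(looks_like_disco(tampered))  # False
```

Nothing about the tempo or the horns changed between the two tracks. The single feature the classifier leaned on as a shortcut was enough to flip its answer.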

Starting point is 00:03:46 And a good excuse to use disco. So here's a real world experiment out of Carnegie Mellon University. They trained a computer to use facial recognition to identify different people. And the computer dutifully ingests all these different pictures and identifies them every time, exactly right. And then they put these big, colorful glasses on a subject who didn't have glasses before. And the computer completely misidentifies him. Just because he put on a pair of big colorful glasses. Well, they're not ordinary glasses, to be fair.

Starting point is 00:04:13 They were sort of like oversized clown glasses. Sure. But yes, basically, that's how they fooled it. But the experiment that changed everything involved driverless cars. Okay, tell me about it. So first, let me introduce you to the lead scientist of the experiment. She's a UC Berkeley professor named Dawn Song, and I met up with her in San Francisco. Wow, this is quite a view. Yeah, the view is nice. For the greater good, I tore myself away from the view, and I asked her if she'd show me the short video she made with colleagues from Berkeley, the University of Michigan, University of Washington, Stony Brook. And this video went viral. So what's on the video? So the video doesn't have any sound and it's less than a minute long. Perfect for podcast, Dina.

Starting point is 00:04:53 Exactly. Welcome to picking an experiment that is completely unhelpful on radio. But it did rock the AI community because it showed how vulnerable AI can be. So in the video, you'll see two frames side by side. There's like a split screen. In both frames, you'll see the vehicle is driving towards the end of the road where there's a person who's holding a stop sign. And each one of these screens is subtitled. So instead of French, what you're seeing is AI making its decisions with words down at the bottom of the screen. And it's making its decisions in a subset of AI called image classification. You'll see the prediction given by the image classification system to try to predict

Starting point is 00:05:35 what the traffic sign is. So it's sort of like the car starting to think, hmm, a sign is coming. I'm going to have to make a decision. Right. The way you have to visualize this is that one of these stop signs is completely untouched. And one stop sign is altered. Song has put stickers on it, one just below the S and one above the O. Okay.

Starting point is 00:05:54 So what happens? Well, so as the car gets close to the sign, the subtitles on the screen are telling you what it's deciding to do. And when it gets close to the regular stop sign, it says, prepare to stop. But when it gets to that other sign, the one with the stickers on it, it thinks the sign is saying speed limit 45 miles an hour. That is not what it's saying. It's not what it's saying, but it blows right through the intersection. Now, this is an experiment, so nobody got hurt. But those carefully placed stickers were all it took to fool the AI.

Starting point is 00:06:21 That feels like too simple, like scarily simple to trick it. Wild, right? Yes. So, to be fair, it was an incredibly long and sophisticated process to figure out where the stickers should go. So they didn't just sort of slap a couple stickers on and hope for the best. And the research team knew exactly which pixels the AI was looking at, and those are the very ones they altered. But, like, Dina, these are also two signs that are totally different shapes. And colors. Yes. Yes. So I asked that question

Starting point is 00:06:48 too. And it's because the AI is not seeing the sign in the way that we think of seeing a sign with our eyes. It sees each sign as a mathematical equation, not a shape. So what the experiment showed is that while AI has come a really long way, it's far from having the performance that you and I have as humans. And because it doesn't look at the totality of things like color, shape, that sort of thing. There are really easy ways to fool it. We need to understand that the machine learning system is not as powerful as what people think. We still have a lot of work to do.
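
To make the "mathematical equation" point concrete, here is a minimal Python sketch of how changing only the few pixels a model relies on most can flip its answer. It is not the Berkeley team's method: the linear scoring rule, the weights, the pixel values, and the helper name `place_stickers` are all assumptions made up for illustration, and real image classifiers are far larger and nonlinear.

```python
# Minimal sketch with made-up numbers; not the researchers' actual technique.
def score(weights, pixels, bias):
    """Linear score: positive means 'stop sign', otherwise 'speed limit'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def label(weights, pixels, bias):
    return "stop sign" if score(weights, pixels, bias) > 0 else "speed limit"


def place_stickers(weights, pixels, count):
    """Overwrite only the `count` most influential pixels; leave the rest alone."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    altered = list(pixels)
    for i in ranked[:count]:
        # Push the pixel to whichever extreme lowers the 'stop sign' score.
        altered[i] = 0.0 if weights[i] > 0 else 1.0
    return altered


weights = [0.1, 1.5, -0.2, 0.1, -1.2, 0.2]  # hypothetical learned weights
bias = -0.5
clean = [0.5, 0.9, 0.5, 0.5, 0.1, 0.5]      # hypothetical pixel intensities in [0, 1]

stickered = place_stickers(weights, clean, count=2)

print(label(weights, clean, bias))      # stop sign
print(label(weights, stickered, bias))  # speed limit
```

Four of the six pixels are untouched, so to a person the sign would look almost the same. The model only sees a weighted sum, and the two altered pixels carry most of the weight.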

Starting point is 00:07:25 Decades before there's a safe self-driving car? We do really need new and more breakthroughs before we can really get there. So would you ride in a driverless car? Not today. I mean, I'll enjoy having a test ride. Dina, if she's not getting in one, I'm not getting in one. Yeah, well, let's put it this way. I drive a clutch, so I like a lot of control in my car, so I'm not getting into one either.

Starting point is 00:07:53 So how do we prevent people from hacking into AI? Well, Song is one researcher who's working on all this adversarial or evil AI research. DARPA, the military's top research arm, it's working this problem, too, with something it calls GARD that we mentioned before, Guaranteeing AI Robustness Against Deception. So have they come up with any solutions? They more came up with some broad aims. So, for example, we talked about how AI is a bit of a black box. We don't know how it's making decisions.

Starting point is 00:08:22 But imagine if you could train it to tell you how it's making decisions. Then you know where someone might find a vulnerability to fool it, and you can make that part more robust. You can imagine from a battlefield perspective, because there's so much AI that's being sort of injected into different weapons that we have, they have a huge concern that an adversary can fool AI and make whatever the weapon is do the wrong thing. So what's the bottom line here? Like, should we in general be worried about AI?
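
One very simple way a model could "tell you how it's making decisions" is to report how much each input contributed to its answer. The Python sketch below does this for a hypothetical linear model. The feature names, weights, and the `explain` helper are invented for illustration, and nothing here is taken from DARPA's GARD program.

```python
# Sketch of a per-feature explanation for a hypothetical linear sign classifier.
def explain(weights, features, names):
    """Return (name, contribution) pairs, largest absolute contribution first."""
    contributions = [(n, w * x) for n, w, x in zip(names, weights, features)]
    return sorted(contributions, key=lambda pair: abs(pair[1]), reverse=True)


# All names and numbers below are made up for the example.
names = ["red_color", "octagon_shape", "white_letters", "patch_below_S"]
weights = [0.3, 0.4, 0.3, 2.5]
features = [1.0, 1.0, 1.0, 1.0]

report = explain(weights, features, names)
most_influential = report[0][0]

for name, contribution in report:
    print(f"{name}: {contribution:+.2f}")
```

In this made-up report, a small patch of the sign outweighs its color and shape combined. That is the kind of over-reliance an adversary could exploit, and the part a defender would want to make more robust.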

Starting point is 00:08:48 Yes, but not in the way you are in the movies. It's not going to be machines that take over. It's more going to be AI making innocent mistakes that an adversary is sort of teaching it to make. You know, DARPA is thinking about this because back in the day, they actually introduced ARPANET, which is the basis of the Internet. And they rather naively thought that no bad guys would be doing something. They thought that there would be researchers exchanging information, everybody would be happy. And then we get hackers. So they've learned a lesson from that.

Starting point is 00:09:17 And the lesson is, look, as we develop AI, let's make it stronger from the outset. Let's figure out how people could misuse it and put in systems that make it more resilient so they can't do that. Dina Temple-Raston is a special correspondent for NPR. And you can check out her series, I'll Be Seeing You, on our website or on NPR One. I have a what? A grace? Wow. Wow. Am I too bro? Do you think I went bro? I think that is the opposite of bro. Grace is the opposite of bro. I just want to point out that nobody's talking about my grace. Clearly I have none.

