CyberWire Daily - Three pillars of Artificial Intelligence. [Research Saturday]

Episode Date: May 12, 2018

Bobby Filar is a Principal Data Scientist at Endgame, and coauthor of the research paper The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. The report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. Bobby Filar joins us to discuss the paper and his views on the evolving role of AI in cybersecurity.

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. data products platform comes in. With Domo, you can channel AI and data into innovative uses that deliver measurable impact. Secure AI agents connect, prepare, and automate your data workflows, helping you gain insights, receive alerts, and act with ease through guided apps tailored to your role. Data is hard. Domo is easy. Learn more at ai.domo.com. That's ai.domo.com. Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of
Starting point is 00:01:10 protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. And now, a message from our sponsor, Zscaler, the leader in cloud security. Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise, with an 18% year-over-year increase in ransomware attacks and a record $75 million payout in 2024. These traditional security tools expand your attack surface with public-facing IPs that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops attackers
Starting point is 00:02:00 by hiding your attack surface, making apps and IPs invisible, eliminating lateral movement, connecting users only to specific apps, not the entire network, continuously verifying every request based on identity and context, simplifying security management
Starting point is 00:02:18 with AI-powered automation, and detecting threats using AI to analyze over 500 billion daily transactions. Hackers can't attack what they can't see. Protect your organization with Zscaler Zero Trust and AI. Learn more at zscaler.com slash security. I think the bulk of the researchers were brought together after a conference that occurred December 2016. That's Bobby Feiler. He's a principal data scientist at Endgame.
Starting point is 00:02:56 The research we're discussing today is titled The Malicious Use of Artificial Intelligence, Forecasting, Prevention, and Mitigation. use of artificial intelligence, forecasting, prevention, and mitigation. There was kind of a formal panel. Some of the conversation was laid out about the needs or requirements to establish a little bit more rigor within the AI research community, to establish norms, kind of garner some sort of aspects of safety and ethics. Over the next six months, I believe they kind of laid the groundwork coming together with the three pillars, kind of the political, digital, and physical security aspects. Myself and Hiram Anderson, another data scientist here at Endgame, were brought in to specifically contribute to the cybersecurity or information security component. And then it was a lot of, hey, let's try to get together and meet virtually 22 opinions
Starting point is 00:03:51 and try to figure out what the common themes are going to be, what we want this to read like. We didn't want it to be kind of a klaxon or call to arms about any sort of robot apocalypse. We wanted it to be very pragmatic, a very thoughtful approach that was more policy-driven than kind of the pop culture mainstream media was reporting on. The Oxford and Cambridge researchers, kind of the principal researchers, once everything was written up and we felt comfortable with it, took it and kind of brought us across the finish line with editing. There was a ton of help from OpenAI as far as getting some of the news out there and
Starting point is 00:04:34 PR. Yeah, I think it's a remarkably accessible report. You know, my hat's off to you. Let's start, though, I think maybe the most fundamental question here is, at the outset, defining artificial intelligence and machine learning. I think particularly in the cybersecurity world, and when it comes to marketing in particular, I think those terms have gotten a little bit fuzzy. So can you help, for the purposes of this paper? How did you approach those definitions? I mean, first, there is no way marketing has ever overstated what AI is. That seems
Starting point is 00:05:10 implausible to me. I think there's a pretty big misunderstanding, and it's a byproduct of Hollywood, our favorite TV shows and things like that about what AI really is. For me growing up, it was, you know, things like RoboCop, which was super exciting. And then later iRobot and all of this fun stuff. Lieutenant Commander Data. Exactly. Exactly. So there were all these references in pop culture about what AI was supposed to be. When in reality, it can be much more mundane. I think we've seen recently there's been some more interesting aspects. But its core, it's really just kind of statistics in machine learning.
Starting point is 00:05:57 And for the folks listening that don't really understand machine learning, there are three main pillars. that don't really understand machine learning, there are three main pillars. This concept of supervised learning, where you have examples of something with some sort of label, and then you train a model to recognize it. A good example is like spam detection. Unsupervised learning is where you have a bunch of disparate data, and there are no labels, and you attempt to cluster them together based on commonalities in hopes to deriving some sort of information. And a lot of times that's used to derive labels.
Starting point is 00:06:32 So you could look at things like economic factors, locations, schools and all of this and attempt to group together and categorize people or districts into, you know, red versus blue, rich versus poor, things like that. And then the final one, and one that gets referenced, I think quite a bit in the report is this idea of reinforcement learning, where you're treating whatever problem you're trying to solve like a game, and you're allowing an algorithm to try to figure out the best way to solve a problem on its own. And then based off of some sort of reward function or feedback loop, it takes that information, adjust its parameters accordingly, and then tries again. And it does this instead of, you know, five or six times, like it would take us to learn how to shoot a basketball or, you know, swing a golf club. It
Starting point is 00:07:26 does it millions of times in a very small setting is a way to perfect a particular approach. So let's go through, you all lay out a general framework for AI and security threats. Can you take us through what you discussed there? Yeah. So at its core, we try to focus on kind of three pillars, the political spectrum, the physical spectrum, and then the cyber spectrum. I think for most people, the things that you would hear about or are hearing about are on the political side,
Starting point is 00:08:01 particularly right now with things like Cambridge Analytica. Um, there was a more, uh, I think humorous one that occurred with, uh, Jordan Peele from the comedy group, Key and Peele, where he made a, uh, Obama lip syncing video where he, he basically used AI, uh, and some of these algorithms that are readily available now to people who are non-practitioners or non-PhDs and basically created a video that allowed him to put forth a script that it seemed like Obama was reading from that said the AI apocalypse was coming and all that fun stuff. So you're starting to see more and more of
Starting point is 00:08:46 that. Another one that kind of came out that I think will have interesting ramifications down the road is this idea of deep fakes. So the ability to more or less morph any picture and overlay it with a picture of anybody you know or anybody you're interested in seeing. And there's a lot of potential safety and security concerns there where basically if you make somebody angry, all of a sudden your face could be on an inappropriate picture. There's blackmail concerns and things like that. And I think what you're starting to see is that sort of accessibility to AI and the lower cost of entry to using it is going to lead to both really good positive
Starting point is 00:09:33 breakthroughs and with that, the exploitation of those for more nefarious means. Well, let's explore that specifically. I mean, the cost issue. People that I've interviewed over the past year or so, they've said a lot of times the bad guys when it comes to cyber attacks, they have shied away from AI because it's been expensive and there are cheaper things that work. You know, just mass spamming or phishing campaigns or things like that. So one of the things that this paper points out is that the cost of these tools is decreasing and the availability of these tools is increasing. Yeah, yeah. And I think for the most part, the researchers you've spoken to are absolutely correct. And that's certainly one thing that we try to emphasize in this report, that there have been no outright uses within cybersecurity of AI being used. there have been no outright uses within cybersecurity of AI being used. Researchers within the InfoSec community have come up with a variety of use cases, and it's certainly plausible. I think we're still at that stage where, like you said, it's just a little bit easier right now to
Starting point is 00:10:39 take the low-hanging fruit approach because it is still effective. But I think what you'll see is the advent or use of machine learning and AI in defensive technologies within cybersecurity will lead to a little bit more generalized approach to tackling the threat landscape. And as those little pockets are shored up, it will require the attacker to become a little bit more sophisticated. And I think they'll look to the very tools that we're employing as an opportunity to attack. So it's kind of a cat and mouse game, if you will, or in its truest form, a red versus blue between AI algorithms and the people they are meant to stop where they will use those those exact algorithms against us so i think as far as the the cost concern or resource concern is is being considered you have platforms right now and there are dozens of them that make programming ai models and algorithms as easy as like a dozen lines of code
Starting point is 00:11:46 in things like Python. You don't necessarily need a math PhD if you want to become more familiar about the underlying concepts. The openness of the AI community and particularly the educational aspects, things like massively open online courses have made the concepts that much easier to understand as well. So between those two, and then just the overall availability of models that are pre-trained, you don't really need to know anything about, which is like the Jordan Peele case, where he just grabbed a lip sync model from the internet and then use that to his own ends. You're talking about coming up with an idea, getting in a super advanced piece of technology that didn't exist five years ago, and then turning
Starting point is 00:12:37 it into a piece of political propaganda, you know, and you're done before lunch. That to me is utterly fascinating that this sort of thing can transpire so quickly and so easily that at its core is one of the things that we're trying to get across in the report. I think a good case study that was mentioned was one in the InfoSec section about a company and two researchers from ZeroFox. They do
Starting point is 00:13:06 social media analysis and things like that actually up in Baltimore, where you're at. A good group of guys, they had this idea that, hey, what if we started reading people's tweets and then we used a generative model? So something you see things like AI that can generate Harry Potter stories. It's kind of the same concept. You train it on a subset of your tweets and then you feed it a little seed, like a topic, and then it produces 140 characters that seem semi-realistic of something you'd be interested in or something that you may have even tweeted at one time. So they did that and then they slapped a URL with a, I think a Google shortener application and then fed it to a bunch of their friends to see who would click on it. It was amazing, the effectiveness. I mean, you're talking going from like five to 10% effectiveness to 60 to 70%
Starting point is 00:14:03 effectiveness, all with, you know, with applying a little bit of data collection and a few hours of Python programming. And I think when it comes down to those numbers, that cost is low enough where attackers will start considering that as a potential means to an end. So being able to put something in a familiar voice by training, I guess, the stylistic specifics of that voice. Yeah, it's, you know, spearfishing is obviously super successful, but that requires a lot
Starting point is 00:14:39 of manual work. You're talking potentially several hours of OSINT work, fully understanding, you know, maybe the look or feel of a particular password reset email or a Capital One credit card statement or anything like that. Suddenly, if you have access to data, which so many of us put Facebook, Twitter, Instagram, things like that, just out there readily available, not only our own, but we also show our preferences based off of who we follow. We become very susceptible at that point to, you know, uh, you see this with Twitter, uh, with like promoted tweets is I set out to RSA last week. Every single promoted tweet I got was from a vendor with
Starting point is 00:15:26 hashtag RSAC. Right. And it's like, I am moderately interested in this because I am in the area. I will most likely walk by this vendor. And it is interesting to see whatever marketing language they're using. Now, imagine that with a non-information security professional, and it's about their favorite basketball team, their knitting club that they're associated with, something super specific, but it really required the attacker not to know anything like that,
Starting point is 00:15:54 just to download your tweets and then pass it through an algorithm. I think that's super interesting from a research standpoint, and it's kind of scary from just an everyday layman standpoint. Yeah. And one of the things that the research points out is the ability, the increasing ability of these systems to create synthetic images basically from scratch. And it tracks
Starting point is 00:16:20 sort of the system's ability to create realistic looking human faces. And we're at the point now where a synthetic image looks like a photorealistic image of a face. And it strikes me that from a political point of view, from a societal point of view, that if we hit the point where photographic evidence is no longer photographic evidence, and as you say, the generation of these can be done automatically in a matter of minutes rather than, you know, someone having to spend a lot of time in Photoshop or cutting and pasting and so forth. Well, that kind of changes the game, doesn't it? Yeah, yeah, it really does. And, you know, towards the end of our report, our call to action
Starting point is 00:17:02 is, you know, it's one part vigilance, one part openness and taking responsibility. But a huge kind of third component is education. And that education comes in the form of education towards the overall population, but specifically policymakers, so that they're aware of kind of the various side effects of this sort of dual use, dual nature technology. And I think if you or any of the listeners were paying attention during the Zuckerberg questioning on the Hill a week or two ago, you can start to see that like, you know, these congressmen and congresswomen are starting to have to tackle these problems that are, by and large, very, very technical and very sophisticated.
Starting point is 00:17:52 And it's one thing when it's data collection, which is kind of a very hard concept and relatively straightforward to understand. Data collection, unbeknownst to the user, is a bad thing. Data collection, unbeknownst to the user, is a bad thing. It's a whole nother when it's like political propaganda being created and distributed through sites like Twitter and Facebook and Instagram that are indistinguishable from everyday reality. I think that's a more terrifying sort of process. And I think that is something that policymakers are going to be made aware of in the near term. To that point about, you know, the Facebook grilling from the members of Congress, you know, it strikes me that I guess the argument could be made that the fact that the policymaking is a slow, deliberative process, you can make the argument that for a long time,
Starting point is 00:18:46 that's a feature, not a bug. But as the rate of change increases, the velocity increases when it comes to the developments in things like AI, I wonder, is policy always going to lag? And do we have to make sort of fundamental changes to just be able to keep up? Yeah, that's an interesting question. And do we have to make sort of fundamental changes to just be able to keep up?
Starting point is 00:19:07 Yeah, that's an interesting question. And certainly one that was kicked around quite a bit in the chat rooms and Google Docs that the researchers of this report were talking about. what we as researchers need to do and need to be more open to is, and this isn't to say that we're not doing an okay job right now. It's just like anything we can do better. And that's just being more open with a lot of the research that we're doing, red teaming it. There's a, there's a huge component of that. I think in the report, both anecdotes, stories, as well as the recommendation to do this. And a good example is kind of next generation antivirus, which is something that I'm sure you're familiar with. It's a big marketing term, obviously. But, you know, these are platforms that are meant to eliminate the need for signature-based AV with the expectation that you can get out ahead of threats, which is fantastic. It's proven to be very successful. It's an arduous process that requires massive amounts of samples correctly labeled.
Starting point is 00:20:21 But at the end of the day, it's still a byproduct of the data it sees. So even if it generalizes very well and can pick up on little nuances here and there, it's still very prone to attack. And with things like VirusTotal, where you can kind of submit a sample and then see a broad spectrum of vendors and how they respond to the sample you submit. There's been research and myself and Hiram Anderson teamed up with UVA to do this very thing, which is, could you use AI kind of against AI? Is this possible? And we set up kind of this game that could be thought of is like, you know, when you were in college and you tried to sneak into a bar. You showed up the first time and you wore a hat.
Starting point is 00:21:11 And you're like, this hat makes me look older. You tried it and the bouncer was like, no, no, no, no. That's not going to work. And then you're like, you know what? I bet if I grew out my beard a little bit, that would help. Go back again. It's like, better, but no. And then finally you're like, uh, well, maybe I just
Starting point is 00:21:28 need to get a fake ID. And then once you get the fake ID, you're in, and you're like, oh, that was the solution all along. That was the hole in the process. So it's kind of the same idea with this attacking next gen AV with kind of this reinforced game with malware, where you take a piece of malware and you throw it at next gen AV with, with kind of this reinforced game, uh, with malware, where you take a piece of malware and you throw it at next gen AV, it spits back good or bad. Some next gen AVs are a little bit more helpful and they spit back a number of like a confidence score or a probability
Starting point is 00:21:59 of maliciousness. And this helps even more because you can start to understand the ebb and flow of you making a decision or altering one piece of or part of code and the effect that that has on the score. So if you can do that enough, you can start to learn like, oh, well, if I pack the binary and then scrub strings and do this or change it to Russian language pack, then all of a sudden I can get past. And it's all of a sudden you learn a recipe for bypass. And this is a complicated process, certainly. It's one that requires a little bit of overhead, a little bit of resources. But at the end of the day, if you're an attacker and you have one shot to get it right,
Starting point is 00:22:43 what better approach is there than to have access to all this information offline, craft this perfect piece of malware using artificial intelligence, and then suddenly you're through. And those are the sorts of concerns, particularly from like a next gen AV standpoint that not only researchers need to be made aware of, but consumers and politicians and things like that. And I think that is a very closed example within InfoSec that could be propagated across the physical and political security spectrum as well. There's certainly this aspect of red teaming that needs to occur and then reporting back to policymakers
Starting point is 00:23:26 so we perform some sort of due diligence on that end. Yeah and I think you know in your example the other thing that happens is you know word gets around that the fake ID is the way into that bar so you have fewer people trying hats and growing mustaches. Right, right. As somebody who works on this problem, again, a lot of this stuff is very, very fascinating because it's all conceptual right now, but it's very easy to look down the road, you know, 12 to 24 months from now and start to understand like, yeah, there's a process that could occur. And depending on whether or not attackers start to believe that the juice is worth the squeeze, we may start seeing that. So the onus is on us to kind of eat our own dog food and just like red teaming any other
Starting point is 00:24:22 security tool, take the results from that and empower the product itself. We do a lot of things where we propose adversarial training where you're generating like all of these instances of kind of morphed malware and then saying like, well, just because this doesn't look like the malware it once was, that doesn't mean that it's any less bad. So now let's train less bad. So now let's train on that. So it's at least seen this sort of change in behavior. So we catch it the
Starting point is 00:24:50 next time. And it's all about, yeah, staying ahead, making sure model drift doesn't occur, that the models don't become stale, just trying to stay as current and up to date as possible. Now, looking ahead, you're looking down the road, what were some of the conclusions from the group? Does it seem like appropriate attention is being paid to this? Is there hope? Is it gloom and doom? Where did you all land with that? It's a little bit of both. And to be honest, this was something where researchers were split in a lot of ways about, you know, is it all bad? Is everything fine? I think for the most part, AI is always going to get kind of a negative connotation. And there's plenty to blame from that. And maybe Hollywood is... We've all seen the Terminator.
Starting point is 00:25:41 Exactly. But at the end of the day ai is being used every day for for things to make our lives easier as well self-driving cars is obviously one where there have been unfortunate accidents that have occurred and clearly we're not there yet but once we are you're talking about a fantastic opportunity to to help you know increase the efficiency of our highways and commutes and things like that. There's also things like AI being used to identify medical problems through x-rays, using computer vision. These are all very, very good things that are occurring. And very few people can say that that isn't the case. But that being said, any technology like this can be abused. It can be morphed and kind of twisted to some sort of, as I said earlier, kind of nefarious end.
Starting point is 00:26:34 And those are the things that we need to be more considerate about. And I think, and the report goes into this as well, out of all those spaces, physical, political, and information security, we should be looking at the information security first as kind of a standard on how to handle this. Because the information security community has been having to deal with these technologies being morphed and twisted to nefarious ends for a long time. And yeah, we don't have it down to a science on how to handle it best, but we have attempted to do things like disclosure, best practices, things like that, red teaming, have all become kind of mainstays. And yeah, it's not perfect, but it at least provides a roadmap of what could and. And yeah, it's not perfect, but it at least provides a roadmap of what could and
Starting point is 00:27:25 should be done, particularly within this kind of new, new fangled space of, of AI. Yeah. It's an opportunity for us to lead the way. Right. And I think a lot of the researchers that we worked with were political and physical security researchers. And, and they were the, they were kind of the first ones to come to this and say like, well, you guys have things like vulnerability disclosure. And I'm like, yeah, yeah, we do. And it works and it's reasonably effective. And we have things like bug bounties and we try to open up some of our tools to the community, but we could obviously do better. to the community, but we could obviously do better. And I think things like explainability,
Starting point is 00:28:13 interpretability of what these AI models are doing in each of these fields, information security is certainly one. Like, why did you call this binary bad? I think we need to be personally accountable from an ethics standpoint, in a safety standpoint. So people start to understand why things are occurring and why they're not. And I think that's something that you'll see researchers and security vendors in particular take more seriously in the next 12 to 18 months. I mean, I think one thing, and this would be more of a shameless plug,
Starting point is 00:28:41 but some of your viewers will likely be attending Black Hat and DEF CON this year. If you're interested in the AI aspects of how it's being employed or deployed in information security, go to those conferences and ask around. Talk to vendors. If you're at a vendor booth, try to grab a technical person. I know we at Endgame try to supply researchers or data scientists. DEF CON this year, I'm part of a committee that's standing up a village specifically for artificial intelligence just to educate, provide examples and demonstrations on how AI can be used for both good and bad. So whether you're a practitioner, a decision maker, or just a casual observer, you can walk away with at least some understanding from the people who use it day in and day out on kind of the effect that it could have within your life.
Starting point is 00:29:34 Let me ask you this. Years ago, Carl Sagan, the famous scientist, he had what he called his baloney detection kit, which was a way to detect if someone was trying to fool you, trying to pull one over on you. Do you have any recommendations along those lines for folks who are trying to cut through the marketing noise when it comes to this stuff? And any guidance for if you really want to learn about this, but you want to not be fooled by the marketing, what's a good approach to that? That's actually a great question and one that probably doesn't get asked enough. But yeah, I would never recommend walking in blind faith.
Starting point is 00:30:17 And I would imagine that most people listening to this would take that same approach. My advice would be for any vendor that claims machine learning, AI techniques, it really starts with data. Try to get a better idea of where their data is coming from. If it's a closed source and they can't talk about it, get them to talk about the number of samples they have or the diversity of that data. Because bias can creep up very, very quickly in
Starting point is 00:30:46 these situations. If they only have malware and that malware is specifically Russian and Chinese, then the first time somebody at your company downloads a piece of software with a Russian or Chinese language pack that's completely benign, it will likely get flagged. That's just the nature of bias. And that bias exists within ourselves, and it exists within the realm of models and machine learning. So data is a big thing. Another big thing is trying to understand how they're training that data, the models they're using, and then how often. Just because of how rapid and dynamic the information security space is, particularly from an attacker perspective, that shift in speed of attacks and discrepancy in attacks can lead to models that aren't trained very often becoming very stale,
Starting point is 00:31:41 leading to bypass and things like that. So I would say trying to determine whether or not the machine learning pipeline is mature in the sense that it's trained consistently on fresh data. They're accounting for things like old data sloughing off and not being useful anymore. Those two things are very, very important and could at least provide some sort of background to you in feeling a little bit more confident in whether or not you believe kind of the spiel that you're being pitched. Our thanks to Bobby Filer from Endgame for joining us. The research we discussed today is titled The Malicious Use of Artificial Intelligence, Forecasting, Prevention, and Mitigation.
Starting point is 00:32:25 We've got a link to the research paper in the show notes of this episode. And now, a message from Black Cloak. Did you know the easiest way for cyber criminals to bypass your company's defenses is by targeting your executives and their families at home? Black Cloak's award winning digital executive protection platform secures their personal devices, home networks, and connected lives. Because when executives are compromised at home, your company is at risk. In fact, over one third of new members discover they've already been breached. Protect your executives and their families 24-7, 365, with Black Cloak.
Starting point is 00:33:14 Learn more at blackcloak.io. The Cyber Wire Research Saturday is proudly produced in Maryland out of the startup studios of Data Tribe, where they're co-building the next generation of cybersecurity teams and technologies. Our amazing CyberWire team is Elliot Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond, Tim Nodar, Joe Kerrigan, Carol Terrio, Ben Yellen, Nick Valecki, Gina Johnson, Bennett Moe, Chris Russell, John Petrick, Jennifer Iben, Rick Howard, Peter Kilby, and I'm Dave Bittner. Thanks for listening.
