CyberWire Daily - Three pillars of Artificial Intelligence. [Research Saturday]
Episode Date: May 12, 2018
Bobby Filar is a Principal Data Scientist at Endgame, and coauthor of the research paper, The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. The report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. Bobby Filar joins us to discuss the paper, and his views on the evolving role of AI in cybersecurity.
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
You're listening to the Cyber Wire Network, powered by N2K.
...data products platform comes in. With Domo, you can channel AI and data into innovative uses that
deliver measurable impact. Secure AI agents connect, prepare, and automate your data workflows,
helping you gain insights, receive alerts, and act with ease through guided apps tailored to
your role. Data is hard. Domo is easy. Learn more at ai.domo.com.
That's ai.domo.com.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation with researchers and
analysts tracking down threats and vulnerabilities and solving some of the hard problems of
protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
And now, a message from our sponsor, Zscaler, the leader in cloud security.
Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise, with an 18% year-over-year increase in ransomware attacks
and a record $75 million payout in 2024.
These traditional security tools expand your attack surface with public-facing IPs that are exploited by bad actors
more easily than ever with AI tools.
It's time to rethink your security.
Zscaler Zero Trust plus AI stops attackers
by hiding your attack surface,
making apps and IPs invisible,
eliminating lateral movement,
connecting users only to specific apps,
not the entire network,
continuously verifying every request
based on identity and context,
simplifying security management
with AI-powered automation,
and detecting threats using AI
to analyze over 500 billion daily transactions.
Hackers can't attack what they can't see.
Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com/security.
I think the bulk of the researchers were brought together after a conference that occurred December 2016.
That's Bobby Filar. He's a principal data scientist at Endgame.
The research we're discussing today is titled The Malicious Use of Artificial Intelligence, Forecasting, Prevention, and Mitigation.
There was kind of a formal panel. Some of the conversation was laid out about the needs or requirements to establish a little bit more rigor within the AI research community, to establish
norms, kind of garner some sort of aspects of safety and ethics. Over the next six months,
I believe they kind of laid the groundwork coming together
with the three pillars, kind of the political, digital, and physical security aspects.
Myself and Hiram Anderson, another data scientist here at Endgame, were brought in to specifically
contribute to the cybersecurity or information security component. And then it was a lot of, hey, let's try to get together and meet virtually, all 22 opinions,
and try to figure out what the common themes are going to be, what we want this to read like.
We didn't want it to be kind of a klaxon or call to arms about any sort of robot apocalypse. We wanted it to be very pragmatic, a very thoughtful approach
that was more policy-driven than kind of the pop culture
mainstream media was reporting on.
The Oxford and Cambridge researchers, kind of the principal researchers,
once everything was written up and we felt comfortable with it,
took it and kind of brought us across the finish line with editing.
There was a ton of help from OpenAI as far as getting some of the news out there and
PR.
Yeah, I think it's a remarkably accessible report.
You know, my hat's off to you.
Let's start, though, I think maybe the most fundamental question here is,
at the outset, defining artificial intelligence and machine learning. I think particularly
in the cybersecurity world, and when it comes to marketing in particular, I think those terms have
gotten a little bit fuzzy. So can you help us out? For the purposes of this paper, how did you approach those definitions?
I mean, first, there is no way marketing has ever overstated what AI is. That seems
implausible to me. I think there's a pretty big misunderstanding, and it's a byproduct of
Hollywood, our favorite TV shows and things like that about what AI really is. For me growing up, it was, you know,
things like RoboCop, which was super exciting. And then later I, Robot and all of this fun stuff.
Lieutenant Commander Data.
Exactly. Exactly. So there were all these references in pop culture about what AI
was supposed to be. When in reality, it can be much more mundane.
I think we've seen recently there's been some more interesting aspects.
But at its core, it's really just kind of statistics and machine learning.
And for the folks listening that don't really understand machine learning,
there are three main pillars.
This concept of supervised learning, where you have examples of something with some sort of label,
and then you train a model to recognize it.
A good example is like spam detection.
Unsupervised learning is where you have a bunch of disparate data, and there are no labels, and you attempt to cluster them together based on commonalities in hopes of deriving some sort of information.
And a lot of times that's used to derive labels.
So you could look at things like economic factors, locations, schools and all of this and attempt to group together and categorize people or districts into, you know, red versus blue, rich versus poor,
things like that. And then the final one, and one that gets referenced, I think quite a bit
in the report is this idea of reinforcement learning, where you're treating whatever
problem you're trying to solve like a game, and you're allowing an algorithm to try to figure out the
best way to solve a problem on its own. And then based off of some sort of reward function or
feedback loop, it takes that information, adjusts its parameters accordingly, and then tries again.
And it does this instead of, you know, five or six times, like it would take us to learn how to
shoot a basketball or, you know, swing a golf club. It
does it millions of times in a very small setting as a way to perfect a particular approach.
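To make those three pillars a little more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: toy features standing in for spam-style data, a two-cluster grouping, and a tiny bandit-style reward loop as a caricature of reinforcement learning. It is not code from the paper or from Endgame, just a sketch of the three paradigms.

```python
# A minimal sketch of the three paradigms described above, on invented toy data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 1. Supervised learning: labeled examples, e.g. spam (1) vs. not spam (0).
X = rng.normal(size=(200, 5))            # pretend feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # pretend labels
spam_model = LogisticRegression().fit(X, y)
print("supervised training accuracy:", spam_model.score(X, y))

# 2. Unsupervised learning: no labels, just grouping by commonality.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))

# 3. Reinforcement learning, caricatured as a bandit: try an action, observe a
#    reward, adjust the estimate, and repeat many times instead of five or six.
true_payout = np.array([0.2, 0.5, 0.8])  # hidden reward probabilities
estimates = np.zeros(3)
counts = np.zeros(3)
for _ in range(10_000):
    arm = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(estimates))
    reward = float(rng.random() < true_payout[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
print("learned action values:", estimates.round(2))
```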
So let's go through, you all lay out a general framework for AI and security threats. Can you
take us through what you discussed there? Yeah. So at its core, we try to focus on kind of three pillars,
the political spectrum, the physical spectrum,
and then the cyber spectrum.
I think for most people,
the things that you would hear about
or are hearing about are on the political side,
particularly right now with things like Cambridge Analytica. There was a
more, I think, humorous one that occurred with Jordan Peele from the comedy group
Key and Peele, where he made an Obama lip-syncing video. He basically used AI
and some of these algorithms that are readily available now to people who are
non-practitioners or non-PhDs and basically created a video that allowed him to put forth
a script that it seemed like Obama was reading from, saying the AI apocalypse was coming
and all that fun stuff.
So you're starting to see more and more of
that. Another one that kind of came out that I think will have interesting ramifications down
the road is this idea of deep fakes. So the ability to more or less morph any picture and
overlay it with a picture of anybody you know or anybody you're interested in seeing.
And there's a lot of potential safety and security concerns there
where basically if you make somebody angry,
all of a sudden your face could be on an inappropriate picture.
There's blackmail concerns and things like that.
And I think what you're starting to see is that sort of accessibility to AI and the lower cost of entry to using it is going to lead to both really good positive
breakthroughs and with that, the exploitation of those for more nefarious means.
Well, let's explore that specifically. I mean, the cost issue. People that I've interviewed
over the past year or so, they've said a lot of times that the bad guys, when it comes to cyber attacks, have shied away from AI because it's been expensive and there are cheaper things that work.
You know, just mass spamming or phishing campaigns or things like that. So one of the things that this paper points out is that the cost of these tools is decreasing and the availability of these tools is increasing.
Yeah, yeah. And I think for the most part, the researchers you've spoken to are absolutely correct.
And that's certainly one thing that we try to emphasize in this report, that there have been no outright uses of AI within cybersecurity attacks.
Researchers within the InfoSec community have come up with a variety of use cases, and it's certainly plausible.
I think we're still at that stage where, like you said, it's just a little bit easier right now to
take the low-hanging fruit approach because it is still effective. But I think what you'll see is the advent or use of machine learning and AI in defensive technologies within cybersecurity
will lead to a little bit more generalized approach to tackling the threat landscape.
And as those little pockets are shored up, it will require the attacker to become a little bit more sophisticated.
And I think they'll look to the very tools that we're employing as an opportunity to attack.
So it's kind of a cat and mouse game, if you will, or in its truest form, a red versus blue between AI algorithms and the people they are meant to stop
where they will use those exact algorithms against us. So I think as far as the cost
concern or resource concern is being considered, you have platforms right now, and there are dozens
of them, that make programming AI models and algorithms as easy as like a dozen lines of code
in things like Python. You don't necessarily need a math PhD if you want to become more familiar
about the underlying concepts. The openness of the AI community and particularly the educational
aspects, things like massive open online courses, have made the concepts that
much easier to understand as well. So between those two, and then just the overall availability
of models that are pre-trained that you don't really need to know anything about, which is like the
Jordan Peele case, where he just grabbed a lip sync model from the internet
and then used that to his own ends. You're talking about coming up with an idea,
getting hold of a super advanced piece of technology that didn't exist five years ago, and then turning
it into a piece of political propaganda, you know, and you're done before lunch. That to me is utterly fascinating
that this sort of thing can transpire so quickly
and so easily that at its core
is one of the things that we're trying to get across
in the report.
I think a good case study that was mentioned
was one in the InfoSec section
about a company and two researchers from ZeroFox. They do
social media analysis and things like that actually up in Baltimore, where you're at.
A good group of guys, they had this idea that, hey, what if we started reading people's tweets
and then we used a generative model? So something like the AI you see that can generate Harry Potter stories.
It's kind of the same concept. You train it on a subset of your tweets and then you feed it a
little seed, like a topic, and then it produces 140 characters that seem semi-realistic of something
you'd be interested in or something that you may have even tweeted at one time. So they did that, and then they slapped on a URL with, I think, a Google shortener application
and then fed it to a bunch of their friends to see who would click on it. It was amazing,
the effectiveness. I mean, you're talking going from like five to 10% effectiveness to 60 to 70%
effectiveness, all with, you know, with applying a little bit of data collection
and a few hours of Python programming.
And I think when it comes down to those numbers,
that cost is low enough where attackers will start considering
that as a potential means to an end.
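The ZeroFox tooling itself isn't reproduced here. As a purely benign sketch of the underlying idea, a generative model can be as simple as a word-level Markov chain trained on a handful of short posts; the tiny corpus below is invented, and the neural model the researchers actually used would be far more capable, but the train-then-seed workflow is the same.

```python
# A benign, minimal sketch of "train a generative model on short posts, then
# seed it with a topic." Word-level Markov chain on an invented corpus; no
# targeting or link generation is included.
import random
from collections import defaultdict

corpus = [
    "excited for the big game tonight go team",
    "new knitting pattern finished photos soon",
    "great coffee spot near the conference center",
    "anyone else watching the playoffs this weekend",
]

# Build a bigram transition table: word -> possible next words.
transitions = defaultdict(list)
for post in corpus:
    words = post.split()
    for current, nxt in zip(words, words[1:]):
        transitions[current].append(nxt)

def generate(seed: str, max_words: int = 12) -> str:
    """Generate a short post starting from a seed word."""
    word, out = seed, [seed]
    while len(out) < max_words and transitions[word]:
        word = random.choice(transitions[word])
        out.append(word)
    return " ".join(out)

random.seed(1)
print(generate("new"))
print(generate("great"))
```

The point of the anecdote is how little machinery this takes: the heavy lifting is just collecting a small corpus and seeding the model with a topic the target already cares about.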
So being able to put something in a familiar voice by training on, I guess, the stylistic
specifics of that voice.
Yeah, it's, you know, spear phishing is obviously super successful, but that requires a lot
of manual work.
You're talking potentially several hours of OSINT work, fully understanding, you know,
maybe the look or feel of a particular password reset email or a Capital One credit card statement
or anything like that. Suddenly, if you have access to data, which so many of us put on Facebook,
Twitter, Instagram, things like that, just
out there readily available, not only our own, but we also show our preferences based off of
who we follow. We become very susceptible at that point. You see this with Twitter,
with promoted tweets. I was out at RSA last week, and every single promoted tweet I got was from a vendor with
hashtag RSAC. Right. And it's like, I am moderately interested in this because I am in the area.
I will most likely walk by this vendor. And it is interesting to see whatever marketing language
they're using. Now, imagine that with a non-information security professional,
and it's about their favorite basketball team,
their knitting club that they're associated with,
something super specific,
but it really didn't require the attacker
to know anything like that,
just to download your tweets
and then pass it through an algorithm.
I think that's super interesting
from a research standpoint,
and it's kind of scary
from just an everyday layman standpoint.
Yeah. And one of the things that the research points out is the ability, the increasing
ability of these systems to create synthetic images basically from scratch. And it tracks
sort of the system's ability to create realistic looking human faces.
And we're at the point now where a synthetic image looks like a photorealistic image of a face.
And it strikes me that from a political point of view, from a societal point of view,
that if we hit the point where photographic evidence is no longer photographic evidence,
and as you say, the generation of these can be done automatically in a
matter of minutes rather than, you know, someone having to spend a lot of time in Photoshop or
cutting and pasting and so forth. Well, that kind of changes the game, doesn't it?
Yeah, yeah, it really does. And, you know, towards the end of our report, our call to action
is, you know, it's one part vigilance, one part openness
and taking responsibility. But a huge kind of third component is education. And that education
comes in the form of education towards the overall population, but specifically policymakers,
so that they're aware of kind of the various side effects of this sort of
dual use, dual nature technology.
And I think if you or any of the listeners were paying attention during the Zuckerberg
questioning on the Hill a week or two ago, you can start to see that like, you know,
these congressmen and congresswomen are starting to have to tackle these problems that are, by and large, very, very technical and very sophisticated.
And it's one thing when it's data collection, which, as a concept, is relatively straightforward to understand.
Data collection, unbeknownst to the user, is a bad thing.
It's a whole other thing when it's political propaganda being created and distributed through sites like Twitter and Facebook and Instagram that is indistinguishable from everyday reality. I think that's a more terrifying sort of process.
And I think that is something that policymakers are going to be made aware of
in the near term. To that point about, you know, the Facebook grilling from the members of Congress,
you know, it strikes me that I guess the argument could be made that the fact that
the policymaking is a slow, deliberative process, you can make the argument that for a long time,
that's a feature, not a bug.
But as the rate of change increases,
the velocity increases when it comes to the developments
in things like AI,
I wonder, is policy always going to lag?
And do we have to make sort of fundamental changes
to just be able to keep up?
Yeah, that's an interesting question.
And certainly one that was kicked around quite a bit in the chat rooms and Google Docs that the researchers of this report were working in. What we as researchers need to do, and need to be more open to, and this isn't to say that we're not doing an okay job right now, it's just that, like anything, we can do better, is
being more open with a lot of the research that we're doing, and red teaming it. There's a huge component of that
in the report, I think, both anecdotes and stories, as well as the recommendation to do this.
And a good example is kind of next generation antivirus, which is something that I'm sure you're familiar with.
It's a big marketing term, obviously. But, you know, these are platforms that are meant to eliminate the need for signature-based AV with the expectation that you can get out ahead of threats, which is fantastic.
It's proven to be very successful.
It's an arduous process that requires massive amounts of samples correctly labeled.
But at the end of the day, it's still a byproduct of the data it sees.
So even if it generalizes very well and can pick up on little nuances here and there,
it's still very prone to attack. And with things like VirusTotal, where you can kind of submit a
sample and then see a broad spectrum of vendors and how they respond to the sample you submit. There's been research and myself and
Hiram Anderson teamed up with UVA to do this very thing, which is, could you use AI kind of against
AI? Is this possible? And we set up kind of this game that could be thought of like,
you know, when you were in college and you tried to sneak into a bar.
You showed up the first time and you wore a hat.
And you're like, this hat makes me look older.
You tried it and the bouncer was like, no, no, no, no.
That's not going to work.
And then you're like, you know what?
I bet if I grew out my beard a little bit, that would help.
Go back again.
It's like, better, but no.
And then finally you're like, uh, well, maybe I just
need to get a fake ID.
And then once you get the fake ID, you're in,
and you're like, oh, that was the solution all along.
That was the hole in the process.
So it's kind of the same idea with attacking
next gen AV with this kind of reinforcement game with malware, where you take
a piece of malware and you throw it at next gen AV, and it spits back good or bad. Some next gen AVs
are a little bit more helpful and they spit back a number of like a confidence score or a probability
of maliciousness. And this helps even more because you can start to understand the ebb and flow of you making a decision or altering one piece or part of the code and the effect that that has on the score.
So if you can do that enough, you can start to learn, like, oh, well, if I pack the binary and then scrub strings and do this or change it to a Russian language pack, then all of a sudden I can get past it.
And all of a sudden you learn a recipe for bypass.
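Here is a deliberately toy illustration of that feedback loop, not the researchers' actual setup: a stand-in classifier trained on two made-up numeric features, probed with small random changes that are kept whenever the returned confidence score drops. It is only meant to show why a verdict-plus-score API leaks more signal than a bare good/bad answer.

```python
# Toy feedback loop: a black box that returns a confidence score can be
# probed with small random tweaks until the score falls below the decision
# threshold. The "classifier" and its two features are invented; nothing
# here touches real malware or any real product.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Train a toy scorer on synthetic 2-D data (stand-in for the black box).
X = rng.normal(size=(500, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)         # pretend 1 == "bad"
blackbox = LogisticRegression().fit(X, y)

def score(sample: np.ndarray) -> float:
    """What a chatty API might return: probability of 'bad'."""
    return float(blackbox.predict_proba(sample.reshape(1, -1))[0, 1])

sample = np.array([2.0, -2.0])                  # starts out scored as "bad"
print("initial score:", round(score(sample), 3))

# Hill-climb: keep any small random tweak that lowers the returned score.
for _ in range(200):
    candidate = sample + rng.normal(scale=0.1, size=2)
    if score(candidate) < score(sample):
        sample = candidate
    if score(sample) < 0.5:
        break

print("final score:", round(score(sample), 3))
```

The design point for defenders is in the last step: the less granular the feedback a scoring endpoint returns, the less there is to climb on.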
And this is a complicated process, certainly.
It's one that requires a little bit of overhead,
a little bit of resources.
But at the end of the day, if you're an attacker
and you have one shot to get it right,
what better approach is there than to have access
to all this information offline, craft this perfect piece of malware using artificial
intelligence, and then suddenly you're through. And those are the sorts of concerns, particularly
from like a next gen AV standpoint that not only researchers need to be made aware of, but consumers and
politicians and things like that.
And I think that is a very closed example within InfoSec that could be propagated across
the physical and political security spectrum as well.
There's certainly this aspect of red teaming that needs to occur and then reporting back to policymakers
so we perform some sort of due diligence on that end. Yeah, and I think, you know, in your example,
the other thing that happens is, you know, word gets around that the fake ID is the way into that bar,
so you have fewer people trying hats and growing mustaches. Right, right. As somebody who
works on this problem, again, a lot of this stuff is very, very fascinating because it's all
conceptual right now, but it's very easy to look down the road, you know, 12 to 24 months from now
and start to understand like, yeah, there's a process that could occur. And depending on
whether or not attackers start to believe that the juice is worth the squeeze, we may start seeing
that. So the onus is on us to kind of eat our own dog food and just like red teaming any other
security tool, take the results from that and empower the product itself.
We do a lot of things where we propose adversarial training
where you're generating like all of these instances
of kind of morphed malware and then saying like,
well, just because this doesn't look like the malware
it once was, that doesn't mean that it's any less bad.
So now
let's train on that. So it's at least seen this sort of change in behavior. So we catch it the
next time. And it's all about, yeah, staying ahead, making sure model drift doesn't occur,
that the models don't become stale, just trying to stay as current and up to date as possible.
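A minimal sketch of that adversarial-training loop, in feature space with invented data rather than real binaries: morphed copies of known-bad samples keep their label and get folded back into the training set, so the model has already seen that style of drift before it shows up in the wild.

```python
# A feature-space caricature of adversarial training, on invented data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for extracted file features.
X_good = rng.normal(loc=0.0, size=(300, 8))
X_bad = rng.normal(loc=1.5, size=(300, 8))
X = np.vstack([X_good, X_bad])
y = np.array([0] * 300 + [1] * 300)
model = RandomForestClassifier(random_state=0).fit(X, y)

def morph(samples):
    """Stand-in for feature drift caused by packing or string scrubbing."""
    return samples + rng.normal(scale=1.5, size=samples.shape)

unseen_morphed = morph(X_bad)   # held out; never shown to either model
print("recall on morphed bad, before:",
      model.predict(unseen_morphed).mean().round(2))

# Adversarial training: generate morphed variants, keep the "bad" label,
# add them to the training set, and retrain.
X_aug = np.vstack([X, morph(X_bad)])
y_aug = np.concatenate([y, np.ones(300, dtype=int)])
hardened = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print("recall on morphed bad, after:",
      hardened.predict(unseen_morphed).mean().round(2))
```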
Now, looking ahead, you're looking down the road,
what were some of the conclusions from the group? Does it seem like appropriate attention is being paid to this? Is there hope? Is it gloom and doom? Where did you all land with that?
It's a little bit of both. And to be honest, this was something where researchers were split in a lot of ways about, you know,
is it all bad? Is everything fine? I think for the most part, AI is always going to get
kind of a negative connotation. And there's plenty of blame to go around for that. And maybe Hollywood is...
We've all seen the Terminator.
Exactly. But at the end of the day, AI is being used every day for
things that make our lives easier as well. Self-driving cars is obviously one, where there have been
unfortunate accidents that have occurred, and clearly we're not there yet. But once we are,
you're talking about a fantastic opportunity to help, you know, increase the efficiency of our highways and commutes and things like that.
There's also things like AI being used to identify medical problems through x-rays,
using computer vision. These are all very, very good things that are occurring. And very few
people can say that that isn't the case. But that being said, any technology like this can be abused.
It can be morphed and kind of twisted to some sort of, as I said earlier, kind of nefarious end.
And those are the things that we need to be more considerate about. And I think,
and the report goes into this as well, out of all those spaces, physical, political, and information security, we should be looking at the information security first as kind of a standard on how to handle this.
Because the information security community has been having to deal with these technologies being morphed and twisted to nefarious ends for a long time.
And yeah, we don't have it down to a science on how to handle it best,
but we have attempted to do things like disclosure, best practices,
things like that, red teaming, have all become kind of mainstays.
And yeah, it's not perfect, but it at least provides a roadmap
of what could and should be done, particularly within this kind of newfangled space of AI.
Yeah. It's an opportunity for us to lead the way.
Right. And I think a lot of the researchers that we worked with were political and physical
security researchers. And they were kind of the first ones to come to this and say, like,
well, you guys have things like vulnerability disclosure. And I'm like, yeah, yeah, we do.
And it works and it's reasonably effective. And we have things like bug bounties and
we try to open up some of our tools to the community, but we could obviously do better.
And I think things like explainability,
interpretability of what these AI models are doing in each of these fields, information security is certainly one. Like, why did you call this binary bad? I think we need to be personally accountable
from an ethics standpoint and a safety standpoint. So people start to understand why things are occurring
and why they're not.
And I think that's something that you'll see researchers
and security vendors in particular take more seriously
in the next 12 to 18 months.
I mean, I think one thing,
and this would be more of a shameless plug,
but some of your viewers will likely be attending
Black Hat and DEF CON this
year. If you're interested in the AI aspects of how it's being employed or deployed in information
security, go to those conferences and ask around. Talk to vendors. If you're at a vendor booth,
try to grab a technical person. I know we at Endgame try to supply researchers or data scientists. At DEF CON this
year, I'm part of a committee that's standing up a village specifically for artificial intelligence
just to educate, provide examples and demonstrations on how AI can be used for both good and bad.
So whether you're a practitioner, a decision maker, or just a casual observer, you can walk away with at least some understanding from the people who use it day in and day out on kind of the effect that it could have within your life.
Let me ask you this.
Years ago, Carl Sagan, the famous scientist, he had what he called his baloney detection kit, which was a way to detect if someone was trying
to fool you, trying to pull one over on you. Do you have any recommendations along those lines
for folks who are trying to cut through the marketing noise when it comes to this stuff?
And any guidance for if you really want to learn about this, but you want to not be fooled by
the marketing, what's a good approach to that?
That's actually a great question and one that probably doesn't get asked enough.
But yeah, I would never recommend walking in blind faith.
And I would imagine that most people listening to this would take that same approach.
My advice would be for any vendor that claims machine learning, AI techniques,
it really starts with data.
Try to get a better idea of where their data is coming from.
If it's a closed source and they can't talk about it,
get them to talk about the number of samples they have
or the diversity of that data.
Because bias can creep up very, very quickly in
these situations. If they only have malware and that malware is specifically Russian and Chinese,
then the first time somebody at your company downloads a piece of software with a Russian
or Chinese language pack that's completely benign, it will likely get flagged. That's just the nature of bias. And that bias exists within ourselves, and it exists within the realm of models and
machine learning. So data is a big thing. Another big thing is trying to understand
how they're training on that data, the models they're using, and then how often.
Just because of how rapid and dynamic the information security space
is, particularly from an attacker perspective, that shift in speed of attacks and discrepancy
in attacks can lead to models that aren't trained very often becoming very stale,
leading to bypass and things like that. So I would say trying to determine whether
or not the machine learning pipeline is mature in the sense that it's trained consistently on
fresh data. They're accounting for things like old data sloughing off and not being useful anymore.
Those two things are very, very important and could at least provide some sort of background
to you in feeling a little bit more confident in whether or not you believe kind of the spiel that you're being pitched.
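For listeners who want to poke at this themselves, here is a minimal sketch of those two probes on invented data. The "language pack" indicator, the feature values, and the drift are all made up; the point is only the shape of the questions to ask about a vendor's pipeline: does a benign sample get flagged just for sharing an incidental trait with the malicious training set, and does performance sag on newer samples the model never saw?

```python
# Two sanity probes, bias and staleness, sketched on invented data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_samples(n, malicious, lang_flag_rate):
    """Two real-valued features plus a 0/1 'language pack' indicator."""
    base = rng.normal(loc=1.0 if malicious else 0.0, size=(n, 2))
    lang = (rng.random(n) < lang_flag_rate).astype(float).reshape(-1, 1)
    return np.hstack([base, lang])

# Biased training set: nearly every malicious sample carries the flag.
X_train = np.vstack([make_samples(500, False, 0.05),
                     make_samples(500, True, 0.95)])
y_train = np.array([0] * 500 + [1] * 500)
model = LogisticRegression().fit(X_train, y_train)

# Bias probe: benign samples that happen to carry the same flag.
benign_flagged = make_samples(200, False, 1.0)
print("false positives on flagged-but-benign samples:",
      round(model.predict(benign_flagged).mean(), 2))

# Staleness probe: newer malicious samples whose features and language-pack
# habits have drifted; recall drops sharply without retraining on fresh data.
drifted_bad = make_samples(200, True, 0.05) + np.array([-1.0, -1.0, 0.0])
print("recall on drifted malicious samples:",
      round(model.predict(drifted_bad).mean(), 2))
```

The exact numbers will vary with the made-up data, but the two printed rates are the kind of evidence worth asking a vendor to show for their real pipeline.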
Our thanks to Bobby Filar from Endgame for joining us. The research we
discussed today is titled The Malicious Use of Artificial Intelligence, Forecasting, Prevention,
and Mitigation.
We've got a link to the research paper in the show notes of this episode.
And now, a message from Black Cloak.
Did you know the easiest way for cyber criminals to bypass your
company's defenses is by targeting your executives and their families at home? Black Cloak's award
winning digital executive protection platform secures their personal devices, home networks,
and connected lives. Because when executives are compromised at home, your company is at risk.
In fact, over one third of new members discover they've already been breached.
Protect your executives and their families 24-7, 365, with Black Cloak.
Learn more at blackcloak.io.
The Cyber Wire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe,
where they're co-building the next generation of cybersecurity teams and technologies.
Our amazing CyberWire team is Elliott Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond,
Tim Nodar, Joe Carrigan, Carole Theriault, Ben Yelin, Nick Veliky, Gina Johnson, Bennett Moe,
Chris Russell, John Petrick, Jennifer Eiben, Rick Howard, Peter Kilpe, and I'm Dave Bittner. Thanks for listening.