CyberWire Daily - Can ransomware turn machines against us? [Research Saturday]
Episode Date: February 4, 2023

Tom Bonner and Eoin Wickens from HiddenLayer's SAI Team join us to discuss their research on weaponizing machine learning models with ransomware. Researchers at HiddenLayer's SAI Team have developed a proof-of-concept attack for surreptitiously deploying malware, such as ransomware or Cobalt Strike Beacon, via machine learning models. The attack uses a technique currently undetected by many cybersecurity vendors and can serve as a launchpad for lateral movement, deployment of additional malware, or the theft of highly sensitive data. In this research, the team raises awareness by demonstrating how easily an adversary can deploy malware through a pre-trained ML model. The research can be found here: WEAPONIZING MACHINE LEARNING MODELS WITH RANSOMWARE

Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
You're listening to the Cyber Wire Network, powered by N2K.

data products platform comes in. With Domo, you can channel AI and data into innovative uses that
deliver measurable impact. Secure AI agents connect, prepare, and automate your data workflows,
helping you gain insights, receive alerts, and act with ease through guided apps tailored to
your role. Data is hard. Domo is easy. Learn more at ai.domo.com.
That's ai.domo.com.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation
with researchers and analysts
tracking down the threats and vulnerabilities,
solving some of the hard problems of protecting ourselves
in a rapidly evolving cyberspace.
Thanks for joining us.
So we've been investigating the sort of machine learning attack surface for some time now,
and we realized quite early on that there was some low-hanging fruit
in terms of being able to execute code through machine learning models.
Joining us this week are Tom Bonner and Eoin Wickens.
They're from HiddenLayer's SAI team, the Synaptic Adversarial Intelligence team.
The research we're discussing today is titled,
Weaponizing Machine Learning Models with Ransomware.
As part of that, we were looking at ways
in which malware could be deployed through models.
That's Tom Bonner.
I think it's probably good to preface it with the fact that ML is being used
in nearly every vertical these days.
That's Eoin Wickens.
There was a recent survey of CEOs,
and about 86% of them said that ML is part of their critical business function.
And with that, we were thinking, how much of an attack vector is this?
So we've been researching this now for the last six, seven, eight months.
And we've been finding that there are particular weak points within machine learning that people just haven't really considered or haven't looked at yet. Because I suppose with any new technology, it tends to race on ahead of security considerations. So with models themselves, training is a huge cost, right? I mean, financially, as well as time, as well as
training is a huge cost, right? I mean, financially, as well as time, as well as
processing capability. And to solve this, people use pre-trained models.
And pre-trained models are essentially the result of this massive computation,
and they can be shared freely and easily.
And there's actually a huge community built around the open source sharing of models,
very similar to open source software.
But with that has come a little bit of, I suppose, lax scrutiny in that models can be hijacked and code can be inserted in them.
And I suppose that's kind of where we are with the research that we put out in that we're trying to really shine the light on the fact that these models can be abused so readily and have been able to be abused for so long.
Well, Tom, let's go through this together then.
Take us through step by step.
I mean, how did you go about this exploration here?
So the first thing we looked at was a very popular machine learning library called PyTorch.
It's used for quite a lot of text generation models, image classifiers, things like that.
And under the hood, it's storing its data using a format called Pickle.
This is part of the Python library for serializing data.
Now, unfortunately, there's been a big red warning box in the Pickle documentation
for probably about the last 10 years saying,
do not use this if you do not trust the source of the data, because you can embed executable code in a pickle file.
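To make that warning concrete, here is a minimal, hypothetical sketch of the pickle behavior Tom is describing. It is not the researchers' tooling, and the "payload" is just a harmless echo command standing in for malware.

import os
import pickle

class Payload:
    def __reduce__(self):
        # Whatever __reduce__ returns is invoked during unpickling, so the
        # command below runs the moment anyone loads this pickle.
        return (os.system, ("echo code executed during unpickling",))

blob = pickle.dumps(Payload())
pickle.loads(blob)   # deserialization alone is enough to trigger execution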
Now, I think we'd known about this for quite some time, but we just sort of wanted to take it to its logical extreme, if you will.
So we looked at ways of abusing the pickle file format to execute arbitrary code. And also, we looked at ways in
which we could then embed and hide the malware in a model as well. So we ended up using an old technique called steganography, which in its original form was for embedding secret messages within other, innocuous-looking messages. Now, in this case, we actually targeted what are called the
weights and biases in a model, so perhaps more colloquially known as neurons. And by targeting
the neurons in the model, we were able to change them very slightly
and embed a malicious payload in there in such a way that wouldn't really affect the efficacy
of the machine learning model at all. And then using Pickle to execute arbitrary code,
when the model is loaded, we can reconstruct the malware payload from the neurons and execute it.
So what this means is that the model itself, it looks as normal, really.
It loads and runs as normal.
But when it is loaded up on a data scientist system or up in the cloud, wherever you're deploying this pre-trained model,
it's going to automatically execute malware upon load.
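A simplified illustration of that idea follows, assuming NumPy and float32 weights; it is not HiddenLayer's actual technique, and the payload string is a stand-in. One payload byte is hidden in the low eight mantissa bits of each weight, which changes the stored values only negligibly, and can be read back out at load time.

import numpy as np

def embed(weights: np.ndarray, payload: bytes) -> np.ndarray:
    # Hide one payload byte in the low 8 mantissa bits of each float32 weight.
    flat = weights.astype(np.float32).ravel().copy()
    if len(payload) > flat.size:
        raise ValueError("tensor too small for payload")
    bits = flat.view(np.uint32)
    data = np.frombuffer(payload, dtype=np.uint8).astype(np.uint32)
    bits[:data.size] = (bits[:data.size] & 0xFFFFFF00) | data
    return bits.view(np.float32).reshape(weights.shape)

def extract(weights: np.ndarray, length: int) -> bytes:
    # Read the hidden bytes back out of the low 8 bits of each weight.
    bits = weights.astype(np.float32).ravel().view(np.uint32)
    return bytes((bits[:length] & 0xFF).astype(np.uint8).tolist())

w = np.random.randn(4096).astype(np.float32)
hidden = embed(w, b"stand-in for a real payload")
print(extract(hidden, 27))        # b'stand-in for a real payload'
print(np.abs(hidden - w).max())   # the perturbation per weight stays tiny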
Now, what is the normal amount of scrutiny that a model like this would get
from a security point of view?
If someone is using this, is deploying it,
to what degree do they trust it out of the box?
That's a very good question.
And really the sort of crux of the problem is that
most security software is not really looking too deeply into machine learning models these days.
There are a lot of what are called model zoos, which are online repositories where people can
share their pre-trained models, places like Hugging Face or TensorFlow Hub. And I think data scientists are quite used to just downloading a
model, loading it up on their machine, loading it up on a sort of cloud or AWS instance,
without really doing any sort of security checks to see if it's been tampered with or
subverted in a malicious manner. So yeah, really, this is why we took things to such an extreme
was to highlight that malicious code can quite easily be embedded in these things
and automatically executed when you load them.
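That trusting workflow is essentially the snippet below, a sketch assuming the PyTorch defaults of the time and a placeholder file name: simply calling torch.load on an untrusted checkpoint is the point at which anything hidden inside it executes.

import torch

# torch.load unpickles the checkpoint, so any code an attacker planted in it
# runs right here, before the weights are ever used for inference.
state_dict = torch.load("downloaded_model.pt")

# Newer PyTorch releases accept torch.load(path, weights_only=True), which
# restricts unpickling to plain tensor data and blocks this class of payload.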
And now a message from our sponsor, Zscaler, the leader in cloud security.
Enterprises have spent billions of dollars on firewalls and VPNs,
yet breaches continue to rise. With an 18% year-over-year increase in ransomware attacks and a $75 million record payout in 2024, these traditional security tools expand your attack surface with public-facing IPs that are exploited by bad actors more easily
than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops
attackers by hiding your attack surface, making apps and IPs invisible,
eliminating lateral movement,
connecting users only to specific
apps, not the entire network,
continuously verifying
every request based on identity
and context, simplifying
security management with AI-powered
automation, and detecting
threats using AI to analyze
over 500 billion daily transactions.
Hackers can't attack what they can't see. Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com slash security.
Oh, and we've seen this sort of thing on GitHub as a supply chain issue where, you know, somebody can have a repository there, something gets changed, people are relying on it, and the malicious change makes its way into people's production pipeline. Is this the same sort of thing you're
imagining here, where someone would surreptitiously insert something into one of these models and
it goes undetected? Yeah, absolutely. I think it's very similar to your traditional supply
chain attack. And I suppose the limitations of such an attack are really up to the imagination of the attacker.
This can spread out to be an initial access point.
It could spread to be a source of lateral movement.
You could deploy other malware, have a remote backdoor for repeat access into the environment.
And I think what makes these attacks, not more dangerous than traditional attacks, but just as dangerous, is that often with models,
you'll have a lot of access to training data.
And that training data may contain
personally identifiable information.
Or they'll also have access to the model binaries themselves
or other models that have been trained
within the environment.
And in that instance, if you've been training a model for the last couple of months and
that's had a lot of your sensitive data go into it, if that gets stolen, that could be
a huge financial cost, as well as, I suppose, quite a large loss of your advantage against other companies if that's taken.
And obviously, there's the potential for things to be ransomed back as well.
And, you know, basically following the kind of more traditional cybersecurity attack format that we've seen in the past.
And I would just add to that as well, that there's very little in terms of sort of integrity checking or signing around models.
So yeah, from a supply chain perspective,
it's pretty scary.
It would be very easy for an attacker
to subvert a model
and a reputable vendor
could end up distributing it downstream
to their clients
and nobody would really be able
to know or tell at the moment.
And what degree of technical proficiency or sophistication would be required to have the
skills to be able to do something like what you all have outlined here in your research?
It's actually quite low. So the technical skills required, I would say right now,
this is pretty much in the domain of script kiddies to be able to pull off.
We released some tooling to do this, but we're by no means the first.
There are others who've released tools for targeting the pickle file format,
for performing steganography on neural networks and ML models.
So really, it's just a case of stringing together
the right commands these days
and inserting your malicious payload.
It's not an awful lot of skill for an attacker.
And Owen, in terms of detecting this,
are there tools that will do this
or techniques that you all can recommend?
There is.
There's been research into securing the pickle
file format in the past because it's inherently vulnerable.
One of those is Fickling by Trail of Bits. They've put in
very good work really into detecting abuse of pickle
files. But there's also a whole host of other ways that pickles
can be abused as
well. And that can potentially be another major pitfall. So I suppose it would be silly of me to pass up the opportunity to say that that's something we do look at inside HiddenLayer: a way of scanning models and verifying the integrity of them to ensure that they're not housing malicious payloads and such.
But other than that, it's not been extensively explored
within the industry as far as automated ways go
outside of tools such as Fickling.
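As a rough sketch of what that kind of static scanning can look like, and not Fickling's or HiddenLayer's actual implementation, the standard pickletools module can walk a pickle's opcodes and flag imports that have no business in a serialized model:

import pickletools

SUSPICIOUS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"), ("builtins", "exec"), ("builtins", "eval"),
}

def scan_pickle(data: bytes):
    findings, strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # GLOBAL carries "module name" as a single space-separated string.
            module, name = arg.split(" ", 1)
            if (module, name) in SUSPICIOUS:
                findings.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # STACK_GLOBAL takes its module and name from recently pushed
            # strings; checking the last two seen is a crude heuristic.
            if tuple(strings[-2:]) in SUSPICIOUS:
                findings.append(tuple(strings[-2:]))
        elif opcode.name in ("UNICODE", "BINUNICODE", "SHORT_BINUNICODE"):
            strings.append(arg)
    return findings

For a PyTorch checkpoint, which is a ZIP archive, you would first extract the embedded data.pkl before scanning it, and a production scanner would cover far more opcodes and callables than this sketch does.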
Tom, are you aware of this sort of thing being exploited in the wild?
Have we seen any examples of this?
We're just starting to uncover, yeah,
sort of in the wild attacks using these techniques.
Just recently, we've started to see common tools, things like Cobalt Strike, Metasploit,
leveraging pickle file formats to execute code.
Again, going back to the fact that a lot of antivirus and EDR solutions aren't really monitoring pickle, Python, and things like that very closely. We've seen a new framework recently as well, called Mythic, that allows you to craft pickle payloads that will automatically execute, say, shellcode or a known binary. And from there, you can load up a C2
or some sort of initial access or initial compromise malware.
So what are your recommendations then?
I mean, for folks who may be concerned about this,
what sort of things can they put in place to protect themselves?
Well, first and foremost, do not load untrusted models. In fact,
don't really load any machine learning models you've downloaded from the internet on your
corporate machine or in your very expensive cloud environment where it could potentially be hijacked
for coin mining or things like that. Aside from that, careful scrutiny of models: scanning the models for malware, for payloads, and evaluating the behavior of models as well. So we can use sandboxes, for example, to check the behavior of a model when it's loaded and make sure it's not doing things like spawning up cmd.exe to create a reverse shell.
And also for suppliers of models, looking into signing models so that we can verify the integrity
and even ensure they're not corrupt in any way.
We're sort of lacking basic mechanisms like that for models at the moment.
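A minimal sketch of that last point, assuming the supplier publishes a SHA-256 digest over a trusted channel (the expected value and file name below are placeholders): refuse to load any artifact whose digest does not match.

import hashlib

EXPECTED_SHA256 = "<digest published by the model supplier>"  # placeholder

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of("downloaded_model.pt") != EXPECTED_SHA256:
    raise RuntimeError("model failed integrity check; refusing to load it")

A hash only confirms the file matches what the supplier published; proper signing, as Tom notes, would also tie that artifact to the supplier's identity.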
Eoin, any final thoughts?
It's probably also worth mentioning that we did also release a YARA rule
for public consumption to detect a lot of different types
of malicious pickle files.
So that is something that we tried to provide people with today
so that they can look and scan their models.
Tom also touched on a really interesting point there,
the use of coin miners within production cloud computing environments.
I mean, if there's one thing those have access to,
it's vast amounts of GPU computational power.
And you can imagine with a lot of traditional attacks,
you'd see coin miners accidentally ending up as an initial stage
in, I suppose, victim environments. And you can imagine now
if they so happen to get into
a massive SageMaker instance or something like that
how much illicit
fortune could be made.
Our thanks to Tom Bonner and Eoin Wickens from HiddenLayer for joining us today.
The research is titled, Weaponizing Machine Learning Models with Ransomware.
We'll have a link in the show notes. Thank you.

ThreatLocker is a full suite of solutions designed to give you total control, stopping unauthorized applications, securing sensitive data, and ensuring your organization runs smoothly and securely.
Visit ThreatLocker.com today to see how a default-deny approach can keep your company safe and compliant.

The Cyber Wire Research Saturday podcast
is a production of N2K Networks,
proudly produced in Maryland
out of the startup studios of DataTribe,
where they're co-building the next generation
of cybersecurity teams and technologies. This episode was produced by Liz Urban and senior producer Jennifer Iben. Thanks for listening.