CyberWire Daily - Can ransomware turn machines against us? [Research Saturday]

Episode Date: February 4, 2023

Tom Bonner and Eoin Wickens from HiddenLayer's SAI Team join us to discuss their research on weaponizing machine learning models with ransomware. Researchers at HiddenLayer's SAI Team have developed a proof-of-concept attack for surreptitiously deploying malware, such as ransomware or Cobalt Strike Beacon, via machine learning models. The attack uses a technique currently undetected by many cybersecurity vendors and can serve as a launchpad for lateral movement, deployment of additional malware, or the theft of highly sensitive data. In this research, the team raises awareness by demonstrating how easily an adversary can deploy malware through a pre-trained ML model. The research can be found here: WEAPONIZING MACHINE LEARNING MODELS WITH RANSOMWARE. Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. data products platform comes in. With Domo, you can channel AI and data into innovative uses that deliver measurable impact. Secure AI agents connect, prepare, and automate your data workflows, helping you gain insights, receive alerts, and act with ease through guided apps tailored to your role. Data is hard. Domo is easy. Learn more at ai.domo.com. That's ai.domo.com. Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts
Starting point is 00:01:07 tracking down the threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. So we've been investigating the sort of machine learning attack surface for some time now, and we realized quite early on that there was some low-hanging fruit in terms of being able to execute code through machine learning models. Joining us this week are Tom Bonner and Eoin Wickens.
Starting point is 00:01:43 They're from Hidden Layer's SIA team, the Synaptic Adversarial Intelligence team. The research we're discussing today is titled, Weaponizing Machine Learning Models with Ransomware. As part of that, we were looking at ways in which malware could be deployed through models. That's Tom Bonner. I think it's probably good to preface it with the fact that ML is being used in nearly every vertical these days.
Starting point is 00:02:19 That's Owen Wickens. There was a recent survey that surveyed CEOs, and they said about 86% of them had said that ML is part of their critical business function. And with that, we were thinking, how much of an attack vector is this? So we've been researching this now for the last six, seven, eight months. And we've been finding that there's particular weak points within machine learning that people just haven't really considered or haven't looked at yet. Because I suppose with any new technology, it tends to race on ahead of security consideration. So with models themselves, training is a huge cost, right? I mean, financially, as well as time, as well as
Starting point is 00:03:00 processing capability. And to solve this, people use pre-trained models. And pre-trained models are essentially the result of this massive computation, and they can be shared freely and easily. And there's actually a huge community built around the open source sharing of models, very similar to open source software. But with that has come a little bit of, I suppose, lax scrutiny in that models can be hijacked and code can be inserted in them. And I suppose that's kind of where we are with the research that we put out in that we're trying to really shine the light on the fact that these models can be abused so readily and have been able to be abused for so long. Well, Tom, let's go through this together then.
Starting point is 00:03:47 Take us through step by step. I mean, how did you go about this exploration here? So the first thing we looked at was a very popular machine learning library called PyTorch. It's used for quite a lot of text generation models, image classifiers, things like that. And under the hood, it's storing its data using a format called Pickle. This is part of the Python library for serializing data. Now, unfortunately, there's been a big red warning box in the Pickle documentation for probably about the last 10 years saying,
Starting point is 00:04:27 do not use this if you do not trust the source of the data, because you can embed executable code in a pickle file. Now, I think we'd known about this for quite some time, but we just sort of wanted to take it to its logical extreme, if you will. we just sort of wanted to take it to its logical extreme, if you will. So we looked at ways of abusing the pickle file format to execute arbitrary code. And also, we looked at ways in which we could then embed and hide the malware in a model as well. So we ended up using an old technique called steganography for embedding secret messages into other kind of plain text messages would be the original form. Now, in this case, we actually targeted what are called the weights and biases in a model, so perhaps more colloquially known as neurons. And by targeting the neurons in the model, we were able to change them very slightly
Starting point is 00:05:26 and embed a malicious payload in there in such a way that wouldn't really affect the efficacy of the machine learning model at all. And then using Pickle to execute arbitrary code, when the model is loaded, we can reconstruct the malware payload from the neurons and execute it. can reconstruct the malware payload from the neurons and execute it. So what this means is that the model itself, it looks as normal, really. It loads and runs as normal. But when it is loaded up on a data scientist system or up in the cloud, wherever you're deploying this pre-trained model, it's going to automatically execute malware upon load.
Starting point is 00:06:08 Now, what is the normal amount of scrutiny that a model like this would get from a security point of view? If someone is using this, is deploying it, to what degree do they trust it out of the box? That's a very good question. And really the sort of crux of the problem is that most security software is not really looking too deeply into machine learning models these days. There are a lot of what are called model zoos, which are online repositories where people can
Starting point is 00:06:37 share their pre-trained models, places like Hugging Face or TensorFlow Hub. And I think data scientists are quite used to just downloading a model, loading up on their machine, loading it up on a sort of cloud or AWS instance, without really doing any sort of security checks to see if it's been tampered with or subverted in a malicious manner. So yeah, really, this is why we took things to such an extreme was to highlight that malicious code can quite easily be embedded in these things and automatically executed when you load them. And now a message from our sponsor, Zscaler, the leader in cloud security. Enterprises have spent billions of dollars on firewalls and VPNs,
Starting point is 00:07:41 yet breaches continue to rise by an 18% year-over-year increase in ransomware attacks and a $75 million record payout in 2024. increase in ransomware attacks and a $75 million record payout in 2024, these traditional security tools expand your attack surface with public-facing IPs that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral movement, connecting users only to specific apps, not the entire network, continuously verifying
Starting point is 00:08:13 every request based on identity and context, simplifying security management with AI-powered automation, and detecting threats using AI to analyze over 500 billion daily transactions. Hackers can't attack what they can't see. Protect your organization with Zscaler Zero Trust and AI. Learn more at zscaler.com slash security.
Starting point is 00:09:04 Oh, and we've seen this sort of thing on GitHub as a supply chain issue where, you know, somebody can have a repository there, something gets changed, people are relying on it, and the change, the malicious change makes it sway into people's production pipeline. Is this the same sort of thing you're imagining here, where someone would surreptitiously insert something into one of these models and it goes undetected? Yeah, absolutely. I think it's very similar to your traditional supply chain attack. And I suppose the limitations of such an attack are really up to the imagination of the attacker. This can spread out to be an initial access point. It could spread to be a source of lateral movement. You could deploy other malware, have a remote backdoor for repeat access into the environment. And I think what makes these attacks kind of a little bit, not more dangerous than traditional attacks,
Starting point is 00:09:47 but just as, is that often with models, you'll have a lot of access to training data. And that training data may contain personal identifiable information. Or they'll also have access to the model binary themselves or other models that have been trained within the environment. And in that instance, if you've been training a model for the last couple of months and
Starting point is 00:10:11 that's had a lot of your sensitive data go into it, if that gets stolen, that could be a huge financial cost as well as quite a large, I suppose, losing your advantage really against other companies if that's taken. And obviously, there's the potential for things to be ransomed back as well. And, you know, basically following the kind of more traditional cybersecurity attack format that we've seen in the past. And I would just add to that as well, that there's very little in terms of sort of integrity checking or signing around models. So yeah, from a supply chain perspective, it's pretty scary.
Starting point is 00:10:50 It would be very easy for an attacker to subvert a model and a reputable vendor could end up distributing it downstream to their clients and nobody would really be able to know or tell at the moment. And what degree of technical proficiency or sophistication would be required to have the
Starting point is 00:11:12 skills to be able to do something like what you all have outlined here in your research? It's actually quite low. So the technical skills required, I would say right now, this is pretty much in the domain of script kit is to be able to pull off. We released some tooling to do this, but we're by no means the first. There are others who've released tools for targeting the pickle file format, for performing steganography on neural networks and ML models. So really, it's just a case of stringing together the right commands these days
Starting point is 00:11:48 and inserting your malicious payload. It's not an awful lot of skill for an attacker. And Owen, in terms of detecting this, are there tools that will do this or techniques that you all can recommend? There is. There's been research into securing the pickle file format in the past because it's inherently vulnerable.
Starting point is 00:12:11 One of those is fickling by trail of bits. They've put in very good work really into detecting abuse of pickle files. But there's also a whole host of other ways that pickles can be abused as well. And that can potentially be another major pitfall. So I suppose it would be silly of me to pass up the opportunity to say that that's something that we do look at inside Hidden Layer is a way of scanning models and verifying integrity of them to ensure that they're not housing malicious payloads and such. But other than that, it's not been extremely explored
Starting point is 00:12:52 within the industry as far as automated ways go outside of tools such as Fickling. Tom, are you aware of this sort of thing being exploited in the wild? Have we seen any examples of this? of this sort of thing being exploited in the wild? Have we seen any examples of this? We're just starting to uncover, yeah, sort of in the wild attacks using these techniques.
Starting point is 00:13:17 Just recently, we've started to see common tools, things like Cobot Strike, Metasploit, leveraging pickle file formats to execute code. Again, going back to the fact that a lot of antivirus and edr solutions aren't really monitoring pickle python and things like that very closely we've seen a new framework recently as well called mythic and that allows to craft pickle payloads that will automatically execute say shell code or a known binary. And from there, you can load up a C2 or some sort of initial access or initial compromise malware. So what are your recommendations then?
Starting point is 00:13:55 I mean, for folks who may be concerned about this, what sort of things can they put in place to protect themselves? Well, first and foremost, do not load untrusted. In fact, don't really load any machine learning models you've downloaded from the internet on your corporate machine or in your very expensive cloud environment where it could potentially be hijacked for coin mining or things like that. Aside from that, careful scrutiny of models is scanning the models for for malware for payloads evaluating sort of the the behavior of models as well so we can use sandboxes for example to to check the the behavior of a model when it's loaded and make sure it's not doing
Starting point is 00:14:41 things like spawning up cmd.exe to create a reverse shell. And also for suppliers of models, looking into signing models so that we can verify the integrity and even ensure they're not corrupt in any way. We're sort of lacking basic mechanisms like that for models at the moment. Owen, any final thoughts? Probably is also worth mentioning that we did also release a Yara rule for public consumption to detect a lot of different types of malicious pickle files.
Starting point is 00:15:12 So that is something that we tried to provide people with today so that they can look and scan their models. Tom also touched on a really interesting point there, the use of coin miners within production cloud computing environments. I mean, if there's one thing those have access to, it's vast amounts of GPU computational power. And you can imagine with a lot of traditional attacks, you'd see coin miners accidentally ending up as an initial stage
Starting point is 00:15:42 in, I suppose, in victim victim environments and you can imagine now if they so happen to get into a massive SageMaker instance or something like that how much illicit fortune could be made. Our thanks to Tom Bonner and Owen Wickens from Hidden Layer for joining us today. The research is titled, Weaponizing Machine Learning Models with Ransomware. We'll have a link in the show notes. Thank you. ThreatLocker is a full suite of solutions designed to give you total control, stopping unauthorized applications, securing sensitive data, and ensuring your organization runs smoothly and securely.
Starting point is 00:16:52 Visit ThreatLocker.com today to see how a default-deny approach can keep your company safe and compliant. The Cyber Wire Research Saturday podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. This episode was produced by Liz Urban and senior producer Jennifer Iben. Thanks for listening.
