CyberWire Daily - Package manager repository malware detection. [Research Saturday]
Episode Date: August 3, 2019
Researchers at Reversing Labs have been tracking malware hidden in software package manager repositories, and its use as a supply chain attack vector. Robert Perica is a principal engineer at Reversing Labs, and he joins us to share their findings. The research can be found here: https://blog.reversinglabs.com/blog/suppy-chain-malware-detecting-malware-in-package-manager-repositories
Transcript
You're listening to the Cyber Wire Network, powered by N2K.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation with researchers and
analysts tracking down threats and vulnerabilities and solving some of the hard problems of
protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
So the idea behind supply chain attacks is that an attacker abuses a typical deployment vector,
such as an update mechanism, a third-party software download,
or perhaps infecting a package repository in the hope that an unsuspecting developer
might install a particular component and thereby infect his or her own machine.
That's Robert Perica. He's a threat analyst and reverse engineer at Reversing Labs.
The research we're discussing today is titled Supply Chain Malware,
Detecting Malware in Package Manager Repositories.
So in this way, these components are really widespread, as they affect a multitude of users.
So what are some of the popular places that are repositories for these sorts of things?
Package repositories typically imply PyPI, RubyGems, NPM, and so on and so on.
But supply chain attacks are not limited to package repositories. They can also affect, as in the CCleaner case, a popular third-party software distribution repository.
And so the notion here is that rather than creating software from scratch,
folks can go use these components, these building blocks, and plug them into their own projects.
Right.
So let's go through the work that you did here. Walk us through the analysis that you performed.
With supply chain attacks becoming more and more popular,
we were interested in how hard it would be to find an example of such an attack
that was still in the wild.
Since there are several types of supply chain attacks,
we opted to survey package repositories first.
So first in line for review was PyPI,
and we modeled a couple of YARA rules
based on publicly available reports and previous incidents, and then ran the entire PyPI repository
through our Titanium Platform processing engine to evaluate the rules. In the end, a couple of
packages stood out, and after manual review, we confirmed that they were related to previous
PyPI incidents but had not been removed in the cleanup action.
Now, one of the things you pointed out here is that typosquatting is a common
tactic?
Yes, and package repositories can get infected in a couple of ways. One of them
is for malicious actors to add additional code to known, widely used packages. But this
is hard to achieve because popular packages usually go
through an extensive review process, for example, on GitHub through pull requests and so on.
However, in package repositories such as PyPI, people can upload or submit their own packages
with typosquatted names without any review process. So, for example, you publish a package named djanga instead of django, and you rely on the unsuspecting user who will mistype the name and install the malicious package.
And what did you see in terms of the frequency of people falling for this?
It's a pretty common tactic. It's an extremely common tactic when it comes to URL typosquatting,
like redirecting to malicious URLs
and so on. But people mistype all the time. For example, they try to install all the packages from the
requirements.txt file through pip, but they forget to include the .txt extension.
Then you are actually telling pip to install the requirements package. And if a requirements
package has a malicious component within it, it will get installed. So yes, I'd say that this would be a common vector.
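The typosquatting idea described above can be illustrated with a small name check. This is only a sketch under stated assumptions: the `KNOWN_PACKAGES` set, the `likely_typosquat` helper, and the 0.8 similarity cutoff are hypothetical choices made for illustration, not anything described in the research. It uses Python's standard `difflib` module to flag a requested name that is close to, but not exactly, a package the developer actually meant.

```python
import difflib

# Hypothetical list of package names a team really depends on.
KNOWN_PACKAGES = {"django", "requests", "numpy", "flask"}


def likely_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None.

    An exact match is treated as the intended package. A near miss is
    reported so a human can double-check before installing.
    """
    candidate = name.lower()
    if candidate in known:
        return None
    close = difflib.get_close_matches(candidate, sorted(known), n=1, cutoff=cutoff)
    return close[0] if close else None
```

A check like this could run before an install command is executed. It would not catch the requirements case mentioned above, because that name is not a misspelling of another package; that kind of mistake needs an explicit list of names to refuse.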
And so for the person who accidentally downloads the misspelled version of this,
what can they expect to happen?
In this case, malware gets downloaded and installed.
But since it's not invoked from the setup.py script during the pip installation, it won't get executed out of the box.
Though it can be run as an executable file or by importing the malicious module and invoking
the malicious function, the malicious package will not run by itself.
The function itself contains an IP address, which has been offline for quite some time,
and from that IP address, the malicious
function downloads the second stage, persists it as a hidden file, and modifies the .bashrc file so that it is
executed every time a terminal or shell is opened. And that's basically it. We don't have any information
about what the second stage actually is or how widespread it really is.
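The persistence behavior described above (a hidden file plus an added line in the shell startup file) suggests a simple check a developer could run on their own machine. The sketch below is an assumption-laden illustration, not the researchers' tooling: the two regular expressions and the `suspicious_startup_lines` helper are hypothetical heuristics, and they will also flag legitimate lines such as sourcing a dotfile.

```python
import re

# Illustrative heuristics only; real persistence can look very different.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HIDDEN_PATH_PATTERN = re.compile(r"(?:~|\$HOME|/home/[^/\s]+)/\.[\w.-]+")


def suspicious_startup_lines(text):
    """Return (line_number, line, reasons) for lines worth a manual look."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # ignore blank lines and comments
        reasons = []
        if IP_PATTERN.search(stripped):
            reasons.append("hardcoded IP address")
        if HIDDEN_PATH_PATTERN.search(stripped):
            reasons.append("references a hidden file")
        if reasons:
            findings.append((number, stripped, reasons))
    return findings
```

To use it, pass in the contents of a startup file, for example the text read from `~/.bashrc`, and review whatever comes back by hand.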
Can you walk us through it? What was the process like when you ran this through your own engine to do the analysis?
What was going on there?
We modeled our detection rules based on previous incidents.
And we focused on the entire PyPI package repository.
The data set contained around 1.6 million files.
That amounts to around 2.6 terabytes of different files. And we essentially
just plugged in those YARA rules into our engine and ran the entire sample set. The entire run
took a bit less than a day. And at the end of the day, we had a bunch of matches on different
rules that we plugged in. And then we essentially manually reviewed them and found the offending sample.
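The workflow described above (write rules from past incidents, run them over every file, then manually review the matches) can be sketched in a few lines. To be clear about the substitution: this does not use YARA or the engine mentioned in the conversation. It swaps in plain regular expressions over file bytes, and the rule names and patterns in `RULES` are hypothetical examples loosely inspired by the behavior discussed here.

```python
import re
from pathlib import Path

# Hypothetical stand-ins for YARA rules:
# rule name -> byte patterns that must ALL be present for the rule to match.
RULES = {
    "downloads_from_raw_ip": [re.compile(rb"https?://(?:\d{1,3}\.){3}\d{1,3}")],
    "touches_bashrc": [re.compile(rb"\.bashrc")],
}


def scan_bytes(data, rules=RULES):
    """Return the sorted names of all rules that match the given bytes."""
    return sorted(
        name
        for name, patterns in rules.items()
        if all(pattern.search(data) for pattern in patterns)
    )


def scan_tree(root, rules=RULES):
    """Scan every file under `root`; return {path: [matching rule names]}."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            matched = scan_bytes(path.read_bytes(), rules)
            if matched:
                hits[str(path)] = matched
    return hits
```

The output of `scan_tree` is only a list of candidates. As in the process described above, each match still needs a manual review, since patterns this simple will also hit harmless files.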
So in terms of a percentage of what you found here, is this a relatively infrequent occurrence?
Yes, this is an infrequent occurrence.
This is not something that is commonly done, because of how hard it is to achieve something like this.
Although this script is extremely simple, and I expect to see much more of such attacks in the
future. When you say that the script is simple, what script are you referring to?
I'm referring to the actual malicious components being dropped. So the setup.py script with
malicious communication and persistence.
So in terms of this being discovered, as you mentioned, there's no real mechanism for things like this to be scanned when these projects are uploaded. So it's really up to folks like you and other people to report them.
I suspect that on the package repository side, perhaps some kind of a review process might be implemented.
Although, given the size of the entire repository and the fact that many of the people working there are volunteers, I doubt that that will happen at great scale.
One of the ways they can process such an amount of files is to buy a platform like the one we have and continuously process all the files.
And of course, on the developer side, you actually have to check what you're downloading,
what you're installing, and so on.
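One way a developer might check what they are installing is to compare a requirements file against a list of names the team has already reviewed. The following is a minimal sketch under assumptions: the `APPROVED` set and the `unapproved_requirements` helper are hypothetical, and the parsing is deliberately simplified (it does not normalize hyphens and underscores or handle URL-style requirements).

```python
import re

# Hypothetical list of package names that have been reviewed and approved.
APPROVED = {"django", "requests"}


def unapproved_requirements(requirements_text, approved=APPROVED):
    """Return requirement names that are not on the approved list."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or -e
        # Keep only the name: cut at version specifiers, extras, or markers.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0].lower()
        if name not in approved:
            flagged.append(name)
    return flagged
```

Anything the function returns would be a prompt for a person to look at the package before it is installed, not proof that the package is malicious.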
So in terms of best ways for folks to protect themselves against this sort of thing,
what do you recommend?
When you're installing new packages, you could be on the lookout for suspicious network connections
and transfers not initiated by pip itself.
You can also be careful about what you type and how you type it.
It would be great if there was a way for public repositories to enforce some kind of content checks, like the continuous processing efforts.
However, that's probably not applicable. And on the developer side, for example, in large software companies and so on,
some kind of approval of used modules would be nice. We haven't covered, for example, what other types of files we found in the entire sample set. So one would expect, for example,
that a Python package repository contains mostly Python files and perhaps text files.
However, we found a couple of executable files for Windows, Linux, macOS, and so on.
And we didn't expect to find such things there.
One example is a package that can be used to compare files and see the differences between them. And as a testing sample set, it includes a variety of executable and non-executable file formats.
So our engine, when it scanned all those files, it identified them.
And we found, like I said, a bunch of executable files, even additional archives, document files, and so on.
So what's going on there?
Is it hiding a different type of file from what people are expecting to see to try to throw people off the trail?
This isn't related to malicious packages we found.
This is related to the entire package repository.
One of the packages, which was scanned but was not malicious, was this file compare package,
which included as its own test data set a large number of executable files for Windows, for Linux, for OS X, and so on,
to be used to do a sanity check if the comparisons work as they should.
We didn't expect to find executable content apart from Python scripts in PyPI.
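Identifying executable content inside a package, as discussed above, usually starts with a file's leading "magic" bytes rather than its name or extension. The sketch below shows that idea using well-known signatures for ELF, PE, Mach-O, and ZIP files. It is an illustration only: the `identify` helper and the short signature table are assumptions for this example and are far less thorough than a real file identification engine.

```python
# Well-known leading byte signatures, checked in order.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable (Linux)"),
    (b"MZ", "PE/DOS executable (Windows)"),
    (b"\xfe\xed\xfa\xce", "Mach-O executable (macOS)"),
    (b"\xfe\xed\xfa\xcf", "Mach-O executable (macOS)"),
    (b"\xce\xfa\xed\xfe", "Mach-O executable (macOS)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O executable (macOS)"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify(data):
    """Return a label for the file type based on its first bytes, or None."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return None
```

Only the first few bytes of each file are needed, so a caller could read a small prefix instead of the whole file. Note that wheel and egg packages are themselves ZIP archives, so a ZIP match on its own is expected and not suspicious.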
Our thanks to Robert Perica from Reversing Labs for joining us.
The research is titled Supply Chain Malware,
Detecting Malware in Package Manager Repositories.
We'll have a link in the show notes.
The Cyber Wire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe,
where they're co-building the next generation of cybersecurity teams and technologies.
Our amazing Cyber Wire team is Elliott Peltzman,
Puru Prakash, Stefan Vaziri, Kelsey Bond,
Tim Nodar, Joe Carrigan, Carole Theriault, Ben Yelin,
Nick Veliky, Gina Johnson, Bennett Moe, Chris Russell, John Petrik, Jennifer Eiben, Rick Howard, Peter Kilpe,
and I'm Dave Bittner. Thanks for listening.