CyberWire Daily - Package manager repository malware detection. [Research Saturday]

Episode Date: August 3, 2019

Researchers at Reversing Labs have been tracking malware hidden in software package manager repositories, and its use as a supply chain attack vector. Robert Perica is a principal engineer at Reversing Labs, and he joins us to share their findings. The research can be found here: https://blog.reversinglabs.com/blog/suppy-chain-malware-detecting-malware-in-package-manager-repositories

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of
Starting point is 00:01:10 protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Starting point is 00:02:33 So the idea behind supply chain attacks is that an attacker abuses a typical deployment vector, such as an update mechanism, a third-party software download, or perhaps infecting a package repository in the hope that an unsuspecting developer might install a particular component and thereby infect his or her own machine. That's Robert Perica. He's a threat analyst and reverse engineer at Reversing Labs. The research we're discussing today is titled Supply Chain Malware, Detecting Malware in Package Manager Repositories.
Starting point is 00:03:18 So in this way, these components are really widespread, as they affect a multitude of users. So what are some of the popular places that are repositories for these sorts of things? Package repositories typically imply PyPI, RubyGems, NPM, and so on. But supply chain attacks are not related only to package repositories. They can affect, for example, as in the CCleaner case, a popular third-party software distribution repository. And so the notion here is that rather than creating software from scratch, folks can go use these components, these building blocks, and plug them into their own projects. Right.
Starting point is 00:04:00 So let's go through the work that you did here. Walk us through the analysis that you performed. With supply chain attacks becoming more and more popular, we were interested in how hard it would be to find an example of such an attack that was still in the wild. Since there are several types of supply chain attacks, we opted to survey package repositories first. So first in line for review was PyPI, and we modeled a couple of YARA rules
Starting point is 00:04:26 based on publicly available reports and previous incidents, and then ran the entire PyPI repository through our Titanium Platform processing engine to evaluate the rules. In the end, a couple of packages stood out, and after manual review, we confirmed that they were related to previous PyPI incidents but had not been removed in the cleanup action. Now, one of the things you pointed out here is that typosquatting is a common tactic? Yes, and package repositories can get infected in a couple of ways. One of them is for malicious actors to add additional code to known, widely used packages. But this
Starting point is 00:05:02 is hard to achieve because popular packages usually go through an extensive review process, for example, on GitHub through pull requests and so on. However, in package repositories such as PyPI, people can upload or submit their own packages with typosquatted names without any review process. So, for example, you type djanga instead of django, and you get a mistype, a typosquat, and you rely on the unsuspecting user who will mistype the name and install the malicious package. And what did you see in terms of the frequency of people falling for this? It's a pretty common tactic. It's an extremely common tactic when it comes to URL typosquatting, like redirecting to malicious URLs
Starting point is 00:05:45 and so on. But people mistype all the time. For example, they try to install all the packages from the requirements.txt file through pip, but they forget to include the .txt extension. And then you are actually saying, okay, pip, install the requirements package. And if a requirements package has a malicious component within it, it will get installed. So yes, I'd say that this would be a common vector. And so for the person who accidentally downloads the misspelled version of this, what can they expect to happen? In this case, malware gets downloaded and installed. But since it's not invoked from the setup.py script during the pip installation, it won't get executed out of the box. Though it can be run as an executable file or by importing the malicious module and invoking
Starting point is 00:06:35 the malicious function, the malicious package will not run by itself. The function itself contains an IP address, which has been offline for quite some time, and from that IP address, the malicious function downloads the second stage, persists it as a hidden file, and modifies the .bashrc file so that it is executed on every terminal or shell open. And that's basically it. We don't have any information about what the second stage actually is or how widespread it really is. Can you walk us through it? What was the process like when you ran this through your own engine to do the analysis? What was going on there?
Starting point is 00:07:10 We modeled our detection rules based on previous incidents. And we focused on the entire PyPI package repository. The data set contained around 1.6 million files. That amounts to around 2.6 terabytes of different files. And we essentially just plugged those YARA rules into our engine and ran the entire sample set. The entire run took a bit less than a day. And at the end of the day, we had a bunch of matches on the different rules that we plugged in. And then we essentially manually reviewed them and found the offending sample. So in terms of a percentage of what you found here, is this a relatively infrequent occurrence?
Starting point is 00:07:55 Yes, this is an infrequent occurrence. This is not something that is commonly done due to the hardness factor, how hard it is to achieve something like this. That said, this script is extremely simple, and I expect to see many more such attacks in the future. When you say that the script is simple, what script are you referring to? I'm referring to the actual malicious components being dropped. So the setup.py script with malicious communication and persistence. So in terms of this being discovered, as you mentioned, there's no real mechanism for things like this to be scanned when these projects are uploaded. So it's really up to folks like you and other people to report them. I suspect that on the package repository side, perhaps some kind of a review process might be implemented. Although given the size of the entire repository, and that many of the people working there are volunteers, I doubt that that will happen at great scale.
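As a rough illustration of the rule-based scanning described above, the following Python sketch flags source text that combines several indicators, such as a hard-coded IPv4 address and a reference to .bashrc. The indicator patterns, the names `INDICATORS`, `scan_source`, and `is_suspicious`, and the sample string are assumptions made for illustration; they are not the YARA rules used in the research, and plain regular expressions stand in here for a real rule engine.

```python
import re

# Hypothetical indicators, loosely modeled on the behavior described in the
# interview (hard-coded IP address, download call, .bashrc persistence).
# These are illustrative assumptions, not the rules used by the researchers.
INDICATORS = {
    "hardcoded_ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "bashrc_reference": re.compile(r"\.bashrc"),
    "url_fetch": re.compile(r"urlopen|urlretrieve|requests\.get"),
}


def scan_source(text):
    """Return the set of indicator names whose pattern matches the text."""
    return {name for name, pattern in INDICATORS.items() if pattern.search(text)}


def is_suspicious(text, threshold=2):
    """Flag text for manual review when at least `threshold` indicators match."""
    return len(scan_source(text)) >= threshold


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address, used here as a placeholder.
    sample = 'urlopen("http://203.0.113.7/x"); open(home + "/.bashrc", "a")'
    print(sorted(scan_source(sample)))
    print(is_suspicious(sample))
```

Requiring more than one indicator before flagging a file mirrors the workflow described in the interview, where rule matches were only candidates and the offending sample was confirmed by manual review.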
Starting point is 00:08:56 One of the ways they can process such an amount of files is to buy a platform like the one we have and continuously process all the files. And of course, on the developer side, you actually have to check what you're downloading, what you're installing, and so on. So in terms of best ways for folks to protect themselves against this sort of thing, what do you recommend? When you're installing new packages, you could be on the lookout for suspicious network connections and transfers not initiated by pip itself. You can also be careful about what you type and how you type it.
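The advice to be careful about what you type can be partly automated. The sketch below, offered only as an illustration, compares a requested package name against a list of known names and reports a near match that is not an exact match. The list `KNOWN_PACKAGES`, the function name `possible_typosquat`, and the 0.8 similarity cutoff are assumptions; a real list would come from an organization's approved dependencies or a snapshot of popular package names.

```python
import difflib

# Hypothetical list of trusted names, for illustration only.
KNOWN_PACKAGES = ["django", "requests", "numpy", "flask", "setuptools"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None.

    An exact match returns None, because the name is itself a known package.
    """
    name = name.lower()
    if name in known:
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ["django", "djanga", "totallyunrelated"]:
        print(candidate, "->", possible_typosquat(candidate))
```

A check like this only catches names that are close to something on the list, so it complements, rather than replaces, reviewing what a package actually contains.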
Starting point is 00:09:31 It would be great if there was a way for public repositories to enforce some kind of content checks, like the continuous processing efforts. However, that's probably not applicable. And on the developer side, for example, in large software companies and so on, some kind of an approval of used modules would be nice. We haven't covered, for example, what other types of files we found in the entire sample set. So one would expect, for example, that a Python package repository contains mostly Python files and perhaps text files. However, we found a couple of executable files for Windows, Linux, macOS, and so on. And we didn't expect to find such things there.
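One simple way to spot executable content like this in a package is to look at the first few bytes of each file. The Python sketch below is an illustration under stated assumptions, not the identification method of the engine discussed in the interview: the names `MAGIC_SIGNATURES` and `identify_executable` are hypothetical, and only a handful of well-known signatures (ELF, the DOS/PE "MZ" header, and thin Mach-O headers) are covered.

```python
# Leading "magic" bytes for a few common executable formats.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF (Linux)"),
    (b"MZ", "PE/DOS (Windows)"),
    (b"\xfe\xed\xfa\xce", "Mach-O 32-bit"),
    (b"\xfe\xed\xfa\xcf", "Mach-O 64-bit"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
]


def identify_executable(header):
    """Return a format label if the bytes start with a known signature, else None."""
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def identify_file(path):
    """Read the first bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_executable(handle.read(8))


if __name__ == "__main__":
    print(identify_executable(b"\x7fELF\x02\x01\x01\x00"))
    print(identify_executable(b"import os\n"))
```

A match here says nothing about whether a file is malicious; as the interview goes on to note, the executables found in the data set included ordinary test data.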
Starting point is 00:10:19 One example is a package that can be used to compare files and see the differences between them. And as a testing sample set, it includes a variety of executable and non-executable file formats. So our engine, when it scanned all those files, it identified them. And we found, like I said, a bunch of executable files, even additional archives, document files, and so on. So what's going on there? Is it hiding a different type of file from what people are expecting to see to try to throw people off the trail? This isn't related to the malicious packages we found. This is related to the entire package repository. One of the packages, which was scanned but was not malicious, was this file compare package,
Starting point is 00:10:58 which included as its own test data set a large amount of executable files for Windows, for Linux, for OS X, and so on, to be used to do a sanity check if the comparisons work as they should. We didn't expect to find executable content apart from Python scripts in PyPI. Our thanks to Robert Perica from Reversing Labs for joining us. The research is titled Supply Chain Malware, Detecting Malware in Package Manager Repositories. We'll have a link in the show notes. The Cyber Wire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe,
Starting point is 00:12:29 where they're co-building the next generation of cybersecurity teams and technologies. Our amazing Cyber Wire team is Elliot Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond, Tim Nodar, Joe Kerrigan, Carol Terrio, Ben Yellen, Nick Valecki, Gina Johnson, Bennett Moe, Chris Russell, John Petrick, Jennifer Iben, Rick Howard, Peter Kilpie, and I'm Dave Bittner. Thanks for listening.
