CyberWire Daily - Managing machine learning risks. [Research Saturday]

Episode Date: June 17, 2023

Our guest, Johannes Ullrich from SANS Institute, joins Dave to discuss their research on "Machine Learning Risks: Attacks Against Apache NiFi." Using their honeypot network, researchers were able to collect some interesting data about a threat actor who is currently going after exposed Apache NiFi servers. Researchers state “On May 19th, our distributed sensor network detected a notable spike in requests for ‘/nifi.’” Investigating further, they instructed a subset of their sensors to forward requests to an actual Apache NiFi instance and within a couple of hours the honeypot was completely compromised. The research can be found here: Machine Learning Risks: Attacks Against Apache NiFi

Transcript
Starting point is 00:00:00 You're listening to the CyberWire Network, powered by N2K. Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems,
Starting point is 00:01:43 and protecting ourselves in our rapidly evolving cyberspace. Thanks for joining us. What we saw is in our honeypot network, we sort of get alerts whenever there's a new kind of URL that is being probed in honeypots. And one of the URLs that sort of caused the spike here was /nifi, which, of course, initially didn't really ring a bell.
Starting point is 00:02:12 But then doing some looking into it, we figured out that this is likely going after Apache NiFi, which is often described as a data orchestration platform. That's Johannes Ullrich. He's the Dean of Research at the SANS Technology Institute. The research we're discussing today is titled "Machine Learning Risks: Attacks Against Apache NiFi." Yeah, I have to admit, NiFi was a new one to me.
Starting point is 00:02:45 It had me running to search for exactly what it was. Can you describe to us what is the general use case for Apache NiFi? Yeah, so a little bit of history here. It actually was developed by the NSA, who open sourced it about 10 years ago. And the Apache project sort of took over maintenance of it. It's described as a data orchestration platform. What it does is it reads data from a large number of sources, whether that's like cloud storage, databases and such.
Starting point is 00:03:14 You can filter it, you can extract subsets of that data, and it'll save it back or send it back to some destination. And again, you have a wide range here. So use cases are, for example, in business data. You receive data from a database that you then need to adapt in order to use it, for example, in some kind of analytic system. It's also often used these days in machine learning because you have these large data sets that you need to adapt
Starting point is 00:03:41 in order to then process them in your machine learning algorithm. So these are some of the use cases here. It's written in Java, and one of the nice things about it is you don't really need to do a lot of coding with it. It sort of presents a GUI web interface. You can sort of drag and drop your sources. You can configure credentials for your S3 bucket and tell it, hey, read that JSON file,
Starting point is 00:04:09 pull it out of there, turn it into an XML file that maybe my enterprise resource planning system can read. Well, let's walk through this together. I mean, as you say, you were noticing some things on your honeypot, so where did it go from there? Well, then of course we want to figure out what is it actually attempting to do here? They're definitely looking for NiFi, but why? That's, of course, the next question.
Starting point is 00:04:28 There is, of course, some interesting data often in these NiFi systems. So what we did is we set up an actual NiFi instance. The problem, of course, with this, often described as a full-interaction honeypot, is you can't really set up a lot of them. We set up one of them, but in our honeypot network, we have sort of the feature where we can redirect queries from the honeypots to a system like this. So
Starting point is 00:04:55 a subset of the honeypots were now sending, whenever they saw something going to port 8080 or 8443, which are default ports for Apache NiFi. Whenever they saw something, well, they were just proxying it to our real NiFi instance that we were able to monitor.
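The forwarding arrangement described here can be pictured with a small sketch. This is not the sensor network's actual code: the function name, the backend address, and the on/off flag per sensor are assumptions made for illustration. Only the two port numbers come from the conversation.

```python
# Hypothetical sketch: decide whether a honeypot sensor should proxy a request
# to the single real NiFi instance or keep handling it locally.
NIFI_PORTS = {8080, 8443}        # default ports mentioned in the conversation
REAL_NIFI = ("10.0.0.5", 8443)   # assumed address of the real NiFi backend


def route_request(dest_port: int, forwarding_enabled: bool):
    """Return the backend to proxy to, or None to answer the request locally."""
    if forwarding_enabled and dest_port in NIFI_PORTS:
        return REAL_NIFI
    return None
```

Only sensors in the forwarding subset send traffic on, so a single full-interaction instance can sit behind many lightweight sensors.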
Starting point is 00:05:15 And now, of course, the attacker, well, they couldn't tell that this was a honeypot anymore. They considered that a real NiFi instance, which actually, well, it was. It was a real full-featured NiFi instance. So you see them coming in and searching for this, and when they hit the NiFi instance, what do they do? Well, there are sort of two things we saw.
Starting point is 00:05:37 One thing was they installed a crypto coin miner, of course. It was a little bit of a letdown initially, I have to admit, because that's what everybody does. And they used a feature, it's not a vulnerability. So I want to point out here that at no time they actually abused a vulnerability or some kind of zero day or such here. We had a completely patched and up-to-date version
Starting point is 00:06:03 of NiFi, but NiFi has a feature built in that allows you to execute code. And that's the sort of processor that you can set up to process your data, where you can basically load scripts that will process your data. And with that, you have the ability to run arbitrary code. The real problem here is that the attacker took advantage of not requiring a password to actually access NiFi. It's highly recommended in documentation, but who
Starting point is 00:06:32 reads the manual? Is this primarily a configuration issue where folks who are making use of this are neglecting to properly secure it or exposing it to the internet to begin with? I think both. So you never really should expose something like this to the internet.
Starting point is 00:06:52 It's a very complex system. It had vulnerabilities in the past, nothing really super critical. So they have done a reasonably good job there. But the number one problem is that configuration issue that you didn't configure a password. It's not hard. Like I said, there is documentation for it. They have a simple command to set up a password.
Starting point is 00:07:10 So it's not really all that difficult. It becomes a real problem if you actually process real data with NiFi. Because now you not only have access to the data, but also credentials. Because in order for NiFi to access the data, well, NiFi needs to know how to connect to a database, how to connect to that S3 bucket. That information an attacker could also retrieve from NiFi if there's no password or a weak password. The other thing we then saw
Starting point is 00:07:37 is that in addition to installing the crypto coin miner, which, as I said, was sort of a little bit of a letdown, and which is mostly what they did, there were attackers also that used NiFi to then probe the network. So once they had control of the NiFi server, they did what's often referred to as lateral movement, where they searched the NiFi server for credentials, in particular SSH keys. They looked at, hey, did anybody log in to this NiFi server and then connect anywhere else via SSH?
Starting point is 00:08:07 So they went all through the account and host combinations there and basically tried to abuse any of them to gain additional access to systems.
Starting point is 00:09:24 Now, for the folks who are running this NiFi instance, say someone comes in and drops a crypto miner in, is it obvious that that has happened? Or does it run quietly behind the scenes? Well, it depends on how careful you look. Within the GUI, you will see these processors that the attacker has set up.
Starting point is 00:10:07 So that's something that you should notice. Of course, if you have a very complex instance, there may be tons of processors that you already have configured. It may not be that obvious that you actually have a new one here that wasn't authorized. In the case that we observed, the attacker also set up a cron job in order to have sort of a backup in case that processor gets deleted or NiFi gets removed from that system. That cron job would run once a minute and try to reinstall things again. Overall, this was fairly noisy, so an administrator should be able to notice that if they're watching.
Starting point is 00:10:46 Let's face it, they didn't start by setting up a password for it, so who knows what else is missing there. And of course, odd network connections then outbound from that system and such, which may or may not be notable depending on how noisy your network is in general. Yeah. Now, in your research, correct me if I'm wrong here, this is all running in RAM, so it's trying to hide itself that way? Yeah, so the install script that's being downloaded is never saved to disk.
Starting point is 00:11:18 It's a simple bash script. It just uses curl, the command line tool, to retrieve commands or retrieve files via HTTP, and passes it directly to sh, to the shell. So that's never being saved. The crypto coin miner they're downloading is being saved. Some of the other things, like the SSH scanning of other machines, also no real files being saved here.
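One way an administrator might look for this kind of persistence is to scan scheduled-job entries for a download that is piped straight into a shell. The sketch below is only an illustration under assumptions: the regular expression, the function name, and the idea of feeding it crontab text are not from the research, and a real attacker's entry may look different.

```python
import re

# Matches a download tool whose output is piped directly into a shell,
# e.g. "curl -s http://host/x | sh". The pattern is an illustrative assumption.
PIPE_TO_SHELL = re.compile(r"\b(curl|wget)\b[^|\n]*\|\s*(ba)?sh\b")


def suspicious_cron_lines(crontab_text: str) -> list[str]:
    """Return non-comment crontab lines that download and pipe into a shell."""
    hits = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        if PIPE_TO_SHELL.search(stripped):
            hits.append(stripped)
    return hits
```

A check like this only covers entries that are written to a crontab; because the script itself is never saved to disk, it would not show up in a file scan.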
Starting point is 00:11:42 That's the same thing. It just downloads it via curl and pipes it directly to shell. While you may find some miscellaneous evidence of this, there is no actual file being saved on the system. Yeah. Is it fair to say from your description here that we're looking at opportunists
Starting point is 00:12:00 basically? This isn't a high level of sophistication? Correct. This looks like opportunists. A cryptocurrency miner, of course, could also be sort of their last resort, kind of, if they don't find anything else interesting to do with this instance. And our instance didn't really have any interesting data, so they say, you know, let's make some crypto coins while we're in here and sort of use it this way. The interesting part was there was really just one attacker who was really
Starting point is 00:12:27 heavily scanning for this. Also, we then sort of went through some searches to see, well, you know, who's actually exposing NiFi? And found a lot of sort of cloud instances, like particularly in Azure and such, where people had NiFi set up. And that's, of course, where
Starting point is 00:12:43 the entire issue with blocking Internet access becomes more tricky because, well, now your NiFi instance is in the cloud, you have to connect to it via the open Internet unless you set up some careful IP address filtering, which you easily get wrong and then you lose access to it. So that may be one reason why there are these exposed NiFi instances.
Starting point is 00:13:06 Yeah, that's interesting. Any idea who is behind this? You said it seems to be coming from one place primarily. Yeah, we saw a lot of Russian IPs that are part of the infrastructure where all of the hosts are located, where all the scripts are being loaded from. Some Ukrainian hosts, one in particular, does a lot of scanning. Whereas these days, sometimes geolocation with Ukraine versus Russia can be a little bit tricky here. I haven't really looked that closely into it. But I haven't really found any strong evidence as far as nationality or so goes.
Starting point is 00:13:39 It's using commodity malware, like this crypto coin miner that is fairly commonly found. So I'm not really sure if that's one actor or another. It could be anybody. Any notion of how widespread this is? Well, like I said, we really see one attacker who is trying it really hard. As far as open NiFi instances,
Starting point is 00:14:03 a quick scan of the standard search engine shows a couple of hundred maybe that are out there that are open to the public on default ports. Hadn't really looked too closely how many are maybe hiding a little bit on slightly different URLs. But this attacker really seems to go for these default instances.
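For a team checking its own address space for instances like these, a first step could be to build the list of default URLs worth looking at. The sketch below is a hypothetical helper, not a tool from the research: the function name is made up, and pairing port 8080 with plain HTTP and 8443 with HTTPS is an assumption. The ports and the /nifi path are the ones mentioned in the conversation.

```python
from urllib.parse import urlunsplit

# Default ports and path from the conversation; the scheme chosen for each
# port is an assumption (8080 as plain HTTP, 8443 as HTTPS).
DEFAULT_ENDPOINTS = [("http", 8080), ("https", 8443)]


def candidate_nifi_urls(hosts):
    """Build the default NiFi URLs to check for each host you own."""
    urls = []
    for host in hosts:
        for scheme, port in DEFAULT_ENDPOINTS:
            urls.append(urlunsplit((scheme, f"{host}:{port}", "/nifi", "", "")))
    return urls
```

As noted in the conversation, instances on non-default ports or URLs would not be covered by a list like this.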
Starting point is 00:14:20 This could also be something where an attacker, once they gain access to a network, is looking for these NiFi instances. Because after all, they do allow that arbitrary code execution. So that would be also then an initial lateral movement again for an attacker who breached a network with a NiFi instance. Right, right. While I'm here, I might as well drop a cryptocurrency miner. Or just look for any data being touched by NiFi.
Starting point is 00:14:46 That's something we'll probably do in the future, put some credentials in there and see if they're then being used. Right. So what are the recommendations then for folks who are using Apache NiFi? What sort of things should they be looking out for here? I think the number one thing is right now just inventory. That's sort of one problem here. You may find
Starting point is 00:15:08 data scientists, people in business analytics and such that set up NiFi instances in the cloud, that rogue IT kind of issue, where they don't necessarily properly account for it
Starting point is 00:15:23 and so it never really gets properly configured and patched and all of that good stuff. But the number one thing is if you have NiFi, put a password in there. Even a weak password is better than no password. But while you're at it, pick something a little bit better than "nifi" as a password. Right, "nifi1", yeah. Right, while you're putting in a password, oh, what the heck, make it a strong one. Yes.
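The advice about avoiding obvious passwords could be turned into a simple pre-check before a credential is set. This is a minimal sketch under assumptions: the 12-character minimum and the function name are illustrative choices, not something stated in the conversation, and it is not a substitute for a real password policy.

```python
MIN_LENGTH = 12  # assumed policy threshold, not taken from the conversation


def is_acceptable_password(password: str, product_name: str = "nifi") -> bool:
    """Reject short passwords and ones built around the product name."""
    if len(password) < MIN_LENGTH:
        return False
    # Case-insensitive check so "NiFi1" style variants are caught too.
    if product_name.lower() in password.lower():
        return False
    return True
```

With these assumptions, both "nifi" and "NiFi1" are rejected, as is a longer password that merely embeds the product name.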
Starting point is 00:15:48 I don't think there is much sort of in terms of a strong authentication, but I haven't really looked into how it sort of would integrate with any kind of SSO or such. Right, right. It's really an interesting case here because, I mean, obviously a legitimate tool,
Starting point is 00:16:02 not terribly widespread usage, it would seem. And yet, some clever hacker out there has found a way to use it to their advantage. I suppose it's kind of a cautionary tale. Yeah, and that's really just if it's out there, if it's vulnerable, they'll find it. They may find it before you find it, and that's really the big problem. Our thanks to Johannes Ullrich from the SANS Technology Institute for joining us. The research is titled "Machine Learning Risks: Attacks Against Apache NiFi."
Starting point is 00:16:39 We'll have a link in the show notes.
Starting point is 00:17:24 The CyberWire Research Saturday podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. This episode was produced by Liz Irvin and senior producer Jennifer Eiben. Our mixer is Elliott Peltzman. Our executive editor is Peter Kilpe. And I'm Dave Bittner.
Starting point is 00:18:03 Thanks for listening.
