CyberWire Daily - Amplification bots and how to detect them. [Research Saturday]
Episode Date: January 26, 2019
Researchers from Duo Security have been analyzing the behavior of Twitter bots in a series of posts on their website. Their most recent dive into the subject explores amplification bots, which boost the impact of tweets through likes and retweets. Jordan Wright is a principal R&D engineer at Duo Security, and he joins us to share their findings. Link to the original research - https://duo.com/labs/research/anatomy-of-twitter-bots-amplification-bots
Transcript
You're listening to the Cyber Wire Network, powered by N2K. Like many of you, I was concerned about my data being sold by data brokers. So I decided to try Delete.me.
I have to say, Delete.me is a game changer. Within days of signing up, they started removing my
personal information from hundreds of data brokers. I finally have peace of mind knowing
my data privacy is protected. Delete.me's team does all the work for you with detailed reports
so you know exactly what's been done. Take control of your data and keep your private life private by signing up for Delete.me. Get 20% off your Delete.me plan when you go to JoinDeleteMe.com slash N2K and use promo code N2K at checkout.
The only way to get 20% off is to go to JoinDeleteMe.com slash N2K and enter code N2K at checkout.
That's JoinDeleteMe.com slash N2K, code N2K.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of
protecting ourselves in a rapidly evolving cyberspace.
Thanks for joining us.
And now, a message from our sponsor, Zscaler, the leader in cloud security.
Enterprises have spent billions of dollars on firewalls and VPNs,
yet breaches continue to rise, with an 18% year-over-year increase in ransomware attacks
and a record $75 million ransom payout in 2024.
These traditional security tools expand your attack surface with public-facing IPs
that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your
security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral movement by connecting users only to specific apps, not the entire network, continuously verifying every request based on identity and context, simplifying security management with AI-powered automation, and detecting threats using AI to analyze over 500 billion daily transactions.
Hackers can't attack what they can't see.
Protect your organization with Zscaler Zero Trust and AI.
Learn more at zscaler.com slash security.
So back in August, we presented a white paper at Black Hat on hunting down Twitter bots at a large
scale. That's Jordan Wright. He's a principal R&D engineer at Duo Security.
Along with his colleague, Olabode Anise,
he's co-author of a research paper titled
Anatomy of Twitter Bots: Amplification Bots.
So that research really brought together our findings on what bots look like and how we can build a large dataset and identify bots within that dataset accurately and quickly. We also presented a case study showing a cryptocurrency scam botnet measuring in the tens of thousands of bots. So this really set the tone for the research that we wanted to continue, with this first pass being on more content-generating types of bots. But then we
took it a step further. Back in October, we posted a blog
post covering what we call fake followers. These are Twitter accounts that exist only to artificially inflate another account's follower count. It makes them appear more popular than they actually
are. So we were able to show how we could use very straightforward techniques and heuristics to
accurately identify a large
network of fake followers. And then the third type of bot that we wanted to explore and what
our research just this week covers is what we call amplification bots. These are bots that we
consider even more damaging than fake followers. So amplification bots work by actively retweeting
or liking tweets.
Their goal is to make information appear more credible than it is, as well as distribute
it to a wide audience of unsuspecting users.
So we say this is more damaging because it's not just about popularity in this case.
It's actively spreading and disseminating information and making that information appear
credible.
So that's where we are today. And that's what our study focuses on:
can we accurately identify amplification bots? And taking it a step further, can we build a crawler
that can enumerate amplification bots very, very quickly?
All right. Well, you all started out here with the research kind of trying to establish what was normal. Can you take us through it? What was the process there?

Sure. That's the process anytime we go into identifying bots on Twitter: we first have to ask ourselves, what's the normal behavior? And then we can look for things that we would deem weird, you know, things where we can accurately say this is weird enough to constitute automation. So in this case, we took the same data set that
we built during our first round of research, which consisted of 576 million tweets. And we asked
ourselves, what does the ratio of retweets to likes look like? So in this research, we only
wanted to focus on amplification bots that retweet tweets. We didn't focus on those that like tweets because
there weren't API endpoints in place where we could gather that information very easily.
So looking at these ratios, it intuitively makes sense. If you're on Twitter, you're scrolling
through, you'll notice that generally tweets will have more likes than retweets. This makes sense
because it's kind of a lower impact
action. I'm not spreading it to my own followers. I'm just acknowledging that tweet. And we found
the same thing within our dataset. In fact, we found that 80% of the tweets in our dataset had more likes than retweets. So we can say that it's fairly weird for a tweet to have
more retweets than likes. And again, this intuitively
would make sense if you're just scrolling through Twitter. So this was one metric that we used to
identify if a tweet was being actively amplified. And we were really conservative. Our estimation
for whether or not it was amplified was, does it have five times as many retweets as it has likes?
Because looking through our data set, we found
that's very, very weird and doesn't happen much at all. And so that was one heuristic that we used.
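To make that first heuristic concrete, here's a minimal sketch of the ratio check in Python. The five-to-one threshold comes straight from the discussion; the function name and the zero-likes guard are illustrative assumptions, not Duo's actual code.

```python
# A sketch of the retweet-to-like ratio heuristic described above.
# The 5x threshold is the conservative cutoff from the research;
# the rest is illustrative, not Duo's actual implementation.

AMPLIFICATION_RATIO = 5

def is_amplified(retweet_count: int, favorite_count: int) -> bool:
    """Return True if a tweet's engagement looks artificially amplified.

    Organic tweets usually have more likes than retweets, so a tweet
    with five or more retweets per like is a strong outlier.
    """
    if favorite_count == 0:
        # No likes at all: treat any sizable retweet count as suspicious.
        return retweet_count >= AMPLIFICATION_RATIO
    return retweet_count / favorite_count >= AMPLIFICATION_RATIO
```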
And we used two others as well. The first is how chronological a user's timeline is.
Most users, as they're interacting with Twitter, will have a fairly chronological timeline.
I'm going to author tweets, and those are going to be in order.
The things that I retweet are generally going to be in order.
There's going to be the rare scenario where perhaps I stumble across an older tweet
from a while back that just caught my attention.
But in general, things are going to trend in one direction.
Now, let's just pause there for a second to make sure I understand what you're saying.
So what you're saying is that both the tweets that I generate and the tweets that I'm interacting with will have basically a chronological
sequence to them. There won't be a lot of jumping around in time. Is that how you describe it?
That's exactly right. And we call that jumping around in time inversions. The idea being, say the previous tweet in my timeline is older and the next tweet is newer. But then if the one after that goes back in time, that's an inversion. I went from older to newer to older.
And that could happen occasionally
for normal timelines.
But in the case of bots,
we found that happens much, much more frequently.
This is because whenever they get their orders to go and retweet certain numbers of tweets,
they could be older tweets, they could be brand new tweets.
But it's really all over the map, these inversions.
We see a significantly higher number of them for what we would consider bot-like behavior.
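To make the inversion idea concrete, here's a minimal sketch of how a timeline's inversions might be counted, assuming the timeline has already been reduced to a list of the original tweets' creation timestamps in timeline order, newest entry first. The function name and data shape are illustrative assumptions, not Duo's released code.

```python
from datetime import datetime

def count_inversions(timeline_timestamps: list[datetime]) -> int:
    """Count how often a timeline jumps backwards in time.

    timeline_timestamps holds the creation time of the original tweet
    behind each timeline entry, newest entry first. Walking down the
    timeline, those timestamps should mostly decrease; each time they
    increase instead, that's one inversion.
    """
    inversions = 0
    for newer, older in zip(timeline_timestamps, timeline_timestamps[1:]):
        # 'newer' is the more recent timeline position, 'older' the one
        # below it. If the entry below was created later, the account
        # jumped back and forth in time: an inversion.
        if older > newer:
            inversions += 1
    return inversions
```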
So when you're looking at one of these bots, do you see that, oh,
this particular bot has been assigned to start retweeting from this legitimate account? Is it
that straightforward? We see a mixture of accounts that they're retweeting. We, of course, can see
who they're retweeting. And that gives us an indication on who they're designed to go and
amplify, whose content they're designed to go and amplify.
And it may be the case that it's one person.
It may be many people from a really diverse set of backgrounds and interests,
which is another indication that perhaps this is automated bot-like behavior
because there is no clear trend in the type of content that they're amplifying.
And on that note, that kind of leads into the third heuristic that we used during this research, which is how many tweets in their timeline are retweets. So the average user, their timeline consists of right around 37% retweets.
Usually they're going to have more than that being original content that they author
themselves. So whenever we want to consider if an account is bot-like, one of the heuristics that we apply is whether or not their timeline consists of 90% or more retweets. This is because with the bots that we encounter, that's really all they do. They just retweet content. So using these three different heuristics, how many tweets in their timeline are retweets, whether they're amplified with that five-to-one ratio of retweets versus likes, and how many inversions they have, allows us to very accurately identify amplification bot-like behavior.
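Putting the three heuristics together, a per-account check might look like the sketch below, reusing the is_amplified and count_inversions sketches from earlier. The 90% and five-to-one thresholds come from the discussion; the inversion threshold, the AccountSnapshot shape, and the rule for combining the signals are illustrative assumptions rather than Duo's published classifier.

```python
from dataclasses import dataclass
from datetime import datetime

# Reuses is_amplified() and count_inversions() from the sketches above.

@dataclass
class AccountSnapshot:
    # (retweet_count, favorite_count) per tweet the account has retweeted
    tweet_engagements: list[tuple[int, int]]
    # created_at of the original tweet behind each timeline entry, newest first
    original_timestamps: list[datetime]
    retweet_entries: int   # how many timeline entries are retweets
    total_entries: int     # total timeline entries examined

def looks_like_amplification_bot(acct: AccountSnapshot,
                                 inversion_threshold: int = 10) -> bool:
    """Apply the three heuristics from the research to one account.

    1. Timeline is 90%+ retweets (average users sit near 37%).
    2. The tweets it retweets show a 5:1 retweet-to-like ratio.
    3. The timeline jumps around in time far more than a normal user's
       (the threshold here is an assumed illustrative value).
    """
    if acct.total_entries == 0:
        return False
    mostly_retweets = acct.retweet_entries / acct.total_entries >= 0.9
    amplified = any(is_amplified(rts, likes)
                    for rts, likes in acct.tweet_engagements)
    jumpy = count_inversions(acct.original_timestamps) >= inversion_threshold
    return mostly_retweets and amplified and jumpy
```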
Now help me understand, there's a part of this that's a little puzzling to me. So I can understand someone going out there and paying for followers so that it appears as though they have a larger following than they have. But with these amplification bots, who's following them that these retweets would matter?

There's really two goals to having tweets amplified.
The first is a wider reach.
And to that point, you're right.
It's really hit and miss,
depending on how many legitimate users
are following these bots
that would, as a result, see this content.
So that's going to be kind of a mixed bag
in terms of how effective that is.
But the other goal is to make information
seem more credible or popular than it
actually is. The goal of this could be, you know, if I'm a content creator, maybe I want to appear
more as an influencer. I want to appear that the content that I put out has a wider reach than it
actually does for people coming and looking at my profile after the fact and trying to get that sense of
whether or not they should follow me. And likewise, it also gives a higher incentive or higher
likelihood that a user is going to engage with a tweet. So bringing this back to the security space,
you could imagine a tweet that has a malicious link in it, and I want this to be broadcast to unsuspecting users. The more retweets it has, the more legitimate it may seem, and the more likely we could assume users would be to engage with that tweet and to click on the link to that malicious content.
It's almost like, I don't know, decoy ducks, right? In a pond to attract real ducks for a hunter.
Exactly. And really, we saw a great example of this in our initial research.
Whenever we studied the cryptocurrency scam botnet, we actively saw accounts being created with the sole purpose of liking the scam-related tweets.
So we see this in action in the security space, and we see this actively being used to promote malicious content.
So in the research, you have some examples, some bots that you highlighted here.
Can you take us through, describe what does a typical bot look like?
Sure. So really, it comes down to those three heuristics that we mentioned earlier. We can use one example that we give in the post to really highlight this. The first indicator is that for this account we're looking at, the first tweet on its timeline has roughly 970 retweets and only 164 likes. This is a ratio of
almost six to one, which is incredibly rare. Looking across our data set of 570 some odd
million tweets, we only saw this occur 0.2% of the time. So if we see the same ratio throughout their
entire timeline, that's very, very odd and highly indicative of foul play to some extent.
And the other key indicator that we found with this particular bot is that we were looking for
90% or more of their timeline being retweets.
But this bot in particular had nothing except retweets.
100% of their content was retweets, most of which were highly, highly amplified.
So these two heuristics alone are a good indicator that this is a bot. And then we would look at that third, which is how many inversions do we see?
How often are they jumping around in time?
And this account is a perfect example, jumping around very frequently, which is not normal for an average user.
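As a quick sanity check, the is_amplified sketch from earlier flags exactly this example: 970 retweets against 164 likes is a ratio of roughly 5.9 to 1, which clears the conservative five-times threshold.

```python
# 970 retweets vs. 164 likes: a ratio of about 5.9 to 1.
print(is_amplified(970, 164))  # True
```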
Now, in the research you also describe the interconnectedness of these bots, how they're connected to each other and how you tracked and traced the various bots that seem to have been teamed up to do this sort of thing. How did you go about that?
You're absolutely right that these bots are connected to some extent. And the reason that
this is the case is that they operate in groups. So one of the things that my partner, Olabode,
and I did was try to determine how can we map out the entire story, this entire network of bots operating together.
So if I'm a user and I want my content to be amplified, I typically won't try and seek out an individual retweet.
I'm going to try and seek out 50 retweets or maybe 100 retweets. So this gives an incentive to where the bot owners will likely
flock a set of bots to all retweet the same tweet. This means that we can study this system as a
group because that's how they operate. So in our case, using the heuristics that we talked about
earlier, we were able to build out a crawler that starts with just a single account that's known to be an amplification bot, looks at the tweets that that amplification bot has retweeted, and then looks through who else has retweeted those tweets, using those heuristics to identify other amplification bots that are all part of that same network, all amplifying the same content as a group, in this coordinated action. This was really
accurate and really effective. Within 24 hours, we were able to track down over 7,000 very likely
amplification bots. And we just consider this kind of the tip of the iceberg. We are highly confident
that had we let this continue running, we would have continued over time mapping out a much larger network of bots.
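The crawl Jordan describes is essentially a breadth-first search over retweet relationships. Here's a minimal sketch of that loop, reusing the looks_like_amplification_bot check from earlier; the fetch_retweeted_tweets, fetch_retweeters, and fetch_account_snapshot helpers are hypothetical wrappers around the relevant Twitter API calls, and Duo's open-sourced crawler will differ in its details.

```python
from collections import deque

# Reuses looks_like_amplification_bot() from the sketch above.
# fetch_retweeted_tweets, fetch_retweeters, and fetch_account_snapshot
# are hypothetical API wrappers, named here only for illustration.

def crawl_amplification_bots(seed_bot_id: str,
                             max_accounts: int = 10_000) -> set[str]:
    """Breadth-first search outward from one known amplification bot.

    For each confirmed bot: fetch the tweets it retweeted, fetch who
    else retweeted those tweets, and test each candidate against the
    three heuristics. Matches join the network and the crawl queue.
    """
    confirmed: set[str] = {seed_bot_id}
    queue: deque[str] = deque([seed_bot_id])
    seen: set[str] = {seed_bot_id}

    while queue and len(confirmed) < max_accounts:
        bot_id = queue.popleft()
        for tweet_id in fetch_retweeted_tweets(bot_id):      # hypothetical
            for candidate_id in fetch_retweeters(tweet_id):  # hypothetical
                if candidate_id in seen:
                    continue
                seen.add(candidate_id)
                snapshot = fetch_account_snapshot(candidate_id)  # hypothetical
                if looks_like_amplification_bot(snapshot):
                    confirmed.add(candidate_id)
                    queue.append(candidate_id)
    return confirmed
```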
Now, when you're looking at this, when a tweet gets amplified, when someone engages with someone to do this for them,
is that amplification process more or less instantaneous,
or do they build in some kind of time delay to try to make things perhaps seem a little more organic?
So for this research, we didn't really take a hard look at the economy around how these bots
operate. But what we can say is just from doing some spot checking throughout the research,
there's a wide variety in the types of tactics and methods employed by bot owners to try and
evade detection. The time aspect is one of those, but it really all comes down to this: with the goal being a high number of retweets and a large reach of information, the retweets have to come at some point.
And so by using these heuristics that are difficult to try and evade, because like I mentioned, it's hard to get a high retweet count without having a suspicious ratio of retweets to likes,
right? So using these heuristics and applying them, the time-based aspect of it is something
that wouldn't come into play as much to try and evade the crawler that we built.
So did you notice anything in terms of turnover of these amplification accounts? Do they seem to be
running without any risk of them shutting down?
I guess I'm asking, do they stay around for a while or do they seem to be shut down?
That's a great question. One of the things that we were really encouraged to see
as we were building out our crawler and as we were mapping out these networks of bots
was that Twitter was very proactive in shutting them down. It would almost be the case that we would come across a bot, and very shortly after, that bot would have already been suspended, which from a research perspective is
exactly what we're hoping to see when we start engaging and trying to find these bots. Now,
it definitely differs. We found bots that were much older. We found bots that were stood up, put to use right away, and quickly suspended.
So it's kind of all over the map.
But as a general trend, we're really encouraged with how proactive Twitter was in shutting them down.
And so what do you suppose the conclusions that you've drawn here, how do they inform folks who are going about their day-to-day work trying to protect their organizations?
What are some of the take-homes for them?
So it's all about remaining vigilant on Twitter and just trying to keep a close eye on things, especially whenever it comes to malicious content, kind of looking back at the cryptocurrency scam where we saw this being used quite a bit. If something appears too good to be true, it likely
is. But the other benefit of our research is really focusing on enabling other researchers
to take our work, build on it, and improve it in really interesting ways. And we're already
seeing that happen, which is really exciting. We're seeing third-party researchers come to us
saying, we were able to take your tools and your methodologies and apply it to this particular
discipline or apply it to this particular area with these really fantastic
results, which is just the very best thing that we can hope for from a lab perspective.
This is why with this research, like the fake followers research and like our original Don't @ Me study, we're releasing the code that we wrote to detect these amplification bots using a crawler.
We're open sourcing all of it so that other researchers can take, build on, and improve it.
Our thanks to Jordan Wright from Duo Security for joining us. Along with his colleague,
Olabode Anise, he's co-author of the report, Anatomy of Twitter Bots: Amplification Bots.
That's on the Duo website. We'll have a link in the show notes.
And now a message from Black Cloak. Did you know the easiest way for cyber criminals to bypass your company's defenses is by targeting your executives and their families at home? Black Cloak's award-winning digital executive protection platform secures their personal devices, home networks, and connected lives.
Because when executives are compromised at home, your company is at risk.
In fact, over one-third of new members discover they've already been breached.
Protect your executives and their families 24-7,
365, with Black Cloak. Learn more at blackcloak.io.
The Cyber Wire Research Saturday is proudly produced in Maryland out of the startup studios
of Data Tribe, where they're co-building the next generation of cybersecurity teams and technologies.
Our amazing Cyber Wire team includes Elliot Peltzman, Puru Prakash, Stefan Vaziri, and Kelsey Bond. Thanks for listening.