CyberWire Daily - Amplification bots and how to detect them. [Research Saturday]

Episode Date: January 26, 2019

Researchers from Duo Security have been analyzing the behavior of Twitter bots in a series of posts on their web site. Their most recent dive into the subject explores amplification bots, which boost the impact of tweets through likes and retweets. Jordan Wright is a principal R&D engineer at Duo Security, and he joins us to share their findings. Link to the original research - https://duo.com/labs/research/anatomy-of-twitter-bots-amplification-bots

Transcript
Starting point is 00:00:00 You're listening to the Cyber Wire Network, powered by N2K. of you, I was concerned about my data being sold by data brokers. So I decided to try Delete.me. I have to say, Delete.me is a game changer. Within days of signing up, they started removing my personal information from hundreds of data brokers. I finally have peace of mind knowing my data privacy is protected. Delete.me's team does all the work for you with detailed reports so you know exactly what's been done. Take control of your data and keep your private life Thank you. JoinDeleteMe.com slash N2K and use promo code N2K at checkout. The only way to get 20% off is to go to JoinDeleteMe.com slash N2K and enter code N2K at checkout. That's JoinDeleteMe.com slash N2K, code N2K. Hello, everyone, and welcome to the CyberWire's Research Saturday.
Starting point is 00:01:36 I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us. And now, a message from our sponsor, Zscaler, the leader in cloud security. Enterprises have spent billions of dollars on firewalls and VPNs, yet breaches continue to rise by an 18% year-over-year increase in ransomware attacks and a $75 million record payout in 2024. These traditional security tools expand your attack surface with public-facing IPs
Starting point is 00:02:20 that are exploited by bad actors more easily than ever with AI tools. It's time to rethink your security. Zscaler Zero Trust plus AI stops attackers by hiding your attack surface, making apps and IPs invisible, eliminating lateral movement, connecting users only to specific apps, not the entire network, continuously verifying every request based on identity and context. Simplifying security management with AI-powered automation. And detecting threats using AI to analyze over 500 billion daily transactions. Hackers can't attack what they can't see. Protect your organization with Zscaler Zero Trust and AI.
Starting point is 00:03:04 Learn more at zscaler.com slash security. So back in August, we presented a white paper at Black Hat on hunting down Twitter bots at a large scale. That's Jordan Wright. He's a principal R&D engineer at Duo Security. Along with his colleague, Olabode Anise, he's co-author of a research paper titled Anatomy of Twitter Bots: Amplification Bots. So this research really culminated our findings in what bots look like
Starting point is 00:03:40 and how we can build a large dataset identifying bots within that data set accurately and quickly. We also presented a case study showing a cryptocurrency scam botnet measuring in the tens of thousands of bots. So this really set the tone for the research that we wanted to continue with this first pass being on more content generating types of bots. But then we took it a step further. Back in October, we posted a blog post covering what we call fake followers. These are Twitter accounts that exist only to artificially inflate another account's following numbers. It makes them appear more popular than they actually
Starting point is 00:04:18 are. So we were able to show how we could use very straightforward techniques and heuristics to accurately identify a large network of fake followers. And then the third type of bot that we wanted to explore and what our research just this week covers is what we call amplification bots. These are bots that we consider even more damaging than fake followers. So amplification bots work by actively retweeting or liking tweets. Their goal is to make information appear more credible than it is, as well as distribute it to a wide audience of unsuspecting users.
Starting point is 00:04:54 So we say this is more damaging because it's not just about popularity in this case. It's actively spreading and disseminating information and making that information appear credible. So that's where we are today. And that's what our study focuses on is, can we accurately identify amplification bots? And taking it a step further, can we build a crawler that can enumerate amplification bots very, very quickly? All right. Well, you all started out here with the research kind of trying to establish what was normal. Can you take us
Starting point is 00:05:25 through it? What was the process there? Sure. And that's a process anytime we go into identifying bots on Twitter is we first have to ask ourselves, what's the normal behavior? And then we can look for things that we would deem weird, you know, things that we can accurately say this is weird enough to constitute automation. So in this case, we took the same data set that we built during our first round of research, which consisted of 576 million tweets. And we asked ourselves, what does the ratio of retweets to likes look like? So in this research, we only wanted to focus on amplification bots that retweet tweets. We didn't focus on those that like tweets because there weren't API endpoints in place where we could gather that information very easily.
Starting point is 00:06:12 So looking at these ratios, it intuitively makes sense. If you're on Twitter, you're scrolling through, you'll notice that generally tweets will have more likes than retweets. This makes sense because it's kind of a lower impact action. I'm not spreading it to my own followers. I'm just acknowledging that tweet. And we found the same thing within our dataset. So in fact, we found that 80% of the tweets in our dataset had at least more likes than retweets. So we can say that it's fairly weird for a tweet to have more retweets than likes. And again, this intuitively would make sense if you're just scrolling through Twitter. So this was one metric that we used to
Starting point is 00:06:50 identify if a tweet was being actively amplified. And we were really conservative. Our estimation for whether or not it was amplified was, does it have five times as many retweets as it has likes? Because looking through our data set, we found that's very, very weird and doesn't happen much at all. And so that was one heuristic that we used. And we used two others as well. The first is how chronological a user's timeline is. Most users, as they're interacting with Twitter, will have a fairly chronological timeline. I'm going to author tweets, and those are going to be in order. The things that I retweet are generally going to be in order.
Starting point is 00:07:29 There's going to be the rare scenario where perhaps I stumble across an older tweet from a while back that just caught my attention. But in general, things are going to trend in one direction. Now, let's just pause there for a second to make sure I understand what you're saying. So what you're saying is that both the tweets that I generate and the tweets that I'm interacting with will have basically a chronological sequence to them. There won't be a lot of jumping around in time. Is that how you describe it? That's exactly right. And we call those jumping around in time inversions. The idea being if the next tweet in my timeline is, let's say the tweet before was older
Starting point is 00:08:06 and then my tweet is newer in that case. But then if I go back in time, that's an inversion. I went from older to newer to older. And that could happen occasionally for normal timelines. But in the case of bots, we found that happens much, much more frequently.
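[Editor's sketch] The inversion idea described above can be illustrated in a few lines of Python. This is a minimal sketch, not the researchers' code. It assumes an inversion is any step on the timeline where the next item's original tweet is older than the one before it; the function name `count_inversions` and the sample dates are hypothetical.

```python
from datetime import datetime

def count_inversions(original_times):
    """Count adjacent steps backwards in time on a timeline.

    `original_times` lists, in the order items appear on the account's
    timeline (oldest activity first), the creation time of each original
    tweet. For a retweet, that is the time of the tweet being retweeted.
    Assumption: an inversion is a step where the next item is older than
    the previous one.
    """
    return sum(
        1
        for previous, current in zip(original_times, original_times[1:])
        if current < previous
    )

# Hypothetical timelines: one that only moves forward, one that jumps around.
organic = [datetime(2019, 1, day) for day in (1, 2, 3, 5, 8)]
bot_like = [datetime(2019, 1, day) for day in (9, 2, 14, 1, 20, 3)]

print(count_inversions(organic))   # 0
print(count_inversions(bot_like))  # 3
```

The interview does not state a cutoff for how many inversions count as bot-like, so none is assumed here; the count is only meaningful relative to what normal accounts show.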
Starting point is 00:08:24 This is because whenever they get their orders to go and retweet certain numbers of tweets, they could be older tweets, they could be brand new tweets. But it's really all over the map, these inversions. We see a significantly higher number of them for what we would consider bot-like behavior. So when you're looking at one of these bots, do you see that, oh, this particular bot has been assigned to start retweeting from this legitimate account? Is it that straightforward? We see a mixture of accounts that they're retweeting. We, of course, can see who they're retweeting. And that gives us an indication on who they're designed to go and
Starting point is 00:09:02 amplify, whose content they're designed to go and amplify. And it may be the case that it's one person. It may be many people from a really diverse set of backgrounds and interests, which is another indication that perhaps this is automated bot-like behavior because there is no clear trend in the type of content that they're amplifying. And on that note, that kind of leads into the third heuristic that we used during this research, which is how many tweets in their timeline are retweets. So the average user, their timeline consists of right around 37% retweets.
Starting point is 00:09:41 Usually they're going to have more than that being original content that they author themselves. So whenever we want to consider if an account is bot-like, one of those heuristics that we apply is whether or not their timeline consists of 90% or more retweets. This is because the bots that we encounter, that's really all they do. They just retweet content. So using these three different heuristics, how many tweets in their timeline are retweets? Are they amplified, having that five-to-one ratio of retweets versus likes? And how many inversions they have allows us to very accurately identify amplification bot-like behavior. Now help me understand that there's a part of this that's a little puzzling to me. So I can understand someone going out there and paying
Starting point is 00:10:30 for followers so that it appears as though they have a larger following than they have. But with these amplification bots, who's following them that these retweets would matter? There's really two goals to having tweets being amplified. The first is a wider reach. And to that point, you're right. It's really hit and miss, depending on how many legitimate users are following these bots
Starting point is 00:10:54 that would, as a result, see this content. So that's going to be kind of a mixed bag in terms of how effective that is. But the other goal is to make information seem more credible or popular than it actually is. The goal of this could be, say, you know, if I'm a content creator, maybe I want to appear more as an influencer. I want to appear that the content that I put out has a wider reach than it actually does for people coming and looking at my profile after the fact and trying to get that sense of
Starting point is 00:11:25 whether or not they should follow me. And likewise, it also gives a higher incentive or higher likelihood that a user is going to engage with a tweet. So bringing this back to the security space, you could imagine that if a tweet has a malicious link in it, and I want this to be broadcast to unsuspecting users. The more retweets that this has, the more legitimate it may seem and the more likely we could assume users would be to engage with that tweet and to click on the link to that malicious content. It's almost like, I don't know, decoy ducks, right? In a pond to attract real ducks for a hunter. Exactly. And really, we saw a great example of this in our initial research. Whenever we studied the cryptocurrency scam botnet, we actively saw accounts being created with the sole purpose of liking the scam-related tweets.
Starting point is 00:12:19 So we see this in action in the security space, and we see this actively being used to promote malicious content. So in the research, you have some examples, some bots that you highlighted here. Can you take us through, describe what does a typical bot look like? Sure. So really, it falls down to those three heuristics that we mentioned earlier. We can use one example that we give in the post to really highlight. The first is that this account that we're looking at, the first tweet on its timeline has roughly 970 retweets and only 164 likes. This is a ratio of almost six to one, which is incredibly rare. Looking across our data set of 570 some odd million tweets, we only saw this occur 0.2% of the time. So if we see the same ratio throughout their entire timeline, that's very, very odd and highly indicative of foul play to some extent.
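[Editor's sketch] The retweet-to-like test from the interview can be written as a one-line comparison. This is a minimal sketch under stated assumptions: the 5x threshold is the one quoted in the interview, while the function name `is_amplified`, the constant name, and the handling of tweets with zero retweets are hypothetical choices made for illustration.

```python
AMPLIFICATION_RATIO = 5.0  # from the interview: five times as many retweets as likes

def is_amplified(retweets, likes, ratio=AMPLIFICATION_RATIO):
    """Return True when a tweet has at least `ratio` times as many retweets as likes.

    Assumption: a tweet with no retweets is never treated as amplified,
    even if it also has no likes.
    """
    if retweets == 0:
        return False
    return retweets >= ratio * likes

# The example account from the interview: roughly 970 retweets and 164 likes.
print(round(970 / 164, 2))     # 5.91
print(is_amplified(970, 164))  # True
print(is_amplified(40, 120))   # False: the usual case, more likes than retweets
```

Comparing `retweets >= ratio * likes` rather than dividing avoids a division by zero for tweets that have retweets but no likes.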
Starting point is 00:13:15 And the other key indicator that we found with this particular bot is that we were looking for 90% or more of their timeline being retweets. But this bot in particular had nothing except retweets. 100% of their content was retweets, most of which were highly, highly amplified. So these two heuristics alone are a good indicator that this is a bot. And then we would look at that third, which is how many inversions do we see? How often are they jumping around in time? And this account is a perfect example, jumping around very frequently, which is not normal for an average user. of these bots, how they're connected to each other and how you tracked and traced the various bots that seem to have been teamed up to do this sort of thing. How did you go about that?
Starting point is 00:14:12 You're absolutely right that these bots are connected to some extent. And the reason that this is the case is that they operate in groups. So one of the things that my partner, Olabode, and I did was try to determine how can we map out the entire story, this entire network of bots operating together. So if I'm a user and I want my content to be amplified, I typically won't try and seek out an individual retweet. I'm going to try and seek out 50 retweets or maybe 100 retweets. So this gives an incentive to where the bot owners will likely flock a set of bots to all retweet the same tweet. This means that we can study this system as a group because that's how they operate. So in our case, using the heuristics that we talked about earlier, we were able to build out a crawler that starts with just a single account that's known to be an amplification bot, looking at the tweets that that amplification bot has retweeted, and then looking through who else has retweeted those tweets, using those heuristics to identify other amplification bots that are all part of that same network, all amplifying the same content as a group, this coordinated action. This was really
Starting point is 00:15:26 accurate and really effective. Within 24 hours, we were able to track down over 7,000 very likely amplification bots. And we just consider this kind of the tip of the iceberg. We are highly confident that had we let this continue running, we would have continued over time mapping out a much larger network of bots. Now, when you're looking at this, when a tweet gets amplified, when someone engages with someone to do this for them, is that amplification process more or less instantaneous, or do they build in some kind of time delay to try to make things perhaps seem a little more organic? So for this research, we didn't really take a hard look at the economy around how these bots operate. But what we can say is just from doing some spot checking throughout the research,
Starting point is 00:16:13 there's a wide variety on the types of tactics and methods employed by bot owners to try and evade detection. The time aspect is one of those, but it really all falls down to with the goal being a high number of retweets and the goal being a large reach of information, the retweets have to come at some point. And so by using these heuristics that are difficult to try and evade, because like I mentioned, it's hard to get a high retweet count without having a suspicious number of retweets, right? So using these heuristics and applying them, the time-based aspect of it is something that wouldn't come into play as much to try and evade the crawler that we built. So did you notice anything in terms of turnover of these amplification accounts? Do they seem to be running without any risk of them shutting down? I guess I'm asking, do they stay around for a while or do they seem to be shut down?
Starting point is 00:17:11 That's a great question. One of the things that we were really encouraged to see as we were building out our crawler and as we were mapping out this network of bots was that Twitter was very proactive in shutting them down. It would be almost the case that we would come across a bot and very shortly after that bot would have already been suspended, which from a research perspective is exactly what we're hoping to see when we start engaging and trying to find these bots. Now, it definitely differs. We found bots that were much older. We found bots that were stood up and began use right away and quickly suspended.
Starting point is 00:17:45 So it's kind of all over the map. But as a general trend, we're really encouraged with how proactive Twitter was in shutting them down. And so what do you suppose the conclusions that you've drawn here, how do they inform folks who are going about their day-to-day work trying to protect their organizations? What are some of the take-homes for them? So it's all about remaining vigilant on Twitter and just trying to keep a close eye on, especially whenever it comes to malicious content, kind of looking back at the cryptocurrency scam where we saw this being used quite a bit. If something appears too good to be true, it likely is. But the other benefit of our research is really focusing on enabling other researchers
Starting point is 00:18:25 to take our work, build on it, and improve it in really interesting ways. And we're already seeing that happen, which is really exciting. We're seeing third-party researchers come to us saying, we were able to take your tools and your methodologies and apply it to this particular discipline or apply it to this particular area with these really fantastic results, which is just the very best thing that we can hope for from a lab perspective. This is why with this research, like the fake followers and like our original Don't @ Me study, we're releasing the code that we wrote to detect these amplification bots using a crawler. We're open sourcing all of it so that other researchers can take, build on, and improve it.
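[Editor's sketch] The crawl described in the interview (start from one known amplification bot, look at what it retweeted, then check who else retweeted those tweets) amounts to a breadth-first search. The Python below is a rough illustration only, not the code Duo released. The function `crawl` and its three callables are hypothetical stand-ins: a real crawler would back them with Twitter API calls and with the three heuristics from the interview, whereas the demo uses small in-memory dictionaries.

```python
from collections import deque

def crawl(seed, retweets_of_account, retweeters_of_tweet, looks_like_bot, limit=1000):
    """Breadth-first expansion from one known amplification bot.

    All three callables are hypothetical hooks:
      retweets_of_account(account) -> iterable of tweet ids it retweeted
      retweeters_of_tweet(tweet_id) -> iterable of accounts that retweeted it
      looks_like_bot(account)       -> bool, e.g. the heuristics from the interview
    `limit` is a soft cap, checked only between accounts.
    """
    found = {seed}
    queue = deque([seed])
    while queue and len(found) < limit:
        account = queue.popleft()
        for tweet_id in retweets_of_account(account):
            for other in retweeters_of_tweet(tweet_id):
                if other not in found and looks_like_bot(other):
                    found.add(other)
                    queue.append(other)
    return found

# Toy in-memory data standing in for API responses (all names made up).
timelines = {"bot_a": ["t1", "t2"], "bot_b": ["t1", "t3"], "bot_c": ["t3"], "human": ["t2"]}
retweeters = {"t1": ["bot_a", "bot_b"], "t2": ["bot_a", "human"], "t3": ["bot_b", "bot_c"]}
bot_like = {"bot_a", "bot_b", "bot_c"}

network = crawl(
    "bot_a",
    lambda account: timelines.get(account, []),
    lambda tweet_id: retweeters.get(tweet_id, []),
    lambda account: account in bot_like,
)
print(sorted(network))  # ['bot_a', 'bot_b', 'bot_c']
```

In the toy data, `bot_c` never shares a tweet with the seed account; it is reached only through `bot_b`, which is why the search keeps expanding from each newly found account instead of stopping after one hop.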
Starting point is 00:19:09 Our thanks to Jordan Wright from Duo Security for joining us. Along with his colleague, Olabode Anise, he's co-author of the report, Anatomy of Twitter Bots: Amplification Bots. That's on the Duo website. We'll have a link in the show notes. And now a message from Black Cloak. Did you know the easiest way for cyber criminals to bypass your company's defenses is by targeting your executives and their families at home? Black Cloak's award-winning digital executive protection platform secures their personal devices, home networks, and connected lives. Because when executives are compromised at home, your company is at risk. In fact, over one-third of new members discover they've already been breached. Protect your executives and their families 24-7, 365, with Black Cloak. Learn more at blackcloak.io.
Starting point is 00:20:13 The Cyber Wire Research Saturday is proudly produced in Maryland out of the startup studios of Data Tribe, where they're co-building the next generation of cybersecurity teams and technologies. Our amazing Cyber Wire team is Elliot Peltzman, Puru Prakash, Stefan Vaziri, Kelsey Bond, Thanks for listening.
