CyberWire Daily - Leaking your AWS API keys, on purpose? [Research Saturday]
Episode Date: November 30, 2024
Please enjoy this encore episode: Noah Pack, a SANS Internet Storm Center Intern, sits down to discuss research on "What happens when you accidentally leak your AWS API keys?" This research is a guest diary from Noah and shares a project he worked on after seeing an online video of someone who created a Python script that emailed colleges asking for free swag to be shipped to him. The research states "In this article, I will share some research, resources, and real-world data related to leaked AWS API keys." In this research, Noah shares what he learned while implementing his experiment. The research can be found here: What happens when you accidentally leak your AWS API keys? [Guest Diary]
Transcript
You're listening to the CyberWire Network, powered by N2K.
Like many of you, I was concerned about my data being sold by data brokers. So I decided to try Delete.me.
I have to say, Delete.me is a game changer. Within days of signing up, they started removing my
personal information from hundreds of data brokers. I finally have peace of mind knowing
my data privacy is protected. Delete.me's team does all the work for you with detailed reports
so you know exactly what's been done. Take control of your data and keep your private life private.
Hello, everyone, and welcome to the CyberWire's Research Saturday.
I'm Dave Bittner, and this is our weekly conversation with researchers and analysts
tracking down the threats and vulnerabilities, solving some of the hard problems,
and protecting ourselves in a rapidly evolving cyberspace.
Thanks for joining us.
It was the beginning of the COVID-19 pandemic.
Every conference and career networking event had been canceled that I could find.
And I created a script that would email companies and ask for free conference swag.
That's Noah Pack, an intern with the SANS Internet Storm Center. The research we're discussing today is titled
"What Happens When You Accidentally Leak Your AWS API Keys?"
When I started my time in university, I was doing an introduction to computer science class.
And towards the end of that class, the instructor encouraged all the students to pursue a personal project, something they could post publicly on GitHub or learn from. And I saw a video online of a student at a different
university who created a Python script to email universities and ask for free swag. So things like a t-shirt or a mug.
And he got a great response. A lot of admissions departments sent him things to encourage him to
apply there for grad school. I wanted to take that same idea and adapt it a bit. It was the
beginning of the COVID-19 pandemic. Every conference and career networking event had been canceled that I could find.
And I created a script that would email companies and ask for free conference swag.
So I wrote this up in Python.
I found a list of 10 companies that I was fond of and I hoped would respond to me with a
t-shirt or keychain of some kind. I added those company names to my script and it worked flawlessly.
I sent it out. I checked the email sent folder and saw all 10 messages. But to celebrate my achievement,
I posted this code on GitHub.
Shortly thereafter,
I started receiving multiple login requests
to the email address I'd created
for the script to use.
That was because I hard-coded,
which means that I put in plain text
inside of the code,
the username and password
for that email account
for my script to use.
Oh, Noah. Noah, Noah. Dear sweet Noah.
And I was just a freshman in computer science,
so I hadn't learned safe programming practices yet,
but I certainly learned from the situation.
There had been no ill consequence,
but if my project had been bigger
and I was using AWS or another cloud provider
and hard-coded credentials,
that could have dire financial consequences.
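As an illustration of the alternative to hard-coding, here is a minimal sketch that reads the email account's username and password from environment variables, so nothing secret lives in the file that gets pushed to GitHub. The variable names SWAG_MAIL_USER and SWAG_MAIL_PASS and the function name are assumptions made up for this example; they do not come from the original script.

```python
import os


def load_mail_credentials(env=None):
    """Return (username, password) read from environment variables.

    SWAG_MAIL_USER and SWAG_MAIL_PASS are hypothetical names chosen for
    this sketch; they are not taken from the original script.
    """
    env = os.environ if env is None else env
    try:
        return env["SWAG_MAIL_USER"], env["SWAG_MAIL_PASS"]
    except KeyError as missing:
        # Fail loudly rather than falling back to a literal in the source.
        raise RuntimeError(f"missing environment variable: {missing}") from None
```

A script would call load_mail_credentials() once at startup and pass the result to its mail client. The same idea applies to cloud API keys.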
Yeah.
So the lesson here, I suppose,
is that moments after you put this hard-coded email address and password up on GitHub,
I guess automation from other people had searched that out
and just started hammering that email address.
That's exactly what happened.
It happened within minutes.
And I saw the same thing in my research when I published canary tokens on GitHub
for AWS API keys. They were snatched up and used immediately by both threat actors and security
companies that were monitoring for them. Well, let's dig into the research that you did here.
I mean, this does involve canary tokens. For folks who may not be familiar with that, how do you describe a canary token?
Yeah, so a canary token is sort of like a honeypot.
At the Internet Storm Center, we use honeypots,
most of which are Raspberry Pis,
that look like an attractive target
sitting on the Internet for a threat actor to go after.
The honeypots record what commands the threat actor uses
and what files are downloaded.
Canary tokens are similar but on a much smaller scale. They work really well to supplement the
honeypots that we use. Canary tokens can be things like an Excel document, a QR code, or AWS API keys.
When a threat actor opens the document, scans the QR code,
or uses that API key, it sends an email alert to whoever made the Canary token,
alerting them and giving them a little bit of information about how it was used.
In my use case of AWS API key tokens, it gave me the IP address and user agent that tried to use those credentials.
Well, let's dig into the research here. I mean, what exactly did you set up?
So to conduct my research, I added some AWS API key canary tokens to a small but moderately trafficked e-commerce website
that I helped maintain. It gets roughly a thousand visitors a day. It's enough that it'll come up at
the top of a Google search, but someone who's not looking for the things that they're selling probably won't
find it. So I didn't really expect this to be found very quickly, and it wasn't. So it took a
while before someone picked up those keys and tested them. They could have been picked up much
earlier than they were actually tested. But when they were tested, the user agent string was pretty interesting.
The person that was testing them was using the Boto3 library,
and they were using Python on Windows Subsystem for Linux.
And their IP address came from ProtonVPN.
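To show how a user agent string can reveal that kind of detail, here is a rough sketch that splits a string made of "Name/version" tokens and checks the Linux kernel release for the "microsoft" marker that Windows Subsystem for Linux kernels typically carry. The example string is illustrative only and is not the one recorded in the research; real Boto3 user agents vary by version and may contain extra fields.

```python
def parse_user_agent(user_agent):
    """Split a 'Name/version Name/version ...' string into a dict."""
    parts = {}
    for token in user_agent.split():
        name, sep, version = token.partition("/")
        if sep:  # skip tokens that carry no version
            parts[name] = version
    return parts


def looks_like_wsl(parts):
    """WSL kernels normally include 'microsoft' in their release string."""
    return "microsoft" in parts.get("Linux", "").lower()


# Illustrative value only, not the string seen in the research.
EXAMPLE_UA = (
    "Boto3/1.26.0 Python/3.10.6 "
    "Linux/5.15.90.1-microsoft-standard-WSL2 Botocore/1.29.0"
)
```

Calling parse_user_agent(EXAMPLE_UA) gives a dictionary keyed by "Boto3", "Python", "Linux", and "Botocore", which is enough to make the kind of observation described here.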
So because of the anonymity of a VPN service, it's hard to tie this
to any other attacks or to figure out who is actually behind testing this. It could be anyone
from a threat actor who's looking to abuse this credential to a security researcher that's just
scanning websites. So just so I'm clear here, you had a pre-existing website
and within this website,
you embedded the Canary token,
which to the outside world looked like an AWS API key.
That's exactly right.
And I tried to make it look as though
a developer might have accidentally left it there.
And so who do you suppose was going after this?
I mean, was this obviously an automated process here?
So I would assume that the key was picked up in an automated process, but that it was manually tested.
And that if it were a larger website that receives more traffic, one with
a much different threat
profile, full automation, or
a different threat actor who uses
more automation could pick it up much quicker.
I see. Well, you didn't stop here.
You posted your AWS key elsewhere. Take us through the next step of the process here.
That's right. I also added some AWS API key canary tokens to GitHub. Now, I created a GitHub repository that I knew any security researcher who laid eyes on it would recognize as a honeypot.
It's there to catch people. The repository was named Canaries and it had a readme
that said something like, this is for some research. If you're a bad guy, try these out.
If not, please just ignore.
Wow. Okay.
And after making that repository public,
the requests just flooded in.
It was much different to when I embedded them in the website.
I ended up having to turn off the alerts
just to preserve my email inbox.
But the first one came from AWS,
the first attempt at using those credentials.
And I didn't touch on this in my research,
but when you publish AWS API keys on GitHub,
almost before you can even refresh the page,
AWS will test those keys themselves.
And it's because GitHub has secret scanning built in
where they send anything that they think might be
an API key to AWS to test it.
And AWS will take action.
If it's a real API key and not a Canary token,
they'll send you an email with an urgent subject line,
"Action required. Your AWS access key is exposed for AWS account,"
and then it will list your account number.
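The secret scanning described above can be approximated, very roughly, by pattern matching. The sketch below is not how GitHub's or AWS's scanners actually work; it only assumes the widely documented shape of a long-term AWS access key ID, which is "AKIA" followed by 16 uppercase letters or digits. The function name is made up for this example.

```python
import re

# Long-term AWS access key IDs: "AKIA" plus 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) pairs for every match in the text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits
```

Run against the contents of a file before committing, a non-empty result is a strong hint that a key is about to be published. Dedicated scanners check many more credential formats than this.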
But not even seconds after AWS tested the key,
I got a ton of requests from a company called GitGuardian.
Now, they have a service that will scan public and private repositories for your secrets.
It's a paid service.
And they tested the keys multiple times within the first few minutes to verify them,
all from similar IP addresses in Canada.
We'll be right back.
Do you know the status of your compliance controls right
now? Like, right now. We know that real-time visibility
is critical for security,
but when it comes to our GRC programs,
we rely on point-in-time checks.
But get this.
More than 8,000 companies
like Atlassian and Quora
have continuous visibility
into their controls with Vanta.
Here's the gist.
Vanta brings automation
to evidence collection across 30
frameworks, like SOC 2 and ISO 27001. They also centralize key workflows like policies, access
reviews, and reporting, and help you get security questionnaires done five times faster with AI. Now that's a new way to GRC.
Get $1,000 off Vanta when you go to vanta.com slash cyber.
That's vanta.com slash cyber
for $1,000 off.
So they're looking for your business here.
They're saying, hey, look what we did.
We found this.
Look how quickly we found this thing. And if you use our service, we'll help you protect against this sort of error.
Not necessarily, because they don't reach out to you in any way like AWS did, but
they certainly did see it right away and test it out to protect their customers.
Oh, interesting. Okay.
Yeah, that would be a great marketing strategy though.
No, clearly I assumed too much, but that's interesting.
So at this point, I mean, you're getting hammered.
You said you had to turn off your email because everything's flooding in.
Right. So this key got a ton of alerts right away.
Almost all of them were from GitGuardian. There was also the request from AWS and a couple
from IP addresses that had been seen doing similar things and scanning the internet.
And so what was your response then? I mean, you see the degree to which this has triggered all
of this activity. As a researcher, what do you do next? Yeah, so my next step was to remove the GitHub repository.
I had gotten the results that I wanted from my research.
I found out that if you publish your AWS API keys on GitHub,
they will be used.
If you publish them on your website, they will be used.
It might take a couple of seconds.
It might take a couple of days.
But we also don't know the difference
between when they're picked up and when they're used.
You might be able to rewrite your GitHub repository history
and erase those API keys.
But someone might still have access to them.
They might have downloaded your repository
or the source to your website or scanned your website
before they use those keys.
So the best practice is definitely to rotate them,
to remove all permissions from those keys
and create new keys with the permissions that your code needs.
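The rotation step just described could be sketched as follows. The function takes an object that behaves like a Boto3 IAM client (for example the result of boto3.client("iam")) and uses its update_access_key and create_access_key calls. The function name and the order of steps are choices made for this sketch, not a procedure given in the research.

```python
def rotate_access_key(iam, user_name, leaked_key_id):
    """Deactivate a leaked key, then issue a replacement for the same user.

    `iam` is expected to behave like a boto3 IAM client; it is passed in
    so this sketch has no third-party imports of its own.
    """
    # Step 1: cut off the leaked key before doing anything else.
    iam.update_access_key(
        UserName=user_name, AccessKeyId=leaked_key_id, Status="Inactive"
    )
    # Step 2: create the replacement. IAM allows two keys per user, so this
    # call can fail if the user already holds a second key.
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    # Deleting the old, now inactive key is left as a follow-up step.
    return new_key["AccessKeyId"], new_key["SecretAccessKey"]
```

The new secret should go straight into a secrets store or environment variable, never back into the repository.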
Yeah, that was going to be my next question.
So once you had removed the information from GitHub, were those keys still being
activated? Were people still trying to hammer away using
those credentials? They were. It took
about an hour after removing the repository before my
last alert came in. So you could chalk that up to
someone having the repository open or downloading
it before looking through it. Perhaps their scanner that they're using to find these leaked
secrets has a bit of a delay or a bit of a backlog from other code that's being uploaded.
I wonder if they'll ever get hit again. Are there folks out there who will grab this and then say,
okay, well, clearly this person realized they had a problem,
but we're going to check again in a month just in case.
Oh, I am extremely excited.
I will be extremely excited if I see that because that would be so cool.
We know that a lot of threat actors
like to lie in wait on networks
before they execute their attack.
So I'm sure a similar thing is possible here.
Yeah.
Well, I mean, I think that the lessons here
are pretty clear.
I mean, how do you sum them up
in terms of the things that you've learned?
Yeah, so leaking your AWS API keys or any credentials is an extremely big deal.
According to Verizon's 2023 Data Breach Investigations Report, 61% of data breaches were due to leaked credentials.
And while leaking credentials might seem kind of silly, it seems like a fixable problem. I mean, it's the equivalent to leaving keys to a building
in the parking lot. But it really is harder to stop than you might think. Users reuse usernames
and passwords on sites that are breached. Anyone can fall for a social engineering attack.
Even experienced developers can accidentally publish credentials. And
those are just three of many reasons that this issue exists and why it's so prevalent and why
entire companies like Truffle Security and GitGuardian exist to solve
this problem. I've seen horror stories from small businesses that had their AWS account hacked,
and the attackers racked up bills in excess of $300,000 before the developers could figure out
how to rotate those keys and mitigate the problem because they didn't have the incident
response experience and they didn't have tools integrated into their code pipeline to find these
secrets and stop them from being published.
It's a really good reminder. You know, I suspect there are folks in our audience who are just nodding along and saying, you know, what a basic straightforward thing this is.
And yet, as you say, despite that, it does happen to so many people.
It happens all the time.
There was a cryptocurrency.
It was sort of what some people call a meme coin back in, I think, 2022 called Shiba Inu coin. And the developers
had a code repository on GitHub where they accidentally leaked their AWS credentials.
Luckily, some security researchers who are fans of the crypto project found them. And unfortunately,
they had no way to contact the developers. There was no bug bounty program.
There was no security.txt on their website.
And those researchers noticed that after a few days,
the AWS API credentials were revoked.
They stopped working,
which means that either they did get a hold of someone at Shiba Inu
or the people at Shiba Inu noticed that someone else,
maybe a bad actor, was using those credentials.
Right. What's your advice for folks to help mitigate something like this if it does happen?
The first advice I ever heard on how to mitigate
this issue is actually bad advice. And that would be to rewrite your code repository history on GitHub.
That's because things like the Wayback Machine exist,
and you don't know if somebody's downloaded that code
with the API keys in it.
So the better idea is to rotate those keys.
You could also do things like looking at your CloudTrail logs in AWS
or set up alerts.
At SANS, we like to say that prevention is preferred, detection is a must.
So finding out that those keys were accessed is extremely important.
Teaching secure coding practices is also probably the best and easiest way to prevent this.
This includes avoiding the git command git add with a wildcard,
because that can very easily add sensitive files to your repository.
Name the files that contain sensitive information in your .gitignore and your .npmignore files.
Those are sort of like the robots.txt of your website, but for git.
And then as a threat hunter, one of the techniques that I really like to use
is to take a baseline of something. So on a network, I would take a packet capture
and look at all of the traffic for the network, slowly eliminating things that I know
aren't bad. And at the end, I'll end up with just the network traffic that could be malicious.
And I'll have a bunch of filters that will filter out all the stuff that I know is good.
Then I can dig into those things that are bad and do the same exercise again in a month or a week or a quarter
and add those same filters and find the new traffic that's bad. That same concept can apply
to any log type, including logs from your cloud provider. So look at those CloudTrail logs,
understand what is supposed to be running in your AWS account, who is supposed to be running what, and look for services you don't recognize.
There are over 200 AWS services at this point, so it's hard to know them all.
But you can at least know what ones you use and everything else you can assume is something that you don't and you can dig into it more.
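The baseline idea above can be applied to CloudTrail with a few lines of code. This sketch assumes a CloudTrail log file in its usual JSON form, a top-level "Records" list whose entries carry an "eventSource" field such as "s3.amazonaws.com". The function name and the contents of the known-good set are assumptions for illustration; the set would be built from whatever an account is actually expected to use.

```python
import json
from collections import Counter


def unexpected_services(log_text, known_sources):
    """Count CloudTrail events whose eventSource is outside the baseline."""
    records = json.loads(log_text).get("Records", [])
    sources = (record.get("eventSource", "unknown") for record in records)
    # Filter out everything already known to be expected; keep the rest.
    return dict(Counter(s for s in sources if s not in known_sources))
```

Anything the function returns is a service that is either new or unrecognized and worth digging into. Repeating the exercise later with the same baseline surfaces only what has changed.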
Our thanks to Noah Pack from the SANS Internet Storm Center
for joining us.
The research is titled
What Happens When You Accidentally
Leak Your AWS API Keys?
We'll have a link in the show notes.
and their families at home. Black Cloak's award-winning digital executive protection platform
secures their personal devices, home networks, and connected lives.
Because when executives are compromised at home,
your company is at risk.
In fact, over one-third of new members discover
they've already been breached.
Protect your executives and their families 24-7, 365,
with Black Cloak.
Learn more at blackcloak.io.
The CyberWire Research Saturday podcast is a production of N2K Networks.
N2K Strategic Workforce Intelligence optimizes the value of your biggest investment, your people. Our executive producers are Jennifer Eiben and Brandon Karpf. Our executive editor is Peter Kilpe, and I'm Dave Bittner.
Thanks for listening.
We'll see you back here next time.