Screaming in the Cloud - The Power of Networking in the Cloud with Tom Scholl
Episode Date: August 29, 2024
A cloud service is only as good as the team of network engineers who keep it up and running. In this episode, AWS Vice President and Distinguished Engineer Tom Scholl breaks down the importance of security and legwork needed to support the company’s massive infrastructure. Corey picks Tom’s brain while singing the praises of the AWS DDoS Protection Team, marveling at the scale of the modern internet, and looking ahead to the next generation of network engineers that could land at AWS. If you’ve ever wondered about the inner workings of the AWS cloud, then this is the discussion for you.
Show Highlights:
(0:00) Intro
(1:09) The Duckbill Group sponsor read
(1:42) The importance of a good network for AWS
(3:38) Evolution of networking
(6:03) Efficiency of the AWS DDoS Protection Team
(7:29) AWS Cloud and weathering DDoS attacks
(10:03) Policing network abuse
(12:08) Walking the SES tightrope and network attacks
(15:00) Ensuring the security of the internet
(17:53) The Duckbill Group sponsor read
(18:37) Scale of the modern internet
(20:47) Migrating the AWS network firewall
(21:54) Internal network scaling
(24:27) Preparing for DDoS disruption
(29:14) Finding the next generation of network engineers
(32:15) Where to learn more about AWS cloud security
About Tom Scholl:
Tom Scholl is a VP and Distinguished Engineer at Amazon Web Services (AWS) in the infrastructure organization. His role includes working on AWS’s global network backbone, as well as focusing on denial of service detection and mitigation systems. He has been with AWS for over 13 years.
Prior to AWS, Tom was a Principal Network Engineer at nLayer and AT&T Labs (formerly SBC Telecom). He also previously held network engineering roles at OptimalPATH Digital Network and ANET Internet Services.
Links Referenced:
AWS Security Blog: https://aws.amazon.com/blogs/security/
How AWS threat intelligence deters threat actors: https://aws.amazon.com/blogs/security/how-aws-threat-intelligence-deters-threat-actors/
Using AWS Shield Advanced protection groups to improve DDoS detection and mitigation: https://aws.amazon.com/blogs/security/using-aws-shield-advanced-protection-groups-to-improve-ddos-detection-and-mitigation/
AWS re:Inforce 2024 presentation on Sonaris and MadPot: https://www.youtube.com/watch?v=38Z9csvyFDg
NANOG 2023 presentation on AWS networking infrastructure: https://www.youtube.com/watch?v=0tcR-iQce7s
AWS re:Invent 2022 presentation on AWS networking infrastructure: https://www.youtube.com/watch?v=HJNR_dX8g8c
AWS re:Invent 2022 presentation on Scaling network performance on next-gen Amazon EC2 instances: https://www.youtube.com/watch?v=jNYpWa7gf1A&t=1373s
IEEE paper on Scalable Reliable Datagram (SRD): https://ieeexplore.ieee.org/document/9167399
Sponsor
The Duckbill Group: https://www.duckbillgroup.com/
Transcript
I mean, it's definitely, you know, in the many, many terabits of capacity.
And it's different layers of the network, right?
Because you have to think from an availability zone, a data center, you know, how do you connect this to the rest of the world, right?
So there's, you know, large amounts of capacity within a particular AWS region.
And then you actually have to interconnect that too.
Welcome to Screaming in the Cloud.
I'm Corey Quinn.
My guest today is Tom Scholl, VP and Distinguished Engineer at AWS.
Tom, thanks for joining me. AWS, haven't heard of those folks. What do you do?
Hey, thanks for having me. I am an engineer who focuses on our network within our overall infrastructure organization.
That organization includes our data centers, our hardware engineering,
our supply chain, some of our network edge services, and our network infrastructure,
particularly the anti-DDoS use case, as well as some of our CDN work.
More specifically, I focus on our network infrastructure: our global backbone,
internet transit, and peering. And I spend a fair
amount of my time on DDoS protection and disruption.
This episode is sponsored in part by my day job,
The Duckbill Group. Do you have a horrifying AWS bill? That can mean a lot of things.
Predicting what it's going to be, determining what it should be, negotiating your next long-term
contract with AWS, or just figuring out why
it increasingly resembles a phone number, but nobody seems to quite know why that is.
To learn more, visit duckbillgroup.com. Remember, you can't duck the Duckbill bill.
And my CEO informs me that is absolutely not our slogan.
There's, I think, a lack of awareness societally around the value of
the network to something like this. I mean, without this, AWS becomes probably the world's
largest collection of space heaters. Because without being able to talk to one another,
computers don't tend to do a whole heck of a lot. It used to be something that was incredibly
top of mind for folks because networks would break and things would stop being able to
communicate clearly. But for most of the world, it's gone to the level of being a utility where when you turn on the
faucet in the bathroom, you don't wonder, is water going to come out this time? It just does. If it
ever doesn't, that's momentous. And networks have sort of gone the same way, at least from the
business user perspective, in no small part due to people who are doing the things that you do.
How did you get into this space?
Well, it all started back in the 90s.
I used to dial into BBSs and started to learn a lot about Unix and telephony and
those sorts of systems.
And I eventually got a job at an ISP where you had to be a jack of all trades,
where you had to know Unix sysadmin work.
You had to run the Unix RADIUS servers and mail servers, and then it was,
hey, you have to learn some of that network stuff too,
in addition to being tech support.
So you basically had to kind of
know it all, end to end, right? And there was
nothing you could say no to because it wasn't your
specialty. I did that, and eventually
got a job at the phone company
in the Chicagoland area, which was Ameritech,
which later got acquired by SBC, which had
Pacific Bell, SNET in Connecticut,
and Southwestern Bell. I worked on building
our broadband network and our internet infrastructure
and got involved in the whole networking scene
with NANOG and peering, the whole ecosystem.
And basically it was building large networks.
And then we eventually acquired AT&T,
which is an even bigger network on top of that.
And just did that for a fair amount of time.
And then around 2010, joined Amazon and left
and then briefly came back
and have been working on the Amazon side, primarily on our border network, which is basically the ISP transit provider of
Amazon that connects our data centers, interregion connectivity, connectivity to and from the
internet. And then the last four years, I've been spending a bit more time on the DDoS space.
I had the privilege of watching your talk at NANOG in Kansas City a month or two ago before
this recording. And it was interesting seeing how
so much of what you do, especially these days, seems like it shies away from a lot of the
technical countermeasures for DDoS and leans much more heavily into being a human being,
reaching out to network operators on the other side of the line when you start seeing bad behavior
emanating from their networks. Has that always been the case and I've just been blind to it, or is this an evolution in
networking culture?
So I think in the networking culture, there's always been a strong operator
community and you build a lot of relationships and friendships over time where, you know, hey,
if there's a problem in another person's network, like you are that Rolodex, right? For reaching out
to somebody, a particular CDN or cloud or hosting provider and saying, hey, we've got an issue and
we need to troubleshoot it.
And so that transition worked really well in the DDoS space where you would see the
sort of abuse that might be occurring from different parts of the world.
It's like, well, who do I know there?
Well, I know someone on the networking side, so let me go reach out to them.
And depending on the nature of the issue, it is human contact to basically engage somebody
to say, hey, can you route me to the right person?
So I was kind of doing it for decades on the network side when it came to troubleshooting. And with the DDoS side,
it's kind of a natural evolution to kind of leverage those same relationships to make progress.
It's one of those areas where it feels like there's not a lot of public awareness of the
fact that all the big hyperscalers who compete with each other in cutthroat ways and in many
business ways are very much working together
around things like, I guess, the dark forces that will attempt to destroy the internet,
around security, around abuse, around network peering. There's very much a sense of we're
all in this together in every conversation I've been a part of.
That's correct. There's very much a lively operator community where, you know, reaching
out to people, engineer to engineer,
operator to operator, when you have a problem, it's like, hey, there's a mutual thing. It could be a
mutual customer of ours, right? Or whatever it might be, but it's like, you know, we want to
get the packets to flow. We're all in this together. Let's try to find a way to work the
problem and drive resolution. And so a lot of that could be, you know, directly through email or
other back channels or Slacks and things like that, where you need to reach out to people.
We certainly found issues in other people's networks where it's like, hey, this thing is on fire.
You need to take a look at it.
And so in addition, we have formal ways to actually engage individual NOCs and things like that.
But definitely having those relationships pays off quite a bit when it comes to networking, abuse, DDoS, stuff like that.
AWS offers a Shield product that is DDoS protection, and the basic level is rolled out to most
of your endpoints.
Customers benefit from that automatically.
There's a Shield Advanced product that comes in at a fixed fee of $3,000 a month,
which at enterprise scale is a drop in the bucket.
It also does some weird economic things of changing how WAF rules wind up being charged.
But what I found from customers who've had that and who have suffered from DDoS issues historically is that,
far and away, the biggest benefit they cite
has been being able to coordinate more closely
with the AWS DDoS prevention team.
Every story I've heard about those folks
has been absolutely top flight.
And it's rare because usually
when someone is undergoing an attack,
they're not in a good mood. I'm just going to say it. They're angry. They're stressed out.
They're wondering, will the website ever work again? So they're inclined to lash out. But
I've heard nothing but positive stories about the team's work.
That's great to hear. And I'm sure the team will be delighted to hear that.
Because I assure you, if you have negative things to say, they find their way to me.
I'm sort of a negativity magnet by happenstance, I suppose.
No, I mean, that team, and I work with
them really closely, and they basically protect all of Amazon in addition to customers who have,
let's say, Shield Advanced, where they directly engage with them, identify the attack, come up
with mitigations, and work with customers pretty closely. So it's definitely an area that we're
proud to have and definitely enjoy working with them closely.
My experiences with DDoS historically, and when you start a sentence like that,
it sounds like it could go really negatively, but no, I was always firmly on the victim's side of it,
where I was network staff for a time for the Freenode IRC network, which was an ever-popular
target because, oh, well, what am I going to do today?
I'm just going to give people grief on the internet because.
So there were constant challenges
in dealing with SYN floods
and then more sophisticated attacks as time went on.
And you saw it not just in my hobbies there,
but I would see it with companies
where in some cases suspected competitors
would wind up launching giant attacks
at unprotected endpoints.
And it was easier to do early on when someone had a few servers sitting in a rack in their office.
You can overwhelm links pretty easily.
As hyperscaling started to be a thing and people started realizing,
oh, maybe there's something to this cloud thing,
at least publicly it seems like a lot of those problems kind of went away.
Given that you
have been talking about this for a while, including on stage to very smart network people like
yourself, I get the sneaking suspicion that people just didn't give up on this. There's an awful lot
of very hard work that you and people like you are putting into this. How has it evolved?
Definitely. When you think about DDoS, there are
different types out there. There's what we call layer four DDoS, which is either bandwidth saturating (bits-per-second heavy) or packets-per-second heavy, which is really there to exhaust state, right? Traditionally, that's how we've thought about DDoS. And then in the last several years there have also been layer seven request floods, which are basically HTTP GET-style attacks that just overwhelm from a requests-per-second perspective.
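Editor's sketch: the three dimensions described above (bits per second, packets per second, requests per second) can be illustrated with a toy classifier. This is not AWS's detection logic. The `TrafficSample` type, the label strings, and the 10x-over-baseline threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrafficSample:
    """Hypothetical traffic measurement for one endpoint."""
    bits_per_sec: float
    packets_per_sec: float
    requests_per_sec: float


def classify(sample: TrafficSample, baseline: TrafficSample,
             factor: float = 10.0) -> List[str]:
    """Label which dimensions exceed `factor` times the normal baseline.

    The factor is an arbitrary illustrative threshold, not a real tuning value.
    """
    labels = []
    if sample.bits_per_sec > factor * baseline.bits_per_sec:
        labels.append("l4-bandwidth")      # saturates links
    if sample.packets_per_sec > factor * baseline.packets_per_sec:
        labels.append("l4-packet-rate")    # exhausts per-packet or per-flow state
    if sample.requests_per_sec > factor * baseline.requests_per_sec:
        labels.append("l7-request-flood")  # overwhelms the application
    return labels


if __name__ == "__main__":
    normal = TrafficSample(1e9, 1e5, 1e3)
    print(classify(TrafficSample(5e10, 2e5, 1e3), normal))
```

A single attack can trip more than one dimension, which is why the function returns a list rather than one label.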
But what has changed is that in the last several years,
there's been much more focus in actually identifying
where the infrastructure that's being used to launch these attacks
and actually focusing on disrupting that
and engaging with the actual sources of this traffic
to go and get this shut down.
And that comes in different forms, right?
Where it could be if it's spoofed traffic,
which we could talk a little bit more about:
with our global backbone
and our global reach, the number of networks
we connect to gives us insight
into where spoofed traffic comes from.
And that's a unique one
because that's been a 20 plus year issue.
And that goes back to IRC and Smurf attacks
and things like that that people used to do.
So that was kind of a unique area
where we stepped up and, in collaboration with
other networks, actually chased that down. And then there's other areas where we look at things like
botnets and finding the command and control servers and actually going to target them and
reach out to the hosting provider to get that shut down, and the domain registrars as well.
So those are some examples of where we've started some of that work and pushed pretty aggressively on it.
I started my career in tech running, well, what I thought were large-scale email systems.
Compared to what you folks are doing,
it was at the scale of "that's cute," at a university.
But managing a lot of the spam that was coming in was sort of a hobby horse of mine.
I wound up getting dragged along fairly far down that path.
But today, if I were to set up a web server somewhere on the internet,
or sorry, set up an email server somewhere on the internet and start turning it into an open relay or
sending ridiculous spam out of it, it would not be very long at all before every provider, within
some small degree of rounding error, would no longer accept traffic from that server. They
would effectively black hole it. It would wind up on a bunch of block lists, and that would be the end of it. I'm curious why that pattern doesn't tend to follow a lot of these
network providers who do a poor job of policing the traffic that they are emitting. Is that just
because they're so big that it's difficult to wind up seeing it all from their side? Is it that
they're too big to block, as in people are just not going to block AT&T, for example? Or is there something more to it?
I mean, I think every network has their
own policy of how they deal with this. I think some networks actually are proactive and they
look and ask, are we sending any abuse out? And you definitely find cases where there are other networks
that could do a better job. I know from the AWS perspective, we certainly have various
detection and mitigation capabilities if we ever see anything anomalous leaving our network.
And one of the things in the last few years is that we've actually up-leveled that to look for communication to command and control servers
that might be out there on the Internet, and actually block that communication. That prevents resources from launching attacks in the first place, and our trust and safety team will also go and engage with customers to say, hey, you're talking to this thing.
So I think we do a really good job of that sort of preventative
work, where I think a number of other networks out there just haven't gotten to that area.
Maybe, you know, the abuse team may not be funded appropriately.
I can't really speak to how other networks operate, but it's definitely a high priority for us.
Something that I do want to call out is in the
early days when, even before SES came out, the EC2 IP ranges were generally in some cases a source
of abusive traffic. And this is not necessarily any fault of your own. It's when you wind up letting anyone
start using computers with the swipe of a credit card instantly, that that's an incredibly powerful
thing. Not everyone is a good actor trying to build a business. Sometimes
it's just, I want everyone to see my marketing, and it devolves massively from there very quickly.
And you see that tension somewhere, where people sometimes find it challenging to get out of the
SES sandbox for some workloads. Having worked with the SES team enough, I am of the opinion that they make the right call
most of the time. But in the early days, AWS's traffic, especially once SES launched, was viewed
in the anti-spam community with some suspicion and distrust. I think on some level, that's probably
a function of scale where, well, they're too big to really be able to communicate with anyone over
there. So of course, they're going to be a bad actor.
I don't see that anymore.
There has been a tremendous focus somewhere on tamping down that behavior.
But it's also happening from the perspective of not inconveniencing legitimate customers.
That feels like an impossible tightrope to walk, but somehow you folks have done it.
Yeah, I don't work with the SES team that closely,
but I'm aware of some of the efforts
that they've done in terms of how they control
and their detection systems that they built
to prevent that sort of activity.
But we could follow up with you
with more details on some of that.
There's more to it than I believe just email.
That's the one that I have the best experience with.
But I do not hear particular stories.
When you hear about the various forms of
novel network attacks
and the rest,
and you start looking at
some of the traces
that wind up getting published,
of "here are the bad actor IPs
that are helping to slam this thing,"
I don't see AWS represented
nearly as much as I would expect
relative to the sheer number,
the sheer size of the IP space
that you folks control.
There is clearly something
highly proactive going on that is making the internet a better place.
One of the things that we've talked
about in the last year is a system called MadPot, which is basically our honeypot system
that we developed internally several years ago, which lets us basically be a sponge for any
sort of negative activity that's going out there. And so we can ingest that data, we can process it,
and we can determine where is it coming from, basically.
And if it's coming from internal resources, such as from EC2,
we engage with our trust and safety teams directly to reach out,
engage with customers, or take any other sort of mitigating action.
So we have some of the systems in place to detect that
and proactively engage and take action as just one example.
And that MadPot system has been used for a variety of other systems on the DDoS side,
but that's just another example of where that, together with some of the work from our trust and safety team, helps identify and mitigate any sort of outbound malicious or abusive activity.
It's been said for a long time that at AWS, security is job zero.
And I've always interpreted that to mean protecting customers from external bad actors, the end.
And also in many cases from hypothetical insider attacks at AWS.
Here's how we guarantee that even Amazonians can't access your data when it's stored here.
And you have countless white papers on this to the point where, okay, if there's something inaccurate in here, I'm certainly not going to be the one to find it.
I take that at face value just based upon the sheer amount of work you folks have done.
A lot of the work that you're doing seems to be, in many respects, aimed not just at protecting existing customers, but also at the larger Internet's well-being as a whole.
Is that accurate?
Or is that a wildly naive,
Pollyanna optimistic style
misreading of the situation?
No, that's accurate.
And as we went down this journey
around like 2020,
that's when I started pivoting
into the DDoS space.
And, you know, it was not just,
you know, protect AWS infrastructure,
but protect our customers.
But, you know, looking at the data
and collaborating
with other external networks, it was just
a few of us together that said, you know, we can actually take this further.
Like, let's not just observe it and block it, but, you know, we can actually take some
actions here that will be good for the internet as a whole.
And so that's how we started looking at kind of those three different silos of attack traffic
where we saw, hey, there's spoofed traffic coming into our network through peers.
Let's go directly engage with that peer to say,
can you trace the spoofing back and go and filter it and prevent it,
and just make that a daily habit?
And now that one's a little bit more complicated
because you have to go and engage with networks externally
and explain to them what spoofing is.
There's a lot of networks.
Networks have grown.
People who might have been there back in the day aren't there anymore
who maybe are more familiar with it.
So you also have to get over the hump of explaining, with pictures, like, hey, this is what spoofed
traffic is. Yes, we know that's not your IP. Can you go use your NetFlow tooling to
go and figure this out? So that was one area. And then when it came to botnets, it was
just like, well, we've got our MadPot systems. We can find where these botnet command and control
servers are and the domains that they're using. We can go and actually automate and generate the
notices to these hosting providers
to say, here's the data about what's on here.
It's issuing attack commands
to however many thousands of resources around the world.
Please take this down.
And that also goes into the Layer 7 side
where you have resources
where these booters and stressers,
and we didn't really get too much into where these attacks come from,
but the booters and stressers, they set up a number of machines and they get open proxy lists and they just basically go and blast away at them.
And so you could try to mitigate all the proxies on the Internet, or would it be better to go to the source that's actually generating it and focus on that?
And it was really just a few of us together. It wasn't on anyone's roadmap, really.
We were like, this is something we should just go and do.
Let's get it going. And measuring the impact
that it's had, it's been pretty exciting.
Here at the Duckbill Group,
one of the things we do with, you know, my day job
is we help negotiate AWS contracts.
We just recently crossed $5 billion
of contract value negotiated. It solves for fun
problems such as how do you know that your contract that you have with AWS is the best deal you can
get? How do you know you're not leaving money on the table? How do you know that you're not doing
what I do on this podcast and on Twitter constantly and sticking your foot in your mouth. To learn more, come chat at duckbillgroup.com.
Optionally, I will also do podcast voice when we talk about it.
Again, that's duckbillgroup.com.
One thing that I continually have to remind myself of is the sheer scale of the modern internet.
You folks recently announced Direct Connect availability in some locations
at 400 gigabit per second,
which is just monstrously fast.
Now, I can make jokes
because of how I see the world
in terms of data transfer means money,
but ignoring entirely the
economic impact of that,
the sheer scale of peering
between AWS and Comcast,
given the disturbing proportion
of the internet
that it sometimes feels like you and your peers represent, the sheer volume of traffic that must be.
It almost beggars belief to be able to even picture that sense of scale.
It feels like at some point, even as big as I think it is, the reality is almost certainly much larger than that.
No, I mean, it's definitely in the many, many terabits of capacity. A lot of our teams focus on our backbone network topology, like what are the routes you need to set up backbone links to, and understanding diversity, because, well, cables are going to get cut, terrestrial or subsea, right?
And so how much additional capacity do you need to provision on alternate paths to plan for some of these cuts where a terrestrial cut might be short-lived, it might be a day or two, whereas a subsea could be weeks or months, right?
So you have to put a lot of planning into actually having a lot of this capacity there
and standing by.
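Editor's sketch: the planning question above (how much spare capacity must sit on alternate paths so that a cut does not drop traffic) can be illustrated with a single-failure check. The path names, capacities, and demand figure below are made up, and real planning would also weigh repair times, since a subsea cut lasting weeks is far more likely to overlap with a second failure than a terrestrial cut lasting a day or two.

```python
from typing import Dict


def shortfall_after_single_cut(path_capacity_gbps: Dict[str, float],
                               demand_gbps: float) -> Dict[str, float]:
    """For each path, return how much demand is left unserved if that path is cut.

    A value of 0.0 means the remaining paths can absorb the full demand.
    """
    total = sum(path_capacity_gbps.values())
    return {
        name: max(0.0, demand_gbps - (total - capacity))
        for name, capacity in path_capacity_gbps.items()
    }


if __name__ == "__main__":
    # Hypothetical links between two regions (illustrative numbers only).
    paths = {"terrestrial-a": 400.0, "terrestrial-b": 400.0, "subsea-c": 800.0}
    for name, gap in shortfall_after_single_cut(paths, demand_gbps=1000.0).items():
        print(f"cut {name}: shortfall {gap} Gbps")
```

In this made-up case, losing either terrestrial path is survivable, but losing the subsea path leaves a gap, so the alternate paths would need more capacity standing by.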
And then there's the internet side.
Once you get to the actual edge of the network,
you actually have to go and capacity plan
with all these external networks, right?
And one of the things that's really been helpful for us
is that in the last several years,
we've taken a lot of the data center technology,
network technology that we've used there,
and we've actually brought that into basically the border,
kind of the ISP border backbone side of it, where we've taken some of these smaller
commodity chipset devices and actually used them in the internet scale, which is something that is
not super common out there. And so that's really allowed us to basically get into these
N by however many hundred gigs or N by however many 400 gigs. And it's
allowed us to scale up rapidly and stay ahead of things.
You folks, I think earlier this year,
had a blog post or, I don't know if it was a blog post or white paper. I know it was esoteric
compared to a lot of the stuff that you folks put out, which frankly, I'm here for. It talked
about migrating off of a bunch of networking appliances, from legacy vendors was the vibe
that I got, onto the AWS managed firewall offering, and how that
wasn't just a matter of
the capability of handling throughput
at scale, but the ability to get
observability into what those traffic flows
looked like in ways that previously had been very
challenging.
I'm aware of that project and know
the team really well. That was an effort
to basically move off of
hardware-based firewalls in a certain
particular portion of the network. And it really caused the team to look at Network Firewall and
how we were going to leverage the capabilities of that system. And in the end,
it got us to a really good spot because it gave us a level of compartmentalization that we like
with VPCs. It gave us a level of visibility through flow logs and through some of the Network
Firewall capabilities
that we really like.
And so it was a good success story of how like,
hey, we can run these workloads on our products
and it's worked really well.
Well, the question I have is about the way
internal networks work there.
I mean, obviously the way that you are peering
with other folks,
you aren't rolling out your own custom special version
of BGP because as it turns out,
when it comes to the internet,
interoperability is kind of a big deal.
But at re:Invent two years ago,
you folks talked about an internal TCP replacement protocol,
SPF or something like that.
And it was, this is fascinating what you talked about.
It makes latency to EBS a lot lower,
was I think the story that got told.
It was, this is fascinating from a protocol perspective.
Can you tell us more about it?
And the answer was no.
Great.
Awesome.
We'll just sit here and be envious from the outside.
My question is, internally at AWS, when you start getting into the large-scale internal
networking piece, how much of a resemblance does it bear to what you might expect
from commercial offerings or someone working in a Cisco lab to pass a
certification, where everything just scales up from there? Is it complete Wonderland-style stuff,
or is it just the basics you would expect anywhere else, writ large?
I would say that certainly at network interconnection points between,
let's say, EC2 and the border network, that's where you'll typically still find things
like BGP operating.
We certainly use BGP.
Obviously, externally, we have to, to the internet.
Within the data centers themselves,
there's a mixture of different existing open standard routing protocols.
But in the last few years,
there's been some effort to actually focus on,
can we build additional protocols that can provide us
more rapid convergence and more unique topologies?
So there's definitely active work going on there to actually look at.
Because some of these protocols, like OSPF, do have their own limitations.
And you could modify them and twist them and turn them in certain ways.
But there's also some benefit in saying, can we rethink how we do link adjacencies
and how we do path calculations?
So certainly within the data center space, there's some of that innovation that's been going on there.
And another part to consider is that a lot of what we do in terms of
traffic steering is through controllers,
right?
So we have different software-based controllers for when you have traffic that
goes, let's say, to the internet.
You know,
routing protocols don't know a lot about performance,
right?
They don't understand latency.
BGP doesn't capture that.
So a lot of it is behind the scenes:
we have controllers that actually look at performance data from the system that feeds into CloudWatch Internet Monitor
to actually steer things, to say,
okay, you need to move this prefix over to this location.
Okay, this other path's latency has gotten better.
Let's shift it over there.
Does it fit?
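Editor's sketch: the decision described above (prefer the lower-latency path, but only if the traffic fits) might look roughly like the following. This is not AWS's controller. The `Path` fields, the path names, and the selection rule are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    """Hypothetical egress path as a controller might model it."""
    name: str
    latency_ms: float     # measured performance, which BGP itself does not carry
    capacity_gbps: float
    load_gbps: float


def choose_path(paths: List[Path], prefix_demand_gbps: float) -> Optional[Path]:
    """Pick the lowest-latency path that still has room for the prefix's traffic."""
    # "Does it fit?": keep only paths with enough headroom for this prefix.
    candidates = [
        p for p in paths
        if p.load_gbps + prefix_demand_gbps <= p.capacity_gbps
    ]
    # Among those, prefer the best measured latency; None if nothing fits.
    return min(candidates, key=lambda p: p.latency_ms, default=None)


if __name__ == "__main__":
    paths = [
        Path("peer-a", latency_ms=12.0, capacity_gbps=100.0, load_gbps=95.0),
        Path("peer-b", latency_ms=20.0, capacity_gbps=100.0, load_gbps=40.0),
    ]
    chosen = choose_path(paths, prefix_demand_gbps=10.0)
    print(chosen.name if chosen else "no path has headroom")
```

Here the faster path is skipped because the prefix would push it past capacity, which is the kind of judgment a routing protocol alone would not make.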
So it's not just the protocols themselves.
It's also the controllers that actually manipulate
the routers themselves and forwarding.
[...] guaranteed way to wind up beating it was to be able to throw more bandwidth at it than the attacker
could summon. The problem is with malware being what it is and the scale of the internet today,
they more or less wind up with infinite levels of bandwidth. So at some point,
that just becomes an arms race. How have you been doing around the area of DDoS disruption?
So, I mean, you're accurate in that, like, yes, the attacks get bigger and bigger. You know,
it used to be, you know, hundreds of gigabits, and now you're seeing attacks in the low terabits per second.
And, you know, in order to address that, you need to have a really large front door, right?
And so that is one of the things that AWS does have, you know, at our scale: we do have those large front doors,
whether CloudFront or Application Load Balancer, where you can basically absorb some of that traffic level.
So that's certainly critical in order to be able to operate in that space.
Now, in terms of disruption, it really comes down
to identifying, through some of our systems like MadPot,
where these attacks are coming from and then engaging
with those external network operators
to basically say, hey, there's a C2 server
that needs to be taken down.
It's clearly hosting bad things.
Can you shut it down, please?
Or to a domain registrar: can you take this domain down, because it is hosting a C2?
With the layer seven attacks, it's interesting because it's actually typically a lot of Node.js scripts
running on machines with lots of memory and a proxy list that someone imports,
and it has some orchestration.
So typically, a lot of these DDoS operators, they have storefronts,
and those storefronts are kind of hidden a little bit further away from where the attacks actually get generated.
So a lot of the focus that we've done is looking at the actual infrastructure that can generate these and direct engagement with those networks to shut it down where possible.
There was a school of thought for a while about hackback attacks, where, oh, someone is attacking you, so you just go ahead and wind up breaking into their systems and the rest.
And I was a little concerned because that's always been a dicey proposition at best. So when you started talking in your talk at NANOG about the idea of disrupting these attacks,
it's like, oh no, this is about to go somewhere disastrous. And no, you kept it very much in the
correct direction. And I do keep a hand in the space just to make sure that people aren't
increasingly suggesting debunked ideas from the early aughts again, because enough time has passed and people don't think that,
oh, well, this time it's sure to work.
But your holistic approach to it has really been something of note.
No, I think definitely on the spoofing side,
there's a lot of collaboration with networks.
And occasionally we do get a network that can be difficult to deal with, right?
And so we'll sometimes talk to their peers as well,
or maybe their upstream provider, you know, we're not getting through to them and we'll talk
to them and be like, Hey, this is coming from your downstream network. Like, what are your options
here that we can, we can do? So we definitely focus on, you know, being nice and communicating
through email or personal contacts to, to address whatever the issue is. And, and it's a mixture of
things of like education, right? Some of these networks just don't know.
It's interesting that broadband networks have done a really good job of preventing spoofing
by default, right? You get a cable or DSL line, you can't spoof on it.
But it's typically kind of the hosting shops that we find that have typically,
oh, if you've got a dedicated server, then you can spoof, right? So a lot of it comes down to education
and saying, you should make this the default, right?
Or when somebody asks, sometimes people can ask their hosting provider, say, hey, I need to spoof for whatever use case, right?
Sometimes they call it IP header modification, IPHM.
They'll ask for that filtering to be removed.
It's like, okay, we've talked to hosters.
They're like, oh, this customer asked for it to be removed.
We're like, well, you might want to be a little skeptical about it next time, if you can, please.
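[Editor's sketch, not from the interview: the "no spoofing by default" behavior described above amounts to source address validation, where a provider only forwards packets whose source address belongs to a prefix it assigned to that customer. The function names and the example prefixes below are hypothetical, and real networks implement this in router configuration rather than in Python.]

```python
import ipaddress


def is_spoofed(src_ip, assigned_prefixes):
    """True if src_ip is outside every prefix assigned to the customer."""
    src = ipaddress.ip_address(src_ip)
    return not any(src in ipaddress.ip_network(p) for p in assigned_prefixes)


def egress_filter(source_ips, assigned_prefixes):
    """Keep only the source addresses the customer is allowed to send from."""
    return [ip for ip in source_ips if not is_spoofed(ip, assigned_prefixes)]


# Hypothetical dedicated server that was assigned a single /28.
assigned = ["192.0.2.0/28"]
allowed = egress_filter(["192.0.2.5", "198.51.100.7", "192.0.2.14"], assigned)
```

[Here the packet claiming to come from 198.51.100.7 is dropped because that address was never assigned to the customer. Granting an exemption, the IPHM request mentioned above, is equivalent to switching this check off for that customer.]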
Yeah, once it's been removed,
what is the behavior they start doing?
What are you seeing going across the wire?
Yeah, trust but verify.
You see all these packets per second that spikes up, right?
And it's all to UDP destination port 53 or 389.
Like that's a pretty good clue, right?
And so that's some of the things
that we do try to educate networks.
So it's like, this is what it looks like.
Here are the different heuristics
or things that you can look for as a network operator
to find this going on in your network.
And so that's what we've been really spending a lot of time on, trying to educate and be
like, here's how you can use some of your off-the-shelf NetFlow tools and some of our
open source so that you can actually dig in on this and find it on your own.
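[Editor's sketch, not from the interview: the clue described above, a packets-per-second spike that is almost entirely UDP to destination port 53 or 389, can be turned into a simple check over flow records. The Flow record layout, the customer identifier, and both thresholds are assumptions chosen for illustration. They are not taken from any particular NetFlow tool, and sampled flow data would need its packet counts scaled by the sampling rate first.]

```python
from collections import defaultdict
from dataclasses import dataclass

UDP = 17                     # IP protocol number for UDP
REFLECTOR_PORTS = {53, 389}  # the two destination ports named in the interview


@dataclass(frozen=True)
class Flow:
    customer: str  # hypothetical identifier for the customer port or server
    protocol: int  # IP protocol number
    dst_port: int
    packets: int


def suspicious_customers(flows, interval_s, min_pps=10_000, min_share=0.9):
    """Flag customers whose traffic is both high-rate and dominated by
    UDP packets sent to the reflector ports during the interval."""
    total = defaultdict(int)
    reflector = defaultdict(int)
    for f in flows:
        total[f.customer] += f.packets
        if f.protocol == UDP and f.dst_port in REFLECTOR_PORTS:
            reflector[f.customer] += f.packets

    flagged = []
    for customer, pkts in total.items():
        pps = pkts / interval_s
        share = reflector[customer] / pkts if pkts else 0.0
        if pps >= min_pps and share >= min_share:
            flagged.append(customer)
    return sorted(flagged)


flows = [
    Flow("server-a", UDP, 53, 700_000),  # high rate, nearly all UDP/53
    Flow("server-a", 6, 443, 10_000),
    Flow("server-b", 6, 443, 900_000),   # high rate, but ordinary TCP/443
    Flow("server-c", UDP, 53, 1_000),    # UDP/53, but low rate
]
flagged = suspicious_customers(flows, interval_s=60)
```

[Only server-a is flagged in this example. A busy but legitimate DNS resolver could match the same pattern, so a hit like this is a prompt to investigate rather than proof of spoofing.]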
And I think that's where we've had a lot of success.
And there are some networks that are in that mode or they actually do find it on their
own and they deal with it.
By the time you reach out to them, they're like, hey, it's already taken care of. It's like, that's amazing. I'm glad we've got you in a good spot now.
[...] of other people who have been doing this for a very long time, eventually parts wear out and
need to be replaced. As much as some of us might want to live forever, that is not an option that
is currently available. Where does the next generation of people who will do in the future
what you do today, where do they come from?
Yeah, no, that's a great question because I
think we struggle with that too sometimes in terms of how do you find talent and how do you,
you know, one of the Amazon leadership principles I like a lot is learn and be curious,
right? And I think, you know, trying to identify folks who have that learn and be curious of like,
hey, I want to go deeper here. I want to understand this a little bit more, you know, don't
maybe just treat this as yet another attack, but like actually understand what's going on behind
it. Like what's actually generating this, right? So a fair amount is just kind of identifying folks
who are interested and, you know, presenting opportunities for them, right? And I think that is the, you know,
as senior technical leaders, like you have to present opportunities for others. And sometimes
it may not go the way you expect, but that's fine. You have to learn. And basically, you know,
allowing people, you know, connecting them with other folks externally, right? Whether that be
external forums, different trust groups, and just how do you basically like, hey, I want to get you into this. And I can, you know, serve as basically connecting
them with other folks, giving the opportunity to take something and running with it, you know,
talking about it after the fact. But it definitely requires like real effort, right? To actually,
you know, help and educate at the same time, which is like, hey, I'm going to have to, you know,
let me try to explain this to you as best as I can. If you have any questions, let me know, no matter what, silly, good, bad,
whatever it is, like I'm here to help, right? I want to make you successful. And I think
certainly as senior technical folks, we definitely need to be growing other folks. And it needs,
you have to carve out the time and resources for it.
Do you find that those folks are matriculating
into your org as having studied networking and that that was the direction they wanted to go in, or are they basically phasing in from other technical areas?
I've seen all types. It's not always purely people with a networking background. I've seen people, you know,
and I've had this conversation with folks before in some of these areas where they're like, well,
we're not security engineers. I'm like, neither am I. Like, no, this is just
purely, you know, an area to immerse yourself in. And it was kind of my journey, too, when I got in the DDoS space.
Because I've always dealt with it on the receiving end, right?
When we build the network infrastructure and seeing attacks come in.
But I never said, I'm going to actually try to understand this.
And so I had to, myself, immerse myself in this domain.
And even internally, working with other teams inside of Amazon, just understanding trust and safety or the fraud team.
And I was like, hey, I'm coming in here as a newbie.
What can I learn?
And I think definitely with other folks,
we've seen people come in from various backgrounds
where it's like, okay, I want to go and learn.
Luckily, we have a lot of tools and data at our disposal
where folks can pick up and go.
And I think it's just really about
connecting people to it.
And particularly when you're centered
around a particular outcome, right?
So, hey, like we want to address this particular issue.
Like, how do we go and lean in here?
And like, what are the different people
that we need to bring together?
So yeah, it's all types of backgrounds.
I really want to thank you for taking the time
to talk to me today.
If people want to learn more, where should they go?
So on the AWS Security Blog,
we've definitely had a number of postings about some
of the things that we've built. If you search for MadPot, that's one.
A more recent thing that we've talked about is Sonaris, which we
just went public about. It's basically a service behind the scenes that actually
detects people trying to go after and attack customers, right?
And it actually blocks them. So I'd
recommend reading some of the things that we've done on
Sonaris, MadPot, and Shield Advanced.
We've got a number of blog posts that are out there.
Yeah, that's a good starting point
to kind of learn some of the things
that we've done in this domain.
And we will definitely make it a point
to put those in the show notes.
Tom, thank you so much for speaking to me.
I really appreciate it.
Oh, thank you for having me.
Tom Scholl, VP and Distinguished Engineer at AWS.
I'm cloud economist Corey Quinn, and this is Screaming in the Cloud. If you enjoyed this
podcast, please leave a five-star review on your podcast platform of choice. Whereas if you hated
this podcast, please leave a five-star review on your podcast platform of choice, along with an
angry, insulting comment, so then I can block that particular platform from syndication, because
that's how it works.