Screaming in the Cloud - Summer Replay - Ironing out the BGP Ruffles with Ivan Pepelnjak
Episode Date: July 11, 2024

If you need a point of contact for all things networking, then look no further than Ivan Pepelnjak. Ivan is the webinar author at ipSpace.net where he is working on making networking an approachable subject for everyone. From teaching to writing books, Ivan has been at it for a long and storied career, and as a de facto go-to for networking knowledge, you can’t beat him. In this Summer Replay of Screaming in the Cloud, Ivan and Corey discuss Ivan’s status as a CCIE Emeritus and the old days of Cisco. Ivan also levels his network engineering expertise and helps Corey answer some questions about BGP and its implementation. Ivan aptly narrows it down into “layers” that he kindly runs us through. So tune in for a Dante-esque descent into BGP, DNS and Facebook, seeing out the graybeards of tech, and more!

Show Highlights:
(0:00) Intro to episode
(1:23) Panoptica sponsor read
(2:04) The world of VaxVMS
(2:39) The significance of being a CCIE emeritus
(5:02) The value of certification in the modern tech world
(7:37) BGP and networking
(12:41) Internal vs. external BGPs
(15:23) “Unfair criticisms” of BGP
(17:35) Differences between BGP and DNS
(23:19) Cloud growth vs. loss of networking engineers
(24:57) Panoptica sponsor read
(25:20) Outsourcing admin work
(27:45) Breaking down the Facebook DNS outage
(31:37) Disconnect at the data center
(37:06) Where you can find Ivan

About Guest:
Ivan Pepelnjak, CCIE#1354 Emeritus, is an independent network architect, blogger, and webinar author at ipSpace.net. He's been designing and implementing large-scale service provider and enterprise networks as well as teaching and writing books about advanced internetworking technologies since 1990.

Links Referenced:
ipSpace.net: https://ipspace.net
Original Episode: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/ironing-out-the-bgp-ruffles-with-ivan-pepelnjak/

Sponsor
Panoptica: https://www.panoptica.app/
Transcript
they have DNS servers around the world, and the DNS servers serve the local region, if you wish.
And that DNS server then decides what facebook.com really stands for.
So if you query for facebook.com, you'll get a different answer in Europe than in US.
Welcome to Screaming in the Cloud. I'm Corey Quinn. I have an interesting and storied career path. I dabbled in security engineering slash infosec for a while before I realized that being
crappy to people in the community wasn't really my thing. I was a grumpy Unix systems administrator
because it's not like there's a second kind of those out there.
And I dabbled ever so briefly in the wide world of network administration slash network engineering slash plugging the computers in to make them talk to one another ideally correctly.
But I was always a dabbler.
When it comes time to have deep conversations about networking, I immediately tag out and look to an expert.
My guest today is one such person. Ivan Pepelnjak is oh so many things. He's a CCIE emeritus. And well, let's start there. Ivan, welcome to the show. Thanks for having me. And oh, by the way,
I have to tell people that I was a VAX/VMS administrator in those days.
This episode has been sponsored by our friends at Panoptica,
part of Cisco.
This is one of those real rarities
where it's a security product
that you can get started with for free,
but also scale to enterprise grade.
Take a look.
In fact, if you sign up for an enterprise account,
they'll even throw you one of the limited,
heavily discounted AWS Skill Builder licenses they got because, believe it or not, unlike so many companies out
there, they do understand AWS. To learn more, please visit panoptica.app slash last week in AWS.
That's panoptica.app slash last week in AWS. Oh, yes.
The VaxVMS world was fascinating.
I talked to a company that was finally emulating them on physical cards
because that was the only way to get them there.
Did you refer to them as Vaxen or Vaxes?
Or how did you wind up referring?
Vaxes.
Vaxes.
Okay.
I was on the other side of that with the inappropriately pluralizing
anything that ends with an X with an EN, Vaxen, and the rest.
And that's why I had no friends for many years.
You do know what the first Vax was, right?
I do not.
It was a Swedish Hoover company.
Ooh.
And they had a trademark dispute with Digital over the name,
and then they settled that.
You describe yourself in your bio as a CCIE emeritus,
and you give the number, which is low, number 1354. Now,
I've talked about certifications on this show in the context of the modern era and whether it makes
sense to get cloud certifications or not, but this is from a different time. Understand that for many
listeners, these stories might be older than you are in some cases, and that's okay. But Cisco at one point, believe it or not, was a shining beacon of the industry, the
kind of place that people wanted to work at.
And their certification path was no joke.
I got my CCNA from them, Cisco Certified Network Administrator.
And that was basically a byproduct of learning how networks worked.
There are several more tiers beyond that, culminating in the CCIE, which stands for Cisco Certified Internetworking Expert,
or am I misremembering? No, no, that's it.
Perfect. And that was known as the doctorate of networking in many circles for many years.
Back in those days, if you had a CCIE, you were guaranteed to be making an awful lot of money at
basically any company you wanted to
because you knew how networking worked.
In the US.
Well, in the US, true.
There's always the interesting stories
of working in places that are trying to go
with the lowest bidder for networking gear
and you wind up spending weeks on end
trying to figure out why things are breaking intermittently
and only to find out at the end
that someone saved 20 bucks by buying cheap patch cables. I digress, and I still have the scars from those. But it was
fascinating in those days because there was a lab component of getting those tests. There were
constant rumors that in the middle of the night during the two-day certification exam, they would
come in and mess with the lab and things you'd set up so you could fix it the following day.
That is true. Yeah. So in the good old days when the lab was still physical,
they would even turn the connectors around
so that they would look like they would be plugged in,
but obviously there was no signal coming through.
And they would mess with the jumpers on the line cards and all that stuff.
So when you got your broken lab, you really had to work hard, you know,
from the physical layer up, from the jumpers,
and that would mess up your config and everything else.
It was, you know, the real deal,
the thing you would experience in real world
with underqualified technicians putting stuff together.
Let's put it this way.
I don't wish to besmirch our brethren
working in the data centers, but having worked with folks who did some hilariously awful things with cabling
and having been one of those people myself from time to time, it's hard to have sympathy
when you just spent hours chasing it down. But to be clear, the CCIE is one of those things where
in a certain era, if you're trying to have an argument on the internet with someone about how networks work and their response is, well, I'm a CCIE, yeah, the conversation
was over at that point. I'm not one to appeal to authority on stuff like that very often, but it's
the equivalent of arguing about medicine with a practicing doctor. It's the same type of story.
It is someone where if they're wrong, it's going to be in the very fringes or the nuances back in this era. Today, I cannot speak to the quality of CCIEs. I'm not attempting to
besmirch any of them, but I'm also not endorsing that certification the way I once did.
Yeah, well, I totally agree with you. When this became, you know, a mass certification,
and the reason it became a mass certification is because reseller discounts
are tied to reseller status, which is tied to the number of CCIEs they have. It became, you know,
this, well, still high-end, but commodity that you simply had to get to remain employed because
your employer needed the extra two-point discount. It used to be that the prerequisite for getting the certification
was beyond other certifications. You spent
five or six years working on things.
Well, that was what
gave you the experience
you needed, because in those days
there were no bootcamps. Today you have a bootcamp.
Now there's bootcamp, brain dump things, where it's
we're going to train you for four straight weeks
of nothing but this. Teach to the
test. And okay.
Yeah.
No, it's even worse.
There were rumors that some of these boot camps in some parts of the world that shall remain unnamed
were actually teaching you how to type in the commands from the actual lab.
Even better.
Yeah.
You don't have to think.
You don't have to remember.
You just have to type in the commands you've learned. You're done. There's an arc to the value of a certification. It comes out,
no one knows what the hell it is, and suddenly it's great. You can use that to really identify
what's great and what isn't. And then it goes at some point down into the point where it becomes
commoditized and you need it for partner requirements and the rest. And at that point,
it is no longer something that is a reliable signal of anything other than that someone spent some time and or money. Well, are you talking about bachelor degree
now? Well, no, I don't have one of those either. I have an eighth grade education because I'm about
as good of an academic as it probably sounds like I am. But the thing that really differentiated in
my world, the difference between what I was doing in the network engineering sense and the things
that folks like you who are actually, you know, professionals rather than enthusiastic amateurs
took into account was that I was always working inside of the LAN, the local area network,
inside of a data center. Cool. Everything here inside the cage, I can make talk to each other.
I can screw up the switching fabric, et cetera, et cetera. I didn't deal with any of the WAN, wide area network,
think internet in some cases.
And at that point, we're talking about things like BGP
or OSPF in some parts of the world,
or RIP, or RIPv2 if you make terrible life choices.
But BGP is the routing protocol
that more or less powers the internet.
At the time of this recording,
we're a couple weeks past
a BGP kerfuffle that took Facebook down for a number of hours, during which time the internet
was terrific. I wish they could do that more often. In fact, it was almost like a holiday.
It was fantastic. I took my elderly relatives out and got them vaccinated. It was glorious.
Now we're back to having Facebook and terrific. The problem I have
whenever something like this happens is there's a whole bunch of crappy explainers out there of
what is BGP and how might it work? And people have angry opinions about all of these things.
So instead, I prefer to talk to you, given that you are a networking trainer. You have taught
people about these things. You have written books. You have operated large-scale environments.
I even developed a BGP course for Cisco.
You taught it for Cisco, of all places,
back when that was impressive and awesome and not a has-been.
Honestly, I feel like I could go there and still wind up going back in time
and still it's the same Cisco in some respects.
They evolved into a dinosaur, and they got frozen in amber.
But let's start at the very beginning.
What is BGP?
Well, you know, when the internet was young,
they figured out that we aren't all friends on the internet anymore,
and that I want to control what I tell you,
and you want to control what you tell me,
and furthermore, I want to control what I believe from what you're telling me. So we
needed a protocol that would implement policy, where I could say, I will only announce my customers to
you, but not what I've heard from Verizon, and you would do the same. And then I would say, well, but I
don't want to hear about that customer of yours because he's also my customer. So we need some sort of policy. And so they invented a protocol
where you would tell me what you have,
I would tell you what I have,
and then we would both choose what we want to believe
and follow those paths to forward traffic.
And so BGP was born.
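To make the "I control what I tell you, and I control what I believe" idea concrete, here is a minimal Python sketch. It is not how any real router implements BGP policy. The `Route` type, the neighbor labels ("customer", "peer", "upstream"), and the prefixes (taken from the documentation address ranges) are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str        # e.g. "203.0.113.0/24" (documentation range)
    learned_from: str  # hypothetical label: "customer", "peer", or "upstream"


def export_policy(routes, neighbor_kind):
    """Decide what we tell a neighbor.

    Toy rule: customers hear everything we know; everyone else
    hears only the routes we learned from our own customers.
    """
    if neighbor_kind == "customer":
        return list(routes)
    return [r for r in routes if r.learned_from == "customer"]


def import_policy(announced, rejected_prefixes):
    """Decide what we believe out of what a neighbor tells us."""
    return [r for r in announced if r.prefix not in rejected_prefixes]


my_routes = [
    Route("203.0.113.0/24", "customer"),
    Route("198.51.100.0/24", "upstream"),
]

# What we announce to a peer: our customer, but not what we heard upstream.
to_peer = export_policy(my_routes, "peer")

# What we accept from that peer: we ignore their route to a shared customer.
from_peer = [
    Route("192.0.2.0/24", "customer"),
    Route("203.0.113.0/24", "customer"),
]
accepted = import_policy(from_peer, rejected_prefixes={"203.0.113.0/24"})

print([r.prefix for r in to_peer])
print([r.prefix for r in accepted])
```

The point of the sketch is only that both directions are filtered independently: each side decides what to say and what to believe.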
On some level, it seems like it's this faraway thing
to people like me
because I have a
residential internet connection, and I am not generally allowed to make my own BGP announcements
to the greater world. Even when I was working in data centers, very often the BGP was handled by
our upstream provider, or very occasionally by a router they would drop in with the easiest
maintenance instructions in the world for me of step one, make sure it has power.
Step two, never touch it. Step three, we'd prefer if you don't even look at it and remain at least
20 feet away to keep from bringing your aura near anything we care about. And that's basically how
you should do with me in the context of hardware. So it was always this arcane magic thing.
Well, it's not, you know, it's like a power transmission. When you know
enough about it, it stops being magic. It's technology. It's a bit more complicated than
some other stuff. It's way less complicated than some other stuff like quantum physics.
But still, it's so rarely used that it gets this aura of being mysterious. And then, of course,
everyone starts getting their opinion,
particularly the graduates of the Facebook Academy.
And yes, it is true that usually BGP would be used between service providers.
So whenever, you know, we are big enough to need policy,
if you just need one uplink, there is no policy there.
You either use the uplink or you don't use the uplink.
If you want to have two different links to two different points of presence or to two different
service providers, then you're already in the policy land. Do I prefer one provider over the
other? Do I want to announce some things to one provider but other things to the other? Do I want to take local customers from
both providers because I want to, you know, have lower latency because they are local customers?
Or do I want to use one solely as the backup link because I paid so little for that link that I know
it's shitty? So you need all that policy stuff. And to do that, you really need BGP. There is no other routing protocol in the world
where you could implement that sort of policy
because everything else is concerned mostly with
let's figure out as fast as possible
what is reachable and how to get there.
And BGP is like, hey, slow down.
There's policy.
Yeah.
In the context of someone whose primary interaction
with networks is their home internet, where there's a single cable coming in from the
outside world, you plug it into a
device, maybe yours, maybe your ISP's, maybe we don't care. That's sort of the end of it. But think in
terms of large interchanges where there are multiple redundant networks to get from here
to somewhere else, which one should traffic go down at any given point in time? Which networks
are reachable on the other end of various distant links? That's the sort of problem that BGP is very
good at addressing and what it was built for. If you're running BGP internally in a small network,
consider not doing exactly that. Well, I've seen two use cases, well, three use cases for people running BGP internally.
Okay, this I want to hear because I was always told, no touchy. But, you know,
I'm about to learn something. That's why I'm talking to you. The first one was multinationals
who needed policy. Yes, many multi-site environments, large-scale companies that
have redundant links. They're trying to run full mesh in some cases or partial mesh
between a bunch of facilities.
In this case, it was multiple continents
and really expensive transcontinental links.
And it was, I don't want to go from Europe to Sydney over US.
I want to go over Middle East.
And to implement that type of policy,
you have to split, you know, the whole network into
regions, and then each region is what BGP calls an autonomous system, so that it gets its tag,
its autonomous system number, and then you can do policy on that, saying, well, I will not announce
Asian routes to Europe through US, or I will make them less preferred, so that if the Middle East region
goes down, I can still reach Asia through US, but preferably I will not go there. The second one is
yet again large networks where they had too many prefixes for something like OSPF to carry.
And so their OSPF was breaking down and the only way to solve that was to go
to something that was designed to scale better, which was BGP. And third one is if you want to
implement some of the stuff that was designed for service providers initially, like VPNs,
layer two or layer three, then BGP becomes this kitchen sink protocol.
You know, it's like using Route 53 as a database.
We are using BGP to carry any information anyone ever wants to carry around.
I'm just waiting for someone to design JSON in BGP RFC,
and then we are, you know, where we need to be.
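The first use case above, preferring the Middle East path and keeping the US path only as a fallback, can be sketched as a simplified path selection. Real BGP uses a much longer decision process. This sketch assumes only two steps (higher local preference wins, then shorter AS path), and the region names, private AS numbers, and preference values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    via_region: str
    as_path: tuple   # private AS numbers, chosen only for illustration
    local_pref: int  # higher means more preferred


def best_path(paths):
    """Pick the highest local preference; a shorter AS path breaks ties."""
    return max(paths, key=lambda p: (p.local_pref, -len(p.as_path)))


# Two ways from the European region to the Asian region.
europe_to_asia = [
    Path("US", (64512, 64514), local_pref=50),            # made less preferred
    Path("Middle East", (64513, 64514), local_pref=200),  # preferred
]

normal_choice = best_path(europe_to_asia)

# If the Middle East region goes down, only the US path remains.
surviving = [p for p in europe_to_asia if p.via_region != "Middle East"]
fallback_choice = best_path(surviving)

print(normal_choice.via_region)
print(fallback_choice.via_region)
```

Making the US path less preferred instead of filtering it out entirely is what keeps Asia reachable when the preferred region fails.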
I feel on some level like BGP gets relatively unfair criticism because the only time it really intrudes on the general awareness is when something
has happened and it breaks. This is sort of the quintessential network or systems or honestly
computer type of issue. It's either invisible or you're getting screamed at because something isn't
working. It's almost like a utility on some level.
When you turn on a faucet, you don't wonder whether water is going to come out this time.
But if it doesn't, there's hell to pay.
Unless it's brown.
Well, there is that.
Let's stay away from that particular direction.
There's a beautiful metaphor probably involving IBM if we do.
So the challenge, too, when you look at it, is that it's this weird esoteric thing that isn't super well
understood. And as soon as it breaks, everyone wants to know more about it. And then in full-on
charging to the wrong side of the Dunning-Kruger curve, it's, well, that doesn't sound hard. Why
are they so bad at this? I would be able to run this better than they could. I assure you, you
can't. This stuff is complicated. It is nuanced. It is difficult. But the common question is,
why is this so fragile and able to easily break?
I'm going to turn that around.
How is it that something that is this esoteric and touches so many different things works
as well as it does?
Yeah, it's a miracle, particularly considering how crappy the things are configured around
the world.
There have been periodic outages of sites when some
ISP sends out a bad BGP announcement and their upstream doesn't suppress it because, hey,
you misconfigured things and suddenly half the internet believes, oh, YouTube now lives in this
tiny place halfway around the world rather than where it's currently being anycasted from.
Called Pakistan, to be precise.
Exactly. There was an actual incident there. We are not dunking on Pakistan as an example of a faraway place. No, no. A Pakistani ISP wound up doing
exactly this and taking YouTube down for an afternoon a while back. It's a common problem.
Yeah. The problem was that they tried to stop local users accessing YouTube. And they figured
out that, you know, YouTube is announcing this prefix, and if they would announce two more-specific prefixes, then, you know, they would attract
the traffic and the local users wouldn't be able to reach YouTube. Perfect. But that leaked.
If you wind up saying that, all right, the entire internet is available on this interface,
and a small network of 256 nodes is available on the second interface, the most specific route always wins.
That's why the default route or route of last resort is the entire internet.
And if you don't know where to send it, throw it down this direction.
That is usually in most home environments,
the gateway that then hands it up to your ISP
where they inspect it and do all kinds of fun things to sell ads to you
and then eventually get it to where it's going.
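The "most specific route always wins" rule is longest-prefix matching, and it explains why two leaked more-specific prefixes can pull traffic away from the legitimate announcement. Here is a small sketch using Python's standard `ipaddress` module. The prefixes come from the documentation address ranges and the next-hop labels are hypothetical, so this is not a reconstruction of the actual incident.

```python
import ipaddress


def make_table(entries):
    """Turn (prefix string, next hop) pairs into a list of network objects."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in entries]


def lookup(table, destination):
    """Return the next hop of the longest (most specific) matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# A default route (route of last resort) plus one legitimate prefix.
normal = make_table([
    ("0.0.0.0/0", "default via ISP"),
    ("203.0.113.0/24", "legitimate origin"),
])

# Someone else announces two more-specific halves of the same prefix.
leaked = normal + make_table([
    ("203.0.113.0/25", "leaking network"),
    ("203.0.113.128/25", "leaking network"),
])

print(lookup(normal, "203.0.113.10"))
print(lookup(leaked, "203.0.113.10"))
print(lookup(leaked, "192.0.2.1"))
```

The /24 is still present in the second table. It simply loses to the /25 entries for every address it covers, while unrelated destinations keep falling through to the default route.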
This gets complicated at these higher levels. And I have sympathy for the technical aspects of what
happened at Facebook, no sympathy whatsoever for the company itself, because they basically do far
more harm than they do good. And I've been very upfront about that. But I want to talk to you as
well about something that people are going to be convinced. I'm taking this in my database
direction, but I assure you I'm not. DNS. What is the relationship between BGP and DNS? Which
sounds like a strange question sometimes. There is none. Excellent. It's just that different
large-scale properties decided to implement the global load balancing, global optimal access to their servers
in different ways. So Cloudflare is a typical example of someone who is doing anycast.
They are announcing the same networks, the same prefixes from 100 locations around the world.
So BGP will take care that you always get to the closest Cloudflare pop.
And that's it. That's how they work. No magic. Facebook didn't believe in the power of Anycast
when they started designing their service. So what they're doing is they have DNS servers around the world,
and the DNS servers serve the local region, if you wish.
And that DNS server then decides what Facebook.com really stands for.
So if you query for Facebook.com, you'll get a different answer in Europe than in US.
Just a slight diversion on what anycast is. If I ping Google's public resolver 8.8.8.8, easy to remember, from my computer right now, the packet gets there and
back in about five milliseconds. Wherever you are listening to this, if you were to try that same
thing, you'd see something roughly similar. Now, one of two things is happening. Either Google has
found a way to break the laws of physics and get traffic to a central point faster than light, or the 8.8.8.8 that I'm talking to and the one that you
are talking to are not, in fact, the same computer. Well, by the way, it's 13 milliseconds for me,
and between you and me, it's 200 milliseconds. So yes, they are cheating.
Just a little bit, or unless they tunneled through the earth rather than having to bounce it off of satellites or through cables.
No, even that wouldn't work.
That's what the quantum computers are for.
I always wondered, now we know.
Yeah, they're entangling the replies in advance,
and that's how it works.
Yeah, you're right.
Please continue.
I just wanted to clarify that point,
because I got that one hilariously wrong once upon a time
and was extremely confused for about six months.
Yeah, it's something that no one ever thinks about unless you know you're really running large-scale DNS because, honestly, root DNS servers were
anycasted for ages. You think there are like 12 different root DNS servers. In reality,
there are like 300 instances hidden behind those 12 addresses.
And fun trivia fact, the reason there are 12 addresses is because any more than that would
no longer fit within the 512 byte limit of a UDP packet without truncating.
Thanks for that. I didn't know that.
Of course. Now, EDNS extensions let you go out for the larger stuff, but you can't guarantee
that's going to hit. And what happens when you receive a UDP packet, when you receive a DNS result with a truncate
flag set on the UDP packet, it is left to the client.
It can either use the partial result or it can try and reestablish over a TCP connection.
That is one of those weird trivia questions they love to ask in sysadmin interviews.
But it's, yeah, fundamentally, if you're doing something that requires the root name
servers,
you don't really want to start going down those arcane paths.
You want it to just be something that fits in a single packet,
not require a whole bunch of computational overhead.
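The truncate flag mentioned above is the TC bit in the 12-byte DNS message header. The sketch below builds two fake headers with `struct` and checks that bit. It does not send any network traffic, and the `next_step` helper is a hypothetical name that hard-codes one of the two client choices described above (retrying over TCP instead of using the partial answer).

```python
import struct

TC_FLAG = 0x0200          # "truncated" bit in the DNS header flags field
CLASSIC_UDP_LIMIT = 512   # classic maximum DNS-over-UDP message size, in bytes


def is_truncated(message: bytes) -> bool:
    """Check the TC bit in the 12-byte DNS header."""
    if len(message) < 12:
        raise ValueError("a DNS message needs at least a 12-byte header")
    _msg_id, flags, _qd, _an, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return bool(flags & TC_FLAG)


def next_step(message: bytes) -> str:
    """One possible client choice: retry over TCP when the answer was cut."""
    return "retry over TCP" if is_truncated(message) else "use UDP answer"


# Header fields: id, flags, question count, answer count, authority, additional.
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)   # response, TC clear
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)  # response, TC set

print(next_step(complete))
print(next_step(truncated))
```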
Yeah, and even within those 300 instances, there are multiple servers listening to the same IP address,
and incoming packets are just sprayed across those servers,
and whichever one gets the packet replies to it,
and because it's UDP, it's one packet in, one packet out,
problem solved, it all works.
People thought that this doesn't work for TCP
because you know you need a whole session,
so you need to establish the session, you send the request,
you get the reply, there are acknowledgments, all that stuff.
Turns out that there is almost
never two ways to get to a certain destination across the internet from you. So people thought
that, you know, this wouldn't work because half of your packets will end in San Francisco and
half of the packets will end in San Jose, for example. Doesn't work that way. Why not? Well, because the global internet is so diverse
that you almost never get two equal cost paths
to two different destinations
because it would be San Francisco and San Jose
announcing 8.8.8.8.
And it would be a miracle
if you would be sitting just in the middle
so that the first packet would go to San Francisco,
the second one would go to San Jose, and back and forth.
That never happens.
That's why Cloudflare makes it work
by announcing the same prefix throughout the world.
So I just learned something new about how routing announcements work,
an aspect of BGP, and you,
a few minutes ago, learned something about the UDP size limit and the root name servers. BGP
and DNS are two of the oldest protocols in existence. You and I are also decades into
our careers. If someone is starting out their career today working in a cloudy environment,
there are very few network-centric roles because cloud
providers handle a lot of this for us. Given these protocols are so foundational to what goes on,
and they're as old as they are, are we as an industry slash sector slash engineers losing
the skills to effectively deploy and manage these things? Yes. The same problem that you have in any
other sufficiently developed technology area. How many people can
build power lines? How many people can write a compiler? How many people can design a new CPU?
How many people can design a new motherboard? I mean, when I was 18 years old, I was wire wrapping
my own motherboard with 8-bit processor. You can't do that today. You know, as the technology
is evolving and maturing, it's no longer fun, it's no longer sexy, it stops being a hobby,
and so it bifurcates into users and people who know about stuff. And it's really hard to bridge
the gap from one to the other. So in the end, you have
like these 20 graybeard people who know everything about the technology and the youngsters have no
idea. And when these people die, don't ask me how we'll get any further on. Few things are better
for your career and your company than achieving more expertise in the cloud. Security improves, compensation goes up, employee retention skyrockets.
Panoptica, a cloud security platform from Cisco,
has created an academy of free courses just for you.
Head on over to academy.panoptica.app to get started.
On some level, it feels like it's a bit of a down-the-stack analogy for what happened to me early in my career.
My first systems administration job was running a large-scale email system.
It was a hobby that I was interested in.
I basically bluffed my way into working at a university for a year.
Thanks, Chapman. I appreciate that.
And it was great, but it was also pretty clear to me that with the rise of things like hosted email, Gmail, and whatnot,
it was not going to be the future of what the present day at that point looked like, which was most large companies needed an email administrator.
Those jobs were dwindling.
Now, if you want to be an email systems administrator, there are maybe a dozen companies or so that can really use that skill set, and everyone else just outsources that. That said, at those companies, like Google and
Microsoft, there are some incredibly gifted email administrators who are phenomenal at understanding
every nuance of this. Do you think that that is what we're going to see in the world of running
BGP at large scale, where a few companies really need to know how this stuff works, and everyone
else just sort of smiles, nods, and rolls with it. Absolutely. We are already there. Because, you know, if I am an end customer and I need BGP because I have two uplinks
to two ISPs, that's really easy.
I mean, there are a few tricks you should follow.
And hopefully, some of the guardrails will be built into network operating systems so
that you will really have to configure explicitly that you want to leak
routes between Verizon and AT&T, which is great fun if you have two low-speed links to both of
them and now you're becoming transit between the two, which did happen to Verizon. That's why I'm
mentioning them. Sorry, guys. Anyway, if you are a small guy and you just need two uplinks and maybe do a bit of policy, that's easy, and that's
achievable, let's say, with some Google and paste and throwing spaghetti at the wall and seeing
what sticks. On the other hand, what the large-scale providers, like, for example, Facebook, because
we were talking about them, are doing is like light years away. It's like comparing me turning on the light bulb and someone running,
you know, a nuclear reactor. Yeah, you kind of want the experts running some aspects on that.
Honestly, in my case, you probably want someone more competent flipping the light switch too,
but that's why I have IoT devices here that power my lights. It, on the one hand, keeps me from
hurting myself, and on the other, leads to a nice seasonal feel because my house is freaking haunted.
So, coming back to Facebook. They have these DNS servers all around the world,
and they don't want everyone else to freak out when one of these DNS servers goes away.
So that's why they're using the same IP address for all the DNS servers sitting anywhere in the world. So the name server for facebook.com is the same worldwide, but it's different machines and they will give you
different answers when you ask where is facebook.com. I will get a European answer, you will
get a US answer, someone in Asia will get whatever. And so they're using BGP to advertise the DNS servers to the world so that everyone
gets to the closest DNS server. And now it doesn't make sense, right, for the DNS server to say, hey,
come to European Facebook if European Facebook tends to be down. So if their DNS server discovers
that it cannot reach the servers in the data center,
it stops advertising itself with BGP.
Why with BGP?
Because that's the only thing it can do.
That's the only protocol where I can tell you,
hey, I know about this prefix.
You really should send the traffic to me.
And that's what happened to Facebook.
They bricked their backbone, whatever they did, they never told.
And so their DNS server said, gee, I can't reach the data center.
I better stop announcing that I'm a DNS server because obviously I am disconnected from the rest of Facebook.
And that happens to all DNS servers because, you know, the backbone was bricked.
And so they just, you know, de-peered from the internet.
They stopped advertising themselves.
And so we thought that there was no DNS server for Facebook,
because no DNS server was able to reach their core.
And so all DNS servers were like,
gee, I better get off this,
because I have no clue what's going on.
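The failure mode described here can be shown with a toy model. This is only a sketch of the logic as explained in the conversation, not Facebook's actual implementation: the site names, the `192.0.2.0/24` anycast prefix, and the function names are all assumptions for illustration.

```python
ANYCAST_PREFIX = "192.0.2.0/24"  # documentation range, stands in for the DNS prefix


def desired_announcements(backend_reachable: bool) -> set:
    """A DNS site announces the shared prefix only while it can reach the data center."""
    return {ANYCAST_PREFIX} if backend_reachable else set()


def sites_still_announcing(reachability: dict) -> list:
    """Return the sites that keep advertising the anycast prefix via BGP."""
    return sorted(
        site for site, ok in reachability.items() if desired_announcements(ok)
    )


# One site loses its data center: the others keep answering, nobody notices.
single_failure = {"europe": False, "us-east": True, "asia": True}

# The backbone is down: every site reaches the same conclusion at once.
backbone_down = {"europe": False, "us-east": False, "asia": False}

print(sites_still_announcing(single_failure))
print(sites_still_announcing(backbone_down))
```

Each site's decision is sensible in isolation. The outage comes from all of them making that decision at the same moment, which leaves nobody announcing the prefix.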
So everything was working fine.
Everything was there. It's just that they didn't want to talk to us because they couldn't reach the backend servers. And of course, people blamed DNS first because the DNS servers weren't
working. Of course they weren't. And then they blamed BGP because it must be BGP if it isn't
DNS. But it's like, you know, you're blaming headache and muscle cramps and
high fever, but in fact, you have flu. For almost any other company that wasn't Facebook, this would
have been a less severe outage just because most companies are interdependent on other companies
to run infrastructure. When Facebook itself has evolved the way that it has, everything that they use
internally runs on these same systems. So they wound up almost with a bootstrapping problem.
An example of this in more prosaic terms are, okay, the data center had a power outage. Okay,
now I need to power up all the systems again, and the physical servers I'm trying to turn on
need to talk to a DNS server to finish booting, but the DNS server is a VM that lives on those physical servers. Uh-oh, now I'm in trouble. That is an overly simplified and
real example of what Facebook encountered trying to get back into this, to my understanding.
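The bootstrapping problem in that example is a cycle in a dependency graph, and it can be checked for ahead of time. The following is a small Python sketch using the standard library's `graphlib`; the `boot_order` helper and the component names are hypothetical, chosen to mirror the power-outage example rather than any real Facebook system.

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(depends_on):
    """Return (order, None) if a boot order exists, else (None, cycle).

    `depends_on` maps each component to the set of components that must
    be running before it can start.
    """
    try:
        order = list(TopologicalSorter(depends_on).static_order())
        return order, None
    except CycleError as err:
        # The second element of args holds the nodes that form the cycle.
        return None, err.args[1]


# Hypothetical layout: the DNS VM lives on the physical host, and the
# physical host needs DNS to finish booting.
circular = {
    "physical_host": {"dns_vm"},
    "dns_vm": {"physical_host"},
}

order, cycle = boot_order(circular)
if cycle is not None:
    print("No valid boot order; circular dependency:", " -> ".join(cycle))
```

Running a check like this against an inventory of "what needs what to start" is one way to find such loops before a cold start does.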
Yes, so it was worse than that. It looks like, you know, even out-of-band management access didn't
work, which to me would suggest that out-of-band management was
using authentication servers that were down. People couldn't even log in to Zoom because Zoom
was using single sign-on based on Facebook.com and Facebook.com was down. So they couldn't even
make Zoom calls or open Google Docs or whatever. There were rumors that there was a certain hardware tool
with a rotating blade that was used to get into a data center
and unbrick a box.
But those rumors were vehemently denied.
So who knows?
The idea of having someone trying to physically break into a data center
in order to power things back up is hilarious,
but it does lead to an interesting question, which is in this world of cloud computing, there are a lot of people in the
physical data centers themselves, but they don't have access in most cases to log into any of the
boxes. One of the most naive things I see all the time is, oh, well, the cloud provider can read all
of your data. No, they can't. These things are audited. And yeah, theoretically, if they're
lying outright and somehow have falsified all of the third-party audit stuff that has been reported
and are willing to completely destroy their business when it gets out, and I assure you it
would, yeah, theoretically that's there. There is an element of trust here. But I've had to answer
a couple of journalist questions recently of, ooh, is AWS going to start scanning all customer
content? No, they physically cannot do it because there are many ways you can configure things where they cannot see it.
And that's exactly what we want. Yeah, like a disk encryption?
Exactly. Disk encryption, KMS on some level, rolling your own, et cetera, et cetera. They
use a lot of the same systems we do. The point being, though, is that people in the data centers
do not even have login rights to any of these nodes for the physical machines in some cases, let alone the customer tenants on top of those things.
So on some level, you wind up with the people building these systems that run on top of these computers, and they've never set foot in one of the data centers.
That seems ridiculous to me as someone who came up visiting data centers because I had to know where things were when they were working so I could put them back that way when they broke later.
But that's not necessary anymore.
Yeah, and that's the problem that Facebook was facing with that outage.
Because you start believing that certain systems will always work.
And when those systems break down, you're totally cut off.
And then, oh, there was an article in ACM Queue
a long while ago where they were discussing,
you know, the results of simulated failures,
not real ones.
And there were hilarious things like
phone directory was offline
because it wasn't on UPS, and so they didn't know whom to
call. Or alerts couldn't be diverted to a different data center because the management station for
alert configuration was offline because it wasn't on UPS. Or you know the one, right, where in New
York they placed the gas pump in the basement
and the diesel generators were on the top floor.
And the hurricane came in and they had to carry gas manually all the way up to the top floor
because the gas pump in the basement just stopped working.
It was flooded.
So they did everything right.
Just the fuel wouldn't come to the diesel generators.
It's always the stuff that is under the hood on these things that you can't make sense of.
One of the biggest things I did when I was evaluating data center sites was I'd get a one-line diagram, which is an electrical layout of the entire facility.
Great.
I talked to the folks running it.
Now let's take a walk and tour it.
Okay.
You show four transformers on your one-line diagram. I see two transformers and two empty concrete pads. It's an aspirational one-line diagram. It's a joke that makes it a one-liner diagram and it's not very funny. So it's, okay, if I can't trust you for those little things, that's a problem.
Yeah, well, I have another funny story like that.
We had two power feeds coming into the house plus the diesel generator. And it was, you know,
the properly tested every month diesel generator. And then they were doing some maintenance and
they told us in advance that they will cut both power feeds at 2 a.m. on a Sunday morning. And guess what? The diesel generator didn't start.
Half an hour later, UPS was empty. We were totally dead in the water with quadruple redundancy.
Because you can't get someone at 2 a.m. on a Sunday morning to press that button on the
diesel generator in half an hour. That is unfortunate. Yeah, but that's how the
world works. So it's been fantastic reminding myself of some of the things I've forgotten,
because let's be clear. In working with cloud, a lot of this stuff is completely abstracted away.
I don't have to care about most of these things anymore. Now, there's a small team of people at
AWS who very much have to care, and if they don't, I will say mean things to them on Twitter if I let my HugOps position
slip just a smidgen.
But they do such a good job at this.
We don't have problems like this almost ever to the point where when it does happen, it's
noteworthy.
It's been fun talking to you about this just because it's a trip down memory lane that
is a lot more aligned with the things that are there and we tend not to
think about them. It's almost a how it's made episode. Yeah. And don't be so relaxed regarding
the cloud networking because, you know, if you don't go full serverless with nothing on-premises,
you know what protocol you're running between on-premises and the cloud on Direct
Connect? It's called BGP. Ah, you know, I did not know that. I've done some ridiculous IPSec
pairings over those things and was extremely unhappy for a while afterwards, but never got
to the BGP piece of it. Makes sense. Yeah, even over IPSec, if you want to have any dynamic
failover or multiple sites or anything, it's BGP. I really want to thank you for
taking the time to go through all this with me. If people want to learn more about how you view
these things, learn more things from you, as I strongly recommend they should if they're even
slightly interested by the conversation we've had, where can they find you? Well, just go to
ipspace.net and start exploring. There's the blog with thousands of blog entries,
some of them snarkier than others.
Then there are like 200 webinars,
short snippets of a few hours of...
It's like a one-man version of re:Invent, my God.
Yeah, sort of, but I've been working on this for 10 years
and they do it every year,
so I can't produce the content at their speed.
And then there are three different full-blown courses. Some of them are just, you know,
the materials from the webinars, plus guest speakers, plus hands-on exercises, plus I
personally review all the stuff people submit. And they cover data centers and automation and
public clouds.
Fantastic.
And we will, of course, put links to that into the show notes.
Thank you so much for being so generous with your time.
I appreciate it.
Oh, it's been such a huge pleasure.
It's always great talking with you.
Thank you.
It really is.
Thank you once again.
Ivan Pepelnjak, network architect and oh so much more,
CCIE number 1354 Emeritus, and read the bio. It's
well worth it. I am cloud economist, Corey Quinn, and this is Screaming in the Cloud. If you've
enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas
if you've hated this podcast, please leave a five-star review on your podcast platform of choice
and a comment formatted as a RIPv2 announcement.