Screaming in the Cloud - Summer Replay - Ironing out the BGP Ruffles with Ivan Pepelnjak

Episode Date: July 11, 2024

If you need a point of contact for all things networking, then look no further than Ivan Pepelnjak. Ivan is the webinar author at ipSpace.net where he is working on making networking an approachable subject for everyone. From teaching to writing books, Ivan has been at it for a long and storied career, and as a de facto go-to for networking knowledge, you can’t beat him. In this Summer Replay of Screaming in the Cloud, Ivan and Corey discuss Ivan’s status as a CCIE Emeritus and the old days of Cisco. Ivan also levels his network engineering expertise and helps Corey answer some questions about BGP and its implementation. Ivan aptly narrows it down into “layers” that he kindly runs us through. So tune in for a Dante-esque descent into BGP, DNS and Facebook, seeing out the graybeards of tech, and more!

Show Highlights:
(0:00) Intro to episode
(1:23) Panoptica sponsor read
(2:04) The world of VaxVMS
(2:39) The significance of being a CCIE emeritus
(5:02) The value of certification in the modern tech world
(7:37) BGP and networking
(12:41) Internal vs. external BGPs
(15:23) “Unfair criticisms” of BGP
(17:35) Differences between BGP and DNS
(23:19) Cloud growth vs. loss of networking engineers
(24:57) Panoptica sponsor read
(25:20) Outsourcing admin work
(27:45) Breaking down the Facebook DNS outage
(31:37) Disconnect at the data center
(37:06) Where you can find Ivan

About Guest:
Ivan Pepelnjak, CCIE#1354 Emeritus, is an independent network architect, blogger, and webinar author at ipSpace.net. He's been designing and implementing large-scale service provider and enterprise networks as well as teaching and writing books about advanced internetworking technologies since 1990.

Links Referenced:
ipSpace.net: https://ipspace.net
Original Episode: https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/ironing-out-the-bgp-ruffles-with-ivan-pepelnjak/

Sponsor
Panoptica: https://www.panoptica.app/

Transcript
Starting point is 00:00:00 they have DNS servers around the world, and the DNS servers serve the local region, if you wish. And that DNS server then decides what facebook.com really stands for. So if you query for facebook.com, you'll get a different answer in Europe than in US. Welcome to Screaming in the Cloud. I'm Corey Quinn. I have an interesting and storied career path. I dabbled in security engineering slash infosec for a while before I realized that being crappy to people in the community wasn't really my thing. I was a grumpy Unix systems administrator because it's not like there's a second kind of those out there. And I dabbled ever so briefly in the wide world of network administration slash network engineering slash plugging the computers in to make them talk to one another ideally correctly. But I was always a dabbler.
Starting point is 00:01:04 When it comes time to have deep conversations about networking, I immediately tag out and look to an expert. My guest today is one such person. Ivan Pepelnjak is oh so many things. He's a CCIE emeritus. And well, let's start there. Ivan, welcome to the show. Thanks for having me. And oh, by the way, I have to tell people that I was a VAX/VMS administrator in those days. This episode has been sponsored by our friends at Panoptica, part of Cisco. This is one of those real rarities where it's a security product that you can get started with for free,
Starting point is 00:01:34 but also scale to enterprise grade. Take a look. In fact, if you sign up for an enterprise account, they'll even throw you one of the limited, heavily discounted AWS Skill Builder licenses they got because believe it or not, unlike so many companies out there, they do understand AWS. To learn more, please visit panoptica.app slash lastweekinaws. That's panoptica.app slash lastweekinaws. Oh, yes. The VAX/VMS world was fascinating.
Starting point is 00:02:07 I talked to a company that was finally emulating them on physical cards because that was the only way to get them there. Did you refer to them as Vaxen or Vaxes? Or how did you wind up referring? Vaxes. Vaxes. Okay. I was on the other side of that with the inappropriately pluralizing
Starting point is 00:02:22 anything that ends with an X with an EN, Vaxen, and the rest. And that's why I had no friends for many years. You do know what the first Vax was, right? I do not. It was a Swedish Hoover company. Ooh. And they had a trademark dispute with Digital over the name, and then they settled that.
Starting point is 00:02:39 You describe yourself in your bio as a CCIE emeritus, and you give the number, which is low, number 1354. Now, I've talked about certifications on this show in the context of the modern era and whether it makes sense to get cloud certifications or not, but this is from a different time. Understand that for many listeners, these stories might be older than you are in some cases, and that's okay. But Cisco at one point, believe it or not, was a shining beacon of the industry, the kind of place that people wanted to work at. And their certification path was no joke. I got my CCNA from them, Cisco Certified Network Administrator.
Starting point is 00:03:18 And that was basically a byproduct of learning how networks worked. There are several more tiers beyond that, culminating in the CCIE, which stands for Cisco Certified Internetworking Expert, or am I misremembering? No, no, that's it. Perfect. And that was known as the doctorate of networking in many circles for many years. Back in those days, if you had a CCIE, you were guaranteed to be making an awful lot of money at basically any company you wanted to because you knew how networking worked. In the US.
Starting point is 00:03:48 Well, in the US, true. There's always the interesting stories of working in places that are trying to go with the lowest bidder for networking gear and you wind up spending weeks on end trying to figure out why things are breaking intermittently and only to find out at the end that someone saved 20 bucks by buying cheap patch cables. I digress, and I still have the scars from those. But it was
Starting point is 00:04:09 fascinating in those days because there was a lab component of getting those tests. There were constant rumors that in the middle of the night during the two-day certification exam, they would come in and mess with the lab and things you'd set up so you could fix it the following day. That is true. Yeah. So in the good old days when the lab was still physical, they would even turn the connectors around so that they would look like they would be plugged in, but obviously there was no signal coming through. And they would mess with the jumpers on the line cards and all that stuff.
Starting point is 00:04:41 So when you got your broken lab, you really had to work hard, you know, from the physical layer up, from the jumpers, and that would mess up your config and everything else. It was, you know, the real deal, the thing you would experience in real world with underqualified technicians putting stuff together. Let's put it this way. I don't wish to besmirch our brethren
Starting point is 00:05:04 working in the data centers, but having worked with folks who did some hilariously awful things with cabling and having been one of those people myself from time to time, it's hard to have sympathy when you just spent hours chasing it down. But to be clear, the CCIE is one of those things where in a certain era, if you're trying to have an argument on the internet with someone about how networks work and their response is, well, I'm a CCIE, yeah, the conversation was over at that point. I'm not one to appeal to authority on stuff like that very often, but it's the equivalent of arguing about medicine with a practicing doctor. It's the same type of story. It is someone where if they're wrong, it's going to be in the very fringes or the nuances back in this era. Today, I cannot speak to the quality of CCIEs. I'm not attempting to besmirch any of them, but I'm also not endorsing that certification the way I once did.
Starting point is 00:05:56 Yeah, well, I totally agree with you. When this became, you know, a mass certification, and the reason it became a mass certification is because reseller discounts are tied to reseller status, which is tied to the number of CCIEs they have. It became, you know, this, well, still high-end, but commodity that you simply had to get to remain employed because your employer needed the extra two-point discount. It used to be that the prerequisite for getting the certification was beyond other certifications. You spent five or six years working on things. Well, that was what
Starting point is 00:06:32 gave you the experience you needed, because in those days there were no bootcamps. Today you have a bootcamp. Now there's bootcamp, brain dump things, where it's we're going to train you for four straight weeks of nothing but this. Teach to the test. And okay. Yeah.
Starting point is 00:06:46 No, it's even worse. There were rumors that some of these boot camps in some parts of the world that shall remain unnamed were actually teaching you how to type in the commands from the actual lab. Even better. Yeah. You don't have to think. You don't have to remember. You just have to type in the commands you've learned. You're done. There's an arc to the value of a certification. It comes out,
Starting point is 00:07:09 no one knows what the hell it is, and suddenly it's great. You can use that to really identify what's great and what isn't. And then it goes at some point down into the point where it becomes commoditized and you need it for partner requirements and the rest. And at that point, it is no longer something that is a reliable signal of anything other than that someone spent some time and or money. Well, are you talking about bachelor degree now? Well, no, I don't have one of those either. I have an eighth grade education because I'm about as good of an academic as it probably sounds like I am. But the thing that really differentiated in my world, the difference between what I was doing in the network engineering sense and the things that
Starting point is 00:07:45 folks like you who are actually, you know, professionals rather than enthusiastic amateurs took into account was that I was always working inside of the LAN, the local area network, inside of a data center. Cool. Everything here inside the cage, I can make talk to each other. I can screw up the switching fabric, et cetera, et cetera. I didn't deal with any of the WAN, wide area network, think internet in some cases. And at that point, we're talking about things like BGP or OSPF in some parts of the world, or RIP, or RIPv2 if you make terrible life choices.
Starting point is 00:08:16 But BGP is the routing protocol that more or less powers the internet. At the time of this recording, we're a couple weeks past a BGP kerfuffle that took Facebook down for a number of hours, during which time the internet was terrific. I wish they could do that more often. In fact, it was almost like a holiday. It was fantastic. I took my elderly relatives out and got them vaccinated. It was glorious. Now we're back to having Facebook and terrific. The problem I have
Starting point is 00:08:45 whenever something like this happens is there's a whole bunch of crappy explainers out there of what is BGP and how might it work? And people have angry opinions about all of these things. So instead, I prefer to talk to you, given that you are a networking trainer. You have taught people about these things. You have written books. You have operated large-scale environments. I even developed a BGP course for Cisco. You taught it for Cisco, of all places, back when that was impressive and awesome and not a has-been. Honestly, I feel like I could go there and still wind up going back in time
Starting point is 00:09:17 and still it's the same Cisco in some respects. They've all ever died, Dinosaur, and they got frozen in amber. But let's start at the very beginning. What is BGP? Well, you know, when the internet was young, they figured out that we aren't all friends on the internet anymore, and that I want to control what I tell you, and you want to control what you tell me,
Starting point is 00:09:41 and furthermore, I want to control what I believe from what you're telling me so we needed a protocol that would implement policy where I could say I will only announce my customers to you but not what I've heard from Verizon and you would do the same and then I would say well but I don't want to hear about that customer of yours because he's also my customer. So we need some sort of policy. And so they invented a protocol where you would tell me what you have, I would tell you what I have, and then we would both choose what we want to believe and follow those paths to forward traffic.
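The policy idea Ivan describes here (control what I tell you, and control what I believe from what you tell me) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Route` class, the `learned_from` labels, and the prefixes (taken from the documentation ranges) are hypothetical, and real BGP policy is expressed in router configuration, not in code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str        # destination prefix, e.g. "198.51.100.0/24"
    learned_from: str  # hypothetical label: "customer", "peer", or "upstream"


def export_to_peer(routes):
    """Export policy: announce only what we learned from our own customers."""
    return [r for r in routes if r.learned_from == "customer"]


def import_from_peer(announced, own_customer_prefixes):
    """Import policy: do not believe the peer about prefixes that are our customers."""
    return [r for r in announced if r.prefix not in own_customer_prefixes]


# What we know: one customer route and one route heard from an upstream provider.
my_routes = [
    Route("198.51.100.0/24", "customer"),
    Route("203.0.113.0/24", "upstream"),
]
announced_to_peer = export_to_peer(my_routes)

# What the peer tells us: one prefix is also our customer, so we ignore it.
peer_announcements = [
    Route("198.51.100.0/24", "peer"),
    Route("192.0.2.0/24", "peer"),
]
accepted = import_from_peer(peer_announcements, {"198.51.100.0/24"})

print([r.prefix for r in announced_to_peer])
print([r.prefix for r in accepted])
```

Only the customer route is announced to the peer, and only the prefix that is not already a customer is accepted from it.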
Starting point is 00:10:18 And so BGP was born. On some level, it seems like it's this faraway thing to people like me because I have a residential internet connection, and I am not generally allowed to make my own BGP announcements to the greater world. Even when I was working in data centers, very often the BGP was handled by our upstream provider, or very occasionally by a router they would drop in with the easiest maintenance instructions in the world for me of step one, make sure it has power.
Starting point is 00:10:46 Step two, never touch it. Step three, we'd prefer if you don't even look at it and remain at least 20 feet away to keep from bringing your aura near anything we care about. And that's basically how you should do with me in the context of hardware. So it was always this arcane magic thing. Well, it's not, you know, it's like a power transmission. When you know enough about it, it stops being magic. It's technology. It's a bit more complicated than some other stuff. It's way less complicated than some other stuff like quantum physics. But still, it's so rarely used that it gets this aura of being mysterious. And then, of course, everyone starts getting their opinion,
Starting point is 00:11:27 particularly the graduates of the Facebook Academy. And yes, it is true that usually BGP would be used between service providers. So whenever, you know, we are big enough to need policy, if you just need one uplink, there is no policy there. You either use the uplink or you don't use the uplink. If you want to have two different links to two different points of presence or to two different service providers, then you're already in the policy land. Do I prefer one provider over the other? Do I want to announce some things to one provider but other things to the other? Do I want to take local customers from
Starting point is 00:12:06 both providers because I want to, you know, have lower latency because they are local customers? Or do I want to use one solely as the backup link because I paid so little for that link that I know it's shitty? So you need all that policy stuff. And to do that, you really need BGP. There is no other routing protocol in the world where you could implement that sort of policy because everything else is concerned mostly with let's figure out as fast as possible what is reachable and how to get there. And BGP is like, hey, slow down.
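The "use one link solely as the backup" case can be illustrated with a simplified path selection. The sketch below assumes only the first two steps of the BGP decision process (higher local preference wins, then the shorter AS path); the provider names and numbers are hypothetical, and real implementations apply many more tie-breakers.

```python
def best_path(paths):
    """Pick the path with the highest local preference, then the shortest AS path."""
    return max(paths, key=lambda p: (p["local_pref"], -p["as_path_len"]))


# Two uplinks to the same destination. The backup has the shorter AS path,
# but policy gives it a lower local preference, so it is used only as a fallback.
candidates = [
    {"provider": "primary", "local_pref": 200, "as_path_len": 3},
    {"provider": "backup", "local_pref": 50, "as_path_len": 2},
]

print(best_path(candidates)["provider"])

# If the primary uplink disappears, the backup is the only path left.
remaining = [p for p in candidates if p["provider"] != "primary"]
print(best_path(remaining)["provider"])
```

The point of the sketch is that policy, not just reachability or distance, decides which link carries the traffic.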
Starting point is 00:12:40 There's policy. Yeah. In the context of someone whose primary interaction with networks is their home internet, where there's a single cable coming in from the of someone whose primary interaction with networks is their home internet, where there's a single cable coming in from the outside world, you plug it into a device, maybe yours, maybe ISPs, maybe we don't care. That's sort of the end of it. But think in terms of large interchanges where there are multiple redundant networks to get from here to somewhere else, which one should traffic go down at any given point in time? Which networks
Starting point is 00:13:05 are reachable on the other end of various distant links? That's the sort of problem that BGP is very good at addressing and what it was built for. If you're running BGP internally in a small network, consider not doing exactly that. Well, I've seen two use cases, well, three use cases for people running BGP internally. Okay, this I want to hear because I was always told, no, touch them. But, you know, I'm about to learn something. That's why I'm talking to you. The first one was multinationals who needed policy. Yes, many multi-site environments, large-scale companies that have redundant links. They're trying to run full mesh in some cases or partial mesh between a bunch of facilities.
Starting point is 00:13:47 In this case, it was multiple continents and really expensive transcontinental links. And it was, I don't want to go from Europe to Sydney over US. I want to go over Middle East. And to implement that type of policy, you have to split, you know, the whole network into regions, and then each region is what BGP calls an autonomous system, so that it gets its tag, its autonomous system number, and then you can do policy on that, saying, well, I will not announce
Starting point is 00:14:19 Asian routes to Europe through US, or I will make them less preferred so that if the Middle East region goes down, I can still reach Asia through US, but preferably I will not go there. The second one is yet again large networks where they had too many prefixes for something like OSPF to carry. And so their OSPF was breaking down and the only way to solve that was to go to something that was designed to scale better, which was BGP. And third one is if you want to implement some of the stuff that was designed for service providers initially, like VPNs, layer two or layer three, then BGP becomes this kitchen sink protocol. You know, it's like using Route 53 as a database.
Starting point is 00:15:15 We are using BGP to carry any information anyone ever wants to carry around. I'm just waiting for someone to design JSON in BGP RFC, and then we are, you know, where we need to be. I feel on some level like BGP gets relatively unfair criticism because the only time it really intrudes on the general awareness is when something has happened and it breaks. This is sort of the quintessential network or systems or honestly computer type of issue. It's either invisible or you're getting screamed at because something isn't working. It's almost like a utility on some level. When you turn on a faucet, you don't wonder whether water is going to come out this time.
Starting point is 00:15:49 But if it doesn't, there's hell to pay. Unless it's brown. Well, there is that. Let's stay away from that particular direction. There's a beautiful metaphor probably involving IBM if we do. So the challenge, too, when you look at it, is that it's this weird esoteric thing that isn't super well understood. And as soon as it breaks, everyone wants to know more about it. And then in full-on charging to the wrong side of the Dunning-Kruger curve, it's, well, that doesn't sound hard. Why
Starting point is 00:16:15 are they so bad at this? I would be able to run this better than they could. I assure you, you can't. This stuff is complicated. It is nuanced. It is difficult. But the common question is, why is this so fragile and able to easily break? I'm going to turn that around. How is it that something that is this esoteric and touches so many different things works as well as it does? Yeah, it's a miracle, particularly considering how crappy the things are configured around the world.
Starting point is 00:16:42 There have been periodic outages of sites when some ISP sends out a bad BGP announcement and their upstream doesn't suppress it because, hey, you misconfigured things and suddenly half the internet believes, oh, YouTube now lives in this tiny place halfway around the world rather than where it's currently being anycasted from. Called Pakistan, to be precise. Exactly. There was an actual incident there. We are not dunking on Pakistan as an example, a faraway place. No, no. A Pakistani ISP wound up doing exactly this and taking YouTube down for an afternoon a while back. It's a common problem. Yeah. The problem was that they tried to stop local users accessing YouTube. And they figured
Starting point is 00:17:21 out that, you know, YouTube is announcing this prefix, and if they would announce two more specific prefixes, then, you know, they would attract the traffic and the local users wouldn't be able to reach YouTube. Perfect. But that leaked. If you wind up saying that, all right, the entire internet is available on this interface, and a small network of 256 nodes available on the second interface, the most specific route always wins. That's why the default route or route of last resort is the entire internet. And if you don't know where to send it, throw it down this direction. That is usually in most home environments, the gateway that then hands it up to your ISP
Starting point is 00:17:58 where they inspect it and do all kinds of fun things to sell ads to you and then eventually get it to where it's going. This gets complicated at these higher levels. And I have sympathy for the technical aspects of what happened at Facebook, no sympathy whatsoever for the company itself, because they basically do far more harm than they do good. And I've been very upfront about that. But I want to talk to you as well about something that people are going to be convinced. I'm taking this in my database direction, but I assure you I'm not. DNS. What is the relationship between BGP and DNS? Which sounds like a strange question sometimes. There is none. Excellent. It's just that different
Starting point is 00:18:38 large-scale properties decided to implement the global load balancing, global optimal access to their servers in different ways. So Cloudflare is a typical example of someone who is doing anycast. They are announcing the same networks, the same prefixes from 100 locations around the world. So BGP will take care that you always get to the closest Cloudflare PoP. And that's it. That's how they work. No magic. Facebook didn't believe in the power of Anycast when they started designing their service. So what they're doing is they have DNS servers around the world, and the DNS servers serve the local region, if you wish. And that DNS server then decides what Facebook.com really stands for.
Starting point is 00:19:42 So if you query for Facebook.com, you'll get a different answer in Europe than in US. Just a slight diversion on what anycast is. If I ping Google's public resolver 8.8.8.8, easy to remember, from my computer right now, the packet gets there and back in about five milliseconds. Wherever you are listening to this, if you were to try that same thing, you'd see something roughly similar. Now, one of two things is happening. Either Google has found a way to break the laws of physics and get traffic to a central point faster than light, or the 8.8.8.8 that I'm talking to and the one that you are talking to are not, in fact, the same computer. Well, by the way, it's 13 milliseconds for me, and between you and me, it's 200 milliseconds. So yes, they are cheating. Just a little bit, or unless they tunneled through the earth rather than having to bounce it off of satellites or through cables.
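The speed-of-light argument can be made concrete with a little arithmetic. The sketch below takes the round-trip times quoted in the conversation and computes an upper bound on how far away the responding server can be. The fiber factor of two thirds is a common rule of thumb and is an assumption here, as is the function name.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
FIBER_FACTOR = 2 / 3                   # assumed: light in fiber travels at roughly 2/3 of c


def max_one_way_distance_km(rtt_ms, factor=1.0):
    """Upper bound on the distance to a server, given a round-trip time."""
    return (rtt_ms / 2) * SPEED_OF_LIGHT_KM_PER_MS * factor


for rtt in (5, 13, 200):
    vacuum = max_one_way_distance_km(rtt)
    fiber = max_one_way_distance_km(rtt, FIBER_FACTOR)
    print(f"{rtt} ms round trip: at most {vacuum:.0f} km in vacuum, about {fiber:.0f} km in fiber")
```

A 5 ms round trip caps the distance at well under a thousand kilometers, and a 13 ms round trip at under two thousand, yet the two speakers are 200 ms apart. A single machine cannot satisfy both measurements, which is the point being made about anycast.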
Starting point is 00:20:27 No, even that wouldn't work. That's what the quantum computers are for. I always wondered, now we know. Yeah, they're entangling the replies in advance, and that's how it works. Yeah, you're right. Please continue. I just wanted to clarify that point,
Starting point is 00:20:41 because I got that one hilariously wrong once upon a time and was extremely confused for about six months. Yeah, it's something that no one ever thinks about unless you know you're really running large-scale DNS because, honestly, root DNS servers were anycasted for ages. You think there are like 12 different root DNS servers. In reality, there are like 300 instances hidden behind those 12 addresses. And fun trivia fact, the reason there are 12 addresses is because any more than that would no longer fit within the 512 byte limit of a UDP packet without truncating. Thanks for that. I didn't know that.
Starting point is 00:21:18 Of course. Now, EDNS extensions let you go out for the larger stuff, but you can't guarantee that's going to hit. And what happens when you receive a UDP packet, when you receive a DNS result with a truncate flag set on the UDP packet, it is left to the client. It can either use the partial result or it can try and reestablish over a TCP connection. That is one of those weird trivia questions they love to ask in sysadmin interviews. But it's, yeah, fundamentally, if you're doing something that requires the root name servers, you don't really want to start going down those arcane paths.
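For the truncate flag mentioned here, a minimal sketch of how a client might check it: the DNS header is 12 bytes, and the TC bit lives in the 16-bit flags field that follows the message ID (RFC 1035). The helper names and the sample headers below are made up for illustration, and the "retry over TCP" choice is just one of the two options described above.

```python
import struct

TC_FLAG = 0x0200  # truncation (TC) bit in the DNS header flags field, per RFC 1035


def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message header has the TC bit set."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def choose_next_step(message: bytes) -> str:
    """One possible client policy: retry over TCP whenever the answer was truncated."""
    return "retry over TCP" if is_truncated(message) else "use the UDP answer"


# Two hand-built headers: ID, flags, then four section counts.
# 0x8180 is a plain response with recursion desired and available.
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
truncated = struct.pack("!HHHHHH", 0x1234, 0x8180 | TC_FLAG, 1, 0, 0, 0)

print(choose_next_step(complete))
print(choose_next_step(truncated))
```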
Starting point is 00:21:50 You want it to just be something that fits in a single packet, not require a whole bunch of computational overhead. Yeah, and even within those 300 instances, there are multiple servers listening to the same IP address, and incoming packets are just sprayed across those servers, and whichever one gets the packet replies to it, and because it's UDP, it's one packet in, one packet out, problem solved, it all works. People thought that this doesn't work for TCP
Starting point is 00:22:15 because you know you need a whole session, so you need to establish the session, you send the request, you get the reply, there are acknowledgments, all that stuff. Turns out that there is almost never two ways to get to a certain destination across the internet from you. So people thought that, you know, this wouldn't work because half of your packets will end in San Francisco and half of the packets will end in San Jose, for example. Doesn't work that way. Why not? Well, because the global internet is so diverse that you almost never get two equal cost paths
Starting point is 00:22:52 to two different destinations because it would be San Francisco and San Jose announcing A.A.A.A. And it would be a miracle if you would be sitting just in the middle so that the first packet would go to San Francisco, the second one would go to San Jose, and back and forth. That never happens.
Starting point is 00:23:12 That's why Cloudflare makes it work by announcing the same prefix throughout the world. So I just learned something new about how routing announcements work, an aspect of BGP, and you, a few minutes ago, learned something about the UDP size limit and the root name servers. BGP and DNS are two of the oldest protocols in existence. You and I are also decades into our careers. If someone is starting out their career today working in a cloudy environment, there are very few network-centric roles because cloud
Starting point is 00:23:45 providers handle a lot of this for us. Given these protocols are so foundational to what goes on, and they're as old as they are, are we as an industry slash sector slash engineers losing the skills to effectively deploy and manage these things? Yes. The same problem that you have in any other sufficiently developed technology area. How many people can build power lines? How many people can write a compiler? How many people can design a new CPU? How many people can design a new motherboard? I mean, when I was 18 years old, I was wire wrapping my own motherboard with 8-bit processor. You can't do that today. You know, as the technology is evolving and maturing, it's no longer fun, it's no longer sexy, it stops being a hobby,
Starting point is 00:24:35 and so it bifurcates into users and people who know about stuff. And it's really hard to bridge the gap from one to the other. So in the end, you have like these 20 graybeard people who know everything about the technology and the youngsters have no idea. And when these people die, don't ask me how we'll get any further on. Few things are better for your career and your company than achieving more expertise in the cloud. Security improves, compensation goes up, employee retention skyrockets. Panoptica, a cloud security platform from Cisco, has created an academy of free courses just for you. Head on over to academy.panoptica.app to get started.
Starting point is 00:25:19 On some level, it feels like it's a bit of a down-the-stack analogy for what happened to me early in my career. My first systems administration job was running a large-scale email system. It was a hobby that I was interested in. I basically bluffed my way into working at a university for a year. Thanks, Chapman. I appreciate that. And it was great, but it was also pretty clear to me that with the rise of things like hosted email, Gmail, and whatnot, it was not going to be the future of what the present day at that point looked like, which was most large companies needed an email administrator. Those jobs were dwindling.
Starting point is 00:25:53 Now, if you want to be an email systems administrator, there are maybe a dozen companies or so that can really use that skill set, and everyone else just outsources that. That said, at those companies, like Google and Microsoft, there are some incredibly gifted email administrators who are phenomenal at understanding every nuance of this. Do you think that that is what we're going to see in the world of running BGP at large scale, where a few companies really need to know how this stuff works, and everyone else just sort of smiles, nods, and rolls with it. Absolutely. We are already there. Because, you know, if I am an end customer and I need BGP because I have two uplinks to two ISPs, that's really easy. I mean, there are a few tricks you should follow. And hopefully, some of the guardrails will be built into network operating systems so
Starting point is 00:26:41 that you will really have to configure explicitly that you want to leak routes between Verizon and AT&T, which is great fun if you have two low-speed links to both of them and now you're becoming transit between the two, which did happen to Verizon. That's why I'm mentioning them. Sorry, guys. Anyway, if you are a small guy and you just need two uplinks and maybe do a bit of policy, that's easy, and that's achievable, let's say, with some Google and paste and throwing spaghetti at the wall and seeing what sticks. On the other hand, what the large-scale providers, like, for example, Facebook, because we were talking about them, are doing is like light years away. It's like comparing me turning on the light bulb and someone running, you know, a nuclear reactor. Yeah, you kind of want the experts running some aspects on that.
Starting point is 00:27:32 Honestly, in my case, you probably want someone more competent flipping the light switch too, but that's why I have IoT devices here that power my lights. It, on the one hand, keeps me from hurting myself, and on the other, leads to a nice seasonal feel because my house is freaking haunted. So, coming back to Facebook. They have these DNS servers all around the world, and they don't want everyone else to freak out when one of these DNS servers goes away. So that's why they're using the same IP address for all the DNS servers sitting anywhere in the world. So the name server for facebook.com is the same worldwide, but it's different machines and they will give you different answers when you ask where is facebook.com. I will get a European answer, you will get a US answer, someone in Asia will get whatever. And so they're using BGP to advertise the DNS servers to the world so that everyone
Starting point is 00:28:27 gets to the closest DNS server. And now it doesn't make sense, right, for the DNS server to say, hey, come to European Facebook if European Facebook tends to be down. So if their DNS server discovers that it cannot reach the servers in the data center, it stops advertising itself with BGP. Why with BGP? Because that's the only thing it can do. That's the only protocol where I can tell you, hey, I know about this prefix.
Starting point is 00:28:56 You really should send the traffic to me. And that's what happened to Facebook. They bricked their backbone, whatever they did, they never told. And so their DNS server said, gee, I can't reach the data center. I better stop announcing that I'm a DNS server because obviously I am disconnected from the rest of Facebook. And that happened to all DNS servers because, you know, the backbone was bricked. And so they just, you know, de-peered from the internet. They stopped advertising themselves.
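The withdrawal behavior Ivan describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Facebook's actual implementation: the `AnycastNode` class, the `backend_reachable` probe, and the `192.0.2.53/32` prefix (a documentation address) are all hypothetical, and the "announce" and "withdraw" strings stand in for whatever a real BGP speaker would be told to do.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical anycast prefix shared by every DNS node (documentation range).
ANYCAST_PREFIX = "192.0.2.53/32"


@dataclass
class AnycastNode:
    """One DNS server site that advertises the shared prefix via BGP."""
    name: str
    backend_reachable: Callable[[], bool]  # hypothetical health probe
    announcing: bool = True

    def reconcile(self) -> str:
        """Announce the prefix only while the backend is reachable."""
        healthy = self.backend_reachable()
        if healthy and not self.announcing:
            self.announcing = True
            return f"{self.name}: announce {ANYCAST_PREFIX}"
        if not healthy and self.announcing:
            self.announcing = False
            return f"{self.name}: withdraw {ANYCAST_PREFIX}"
        return f"{self.name}: no change"


def reachable_nodes(nodes: Iterable[AnycastNode]) -> List[str]:
    """Names of the sites the rest of the internet can still route to."""
    return [n.name for n in nodes if n.announcing]


def simulate_backbone_failure() -> List[str]:
    """Every site loses its backend at once, so every site withdraws."""
    nodes = [
        AnycastNode("eu", lambda: False),
        AnycastNode("us", lambda: False),
        AnycastNode("asia", lambda: False),
    ]
    for node in nodes:
        node.reconcile()
    return reachable_nodes(nodes)


if __name__ == "__main__":
    print(simulate_backbone_failure())
```

The point of the sketch is that each site makes a locally sensible decision, and when the shared backbone fails, all of those sensible decisions add up to no site advertising the prefix at all.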
Starting point is 00:29:29 And so we thought that there was no DNS server for Facebook, because no DNS server was able to reach their core. And so all DNS servers were like, gee, I better get off this, because I have no clue what's going on. So everything was working fine. Everything was there. It's just that they didn't want to talk to us because they couldn't reach the backend servers. And of course, people blamed DNS first because the DNS servers weren't working. Of course they weren't. And then they blamed BGP because it must be BGP if it isn't
Starting point is 00:30:01 DNS. But it's like, you know, you're blaming headache and muscle cramps and high fever, but in fact, you have the flu. For almost any other company that wasn't Facebook, this would have been a less severe outage just because most companies are interdependent on other companies to run infrastructure. When Facebook itself has evolved the way that it has, everything that they use internally runs on these same systems. So they wound up almost with a bootstrapping problem. An example of this in more prosaic terms is, okay, the data center had a power outage. Okay, now I need to power up all the systems again, and the physical servers I'm trying to turn on need to talk to a DNS server to finish booting, but the DNS server is a VM that lives on those physical servers. Uh-oh, now I'm in trouble. That is an overly simplified and
Starting point is 00:30:50 real example of what Facebook encountered trying to get back into this, to my understanding. Yes, so it was worse than that. It looks like, you know, even out-of-band management access didn't work, which to me would suggest that out-of-band management was using authentication servers that were down. People couldn't even log in to Zoom because Zoom was using single sign-on based on Facebook.com and Facebook.com was down. So they couldn't even make Zoom calls or open Google Docs or whatever. There were rumors that there was a certain hardware tool with a rotating blade that was used to get into a data center and unbrick a box.
Starting point is 00:31:33 But those rumors were vehemently denied. So who knows? The idea of having someone trying to physically break into a data center in order to power things back up is hilarious, but it does lead to an interesting question, which is in this world of cloud computing, there are a lot of people in the physical data centers themselves, but they don't have access in most cases to log into any of the boxes. One of the most naive things I see all the time is, oh, well, the cloud provider can read all of your data. No, they can't. These things are audited. And yeah, theoretically, if they're
Starting point is 00:32:05 lying outright and somehow have falsified all of the third-party audit stuff that has been reported and are willing to completely destroy their business when it gets out, and I assure you it would, yeah, theoretically that's there. There is an element of trust here. But I've had to answer a couple of journalist questions recently of, ooh, is AWS going to start scanning all customer content? No, they physically cannot do it because there are many ways you can configure things where they cannot see it. And that's exactly what we want. Yeah, like a disk encryption? Exactly. Disk encryption, KMS on some level, rolling your own, et cetera, et cetera. They use a lot of the same systems we do. The point being, though, is that people in the data centers
Starting point is 00:32:44 do not even have login rights to any of these nodes for the physical machines in some cases, let alone the customer tenants on top of those things. So on some level, you wind up with the people building these systems that run on top of these computers, and they've never set foot in one of the data centers. That seems ridiculous to me as someone who came up visiting data centers because I had to know where things were when they were working so I could put them back that way when they broke later. But that's not necessary anymore. Yeah, and that's the problem that Facebook was facing with that outage. Because you start believing that certain systems will always work. And when those systems break down, you're totally cut off. And then, oh, there was an article in ACMQ
Starting point is 00:33:29 a long while ago where they were discussing, you know, the results of simulated failures, not real ones. And there were hilarious things like the phone directory was offline because it wasn't on UPS, and so they didn't know whom to call. Or alerts couldn't be diverted to a different data center because the management station for alert configuration was offline because it wasn't on UPS. Or you know the one, right, where in New
Starting point is 00:34:02 York they placed the gas pump in the basement and the diesel generators were on the top floor. And the hurricane came in and they had to carry gas manually all the way up to the top floor because the gas pump in the basement just stopped working. It was flooded. So they did everything right. Just the fuel wouldn't come to the diesel generators. It's always the stuff that is under the hood on these things that you can't make sense of.
Starting point is 00:34:33 One of the biggest things I did when I was evaluating data center sites was I'd get a one-line diagram, which is an electrical layout of the entire facility. Great. I talked to the folks running it. Now let's take a walk and tour it. Okay. You show four transformers on your one-line diagram. I see two transformers and two empty concrete pads. It's an aspirational one-line diagram. It's a joke that makes it a one-liner diagram and it's not very funny. So it's okay. If I can't trust you for those little things, that's a problem. Yeah, well, I have another funny story like that. We had two power feeds coming into the house plus the diesel generator. And it was, you know, the properly tested every month diesel generator. And then they were doing some maintenance and
Starting point is 00:35:18 they told us in advance that they will cut both power feeds at 2 a.m. on a Sunday morning. And guess what? The diesel generator didn't start. Half an hour later, UPS was empty. We were totally dead in the water with quadruple redundancy. Because you can't get someone at 2 a.m. on a Sunday morning to press that button on the diesel generator in half an hour. That is unfortunate. Yeah, but that's how the world works. So it's been fantastic reminding myself of some of the things I've forgotten, because let's be clear. In working with cloud, a lot of this stuff is completely abstracted away. I don't have to care about most of these things anymore. Now, there's a small team of people at AWS who very much has to care, and if they don't, I will say mean things to them on Twitter if I let my hug ops position
Starting point is 00:36:07 slip just a smidgen. But they do such a good job at this. We don't have problems like this almost ever, to the point where when it does happen, it's noteworthy. It's been fun talking to you about this just because it's a trip down memory lane that is a lot more aligned with the things that are there and we tend not to think about them. It's almost a how it's made episode. Yeah. And don't be so relaxed regarding the cloud networking because, you know, if you don't go full serverless with nothing on-premises,
Starting point is 00:36:41 you know what protocol you're running between on-premises and the cloud on Direct Connect? It's called BGP. Ah, you know, I did not know that. I've done some ridiculous IPsec pairings over those things and was extremely unhappy for a while afterwards, but never got to the BGP piece of it. Makes sense. Yeah, even over IPsec, if you want to have any dynamic failover or multiple sites or anything, it's BGP. I really want to thank you for taking the time to go through all this with me. If people want to learn more about how you view these things, learn more things from you, as I strongly recommend they should if they're even slightly interested by the conversation we've had, where can they find you? Well, just go to
Starting point is 00:37:21 ipspace.net and start exploring. There's the blog with thousands of blog entries, some of them snarkier than others. Then there are like 200 webinars, short snippets of a few hours of... It's like a one-man version of re:Invent, my God. Yeah, sort of, but I've been working on this for 10 years and they do it every year, so I can't produce the content at their speed.
Starting point is 00:37:46 And then there are three different full-blown courses. Some of them are just, you know, the materials from the webinars, plus guest speakers, plus hands-on exercises, plus I personally review all the stuff people submit. And they cover data centers and automation and public clouds. Fantastic. And we will, of course, put links to that into the show notes. Thank you so much for being so generous with your time. I appreciate it.
Starting point is 00:38:12 Oh, it's been such a huge pleasure. It's always great talking with you. Thank you. It really is. Thank you once again. Ivan Pepelnjak, network architect and oh so much more, CCIE number 1354 Emeritus, and read the bio. It's well worth it. I am cloud economist, Corey Quinn, and this is Screaming in the Cloud. If you've
Starting point is 00:38:34 enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice and a comment formatted as a RIPv2 announcement.
