The Standup with ThePrimeagen - Google takes down the internet!

Episode Date: August 1, 2025

🔗 Sponsored by Sentry https://sentry.io | Code breaks, fix it faster #sponsored
ssh terminal.shop
https://lowlevel.academy/

📌 Chapters:
00:00:00 - Intro
00:02:04 - Low level explains what we kno...w
00:05:32 - How does this compare to the CrowdStrike outage?
00:07:13 - What is a memory fuzzer?
00:10:17 - What was the impact of the outage?
00:12:22 - Movie talk sidebar
00:14:58 - AUTH, choices and risk management
00:19:33 - Cloudflare also went down
00:21:14 - Knowledge management
00:23:07 - Chaos at Netflix
00:32:00 - DHH's response
00:33:21 - Personally affected
00:34:15 - Internet Of Things Devices
00:39:38 - Personal Network Security vs Faith in Humanity
00:42:27 - More on IOT devices
00:48:10 - Car talk and internet connected failures
00:51:17 - Fail open
00:55:30 - Could Rust have prevented this?
00:57:45 - Wrap up and outro

Could Your Company Survive a Google Outage?

Last week, Google Cloud went down, and with it, a massive chunk of the internet. In this episode of Standup, we’re joined by security expert Low Level Learning to break down what actually happened, how a single null pointer crashed Google’s management plane, and why Cloudflare and other services followed. We also go deep on software fuzzing, dependency risk, fail-open systems, and the absurd reality of internet-connected lamps.

Featuring:
Prime: https://x.com/ThePrimeagen
Casey Muratori: https://x.com/cmuratori
Trash Dev: https://x.com/trashh_dev
Low Level Learning: https://x.com/LowLevelTweets

Bonus topics include: automated cat feeders, Teslas on fire, and Baby Shark as a disaster protocol.

Transcript
Starting point is 00:00:00 Hello everyone and welcome to stand up. I have to step away for a second. Shut up. Oh my God, dude. Classic prime moment, dude. Yep. Hey, hi, hey, everybody. There is a very high chance.
Starting point is 00:00:14 I have no idea what we're talking about. There's a 100% chance I have no idea. The Google Cloud Service outage last week. Okay, so you do know. Which isn't really a security thing, but that's okay. Wait, you made a video on this. What are you talking about? It doesn't mean anything.
Starting point is 00:00:30 security. I made a video on the switch. What? Yeah, it's not security. We're talking about. No one got... I did make a video on it. Just throwing that out. I just, oh yeah, I made a little quick video on it, but I have... I actually, I got it. I got it. I got it. I'll carry us. It's all good. Let's do it. Let's go. We got this. We got this time. Nothing could possibly go wrong. We're good. It's great. Step away. No problem. We got this. Go peepee or something. Yeah. Okay. The reason why I got to be right back is I got to put a little bit of milk in my cacao coffee, because just a splash of milk with cacao is really delicious. I would love a coffee right now, actually.
Starting point is 00:01:02 Oh, I'm on coffee so bad. Yeah, yeah, yeah. Prime, I know that you're, you're itching right now to do your signature opening to the extent that it exists for the podcast. I, the single greatest opener of all time from the stand-up, would love to start the opening. Hey, today we have a very special episode of the stand-up. We replaced TJ with, in fact, a smarter version of TJ. Low Level Learning's first appearance on the stand-up. Security expert, probably Balatro expert as well.
Starting point is 00:01:38 As always, we have Casey Muratori and Trash Dev on here. Today, we're going to be talking about a very large security incident and Low Level Learning. We brought him on due to all of his expertise, his good looks, his deep knowledge of C and null pointers and everything, and he's going to walk us through what's kind of happened. And we're going to talk about maybe the state of security and who knows where this is going to go. So Low Level, why don't you kick us off on this one? Let's do it, man. Yeah.
Starting point is 00:02:07 So, I mean, it all goes back to 1972 when C the programming language was invented. And then 50 or so years later, we're still using it, which is kind of crazy. But yeah, man, last week, Google kind of took the Internet offline for three hours. Time out. Time out. Yeah, yeah, yeah, yeah. You started like 50 years ago and then went right up to last week. Like what happened in between?
Starting point is 00:02:29 Not a ton, honestly. Like not a ton. Like a couple new C standards came out and then now here we are. So that's about where it ends. People were born. Yeah, some people were born. But yeah, so Google took the internet out, man. And it's kind of crazy how they did it.
Starting point is 00:02:42 It was like Thursday at like 13:52, roughly, Eastern time in the U.S. I have to convert that in my head: 13:52, 1:52. One. He's a military boy. You just got to deal with it. 10 a.m. Pacific time. Although I guess you'd round it to 2 o'clock, I guess. It's :52, but yeah.
Starting point is 00:02:57 Internet went down for a little bit. And no one still to this day knows exactly what happened, but Google did put out a report that talks about what they did. The end result, or the end reason, was that Google published code that did not account for null pointers, and when a policy change went out that contained a null pointer in the policy change, it crashed the management plane of the Google Cloud, period, full stop. That's what happened.
Starting point is 00:03:23 Absolutely nuts. Classic. Crazy stuff. Classic Google move. And the irony in all of this is that Google is a company that runs the OSS Fuzz Project, which is open source software fuzzing. So literally what they have is a, like, data center full of machines that fuzz open source software to find memory corruption vulnerabilities in them, right?
Starting point is 00:03:42 This is a project Google has been sponsoring for a very long time. So the palpable irony in Google releasing software internally to their own management plane that had a memory issue. Not like a heap overflow or a race condition, something a little harder to detect. Something as simple as failing to check a null pointer, which you could, like, lint for. That crashing the internet is just some of the craziest news I've seen in a very long time. Can you rewind a little bit? Yeah.
Starting point is 00:04:08 You said two things that I think probably need a bit of explaining. One, maybe give like a quick overview of what a memory fuzzer is. And then two, you said that nobody knows what happened, but Google released a report. Are you saying that Google knows what has happened? They're just not telling people what has happened. What happened? I don't know. Did you push on a Friday?
Starting point is 00:04:28 Never. How are we going to figure this out? How are we going to figure this out? Got it to your prods down? Did you guys need the wheel? You have them to get fixed? That would have been a couple hours for factory. I'll beat the plugins.
Starting point is 00:04:51 Don't guess where your issues are. You can see exactly where they are happening with Sentry. Get all the context you need to debug any problem. Because code breaks, so fix it faster with Sentry. Operationally, internal to Google, we do not know how that code got out, right? Like, was the code pushed through CI/CD and, like, it failed the test and they just said, roll with it?
Starting point is 00:05:13 Was the code vibe coded and no one really checked? Like, there's a lot of, like, weird if and if ends that could be happening with that code that caused it to get published in the state that it was. That's a little weird. That's what I mean by, like, no one knows what happened. We know that there was a null pointer. We know that that was the reason why it crashed. But how that code got released is still kind of, like, ambiguous.
Starting point is 00:05:31 And then Casey and I were talking previously about, like, when CrowdStrike had their incident. Right. They released like a 50-page report that said like, hey, here is exactly what happened. Here is why the code went out. You know, sorry, it won't happen again. Yada, yada, yada, yada. Google has not done that yet. And amusingly, these two exploits were... no, exploits is the wrong term.
Starting point is 00:05:47 Sorry. These two, like, they're just crashes. Like, nobody exploited anything, right? Right. But they're actually kind of similar. The CrowdStrike one and the Google one were both like, we have some config stuff that can happen, essentially. Like, there's some kind of, like, code that's being driven off of, like, some data that goes into the code that tells it, like,
Starting point is 00:06:08 what it should do, right? And in the CrowdStrike case, that was the problem, is that, you know, the in-production was the first time it got fed the correct data, like, to make the bad case happen. And the same was true of Google, apparently. It's like, basically, this code, it will only hit its null pointer case. I mean, trying to read between the lines of the thing that they published, because, again, they didn't show the code like CrowdStrike did, but it only gets to the null pointer case, apparently if you pass it a certain configuration for the quota system.
Starting point is 00:06:38 Like this, it was in the, like, thing that's trying to check quotas or this sort of thing. So it's the same exact kind of pathologies. Like, there's a type of data that if you fed it to this thing, it doesn't do the right, like, checking of that data to make sure that it won't go into some bad code path. And off you go. And so they're kind of actually somewhat similar, at least from what Google actually said, which, again, they didn't include the code. So it's like, who knows?
Starting point is 00:07:03 They didn't even say what language it was in, correct? They didn't. They said null pointer, but that could be C, C++, whatever. Or go. Go has nil pointers, I think, too, right? Yeah. Yes, they do. So famously.
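Editor's note: the failure mode described above, data-driven code that walks into a missing field without checking it, can be sketched in a few lines. This is only an illustration in Python; the report did not name a language or show code, and `Policy`, `QuotaLimit`, `check_unguarded`, and `apply_update` are hypothetical names invented here, not anything from Google's system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaLimit:
    max_requests: int


@dataclass
class Policy:
    name: str
    # Hypothetical optional field: a policy update may arrive without it.
    limit: Optional[QuotaLimit] = None


def check_unguarded(policy: Policy, used: int) -> bool:
    # Raises AttributeError when policy.limit is None. This is Python's
    # closest analogue to dereferencing a null pointer, and a type checker
    # would typically flag the access on an Optional field.
    return used < policy.limit.max_requests


def is_valid(policy: Policy) -> bool:
    # Validate the data before any code path depends on it.
    return policy.limit is not None


def apply_update(current: Policy, update: Policy) -> Policy:
    # Keep the last known-good policy when the update is incomplete,
    # instead of letting the bad data reach check_unguarded().
    return update if is_valid(update) else current


good = Policy("default", QuotaLimit(max_requests=100))
bad = Policy("rollout")  # the "missing field" case
active = apply_update(good, bad)
print(active.name, check_unguarded(active, used=10))
```

Whether a rejected update should keep the old policy, deny requests, or allow them is a separate design decision that this sketch does not try to settle.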
Starting point is 00:07:14 Hold on, time out. Did we explain what a memory fuzzer is? Yeah, I can talk about that. So basically, a memory fuzzer, basically picture it like a unit test where you're trying to get as much code coverage as possible. But instead of like traversing down paths with like the sane data shape, you're literally mutating the data to try to find edge cases in your code that are not accounted for in testing.
Starting point is 00:07:35 So like, if I have an HTTP fuzzer, my initial corpus, my first seed is going to be HTTP, you know, like a GET request, right? And then what the fuzzer will do is it'll pick a random byte in that packet. It'll mutate it by like flipping a bit. It'll like make an integer a negative integer. It'll do some mutation. And then it'll run that request on the service and see, hey, does that input generate new coverage?
Starting point is 00:07:59 If it does, okay, that is a new piece we have to add to the corpus to further mutate to find new code paths, and you do that trying to enumerate the entire code base from a coverage perspective. And so what a memory fuzzer would have done, if they had fuzzed this software, and again, like, this is, the science behind getting coverage in fuzzing
Starting point is 00:08:17 is like cutting edge technology right now. No one has it, like, figured out, but Google is very good at it. A fuzzer would have very quickly found, like, oh, if this field is zero, it crashes the program, right? So it kind of implies that the software was not fuzzed, at least long at all. To Casey's point,
Starting point is 00:08:33 This code had actually been public, or not public. It had been released since like May of this year, like May 29th, I think. And because they had never pushed a policy change that exercised this code path, it just never was seen to crash. So either they didn't test it, they didn't fuzz it, nothing really was done to try to exercise this code until it got hit on June 12th. And it sounded like, too, that the fuzzing would not have had to have been very complicated because the way they made it sound anyway in the very terse description was that, like, the kind of thing that would crash this was just something with some missing fields. They just used the phrase missing fields. So, you know, a basic fuzz tester would probably try a bunch of different field combinations
Starting point is 00:09:13 with some missing. Like, that's not a weird thing to have fuzz. And apparently that was not done, I guess. But again, because there's so little information, who knows. Maybe it was a very strange, like, really unusual combination of missing fields that no one thought to check. Who knows? Yeah.
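Editor's note: the loop described above (seed corpus, mutate a random byte, keep any input that reaches new coverage) can be sketched as a toy in Python. Everything here is an assumption made for illustration: the three-byte record layout, the hand-written coverage labels returned by `target`, and the names `mutate` and `fuzz`. Real fuzzers get coverage from compiler instrumentation rather than from labels the target reports itself.

```python
import random


def target(data: bytes) -> set:
    """Toy parser; returns the branch labels it executed (hand-rolled coverage)."""
    cov = {"entry"}
    if len(data) < 3:
        return cov
    version, flags, chunk = data[0], data[1], data[2]
    payload = data[3:]
    if version == 1:
        cov.add("v1")
        if flags & 0x80:
            cov.add("flagged")
            # Bug: the chunk-size field is never checked for zero.
            _ = len(payload) // chunk
            cov.add("chunked")
    return cov


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one small mutation: flip a single bit or zero a single byte."""
    buf = bytearray(data)
    i = rng.randrange(len(buf))
    if rng.random() < 0.5:
        buf[i] ^= 1 << rng.randrange(8)
    else:
        buf[i] = 0
    return bytes(buf)


def fuzz(seed: bytes, iterations: int = 20000, rng_seed: int = 0):
    """Return the first crashing input found, or None if none was found."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen = set(target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            cov = target(candidate)
        except Exception:
            return candidate
        if not cov <= seen:
            # New branch reached: keep this input for further mutation.
            seen |= cov
            corpus.append(candidate)
    return None


SEED = b"\x00\x00\x04data"
print(fuzz(SEED))
```

The coverage feedback is what makes the nested bug reachable: no single mutation of the seed crashes the parser, but each mutant that unlocks a new branch becomes a starting point for the next one.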
Starting point is 00:09:27 What I would imagine is, again, all speculation. I don't work at Google. I have no idea. It is likely they're passing around either a JSON blob or a YAML file and then like if you use some YAML object that exists but a subfield in that YAML object does not exist, that will populate a null pointer that then gets read. And the problem with that is like fuzzing binary data is very easy because like I said, you flip a bit, you make an integer signed or something like that. Fuzzing data with a sane grammar, like JavaScript, for example, or YAML, is much more complicated. So the fact that it was not fuzzed implies that it was some, like, human-readable grammar, like a config spec, that just didn't get tested, which is, again, for Google, it's just crazy. But also, I can totally see why that would happen operationally.
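Editor's note: for structured config, one simple alternative to bit flipping is to mutate the parsed structure itself, for example by deleting one field at a time at every nesting level. The sketch below does that in Python on a nested dict standing in for a parsed JSON or YAML document. The policy shape and the `handle` function are hypothetical, chosen to mirror the speculated "object exists but subfield does not" case; nothing here is taken from Google's code.

```python
import copy


def paths(obj: dict, prefix: tuple = ()):
    """Yield the key path of every field in a nested dict."""
    for key, value in obj.items():
        yield prefix + (key,)
        if isinstance(value, dict):
            yield from paths(value, prefix + (key,))


def without(obj: dict, path: tuple) -> dict:
    """Return a deep copy of obj with the field at `path` removed."""
    clone = copy.deepcopy(obj)
    node = clone
    for key in path[:-1]:
        node = node[key]
    del node[path[-1]]
    return clone


def handle(policy: dict) -> bool:
    """Hypothetical handler: checks the top-level object but not its subfields."""
    quota = policy.get("quota")
    if quota is None:
        return True  # no quota configured, nothing to enforce
    limit = quota.get("limit")      # None when the subfield is missing
    return limit["per_minute"] > 0  # fails if limit or per_minute is missing


def find_crashes(seed: dict) -> list:
    """Try every single-field deletion and record which ones crash handle()."""
    crashes = []
    for path in paths(seed):
        try:
            handle(without(seed, path))
        except Exception as exc:
            crashes.append((path, type(exc).__name__))
    return crashes


SEED_POLICY = {
    "name": "example",
    "quota": {"limit": {"per_minute": 100}, "region": "global"},
}
for path, error in find_crashes(SEED_POLICY):
    print(".".join(path), error)
```

Because the mutations operate on the parsed structure, every variant is still a well-formed document, which sidesteps the grammar problem that makes byte-level fuzzing of text formats hard.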
Starting point is 00:10:19 So I have a question. Yeah, what's up? So I don't know much about the outage because I was at a conference, so I didn't really see the internet go out if it actually did at all. So when you say that crashed the internet, to me that just sounds like a really bold claim. Like what? Like, like, yeah. Like yeah. Like what was the fallout, right?
Starting point is 00:10:35 Like what is like the impact here and like what was the collateral? Like what happened exactly? Yeah. So basically when this outage happened, what really ended up occurring was that the binary that was responsible for doing quota and authority checks or like authorization checks on requests like crashed in all regions of the world. Right. So basically any call to a Google Cloud
Starting point is 00:10:57 Platform API would just 503. Like you could not do anything. So if you're like my course website, not a plug, this is like real life. What's the course website again? Just Low Level Academy. If you want to learn C, Rust, whatever. Yeah. Plugs are allowed here. Plugs are allowed. No, but the reason I'm saying this is because I depend on Google OAuth, right? And so every time my site receives a 503 from an API, I get an email. And so during this outage, because the OAuth provider was returning 503s, I got like 120 emails in 30 minutes, being like, you know, the internet's down. So the issue was every, like, basically every site on the internet now depends on Google in some form or fashion
Starting point is 00:11:37 through OAuth, through the actual, like, compute providers, a bunch of other stuff that I don't use or don't care to use. But basically because Google in its entirety went down, it kind of like brought the internet down with it, which is very, like, counterintuitive to, like, the design of the internet. It's meant to be like a decentralized mesh of nodes where if one goes down, routing exists and you can kind of route around it. But it kind of just goes to show that we have this really weird dependence on like single points of failure in modern cloud architecture, which is kind of scary, I think, from a, you know, a cyber attack perspective, right?
Starting point is 00:12:10 It's kind of like how when Dyn went down or Dyn, how ever you want to pronounce it in 2016, like, again, DNS core protocol to the internet. There was like one major provider that went down because of a botnet. And when that went down, it took the whole internet down with it. Cyberdyne, did you say? No, just Dyn, D-Y-N. Yeah, sorry, cyber attack. the provider is called Dine. I don't know if it's Dine or D.
Starting point is 00:12:30 It's a Terminator joke. He's going Skynet on us. Sorry, sorry. I'm a little too young for Terminator, I think. I'm 30. No way. Yeah. Shut up.
Starting point is 00:12:38 You're not too young for Terminator. You're not too young. Wait, what was the first? When did the first Terminator come out? Like '78, I think. Oh, was it that old? No, '81 or '82. Oh, it was '80s?
Starting point is 00:12:48 Terminator 1? No, I don't think so. I think I'm right. It was late '70s? No, it wasn't. Terminator 2 is the best one. '84.
Starting point is 00:12:54 I was going to say it's like a little, my bad, my bad, my bad. Yeah. But the problem is you can name any movie. I probably haven't seen it. Lord of the Rings. Lord of the Rings. Okay, those I've seen. Oh, God.
Starting point is 00:13:07 But I'm a nerd. Any other movie. Fine. Really just any other movie? What? Do you do what you want to do this again? Star Wars. Fine.
Starting point is 00:13:17 Star Wars, those are the only two. Other than that. You guys are bad. That was it. Love it. Anyways, I guess we could just move on, Mr. I've never seen any movies, except for all the ones we name. Exactly. Of the two, yeah.
Starting point is 00:13:32 Before, before, maybe we should just stay off topic. I know it's going to bring us back on, but this is kind of good. Name a third movie. Someone name a third movie. I went with Gladiator. I have not seen that. Yeah, that's a little more. Saving Private Ryan.
Starting point is 00:13:46 No, first off, Gladiator is not rare. Wait, wait, wait, okay, whoa, whoa, whoa, whoa, whoa, whoa, whoa. Gladiator is one of the best movies ever. Everyone stop. In my opinion. Half of Saving Private Ryan. Can I tell you why? Can I tell you why? When I was growing up, big paintball kid. We loved to play paintball. We would go drive an hour south to go play paintball at a place called Cousins Paintball in Toms River, New Jersey. And every time
Starting point is 00:14:06 we went, we would hype ourselves up by watching the Normandy invasion scene. We would get like an hour into the movie and then we would be done. That's what we did. That is dark. That's what we would do. We were like 12 year old kids, man. We had to get hyped up somehow. And that's how I knew I would have a job in security. There's like literally people getting blown up. What? We were going to war with paintball, man. Oh yeah, we were a teenager, so I mean, like, give me a break.
Starting point is 00:14:32 But yeah, so that's why I've only seen half of it. Because we would get into the Normandy invasion scene, get like an hour into the movie, and then it'd be over. And I mean, we'd be there. He's like, I still don't know what happened in World War II. Who won? Yeah, I mean, did we win? It looked pretty bad from that.
Starting point is 00:14:44 I didn't, you probably weren't. No spoilers, please. Yeah, no spoilers. And apparently, like, Oppenheimer, we had like a big bomb or something. That's crazy, too. When did that even happen? Who knows? It's confusing.
Starting point is 00:14:53 Very confusing. All right. Well, let's get back on topic here. Okay, so philosophically, are you saying that all those times I've been told that rolling my own auth is for losers and people who want security issues may not have been correct. Is that true? Is that what you're saying? Low Level Learning?
Starting point is 00:15:13 Chief expertise of security? I mean, chief expertise in cloud incorrect. No. But, yeah, I don't. It's tough, right? It's just everything you do anywhere, including in programming, including in like system architecture is a choice in risk. And what is risk?
Starting point is 00:15:30 It's the probability that a thing will occur and if it occurs, what happens, right? How bad is it? The probability that Google Cloud goes down all the time is fairly low, so not impossible, but like, you know, it could happen. So I don't think it's bad if you use Google auth. There's nothing wrong with that, but you have to consider that in your calculus, like, well, you know, how mission-critical is my software. If Google goes down, which it definitely could, are people going to get hurt kind of thing? Like maybe medical providers shouldn't be using cloud auth for their stuff. If it's
Starting point is 00:16:00 a complete necessity that they get into it. So I don't think it's bad to roll your own auth. I just think you have to do the math on what makes sense for your application, you know? It's kind of rough, too, I think, like at least from a user's perspective, or I mean, for when I say user, I mean, someone who uses like cloud services. It's a bit tough, too, because one of the things that seemed to get exposed by this particular crash, at least when I was kind of looking at people's reports on it, was that a lot of people didn't seem to know or maybe could not have known that they were relying on someone who was relying on Google.
Starting point is 00:16:38 So I could even imagine somebody who was like, oh, don't worry, we use two different storage providers or two different auth providers, not realizing that actually they will both go down if Google's quota server goes down, right? And so one of the things that kind of occurred to me during this crash, not being a security professional, so I don't really have to think about this for my job, but it just was kind of
Starting point is 00:17:00 floating around my head was I was like, you know, it probably should be a requirement somewhere for these kinds of services that you get the diagram, where it's like, what are all the things that you actually call out to? So that I can look and if I want to do a redundant
Starting point is 00:17:16 thing, I know that I need something where the Venn diagrams don't overlap. Because if at the end of the day, all of these people call AWS, then really I have no redundancy at all. Because as soon as AWS goes down, all of my stuff goes down or whatever. And I realize no one's probably thinking that way. Like people, like the web is not really about uptime, it doesn't seem. But like, let's suppose you were, it would be interesting to be able to get that so that you could just kind of know, okay, if I use this and this, they're fairly disparate.
Starting point is 00:17:43 Whereas if I use this and this, they share a bunch of services. So there's probably no point because I'm not really getting any redundancy. Does that make sense what I'm saying? Yeah, it makes sense. But I do want to throw something in here really quickly before an actual expert speaks. Is that Casey, it sounds like you're getting dangerously close to us drawing UML, which just generally speaking here at the standup, we're very against the UML. So that already sounds like, hey, we just need like a universal language for modeling all the services and their dependencies.
Starting point is 00:18:09 You want Grady Booch on here. You got to Booch it up. I don't want Booch on here talking about it. Booch is coming on. I'm going to call him up. I don't know him, but I'll, you know, call him up randomly. Hey, Grady. What's going on, man? You don't know me, but I got this podcast that no one wants you on, apparently.
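Editor's note: the "Venn diagrams don't overlap" check is easy to mechanize if providers published what they call out to. A minimal Python sketch follows, with an entirely made-up dependency map; the service names `auth_a`, `auth_b`, `edge_y`, `cloud_x`, `storage_c`, and `cloud_z` are hypothetical placeholders, not real products.

```python
# Hypothetical map of each service to the services it calls directly.
DEPENDS_ON = {
    "auth_a": {"cloud_x"},
    "auth_b": {"edge_y"},
    "edge_y": {"cloud_x"},
    "storage_c": {"cloud_z"},
}


def transitive_deps(service: str, graph: dict) -> set:
    """Everything `service` relies on, directly or through other services."""
    seen = set()
    stack = [service]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def shared_deps(a: str, b: str, graph: dict) -> set:
    """Dependencies two services have in common: the overlap in the Venn diagram."""
    return transitive_deps(a, graph) & transitive_deps(b, graph)


# auth_a and auth_b look independent, but both bottom out in cloud_x.
print(shared_deps("auth_a", "auth_b", DEPENDS_ON))
# auth_a and storage_c share nothing in this made-up map.
print(shared_deps("auth_a", "storage_c", DEPENDS_ON))
```

An empty overlap is only as trustworthy as the map itself, which is the point being made in the conversation: without the list from each provider, the check cannot be run at all.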
Starting point is 00:18:26 Could you come on? Yeah. Okay. Well, no, I'm not talking about a UML diagram, but I guess I am sort of scarily talking about maybe, like, what do they call this? A Systems Architect diagram? What if they call? That little thing where they draw a bunch of little cloud shapes and then they write a name in it and then there's arrows. I mean, I don't actually need that because I just need the list, right?
Starting point is 00:18:46 I just need to know, like, what main services, you know, are you depending on or something like that. But I do think that would be interesting to know just from an architecture standpoint, because it's like, if you're thinking of this service as just an endpoint you connect to, and so you're just thinking, what is the chances that that goes down? You may not realize that really that and some other thing, they will go down at the same time if this other third service they both depend on goes down. And that's like to the point, low level was just saying about like calculating that risk, that's an important part of the risk calculation, right? You have to know which things are independent. Because like the chance that Amazon and Google have an outage on the same day is pretty low.
Starting point is 00:19:26 But if actually they're secretly both using each other's stuff, it's very high. And you need to know the difference between those two things. That's all. Yeah. I mean, you saw that when like as a result of this, Google had an outage, but it also took down Cloudflare services.
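Editor's note: the independence point can be made concrete with a little arithmetic. The sketch below is in Python and every number in it is an assumption picked for illustration, not a measured outage rate. It compares the chance that two providers are down at the same time when they fail independently against the case where both also sit on one shared upstream service.

```python
def both_down_independent(p_a: float, p_b: float) -> float:
    """P(A and B down at once) when their failures are independent."""
    return p_a * p_b


def both_down_shared(p_shared: float, q_a: float, q_b: float) -> float:
    """P(A and B down at once) when both also depend on one shared service.

    Both are down if the shared service is down, or if it is up and each
    provider fails on its own (with probabilities q_a and q_b).
    """
    return p_shared + (1 - p_shared) * q_a * q_b


# Assumed figures: each failure source is down 0.1% of the time.
p = 0.001
independent = both_down_independent(p, p)
shared = both_down_shared(p, p, p)
print(f"independent: {independent:.9f}")
print(f"shared upstream: {shared:.9f}")
print(f"ratio: {shared / independent:.0f}x")
```

With these assumed figures the shared-upstream case is about a thousand times more likely, because the joint failure is dominated by the single shared service rather than by the product of two small probabilities.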
Starting point is 00:19:38 Yes. I was just about to ask this. How did that happen? What was this? because I just, I was, I guess I was a bit shocked when Cloudflare was reporting things going down afterwards because I heard this was like a Google related thing. Yeah. And so I was very confused as to how the other services went down.
Starting point is 00:19:54 They have this Workers service thing they do, and it relied on Google Cloud, apparently. Oh, were they actually related? Yes, I believe that was the case. So the only outage I saw was the Cloudflare one. So I didn't realize they were kind of like one and the same. I think they are. Which is really funny because at the conference we were at, Cloudflare had a
Starting point is 00:20:11 booth, and we were like showing them the status, like, hey, you guys are down, and we just like dropped the bomb on them. At Render ATL? You were literally walking up to people who are sweating bullets? I think Haxor did it. Or somebody. You're damn...
Starting point is 00:20:25 You know, like, your entire service is down. Have a nice conference. Sorry, I'm looking at the other... You guys should pack up and go. Cloudflare's report went into a lot more detail than Google's did, so it looks like, yeah, Casey said Workers, as well as
Starting point is 00:20:38 So what is it: Workers, WARP, Access, Gateway, Images, Stream, Workers AI, Turnstile, AutoRAG, I don't know what that is, and Zaraz. So there's a lot of... A ton of services are downstream dependent on Google as what you would call, you know, a CSP, right, a cloud service provider.
Starting point is 00:20:54 Which is just, again, to Casey's point, like, you can't make a sane risk calculus if you don't know, like, the probability and the outcomes of things happening. So, like, if Google is like the core, call it center of gravity, right? Of all of these things, if it goes away, what happens? You know,
Starting point is 00:21:10 because you're not exposed to that, you can't make a good call off of that. I have a question for you, a low level. What's up? So obviously like... You say your question, trash. I'm going to say my question, right? You ready? Three, two, one.
Starting point is 00:21:25 Hold on. Articulating. Articulate. Okay. So if you, let's picture like a big company. Yeah. Like Amazon or something. Yeah.
Starting point is 00:21:34 Let's see these. So you were like Netflix. Let's say you should pick a random company. It's like Netflix. I was thinking about this question for like 10. It's just like, okay, okay, okay. Let's just picture I'm like Mr. Amazon and I have a lot, big company. It's so many teams.
Starting point is 00:21:47 It's hard for me to find, like, which team is like is actually depend on something that could actually like, like the dependency waterfall, right? Like, how do I, like, when you're consulting or have you seen this in practice, like, how do companies approach this like to know that you're like the single domino? Like if you fall, we're all like fucked, right? Like, yeah. Because it seems like when things like this happen, it's clearly like not in anyone's like peripheral, right? I don't know. Yeah, I'm gonna say I don't have a good answer to this. I've worked at big places and I've worked at small places, and inevitably what happens at companies that get big is their knowledge management solutions begin to suck very bad, and people in what you would call like principal engineering positions know less and less about like the actual principal engineering concepts of the architecture. And so at the end of the day, like, nobody probably knows the totality of what you depend on. And I'm not sure there is a good answer. I mean,
Starting point is 00:22:43 I think the real answer is like you have to have a single guy that probably knows the entire setup and have like a good place that everyone can go reference and see like kind of what Casey said, like the UML diagram of the architecture, right? Like, hey, we depend on Google here, here, here, if it goes down, we can make this risk calculus. But there really isn't a good answer to this. It's probably why this, not that it keeps happening, but like that this is able to happen. Because no one's really aware how it all kind of glues together. It's hard, especially at a company as big as Amazon or Netflix potentially
Starting point is 00:23:11 where there are just a random company a random just completely at random so like for Prime so obviously we have like chaos monkeys chaos monkey kind of like in the same spirit of I don't really know what it does
Starting point is 00:23:23 at a low level I'll tell you what chaos monkey Cass Kong and Cass Guerrilla or Cass Gorilla of Cass Kong so we have three of them in Netflix I had no idea there was three of a what I got to tell you about Netflix
Starting point is 00:23:33 as you work there and I don't so at so Chaos Monkey what it would do is actually target an individual instance in Amazon and it would kill that individual instance. And so that way it's kind of to prevent any sort of like stateful operations between two separate services where I'm cashing, like I'm doing some sort of extracurricular caching and creating some sort of weird state dependency. It will start just randomly killing services.
Starting point is 00:23:56 And that runs in production every single day as far as I know at Netflix. Chaos Kong or Chaos Guerrilla will take down an entire service. That's why we have something called Blue Services at Netflix. So if you look into Blue Services, that's like how we do. a lot of default responses and things like that. So if a service goes down, you're able to have some form of data coming up. That's why we have a default low-lomo. Because if you can't reach GPS or map or, you know, all the things that go into creating,
Starting point is 00:24:22 there's like pitcher, there's wrecks. There's all those services altogether. If one of them goes down, then you get what is the default low-lomo, which is like the latest known highest-ranking videos that will all come down. So you'll get like stranger things right off the rep even if you've seen it. And there's no new episode. Just to translate this for Prime, using knowledge. I gained from Prime Stream, they
Starting point is 00:24:40 use the word Lola Mo to mean list of list of movies, I believe. I didn't know that for the longest time. Yes, but we now technically are a LoloRomo, which is a list of recommendation objects for movie objects. Okay. That's a new one. I can explain that for another one. I actually was the first person to implement that.
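Editor's note: the default-response idea described above, serving a precomputed generic list when the personalized services are unreachable, can be sketched as a small wrapper. This is an illustrative Python sketch only; `DEFAULT_ROWS`, `fetch_with_fallback`, and the failing fetch function are hypothetical and do not reflect how Netflix actually implements it.

```python
# Hypothetical precomputed default: a generic, non-personalized list of rows.
DEFAULT_ROWS = [
    {"title": "Popular right now", "videos": ["show-1", "show-2"]},
    {"title": "New releases", "videos": ["show-3"]},
]


def fetch_with_fallback(fetch, default):
    """Call fetch(); on any failure return the default instead.

    Returns (rows, degraded) so the caller can tell which path was taken.
    """
    try:
        return fetch(), False
    except Exception:
        return default, True


def personalized_rows_unavailable():
    # Stand-in for a recommendation service that is currently down.
    raise ConnectionError("recommendation service unreachable")


rows, degraded = fetch_with_fallback(personalized_rows_unavailable, DEFAULT_ROWS)
print("degraded" if degraded else "personalized", [row["title"] for row in rows])
```

Returning the `degraded` flag alongside the data is a design choice in this sketch: it lets the caller log or alert on the fallback path while the user still sees something.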
Starting point is 00:24:56 It was very tricksy, but we got it done. Anyway, so with all that, then, Chaos Kong, which is you will see emails about that. So if you just search in your email Chaos Kong, it usually runs on a Wednesday and what will end up happening is that we'll actually take down an entire region. So not just a specific service, we'll actually take down U.S. East 1 or U.S. East 2 or West 1 or somewhere in EU, and it will be ran in production.
Starting point is 00:25:21 And usually it's for like one or two hours to make sure that if some region goes down, Netflix continues to operate correctly. So those are the three levels. And so this doesn't help because the problem is that if every single one of these services has some sort of dependency, say, on Google, you can't take down a region to test this because you don't take down a region, you're taking down an external service, which would crash every single sub-service, which would take down all regions. And so there is no overflow mechanic.
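The three levels described above (kill one instance, kill a whole service, kill a whole region) can be sketched against a simulated fleet. This is only an illustration under assumed names: `Instance`, `make_fleet`, `kill_instance`, `kill_service`, `kill_region`, and `is_up` are hypothetical, and the region and service labels are placeholders rather than a real deployment.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    region: str
    service: str
    alive: bool = True


def make_fleet() -> List[Instance]:
    """Two regions x two services x two instances each (placeholder names)."""
    regions = ["us-east-1", "us-west-2"]
    services = ["api", "playback"]
    return [Instance(r, s) for r in regions for s in services for _ in range(2)]


def kill_instance(fleet: List[Instance], rng: random.Random) -> Instance:
    """Instance level: kill one randomly chosen live instance."""
    victim = rng.choice([i for i in fleet if i.alive])
    victim.alive = False
    return victim


def kill_service(fleet: List[Instance], service: str) -> int:
    """Service level: kill every live instance of one service."""
    killed = 0
    for inst in fleet:
        if inst.alive and inst.service == service:
            inst.alive = False
            killed += 1
    return killed


def kill_region(fleet: List[Instance], region: str) -> int:
    """Region level: kill every live instance in one region."""
    killed = 0
    for inst in fleet:
        if inst.alive and inst.region == region:
            inst.alive = False
            killed += 1
    return killed


def is_up(fleet: List[Instance], service: str) -> bool:
    """A service is up if at least one instance anywhere is still alive."""
    return any(i.alive and i.service == service for i in fleet)
```

In this toy model a service survives the loss of one region because the other region still has live instances. An external dependency shared by every region has no such second copy, which is the gap the speaker points out.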
Starting point is 00:25:46 So a very interesting thing to do a Chaos Monkey with, which is to kill that service, or a Chaos Kong, at least, would be to kill that entire service for a moment and see how does your system handle, say, a Google going down or a something. But then you don't also know, like all these, like you said, like no one would know that Cloudflare was going to get taken down because Google did an access to a null pointer. I have an idea. I have a great idea.
Starting point is 00:26:10 You guys can have this for free at Netflix. I'm just giving this to you. I'm giving this one to you. I think he's on the same page as me. I'm curious what this is. Okay. I'm not on the same page. What's happening?
Starting point is 00:26:19 So here's what you do. Because obviously you do have a problem because Netflix is like the whole point of the service is supposed to be able to play movies so they'll stream movies at home or wherever on my device is. And you get in this situation,
Starting point is 00:26:33 Google Cloud goes down, Amazon goes down, Whoever's storing the data goes down, I can't get the movie. So here's what you do. You get a really, and I mean blocky, compressed version of Baby Shark. So we're packing this thing into like a megabyte, right? Okay. And every single server node you have, no matter what it's running, keeps that one
Starting point is 00:26:57 megabyte in memory. So that if everything goes down, Netflix will just stream Baby Shark to everyone. so that at least something is playing during an outage. And then you can just be like, you can make up something about what happened. You're like, oh, sorry, you know, like our recommendation service was a little on the fritz, so it just decided that everyone wanted to watch Baby Shark. And then it will seem like you just have like infinite uptime. There you go.
Starting point is 00:27:21 Can I tell you a fun side story? This just reminds me of a side story. I tried to convince our marketing wing to do something kind of similar in a similar vein. So effectively, they did this once. So if you look back in the day, this must be 2016, 2015. We actually leaked, I think, the second season of House of Cards early. Okay. And everybody was like, oh, my gosh, House of Cards.
Starting point is 00:27:42 And then you got, like, for 30 minutes, it was up and people could go watch it. So we actually, like, had content that we shouldn't have had leaked. And so then there's all these articles written, all this hype built and all this. And so I actually want to, I know this is not actually related at all to your idea. It's just, it just reminds me of it. I actually, I tried to convince Netflix to leak shows more regularly. Just for like a half an hour, because then you get all these articles written and everyone gets super hyped up. And it's like such a sweet way to mess with people.
Starting point is 00:28:11 And it looks like it's an accident. And so you can have this whole idea of like accidental content and just being like, oh, our fault. You know, it didn't mean to do that one. And then published like a fake apology every time, right? It's like making up service names like our Gorgonzola service failed. And then they're just like they don't actually exist internally. Great. Love Galacticus. You know.
Starting point is 00:28:33 Sorry, low-level, you wanted to say something that was not my ridiculous baby shark joke. It sounded like you were going down the path that I was going to go down. But you didn't. It's okay. You had a completely different path. Yeah. I propose chaos schlong. What you do is you, this is actually real.
Starting point is 00:28:49 Like what you could do is you could create like basically a firewall red button layer in between a cloud service provider and like whatever service it is. So just like you're doing, like, Chaos Dong, what are they called, Chaos Kong? You know, Chaos Kong attacks at Netflix. You can also do, and then if it's a regional cloud-based thing, it'd be Chaos Kong Dong or, you know, if it's just for a small little service, Chaos Monkey Dong. But all you're doing is you're red-buttoning the cloud firewall to see, like, if Google goes away, what happens, right? My rate is roughly 1,200 an hour. That will be 120 bucks. You can bill it to Netflix, Trash.
Starting point is 00:29:27 I'll give you my info. Obviously, the only problem with that notion is that there's all those third-party ones, like the Cloudflare one, that leave your cloud or leave your private network that you can control. However you're controlling the traffic through your private network, you wouldn't be able to control that. And so they would still remain up, which would hide where the problems actually were. This is, this is like the sole problem, which is the internet is supposed to be distributed. It's supposed to be the people-owned thing.
Starting point is 00:29:51 But the reality is that the internet is largely power accumulated. You know, you only have a couple of sources. Can you explain why that wouldn't work again one more time? So you're saying... Okay, so let's see, I have three services. I have Google login, and I have Cloudflare AutoRag, which I assume is something to do with AI. You would program in your little cluster of machines
Starting point is 00:30:13 that any request going out to Google automatically subsplodes. It's not a cluster of machines. It's called Donkey Kong or something. I don't remember what he called it. It was like Dongky Kong or whatever. Donkey Kong Schlong. Yeah, that. When that thing's on, it takes the cluster machines and says,
Starting point is 00:30:28 okay, any network traffic that matches this right? which of course don't do that, but let's just pretend you do it this way. This rejects, we just return an auto 500. Okay, that works on the Google sign-in. We fix the problem. But the Cloudflare request to auto-rag, you don't solve because that request to Google happens outside of your purview to, like, affect. And so that's where this big problem is, is that you can't actually simulate it.
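A rough sketch of the red-button idea and its blind spot, as discussed above: a guard in front of outbound calls can force a 500 for a blocked host, but it cannot see calls a third party makes on its own side. `RedButton`, `third_party_feature`, and the `.example` hostnames are hypothetical stand-ins, not a real firewall API.

```python
from typing import Callable, Set


class RedButton:
    """Guard for outbound calls that can simulate a provider outage."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()

    def press(self, host: str) -> None:
        """Start rejecting every outbound call to this host."""
        self.blocked.add(host)

    def call(self, host: str, request: Callable[[], int]) -> int:
        """Return a forced 500 for blocked hosts, else the real status."""
        if host in self.blocked:
            return 500
        return request()


def third_party_feature() -> int:
    # Runs on the vendor's side. Its own call to the upstream provider
    # never passes through our guard, so pressing the button for that
    # provider does not make this fail during a drill.
    return 200
```

With `press("login.upstream.example")` active, a direct login call returns 500 as intended, while `call("vendor.example", third_party_feature)` still returns 200. The drill therefore looks healthier than a real upstream outage would be.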
Starting point is 00:30:53 You need a black hat team to go actually crash Google and see what happens for the day. And that's the only way to actually implement Donkey Kong Schlong or whatever it's called. It's crazy. We've only been here for 20 minutes and Casey's saying schlong and dong, true. I mean, couldn't you also catch this too if like, so I'm saying like basically any external dependency you guard with a red button that you can test, right? Because then like then you could test. It's not so much like that you're depending on that to tell you when Google or Cloudflare or whatever goes down. It's more like you're able to test like, hey, from a risk calculus perspective, if Cloudflare.
Starting point is 00:31:25 Oh, but you're saying you don't know the dependency, I guess. That's the problem. It's the same problem I was talking about originally. Just like, if you don't, if this isn't, or worse yet, I guess, if you combine the two problems that we've talked about, it gets even worse. Because, like, I'm saying, like, oh, it would be nice if they said what they depend on. But what you were saying, Low Level, is that, like, it's unclear that anyone at the company may even know the answer to that question. It's always like, dude forgot that, oh, we actually do call, like, the Google service in this one place and we forgot about that and we didn't put it on the diagram and
Starting point is 00:31:54 so then we went down because you know blah blah blah so yeah I guess it's like what I'm saying is just an observation of like this increases, this decreases your ability to analyze your risk and there may be nothing you can do about it. Yeah. I think I'm in that team. I think I'm literally in the team of like, that's just how it is. Figure it out. Like, you know what I mean?
Starting point is 00:32:11 Either roll your own everything or forget it. I mean, do companies do that? Do people just like, I just don't want any dependencies? I'm so happy you said that. Trash, I'm so happy you said that because that's the next thing I wanted to talk about, which was my happiest part about this whole thing is knowing the level of tweets that were going to happen from DHH right after the. this thing started going down.
Starting point is 00:32:30 I was so stoked. I'm like, here comes DHH. He's going to be coming in hard. He's going to be like on-prem solutions. Exactly. At what point do we just have it all under our own roof? He was living the dream life. They're like, oh, it went down.
Starting point is 00:32:44 Oh, I didn't know that. Was he tweeting a lot? I didn't see. Was he going nuts? Oh, yeah. He said there's even, I saw more tweets about people getting worried about DHH tweeting than DHH tweeting, but there were several of them.
Starting point is 00:32:59 That's awesome. I love that. That's my favorite aspect of the whole thing is that someone who has been right this whole time, who everyone makes fun of, and then the thing happens that he talks about, and they're all like, oh, great. He's going to be insufferable now. It's just like, or he's just going to be right, and he told you so. Mm-hmm, mm-hmm.
Starting point is 00:33:20 Okay, no one else, notice, whatever. I thought it's fantastic. We don't need to discuss that part. I was living real life, touching grass. I missed it all. Okay, so I want to tell you. So something funny happened to me is that I was streaming during that day. And I got done my stream.
Starting point is 00:33:36 I ended at like a, uh, I think it was the hour that Cloudflare went down and then I had a whole bunch of programming that I had to do. And I just proceeded to program until like three or four in the afternoon, opened up Twitter and everyone's just like, oh, I can't work. My life's ruined.
Starting point is 00:33:51 The internet's down. I had no idea the internet even went down. I was just in my own little world with my headphones on and I missed the entire like experience of the internet exploding. Yeah, I literally had no idea either. I lost out. I don't actually know like what exactly went down like in my own. Like what do I use every day that?
Starting point is 00:34:08 Everything. Everything went like I remember like people couldn't we like on our discord people couldn't upload images. Could Tesla start? Could you drive the Tesla? I don't have a Tesla. I refuse to use a computer that is connected to an internet. I mean, a car that is connected to the internet.
Starting point is 00:34:24 I was like Casey, you're on a computer that's connected to the internet, buddy. Casey, I used to have that. I used to like refuse to do smart thermostat, smart cars, and then I just gave up. I was like, nah, whatever. Don't do it. If they're going to kill me, they're going to kill me. You know what I mean? I know what happened.
Starting point is 00:34:39 Your wife bought that Wi-Fi lamp. You uploaded a picture of it, and after that you're just like, go. G-G. My wife, love her to death. She's actually in the other room. She might walk in here while I'm making fun of her. She bought a lamp that was like IoT. And to turn on the lamp, you had to download an app that connected to the lamp over
Starting point is 00:34:56 Bluetooth. Oh, hold on. And then you had to put your Wi-Fi password into the lamp for it to connect to the internet to upload its firmware and then you could turn the lamp on. Exactly. Yeah. And then it bricked itself somehow and it's useless sitting in my bedroom still. It's crazy, man. So ever since then, oh, and also, I had to punch a hole through my firewall for it to work.
Starting point is 00:35:17 I had to make a separate VLAN on my network. And on that network, I had to basically allow, I block traffic by country. I had to allow a different country into my network for that lamp to work, which is pretty insane. Was it Russia? China. China. I had allowed China's traffic to my network. Never heard of them, yeah, yeah, yeah.
Starting point is 00:35:36 I love that you allowed China in from a Wi-Fi lamp that uploads passwords. That definitely wasn't worth it. Needed the lamp, baby. Did it at least change colors? Was it like a Hue? Oh, yeah, it was cool. It also had a timer in it that when you, like, oh, it's 6:30, sunrise is 6:45, it'll slowly bring the room up to light. It'll be like, and it makes like bird noises.
Starting point is 00:35:58 It's like kind of a red one. I actually have a clock that does like does that. Yeah, but it broke. So it's, yeah, whatever. I love how like, no one can program without the internet. Like nothing you just said requires the internet at all. But they're like, you know what? We need this lamp to connect to China. The only way we can sunrise in America is a server in China. That's right. That's the new Ronald. We need a convenient way to ask you for your Wi-Fi plain text password. But like, you know, I'll let up you what it is.
Starting point is 00:36:25 Yeah, man, it's crazy. Like GPS exists. GPS has a time source. Just use that. I don't know why it needed Wi-Fi access. Yeah, honestly, peak tech was the clap-on, clap-off lights. Dude, you walk in. Remember those commercials? Everybody? Old people?
Starting point is 00:36:37 They're all getting out of bed. They're like, that was peak technology. Yeah, dude, so good. I will say the fact that we have to sign into something to use it, for me personally, is single-handedly the most frustrating aspect of modern life. Like, anything that I
Starting point is 00:36:53 purchase that requires that, I want to throw it out the window and I recently got an automated dog food feeder for my dog and get this. I want you to just sit down and listen to this. There's no internet to it. There's no Bluetooth to it. It has a record button so you can just be like, hey dog, it's time to eat. It records your voice up to 10 seconds and you can have up to four feeding sessions a day. That's pretty cool.
Starting point is 00:37:23 We have this automated your life. We have basic technology. It exists. We have this as well. We have this for our cats. It says Bon Appetit Tiny Puss. Every time it puts the little stuff. Let's go.
Starting point is 00:37:37 What? That's awesome. It does. Like in your voice? Or it's like a song? Yeah, yeah, yeah. We recorded that. It's like Bonapeteet tiny puss.
Starting point is 00:37:48 Is that how you say it? Bonapeteet tiny puss. And then all the stuff, the food comes out. He's Mario all of a sudden. Tine Puts. The best part about this is you have to remember, this is an automated feeder, and there's two of them because we have two cats,
Starting point is 00:38:07 and there's one for each cat, and these feeders go off, you know, four times a day or something. So if there's guests over, right, it's just like, oh, hey, how's it going? Like, in the background, it's like, bon appétit, tiny puss.
Starting point is 00:38:23 It's like, ah, it's the cat feeder, don't worry about it. Yeah. That's awesome. It's really cool. It's totally adorable, bro. It's just a cat feeder, okay? I'm weird about it. But okay, so this cloud thing, we gotta get back to the topic. Like, I just got to be like, all right, we're just blowing past that.
Starting point is 00:38:40 I've never heard anyone call a cat a tiny pus before. Oh, dude. That is new. That is no. I totally abuse. I abuse that so much all the time. You abused the push? Oh, yeah, yeah, yeah.
Starting point is 00:38:56 What is that? happening right now. Yeah, yeah. I mean in exactly the way you're doing now. Like, I love double entendre. What can I say? I just, any, it's always good for me, like, I always like when you can accidentally accidentally make it sound dirty.
Starting point is 00:39:10 I'm just gonna, I'm gonna like it. I'm gonna like that. I swear T.J. leaves for one stream. Oh God, you're right. This is he would love this. Yeah. I don't know if you would love it. I don't know if you would love it, but he'd probably shut it down real quick.
Starting point is 00:39:25 He'd buy it. We would have been on different topic. Family-friendly podcast, guys. I'm from New Jersey. I can't help myself. I'm sorry. Okay, listen. All right. Well, anyway, what were you going to say about some stupid cloud? Honestly, I have no idea at this point.
Starting point is 00:39:41 I didn't want to say that. Okay, all right. Just, I guess, okay, so low-level learning. We had a thing where you weren't allowed to plug in devices on the Netflix network. Yeah. Why are you giving up? Why not just not have, like, do you really need, sunset lamp or can you just
Starting point is 00:39:57 turn on a lamp so my thing is like how do I say this I for a long time lived a very paranoid life where I wouldn't do certain things and I wouldn't use modern comforts out of fear of being targeted. But then I learned
Starting point is 00:40:14 that you wouldn't use like something that help us understand that. No literally like a smart thermostat I refused like I and like me 10 years ago would not have driven a Tesla because like oh it's you know lithium ion batteries attached to a computer on the internet and they could blow up, right? Like that, that crap used to be what I would say. But then I realized if you're being targeted, they're gonna get you, dude. It doesn't matter. That's literally my way of life. Yeah, what? What?
Starting point is 00:40:37 That's one password. I have faith in humanity until it turns on me. So like, if I go to like a public space, I'll just literally leave my wallet just standing or sitting right there with my case. Hold on. Stop. Wait, okay. I'll literally just put my wallet like right there and open. Like I'm just like, no one's going to steal it. Like, I have faith in you, man. So far hasn't yet. Oh, my goodness. Just jinxed myself. But, like, I leave my stuff. Like, this is, like, at, like, gyms that I go to, like, where I kind of know everybody. But, like, I have faith in humanity to where, like, if they're going to get me, they're going to get me. If they want to get my wallet,
Starting point is 00:41:08 they're just breaking into my locker. I don't. Time out. Time out. I just, I just, I'm speechless. Like, there's a difference. What? I just leave my wallet out because if they're going to get me, first off, there is a shadowy cabal that's going to get somebody for a very specific reason. Leaving your wallet out for
Starting point is 00:41:27 anybody to steal is not smart. Okay, okay, here's a better example. Sometimes I don't lock the door in my house. Like if they're going to get me, they're going to just break the window, kick the door down. Yeah, but you live in California though. That's different. That's what you guys do out there. I mean, I did
Starting point is 00:41:43 on the East Coast too, you know? Leave your keys on the car. All I'm saying is if they're going to get me, they're going to get me, I'm going to live my life. No, but that's where I'm at, right? It's like, back to your question, Prime, like, a corporate network has more to lose, right? So like maybe there should be rules in place to not plug in random crap, but like at my house,
Starting point is 00:42:00 like I'm gonna do my best. I'm not gonna go on fricking Temu and buy a router from Temu for $14 and then like be surprised when I get hacked. I'm not gonna do that. But I'm also like, I'm gonna use the internet lamp. You know what I mean? Like I'm just whatever. I'm over the paranoia, just for myself personally.
Starting point is 00:42:14 So have good hygiene, like use like shower, no, but like use a password manager, right? Do all that stuff. Like VPNs when appropriate, but I mean like I'm not gonna like not use certain technologies out of fear anymore. Just how I've decided to do. What about a pain in the ass? Like to have to sign in via Bluetooth, send a password so it can sign into your network
Starting point is 00:42:34 just to turn on a lamp. Like that's not even about getting hacked. That's just an ass pain. I don't want to use that product like ever. Yeah. I don't care how great the sunrise and bird calls are and how. Do you hear that? That's actually a Southwest limit.
Starting point is 00:42:48 That's specific to a region. Like I don't even care. Like there's no money you can pay me to sign into a lamp to make it turn on. That one's at the far end of the spectrum. I agree. But like, you know, that's the counterpoint. That is the legitimate counterpoint because it's like what you're saying makes total sense.
Starting point is 00:43:03 Like if your threat model for why your stuff in your house isn't going to work is that like, you know, some government security agency is going to target you. It's like, forget it because they'll just come to your house with guns and take you. Right. They don't need to hack your lamp. It's not necessary, right? But the they in your sentence could just as easily just be incompetent people. at the actual manufacturer who cannot get the freaking lamp working.
Starting point is 00:43:28 And that's why I don't buy these things. Like, I know that if I bought a nest thermostat, I guarantee you the temperature in my house would be random all the time. And that is because I also seem to have uniquely bad luck with this sort of thing. So, you know, all the people who are like, what are you talking about? My Linux install work the first time, right? It's like, for me, it's always like, I always will hit the weird case for some reason. So if, you know, if I bought a Tesla, it would, the software would never be working.
Starting point is 00:43:54 It would be, and in fact, a friend of mine actually had this. He had a Tesla pretty early. And he had so many hilarious stories. He's like, yeah, the air conditioning just stopped working for several weeks because of some software update. But then it's sort of working again now. And he's like, oh, the sunroof stopped working because they updated the software and they changed, I guess new models have that plugged into a different physical port. And they stopped sending the things out the old port.
Starting point is 00:44:16 So now his sunroof is just closed permanently. Like, all of that, I'm like, yes, that is what would have happened to me if I bought a Tesla. I don't want any of that. I just want to know that, like, the sunroof is on a motor and unless the motor dies, it will open, right? Yeah. And it's less failure points. I will say, like, I don't like being on the cutting edge of new technology.
Starting point is 00:44:33 So, like, I own a Tesla, for example, right? And to that point, I recently went to go drive somewhere. I sat in my car and then on the Tesla, it said, there has been an electrical failure, call Tesla. And I'm like, what the hell is that? And so what I had to do is have them tow the Tesla 45 minutes away and fork out $5,000. And because it's Tesla, it's proprietary. So they can't tell me what happened. They're just like, yep, sorry, we fixed it.
Starting point is 00:44:54 Here you go. Give me your money. And so I feel you in terms of the ease of use on technology. Things that are well-solved problems, I will use newer technology. But like electric cars, I have a Tesla. I'm probably going to get rid of it for that reason. I don't like effectively being the market test monkey, right, for these bigger companies. And they're not serious about fault tolerance.
Starting point is 00:45:17 Like that's the thing. It's like this car was not. I mean, maybe the brakes were, which is good. But like the actual ability to drive the car was not, as far as I can tell, right? And use the features of the car. Yeah, I would imagine, like, internally there's, like, some hard real-time system for braking and stuff, but, like, the UX of turning on the car, maybe not so much. Maybe not, yeah.
Starting point is 00:45:38 Sorry, you go trash. No, I was going to say. You just go, okay. Prime is so mad. What is happening right now? I don't know. He's melting down. He's not mad.
Starting point is 00:45:46 He's, like, so disappointed in us. I just want to get this off my chest. So disappointed at us right now. All I want, I don't really care about smart houses or anything. All I want is the fridge that has the clear door so I can just look inside of it without opening the door. Oh, wait, what? You want a glass door, is that what you're asking? You know, the fridges where you can
Starting point is 00:46:07 see before you open the door. That's all I want. They have that. You can buy that. I know, but it's just glass. Why don't you just get triple-pane glass or something? Like, why do we have to have technology that involves a camera when we already have something you could see through called glass? No, that's what I'm saying. I just want a simple glass. He wants the glass one. He's agreeing with you. I don't want the technology. This isn't even about technology anymore.
Starting point is 00:46:29 I just want a pane of glass. I'm on your side. I'm on your side, friend. Thank you. Because when I hear everyone like, oh, the Nest thermostat is good, I'm like, bitch, is it? I set my AC to 69
Starting point is 00:46:42 and I never set it again. Same here. What the hell do you need a smart... Can I tell you? Can I tell you? Okay, hit us up with it. This is the most American thing you're ever going to hear about here.
Starting point is 00:46:58 I'm laying in bed. It's 9:30. My son, two years old, is asleep. I can see on the baby monitor that his room is a piping hot 74. Okay. I don't want to get up. I'm in bed. I brushed my teeth, my Invisalign's in, and you know what I do? I go to my phone.
Starting point is 00:47:10 I can change the temperature in the house from my bed. I don't have to get up, baby. You can try. You can try to change the temperature and everything. I have two days going. I'm set. What? I turned off the camera and only.
Starting point is 00:47:23 listened to audio. And for most of our kids' childhood, I didn't even have one of those. I slept near enough that I could hear them cry if they got really ornery, and that's it, because you don't need to listen to everything. Why are you filling your life with this shit?
Starting point is 00:47:39 Just stop it. You don't need those, dude, it's 74. It sucks, but that's totally normal temperature. But the moral of the story is I changed that shit from my phone, dude, it was great. I don't have to get up, so that's why I use it. That's not the moral of the story.
Starting point is 00:47:53 Prime's just more heated now. Wait, so I saw in chat, Prime, are you mad? And then the Google Cloud went down and my baby froze to death. But you know, that one day that I could change the temperature from bed, it was all worth it, right? Google login stopped working, so I couldn't heat, I couldn't cool my house. It was 89 today in the summer. Prime, are you mad because he bought a Tesla?
Starting point is 00:48:13 Me? No, I never bought a Tesla. Oh, Began said you bought a Tesla. That's why. I have a 2006, what's called Chevy flatbed pickup that has manual window roll. Dude, when I went up with Prime and Tees in L.A., they saw my car, and Prime was so disappointed. And he was like, and he was like, you need to get your priorities straight in life. What was the car?
Starting point is 00:48:35 I have a Porsche. And he just looked at the ground and was like, damn it, Trash. Get your life on track. It was so funny. I felt so bad. I was like, I guess I'll just drive home now. And I just, like, drove home in shame. Let the man have his Porsche.
Starting point is 00:48:53 My bad, dude. That is a crazy message. Holy shit. We lost the baby because it had an upstream dep on GCP. Yeah. Dude, but that's like actually, okay, that is like the world we live in now. Like, we live in a world where computers have crashed planes into the ground and everyone died. That's not, like, fictional.
Starting point is 00:49:12 That's terrifying. So, like, sorry. Like, it really is. People underestimate the risks from actually increasingly putting technology in places where they could have bad results. I agree the baby froze to death is probably an exaggeration in this case. But like things like that are not far, right? And it doesn't have to be about security exploits. It can just be incompetence.
Starting point is 00:49:34 That's the scary part. You guys have heard of the Toyota 2009 case, right? Yes, I'm deeply familiar. Was this the Prius acceleration case? It was like, what, 16 years ago? I don't know. So I don't know if it was just Priuses, Casey. I think there were a lot of Camrys too.
Starting point is 00:49:47 But basically like, so the way the ECU worked on Toyota is it had an RTOS in it. And inside of RTOSes you have, like, the thread control block for all the threads that run on the RTOS. Time out, time operating system? Real-time operating system. Sorry. Okay, okay, okay. Got it, got it, got it. But basically, the way that it was designed, if, like, a single bit got flipped in the TCB, the task that ran the brakes would not,
Starting point is 00:50:11 no, sorry, that ran the accelerator would not properly halt. So, like, the cars were literally just randomly flooring it, and there was no way to stop it. It was the way that they wrote the braking logic, because it was fly-by-wire braking, it wouldn't actually actuate the brakes. So people from like '09 to 2012, I think, were like just randomly getting caught doing like 120 miles an hour and flying into a brick wall because the ECU was like crashing non-gracefully. I didn't know that they ever root caused that. That's always they actually found they actually did.
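To make the single-bit-flip point concrete, here is a toy model, not the actual ECU code: task state lives in one word, and flipping one bit silently removes a task from the schedule. The mirrored-copy check is a generic defensive technique added here as an assumption, not something the speakers attribute to any vendor. The names `flip_bit`, `scheduled_tasks`, and `is_consistent`, along with the bit layout, are hypothetical.

```python
from typing import List

# Hypothetical layout: bit i set means task i is scheduled to run.
TASK_ENABLED_BITS = 0b0111


def flip_bit(word: int, bit: int) -> int:
    """Simulate memory corruption by inverting a single bit."""
    return word ^ (1 << bit)


def scheduled_tasks(word: int, task_count: int = 3) -> List[int]:
    """Return the ids of tasks whose enable bit is set."""
    return [t for t in range(task_count) if word & (1 << t)]


def is_consistent(word: int, mirror: int, width: int = 8) -> bool:
    """Check a word against its stored bitwise-inverted copy.

    If neither copy is corrupted, XOR-ing them yields all ones.
    """
    mask = (1 << width) - 1
    return (word ^ mirror) & mask == mask
```

Flipping bit 1 of `TASK_ENABLED_BITS` drops task 1 from `scheduled_tasks` without raising any error. Comparing the word against an inverted mirror taken before the flip is one way such corruption could be detected.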
Starting point is 00:50:40 Oh, I shouldn't say root cause it. I didn't know they ever published what it actually was. So it was it was closed privately. But Michael Barr, who was a consultant that was doing the analysis, has published. some of his slides of his findings. I think a lot of it is still under NDA with Toyota, but he basically like published what he thought it was before it went public,
Starting point is 00:51:00 public internal, right? And he said basically, yeah, like this thing is a landmine. If a bit flips here, here, here, or here, the ECU will just run on forever in undefined behavior. And that includes like, oops, the brakes don't work. Wow. Just recently, like just this year, or maybe last year, Tesla had it where
Starting point is 00:51:16 there was some sort of fire that happened and people couldn't unlock the door because there was a power failure in the Tesla and it just like the door shut and the people burned to death inside their own car. Fail open is very important. In fact, in fact, we can tie this back here at the, because we are at our hour mark, I think,
Starting point is 00:51:33 or getting close to it. We can tie this back to the original topic of the day. One of the things that Google included in their like, Mea culpa, we took down the entire internet, sorry about that, our bad, was at the end they were like, what are the steps we're going to take
Starting point is 00:51:49 to try and improve our process, you know. And one of the things they listed was this service should have failed open. Like instead of the default for when the quota system goes down being deny everything, maybe it's just approve everything so the services can keep running. I don't know how that necessarily, but maybe you could do something more cabined where it's like approve any reasonable, like approve anything that doesn't allocate new, too much new space. I don't know what they meant by that exactly. But they were like try to make these things fail open so that it won't just take down the internet if like the quota check fails.
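A minimal sketch of the fail-open versus fail-closed choice for a quota check, along the lines speculated about above. Google's statement is described as terse, so this is not their design: `check_quota`, the `fail_open` flag, and the `fail_open_cap` limit on how much may be approved while the quota system is unreachable are all assumptions for illustration.

```python
from typing import Callable


def check_quota(
    lookup: Callable[[str], int],
    project: str,
    requested: int,
    fail_open: bool,
    fail_open_cap: int = 10,
) -> bool:
    """Decide whether a request for `requested` units may proceed.

    `lookup` returns the remaining quota and may raise if the quota
    system is down.
    """
    try:
        remaining = lookup(project)
    except Exception:
        if not fail_open:
            # Fail closed: quota system down means deny everything.
            return False
        # Fail open, but bounded: approve only modest requests so an
        # outage cannot be used to allocate unlimited resources.
        return requested <= fail_open_cap
    return requested <= remaining


def broken_lookup(project: str) -> int:
    """Stand-in for a quota backend that is currently down."""
    raise TimeoutError("quota service unreachable")
```

With `broken_lookup`, `fail_open=False` denies a request for 5 units while `fail_open=True` approves it and still rejects a request for 50. The cap is the "approve anything reasonable" compromise between denying everything and approving everything.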
Starting point is 00:52:21 That was what I assumed that they meant, but I don't really know because, again, it was a very terse statement. And I only read it for this podcast, right? I wasn't, I wasn't like, I don't study this stuff, so I'm not sure. But so fail open is a pretty important principle. Like should something fail closed or should something fail open? And if you have like door locks on a car, ideally, if there could be people inside, you always want it to fail open. So that there's no way for like a car computer system to fail and then trap the people on the inside. You want to fail if the system for locking goes down, you want it to be open, right?
Starting point is 00:52:53 Because worst case, people steal your car or something like that. That's a lot better than burning alive. It is tough. It is tough. It is tough from like, I think a design perspective. It is. Because the control plane software that crashed was the authorization software, it's like, I agree with you. It should fail open.
Starting point is 00:53:09 But like, to your point, what does fail open mean in an authorization case, right? You can't just be like, yeah, go do it, whatever, because that's bad. But inversely, denying everything killed the internet for a minute. So what do you do? It's tough. Prime. Go ahead. I was about to say, you know, this fail open thing, you also have a problem with this, even with just the physical reality. Remember how Tesla was
Starting point is 00:53:28 claiming that their, whether or not it actually is bulletproof, their windows were bulletproof? What happens when you drive a Cybertruck into a pond and you need to break out of the window? I actually have a key chain. Right. Yeah, yeah, I have a key utility that breaks windows, but normal windows, they just, you hit it once and they explode
Starting point is 00:53:44 into a thousand little pieces. What do you do when you hit it? It just goes, and you're like, oh! Yeah. Well, this is like the locks, right? It's like one of the reasons that you want a lock to always be possible to do physically, if you could, right, is for this reason.
Starting point is 00:54:01 It's because if you think about what fail open means, well, if a mechanical element has to push the lock up and down and it's not accessible to humans, it's almost impossible to build a fail-open version of that because if the electrical system fails, the locks cannot actuate. So there's really no fail-open possibility.
Starting point is 00:54:17 So, like, again, like just smart mechanical design is like your car should have a way that you, with your hand, can unlock the door from the inside if you needed to, right? And like anything short of that is just kind of bad safety design, right? And the same is true for internet services to the extent that you can do them. You want to be thinking about that the same way. It doesn't quite analogize exactly, but it's similar, right? That is interesting because, like, the Tesla, you know, you have, the way that the doors work is you have a button that you push that electrically pulls the window down and opens the door for you. But there is a physical, I'm pretty sure it's pure mechanical knob where you can rip that and it pops the door open. It's bad for the windows obviously, but that's the fail open scenario.
Starting point is 00:54:55 It's like, you know, you are in a lake, the electronics are dead, pull this lever, the door will open. So I'm interested in that case if they like, if they knew that, if it was a rental, maybe that version didn't have that model. Maybe it's not mechanical. Yeah, like, how did they burn to death without being able to do that? Sorry. Yeah. That's just the things that I heard. I don't know how it exactly works. I didn't do any research.
Starting point is 00:55:15 Window-wise, it's worse because like to your point, like you can't open the door if you fall into a lake because the water pressure is too high, so you need to be able to open the window. And in that case, I don't know what their answer is to that. Break it, I guess, or I don't know. That's a terrifying scenario. That is terrifying. Prime, I know you love Rust. Yeah. Do you think that the GCP outage would have happened had this binary been written in Rust? You know, that's actually a really good question. I think that we would have still had some sort of weird state because at the end of the day, how do you have a state, let's just say the YAML file produces something incorrectly? What happens to a GCP service if they update the policy and there's
Starting point is 00:55:55 nothing there in a thing that's supposed to be there? Like someone would have probably programmed the positive case and then what happens at the negative case? It just wouldn't deploy. Like all things would stop anyways. I'm not really sure like how this could have saved it much other than you would have had a pre-planned. Hey, it's going to explode. Maybe you could have identified it earlier and maybe the outage would have been less. I mean, I'm all for the idea of unique pointer slash smart, like smart pointers where you can't access things without some sort of like, hey, yeah, this thing's not nil.
Starting point is 00:56:25 Like I like monads, yeah, dude, get me in. Yeah, let me have a piece of that monad. I'm all about that. But at the same time, I don't know, because the real question, I guess the question I'd have to follow up with is that it's not whether or not Rust would have made this possible or not possible. It is, if it were written in Rust, would this software ever have existed beyond somebody's idea or not? And that's really the question I have to ask is, can it actually exist in production?
Starting point is 00:56:52 I'm not too sure. Right. We don't have any security exploits in Rust because the software did not get written. Right, because it never got shipped. It's never in production. Yeah, that's a thing, right? So whenever I do videos, I'd say at the end of the video about memory exploits, like, would Rust have saved this? I think the answer to this one is sort of, right?
Starting point is 00:57:10 Like, because it wasn't necessarily like a memory corruption vulnerability or a memory corruption issue. It was more just like you dereferenced a bad address. I think the Rust programming model would have helped with it. But like, you can do an unwrap on an Option that's None, right, and then like you crash. So it's like there are still human factors in the code that could have caused it to be the same DoS condition if, you know, the ha ha object in the YAML file wasn't there, right? They would have made it, I think, less... What's that? They would have had an expect, right? Like, because they'd been like, you cannot have a license, or whatever it was,
Starting point is 00:57:44 without these fields, and they probably would have just expected. Sure. Which would have caused, it's like no different than a nil pointer crash, because you wouldn't have known about it until runtime. So I assume it's the same thing. I don't think Rust would have helped at all here. Yeah, it would have been nice if they had said
Starting point is 00:58:00 what language it was written in. Show us a little bit of the code. Like, told us if this was Gemini, right? This could have been like somebody vibe coded this, and like, hey man, I'm pretty sure the AI said it was great, you know? Like, but there was no information. So we don't really know, like, human factors-wise. We have no idea what happened here, really, because they were very, very, like.
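The unwrap and expect point from a moment ago can be sketched in Rust. This is a minimal illustration, not the actual service: the `Policy` struct, its `quota_limit` field, and both helper functions are hypothetical, since the real schema and language were never published. It shows that `expect` on a `None` still crashes at runtime, while converting the `Option` into a `Result` lets the caller reject the bad policy instead.

```rust
// Hypothetical policy record; the real schema was not published.
#[derive(Debug)]
struct Policy {
    name: String,
    quota_limit: Option<u64>, // None models the field that was left blank
}

// Panics at runtime when the field is missing, much like a nil dereference:
// the compiler accepts this, so the problem only shows up on bad input.
fn limit_or_crash(p: &Policy) -> u64 {
    p.quota_limit.expect("policy must have a quota limit")
}

// Turns the missing field into an error the caller has to handle,
// so one bad policy can be rejected without taking the process down.
fn limit_checked(p: &Policy) -> Result<u64, String> {
    p.quota_limit
        .ok_or_else(|| format!("policy '{}' is missing quota_limit", p.name))
}

fn main() {
    let good = Policy { name: "good".to_string(), quota_limit: Some(100) };
    let bad = Policy { name: "bad".to_string(), quota_limit: None };

    println!("{}", limit_or_crash(&good));
    println!("{:?}", limit_checked(&good));
    println!("{:?}", limit_checked(&bad));
    // Calling limit_or_crash(&bad) here would panic and end the program.
}
```

Either style compiles. The difference is whether the author chose to handle the `None` case or to assert it away, which is the human-factors point being made.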
Starting point is 00:58:18 It was actually written in Rust, is my guess. And they actually don't want to release that information that it was written in Rust. By Gemini. It was written in Rust by Gemini. As part of the greater agenda of the Rust Industrial Complex, you sheep will need to wake up. Put up the board. Where's the board with all the connections between the Rust Foundation and the, like, Where is the diagram?
Starting point is 00:58:41 We need the little conspiracy diagram. All right. Are we done? Are we done? Are we, have we wrapped? I think we're wrapped, right? We got tons of great content in here.
Starting point is 00:58:52 No one could be unhappy with that. No one could be unhappy with that episode. Chat, you just, that was a golden episode. What more do you want from us? Low Level, Low Level, everybody. Low Level, everybody. Terminal coffee and him living.
