The Changelog: Software Development, Open Source - The BSOD CrowdStrikes back (Friends)
Episode Date: July 26, 2024
Robert Ross joins us in CrowdStrike's wake to dissect the largest outage in the history of information technology... and what it means for the future of the (software) world....
    
                                         Welcome to Changelog and Friends, a weekly talk show about Citrix thin clients.
                                         
                                         Big thanks to our partners at Fly.io, the home of changelog.com. Over 3 million
                                         
                                         apps have launched on Fly. You can too. Learn more at Fly.io. Okay, let's talk.
                                         
                                         Hey friends, I'm here with a new friend of mine, Shane Harter, the founder of Cronitor.
                                         
                                         Check him out, cronitor.io.
                                         
                                         It lets you keep tabs on your cron jobs, Linux, Kubernetes, Apache Airflow, Sidekiq, and more.
                                         
                                         With over 12 open source integrations, you can instrument all your jobs no matter where you're running them.
                                         
                                         So, Shane, for me, I'm using Linux and Linux cron jobs are by far
                                         
    
                                         the most popular in my opinion, right? But there's so many other cron-like things, Kubernetes,
                                         
                                         Airflow, Sidekiq. Help me understand the full spectrum of background jobs and cron jobs
                                         
                                         beyond Linux cron. Yeah, Linux cron jobs are massively popular. They are still, 40 years later,
                                         
                                         the tool that most developers will go to first when they need to start scheduling something in
                                         
                                         the background. But when you get into a team environment or an enterprise environment,
                                         
                                         there is a lot of other constraints at play and there's other considerations. And whether it's
                                         
                                         simply redundancy that you're not going to get from crontab itself or, you know, more complex orchestration stories like you can get with Airflow,
                                         
                                         we see companies eventually outgrowing cron.
                                         
    
                                         And what we wanted to be sure of is that, first of all, like migrating from Cron to anything else is a complicated thing.
                                         
                                         So we wanted to give you tools to help you monitor that transition and make sure your jobs are working well as you do that migration. And then second, we wanted to give you a way to unify all these different job
                                         
                                         platforms, because seldom do you have just platform A and you migrate cleanly to platform B.
                                         
                                         Probably, in a real-world scenario, you're running both side by side for a while. You don't
                                         
                                         want to have different monitoring tools or different monitoring strategies for every different platform that you deploy. So our goal is anywhere you're
                                         
                                         running a background job, you can use Cronitor. The number one way that we ensured that was
                                         
                                         possible is by having like a really simple API that you can just use with an HTTP request yourself,
                                         
                                         which is pretty abnormal for monitoring tools. But that works in a lot of cases. But to make it
                                         
    
                                         easier, for every popular job platform out there, like Linux cron jobs, Kubernetes CronJobs, Windows,
                                         
                                         Sidekiq, Airflow, you name it, we have a Cronitor SDK that you can install that will
                                         
                                         automatically configure your monitoring, run in the background and sync all your jobs with
                                         
                                         Cronitor the same way your Linux cron jobs will be synced. Okay, friends, join more than 50,000 developers using Cronitor.
                                         
                                         I'm one of them.
                                         
                                         You can start for free and they have a pay-as-you-grow pricing plan.
                                         
                                         Setup is too easy with more than 20 SDKs.
                                         
                                         Check them out at cronitor.io.
                                         
    
                                         That's C-R-O-N-I-T-O-R dot I-O.
                                         
                                         Again, cronitor.io.
                                         
                                         Well, friends, we're here to discuss an outage, a disaster that made history.
                                         
                                         And we have a good friend of ours here, Robert Ross, the founder and CEO of FireHydrant, to help us dig into what exactly happened and, maybe more pertinently, how to prevent incidents at large or just deal with them.
                                         
                                         What do you think, Robert?
                                         
                                         Well, I'll do my best without wearing a monocle and thinking about exactly how this went down.
                                         
                                         But yeah, I've read every news source about it,
                                         
                                         I think, at this point.
                                         
    
                                         I think everyone's heard about it, so excited to dive in.
                                         
                                         What are you guys talking about?
                                         
                                         I'm not even sure what we're referring to.
                                         
                                         Yeah, right, Jared.
                                         
                                         Did something happen?
                                         
                                         You know what I kept thinking every time I read CrowdStrike?
                                         
                                         I kept thinking of AC/DC's "Thunderstruck."
                                         
                                         I couldn't quite pull the pun across
                                         
    
                                         because it's CrowdStrike, Thunderstruck,
                                         
                                         but that song has been
                                         
                                         playing in my head
                                         
                                         probably before this happened, but it just happened to align.
                                         
                                         I don't know. I'm an AC/DC fan.
                                         
                                         What can I say? The developer may have been
                                         
                                         listening to that when they wrote the code.
                                         
                                         They might have been. That could be why
                                         
    
                                         you're there, potentially.
                                         
                                         I like to code to some AC/DC.
                                         
                                         Yeah, for sure.
                                         
                                         Especially that song.
                                         
                                         That'll pump you up, man.
                                         
                                         For sure.
                                         
                                         I code faster when that type of music is playing.
                                         
    
                                         That's for sure.
                                         
                                         I'm sure most folks are, to some degree,
                                         
                                         primed on what happened.
                                         
                                         But who wants to nominate themselves to explain at least a primer of what happened?
                                         
                                         I think you did it pretty well in News, Jared, but you also covered some other sides of it too.
                                         
                                         But what do you think?
                                         
                                         Do you want to handle it or do you want me to handle it?
                                         
                                         Well, there was a giant outage on Friday due to CrowdStrike pushing a bad update to a billion machines.
                                         
    
                                         I'm not sure of the exact number, but basically every Windows-based company,
                                         
                                         organization around the world
                                         
                                         was affected probably somehow.
                                         
                                         Many things were down.
                                         
                                         The banking industry got hit hard.
                                         
                                         Hospitals got hit hard.
                                         
                                         Airlines got hit hard,
                                         
                                         except for Southwest,
                                         
    
                                         which I discussed in news.
                                         
                                         The reasoning, by the way,
                                         
                                         quick update on that,
                                         
                                         I put in news was that they are allegedly still running
                                         
                                         old versions of Windows 95, 3.1.
                                         
                                         Could be true.
                                         
                                         Might not be true.
                                         
                                         Those are actually rumors.
                                         
    
                                         I thought that was a joke when I saw that.
                                         
                                         Maybe that's true, actually.
                                         
                                         It kind of was a...
                                         
                                         It duped Jared.
                                         
                                         It got him.
                                         
                                         It might be fake news.
                                         
                                         I updated our Changelog newsletter to make sure that it's accurate now
                                         
                                         because I thought it was funny, too, which is why I put it in there.
                                         
    
                                         And it's true that Southwest was unaffected.
                                         
                                         And of course, Southwest famously was down, was it two years ago?
                                         
                                         For 10 days.
                                         
                                         Yeah.
                                         
                                         Because they couldn't.
                                         
                                         The holiday outage.
                                         
                                         Yeah, the holiday outage.
                                         
                                         And back then, there was reasonings because they were on really old
                                         
    
                                         versions of Windows and they couldn't do stuff.
                                         
                                         And so I think those two stories combined to say perhaps their old versions of Windows have actually saved them this time.
                                         
                                         But allegedly, not necessarily the case.
                                         
                                         But funny either way.
                                         
                                         Yeah, man.
                                         
                                         I guess the way I would summarize it is the blue screen of death made an epic comeback and took over the world.
                                         
                                         Total world domination last week.
                                         
                                         Wouldn't you say that this affected
                                         
    
                                         CrowdStrike customers?
                                         
                                         Not just simply
                                         
                                         Windows users.
                                         
                                         Yeah, but I guess, here's what's weird about it.
                                         
                                         I had never even heard of CrowdStrike,
                                         
                                         but it sounds like, who's not a CrowdStrike customer?
                                         
                                         Robert, were you familiar with CrowdStrike
                                         
                                         prior to this?
                                         
    
                                         Yeah, we use CrowdStrike at FireHydrant.
                                         
                                         Okay, so what do you use them for?
                                         
                                         Endpoint security. We run their Falcon
                                         
                                         daemon on all of the employee laptops. We don't use it for the
                                         
                                         services we provide, but it is running on every
                                         
                                         FireHydrant laptop.
                                         
                                         And these laptops are Windows, Linux, macOS?
                                         
                                         All Mac, yeah. So we weren't impacted by it, thankfully.
                                         
    
                                         Just the Windows CrowdStrike world.
                                         
                                         Yeah, that's what it seems like.
                                         
                                         And it seems like there was a change that was in the new sensor
                                         
                                         that runs silently.
                                         
                                         I think a lot of people don't even know
                                         
                                         that they have CrowdStrike on their laptop.
                                         
                                         And that's by design, right?
                                         
                                         I would say a good product.
                                         
    
                                         You don't even know it's there
                                         
                                         until it gives you a blue screen of death.
                                         
                                         It's a bad way to find out about it,
                                         
                                         but before then, brilliant.
                                         
                                         It's like you had a bunch of stuff in your walls
                                         
                                         and then eventually it falls out of the wall and you're like, oh, that's been rotting
                                         
                                         behind there for a long time. I think that change is always
                                         
                                         the biggest cause of incidents. We see it all the time. Google
                                         
    
                                         even has a stat that 80% of their incidents are caused by a change.
                                         
                                         So it's not exactly shocking that a change caused this.
                                         
                                         I think what's shocking to people is the scale of the incident.
                                         
                                         And when you had AC/DC's "Thunderstruck" playing in your head,
                                         
                                         I kind of had Jeff Goldblum in my head where he's like,
                                         
                                         flap your wings and a hurricane happens across the ocean.
                                         
                                         That's kind of what it felt like.
                                         
                                         The butterfly effect.
                                         
    
                                         Yeah, the butterfly effect, exactly.
                                         
                                         That's kind of what it felt like to me.
                                         
                                         A very simple attempt to access memory that wasn't there
                                         
                                         grounded flights, and it still has flights grounded.
                                         
                                         Delta has canceled hundreds of flights
                                         
                                         every single day for the last five days.
                                         
                                         And I think we're just going to keep hearing
                                         
                                         about problems for the next few weeks from this thing.
                                         
    
                                         Yeah, it would be interesting if somebody could somehow,
                                         
                                         some way come up with a global economic impact of this event.
                                         
                                         But it has to be measured in billions, maybe trillions of dollars.
                                         
                                         I think so.
                                         
                                         We had employees and teammates at FireHydrant
                                         
                                         that had to cancel trips.
                                         
                                         I had friends that were at the airport
                                         
                                         that had to cancel their weekend plans
                                         
    
                                         that they were flying somewhere.
                                         
                                         So it wasn't only the places like airports and hospitals that were impacted.
                                         
                                         It was local economies that were impacted by this as well.
                                         
                                         Friends going to the Dominican Republic that couldn't go.
                                         
                                         And it's hard to reschedule those types of plans.
                                         
                                         So it's kind of, you know, probably not coming back, that loss.
                                         
                                         That money, yeah.
                                         
                                         Well, not to mention just labor.
                                         
    
                                         Pure labor costs of mitigation or remediation because this, unfortunately, does require, I think,
                                         
                                         direct contact with each machine affected,
                                         
                                         meaning you can't just remotely reboot
                                         
                                         these machines, is what I read. You have to actually go touch each machine and, I don't know,
                                         
                                         boot into safe mode, or maybe you know, Robert or Adam, exactly the process. But it's relatively
                                         
                                         straightforward, unless you have an encrypted hard drive, then it's slightly less straightforward.
                                         
                                         But we're talking about people walking around data centers,
                                         
                                         going to each computer, or walking around hospitals,
                                         
    
                                         going to each computer.
                                         
                                         I mean, the amount of highly paid individuals effectively doing a mass reboot this week
                                         
                                         is probably measured in large numbers.
                                         
                                         Yeah, and even parts of the country in the US
                                         
                                         that had issues probably don't have, you know, a big workforce capable of doing this work.
                                         
                                         You think of a, you know, a giant airline, they have a massive IT team that can go and do that labor and that work.
                                         
                                         But Alaska, in rural Alaska, 911 wasn't working.
                                         
                                         People couldn't call 911.
                                         
    
                                         Really?
                                         
                                         And at one point, even Portland's mayor declared a state of emergency on Friday.
                                         
                                         And there's parts of the impact area
                                         
                                         that just don't have a response unit
                                         
                                         that can go solve those problems.
                                         
                                         So I do think we're going to keep hearing about it.
                                         
                                         There's going to be inquiries by the government. I think I saw today
                                         
                                         that CrowdStrike's CEO is going to be
                                         
    
                                         called upon by Congress.
                                         
                                         That was news of like 16
                                         
                                         hours ago or so. AP had that out there.
                                         
                                         The Washington Post had it out there.
                                         
                                         House committee calls on CrowdStrike CEO
                                         
                                         to testify on
                                         
                                         the global outage. And not surprising.
                                         
                                         And he went on
                                         
    
                                         air pretty quickly.
                                         
                                         It was like, this is our fault.
                                         
                                         We're fixing it.
                                         
                                         And I have to commend the confidence
                                         
                                         to just go and say, own it that quickly.
                                         
                                         But, you know, I have questions.
                                         
                                         I think everyone does.
                                         
                                         Even my aunts and uncles in their late 60s
                                         
    
                                         who don't quite understand this type of world
                                         
                                         like we all do, were asking me questions. I
                                         
                                         mean, everyone felt this, I think, in some way, shape, or form. Well, Windows only. There's a
                                         
                                         lot of details, so I caught up with Dave Plummer. That's literally his name. He is on YouTube; he
                                         
                                         runs a channel called Dave's Garage. He's an ex, from what I understand, an ex-Microsoft
                                         
                                         operating system developer. And so he knows a lot
                                         
                                         about this stuff. And I will link it in the show notes, but
                                         
                                         he was my source of literally what really happened
                                         
    
                                         on the inside. There's also The Code Report from,
                                         
                                         I think it's Fireside or Fire-something, Fireship,
                                         
                                         on YouTube that also summarizes some things that I pay attention to as well as part of like
                                         
                                         researching this topic. So there's some theories that this is just simply bad quality code.
                                         
                                         This could be sabotage or this could be planned. Now those are obviously theories,
                                         
                                         not truth at this point. But I think it's important to look at, you know, Robert, you said
                                         
                                         change is what affects things and what causes incidents. We're not sure when exactly this code
                                         
                                         got pushed, but what happened was, or at least from my understanding, and thanks to Dave for
                                         
    
                                         explaining this, is that this software, Falcon, as
                                         
                                         you all run as well, it runs in what they call kernel mode. And stop me if you've heard this one
                                         
                                         before, but there's two lands to live in, basically, in the operating world: you've got user mode and
                                         
                                         you got kernel mode. And kernel mode has, you know, higher priority. And when an application crashes
                                         
                                         in kernel mode it crashes a system and it does it by design because it's protecting the system. It's better to crash than to actually boot up.
                                         
                                         Something else worse could happen if that was the case.
                                         
                                         And CrowdStrike, their software called Falcon, lives
                                         
                                         and runs in kernel mode. And that's, I guess, by design.
                                         
    
                                         I'm not sure why it has to. And then there's this labs that
                                         
                                         Microsoft has called
                                         
                                         Windows Hardware Quality Labs that drivers that live in kernel mode or run in kernel mode that
                                         
                                         are third party, they have to go through this process to get deployed. And so it gets tested
                                         
                                         by Microsoft through this WHQL labs system to be able to be deployed, to get signed and used by the operating system, etc. But the way
                                         
                                         they bypassed this was because, in Dave's words, they want to be agile, ambitious,
                                         
                                         and aggressive to get the latest protection. And so as a way to deploy this latest protection
                                         
                                         more fastly to Windows users, and I guess it's not the case for Mac or other systems because it didn't happen to you all, Robert,
                                         
    
                                         is that they have these things called definition files that the kernel reads from.
                                         
                                         So when the kernel wakes up, if it's a new boot, it wakes up, it enumerates a folder,
                                         
                                         and looks for this other code, this dynamic code that gets deployed outside the kernel delivery system.
                                         
                                         So essentially you have unsigned code that runs in kernel mode.
                                         
                                         That's bad stuff.
                                         
                                         From what I understand, thanks to Dave,
                                         
                                         that's a rough version of the mechanics of how this works on the Windows system.
                                         
                                         I think it's a game of trade-offs, and that's a hard thing to feel now, right?
                                         
    
                                         Like people's flights got canceled, you know, hospital surgeries got canceled.
                                         
                                         Like it's a big deal.
                                         
                                         But at the end of the day, do we, it's easy to say this was the worst thing that could
                                         
                                         happen instead of the sum of the parts of all the things that were maybe prevented in
                                         
                                         the past.
                                         
                                         And we just have no idea.
                                         
                                         I don't even think that CrowdStrike would probably know.
                                         
                                         But how many things were via CrowdStrike
                                         
    
                                         or another locking system, security system running,
                                         
                                         have prevented mass credit card theft
                                         
                                         or identity theft or other things going on?
                                         
                                         It's hard to say.
                                         
                                         No one's going to buy that now, though.
                                         
                                         Because no one's going
                                         
                                         to look at a trade-off right now. They're going to be like, my flight got canceled, I don't care
                                         
                                         what my trade-offs were in the past. Right. Now, the other thing that I think that
                                         
    
                                         is going to be, we're just going to have to see if CrowdStrike posts a public retrospective. But
                                         
                                         this code, that is, you know,
                                         
                                         the crime scene of this code base,
                                         
                                         that could be in there, we don't know, for 10 years.
                                         
                                         we have no idea
                                         
                                         and another piece of code was deployed
                                         
                                         10 miles away or so they thought
                                         
                                         from that code base or that line of code
                                         
    
                                         calling that memory address, and then that caused it, right?
                                         
                                         I think it's one of the challenges with building software now. It's like we were kind of saying
                                         
                                         earlier, the butterfly effect. Like, software is so complex now and so vast that you can deploy a
                                         
                                         change in what you think is a different country of your code base, but it impacts across the ocean,
                                         
                                         somewhere else. And I would wager that's what happened here. I would wager there's just
                                         
                                         no way that CrowdStrike doesn't have a crazy test suite, that Microsoft is probably running
                                         
                                         tests for them. Because it does run in kernel mode, they have to get that approval, it sounds like.
                                         
                                         I just have a really hard time believing that this very simple line of code
                                         
    
                                         just got deployed and took everything down.
                                         
                                         I could be totally wrong and totally off-base.
                                         
                                         I have no idea.
                                         
                                         But whenever I've taken down production, and it's been many times,
                                         
                                         it wasn't explicitly because of the one piece of code that I wrote.
                                         
                                         Because I tested that.
                                         
                                         I put that through its paces.
                                         
                                         I wrote unit tests.
                                         
    
                                         It was the combination of that
                                         
                                         and something else. When you add chlorine and vinegar, what's that potent combination they
                                         
                                         say never to do because it's super toxic all of a sudden? That's what it feels like happened to me
                                         
                                         in this outage specifically. Yeah. I mean, for me, it seems like some of our
                                         
                                         most ingrained premonitions
                                         
                                         coming to fruition
                                         
                                         in terms of
                                         
                                         being down in the mucky
                                         
    
                                         muck as a developer.
                                         
                                         We just know, and I've said many times,
                                         
                                         it just feels like we're building a house of cards.
                                         
                                         Because it's so complicated.
                                         
                                         And it's so intertwined.
                                         
                                         And it's effectively, especially with web development,
                                         
                                         we're talking about a worldwide distributed system
                                         
                                         which has things that happen.
                                         
    
                                         Of course, there's an explanation in retrospect for all of these things,
                                         
                                         but when you build a house of cards, eventually it's going to just topple.
                                         
                                         And sometimes it topples in ways that you don't know why or when or how
                                         
                                         and what will be the downstream effects.
                                         
                                         And, of course, this isn't web development in this case.
                                         
                                         This is operating system code.
                                         
                                         But still, network to machines, being able to remotely update.
                                         
                                         Every once in a while, just a house of cards topples,
                                         
    
                                         and we have to start over to a certain extent, rethink things,
                                         
                                         try to adjust and clean up the mess and move forward. I mean, I even think of, for every person
                                         
                                         listening to this, like, think about the mechanics of what is going on as you're listening to this
                                         
                                         podcast. If you're using headphones right now, your headphones have software in them that is going to a Bluetooth chip that has
                                         
                                         software on it, that's part of an operating system that's translating that to go over the air to a
                                         
                                         cell phone tower that's running software, that's going to a network switch that Cisco probably
                                         
                                         built that's running software, and it just goes more, and then eventually hits an Apple Music server or some Spotify server that goes
                                         
                                         through a CDN, that's software. It's just software all the way down. I mean, it is thousands of touch
                                         
     
                                         points of software for you to hear this stupid analogy that I'm making. Like, that's like
                                         
                                         that and you had to go through that grueling exercise through that much software.
                                         
                                         And that's just the world now.
                                         
                                         That's the way it is.
                                         
                                         It's not going back.
                                         
                                         We can't unwind this anytime soon.
                                         
                                         Right.
                                         
                                         That's why I said sometimes you just have to clean up the mess and then obviously do a retrospective.
                                         
    
                                         And one thing we can do is make sure
                                         
                                         that particular thing doesn't happen ever again.
                                         
                                         But that's just one of the things.
                                         
                                         That's what regression tests are for.
                                         
                                         I'm not going to let this particular bug
                                         
                                         bite me and my billion customers again.
                                         
                                         And I'm sure CrowdStrike,
                                         
                                         after they go through the PR process,
                                         
    
                                         I mean, not pull requests, but public relations,
                                         
                                         because their stock was down 23%, I think they have.
                                         
                                         I mean they are massively hurt by this.
                                         
                                         Their reputation is just in the mud.
                                         
                                         So they're going to go through all that and maybe there'll be people fired.
                                         
                                         Who knows what's going to happen.
                                         
                                         But then hopefully they sit down and say, okay, let's do our analysis.
                                         
    
                                         Let's do our postmortem. Let's figure out how we can make this particular aspect of our business not hurt people
                                         
                                         again. But that's just one thing. As similar, it goes back to the conversation of information
                                         
                                         security that we're having with Jacob DePriest from GitHub's security team. The challenge of
                                         
                                         the defender is you have to defend the entire system and the attacker only has to find one
                                         
                                         hole. Bugs work the same way, only it's just accidental and not malicious, you know?
                                         
                                         And so in that conversation, I said, I feel like to a certain extent, resistance is futile.
                                         
                                         I mean, the defender does all they can, but you're still going to have the attacker succeed
                                         
                                         sometimes.
                                         
    
                                         And it seems like with software systems, the bugs are going to be there.
                                         
                                         I mean, we haven't found a way of eliminating all bugs.
                                         
                                         And so how do we build around, fortify, defense in depth, react, respond?
                                         
                                         I don't know.
                                         
                                         I think in one case, this is an advertisement for heterogeneous systems.
                                         
                                         What's the word?
                                         
                                         Not a monoculture,
                                         
                                         just like in biological systems, right?
                                         
    
                                         Like you want to have... Yeah, regenerative farming
                                         
                                         where you have, you know,
                                         
                                         you plant two crops
                                         
                                         in the same plot of land
                                         
                                         and they help each other.
                                         
                                         Yeah, exactly.
                                         
                                         Just diversity inside
                                         
                                         of our software systems
                                         
    
                                         so that when we have a problem
                                         
                                         in one particular system,
                                         
                                         aka Windows machines running CrowdStrike,
                                         
                                         that's not a worldwide global outage.
                                         
                                         That's like a regional, you know,
                                         
                                         20% of the internet was down today, guys,
                                         
                                         versus what it actually, like that whole,
                                         
                                         let's have multiple operating systems,
                                         
    
                                         not just worldwide, but even in our own organizations which can be
                                         
                                         a huge burden, a huge pain
                                         
                                         and we tend to want to normalize
                                         
                                         and streamline and
                                         
                                         formalize a specific
                                         
                                         stack of software because it's easier
                                         
                                         to maintain and manage
                                         
                                         but then you just are vulnerable to
                                         
    
                                         attacks at like a 100%
                                         
                                         scale of your organization
                                         
                                         so I mean, I think
                                         
                                         one takeaway we can have is like, hey, I'm
                                         
                                         really happy I'm running macOS today.
                                         
                                         Now maybe tomorrow, all the Windows users
                                         
                                         will be happy that they're running Windows and
                                         
                                         not macOS because something will attack macOS.
                                         
    
                                         But the Linux users are having the best
                                         
                                         time of their life right now.
                                         
                                         Oh yeah, the memes are
                                         
                                         strong right now.
                                         
                                         What is the year of the Linux desktop, as you know, Jared?
                                         
                                         I've heard that the last 15 years of my life, and it has not come to fruition.
                                         
                                         Here's the through line to all this, though.
                                         
                                         The through line is massively deployed software.
                                         
    
                                         That's it.
                                         
                                         Or massively dependent upon software in a different scenario, like a dependency.
                                         
                                         It's that this was everywhere, right?
                                         
                                         And then I think there's very specifically to this scenario, there are some layers that may have been not thought so well through. in his description of how they bypassed the WHQL, which is a hardware labs quality system
                                         
                                         that is there to sign these drivers to run in kernel mode.
                                         
                                         Because it's so, like what runs in kernel mode
                                         
                                         is so limited because of its power.
                                         
    
                                         And here they are able to run there,
                                         
                                         which is okay, fine.
                                         
                                         If you have to then, and Windows and that team blesses you
                                         
                                         and they put you in the WHQL system to have this signed certificate to say, okay, your driver is blessed. We've tested it to the absolute best of our knowledge. We put it through all the paces. at scale and they be the driver essentially is an engine that runs code that has not been signed
                                         
                                         or not gone through these paces. That alone, there, is like, I'd imagine, Robert, as you look at what
                                         
                                         you do and how you help folks look at incidents, it's like, when we look at what we've done here,
                                         
                                         we have to examine the system we built. Maybe it's, you know, anti the Windows way to have this sidecar, this folder of definitions that the driver enumerates over and sucks in, and the driver essentially is an engine that runs unsigned code.
                                         
                                         That could be true if Dave's accurate.
                                         
    
                                         And if that is true, sense, but like by the relationship formed between CrowdStrike,
                                         
                                         the Falcon software and the Windows team that has WHQL
                                         
                                         to allow this to live in kernel land and not user land.
                                         
                                         That's one thing.
                                         
                                         And then you got just the ability to deploy at scale
                                         
                                         and for the system to do what it should have done.
                                         
                                         So, you know, when an app crashes,
                                         
                                         an app crashes. When the kernel mode
                                         
    
                                         crashes, the system crashes.
                                         
                                         And it crashes because it has to.
                                         
                                         Like, this is how, it did
                                         
                                         what it should have done. There was a bug
                                         
                                         in the kernel driver that when it booted
                                         
                                         up, it did, for whatever reason,
                                         
                                         cause an exception at the kernel level.
                                         
                                         And when the kernel crashes, the whole system crashes.
                                         
    
                                         And that's by design. So effectively it was preventative on purpose
                                         
                                         but by a bug or a faulty code.
                                         
                                         Yeah. I think as software engineers,
                                         
                                         and I feel qualified to say this because it is a criticism,
                                         
                                         is that we love thinking that we, you know, have
                                         
                                         invented new things. And every once in a while, you just kind of have to take a step back
                                         
                                         and think, oh, actually, we've gone through all this process without software. We've already done
                                         
                                         it. And the example I use all the time is, like, buildings and building codes and structures.
                                         
    
                                         And when was the last time you heard of a building catching on fire?
                                         
                                         I live in New York City.
                                         
                                         There's a lot of opportunities for buildings to catch on fire.
                                         
                                         And it does happen.
                                         
                                         It does happen.
                                         
                                         But not nearly at the rate that it used to happen.
                                         
                                         If you think about the London fire, if you think about the San Francisco fire, like all
                                         
                                         of these events that occurred really just triggered new ways of thinking because of
                                         
    
                                         catastrophe.
                                         
                                         And this will do the same thing.
                                         
                                         We've been perfectly fine for however long this sidecar technology has been running in production. We've been perfectly fine with that. And then now we're not, right? Or maybe
                                         
                                         now, maybe now we're not. The same thing has happened. I mean, we have sprinklers in our
                                         
                                         buildings because of fires. We didn't put sprinklers there as a preventative measure.
                                         
                                         We had to have a lot of fires before we said, maybe we should have sprinklers in buildings, or maybe we
                                         
                                         should put concrete as the center of the building so it doesn't fall when it becomes structurally
                                         
                                         unsound. And because of the hundreds of years that we've had of retrospectives and all of these
                                         
    
                                         learnings from these types of things, we have safe buildings now. Same thing with cars. You
                                         
                                         were saying the kernel panic is a preventative measure. Cars have the same thing.
                                         
                                         They have crumple zones to protect the
                                         
                                         driver. It's designed to collapse in certain ways. And we're getting to that point with software
                                         
                                         more and more. I think the challenge we have for software is it's much easier to do new things
                                         
                                         with software than it is to do new things with
                                         
                                         cars. I can go write a crazy random piece of code and put it in production today to all the
                                         
                                         FireHydrant customers. I swear I won't do that, but I could do that and it would cost nothing.
                                         
    
                                         There'd be virtually no labor. But with these other systems, it's expensive to do new things
                                         
                                         like that. So the problem, I think, is we're kind of getting ahead of our skis now
                                         
                                         with software. It's happening more and more and more
                                         
                                         that we're hearing about these global outages,
                                         
                                         because the system is changing constantly
                                         
                                         and we're introducing change at the fastest, most rapid rate
                                         
                                         that we possibly could do it.
                                         
                                         Like you were saying, it's a bit of a house of cards.
                                         
    
                                         This is probably just the beginning. We're probably going to have another massive outage before we really start to
                                         
                                         learn, oh, maybe we should scale back how much we're actually changing these really complicated systems.
                                         
                                         Yeah, and the technical details of that hypothetical future outage could be wildly different than this.
                                         
                                         And so, you know, whereas maybe you can say, what was the cause of the fire?
                                         
                                         Well, it was a gas leak. Well, it was a person who was doing something, you know, there's these
                                         
                                         different reasons, but they're all kind of like, eh, something combusted where it shouldn't
                                         
                                         have. We didn't have preventative measures in place. With software, so much of it's
                                         
                                         wildly different. I think it could be very hard. Now, we have had some motion in that direction. I think it was the United States White House that recently promoted memory-safe languages, for instance.
                                         
    
                                         Rust being, I think, named perhaps, but definitely the Rustaceans were very excited about that particular note.
                                         
                                         So we have kind of nudges happening by governments.
                                         
                                         I know the EU is what I would call more heavy-handed with their regulation around the things you can and cannot do with software.
                                         
                                         But gosh, it just seems like because of the diversity in software systems, you can't just put fire suppression in the building and be done. There's going to be so many different things, I think. So many
                                         
                                         different regulations and rules and
                                         
                                         details in order to actually
                                         
                                         harness up some sort of protection
                                         
                                         that would be effective against
                                         
    
                                         an 80% solution even.
                                         
                                         I hear what you're saying. It's a crazy
                                         
                                         thought and I really hope we don't
                                         
                                         end up in this world.
                                         
                                         Buildings have
                                         
                                         regulated materials that they can be built in
                                         
                                         now. And children's toys can't have certain chemicals in them. These
                                         
                                         are all very regulated industries. And, you know, could software eventually get to that point
                                         
    
                                         where governments are like, you can't use any memory-unsafe language? It has to be
                                         
                                         blessed by the US government
                                         
                                         if it's being used for public distribution.
                                         
                                         Period.
                                         
                                         Could we get there? I don't know.
                                         
                                         Maybe. We've gotten there in almost
                                         
                                         everything else. People that have
                                         
                                         cabins in the woods have regulations
                                         
    
                                         that they still have to abide by.
                                         
                                         It's a wild thought. I've never really had it
                                         
                                         until you started saying that, Jerod.
                                         
                                         Well, what you're saying, though,
                                         
                                         is we get to the future innovations
                                         
                                         through past failure
                                         
                                         and retrospectives and learning.
                                         
                                         That's how we get to the future,
                                         
    
                                         is deploying what we think is the best solution,
                                         
                                         it not being the best solution.
                                         
                                         There's some sort of catastrophe
                                         
                                         on a small or large scale.
                                         
                                         We examine that.
                                         
                                         We retrospective.
                                         
                                         We policy.
                                         
                                         We regulate.
                                         
    
                                         We redeploy.
                                         
                                         And we try again.
                                         
                                         Well, the only other answer is to predict the future.
                                         
                                         Yeah.
                                         
                                         Yeah.
                                         
                                         And I think that's, to some degree,
                                         
                                         what developers are trying to do.
                                         
                                         They're at least tasked with trying to
                                         
    
                                         solve the present problem
                                         
                                         that is future proof.
                                         
                                         That has a version of future proof in it.
                                         
                                         You hear that all the time, right? This is future proof code.
                                         
                                         I've never said that about my code.
                                         
                                         Maybe not, but somebody's like,
                                         
                                         this will future proof us.
                                         
                                         Somebody's definitely said that.
                                         
    
                                         And I have always regretted it.
                                         
                                         Say feature proof, maybe. Maybe my code's feature proof.
                                         
                                         Yeah, not future proof. Yeah, feature free. What's up, friends? I'm here in the breaks with David Hsu,
                                         
                                         founder and CEO of Retool. So David, Retool has definitely cornered the market on internal tool
                                         
                                         software development. But zoom out for me. What's the big idea? Why did you start Retool? What is the big idea with internal software?
                                         
                                         Yeah, so Retool started at this point seven years ago. And when we started Retool,
                                         
                                         the core idea was that internal software is a giant, giant category that no one really thinks about. And what's surprising to most people is that internal software represents something like
                                         
                                         50 to 60% of all the code written in the world, which might sound pretty surprising.
                                         
    
                                         But if you think about it, most of us in Silicon Valley, we work at software companies, whether it's like an Airbnb, a Google, a Meta.
                                         
                                         These are all companies that are software companies selling software.
                                         
                                         And so most engineers in these companies are working on external-facing software. But if you think about most software engineers in the world,
                                         
                                         most software engineers in the world actually don't work at these software companies.
                                         
                                         There's not that many of them. There's maybe 10, 20 of them, big ones at least.
                                         
                                         Most of the companies in the world are actually non-software companies.
                                         
                                         So if you think about a company like an LVMH, for example, or like a Coca-Cola, for example, or like a Zara.
                                         
                                         Zara's not selling any software. They actually have a lot of software
                                         
    
                                         engineers, actually. And all their software engineers, all they do day in and day out,
                                         
                                         is basically build internal software. So that's, I think, one reason we started Retool.
                                         
                                         The second reason we started Retool is if you look at all this internal software that people
                                         
                                         are building, it is remarkably similar. So if you take a look at, you know, like a Zara,
                                         
                                         for example, versus Coca-Cola, two very different companies, obviously.
                                         
                                         One a clothing company, one a beverage company.
                                         
                                         But if you actually look at the software they're building internally to go run their operations, it is remarkably similar.
                                         
                                         It's basically forms, buttons, tables, all these sort of pretty common building blocks, basically, that come together in different ways. But then if you think about, you know, not just the UI, but also what's the logic behind a lot of this stuff,
                                         
    
                                         they're pretty much just hitting API endpoints, hitting databases. You care about authentication,
                                         
                                         you care about authorization. There are sort of a lot of common building blocks, if you will,
                                         
                                         to internal tools. And so for us, the insight was, wow, internal software is a ginormous category,
                                         
                                         and it's all so similar, and developers hate building it. And so
                                         
                                         could we create a sort of higher-level framework, if you will, for building all this software? And
                                         
                                         that would be really cool. That would be really cool. Okay, so listeners, Retool is built for everyone:
                                         
                                         built for enterprise, built for scale, built for developers. And that's you. And if you found
                                         
                                         yourself nodding your head to what David was saying, then check out Retool at retool.com/changelog. It's the fastest way to build internal software.
                                         
    
                                         Do yourself a favor. Get a demo or start for free today. Again, retool.com/changelog.
                                         
                                         I really come back to this at-scale situation. I think, you know, when we have the larger catastrophes, outages, etc., it's because of widely deployed code, which is a great thing because that code is somehow widely useful.
                                         
                                         But then you've got to have certain things in place that once you're maybe at that level, certain things that have to take place to instantiate change. Because like you said earlier, Robert, it's usually change,
                                         
                                         and not so much that specific change, it's that change plus something else
                                         
                                         that's the unintended consequence of those two together.
                                         
                                         And I did look up, by the way, just because I was like,
                                         
                                         what actually happens when you combine chlorine bleach with vinegar?
                                         
                                         It produces chlorine gas, which is highly toxic, so don't do that. And the reaction is just, I couldn't remember. Not good at all. Very bad.
                                         
    
                                         Yeah, it's not good at all. I mean, it will damage your eyes, your respiratory system, like
                                         
                                         your breathing. It's just not good at all. So, never. We learned that the hard way, you know?
                                         
                                         Somebody did it. Yeah, see, exactly. But now we know. Someone did it.
                                         
                                         I like noticing obscure signs in public places because they're always indicative of some sort of incident.
                                         
                                         Every sign has a story.
                                         
                                         Yeah, I remember I was at a hotel one time
                                         
                                         and I was hanging out in the pool or maybe the hot tub.
                                         
                                         And there's a sign that said,
                                         
    
                                         this pool is not for defecation purposes.
                                         
                                         Yeah, which was a very strange sign.
                                         
                                         And that might not be verbatim.
                                         
                                         And I can't remember if it was a defecation or really,
                                         
                                         you know, it was very formal though.
                                         
                                         So it probably did say that.
                                         
                                         And I thought, yeah, somebody pooped in this pool at one point.
                                         
                                         And there was an incident where they said,
                                         
    
                                         we got to put a sign up.
                                         
                                         Or someone watched Caddyshack and was just terrified.
                                         
                                         Just a Baby Ruth.
                                         
                                         Yeah.
                                         
                                         So yeah, we learn the hard way most of the time because we can't predict what will happen
                                         
                                         when we combine those two elements until somebody does it.
                                         
                                         And sometimes what happens is we go too far, honestly.
                                         
                                         We, governments, teams, whatever it is,
                                         
    
                                         the reaction can almost be too much. And I really do hope that,
                                         
                                         I mean, this is such a big outage that governments are getting involved that I really hope there's
                                         
                                         some restraint in what comes out of this. I do, because I can see a world where it does get more
                                         
                                         restrictive in the next few years because of this. A good
                                         
                                         example is the TSA. You know, horrible, tragic event, 9/11, but the TSA has been proven
                                         
                                         time and time again to be security theater, and we spend billions upon billions of dollars on it
                                         
                                         every single year. And I think that's an example of, you know, we overdid it. We went too far. Reactionary.
                                         
                                         I don't think the TSA should be gone entirely. I think, you know, there is purpose to it. But
                                         
    
                                         there are plenty of examples of things in the world where we just go too far. For example,
                                         
                                         moratoriums on code. It's pretty often that you have a couple incidents in a row.
                                         
                                         And then what happens? Everyone says, don't deploy anymore. Stop deploying. And then you realize that you have a memory leak and your system dies anyways because you're not deploying
                                         
                                         and not restarting that process. And it dies anyways. So I just hope that we don't go
                                         
                                         too far with this, that we don't overreact to this massive outage.
                                         
                                         I want an appropriate reaction to it.
                                         
                                         Right.
                                         
                                         Just to add some layers to this
                                         
    
                                         and going back to something you said, Jared,
                                         
                                         and it's kind of a sidetrack,
                                         
                                         but I kind of get the information now.
                                         
                                         I texted my friend.
                                         
                                         So I had lunch with a friend of mine yesterday.
                                         
                                         I won't say where they work,
                                         
                                         but they work at a bank.
                                         
                                         And he said they were down for four hours, which I think is a short time frame compared to other scenarios we've heard of.
                                         
    
                                         I don't know if that was literally only exactly four hours or some coworkers were only down for four hours or the specifics.
                                         
                                         But let's just say at least a 10,000 plus organization when it comes to having laptops and distributed employees and branches and regional HQs and state HQs, whatever, and all these things.
                                         
                                         So at least a day. And those who did not have their laptop shut down and have to boot up were safe, because there was no reboot required. But for those... Jerod, you would love this,
                                         
                                         because if you're a freaking multi-year streaker,
                                         
                                         what was the number of years for your laptop?
                                         
                                         I was listening back to our podcast recently
                                         
                                         and I can't remember which one.
                                         
                                         Yeah, my old, my very first MacBook Pro laptop.
                                         
    
                                         I didn't reboot it for over a year.
                                         
                                         I just was trying to see how long it could go.
                                         
                                         Oh, did you do like uptime in terminal?
                                         
                                         Yep, uptime.
                                         
                                         Well, I also had iStat Menus,
                                         
                                         which will show that to you, which I've used for many years. So it's very cool. And I'd just close it
                                         
                                         and open it, and I refused to reboot it, because I just wanted to see how long... I called it a server.
                                         
                                         Right. Yeah, and you'd have been safe. So the people that, you know, had your ambitions, I suppose,
                                         
    
                                         on boot time were safe. But for those who booted down and booted back up the next day,
                                         
                                         which is a large majority of the people, right, they had that issue.
                                         
                                         And they were told to reboot and see if it fixed it.
                                         
                                         Obviously it didn't.
                                         
                                         And if that didn't work, they literally had to go to the local
                                         
                                         IT center for them to have a person, like you had said, Jerod, touch the machine,
                                         
                                         do something to it, and then it was, you know, good to go again, you know? But could you imagine,
                                         
                                         like, could you imagine the cost of that enumerated across all the scenarios across
                                         
    
                                         the entire globe that were affected by this? Was it 8.5 million Windows computers
                                         
                                         were actually affected in a single day?
                                         
                                         Where there was a larger deployment,
                                         
                                         but 8.5 million, I think, is the current number,
                                         
                                         if it's accurate.
                                         
                                         That's it?
                                         
                                         I think that was just one section of it, wasn't it?
                                         
                                         Well, I think that was the crash.
                                         
    
                                         Like, there was, like, that many Windows computers
                                         
                                         that crashed.
                                         
                                         I don't know if those are the only computers that were affected, necessarily, but those are the ones that were, like, in the critical
                                         
                                         sphere of should be up but not up. So, yeah. Well, you know, and one of those servers was a
                                         
                                         SQL Server 2000 that, right, or is... 500 other servers were... Right. Yeah, the cascading failure is massive.
                                         
                                         I just feel like Nick Burns had his best day of work ever.
                                         
                                         Do you guys remember Nick Burns from Saturday Night Live?
                                         
                                         This is your company's...
                                         
    
                                         Your company's...
                                         
                                         Your company's computer guy.
                                         
                                         Your company's computer guy.
                                         
                                         Nick, the computer guy.
                                         
                                         He'll fix your computer.
                                         
                                         Then he's going to make fun of you.
                                         
                                         Because he's Nick Burns, the company's computer guy.
                                         
                                         Yeah, it's a Jimmy Fallon character.
                                         
    
                                         It's one of his better characters.
                                         
                                         Not a huge Fallon fan myself, but this was a good one
                                         
                                         where he was just the most obnoxious computer guy stereotype ever.
                                         
                                         And nobody wanted to go ask him for help
                                         
                                         because he was going to just denigrate them.
                                         
                                         And I think his catch line was like,
                                         
                                         move, move.
                                         
                                         Was that so hard?
                                         
    
                                         So I think Nick Burns had a great day.
                                         
                                         He gets to go around to everybody's computer and
                                         
                                         get out of the way, I'm going to reboot this thing.
                                         
                                         The heroes, honestly.
                                         
                                         I mean, the amount of patience
                                         
                                         that you would have to have
                                         
                                         on that day, Saturday, Sunday, today. Yes, you know. Oh gosh. Oh my gosh. Could you imagine this?
                                         
                                         Safe booting everything into safe mode and fix. I just couldn't even. And just to have a list of,
                                         
    
                                         like, hundreds of computers you have to do next. You're like, all right, just one by one.
                                         
                                         Bam, bam.
                                         
                                         Oh my gosh.
                                         
                                         Yeah, that's true.
                                         
                                         It was a Friday event that happened over the weekend.
                                         
                                         I mean, not even just those affected by obviously the downtime and their travel and their plans or their work.
                                         
                                         It's now like, wow, IT has a big job to do. I was just watching, like, the first 30 seconds of, and
                                         
                                         I'll link it up in the show notes, Nick Burns, your computer guy, or Your Company's Computer Guy.
                                         
    
                                         He's like, something about a virus, and he's not going to be able to reboot. Like, he just almost
                                         
                                         described what happened. You've got to go and fix it. So I'll drop that in the notes, or maybe
                                         
                                         even the audio. We'll see. I mean, this outage, this CrowdStrike outage,
                                         
                                         really hit every trope.
                                         
                                         It really did.
                                         
                                         Deployed on a Friday.
                                         
                                         Right.
                                         
                                         Global outage.
                                         
    
                                         Windows.
                                         
                                         Yeah, I mean, the whole...
                                         
                                         It brought in the operating system wars.
                                         
                                         It really hit so many checkboxes.
                                         
                                         Memory unsafety, of course.
                                         
                                         There was a lot of C++ versus Rust conversations.
                                         
                                         Yeah, I saw a lot of flaming of C++.
                                         
                                         That's what kind of irks me, because I'm like,
                                         
    
                                         I don't know, the stuff that you probably tweeted this tweet from
                                         
                                         is probably running C++ in some way, shape, or form.
                                         
                                         Certainly somewhere in the stack, yes.
                                         
                                         I can even think of it.
                                         
                                         I think that Twitter/X runs Envoy, which is written in C++, right? I don't know stuff. I was thinking about
                                         
                                         this, actually, from an incident standpoint. And, Robert, you know a thing or two about
                                         
                                         incidents, right? You know one or two things about them, at least. Yeah, I think so. Would you
                                         
                                         think so? I mean, let's test me out here.
                                         
    
                                         Just checking. It's like, so specifically to my friend in the bank situation, their team had to
                                         
                                         raise an incident company-wide that wasn't even their fault. It wasn't like their IT department
                                         
                                         messed up. So can you describe what you hypothesize for how the incident in a well
                                         
                                         managed IT slash technology stack organization would and should react when it's not even their
                                         
                                         problem? Like it's their problem, obviously, but they didn't do it. And the fix is not clear
                                         
                                         because it's upstream. How do you think this percolated inside?
                                         
                                         What's your hypothesis?
                                         
                                         So it's a good question.
                                         
    
                                         I mean, for an incident like this, like you're saying, it's on the outside of your controlled
                                         
                                         world.
                                         
                                         It's challenging.
                                         
                                         So your job at that point for whatever these teams, the banks, the call centers, all these
                                         
                                         places that were down because of this outage,
                                         
                                         the first job is going to be containment and workarounds.
                                         
                                         You're going to try to find a workaround as fast as humanly possible.
                                         
                                         And those teams, what they're going to do is they're going to work
                                         
    
                                         within their controlled world.
                                         
                                         So an IT team at a bank probably is going to tell everyone
                                         
                                         at the bank impacted,
                                         
                                         own the communications like, it's not a bug that we're causing. Here's the news that I'm sure
                                         
                                         everyone probably knew at that point. Here's what you can do to try to fix it, right? Here's how you
                                         
                                         boot into safe mode. Here's how you do X. And the incident responders at that point, they're just
                                         
                                         going to be trying to create a perimeter where it doesn't get worse and they can do things a little
                                         
                                         bit better. A good example is like, if you think of a wildfire, there are firefighters that are
                                         
    
                                         fighting the fire, that's CrowdStrike. And then there are firefighters down or rather up the hill,
                                         
                                         chopping down brush, cutting down trees, like trying to stop it from going any further.
                                         
                                         That's kind of what those teams are going to do.
                                         
                                         That's the mode that they're going to go into.
                                         
                                         I can't say for sure, but, like, in the situations where I've had a vendor outage,
                                         
                                         the first thing we do is we try to look for another route.
                                         
                                         This happened recently.
                                         
                                         I mean, we actually,
                                         
    
                                         our CDN provider, you know, incidents are natural, so I won't name them. I'm
                                         
                                         not blaming them, but they had an incident, like, a week and a half ago. It only impacted Newark,
                                         
                                         pretty small. And we can't control that. And we had to own that. And we had an incident
                                         
                                         opened internally because all of the East Coast users
                                         
                                         were going through this point of presence
                                         
                                         and they were getting 502s.
                                         
                                         So what we did is we actually just rerouted traffic.
                                         
                                         We just took our CDN out of the loop
                                         
    
                                         and that's how we got around it.
                                         
                                         That was the only thing we could do.
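
A minimal sketch of that kind of emergency route, assuming a hypothetical setup where a health probe counts recent 502s coming back through the CDN and traffic falls back to the origin once the error rate crosses a threshold. The hostnames, the threshold, and the function names are illustrative assumptions, not what was actually run during the incident described here:

```python
from dataclasses import dataclass

# Hypothetical endpoints, for illustration only.
CDN_ROUTE = "cdn.example.com"
ORIGIN_ROUTE = "origin.example.com"


@dataclass
class ProbeWindow:
    """Recent health-probe results for requests sent through the CDN."""
    total: int        # probes sent in the window
    bad_gateway: int  # probes that came back as HTTP 502

    def error_rate(self) -> float:
        # An empty window is treated as healthy rather than dividing by zero.
        return self.bad_gateway / self.total if self.total else 0.0


def choose_route(window: ProbeWindow, threshold: float = 0.2) -> str:
    """Bypass the CDN when its 502 rate exceeds the (assumed) threshold."""
    if window.error_rate() > threshold:
        return ORIGIN_ROUTE
    return CDN_ROUTE
```

In a real deployment the switch would more likely be a DNS or load balancer change than application code, but the decision logic is the same: measure the vendor from the outside, and keep a route that does not depend on it.
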
                                         
                                         And I think teams are going to have to start thinking
                                         
                                         about these emergency routes more and more,
                                         
                                         especially because of this CrowdStrike outage,
                                         
                                         they're going to be like, what is our risk surface area? If we use this vendor and that vendor goes
                                         
                                         down, are we screwed? I think a lot of companies are going to start thinking that now, just because
                                         
                                         of this one outage, it's going to be pretty present in people's minds. And the management
                                         
    
                                         process is going to have to change. You're going to have to create like your go bag of incident management when it's out of your control. I remember doing these
                                         
                                         practices back when I was in school, which was an MIS degree with a CS minor I was going to school
                                         
                                         for, which is, you know, management information systems. I probably haven't said that phrase
                                         
                                         since I graduated, but I remember them doing these practice routines,
                                         
                                         business continuity planning.
                                         
                                         I'm starting to remember the acronyms as well.
                                         
                                         Disaster recovery.
                                         
                                         Like you would actually write down
                                         
    
                                         what are all the things that could possibly go wrong,
                                         
                                         which is a fool's errand, by the way.
                                         
                                         But you'd still try.
                                         
                                         You'd do your best, right?
                                         
                                         There's the predict the future part.
                                         
                                         You can get close.
                                         
                                         There's your predict the future part.
                                         
                                         And then you'd have to come up with a game plan
                                         
    
                                         for each of these situations. Like, how are we
                                         
                                         going to mitigate the impact? How are we going to continue to run our business? What are the
                                         
                                         workarounds? What are the next steps? Et cetera. And I did enjoy those processes, except for the writing part,
                                         
                                         of course, because I was in school. Nobody wants to write. I thought it was very useful to think, like,
                                         
                                         what are a list of things that are likely to happen?
                                         
                                         Do you remember any of them?
                                         
                                         A lot of them were, well, they're completely made up businesses of course.
                                         
                                         So it's all kind of just arbitrary because we didn't actually have any businesses.
                                         
    
                                         And so we were like, you're the CTO of X corp that does Y thing. And now what could
                                         
                                         happen? And so you had to kind of like make up, here's our technology stack, here's what we're
                                         
                                         doing. And then if X, then Y. And no, I don't remember any of those particular details, but
                                         
                                         I did recently visit a nuclear power plant here in Nebraska and the amount of things they've thought through and the amount of planning
                                         
                                         that they've done and building hedges, so to speak, around almost every possible thing
                                         
                                         that could go wrong at a power plant.
                                         
                                         It's actually, it's laudable.
                                         
                                         It's amazing how thorough these folks have gone through and prepared for umpteen potential things.
                                         
    
                                         And it made me realize, like, oh, in software we just kind of fly by the seat of our pants, don't we?
                                         
                                         You know, of course, they move way slower. I mean, that's the trade-off, right? Like,
                                         
                                         everything moves super slow at a nuclear power plant. It has to because the consequences of disaster are so large.
                                         
                                         And maybe the fairytale we've told ourselves,
                                         
                                         and maybe it's gotten less and less true over time,
                                         
                                         is the consequence of software disasters isn't that big.
                                         
                                         We even had the phrases for it.
                                         
                                         I don't think we were pretending at all.
                                         
    
                                         What was it? Move fast and break things.
                                         
                                         How many times was that said in Silicon Valley? Right. That got abused though. I mean, I think that at the time that began at
                                         
                                         Facebook, so that was a Facebook-born ideation. And I think it was a culture because they were
                                         
                                         in an innovation state. They were not in a, I mean, I guess they were becoming more and more widely deployed,
                                         
                                         but they were also a web service.
                                         
                                         So it wasn't like, well, it's installed and it's going to crash something.
                                         
                                         So I think there's scenarios.
                                         
                                         Now, obviously, it's a social network and there's a lot of people out there that are affected by,
                                         
    
                                         you know, abuse, harm, et cetera, that can happen in social media, which I fully agree to.
                                         
                                         That's like, that's just how it kind of just sucks.
                                         
                                         And so the move fast and break things mantra,
                                         
                                         to a lot of people, is just, like, not a good thing, obviously. But to a technologist who's
                                         
                                         trying to innovate, it's a very admirable thing. Like, yeah, let's move fast and
                                         
                                         break things, because what happens is the iteration cycle to learning happens faster,
                                         
                                         right? This cycle you described with the sprinklers. Well, where it doesn't happen is the danger zone, right? Places like
                                         
                                         CrowdStrike should not deploy this idea of move fast and break things. And maybe they did move fast
                                         
    
                                         and break things. Well, it's interesting in that particular context, because they are fighting
                                         
                                         adversaries who are also moving fast in order to break things. And so this goes back to the trade-offs that Robert was discussing.
                                         
                                         I mean, I can understand the ethos that said, we need a way to deploy to these machines
                                         
                                         outside of going through the entire process with Microsoft and the kernel stuff and the
                                         
                                         signing.
                                         
                                         We need a way to get our fixes out there before they attack all of our customers.
                                         
                                         That's what they're paying us for.
                                         
                                         And so I can see that trade-off of like, well, how can we do that? Well, let's develop a system where we're going to just side
                                         
    
                                         load some rules and we'll try to make it innocuous. And we'll have, or I'm sure there's CI/CD and
                                         
                                         there's test suites. I mean, this is a publicly traded company. I'm sure they have infrastructure
                                         
                                         around the code they're rolling out. I'm giving them too much credit. I don't think I am.
                                         
                                         I would be shocked if we learned that they didn't,
                                         
                                         like this code went out when one person wrote it
                                         
                                         and nobody else looked at it.
                                         
                                         And I doubt that's the case.
                                         
                                         The anxiety of that code review, Jared.
                                         
    
                                         Right.
                                         
                                         A little throwback.
                                         
                                         Yes.
                                         
                                         And so I can understand that push and pull.
                                         
                                         I mean, we have this even inside of like the app store
                                         
                                         where it takes forever in software terms
                                         
                                         to roll out an app update.
                                         
                                         But if you have your logic server-side
                                         
    
                                         and you can push even web components into a view,
                                         
                                         you can actually update your app throughout the day.
                                         
                                         You can basically do what they're doing with CrowdStrike,
                                         
                                         with Falcon.
                                         
                                         Over-the-air updates are exactly what you're saying.
                                         
                                         Apple restricts them
                                         
                                         pretty heavily for their platform. But I like what you're saying, that for CrowdStrike, this is an
                                         
                                         advantage. This is probably something they have bragged about in their sales cycle. Like, you don't
                                         
    
                                         ever need to do an update of this agent. It just will update itself. This is how I understand how
                                         
                                         it works. And when new vulnerabilities come out,
                                         
                                         we will cover you and protect you.
                                         
                                         That's a huge selling point.
                                         
                                         Why would you want to get rid of that?
                                         
                                         Come on, Adam.
                                         
                                         Why would you want to get rid of that?
                                         
                                         Don't take it away from us.
                                         
    
                                         No, and I agree with that.
                                         
                                         I think, I don't think,
                                         
                                         so the question comes back to,
                                         
                                         what can we do to learn from this?
                                         
                                         I've heard, I think, was it, did you mention
                                         
                                         this in News, Jared? I'm like,
                                         
                                         I've read and listened to several things.
                                         
                                         eBPF. And how this could
                                         
    
                                         be, this, the way that eBPF
                                         
                                         works, and I'm
                                         
                                         loosely, I mean, I'm steeped in it
                                         
                                         to some degree, but also very, like,
                                         
                                         beyond even novice. Like, I'm just like,
                                         
                                         no, I'm a green person when it comes to
                                         
                                         what eBPF is and how to describe it.
                                         
    
                                         But from what I understand,
                                         
                                         this could be a different architecture
                                         
                                         that could prevent this.
                                         
                                         Well, what's interesting is that CrowdStrike
                                         
                                         is actually using eBPF in their Linux client,
                                         
                                         is what I read from Brendan Gregg's article about eBPF.
                                         
                                         And so they're very well aware of it.
                                         
                                         It's a way to do this that's safer. And it's in
                                         
    
                                         development inside of Microsoft to provide EBPF support for Windows.
                                         
                                         This was you then. Thank you. I love Changelog News, by the way. Hey, y'all listen to this.
                                         
                                         Changelog.com slash news. Subscribe today. If you're not, you're just missing out.
                                         
                                         You're missing out. So Brendan Gregg has this post, which was in Changelog News,
                                         
                                         called No More Blue Fridays,
                                         
                                         and it's his writing of why eBPF
                                         
                                         will be potentially another tool in our toolbox, right?
                                         
                                         In order to achieve what they're trying to achieve
                                         
    
                                         without some of the dangers
                                         
                                         latent in the current Windows-based rollout.
                                         
                                         However, the in-development version of eBPF
                                         
                                         will not have all the features it has in Linux.
                                         
                                         And so could CrowdStrike immediately use it
                                         
                                         in order to replace their current rollout?
                                         
                                         Survey says probably not.
                                         
                                         It has to be much more full-featured
                                         
    
                                         in order for that to be a thing they could start using
                                         
                                         as soon as it's shipped.
                                         
                                         But it's a direction.
                                         
                                         Well, what better way to get R&D budget
                                         
                                         to make that go faster than what just happened, right?
                                         
                                         Well, there you go.
                                         
                                         That was kind of Brendan Gregg's point at the end.
                                         
                                         And of course, I think he has a dog in the hunt.
                                         
    
                                         He's very much invested in eBPF,
                                         
                                         which is open source and all that,
                                         
                                         but there's businesses built around it.
                                         
                                         But he said like, hey, here's your great moment.
                                         
                                         If you are paying for computer security software
                                         
                                         and you are a paid customer of these entities,
                                         
                                         you could push them to make this eBPF path
                                         
                                         happen faster and better
                                         
    
                                         because you're their customer.
                                         
                                         So that was his call to action at the end of that post.
                                         
                                         And what would happen at the kernel level?
                                         
                                         Do you know enough about this to describe
                                         
                                         what would happen if this hypothesis
                                         
                                         or this hypothesized world existed
                                         
                                         this future development, how it would work
                                         
                                         to prevent this kernel from
                                         
    
                                         crashing the system or booting without it or
                                         
                                         being safer?
                                         
                                         No. Okay.
                                         
                                         Well that's what I was thinking of.
                                         
                                         How can we
                                         
                                         I guess, and I'm not a Windows developer,
                                         
                                         so by all means, just slap me in the face after this one, but I'm just thinking
                                         
                                         you have a crash dump whenever the blue screen of death comes up.
                                         
    
                                         And the system knows probably what crashed it, at least if it's a driver
                                         
                                         in kernel mode, what's crashing it. Could you not just offer
                                         
                                         the user the option to boot sans
                                         
                                         that third-party, especially if it's third-party software, temporarily?
                                         
                                         Now, I get that this is cybersecurity software.
                                         
                                         What do you mean?
                                         
                                         Well, I'm just thinking if the kernel driver of CrowdStrike,
                                         
                                         a third-party, not a first-party, native operating system kernel driver,
                                         
    
                                         is crashing the system.
                                         
                                         So by moniker, it's a third party.
                                         
                                         Could you not say, well,
                                         
                                         this system knows that
                                         
                                         this third party driver is crashing the system.
                                         
                                         Do you want to boot without it?
                                         
                                         And maybe that's what safe mode does,
                                         
                                         but I mean, why couldn't that be a non-safe mode thing?
                                         
    
                                         I don't know.
                                         
                                         Because maybe those systems could have just been booted
                                         
                                         by everyday people.
                                         
                                         It's about UX and user friendliness.
                                         
                                         Now, I don't know if that's secure. Robert's shaking his head a little bit.
                                         
                                         Are you saying the system knows that the system
                                         
                                         is crashing?
                                         
                                         It's a layer on a layer.
                                         
    
                                         You're throwing another layer that doesn't currently exist
                                         
                                         in there? Is that what you're saying, Robert?
                                         
                                         I think. I mean, I'm not even
                                         
                                         going to try to pretend I know how
                                         
                                         these kernel
                                         
                                         I'm going to call it an add-on.
                                         
                                         That's how inexperienced I am with it.
                                         
                                         Like, plugins.
                                         
    
                                         I don't want to pretend to know.
                                         
                                         But I think that what Adam is saying,
                                         
                                         I think the challenge with that is
                                         
                                         just more complexity.
                                         
                                         And is the risk worth the reward?
                                         
                                         And can the system...
                                         
                                         Think about the amount of trial and error
                                         
                                         you would have to go through
                                         
    
                                         for that to work really well.
                                         
                                         And where does the operating system even store that knowledge
                                         
                                         that that plugin is borked?
                                         
                                         You're at the point of it booting.
                                         
                                         That's my point. It's crashing currently.
                                         
                                         You might not even have file system access yet.
                                         
                                         That's how early in the ones and zeros we are.
                                         
                                         So I think that's the challenge: you've got to put it somewhere.
                                         
                                         So let's zoom back out one layer then. My thought is not literally
                                         
    
                                         how we deploy the fix. Literally, this is how we solve it. But from a user experience standpoint,
                                         
                                         the reason why the outage perpetuated to its length was because
                                         
                                         everyday people could not solve their own problem with the system. And I'm just suggesting,
                                         
                                         is there a path where you can provide everyday users of their computer some version of
                                         
                                         bypassing this crash. That's all.
                                         
                                         And I don't know that answer.
                                         
                                         I'm just hypothesizing that
                                         
                                         the reason why it perpetuated
                                         
    
                                         was because people who,
                                         
                                         like IT basically,
                                         
                                         people smarter than the end user
                                         
                                         from a technical level,
                                         
                                         in most cases,
                                         
                                         could not solve,
                                         
                                         they had to come in and be deployed to
                                         
                                         literally open up the laptop
                                         
    
                                         or could you imagine trucking in
                                         
                                         a workstation? Like, not everybody uses laptops these days. Some people use workstations. But, like,
                                         
                                         you had to take the thing into the people. They had to plug a monitor into it and a keyboard into it,
                                         
                                         and somebody else had to touch it. I'm just thinking, is there another way where the end user could
                                         
                                         have done more of this in line too, rather than simply waiting?
                                         
                                         I don't think Nick Burns wants the end user to do it.
                                         
                                         No?
                                         
                                         Well, I remember the days of Windows
                                         
    
                                         where it was remote PCs
                                         
                                         and the only thing that that station was responsible for
                                         
                                         was basically connecting to something else
                                         
                                         that was doing the compute.
                                         
                                         Maybe that comes back, right?
                                         
                                         Maybe that's a world that...
                                         
                                         Client-side computing was thin clients.
                                         
                                         That was Citrix, and that's my roots, man.
                                         
    
                                         I grew up in IT in the early 2000s,
                                         
                                         worked at an IT company that deployed Citrix
                                         
                                         and VMware intensely.
                                         
                                         We had our own co-location system at a data center.
                                         
                                         You were talking about the power plant, Jared.
                                         
                                         Data centers are similarly, if not equally, thought through.
                                         
                                         Not equally.
                                         
                                         Not equally.
                                         
    
                                         Yeah, I'm going to say maybe not all the way.
                                         
                                         Nuclear power plants are so regulated.
                                         
                                         Well, that's why I said similarly, if not equally.
                                         
                                         There's a version of the thoughtfulness, let's just say.
                                         
                                         I'm going to say I hope they're not.
                                         
                                         I hope that nuclear power plants have more thought.
                                         
                                         Okay, I would give you that.
                                         
                                         I came out feeling much safer about nuclear power
                                         
    
                                         through this tour because of how stinking serious
                                         
                                         they are about safety.
                                         
                                         But anyways.
                                         
                                         Yeah.
                                         
                                         Well, just the point was that I agree, Robert.
                                         
                                         Maybe thin clients or remote.
                                         
                                         I mean, but.
                                         
                                         What's old is new again.
                                         
    
                                         Maybe, you know, I think.
                                         
                                         Well, you know what the web is?
                                         
                                         Jared was talking about that.
                                         
                                         It's like a widely deployed operating system.
                                         
                                         Most of us are on web apps these days
                                         
                                         anyways. You know, most
                                         
                                         of what we do is through the browser.
                                         
                                         Like right now, we're having this discussion
                                         
    
                                         through the browser.
                                         
                                         Video, audio, recorded
                                         
                                         locally, streamed back up.
                                         
                                         In most cases, doesn't fail.
                                         
                                         Really good
                                         
                                         software, but it's web software. We have to use
                                         
                                         a special browser,
                                         
                                         which is a whole different fight.
                                         
    
                                         Web software goes down.
                                         
                                         I'm just not sure exactly what we're solving with this moving the furniture around.
                                         
                                         So what I had in my head is,
                                         
                                         I saw a picture through all the news cycles
                                         
                                         of this CrowdStrike outage was,
                                         
                                         it was actually, it was a gate agent's computer.
                                         
                                         It was at the gate where you board the plane
                                         
                                         and it had the blue screen of death.
                                         
    
                                         And in that situation, does that computer need a CrowdStrike kernel agent running on it? Maybe it
                                         
                                         does, maybe it doesn't. I don't know. But I think where I'm going with this is, does that computer
                                         
                                         just need a screen, a mouse, and a keyboard that's hooked up to something else down the hallway? You know, that's one station that's powering 20 gates, and it's much
                                         
                                         easier. It's smaller surface area. You know, I think we're getting to that point. Like, networks
                                         
                                         are getting fast enough to do that type of thing. Maybe it's too far. I'm not sure, honestly. I mean,
                                         
                                         some companies have tried to do this with, like, gaming, for example. I don't remember if, you know... It all
                                         
                                         failed so far. It failed so fast. Yeah. But maybe that was too far, right? Like, that's hard to do.
                                         
                                         That's like, you need super-low-latency video feeds, right? And it was Google. It was Google trying to do
                                         
    
                                         it. It wasn't some fly-by-night. I mean, they have the resources. If anybody could accomplish it, you'd
                                         
                                         think Google. Yeah, and Microsoft Xbox is
                                         
                                         trying to do it too. I forget the name. Yeah, yeah, true. But maybe it's like that type of world, right?
                                         
                                         Where it's just a keyboard, a mouse, and a screen, and it's hooked up somewhere else. Maybe that's
                                         
                                         where we go to. You reduce the surface area, therefore you reduce the amount of potential
                                         
                                         outage. I think in this case that hypothesis has merit only because we know
                                         
                                         what we know. It's not because we know what we knew or know what we know
                                         
                                         prior to, and that's the plan. Because I think even in that scenario, you
                                         
    
                                         have now a single machine dependency
                                         
                                         of many dependencies. And now it's like, well, when that one machine is down,
                                         
                                         it's not just one person.
                                         
                                         The outage affects many because of the design of, you know, dependency.
                                         
                                         I am pro thin client, though.
                                         
                                         I'm pro what Citrix did back in those days.
                                         
                                         It was a very cool thing.
                                         
                                         I mean.
                                         
    
                                         I hated it.
                                         
                                         Well, so for certain workers, for certain tasks, it was perfect.
                                         
                                         I hated it too, Jared, because I...
                                         
                                         Why were you for it then?
                                         
                                         Well, in my scenario, I was for it for everybody else though.
                                         
                                         Oh, for everybody else.
                                         
                                         Oh, yeah.
                                         
                                         Oh, I'm for it for everybody else, yeah.
                                         
    
                                         Yeah, I think it's cool tech.
                                         
                                         The ergonomics of it were terrible.
                                         
                                         Yes.
                                         
                                         Yeah.
                                         
                                         I agree, the tech was cool.
                                         
                                         And for certain scenarios, I helped out.
                                         
                                         I ran network administration for a company that did commodity trading. And so they had machines in silos, you know, grain silos. And those places are dirty, nasty: corn, chaff, etc. Like, it's not the place where you're going to have a server farm. Or you wouldn't even want a PC, because eventually that tower is going to get all kinds of
                                         
                                         stuff into it. It's going to break down. And so in those cases, like, the thinnest client possible
                                         
    
                                         with a Citrix connection was the answer. Made tons of sense. Yeah. But in many other use cases, you've got
                                         
                                         your employees sitting in their office, and they're Citrixing into somewhere else, you know, to run
                                         
                                         with this latency, and it was slow, and they didn't have access to local resources.
                                         
                                         Okay.
                                         
                                         In those contexts,
                                         
                                         I was like,
                                         
                                         this is ridiculous.
                                         
                                         I have a beefy computer
                                         
    
                                         sitting here.
                                         
                                         It's connected
                                         
                                         to a remote machine.
                                         
                                         The grain silo
                                         
                                         didn't have a good
                                         
                                         internet connection.
                                         
                                         Well, that was another problem.
                                         
                                         We had to create,
                                         
    
                                         a lot of times
                                         
                                         we had to create
                                         
                                         internet connection
                                         
                                         for them
                                         
                                         in order for them
                                         
                                         to actually connect back
                                         
                                         to Citrix.
                                         
                                         And so that was,
                                         
    
                                         I mean, it was,
                                         
                                         you're trying to do remote computing in a grain silo.
                                         
                                         It's not going to be easy no matter how you do it.
                                         
                                         Right.
                                         
                                         What's up, friends?
                                         
                                         I'm here with Firas Abugadije, founder and CEO of Socket.
                                         
                                         Socket helps to protect the best engineering teams out there with their developer-first
                                         
                                         security platform.
                                         
    
                                         And so Feross, speaking of developer first, Socket is developer first.
                                         
                                         What does that mean?
                                         
                                         What do you mean by being developer first?
                                         
                                         Most security software is typically sold to executives.
                                         
                                         So it tends to suck to actually use it.
                                         
                                         So the company, the vendor goes in and makes a sale.
                                         
                                         The executive thinks it looks good, but they don't actually care at all what the developer
                                         
                                         experience of the tool is.
                                         
    
                                         So I think that's where I would start.
                                         
                                         The first problem with security tools is they're sold to executives.
                                         
                                         In the best case, those tools get purchased and they just sit around on the shelf bothering nobody and protecting nobody. But in the worst case, they get rolled out and they prevent developers from getting things done. And they just get all up in your face with alerts and pointless noise that isn't actionable. And if you actually go and fix those alerts, you're not even improving security because a lot of the time those vulnerabilities are super low impact. That's like the dirty secret of vulnerabilities is most of them are low impact. They're either in
                                         
                                         dev dependencies, so they're never going to run in production or they're really difficult to
                                         
                                         exploit. Or if you exploit them, there's nothing really there. It's like a, you know, a denial of
                                         
                                         service in some random component. And in reality, like that's just such a low risk in terms of just
                                         
                                         your priorities of things you need to work on as a developer. I would actually say probably 90 or 95 percent of the vulnerability alerts that developers are used to seeing from other tools are just completely pointless.
                                         
                                         They're just fake work.
                                         
    
                                         And fixing them doesn't even meaningfully improve security at all.
                                         
                                         There you have it.
                                         
                                         Protect yourself, your team, and your software from the threats that really matter.
                                         
                                         Don't do fake work.
                                         
                                         Use Socket.
                                         
                                         Socket.dev.
                                         
                                         Book a demo.
                                         
                                         Install the GitHub app.
                                         
    
                                         Install the Socket CLI.
                                         
                                         Whatever it takes to take the next step, do it.
                                         
                                         Go to Socket.dev.
                                         
                                         Again, Socket.dev.
                                         
                                         Well, Intel Innovation 2024 Accelerate the Future is right around the corner.
                                         
                                         It takes place September 24th and 25th in San Jose, California.
                                         
                                         This event is all about you, the developer, the community,
                                         
                                         and the critical role you play in tackling the toughest challenges across the industry.
                                         
    
                                         Ignite your passion for AI and beyond, grow your skills to maximize your impact,
                                         
                                         and network with your peers as they unleash the next wave of advancements in technology.
                                         
                                         Understand the emerging innovation and trends in dev tools, languages, frameworks, and technologies in AI and beyond.
                                         
                                         Join on-site hands-on labs, workshops, meetups, and hackathons to collaborate and solve real problems in real time. Collab with experts, learn and have fun, engage in interactive sessions, connect, grow your
                                         
                                         network, gain a unique idea and perspective, and build lasting networks.
                                         
                                         And of course, have fun.
                                         
                                         You'll hear from leading experts in the industry, technologists, startup entrepreneurs, and
                                         
                                         fellow developers, along with Intel leadership CEO Pat Gelsinger and CTO Greg Lavender as they take you through the latest advancements in technology.
                                         
    
                                         Don't miss out on the chance to be at the forefront of innovation.
                                         
                                         Take advantage of their early bird pricing from now until August 2nd.
                                         
                                         Register using the link in the show notes, or to learn more,
                                         
                                         go to intel.com/innovation.
                                         
                                         When you're at scale, like CrowdStrike was,
                                         
                                         and you deploy bad code, regardless of which theory you go with,
                                         
                                         bad code, done on purpose, rogue whatever.
                                         
                                         I mean, there's people saying like this was planned.
                                         
    
                                         I haven't read any of that stuff, but I'm sure it's out there.
                                         
                                         Well, you know, anytime something like this happens at a scale like this,
                                         
                                         you got to wonder, like we live in a simulation lately.
                                         
                                         Like there are strange things happening every single day
                                         
                                         that have been basically unprecedented every single day.
                                         
                                         So like the new precedent is unprecedented, you know?
                                         
                                         Right.
                                         
                                         And I just, I don't want to hypothesize here
                                         
    
                                         because that's not what we're trying to do
                                         
                                         or not what I'm trying to do.
                                         
                                         But when you're at scale like this,
                                         
                                         it's obviously an attack surface of some sort,
                                         
                                         whether it's bad code, an incident,
                                         
                                         or just simply, you know, a bad day, a bad Friday, a bad weekend.
                                         
                                         And how can we give CrowdStrike the ability to do what they want to do and have the sales
                                         
                                         pitch they want to have without having the opportunity for an outage like this?
                                         
    
                                         And then all the others, they're going to follow in their footsteps.
                                         
                                         Who else?
                                         
                                         Well, the software will be at scale
                                         
                                         and be an attack surface, whether it's
                                         
                                         bad code, planned,
                                         
                                         intended, rogue, whatever.
                                         
                                         They're all similar scenarios,
                                         
                                         just a matter of how the incident
                                         
    
                                         percolates.
                                         
                                         I mean, there's just
                                         
                                         the surface area of which software
                                         
                                         can be impacted
                                         
                                         now, either just through sheer outage or security, is staggering.
                                         
                                         I mean, there was, I don't know, maybe a month and a half ago, two months ago, it was
                                         
                                         newsworthy enough for the New York Times. I saw the word Postgres on the front page of the New York Times.
                                         
                                         I was like, what is this? And you go and read it, and it all boils
                                         
    
                                         down to: there was a state actor that gained the trust of the core team for Postgres, and they
                                         
                                         started submitting patches that fixed real things, and then they submitted something that was
                                         
                                         very subtle, that was caught on accident by another
                                         
                                         engineer years later. And they eventually figured it out. They were like, holy crap, this person just
                                         
                                         gained our trust by submitting real stuff and then snuck something in. And how do you defend that?
                                         
                                         You just can't. I don't think you can.
                                         
                                         That sounds a lot like the XZ thing. Is this in addition to that?
                                         
                                         I think that's what I'm talking about. Yeah, I couldn't remember
                                         
                                         the exact name of it.
                                         
                                         But yeah. So I don't remember the Postgres part, but certainly
                                         
    
                                         this XZ backdoor was placed by a state actor.
                                         
                                         I think it was someone working on Postgres is, like, and then they got, like, down to that level. Maybe that's how I misremembered it.
                                         
                                         Fair enough. Well, XZ is a dependency
                                         
                                         of many software packages
                                         
                                         and was close to being
                                         
                                         actually distributed via
                                         
                                         Apt and other package
                                         
                                         registries prior to
                                         
    
                                         it getting found out on accident
                                         
                                         by a developer. So yeah,
                                         
                                         crazy times for sure.
                                         
                                         Definitely not tinfoil hat, Adam,
                                         
                                         to say, you know, was this,
                                         
                                         to ask the question of,
                                         
                                         was this mere incompetence
                                         
                                         or was this actually an attack?
                                         
    
                                         Because attacks happen
                                         
                                         and they are happening
                                         
                                         and they will continue to.
                                         
                                         And so those questions do have to be asked.
                                         
                                         I think in this particular case,
                                         
                                         I jumped immediately to incompetence,
                                         
                                         you know, Occam's razor style,
                                         
                                         because I know how complex software systems are to roll out updates.
                                         
    
                                         You know, I was like, oh gosh, somebody had a really bad day,
                                         
                                         but that could be a wrong conclusion to jump to.
                                         
                                         Well, I think in the case that you're talking about, Robert,
                                         
                                         with Postgres, if this is accurate, is code analysis.
                                         
                                         You have to analyze, especially in open source,
                                         
                                         but when it's closed source like CrowdStrike and a definition update,
                                         
                                         all you can do is rely upon that team, that company,
                                         
                                         to be mature enough to have protections in place.
                                         
    
                                         When it's proprietary closed source,
                                         
                                         there's nothing you can do from a scale point to analyze the code.
                                         
                                         From a different route with open source,
                                         
                                         you could do a lot of things.
                                         
                                         You could pay attention to where the patches are coming from.
                                         
                                         You know, I guess in this case here,
                                         
                                         if the patch was, you know,
                                         
                                         hey, Robert, here's the patch.
                                         
    
                                         I'm Adam.
                                         
                                         Let's just say it's you as the core committer
                                         
                                         and I'm the friend who's trying to be friendly.
                                         
                                         I've solved this problem.
                                         
                                         Here you go, Robert, and you just take my code
                                         
                                         and maybe you actually deploy it to Postgres.
                                         
                                         So it's coming in signed. Maybe that's an example where you really can't analyze very well.
                                         
                                         But if you had to say, Robert is signing this commit, but it's being the location or the source of the commit is from an outside source helping out because it's open source,
                                         
    
                                         then you could at least have a waypoint to begin to track
                                         
                                         if you're doing code analysis.
                                         
                                         I think that's the area where I'm really confident
                                         
                                         and looking forward to more and more being done.
                                         
                                         Because when you can analyze the Git repository
                                         
                                         and the graph of things happening in a code base,
                                         
                                         there's a lot you can pull
                                         
                                         out when it's like, okay, that's a smell.
                                         
    
                                         You got a brand new committer.
                                         
                                         You got somebody being nurtured or whatever you want to call it to kind of get their trust
                                         
                                         over multiple years even.
                                         
                                         There's layers of anomaly that can be identified because of the way open source works
                                         
                                         if you do specific code analysis.
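The kind of signal described here, a commit whose author is not the person who committed it, or an author showing up for the first time, can be sketched in a few lines. This is only an illustration under stated assumptions, not a real detection tool: it assumes the input text came from something like `git log --reverse --format='%H|%an|%cn'` (oldest commit first, with hash, author name, and committer name), that names contain no `|` character, and the function name `flag_commits` is hypothetical.

```python
from typing import NamedTuple


class Flag(NamedTuple):
    commit: str
    reason: str


def flag_commits(log_text: str) -> list[Flag]:
    """Flag commits worth a closer look in oldest-first log output.

    Each non-empty line is expected to look like "<hash>|<author>|<committer>".
    """
    seen_authors: set[str] = set()
    flags: list[Flag] = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        commit, author, committer = line.split("|", 2)
        # A brand new committer is not bad in itself, just a waypoint.
        if author not in seen_authors:
            flags.append(Flag(commit, f"first commit by {author}"))
            seen_authors.add(author)
        # Someone else's code landed under a trusted committer's name.
        if author != committer:
            flags.append(
                Flag(commit, f"authored by {author}, committed by {committer}")
            )
    return flags


# Made-up history: Adam writes a patch, Robert commits it.
sample = "\n".join([
    "a1|Robert|Robert",
    "b2|Adam|Robert",
    "c3|Robert|Robert",
])
for flag in flag_commits(sample):
    print(flag.commit, "-", flag.reason)
```

None of these flags proves anything on its own. They only mark places where a reviewer might want to look harder, which is the "smell" idea from the conversation.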
                                         
                                         So that's where I'm hopeful.
                                         
                                         I'm hopeful that we can keep
                                         
                                         open source going the way it is for
                                         
    
                                         longer. I do think that some of
                                         
                                         these risks that are coming up with
                                         
                                         state actors infiltrating through
                                         
                                         years of building trust and
                                         
                                         accidental attack
                                         
                                         vectors coming through over time, I
                                         
                                         think that people are going to start to get skeptical.
                                         
                                         Yeah.
                                         
                                         And that's going to be a tough moment. We're going to have to kind of start thinking about that. I'm starting to hear more and
                                         
    
                                         more about people who don't want to use third-party libraries for common things just because of the risk. For example, attacking a JavaScript npm package
                                         
                                         that's widely used.
                                         
                                         That does a pretty simple thing.
                                         
                                         Candidly, it's less risky to just do it yourself sometimes.
                                         
                                         And that's a calculus that companies
                                         
                                         are going to have to start thinking about.
                                         
                                         Yes.
                                         
                                         I mean, I think every developer should make that calculation every
                                         
    
                                         time they're going to pull in a dependency. And I'm not saying don't pull the dependency in, but I think
                                         
                                         you do have to think through that. I think we're learning that, and hopefully our collective immune
                                         
                                         system will react. I do think that these state actors being outed every once in a while, at least,
                                         
                                         will boost our immune system as open source
                                         
                                         maintainers to be like, let's kind of be a little more leery of the contributors who are coming
                                         
                                         around. And, you know, that whole kumbaya, open, open, open, we're-all-friends-worldwide thing
                                         
                                         that was going on when open source began, it's gone. It's just not the same world anymore. And so maybe we just won't be
                                         
                                         fooled next time, hopefully,
                                         
    
                                         by somebody who's trying to butter us up
                                         
                                         in order to take advantage of us.
                                         
                                         Do you think there's a way to
                                         
                                         label software at scale,
                                         
                                         like an XZ?
                                         
                                         If you're a
                                         
                                         contributor to XZ, do you know how
                                         
                                         much is deployed, and do you understand how crucial your core role is to that software?
                                         
    
                                         Yes and no.
                                         
                                         Yes and no, right?
                                         
                                         Probably hard to feel the actual gravity of it.
                                         
                                         Right.
                                         
                                         Right.
                                         
                                         I'm just wondering, is there a way to, and I'm literally asking the question without having put any thought into it.
                                         
                                         So if it's naive, you know, slap me around if you have to.
                                         
                                         As we do.
                                         
    
                                         Yeah. I'm just wondering, is there a way to elevate certain software
                                         
                                         without maybe even by analysis
                                         
                                         to understand its deployment
                                         
                                         or its dependency levelness, I suppose?
                                         
                                         Its scale.
                                         
                                         Like I'm sure CrowdStrike knew
                                         
                                         how at scale they were.
                                         
                                         This was not unknown to them, so this is
                                         
    
                                         not an example. But XZ and the folks behind that, who are being, you know, groomed, for lack of better
                                         
                                         terms, over a year or more, a very long, patient amount of time: do they understand how crucial
                                         
                                         the software is that they're in control of, so that they can have that position you just said? I'm just thinking, is there a way to
                                         
                                         label something, hey, you're a scaled software,
                                         
                                         you're widely deployed,
                                         
                                         and there's some way to elevate them
                                         
                                         to a different level, at least by label,
                                         
                                         so that there's an awareness
                                         
    
                                         that if there's a
                                         
                                         malicious attack on that code base,
                                         
                                         it has effects.
                                         
                                         I feel like GitHub could own that.
                                         
                                         Honestly.
                                         
                                         They know how many times a repository
                                         
                                         is committed. They know how many times
                                         
                                         it's even looked at, just page views
                                         
    
                                         in general. They know the number
                                         
                                         of stars on it.
                                         
                                         And maybe it's not GitHub.
                                         
                                         Maybe it's some other program. Maybe it's government
                                         
                                         sponsored. That goes
                                         
                                         to these maintainers
                                         
                                         and says, just FYI, you're on our list.
                                         
                                         You just made the list.
                                         
    
                                         And in a way, it's like, congratulations, you've built such valuable software.
                                         
                                         It's now a national security threat.
                                         
                                         But I hear what you're saying.
                                         
                                         I think it's hard.
                                         
                                         I think it's hard because it takes the steam out of it.
                                         
                                         It takes the altruism out of it sometimes too
                                         
                                         for some people that just want to do a good thing.
                                         
                                         When the barrier is high, then people won't do it.
                                         
    
                                         And I think that's challenging.
                                         
                                         I think the maintainers of scaled software know.
                                         
                                         I think that they're just wildly under-resourced
                                         
                                         and exhausted and can't possibly
                                         
                                         sometimes care enough anymore because they've cared so much for so long, for so little.
                                         
                                         So I think for the rest of us, I did not know how big XZ was in terms of its dependency
                                         
                                         graph, the other way around, how many dependency graphs it was in, which was many, but I'm sure that the author of XZ has an idea.
                                         
                                         Like, that's why I said yes and no.
                                         
    
                                         He may not know exactly how big his software is, but at a certain point when your package
                                         
                                         is deployed across all these distributions and stuff, yeah, you understand that like,
                                         
                                         wow, this thing is really reaching lots of places.
                                         
                                         And so I think there's some of that gravity there.
                                         
                                         But for the rest of us, that might be useful
                                         
                                         to have that list of softwares
                                         
                                         that are considered national security importance
                                         
                                         or whatever it is.
                                         
    
                                         They aren't the threat, but they are of potential threat
                                         
                                         because of their
                                         
                                         situation. I think one example of a developer who just built an open source something
                                         
                                         and took it down, not realizing the true scale of this thing, was LeftPad. Oh yeah. 2016. That one was
                                         
                                         wild. So many packages couldn't be installed, and deploys, like, stopped for hours
                                         
                                         because of that. And it was just some, I forget the exact context, but I think it was like some
                                         
                                         dispute, and he was like, I'm gonna take down the package you're using. Wasn't it political?
                                         
                                         Yeah, I don't remember exactly. I don't think LeftPad was political. LeftPad was a long time ago.
                                         
    
                                         It was a political one.
                                         
                                         He just deleted it off of the NPM package registry and then chaos ensued.
                                         
                                         I think LeftPad might have been the one
                                         
                                         where they had another package called Kik or SideKik
                                         
                                         and another company, a company, not another company.
                                         
                                         This might not be LeftPad either.
                                         
                                         But this definitely happened.
                                         
                                         There's a company, a startup
                                         
    
                                         called Kik, K-I-K, I believe.
                                         
                                         And there's a package called Kik, I think
                                         
                                         owned by the LeftPad owner,
                                         
                                         if it's coming back to me.
                                         
                                         And the Kik company contacted
                                         
                                         NPM and wanted the name,
                                         
                                         but didn't have the package name.
                                         
                                         And I think NPM granted them
                                         
    
                                         access to the Kik package name,
                                         
                                         basically kicking it off the LeftPad owner.
                                         
                                         And then they got mad and just pulled LeftPad
                                         
                                         and all their stuff.
                                         
                                         I think they pulled all their stuff.
                                         
                                         I'm pretty sure that's LeftPad.
                                         
                                         That may be a different one
                                         
                                         because there's been so many at this point,
                                         
    
                                         but that definitely happened.
                                         
                                         I have the, there's a Wikipedia page for it.
                                         
                                         Is there?
                                         
                                         NPM LeftPad incident that I just found.
                                         
                                         And yeah, you're right on the money
                                         
                                         with what you just said.
                                         
                                         But you know what's kind of crazy about that?
                                         
                                         And it kind of goes back to what I was saying
                                         
    
                                         about own your software a little bit more.
                                         
                                         LeftPad was not a thing
                                         
                                         that needed to go out over a network
                                         
                                         and download a package and pull it down.
                                         
                                         Any engineer should be able to write
                                         
                                         what LeftPad did.
                                         
                                         Absolutely.
                                         
                                         Or copy-paste the function.
                                         
    
                                         It was like a...
                                         
                                         Or that, yeah.
                                         
                                         Because I mean,
                                         
                                         you can use somebody else's code
                                         
                                         with a little copy paste
                                         
                                         and remove that dependency.
                                         
                                         And because,
                                         
                                         not because you can't trust the author,
                                         
    
                                         but because we cannot trust the network.
                                         
                                         Right?
                                         
                                         That's the problem with NPM.
                                         
                                         We can trust the authors in most cases,
                                         
                                         but we cannot trust the network
                                         
                                         into the future.
                                         
                                         You can maybe trust it today,
                                         
                                         but you cannot trust the network
                                         
    
                                         tomorrow. And so, copy
                                         
                                         paste that sucker. Vendor it. I mean, that's
                                         
                                         what we used to call it in the real world, vendor it.
                                         
                                         Which is to pull it into your repo,
                                         
                                         check it in, and leave it there.
                                         
                                         I remember doing that.
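To make the "any engineer should be able to write it" point concrete, here is a minimal sketch of a left-pad helper of the kind you could vendor into a repo. It is not the code of the original npm package; the name `left_pad` and its parameters are assumptions chosen for illustration.

```python
def left_pad(text: str, length: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is `length` characters long.

    Illustrative sketch only: the name and signature are assumptions,
    not the API of the original npm package.
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    missing = length - len(text)
    if missing <= 0:
        # Already long enough: return the input unchanged.
        return text
    return fill * missing + text


print(left_pad("42", 5, "0"))  # prints 00042
```

In Python specifically, the built-in `str.rjust` already covers this, which is part of the point: the function is small enough to own rather than fetch over the network.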
                                         
                                         Did you see that one? It was a couple weeks ago that a domain
                                         
                                         expired that was hosting
                                         
    
                                         a JavaScript package.
                                         
                                         Polyfill.
                                         
                                         Someone else bought the domain, put something
                                         
                                         not good there.
                                         
                                         Same domain
                                         
                                         path
                                         
                                         and all these websites that were resolving
                                         
                                         that domain to the new source
                                         
    
                                         were impacted. It was like 100,000 websites.
                                         
                                         You can't trust the network.
                                         
                                         Yeah, so that's a good way.
                                         
                                         You can't trust the network.
                                         
                                         I think it's a good way.
                                         
                                         Especially over time.
                                         
                                         Yeah.
                                         
                                         Because that's what we think of today,
                                         
    
                                         but over time the network changes.
                                         
                                         In ways that we wouldn't expect.
                                         
                                         Like nobody expected polyfill.io to change ownership.
                                         
                                         Yeah.
                                         
                                         Or CDN, whatever the CDN that was hosting polyfill.io.
                                         
                                         Right.
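One way to avoid trusting a third-party script host over time is to pin the exact bytes you reviewed. The speakers do not mention it, but browsers support this through Subresource Integrity, where a script tag carries an `integrity="sha384-..."` value and the browser refuses content that does not match. The sketch below only shows how such a value is computed and compared; the function names and the sample script contents are assumptions.

```python
import base64
import hashlib


def sri_hash(content: bytes) -> str:
    """Return a Subresource Integrity style value: 'sha384-' + base64(SHA-384 digest)."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def matches(content: bytes, expected: str) -> bool:
    """True if `content` still hashes to the value pinned earlier."""
    return sri_hash(content) == expected


# Hypothetical script contents, standing in for a file served from a CDN.
reviewed = b"console.log('polyfill');"
pinned = sri_hash(reviewed)

print(matches(reviewed, pinned))                 # prints True
print(matches(b"console.log('evil');", pinned))  # prints False
```

If the domain changes hands and starts serving different bytes, the comparison fails and the script is rejected instead of silently running.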
                                         
                                         We put some stuff through proxy, basically.
                                         
                                         And that kind of does it.
                                         
    
                                         You proxy yourselves and
                                         
                                         the gems and some stuff
                                         
                                         and that way it's kind of a,
                                         
                                         if it's there, we trust it
                                         
                                         kind of thing, right? You know, if you try to pull
                                         
                                         something else in a bundle
                                         
                                         install, yarn install, whatever it is,
                                         
                                         go get, it goes through
                                         
    
                                         there, and if it's not there, then it kind of
                                         
                                         triggers a, well, why are you trying to get something that
                                         
                                         isn't in this? You know, it's not blessed yet. It's a proxy that you guys run? Yes. Is this like
                                         
                                         a, like, Artifactory kind of thing where you pull? Yeah, some other. I forget the exact tech, if I'm
                                         
                                         honest, but yeah. But similar to the JFrog Artifactory? Yeah, yeah. It's a great idea,
                                         
                                         just get yourself layers in between you and the unknown.
                                         
                                         I mean, that's
                                         
                                         one of the wise practices
                                         
    
                                         for sure.
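The "if it's there, we trust it" rule described above can be sketched as a simple allowlist check. This is not how Artifactory or any particular proxy is implemented; the `BLESSED` set, the package entries, and the function name are all hypothetical.

```python
# Hypothetical allowlist of (name, version) pairs that have been reviewed.
BLESSED = {
    ("left-pad", "1.3.0"),
    ("rack", "3.0.8"),
}


def unblessed(requested, blessed=BLESSED):
    """Return the requested (name, version) pairs that are not on the allowlist."""
    return [pkg for pkg in requested if pkg not in blessed]


# A hypothetical install request: one reviewed package, one unknown one.
wanted = [("left-pad", "1.3.0"), ("shiny-new-thing", "0.0.1")]
print(unblessed(wanted))  # prints [('shiny-new-thing', '0.0.1')]
```

A non-empty result is the trigger for the "why are you trying to get something that isn't blessed yet" conversation, before anything is fetched from the public registry.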
                                         
                                         Well, that's like the,
                                         
                                         I guess,
                                         
                                         rich man's version
                                         
                                         or rich person's version
                                         
                                         of vendoring.
                                         
                                         It's like the same idea
                                         
                                         except for it's
                                         
    
                                         Yeah, you vendor it
                                         
                                         to a server.
                                         
                                         It's vendoring.
                                         
                                         I mean, this has been
                                         
                                         the tale as old as time,
                                         
                                         basically.
                                         
                                         Ruby had it first.
                                         
                                         Well, like I said,
                                         
    
                                         what's old is new again.
                                         
                                         Yeah, exactly. We're going to go back to all these ideas in some way, shape, or form, I think. We're going back to thin clients,
                                         
                                         apparently. So, I mean, I think even that too, you have to have an incident like this to have a
                                         
                                         discussion like this that says these older ideas that were probably pretty good, you know, maybe at
                                         
                                         the time it was like less modern to do it. Now it's more modern. So maybe there's, but I suppose to your
                                         
                                         point, Jer, with your meme,
                                         
                                         like I deployed software today,
                                         
    
                                         so it's modern, right?
                                         
                                         Like when you have a meme
                                         
                                         out there somewhere,
                                         
                                         there's like.
                                         
                                         Oh yeah.
                                         
                                         Just mostly a gripe.
                                         
                                         Like people always advertise
                                         
                                         their software as modern,
                                         
    
                                         which just literally means
                                         
                                         that it's just a newer thing.
                                         
                                         You know, like it's not a feature.
                                         
                                         It's just that you started
                                         
                                         coding it six months ago.
                                         
                                         Right.
                                         
                                         You know.
                                         
                                         At some point,
                                         
    
                                         someone's going to start bragging
                                         
                                         about how much their software hasn't changed.
                                         
                                         Yeah, I think vintage software should make a move.
                                         
                                         This is classic, this is vintage.
                                         
                                         When I was a young gun engineer
                                         
                                         and I heard about these banks using COBOL still
                                         
                                         and I was like, ah, what losers.
                                         
                                         And now I'm like, hey, whatever.
                                         
    
                                         If it works, I can look at my balances
                                         
                                         and I've never had an issue and I can always charge my card.
                                         
                                         You do you.
                                         
                                         Maybe calcified software
                                         
                                         has a purpose in the world
                                         
                                         where it just gets rarely touched
                                         
                                         and we're just happy about that.
                                         
                                         I'm leaning that way more and more.
                                         
    
                                         Do we need to keep changing the software? I don't know.
                                         
                                         That's not really good for your business
                                         
                                         though, Robert.
                                         
                                         I mean, if you advocate for that.
                                         
                                         Robert's out there.
                                         
                                         More incidents.
                                         
                                         We need more incidents.
                                         
                                         My investors, my board hears that.
                                         
    
                                         They'll be like, what are you doing?
                                         
                                         What are you saying, Robert?
                                         
                                         Stop right now.
                                         
                                         Well, I think even if you have unchanged software,
                                         
                                         there's still bound to be incidents of some sort.
                                         
                                         I mean, there's still going to be...
                                         
                                         No one's going to listen to you, Robert.
                                         
                                         No one's going to do that, right?
                                         
    
                                         I recommend that.
                                         
                                         Yeah.
                                         
                                         Well, this has been fun digging into the details, I think.
                                         
                                         You know, it's fun to speculate out.
                                         
                                         You know, I do want to, again, mention I love Dave Plummer and his channel on YouTube.
                                         
                                         He's a great resource.
                                         
                                         I always appreciate what he shares.
                                         
                                         I probably listened to his video twice, just making sure I kind of understood some of the mechanics behind it, because I really want to understand, like, to what degree does this software actually operate on Windows, you know, how this incident propagated. You know, we don't know if it was really bad code
                                         
    
                                         or if it was sabotage or if it was some sort of plan.
                                         
                                         That's all speculation that we're not trying to really go through here.
                                         
                                         But sort of like, hey, if you're out there and you've been affected by this
                                         
                                         or you're just curious, you know, go out there and do your own investigations.
                                         
                                         Pay attention to what's happening out there.
                                         
                                         And I guess we can look forward to George Kurtz, the CEO,
                                         
                                         current CEO of CrowdStrike, who was there
                                         
                                         at the helm during this incident
                                         
    
                                         to stand before Congress
                                         
                                         and explain exactly
                                         
                                         what happened. And maybe then we'll know.
                                         
                                         Talking about security theater. Right. Until then,
                                         
                                         all we can do is speculate what may have happened.
                                         
                                         We can, you know,
                                         
                                         use the, they're not called dumps. What are they
                                         
                                         called? Are they called dumps whenever it's a
                                         
    
                                         kernel panic? Well, you dump the stack.
                                         
                                         Yeah.
                                         
                                         It's not a stack trace,
                                         
                                         because that's like an application kind of thing.
                                         
                                         Kernel panic.
                                         
                                         Yeah, exactly.
                                         
                                         You can examine that.
                                         
                                         And there's lots of folks,
                                         
    
                                         there was a famous tweet out there
                                         
                                         that made the rounds explaining that,
                                         
                                         you know, this one file was updated,
                                         
                                         and while it should have had the needed definition in there,
                                         
                                         instead it just contained zeros because of a null pointer.
                                         
                                         There's all these things like why this actually happened.
                                         
                                         But I think in the end, we can just say at scale, software can have massive effects.
                                         
                                         And we got to do something about that.
                                         
    
                                         It's a good thing to have scaled software, but at the same time, we have to do updates responsibly. Or in this scenario where you have a kernel-level driver,
                                         
                                         how do you do what CrowdStrike wants to do with Falcon
                                         
                                         but not bypass the security systems?
                                         
                                         That's the real question here, specifically for this incident.
                                         
                                         I think for others, it's just love your maintainers if it's open source.
                                         
                                         If it's not open source, drag them through Congress and make them explain it.
                                         
                                         You know, and slap them around a little bit.
                                         
                                         You know, otherwise, just do what you can to stay safe.
                                         
    
                                         You know, scrutinize your dependencies, your third parties, etc.
                                         
                                         And that's about it for me.
                                         
                                         And run Linux on your desktop.
                                         
                                         I mean, that's the way.
                                         
                                         This is the way.
                                         
                                         Write Rust, run Linux,
                                         
                                         and you'll be good to go.
                                         
                                         And then let all of us know about it.
                                         
    
                                         Once they figure out their audio drivers to come on
                                         
                                         this show, it'll be great to hear their experience.
                                         
                                         Well, every time we have
                                         
                                         a Linux user, we're always happy,
                                         
                                         obviously, and then sad.
                                         
                                         Because we expect to have
                                         
                                         some version of issue because
                                         
                                         of drivers.
                                         
    
                                         It's almost unanimous.
                                         
                                         Almost unanimous.
                                         
                                         Well, thanks so much for having me.
                                         
                                         This was a blast.
                                         
                                         I think it was a fun topic to talk about and super interesting.
                                         
                                         For sure.
                                         
                                         Thanks for joining us.
                                         
                                         Yeah, Robert.
                                         
    
                                         It's been fun.
                                         
                                         Bye, friends.
                                         
                                         Bye, Robert.
                                         
                                         Well, friends, here we are again at the end of a busy and interesting week in the software world,
                                         
                                         which more and more is the whole world.
                                         
                                         Do you have thoughts?
                                         
                                         Do you have opinions?
                                         
                                         I know you do.
                                         
    
                                         We would love to hear them.
                                         
                                         Sound off in the comments.
                                         
                                         Link in the show notes.
                                         
                                         Oh, and stick around, Changelog++ members.
                                         
                                         This is yet another extended
                                         
                                         episode. We love doing these for our most loyal supporters. Oh, and by the way, if you are a
                                         
                                         Changelog++ member, maybe sign in to changelog.com using your plus plus email address and see if you
                                         
                                         see anything new on your homepage. I won't say more than that for now, but we'll talk details soon enough,
                                         
    
                                         probably on the next Kaizen. Okay, quick thanks again to our partners at Fly.io,
                                         
                                         to Breakmaster Cylinder, to Sentry (use code CHANGELOG), and to you, of course, for listening along.
                                         
                                         Seriously, we appreciate it. Next week on the Changelog, news on Monday, Joseph Jacks from OSS
                                         
                                         Capital on Wednesday, and Adam is flying solo on Friday, but he has a very special guest,
                                         
                                         the author of his favorite book series, the Bobiverse.
                                         
                                         Yes, Dennis E. Taylor joins the show.
                                         
                                         Have a great weekend.
                                         
                                         Leave us a five-star review if you dig our work,
                                         
    
                                         and let's talk again real soon.
                                         
                                         So during the main show, I did not ask you about this, nor did we directly reference it, but it was a reference point for me.
                                         
                                         You wrote something the same day as this incident, which I think was July 19th, 2024:
                                         
                                         "Beyond the Headlines: The Unsung Art of Software Outage Management." And rather than
                                         
                                         It's better
                                         
