Risky Business - Wide World of Cyber: Why we should show CrowdStrike no mercy

Episode Date: July 30, 2024

In this episode of Wide World of Cyber, Risky Business host Patrick Gray discusses the recent CrowdStrike incident and its implications for security software that operates in kernel space with Chris Krebs and Alex Stamos of SentinelOne, a CrowdStrike competitor. The conversation also delves into Microsoft’s role in this whole disaster and the potential changes it could make to its operating system to prevent similar incidents in the future. A video version of this episode is also available on YouTube!

Transcript
Starting point is 00:00:00 Hey everyone, and welcome to another edition of the Wide World of Cyber. My name is Patrick Gray. Wide World of Cyber is the podcast we do here at Risky Business HQ with Chris Krebs and Alex Stamos of SentinelOne. This is a joint production between SentinelOne and Risky Business Media. And yeah, we love to do it. Chris was, of course, the first director of the CISA agency in the United States before he went on to co-found
Starting point is 00:00:36 KSG, the Krebs Stamos Group, with our other guest, Alex Stamos, who has served as the CISO for Yahoo and Facebook and has done all sorts of interesting stuff. Also the founder of iSEC Partners back in the day. He has been around the industry for a long time. Yes, Mr. Alex Stamos. So Chris, welcome. Alex, welcome. Thanks, Patrick.
Starting point is 00:00:58 Hey, Patrick. Good to be with you again. So let's have a chat, shall we? Because it's been an interesting old news cycle recently. I can't believe we're doing it this week. We weren't going to cancel just for being bored. Yeah, so of course, you know, we've got to talk about this CrowdStrike thing and, you know, what it means for EDR, what it means for software resilience. Obviously, and I'm going to say it, you know, SentinelOne is a competitor, a direct competitor of CrowdStrike. I would not want people to think that this is ambulance chasing. That's not what this is. I mean, Chris and Alex here are two of the world's foremost commentators on cybersecurity issues. They happen to work at SentinelOne. I think that puts them in
Starting point is 00:01:41 a great position to talk about this. I mean, let's just kick it off there. Let's kick it off with you, Alex, because, you know, of the two of you, you are the most technical, right? So we've been talking before we got... Right, I will concede that, Pat. I will concede that. Okay, thank you. Thank you, Chris. Uh, you know, you and I got talking even before we got recording, and we both agree that this failure of CrowdStrike to actually test these content updates, thus causing a kernel panic and a blue screen of death across all CrowdStrike machines, is just sort of inexplicable and bizarre. Like, that was my take in the weekly Risky Business show. You agree with that. I mean, it is just weird. And you have an inside view
Starting point is 00:02:23 because you work for, you know, a similar company. Talk to me about that. That's right. So first, like you said, we're direct competitors of CrowdStrike. You can tell this because if you go to CrowdStrike.com right now and you mouse over "Why CrowdStrike," there's a whole section of CrowdStrike versus SentinelOne where they talk about what's wrong with our product. So certainly they think we're competitors.
Starting point is 00:02:43 So I want people to have that out there. Okay, so I'm just gonna say it. The narrative that immediately came out after CrowdStrike took out at least 8 million computers and shut down a big chunk of the global economy, including stranding me personally for hours and hours in an airport in which I got minor food poisoning from terrible fish and chips. When CrowdStrike did that, the immediate emerging narrative from them and from their proxies on LinkedIn and in social media was, this could happen to anybody. That is false. That is false. This could not happen to anybody. CrowdStrike has made intentional architectural engineering and QA decisions that made this
Starting point is 00:03:24 happen. They were negligent in their engineering decisions and their QA decisions. They created this problem for themselves and for the world. And it is dangerous for them to spread that idea that this could happen to anybody, because what they're doing is they're planting this idea in the minds of CEOs around the country, around the world, that security products are inevitably this incredibly dangerous, that it is not worth it to protect yourself from ransomware actors or state actors, because it is more likely that your company that you're paying millions of dollars to, that they're the ones who are going to take you out versus the bad guys.
Starting point is 00:03:56 So that's, I think, my core message here is this was preventable. And in fact, the vast majority of actors in the space are much more responsible than CrowdStrike was in the way that our products are designed and tested and deployed. And that's why, while there are problems with other products, they are never as widespread or as destructive as this problem. I'm happy to get into the details, but that's kind of the base issue here. So the thing that I can't wrap my head around, right, is I can understand, I can definitely understand how CrowdStrike's product architecture evolved the way it did. 100% understand because, you as long as you recognize that that's a risky sort of architecture and you put the sort of compensating tests around it.
Starting point is 00:04:52 And indeed, there are advantages even today for the way that they've structured their product. But again, there's a lot of risk there. And they absolutely should have been doing dynamic testing. Let me ask you this. When SentinelOne pushes even just a signature update, and this idea that, oh, well, a signature update or a content update, as they call it,
Starting point is 00:05:15 could never have caused this sort of thing, and it's this really unexpected thing. This is something we've seen time and time again with antivirus companies, EDR platforms, all sorts of stuff. I can think of incidents involving Sophos, McAfee, all sorts of people. So I'm guessing that most competing firms, yours included, I mean, you do dynamic testing on all sorts of updates, I would imagine. And then you probably stagger rollouts into different rings, right? Right. So all anti-malware products, and so we're talking about from traditional AV
Starting point is 00:05:45 all the way up to the most advanced EDR, XDR products, of which CrowdStrike and SentinelOne are that category, have both the base code and then some kind of rule set. And the truth is, this is a complicated world in which it used to be for AV, the rule sets were these basic static files that were being parsed by code, and then read, and then turned into some kind of memory structure. And now it is some kind of executable code, right? It is something more complicated. What CrowdStrike has done is they have pushed a huge amount of their intelligence into a kernel module, right? Now, every product, including ours, if you want to be truly secure, has to, on Windows at least, have some
Starting point is 00:06:26 component that is a kernel module. And there's really kind of three functions that you have to have to do that. One is you have to tap into certain events that are only really available in the kernel. If you want to do certain interdiction of events, because kind of for modern EDR, you don't want to just only shut down a process or only shut down the machine. You want to be able to kind of intelligently shut off processes from doing certain things. And you can only do that from inside the kernel. There is not a user mode API to be allowed to do that. And you're trying not to do things like hook and inject into every single process on the
Starting point is 00:06:56 machine because that causes stability issues. And so you can only do that from the kernel. And the third and the most important is tamper resistance, is that there is a constant battle with the high-end threat actors, including the ransomware actors, and that there is a constant black market that, if you watch the discussion groups and such, is that you'll have the LockBits and the equivalents, the BlackCats and such, discuss, I've got a CrowdStrike bypass. I've got a SentinelOne bypass. I know how to turn these guys off. And so they're constantly looking for ways to shut us down so they can try to do something. And then we're constantly looking for what they're doing and then updating our systems.
Starting point is 00:07:28 And the only way you can protect yourself on Windows right now is from inside the kernel. Right. And so everybody's got a kernel module. But what you can do is you can build your kernel module to only do the minimal things, to tap into those APIs, to hook into the things that you absolutely have to, keep that as stable as possible, put as minimal logic as possible, and push all of your AI, your ML, your parsing, all of your logic, anything that parses any rule set, push that all into user mode, and then create very well tested, very robust IOCTLs and other kinds of APIs between the
Starting point is 00:08:04 kernel and user mode and then test the crap out of those. What's amazing here, Alex, is that I'm a journalist and I know this, you know, because like the Airlock Digital guys who, you know, they make an allow listing solution that's, you know, terrific. They're Australian guys. And I remember years ago, they were like, yeah, you know, because we do this through a kernel driver, you know, we're really just wondering what, you know, what we can do to really make this rock solid, stable, whatever. So they went to Silvio Cesare, who's a world renowned researcher in Australia who knows a lot about kernels. You know, he's globally renowned for being a kernel security expert. And they dumped a pile
Starting point is 00:08:37 of money on Silvio. And it's just really interesting what you said, because Silvio came back and he said, there's a few things you're doing in the kernel that you don't need to be doing in the kernel, so you need to move them out. And yeah, you know, so they were thinking very early on, like, how do we just get as much risk out of here as possible? And in the end, you know, their kernel driver is like 60 kilobytes. Obviously it's a much simpler product than something like fully featured EDR, but it goes back to those principles, which is if you're going to be in the kernel, you know, this is the way you need to think about it. But again, the question that I asked you
Starting point is 00:09:10 wasn't about their risky architecture. It was about testing. Because this is the part that doesn't make sense to me. And you do test, you know, so when SentinelOne rolls out, like even just a signature file, there is a dynamic test, is there? And tell me what that process actually looks like for a CrowdStrike competitor. Right. And so this is one of the differences
Starting point is 00:09:31 between us, and I can't speak for everybody, right? But I'm looking actually at an internal slide deck right now where we talk about how even for our content files, we do a ton of testing that obviously CrowdStrike doesn't. And so CrowdStrike has, they've released this slide deck where it shows that their QA, they have sensor content and rapid response. And for sensor content, they do all this different stuff. And then rapid response, it says, they have template instances and they have checks that are being performed and that's it. And the checks obviously didn't happen. And effectively they released this PIR, this initial, you know, this incident report that is hundreds and hundreds of words to basically say we never ran this on a Windows machine. Right.
Starting point is 00:10:11 Because it instantly blue screens. So clearly it never touched an actual running Windows machine, which is amazing, which is the only possible way this could have made it out. No, no. I mean, I never actually. I know you. I know you heard our show. Yeah. I know you heard our show. This was exactly our conclusion,
Starting point is 00:10:28 which is like they did not even like put this on. Anyway, it's the mind it boggles. Look, I want to bring Chris into the conversation here. So, but I'm sorry, you did ask. And so we do a bunch of different testing.
Starting point is 00:10:39 So we do our own testing. We roll it out to our own testing machines, including the live updates, right? So it runs on actual Windows machines on a bunch of different pieces of hardware. The modal, so again, EDR, we break things. I'm just going to admit it. SentinelOne has bugs. We have broken things. The modal break is you have a conflict with something that's specific to a specific piece of hardware or a specific customer, right? So that's what will be much more likely is that you have like a specific Dell driver
Starting point is 00:11:05 that it doesn't like, or we'll have a customer where they have a really specific piece of software that only that company has. It's custom that our detection rules will detect and it's a line of business software and it's critical. And so we will have test rules that will include that stuff
Starting point is 00:11:19 so that we don't, you know, oops, we screwed up once. We never want to make that mistake again, right? So when we do content updates, all of those tests happen. Thousands and thousands of tests happen on real virtual machines and real hardware before anything else happens. Then we roll it out to ourselves. We dog food those rules before we roll it to anybody else.
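The gating described in this part of the conversation (tests first, then internal dogfooding, then small percentages of customers with an automatic stop when telemetry looks bad) can be sketched roughly as follows. This is a minimal illustration, not SentinelOne's or anyone else's actual pipeline: the `Ring` type, the `deploy_and_check` callback, and the ring names are hypothetical, and the 0.98 default simply mirrors the "98% positive rate" figure mentioned in the discussion.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Ring:
    """One rollout stage. Names and sizes are illustrative, not from any vendor."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    deploy_and_check: Callable[[str], bool],
    min_healthy: float = 0.98,
) -> tuple[bool, list[str]]:
    """Push an update ring by ring; halt at the first ring below min_healthy.

    deploy_and_check(host) stands in for "install the update, then read
    telemetry": it returns True if the host is still healthy afterwards.
    Returns (fully_rolled_out, names_of_rings_that_completed).
    """
    completed: list[str] = []
    for ring in rings:
        results = [deploy_and_check(host) for host in ring.hosts]
        healthy = sum(results) / len(results) if results else 1.0
        if healthy < min_healthy:
            return False, completed  # stop: later rings never get the update
        completed.append(ring.name)
    return True, completed
```

With rings ordered as lab machines, internal dogfood machines, then a small customer slice, an update that crashes every host it touches would stop at the lab ring and never reach a customer machine.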
Starting point is 00:11:37 We beta test them. And then we have telemetry where we roll it out to small percentages of customers and then get the telemetry back. And if you don't have like a 98% positive rate, it automatically stops, right? It is clear CrowdStrike did none of that because any of those steps that I just laid out would have stopped this, any single one of them. And now they're announcing like, oh, we're going to do some of this. This is super basic. This is like 2015. And this isn't like something SentinelOne invented. This is like what anybody who's ever done high quality engineering has done for decades. It's just
Starting point is 00:12:12 kind of mind boggling for me that you could possibly ever do something where you're like, I'm going to ship code that's going to go run in millions of Windows kernels. I'm just going to run it by like a Perl script and then YOLO it out to millions of machines. It's just mind-bogglingly negligent. And it's shocking. And the idea that people are going out and saying anybody could do this is just a complete lie. So one thing, one thing though,
Starting point is 00:12:36 they do dog food, as I understand it, they actually do dog food their updates, but they're a Mac shop. Yeah, great. Oops. Which is amazing, right? Like when you're like, why didn't they, you know, catch it? It's like, oh my God, guys, you just don't get it. And anyway, like, they do have Windows machines because they're a public company. This is a challenge of every public company, is you have Windows machines because you have to have, like, 64-bit Excel. But nobody was up at 4 a.m. on a
Starting point is 00:12:58 Friday. Yeah, right. None of their people. Yeah, it was crazy, right? Because, you know, as listeners who might be new listeners might not know, but yes, I am based in Australia, thus the accent. And yeah, we copped it during the middle of business. You know, it was like 2 p.m. on a Friday sort of thing. It was crazy. So we were one of the first when, yeah, just social media is kicking off. And, you know, it goes from mysterious blue screens everywhere to the prime minister will address the nation sort of thing, like, in a few hours. Absolutely not. So, Chris, I want to bring you into this, right, because, and I want to bring you in to
Starting point is 00:13:29 talk about the Microsoft side of this, because yes, what CrowdStrike did was mind-numbingly, like, bad, right, as Alex has established. And again, you know, people might say, oh, is this ambulance chasing? I don't think it is. But is this you guys kicking CrowdStrike when they're down? Yes, but also they deserve it. So we've established all of that. But then there's the whole Microsoft dimension to this, right, where, you know, there's been a lot of discussion about, you know, EU rulings, which have said that Microsoft can't just kick everyone out of the kernel and give itself an unfair advantage in its security products. And we've also got Microsoft saying, well, we obviously need to change Windows so that this doesn't happen.
Starting point is 00:14:09 You know, if I'm Microsoft at this point, I'm pretty mad that this has happened because it has reflected poorly on my company. And I'm thinking, well, how do I stop this from happening again? And again, if I'm Microsoft, I don't really care too much if it's going to make security companies sad. What I do care about, though, is the reaction of competition regulators if I start doing things that could disadvantage security companies. Why don't you walk us through the whole dynamic here that's at play with Microsoft and regulators and, you know, what sort of steps they could take to, you know, change their operating system or the way they do things without getting in trouble with the FTC?
Starting point is 00:14:46 Yeah. Look, I mean, I think there's a baseline initial question of was Microsoft really aware of the extent at which CrowdStrike was really playing around in the kernel? And it's not clear to me that they were aware. And so they're going to be taking a hard look at what CrowdStrike was doing. They're probably going to take a look at a whole bunch of other vendors as well. And the natural reaction, and I think this is something that Alex has talked about elsewhere, is locking it down. I mean, that's the natural, right, almost like biological response of like,
Starting point is 00:15:22 if somebody hurts you, you're going to stop them, you know, allowing them into that space where they're hurting you. The problem is, as you pointed out, is the 2009 agreement with the European Union or the European Commission where Microsoft is not allowed to shut that, shut access down. They have to provide access to other security vendors, the same access they have. That's what's holding now. And I think that's probably going to be the initial kind of friction on them walling it off. That said, these agreements can change. The United States is not necessarily bound by that agreement. And there's a possibility that you could have some different treatment here, or there could be some further negotiation. Euro windows. Sounds horrible.
Starting point is 00:16:07 I doubt that would happen based on kind of my experience. Well, I think also if Microsoft were to do something to give itself an advantage in this market, like there's going to be FTC problems. So that's the second piece, right? Yeah. Yeah. I mean, that's the second piece because I think there's with the, you know, with the prior security challenges, we'll just kind of leave it there, that Microsoft has faced over the last couple of years and some of the broader attention to IT monocultures. The United States Congress is taking a hard look at tech competition issues, you know, from the hardware all the way through productivity and security. The FTC, the SEC may be looking at these issues. So I would think that Microsoft has a very delicate dance to do here so that they do not exacerbate any of the potential antitrust
Starting point is 00:17:01 issues, that they do not walk further down that line because that is something that they dealt with at the turn of the century and they don't want to deal with that again. All that said, I just do not see Microsoft just kind of chalking this one up and saying like, ah, that's a mulligan. You got a mulligan, guys. Yeah. So something's going to change. And I think Alex, you know, we've talked about VBS enclaves and things like that. Like, I just, I don't know where this goes, but I do know that there's a lot of external pressure from customers saying, how the hell can you allow this to happen? But there's also a lot of pressure from, as you've mentioned, the oversight regulators in the, in the enforcement agencies that are going to be
Starting point is 00:17:45 sitting there going like, hey, guys, you still have to play fairly with the rest of the ecosystem. And the third aspect here is that it's actually worse for, I think, the broader ecosystem if they were to lock it down, because that takes us further down that monoculture road or path. And diversity right now is a good thing in this space. And we've talked about that time and time again. Diversity in the security space is a good thing because what you have otherwise is a bunch of slices of Swiss cheese where the holes actually just line up and things will pass right through. Yeah. I mean, it's a tricky one. And I want to talk to you about this too, Alex, because, you know, I had an interview the other day with the Airlock digital guys and that's
Starting point is 00:18:34 running, that's running next week. And I have been talking to them a lot because obviously as a security vendor with a presence in the kernel, like, and they're my friends, of course, I've been talking to them about this a lot and, excuse me, yeah, so, you know, they were saying, like, for their purposes, too, it's a little bit different because they're not a full EDR thing. As long as Microsoft introduced the right kind of API, they would mostly be okay. But they were making the point that for EDR vendors, they're not going to like it if Microsoft goes the way of macOS and releases some sort of generic endpoint security API. I mean, the point David Cottingham made to me is like, if they do anything, like use a little bit too much system resources or whatever, macOS just kills them, right? Because you're affecting the user experience. So they're just like, no, no security software, goodbye.
Starting point is 00:19:20 I mean, Microsoft is unlikely to make the same sort of design decisions there. But the point is, you're going to lose an awful lot of flexibility if Microsoft, you know, even if they're doing equitable sort of access to this API, because they would need to do the same for Defender as they're doing for, you know, Defender's competitors. It would need to be an even playing field to avoid the FTC causing them drama. But how do you think they could even do this technically, I think is the question, right? Because I don't really know how they could lock everyone out of the kernel and still give startups and existing vendors room to innovate, if that makes sense. What do you expect them to do here? Yeah, so I think there's two or three options.
Starting point is 00:20:06 So one would be an Apple-like direction. SentinelOne works on Apple Silicon Macs and it has been a challenge, but we would be willing, our position here is we are willing to work with Microsoft if they want to go that direction, if they hold themselves to the same standard. If we are held to a second-class standard,
Starting point is 00:20:22 then it'll never work, right? Because they will use this as an opportunity to make Defender the only EDR that works. Certainly, they're going to try that. I'm just going to throw it out there. Microsoft is going to try to use this as an excuse to make Defender the only EDR product that works. And so it is really important for both enterprises, the government, and for third parties like
Starting point is 00:20:39 us to say, to point out when Microsoft makes that move, that if they want to do better, we're happy to be a design partner there. And we're happy to beta test and alpha test with them of doing something better. So I do think an Apple-like model is possible. It'll have to be very carefully designed, but it is theoretically possible to do this stuff in user space. Customers are very performance sensitive, right?
Starting point is 00:20:59 Like we get lots of tickets of, you changed my performance in this workload by one and a half percent. And so that will be an issue. But that is something that could be worked out collaboratively with Microsoft if they're willing to do so in a reasonable way. In the short term, I think a better process would be for them to work collaboratively with the EDR vendors to make sure that we're not doing the same stupid stuff CrowdStrike did. CrowdStrike effectively bypassed the whole purpose of the WHQL process, right? So when you submit a kernel driver to Microsoft, they test your driver to make sure it doesn't crash the kernel. And then CrowdStrike
Starting point is 00:21:33 was going and doing all this dangerous stuff in the driver that we don't do. Microsoft should go revoke CrowdStrike. CrowdStrike broke their promise to Microsoft, right? Look, I'm going to actually push back on you a little bit there in CrowdStrike's defense, because what they gain through this architecture that they've got is an awful lot of flexibility. They've got an awful lot of performance advantages, right? So I'm not actually as critical of their architecture as you are. The thing that I'm critical of is that they didn't actually recognize,
Starting point is 00:22:02 they don't seem to have recognized or have forgotten how risky it is, and they didn't do the testing. Look, if you're doing adequate testing, and you pointed out before that even like a basic dynamic testing regime, you're going to miss edge cases, right? You're going to hit some box that's using some obscure software. It's not going to play nice. There might be a crash, but you're not going to blue screen all of your customers if you're doing some rudimentary testing. So I think that you can have an architecture like that and have some compensating controls like testing. So again, I'm not as critical of their architecture. I'm just thinking if you're going to operate like that, you need to be careful and they weren't,
Starting point is 00:22:36 and that's baffling. So what I think Microsoft could do in the short term is they could amend their WHQL guidelines to say, if you want to have an EDR driver signed by us, you have to pull all this dangerous stuff out of the kernel and put it in user mode. And that should be a requirement they put in place in the next 90 days, 180 days, something like that. That would be a reasonable short-term solution. And then what they're already working on,
Starting point is 00:22:58 which would be a fascinating place, is eBPF in the kernel. So we have an agent that uses eBPF on Linux, so you can run SentinelOne in a Linux container, which obviously can't run kernel modules. And it works great. It's super cool. It technically runs in the kernel, but it's running in a safe VM. Obviously, there's interesting performance issues with just-in-time compile. There's interesting interactions with hypervisors and stuff. But I think that's a fascinating direction. Again, as long as Microsoft Defender uses it. So if Microsoft wants to go that direction, I think that's awesome. But it needs to be something that Microsoft holds their own security products to that same standard.
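The "safe VM" idea mentioned here is that a program is checked before it is ever allowed to run, so that a bad program gets rejected instead of crashing the kernel. The toy sketch below illustrates only that principle. It is not the real eBPF instruction set or verifier, which are far more sophisticated; the three opcodes (`load`, `add`, `jump`) and both function names are made up for illustration.

```python
from typing import Sequence, Tuple

Instr = Tuple[str, int]  # (opcode, operand); a made-up instruction set


def verify(program: Sequence[Instr], mem_size: int) -> bool:
    """Accept a program only if it provably stays in bounds and terminates."""
    for pc, (op, arg) in enumerate(program):
        if op == "load":
            # Every memory read must use a constant, in-range index.
            if not 0 <= arg < mem_size:
                return False
        elif op == "jump":
            # Forward-only jumps rule out loops, so execution must terminate.
            target = pc + arg
            if arg <= 0 or target > len(program):
                return False
        elif op != "add":
            return False  # unknown opcode
    return True


def run(program: Sequence[Instr], memory: Sequence[int]) -> int:
    """Interpret a program; refuses anything the verifier rejects."""
    if not verify(program, len(memory)):
        raise ValueError("program rejected by verifier")
    acc, pc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "load":
            acc = memory[arg]
            pc += 1
        elif op == "add":
            acc += arg
            pc += 1
        else:  # "jump"
            pc += arg
    return acc
```

In this sketch a program that reads past the end of memory, or that jumps backwards, is refused up front with an error rather than being executed and failing midway.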
Starting point is 00:23:34 So it's interesting you mentioned that because throughout this whole thing, I actually did find myself Googling eBPF for Windows. What is the status of eBPF on Windows? How far along is it? It's not something that works end-to-end. It's not usable. It's basically a test. It's something you can use. It's like a toy. It's an experiment. It's a lab experiment at this point. It's an experiment. It's not useful right now because
Starting point is 00:24:00 it is something you load as a device driver. You need a bunch of protections in there. I think that's more like a Windows 12 thing, right? I don't see them doing that as a backport to Windows 10 and Windows 11. But what would probably be a reasonable direction would be to update their WHQL standards for Windows 10 and Windows 11, and then work on something either eBPF or a user mode model for a Windows 12 or some kind of other major kernel re-architecture. I mean, I agree with you that I think the most likely large changes here
Starting point is 00:24:30 are coming for Windows 12. I think it's a little bit too hard to try to retrospectively, you know, cork this. Like it's just forget it. Right, well, you remember that the Apple changes happened in the context of them doing the Apple Silicon changes, which was a massive re-architecture
Starting point is 00:24:43 of the macOS kernel. I got to say, Apple has surprised me on the upside with all of the engineering work they've done in the last five years around the way they've engineered things. It's impressive. Chris, I want to talk to you. I have to say, though, this talk of Windows 12 is giving me a mini seizure,
Starting point is 00:25:03 because you remember Windows 10 was supposed to be it. What happened along the way? It's just always the way. But look, I wanted to talk to you about this as well, because, you know, you're out there, you're Mr. Big Picture. You speak to all sorts of people in government. You speak to all sorts of people at these, you know, global mega corporations. And OK, you know, they're very much attuned to the risks of security software. But I think a big wake up call for policymakers in this, policymakers and senior corporate leaders, is, oh my god, this was only 1% of systems. But look at the chaos, right? So they're starting to look at their supply chains. And it ties back to that whole conversation about supply chains,
Starting point is 00:25:43 which I know you're a fan of having. What's been the reaction out there among your contacts? You know, and I'm not necessarily just talking about cyber people here, but, you know, government types, politicians, you know, senior directors at large corporations. What's been their reaction to this in terms of, like, their thinking outside of just sort of EDR and security software? Well, I think it was actually somewhat reminiscent of SolarWinds because there were a lot of people that woke up and they're like, what's an Orion in 2020? Same thing going here was that like, what do you mean there was this much going on at the kernel and not in user land? And I think what
Starting point is 00:26:28 we're having is perhaps in corporate space, a bit of a wake-up call of kind of that risk-reward tension, doing a lot of really interesting heavy stuff at the kernel rather than over here where everybody else does it. Yes, maybe it's highly beneficial, but at the same time, you know, in a one-hour time period you brought down 8.5 million machines and ground air travel to a halt. Yeah, it was actually less. It was actually within, like, minutes. And my joke on the weekly show was that that's impressive engineering, the fact that they could update everyone all at once. But, uh, you... I pointed out, however. Yeah. The update was only available for an hour. Yeah. It was like 78 minutes that it was available. But it's just interesting because I made the same mistake. I actually said, oh, it's amazing that they hit them all in an hour. And then someone
Starting point is 00:27:19 was like, no, it was actually minutes. So, you know. Yeah. And so it goes, you know, that's the first thing. It's like, I think we're going to have this kind of almost 2020-ish renaissance of back then it was, it was applications. That was a great year that we're all keen to relive, I'm sure. Yeah, sure. Well, let's not forget that every time there's a US presidential election with a new administration, there's a lot of stuff that happens in that first year. I mean, remember 2017 with WannaCry, NotPetya, BadRabbit, and then we had 2020 with Hafnium, SolarWinds kind of crashed over that, and then you had Colonial. Anyway, so again, I think there's this conversation about taking a look at what products and services are playing
Starting point is 00:28:07 and what's part of the architecture, what part of the enterprise and what sort of privileges, what sort of ability they have to bring you down to your knees. At the same time, there's a secondary conversation about just resilience in operations. Alex and I were talking about one company we used to work with that had a significantly redundant operation. Two of everything. Completely separated, two of everything. Problem is they had CrowdStrike running on both. So the first one drops, they fail over, that falls down. So I think you're going to see a lot more kind of diversity of options, again, depending on where in the risky bits they play. But the ability to, hey, we can lose this one, but because we're running on a completely independent system that would have an unrelated impact, we're going to deploy that.
Starting point is 00:29:06 So I think companies are going to, in the longer term, probably spend more money. I mean, I think that's just the reality. Redundant systems are going to cost more. And I think that's probably going to be the future you see, at least for a lot of the highly important critical infrastructure. But nonetheless, you can go read the terms and conditions for CrowdStrike. And they say, if you're in critical infrastructure, you should not be running this stuff in your operational networks,
Starting point is 00:29:34 live connected, getting live updates. So where does this go? We already kind of touched on investigations. The United States Congress, the House Homeland Security Committee sent a letter to CrowdStrike to George Kurtz saying, hey, by Wednesday, whatever that the United States Congress usually takes off for recess to go back to the district and do other things. That's particularly true in election cycles. So the U.S. Congress will not be in session for the month of August. So I suspect that that hearing will take place in September.
Starting point is 00:30:19 By then, I suspect kind of the flash of this all will have dropped off a little bit. But nonetheless, I think you're going to have motivated staff and members of Congress. They're going to want to get to the bottom of this. Separately, NetChoice, a tech trade group in D.C., sent a letter to a number of senators saying, hey, Senate, you should have a similar hearing. And by the way, there's still a bunch of unresolved, unanswered questions about Microsoft security failures. So you should probably roll that into the hearing as well. So Congress is not going to forget about this. They may not be great at legislating right now, but they do okay in terms of holding hearings and asking some tough questions.
Starting point is 00:31:02 The Federal Aviation Administration has already announced that they will investigate the airline outages. And of course, the Delta continued to struggle through the week due to, I guess, the manual nature of recovering a bunch of these systems. I mean, one of my favorite things
Starting point is 00:31:19 that I saw through this was they were handwriting, not Delta, but there were airlines in India that were literally handwriting boarding passes. Handwriting tickets. Yeah. I don't know how- It's just amazing, right?
Starting point is 00:31:30 It's classic. Yeah. I don't know how security checkpoints, TSA, and all that stuff would work. But at the same time, there was some really interesting innovation in recovery. You saw some companies using scan guns and QR codes and things like that. Look, that was actually a point. I was doing an ABC radio interview and the Australian Broadcasting
Starting point is 00:31:53 Corporation is a CrowdStrike customer. So I had this really weird situation where this event was only two hours old and I'm on radio with a guy who is operating just with a microphone and a CD player because all the broadcasting systems were down. And eventually, I think by the time they got me on air, they had been using speakerphone held up to the microphone to do interviews, right? Like, that was the level of it. And then by the time I spoke to them, they'd found some sort of console they could plug a mobile phone into and do interviews that way. So, I was literally, you know, talking to the guy via a mobile phone, plugged into some sort of consumer-grade hardware. And he was asking me, oh, you know, should people run out and get
Starting point is 00:32:30 money out of ATMs and whatnot? And you know, should people panic? And I'm like, well, no, there'll be workarounds. Like for example, this interview that I'm doing with you now, where you are operating without your key systems. But certainly, you know, you look at Delta and that wasn't the experience. They were not able to find workarounds immediately. That's been an absolute disaster. You know, putting it all in context, of course, though, yes, you had significant outages. Yes, you had the major airlines in the United States out. Yes, you had the London Stock Exchange out. Yes, you had Sky News or whatever in the UK out and others globally. Healthcare systems, Alex has mentioned this before, but my own family, my father and
Starting point is 00:33:06 mother both had appointments canceled. I sold my house last week. The wire was delayed from the sale settlement proceeds. Everybody somehow got affected by this. And yet the world- Except me. Sorry, I had to throw that in there. Do you even have internet down there? Is this coming over copper? How's this interview happening? So look, just one more thing. Let me just redirect this slightly. I feel like one thing that's unfortunate about all of this is that we're going to see a massive reaction to this
Starting point is 00:33:39 from business leaders and from governments. When this is something that I don't know, I don't know how much we should judge our policy decisions on something extremely stupid that a company did, if that makes sense, right? Like, I think the really bad thing that happened here, it's not so much the Microsoft ecosystem, it's not so much concentration of one vendor in these,
Starting point is 00:34:02 you know, it's just the fact that CrowdStrike did something inexplicably dumb, you know? So I just wonder, like, could there be an overreaction to this? I don't think so. No. So I think this is the exact right conversation to have, because what we now have in technology that's been building for quite some time, going back to SolarWinds and even before that, is a crisis in confidence in the technology that we're using on a daily basis and baking it into every single aspect of our lives. And it has only gotten worse, meaning we've only gotten more digitized, and that is only going to increase going forward. And yet we don't have a framework
Starting point is 00:34:36 of assurances and transparency around these products that we deploy. Again, Microsoft may not have known what CrowdStrike was actually doing in the kernel. That is something we have to figure out what's the right set of questions for transparency. And ultimately what we have to work towards is reestablishing trust in the products that we use throughout the digital ecosystem. That is the direction that we have to take this conversation. And this was just one more reminder that we're not there and we're not close. Well, and it's just along the lines of what you said. The thing that I just keep coming back to is that this is 2024 and this shouldn't have
Starting point is 00:35:20 happened, right? You can kind of understand how some sort of scrappy endpoint security startup might have made a mistake like this, or how it could happen to some sort of, you know, bit-rotting legacy piece of software with very few skilled people left. I think that's the thing that's just crazy about this: it's an event that seems out of step with the times. What do you think of that, Alex? Like, I'm curious, and I know we're just speculating and just guessing, and they are a competitor of yours, so you're not going to be motivated to say anything nice here. But how do you think this happened? How do you think they actually got into a position where they weren't doing this testing? Because honestly, I just think, you know, if you've got competent management, like, how?
Starting point is 00:36:03 I just, I'm still, just my mind boggles at this. Sometimes companies grow in that you have processes that people don't realize how important they've become. And they don't re, you know, you just assume that nothing's gone wrong and you don't go and reassess until something breaks. Because the thing that I keep coming back to is that it's got to be something to do with churn. Staff churn, right? Yeah. No, it's possible too. The person who used to think about this just isn't there anymore.
Starting point is 00:36:29 That's possible. I mean, I've certainly seen that at companies. I saw that at Facebook all the time. We were like, hey, here's this thing that nobody knows how it works anymore. And you literally have to put like four people on it to reverse engineer. How does this process work? The problem here is that CrowdStrike has dug a huge hole for the whole security industry. Like we as an industry now have to, we have to like rebuild trust because if this ends up being,
Starting point is 00:36:51 as I said before, if CEOs believe that security products are inherently unsafe, it is a really bad thing for the safety of the world, right? And so I think what's going to have to happen is, all of us have to be looking, you know, we were already in a better place, but, you know, at SentinelOne, the engineering team's been double- and triple-checking all of our QA processes. The product team is mocking up better ways. We've already had way more controls than CrowdStrike has on deployments, but we're going to have way more transparency on, like, what are you deploying? How are you controlling deployment? This is one of the things that CrowdStrike customers were complaining about, is they thought they weren't deploying these
Starting point is 00:37:23 things. They did not understand the difference between the different kinds of updates. And so everybody's going to have to have it super transparent about what you're updating at what times. I will say though, the percentage of customers who are in a position to sort of use or action that information is vanishingly small. Probably, yeah, right.
Starting point is 00:37:41 For the most part, like I said, even if you have everything turned on, we never did 100% at the beginning. So, yeah, you know, cloud-based security companies are just going to be incredibly careful with deployment, and we're going to have to document this kind of stuff, and we're just going to have to slowly rebuild that trust and demonstrate it. And, like in this podcast and in our writing, now that we're a week out, we can't be afraid of people calling us ambulance chasers anymore. We're going to have to push back against this narrative that this could happen to anybody
Starting point is 00:38:11 because that is a dangerous narrative. We have to say, nope, nope. There's actually, software engineering is a, engineering is a practice. This was not software engineering. This was software cowboyism, right? Engineering, like you build a bridge, is being careful and it is planning and it is thinking about these processes. It is about being an adult. It is about measuring things. It is about not trying to tear down your competitors and being the fastest and being the measure. I think also, as an industry, one of the things is, like, you look at CrowdStrike's website and it's all about we're better at this measurement, we're faster at this, we're faster at that. And one of the problems
Starting point is 00:38:47 driving this is there's a bunch of kind of metrics-driven stuff that's about: are you faster at this, and have you done more detections here? It's not about did you have the best detections or do you have the least false positives. It's just, did you have the most? Yeah, did you have the fastest? Look, that's the other thing, I told you. I mean, but this is on the CISOs, though. Like, if you're just going for, did these people detect it the fastest, then what you're encouraging, if you actually use those numbers, I'm not going to name any analysts out there because our analyst relations people will be mad at me. But there are people out there that are pushing the industry to do very dangerous things, and they need to look at themselves too. If all you're doing is saying, is it seven minutes to detect or 20 minutes to detect? Seven minutes to detect means no time for testing.
Starting point is 00:39:31 The thing is, how else are they supposed to generate their mystical quadrangles of cyber goodness, Alex? One thing I will say too is that you just touched on something about engineering. I'm not sure if you even know this, but my, excuse me, but my undergraduate degree is actually in engineering
Starting point is 00:39:48 and it's been very interesting for me to come in. And I've had some experience doing R&D in an engineering context and then transitioned into technology later. And it was just always mind-boggling to me, you know, for the last 25 years, what people in, you know, tech call engineering, because I'm like, that word does not mean what you think it means. Like, just building stuff ain't engineering. Drives me spare. Look, I think we're gonna... I thought you were just a journalist, which is, like, your whole shtick, Patrick. Oh, he's talking about... No, I'm going to agree with you, because I have an electrical
Starting point is 00:40:21 engineering degree and when you do an engineering degree, they talk a lot about like, hey, the PE, professional engineering and bridges. And we had to take an entire class on the ethics of engineering and failures and people dying. I did that class as well, man. Right, and that's actually a problem with the security industry is I love security. I love security people.
Starting point is 00:40:41 I love Black Hat and DEF CON, but there's a ton of cowboys, right? There's a bunch of people who come from just hacking backgrounds and stuff and have never sat through that. And I think that's part of the problem too, is that like, it's a bit of a cowboy. Well, they need their Ralph Wiggum voice.
Starting point is 00:40:54 I'm an engineer, you know? And it's just, no, you're not, you know? It's a little bit nuts. But that said, if you were to apply genuine engineering principles to, you know, modern technology, geez, it would slow things down. Right. So that's a that's a whole other conversation for a want is over-regulation of this industry or that industry. There's no kind of technology, policy, discipline, or capability that exists or is resident in government that's going to make some magical regulatory framework. So this really does get down to customers demanding more and corporate responsibility on the software provider side. And that's,
Starting point is 00:41:47 unfortunately, just where we are right now. But I like this parallel to professionalism across the industry and stating what is a solid safety standard? What is professional engineering? I'm not saying we're looking to licensing because God knows I don't think we could push everybody through that regime, but we need to approach something along those lines. Yeah. I mean, you can, because even the, I don't know how it works in the United States, the engineering institute here, you don't necessarily need an engineering degree to be certified, but you would need seven years of experience and to be able to answer a few pretty important questions. Right. So I think you could. I agree, though, that that's probably not the solution here.
Starting point is 00:42:27 In the midst of a workforce gap where, depending on which numbers you read, it's 5 million or whatever it is. I don't think we're getting there. I'm just saying we have software engineering practices to prevent this kind of stuff. Yeah, 100%. But Alex, any final thoughts before we wrap it up? No, I just think Black Hat's going to be fascinating.
Starting point is 00:42:46 Yeah. It's going to be an interesting discussion. And from my perspective, I'm going to be talking to Black Hat. I'm going to be giving a talk on this. I'm going to be talking about how we're going to, the security industry needs to kind of use this as a moment for self-reflection
Starting point is 00:42:58 because we're going to have to start to figure ourselves out. This has, while it's not, it's CrowdStrike's name out there, it has hurt all of us. And I think that is going to be a real problem for the world. Because, for two reasons. One, because people are ripping and replacing. The other thing is, CrowdStrike has given a great demonstration. I was up at 2 a.m. that morning because I would not have been shocked if you had told me the PLA Marines were in the Taiwan Strait, right?
Starting point is 00:43:20 This is exactly what the Chinese would have done. Volt Typhoon, this is exactly what they would have done on day one of the... 100%. I made that comparison too. I don't know if you caught that. Yeah, I didn't catch that, but, like, yeah, CrowdStrike has now given the exact roadmap for what Russia or China are going to do on the day they make their move. And so all of us now have to be extremely up to date and ready to go, because we are all potential conduits for this kind of activity, not being a mistake but being intentional. 100%. If I'm the PLA, I'm sending people to interview for jobs at all of the EDR companies right now, and as long-term plans, because
Starting point is 00:43:56 that is, you know, bang for buck, an amazing investment. All right, guys, we're going to wrap it up there. Chris Krebs, Alex Stamos, thank you so much for joining me to do another one of these. Obviously people wouldn't know this, but I'm actually recording this on a Saturday. I love to do these podcasts. It's always fascinating to talk to you both. Chris, thank you. Alex, thank you.
Starting point is 00:44:14 Thanks, Patrick. And it's good to know that we make it to Saturday. We made it. We made it. Thank you. Have a great day.
