Risky Business - Wide World of Cyber: Why we should show CrowdStrike no mercy
Episode Date: July 30, 2024
In this episode of Wide World of Cyber, Risky Business host Patrick Gray discusses the recent CrowdStrike incident and its implications for security software that operates in kernel space with Chris Krebs and Alex Stamos of SentinelOne, a CrowdStrike competitor. The conversation also delves into Microsoft’s role in this whole disaster and the potential changes it could make to its operating system to prevent similar incidents in the future. A video version of this episode is also available on YouTube!
Transcript
Hey everyone and welcome to another edition of the Wide World of Cyber. My name is Patrick Gray.
Wide World of Cyber is the podcast we do here at Risky Business HQ with Chris Krebs and Alex Stamos of SentinelOne.
This is a joint production between SentinelOne and Risky Business Media.
And yeah, we love to do it.
Chris was, of course, the first director of CISA in the United States before he went on to co-found KSG, the Krebs Stamos Group, with our other guest, Alex Stamos, who has served as the CISO for Yahoo and Facebook and has done all sorts of interesting stuff. Also the founder of iSEC Partners back in the day. He has been around the industry for a long time. Yes, Mr. Alex Stamos. So Chris, welcome. Alex, welcome.
Thanks, Patrick.
Hey, Patrick. Good to be with you again.
So let's have a chat, shall we? Because it's been an interesting old news cycle recently.
I can't believe we're doing it this week. We weren't going to cancel just for being bored.
Yeah, so of course, you know, we got to talk about this CrowdStrike thing and, you know, what it means for EDR, what it means for software resilience.
Obviously, and I'm going to say it, you know, SentinelOne is a competitor, a direct competitor of CrowdStrike. I would not want people to think that this is ambulance chasing. That's not what this is. I mean, Chris and Alex here are two of the world's foremost commentators on cybersecurity issues. They happen to work at SentinelOne. I think that puts them in a great position to talk about this. I mean, let's just kick it off there. Let's kick it off with you, Alex, because, you know, of the two of you, you are the most technical, right?
I will concede that, Pat. I will concede that.
Okay, thank you. Thank you, Chris. You know, you and I got talking even before we got recording, and we both agree that this failure of CrowdStrike to actually test these content updates, thus causing a kernel panic and a blue screen of death across all CrowdStrike machines, is just sort of inexplicable and bizarre. Like, that was my take in the weekly Risky Business show. You agree with that. I mean, it is just weird. And you have an inside view because you work for, you know, a similar company.
Talk to me about that.
That's right.
So first, like you said, we're direct competitors of CrowdStrike.
You can tell this because if you go to CrowdStrike.com right now and you mouse over "Why CrowdStrike,"
there's a whole section of CrowdStrike versus SentinelOne where they talk about what's wrong
with our product.
So certainly they think we're competitors.
So I want people to
have that out there. Okay, so I'm just gonna say it. The narrative that immediately came out after
CrowdStrike took out at least 8 million computers and shut down a big chunk of the global economy,
including stranding me personally for hours and hours in an airport in which I got minor food
poisoning from terrible fish and chips. When CrowdStrike did that, the immediate
emerging narrative from them and from their proxies on LinkedIn and in social media was,
this could happen to anybody. That is false. That is false. This could not happen to anybody.
CrowdStrike has made intentional architectural, engineering, and QA decisions that made this
happen. They were negligent in their
engineering decisions and their QA decisions. They created this problem for themselves and for the
world. And it is dangerous for them to spread that idea that this could happen to anybody,
because what they're doing is they're planting this idea in the minds of CEOs around the country,
around the world, that security products are inevitably this incredibly dangerous,
that it is not worth it to protect yourself from ransomware actors or state actors, because
it is more likely that your company that you're paying millions of dollars to, that they're
the ones who are going to take you out versus the bad guys.
So that's, I think, my core message here is this was preventable.
And in fact, the vast majority of actors in the space are much more responsible than CrowdStrike
was in the way that our products are designed and tested and deployed.
And that's why, while there are problems with other products, they are never as widespread
or as destructive as this problem.
I'm happy to get into the details, but that's kind of the base issue here.
So the thing that I can't wrap my head around, right, is this. I can definitely understand how CrowdStrike's product architecture evolved the way it did. I 100% understand it, and it's fine as long as you recognize that that's a risky sort of architecture and you put the sort of compensating tests around it.
And indeed, there are advantages even today for the way that they've structured their
product.
But again, there's a lot of risk there.
And they absolutely should have been doing dynamic testing.
Let me ask you this.
When SentinelOne pushes even just a signature update... and there's this idea that, oh, well, a signature update, or a content update, as they call it, could never have caused this sort of thing, and it's this really unexpected thing. This is something we've seen time and time again with antivirus companies, EDR platforms, all sorts of stuff. I can think of incidents involving Sophos, McAfee, all sorts of people. So I'm guessing that most competing firms, yours included, I mean, you do dynamic testing on all sorts of updates, I would imagine. And then you probably stagger rollouts into different rings, right?
Right. So all anti-malware products, and so we're talking about from traditional AV all the way up to the most advanced EDR, XDR products, of which CrowdStrike and SentinelOne are in that category, have both the base code and then some kind of rule set. And the truth is, this is a complicated world in which it used to be for AV, the rule sets were these basic static files that were being parsed by code, and then read, and then turned into some kind of memory structure. And now it is some kind of executable code, right? It is something more complicated. What CrowdStrike has done is they have pushed a huge amount of their intelligence into a kernel module, right? Now, every product, including ours, if you want to be truly secure, has to, on Windows at least, have some component that is a kernel module. And there's really kind of three functions that you have to have to do that. One is you have to tap into certain events that are only really available in the kernel. Two, if you want to do certain interdiction of events, because for modern EDR, you don't want to just only shut down a process or only shut down the machine. You want to be able to kind of intelligently shut off processes from doing certain things. And you can only do that from inside the kernel. There is not a user mode API to be allowed to do that. And you're trying not to do things like hook and inject into every single process on the machine because that causes stability issues. And so you can only do that from the kernel. And the third and the most important is tamper resistance. There is a constant battle with the high-end threat actors, including the ransomware actors, and there is a constant black market. If you watch the discussion groups and such, you'll have the LockBits and the equivalents, the BlackCats and such, discuss, "I've got a CrowdStrike bypass. I've got a SentinelOne bypass. I know how to turn these guys off." And so they're constantly looking for ways to shut us down so they can try to do something. And then we're constantly looking for what they're doing and then updating our systems. And the only way you can protect yourself on Windows right now is from inside the kernel.
Right.
And so everybody's got a kernel module.
But what you can do is you can build your kernel module to only do the minimal things, to tap into those APIs, to hook into the things that you absolutely have to, to keep that as stable as possible, to put as minimal logic as possible in there, and to push all of your AI, your ML, your parsing, all of your logic, anything that parses any rule set, push that all into user mode, and then create very well tested, very robust IOCTLs and other kinds of APIs between the kernel and user mode, and then test the crap out of those.
What's amazing here, Alex, is that I'm a journalist and I know this,
you know, because like the Airlock Digital guys who, you know, they make an allow listing solution
that's, you know, terrific. They're Australian guys. And I remember years ago, they were like,
yeah, you know, because we do this through a kernel driver, you know, we're really just
wondering what, you know, what we can do to really make this rock solid, stable, whatever. So they went
to Silvio Cesare, who's a world renowned researcher in Australia who knows a lot about kernels.
You know, he's globally renowned for being a kernel security expert. And they dumped a pile
of money on Silvio. And it's just really interesting what you said, because Silvio came back and he
said, there's a few things you're doing in the kernel that you don't need to be doing in the kernel, so you need to move them out.
And yeah, you know, so they were thinking very early on, like, how do we just get as much risk out of
here as possible? And in the end, you know, their kernel driver is like 60 kilobytes. Obviously
it's a much simpler product than something like fully featured EDR, but it goes back to those
principles which is if you're going to be in the kernel,
you know, this is the way you need to think about it.
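The principle discussed above, keeping rule parsing out of the privileged component and rejecting malformed content before it gets anywhere near the kernel, can be sketched as follows. This is a minimal illustration only: the file layout (a `RULE` magic value, a version, a rule count, then length-prefixed patterns), the limits, and the names `parse_content` and `ContentError` are all hypothetical and do not describe any vendor's real format.

```python
import struct

# Hypothetical content-file layout, invented for this sketch:
# 4-byte magic, 2-byte version, 2-byte rule count, then length-prefixed patterns.
HEADER = struct.Struct("<4sHH")
MAX_RULES = 1024
MAX_PATTERN_LEN = 256


class ContentError(ValueError):
    """Raised when a content file fails validation."""


def parse_content(blob: bytes) -> list[bytes]:
    """Parse an untrusted content file in user mode, checking every bound."""
    if len(blob) < HEADER.size:
        raise ContentError("file shorter than header")
    magic, version, count = HEADER.unpack_from(blob, 0)
    if magic != b"RULE":
        raise ContentError("bad magic")
    if version != 1:
        raise ContentError(f"unsupported version {version}")
    if count > MAX_RULES:
        raise ContentError("too many rules")

    offset = HEADER.size
    rules: list[bytes] = []
    for index in range(count):
        # Never read a length field or a pattern that runs past the buffer.
        if offset + 2 > len(blob):
            raise ContentError(f"rule {index}: truncated length field")
        (length,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        if length == 0 or length > MAX_PATTERN_LEN:
            raise ContentError(f"rule {index}: bad length {length}")
        if offset + length > len(blob):
            raise ContentError(f"rule {index}: pattern runs past end of file")
        rules.append(blob[offset:offset + length])
        offset += length

    if offset != len(blob):
        raise ContentError("trailing bytes after last rule")
    return rules
```

A file that fails any of these checks raises an ordinary exception in a user-mode process, which can be logged and ignored, rather than triggering an invalid memory read inside a driver.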
But again, the question that I asked you
wasn't about their risky architecture.
It was about testing.
Because this is the part that doesn't make sense to me.
And you do test, you know, so when SentinelOne rolls out, like, even just a signature file, there is a dynamic test, is there? And tell me what that process actually looks like for a CrowdStrike competitor.
Right. And so this is one of the differences
between us, and I can't speak for everybody, right? But I'm looking actually at an internal
slide deck right now where we talk about how even for our content files, we do a ton of testing
that obviously CrowdStrike doesn't. And so CrowdStrike has, they've released
this slide deck where it shows that their QA, they have sensor content and rapid response.
And for sensor content, they do all this different stuff. And then rapid response, it says,
they have template instances and they have checks that are being performed and that's it. And the
checks obviously didn't happen. And effectively they released this PIR, this initial, you know, this incident report that is hundreds and hundreds of words to basically say we never ran this on a Windows machine.
Right.
Because it instantly blue screens.
So clearly it never touched an actual running Windows machine, which is amazing, which is the only possible way this could have made it out.
No, no. I mean, I never actually...
I know you heard our show.
Yeah. This was exactly our conclusion, which is, like, they did not even put this on. Anyway, the mind, it boggles.
Look, I want to bring Chris
into the conversation here.
So, but I'm sorry, you did ask. And so we do a bunch of different testing. So we do our own testing. We roll it out to our own testing machines, including the live updates, right? So it runs on actual Windows machines on a bunch of different pieces of hardware.
The modal, so again, EDR, we break things. I'm just going to admit it. SentinelOne has bugs.
We have broken things. The modal break is you have a conflict with something that's specific
to a specific piece of hardware or a specific customer, right? So that's what will be much
more likely is that you have like a specific Dell driver
that it doesn't like,
or we'll have a customer
where they have a really specific piece of software
that only that company has.
It's custom that our detection rules will detect
and it's a line of business software and it's critical.
And so we will have test rules
that will include that stuff
so that we don't, you know,
oops, we screwed up once.
We never want to make that mistake again, right?
So when we do content updates, all of those tests happen.
Thousands and thousands of tests happen on real virtual machines and real hardware
before anything else happens.
Then we roll it out to ourselves.
We dog food those rules before we roll it to anybody else.
We beta test them.
And then we have telemetry where we roll it out to small percentages of customers
and then get the telemetry back.
And if you don't have like a 98% positive rate, it automatically stops, right? It is clear
CrowdStrike did none of that because any of those steps that I just laid out would have stopped this,
any single one of them. And now they're announcing like, oh, we're going to do some of this.
This is super basic. This is like 2015. And this isn't like something SentinelOne invented. This is like what anybody who's ever done high quality engineering has done for decades. It's just kind of mind-boggling for me that you could possibly ever do something where you're like, I'm going to ship code that's going to go run in millions of Windows kernels. I'm just going to run it by like a Perl script and then YOLO it out to millions of machines. It's just mind-bogglingly negligent. And it's shocking. And the
idea that people are going out and saying anybody could do
this is just a complete lie.
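The staged process described in this answer (test machines first, then internal dog-fooding, then small slices of customers, with an automatic stop when telemetry drops below roughly 98% healthy) can be sketched as a simple gate. This is a hedged illustration, not SentinelOne's or anyone else's actual pipeline: the `Ring` structure, the `staged_rollout` function, and the `deploy` callback that reports whether a host came back healthy are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ring:
    """One stage of the rollout (hypothetical structure for illustration)."""
    name: str
    hosts: list[str]


def staged_rollout(
    rings: list[Ring],
    deploy: Callable[[str], bool],
    min_healthy: float = 0.98,
) -> tuple[list[str], Optional[str]]:
    """Deploy ring by ring and halt at the first ring below the health threshold.

    `deploy(host)` pushes the update to one host and returns True if the host
    reports healthy afterwards. Returns the rings that completed and the name
    of the ring that triggered the halt, or None if every ring passed.
    """
    completed: list[str] = []
    for ring in rings:
        if not ring.hosts:
            raise ValueError(f"ring {ring.name!r} has no hosts")
        healthy = sum(1 for host in ring.hosts if deploy(host))
        if healthy / len(ring.hosts) < min_healthy:
            # Stop here: later, larger rings never receive the update.
            return completed, ring.name
        completed.append(ring.name)
    return completed, None
```

With rings ordered from smallest to largest, an update that crashes every machine it touches is stopped at the first ring, so the damage is limited to a handful of lab hosts.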
So one thing, though. They do dog food, as I understand it, they actually do dog food their updates, but they're a Mac shop.
Yeah, great. Oops. Which is amazing, right? Like, when you're like, why didn't they, you know, catch it? It's like, oh my god, guys, you just don't get it. And anyway, they do have Windows machines, because they're a public company. This is a challenge of every public company: you have Windows machines because you have to have, like, 64-bit Excel. But nobody was up at 4 a.m. on a Friday.
Yeah, right. None of their people.
Yeah, it was crazy, right? Because, you know, new listeners might not know, but yes, I am based in Australia, thus the accent.
And yeah, we copped it during the middle of business.
You know, it was like 2 p.m. on a Friday sort of thing.
It was crazy.
So we were one of the first when, yeah, just social media is kicking off.
And, you know, it goes from mysterious blue screens everywhere to the prime minister will address the nation sort of thing like in a few hours.
Absolutely nuts. So, Chris, I want to bring you into this, right, because I want to bring you in to talk about the Microsoft side of this. Because yes, what CrowdStrike did was mind-numbingly, like, bad, right, as Alex has established. And again, you know, people might say, oh, is this ambulance chasing? I don't think it is. But is this you guys kicking CrowdStrike when they're down? Yes, but also they deserve it. So we've established all of that.
But then there's the whole Microsoft dimension to this, right, where, you know, there's been a lot
of discussion about, you know, EU rulings, which have said that Microsoft can't just kick everyone
out of the kernel and give itself an unfair advantage in its security products. And we've
also got Microsoft saying, well, we obviously need to change Windows so that
this doesn't happen.
You know, if I'm Microsoft at this point, I'm pretty mad that this has happened because
it has reflected poorly on my company.
And I'm thinking, well, how do I stop this from happening again?
And again, if I'm Microsoft, I don't really care too much if it's going to make security
companies sad.
What I do care about, though, is the reaction of competition regulators if I start doing things that could disadvantage security companies.
Why don't you walk us through the whole dynamic here that's at play with Microsoft and regulators and, you know, what sort of steps they could take to, you know, change their operating system or the way they do things without getting in trouble
with the FTC?
Yeah.
Look, I mean, I think there's a baseline initial question of was Microsoft really aware of
the extent at which CrowdStrike was really playing around in the kernel?
And it's not clear to me that they were aware.
And so they're going to be taking a hard look at what CrowdStrike was doing.
They're probably going to take a look at a whole bunch of other vendors as well.
And the natural reaction, and I think this is something that Alex has talked about elsewhere,
is locking it down. I mean, that's the natural, right, almost like biological response of like,
if somebody hurts you, you're going to stop them, you know,
allowing them into that space where they're hurting you. The problem is, as you pointed out,
is the 2009 agreement with the European Union or the European Commission where Microsoft is not
allowed to shut that, shut access down. They have to provide access to other security vendors,
the same access they have. That's what's holding now. And I think that's probably going to be the initial kind of friction on them walling it off. That said, these agreements can change.
The United States is not necessarily bound by that agreement. And there's a possibility that
you could have some different treatment here, or there could be some further negotiation.
Euro windows. Sounds horrible.
I doubt that would happen based on kind of my experience.
Well, I think also if Microsoft were to do something to give itself an advantage in this
market, like there's going to be FTC problems.
So that's the second piece, right? Yeah.
Yeah. I mean, that's the second piece because I think there's with the, you know, with the prior security challenges, we'll just kind of leave it there, that Microsoft has faced over the last couple of years and some of the broader attention to IT monocultures.
The United States Congress is taking a hard look at tech competition issues, you know, from the hardware all the way through productivity
and security. The FTC, the SEC may be looking at these issues. So I would think that Microsoft has
a very delicate dance to do here so that they do not exacerbate any of the potential antitrust
issues, that they do not walk further down that line because that is
something that they dealt with at the turn of the century and they don't want to deal with that
again. All that said, I just do not see Microsoft just kind of chalking this one up and saying like,
ah, that's a mulligan. You got a mulligan, guys. Yeah. So something's going to change. And I think Alex, you know, we've talked about VBS
enclaves and things like that. Like, I just, I don't know where this goes, but I do know that
there's a lot of external pressure from customers saying, how the hell can you allow this to happen?
But there's also a lot of pressure from, as you've mentioned, the oversight regulators in the,
in the enforcement agencies that are going to be
sitting there going like, hey, guys, you still have to play fairly with the rest of the ecosystem.
And the third aspect here is that it's actually worse for, I think, the broader ecosystem if they
were to lock it down, because that takes us further down that monoculture
road or path. And diversity right now is a good thing in this space. And we've talked about that
time and time again. Diversity in the security space is a good thing because what you have
otherwise is a bunch of slices of Swiss cheese where the holes actually just line up and things
will pass right through.
Yeah. I mean, it's a tricky one. And I want to talk to you about this too, Alex, because, you know, I had an interview the other day with the Airlock Digital guys and that's running next week. And I have been talking to them a lot because obviously as a security vendor with a presence in the kernel, like, and they're my friends, of course, I've been talking to them about this a lot. And, excuse me, yeah, so, you know, they were saying, like, for their purposes, too, it's a little bit different because they're not a full EDR thing. As long as Microsoft introduced the right kind of API, they would mostly be okay. But they were making the point that for EDR vendors, they're not going to like it if Microsoft goes the way of macOS and releases some sort of generic endpoint security API. I mean, the point David Cottingham made to me is, like, if they do anything, like use a little bit too much system resources or whatever, macOS just kills them, right? Because you're affecting the user experience. So they're just like, no, no security software, goodbye. I mean, Microsoft is unlikely to make the same sort of design decisions there. But the point is, you're going to lose an awful lot of flexibility if Microsoft, you know, even if they're doing equitable sort of access to this API, because they would need to do the same for Defender as they're doing for, you know, Defender's competitors. It would need to be an even playing field to avoid the FTC causing them drama. But how do you think they could even do this technically,
I think is the question, right?
Because I don't really know how they could lock everyone
out of the kernel and still give startups
and existing vendors room to innovate, if that makes sense.
What do you expect them to do here?
Yeah, so I think there's two or three options.
So one would be an Apple-like direction.
SentinelOne works on Apple Silicon Macs and it has been a challenge, but our position here is we are willing to work with Microsoft if they want to go that direction, if they hold themselves to the same standard. If we are held to a second-class standard, then it'll never work, right? Because they will use this as an opportunity to make Defender the only EDR that works.
Certainly, they're going to try that.
I'm just going to throw it out there.
Microsoft is going to try to use this as an excuse to make Defender the only EDR product
that works.
And so it is really important for both enterprises, the government, and for third parties like
us to say, to point out when Microsoft makes that move, that if they want to do better,
we're happy to be a design partner there.
And we're happy to beta test and alpha test with them of doing something better.
So I do think an Apple-like model is possible.
It'll have to be very carefully designed,
but it is theoretically possible
to do this stuff in user space.
Customers are very performance sensitive, right?
Like we get lots of tickets of,
you changed my performance in this workload
by one and a half percent.
And so that will be an issue.
But that is something that could be worked out collaboratively with Microsoft if they're willing to do so in a reasonable way.
In the short term, I think a better process would be for them to work collaboratively with the EDR vendors to make sure that we're not doing the same stupid stuff CrowdStrike did.
CrowdStrike effectively bypassed the whole purpose of the WHQL process, right? So when you submit a kernel driver to
Microsoft, they test your driver to make sure it doesn't crash the kernel. And then CrowdStrike
was going and doing all this dangerous stuff in the driver that we don't do. Microsoft should go
revoke CrowdStrike. CrowdStrike broke their promise to Microsoft, right?
Look, I'm going to actually push back on you a little bit there in CrowdStrike's defense,
because what they gain through this architecture
that they've got is an awful lot of flexibility.
They've got an awful lot of performance advantages, right?
So I'm not actually as critical of their architecture as you are.
The thing that I'm critical of is that they don't seem to have recognized, or have forgotten, how risky it is, and they didn't do the testing.
Look, if you're doing adequate testing, and you pointed out before that even with, like, a basic dynamic testing regime, you're going to miss edge cases, right?
You're going to hit some box that's using some obscure software.
It's not going to play nice.
There might be a crash, but you're not going to blue screen all of your customers if you're doing some rudimentary testing. So I think that you can have an architecture like that and have some compensating controls like testing. So again, I'm not as critical of their architecture. I'm just thinking if you're going to operate like that, you need to be careful, and they weren't, and that's baffling.
So what I think Microsoft could do in the short term is they could amend
their WHQL guidelines to say, if you want to have an EDR driver signed by us,
you have to pull all this dangerous stuff out of the kernel
and put it in user mode.
And that should be a requirement they put in place
in the next 90 days, 180 days, something like that.
That would be a reasonable short-term solution.
And then what they're already working on,
which would be a fascinating place, is eBPF in the kernel.
So we have an agent that uses eBPF on Linux, so
that you can run SentinelOne in a
Linux container, which obviously can't run kernel modules. And it works great. It's super cool.
It technically runs in the kernel, but it's running in a safe VM. Obviously, there's interesting
performance issues with just-in-time compile. There's interesting interactions with hypervisors
and stuff. But I think that's a fascinating direction. Again, as long as Microsoft Defender uses it. So if Microsoft wants to go that direction, I think that's awesome.
But it needs to be something that Microsoft holds their own security products to that same standard.
So it's interesting you mentioned that because throughout this whole thing, I actually did find
myself Googling eBPF for Windows. What is the status of eBPF on Windows? How far along is it?
It's not something that works end-to-end.
It's not usable. It's basically
a test. It's something you can use.
It's like a toy. It's an experiment. It's a lab
experiment at this point. It's an experiment.
It's not useful right now because
it is something you load as a device driver.
You need a bunch of protections
in there. I think that's more like a Windows 12 thing, right? I don't see them doing that
as a backport to Windows 10 and Windows 11. But what would probably be a reasonable direction
would be to update their WHQL standards for Windows 10 and Windows 11, and then work on
something either eBPF or a user mode model for a Windows 12 or some kind of other major kernel re-architecture.
I mean, I agree with you that I think
the most likely large changes here
are coming for Windows 12.
I think it's a little bit too hard
to try to retrospectively, you know, cork this.
Like it's just forget it.
Right, well, you remember that the Apple changes
happened in the context of them
doing the Apple Silicon changes,
which was a massive re-architecture
of the macOS kernel.
I got to say, Apple has surprised me on the upside
with all of the engineering work they've done
in the last five years around the way they've engineered things.
It's impressive.
Chris, I want to talk to you.
I have to say, though, this talk of Windows 12
is giving me a mini seizure,
because you remember Windows 10 was supposed
to be it. What happened along the way? It's just always the way. But look, I wanted to talk to you
about this as well, because, you know, you're out there, you're Mr. Big Picture. You speak to all
sorts of people in government. You speak to all sorts of people at these, you know, global mega
corporations. And OK, you know, they're very much attuned to the risks of security software.
But I think a big wake up call for policymakers in this, policymakers and senior corporate leaders,
is, oh my god, this was only 1% of systems. But look at the chaos, right? So they're starting to
look at their supply chains. And it ties back to that whole conversation about supply chains, which I know you're a fan of having. What's been the reaction out there among your contacts? You know, and I'm not necessarily just talking about cyber people here, but, you know, government types, politicians, senior directors at large corporations. What's been their reaction to this in terms of, like, their thinking outside of just sort of EDR and security software?
Well, I think it was actually somewhat reminiscent of SolarWinds because there were a lot of people
that woke up and they're like, what's an Orion in 2020? Same thing going here was that like,
what do you mean there was this much going on at the kernel and not in user land? And I think what
we're having is perhaps in corporate space, a bit of a wake up call of kind of that risk reward
tension, doing a lot of really interesting heavy stuff at the kernel rather than over here where
everybody else does it. Yes, maybe it's highly beneficial, but at the same time, you, you know, in a one-hour time period, brought down 8.5 million machines and ground air travel to a halt.
Yeah, it was actually less. It was actually within, like, minutes. And my joke on the weekly show was that that's impressive engineering, the fact that they could update everyone all at once.
As you pointed out, however. Yeah. The update was only available for an hour.
Yeah. It was like 78 minutes that it was available. But it's just interesting because I made the same mistake. I actually said, oh, it's amazing that they hit them all in an hour. And then someone was like, no, it was actually minutes. So, you know.
Yeah. And so it goes, you know, that's the first thing. It's like, I think we're going to have this kind of almost 2020-ish renaissance of, back then it was applications.
That was a great year that we're all keen to relive, I'm sure.
Yeah, sure. Well, let's not forget that every time there's a US presidential election with a new
administration, there's a lot of stuff that happens in that first
year. I mean, remember 2017 with WannaCry, NotPetya, BadRabbit, and then we had 2020 with
Hafnium, SolarWinds kind of crashed over that, and then you had Colonial. Anyway, so again,
I think there's this conversation about taking a look at what products and services are playing
and what's part of the architecture, what part of the enterprise and what sort of privileges,
what sort of ability they have to bring you down to your knees.
At the same time, there's a secondary conversation about just resilience in operations. Alex and I were talking about one company we used to work
with that had a significantly redundant operation. Two of everything. Completely separated,
two of everything. Problem is they had CrowdStrike running on both. So the first one drops,
they fail over, that falls down.
So I think you're going to see a lot more kind of diversity of options, again, depending on where in the risky bits they play.
But the ability to, hey, we can lose this one, but because we're running on a completely independent system that would have an unrelated impact, we're going to deploy that.
So I think companies are going to, in the longer term, probably spend more money.
I mean, I think that's just the reality.
Redundant systems are going to cost more.
And I think that's probably going to be the future you see, at least for a lot of the
highly important critical infrastructure.
But nonetheless, you can go read the terms and conditions for CrowdStrike.
And they say, if you're in critical infrastructure,
you should not be running this stuff in your operational networks,
live connected, getting live updates.
So where does this go?
We already kind of touched on investigations.
The United States Congress, the House Homeland Security Committee sent a letter to CrowdStrike to George Kurtz saying, hey, by Wednesday, whatever that the United States Congress usually takes off for
recess to go back to the district and do other things.
That's particularly true in election cycles.
So the U.S. Congress will not be in session for the month of August.
So I suspect that that hearing will take place in September.
By then, I suspect kind of the flash of this all will have dropped off a little bit.
But nonetheless, I think you're going to have motivated staff and members of Congress.
They're going to want to get to the bottom of this.
Separately, NetChoice, a tech trade group in D.C., sent a letter to a number of senators
saying, hey, Senate, you should have a similar hearing.
And by the way, there's still a bunch of unresolved, unanswered questions about Microsoft security failures. So you should probably roll that into the hearing as well.
So Congress is not going to forget about this. They may not be great at legislating right now,
but they do okay in terms of holding hearings and asking some tough questions.
The Federal Aviation Administration has already announced
that they will investigate
the airline outages.
And of course, Delta
continued to struggle through the week
due to, I guess, the manual nature
of recovering a bunch of these systems.
I mean, one of my favorite things
that I saw through this was
they were handwriting, not Delta,
but there were airlines in India that were literally
handwriting boarding passes.
Handwriting tickets.
Yeah.
I don't know how-
It's just amazing, right?
It's classic.
Yeah.
I don't know how security checkpoints, TSA,
and all that stuff would work.
But at the same time, there was some really interesting
innovation in recovery.
You saw some companies using scan guns and QR codes and things like that.
Look, that was actually a point. I was doing an ABC radio interview and the Australian Broadcasting
Corporation is a CrowdStrike customer. So I had this really weird situation where this event was
only two hours old and I'm on radio with a guy who is operating just with a microphone and a CD
player because all the broadcasting systems were down. And eventually, I think by the time they got me on air,
they had been using speakerphone held up to the microphone to do interviews, right? Like,
that was the level of it. And then by the time I spoke to them, they'd found some sort of console
they could plug a mobile phone into and do interviews that way. So, I was literally,
you know, talking to the guy via a mobile phone, plugged into some sort of
consumer-grade hardware. And he was asking me, oh, you know, should people run out and get
money out of ATMs and whatnot? And you know, should people panic? And I'm like, well, no,
there'll be workarounds. Like for example, this interview that I'm doing with you now,
where you are operating without your, your key systems, but certainly, you know, you look at
Delta and that wasn't the experience.
They were not able to find workarounds immediately. That's been an absolute disaster.
You know, putting it all in context, of course, though, yes, you had significant outages. Yes, you had the major airlines in the United States out. Yes, you had the London Stock Exchange out.
Yes, you had Sky News or whatever in the UK out and others globally. Healthcare systems,
Alex has mentioned this before, but my own family, my father and
mother both had appointments canceled. I sold my house last week. The wire was delayed from the
sale settlement proceeds. Everybody somehow got affected by this. And yet the world-
Except me. Sorry, I had to throw that in there.
Do you even have internet down there? Is this coming over copper? How's this interview happening?
So look, just one more thing.
Let me just redirect this slightly.
I feel like one thing that's unfortunate about all of this
is that we're going to see a massive reaction to this
from business leaders and from governments.
When this is something that I don't know,
I don't know how much we should judge our policy decisions
on something extremely stupid that a company did,
if that makes sense, right?
Like, I think the really bad thing that happened here,
it's not so much the Microsoft ecosystem,
it's not so much concentration of one vendor in these,
you know, it's just the fact that CrowdStrike
did something inexplicably dumb, you know? So I just wonder, like, could there be an overreaction to this?
I don't think so. No. So I think this is the exact right conversation to have,
because what we now have in technology that's been building for quite some time,
going back to SolarWinds and even before that, is a crisis in confidence in the technology that
we're using on a daily basis and baking it
into every single aspect of our lives. And it has only gotten worse, meaning we've only gotten more
digitized, and that is only going to increase going forward. And yet we don't have a framework
of assurances and transparency around these products that we deploy. Again, Microsoft may
not have known what CrowdStrike was actually doing in
the kernel. That is something we have to figure out what's the right set of questions for
transparency. And ultimately what we have to work towards is reestablishing trust in the products
that we use throughout the digital ecosystem. That is the direction that we have to take this
conversation. And this was just one more
reminder that we're not there and we're not close. Well, and it's just along the lines of what you
said. The thing that I just keep coming back to is that this is 2024 and this shouldn't have
happened, right? You can kind of understand how some sort of endpoint security scrappy startup
might have made a mistake like this, or how some sort of, you know, bit-rotting legacy piece
of software with, you know, very few skilled people left, that it could happen. I think that's the thing
that's just crazy about this: it's an event that seems out of step with the
times. What do you think of that, Alex? Like, I'm curious, and I know we're just speculating and just guessing, and they are a competitor of yours, so you're not going to be motivated to say anything nice here.
But how do you think this happened?
How do you think they actually got into a position where they weren't doing this testing?
Because I honestly, I just think, you know, if you've got a competent management, like, how?
I just, I'm still, just my mind boggles at this.
Sometimes companies grow in that you have processes that people don't realize how important they've become.
And they don't re, you know, you just assume that nothing's gone wrong and you don't go and reassess until something breaks.
Because the thing that I keep coming back to is that it's got to be something to do with churn.
Staff churn, right?
Yeah.
No, it's possible too.
The person who used to think about this just isn't there anymore.
That's possible.
I mean, I've certainly seen that at companies.
I saw that at Facebook all the time.
We were like, hey, here's this thing that nobody knows how it works anymore.
And you literally have to put like four people on it to reverse engineer.
How does this process work?
The problem here is that CrowdStrike has dug a huge hole for the whole security industry.
Like we as an industry now have to, we have to like rebuild trust because if this ends up being,
as I said before, if CEOs believe that security products are inherently unsafe, it is a really
bad thing for the safety of the world, right? And so I think what's going to have to happen is we're
going to have to, all of us have to be looking, you know, we were already in a better place,
but, you know, at SentinelOne, the engineering team's been double, triple checking
all of our QA processes. The product team is mocking up better ways. We've already had way
more controls than CrowdStrike has on deployments, but we're going to have way more transparency and
like, what are you deploying? What are you controlling deployment? This is one of the
things that CrowdStrike customers were complaining about is they thought they weren't deploying these
things. They did not understand the difference
between the different kinds of updates.
And so everybody's going to have to be super transparent
about what you're updating at what times.
I will say though, the percentage of customers
who are in a position to sort of use or action
that information is vanishingly small.
Probably, yeah, right.
For the most part, like I said,
even if you have everything turned on, we never did 100% at the beginning. So yeah, cloud-based security companies
are just going to be incredibly careful with deployment, and we're going to have to document
this kind of stuff, and we're just going to have to slowly rebuild
that trust and demonstrate it. And in this podcast and in our writing,
now that we're a week out, we can't be afraid of people calling us ambulance chasers anymore.
We're going to have to push back against this narrative that it could happen to anybody,
because that is a dangerous narrative. We have to say, nope, nope. There's actually,
software engineering is a, engineering is a practice. This was not software engineering.
This was software cowboyism, right? Engineering, like you build a bridge, is being
careful and it is planning and it is thinking about these processes. It is about being an adult.
It is about measuring things. It is about not trying to tear down your competitors and being
the fastest and being the measure. I think also, as an industry, one of the things is that, like, you
look at CrowdStrike's website and it's all about: we're better at this measurement, we're faster at
this, we're faster at that. And one of the problems
driving this is there's a bunch of kind of metrics-driven stuff that's about: are you faster in this,
and have you done more detections here? It's not about did you have the best detections or do you
have the least false positives. It's just, did you have the most? Yeah, did you have the fastest?
Look, that's the other thing I told you. I mean, but this is on the CISOs, though.
Like, if you're just going for, did these people detect it the fastest, then what you're encouraging, if you actually use those numbers, I'm not going to name any analysts out there because our analyst relations people will be mad at me.
But there are people out there that are pushing the industry to do very dangerous things, and they need to look at themselves too. If all you're doing is saying, is it seven
minutes to detect or 20 minutes to detect?
Seven minutes to detect means no time for testing.
The thing is, how else are they supposed
to generate their mystical quadrangles
of cyber goodness, Alex?
One thing I will say too is that
you just touched on something about engineering.
I'm not sure if you even know this, but my, excuse me,
but my undergraduate degree is actually in engineering
and it's been very interesting for me to come in.
And I've done some, I've had some experience doing R&D
in an engineering context and then transitioned into technology later.
And it was just always mind boggling to me, you know,
for the last 25 years, what people in, you know, IT and tech call engineering. Because I'm
like, that word does not mean what you think it means. Like, just building stuff ain't engineering.
Drives me spare. Look, I think we're gonna...
I thought you're just a journalist, which is like your whole shtick, Patrick.
Oh, he's talking about... No, I'm gonna agree with you, because I have an electrical
engineering degree, and when you do an engineering degree, they talk a lot about, like, hey, the PE, professional engineering and bridges.
And we had to take an entire class
on the ethics of engineering and failures
and people dying.
I did that class as well, man.
Right, and that's actually a problem
with the security industry is I love security.
I love security people.
I love Black Hat and DEF CON,
but there's a ton of cowboys, right?
There's a bunch of people
who come from just hacking backgrounds and stuff
and have never sat through that.
And I think that's part of the problem too,
is that like, it's a bit of a cowboy.
Well, they need their Ralph Wiggum voice.
I'm an engineer, you know?
And it's just, no, you're not, you know?
It's a little bit nuts.
But that said, if you were to apply
genuine engineering principles to, you know, modern technology, geez, it would slow things down. Right. So that's a that's a whole other conversation for a want is over-regulation of this industry or that industry. There's no kind of technology, policy,
discipline, or capability that exists or is resident in government that's going to
make some magical regulatory framework. So this really does get down to customers demanding more
and corporate responsibility on the software provider side. And that's,
unfortunately, just where we are right now. But I like this parallel to professionalism
across the industry and stating what is a solid safety standard? What is professional engineering?
I'm not saying we're looking to licensing because God knows I don't think we could
push everybody through that regime, but we need to approach something along those lines.
Yeah. I mean, you can, because even the, I don't know how it works in the United States,
the engineering institute here, you don't necessarily need an engineering degree to
be certified, but you would need seven years of experience and to be able to answer a few
pretty important questions. Right. So I think you could. I agree, though, that that's probably not the solution here.
In the midst of a workforce gap where, depending on which numbers you read,
it's 5 million or whatever it is.
I don't think we're getting there.
I'm just saying we have software engineering practices
to prevent this kind of stuff.
Yeah, 100%.
But Alex, any final thoughts before we wrap it up?
No, I just think Black Hat's going to be fascinating.
Yeah.
It's going to be an interesting discussion.
And from my perspective,
I'm going to be talking at Black Hat.
I'm going to be giving a talk on this.
I'm going to be talking about how we're going to,
the security industry needs to kind of use this
as a moment for self-reflection
because we're going to have to start to figure ourselves out.
This has, while it's not,
it's CrowdStrike's name out there, it has hurt all of us.
And I think that is going to be a real problem for the world.
Because, for two reasons.
One, because people are ripping and replacing.
The other thing is, CrowdStrike has given a great demonstration.
I was up at 2 a.m. that morning because I would not have been shocked if you had told me the PLA Marines were in the Taiwan Strait, right?
This is exactly what the Chinese would have done.
Volt Typhoon, this is exactly what they would have done on day one of the...
100%. I made that comparison too. I don't know if you caught that.
But yeah, I didn't catch that, but like, yeah. CrowdStrike has now given the
exact roadmap for what Russia or China are going to do on the day they make their move. And so
all of us now have to be extremely, extremely up to date and ready to go, because we are all
potential conduits for this kind of activity not being a mistake but being intentional.
100%. If I'm the PLA, I'm sending people to interview for jobs at all of the EDR companies right now,
as long-term plans, because that is, you know, bang for buck, an amazing investment.
All right, guys, we're going to wrap it up there. Chris Krebs, Alex Stamos, thank you so much
for joining me to do another one of these.
I'm recording, you know, obviously people wouldn't know this, but I'm actually recording
this on a Saturday.
I love to do these podcasts.
It's always fascinating to talk to you both.
Chris, thank you.
Alex, thank you.
Thanks, Patrick.
And it's good to know that we make it to Saturday.
We made it.
We made it.
Thank you.
Have a great day.