The Journal. - The Glitch That Crashed Millions of Computers
Episode Date: July 23, 2024
Last Friday, 8.5 million computers around the world stopped working. All kinds of businesses were impacted, from airlines to banks to hospitals. The cause was a routine update sent out by a software company called CrowdStrike. WSJ’s Robert McMillan explains how the meltdown happened and why Microsoft’s software was especially vulnerable.
Further Reading:
- Blue Screens Everywhere Are Latest Tech Woe for Microsoft
- CrowdStrike Made Its Name Fighting Technology Problems. Now It Has Caused One.
Further Listening:
- The Computer Glitch That Caused Nearly 1,000 Convictions
- Hacking the Hackers
Transcript
Last Friday, 8.5 million computers around the world stopped working.
It was a massive outage that stalled all kinds of industries.
Many of us woke up this morning and discovered we lost access to mobile banking,
maybe even the use of debit and credit cards.
Travelers here at LAX stuck dealing with flight cancellations
and delays. Non-urgent surgeries postponed at hospitals across the country. Passports
couldn't be verified, so real IDs couldn't be processed. It affected banks. It affected UPS.
It affected Starbucks. It affected Tesla. It affected the MTA. I talked to our colleague Bob McMillan yesterday afternoon.
There were schools that went out, hospitals, News Corp.
Yes, our parent company.
I just talked to our IT guy. He's still, this is Monday, he's still trying to fix stuff.
Wow.
The crash happened because a little-known company made a mistake during a routine software update.
It was an update that went horribly, horribly wrong.
A lot of people in corporate environments run this product called CrowdStrike Falcon.
And CrowdStrike Falcon keeps your computer safe, but it gets these updates about what the bad stuff is all the time.
But for some reason, this update that went out on Friday morning contained data that caused
the Falcon software to blow up inside the brains of Windows computers. And once that happened,
those computers became very, very difficult to fix.
Welcome to The Journal, our show about money, business, and power.
I'm Jessica Mendoza. It's Tuesday, July 23rd.
Coming up on the show, how one software update caused a global IT meltdown.
So CrowdStrike sent out the faulty update on Friday.
When did people start to notice the problem?
Well, right away.
Because all these computer systems just stopped working.
People watching Sky News noticed it because it suddenly went off the air.
People in airports, the baggage handling system wasn't working.
So it was immediate.
I was actually flying back from Milwaukee from the Republican National Convention, and it was chaos at the airport because there were all these blue screens everywhere
that were just showing kind of like recovery or error.
Yeah, yeah.
What was that blue screen exactly?
That's called the blue screen of death.
The blue screen of death is a problem specific to Microsoft computers.
It shows a blue screen with an error message,
and in some cases, a sad-faced emoticon.
But blue screen of death means the computer is not working.
It's not going to work until you do something to make it start up again.
And usually what happens when you get a blue screen of death is you just reboot, and everything kind of sorts itself out.
And what happens when you get a blue screen of death?
Well, they call it bricking, right?
When your computer becomes as useful as a brick.
And so that happened to all kinds of computers all around the world.
And what made this so tricky is that in order to fix the problem, you couldn't just reboot it.
You know, often with the blue screen of death, you just start all over and everything works fine.
Right.
But in this case, you had
to physically go to the machine. You had to start it up in a certain way. Then you had to surgically
go in and remove a file. So we're talking, I don't know, 20 minutes like every computer,
but also you have to physically get to all these computers. So all these people, even today,
are showing up at their corporate headquarters
saying, like, my computer hasn't worked since Friday.
Could you get it going again for me, please?
On Friday, CrowdStrike's chief executive said
that the company was working to restore operations
for its customers.
So tell us about the company
at the center of this crash, CrowdStrike.
I had never heard of it until this outage happened.
What is it? What does it do?
Well, CrowdStrike, they were founded in 2011.
So what does that make them? 13 years old.
They're a very fast-growing company.
They're very well-respected.
Also, I think it should be pointed out that CrowdStrike
is like an incredibly flashy cybersecurity company. George Kurtz, the CEO of the company,
races sports cars. CrowdStrike once sent me a calendar that for every month it had like a
cartoon picture of a hacking group. Wow. And they give their hacking groups colorful names like Fancy
Bear and Cozy Bear, and they have Scattered Spider. So they're sort of like a cybersecurity
group with a little pizzazz. Yeah, I would say so, for sure. CrowdStrike was founded at a time
when hackers were getting better at getting around traditional antivirus software. The company seemed
to offer an effective alternative.
CrowdStrike came up and they said, we're going to really pay attention to what the hackers are
doing. We're going to really focus on understanding the hackers and we're going to create more
behavior-based software. We're going to create a new kind of software that's better than traditional
antivirus. And their software was better than traditional antivirus, and they were
very, very successful. So they went from like a small startup in 2011 to, they're about an 8,000-person,
$73 billion market cap type company right now that's publicly traded. And they were
extremely popular in the Fortune 500. They really focused on the big companies and doing
sales to satisfy these very large corporate clients. So would you say CrowdStrike came to
be known as sort of the premier software for protecting? Yeah, they're considered one of the
best cybersecurity companies to go to if you're a large corporation.
Yeah, big enough to put up an ad during the Super Bowl, right?
Protecting your business from cyber attacks can be unrelenting.
Today's adversaries move fast.
CrowdStrike moves fast.
I can't think of another cybersecurity company that's done a Super Bowl ad.
There might be one, but, and their ads were pretty good too, I gotta say.
On Friday, almost as soon as CrowdStrike got wind of the outage, it tried to fix the bug.
And the company was able to, just over an hour after the update went out.
But dealing with the aftermath was another thing.
They clearly stayed up all night.
Because, yeah, we saw George Kurtz, you know, on the Today Show Friday morning.
He looked tired.
And I want to start with saying we're deeply sorry for the impact that we've caused to customers,
to travelers, to anyone affected by this, including our company.
So they were very quick to say, like, look, we weren't hacked. This isn't some kind of side... I mean, sure, they wish that they could
blame somebody else, but they were very clear that this wasn't somebody taking over our product. They
100% took responsibility for it. They've been a little unclear on the precise nature of the problem. And so even now, we don't have like the 100%
crystal clear, precise understanding of how this flaw got introduced, when it got introduced,
who introduced it. That flaw affected a lot of computers,
but only ones running Microsoft Windows. Why is next.
CrowdStrike makes security software. Any company can buy the program and keep computers safe from potential hackers. But the reason last weekend's faulty update impacted only Microsoft computers has to do with something called the kernel.
The kernel is like the very, very center of it.
It's like the first thing you boot up in it.
It's kind of like command central.
Like think of, you know, just the brain is really the best way to think of it.
But it's the thing at the very center of all of it that starts up at the beginning, that has control over everything.
Apple and Android operating systems restrict software programs' access to a computer's kernel.
But Microsoft doesn't.
A holdover from the way its programs were originally designed. In the olden times, when Windows was coming up,
it was really common to just like allow the software access to the kernel. It could be
much more powerful then. So if you had security software, it could do a really much better job
of finding bad stuff. It was just like, once you're in the kernel, you're in like this super powerful place,
so you can do anything you want to do. And so it became really, it's a great place for security
software, but it's also a really dangerous place if the security software goes wrong.
This is why the faulty update was so bad for Microsoft computers. And it's why other
computers like Macs weren't affected. This put Microsoft in a position where some company that
they have no control over can introduce an issue that can crash eight and a half million of their users,
and they can't do anything about it. And so from Microsoft's perspective,
this wasn't an oversight or anything. This was just the way that Windows has been designed.
It allows these different kinds of software to access the kernel.
I mean, what happened this week was 100% not Microsoft's fault, but they have designed their product in a way that allowed this to happen.
What has Microsoft said about the issue?
Well, they've tried to help their customers with it, right?
So they've published some guidance about how to fix things.
They're in a tough position because they didn't cause this, but they're sort of, you know,
they're the operating system vendor that was affected by it.
So they've tried to be as helpful with their customers as they can be,
but there's only so much they can do.
In a blog post, Microsoft said that the outage affected less than 1% of its global footprint.
A Microsoft spokesman said the company can't legally wall off
its operating system in the
same way Apple does, because of an understanding it reached with the European Commission in 2009
following a complaint. So who is at fault? Is it Microsoft or is it CrowdStrike?
It's CrowdStrike. CrowdStrike really bungled this. But I think that if Microsoft had really
been pushing the envelope on the design of their
operating systems and really prioritized security, they could have made some changes to the kernel
that would make this less likely to happen. So the whole question about why they haven't done that,
it's a tough one. But, you know, clearly Apple did make this move a few years ago, and Microsoft has not.
As of Tuesday, some of the problems caused by this outage haven't been resolved yet.
Some companies are still fixing computers,
and travelers are still dealing with the aftermath of thousands of canceled flights.
And for CrowdStrike, its stock has gone down by 25%. Long-term, in my experience, these kind of problems, you can bounce back from them,
but it's really damaged what was a pristine reputation.
And they're going to have to make sure that something like this doesn't happen again.
There is a point at which you can really seriously erode your
customers' trust if you're making them go through this on a frequent basis. It's also like kind of,
it's sort of incredible that after all these years with all this experience and software
reliability and building systems, like you can still have like eight and a half million
computers go out like that.
What does this crash, you know, that brought down so many industries, it left so many travelers
stranded, it turned computers into bricks. What does that tell you about the state of our technology?
Well, I've been writing about computers and computer security for a long time now.
And I'm always watching to see when an outage or a computer issue transcends inconvenience, right?
So this was like incredibly inconvenient and financially costly, but no lives were lost.
So, you know, the kind of pessimistic cliche would say this shows that we're more dependent than ever on technology and that when it goes down, it can have wide ranging and very annoying and costly effects.
But it's also showing that, you know, we haven't hit that point where it's as catastrophic as like a hurricane, you know.
And will we get there?
Like, I think we probably will.
That's all for today, Tuesday, July 23rd.
The Journal is a co-production of Spotify and The Wall Street Journal.
Additional reporting in this episode by Tom Dotan.
Thanks for listening. See you tomorrow.