
2024 Cyber Blackout! Who pressed the enter key!?

This week, a significant issue arose involving CrowdStrike and Microsoft computers. Here’s a brief summary of what happened, as well as my hot take on the matter:

CrowdStrike’s Falcon Sensor Update

CrowdStrike, a cybersecurity company, released an update for its Falcon Sensor software, which is designed to protect computer systems from cyber attacks. Unfortunately, this update contained a defect that caused Windows computers with the software installed to crash, resulting in the infamous “Blue Screen of Death”.

[Update: Dave Plummer has a great overview of the technical side of this outage over on his YouTube channel. I recommend watching it if you are more inclined to hear the nitty-gritty details of what happened with the update.]

Global Impact

The issue started in Australia and quickly spread worldwide, affecting various sectors including airlines, banks, broadcasters, and healthcare systems. Many businesses and organizations experienced significant disruptions as their Windows-based systems crashed.

Affected Industries

Airlines and Airports: Airlines and airports experienced significant disruptions, leading to delays and cancellations.
Banks and Financial Services: Banks and financial institutions faced operational challenges, affecting transactions and services.
Healthcare: Hospitals and healthcare systems were impacted, causing delays in services and treatments.
Media and Broadcasting: Media companies and broadcasters experienced outages, affecting their operations.
Retail: Retail businesses faced disruptions in their operations, affecting sales and customer service.
Government Services: Various government services were also affected, leading to delays and disruptions.

Resolution Efforts

CrowdStrike identified the issue and deployed a fix. However, the recovery process involves manual steps, such as booting affected systems into Safe Mode and deleting specific files. Microsoft is also working closely with CrowdStrike to provide technical guidance and support to affected customers.
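For the curious, the "find and delete specific files" step can be sketched in a few lines of Python. This is only an illustration, not CrowdStrike's or Microsoft's tooling. The function names are hypothetical, and the directory and file pattern mentioned in the comments are what was widely reported in the public workaround guidance, not something taken from this post. In practice the cleanup was done by hand from Safe Mode or the Windows Recovery Environment, where Python would not normally be available.

```python
from pathlib import Path


def find_matching_files(directory: Path, pattern: str) -> list[Path]:
    """Return files in `directory` whose names match the glob `pattern`."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_files(files: list[Path], dry_run: bool = True) -> list[Path]:
    """Delete the given files unless `dry_run` is set; return the paths handled."""
    for path in files:
        if not dry_run:
            path.unlink()
    return files


# Assumed values, based on the publicly reported workaround (not from this post):
#   directory: C:\Windows\System32\drivers\CrowdStrike
#   pattern:   C-00000291*.sys
```

The dry-run default is deliberate: listing what would be removed before removing anything is a cheap safeguard when touching a drivers folder.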

This incident has been described as one of the largest IT outages in history, highlighting the critical importance of robust cybersecurity measures and the potential impact of software updates.

Hot take

CrowdStrike is a cybersecurity company that was founded in 2011 on the premise that security is fundamentally broken. They provide added software to ‘fix’ issues in Microsoft Defender, and have a massive number of HUGE clients running their software on Windows Servers. Yet… they proved themselves right. Security IS fundamentally broken… partly because they broke it. I mean, with a name like “The Crowd is going to Strike you like a Distributed Denial of Service Attack”… sure… to the customers that relied on them for security, it was 20/20 hindsight that this was inevitable.

That said, mistakes happen, and people are usually at the root of those mistakes. This has got to be a huge disappointment for the CrowdStrike team, and could well lead to major losses for the company. Having worked in the realm of security at Microsoft (in FFO), I know how hard it is to manage and maintain massive cloud security systems worldwide. The number of layers that have to fail for a mistake on this level to get through is monumental, and I’m sure there will be a detailed RCA (root cause analysis) to determine the steps that led to this result. This may even result in a congressional hearing, given that so many systems of industry, both medical and commercial, were affected.

Someone said to me,

It is too bad that they keep saying that it is a Microsoft outage. Since it really is a different company.

My reply was a pithy,

This is nothing new… we get blamed for [large third party software suite]’s software not working… [Fruit flavored music app] not working on Windows (what?) as well as anytime your internet goes down.

However, I know we have also had outages (even in the same week) that caused many of the same sorts of issues on a smaller scale, and it is always hard to come out of this unscathed.

More importantly, I hope that all the affected people and organizations are able to recover without undue loss. That includes the many people left stranded in travel, those unable to make vital financial transactions, and those waiting on government services (well, that is always slow). This goes double for those affected by medical systems being down and unable to get the care they require. I personally can vouch for how difficult this may be.

I also genuinely wish the CrowdStrike team the best, and hope they can weather the storm and find a way to recover from this massive loss.

This post is licensed under CC BY 4.0 by the author.