Understanding Today’s Computer Outages from CrowdStrike Update



Today, many computer users experienced unexpected outages due to a recent update deployed by CrowdStrike for Windows computers. This update, intended to enhance security, led to widespread crashes and the notorious Blue Screen of Death (BSoD). The impact was far-reaching, affecting everyone from home users to critical industries like banks and airports, and even Azure instances, which are typically well-protected against such issues. This raises the question of how such a significant problem could come from a security update. So let’s take a look at what happened.

Who is CrowdStrike?

Let’s start by looking at who CrowdStrike is. CrowdStrike is a computer security company based in Austin, Texas. It provides a range of software tools to help keep your computers and your network safe: endpoint security, cloud security, SIEMs for monitoring all of your security, and more. Together these form a full stack of computer security tooling for everyone from small companies to large enterprises. CrowdStrike is one of the larger companies in this space, with roughly 24% of the market (source), which means that a lot of computers run its security software.

Cause of the outage

The recent computer outages were caused by an update from CrowdStrike to its security software for Windows machines. The update contained an error that was not found during the testing phase. This error caused affected computers to crash with a Blue Screen of Death (BSoD). The machines then went into a boot loop and were unable to start up successfully.

This widespread problem required an immediate fix from CrowdStrike. The company released a workaround and ended up rolling back the update, which allowed systems to come back online.

How did this happen?

With any software, there will be bugs, glitches, and things that just don’t work the way they should. Software is written by humans, and humans are fallible. Even software written by AI tools like Copilot and ChatGPT can be buggy, with the generative AI hallucinating code or solutions that don’t exist.

So, all software is fallible and buggy. Even NASA software is fallible and buggy.

The best that programmers can do is test their software extensively to make sure that it performs under all conditions. However, humans also write these tests, and the tests themselves can be wrong. Worse yet, something may happen that the person writing the tests never thought of, in which case no test covers that scenario at all.

So, if someone makes a mistake writing tests, or something happens that wasn’t covered by tests because no one thought about it, then bugs slip into production and we get what happened today.
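
To make the idea of a gap in test coverage concrete, here is a small Python sketch. It is purely illustrative: the parse_version helper and its test are hypothetical and have nothing to do with CrowdStrike’s actual code or the actual bug. The test only checks well-formed input, so it passes, while an input nobody thought about still crashes the function.

```python
def parse_version(text):
    """Parse a 'major.minor' string into two integers (hypothetical helper)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def test_parse_version():
    # The author only thought about well-formed input.
    assert parse_version("7.11") == (7, 11)
    assert parse_version("10.0") == (10, 0)


if __name__ == "__main__":
    test_parse_version()
    print("all tests passed")

    # An empty string was never considered, so no test covers it.
    try:
        parse_version("")
    except ValueError as exc:
        print("crash on untested input:", exc)
```

The test suite reports success, yet the function fails as soon as it sees input outside what the tests anticipated. In production code, that kind of unhandled failure is what turns into a crash.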

What can we do about it?

One thing that could have prevented this issue is an “incremental deployment,” where only a subset of machines is updated to the newest patch at first. This allows a new version of the software to be tested in production and confirmed to be working as intended before the whole set of computers and systems gets the update.

Incremental updates would have prevented the widespread outages that we saw today. The bug still would have been there, and still would have been deployed. However, instead of deploying to tens of thousands of computers (if not more), it would have only deployed to 1% or 2% of those computers. As soon as those 1% to 2% of computers started crashing, CrowdStrike would have known there was an issue with the release and could have stopped the deployment. Staging a rollout like this is something that either CrowdStrike or a business’s IT department would have needed to handle.
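
The incremental deployment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how CrowdStrike or any real deployment system works: the staged_rollout function, the deploy and is_healthy callbacks, and the 2% and 5% thresholds are all made up for the example.

```python
def staged_rollout(machines, deploy, is_healthy,
                   canary_percent=2, max_failure_percent=5):
    """Update a small canary group first; continue only if it stays healthy.

    deploy(machine) applies the update and is_healthy(machine) reports
    whether the machine still works afterwards (both are hypothetical hooks).
    """
    if not machines:
        return {"status": "completed", "updated": 0, "failed": 0}

    # Pick the canary group: at least one machine, roughly canary_percent.
    canary_count = max(1, len(machines) * canary_percent // 100)
    canary, rest = machines[:canary_count], machines[canary_count:]

    for machine in canary:
        deploy(machine)

    failed = sum(1 for machine in canary if not is_healthy(machine))

    # Halt before touching the remaining machines if too many canaries broke.
    if failed * 100 > max_failure_percent * len(canary):
        return {"status": "halted", "updated": len(canary), "failed": failed}

    for machine in rest:
        deploy(machine)
    return {"status": "completed", "updated": len(machines), "failed": failed}


if __name__ == "__main__":
    # Simulate a bad update: every machine that receives it crashes.
    crashed = set()
    fleet = [f"pc-{i}" for i in range(1000)]
    result = staged_rollout(fleet, deploy=crashed.add,
                            is_healthy=lambda m: m not in crashed)
    print(result)
```

With a simulated update that crashes every machine it touches, the rollout stops after the 20 canary machines and the other 980 are never updated. A real system would also wait some time between stages and watch more than a single health signal, but the core idea is the same: check a small group before updating everyone.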

In Summary

In summary, the recent computer outages were caused by an update from CrowdStrike. The update contained an error that slipped past testing, crashing Windows machines and leaving them stuck in a boot loop, which led to widespread disruptions. Rolling updates out incrementally and staying informed about software updates and their potential impacts can help manage and prevent such issues in the future.
