This blog posting represents the views of the author, David Fosberry. Those opinions may change over time. They do not constitute an expert legal or financial opinion.

CrowdStrike Incompetence Causes The Largest Cyber-Incident In History!

Posted on 27th July 2024

By now, most people must be aware of the huge incident of computer downtime that started on the 19th July 2024. A great many people were affected by the massive downtime of Windows computers around the world. Systems used by airlines, banks, health services, payment providers, hotel chains and many other industries were impacted. Microsoft estimates that 8.5 million systems worldwide were brought down by the issue, according to this report on the BBC. The scale of the incident dwarfs the impact of any malware attack, making it the largest single cyber-incident ever.

Microsoft are clearly very embarrassed by the incident, and were quick to explain that they were not responsible for the system outages, which is only partially true. Microsoft then tried to blame the EU, as described here on Euronews (a claim which the EU has since denied). Also, unusually, Microsoft rolled out software to assist in system recovery (in far less time than it normally takes them to address problems).

Most computer users were previously unaware of CrowdStrike. The problem was with CrowdStrike's Falcon Sensor software, which is a type of cybersecurity software (similar to anti-virus software); more strictly, the fault was with a data file used by Falcon Sensor, rather than the software itself. This video on YouTube gives a very accessible summary of how Falcon Sensor works, and why a faulty data file could cause such enormous problems.

In addition to crashing systems (leading to the "blue screen of death"), it appears that system backups were affected on some systems, according to this report on Reuters.

CrowdStrike's Falcon Sensor is used mainly by large corporations to protect their systems (Windows, Linux, and to a lesser extent Mac), which is why the bug did not affect private computers. In this case the issue only impacted Windows server and client (desktop and laptop) systems. It should be noted, however, that earlier this year a similar problem affected Debian Linux systems, as described in this report on Techspot and this article on Tom's Hardware. Also, in 2010 a global Windows PC meltdown (described here on BGR) was sparked by a bungled McAfee (anti-virus software) update; at the time, the CTO of McAfee was George Kurtz, the man who is now the CEO of CrowdStrike! All this strongly suggests that George Kurtz has a very cavalier attitude to software quality, and is the real source of the problems.

As described in the video above, Falcon Sensor works via a driver which runs at the operating system kernel level. This driver is extensively tested and certified by Microsoft. Unfortunately, Microsoft performs no such validation of the data file which caused the outages (a file similar to the malware signature file used by your anti-virus software). This kind of data file, which controls the behaviour of Falcon Sensor, is referred to in the software industry as reference data. I have extensive experience of software testing, and I know from that experience that, for data-driven software, comprehensive testing of such reference data is even more important than testing of the software that uses it. Clearly, the testing of the reference data was, on this occasion, not adequate; early reports indicated that the file was full of zeros, which should have been easy enough to detect!
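
To illustrate how cheap such a check can be, here is a minimal sketch in Python of a pre-release sanity test for a reference data file. It is only an illustration: the file signature (MAGIC) and minimum size (MIN_SIZE) are invented for the example and have nothing to do with CrowdStrike's real file format or release process.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; a real format will differ.
MAGIC = b"REFD"   # assumed 4-byte signature at the start of a valid file
MIN_SIZE = 16     # assumed minimum plausible file size, in bytes


@dataclass
class CheckResult:
    ok: bool
    reason: str


def check_reference_data(data: bytes) -> CheckResult:
    """Run cheap sanity checks on a reference data file before release."""
    if len(data) < MIN_SIZE:
        return CheckResult(False, "file is too small")
    # A file consisting solely of zero bytes cannot contain real rules.
    if data.count(0) == len(data):
        return CheckResult(False, "file contains only zero bytes")
    if not data.startswith(MAGIC):
        return CheckResult(False, "file signature is missing")
    return CheckResult(True, "basic checks passed")


if __name__ == "__main__":
    blank_file = bytes(1024)  # 1024 zero bytes
    print(check_reference_data(blank_file))
```

Checks like these only catch gross corruption; they are no substitute for actually loading the file on a test system before it is sent to customers.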

The process of recovery from the bug is complex and time-consuming (requiring multiple reboots), requires administrator privileges and rolls back the protections provided by Falcon Sensor to their state before the update. Even now, more than a week after it all began, recovery has not been completed on all affected systems.

The impact on people's lives is enormous, and financial losses are huge:

  • Initial estimates put the damage for the Australian economy at more than $1 billion, according to this report on ABC.
  • This report on CNN claims to know the cost of the outages, but clearly we do not yet really know.
  • I would not be at all surprised if the final bill exceeds $100 billion.

Companies are already talking about suing CrowdStrike to recover their losses. Good luck with that: the company is not rich enough to cover the losses, and will probably be driven into bankruptcy. If you own shares in CrowdStrike, now would be the time to divest. I would not be surprised if some corporations also tried to sue Microsoft, for allowing data-driven software to run at the kernel level without validating the associated reference data.

Finally, just to rub salt in the wound, this report on The Guardian suggests that this kind of issue is likely to happen again. Maybe war (nuclear or conventional), climate change, global pandemics or the AI apocalypse are not what will wipe out civilisation, but rather faulty software.