Lessons from the CrowdStrike outage

Celia Catalán

A few weeks ago, a global IT outage affected a large part of the world (Flu Project has published an article that covers it in more depth). The problem, which hit almost 9 million Windows devices (roughly 1% of Windows machines worldwide), was caused by a faulty update to CrowdStrike Falcon, a well-known EDR sensor that blocks attacks against systems and captures and records all activity in real time in order to detect threats quickly.

First lesson: an avoidable problem

But could all of this have been avoided? To answer that, we must first understand the root of the problem.

According to the company itself, the problem was caused by a corrupt file in the CrowdStrike Falcon update, which crashed any system that received it because the sensor was unable to read it. Since the failure occurred at kernel level, systems were left in a boot loop, showing the well-known “blue screen of death” (BSOD).
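To make the idea of "a file the system could not read" more concrete, here is a minimal sketch of defensive loading: check an update file before using it, and fall back to the last known-good version if the check fails. The file layout (a `UPD1` signature, a length field and a SHA-256 digest) and the function names are assumptions made up for this illustration; they do not describe CrowdStrike's real format or code.

```python
import hashlib
import struct
from typing import Optional

# Hypothetical layout: 4-byte signature, payload length, SHA-256 of the payload.
MAGIC = b"UPD1"
HEADER = struct.Struct("<4sI32s")


def pack_update(payload: bytes) -> bytes:
    """Build a well-formed update blob (what a publisher would ship)."""
    digest = hashlib.sha256(payload).digest()
    return HEADER.pack(MAGIC, len(payload), digest) + payload


def validate_update(blob: bytes) -> Optional[bytes]:
    """Return the payload if the blob is well-formed, otherwise None."""
    if len(blob) < HEADER.size:
        return None  # too short to even contain a header
    magic, length, digest = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if magic != MAGIC or length != len(payload):
        return None  # wrong signature, or truncated/padded payload
    if hashlib.sha256(payload).digest() != digest:
        return None  # payload was altered or corrupted
    return payload


def apply_update(blob: bytes, last_known_good: bytes) -> bytes:
    """Use the new payload only if it validates; otherwise keep the old one."""
    payload = validate_update(blob)
    return payload if payload is not None else last_known_good
```

With this approach, a blob full of zeros or a truncated download is simply rejected and the previous content stays active, instead of being handed to code that assumes it is valid.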

How can something like this happen in the middle of 2024? Aren't checks run before an update of this kind is released? Normally yes, but not always. In this case, CrowdStrike Falcon had Microsoft certification, but not every component shipped in an update needs to be certified.

And here lies the problem: these components ride on Falcon's certification to reach the core of the operating system, where any error can be fatal. It is therefore very important to control carefully which vendors can access this part of the OS.

All of this raises a question: if the update carried such high risk, why didn't CrowdStrike test it before releasing it globally? As some experts explain, “CrowdStrike had to take a risk because they discovered several critical vulnerabilities in the system, and these updates were intended to eliminate them before any attacker could take advantage of them”.
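One common way to limit the blast radius of a risky update, even an urgent one, is a staged rollout: deploy to a small group of machines first and only continue if they stay healthy. The sketch below is a generic illustration of that technique, not a description of CrowdStrike's release process; the ring structure, the `install` callback and the 1% threshold are all assumptions chosen for the example.

```python
from typing import Callable, List


def staged_rollout(
    rings: List[List[str]],
    install: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> List[str]:
    """Deploy ring by ring and stop when a ring fails too often.

    rings: groups of host names, smallest (canary) ring first.
    install: hypothetical callback that updates one host and returns
             True if the host is still healthy afterwards.
    Returns the hosts that received the update.
    """
    updated: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring:
            healthy = install(host)
            updated.append(host)
            if not healthy:
                failures += 1
        # Halt before touching the next, larger ring.
        if ring and failures / len(ring) > max_failure_rate:
            break
    return updated
```

If the update breaks the two canary hosts in the first ring, the loop stops there and the remaining rings never receive it.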

So, returning to the initial question of whether this whole problem could have been avoided: most likely, yes.

Second lesson: honesty and humility

After the incident, it was important for someone to take responsibility for what happened, and here we must look at both Microsoft and CrowdStrike.

For Microsoft, the recurring question is: why does it allow third-party software to reach so deep into its systems when this is so risky?

In statements to The Wall Street Journal, a spokesperson for the technology giant blamed the European Commission for forcing it to do so, under an agreement the two parties signed in 2009. The agreement was intended to promote free competition. Microsoft had acquired several antivirus products to provide cybersecurity services with direct access to the Windows kernel, which gave it a clear advantage over competitors. That is why it was agreed that the company should allow other vendors to know the operating system as well as Microsoft's own solutions do.

A representative of the Commission responded that Microsoft must adapt its security infrastructure in accordance with the signed agreement, and pointed out that the problem was not limited to EU territories. The representative also said that Microsoft had never raised any concern about this security aspect, either before or after the incident.

Following Microsoft's statements, many experts have weighed in, arguing that, despite the agreement, there is no good reason why the company could not comply with it while applying "adequate controls". They make it clear that the agreement does not require every antivirus to access the kernel; it simply gives them the option. In fact, not all of them do, as some rely on other approaches and their own innovation to protect the system.

As for CrowdStrike, after the incident it offered its partners an Uber Eats gift card worth approximately 9 euros as an apology. To make matters worse, so many people tried to redeem it that the application flagged the activity as fraud and canceled the coupons.

Unsurprisingly, criticism has been abundant. Compensation of 9 euros is not remotely proportional to the damage caused, and it is insulting to entities like ADIF, one of the hardest hit by the situation.

Honesty and a great deal of humility are essential when publicly facing an event of this caliber, and public relations departments must be up to the task.

Third lesson: you should not depend on a single solution

Large software platforms that centralize resources are increasingly common, which creates a heavy dependence on a few technologies. This is efficient but also very dangerous: if a critical component fails, it can cause a domino effect. The European regulation DORA (Digital Operational Resilience Act) places a lot of emphasis on this and dedicates a section to what it calls “concentration risk”, encouraging companies not to depend on only 3 or 4 suppliers and to spread their service and product needs across more organizations, thus avoiding dependencies that make them lose control.
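As a rough way to put a number on this kind of dependence, one option is to borrow the Herfindahl-Hirschman index from competition economics and apply it to how your critical services are spread across suppliers. This is only an illustrative sketch: DORA does not prescribe this formula, and the function name and input (a weight per supplier, such as spend or number of critical services) are assumptions for the example.

```python
from typing import Dict


def supplier_concentration(weight_by_supplier: Dict[str, float]) -> float:
    """Herfindahl-Hirschman index over supplier shares.

    weight_by_supplier: hypothetical measure of reliance on each supplier
    (for example annual spend or number of critical services).
    Returns a value in (0, 1]: 1.0 means total dependence on one supplier,
    values near 0 mean reliance is spread across many suppliers.
    """
    total = sum(weight_by_supplier.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum((w / total) ** 2 for w in weight_by_supplier.values())
```

A company relying on a single supplier scores 1.0, while one that splits its needs evenly across four suppliers scores 0.25, so a falling score over time suggests the dependence is being spread out.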

The incident only underlines this, and shows that the current push to connect everything (critical infrastructures, companies, public administrations, all kinds of gadgets, household appliances, etc.) increases the risk of cyberattacks, since we depend on the Internet for almost everything. In fact, not only Spanish companies were seriously affected; so were about 125 Fortune 500 companies. What would have happened if it had affected macOS and Linux too? What if it had affected 100% of the world's Windows machines? The situation would have gone from chaotic to apocalyptic.

Fourth lesson: cybersecurity as a priority

Today it is more important than ever to place cybersecurity among the highest priorities, whether for a company, a public administration, an institution or an individual user. The number of cyberattacks has risen to levels never seen before, and we must rise to the occasion.

The IT outage that affected Windows was historic and hit many sectors worldwide. Although it is not the first major outage, many experts remain surprised that companies in critical sectors such as banking, airlines and the media did not have, or could not execute, a better response through their incident contingency and recovery plans. Furthermore, some malicious actors are taking advantage of the problem to run phishing campaigns with a malicious file that supposedly fixes it. The bad guys never waste a good opportunity.

For all these reasons, and in light of similar past events, cybersecurity must be a priority in every area if we want to avoid losses, whether monetary or of sensitive company or user data.

Javier Muñoz, Cybersecurity Analyst at Zerolynx
