July 22, 2024

Windows hosts experiencing Blue Screens due to CrowdStrike – Update

Advisory
Pierre Dumont

Summary

In the early hours of Friday, July 19th, at 04:09 UTC (06:09 CEST, 21:09 MST Thursday), a faulty CrowdStrike sensor configuration update – specifically a “channel file” – led to a widespread issue causing Windows hosts with sensor version 7.11 and above to encounter the Blue Screen of Death (BSOD).

CrowdStrike addressed this issue within 90 minutes, reverting the faulty channel file by 05:27 UTC.
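The timeline above can be checked with a few lines of Python. This is a minimal sketch that assumes fixed UTC offsets (+02:00 for CEST, -07:00 for MST) instead of a time zone database; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Times taken from the advisory (UTC).
pushed = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
reverted = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)

# Assumption: fixed offsets, chosen to keep the sketch dependency-free.
CEST = timezone(timedelta(hours=2), "CEST")
MST = timezone(timedelta(hours=-7), "MST")

# Length of the window during which the faulty channel file was served.
window_minutes = int((reverted - pushed).total_seconds() // 60)

print(pushed.astimezone(CEST).strftime("%Y-%m-%d %H:%M %Z"))
print(pushed.astimezone(MST).strftime("%Y-%m-%d %H:%M %Z"))
print(f"Exposure window: {window_minutes} minutes")
```

The difference between the two UTC timestamps is 78 minutes, which is consistent with the "within 90 minutes" statement.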

Key points to note:

  • CrowdStrike was not the target of a cyberattack
  • This issue did not impact Linux or Mac hosts; they remain protected
  • Windows hosts that were unaffected or have been restored are protected
  • All Kudelski Security systems have been restored and are working as expected

Kudelski Security stands with CrowdStrike during these challenging times. Our global strategic partnership with CrowdStrike has been a cornerstone of Kudelski Security’s Managed Security business since 2016.

Kudelski Security has been working diligently with our clients and CrowdStrike partners to resolve this issue as swiftly as possible.

Affected Systems

The issue impacted Windows hosts that were online between 04:09 and 05:27 UTC and received the faulty channel file “C-00000291-*”.

CrowdStrike has provided two methods to identify potentially impacted hosts: dedicated dashboards and an advanced event search query.

Windows hosts offline between 04:09 and 05:27 UTC are not impacted. Additionally, Linux and Mac hosts are not impacted.
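As a rough illustration of the criteria above, the following sketch checks whether a host's online period overlaps the impact window. The function name and its parameters are hypothetical and do not correspond to any CrowdStrike API. Being online during the window is a necessary condition only: the host must also have received the faulty channel file.

```python
from datetime import datetime, timezone

# Impact window from the advisory (UTC).
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def possibly_impacted(platform: str, online_from: datetime, online_until: datetime) -> bool:
    """Return True if a Windows host was online at any point during the window.

    Hypothetical helper: the inputs would come from your own asset inventory.
    """
    if platform.lower() != "windows":
        # Linux and Mac hosts are not impacted.
        return False
    # Two intervals overlap when each one starts before the other ends.
    return online_from <= WINDOW_END and online_until >= WINDOW_START
```

A host that was powered off for the whole window returns False, whatever its platform.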

“Why were so many systems affected? I am using N-2 as you recommended!”

Kudelski Security’s best practices, as reflected in our managed sensor update policies, recommend using an N-2 version for your production hosts and an N-1 version for a representative subset of hosts, i.e., your pilot hosts.

The key point is that the outage was not caused by a faulty sensor update. Sensor updates go through testing on both the CrowdStrike and the Kudelski Security side. The outage was caused by a channel file. Channel files “are additional sensor instructions that provide updated settings for policies, allowlists and blocklists, detection exclusions, support for new OS patches, and more.”

Those files are pushed more often than new sensor versions and are not managed through sensor update policies.
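To make the N-1 / N-2 terminology concrete, here is a minimal sketch that resolves those labels from a list of released sensor versions. The version numbers are placeholders and the function is hypothetical; it does not reflect how Falcon sensor update policies are implemented. Channel files are delivered outside of this mechanism, so pinning a sensor version does not delay them.

```python
def pick_versions(releases: list[str]) -> dict[str, str]:
    """Map N, N-1 and N-2 to concrete versions from a list of releases."""
    # Compare numerically so that "1.9" sorts before "1.10".
    ordered = sorted(
        releases,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        reverse=True,
    )
    if len(ordered) < 3:
        raise ValueError("need at least three releases to resolve N-2")
    return {"N": ordered[0], "N-1": ordered[1], "N-2": ordered[2]}


# Placeholder version numbers, for illustration only.
policy = pick_versions(["1.9", "1.11", "1.8", "1.10"])
print(f"pilot hosts: {policy['N-1']}, production hosts: {policy['N-2']}")
```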

Dashboard

CrowdStrike published multiple dashboards under Next-Gen SIEM > Log Management > Dashboards.

We recommend using the “hosts_possibly_impacted_by_windows_crashes_granular_status” dashboard.

To use the dashboard:

  1. Open the dashboard and select your CID, or use * to select all CIDs if you have multiple.
  2. Select * for all aids (agent IDs) or, if you have many hosts, split them into groups using aidsubset.
  3. Select the “CHECK” status if desired. The status values are defined as follows:
    • DOWN: a high-confidence assessment where remediation is likely to be required
      • Endpoint has a channel file version of 0 and has not checked in after the impact window.
      • Endpoint received the channel file during the impact window but has NOT checked in after the impact window.
    • VERIFY: a low- to medium-confidence assessment
      • Endpoint received the channel file during the impact window and has checked in after the impact window.
    • RECOVERY_LIKELY: a medium-confidence assessment
      • Endpoint received the channel file during the impact window and has checked in after the impact window with a total reported uptime of 5-10 hours.
    • RECOVERY_VERY_LIKELY: a medium- to high-confidence assessment
      • Endpoint received the channel file during the impact window and has checked in after the impact window with a total reported uptime of 10-20 hours.
    • UNKNOWN: there is not enough available data to form an assessment
      • Cannot determine endpoint status based on available telemetry.
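The status definitions above can be read as a small decision procedure. The sketch below restates them in Python for clarity. It is not CrowdStrike's dashboard query: the field names are hypothetical, the handling of the 10-hour boundary is an assumption, and hosts whose uptime falls outside the listed ranges are assumed to stay at VERIFY.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostTelemetry:
    # All field names are hypothetical; this is not the dashboard's schema.
    channel_file_version: Optional[int]  # None when unknown
    received_in_window: Optional[bool]   # None when unknown
    checked_in_after: Optional[bool]     # None when unknown
    uptime_hours: Optional[float] = None  # uptime reported since the check-in


def classify(host: HostTelemetry) -> str:
    """Restate the dashboard's status definitions as a decision procedure."""
    if host.checked_in_after is None:
        return "UNKNOWN"
    if not host.checked_in_after:
        # No check-in after the impact window: likely needs remediation.
        if host.channel_file_version == 0 or host.received_in_window:
            return "DOWN"
        return "UNKNOWN"
    if not host.received_in_window:
        # Checked in, but no evidence that the faulty file was received.
        return "UNKNOWN"
    uptime = host.uptime_hours
    if uptime is not None and 10 <= uptime <= 20:
        return "RECOVERY_VERY_LIKELY"
    if uptime is not None and 5 <= uptime < 10:
        return "RECOVERY_LIKELY"
    # Assumption: any other uptime (or none reported) stays at VERIFY.
    return "VERIFY"
```

For example, a host that received the channel file during the window and has not checked in since is classified as DOWN, while the same host with a check-in and 12 hours of reported uptime is classified as RECOVERY_VERY_LIKELY.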

In the “Impacted sensors by aid subset” widget, click on the menu in the top-right corner to find the option to export the results to a file.
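Once the results are exported, a short script can summarize how many hosts fall into each status. This sketch assumes the export is a CSV file with a header row; the column names `aid` and `status` are hypothetical and should be adjusted to match the actual export.

```python
import csv
import io
from collections import Counter


def tally_statuses(csv_text: str, status_column: str = "status") -> Counter:
    """Count hosts per status in an exported CSV (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[status_column] for row in reader)


# Made-up sample data standing in for a real export.
sample = "aid,status\naid-1,DOWN\naid-2,VERIFY\naid-3,DOWN\n"
for status, count in tally_statuses(sample).most_common():
    print(f"{status}: {count}")
```

For a real export, read the file with `Path(...).read_text()` and pass the contents to `tally_statuses`.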

Here are the links for each cloud:

More information can be found on the dedicated page here: https://supportportal.crowdstrike.com/s/article/ka16T000001tm1eQAA

Advanced Event Search

In addition to the dashboard, CrowdStrike has provided the queries used in the dashboards above to identify the potentially impacted hosts. These queries can be found at the end of the dashboard page: https://supportportal.crowdstrike.com/s/article/ka16T000001tm1eQAA

Here are the links to the Advanced event search page:

Remediations

CrowdStrike and cloud vendors have provided multiple official remediation options depending on the host type:

Individual hosts

Manually

We have seen reports that rebooting a host multiple times might allow the reverted channel file to be downloaded. We recommend connecting the host to a wired network rather than Wi-Fi and rebooting it several times.

If the host continues to crash, follow these steps:

  1. Boot into Safe Mode or the Windows Recovery Environment: https://support.microsoft.com/en-us/windows/start-your-pc-in-safe-mode-in-windows-92c27cff-db89-8644-1ce4-b3e5e56fe234
  2. Navigate to the %WINDIR%\System32\drivers\CrowdStrike directory.
  3. Delete only the file matching “C-00000291*.sys”.
  4. Boot the host normally.
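The file-matching step can be sketched as follows. This is an illustration only, assuming a Python interpreter is available in the environment where the disk is accessible, which is often not the case in Safe Mode or the Windows Recovery Environment. The pattern follows the channel file number given under Affected Systems. The function names are hypothetical, and the helper defaults to a dry run that lists matches without deleting anything.

```python
from pathlib import Path

# Glob pattern for the faulty channel file (channel file 291).
PATTERN = "C-00000291*.sys"


def find_faulty_channel_files(driver_dir: Path) -> list[Path]:
    """Return the files in driver_dir that match the faulty channel file pattern."""
    return sorted(p for p in driver_dir.glob(PATTERN) if p.is_file())


def remove_faulty_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """List matching files and delete them only when dry_run is False."""
    matches = find_faulty_channel_files(driver_dir)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches
```

On an affected host, `driver_dir` would be the CrowdStrike directory from step 2. Run the helper in dry-run mode first and confirm that only the intended file is listed before passing `dry_run=False`.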

For hosts encrypted with BitLocker, a recovery key might be required. CrowdStrike describes multiple methods to retrieve BitLocker keys at https://www.crowdstrike.com/blog/statement-on-falcon-content-update-for-windows-hosts/

Via USB tool

CrowdStrike and Microsoft have worked together to release a recovery tool that creates a bootable USB drive to perform the remediation. It is available at https://techcommunity.microsoft.com/t5/intune-customer-success/new-recovery-tool-to-help-with-crowdstrike-issue-impacting/ba-p/4196959

Automatically

Finally, on Monday, July 22nd, CrowdStrike released a way to remediate hosts automatically.

This process is opt-in: you need to contact CrowdStrike support or provide the CFC with authorization from one of your Falcon Administrators.

Then, the impacted hosts must be rebooted multiple times to give the sensor a chance to download the latest instructions (to quarantine the faulty channel file) before the faulty file is applied.

It is recommended to connect the host to a wired network.

UPDATE (Tuesday, July 23rd): automatic remediation is now applied to all clients and is opt-out instead of opt-in. There is no longer any need to open a case with CrowdStrike support or the CFC. Therefore, only perform the manual or USB remediation if the host does not recover.

Recommendations

Due to the scale of the outage, it is likely that threat actors will target CrowdStrike clients. CrowdStrike intelligence has already reported the registration of domain names that could be used to impersonate their website.
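As a simple illustration of how such lookalike domains might be flagged in proxy or DNS logs, the sketch below marks any domain that contains the brand name but is not crowdstrike.com or one of its subdomains. The function is hypothetical and deliberately naive: it does not catch misspellings or homoglyphs, and it is not a substitute for threat intelligence feeds.

```python
LEGITIMATE_DOMAIN = "crowdstrike.com"
BRAND = "crowdstrike"


def is_suspicious(domain: str) -> bool:
    """Flag domains that use the brand name outside the legitimate domain."""
    name = domain.strip().lower().rstrip(".")
    if name == LEGITIMATE_DOMAIN or name.endswith("." + LEGITIMATE_DOMAIN):
        return False
    # Ignore hyphens so that variants such as "crowd-strike" are also caught.
    return BRAND in name.replace("-", "")
```

Any domain flagged this way should be reviewed manually before it is blocked.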

The CFC is actively monitoring the situation and will inform clients of further developments if necessary.
