The recent CrowdStrike outage was a stark reminder that not even tech titans are immune to disruptions, and that these disruptions have massive, global ripple effects across industries. Flights were canceled, surgeries were delayed, and emergency systems went down, along with an estimated 8.5 million Windows devices. The incident led to a 21% drop in CrowdStrike’s shares, equating to a $16 billion loss in valuation, and a 0.71% decrease in Microsoft’s share price, equating to a $23 billion loss in market value. Economic damages from this event are estimated to reach tens of billions of dollars.
In an age where customer assurance matters more than ever, the trust that CrowdStrike had painstakingly built over the years was threatened, if not lost, within hours, and thousands of its team members worked overtime to restore systems. It’s not a situation I’d wish on any vendor, or on any company reliant on a vendor, but there are ways to avoid and mitigate the impacts of these disruptions, and lessons we can all learn from this one.
So, now that we’ve had this wake-up call, how can businesses protect themselves from the next big outage?
Due Diligence Before Implementation
Let’s take it from the top. Before signing with a new tech vendor, investigate how they collect and store your data, stage automatic software updates, and mitigate risks. Approaching the security review process with disruptions in mind may require your security and Governance, Risk, and Compliance (GRC) teams to incorporate new questions and requirements into security reviews. Still, considering the stakes of a global outage, it’s worth the extra effort.
For example, CrowdStrike’s standard software development processes, used to develop and test the feature that caused the outage, did not catch the issue before it was pushed to the world. Scrupulous security reviews might have determined that these standard processes were not adequate for every feature.
Before you partner with any tech vendor, you must dig deep into their operational processes, especially concerning data management, software updates, and risk mitigation strategies. In my experience, one of the most effective ways to evaluate a vendor’s resilience is through targeted questionnaires that go beyond the surface level. Here are some critical questions to include:
- How do you ensure the integrity and security of your software updates?
- Can you provide details on your incident response plan during an outage or breach?
- What are the redundancies in place for your data storage solutions?
- How do you conduct security testing, and how often?
- What specific measures do you take to ensure business continuity for your clients?
These questions will help gauge whether a vendor has the necessary safeguards to prevent or quickly address disruptions. A robust response strongly indicates a vendor’s commitment to minimizing risk. Conversely, watch for the following red flags in a vendor’s answers:
- Incomplete or Evasive Answers: If a vendor provides incomplete responses, skips questions, or offers vague answers without specific details, it can indicate either a lack of understanding of their own security measures or an attempt to hide weaknesses.
- Overreliance on Certifications Without Details: While certifications and attestations like ISO 27001 or SOC 2, or a statement of GDPR compliance, are positive, relying solely on them without providing detailed explanations of security practices is a red flag. Vendors should be able to explain how these standards are implemented and how they apply to their operations.
- Outdated or Unsupported Software: Mention of using outdated software, operating systems, or technologies that are no longer supported can be a significant security risk. Vendors should use current, supported versions of all software and technology platforms.
- Lack of Multi-Factor Authentication (MFA): If a vendor does not use multi-factor authentication (MFA) for accessing sensitive systems or data, this suggests weak access controls and increases the risk of unauthorized access.
- No Incident Response Plan: The absence of a formal incident response plan, or the inability to articulate how they handle security incidents, is a serious concern. A robust incident response plan is crucial for minimizing damage during a security breach.
- Inadequate Data Encryption Practices: Vendors who do not encrypt data both at rest and in transit, or who provide only minimal encryption (e.g., relying solely on SSL/TLS for data in transit while leaving stored data unencrypted), pose a significant security risk. Data encryption is essential for protecting sensitive information from unauthorized access.
- Lack of Regular Security Audits and Penetration Testing: Vendors who do not conduct regular security audits, vulnerability assessments, or penetration testing may have undetected security flaws. Regular testing is necessary to identify and address vulnerabilities before they can be exploited.
- Poorly Defined Access Control Policies: If a vendor's access control policies are not well-defined or if they allow excessive permissions without proper role-based access, it indicates weak internal controls. Strong access control is essential to prevent unauthorized access to sensitive data.
- Weak or No Backup and Disaster Recovery Plans: Vendors should have robust backup and disaster recovery plans to ensure data integrity and availability during a disaster. If they do not, or if their plans are poorly defined, it could lead to prolonged downtime or data loss.
- Inconsistent Security Policies Across Locations: If a vendor operates in multiple locations but does not maintain consistent security policies across all sites, this inconsistency can lead to security gaps. Uniform security policies are crucial for maintaining a strong overall security posture.
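For teams that want to track these red flags consistently across vendors, the list above can be turned into a simple tally. What follows is a minimal sketch, not a standard scoring model: the flag names, the weights, the thresholds, and the `VendorReview` class are all hypothetical choices made for illustration, and your GRC team would set its own.

```python
from dataclasses import dataclass, field

# Hypothetical weights for the red flags listed above (assumption: higher = more severe).
RED_FLAG_WEIGHTS = {
    "evasive_answers": 2,
    "certifications_without_details": 1,
    "unsupported_software": 3,
    "no_mfa": 3,
    "no_incident_response_plan": 3,
    "inadequate_encryption": 3,
    "no_regular_security_testing": 2,
    "weak_access_control": 2,
    "weak_backup_or_disaster_recovery": 3,
    "inconsistent_policies_across_locations": 1,
}


@dataclass
class VendorReview:
    """Records which red flags a reviewer observed in one vendor's answers."""

    vendor: str
    red_flags: set = field(default_factory=set)

    def flag(self, name: str) -> None:
        # Reject typos so every recorded flag maps to a known weight.
        if name not in RED_FLAG_WEIGHTS:
            raise ValueError(f"unknown red flag: {name}")
        self.red_flags.add(name)

    def score(self) -> int:
        # Sum the weights of all observed red flags.
        return sum(RED_FLAG_WEIGHTS[name] for name in self.red_flags)

    def recommendation(self, escalate_at: int = 3, reject_at: int = 6) -> str:
        # Thresholds are illustrative assumptions, not industry values.
        total = self.score()
        if total >= reject_at:
            return "reject or require remediation"
        if total >= escalate_at:
            return "escalate to GRC for follow-up"
        return "proceed"


if __name__ == "__main__":
    review = VendorReview("ExampleVendor")
    review.flag("no_mfa")
    review.flag("certifications_without_details")
    print(review.vendor, review.score(), review.recommendation())
```

A weighted tally like this is only a starting point for conversation. Its main value is that every vendor gets judged against the same list, so a single serious gap is not lost among otherwise polished answers.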
Consistent Risk Assessment and Management
The CrowdStrike outage exposed vulnerabilities in many organizations’ business continuity plans. Ensure your organization takes a proactive and comprehensive approach to risk assessment and management so that impacts are mitigated and stakeholders are informed when issues arise. Lean on AI and cloud computing tools for early issue detection and automated real-time responses that keep operations running, and make sure employees are educated on the next steps and on the more high-touch processes that follow.
Educate and Align Your Team on Protocols
Employee education goes hand-in-hand with regular risk assessment and management practices. Comprehensive training and skill development are paramount: make sure employees understand related protocols, through GRC or threat detection training for example, and equip teams with the skills for effective incident response. Your governance and compliance framework should clearly define roles, responsibilities, and accountability for digital resilience. These policies should be communicated and enforced across the organization. Consider re-training employees every quarter, as pertinent regulatory frameworks shift, or when you introduce new tech to your stack.
The stakes are high for modern enterprises. While there will always be a risk of outages or disruptions, establishing guidelines, protocols, and employee education that permeate your organization can reduce negative impacts and maintain assurance with your customer base. Some of the tremors from last month’s outage will be remembered and even felt for months to come, so devoting hours to a stronger digital resilience strategy now will pay dividends.