How to Build Resiliency in Times of “Tail Risk” Events

A flexible, threat-informed security strategy is key to maintaining business operations through threat events

Sept. 9, 2022

In this increasingly volatile security environment, what’s needed is an integrated approach to manage “tail risk” contingencies.

The concept of an integrated security program is not new, but institutionalized security structures and genuine differences in areas of expertise have limited implementation on a widespread basis. Events of the last two years are changing this. The physical disruptions of the global COVID-19 pandemic not only highlighted the fragility of global supply chains but also created new opportunities for cyber threat actors: third-party logistics (3PL) providers and shippers themselves repeatedly became targets, creating highly disruptive downstream impacts. For its part, the Russia-Ukraine conflict was expected by many to be a localized event. Instead, we are dealing with an all-out war by Russia on Ukraine, with concerns about whether a similar China-Taiwan contingency could emerge.

Mapping Out the Threat Landscape

In this increasingly volatile security environment, what’s needed is an integrated approach to manage “tail risk” contingencies, or those risks of low probability but severe consequence. Three key elements are required. First, we need a business-driven approach to applying graduated levels of security flexibly based on severe but plausible risk scenarios. Second, threat-informed validation of security tools and procedures – physical and cyber – is key to their successful use in an incident. Third, whole-of-company preparedness for tail risk contingencies can help minimize disruption.

Why are these steps so important? Let’s first lay the groundwork in terms of key trends and changing threats:

Russia’s invasion of Ukraine has highlighted how geopolitical dynamics can change overnight. Few expected a full-scale and protracted invasion of Ukraine, and many organizations struggled to figure out how to rapidly suspend business operations and evacuate employees. Were the current Russia-Ukraine conflict to escalate further, a number of severe-but-plausible scenarios exist that could result in significantly increased operational disruption on a global basis. An example of what could happen is illustrated by a multi-pronged ransomware attack against the Government of Costa Rica by the Conti threat actor group earlier this year, targeting more than 20 government agencies as well as the country’s healthcare sector. While details around this incident, including attribution, are still developing (for example, the Conti negotiations website was reportedly down during this incident), the attack illustrates the potential scale of national-level ransomware attacks.
Ransomware attacks have increasingly impacted physical supply chains. Some of the most significant business interruption events over the last year have stemmed from successful ransomware attacks on companies’ suppliers that left related services unavailable. In December 2021, cloud-based human capital management solution provider Kronos disclosed that it had suffered a ransomware incident affecting the Kronos Private Cloud, and this incident caused widespread disruption in customers’ ability to process payrolls. Earlier this year, global logistics provider Expeditors International announced that it had shut down many of its core operating systems for an extended period after a ransomware attack. Expeditors is the sixth largest freight forwarding company in the world, as well as one of the largest customs brokers in the United States. This attack caused extensive disruption across the logistics sector. Last year’s ransomware attack on Colonial Pipeline led the company to temporarily halt operational technology (OT) functions, as a precaution, across four of its mainlines, which carry gasoline, diesel, and jet fuel from Texas to New Jersey. The pipeline supplies approximately 45% of the gasoline and diesel fuel used on the U.S. East Coast.
Safety and operational control systems are increasingly dependent on Internet-connected systems that are built on top of increasingly complex software foundations. In March 2022, the U.S. Department of Justice unsealed two indictments charging Russian state actors over multi-year hacking campaigns targeting the energy sector. In one, the defendants allegedly corrupted industrial control system (ICS) software used in power and petrochemical plants, gaining strategic access across hundreds of energy sector entities in 135 countries, including a nuclear power plant in Kansas. In the other, the attackers allegedly gained control over safety systems at a large Middle Eastern petrochemical plant; incapacitating these systems could have resulted in large-scale death and destruction at the facility. Also in 2022, the Cybersecurity and Infrastructure Security Agency (CISA) warned that a nation-state hacking group, likely Russian, had been actively targeting U.S. critical infrastructure with specially coded malware, known as Pipedream, designed to take control of a wide variety of industrial systems. Industrial cybersecurity experts at Dragos described it as “the most expansive ICS attack tool anyone has ever documented.”
Domestic violent extremists have displayed an increasing interest in critical infrastructure systems. In January 2022, DHS reportedly warned that domestic violent extremists had developed “credible, specific plans” to attack electric power infrastructure. This followed the April 2021 arrest of a Texas man for plotting to blow up an AWS data center in Virginia. As recently as July 2022, extremists called for Metcalf-style kinetic attacks on “sitting duck” electricity infrastructure. More generally, the U.S. Department of Justice is reporting a record increase in hate crimes in the United States, which are currently at their highest levels in more than a decade. Cases in point include the January 2022 anti-Semitic attack on Congregation Beth Israel in Colleyville, Texas, and the May 2022 racially motivated attack on a Buffalo supermarket.
The COVID-19-driven shift to work-from-home has necessitated further changes to security strategies. The move to work-from-home models created multiple benefits, including limiting super-spreader scenarios and maintaining employee morale during a time of high uncertainty (and attrition). However, it also created greater ambiguity around the physical location of key employees and employer duty of care requirements for remote workers. In addition, it dramatically heightened threat actor motivations to target technologies that enable remote work and e-commerce, including virtual private network (VPN) systems and e-commerce checkout pages.

Exposing the Vulnerabilities

These trends have exposed three core challenges in managing security risk:

Many organizations are still working out how to reflect rapidly changing geopolitical risk in their business operations. There are many possible escalation scenarios for the current Russia-Ukraine conflict – including “wild card” scenarios such as misattribution, spillover and miscalculation – and planning for these scenarios is, simply put, difficult. In evaluating the Russia-Ukraine conflict, many companies have asked themselves what the impact could look like if a similar contingency broke out between China and Taiwan. While supply interruptions are not new (organizations have for decades planned for hurricanes), the speed and scale of such disruptions when born of conflict or malice are hard to predict.

Companies can also struggle with managing protracted contingencies, specifically, when incidents and crises become long-term and localized disruptions require continued focus but with fewer resources. For example, at what point does an organization transition incident and crisis management teams, managing daily changes in Ukraine, to more traditional business continuity functions monitoring for escalations?

Organizations continuously struggle to validate security performance against low-probability, high-consequence threats. In our experience with incident after-actions in both physical and cyber domains, organizations tend to have many of the right tools and procedures in place, but the tools were not properly maintained, and personnel were not sufficiently practiced in managing the incident. The July 17 Texas State House of Representatives Interim Report of the Investigative Committee on the Robb Elementary Shooting, in noting shortcomings at the Uvalde Consolidated Independent School District and various law enforcement agencies, also acknowledged “that those same shortcomings could be found throughout the State of Texas.” This highlights a broader challenge: notwithstanding a significant law enforcement presence, low-likelihood events are by definition rare, so security personnel are not normally practiced in timely detection and response to such threats, which can lead to “egregiously poor decision-making,” as it apparently did in Uvalde.

We see this same tension in aviation security, where detecting an explosive in a carry-on bag is an extremely rare event, and in cybersecurity. While massive numbers of intrusion events occur daily, automated systems tend to block or contain almost all of these attacks for any given organization, so a security team’s experience detecting malicious activity on a server, after a threat actor has achieved an initial foothold, is much rarer.

These challenges are often magnified when trying to manage third-party security risk. Both Expeditors and Kronos, referenced above, had presumably been subject to multiple customer security questionnaires, and yet both experienced extended outages at the hands of ransomware threat actors. These outages caused significant operational disruption for many downstream customers.

Applying Mitigation Strategies

While there is no such thing as risk elimination, 2022 is proving what we’ve known for many years but have struggled to apply in practice: the key to managing through an increasingly volatile risk environment is a threat-informed, resilient-by-design security strategy that business leaders can understand and embrace. What are the key elements?

Apply graduated levels of security flexibly depending on the nature of the threat and impact. While security frameworks have focused for decades on the concept of “high-value assets,” applications can tend to be stove-piped and often driven by security teams instead of business functions. Security teams can help companies understand assets that are likely to be targeted by threat actors (e.g., Active Directory in a cyber-attack), but other functions like procurement, finance, IT, manufacturing and business lines are better positioned to know what matters most in keeping the business running. We also know that there is, at times, a tension between convenience and security. While we try to strike that balance in business-as-usual operations, we don’t always think of opportunities to tweak it – for example, tightening vehicle or personnel checks in a physical context, or risk-based authentication rule-sets – during times of heightened threat. Flexibility and the ability to apply graduated levels of security quickly, based on changes in threat, are key elements of resiliency.
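
To make the idea of graduated security levels concrete, the following is a minimal Python sketch, not a prescribed implementation. The three threat levels, the `Posture` fields and every number in the table are hypothetical placeholders; in practice, business functions and security teams would agree on them together.

```python
from dataclasses import dataclass
from enum import IntEnum


class ThreatLevel(IntEnum):
    """Hypothetical graduated threat levels (names are illustrative)."""
    BASELINE = 0
    ELEVATED = 1
    SEVERE = 2


@dataclass(frozen=True)
class Posture:
    """Pre-agreed control settings for one threat level (assumed fields)."""
    vehicle_check_rate: float  # fraction of vehicles inspected at the gate
    step_up_threshold: int     # login risk score (0-100) at or above which
                               # additional authentication is required


# Illustrative values only: tighter physical and cyber controls as threat rises.
POSTURES = {
    ThreatLevel.BASELINE: Posture(vehicle_check_rate=0.10, step_up_threshold=70),
    ThreatLevel.ELEVATED: Posture(vehicle_check_rate=0.50, step_up_threshold=40),
    ThreatLevel.SEVERE: Posture(vehicle_check_rate=1.00, step_up_threshold=0),
}


def requires_step_up(level: ThreatLevel, login_risk_score: int) -> bool:
    """Return True if a login with this risk score needs extra authentication."""
    return login_risk_score >= POSTURES[level].step_up_threshold
```

With a table like this agreed in advance, the same login that passes unchallenged at the baseline level is challenged once the level is raised, and the change is a single switch rather than an ad hoc decision made under pressure.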

Apply a consistent, continuous threat-informed testing and validation approach. Low-probability, high-consequence scenarios apply across physical and cyber domains. Threat emulation-based training and evaluation – of explosives detection in a physical context, or malicious Tactics, Techniques and Procedures (TTPs) in a cyber context – is critical to keeping security tools tuned to, and security operators practiced in, plausible threats. Organizations should also look for opportunities to continuously validate vendor security performance. Realistic and creative exercise scenarios for security personnel – particularly for low-probability, high-consequence events – can help fight complacency and keep security stakeholders engaged and thinking like an adversary.

Prepare for tail risk contingencies. Organizations can best prepare for tail risk contingencies by focusing on visibility; situational awareness; planning and exercises aimed at severe but plausible scenarios; related playbooks; and relationships with law enforcement and security organizations.

o Visibility. Preparedness starts with visibility into the identity and location of high-value assets (data centers, critical manufacturing plants, key software systems), critical personnel (not just executives, but also software development teams in technology companies, or key operations staff at manufacturing organizations) and strategic suppliers (key manufacturing component suppliers; IT delivery centers for IT managed service providers).

o Situational awareness. We can start to build situational awareness on tail risk contingencies by, for example, geofencing key facilities and alerting on open-source reporting of incidents adjacent to those locations or alerting on cyber-attacks that have impacted key suppliers.

o Scenario-based preparedness planning & exercises. In the aftermath of the September 11 terrorist attacks, DHS developed operational planning scenarios to help key security functions prepare for national-level all-hazards incidents. Scenario-based planning could help organizations think through plausible contingencies such as:

§ How long can our data center operate without power and water?

§ What critical locations – data center, ops center – are in a potential conflict zone?

§ What is the impact of an extended loss of a critical supplier (logistics provider for a manufacturer or retailer; software development team for a technology company)?

§ How do we manage contingencies where our employees are on both sides of a conflict?

§ Are we prepared for hacktivist and disinformation campaigns depending on what business decisions we make related to a conflict?

o Playbooks. Playbooks should anticipate things that can go wrong: for example, decision-making when key personnel are unavailable and time is of the essence (delegation of authority), or when normal communications systems (email, videoconferencing) are untrusted or inoperative. Our firm has, for example, pulled together a set of guidelines for businesses facing the need for a rapid withdrawal from conflict zones.

o Law enforcement and security relationships. Finally, we have found that maintaining active relationships with law enforcement and security agencies is key to timely and informed decision-making in times of crisis. And because crises are increasingly hybrid – entailing both physical and cyber elements – it is also important the scope of these relationships reflects this dynamic.
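
The geofencing idea described under situational awareness above can be sketched in a few lines of Python. This is an illustrative sketch only: the `Facility` record, the alert radius and the function names are hypothetical, and a real deployment would draw incident locations from whatever open-source reporting feed the organization uses.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


@dataclass(frozen=True)
class Facility:
    """A key location to watch (hypothetical record layout)."""
    name: str
    lat: float        # latitude in decimal degrees
    lon: float        # longitude in decimal degrees
    radius_km: float  # alert on incidents reported within this distance


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def facilities_to_alert(facilities, incident_lat: float, incident_lon: float):
    """Return names of facilities whose geofence contains the incident."""
    return [
        f.name
        for f in facilities
        if haversine_km(f.lat, f.lon, incident_lat, incident_lon) <= f.radius_km
    ]
```

A circular geofence is the simplest choice; organizations with irregular site boundaries or supplier regions may prefer polygon-based fences, which this sketch does not cover.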

There is no single organizational structure that optimizes this approach. What’s important instead is that integrated project teams exist or are stood up soon, are appropriately resourced to address these contingencies, and ensure the right hand-offs are in place.

Conclusion

The goal of an effective security program has never been to prevent all threats, but in today’s volatile environment, the increased potential for “tail risk” scenarios puts a premium on integrated security planning and preparedness. A threat-informed security strategy built with flexibility in mind, continuous validation and testing, and sustained preparedness efforts are key to enabling an organization to maintain business operations through a “tail risk” event. A successful approach requires collaboration and is ultimately dependent on people, buy-in and resourcing from the top, as well as active participation across business units.

About the authors: Adam Isles leads the Cybersecurity Practice at The Chertoff Group. He also authored the firm’s security risk management methodology. He formerly served as DHS deputy chief of staff and started his career at the U.S. Department of Justice as a trial lawyer in the Criminal Division.

Ben Joelson leads Corporate Security Risk and Resilience engagements at The Chertoff Group. Ben served in the U.S. Air Force as a Security and Antiterrorism Officer, is the former President of an international defense company and is an ASIS Certified Protection Professional.