If disaster strikes, how quickly can you recover?

The longer an organization takes to recover, the more costly the disruption becomes


If a hurricane, fire, earthquake or even a high-impact human error were to render your business facilities unusable, how long could your organization operate without mission-critical IT systems? How long would it take you to restore operations — and to what extent could you repair the damage short- and long-term?

In the face of a natural or man-made disaster, companies can be crippled for days, weeks or even months, and many risk permanent loss of customers, revenue and reputation. Because most companies today depend heavily on computerized business processes, a disaster-recovery plan is a necessity. The longer it takes to restore systems and data, the more difficult it will be to recover from the disruption.

Creating a disaster-recovery plan involves prioritizing current systems, pinpointing mission-critical applications and data, and establishing the most cost-effective backup and recovery strategies. Since implementation of the plan may involve significant capital investment in IT infrastructure, fully realizing a disaster-recovery plan may require several years of phased implementation.

 

Your disaster-recovery plan should answer the following questions:

  • What are your business needs related to disaster recovery?
  • Where are the gaps?
  • How can you close the gaps?
  • How long will it take to close the gaps?

What are your disaster-recovery business needs?

Disaster-recovery planning should begin with a review of possible threats and impacts to your organization’s processes and systems. Health care and higher education organizations, for example, may use hundreds of applications in many different departments — and near-constant uptime is more critical for some than for others. Prioritization is essential, because establishing immediate recovery for every single system will require more investment than would be feasible for most organizations.

The best way to separate mission-critical from “nice-to-have” applications is to interview end users, application owners and other stakeholders, and to quantify the business impact of potential system disruptions. What will impact human health and life safety? What scenarios might arise if an application or data set becomes unavailable? How long can a service be unavailable without causing irreparable harm? What is the true cost of system downtime?

Quantifying the business impact will enable the planning team to objectively separate mission-critical systems from secondary ones. This “business impact analysis” (BIA) can be used to establish a “recovery point objective” (RPO) for data, which specifies how much data must be recoverable, and a “recovery time objective” (RTO) for each critical system, which specifies how quickly that system must be back in service.
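As a rough illustration of how interview results might be turned into a ranking, the Python sketch below sorts systems by life-safety impact first and estimated hourly downtime cost second. The class name, field names, dollar figures and cost threshold are all hypothetical and chosen only for the example; a real BIA would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class SystemImpact:
    name: str
    hourly_downtime_cost: float  # estimated cost per hour of outage (hypothetical figure)
    affects_life_safety: bool    # True if an outage endangers health or safety


def rank_by_impact(systems):
    """Order systems so life-safety systems come first, then by hourly cost (highest first)."""
    return sorted(
        systems,
        key=lambda s: (not s.affects_life_safety, -s.hourly_downtime_cost),
    )


def is_mission_critical(system, cost_threshold):
    """Treat a system as mission-critical if it affects life safety
    or its hourly downtime cost meets the chosen threshold (an assumption)."""
    return system.affects_life_safety or system.hourly_downtime_cost >= cost_threshold


# Hypothetical inventory gathered from stakeholder interviews.
systems = [
    SystemImpact("campus newsletter", 50.0, False),
    SystemImpact("electronic medical records", 20000.0, True),
    SystemImpact("billing", 5000.0, False),
]

ranked = rank_by_impact(systems)
critical = [s.name for s in ranked if is_mission_critical(s, cost_threshold=1000.0)]
print(critical)
```

Putting life safety ahead of cost in the sort key reflects the order of the questions posed to stakeholders above; an organization with different priorities would adjust the key accordingly.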

For example, one Midwest hospital had long used an electronic medical record (EMR) system to dramatically increase its capacity for emergency-room admissions. The hospital determined that EMR downtime of more than two hours would lead to significant delays in patient care, because staff would have to fall back on inefficient manual, paper-based processes. Those delays would put patients’ health at risk, which is the primary concern if the system is not restored quickly.

If the system remained down long enough, the hospital would need to redirect ambulances to competing facilities to protect patients’ well-being, and revenues would decline significantly as a result. Even after the EMR system was restored, the hospital would face an uphill battle to rebuild its reputation, and could see fewer patient visits for a period much longer than the outage itself.

Clearly, the EMR system was mission-critical. Its recovery point objective was therefore to restore 100 percent of EMR data for the past three months, with a recovery time objective of two hours: the longest the emergency department could function without the EMR system before triggering a cascade of high-impact negative events.
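To make the two-hour objective concrete, the short Python sketch below checks whether a recovery met the RTO and how much time remains during an outage. The function names and timestamps are hypothetical; only the two-hour figure comes from the hospital example.

```python
from datetime import datetime, timedelta

EMR_RTO = timedelta(hours=2)  # two-hour objective from the hospital example


def meets_rto(outage_start, restored_at, rto):
    """Return True if service was restored within the recovery time objective."""
    return restored_at - outage_start <= rto


def time_remaining(outage_start, now, rto):
    """Time left before the RTO is breached (negative once it has been breached)."""
    return rto - (now - outage_start)


# Hypothetical timestamps for illustration only.
outage_start = datetime(2024, 1, 1, 9, 0)

restored_in_time = meets_rto(outage_start, datetime(2024, 1, 1, 10, 30), EMR_RTO)
restored_late = meets_rto(outage_start, datetime(2024, 1, 1, 11, 30), EMR_RTO)
remaining = time_remaining(outage_start, datetime(2024, 1, 1, 10, 15), EMR_RTO)

print(restored_in_time, restored_late, remaining)
```

A check like this is most useful during recovery drills, where measured restore times can be compared against the objective before a real outage occurs.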
