In Case of Emergency, Activate Business Continuity Plan

Aug. 7, 2006
Making business continuity a CEO concern, not just for the CIO or CSO

Gemstar-TV Guide International hired Ed Sullivan to direct Business Continuity Services in 2003, soon after an audit found that TV Guide's infrastructure was essentially unrecoverable in the event of a sustained crisis. There was a time when Sullivan's first stop for addressing the issue would have been IT and the datacenter. But times have changed -- Sullivan first conducted several weeks of meetings with senior executives and various business unit executives to talk about the company's business processes. "The fact that I work for the CIO is almost irrelevant," Sullivan says. "I'm there to provide recovery for the business units."

Assessing potential impact to the business before crafting a business continuity plan is not new, but the trend is picking up steam as technology becomes more tightly bound up with ongoing operations and overall success. "Take just-in-time inventory," says Jim Grogan, vice president of consulting product development at SunGard Availability Services. "It would not be possible without the technology to enable it. But it's a business decision to manage operations differently, and its success means literally billions reinvested in the business."

Another reason to start with a business-impact analysis is that technology underpinnings are not as clear-cut as they once were. "We used to think that if we brought up the datacenter after a disaster, we'd be OK," says Fred Dillman, CTO of Unisys. "But with servers all over the world and laptops, desktops, and other devices playing such an important role, bringing up a datacenter is no longer enough, and it's simply not feasible to bring up everything. You have to know which business processes and their underlying technologies are critical to your goals."

Hidden dependencies are becoming more common. "Often, the application that was deployed two years ago with 'Yeah, let's give this a try,' is now silently driving 15 percent of revenue," SunGard's Grogan says.

Recent regulations such as Sarbanes-Oxley, as well as natural and man-made hazards now firmly rooted in reality, have made analyzing the business risks of technology failures and disasters critical. "Before Katrina, few companies had given much thought to what would happen if an entire city was out of commission," says Michael Porier, director of Protiviti's technology risk practice.

The risks to revenue and market capitalization are higher than ever. "Today you need to prevent an outage because if you get a customer service black eye, you'll pay a price on Wall Street," Grogan says.

Modeling the Business

Modeling the business is the first big step toward falling in line with the risk management aspects of best-practices frameworks, such as COSO (Committee of Sponsoring Organizations of the Treadway Commission), COBIT (Control Objectives for Information and related Technology), ITIL (Information Technology Infrastructure Library), and ISO/IEC (International Organization for Standardization and the International Electrotechnical Commission) 17799:2005, all of which can help with the nuts and bolts of business continuity planning.

"COBIT is a framework for IT control that can be used to mitigate risks once you've modeled the process," Walch says. "COSO is in many respects the counterpart to that on the business side, leaning to business control and governance as opposed to IT controls.

"ITIL looks at business continuity, change management, and problem and incident management from a service perspective. All of these frameworks are very complementary to the business modeling process." ISO/IEC guidelines contain a wealth of best practices specific to information security management.

Increasingly then, an effective strategy for business continuity planning lies first with carefully modeling the business itself. This means gaining an in-depth understanding of business goals, priorities, functions, underlying processes, and the people and expertise involved with those processes. "It's too easy to get down to the guts of networks, systems, servers, and patches without understanding that the real mission here is to wholesale leather shoes," says Trent Henry, senior analyst at the Burton Group. "IT has to understand fundamentally the business they're in the middle of. Then they need to understand the dependencies of the business processes on the infrastructure that they're running." This business-impact analysis is generally followed by a technology profile and is a staple of many enterprise business continuity teams. "That begins to pull in risks to the business," Henry says, "and what aspects need to be prioritized, including people."

Getting started in modeling a business generally involves extensive interviews, typically in a workshop setting, with senior executives and representatives of the business units. Where you start depends on whether you have gone through part of this process before. "Most companies we see doing this today are in the mature or world-class category," says Damian Walch, national practice executive of business resilience consulting at IBM Global Services. He adds that the financial services and energy sectors tend to be the ones that come to IBM for this purpose. "They've gone through at least one business-impact analysis before and know which processes are critical." What they sometimes don't have a handle on, according to Walch, is the dependencies among various business functions and between the business functions and IT.

"You start with the lines of business the company defines as where their core products and core revenue are coming from," SunGard's Grogan says. "Then you dive under the covers. What are your salespeople doing to generate the sales cycle? What are your accounting people doing to get the bills out? What are your distribution or production people doing to create the product and get it shipped?" Some of these pieces may be internal and some external. "Many companies outsource their shipping and distribution to UPS or FedEx, but that process is key to their customer satisfaction."

Who should be involved? "Usually it's not the line-of-business executive, it's more like someone at the manager or director level," IBM's Walch says. "Someone who inherently understands the process and can help you walk through claims processing, drilling, or exploration."

But Unisys' Dillman disagrees. "We like to start with the executives because ultimately it comes down to what goals you're trying to achieve in a degrading situation and how much you're willing to spend."

Mapping interdependencies among processes, departments, employees, and external players is the overarching goal of the business continuity planning process. This is where diagramming software, such as Microsoft Visio, can help. SunGard has a software product called Paragon that provides tools to guide companies along the entire business continuity planning process, and it includes diagramming software that can map interdependencies. "People may not understand initially a dependency between customer service and product development or how much order entry does or doesn't depend on finance," says Jacques Murphy, SunGard's Paragon product manager.

Finding these dependencies requires a lot of discussion, collaboration among people with different functions, and input from an IT expert who understands the underlying systems. "An oil exec may say that it's really critical that we have this data warehouse up because we can't analyze exploration areas without it," Walch says. "Someone from IT can then say, 'Well, it's the systems that feed the data warehouse and the process control mechanisms that are really critical here.' "

It's also important to clarify business process goals. If the help desk's goal is to ensure that no client is on hold for more than 30 seconds, then it's important to look very closely at the phone system and redundant routing to various switch stations, says Tim Leech, principal consultant and chief methodology officer at Paisley Consulting.

Often it's best to prepare business unit reps by distributing a survey or questionnaire before the workshop to get the thinking going. "We gave the business unit people information and questionnaires to answer in advance and then got together in a workshop approach with a team from SunGard to do the workshops and data collection," TV Guide's Sullivan says.

The Business Model Challenge

The hard part, particularly from a technology standpoint, is identifying all the layers of dependency. Business units may know about the payroll system, but it takes a lot of IT participation to get down the stack, layer after layer. Applications such as payroll run on operating systems and adhere to system configurations, which in turn integrate with an application infrastructure of back-end systems, identity management systems, and protocols. All of this sits on a physical infrastructure of server platforms, networking, and routing infrastructure, which in turn depends on an underlying critical infrastructure of cooling, power, communications, local and regional government services, and, perhaps most important, people. And, as obvious as it may seem, people depend on food and shelter and, perhaps less obvious, a perception of a certain amount of safety. "You may have a disaster situation in which the people didn't die but the families panicked and forced them to quit," Burton's Henry says.
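To make the layering concrete, here is a minimal Python sketch of walking down such a stack. The dependency map and its entries are hypothetical, loosely named after the layers described above, and the breadth-first walk is just one simple way to collect everything a process ultimately relies on.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each entry lists the items it depends on directly.
DEPENDS_ON = {
    "payroll": ["operating system", "identity management"],
    "operating system": ["server platform"],
    "identity management": ["server platform", "network"],
    "server platform": ["power", "cooling"],
    "network": ["power", "communications"],
    "cooling": ["power"],
    "power": [],
    "communications": [],
}


def all_dependencies(item, depends_on):
    """Return every direct and indirect dependency of `item`."""
    seen = set()
    queue = deque(depends_on.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(depends_on.get(current, []))
    return seen


print(sorted(all_dependencies("payroll", DEPENDS_ON)))
```

Business units would typically supply only the top row of such a map; the lower layers are where IT participation comes in.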

The other challenge is identifying dependencies that come from BPO and supply-chain arrangements. "If you're outsourcing HR, you probably want to keep backups of everything you send to that company," says Fred Cohen, CEO of Fred Cohen & Associates, information security specialists.

Mobility is another potential stumbling block. "Many people are surprised at how much data is on people's PCs and laptops," Unisys' Dillman says. "If they can't use them, which occurred after Katrina, the business may not be able to operate." And finally, it's important not to overlook information lifecycle issues tied to regulations such as HIPAA and Sarbanes-Oxley.

Several vendors -- including Fred Cohen & Associates, IBM, Paisley Consulting, PricewaterhouseCoopers, Protiviti, SAIC, SunGard, and Unisys -- provide services to help companies through this process.

What should this modeling exercise produce? In some cases it's simply a spreadsheet or database that lists different processes and their dependencies. In other cases it may be a large diagram or several diagrams that map out these dependencies through the various layers using icons and colored arrows or process flows. In other cases, the diagram may be linked to a database. "We find that Excel spreadsheets are a major tool for this purpose," Burton's Henry says. Tools are also available from Paisley Consulting, Proforma, SunGard, Strohl Systems, and Triaster to help with parts of this process, as well as the final result. Protiviti and Unisys have their own tools that they use with customers.

"Our deliverable is usually a diagram with some type of text or database behind it," Walch says. "The process model usually shows different entities such as people, business units, business partners, applications, infrastructure, and databases and describes the relationships between them and information flow. You'll see integration with Tivoli, Remedy, Paragon, or something similar." IT can use these diagrams and databases to understand the consequences and scope of various types of outages, as well as for subsequent forensic analysis.
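As a rough illustration of how a dependency database can answer the "scope of an outage" question, the following Python sketch inverts a small dependency map and collects everything that sits above a failed component. The process and system names are hypothetical, and this is not a description of how any of the products named above work.

```python
# Hypothetical model (an assumption for illustration):
# each entry lists what it depends on directly.
DEPENDS_ON = {
    "order entry": ["order database", "finance system"],
    "customer service": ["phone system", "order database"],
    "order database": ["storage array"],
    "finance system": ["storage array"],
    "phone system": [],
    "storage array": [],
}


def affected_by(failed, depends_on):
    """Return everything that directly or indirectly depends on `failed`."""
    # Invert the map: for each component, who depends on it?
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    affected = set()
    stack = [failed]
    while stack:
        current = stack.pop()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)
    return affected


print(sorted(affected_by("storage array", DEPENDS_ON)))
print(sorted(affected_by("phone system", DEPENDS_ON)))
```

In this toy model a storage failure reaches both order entry and customer service, while a phone outage touches customer service alone.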

Assessing the Risks

This model is then used in the other part of this workshop process, which is to rank the importance of processes and to assess risks. Disaster recovery specialists are more concerned about specific risks, such as hurricanes, but business continuity planners tend to talk more in terms of systems and processes. "I need a plan to tell me what to do when the power goes out," Protiviti's Porier says. "It doesn't matter what caused it."

In assessing risks, it's also important to understand the importance of manual work-arounds. "Right-sizing is really important. Payroll may say that if the payroll system fails on Tuesday, they can't pay anyone on Thursday," Cohen says. "But actually they could make copies of last week's pay stubs and use those until the system is up and running. That stretches their recovery time to a week or more."
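The arithmetic behind Cohen's payroll example can be sketched in a few lines of Python. The class and field names are hypothetical, and the figures (two days from Tuesday to Thursday, seven days for "a week") are assumptions read off the quote rather than measured values.

```python
from dataclasses import dataclass


@dataclass
class ProcessTolerance:
    """How long a process can go without its system (hypothetical model)."""
    name: str
    stated_days: float            # what the business unit first reports
    workaround_days: float = 0.0  # how long a manual work-around can carry it

    def effective_days(self) -> float:
        # A work-around only helps if it lasts longer than the stated limit.
        return max(self.stated_days, self.workaround_days)


# Payroll fails Tuesday, payday is Thursday: a stated tolerance of 2 days.
# Reusing last week's pay stubs is assumed to buy about a week.
payroll = ProcessTolerance("payroll", stated_days=2, workaround_days=7)
print(payroll.name, payroll.effective_days())
```

Recording both numbers, rather than only the stated one, keeps the work-around visible when recovery options are priced later.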

Some of these processes may already be in place. "An executive may tell you that sales can't be without their systems for more than a half hour," Cohen says. "Then you talk with the salespeople, and they say, 'Oh, we have outages longer than that all the time. We know what to do.'" The lesson: You have to talk to a lot of people.

Be aware of the tendency of many departments to label their functions mission-critical. That's why, after mapping out processes and risks with each business unit, it's essential to go back to senior management for a reality check on what really is a Tier 1 process and what is more likely Tier 2 or 3. "We call it the management filter," Protiviti's Porier says.

Staging the Alternatives

A thorough understanding of the business and all its dependencies leads to cost-effective business continuity strategies. "You can replicate in real time, electronically vault to another site, or use the old standby: recovery from magnetic tape. The more redundant and available a system is, the more expensive it is," Porier says. The practice is usually to price options that meet the risk profile and then price solutions a little ahead and a little behind to assess cost/benefit implications. "Everything is critical when recovery costs a penny," Unisys' Dillman says. "When it costs $10 million, certain things suddenly become less critical."
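The "a little ahead and a little behind" pricing practice might be sketched as follows in Python. The recovery times and costs are invented purely for illustration, and the selection rule assumes, in line with Porier's remark, that slower options are cheaper, so the slowest option that still meets the target is treated as the match.

```python
from typing import NamedTuple


class RecoveryOption(NamedTuple):
    name: str
    recovery_hours: float
    annual_cost: int  # hypothetical figures, not real prices


OPTIONS = [
    RecoveryOption("real-time replication", 1, 900_000),
    RecoveryOption("electronic vaulting", 24, 250_000),
    RecoveryOption("tape recovery", 72, 60_000),
]


def price_around_target(options, target_hours):
    """Return (ahead, match, behind) for a recovery-time target.

    match  = slowest option that still meets the target
    ahead  = next faster option, or None
    behind = next slower option, or None
    """
    ranked = sorted(options, key=lambda o: o.recovery_hours)
    meeting = [i for i, o in enumerate(ranked) if o.recovery_hours <= target_hours]
    if not meeting:
        raise ValueError("no option meets the target")
    i = meeting[-1]
    ahead = ranked[i - 1] if i > 0 else None
    behind = ranked[i + 1] if i + 1 < len(ranked) else None
    return ahead, ranked[i], behind


ahead, match, behind = price_around_target(OPTIONS, target_hours=48)
for label, option in (("ahead", ahead), ("match", match), ("behind", behind)):
    print(label, option)
```

Laying the three prices side by side is what lets senior management see how much a tighter or looser recovery target would actually cost.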

It's also important to model alternative processes, such as telecommuting, that might occur during an incident. A perfect example is having sufficient remote access capacity in place for situations in which a large portion of your staff will be working at home. And consider staff dispersion and cross training to ensure that alternate staff can do what has to be done to keep the business running.

Finally, the business model is never static, so it's important to keep the model current to prevent any devastating surprises when an outage actually occurs.

Leon Erlanger is a freelance author and consultant specializing in security.