Disaster Recovery vs Business Continuity

Introduction
Resiliency is the key to business success!

COVID-19 has once again exposed the urgent need for resilience in organizations and governments. Vulnerability to operational disruptions caused by pandemics, disasters, cybercrime, attacks, system failures and similar events poses a growing threat to an organization’s resilience and business continuity. According to various reports, hardware failures are estimated to cause 45% of total unplanned downtime, followed by loss of power (35%), software failure (34%), data corruption (24%), external security breaches (23%) and accidental user error (20%). More than 50% of companies have experienced a downtime event lasting longer than a full day in the past five years.

Companies without a disaster recovery plan are at an increased risk of shutting their doors for good. In fact, according to the Federal Emergency Management Agency, roughly 40 to 60 percent of small businesses that experience a disaster never reopen. Of companies without disaster recovery that suffer a major data disaster, 93% will be out of business within one year. Reportedly, more than 50% of businesses do not even have the budget to recover from an attack.

Disruptions and data loss can have serious business, financial, reputational, customer and legal implications. The average cost of a data breach is $3.92 million, an increase of 12% over the past five years. Costs can be staggering, depending on the type of attack: estimates put the cost of unplanned downtime at between $926 and $17,244 per minute.

Building Resilience
Prevention is always better than cure.

96% of businesses with a disaster recovery solution in place will be able to fully recover their operations. Resilience is the ability of an organization or business process to:

  • Anticipate: Maintain a state of informed preparedness in order to prevent compromises of business functions from adversary attacks
  • Withstand: Continue essential business functions despite successful execution of an attack by an adversary
  • Contain: Localize the crisis and isolate trusted systems from untrusted systems so that essential business operations continue in the event of cyber attacks
  • Recover: Restore business functions to the maximum extent possible subsequent to successful execution of an attack by an adversary
  • Evolve: Change business functions and/or the supporting system capabilities, so as to minimize adverse impacts from actual or predicted adversary attacks

Definitions

Business Continuity (BC) is defined by ISO (International Organization for Standardization) as “the capability of the organization to continue delivery of products or services at acceptable predefined levels following a disruptive incident”. It is about building a plan that ensures your business can deliver the same level of service (or as near to it as possible) as it did before the plan had to be invoked.

Disaster Recovery (DR) is the way in which an organisation would retrieve key information and services after an unforeseen disaster. Any disaster recovery plan should include a set of policies and procedures to follow in order to get the affected parts of the business working again after a significant disruptive event.

Disaster recovery is a subset of business continuity planning. Disaster recovery is the process of getting all important IT infrastructure and operations up and running following an outage. Business continuity differs in that it is the process of getting the entire business back to full functionality after a crisis.

Disaster Recovery (DR)
How can we recover from a disaster?

Disaster recovery aims to minimize business downtime and focuses on getting technical operations back to normal in the shortest time possible. This alone is not enough. Resilient organizations not only implement plans to recover from different types of disruption; they also build resiliency into the fabric of their processes, systems and infrastructure. That means identifying single points of failure and implementing back-up plans so that a failure at any one of them does not bring services to a halt. Disaster recovery plans also involve restoring vital support systems such as communications, hardware and IT assets. The longer the recovery time, the greater the adverse business impact. A good disaster recovery plan should therefore enable rapid recovery from disruptions, regardless of their source. A Disaster Recovery Plan (DRP) sets out the steps and technologies for recovering from a disruptive event, especially as they pertain to restoring lost data, failed infrastructure or other technological components.

Managing recovery plans for business processes, systems, people, locations and other interdependencies is complicated enough. Disconnects between business continuity and IT disaster recovery only make it worse. Coordination between incident and crisis response teams is absolutely essential to resiliency.

A detailed and tested disaster recovery plan should aim to:

  • Minimize interruptions to normal operations.
  • Limit the extent of disruption and damage.
  • Minimize the business/economic/reputational/legal impact of the interruption.
  • Establish alternative means of operation in advance.
  • Train personnel in emergency procedures.
  • Provide for smooth and rapid restoration of service.

Steps in Disaster Recovery:

Prepare an IT asset inventory
Prepare a list of all IT assets, including servers, storage devices, applications, data and networks. Map each resource to its physical location or network segment, and identify any dependencies.
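
As a rough illustration, an inventory like this can be kept as structured data rather than a free-form list, which makes missing entries easy to spot. The sketch below is only one possible shape: the `Asset` fields, the asset names and the `missing_dependencies` helper are all hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One inventory entry (hypothetical fields, chosen for illustration)."""
    name: str
    kind: str        # e.g. "server", "storage", "application", "network"
    location: str    # physical location or network segment
    depends_on: list[str] = field(default_factory=list)


def missing_dependencies(inventory: dict[str, Asset]) -> dict[str, list[str]]:
    """Return, per asset, the dependencies that are not themselves inventoried."""
    gaps = {}
    for asset in inventory.values():
        unknown = [dep for dep in asset.depends_on if dep not in inventory]
        if unknown:
            gaps[asset.name] = unknown
    return gaps


# Made-up example data: "auth-service" is referenced but never recorded.
inventory = {
    asset.name: asset
    for asset in [
        Asset("db-primary", "server", "dc1/rack-4"),
        Asset("order-api", "application", "dc1/vlan-20",
              depends_on=["db-primary", "auth-service"]),
    ]
}

print(missing_dependencies(inventory))
```

Here the check reports that `order-api` depends on `auth-service`, which has not been added to the inventory yet. Gaps of this kind are cheaper to find on paper than during a recovery.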

Perform risk assessment
Associate threats (internal and external) with each of the assets in the asset inventory. Imagine the worst-case scenario for each threat and rate the likelihood that the event occurs (low, medium, high). Then rate the impact if the event does occur (low, medium, high). Risk can be calculated using the formula Risk = Threat × Vulnerability × Consequence.
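
The formula above can be turned into a simple ranking. The sketch below assumes that each factor is rated low, medium or high and mapped to 1, 2 or 3. That ordinal mapping, the `risk_score` helper and the sample register entries are illustrative assumptions, not a standard scale.

```python
# Assumed ordinal scale for the low/medium/high ratings.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(threat: str, vulnerability: str, consequence: str) -> int:
    """Risk = Threat x Vulnerability x Consequence on the assumed 1-3 scale."""
    return LEVELS[threat] * LEVELS[vulnerability] * LEVELS[consequence]


# Hypothetical register rows:
# (asset, threat description, threat, vulnerability, consequence)
register = [
    ("file server", "ransomware", "high", "medium", "high"),
    ("branch router", "power loss", "medium", "low", "medium"),
    ("payroll app", "accidental deletion", "low", "medium", "high"),
]

# Highest risk first, so the riskiest assets are planned for first.
ranked = sorted(register, key=lambda row: risk_score(*row[2:]), reverse=True)

for asset, description, threat, vulnerability, consequence in ranked:
    score = risk_score(threat, vulnerability, consequence)
    print(f"{asset:15} {description:20} score={score}")
```

With these made-up ratings the file server scores 18, the payroll app 6 and the branch router 4. The absolute numbers mean little; the ordering is what feeds the next step.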

Identify critical business processes
Once the risk assessment is done, identify the most critical processes (single points of failure and other high-priority resiliency risks) and their impact on the business, so that you can build the necessary resilience into them before any disruption occurs. This also helps you identify upstream and downstream dependencies, systems and processes, so that you can make them a priority in your resiliency planning.
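
One way to make downstream dependencies concrete is to walk a dependency map and list everything that would be affected if a single component failed. The sketch below assumes a hypothetical map of component names, and `downstream_of` is an illustrative helper rather than a feature of any product.

```python
def downstream_of(component: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or indirectly depends on `component`."""
    affected: set[str] = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in affected:
                affected.add(name)
                stack.append(name)
    return affected


# Hypothetical map: each key lists the components it needs in order to run.
depends_on = {
    "storefront": ["order-api", "auth"],
    "order-api": ["database"],
    "reporting": ["database"],
    "auth": [],
    "database": [],
}

impact = {name: downstream_of(name, depends_on) for name in depends_on}

# Components whose failure affects the most others are listed first.
for name in sorted(impact, key=lambda n: len(impact[n]), reverse=True):
    print(f"{name:12} affects {sorted(impact[name])}")
```

In this example a database failure reaches the order API, reporting and the storefront, which marks the database as a candidate single point of failure to address first.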

Define recovery objectives
Different resources and systems will have different recovery objectives. Critical systems need aggressive recovery objectives, while others can have less stringent ones. The key here is to understand business needs and provide a differentiated level of service availability based on priority. Recovery objectives usually specify a recovery time objective (RTO, the maximum acceptable downtime) and a recovery point objective (RPO, the maximum acceptable data loss).
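
The time and point objectives described above are commonly called the recovery time objective (RTO) and recovery point objective (RPO). The sketch below shows how tiered objectives might be written down and checked. The tier names, the target values and the `meets_objective` helper are hypothetical, and the check assumes that the worst-case data loss is roughly one backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable data loss, expressed as a time window


# Hypothetical tiers; real targets come from the business, not from IT alone.
TIERS = {
    "critical": RecoveryObjective(rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
    "important": RecoveryObjective(rto=timedelta(hours=8), rpo=timedelta(hours=4)),
    "standard": RecoveryObjective(rto=timedelta(days=2), rpo=timedelta(days=1)),
}


def meets_objective(tier: str, backup_interval: timedelta,
                    measured_restore_time: timedelta) -> bool:
    """Check a system's backup interval and restore time against its tier."""
    target = TIERS[tier]
    return backup_interval <= target.rpo and measured_restore_time <= target.rto


# Backing up every 10 minutes and restoring in 45 minutes meets the critical tier.
print(meets_objective("critical", timedelta(minutes=10), timedelta(minutes=45)))
# Hourly backups could lose up to an hour of data, which misses the 15-minute RPO.
print(meets_objective("critical", timedelta(hours=1), timedelta(minutes=45)))
```

Writing the tiers down this way keeps the conversation with the business focused on two numbers per system, and lets recovery test results be compared against them directly.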

Determine right tools and techniques
Once you have identified all your IT assets, mapped their dependencies, and grouped them together based on their criticality and recovery objectives, it’s now time to choose what tools and techniques to use. Automate and streamline the recovery process as much as you can. In the event of a disaster, key IT staff may be unavailable. Automation also lessens the risk of human error.

Get stakeholders involved
Go beyond the walls of the data centre and involve key stakeholders in all your business units (i.e. application owners and business managers). Consult with all of the key stakeholders and enlist an executive-level sponsor who will get behind you and the project. Collaboration, consensus and executive support are critical to the success of your disaster plan.

Document and Communicate
It is important to have detailed documentation of how to get back to a normal working state in case of a disaster. This document should be communicated to and shared with all the stakeholders. Store the DR strategy in a place where it can be accessed securely and easily during a disaster.

Test and Practice
Practice makes perfect. Practice helps you find and rectify problems in your plan, and enables you to execute it faster and more accurately. So test your DR plans regularly.

Evaluate and Update
The DR plan is a living document. It is very important to review it regularly and adjust it to changing business environments and threats. The DR plan should always reflect the current state of the organization.

Business Continuity (BC)
How can we continue business, if disaster strikes?

Business continuity consists of a plan of action, a strategy. It ensures continuity of operations with minimal service outage or downtime, during and after a disaster. Business continuity planning (BCP) is the preparedness of an organisation against disasters. It includes policies, standards and procedures to ensure the continuity, resumption and recovery of critical business processes at an agreed level by limiting the impact of the disaster. A BCP is the strategic, long-term, business-oriented plan for uninterrupted operation when faced with a threat or disruption, while a DRP is a short-term plan for dealing with specific IT-oriented outages. Disaster recovery is just one part of a complete business continuity plan, which looks at the continuity of the entire organization and includes strategies for:

  • Prevention: Steps and systems to prevent certain disasters from occurring in the first place.
  • Mitigation: Processes to limit the impact of disasters when they occur.
  • Recovery: Protocols for restoring operations as quickly as possible to limit downtime or other adverse consequences.

Stages of Business Continuity Plan

Business Impact Analysis

  • Develop questionnaires
  • Train business functions and managers
  • Conduct BIA
  • Review BIA
  • Follow up and validate information

Recovery Strategies

  • Identify resource requirements
  • Conduct gap analysis (between recovery requirements and current state)
  • Explore recovery strategy options
  • Select recovery strategies and get approval
  • Implement strategies

Plan Development

  • Develop a plan
  • Organise recovery teams
  • Develop relocation plans
  • Create business continuity and DR procedures
  • Get management approval

Testing and Exercise

  • Develop test requirements
  • Conduct training for business continuity team
  • Conduct orientation exercises
  • Conduct testing and document test results
  • Update BCP based on test results

As you can see, a DRP is a subset of a BCP. For example, you might have several DRPs under one BCP, each covering a different scenario such as a cyberattack, power outage or hurricane.

Summary

Common business disruptors such as global pandemics, natural disasters, utility outages and cyber security incidents are ever increasing. BCP and DRP help organisations to be prepared for and resilient against such threats. If you do not protect your business with proper business continuity and disaster recovery plans, you will not be prepared when a catastrophe occurs. Statistics on the survival of businesses without business continuity and disaster recovery plans do not make good reading. Regulatory authorities and federal and state laws also require formal disaster recovery planning in organizations. For example, financial enterprises must have a business continuity plan, and the healthcare industry must comply with HIPAA requirements. Automate and streamline the recovery process as much as you can, so that recovery is faster, leading to shorter downtime and less damage.

About The Author

Dr. Anil Kumar

VP Engineering
Founder | Architect | Consultant | Mentor | Advisor | Faculty

Solution Architect & IT Consultant with more than 25 years of IT experience. Served in various roles with both national and international institutions. Expertise in working with both legacy and advanced technology stacks and business domains.
