1. Field of the Invention
This invention is related to the field of data protection and recovery in computer systems.
2. Description of the Related Art
Data protection for computer systems is an important part of ensuring that the information generated on a computer system and/or stored on the computer system is not lost due to the occurrence of a hardware failure, a software failure, user error, or other environmental event (e.g. power outage, natural disaster, intentionally-caused disaster, accidental disaster, etc.). Generally, events that the data protection scheme is designed to protect against are referred to herein as disaster events. The data protection scheme attempts to make redundant copies of the data and locate those copies such that the data is safe from the disaster events and such that the data can be restored to the computer system or to another computer system rapidly enough to be acceptable given the nature of the data, its importance to the creator of the data, etc.
While restoration of the data after a disaster event is part of recovery, restoration alone may not be enough to ensure recovery. Recovery refers to actually bringing back into operation the applications and other software/functionality that were in operation on one or more computer systems prior to the disaster event. Typically, the application software, the underlying operating system software, and the data/configuration files for the application and operating system must all be restored to a logically consistent state to permit recovery.
Recovery is frequently performed under significant time pressure. Unfortunately, current data protection tools are not focused on recovery. Determining how long a recovery operation will take is frequently merely guesswork for the administrator performing the recovery. When multiple recovery operations are needed and time pressure is high, the guesses are insufficient to help the administrator plan the overall recovery.
Increasingly, organizations are adopting formal service level agreements (SLAs) with their information technology (IT) departments or third-party IT providers. Disaster recovery planners (and/or business continuity planners) in the organization assign recovery requirements to various information assets based on the importance of those assets to the continued functioning of the organization. Currently, the disaster recovery planners specify a recovery point objective (RPO) and a recovery time objective (RTO). The RPO indicates how close in time to a specified point in time it must be possible to recover the state of the corresponding information asset. For example, an RPO of 0 indicates that it must be possible to recover the state of the information asset at any point in time. On the other hand, an RPO of 30 minutes indicates that it must be possible to recover the information asset to a state within 30 minutes of the specified point in time. The RTO specifies the maximum amount of time that the recovery operation may take.
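As a hedged illustration of the RPO and RTO definitions above, the following Python sketch checks whether a set of available recovery points satisfies an RPO for a given point in time, and whether an estimated recovery duration satisfies an RTO. The function names `meets_rpo` and `meets_rto`, the use of minutes as the time unit, and the assumption that a usable recovery point must lie at or before the specified point in time are illustrative assumptions only, not part of the described invention.

```python
from bisect import bisect_right
from typing import Sequence


def meets_rpo(recovery_points: Sequence[float], target_time: float, rpo: float) -> bool:
    """Hypothetical check: True if some recovery point lies at or before
    target_time and no more than `rpo` minutes earlier.

    Assumes recovery_points are timestamps in minutes, sorted ascending.
    """
    # Index of the first recovery point strictly after target_time.
    i = bisect_right(recovery_points, target_time)
    if i == 0:
        # No recovery point exists at or before the requested time.
        return False
    latest = recovery_points[i - 1]
    # The gap between the requested time and the closest earlier point
    # must not exceed the RPO.
    return target_time - latest <= rpo


def meets_rto(estimated_recovery_minutes: float, rto: float) -> bool:
    """Hypothetical check: True if the estimated recovery duration
    does not exceed the RTO (both in minutes)."""
    return estimated_recovery_minutes <= rto


if __name__ == "__main__":
    # Assumed scenario: a recovery point is captured every 30 minutes.
    points = [0, 30, 60, 90]
    print(meets_rpo(points, 75, 30))  # closest earlier point is 60, gap 15
    print(meets_rpo(points, 75, 0))   # an RPO of 0 cannot be met between points
    print(meets_rto(45, 60))
```

With recovery points every 30 minutes, a 30-minute RPO is met for any requested time after the first point, while an RPO of 0 is met only at the exact instants a point was captured, which is why an RPO of 0 generally implies continuous protection rather than periodic copies.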
The RTO and RPO are objectives, but they may not actually be achievable given data protection technology, budgetary constraints, etc. Accordingly, corresponding recovery targets (recovery time target (RTT) and recovery point target (RPT)) are negotiated by the disaster recovery planners/business continuity planners with the IT department/provider. The RTT and the RPT are formalized as the SLA. Typically, SLAs only cover the immediate recovery of the current state of an asset in response to a disaster event.
Once the SLAs are in place, the IT department/provider must then establish a protection scheme for the information assets that will meet the SLA. There are myriad protection schemes and protection products available which may provide pieces of an overall protection solution that would meet an SLA. However, the number of combinations and permutations of schemes is dauntingly large. Additionally, protection schemes and products are typically focused on the protection provided, not on the recovery metrics that may be achievable using the schemes/products to recover from a disaster event. Thus, it is difficult to determine whether a protection scheme will meet a given SLA. Furthermore, it is desirable to revise the SLA (the recovery targets) over time so that the targets come closer to meeting the recovery objectives.