1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to recovery of computer systems.
2. Description of the Related Art
Computer systems and their components may reach undesirable states in various ways. For example, some undesirable states may be due to failures of hardware components, others to erroneous software or hardware operation, and still others to malicious intrusions by viruses, worms or human agents. Some undesirable states result in application failure or system failure. Others may result in logical corruption, such as the display of incorrect data, or in other undesirable behavior, such as poor system performance.
A variety of approaches may be used to recover from different kinds of undesirable states. To reduce the costs of application and system failures, solutions such as clustering may be employed. When an application running on a node A of a cluster fails, or when the node fails, the application may be failed over to another node B. To mitigate the risk of logical corruption, backup copies of data may be stored periodically on various storage devices. When data corruption is discovered, a backed up version of the data may be used to restore the state of the data to an acceptable previous state.
One approach to the problem of logical corruption is to provide functionality that restores a component to its state as of an earlier point in time at which the component was known to be functioning acceptably. The user of the functionality may choose the point in time to which the state is restored, typically from a set of possible points in time that varies with the implementation of the functionality. This functionality is known as point-in-time recovery. Some database management systems may provide point-in-time recovery of the data in database tables. Similarly, some data storage vendors may provide functionality to revert the data on a disk or on a set of disks to its state as of an earlier point in time. The manner of selecting the point in time to which recovery is desired may vary with component type and solution vendor. For example, some vendors may support recovery only to certain discrete points in time, such as the time of the last database checkpoint, rather than to any arbitrary instant.
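Where recovery is supported only to discrete points in time, a request for an arbitrary instant is commonly resolved to the latest available recovery point at or before that instant. The following is a minimal sketch of that selection step, assuming the available points (for example, checkpoint times) are kept in a sorted list; the function name `select_recovery_point` and the sample times are hypothetical and are not drawn from any particular vendor's product.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def select_recovery_point(checkpoints: Sequence[float], requested: float) -> Optional[float]:
    """Return the latest checkpoint at or before `requested`, or None if none exists.

    Hypothetical helper; assumes `checkpoints` is sorted in ascending order.
    """
    # bisect_right gives the index of the first checkpoint strictly after `requested`.
    i = bisect_right(checkpoints, requested)
    return checkpoints[i - 1] if i > 0 else None


# Hypothetical checkpoint times (e.g., hours since some reference instant).
checkpoints = [10.0, 20.0, 30.0]
print(select_recovery_point(checkpoints, 25.0))  # 20.0
print(select_recovery_point(checkpoints, 30.0))  # 30.0
print(select_recovery_point(checkpoints, 5.0))   # None
```

A request for time 25.0 is thus satisfied by restoring to the checkpoint taken at 20.0, since no recovery point exists between the two.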
The problem of point-in-time recovery is more complicated for complex applications that depend upon a set of interdependent hardware and software resources to function. For example, an application may utilize resources such as application software and libraries, a database management system, file systems, disk volumes, physical disks, TCP/IP host and port information, and network interface cards. Some of these resources depend upon others; for example, file systems may not work unless the underlying disk volumes and physical disks are functioning correctly. In order for the application to provide acceptable operation, all the resources must be functioning. In such complex applications, the set of resources may change over time, and the dependencies among the resources may also change over time. In addition to the factors described earlier that can lead to applications reaching undesirable states (hardware failures, intrusions and the like), suboptimal resource configurations can also lead to undesirable states for complex applications.
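The dependencies described above can be modeled as a directed graph in which each resource lists the resources it depends on. The sketch below, which assumes hypothetical resource names and a simple set of currently functioning resources, checks whether a resource can operate (it and every resource it transitively depends on must be functioning) and derives an order in which resources could be brought up so that each follows its dependencies. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later); the graph itself is illustrative only.

```python
from graphlib import TopologicalSorter

# Hypothetical resources; each maps to the set of resources it depends on.
DEPENDS_ON = {
    "application": {"database", "tcpip_host_port"},
    "database": {"file_system"},
    "file_system": {"disk_volume"},
    "disk_volume": {"physical_disk"},
    "tcpip_host_port": {"network_interface_card"},
    "physical_disk": set(),
    "network_interface_card": set(),
}


def transitive_deps(resource, graph):
    """Collect every resource that `resource` depends on, directly or indirectly."""
    seen = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def is_operational(resource, graph, functioning):
    """A resource works only if it and all of its transitive dependencies are functioning."""
    return resource in functioning and transitive_deps(resource, graph) <= functioning


def start_order(graph):
    """Order resources so that each appears after everything it depends on."""
    return list(TopologicalSorter(graph).static_order())


all_up = set(DEPENDS_ON)
volume_down = all_up - {"disk_volume"}
print(is_operational("application", DEPENDS_ON, all_up))          # True
print(is_operational("application", DEPENDS_ON, volume_down))     # False
print(is_operational("tcpip_host_port", DEPENDS_ON, volume_down)) # True
```

With the disk volume unavailable, the file system, database and application are all unusable, while the network-related resources are unaffected. If the set of resources or their dependencies changes over time, the graph itself changes, which is why restoring data alone may not recover the earlier operational state.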
As mentioned above, backup copies of application data may be stored periodically on various storage devices. In the event that recovery from an undesirable application state is desired, such backup copies can be used to restore the application data to an earlier state. However, in cases where the set of resources used by an application changes over time, or where the dependencies among the resources change over time, or where suboptimal resource configurations contribute to the application reaching an undesirable state, the restoration of the data state alone may not be sufficient to bring the application back to a desired earlier operational state.