1. Field of the Invention
The present invention relates to the backup or restoration of aspects of a computer system. More particularly, the present invention relates to a method and system for storing information about application dependencies so that, using the dependency information, a computer system may efficiently return to a pre-crash state. The present invention is also directed to any restore operation to a point in time for which backup information is available. Furthermore, the present invention is directed to circumstances in which a rollback to a known or desired prior system configuration may be needed.
2. Brief Description of Prior Developments
When a computer system crashes or freezes, consequences ranging from the trivial to the irreparable may result. For standalone computers or client computers, a local system crash can result in the loss of work product; for example, anything that has not been properly saved may be lost. Furthermore, a user may be inconvenienced by having to reboot the computer system, which costs additional time. In the case of network servers and other computer systems, a system crash can be even more sweeping, affecting multiple users, clients, and/or customers. As computer systems grow ever more complex, programmers, collectively speaking, have not been able to eliminate entirely the system states in which a computer or application “freezes” or “crashes.”
In the “restore to prior state” scenario, the goal is to return a system to a desirable prior state. This is valuable when, for example, newly installed software causes the system not to operate as expected.
Because the probability of a system crash or freeze is not zero, a field of study known as recovery has arisen, which relates to improving the process by which a computer system recovers from a crashed state to a stable state. Recovery from system instability has been the subject of much research and development.
In general, the goal of reboot or redo recovery is, after a crash, to return the computer system to a previous, presumed correct state: either the state in which the computer system was operating immediately prior to the crash, or the state at a point in time for which a consistent set of backup information is known. Because point-in-time information representing a consistent state for all of an application's dependencies cannot be ensured, some restore or backup services may reset an application to an incorrect state; alternatively, an extremely resource-intensive, brute-force freeze or flush of every process of the computer system may be required in order to recover to a stable state for the application, volume or other object being backed up or restored.
For example, database system designers have attempted to design database recovery techniques that minimize the amount of data lost, the amount of work needed to recover to the pre-crash operating state, and the performance impact of recovery on the database system during normal operation. However, whenever a file save operation, registry write operation, database storage operation, active directory write operation, or other such dependency exists prior to a restore or backup operation, it is desirable to recover to a pre-crash state representative of an atomic point in time.
One efficient backup technique that has been developed uses a snapshot provider to provide a snapshot of one or more volumes. Instead of performing a brute-force, file-by-file recovery of a volume any time the system crashes, a snapshot enables the state of a computer system to be ‘frozen’ at an arbitrary point in time, thereby enabling a much faster and less resource-intensive backup process. In general, a request is made for a snapshot to be created at a time t0, and the resulting snapshot reflects the volume data at time t0. After t0, the content of the snapshot itself can be backed up, and a full backup is avoided through the use of differential data or files, which enable the system to act efficiently upon only the data that has changed since a previous time.
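As a rough illustration of the snapshot and differential-data idea described above, the following Python sketch models a volume as a list of blocks and uses copy-on-write: the first time a block is overwritten after t0, its original contents are saved to the snapshot's differential area. The class and method names (Volume, Snapshot, preserve, changed_blocks) are hypothetical, and copy-on-write is only one common way to implement snapshots; it is not asserted to be the mechanism of the references cited here.

```python
from typing import Dict, List, Set


class Volume:
    """Toy block device: a list of blocks (hypothetical model)."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)
        self.snapshots: List["Snapshot"] = []

    def create_snapshot(self) -> "Snapshot":
        # The snapshot is "taken" at this instant (time t0); no data is copied yet.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: give every live snapshot a chance to preserve the
        # block's current contents before it is overwritten.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """Point-in-time (t0) view of a Volume backed by a differential area."""

    def __init__(self, volume: Volume) -> None:
        self.volume = volume
        self.diff_area: Dict[int, bytes] = {}  # block index -> contents at t0

    def preserve(self, index: int, current: bytes) -> None:
        # Only the first overwrite after t0 is saved; later writes to the
        # same block must not replace the t0 contents.
        self.diff_area.setdefault(index, current)

    def read(self, index: int) -> bytes:
        # Changed blocks come from the differential area; unchanged blocks
        # are still identical on the live volume.
        return self.diff_area.get(index, self.volume.blocks[index])

    def changed_blocks(self) -> Set[int]:
        # Blocks modified since t0: the only data a differential pass must copy.
        return set(self.diff_area)


if __name__ == "__main__":
    vol = Volume([b"a", b"b", b"c"])
    snap = vol.create_snapshot()  # time t0
    vol.write(1, b"B")
    vol.write(1, b"BB")
    print(snap.read(1), vol.blocks[1], snap.changed_blocks())  # b'b' b'BB' {1}
```

Reading block 1 through the snapshot still returns its t0 contents while the live volume holds the latest write, and only block 1 appears in the differential area, so only that block would need to be copied by a differential backup.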
“Support for Multiple Temporal Snapshots of Same Volume”, U.S. patent application Ser. No. 09/505,447, now U.S. Pat. No. 6,651,075, filed Feb. 16, 2000, to Cabrera et al., “Kernel-Based Crash-Consistency Coordinator”, U.S. patent application Ser. No. 09/505,344, now U.S. Pat. No. 6,647,473, filed Feb. 16, 2000, to Golds et al., and “System and Method for Growing Differential File on a Base Volume of a Snapshot”, U.S. patent application Ser. No. 09/505,450, now U.S. Pat. No. 6,473,775, filed Feb. 16, 2000, to Kusters et al., relate to backup processes generally and are directed to different aspects of snapshot systems. These applications are hereby incorporated by reference as background information relating to the provision of snapshot services.
However, one difficulty often encountered when performing a backup or restore operation for volumes, applications or other objects is that the state of a given object to be backed up or restored must be known for an atomic point in time. The most recent point in time for which the object's state is captured as a ‘snapshot’ is thus the most recent point in time to which the object can be restored.
Similarly, if a backup or snapshot service is being provided, the dependency information of the object to be backed up must be frozen in time and in a proper order for the snapshot to consistently reflect information about an atomic point in time.
If the state of the applications, tasks or other objects upon which the given object depends cannot be captured for such a single point in time, the backup process may be ruined, since data in those applications, tasks or other objects may be corrupted during the arbitrary amount of time needed to finish backing up one task versus another. First, different tasks generally take different amounts of time to complete; e.g., a read operation will generally take a different amount of time to complete than a write operation. Even the same task, such as a write operation in connection with a file system, can take different times to complete when system resources are used dynamically. Furthermore, it may be unknown whether multiple applications are writing to or otherwise depending on the same object, thereby creating dependencies of which the backup service may be unaware.
Thus, for a proper backup or restore operation for a given volume, application or object, some processes or files may first need to be frozen (no further writes associated with them for the time being), and in a certain order according to the given object's dependencies. Especially in connection with a snapshot service, where state information regarding an atomic point in time is paramount to providing point-in-time backup information, it would be advantageous to determine intelligently which applications or processes, and only those applications or processes, need to be acted upon (frozen), and in what order.
It would be further beneficial to determine which applications may be excluded from restore operations that return the object to a previously known state, since doing so would make the recovery process still more efficient.
Often, a system crash or freeze involves only a few problematic files or processes. For example, when an application stops responding, it is usually desirable from a system standpoint to terminate only the unresponsive application, or to find an application dependency that may be causing the program to stop responding. Indeed, the programmer of an application A often cannot anticipate all of the combinations and permutations of other applications B, C, D, etc. that will be installed on a given computer system running application A. The proper execution of application A may depend upon certain other applications, storage components, processes and files in application A's system space, and these dependencies may change dynamically. Thus, a way of freezing the state of application A at an atomic point in time, by freezing the objects upon which it may depend either vertically or horizontally, would be useful, and would be a vast improvement over freezing an entire volume of data (which may involve far more freezing than is necessary to back up a single object) for a snapshot in time. The objects upon which a given object being backed up may depend must be frozen or flushed in the proper order prior to the backup operation(s), since there may be embedded dependencies that must be unraveled in an order determined by those dependencies.
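A minimal sketch of dependency-ordered freezing follows, assuming the dependency hierarchy is available as a mapping from each object to the objects it depends on, and assuming that an object should be frozen before the objects it depends on. The object names and the functions needed_objects and freeze_order are hypothetical. The sketch uses a topological sort (Python's standard graphlib module) to produce an order, and it restricts freezing to the objects the target actually reaches rather than an entire volume.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def needed_objects(target: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the target plus everything it depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(depends_on.get(obj, set()))
    return seen


def freeze_order(target: str, depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order in which to freeze the objects that a backup of `target` needs.

    Assumption (illustrative): an object is frozen before the objects it
    depends on, so no new writes reach a dependency after it is frozen.
    Objects the target does not reach are left running.
    """
    needed = needed_objects(target, depends_on)
    graph = {obj: depends_on.get(obj, set()) for obj in needed}
    # static_order() lists each object after its dependencies and raises
    # graphlib.CycleError if the dependencies are circular; reversing it puts
    # dependents ahead of their dependencies.
    return list(reversed(list(TopologicalSorter(graph).static_order())))


if __name__ == "__main__":
    # Hypothetical dependency hierarchy: object -> objects it depends on.
    depends_on = {
        "mail_app": {"database", "registry"},
        "database": {"file_system"},
        "registry": {"file_system"},
        "file_system": {"volume"},
        "media_player": {"file_system"},
    }
    # mail_app comes first and volume last; media_player is not frozen.
    print(freeze_order("mail_app", depends_on))
```

The relative order of "database" and "registry" is not fixed, since neither depends on the other; any order the sort returns respects every stated dependency. Reversing the direction (freezing dependencies first) would be a one-line change if a particular system required it.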
Thus, since a full backup or restore of an object of a computer system is undesirable from a resource standpoint, a way to retroactively recover from a crash or to institute a backup, such as a snapshot service, by freezing certain data selectively, efficiently and in the proper order based upon the object's dependencies would be a useful addition to the field of recovery.
“Database Computer System Using Logical Logging to Extend Recovery”, U.S. application Ser. No. 09/268,146, now U.S. Pat. No. 6,978,279, filed Mar. 15, 1999, to Lomet et al., describes a technique for using logical logging operations to reduce logging cost during a recovery process. By introducing identity operations that unexpose a nodal object from flush dependencies, the reference discloses on-the-fly unwrapping of application flush dependency information. Thus, in the event of a system crash, application dependencies have already been handled via logical logging operations, and accordingly no recovery difficulties are encountered in connection with the unraveled or unwrapped dependencies. However, this technique does not determine an order in which to freeze objects based upon hierarchical object dependency information, and consequently may not be usable for freezing the state of an object across all of the object's dependencies for a given point in time.
Consequently, it would be desirable to provide a way to retroactively recover from a crash, or to perform a backup or restore operation, by explicitly analyzing application dependency information and acting upon it selectively, efficiently and in the proper order. It would be further beneficial to maintain volume snapshot dependency information, i.e., hierarchical information, for a given point in time, about which applications depend on which other applications, which may in turn have further dependencies, and so on. It would be further desirable to use the volume snapshot dependency information in connection with recovery from a crash, so that the process of recovering from a crashed state may be improved. It would be still further desirable to provide a uniform protocol by which applications may convey information about the objects upon which they depend.