1. Field of the Invention
The present invention relates to computer hardware and software, and more particularly to a method and system for recovering the state of object-oriented software in the face of partial or total failure of the underlying computing platform.
2. Description of the Prior Art
Failure of a computer can often result in the loss of significant amounts of data and intermediate calculations. The cause of failure can be either hardware or software related, but in either instance the consequences can be expensive, particularly when data manipulations are interrupted in mid-stream. In the case of large software applications, a failure might require an extensive effort to regenerate the state of the software and its data prior to the failure. Several techniques have been developed to address this problem; examples are disclosed in the following issued U.S. Patents:
U.S. Pat. No. 5,594,861 discloses an error handling system in a telecommunications exchange. Certain objects within software applications are defensively programmed to detect and report errors. An error handler object provides centralized error handling for the process and is configured to determine and specify a recovery that returns the software application to a well-defined state.
U.S. Pat. No. 5,151,987 discloses a system and method for recovering objects in an object-oriented computing environment. Recovery from an unplanned failure is accomplished by storing recovery information in recovery objects. The recovery information is limited to only that information which is necessary to recover from unplanned failures.
U.S. Pat. No. 5,469,562 discloses a system that provides recovery from the effects of incompletely executed transactions in the event of a fault. During execution, certain data is stored in persistent memory. During fault recovery, the system calls agent-specific procedures as needed, using the recovery and recovery-sequence information stored during normal transaction execution.
U.S. Pat. No. 4,814,971 discloses a virtual memory recovery system in which periodic checkpoints are taken of the state of a computer system. If a system crash occurs, the machine state can be rolled back to the checkpoint state and normal operation restarted. Modifications made after the checkpoint time are discarded when the system state is rolled back to the saved checkpoint state.
As used herein, the term "persistent" refers to a computer memory storage device that can withstand a power reset without loss of its contents. Persistent memory devices have been used to store data for starting or restarting software applications. In simple systems, the contents of persistent memory are static and are not modified as the software executes: the initial state of the software environment is stored in persistent memory, and in the event of a power failure or some other failure, the software restarts execution from that initial state. One problem with this approach is that all intermediate calculations must be recomputed. This can be particularly onerous if large amounts of user data must be reloaded during this process, and if any of the user data is no longer available, it may not be possible to reconstruct the pre-failure state.
More sophisticated executable programs might dynamically update the configuration of persistent memory. The updates can take the form of a "snapshot," or duplicate, of the entire contents of the relevant portion of computer memory. The updates can also be limited to certain key intermediate results. This allows for more efficient software recovery because intermediate calculations can be stored and then recovered from persistent memory. During the recovery process, the software restarts from its last saved state. For example, whenever a large batch of data is processed, a snapshot of the current state of the software can be stored to persistent memory. In the event of a failure, the large batch of data will not have to be reloaded and reprocessed.
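The snapshot approach described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the mechanism of any cited patent: a JSON file stands in for persistent memory, and the functions `save_checkpoint` and `load_checkpoint` and the state fields are hypothetical. Falling back to the initial state when no snapshot exists corresponds to the simple static systems; saving after each batch corresponds to the dynamically updated snapshots, and any work done after the last snapshot is lost on recovery.

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Hypothetical helper: write a full snapshot of `state` to `path`."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the snapshot to the storage device
    # Replace the old snapshot in one step (atomic on POSIX), so a crash
    # leaves either the previous snapshot or the new one, never a partial file.
    os.replace(tmp, path)


def load_checkpoint(path: str, initial: dict) -> dict:
    """Hypothetical helper: return the last snapshot, else the initial state."""
    if not os.path.exists(path):
        return dict(initial)  # no snapshot yet: restart from the initial state
    with open(path) as f:
        return json.load(f)


# Usage: snapshot after each batch, then simulate a failure and recover.
path = os.path.join(tempfile.mkdtemp(), "state.json")
initial = {"batches_done": 0, "running_total": 0}
state = load_checkpoint(path, initial)
for batch in ([1, 2, 3], [4, 5]):
    state["running_total"] += sum(batch)
    state["batches_done"] += 1
    save_checkpoint(state, path)  # intermediate results survive a failure

state["running_total"] += 1000  # work done after the last snapshot...
recovered = load_checkpoint(path, initial)  # ...is discarded on recovery
print(recovered)  # {'batches_done': 2, 'running_total': 15}
```

Note that only the processed batches need not be reloaded; anything computed after the final snapshot must still be redone.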
Another class of solutions is to have redundant hardware configurations. In the event of a hardware failure, the redundant processors can take over the functions of the failed hardware. Ideally, this should happen with no human interaction, but in any event, within a time frame consistent with "high availability" objectives. Most of these schemes depend upon a predefined "configuration" record, together with copies of the state of the "lost" programs. Thus, the relationships are static, and often the recovered programs proceed using the last known state of the failed program. Often this happens without the recovered program having a record that a recovery has taken place.
Object-oriented programming environments present unique challenges for software recovery efforts. Software objects are typically encapsulated blocks of code, and a common strategy is to save such objects in persistent storage. In the event of a failure, this strategy can often recover the objects. Some examples of this approach can be found in the set of UNIX start scripts and user preferences used by end-user type applications. However, object-oriented software environments typically have rich inter-object relationships. These relationships are established due to the logical dependencies between the objects. A simple recovery strategy can successfully recover the objects, but will not recover the inter-object relationships.
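The limitation of a simple object-saving strategy can be illustrated with a small Python sketch. The classes `Customer` and `Account` and the functions `naive_save` and `naive_recover` are hypothetical and are not drawn from any cited patent or from the present invention; they model a scheme that saves each object's own fields independently. Because the link between an account and its owner exists only as an in-memory reference, the recovered objects carry the correct values but are no longer related to one another.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Customer:
    name: str
    accounts: List["Account"] = field(default_factory=list)  # outgoing links


@dataclass
class Account:
    number: str
    balance: int
    owner: Optional[Customer] = None  # reference to another object


def naive_save(customers, accounts):
    """Hypothetical store: save each object's own scalar fields independently.

    In-memory references between objects are not saved."""
    return {
        "customers": [{"name": c.name} for c in customers],
        "accounts": [{"number": a.number, "balance": a.balance} for a in accounts],
    }


def naive_recover(store):
    """Rebuild every object from the store; the links cannot be rebuilt."""
    customers = [Customer(**d) for d in store["customers"]]
    accounts = [Account(**d) for d in store["accounts"]]
    return customers, accounts


alice = Customer("alice")
acct = Account("A-1", 100, owner=alice)
alice.accounts.append(acct)

store = naive_save([alice], [acct])
(r_alice,), (r_acct,) = naive_recover(store)
print(r_alice.name, r_acct.balance)    # alice 100  (objects recovered)
print(r_acct.owner, r_alice.accounts)  # None []    (relationships lost)
```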
A desirable system for software recovery in object-oriented environments would recover the objects themselves, along with the inter-object relationships. The present invention addresses this need.