The present invention relates to fault-tolerant computing systems. In particular, the present invention deals with recovering from a fault occurrence detected within a computing system. A fault occurrence is an occurrence during an execution of machine instructions which renders data or a subsequent execution of machine instructions invalid. Rather than halting entirely and rebooting the computing system, it is desirable to recover and continue execution of machine instructions with a minimum amount of disruption, and with assurance that data and subsequent execution of machine instructions will be valid.
The computing system is characterized by a set of attributes called a system state. A system state includes process data, consisting of process control blocks and local data accessible to processes, and file data, consisting of permanent data, such as database files.
Prior recovery schemes only partially recovered from a fault occurrence. Any modifications of file data begun before fault occurrence were either completely finished or completely undone. Prior recovery schemes periodically recorded, at checkpoints, enough data to completely restore a checkpoint system state, which is a system state existing at a checkpoint.
When a fault was detected, a previously performed file modification was undone by running backwards through previously logged information describing the file modification. The computing system was reset to a checkpoint system state most recently recorded, defined as a last checkpoint system state.
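The undo-and-reset behavior described above can be illustrated with a minimal sketch. The class name `RecoverableSystem`, its methods, and the dictionary-based model of file data and process data are all hypothetical, chosen only for illustration; they are not taken from any particular prior recovery scheme.

```python
import copy


class RecoverableSystem:
    """Hypothetical model: file data protected by an undo log,
    process data protected by a checkpoint snapshot."""

    def __init__(self, file_data, process_data):
        self.file_data = dict(file_data)
        self.process_data = dict(process_data)
        self.undo_log = []  # entries: (key, key_existed, old_value)
        self.last_checkpoint = copy.deepcopy(self.process_data)

    def checkpoint(self):
        # Record the checkpoint system state and start a fresh undo log.
        self.last_checkpoint = copy.deepcopy(self.process_data)
        self.undo_log.clear()

    def write(self, key, value):
        # Log the prior value before modifying file data.
        existed = key in self.file_data
        self.undo_log.append((key, existed, self.file_data.get(key)))
        self.file_data[key] = value

    def recover(self):
        # Run backwards through the log, undoing each file modification.
        while self.undo_log:
            key, existed, old_value = self.undo_log.pop()
            if existed:
                self.file_data[key] = old_value
            else:
                self.file_data.pop(key, None)
        # Reset process data to the last checkpoint system state.
        self.process_data = copy.deepcopy(self.last_checkpoint)
```

In this sketch, work performed after the last checkpoint is discarded on recovery, so the final system state is the last checkpoint system state rather than the pre-fault system state.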
Prior recovery schemes typically did not restore file data to a condition identical to that existing immediately before fault occurrence. Processes which did not finish modifying file data before fault occurrence were aborted, not restarted. The system state reached by completing fault recovery, defined as a final system state, was typically not a pre-fault system state. The pre-fault system state is the system state existing immediately before fault occurrence. The final system state was often merely the last checkpoint system state.
Prior recovery schemes used modular redundancy to provide fault tolerance. Two or more processors would execute in parallel, executing identical code. At periodic checkpoints, parallel results would be compared. Should the results be found to be different, an arbitration scheme would choose between the parallel results. Modular redundancy was cost prohibitive; duplicating hardware was too expensive.
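The comparison of parallel results at a checkpoint can be sketched as follows. The text above does not specify the arbitration scheme; majority voting is assumed here purely for illustration, and the function names `arbitrate` and `run_redundant` are hypothetical.

```python
from collections import Counter


def arbitrate(results):
    """Compare redundant results; return (chosen_value, all_agreed).

    Majority voting is an assumed arbitration scheme.
    Results must be hashable so they can be counted.
    """
    counts = Counter(results)
    value, votes = counts.most_common(1)[0]
    all_agreed = len(counts) == 1
    # A strict majority is required when the results differ.
    if not all_agreed and votes <= len(results) // 2:
        raise RuntimeError("no majority among redundant results")
    return value, all_agreed


def run_redundant(replicas, x):
    # Each replica stands in for one processor executing identical code.
    return arbitrate([replica(x) for replica in replicas])
```

Under this assumption, two replicas suffice to detect a disagreement, but at least three are needed to choose between differing results, which illustrates why duplicating hardware drives up cost.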
One prior art scheme did enable a final system state identical to a pre-fault system state to be reached. Checkpoints were inserted before every point at which a non-repeatable I/O operation was to be performed. At each checkpoint, a user had to insert code which would record enough information to restore a checkpoint system state.
The scheme suffered from several disadvantages. The scheme was not transparent to a user, but instead made the user partially responsible for ensuring that error recovery would be correct. The scheme required a user to select what information had to be recorded at each checkpoint, and was therefore more prone to human error than a scheme that was transparent. Selecting insufficient information would jeopardize correct recovery, yet selecting too much information would degrade system performance.
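The risk of selecting insufficient information can be shown with a small sketch. The helper names `take_checkpoint`, `restore`, and `demo`, and the two-field state, are hypothetical and exist only to illustrate a user-selected checkpoint.

```python
import copy


def take_checkpoint(state, selected_keys):
    # Record only the fields the user chose to save.
    return {key: copy.deepcopy(state[key]) for key in selected_keys}


def restore(state, saved):
    # Overwrite only the saved fields; unsaved fields keep post-fault values.
    state.update(copy.deepcopy(saved))


def demo(selected_keys):
    state = {"cursor": 2, "buffer": ["a", "b"]}
    saved = take_checkpoint(state, selected_keys)
    # Work performed after the checkpoint, before a fault is detected.
    state["cursor"] = 3
    state["buffer"].append("c")
    restore(state, saved)
    return state
```

Calling `demo(["cursor", "buffer"])` returns the original state, whereas `demo(["cursor"])` leaves `buffer` holding three entries while `cursor` is reset to 2, an inconsistent state caused solely by the user's selection.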
Another disadvantage was that a checkpoint interval, which is an interval between two successive checkpoints, was program driven instead of being program independent. Excessive overhead was involved in recording checkpoint information before each non-repeatable I/O operation, and this overhead seriously degraded system performance. Checkpoint intervals could not be made longer than intervals between non-repeatable I/O operations, so the overhead could not be spread over a longer checkpoint interval to improve system performance. Since mean recovery time is related to the checkpoint interval, no trade-off could be made between system performance and mean recovery time.
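The trade-off that a program-independent checkpoint interval would permit can be illustrated with simple arithmetic. The model below is an assumption introduced for illustration, not a formula from the prior art: each checkpoint costs a fixed time, faults are equally likely at any point in an interval (so half an interval of work is lost on average), and restoring a checkpoint costs a fixed time. The function name and parameters are hypothetical.

```python
def checkpoint_tradeoff(interval_s, checkpoint_cost_s, restore_cost_s):
    """Return (overhead_fraction, mean_recovery_s) under an assumed model."""
    # Fraction of total running time spent recording checkpoints.
    overhead_fraction = checkpoint_cost_s / (interval_s + checkpoint_cost_s)
    # Assumed: restore time plus, on average, half an interval of redone work.
    mean_recovery_s = restore_cost_s + interval_s / 2.0
    return overhead_fraction, mean_recovery_s
```

Under these assumptions, lengthening the interval from 9 seconds to 99 seconds with a 1 second checkpoint cost reduces the overhead fraction from about 0.1 to about 0.01, while a 2 second restore cost gives a mean recovery time that grows from 6.5 seconds to 51.5 seconds.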