The basic scheme used in most fault-tolerant computer systems is to employ a primary and a backup computer. Users interact with the primary computer to obtain a service. The primary computer performs the tasks requested by users and informs the backup of its actions so that the backup can take over providing the service if the primary computer fails. Thus, hardware failures in the primary computer do not interrupt service to the users.
In a properly implemented instance of this scheme, the backup processor must generate no interactions with its environment before the primary computer has failed. After the primary computer has failed, the backup processor must generate interactions with its environment in such a way that the environment is unaware of the primary computer's failure.
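The two requirements above can be illustrated with a minimal sketch. It is not the mechanism of the present invention; it only models the general primary/backup scheme under simplifying assumptions. The names `Environment`, `Replica`, `record_action`, `promote`, and `demo`, and the running-total "service", are hypothetical and chosen purely for illustration. Failure detection is assumed to happen elsewhere and is represented by an explicit call to `promote`.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Stands in for users and devices; records every externally visible output.

    The source name is stored only so the sketch can be inspected afterwards.
    A real environment would see the values alone.
    """
    outputs: list = field(default_factory=list)

    def deliver(self, source: str, value: int) -> None:
        self.outputs.append((source, value))


class Replica:
    """One computer of the pair. The service is a hypothetical running total."""

    def __init__(self, name: str, env: Environment, is_primary: bool):
        self.name = name
        self.env = env
        self.is_primary = is_primary
        self.total = 0
        self.peer = None  # the backup, when this replica is the primary

    def handle_request(self, amount: int) -> None:
        """Primary only: perform the task, inform the backup, then reply."""
        if not self.is_primary:
            # Requirement 1: the backup generates no interactions
            # with the environment before the primary has failed.
            raise RuntimeError("backup must not interact with the environment")
        self.total += amount
        if self.peer is not None:
            self.peer.record_action(amount)  # inform the backup before replying
        self.env.deliver(self.name, self.total)

    def record_action(self, amount: int) -> None:
        """Backup only: track the primary's state without emitting any output."""
        self.total += amount

    def promote(self) -> None:
        """Take over the service after the primary is assumed to have failed."""
        self.is_primary = True
        self.peer = None


def demo() -> list:
    env = Environment()
    primary = Replica("primary", env, is_primary=True)
    backup = Replica("backup", env, is_primary=False)
    primary.peer = backup

    primary.handle_request(5)
    primary.handle_request(7)
    backup.promote()           # the primary is assumed to have failed here
    backup.handle_request(3)   # Requirement 2: the service continues seamlessly
    return env.outputs
```

In this sketch the values seen by the environment continue from the primary's last state after the takeover, so the sequence of outputs gives no indication that a failure occurred. Informing the backup before replying is one possible ordering; it is assumed here so that the backup never lags behind an output the environment has already seen.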
Fault tolerance in computers is usually implemented either (A) by constructing special-purpose computer hardware, or (B) by modifying the computer's operating system. The special-purpose hardware approach requires hardware that is intimately related to the design of the computer's processor. As a result, such computers are usually unnecessarily costly for clients who do not require fault tolerance.
The major problem with using special-purpose operating system code to implement fault tolerance is that only the operating system containing that code can be used on the computer system while still maintaining fault tolerance. If a user who needs fault tolerance wants to use another operating system with the computer, extensive (and thus expensive) changes to this second operating system will be required.
The goals of the present invention are (1) to provide a fault-tolerant computer system with little or no added cost for clients and processes that do not require fault tolerance, and (2) to provide a fault tolerance mechanism that works regardless of the operating system software used by the system's clients. While the "special-purpose hardware" and "modified operating system" approaches are both capable of meeting the basic requirements for a fault-tolerant computer system, the present invention overcomes the cost problems associated with the "special-purpose hardware" approach and provides more flexibility in operating system selection than the "modified operating system" approach.