Successful program execution in computer systems is vulnerable to power supply interruptions that, in military applications, may be initiated by detectors responding to radiation produced by a nuclear event. To protect circuits susceptible to the surge currents such an event can cause, radiation detectors are typically incorporated to switch off power to those circuits rapidly. Because power is removed so quickly, there is insufficient time after an event is detected to prepare the system for recovery. The ability to reestablish program execution after such an event is known as circumvention recovery.
At least two design methods have been investigated to achieve recovery from such an interruption. One, known in the art as "snapshot", involves the processor saving critical data to a hardened memory as each program instruction is executed. This data can then be used for recovery after an event. The technique, however, requires substantial additional processing during normal operation to execute the extra save instructions. Since the occurrence of a nuclear event cannot be known in advance, this overhead burdens normal system activity at all times, not only when circumvention recovery is actually required.
The second known method for circumvention recovery is called "rollback" because the program is constructed to establish "rollback points". These points act as safe harbors: critical data is saved at each one, and the processor can use that data to resume execution of the program from that point. This method makes the system design more complex and requires additional program instructions to be executed at each point to prepare for rollback.
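The trade-off between saving at every instruction and saving only at selected points can be illustrated with a minimal Python simulation. It assumes a program can be modeled as a list of steps that mutate a state dictionary, that a dictionary stands in for hardened memory, and that a power interruption discards all other state; every name here (`run`, `recover`, `save_points`, `PowerInterruption`) is hypothetical. Passing every step index as a save point approximates the snapshot approach; passing a sparse set approximates rollback points.

```python
import copy


class PowerInterruption(Exception):
    """Simulated power loss: all volatile (non-hardened) state is lost."""


def run(program, hardened, save_points, interrupt_at=None):
    """Execute `program` from the last point saved in `hardened` memory.

    program      : list of step functions, each mutating a state dict
    hardened     : dict modeling hardened memory ("pc", "state", "saves")
    save_points  : step indices at which critical data is saved
    interrupt_at : optional step index at which power is cut (simulation only)
    """
    # Volatile state is rebuilt from hardened memory on every (re)start.
    pc = hardened["pc"]
    state = copy.deepcopy(hardened["state"])
    while pc < len(program):
        if pc in save_points:
            # Save critical data before executing this step.
            hardened["pc"] = pc
            hardened["state"] = copy.deepcopy(state)
            hardened["saves"] += 1
        if pc == interrupt_at:
            raise PowerInterruption(pc)
        program[pc](state)
        pc += 1
    return state


def make_step(i):
    """Build a hypothetical program step that adds i to a running total."""
    def step(state):
        state["total"] += i
    return step


def recover(program, save_points, interrupt_at):
    """Run once; after a simulated interruption, restart from hardened memory."""
    hardened = {"pc": 0, "state": {"total": 0}, "saves": 0}
    try:
        final = run(program, hardened, save_points, interrupt_at)
    except PowerInterruption:
        final = run(program, hardened, save_points)
    return final, hardened["saves"]
```

With `program = [make_step(i) for i in range(10)]` and an interruption at step 7, `recover(program, set(range(10)), 7)` returns `({'total': 45}, 11)`, while `recover(program, {0, 5}, 7)` returns `({'total': 45}, 3)`. Both recover the correct result, but the sparse version saves far less often at the cost of re-executing steps 5 and 6, which is why such steps must be safe to repeat from the saved state.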
A variation of the snapshot technique is used to survive a fault that renders a portion of the main computer memory inoperative. Fault-tolerant systems often prepare for such a fault by writing data to two memory units so that a backup copy exists. Because this slows the system considerably, an alternative method has been used in which a second memory area is updated only periodically. However, if a fault occurs while data is being copied to the second area, the data in both locations may be suspect. An example of a technique that uses two backup areas in memory to correct this problem is given in U.S. Pat. No. 4,654,819 to Stiffler et al.
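The general idea of two backup areas can be illustrated with a generic alternating ("ping-pong") checkpoint scheme. This is a minimal Python sketch under stated assumptions, not the specific method claimed by Stiffler et al.: each backup area carries a sequence number and a completion flag (hypothetical names), a new checkpoint always overwrites an incomplete area or else the older complete one, and the flag is set only after the copy finishes. A fault during copying therefore damages at most one area, and recovery uses the newest complete copy.

```python
from dataclasses import dataclass, field


class FaultDuringCopy(Exception):
    """Simulated fault that interrupts a checkpoint copy midway."""


@dataclass
class BackupArea:
    seq: int = -1           # checkpoint sequence number; -1 means never written
    complete: bool = False  # set only after the whole copy has finished
    data: list = field(default_factory=list)


class DualBackup:
    def __init__(self):
        self.areas = [BackupArea(), BackupArea()]
        self.next_seq = 0

    def checkpoint(self, primary, fail_after=None):
        # Prefer an incomplete area, otherwise the older complete one, so the
        # newest complete checkpoint is never the one being overwritten.
        target = min(self.areas, key=lambda a: (a.complete, a.seq))
        target.complete = False  # invalidate before touching the data
        target.seq = self.next_seq
        target.data = []
        for i, word in enumerate(primary):
            if fail_after is not None and i == fail_after:
                raise FaultDuringCopy(i)  # simulation only
            target.data.append(word)
        target.complete = True  # commit: copy finished
        self.next_seq += 1

    def restore(self):
        valid = [a for a in self.areas if a.complete]
        if not valid:
            raise RuntimeError("no complete backup available")
        return list(max(valid, key=lambda a: a.seq).data)
```

For example, after checkpointing `[1, 2, 3, 4]`, changing the first word to `99`, and having the next checkpoint fail after two words, `restore()` still returns `[1, 2, 3, 4]`; a later successful checkpoint makes it return `[99, 2, 3, 4]`.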
An example of a rollback technique is given by U.S. Pat. No. 4,751,639 to Corcoran et al., which discloses a system that uses dual processors to detect faults. Both processors execute the same program, and a miscomparison circuit compares their outputs after each instruction is executed. If the miscomparison circuit detects that the outputs do not agree, both processors are rolled back so that they execute the same instruction again. If the error was caused by a transient of some kind, the second execution should succeed and the program may proceed.
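The compare-and-retry behavior described above can be simulated in a few lines. This Python sketch is an assumption-laden model, not the circuitry Corcoran et al. disclose: instructions are pure functions on an integer state, the two "processors" are two calls to the same function, and the hypothetical `faulty` hook flips a bit in one result to inject an error. On a miscompare, both copies restart from the state saved before the instruction; if outputs still disagree after `max_retries` retries, the fault is treated as persistent.

```python
class PersistentMiscompare(Exception):
    """Outputs still disagree after all retries, suggesting a non-transient fault."""


def lockstep_run(program, state, faulty=None, max_retries=3):
    """Execute each instruction on two simulated processors and compare outputs.

    program     : list of pure functions, each mapping an int state to a new int
    state       : initial integer state
    faulty      : optional hook faulty(step, attempt, cpu) -> bool; True injects a
                  transient error into that processor's result (simulation only)
    max_retries : re-executions allowed per instruction after a miscompare
    Returns (final_state, number_of_miscompares).
    """
    rollbacks = 0
    for step, instr in enumerate(program):
        saved = state  # state before this instruction (ints are immutable)
        for attempt in range(max_retries + 1):
            outputs = []
            for cpu in (0, 1):
                out = instr(saved)
                if faulty is not None and faulty(step, attempt, cpu):
                    out ^= 1  # flip the low bit to model a transient upset
                outputs.append(out)
            if outputs[0] == outputs[1]:
                state = outputs[0]  # outputs agree: commit and advance
                break
            rollbacks += 1  # miscompare detected; retry from `saved` if attempts remain
        else:
            raise PersistentMiscompare(f"step {step}")
    return state, rollbacks
```

With `program = [lambda x: x + 3, lambda x: x * 2, lambda x: x - 1]`, `lockstep_run(program, 5)` returns `(15, 0)`, and a hook that corrupts processor 1 only on the first attempt of step 1 yields `(15, 1)`: one rollback, then agreement. In this model an identical error in both processors would pass the comparison undetected, since only disagreement triggers a rollback.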
Other relevant material is discussed in U.S. Pat. No. 3,536,259, U.S. Pat. No. 3,950,729, U.S. Pat. No. 4,251,863, U.S. Pat. No. 4,270,168, U.S. Pat. No. 4,330,826, U.S. Pat. No. 4,354,230, U.S. Pat. No. 4,356,546, U.S. Pat. No. 4,377,000, U.S. Pat. No. 4,412,280, U.S. Pat. No. 4,428,048, U.S. Pat. No. 4,453,215, U.S. Pat. No. 4,455,602, U.S. Pat. No. 4,484,275, U.S. Pat. No. 4,575,842, U.S. Pat. No. 4,581,701, U.S. Pat. No. 4,590,554, U.S. Pat. No. 4,607,365, U.S. Pat. No. 4,622,667, U.S. Pat. No. 4,634,110, U.S. Pat. No. 4,644,496, U.S. Pat. No. 4,645,459, U.S. Pat. No. 4,750,177, U.S. Pat. No. 4,805,095, U.S. Pat. No. 4,814,982, U.S. Pat. No. 4,816,989, U.S. Pat. No. 4,816,990, U.S. Pat. No. 4,817,091, U.S. Pat. No. 4,819,154, U.S. Pat. No. 4,819,232, U.S. Pat. No. 4,821,212, U.S. Pat. No. 4,823,256, U.S. Pat. No. 4,823,261, U.S. Pat. No. 4,860,192, U.S. Pat. No. 4,860,606, U.S. Pat. No. 4,868,744, U.S. Pat. No. 4,875,155, and U.S. Pat. No. 4,885,680.