The present invention is related to general purpose digital computer systems, and is more particularly related to a circuit and its method for selecting original data from a register log during rollback of fault tolerant computer systems.
A number of fault tolerant systems have recently been developed. Some such systems offer pure software solutions for non-stop operation by requiring the user to program checkpoints into the data processing routines, at which results from a processor of the system can be compared by software to determine whether the system is continuing to operate correctly and without error.
Other systems offer complete hardware solutions, including redundant logic with total transparency to software on all solid failures. However, processing in such systems cannot continue on a unit when a transient error occurs because special diagnostics must be invoked to determine if, in fact, the error is a transient error rather than a solid failure. Many times, a second processor is required to ensure non-stop operation on both transient errors and solid failures. With two processors in the system, only 50% of the potential computational power of each processor is utilized, because both processors must be executing identical tasks in parallel to provide continued operation in the event of a failure. When a detected failure is corrected in the faulty unit of such a system, the two processors typically must be resynchronized to continue parallel operations.
Such systems generally require significant overhead on transient errors (which statistically occur from 10-100 times more frequently than hard errors) and have a period of vulnerability on the order of one million machine cycles (the time required to bring the first processor back on-line). A transient error occurring in the second processor during this period of vulnerability will bring the system down.
U.S. Pat. No. 4,453,215 to Reid issued June 5, 1984 for "Central Processing Apparatus for Fault-Tolerant Computing" discloses a fault tolerant computer system in which the information-handling parts of the system have a duplicate partner. Error detectors check the operation of the system to provide information transfers only on fault-free bus conductors and between fault-free units.
Other patents which show the state of the art include U.S. Pat. No. 4,165,533 to Jonsson issued Aug. 21, 1979 for "Identification of a Faulty Address Decoder in a Function Unit of a Computer Having a Plurality of Function Units With Redundant Address Decoders"; U.S. Pat. No. 4,453,210 to Suzuki et al. issued June 5, 1984 for "Multiprocessor Information Processing System Having Fault Detection Function Based on Periodic Supervision Of Updated Fault Supervising Codes"; U.S. Pat. No. 4,453,213 to Romagosa issued June 5, 1984 for "Error Reporting Scheme"; and U.S. Pat. No. 4,456,993 to Taniguchi et al. issued June 26, 1984 for "Data Processing System With Error Processing Apparatus and Error Processing Method."
The present invention is particularly useful with a fault tolerant computer system of the type described in allowed co-pending patent application U.S. Ser. No. 748,361 filed June 24, 1985 by Corcoran et al. for "Virtual Command Rollback in a Fault Tolerant Data Processing System," which application is assigned to the assignee of the present invention. Such a system includes a processor which contains copies of the most used index registers such that, during execution of a command, the processor may use the copies of the index registers rather than going to the main memory to fetch them. However, during the execution of the command, the index registers may be changed. If the fault tolerant system rolls back to re-execute the command, the index registers must be restored before the command may be re-executed. The present invention maintains copies of the original index registers and copies of the modified index registers, such that in the event of a rollback, the original copies of the index registers may be restored to the processor.
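By way of illustration only, the restore-on-rollback behavior described above may be modeled in software as follows. This is a minimal sketch and not the circuit of the invention: the class name IndexRegisterLog, its methods write, commit and rollback, and the register names X1 and X2 are all hypothetical, and the sketch assumes that the original value of a register is logged on the first write to that register within a command.

```python
class IndexRegisterLog:
    """Hypothetical software model of a register log (not the claimed circuit)."""

    def __init__(self, registers):
        # Working copies of the index registers held by the processor.
        self.registers = dict(registers)
        # Original values, keyed by register name, logged during the current command.
        self._originals = {}

    def write(self, name, value):
        # Assumption: only the first write in a command logs the original value,
        # so later writes cannot overwrite the value needed for rollback.
        if name not in self._originals:
            self._originals[name] = self.registers[name]
        self.registers[name] = value

    def commit(self):
        # Command completed without error: the modified copies become current.
        self._originals.clear()

    def rollback(self):
        # Command must be re-executed: restore every logged original value.
        self.registers.update(self._originals)
        self._originals.clear()


log = IndexRegisterLog({"X1": 100, "X2": 200})
log.write("X1", 101)   # original value 100 is logged
log.write("X1", 102)   # already logged, so 100 is kept as the original
log.rollback()         # X1 is restored to 100; X2 was never changed
```

In this model a register that is modified several times within one command is still restored to the value it held when the command began, which is the property required before the command can be re-executed.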
Patents which show the state of the art in maintaining copies of data in separate registers include U.S. Pat. No. 4,008,460 to Bryant et al. issued Feb. 15, 1977 for "Circuit For Implementing A Modified LRU Replacement Algorithm For A Cache"; U.S. Pat. No. 4,024,508 to Bachman et al. issued May 17, 1977 for "Database Instruction Find Serial"; U.S. Pat. No. 4,157,586 to Gannon et al. issued June 5, 1979 for "Technique For Performing Partial Stores In Store-Thru Memory Configuration"; U.S. Pat. No. 4,168,541 to DeKarske issued Sept. 18, 1979 for "Paired Least Recently Used Block Replacement System"; U.S. Pat. No. 4,250,546 to Boney et al. issued Feb. 10, 1981 for "Fast Interrupt Method"; U.S. Pat. No. 4,336,588 to Vernon et al. issued June 22, 1982 for "Communication Line Status Scan Technique For A Communications Processing System"; U.S. Pat. No. 4,425,618 to Bishop et al. issued Jan. 10, 1984 for "Method And Apparatus For Introducing Program Changes In Program-Controlled Systems"; U.S. Pat. No. 4,439,829 to Tsiang issued Mar. 27, 1984 for "Data Processing Machine With Improved Cache Memory Management"; and U.S. Pat. No. 4,464,712 to Fletcher issued Aug. 7, 1984 for "Second Level Cache Replacement Method and Apparatus."