Multiple redundant processor systems are implemented as fault-tolerant systems to prevent downtime and system outages and to avoid data corruption. A multiple redundant processor system provides continuous application availability and maintains data integrity for applications such as stock exchange systems, credit and debit card systems, electronic funds transfer systems, travel reservation systems, and the like. In these systems, data processing computations are performed on multiple, independent processing elements, and the results are compared to each other to detect processing errors.
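By way of illustration only, the comparison of results from independent processing elements may be sketched as a majority vote. The following Python sketch is not taken from any particular system; the function name vote and the use of a strict-majority rule are assumptions made for the example.

```python
from collections import Counter


def vote(results):
    """Compare redundant results and return (majority_value, dissenting_indices).

    `results` holds one value per processing element (hypothetical layout).
    Raises ValueError when no value is held by a strict majority.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among redundant results")
    # Elements whose result differs from the majority are flagged as suspect.
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters


# Three processing elements; the third produced a differing result.
agreed, suspect = vote([42, 42, 41])
```

In this sketch, a disagreement identifies the suspect processing element only when the remaining elements agree with each other; with three elements, two matching results are required.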
A redundant processor system can generate processing errors that are written to memory and stored as erroneous data that remains undetected until an input/output operation is initiated, such as a write to disk or to a communications line. Erroneous data may also be detected if the independent processing elements test and branch on the erroneous data and then perform some other comparative operation. Alternatively, the erroneous data may never be detected if the data is not requested or if the memory location that stores the erroneous data is overwritten.
However, these undetected, or latent, errors can be unknowingly perpetuated in a redundant processor system. For example, in a triplex redundant processor system, a first processor may write a data error to a first memory location in a memory region that corresponds to the first processor. Additionally, a second processor may write a data error to a second memory location in a memory region that corresponds to the second processor. If the erroneous data is not requested via an input/output operation, or otherwise compared, the two processing errors are stored undetected as erroneous data in the different redundant memory locations.
If the third processor of the system fails and is subsequently removed and replaced, the replacement third processor and a corresponding memory region are reintegrated into the redundant processor system. During reintegration, the memory region corresponding to the first or second processor is copied into the replaced memory region corresponding to the third processor. If the first memory region is copied into the replaced memory region, for example, then the first processor data error written to the first memory location is also copied into the replaced memory region.
When an input/output operation is initiated at the first memory location of the redundant memory regions, the data at the first memory location in each memory region is compared to determine which data is correct. Because the erroneous data in the first memory region matches the copied erroneous data in the replaced memory region, a voting operation (e.g., a two-out-of-three majority) determines the erroneous data to be the correct data, while the actual correct data in the second memory region is determined to be the erroneous data.
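The scenario described above may be reproduced with a small simulation. The following Python sketch is illustrative only; the two-location memory regions, the placeholder values "ok", "bad-A", and "bad-B", and the helper name majority are assumptions introduced for the example.

```python
from collections import Counter


def majority(values):
    """Return the value held by a strict majority, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else None


# Three redundant memory regions, each with two memory locations (hypothetical).
regions = [["ok", "ok"] for _ in range(3)]
regions[0][0] = "bad-A"  # latent error: first processor, first memory location
regions[1][1] = "bad-B"  # latent error: second processor, second memory location

# Before the third processor fails, voting at each location still yields "ok".
before = [majority([r[loc] for r in regions]) for loc in range(2)]

# The third processor is replaced and its memory region is reintegrated by
# copying the first memory region, latent error included.
regions[2] = list(regions[0])

# After reintegration, the vote at the first location selects the erroneous data.
after = [majority([r[loc] for r in regions]) for loc in range(2)]
```

In this sketch, before holds the correct value for both locations, whereas after holds the erroneous value "bad-A" for the first location, because two of the three regions now contain the same copied error.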