This section is intended to introduce the reader to various aspects of the art that may be related to various aspects of the present invention. The following discussion is intended to provide information to facilitate a better understanding of the present invention. Accordingly, it should be understood that statements in the following discussion are to be read in this light, and not as admissions of prior art.
The present invention addresses the problem of providing a redundant controller that remains reliable even when subjected to an anomaly such as a single event upset. In addition to redundancy, the design provides multiple compute engines (in the present invention, the RPUs) and a voting mechanism to detect failures. When a failure is detected, the system attempts to repair the condition and return to a fully functional state.
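For illustration only, the voting and repair behavior described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the present invention: the names Replica, vote, and step are hypothetical, the use of three replicas is an assumption, and repairing a faulty replica by copying the state of an agreeing replica is one possible repair strategy that the text does not specify.

```python
from collections import Counter
from copy import deepcopy


class Replica:
    """Hypothetical stand-in for one compute engine (an RPU in the text)."""

    def __init__(self):
        self.state = {"count": 0}

    def handle(self, stimulus):
        # Toy computation: accumulate the stimulus and report the total.
        self.state["count"] += stimulus
        return self.state["count"]


def vote(outputs):
    """Return (majority value, indices of dissenters).

    The value is None when no strict majority exists.
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    return value, [i for i, out in enumerate(outputs) if out != value]


def step(replicas, stimulus):
    """Apply one stimulus to every replica, vote, and repair dissenters."""
    outputs = [r.handle(stimulus) for r in replicas]
    value, dissenters = vote(outputs)
    if value is None:
        raise RuntimeError("no majority; the fault cannot be masked")
    # Assumed repair strategy: overwrite each dissenter's state with a
    # copy of the state held by a replica that agreed with the majority.
    good = next(r for i, r in enumerate(replicas) if i not in dissenters)
    for i in dissenters:
        replicas[i].state = deepcopy(good.state)
    return value, dissenters
```

In this sketch, calling step on a list of three Replica objects returns the agreed output together with the indices of any replicas that disagreed. If the state of one replica is corrupted between stimuli (for example, by flipping a bit in its count to simulate a single event upset), the next call reports that replica as a dissenter and restores its state from an agreeing replica, so that all three replicas are again consistent.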
The present invention is distinctive in that it requires no auxiliary checkpoint or auxiliary replication scheme of the kind used when a fully redundant set of hardware handles the redundancy. Costly periodic checkpointing is eliminated because the RPUs of the present invention are checkpointed on every stimulus. Checkpointing that would otherwise require costly hardware is thereby reduced to a much simpler and less costly software solution.
An example of existing hardware that is no longer needed is the fault-tolerant equipment formerly manufactured by Digital Equipment Corporation. One of its fault-tolerant hardware solutions was the FT-410 model, also known as the VAXft 410, in which the processors of two physically separate but co-located computers ran in precise instruction lock-step with one another. Each computer had its own hard drive, CPU, memory, etc. Additionally, a scheme like the FT-410 requires additional cabling and dedicated hardware to monitor and enforce the lock-step of the processors. The present invention requires no such hardware devices.