Computer systems often operate with a certain amount of fault tolerance. Faults may occur for a variety of reasons, such as software bugs, hardware bugs, memory bit flips caused by single event upsets (“SEUs”), or the like. Many applications can tolerate such faults. However, many other applications require a certain level of safety to be built into the computer system in order to prevent processing faults from occurring.
Many techniques exist for adding fault tolerance to a computer system to prevent processing faults from occurring or to cause the computer system to take corrective measures when a fault occurs. For example, in one technique, fault tolerance is provided by adding redundant microprocessors that execute identical instructions. If the redundant processors produce inconsistent results, then the computer system detects a fault and may enter a fail-safe mode in which such a fault is corrected and/or the computer system is shut down.
However, adding fault tolerance to a computer system generally involves adding hardware, which increases the cost of the computer system. For example, redundant processors incur the additional die area, and therefore the additional cost, of the extra processors.
As the foregoing illustrates, what is needed in the art are more effective techniques for improving safety in a computer system.