Fault tolerance, or graceful degradation, is a property that enables a computer-based system to continue operating properly when some part of the system fails. A failure detection mechanism is generally required before complex CPUs can be used in safety-critical systems in fields such as automotive, aerospace, industrial, and medical applications. For simple CPUs, failure detection has traditionally been done either by online software-based testing or by fully duplicating the CPU and comparing all outputs, an arrangement also known as “lockstep” CPUs. In a lockstep arrangement, the second CPU is effectively a real-time hardware checker. A watchdog timer may be used in conjunction with software-based testing. When the watchdog timer is not reset by a software operation within a defined amount of time, an interrupt or reset operation is invoked to determine why the software did not respond correctly.
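The two detection mechanisms described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of any particular hardware: the names `lockstep_run`, `LockstepMismatch`, and `Watchdog`, the tick-based notion of time, and the `on_expire` callback are all hypothetical and chosen for illustration. Real lockstep comparison and watchdog timing are done in hardware, cycle by cycle.

```python
from typing import Callable, List, Sequence


class LockstepMismatch(Exception):
    """Raised when the primary and checker computations disagree."""


def lockstep_run(
    primary: Callable[[int], int],
    checker: Callable[[int], int],
    inputs: Sequence[int],
) -> List[int]:
    """Feed the same inputs to two redundant units and compare every output.

    Hypothetical model: each callable stands in for one CPU, and one
    loop iteration stands in for one comparison step.
    """
    outputs = []
    for step, value in enumerate(inputs):
        a = primary(value)
        b = checker(value)
        if a != b:
            # A divergence means at least one unit has failed.
            raise LockstepMismatch(
                f"step {step}: primary={a!r}, checker={b!r}"
            )
        outputs.append(a)
    return outputs


class Watchdog:
    """Countdown timer that fires a callback unless software resets it in time."""

    def __init__(self, timeout_ticks: int, on_expire: Callable[[], None]) -> None:
        self.timeout_ticks = timeout_ticks
        self.on_expire = on_expire  # stands in for the interrupt or reset
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Called by healthy software to reset the countdown."""
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        """Advance time by one tick; fire the callback once on expiry."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            self.on_expire()
```

In this model, a caller would run application steps through `lockstep_run` and call `kick()` from its main loop while `tick()` is driven by a timer. If the loop stalls and stops calling `kick()`, `on_expire` runs once, standing in for the interrupt or reset operation.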
As the need for safety-critical systems has expanded into embedded applications in the automotive, aerospace, industrial, and medical fields, fault-tolerant concepts are now employed within microcontroller units (MCUs) and/or microprocessor units (MPUs) that may be part of a system on a chip (SoC). These embedded systems may include one or more central processing units (CPUs) that may execute application software for controlling the operation of, for example, an automobile, airplane, process control system, or medical device.