In the field of safety applications, for example in automotive electronics, it is known to satisfy reliability and/or functional safety requirements by using a redundant hardware architecture comprising, for example, two or more processing modules that perform substantially the same operations synchronously. This is often referred to as operating in ‘lock-step’. The outputs of the processing modules are continuously monitored and compared with one another to detect mismatches, and thus to detect faults within the modules. Such lock-step architectures are effective at detecting faults that have an impact on the output. Typically, upon detection of a mismatch, further execution of the affected application is inhibited and the affected systems are placed into a ‘safe’ condition. For example, a safe condition may comprise a system state in which the system is unable to trigger potentially dangerous operations. The safe state may be enforced by a system component that lies outside the fault propagation domain of the detected fault (e.g. an external window watchdog). For systems that are not able to remain in a safe state, a typical approach is to reset and reboot them in order to re-synchronize the processing modules. This process can take from several hundred milliseconds up to several seconds, during which the system is unavailable, potentially creating a temporarily dangerous situation.
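The compare-and-inhibit behaviour described above can be illustrated with a minimal software simulation. This is a sketch under stated assumptions, not a hardware implementation: the two processing modules are modelled as Python functions called with the same input at each step (real lock-step cores are clocked together and compared by a hardware comparator), and the names `run_lock_step` and `LockStepResult` are hypothetical. On the first mismatch the simulation stops releasing outputs and reports a safe state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class LockStepResult:
    outputs: List[int]            # outputs released before any mismatch
    mismatch_step: Optional[int]  # index of the first mismatching step, if any
    safe_state: bool              # True once execution has been inhibited


def run_lock_step(module_a: Callable[[int], int],
                  module_b: Callable[[int], int],
                  inputs: Sequence[int]) -> LockStepResult:
    """Hypothetical simulation: run two redundant modules on identical
    inputs and compare their outputs at every step."""
    outputs: List[int] = []
    for step, x in enumerate(inputs):
        out_a = module_a(x)
        out_b = module_b(x)
        if out_a != out_b:
            # Mismatch: a fault affected the output. Inhibit further execution
            # and report the safe state. The comparator only sees "outputs
            # differ"; it cannot tell which module is faulty or which class
            # of fault occurred.
            return LockStepResult(outputs, step, safe_state=True)
        outputs.append(out_a)  # outputs agree, so the result is released
    return LockStepResult(outputs, None, safe_state=False)


# Usage: a hypothetical fault flips the lowest output bit of the second
# module when the input is 3.
healthy = lambda x: 2 * x
faulty = lambda x: (2 * x) ^ (1 if x == 3 else 0)
result = run_lock_step(healthy, faulty, [1, 2, 3, 4])
# result.outputs == [2, 4], result.mismatch_step == 2, result.safe_state is True
```

Note that the result carries only a single mismatch indication, which is why, in such a scheme, every detected fault leads to the same response.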
A major problem with such lock-step architectures is a ‘lack of availability’ of applications, because their execution is inhibited whenever a mismatch indicating a fault is detected. Faults occurring during the execution of an application may be divided into classes, for example:
(i) permanent faults;
(ii) intermittent faults; and
(iii) transient faults.
Permanent faults may be defined as faults that, once present, persist and are thus relatively constant (permanent) in nature. Permanent faults are typically caused by a physical defect of the hardware. Intermittent faults may be defined as faults that occur periodically or, more commonly, at irregular intervals. An intermittent fault is typically the result of several contributing factors occurring simultaneously. As a result, such faults can be difficult to detect, since all contributing factors must be present in order to recreate the fault. Transient faults may be defined as temporary faults that occur during operation and disappear when the system is powered off or reset. Transient faults are typically caused by changes in data values (e.g. bit flips) without any physical defect of the hardware, and may occur as a result of environmental conditions. Transient faults occur much more frequently than permanent faults: in lock-step architectures, they may typically be expected to occur approximately one hundred times more frequently.
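The three fault classes can be contrasted with a small illustrative model of when each class corrupts an output. This is a sketch under assumptions rather than a definition from any standard: the enum `FaultClass`, the function `fault_active` and the contributing factors listed in `INTERMITTENT_FACTORS` are hypothetical, and a transient fault is modelled as corrupted state that persists until the next reset or power-off.

```python
from enum import Enum
from typing import FrozenSet


class FaultClass(Enum):
    PERMANENT = "permanent"
    INTERMITTENT = "intermittent"
    TRANSIENT = "transient"


# Hypothetical contributing factors that must coincide for the
# intermittent fault to manifest.
INTERMITTENT_FACTORS: FrozenSet[str] = frozenset({"high_temperature", "supply_dip"})


def fault_active(kind: FaultClass, step: int, onset_step: int,
                 reset_since_onset: bool,
                 conditions: FrozenSet[str] = frozenset()) -> bool:
    """Return True if a fault of class `kind`, first appearing at
    `onset_step`, corrupts the output at `step` (illustrative model)."""
    if step < onset_step:
        return False  # the fault has not appeared yet
    if kind is FaultClass.PERMANENT:
        return True  # physical defect: persists across resets and power cycles
    if kind is FaultClass.INTERMITTENT:
        # Recurs only when all contributing factors are present at the same time.
        return INTERMITTENT_FACTORS <= conditions
    # TRANSIENT: corrupted data without a physical defect; a reset clears it.
    return not reset_since_onset
```

For example, evaluating each class at a step after its onset, once before and once after a reset, shows that only the permanent fault survives the reset unconditionally; the intermittent fault recurs only when both hypothetical factors are present again, and the transient fault does not recur at all.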
Although known lock-step architectures and techniques provide good detection capabilities for faults that have an impact on the output, they are not able to distinguish between the different classes of faults. In particular, they are unable to distinguish between permanent and transient faults. This is a severe limitation of lock-step architectures, as all faults are treated in the same way. Thus, for any fault detected, further execution of the affected application will typically be inhibited until a complete system reset or power-down has been performed, irrespective of the class of fault detected.