For safety-relevant or security-relevant applications, fault tolerance is important. In part, fault tolerance may be obtained by detecting faults in hardware and/or software. When a fault has been detected, an appropriate response can be taken.
It is known in the art to use two almost identical central processing units (CPUs), one of which operates as the master CPU and the other as the “checker” CPU. Both central processing units execute basically the same program code and receive the same input data. The outputs of the two central processing units are compared to each other in order to identify errors of the master CPU during operation. Performing this reciprocal checking in software is quite complex. Also, monitoring hard real-time constraints is difficult because software runs in virtual time. In virtual time, events are partially ordered, but their exact timing is often unknown.
For example, two software tasks may each implement a comparator to compare the results of both tasks. If one task or core fails, the other has to detect the failure. Each task is subject to the real-time constraint that it has to provide the other task with the value for comparison within the required time window. These real-time constraints also need to be monitored.
The tasks may need to synchronize with each other or even wait for each other. The synchronization mechanisms must be designed with a timeout when waiting for a task. The monitoring tasks need to be observed for starvation. For example, a comparison mechanism may be implemented in each thread, inter-core communication may be needed, and time supervision may be required in case, say, the first thread never sends data to the second thread, or vice versa. The waiting may use semaphores, which are another potential source of deadlock. All this adds complexity and consumes resources. Furthermore, because all the comparison and monitoring is part of the software, analysis is needed to assure that the monitoring constraints are met.
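The software-only scheme described above can be illustrated by a minimal sketch. It is not taken from any known system; the names `checked_task`, `run_pair` and `stalled_compute`, the use of queues for the inter-task communication, and the timeout values are assumptions chosen for illustration only. Each task sends its result to its peer and waits, with a timeout, for the peer's result before comparing.

```python
import queue
import threading
import time


def checked_task(compute, outbox, inbox, timeout_s, results, name):
    """Compute a value, send it to the peer, then compare with the peer's value."""
    value = compute()
    outbox.put(value)
    try:
        # Time supervision: do not wait forever for the peer.
        peer_value = inbox.get(timeout=timeout_s)
    except queue.Empty:
        results[name] = "timeout"
        return
    results[name] = "match" if peer_value == value else "mismatch"


def run_pair(compute_a, compute_b, timeout_s=0.5):
    """Run two mutually checking tasks and return a snapshot of their verdicts."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=checked_task, daemon=True,
                         args=(compute_a, a_to_b, b_to_a, timeout_s, results, "A")),
        threading.Thread(target=checked_task, daemon=True,
                         args=(compute_b, b_to_a, a_to_b, timeout_s, results, "B")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        # The supervisor itself must also bound its waiting time.
        t.join(timeout=2 * timeout_s)
    return dict(results)


def stalled_compute():
    """Hypothetical stand-in for a task that misses its time window."""
    time.sleep(1.0)
```

Calling `run_pair(lambda: 42, lambda: 42)` yields a match verdict for both tasks, whereas `run_pair(lambda: 42, stalled_compute, timeout_s=0.1)` yields a timeout verdict for task A and no verdict for task B. Even in this small sketch, the comparison, the communication and the time supervision all live in the software being checked, which is the source of the complexity noted above.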
In the art, hardware has been proposed to assist with the verification. For example, United States Patent Application 20080244305 A1, “Delayed Lock-Step CPU Compare”, discloses a known CPU compare unit.
In the known system, an electronic device is provided which comprises a first CPU, a second CPU, a first delay stage, a second delay stage and a CPU compare unit. The first delay stage is coupled to an output of the first CPU and a first input of the CPU compare unit. The second delay stage is coupled to an input of the second CPU. An output of the second CPU is coupled to the CPU compare unit. The first CPU and the second CPU execute the same program code, and the CPU compare unit is adapted to compare an output signal of the first delay stage with an output signal of the second CPU. By delaying both the input data of the second CPU and the output data of the first CPU, the time shift due to each of the two delays is compensated at the CPU compare unit. The CPU compare unit therefore always compares data belonging to the same operation step of the program code being executed in the two CPUs. The execution of the program in the first and the second CPU is in a delayed lock-step. Yet, the output signals of the CPUs arrive at the CPU compare unit in lock-step. The CPU compare unit may be adapted to report a match or mismatch of the compared output signals to the system. The system may then react appropriately to the reported error.
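The delayed lock-step arrangement can be modelled, purely for illustration, by the following cycle-level sketch. It is not the implementation of the cited application; the function name `delayed_lockstep`, the modelling of each CPU as a function of its current input, and the modelling of each delay stage as a fixed-length queue are assumptions made here.

```python
from collections import deque

_EMPTY = object()  # marks a delay-stage slot that holds no valid data yet


def delayed_lockstep(inputs, cpu1, cpu2, delay):
    """Return the operation steps at which the compare unit sees a mismatch.

    cpu1 and cpu2 are modelled as functions mapping one input to one output.
    Both delay stages are modelled as queues holding `delay` cycles of data.
    """
    out_delay = deque([_EMPTY] * delay)  # first delay stage: output of CPU 1
    in_delay = deque([_EMPTY] * delay)   # second delay stage: input of CPU 2
    mismatches = []
    # Extra idle cycles at the end drain the two delay stages.
    for cycle, x in enumerate(list(inputs) + [_EMPTY] * delay):
        out1 = cpu1(x) if x is not _EMPTY else _EMPTY
        out_delay.append(out1)
        delayed_out1 = out_delay.popleft()
        in_delay.append(x)
        delayed_x = in_delay.popleft()
        if delayed_x is _EMPTY:
            continue  # pipeline still filling: nothing to compare this cycle
        out2 = cpu2(delayed_x)
        # Both values now belong to the same operation step (cycle - delay).
        if delayed_out1 != out2:
            mismatches.append(cycle - delay)
    return mismatches
```

For example, with `cpu1 = lambda x: x * 2`, a second CPU that wrongly returns 7 for the input 3, the inputs `[1, 2, 3, 4]` and a delay of 2 cycles, the sketch reports a mismatch at operation step 2 only. The sketch also shows the dependence on lock-step: the comparison is meaningful only because both CPUs advance by exactly one operation step per cycle.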
In the known system, if one or the other of the two tasks does not complete, the system is stalled without detection. The system requires lock-step operation to ensure that synchronization is maintained. The known system also requires that the two programs are identical.