In the present context, non-destructive testing of safety-relevant registers means testing electronic storage registers in such a way that their operability is not impaired by the test and their data content again corresponds to the original value after the test. Safety-relevant registers, also referred to below as safety registers, are storage registers in electronic storage media that hold data relevant to the safety and the correct functioning of a system.
In order to comply with automotive safety requirements according to the automotive safety integrity levels (ASIL), which are defined, for instance, in the functional safety standard ISO 26262 for road vehicles, various application-independent safety measures have to be taken in the hardware implementation of safety-relevant parts of vehicle microcontrollers. The functional safety standard defines specific maximum time intervals for fault tolerances in safety parts of the microcontroller and in the built-in safety mechanisms.
External events, for instance mechanical effects or alpha radiation, can invert states in storage registers, so that safety-relevant submodules may be controlled incorrectly and serious malfunctions of the complete vehicle controller may result. This can affect configuration registers as well as general-purpose registers of relevant system components, or particularly critical internal registers such as those of finite state machines and counters. Consequently, such safety registers have to be strongly protected: faults must at least be indicated, if not corrected, and an alarm or an indication must be issued that triggers or activates a corresponding action or safety function, for instance a system reset. Such safety mechanisms, safety functions or safety measures can be configured in a so-called safety management unit.
In order to meet the functional safety standard, the non-availability of a safety mechanism has to be detected within a maximum time interval, which usually corresponds to an average driving cycle of a vehicle, for instance several hours.
The safety measure can be tested by simulating the possible external influence, for example radiation with alpha particles. This can be done by inverting safety register bits at least once during a driving cycle and then checking whether the corresponding alarms or safety mechanisms are triggered as expected. One difficulty with this test requirement is choosing the point in time at which such a safety test can be carried out in the system, as the test method can adversely affect the normal function of the tested system.
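By way of illustration only, the bit inversion test described above can be sketched as follows. The register model with an inverted redundant copy, as well as the names SafetyRegister and inject_and_restore, are assumptions made for this sketch; they do not describe any particular hardware.

```python
class SafetyRegister:
    """Hypothetical model: a register guarded by an inverted redundant copy."""

    def __init__(self, value, width=8):
        self.mask = (1 << width) - 1
        self.value = value & self.mask
        self.shadow = ~value & self.mask  # assumed protection: inverted copy
        self.alarm = False

    def check(self):
        # Alarm is raised when value and shadow are no longer complementary.
        self.alarm = (self.value ^ self.shadow) != self.mask
        return self.alarm


def inject_and_restore(reg, bit):
    """Invert one bit, record whether the alarm fires, then restore the content."""
    original = reg.value
    reg.value ^= 1 << bit      # simulated external influence
    detected = reg.check()     # the safety mechanism should react here
    reg.value = original       # non-destructive: original content is restored
    reg.check()                # alarm is cleared again once the content matches
    return detected
```

In this sketch the test is non-destructive in the sense defined above, since the register content after the call equals the content before it. The sketch does not address the question of when such a test may run without disturbing normal operation.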
The following alternatives for the point in time at which the safety measure is tested each lead to different problems:
    A. When the test is performed after termination of the current driving cycle, it will not interfere with the normal operation of the tested system. It cannot be guaranteed, however, that the safety function of the tested system will still be faultless when the next driving cycle is started. Thus, this alternative has a decisive drawback.
    B. When the test is performed at a point in time within the driving cycle, it must not affect the normal function of the tested system. If normal operation is to be maintained, the question arises at what time important register bits, which are constantly required during the driving cycle, can be modified for test purposes, as it cannot generally be assumed that there are idle cycles during which the tested register bits are not needed. Furthermore, it has to be guaranteed that the values or data stored in the tested register bits are restored after the test.
    C. A test during the start of the system to be tested is only acceptable when basic configuration settings, for instance for a clock control, which are required for correctly performing the tests themselves, are not impaired during the testing. It is therefore not sufficient to modify the safety registers at random for the testing and to restore the reset values after the tests. Furthermore, the testing of the safety measures for the storage registers must not require too many clock cycles, as the acceptable time interval for starting the system to be tested is limited. Taking such dependencies into account can make the software for starting the system considerably more complex and carries a high risk of error.
The costs caused by the storage space required for the safety registers and by the supply power also pose a problem. Another challenge is to be able to scale the safety functions of the system up or down flexibly with minimum overhead, in order to avoid both excessive and insufficient protection by safety functions.
A further requirement is methodological: the protection of the registers must not detrimentally affect the production time and the delivery time required for the product to reach the customer. Consequently, the development or configuration of the hardware and the verification plans must not be considerably burdened by additional overhead for the implementation and verification of the safety logic.
The following basic approaches to register protection are known from the prior art:
For CPU cores of, for instance, the AURIX product family, redundancy is provided at the system level in that complete CPUs are duplicated and operate simultaneously. A special lockstep control logic continuously compares the outputs of the master CPU with those of a redundant checker CPU, with a configurable delay of checker inputs and master outputs; the compare logic can be tested via separate test inputs.
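The principle of a delayed lockstep comparison can be illustrated by the following simplified sketch. It models the CPUs as stateless functions and is not a description of the AURIX implementation; the function name lockstep_mismatches and all parameters are assumptions chosen for illustration.

```python
from collections import deque


def lockstep_mismatches(inputs, master, checker, delay=2):
    """Return the cycles in which the checker output differs from the
    correspondingly delayed master output (simplified, stateless model)."""
    empty = object()  # marks delay slots that hold no data yet
    delayed_inputs = deque([empty] * delay)       # delay line for checker inputs
    delayed_master_out = deque([empty] * delay)   # delay line for master outputs
    mismatches = []
    for cycle, value in enumerate(inputs):
        delayed_inputs.append(value)
        delayed_master_out.append(master(value))
        checker_in = delayed_inputs.popleft()
        master_out = delayed_master_out.popleft()
        if checker_in is empty:
            continue  # delay lines are still filling up
        if checker(checker_in) != master_out:
            mismatches.append(cycle)
    return mismatches
```

With two identical functions the returned list is empty; a deviation in either function is reported in the cycle in which the delayed values reach the comparator.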
Applied to register protection, such an approach would correspond to duplicating the entire sequential and combinational logic that calculates the states of the protected registers. This would result in higher overhead and would not provide any flexibility for selecting individual register bit fields for protection without modifying the architecture. Furthermore, only the compare logic would then be tested, but not the redundant registers.
Another known method is based on special redundancy codes (error correction codes, ECC) and is used to protect RAM blocks by detecting and correcting errors. ECC encoders add check bits to the data words during writing, and ECC decoders check them during reading; depending on the ECC configuration, the decoders can not only detect a certain number of bit errors but also correct them to a certain extent. Such an approach could also be used for register protection. As ECC protection, however, requires regular memory structures with predefined word lengths, the design structure would have to be modified substantially in order, on the one hand, to insert the ECC logic itself and, on the other hand, to arrange the register logic such that the required regularity is achieved. Such a restructuring would increase the design effort and would lack the desired flexibility.
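As one concrete example of such a redundancy code, a Hamming(7,4) code adds three check bits to a four-bit data word and allows a single inverted bit to be located and corrected. The following sketch is given for illustration only; the choice of this particular code and the function names are assumptions and are not taken from any specific ECC configuration.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    code = [0] * 8  # index 0 is unused so that indices match bit positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # check bit for positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # check bit for positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # check bit for positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (data, syndrome); a non-zero syndrome is the corrected position."""
    code = [0] + list(bits)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1       # correct the single inverted bit
    data_bits = [code[3], code[5], code[6], code[7]]
    data = sum(bit << i for i, bit in enumerate(data_bits))
    return data, syndrome
```

The fixed word length of seven bits in this sketch reflects the regularity requirement mentioned above: every protected word must fit the predefined code format.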
So-called library cells with dual or triple modular redundancy are another known alternative. They would, however, influence the design all the way through synthesis, since in the case of duplication the register cells would have to be equipped with additional inputs and outputs for the testing and for the triggering of a safety function or of an alarm.
With triple modular redundancy, the ports of the original register cells could be retained if no alarm were necessary, as the register outputs could be generated by majority vote. The triple redundancy would, however, increase the area cost, and the exchange of library cells would have an enormous influence on the design flow and would require either manual intervention or complex scripts.
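The majority vote over three register copies can be sketched as follows. The function names are chosen for illustration only; the optional disagreement signal corresponds to the case in which an alarm is desired and therefore an additional output is needed.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three register copies."""
    return (a & b) | (a & c) | (b & c)


def tmr_disagreement(a, b, c):
    """True if the three copies are not all identical (possible alarm source)."""
    return a != b or a != c
```

If one copy is corrupted, for example tmr_vote(0b1010, 0b1010, 0b0010), the vote still yields the value held by the two intact copies. If two copies carry the same inverted bit, the vote follows the corrupted value.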
Adding check bits to registers is another possible approach. The resulting long chains of XOR gates would, however, increase the power consumption and the combinational propagation delays in the design. Moreover, an even number of simultaneous errors could cancel each other out, so that no safety function or alarm would be triggered. Finally, individual solutions for the implementation of hardware safety measures in each module would lead to extremely heterogeneous solutions which would be difficult to verify, to test and to certify.
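The cancellation of an even number of errors can be illustrated with a single parity bit, which is assumed here as the simplest form of check bit. The function names and the example value are chosen for illustration only.

```python
def parity(word):
    """XOR of all bits of the word (models the XOR chain)."""
    result = 0
    while word:
        result ^= word & 1
        word >>= 1
    return result


def error_detected(word, check_bit):
    """True if the stored check bit no longer matches the word."""
    return parity(word) != check_bit


stored = 0b10110010
check = parity(stored)                 # check bit computed at write time

one_flip = stored ^ 0b00000001         # one inverted bit
two_flips = stored ^ 0b00000011        # two inverted bits

single_error_found = error_detected(one_flip, check)
double_error_found = error_detected(two_flips, check)
```

In this sketch single_error_found is True, whereas double_error_found is False: the two inversions cancel in the XOR chain and the corrupted word passes the check.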
Alternatively, the registers could be protected by safety software that additionally stores all values written to safety registers in another memory, reads them back from there and compares them from time to time. This approach would be acceptable if the repetition cycles demanded by the safety requirements are not too short and if the software does not increase the CPU load too much. Not all registers, however, can be protected by the safety software: internal registers are out of reach of the software, and some externally visible registers are additionally updated by hardware, so that the safety software could not decide whether a change was caused by a regular hardware update or by an isolated fault.
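The software-based comparison can be sketched as follows, with the register file modelled as a plain dictionary. The class ShadowChecker and its methods are hypothetical names used for illustration only.

```python
class ShadowChecker:
    """Hypothetical safety software keeping a second copy of written values."""

    def __init__(self):
        self.shadow = {}  # the additional memory holding the write values

    def write(self, registers, address, value):
        registers[address] = value
        self.shadow[address] = value

    def scan(self, registers):
        """Return the addresses whose content differs from the stored copy."""
        return [address for address, value in self.shadow.items()
                if registers[address] != value]
```

A call to scan reports every deviation in the same way. A register that the hardware has legitimately updated since the last write is therefore reported exactly like a register with an inverted bit, which is the ambiguity described above.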