Functional-safety standards such as IEC 61508 and ISO 26262 require that electronic systems used in life-critical applications be tested for their capacity to detect or tolerate hardware faults or software errors.
With the increasing use of electronic systems, the above critical applications are becoming increasingly complex, where “complexity” here means the complexity of the software functions that are carried out. The complexity may, for example, be measured via the so-called ‘cyclomatic complexity’ developed by Thomas J. McCabe in 1976.
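By way of illustration only, McCabe's metric for a control-flow graph is commonly computed as M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. The following minimal sketch assumes a graph given as a list of edges; the function name `cyclomatic_complexity` and the example graph are hypothetical and are not taken from any of the cited standards.

```python
def cyclomatic_complexity(edges, num_components=1):
    """McCabe's M = E - N + 2P for a control-flow graph given as (src, dst) edges."""
    # Collect every node that appears as a source or destination of an edge.
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


# Hypothetical control-flow graph of a routine with one if/else and one loop.
cfg = [
    ("entry", "cond"),
    ("cond", "then"),
    ("cond", "else"),
    ("then", "loop"),
    ("else", "loop"),
    ("loop", "loop_body"),
    ("loop_body", "loop"),
    ("loop", "exit"),
]

# 8 edges, 7 nodes, 1 component: M = 8 - 7 + 2 = 3
print(cyclomatic_complexity(cfg))
```

The result equals the number of decision points (here the branch and the loop test) plus one, which is why the metric grows with the number of interleaved software functions.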
An example of the above high-complexity critical applications is the autonomous-drive vehicle, i.e., a motor vehicle that is able to move with limited or no human intervention. It is evident, on the one hand, that a fault in such a vehicle may have very serious consequences for passengers or pedestrians and, on the other hand, that this application is very complex because it is made up of a number of software functions closely interfaced with one another (detection of lane, detection of distance from the surrounding vehicles, reading of traffic signals, etc.).
As has been said, functional-safety standards concern both hardware faults and software errors. The solutions discussed herein refer specifically to hardware faults.
One of the methods referred to by the aforementioned standards for testing the capacity of the system to detect or tolerate hardware faults is the so-called “fault injection”, i.e., a form of test that envisages intentional injection of faults in order to verify that these faults either do not have any effect on the application or are detected by appropriate control systems.
Hardware-fault injection techniques are known in the prior art but present two main drawbacks.
When hardware-fault injections are carried out at the level of testing of a physical electronic system in the laboratory, these techniques are able to measure the effect of macroscopic hardware faults such as, for example, lack of power supply or failure of a connection at the printed-circuit level, but are altogether unable to measure the effect of microscopic faults such as, for example, the effect of a fault on a transistor of one of the integrated circuits forming part of the electronic system, because at a laboratory level it is not possible to inject such faults into the integrated circuit other than by extremely costly operations (such as, for example, irradiation of the component with alpha particles).
When hardware-fault injections are carried out at a level of hardware simulation of the electronic system, as indicated, for example, in the U.S. Pat. No. 7,472,051 B2 filed in the name of the present applicant, these techniques are able to measure the local effect of a hardware fault, i.e., for example how an integrated circuit reacts at its pins in the case of a microscopic fault such as failure of a transistor inside the integrated circuit. However, these techniques become ineffective when it is desired to measure the final effect, i.e., the effect that the microscopic fault has on the final application. For example, in an autonomous-drive vehicle, the final effect of interest caused by a microscopic fault may be activation of the braking system when it is not necessary, caused by an erroneous detection of an obstacle. Measuring the final effect starting from injection of a microscopic fault, for example over the entire duration of the simulation time, is not feasible, since the distance in time (understood as the chain of events between cause and effect) between the initial cause (i.e., the microscopic physical fault) and the final macroscopic effect (i.e., erroneous detection of an obstacle) is too long.
In this regard, FIG. 1 is a schematic representation of failure modes, which are identified in the framework of procedures for measuring the effect of microscopic hardware faults in high-complexity applications implemented in a hardware electronic system, i.e., checking of the functional safety of electronic systems for critical applications. These failure modes correspond to incorrect behaviour of the electronic device or system as compared to its behaviour according to specifications. These failure modes regarding the device or system, referred to as the final effects, or final failure modes FFM, are represented, for example in the automotive field, by events such as “activation of the braking system when not necessary”. As may be seen from FIG. 1, if the electronic system is broken down into its subsystems and components, these final effects FFM may be traced back (via simple logic operations represented by logic gates PL, for example, AND or OR gates), through levels of other resulting failure modes, SFM, to other “root” failure modes, RFM, i.e., failure modes linked more directly to the components of the system. For example, in the case of an integrated circuit, a root failure mode RFM1 may be “erroneous calculation by the processor”, which, in logic OR with another root failure mode RFM2 “erroneous address by the processor during memory access”, may constitute a higher-level failure mode (“resulting” failure mode SFM) such as, for example, “erroneous operation of the processor”. By combining one or more levels of resulting failure modes, the final failure mode FFM is finally obtained, which, as has been said, is hence a logic combination of root failure modes RFM. In this sense, it is understood that the chain of events between microscopic cause and final effect may be very long, i.e., the number of levels that link, for example, the root failure mode RFM1 to the final effect FFM may be very large.
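The logic combination of failure modes described above can be sketched, purely as an illustration, as a small tree of AND/OR gates evaluated over the set of root failure modes that are currently active. In the sketch below, only the OR of RFM1 and RFM2 comes from the text; the nested-tuple representation, the function name `evaluate`, the additional root failure mode `RFM3`, and the AND gate producing the final failure mode are hypothetical assumptions.

```python
def evaluate(node, active_roots):
    """Return True if the failure mode described by `node` is active.

    A node is either the name of a root failure mode (str) or a tuple
    (gate, child, child, ...) with gate equal to "OR" or "AND".
    """
    if isinstance(node, str):
        return node in active_roots
    gate, *children = node
    results = [evaluate(child, active_roots) for child in children]
    if gate == "OR":
        return any(results)
    if gate == "AND":
        return all(results)
    raise ValueError(f"unknown gate: {gate}")


# Resulting failure mode "erroneous operation of the processor" = RFM1 OR RFM2.
sfm_processor = ("OR", "RFM1", "RFM2")

# Hypothetical final failure mode: processor failure AND a further root
# failure mode RFM3 (an assumption made only for this example).
ffm = ("AND", sfm_processor, "RFM3")

print(evaluate(sfm_processor, {"RFM2"}))   # SFM active through RFM2 alone
print(evaluate(ffm, {"RFM1"}))             # FFM not reached without RFM3
print(evaluate(ffm, {"RFM1", "RFM3"}))     # FFM reached
```

Each additional level of resulting failure modes adds one level of nesting to the tree, which mirrors the long chain between root cause and final effect discussed in the text.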
In systems for critical applications, as required by the international standards IEC 61508 and ISO 26262, safety mechanisms are implemented that are able to prevent or detect the aforesaid failure modes. For example, in the case of integrated circuits, one such safety mechanism is software executed periodically during normal operation of the device for testing all the instructions of a processor.
The international standards IEC 61508 and ISO 26262 require said safety mechanisms to achieve certain values of “diagnostic coverage”, i.e., a certain ratio between the number of dangerous faults detected by the diagnostic mechanism and the total number of dangerous faults (i.e., faults that are such as to perturb the critical mission of the system).
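As a minimal numerical sketch, and assuming the usual reading of diagnostic coverage as the fraction of dangerous faults that the safety mechanism detects, the ratio can be computed as shown below. The function name `diagnostic_coverage` and the fault counts are hypothetical and are not taken from the standards.

```python
def diagnostic_coverage(dangerous_detected, dangerous_undetected):
    """Fraction of dangerous faults that the safety mechanism detects."""
    total_dangerous = dangerous_detected + dangerous_undetected
    if total_dangerous == 0:
        # Without any dangerous fault the ratio is undefined.
        raise ValueError("no dangerous faults: coverage is undefined")
    return dangerous_detected / total_dangerous


# Hypothetical campaign: 90 dangerous faults detected, 10 undetected.
print(f"{diagnostic_coverage(90, 10):.0%}")
```

Safe faults do not appear in the ratio at all, which is why the classification of each injected fault as safe or dangerous has to precede the coverage computation.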
In this case, the purpose of the functional-safety test is to verify that, given the set of possible microscopic hardware faults, the number of dangerous faults and the number of important faults detected by the diagnostic mechanism are those assumed at the design stage of the electronic system.
FIG. 2 is a schematic illustration of elements and steps for measuring the effect of microscopic hardware faults in high-complexity applications implemented in a hardware electronic system undergoing testing, for example an integrated circuit, designated by the reference number 11. FIG. 2 represents a microscopic fault MG injected at a location LC in the integrated circuit 11 under test. Denoted by x, y, . . . , z is a plurality of designated outputs of the integrated circuit 11 under test. A function f(x, y, . . . , z) of these designated outputs x, y, . . . , z represents a point of observation O of the fault MG. A second function G(x, y, . . . , z) of the designated outputs x, y, . . . , z represents a diagnostic point D. Of course, these quantities and the related operations and functions are defined for a step of simulation of the integrated circuit 11 under test, for example via a computer 600 that can load in memory and execute a computer program product 610, in particular stored on a computer medium, or a non-volatile computer-readable medium, which comprises one or more of the operations of the method described herein. In general, these operations could also be carried out on the physical circuit, albeit with the difficulties discussed previously.
The above measurement of the effects of hardware faults may be incorporated in a step of simulation of the electronic system or circuit 11 with fault injection, which, typically, comprises:
injecting microscopic faults MG during simulation (for example faults of the stuck-at type) of the circuit 11 in given locations LC thereof;
verifying whether a certain function f(x, y, . . . , z) of the designated outputs x, y, . . . , z, i.e., the observation point O, is perturbed by the fault, that is, whether the observation point O has a measured value different from an expected value; in this case, the fault MG is defined as a potentially dangerous fault; otherwise, it is a safe fault, i.e., it does not cause a failure of the critical mission; the designated outputs x, y, . . . , z are the outputs of the integrated circuit whose combination can determine the root failure mode RFM; for example, the root failure mode “erroneous calculation by the processor” has as designated outputs the outputs of the processor;
in the case of a (potentially) dangerous fault, verifying whether the second function G(x, y, . . . , z) of the outputs corresponding to the safety mechanism (“diagnostic point”, D) is activated (i.e., for example, the value of G(x, y, . . . , z) differs from a precalculated value) by the dangerous fault, i.e., whether, when the observation point O assumes a measured value different from the expected value, the diagnostic point D is activated within a certain time interval identified by the safety specifications of the electronic device 11. If the diagnostic point D is activated, the dangerous fault is understood as having been detected. For instance, in the case where the safety mechanism is software executed periodically, the diagnostic point D is represented by the register of the processor or by the memory location in which the software stores its measured value to be compared with a pre-calculated value of the test (i.e., the expected value without faults).
Usually, fault injection occurs while a workload generator requests the electronic system to execute commands, for example those of a functional-safety application. Monitoring modules are provided for tracing execution of the commands and gathering the data, for example from the aforesaid observation points, together with analysis modules, which carry out, for example, evaluation of the dangerous faults, for instance according to an FMEA (failure mode and effects analysis). All these modules may be software modules run on a computer, which more generally executes the simulation procedure. Similar procedures are described, for example, in the U.S. Pat. No. 7,937,679 filed in the name of the present applicant.
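The three steps listed above (injection of stuck-at faults, check of the observation point O, check of the diagnostic point D) can be sketched on a toy example. The sketch below is an illustration under stated assumptions and is not the procedure of the cited patents: the circuit (two redundant channels fed by a shared input buffer), the net names `n0` to `n4`, the choice of f as the main output and of G as a comparator alarm, and the collapsing of the detection time interval to the same stimulus are all hypothetical.

```python
from itertools import product

# Hypothetical nets of the toy circuit, each a fault location LC.
NETS = ["n0", "n1", "n2", "n3", "n4"]


def simulate(a, b, c, fault=None):
    """Evaluate the toy circuit; `fault` is (net, stuck_value) or None."""
    def drive(net, value):
        # A stuck-at fault forces the net to a fixed value.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    n0 = drive("n0", a)           # input buffer shared by both channels
    n1 = drive("n1", n0 & b)      # main channel
    x = drive("n2", n1 ^ c)       # designated output x (net n2)
    n3 = drive("n3", n0 & b)      # redundant channel
    y = drive("n4", n3 ^ c)       # designated output y (net n4)
    return x, y


def f(x, y):
    """Observation point O: the functional output (assumption)."""
    return x


def G(x, y):
    """Diagnostic point D: comparator alarm between channels (assumption)."""
    return x ^ y


def classify(fault, workload):
    """Classify one fault as safe, dangerous detected or dangerous undetected."""
    perturbed = False
    for stimulus in workload:
        golden = simulate(*stimulus)
        faulty = simulate(*stimulus, fault=fault)
        if f(*faulty) != f(*golden):
            perturbed = True
            # Simplification: D must be activated on the same stimulus.
            if G(*faulty) == G(*golden):
                return "dangerous undetected"
    return "dangerous detected" if perturbed else "safe"


workload = list(product((0, 1), repeat=3))             # all input patterns
faults = [(net, value) for net in NETS for value in (0, 1)]
report = {fault: classify(fault, workload) for fault in faults}
for fault, verdict in report.items():
    print(fault, verdict)
```

In this toy setting, faults on the redundant channel never perturb O and are classified as safe, faults on the main channel are caught by the comparator, and faults on the shared buffer perturb both channels identically and therefore escape the diagnostic point. A real campaign would additionally account for the detection time interval set by the safety specifications.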