1. Field of the Invention
The present invention relates to techniques for achieving dependability in microcontroller units (microcontrollers) and was developed with specific attention to possible use in automotive systems.
Examples of the use of such microcontrollers are anti blocking systems (ABS), torque control systems (TCS) and automatic stability control (TSC). However, this is not to be construed as limiting the scope of the invention, which is in fact applicable to a wide variety of applications, including x-by-wire, infotainment, biomedical applications, communications, and so on.
The term ‘microcontroller’ in the following description should be understood in a broad sense, i.e., a System On a Chip (SOC) comprising analog to digital and digital to analog converters, CPU and CPU peripherals, memories, system buses, internal or external sensors and/or transducers, controls, instrumentation, and output devices.
The present invention also relates to techniques and corresponding computer program products for the design of dependable microcontrollers.
2. Description of the Related Art
In recent times, the use of microelectronic systems in vehicles to increase efficiency, safety, performance and comfort, as well as to provide information and entertainment, has grown considerably.
Such microelectronic systems are based on central processing units and interconnected using robust networks, provided with means for the detection of faults. Such robust networks remove the need for thousands of costly and unreliable wires and connectors, used to make up a wiring loom.
Of course such systems must be highly reliable, as a failure may cause fatal accidents. However, the car is a hostile environment, having a wide range of temperatures and being subject to dirt, salt spray, dust, corrosive fluids, heavy vibration and sudden shock, electrical noise and electromagnetic interference. Furthermore, such microelectronic systems must show near-zero power consumption when the car is parked, in order not to consume battery charge.
These requirements must combine with requirements for near perfect system modeling and verification such that system behavior is predictable and fail-safe under the most extreme conditions.
As a consequence, the use of complex Systems-on-Chip in the automotive field has been limited, as compared to consumer and communications applications.
The importance is thus apparent of having a design platform capable of helping the system designer to design more complex ECUs (Embedded Computational Units) in less time and with the highest reliability. In general, the most important requirements for a design platform are the abilities to:
  allow hardware and software standardization;
  be adaptable to the customer's needs;
  generate clean code and scripts in a well-defined flow;
  be easily linkable with the operating system;
  allow easy verification of sub-blocks and custom blocks;
  be upgradeable.
In particular, for automotive applications, it is also important for the platform to:
  be scalable in a fast and easy way;
  give verification detail and allow customization of the verification;
  be fault-robust and allow fault injection at an early stage (both for design and verification).
In order to allow a better understanding of the description that follows, some definitions pertaining to fault-robustness are supplied here.
In general, a ‘failure’ is the event that occurs when the delivered service of a system deviates from the specified service. An ‘error’ is the part of the system state that is liable to lead to failure. The phenomenological cause of the error is called the ‘fault’. ‘System dependability’ is defined as the quality of the delivered service such that reliance can justifiably be placed on this service.
Dependability is evaluated on the basis of the measure of the three following quantities:
  Reliability: the probability, quantified in terms of MTTF (Mean Time To Failure), that a piece of equipment or component will perform its intended function satisfactorily for a prescribed time and under specified environmental conditions;
  Availability: the probability, quantified in terms of MTTR (Mean Time To Repair) and MTTF, that the system will be functioning correctly at any given time;
  Safety: the freedom from undesired and unplanned events that result (at least) in a specific level of loss (i.e., accidents).
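By way of illustration only, the reliability and availability measures defined above can be computed as in the following minimal sketch. It assumes a constant failure rate (exponential model) for reliability and the commonly used steady-state expression MTTF/(MTTF+MTTR) for availability. Neither assumption, nor the function names `reliability` and `availability`, is taken from the present description.

```python
import math


def reliability(t_hours: float, mttf_hours: float) -> float:
    """Probability of operating without failure for t_hours.

    Assumption: constant failure rate equal to 1/MTTF (exponential model).
    """
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    if t_hours < 0:
        raise ValueError("time must be non-negative")
    return math.exp(-t_hours / mttf_hours)


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: long-run fraction of time the system is up.

    Assumption: alternating failure/repair cycles with the given mean times.
    """
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    if mttr_hours < 0:
        raise ValueError("MTTR must be non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)
```

Under these assumptions, a shorter repair time raises availability without changing reliability, which is why the two quantities are evaluated separately.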
To achieve a dependable system, the following methods can be used, separately or together:
  fault-avoidance, that provides for avoiding faults by design;
  fault-removal, that provides for reducing the presence of faults, by verification. In this method fault-injection techniques are fundamental;
  fault-tolerance, a technique that provides correct operation despite presence of faults;
  fault-forecasting or fault-evasion, that provides for estimating, by evaluation, the presence, the creation and the consequences of faults and taking pre-emptive steps to stop the fault from occurring.
From the previous definitions, it is worth noting that fault-tolerance by itself does not solve the dependability needs of a complex system. For instance, a fault-tolerant system might not be fail-safe. Thus, in the following, reference will be made to the concept of robustness, defined as the ability of a system to continue to function despite the existence of faults in its component subsystems or parts, even if system performance may be diminished or otherwise altered (but always in a safe way) until the faults are corrected. A fault-robust system will keep its function even with changes in internal structure or external environment.
The solutions known from the prior art provide for hardening the fabrication technology, so that it is less prone to radiation-induced soft errors, or for introducing redundancy: at gate level, for example by implementing triple flip-flop structures with majority voting (N-modular redundancy); or at CPU level, for example by introducing some coding in the logic units of the processor; or at microcontroller level, for example by using multiple processors running in step with watchdogs; or at software level, for example by using multiple software threads; or by a mixture of all of the above techniques.
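The majority voting used in triple flip-flop structures can be illustrated by the following software sketch. It is a model of the voting logic only, not a hardware description. The function names `majority_vote` and `vote_with_flag` are hypothetical and do not come from any cited document.

```python
from typing import Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a word.

    Each output bit takes the value held by at least two of the three inputs,
    so a fault confined to a single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


def vote_with_flag(a: int, b: int, c: int) -> Tuple[int, bool]:
    """Return the voted word and a flag telling whether the copies disagreed."""
    disagreement = not (a == b == c)
    return majority_vote(a, b, c), disagreement


# Example: the third copy has one flipped bit; the voter masks it and reports it.
voted, disagreed = vote_with_flag(0b1010, 0b1010, 0b0010)
```

The sketch also shows the cost noted in this description: every stored value is held three times, plus the voter logic.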
Technology hardening and redundancy at gate level are very expensive in terms of area and performance overhead, CPU redesign time and so on.
Redundancy at CPU level is less expensive in terms of area overhead, but it requires redesigning the CPU.
Redundancy at microcontroller level is the most used technique, either using N-modular redundancy (N=2 is the most used, i.e., dual redundancy) or with dynamic redundancy. From U.S. patent application No. 2002/00777882 a dual-redundant microcontroller unit is known, comprising a primary processing unit and a secondary processing unit coupled to the primary processing unit. A functional comparison module is coupled to the primary processing unit and to the secondary processing unit for comparing a primary output of the primary processing unit and a secondary output of the secondary processing unit, to detect a fault if the primary output does not match the secondary output. Functional comparison is performed by analyzing signatures of the outputs. Signatures are computed in idle cycles, or are obtained as results of test sequences, or are determined on the basis of the external address and data buses of the CPU itself, so that only a limited visibility of the content of the CPU is available. This very often results in signatures that are inefficient from the point of view of fault coverage and memory occupation, and in a greater latency of error detection.
Dual redundant techniques, as depicted also in German patent application DE 19933086A1, of course imply a great increase in circuit size (at least double) and an even greater increase in production costs. Moreover, performance is affected because of the need for slower clocks, the increased gate count and the inclusion of redundant software.
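The principle of comparing signatures of two redundant outputs can be sketched in software as follows. This is only an illustration of the general idea: the use of CRC-32 as the signature, the 32-bit output word width, and the names `signature` and `run_dual` are assumptions made for the example and are not taken from the cited application.

```python
import zlib
from typing import Callable, List, Sequence, Tuple


def signature(outputs: Sequence[int]) -> int:
    """Compress a sequence of 32-bit output words into one CRC-32 value.

    Assumption: CRC-32 stands in for whatever signature a real design uses.
    """
    sig = 0
    for word in outputs:
        sig = zlib.crc32((word & 0xFFFFFFFF).to_bytes(4, "little"), sig)
    return sig


def run_dual(
    primary: Callable[[int], int],
    secondary: Callable[[int], int],
    inputs: Sequence[int],
) -> Tuple[List[int], bool]:
    """Run both units on the same inputs and compare output signatures.

    Returns the primary outputs and a fault flag that is True on a mismatch.
    """
    primary_out = [primary(x) for x in inputs]
    secondary_out = [secondary(x) for x in inputs]
    fault = signature(primary_out) != signature(secondary_out)
    return primary_out, fault


def healthy(x: int) -> int:
    return 2 * x


def faulty(x: int) -> int:
    # Hypothetical fault: a wrong result for one particular input.
    return 7 if x == 3 else 2 * x


_, fault_when_matching = run_dual(healthy, healthy, [1, 2, 3])
_, fault_when_diverging = run_dual(healthy, faulty, [1, 2, 3])
```

Because the signature is built only from the observed outputs, a fault that has not yet propagated to an output is not seen, which is the limited visibility and detection latency mentioned above.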
Dynamic redundancy techniques are also known that require only the processor plus a fault detector that can identify faulty behavior of the processor. Such techniques allow for a higher utilization of computing resources, but they may incur a greater latency, because of the greater number of computations required to achieve good fault coverage. The most common solutions in this case consist of watchdogs or very simple fault detectors that monitor only the data and address buses to compute simple consistency checks. In other known solutions, the CPU itself is charged with handling part of the dependability issues, interacting with a simple external controller: in U.S. Pat. No. 5,436,837, a microcomputer and a monitoring module are disclosed. The monitoring module is preferably configured as a gate-array which executes a sequence control of the microcomputer. The monitoring module determines the correct or defective operation of the microcomputer from a comparison of the results of this processing.
Solutions are also known, for example from U.S. Pat. No. 5,880,568, that introduce redundancy at software level only. Such solutions, however, strongly affect the microprocessor performance because of the fault-detection tasks.
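The watchdogs mentioned above as the most common dynamic redundancy solution can be modeled as a countdown timer that the processor must restart periodically. The following sketch is a simplified software model under that assumption. The class name `Watchdog` and its methods `kick` and `tick` are hypothetical and do not describe the monitoring module of the cited patent.

```python
class Watchdog:
    """Countdown watchdog: expires if not kicked within timeout_ticks ticks."""

    def __init__(self, timeout_ticks: int) -> None:
        if timeout_ticks <= 0:
            raise ValueError("timeout_ticks must be positive")
        self.timeout_ticks = timeout_ticks
        self.counter = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Called by the monitored processor to signal that it is alive."""
        if not self.expired:
            self.counter = self.timeout_ticks

    def tick(self) -> None:
        """Called once per clock tick by the independent time base."""
        if self.expired:
            return
        self.counter -= 1
        if self.counter == 0:
            # Expiry is latched: a later kick cannot hide the detected fault.
            self.expired = True


wd = Watchdog(timeout_ticks=3)
wd.tick()
wd.tick()
wd.kick()   # processor responds in time, countdown restarts
wd.tick()
wd.tick()
alive_before_timeout = not wd.expired
wd.tick()   # third tick without a kick: the watchdog expires
```

Such a detector only reveals that the processor stopped responding within the timeout. It gives no information about incorrect results produced on time, which illustrates the limited fault coverage of very simple detectors.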