The present invention relates in general to an electronic control system for controlling the function of a processing system. In particular, the invention relates to a method for managing system fault situations of an electronic control system. Still more specifically, the invention deals with such a control system that can be used in an automotive vehicle.
In recent years the complexity of electronic control systems used in consumer products, and specifically in automobile electronics, has increased dramatically. Although manufacturers of electronic subassemblies try to ensure that their products are reliable, it is almost impossible to guarantee that no fault exists anywhere in a system at any given time within the product's lifecycle. As a result, the reliability and fault-tolerant behavior of complex systems have become a topic of major concern to designers, manufacturers and users.
There are two fundamentally different approaches that are presently used to increase the reliability of computing systems.
The first approach is called fault prevention, also known as fault intolerance. The second approach is represented by real fault tolerance.
In the traditional fault prevention approach the objective is to increase the reliability of each part used within the overall system. Since it is almost impossible to achieve an absolutely reliable system in practice, the goal of fault prevention is to reduce the probability of system failure to an acceptably low value. The reliability of a system can be increased by employing the method of worst case design and by using high-quality components. Since system interconnection devices represent a very common crystallization point for various failures, refining the interconnections and imposing strict quality control procedures during the assembly phase are further important reliability-improving measures.
However, solutions and measures of this type will most likely increase the cost of a system significantly.
As to the fault tolerance approach, two major techniques are typically used:
(a) Incorporate redundancy (i.e. usage of additional, multiple identical resources) into a system with the aim of masking the effects of faults, and
(b) Use error correction (most commonly implemented in bus systems and storage devices).
In systems of this type, faults are expected to occur during computation. When a failure is detected and identified, the system will
(i) be reconfigured by enabling the respective redundant elements, and/or
(ii) automatically correct the differing data by means of the error correction circuitry that generates, controls and monitors the error correction codes.
Realizing a fault-tolerant system of this type requires providing and managing multiple instances of the redundant (identical) hardware elements and/or error correction circuits. As a drawback, this type of system implementation multiplies the cost, along with the physical size and power consumption.
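Reconfiguration step (i) can be illustrated with a minimal sketch. The function and element names below (`read_with_failover`, `faulty_memory`, `redundant_memory`) are hypothetical and are not taken from any described system; the sketch only assumes that a failed element signals its fault in a detectable way, modeled here as an `IOError`.

```python
from typing import Callable, Sequence, Tuple


class AllReplicasFailed(Exception):
    """Raised when neither the primary nor any redundant element responds."""


def read_with_failover(replicas: Sequence[Callable[[], int]]) -> Tuple[int, int]:
    """Try each element in order; return (value, index of the element used)."""
    for index, read in enumerate(replicas):
        try:
            return read(), index
        except IOError:
            # Detected failure: reconfigure by moving on to the next redundant element.
            continue
    raise AllReplicasFailed("no working element left")


def faulty_memory() -> int:
    # Hypothetical primary element that has failed.
    raise IOError("primary memory bank failed")


def redundant_memory() -> int:
    # Hypothetical identical standby element.
    return 42


value, used = read_with_failover([faulty_memory, redundant_memory])
print(value, used)
```

The sketch also makes the cost argument visible: every protected element has to exist at least twice, whether or not a fault ever occurs.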
FIG. 1 illustrates a typical system using state of the art techniques. The exemplary system uses a redundant instantiation of the NVRAM/VRAM (Non-Volatile and Volatile Random Access Memory) for the storage sub-system. The I/O devices are laid out redundantly, both for the I/O device controller and for the adjacent physical I/O device. A multiplexer element switches to the redundant data path if a failure occurs in this system area. A ‘system test and fault recovery controller unit’ is implemented to monitor the system functionality and to manage and control the fault recovery steps to be performed. Additionally, a typical system supervising feature is provided by the Parity Checker. In this example, this feature additionally provides Error Correction covering data integrity failures detected on the system bus.
Most commonly the system CPU performs failure detection and diagnostic routines as well. The application code for the additional diagnostic software routines is typically stored in the basic storage sub-system and, of course, is also contained redundantly in the redundant storage devices.
The system CPU, which as explained executes failure detection routines and thereby supports and assists the ‘system test and fault recovery controller’, can in addition be used to test and verify the integrity of all implemented failure detection and fault management devices and sub-systems.
This type of fault-tolerant system implementation typically restores the original system functionality for all occurring ‘recoverable’ fault situations. Other failures, whether or not they are detected by the fault-management system, will lead to a (potentially hidden) system malfunction, or to a general system abort.
Typically this type of fault-tolerant system realization is used in expensive and safety-relevant commercial systems, where the extensive implementation cost is justified. For this reason, cost-sensitive embedded systems use only partial and drastically reduced implementations, with the drawback of providing only limited fault recovery capability and emergency running attributes.
Nevertheless, the effectiveness of fault tolerance for enhancing the reliability of processing systems is much more pronounced in a system composed of basically reliable components than in a system of unreliable components. In other words, while fault tolerance can be used to increase the reliability of an already reliable system significantly, it is of little use, and can even have a detrimental effect, if the original system is unreliable in the first place.
Co-pending European Patent Application 99 101 817.7, assigned to the same assignee as the present application, discloses an electronic control system for controlling the function of a processing system, especially for use in an automotive vehicle, wherein said control system comprises a plurality of logical control elements, each of which is especially adapted to perform special tasks, whereby each of said control elements is able to communicate with every other control element.
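This dependence on component reliability can be made concrete with a small calculation. As an illustrative assumption that is not part of the described system, take triple modular redundancy: three identical, independently failing units with a perfect majority voter. The arrangement works if at least two units work, which gives the standard expression 3r^2 - 2r^3 for a component reliability r. The function name `tmr_reliability` is hypothetical.

```python
def tmr_reliability(r: float) -> float:
    """Probability that at least two of three independent units work.

    Assumes identical units with reliability r and a perfect voter:
    3 * r^2 * (1 - r) + r^3 = 3 * r^2 - 2 * r^3.
    """
    return 3 * r**2 - 2 * r**3


for r in (0.99, 0.9, 0.5, 0.3):
    print(f"component {r:.2f} -> triplicated {tmr_reliability(r):.4f}")
```

Under these assumptions a component reliability of 0.9 rises to 0.972, whereas a component reliability of 0.3 falls to 0.216. The break-even point is r = 0.5, below which the redundant arrangement is less reliable than a single unit.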
It is therefore an object of the present invention to provide a method for managing system fault situations with high system reliability and availability for Electronic Control Systems (ECUs) while maintaining low system cost.
It is a further object to keep the hardware and software overhead at a minimum, thus limiting any negative influence on power dissipation as well as on the physical measures of size and weight.
It is still a further object to provide a system that is able to overcome the above mentioned shortcomings of the prior art.
The present invention describes a principle (hereinafter called the “Intelligent Fault Management” (IFM) principle) for managing system malfunctions and restoring the system vitality of complex electronic control systems featuring multiple cooperating processing elements, to an achievable extent and for a justifiable effort.
As mentioned above, IFM stands for a principle for handling system failure situations while maintaining minimum fault recovery time and providing high system availability. This principle provides unique solutions for fault analysis, fault recovery definition and system re-vitalization. A method applying graceful degradation of system functionality is proposed in order to allow the implementation of cost-effective systems.
In contrast to typical, i.e., state of the art, fault management systems, the proposed idea provides calculated, deterministic fall-back strategies that allow the fault/vital system behavior to be managed and controlled. The method used by the IFM principle supports prioritized, staggered fall-back solutions that degrade the system functionality in pre-assigned levels.
The application of the IFM principle focuses on the requirements most commonly encountered in embedded commercial systems and in demanding advanced consumer electronics.
In particular, the IFM principle is advantageous for use in electronic control systems applied in highly cost-sensitive fields that require fault-tolerant behavior, for example devices used in modern automotive vehicles, pervasive computing devices, and consumer electronics.
Rather than providing extensive redundant hardware, the principle utilizes the existing sub-systems or components of the electronic control system (as, e.g., shown in FIG. 2) in multiple ‘reuse’ instances. In other words, existing sub-systems will be reused to perform ‘alien’ functionality, i.e., functionality completely different from their original definition. This is an important key to achieving the objectives of the IFM principle. It has to be mentioned that each sub-system can, in turn, consist of several further sub-systems.
In a top-down approach, i.e., level by level through the pre-assigned levels (cf. FIG. 3), the system will ‘give up’ less important applications while trying to provide the most critical and basic system-relevant functionality.
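The level-by-level fall-back can be sketched as follows. This is only an illustration under stated assumptions, not the method of FIG. 3: the application names, the numeric `level` and `load` fields, and the single `capacity` figure standing for the processing resources that remain after a fault are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Application:
    name: str
    level: int  # hypothetical pre-assigned level; 0 is the most critical
    load: int   # hypothetical processing cost in arbitrary units


def degrade(apps: List[Application], capacity: int) -> List[Application]:
    """Give up whole levels, least important first, until the rest fits."""
    active = sorted(apps, key=lambda a: a.level)
    while active and sum(a.load for a in active) > capacity:
        least_important = active[-1].level
        active = [a for a in active if a.level != least_important]
    return active


apps = [
    Application("brake_control", level=0, load=4),
    Application("engine_control", level=0, load=4),
    Application("climate_control", level=1, load=3),
    Application("infotainment", level=2, load=5),
]

# Capacity shrinks as sub-systems fail; each step sheds one more level.
for capacity in (16, 12, 8):
    print(capacity, [a.name for a in degrade(apps, capacity)])
```

Because the levels are assigned in advance, the outcome for any given remaining capacity is fixed ahead of time, which corresponds to the deterministic fall-back behavior described above.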
The IFM principle combines a balance of hardware and software elements that makes it possible to develop and build highly reliable embedded processing systems.
The IFM method allows the usage of redundant elements and fault-preventive elements to be kept to a minimum. The measures used in combination by IFM lead to a significantly reduced overhead in electronic components. The IFM support elements and mechanisms can to a wide extent be implemented by algorithms realized in software. The increase in system storage size and in the volume of hardware components is kept at a justifiable level, thus leading to a significant cost advantage.
A precondition for the profitable applicability of the IFM principle is a system architecture using loosely coupled sub-systems and processors.
If full system functionality has to be ensured at any time and for any type of occurring fault, the advantageous applicability of the proposed IFM method is reduced to ‘sub-sets’ of the IFM processes. The implementation of this type of fault behavior has to be verified and judged very thoroughly in accordance with the fault behavior specifications of the overall system in focus.