The invention relates to a method for detecting a failure in an error correcting unit, wherein the error correcting unit receives output data from a data source and determines whether the received data are incorrect, and wherein, if the received data are incorrect, the error correcting unit is expected to correct at least one error within the received data, output the corrected data and set an error vector, wherein the error vector indicates at least whether an error has been corrected or whether an error has been detected but could not be corrected.
The invention also relates to a system for detecting a failure in an error correcting unit, wherein the error correcting unit provides means for receiving output data from a data source, means for determining whether the received data are incorrect, means for correcting at least one error within the received data if the received data are incorrect, means for outputting the corrected data and means for manipulating an error vector, wherein the error vector is adapted to indicate at least whether an error has been corrected or whether an error has been detected but could not be corrected.
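The error vector described above only has to distinguish at least "error corrected" from "error detected but not corrected". As a minimal sketch, assuming a bit-flag encoding that the source does not prescribe, the output of an error correcting unit could be modeled in Python as follows; the names `ErrorVector`, `EcuOutput` and `describe` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from enum import IntFlag


class ErrorVector(IntFlag):
    """Hypothetical bit layout of the error vector (an assumption, not from the source)."""
    NONE = 0
    CORRECTED = 1 << 0       # at least one error was corrected
    UNCORRECTABLE = 1 << 1   # an error was detected but could not be corrected


@dataclass(frozen=True)
class EcuOutput:
    data: int                  # data word output by the error correcting unit
    error_vector: ErrorVector  # status set by the error correcting unit


def describe(out: EcuOutput) -> str:
    """Translate the error vector into a human-readable status."""
    if out.error_vector & ErrorVector.UNCORRECTABLE:
        return "error detected, not corrected"
    if out.error_vector & ErrorVector.CORRECTED:
        return "error corrected"
    return "no error"


print(describe(EcuOutput(data=0x5, error_vector=ErrorVector.CORRECTED)))
```

Encoding the status as flags rather than as a single enumerated value leaves room for additional status bits without changing the interface between the error correcting unit and its consumers.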
Furthermore, the invention relates to a computer program which can be run on a computer, in particular on an embedded system.
Any computer-based system is typically expected to operate correctly and, in particular, not to produce errors or incorrect results originating from, e.g., a faulty hardware element, a faulty software element or a faulty data transmission, regardless of whether the failure is permanent or temporary (e.g. caused by an externally induced signal perturbation, power-supply disturbances or radiation from cosmic rays). In order to reach a certain level of reliability of a system, such failures must at least be detected. Building on error detection, it is known to also correct some of the detected errors in order to achieve a certain level of fault tolerance.
To detect and/or correct errors, computer systems of any kind, for example embedded systems, often provide error correcting units, which increase the reliability of the whole system. Such error correcting units are often based on so-called error correcting codes (ECC), which make it possible to detect up to a predefined number of errors that may occur in a data word and to correct up to a predefined number of the detected errors. To achieve this, several so-called check bits are first generated for each data word that is known to be error-free; these check bits are attached to the data and are later used by the ECC to detect one or more errors and to correct at least one of the detected errors. The more errors are to be detected and/or corrected, the more check bits have to be provided. In order to keep the number of check bits at an acceptable level, embedded systems often implement so-called SECDED (single error correction, double error detection) methods, which can correct a single error and detect, but not correct, a double error.
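To make the check-bit principle concrete, the following Python sketch implements an extended Hamming (8,4) code, one common SECDED construction; the source does not state which code is used, so this choice is an assumption. Four data bits receive three Hamming check bits plus one overall parity bit. The function names `encode` and `decode` and the status strings are hypothetical.

```python
def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (index 0 = overall parity)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    # Data bits occupy the non-power-of-two Hamming positions 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = d[0], d[1], d[2], d[3]
    # Check bit at position p covers every position whose index has bit p set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity over positions 1..7 enables double-error detection.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list[int]) -> tuple[int, str]:
    """Return (data, status) where status is no_error, corrected or uncorrectable."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(c) % 2     # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "no_error"
    elif overall == 1:
        # Assume a single error; syndrome 0 means the overall parity bit flipped.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with intact overall parity: double error, not correctable.
        status = "uncorrectable"
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return data, status


word = encode(0b1011)
word[6] ^= 1                 # inject a single-bit error
print(decode(word))
```

With a single flipped bit the call returns the original data together with the status `corrected`; with two flipped bits it reports `uncorrectable`. Three or more flipped bits may be silently miscorrected, which is the inherent limit of any SECDED scheme.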
Generally, any hardware or software component within a computer system or an embedded system may be faulty. Since an error correcting unit is itself implemented in software or in hardware, even the error correcting unit might be faulty. A faulty error correcting unit might, for example, “correct” non-existent errors and thereby introduce errors itself. Furthermore, a faulty error correcting unit might indicate that it has corrected an error without actually having done so.
It is therefore an object of the present invention to increase the reliability of a computer system, in particular of an embedded system, comprising an error correcting unit. Furthermore, it is an object of the present invention to provide a method and a system for detecting whether an error correcting unit is faulty.