1. Field of the Invention
The present invention relates to a method for executing, on computing hardware, e.g., on a microprocessor, a computer program that includes multiple program objects, in which method errors are detected while the computer program is running on the computing hardware. The present invention also relates to an operating system capable of running on computing hardware, e.g., on a microprocessor. The present invention further relates to computing hardware for running a computer program including multiple program objects, which computing hardware has an error detection mechanism for detecting an error while the computer program is running on the computing hardware.
2. Description of Related Art
So-called transient errors may occur in running a computer program on computing hardware. Since the structures on semiconductor modules (so-called chips) are becoming progressively smaller, while the clock rates of the signals are becoming progressively higher and the signal voltages progressively lower, there is an increased incidence of transient errors. In contrast with permanent errors, transient errors occur only temporarily and usually disappear spontaneously after a period of time. In transient errors, only individual bits are faulty and there is no permanent damage to the computing hardware. Transient errors may have various causes such as electromagnetic influences, alpha-particles or neutrons.
In communications systems, the emphasis in error handling is already on transient errors. It is known that when an error is detected in a communications system (e.g., in a controller area network, CAN), the erroneously transmitted data are resent. Furthermore, the use of an error counter is known in communications systems; the counter is incremented on detection of an error, decremented on a correct transmission, and prevents transmission of data as soon as it exceeds a certain value.
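The resend-and-count behavior described above can be sketched as follows. This is a minimal illustration only: the names TransmitErrorCounter, send_with_retry and make_flaky_channel are hypothetical, and the limit and the step size of one per error are arbitrary assumptions, not values taken from the CAN specification.

```python
class TransmitErrorCounter:
    """Hypothetical error counter: +1 per error, -1 per correct transmission."""

    def __init__(self, limit=3):
        self.limit = limit  # assumed threshold, chosen only for illustration
        self.count = 0

    def can_send(self):
        # Transmission is prevented as soon as the counter exceeds the limit.
        return self.count <= self.limit

    def record(self, ok):
        if ok:
            self.count = max(0, self.count - 1)
        else:
            self.count += 1


def send_with_retry(frame, channel, counter):
    """Resend the frame until it is transmitted correctly or sending is blocked.

    Returns the number of attempts on success, or None if the counter
    blocked further transmission.
    """
    attempts = 0
    while counter.can_send():
        attempts += 1
        ok = channel(frame)
        counter.record(ok)
        if ok:
            return attempts
    return None


def make_flaky_channel(failures):
    """Hypothetical channel that corrupts the first `failures` transmissions."""
    remaining = [failures]

    def channel(frame):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # transmission error detected
        return True       # correct transmission

    return channel
```

With a channel that fails twice, the frame gets through on the third attempt and the counter ends at one; with a channel that keeps failing, the counter exceeds the limit and further transmission is prevented.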
In the case of computing hardware for running computer programs, however, error handling is performed essentially only for permanent errors. Taking transient errors into account is limited to incrementing and, if necessary, decrementing an error counter. This counter reading is stored in a memory and may be read out off-line, i.e., as diagnostic or error information during a visit to a repair shop, e.g., in the case of computing hardware designed as a vehicle control unit. Only then is it possible to respond appropriately to the error.
Error handling via error counters thus does not allow error handling within a short error tolerance time, which is necessary in particular for safety-relevant systems. Nor does it allow constructive error handling in the sense that the computer program is run properly again within the error tolerance time. Instead, in the related art, the computer program is switched to emergency operation after the error counter exceeds a certain value. This means that a different part of the computer program is run instead of the part containing the error, and the substitute values determined in this way are used for further computation. The substitute values may be modeled on the basis of other quantities, for example. Alternatively, the results calculated using the part of the computer program containing the error may be discarded as defective and replaced, for further calculation, by standard values that are provided for emergency operation. The known methods for handling a transient error of a computer program running on computing hardware thus do not allow any systematic constructive handling that takes the transient nature of most errors into account.
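The switch to emergency operation can be illustrated by the following sketch. The class name EmergencyFallback, the plausibility check and the concrete values are hypothetical; the source does not state what is returned for a faulty cycle before the limit is exceeded, so the sketch assumes that the substitute value is used for that cycle as well.

```python
class EmergencyFallback:
    """Hypothetical counter-based switch to emergency operation."""

    def __init__(self, limit, substitute):
        self.limit = limit            # assumed counter threshold
        self.substitute = substitute  # callable providing substitute values
        self.error_count = 0

    def in_emergency(self):
        # Emergency operation begins once the counter exceeds the limit.
        return self.error_count > self.limit

    def step(self, primary, is_plausible, x):
        if self.in_emergency():
            # A different part of the program runs instead of the faulty part.
            return self.substitute(x)
        result = primary(x)
        if is_plausible(result):
            return result
        # Error detected: count it and discard the result (assumption: the
        # substitute value is also used for this single cycle).
        self.error_count += 1
        return self.substitute(x)


# Usage with made-up functions and values.
fallback = EmergencyFallback(limit=1, substitute=lambda x: 20.0)
good = lambda x: x * 2.0
faulty = lambda x: -1.0
plausible = lambda r: 0.0 <= r <= 100.0

outputs = [
    fallback.step(good, plausible, 5.0),    # normal result
    fallback.step(faulty, plausible, 5.0),  # first error, counter = 1
    fallback.step(faulty, plausible, 5.0),  # second error, counter = 2
    fallback.step(good, plausible, 5.0),    # still substitute value
]
```

The last call shows the limitation described above: even though the primary computation would again deliver a correct result, the sketch stays in emergency operation, so the transient nature of the error is not exploited.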
It is also known in the art that transient errors occurring in running a computer program on computing hardware may be eliminated by completely restarting the computing hardware. This approach is not satisfactory either, because quantities obtained in processing the computer program up to that point are lost and the computing hardware is unable to fulfill its intended function for the duration of the restart. This is unacceptable in the case of safety-relevant systems in particular.
Finally, it is also known that, for handling transient errors of a computer program run on computing hardware, the computer program may be set back by a few clock pulses and individual machine instructions of the computer program may be repeated. This method is also known as micro-rollback. With the known method, the system rolls back only by machine-level objects (clock pulses, machine instructions). This requires appropriate hardware support on the machine level, which entails considerable complexity in the computing hardware. The known method cannot be executed exclusively under software control.
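The principle of micro-rollback can be illustrated with a toy simulation. This is a software model purely for explanation; the related-art method itself relies on hardware support. The function name run_with_micro_rollback, the single-accumulator machine and the rollback depth are hypothetical, and error detection is simply assumed to occur at the step given by fault_at.

```python
def run_with_micro_rollback(program, fault_at=None, depth=2):
    """Toy model: execute `program` (a list of functions acc -> acc).

    history[i] holds the accumulator state before instruction i, playing
    the role of the state buffer that hardware would have to provide.
    A one-time (transient) error is assumed to be detected at step
    `fault_at`; execution is then set back by up to `depth` instructions.
    """
    history = [0]
    pc = 0
    rollbacks = 0
    fault_pending = fault_at
    while pc < len(program):
        acc = program[pc](history[pc])
        if fault_pending == pc:
            fault_pending = None       # transient: does not recur
            pc = max(0, pc - depth)    # set back a few instructions
            del history[pc + 1:]       # discard state after that point
            rollbacks += 1
            continue
        history.append(acc)
        pc += 1
    return history[-1], rollbacks


# Made-up three-instruction program: ((0 + 1) * 3) + 4
program = [lambda a: a + 1, lambda a: a * 3, lambda a: a + 4]
clean = run_with_micro_rollback(program)
recovered = run_with_micro_rollback(program, fault_at=2)
```

Both runs end with the same result; the second one needs a single rollback. The state buffer kept per instruction hints at why the real mechanism requires machine-level support.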
The error handling mechanisms known in the art are unable to respond in a suitable manner to transient errors occurring in running a computer program on computing hardware.
However, transient errors are expected to be especially frequent in future technologies. If they are detected, e.g., via dual core mechanisms, the question of error localization still remains to be answered in order to identify the correct result. This is all the more true if the goal is that a transient error should not always result in restarting the computer. As described, error localization can typically be achieved only via comparatively complex methods.
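The localization problem can be made concrete with a small sketch. It assumes a simplified dual core mechanism in which both cores compute the same task and a comparator checks the results; the function run_dual and the injected bit-flip masks are hypothetical.

```python
def run_dual(task, x, fault_a=0, fault_b=0):
    """Run `task` on two simulated cores and compare the results.

    fault_a / fault_b are XOR masks that model a transient bit flip in
    the result of core A or core B (0 means no fault).
    """
    result_a = task(x) ^ fault_a
    result_b = task(x) ^ fault_b
    mismatch = result_a != result_b
    return result_a, result_b, mismatch


task = lambda x: x + 1

no_fault = run_dual(task, 4)
fault_in_a = run_dual(task, 4, fault_a=0b100)
fault_in_b = run_dual(task, 4, fault_b=0b100)
```

In both faulty runs the comparator reports the same thing, namely a mismatch. From the comparison alone it cannot be determined which of the two results is the correct one, which is the localization question raised above.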