The invention relates to a method and apparatus for detecting a fault condition in a computer processor.
In computer operated systems, it is desirable to be able to detect when a fault or malfunction has occurred in the computer processor. In particular, the detection of a fault is vitally important in safety-critical computer processor applications, such as in aircraft computer systems. A known method for detecting a fault or malfunction in a computer processor utilises a timer counter, commonly referred to as a “watchdog timer”. The timer counter receives a clocked input pulse of predetermined frequency and the count of the timer counter is incremented each time a pulse of the clocked input is applied. In the event that the count reaches a pre-set maximum count, the timer counter generates an output pulse.
The computer processor is programmed with a self-test module which checks whether the computer processor is performing correctly. Periodically, a signal derived from the self-test module is supplied by the processor to the reset input on the timer counter to reset the counter. Provided that the computer processor is functioning correctly, the timer counter therefore does not reach the pre-set maximum count and does not provide an output. If a fault occurs in the computer processor, the reset signal is not provided to the timer counter and, on reaching the pre-set maximum count, the timer counter generates an output pulse, the generation of the output pulse thus signifying that a fault has occurred in the computer processor.
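The behaviour of such a timer counter can be illustrated by a minimal sketch. The class name, the function name, the tick-based timing and the particular counts are all assumptions made for illustration and do not appear in any of the cited documents.

```python
class WatchdogCounter:
    """Hypothetical timer counter: counts clock pulses up to a pre-set maximum."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0

    def clock_pulse(self):
        """Apply one clocked input pulse; return True if an output pulse results."""
        self.count += 1
        return self.count >= self.max_count

    def reset(self):
        """Reset input, driven by the signal derived from the self-test module."""
        self.count = 0


def first_fault_tick(total_ticks, reset_period, max_count, fails_at=None):
    """Return the tick at which the output pulse is generated, or None.

    fails_at is an assumed tick from which the processor stops
    supplying the reset signal (None means the processor stays healthy).
    """
    watchdog = WatchdogCounter(max_count)
    for tick in range(1, total_ticks + 1):
        if watchdog.clock_pulse():
            return tick
        processor_ok = fails_at is None or tick < fails_at
        if processor_ok and tick % reset_period == 0:
            watchdog.reset()
    return None
```

With a reset every 3 ticks and a pre-set maximum count of 5, a healthy processor never lets the count reach the maximum, whereas a processor that stops supplying resets from tick 7 causes the output pulse at tick 11.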
A disadvantage of this fault detection method is that when a fault occurs in the computer processor the signal provided by the processor to the timer counter may become “stuck” so that the reset signal is continuously supplied to the timer counter. Thus, even though a fault may have occurred in the computer processor, an output will not be provided by the timer counter to indicate that there is a fault.
A more sophisticated type of watchdog timer is described in U.S. Pat. No. 5,073,853. Using this method, the computer runs a self-test module and the signal supplied by the self-test module alternates between two values. Each value is derived from the preceding value by a calculation performed by the computer processor. The alternating signal is supplied to an input of a comparator which provides a reset signal to the watchdog timer only if the correct sequence of values is received at the comparator input. Using this method the correct sequence of reset signals cannot be produced if the computer processor has failed. In addition, the watchdog timer described in U.S. Pat. No. 5,073,853 includes a “window timer”, arranged such that the watchdog timer responds to the reset signal only if the signal is received within a predetermined time window. Any signals received outside the predetermined time window are regarded as faults and a fault output is generated.
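The alternating-value comparison can be sketched as follows. This is an illustration only: the two values, the use of an 8-bit complement as the calculation, and all names are assumptions, not details taken from U.S. Pat. No. 5,073,853, and the window timer is omitted.

```python
VALUE_A = 0x55  # assumed first value of the alternating signal
VALUE_B = 0xAA  # assumed second value (8-bit complement of VALUE_A)


def next_value(previous):
    """Derive the next value from the preceding one (assumed calculation)."""
    return ~previous & 0xFF


class SequenceComparator:
    """Hypothetical comparator that permits a reset only for the expected value."""

    def __init__(self, first_expected=VALUE_A):
        self.expected = first_expected

    def receive(self, value):
        """Return True (reset signal provided) only if the value is in sequence."""
        if value != self.expected:
            return False
        self.expected = next_value(self.expected)
        return True
```

In this sketch a signal that becomes stuck at one value yields at most one further reset, because the comparator then expects the other value.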
Another known method for detecting a computer fault is described in U.S. Pat. No. 5,257,373 in which a control program is loaded onto the processor and performs a number of separate functions on an input value. After each function of the control program has been completed, a software check is made to determine whether the function was executed correctly and, if so, a counter associated with that function is incremented accordingly. At the end of the sequence, the count in each counter is checked in software and only in the event that all the counters have incremented correctly will a reset signal be provided to a watchdog timer.
A disadvantage of this method is that the final checking step in the procedure (i.e. the checking of the counter contents) is performed in software and thus is itself vulnerable to computer failure. In addition, the counter contents are not cleared during computer processing so that the control program may become “stuck”, thereby causing an erroneous reset signal to be provided to the timer counter even in the event of a fault.
It is an object of the present invention to provide a method for detecting a fault condition in a computer processor which has an improved fault detection capability.
According to the present invention there is provided a method for detecting a fault condition in a computer processor, comprising the steps of:
sequentially performing a plurality of functions on an initial input value so as to compute a final value, the input value to each of the second and subsequent functions being provided by the output value from the preceding function in the sequence;
loading at least one self-test module onto the computer processor for detecting whether a fault condition has occurred in the computer processor, wherein at least one of the functions is carried out within a self-test module; and
comparing the computed final value with a predetermined value to provide an indication of whether a fault condition has occurred in the computer processor.
Each of the functions must be performed, and in the correct sequence, for a correspondence to be obtained. Thus, the method has an improved fault condition detection capability. By distributing the functions throughout the control program the method can be used to check whether the various steps of the program are being performed in their correct sequence. Furthermore, by performing at least one of the functions within a self-test module, a check is made on the functioning of the self-test module itself.
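The chained computation set out above can be sketched as follows. The particular functions, the initial value, the predetermined value and the trivial check inside the self-test module are assumptions chosen for illustration; in this sketch the comparison is done in software, which is not necessarily how the comparison would be carried out in practice.

```python
INITIAL_VALUE = 0x12        # assumed initial input value
PREDETERMINED_VALUE = 0x55  # assumed value expected after all functions


def add_step(x):
    """First hypothetical function, placed somewhere in the control program."""
    return (x + 3) & 0xFF


def multiply_step(x):
    """Second hypothetical function, placed later in the control program."""
    return (x * 5) & 0xFF


def self_test_module(x):
    """Hypothetical self-test module that also carries out one of the functions."""
    if sum(range(5)) != 10:  # stand-in for a real processor self-test
        return x             # test failed: the function is not applied
    return x ^ 0x3C


def compute_final_value(steps, initial=INITIAL_VALUE):
    """Feed the output value of each function to the next one in the sequence."""
    value = initial
    for step in steps:
        value = step(value)
    return value


def fault_detected(steps):
    """Compare the computed final value with the predetermined value."""
    return compute_final_value(steps) != PREDETERMINED_VALUE
```

Because the assumed functions do not commute, omitting a function or performing the functions out of order produces a final value that no longer corresponds to the predetermined value.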
The computed final value may be made up of two secondary computed values, a correspondence being obtained when the secondary computed values are generated in a required sequence.
Conveniently, the self-test modules are provided within the main control program operated by the computer.
The method preferably includes the further steps of:
generating a service pulse if the computed final value is equivalent to the predetermined value;
generating a time window;
detecting whether the service pulse is received within the time window; and
generating a fault condition output if the service pulse is received outside of the time window.
Thus, a fault can be detected even if the computed final value becomes “stuck” at the correct value, as the subsequent service pulse must be received within the time window for a valid service to be registered. If the service pulse is received before the time window has been started, or after expiry thereof, a fault condition output is generated to indicate that a fault has occurred in the computer processor.
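The time window check can be sketched as follows. The function names, the representation of time as plain numbers and the window limits are assumptions for illustration only.

```python
def service_pulse_generated(computed_final_value, predetermined_value):
    """A service pulse is generated only if the two values are equivalent."""
    return computed_final_value == predetermined_value


def fault_condition_output(pulse_times, window_open, window_close):
    """Return True if any service pulse falls outside the time window.

    pulse_times are assumed arrival times of service pulses, measured on
    the same time base as window_open and window_close.
    """
    return any(not (window_open <= t <= window_close) for t in pulse_times)
```

Under these assumptions a pulse at time 12 within a window from 10 to 15 is a valid service, whereas a pulse at time 3, such as might arise from a value stuck at the correct value, results in a fault condition output.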
Alternatively, the method may include the further steps of:
incrementing a count of counter means, the counter means providing a fault condition output in the event that a pre-set count is reached; and
changing the count of the counter means in response to a correspondence between the computed final value and the predetermined value, such that, in the event that no such correspondence occurs, the counter means provides a fault condition output, thereby indicating that a fault condition has occurred in the computer processor. The count is preferably reset to a zero count in response to a correspondence between the computed final value and the predetermined value.
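The alternative using counter means can be sketched as follows. The class name, the method names and the pre-set count are hypothetical, and the reset to zero reflects the preferred form described above.

```python
class CounterMeans:
    """Hypothetical counter means with a pre-set count."""

    def __init__(self, preset_count):
        self.preset_count = preset_count
        self.count = 0

    def increment(self):
        """Increment the count; return True if a fault condition output results."""
        self.count += 1
        return self.count >= self.preset_count

    def service(self, computed_final_value, predetermined_value):
        """Reset the count to zero only when the two values correspond."""
        if computed_final_value == predetermined_value:
            self.count = 0
```

If the computed final value never corresponds to the predetermined value, the count is never changed and the pre-set count is eventually reached.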
According to another aspect of the invention, there is provided an apparatus for detecting a fault condition in a computer processor comprising:
means for sequentially performing a plurality of functions on an input value so as to compute a final value, the input value to each of the second and subsequent functions being provided by the output value from the preceding function in the sequence;
at least one self-test module, loaded onto the computer processor, for detecting whether a fault condition has occurred in the computer processor, wherein at least one of the functions is carried out within a self-test module; and
means for comparing the computed final value with a predetermined value to provide an indication of whether a fault condition has occurred in the computer processor.
The apparatus preferably includes means for generating a service pulse if the computed final value is equivalent to the predetermined value, means for generating a time window, and means for detecting whether the service pulse is received within the time window, whereby receipt of the service pulse outside the time window results in generation of a fault condition output.
For the purpose of this specification, the occurrence of a fault or functional error in a computer processor shall be referred to as a “fault condition”.