The present invention is directed to a method for error recognition in a processor system working with a plurality of programs and containing at least one processor.
Software errors are largely unavoidable in processor systems with extensive software. So that the running of the programs and the operation of the overall processor system are not excessively impaired by such software errors, the fastest possible recognition of software errors, i.e. program errors, is desirable, since software errors can lead to malfunctions up to and including a complete resetting of individual programs or even of the entire processor system. The down times caused by such software errors must be minimized, particularly in real-time systems such as, for example, electronic switching systems for telecommunications, which represent a preferred but not exclusive area of employment of the present invention.
For improved error recognition and handling, one could consider resetting a process that generates an error message in order to thereby eliminate the error. Alternatively, one could also consider resetting the complete processor and having it start up again, particularly when the number of error messages of a processor has exceeded a predetermined threshold. With such a procedure, however, the resetting generally goes too far, since programs that are working error-free are also reset and the performance capability of the overall processor is thus impaired.
It is an object of the present invention to provide a method for error recognition in a processor system working with a plurality of programs and containing at least one processor with which an improved recognition of incorrectly working programs is possible.
Advantageous developments of the invention are recited in the subclaims.
In the inventive method, error recognition checks, particularly plausibility checks of the data with which the respective programs work and which they have obtained from other programs, are implemented by at least one program within the processor system, which preferably works in real time. As a result thereof, errors can be recognized, so that an error propagation can be prevented. When the plausibility check, which can ensue in a traditional way, shows that the received information (data) is inconsistent, i.e. is to be classified as faulty, this program outputs an error message to the operating system. An error message table in the operating system is updated in conformity with this error message; information about the program that output the data classified as faulty, and information about the reporting program as well, are registered therein. The reporting process thus also provides a pointer to the suspected source of error, i.e. to another program. As a result thereof, the system acquires an improved overview of programs that may be working incorrectly and can, for example, compile characteristic data for the reporting program and for the program reported as faulty in order to thereby enable an improved error isolation, and can also implement a statistical evaluation, particularly a summing-up of the number of error messages selectively for each program. The operating system can initiate suitable error elimination measures, for example a resetting of a program repeatedly reported as faulty, or can also initiate a resetting of the entire processor when necessary.
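The reporting mechanism described above can be illustrated by the following minimal sketch. It is not taken from the specification: the class name ErrorMessageTable, the function names, the program names and the plausibility rule (a value between 0 and 100) are assumptions chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ErrorMessageTable:
    """Hypothetical error message table kept by the operating system."""
    entries: list = field(default_factory=list)            # (reporter, suspect) pairs
    messages_output: Counter = field(default_factory=Counter)   # per reporting program
    messages_against: Counter = field(default_factory=Counter)  # per program reported as faulty

    def register(self, reporter: str, suspect: str) -> None:
        # Both the reporting program and the suspected program are recorded.
        self.entries.append((reporter, suspect))
        self.messages_output[reporter] += 1
        self.messages_against[suspect] += 1


def plausible(value: int) -> bool:
    # Assumed plausibility rule for this sketch only.
    return 0 <= value <= 100


def receive(table: ErrorMessageTable, receiver: str, sender: str, value: int) -> bool:
    """Check received data; on failure, report the sender as the suspected source."""
    if plausible(value):
        return True
    table.register(reporter=receiver, suspect=sender)
    return False


table = ErrorMessageTable()
receive(table, "billing", "call_control", 42)    # plausible, nothing is registered
receive(table, "billing", "call_control", -7)    # implausible, registered
receive(table, "routing", "call_control", 512)   # implausible, registered
```

The two counters correspond to the statistical evaluation mentioned above, i.e. a summing-up of error messages selectively for each program.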
A more exact recognition and determination of the need for corrective measures, particularly of individual program resettings or of a more extensive start-up required for returning the system to full performance capability, can thus be achieved with the present invention. The inventive method thus allows an optimized identification of programs to be reset due to faulty behavior. This ensues in that a program, or the higher-ranking operating system as well, is given the possibility of pointing, so to speak, to another program and accusing it of being faulty. As a result thereof, the scope of potentially required resettings is limited to the necessary degree, and an unnecessary resetting of a great number of processes or, potentially, of the entire processor or even of a system containing a plurality of processors can be avoided.
The user software thus contains a specific reporting possibility, so that it can provide the operating system with indications as to which process is potentially faulty. On the basis of this error message, the operating system can localize the other process that is potentially to be reset. This need not necessarily be the process reported as faulty but can also be another process that drives this process or ranks higher than this process. In any case, the operating system can likewise determine the identity of the process that generated the error message, preferably system-wide in an unambiguous way. The operating system can implement this process identification for a broad variety of interfaces, for example transmitted messages, remote procedure calls, etc.
The capability of the user processes, i.e. the user software running on the lowest interrupt level zero, of classifying another process as faulty is thereby limited to the programs and interface partners coming into consideration as possible candidates. The possibility that user processes incorrectly classify other processes as faulty is thereby diminished by utilizing compiler-based rules dependent on the nature of the problem and on the nature of the interface.
Since the operating system stores information both about the reporting program and about the program referred to as faulty in the error message table, the inventive method can preferably be configured such that the operating system resets a program and allows it to start up again as soon as the number of messages stored for this program (the number of error messages that the program has output or the number of error messages that point to the program as faulty) reaches a predetermined threshold. As a result thereof, the probability of finding the process that is in fact to be reset due to faulty behavior is clearly enhanced.
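The threshold-controlled resetting can be sketched as follows. This is an illustration under stated assumptions and not the implementation of the invention: the class Supervisor, the threshold value of 3 and the decision to reset when either of the two counts reaches the threshold are hypothetical choices; the text only requires that a predetermined threshold be reached.

```python
from collections import Counter

RESET_THRESHOLD = 3  # assumed value; the text only speaks of a predetermined threshold


class Supervisor:
    """Hypothetical operating-system component that counts messages per program."""

    def __init__(self, threshold: int = RESET_THRESHOLD) -> None:
        self.threshold = threshold
        self.output_count = Counter()   # error messages a program has output
        self.accused_count = Counter()  # error messages pointing to a program as faulty
        self.resets = []                # programs reset so far, in order

    def report(self, reporter: str, suspect: str) -> None:
        self.output_count[reporter] += 1
        self.accused_count[suspect] += 1
        # dict.fromkeys removes a duplicate if a program reports itself.
        for program in dict.fromkeys((reporter, suspect)):
            if (self.output_count[program] >= self.threshold
                    or self.accused_count[program] >= self.threshold):
                self.reset(program)

    def reset(self, program: str) -> None:
        # Stand-in for resetting the program and letting it start up again;
        # its stored message counts are cleared.
        self.resets.append(program)
        self.output_count.pop(program, None)
        self.accused_count.pop(program, None)


supervisor = Supervisor()
for reporter in ("billing", "routing", "metering"):
    supervisor.report(reporter, "call_control")
```

In this usage, three different programs each report the same program once; only the program reported three times reaches the threshold, while the reporting programs, each with a single message, are left running.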
The inventive method can also be configured such that, given a system with a plurality of processors or, respectively, with a plurality of platforms, as is present in an electronic switching system for telecommunications, the operating system transports error message information, or other accompanying information that allows error isolation on a platform, from its own platform to the other platform on which the process suspected to be faulty is in fact running.
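The transport of error message information between platforms can be sketched as follows. The class PlatformOS, the record fields and the direct method call standing in for the inter-platform transport are assumptions made for illustration; the specification does not state how the information is transported.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformOS:
    """Hypothetical operating-system instance on one platform."""
    name: str
    local_programs: set
    table: list = field(default_factory=list)   # error message table of this platform
    peers: dict = field(default_factory=dict)   # platform name -> PlatformOS

    def connect(self, other: "PlatformOS") -> None:
        self.peers[other.name] = other
        other.peers[self.name] = self

    def report(self, reporter: str, suspect: str, details: str = "") -> str:
        """Store the message on the platform running the suspect; return that platform's name."""
        record = {
            "reporter": reporter,
            "reporter_platform": self.name,
            "suspect": suspect,
            "details": details,   # accompanying information for error isolation
        }
        if suspect in self.local_programs:
            self.table.append(record)
            return self.name
        for peer in self.peers.values():
            if suspect in peer.local_programs:
                peer.table.append(record)   # stands in for the inter-platform transport
                return peer.name
        # Suspect not found anywhere: keep the record locally (assumed fallback).
        self.table.append(record)
        return self.name


platform_a = PlatformOS("A", {"billing"})
platform_b = PlatformOS("B", {"call_control"})
platform_a.connect(platform_b)
stored_on = platform_a.report("billing", "call_control", details="implausible length field")
```

Here the report raised on platform A ends up in the table of platform B, where the suspected program runs, together with the identity and platform of the reporting program.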
The error message output by a program can cause the operating system to determine the identity of the indicated program classified as faulty and to immediately store it in the error message table. Alternatively, the error message can first be interpreted only as a call, in response to which the operating system hands the information about the identity of the program suspected of being faulty back to the reporting program, after which this information is handed over by the reporting program to the error message table, which is preferably located in the operating system. The information about the respective program is thereby preferably located in a data frame containing the data classified as faulty, particularly in the form of an information header.
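The two alternatives described above can be contrasted in the following sketch. The names Frame, sender_id, OperatingSystem and the three methods are hypothetical; the only element taken from the text is that the identity of the producing program is carried in an information header of the data frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical data frame: an information header plus the transported data."""
    sender_id: str   # information header: identity of the program that produced the data
    payload: bytes


class OperatingSystem:
    def __init__(self) -> None:
        self.error_table = []   # (reporter, suspect) entries

    # Alternative 1: the operating system determines the identity of the
    # suspected program and stores it in the table immediately.
    def report_faulty_frame(self, reporter_id: str, frame: Frame) -> None:
        self.error_table.append((reporter_id, frame.sender_id))

    # Alternative 2, step 1: the error message is treated as a call; the
    # identity is handed back to the reporting program.
    def identify_sender(self, frame: Frame) -> str:
        return frame.sender_id

    # Alternative 2, step 2: the reporting program hands the information
    # over to the error message table.
    def register(self, reporter_id: str, suspect_id: str) -> None:
        self.error_table.append((reporter_id, suspect_id))


frame = Frame(sender_id="call_control", payload=b"\xff\xff")

os_direct = OperatingSystem()
os_direct.report_faulty_frame("billing", frame)

os_two_step = OperatingSystem()
suspect = os_two_step.identify_sender(frame)
os_two_step.register("billing", suspect)
```

Both alternatives lead to the same table entry; they differ only in whether the reporting program sees the identity of the suspected program before it is registered.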