The present invention has as an object an apparatus for handling errors in the central processor of a program-controlled data-processing system. When an error occurs, the purpose of the apparatus is either to correct it or to formulate messages which are transmitted to the software or to the persons responsible for the maintenance of the system. Although the reliability of components used in data-processing systems is particularly high and although every care is taken when wiring or fabricating the components, it may still happen that errors occur, even in the best equipment. Statistics show that they mainly originate from weaknesses in certain of the very many connections which are required.
Manufacturers very quickly became aware of the necessity of dealing with the errors which might occur in the course of processing and have already fitted data-processing systems with detecting and correcting devices which, for the most part, make use of data redundancy.
A first method consists in adding a parity bit to the useful data. In the computer field, where the probability of random errors is very low, a self-checking code termed a parity code is often used which has M + 1 bits, of which the first M bits are used to code 2^M significant data items. The (M + 1)th bit, termed the parity bit, is set in such a way that the total number of bits in the "1" state is an even number (even parity code) or an odd number (odd parity code). This code is only partly self-checking since it does not permit double errors to be detected (two wrong bits in the same data item, for example). In a particular system which employs the invention and operates on the basis of a data octet, the code adopted is an odd parity code. This results in the simultaneous transmission of eight data bits, one parity bit, and a bit for validating the parity checking register. The validation bit is needed in particular when the system is initialized: since the register for the parity bit is then empty, the machine would immediately report an error if this precaution were not taken. The present invention uses this method to detect errors.
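By way of illustration only, the odd parity check on a data octet described above may be sketched as follows. The function names and the representation of the validation bit as a Boolean are assumptions made for the example and do not appear in the system described.

```python
def count_ones(value: int) -> int:
    """Number of bits in the "1" state in a non-negative integer."""
    return bin(value).count("1")


def odd_parity_bit(octet: int) -> int:
    """Parity bit that makes the total number of 1 bits (data + parity) odd."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return 0 if count_ones(octet) % 2 == 1 else 1


def parity_ok(octet: int, parity: int, valid: bool) -> bool:
    """Return True when no error is reported.

    `valid` stands for the validation bit (hypothetical representation):
    while it is False, for example just after initialization when the
    parity register is still empty, checking is suppressed.
    """
    if not valid:
        return True
    return (count_ones(octet) + parity) % 2 == 1


data = 0b00001011                      # three bits set
p = odd_parity_bit(data)               # 0, the total is already odd

single_error = data ^ 0b00000001       # one bit flipped
double_error = data ^ 0b00000011       # two bits flipped

print(parity_ok(data, p, True))          # correct octet is accepted
print(parity_ok(single_error, p, True))  # single error is reported
print(parity_ok(double_error, p, True))  # double error passes unnoticed
```

The last call shows why the code is only partly self-checking: two flipped bits leave the parity unchanged.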
A second method consists in using more elaborate codes termed error auto-correction codes. In a particular system which employs the invention, the central memory uses autocorrection codes. However, in what follows, the discussion will be confined to how errors detected by the central processor are dealt with by the use of parity bits. The manner of detecting the error is not the purpose of the apparatus, which only acts after the error has been detected. The errors which are capable of occurring in a data-processing system may result either from poor programming, in which case the errors are "software" errors, or from a temporary or long term fault in one of the machine components, in which case the errors concerned are said to be hardware or machine errors. The invention concerns chiefly, but not exclusively, the latter type of error.
Until now, when an error was detected, the machine interrupted the program and immediately warned the supervisor software which accepted the handling of the error, possibly with the assistance of the programmer. Of course, in the case of intermittent faults, repeating an instruction often allows the error to be corrected. In the present invention, hardware means and specialized microprograms allow the error to be located, the current process to be interrupted, and the error to be dealt with directly, recourse being had to the software only in certain special cases. In particular, the mechanism relieves the software and also allows performance to be improved.
The purpose of the present invention is an apparatus which allows an instruction to be retried or repeated when an error has been detected, that is to say the instruction can be retried until the error is corrected. After a certain number of unsuccessful tries, the apparatus informs the software of the existence of a semi-permanent error.
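The retry principle stated above can be sketched in software, purely as an illustration. In the invention the retry is performed by hardware means and microprograms; the retry limit of 3, the exception name, and the helper `make_intermittent` are hypothetical and chosen only for the example.

```python
from typing import Callable

MAX_RETRIES = 3  # hypothetical limit; the text says only "a certain number"


class SemiPermanentError(Exception):
    """Raised to inform the software that the retries did not succeed."""


def execute_with_retry(instruction: Callable[[], bool],
                       max_retries: int = MAX_RETRIES) -> int:
    """Execute an instruction, retrying it while an error is detected.

    `instruction` returns True when it completes without a detected error.
    Returns the number of retries that were needed (0 if none).
    """
    if instruction():
        return 0
    for attempt in range(1, max_retries + 1):
        if instruction():
            return attempt
    raise SemiPermanentError(
        f"error persisted after {max_retries} retries")


def make_intermittent(failures: int) -> Callable[[], bool]:
    """Build a test instruction that fails `failures` times, then succeeds."""
    remaining = [failures]

    def instruction() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return instruction


retries_needed = execute_with_retry(make_intermittent(2))
print(retries_needed)

try:
    execute_with_retry(make_intermittent(10))
    reported = False
except SemiPermanentError:
    reported = True
print(reported)
```

An intermittent fault is absorbed by the retries, whereas a fault that persists beyond the limit is reported to the software.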
In the present invention, the error handling apparatus for a multiprogrammed system in which the software creates processes which are in execution, in the ready state, in the waiting state, or in the suspended state, the system containing at least one central processor and at least one central memory connected to the said processor, is characterised in that it includes registers which store the characteristics of the detected error and which enable retry operations to be performed, the outcome of these operations being reported to a block of specialized microprograms which may possibly call up a central exception mechanism, which itself interrogates the software.