In high-reliability data processing systems it is necessary to have circuitry that checks data transmitted between system units or stored in system memories and detects errors that may have been introduced during transmission or storage. Error detection and correction is especially necessary in systems that transmit or store data at high speeds, because such systems are more prone to disturbance by noise and other spurious signals.
Accordingly, many prior art arrangements have been developed to protect data stored in memories and data transmitted between data processing circuits. The simplest of these systems calculate the "parity" of the data. The parity of a data word depends on whether it contains an even or odd number of logical "1"s: a data word with an even number of ones is said to have "even" parity, while a data word with an odd number of ones is said to have "odd" parity. A parity bit representing the parity of the data word (usually a logical "0" if the parity is even and a logical "1" if it is odd) is calculated from the data bits and appended to the data word before the data is transmitted or stored.
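A minimal Python sketch of the parity calculation described above, using the convention from the text (parity bit "0" for an even count of ones, "1" for an odd count). The function names and the choice to place the parity bit just above the data bits are illustrative assumptions, not part of the original arrangement.

```python
def parity_bit(word: int) -> int:
    """Return 0 if `word` (a non-negative int) has an even number of 1 bits, else 1."""
    return bin(word).count("1") & 1


def append_parity(word: int, width: int = 8) -> int:
    """Hypothetical layout: place the parity bit immediately above a `width`-bit data word."""
    return (parity_bit(word) << width) | word


print(parity_bit(0b1011_0010))          # 0 (four 1s: even parity)
print(parity_bit(0b1011_0011))          # 1 (five 1s: odd parity)
print(bin(append_parity(0b1011_0011)))  # 0b110110011
```

In hardware the same result comes from a tree of exclusive-OR gates over the data bits; counting ones in software is simply the easiest equivalent to read.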
After the data has been retrieved from memory or received over the transmission channel, the parity is recalculated from the received data bits and compared with the retrieved or received parity bit. If the two are not equal, an error is assumed to have occurred and the data can be re-transmitted or re-retrieved. A problem with this simple system is that it cannot indicate which data bit (or bits) is in error, so the entire data word must be re-transmitted rather than just the erroneous bit or bits. Consequently, recovery from an error is slow.
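The receive-side check can be sketched as follows (a hypothetical illustration, not the original circuitry). Besides showing that a single-bit error is detected without being located, it shows a further limitation of a single parity bit: two flipped bits leave the parity unchanged, so a double error goes unnoticed.

```python
def parity_bit(word: int) -> int:
    """0 for an even number of 1 bits, 1 for an odd number."""
    return bin(word).count("1") & 1


def check(word: int, stored_parity: int) -> bool:
    """True if the recalculated parity matches the stored or received parity bit."""
    return parity_bit(word) == stored_parity


data = 0b1011_0011
p = parity_bit(data)          # 1 (five 1s)
single = data ^ 0b0000_0100   # one bit flipped in storage or transit
double = data ^ 0b0001_0100   # two bits flipped

print(check(data, p))    # True
print(check(single, p))  # False: error detected, but its position is unknown
print(check(double, p))  # True: the double error is not detected
```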
Other, more sophisticated systems have been developed that are capable of detecting errors in particular bits and, in some cases, correcting the errors without requiring re-transmission. These protection arrangements generally operate by appending to the data a multiple-bit parity code word calculated from the values of the data bits. One common method of calculating the parity code word bits is to exclusive-OR the data values in selected bit positions to generate each bit of the parity code word. Thus the value of each parity code word bit is the parity (whether the number of logical "1"s is even or odd) of the selected data bit positions. Both the data and the appended parity code word are then stored or transmitted.
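As a sketch of the encoding step, each check bit below is the exclusive-OR (equivalently, the parity) of the data bits selected by one mask. The four masks for an 8-bit data word are hypothetical and chosen only for illustration; a practical code chooses its selections so that the resulting syndromes can identify which bit failed.

```python
def parity_bit(word: int) -> int:
    return bin(word).count("1") & 1


# Hypothetical selections for an 8-bit data word: check bit i is the
# exclusive-OR of the data bits whose positions are set in MASKS[i].
MASKS = [0b0101_1011, 0b0110_1101, 0b1000_1110, 0b1111_0000]


def code_word(data: int, masks: list[int] = MASKS) -> int:
    """Build the multiple-bit parity code word, one bit per selection mask."""
    word = 0
    for i, mask in enumerate(masks):
        # The parity of the selected bits equals the XOR of those bits.
        word |= parity_bit(data & mask) << i
    return word


print(bin(code_word(0b1011_0011)))  # 0b1001
```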
After the data has been received over the transmission channel or retrieved from memory, the code word bits are recalculated from the retrieved data bits and compared against the retrieved code word bits. To perform the comparison, the recalculated and retrieved code word bits are combined in predetermined combinations called "syndromes". The syndrome values are then decoded (compared to predetermined patterns) to detect whether an error has occurred. In some systems the syndromes are further processed to generate error correction information, which is then used to correct erroneous data bits.
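The syndrome step can be illustrated with the well-known Hamming(7,4) single-error-correcting code, used here as a stand-in example rather than as the arrangement this document describes. The syndrome is the exclusive-OR of the recalculated and retrieved check bits; a zero syndrome means no error, and a nonzero syndrome is compared against the pattern assigned to each data bit to decide which bit to correct. The names `COLUMNS`, `check_bits`, and `decode` are hypothetical.

```python
def parity_bit(word: int) -> int:
    return bin(word).count("1") & 1


# Syndrome pattern assigned to each data bit d0..d3 (the Hamming(7,4) choice:
# distinct, nonzero, and with at least two 1s so they differ from check-bit errors).
COLUMNS = [0b011, 0b101, 0b110, 0b111]
# Check bit i covers data bit j when bit i of COLUMNS[j] is set.
MASKS = [sum(1 << j for j, col in enumerate(COLUMNS) if (col >> i) & 1) for i in range(3)]


def check_bits(data: int) -> int:
    return sum(parity_bit(data & m) << i for i, m in enumerate(MASKS))


def decode(data: int, stored_check: int) -> tuple[int, str]:
    """Assumes at most one of the 7 stored bits is wrong."""
    syndrome = check_bits(data) ^ stored_check
    if syndrome == 0:
        return data, "no error"
    if syndrome in COLUMNS:  # matches a data bit's pattern: flip that bit
        return data ^ (1 << COLUMNS.index(syndrome)), "corrected data bit"
    return data, "check-bit error"  # a single 1: one check bit was flipped


stored = check_bits(0b1010)   # 0b010
print(decode(0b1110, stored))  # (10, 'corrected data bit')
```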
Many prior art error-detecting and correcting techniques have been devised that are capable of both detecting and correcting errors affecting only one data bit. These techniques have met with varying degrees of success, depending on the application in which a particular technique is used. For example, when a technique capable of detecting and correcting single errors is used with random access memories implemented with one-bit-wide memory elements, a high degree of protection is achieved, since the vast majority of faults in this situation produce single-bit errors. However, when the same technique is used in an implementation built from multiple-bit memory elements, the protection achieved is significantly lower, because a single faulty element can corrupt several bits of the same word and the probability of multiple simultaneous errors therefore increases.
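The reduced protection can be made concrete with the standard Hamming(7,4) single-error-correcting code, again used only as an illustrative stand-in. A one-bit error is corrected, but when two data bits fail together (as a hypothetical multiple-bit memory element might cause), the syndrome happens to match the pattern of a third bit, and the decoder "corrects" that bit, leaving the word further from the original.

```python
def parity_bit(word: int) -> int:
    return bin(word).count("1") & 1


COLUMNS = [0b011, 0b101, 0b110, 0b111]  # Hamming(7,4) syndrome pattern of each data bit
MASKS = [0b1011, 0b1101, 0b1110]        # data bits covered by each check bit


def check_bits(data: int) -> int:
    return sum(parity_bit(data & m) << i for i, m in enumerate(MASKS))


def sec_decode(data: int, stored_check: int) -> int:
    """Single-error-correcting decode: trusts that at most one bit is wrong."""
    syndrome = check_bits(data) ^ stored_check
    if syndrome in COLUMNS:
        data ^= 1 << COLUMNS.index(syndrome)
    return data


original = 0b1010
stored = check_bits(original)
one_bit = original ^ 0b0100   # e.g. a fault in a one-bit-wide memory element
two_bits = original ^ 0b0011  # e.g. a fault in a multiple-bit memory element

print(sec_decode(one_bit, stored) == original)  # True
print(sec_decode(two_bits, stored))             # 13, not 10: bit 2 was wrongly flipped
```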
Prior art techniques are also available that can detect or correct double errors or higher numbers of simultaneous errors. However, using syndromes to correct more than one error makes the syndrome calculations lengthy and requires complex and expensive circuitry. Therefore, there is a need for an error-correcting arrangement that is relatively simple and that can at least detect multiple simultaneous errors.
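One familiar way to at least detect double errors cheaply is the standard single-error-correcting, double-error-detecting (SEC-DED) extension of a Hamming code: one extra bit makes the parity of the whole stored word even. The sketch below applies it to Hamming(7,4); it is a generic textbook technique shown for illustration, not the arrangement claimed in this document, and the function names are hypothetical.

```python
def parity_bit(word: int) -> int:
    return bin(word).count("1") & 1


MASKS = [0b1011, 0b1101, 0b1110]  # Hamming(7,4) selections for data bits d0..d3


def check_bits(data: int) -> int:
    return sum(parity_bit(data & m) << i for i, m in enumerate(MASKS))


def encode(data: int) -> tuple[int, int, int]:
    check = check_bits(data)
    overall = parity_bit(data) ^ parity_bit(check)  # makes total parity of all 8 bits even
    return data, check, overall


def decode(data: int, check: int, overall: int) -> str:
    syndrome = check_bits(data) ^ check
    total_parity = parity_bit(data) ^ parity_bit(check) ^ overall
    if syndrome == 0 and total_parity == 0:
        return "no error"
    if total_parity == 1:  # an odd number of flips: treat as a single error
        return "single error (correctable)"
    return "double error (detected, not correctable)"  # even flips, nonzero syndrome


d, c, o = encode(0b1010)
print(decode(d ^ 0b0011, c, o))  # double error (detected, not correctable)
```

The extra cost over plain single-error correction is one stored bit and one parity tree.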
In high-reliability systems another problem arises because the encoding and decoding circuitry needed to generate the error-correcting codes and to correct detected errors in the retrieved, coded information is itself subject to failure. Although error-correcting encoders and decoders are generally more reliable than the unprotected memories they serve, they are usually considerably less reliable than the protected memory, that is, the memory together with its error-correcting code. In other words, the probability that the memory produces an error that the decoder cannot detect is typically orders of magnitude lower than the probability that the decoder itself fails. Because a failed encoding/decoding system that corrects detected errors can alter the very data it is supposed to be coding, and thereby introduce errors into the data instead of removing them, prior art systems that can generate undetectable errors when they fail are unacceptable in high-reliability data processing systems.
To ensure that an encoder/decoder cannot fail in such a way that it generates data containing undetectable errors, it must be both fail-safe and self-checking. A fail-safe circuit must not itself generate undetectable errors: if, as a result of a failure, it erroneously alters one or more data bits, that fact will become apparent to the receiver of the erroneous data. A self-checking circuit must, in ordinary use, exercise all of its data paths in such a way that any faulty element it contains will be exposed. In a high-reliability system, the error detection/correction circuitry must therefore have both properties.