1. Field of the Invention
The present invention relates to a system for detecting and correcting errors in data transfer and, more particularly, to a system for detecting single or multiple bit errors and for correcting single or double bit errors.
2. Description of the Prior Art
In any digital system where data is transmitted, one or more of the data bits in each data word or message may be received in error. This has been a problem from the time data processing systems were first invented.
As more sophisticated data processing operations are performed, involving more complex equipment, there is a greater need for systems to detect and correct multiple errors in data transfers. For example, operations such as merging of files, sorting of data within files, numerical/statistical analyses, complex data handling procedures and word processing operations require increased reliability in data transfer. In the field of telecommunications and telemetry, error rates tend to increase when data is transmitted over analog lines at high baud rates. If data errors occur and go undetected, valuable information and system operation itself may be affected. Thus, error detecting and correcting features are not merely advantageous; they are required to ensure system integrity.
In response to the problem of error generation during data transfers, systems have been developed to detect such errors. One of the earliest methods for detecting errors was the parity check code. A binary code word has odd parity if an odd number of its digits are 1's. For example, the number 1011 has three 1 digits and therefore has odd parity. Similarly, the binary code word 1100 has an even number of 1 digits and therefore has even parity.
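As a minimal illustration of the parity definition above, the following sketch classifies a binary string by counting its 1 digits. The function name `parity` is hypothetical and chosen only for this example.

```python
def parity(word: str) -> str:
    """Return 'odd' or 'even' according to the number of 1 digits in a binary string."""
    ones = word.count("1")
    return "odd" if ones % 2 else "even"

print(parity("1011"))  # odd  (three 1 digits)
print(parity("1100"))  # even (two 1 digits)
```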
A single parity check code is characterized by an additional check bit that is added to each word to generate either odd or even parity. An error in a single digit or bit of a data word is discernible because the parity of the received word is then the reverse of what is expected. Typically, a parity generator adds the parity check bit to each word before transmission. This technique is called padding the data word. At the receiver, the digits in the word are tested, and if the parity is incorrect, one of the bits in the data word is considered to be in error. When an error is detected at a receiver, a request for a repeat transmission can be issued so that the error can be corrected. It should be noted that only errors in an odd number of digits can be detected with a single parity check, since an even number of errors results in the parity expected for a correct transmission. Moreover, the specific bit in error cannot be identified by the parity check procedure as hereinabove described.
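The generator and receiver check described above can be sketched as follows, assuming even parity by default and binary strings as the word representation. The helpers `add_parity_bit`, `parity_ok` and `flip` are hypothetical names; `flip` merely simulates transmission errors. The last line shows why an even number of errors escapes a single parity check.

```python
def add_parity_bit(word: str, odd: bool = False) -> str:
    """Pad the word with a check bit giving even parity (or odd parity if odd=True)."""
    bit = word.count("1") % 2      # 1 if the count of 1 digits is odd, making the total even
    if odd:
        bit ^= 1                   # invert the check bit to make the total odd instead
    return word + str(bit)

def parity_ok(received: str, odd: bool = False) -> bool:
    """True if the received word still has the agreed parity."""
    return (received.count("1") % 2 == 1) == odd

def flip(word: str, *positions: int) -> str:
    """Invert the bits at the given 0-based positions to simulate errors."""
    bits = list(word)
    for p in positions:
        bits[p] = "1" if bits[p] == "0" else "0"
    return "".join(bits)

sent = add_parity_bit("1011")          # "10111": four 1 digits, even parity
print(parity_ok(sent))                 # True
print(parity_ok(flip(sent, 2)))        # False: single error detected
print(parity_ok(flip(sent, 0, 3)))     # True: double error goes unnoticed
```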
A more sophisticated error detection system was later devised. Data words of a fixed length were grouped into blocks of a fixed number of data words each, so that each block could be viewed as a matrix whose rows are data words and whose columns are bit positions. Parity checks were then performed across the data words, one for each column, as well as on each individual data word, one for each row. The block parity code detected many patterns of errors and could be used not only for error detection but also for error correction when an isolated error occurred, since the failing row check and the failing column check intersect at the erroneous bit. While these geometric codes were an improvement over parity check bits per se, they still could not detect errors that were even in number and symmetrical in two dimensions, such as four errors at the corners of a rectangle in the matrix.
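A minimal sketch of block (two-dimensional) parity, assuming even parity and a block held as a list of rows of 0/1 integers. The function names are hypothetical. It locates an isolated error at the intersection of the failing row and column checks, and it shows that four errors at the corners of a rectangle leave every check satisfied.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]  # rows are data words, columns are bit positions

def row_parities(block: Block) -> List[int]:
    return [sum(row) % 2 for row in block]          # even-parity check bit per word

def col_parities(block: Block) -> List[int]:
    return [sum(col) % 2 for col in zip(*block)]    # even-parity check bit per bit position

def locate_single_error(block: Block, rows: List[int], cols: List[int]) -> Optional[Tuple[int, int]]:
    """Compare recomputed checks with the transmitted ones.
    Returns (row, col) of an isolated error, or None if every check passes."""
    bad_rows = [i for i, p in enumerate(row_parities(block)) if p != rows[i]]
    bad_cols = [j for j, p in enumerate(col_parities(block)) if p != cols[j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error detected but not correctable as a single error")

data = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
rows, cols = row_parities(data), col_parities(data)     # check bits sent with the block

received = [r[:] for r in data]
received[1][2] ^= 1                                     # one isolated error
print(locate_single_error(received, rows, cols))        # (1, 2)

received = [r[:] for r in data]
for i, j in [(0, 0), (0, 3), (2, 0), (2, 3)]:           # rectangle of four errors
    received[i][j] ^= 1
print(locate_single_error(received, rows, cols))        # None: all checks still pass
```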
After parity check codes and geometric codes were devised, a code was invented by Hamming, after whom it is named. The Hamming code is a system of multiple parity checks that encodes data words in a logical manner so that single errors can be not only detected but also identified for correction. A transmitted data word used in the Hamming code consists of the original data word and parity check digits appended thereto. Each of the required parity checks is performed upon specific bit positions of the transmitted word. The system enables the isolation of an erroneous digit, whether it is in one of the original data word bits or in one of the added parity check bits.
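To make the bit-position bookkeeping concrete, the sketch below encodes four data bits with the common Hamming (7,4) layout. The layout is an assumption for illustration, not necessarily the arrangement any particular system uses: check bits sit at positions 1, 2 and 4, data bits at positions 3, 5, 6 and 7, and the check bit at position p covers every position whose index has bit p set. The name `hamming74_encode` is hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming code word (positions 1..7)."""
    if len(d) != 4:
        raise ValueError("expected 4 data bits")
    code = [0] * 8                       # index 0 unused so indices equal bit positions
    code[3], code[5], code[6], code[7] = d
    for p in (1, 2, 4):                  # check bits occupy the power-of-two positions
        # even parity over all other positions whose index has bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]

print(hamming74_encode([1, 0, 1, 1]))    # [0, 1, 1, 0, 0, 1, 1]
```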
If all the parity check operations succeed, the data word is assumed to be error free. If one or more of the check operations fail, however, the single bit in error is uniquely determined by decoding so-called syndrome bits, which are obtained by recomputing the parity checks on the received word. It should be noted once again that the conventional Hamming code corrects only single bit errors. Double bit errors may be detected, but they cannot be corrected.
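The syndrome decoding described above can be sketched as follows. This example swaps in the well-known extended Hamming (8,4) variant, which adds an overall parity bit so that single errors are corrected while double errors are reported rather than miscorrected; it is offered as an illustration, not as the method of any cited system. The syndrome is computed as the EXCLUSIVE OR of the positions of all 1 bits, which is zero for a valid code word. The names `encode_secded` and `decode_secded` are hypothetical.

```python
def encode_secded(d: list[int]) -> list[int]:
    """Hamming (7,4) word in positions 1..7 plus an overall parity bit at index 0."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    for p in (1, 2, 4):
        c[p] = sum(c[i] for i in range(1, 8) if i & p and i != p) % 2
    c[0] = sum(c[1:]) % 2                 # makes the whole 8-bit word even parity
    return c

def decode_secded(r: list[int]) -> tuple[str, list[int]]:
    """Return (status, data bits) for a received 8-bit word."""
    r = r[:]
    syndrome = 0
    for i in range(1, 8):
        if r[i]:
            syndrome ^= i                 # XOR of the positions of all 1 bits
    overall_ok = sum(r) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "no error"
    elif not overall_ok:
        r[syndrome] ^= 1                  # one error; syndrome 0 means the overall parity bit
        status = f"corrected bit {syndrome}"
    else:
        return "double error detected, not correctable", []
    return status, [r[3], r[5], r[6], r[7]]

word = encode_secded([1, 0, 1, 1])
one = word[:]
one[6] ^= 1
two = word[:]
two[3] ^= 1
two[5] ^= 1
print(decode_secded(word))   # ('no error', [1, 0, 1, 1])
print(decode_secded(one))    # ('corrected bit 6', [1, 0, 1, 1])
print(decode_secded(two))    # ('double error detected, not correctable', [])
```

With the plain seven-bit code, the double error at positions 3 and 5 would produce syndrome 3 XOR 5 = 6, and a decoder would "correct" bit 6, which was never in error.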
The Hamming code is only one of a number of codes generically called error correcting codes (ECC's). In mathematics, a code is usually described as a closed set of values comprising all the allowed number sequences in the code. In data communications, however, the transmitted numbers are essentially random data patterns that are not related to any predetermined code set. The data sequence is therefore forced into compliance with the code set by appending additional bits to it at the transmitter, as hereinabove mentioned. Schemes have heretofore been developed to determine precisely what extra string to append to the original data stream so that the concatenation of transmitted data is a valid code value. There is likewise a consistent way of extracting the original data from the code value at the receiver and of delivering the actual data to the location where it is ultimately used. For a code scheme to be effective, its allowed values must differ from one another sufficiently that the expected errors cannot transform one allowed value into another allowed value of the code.
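The requirement that allowed values be "sufficiently different" is usually quantified as the minimum Hamming distance d of the code set: every pattern of up to d - 1 errors is detectable, and every pattern of up to (d - 1) // 2 errors is correctable. The sketch below computes these figures for two small code sets chosen only for illustration; the function names are hypothetical.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))

def capability(code: list[str]) -> tuple[int, int, int]:
    """Return (minimum distance, guaranteed detectable errors, guaranteed correctable errors)."""
    d = min(hamming_distance(a, b) for a, b in combinations(code, 2))
    return d, d - 1, (d - 1) // 2

print(capability(["000", "111"]))                 # (3, 2, 1): triple repetition code
print(capability(["000", "011", "101", "110"]))   # (2, 1, 0): 2 data bits + even parity bit
```

The second set is a single parity check code, which matches the earlier observation that such a code detects one error but cannot locate it.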
A cyclic redundancy code (CRC) consists of strings of binary data that are evenly divisible, using modulo-2 arithmetic, by a generator polynomial. The generator polynomial is selected so that the resulting code values differ enough from one another to achieve a low probability of an undetected error. To determine what to append to the original data string, the string is divided by the generator polynomial as it is being transmitted. When the last data bit has passed, the remainder from the division is the string to be appended, since the original string followed by this remainder is evenly divisible by the generator polynomial. Because the generator polynomial is of a known, fixed length, the remainder appended to the original string is also of fixed length, namely one bit shorter than the generator.
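The transmitter-side division can be sketched as modulo-2 long division on bit strings, in which subtraction is EXCLUSIVE OR with no carries or borrows. The generator 1011 (x^3 + x + 1) and the data string are assumptions chosen for illustration, and the helper names are hypothetical. Appending r zero bits before dividing is what makes the data followed by the remainder evenly divisible.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Long division over GF(2); subtraction is EXCLUSIVE OR."""
    r = len(generator) - 1
    work = list(bits)
    for i in range(len(work) - r):
        if work[i] == "1":                       # divide only where the leading bit is 1
            for j, g in enumerate(generator):
                work[i + j] = "0" if work[i + j] == g else "1"   # XOR with the generator
    return "".join(work[-r:])                    # the last r bits are the remainder

def crc_encode(data: str, generator: str) -> str:
    """Append r = len(generator) - 1 zeros, divide, and append the remainder to the data."""
    r = len(generator) - 1
    return data + mod2_remainder(data + "0" * r, generator)

code_word = crc_encode("11010011101100", "1011")
print(code_word)                                 # 11010011101100100
print(mod2_remainder(code_word, "1011"))         # 000: evenly divisible
```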
At the receiver, the incoming string is divided by the generator polynomial. If the incoming string does not divide evenly, an error is assumed to have occurred. If it does divide evenly, the data delivered to the ultimate destination is the incoming data with the fixed-length remainder field removed.
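The receiver-side check described above can be sketched as follows, reusing the illustrative generator 1011 and the code word 11010011101100100 (data 11010011101100 followed by remainder 100). `crc_receive` is a hypothetical name. A nonzero remainder flags an error; otherwise the fixed-length check field is stripped before delivery.

```python
from typing import Optional

def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of modulo-2 (EXCLUSIVE OR) division of a bit string by the generator."""
    r = len(generator) - 1
    work = [int(b) for b in bits]
    gen = [int(g) for g in generator]
    for i in range(len(work) - r):
        if work[i]:
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(map(str, work[-r:]))

def crc_receive(received: str, generator: str) -> Optional[str]:
    """Return the data with the remainder field removed, or None if the division is not even."""
    r = len(generator) - 1
    if "1" in mod2_remainder(received, generator):
        return None                       # nonzero remainder: an error is assumed
    return received[:-r]                  # strip the fixed-length check field

good = "11010011101100100"
bad = good[:5] + ("0" if good[5] == "1" else "1") + good[6:]   # one bit flipped in transit
print(crc_receive(good, "1011"))          # 11010011101100
print(crc_receive(bad, "1011"))           # None
```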
A longitudinal redundancy code (LRC) is a special case of CRC in which the generator polynomial is chosen so that the check value equals the bit-by-bit EXCLUSIVE OR of all the words in the data stream. If the data stream were represented as a succession of n-bit words, for example, the LRC added to the end of the stream would equal the first word EXCLUSIVE ORed with the second, EXCLUSIVE ORed with the third, and so on; for n-bit words this corresponds to the generator polynomial x^n + 1. When the check is made at the receiver by EXCLUSIVE ORing all the received words, including the LRC, the result is zero if no errors occurred. This is simply because the EXCLUSIVE OR of any value with itself is zero.
A multiple memory error correction technique is shown in J. Datres, et al., "Multiple Memory Error Correction", IBM Technical Disclosure Bulletin, Vol. 24, No. 6, November 1981. This system first detects an error and then stores the erroneous double word back in memory in its complemented form. The double word is then fetched from memory again, the newly fetched double word is complemented, and the ECC check syndrome is examined. Finally, the recomplemented data is stored back into memory.
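Returning to the longitudinal redundancy code described above, the sketch below assumes a stream of hypothetical 8-bit words. It computes the LRC as the EXCLUSIVE OR of all words, confirms that the receiver's check over the words plus the LRC is zero, and, as a cross-check on the statement that LRC is a special case of CRC, shows that modulo-2 division by x^8 + 1 (bit string 100000001) yields the same value. The helper names are illustrative only.

```python
from functools import reduce
from operator import xor

def lrc(words: list[int]) -> int:
    """Longitudinal redundancy check: bitwise EXCLUSIVE OR of every word in the stream."""
    return reduce(xor, words, 0)

def crc_remainder(bits: str, generator: str) -> str:
    """CRC check field: append zeros, then divide modulo 2 by the generator."""
    r = len(generator) - 1
    work = [int(b) for b in bits + "0" * r]
    for i in range(len(work) - r):
        if work[i]:
            for j, g in enumerate(generator):
                work[i + j] ^= int(g)
    return "".join(map(str, work[-r:]))

stream = [0b10110010, 0b01101100, 0b11100001]   # hypothetical 8-bit words
check = lrc(stream)                              # appended to the end of the stream
print(format(check, "08b"))                      # 00111111
print(lrc(stream + [check]))                     # 0: receiver check with no errors

bits = "".join(format(w, "08b") for w in stream)
print(crc_remainder(bits, "100000001"))          # 00111111: same value as the LRC
```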
U.S. Pat. No. 4,163,147, issued to Scheuneman, et al., also discloses a double bit error correction system using double bit complementing.
Both of the references hereinabove cited have disadvantages. One disadvantage is that memory access time, which is normally slower than CPU time, is consumed by the complementing, storing and restoring operations. Another disadvantage of both of the above-mentioned systems is that the error correction technique is reliable only when two errors occur, one being a so-called hard error, induced by media defects, mechanical nonlinearities and the like, and the other being a soft error, induced by random noise, correlated noise and the like. That is, these systems are reliable if, and only if, one of the two erroneous bits is due to memory failure. If both erroneous bits are due to hard causes, or if both are due to soft causes, these detection/correction systems fail. Moreover, errors unrelated to memory devices can occur while data is transferred over electrical lines. Thus, a further disadvantage of the above-mentioned references is that transferring the data, and then its complemented form, back and forth to memory may itself generate more errors.
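Why the complement-and-retry approach depends on one of the two errors being hard can be seen in a simplified model. The sketch assumes a hypothetical 8-bit memory word whose hard fault is a single stuck-at bit, and soft errors that flip a stored bit without damaging the cell; `MemoryWord`, `complement_retry` and `error_bits` are illustrative names, not taken from the cited references. A stuck bit refuses to take the complemented value, so recomplementing cancels it, whereas soft errors are faithfully stored back and survive. Note also the extra write and read cycles the procedure consumes.

```python
from typing import Dict, List, Optional

WIDTH = 8                        # hypothetical word width for the demonstration
FULL = (1 << WIDTH) - 1

class MemoryWord:
    """Hypothetical memory location; `stuck` maps each hard-faulted bit to the value it always holds."""

    def __init__(self, stuck: Optional[Dict[int, int]] = None):
        self.stuck = stuck or {}
        self.cells = 0

    def write(self, word: int) -> None:
        for bit, value in self.stuck.items():   # a hard fault overrides whatever is written
            word = (word | (1 << bit)) if value else (word & ~(1 << bit))
        self.cells = word & FULL

    def upset(self, bit: int) -> None:
        self.cells ^= 1 << bit                  # soft error: stored bit flips, cell stays healthy

    def read(self) -> int:
        return self.cells

def complement_retry(mem: MemoryWord, fetched: int) -> int:
    """Store the erroneous word back complemented, fetch it again, and recomplement it."""
    mem.write(~fetched & FULL)
    return ~mem.read() & FULL

def error_bits(expected: int, actual: int) -> List[int]:
    """Bit positions in which two words differ."""
    return [i for i in range(WIDTH) if (expected ^ actual) >> i & 1]

data = 0b10110010

# One hard error (bit 1 stuck at 0) plus one soft error (bit 6).
mem = MemoryWord(stuck={1: 0})
mem.write(data)
mem.upset(6)
fetched = mem.read()
print(error_bits(data, fetched))                          # [1, 6]: double error
print(error_bits(data, complement_retry(mem, fetched)))   # [6]: single error left for the ECC

# Two soft errors (bits 2 and 6): complementing removes neither.
mem = MemoryWord()
mem.write(data)
mem.upset(2)
mem.upset(6)
print(error_bits(data, complement_retry(mem, mem.read())))  # [2, 6]: still a double error
```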
U.S. Pat. No. 4,397,022, issued to Weng, et al., discloses a weighted erasure codec for a Golay code. This system uses a pair of read only memories (ROM's) to store the most likely 12-bit error patterns corresponding to each syndrome. Such a system is inherently expensive because of the very large number of error patterns that may occur, which requires a correspondingly large memory capacity for the look-up tables.
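The general idea of table-driven syndrome decoding, and why its storage cost grows quickly, can be sketched with a deliberately tiny code. This is not the Weng codec; it assumes the Hamming (7,4) code, with bit k-1 of an integer holding position k. The table needs one entry per possible syndrome, that is 2**r entries for r check bits, so every added check bit doubles its size. The names `syndrome`, `build_table` and `correct` are hypothetical.

```python
def syndrome(word: int) -> int:
    """Hamming (7,4) syndrome: XOR of the positions (1..7) of all 1 bits."""
    s = 0
    for pos in range(1, 8):
        if word >> (pos - 1) & 1:
            s ^= pos
    return s

def build_table() -> dict[int, int]:
    """Map each syndrome to its most likely error pattern (no error or one flipped bit)."""
    table = {0: 0}
    for pos in range(1, 8):
        pattern = 1 << (pos - 1)
        table[syndrome(pattern)] = pattern
    return table

TABLE = build_table()

def correct(received: int) -> int:
    """Look up the error pattern for the syndrome and cancel it with EXCLUSIVE OR."""
    return received ^ TABLE[syndrome(received)]

code_word = 0b1100110          # positions 2, 3, 6, 7 set: the code word 0110011
print(len(TABLE))              # 8 = 2**3 entries for 3 check bits
print(correct(code_word ^ 0b0010000) == code_word)   # True: error at position 5 removed
```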
U.S. Pat. No. 4,330,860, issued to Wada, et al., discloses an error correction scheme requiring two types of check codes, P and Q. Together, these two check codes produce a large number of check bits that must be stored and processed in the course of data operations. Because these many check bits require a relatively large memory, the system with which the scheme is used becomes more costly as larger data words are handled.
It would be advantageous for a system not only to detect single and multiple errors during data transfer, but also to correct single and double bit errors.
It would also be advantageous for an error correcting system to minimize the amount of time required for memory operations.
It would also be advantageous to detect and correct one or two errors in a data message that may have occurred due to hardware malfunction alone, extraneous causes other than hardware malfunction, or a combination of both hard and soft causes.
It would also be advantageous to minimize data transfer operations during error detection and/or correction in order to reduce the probability of extraneous data errors occurring.
It would also be advantageous to minimize the number and size of error patterns required as look-up tables during the course of error detection/correction operations, thus reducing memory capacity required therefor.
Finally, it would be advantageous to use an error correction system requiring a minimum number of check bits in order to reduce memory capacity, processing time and probability of further error during data manipulation.