1. Field of the Invention
This invention relates to electronic circuits for ensuring data integrity in data storage or data communication applications; in particular, this invention relates to minimizing the probability of miscorrection in applying error correction and error detection codes to data storage and data communication applications.
2. Discussion of the Related Art
Error correction and error detection codes have been used extensively in data communication and data storage applications. In a data communication application, data is encoded prior to transmission, and decoded at the receiver. In a data storage application, data is encoded when stored in a storage device, e.g. a disk drive, and decoded when retrieved from the storage device. For the present discussion, it is unnecessary to distinguish between these applications. Hence, although the remainder of this description describes a data storage-retrieval system, the principles discussed herein are equally applicable to a data communication application.
In a typical application of error detection and correction codes, data symbols are stored in blocks, which each include a selected number of special symbols, called check symbols. A symbol may consist of a single bit or multiple bits. The check symbols in each block represent redundant information concerning the data stored in the block. When decoding the blocked data, the check symbols are used to detect both the presence and the locations of errors and, in some instances, to correct these errors. The theory and applications of error correction codes are described extensively in the literature. For example, the texts (i) "Error-Correcting Codes", Second Edition, by W. Wesley Peterson and E. J. Weldon, published by the MIT Press, Cambridge, Mass. (1972), and (ii) "Practical Error Correction Design for Engineers", revised second edition, by N. Glover and T. Dudley, published by Cirrus Logic, Colorado (1991), are well-known to those skilled in the art.
In a typical application of error correction codes, the input data is divided into fixed-length blocks ("code words"). Each code word consists of n symbols, of which a fixed number k are data symbols, and the remaining (n-k) symbols are check symbols. (For convenience, in this description, such a code is referred to as an (n, k) code.) As mentioned above, the check symbols represent redundant information about the code word and can be used to provide error correction and detection capabilities. Conceptually, each data or check symbol of such a code word represents a coefficient of a polynomial of order (n-1). In the error correcting and detecting codes of this application, the check symbols are the coefficients of the remainder polynomial generated by dividing the order (n-1) polynomial by an order (n-k) "generator" polynomial over a Galois field [1]. For an order (n-1) polynomial divided by an order (n-k) polynomial, the remainder polynomial is of order (n-k-1). Typically, in a data storage application, both the data symbols and the check symbols are stored.
Footnote 1: For a discussion of Galois fields, the reader is directed to §6.5 in the aforementioned text "Error-Correcting Codes" by W. Peterson and E. Weldon Jr.
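The division that produces the check symbols can be illustrated with a minimal sketch. It assumes single-bit symbols, a binary (7, 4) code, and the generator polynomial X^3 + X + 1; none of these choices comes from the description above, and the helper names `poly_mod` and `encode` are hypothetical. The sketch also assumes the conventional systematic construction, in which the data polynomial is first multiplied by X^(n-k) so that the remainder occupies the low-order positions.

```python
def poly_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2).

    Polynomials are packed into integers: bit i is the coefficient of X^i.
    """
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        dividend ^= divisor << shift  # subtract (XOR) the aligned divisor
    return dividend


def encode(data: int, n: int, k: int, gen: int) -> int:
    """Systematic (n, k) encoding: k data symbols followed by (n - k) check symbols."""
    assert data.bit_length() <= k
    assert gen.bit_length() == n - k + 1   # generator is of order (n - k)
    shifted = data << (n - k)              # data * X^(n-k), order at most (n - 1)
    check = poly_mod(shifted, gen)         # remainder, order at most (n - k - 1)
    return shifted | check


GEN = 0b1011  # X^3 + X + 1 (assumed generator, for illustration only)
codeword = encode(0b1101, n=7, k=4, gen=GEN)
print(f"{codeword:07b}")  # data bits followed by three check bits
```

In this sketch the resulting code word is itself divisible by the generator polynomial, which is the property the decoder relies on.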
During decoding, both data symbols and check symbols are read from the storage medium, and one or more "syndromes" are computed from the code word (i.e. the data and the check symbols) retrieved. A syndrome is a characteristic value computed from a remainder polynomial, which is obtained by dividing the code word retrieved by the generator polynomial. Ideally, if no error is encountered during the decoding process, all computed syndromes are zero [2]. A non-zero syndrome indicates that one or more errors exist in the code word. Depending on the nature of the generator polynomial, the encountered error may or may not be correctable. If the generator polynomial can be factorized, a syndrome computed from the remainder polynomial obtained by dividing the retrieved code word by one of the factors of the generator polynomial is called a "partial syndrome".
Footnote 2: In some applications, e.g. in certain cyclic redundancy check schemes, a non-zero characteristic number results when no error is encountered. Without loss of generality, a syndrome of zero is assumed when no detectable error is encountered.
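The syndrome computation described above may be sketched as follows. The sketch assumes single-bit symbols and a binary (7, 4) code with generator polynomial X^3 + X + 1; the stored code word 1101001 is a valid code word of that assumed code. These values and the helper name `poly_mod` are illustrative assumptions, not part of the description.

```python
def poly_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2); bit i is the coefficient of X^i."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        dividend ^= divisor << shift
    return dividend


GEN = 0b1011        # X^3 + X + 1 (assumed generator)
stored = 0b1101001  # a valid code word of the assumed (7, 4) code

# No error: the retrieved code word divides evenly, so the syndrome is zero.
syndrome_clean = poly_mod(stored, GEN)

# Single-bit error at position 2: the remainder is non-zero.
retrieved = stored ^ (1 << 2)
syndrome_bad = poly_mod(retrieved, GEN)

print(syndrome_clean, syndrome_bad)
```

Because the division is linear, the non-zero syndrome depends only on the error pattern and not on the stored data, which is what allows a decoder to locate errors from the syndrome alone.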
A useful measure for quantifying the difference between two binary words is the "Hamming distance", which is defined as the number of bit positions at which the two words differ. For example, the 4-bit words '0000' and '0010' have between them a Hamming distance of one, since these words differ only at the second least significant bit. In the design of error correction codes or error detection codes, a measure called the "minimum distance" can be defined. The minimum distance is the minimum Hamming distance between valid code words. In such an error correction or detection code, if the valid code word and the retrieved code word have a Hamming distance between them which is less than the minimum distance, the probability is high that the retrieved code word results from errors occurring in the valid code word at the bit positions defining the Hamming distance between these words. Thus, in error correction codes, the decoder can resolve, with minimum risk of miscorrecting the retrieved code word, in favor of the valid code word having the smallest Hamming distance from the retrieved code word [3]. Of course, if the minimum distance is an even integer, ambiguity for error correction exists when a retrieved code word is the same Hamming distance from two or more valid code words. For an error correcting code with a minimum distance d, the numbers of errors which can be corrected and detected are ##EQU1## respectively. The concepts of the Hamming distance and the minimum distance can be extended to multi-bit symbols. In such an extension, the Hamming distance and the minimum distance are each expressed in symbol units. For example, in an error correction code using multi-bit symbols, the Hamming distance is defined as the number of symbol positions at which two code words differ. In such an error correction code, two code words differ by a distance of one if they differ at one symbol position, regardless of the number of bit positions at which these code words differ within the corresponding symbols at that symbol position.
Footnote 3: The probability of miscorrection is given by: ##EQU2##
Footnote 4: Here, the symbol $\lfloor x \rfloor$ denotes the floor function of x.
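The distance measures above can be made concrete with a short sketch. The function names and the toy code (a 5-bit repetition code) are hypothetical. The correction bound used at the end, t = floor((d - 1) / 2), is the standard coding-theory result and is stated here as an assumption, since the equations of the description are not reproduced.

```python
from itertools import combinations


def bit_distance(a: int, b: int) -> int:
    """Hamming distance in bit units: number of bit positions that differ."""
    return bin(a ^ b).count("1")


def symbol_distance(a: bytes, b: bytes) -> int:
    """Hamming distance in symbol units, using 8-bit symbols."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code: list[int]) -> int:
    """Smallest bit-wise Hamming distance between any two distinct valid code words."""
    return min(bit_distance(a, b) for a, b in combinations(code, 2))


print(bit_distance(0b0000, 0b0010))                  # the example from the text
print(symbol_distance(b"\x00\x00", b"\xff\x00"))     # 8 bits differ, but only 1 symbol

code = [0b00000, 0b11111]     # hypothetical toy code (5-bit repetition code)
d = minimum_distance(code)
t = (d - 1) // 2              # assumed standard bound on correctable errors

# Nearest-code-word decoding of a word containing two bit errors.
retrieved = 0b00110
decoded = min(code, key=lambda cw: bit_distance(cw, retrieved))
```

With d = 5 in this toy code, a retrieved word with two bit errors is still closer to the original code word than to any other, so nearest-code-word decoding recovers it.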
One goal in designing error correction and error detection codes is the selection of an appropriate minimum distance. In general, the minimum distance of an error correction code can be increased by increasing the number of check symbols in a code word. At the same time, however, increasing the number of check symbols in a code word increases both the complexity of the necessary decoding logic and the overhead cost (in terms of decoding time and storage space) associated with the additional check symbols. The capability of an error correction or detection code is sometimes characterized by the size of the maximum error burst the code can correct or detect. For example, a convenient capability measure is the "single error burst correction" capability, which characterizes the code by the maximum length of consecutive error bits the code can correct, as measured from the first error bit to the last error bit, if a single error burst occurs within a code word. Another example of a capability measure would be the "double error burst detection" capability, which characterizes the error correction or error detection code by the maximum length of each error burst the code can detect, given that two or fewer error bursts occur within a code word.
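The burst length measure described above, counted from the first error bit to the last error bit, can be sketched as follows. The function name `burst_length` and the sample bit patterns are hypothetical and serve only to illustrate the measurement.

```python
def burst_length(error_pattern: int) -> int:
    """Length of a single error burst: span from first to last error bit, inclusive.

    `error_pattern` has a 1 at every bit position where the written and read
    words differ. Bits inside the span that happen to be correct still count.
    """
    if error_pattern == 0:
        return 0
    lowest = (error_pattern & -error_pattern).bit_length() - 1  # first error bit
    highest = error_pattern.bit_length() - 1                    # last error bit
    return highest - lowest + 1


written = 0b1011_0010_1100
read = written ^ 0b0001_0100_0000   # errors at bit positions 6 and 8 only
print(burst_length(written ^ read))
```

Here only two bits are in error, but the burst spans three bit positions because the correct bit between them lies inside the burst.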
A well-known class of error correcting codes is the Reed-Solomon codes, which are characterized by the generator polynomial G(X), given by:
$$G(X) = (X+\alpha^{j})(X+\alpha^{j+1})(X+\alpha^{j+2}) \cdots (X+\alpha^{j+i-1})(X+\alpha^{j+i})$$
where $\alpha$ is a primitive element of GF($2^m$), and i and j are integers.
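As an illustration of the generator polynomial above, the following sketch multiplies out the factors over a small field. It assumes m = 4 and the primitive polynomial X^4 + X + 1 for constructing GF(2^4); neither choice is specified in the description, and the names `gf_mul`, `generator_poly`, and `poly_eval` are hypothetical.

```python
# GF(2^4) built from the primitive polynomial X^4 + X + 1 (an assumed choice).
M = 4
PRIM = 0b10011
ORDER = (1 << M) - 1          # multiplicative order of alpha: 15

EXP = [0] * (2 * ORDER)       # EXP[e] = alpha^e, doubled to avoid a modulo in gf_mul
LOG = [0] * (1 << M)          # LOG[alpha^e] = e
x = 1
for e in range(ORDER):
    EXP[e] = x
    LOG[x] = e
    x <<= 1                   # multiply by alpha
    if x & (1 << M):
        x ^= PRIM             # reduce modulo the primitive polynomial
for e in range(ORDER, 2 * ORDER):
    EXP[e] = EXP[e - ORDER]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(j: int, i: int) -> list[int]:
    """Coefficients of G(X), lowest order first, with roots alpha^j .. alpha^(j+i)."""
    g = [1]
    for e in range(j, j + i + 1):
        root = EXP[e % ORDER]
        nxt = [0] * (len(g) + 1)
        for idx, c in enumerate(g):
            nxt[idx] ^= gf_mul(c, root)   # c * alpha^e term
            nxt[idx + 1] ^= c             # c * X term
        g = nxt
    return g


def poly_eval(p: list[int], x: int) -> int:
    """Evaluate a polynomial (lowest order first) at x by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


G = generator_poly(j=0, i=3)  # (X+1)(X+alpha)(X+alpha^2)(X+alpha^3)
```

Because addition and subtraction coincide in GF(2^m), each factor (X + alpha^e) vanishes at X = alpha^e, so evaluating G at any of the chosen powers of alpha gives zero.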
Because errors often occur in bursts, a technique called "interleaving" is often used to spread the consecutive error bits or symbols into different "interleaves", which can each be corrected individually. Interleaving is achieved by creating a code word of length nw from w code words of length n. In one method for forming the new code word, the first w symbols of the new code word are provided by the first symbols of the w code words taken in a predetermined order. In the same predetermined order, the next symbol in each of the w code words is selected to be the next symbol in the new code word. This process is repeated until the last symbol of each of the w code words is selected in the predetermined order into the new code word. Another method to create a w-way interleaved code is to replace a generator polynomial G(X) of an (n, k) code by the generator polynomial $G(X^w)$. This technique is applicable, for example, to the Reed-Solomon codes mentioned above. Using this new generator polynomial $G(X^w)$, the resulting (nw, kw) code has the error correcting and detecting capability of the original (n, k) code in each of the w interleaves so formed.
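The first interleaving method described above can be sketched as follows. The sketch assumes 8-bit symbols held in `bytes` objects and takes list order as the predetermined order; the function names and sample data are hypothetical.

```python
def interleave(codewords: list[bytes]) -> bytes:
    """Form one code word of length n*w from w code words of length n.

    Symbols are taken round-robin: the first symbol of each code word in
    list order, then the second symbol of each, and so on.
    """
    n = len(codewords[0])
    assert all(len(cw) == n for cw in codewords)
    return bytes(cw[pos] for pos in range(n) for cw in codewords)


def deinterleave(block: bytes, w: int) -> list[bytes]:
    """Recover the w interleaves: interleave i holds every w-th symbol from offset i."""
    assert len(block) % w == 0
    return [block[i::w] for i in range(w)]


words = [b"AAAA", b"BBBB", b"CCCC"]
block = interleave(words)

# A burst of three consecutive symbol errors in the interleaved block.
corrupted = bytearray(block)
corrupted[4:7] = b"???"

received = deinterleave(bytes(corrupted), w=3)
errors_per_interleave = [
    sum(a != b for a, b in zip(r, o)) for r, o in zip(received, words)
]
print(errors_per_interleave)
```

In this example the three-symbol burst lands as a single symbol error in each of the three interleaves, so each interleave needs only single-error correction capability.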
Another approach used in the prior art for error detection is the use of cyclic redundancy check (CRC) symbols. Under this approach, a CRC checksum, which is the value of a polynomial function applied to the input data, is computed. This CRC checksum is transmitted or stored with the data and is recomputed at the receiver or upon retrieval. If no error is encountered, the recomputed CRC checksum matches the retrieved or received CRC checksum. Otherwise, one or more errors exist in the data or in the retrieved CRC checksum. Unlike error correction codes, the CRC checksum does not pinpoint where the error or errors occur, and thus does not provide the capability for correcting errors.
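The store-and-recompute procedure described above can be sketched with the CRC-32 function in Python's standard `zlib` module. CRC-32 is an assumed stand-in, since the description does not name a particular CRC polynomial, and the functions `store` and `retrieve` are hypothetical.

```python
import zlib


def store(data: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the data before storing it."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def retrieve(record: bytes) -> tuple[bytes, bool]:
    """Split a record into data and checksum, and recompute the CRC to compare."""
    data, stored_crc = record[:-4], int.from_bytes(record[-4:], "big")
    return data, zlib.crc32(data) == stored_crc


record = store(b"sector payload")
data, ok = retrieve(record)

# Flip one bit in the stored data: the mismatch flags an error but not its location.
damaged = bytearray(record)
damaged[3] ^= 0x01
_, ok_damaged = retrieve(bytes(damaged))
print(ok, ok_damaged)
```

The comparison yields only a pass or fail result, which reflects the point above that a CRC checksum detects errors without indicating where they occurred.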
In a practical implementation, the error correction capability of an error correction code is often not fully exploited, because the complexity of the logic circuits required for error location and error correction can become prohibitive even for a relatively small minimum distance. It is, however, desirable to exploit the full capability of error detection, even though the full capability of error correction is not exploited. Indeed, in some applications, the consequence of a miscorrection is extremely undesirable. In those applications, an error is merely flagged, so that some other error recovery procedure may take over. For example, in a disk drive application, during data read, noise may cause a transient error in the magnetic head mechanism. Such an error is best handled by re-reading the sector in which the error occurred, rather than by attempting to correct the faulty data. In such an application, the additional cost of exploiting the full capability of the error correction code may not be economically justified.