The importance of error correction coding of data in digital computer systems has increased greatly as the density of the data recorded on mass storage media, particularly disks, has increased. With higher recording densities, a tiny imperfection in the recording surface of a disk can corrupt a large amount of data. To avoid losing that data, error correction codes ("ECCs") are employed to, as the name implies, correct the erroneous data.
Before a string of data symbols is written to a disk, it is mathematically encoded to form ECC symbols. The ECC symbols are then appended to the data string to form code words (data symbols plus ECC symbols), and the code words are written to, or stored on, the disks. When data is to be read from the disks, the code words containing the data symbols to be read are retrieved from the disks and mathematically decoded. During decoding, any errors in the data are detected and, if possible, corrected through manipulation of the ECC symbols. (For a detailed description of decoding, see Peterson and Weldon, Error-Correcting Codes, 2d Edition, MIT Press, 1972.)
Stored digital data can contain multiple errors. One of the most effective types of ECC for correcting multiple errors is a Reed-Solomon code. (For a detailed description of Reed-Solomon codes, see Peterson and Weldon, Error-Correcting Codes.) To correct multiple errors in strings of data symbols, Reed-Solomon codes efficiently and effectively exploit the mathematical properties of sets of symbols known as Galois Fields, represented "GF(P^q)", where "P" is a prime number and "q" can be thought of as the number of digits, base P, in each element or symbol of the field. "P" usually has the value 2 in digital computer applications, so "q" is the number of bits in each symbol.
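To make the Galois Field arithmetic concrete, the following minimal Python sketch implements GF(2^8), the P = 2, q = 8 case. It assumes the field-generator polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice for Reed-Solomon codes that the text itself does not specify; the function names are hypothetical. Addition is bitwise XOR, and multiplication is carry-less shift-and-add, reduced modulo the polynomial.

```python
PRIM_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1 (bit i = coefficient of x^i)
Q = 8              # bits per symbol


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^q): bitwise XOR, no carries."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication, reducing modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & (1 << Q):         # degree reached q: subtract the polynomial
            a ^= PRIM_POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """a raised to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def multiplicative_order(a: int) -> int:
    """Smallest n > 0 with a^n == 1; a must be nonzero."""
    x, n = a, 1
    while x != 1:
        x = gf_mul(x, a)
        n += 1
    return n


if __name__ == "__main__":
    print(hex(gf_add(0x57, 0x83)))     # 0xd4
    print(hex(gf_mul(0x02, 0x80)))     # 0x1d (x^8 reduced modulo the polynomial)
    print(multiplicative_order(0x02))  # 255 = 2^8 - 1
```

Under this polynomial the element 0x02 cycles through all 255 nonzero field elements before returning to 1, which is consistent with the P^q - 1 limit on cyclic code word length. Every nonzero element a also has the inverse a^254.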
The number of symbols that an ECC based on a Reed-Solomon code can effectively encode and correct, or "protect," is limited by the size of the Galois Field selected, i.e. P^q symbols, and by the maximum number of errors that the code is to be capable of correcting. The maximum length of a cyclic Reed-Solomon code word over GF(P^q) is P^q - 1 symbols. Thus the maximum number of data symbols that can be protected by the ECC, that is, included in the code word, is P^q - 1 minus "e," where "e" is the number of ECC symbols. The larger the Galois Field, the longer the code word, and the more data the ECC can protect for a given maximum number of errors to be corrected.
While larger Galois Fields could be used to protect longer strings of data symbols, using Galois Fields whose code word symbols are longer than eight bits and not a multiple of eight bits long complicates the circuitry of the system. The remainder of the system operates with 8-bit symbols, or bytes, or with symbols whose length is a multiple of eight bits. Accordingly, if the ECC uses symbols that are longer than eight bits, the system must include an interface to translate symbols between the 8-bit symbols used by the remainder of the system and the longer symbols used by the ECC circuitry.
An ECC based on GF(2^8) can protect a string of up to 253 8-bit data symbols, or "data bytes," against a single error if two 8-bit ECC symbols are appended to the data, making the code word 255, or 2^8 - 1, bytes long. If the ECC is to correct more than one error, more ECC symbols, two for each additional error to be corrected, must be used in the code word. This means that fewer data bytes can be protected for a given code word length.
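The code word arithmetic above can be captured in a small Python helper. It assumes, as the text states, that the code needs two ECC symbols per correctable error (e = 2t); the function name is hypothetical.

```python
def max_data_symbols(q: int, t: int, p: int = 2) -> int:
    """Largest number of data symbols in one cyclic Reed-Solomon code word
    over GF(p^q) that corrects up to t symbol errors with e = 2t ECC symbols."""
    n_max = p ** q - 1   # maximum code word length, P^q - 1
    e = 2 * t            # two ECC symbols per correctable error
    if e >= n_max:
        raise ValueError("code word would contain no data symbols")
    return n_max - e


if __name__ == "__main__":
    for t in range(1, 5):
        print(f"GF(2^8),  t={t}: {max_data_symbols(8, t)} data bytes")
    print(f"GF(2^10), t=1: {max_data_symbols(10, 1)} data symbols")
```

For GF(2^8) this yields 253, 251, 249 and 247 data bytes for t = 1 to 4, so each additional correctable error costs two data bytes per code word.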
Information is often stored on magnetic disks in sectors that are 512 or 576 bytes long. Therefore, ECCs based on GF(2^8) must be interleaved some number of times to protect an entire 512- or 576-byte sector. Interleaving effectively splits the string of data symbols into several smaller segments, i.e. segments of fewer than 255 symbols each, and treats each segment as a stream of data symbols to be encoded. The benefits of interleaving are that it permits a larger number of data symbols to be encoded by a given code, and that it effectively separates bursts of errors by encoding adjacent data symbols in different code words. However, in systems that use interleaving, there is a chance that the error correction actually introduces errors by modifying the data symbols to produce valid, but incorrect, code words. To prevent this "miscorrection," a separate error detection code, or cross-check, is typically used to confirm that the modifications made using the interleaved error correction code produce the correct data symbols.
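Interleaving can be sketched in a few lines of Python. The sketch assumes a simple round-robin layout (byte i goes to interleave i mod d), one common way to place adjacent bytes in different code words; the text does not prescribe a particular layout, and the helper names are hypothetical. The interleave depth is chosen so that no interleave carries more than 253 data bytes, the single-error limit for GF(2^8).

```python
import math
from typing import List


def interleave_depth(sector_len: int, max_data_per_codeword: int) -> int:
    """Fewest interleaves such that no code word carries too many data bytes."""
    return math.ceil(sector_len / max_data_per_codeword)


def interleave(sector: bytes, depth: int) -> List[bytes]:
    """Round-robin split: byte i is assigned to interleave i % depth."""
    return [sector[k::depth] for k in range(depth)]


def deinterleave(parts: List[bytes]) -> bytes:
    """Inverse of interleave(): merge the interleaves back into one sector."""
    depth = len(parts)
    out = bytearray(sum(len(p) for p in parts))
    for k, part in enumerate(parts):
        out[k::depth] = part
    return bytes(out)


def burst_errors_per_interleave(start: int, length: int, depth: int) -> List[int]:
    """How many bytes of a contiguous error burst fall into each interleave."""
    counts = [0] * depth
    for i in range(start, start + length):
        counts[i % depth] += 1
    return counts


if __name__ == "__main__":
    sector = bytes(i % 256 for i in range(512))
    d = interleave_depth(len(sector), 253)         # 3 for a 512-byte sector
    parts = interleave(sector, d)
    print(d, [len(p) for p in parts])              # 3 [171, 171, 170]
    assert deinterleave(parts) == sector
    print(burst_errors_per_interleave(100, 3, d))  # [1, 1, 1]
```

A three-byte burst is thus split into one error per code word, each within the single-error capability of its interleave. A 576-byte sector also needs a depth of three under the same assumption.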
Prior systems use either 8-bit-symbol error detection codes ("EDCs") over GF(2^8) or 16-bit-symbol error detection codes over GF(2^16). The 8-bit-symbol codes are easy to implement but relatively weak, with maximum distances of only two. The more powerful 16-bit-symbol codes, on the other hand, are complex to implement because they require manipulation of 16-bit symbols.
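To illustrate why a distance of two is weak, the simplest such code is sketched below in Python: a single check byte equal to the GF(2^8) sum, i.e. the bytewise XOR, of all data bytes. This particular code is an assumption chosen for illustration, not necessarily the EDC used in the prior systems. Because its distance is two, it detects any single erroneous byte but misses some pairs of errors.

```python
from functools import reduce
from operator import xor


def check_byte(data: bytes) -> int:
    """GF(2^8) sum (bytewise XOR) of all data bytes: a distance-2 check symbol."""
    return reduce(xor, data, 0)


def cross_check_passes(data: bytes, stored_check: int) -> bool:
    """True if the recomputed check byte matches the stored one."""
    return check_byte(data) == stored_check


if __name__ == "__main__":
    data = bytes(range(16))
    stored = check_byte(data)

    one_error = bytearray(data)
    one_error[5] ^= 0x5A                                  # corrupt a single byte
    print(cross_check_passes(bytes(one_error), stored))   # False: detected

    two_errors = bytearray(data)
    two_errors[5] ^= 0x5A                                 # two bytes hit by the
    two_errors[9] ^= 0x5A                                 # same error pattern
    print(cross_check_passes(bytes(two_errors), stored))  # True: missed
```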
An error detection code based on GF(2^10) has sufficient code word length, i.e. 2^10 - 1 or 1023 symbols per code word, to readily cross-check an entire sector. However, encoding and decoding the 10-bit symbols used in a GF(2^10) code present certain problems.
As discussed above, computer transmission and storage hardware is set up for bytes, i.e. 8-bit symbols, or for symbols whose length is a multiple of 8 bits, such as 16-bit symbols. Such hardware is therefore, in general, not arranged to manipulate 10-bit symbols. If a GF(2^10) EDC is to be used, the information has to be translated back and forth between bytes and 10-bit symbols: first for encoding as 10-bit symbols, next for transmission and storage as bytes, and finally for decoding as 10-bit symbols. Translating between bytes and 10-bit symbols at both the encoder and the decoder adds another step, and further complexity, to the EDC cross-check process. Further, since the data are modulated by modulation codes based on 8-bit symbols, using 10-bit symbols for the EDC may result in more errors, because the bits of a single modulation code symbol may be demodulated into more than one EDC symbol. Thus, an erroneous demodulation of one modulation code word symbol may result in two erroneous EDC code word symbols.
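The byte-to-10-bit translation, and the way one bad byte can corrupt two EDC symbols, can be sketched in Python. The sketch assumes bytes are packed most-significant-bit first into consecutive 10-bit symbols, with the last symbol zero-padded, and treats each modulation code symbol as one byte; the text does not specify a packing, and the function names are hypothetical. Because byte and symbol boundaries coincide only every 40 bits (the least common multiple of 8 and 10), three of every five bytes straddle two 10-bit symbols.

```python
from typing import List


def pack_10bit(data: bytes) -> List[int]:
    """Pack bytes MSB-first into 10-bit symbols; the last symbol is zero-padded."""
    symbols: List[int] = []
    acc, nbits = 0, 0
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 10:
            nbits -= 10
            symbols.append((acc >> nbits) & 0x3FF)
        acc &= (1 << nbits) - 1          # keep only bits not yet emitted
    if nbits:
        symbols.append((acc << (10 - nbits)) & 0x3FF)
    return symbols


def unpack_10bit(symbols: List[int], n_bytes: int) -> bytes:
    """Inverse of pack_10bit(): recover the first n_bytes bytes."""
    out = bytearray()
    acc, nbits = 0, 0
    for sym in symbols:
        acc = (acc << 10) | sym
        nbits += 10
        while nbits >= 8 and len(out) < n_bytes:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out)


def symbols_touched(byte_index: int) -> List[int]:
    """Indices of the 10-bit symbols that contain bits of the given byte."""
    first = (8 * byte_index) // 10
    last = (8 * byte_index + 7) // 10
    return list(range(first, last + 1))


if __name__ == "__main__":
    sector = bytes(range(256)) * 2                   # a 512-byte sector
    packed = pack_10bit(sector)
    print(len(packed))                               # 410 symbols, well under 1023
    assert unpack_10bit(packed, len(sector)) == sector

    corrupted = bytearray(sector)
    corrupted[1] ^= 0xFF                             # one erroneous byte
    repacked = pack_10bit(bytes(corrupted))
    diff = [i for i, (a, b) in enumerate(zip(packed, repacked)) if a != b]
    print(diff, symbols_touched(1))                  # [0, 1] [0, 1]
```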
One solution is to generate a code word over GF(2^10) that protects up to 1023 bytes (data and ECC symbols) and yet uses 8-bit symbols, or bytes, as the EDC symbols. In a prior system that uses an ECC over GF(2^10), one or more predetermined pseudo data bytes are appended to the data bytes, and the string made up of the data bytes plus the pseudo data bytes is encoded to produce the desired number of 10-bit ECC symbols. Then two selected bits in each of the 10-bit ECC symbols are compared with a known 2-bit pattern, e.g. "00." If the selected two bits in each of the 10-bit ECC symbols match the 2-bit pattern, the selected bits of the ECC symbols are ignored, or truncated, and the remaining 8 bits of each ECC symbol are concatenated with the data bytes and the appended pseudo data bytes to form the code word. The code word bytes can later be decoded, and any error correction performed, by re-appending the known 2-bit truncation pattern as necessary for Galois Field addition and/or multiplication.
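The truncation test and its inverse can be sketched in Python. The sketch assumes the two "selected" bits are the two most significant bits of each 10-bit ECC symbol and that the truncation pattern is "00"; the text leaves the bit positions open, and all names are hypothetical.

```python
from typing import List, Optional

SELECTED_SHIFT = 8                      # assumed: selected bits are bits 9 and 8
SELECTED_MASK = 0b11 << SELECTED_SHIFT  # 0x300
TRUNCATION_PATTERN = 0b00               # the known 2-bit pattern, e.g. "00"


def matches_pattern(symbol: int) -> bool:
    """True if the selected two bits of a 10-bit symbol equal the pattern."""
    return (symbol & SELECTED_MASK) >> SELECTED_SHIFT == TRUNCATION_PATTERN


def truncate_ecc(ecc_symbols: List[int]) -> Optional[bytes]:
    """Drop the selected bits if every symbol matches; otherwise return None,
    signalling that the pseudo data bytes must be modified first."""
    if not all(matches_pattern(s) for s in ecc_symbols):
        return None
    return bytes(s & 0xFF for s in ecc_symbols)


def restore_ecc(ecc_bytes: bytes) -> List[int]:
    """Re-append the known pattern to recover the 10-bit symbols for decoding."""
    return [(TRUNCATION_PATTERN << SELECTED_SHIFT) | b for b in ecc_bytes]


if __name__ == "__main__":
    print(truncate_ecc([0x0AB, 0x012]))                 # b'\xab\x12'
    print(truncate_ecc([0x0AB, 0x2C5]))                 # None: 0x2C5 has "10"
    print([hex(s) for s in restore_ecc(b"\xab\x12")])   # ['0xab', '0x12']
```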
If any of the selected bits in any of the 10-bit ECC symbols differ from the truncation pattern, the appended pseudo data bytes are modified so that encoding the data bytes plus the modified pseudo data bytes produces 10-bit ECC symbols whose selected bits match the truncation pattern. The selected bits, which now match the known truncation pattern, are then ignored, and the remaining 8 bits of each ECC symbol and the modified pseudo data bytes are stored along with the data bytes as the code word. Again, the modified pseudo data bytes and the remaining 8 bits of the ECC symbols contain all the information necessary to allow decoding and error correction of the code word as bytes.
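The modification step can be illustrated with a small Python sketch. Every specific choice in it is an assumption made for illustration only: a GF(2^10) field built on x^10 + x^3 + 1, a two-symbol Reed-Solomon generator g(x) = (x + α)(x + α^2) with α = x, data bytes treated as 10-bit symbols whose top two bits are zero, the top two bits as the selected bits, and two pseudo data bytes. The pseudo bytes are found here by brute-force search purely to show that a suitable modification exists; the text does not say how the prior system computes it. Because the encoding is linear, each pseudo byte shifts the ECC symbols by a predictable amount, which is what makes such a modification possible.

```python
from typing import List, Optional, Tuple

PRIM_POLY_10 = 0x409   # assumed GF(2^10) polynomial x^10 + x^3 + 1
ALPHA = 0x002          # the field element x
# Assumed generator g(x) = (x + a)(x + a^2) = x^2 + (a + a^2)x + a^3.
G1 = 0x006             # a + a^2
G0 = 0x008             # a^3
SELECTED_MASK = 0x300  # assumed selected bits: the two most significant bits


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^10)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x400:
            a ^= PRIM_POLY_10
    return result


def ecc_symbols(msg: List[int]) -> List[int]:
    """Systematic encoding: remainder of msg(x) * x^2 divided by g(x).
    msg[0] is the highest-degree coefficient."""
    r1 = r0 = 0
    for m in msg:
        fb = m ^ r1                    # feedback term
        r1 = r0 ^ gf_mul(fb, G1)
        r0 = gf_mul(fb, G0)
    return [r1, r0]


def evaluate(poly: List[int], x: int) -> int:
    """Horner evaluation; poly[0] is the highest-degree coefficient."""
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc


def find_pseudo_bytes(data: bytes) -> Optional[Tuple[List[int], List[int]]]:
    """Brute-force two pseudo data bytes so both ECC symbols have selected bits 00."""
    base = list(data)  # data bytes as 10-bit symbols with top bits 0 (assumption)
    for pa in range(256):
        for pb in range(256):
            ecc = ecc_symbols(base + [pa, pb])
            if all((s & SELECTED_MASK) == 0 for s in ecc):
                return [pa, pb], ecc
    return None


if __name__ == "__main__":
    data = b"sector"
    found = find_pseudo_bytes(data)
    assert found is not None
    pseudo, ecc = found
    codeword = list(data) + pseudo + ecc
    # A valid code word has alpha and alpha^2 as roots.
    assert evaluate(codeword, ALPHA) == 0
    assert evaluate(codeword, gf_mul(ALPHA, ALPHA)) == 0
    stored = data + bytes(pseudo) + bytes(s & 0xFF for s in ecc)
    print(pseudo, [hex(s) for s in ecc], len(stored))
```

Under these assumptions the last pseudo byte alone can reach only half of the possible selected-bit patterns, so the second pseudo byte is what guarantees that the search succeeds for any data.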
The prior system is discussed in U.S. Pat. No. 4,856,003, entitled Error Correction Code Encoder, which is assigned to a common assignee. We have devised an error detection/correction system that uses less complex circuitry to manipulate the data in accordance with a generator polynomial g(x) over GF(2^10) and produce code word redundancy symbols in GF(2^8). Such a system can also be used to encode data over selected fields GF(2^(w+i)) and produce redundancy symbols in GF(2^w).