The importance of error correction coding of data in digital computer systems has increased greatly as the density of the data recorded on mass storage media, more particularly disks, has increased. With higher recording densities, a tiny imperfection in the recording surface of a disk can corrupt a large amount of data. In order to avoid losing that data, error correction codes ("ECCs") are employed to, as the name implies, correct the erroneous data.
Before a string of data symbols is written to a disk, it is mathematically encoded to form ECC redundancy symbols. The ECC redundancy symbols are then appended to the data string to form code words (data symbols plus ECC redundancy symbols), and the code words are written to, or stored on, the disks. When data is to be read from the disks, the code words containing the data symbols to be read are retrieved from the disks and mathematically decoded. During decoding, any errors in the data are detected and, if possible, corrected through manipulation of the ECC redundancy symbols [for a detailed description of decoding, see Peterson and Weldon, Error Correction Codes, 2d Edition, MIT Press, 1972].
Stored digital data can contain multiple errors. One of the most effective types of ECC used for the correction of multiple errors is a Reed-Solomon code [for a detailed description of Reed-Solomon codes, see Peterson and Weldon, Error Correction Codes]. To correct multiple errors in strings of data symbols, Reed-Solomon codes efficiently and effectively utilize the various mathematical properties of sets of symbols known as Galois Fields, represented as GF(P^q), where "P" is a prime number and "q" can be thought of as the number of digits, base P, in each element or symbol in the field. "P" usually has the value 2 in digital computer applications and, therefore, "q" is the number of bits in each symbol.
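To make the field arithmetic concrete, the following is a minimal sketch of GF(2^8) (P = 2, q = 8). It assumes the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), which the text does not specify. Each symbol is an 8-bit integer, addition is bitwise XOR, and multiplication is carry-less multiplication reduced modulo that polynomial.

```python
# Minimal GF(2^8) arithmetic sketch (P = 2, q = 8).
# ASSUMPTION: the field is built from the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the text does not name one.
PRIM_POLY = 0x11D
Q = 8

def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^q) is bitwise XOR."""
    return a ^ b

def gf_mul(a: int, b: int) -> int:
    """Carry-less shift-and-add multiply, reduced modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << Q):      # degree reached q: subtract (XOR) the polynomial
            a ^= PRIM_POLY
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

if __name__ == "__main__":
    alpha = 0x02  # the element "x", primitive for this polynomial
    # Every nonzero symbol is a power of alpha: 2^8 - 1 = 255 distinct powers.
    print(len({gf_pow(alpha, i) for i in range(255)}))  # 255
    print(gf_pow(alpha, 255))                           # 1
```

Because one element generates all 255 nonzero symbols, each symbol can also be handled as a power of that element, which is why practical implementations use log and antilog tables for multiplication.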
The number of symbols which an ECC based on a Reed-Solomon code can effectively encode and correct, or "protect," is limited by the size of the Galois Field selected, i.e., P^q symbols, and by the maximum number of errors which the code is to be capable of correcting. The maximum length of a cyclic Reed-Solomon code word for GF(P^q) is P^q - 1 symbols. Thus the maximum number of data symbols which can be protected by the ECC, that is, included in the code word, is P^q - 1 - e symbols, where "e" is the number of ECC redundancy symbols. The larger the Galois Field, the longer the code word, and the more data the ECC can protect for a given maximum number of errors to be corrected.
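The capacity rule above reduces to simple arithmetic. The sketch below assumes the standard Reed-Solomon property that e redundancy symbols correct up to e // 2 symbol errors; the function names are illustrative only.

```python
def max_data_symbols(p: int, q: int, e: int) -> int:
    """Data capacity of one cyclic Reed-Solomon code word over GF(p^q)
    that carries e ECC redundancy symbols."""
    n = p ** q - 1  # maximum code word length, in symbols
    if not 0 <= e < n:
        raise ValueError("e must satisfy 0 <= e < p**q - 1")
    return n - e

def correctable_errors(e: int) -> int:
    """A Reed-Solomon code with e redundancy symbols corrects up to e // 2 symbol errors."""
    return e // 2

if __name__ == "__main__":
    for q, e in [(8, 2), (8, 4), (10, 2)]:
        print(f"GF(2^{q}), e={e}: {max_data_symbols(2, q, e)} data symbols, "
              f"{correctable_errors(e)} correctable error(s)")
```

For GF(2^8) with two redundancy symbols this gives 253 data symbols; moving to GF(2^10) raises the code word length to 1023 symbols, enough to cover a whole 512- or 576-byte sector in one code word.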
While larger Galois Fields could be used to protect larger strings of data symbols, using Galois Fields whose code word symbols have more than eight bits, and a number of bits that is not a multiple of eight, complicates the circuitry of the system. The remainder of the system operates with 8-bit symbols, or bytes, or with symbols whose lengths are multiples of eight bits. Accordingly, if the ECC uses symbols that are longer than eight bits, the system must include an interface to translate the symbols between the 8-bit symbols used by the remainder of the system and the longer symbols used by the ECC circuitry.
An ECC based on GF(2^8) can protect a string of up to 253 8-bit data symbols, or "data bytes," against a single error if two 8-bit ECC redundancy symbols are appended to the data, making the code word 255, or 2^8 - 1, bytes long. If the ECC is to correct more than one error, more ECC redundancy symbols, two for each additional error to be corrected, must be used in the code word. This means that fewer data bytes can be protected for a given length of code word.
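The single-error case can be sketched as a small Reed-Solomon encoder and decoder over GF(2^8). This is an illustration under stated assumptions, not the encoder of any particular system: the primitive polynomial 0x11D, the primitive element alpha = 2, and the generator g(x) = (x + 1)(x + alpha) are all choices made here. The two syndromes S0 = r(1) and S1 = r(alpha) give the error value (S0) and its location (the power j with alpha^j = S1 / S0).

```python
# Single-error-correcting Reed-Solomon sketch over GF(2^8).
# ASSUMPTIONS: primitive polynomial 0x11D, alpha = 2, generator
# g(x) = (x + 1)(x + alpha), so each code word carries two check bytes.
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
v = 1
for i in range(255):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

ALPHA = EXP[1]
G = [1, 1 ^ ALPHA, ALPHA]  # x^2 + (1 + alpha) x + alpha, highest degree first

def encode(data):
    """Append the two check bytes r(x) = d(x) * x^2 mod g(x)."""
    if len(data) > 253:
        raise ValueError("at most 253 data bytes fit in a 255-byte code word")
    buf = list(data) + [0, 0]
    for i in range(len(data)):
        coef = buf[i]
        if coef:
            buf[i + 1] ^= mul(G[1], coef)
            buf[i + 2] ^= mul(G[2], coef)
    return list(data) + buf[-2:]

def evaluate(word, x):
    """Horner evaluation; word[0] is the highest-degree coefficient."""
    acc = 0
    for c in word:
        acc = mul(acc, x) ^ c
    return acc

def correct_single(word):
    s0, s1 = evaluate(word, 1), evaluate(word, ALPHA)  # syndromes
    if s0 == 0 and s1 == 0:
        return list(word)                     # consistent code word
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable: more than one byte in error")
    degree = LOG[div(s1, s0)]                 # alpha^degree = S1 / S0
    index = len(word) - 1 - degree
    if index < 0:
        raise ValueError("uncorrectable: error location outside the code word")
    fixed = list(word)
    fixed[index] ^= s0                        # error value E = S0
    return fixed

if __name__ == "__main__":
    word = encode(list(b"sector data"))
    damaged = list(word)
    damaged[3] ^= 0x5A
    print(bytes(correct_single(damaged)[:-2]))  # b'sector data'
```

With only two syndromes, a double error can produce a plausible but wrong location, which is the kind of miscorrection a separate cross-check is meant to catch.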
Information is often stored on magnetic disks in sectors which are 512 or 576 bytes in length. Therefore, ECCs which are based on GF(2^8) must be interleaved some number of times to protect an entire 512- or 576-byte sector. Interleaving effectively splits the string of data symbols into several smaller segments, i.e., segments of fewer than 255 symbols each, and treats each segment as a separate stream of data symbols to be encoded. The benefits of interleaving are that it permits a larger number of data symbols to be encoded by a given code, and that it effectively separates bursts of errors by encoding adjacent data symbols in different code words. However, in systems that use interleaving, there is a chance that the error correction actually introduces errors by modifying the data symbols to produce valid, but incorrect, code words. To prevent this "miscorrection," a separate error detection code, or cross-check, is typically used to ensure that the modifications made using the interleaved error correction code produce the correct data symbols.
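A minimal sketch of byte-wise interleaving follows, assuming byte i of the sector is assigned to segment i mod k and that each segment must fit the 253-data-byte limit of a single-error GF(2^8) code word. Both choices are illustrative; other interleave layouts exist.

```python
import math

MAX_DATA_PER_WORD = 253  # GF(2^8) code word with two check bytes (single-error case)

def interleave_count(sector_len: int, max_data: int = MAX_DATA_PER_WORD) -> int:
    """Fewest interleaves that keep every segment within one code word."""
    return math.ceil(sector_len / max_data)

def interleave(sector: list, k: int) -> list:
    """Byte i of the sector goes to segment i mod k."""
    return [sector[s::k] for s in range(k)]

def deinterleave(segments: list) -> list:
    k = len(segments)
    out = [0] * sum(len(seg) for seg in segments)
    for s, seg in enumerate(segments):
        out[s::k] = seg
    return out

if __name__ == "__main__":
    sector = list(range(256)) * 2                  # a 512-byte sector
    k = interleave_count(len(sector))
    segments = interleave(sector, k)
    print(k, [len(seg) for seg in segments])       # 3 [171, 171, 170]
    burst = range(100, 103)                        # three adjacent bad bytes
    print(sorted({i % k for i in burst}))          # [0, 1, 2]: one error per code word
    assert deinterleave(segments) == sector
```

A three-byte burst therefore becomes one single-byte error in each of three code words, each within the correction capability of a single-error code.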
In prior systems, 8-bit-symbol error detection codes ("EDCs") over GF(2^8) or 16-bit-symbol error detection codes over GF(2^16) are used. The 8-bit-symbol codes are easy to implement but are relatively weak, with maximum distances of only two. The more powerful 16-bit-symbol codes, on the other hand, are complex to implement, requiring manipulation of 16-bit symbols.
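To show what a distance of two means in practice, consider the simplest 8-bit-symbol check: one check byte equal to the GF(2^8) sum (bytewise XOR) of the protected bytes. This is an assumed example for illustration, not necessarily the code used in those prior systems. It detects every single-byte error but misses any two errors with equal values.

```python
from functools import reduce
from operator import xor

def xor_check_byte(data: bytes) -> int:
    """Single check symbol: the GF(2^8) sum (bytewise XOR) of all data bytes."""
    return reduce(xor, data, 0)

def passes(data: bytes, check: int) -> bool:
    return xor_check_byte(data) == check

data = b"sector payload"
check = xor_check_byte(data)

one_error = bytearray(data)
one_error[5] ^= 0x40
print(passes(one_error, check))    # False: every single-byte error is detected

two_errors = bytearray(data)
two_errors[5] ^= 0x40
two_errors[9] ^= 0x40
print(passes(two_errors, check))   # True: this double error slips through
```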
An error detection code based on GF(2^10) has sufficient code word length, i.e., 2^10 - 1 or 1023 symbols per code word, to readily cross-check an entire sector. However, the encoding and decoding of the 10-bit symbols used in a GF(2^10) code present certain problems.
As discussed above, computer transmission and storage hardware is set up for bytes, i.e., 8-bit symbols, or for symbols whose lengths are some multiple of 8 bits, such as 16-bit symbols. Such hardware is thus, in general, not arranged for manipulation of 10-bit symbols. Therefore, if a GF(2^10) EDC is to be used, the information has to be translated back and forth between bytes and 10-bit symbols: first for encoding as 10-bit symbols, next for transmission and storage as bytes, and finally for decoding as 10-bit symbols. The requirement of translating between bytes and 10-bit symbols at both the encoder and the decoder adds the complexity of another step to the EDC cross-check process. Further, since the data are modulated by modulation codes that are based on 8-bit symbols, using 10-bit symbols for the EDC may result in more errors, because the bits of one modulation code symbol may fall in more than one EDC symbol. Thus, an erroneous demodulation of one modulation code word symbol may result in two erroneous EDC code word symbols.
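The translation step and its error-spreading effect can be sketched as follows, assuming bytes are packed MSB-first into a continuous bit stream that is cut into 10-bit symbols, with the final symbol zero-padded. The function names are hypothetical, and each byte stands in for one 8-bit modulation code symbol.

```python
def bytes_to_symbols(data: bytes, width: int = 10) -> list:
    """Repack bytes MSB-first into width-bit symbols, zero-padding the tail."""
    acc, nbits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            out.append((acc >> nbits) & ((1 << width) - 1))
        acc &= (1 << nbits) - 1          # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (width - nbits)) & ((1 << width) - 1))
    return out

def symbols_to_bytes(symbols: list, nbytes: int, width: int = 10) -> bytes:
    """Inverse translation; nbytes drops the padding bits."""
    acc, nbits, out = 0, 0, []
    for sym in symbols:
        acc = (acc << width) | sym
        nbits += width
        while nbits >= 8 and len(out) < nbytes:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out)

def symbols_touched(byte_index: int, width: int = 10) -> list:
    """Indices of the width-bit symbols that share bits with one byte."""
    first = (8 * byte_index) // width
    last = (8 * byte_index + 7) // width
    return list(range(first, last + 1))

if __name__ == "__main__":
    data = bytes(range(5))
    syms = bytes_to_symbols(data)
    assert symbols_to_bytes(syms, len(data)) == data
    print(len(syms))             # 4: five bytes become four 10-bit symbols
    print(symbols_touched(1))    # [0, 1]: one bad byte can corrupt two EDC symbols
```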
One solution is to generate a code word in GF(2^10) to protect up to 1023 bytes (data bytes plus ECC redundancy symbols) and yet use 8-bit symbols, or bytes, as the EDC symbols. In a prior system that uses an ECC over GF(2^10), one or more predetermined pseudo data bytes are appended to the data bytes, and the string comprised of the data bytes plus the pseudo data bytes is encoded to produce the desired number of 10-bit ECC redundancy symbols. Then, two selected bits in each of the 10-bit ECC redundancy symbols are compared to a known 2-bit pattern, e.g., "00." If the selected two bits in each of the 10-bit ECC redundancy symbols are the same as the 2-bit pattern, the selected bits of the ECC redundancy symbols are ignored or truncated, and the remaining 8 bits of each of the ECC redundancy symbols are concatenated with the data bytes and the appended pseudo data bytes to form the data code word. The code word bytes can later be decoded, and any error correction performed, by appending the known 2-bit truncation pattern as necessary for Galois Field addition and/or multiplication.
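The check-and-truncate step can be sketched as below. The text does not say which two bits are selected, so this sketch assumes they are the two most significant bits (bits 9 and 8) and uses the example pattern "00"; the redundancy values are hypothetical.

```python
SELECT_SHIFT = 8                      # ASSUMPTION: selected bits are bits 9 and 8
SELECT_MASK = 0b11 << SELECT_SHIFT
TRUNCATION_PATTERN = 0b00             # the example 2-bit pattern from the text

def can_truncate(symbols10: list) -> bool:
    """True if every 10-bit symbol carries the truncation pattern in its selected bits."""
    return all((s & SELECT_MASK) >> SELECT_SHIFT == TRUNCATION_PATTERN for s in symbols10)

def truncate(symbols10: list) -> bytes:
    """Drop the selected bits, keeping the remaining 8 bits of each symbol."""
    if not can_truncate(symbols10):
        raise ValueError("selected bits differ from the pattern; a second encoding is needed")
    return bytes(s & 0xFF for s in symbols10)

def restore(stored: bytes) -> list:
    """Re-append the known pattern so the symbols can be used in GF(2^10) decoding."""
    return [(TRUNCATION_PATTERN << SELECT_SHIFT) | b for b in stored]

if __name__ == "__main__":
    ecc = [0x0A5, 0x03C]               # hypothetical redundancy symbols with bits 9..8 == 00
    stored = truncate(ecc)
    assert restore(stored) == ecc
    print(can_truncate([0x2A5, 0x03C]))  # False: bits 9..8 of 0x2A5 are 10
```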
If any of the selected bits in any of the 10-bit ECC symbols are not the same as the corresponding bits of the truncation pattern, the ECC symbols and the appended pseudo data bytes are modified in a second encoding operation to produce a modified code word consisting of the data bytes, the modified pseudo data bytes, and 10-bit ECC redundancy symbols with the selected bits set to the truncation pattern. The selected bits, which are now the same as the known truncation pattern, are then ignored, and the remaining 8 bits of each of the ECC redundancy symbols and the modified pseudo data bytes are stored along with the data bytes as the data code word. The modified pseudo data bytes and the remaining 8 bits of the ECC redundancy symbols contain all the information necessary to allow the decoding and error correction of the data code word.
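Modifying the pseudo data bytes changes the redundancy in a predictable way because Reed-Solomon encoding is linear: the redundancy for the data plus a pseudo byte equals the redundancy for the data alone, XORed with the redundancy the pseudo byte produces by itself. The sketch below checks that property over GF(2^10). The primitive polynomial x^10 + x^3 + 1, the two-root generator, and the single pseudo byte value are assumptions for illustration; how the prior system selects pseudo byte values that force the selected bits to the pattern is not reproduced here.

```python
# Linearity sketch over GF(2^10).
# ASSUMPTIONS (not from the text): primitive polynomial x^10 + x^3 + 1,
# alpha = 2, generator g(x) = (x + 1)(x + alpha), i.e. two 10-bit
# ECC redundancy symbols; the pseudo byte value is hypothetical.
PRIM_POLY = 0x409
N = 1023  # number of nonzero elements of GF(2^10)
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
v = 1
for i in range(N):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0x400:
        v ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

ALPHA = EXP[1]
G = [1, 1 ^ ALPHA, ALPHA]  # x^2 + (1 + alpha) x + alpha

def redundancy(symbols: list) -> list:
    """Two ECC redundancy symbols: m(x) * x^2 mod g(x)."""
    buf = list(symbols) + [0, 0]
    for i in range(len(symbols)):
        coef = buf[i]
        if coef:
            buf[i + 1] ^= mul(G[1], coef)
            buf[i + 2] ^= mul(G[2], coef)
    return buf[-2:]

data = list(b"data bytes")  # bytes act as 10-bit symbols whose top two bits are 00
pseudo = 0x5C               # hypothetical modified pseudo data byte

first_pass = redundancy(data + [0])                    # pseudo byte initially zero
contribution = redundancy([0] * len(data) + [pseudo])  # pseudo byte on its own
second_pass = redundancy(data + [pseudo])              # full re-encoding

# Linearity: the effect of a modified pseudo byte can be computed and
# XORed in without re-encoding the data bytes.
assert second_pass == [a ^ b for a, b in zip(first_pass, contribution)]
print([f"{s:010b}" for s in second_pass])
```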
The prior system is discussed in U.S. Pat. No. 4,856,003, entitled Error Correction Code Encoder, which is assigned to a common assignee. As discussed in detail below, we have made improvements to the prior system. As discussed in U.S. patent application Ser. No. 08/786,894, entitled Modified Reed-Solomon Error Correction System Using (w+i+1)-Bit Representations of Symbols of GF(2^(w+i)), which is incorporated herein by reference, we have devised an error correction system that includes in the data code word additional pseudo redundancy symbols, instead of including therein the pseudo data bytes. This allows the system, for example, to manipulate the data in a conventional manner as 10-bit symbols (with appended bits set to the truncation pattern) to produce the 10-bit ECC redundancy symbols.
The system discussed in the above-referenced patent application next modifies the ECC redundancy symbols, as necessary, by combining them with an ECC modifier code word, to set the appropriate two bits in each ECC redundancy symbol to the truncation pattern. The ECC modifier code word also appends to the ECC symbols the pseudo redundancy symbols that contain the information necessary for decoding. The system then truncates the 2-bit pattern from both the modified ECC redundancy symbols and the pseudo redundancy symbols and appends them as 8-bit symbols to the 8-bit data symbols to form the data code word. This speeds up the encoding process over that used in the prior system, since the system is not required to modify pseudo data symbols before appending the ECC redundancy symbols to the data symbols.
We have also further improved the system by including therein an encoder that produces the data code word redundancy symbols, that is, both the modified ECC redundancy symbols and the pseudo redundancy symbols, in a single encoding operation.