Error correcting codes (ECCs) play a vital role in today's world. As storage systems pack more and more information into smaller physical spaces, as wireless and other communications systems transmit data over cluttered communications media, as our technology pushes the limits of physics, there is an ever-increasing need to be protected from data errors. Compact discs, magnetic tapes and disks, memory chips, and satellite transmissions are among the many technologies that are prone to error from physical forces, yet we take these things for granted. The Voyager spacecraft developed by NASA in the 1970s, for instance, transmits radio signals to Earth with less power than emitted by an ordinary 60-watt light bulb, yet it has captured some of the clearest pictures of the outer planets taken to date.
A remarkable technology makes all of this reliability possible. All of these technologies rely, at least in part, on the use of error-correcting codes. Error-correcting codes add enough redundancy to a piece of data that the data can be recovered should it become corrupted. Many such error-correcting codes have been developed, such as Hamming codes, which are described in Hamming, R. W., Error detecting and error correcting codes, Bell System Tech. J., 29 (1950), pp. 147-160.
Many modern applications that make use of error correcting codes use what are known as Reed-Solomon codes, including the Voyager spacecraft and compact discs (which use a variant of Reed-Solomon codes known as CIRC, Cross-Interleaved Reed-Solomon Codes). Reed-Solomon codes were first described in Reed, I. S. and G. Solomon, Polynomial Codes over Certain Finite Fields, J. Soc. Ind. Appl. Math., 8 (1960), pp. 300-304.
The Reed-Solomon codes are part of a family of codes known as Bose-Chaudhuri-Hocquenghem (BCH) codes. BCH codes have the desirable property that they can correct a large number of errors with a minimum of redundant information; BCH codes can also correct a larger number of errors if the locations of the errors are known in advance. Errors with known locations are called “erasures” in the art, whereas an “error,” in the art, is an error with an unknown location. Several algorithms, such as the Berlekamp-Massey algorithm, are known to those skilled in the art for efficiently decoding BCH codes to recover erasures and errors. Several of these algorithms are described in Blahut, Richard E., Theory and Practice of Error Control Codes, Addison-Wesley, Reading, Mass. (1983), pp. 161-206.
Mathematically, a Reed-Solomon code maps values from a vector space of a first dimension over a finite field to a vector space of a second, higher dimension over the same field. The values in the second vector space correspond to coefficients of a set of linear equations, the solution to which is the data to be recovered. The element of redundancy in a Reed-Solomon code stems from the fact that the number of linear equations provided exceeds the minimum number “m” needed to recover the data, and that any “m” of the equations are linearly independent. In other words, the data can be recovered by solving any “m” of the equations as a system.
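The "any m of the equations" property can be illustrated with a minimal sketch. It assumes a polynomial-evaluation form of a Reed-Solomon code over the prime field of integers modulo 257; the field size, the evaluation points 1..n, and the names `encode` and `recover` are illustrative assumptions, not details taken from the text above. Practical codes typically work over fields of size 2^k and decode with faster algorithms.

```python
from itertools import combinations

P = 257  # assumed small prime; integers mod P form a finite field


def encode(data, n):
    """Treat `data` (m symbols < P) as polynomial coefficients and
    evaluate the polynomial at x = 1..n, giving n linear equations."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
            for x in range(1, n + 1)]


def recover(points, m):
    """Recover the m coefficients from any m (x, y) pairs by
    Lagrange interpolation carried out modulo P."""
    assert len(points) == m
    coeffs = [0] * m
    for j, (xj, yj) in enumerate(points):
        basis = [1]   # coefficients of prod_{k != j} (x - xk), lowest degree first
        denom = 1     # prod_{k != j} (xj - xk)
        for k, (xk, _) in enumerate(points):
            if k == j:
                continue
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xk * b) % P      # contribution of -xk * b
                nxt[i + 1] = (nxt[i + 1] + b) % P   # contribution of x * b
            basis = nxt
            denom = denom * (xj - xk) % P
        scale = yj * pow(denom, P - 2, P) % P       # Fermat inverse of denom
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs


data = [72, 101, 108]            # m = 3 data symbols
codeword = encode(data, 7)       # n = 7 encoded symbols
equations = list(zip(range(1, 8), codeword))

# Every choice of 3 out of the 7 equations yields the original data.
all_ok = all(recover(list(subset), 3) == data
             for subset in combinations(equations, 3))
```

Here `all_ok` is true because any three distinct evaluation points determine a polynomial of degree at most two, which is the linear-independence property described above.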
Algorithms such as the Berlekamp-Massey algorithm described previously perform this decoding step efficiently, taking into account the known locations of erasures (that is, which of the linear equation coefficients have been corrupted) so as to be able to recover a greater amount of data when erasures can be identified.
In any Reed-Solomon code, the entire vector of data can be recovered only if 2t + e ≤ n − m, where “t” is the number of errors, “e” is the number of erasures, “n” is the number of linear equations available to choose from (i.e., the dimension of the second vector space), and “m” is the minimum number of linear equations necessary to recover the data. In other words, locating and correcting an error costs one more uncorrupted linear equation than correcting an erasure whose location is already known.
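As a worked illustration of this bound, the following sketch checks a few combinations of errors and erasures. The function name `can_recover` and the values n = 10 and m = 6 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def can_recover(t, e, n, m):
    """Return True when t errors and e erasures satisfy 2t + e <= n - m."""
    return 2 * t + e <= n - m


n, m = 10, 6  # hypothetical code: 10 equations, any 6 suffice, so n - m = 4

four_erasures = can_recover(t=0, e=4, n=n, m=m)     # 0 + 4 <= 4
two_errors = can_recover(t=2, e=0, n=n, m=m)        # 4 + 0 <= 4
mixed = can_recover(t=1, e=2, n=n, m=m)             # 2 + 2 <= 4
three_errors = can_recover(t=3, e=0, n=n, m=m)      # 6 + 0 >  4
```

With four spare equations, this hypothetical code tolerates four erasures but only two errors, since each error consumes two units of redundancy and each erasure consumes one.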
Reed-Solomon codes, because they rely on the use of multiple sets of coefficients, are easily adapted for use in multi-track recording media, such as magnetic tape. An encoded vector of data can be made to span multiple tracks, such that the coefficients for each of the Reed-Solomon equations reside on a separate track. In that way, if one or more tracks become corrupted, but at least “m” tracks can be read successfully, the entire original vector of data can be recovered. Of course, identifying those “m” tracks is easier when some of the corrupted tracks are already known.
Error detecting codes provide a way of identifying errors in a stream of data. One very effective error-detecting code is the cyclic redundancy check (CRC). CRC codes are described in Messmer, Hans-Peter, The Indispensable PC Hardware Book, 2d. Ed., Addison-Wesley, Reading, Mass. (1995), pp. 694-702. An erasure on a particular track can be identified by interposing CRC codes on each track at periodic intervals. The CRC codes act as a sort of checksum for the data they follow. An erasure can be identified by comparing a CRC code calculated from a block of data on a given track with the CRC code recorded at the end of the block.
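The block-level check described above can be sketched as follows. The sketch assumes a 32-bit CRC as computed by Python's standard `zlib.crc32` and a 4-byte big-endian trailer; the text does not specify which CRC polynomial or block layout is used, and the names `write_block` and `block_is_erasure` are hypothetical.

```python
import zlib


def write_block(data: bytes) -> bytes:
    """Append a CRC of `data` to the end of the block (assumed 4-byte trailer)."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def block_is_erasure(record: bytes) -> bool:
    """Recompute the CRC over the data and compare it with the recorded CRC.
    A mismatch marks the whole block on this track as an erasure."""
    data, stored = record[:-4], record[-4:]
    return zlib.crc32(data).to_bytes(4, "big") != stored


record = write_block(b"symbols recorded on one track")
clean_flag = block_is_erasure(record)

# Flip a single bit in the data portion to simulate corruption on the medium.
damaged = bytearray(record)
damaged[5] ^= 0x01
damaged_flag = block_is_erasure(bytes(damaged))
```

Note that the check yields a single yes-or-no answer for the entire block: it reports that the block is corrupted, not where within the block the corruption lies.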
While this is an accurate way to identify a track containing corrupted data, it says nothing about the location of the corruption within the block of data or about its extent. Thus, a section of magnetic tape with small amounts of corruption on all or most of the tracks will appear to contain erasures on all of those tracks simultaneously, when in fact there may be only isolated incidents of corruption on different tracks at different times. If, in such a situation, all of the tracks known to contain errors are treated as erasures, no data will be recovered, since the erasure tracks will be disregarded as corrupted, and there will not be enough remaining tracks to recover the data.
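The failure mode can be made concrete with a small calculation based on the recoverability condition 2t + e ≤ n − m. The layout below (8 tracks, any 6 sufficient, 4 codewords per block, three isolated bad symbols) is entirely hypothetical, and the second policy is shown only for comparison under that condition, not as a description of any particular decoder.

```python
N_TRACKS, M_NEEDED = 8, 6      # hypothetical layout: n - m = 2
N_COLUMNS = 4                  # codewords per block, each spanning all tracks

# (track, column) of each isolated corrupted symbol within one block
corrupt = {(0, 1), (3, 2), (5, 0)}


def recoverable(t, e, n=N_TRACKS, m=M_NEEDED):
    """Recoverability condition: 2t + e <= n - m."""
    return 2 * t + e <= n - m


# A block-level CRC flags the whole track, whatever the column.
flagged_tracks = {track for track, _ in corrupt}

# Policy A: every flagged track is an erasure in every codeword of the block.
policy_a = [recoverable(t=0, e=len(flagged_tracks)) for _ in range(N_COLUMNS)]

# Policy B: count only the symbols actually corrupted in each codeword,
# treating them as errors with unknown locations.
policy_b = [recoverable(t=sum(1 for _, c in corrupt if c == col), e=0)
            for col in range(N_COLUMNS)]
```

Under these assumptions, Policy A declares three erasures against a budget of two and so fails for every codeword in the block, while each individual codeword actually contains at most one corrupted symbol, which is within the same budget when treated as an error.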
Thus, while identification of erasures is helpful when large amounts of data on a given track are corrupted, it can cause problems when small errors are randomly distributed across many of the tracks. It would, thus, be desirable to have a system for correcting errors in a multi-track medium that is adapted to handle both large erasures and small errors.