The present invention relates to systems and methods for correcting errors on a data storage medium, and more particularly, to systems and methods for direct partial updates of Reed-Solomon CRC/ECC check bytes.
In data storage devices and systems such as, for example, hard disk drives (HDD), the combination of poor write/read conditions and low signal-to-noise ratio (SNR) data detection is likely to cause a mixed error mode of long bursts of errors and random errors in data sectors stored on the disk. Typically, byte-alphabet, Reed-Solomon (RS) codes are used to format the stored sector data bytes into codewords protected by redundant check bytes and used to locate and correct the byte errors in the codewords. Long codewords are more efficient for data protection against long bursts of errors as the redundant check byte overhead is averaged over a long data block. However, in data storage devices, long codewords cannot be used, unless a read-modify-write (RMW) process is used, because a logical unit data sector is 512 bytes long and computer host operating systems assume a 512 byte long sector logical unit. Each RMW process causes a loss of a revolution of the data storage medium. Losing revolutions of the data storage medium lowers input/output (I/O) command throughput. Therefore, frequent usage of the RMW process becomes prohibitive because it lowers I/O command throughput.
Rather than uniformly adding check bytes to short codewords to correct more random errors in the short codewords, U.S. Pat. No. 5,946,328, issued on Aug. 31, 1999, invented by Cox et al., and assigned to International Business Machines Corporation, discloses a method and means for generating check bytes that are not rigidly attached to a short codeword but are shared by several short codewords in an integrated interleaved Reed-Solomon (RS) Error Correction Coding (ECC) format. Interleaving is a commonly used technique by which the bytes in a data sector are split into several byte streams, each of which is encoded separately, and thus constituting a short Reed-Solomon codeword. A reason for interleaving is to split the errors in a sector among several codewords, thus avoiding the need to build in hardware a very complex Reed-Solomon decoder that can correct a very large number of errors. In the presence of random errors, distinct interleaves may be affected differently, to the effect that a sector can fail on-the-fly (OTF) correction due to an excess of a small number of random errors in one interleave. At low SNR's the probability of such sector failures increases due to an uneven distribution of random errors among the interleaves. U.S. Pat. No. 5,946,328 addresses this specific problem of sector failures due to random errors exceeding the OTF correction capability in one interleave by using shared check bytes in an integrated interleaving two-level ECC format.
In U.S. Pat. No. 5,946,328, the method and means for generating shared check bytes for a two-level ECC among a plurality of interleaves is implemented by performing byte-by-byte summation of all interleaves prior to encoding as well as the repeated byte-by-byte summation of all resulting codewords obtained after encoding. This requires that the interleaved data strings be simultaneously available for byte-by-byte summation, as is the case when the combined interleaves constitute a single data sector. Each individual interleave, as well as their sum, are encoded by Reed-Solomon encoders where the interleave sum codeword has more check bytes than each individual interleave codeword. Summation of the codewords produces a summed interleave codeword that is equally protected against random errors as all the other interleave codewords. The summed interleave codeword has less message bytes at the expense of additional potential check bytes to be used in any interleave codeword with an excess of random errors provided that the remaining interleave codewords do not have errors in excess of the OTF ECC capability.
The combination of low SNR detection and poor write/read conditions may result in both random errors as well as long bursts of byte errors (“mixed error mode”) becoming more and more likely at high areal densities and low flying heights, which is the trend in HDD industry. The occurrence of such mixed error mode combinations of random as well as burst errors is likely to cause the 512 byte sector interleaved OTF ECC to fail resulting in a more frequent use of a data recovery procedure (DRP) which involves rereads, moving the head, etc. These DRP operations result in the loss of disk revolutions that causes a lower input/output (I/O) throughput. This performance loss is not acceptable in many applications such as audio-visual (AV) data transfer, for example, which will not tolerate frequent interruptions of video data streams. On the other hand, uniform protection of all single sectors against both random as well as burst errors, at the 512 byte logical unit sector format, would result in excessive and unacceptable check byte overheads. Such check byte overheads also increase the soft error rate due to the increase in linear density of the data.
Long block data ECC, such as 4 K byte physical block comprising eight sectors, for example, could be a solution for some applications, but it would require a change in the operating system standard, unless read-modify-write (RMW) is accepted when writing single 512 byte sectors. Present operating systems are all based on a 512 byte long sector logical unit. RMW is required to update the long physical block check bytes. Thus, when a single 512 byte sector is written, the other sectors in the long block need to be read, the long block check bytes need to be recalculated, and the whole long block is then rewritten. Hence, the RMW causes an I/O throughput performance loss that is generally unacceptable for typical HDD operation.
Therefore, it would be desirable to have an ECC format for a data storage device that has a low sector failure rate for the mixed error mode of random error and burst error, that avoids frequent DRP or RMW use, and that also has an acceptable check byte overhead.