The use of cyclic error-correcting codes in connection with the storage of data in storage devices is well established and is generally recognized as a reliability requirement for the storage system. Generally, the error correcting process involves processing syndrome bytes to determine the location and value of each error. Non-zero syndrome bytes result from the exclusive-ORing of error characters that are generated when data is read from the storage medium.
The number of error correction code (ECC) check characters employed depends on the desired power of the code. As an example, in many present-day ECC systems used in connection with the storage of 8-bit bytes in a storage device, two check bytes are used for each error to be corrected in a codeword having a length of at most 255 byte positions. Thus, for example, six check bytes are required to correct up to three errors in a block of data having 249 data bytes and six check bytes. Six distinct syndrome bytes are therefore generated in such a system. If there are no errors in the data word comprising the 255 bytes read from the storage device, then the six syndrome bytes are the all zero pattern. Under such a condition, no syndrome processing is required and the data word may be sent to the central processing unit. However, if one or more of the syndrome bytes are non-zero, then syndrome processing involves identifying the location of each byte in error and further identifying the error pattern for each error location.
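Concretely, each syndrome can be viewed as the evaluation of the received word over a finite field. Because that evaluation is linear over GF(2), the syndromes of a received word equal the syndromes of the stored codeword XORed with the syndromes of the error pattern, which is why non-zero syndromes expose the errors. The following is a minimal Python sketch, assuming GF(256) with the common primitive polynomial 0x11D and evaluation points α^0..α^(r-1); these are illustrative choices, not details taken from the text.

```python
# Illustrative GF(256) syndrome computation (primitive polynomial 0x11D is
# an assumption for the sketch, not specified in the text).
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i          # alpha^i and its discrete log
    x <<= 1
    if x & 0x100:
        x ^= 0x11D                 # reduce modulo the primitive polynomial
for i in range(255, 512):
    EXP[i] = EXP[i - 255]          # wrap so products never index out of range

def gf_mul(a, b):
    """Multiply two GF(256) elements via log/antilog tables."""
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def syndromes(word, r):
    """S_i = word, viewed as a polynomial, evaluated at alpha^i (Horner)."""
    out = []
    for i in range(r):
        s = 0
        for c in word:
            s = gf_mul(s, EXP[i]) ^ c
        out.append(s)
    return out
```

Because the computation is GF(2)-linear, corrupting a byte changes the syndromes by exactly the syndromes of the error pattern alone; a valid codeword contributes all-zero syndromes.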
The underlying mathematical concepts and operations involved in normal syndrome processing have been described in various publications. These operations and mathematical explanations generally involve first identifying the location of the errors by use of what has been referred to as the “error locator polynomial”. The overall objective of the mathematics employing the error locator polynomial is to determine the locations of the bytes in error using only the syndrome bytes that are generated in the system.
The error locator polynomial has conventionally served as the starting point of the mathematical analysis, expressing error locations in terms of syndromes so that binary logic may decode the syndrome bytes, first identifying the locations in error and thereby enabling the associated hardware to identify the error pattern at each location. Moreover, in an on-the-fly ECC used in storage or communication systems, the error locations are calculated as the roots of the error locator polynomial.
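The single-error case makes the root relationship concrete: with one error of value v at degree position d, the syndromes satisfy S_i = v·α^(i·d), the error locator polynomial is Λ(x) = 1 + Xx with locator X = S_1/S_0 = α^d, and the root of Λ pinpoints the error location while S_0 gives its value. A hedged Python sketch under the same illustrative assumptions as above (GF(256) with primitive polynomial 0x11D; the field and indexing conventions are not specified in the text):

```python
# Single-error sketch of "error locations as roots of the locator polynomial".
# GF(256) with primitive polynomial 0x11D is an illustrative assumption.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 255]

def decode_single_error(word, n):
    """word: received word of length n containing exactly one byte in error
    (a codeword's own syndromes vanish, so only the error contributes).
    Returns (byte index, error value)."""
    S = []
    for i in range(2):             # syndromes S_0, S_1 via Horner evaluation
        s = 0
        for c in word:
            s = gf_mul(s, EXP[i]) ^ c
        S.append(s)
    X = gf_div(S[1], S[0])         # locator X = alpha^d; Lambda(x) = 1 + X*x
    d = LOG[X]                     # degree of the erroneous position
    return n - 1 - d, S[0]         # byte index in the word, error value
```

Since a valid codeword has all-zero syndromes, a real decoder computes S_0 and S_1 directly from the received word; for two or more errors, the locator has higher degree and its roots are found by searching the field (e.g., a Chien-style search).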
A specific concern facing the data storage industry is the combination of poor read/write conditions and low signal-to-noise ratio data detection, which is likely to cause read hard errors. A read hard error comprises an arbitrary mixture of B-byte burst errors and t-byte random errors in data sectors stored on a disk or data storage medium.
Typically, byte-alphabet Reed-Solomon codes are used to format the stored sector data bytes into codewords that are protected by redundant check bytes, which are used to locate and correct the byte errors in the codewords. Long codewords are more efficient for protecting data against long bursts of errors because the redundant check byte overhead is averaged over a long data block. However, in data storage devices, long codewords cannot be used without a read-modify-write process, because the logical unit data sector is 512 bytes long and the computer operating system assumes a 512-byte long sector logical unit. Each read-modify-write process causes the loss of a revolution of the data storage medium, and losing revolutions lowers the input/output (I/O) command throughput. Therefore, frequent use of the read-modify-write process becomes prohibitive.
Rather than uniformly adding check bytes to short codewords to correct more random errors in the short codewords, a method has been proposed for generating check bytes that are not rigidly attached to a short codeword but are shared by several short codewords in an integrated sector Reed-Solomon Error Correction Coding (ECC) format.
The combination of low signal-to-noise ratio and poor read/write conditions may result in both random errors and long bursts of byte errors (a “mixed error mode”) becoming increasingly likely at the high areal densities and low flying heights toward which the HDD industry is trending. Such mixed error mode combinations of random and burst errors are likely to cause the 512-byte sector interleaved on-the-fly ECC to fail, resulting in more frequent use of a data recovery procedure that involves rereads, moving the head, and the like.
These data recovery procedures result in the loss of disk revolutions, which, in turn, lowers the input/output throughput. This performance loss is unacceptable in many applications, such as audio-visual (AV) data transfer, which will not tolerate frequent interruptions of video data streams. On the other hand, uniformly protecting every single sector against both random and burst errors, at the 512-byte logical unit sector format, would result in excessive and unacceptable check byte overhead. Such check byte overhead also increases the error rate due to the increase in the linear density of the data.
Furthermore, the decoding latency is typically a function of the square of the number of check bytes (R²), which can further decrease the throughput performance of the storage system.
Therefore, it would be desirable to have an algebraic decoder, and an associated method, for correcting an arbitrary mixture of burst errors and random errors with improved decoding latency. Such a decoder should not be limited to a specific number of random errors, such as one or two. Further, the decoding latency should be a linear function of the check byte overhead, as compared to the conventional quadratic latency function (e.g., in the case of two random errors).