The present invention relates generally to an error correction apparatus and method. In particular, the invention relates to an apparatus and method for detecting miscorrections of data read from a storage medium using Linear Block Code encoding and decoding as an error detection code (EDC).
A storage medium is subject to various types of noise, distortion, and interference; consequently, various errors can occur at the output of the storage medium. This is particularly true for storage systems based on rotating magnetic media, such as hard magnetic disc drives. The current trend is to store greater amounts of digital data in smaller areas on the storage medium. This trend increases the probability of errors. To detect and correct such errors, error correction coding (ECC) is used. To verify that the corrections made by the ECC are valid, an EDC is used.
Error correction coding typically consists of at least an “encoding algorithm” and a “decoding algorithm”. For example, if systematic encoding is used, the encoding algorithm takes a block of k data symbols and computes r parity symbols which are appended to the data symbols to form a block of data. For any given code, the number r is fixed and the number k must be less than a fixed bound.
The block of data consisting of data symbols and parity symbols is referred to as a “codeword”. However, a codeword written to the storage medium can be corrupted during read-back from the storage medium. Consequently, the decoding algorithm takes the codeword, which may or may not be corrupted, as input and then determines if errors have occurred and, if so, attempts to correct those errors. The decoding algorithm can correct up to T errors, where T is a fixed constant depending on the code and related to the number of parity symbols used to construct the codeword. Typically, the decoding algorithm computes symbols called “syndromes”, which are determined from the (possibly corrupted) data and parity symbols read from the storage medium. As a result of the codeword design, if all bits in all the syndromes are 0, the data read constitutes a valid codeword. Otherwise, errors are detected and the syndromes are used as the inputs to the error correction algorithm.
Linear Block Codes (LBC) are a very broad class of codes that are useful for error correction. Details concerning LBCs are apparent to those skilled in the art of error control systems and are therefore set forth briefly for a proper understanding of the invention. Generally, source data is segmented into blocks of k m-bit data or message symbols; each block can represent one of 2^(mk) distinct messages. An encoder then transforms the blocks of k data/message symbols into larger blocks of n symbols. The encoder essentially adds r = (n − k) parity symbols to the k data/message symbols. The collection of all such blocks of length n is a linear (n, k) block code because its 2^(mk) codewords form a k-dimensional subspace of the vector space of all the n-tuples over a Galois Field (GF) of order 2^m (GF(2^m)). Consequently, it is possible to define a generator matrix as well as a parity check matrix. Encoding of the source data to generate a codeword, using non-systematic encoding, is accomplished by multiplying the k data/message symbols by the generator matrix. Decoding occurs by multiplying a received codeword by the parity check matrix to generate syndromes. If the syndromes equal zero, the received codeword is valid and the k data/message symbols are, with high probability, uncorrupted (i.e., the received codeword is the original codeword).
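By way of illustration only, the generator matrix and parity check matrix operations described above may be sketched for the binary (7, 4) Hamming code, i.e., the case m = 1 over GF(2). The matrices G and H below, which happen to be in systematic form, and the function names encode and syndrome are assumptions chosen for this example and are not taken from the invention.

```python
# Illustrative (7, 4) Hamming code over GF(2).
# G and H are assumed example matrices, with G = [I | P] and H = [P^T | I].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(message):
    """Multiply the k = 4 message bits by G (mod 2) to obtain n = 7 code bits."""
    return [sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G)]


def syndrome(received):
    """Multiply the received word by H (mod 2); all zeros means a valid codeword."""
    return [sum(r * h for r, h in zip(received, row)) % 2 for row in H]


codeword = encode([1, 0, 1, 1])
corrupted = codeword[:]
corrupted[2] ^= 1  # flip one bit to model a read-back error

print(syndrome(codeword))   # all-zero syndrome for the valid codeword
print(syndrome(corrupted))  # nonzero syndrome once a bit is corrupted
```

In this sketch the nonzero syndrome equals the column of H at the corrupted position, which is what allows a decoder to locate a single error.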
One typical and frequently employed class of LBC error correcting codes (ECC) is the Reed-Solomon (RS) codes. For a T_ECC-error correcting RS code, 2T_ECC ECC parity symbols 102, such that r = 2T_ECC, are appended to each block of k m-bit user data symbols 100 to create a codeword 103, as depicted in FIG. 1. There are k + 2T_ECC symbols in a RS codeword. The RS code views each m-bit user data symbol as an element of GF(2^m). A GF is a finite field, the elements of which may be represented as polynomials in alpha (α), where α is a root of an irreducible polynomial of degree m. Table 1 depicts a subset of irreducible polynomials of degree m. The RS codeword consists of a block of m-bit symbols. Constructing the GF(2^m) requires defining an irreducible polynomial P(x) of degree m. In addition, a primitive element beta (β) is chosen so that every nonzero element of GF(2^m) is a power of β. The element β is not necessarily a root of P(x).
A RS codeword is viewed as a codeword polynomial (C(x)) and the redundancy symbols are chosen so that the roots of C(x) include the roots of a generator polynomial G(x) whose roots are 2T consecutive powers of β beginning with β^(m0), such that G(x) = (x − β^(m0))(x − β^(m0+1)) . . . (x − β^(m0+2T−1)) and m0 is an arbitrary integer. The k user data symbols are viewed as the high order coefficients of a degree k+2T−1 polynomial (U(x)), and the redundancy symbols are the coefficients of the remainder (B(x)) when the polynomial U(x) is divided by G(x) such that B(x) = U(x) modulo G(x), and C(x) = U(x) − B(x).
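The construction of C(x) described above may be sketched as follows for byte symbols. This is a minimal sketch under stated assumptions, not a definitive implementation: the field polynomial P(x) = x^8 + x^4 + x^3 + x^2 + 1, the primitive element β = 2, the choice m0 = 0, and all function names are assumptions introduced for illustration. In GF(2^m) subtraction is the same as XOR, so C(x) = U(x) − B(x) amounts to appending the remainder to the data.

```python
PRIM_POLY = 0x11D  # assumed P(x) = x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^8): carry-less multiply reduced by P(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def generator_poly(two_t, beta=2, m0=0):
    """G(x) = (x - beta^m0) ... (x - beta^(m0+2T-1)), highest degree first."""
    g = [1]
    for i in range(two_t):
        root = gf_pow(beta, m0 + i)
        # Multiply g by (x - root); subtraction is XOR in GF(2^m).
        g = [a ^ gf_mul(b, root) for a, b in zip(g + [0], [0] + g)]
    return g


def poly_eval(poly, x):
    """Evaluate a polynomial (highest degree first) by Horner's rule."""
    acc = 0
    for coefficient in poly:
        acc = gf_mul(acc, x) ^ coefficient
    return acc


def rs_encode(data, two_t):
    """Return C(x) = U(x) - B(x), where B(x) = U(x) modulo G(x)."""
    g = generator_poly(two_t)
    rem = list(data) + [0] * two_t  # U(x): data as the high-order coefficients
    for i in range(len(data)):
        coefficient = rem[i]
        if coefficient:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coefficient)
    return list(data) + rem[-two_t:]


codeword = rs_encode(list(b"hello world"), 4)  # T = 2, so 2T = 4 parity symbols
syndromes = [poly_eval(codeword, gf_pow(2, i)) for i in range(4)]
print(syndromes)  # every root of G(x) is a root of C(x)
```

Evaluating C(x) at each root of G(x) yields zero, which is the same check a decoder performs when it computes syndromes from the received polynomial.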
Once the codeword C(x) is formed, C(x) is generally written to a data storage medium. Corruption of the original codeword C(x) generally occurs during read-back from the storage medium, where a received polynomial R(x) is returned. If the received polynomial R(x) is not corrupted, R(x) is equal to C(x), the codeword polynomial before being written to the storage medium.
However, errors often occur in bursts rather than in random patterns, so that several consecutive bytes or symbols are in error. Since the correction code's capacity is limited to T errors, error bursts will often exceed the code's capacity. For example, if the errors in such a burst are confined to a single codeword, the error correction coding technique cannot correct these errors.
Interleaving the user data is often used to overcome this deficiency with respect to burst errors. Interleaving involves partitioning a block of data into smaller sub-blocks called interleaves and independently computing parity for each of the interleaves. Interleaving provides two principal advantages: (a) error bursts are distributed over several codewords; and (b) larger blocks of data can be encoded.
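The effect of interleaving on a burst may be sketched as follows. The function name interleave, the depth of four, and the burst position are assumptions chosen for illustration; the sketch assigns consecutive symbols to interleaves in round-robin order, which is one common arrangement.

```python
def interleave(block, depth):
    """Split a block into `depth` interleaves by taking every depth-th symbol."""
    return [block[i::depth] for i in range(depth)]


sector = list(range(512))            # stand-in for 512 symbols of user data
interleaves = interleave(sector, 4)  # four interleaves of 128 symbols each

burst = range(100, 108)  # eight consecutive corrupted symbol positions
errors_per_interleave = [sum(1 for p in burst if p % 4 == i) for i in range(4)]
print(errors_per_interleave)
```

An eight-symbol burst therefore places only two errors in each interleave, which a code correcting T = 2 errors per interleave could handle, whereas eight errors in a single codeword would exceed that capacity.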
The ability to encode larger blocks of data with interleaving is vital when dealing with binary data. Binary data is partitioned into sub-blocks referred to as symbols. Typically, the symbols consist of 8 bits and are called bytes. Yet, the codeword size for RS codes of binary data is limited to 2^m − 1 symbols, where m is the number of bits per symbol. If a RS code uses byte symbols, the number k of data symbols is bounded by 255 − 2T_ECC. However, a typical block size for user data in a byte based system is often a 512-byte sector of data read from a storage medium. Unfortunately, 512 bytes cannot be encoded with a GF(2^8) RS code unless interleaving is used.
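The bound above may be checked with a short calculation. The value T_ECC = 5 is a hypothetical choice made only for this example, as are the function names.

```python
import math


def max_data_symbols(m, t_ecc):
    """Largest k for an RS code with m-bit symbols: (2^m - 1) - 2 * T_ECC."""
    return (2 ** m - 1) - 2 * t_ecc


def interleaves_needed(block_symbols, m, t_ecc):
    """Smallest number of interleaves such that each one fits in a codeword."""
    return math.ceil(block_symbols / max_data_symbols(m, t_ecc))


T_ECC = 5  # hypothetical correction capability per interleave
print(max_data_symbols(8, T_ECC))         # data bytes available per codeword
print(interleaves_needed(512, 8, T_ECC))  # interleaves for a 512-byte sector
```

With these assumed values a single GF(2^8) codeword holds at most 245 data bytes, so a 512-byte sector requires at least three interleaves.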
However, the use of interleaving does not eliminate the possibility of undetected corruption of user data. One possibility occurs when the data is corrupted in such a way as to form a different codeword. Another, and more likely, possibility occurs when the number of errors exceeds the correction power of the code and the correction algorithm returns another valid codeword. This is referred to as a miscorrection. In other words, miscorrection refers to a situation where the decoding algorithm cannot recognize that its error correction capability is exceeded and returns a valid, yet incorrect, codeword.
To reduce the probability of a miscorrection, an additional code called an error detection code (EDC) is used. Here EDC parity symbols are computed and appended to user data. For the purposes of ECC encoding, the user data together with the EDC parity symbols are viewed as “data symbols”, so that ECC parity is computed based on both user data and EDC parity, as depicted in FIG. 2. In FIG. 1, k ≤ 255 − 2T_ECC, whereas in FIG. 2, k is typically 512. As FIG. 2 shows, for the same k user data symbols 100 as in FIG. 1, 2T_EDC EDC parity symbols 104 are now employed in addition to the 2iT_ECC ECC parity symbols 102, where i is a natural number representing the number of interleaves into which the user data and EDC parity were partitioned.
One can use a RS code with byte symbols as an EDC. On read-back, both EDC syndromes (i.e., syndromes based on the EDC codeword formed by the user data and EDC parity) and ECC syndromes (i.e., syndromes based on the ECC codeword formed by the user data, EDC parity, and ECC parity) are computed.
However, RS codewords using byte symbols must have 255 or fewer data bytes. Thus, with a block of 512 user data bytes, the user data bytes must be implicitly or explicitly combined in order to meet the restriction on the number of data bytes. FIGS. 3 and 4 depict one possible way of explicitly combining a 512-byte user data block 106 to enable representation of the user data block with a GF(2^8) RS EDC code. As depicted in FIG. 3, the 512-byte user data block 106 is represented by bytes B0, B1, . . . , B511, separated into four columns, as shown. As depicted in FIG. 4, the data bytes to be encoded by the EDC might be 128 bytes such that U131 = B0 XOR B1 XOR B2 XOR B3, . . . , U4 = B508 XOR B509 XOR B510 XOR B511, where XOR refers to the exclusive-OR operation performed across the four columns of the 512-byte user data block 106 for a given row. For a T=2 EDC, parity bytes Q0, Q1, Q2, and Q3 would be constructed to form the “EDC codeword polynomial” 112 C(x), such that C(x) = U131*x^131 + . . . + U4*x^4 + Q3*x^3 + . . . + Q0. ECC parity (ECC0, ECC1, ECC2, ECC3) is then computed from the 516 bytes B0, . . . , B511, Q3, . . . , Q0 using four interleaves as depicted in FIG. 3 to form an ECC codeword 122 for each interleave. In other words, the data encoded by C(x) is formed by performing an XOR across the four interleaves of the ECC codeword 122 for each row of user data 106.
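The explicit combination described above may be sketched as follows. The function name fold_sector and the sample sector contents are assumptions chosen for illustration; the sketch assumes that row r of FIG. 3 holds bytes B(4r) through B(4r+3), consistent with U131 = B0 XOR B1 XOR B2 XOR B3.

```python
def fold_sector(sector):
    """XOR the four bytes of each row; returns {power of x: U value} for U131..U4."""
    assert len(sector) == 512
    folded = {}
    for row in range(128):
        b = sector[4 * row: 4 * row + 4]
        folded[131 - row] = b[0] ^ b[1] ^ b[2] ^ b[3]
    return folded


sector = [(i * i) % 256 for i in range(512)]  # arbitrary sample data
u = fold_sector(sector)
print(len(u), u[131], u[4])
```

The 512 user data bytes are thereby reduced to 128 bytes, which, together with the four EDC parity bytes, fit within a single byte-based RS codeword.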
Unfortunately, the explicit combination technique described above fails when identical miscorrections occur in two interleaves, because XORing two identical error values yields zero and the errors therefore cancel in the combined EDC data. Erasure flagging can increase the likelihood of such identical miscorrections. Erasure flagging is a technique of marking certain bytes as suspect. For example, when a thermal asperity (TA) is detected by the read channel in a hard disk drive, the data read during the TA are considered suspect and can be flagged as erasures. The ability to flag suspected bad data can often double the error correction capability of a RS code. For example, if e is the number of erasures flagged and v is the number of other errors, the e erasures and v errors can be corrected so long as 2v + e ≤ 2T. As such, if no other error occurs, such that v=0, the error correction capability is doubled. However, the use of erasure flagging reduces the error detection capability of a RS code as e approaches 2T. Although identical miscorrections in two interleaves are very unlikely, the probability of miscorrection cancellation dramatically increases when erasure flags are used, as a result of the explicit combination described above.
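The cancellation problem and the errors-and-erasures condition described above may be sketched as follows. The function names, the sample data, the error value 0x5A, and the affected row are assumptions chosen for illustration.

```python
def fold(sector):
    """XOR the four interleaved bytes of each row into one EDC data byte."""
    return [sector[i] ^ sector[i + 1] ^ sector[i + 2] ^ sector[i + 3]
            for i in range(0, len(sector), 4)]


def correctable(v, e, t):
    """Errors-and-erasures condition from the text: 2v + e <= 2T."""
    return 2 * v + e <= 2 * t


original = [(i * i) % 256 for i in range(512)]  # arbitrary sample data
miscorrected = original[:]
miscorrected[40] ^= 0x5A  # row 10, interleave 0
miscorrected[41] ^= 0x5A  # row 10, interleave 1: the identical wrong value

print(original != miscorrected)              # the user data differs
print(fold(original) == fold(miscorrected))  # yet the folded EDC data is identical
print(correctable(0, 4, 2), correctable(1, 3, 2))
```

Because the folded data is unchanged, an EDC computed over it cannot detect the two identical miscorrections. The last line shows that with T = 2, four erasures and no other errors satisfy the condition, while three erasures plus one unflagged error do not.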
Therefore, there remains a need for a technique that provides the benefits of miscorrection detection and eliminates the possibility of error cancellation of the EDC described above. Moreover, the EDC should not be limited by a user data block size of 512 bytes.
The present invention overcomes the problems associated with the above system by disclosing a method and apparatus that utilizes a Linear Block Code with m-bit symbols with m greater than 8, for example a RS code defined over GF(2^m), as an EDC as part of an error correction and miscorrection detection system. The LBC must allow codewords longer than 255 symbols, thereby avoiding the XORing that was necessary with a RS code over GF(256). For example, a RS code with 10-bit symbols allows codewords with up to 1023 = 2^10 − 1 symbols, which easily accommodates the 512-byte data sectors commonly used in the computer industry. The present invention allows such a code to be used in an error correction system using a byte-based ECC. The EDC codeword is used to verify the ECC, such that when the ECC corrects errors in the data, the ECC corrections are used to adjust the EDC syndromes computed from the corrupted EDC codeword to determine if the ECC has made a miscorrection. More generally, the invention allows an EDC with m-bit symbols to be used with an ECC with n-bit symbols where m is greater than n.
In accordance with one embodiment of the invention, a method for error correction and miscorrection detection in a data transmission is disclosed in which a data signal is received. The data signal is comprised of an EDC codeword which is further encoded with a plurality of n-bit error correction code (ECC) parity symbols; the EDC codeword is comprised of user data encoded with a plurality of m-bit EDC parity symbols, such that m ≥ n. A plurality of EDC syndromes are calculated from the EDC codeword. A correction signal consisting of an error location and an error value is received for each correction of the data. Each correction signal is then used to adjust the EDC syndromes. A miscorrection is detected if the value of the adjusted EDC syndromes is not equal to zero once all corrections are applied.
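The syndrome adjustment described above may be sketched as follows for an RS EDC over GF(2^10). This is a simplified sketch under stated assumptions and not the disclosed implementation: the field polynomial x^10 + x^3 + 1, the primitive element β = 2, m0 = 0, T_EDC = 2, and all function names are assumptions. In particular, each correction is assumed to be already expressed as a 10-bit EDC symbol location (a power of x) and a 10-bit error value; the mapping from byte-based ECC corrections to such symbols is not modeled here. The sketch relies on the linearity of the syndromes: an error of value e at power p contributes e·β^(j·p) to syndrome S_j, so XORing that term back in removes its contribution.

```python
M = 10
FIELD_POLY = 0x409  # assumed P(x) = x^10 + x^3 + 1
BETA = 2            # assumed primitive element
TWO_T = 4           # assumed number of EDC parity symbols (T_EDC = 2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^10)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & (1 << M):
            a ^= FIELD_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def edc_encode(data):
    """Systematic RS encoding: append the remainder of U(x) divided by G(x)."""
    g = [1]
    for i in range(TWO_T):
        root = gf_pow(BETA, i)
        g = [a ^ gf_mul(b, root) for a, b in zip(g + [0], [0] + g)]
    rem = list(data) + [0] * TWO_T
    for i in range(len(data)):
        if rem[i]:
            c = rem[i]
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], c)
    return list(data) + rem[-TWO_T:]


def syndromes(word):
    """S_j = R(beta^j) for j = 0..2T-1; word lists the highest power of x first."""
    out = []
    for j in range(TWO_T):
        x, acc = gf_pow(BETA, j), 0
        for c in word:
            acc = gf_mul(acc, x) ^ c
        out.append(acc)
    return out


def adjust(synd, position, value):
    """Fold one correction (power of x, error value) into the EDC syndromes."""
    return [s ^ gf_mul(value, gf_pow(BETA, j * position)) for j, s in enumerate(synd)]


def miscorrection_detected(synd, corrections):
    """Apply every correction signal, then test the adjusted syndromes for zero."""
    for position, value in corrections:
        synd = adjust(synd, position, value)
    return any(synd)


data = [(37 * i + 5) % 1024 for i in range(410)]  # arbitrary 10-bit symbols
codeword = edc_encode(data)
n = len(codeword)
received = codeword[:]
received[3] ^= 0x155    # two symbol errors introduced on read-back
received[200] ^= 0x2A0
raw = syndromes(received)  # computed once, from the uncorrected data

right = [(n - 1 - 3, 0x155), (n - 1 - 200, 0x2A0)]  # corrections matching the errors
wrong = [(n - 1 - 3, 0x155), (n - 1 - 201, 0x2A0)]  # one correction at a wrong location
print(miscorrection_detected(raw, right), miscorrection_detected(raw, wrong))
```

Note that the syndromes are computed only once from the data as read; each correction then costs a few field multiplications, so the corrected data need not be read again to validate the ECC.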
In accordance with another embodiment of the invention, an apparatus and system implementing the inventive methods is disclosed. An apparatus for error correction and miscorrection detection in data transmission includes a memory buffer for storing user data contained within the data signal. A syndrome computer circuit is configured to receive the EDC codeword and compute EDC syndrome signals. An ECC error correction circuit is configured to receive the data signal and correct the user data contained in the memory buffer using a correction signal generated for each detection of a data corruption of the data signal. A completion signal is generated once correction of the user data is complete. An EDC syndrome fix-up circuit is configured to receive the EDC syndrome signals, each of the correction signals and the completion signal. The EDC syndrome signals are adjusted in response to each received correction signal. A miscorrection is detected if a value of the adjusted EDC syndrome signals is not equal to zero once the completion signal is received.
Advantages of the invention include the ability to detect miscorrections by the ECC without having to explicitly combine the user data to form the EDC codeword. This results in a system in which 2^m − 1 or more symbols per codeword are possible (for example, 1023 symbols for m=10), thereby eliminating the problem of error cancellation associated with the above systems. In addition, the EDC codewords can be constructed in a byte-based system with the addition of minimal hardware elements.