1. Field of the Invention
This invention relates generally to computer storage devices, and more particularly to devices for detecting and correcting errors in data that are read in from storage device media.
2. Description of the Related Art
Modern computer systems typically include one or more storage devices (e.g., hard disk drives, CD-ROM drives, DVD-ROM drives, etc.) to store large amounts of data and programs. These mass storage devices can provide information to the processors in the computer systems through random access memory (RAM) circuitry such as dynamic random access memory (DRAM), static random access memory (SRAM), etc. For most computer systems, storing information in a mass storage device and retrieving the information as needed in a hierarchical structure is generally far more economical than using RAM circuitry exclusively. As used herein, the term "storage device" refers to any suitable mass storage device that stores and accesses data to and from a circular disk, such as hard disk drives, CD-ROM and CD-RAM drives, DVD-ROM and DVD-RAM drives, removable disk drives, and the like.
Mass storage devices typically store information in sectors by using magnetic or optical technology. Like most recording technology, reading data bits from the sectors often generates errors due to noise, manufacturing imperfections of the physical medium, dust, etc. To detect and correct such errors, mass storage devices typically implement an error correction code (ECC) scheme in writing to and reading from hard disk drives. The implementation of ECC schemes allows encoding of user data for reliable recovery of the original data.
Conventional ECC schemes often implement well known Reed-Solomon codes for detecting and correcting errors in data that have been read in from the devices. The Reed-Solomon codes are defined by a generator polynomial, which has 2t consecutive powers of α as roots, where α is a primitive element in an extension field GF(2^m). In this case, each codeword polynomial c(x) will have the same sequence of roots, or c(α^i)=0, where i=1, 2, 3, . . . , 2t−1. Thus, the codeword polynomial c(x) may be evaluated at each power of α to yield a set of simultaneous equations.
In this scenario, a single received word represented as a polynomial r(x) can be expressed as the sum of the transmitted codeword polynomial c(x) and the error polynomial e(x): r(x)=c(x)+e(x). Then, the polynomial r(x) can be evaluated for each of the roots α, α^2, α^3, . . . , α^(2t−1) to yield r(α^k)=c(α^k)+e(α^k), where k=1, 2, . . . , 2t−1. Since c(α^k) is equal to zero for k=1, 3, . . . , 2t−1, the equation is simplified to: r(α^k)=e(α^k). The values produced by this equation are called syndrome values and are typically denoted as s_k=r(α^k)=e(α^k), which is equivalent to the following: e_0(α^k)^0+e_1(α^k)^1+e_2(α^k)^2+ . . . +e_(n−1)(α^k)^(n−1).
Since the coefficient e_i will either be 0 or 1, the syndrome value may be expressed as a sum of the terms having nonzero coefficients only. By reversing the order of exponents, the syndrome value can be derived in accordance with the following equation:

$$s_k = \sum_{e_i \neq 0} \left(\alpha^{i}\right)^{k}, \qquad k = 1, 3, \ldots, 2t-1 \tag{1}$$
Equation (1) thus defines a system of equations that can be solved for the nonzero coefficients e_i based on the syndrome values s_k. The Reed-Solomon codes for encoding and decoding error correction codes are well known in the art and are described in more detail in Error Coding Cookbook (1996), by C. Britton Rorabaugh, ISBN 0-07-911720-1, and in Error Control Systems for Digital Communication and Storage by Stephen B. Wicker, ISBN 0-13-200809-2. These references are incorporated herein by reference.
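The syndrome relation of equation (1) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the decoder of any particular device: the field GF(2^4), the primitive polynomial x^4+x+1, the word length n=15, and all function names are chosen here for the example only. The received word is taken to be the all-zero codeword plus an error pattern, so that r(x)=e(x). The sketch evaluates r(α^k) directly and compares the result with the sum over the error positions.

```python
# Illustrative sketch only. Assumptions (not from the source text):
# field GF(2^4), primitive polynomial x^4 + x + 1, word length n = 15.
M = 4
PRIM_POLY = 0b10011          # x^4 + x + 1
N = (1 << M) - 1             # 15 nonzero field elements

# Build antilog (EXP) and log (LOG) tables for powers of alpha.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for power in range(N):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY
for power in range(N, 2 * N):
    EXP[power] = EXP[power - N]


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndrome(received_bits, k):
    """Evaluate r(alpha^k) by Horner's rule.

    received_bits[i] is the 0/1 coefficient of x^i.
    """
    point = EXP[k % N]
    acc = 0
    for coeff in reversed(received_bits):
        acc = gf_mul(acc, point) ^ coeff
    return acc


def syndrome_from_errors(error_positions, k):
    """Right-hand side of equation (1): sum of (alpha^i)^k over error positions."""
    acc = 0
    for i in error_positions:
        acc ^= EXP[(i * k) % N]
    return acc


# All-zero codeword with bit errors at positions 3 and 10, so r(x) = e(x).
error_positions = [3, 10]
received = [0] * N
for i in error_positions:
    received[i] ^= 1

t = 2
for k in range(1, 2 * t, 2):             # k = 1, 3, ..., 2t-1
    assert syndrome(received, k) == syndrome_from_errors(error_positions, k)
```

For a single bit error at position i, equation (1) reduces to s_1 = α^i, so in this sketch the error location can be read back as `LOG[s_1]`. Locating several errors requires solving the full system of equations, which the sketch does not attempt.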
In order to utilize the ECC scheme, data is first encoded into an ECC format for storage. For example, a conventional ECC scheme typically computes ECC checkbytes for a given block of user data such as a sector. Then, the computed ECC checkbytes are appended to the sector of user data to form an ECC data sector, which is then recorded on a storage device medium. Thus, each ECC data sector typically contains user data (e.g., 512 bytes) and additional ECC checkbytes appended to the user data bytes.
Each of the ECC data sectors also includes a sync pattern or bytes for identifying the beginning of the sector. The sync pattern or bytes are thus used to delineate a sector boundary. When the recorded sectors of data are read from a storage device medium, the ECC scheme decodes the received sector data including the ECC bytes by generating syndromes for the received data in each of the sectors. Zero syndromes indicate that no error has been detected in the sector while non-zero syndromes indicate that one or more errors have been detected in the sector. For each of the sectors with non-zero syndromes, error locations and error patterns are determined and, based on the error locations and patterns, the detected errors in the sector are corrected.
Hard disk drives implementing the ECC schemes are well known in the art and are described, for example, in the following references: U.S. Pat. No. 6,192,499, by Honda Yang and entitled "Device and Method for Extending Error Correction Beyond One Sector Time" and U.S. Pat. No. 6,092,233, by Honda Yang and entitled "Pipelined Berlekamp-Massey Error Locator Polynomial Generating Apparatus and Method." In addition, optical disk drives (e.g., CD-ROM, CD-RAM, DVD-ROM, DVD-RAM, etc.) implementing the ECC schemes are also well known in the art and are described, for example, in U.S. Pat. No. 6,457,156, by Ross J. Stenfort and entitled "Error Correction Method and Apparatus." These references are incorporated herein by reference.
As the storage device density increases to store more data on a given storage medium, however, more errors will need to be detected and corrected when reading data off the medium. In addition, modern storage devices typically gain performance advantages by reading the data off the medium at a higher data rate. In both instances, more data are read and processed for a given time or more time is required to process the same amount of data. Even in the absence of these factors, it is often desirable to implement a higher correction power in the ECC schemes by increasing the number of errors detected and corrected.
Unfortunately, detecting and correcting more errors requires more time to decode errors in ECC data sectors by determining error locations and patterns. For example, in order to detect more errors, more syndromes need to be generated for a given ECC data sector. The generation of more syndromes, in turn, requires more computing resources and/or time to determine the error locations and patterns.
Furthermore, modern ECC decoders typically strive to process errors on-the-fly by computing the error patterns and locations for a received ECC data sector within the time to receive the next ECC data sector. In such a circumstance, the time to compute the error locations and patterns is further diminished. As a result, the ECC decoder may not be able to decode the errors for the received ECC data sector within the time to receive the next ECC data sector. In this case, an error event often called "correction overrun" is generated to suspend reading of the next ECC data sector from a storage medium until the ECC decoder generates the error locations and patterns. Then, the reading of the next ECC data sector resumes by waiting for the storage medium to make another revolution to the beginning of the interrupted sector. Such interruption of data flow thus causes undesirable delays and performance penalties.
One solution implements a very fast ECC decoder to ensure that the worst case buffer access latency is within the allotted time to receive the next ECC sector data. This approach, however, would require complex and expensive hardware resources for implementing the ECC decoder. For the most part, the typical time needed to correct the errors in an ECC data sector is substantially less than the time to read in the next sector. Hence, using such an expensive ECC decoder may not be economically feasible in practice. On the other end of the spectrum, using a slow but relatively inexpensive ECC decoder may lead to frequent crashing of applications when a substantial number of errors are present.
Another approach stores all the data and ECC checkbytes in a buffer as is done in optical drives such as CD-ROM drives. However, ECC checkbytes may use up a large percentage of the buffer so that the data may not be adequately cached. Furthermore, the ECC checkbytes and the sector data for each sector may not be stored in a concurrent manner, thereby requiring additional methods to make the data and ECC checkbytes concurrent. In addition, under this approach, an ECC decoder is forced to fetch both data and ECC checkbytes from the buffer to compute the syndromes for error correction. The data fetch to compute the syndromes often takes up a significant amount of buffer bandwidth.
Thus, what is needed is a cost effective device and method that can detect and correct errors on ECC data sectors on-the-fly without interrupting data flow. What is further needed is an extended error correction device and method that can be implemented without integrating costly hardware resources.
Broadly speaking, the present invention fills these needs by providing a method and device for extending error correction beyond one sector time. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium. Several inventive embodiments of the present invention are described below.
In one aspect of the invention, the present invention provides a method for detecting and correcting errors in error correction coded (ECC) data sectors. The ECC data sectors are sequentially received from a data storage medium. The method includes the operations of: (a) receiving a current ECC data sector, where the current ECC data sector is an ECC data sector currently being received; (b) while receiving the current ECC data sector, detecting errors in the current ECC data sector by generating a set of syndromes; (c) storing the set of syndromes for the current ECC data sector when the current ECC data sector has been received; (d) repeating the operations (a) through (c) for a next ECC data sector, wherein the current ECC data sector becomes a past sector and the next ECC data sector becomes the current ECC data sector; and (e) while repeating the operations (a) through (c) for the current sector, decoding the errors for the past ECC data sector by accessing the set of syndromes for the past ECC data sector. In a preferred embodiment, erasure information containing one or more bad data byte locations is also generated and stored along with the syndromes for each ECC data sector. Then, the stored erasure information is accessed along with the syndromes for decoding errors in the ECC data sectors.
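Operations (a) through (e) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The one-byte XOR check used as a "syndrome" is a placeholder for a real Reed-Solomon syndrome generator, and the names `make_sector`, `generate_syndromes`, `decode`, and `process_stream` are hypothetical. Because the sketch runs sequentially, the overlap between receiving the current sector and decoding the past sector is only indicated by the ordering of the steps. Erasure information is omitted.

```python
from collections import deque


def make_sector(user_data: bytes) -> bytes:
    """Append a one-byte XOR check (placeholder for real ECC checkbytes)."""
    check = 0
    for byte in user_data:
        check ^= byte
    return user_data + bytes([check])


def generate_syndromes(sector: bytes) -> list:
    """Placeholder syndrome set: XOR of all bytes, zero when no error is seen.

    A real device would compute Reed-Solomon syndromes as the bytes stream in.
    """
    acc = 0
    for byte in sector:
        acc ^= byte
    return [acc]


def decode(syndromes: list) -> str:
    """Stand-in for error location/pattern computation on a stored syndrome set."""
    return "clean" if all(s == 0 for s in syndromes) else "errors detected"


def process_stream(sectors):
    stored = deque()      # syndrome sets stored in operation (c)
    results = []
    for index, sector in enumerate(sectors):
        # (e) decode the past sector from its stored syndromes; in hardware
        #     this would overlap with receiving the current sector.
        if stored:
            past_index, past_syndromes = stored.popleft()
            results.append((past_index, decode(past_syndromes)))
        # (a) + (b) receive the current sector and generate its syndromes.
        syndromes = generate_syndromes(sector)
        # (c) store the syndrome set once the sector has been received.
        stored.append((index, syndromes))
        # (d) the loop repeats: the current sector becomes the past sector.
    # Decode whatever is still stored after the last sector arrives.
    while stored:
        past_index, past_syndromes = stored.popleft()
        results.append((past_index, decode(past_syndromes)))
    return results


sectors = [make_sector(b"abc"), make_sector(b"def"), make_sector(b"ghi")]
damaged = bytearray(sectors[1])
damaged[0] ^= 0x01                       # inject a single bit error
sectors[1] = bytes(damaged)
report = process_stream(sectors)
```

In this sketch the decoder never reads the sector data itself, only the stored syndrome set, which mirrors the role the stored syndromes play in operation (e).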
In another aspect of the invention, the present invention provides a device for detecting and correcting errors in error correction coded (ECC) data sectors. The ECC data sectors are sequentially received as a data stream from a data storage medium. The device includes a buffer and an error detection and correction (EDAC) circuitry. The buffer is arranged to sequentially receive and store the ECC data sectors from the data storage medium. The EDAC circuitry is arranged to sequentially receive the ECC data sectors for sequentially generating a plurality of syndrome sets for the ECC data sectors with one syndrome set per ECC data sector. Each syndrome set includes a plurality of syndromes. The EDAC circuitry sequentially stores the syndrome sets into the buffer while accessing the stored syndrome sets sequentially to decode errors in the associated ECC data sectors. In one embodiment, the device generates erasure information, which is stored along with the syndromes for each ECC data sector. The EDAC circuitry accesses the stored erasure information and the syndromes for decoding errors in the ECC data sectors.
In yet another aspect of the invention, the present invention provides a method for detecting and correcting errors in error correction coded (ECC) data sectors. The ECC data sectors are sequentially received as a data stream from a data storage medium. The ECC data sectors are sequentially received and stored from the data storage medium. While receiving the ECC data sectors, a plurality of syndrome sets is sequentially generated for the ECC data sectors with one syndrome set per ECC data sector. Each syndrome set includes a plurality of syndromes. The generated syndrome sets are sequentially stored. Then, the stored syndrome sets are accessed to decode errors in the associated ECC data sectors. In one embodiment, erasure information is also generated and stored in the buffer along with the syndromes for each ECC data sector. The stored erasure information is then accessed along with the syndromes for sequentially decoding errors in the ECC data sectors.
The present invention thus generates syndromes as the ECC data sectors are received and stores the generated syndromes in a buffer. The stored syndromes can then be fetched from the buffer for performing error correction. The buffer is updated with a new set of syndromes for a new ECC data sector after the new sector has been received. By thus storing the generated syndromes for each of the sectors, the present invention effectively decouples decoding of errors from the one sector time limitation without substantially affecting buffer performance and without the increased cost associated with a faster EDAC circuitry.
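The effect of decoupling decoding from the one sector time limit can be illustrated with a toy timing model. The sketch below is an assumption-laden illustration only: the time units, the sample decode times, and the function names `count_overruns` and `max_backlog` are hypothetical and are not taken from any measured device. It compares a decoder that must finish within one sector time against one that works through stored syndrome sets in order.

```python
def count_overruns(decode_times, sector_time):
    """On-the-fly model: any decode longer than one sector time is an overrun."""
    return sum(1 for t in decode_times if t > sector_time)


def max_backlog(decode_times, sector_time):
    """Stored-syndrome model: largest number of syndrome sets awaiting decode.

    Assumes the syndrome set of sector k becomes available when that sector
    has been fully received, and that sets are decoded strictly in order.
    """
    decoder_free_at = 0
    finish_times = []
    worst = 0
    for k, decode_time in enumerate(decode_times):
        arrival = (k + 1) * sector_time
        start = max(arrival, decoder_free_at)
        decoder_free_at = start + decode_time
        # Sets whose decoding completed by the time this set arrives.
        finished = sum(1 for f in finish_times if f <= arrival)
        pending = (k + 1) - finished
        worst = max(worst, pending)
        finish_times.append(decoder_free_at)
    return worst


# Hypothetical decode times in arbitrary units; one sector time is 10 units.
times = [2, 2, 18, 2, 2]
overruns = count_overruns(times, sector_time=10)
backlog = max_backlog(times, sector_time=10)
```

With these assumed numbers, the on-the-fly model registers one overrun for the slow sector, while the stored-syndrome model never has more than two syndrome sets outstanding, which suggests how a small amount of syndrome storage can absorb an occasional long decode.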
Specifically, decoding of errors need not occur within the time to receive the next ECC data sector since the syndromes are stored and accessed for on-the-fly error correction. For optical disk drives, in particular, buffer performance is not degraded since data need not be fetched to compute syndromes. Furthermore, by using the buffer in a circular buffer configuration to overwrite previously accessed syndromes, the buffer space for storing the syndromes can be kept to a minimum. Accordingly, the present invention provides significant savings in cost while providing a performance boost at the same time.
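The circular buffer configuration mentioned above can be sketched as follows. This is a minimal sketch, assuming a fixed number of slots with one syndrome set per slot. The class name `SyndromeRing`, its methods, and the choice to refuse a store when every slot still holds an unaccessed set are assumptions made for illustration, since the text does not specify the behavior when the buffer is full.

```python
class SyndromeRing:
    """Fixed-size ring of syndrome sets; accessed slots may be overwritten."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count
        self.write_index = 0    # next slot to receive a new syndrome set
        self.read_index = 0     # oldest syndrome set not yet accessed
        self.pending = 0        # number of stored, not yet accessed sets

    def store(self, syndrome_set):
        """Store a new set, reusing a slot whose contents were already accessed.

        Returns False (an assumed policy) when no accessed slot is available.
        """
        if self.pending == len(self.slots):
            return False
        self.slots[self.write_index] = syndrome_set
        self.write_index = (self.write_index + 1) % len(self.slots)
        self.pending += 1
        return True

    def fetch(self):
        """Return the oldest stored set for decoding, or None if none is waiting."""
        if self.pending == 0:
            return None
        syndrome_set = self.slots[self.read_index]
        self.read_index = (self.read_index + 1) % len(self.slots)
        self.pending -= 1
        return syndrome_set


ring = SyndromeRing(slot_count=2)
ring.store([0, 0])          # syndromes for sector 0
ring.store([7, 3])          # syndromes for sector 1
oldest = ring.fetch()       # decoder accesses sector 0's syndromes
ring.store([0, 5])          # sector 2 reuses the slot freed by sector 0
```

Only syndrome sets occupy the ring in this sketch, so its size is set by how far decoding is allowed to lag behind reception rather than by the sector size.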