A hard error in a hard disk drive (HDD) occurs when the data in a sector cannot be recovered despite repeated attempts. Hard errors are especially important in enterprise storage applications. For example, in RAID 5 systems the most likely mechanism for data loss is that a hard drive fails followed by a subsequent hard error on one of the other (redundant) drives during the rebuild process. For this reason, hard error rate is carefully monitored during the process of qualifying a new enterprise HDD product.
Soft errors, i.e., misreads due to poor signal-to-noise ratio or disturbances in the read process, can usually be eliminated by repeated re-reads. In contrast, hard errors are usually caused by problems which are repeatable from read to read. Sources of hard errors include scratches and other media defects, as well as disturbances, such as a head-disk contact, that occurred when the sector was written (collectively, “defects”). Defects tend to produce bursts of errors which can be corrected very efficiently by the error correction code (ECC) of the HDD. As an example of the power of ECCs, in an HDD with 4 kB sector formats, bursts of errors up to almost 3200 bits in length in the data field can be corrected, assuming that almost all of the ECC redundancy bytes can be used for erasure correction as opposed to error correction.
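The relationship between ECC redundancy and correctable burst length can be illustrated with a minimal sketch. It assumes a symbol-oriented code in which every redundancy symbol can be spent on erasure correction, so that any pattern of erasures touching no more symbols than there are redundancy symbols is recoverable. The function names, the 8-bit symbol size, and the count of 4 redundancy symbols are hypothetical values chosen for illustration only; they are not the parameters of any actual HDD format.

```python
def symbols_touched(start_bit, burst_bits, symbol_bits):
    """Number of code symbols overlapped by a contiguous burst of bits."""
    first_symbol = start_bit // symbol_bits
    last_symbol = (start_bit + burst_bits - 1) // symbol_bits
    return last_symbol - first_symbol + 1


def burst_is_correctable(start_bit, burst_bits, symbol_bits, redundancy_symbols):
    """Assumption: one redundancy symbol recovers one erased symbol."""
    return symbols_touched(start_bit, burst_bits, symbol_bits) <= redundancy_symbols


def max_guaranteed_burst_bits(redundancy_symbols, symbol_bits):
    """Longest burst that is correctable at every possible bit alignment.

    In the worst case the burst starts on the last bit of a symbol, so it
    overlaps one symbol by a single bit and then fills whole symbols.
    """
    return (redundancy_symbols - 1) * symbol_bits + 1


# Hypothetical parameters: 8-bit symbols, 4 redundancy symbols.
limit = max_guaranteed_burst_bits(redundancy_symbols=4, symbol_bits=8)
```

With these illustrative values the guaranteed limit is 25 bits: a 26-bit burst that happens to start on the last bit of a symbol would overlap five symbols and exceed the four available redundancy symbols, although the same burst at a more favorable alignment would still be correctable.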
As understood by the present invention, each data storage sector of a HDD begins with a preamble consisting of a sync field and one or two sets of sync bytes. The preamble is used in accordance with HDD principles known in the art to coordinate proper reading of the ensuing data field of the sector. Accordingly, if a defect destroys both sets of sync bytes or a large proportion of the sync field then the data in the main body of the sector cannot be read reliably. This means that a relatively small defect, if it occurs in the wrong location, i.e., in the preamble, can cause a hard error that cannot be corrected by the ECC.
As further understood herein, in present 512 byte sector formats the likelihood of sector failure due to a defect compromising the preamble is less than the likelihood of sector failure due to a defect compromising the main data field to the extent that it overwhelms the capacity of the ECC to correct it. In 4 kB sector formats the ECC is more robust than in 512 byte formats, meaning that the likelihood that a defect in the main data field of a 4 kB format sector will overwhelm the ECC is much less than in a 512 byte format sector. As critically observed herein, however, the likelihood that a defect compromises the preamble beyond repair remains almost the same in both 512 byte and 4 kB formats, and thus becomes the dominant mechanism for hard errors, particularly in 4 kB formats. That is, for conventional sector formats, even small bursts of errors can cause a sector to fail if the burst occurs around the sync byte at the end of the preamble.
The disclosure below refers to “burst erasure correction power”. As is understood by those skilled in the art, this is an intrinsic property of an error correcting code. Error correcting codes have a fundamental parameter called minimum (Hamming) distance, which is the smallest number of symbols that must change to go from one valid codeword to another. For uncoded data the minimum distance is one, since a single symbol of a codeword can be changed to arrive at another codeword, whereas for data with a parity symbol the minimum distance is two, because a data symbol of a codeword can be changed along with the parity symbol to arrive at a codeword with valid parity. The value of the minimum distance in this latter case is the number of redundant parity symbols plus one. This can be proved to be the theoretical maximum value in all cases. Codes that meet this limit are known as Maximum Distance Separable or MDS codes, of which Reed-Solomon codes are one example. In any case, a code with distance 2T+1 can always correct T or fewer errors. Furthermore, a code with distance 2T+1 can always reconstruct 2T or fewer erased symbols. Regardless of how calculated, this latter characteristic, i.e., of erasure correction power, is referred to herein as “burst erasure correction power”.
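The minimum distance values stated above for uncoded data and for data with a single parity symbol can be checked by brute force. The following is a minimal sketch, assuming binary symbols and an even-parity convention; all function names are hypothetical and chosen for illustration. It also shows that the distance-two parity code, while unable to correct an error at an unknown position, can reconstruct one erased symbol whose position is known.

```python
from itertools import combinations, product


def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def uncoded(k):
    """Every k-bit word is a valid codeword (no redundancy)."""
    return [tuple(bits) for bits in product((0, 1), repeat=k)]


def single_parity(k):
    """k data bits followed by one even-parity bit (assumed convention)."""
    return [bits + (sum(bits) % 2,) for bits in uncoded(k)]


def fill_erasure(word):
    """Reconstruct the single erased symbol, marked None, from even parity."""
    missing = word.index(None)
    known_sum = sum(b for b in word if b is not None)
    repaired = list(word)
    repaired[missing] = known_sum % 2  # value that restores even parity
    return tuple(repaired)


d_uncoded = minimum_distance(uncoded(3))       # no parity symbols
d_parity = minimum_distance(single_parity(3))  # one parity symbol
recovered = fill_erasure((1, None, 1, 0))      # second symbol erased
```

Here `d_uncoded` evaluates to 1 and `d_parity` to 2, i.e., the number of parity symbols plus one, and `recovered` is `(1, 0, 1, 0)`. More generally, a code of minimum distance d can reconstruct up to d-1 erased symbols, which agrees with the 2T figure given above for a code of distance 2T+1.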