A disk drive is a digital data storage device that stores digital information within concentric tracks on a storage disk. In magnetic disk drive systems, the storage disk is coated with a magnetic material that is capable of changing its magnetic orientation in response to an applied magnetic field. During operation of a disk drive, the disk is rotated about a central axis at a substantially constant rate. To read data from or write to the disk, a magnetic transducer is centered above a desired track of the disk while the disk is spinning. Writing is performed by delivering a write signal having a variable current to the transducer while the transducer is held close to the spinning track. The write signal creates a variable magnetic field at a gap portion of the transducer that induces magnetic polarity transitions into the desired track which are representative of the data being stored.
Reading is performed by sensing the magnetic polarity transitions on the rotating track with the transducer. As the disk spins below the transducer, the magnetic polarity transitions on the track present a varying magnetic field to the transducer. The transducer converts the varying magnetic field into an analog read signal that is then delivered to a read channel for appropriate processing. The read channel converts the analog read signal into a properly timed digital signal that can be recognized by a host computer system.
The transducer can include a single element, such as an inductive read/write element, for use in both reading and writing, or it can include separate read and write elements. Transducers that include separate elements for reading and writing are known as “dual element heads” and usually include a magnetoresistive (MR) read element for performing the read function. Dual element heads are advantageous because each element of the transducer can be optimized to perform its particular function. For example, MR read elements are more sensitive to small variable magnetic fields than are inductive heads and thus can read much fainter signals from the disk surface. Because MR elements are more sensitive, data can be more densely packed on the surface of the disk. MR elements, however, are not capable of writing to the disk surface.
There are many variables that can affect the read performance of a magnetic disk drive. One such variable is the flying height of the transducer above the disk surface during the write operation that wrote the data. If the transducer is not within a specific flying height range during the write operation, the number of read errors increases significantly. These types of read errors are commonly referred to as high fly write errors. Another variable that affects read performance is the strength and position of the magnetic polarity transitions on the surface of the disk. If the transitions are weak or the data is not properly “centered” on the track, then the signal-to-noise ratio (SNR) of the analog read signal will be correspondingly low, and poor read performance may result. These types of errors are commonly referred to as track misregistration (TMR) or offtrack errors. A further variable that can affect read performance is the presence of foreign particles or other aberrations on the surface of the disk that modulate the analog read signal as they pass the transducer. Signal distortions created by such particles are known as thermal asperities. Other types of errors, well understood by those of skill in the art, may also be present.
Disk drives typically have error recovery routines that help a drive recover from errors (e.g., the errors mentioned above) and read the data from the disk surface. The type of error recovery routine used depends on the type of error present. Generally, when recovering from errors, a disk drive follows a preset error recovery table. This table contains error recovery steps that are often successful at recovering particular errors. Upon detecting an error, the disk drive enters an error recovery routine in which the error recovery steps are initiated in the preset order contained in the table. If an error recovery step does not recover the error, the disk drive moves to the next step in the table and attempts to recover the error according to that step. This continues until the error is recovered, no steps remain in the table, or the host reaches a time limit for receiving the data. When the host reaches this time limit, the disk drive is notified and discontinues error recovery.
For example, the first error recovery step in the error recovery table may be a step for recovering high fly write errors. Upon detecting an error, the disk drive enters the error recovery routine and tries this step first. If the high fly write error recovery step recovers the data, the disk drive exits the error recovery routine, delivers the data, and continues normal operation. If the step is not successful, the next error recovery step in the table is attempted. As mentioned above, this continues until the error is recovered or until a maximum retry limit, corresponding to the number of entries in the table, is reached. The error recovery table typically contains more entries than can be attempted before the host reaches its time limit for receiving the data. However, if all of the error recovery steps in the table are attempted without a successful recovery, the disk drive reports a fatal error.
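The table-driven routine described above can be sketched as follows. This is a minimal, hypothetical model rather than drive firmware: the function `run_error_recovery`, the `RecoveryResult` type, the step names, and the `attempt` callback (which stands in for re-reading the data with a given step) are illustrative assumptions, as is the simplification that each attempt costs exactly one revolution and that the host time limit can be counted in revolutions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RecoveryResult:
    status: str          # "recovered", "host_timeout", or "fatal"
    step: Optional[str]  # step that recovered the data, if any
    revolutions: int     # disk revolutions consumed by recovery


def run_error_recovery(
    table: Sequence[str],
    attempt: Callable[[str], bool],
    host_limit_revs: int,
) -> RecoveryResult:
    """Walk a preset error recovery table in its fixed order.

    Hypothetical model: each attempted step costs one revolution, and the
    host time limit is expressed as a whole number of revolutions.
    """
    revs = 0
    for step in table:
        if revs >= host_limit_revs:
            # Host time limit reached: the drive is notified and stops.
            return RecoveryResult("host_timeout", None, revs)
        revs += 1  # one full revolution per attempted step
        if attempt(step):
            return RecoveryResult("recovered", step, revs)
    # Every step in the table was tried without success.
    return RecoveryResult("fatal", None, revs)


if __name__ == "__main__":
    table = ["high_fly_write", "offtrack", "thermal_asperity"]
    # Simulate an error that only the offtrack step can recover.
    print(run_error_recovery(table, lambda s: s == "offtrack", host_limit_revs=10))
```

With this table and an error that only the offtrack step recovers, the routine reports `status="recovered"` after two revolutions; if no step succeeds before the host limit is reached, it reports `"host_timeout"`, and if the whole table is exhausted first, it reports `"fatal"`.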
The error recovery table is typically generated such that the step that recovers the most common error in a population of disk drives is the first step attempted, the step that recovers the second most common error is the second step, and so on. This order of error recovery steps is preset, and disk drives follow it when performing an error recovery routine. As is understood in the art, each error recovery step requires a full revolution of the disk, during which the drive attempts to read the data using that step. If a step is not successful, the disk drive moves to the next step in the table, waits for the beginning of the next revolution of the disk, and attempts to recover the error using that next step. Thus, the further down the table the routine must go to recover an error, the longer the recovery takes. Increased error recovery time reduces the amount of data delivered by the disk drive, and thus reduces its transfer rate compared to a drive that has fewer errors or recovers from errors more quickly. Because transfer rate is an important performance factor in disk drives, it would be beneficial to reduce error recovery time.
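The cost of going further down the table can be made concrete with simple arithmetic. Assuming one full revolution per attempted step, a drive spends 60,000 / RPM milliseconds per step, so recovery time grows linearly with the table position of the step that succeeds. The function names and the 7200 RPM spindle speed below are illustrative assumptions, not values from the text.

```python
def revolution_time_ms(rpm: float) -> float:
    """Duration of one disk revolution in milliseconds."""
    return 60_000.0 / rpm


def time_to_recover_ms(step_index: int, rpm: float) -> float:
    """Recovery time when the step at 1-based position `step_index` succeeds.

    Assumes every attempted step, including the successful one, costs
    exactly one revolution (waiting for the start of the next revolution
    is folded into that revolution).
    """
    if step_index < 1:
        raise ValueError("step_index is 1-based and must be >= 1")
    return step_index * revolution_time_ms(rpm)


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(f"step {k:2d}: {time_to_recover_ms(k, 7200):.2f} ms")
```

At the assumed 7200 RPM, one revolution takes about 8.33 ms, so an error first recovered by the tenth step costs about 83.3 ms of recovery time, ten times as much as one recovered by the first step.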
The order of error recovery steps in an error recovery table is typically derived from extensive testing of a sample population of disk drives, and may be adjusted as more drives are produced if a different order of error frequency emerges. Thus, for drives that are typical of the population, the time to recover from an error is reduced, because the steps for the errors most commonly encountered in the population are attempted first. However, testing disk drives to determine the order of steps in an error recovery table can consume a significant amount of resources. Thus, it would be beneficial to reduce the resources required to enhance the error recovery routine.
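As an illustration of how population test data might be turned into a table order, the sketch below takes a hypothetical log naming, for each error recovered during testing of the sample drives, the step that recovered it, and sorts the steps by descending count. The log format, step names, function name `order_table`, and alphabetical tie-breaking are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, List


def order_table(recovering_steps: Iterable[str]) -> List[str]:
    """Order steps by how often each one recovered an error in testing.

    `recovering_steps` has one entry per recovered error observed across
    the sample population (hypothetical log format), naming the step that
    recovered it.
    """
    counts = Counter(recovering_steps)
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda step: (-counts[step], step))


if __name__ == "__main__":
    log = ["offtrack", "high_fly_write", "offtrack",
           "thermal_asperity", "offtrack", "high_fly_write"]
    print(order_table(log))
```

A log in which the offtrack step recovered three errors, the high fly write step two, and the thermal asperity step one yields the order `["offtrack", "high_fly_write", "thermal_asperity"]`. Steps never observed to recover an error in testing do not appear and would have to be appended separately.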
Even with an error recovery routine that first attempts to recover the errors most common in the population of disk drives, the routine may not be optimized for certain drives, because some disk drives are outliers with respect to the rest of the population. These outlier drives do not share the population's error occurrence frequencies, which results in increased error recovery time compared to a drive that is typical of the population. Outlier drives may have a relatively large number of errors not typically encountered in the population in general because of a number of factors, such as non-uniformity in the magnetic media. These read errors can generally be recovered, although the step that recovers them may be relatively far down the error recovery table. Thus, the error recovery table used for the entire population may be less efficient for these outlier drives, which can increase the time required to recover from errors.
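The inefficiency for outlier drives can be quantified as the expected number of revolutions spent per recovered error. The sketch below assumes each error is recovered by exactly one step, every attempted step costs one revolution, and `freq` gives the fraction of a particular drive's errors recovered by each step; the function name, step names, and frequencies are hypothetical and are not measured data.

```python
from typing import Dict, Sequence


def expected_revolutions(table: Sequence[str], freq: Dict[str, float]) -> float:
    """Expected revolutions spent recovering one error on a given drive.

    Assumptions (illustrative only): each error is recovered by exactly one
    step, steps are tried in table order at one revolution each, and
    `freq[step]` is the fraction of this drive's errors that `step` recovers.
    Every key in `freq` must appear in `table`.
    """
    position = {step: i + 1 for i, step in enumerate(table)}  # 1-based
    return sum(p * position[step] for step, p in freq.items())


if __name__ == "__main__":
    population_order = ["high_fly_write", "offtrack", "thermal_asperity"]
    # Hypothetical outlier drive whose most frequent error sits last
    # in the population order.
    outlier = {"thermal_asperity": 0.7, "offtrack": 0.2, "high_fly_write": 0.1}
    own_order = sorted(outlier, key=outlier.get, reverse=True)
    print(expected_revolutions(population_order, outlier))
    print(expected_revolutions(own_order, outlier))
```

For this hypothetical outlier drive, the population order costs 0.1×1 + 0.2×2 + 0.7×3 = 2.6 revolutions per error on average, whereas an order matched to that drive's own frequencies would cost 0.7×1 + 0.2×2 + 0.1×3 = 1.4 revolutions.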
Furthermore, as the bits per inch (BPI) and tracks per inch (TPI) increase on hard disk drives, error recovery becomes less predictable, because these localized areas of non-uniformity in the magnetic media have a more significant impact on the read signal. This traditionally has been compensated for by requiring tighter design limits on the distribution of materials which are used for the magnetic media. However, these design limits are becoming more difficult to control, and the material distribution in the magnetic media is playing an increased role in determining the frequency and type of error which occurs in an individual drive. These factors result in many more disk drives being outliers with respect to the population of disk drives. Accordingly, it would be advantageous to have an enhanced error recovery routine for outlier drives.
Furthermore, many original equipment manufacturers (OEMs) that incorporate disk drives into their products are requiring tighter performance standards for disk drives. As a result, OEMs often reject disk drives that have a relatively low transfer rate. Low transfer rates are often the result of increased error recovery time within the disk drive. In many cases, increased error recovery time results from the drive having to perform many steps in the error recovery table before reaching the step that recovers the error. This often happens because a portion of the disk surface has non-uniform magnetic media, the effect of which is magnified as BPI and TPI increase. Thus, it would be advantageous to have an error recovery routine that improves transfer rates so that more drives meet OEM performance standards.
Accordingly, it would be advantageous to have an error recovery scheme which can (1) reduce the amount of time required to enhance the error recovery routine, (2) reduce the error recovery time for disk drives which are outliers with respect to the rest of the population, and (3) improve the transfer rate of a disk drive.