1. Field of the Invention
This invention relates to data recovery in a storage device and more particularly relates to write recovery of erroneous data in the storage device.
2. Description of the Related Art
A modern computer utilizes a storage system consisting of a storage controller and at least one attached storage device. Typically, the storage device is a hard disk drive (“HDD”), floppy disk, optical disk, tape drive, micromechanical device, solid state disk, or the like. The storage controller can be a simple host bus adapter, or a sophisticated redundant array of independent disks (“RAID”) controller, managing the operation of non-redundant RAID and/or redundant RAID storage devices. The computer may be a personal computer (“PC”), a server or client computer, a network computer, and/or another type of general/special-purpose computer.
Regardless of the type of computer or the type of storage controller to which a storage device such as an HDD is attached, the HDD performs the basic function of reading and writing data for the computer in response to READ and WRITE commands, respectively, from the storage controller. Without explicit knowledge of the upstream hardware and software capabilities available, or of the type of application in which the data is used, the HDD invariably strives to attain independently the highest degree of user data protection possible. Specifically, the HDD may employ a near-worst case approach to protecting data. This near-worst case approach is justifiable because, apart from redundant RAID storage systems, wherein unreadable data from one member HDD can be regenerated from other member HDDs of the array, a vast number of computers, especially PCs, each employ a single HDD, a non-redundant RAID array, or some other drive configuration without redundancy for cost/performance or other reasons. It is well known that failure of the HDD to deliver data reliably when requested may have extremely serious consequences for a variety of computer applications, such as financial calculations, manufacturing processes, and health and environmental controls, to name a few.
When a READ or WRITE command is completed with no error, the HDD, conforming to Small Computer System Interface (“SCSI”) standards, for example, returns a GOOD status. For performance reasons, data is typically written on the HDD without an immediate readback to verify that the data is correctly written. Unless a failure occurs during the write operation, the write is considered successful, with a GOOD status returned from the HDD. Unfortunately, as is well known to those skilled in the art, even though no write error was encountered at a certain data block location on the HDD, it is common for a subsequent read operation at the data block to be unsuccessful. As HDDs increase recording density, pushing the technology to its limits with ever closer track and sector spacing, more data on the HDDs being accessed, higher spindle spin rates, and weaker signals to avoid interference, the probability of read error occurrences increases dramatically.
HDDs generally utilize a number of advanced techniques to manage errors while reading data from the media. The basis of read error detection and correction in an HDD is the inclusion of a powerful error correction code (“ECC”) consisting of a number of bits appended to the end of a fixed-length block of data in each disk sector. Errors corrected by the ECC on-the-fly are not considered real read errors. When a data block encounters an error that fails to be readily corrected by the ECC, the HDD enters into an automatic retry.
Basically, there are two types of read errors: recovered and unrecovered. Recovered read errors are errors that require re-reads to retrieve the data without error. Depending upon the nature of the error detected and the manufacturer's preference, one or more methods may be invoked during read retries, such as reloading read channel registers to calibrated values, using various off-track offsets and retrying the read, using a fixed gain while retrying, margining the error tolerance of sync mark detection, applying an advanced software ECC algorithm, and switching the bias current of a certain head between retries. Some of the methods are time-consuming and complex. The erroneous data may be recovered after applying those techniques. Unrecovered read errors are those that are not correctable using the ECC or retries within the retry limits specified by the host computer, even after sophisticated correction methods are applied.
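The distinction between recovered and unrecovered read errors can be illustrated with a minimal Python sketch. It is not any manufacturer's firmware: the names ReadOutcome and read_with_retries, and the read_once callback that stands in for one read attempt (returning the data, or None when the ECC cannot correct it), are assumptions made for illustration only.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class ReadOutcome(Enum):
    GOOD = "good"                # clean, or corrected on the fly by the ECC
    RECOVERED = "recovered"      # data obtained only after one or more re-reads
    UNRECOVERED = "unrecovered"  # retry limit exhausted without good data


def read_with_retries(
    read_once: Callable[[int], Optional[bytes]],
    retry_limit: int,
) -> Tuple[ReadOutcome, Optional[bytes], int]:
    """Classify one block read.

    read_once(attempt) is a hypothetical stand-in for a single read attempt;
    attempt 0 is the initial read and attempts 1..retry_limit are retries,
    each of which could apply a different recovery method.
    Returns (outcome, data, number of retries used).
    """
    data = read_once(0)
    if data is not None:
        return ReadOutcome.GOOD, data, 0
    for attempt in range(1, retry_limit + 1):
        data = read_once(attempt)
        if data is not None:
            return ReadOutcome.RECOVERED, data, attempt
    return ReadOutcome.UNRECOVERED, None, retry_limit
```

In this sketch the retry limit is a parameter because, as described above, the limit is specified from outside the drive rather than fixed by it.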
Recurring data error activity at the same physical location is an indication of a problem. The problem can be due to magnetic damage or a media defect. Magnetic damage is a defect in the bit pattern written to the media. A media defect such as a pit, scratch, or thin spot is physical damage to the recording capability of the media. Recovery action to correct these types of defects differs. In both cases, however, the error can be corrected without replacing the HDD unit. For magnetic defects, a rewrite at the failing location address may be all that is required. For media defects, the data block written in the defective physical sector is reallocated to a spare sector, usually defect-free. HDDs determine the need to either rewrite or reallocate during a read error recovery.
All HDDs have spare sectors located across the drive. Defective data blocks such as those containing marginally recovered read errors found during normal use of the HDD can be reallocated by the HDD automatically during a read operation if allowed by the computer. Prior to the reallocation, the HDD may, for example, first verify that the original sector location is defective with multiple tests involving writes and verifies using the recovered data. If those tests fail, the HDD then reallocates the recovered data to a new location using one of the available spare sectors and stores the recovered data therein. If the automatic reallocation during a read is not allowed, the HDD recommends that the storage controller initiate the reallocation. For unrecovered read errors, the HDD generally recommends that the storage controller reallocate the defective blocks, since the HDD does not have valid replacement data for those defective blocks.
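The rewrite-or-reallocate decision for a recovered block can be sketched as follows. This is an illustrative model only, under stated assumptions: ToyDisk is a hypothetical in-memory disk in which a media defect is modeled as a sector where writes do not persist, recover_block and its return strings are invented names, and treating any single failed write-and-verify pass as proof of a defect is a simplification.

```python
from typing import Dict, List, Set


class ToyDisk:
    """Hypothetical in-memory disk with a pool of spare sectors."""

    def __init__(self, spares: List[int], bad: Set[int]):
        self.media: Dict[int, bytes] = {}
        self.spares = list(spares)        # unused spare sector addresses
        self.bad = set(bad)               # sectors with a media defect
        self.remap: Dict[int, int] = {}   # defective sector -> spare sector

    def write(self, sector: int, data: bytes) -> None:
        if sector not in self.bad:        # writes to a defective sector are lost
            self.media[sector] = data

    def verify(self, sector: int, data: bytes) -> bool:
        return self.media.get(sector) == data


def recover_block(disk: ToyDisk, sector: int, recovered: bytes,
                  auto_realloc_allowed: bool, test_passes: int = 3) -> str:
    """Test the original sector with the recovered data, then act on the result."""
    defective = False
    for _ in range(test_passes):
        disk.write(sector, recovered)
        if not disk.verify(sector, recovered):
            defective = True
            break
    if not defective:
        return "rewritten"                # a rewrite in place was sufficient
    if not auto_realloc_allowed:
        return "recommend_reassign"       # left to the storage controller
    if not disk.spares:
        raise RuntimeError("no spare sectors left")
    spare = disk.spares.pop(0)            # the drive chooses the spare
    disk.write(spare, recovered)
    disk.remap[sector] = spare
    return "reallocated"
```

For example, with `disk = ToyDisk(spares=[100, 101], bad={7})`, calling `recover_block(disk, 7, b"b", True)` moves the data to spare sector 100 and records the mapping in `disk.remap`, whereas the same call with `auto_realloc_allowed=False` only yields a recommendation.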
The logical block address (“LBA”) addressing scheme used for specifying locations of data blocks stored on the HDD is also used for reallocating data blocks. The LBA addressing scheme describes the disk as a linear, consecutively-numbered set of logical data blocks. Each of those consecutive numbers is known as the LBA of the data block. The HDD maps a requested LBA to a particular cylinder-head-sector (“CHS”) address for accessing the data block on the media in response to a READ or WRITE command. For reallocated defective blocks, the HDD maintains a list of their LBAs and the reassigned CHS address of each. The operation of reallocating a defective block to a spare block is also commonly referred to as block reassignment or simply reassignment. The selection of an available spare sector for reallocation is made by the HDD regardless of whether the reassignment is initiated by the HDD or the storage controller.
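The LBA-to-CHS mapping and the reassignment list can be sketched in Python as below. The sketch assumes a uniform geometry (a fixed number of heads and sectors per track) and uses the conventional arithmetic with 1-based sector numbers; actual drives use internal mappings that are more involved. The names CHS, lba_to_chs, and resolve, and the dictionary used as the reassignment list, are illustrative assumptions.

```python
from typing import Dict, NamedTuple


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # 1-based by convention


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> CHS:
    """Default mapping for a uniform geometry (a simplifying assumption)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(remainder, sectors_per_track)
    return CHS(cylinder, head, sector0 + 1)


def resolve(lba: int, reassigned: Dict[int, CHS],
            heads: int, sectors_per_track: int) -> CHS:
    """Consult the reassignment list first, then fall back to the default map."""
    if lba in reassigned:
        return reassigned[lba]
    return lba_to_chs(lba, heads, sectors_per_track)
```

With 16 heads and 63 sectors per track, `lba_to_chs(63, 16, 63)` gives `CHS(0, 1, 1)`. If LBA 63 has been reassigned, for instance `reassigned = {63: CHS(999, 0, 5)}`, then `resolve(63, reassigned, 16, 63)` returns the spare location while the LBA seen by the storage controller is unchanged.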
The storage controller may be configured to initiate reassignment operations. When recovered errors are reported during a read, some storage controllers take proper recovery action in accordance with the sense data received, such as performing a reassignment as recommended. Details of appropriate error recovery procedures will be discussed later. Unfortunately, how other storage controllers respond to sense data recommending a reassignment varies from one manufacturer to another. For recovered read errors, certain storage controllers may not perform reassignments because their developers assume that the data transferred to the read buffer by the HDD in each case is already good. Furthermore, in some cases in which the HDD is not allowed by the computer to report recovered errors, the storage controller will certainly not reassign any marginally recovered data blocks, a reassignment that the HDD would have explicitly recommended had reporting of recovered errors been allowed.
Normally, when a data block needs to be updated, a WRITE command is issued by the storage controller, without prior knowledge or retained memory that the block on the HDD may have been determined by the HDD as defective (marginally recoverable or unrecoverable, but not yet reassigned for any of the aforesaid reasons). Merely rewriting a previously found erroneous block may not resolve the problem, especially when a reassignment for that block is already recommended by the HDD. On a write request by the host computer, the storage controller will usually issue a WRITE command to the HDD. Unless a write error occurs, the storage controller will not normally verify the data written or perform a block reassignment. As a result, the data may remain erroneous at the same defective physical location on the HDD. Therefore, in some cases, data at a defective location may never be recovered even when updated data or replacement data becomes available for a write or relocation.
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method that allow the HDD to enforce a write recovery procedure to be performed by the storage controller, including a reassignment to a spare sector recommended for a defective data block. Beneficially, such an apparatus, system, and method would increase computer system performance by avoiding futile re-read attempts.