1. Field
The disclosure relates to a method, system, and article of manufacture for the detection and recovery of dropped writes in storage devices.
2. Background
Write operations that write data to disk drives may fail intermittently or persistently. To detect such failures, certain drives use head read and write preamplification circuits that detect whether the write current is below a certain threshold. The added complexity of the detection circuitry may present reliability problems. Additionally, the detection threshold setting may not ensure the detection of all write errors.
Certain implementations may attempt to detect whether a disk drive has a dropped-write problem by periodically moving the actuator to a reserved area of the disk, writing test data with each head, and then verifying what each head wrote. This verification of the writeability of all heads may be referred to as a persistent problem self test (PPST). This mechanism detects dropped writes only if the write problem is persistent, i.e., the drive continues to drop all subsequent writes involving the bad head(s). The PPST verification mechanism is not fully effective in detecting intermittent dropped writes. Additionally, PPST verification does not allow recovery of data lost to dropped writes that occur between successive PPST writeability verifications. Furthermore, if the frequency of PPST verifications is increased to minimize the amount of data corruption, the input/output (I/O) performance may degrade to an unacceptable level.
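For illustration only, the PPST check described above might be sketched as follows. This is a minimal model under stated assumptions, not the disclosed mechanism: the `ToyDrive` class, its `persistent_bad` and `drop_schedule` parameters, the `RESERVED_BLOCK` address, and the test-pattern format are all hypothetical. The sketch shows why a periodic per-head check flags a head that drops every write but can miss a write that was dropped once between checks.

```python
from typing import Dict, Iterable, List, Optional, Tuple

RESERVED_BLOCK = -1  # hypothetical address of the reserved test area


class ToyDrive:
    """Hypothetical multi-head drive model, for illustration only.

    persistent_bad: heads that drop every write (persistent fault).
    drop_schedule: global write sequence numbers that are dropped once
                   (intermittent fault), whichever head performs them.
    """

    def __init__(self, num_heads: int, persistent_bad: Iterable[int] = (),
                 drop_schedule: Iterable[int] = ()) -> None:
        self.num_heads = num_heads
        self.persistent_bad = set(persistent_bad)
        self.drop_schedule = set(drop_schedule)
        self.seq = 0  # counts every write command issued to the drive
        self.media: Dict[Tuple[int, int], bytes] = {}

    def write(self, head: int, block: int, data: bytes) -> None:
        self.seq += 1
        if head in self.persistent_bad or self.seq in self.drop_schedule:
            return  # dropped write: nothing reaches the media
        self.media[(head, block)] = data

    def read(self, head: int, block: int) -> Optional[bytes]:
        return self.media.get((head, block))


def ppst(drive: ToyDrive, run_id: int) -> List[int]:
    """Write and then verify a test pattern with each head; return failing heads."""
    failed = []
    for head in range(drive.num_heads):
        # Make the pattern unique per run so that a stale pattern left by an
        # earlier run cannot make a dropped test write look successful.
        pattern = f"ppst-{run_id}-{head}".encode()
        drive.write(head, RESERVED_BLOCK, pattern)
        if drive.read(head, RESERVED_BLOCK) != pattern:
            failed.append(head)
    return failed


if __name__ == "__main__":
    # Persistent fault: head 1 drops every write, so PPST flags it.
    print(ppst(ToyDrive(num_heads=2, persistent_bad={1}), run_id=1))  # [1]

    # Intermittent fault: only the first write command is dropped.
    d = ToyDrive(num_heads=2, drop_schedule={1})
    d.write(0, 10, b"user data")  # write #1, silently dropped
    print(ppst(d, run_id=1))      # [] because both heads pass the check
    print(d.read(0, 10))          # None: the user data was never stored
```

The per-run unique pattern is a design choice of this sketch; with a fixed pattern, a test write dropped by a head that had passed an earlier run would read back the old pattern and pass.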
In certain implementations, the PPST verifications may be augmented by retaining all of the writes in a cache between successive PPST verifications. If an erroneous write is detected, the data is recovered directly from the cache. This mechanism does not protect against intermittent dropped writes, as intermittent dropped writes may not be detected by periodic checks of the heads. Furthermore, a fairly substantial and potentially expensive dedicated cache may be needed to reduce performance degradations due to the overhead of the PPST verifications. The size of the cache needed to capture all of the writes increases as the time interval between PPST verifications increases.
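A cache-augmented PPST scheme of the kind described above might be sketched as follows. The names `ToyDrive`, `CachedWriter`, `check`, and `RESERVED_BLOCK` are hypothetical, and the drive model is deliberately simple: heads listed in `bad_heads` silently drop every write. Every write since the last check is retained; when a check fails for a head, the retained writes for that head are returned so the data can be recovered from the cache, and the cache is then released. Because one entry is kept per distinct (head, block) written, the cache in this sketch grows with the amount of data written between checks, and hence with the check interval.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Key = Tuple[int, int]  # (head, block)


class ToyDrive:
    """Hypothetical drive: heads in `bad_heads` silently drop every write."""

    def __init__(self, num_heads: int, bad_heads: Iterable[int] = ()) -> None:
        self.num_heads = num_heads
        self.bad_heads: Set[int] = set(bad_heads)
        self.media: Dict[Key, bytes] = {}

    def write(self, head: int, block: int, data: bytes) -> None:
        if head not in self.bad_heads:
            self.media[(head, block)] = data

    def read(self, head: int, block: int) -> Optional[bytes]:
        return self.media.get((head, block))


class CachedWriter:
    """Retains every write between PPST checks (hypothetical sketch)."""

    RESERVED_BLOCK = -1  # hypothetical reserved test area

    def __init__(self, drive: ToyDrive) -> None:
        self.drive = drive
        self.cache: Dict[Key, bytes] = {}  # grows until the next check
        self.runs = 0

    def write(self, head: int, block: int, data: bytes) -> None:
        self.drive.write(head, block, data)
        # A later write to the same block replaces the earlier cached copy.
        self.cache[(head, block)] = data

    def check(self) -> Dict[Key, bytes]:
        """Run one PPST pass; return the cached writes for heads that failed."""
        self.runs += 1
        failed = set()
        for head in range(self.drive.num_heads):
            pattern = f"ppst-{self.runs}-{head}".encode()
            self.drive.write(head, self.RESERVED_BLOCK, pattern)
            if self.drive.read(head, self.RESERVED_BLOCK) != pattern:
                failed.add(head)
        recovered = {k: v for k, v in self.cache.items() if k[0] in failed}
        self.cache.clear()  # interval closed: retained writes are released
        return recovered


if __name__ == "__main__":
    w = CachedWriter(ToyDrive(num_heads=2, bad_heads={1}))
    w.write(0, 5, b"a")
    w.write(1, 7, b"b")   # dropped by bad head 1
    print(len(w.cache))   # 2
    print(w.check())      # {(1, 7): b'b'}
    print(len(w.cache))   # 0
```

Clearing the cache after every check, whether or not it passed, is what bounds the cache by the interval in this sketch; it is also why a write dropped intermittently by a head that passes the next check would be released without being recovered.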
Certain implementations that provide protection against both intermittent and persistent dropped writes may perform write verification for each write operation, wherein each time a write is performed the disk drive completes a revolution, reads the just-written data, and compares it to the data in the write buffer. While this ensures that no data is lost, it adds to the latency of every write, and the resulting I/O performance may be unacceptable.
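The per-write verification described above might be sketched as follows. The `ToyDrive` model, its `drop_schedule` parameter (write sequence numbers that are dropped once), the `revolutions` counter, and the `verified_write` function are all hypothetical. The retry on mismatch is also an assumption of this sketch: the description above only states that the read-back data is compared with the write buffer, and rewriting from that buffer is one plausible way the data would be preserved.

```python
from typing import Dict, Iterable, Optional


class ToyDrive:
    """Hypothetical single-head drive that drops the write commands whose
    sequence numbers appear in `drop_schedule` (an intermittent fault)."""

    def __init__(self, drop_schedule: Iterable[int] = ()) -> None:
        self.drop_schedule = set(drop_schedule)
        self.seq = 0  # counts every write command issued
        self.media: Dict[int, bytes] = {}
        self.revolutions = 0  # extra rotations spent on verification reads

    def write(self, block: int, data: bytes) -> None:
        self.seq += 1
        if self.seq not in self.drop_schedule:
            self.media[block] = data

    def read(self, block: int) -> Optional[bytes]:
        return self.media.get(block)


def verified_write(drive: ToyDrive, block: int, buffer: bytes,
                   max_attempts: int = 3) -> int:
    """Write, wait one revolution, read back, and compare with the write buffer.

    Rewrites from the buffer on mismatch (an assumption of this sketch) and
    returns the number of attempts used; raises IOError if verification
    still fails after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        drive.write(block, buffer)
        drive.revolutions += 1  # the read-back costs roughly one rotation
        if drive.read(block) == buffer:
            return attempt
    raise IOError(f"block {block} failed write verification")


if __name__ == "__main__":
    drive = ToyDrive(drop_schedule={2})
    print(verified_write(drive, 0, b"x"))  # 1
    print(verified_write(drive, 1, b"y"))  # 2: first attempt dropped and caught
    print(drive.revolutions)               # 3
```

The `revolutions` counter makes the latency cost visible: every write, including every successful one, pays for an extra rotation. For example, at 7,200 rpm one revolution takes 60/7200 s, or about 8.3 ms.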