Magnetic disk drives store data in sectors, which are discrete portions of the disk surface. A latent sector error is a failure to correctly read data from a disk sector. Latent sector errors have a number of causes, including undetected write failures (such as “high fly” writes, in which the disk head was not close enough to the sector to correctly encode the data), physical damage to the disk medium (such as scratches), and deterioration of the disk medium or disk head. In redundant storage systems, latent sector errors reduce data durability because they are typically not detected until the data needs to be read from the affected sector.
One problem is that in a data-redundant system where data is typically stored in only two locations, a failure to read data from one location may be the only occasion on which the second, backup storage location is checked. Thus, in a write-dominated storage system, the most common reason for reading data is re-replication after the failure of a single replica. An undetected latent sector error in the surviving replica during re-replication can accordingly cause data loss.
To reduce data loss, in most disk storage systems the magnetic disks are periodically “scrubbed,” i.e., intentionally accessed, to ensure that the data on the disks is still readable. The scrubbing process is a simple read (or an equivalent SCSI command) that pulls the data off the disk. Scrubbing strategies include a simple linear read of the disk, staggered strategies, and strategies that adapt to the arrival times of disk failures. However, disk scrubbing costs disk head movement, disk wear, and disk I/O bandwidth, and adversely affects both disk lifetime and overall storage system performance.
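The linear and staggered scrubbing orders mentioned above can be illustrated with a minimal Python sketch. It is not taken from any particular storage system: the function names, the 512-byte sector size, and the interpretation of "staggered" (the disk is split into equal regions, and the scrubber visits the first sector of every region, then the second sector of every region, and so on) are assumptions made for illustration. The scrub itself is modeled as a plain read of each sector from a file or block device; a real scrubber would typically bypass the operating system cache or issue a device-level verify command instead.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes (hypothetical default)


def linear_order(num_sectors):
    """Visit sectors front to back: 0, 1, 2, ..."""
    return list(range(num_sectors))


def staggered_order(num_sectors, num_regions):
    """Visit one sector from each region per pass (assumed interpretation).

    The disk is divided into num_regions contiguous regions; pass k reads
    the k-th sector of every region, so each pass samples the whole surface.
    """
    region_len = -(-num_sectors // num_regions)  # ceiling division
    order = []
    for offset in range(region_len):
        for region in range(num_regions):
            sector = region * region_len + offset
            if sector < num_sectors:  # last region may be shorter
                order.append(sector)
    return order


def scrub(path, order, sector_size=SECTOR_SIZE):
    """Read each sector in the given order; return sectors that failed.

    A sector counts as unreadable if the read raises OSError or returns
    fewer bytes than a full sector.
    """
    unreadable = []
    with open(path, "rb", buffering=0) as dev:
        for sector in order:
            try:
                dev.seek(sector * sector_size)
                data = dev.read(sector_size)
            except OSError:
                unreadable.append(sector)
                continue
            if len(data) < sector_size:
                unreadable.append(sector)
    return unreadable
```

For example, `scrub(path, staggered_order(num_sectors, 128))` would cover the same sectors as a linear scrub, but every pass touches all 128 regions, which may surface spatially clustered errors sooner at the cost of additional head movement.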