1. Technical Field
This invention relates to storage systems and particularly to disk storage systems for electronic data storage.
2. Description of the Related Art
Due to advances in recording technology, the capacity of hard drives is doubling annually. The areal density is shortly expected to reach 100 Gbits per square inch and a 3.5″ drive will be capable of storing 300 GB.
The reliability of a hard drive is specified in terms of its MTBF (Mean Time Between Failures) and its unrecoverable error rate. Typical specifications for current server-class drives are an MTBF of 1,000,000 hours and 1 unrecoverable error in 10^15 bits read. However, increases in areal density make it harder to maintain reliability due to lower flying heights, media defects, etc.
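To put these specifications in perspective, the following minimal sketch converts them into probabilities. It assumes a constant failure rate (an exponential lifetime model), independent bit errors, and decimal gigabytes; the function names are hypothetical and chosen for illustration only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours

def annual_failure_probability(mtbf_hours: float) -> float:
    """Probability that a drive fails within one year.

    Assumption: constant failure rate, i.e. exponentially distributed lifetime.
    """
    return -math.expm1(-HOURS_PER_YEAR / mtbf_hours)

def unrecoverable_read_probability(bytes_read: float, bits_per_error: float) -> float:
    """Probability of at least one unrecoverable error while reading bytes_read bytes.

    Assumption: each bit fails independently with probability 1 / bits_per_error.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding loss for tiny p.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

# Figures taken from the text: MTBF of 1,000,000 hours, 1 error in 10^15 bits,
# and a 300 GB drive read once from end to end.
afr = annual_failure_probability(1_000_000)
p_full_read = unrecoverable_read_probability(300e9, 1e15)

print(f"annual failure probability: {afr:.4%}")
print(f"P(unrecoverable error reading 300 GB): {p_full_read:.4%}")
```

Under these assumptions the annual failure probability is a little under 0.9%, and a single full read of a 300 GB drive encounters an unrecoverable error with a probability of about 0.24%.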
RAID (Redundant Array of Independent Disks) arrays (e.g., RAID-1 or RAID-5) are often used to further improve the reliability of storage systems. However, with high-capacity drives, a single level of redundancy is no longer sufficient to reduce the probability of data loss to a negligible level.
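One way to see why a single level of redundancy becomes insufficient is to consider the rebuild of a RAID-5 array after one drive has failed: every surviving drive must be read in full, and any unrecoverable error during that read loses data. The sketch below estimates that probability. The array size of 8 drives is a hypothetical value chosen for illustration, bit errors are assumed independent, and the function name is not taken from any real library.

```python
import math

def rebuild_failure_probability(drives: int, drive_bytes: float,
                                bits_per_error: float) -> float:
    """P(at least one unrecoverable error while reading all surviving drives).

    Assumptions: one drive has already failed, the remaining (drives - 1)
    drives are read end to end, and bit errors are independent.
    """
    bits_to_read = (drives - 1) * drive_bytes * 8
    return -math.expm1(bits_to_read * math.log1p(-1.0 / bits_per_error))

# Hypothetical 8-drive RAID-5 array of 300 GB drives, 1 error in 10^15 bits.
p_rebuild_fails = rebuild_failure_probability(drives=8, drive_bytes=300e9,
                                              bits_per_error=1e15)
print(f"P(rebuild hits an unrecoverable error): {p_rebuild_fails:.2%}")
```

With these assumed figures, roughly 1.7% of rebuilds would encounter an unrecoverable error, and the figure grows with both drive capacity and array width.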
It is also possible for a disk drive to occasionally return erroneous data on a read command because a previous write command has not written to the correct location on the recording medium or it failed to record on the medium at all. This may be due to an intermittent hardware failure or a latent design defect. For example, the drive might write the data to the wrong LBA (Logical Block Address) due to a firmware bug, or it may write off track, or it may fail to write at all because a drop of lubricant (commonly referred to as ‘lube’) lifts the head off the disk surface.
There is increasing interest in using commodity drives such as Advanced Technology Attachment (ATA) drives in server applications because they cost about one third as much in terms of cents/MB. However, these drives were originally intended for intermittent use in PCs, and so they may be less reliable than server-class drives. Also, ATA drives support only 512-byte blocks, which leaves no room to append check bytes to each block, and so block-level LRC (Longitudinal Redundancy Check) cannot be used to detect data corruption.
For a single disk drive the controller could read back each block and verify it just after it has been written.
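The read-back approach can be sketched as follows. The `SimulatedDisk` class is a hypothetical in-memory stand-in for a drive, with a `drop_writes` flag that imitates a write that never reaches the medium; it is not a real driver interface. In a real system the read-back would also have to come from the medium rather than from the drive's cache, which this sketch does not model.

```python
BLOCK_SIZE = 512  # bytes per block, as for ATA drives

class SimulatedDisk:
    """Hypothetical in-memory disk used only to illustrate the idea."""

    def __init__(self, num_blocks: int, drop_writes: bool = False):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.drop_writes = drop_writes  # True simulates a silently lost write

    def write(self, lba: int, data: bytes) -> None:
        if not self.drop_writes:
            self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]

def write_and_verify(disk: SimulatedDisk, lba: int, data: bytes) -> bool:
    """Write one block, read it straight back, and compare it to the source."""
    disk.write(lba, data)
    return disk.read(lba) == data

payload = b"\xab" * BLOCK_SIZE
good_result = write_and_verify(SimulatedDisk(16), 3, payload)
bad_result = write_and_verify(SimulatedDisk(16, drop_writes=True), 3, payload)
print(good_result, bad_result)
```

The healthy disk passes the check, while the disk that silently drops the write is caught because the read-back returns the old contents of the block.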
Any type of redundant RAID array could be implemented in a way that allows the read data to be checked. For example, with a RAID-5 array the controller could check that the read data is consistent with the other data drives and the parity drive.
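For RAID-5, the consistency check amounts to verifying that the XOR of all data blocks in a stripe equals the parity block. A minimal sketch follows; the function names are hypothetical, the blocks are assumed to be of equal length, and the 4-byte blocks are used only to keep the example short.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def stripe_is_consistent(data_blocks, parity_block) -> bool:
    """True if the parity block equals the XOR of all data blocks in the stripe."""
    return xor_blocks(data_blocks) == parity_block

# Three data blocks of one stripe and the parity computed when they were written.
d0 = bytes([1, 2, 3, 4])
d1 = bytes([5, 6, 7, 8])
d2 = bytes([9, 10, 11, 12])
parity = xor_blocks([d0, d1, d2])

# A later read in which one drive returns stale or misplaced data for d1.
stale_d1 = bytes([5, 6, 7, 9])

print(stripe_is_consistent([d0, d1, d2], parity))
print(stripe_is_consistent([d0, stale_d1, d2], parity))
```

Note that the check detects that the stripe is inconsistent but does not, by itself, identify which drive returned the bad block.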
However, both methods drastically reduce the overall throughput in terms of I/O (Input/Output) commands per second: the first method requires an extra revolution for each write command, and the second method requires several drives to be accessed for each read command.
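The size of the throughput penalty can be illustrated with simple arithmetic. All figures below are assumptions chosen for illustration and do not come from this description: a 10,000 RPM drive, a 5 ms average seek, negligible transfer time, and an 8-drive array in which reads are otherwise spread evenly across the drives.

```python
def commands_per_second(seek_ms: float, rpm: float,
                        extra_revolutions: float = 0.0) -> float:
    """Approximate random I/O rate of one drive.

    Service time = average seek + half a revolution of rotational latency
    + any extra full revolutions (e.g. for a read-back verify).
    Transfer time is ignored (assumption).
    """
    revolution_ms = 60_000 / rpm
    service_ms = seek_ms + 0.5 * revolution_ms + extra_revolutions * revolution_ms
    return 1000 / service_ms

# First method: read-back verify adds one full revolution to every write.
plain_write_rate = commands_per_second(seek_ms=5.0, rpm=10_000)
verified_write_rate = commands_per_second(seek_ms=5.0, rpm=10_000,
                                          extra_revolutions=1.0)

# Second method: a checked read occupies every drive in the array, so the
# array delivers roughly one drive's worth of reads instead of n drives' worth.
n_drives = 8  # hypothetical array width
plain_array_read_rate = n_drives * plain_write_rate
checked_array_read_rate = plain_write_rate

print(f"writes/s per drive: {plain_write_rate:.1f} -> {verified_write_rate:.1f}")
print(f"reads/s per array:  {plain_array_read_rate:.1f} -> {checked_array_read_rate:.1f}")
```

Under these assumptions the verified write rate falls from 125 to about 71 commands per second per drive, and the checked read rate of the array falls by a factor equal to the number of drives.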
A need therefore exists for detection of write errors in a storage system wherein the above-mentioned disadvantage may be alleviated.