Over the years, disk drive performance and reliability have continually increased. Today's disk drives are faster, have greater storage capacities, consume less power, and have longer service lives than disk drives from only a few years ago. Despite these many improvements, however, modern disk drives are still prone to mechanical failure. Consequently, mechanisms for protecting against data loss due to disk failures are an essential requirement of modern-day computer systems.
To protect against data loss due to disk failures, many system developers implement data storage systems based on a redundant array of independent disks, or RAID. RAID is a category of disk-based storage that employs two or more physical disk drives in combination to create one logical storage device, generally for the purpose of fault tolerance. There are a variety of RAID implementations, referred to as RAID Levels, each with its own particular set of characteristics. The more commonly implemented RAID Levels are selected for their performance and fault tolerance characteristics. In particular, most RAID-based data storage systems include the ability to recover “lost” data by reconstructing it from the surviving data and parity data.
For example, FIG. 1A illustrates a data storage sub-system 10 based on a RAID Level 4 implementation. The data storage sub-system 10 includes a RAID disk group with three independent disk drives (e.g., disks 1, 2, and 3) connected to a common RAID controller 12. As illustrated in FIG. 1A, disks 1 and 2 are dedicated to storing “stripes” of data, while disk 3 is dedicated to storing “stripes” of parity data. Accordingly, during a write operation, data is written to disks 1 and/or 2, while parity data is written to disk 3. If any one of the three disks fails, the data on the failed disk can be reconstructed using data from the other two disks.
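In a typical RAID Level 4 arrangement such as the one described above, the parity stripe is computed as the bitwise XOR of the corresponding data stripes. The following is a minimal sketch of that write-path computation, assuming fixed-size blocks and the three-disk layout of FIG. 1A (function names and sample bytes are illustrative, not from any particular implementation, which would operate on full disk sectors):

```python
def compute_parity(data_blocks):
    """Form the parity block as the bytewise XOR of all data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# Write path: data stripes go to disks 1 and 2, parity to disk 3.
disk1 = b"\x0f\xf0\xaa"
disk2 = b"\xf0\x0f\x55"
disk3 = compute_parity([disk1, disk2])  # parity stripe for disk 3
```

Because XOR is its own inverse, XOR-ing any two of the three blocks yields the third, which is the property the reconstruction operation relies on.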
The process of reconstructing “lost” data by combining data and parity data from other disks is generally referred to as data reconstruction. FIG. 1B illustrates a reconstruction operation for the data storage sub-system 10 illustrated in FIG. 1A. In FIG. 1B, disk 2 is shown with several bad disk blocks 14. If an attempt to access the bad disk blocks 14 on disk 2 fails during a read operation, the data from the bad disk blocks 14 on disk 2 can be reconstructed by combining data 16 from disk 1 and parity data 18 from disk 3. Moreover, if disk 2 fails completely, such that no data on disk 2 can be read, then a reconstruction operation can be initiated to reconstruct the entire data contents of disk 2.
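The reconstruction described above can be sketched as the same XOR operation run in reverse: combining the surviving data stripe with the parity stripe recovers the lost stripe. This is an illustrative sketch under the three-disk layout of FIG. 1B (names and sample bytes are assumptions for illustration):

```python
def reconstruct(surviving_blocks):
    """Recover a lost block by XOR-ing the surviving data and parity blocks."""
    lost = bytearray(len(surviving_blocks[0]))
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            lost[i] ^= byte
    return bytes(lost)

# Disk 2 has failed; combine disk 1's data with disk 3's parity.
disk1 = b"\x0f\xf0\xaa"
disk2 = b"\xf0\x0f\x55"
parity = bytes(a ^ b for a, b in zip(disk1, disk2))  # contents of disk 3
recovered_disk2 = reconstruct([disk1, parity])
```

The same operation applies whether a few bad blocks 14 are being repaired or the entire contents of disk 2 are being rebuilt; only the range of stripes processed differs.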
In some RAID-based data storage systems, the reconstruction operation may be automated. For example, some RAID-based storage systems include “hot” spare disks that sit idle until needed. When a disk in a RAID disk group fails in some respect, a “hot” spare disk can automatically be swapped to take the place of the failed disk. Accordingly, the data storage system may automatically reconstruct the data from the failed disk and write the reconstructed data to the “hot” spare disk. The entire process happens seamlessly in the background while the data storage system continues to process read and write requests.
One problem with this solution is that it may be inefficient in the sense that the entire data contents of the failed disk are reconstructed, despite the possibility that only a small portion of data on the failed disk cannot be read directly from the disk. Because modern disk drives have relatively large storage capacities (e.g., 500 Gigabytes (GB)), reconstructing the entire data contents of a failed disk can take a long time and place a heavy computational burden on the data storage system. Moreover, the computational burden and the time it takes to reconstruct the data on a failed disk increase as the number of disks in the RAID disk group increases. Furthermore, the burden placed on the data storage system during the reconstruction operation causes system performance degradation. For example, it may take longer for the data storage system to service client-initiated read and write requests while the data from the failed disk is being reconstructed. Finally, the reconstruction operation may increase the likelihood that a second disk in the RAID disk group will fail, a situation referred to as a double disk error, in which the lost data can no longer be reconstructed.
Another problem with the solution described above is that the data on the failed disk may become stale before it is reconstructed and written to the spare disk. For example, the storage system may write new data to the spare disk while the data from the failed disk is being reconstructed. If data stored on the failed disk changes after the disk fails but before the data is reconstructed, the storage system may overwrite the new data on the spare disk with old data from the failed disk.
Other RAID-based storage systems depend on predictive failure analysis to predict when a disk is going to fail. For example, a RAID-based storage system may identify a disk as being likely to fail, and in response, initiate a procedure to reconstruct or copy data to a spare replacement disk. However, such systems generally depend on the ability of the failing disk to continue to service client-initiated disk access requests in a timely manner, which may not be possible for some portions of the disk. Furthermore, the efficiency of such systems is highly dependent on the ability to accurately detect when a disk is actually going to fail. For example, system resources may be wasted each time a disk is incorrectly identified as likely to fail.
Although the example storage system described above uses RAID Level 4, other RAID approaches have analogous drawbacks and limitations. For example, the problems described above also exist for RAID Level 5, in which the parity data is distributed over all of the disks in the RAID array.
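To illustrate how RAID Level 5 differs from RAID Level 4 in layout only, not in the reconstruction problems described above, the following sketch shows one common parity rotation, the left-symmetric scheme used by some implementations (the rotation formula is an assumption for illustration; actual rotation schemes vary by implementation):

```python
def parity_disk(stripe_index, num_disks):
    """Return which disk holds parity for a given stripe (left-symmetric
    rotation sketch): parity rotates backward across the disks per stripe."""
    return (num_disks - 1 - stripe_index) % num_disks

# In a three-disk array, parity falls on disk index 2, then 1, then 0, ...
layout = [parity_disk(s, 3) for s in range(4)]
```

Because every disk holds some parity, the failure of any one disk still requires the same XOR-based reconstruction from the surviving disks, and the efficiency and staleness problems described above apply equally.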