1. Field of the Invention
The present invention is related to a mass storage device and more particularly to failure recovery and data regeneration during failure recovery in disk arrays.
2. Background Description
Arrays of disks, and in particular disk arrays referred to as Redundant Arrays of Independent Disks (RAIDs), are well known in the art. Disk arrays such as RAIDs increase data availability and storage capacity, improve system performance and flexibility, and improve data protection. Drives may be grouped within the RAID and, normally, the drives are aggregated and appear as a single storage device to a host data processing system.
When one disk of the RAID fails, data from that failed disk must be regenerated, e.g., rebuilt using error correction information from the remaining disks in the group. Typically the regenerated data is written to a new, replacement disk. Many RAID systems support what is known as hot sparing, where a spare drive is included (e.g., within the drive group) for temporarily swapping for a failed disk drive. The hot spare drive provides a temporary patch and holds regenerated data until the failed disk drive is physically replaced. Then, the regenerated data is transferred to the replacement disk and the spare disk is freed for subsequent hot sparing.
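By way of illustration only, regeneration from the remaining disks may be sketched for the common case of single-parity XOR protection. This is a minimal sketch under stated assumptions, not a description of any particular RAID implementation: the function names xor_blocks and regenerate, the four-byte block size, and the sample data are all hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def regenerate(surviving_blocks):
    """Rebuild the block of a single failed disk (hypothetical helper).

    Assumes single-parity XOR protection: the missing block equals the
    XOR of every surviving data block together with the parity block.
    """
    return xor_blocks(surviving_blocks)

# Hypothetical stripe: three data disks plus one parity disk.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the second data disk fails; only the others remain readable.
survivors = [data[0], data[2], parity]
rebuilt = regenerate(survivors)

# In a hot sparing arrangement, the rebuilt block would be written to the spare.
print(rebuilt == data[1])  # prints True
```

In this sketch every surviving disk must be read in full to rebuild the failed disk, which is one reason a rebuild can take long enough for a further failure to matter.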
Consumer product type disk drives, e.g., Parallel ATA (PATA) and Serial ATA (SATA) drives, are used to reduce the overall cost of lower cost RAIDs. While using these relatively cheap drives reduces RAID cost, these cheaper drives fail more frequently and increase the risk of losing data. As the number of independent drives in a particular RAID increases, the likelihood of a drive failing increases approximately linearly with the number of drives. RAID algorithms that protect against data loss from a single failing disk can mitigate the effects of a single disk failure. Another approach is to create exact copies of data on multiple disks, also known as data mirroring. Data mirroring increases data safety, but at the cost of significantly reduced RAID storage capacity, i.e., reduced by a factor equivalent to the number of copies. Various encoding schemes have been used to reduce the redundancy required for data protection and, thereby, increase RAID storage capacity. These typical solutions not only increase RAID cost, perhaps enough to completely offset any cost reduction from using cheaper drives, but also impair RAID performance. Additionally, these cheaper drives have high enough failure rates that a second disk may fail before a first failing drive is replaced and rebuilt. A double disk failure in a typical low cost RAID very likely could not be recovered.
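The scaling relationships described above may be illustrated numerically. The following is a rough sketch only, assuming that drives fail independently and with equal probability over some fixed interval. The function names and the example values (a per-drive failure probability of 0.03, eight drives, two copies) are hypothetical and are not taken from any measured data.

```python
def p_any_failure(n_drives, p_drive):
    """Probability that at least one of n_drives fails.

    Assumes independent failures with the same per-drive probability
    p_drive. For small p_drive this is close to n_drives * p_drive,
    i.e., roughly linear in the number of drives.
    """
    return 1.0 - (1.0 - p_drive) ** n_drives

def mirrored_capacity(n_drives, drive_capacity, copies):
    """Usable capacity when every block is stored `copies` times."""
    return n_drives * drive_capacity / copies

# Hypothetical example values, chosen for illustration only.
p = 0.03
for n in (1, 2, 4, 8):
    exact = p_any_failure(n, p)
    linear = n * p  # linear approximation
    print(n, round(exact, 4), round(linear, 4))

# Eight drives of unit capacity, mirrored with two copies of all data.
print(mirrored_capacity(8, 1.0, 2))  # prints 4.0
```

Under these assumptions the linear approximation slightly overstates the exact probability as the drive count grows, while mirrored capacity falls in direct proportion to the number of copies.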
Thus, there is a need for a low cost, transparent failure recovery mechanism that reduces the window of vulnerability to data loss in RAIDs.