1. Technical Field
The present invention relates to providing protection from data loss on a storage device. More particularly, some examples of the invention concern providing protection from data loss on a storage device by calculating a parity value as a function of information stored on a plurality of strips on the storage device.
2. Description of Related Art
Important data is often stored in storage devices in computing systems. Because storage devices can fail and data in failed storage devices can be lost, techniques have been developed for preventing data loss and for restoring data when one or more storage devices fail.
One technique for preventing data loss comprises storing parity information on a storage device (such as a disk drive), which is a member of a storage array, and storing data on one or more of the other storage devices in the array. (Herein a disk drive may be referred to as a “disk”.) With this technique, if a storage device fails, parity information can be used to reconstruct the data that was on the failed storage device. Moreover, if sufficient parity information is added to another storage device, the additional parity information may be used to reconstruct data stored on more than one failed storage device. Another technique for preventing data loss, called data mirroring, comprises making a duplicate copy of data on a separate storage device. With this technique, if a storage device fails, data can be restored from the copy of the data.
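The parity technique above can be sketched in a few lines. This is a minimal illustration that assumes single parity computed as a byte-wise XOR of equal-length data strips, which is the scheme the text later describes for RAID 5 reconstruction. The helper names `xor_strips` and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equal-length strips (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def reconstruct(surviving_strips):
    """Recover the one missing strip of a stride by XORing every surviving
    strip in that stride, data and parity alike (assumes single XOR parity)."""
    return xor_strips(surviving_strips)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # three data strips
parity = xor_strips(data)                        # stored on a separate device

# Simulate losing the device that held data[1], then rebuild it.
recovered = reconstruct([data[0], data[2], parity])
print(recovered == data[1])
```

Because XOR is its own inverse, the same function both generates the parity and recovers any single missing strip, which is also why a second lost strip in the same stride cannot be recovered from one parity strip alone.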
A Redundant Array of Inexpensive (or Independent) Disks (RAID) may be used to provide a data storage system that has increased performance and capacity. Data mirroring and parity information storage, or a combination of the two, may be implemented on a RAID array to provide data protection. Also, a technique called striping may be utilized, wherein data records and parity information are divided into strips such that the number of strips equals the number of disks in the array. Each strip is written, or “striped,” to a different disk in the RAID array, to balance the load across the disks and to improve performance. A group of strips comprising one pass across all of the drives in a RAID is called a stride. Several RAID protocols have been devised, wherein different mirroring, parity, and striping arrangements are employed. As an example, in a RAID 5 array consisting of six disks, five data strips and one parity strip are striped across the six disks, with the parity information rotated across the disks. The rotation of the parity across the disks ensures that parity updates to the array are shared across the disks. RAID 5 provides a redundancy of one, which means that all data can be recovered if any one and only one of the disks in the array fails.
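The six-disk RAID 5 example can be visualized as a stride-by-disk table. The text says only that parity is rotated across the disks; the specific rotation used here (parity starts on the last disk and moves one disk to the left with each stride) is an assumption chosen for illustration, and `raid5_layout` is a hypothetical helper.

```python
def raid5_layout(num_disks, num_strides):
    """Return one row per stride; each row lists, per disk, 'P' for the
    parity strip or 'D<k>' for the k-th data strip of that stride."""
    layout = []
    for stride in range(num_strides):
        # Assumed rotation: parity moves one disk to the left each stride.
        parity_disk = (num_disks - 1 - stride) % num_disks
        row, k = [], 0
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{k}")
                k += 1
        layout.append(row)
    return layout


# Six disks: five data strips and one parity strip in every stride.
for row in raid5_layout(6, 6):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Over any six consecutive strides each disk holds parity exactly once, so parity writes are spread evenly instead of concentrating on a single dedicated parity disk.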
A type of data loss known as a strip loss can occur during a RAID rebuild after an array has had one or more disk drive failures, when the total number of disk drive failures is equal to the disk drive fault tolerance of the RAID code. For example, with RAID 5, a rebuild of lost data on a spare disk drive may be accomplished as long as no more than 1 disk drive fails. Strip loss occurs during the rebuild of a RAID 5 array if a media error occurs when reading a strip from any of the surviving drives. This is because the rebuild process requires reading each of the strips from the remaining drives and using parity reconstruction to recover the lost data. Because there is no redundancy remaining in a RAID 5 disk array after the first disk drive failure, the media error effectively prevents the full recovery of the original data stride. In higher RAID codes (such as RAID DP and RAID 51), the exposure to a strip loss occurs when 2 or more (i.e. 2 for RAID DP and 3 for RAID 51) disk drive failures have occurred and a media error is encountered during the rebuild on the surviving array disks.
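The strip-loss scenario can be simulated with a small model. This sketch assumes single XOR parity (RAID 5 style) and represents an unrecoverable media error as `None` in place of a strip; the disk layout, the `rebuild` function, and the data values are hypothetical.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild(disks, failed):
    """Rebuild the failed disk stride by stride.

    disks[d][s] is the strip on disk d in stride s, or None to model a media
    error on that strip. Returns (rebuilt_strips, lost_strides).
    """
    rebuilt, lost = [], []
    for s in range(len(disks[failed])):
        survivors = [disks[d][s] for d in range(len(disks)) if d != failed]
        if any(strip is None for strip in survivors):
            # No redundancy remains, so one unreadable survivor loses the stride.
            rebuilt.append(None)
            lost.append(s)
        else:
            rebuilt.append(xor_strips(survivors))
    return rebuilt, lost


data0 = [b"\x01", b"\x02"]        # the disk that fails
data1 = [b"\x04", None]           # media error in stride 1 on a surviving disk
parity = [b"\x05", b"\x0a"]       # XOR of the original data strips per stride
rebuilt, lost = rebuild([data0, data1, parity], failed=0)
print(rebuilt, lost)
```

Stride 0 is rebuilt correctly, while stride 1 is reported as lost even though only a single strip on a surviving drive was unreadable.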
A known solution to this problem is to provide additional RAID fault tolerance by using higher RAID codes. These higher codes require a substantial increase in the number of disk drives, or alternatively are achieved at a significant loss in effective capacity. For example, a user may opt to go from a 5 disk RAID 5 array to a 10 disk RAID 51 array wherein the RAID 5 array is mirrored. As another example, for the same data storage capacity, the storage efficiency of a RAID 6 array is lower than that of a RAID 5 array because the RAID 6 array requires an additional disk. RAID 6 has an arrangement similar to RAID 5, but requires two parity strips in each stride, to provide a redundancy of two.
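The capacity trade-off can be made concrete by computing storage efficiency (data disks divided by total disks) for the configurations named above. The sketch assumes each array holds four disks' worth of user data, matching the 5 disk RAID 5 example; the fault-tolerance values are those stated in the text (1 for RAID 5, 2 for RAID 6, 3 for RAID 51).

```python
def storage_efficiency(data_disks, total_disks):
    """Fraction of raw disk capacity available for user data."""
    return data_disks / total_disks


# (data disks, total disks, disk failures tolerated); four disks of user data each.
CONFIGS = {
    "RAID 5, 5 disks": (4, 5, 1),
    "RAID 6, 6 disks": (4, 6, 2),
    "RAID 51, 10 disks": (4, 10, 3),
}

for name, (data, total, tolerance) in CONFIGS.items():
    eff = storage_efficiency(data, total)
    print(f"{name}: efficiency {eff:.2f}, tolerates {tolerance} failed disk(s)")
```

Each step up in fault tolerance costs usable capacity: from 80% for RAID 5 to about 67% for RAID 6 and 40% for the mirrored RAID 51 array.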
Retrieving a strip from a drive that may have a poorly written track or some other localized problem (e.g. excessive off-track disturbance at a particular physical location on the drive) frequently causes the device adapter to resort to a preemptive reconstruct of the data, and the retrieval often takes an undesirably long period of time. A preemptive reconstruct occurs when the RAID adapter times out the target disk drive for being too slow in its data recovery process (DRP) attempts. The RAID adapter may then reconstruct the target strip using the remaining array members and parity. This reconstruction of the target strip requires reading each strip in the same stride as the target strip from the other array disk drives, and XORing them to recover the target strip. This reconstruction can take a significant amount of time to complete.
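The adapter's choice between a direct read and a preemptive reconstruct can be sketched as follows. The drive's data recovery process is modeled deterministically by a simulated duration `drp_time`, and `timeout` stands in for the adapter's time-out threshold; both, along with the function name, are hypothetical. Reconstruction assumes single XOR parity.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def read_with_preemptive_reconstruct(stride, target, drp_time, timeout):
    """Return (strip, source, reads_from_other_disks).

    stride:   every strip (data and parity) in the target's stride, one per disk.
    drp_time: simulated time the target drive would spend in its DRP.
    timeout:  hypothetical adapter time-out threshold.
    """
    if drp_time <= timeout:
        return stride[target], "direct", 0
    # Target timed out: read every other strip in the stride and XOR them.
    others = [s for i, s in enumerate(stride) if i != target]
    return xor_strips(others), "reconstructed", len(others)


stride = [b"\x01", b"\x04", b"\x05"]   # two data strips and their parity
print(read_with_preemptive_reconstruct(stride, target=1, drp_time=0.5, timeout=2.0))
print(read_with_preemptive_reconstruct(stride, target=1, drp_time=5.0, timeout=2.0))
```

The reconstructed path returns the same data but needs a read from every other disk in the stride, which is why it is slow on wide arrays.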
In the extreme case where a drive repeatedly times out in attempting to read from a particular location, the adapter or host may permanently fence the drive from the array and request service to replace it with a spare disk drive. In that situation the array is exposed to strip loss until the spare drive is brought on line and the rebuild is successfully completed.
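The fencing behavior can be illustrated with a simple policy object. This is a hypothetical sketch: it assumes a drive is fenced after a fixed number of read time-outs (the threshold of 3 is arbitrary) and that the array is exposed to strip loss while the number of fenced, not-yet-rebuilt drives has reached the RAID code's fault tolerance.

```python
class ArrayMonitor:
    """Hypothetical tracker of per-drive time-outs and fenced drives."""

    def __init__(self, num_disks, fence_threshold=3):
        self.timeouts = [0] * num_disks
        self.fenced = set()
        self.fence_threshold = fence_threshold  # assumed policy value

    def record_timeout(self, disk):
        """Count a time-out; return True when the drive is newly fenced,
        signalling that a spare and service should be requested."""
        self.timeouts[disk] += 1
        if self.timeouts[disk] >= self.fence_threshold and disk not in self.fenced:
            self.fenced.add(disk)
            return True
        return False

    def exposed_to_strip_loss(self, fault_tolerance):
        """True while fenced drives have used up all of the code's redundancy."""
        return len(self.fenced) >= fault_tolerance

    def rebuild_complete(self, disk):
        """Spare is on line and rebuilt: clear the fence and the counter."""
        self.fenced.discard(disk)
        self.timeouts[disk] = 0


monitor = ArrayMonitor(num_disks=6)
print([monitor.record_timeout(2) for _ in range(3)])
print(monitor.exposed_to_strip_loss(fault_tolerance=1))   # RAID 5
monitor.rebuild_complete(2)
print(monitor.exposed_to_strip_loss(fault_tolerance=1))
```

For a RAID 5 array the exposure window opens as soon as one drive is fenced and closes only when the rebuild onto the spare finishes.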
A known solution to the exposure to possible strip loss is to use a mirror scheme to allow the target strip to be read quickly from the mirror image, in lieu of reconstructing the target strip as described above. However, mirroring is accomplished at the expense of halving the effective capacity of the RAID array, or equivalently, doubling the cost of storage.
In summary, known techniques for recovering or reconstructing unreadable target strips often have significant shortcomings.