Redundant array of inexpensive (or independent) disks (RAID) is an evolving data storage technology that offers significant advantages in performance, capacity, reliability, and scalability to businesses that have demanding data storage and access requirements. In 1988, Patterson, Gibson, and Katz published a paper entitled “A Case for Redundant Arrays of Inexpensive Disks (RAID),” International Conference on Management of Data, pages 109–116, June 1988. This paper described how RAID data storage would improve the data input/output (I/O) rate over that of a comparable single disk data storage system, and how RAID data storage would provide fault tolerance, i.e., the ability to reconstruct data stored on a failed disk.
RAID data storage systems are configured according to any one of a number of “RAID levels.” The RAID level specifies how data is distributed across disk drives in the array. In the paper noted above, the authors describe RAID levels 1 through 5. Since the publication of that paper, additional RAID levels have been developed.
RAID data storage systems include an array of disk drives. These disk drives may include magnetic or optical data storage disks, or combinations thereof. RAID data storage systems may also include a RAID controller, although the term RAID data storage system should not be limited to a system that includes a RAID controller. The RAID controller is an electronic circuit or series of electronic circuits that provides an interface between a host computer and the array of disk drives. From the viewpoint of the host computer, the RAID controller makes the array of disk drives look like one virtual disk drive that is very fast, very large, and very reliable.
RAID levels are typically distinguished by the benefits provided. These benefits include increased I/O performance and fault tolerance as noted above. Increased performance is achieved by simultaneous access to multiple disk drives, which results in faster I/O and faster servicing of data access requests. Fault tolerance is typically achieved through a data recovery method in which data of a disk drive can be reconstructed in the event of failure of the disk drive. Fault tolerance allows the disk drive array to continue to operate with a failed disk drive.
Data recovery is accomplished, in many RAID levels, using error correction data. The error correction data is typically stored on one or more disk drives dedicated to error correction only, or distributed over several disk drives within the array. When data on a disk drive is inaccessible due to, for example, hardware or software failure, the data sought can be reconstructed using the error correction data and data from other disk drives of the array. Reconstruction can occur as data is requested. Further, reconstruction can occur without a substantial degradation in system I/O performance. RAID controllers may reconstruct all data of a failed disk drive onto a spare disk drive, so that the data storage system can survive another disk drive failure.
RAID data storage systems employ data interleaving in which data is distributed over all of the disk drives in the array. Data interleaving usually takes the form of data “striping,” in which data to be stored is broken down into components called “stripe units,” which are then distributed across the array of disk drives. A stripe unit is typically defined as a bit, byte, block, or other unit of data. A “stripe” is a group of corresponding stripe units. Each disk drive in the array stores one stripe unit from each stripe. To illustrate, RAID level 5 uses data interleaving by striping data across all disk drives. RAID level 5 also distributes error correction data across all disk drives.
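The striping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_stripes`, the zero padding, and the fixed `unit_size` are hypothetical choices, and error correction data is left out entirely.

```python
def split_into_stripes(data: bytes, num_disks: int, unit_size: int):
    """Break data into stripe units and group them into stripes.

    Each stripe holds one stripe unit per disk, so position i within a
    stripe is the unit stored on disk i. (Illustrative layout only.)
    """
    # Assumption: pad with zero bytes so the data fills whole stripes.
    stripe_bytes = num_disks * unit_size
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    # Cut the padded data into fixed-size stripe units.
    units = [padded[i:i + unit_size] for i in range(0, len(padded), unit_size)]
    # Group consecutive units into stripes of num_disks units each.
    return [units[i:i + num_disks] for i in range(0, len(units), num_disks)]


stripes = split_into_stripes(b"ABCDEFGHIJKL", num_disks=3, unit_size=2)
# The units held by disk 0 are the first unit of every stripe.
disk0 = [stripe[0] for stripe in stripes]
```

With three disks and two-byte units, consecutive units land on consecutive disks, which is what lets several drives service one large request at the same time.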
Reconstruction of data in RAID data storage systems using error correction data is a procedure well known in the art. Error correction data usually takes the form of parity data. Parity data for each stripe is typically calculated by logically combining data of all stripe units of the stripe. This combination is typically accomplished by an exclusive OR (XOR) of data of the stripe units. For a RAID level 5 data storage system having N disk drives, N−1 of the N disk drives will receive a stripe unit of the stripe, and the Nth disk drive will receive the parity data for the stripe. For each stripe, the disk drive receiving the parity data rotates so that the parity data is not all contained on a single disk drive. I/O request rates for RAID level 5 are high because the distribution of parity data allows the system to perform multiple read and write functions at the same time.
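A minimal sketch of XOR parity with a rotating parity disk follows. The helper names `xor_units` and `layout_stripe` are hypothetical, and the rotation rule used here (parity starts on the last disk and moves one disk to the left per stripe) is an assumption chosen for illustration; the text above does not specify a particular rotation.

```python
from functools import reduce


def xor_units(units):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def layout_stripe(stripe_index, data_units, num_disks):
    """Place N-1 data units and one parity unit on N disks.

    Returns (parity_disk, row), where row[i] is what disk i stores
    for this stripe.
    """
    assert len(data_units) == num_disks - 1
    # Assumed rotation rule: last disk first, then one disk left per stripe.
    parity_disk = (num_disks - 1 - stripe_index) % num_disks
    row = list(data_units)
    row.insert(parity_disk, xor_units(data_units))
    return parity_disk, row


data_units = [b"\x01", b"\x02", b"\x04"]
parity_disk_0, row_0 = layout_stripe(0, data_units, num_disks=4)
parity_disk_1, row_1 = layout_stripe(1, data_units, num_disks=4)
```

Because the parity disk differs from stripe to stripe, writes to different stripes update parity on different drives and need not queue behind a single parity drive.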
As noted, should a disk drive fail on a RAID data storage system, the RAID controller can reconstruct data using corresponding parity data. Using parity data reconstruction algorithms well known in the art, data of a stripe unit in the failed disk drive can be reconstructed as a function of the parity data and data of stripe units corresponding to the stripe unit of the failed disk drive.
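Assuming XOR parity as described above, reconstruction amounts to XORing the parity with the surviving stripe units of the same stripe. The following sketch uses hypothetical names (`xor_units`, `reconstruct_unit`) and made-up sample bytes.

```python
from functools import reduce


def xor_units(units):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def reconstruct_unit(surviving_data_units, parity):
    """Rebuild the one missing stripe unit of a stripe.

    Since parity = d0 ^ d1 ^ ... ^ dk, XORing the parity with every
    surviving data unit cancels them out and leaves the missing unit.
    """
    return xor_units(list(surviving_data_units) + [parity])


# One stripe with three data units; sample values are arbitrary.
d0, d1, d2 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"
parity = xor_units([d0, d1, d2])

# Suppose the disk holding d1 fails: rebuild d1 from d0, d2, and parity.
rebuilt = reconstruct_unit([d0, d2], parity)
```

The same routine works whichever single unit is lost, because XOR is its own inverse.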
Disk drive failure is one problem in RAID data storage systems. Another problem relates to data corruption. Data corruption has many sources. To illustrate, suppose the RAID controller of a data storage system receives new data Dnew from a computer system coupled thereto. This new data Dnew is to replace existing data Dold of a stripe unit B1 of a stripe S. Before the RAID controller overwrites the existing data Dold of the stripe unit B1, the RAID controller must update the existing parity Pold for stripe S. To this end, the RAID controller reads the existing parity Pold for stripe S. Thereafter, the RAID controller generates a new parity Pnew for stripe S as a function of the existing parity Pold, the existing data Dold, and the new data Dnew. Thereafter, the RAID controller successfully overwrites the existing parity Pold for stripe S with the newly generated parity Pnew.
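Under the assumption of XOR parity, the parity update in this scenario can be sketched as Pnew = Pold XOR Dold XOR Dnew, which avoids reading the other stripe units of stripe S. The names `xor_bytes` and `updated_parity` and the sample bytes are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(p_old: bytes, d_old: bytes, d_new: bytes) -> bytes:
    """Compute Pnew from Pold, Dold, and Dnew.

    XORing Pold with Dold removes the old data's contribution to the
    parity; XORing the result with Dnew adds the new data's contribution.
    """
    return xor_bytes(xor_bytes(p_old, d_old), d_new)


# Stripe S with three data units; B1 is the unit at index 1.
stripe = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
p_old = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

d_new = b"\xff\x00"
p_new = updated_parity(p_old, stripe[1], d_new)

# Cross-check: recomputing parity from scratch over the stripe with
# Dnew in place of Dold gives the same result.
p_full = xor_bytes(xor_bytes(stripe[0], d_new), stripe[2])
```

The shortcut touches only the parity unit and the unit being replaced, which is why a small write to a parity-protected stripe involves both a parity write and a data write.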
Unfortunately, because of improper operation of hardware or software, existing data Dold of the stripe unit B1 is not overwritten with the new data Dnew. For example, the new data Dnew is inadvertently written to a disk track adjacent to the disk track that stores the existing data Dold of the stripe unit. When this happens, two tracks of the disk drive contain invalid or corrupted data. But the RAID controller believes the existing data Dold of the stripe unit has been properly overwritten with the new data Dnew. If the RAID controller receives a subsequent request from the computer system to read data of stripe unit B1, Dold will be returned rather than Dnew. Other manifestations of data corruption can be caused by the failure of software or hardware to write Dnew at all, or by writing data that was corrupted at some point during processing and transmission from the computer system to the disk media or from the disk media to the computer system.
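The failure described above can be simulated with a toy in-memory stripe. Everything here is illustrative: the names, the sample bytes, and in particular the final consistency check, which is included only to show that parity and data have diverged and is not something the text says the controller performs.

```python
from functools import reduce


def xor_units(units):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


d_old, d_new = b"old!", b"new!"

# Stripe S: three data units, with B1 at index 1, plus its parity.
stripe = [b"aaaa", d_old, b"cccc"]
parity = xor_units(stripe)

# Step 1: the controller successfully writes the new parity Pnew.
parity = xor_units([parity, d_old, d_new])

# Step 2: the write of Dnew to B1 goes astray, so stripe[1] is never
# updated. (Modeled here by simply not assigning to stripe[1].)

# A later read of B1 returns the stale data.
read_back = stripe[1]

# Illustrative check: the stored parity no longer matches the stored data.
parity_matches = xor_units(stripe) == parity
```

The read succeeds without any error indication, which is what makes this kind of corruption hard to notice from the controller's side.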
The computer system requesting the data may perform a checking algorithm and recognize that the returned data is not what is expected. If the computer system recognizes that the data returned is invalid, it may send a second request for the same data. If the data is correct on the disk media but was corrupted during transmission to the computer system, this second request (re-read) may return correct data. Also, if the RAID system stored data as RAID-1 (mirroring), it has a second complete copy of the data (alternate mirror) and may be able to obtain correct data from the alternate mirror. Unfortunately, a parity RAID controller has no readily available alternate copy of the data. If data is corrupted on the disk media in one of the ways described earlier, the controller will once again return corrupted data in response to the second request.