Redundant array of inexpensive (or independent) disks (RAID) is an evolving data storage technology that offers significant advantages in performance, capacity, reliability, and scalability to businesses that have demanding data storage and access requirements. In 1988, Patterson, Gibson, and Katz published a paper entitled “A Case for Redundant Arrays of Inexpensive Disks (RAID),” International Conference on Management of Data, pages 109–116, June 1988. This paper laid the foundation for the use of RAID data storage that would not only improve the data input/output (I/O) rate over that of a comparable single disk data storage system, but would also provide fault tolerance, i.e., the ability to reconstruct data stored on a failed disk.
RAID data storage systems are configured according to any one of a number of “RAID levels.” The RAID levels specify how data is distributed across the disks in the array. In the paper noted above, the authors describe RAID level 1 through RAID level 5. Since the publication of that paper, additional RAID levels have been developed.
RAID data storage systems include an array of data storage disks. These data storage disks may take form in magnetic or optical data storage disks, or combinations thereof. RAID data storage systems may also include a RAID controller, although the term RAID data storage system should not be limited to a system that includes a RAID controller. The RAID controller is an electronic circuit or series of electronic circuits that provides an interface between a host computer and the array of disks. From the viewpoint of the host computer, the RAID controller makes the array of disks look like one virtual disk that is very fast, very large, and very reliable.
RAID levels are typically distinguished by the benefits they provide. These benefits include increased I/O performance and fault tolerance, as noted above. Increased performance is achieved by simultaneous access to multiple disks, which results in faster I/O and faster servicing of data access requests. Fault tolerance is typically achieved through a data recovery method in which data of a disk can be reconstructed in the event of failure of the disk. Fault tolerance allows the disk array to continue to operate with a failed disk.
Data recovery is accomplished, in many RAID levels, using parity data. The parity data is typically stored on a dedicated disk, or distributed over several disks within the array. When data on a disk is inaccessible due to, for example, hardware or software failure, the data sought can be reconstructed using the parity data. Reconstruction can occur as data is requested. Reconstruction can occur without a substantial degradation in system I/O performance. RAID controllers may reconstruct all data of a failed disk onto a spare disk, so that the data storage system can survive another disk failure.
RAID data storage systems employ data interleaving in which data is distributed over all of the data disks in the array. Data interleaving usually takes form in data “striping” in which data to be stored is broken down into components called “stripe units” which are then distributed across the array of disks. A stripe unit is typically defined as a bit, byte, block, or other unit of data. A “stripe” is a group of corresponding stripe units. Each disk in the array stores one stripe unit from each stripe. To illustrate, RAID level 5 uses data interleaving by striping data across all disks. RAID level 5 also distributes parity data across all disks.
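The striping described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_stripes`, the use of byte-sized stripe units, and the zero padding of the final stripe are assumptions of the sketch, and parity placement is deliberately left out.

```python
def split_into_stripes(data: bytes, unit_size: int, num_disks: int) -> list[list[bytes]]:
    """Break data into stripe units and group them into stripes.

    Each inner list is one stripe; position i in a stripe is the
    stripe unit stored on disk i.
    """
    stripe_bytes = unit_size * num_disks
    # Assumption of this sketch: pad with zero bytes so the last stripe is full.
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    # Cut the padded data into fixed-size stripe units.
    units = [padded[i:i + unit_size] for i in range(0, len(padded), unit_size)]
    # Group consecutive units into stripes, one unit per disk.
    return [units[i:i + num_disks] for i in range(0, len(units), num_disks)]


stripes = split_into_stripes(b"ABCDEFGHIJKL", unit_size=2, num_disks=3)
# Disk 0 holds the first unit of every stripe.
disk0 = [stripe[0] for stripe in stripes]
```

Here `stripes` contains two stripes of three units each, and `disk0` collects the units `b"AB"` and `b"GH"`, one from each stripe.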
Reconstruction of data in RAID data storage systems using parity data is a procedure well known in the art. Parity data for each stripe is typically calculated by logically combining data of all stripe units of the stripe. This combination is typically accomplished by an exclusive OR (XOR) of data of the stripe units. For a RAID level 5 data storage system having N disks, N−1 of the N disks will receive a stripe unit of the stripe, and the Nth disk will receive the parity data for the stripe. For each stripe, the disk receiving the parity data rotates so that the parity data is not all contained on a single disk. I/O request rates for RAID level 5 are high because the distribution of parity data allows the system to perform multiple read and write functions at the same time.
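A minimal Python sketch of the parity calculation and rotation follows. The helper names `xor_units`, `parity_disk`, and `layout_stripe` are hypothetical, and the particular rotation order (parity on the last disk for stripe 0, moving one disk to the left for each subsequent stripe) is an assumption; actual RAID level 5 implementations use various rotation schemes.

```python
from functools import reduce
from operator import xor


def xor_units(units: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized stripe units."""
    return bytes(reduce(xor, column) for column in zip(*units))


def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Disk that holds parity for a given stripe (assumed rotation order)."""
    return (num_disks - 1 - stripe_index) % num_disks


def layout_stripe(stripe_index: int, data_units: list[bytes], num_disks: int) -> list[bytes]:
    """Return the N units written for one stripe: N-1 data units plus parity."""
    if len(data_units) != num_disks - 1:
        raise ValueError("a stripe carries exactly N-1 data units")
    row = list(data_units)
    # Place the parity unit on the disk chosen for this stripe.
    row.insert(parity_disk(stripe_index, num_disks), xor_units(data_units))
    return row


data = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04"]
stripe0 = layout_stripe(0, data, num_disks=4)  # parity on disk 3
stripe1 = layout_stripe(1, data, num_disks=4)  # parity on disk 2
```

Because the parity position changes from stripe to stripe, writes to different stripes tend to update parity on different disks, which is the property the paragraph above credits for the high I/O request rates.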
As noted, should a disk fail on a RAID data storage system, the RAID controller can reconstruct data using corresponding parity data. Using a parity data reconstruction algorithm well known in the art, data of a stripe unit in the failed disk can be reconstructed as a function of the parity data and data of stripe units corresponding to the stripe unit of the failed disk.
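For XOR parity, the reconstruction reduces to XOR-ing the surviving units of the stripe, parity included. The sketch below assumes XOR parity and byte-string stripe units; the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_units(units: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized stripe units."""
    return bytes(reduce(xor, column) for column in zip(*units))


def reconstruct(stripe: list[bytes], failed_disk: int) -> bytes:
    """Rebuild the unit of the failed disk from all surviving units.

    `stripe` lists the unit stored on each disk (data and parity alike);
    the entry at `failed_disk` is ignored, as if it were unreadable.
    """
    survivors = [unit for disk, unit in enumerate(stripe) if disk != failed_disk]
    return xor_units(survivors)


# Three data units followed by their XOR parity.
stripe = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\xcc\x07"]
rebuilt = reconstruct(stripe, failed_disk=1)
```

In this example `rebuilt` equals `b"\xf0\x02"`, the unit that was stored on disk 1. The same function rebuilds the parity unit itself if the parity disk is the one that failed.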
Disk failure is one problem in RAID data storage systems. Another problem relates to data corruption. Data corruption has many sources. To illustrate, suppose the RAID controller of a data storage system receives new data Dnew from a computer system coupled thereto. This new data Dnew is to replace existing data Dold of stripe unit B1 of stripe S. Before the RAID controller overwrites the existing data Dold of stripe unit B1, the RAID controller must update existing parity Pold for stripe S. To this end, the RAID controller reads existing parity Pold for stripe S and existing data Dold of stripe unit B1. Thereafter, the RAID controller generates new parity Pnew for stripe S as a function of existing parity Pold, the new data Dnew, and existing data Dold. The RAID controller successfully overwrites existing parity Pold for stripe S with the newly generated parity Pnew.
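When parity is computed by XOR, the function that yields Pnew is commonly Pnew = Pold XOR Dold XOR Dnew, which avoids reading the other stripe units. The Python sketch below assumes XOR parity; the function names are hypothetical and the byte values are arbitrary illustration data.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(p_old: bytes, d_old: bytes, d_new: bytes) -> bytes:
    """Compute Pnew from Pold, Dold, and Dnew (assuming XOR parity).

    XOR-ing Pold with Dold removes the old data's contribution;
    XOR-ing the result with Dnew adds the new data's contribution.
    """
    return xor_bytes(xor_bytes(p_old, d_old), d_new)


# Stripe S with three data units; B1 is the unit being overwritten.
b0, b1_old, b2 = b"\x0f\x01", b"\xf0\x02", b"\x33\x04"
p_old = xor_bytes(xor_bytes(b0, b1_old), b2)

b1_new = b"\xaa\x55"
p_new = updated_parity(p_old, b1_old, b1_new)

# The shortcut agrees with recomputing parity over the whole stripe.
assert p_new == xor_bytes(xor_bytes(b0, b1_new), b2)
```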
Unfortunately, because of improper operation of hardware or software, existing data Dold of stripe unit B1 may not get overwritten with the new data Dnew. For example, the new data Dnew may get inadvertently written to a disk track adjacent to the disk track that stores the existing data Dold of the stripe unit (i.e., mis-tracking). When this happens, two tracks of the disk contain invalid or corrupted data, but the RAID controller believes the existing data Dold of the stripe unit has been properly overwritten with the new data Dnew. If the RAID controller receives a subsequent request from the computer system to read data of stripe unit B1, Dold will be returned rather than Dnew. The computer system or application requesting the data may employ a data consistency checking algorithm and recognize that the returned data is not what is expected. If it is recognized that the data returned is invalid, the computer system may be able to send a second read request for the same data. This approach has a chance of recovering the proper data contents in some mirrored (RAID-1) configurations, because the second read may be serviced from an alternate mirror. Likewise, in some configurations where data is stored directly on raw disks, additional attempts to read corrupted data may result in the disk's own error correction logic repairing the data. Unfortunately, in the mis-tracking error situation described above, the RAID controller will once again return the corrupted contents of Dold in response to the second request.
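The scenario can be modeled with a small Python simulation, offered only as an illustration under the assumption of XOR parity. The function names are hypothetical. The model updates the parity but leaves stripe unit B1 untouched, which covers the stale-data aspect of the failure and not the damage to the adjacent track.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_is_consistent(data_units: list[bytes], parity: bytes) -> bool:
    """True when XOR of all data units and the parity is all zero bytes."""
    acc = parity
    for unit in data_units:
        acc = xor_bytes(acc, unit)
    return not any(acc)


def write_with_lost_data(data_units: list[bytes], parity: bytes,
                         index: int, d_new: bytes) -> tuple[list[bytes], bytes]:
    """Model a write whose parity update succeeds but whose data update is lost."""
    d_old = data_units[index]
    p_new = xor_bytes(xor_bytes(parity, d_old), d_new)
    # The data unit is returned unchanged: Dnew never reached its location.
    return list(data_units), p_new


units = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04"]  # B0, B1, B2
parity = b"\xcc\x07"                               # XOR of B0, B1, B2
units_after, parity_after = write_with_lost_data(units, parity, 1, b"\xaa\x55")

read_back = units_after[1]  # a read of B1 returns Dold, every time it is repeated
consistent = stripe_is_consistent(units_after, parity_after)
```

In this model `read_back` is still Dold and `consistent` is `False`: the stored parity reflects Dnew while the stripe unit holds Dold, and repeating the read does not change the result.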
Another form of data corruption can occur if a disk drive's firmware does not write the data to the disk platter, but reports successful completion of the write. In that case, the data stored in the disk block may be internally consistent, but “stale” and therefore considered corrupted.
Yet another form of data corruption can occur if a faulty software or hardware component corrupts the data “in flight” by improperly copying or transmitting one or more bits of the data (bit-flipping).