Magnetic disk storage is currently the most widely used method of mass storage for computer systems. Traditionally, systems using this method of mass storage have included a single disk capable of storing large amounts of data. However, systems using an array of smaller capacity, less expensive disk drives are currently emerging as a low cost alternative to large single disk systems. These array systems are known as RAID (redundant array of independent drives) systems.
When used in conjunction with a host computer, a RAID system appears to behave just like a single disk system. RAID systems, however, offer many advantages over single disk systems. One of the most important advantages of RAID technology is the greatly improved system reliability it provides. Reliability is improved through the use of redundancy information in the array, which allows the system to continue operating, in a degraded mode, even though one of the drives in the array has failed. The failed drive may then be replaced, and the lost data regenerated, without having to shut down the system. This is a great improvement over a single disk system, which is rendered inoperable and may lose valuable data if the one disk in the system fails.
RAID technology encompasses a series of techniques for managing the operation of multiple disks. These techniques are discussed in an article entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by Patterson, Gibson, and Katz of the University of California (Report No. UCB/CSD 87/391, December 1987) which categorizes the different techniques into five RAID "levels" and is hereby incorporated by reference. Each RAID level represents a different approach to storing and retrieving data and the associated redundancy information across the array of disk drives.
For example, FIG. 1 illustrates one embodiment of a RAID level 5 data storage system 10. As seen in the figure, the system 10 comprises: an array of disk drives 16, identified as DRIVE A through DRIVE E; a disk array controller 12; and a host computer 14. The host computer 14 delivers I/O requests to the disk array controller 12 requesting that certain read/write operations be performed. The controller 12 coordinates the transfer of data between the host computer 14 and the array of disk drives 16 according to RAID level 5 techniques in response to those requests. In addition, the controller 12 calculates and stores the required redundancy information, which in a RAID level 5 system comprises parity information. The parity information is simply a collection of parity bits, each of which is a binary digit chosen so that the sum of the corresponding digits across a redundancy group, parity bit included, is always an even number or always an odd number, depending on the convention adopted.
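The role of a single parity bit can be illustrated with a minimal sketch. The function name `even_parity_bit` and the sample bit values are hypothetical and are chosen only for illustration; even parity is assumed, although an odd-parity convention would serve equally well.

```python
def even_parity_bit(bits):
    """Return the bit that makes the total number of 1s even (assumed convention)."""
    return sum(bits) % 2


# Hypothetical bits taken from the same position in four data blocks.
data_bits = [1, 0, 1, 1]
p = even_parity_bit(data_bits)

# With the parity bit included, the sum across the group is even.
assert (sum(data_bits) + p) % 2 == 0
```

Under even parity, the parity bit is the exclusive-OR of the data bits, which is why parity for whole blocks is commonly computed with a bitwise XOR.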
Blocks 20 through 24 in FIG. 1 illustrate the manner in which data and parity information are stored on the five array drives in system 10. Data is stored in data blocks identified as BLOCK 0 through BLOCK 15. Parity information is stored in parity blocks identified as PARITY 0 through PARITY 3. Each parity block is associated with four corresponding data blocks, all located on a common "stripe" across the five array drives, to form a redundancy group. The parity information stored in the parity block of any particular redundancy group is calculated using the data stored in the four corresponding data blocks. Consequently, if the data stored in one of the data blocks of a redundancy group is changed, the corresponding parity information must be updated.
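The relationship between the four data blocks of a redundancy group and their parity block may be sketched as follows. The sketch assumes even parity computed as a bytewise XOR, which is a common choice for RAID level 5 but is not stated in the description above; the names `parity_block` and `updated_parity` and the two-byte block contents are hypothetical.

```python
from functools import reduce


def parity_block(blocks):
    """XOR equal-length blocks byte by byte (even parity per bit position)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_parity, old_data, new_data):
    """Update parity for a single changed block without reading the other three."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# One redundancy group: four tiny data blocks on a common stripe.
stripe = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\x55\x08"]
parity = parity_block(stripe)

# Changing one data block requires a matching parity update.
new_block = b"\xaa\xff"
parity_after = updated_parity(parity, stripe[2], new_block)
stripe[2] = new_block

# The incremental update agrees with recomputing parity from all four blocks.
assert parity_after == parity_block(stripe)
```

The incremental form shows why a write to a single data block involves two separate disk writes, one for the data block and one for the parity block.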
Because there is a direct relationship between the data stored in a redundancy group and the corresponding parity information, if some of the data in the group is lost, such as by the failure of one of the disk drives in the array, the parity information may be used to reconstruct the lost data. In this way, the system 10 can continue to perform read and write operations even before the failed drive is replaced. It should be apparent, however, that in order for the system 10 to maintain the increased reliability provided by the above-described technique, it is mandatory that the system maintain consistency between the data and the parity information stored in each of its redundancy groups.
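Reconstruction of a lost block can be sketched in the same way. The example again assumes XOR parity; the helper `xor_blocks`, the block contents, and the choice of failed drive are hypothetical.

```python
def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# A consistent redundancy group: four data blocks and their parity block.
data = [b"RAID", b"FIVE", b"DATA", b"BLKS"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails.
failed = 1
survivors = [block for i, block in enumerate(data) if i != failed] + [parity]

# XOR of the surviving data blocks and the parity block yields the lost block.
rebuilt = xor_blocks(survivors)
assert rebuilt == data[failed]
```

The reconstruction is correct only because the parity block was computed from the data currently stored in the group, which is the consistency requirement noted above.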
A problem can arise when the system 10 is performing a write operation. The problem stems from the fact that, during a write operation, new data and new parity information are normally written to the redundancy groups at different times. Therefore, if a system interruption, such as a loss of system power, occurs during the write operation, a condition may result where either the new data or the new parity information has been written to the redundancy group without the other. This creates an inconsistency between data and parity within the array 16 which can, in certain circumstances, negatively affect the system's ability to operate properly. For example, if one of the drives in the array fails, it will be impossible to correctly read the data block corresponding to the failed drive in a redundancy group containing inconsistent data/parity information, because that block must be reconstructed from parity information that no longer matches the data. In addition, a retry of the write operation interrupted by the power failure will not correct the inconsistency in the redundancy group.
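The failure scenario can be simulated with a short sketch. Several assumptions are made that do not appear in the description above: parity is XOR-based, the controller writes the data block before the parity block, and a retried write computes the new parity by read-modify-write from whatever is currently stored. The names `xor_blocks` and `write_block` and the block contents are hypothetical.

```python
def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def write_block(data, parity, index, new_block, interrupt_before_parity=False):
    """Assumed read-modify-write: data is written first, parity second.

    Returns the parity block that ends up stored after the operation.
    """
    old_block = data[index]
    new_parity = xor_blocks([parity, old_block, new_block])
    data[index] = new_block          # first write: new data reaches the disk
    if interrupt_before_parity:
        return parity                # power lost: the old parity remains stored
    return new_parity                # second write: new parity reaches the disk


disk_data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
disk_parity = xor_blocks(disk_data)

# The write is interrupted after the data block but before the parity block.
disk_parity = write_block(disk_data, disk_parity, 2, b"cccc",
                          interrupt_before_parity=True)
inconsistent = xor_blocks(disk_data) != disk_parity

# Retrying the same write reads the already-new data as "old" data,
# so the stale parity is carried forward unchanged.
disk_parity = write_block(disk_data, disk_parity, 2, b"cccc")
still_inconsistent = xor_blocks(disk_data) != disk_parity

# If the drive holding block 0 now fails, reconstruction returns wrong data.
rebuilt = xor_blocks(disk_data[1:] + [disk_parity])
assert inconsistent and still_inconsistent
assert rebuilt != b"AAAA"
```

Under these assumptions the retry leaves the group inconsistent because the parity update depends on the old data, which has already been overwritten.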
Therefore, a need exists for a method and apparatus for preserving the data/parity consistency in a RAID system.