1. Field of the Invention
This invention relates to computer system data storage, and more particularly to a system and method for ensuring the completion and integrity of data modification operations to a redundant array data storage system and for ensuring the integrity of redundancy values in such a system.
2. Description of Related Art
A typical data processing system generally involves one or more storage units which are connected to a Central Processor Unit (CPU) either directly or through a control unit and a channel. The function of the storage units is to store data and programs which the CPU uses in performing particular data processing tasks.
Various types of storage units are used in current data processing systems. A typical system may include one or more large capacity tape units and/or disk drives (magnetic, optical, or semiconductor) connected to the system through respective control units for storing data.
However, a problem exists if one of the large capacity storage units fails such that information contained in that unit is no longer available to the system. Generally, such a failure will shut down the entire computer system.
The prior art has suggested several ways of solving the problem of providing reliable data storage. In systems where records are relatively small, it is possible to use error correcting codes which generate ECC syndrome bits that are appended to each data record within a storage unit. With such codes, it is possible to correct a small amount of data that may be read erroneously. However, such codes are generally not suitable for correcting or recreating long records which are in error, and provide no remedy at all if a complete storage unit fails. Therefore, a need exists for providing data reliability external to individual storage units.
One solution to this problem is disk array systems. Disk array systems are of various types. A research group at the University of California, Berkeley, in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Patterson, et al., Proc. ACM SIGMOD, June 1988, has catalogued a number of different types by defining five architectures under the acronym "RAID" (for Redundant Arrays of Inexpensive Disks).
A RAID 1 architecture involves providing a duplicate set of "mirror" storage units and keeping a duplicate copy of all data on each pair of storage units. While such a solution solves the reliability problem, it doubles the cost of storage. A number of implementations of RAID 1 architectures have been made, in particular by Tandem Corporation.
A RAID 2 architecture stores each bit of each word of data, plus Error Detection and Correction (EDC) bits for each word, on separate disk drives. For example, Flora et al. U.S. Pat. No. 4,722,085 discloses a disk drive memory using a plurality of relatively small, independently operating disk subsystems to function as a large, high capacity disk drive having an unusually high fault tolerance and a very high data transfer bandwidth. A data organizer adds 7 EDC bits (determined using the well-known Hamming code) to each 32-bit data word to provide error detection and error correction capability. The resultant 39-bit word is written, one bit per disk drive, onto 39 disk drives. If one of the 39 disk drives fails, the remaining 38 bits of each stored 39-bit word can be used to reconstruct each 32-bit data word on a word-by-word basis as each data word is read from the disk drives, thereby obtaining fault tolerance.
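The arithmetic behind the 39-bit word can be sketched as follows. This is only an illustration: it assumes the seven EDC bits arise from an extended Hamming (single-error-correcting, double-error-detecting) code, which is one common way to reach seven check bits for a 32-bit word; the cited patent's exact code construction is not stated here, and the function names are hypothetical.

```python
def hamming_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """Assumed variant: Hamming check bits plus one overall parity bit."""
    return hamming_check_bits(data_bits) + 1


word_bits = 32
edc_bits = secded_check_bits(word_bits)   # check bits appended to each word
stored_bits = word_bits + edc_bits        # one bit per disk drive

print(edc_bits, stored_bits)
```

Under that assumption, a 32-bit word needs six Hamming check bits plus one overall parity bit, which matches the seven EDC bits and 39 drives described above.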
A RAID 3 architecture is based on the concept that each disk drive storage unit has internal means for detecting a fault or data error. Therefore, it is not necessary to store extra information to detect the location of an error; a simpler form of parity-based error correction can thus be used. In this approach, the contents of all storage units subject to failure are "Exclusive OR'd" (XOR'd) to generate parity information. The resulting parity information is stored in a single redundant storage unit. If a storage unit fails, the data on that unit can be reconstructed onto a replacement storage unit by XOR'ing the data from the remaining storage units with the parity information. Such an arrangement has the advantage over the mirrored disk RAID 1 architecture in that only one additional storage unit is required for "N" storage units. A further aspect of the RAID 3 architecture is that the disk drives are operated in a coupled manner, similar to a RAID 2 system, and a single disk drive is designated as the parity unit. One implementation of a RAID 3 architecture is the Micropolis Corporation Parallel Drive Array, Model 1804 SCSI, that uses four parallel, synchronized disk drives and one redundant parity drive. The failure of one of the four data disk drives can be remedied by the use of the parity bits stored on the parity disk drive. Another example of a RAID 3 system is described in Ouchi U.S. Pat. No. 4,092,732.
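The XOR parity scheme described above can be illustrated with a minimal sketch. The block contents, the choice of four data units, and the helper name `xor_blocks` are assumptions made for illustration only; they are not taken from any particular product.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data storage units, each holding one small block (illustrative values).
data_units = [b"\x0f\xa0", b"\x33\x5c", b"\xff\x00", b"\x10\x01"]

# The parity block stored on the single redundant storage unit.
parity = xor_blocks(data_units)

# Suppose unit 2 fails: XOR the surviving data blocks with the parity block.
failed = 2
survivors = [blk for i, blk in enumerate(data_units) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_units[failed])
```

Because XOR is its own inverse, the same routine both generates the parity and regenerates any single missing block, which is why only one additional storage unit is needed for "N" data units.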
A RAID 4 architecture uses the same parity error correction concept of the RAID 3 architecture, but improves on the performance of a RAID 3 system with respect to random reading of small files by "uncoupling" the operation of the individual disk drive actuators, and reading and writing a larger minimum amount of data (typically, a disk sector) to each disk (this is also known as block striping). A further aspect of the RAID 4 architecture is that a single storage unit is designated as the parity unit.
A RAID 5 architecture uses the same parity error correction concept of the RAID 4 architecture and independent actuators, but improves on the writing performance of a RAID 4 system by distributing the data and parity information across all of the available disk drives. Typically, "N+1" storage units in a set (also known as a "redundancy group") are divided into a plurality of equally sized address areas referred to as blocks. Each storage unit generally contains the same number of blocks. Blocks from each storage unit in a redundancy group having the same unit address ranges are referred to as "stripes". Each stripe has N blocks of data, plus one parity block on one storage unit containing parity for the remainder of the stripe. Further stripes each have a parity block, the parity blocks being distributed on different storage units. Parity updating activity associated with every modification of data in a redundancy group is therefore distributed over the different storage units. No single unit is burdened with all of the parity update activity. For example, in a RAID 5 system comprising 5 disk drives, the parity information for the first stripe of blocks may be written to the fifth drive; the parity information for the second stripe of blocks may be written to the fourth drive; the parity information for the third stripe of blocks may be written to the third drive; etc. The parity block for succeeding stripes typically "precesses" around the disk drives in a helical pattern (although other patterns may be used). Thus, no single disk drive is used for storing the parity information as in the RAID 4 architecture. An example of a RAID 5 system is described in Clark et al. U.S. Pat. No. 4,761,785. RAID 3, 4, and 5 disk storage array configurations provide a lower cost alternative to RAID 1 and 2 configurations.
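The rotating parity placement in the five-drive example can be sketched as below. The formula is one assumed rotation that reproduces the example (fifth drive, then fourth, then third); as noted, other patterns may be used, and the function names are hypothetical. Drives and stripes are numbered from zero in the code.

```python
def parity_unit(stripe, num_units):
    """Zero-based index of the storage unit holding parity for a stripe.

    Assumed rotation: parity starts on the last unit and moves one unit
    toward the first for each succeeding stripe, then wraps around.
    """
    return (num_units - 1 - stripe) % num_units


def stripe_layout(stripe, num_units):
    """Mark each unit in a stripe as holding data ('D') or parity ('P')."""
    p = parity_unit(stripe, num_units)
    return ["P" if unit == p else "D" for unit in range(num_units)]


# Print the layout of the first six stripes of a five-drive redundancy group.
for stripe in range(6):
    print(stripe, " ".join(stripe_layout(stripe, 5)))
```

Each stripe contains exactly one parity block and N = 4 data blocks, and over any five consecutive stripes every drive holds parity once, so parity update traffic is spread evenly.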
However, RAID 3, 4, and 5 systems that have been optimized for performance are very susceptible to data and/or parity information corruption if a WRITE operation fails before completion because of a component failure. In such systems, it is desirable to have the update of the parity information occur simultaneously with the update of the data, rather than serially, to save time. Thus, if a temporary "failure" (such as a power loss or controller failure) occurs to a storage unit during a WRITE operation, there is no assurance that the data or the corresponding parity information were properly written and valid. Since two concurrent I/O operations are undertaken to update the data and its associated parity, it is difficult to determine which I/O operation was completed before the system termination. Thus, the data that was being written could be corrupted.
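The hazard can be made concrete with a small simulation. This is a sketch under stated assumptions: a three-data-block stripe with illustrative contents, a parity update computed as old parity XOR old data XOR new data, and an interruption that lets exactly one of the two concurrent writes reach its storage unit. All names are hypothetical.

```python
def xor(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks):
    """Parity block for a list of data blocks."""
    result = bytes(len(blocks[0]))
    for blk in blocks:
        result = xor(result, blk)
    return result


def consistent(blocks, parity):
    """True if the stored parity matches the stored data."""
    return parity_of(blocks) == parity


old = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
old_parity = parity_of(old)

# A WRITE replaces block 1; data and parity are updated concurrently.
new_block = b"\xff\xee"
new_parity = xor(xor(old_parity, old[1]), new_block)

# Case A: the data write completed, the parity write did not.
case_a = ([old[0], new_block, old[2]], old_parity)
# Case B: the parity write completed, the data write did not.
case_b = (list(old), new_parity)

print(consistent(*case_a), consistent(*case_b))

# In case A, a later rebuild of block 0 from the stale parity is wrong.
blocks_a, parity_a = case_a
rebuilt_block0 = xor(xor(blocks_a[1], blocks_a[2]), parity_a)
print(rebuilt_block0 == old[0])
```

In both cases the stripe fails a parity check after restart, yet the check alone cannot tell which of the two writes completed, and a subsequent reconstruction of an unrelated block in the same stripe would silently return incorrect data.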
The term "Atomic Write" is used in the art to define a WRITE operation to a data storage unit in which the operation, once initiated, (1) invariably completes with data being reliably written to the data storage unit, or (2) positively indicates that the data was not written, thus allowing for complete recovery from the operation, so that data is never lost regardless of the failure of any component or subsystem during the operation.
Tandem Computers Incorporated has for some time provided Atomic Writes for RAID 1 type mirrored data storage units in its line of fault-tolerant computers. However, ensuring write data integrity and redundancy integrity in RAID 3, 4, and 5 disk storage array systems presents a challenge that has not been fully resolved in the art. In particular, a complete system that ensures Atomic Writes in RAID 3, 4, and 5 disk storage arrays has not been described in the art.
Therefore, a need exists for a system architecture which ensures that WRITE operations complete and valid redundancy information is generated in a RAID 3, 4, or 5 system even in the event of a component failure. It is also desirable to have such a RAID system in which restoration of potentially corrupted redundancy information can be conducted with minimum impact on normal processing operations.
The present invention provides a system and method for accomplishing these objectives.