1. Field of the Invention
This invention relates generally to computer system data storage and, more particularly, to the distribution of data regions in a redundant array of storage units in a computer system.
2. Description of the Related Art
Conventional computing systems include a central processing unit (CPU) and multiple storage devices connected to the CPU either directly or through a control unit and a data channel. Storage units store data that the CPU accesses in performing data processing tasks. Access to the data in a storage unit is gained through reading data from the storage unit, writing data to the storage unit, or both. A great variety of storage units can be used. Typical large capacity computing systems use tape units and disk drives. The disk drives can be magnetic, optical, semiconductor, or some combination of technologies. If there is a failure of one such storage unit, then the entire computing system can be shut down and the data stored on the failed storage unit can be lost.
It is very important for computing systems to have great reliability and efficiency. Therefore, a number of schemes are used to recover from the failure of a storage unit and permit the computing system to continue with data processing tasks. Conventionally, the needed reliability is provided by redundant arrays of independent storage units, such as disk drives, in which a computing system having a number of primary storage units is provided with one or more spare storage units. The spare storage units provide a means of duplicating or rebuilding the data stored in a failed primary unit.
One such redundancy scheme is to double the number of storage units provided and to duplicate the data stored in them, providing a fully redundant system. That is, each data write operation to a primary storage unit is duplicated with a write operation to a corresponding secondary unit. All read operations can be accomplished with a read to the primary unit or the secondary unit or can alternate between primary and secondary units. If either storage unit fails, then the data and function of the failed unit are simply replaced with the remaining unit until the computing system can be temporarily shut down for repairs, whereupon a new storage unit is provided or the failed unit is repaired and the data from the remaining unit is copied to the replaced or repaired unit. The likelihood of two independent storage units failing simultaneously or in a small time interval before repair can be effected is relatively small. In this way, the reliability of the computing system is increased and there is minimal interruption to system operations from a storage unit failure.
Unfortunately, providing duplicate storage units can be very costly, because the number of storage units needed for a storage array of a computing system is doubled, and is very inefficient, because the number of write operations that must be executed is doubled. It is known to reduce the number of spare storage units required while still providing redundancy through the use of the exclusive-OR (XOR) logical operator to generate parity information for a group of data regions. Under the XOR redundancy concept, data stored in the storage units of a redundant array is organized into equally sized address areas referred to generally as regions, each storage unit having the same number of regions. The regions can comprise, for example, a disk block, disk track, disk cylinder, tape record, or the like. The regions are organized into parity groups having the same address ranges across the storage units. The XOR operation is performed on all the data regions of a parity group and the result is placed in a parity region of a last storage unit. An additional storage unit provides a corresponding spare region for the parity group.
If there is a storage unit failure, then the data from the failed unit is recovered by performing the XOR operation on the remaining data regions in the parity group and on the parity region of the parity group. The result of the repeated XOR operations provides the data from the failed storage unit. This result is stored on the corresponding spare region of the spare storage unit. In this way, the data from the failed storage unit is rebuilt, parity group by parity group, onto the spare storage unit. Eventually, the computing system can be temporarily shut down for repairs and the failed unit replaced. Until the shutdown occurs, however, the computing system can continue with data processing with a minimum of interruption required for the rebuild process itself. In this way, only one spare storage unit is necessary, not counting the storage unit needed for parity information.
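The parity generation and rebuild steps described above can be sketched as follows. This is a minimal illustration only, assuming that each region is a short byte string of equal length; the helper name xor_regions and the sample region contents are hypothetical and do not come from the description.

```python
from functools import reduce

def xor_regions(regions):
    """Return the bytewise XOR of a list of equally sized byte strings."""
    # zip(*regions) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*regions))

# Three data regions of one parity group (hypothetical contents).
data = [b"\x0f\xa0\x33", b"\xf0\x0a\x55", b"\x3c\xc3\x99"]

# Parity region: XOR of all data regions in the parity group.
parity = xor_regions(data)

# Simulate the loss of one storage unit and rebuild its region by
# XOR'ing the surviving data regions with the parity region.
failed = 1
survivors = [region for i, region in enumerate(data) if i != failed]
rebuilt = xor_regions(survivors + [parity])

assert rebuilt == data[failed]
```

The same function serves for both parity generation and rebuild because XOR is its own inverse: XOR'ing the parity with every surviving region cancels their contributions and leaves the missing region.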
It should be clear that a redundancy scheme using parity and a single spare storage unit is much more cost-effective than providing duplicate storage units, especially for very large computing systems. It is not uncommon, for example, to have ten or more disk drives in a disk array of a large computing system. In such a computing system, the parity redundancy scheme described above would reduce the number of storage units necessary from twenty to twelve. Unfortunately, some inefficiencies remain.
Each write operation to any one of the data storage units requires generating new parity information and writing the new parity information to the parity storage unit in addition to the data write operation. That is, with each write of new data, the old data and the old parity information stored on the parity storage unit must be read and XOR'd together to "remove" the contribution of the old data from the parity information, and the result must then be XOR'd with the new data to generate the new parity information. The new parity information is then written to the parity region of the parity storage unit. This process is referred to as a "read-modify-write" process. Thus, the parity storage unit is disproportionately burdened with write operations as compared with the remaining storage units of the array, as a write operation to any one of the data storage units results in a write operation to the parity storage unit. This extra work adds significantly to the work load of the parity storage unit. In addition to being inefficient, the extra work also results in more frequent maintenance to the parity storage unit and can result in more frequent failures of the parity storage unit.
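The read-modify-write process can be illustrated with a short sketch. It assumes byte-string regions of equal length; the function names xor_bytes and read_modify_write and the sample values are hypothetical and serve only to show that the updated parity equals the parity recomputed from scratch.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def read_modify_write(old_data, old_parity, new_data):
    """Return the new parity after one data region is overwritten."""
    # Step 1: XOR old parity with old data to cancel the old data's contribution.
    partial = xor_bytes(old_parity, old_data)
    # Step 2: XOR the result with the new data to obtain the new parity.
    return xor_bytes(partial, new_data)

# A parity group with two data regions (hypothetical contents).
d0, d1 = b"\x12\x34", b"\xab\xcd"
parity = xor_bytes(d0, d1)

# Overwrite the first data region and update the parity.
new_d0 = b"\xff\x00"
new_parity = read_modify_write(d0, parity, new_d0)

# The result matches the parity computed directly from the current data.
assert new_parity == xor_bytes(new_d0, d1)
```

Note that the update touches only the written data region and the parity region, which is why every data write also produces a write to whichever unit holds the parity.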
To more uniformly distribute the write operation workload among the data storage units and parity storage unit, it is known to evenly distribute the parity regions among the storage units. In any parity group, one region of one storage unit is designated a parity region for the remaining data regions of the group. The designated parity region is rotated among the storage units in successive parity groups. Such a storage scheme is generally referred to as dedicated sparing or, in the case of disk arrays, by the acronym "RAID-5" (the term "RAID" denoting redundant arrays of inexpensive disks).
A dedicated sparing distribution of regions is illustrated in FIG. 1, where a data region "m" in a storage unit "n" is represented by Dm in column n and a similar notation scheme is used for parity regions P and spare regions S. In the FIG. 1 array, having three storage units and a spare storage unit, the parity region P1 for the first parity group is stored in the first region of the third storage unit, the parity region for the second parity group is stored in the second region of the second unit, and the parity region for the third group is stored in the third region of the first unit. The cycle is repeated for additional regions as the parity region precesses among the storage units for successive parity groups. Thus, the parity region for the fourth parity group is stored in the fourth region of the third storage unit, the parity region for the fifth parity group is stored in the fifth region of the second unit, and so forth.
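The rotation of the parity region described for FIG. 1 can be expressed as a simple formula. The sketch below is an assumption-laden illustration: the function name parity_unit is hypothetical, and groups and storage units are numbered from 1 to match the text. The dedicated spare unit is not part of the rotation.

```python
def parity_unit(group, num_units=3):
    """Return the 1-based storage unit holding parity for a 1-based parity group.

    Parity starts on the last unit and moves one unit toward the first
    for each successive group, then the cycle repeats.
    """
    return num_units - (group - 1) % num_units

# Parity placement for the first six parity groups of a three-unit array.
placement = [parity_unit(g) for g in range(1, 7)]
assert placement == [3, 2, 1, 3, 2, 1]
```

Under this formula the fourth parity group places its parity on the third unit and the fifth on the second unit, in agreement with the sequence described above.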
It has been noted that, in a disk array with a designated spare storage unit, the spare unit is not used in normal operation and increased efficiency can be gained by distributing the regions from the spare unit among the storage units in the array. Such a distributed sparing scheme is illustrated in FIG. 2. In the FIG. 2 array, there are four storage units and twelve parity groups, each group having three regions (two data regions and one parity region) together with one spare region. Thus, each storage unit includes six data regions, three parity regions, and three spare regions. It should be apparent that write operations among the data regions and parity regions in a parity group will be equally distributed in terms of read-modify-write processes.
While a more uniform distribution of write workload is achieved with distributed sparing, some unequal distribution of workload still can remain after a failure and before the repair or replacement of a failed storage unit. This is illustrated in FIG. 3, which depicts the array of FIG. 2 after the second storage unit has failed and the rebuild process has been completed using the spare regions and reconstructing missing data by using the XOR operation. In FIG. 3, the data regions that have been rebuilt are designated by the notation Dm.sub.n, where m is the number of the parity group and n is the number of the storage unit from which the data region was derived. A similar notation scheme is used for the rebuilt parity regions.
Storage Unit 2 included six data regions, three parity regions, and three spare regions. Therefore, with respect to the rebuilding process, it should be clear that nine regions must be rebuilt, as the now-missing spare regions S3.sub.2, S7.sub.2, and S11.sub.2 of Storage Unit 2 (FIG. 2) need not be rebuilt. Rebuilding D1 of Storage Unit 2 is achieved by reading D1 from Storage Unit 1 and P1 from Storage Unit 3, XOR'ing those regions together to generate the D1.sub.2 information, and writing the result to Storage Unit 4. Rebuilding P2.sub.2 is achieved by reading D2 from Storage Unit 1 and D2 from Storage Unit 4, XOR'ing those regions together, and writing the result to Storage Unit 3. The remaining regions are rebuilt in a similar fashion. Thus, rebuilding the missing regions requires one access operation to each of the three remaining storage units and the rebuilding workload is evenly distributed among the remaining storage units.
After the rebuilding process is completed, the distribution of regions appears as depicted in FIG. 3. It should be noted that Storage Unit 1 has nine data regions and three parity regions, Storage Unit 3 has six data regions and six parity regions, and Storage Unit 4 has nine data regions and three parity regions. That is, it just so happened that all of the parity regions that were on Storage Unit 2 were moved to Storage Unit 3 during the rebuild process. Storage Unit 3 now has a disproportionate share of the parity regions so that the write workload in connection with the read-modify-write process is not uniformly distributed. It would be advantageous if data regions, parity regions, and spare regions were distributed so that workload was evenly divided among the storage units of an array before, during, and after a failure of one or more units. The problem of uneven workload distribution can become more complicated, and more uneven, if more than one storage unit fails.
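The imbalance after the rebuild can be reproduced with a small simulation. The layout used here is an assumption, not a transcription of FIG. 2: it is one rotating arrangement chosen to agree with the facts stated in the text (four storage units, twelve parity groups, spare regions of Storage Unit 2 in groups 3, 7, and 11, and the group 1 and group 2 placements described above). All function names are hypothetical.

```python
from collections import Counter

UNITS = 4
GROUPS = 12

def layout():
    """Assumed layout: {group: {unit: 'D' | 'P' | 'S'}}, both numbered from 1."""
    table = {}
    for g in range(1, GROUPS + 1):
        p = (2 - (g - 1)) % UNITS + 1   # parity unit cycles 3, 2, 1, 4, 3, ...
        s = p % UNITS + 1               # spare sits on the unit after the parity unit
        row = {u: "D" for u in range(1, UNITS + 1)}
        row[p] = "P"
        row[s] = "S"
        table[g] = row
    return table

def rebuild(table, failed):
    """Move each lost data or parity region into its group's spare region."""
    after = {}
    for g, row in table.items():
        new_row = {u: kind for u, kind in row.items() if u != failed}
        lost = row[failed]
        if lost != "S":  # a lost spare region need not be rebuilt
            spare_unit = next(u for u, kind in new_row.items() if kind == "S")
            new_row[spare_unit] = lost
        after[g] = new_row
    return after

def counts(table):
    """Tally region kinds per storage unit."""
    tally = {}
    for row in table.values():
        for u, kind in row.items():
            tally.setdefault(u, Counter())[kind] += 1
    return tally

before = counts(layout())
after = counts(rebuild(layout(), failed=2))

# Before the failure every unit holds 6 data, 3 parity, and 3 spare regions.
assert all(c == Counter(D=6, P=3, S=3) for c in before.values())
# After the rebuild Storage Unit 3 holds twice as many parity regions as the others.
assert after[1]["P"] == 3 and after[3]["P"] == 6 and after[4]["P"] == 3
```

With this assumed layout, every parity region lost from Storage Unit 2 lands on Storage Unit 3, because the spare region of each affected group happens to sit there. That is the kind of skew in read-modify-write workload that the passage identifies.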
From the discussion above, it should be apparent that there is a need for a redundant array of storage units that provides uniform workload distribution among the storage units before, during, and after one or more storage unit failures so that the rebuild process does not result in uneven distribution of workload in connection with read-modify-write processes. The present invention satisfies this need.