1. Field of the Invention
The present invention relates generally to data storage devices, and more particularly to a system and method for managing requests directed to a disk array. Still more particularly, the present invention is a system and method for managing read and write requests directed to a RAID type disk array.
2. Description of the Background Art
In a data processing environment, multiple data storage devices can be used for improving data storage reliability. Improved reliability occurs as a result of storing duplicate data or by storing parity information corresponding to each block of data that is written. A well-known organization of multiple data storage devices is that of a Redundant Array of Inexpensive Disks (RAID). In a RAID system, each disk drive in the array is partitioned into a set of data segments, such that each data segment stores a predetermined quantity of data. A data write request corresponding to a block of data having a given size occurs across at least one and possibly several disk drives within the array, depending upon the exact size of the data block and the predetermined data segment size. In a like manner, a read request for a given data block can also occur across more than one disk drive within the array. The partitioning of data across an identical segment on multiple disk drives is known as "striping." Each group of identical segments traversing the array of disk drives is referred to as a "stripe."
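As a rough illustration of how a request maps onto segments and stripes, the following sketch assumes a simple round-robin layout in which consecutive segments are placed on consecutive disk drives. The function name `segments_touched`, its parameters, and the layout itself are illustrative assumptions, not a description of any particular RAID implementation.

```python
def segments_touched(offset, length, segment_size, num_drives):
    """List the (stripe, drive) pairs covered by a request.

    Assumes a hypothetical round-robin layout: segment k lives on
    drive k % num_drives within stripe k // num_drives.
    """
    if length <= 0:
        return []
    first = offset // segment_size                 # first segment touched
    last = (offset + length - 1) // segment_size   # last segment touched
    return [(seg // num_drives, seg % num_drives)
            for seg in range(first, last + 1)]


# A request spanning a full stripe, and one that crosses a stripe boundary.
full_stripe = segments_touched(0, 10, segment_size=4, num_drives=3)
crossing = segments_touched(10, 8, segment_size=4, num_drives=3)
```

Under these assumptions, a request may occupy a single segment on one disk drive or span several disk drives and more than one stripe, depending on its size and starting offset.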
RAID systems exhibit several distinct architectures; the most commonly encountered RAID architectures are known as RAID 1, RAID 3, and RAID 5. The classification of RAID architectures is described in the paper "A Case for Redundant Arrays of Inexpensive Disks" by Patterson, Gibson, and Katz, published in 1987. In a RAID 1 architecture, data duplication is used to provide improved data storage reliability. This is commonly referred to as "mirroring." RAID 3 and RAID 5 architectures provide improved data storage reliability through the use of parity information. When one of the disk drives within the RAID system ceases to function properly, the data stored on the problematic disk drive can be retrieved directly if duplicate data has been stored, or the data can be reconstructed if parity information has been stored. In either case, the duplicate data or the parity information must have been stored on one of the disk drives that is functioning properly.
Typically, data is stored via striping across several disk drives in a RAID 1 architecture. For a set of disk drives used to store data, an additional set of disk drives is used to store a duplicate copy of the data. The number of disk drives in the additional set is such that both disk drive sets have equal data storage capacities. Herein, both disk drive sets are assumed to have an identical number of disk drives for simplicity. Thus, for a given number of disk drives N used in a stripe, N additional disk drives are required to provide data redundancy. If a given disk drive fails, the additional disk drive associated with the failed disk drive can be used for subsequent data accesses. Use of an additional disk drive set, however, significantly increases the hardware cost, operational costs, and space requirements related to storing data. When N additional disk drives are used, these costs and requirements are doubled. Such additional costs may prevent a given RAID 1 implementation from being a cost-effective means for increasing data reliability.
A RAID 3 architecture dedicates one disk drive as a parity drive for storing parity information corresponding to data stored on the other disk drives in the disk array. Accesses to the remaining disk drives in the disk array occur in parallel, thereby maximizing the data transfer rate for large blocks of data. The data transfer rate is further enhanced by requiring that all disk drive spindles be synchronized. If one of the disk drives used for storing data fails, the data corresponding to the failed disk drive is reconstructed by performing an Exclusive-OR (XOR) upon the data stored on the remaining functional disk drives and the parity information stored on the dedicated parity drive. Updated parity information must be computed for each write operation. Since all disk drives are written to in parallel, only one write operation can be issued at any given time. Writing a block of data that occupies only a portion of a stripe is inefficient in RAID 3 systems because some of the disk drives are not used for storing data in the read-modify-write operation. The unused disk drives, however, cannot be used for another write operation at the same time due to the parallel access limitation described above.
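The XOR-based reconstruction described above can be sketched as follows. This is a minimal illustration that treats each disk drive's contribution to a stripe as a short byte string; the helper names `xor_blocks` and `reconstruct` and the sample data are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct(surviving_data, parity):
    """Rebuild the block of a failed data drive from the surviving
    data blocks and the parity block of the same stripe."""
    return xor_blocks(list(surviving_data) + [parity])


# Three data drives plus one dedicated parity drive (sample values).
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# Suppose the second data drive fails: rebuild its block.
recovered = reconstruct([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the parity block with every surviving data block cancels their contributions and leaves the missing block.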
When new or modified data is to be written to a target stripe, the parity information is computed by performing an XOR between the data currently stored at the target stripe and the new or modified data. The computed parity information is then written on the parity drive. The process of computing the parity information therefore requires a read operation to obtain the data currently stored at the target stripe; a modify operation corresponding to the XOR performed; and a write operation, where the new or modified data and the computed parity information are written to the target stripe. While the hardware costs are reduced compared to RAID 1 systems, the read-modify-write process is required each time data is to be written to the RAID 3 disk array. This significantly increases the time required to complete a write operation. The time required to perform a write operation effectively doubles, and therefore decreases the overall RAID 3 system performance. Due to the limitations described above, RAID 3 systems are generally useful only in environments where large data records are maintained.
The RAID 5 architecture stores parity information corresponding to each data stripe, but does not dedicate a single disk drive for storing all parity information. Instead, parity information is stored on one of the disk drives within each stripe according to a predetermined pattern. In other words, one disk drive within each stripe is predefined as the parity disk, and the parity disk differs between successive stripes. The RAID 5 organization therefore allows all disk drives to be used for storing data. In addition, the RAID 5 organization allows multiple write operations to occur simultaneously when each write operation accesses a unique subset of disk drives and therefore a unique disk drive for storing parity information. The RAID 5 architecture therefore eliminates one of the bottlenecks associated with RAID 3 architectures.
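To make the distributed-parity idea concrete, the sketch below assumes one possible predetermined pattern, in which the parity disk moves by one drive per stripe. The rotation formula and the names `parity_drive`, `drives_used`, and `can_overlap` are assumptions chosen for illustration; the text above does not specify a particular pattern.

```python
def parity_drive(stripe, num_drives):
    """Hypothetical rotation: parity moves back one drive per stripe."""
    return (num_drives - 1 - stripe) % num_drives


def drives_used(stripe, data_drives, num_drives):
    """All drives a write touches: its data drives plus the stripe's parity drive."""
    return set(data_drives) | {parity_drive(stripe, num_drives)}


def can_overlap(write_a, write_b, num_drives):
    """Two writes, each given as (stripe, data_drives), may proceed
    at the same time only if they touch disjoint sets of drives."""
    used_a = drives_used(*write_a, num_drives)
    used_b = drives_used(*write_b, num_drives)
    return used_a.isdisjoint(used_b)


pattern = [parity_drive(s, 5) for s in range(5)]
independent = can_overlap((0, [0]), (1, [1]), num_drives=5)
conflicting = can_overlap((0, [0]), (0, [1]), num_drives=5)
```

In this sketch, two writes to different stripes can overlap when their data and parity drives are all distinct, whereas two writes to the same stripe always contend for that stripe's parity drive.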
Two methods exist for the computation of parity via a read-modify-write process in RAID 5 systems. The first method is to compute parity in the same manner as in RAID 3 systems. Namely, first read the data stored on each disk drive within the target stripe; compute the new parity information by performing an XOR between the data read and the new or modified data to be written; and write the new or modified data and the computed parity information to the target stripe. The disk drive that has been defined as the parity drive for the target stripe receives the computed parity information. This first method for parity computation is referred to herein as a long write. Long writes are used when the number of disk drives in the stripe that are to receive the new or modified information is greater than a predetermined constant. Since information from each disk drive in a stripe must be read, operated upon, and written, the long write is an inefficient operation when small amounts of data are to be written.
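One way to read the long write is sketched below: every data segment of the target stripe is read, the segments being replaced are substituted, and parity is recomputed over the whole stripe. The function names, the dictionary of updates, and the sample values are hypothetical, and the sketch models segments as in-memory byte strings instead of performing disk I/O.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def long_write(stripe_data, updates):
    """Return (new data segments, new parity) for the target stripe.

    stripe_data: data segments read from every data drive in the stripe
                 (the parity segment is excluded).
    updates:     mapping of drive index -> new or modified segment.
    """
    new_data = [updates.get(i, seg) for i, seg in enumerate(stripe_data)]
    new_parity = xor_blocks(new_data)  # parity over the full updated stripe
    return new_data, new_parity


old_stripe = [b"\x01", b"\x02", b"\x04"]
new_stripe, new_parity = long_write(old_stripe, {0: b"\x08", 1: b"\x10"})
```

The cost of this approach is independent of how many segments actually change, which is why it is a poor fit for small updates.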
The second read-modify-write method for computing parity information in a RAID 5 system is referred to herein as a short write. In a short write, only the data that will be overwritten or changed as a result of writing the new or updated data is read. Thus, if updated data is to be written to one segment within the target stripe, only the data stored within this segment and its corresponding parity information are read. Next, a first XOR operation is performed between the data read and the new or modified data. A second XOR operation is then performed between the target stripe's stored parity information and the result of the first XOR operation, the result of which is the computed parity information. Finally, the computed parity information is written in conjunction with the new or updated data to the target stripe, where the parity drive defined for the target stripe receives the computed parity information. Preferably, the computed parity information and the new or updated data are written simultaneously. Short writes are used when the number of disk drives within the target stripe that are to receive the new or modified information is less than the predetermined constant.
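The two XOR steps of the short write can be sketched as follows for a single updated segment. The names `xor_bytes` and `short_write_parity` and the sample values are hypothetical; the sketch only demonstrates that the incremental result matches a full recomputation of parity.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def short_write_parity(old_data, new_data, old_parity):
    """Compute updated parity from only the segment being overwritten
    and the stripe's stored parity."""
    delta = xor_bytes(old_data, new_data)   # first XOR: old data vs. new data
    return xor_bytes(old_parity, delta)     # second XOR: stored parity vs. result


# Stripe with three data segments; only the middle segment changes.
stripe = [b"\x01", b"\x02", b"\x04"]
old_parity = b"\x07"                        # 0x01 ^ 0x02 ^ 0x04
updated = b"\x0a"

new_parity = short_write_parity(stripe[1], updated, old_parity)

# Cross-check against recomputing parity over the whole updated stripe.
full_recompute = xor_bytes(xor_bytes(stripe[0], updated), stripe[2])
```

Only the target segment and the parity segment are read, so the unchanged segments of the stripe never need to be accessed.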
As in the case of RAID 3 systems, the read-modify-write process, even for short writes, results in a significant increase in the time required to perform a write operation, since each write operation requires a read operation and a modify operation in addition to the actual writing of the new or updated data. In situations involving long write operations or multiple short write operations, the overall operation of a RAID 5 system is dramatically slower than that of a system using a single disk drive. Therefore, there is a need for a system and method for minimizing the amount of time required for performing read-modify-write operations in RAID systems, particularly when long write operations or multiple short write operations are required.