Disk arrays comprising a multiplicity of small inexpensive disk drives, such as the 5 1/4 or 3 1/2 inch disk drives currently used in personal computers and workstations, connected in parallel have emerged as a low cost alternative to the use of single large disks for non-volatile storage of information within a computer system. The disk array appears as a single large fast disk to the host system but offers improvements in performance, reliability, power consumption and scalability over a single large magnetic disk. In addition to data, redundancy information is stored within the array so that if any single disk within the array should fail, the disk array continues to function without the loss of data.
Several disk array alternatives are discussed in an article titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by David A. Patterson, Garth Gibson and Randy H. Katz; University of California Report No. UCB/CSD 87/391, December 1987. The article, incorporated herein by reference, discusses disk arrays and the improvements in performance, reliability, power consumption and scalability that disk arrays provide in comparison to single large magnetic disks. Five disk array arrangements, referred to as RAID levels, are described. The simplest array, a RAID level 1 system, comprises one or more disks for storing data and an equal number of additional "mirror" disks for storing copies of the information written to the data disks. The remaining RAID levels, identified as RAID level 2, 3, 4 and 5 systems, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information.
RAID level 2 and 3 disk arrays are known as parallel access arrays. Parallel access arrays require that all member disks (data and parity disks) be accessed, and in particular, written, concurrently to execute an I/O request. RAID level 4 and 5 disk arrays are known as independent access arrays. Independent access arrays do not require that all member disks be accessed concurrently in the execution of a single I/O request. Operations on member disks are carefully ordered and placed into queues for the member drives. The present invention is directed to improvements in the operation of RAID level 4 and 5 systems.
A RAID level 4 disk array is comprised of N+1 disks wherein N disks are used to store data, and the additional disk is utilized to store parity information. Data to be saved is divided into portions consisting of one or many blocks of data for storage among the disks. The corresponding parity information, which can be calculated by performing a bit-wise exclusive-OR of corresponding portions of the data stored across the N data drives, is written to the dedicated parity disk. These corresponding portions of data together with the parity associated therewith are referred to as redundancy groups. The parity disk is used to reconstruct information in the event of a disk failure. Writes typically require access to two disks, i.e., one of the N data disks and the parity disk, as will be discussed in greater detail below. Read operations typically need only access a single one of the N data disks, unless the data to be read exceeds the block length stored on each disk.
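By way of illustration only, the use of the parity disk to reconstruct information after a disk failure may be sketched as follows. The helper name xor_blocks, the two-byte block contents and the choice of failed disk are assumptions made for the example and do not appear in the description above.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bit-wise exclusive-OR of equal-length byte strings."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of corresponding bytes per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# N = 4 data disks plus one dedicated parity disk (illustrative contents).
data_disks = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
parity_disk = xor_blocks(*data_disks)

# Assume data disk 2 fails: XOR the parity with the surviving data disks.
failed = 2
survivors = [d for i, d in enumerate(data_disks) if i != failed]
rebuilt = xor_blocks(parity_disk, *survivors)
```

Because each parity bit is the exclusive-OR of the corresponding data bits, combining the parity with the N-1 surviving portions cancels their contribution and leaves the missing portion.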
RAID level 5 disk arrays are similar to RAID level 4 systems except that parity information, in addition to the data, is distributed across the N+1 disks in each group. Each one of the N+1 disks within the array includes some blocks for storing data and some blocks for storing parity information. The location of the parity information is controlled by an algorithm implemented by the user. As in RAID level 4 systems, RAID level 5 writes typically require access to two disks; however, a write no longer requires access to the same dedicated parity disk, as it does in RAID level 4 systems. This feature provides the opportunity to perform concurrent write operations.
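One possible parity placement algorithm is sketched below to show how distributing parity permits concurrent writes. The rotation rule, the function names and the zero-based disk numbering are hypothetical; the description does not prescribe a particular algorithm, and the sketch is not asserted to match any layout shown in the drawings.

```python
NUM_DISKS = 5  # N + 1 disks; N = 4 data blocks per stripe


def parity_disk(stripe: int) -> int:
    """Hypothetical rotation: parity moves one disk to the left per stripe."""
    return (NUM_DISKS - 1 - stripe) % NUM_DISKS


def data_disk(stripe: int, position: int) -> int:
    """Disk holding the position-th data block (0..N-1) of a stripe."""
    p = parity_disk(stripe)
    # Data blocks fill the disks in order, skipping the parity disk.
    return position if position < p else position + 1


def disks_for_write(block: int) -> set:
    """The two disks (data and parity) touched by a write to one block."""
    stripe, position = divmod(block, NUM_DISKS - 1)
    return {data_disk(stripe, position), parity_disk(stripe)}


first = disks_for_write(0)   # stripe 0
second = disks_for_write(5)  # stripe 1
can_run_concurrently = first.isdisjoint(second)
```

Under this assumed rotation, a write to block 0 touches disks 0 and 4 while a write to block 5 touches disks 1 and 3, so the two writes share no disk and may proceed at the same time. With a dedicated parity disk, as in RAID level 4, both writes would contend for the same parity disk.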
A RAID level 5 system including five data and parity disk drives, identified as DRIVE A through DRIVE E, is illustrated in FIG. 1. An array controller 100 coordinates the transfer of data between the host system 200 and the array disk drives. The controller also calculates and checks parity information. Blocks 101 through 105 illustrate the manner in which data and parity are stored on the five array drives. Data blocks are identified as BLOCK 0 through BLOCK 15. Parity blocks are identified as PARITY 0 through PARITY 3. The relationship between the parity and data blocks is as follows:
PARITY 0 = (BLOCK 0) XOR (BLOCK 1) XOR (BLOCK 2) XOR (BLOCK 3)
PARITY 1 = (BLOCK 4) XOR (BLOCK 5) XOR (BLOCK 6) XOR (BLOCK 7)
PARITY 2 = (BLOCK 8) XOR (BLOCK 9) XOR (BLOCK 10) XOR (BLOCK 11)
PARITY 3 = (BLOCK 12) XOR (BLOCK 13) XOR (BLOCK 14) XOR (BLOCK 15)
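The foregoing relationships may be illustrated by the following minimal sketch, which computes PARITY 0 through PARITY 3 from BLOCK 0 through BLOCK 15. The four-byte block size, the block contents and the function names are assumptions chosen only for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bit-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(stripe_blocks) -> bytes:
    """Parity of one redundancy group: XOR of all of its data blocks."""
    parity = bytes(len(stripe_blocks[0]))  # all-zero starting value
    for block in stripe_blocks:
        parity = xor_bytes(parity, block)
    return parity


BLOCK_SIZE = 4  # illustrative; real blocks are far larger
# BLOCK 0 .. BLOCK 15, each filled with an arbitrary distinct byte value.
blocks = [bytes([i * i + 1] * BLOCK_SIZE) for i in range(16)]

# PARITY k covers BLOCK 4k through BLOCK 4k+3.
parity = [stripe_parity(blocks[4 * k:4 * k + 4]) for k in range(4)]
```

A useful check on any redundancy group is that the exclusive-OR of its data blocks together with its parity block is all zeros.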
As stated above, parity data can be calculated by performing a bitwise exclusive-OR of corresponding portions of the data stored across the N data drives. However, because each parity bit is simply the exclusive-OR product of all the corresponding data bits from the data drives, new parity can be more easily determined from the old data and the old parity as well as the new data in accordance with the following equation:
new parity = (old data XOR old parity) XOR new data.
The read-modify-write method is advantageous in that only the data and parity drives which will be updated need to be accessed during the write operation; whereas all the drives in the array will have to be read or accessed to perform a bit-wise exclusive-OR of corresponding portions of the data stored across the data drives in order to update parity information. A disadvantage of the read-modify-write operation is that a typical RAID level 4 or 5 write operation will require a minimum of two disk reads followed by two disk writes.
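The read-modify-write update may be sketched as follows, with counters standing in for disk accesses. The Stripe class, its in-memory representation of the redundancy group and its counter fields are hypothetical constructs used solely for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bit-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """One redundancy group held in memory; counts simulated disk accesses."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = self.full_parity()
        self.reads = 0
        self.writes = 0

    def full_parity(self) -> bytes:
        """Parity recomputed from every data block (touches all data drives)."""
        parity = bytes(len(self.data[0]))
        for block in self.data:
            parity = xor_bytes(parity, block)
        return parity

    def read_modify_write(self, index: int, new_data: bytes) -> None:
        old_data = self.data[index]      # disk read 1: old data
        self.reads += 1
        old_parity = self.parity         # disk read 2: old parity
        self.reads += 1
        # new parity = (old data XOR old parity) XOR new data
        new_parity = xor_bytes(xor_bytes(old_data, old_parity), new_data)
        self.data[index] = new_data      # disk write 1: new data
        self.writes += 1
        self.parity = new_parity         # disk write 2: new parity
        self.writes += 1


stripe = Stripe([b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"])
stripe.read_modify_write(1, b"\xaa\xbb")
```

After the update, the parity obtained from the equation equals the parity recomputed from all data blocks, yet only the data drive and the parity drive were accessed, at a cost of two reads and two writes.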
Drive utilization efficiency may be improved by modifying the read-modify-write process so that the execution of data read and write operations is separated from the execution of parity read, generation and write operations. This modified read-modify-write operation identifies the disk drives containing the data and parity to be updated and places the proper read and write requests into the I/O queues for the identified data and parity drives. Some or all of the parity operations (i.e., reading old parity information from the parity drive, generating new parity information and writing the new parity information to the parity drive) are scheduled for execution when best accommodated in the I/O queue for the parity drive, following the read of old data from the data drive.
In both the read-modify-write procedure and the modified read-modify-write procedure discussed above, actual write transfers of new data and parity need not occur at the same time. If either the new data or the new parity is written prior to a system failure, but the other is not, the contents of the redundancy group will be inconsistent after the system restarts, i.e., the parity information will not be in agreement with the data stored within the redundancy group. A retry of the write operation interrupted during the system failure will not correct the inconsistencies in the redundancy group.
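The inconsistency described above may be illustrated by the following sketch, in which a failure is assumed to occur after the new data is written but before the new parity is written. The one-byte block contents and the variable names are assumptions made for the example; the case in which parity is written first is not shown.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bit-wise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks) -> bytes:
    """Parity recomputed from all data blocks of a redundancy group."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


data = [b"\x01", b"\x02", b"\x04", b"\x08"]
parity = parity_of(data)  # consistent starting state
new_block = b"\xf0"

# Simulated failure: the new data reaches the disk, the new parity does not.
data[1] = new_block
consistent_after_crash = parity_of(data) == parity

# Retry of the same write using the read-modify-write equation.
old_data = data[1]   # the "old" data read back is already the new data
old_parity = parity  # stale parity left over from before the failure
retry_parity = xor_bytes(xor_bytes(old_data, old_parity), new_block)
data[1] = new_block
parity = retry_parity
consistent_after_retry = parity_of(data) == parity
```

In this scenario the retry reads back the already-written new data as the old data, so the old and new data cancel in the equation and the stale parity is simply written back unchanged. The redundancy group therefore remains inconsistent after the retry.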
A method and structure for safeguarding disk array write operations is required to prevent the data loss resulting from the occurrence of a power failure or array failure prior to completion of all write procedures.