A Redundant Array of Inexpensive Disks, referred to as a RAID storage system, is a collection of disk drives which appears as a single large disk drive to a host computer system. Additionally, part of the disk storage capacity is utilized to store redundant information about user data stored on the remainder of the storage capacity. This redundant information allows the disk array to continue to function without the loss of data should an array disk drive member fail, and permits the regeneration of data to a replacement array disk drive member.
Several RAID disk array design alternatives were presented in an article titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by David A. Patterson, Garth Gibson and Randy H. Katz; University of California Report No. UCB/CSD 87/391, December 1987. The article, incorporated herein by reference, discusses disk arrays and the improvements in performance, reliability, power consumption and scalability that disk arrays provide in comparison to single large magnetic disks.
Five disk array arrangements are described in the article. The first level RAID comprises N disks for storing data and N additional "mirror" disks for storing copies of the information written to the data disks. RAID level 1 write functions require that data be written to two disks, the second "mirror" disk receiving redundant information, i.e., the same information provided to the first disk. When data is read, it can be read from either disk.
RAID level 3 systems comprise one or more groups of N+1 disks. Within each group, N disks are used to store data, and the additional disk is utilized to store redundant information, i.e., parity information. During RAID level 3 write functions, each block of data is divided into N portions for storage among the N data disks. The corresponding parity information is calculated as the bit-wise exclusive-OR of the N portions written to the data disks, and is then written to a dedicated parity disk. When data is read, all N data disks must be accessed. The parity disk is used to reconstruct information in the event of a disk failure.
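The RAID level 3 parity scheme described above can be illustrated with a minimal Python sketch. It is not taken from the article or from any particular controller; the function names `xor_blocks`, `split_block` and `reconstruct`, the 12-byte block and the choice of N = 4 are all assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bit-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def split_block(block, n):
    """Divide one block into n equal portions, one per data disk (hypothetical helper)."""
    if len(block) % n != 0:
        raise ValueError("block length must be a multiple of n")
    size = len(block) // n
    return [block[i * size:(i + 1) * size] for i in range(n)]

def reconstruct(surviving_portions, parity):
    """Recover the portion held by a failed data disk from the survivors and the parity."""
    return xor_blocks(surviving_portions + [parity])

# Illustrative values only: one 12-byte block spread over N = 4 data disks.
block = b"ABCDEFGHIJKL"
portions = split_block(block, 4)
parity = xor_blocks(portions)           # contents of the dedicated parity disk

# Simulate the loss of data disk 2 and rebuild its portion.
lost = portions[2]
survivors = portions[:2] + portions[3:]
rebuilt = reconstruct(survivors, parity)
```

Because XOR is its own inverse, combining the parity with the N-1 surviving portions yields the missing portion, which is why a single parity disk suffices to tolerate one disk failure per group.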
A RAID level 2 system is similar to the RAID level 3 systems described above, but includes additional redundant disks for identifying disk drive failures.
RAID level 4 systems also comprise one or more groups of N+1 disks wherein N disks are used to store data, and the additional disk is utilized to store parity information. RAID level 4 systems differ from RAID level 3 systems in that data to be saved is divided into larger portions, consisting of one or many blocks of data, for storage among the disks. Writes still require access to two disks, i.e., one of the N data disks and the parity disk. In a similar fashion, read operations typically need only access a single one of the N data disks, unless the data to be read exceeds the block length stored on each disk. As with RAID level 3 systems, the parity disk is used to reconstruct information in the event of a disk failure.
RAID level 5 is similar to RAID level 4 except that parity information, in addition to the data, is distributed across the N+1 disks in each group. Although each group contains N+1 disks, each disk includes some blocks for storing data and some blocks for storing parity information. Where parity information is stored is controlled by an algorithm implemented by the user. As in RAID level 4 systems, RAID level 5 writes require access to at least two disks; however, no longer does every write to a group require access to the same dedicated parity disk, as in RAID level 4 systems. This feature provides the opportunity to perform concurrent write operations.
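The text leaves the parity placement algorithm open, so the following Python sketch shows just one possible choice: a simple rotation in which the parity block moves one disk to the left on each successive stripe. The functions `parity_disk` and `stripe_layout`, and the labels "P" and "D0", "D1", etc., are hypothetical names introduced here for illustration.

```python
def parity_disk(stripe, disks):
    """Index of the disk holding parity for a given stripe (assumed rotation rule)."""
    return (disks - 1 - stripe) % disks

def stripe_layout(stripe, disks):
    """Label each disk in a stripe as parity ('P') or as the k-th data block ('Dk')."""
    p = parity_disk(stripe, disks)
    layout = []
    data_index = 0
    for d in range(disks):
        if d == p:
            layout.append("P")
        else:
            layout.append(f"D{data_index}")
            data_index += 1
    return layout

# Illustrative group of N+1 = 5 disks: print the layout of the first five stripes.
for s in range(5):
    print(s, stripe_layout(s, 5))
```

With this rule, writes to blocks in different stripes generally update parity on different disks, which is what makes the concurrent write operations mentioned above possible.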
As with RAID level 3, parity data in either a RAID level 4 or 5 system can be calculated by performing a bit-wise exclusive-OR of corresponding portions of the data stored across the N data drives. However, because each parity bit is simply the exclusive-OR product of all the corresponding data bits from the data drives, new parity can be more easily determined from the old data and the old parity as well as the new data in accordance with the following equation:
new parity = (old data XOR new data) XOR old parity.
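The incremental parity update can be checked against a full recomputation with a short Python sketch. The helper names `xor_bytes` and `updated_parity` and the two-byte sample values are assumptions chosen for illustration only.

```python
def xor_bytes(a, b):
    """Bit-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_data, new_data, old_parity):
    """new parity = (old data XOR new data) XOR old parity."""
    return xor_bytes(xor_bytes(old_data, new_data), old_parity)

# Illustrative stripe of three data blocks, two bytes each.
stripe = [b"\x0f\xf0", b"\x33\xcc", b"\x55\xaa"]
old_parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Overwrite the block on data drive 1 and update parity incrementally.
new_data = b"\xff\x00"
new_parity = updated_parity(stripe[1], new_data, old_parity)

# Recompute parity across all data drives for comparison.
full_parity = xor_bytes(xor_bytes(stripe[0], new_data), stripe[2])
```

The two results agree because XOR-ing the old data into the old parity cancels that block's contribution, leaving only the contributions of the untouched drives, to which the new data is then added.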
Although the parity calculation for RAID level 4 or 5 shown in the above equation is much simpler than performing a bit-wise exclusive-OR of corresponding portions of the data stored across all of the data drives, a typical RAID level 4 or 5 write operation will require a minimum of two disk reads and two disk writes: the old data and old parity must be read, and the new data and new parity must be written. More than two disk reads and writes are required for data write operations involving more than one data block. Each individual disk read operation involves a seek and rotation to the appropriate disk track and sector to be read. Because the operation cannot proceed until every disk involved has positioned its head, the effective seek time is the maximum of the seek times of the individual disks. A RAID level 4 or 5 system thus carries a significant write penalty when compared with a single disk storage device or with RAID level 1, 2 or 3 systems.
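The two-read, two-write minimum can be made concrete with a small Python sketch that counts disk accesses during a single-block write. The `CountingDisk` class and `small_write` function are hypothetical, in-memory stand-ins and do not model seek or rotational delay.

```python
class CountingDisk:
    """In-memory stand-in for a disk that counts its read and write accesses."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.reads = 0
        self.writes = 0

    def read(self, index):
        self.reads += 1
        return self.blocks[index]

    def write(self, index, data):
        self.writes += 1
        self.blocks[index] = data

def small_write(data_disk, parity_disk, index, new_data):
    """Single-block RAID level 4 or 5 write using the incremental parity update."""
    old_data = data_disk.read(index)          # disk read 1
    old_parity = parity_disk.read(index)      # disk read 2
    new_parity = bytes(o ^ n ^ p for o, n, p in zip(old_data, new_data, old_parity))
    data_disk.write(index, new_data)          # disk write 1
    parity_disk.write(index, new_parity)      # disk write 2

# Illustrative values; the parity block also covers other data disks not modeled here.
data = CountingDisk([b"\x12\x34"])
parity = CountingDisk([b"\x56\x78"])
small_write(data, parity, 0, b"\xab\xcd")
total_ios = data.reads + data.writes + parity.reads + parity.writes
```

A single disk would service the same request with one write, so the four accesses counted here are the source of the write penalty described above.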
In order to coordinate the operation of the disk drives within an array to perform read and write functions, map received data onto the array disk drive members, generate and check redundant information, and provide data restoration and reconstruction, complex storage management techniques are required. In many early disk array systems, the array management software necessary to perform these complex storage management techniques is executed within the host computer system. The host system thereby functions as the disk array controller and performs the generation and checking of redundant information as well as coordinating the many other storage management operations required of the disk array. Having the host perform these functions is expensive in host processing overhead.
Most disk array systems in use today are self-contained, including a dedicated controller for executing the array management software, thus relieving the host system of these operations. A simple architectural block diagram of a disk array system is shown in FIG. 1. The system includes an intelligent array controller 100 for managing the transfer of data between a host computer system 12 and N disk drive units, five of which, identified as DRIVE A through DRIVE E, are shown in FIG. 1. Central to the array controller is a high speed local bus 102, such as a Peripheral Component Interconnect (PCI) bus. A host SCSI interface 104 and SCSI bus 14 provide connection between the host computer system 12 and PCI bus 102. Similarly, each of disk drives DRIVE A through DRIVE E is connected to PCI bus 102 through a respective SCSI drive interface, identified by reference numerals 112A through 112E, and a corresponding SCSI bus, identified by reference numerals 114A through 114E. Parity functions are performed by a parity logic circuit 108 and local memory 110, each also being connected with PCI bus 102. Communication between, and operation of, controller components are controlled by processor 106, in accordance with instructions residing in processor memory 118. The construction and operation of the array controller shown in FIG. 1, as well as the components included in the controller, should be readily understood by those skilled in the art.
The RAID storage process requires many parity calculations and data movement operations to create the necessary data redundancy, or reconstruct data following a disk failure. In the array controller architecture shown in FIG. 1 and described above, much use of PCI bus 102 is required to transfer new data, old data, reconstructed data, old parity information and new parity information between host computer system 12, array drives DRIVE A through DRIVE E, local memory 110 and parity logic 108 to generate new parity information during an array write operation or reconstruct data following an array failure.