1. Field of the Invention
The present invention relates to data storage devices, and, in particular, to arrays of disks for storing data.
2. Description of the Related Art
This application incorporates by reference in its entirety the following U.S. patent applications: (i) U.S. Provisional Patent Application No. 60/724,573 entitled “Storage Device Management” filed Oct. 7, 2005; (ii) U.S. patent application Ser. No. 11/544,442 entitled “Virtual Profiles for Storage-Device Array Encoding” filed Oct. 6, 2006; (iii) U.S. patent application Ser. No. 11/544,445 entitled “Back-Annotation in Storage-Device Array” filed Oct. 6, 2006; (iv) U.S. patent application Ser. No. 11/544,456 entitled “Ping-Pong State Machine for Storage-Device Array” filed Oct. 6, 2006; and (v) U.S. patent application Ser. No. 11/544,462 entitled “Parity Rotation in Storage-Device Array” filed Oct. 6, 2006.
In general, there are several defined categories of storage schemes that are used in conjunction with a Redundant Array of Independent (or Inexpensive) Disks (RAID). Different hardware and software components supplied by different vendors may support one or more of these schemes, which are identified as RAID “levels” having particular specifications, as follows.
RAID level 0 (or “RAID-0”) specifies a block-interleaved, striped disk array without fault tolerance and requires a minimum of two drives to implement. In a RAID-0 striped disk array, the data is broken down into blocks, and each block is written to a separate disk drive in the array. Input/output (I/O) performance is greatly improved by spreading the I/O load across a plurality of channels and drives. In RAID-0, optimal performance is achieved when data is striped across multiple controllers with only one drive per controller. RAID-0 involves no parity calculation overhead and is not a “true” RAID because it is not fault-tolerant, i.e., there is no redundancy of data. Thus, the failure of only one drive will result in all of the data in an array being lost. FIG. 1 illustrates the sequence of storing blocks in an exemplary RAID-0 striped disk array, wherein block A is written to the first disk, block B is written to the second disk, block C is written to the third disk, block D is written to the first disk, and so forth.
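The placement rule illustrated in FIG. 1 comes down to modular arithmetic. The following is a minimal sketch, not a controller implementation. It assumes one block per stripe unit and disks and rows numbered from 0, and the function name `raid0_location` is hypothetical.

```python
def raid0_location(block_index: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, stripe_row) for a logical block in a RAID-0 array.

    Hypothetical helper: disks and rows are numbered from 0, and block 0 is block A.
    """
    if num_disks < 2:
        raise ValueError("RAID-0 needs at least two drives")
    # Consecutive blocks go to consecutive disks, wrapping back to the first disk.
    return block_index % num_disks, block_index // num_disks


if __name__ == "__main__":
    # Reproduce the FIG. 1 sequence for a three-disk array (disks printed 1-based).
    for i, name in enumerate("ABCDEF"):
        disk, row = raid0_location(i, 3)
        print(f"block {name}: disk {disk + 1}, row {row}")
```

With three disks, block D (index 3) wraps back to the first disk in the second row, which matches the FIG. 1 description.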
RAID-1 specifies a disk array with mirroring (redundancy) of data across different physical hard disks. In a RAID-1 array, each block of data on a disk exists in identical form on another disk in the array. For optimal performance, the controller performs two concurrent separate reads per mirrored disk pair and two duplicate writes per mirrored disk pair. RAID-1 requires a minimum of two drives to implement and makes data recovery following a disk failure relatively easy. FIG. 2 illustrates the sequence of storing blocks in an exemplary RAID-1 mirrored disk array, wherein block A is written to the first disk, a copy A′ of block A is written to the second disk, block B is written to the first disk, a copy B′ of block B is written to the second disk, and so forth.
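Mirroring can be modeled as duplicate writes and read-from-any-copy. The toy class below is only an illustrative sketch. It models each disk as an in-memory dictionary, and the class name `MirroredPair` and its `failed` flags are hypothetical.

```python
class MirroredPair:
    """Toy RAID-1 pair (hypothetical): writes go to both disks, reads use any surviving copy."""

    def __init__(self) -> None:
        self.disks: list[dict[int, bytes]] = [{}, {}]  # block address -> data
        self.failed = [False, False]                   # simulated drive failures

    def write(self, lba: int, data: bytes) -> None:
        # Duplicate write: the block goes to the first disk and its copy to the second.
        for disk, failed in zip(self.disks, self.failed):
            if not failed:
                disk[lba] = data

    def read(self, lba: int) -> bytes:
        # Alternate the preferred disk by address to spread reads across the pair.
        order = (0, 1) if lba % 2 == 0 else (1, 0)
        for i in order:
            if not self.failed[i] and lba in self.disks[i]:
                return self.disks[i][lba]
        raise IOError(f"block {lba} unavailable on both disks")
```

After one disk is marked failed, every block can still be read from its partner, which is why recovery after a single failure is comparatively easy in RAID-1.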
RAID-4 specifies a block-interleaved, dedicated parity-disk array. In RAID-4, each entire block is written onto a data disk, and a non-data disk, called a parity disk, is used to store parity blocks. Each parity block is typically generated by exclusive-OR (XOR) combining the data contained in corresponding same-rank blocks on the data disks. To provide write verification, RAID-4 specifies that a write to the parity disk takes place for each data block stored on a data disk. To provide read verification, a read from the parity disk takes place for each data block that is read from a data disk. RAID-4 requires a minimum of three drives to implement. High efficiency of a RAID-4 array correlates with a low parity-disk/data-disk ratio. RAID-4 exhibits relatively high read-data transaction rates, relatively high aggregate-read-transfer rates, and block-read-transfer rates equal to those of a single disk. Disadvantageously, however, RAID-4 has low write-transaction rates and relatively low write-aggregate-transfer rates. On the other hand, data can be rebuilt in the event of the failure of one of the disks in the disk array. FIG. 3 illustrates the sequence of storing blocks in an exemplary RAID-4 dedicated-parity disk array, wherein block A is written to the first disk, block B is written to the second disk, and then a parity block is generated by XOR-combining blocks A and B. The parity block pAB for blocks A and B is stored on the third disk. Block C is then written to the first disk, block D is written to the second disk, and so forth.
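The parity computation and read verification described for RAID-4 can be sketched as follows. This is a minimal illustration assuming equal-length blocks held in memory. The helper names `xor_blocks`, `raid4_rank`, and `verify_rank` are hypothetical, and the last element of each rank stands for the dedicated parity disk.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity operation)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def raid4_rank(data_blocks: list[bytes]) -> list[bytes]:
    """One rank: the data blocks, followed by their parity block for the parity disk."""
    return list(data_blocks) + [xor_blocks(*data_blocks)]


def verify_rank(rank: list[bytes]) -> bool:
    """Read verification: the stored parity must equal the XOR of the data blocks."""
    return xor_blocks(*rank[:-1]) == rank[-1]
```

Because XOR is its own inverse, XOR-combining pAB with block B yields block A again. This is the property that allows one failed disk to be rebuilt.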
RAID-5 specifies a block-interleaved, distributed-parity disk array. In RAID-5, each entire data block is written on a data disk, and a parity block for the corresponding data blocks in the same rank is generated. The parity blocks are recorded in locations that are distributed across the disks in the array and are later verified on reads of data blocks. RAID-5 requires a minimum of three drives to implement and exhibits a relatively high read-data-transaction rate, a medium write-data-transaction rate, and relatively good aggregate transfer rates; individual block data-transfer rates are about the same as those of a single disk. High efficiency of a RAID-5 array correlates with a low parity-disk/data-disk ratio. In RAID-5, a disk failure has only a moderate impact on throughput, but rebuilding data is more difficult than in, e.g., RAID-1. FIG. 4 illustrates the sequence of storing blocks in an exemplary RAID-5 distributed-parity disk array, wherein block A is written to the first disk, block B is written to the second disk, and then a parity block is generated by XOR-combining blocks A and B. The parity block pAB for blocks A and B is stored on the third disk. Block C is then written to the fourth disk, block D is written to the fifth disk, and then a parity block is generated by XOR-combining blocks C and D. The parity block pCD for blocks C and D is stored on the first disk. Block E is then written to the second disk, block F is written to the third disk, and so forth.
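The write order described for FIG. 4 can be generated mechanically. The sketch below reproduces only that described sequence: five disks filled round-robin, with a parity block emitted after every two data blocks, so that parity lands on a different disk each time. It is not a general RAID-5 layout. Many implementations instead place exactly one parity block in each full-width stripe and rotate its position from stripe to stripe. The function name `fig4_layout` and its parameters are hypothetical.

```python
import string


def fig4_layout(num_units: int, num_disks: int = 5, data_per_parity: int = 2) -> list[tuple[int, str]]:
    """Return (disk, label) pairs in the FIG. 4 write order (disks numbered from 1)."""
    layout = []
    pending = []    # data blocks of the current rank still awaiting their parity block
    next_data = 0   # index of the next data block (0 -> "A"); IndexError past "Z"
    for unit in range(num_units):
        disk = unit % num_disks + 1          # units are written to the disks round-robin
        if len(pending) == data_per_parity:  # rank complete: write its XOR parity block
            layout.append((disk, "p" + "".join(pending)))
            pending = []
        else:
            name = string.ascii_uppercase[next_data]
            next_data += 1
            pending.append(name)
            layout.append((disk, name))
    return layout
```

The first eight units come out as A, B, pAB, C, D on disks 1 to 5, then pCD, E, F on disks 1 to 3, matching the FIG. 4 description.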
It is noted that a RAID array can implement multiple nested RAID levels, thereby conforming to the specifications of two or more RAID levels. For example, as shown in the exemplary RAID-1+0 (or “RAID-10”) array of FIG. 5, blocks written to the disk array are mirrored and then striped. Block A is written to the first disk, a copy A′ of block A is written to the second disk, block B is written to the third disk, a copy B′ of block B is written to the fourth disk, block C is written to the first disk, a copy C′ of block C is written to the second disk, block D is written to the third disk, a copy D′ of block D is written to the fourth disk, and so forth.
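The RAID-1+0 placement of FIG. 5 can be expressed as striping across mirror pairs. The following is a minimal sketch assuming an even number of at least four disks, with adjacent disks (1 and 2, 3 and 4, and so on) forming mirror pairs. The function name `raid10_layout` is hypothetical, and primes are written as ASCII apostrophes.

```python
import string


def raid10_layout(num_blocks: int, num_disks: int = 4) -> dict[int, list[str]]:
    """Return {disk: [labels]} for a RAID-1+0 array: each block is mirrored, pairs are striped."""
    if num_disks < 4 or num_disks % 2:
        raise ValueError("RAID-10 needs an even number of at least four drives")
    pairs = num_disks // 2
    layout: dict[int, list[str]] = {d: [] for d in range(1, num_disks + 1)}
    for i in range(num_blocks):
        name = string.ascii_uppercase[i]
        primary = 2 * (i % pairs) + 1           # stripe consecutive blocks across the pairs
        layout[primary].append(name)            # block on the first disk of the pair
        layout[primary + 1].append(name + "'")  # mirror copy on its partner disk
    return layout
```

For four blocks on four disks, A and C land on the first disk with their copies on the second, and B and D land on the third disk with their copies on the fourth, as in FIG. 5.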
Alternatively, as shown in the exemplary RAID-0+1 array of FIG. 6, blocks written to the disk array are striped and then mirrored. Block A is written to the first disk, block B is written to the second disk, a copy A′ of block A is written to the third disk, a copy B′ of block B is written to the fourth disk, block C is written to the first disk, block D is written to the second disk, a copy C′ of block C is written to the third disk, a copy D′ of block D is written to the fourth disk, and so forth.
Other combinations of RAID-array levels and arrays having different numbers of disk drives per array are possible, and other RAID configurations and levels exist (e.g., RAID-6 and RAID-50), although they are not discussed in detail herein.
As discussed above, RAID levels 1, 4, and 5 support redundancy, i.e., if any one drive fails, the data for the failed drive can be reconstructed from the remaining drives. A RAID array operating with a single drive identified as failed is said to be operating in degraded mode. RAID-1 and RAID-4/RAID-5 provide data redundancy by different methods. RAID-1 provides data redundancy by mirroring, i.e., maintaining multiple complete copies of the data in a volume. Data written to a mirrored volume is reflected in all copies, such that, if a portion of a mirrored volume fails, the system continues to use the other copies of the data. RAID-5, like RAID-4, provides data redundancy by using stored parity information, which is used to reconstruct data after a failure. Because the parity information is calculated by performing a known XOR procedure on the data written to a RAID-5 volume, if a portion of the volume fails, the data that was stored on the failed portion can be recreated from the remaining data and parity information.
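Degraded-mode reconstruction follows directly from the XOR relationship. Since the parity block is the XOR of the data blocks in its rank, the XOR of all surviving blocks in the rank (data and parity) equals the missing block. The following is a minimal sketch assuming one rank is held in memory as a list of equal-length blocks. The helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(rank: list[bytes], failed: int) -> bytes:
    """Rebuild the block stored on disk `failed` from the surviving blocks of one rank.

    `rank` holds every block of the rank, data and parity; the entry at index `failed`
    is treated as unreadable and ignored.
    """
    survivors = [blk for i, blk in enumerate(rank) if i != failed]
    return xor_blocks(*survivors)
```

The same call rebuilds either a lost data block or a lost parity block, because the rank as a whole always XORs to zero.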
Conventional RAID arrays suffer from a number of disadvantages, including the following.
RAID arrays typically use either (i) fixed-hardware implementations that permit a group of drives to appear as one or (ii) software implementations that use the host computer's CPU to perform RAID operations. Disadvantageously, such traditional hardware implementations are inflexible, and such software implementations impose processor and memory overhead on the host computer. Moreover, neither approach permits a single set of physical drives to be used in more than one configuration at a time.
In conventional RAID arrays, during write operations, data is sent to the various physical disks in the array one sector at a time, and this transfer is typically managed by software running on the host computer, which calculates and provides the addresses on the physical disks at which the data will be written. Thus, memory and processor resources of the host computer must be used.
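To illustrate the per-sector address arithmetic that such host software performs, the following sketch maps a logical sector to a physical disk and sector. It assumes simple block-interleaved striping without parity and a fixed number of sectors per block. The function name `logical_to_physical` and its parameters are hypothetical and are not taken from any particular driver.

```python
def logical_to_physical(lba: int, num_disks: int, sectors_per_block: int) -> tuple[int, int]:
    """Map a logical sector to (disk index, physical sector) on a striped array."""
    block, offset = divmod(lba, sectors_per_block)  # which logical block, and where inside it
    disk = block % num_disks                        # block-interleaved placement
    row = block // num_disks                        # stripe row of that block on its disk
    return disk, row * sectors_per_block + offset
```

With three disks and eight sectors per block, logical sector 25 falls in block 3, which wraps to disk 0 in the second row, so it maps to physical sector 9 on disk 0. A host performing this calculation for every sector consumes the processor resources referred to above.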
Moreover, in such arrays, a disk controller communicates directly with physical disks in the array. When writing to the disks, the controller must wait for the physical disk to be ready for the write operation, or software buffering by the host computer must be performed.
Additionally, during read and write operations in a conventional RAID array, one entire stripe is buffered at a time and stored (typically in memory on the host computer) so that parity calculations can be made, thereby requiring substantial processor and memory resources for this cumbersome storage and calculation process.
In conventional RAID arrays, an entire RAID array is unavailable for reading and writing while a volume is being reconstructed, and reconstruction typically involves running software on a host computer while all of the drives of the array are taken offline.
Another limiting aspect of conventional RAID arrays is that a user can create only a single profile specifying parameters for the set of physical disk drives (or other storage devices) in the array. Such arrays store and retrieve data block-by-block, and the block size for an array is typically set in the profile from the outset, before any data is ever written to the drives. This block size cannot be changed once storage to the disks has begun.
Also in the profile, traditional arrays identify disk drives as physical drives in the order in which they are installed in the array's physical drive bays (i.e., slot 0, slot 1, slot 2). The order of drives can be changed only by physically removing, exchanging, or inserting drives within the drive bays. Drives can be added to a RAID array only when they are physically present in the array, and when drives are removed from the array, no configuration information for these drives is stored. Also, drive partitions cannot be adjusted or resized on an ad-hoc basis; as with block size, partitioning can be set only before any data is written to the disks.
The drives in conventional RAID arrays are limited to a single file system, and there is no way for different portions of the same physical disk array to be used concurrently, except as part of one of the RAID-level schemes (e.g., mirroring or striping), as discussed above.
When physical drives of varying sizes are integrated into a traditional RAID array, the excess capacity of the larger drives cannot be used: every drive in the array is limited to the amount of storage available on the smallest drive in the array. For example, in a traditional RAID array containing three 40 GB drives, if a fourth, 120 GB drive is added, only 40 GB of the fourth drive can be used.
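The capacity penalty in this example can be worked out directly. The sketch below counts raw capacity only, before any mirroring or parity overhead, and assumes every member is truncated to the size of the smallest drive. The function name `usable_capacity` is hypothetical.

```python
def usable_capacity(drive_sizes_gb: list[int]) -> tuple[int, int]:
    """Return (usable GB, stranded GB) when every drive is limited to the smallest size."""
    per_drive = min(drive_sizes_gb)         # each member contributes only this much
    used = per_drive * len(drive_sizes_gb)  # raw space the array can address
    return used, sum(drive_sizes_gb) - used
```

For drives of 40, 40, 40, and 120 GB, the array can address 4 × 40 = 160 GB of the 240 GB installed, leaving 80 GB of the 120 GB drive unused.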