The acronym “RAID” is an umbrella term for data-storage schemes that can divide and replicate data among multiple hard-disk drives. When several physical hard-disk drives are set up to use RAID technology, the hard-disk drives are said to be in a RAID group. The RAID group distributes data across several hard-disk drives, but the RAID group is exposed to the operating system as a single logical disk drive or data storage volume.
Although a variety of different RAID system designs exist, all have two key design goals, namely: (1) to increase data reliability and (2) to increase input/output (I/O) performance. RAID has seven basic levels corresponding to different system designs. The seven basic RAID levels, typically referred to as RAID levels 0-6, are as follows.
RAID level 0 uses striping to achieve increased I/O performance. The term “striped” means that logically sequential data, such as a single data file, is fragmented and the fragments are assigned to multiple physical disk drives in a round-robin fashion. Thus, the data is said to be “striped” over multiple physical disk drives when the data is written. Striping improves performance and provides additional storage capacity. The fragments are written to their respective physical disk drives simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off the physical disk drives in parallel, providing improved I/O bandwidth. The larger the number of physical disk drives in the RAID system, the higher the bandwidth of the system, but also the greater the risk of data loss. Parity is not used in RAID level 0 systems, which means that RAID level 0 systems are not fault tolerant. Consequently, when any physical disk drive fails, the entire system fails.
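By way of illustration only, the round-robin assignment of fragments described above may be sketched as follows. The function names `stripe` and `unstripe`, the chunk size, and the in-memory lists standing in for physical disk drives are assumptions made for this sketch and do not correspond to any particular RAID implementation.

```python
from typing import List


def stripe(data: bytes, num_drives: int, chunk_size: int) -> List[List[bytes]]:
    """Split data into fixed-size chunks and deal them to drives round-robin."""
    drives: List[List[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk k goes to drive k mod N (hypothetical in-memory "drive").
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: List[List[bytes]]) -> bytes:
    """Reassemble the original byte sequence by reading the drives round-robin."""
    depth = max(len(drive) for drive in drives)
    chunks = []
    for row in range(depth):
        for drive in drives:
            if row < len(drive):  # the last row may be only partly filled
                chunks.append(drive[row])
    return b"".join(chunks)
```

In this sketch no parity is stored, so losing any one of the lists makes the original byte sequence unrecoverable, which mirrors the lack of fault tolerance noted for RAID level 0.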
In RAID level 1 systems, mirroring without parity is used. Mirroring corresponds to the replication of stored data onto separate physical disk drives in real time to ensure that the data is continuously available. RAID level 1 systems provide fault tolerance from disk errors because all but one of the physical disk drives can fail without causing the system to fail. RAID level 1 systems have increased read performance when used with multi-threaded operating systems, but also have a reduction in write performance.
In RAID level 2 systems, redundancy is used and physical disk drives are synchronized and striped in very small stripes, often in single bytes/words. Redundancy is achieved through the use of Hamming codes, which are calculated across bits on physical disk drives and stored on multiple parity disks. If a physical disk drive fails, the parity bits can be used to reconstruct the data. Therefore, RAID level 2 systems provide fault tolerance. That is, failure of a single physical disk drive does not result in failure of the system.
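As a non-limiting sketch of how a Hamming code can protect bits spread across drives, the following example uses the Hamming(7,4) code, in which four data bits and three parity bits are each imagined to reside on a separate physical disk drive. The choice of Hamming(7,4) and the function names are assumptions made for this sketch; the description above does not specify which Hamming code a given RAID level 2 system uses.

```python
from typing import List


def hamming74_encode(data_bits: List[int]) -> List[int]:
    """Encode 4 data bits into 7 bits, one bit per (hypothetical) drive."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword: List[int]) -> List[int]:
    """Locate and flip at most one wrong bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Because the syndrome identifies the position of a single wrong bit, the bit contributed by one failed drive can be recovered from the bits on the remaining drives.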
RAID level 3 systems use byte-level striping in combination with interleaved parity bits and a dedicated parity disk. RAID level 3 systems require the use of at least three physical disk drives. The use of byte-level striping and redundancy results in improved performance and provides the system with fault tolerance. However, use of the dedicated parity disk creates a bottleneck for writing data because every write requires updating of the parity data. In the event that the parity disk fails, a RAID level 3 data storage system can continue to operate without parity and suffers no performance penalty.
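The reason every write touches the parity disk can be seen in a minimal sketch, assuming that the parity is the byte-wise exclusive OR of the data on the other drives. The helper names `xor_blocks` and `update_parity` are hypothetical and are introduced only for this example.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Fold a change to one data block into the existing parity block.

    XOR-ing out the old data and XOR-ing in the new data gives the same
    result as recomputing the parity from all data blocks.
    """
    return xor_blocks(old_parity, old_data, new_data)
```

In this sketch, changing a single data block requires reading and rewriting the parity block as well, so all writes are serialized on whichever drive holds the parity.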
RAID level 4 is essentially identical to RAID level 3 except that RAID level 4 systems employ block-level striping instead of byte-level or word-level striping. Because each stripe is relatively large, a single file can be stored in a block. Each physical disk drive operates independently and many different I/O requests can be handled in parallel. Error detection is achieved by using block-level parity bit interleaving. The interleaved parity bits are stored in a separate single parity disk.
RAID level 5 uses striping in combination with distributed parity. Because the parity is distributed across the physical disk drives, the system can operate as long as all but one of the physical disk drives are present. Failure of any one of the physical disk drives necessitates replacement of the physical disk drive. However, failure of a single one of the physical disk drives does not cause the system to fail. Upon failure of one of the physical disk drives, any subsequent data read operations can be performed or calculated from the distributed parity such that the physical disk drive failure is masked from the end user. If a second one of the physical disk drives fails, the system will suffer a loss of data. Accordingly, the data storage volume or logical disk drive is vulnerable until the data that was on the failed physical disk drive is reconstructed on a replacement physical disk drive.
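The manner in which a read can be calculated from distributed parity may be sketched as follows. The rotation rule in `parity_drive`, the dictionaries standing in for physical disk drives, and all function names are assumptions made for this sketch; actual RAID level 5 layouts vary between implementations.

```python
from functools import reduce
from typing import Dict, List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive(stripe: int, num_drives: int) -> int:
    """Hypothetical rotation rule: parity moves one drive to the left per stripe."""
    return (num_drives - 1 - stripe) % num_drives


def write_stripe(drives: List[Dict[int, bytes]], stripe: int,
                 data_blocks: List[bytes]) -> None:
    """Place N-1 data blocks and their XOR parity in one stripe across N drives."""
    parity_at = parity_drive(stripe, len(drives))
    remaining = iter(data_blocks)
    for index, drive in enumerate(drives):
        drive[stripe] = xor_blocks(data_blocks) if index == parity_at else next(remaining)


def read_block(drives: list, stripe: int, drive: int,
               failed: Optional[int] = None) -> bytes:
    """Return a block, recomputing it from the survivors if its drive has failed."""
    if drive != failed:
        return drives[drive][stripe]
    survivors = [drives[d][stripe] for d in range(len(drives)) if d != failed]
    return xor_blocks(survivors)
```

With one drive marked as failed, `read_block` still returns the original contents, which is the sense in which the failure is masked from the end user. A second failure would leave too few survivors for the exclusive OR to recover anything.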
RAID level 6 uses striping in combination with dual distributed parity. RAID level 6 systems require the use of at least four physical disk drives, with storage capacity equivalent to two of the physical disk drives being used for storing the distributed parity bits. The system can continue to operate even if two physical disk drives fail. Dual parity becomes increasingly important in systems in which each virtual disk is made up of a large number of physical disk drives. RAID systems that use single parity are vulnerable to data loss until the failed drive is rebuilt. In RAID level 6 systems, the use of dual parity allows a virtual disk having a failed physical disk drive to be rebuilt without risking loss of data in the event that one of the other physical disk drives fails before completion of the rebuild of the first failed physical disk drive.
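One common way of obtaining two independent parities, offered here only as an illustrative assumption, is to store a plain exclusive OR parity (P) together with a second parity (Q) computed with multiplications in the finite field GF(2^8). The description above does not state which dual-parity code is used, and the field polynomial 0x11d, the generator 2, and all function names below are choices made for this sketch.

```python
from typing import List, Optional, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with reducing polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero byte (a^254, since a^255 == 1)."""
    return gf_pow(a, 254)


def pq_parity(blocks: List[bytes]) -> Tuple[bytes, bytes]:
    """P is the XOR of all blocks; Q weights block i by 2^i before XOR-ing."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks: List[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> Tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y (entries may be None) from P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(blocks):
        if i in (x, y):
            continue  # lost blocks are never read
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            pxy[j] ^= byte                 # leaves d_x ^ d_y
            qxy[j] ^= gf_mul(coeff, byte)  # leaves 2^x*d_x ^ 2^y*d_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx = bytes(gf_mul(inv, qxy[j] ^ gf_mul(gy, pxy[j])) for j in range(len(p)))
    dy = bytes(pxy[j] ^ dx[j] for j in range(len(p)))
    return dx, dy
```

Because P and Q give two independent equations per byte, two missing blocks can be solved for at once, which is why a second drive failure during a rebuild need not cause data loss under such a scheme.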
A hot spare disk is a physical disk drive which is flagged for use if another drive in the RAID group fails. RAID 1, RAID 0+1, RAID 3, RAID 5, and RAID 6 all support hot spare disks.
Normally, if a physical disk drive that is a member of a RAID group fails, the RAID group will run in a degraded mode. A RAID group operating in degraded mode is not operating at peak efficiency or performance, since not all physical disk drives are present or functioning.
When a hot spare disk is available, the RAID group can immediately start rebuilding stored data in the RAID group to the hot spare standby disk, without manual intervention. As soon as the rebuild completes, the RAID group operates at full functionality and performance. Thereafter, the failed physical disk drive can be replaced, and the new replacement drive becomes the hot spare disk.
FIGS. 1A-1D illustrate a known arrangement and method for using a hot spare disk to restore a data volume operating in a degraded mode. FIG. 1A shows a data volume 10 including a group of four physical disk drives. A data volume such as the data volume 10 is exposed to users of the data as a single logical drive. Physical disk drives D1 through D3 are designated for storing data. When the data volume 10 is supported by a RAID architecture that uses a dedicated parity disk drive, one of the physical disk drives D1-D3 is designated for storing parity information that is calculated from corresponding portions (e.g., similarly sized blocks) of the data in the other physical disk drives of the RAID group. The parity information is stored at a corresponding location in the dedicated parity disk drive. For example, in one RAID architecture or arrangement, D3 is a dedicated parity disk drive.
In the event of a physical disk drive failure of one of D1 or D2, the parity information stored in D3 can be used to regenerate the corresponding lost portion of either D1 or D2 that was used to generate the parity information. The physical disk drive labeled SPARE is provided as a hot spare or standby disk. When one of the physical disk drives fails (i.e., data can no longer be written to and/or read from the physical disk drive), for example physical disk drive D1 as shown in FIG. 1B, the data volume 10a operates in a degraded mode using the data stored in D2 and the parity information stored in D3.
As further indicated in FIG. 1C, the data volume 10b, which is operating in a degraded mode due to the failure of physical disk drive D1, enters a data reconstruction mode, during which data from physical disk drive D2 and corresponding parity information from D3 are used to calculate the lost data from D1, which is then written to corresponding locations in the hot spare or standby disk. During or after completion of the reconstruction process, the failed physical disk drive is removed. Thereafter, as illustrated in FIG. 1D, a new physical disk drive is inserted where the failed physical disk drive D1 was formerly located in the data volume 10c. The new physical disk drive becomes a hot spare or standby disk for the data volume 10c.
When the data volume 10 is supported by a RAID architecture that uses distributed parity information, each of the physical disk drives D1-D3 stores both data volume information and parity information that is calculated from corresponding portions (e.g., similarly sized blocks) of the data volume information stored in the other physical disk drives of the RAID group. The RAID architecture dictates the location of the data volume information and the corresponding parity information.
As physical disk drive capacities continue to outpace improvements in input/output interface data rates, hot spare rebuild times become increasingly problematic. When a physical disk drive fails in a RAID group with a hot spare disk, the reconstruction algorithm steps through the data stored in the RAID group. For each portion, the algorithm reads the data from the remaining functioning physical disk drives, reads the corresponding information from the parity disk, and performs an exclusive OR logic operation over the data read. The result of the exclusive OR operation is written to the corresponding location in the hot spare disk. Accordingly, all restored data must be written to the hot spare disk drive. Thus, the maximum sustainable write data transfer rate to the media of the hot spare disk drive becomes a critical factor in the time it takes to complete a hot spare rebuild. This is especially true for lower cost physical drives such as serial advanced technology attachment (SATA) drives, which have relatively slow input/output interfaces, larger data storage capacities and higher failure rates.
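The reconstruction sweep described above may be sketched, under simplifying assumptions, as a loop over every stripe of the RAID group. The function name `rebuild_to_spare`, the lists of byte strings standing in for physical disk drives, and the fixed stripe count are hypothetical and serve only to illustrate the exclusive OR step.

```python
from typing import List


def rebuild_to_spare(surviving_drives: List[List[bytes]],
                     num_stripes: int) -> List[bytes]:
    """Regenerate every block of the failed drive onto a fresh hot spare.

    surviving_drives holds the remaining data drives and the parity drive;
    each drive is modeled as a list with one block per stripe.
    """
    spare: List[bytes] = []
    for stripe in range(num_stripes):
        block = bytearray(len(surviving_drives[0][stripe]))
        for drive in surviving_drives:  # data and parity are treated alike
            for i, byte in enumerate(drive[stripe]):
                block[i] ^= byte
        # Every restored block is written to the spare, so the spare's
        # write rate bounds how fast this loop can complete.
        spare.append(bytes(block))
    return spare
```

Note that the loop reads from all surviving drives in parallel in a real system but writes to only one destination, which is why the write rate of the hot spare, rather than the read rate of the survivors, tends to dominate in this model.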
The time it takes to generate and write the data to the hot spare (i.e., the rebuild time) is critical to the overall reliability of the RAID group. While the hot spare is being rebuilt, the RAID group has no redundancy and is vulnerable to a second physical disk drive failure in the data volume 10. For example, it can take many hours, and in some cases days, to rebuild a failed disk drive in a RAID group of SATA disk drives with terabyte data capacities that is still servicing input/output requests from computing devices.
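A rough lower bound on the rebuild time follows from dividing the capacity to be restored by the write rate available to the rebuild. The sketch below assumes the hot spare's write rate is the only bottleneck; the function name `rebuild_hours`, the `rebuild_share` parameter, and the example figures are hypothetical and are not measurements of any particular drive.

```python
def rebuild_hours(capacity_bytes: float, spare_write_rate: float,
                  rebuild_share: float = 1.0) -> float:
    """Estimate the hours needed to fill the hot spare.

    capacity_bytes    -- amount of data to restore onto the spare
    spare_write_rate  -- sustained write rate of the spare, in bytes/second
    rebuild_share     -- fraction of that rate left for the rebuild while the
                         group keeps servicing host I/O (assumed, 0 < share <= 1)
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    seconds = capacity_bytes / (spare_write_rate * rebuild_share)
    return seconds / 3600


# Hypothetical figures: a 2 TB drive written at a sustained 100 MB/s.
idle_estimate = rebuild_hours(2e12, 100e6)        # about 5.6 hours
busy_estimate = rebuild_hours(2e12, 100e6, 0.25)  # about 22 hours
```

Under these assumed figures, the window during which the RAID group has no redundancy grows from several hours to nearly a day once host I/O competes with the rebuild.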