An effective storage subsystem is a critical concern in the computer system industry. In particular, the performance of a storage subsystem during its recovery from a disk failure is crucial to applications that require both high I/O performance and high data reliability. An effective storage subsystem may be required not only to recover from a disk failure without losing data, but also to restore the system rapidly to its fault-free state, with minimal impact on system performance as observed by users.
One of the most favored storage subsystems for achieving fault tolerance and enhancing data availability may be a redundant array of independent disks, also called a redundant array of inexpensive disks (RAID). Such systems are typically server-attached, networked, and equipped with Internet storage applications. Fault tolerance in a storage subsystem is generally achieved either by disk mirroring or by parity encoding, both of which are provided by various levels of RAID.
RAID has been developed to combine multiple inexpensive disk drives into an array of disk drives to obtain performance, capacity, and reliability that may exceed those of a single large disk drive. In RAID, the array of drives appears to the host computer as a single logical hard drive. By utilizing a striping technique, RAID partitions the storage space of each hard disk into units (stripes) of adjustable size, which may be as small as one sector. The stripes of all the disks are interleaved and addressed in order. This allows disk input/output operations to overlap across drives.
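The interleaved addressing described above can be illustrated with a minimal sketch. The function name `locate_block`, its parameters, and the simple round-robin layout are assumptions made for illustration only; an actual RAID controller may use a different mapping.

```python
def locate_block(logical_block: int, num_disks: int, blocks_per_stripe_unit: int = 1):
    """Map a logical block number to (disk index, block offset on that disk).

    Hypothetical round-robin layout: consecutive stripe units are placed on
    consecutive disks, so sequential logical blocks spread across all drives.
    """
    stripe_unit = logical_block // blocks_per_stripe_unit    # stripe unit holding the block
    offset_in_unit = logical_block % blocks_per_stripe_unit  # position inside that unit
    disk = stripe_unit % num_disks                           # units are interleaved across disks
    row = stripe_unit // num_disks                           # stripe (row) number on each disk
    return disk, row * blocks_per_stripe_unit + offset_in_unit


# Ten logical blocks on a four-disk array with two blocks per stripe unit.
layout = [locate_block(b, num_disks=4, blocks_per_stripe_unit=2) for b in range(10)]
for logical, (disk, offset) in enumerate(layout):
    print(f"logical block {logical} -> disk {disk}, offset {offset}")
```

Because neighboring stripe units fall on different disks, a large sequential request touches several drives, and those transfers can proceed at the same time.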
There are several levels of RAID, in addition to a nonredundant array (RAID level 0). In RAID level 1, one or more duplicate copies of each user data unit are stored on separate disks (data mirroring). Other RAID levels (such as RAID levels 3, 4, and 5) store parity information rather than duplicate copies of the data; the parity information can be used to reconstruct lost data. Consequently, a small portion (as much as 25%, but often much less) of the array's physical storage is used to store an error correcting code (parity information) computed over the file system's data.
RAID level 5 breaks the data into blocks and stripes the data across disk drives. RAID level 5 also rotates the placement of the parity blocks among the disks, i.e., every disk has some parity blocks stored on it. Within a stripe, the data blocks and the parity block are each stored on a different disk. Generally, a failure of any one disk drive therefore results in the loss of only one data block or the parity block of a given stripe. The array can then mathematically recreate the lost block from the parity information and the surviving blocks. In RAID level 5, all read and write operations can be overlapped, so it is best suited to multi-user systems in which performance is not critical or which perform few write operations. RAID level 6 takes this one step further and calculates two error correcting codes (parity information) using different mathematical formulas (a dual parity system). This allows the array to tolerate two failed disk drives and still recreate all data.
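The single-parity recovery described above may be sketched as follows. The helper names `xor_blocks` and `parity_disk`, the sample data, and the particular rotation rule are hypothetical and are given for illustration only; the text does not specify a rotation scheme.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe: int, num_disks: int) -> int:
    """Assumed rotation rule: parity moves one disk to the left on each stripe."""
    return (num_disks - 1 - stripe) % num_disks


# One stripe holding three data blocks; the parity block is their XOR.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Simulate the loss of the disk holding data[1]: XOR of the surviving
# data blocks and the parity block yields the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# With four disks, the parity block lands on a different disk for each stripe.
print([parity_disk(s, num_disks=4) for s in range(5)])
```

The same XOR operation serves both for computing parity on a write and for rebuilding a lost block, which is why one parity block per stripe suffices to survive a single disk failure.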
An example of a typical RAID implementation may be a RAID level 5 controller (having firmware implementing RAID level 5) based on a “descriptor” mechanism that allows the RAID controller to specify blocks of buffers in memory that are to be XOR'ed together to produce parity data. When the storage subsystem implements a dual parity system (RAID level 6), the storage subsystem can tolerate two failed disk drives and still recreate all data, and thus offers high fault tolerance. However, each data block within a stripe must participate in two independent error correcting code computations. Thus, each block of source data must be read twice. This is a significant drawback of RAID level 6, which may require approximately twice the memory bandwidth of RAID level 5. Especially in application environments that demand very high bandwidth, the memory throughput of the RAID controller may be a critical factor in the storage subsystem's performance while the storage subsystem recovers from the failure of a disk drive. Consequently, the additional burden associated with reading each block of source data twice from memory may have a substantial detrimental effect on overall system throughput when writing data in a RAID level 6 storage subsystem.
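The doubled memory traffic described above may be illustrated with a minimal sketch in which the two codes are computed in two independent passes. The text does not state which formulas are used; the sketch assumes one common choice, an XOR parity P and a second code Q computed over GF(2^8) with the reduction polynomial 0x11d. The names `CountingMemory`, `gf_mul2`, `compute_p`, and `compute_q` are hypothetical.

```python
class CountingMemory:
    """Source buffers in controller memory; counts how many block reads occur."""

    def __init__(self, blocks):
        self._blocks = blocks
        self.reads = 0

    def read(self, index):
        self.reads += 1
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


def gf_mul2(x: int) -> int:
    """Multiply by 2 in GF(2^8), reducing by the assumed polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def compute_p(mem, block_size):
    """First pass: P is the plain XOR of all data blocks."""
    p = bytearray(block_size)
    for i in range(len(mem)):
        for j, byte in enumerate(mem.read(i)):
            p[j] ^= byte
    return bytes(p)


def compute_q(mem, block_size):
    """Second pass: Q = D0 + 2*D1 + 4*D2 + ... in GF(2^8), by Horner's rule."""
    q = bytearray(block_size)
    for i in reversed(range(len(mem))):
        for j, byte in enumerate(mem.read(i)):
            q[j] = gf_mul2(q[j]) ^ byte
    return bytes(q)


mem = CountingMemory([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
p = compute_p(mem, block_size=2)
q = compute_q(mem, block_size=2)
print(f"blocks: {len(mem)}, block reads: {mem.reads}")
```

With three source blocks, the two passes issue six block reads, i.e., every source block is fetched from memory twice, whereas computing P alone (as in RAID level 5) would require only three.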
Therefore, it would be desirable to provide an effective data storage subsystem that offers high fault tolerance, with optimal memory bandwidth usage and reduced bottlenecks.