The redundant array of independent disks (RAID) configuration is designed to combine multiple inexpensive disk drives into an array to obtain performance, capacity, and reliability that exceed those of a single large drive. The array of drives can be made to appear to the host computer as a single logical drive.
There are five types of array architectures, i.e., RAID 1 through RAID 5, each providing disk fault tolerance with different compromises in features and performance. In addition to these five redundant array architectures, it has become popular to refer to a non-redundant array of disk drives as a RAID 0 array.
RAIDs 2–5 employ a technique known as striping that writes a block of data across several hard disk drives. This is a method of combining multiple drives into one logical storage unit. Striping partitions the storage space of each drive into stripes, which can be as small as one sector (typically 512 bytes) or as large as several megabytes. These stripes are then interleaved in a rotating sequence, so that the combined space is composed alternately of stripes from each drive. The specific type of operating environment determines whether large or small stripes are used.
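The interleaving described above can be illustrated with a minimal sketch of the address arithmetic for a plain, non-redundant stripe set. The function name `locate_block` and its parameters are hypothetical and chosen only for illustration. The sketch ignores parity entirely and assumes stripes rotate across the drives in simple round-robin order.

```python
def locate_block(logical_block, num_drives, blocks_per_stripe):
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes round-robin interleaving of fixed-size stripes and no parity.
    """
    # Which stripe the block falls in, and where inside that stripe.
    stripe_number, offset_in_stripe = divmod(logical_block, blocks_per_stripe)
    # Stripes are handed out to the drives in a rotating sequence.
    drive = stripe_number % num_drives
    # Number of complete rotations that precede this stripe on the same drive.
    row = stripe_number // num_drives
    return drive, row * blocks_per_stripe + offset_in_stripe


# Four drives, two blocks per stripe: logical block 9 lies in stripe 4,
# which wraps around to drive 0 as that drive's second stripe.
print(locate_block(9, num_drives=4, blocks_per_stripe=2))
```

With small stripes, a single large request is spread over many drives. With large stripes, independent small requests tend to land on different drives. This is why the operating environment drives the choice of stripe size.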
Of the original five RAID types, RAID 5 has become the most popular with networked storage system integrators. It provides an excellent balance between cost and performance while providing redundant data storage. Under RAID 5, parity information is distributed across all the drives. Unlike other striped RAID architectures, RAID 5 has no dedicated parity drive; therefore, all drives contain data, and read operations can be overlapped on every drive in the array. Write operations typically access one data drive and one parity drive. However, because different records store their parity on different drives, write operations can usually be overlapped. The following is a simplified example of how RAID 5 calculates parity and restores data from a failed drive.
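A minimal sketch of the parity calculation and of rebuilding one lost block is given here. The helper `xor_blocks` and the sample byte values are assumptions made for illustration. The sketch models a single slice of three data blocks plus one parity block and does not model the rotation of parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One slice: a data block from each of three drives (illustrative values).
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]

# Parity for the slice is the XOR of all of its data blocks.
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails. XOR-ing the surviving data
# blocks with the parity block yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same operation serves for both calculating parity and recovering data because XOR is its own inverse: applying it over every block except the missing one cancels all of the surviving contributions.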
Data reconstruction is accomplished by a RAID controller in conjunction with array management software, which examines the sum of each bit position across a slice of all the functional drives in the RAID 5 array to assign an even or odd value to the missing data. The missing bit is the exclusive OR (XOR) of the other bits in the slice, including parity. This process is repeated, slice by slice, until the data is rebuilt. If a hard disk drive fails and the host calls for information on that disk, the data is built dynamically from the remaining hard disk drives and placed into memory until a replacement drive is obtained. In this manner, data loss is prevented. Parity is consistent when the parity recorded on the media is the XOR of all the data bits recorded on the media. If the data from one of the members becomes unavailable, that data may be reconstructed if the parity is consistent.
However, if a system fails or if power is lost with multiple writes outstanding to RAID 5 hard disk drives before parity is calculated and recorded, a write hole may occur. A write hole is a state in which parity is no longer consistent and cannot be used to reconstruct the data that was in the process of being stored to disk when the failure occurred. One or several writes may have been completed before the failure; however, unless all writes were completed, the parity is inconsistent. Parity is only valid when all of the data is present for its calculation. The additional loss of a drive upon system restoration compounds the problem further by creating a situation in which the data contained on the failed drive is no longer reconstructable due to inconsistent parity. In this case, both the most recent write data and the data stored on the failed device are lost.
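The write hole can be reproduced in a small model. The sketch below is a simplified illustration, not a description of any particular controller. The helper `xor_blocks` and the one-byte block values are assumptions, and the slice holds three data blocks and one parity block.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent slice: three data blocks and their recorded parity.
data = [b"\x01", b"\x02", b"\x04"]
parity = xor_blocks(data)

# A write updates data block 0, but power is lost before the new parity
# is calculated and recorded. The parity on the media is now stale.
data[0] = b"\x09"
parity_is_consistent = xor_blocks(data) == parity

# On restoration the drive holding data block 1 fails as well. Rebuilding
# it from the stale parity produces a block that was never written there.
original_block_1 = b"\x02"
rebuilt_block_1 = xor_blocks([data[0], data[2], parity])

print(parity_is_consistent)                 # parity no longer matches the data
print(rebuilt_block_1 == original_block_1)  # the rebuilt block is wrong
```

Note that block 1 was not involved in the interrupted write, yet it is the block that is lost. The stale parity silently corrupts whichever member is reconstructed next.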
An example of a RAID 5 write hole protection scheme is identified in U.S. Pat. No. 5,744,643, entitled “Enhanced RAID Write Hole Protection and Recovery”. The '643 patent describes a method and apparatus for reconstructing data in a computer system employing a modified RAID 5 data protection scheme. The computer system includes a write back cache composed of non-volatile memory for storing writes outstanding to a device and its associated data read and for storing metadata information in the non-volatile memory. The metadata includes a first field containing the logical block number or address (LBN or LBA) of the data, a second field containing the device ID, and a third field containing the block status. From the metadata information, it is determined where the data was intended to be written when the crash occurred. An examination is made to determine whether parity is consistent across the slice; if it is not, the data in the non-volatile write back cache is used to reconstruct the write that was being performed when the crash occurred to ensure consistent parity, so that only those blocks affected by the crash have to be reconstructed.
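The general idea of metadata-guided recovery can be sketched as follows. This is a loose illustration under stated assumptions and is not the implementation of the '643 patent. The names `CacheEntry` and `recover`, the status strings, and the in-memory dictionaries standing in for drives are all hypothetical. For simplicity, the model treats the logical block number as the slice index on every drive and keeps parity in a separate mapping rather than rotating it.

```python
from dataclasses import dataclass
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


@dataclass
class CacheEntry:
    lbn: int        # first field: logical block number of the data
    device_id: int  # second field: device the write was intended for
    status: str     # third field: block status (values are hypothetical)
    data: bytes     # the outstanding write data held in non-volatile cache


def recover(data_drives, parity, nv_cache):
    """Replay outstanding writes and repair parity only for affected slices."""
    repaired = []
    for entry in nv_cache:
        if entry.status != "outstanding":
            continue
        # The metadata says where the data was headed; re-apply the write.
        data_drives[entry.device_id][entry.lbn] = entry.data
        # Check parity across the slice and rewrite it only if inconsistent.
        expected = xor_blocks([drive[entry.lbn] for drive in data_drives])
        if parity[entry.lbn] != expected:
            parity[entry.lbn] = expected
            repaired.append(entry.lbn)
        entry.status = "complete"
    return repaired


# Slice 0 was consistent before a write to device 1 was interrupted.
data_drives = [{0: b"\x01"}, {0: b"\x02"}, {0: b"\x04"}]
parity = {0: b"\x07"}
nv_cache = [CacheEntry(lbn=0, device_id=1, status="outstanding", data=b"\x0b")]

print(recover(data_drives, parity, nv_cache))
```

Because the cache records the target of each outstanding write, recovery touches only the slices named in the metadata instead of scanning the whole array.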
Because there are many RAID controllers available on the market, there are correspondingly many RAID 5 write hole protection methods available. A RAID controller (or a storage controller) that includes a transaction processor may be used in conjunction with an alternative method for RAID 5 write hole protection. One transaction processor used in networked storage controllers is described in U.S. patent application Ser. No. 10/429,048, entitled “Scalable Transaction Processing Pipeline”, which is hereby incorporated by reference. The '048 application describes a parallel processing system that employs data structures and specific hardware to process networked storage commands and effectively manage host access to the storage drives.