A Redundant Array of Independent Disks (RAID) is a set of disk drives that can regenerate user data when a drive fails by using redundant data stored on the drives. Five levels of RAID are commonly recognized, as described by Patterson, D., Gibson, G. and Katz, R. H., A Case for Redundant Arrays of Inexpensive Disks (RAID), ACM SIGMOD Conference, June 1988, pp. 109-116. The RAID Level 5 disk array uses a parity technique to achieve high reliability and availability. A parity block protects the data blocks within its parity group. It is the exclusive OR (XOR) of the data blocks in that group. Each block in a parity group is stored on a different disk drive of the array. In RAID 5, the parity blocks are distributed across all of the disks, so each disk holds parity blocks for some parity groups alongside data blocks from other parity groups.
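The parity rule above can be sketched in a few lines. This is a minimal illustration, assuming blocks are equal-length `bytes` objects. The helper names (`xor_blocks`, `compute_parity`, `parity_disk`) are hypothetical. The rotating placement in `parity_disk` is one common RAID 5 layout, chosen here only as an assumption to show how parity can be spread over all disks.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block of a parity group: XOR of all of its data blocks."""
    return reduce(xor_blocks, data_blocks)

def parity_disk(stripe: int, num_disks: int) -> int:
    """Assumed rotating layout: parity for stripe s sits on disk
    num_disks - 1 - (s mod num_disks), so no single disk holds all parity."""
    return num_disks - 1 - (stripe % num_disks)

if __name__ == "__main__":
    group = [b"\x0f\x00", b"\xf0\x01", b"\x33\x02"]
    print(compute_parity(group).hex())             # prints cc03
    print([parity_disk(s, 4) for s in range(5)])   # prints [3, 2, 1, 0, 3]
```

Other placements exist. This one only shows that parity rotates across the disks, which avoids the single parity-disk bottleneck.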
A RAID 5 disk array is robust against single disk crashes. If a disk fails, each block on that disk can be recreated by reading the other blocks of the same parity group from the remaining disks in the array and XORing them together.
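Reconstruction works because XOR is its own inverse. The XOR of every surviving block in a consistent parity group, data and parity together, equals the missing block. The sketch below assumes equal-length `bytes` blocks and uses hypothetical helper names.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct(surviving: list[bytes]) -> bytes:
    """Rebuild the block of a failed disk from the surviving blocks (data and
    parity) of the same parity group. Correct only if the group is consistent."""
    return reduce(xor_blocks, surviving)

if __name__ == "__main__":
    d0, d1, d2 = b"\x10\x20", b"\x0a\x0b", b"\xff\x01"
    p = reduce(xor_blocks, [d0, d1, d2])
    # The disk holding d1 fails; rebuild it from d0, d2 and the parity block.
    print(reconstruct([d0, d2, p]) == d1)   # prints True
```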
Whenever a request is made to update a data block, the corresponding parity block must also be updated to maintain consistency. Because the parity must change each time the data is modified, a RAID 5 array requires four disk accesses to update a single data block: (1) Read the old data; (2) Read the old parity; (3) Write the new data; and (4) Write the new parity. Both old values are needed because the new parity is the XOR of the old parity, the old data, and the new data. The need for four disk accesses per update is often referred to as the RAID-5 update penalty. Only after all four accesses complete is the completion of the update reported to the host system.
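The four-access small write can be sketched against a toy in-memory disk that counts accesses. The `Disk` class and `small_write` function are hypothetical. The sketch also assumes, as a simplification, that the parity for data block `n` lives at block `n` of the parity disk.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

class Disk:
    """Hypothetical in-memory disk that counts every block access."""
    def __init__(self, num_blocks: int, block_size: int):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.accesses = 0

    def read(self, n: int) -> bytes:
        self.accesses += 1
        return self.blocks[n]

    def write(self, n: int, data: bytes) -> None:
        self.accesses += 1
        self.blocks[n] = data

def small_write(data_disk: Disk, parity_disk: Disk, n: int, new_data: bytes) -> None:
    """Update data block n with the four-access read-modify-write."""
    old_data = data_disk.read(n)                  # (1) read old data
    old_parity = parity_disk.read(n)              # (2) read old parity
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    data_disk.write(n, new_data)                  # (3) write new data
    parity_disk.write(n, new_parity)              # (4) write new parity
    # Only at this point would completion be reported to the host.

if __name__ == "__main__":
    data_disks = [Disk(4, 2) for _ in range(3)]
    parity = Disk(4, 2)        # all-zero array: parity of zero blocks is zero
    small_write(data_disks[1], parity, 0, b"\xab\xcd")
    print(sum(d.accesses for d in data_disks) + parity.accesses)   # prints 4
```

The other data disks in the group are never touched. That is why the small write costs four accesses regardless of array width.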
RAID is typically implemented in disk controllers having specialized hardware. XOR hardware performs the XOR operations that compute parity, and Non-Volatile RAM (NVRAM), also referred to as a cache, improves RAID performance and reliability. These RAID implementations are referred to as hardware RAIDs. Some low-cost hardware RAIDs have no NVRAM or only a small one. A software RAID is implemented purely in software running on a host computer. Because software RAIDs do not have access to special hardware, they often need to use specialized algorithms. In particular, software RAIDs do not have access to the NVRAM that hardware RAIDs often use to mark inconsistent parity groups and to recover from power failures.
More sophisticated hardware RAIDs use NVRAM to improve write performance by implementing write caching (holding written data in the cache so that later accesses by the system can be served from it) and fast write (treating a write operation as complete once it has been written to the NVRAM). Other hardware RAIDs use NVRAM solely to mark inconsistent parity groups (parity groups where the new data has been written but the new parity has not yet been written) and to recover from power failures that occur in the middle of update requests.
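Marking inconsistent parity groups in NVRAM can be sketched as a dirty set that survives a power failure. The sketch makes several assumptions. `NvramArray` and its `nvram_dirty` set are hypothetical stand-ins for the controller and its NVRAM. The `crash_before_parity` flag is a hypothetical way to simulate a power failure between the data write and the parity write. The group is marked before any disk is written and unmarked only after both writes finish, so recovery needs to repair only the marked groups.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

class NvramArray:
    """Toy array: data[g] holds the data blocks of parity group g and
    parity[g] its parity block. nvram_dirty is a hypothetical stand-in for
    an NVRAM region that survives power failures."""

    def __init__(self, groups: int, width: int, block_size: int):
        self.data = [[bytes(block_size) for _ in range(width)] for _ in range(groups)]
        self.parity = [bytes(block_size) for _ in range(groups)]
        self.nvram_dirty: set[int] = set()

    def update(self, g: int, i: int, new_data: bytes, crash_before_parity: bool = False) -> None:
        self.nvram_dirty.add(g)            # mark group g before touching the disks
        old_data = self.data[g][i]
        new_parity = xor_blocks(xor_blocks(self.parity[g], old_data), new_data)
        self.data[g][i] = new_data
        if crash_before_parity:
            return                         # simulated power failure: parity not written
        self.parity[g] = new_parity
        self.nvram_dirty.discard(g)        # both writes done: group consistent again

    def recover(self) -> list[int]:
        """After a power failure, recompute parity only for marked groups."""
        repaired = sorted(self.nvram_dirty)
        for g in repaired:
            self.parity[g] = reduce(xor_blocks, self.data[g])
        self.nvram_dirty.clear()
        return repaired

if __name__ == "__main__":
    arr = NvramArray(groups=4, width=3, block_size=2)
    arr.update(0, 1, b"\x01\x02")
    arr.update(2, 0, b"\xff\x00", crash_before_parity=True)
    print(arr.recover())   # prints [2]
```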
Examples of software RAIDs are the Paragon system from Chantal/BusLogic Corporation and the Corel RAID system from Corel Corporation. Both of these systems are for Novell NetWare servers.
Current software implementations of RAID 5 require a complete scan of all disk blocks following a power failure or a system crash to find and fix inconsistent parity groups. Because the scan must read every parity group in the array, recovery time grows with the size of the array, and such long recovery times are unacceptable for most practical implementations.
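The full-scan recovery described above can be sketched as follows. The in-memory representation (a list of data blocks per parity group plus a list of parity blocks) and the function name are hypothetical. Every group is read and its parity recomputed, so the work grows with the number of groups in the array, not with the number of updates that were actually in flight at the crash.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_scan_recover(data: list[list[bytes]], parity: list[bytes]) -> tuple[int, int]:
    """Recompute the parity of every parity group and rewrite any that do
    not match. Returns (groups_scanned, groups_repaired)."""
    scanned = repaired = 0
    for g, blocks in enumerate(data):
        scanned += 1                       # every group is read, dirty or not
        expected = reduce(xor_blocks, blocks)
        if parity[g] != expected:
            parity[g] = expected
            repaired += 1
    return scanned, repaired

if __name__ == "__main__":
    groups = 10_000
    data = [[b"\x00", b"\x00", b"\x00"] for _ in range(groups)]
    parity = [b"\x00"] * groups
    data[1234][0] = b"\x07"                # one update interrupted by a crash
    print(full_scan_recover(data, parity)) # prints (10000, 1)
```

All 10,000 groups are scanned to repair a single one.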
A disk failure during recovery can cause data loss. The data on the failed disk would normally be reconstructed from the data on the other disks. However, if the parity group is inconsistent, the data cannot be reconstructed accurately. A related problem with having to scan all parity groups during recovery is that if one of the data blocks in a parity group cannot be read (because of an uncorrectable ECC error on the disk block, for example), data is lost, since the parity group may be inconsistent and therefore cannot be used to regenerate the unreadable block. The more parity groups that have to be scanned, the more likely such a data loss becomes. A secondary problem is that parity groups are locked for too long, since the data and parity are written sequentially and the lock is held until both have been written to disk.
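The following worked example illustrates why an inconsistent parity group defeats reconstruction. It is a minimal sketch with one-byte blocks and hypothetical variable names. New data reaches disk, the system crashes before the new parity is written, and then the disk holding the other data block fails.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"\x11", b"\x22"
parity = xor_blocks(d0, d1)     # consistent parity: 0x33
d0 = b"\x55"                    # new data for d0 is written...
# ...but the system crashes before the new parity is written.
# The disk holding d1 then fails; rebuild d1 from d0 and the stale parity.
rebuilt_d1 = xor_blocks(d0, parity)
print(rebuilt_d1.hex(), "expected", b"\x22".hex())   # prints 66 expected 22
```

The rebuilt block is silently wrong (0x66 instead of 0x22), because the parity no longer matches the data it is meant to protect.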
In Chen, P. M., et al., RAID: High-Performance, Reliable Secondary Storage, ACM Computing Surveys, vol. 26, no. 2, June 1994, pp. 145-186, a system is proposed in which, every time a parity group is written, an indicator that the parity group has been modified is first written to disk. Each update then requires six disk accesses: (1) Write indicator that the parity group is modified; (2) Read the old data; (3) Read the old parity; (4) Write the new data; (5) Write the new parity; and (6) Write indicator that the parity group is not modified. Chen proposes keeping a fixed-size list of parity sectors that might be inconsistent. This list is maintained both on disk and in memory. Chen reduces the number of disk I/Os needed to maintain this list by using a group commit mechanism, which improves throughput at the expense of increased response time.
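The basic six-access protocol can be sketched as below, assuming a separate hypothetical `meta` disk that stores one modified/not-modified indicator per parity group. The `Disk` class, the indicator values, and `update_with_indicator` are illustrative names, not Chen's. The sketch models only the six steps listed above. The fixed-size list and the group commit optimization are not modeled.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

class Disk:
    """Hypothetical in-memory disk, indexed by parity-group number,
    that counts every block access."""
    def __init__(self, block_size: int = 1):
        self.block_size = block_size
        self.blocks: dict[int, bytes] = {}
        self.accesses = 0

    def read(self, n: int) -> bytes:
        self.accesses += 1
        return self.blocks.get(n, bytes(self.block_size))

    def write(self, n: int, data: bytes) -> None:
        self.accesses += 1
        self.blocks[n] = data

MODIFIED, CLEAN = b"\x01", b"\x00"   # assumed on-disk indicator values

def update_with_indicator(meta: Disk, data: Disk, parity: Disk, g: int, new_data: bytes) -> None:
    meta.write(g, MODIFIED)                   # (1) mark parity group g modified
    old_data = data.read(g)                   # (2) read old data
    old_parity = parity.read(g)               # (3) read old parity
    data.write(g, new_data)                   # (4) write new data
    parity.write(g, xor_blocks(xor_blocks(old_parity, old_data), new_data))  # (5)
    meta.write(g, CLEAN)                      # (6) mark parity group not modified

if __name__ == "__main__":
    meta, data, parity = Disk(), Disk(2), Disk(2)
    update_with_indicator(meta, data, parity, 5, b"\x12\x34")
    print(meta.accesses + data.accesses + parity.accesses)   # prints 6
```

After a crash, only groups whose indicator still reads `MODIFIED` need their parity recomputed. The cost is two extra synchronous writes per update, which is the overhead Chen's list and group commit aim to reduce.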
In general, previous software RAID proposals have not discussed the concurrency and locking issues that arise in RAIDs. To the extent such issues have been discussed, the assumption has been that locking is used to prevent more than one update from executing concurrently against a parity group. It is desirable, however, to optimize the concurrent processing of multiple updates against the same parity group.
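The conventional locking assumption can be sketched with one lock per parity group. This is a minimal sketch using Python threads. `LockedArray` and `worker` are hypothetical names, and in-memory lists stand in for the disks. The lock is held across the entire read-modify-write, so updates to different data blocks of the same group are fully serialized.

```python
import threading
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

class LockedArray:
    """Toy array of parity groups with one lock per group, so at most one
    update executes against a given parity group at a time."""

    def __init__(self, groups: int, width: int, block_size: int):
        self.data = [[bytes(block_size) for _ in range(width)] for _ in range(groups)]
        self.parity = [bytes(block_size) for _ in range(groups)]
        self.locks = [threading.Lock() for _ in range(groups)]

    def update(self, g: int, i: int, new_data: bytes) -> None:
        # Held across the whole read-modify-write: the data write and the
        # parity write both complete before another update may start.
        with self.locks[g]:
            old_data = self.data[g][i]
            old_parity = self.parity[g]
            self.data[g][i] = new_data
            self.parity[g] = xor_blocks(xor_blocks(old_parity, old_data), new_data)

    def consistent(self, g: int) -> bool:
        return self.parity[g] == reduce(xor_blocks, self.data[g])

def worker(arr: LockedArray, i: int, n: int) -> None:
    """Repeatedly overwrite data block i of parity group 0."""
    for k in range(n):
        arr.update(0, i, bytes([(i * 31 + k) % 256]))

if __name__ == "__main__":
    arr = LockedArray(groups=1, width=8, block_size=1)
    threads = [threading.Thread(target=worker, args=(arr, i, 200)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(arr.consistent(0))   # prints True
```

Without the lock, two updates to different blocks of the same group could both read the same old parity, and one parity change would be lost. The lock prevents this, but only by making every update to the group wait for the previous one.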