A typical data processing system generally includes one or more data storage disk devices which are connected to one or more Central Processing Units (CPUs) either directly or through a control unit and a channel. Various types of magnetic or optical data storage disk devices are currently used for this purpose in computer systems.
In recent years, there has been a growth in interest in disk arrays. Disk arrays consist of a number of disk devices connected to a host computer system via one or more controller elements which control the transfer of data between the host and disk devices. A disk array is designed to provide high capacity data storage, reliability and high data transfer rates to and from the host computer system.
One penalty of employing a disk array is the potential problem of reduced reliability. The reliability of a disk array declines as the number of devices increases, since any single device failure potentially results in a complete array failure.
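The scaling described above can be illustrated with a rough estimate. The sketch below assumes independent devices with constant failure rates and no redundancy, so that the first device failure fails the whole array; the function name and the figures in the usage note are hypothetical and are not taken from the referenced documents.

```python
def array_mttf_hours(disk_mttf_hours: float, n_disks: int) -> float:
    """Estimate the mean time to failure of a non-redundant array.

    Assumption (not from the source text): devices fail independently at a
    constant rate, and any single failure fails the whole array, so the
    array failure rate is the sum of the device failure rates.
    """
    if n_disks < 1:
        raise ValueError("an array needs at least one disk")
    return disk_mttf_hours / n_disks
```

Under these assumptions, an array of ten devices each rated at a hypothetical 30,000 hours would be expected to fail about ten times as often as a single device.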
A number of different disk array architectures have been proposed. A paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" (ACM SIGMOD conference proceedings, Chicago, Ill., Jun. 1-3, 1988, pp. 109-116) details five levels of array (RAID 1 to RAID 5) which offer different trade-offs between redundancy, space efficiency and workload dependency. Each of the RAID levels permits users to increase their data storage capacity by linking together a number of inexpensive disk drives. Further details of the RAID configurations may be found in the above-referenced conference proceedings. RAID 5 is described in U.S. Pat. No. 4,761,785.
To avoid unacceptable degradation in system reliability, data recovery in the event of a (single) device failure is provided by introducing "redundancy" into the array. This is done either by storing two copies of the data on two drives (RAID 1) or by splitting the original data into a number of subsections, striping these across two or more drives of the array, and calculating parity data for the striped data, which is also stored in the array. In the event that one of the data-holding drives fails, the data on the failed drive can be reconstructed from the parity data and the remaining data of the stripe. The parity or checksum can be stored either on a device separate from the associated data devices (e.g. in the RAID 4 configuration) or distributed over all the available disk drives (RAID 5).
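The parity-based recovery described above can be sketched as follows. This is a minimal illustration, assuming byte-wise XOR parity over equal-length blocks held in memory; the helper names `xor_blocks`, `make_stripe` and `reconstruct` are hypothetical and do not correspond to any particular controller implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, failed_index):
    """Rebuild the block at failed_index from all surviving blocks.

    Because the parity is the XOR of the data blocks, XOR-ing every
    surviving block (data and parity alike) yields the missing one.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)
```

The same routine recovers a lost parity block, since parity and data blocks are interchangeable under XOR.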
The provision of parity/checksum in a RAID system confronts the system architect with many problems. These include error-recovery, data-loss protection, system performance, and implementation complexities.
During normal Read operations, supporting parity generation has no performance impact. During Write operations, however, the generation of parity becomes a concern, because any alteration to a data area requires an associated update of the parity data for that data area. The new parity written to the parity sector can be computed using the following formula:

newparity = (olddata XOR newdata) XOR oldparity
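As a worked check of the parity update rule, the sketch below applies it byte by byte and compares the result with a parity recomputed from scratch. The function name `update_parity` and the sample values in the usage note are illustrative assumptions only.

```python
def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Compute newparity = (olddata XOR newdata) XOR oldparity, byte by byte.

    Only the block being modified and the parity block are needed; the
    other data blocks of the stripe do not have to be read.
    """
    return bytes(od ^ nd ^ op for od, nd, op in zip(old_data, new_data, old_parity))
```

For example, with a hypothetical two-block stripe holding `01 02` and `0f f0` (hex), the old parity is `0e f2`. Replacing the first block by `aa 55` gives a new parity of `a5 a5`, which equals `aa 55` XOR `0f f0`, the parity recomputed over the whole stripe.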
Most currently available disk drives provide only a destructive Write operation; that is, the result of an update of a sector is always independent of the previous contents of the sector. Moreover, it is not generally possible to read the contents of a sector while a write operation is being performed against that sector. In consequence, a straightforward implementation of a write operation in RAID uses a read-modify-write sequence comprising two read and two write operations: the old parity block and the old data block must be read and XOR'd, and the result must then be XOR'd with the new data to provide the new parity. Both the new data block and the new parity block must then be written to the disk drives.
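The read-modify-write sequence can be sketched with in-memory stand-ins for the drives. The `Disk` class, its operation counters, and the `small_write` function are hypothetical names introduced for illustration; a real controller would issue the two reads, and then the two writes, in parallel.

```python
class Disk:
    """In-memory stand-in for a drive that counts its read and write operations."""

    def __init__(self, n_sectors: int, sector_size: int = 4):
        self.sectors = [bytes(sector_size) for _ in range(n_sectors)]
        self.reads = 0
        self.writes = 0

    def read(self, lba: int) -> bytes:
        self.reads += 1
        return self.sectors[lba]

    def write(self, lba: int, data: bytes) -> None:
        # Destructive write: the previous contents are simply replaced.
        self.writes += 1
        self.sectors[lba] = bytes(data)


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disk: Disk, parity_disk: Disk, lba: int, new_data: bytes) -> None:
    """Update one data sector and its parity using two reads and two writes."""
    old_data = data_disk.read(lba)        # read 1
    old_parity = parity_disk.read(lba)    # read 2
    new_parity = xor(xor(old_parity, old_data), new_data)
    data_disk.write(lba, new_data)        # write 1
    parity_disk.write(lba, new_parity)    # write 2
```

In this sketch the other data drives of the stripe are never touched, which is why the update costs four operations regardless of the number of drives in the array.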
While the two read operations may be done in parallel, as can the two write operations, every write operation can occur only one revolution after the corresponding read operation has been completed. Therefore, modification of a block of data in a RAID system still takes much longer than the same operation on a conventional disk which does not require the preliminary read operation, and thus does not have to wait for the disk to rotate back to the previous position in order to perform the write operation. The rotational latency can amount to a substantial proportion of the time required for a typical data modification operation.
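The cost of the extra revolution can be estimated with a simple timing model. The sketch below is a rough approximation under stated assumptions: an average rotational latency of half a revolution, a write that begins exactly one revolution after the corresponding read begins, and seek and transfer times supplied by the caller. All function and parameter names are hypothetical.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full disk revolution, in milliseconds."""
    return 60_000.0 / rpm


def plain_write_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Conventional write: seek, wait half a revolution on average, transfer."""
    return seek_ms + revolution_ms(rpm) / 2 + transfer_ms


def rmw_write_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Read-modify-write: as above, plus one full revolution between the
    start of the read and the start of the write to the same sector."""
    return seek_ms + revolution_ms(rpm) / 2 + revolution_ms(rpm) + transfer_ms
```

Under this model the read-modify-write sequence always costs exactly one revolution more than a conventional write, so the slower the spindle speed, the larger the penalty in absolute terms.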
Various methods have been proposed to alleviate this difficulty, including the use of caches (see, for example, EP-A-493984) and the writing of an entire recovery group, comprising all data sectors plus the corresponding parity sector. The latter solution is inefficient, since it either ties up multiple arms to serve a single small request or requires independent requests to be batched. Furthermore, complicated space management techniques, such as the log-structured file system described in Rosenblum et al., "The Design and Implementation of a Log-Structured File System", ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992, can result in additional overhead and performance uncertainty.