1. Field of the Invention
This invention relates generally to permanent storage systems for digital information, especially those of the disk type, and more particularly to disk array systems which create and store parity blocks in order to facilitate recovery from a disk failure.
2. Related Art
A Redundant Array of Inexpensive Disks (RAID) has been proposed as a low cost alternative to a Single Large Expensive Disk (SLED) for providing large storage of digital information with high throughput. The theory of RAID is to use relatively inexpensive disks, which may individually have a higher chance of failure than more expensive disks, and to compensate for this higher failure rate by adding redundancy: parity blocks are created and stored to facilitate recovery from a disk failure.
FIG. 1A shows a disk array subsystem architecture on which a RAID organization can be implemented. A disk controller 30, connected to a host system 10 and having a cache 31, manages an array of inexpensive disks 40-43. In a RAID organization with a total of N+1 disks, one parity block is created for each N data blocks, and each of these N+1 blocks (N data blocks plus one parity block) is stored on a different disk. In one implementation, a parity block is computed from the N data blocks by computing a bitwise "Exclusive Or" (XOR) of the N data blocks. The parity block, together with the N data blocks from which it was computed, is called a parity group. Any block in a parity group can be computed from the other blocks of that parity group.
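The XOR relationship described above can be illustrated with a minimal Python sketch. The function name `xor_blocks` and the three-byte sample blocks are hypothetical, chosen only for illustration; the sketch assumes all blocks in a parity group have equal length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bitwise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A parity group with N = 3 data blocks (sample values are arbitrary).
data = [b"\x0f\xf0\xaa", b"\x33\x55\x00", b"\xff\x01\x10"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1]: XOR the surviving blocks with the parity.
recovered = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, the same function serves both to compute the parity block and to rebuild any single missing block of the group.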
In "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Proc. of ACM SIGMOD International Conference on Management of Data, pp. 109-116, 1988, incorporated herein by reference, D. A. Patterson, G. Gibson and R. H. Katz describe five types of disk arrays classified as RAID levels 1 through 5. Of particular interest are disk arrays with an organization of RAID level 5, because the parity blocks in such a RAID type are distributed evenly across all disks, and therefore cause no bottleneck problems.
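As a rough illustration of even parity distribution, the following Python sketch assigns the parity block of each stripe to a disk by simple rotation. The rotation rule `stripe % num_disks` and the function name are assumptions made for this example; they are not taken from the cited paper, and actual RAID level 5 layouts may place parity differently.

```python
from collections import Counter


def parity_disk_for_stripe(stripe, num_disks):
    """Hypothetical rotation: stripe s keeps its parity block on disk s mod num_disks."""
    return stripe % num_disks


num_disks = 5
# Count how many parity blocks each disk holds over 100 consecutive stripes.
counts = Counter(parity_disk_for_stripe(s, num_disks) for s in range(100))
```

Under this assumed rotation each of the five disks holds the same number of parity blocks, so parity updates are spread across all disks rather than concentrated on one.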
One shortcoming of the RAID environment is that a disk write operation is far more expensive than on a SLED, because a data write on RAID requires as many as four disk access operations as compared with two disk access operations on a SLED. Whenever the disk controller in a RAID organization receives a request to write a data block, it must not only update (i.e., read and write) the data block, but it also must update (i.e., read and write) the corresponding parity block to maintain consistency. For instance, if data block D1 in FIG. 2A is to be written, the new value of P0 is calculated as:

new P0 = (old D1 XOR new D1 XOR old P0)
Therefore, the following four disk access operations are required: (1) read the old data block D1; (2) read the old parity block P0; (3) write the new data block D1; and (4) write the new parity block P0. The reads must be completed before the writes can be started.
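The four disk access operations listed above can be sketched in Python as follows. The `CountingDisk` class, the `small_write` function, and the sample block values are hypothetical and exist only to count accesses; this is a toy in-memory model, not the controller logic of any particular system.

```python
class CountingDisk:
    """Toy in-memory disk that counts every read and write (hypothetical helper)."""

    def __init__(self):
        self.blocks = {}
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.blocks[addr]

    def write(self, addr, block):
        self.accesses += 1
        self.blocks[addr] = block


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disk, parity_disk, addr, new_data):
    """Update one data block and its parity block using four disk accesses."""
    old_data = data_disk.read(addr)        # (1) read the old data block
    old_parity = parity_disk.read(addr)    # (2) read the old parity block
    # new P0 = old D1 XOR new D1 XOR old P0
    new_parity = xor_bytes(xor_bytes(old_data, new_data), old_parity)
    data_disk.write(addr, new_data)        # (3) write the new data block
    parity_disk.write(addr, new_parity)    # (4) write the new parity block


# One parity group at address 0: D0, D1, D2 on three disks, P0 on a fourth.
disks = [CountingDisk() for _ in range(3)]
parity_disk = CountingDisk()
initial = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
for disk, block in zip(disks, initial):
    disk.blocks[0] = block  # loaded directly, so setup is not counted as an access
parity_disk.blocks[0] = xor_bytes(xor_bytes(initial[0], initial[1]), initial[2])

small_write(disks[1], parity_disk, 0, b"\xaa\xbb")
total_accesses = sum(d.accesses for d in disks) + parity_disk.accesses
```

Note that the disks holding D0 and D2 are never touched: the new parity is derived from the old data and old parity alone, which is why the update costs four accesses regardless of N.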
In "Performance of Disk Arrays in Transaction Processing Environments", Proc. of International Conference on Distributed Computing Systems, pp. 302-309, 1992, J. Menon and D. Mattson teach that caching or buffering storage blocks at the disk controller can improve the performance of a RAID disk array subsystem. If there is a disk cache, the pre-reading from the disk array of a block to be replaced can be avoided if the block is in the cache. Furthermore, if parity blocks are also held in the cache, then both reads from the disk array can be avoided when the old data block and its parity block are both in the cache.
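The effect of a controller cache on the number of disk accesses can be sketched as follows. This Python example models the disk and the cache as plain dictionaries and assumes a write-through policy; the function name `cached_small_write`, the block keys, and the write-through choice are assumptions for illustration and are not drawn from the cited paper.

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def cached_small_write(cache, disk, data_key, parity_key, new_data):
    """Update a data block and its parity; return the number of disk accesses.

    `cache` and `disk` are dicts mapping block keys to bytes (toy model).
    Writes go to both disk and cache (write-through, an assumption).
    """
    accesses = 0

    def fetch(key):
        nonlocal accesses
        if key in cache:
            return cache[key]   # cache hit: no disk read needed
        accesses += 1           # cache miss: one disk read
        return disk[key]

    old_data = fetch(data_key)
    old_parity = fetch(parity_key)
    new_parity = xor_bytes(xor_bytes(old_data, new_data), old_parity)

    for key, block in ((data_key, new_data), (parity_key, new_parity)):
        disk[key] = block       # one disk write per block
        cache[key] = block      # keep the cached copy current
        accesses += 1
    return accesses


d0, d1 = b"\x01\x02", b"\x10\x20"
disk = {"D0": d0, "D1": d1, "P0": xor_bytes(d0, d1)}
cache = {}

first = cached_small_write(cache, disk, "D1", "P0", b"\xaa\xbb")   # both reads miss
second = cached_small_write(cache, disk, "D1", "P0", b"\xcc\xdd")  # both reads hit
```

In this toy model the first write incurs two reads and two writes, while the second write to the same block finds both the old data and the old parity in the cache and incurs only the two writes.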
Commonly assigned and co-pending U.S. patent application 07/017,920, filed Feb. 16, 1993 (IBM Docket Y0993-013), describes a system wherein parity blocks from high write activity parity groups are stored in a cache buffer in order to reduce the number of disk accesses during updating.
A drawback of prior schemes is that at any given time, various data blocks within a given parity group may not contain useful data. For example, some of the data blocks may not have been used, while others, although previously used, may have been freed by the operating system. Thus, while each parity block is assigned to a group of data blocks, each parity block bit depends on all corresponding data block bits, regardless of whether the data block is in use.