1. Field of the Invention
The present invention relates to disk array storage subsystems for computer systems and, more particularly, to a method for extending the effective size of a controller cache memory to thereby improve write performance of the subsystem.
2. Background of the Invention
The most cost-effective storage technologies, magnetic and optical disks, have electromechanical components which make them slower and more prone to failure than electronic devices. The mechanical processes of seeking and disk rotation slow input/output of data between the disk and a host computer. Redundant arrays of independent (or inexpensive) disks (commonly known by the acronym RAID) speed up access to data on disks by providing parallel or independent access to the disks, and improve reliability by storing redundancy information (mirrored data or, more typically, parity) on separate disks, from which data lost to a disk failure can be reconstructed. These alternatives were set forth in the Berkeley Papers, one of which is A Case for Redundant Arrays of Inexpensive Disks (RAID) by Patterson et al., University of California Report No. UCB/CSD 87/391, December 1987, incorporated herein by reference. The paper describes five RAID "levels", or arrangements of data on arrays of disks, which have become industry standards.
Parity RAID refers to those levels (RAID 3, 4, 5) in which parity data is used to protect user data. RAID level 5 has emerged as a popular choice because it allows for independent access to data which is striped in blocks across a disk array. Parity is also distributed across the array. Parity is the exclusive OR (XOR) of the corresponding data blocks on separate disks. Parity data can be used to reconstruct data if any single disk fails. There is, however, an asymmetry in input/output performance in RAID 5. The read operation is faster than the write operation because, when an application writes data, the parity blocks that protect the data at the addresses affected by the write must also be updated.
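The XOR relationship described above can be illustrated with a minimal sketch. The helper name xor_blocks and the two-byte block contents are assumptions chosen only for illustration; they do not come from the cited paper or from any particular controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data disk (illustrative values).
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]

# Parity is the XOR of the corresponding blocks on the data disks.
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR of the surviving data
# blocks and the parity block yields the lost block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, the same routine serves both to generate parity and to rebuild a single missing block.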
FIG. 1 shows a typical write operation 100 for RAID level 5. Write request data 101 is to be written to disk 102 and its redundancy data (e.g., parity) to disk 106; disk 104 is not involved. The initial contents of the target block must first be read from disk 102. This is shown as arrow 110. At the same time, parity for the target block is read from disk 106, shown as arrow 112. The target block's contribution to parity is removed by taking the exclusive OR (represented by a circled plus sign) of the two via sum 114. New parity is computed as the exclusive OR of this intermediate result 115 and the write request data 101 via sum 116. The updated parity is then sent to disk 106, shown as arrow 118. The new write request data is then written to disk 102, shown as arrow 120. This read-modify-write sequence takes longer and consumes more I/O resources than the same request made to an individual disk. The increased resource consumption and longer elapsed time are collectively known as the parity RAID write penalty.
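The read-modify-write sequence of FIG. 1 can be sketched as follows. This is an illustrative model only: the disks are represented by a dictionary keyed by the reference numerals of FIG. 1, and the function name small_write and the block contents are assumptions.

```python
def xor(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(disks, target, parity, new_data):
    """Model one RAID 5 small write; returns the number of disk I/Os."""
    old_data = disks[target]                   # read target block (arrow 110)
    old_parity = disks[parity]                 # read old parity (arrow 112)
    intermediate = xor(old_data, old_parity)   # sum 114, result 115
    new_parity = xor(intermediate, new_data)   # sum 116
    disks[parity] = new_parity                 # write parity (arrow 118)
    disks[target] = new_data                   # write data (arrow 120)
    return 4                                   # two reads plus two writes

# Hypothetical initial contents; disk 106 holds parity of disks 102 and 104.
disks = {102: b"\x11\x22", 104: b"\x0f\x0f"}
disks[106] = xor(disks[102], disks[104])

ios = small_write(disks, target=102, parity=106, new_data=b"\xa0\x0b")
```

In this model a single logical write costs four disk operations, whereas the same write to an individual disk would cost one. Disk 104 is never touched, yet the parity on disk 106 remains consistent with it.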
Write-back caching techniques are known to reduce the effects of the write penalty. A solid state cache memory on the RAID controller is effectively inserted into the I/O path. Data held in the cache memory is returned directly in response to host generated read requests. Therefore, when applications request cached data, it can be delivered immediately without the delay caused by disk seeking and rotation. A write-back cache is also used to hold data supplied by application write requests for posting (flushing) to the disk array at a later time. A write-back cache enabled RAID controller receives application write request data, saves the supplied data in the cache, and then signals that the request is complete. This allows applications to continue executing without waiting for the written data to be posted to the relatively slow disk device(s). The I/O system actually writes the data at some later time, ideally when the disk(s) would otherwise be idle.
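The write-back behavior described above may be sketched as follows. The class name WriteBackCache, its methods, and the dictionary standing in for the disk array are all hypothetical; the sketch omits cache size limits and the protection against power loss that a real controller would require.

```python
class WriteBackCache:
    """Toy write-back cache in front of a slow backing store."""

    def __init__(self, backing):
        self.backing = backing   # stands in for the disk array
        self.dirty = {}          # blocks accepted but not yet posted

    def write(self, addr, data):
        # Save the data in cache and report completion immediately;
        # the backing store is not touched here.
        self.dirty[addr] = data
        return "complete"

    def read(self, addr):
        # Cached data is returned without going to the backing store.
        if addr in self.dirty:
            return self.dirty[addr]
        return self.backing[addr]

    def flush(self):
        # Post all dirty blocks to the backing store at a later time.
        count = len(self.dirty)
        self.backing.update(self.dirty)
        self.dirty.clear()
        return count

backing = {0: b"old"}
cache = WriteBackCache(backing)
status = cache.write(0, b"new")
on_disk_before_flush = backing[0]
seen_by_host = cache.read(0)
flushed = cache.flush()
```

Between the write and the flush, the host already sees the new data while the backing store still holds the old data, which is the window in which the application proceeds without waiting for the disks.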
A particular advantage of write-back cache is that it alleviates some of the conflicts between tuning an array for I/O loads having high I/O request throughput requirements (a large number of I/O requests having small data portions associated with each) versus high data throughput requirements (a small number of I/O requests each having a large volume of data associated therewith). It is advantageous to accumulate small data I/O requests in cache memory so that the aggregation of their data may be written to the disk array as a single "stripe write" (an operation involving all disks of the array in parallel operation). Striping large data I/O requests across many disks allows parallel access and therefore quick reading or writing of large amounts of data. If a write-back cache is present, then writes can be deferred, thus minimizing the write penalty. With a large write-back cache, data can accumulate in cache and be consolidated into a single write, so that only one read-modify-write sequence and one disk seek and rotation operation need be performed for multiple consolidated writes.
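The benefit of accumulating small writes can be sketched by grouping dirty cache blocks by stripe before flushing. The layout below is a simplifying assumption (four data blocks per stripe, logical block numbers mapped to stripes by integer division, parity rotation ignored), and the name plan_flush is hypothetical.

```python
from collections import defaultdict

DATA_DISKS = 4  # assumed number of data blocks per stripe

def plan_flush(dirty_blocks):
    """Decide, per stripe, how the dirty blocks would be posted to disk."""
    by_stripe = defaultdict(set)
    for block in dirty_blocks:
        by_stripe[block // DATA_DISKS].add(block % DATA_DISKS)

    plan = {}
    for stripe, members in by_stripe.items():
        if len(members) == DATA_DISKS:
            # Every data block of the stripe is in cache, so parity can be
            # computed from cached data alone with no preliminary reads.
            plan[stripe] = "stripe write"
        else:
            # Old data and old parity must be read first.
            plan[stripe] = "read-modify-write"
    return plan

# Blocks 0-3 fill stripe 0; blocks 5 and 9 are isolated small writes.
plan = plan_flush([0, 1, 2, 3, 5, 9])
```

Under these assumptions, the four small writes to stripe 0 are posted as one stripe write, while the isolated blocks in stripes 1 and 2 still incur the read-modify-write sequence.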
If a write-back cache is barraged with a steady stream of write requests for long enough to fill the cache, then the system reverts to operation identical to that without a cache. This is because the cache must constantly flush data to disk in order to make space for newly arriving data. The system is then bounded by the speed at which the disks can absorb data, just as when no cache is present. This is known as saturation.
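Saturation can be illustrated with a simple discrete-time model. The function name simulate and the capacity and rate values are assumptions chosen only to make the effect visible; they do not describe any particular controller.

```python
def simulate(capacity, arrivals_per_tick, drain_per_tick, ticks):
    """Return the number of write blocks the cache accepts in each tick."""
    used = 0
    accepted_per_tick = []
    for _ in range(ticks):
        # The cache posts blocks to disk at the disks' rate.
        used -= min(used, drain_per_tick)
        # New writes are accepted only while free cache space remains.
        accepted = min(arrivals_per_tick, capacity - used)
        used += accepted
        accepted_per_tick.append(accepted)
    return accepted_per_tick

# Writes arrive at 4 blocks per tick, the disks absorb 1 block per tick,
# and the cache holds 10 blocks.
history = simulate(capacity=10, arrivals_per_tick=4, drain_per_tick=1, ticks=8)
# history == [4, 4, 4, 1, 1, 1, 1, 1]
```

For the first three ticks the cache absorbs the full arrival rate. Once it is full, the accepted rate falls to the drain rate of the disks, which is the behavior of a system with no cache at all.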
Prior cache designs are typically tuned to a fixed size and architecture best suited to a particular application and performance goal. A large cache may be wasteful where a small number of large write requests are common while a smaller cache may be inadequate to reduce the write penalty where a large number of small write requests are common.
A need therefore exists to improve write operations in disk array storage devices. This need is particularly acute in applications having both high throughput and data intensive requests. Such an improved device would have a wider range of usefulness. However, any cache architecture solution to this problem must assure data reliability and integrity, which form the core of all RAID subsystems.