1. Field of the Invention
The invention relates generally to control methods operable within a disk array subsystem (RAID) and in particular to methods operable within a disk array subsystem to simplify host computer RAID management and control software integration.
2. Background of the Invention
Modern mass storage subsystems continue to provide increasing storage capacities to fulfill user demands from host computer system applications. Due to this critical reliance on large capacity mass storage, demands for enhanced reliability are also high. Various storage device configurations and geometries are commonly applied to meet the demands for higher storage capacity while maintaining or enhancing reliability of the mass storage subsystems.
A popular solution to these mass storage demands for increased capacity and reliability is the use of multiple smaller storage modules configured in geometries that permit redundancy of stored data to assure data integrity in case of various failures. In many such redundant subsystems, recovery from many common failures is automated within the storage subsystem itself due to the use of data redundancy, error codes, and so-called "hot spares" (extra storage modules which may be activated to replace a failed, previously active storage module). These subsystems are typically referred to as redundant arrays of inexpensive (or independent) disks (or more commonly by the acronym RAID). The 1987 publication by David A. Patterson, et al., from University of California at Berkeley entitled A Case for Redundant Arrays of Inexpensive Disks (RAID), reviews the fundamental concepts of RAID technology.
There are five "levels" of standard geometries defined in the Patterson publication. The simplest array, a RAID level 1 system, comprises one or more disks for storing data and an equal number of additional "mirror" disks for storing copies of the information written to the data disks. The remaining RAID levels, identified as RAID level 2, 3, 4, and 5 systems, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information. A single unit of storage is spread across the several disk drives and is commonly referred to as a "stripe." The stripe consists of the related data written in each of the disk drives containing data plus the parity (error recovery) information written to the parity disk drive.
RAID storage subsystems typically utilize a control module that shields the user or host system from the details of managing the redundant array. The controller makes the subsystem appear to the host computer as a single, highly reliable, high capacity disk drive. In fact, the RAID controller may distribute the host computer system supplied data across a plurality of the small independent drives with redundancy and error checking information so as to improve subsystem reliability. Frequently RAID subsystems provide large cache memory structures to further improve the performance of the RAID subsystem. The cache memory is associated with the control module such that the storage blocks on the disk array are mapped to blocks in the cache. This mapping is also transparent to the host system. The host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required.
In RAID level 5 subsystems (as well as other RAID levels) there is a penalty in performance paid when less than an entire stripe is written to the storage array. If a portion of a stripe is written to the RAID subsystem, portions of the same stripe may need to be read so that a new parity block may be computed and re-written to the parity disk of the array. In particular, the old data stored in the portion of the stripe which is to be overwritten as well as the old parity block associated therewith needs to be read from the storage subsystem so that the new parity block values may be determined therefrom. This process is often referred to as a read-modify-write cycle due to the need to read old data from the stripe, modify the intended data blocks and associated parity data, and write the new data blocks and new parity block back to the storage array. This performance penalty is avoided if the entire stripe is written. When an entire stripe is written (often referred to as a stripe write or full stripe write), the old data and old parity stored in the stripe to be overwritten are ignored. The new stripe data is written and a new parity block determined therefrom is written without need to reference the old data or old parity. A stripe write therefore avoids the performance penalty of read-modify-write cycles.
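The difference between the two write paths may be illustrated by a minimal sketch. It assumes simple bytewise XOR parity, as is conventional for RAID level 5; the function names (xor_blocks, full_stripe_parity, rmw_parity) are hypothetical and chosen only for illustration.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def full_stripe_parity(data_blocks):
    """Stripe write: parity is computed from the new data alone, with no reads."""
    return xor_blocks(*data_blocks)

def rmw_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: old data and old parity must first be read from disk."""
    return xor_blocks(old_parity, old_data, new_data)

# A three-data-disk stripe with small illustrative blocks.
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = full_stripe_parity(stripe)

# Overwrite only the second block (a partial stripe write).
new_block = bytes([0xAA] * 4)
updated_parity = rmw_parity(stripe[1], parity, new_block)
stripe[1] = new_block

# Both paths yield the same parity; only the number of disk reads differs.
assert updated_parity == full_stripe_parity(stripe)
```

In this sketch the partial write requires two reads (old data and old parity) before the two writes, whereas the stripe write requires none.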
A significant class of RAID applications may be designated as high bandwidth applications. Video data capture is exemplary of such high bandwidth RAID storage applications. In video data capture applications, each video image (frame) comprises a significant volume of data. In addition, sequences of such video frames may be captured in rapid succession to simulate real-time video in the playback of the captured video frames. The captured video frames are stored in a RAID storage subsystem for later retrieval and replay.
In such high bandwidth RAID applications, data is typically read and written in very large blocks as compared to the typical unit of storage in the RAID subsystem. For example, a single high resolution (640 × 480 pixels), mono-color, video frame (two video scan fields), comprises over 300,000 bytes of uncompressed storage. For real-time video, a capture rate of 30 frames per second is desirable. A typical full color video capture stream therefore consists of sequential 1.3941 megabyte I/O write requests directed to the RAID storage subsystem.
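The mono-color frame size and the resulting sustained data rate may be checked with a short calculation. This sketch assumes one byte per pixel for a mono-color frame, which the text does not state; the names used are hypothetical.

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL_MONO = 1   # assumption: one byte per mono-color pixel
FRAMES_PER_SECOND = 30     # capture rate given in the text

def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Uncompressed size of one frame in bytes."""
    return width * height * bytes_per_pixel

mono_frame = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL_MONO)
mono_rate = mono_frame * FRAMES_PER_SECOND  # bytes per second at 30 frames/s

print(f"mono-color frame: {mono_frame:,} bytes")
print(f"sustained rate:   {mono_rate:,} bytes per second")
```

Under this assumption a single frame occupies 307,200 bytes, consistent with the "over 300,000 bytes" stated above.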
When data written to a RAID storage subsystem is aligned with stripe boundaries of the RAID subsystem, higher performance may be maintained in the subsystem by using stripe write operations. However, maintaining such high performance is problematic when the data to be written to the storage subsystem is not aligned to stripe boundaries in the RAID subsystem. Typical stripe sizes in RAID subsystems are 32, 64, 128, or 256 kilobytes, and the typical request size of a video capture stream does not readily align to such stripe boundaries. In such cases, present RAID subsystems use read-modify-write cycles, thereby reducing the performance of the RAID subsystem below the level required for sustaining high bandwidth applications such as real-time video capture.
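The misalignment may be illustrated by splitting sequential write requests at stripe boundaries. This is a sketch only: it assumes the 1.3941 megabyte request is 1,394,100 bytes and that a kilobyte is 1,024 bytes, neither of which is stated in the text, and the function name split_write is hypothetical.

```python
def split_write(offset: int, length: int, stripe_size: int):
    """Split a write into (head_partial_bytes, full_stripes, tail_partial_bytes).

    Partial portions at either end would require read-modify-write cycles;
    only the whole stripes in the middle can use stripe writes.
    """
    end = offset + length
    first_boundary = -(-offset // stripe_size) * stripe_size  # round offset up
    if first_boundary > end:
        # The whole request falls inside a single stripe.
        return (length, 0, 0)
    last_boundary = (end // stripe_size) * stripe_size         # round end down
    head = first_boundary - offset
    full = (last_boundary - first_boundary) // stripe_size
    tail = end - last_boundary
    return (head, full, tail)

REQUEST_BYTES = 1_394_100  # assumption: 1.3941 megabytes taken as decimal

for stripe_kb in (32, 64, 128, 256):
    stripe_size = stripe_kb * 1024  # assumption: binary kilobytes
    for n in range(2):              # two sequential requests
        head, full, tail = split_write(n * REQUEST_BYTES, REQUEST_BYTES, stripe_size)
        print(f"{stripe_kb:>3} KB stripe, request {n}: "
              f"head={head:,} full_stripes={full} tail={tail:,}")
```

Under these assumptions every request leaves a partial stripe at one or both ends for each of the listed stripe sizes, so some portion of each request falls back to read-modify-write.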
One solution to this problem as known in the art is to provide large buffer memories so that the non-aligned data may be cached until additional data is received from an attached computer. The cached data may then be written along with the additional received data in an efficient stripe write operation. A problem with this simplistic solution is that potentially large buffer memories may be required to store the non-aligned data until an additional I/O write request for an adjacent buffer is received. The buffer memory required to store two or more such large I/O write requests can be prohibitively costly in a RAID storage subsystem.
It is evident from the above discussion that an improved method and apparatus are required for sustaining RAID performance for high bandwidth storage applications having large I/O write request sizes.