The present invention relates to arrayed mass storage devices for computers. More particularly, the present invention relates to mass storage device arrays in which redundant information is stored on one or more of the devices in the array to permit reconstruction of data stored on another device in the event that other device fails.
The performance of a mass data storage system for a computer or computer network can be, and often is, characterized in several ways. The relative importance of these characterizations typically depends on the particular use to which the storage system is put. One common measure of performance is data availability or fault tolerance. By this measure, the performance of a mass storage system is rated according to its ability to maintain the integrity of, and to provide access to, stored data despite a component failure in the system. Fault tolerance is especially important in applications requiring continuously on-line mass storage. Another common measure of performance is bandwidth--i.e. the rate at which data can be transferred to or from a mass storage file. High bandwidth is especially advantageous in applications involving large data files, such as numerical analysis and image processing. Yet another common measure of performance is transaction rate or request rate. This is a measure of the rate at which the system handles a plurality of successive or simultaneously pending data access requests, and is of particular interest in applications requiring on-line transaction processing, such as an airline reservation system.
Magnetic disk, tape and optical drives are the most widely used media for mass data storage. Historically, as computer processors have become more powerful, there has followed a demand for storage systems with greater mass data storage capacity, to which manufacturers of mass storage systems have responded primarily by making larger capacity (higher storage density) drives. Increasing capacity in this manner, however, does not necessarily increase the performance of the storage system. For example, a failure of the drive can make a larger amount of data inaccessible in a single event. Also, the bandwidth of the drive, a typical bottleneck in large database applications, may still be a problem (even though increased bit density along a track, as well as fixed-head disks with multiple read/write heads, or tracks-in-parallel moving head disks, may be used to reduce transfer time). Further, given a limited number of independent read/write actuators, an increase in disk capacity decreases the density of such independent actuators per unit of stored data. As a result, the increase in capacity may reduce the transaction rate of the drive.
As an alternative to a mass storage system based on a single large disk drive, systems based on an array of smaller disk drives recently have been developed. The array-type design offers potential benefits of high bandwidth, high transaction rate and high fault tolerance. For example, a high bandwidth can be achieved by storing data in stripes across a set of multiple disks and accessing the disks in parallel as though the set of disks were a single logical unit (referred to herein as parallel mode processing).
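The striping arrangement described above can be illustrated with a minimal sketch. The round-robin block-to-disk mapping and the function names `locate_block` and `parallel_read_plan` are assumptions made for illustration only; they are not taken from any particular array implementation.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, stripe index).

    Assumes simple round-robin striping: consecutive logical blocks
    are placed on consecutive disks.
    """
    stripe, disk = divmod(logical_block, num_disks)
    return disk, stripe


def parallel_read_plan(first_block, count, num_disks):
    """Group a contiguous request by disk so each disk can be read in parallel."""
    plan = {}
    for block in range(first_block, first_block + count):
        disk, stripe = locate_block(block, num_disks)
        plan.setdefault(disk, []).append(stripe)
    return plan
```

Under this assumed mapping, a request spanning as many blocks as there are disks touches every disk exactly once, which is why the set of disks can be accessed as a single logical unit in parallel mode.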
A high transaction rate can be achieved, especially in applications where data accesses are typically small, by arranging data on the disks of the array such that less than all disks must be accessed to handle a single request. This arrangement is referred to herein as transaction mode processing. Where separate transaction mode requests do not compete for access to the same drive, they can be handled simultaneously, thus allowing a higher transaction rate.
Fault tolerance can be provided by duplicating stored data on a second set of disks; this technique, however, commonly known as mirroring the data, is expensive because it requires full redundancy. A more cost-effective approach to providing reliability is to encode the redundant information (also called redundancy information herein) using an error detecting and correcting code such as a Reed-Solomon code, thereby reducing the amount of redundancy information that must be stored. This approach generally involves dividing data to be stored into data words each comprising a plurality of blocks of common size (e.g. four or eight bits). The data blocks are used as coefficients in one or more equations established by the particular error detecting and correcting code being implemented to transform each data word into one or more redundancy terms. The redundancy term (or terms) and the data blocks from which it is derived form a code word which is stored in the array such that each data block and each redundancy term of the code word is stored on a different disk. If a disk fails, each data block or redundancy term stored on that disk is regenerated by retrieving the other data blocks and redundancy term(s) of its code word from other disks and transforming them into the missing term using error location and correction equations in accordance with the particular code employed.
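The encode-and-regenerate cycle described above can be sketched with the simplest possible code, a single bytewise XOR parity term. This is a deliberate simplification and not a Reed-Solomon code: it produces only one redundancy term per code word and can regenerate only one missing term. The function names `xor_blocks`, `encode` and `regenerate` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Form a code word: the data blocks followed by one XOR redundancy term.

    Each element of the returned list would be stored on a different disk.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def regenerate(code_word, failed_index):
    """Rebuild the term held on the failed disk from the surviving terms."""
    survivors = [block for i, block in enumerate(code_word) if i != failed_index]
    return xor_blocks(survivors)
```

For example, encoding the one-byte blocks `0x0f`, `0xf0` and `0x3c` yields the redundancy term `0xc3`; if the disk holding `0xf0` fails, XORing the three surviving terms regenerates it.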
Various implementations of fault tolerant arrays based on such encoding schemes have been suggested in the prior art. In one such implementation, the redundancy information for all data in the array is stored on a designated "check" disk. The performance of this implementation is limited because of access contention for the check disk during write operations. More particularly, the redundancy terms stored on the check disk must be updated any time any data in the array is changed. This means that for each write operation to any disk in the array, the check disk becomes busy with a "read-modify-write" operation: the redundancy terms corresponding to the old data are first read from the check disk into a buffer, modified based on the new data, and then written back to the check disk. The read-modify-write performed on the check disk is time consuming. It causes write operations to interfere with each other, even for small data accesses, and thus prevents simultaneous write operations in transaction processing applications.
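The read-modify-write sequence on the check disk can be sketched as follows. The sketch assumes a single XOR redundancy term per stripe (an assumption; the text above does not fix a particular code) and models each disk as a Python list of integer blocks. The name `small_write` is hypothetical.

```python
def small_write(data_disks, check_disk, disk, stripe, new_block):
    """Write one data block and update its redundancy term on the check disk.

    Assumes XOR parity, so the new redundancy term can be computed from the
    old data, the new data and the old redundancy term alone.
    """
    old_block = data_disks[disk][stripe]    # read: old data
    old_parity = check_disk[stripe]         # read: old redundancy term
    new_parity = old_parity ^ old_block ^ new_block   # modify
    check_disk[stripe] = new_parity         # write: check disk is busy again
    data_disks[disk][stripe] = new_block    # write: data disk
```

Every call reads and writes `check_disk`, whichever data disk is targeted, which is the source of the contention: two small writes to different data disks still serialize on the single check disk.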
To reduce the contention problem on write operations, redundancy terms can be distributed across all disks of the array, such that it may be possible in certain circumstances to perform two or more write operations simultaneously. However, contentions for access to redundancy information continue to limit the write-throughput of the system. This is especially disadvantageous in arrays in which multiple redundancy terms are generated for each code word, since each write operation then requires read-modify-write operations on at least three disks (one for the data and at least two for the redundancy terms). This is a costly overhead, particularly in an array with relatively few drives, as it reduces the probability that two or more write operations can be performed simultaneously.
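The effect of the number of disks touched per write on simultaneous writes can be estimated with a rough calculation. The model below is an assumption made for illustration: each write is taken to occupy a uniformly random set of distinct disks (one for the data plus one per redundancy term), and two writes can proceed simultaneously only if their sets are disjoint. The function name `prob_disjoint` is hypothetical.

```python
from math import comb


def prob_disjoint(num_disks, disks_per_write):
    """Probability that two writes touch no disk in common.

    Assumes each write occupies a uniformly random set of
    `disks_per_write` distinct disks out of `num_disks`.
    """
    return comb(num_disks - disks_per_write, disks_per_write) / comb(
        num_disks, disks_per_write
    )
```

Under this simplified model, with ten disks the chance that two writes do not conflict falls from 28/45 (about 0.62) when each write touches two disks to 35/120 (about 0.29) when each touches three. With only five disks, it falls from 0.3 to zero, since two three-disk writes cannot fit on five disks without overlapping.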
It would be advantageous to be able to eliminate the requirement that a read-modify-write disk access operation be performed on redundancy information each time a transaction mode processing write operation is performed on data in the array. This would allow greater write-throughput by reducing redundancy information contentions, and also would be especially beneficial in applications in which numerous accesses to a localized area of the array (i.e. a "hot-spot") must be processed at a high rate. An example of such an application is an airline reservation system, in which a single data file storing the reservation information for a given airline flight may become extremely active shortly before the flight. This burst of activity can generate a backlog of write requests to that one file, each of which, in a conventional redundant array, would require a read-modify-write disk access operation on the redundancy information for the file.
Applicants believe that the above advantage can be achieved by caching redundancy information. It is known in the computer arts that data can be cached to reduce average memory access times. For example, caching of instructions is used in conventional microprocessor architectures to increase the speed of the microprocessor. For this purpose, a high-speed volatile memory such as a solid-state memory is provided. It is also known that caching may improve data access speed in disk drives. Disk cache memory capable of holding a track of data has been incorporated into disk drive units to eliminate seek and rotation delays for successive accesses to data on a single track. These and other cache memory techniques are well-known, and have been implemented in various data processing system architectures to provide an economical performance boost. However, applicants believe that known prior art applications of cache memory do not provide for the caching of redundancy information in a redundant array, and otherwise do not adequately address the performance limitation on transaction mode processing imposed by conflicting demands for access to redundancy information stored on mass storage devices in a redundant array.