1. Field of the Invention
This invention relates generally to redundant storage in an array of secondary storage means and, in particular, to providing a flexible, orthogonal implementation of user system performance and data availability goals in the array of secondary storage means.
2. Prior Art
Redundant arrays of inexpensive disk drives (RAIDs) have evolved as an alternative to earlier secondary storage systems that employ a single, large disk drive. The main reason for this scheme is to match secondary storage access with ever-increasing processing speeds. Typically, the speed of data transfer to and from a single, large disk is much slower than the processing speed of the central processing unit (CPU). To increase system throughput, the RAID scheme of secondary storage allows concurrent access to data on multiple disk drives.
Typically, RAID architectures consist of one or more host interface controllers connected to several peripheral interface controllers via a high-speed data bus. Each peripheral interface controller is, in turn, connected to several individual disk drives, which provide the secondary storage for the connected hosts. Peripheral interface controllers can be connected to the disk drives via common communication interfaces, e.g., SCSI. Generally, the speed of the data bus is much greater than the speed of the interface between the disk drives and the peripheral interface controllers.
One way to increase concurrent access is to distribute the data for a given process efficiently across the array of disk drives. This distribution of data is called striping. The amount of striping in a given distribution scheme is called the stripe degree, which equals the number of disk drives across which the data is distributed. To maximize concurrent access, however, the data should be striped to disk drives connected to different peripheral interface controllers. For example, sector 1 (or some other unit of data) of the host's data might be stored on a disk attached to peripheral interface controller 1, sector 2 on another disk attached to peripheral interface controller 2, and so on.
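The controller-interleaved placement just described can be illustrated with a small sketch. It assumes a hypothetical array of `num_controllers` peripheral interface controllers, each with `disks_per_controller` disks, numbers controllers and disks from zero (the text numbers them from 1), and uses simple round-robin placement; `stripe_layout` is an illustrative name, not part of any described system.

```python
def stripe_layout(num_units, num_controllers, disks_per_controller, stripe_degree):
    """Map logical data units (e.g., sectors) to (controller, disk) pairs.

    Hypothetical sketch: the stripe's member disks are chosen by cycling
    through the controllers first, so consecutive units land on different
    peripheral interface controllers whenever stripe_degree allows it.
    """
    total_disks = num_controllers * disks_per_controller
    if not 1 <= stripe_degree <= total_disks:
        raise ValueError("stripe_degree must be between 1 and the number of disks")
    # Interleave disks across controllers: disk 0 of each controller,
    # then disk 1 of each controller, and so on.
    interleaved = [(c, d) for d in range(disks_per_controller)
                   for c in range(num_controllers)]
    members = interleaved[:stripe_degree]
    # Unit i is placed on stripe member i mod stripe_degree (round robin).
    return [members[i % stripe_degree] for i in range(num_units)]


# Six sectors striped with degree 4 over 3 controllers x 2 disks each.
for sector, (ctrl, disk) in enumerate(stripe_layout(6, 3, 2, 4)):
    print(f"sector {sector} -> controller {ctrl}, disk {disk}")
```

With a stripe degree of 4 and only 3 controllers, two stripe members necessarily share controller 0, so some units queue behind one another on that controller; in this sketch, a stripe degree no larger than the number of controllers keeps every consecutive pair of units on different controllers.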
Striping, together with data buffering, takes better advantage of the full data bus bandwidth. For example, suppose a host makes a read request for data resident on all the disk drives to which its data has been striped. These disk drives are then required to transmit their data, through their respective peripheral interface controllers, onto the data bus for subsequent transmission to the host interface controller.
If a disk is ready to transmit its data and the data bus is unavailable, the data can be stored temporarily in a buffer, which each peripheral interface controller typically has. These buffers match the slower transfer rate of the disk interface to the faster transfer rate of the data bus. For example, if the data bus operates at 25 Mbytes/sec and the SCSI lines operate at 5 Mbytes/sec, then the data from a disk drive fills the buffer at 5 Mbytes/sec and is transmitted from the buffer onto the data bus at 25 Mbytes/sec. In this example, striping allows several disk drives to buffer their data concurrently into their peripheral interface controllers, while the speed-matching buffers allow transmission onto the data bus at rates approaching that of the data bus. The bandwidth of the data bus is therefore better utilized.
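The arithmetic behind the speed-matching buffers can be sketched as follows, using the 25 Mbytes/sec bus and 5 Mbytes/sec SCSI rates from the example. The model is idealized by assumption: it ignores protocol overhead, seek and rotational latency, and bus arbitration, and the function names are hypothetical.

```python
import math


def disks_to_saturate(bus_rate, link_rate):
    """Smallest number of disks, each streaming at link_rate into its
    controller's buffer, whose combined rate reaches bus_rate."""
    return math.ceil(bus_rate / link_rate)


def bus_utilization(active_disks, bus_rate, link_rate):
    """Fraction of bus bandwidth used when active_disks fill their buffers
    concurrently and the buffers are drained onto the bus at bus_rate."""
    offered = active_disks * link_rate
    return min(offered, bus_rate) / bus_rate


def fill_and_drain_seconds(buffer_mbytes, bus_rate, link_rate):
    """Time to fill one buffer from its disk versus time to drain it onto the bus."""
    return buffer_mbytes / link_rate, buffer_mbytes / bus_rate


BUS, SCSI = 25.0, 5.0  # Mbytes/sec, from the example in the text
print(disks_to_saturate(BUS, SCSI))            # 5
print(bus_utilization(1, BUS, SCSI))           # 0.2
print(fill_and_drain_seconds(1.0, BUS, SCSI))  # (0.2, 0.04)
```

A single streaming disk keeps the bus only one-fifth busy. In this idealized model, five or more disks on different controllers can keep it fully busy, because each buffer drains five times faster than it fills.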
Along with the advantage of increased throughput, however, the striping scheme has disadvantages. Reliability, for example, is an intrinsic problem with striping. Generally, the mean time until some disk drive in an array fails is much less than the mean time to fail of a single, large, expensive disk. Moreover, this disparity in mean time to fail grows as the number of disk drives in the array increases.
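The growing disparity can be made concrete under a common simplifying assumption that the text does not state: disks fail independently with exponentially distributed lifetimes (a constant failure rate). Under that assumption, the mean time until the first of N identical disks fails is the single-disk mean time to fail divided by N. The 30,000-hour figure below is purely illustrative, and `time_to_first_failure` is a hypothetical helper.

```python
def time_to_first_failure(disk_mttf_hours, num_disks):
    """Mean time until some disk in the array fails.

    Assumes independent disks with exponentially distributed lifetimes,
    so the array's failure rate is num_disks times a single disk's rate.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


ILLUSTRATIVE_DISK_MTTF = 30_000  # hours; hypothetical figure
for n in (1, 10, 100):
    print(n, time_to_first_failure(ILLUSTRATIVE_DISK_MTTF, n))
```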
Mean time to fail for any given disk in the array, however, is not necessarily the most appropriate measure of reliability. Mean time to data loss in a system is generally a more appropriate measure. In a single disk drive scheme, the failure of that disk drive equates to immediate data loss. Such data loss need not be irreparable. A human operator may be able to replace the disk and recover the data from the failed disk, for example if the only damage to the disk occurred in the file allocation table. In such a case, it is more appropriate to classify the data as unavailable rather than lost. The duration of such unavailability is the time it takes to physically replace the disk and recover the data.
Similarly, in the case of a RAID scheme, the failure of a single disk in the array does not necessarily equate to data loss. Moreover, by using a variety of redundancy methods, the failure of a single disk in an array need not cause the same degree of data unavailability as in a single disk scheme. Data from a number of failed disks (the number depending upon the coding scheme) may be reconstructed dynamically by the system without the need to replace any disks. Thus, the period of data unavailability can be made very small, while the mean time to data loss can be made very large.
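Dynamic reconstruction can be illustrated with the simplest coding scheme, single XOR parity, which tolerates the loss of any one disk in a group; codes that tolerate more failures exist but are not shown. The sketch also includes the widely used approximation for the mean time to data loss of one parity-protected group of N disks, MTTF² / (N(N-1) · MTTR), which assumes independent exponential failures and a repair time MTTR much shorter than the disk MTTF. All function names and figures are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_of(data_blocks):
    """Parity block stored on the redundancy disk of a stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks):
    """Recompute the one lost block (data or parity) from the survivors,
    since the XOR of all blocks in a stripe, parity included, is zero."""
    return xor_blocks(surviving_blocks)


def mttdl_single_parity(disk_mttf_hours, group_size, mttr_hours):
    """Approximate mean time to data loss for one group of group_size disks
    protected by single parity: data is lost only if a second disk fails
    while the first is being repaired. Assumes mttr_hours << disk_mttf_hours."""
    return disk_mttf_hours ** 2 / (group_size * (group_size - 1) * mttr_hours)


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = parity_of(stripe)
# The disk holding stripe[1] fails; rebuild its block from the others.
recovered = rebuild_missing([stripe[0], stripe[2], parity])
print(recovered == stripe[1])
# Illustrative figures: 30,000-hour disks, 5-disk group, 24-hour repair.
print(mttdl_single_parity(30_000, 5, 24))
```

With these illustrative figures the estimate is 1,875,000 hours, compared with 6,000 hours (30,000 / 5, under the same independence assumption) for the same five disks striped without redundancy. This is the sense in which redundancy makes the mean time to data loss very large.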
These two concepts, system performance (as measured by the degree of striping and concurrency) and data availability (as measured by the degree of redundancy), are orthogonal in the RAID context, i.e., independent of each other. Thus, increasing the level of redundancy and availability of data in the array does not generally translate into an increase in system throughput, and vice versa.
However, even though performance and reliability are orthogonal, current RAID systems conflate these concepts. A user of RAID storage should be able to specify the desired level of system performance and the desired level of data reliability, each independently of the other. Additionally, a RAID system should be flexible enough to allow a user to select from a range of performance and reliability configurations, depending upon a number of considerations, such as the cost of storage for different configurations.
For example, one user may place a premium on data reliability and request a high mean time to data loss for the storage service. This might translate into a greater number of redundancy units per array group. Another user, however, may desire high system throughput, which would require a high degree of data striping. This, in turn, might require a smaller number of redundancy units to increase the degree of data striping of the user's data. Lastly, some users might require both high performance and high availability on a given data partition.
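The independent specification argued for above might be captured as two separate parameters per data partition, as in the following sketch. `ArrayGroupSpec` is a hypothetical name; the sketch assumes a maximum-distance-separable code (for example, Reed-Solomon), so that r redundancy units allow any r lost units of a stripe to be rebuilt, and it treats the fraction of raw capacity holding user data as a rough proxy for storage cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayGroupSpec:
    """Hypothetical per-partition request with independent goals."""
    stripe_degree: int     # data units per stripe (performance goal)
    redundancy_units: int  # redundancy units per stripe (availability goal)

    def __post_init__(self):
        if self.stripe_degree < 1 or self.redundancy_units < 0:
            raise ValueError("need stripe_degree >= 1 and redundancy_units >= 0")

    @property
    def group_size(self):
        # Disks consumed by one array group: data plus redundancy.
        return self.stripe_degree + self.redundancy_units

    @property
    def failures_tolerated(self):
        # Holds only under the MDS-code assumption stated above.
        return self.redundancy_units

    @property
    def storage_efficiency(self):
        # Fraction of raw capacity that holds user data.
        return self.stripe_degree / self.group_size


requests = {
    "reliability first": ArrayGroupSpec(stripe_degree=4, redundancy_units=3),
    "throughput first": ArrayGroupSpec(stripe_degree=8, redundancy_units=1),
    "both": ArrayGroupSpec(stripe_degree=8, redundancy_units=3),
}
for user, spec in requests.items():
    print(f"{user}: {spec.group_size} disks, tolerates "
          f"{spec.failures_tolerated} failures, "
          f"{spec.storage_efficiency:.0%} of capacity for data")
```

In this sketch, a fixed number of disks per array group forces a trade between the two parameters, matching the first two examples above; the "both" request simply consumes more disks rather than compromising either goal.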