The extensive data storage needs of modern computer systems require large capacity mass data storage devices. A common storage device is the rotating magnetic hard disk drive, a complex piece of machinery containing many parts which are susceptible to failure. A typical computer system will contain several such units. The failure of a single storage unit can be a very disruptive event for the system. Many systems are unable to operate until the defective unit is repaired or replaced, and the lost data restored.
As computer systems have become larger, faster, and more reliable, there has been a corresponding increase in need for storage capacity, speed and reliability of the storage devices. Simply adding storage units to increase storage capacity causes a corresponding increase in the probability that any one unit will fail. On the other hand, increasing the size of existing units, absent any other improvements, tends to reduce speed and does nothing to improve reliability.
Recently there has been considerable interest in arrays of direct access storage devices, configured to provide some level of data redundancy. Such arrays are commonly known as "RAIDs" (Redundant Array of Independent (or Inexpensive) Disks). Various types of RAIDs providing different forms of redundancy are described in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", by Patterson, Gibson and Katz, presented at the ACM SIGMOD Conference, June, 1988. Patterson, et al., classify five types of RAIDs designated levels 1 through 5. The Patterson nomenclature has become standard in the industry.
The original theory of RAIDs was that a number of relatively inexpensive, small disk drives can provide the capacity of a single larger, expensive drive. The smaller drives will also be faster because they will all be reading or writing ("accessing") data at the same time. Finally, because the small drives cost so little, it is possible to include extra (redundant) disk drives which, in combination with certain storage management techniques, permit the system to recover the data stored on one of the small drives should it fail. Thus, RAIDs permit increased capacity, performance, and reliability. As RAIDs have gained acceptance in the marketplace, their use has not been limited to small capacity drives, and RAID systems are now configured with drives of all capacities. Furthermore, RAID systems may be constructed with other types of storage devices, such as optical disk drives, magnetic tape drives, floppy disk drives, etc. As used herein, the term "RAID" or "RAID system" should not be taken to be limited to any particular type of storage device.
Using the Patterson nomenclature, a RAID level 1 is a mirrored configuration. In accordance with a RAID-1 memory structure, each unit of data is stored on two independent storage devices within the array. Therefore, if one device fails, data can be recovered from the second device. FIG. 1 is an illustration of data stored in accordance with a RAID-1 memory structure. As shown in FIG. 1, the information (A, B, C, D) that is stored in Device 1 is "mirrored" in Device 2 (A', B', C', D'). Likewise, the information (E, F, G, H) that is stored in Device 3 is mirrored (E', F', G', H') in Device 4, and the information (I, J, K, L) that is stored in Device 5 is mirrored in Device 6 (I', J', K', L').
According to Patterson's nomenclature, RAID levels 3 and higher (RAID-3, RAID-4, RAID-5) employ parity records for data redundancy. Parity records are formed from the Exclusive-OR of all data records stored at a particular location on different storage units in the array. In other words, in an array of N storage units, each bit in a block of data at a particular location on a storage unit is Exclusive-ORed with every other bit at that location in a group of (N-1) storage units to produce a block of parity bits; the parity block is then stored at the same location on the remaining (Nth) storage unit. If any storage unit in the array fails, the data contained at any location on the failing unit can be regenerated by taking the Exclusive-OR of the data blocks at the same location on the remaining devices and their corresponding parity block.
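The parity computation and regeneration described above can be illustrated with a minimal sketch. The helper name `xor_blocks`, the four-byte block size, and the sample byte values are assumptions chosen for illustration only; they do not come from Patterson et al. or from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise Exclusive-OR of a group of equal-length blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the bytes found at the same offset in every block.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical array of N = 4 storage units: three data blocks at one
# location, plus a parity block stored on the remaining (Nth) unit.
data_blocks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity_block = xor_blocks(data_blocks)

# Simulate failure of the unit holding data_blocks[1]: its contents are
# regenerated from the surviving data blocks and the parity block.
survivors = [data_blocks[0], data_blocks[2], parity_block]
regenerated = xor_blocks(survivors)
assert regenerated == data_blocks[1]
```

Regeneration works because Exclusive-OR is its own inverse: XORing the parity block with every surviving data block cancels their contributions and leaves only the missing block.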
In a RAID-3, all the read/write actuators on the different disk drives act in unison to access data on the same location of each drive. RAID-4 and RAID-5 are further characterized by independently operating read/write actuators in the disk drive units. In other words, each read/write head of a disk drive unit is free to access data anywhere on the disk, without regard to where other units in the array are accessing data.
In accordance with RAID-4, information is stored in "blocks", each block being stored in a different storage device. For the purpose of this document, the term "block" refers to a coherent unit of data comprising one or more sectors of data, the data being independently accessible from a single storage device. The information comprising one such RAID-4 memory structure is referred to as a "stripe". Each stripe includes a portion of the information stored in several of the devices 202a-202f of the array. For purposes of this document, the term "stripe" refers to a set of blocks, each block preferably being stored at an address that is related to the address of each other block within each other storage device, but which may alternatively comprise a set of blocks, each (or some) of which are stored at addresses which are unrelated to the addresses at which other blocks are stored within other devices. FIG. 2 illustrates four stripes of data 204a, 204b, 204c, 204d stored in accordance with RAID-4. As shown in FIG. 2, one stripe 204a includes a Data block 201 stored in Device 1, a Data block 203 stored in Device 2, a Data block 205 stored in Device 3, a Data block 207 stored in Device 4, and a Data block 209 stored in Device 5. Furthermore, one of the devices 202f is dedicated to storing "parity" information in Parity block 211. Each of the other devices stores user information. If any one of the storage devices should fail, the information that was stored in the failed device can be recovered by logically exclusive ORing the remaining information of the stripe which is stored in each of the other devices.
One of the problems encountered with parity protected disk arrays having independent read/writes (i.e., RAID-4 or RAID-5) is the overhead associated with updating the parity block whenever a data block is written. For example, when information within the block 201 of Device 1 is to be changed (i.e., "new data" is written to update "old data"), the old data, the old parity information from Parity block 211, and the new data are typically XORed to produce the updated parity information. The new data and new parity are then written to their respective blocks. These operations can be performed in various sequential orders, but they typically require that two blocks (both the old data and old parity) be read and two blocks (the new data and new parity) be written every time data is updated. While the use of caches and other techniques may reduce this problem, there is a tendency for the parity disk in a RAID-4 system to become overutilized.
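The read-modify-write parity update described above can be sketched as follows. The function names, the two-byte block size, and the sample contents of the five data blocks are hypothetical; the sketch only checks that XORing the old data, the old parity, and the new data yields the same parity as recomputing it from every data block in the stripe.

```python
def xor_bytes(a, b):
    """Return the bytewise Exclusive-OR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data, old_parity, new_data):
    """Compute new parity from the two blocks read (old data, old parity)
    and the new data, without touching the other data blocks of the stripe."""
    return xor_bytes(xor_bytes(old_data, old_parity), new_data)


# Hypothetical stripe with five data blocks (as in FIG. 2) and one parity block.
stripe = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88", b"\x99\xaa"]
old_parity = bytes(2)
for block in stripe:
    old_parity = xor_bytes(old_parity, block)

# Update the first data block: two reads (old data, old parity) ...
old_data = stripe[0]
new_data = b"\xde\xad"
new_parity = updated_parity(old_data, old_parity, new_data)

# ... and two writes (new data, new parity).
stripe[0] = new_data

# Cross-check against a full recomputation over the updated stripe.
full_recompute = bytes(2)
for block in stripe:
    full_recompute = xor_bytes(full_recompute, block)
assert new_parity == full_recompute
```

The shortcut touches only two devices regardless of stripe width, but one of them is always the parity device, which is why a dedicated parity disk tends to become the bottleneck.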
U.S. Pat. No. 4,761,785 to Clark et al., which is hereby incorporated by reference, describes a type of independent read/write array in which the parity blocks are distributed substantially equally among the disk storage units in the array. Patterson et al. have designated this type of array RAID-5. FIG. 3 illustrates a RAID-5 configuration. Distributing the parity blocks shares the burden of updating parity among the disks in the array on a more or less equal basis, thus avoiding potential performance bottlenecks that may arise when all parity records are maintained on a single dedicated disk drive unit, even though the same read-modify-write operation is normally used to write data in a RAID-5 system as well as in a RAID-4 system. RAID-5 is the most advanced level RAID described by Patterson.
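One way to picture parity blocks being distributed substantially equally is a simple rotation of the parity position from stripe to stripe. The sketch below is an assumption made for illustration: the rotation rule, the function name `parity_device`, and the six-device array are hypothetical and are not taken from Clark et al., from Patterson et al., or from FIG. 3.

```python
from collections import Counter


def parity_device(stripe_index, num_devices):
    """Return the zero-based index of the device holding parity for a stripe.

    Hypothetical rotation: stripe 0 places parity on the last device,
    stripe 1 on the one before it, and so on, wrapping around.
    """
    return num_devices - 1 - (stripe_index % num_devices)


def data_devices(stripe_index, num_devices):
    """Return the indices of the devices holding data blocks for a stripe."""
    parity = parity_device(stripe_index, num_devices)
    return [d for d in range(num_devices) if d != parity]


NUM_DEVICES = 6   # assumed array width
NUM_STRIPES = 12  # assumed number of stripes, a multiple of the width

# Count how many parity blocks land on each device.
parity_counts = Counter(parity_device(s, NUM_DEVICES) for s in range(NUM_STRIPES))
for device in range(NUM_DEVICES):
    print(f"device {device}: {parity_counts[device]} parity blocks")
```

Under this assumed rotation every device holds the same number of parity blocks whenever the stripe count is a multiple of the array width, so parity updates are spread across all devices rather than concentrated on one.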
Each of the different memory structure formats has its own characteristics, which may make it more appropriate for storage of certain data than any of the other formats. At the same time, there is no single format that is universally superior. For example, from a cost per byte of storage standpoint, RAID-1 is the most expensive, parity protected RAID is less expensive, and non-redundant storage is the least expensive. Non-redundant storage and RAID-1 are generally faster than parity protected storage, due to the difficulty of updating data in a parity protected format. Both parity protected and mirrored formats are redundant, but the mirrored format is slightly more robust, in that even if two storage devices fail, all of the information stored on those two devices is recoverable, as long as the two devices that have failed do not have the same information stored therein. For example, referring to FIG. 1, if Device 1 and Device 3 both fail, the information that is stored in these two devices remains stored in Devices 2 and 4. Accordingly, the data within the failed storage devices can be recovered. However, in the case of a parity protected configuration, all of the data in the two devices that have failed will have been lost, since a parity protected configuration requires the information from N-1 devices to be available in order to reconstruct the information stored in one failed device, where N is the total number of devices in the RAID-4 stripe.
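The difference in robustness under a double failure can be checked with a small sketch. The mirror pairs follow the device numbering of FIG. 1; the function names and the treatment of the parity protected array as a single six-device stripe are assumptions made for illustration.

```python
from itertools import combinations

# Mirror pairs as drawn in FIG. 1: Device 1/2, Device 3/4, Device 5/6.
MIRROR_PAIRS = [(1, 2), (3, 4), (5, 6)]
DEVICES = [1, 2, 3, 4, 5, 6]


def mirrored_recoverable(failed, pairs=MIRROR_PAIRS):
    """Data survives unless both members of some mirror pair have failed."""
    failed = set(failed)
    return not any(a in failed and b in failed for a, b in pairs)


def parity_recoverable(failed):
    """A single parity block per stripe tolerates at most one failed device."""
    return len(set(failed)) <= 1


# The example from the text: Devices 1 and 3 fail.
print(mirrored_recoverable({1, 3}))  # mirrors on Devices 2 and 4 survive
print(parity_recoverable({1, 3}))    # N-1 devices are no longer available

# Enumerate every possible two-device failure in the six-device array.
double_failures = list(combinations(DEVICES, 2))
survivable = [f for f in double_failures if mirrored_recoverable(f)]
print(f"{len(survivable)} of {len(double_failures)} double failures are survivable when mirrored")
```

In this six-device example the mirrored layout loses data only when both members of the same pair fail, whereas the parity protected layout loses data on any double failure.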
Because of the tradeoff that must be made between speed, robustness and required space when deciding whether to organize information in a particular configuration, it is advantageous to be able to allocate at least some of the array to be non-redundant, some to be mirrored, and some to be parity protected. Even more advantageous is the ability to dynamically allocate portions of an array to one format or another (i.e., transforming portions of the array from one format to another format and back again). Such conversions can be done by first copying all of the data from that portion of the array to be reformatted into a buffer (which may be either within or outside the array) and then reformatting that portion of the array and copying the data back in the new format. However, the process of copying the data to a buffer requires a considerable amount of overhead.