The present invention deals with storing data on magnetic discs. More particularly, the present invention deals with storing information on a disc array which includes a plurality of magnetic disc drives.
A typical magnetic disc drive includes one or more magnetic discs, a transducer supported by a hydrodynamic air bearing which flies above each magnetic disc, and a drive controller for controlling the disc drive based on commands received from a host system. The drive controller controls the disc drive to retrieve information from the magnetic discs and to store information on the magnetic discs.
Information is typically stored on the magnetic discs by providing a write signal to the transducer to encode flux reversals on the surface of the magnetic disc representing the data to be stored. In retrieving data from the magnetic disc, the drive controller controls the transducer to fly above the magnetic disc, sensing the flux reversals on the magnetic disc and generating a read signal based on those flux reversals. The read signal is then decoded by the drive controller to recover the data represented by the flux reversals stored on the magnetic disc, and consequently represented in the read signal provided by the transducer.
Drive controllers for individual disc drives have typically been of one of two types. The first type is referred to as a device level controller (or a device level interface). The device level controller has a relatively small amount of intelligence: the information stored on the drive is simply read from the disc and converted to a usable form for the host system. An example of a device level controller is the ESDI controller.
A second type of controller is referred to as an intelligent interface or an intelligent controller. An intelligent interface includes a full controller which communicates with a host system. Such intelligent interfaces include the IDE and SCSI interfaces. For instance, the Elite SCSI interface manufactured by Seagate Technology Corp. presently provides intelligent performance equivalent to an original IBM PC/AT personal computer.
Since the intelligent interfaces include full controllers, they not only arrange the data to be usable by the host system, but are also suitable for implementing error correction code (ECC) information to detect certain errors which occur while reading data from the discs and to reconstruct the affected data. The number of correctable errors depends on the particular ECC information being used.
In today's disc drive industry, storage of information utilizing disc drive arrays is gaining popularity. A disc drive array includes a plurality of disc drives which are coupled to an array controller. The array controller controls operation of each of the plurality of disc drives to store information. The array controller also typically controls the disc drive array so that, should one disc drive fail, the information stored on that disc drive can be recovered using information stored on the remaining disc drives in the disc array.
Because the information stored in a disc drive array is often much more valuable than the disc drives themselves, drive arrays are often referred to as Redundant Arrays of Inexpensive Discs (RAID). Several types of RAID systems or RAID levels are known.
First level RAID is characterized by providing mirrored discs. In first level RAID, all discs in the array are duplicated. Thus, should one disc or disc drive fail, the information is not lost since that exact information is mirrored on another disc drive. This is a very expensive option for implementing a disc drive array because all of the hardware is duplicated.
Second level RAID includes a Hamming Code for error correction. In second level RAID, data is bit-interleaved across the discs of a group and check discs are added to detect and correct a single error. This has the disadvantage that if a read is directed to only a small amount of data, a full sector from each of the bit-interleaved discs in the group must still be read. Also, writing of a single unit still involves a read-modify-write cycle on all of the discs in the group.
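To make the bit-interleaved organization concrete, the following is a minimal sketch, assuming a hypothetical group of seven discs that together hold one Hamming(7,4) codeword per bit position: four data discs and three check discs, each storing one bit of every codeword. The names `encode` and `correct` are illustrative only. The syndrome computed from the check bits gives the position of a single erroneous bit, which is then flipped back.

```python
# Hypothetical sketch: one Hamming(7,4) codeword spread across seven discs,
# one bit per disc. Positions are 1-based; check bits sit at powers of two.
DATA_POSITIONS = (3, 5, 6, 7)
CHECK_POSITIONS = (1, 2, 4)


def encode(data_bits):
    """Return the 7-bit codeword (list index 0 = disc at position 1)."""
    if len(data_bits) != 4:
        raise ValueError("expected 4 data bits")
    word = [0] * 8  # index 0 unused so indices match Hamming positions
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        word[pos] = bit
    for p in CHECK_POSITIONS:
        # Check bit p gives even parity over every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    return word[1:]


def correct(codeword):
    """Fix at most one flipped bit; return (data_bits, error_position or 0)."""
    word = [0] + list(codeword)
    syndrome = 0
    for p in CHECK_POSITIONS:
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    if syndrome:
        word[syndrome] ^= 1  # the syndrome is the position of the bad bit
    return [word[pos] for pos in DATA_POSITIONS], syndrome
```

Note that recovering even a single data bit requires reading the corresponding bit from all seven discs, which is the small-read penalty described above.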
Third level RAID is characterized by having a single check disc per group of discs. In third level RAID, the extra check discs used in second level RAID for storing error correction code information are eliminated. Rather, as the data is being stored to the disc array, ECC information is appended to the data. Also, a single disc or disc drive is used to store redundant data corresponding to the data stored in the array. When reading information from the array, the ECC information is used to determine whether an error has occurred, and which disc contains the error. Then, the information on the failed disc is reconstructed by calculating the parity of the remaining good discs and comparing it bit-by-bit to the parity information that was calculated for the original full group of data and that was stored on the redundant or parity disc drive.
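The parity arithmetic for third level RAID can be sketched with bytewise exclusive OR. This is a minimal sketch, assuming each disc's contribution to a group is modeled as an equal-length `bytes` object and that the ECC has already identified the failed disc by index; `xor_blocks`, `compute_parity`, and `reconstruct` are hypothetical names. Comparing the parity of the surviving discs bit by bit with the stored parity is the same operation as XORing them, and the result is the missing disc's data.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise exclusive OR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_discs: List[bytes]) -> bytes:
    """Parity written to the check disc for the full group of data."""
    return xor_blocks(data_discs)


def reconstruct(data_discs: List[bytes], parity: bytes, failed_index: int) -> bytes:
    """Recover the failed disc: parity of the good discs XOR the stored parity."""
    survivors = [d for i, d in enumerate(data_discs) if i != failed_index]
    return xor_blocks(survivors + [parity])
```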
Fourth level RAID is characterized by being arranged so that it provides for independent reads and writes. In second and third level RAIDs, information stored in the array is spread across all of the discs in the group. Thus, any read or write operation to one disc in the group requires reading or writing all discs in the group. Fourth level RAID improves performance of small transfers by providing the ability to do more than one I/O operation per group of discs at any given time. Each data sector is no longer spread across several discs. Rather, each data sector stored in the array is kept as an individual unit on a single disc. The information stored in the array is interleaved among data discs on a sector level rather than at the bit level.
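Sector-level interleaving can be sketched as a simple address mapping. This is a minimal sketch, assuming a hypothetical group of `data_discs` data discs plus one dedicated parity disc numbered `data_discs`, with consecutive logical sectors placed round-robin across the data discs and each stripe's parity at the same sector offset on the check disc; `Location`, `raid4_map`, and `raid4_parity_location` are illustrative names, not part of any specified controller.

```python
from typing import NamedTuple


class Location(NamedTuple):
    disc: int    # index of the physical disc in the group
    sector: int  # sector offset on that disc


def raid4_map(logical_sector: int, data_discs: int) -> Location:
    """Place logical sectors round-robin, one whole sector per data disc."""
    if data_discs < 1 or logical_sector < 0:
        raise ValueError("need at least one data disc and a non-negative sector")
    stripe, disc = divmod(logical_sector, data_discs)
    return Location(disc=disc, sector=stripe)


def raid4_parity_location(logical_sector: int, data_discs: int) -> Location:
    """Parity for every stripe lives on the single dedicated check disc."""
    stripe = logical_sector // data_discs
    return Location(disc=data_discs, sector=stripe)
```

Because each logical sector maps to a single data disc, two reads whose sectors land on different discs can be serviced independently; every write, however, also touches the one parity disc.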
In fifth level RAID, both the data to be stored to the array, as well as the parity or redundant data, is spread over all discs in a group. Thus, there is no single check disc. While fourth level RAID allowed more than one read to be performed per group at any given time, it was still limited to one write per group since each write requires accessing the check disc. Fifth level RAID distributes the data and check information per sector across all the discs, including the check discs. Therefore, fifth level RAID can support multiple individual write operations per group. Since the check information for each sequential location is on a different disc in the group, the write operations can be performed in parallel since there is no need to sequentially access any one disc at a given time.
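The rotation of check information in fifth level RAID can likewise be sketched as a mapping. This is a minimal sketch, assuming a hypothetical group of `discs` discs in which each stripe holds `discs - 1` data sectors and one parity sector, and assuming the parity sector simply advances one disc per stripe; real controllers use various rotation patterns, and `raid5_map` is an illustrative name.

```python
from typing import Tuple


def raid5_map(logical_sector: int, discs: int) -> Tuple[int, int, int]:
    """Return (data_disc, parity_disc, stripe) for a logical sector."""
    if discs < 3:
        raise ValueError("a parity group needs at least three discs")
    data_per_stripe = discs - 1
    stripe, offset = divmod(logical_sector, data_per_stripe)
    parity_disc = stripe % discs  # check information moves one disc per stripe
    # Data fills the remaining discs in order, skipping the parity disc.
    data_disc = offset if offset < parity_disc else offset + 1
    return data_disc, parity_disc, stripe
```

With five discs, logical sectors 0 and 11 map to discs {1, 0} and {4, 2} (data and parity), so their writes share no disc and can proceed in parallel. With a single dedicated check disc, by contrast, every write would touch that disc.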
While the above discussion has provided an overview of some of the main differences between the different level RAID systems, a more detailed description of those differences along with illustrative examples is provided in the article entitled A CASE FOR REDUNDANT ARRAYS OF INEXPENSIVE DISCS (RAID), by Patterson, Gibson, and Katz, incorporated herein by reference.
Because of the characteristic differences between RAID 3 and RAID 4 and 5 type systems, the different systems are particularly well suited to different needs. The RAID 3 system has typically been suitable for, and demonstrated superior performance in, array systems which are required to exhibit a high data transfer rate. RAID 4 and 5 systems, on the other hand, typically demonstrate superior performance in disc arrays which are used in high aggregate input/output (I/O) applications. Such implementations are often found in business applications or with many UNIX users.
One problem with all array products, regardless of the RAID level, is that the array controller is called upon to perform many time consuming array support functions. For example, in a traditional write request wherein data is to be written to a target data sector, the typical RAID 5 controller performs the following steps:
1. Read the old data from the target sector;
2. Write the new data into the target sector;
3. Read parity data stored in a parity sector which corresponds to the target sector;
4. Exclusive OR the old data retrieved from the target sector with the new data written to the target sector;
5. Exclusive OR the result of step 4 above with the old parity data to provide new parity data; and
6. Write the new parity data from step 5 into the parity sector.
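The six steps above reduce to two exclusive OR operations and four disc accesses. This is a minimal sketch, assuming discs are modeled in memory as lists of equal-length `bytes` sectors and that the caller already knows which disc holds the target sector and which holds its parity; `xor` and `raid5_small_write` are hypothetical names. The function returns the number of disc I/O operations it issued.

```python
from typing import List


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equal-length sectors."""
    if len(a) != len(b):
        raise ValueError("sectors must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_small_write(discs: List[List[bytes]], data_disc: int, parity_disc: int,
                      sector: int, new_data: bytes) -> int:
    """Apply the six-step read-modify-write and return the number of disc I/Os."""
    ios = 0
    old_data = discs[data_disc][sector]        # 1. read old data
    ios += 1
    discs[data_disc][sector] = new_data        # 2. write new data
    ios += 1
    old_parity = discs[parity_disc][sector]    # 3. read old parity
    ios += 1
    delta = xor(old_data, new_data)            # 4. old data XOR new data
    new_parity = xor(delta, old_parity)        # 5. result XOR old parity
    discs[parity_disc][sector] = new_parity    # 6. write new parity
    ios += 1
    return ios
```

The identity behind steps 4 and 5 is that new parity = old parity XOR old data XOR new data, so the other data discs in the stripe never have to be read.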
This sequence is typically referred to as a read-modify-write sequence. Since large system caches in today's UNIX systems result in most reads being serviced from main memory, write operations tend to be a very high percentage of the total I/O operations in a disc array system. This has been found to be a critical problem with a RAID 5 controller (e.g., when supporting UNIX, the RAID 5 controller is heavily burdened because it must perform four I/O operations for each write request).
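The resulting cost multiplication can be expressed as a weighted average. This is a minimal sketch, assuming every read is a single disc I/O (no controller-level cache hits) and every write follows the four-I/O read-modify-write sequence above; `physical_ios_per_request` is an illustrative name, and the write fractions in the example are hypothetical inputs, not measurements.

```python
def physical_ios_per_request(write_fraction: float,
                             read_cost: int = 1,
                             write_cost: int = 4) -> float:
    """Average disc I/Os the array controller issues per host request."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    return (1.0 - write_fraction) * read_cost + write_fraction * write_cost


for w in (0.0, 0.5, 1.0):
    print(f"write fraction {w:.0%}: {physical_ios_per_request(w)} I/Os per request")
```

As the write fraction approaches 100 percent, the controller approaches four disc operations for every host request.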
Other array support functions traditionally performed by the array controller also significantly overburden the array controller. Such functions include a reconstruction function which is implemented when, for example, a single disc drive has failed. Upon detecting the failure, the array controller must retrieve data from all of the other disc drives, reconstruct the lost data and rebuild the entire failed disc drive one sector at a time. This imposes a significant burden on the already overtaxed RAID 5 array controller.
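The rebuild burden can be sketched as a loop over sectors. This is a minimal sketch, assuming each disc (data or parity) is modeled as a list of equal-length `bytes` sectors and that every member of the group equals the XOR of all the others, which holds for a single-parity group; `rebuild_failed_disc` is a hypothetical name. The returned count shows that every rebuilt sector costs one read from each surviving disc plus one write.

```python
from functools import reduce
from typing import List, Tuple


def rebuild_failed_disc(discs: List[List[bytes]], failed: int) -> Tuple[List[bytes], int]:
    """Rebuild discs[failed] sector by sector; return (rebuilt_sectors, disc_io_count)."""
    survivors = [d for i, d in enumerate(discs) if i != failed]
    rebuilt: List[bytes] = []
    io_count = 0
    for sector in range(len(survivors[0])):
        blocks = [d[sector] for d in survivors]
        io_count += len(blocks)  # one read from every surviving disc
        # XOR the surviving sectors byte by byte to recover the lost sector
        rebuilt.append(bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)))
        io_count += 1  # one write of the recovered sector to the replacement disc
    return rebuilt, io_count
```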
Another disadvantage with typical disc array systems is that the generic disc array model can be described as a two dimensional matrix of disc drives. This model requires a single array controller to control a large number of disc drives. Such a model can cause severe performance problems. For example, typical SCSI performance limitations and controller task burdening (i.e., the burden caused by the many concurrent events the controller must manage during heavy I/O activity) can cause significant degradation in performance. In addition, the controller is relatively expensive when only one or two rows of disc drives are attached.
Another limitation of this traditional model is that its economics are more or less fixed. For example, if one customer would like a 4+1 array (meaning 4 data disc drives and 1 parity disc drive), that customer must find a controller which has that particular organization. If the customer wants an 8+1 array, only those controllers having the appropriate organization are acceptable. The greater the number of interfaces, the more attractive the array because the cost of the parity drive is then amortized over more data drives. This would intuitively lead to the conclusion that an array as wide as possible is desirable.
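The amortization argument reduces to a simple fraction: in an N+1 array, one drive out of N+1 holds parity. This is a minimal sketch using Python's standard `fractions.Fraction` to keep the ratio exact; `parity_overhead` is an illustrative name.

```python
from fractions import Fraction


def parity_overhead(data_drives: int, parity_drives: int = 1) -> Fraction:
    """Fraction of raw capacity consumed by parity in a data+parity array."""
    if data_drives < 1 or parity_drives < 0:
        raise ValueError("need at least one data drive")
    return Fraction(parity_drives, data_drives + parity_drives)


for n in (4, 8, 16):
    share = parity_overhead(n)
    print(f"{n}+1 array: parity overhead {share} ({float(share):.1%})")
```

For the 4+1 and 8+1 organizations mentioned above, the parity share of raw capacity is 1/5 (20 percent) and 1/9 (about 11 percent), respectively.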
However, there are two significant disadvantages which get worse as the array gets wider. The first is that in order to support a wide array, the cost and complexity of the controller increase and the granularity becomes unacceptable. That is, the minimum number of discs that must be added to the matrix (one row) represents so much storage capacity that the minimum capacity increments are both unwieldy and quite expensive.
The second disadvantage is that speed suffers greatly with a wide array. As more discs get added to the array, the performance gets progressively worse. The I/O levels that must be supported in a typical array have been aggravated by huge memory caches. Operation of these caches results in very high write percentages, which, in turn, cause a multiplication of the number of I/O operations that an array controller must perform. This is due to the fact that a RAID 5 subsystem must read old data and parity sectors and rewrite them to complete what would have been a simple write operation in a standard disc subsystem.
The combination of the increased percentage of write operations and the complexity of the array controller causes such an array controller to be far too expensive with one or two rows of disc drives, and far too slow even with few disc drives, but especially with multiple rows of disc drives.