1. Technical Field
The present invention relates in general to an improved data storage system and in particular to an improved RAID data storage system. Still more particularly, the present invention relates to a method and system for improved efficiency of parity calculation in a RAID data storage system.
2. Description of the Related Art
As the performance of microprocessor and semiconductor memory technology increases, there is a need for improved magnetic disk storage technology with comparable performance enhancements. Unfortunately, the performance of newer processor and semiconductor memory technologies has outpaced that of magnetic disk storage technology. In 1988, however, a paper was published by Patterson, Gibson, and Katz: A Case for Redundant Arrays of Inexpensive Disks (RAID), International Conference on Management of Data, pgs. 109-116, June 1988. This paper laid the foundation for the use of redundant arrays of inexpensive disks that would not only significantly improve the data transfer rate and data I/O rate over a comparable single disk access, but would also provide error correction redundancy and lower cost. In the paper noted above, the authors describe "levels" of RAID systems (RAID-1 through RAID-5), which are described below. Since that time other RAID levels have also been described, two of which are also briefly noted below (RAID-0 and RAID-6).
Most RAID systems incorporate both redundancy and some form of data interleaving, which distributes the data over all of the data disks in the array. Redundancy is usually in the form of an error correcting code, with simple parity schemes predominating. However, RAID-1 uses a "mirroring" redundancy scheme in which duplicate copies of the same data are stored on two separate disks in the array. Parity and other error correcting codes are either stored on one or more disks dedicated for that purpose only, or distributed over all of the disks within an array. Data interleaving is usually in the form of data "striping" in which the data to be stored is broken down into blocks called "stripe units" which are then distributed across the data disks. A typical size of a stripe unit is 8K through 64K bytes. A "stripe" is a group of corresponding stripe units, one stripe unit from each disk in the array. Thus, the "stripe size" is equal to the size of a stripe unit times the number of data disks in the array. Data interleaving may also be accomplished on a bit-by-bit basis, such as is described in more detail below with regard to RAID-3. Seven levels of RAID (RAID-0 through RAID-6) will now be described.
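The stripe terminology above may be illustrated with a minimal sketch. The function names are hypothetical, and the round-robin placement of consecutive stripe units on consecutive data disks is an assumption made for illustration; the description above states only that stripe units are distributed across the data disks.

```python
def stripe_size(stripe_unit_bytes, num_data_disks):
    """Stripe size = size of one stripe unit times the number of data disks."""
    return stripe_unit_bytes * num_data_disks


def locate_stripe_unit(logical_unit, num_data_disks):
    """Map a logical stripe-unit index to (stripe number, data disk index).

    Assumes simple round-robin placement: unit 0 on disk 0, unit 1 on
    disk 1, and so on, wrapping to the next stripe after the last disk.
    """
    stripe, disk = divmod(logical_unit, num_data_disks)
    return stripe, disk


if __name__ == "__main__":
    # Four data disks with 64K-byte stripe units.
    print(stripe_size(64 * 1024, 4))
    print(locate_stripe_unit(5, 4))
```

Under these assumptions, a request that fits within one stripe unit touches a single data disk, while a request of a full stripe size touches every data disk once.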
RAID-0 utilizes data striping, but does not use redundancy. RAID-0 has a lower cost than any other RAID level, and its write performance is the best because there is no writing of redundant information. The primary disadvantage of RAID-0 is its lack of redundancy. Consequently, any single disk failure in the array results in lost data.
RAID-1 uses mirroring in which identical data is stored on two disks. An advantage of RAID-1 is that it is simple to implement in software. RAID-1 is also error correcting because complete recovery is possible from the failure of any one disk drive by simply switching to the drive that contains the duplicate copy of the data. After replacing the defective drive, the data on the duplicate drive can be recopied to the replacement drive. When servicing two or more requests to read data that is stored on the same disk, RAID-1 has a faster read rate than RAID-0 because one request can be serviced from the first disk, and the second request can be simultaneously serviced by the duplicate disk. A disadvantage of RAID-1 is that it is expensive because it requires two times the number of drives necessary to store the same data. Thus, its efficiency is always 1/2. The necessity of making duplicate copies of all data also makes this RAID level somewhat slow to write data.
RAID-2 uses error correcting codes such as those found in error correcting semiconductor memory systems.
RAID-3 uses a separate parity disk to store error correcting parity information and a plurality of data disks that contain bit interleaved data information. Unlike semiconductor memories, a faulty disk drive is usually easily identified because disk drives and their associated controllers typically contain sophisticated error detecting mechanisms that can quickly identify a failed drive. Consequently, if a single data drive has failed, the contents of the failed drive can be easily reconstructed using the information from the "good" data drives plus the parity drive. Conceptually, the reconstruction of a specific bit of a failed drive could be accomplished by calculating the parity of the corresponding bit of each of the "good" drives and then comparing it to the corresponding bit of the parity drive. For example, if the parity of the first bit of each of the "good" drives is a logical 0, and the first bit of the parity drive is a logical 1, then the first bit of the failed drive must have been a logical 1 (because the parity of the first bit of all the data drives must equal logical 1, in this example). Mathematically speaking, the data on the failed disk can be calculated by starting with the parity information from the parity drive and subtracting, modulo two, the corresponding information on the "good" data drives. If, on the other hand, the parity drive fails, parity is easily reconstructed from all the data drives.
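The reconstruction described above may be sketched as follows. Subtraction modulo two is the same operation as exclusive-OR (XOR), so the sketch XORs the parity with the contents of the surviving drives. The helper name xor_blocks and the sample byte values are hypothetical, and drive contents are modeled as short byte strings for illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR (addition modulo two) of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives and the parity computed over them.
data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x01"]
parity = xor_blocks(data)

# Suppose drive 1 fails: rebuild it from the "good" drives plus parity.
survivors = [data[0], data[2]]
rebuilt = xor_blocks(survivors + [parity])

# If instead the parity drive fails, parity is recomputed from all data drives.
recomputed_parity = xor_blocks(data)
```

Because XOR is its own inverse, the same routine serves both to generate parity and to recover any single missing member of the group.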
For this RAID level, data is bit interleaved on the data disks. For example, a basic RAID-3 system in which data is organized in 8 bit bytes and having 8 data disks and one parity disk would store the first bit of every byte on the first disk, the second bit of every byte on the second disk, and so on. Thus, a write request simultaneously accesses all 8 data disks plus the parity disk, while a read request accesses all 8 data disks. Consequently, the data rate, which is the rate at which data can be written to or read from sequential locations on the disk without head repositioning, is very high for RAID-3. A primary disadvantage of this RAID level is that it only permits one request to be serviced at any one time. RAID-3 systems also have relatively low I/O rates, the I/O rate being the rate at which data can be written to random locations on the disk, which requires frequent head repositioning.
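The 8 data disk example above may be sketched as follows. The function names are hypothetical, and treating the "first bit" of a byte as its most significant bit is an assumption; the description does not specify bit ordering. The parity disk is modeled as holding the modulo-two sum of the eight bits of each byte.

```python
def split_bits(data):
    """Distribute each byte over 8 data disks, one bit per disk.

    Disk 0 receives the first (here assumed most significant) bit of
    every byte, disk 1 the second bit, and so on.
    """
    disks = [[] for _ in range(8)]
    for byte in data:
        for i in range(8):
            disks[i].append((byte >> (7 - i)) & 1)
    return disks


def parity_bits(disks):
    """Parity disk contents: one parity bit per byte position."""
    return [sum(bits) % 2 for bits in zip(*disks)]


disks = split_bits(b"\x80\x01")
parity = parity_bits(disks)
```

Every byte read or written involves all eight data disks, which is why only one request can be serviced at a time under this arrangement.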
RAID-4 also uses a separate parity disk to store error correcting parity information and a plurality of data disks that contain interleaved data information. Unlike RAID-3, in which data is bit interleaved across the data disks, RAID-4 uses block interleaving or data striping, which is described in more detail above.
The performance of RAID-4 is particularly dependent on the type of access requested, read or write, and the size of the requested access relative to the size of the stripe unit and the size of the stripe. A request to read a block of data that is contained entirely within one stripe unit can be quickly serviced as soon as the disk drive containing the requested data becomes available. Consequently, multiple requests to read various blocks of data, each of which is entirely contained within one stripe unit on a different data drive, can be serviced simultaneously. In contrast, a RAID-3 system must service multiple requests serially, and if head repositioning is required between the servicing of each request, the performance of a RAID-3 system will be dramatically slower than a RAID-4 system for this type of access. A read operation of stripe size data blocks can also be very fast in RAID-4, particularly if scheduling permits all data disks to be accessed at one time.
A request to write data to a single stripe unit can be a relatively slow process, because it requires four disk accesses. Specifically, a data write to a single stripe unit requires that the old data and corresponding parity information be read from the appropriate data disk and the parity disk. Next, new parity information is computed using the old data, the new data and the old parity. Finally, the new data and the new parity are written to the data and parity disks, respectively. Requests for multiple writes to various stripe units located on different drives and in different stripes are even more problematic, because each write requires a read and write operation to the parity disk and, since there is only one parity disk, it can become "bottlenecked." Writing an entire stripe of data is much easier because no read operations are required and the parity for the new stripe of data is easily computed.
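The four disk accesses of a single stripe unit write may be sketched as follows. The names xor_bytes and small_write are hypothetical, and each disk is modeled as a Python list indexed by stripe number, an assumption made purely for illustration. The new parity is the old parity XOR the old data XOR the new data, so the other data disks need not be read.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disk, parity_disk, stripe, new_data):
    """Update one stripe unit and its parity using four disk accesses."""
    old_data = data_disk[stripe]        # access 1: read old data
    old_parity = parity_disk[stripe]    # access 2: read old parity
    # Remove the old data's contribution, then add the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    data_disk[stripe] = new_data        # access 3: write new data
    parity_disk[stripe] = new_parity    # access 4: write new parity
    return new_parity
```

Two of the four accesses always go to the parity disk, which illustrates why a single dedicated parity disk limits concurrent small writes.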
RAID-5 is similar to RAID-4 in that it interleaves data by stripe units across the various disk drives, and also stores error correcting parity information. In RAID-5, however, there is no dedicated parity disk as there is in RAID-3 and RAID-4. Instead, RAID-5 distributes parity across all the disk drives, thereby eliminating the parity disk bottleneck problem described above with regard to certain write operations of RAID-4 systems. Furthermore, because RAID-5 distributes data over all the disks, and RAID-4 only distributes data over the data disks (the number of which is equal to the total number of disks minus the number of parity disks), RAID-5 has a slight performance advantage over RAID-4. With these performance enhancements, RAID-5 is usually preferred over RAID-4 and, consequently, most RAID-4 systems have disappeared from the market to be replaced by RAID-5 systems.
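Distribution of parity across all the drives may be sketched as follows. The particular rotation used here, in which the parity stripe unit moves one disk to the left on each successive stripe, is an assumption chosen for illustration; the description above does not prescribe a specific placement. The function names are hypothetical.

```python
def parity_disk_for_stripe(stripe, num_disks):
    """Disk holding parity for a stripe, under an assumed rotation."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes, num_disks):
    """One row per stripe: 'P' marks the parity unit, 'D' a data unit."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk_for_stripe(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


if __name__ == "__main__":
    for row in layout(4, 4):
        print(" ".join(row))
```

With this rotation, small writes to different stripes update parity on different drives, so no single drive carries all of the parity traffic.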
RAID-6 is similar to RAID-5 in that it interleaves data in stripe units and distributes parity information across all the disk drives. A disadvantage of RAID-5, as well as RAID-3 and RAID-4, is its inability to correct a failure in more than a single disk. As the size of disk arrays increases, however, the probability of a failure in more than one drive also increases which, in turn, increases the chance of an unrecoverable failure. To overcome the problem of a two disk failure, RAID-6 uses Reed-Solomon codes in a P+Q redundancy scheme that can recover from a failure of any two disks. One disadvantage of RAID-6 relative to RAID-5, however, is in the write performance for small amounts of data. Such writes are already slow in RAID-5 because four disk accesses are required. For such small write operations, RAID-6 is even more inefficient because it requires a total of six accesses to update both the "P" and "Q" information.
From a review of the above it may be seen that, in larger arrays of disks, the calculation of parity for a data stripe, which is necessary upon the updating of data within that stripe, can be substantially time consuming. It would thus be desirable to provide a method and system for improved efficiency of parity calculation in a RAID data storage system.