With the accelerating growth of Internet and intranet communication, high-bandwidth applications (such as streaming video), and large information databases, the need for networked storage systems has increased dramatically. System performance, data protection, and cost have been some of the main concerns in designing networked storage systems. In the past, many systems have used fibre channel drives, because of their speed and reliability. However, fibre channel drives are also costly. Integrated drive electronics (IDE) drives are much cheaper in terms of dollars per gigabyte of storage; however, their reliability is inferior to that of fibre channel drives. Furthermore, IDE drives require cumbersome 40-pin cable connections and are not easily replaceable when a drive fails. Serial advanced technology attachment (ATA) drives that use the same receptor as their fibre channel counterparts are now available. Serial ATA drives have the speed required for acceptable system performance and are hot-swappable, meaning that failed drives are easily replaced with new ones. Furthermore, they provide more storage than fibre channel drives and at a much lower cost. However, serial ATA drives still do not offer the same reliability as fibre channel drives. Thus, there is an industry push to develop high-capacity storage devices that are low cost and reliable.
To improve data reliability, many computer systems implement a redundant array of independent disks (RAID) system, which is a disk system that includes a collection of multiple disk drives organized into a disk array and managed by a common array controller. The array controller presents the array to the user as one or more virtual disks. Disk arrays are the framework to which RAID functionality is added in functional levels, in order to produce cost-effective, available, high-performance disk systems.
In RAID systems, stored data is distributed over multiple disk drives in order to allow parallel operation to thereby enhance disk access performance and to provide fault tolerance against drive failures. Currently, a variety of RAID levels from RAID level 0 through RAID level 6 have been specified in the industry. RAID levels 1 through 5 provide a single drive fault tolerance. That is, these RAID levels allow reconstruction of the original data if any one of the disk drives fails. It is possible, however, that more than one serial ATA drive may fail in a RAID system. For example, dual drive failures are becoming more common as RAID systems incorporate an increasing number of less expensive disk drives.
To provide, in part, a dual fault tolerance to such failures, the industry has specified a RAID level 6. The RAID 6 architecture is similar to RAID 5, but RAID 6 can overcome the failure of any two disk drives by using an additional parity block for each row (for a storage loss of 2/N, where N represents the total number of disk drives in the system). The first parity block (P) is calculated by performing an exclusive OR (XOR) operation on a set of assigned data chunks. Likewise, the second parity block (Q) is generated by using the XOR function on a set of assigned data chunks. When a pair of disk drives fails, the conventional dual-fault tolerant RAID systems reconstruct the data of the failed drives by using the parity sets. These RAID systems are well known in the art and are amply described, for example, in The RAIDbook, 6th Edition: A Storage System Technology Handbook, edited by Paul Massiglia (1997), which is incorporated herein by reference.
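The XOR-based row parity described above can be illustrated with a minimal sketch. The chunk values and the helper name `xor_blocks` are hypothetical, chosen only for illustration; the sketch covers the P parity and recovery from a single lost chunk, not the full dual-fault recovery.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data chunks that share one row (assumed values).
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]

# P parity is the XOR of every data chunk in the row.
p = xor_blocks(data)

# If chunk 1 is lost, XOR-ing the surviving chunks with P restores it.
rebuilt = xor_blocks([data[0], data[2], p])
```

Because XOR is its own inverse, any single missing chunk in a row equals the XOR of the remaining chunks and P. Recovering from two simultaneous failures needs the second parity block (Q) as well.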
An exemplary dual parity scheme performs an XOR operation on horizontal rows of drive sectors, in order to generate P parity, and then performs an XOR operation on diagonal patterns of sectors, in order to create Q parity. In general, these systems require a prime number of drives and a prime number of sectors per drive in order to operate. For example, Table 1 shows the process of performing an XOR operation for both the horizontal P parity calculation and the diagonal Q parity calculation in an 11+2 disk configuration, where disk 11 is the P parity disk and disk 12 is the Q parity disk. Note that there are 11 sectors per disk; the number of sectors per disk is equal to the number of data drives.
TABLE 1: Process of performing an XOR operation for both the horizontal P parity calculation and the diagonal Q parity calculation
In the example in Table 1, P parity is calculated by performing an XOR operation on the data on sector 0 of disk 0 and the data on sector 0 of disk 1. An XOR operation is further performed on the interim result of the first operation and the data on sector 0 of disk 2 and so on, until sector 0 of the final data disk, disk 10, has been processed. The final result is stored in sector 0 of the P parity disk 11 (parity may be rotating and is, therefore, stored in a special row across multiple disks). For the Q parity, an XOR operation is performed on the data on the third sector of drive 0 and the data of the second sector of drive 1. An XOR operation is further performed on the result of the first operation and the first sector of drive 2. The process repeats for the eleventh sector of drive 3, the tenth sector of drive 4, and so on, through the fourth sector of drive 10. The final result is stored in the third sector of the Q parity disk 12. This completes an entire diagonal across a prime number of drives, divided into an equal prime number of rows.
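The row and diagonal walk described above can be sketched as follows. This is an illustrative sketch only: it uses zero-based sector indices (so the "third sector" is index 2), it assumes that every diagonal follows the same wrap-around pattern as the one traced in the text, and the names `xor_blocks`, `p_parity`, `q_diagonal`, and `q_parity`, as well as the sector contents, are hypothetical.

```python
from functools import reduce

N = 11  # number of data drives, and of sectors per drive (11+2 configuration)

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def p_parity(disks, row):
    """Horizontal P parity: XOR of the same sector index on every data drive."""
    return xor_blocks([disks[d][row] for d in range(N)])

def q_diagonal(q_row):
    """(drive, sector) pairs on the diagonal whose parity lands in sector q_row.

    Assumption: the sector index drops by one per drive and wraps modulo N,
    matching the single diagonal walked through in the text.
    """
    return [(d, (q_row - d) % N) for d in range(N)]

def q_parity(disks, q_row):
    """Diagonal Q parity: XOR of the sectors along one diagonal."""
    return xor_blocks([disks[d][s] for d, s in q_diagonal(q_row)])

# Hypothetical 4-byte sector contents, just to make the sketch executable.
disks = [[bytes([d * N + s] * 4) for s in range(N)] for d in range(N)]

p0 = p_parity(disks, 0)  # would be stored in sector 0 of the P disk
q2 = q_parity(disks, 2)  # would be stored in the third sector of the Q disk
```

Under this assumed wrap rule, the 11 diagonals together cover each of the 121 data sectors exactly once, which is one reason such schemes pair the number of sectors per drive with the number of data drives.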
An exemplary dual parity algorithm is found in U.S. Pat. No. 6,453,428, entitled, “Dual-drive Fault Tolerant Method and System for Assigning Data Chunks to Column Parity Sets.” The '428 patent describes a method of and system for assigning data chunks to column parity sets in a dual-drive fault tolerant storage disk drive system that has N disk drives, where N is a prime number. Each of the N disk drives is organized into N chunks, such that the N disk drives are configured as one or more N×N arrays of chunks. Each array has chunks arranged in N rows, from row 1 to row N, and in N columns, from column 1 to column N. Each row includes a plurality of data chunks for storing data, a column parity chunk for storing a column parity set, and a row parity chunk for storing a row parity set. These data chunks are assigned in a predetermined order. The data chunks in each row are assigned to the row parity set. Each column parity set is associated with a set of data chunks in the array, wherein row m is associated with column parity set Qm, where m is an integer that ranges from 1 to N. For row 1 of a selected N×N array, a first data chunk is assigned to a column parity set Qi, wherein i is an integer determined by rounding down (N/2). For each of the remaining data chunks in row 1, each data chunk is assigned to a column parity set Qj, wherein j is an integer one less than the column parity set for the preceding data chunk, and wherein j wraps to N when j is equal to 0. For each of the remaining rows 2 to N of the selected array, a first logical data chunk is assigned to a column parity set Qk, wherein k is one greater than the column parity set for the first logical data chunk in a preceding row, and wherein k wraps to 1 when k is equal to (N+1).
For each of the remaining data chunks in rows 2 to N, each data chunk is assigned to a column parity set Qn, wherein n is an integer one less than a column parity set for the preceding data chunk, and wherein n wraps to N when n is equal to 0.
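The assignment rules summarized above can be sketched as follows. This is a reading of the summary given here, not the '428 patent's own implementation; the function name `assign_column_parity_sets` is hypothetical, and the default of N-2 data chunks per row (one chunk each reserved for row and column parity) is an assumption, as is leaving the physical position of the parity chunks unspecified.

```python
def assign_column_parity_sets(n, data_per_row=None):
    """Return, for rows 1..n, the column parity set index of each data chunk.

    n is the (prime) number of drives. data_per_row defaults to n - 2,
    which is an assumption about how many chunks per row hold data.
    """
    if data_per_row is None:
        data_per_row = n - 2
    rows = []
    first = n // 2  # row 1 starts at Q(floor(N/2))
    for _ in range(n):
        row = []
        q = first
        for _ in range(data_per_row):
            row.append(q)
            # Each following chunk uses the set one lower, wrapping 0 -> N.
            q = q - 1 if q > 1 else n
        rows.append(row)
        # Each following row starts one higher, wrapping N+1 -> 1.
        first = first + 1 if first < n else 1
    return rows

# Example with N = 5 drives (the smallest prime that leaves several data chunks).
layout = assign_column_parity_sets(5)
```

For N = 5, the first row is assigned sets 2, 1, 5 and the last row sets 1, 5, 4, showing both wrap-around rules in action.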
The algorithm described in the '428 patent safeguards against losing data in the event of a dual drive failure. However, performing the algorithm described uses excess processing cycles that may otherwise be utilized for performing system storage tasks. Hence, the '428 patent describes a suitable dual parity algorithm for calculating dual parity and for restoring data from a dual drive failure, yet it fails to provide an optimized hardware system capable of performing the dual parity algorithm without affecting system performance. When one data sector changes, multiple Q parity sectors also need to change. If the data chunk size is equal to one or more sectors, it leads to system inefficiencies for random writes. Because parity calculations operate on an entire sector of data, each sector is read into a buffer. As the calculations continue, it may be necessary to access the buffer several times to reacquire sector data, even if that data had been used previously in the parity generation hardware. There is, therefore, a need for an effective means of calculating parity, such that the storage system is fault tolerant against a dual drive failure, provides optimal performance by improving buffer bandwidth utilization, and is capable of generating parity or regenerating data at wire speed for differing data sector sizes.
Therefore, it is an object of the present invention to provide hardware and software protocols that enable wire speed calculation of dual parity and regenerated data.
It is another object of the present invention to provide a programmable dual parity generator and data regenerator that supports both RAID 5 and RAID 6 architectures.
It is yet another object of the present invention to provide a programmable dual parity generator and data regenerator that operates independently of stripe size or depth.
It is yet another object of the present invention to provide a programmable dual parity generator and data regenerator that operates independently of sector order (i.e., order in which sectors are read from and written to the buffer).