With the accelerating growth of Internet and intranet communication, high-bandwidth applications (such as streaming video), and large information databases, the need for networked storage systems has increased dramatically. System performance, data protection, and cost have been some of the main concerns in designing networked storage systems. In the past, many systems have used fibre channel drives, because of their speed and reliability. However, fibre channel drives are very costly. Integrated drive electronics (IDE) drives are much cheaper in terms of dollars-per-gigabyte of storage; however, their reliability is inferior to that of fibre channel drives. Furthermore, IDE drives require cumbersome 40-pin cable connections and are not easily replaceable when a drive fails. Serial advanced technology attachment (SATA) drives that use the same receptor as their fibre channel counterparts are now available. These drives, therefore, have the speed required for acceptable system performance and are hot-swappable, which means that failed SATA drives are easily replaced with new ones. Furthermore, they provide more storage than do fibre channel drives and at a much lower cost. However, SATA drives still do not offer the same reliability as fibre channel drives. Thus, there is an industry push to develop high-capacity storage devices that are low cost and extremely reliable.
To improve data reliability, many computer systems implement a redundant array of independent disks (RAID) system, which is a disk system that includes a collection of multiple disk drives that are organized into a disk array and managed by a common array controller. The array controller presents the array to the user as one or more virtual disks. Disk arrays are the framework to which RAID functionality is added, in functional levels, in order to produce cost-effective, highly available, high-performance disk systems.
In RAID systems, the host data and check data (computed based on host data) are distributed over multiple disk drives in order to allow parallel operation and thereby enhance disk access performance and provide fault tolerance against drive failures. Currently, a variety of RAID levels from RAID level 0 through RAID level 6 have been specified in the industry. RAID level 5 provides single-drive fault tolerance. That is, this RAID level allows reconstruction of the original data if any one of the disk drives fails. It is possible, however, that more than one SATA drive may fail in a RAID system. Current RAID 5 recovery algorithms are not sufficient to recover all data from a RAID system failure that involves more than one drive.
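By way of illustration only, the single-drive fault tolerance described above can be sketched in a few lines of Python. The helper name `xor_blocks`, the three-drive stripe, and the sample byte values are assumptions made for this example and are not taken from any RAID specification; the sketch shows only that a single missing block equals the XOR of the surviving blocks and the parity block.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across three data drives; the contents are made-up sample bytes.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The check data for the stripe is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate the loss of drive 1 and rebuild it from the survivors plus parity.
failed = 1
survivors = [blk for i, blk in enumerate(data) if i != failed]
rebuilt = xor_blocks(survivors + [parity])
```

If a second block of the same stripe were also lost, the single parity equation would contain two unknowns and could not be solved, which is the limitation noted above.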
To provide, in part, dual-fault tolerance to such failures, the industry has specified a RAID level 6. The RAID 6 architecture is similar to RAID 5, but RAID 6 can overcome the failure of any two disk drives by using an additional parity block (for a storage loss of 2/N, where N is the number of disk drives). The first parity block (P) is calculated by performing an exclusive-OR (XOR) operation on a set of positionally assigned data sectors (i.e., rows of data sectors). Likewise, the second parity block (Q) is generated by applying the XOR function to a different set of positionally assigned data sectors (i.e., columns of data sectors). When a pair of disk drives fails, conventional dual-fault-tolerant RAID systems reconstruct the data of the failed drives by using the parity sets. These RAID systems are well known in the art and are amply described, for example, in The RAIDbook, 6th Edition: A Storage System Technology Handbook, edited by Paul Massiglia (1997), which is incorporated herein by reference.
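The row and column parity sets described above may be pictured with the following minimal Python sketch. The 3x3 grid of two-byte sectors, the names `grid`, `P`, and `Q`, and the choice of lost sectors are all hypothetical. The sketch does not model how sectors are mapped onto physical drives in an actual RAID 6 layout; it shows only that two sectors lost from the same row, which the row parity alone cannot separate, can each be recovered from a column parity set.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 3x3 grid of 2-byte data sectors (sample values only).
grid = [
    [b"\x01\x02", b"\x03\x04", b"\x05\x06"],
    [b"\x11\x12", b"\x13\x14", b"\x15\x16"],
    [b"\x21\x22", b"\x23\x24", b"\x25\x26"],
]

# P: one parity block per row; Q: one parity block per column.
P = [xor_blocks(row) for row in grid]
Q = [xor_blocks(col) for col in zip(*grid)]

# Simulate the loss of two sectors that share row 0.
lost = [(0, 0), (0, 2)]
damaged = [list(row) for row in grid]
for r, c in lost:
    damaged[r][c] = None

# P[0] has two unknowns, but each affected column has only one,
# so each lost sector is rebuilt from its column parity.
for r, c in lost:
    known = [damaged[i][c] for i in range(len(damaged)) if i != r]
    damaged[r][c] = xor_blocks(known + [Q[c]])
```

Once both sectors are rebuilt, the row parity `P[0]` can serve as a consistency check on the recovered row.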
An exemplary multiple drive failure algorithm is found in U.S. Pat. No. 6,694,479, entitled, “Multiple drive failure recovery for a computer system having an array of storage drives.” The '479 patent describes a method of and related system for generating error correction or parity information in a multiple disk computer system that supports multiple drive-fault tolerance. The method involves defining parity equations that are based not only on data written to drives of the computer system, but also on other parity information, such that, in solving for missing data, specific equations need not be used. Defining parity equations in this manner, in combination with a coefficient matrix that defines the coefficients of the various parity equations, ensures the ability to solve for the missing data, even if some of the failed drives contain parity information.
The algorithm described in the '479 patent safeguards against the loss of data in the event of a multi-drive failure. However, the '479 patent method runs in real time, which limits the processing bandwidth available for other storage operations. Furthermore, the method described in the '479 patent does not exclusively use XOR operations on the data to regenerate data after a multiple drive failure. The recovery and encoding method described in the '479 patent requires extensive multiplication and division operations on the data. It is not solely parity based and, thus, requires additional hardware and processing cycles to recover and encode data. In addition, because the parity symbol sizes are not equivalent to the size of the data symbols, the method described in the '479 patent requires excessive processing to locate the symbols required for regenerating data and to further manipulate those symbols to enable processing.
There is, therefore, a need for an effective means of calculating parity, such that the storage system is fault tolerant against any number of drive failures; provides optimal system performance by optimizing XOR bandwidth and/or runs a priori; is capable of generating parity regardless of symbol position (i.e., is not dependent on row or diagonal/column parity); and requires only XOR operations in order to calculate parity or regenerate data.
It is therefore an object of the invention to provide an algorithm that compensates for multi-storage element failures in a networked storage system.
It is another object of this invention to provide an algorithm that compensates for multi-storage element failures in a networked storage system and that optimizes processing cycles by executing interpretive language scripts, generated offline, prior to system operation.
It is yet another object of this invention to provide an algorithm that compensates for multi-storage element failures in a networked storage system and that requires only XOR operations in order to regenerate data and calculate parity.
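Purely as an illustration of the last two objects, the separation between offline script generation and XOR-only execution might be sketched as follows. The script format (a list of target and source symbol names), the function names `build_script` and `run_script`, and the symbol labels `d0`, `d1`, `d2`, and `p` are assumptions introduced for this example; they are not the script language or the parity scheme of the invention, and the sketch covers only a single missing symbol.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def build_script(num_data, failed):
    """Offline step: emit XOR instructions for rebuilding one failed symbol.

    Each instruction is a (target, sources) pair; this format is hypothetical.
    """
    sources = [f"d{i}" for i in range(num_data) if i != failed] + ["p"]
    return [(f"d{failed}", sources)]

def run_script(script, symbols):
    """Runtime step: interpret each instruction using XOR operations only."""
    for target, sources in script:
        symbols[target] = xor_blocks([symbols[s] for s in sources])
    return symbols

# The script is generated before system operation for a chosen failure case.
script = build_script(num_data=3, failed=1)

# At run time, the surviving symbols are loaded and the script is executed.
original = {"d0": b"\x0f\xf0", "d1": b"\x33\xcc", "d2": b"\x55\xaa"}
original["p"] = xor_blocks([original["d0"], original["d1"], original["d2"]])
surviving = {name: blk for name, blk in original.items() if name != "d1"}
recovered = run_script(script, surviving)
```

In this arrangement, the work of deciding which symbols to combine is done once, ahead of time, so that the runtime path performs no multiplication, division, or equation solving.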