Modern, high-capacity data storage systems often utilize a plurality of physical disk drives for redundant storage of data. This arrangements speeds data access as well as protecting against data loss that might result from the failure of any single disk.
There are two common methods of storing redundant data. According to the first or “mirror” method, data is duplicated and stored on two separate areas of the storage system. In a disk array, for example, identical data is stored on two separate disks. This method has the advantages of high performance and high data availability. However, the mirror method is also relatively expensive, effectively doubling the cost of storing data.
In the second or “parity” method, a portion of the storage area is used to store redundant data, but the size of the redundant storage area is less than the remaining storage space used to store the original data. For example, in a disk array having six disks, five disks might be used to store data, with the sixth disk being dedicated to storing redundant data, which is referred to as “parity” data. The parity data allows reconstruction of the data from one data disk, using the parity data in conjunction with the data from surviving disks. The parity method is advantageous because it is less costly than the mirror method, but it also has lower performance and availability characteristics in comparison to the mirror method.
One aspect of this invention involves storing redundant data according to parity techniques. In conventional disk arrays utilizing parity storage, the space on the storage disks are configured into multiple storage stripes, where each storage stripe extends across the storage disks. Each stripe consists of multiple segments of storage space, where each segment is that portion of the stripe that resides on a single storage disk of the disk array.
FIG. 1 illustrates a conventional disk array 12 having six storage disks 13. In this simplified example, there are five storage stripes extending across the storage disks. FIG. 1 highlights data and storage segments of a single one of these five stripes. Data segments of the indicated stripe are indicated by cross-hatching. The corresponding parity segment of this same stripe is illustrated in solid black. Generally, of the six segments comprising any given stripe, five of the segments are data segments and the sixth segment is a parity segment.
This type of parity storage is referred to as 5+1 parity storage, indicating that there are five data segments for every single parity segment. This scheme is more generally referred to as N+1 grouping, where N is the actual number of data segments in a data stripe.
N+1 redundancy grouping such as illustrated in FIG. 1 protects against the loss of any single physical storage device. If the storage device fails, its data can be reconstructed from the surviving data. The calculations performed to recover the data are straightforward, and are well-known. Generally, a single parity segment P is calculated from data segments D0 through DN−1 in accordance with the following equation:P=x0+x1+x2+xN−1where x0 through xN−1 correspond to the data from data segments D0 through DN−1. After the loss of any single data segment, its data can be recovered through a straightforward variation of the same equation.
In many systems, however, it is becoming important to protect against the loss of more than a single storage device. Thus, it is becoming necessary to implement N+2 grouping in redundant storage systems.
While N+2 redundancy grouping enhances data protection, it also involves more complex calculations—both in initially calculating parity segments and in reconstructing any lost data segments.
A general form of the N+2 parity computation is as follows:P=p0x0+p1x1+p2x2+pN−1xN−1Q=q0x0+q1x1+q2x2+qN−1xN−1where:
P is the value of a first parity segment;
Q is the value of a second parity segment;
x0 through xN−1 are the values of the data segments
p0 through pN−1 and q0 through qN−1 are constant coefficients that are particular to a given parity scheme.
These equations form a two-equation system that, by the rules of linear algebra, can potentially solve for any two unknowns xa through xb which represent the data from a single stripe of any two failed storage devices. One requirement is that the two sets of coefficients pi and qi be linearly independent. This requirement is met, for example, if p0=1, p1=1, p2=1; etc.; and q0=1, q1=2, q2=3; etc. Other examples are also possible.
The mathematics of N+2 parity are well-known and are not the primary subject of this description. However, it is apparent from the brief description given above that N+2 parity computations are significantly more complex than N+1 parity computations. In actual implementations of N+2 disk arrays, this complexity threatens to limit the data throughput of storage device controllers and, consequently, of the overall disk array.
This invention includes methods and means for maintaining adequate data throughput in spite of the added complexity resulting from N+2 parity calculations.