1. Field of the Invention
The present invention generally relates to computer systems having multiple storage drives. More specifically, the invention relates to calculating error correction or parity values used for recovery from drive failures. More particularly still, the invention relates to eight bit encryption keys for performing finite field encrypted resultant coefficient multiplication for calculating parity values.
2. Background of the Invention
Early computer systems typically had only one hard drive or fixed storage device. Even today, computer systems having a single fixed storage device or hard drive are standard for personal computer systems. However, commercial and industrial computer users require greater data stability. That is, commercial and industrial computer users want some assurance that information stored on hard drives will not be lost in spite of drive failures.
Some users ensure data stability by performing periodic backups onto tape drive systems. For example, a user may make a complete backup of their hard drive contents on a weekly basis. The user may further make copies of only the changes since the last backup, commonly known as an incremental backup, on a daily basis. However, even this method leaves open the possibility that some information may be lost if there is a failure of the hard drive between data backups. Data stability demands drove computer manufacturers to make computer systems having multiple fixed storage devices.
FIG. 1A represents one approach computer manufacturers take in storing data in a computer system having multiple hard drives. In FIG. 1A, each of the large boxes represents a hard drive in a computer system. One block of data D, being the set of data [d0,d1,d2], is divided into small subsets and distributed across the hard drives of the computer system. This configuration is commonly known as a Redundant Array of Inexpensive Disks (“RAID”), and may also be known as a Redundant Array of Independent Disks. The system exemplified in FIG. 1A is commonly known as “RAID0.” The disadvantage of the RAID0 system is that upon failure of any one of the disk drives, the overall data D cannot be recovered. FIG. 1B represents, in matrix format, the storage system of RAID0. Carrying out the matrix multiplication of FIG. 1B reveals that d0=d0, d1=d1 and d2=d2, which is mathematically uneventful but is important in other systems as described below. Compared to a single hard drive computer system, RAID0 actually increases the probability of data loss, in that a failure of any one of the drives results in a complete data loss. RAID0 does, however, exemplify an important concept in multiple disk arrays, that concept being “striping”. With reference to FIG. 1A, data D is the combination of the smaller portions of data [d0,d1,d2]. Placing small portions on each drive of a multiple drive system is known as striping. That is, data is striped across multiple drives.
Manufacturers may address the problem associated with a striped RAID0 system by “mirroring”. In a mirrored system, there are duplicate drives containing complete sets of duplicate information. For example, an array of drives may consist of four drives, data D may be striped across two drives, and likewise striped again across the other two drives. In this way, as many as two drives may fail without loss of data, so long as the two failed drives do not hold both copies of the same portion of data. Fault tolerance implemented in this configuration is known as “RAID1+0”, “RAID0+1” or “RAID10.” While RAID1+0 ensures greater data stability than RAID0 or a single disk system, the overhead associated with implementing such a system is high. In the exemplary system described, the effective storage utilization capacity of the four disk drives is only 50%. What was needed in the industry was a fault tolerance scheme with a higher storage utilization capacity, which would therefore make it less expensive to implement.
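The following sketch enumerates every two-drive failure in the four-drive example. It assumes a hypothetical layout in which D is split into `half0` and `half1`, stored on `drive0`/`drive1`, and mirrored on `drive2`/`drive3`. The drive and half names are illustrative only.

```python
from itertools import combinations

# Hypothetical four-drive RAID1+0 layout: D is striped as half0/half1 across
# drive0/drive1, and the same stripe is repeated on drive2/drive3.
LAYOUT = {
    "half0": ("drive0", "drive2"),
    "half1": ("drive1", "drive3"),
}
DRIVES = ["drive0", "drive1", "drive2", "drive3"]


def survives(failed: set) -> bool:
    """Data survives if every half still has at least one working copy."""
    return all(any(d not in failed for d in copies) for copies in LAYOUT.values())


for pair in combinations(DRIVES, 2):
    print(pair, "recoverable" if survives(set(pair)) else "DATA LOST")

# Only the unique halves count toward usable capacity.
print("utilization:", len(LAYOUT) / len(DRIVES))  # utilization: 0.5
```

Four of the six possible two-drive failures are survivable. The two that are not, (`drive0`, `drive2`) and (`drive1`, `drive3`), each remove both copies of one half.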
FIG. 2A represents a series of hard drives in a computer system that has the same number of hard drives as described with respect to mirroring; however, this specific system reaches a 75% utilization capacity. In this system the data represented by D[d0,d1,d2] is striped across the first three of the four disk drives. The system of FIG. 2A further writes error correction or parity information to the fourth disk drive. Such a system is referred to as having three data drives and one parity drive. It is noted that the use of three data drives is merely exemplary, and more or fewer data drives are possible. However, fewer data drives translate into lower storage utilization. Likewise, a greater number of parity drives translates into lower storage utilization. Indeed, with one parity drive, the storage utilization approaches, but never actually reaches, 100% as the number of data drives increases.
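The utilization figures above follow from a one-line ratio: data drives divided by total drives. This minimal sketch assumes equal-capacity drives. The function name `utilization` is hypothetical.

```python
def utilization(data_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity that holds user data (equal-size drives assumed)."""
    return data_drives / (data_drives + parity_drives)


print(utilization(3, 1))  # 0.75, the FIG. 2A arrangement
print(utilization(3, 2))  # 0.6, adding a parity drive lowers utilization
for n in (3, 7, 15, 63):
    print(n, utilization(n, 1))  # climbs toward, but never reaches, 1.0
```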
The subset of data written to the parity drive of FIG. 2A is related to the data written to each of the data drives. FIG. 2B shows the relationship, in matrix format, of each data subset written to the data drives and the value of the parity subset written to the parity drive. Carrying out the matrix multiplication of FIG. 2B reveals that d0=d0, d1=d1, d2=d2 and P=d0^d1^d2, where “^” represents the logical exclusive-OR (XOR) function. Thus, as indicated in the figure and shown above, the value of the parity subset is the XOR of each of the smaller subsets of the overall data. A system implementing the configuration of FIG. 2A, 2B is capable of recovery from a single drive failure. Loss of the parity drive does not affect stability of the data. Moreover, loss of any one of the data drives is a recoverable error, inasmuch as the data lost on the failed drive may be calculated by XORing the remaining subsets of information with the parity information. Such a fault tolerance scheme is known as “RAID4.”
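The parity relation P=d0^d1^d2, and single-drive recovery from it, can be sketched in a few lines. The sketch assumes two-byte data subsets with arbitrary sample values. The helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d0, d1, d2 = b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"
p = xor_blocks(d0, d1, d2)  # P = d0 ^ d1 ^ d2, written to the parity drive

# Suppose the drive holding d1 fails: XOR the survivors with P to rebuild it.
rebuilt_d1 = xor_blocks(d0, d2, p)
print(p.hex(), rebuilt_d1 == d1)  # b609 True
```

Recovery works because every value XORed with itself cancels. d0 ^ d2 ^ P therefore leaves only d1.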
In RAID4 systems any write operation to any of the data drives also requires a write to the parity drive. This is true even if only one of the data drives is written. In the three data drive system exemplified in FIG. 2A, data throughput is not significantly hampered by this requirement. However, as the number of data drives increases, system performance suffers as write commands to the parity drive accumulate. In computer systems requiring more than a handful of data drives, the RAID4 system is less desirable because of the throughput limitations caused by queuing of write requests at the parity drive. Manufacturers address this problem by rotating the parity drive. That is, rather than having designated data and parity drives, the particular hard drive containing the parity information shifts for each block of parity data. Such a distributed parity system is known as “RAID5.” Although parity information is written for each write of a subset of data, no one hard drive becomes the receptacle for all those parity writes. In this way, system throughput is not limited by one parity drive having numerous writes of parity data stacked in its input queue.
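The effect of rotating the parity drive can be shown by counting parity writes per drive. This sketch assumes one parity write per stripe and a hypothetical rotation in which parity starts on the last drive and moves one drive to the left on each stripe. Actual RAID5 layouts vary by implementation.

```python
from collections import Counter


def raid4_parity_drive(stripe: int, num_drives: int) -> int:
    return num_drives - 1  # one dedicated parity drive for every stripe


def raid5_parity_drive(stripe: int, num_drives: int) -> int:
    # Hypothetical rotation: parity starts on the last drive and moves one
    # drive to the left on each successive stripe.
    return (num_drives - 1 - stripe) % num_drives


def parity_writes(layout, num_stripes: int, num_drives: int) -> Counter:
    """Count how many parity writes each drive receives (one per stripe)."""
    return Counter(layout(s, num_drives) for s in range(num_stripes))


print(sorted(parity_writes(raid4_parity_drive, 8, 4).items()))  # [(3, 8)]
print(sorted(parity_writes(raid5_parity_drive, 8, 4).items()))  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```

Under RAID4, every parity write queues at drive 3. Under the rotated layout, the same eight parity writes spread evenly across all four drives.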
The disk arrays discussed to this point may each be desirable in particular systems. That is to say, a RAID5 system may be overkill for an application where there is a somewhat limited amount of data to be stored. It may be more economical in this circumstance to implement a RAID1 system. Likewise, where large amounts of data must be stored, a RAID5 system may be more desirable.
Except for the mirroring technique discussed with respect to RAID1+0, which can survive some two-drive failures, the systems discussed to this point have only had the capability of recovering from a single drive failure in the array. For systems having a relatively small number of hard drives, the ability to recover from a single drive failure may be sufficient. However, as the number of drives in a disk array system increases, the ability to recover from a single drive failure may not sufficiently protect data integrity. For example, if a computer system has an array of 10 disks, the probability of a second drive failing before a user repairs the first drive failure is significantly greater than for a three disk system. Thus, for computer system users requiring large arrays of hard disk drives, the capability to recover from multiple drive failures is desirable.
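A simple probability model makes the 10-disk versus three-disk comparison concrete. The sketch assumes that each surviving drive independently fails during the repair window with the same probability `p`. Both this simplification and the value 0.01 are hypothetical, chosen only for illustration, and are not measured failure rates.

```python
def second_failure_probability(num_drives: int, p: float) -> float:
    """Chance that at least one surviving drive fails during the repair window.

    Assumes each of the num_drives - 1 surviving drives fails independently
    with probability p during that window (an illustrative simplification).
    """
    return 1 - (1 - p) ** (num_drives - 1)


p = 0.01  # hypothetical per-drive failure probability during one repair window
for n in (3, 10):
    print(n, round(second_failure_probability(n, p), 4))
# 3 0.0199
# 10 0.0865
```

Under these assumptions, a 10-disk array is more than four times as likely as a three-disk array to suffer a second failure before the first is repaired.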
FIG. 3A exemplifies a computer system having six data drives and two parity drives. By having two parity drives, the overall disk array has the ability to recover from the failure of up to two data drives. For ease of description, the hard drives exemplified in FIG. 3A are drawn in the RAID4 format. That is, the figure shows six distinct data drives and two distinct parity drives. However, it will be understood that while this system may be operational in this configuration, most manufacturers distribute parity responsibility across all the drives as discussed with respect to the RAID5 format. Also, only six data drives are shown; however, prior art RAID systems support up to fifteen data drives in addition to the parity drives. FIG. 3B shows, in matrix form, the relationship of the data subsets [d0 . . . d5] to the values of the two parity blocks. The equation representing the value of parity block zero, P0, is merely an extension of the parity equation calculated and described with respect to the RAID4/5 system. However, explaining the equation for calculating the second parity block, P1, requires a brief digression into linear algebra.
As is well known in mathematics, a system of linear equations can be solved for X unknown variables so long as there are X linearly independent equations. Linear independence means that no equation is a linear combination of the others; each equation describes different information. Applying this concept to the array of hard disks exemplified in FIG. 3A, each parity block or value must contain information not already captured by the other parity block so that, given two hard drive failures in this system, there are two linearly independent equations from which the data of the two failed drives may be calculated. Stated otherwise, and referring to FIG. 3B, the coefficients for the second parity equation, P1, represented in the figure as α, β, γ, δ, and ε, are chosen such that the equations defining the parity blocks are linearly independent. The problem may be illustrated by assuming, for the sake of argument, that α, β, γ, δ, and ε are all assigned a value of 1. Under this assumption, the parity equations of the exemplary system are:
P0=d0^d1^d2^d3^d4^d5
P1=d0^d1^d2^d3^d4^d5
As is seen, the equation representing the parity value P0 exactly matches the equation representing the parity value P1. Under this assumption only one distinct equation exists (the two are not linearly independent), and the missing data cannot be calculated.
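The independence condition can be checked mechanically. This sketch assumes that P0 uses a coefficient of 1 for every data subset and that the P1 coefficients are elements of a finite field GF(2^n). In such a field, subtraction is XOR. When data drives i and j fail, the remaining 2×2 system has rows [1, 1] and [c_i, c_j], and its determinant is c_i ^ c_j. The coefficient lists (one per data drive d0 through d5) and function names are hypothetical.

```python
from itertools import combinations


def pair_determinant(coeffs, i: int, j: int) -> int:
    """Determinant of the 2x2 system left when data drives i and j fail.

    Rows are [1, 1] from P0 and [coeffs[i], coeffs[j]] from P1. In GF(2^n),
    subtraction is XOR, so the determinant is coeffs[j] ^ coeffs[i].
    """
    return coeffs[i] ^ coeffs[j]


def all_double_failures_solvable(coeffs) -> bool:
    # A zero coefficient leaves P1 blind to that drive, so losing it together
    # with the P0 drive would be unrecoverable.
    if any(c == 0 for c in coeffs):
        return False
    # Distinct coefficients keep every pairwise determinant nonzero.
    return all(pair_determinant(coeffs, i, j) != 0
               for i, j in combinations(range(len(coeffs)), 2))


print(all_double_failures_solvable([1] * 6))             # False: P1 duplicates P0
print(all_double_failures_solvable([1, 2, 3, 4, 5, 6]))  # True
```

Under these assumptions, the P1 coefficients must be nonzero and pairwise distinct.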
Assume for purposes of explanation that the system of FIG. 3A, 3B suffers failures of two drives, the drives that hold data subsets d2 and d3. FIG. 4A represents the matrix solution for determining the missing data from drives d2 and d3 given the configuration of FIG. 3A. P0′ and P1′ represent the parity equations for P0 and P1 rearranged in terms of the missing components d2 and d3. Solving for the unknown data d2 and d3 involves taking the inverse of the 2×2 matrix and multiplying it by P0′ and P1′ as shown in FIG. 4B. However, not all matrices are invertible. A square matrix may be inverted only if its rows are linearly independent of one another. Thus, stating that the 2×2 matrix given in FIG. 4A is invertible is equivalent to saying that the equations it represents are linearly independent, as discussed above. Therefore, selecting the coefficients α, β, γ, δ, ε, etc. is critical to ensuring linear independence. However, a problem arises in the prior art related to the size of these coefficients, namely the physical size of the product of each coefficient and its data subset.
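The two-drive recovery can be sketched end to end using ordinary GF(2^8) arithmetic. The sketch makes several assumptions. Each data drive contributes one byte; the same math repeats at every byte offset. Multiplication is reduced by the polynomial 0x11D, which is commonly used for RAID-6 style parity. The P1 coefficients are the hypothetical values 1, 2, 4, 8, 16 and 32. None of these choices is taken from the invention, and its encryption-key method is not modeled here. All function names are hypothetical.

```python
POLY = 0x11D  # assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less multiply, reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254, since a^255 == 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def parity(data, coeffs):
    """P0 = XOR of d_k; P1 = XOR of c_k * d_k (one byte per data drive)."""
    p0 = p1 = 0
    for c, d in zip(coeffs, data):
        p0 ^= d
        p1 ^= gf_mul(c, d)
    return p0, p1


def recover_two(data, coeffs, p0, p1, i, j):
    """Rebuild data[i] and data[j]; their stored values are never read."""
    p0_prime, p1_prime = p0, p1
    for k, (c, d) in enumerate(zip(coeffs, data)):
        if k not in (i, j):          # strip out the surviving drives' terms
            p0_prime ^= d
            p1_prime ^= gf_mul(c, d)
    # Now P0' = d_i ^ d_j and P1' = c_i*d_i ^ c_j*d_j. Eliminating d_i gives
    # d_j = (P1' ^ c_i*P0') / (c_i ^ c_j), defined only when c_i != c_j.
    ci, cj = coeffs[i], coeffs[j]
    dj = gf_mul(p1_prime ^ gf_mul(ci, p0_prime), gf_inv(ci ^ cj))
    return p0_prime ^ dj, dj


coeffs = [1, 2, 4, 8, 16, 32]   # hypothetical distinct nonzero coefficients
data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]
p0, p1 = parity(data, coeffs)
damaged = data[:2] + [None, None] + data[4:]   # drives holding d2, d3 failed
print([hex(x) for x in recover_two(damaged, coeffs, p0, p1, 2, 3)])  # ['0x33', '0x44']
```

Every product stays within one byte, so the parity bytes are the same size as the data bytes.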
It is desirable for the size of the parity data to match the size of the data subsets written to each drive. With reference to a RAID4 format, the size of the data written to the parity drive should equal the data subset size. In this way, the parity drive may have the same physical capacity as the data drives. Otherwise, the parity drives in the RAID4 format would have to be significantly larger than the data drives; indeed, the size of the parity drive would increase dramatically as the coefficients increased in value. Rather than implement the hardware in this manner, the prior art performs the coefficient multiplication using finite field arithmetic, a system in which any four bit number multiplied by a four bit number yields an equally sized four bit result. This is accomplished specifically by a technique known as Datum.
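The size argument can be checked directly. Ordinary multiplication of two four-bit numbers can need eight bits, while finite field multiplication in GF(2^4) always returns four bits. This sketch assumes the reduction polynomial x^4 + x + 1. It is a generic GF(2^4) multiply, not Datum, whose internal details and four-bit key are not described here. The names `POLY4` and `gf16_mul` are hypothetical.

```python
POLY4 = 0b10011  # assumed reduction polynomial: x^4 + x + 1


def gf16_mul(a: int, b: int) -> int:
    """Multiply two 4-bit values in GF(2^4); the product is also 4 bits."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:      # a grew to 5 bits: reduce it back to 4 bits
            a ^= POLY4
    return result


ordinary = max(a * b for a in range(16) for b in range(16))
finite = max(gf16_mul(a, b) for a in range(16) for b in range(16))
print(ordinary, ordinary.bit_length())  # 225 8
print(finite, finite.bit_length())      # 15 4
print(gf16_mul(0xF, 0xF))               # 10
```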
Datum is a four bit encryption method with the characteristic that the multiplication of two four bit numbers has an encrypted resultant of four bits. Datum uses a four bit encryption key to encrypt the result of the multiplication of the coefficient with the data subset. This ensures that the size of the parity blocks matches the size of the data subsets. However, with the four bit scheme there are only fifteen possible coefficients for use in creating linearly independent equations.
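The limit of fifteen coefficients reflects the size of the field. When P0 uses all-ones coefficients, the P1 coefficients must be nonzero and pairwise distinct, and a field of 2^n elements has only 2^n − 1 nonzero elements. This sketch counts them by generating successive powers of 2. It assumes the primitive polynomials x^4 + x + 1 for four bits and x^8 + x^4 + x^3 + x^2 + 1 for eight bits. These are illustrative choices, not the keys used by Datum or by the invention.

```python
def gf_mul(a: int, b: int, poly: int, bits: int) -> int:
    """Multiply in GF(2^bits) using the given reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << bits):
            a ^= poly
    return result


def nonzero_elements(poly: int, bits: int, generator: int = 2) -> set:
    """Collect successive powers of `generator` until they repeat."""
    seen, x = set(), 1
    while x not in seen:
        seen.add(x)
        x = gf_mul(x, generator, poly, bits)
    return seen


print(len(nonzero_elements(0b10011, 4)))  # 15 usable coefficients with 4 bits
print(len(nonzero_elements(0x11D, 8)))    # 255 usable coefficients with 8 bits
```

Under these assumptions, a four-bit field supports at most fifteen data drives. An eight-bit field raises that ceiling to 255.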
Thus, it would be desirable to ensure that, regardless of the number or order of failed drives, the equations used to solve for the missing data are linearly independent. Despite the desirability of computer systems with such an assurance, none have been developed.