The present invention relates generally to data fault tolerance. Specifically, the invention provides a data fault tolerance device and method in which redundancy is constructed after the data has been written, using any deterministic method that accepts blocks of data residing on separate fault domains and spreads redundancy data across blocks residing on another set of separate fault domains.
Many large-scale data processing systems now employ a multiplicity of independent computer/disk systems, all of which operate in parallel on discrete portions of a problem. An independent computer/disk is called a node of the multiprocessing system. In such systems, data files may be distributed across the system to balance nodal workloads and to protect against significant losses of data should one or more nodes malfunction.
A variety of techniques have been proposed to enable data reconstruction in the event of failure of one or more nodes. For instance, in U.S. Pat. No. 4,722,085 issued to Flora, a relatively large number of independently operating disk devices are coupled to a read/write interface containing error circuitry and organization circuitry. Each data word read into the system has its bits spread across the disk devices so that only one bit of each word is written to a particular physical disk device. This ensures that a single bit error will not cause a fault, since it is automatically corrected by parity correction in the error circuitry. U.S. Pat. No. 4,817,035 issued to Timsit also describes a similar bit-oriented distributed storage scheme across a plurality of disk units.
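As a rough illustration of bit-level spreading, the sketch below places each bit of a word on its own data disk and adds an even-parity bit for one extra disk. The word width `WORD_BITS` and the helpers `spread_word` and `rebuild_word` are hypothetical, not taken from the cited patents. The sketch assumes the failed disk is known (an erasure). In that case single parity can rebuild the one lost bit, but parity alone cannot locate an unknown bit error.

```python
from functools import reduce
from operator import xor
from typing import Optional

WORD_BITS = 8  # hypothetical word width: one data disk per bit


def spread_word(word: int) -> list[int]:
    """Split a word into one bit per data disk, followed by an even-parity bit."""
    bits = [(word >> i) & 1 for i in range(WORD_BITS)]
    return bits + [reduce(xor, bits, 0)]


def rebuild_word(disk_bits: list[Optional[int]]) -> int:
    """Reassemble a word when at most one disk's bit is unavailable (None)."""
    missing = [i for i, bit in enumerate(disk_bits) if bit is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one missing bit")
    bits = list(disk_bits)
    if missing:
        # All bits including parity XOR to 0, so the survivors XOR to the lost bit.
        bits[missing[0]] = reduce(xor, (b for b in bits if b is not None), 0)
    return sum(bit << i for i, bit in enumerate(bits[:WORD_BITS]))


# Usage: lose the bit stored on disk 3 and recover the original word.
stored = spread_word(0b10110010)
stored[3] = None
assert rebuild_word(stored) == 0b10110010
```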
U.S. Pat. No. 4,761,785 issued to Clark et al., assigned to the International Business Machines Corporation, describes another version of distributed storage that enables data recovery in the event of a malfunction. The Clark et al. system spreads data blocks across a plurality of disk drives and exclusive-ORs a series of blocks to derive a parity check block. Each disk drive contains the same number of block physical address areas. Disk physical address areas with the same unit address ranges are referred to as "stripes." Each stripe has n-1 blocks of data written across n-1 disk drives and a parity block on another disk drive; the parity block contains the parity for the n-1 data blocks of the stripe. Because a stripe of blocks is written across a plurality of disk drives, the failure of any one disk drive can be accommodated by exclusive-ORing the parity block with all remaining blocks of the stripe to derive the lost data block.
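The stripe arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming byte-string blocks of equal length and, for simplicity, a stripe whose parity block is the last of its n blocks; `xor_blocks`, `write_stripe`, and `recover_block` are hypothetical names, not the Clark et al. implementation.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise exclusive-OR of one or more equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return the n blocks of a stripe: n-1 data blocks plus their parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover_block(stripe: list[Optional[bytes]]) -> bytes:
    """Derive the one lost block (data or parity) by XORing all surviving blocks."""
    survivors = [b for b in stripe if b is not None]
    if len(survivors) != len(stripe) - 1:
        raise ValueError("exactly one block of the stripe must be missing")
    return xor_blocks(survivors)


# Usage: simulate failure of the second disk drive and rebuild its block.
stripe = write_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
lost = stripe[1]
stripe[1] = None
assert recover_block(stripe) == lost
```

Any single block, including the parity block itself, can be rebuilt the same way, because the XOR of all n blocks in a stripe is zero.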
U.S. Pat. No. 5,130,992 issued to Frey et al., assigned to the International Business Machines Corporation, overcomes limitations in the prior art. Specifically, it provides a file-based parity protection structure that is integral to the file structure rather than the physical disk structure. This enables data blocks to be placed anywhere on an array of disk devices while still retaining file parity protection, and enables the generated parity blocks to be included within a data file and to be relevant only to that data file upon removal of the file from the disks.
While the system described by Frey et al. does effectively provide a data protection method that is not limited by the physical disk structure, it has drawbacks. First, and most importantly, the parity is incorporated into the file, so all data must be rewritten if the parity protection scheme is changed. In other words, the parity for n-1 blocks is stored in a specific block in the data file. Thus, if the stripe size is changed to m, all of the data in the file must be moved and the parity recomputed for m-1 blocks, with that parity written to a different block in the data file. Additionally, because only exclusive-ORing of data is described, no method is given for an alternate parity generation method that may provide similar or improved fault protection.
Accordingly, there is a need to overcome the limitations in both operations and structure of the prior art.
It is an object of this invention to provide, for a parallel computing system, a file-based protection structure that is separate from the file structure rather than integral to either the file structure or the physical disk structure.
It is another object of this invention to provide a protection method that enables data blocks to be placed anywhere on an array of disk files while retaining file protection, and that also allows the protection scheme to be changed without requiring the data to be rewritten.
It is still another object of this invention to allow the mixing of protection schemes within a single array of disks.
Specifically, the present invention provides data fault tolerance by using any deterministic method that accepts blocks of data residing on separate fault domains and spreads redundancy data across blocks residing on another set of separate fault domains.
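One way to picture such a deterministic method is as a function that takes a group of blocks already written on distinct fault domains, checks that separation, and maps the group to a fault domain that holds none of its members. The sketch below is an assumption for illustration only. The `Block` record, the SHA-256-based choice of domain, and XOR as the redundancy function are all hypothetical choices, and any other deterministic function meeting the same constraints would serve.

```python
import hashlib
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Block:
    block_id: str  # hypothetical identifier, e.g. "fileA:0"
    domain: int    # fault domain (disk, controller, or node) holding the block
    data: bytes


def choose_redundancy_domain(group: list[Block], all_domains: list[int]) -> int:
    """Deterministically pick a fault domain for the group's redundancy data."""
    used = {b.domain for b in group}
    if len(used) != len(group):
        raise ValueError("protected blocks must reside on separate fault domains")
    candidates = sorted(d for d in set(all_domains) if d not in used)
    if not candidates:
        raise ValueError("no separate fault domain is left for the redundancy data")
    # Hash the sorted member IDs so the same group always maps to the same domain.
    key = "|".join(sorted(b.block_id for b in group)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return candidates[digest % len(candidates)]


def protect(group: list[Block], all_domains: list[int]) -> tuple[int, bytes]:
    """Return (redundancy domain, XOR parity) for a group of equal-length blocks."""
    if len({len(b.data) for b in group}) != 1:
        raise ValueError("this sketch assumes equal-length blocks")
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*(b.data for b in group)))
    return choose_redundancy_domain(group, all_domains), parity


# Usage: three blocks on domains 0-2; the parity lands on domain 3 or 4.
group = [Block("f:0", 0, b"\x01"), Block("f:1", 1, b"\x02"), Block("f:2", 2, b"\x04")]
domain, parity = protect(group, all_domains=[0, 1, 2, 3, 4])
assert domain in (3, 4) and parity == b"\x07"
```

Because the placement depends only on the group's membership, the same result can be recomputed later from the data alone, without any bookkeeping stored inside the files.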
Another object of the present invention is to enable the data redundancy to be changed without requiring the protected data to be rewritten. Yet another object of the present invention is to provide an implementation that is independent of the file system used to store the data. Further, it is an object of the present invention to overcome the need for protected data sets to use the same block size and/or the same deterministic function for data spreading.
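To illustrate protecting data sets that do not share a block size, the sketch below XORs blocks of different lengths by treating each shorter block as if it were zero-padded to the longest one, and records the original lengths so that a rebuilt block can be trimmed. Zero-padding is an assumed technique chosen for illustration; `mixed_parity` and `rebuild_mixed` are hypothetical names. The padding exists only during computation, so no stored block is rewritten.

```python
from functools import reduce
from typing import Optional


def _xor_columns(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def mixed_parity(blocks: list[bytes]) -> tuple[bytes, list[int]]:
    """Parity over blocks of differing sizes, plus the sizes needed for recovery."""
    width = max(len(b) for b in blocks)
    padded = [b.ljust(width, b"\x00") for b in blocks]  # padding is never stored
    return _xor_columns(padded), [len(b) for b in blocks]


def rebuild_mixed(blocks: list[Optional[bytes]], parity: bytes, lengths: list[int]) -> bytes:
    """Rebuild the single missing block and trim it back to its original size."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("exactly one block must be missing")
    survivors = [b.ljust(len(parity), b"\x00") for b in blocks if b is not None]
    return _xor_columns([parity] + survivors)[: lengths[missing[0]]]


# Usage: a 2-byte block protected together with 3- and 5-byte blocks.
original = [b"abc", b"de", b"fghij"]
parity, lengths = mixed_parity(original)
assert rebuild_mixed([b"abc", None, b"fghij"], parity, lengths) == b"de"
```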
It is also an object of the present invention to overcome the prior art limitation that space for the redundancy information must be allocated at the same time the data is written. The present invention enables space for redundancy information to be allocated after the deterministic function is selected, and to be changed, if desired, at any future time without rewriting the data. Moreover, the present invention enables data to be dispersed across multiple files, data storage units, controllers, or computer nodes. Yet another object of the invention is the ability to use any deterministic function that meets the redundancy requirements in determining where to write data blocks.
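A sketch of allocating redundancy after the data is written, and of changing the scheme later without touching the data, follows. It assumes a simple hypothetical layout in which each data fault domain holds the same number of equal-length blocks. It groups the blocks at the same position across `width` neighboring domains and keeps the resulting parity in a separate map destined for a distinct set of redundancy domains. `build_redundancy`, `width`, and `parity_domains` are illustrative names, not the patent's.

```python
from functools import reduce

Layout = dict[int, list[bytes]]  # data fault domain -> blocks already written there
Redundancy = dict[tuple[int, int], tuple[int, bytes]]


def build_redundancy(data: Layout, width: int, parity_domains: list[int]) -> Redundancy:
    """Compute parity for an existing layout without modifying the data blocks.

    Keys are (row, first data domain of the group); values are
    (redundancy domain, parity bytes).
    """
    if width < 1 or not parity_domains:
        raise ValueError("need a positive group width and at least one redundancy domain")
    if set(parity_domains) & set(data):
        raise ValueError("redundancy must live on fault domains separate from the data")
    if len({len(blocks) for blocks in data.values()}) != 1:
        raise ValueError("this sketch assumes every data domain holds the same number of blocks")
    domains = sorted(data)
    redundancy: Redundancy = {}
    for row in range(len(data[domains[0]])):
        for start in range(0, len(domains), width):
            members = domains[start:start + width]  # distinct domains by construction
            column = [data[d][row] for d in members]
            if len({len(b) for b in column}) != 1:
                raise ValueError("blocks grouped together must have equal length")
            parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*column))
            target = parity_domains[(row + start // width) % len(parity_domains)]
            redundancy[(row, members[0])] = (target, parity)
    return redundancy


# Usage: protect the same written data under two schemes.
data = {0: [b"\x01", b"\x10"], 1: [b"\x02", b"\x20"],
        2: [b"\x04", b"\x40"], 3: [b"\x08", b"\x80"]}
before = {d: list(blocks) for d, blocks in data.items()}
narrow = build_redundancy(data, width=2, parity_domains=[10, 11])  # 1 parity per 2 blocks
wide = build_redundancy(data, width=4, parity_domains=[10, 11])    # 1 parity per 4 blocks
assert data == before  # changing the scheme rewrote no data blocks
```

Since the parity lives outside the data layout, moving from `narrow` to `wide` only means building one redundancy map and discarding the other.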
In the preferred embodiment, data redundancy is implemented by any deterministic method that accepts blocks of data residing on separate fault domains and places the redundancy data on other, separate fault domains.