1. Field of the Invention
The present invention generally relates to multiple data disk storage technology, and, more particularly, to a memory interface controller for DATUM RAID operations with a DATUM multiplier.
2. Description of the Related Art
Due to the increasing importance of business-critical data to many companies, fault tolerance is often a priority for network storage systems. Fault tolerance, in the context of a disk array subsystem, is the ability of a disk array to continue to perform its functions even when one or more disks have failed. Parity RAID and declustering architectures are network storage solutions commonly utilized to provide fault tolerance against a single disk failure. RAID, which stands for Redundant Array of Inexpensive Disks, relates to the concept of using multiple inexpensive disks as one unit in the place of a single large disk, for improved storage reliability and system performance. This idea, which is now the industry standard, was introduced in a December 1987 article entitled “A Case for Redundant Arrays of Inexpensive Disks (RAID)” by D. Patterson, G. Gibson, and R. H. Katz.
To date, a variety of RAID architectures (industry and proprietary) have been utilized for network storage. RAID 5, which utilizes parity information to provide redundancy and fault tolerance, is one example. RAID 5 architecture uses data striping to spread or interleave user data and redundancy information (e.g., parity) across all the disks in an array. Striping generally refers to spreading data evenly over multiple disks. In other words, a segmented data block is broken into segments of a unit length and sequential segments are written to sequential disk drives. The combination of corresponding sequential data segments across each of the disks is referred to as a stripe. In the event of a failed disk, the parity information allows for recovery or reconstruction of the data of the failed disk. Parity declustering is the uniform distribution of user data and parity data over a disk array where each stripe uses a subset of the disks.
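By way of illustration only, the striping and parity recovery described above can be sketched in a few lines of Python. The function names, the rotating placement of the parity unit, and the zero padding of short segments are assumptions made for this sketch and are not taken from any particular RAID 5 implementation.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length stripe units."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data: bytes, num_disks: int, unit: int):
    """Break data into unit-length segments and lay them out as stripes.

    Each stripe holds num_disks - 1 data units plus one parity unit.
    The parity position rotates from stripe to stripe (an assumed layout).
    """
    per_stripe = num_disks - 1
    segments = [data[i:i + unit].ljust(unit, b"\0")
                for i in range(0, len(data), unit)]
    while len(segments) % per_stripe:        # pad the last stripe with zero units
        segments.append(bytes(unit))
    stripes = []
    for s in range(len(segments) // per_stripe):
        units = segments[s * per_stripe:(s + 1) * per_stripe]
        parity = reduce(xor_bytes, units)    # parity = XOR of all data units
        parity_disk = (num_disks - 1 - s) % num_disks
        row = list(units)
        row.insert(parity_disk, parity)      # row[d] is the unit stored on disk d
        stripes.append(row)
    return stripes


def rebuild(row, failed_disk: int) -> bytes:
    """Reconstruct the unit of a failed disk by XORing the surviving units."""
    survivors = [u for d, u in enumerate(row) if d != failed_disk]
    return reduce(xor_bytes, survivors)
```

Because the XOR of all units in a stripe, including the parity unit, is zero, the unit on any single failed disk equals the XOR of the remaining units, whether the lost unit held user data or parity.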
In contrast to parity declustering and conventional RAID architectures, certain disk array architectures mask multiple simultaneous disk failures. One advantage of such architectures is the handling of latent sector errors, since a sector error can be discovered when an array has already suffered a failure. Another advantage of architectures capable of tolerating multiple concurrent failures is the handling of communication failures, since communication failures can render disks inaccessible. DATUM, which stands for Disk Arrays with optimal storage, Uniform declustering and Multiple-failure tolerance, is an example of an array architecture for masking or tolerating multiple disk failures in disk arrays. In terms of uniform declustering, the architecture provides an array layout of user data and redundancy data that distributes redundancy data uniformly among the disks in the array. Most declustered disk array layouts that can tolerate a single failure can be characterized by certain desirable properties. For example, so that the array can recover from a single disk crash, no two units of the same stripe are mapped to the same disk. Another desirable property is distributed parity, whereby all disks have the same number of parity or check units mapped to them. A further desirable property, termed distributed reconstruction, requires that, for every pair of disks, a constant number of stripes have units mapped to both disks.
In terms of optimal storage, DATUM uses a theoretical minimum amount of storage space for storing redundancy data in the array. DATUM employs an information dispersal algorithm (IDA) to uniformly distribute redundancy data on all the disks. The IDA encodes a sequence F=(d1, d2, . . . , dm) of m integers into a sequence of m+f integers (e1, e2, . . . , em, em+1, . . . , em+f) in such a way that any m of the m+f integers suffice to recover the sequence F. The sequence F represents “m” equal-sized portions of user or client data, and the m+f values represent encoded data including redundancy data. The transformation of user data into encoded data by the IDA can be represented in the form of an m×(m+f) matrix T (i.e., a matrix having m linearly independent columns and m+f rows). Both user data and redundancy data are organized in terms of striping units. Disk space is logically structured into striping units, where each striping unit has a fixed number of sectors. A stripe consists of a fixed number of user data stripe units and a number of redundant stripe units. Different striping units of the same stripe are mapped to different disks. In other words, units of the same stripe are not stored on the same disk. Any data stripe can be reconstructed if m or more disks are correct; that is, if “f” or fewer disks have failed. DATUM thus uses only the theoretical minimum amount of disk space to store each stripe so that its contents are recoverable even if up to “f” stripe units are missing.
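The "any m of m+f" recovery property can be illustrated with a minimal Python sketch. The choices below are assumptions made for illustration and are not taken from the DATUM article: arithmetic is performed modulo the small prime 257, and the encoding matrix is a Vandermonde matrix with m+f rows and m columns, whose rows are built from the distinct values 1, 2, . . . , m+f. Any m rows of such a matrix are linearly independent, which is what makes recovery possible. The three-argument pow with a negative exponent requires Python 3.8 or later.

```python
P = 257  # assumed prime modulus, chosen so that values 0..255 are valid inputs


def encode(data, f):
    """Encode m integers into m+f integers (one per disk).

    Value k is the dot product of the data with row k of an assumed
    Vandermonde matrix: [x**0, x**1, ..., x**(m-1)] with x = k + 1.
    Note that an encoded value may be 256, so this sketch is not byte-exact.
    """
    m = len(data)
    return [sum(d * pow(k + 1, i, P) for i, d in enumerate(data)) % P
            for k in range(m + f)]


def recover(shares, m):
    """Recover the m original integers from any m encoded values.

    shares maps the index k of a surviving value to that value e_k.
    The linear system is solved by Gauss-Jordan elimination modulo P.
    """
    ks = sorted(shares)[:m]
    # Augmented matrix [rows of the encoding matrix | surviving values]
    rows = [[pow(k + 1, i, P) for i in range(m)] + [shares[k]] for k in ks]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)     # modular inverse of the pivot
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % P
                           for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]
```

For example, with m=3 and f=2, encode([10, 200, 33], 2) yields five values, and passing any three of them to recover, keyed by their positions, returns the original three integers.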
Aside from storage space, DATUM is also optimal with respect to write overhead. That is, DATUM performs the minimum number of disk accesses to implement small writes. A small write occurs when a single stripe unit is written by a client application. A small write for a parity redundancy RAID architecture has generally reduced disk array performance. To implement a small write with DATUM, it has been necessary to (i) read the old values of the data unit being written and the “f” redundant units, (ii) recompute the check stripe unit values, and (iii) write the new data stripe value as well as the “f” check stripe units. In RAID architectures, this is often called read-modify-write (read old values from disk, modify them with the new values, and write them back to the disk). Since it has not been possible to write less than f+1 stripe units if the array is to tolerate up to f failures, DATUM performs the optimal number f+1 disk writes per small write operation.
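The three steps of a small write, and the resulting count of f+1 reads and f+1 writes, can be sketched as follows. This is an illustration under stated assumptions rather than the DATUM encoding itself: check units are taken to be linear combinations of the data units modulo the prime 257, and the coefficient function coeff is a hypothetical placeholder. The sketch shows only the update arithmetic and the I/O count; it makes no claim that these particular coefficients tolerate f failures.

```python
P = 257  # assumed prime modulus (illustrative only)


def coeff(j, i):
    """Hypothetical weight of data unit i in check unit j."""
    return pow(i + 1, j, P)


def check_units(data, f):
    """Compute f check values from scratch; each is linear in the data."""
    return [sum(coeff(j, i) * d for i, d in enumerate(data)) % P
            for j in range(f)]


def small_write(data, checks, i, new_value):
    """Update data unit i and all f check units in place.

    Returns the number of unit reads and unit writes performed.
    """
    old_value = data[i]                      # (i) read the old data unit
    old_checks = list(checks)                # (i) read the f old check units
    delta = (new_value - old_value) % P
    # (ii) recompute each check unit from its old value and the change in data
    new_checks = [(c + coeff(j, i) * delta) % P
                  for j, c in enumerate(old_checks)]
    data[i] = new_value                      # (iii) write the new data unit
    checks[:] = new_checks                   # (iii) write the f new check units
    return {"reads": 1 + len(old_checks), "writes": 1 + len(new_checks)}
```

Because each check unit is linear in the data, only the changed data unit and the f check units need to be touched; the other data units of the stripe are neither read nor written.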
When f=1, DATUM RAID can be modeled as a RAID 5 system. In the read-modify-write process for a small write: (1) the old data and old parity are read from the disk drives, (2) new parity is calculated from old data, old parity and new data, and (3) new data and new parity are written back to the disk drives.
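For the f=1 case, the parity calculation in step 2 reduces to two XOR operations: new parity = old parity XOR old data XOR new data. A minimal Python sketch follows; the function names are hypothetical and the units are modeled as equal-length byte strings.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length units."""
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Step 2 of read-modify-write: derive the new parity unit.

    XORing the old data out of the old parity and XORing the new data in
    gives the same result as recomputing parity over the whole stripe.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)
```

The result matches a full recomputation of parity over the stripe, yet the data units on the other disks are never read.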
DATUM is considered the first known technique for tolerating an arbitrary number of failures that is optimal with respect to both storage space and write overhead, and that distributes redundant data uniformly by using a set of layout functions that can be evaluated efficiently with very low memory requirements. With DATUM, the size of a redundant unit is independent of the number of units per stripe. Further details regarding DATUM can be found in an article entitled “Tolerating Multiple Failures in RAID Architectures with Optimal Storage and Uniform Declustering” by Guillermo A. Alvarez, Walter A. Burkhard and Flaviu Cristian, Department of Computer Science and Engineering, University of California, San Diego, which is incorporated herein by reference.
While this technique improves system performance and reliability, certain RAID levels involve XORing data to generate parity. Current XOR methodology requires more memory bandwidth than current designs provide if other buses in the system (e.g., PCI) are to be saturated. It would be beneficial if the same necessary XOR function could be performed in fewer steps, thus conserving memory bandwidth.