When digital information is stored or communicated, it may be lost due to the failure of a storage device or loss of a communication packet. Fortunately, various linear erasure correcting codes are available to recover lost information despite storage failures and packet losses. Examples include the common Reed-Solomon code and the Information Dispersal Algorithm (IDA) of Rabin (U.S. Pat. No. 5,485,474).
Linear erasure correcting codes provide a number of advantages over data replication and parity schemes. Unlike replication, these codes are storage optimal. And unlike parity schemes, which can protect digital information from only a single failure or loss, linear erasure correcting codes can protect data from an arbitrary number of losses.
Linear erasure correcting codes work by breaking the data to be stored or communicated into m segments, using a linear transformation to produce n=m+k encoded pieces and independently storing or communicating the encoded pieces (i.e. dispersing the information). The segments and encoded pieces are of equal size. The linear transformation has the special property that the original m segments can be recreated by decoding any m of the n encoded pieces that have survived a failure or loss. This provides systems using the codes with resilience to k failures or losses.
The linear transformation operates on chunks. The segments are broken into equal sized chunks, the chunks are assembled into m×1 vectors and the vectors are transformed by multiplication with an n×m encoding matrix A defined over a finite Galois field GF(2^q). The result of the matrix multiplications is a sequence of n×1 vectors consisting of transformed chunks that are reverse assembled into the pieces.
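The chunk-by-chunk transformation described above can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: it assumes q = 8 (one-byte chunks), the reduction polynomial 0x11D, and the hypothetical helper names gf_mul and encode. The small 3×2 matrix in the usage note is likewise only an example.

```python
def gf_mul(a, b, poly=0x11D, q=8):
    """Multiply two elements of GF(2^q): shift-and-XOR, reduced by poly.

    The polynomial 0x11D (q = 8) is an assumption made for this sketch.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^q) is XOR
        a <<= 1
        if a >> q:               # degree reached q: reduce modulo poly
            a ^= poly
        b >>= 1
    return result


def encode(matrix, segments):
    """Multiply an n x m matrix by each m x 1 vector of one-byte chunks.

    segments: m equal-length bytes objects.
    Returns n encoded pieces, each the same length as a segment.
    """
    length = len(segments[0])
    pieces = [bytearray(length) for _ in matrix]
    for pos in range(length):                 # one m x 1 vector per position
        for i, row in enumerate(matrix):
            acc = 0
            for coeff, seg in zip(row, segments):
                acc ^= gf_mul(coeff, seg[pos])
            pieces[i][pos] = acc
    return [bytes(p) for p in pieces]
```

For example, encode([[1, 0], [0, 1], [1, 1]], [b"ab", b"cd"]) yields three pieces of two bytes each, with m = 2 and n = 3.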
The matrix A has special properties that are well understood in coding theory. Reed-Solomon codes have traditionally used a Vandermonde matrix for the matrix A, but subsequent work has identified other matrices that have the necessary properties. Rabin provides an example of such a matrix:
$$A = (a_{i,j}), \qquad a_{i,j} = \frac{1}{x_i + y_j}$$

where $x_i + y_j \neq 0$ for all $i, j$, and $x_i \neq x_j$ and $y_i \neq y_j$ for $i \neq j$.

This matrix is known as a Cauchy matrix. A Cauchy matrix has the important property that each square sub-matrix created from it is invertible. This property matters because losing encoded pieces is mathematically equivalent to deleting rows from matrix A, and segment recovery depends upon the invertibility of the resulting row-deleted matrix.
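The recovery argument can be checked numerically with a short sketch. Everything here is an assumption made for illustration: the field GF(2^8) with reduction polynomial 0x11D, the choice x_i = i and y_j = n + j (any distinct values with x_i + y_j ≠ 0 would do), and the hypothetical function names. Addition in GF(2^q) is XOR, so x_i + y_j is computed as x ^ y.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (polynomial 0x11D assumed for this sketch)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254, since a^255 = 1 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def cauchy_matrix(n, m):
    """n x m matrix with entries 1 / (x_i + y_j), x_i = i, y_j = n + j."""
    return [[gf_inv(x ^ y) for y in range(n, n + m)] for x in range(n)]


def mat_vec(mat, vec):
    """Matrix-vector product over GF(2^8)."""
    out = []
    for row in mat:
        acc = 0
        for c, v in zip(row, vec):
            acc ^= gf_mul(c, v)
        out.append(acc)
    return out


def invert(mat):
    """Gauss-Jordan inversion of a square matrix over GF(2^8)."""
    size = len(mat)
    aug = [list(row) + [int(i == j) for j in range(size)]
           for i, row in enumerate(mat)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [v ^ gf_mul(f, w) for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def recover(matrix, survivors):
    """survivors: dict {row index: encoded value} with exactly m entries."""
    rows = sorted(survivors)
    sub = [matrix[r] for r in rows]           # the row-deleted matrix
    return mat_vec(invert(sub), [survivors[r] for r in rows])
```

With m = 3 and n = 5, encoding a vector with mat_vec(cauchy_matrix(5, 3), data) and passing any three of the five resulting values to recover should return the original vector.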
Another important property of Cauchy matrices is that they can be inverted in O(n^2) operations instead of the O(n^3) operations required to invert general matrices. This property of Cauchy matrices is important for applications that must perform the matrix inversion as part of the recovery process. But for many applications, the number of possible inverse matrices is small and can be pre-calculated, stored and quickly looked up when a particular recovery scenario arises. When inverse matrices are pre-calculated, the O(n^2) advantage of Cauchy matrices is diminished.
A Cauchy matrix does not result in particularly efficient encoding and decoding, however, for two reasons. First, it does not result in what coding theory calls a systematic encoder. Systematic encoders have the property that the first m of the n encoded pieces produced by the encoder, called data pieces, are identical to the m original segments. The pieces that bear no resemblance to the segments are called ECC pieces. Systematic encoders create n-m ECC pieces. A Cauchy matrix yields a non-systematic code: all n encoded pieces are ECC pieces, and a decode operation is required as part of every data retrieval. A systematic encoder is more efficient in that fewer matrix multiplications are needed to encode, and decoding is not required unless pieces have been lost.
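The difference between systematic and non-systematic encoding can be seen in a small sketch. One common way to obtain a systematic encoding matrix, assumed here purely for illustration, is to stack an m×m identity matrix on top of a k×m Cauchy matrix. The field GF(2^8), the polynomial 0x11D, the choice of x_i and y_j, and all function names are assumptions of this sketch.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (polynomial 0x11D assumed for this sketch)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a^254 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def systematic_matrix(m, k):
    """(m + k) x m matrix: identity rows first, then k Cauchy rows."""
    identity = [[int(i == j) for j in range(m)] for i in range(m)]
    cauchy = [[gf_inv(x ^ y) for y in range(k, k + m)] for x in range(k)]
    return identity + cauchy


def encode_vector(matrix, vec):
    """Multiply the matrix by one m x 1 vector of chunks."""
    out = []
    for row in matrix:
        acc = 0
        for c, v in zip(row, vec):
            acc ^= gf_mul(c, v)
        out.append(acc)
    return out
```

Encoding a vector with systematic_matrix(3, 2) returns five values whose first three are the input unchanged, so only the last two (the ECC values) cost any field multiplications that matter, and reading intact data needs no decoding.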
Secondly, the Cauchy matrix approach is not as efficient as parity techniques when recovering from single failures. Parity techniques only require the XOR of all the surviving pieces in order to recover a missing segment. IDA using a Cauchy matrix requires the more complex calculations of a linear transformation consisting of multiplication and XOR operations over GF(2^q). This difference in performance for single failure recovery is important because even in systems designed to withstand multiple failures, the single failure case is expected to dominate.
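For comparison, single-failure recovery with parity needs nothing beyond XOR. The following is a minimal sketch; the function names are hypothetical and the segments are assumed to be equal-length byte strings.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(segments):
    """Parity piece: XOR of all data segments."""
    return reduce(xor_bytes, segments)


def recover_missing(surviving_pieces):
    """Rebuild the one missing piece as the XOR of all surviving pieces
    (the remaining data segments together with the parity piece)."""
    return reduce(xor_bytes, surviving_pieces)
```

If the segments are b"abcd", b"efgh" and b"ijkl" and the second is lost, recover_missing applied to the first segment, the third segment and the parity piece returns b"efgh". No field multiplication is involved, which is the performance gap described above.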
Improving the computational performance of linear transformations like that used for erasure correcting codes has been a topic of research. At the time that IDA was first disclosed, special hardware was considered the most promising way to improve performance. Bestavros [A. Bestavros, SETH: A VLSI chip for the real-time Information Dispersal and retrieval for security and fault tolerance, Technical Report TR-06-89, Harvard University, January 1989] discusses the design of a VLSI chip to offload IDA data manipulations. By the mid-1990s, however, the performance of general purpose processors had progressed enough that software implementations of linear codes became practical. Blömer et al. [J. Blömer, M. Kalfane, R. Karp, M. Karpinski, M. Luby and D. Zuckerman, An XOR-based Erasure-Resilient Coding Scheme, Technical Report, International Computer Science Institute, Berkeley, Calif., 1995] describe such a software implementation prototyped on a Sun SPARCstation 20 workstation.
Blömer was able to achieve considerable performance gains through two improvements. The first improvement was to use a Cauchy-derived matrix that was systematic. The second improvement was to map matrix operations over GF(2^q) to equivalent operations over GF(2). In the GF(2) representation of the coding matrix, the arithmetic operations are AND and XOR on single bits rather than multiply and XOR on q-bit quantities. While operations on single bits may seem to be less efficient than operations on q-bit quantities (where q is typically limited to either 8 or 16), processors with 32-bit words can perform the operations in parallel, 32 bits at a time. Thus Blömer makes the connection between linear transformations over GF(2^q) and XOR operations that can be efficiently implemented by general purpose processors.
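The mapping from GF(2^q) to GF(2) rests on the fact that multiplication by a fixed field element is linear over GF(2), so it can be written as a q×q matrix of bits. The sketch below illustrates that idea only; it is not Blömer's data layout. It assumes q = 8, the reduction polynomial 0x11D, and hypothetical function names. In mul_planes, each Python integer stands in for a machine word carrying one bit position of many values at once.

```python
def gf_mul(a, b, poly=0x11D, q=8):
    """Multiply in GF(2^q) (q = 8 and poly 0x11D assumed for this sketch)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a >> q:
            a ^= poly
        b >>= 1
    return result


def bit_matrix(c, q=8):
    """q x q 0/1 matrix M with M[r][j] = bit r of c * x^j.

    Multiplying by c is then M times the bit vector of the operand,
    using only AND and XOR.
    """
    cols = [gf_mul(c, 1 << j, q=q) for j in range(q)]
    return [[(cols[j] >> r) & 1 for j in range(q)] for r in range(q)]


def mul_via_bits(c, v, q=8):
    """Multiply c by v one bit at a time with AND and XOR."""
    m = bit_matrix(c, q)
    out = 0
    for r in range(q):
        bit = 0
        for j in range(q):
            bit ^= m[r][j] & (v >> j) & 1
        out |= bit << r
    return out


def mul_planes(c, planes, q=8):
    """Word-parallel form: planes[j] holds bit j of many values, one value
    per bit position of the word. Each output word is an XOR of input words."""
    m = bit_matrix(c, q)
    out = []
    for r in range(q):
        word = 0
        for j in range(q):
            if m[r][j]:
                word ^= planes[j]
        out.append(word)
    return out
```

With 32 values packed into eight 32-bit planes, mul_planes multiplies all 32 values by c using at most 64 word-wide XORs and no table lookups or field multiplications in the inner loop.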
While Blömer teaches how to create a systematic code, the code is still not as efficient as parity for the single failure case and is therefore not a practical replacement for parity techniques. In addition, the prototype implementation neither exploits nor teaches the optimizations of the present invention.