The present invention relates to in-place differential compression, where a body of data T is compressed with respect to a body of data S by performing copies from S in such a way that no additional memory is used beyond what is needed to store the longer of S or T; that is, when decoding, S is overwritten from left to right as T is constructed in its place.
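The decoding behavior described above can be illustrated with a minimal sketch. It is not the claimed encoding: the operation format (`("copy", src, length)` and `("lit", data)`), the function name `decode_in_place`, and the use of absolute buffer offsets are all assumptions made for illustration. A single buffer initially holds S and is overwritten from left to right until it holds T; the only storage used is that buffer, grown if necessary to the length of the longer of S and T.

```python
def decode_in_place(buf: bytearray, ops, target_len: int) -> bytearray:
    """Overwrite buf (initially S) from left to right so that it ends up holding T.

    ops is a hypothetical instruction list:
      ("copy", src, length) copies bytes from buf[src:] to the write position,
      ("lit", data)         writes literal bytes at the write position.
    """
    # Grow the buffer only if T is longer than S.
    if len(buf) < target_len:
        buf.extend(bytes(target_len - len(buf)))

    pos = 0  # everything left of pos is already part of T
    for op in ops:
        if op[0] == "copy":
            _, src, length = op
            # Byte-by-byte forward copy. If src > pos, the bytes read are
            # still untouched S data; if src < pos, they are T data that
            # has already been decoded.
            for i in range(length):
                buf[pos + i] = buf[src + i]
            pos += length
        else:
            data = op[1]
            buf[pos:pos + len(data)] = data
            pos += len(data)

    # Shrink the buffer if T is shorter than S.
    del buf[target_len:]
    return buf


s = bytearray(b"the quick brown fox")
ops = [("copy", 4, 15), ("lit", b" jumps")]
print(decode_in_place(s, ops, 21).decode())  # prints "quick brown fox jumps"
```

In this sketch the encoder is assumed to simulate the same buffer, so that every copy refers to data that is still present when the decoder reaches it; a literal or copy that overwrites part of S makes that part unavailable to later copies.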
Many patents and technical papers pertain to data compression. Many relate to techniques that do not employ string copying, such as Huffman coding (e.g., U.S. Pat. No. 4,646,061) or arithmetic coding (e.g., U.S. Pat. No. 4,905,297). Many others relate to techniques that employ string copies, but in a traditional data compression model where a single body of data is compressed, rather than in-place differential compression of a first body of data with respect to a second body of data; for example, U.S. patents such as Holtz [U.S. Pat. No. 4,366,551], Welch [U.S. Pat. No. 4,558,302], Waterworth [U.S. Pat. No. 4,701,745], MacCrisken [U.S. Pat. No. 4,730,348], Miller and Wegman [U.S. Pat. No. 4,814,746], Storer [U.S. Pat. Nos. 4,876,541, 5,379,036], Fiala and Greene [U.S. Pat. No. 4,906,991], George, Ivey, and Whiting [U.S. Pat. Nos. 5,003,307, 5,016,009, 5,126,739], Rubow and Wachel [U.S. Pat. No. 5,023,610], Clark [U.S. Pat. Nos. 5,153,591, 5,253,325], Lantz [U.S. Pat. No. 5,175,543], Ranganathan and Henriques [U.S. Pat. No. 5,179,378], Cheng, Craft, Garibay, and Karnin [U.S. Pat. No. 5,608,396], and technical articles such as Lempel and Ziv [1977, 1979] and Storer [1978, 1988, 1982, 2002].
There have also been a number of patents and technical papers relating to differential compression that do not perform decoding in-place; for example: Squibb [U.S. Pat. Nos. 5,479,654, 5,745,906], Morris [U.S. Pat. No. 5,813,017], Muralidhar and Chandan [U.S. Pat. No. 6,233,589], Thompson, Peterson, and Mohammadioun [U.S. Pat. No. 6,671,703], and technical articles such as Weiner [1973] (who developed a linear time and space greedy copy/insert algorithm using a suffix tree to search for matching substrings), Wagner and Fischer [1973] (who considered the string-to-string correction problem), Heckel [1978] (who presented a linear time algorithm for detecting block moves using longest common substring techniques), Tichy [1984] (who used edit-distance techniques for differencing and considered the string-to-string correction problem with block moves), Miller and Myers [1985] (who presented a comparison program for producing delta files), Fraser and Myers [1987] (who integrated version control into a line editor so that on every change a minimal delta is retained), Reichenberger [1991] (who presented a greedy algorithm for differencing), Apostolico, Browne, and Guerra [1992] and Rick [1995] (who considered methods for computing longest common subsequences), Burns and Long [1997b] (who used delta compression to modify ADSM, the Adstar Distributed Storage Manager of IBM, to transmit compact encodings of versioned data, where the client maintains a store of reference files), Hunt, Tichy and Vo [1998] (who combined Lempel-Ziv type compression and differential compression to compute a delta file by using a reference file as part of the dictionary to compress a target file), Factor, Sheinwald and Yassour [2001] (who presented Lempel-Ziv based compression with an extended dictionary with shared data), Shapira and Storer [2002] (who gave theoretical evidence that determining the optimal set of move operations is not computationally tractable, and presented an approximation algorithm for a block edit-distance problem), and Agarwal, Amalapurapu, and Jain [2003] (who sped up differential compression with hashing techniques and additional data structures such as suffix arrays).
There is also commonly available software for differencing that does not employ in-place decoding with string copying, such as the UNIX diff, xdelta, and zdelta utilities.
Burns and Long [1997], Ajtai, Burns, Fagin, and Long [2002], and the U.S. patent of Ajtai, Burns, Fagin, and Stockmeyer [U.S. Pat. No. 6,374,250] use a hash table with Karp-Rabin footprints to perform differential compression of one file with respect to another, using constant space in addition to that used by both files, but do not provide for in-place decoding.
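For orientation, the general idea of matching by footprints can be sketched as follows. This is a generic Karp-Rabin rolling hash and is not the method of the cited references: in particular, the table below stores one entry per window of the reference and therefore does not use constant space. The names `footprints` and `find_matches` and the constants `BASE` and `MOD` are assumptions chosen for illustration.

```python
BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus (illustrative choice)


def footprints(data: bytes, k: int):
    """Yield (offset, footprint) for every length-k window of data."""
    if len(data) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:k]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(k, len(data)):
        # Slide the window: drop data[i - k], append data[i].
        h = ((h - data[i - k] * high) * BASE + data[i]) % MOD
        yield i - k + 1, h


def find_matches(ref: bytes, tgt: bytes, k: int):
    """Return (target_offset, reference_offset) pairs of equal length-k windows."""
    table = {}
    for off, fp in footprints(ref, k):
        table.setdefault(fp, off)  # keep the first offset seen per footprint
    matches = []
    for off, fp in footprints(tgt, k):
        src = table.get(fp)
        # Equal footprints only suggest a match; compare the bytes to confirm.
        if src is not None and ref[src:src + k] == tgt[off:off + k]:
            matches.append((off, src))
    return matches


print(find_matches(b"abcdefgh", b"xxcdefyy", 4))  # prints [(2, 2)]
```

Each confirmed match is a candidate for a copy operation from the reference file when encoding the target file.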
Burns and Long [1998], Burns, Stockmeyer, and Long [2002], and the U.S. patent of Burns and Long [U.S. Pat. No. 6,018,747] present an in-place reconstruction of differentially compressed data, but do not perform the reconstruction with copies that overwrite from left to right. They begin with a traditional delta file and work to detect and eliminate write-before-read conflicts, which increases the size of the delta coding.
The invention disclosed here is in part motivated by the research presented in Shapira and Storer [2003].