1. Field of the Invention
The present invention relates generally to data compression and archiving, file differencing, and patching. More particularly, the present invention relates to a system and method for improved differencing of very large files. Still more particularly, the present invention relates to a method for performing a differencing algorithm over blocks of the target file, generating patch blocks in the process.
2. Discussion of Related Art Including Information Disclosed Under 37 CFR 1.97, 1.98
Current differencing technology, which is typically implemented in “differs”, “delta coders”, “delta encoders”, “updaters”, “patchers” and the like, has been available in various forms for some time. The present invention advances the state of the art by processing data incrementally and in an optimal order, thereby producing smaller update packages while using less memory.
Delta/differencing technology addresses the need of device and software manufacturers, as well as information providers, to update software, operating systems and related data in the most efficient manner possible. The primary purpose of delta/differencing technology is to reduce the large bandwidth and storage costs associated with distributing updated data to existing users of devices, software and data. It does so by storing and/or transmitting only the differences between the old and new data or code.
Differencing algorithms are designed to generate small patch files for similar, potentially very large, source and target file pairs. Differencing algorithms benefit from having more source data available for referencing during their operation. In a typical application, patch data is a continuous stream applied in a single operation. However, this can be problematic when the memory available during patch application is limited.
Accordingly, to ensure that patch application can be performed on arbitrarily large files in limited-memory situations, the present invention provides a block-based differencing system and method that operates on blocks of the target file, generating independent patch blocks in the process. Block size is adaptive and is determined by the following three factors. First, a maximum block size is imposed, based on the memory requirements of the patching algorithm and the desired upper limit on memory consumption. Second, when the number of source file blocks is small, it is beneficial to determine block boundaries so that all patch blocks are equally or similarly sized. When patch blocks are compressed, this avoids the inefficient compression of small blocks, which compress poorly because an adaptive compressor has not yet learned the statistics of the data (its initial learning curve). Third, when the number of source file blocks is large, the last few blocks are made equally or similarly sized, producing an effect similar to the one above.
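The three block-size factors can be illustrated with a minimal sketch. It assumes the block count is derived from the target length and the maximum block size; the thresholds `small_count` and `tail_blocks`, and the function name `block_boundaries`, are hypothetical and are not taken from the invention.

```python
from __future__ import annotations


def block_boundaries(target_len: int, max_block: int,
                     small_count: int = 4,
                     tail_blocks: int = 3) -> list[tuple[int, int]]:
    """Split [0, target_len) into (offset, size) blocks no larger than max_block.

    small_count and tail_blocks are hypothetical tuning knobs.
    """
    if target_len <= 0:
        return []
    # Factor 1: fewest blocks that respect the memory-derived size cap.
    n = (target_len + max_block - 1) // max_block
    # Factor 2: few blocks -> equalize all of them.
    # Factor 3: many blocks -> full-size blocks first, equalize only the tail.
    k = n if n <= small_count else max(1, min(tail_blocks, n))
    sizes = [max_block] * (n - k)
    rest = target_len - max_block * (n - k)  # bytes left for the k equalized blocks
    base, extra = divmod(rest, k)
    sizes += [base + 1 if i < extra else base for i in range(k)]
    blocks, offset = [], 0
    for size in sizes:
        blocks.append((offset, size))
        offset += size
    return blocks
```

For example, a naive split of 95 bytes into 10-byte blocks would end with a 5-byte block; in this sketch the last three blocks become 9, 8 and 8 bytes, and `block_boundaries(10, 4)` yields `[(0, 4), (4, 3), (7, 3)]` rather than ending with a 2-byte block.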
Additionally, differencing algorithms perform best when their counterpart patching algorithms are allowed to reference the entire source file and the portions of the target file already reconstructed during patching. On the device where the patching algorithm is executed, this requires enough memory to accommodate both the source and the target file. In some situations, such as mobile and other low-end devices using flash memory, patching must be performed in place because there is not enough memory for both the source and the target files.
Accordingly, the patching algorithm of the present invention divides the target file into blocks, which are processed independently. Each block carries its size and position in the target file, and blocks are put in place one at a time by the patching algorithm. During patching, each patch block is allowed to reference any part of the partially processed source/target file. Blocks can be processed in any order.
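In-place application of independent patch blocks can be sketched as follows. This is an illustration under stated assumptions, not the invention's patch format: the `PatchBlock` class, the `("copy", src, n)` / `("add", data)` operation encoding and the `apply_in_place` function are all hypothetical. Each block records its offset and size in the target, is rebuilt from the buffer's current, partially patched contents, and is then written into place, so only one block is held outside the buffer at a time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PatchBlock:
    offset: int                              # position of the block in the target file
    size: int                                # length of the reconstructed block
    ops: list = field(default_factory=list)  # ("copy", src, n) or ("add", data)


def apply_in_place(buf: bytearray, blocks: list[PatchBlock],
                   target_len: int) -> bytearray:
    """Rebuild the target inside buf, one independent patch block at a time."""
    if len(buf) < target_len:
        buf.extend(bytes(target_len - len(buf)))  # grow to target size
    for blk in blocks:  # the differencing side chooses this order
        out = bytearray()
        for op in blk.ops:
            if op[0] == "copy":
                _, src, n = op
                # Reads the buffer as it is now: untouched source bytes or
                # target bytes written by earlier blocks.
                out += buf[src:src + n]
            elif op[0] == "add":
                out += op[1]
            else:
                raise ValueError(f"unknown op {op[0]!r}")
        if len(out) != blk.size:
            raise ValueError("block size mismatch")
        # Only this block is buffered before being written into place.
        buf[blk.offset:blk.offset + blk.size] = out
    del buf[target_len:]  # shrink if the target is shorter than the source
    return buf
```

With source `b"abcdef"`, the blocks `PatchBlock(0, 3, [("copy", 0, 3)])` and `PatchBlock(3, 5, [("copy", 0, 3), ("add", b"XY")])` produce `b"abcabcXY"` in either order. Order can matter, however: swapping the two halves of `b"abcdef"` with copy-only blocks yields `b"defdef"` when the block at offset 0 goes first, because the bytes it overwrites are later needed; a correct patch would have to carry literal bytes in one of the blocks.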
The order in which the patch process handles the blocks is selected by the differencing algorithm with the goal of minimizing the sum of the patch block sizes, either by heuristic rules or by an exhaustive search over block orderings, which guarantees an optimal order.
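The two ordering strategies can be contrasted in a short sketch. It assumes a hypothetical cost function `cost(block, done)` that returns a block's patch size given the set of blocks already placed; computing that cost is outside the sketch. The greedy rule shown is one possible heuristic, not necessarily the one the invention uses, and exhaustive search examines n! orderings, so it is practical only for a handful of blocks.

```python
from __future__ import annotations

from itertools import permutations
from typing import Callable, FrozenSet, Hashable, Iterable

# Hypothetical cost model: size of a block's patch given the set of blocks
# already placed (placed blocks overwrite source data and expose target data).
CostFn = Callable[[Hashable, FrozenSet[Hashable]], int]


def total_size(order: Iterable[Hashable], cost: CostFn) -> int:
    """Sum of patch block sizes when blocks are applied in this order."""
    done: set = set()
    total = 0
    for b in order:
        total += cost(b, frozenset(done))
        done.add(b)
    return total


def best_order_exhaustive(blocks, cost: CostFn):
    """Try every ordering; return an optimal order and its total size."""
    best = min(permutations(blocks), key=lambda o: total_size(o, cost))
    return list(best), total_size(best, cost)


def best_order_greedy(blocks, cost: CostFn):
    """Heuristic: repeatedly place the block that is cheapest right now."""
    remaining, done, order = list(blocks), set(), []
    while remaining:
        b = min(remaining, key=lambda x: cost(x, frozenset(done)))
        remaining.remove(b)
        order.append(b)
        done.add(b)
    return order, total_size(order, cost)


def toy_cost(block, done):
    """Made-up costs for three blocks, for illustration only."""
    if block == "A":  # A reads source bytes that B overwrites
        return 10 if "B" in done else 2
    if block == "C":  # C reads target bytes that A produces
        return 1 if "A" in done else 8
    return 3          # B costs the same in any position
```

With `toy_cost`, exhaustive search returns `(["A", "B", "C"], 6)` and the greedy rule returns `(["A", "C", "B"], 6)`; on this small example both reach the minimum, but in general only the exhaustive search guarantees it.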
Accordingly, the inventive differencing algorithm exploits the incremental nature of patch generation to optimize its search. The search data structures used to locate matching portions of the source and the target are updated incrementally, upon completion of each patch block.
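Incremental updating of a search structure can be sketched with a simple hash index of fixed-length seeds. This is a simplification under stated assumptions: the reference data is modeled as the source followed by each completed target block, appended as it is finished, and the `SeedIndex` class and its seed length `k` are hypothetical. The point is that `extend` indexes only the newly added positions instead of rebuilding the whole index.

```python
from __future__ import annotations

from collections import defaultdict


class SeedIndex:
    """Maps every k-byte substring of the indexed data to its start offsets."""

    def __init__(self, k: int = 4):
        self.k = k
        self.data = bytearray()
        self.table: defaultdict[bytes, list[int]] = defaultdict(list)

    def extend(self, chunk: bytes) -> None:
        """Append chunk and index only the newly completed k-byte seeds."""
        # Seeds starting before this point were fully indexed earlier.
        start = max(0, len(self.data) - self.k + 1)
        self.data += chunk
        for i in range(start, len(self.data) - self.k + 1):
            self.table[bytes(self.data[i:i + self.k])].append(i)

    def longest_match(self, needle: bytes) -> tuple[int, int]:
        """Return (offset, length) of the longest match for a prefix of needle."""
        best = (-1, 0)
        for pos in self.table.get(bytes(needle[:self.k]), []):
            n = 0
            while (n < len(needle) and pos + n < len(self.data)
                   and self.data[pos + n] == needle[n]):
                n += 1
            if n > best[1]:
                best = (pos, n)
        return best
```

After `extend(b"hello world")`, a search for `b"d goo"` finds nothing; once a completed block `b" goodbye"` is added with `extend`, the same search finds a 5-byte match at offset 10, including the seed that spans the old boundary.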
Additionally, prior art differencing algorithms typically contain built-in logic intended to make their output smaller. However, the size reduction achieved this way is less than optimal. The present invention instead applies a separate compression step, in which a universal compressor is employed.
Accordingly, when differencing is followed by a separate compression stage, it is important to match the properties of the compressor to the properties of the differencing output. The optimal combination is achieved when a block-based compression algorithm (such as one based on the Burrows-Wheeler transform, BWT) operates on entire patch blocks, thus ensuring that compression blocks are not misaligned with patch blocks and that no compression loss results.
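Compressing whole patch blocks with a block-based compressor can be sketched using Python's standard `bz2` module, since bzip2 is a BWT-based compressor. This is an illustrative choice, not necessarily the compressor used by the invention; the helper names are hypothetical, and the sketch assumes the maximum patch block size is chosen to fit within one bzip2 block (roughly 900 kB at `compresslevel=9`).

```python
from __future__ import annotations

import bz2


def compress_patch_blocks(patch_blocks: list[bytes]) -> list[bytes]:
    """Compress each serialized patch block as its own bzip2 stream.

    Feeding the compressor whole patch blocks keeps its block boundaries
    aligned with patch block boundaries (assuming each patch block fits
    in a single bzip2 block).
    """
    return [bz2.compress(pb, compresslevel=9) for pb in patch_blocks]


def decompress_patch_block(blob: bytes) -> bytes:
    """Each patch block decompresses independently of the others."""
    return bz2.decompress(blob)
```

Because each compressed block is a self-contained stream, a patcher can decompress and apply one block at a time, which fits the limited-memory, in-place setting described above.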