Popular computer programs, including computer operating system software, are subject to near-constant revision. Their evolution is sometimes so rapid that, a month after installation, a newer version is available. The newer version may feature additional capabilities, bug fixes, and enhanced compatibility with other programs. Accordingly, many users desire to update their programs each time a newer version is released.
A user wishing to update a computer program can either acquire a new copy of the program, or “patch” the old. Patching is growing in popularity, particularly with the growth of the internet. Patches for updating many popular computer programs are now commonly available from software vendors' web sites, allowing users to update their software programs without leaving home.
Patching is an old technology, going back decades. Generally, patch files include a series of instructions specifying how a new version of a file can be assembled from snippets of data from an old version of the file, together with insertions of new data. An exemplary series of patching instructions may look like the following:

    1. Load old file ABC.EXE into memory;
    2. Check that the file data at offset 16 reads “Version 2.04”; if not, fail;
    3. Copy bytes 1 through 16 of the old file into a new file;
    4. Next, insert the ASCII text “Version 3.02” into the new file;
    5. Next, copy bytes 29–256 of the old file into the new file;
    6. Next, insert the following hex data into the new file:
       09030001606BF5D53B591A10B5690800
    7. Next, copy bytes 289–496 of the old file into the new file;
    8. Next, copy bytes 505–512 into the new file;
    9. Close the new file and store under name ABC.EXE.

It will be recognized that the foregoing instructions result in a new version of file ABC.EXE in which:

    the first 16 bytes are unchanged;
    the version number stored at bytes 17–28 has been rewritten from “Version 2.04” to “Version 3.02”;
    bytes 29–256 are unchanged;
    32 bytes of hex data at bytes 257–288 have been rewritten;
    bytes 289–496 are unchanged;
    bytes 497–504 have been omitted; and
    bytes 505–512 have been shifted to immediately follow byte 496.
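The copy-and-insert model of such instructions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of the instructions, the function name `apply_patch`, and the 44-byte sample file are all hypothetical and do not come from any real patch format. Byte ranges are 1-based and inclusive, matching the numbering used in the list above.

```python
from typing import Iterable, Tuple


def apply_patch(old: bytes, check_offset: int, expected: bytes,
                instructions: Iterable[Tuple]) -> bytes:
    """Assemble a new file from pieces of `old` plus literal insertions.

    Hypothetical instruction encoding (an assumption, not a real format):
      ("copy", first, last)  copy bytes first..last of the old file
                             (1-based, inclusive)
      ("insert", data)       append the literal bytes `data`
    """
    # Refuse to patch unless the old file carries the expected version text.
    if old[check_offset:check_offset + len(expected)] != expected:
        raise ValueError("old file is not the expected version")

    new = bytearray()
    for ins in instructions:
        if ins[0] == "copy":
            _, first, last = ins
            new += old[first - 1:last]   # convert to a 0-based slice
        elif ins[0] == "insert":
            new += ins[1]
        else:
            raise ValueError(f"unknown instruction: {ins[0]!r}")
    return bytes(new)


# Made-up sample data: an 8-byte header, a 12-byte version string,
# an 11-byte payload, 8 obsolete bytes, and a 5-byte tail.
old_file = b"HEADER__Version 2.04|payload-A|obsolete|tail"
patch = [
    ("copy", 1, 8),                 # header is unchanged
    ("insert", b"Version 3.02"),    # rewritten version string
    ("copy", 21, 31),               # payload is unchanged
    ("copy", 40, 44),               # tail shifts up; bytes 32-39 are omitted
]
new_file = apply_patch(old_file, 8, b"Version 2.04", patch)
```

The instruction list here is far shorter than the data it reproduces only because the copied ranges are named rather than transmitted, which is the size economy the surrounding text describes.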
Due to the replication of long strings of data from the old file in the new file, the series of patching instructions is much shorter than the file being patched. This size economy is the reason patching is more popular than transferring an entire copy of the new file.
The process of generating patching instructions, like those reproduced above, is typically automated. The vendor inputs copies of the new and old program file to a pattern matching algorithm, which tries to locate where strings of data in the new file can be found in the old file. Where such matches are found, appropriate copy instructions are generated and added to the collection of instructions that will form the patch file. Where data in the new file has no counterpart in the old, the new data is literally specified in the patch file. When completed, the patch file—in conjunction with the old version of the file—contains all the information necessary to generate the new version of the file.
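As a rough sketch of this automated step, the Python standard library's `difflib.SequenceMatcher` can stand in for the pattern matching algorithm. That substitution is an assumption made for illustration: the source does not name a matcher, and `SequenceMatcher` only finds matches that appear in the same order in both files, so it would not detect moved blocks. The function names and the instruction tuples (0-based, half-open ranges) are likewise hypothetical.

```python
from difflib import SequenceMatcher


def generate_instructions(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal insertions."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    instructions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Data in the new file that also exists in the old file.
            instructions.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Data with no counterpart in the old file is given literally.
            instructions.append(("insert", new[j1:j2]))
        # "delete": old data that is simply not carried over.
    return instructions


def assemble(old: bytes, instructions: list) -> bytes:
    """Rebuild the new file from the old file and the instructions."""
    out = bytearray()
    for ins in instructions:
        if ins[0] == "copy":
            out += old[ins[1]:ins[2]]
        else:
            out += ins[1]
    return bytes(out)


# Made-up sample data for demonstration only.
old_version = b"HEADER__Version 2.04|payload-A|obsolete|tail"
new_version = b"HEADER__Version 3.02|payload-A||tail"
instructions = generate_instructions(old_version, new_version)
rebuilt = assemble(old_version, instructions)
```

Together with the old file, the instruction list carries everything needed to regenerate the new file, which is the property the paragraph above relies on.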
After the patching instructions have been specified in a patch file, the file is typically compressed to minimize its size and download time (assuming an internet or other network download). Many suitable compression processes are known. Various implementations of the popular LZ compression algorithms typically reduce file sizes by something on the order of 50%.
After the patch file is compressed, it is transferred from the vendor's computer to the user's computer—by internet in this example. On the user's computer a decompression process is first performed to restore the patching instructions to their original form. Then the various operations specified by the patching instructions are performed, transforming a copy of the user's old file into the latest version.
While the just-described process is a great improvement over transferring a new copy of the complete program file from the vendor to the user, it still suffers from certain drawbacks.
One is the size of the compressed patch file. As discussed below, patch file sizes considerably smaller than those resulting from prior art processes are possible, reducing internet download times (or reducing needed disk storage) commensurately.
Another problem is that the version of the old file on the user's computer may not precisely match the version distributed by the vendor. In particular, the file may have been tailored in certain respects, at the time of installation on the user's computer, to better conform to particular characteristics of that computer. Thus, for example, a program file as installed on a single-processor computer may differ slightly from the “same” program file as installed on a multi-processor computer. Unless the precise contents of the file as installed on the user's computer are known, patching is a risky business.
When a software vendor knows that there are several different versions of a file to be updated, the vendor may publish a multi-version patch file. Such a patch file can be a concatenation of several different sets of patching instructions, each one applicable to a different version of the file. The drawback of this approach is that half, or more, of the patch file is superfluous data—inapplicable to the file stored on a particular user's computer. Thus, its download time is far longer than is really necessary.
Another type of multi-version patch file has a general set of patching instructions (for code that is consistent through all versions of the old file), together with one or more specialized sets of patching instructions (for code that is different between different versions of the old file). Branch instructions in the patching file examine particular characteristics of the old file, and apply the appropriate set of specialized patching instructions.
Again, this approach suffers by reason of more patching data than is needed for any given user.
In accordance with a preferred embodiment of the present invention, the foregoing and additional drawbacks of the prior art are overcome. The two distinct operations of pattern matching and compression (performed on the vendor's computer in prior art patch generation techniques) are replaced by a single operation that both compares old and new file versions, and produces a compressed output by which the latter can be generated from the former. Likewise, the two distinct operations of decompression and patch instruction application (performed on the user's computer in the prior art) are replaced by a single operation that both decompresses the patch file data and results in recreation of the new file. The patch file generated and used in these processes is of considerably reduced size—sometimes half the size of compressed patch files produced by prior art approaches.
In the preferred embodiment, these advantages are achieved by use of compression/decompression processes in which the compressor (and decompressor) is pre-initialized in accordance with the old version of the file being updated. In implementations using LZ77-type compression, this pre-initialization takes the form of preloading the respective compressor/decompressor history windows with the old version of the file. On the vendor side, the new file is applied to the pre-initialized compressor, yielding the patch file as output. The compressor both identifies redundancies between the new file and the old file (with which the compressor's history window has been preloaded), and provides a highly compressed output. On the user's side, the patch file is decompressed using a parallel process.
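A close analogue of this pre-initialization is available in Python's `zlib` module, whose `zdict` argument preloads the deflate history window with caller-supplied data. The sketch below uses it only as an illustration: deflate is an LZ77-type scheme, but it is not claimed to be the compressor of the preferred embodiment, and its window is limited to 32 KiB, so only the tail of a larger old file would be usable as history. The function names and the sample data are assumptions.

```python
import random
import zlib


def make_patch(old: bytes, new: bytes) -> bytes:
    """Vendor side: compress `new` with the history window preloaded with `old`."""
    compressor = zlib.compressobj(level=9, zdict=old)
    return compressor.compress(new) + compressor.flush()


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """User side: decompress with the same preloaded history window."""
    decompressor = zlib.decompressobj(zdict=old)
    return decompressor.decompress(patch) + decompressor.flush()


# Made-up sample data: an "old file" of pseudo-random bytes (so that ordinary
# compression gains nothing) and a "new file" that changes a version string
# and drops 100 bytes.
rng = random.Random(0)
old_file = bytes(rng.getrandbits(8) for _ in range(4000))
new_file = old_file[:1000] + b"Version 3.02" + old_file[1012:3000] + old_file[3100:]

patch = make_patch(old_file, new_file)
standalone = zlib.compress(new_file, 9)   # same data without the preload
restored = apply_patch(old_file, patch)
```

Comparing `len(patch)` with `len(standalone)` shows the effect of the preload: matches against the old file are found by the compressor itself, so no separate pattern matching pass is needed.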
(LZ77 is a form of adaptive dictionary compression named after Lempel/Ziv's 1977 paper “A Universal Algorithm for Sequential Data Compression,” IEEE Trans. Info. Theory, IT-23 (3), pp. 337–343. Many variants of this technology are known, including LZR (Rodeh's 1981 implementation), LZSS (Bell's 1986 implementation), LZB (Bell's 1987 implementation), LZH (Brent's 1987 implementation), etc. Further details can be found in the book Text Compression by Timothy Bell et al, Prentice Hall, 1990, and in Microsoft's U.S. Pat. Nos. 5,572,206, 5,521,597, and 5,455,577. A searching technique for identifying matches within the history window is disclosed in pending application Ser. No. 08/783,491, filed Jan. 14, 1997. The disclosures of these patents and patent application are incorporated by reference.)
The same technique is similarly applicable to non-LZ77 compressors. For example, in LZ78-type compressors, pre-initialization can be accomplished by first applying the old file to the compressor, thereby causing the compressor to build a string dictionary comprising excerpts of the old file. The new file is thereafter applied to the same compressor. The pre-initialization of the compressor's string dictionary allows it immediately to effect high compression efficiencies due to matches between the new file and the pre-initialized string dictionary.
(LZ78 is another form of adaptive dictionary data compression, this one named after Lempel/Ziv's 1978 paper, “Compression of Individual Sequences Via Variable-Rate Coding,” IEEE Trans. Info. Theory, IT-24 (5), pp. 530–536. Many variants of this, too, are known, including LZW (Welch's variation), LZFG (Fiala and Green's variation), and UNIX Compress.)
In LZ78-type systems, the pre-initialization of the decompressor is slightly more complex than in LZ77-type systems. Rather than copying the old file directly into a history window, a string table must be formed that duplicates the compressor's string table as it stood after pre-initialization (i.e. at the point when the new file was applied). In the preferred embodiment, the user's computer is provided with both a decompressor and a compressor. The compressor is used to process the old file from the user's computer, just as was done at the vendor's computer, thereby producing a string table. This string table is then used in the decompressor (effecting its “pre-initialization”) for decompression of the patch file from the vendor.
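The string-table approach can be illustrated with a toy LZW coder, LZW being one of the LZ78 variants. Everything in this sketch is an assumption made for illustration: the function names are hypothetical, the string table is unbounded, and the output is a plain list of integer codes rather than a packed bit stream. The point it shows is that both sides run the same compressor pass over the old file to obtain identical string tables before the new file is encoded or decoded.

```python
def lzw_encode(data: bytes, table: dict) -> list:
    """Standard LZW encoding step; extends `table` in place as it goes."""
    codes, w = [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = len(table)      # next unused code
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def primed_table(old: bytes) -> dict:
    """Run the compressor over the old file and keep only its string table."""
    table = {bytes([i]): i for i in range(256)}
    lzw_encode(old, table)              # output codes are discarded
    return table


def make_patch(old: bytes, new: bytes) -> list:
    """Vendor side: encode the new file against the primed table."""
    return lzw_encode(new, primed_table(old))


def apply_patch(old: bytes, codes: list) -> bytes:
    """User side: rebuild the same table from the local old file, then decode."""
    strings = {code: s for s, code in primed_table(old).items()}
    out, prev = bytearray(), b""
    for code in codes:
        if code in strings:
            cur = strings[code]
        else:
            # Code the encoder defined one step ago (the classic LZW corner case).
            cur = prev + prev[:1]
        out += cur
        if prev:
            strings[len(strings)] = prev + cur[:1]
        prev = cur
    return bytes(out)


# Made-up sample data for demonstration only.
old_file = b"abababab"
new_file = b"ababXabab"
patch_codes = make_patch(old_file, new_file)
unprimed_codes = lzw_encode(new_file, {bytes([i]): i for i in range(256)})
restored = apply_patch(old_file, patch_codes)
```

On this small input the primed encoder emits fewer codes than the unprimed one, because strings learned from the old file are available from the first byte of the new file.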
The same pre-initialization approach can be applied to Markov model compressors. Again, the old file is first applied to the compressor. The compressor generates probability data statistically modeling the old data file (e.g. calculating the probability of encountering a symbol X after seeing a certain number of previous symbols). When the new file is thereafter applied to the pre-initialized compressor, the existing probability data allows immediate compression efficiencies, producing a much more compact output file. This file is transferred to the user's computer. Again, as with LZ78, the user's computer has a compressor as well as a decompressor. Again, the copy of the old file on the user's computer is applied to the compressor, thereby generating the probability data with which the decompressor is pre-initialized. The compressed file from the vendor is then applied to the pre-initialized decompressor, regenerating the complete new file on the user's computer.
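The effect of pre-initializing a statistical model can be estimated without building a full coder. The sketch below is not a compressor: it assumes a simple order-1 model (the probability of a byte given the single preceding byte) with add-one smoothing, and it only sums the ideal code length, the negative base-2 logarithm of each predicted probability, that an arithmetic coder driven by the model would approach. The class name, the smoothing rule, and the sample data are all assumptions.

```python
import math
from collections import defaultdict


class Order1Model:
    """Adaptive model of P(symbol | previous byte) with add-one smoothing."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def probability(self, context: int, symbol: int) -> float:
        # Every one of the 256 byte values keeps a pseudo-count of 1.
        return (self.counts[context][symbol] + 1) / (self.totals[context] + 256)

    def update(self, context: int, symbol: int) -> None:
        self.counts[context][symbol] += 1
        self.totals[context] += 1

    def train(self, data: bytes) -> None:
        """Pre-initialize the model by applying the old file to it."""
        context = 0
        for symbol in data:
            self.update(context, symbol)
            context = symbol


def ideal_bits(model: Order1Model, data: bytes) -> float:
    """Ideal code length of `data`, updating the model as a real coder would."""
    bits, context = 0.0, 0
    for symbol in data:
        bits -= math.log2(model.probability(context, symbol))
        model.update(context, symbol)
        context = symbol
    return bits


# Made-up sample data for demonstration only.
old_file = b"the quick brown fox jumps over the lazy dog. " * 20
new_file = old_file.replace(b"lazy", b"sleepy")

cold_bits = ideal_bits(Order1Model(), new_file)

primed = Order1Model()
primed.train(old_file)      # both vendor and user can do this from the old file
primed_bits = ideal_bits(primed, new_file)
```

Because the user's computer can rebuild the same counts from its own copy of the old file, a decoder that mirrors these updates would stay in step with the encoder.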
The preferred embodiment also addresses variant installations of the old file on different computers, so that a single patch file can be applied irrespective of such variations. By so doing, the need for a multi-version patch file is eliminated, further reducing the size of the patch file when compared with prior art techniques.
In the illustrated embodiment, such file variations are “normalized” prior to application of the patch file. A temporary copy of the old file is desirably made, and locations within the file at which the data may be unpredictable due to idiosyncrasies of the file's installation are changed to predictable values. So doing assures that the data to which the patch file is applied will be essentially uniform across all computers.
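A minimal sketch of such normalization follows. The source does not say here how the variable locations are identified or which predictable values are written, so the `(offset, length)` region list and the zero fill are assumptions made for illustration, as are the function name and the sample data.

```python
from typing import Iterable, Tuple


def normalize(old: bytes, variable_regions: Iterable[Tuple[int, int]]) -> bytes:
    """Return a temporary copy of `old` with install-specific regions zeroed.

    `variable_regions` holds hypothetical (offset, length) pairs, 0-based.
    """
    temp = bytearray(old)               # the installed file itself is untouched
    for offset, length in variable_regions:
        if offset < 0 or offset + length > len(temp):
            raise ValueError("region lies outside the file")
        temp[offset:offset + length] = bytes(length)   # predictable fill: zeros
    return bytes(temp)


# Made-up sample data: two installations that differ only in a 4-byte field.
single_cpu = b"CODE" + b"\x01\x00\x00\x00" + b"REST-OF-FILE"
multi_cpu = b"CODE" + b"\x04\x00\x00\x00" + b"REST-OF-FILE"
regions = [(4, 4)]

normalized_a = normalize(single_cpu, regions)
normalized_b = normalize(multi_cpu, regions)
```

After normalization both variants present identical data, so one patch file prepared against the normalized form could serve either installation.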
Additional features and advantages of the present invention will be more readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings.