Updating, or otherwise changing, software program files and data files is a routine part of computing. For instance, revisions to software programs and other files are regularly required to eliminate bugs found during use or to add newly developed features. Sometimes these revisions are relatively minor, changing only a small percentage of the data that makes up the file. In other cases, the revisions are much more extensive and require additional update steps.
One way to update these files involves creating a completely new file containing all of the desired changes. The new files may then be distributed to users to replace the existing files. In addition to being physically distributed on floppy disks, CDs or DVDs, these files, which may be relatively large, may be distributed from software manufacturers to users via a data communications network such as the Internet.
One obstacle to the frequent revision of large computer files by a manufacturer is the cost of delivering the updated file to the user. When a complete revised file is delivered, the amount of data can be substantial. For example, such files are often ten million characters (10 megabytes) or larger. Distributing files of this size over a medium such as the Internet can take an undesirably long time from the point of view of the customer and can consume a large amount of server resources from the point of view of the file provider.
One solution to the problem of distributing large computer files over networks such as the Internet is the use of differencing programs or comparator algorithms. These applications compare an old file to a new revised file in order to determine how the files differ. Once the differences between the two files have been identified, only those differences are transmitted.
One example of such a technique is the “RSYNC” algorithm (“rsync”), which can be used with any conventional operating system including, for example, UNIX-like and Microsoft Windows operating systems. Rsync has proven to be extremely useful in comparing files whose content differs only partially. Generally speaking, rsync compares an original or “seed” file at a client computer with a revised or “target” file at a server and “notices” differences between the two files using checking data (e.g., checksums and the like). Specifically, rsync identifies these differences by generating checking data for blocks of the seed file at the client and comparing it against checking data for blocks of the target file generated at the server. Matching checking data indicates identical blocks, while mismatches suggest that changes have been made. Rsync then downloads only those parts of the target file that are actually new, and reuses any parts of the seed file that are unchanged in the target.
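The block comparison described above can be illustrated with a minimal Python sketch. It is not the actual rsync implementation: it assumes fixed, block-aligned comparison only (no rolling checksum), uses SHA-256 as the checking data, and the names `block_checksums`, `plan_transfer` and `rebuild` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one checksum per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def plan_transfer(seed, target, block_size=BLOCK_SIZE):
    """Decide, block by block, what can be reused from the seed file.

    Returns a list of ("reuse", seed_block_index) or ("download", bytes).
    """
    # Checking data for the seed file (client side in the description above).
    seed_index = {}
    for index, checksum in enumerate(block_checksums(seed, block_size)):
        seed_index.setdefault(checksum, index)

    plan = []
    # Checking data for the target file (server side in the description above).
    for index, checksum in enumerate(block_checksums(target, block_size)):
        if checksum in seed_index:
            plan.append(("reuse", seed_index[checksum]))
        else:
            start = index * block_size
            plan.append(("download", target[start:start + block_size]))
    return plan


def rebuild(seed, plan, block_size=BLOCK_SIZE):
    """Reassemble the target from reused seed blocks and downloaded blocks."""
    out = bytearray()
    for action, payload in plan:
        if action == "reuse":
            start = payload * block_size
            out += seed[start:start + block_size]
        else:
            out += payload
    return bytes(out)


seed = b"AAAABBBBCCCCDDDD"
target = b"AAAAXXXXCCCCDDDD"
plan = plan_transfer(seed, target)
downloaded = sum(len(p) for action, p in plan if action == "download")
print(plan, downloaded)
```

In this example only the single changed block would need to be downloaded; the remaining three blocks are taken from the seed file.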
One drawback of the rsync algorithm is that generating the checking data at the server requires a large amount of processing by the server CPU. Thus, the server CPU may become overloaded when more than just a few clients attempt to run the rsync algorithm at the same time. In these cases, the network bandwidth overload that rsync was intended to address is simply replaced with a CPU processing overload, and the overall situation improves only negligibly.
Another technique that is commonly used in the downloading of data, sometimes in conjunction with comparator algorithms like rsync, is compression. Compression recognizes and eliminates redundancy in the data (i.e., repetitive or identical patterns of bits) to reduce the amount of data to be stored or transmitted. Compression algorithms operate by generating a “history” associated with a piece of repetitive data. These histories are then referred to each time the repetitions are encountered to create a compressed form of the data. While compression is, in many cases, effective in reducing the amount of data to be transmitted, changes to just a few bytes at the beginning of an updated or revised file can result in a compressed file that is entirely different from the compressed version of the file to be updated (even though the uncompressed versions of the original file and revised file may be quite similar). This tends to defeat much of the optimization offered by comparator algorithms like rsync, which rely on similarities between the original and revised files.
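The interaction between compression and block comparison can be explored with a short Python sketch. It assumes zlib (DEFLATE) as the compressor and a synthetic, hypothetical data set; the helper names `matching_blocks` and `make_original` are illustrative only. The exact number of matching compressed blocks depends on the data and on the zlib build, so the sketch prints the counts rather than asserting a particular figure.

```python
import zlib

BLOCK_SIZE = 1024


def matching_blocks(a, b, block_size=BLOCK_SIZE):
    """Return (matching, total) for same-offset blocks of b compared with a."""
    total = (len(b) + block_size - 1) // block_size
    matching = sum(
        1
        for start in range(0, len(b), block_size)
        if a[start:start + block_size] == b[start:start + block_size]
    )
    return matching, total


def make_original():
    """Build a deterministic, moderately redundant sample file."""
    return b"".join(
        b"record %d: value=%d\n" % (i, (i * i) % 9973) for i in range(20000)
    )


original = make_original()
# Change only the first six bytes; the file length stays the same.
revised = b"RECORD" + original[6:]

# Uncompressed: every block except the first is still identical.
raw_match, raw_total = matching_blocks(original, revised)

# Compressed: compare the two compressed streams block by block.
packed_original = zlib.compress(original)
packed_revised = zlib.compress(revised)
packed_match, packed_total = matching_blocks(packed_original, packed_revised)

print("uncompressed blocks matching:", raw_match, "of", raw_total)
print("compressed blocks matching:  ", packed_match, "of", packed_total)
```

Comparing the two printed lines gives a rough sense of how much block-level similarity survives compression for a given input.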