When a software vendor wants to provide a set of one or more files to its customers, such as a new product release or a relatively large upgrade, the file or files may be merged into an archive to make a single package of the related contents, wherein a package is generally some collection of data files used as a set. Often the archive is made into a self-extracting archive by adding executable code which, when executed, extracts the contents of the package back into the set of files that were previously merged. The self-extracting code may also initiate a setup procedure, typically by executing one of the files that was just extracted, which in turn copies the files to appropriate locations on a customer's computer. Upon completion of the setup procedure, the self-extracting code deletes the extracted files and then terminates. In most cases, this allows an entire product feature or update to be retrieved as a single file object, which can be directly executed to access or install the product's contents.
The archive process ordinarily will use some sort of data compression to reduce the size of the archive, which reduces the costs of distribution and retrieval, particularly for large archives. One such compression technique compresses the files separately, providing the customer with access to any individual file as needed. The size of such a package is generally the sum of the compressed sizes of each included file, plus the size of the extraction code. Upon execution, the package extracts each of the compressed files to a temporary location, from which a user can copy each file to a proper location in the system's directory.
For packages where individual file access is not necessary, such as when a setup procedure is automatically run to install the extracted files, package compression is further improved by the use of cabinet (or CAB) files, in which the files are essentially appended to one another (concatenated) prior to compression. This improves encoding efficiency with LZ-based encoders (well-known types of dictionary encoders named after originating work done by Lempel and Ziv), because with LZ encoding, compression of an input data stream depends on a preceding portion of the input data stream known as the history, and the concatenation of the files increases the amount of history data that is available. Note that with compressed files, the compressed data is decompressed during extraction, so that the files are in their original form before the setup procedure runs to operate on those files.
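The effect of concatenation on an LZ-based encoder can be illustrated with a small experiment. The sketch below assumes Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for the compressors actually used in cabinet files, and it uses synthetic files that share a common block; the function names are hypothetical.

```python
import random
import zlib


def make_related_files(count: int = 4, shared_size: int = 2000) -> list[bytes]:
    """Create synthetic files that share a block of hard-to-compress content."""
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    shared = bytes(rng.getrandbits(8) for _ in range(shared_size))
    return [shared + f"unique trailer {i}".encode() for i in range(count)]


def separate_size(files: list[bytes]) -> int:
    """Total size when each file is compressed on its own (no shared history)."""
    return sum(len(zlib.compress(f, 9)) for f in files)


def concatenated_size(files: list[bytes]) -> int:
    """Size when the files are appended to one another before compression."""
    return len(zlib.compress(b"".join(files), 9))


files = make_related_files()
print("separately compressed:", separate_size(files), "bytes")
print("concatenated first:   ", concatenated_size(files), "bytes")
```

With separate compression, each file starts with an empty history, so the shared block is encoded again for every file. After concatenation, later copies of the block can be encoded as references into the history built from the first copy. Note that DEFLATE's history window is limited to 32 KB, so this demo keeps the total input small.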
Even with compression techniques, packages can be large relative to the amount of data that can be conveniently transmitted, for example, over a network. For customers not having broadband network access, the large size of packages makes it impractical or at least very inconvenient to download such packages. Some customers have to pay long-distance or connection time charges to download data, and others may have quotas on the amount of data that can be downloaded and/or a limit on the connection time of a session. Other customers simply will not bother with downloading large files over a modem. Large file downloads are further vulnerable to network connection problems that terminate a session. For such customers, large package distribution is a problem.
The package vendors also have costs that are relative to the size of downloads they provide. For example, distributing large files requires a significant amount of network server equipment, which is expensive. CD-ROMs are often made available at the vendor's expense for some customers. Even distribution over the internet has variable costs which increase when larger packages are transmitted.
An improved way of providing updates that reduces the amount of data that needs to be transmitted is described in U.S. Pat. No. 6,493,871. In this approach, a client (customer) computer first obtains from a setup server an initial setup package that includes a setup program and a list of files required for installing the software product. The setup program, running on the client computer, then determines whether current or earlier versions of the files required for installation already exist on the client computer, and compiles a request list of files needed for updating the client computer. The client computer sends the request list to a download server, which maintains a collection of update files and patches, and which responds to the request list by transmitting to the client an appropriate set of files needed for the update. One or more of the files may be in the form of patches, in which a patch is a small data file derived from an earlier version of a file and a newer version of that file. The patch can be applied to a copy of the earlier file version already at the client computer to produce the new version, eliminating the need to download the full new version.
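The request-list exchange described above can be sketched as two small functions, one for the client and one for the download server. This is a minimal illustration under stated assumptions, not the method of the cited patent: file versions are represented as plain strings, and the names `Request`, `build_request_list`, and `choose_responses` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    name: str                    # file the client needs
    wanted_version: str          # version listed in the setup package
    have_version: Optional[str]  # version already on the client, if any


def build_request_list(required: dict[str, str],
                       installed: dict[str, str]) -> list[Request]:
    """Client side: list every required file that is missing or out of date."""
    requests = []
    for name, wanted in required.items():
        have = installed.get(name)
        if have != wanted:
            requests.append(Request(name, wanted, have))
    return requests


def choose_responses(requests: list[Request],
                     patches: set[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Server side: send a patch when one exists for the client's version."""
    responses = []
    for req in requests:
        key = (req.name, req.have_version, req.wanted_version)
        kind = "patch" if req.have_version and key in patches else "full"
        responses.append((req.name, kind))
    return responses


required = {"a.dll": "2.0", "b.dll": "2.0", "c.dll": "2.0"}
installed = {"a.dll": "2.0", "b.dll": "1.0"}
requests = build_request_list(required, installed)
print(choose_responses(requests, {("b.dll", "1.0", "2.0")}))
```

In this example, `a.dll` is already current and is not requested, `b.dll` can be updated with a patch from version 1.0, and `c.dll` is absent from the client and must be sent in full.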
While this approach can significantly reduce the amount of data that a client has to download, it also has a number of drawbacks. For one, such binary patching, also referred to as delta compression, only works when the vendor knows (or can safely assume) which versions of a file are already available at a given client's computer. This is not always possible, such as with a CD-ROM or other fixed distribution scheme. Note that it is feasible to have a single generic archive update the various versions of files that a vendor's customers may be using, by including in the archive multiple files, one for each different version, so that one of them can be applied to whichever version of a file a particular client may have. However, this is also not efficient, and is not practical or manageable in situations where there are a large number of files (e.g., on the order of hundreds or even thousands) that need to be updated via a package. Much of the savings achieved via delta compression would be lost by having to deal with multiple versions for large numbers of files.
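The dependence of a patch on a known earlier version can be demonstrated with a simple stand-in for delta compression. The sketch below assumes Python's `zlib` with the old file supplied as a preset dictionary (`zdict`); production binary-patching tools use different, more capable algorithms, and the names `make_patch` and `apply_patch` are hypothetical.

```python
import random
import zlib


def make_patch(old: bytes, new: bytes) -> bytes:
    """Encode the new file using the old file as preset history."""
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Rebuild the new file; this requires the same old file used for the patch."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(patch) + decomp.flush()


rng = random.Random(1)  # fixed seed so the demo is repeatable
old_file = bytes(rng.getrandbits(8) for _ in range(4000))
new_file = old_file[:1500] + b"changed section" + old_file[1500:]

patch = make_patch(old_file, new_file)
print("full compressed download:", len(zlib.compress(new_file, 9)), "bytes")
print("patch download:          ", len(patch), "bytes")
assert apply_patch(old_file, patch) == new_file

# A client holding a different earlier version cannot use this patch.
other_old = bytes(rng.getrandbits(8) for _ in range(4000))
try:
    apply_patch(other_old, patch)
except zlib.error as exc:
    print("patch rejected for unknown old version:", exc)
```

The patch is small only because it refers back to data the client is assumed to have already. A vendor that cannot tell which old version a client holds would need one such patch per possible version, which is the inefficiency noted above.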
In summary, conventional compression is costly and/or inadequate for many users and vendors because the resultant compressed packages are still too large for easy distribution. At the same time, delta compression has not heretofore worked well for customers and/or vendors who need or want to use self-contained packages that do not require dynamic customization at the server for each customer. What is needed is a way to provide software product data that is highly efficient, yet also substantially self-contained in a package.