1. Field of the Invention
The invention relates to data communications, in particular to encoding and decoding data.
2. Description of the Related Art
In general, compression is a reversible conversion of data to a format that requires fewer bits, usually performed so that the data can be stored or transmitted more efficiently. The size of the data in compressed form (C) relative to the original size (O) is known as the compression ratio (R=C/O). If the inverse of the process, decompression, produces an exact replica of the original data then the compression is lossless. Lossy compression, usually applied to image, audio or video data, does not allow reproduction of an exact replica of the original data, but achieves a smaller compressed size, i.e. a lower value of R. Thus lossy compression allows only an approximation of the original to be generated. For image compression, the fidelity of the approximation usually decreases as the degree of compression increases, i.e. as R decreases.
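The definition of the compression ratio and of lossless compression can be illustrated with a minimal sketch. It assumes Python's standard zlib module as the compressor; the helper name compression_ratio and the sample input are chosen for illustration only and are not part of the described invention.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Return R = C / O, the compressed size relative to the original size."""
    return len(compressed) / len(original)


# Highly redundant sample input (an assumption made for this illustration).
original = b"ABABABAB" * 512

compressed = zlib.compress(original)    # DEFLATE is a lossless method
restored = zlib.decompress(compressed)  # inverse process (decompression)

ratio = compression_ratio(original, compressed)
is_lossless = restored == original      # exact replica of the original data
```

With redundant input such as this, R is well below 1, and is_lossless is true because decompression reproduces the input exactly.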
Compression relies on the fact that the data is redundant. Data compression makes a file smaller by predicting the most frequent bytes and storing them in less space. Thus a compressor typically performs at least two different tasks: predicting the probabilities of the input and generating codes from those probabilities, which are done by a model and a coder respectively. The success of data compression depends largely on the data itself, and some data types are inherently more compressible than others.
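The split between a model and a coder can be sketched as follows. This is only one possible arrangement, assuming a static byte-frequency model and a Huffman coder; the function names model and huffman_code_lengths are hypothetical and chosen for illustration.

```python
import heapq
from collections import Counter


def model(data: bytes) -> dict:
    """Model: estimate how often each byte value occurs in the input."""
    return dict(Counter(data))


def huffman_code_lengths(freqs: dict) -> dict:
    """Coder: derive a Huffman code length (in bits) for each symbol."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


data = b"aaaaaaaabbbc"
freqs = model(data)
lengths = huffman_code_lengths(freqs)
coded_bits = sum(freqs[sym] * lengths[sym] for sym in freqs)
plain_bits = 8 * len(data)
```

In this example the most frequent byte receives the shortest code, so the coded size (coded_bits) is smaller than the 8 bits per byte of the plain representation (plain_bits).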
A device (software or hardware) that compresses data is often known as an encoder or coder, whereas a device that decompresses data is known as a decoder. A device that acts as both a coder and decoder is known as a codec.
Data compression is used in a very wide variety of networks (for example telecommunication and data networks) and applications. For example, ubiquitous telecommunication requires a dense radio access network with thousands of base stations and other units. Their software is updated continuously for functional enhancements or bug fixes. Usually the binary size of software increases from version to version. On-site software upgrades would increase the costs for the network operators. Therefore, new software versions are downloaded remotely via inband or dedicated connections from a central FTP (File Transfer Protocol) server. Since operators allocate only narrow portions of the expensive bandwidth for software upgrades, the download times are considerably long even if conventional compression techniques are used.
A large number of data compression algorithms exist. One efficient encoding method is delta encoding. Delta encoding refers to several techniques that store data as the difference between successive samples (or characters), rather than directly storing the samples themselves. Therefore, for example, only the differences (delta) between an old version of the software and a newer version need to be downloaded to network elements in the radio access network. The network element is able to generate the newer version of the software with a patching algorithm based on the difference file (delta file) and the old software already available in the network element.
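Delta encoding of successive samples can be illustrated with a minimal sketch. The function names delta_encode and delta_decode and the sample values are assumptions made for illustration.

```python
def delta_encode(samples):
    """Store each sample as the difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the stored differences."""
    total = 0
    samples = []
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


samples = [1000, 1002, 1003, 1003, 1007]
deltas = delta_encode(samples)   # [1000, 2, 1, 0, 4]
restored = delta_decode(deltas)
```

After the first value, the deltas are small numbers, which a subsequent coder can typically represent in fewer bits than the original samples.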
One of the most interesting delta encoding algorithms is BSDiff. The compressed delta file is usually very small and contains only the differences between two software versions. A delta compressor generates a delta file from the old (reference) and new (target) files. The delta file can then be transmitted to the target device in a reasonably short time, and the target device then builds the new software version out of the old software version by applying the patch (delta file). In other words, a patching algorithm generates the target file from the reference and delta files.
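The relationship between the reference file, the delta file and the target file can be sketched as follows. This is not the BSDiff algorithm; as a stand-in it assumes Python's standard difflib.SequenceMatcher to find common regions, and the operation names "copy" and "add" as well as the function names make_delta and apply_patch are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Describe `new` as copies from `old` plus literal bytes to add."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))   # offset and length in old
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))     # bytes only present in new
        # "delete": bytes of old that are simply not copied
    return delta


def apply_patch(old: bytes, delta) -> bytes:
    """Rebuild the target from the reference file and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]    # requires access into old
        else:
            out += op[1]
    return bytes(out)


old = b"version 1.0: init radio; start scheduler; done"
new = b"version 1.1: init radio; load config; start scheduler; done"
delta = make_delta(old, new)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "add")
rebuilt = apply_patch(old, delta)
```

Only the literal bytes and the short copy instructions need to be transmitted. Each copy operation reads from an arbitrary offset of the reference file, which is why the patching side needs to seek within the old file.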
The following discusses how a search window based technique works. The compressor has two buffers (windows): a search window and a look-ahead window. The look-ahead window contains uncompressed data that has not been processed yet. Data that has been compressed is moved from the look-ahead window to the search window. The search window holds this data in uncompressed form. The compressor tries to find the longest string within the search window that matches the beginning of the look-ahead window. When such a string is found, the compressor encodes the string as a reference to the matching string within the search window. FIG. 1 discloses search and look-ahead windows and a reference pointer.
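The search window technique described above can be sketched as follows. This is a simplified illustration in the style of LZ77, not the implementation of any particular compressor; the window sizes, the token format ("lit" and "ref") and the function names are assumptions.

```python
def lz77_compress(data: bytes, window_size: int = 32, lookahead_size: int = 8):
    """Encode data as literals and (distance, length) references."""
    pos = 0
    tokens = []
    while pos < len(data):
        window_start = max(0, pos - window_size)
        best_len, best_dist = 0, 0
        # Try every start position inside the search window.
        for cand in range(window_start, pos):
            length = 0
            while (length < lookahead_size
                   and pos + length < len(data)
                   and cand + length < pos  # match stays inside the window
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= 2:
            tokens.append(("ref", best_dist, best_len))
            pos += best_len   # matched data moves into the search window
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    """Rebuild the data; the output so far acts as the search window."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, dist, length = token
            start = len(out) - dist
            out += out[start:start + length]
    return bytes(out)


data = b"abcabcabcabcxyz"
tokens = lz77_compress(data)
restored = lz77_decompress(tokens)
```

The decompressor resolves each reference against data it has already produced, so a reference can only be decoded once the preceding part of the stream has been decompressed.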
The old executable file (reference file) is typically stored in compressed form on a flash bank, but the patching algorithm needs to seek (jump) within the old file. The target device or system is not able to load the uncompressed file into its RAM (Random Access Memory) due to lack of memory space. This is a common case, for example, in embedded systems.
A problem with current solutions is that decompression cannot be started at an arbitrary position within the compressed file, because the decompressor builds up its search window during the decompression. Additionally, a statistical decoder (for example a Huffman decoder) builds up the symbol statistics while it traverses the file. After jumping to a specific position, neither the search window nor the statistical decoder contains valid information, resulting in wrong output. In practice, seeking backwards, for example, would mean restarting the decompression from the beginning of the file and continuing until the desired position is reached.
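The cost of seeking in a conventionally compressed file can be illustrated with a minimal sketch. It assumes a zlib stream as the compressed format; the function name read_at and the chunk size are hypothetical. To read a few bytes at a given uncompressed offset, the sketch has to decompress everything that precedes them.

```python
import zlib


def read_at(compressed: bytes, offset: int, size: int, chunk: int = 64):
    """Return `size` bytes at uncompressed `offset`, plus the work done.

    The second return value is the number of bytes that had to be
    decompressed, counted from the beginning of the stream.
    """
    decompressor = zlib.decompressobj()
    produced = 0
    result = bytearray()
    for i in range(0, len(compressed), chunk):
        block = decompressor.decompress(compressed[i:i + chunk])
        block_start = produced
        produced += len(block)
        # Keep only the part of this block that overlaps the request.
        lo = max(offset - block_start, 0)
        hi = min(offset + size - block_start, len(block))
        if lo < hi:
            result += block[lo:hi]
        if produced >= offset + size:
            break
    return bytes(result), produced


original = bytes(range(256)) * 64          # 16384 bytes of sample data
compressed = zlib.compress(original)
piece, work = read_at(compressed, 10000, 16)
```

Although only 16 bytes are requested, at least 10016 bytes are decompressed, and a later seek to an earlier offset would have to start again from the beginning of the stream.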
Furthermore, patching with, for example, the BSDiff algorithm requires random access to the reference file. For a large compressed file with several thousand seeks, the run time for applying the patch will be significantly long.