Data compression for network data transfer is necessary in order to reduce the amount of data transferred over the network. Data compression improves network response time and allows the same network infrastructure to accommodate a larger amount of data.
A number of data compression methods, such as “deflate” and “gzip”, are in use on networks today. Most standard compression techniques work on the basis of a code table or dictionary used to map codes to characters. The well-established lossless data compression methods are broadly grouped as “dictionary” and “entropy encoding” compression.
Dictionary encoding techniques such as the Lempel-Ziv family of algorithms maintain a dictionary of symbol-to-data mappings. Portions of the data to be compressed that match entries in the dictionary are replaced by the symbols. These dictionaries may be built statically and used to encode data, or dynamically, where during the encoding process the dictionary is updated and optimized.
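As an illustration of the dynamic case, the following is a minimal sketch of LZW, one member of the Lempel-Ziv family, in which the dictionary is grown during encoding and rebuilt identically during decoding. The function names `lzw_compress` and `lzw_decompress` are hypothetical, and the sketch assumes the input contains only characters with code points below 256.

```python
def lzw_compress(text: str) -> list[int]:
    """Encode text as a list of dictionary codes using LZW."""
    # Assumption: the initial dictionary holds one entry per 8-bit character.
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the match while it is still a dictionary entry.
            current = candidate
        else:
            # Emit the symbol for the longest match, then learn the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the text; the decoder reconstructs the same dictionary."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(out)


if __name__ == "__main__":
    sample = "<td>a</td><td>b</td><td>c</td>"
    encoded = lzw_compress(sample)
    print(len(sample), "characters ->", len(encoded), "codes")
```

In this variant the dictionary is never transmitted because the decoder derives it from the code stream. A statically built dictionary, by contrast, must be known to both sides before decoding can begin.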
Entropy encoding algorithms, like the Huffman Coding technique, build variable-length codes which replace portions of the data to be compressed. The encoding is optimized by ensuring that the most frequently occurring patterns of data are substituted by the shortest codes. For example, if single characters of English text are replaced with codes, then the space or ‘e’ character is likely to be assigned the shortest code in the code set. These code-to-data mappings are stored in a code-table or dictionary. The Huffman Coding technique uses a tree data structure to represent this dictionary.
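The construction of such a code table can be sketched as follows. This is a minimal character-level illustration rather than a production encoder; the function name `huffman_codes` is hypothetical, and codes are represented as strings of "0" and "1" for readability instead of packed bits.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a character-to-bitstring code table using a Huffman tree."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tie_breaker, tree). A tree is either a
    # character (leaf) or a (left, right) tuple (internal node). The unique
    # tie_breaker ensures that trees themselves are never compared.
    heap = [(weight, i, ch) for i, (ch, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    table = huffman_codes("this is an example of a huffman tree")
    for symbol, code in sorted(table.items(), key=lambda kv: len(kv[1])):
        print(repr(symbol), code)
```

Because rarely used symbols are merged first, they end up deepest in the tree and receive the longest codes, while the most frequent symbol receives a code that is no longer than any other.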
When compressed data is sent over a network, each chunk of data is compressed and sent along with a dictionary. During a network session or conversation, each chunk of compressed data that is transmitted contains its own dictionary, which will be used during decompression. Quite often the session involves the transfer of a number of similar chunks of information. For example, a web browser surfing a website is likely to download a number of similar pages. In fact, structured data such as HTML and XML is likely to contain a great deal of similar content. This is not limited to web browsing; it extends to any situation where there is a session-based transfer of compressed data over a network.
As an example, consider the data in a group of HTML pages being compressed and downloaded. It is quite likely that HTML tags, such as table tags, would be entries in the dictionary. After the first chunk of compressed data is downloaded to a browser, the subsequent data chunks are likely to contain similar dictionary entries for HTML tags and other commonly occurring sets of information. As a result, a significant number of chunks of compressed information have very similar or even identical dictionaries.
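The degree of overlap between per-chunk dictionaries can be illustrated with a simplified sketch. It does not model any particular compressor: a chunk's "dictionary" is assumed here to be the set of HTML tags and words that repeat within that chunk, and the names `chunk_dictionary` and `dictionary_overlap` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a token is either a whole HTML tag or a run of word characters.
TOKEN = re.compile(r"</?\w+[^>]*>|\w+")


def chunk_dictionary(chunk: str, min_count: int = 2) -> set[str]:
    """Return the tokens that repeat in a chunk, standing in for its dictionary."""
    counts = Counter(TOKEN.findall(chunk))
    return {token for token, n in counts.items() if n >= min_count}


def dictionary_overlap(a: set[str], b: set[str]) -> float:
    """Fraction of entries shared by two dictionaries (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    page_a = ("<table><tr><td>alpha</td><td>beta</td></tr>"
              "<tr><td>gamma</td><td>delta</td></tr></table>")
    page_b = ("<table><tr><td>one</td><td>two</td></tr>"
              "<tr><td>three</td><td>four</td></tr></table>")
    dict_a = chunk_dictionary(page_a)
    dict_b = chunk_dictionary(page_b)
    print(sorted(dict_a))
    print("overlap:", dictionary_overlap(dict_a, dict_b))
```

For the two sample pages, which differ only in their cell contents, the repeated tokens are the same table tags in both chunks, so the two simplified dictionaries coincide.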
In a session environment where a client computer is able to maintain a state or context, the repetitive dictionary entries become redundant. Current network data transport techniques do not take advantage of such redundant data and network session features in optimizing the data encoding during the transfer of data.
Accordingly, it is appreciated that there exists a need for a method and system for transport data compression that optimizes performance based on the redundant transferred data and network session characteristics.