1. Field of the Invention
The present invention is generally related to computer networks. More particularly, the present invention is related to systems and methods for compressing packet data.
2. Related Art
Presently, data compression is useful in many applications. One example is in storing data. As data is compressed to a greater extent, more and more information can be stored on a given storage device. Another example is in transferring data across a communication network. As bandwidth in communication networks is generally viewed as a limited resource, minimizing a size of units of data being sent across the communication network may increase performance of the communication network.
One class of data compression is known as lossless data compression. Lossless data compression allows exact copies of original data to be reconstructed from compressed data. Lossless data compression is used, for example, in the popular ZIP file format and in the Unix tool gzip. Additionally, some image file formats, such as PNG or GIF, use lossless data compression.
A popular technique for lossless data compression is known as LZ77. The basis for LZ77 was developed in 1977 by Abraham Lempel and Jacob Ziv. LZ77 is a substitutional compression algorithm, which operates by effectively identifying repeated patterns in an original version of a data file (or other unit of data) to be compressed, removing the repeated patterns, and inserting pointers to previous occurrences of the repeated patterns in the data file. The pointers may each include a pair of numbers called a ‘length-distance pair,’ which may sometimes be referred to as a ‘length-offset pair.’ The length may specify a length of a repeated pattern being removed, whereas the distance or offset may be indicative of a separation between the first occurrence of the repeated pattern and a subsequent occurrence of the repeated pattern being removed. The length and distance may be provided in various manners such as in bytes or characters. The resulting compressed data file may be significantly smaller than the original version of the data file. However, the compressed data file can be decompressed such that the resulting data file is an exact copy of the original version of the data file.
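The substitution of repeated patterns by length-distance pairs can be illustrated with a minimal sketch. The function names, the token layout, the window size, and the minimum match length of three are assumptions chosen for illustration; they are not taken from any particular LZ77 implementation.

```python
def lz77_compress(data, window=255, max_len=15):
    """Return tokens: ('lit', ch) for a literal, ('ref', length, distance) for a pointer."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the preceding window for the longest match with the text at i.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 3:  # assumed threshold: shorter matches are kept as literals
            tokens.append(("ref", best_len, best_dist))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decompress(tokens):
    """Rebuild the original text by copying `length` characters from `distance` back."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, length, distance = tok
            for _ in range(length):
                out.append(out[-distance])  # copying one at a time handles overlaps
    return "".join(out)


text = "abcabcabcabc"
tokens = lz77_compress(text)
restored = lz77_decompress(tokens)
```

In this sketch the twelve-character input reduces to three literals followed by a single pointer, and decompression reproduces the input exactly.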
Commonly, data that is transferred across communication networks is divided into packets, also known as datagrams. A packet may be described as a unit of information transmitted as a whole from one device to another via a communication network. In packet switching networks, for example, a packet may be described as a transmission unit of fixed maximum size that consists of binary digits representing both data and a header. The header may contain an identification number, source and destination addresses, and error-control data. To illustrate, a file may be sent by a sending device on one side of a communication network to a receiving device on another side of the communication network. Prior to or concurrently with sending, the file may be divided into packets. Subsequently, the packets may be received and reassembled by the receiving device to obtain the file.
One class of compression methods, called symbolwise methods and sometimes referred to as statistical methods, operates by estimating the probabilities of symbols (such as text characters or binary data), coding one symbol at a time, and using shorter codewords for the most likely symbols. Morse code is an example of a symbolwise method. The more accurate the probability estimate, the greater the amount of compression that can be achieved. Taking into account the context in which a symbol occurs may also improve the accuracy of the probability estimate, thereby enhancing compression.
In adaptive compression schemes, the input to the coder is compressed relative to a model that is constructed from the text that has just been coded. LZ methods are one example of adaptive compression techniques. The model serves to predict symbols, which amounts to providing a probability distribution for the next symbol that is to be coded. The model provides this probability distribution to the encoder, which uses it to encode the symbol that actually occurs. Predictions can usually be improved by taking account of the previous symbol. Models that take a few immediately preceding symbols into account to make a prediction are called finite-context models of order m, where m is the number of previous symbols used to make the prediction.
There are many ways to estimate the probabilities in a model. Static models always use the same model regardless of what text is being coded. Semi-static models generate a model specifically for each file that is to be compressed. Adaptive models begin with a general probability distribution and then gradually alter it as more symbols are encountered. The encoder and decoder keep a running tally of the number of instances of each symbol so that they may calculate the same probability distributions.
An adaptive model that operates character by character, with no context used to predict the next symbol, is called a zero-order model. The probability that a particular subsequent character will occur is estimated to be the number of prior instances of that character divided by the total number of prior characters. The model provides this estimated probability distribution to an encoder such as an arithmetic coder. The corresponding decoder is also able to generate the same model since it has decoded all of the same characters up to that point.
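The running tally described above can be sketched as follows. The class name and its methods are hypothetical, and the choice to return a probability of zero for an unseen character (rather than reserving some probability for novel symbols) is an assumption made to mirror the count-based estimate literally.

```python
from collections import Counter
from fractions import Fraction


class ZeroOrderModel:
    """Adaptive zero-order model: P(c) = prior count of c / total prior characters."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def probability(self, symbol):
        if self.total == 0:
            return Fraction(0)  # assumption: nothing has been seen yet
        return Fraction(self.counts[symbol], self.total)

    def update(self, symbol):
        # Called by the encoder after coding a symbol and by the decoder
        # after decoding it, so both sides hold identical tallies.
        self.counts[symbol] += 1
        self.total += 1


encoder_model = ZeroOrderModel()
decoder_model = ZeroOrderModel()
for ch in "abra":
    encoder_model.update(ch)
    decoder_model.update(ch)

p_a = encoder_model.probability("a")
```

Because both models are updated with the same characters in the same order, they report the same estimate at every step, which is what keeps the encoder and decoder synchronized.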
For a higher-order model, the probability is estimated by how often that character has occurred in the current context. For example, in a first-order model, the prior character received is used as the context basis. If the character to be encoded is an “l” and the prior character received is an “a”, the first-order model would calculate how many times an “a” was previously followed by an “l” to estimate the probability of an “l” occurring in this context. In a second-order model, the prior two characters received are used as the context basis. The prior characters “ca” would be evaluated for how often that string of characters was followed by an “l”. Generally, the higher the order of a model, the more likely that a more accurate probability will be calculated, thus allowing the information to be encoded in fewer bits of data. As long as the encoder and decoder use the same rules for adding context and the context used is based on previously encoded text only, the encoder and decoder will remain synchronized, thus allowing for an exact replica of the original text to be reproduced by the decoder.
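The context-based estimate can be sketched for an arbitrary order. The function name and the sample text are hypothetical, and returning None when the context has not yet been seen with a following character is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction


def context_probability(history, symbol, order):
    """Estimate P(symbol | last `order` characters of `history`) from prior counts."""
    if order > len(history):
        return None
    context = history[len(history) - order:]
    followers = Counter()
    # Count what followed each earlier occurrence of the context.
    for i in range(len(history) - order):
        if history[i:i + order] == context:
            followers[history[i + order]] += 1
    seen = sum(followers.values())
    if seen == 0:
        return None  # assumption: no estimate when the context is new
    return Fraction(followers[symbol], seen)


history = "bat call ca"
first_order = context_probability(history, "l", 1)   # context "a"
second_order = context_probability(history, "l", 2)  # context "ca"
```

With this sample text, the first-order context is followed by the character in question half of the time, whereas the second-order context has always been followed by it, which illustrates how a longer context can sharpen the estimate.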
Converting the probabilities into a bitstream for transmission is called coding. Symbolwise methods often use Huffman coding or arithmetic coding. An arithmetic coder stores two numbers, a low value and a high value, to represent the range of the probability distribution of the character to be encoded. Thus, a string of characters is replaced with a number between zero and one. The number is assigned based on the probability of the particular character appearing again in the string of characters. A probability of one indicates that the character is certain to occur, whereas a probability of zero indicates that the character is certain to not occur. The arithmetic coding step involves narrowing the interval between the low value and high value to a range corresponding to the probability of the character to be coded appearing again in the string of characters, and then outputting a value or symbol that is within the narrowed range.
The decoder simulates what the encoder must be doing. When it receives the first transmitted value or symbol from the encoder, it can see which range the value falls under and thus see the character that corresponds to that probability range. It then narrows the probability range for the subsequent character, just as the encoder does. Thus, when the second value or symbol is received, the decoder has the same probability range that the encoder had when encoding the symbol, so it can see which range the value falls under, and thus what the original character was. Decoding proceeds along these lines until the entire character string has been reconstructed.
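The interval narrowing performed by the encoder, and its mirror image in the decoder, can be sketched with exact fractions. Several simplifications are assumed here for illustration: the probability table is fixed rather than adaptive, the decoder is told the message length instead of receiving an end-of-message symbol, and the function names and the table itself are hypothetical.

```python
from fractions import Fraction

# Assumed fixed model for illustration; a real adaptive coder would update it.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}


def cumulative_ranges(probs):
    """Assign each symbol a sub-range of [0, 1) as wide as its probability."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode(message, probs):
    """Narrow [low, high) once per symbol; any value in the result codes the message."""
    ranges = cumulative_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def decode(value, length, probs):
    """Repeat the encoder's narrowing, picking the sub-range that contains `value`."""
    ranges = cumulative_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        for sym, (s_low, s_high) in ranges.items():
            if low + width * s_low <= value < low + width * s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)


low, high = encode("abc", PROBS)
decoded = decode(low, 3, PROBS)
```

The width of the final interval equals the product of the probabilities of the coded symbols, so more probable strings leave a wider interval and can be identified with fewer bits.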
In natural languages, such as English, research has shown that the probability of the next character to appear in a string is highly dependent on the previous characters. Prediction by partial matching (PPM) is one method of predicting the next character in a string of characters based on the previous characters in the string. It is an adaptive statistical data compression technique that uses a set of previous characters in an uncompressed string of characters to predict the next character in the string. Using PPM with arithmetic coding can improve the compression rate, thus allowing a string of characters to be represented with even fewer bits.
Instead of being restricted to one context length (only first-order models or only second-order models), PPM uses different contexts, depending on what contexts have been observed in the previously coded text. For example, suppose that the word to be encoded is “political” and that “politica” has already been encoded, such that the next character is an “l”. The model may start with a context of the previous five characters to try to make a prediction. Thus, the model would look for instances where “itica” has previously occurred. If this string of characters is found, then the model would calculate the probability that the next letter after this string is an “l”, and encode a value associated with that probability. If, however, no match is found in the previously encoded characters for “itica” (i.e., this combination of characters has not occurred yet), then the model switches to a context of four characters. Thus, the model searches the previously encoded text for “tica”. Searching continues in this way until a match is found in the prior text.
If the model finds that the prior string “tica” has occurred before, but it has never been followed by an “l”, then this is a zero-frequency situation. Since the probability of an “l” occurring cannot be zero, a special “escape” symbol is sent by the encoder that tells the decoder that the symbol cannot be coded in the current context and that the next smaller context should be tried. Once the escape symbol is transmitted, both the encoder and decoder shift down to the smaller context of three symbols. Thus, one symbol has been transmitted so far (the escape symbol) and one coding step completed. The model then searches for “ica” in the prior text. If this string is found, the probability of this string being followed by an “l” is calculated. In total, two encoding steps were required to encode this letter “l”. During the early parts of the text, while the model is still learning, it is unlikely that higher-order contexts will be found to be a match. Conversely, once the model is up to speed, it is unlikely that any of the lower-order contexts will be required.
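The fallback from longer to shorter contexts, including the escape symbol, can be sketched as a function that lists the coding steps needed for one character. This is a simplified illustration and not a complete PPM coder: it omits the probability assigned to the escape symbol and refinements such as exclusion. The function name, the sample text, and the use of order -1 to denote a character never seen before are assumptions.

```python
def ppm_steps(history, symbol, max_order=5):
    """List the coding steps PPM-style fallback would take to code `symbol`.

    Each step is ('escape', order) or ('symbol', order). Order -1 is an
    assumed last resort for a character that has never occurred.
    """
    steps = []
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        # Collect every character that has followed this context so far.
        followers = set()
        for i in range(len(history) - order):
            if history[i:i + order] == context:
                followers.add(history[i + order])
        if not followers:
            continue  # context never seen: both sides shorten it, nothing is coded
        if symbol in followers:
            steps.append(("symbol", order))
            return steps
        steps.append(("escape", order))  # zero-frequency case: context seen, symbol not
    steps.append(("symbol", -1))
    return steps


steps = ppm_steps("vatican musical politica", "l")
```

In this sample text the five-character context has not occurred before, the four-character context has occurred but was followed by a different character, and the three-character context has been followed by the character in question, so the result is one escape step followed by one symbol step.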