Traditional entropy encoding compression algorithms (such as Huffman coding, adaptive Huffman coding, or arithmetic coding) depend on having a statistical model of the input stream they are compressing. The more accurately the model represents the actual statistical properties of the symbols in the input stream, the better the algorithm is able to compress the stream. Loosely speaking, the model is used to predict which input symbol will come next in the input stream. For example, if the input stream is English-language text, the model would usually assign a higher probability to the letter ‘e’ than to the letter ‘Q’.
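As a rough illustration of how a model's probabilities translate into compressed size, the sketch below estimates a static order-0 model (each symbol's relative frequency in a sample) and computes the ideal code length, the sum of -log2 P(symbol), that an entropy coder such as an arithmetic coder can approach. The function names `order0_model` and `ideal_code_length` and the sample string are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def order0_model(sample: str) -> dict[str, float]:
    """Estimate a static order-0 model: P(symbol) = count / total."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}


def ideal_code_length(text: str, model: dict[str, float]) -> float:
    """Bits an ideal entropy coder would spend: the sum of -log2 P(symbol)."""
    return sum(-math.log2(model[ch]) for ch in text)


sample = "the quick brown fox jumps over the lazy dog and then sleeps"
model = order0_model(sample)
# 'e' is frequent in the sample, while 'Q' never appears at all.
print(model["e"] > model.get("Q", 0.0))
print(round(ideal_code_length(sample, model), 1))
```

Frequent symbols receive high probabilities and therefore short codes, which is why a model that matches the real statistics compresses better.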
The probability model can be static (i.e., unchanging for the duration of a compression process) or adaptive (i.e., evolving as the compressor processes the input data stream). The probability model can also take into account one or more of the most recently encountered input symbols to take advantage of local correlations. For example, in English text, encountering a letter ‘Q’ or ‘q’ in the input stream makes it more likely that the next character will be ‘u’.
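The effect of conditioning on the most recent symbol can be sketched with a simple order-1 model, which counts, for each preceding character (the context), which characters follow it. This is a minimal sketch under the assumption of a single-character context; the helper names `order1_counts` and `conditional_prob` and the sample text are hypothetical.

```python
from collections import Counter, defaultdict


def order1_counts(text: str) -> dict[str, Counter]:
    """For each preceding character (the context), count the characters that follow it."""
    table: dict[str, Counter] = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        table[prev][cur] += 1
    return table


def conditional_prob(table: dict[str, Counter], context: str, symbol: str) -> float:
    """Estimate P(symbol | context) from the counts; 0.0 if the context was never seen."""
    followers = table.get(context)  # .get() does not create a new entry
    if not followers:
        return 0.0
    return followers[symbol] / sum(followers.values())


text = "quite a quiet queen quickly quoted the equation"
table = order1_counts(text)
# Every 'q' in this sample is followed by 'u', so the context 'q' predicts 'u' strongly.
print(conditional_prob(table, "q", "u"))
```

In the sample, the context ‘q’ predicts ‘u’ with certainty, whereas an order-0 model would give ‘u’ only its overall frequency.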
An adaptive model typically works by matching the current input symbol against its prediction context and, if it finds the current input symbol in that context, generating a code that represents the probability range assigned to that symbol. For example, if the current input symbol is ‘e’ and the model assigns ‘e’ the range 0.13 to 0.47 (that is, a probability of 0.34), then the compressor generates an output code representing that range. Once the symbol is encoded, the compressor updates the probability model. This “code and update” cycle is repeated until there are no more input symbols to compress.
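The “code and update” cycle can be sketched with an adaptive frequency model of the kind an arithmetic coder might use. Each symbol owns a sub-range [low, high) of [0, 1) whose width is its current probability; after the range is looked up (the part an arithmetic coder would actually encode), the symbol's count is incremented. This is a simplified sketch: the class `AdaptiveModel`, the function `code_and_update`, and the choice to start every symbol in a fixed alphabet with a count of 1 (which removes the need for escapes) are assumptions for illustration only.

```python
from fractions import Fraction


class AdaptiveModel:
    """Adaptive order-0 frequency model over a fixed alphabet.

    Assumption for illustration: every symbol starts with count 1,
    so no escape mechanism is needed in this sketch.
    """

    def __init__(self, alphabet: str):
        self.symbols = list(alphabet)
        self.counts = {s: 1 for s in self.symbols}

    def interval(self, symbol: str) -> tuple[Fraction, Fraction]:
        """Return the [low, high) sub-range of [0, 1) currently assigned to symbol."""
        total = sum(self.counts.values())
        low = 0
        for s in self.symbols:
            if s == symbol:
                return Fraction(low, total), Fraction(low + self.counts[s], total)
            low += self.counts[s]
        raise KeyError(symbol)

    def update(self, symbol: str) -> None:
        """Adapt the model after the symbol has been coded."""
        self.counts[symbol] += 1


def code_and_update(text: str, model: AdaptiveModel) -> list[tuple[Fraction, Fraction]]:
    """Run the 'code and update' cycle, returning the range coded for each symbol."""
    ranges = []
    for ch in text:
        ranges.append(model.interval(ch))  # what an arithmetic coder would encode
        model.update(ch)                   # then update the probability model
    return ranges


for ch, (lo, hi) in zip("abba", code_and_update("abba", AdaptiveModel("abc"))):
    print(ch, lo, hi, "width =", hi - lo)
```

The second ‘b’ is coded with a wider range than the first because the model has already adapted to it, so it costs fewer bits.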
One commonly used adaptive model is called “prediction by partial matching” (PPM). The PPM coding model counts the occurrences of each symbol in each of the contexts in which it occurs. One problem the PPM coding model encounters is how to account for symbols that have not yet occurred in the context driving the coding. When the compressor encounters a new symbol for which its model has no prediction, it must fall back on some other mechanism. A common solution is to encode a special “escape” symbol that signals to the decompressor that the next symbol is a literal value. Escapes are used because the alternative, including every possible symbol in every context, leads to poor performance (including possible data expansion).
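The escape mechanism can be sketched with a deliberately simplified, PPM-like tokenizer that uses only an order-1 context. When the current symbol has been seen before in the current context, it is coded from that context; otherwise an escape token is emitted and the symbol is sent as a literal. This is not a complete PPM implementation: real PPM falls back through progressively shorter contexts before resorting to a literal, and it feeds the tokens to an arithmetic coder. The names `ppm_order1_tokens`, `ESCAPE`, and the `lit:` prefix are hypothetical.

```python
from collections import Counter, defaultdict

ESCAPE = "<ESC>"  # hypothetical marker for the escape symbol


def ppm_order1_tokens(text: str) -> list[str]:
    """Simplified order-1 'PPM-like' tokenizer (illustrative only).

    Symbols already seen in the current context (the previous character)
    are coded from that context; new symbols are preceded by an escape and
    sent as literals. Real PPM would try shorter contexts first.
    """
    contexts: dict[str, Counter] = defaultdict(Counter)
    tokens = []
    prev = ""  # empty context for the very first symbol
    for ch in text:
        if contexts[prev][ch] > 0:
            tokens.append(ch)          # the context predicts this symbol
        else:
            tokens.append(ESCAPE)      # tell the decoder a literal follows
            tokens.append(f"lit:{ch}")
        contexts[prev][ch] += 1        # update the model after coding
        prev = ch
    return tokens


print(ppm_order1_tokens("abab"))
```

Only the final ‘b’ is predicted; every earlier symbol is new in its context and must be escaped.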
Typically, at the beginning of encoding with an adaptive coding model, a substantial number of escape tokens are emitted, so more bits may be needed to encode them. Every entropy encoder strives to reduce the number of bits used to represent a block of data by modeling the probabilities of the data items being coded (typically bytes, although they can also be bits or words). The more accurate this model, the fewer bits are needed for the encoding. Therefore, a mechanism for modeling escape counts in adaptive compression models that represents the escape probability accurately, and thereby reduces the number of bits used to represent a block of data, would be beneficial.
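To see why escapes concentrate at the start of a stream, the sketch below runs a single adaptive order-0 context over a repetitive input and records, for each position, whether an escape is needed and how many bits it would cost. The escape probability uses the PPM “method C” estimate, P(esc) = d / (n + d), where n is the number of symbols coded so far and d is the number of distinct symbols seen; this choice of estimator, the function name `escape_events`, and the sample input are assumptions for illustration. The bits for the literal that follows each escape are not counted.

```python
import math


def escape_events(text: str) -> list[tuple[bool, float]]:
    """For each position, return (escaped?, bits spent on the escape).

    Assumptions for illustration: one adaptive order-0 context and the
    method C estimate P(esc) = d / (n + d). With an empty context the
    escape is certain and costs 0 bits. Literal costs are not included.
    """
    counts: dict[str, int] = {}
    events = []
    for ch in text:
        n = sum(counts.values())  # symbols coded so far
        d = len(counts)           # distinct symbols seen so far
        if ch in counts:
            events.append((False, 0.0))
        elif n == 0:
            events.append((True, 0.0))  # empty context: escape is certain
        else:
            events.append((True, -math.log2(d / (n + d))))
        counts[ch] = counts.get(ch, 0) + 1
    return events


events = escape_events("abracadabra " * 4)
half = len(events) // 2
print("escapes in first half:", sum(esc for esc, _ in events[:half]))
print("escapes in second half:", sum(esc for esc, _ in events[half:]))
print("escape bits in first half: %.2f" % sum(bits for _, bits in events[:half]))
```

All six distinct symbols are escaped within the first few positions and none afterward, so any inaccuracy in the escape probability is paid for almost entirely at the beginning of the block.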