Information theory teaches that entropy is a measure of the disorder of a system and is directly related to the amount of information contained within that system. A low-entropy system is highly ordered and can usually be represented in fewer bits of information than a disordered, high-entropy system. Information theory further teaches that the entropy of a binary symbol, i.e., the number of bits required to express that symbol, is the negative base-2 logarithm of the probability of the symbol's occurrence:

$$H = -\log_2\left(\frac{\mathrm{FREQ}}{\mathrm{CUMFREQ}}\right)$$

where H is the entropy, FREQ is the frequency of occurrence of the symbol so far and CUMFREQ is the cumulative frequency of all symbols seen so far. Furthermore, the total entropy of a data set of binary symbols is the sum of the entropies of the individual symbols:
$$H(p) = \sum_i H_i$$

where H(p) is the total entropy of the data set and $H_i$ is the entropy of the i-th symbol in the data set.
Most methods of lossless data compression are based on an encoding technique in which repetitive symbols or symbol patterns within a data set are identified and then replaced with symbols that occupy, on average, less space than the original symbols or symbol patterns.
Examples of lossless data compression techniques include Run Length Encoding, Huffman Coding, Arithmetic Coding and Dictionary-Based Compression. Several of these methods utilize one or more probability tables to represent the frequency distributions of the various symbols or symbol patterns. For example, an Order-0 Adaptive Model for an alphabet of n symbols begins with each symbol having a frequency of 1 and a cumulative frequency of n. This gives each symbol a probability of 1/n. As a symbol is seen in the input data set, the frequency of that symbol is increased by 1 and hence the cumulative frequency is also increased by 1. This increases the probability that the symbol will occur and lowers the entropy of that symbol.
To show the behavior of this particular model in practice, suppose we have an alphabet of 8-bit symbols drawn from the set [A, B, C, D, E, F] and we wish to encode a data set that contains the characters ABCDAAFEFDA. The table below shows the entropy calculations for each step:
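TABLE 1

| Symbol | Freq(A) | Freq(B) | Freq(C) | Freq(D) | Freq(E) | Freq(F) | CumFreq | Entropy |
|--------|---------|---------|---------|---------|---------|---------|---------|---------|
| (Init) | 1 | 1 | 1 | 1 | 1 | 1 | 6 | |
| A | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 2.58 |
| B | 2 | 1 | 1 | 1 | 1 | 1 | 7 | 2.81 |
| C | 2 | 2 | 1 | 1 | 1 | 1 | 8 | 3.00 |
| D | 2 | 2 | 2 | 1 | 1 | 1 | 9 | 3.17 |
| A | 2 | 2 | 2 | 2 | 1 | 1 | 10 | 2.32 |
| A | 3 | 2 | 2 | 2 | 1 | 1 | 11 | 1.87 |
| F | 4 | 2 | 2 | 2 | 1 | 1 | 12 | 3.58 |
| E | 4 | 2 | 2 | 2 | 1 | 2 | 13 | 3.70 |
| F | 4 | 2 | 2 | 2 | 2 | 2 | 14 | 2.81 |
| D | 4 | 2 | 2 | 2 | 2 | 3 | 15 | 2.91 |
| A | 4 | 2 | 2 | 3 | 2 | 3 | 16 | 2.00 |
| (end) | 5 | 2 | 2 | 3 | 2 | 3 | 17 | |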
The original data set contained eleven 8-bit symbols, or 88 bits, and the compressed data set encoded using an Order-0 Adaptive Model is expected to contain around 31 bits, which is the sum of the per-symbol entropies in Table 1.
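The calculation in Table 1 can be reproduced with a short Python sketch. The function name `adaptive_order0_costs` and its parameters are illustrative assumptions, not part of any cited method; the sketch only evaluates the per-symbol entropy formula given above against the running frequency counts and does not perform actual encoding.

```python
from math import log2

def adaptive_order0_costs(data, alphabet):
    """Return the per-symbol cost in bits under an Order-0 adaptive model."""
    freq = {s: 1 for s in alphabet}   # every symbol starts with frequency 1
    cum_freq = len(alphabet)          # so the cumulative frequency starts at n
    costs = []
    for sym in data:
        # Cost given the counts seen so far: -log2(FREQ / CUMFREQ)
        costs.append(-log2(freq[sym] / cum_freq))
        # Update the model only after the symbol has been costed
        freq[sym] += 1
        cum_freq += 1
    return costs

data = "ABCDAAFEFDA"
costs = adaptive_order0_costs(data, "ABCDEF")
for sym, bits in zip(data, costs):
    print(f"{sym}  {bits:.2f}")
print(f"total: {sum(costs):.2f} bits (original: {8 * len(data)} bits)")
```

Under these assumptions the per-symbol costs sum to roughly 30.76 bits, consistent with the figure of around 31 bits stated above.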
Based upon the frequency distribution of the original input data set, different models and encoding techniques provide different levels of compression. If we understand the nature of the data beforehand, we can better choose an appropriate model and encoding technique to use.
Each of the encoding methods relies on the substitution of a smaller binary string for a larger binary string based on the frequency of symbols or symbol patterns within the uncompressed data. The desired result of this process is a data set that is smaller than the original.
To achieve compression, an uneven frequency distribution of symbols or symbol patterns must be present in the uncompressed data set. Greater unevenness of the frequency distribution in the original data set allows us to achieve greater compression.
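The effect of unevenness can be illustrated by comparing the average entropy per symbol of a few small data sets. The following Python sketch is an illustration only: the helper `shannon_entropy` and the sample strings other than ABCDAAFEFDA are assumptions introduced here, and the entropy is computed from the final symbol counts of each string rather than from an adaptive model.

```python
from collections import Counter
from math import log2

def shannon_entropy(data):
    """Average bits per symbol for the empirical symbol distribution of data."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

even = "ABCDEF" * 2          # every symbol equally frequent
skewed = "ABCDAAFEFDA"       # the data set from the worked example
very_skewed = "AAAAAAAAAAB"  # one symbol dominates

for name, data in [("even", even), ("skewed", skewed), ("very skewed", very_skewed)]:
    print(f"{name:12s} {shannon_entropy(data):.2f} bits/symbol")
```

The more uneven the distribution, the lower the average entropy per symbol, and therefore the fewer bits an ideal encoder would need for the same number of symbols.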
All known methods of lossless data compression result in a more even frequency distribution of the symbols in the compressed data set. Since lossless data compression methods rely upon an uneven frequency distribution of symbols or symbol patterns, the more even frequency distribution of the compressed output makes further compression nearly impossible.
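This flattening of the frequency distribution can be observed with a general-purpose compressor. The sketch below uses Python's standard `zlib` module (DEFLATE) purely as a stand-in; it is not one of the methods discussed in the cited patents, and the input (20,000 symbols drawn from a skewed six-letter source with a fixed random seed) is a hypothetical test case chosen for illustration.

```python
import random
import zlib
from collections import Counter
from math import log2

def byte_entropy(data: bytes) -> float:
    """Average bits per byte for the empirical byte distribution of data."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical input: a skewed source over the six symbols A..F
original = bytes(rng.choices(b"ABCDEF", weights=[8, 1, 1, 3, 1, 2], k=20000))

once = zlib.compress(original)   # first compression pass
twice = zlib.compress(once)      # second pass over the already compressed data

for name, blob in [("original", original), ("once", once), ("twice", twice)]:
    print(f"{name:9s} {len(blob):6d} bytes  {byte_entropy(blob):.2f} bits/byte")
```

In a run of this kind, the first pass shrinks the data substantially and leaves a byte distribution close to uniform, while the second pass yields little or no further reduction.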
Most known compression techniques and the current state of the art focus on achieving the maximum possible compression in the minimum amount of time in order to accommodate real-time applications. This, by its very nature, dictates that only one pass across the data set can occur.
It is known in the prior art to use a rules-based virtual machine for variable bit-length processing, and it has been speculated that variable bit-length processes might be used in achieving compression. See U.S. Pat. Nos. 5,600,726 and 5,893,084. Although these patents mention a number of lossless compression methods that could possibly be adapted to use an n-bit symbol, they fail to disclose a method for determining an optimal value for “n”. While U.S. Pat. No. 7,111,094, to Liu et al., appears to disclose a strategy for calculating an optimal value for “n” for each one of a series of blocks to be compressed, the approach of the Liu patent is to transform the data to be compressed in an attempt to change its frequency distribution. Moreover, the Liu patent fails to specify how to determine the length of a block, and fails to specify a particular compression model to be used for the compression.