1. Field of the Invention
The present invention relates generally to the field of data compression, and more specifically to linear time universal coding for the class of tree models.
2. Description of the Related Art
Information sources emit symbols from a given alphabet according to some probability distribution. In particular, finite memory sources employ a finite number of contiguous past observations to determine the conditional probability of the next emitted symbol. In many practical cases, the memory length, i.e. the number of past symbols that determine the probability distribution of the next one, depends on the data received and can vary from location to location. Because of this variation in memory length, a Markov model of some fixed order m fit to the data is generally an inefficient way to determine the conditional probability of the next emitted symbol. In such a Markov model, the number of states grows exponentially with m, so the resulting model is large and typically includes equivalent states that yield identical conditional probabilities. In general, removing such redundant parameters from a Markov model and reducing its total number of states can improve overall performance.
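As an illustration of how equivalent states can be removed, the following minimal sketch collapses an order-m Markov model into a tree model by repeatedly merging sibling states whose conditional distributions are identical. It assumes that contexts are strings written most recent symbol first, that distributions are tuples compared for exact equality, and the function name `markov_to_tree` is hypothetical.

```python
def markov_to_tree(probs, alphabet):
    """Collapse an order-m Markov model into a tree model (illustrative sketch).

    probs maps each length-m context (most recent symbol first) to a tuple of
    conditional probabilities over `alphabet`. Whenever all children of a node
    are leaves with the same distribution, they are merged into that node.
    """
    leaves = dict(probs)
    merged = True
    while merged:
        merged = False
        # The parent of a leaf is obtained by dropping its oldest symbol.
        parents = {ctx[:-1] for ctx in leaves if ctx}
        for parent in sorted(parents, key=len, reverse=True):
            children = [parent + a for a in alphabet]
            if all(c in leaves for c in children):
                dists = {leaves[c] for c in children}
                if len(dists) == 1:  # equivalent states: merge them
                    for c in children:
                        del leaves[c]
                    leaves[parent] = dists.pop()
                    merged = True
    return leaves


# Hypothetical binary order-2 model: after a 0, the older symbol is irrelevant.
model = {"00": (0.9, 0.1), "01": (0.9, 0.1), "10": (0.2, 0.8), "11": (0.5, 0.5)}
tree = markov_to_tree(model, "01")
print(len(tree))  # 3 states instead of 2**2 = 4
```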
Reduced Markov models of information sources have been termed “tree models,” as they can be graphically represented using a simple tree structure. A “tree model” includes an underlying full α-ary tree structure and a set of conditional probability distributions on the alphabet, one associated with each leaf of the tree, where each leaf corresponds to a “state.” Examples of α-ary tree structures include binary trees, ternary trees, and so forth, where α is the size of the source alphabet A. The model associates probabilities to sequences as a product of symbol probabilities conditioned on the states. The appeal of tree models is their ability to capture redundancies typical of real-life data, such as text or images, while remaining amenable to optimal estimation by known algorithms, including but not limited to the Context algorithm. Tree models have been widely used for data modeling in data compression, and are also useful in data processing applications requiring a statistical model of the data, such as prediction, filtering, and denoising.
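The following sketch shows how a tree model assigns a probability to a sequence. For each position, the state is found by walking from the root along the preceding symbols (most recent first) until a leaf is reached, and the conditional probabilities of the successive symbols are multiplied (summed in the log domain). The function names, the leaf-as-string representation, and the convention of padding the unseen past with a fixed symbol are assumptions made for illustration.

```python
import math


def find_state(leaves, past):
    """Walk from the root along `past` (most recent symbol first) to a leaf."""
    node = ""
    while node not in leaves:
        node += past[len(node)]
    return node


def log2_prob(x, leaves, alphabet, pad):
    """Return log2 P(x) under a tree model.

    leaves maps each leaf (state) string to a tuple of conditional
    probabilities, one per symbol of `alphabet`. Symbols before x[0]
    are assumed to equal `pad` (a hypothetical boundary convention).
    """
    depth = max(len(s) for s in leaves)
    total = 0.0
    for t, sym in enumerate(x):
        past = x[:t][::-1] + pad * depth  # most recent symbol first
        state = find_state(leaves, past)
        total += math.log2(leaves[state][alphabet.index(sym)])
    return total


tree = {"0": (0.9, 0.1), "10": (0.2, 0.8), "11": (0.5, 0.5)}
lp = log2_prob("0010", tree, "01", pad="0")
print(2 ** lp)  # P = 0.9 * 0.9 * 0.1 * 0.2
```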
The use of statistical models, such as tree models, in lossless data compression is facilitated by arithmetic codes. Given a sequence x of length n over the alphabet A, and a statistical model that assigns a probability P(x) to the sequence x, an arithmetic encoder can efficiently assign a codeword (for example, over the binary alphabet {0,1}) of length slightly larger, but as close as desired, to the smallest integer not smaller than log (1/P(x)), where the logarithm is taken in base 2. A corresponding decoder can decode the codeword to recover the sequence x. For the code to be effective, the goal is to make the code length as short as possible, and lossless data compression requires exact recovery of the original sequence x by the decoder. A universal lossless data compression system aims to assign to every sequence x a codeword of length that approaches, as the length n of the sequence grows, the length assigned by the best statistical model in a given “universe” or class of models. When the statistical models are determined by K free parameters, for most sequences x this target can only be achieved up to an excess code length of (K log n)/(2n)+O(K/n) bits per input symbol. While increasing the dimensionality K of the class of models decreases the target code length, the unavoidable excess code length over this target code length increases. An optimal lossless data compression system aims at finding the best trade-off value for the number of parameters K.
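For a sense of scale, the sketch below computes the ideal arithmetic code length, i.e. the smallest integer not smaller than log2(1/P(x)), and the leading model-cost term (K log n)/(2n) of the excess code length per symbol; the O(K/n) term is omitted. The function names and the sample values of P(x), K, and n are hypothetical.

```python
import math


def ideal_code_length(p):
    """Smallest integer not smaller than log2(1/p), in bits."""
    return math.ceil(-math.log2(p))


def model_cost_per_symbol(k, n):
    """Leading excess term (K log2 n) / (2n), in bits per input symbol."""
    return k * math.log2(n) / (2 * n)


print(ideal_code_length(0.0162))  # 6
for k in (2, 8, 32):
    # Larger K allows richer models (a lower target) but a larger penalty.
    print(k, model_cost_per_symbol(k, 1024))
```

The trade-off described above amounts to choosing K so that the decrease in target code length outweighs the growth of this penalty term.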
For the class of tree models of any size, this optimal trade-off is achieved by codes such as CTW and Context. Any given tree determines a class of tree models with a number of free parameters K given by the number of its states times α−1, since α−1 free parameters per state determine each conditional distribution. For any tree having K free parameters and any sequence of length n, CTW and Context provide, without prior knowledge of the tree or K, a normalized excess code length of at most (K log n)/(2n)+O(K/n) bits over the shortest code length assigned by the best tree model supported by the tree. In the “semi predictive” variant of Context, the encoder estimates a best tree model and, in a first pass, describes the corresponding best tree to the decoder. Determination of the best tree takes into account the number of bits needed to describe the tree itself, and a code length based on model parameters that are sequentially estimated by the encoder. In a second pass, the system sequentially encodes the data based on the described tree, using the sequentially estimated model parameters. As a result, the parameters change dynamically during the encoding process. Given the tree, the decoder can mimic the same estimation scheme, and therefore it need not be explicitly informed of the parameters by the encoder. Such a design is suggested in, for example, J. Rissanen, “Stochastic complexity and modeling,” Annals of Statistics, vol. 14, pp. 1080-1100, September 1986. Determination of the best tree model requires “pruning” a tree, called a context tree, containing information on all occurrences of each symbol in every context. The second pass encoding entails assigning a conditional probability to each symbol sequentially based on previous occurrences of symbols in its context, and encoding the symbol using an arithmetic code. The decoder can reverse the second pass encoding operations.
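The two-pass idea can be sketched as follows. For a fixed tree, the second-pass code length is the sum of −log2 of sequentially estimated conditional probabilities. Because the counts are updated only after each symbol is coded, a decoder holding the same tree can repeat the identical updates. The Krichevsky–Trofimov (KT) estimator used here is one common choice and is an assumption, not necessarily the estimator of the cited design. The one-bit-per-node tree description cost and all function names are likewise hypothetical. The per-tree total (description bits plus data bits) is the kind of quantity a first pass would compare across candidate trees.

```python
import math
from collections import defaultdict


def find_state(leaves, past):
    """Walk from the root along `past` (most recent symbol first) to a leaf."""
    node = ""
    while node not in leaves:
        node += past[len(node)]
    return node


def free_parameters(leaves, alphabet):
    """K = (number of states) * (alpha - 1)."""
    return len(leaves) * (len(alphabet) - 1)


def tree_description_bits(leaves):
    """Assumed cost: one bit per node of the full tree (internal vs. leaf)."""
    return len({s[:k] for s in leaves for k in range(len(s) + 1)})


def sequential_code_length(x, leaves, alphabet, pad):
    """Bits to encode x with per-state KT estimates updated after each symbol."""
    depth = max(len(s) for s in leaves)
    counts = defaultdict(lambda: defaultdict(int))
    bits = 0.0
    for t, sym in enumerate(x):
        state = find_state(leaves, x[:t][::-1] + pad * depth)
        seen = sum(counts[state].values())
        p = (counts[state][sym] + 0.5) / (seen + len(alphabet) / 2)
        bits -= math.log2(p)
        counts[state][sym] += 1  # the decoder makes this same update after decoding sym
    return bits


x = "01" * 20
for leaves in ({""}, {"0", "1"}):
    total = tree_description_bits(leaves) + sequential_code_length(x, leaves, "01", "0")
    print(sorted(leaves), free_parameters(leaves, "01"), round(total, 2))
```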
One problem with using tree models is the cost associated with transitioning from one state to the next state. In principle, for a general tree model, knowledge of the current state and the next input symbol might not be sufficient to determine the next state. Determination of the latter generally entails traversing the tree from its root, and following branches according to the sequence of symbols preceding the current symbol. For general trees, such a procedure typically requires a number of steps that cannot be bounded by a constant. Thus, transitioning from one state to another is generally expensive from a computational perspective, and use of such trees can add complexity to the system.
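The cost of state transitions can be illustrated on two small binary trees (a hypothetical example, with contexts written most recent symbol first). In a "comb" tree, locating the state from the root takes as many steps as the depth of the branch that the past follows, so no constant bound holds as the tree grows. The second tree shows that the current state and the next symbol may not determine the next state: the helper enumerates the unknown older past and reports whether more than one next state is possible. All function names are assumptions.

```python
from itertools import product


def find_state_with_steps(leaves, past):
    """Walk from the root along `past`; return (state, number of steps)."""
    node = ""
    while node not in leaves:
        node += past[len(node)]
    return node, len(node)


def comb_tree(d):
    """Full binary tree with leaves 1, 01, 001, ..., 0^(d-1)1, 0^d."""
    return {"0" * k + "1" for k in range(d)} | {"0" * d}


def next_state_is_ambiguous(leaves, state, sym, alphabet):
    """True if (current state, next symbol) alone cannot fix the next state."""
    depth = max(len(s) for s in leaves)
    prefix = sym + state  # the new symbol becomes the most recent one
    candidates = set()
    for tail in product(alphabet, repeat=depth):  # unknown older past
        next_state, _ = find_state_with_steps(leaves, prefix + "".join(tail))
        candidates.add(next_state)
    return len(candidates) > 1


for d in (4, 16, 64):
    print(d, find_state_with_steps(comb_tree(d), "0" * (d + 1))[1])  # steps = d

tree = {"1", "00", "010", "011"}
print(next_state_is_ambiguous(tree, "1", "0", "01"))   # True: 010 or 011
print(next_state_is_ambiguous(tree, "00", "1", "01"))  # False: always 1
```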
Another computational problem with using tree models is that efficient implementations require the data used to build the context tree to be collected in a compact suffix tree of the encoded sequence. Since compact trees need not be full, and their edges may be labeled by strings of length greater than one, the tree corresponding to the optimal tree model for a given sequence will generally not be a sub-tree of the suffix tree of the sequence: it may contain paths that do not occur in the sequence but were added to make the tree full. This phenomenon complicates the pruning process.
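The effect of completing a tree to a full tree can be seen on a toy example. The sketch builds a plain (uncompacted) trie of the length-d contexts that occur in a sequence, adds the missing children needed so that every internal node has α children, and reports the added leaves. For x = 0101010 and d = 2, the completion adds the contexts 00 and 11, neither of which occurs in x. Using a simple trie in place of a compact suffix tree, and the function names, are simplifying assumptions.

```python
def occurring_contexts(x, d):
    """Length-d substrings of x, written most recent symbol first."""
    return {x[t - d:t][::-1] for t in range(d, len(x) + 1)}


def complete_to_full_tree(contexts, alphabet):
    """Return (leaves of the full tree, leaves added only for fullness)."""
    nodes = {c[:k] for c in contexts for k in range(len(c) + 1)}
    internal = {c[:k] for c in contexts for k in range(len(c))}
    # Every internal node must have one child per alphabet symbol.
    added = {node + a for node in internal for a in alphabet
             if node + a not in nodes}
    return (nodes - internal) | added, added


x = "0101010"
leaves, added = complete_to_full_tree(occurring_contexts(x, 2), "01")
print(sorted(leaves))  # ['00', '01', '10', '11']
print(sorted(c for c in added if c[::-1] not in x))  # ['00', '11']
```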
On the other hand, certain popular data compression algorithms, such as PPM and algorithms based on the Burrows-Wheeler transform (BWT), are also based on tree models, but do not achieve the optimal redundancy of the CTW and Context methods.
Based on the foregoing, it would be advantageous to offer a relatively simple coding method that is optimal, or nearly so, for the class of tree models and that uses tree structures efficiently.