1. Field of the Invention
The present invention relates generally to the field of data structures employed in symbol prediction, and more specifically to improved modeling of sources using tree structures.
2. Description of the Related Art
Information sources emit symbols from a given alphabet according to some probability distribution. Finite memory sources employ a finite number of contiguous past symbols to determine the conditional probability of the next emitted symbol. In many instances employing conditional probability prediction, the memory length, i.e. the number of past symbols that determine the probability distribution of the next one, depends on the data received and can vary from location to location. Due to this variance in memory length, a Markov model of some fixed order m fit to the data is generally not an efficient means of determining the conditional probability of the next emitted symbol. In such a Markov model, the number of states grows exponentially with m, so the resulting model is unnecessarily complex and includes equivalent states that yield identical conditional probabilities. In general, when considering a Markov model, removing redundant parameters and reducing the total number of states can provide enhanced overall performance.
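By way of illustration only, the exponential growth noted above can be made concrete. A full Markov model of order m over an alphabet of size α has one state per possible string of m past symbols, i.e. α^m states. The following minimal sketch simply tabulates that count; the function name `markov_state_count` and the chosen alphabet sizes are assumptions introduced for this example.

```python
def markov_state_count(alphabet_size: int, order: int) -> int:
    """Number of states in a full Markov model of the given order.

    Each state is one distinct string of `order` past symbols, so the
    count is alphabet_size ** order.
    """
    return alphabet_size ** order


if __name__ == "__main__":
    # Compare a binary alphabet with a byte-valued alphabet (assumed sizes).
    for m in (1, 2, 4, 8):
        print(m, markov_state_count(2, m), markov_state_count(256, m))
```

Even for a byte alphabet, an order of 2 already yields 65536 states, many of which may share the same conditional probabilities.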
Reduced Markov models have been termed “tree sources,” as they can be graphically represented using a simple tree structure. A “tree source” includes an underlying full α-ary context tree structure and a set of conditional probability distributions on the alphabet, one associated with each leaf of the tree, where each leaf corresponds to a “state.” An α-ary context tree structure includes, for example, binary trees, ternary trees, and so forth, where α is the size of the source alphabet. The appeal of tree sources is that they can capture redundancies typical of real life data, such as text or images, while at the same time being amenable to optimal estimation using known algorithms, including but not limited to the Context algorithm. Tree sources have been widely used for data modeling in data compression, but are also useful in data processing applications requiring a statistical model of the data, such as prediction, filtering, and denoising.
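As a non-limiting sketch of the structure just described, a tree source over a binary alphabet may be represented as a full context tree whose leaves each carry a conditional probability distribution. The class `Node`, the helpers `leaf`, `is_full`, and `states`, the probability values, and the convention that a context string lists the most recent symbol first are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ALPHABET = ("0", "1")  # assumed binary alphabet, so alpha = 2


@dataclass
class Node:
    # Internal nodes have one child per alphabet symbol; leaves have none.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Conditional distribution of the next symbol; set only on leaves.
    probs: Optional[Dict[str, float]] = None


def leaf(p0: float) -> Node:
    """Build a leaf (state) whose next-symbol distribution is (p0, 1 - p0)."""
    return Node(probs={"0": p0, "1": 1.0 - p0})


def is_full(node: Node) -> bool:
    """True if every internal node has exactly one child per alphabet symbol."""
    if not node.children:
        return True
    return set(node.children) == set(ALPHABET) and all(
        is_full(child) for child in node.children.values()
    )


def states(node: Node, context: str = "") -> List[str]:
    """List the leaf contexts, written most recent symbol first."""
    if not node.children:
        return [context]
    found: List[str] = []
    for sym in ALPHABET:
        found.extend(states(node.children[sym], context + sym))
    return found


# Illustrative probabilities only; they are not taken from any data.
tree = Node(children={
    "0": leaf(0.9),
    "1": Node(children={"0": leaf(0.5), "1": leaf(0.2)}),
})

if __name__ == "__main__":
    print(is_full(tree))  # True
    print(states(tree))   # ['0', '10', '11']
```

In this sketch the tree has three states, whereas a full Markov model of order 2 over the same alphabet would have four; the two order-2 contexts whose most recent symbol is 0 are merged into the single state "0".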
A drawback of using tree sources is the cost associated with transitioning from one state to the next state. In principle, for a general tree source, knowledge of the current state and the next input symbol might not be sufficient to determine the next state. Determination of the latter generally entails traversing the tree from its root, and following branches according to the sequence of symbols preceding the current symbol. For general trees, such a procedure will require a number of steps that cannot be bounded by a constant. Thus, transitioning from one state to another is generally expensive from a computational perspective, and use of such trees can add complexity to the system.
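The root-to-leaf traversal described above may be sketched as follows. This is an illustrative sketch only: the nested-dictionary encoding of the tree, the name `find_state`, the leaf labels, and the particular tree are assumptions for this example. Internal nodes are dictionaries keyed by symbol, leaves are strings naming the state, and the walk consumes past symbols from the most recent backward.

```python
# Assumed example tree over a binary alphabet. Leaf labels list the
# context most recent symbol first: "0", "11", "100", "101".
TREE = {
    "0": "0",
    "1": {
        "0": {"0": "100", "1": "101"},
        "1": "11",
    },
}


def find_state(tree, past):
    """Walk from the root using `past` (oldest symbol first).

    Returns (state_label, steps), where steps is the number of branches
    followed. Raises ValueError if `past` is too short to reach a leaf.
    """
    node = tree
    steps = 0
    for sym in reversed(past):
        if not isinstance(node, dict):
            break  # reached a leaf; older symbols are irrelevant
        node = node[sym]
        steps += 1
    if isinstance(node, dict):
        raise ValueError("not enough past symbols to reach a leaf")
    return node, steps


if __name__ == "__main__":
    # Both histories end in the same current state "0" ...
    print(find_state(TREE, "00"))   # ('0', 1)
    print(find_state(TREE, "10"))   # ('0', 1)
    # ... yet after the same next symbol "1" the next states differ.
    print(find_state(TREE, "001"))  # ('100', 3)
    print(find_state(TREE, "101"))  # ('101', 3)
```

In this example the current state "0" together with the next symbol "1" does not identify the next state, since the symbol preceding the current one is also needed. The number of steps per lookup equals the depth of the leaf reached, which for a general tree is not bounded by a constant.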
Based on the foregoing, it would be advantageous to offer a relatively simple representation of tree sources that allows state transitioning in an efficient manner, ideally requiring a constant number of operations per input symbol, or, equivalently, total execution time linear in the length of the input data.