Virtually all data compression systems currently in use, such as those of the Ziv-Lempel type, PPM and Block-Sorting, are universal, which means that they learn the statistical properties from the string to be compressed, after which the actual compression is done with an existing coding system such as a Huffman code or an arithmetic code. The learning process may be adaptive, which means that the statistics are kept up to date as the string is read, and each symbol is encoded with the statistics gathered from the already processed portion of the string. The decoder can reconstruct the same statistics from that portion, which allows the new symbol to be decoded. In the Block-Sorting method of Burrows and Wheeler the learning process is implicit: the string is broken into blocks, each block is subjected to a special preprocessing algorithm followed by the so-called Move-To-Front algorithm, and the result is then coded, for instance with a run-length code.
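The adaptive learning process described above can be illustrated with a minimal sketch. It assumes a simple order-0 model with add-one smoothing over a 256-symbol alphabet and reports the ideal code length that an arithmetic coder would approach; the function name `adaptive_code_length` and the smoothing rule are illustrative assumptions, not part of any of the systems named above.

```python
import math
from collections import Counter


def adaptive_code_length(data: bytes, alphabet_size: int = 256) -> float:
    """Ideal code length in bits under an adaptive order-0 model (a sketch)."""
    counts = Counter()
    total_bits = 0.0
    for seen, symbol in enumerate(data):
        # The probability uses only the already processed prefix, so a
        # decoder holding the same prefix can compute the same value.
        # Add-one smoothing (an assumption here) keeps unseen symbols codable.
        p = (counts[symbol] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        # Update the statistics only after the symbol has been coded.
        counts[symbol] += 1
    return total_bits


if __name__ == "__main__":
    for n in (4, 1000):
        bits = adaptive_code_length(b"a" * n)
        print(n, "symbols:", round(bits / n, 3), "bits per symbol")
```

Under these assumptions the first symbol always costs exactly 8 bits, and the cost per symbol falls only as more of the string has been seen.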
Because of the learning process involved in these algorithms, the strings must be long in order to obtain good compression. In addition, the coding operations tend to be computationally demanding and relatively slow. A major disadvantage of the Block-Sorting algorithm is that the encoding and decoding cannot be done on the fly; rather, an entire block, typically 200 kilobytes, must be processed before the symbols can be encoded.
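The need to hold an entire block before any symbol can be encoded can be seen in a naive sketch of block sorting followed by Move-To-Front. This is an illustration only: it sorts all rotations of the block directly, which is far slower than practical implementations, and the function names `bwt` and `move_to_front` are chosen here for the example.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive block-sorting transform: returns (last column, index of original)."""
    n = len(block)
    if n == 0:
        return b"", 0
    # Sorting the rotations requires the complete block, so no output
    # symbol can be produced until the whole block has been read.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last_column = bytes(block[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def move_to_front(data: bytes) -> list[int]:
    """Replace each byte by its position in a recency list, then move it to the front."""
    table = list(range(256))
    out = []
    for b in data:
        i = table.index(b)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


if __name__ == "__main__":
    last, index = bwt(b"banana")
    print(last, index)            # b'nnbaaa' 3
    print(move_to_front(last))    # [110, 0, 99, 99, 0, 0]
```

The repeated zeros in the Move-To-Front output are what a subsequent run-length code can exploit.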
Prior art data compression systems are designed to be universal and applicable to different compression systems. In them a Markov-type machine was fitted to a large body of training data representative of the files to be compressed, such as English text. This arrangement has the problem, however, that the number of states in the machine grows as the alphabet size, say 256, raised to the power of the order of the Markov machine fitted to the data. The number of parameters is then a further 256-fold larger, because a probability must be calculated for each symbol of the alphabet at each state. This severely restricts the order of the Markov machine and the compression attainable. This problem was solved by Rissanen [4], who introduced the concept of a Tree Machine (TM), developed further in [7]. It is also known as a Variable-Order Markov Chain.
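The growth in states and parameters of a fixed-order Markov machine can be made concrete with a short calculation. The helper name `markov_model_size` is an assumption introduced for this illustration.

```python
def markov_model_size(order: int, alphabet_size: int = 256) -> tuple[int, int]:
    """Return (states, parameters) for a fixed-order Markov machine."""
    # One state per possible context of `order` preceding symbols.
    states = alphabet_size ** order
    # One probability per alphabet symbol at each state.
    parameters = states * alphabet_size
    return states, parameters


if __name__ == "__main__":
    for k in (1, 2, 3):
        states, parameters = markov_model_size(k)
        print(f"order {k}: {states:,} states, {parameters:,} parameters")
```

For an alphabet of 256 symbols, order 3 already gives 16,777,216 states and 4,294,967,296 parameters.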