The rapidly growing use of computer-based information systems interconnected with communication networks has dramatically increased the use of digital storage and digital transmission systems. Data compression is concerned with the compaction of data before storage or transmission. Such compaction is useful for conserving memory or communication resources. When the data source can be modeled by a statistical system, optimal coding schemes have been constructed to achieve desired compaction criteria. However, for real-world data, the source statistics are not always known to the data compressor. In fact, real-world data usually does not conform to any statistical model. Therefore it is important in most practical data compaction techniques to have an adaptive arrangement which can compress the data without knowing the statistics of the data source.
Much stored or transmitted data is redundant. The English language, for example, or a programming language, includes "words" which are often reused. One type of coding which takes advantage of this redundancy is the well-known Huffman code. In the Huffman scheme, variable length code words are used, with the length of the code word being inversely related to the frequency of occurrence of the encoded symbol. Unfortunately, the Huffman approach requires two passes over the data, one to establish the frequency of occurrence of the symbols and another to do the actual encoding. Moreover, the Huffman technique requires temporary storage for the entire data block while the first pass is taken, thereby incurring a corresponding time delay.
In June, 1984, Welch published a paper entitled "A Technique for High-Performance Data Compression" in the IEEE Computer Magazine. The paper treated an algorithm, which has become known as the Lempel-Ziv algorithm, in a practical way, and proposed an implementation for data compression based on hashing for fast on-line processing. U.S. Pat. No. 4,558,302, having Welch as the sole inventor, covers the details of the implementation first introduced in theoretical form in his paper. More recently, U.S. Pat. No. 4,906,991, issued to Fiala and Greene, disclosed a sophisticated modification to the Lempel-Ziv algorithm which achieves better compression on most text files, but at the cost of significantly increased complexity.
In April, 1986, Bentley, Sleator, Tarjan and Wei published a paper entitled "A Locally Adaptive Data Compression Scheme" in the Communications of the ACM. In the paper, the authors proposed the use of a self-adjusting data structure to achieve data compression of text data. One of their main schemes used a "move-to-front" rule; this concept will be expanded upon below.
More recently, the disclosure of U.S. Pat. No. 4,796,003, issued to Bentley, Sleator and Tarjan (Bentley et al), indicates that it is possible to compress data with a compaction factor comparable to Huffman coding, but with a one pass procedure. More particularly, a system and an algorithm are used in which a word list is maintained with the position of each word on the word list being encoded in a variable length code, the shortest code representing the beginning of the list. When a word is to be transmitted in communication applications (or stored in memory applications), the list or codebook is scanned for the word. If the word is on the list, the variable length code representing the position of the word on the list is sent (or stored) instead of the word itself and the word is moved to the head of the word list. If the word is not on the word list, the word itself is transmitted (or stored), and then that word is moved to the head of the word list while all other words on the word list are "pushed down" while maintaining their relative order.
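The move-to-front encoding steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of Bentley et al: the function name mtf_encode and the ("pos", n) / ("new", word) token tuples are hypothetical, the word list is allowed to grow without bound, and list positions are emitted as plain integers rather than as a variable length code.

```python
def mtf_encode(words):
    """Encode a word sequence with a move-to-front word list.

    Assumptions (not from the source): tokens are tuples, the list is
    unbounded, and positions are 1-based with 1 meaning the head of the list.
    """
    word_list = []  # index 0 is the head of the list
    tokens = []
    for word in words:
        if word in word_list:
            index = word_list.index(word)      # linear search of the codebook
            tokens.append(("pos", index + 1))  # send the position, not the word
            word_list.pop(index)
        else:
            tokens.append(("new", word))       # word not on the list: send it
        # In both cases the word goes to the head; the other words are
        # pushed down and keep their relative order.
        word_list.insert(0, word)
    return tokens


if __name__ == "__main__":
    print(mtf_encode("the cat saw the cat".split()))
```

Both the membership test and the insertion at the head take time proportional to the length of the list, which is the cost the linear codebook organization implies.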
The receiver (or retriever in memory storage applications) decodes the data by repeating the same actions performed by the transmitter (or the storing mechanism). That is, a word list is constructed and the variable length codes are used to recover the proper words from the word list.
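As a rough illustration of how the receiver mirrors the transmitter, the sketch below rebuilds the word list while decoding. The token format is an assumption made for this example only: ("new", word) stands for a word sent literally and ("pos", n) for a 1-based list position; the name mtf_decode is likewise hypothetical.

```python
def mtf_decode(tokens):
    """Recover the word sequence from move-to-front tokens.

    Assumed token format (hypothetical): ("new", word) for a literal word,
    ("pos", n) for the n-th entry of the current word list, counting from 1.
    """
    word_list = []  # rebuilt step by step, identical to the encoder's list
    words = []
    for kind, value in tokens:
        if kind == "pos":
            word = word_list.pop(value - 1)  # look the word up by position
        else:
            word = value                     # literal word, not yet on the list
        word_list.insert(0, word)            # same move-to-front update
        words.append(word)
    return words


if __name__ == "__main__":
    stream = [("new", "the"), ("new", "cat"), ("new", "saw"),
              ("pos", 3), ("pos", 3)]
    print(" ".join(mtf_decode(stream)))
```

Because the decoder applies the same list update after every token, its list stays in step with the encoder's list and no codebook needs to be sent separately.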
In the scheme of Bentley et al, the most often used words will automatically congregate near the front of the word list and hence be transmitted or stored with the smallest number of bits. Moreover, arbitrary prefix codes can be used to transmit or store word positions on the list, low positions being encoded with the shortest codewords. Also, the list organization heuristics can be varied, for example, by moving the selected word ahead a fixed number of places or by transposing it one position forward. Finally, the list positions themselves can be treated as new input data and the compaction scheme applied recursively to its own output, creating a new list and new variable length codes.
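To make two of these options concrete, the sketch below shows one possible prefix code for list positions and one alternative update rule. The choice of the Elias gamma code is an assumption made for illustration; the source only requires that low positions receive the shortest codewords and does not name a specific code. The helper names elias_gamma, elias_gamma_decode and move_ahead are hypothetical.

```python
def elias_gamma(n):
    """Elias gamma codeword for a position n >= 1, as a string of bits.

    The codeword is (len - 1) zeros followed by the binary form of n, so
    small positions get short codewords: 1 -> "1", 2 -> "010", 5 -> "00101".
    """
    if n < 1:
        raise ValueError("positions start at 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Split a concatenation of gamma codewords back into positions."""
    positions = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # the run of zeros gives the codeword length
            zeros += 1
            i += 1
        positions.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return positions


def move_ahead(word_list, index, k):
    """Move the entry at index forward by up to k places, in place.

    k = 1 transposes the word with its predecessor; a very large k behaves
    like move-to-front.
    """
    word_list.insert(max(0, index - k), word_list.pop(index))


if __name__ == "__main__":
    print([elias_gamma(n) for n in (1, 2, 5)])
    words = ["a", "b", "c", "d"]
    move_ahead(words, 3, 1)
    print(words)
```

Since no gamma codeword is a prefix of another, the position codes can be concatenated and still be decoded without separators.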
As alluded to, the encoder of the move-to-front implementation of Bentley et al has two operations, namely, (1) Search: for each input word, search for it in the codebook; and (2) Update: reorganize the codebook for further use. The implementation of Bentley et al organizes the codebook as a linear list, and both the search and update operations use linear algorithms. The time complexity of each operation is therefore proportional to the codebook size, which is typically in the thousands to the tens of thousands, so the cost per input word is high. In the earlier paper by Bentley, Sleator, Tarjan, and Wei, the codebook is organized as a doubly-linked double tree. The trees are adjusted after each input word to maintain depth balance, so that either the search or the update operation can be accomplished with complexity proportional to the logarithm of the codebook size. But the complicated data structure results in extremely large memory requirements, and the coefficient of the logarithmic complexity can also be large. Thus, the complexity of this latter scheme may not even be less than that of the linear approach for codebook sizes of practical interest.
A decoder in accordance with the present invention compiles a word list from the encoded data and performs the inverse of the encoding methodology.