The present invention relates generally to data compression, and more specifically to the decoding of programmable variable length encoded data.
As digital communication replaces traditional forms of analog communication, the need for improved digital communication continually grows. One way to improve the efficiency of digital communication is data compression, namely a reduction in the amount of signal space that must be allocated to a given message set or data sample set. Reducing signal space allows, among other things, the use of smaller memories and increased transmission rates. By reducing the amount of signal space needed, therefore, the performance of a digital communication/storage system can be improved and overall system cost can be reduced.
Generally speaking, data compression involves the assignment of unique codewords to a block of data to be transmitted and/or stored in memory. In a simple form of data compression, each codeword might have a fixed number of bits. For example, each character in a typical document might be described with a 5- or 6-bit codeword, instead of a 7-bit ASCII representation. While this type of encoding reduces the total amount of data, it is unlikely to compress the data to an optimum degree.
To provide greater compression of data, a different encoding technique known as variable-length encoding (VLE) is more commonly employed. One well-known example of VLE is Huffman encoding. VLE is based upon statistical information about the data to be compressed. The data is encoded using fewer bits to specify commonly-occurring input data samples, and using more bits to specify less frequently-occurring samples. For example, to accomplish the compression of text data, an encoding scheme can use a codeword having a few bits to specify commonly-occurring letters of the alphabet, such as "E", while using codewords with more bits to specify rarely used letters, such as "Q" or "X". By using a variable number of bits to encode input data, fewer bits are needed overall than if a fixed number of bits is used to specify each letter.
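The principle described above can be illustrated with a minimal sketch of Huffman code construction. The function names (`huffman_code`, `encode`, `decode`) and the sample text are hypothetical and chosen only for illustration; this is a textbook construction under those assumptions, not the decoding scheme of the present invention.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a {symbol: bitstring} prefix code from symbol frequencies.

    Hypothetical helper for illustration: frequent symbols receive
    shorter codewords, rare symbols receive longer ones.
    """
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # lightest subtree
        w1, _, c1 = heapq.heappop(heap)  # second lightest subtree
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codeword of every symbol in the text."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Decode a bitstring by matching prefixes against the code table."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


sample = "EEEEEEEETTTTAAQ"  # assumed sample in which "E" dominates
code = huffman_code(sample)
bits = encode(sample, code)
```

In this sample, "E" receives a shorter codeword than "Q", and the encoded bitstring is shorter than a fixed 7-bit-per-character representation of the same text.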
To decompress the data, of course, the mapping between the codewords and the data must be provided to a decoder. Typically, the mapping between data and codewords is defined in the form of a binary coding tree. A binary coding tree is made up of a root and nodes, each having two branches, where none, either, or both of each node's branches may end with a completed codeword (or leaf). Such a tree can be described using two bits for each node. Therefore, if codewords of up to N bits are needed for a pattern-to-codeword mapping, (2 + 4 + 8 + ... + 2^N) = 2^(N+1) - 2 bits will be needed to describe the tree to a decoder. For example, if 16 bits are used for each codeword, the binary coding tree would have to be 16 levels deep and would require 131,070 bits to describe. A 32-level tree would require 8,589,934,590 bits to describe. Therefore, if the binary coding tree must be provided to the decoder each time new data is to be compressed, it becomes very expensive and/or time consuming to decode the codewords.
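The bit counts quoted above can be checked with a short calculation. The function name `tree_description_bits` is a hypothetical helper introduced only for this check; it simply evaluates the sum 2 + 4 + 8 + ... + 2^N given in the text.

```python
def tree_description_bits(depth):
    """Bits needed to describe a full binary coding tree of the given depth.

    Hypothetical helper: evaluates 2 + 4 + ... + 2**depth, the count
    stated in the text, level by level.
    """
    return sum(2 ** level for level in range(1, depth + 1))


bits_16 = tree_description_bits(16)
bits_32 = tree_description_bits(32)
```

The same totals follow from the closed form 2^(N+1) - 2, which shows that the description cost roughly doubles with each additional level of the tree.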
It is possible, of course, to use a fixed coding tree for all data to be compressed, and thereby avoid the need to describe the tree to the decoder whenever data is to be decoded. For example, a Huffman encoder/decoder can operate with a fixed, predefined coding tree. However, the use of the same data-to-codeword mapping may not provide optimal compression in all cases. For example, in one document, the letter "E" may be used most frequently, in which case an optimal data-to-codeword mapping would employ a single bit to represent that letter. In another document, however, the letter "A" may be the most frequent, in which case the same mapping would not provide optimal compression.
It is preferable, therefore, to be able to vary the coding tree to provide better compression for different instances of data. By analyzing the data prior to compression, statistical information can be obtained regarding the frequency with which each item of data occurs, and an optimal data-to-codeword mapping can be employed. If the statistical information does not vary much between different instances, it might be possible to predefine a small number of fixed mappings, and select the one which is most appropriate for the set of data to be encoded. In such a case, the binary coding trees can be stored in the decoder, and the correct one selected each time data is to be decompressed. With this approach, it is not necessary to transmit a description of the binary coding tree to the decoder for each new set of data.
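The selection among a small number of predefined mappings can be sketched as follows. The two tables in `MAPPINGS`, which give codeword lengths per symbol, and the function names are assumptions made purely for illustration; they are not taken from any actual encoder.

```python
from collections import Counter

# Hypothetical predefined mappings, each giving codeword lengths in bits.
# Mapping 0 favors "E"; mapping 1 favors "A".
MAPPINGS = [
    {"E": 1, "A": 2, "T": 3, "Q": 3},
    {"A": 1, "E": 2, "T": 3, "Q": 3},
]


def encoded_bits(data, code_lengths):
    """Total bits needed to encode data with the given codeword lengths."""
    counts = Counter(data)
    return sum(n * code_lengths[symbol] for symbol, n in counts.items())


def select_mapping(data, mappings):
    """Return the index of the predefined mapping that yields the fewest bits."""
    return min(range(len(mappings)), key=lambda i: encoded_bits(data, mappings[i]))


choice_e = select_mapping("EEEEAT", MAPPINGS)  # data dominated by "E"
choice_a = select_mapping("AAAAET", MAPPINGS)  # data dominated by "A"
```

Under this arrangement only the index of the selected mapping needs to reach the decoder, since the tables themselves are already stored there.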
The use of a small number of predefined mappings is not optimal for the compression of data that can have large degrees of variation from one instance to the next, for example image data. In that situation, it is preferable to employ programmable variable length encoding, rather than a fixed VLE, to provide the best compression for a given set of data. In programmable VLE, statistical information for the data is obtained, and a data-to-codeword mapping is then created to provide the greatest amount of compression. Heretofore, however, programmable VLE has not been employed because it requires the binary coding tree to be described to the decoder for each new set of data, which incurs the large description overhead discussed previously.
It is an object of the present invention to provide a mapping scheme for decoding compressed data that minimizes the number of bits needed to describe the data-to-codeword mapping without losing any compression ability. It is a further object of the invention to provide a programmable variable length approach to data compression.