Variable-length coding can be used for data compaction of a data base comprising fixed-length characters, such as the 8-bit EBCDIC characters commonly employed for alphanumeric letters, numerals, and symbols. The lengths of the various codewords in the variable-length code are chosen so that the shorter codewords are used to represent the more frequently occurring characters and the longer codewords are used to represent the less frequently occurring characters. Thus the average length of codewords in the variable-length code is less than the fixed length of the characters.
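As a small illustration of the averaging argument above, the following sketch computes the expected codeword length for a hypothetical four-character alphabet. The characters, probabilities, and codewords are assumptions chosen for the example and are not drawn from any particular data base.

```python
from fractions import Fraction

# Hypothetical four-character alphabet (illustrative assumption only).
probabilities = {
    "A": Fraction(1, 2),
    "B": Fraction(1, 4),
    "C": Fraction(1, 8),
    "D": Fraction(1, 8),
}
# Shorter codewords are assigned to the more frequent characters.
codewords = {"A": "0", "B": "10", "C": "110", "D": "111"}


def average_length(probs, code):
    """Expected codeword length in bits: sum of p(c) * len(code[c])."""
    return sum(p * len(code[c]) for c, p in probs.items())


FIXED_LENGTH = 2  # bits needed by a fixed-length code for four characters

avg = average_length(probabilities, codewords)
print(float(avg), "bits per character on average, versus", FIXED_LENGTH)
```

With these assumed figures the average works out to 1.75 bits per character, compared with 2 bits for the fixed-length representation.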
The use of variable-length codes for data compaction of a data base comprising fixed-length characters requires a facility for converting between fixed-length characters and their corresponding codewords. The conversion of fixed-length characters into codewords is commonly termed "encoding," while the inverse conversion is commonly called "decoding." The decoding process is normally more difficult than the encoding process, since a sequence of codewords to be decoded arrives as an undivided string of binary digits, with nothing to mark where one codeword ends and the next begins. This string of digits must be partitioned into the different codewords before the codewords can be identified and converted.
The primary disadvantage of previous decoders for variable-length codes is that they have represented a compromise among three coding objectives. The first objective is an ability to decode a variable-length code whose average codeword length is as small as possible when the code is used for a given data base. Such a code permits the greatest possible data compaction for the data base. The second coding objective is an ability to decode codewords quickly and economically. The third objective is an ability to decode a variety of different variable-length codes, with each code designed to provide data compaction for a different data base.
For any given data base, the well-known Huffman algorithm can be used to construct a minimum-redundancy code, i.e., a variable-length code with the minimum average codeword length possible for that data base. Three general types of decoders are currently in use for Huffman codes, namely the table-lookup types, the tree-follower types, and the encoder-based types. However, these decoders have proved to be expensive, time-consuming, or incapable of decoding more than a single code.
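The construction referred to above can be sketched in a few lines. This is a minimal software illustration of the standard Huffman merging procedure, not a description of any particular apparatus; the function name `huffman_code` and the sample frequencies are assumptions made for the example.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Return {character: codeword} for a dict of character frequencies."""
    if len(frequencies) == 1:
        # Degenerate case: a single character still needs one bit.
        return {next(iter(frequencies)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent groups.
        f0, _, group0 = heapq.heappop(heap)
        f1, _, group1 = heapq.heappop(heap)
        merged = {ch: "0" + cw for ch, cw in group0.items()}
        merged.update({ch: "1" + cw for ch, cw in group1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical character counts (illustrative assumption only).
freqs = {"E": 45, "T": 13, "A": 12, "O": 16, "N": 9, "S": 5}
code = huffman_code(freqs)
total_bits = sum(freqs[ch] * len(code[ch]) for ch in freqs)
print(code, total_bits)
```

The exact bit patterns depend on how ties are broken, but the total encoded length for a given set of frequencies does not.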
A table-lookup decoder includes a table containing each codeword as a separate entity. As the successive bits of a codeword are received, each codeword in the table must be checked to see whether it agrees with all codeword bits received so far. When only one codeword agrees, that codeword has been received and identified. The table storage for this type of decoder requires an expensive associative memory.
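The table-lookup procedure can be imitated in software as follows. This is only a sketch under stated assumptions: the four-entry table is hypothetical, a linear scan stands in for the associative memory, and the function name `table_lookup_decode` is invented for the example. The sketch also confirms that the surviving codeword equals the bits received, a check that is redundant for a complete Huffman code but harmless.

```python
def table_lookup_decode(bits, table):
    """Decode a string of '0'/'1' using a table mapping codeword -> character."""
    decoded = []
    received = ""
    for bit in bits:
        received += bit
        # Check every table entry against all bits received so far.
        matches = [cw for cw in table if cw.startswith(received)]
        if not matches:
            raise ValueError(f"no codeword begins with {received!r}")
        if len(matches) == 1 and matches[0] == received:
            # Exactly one codeword agrees: it has been received in full.
            decoded.append(table[received])
            received = ""
    if received:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(decoded)


# Hypothetical codeword table (illustrative assumption only).
TABLE = {"0": "A", "10": "B", "110": "C", "111": "D"}
result = table_lookup_decode("010110111", TABLE)
print(result)
```

Every incoming bit triggers a comparison against the whole table, which is why a hardware realization calls for parallel, content-addressed storage.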
A tree-follower decoder depends upon the fact that Huffman codes have a tree-like structure. The decoder includes logic circuitry corresponding to the tree, and the reception of successive bits of a codeword causes control circuitry to traverse this tree-like structure. When a terminal node of the tree is reached, an entire codeword has been received, and the terminal node identifies the codeword. This tree-follower type of decoder is either expensive, if duplicate circuitry is provided for each node of the tree, or slow, if the same circuitry must be used again and again to represent different nodes of the tree.
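A software analogue of the tree-follower approach is sketched below. The nested-dictionary tree, the helper names `build_tree` and `tree_decode`, and the sample code are assumptions made for illustration; the single loop that is reused for every node corresponds to the slower, shared-circuitry variant.

```python
def build_tree(code):
    """Build a binary tree of nested dicts from {character: codeword}."""
    root = {}
    for ch, cw in code.items():
        node = root
        for bit in cw[:-1]:
            node = node.setdefault(bit, {})
        node[cw[-1]] = ch  # a terminal node holds the decoded character
    return root


def tree_decode(bits, root):
    """Follow one branch per received bit; emit a character at each leaf."""
    decoded = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # terminal node reached
            decoded.append(node)
            node = root  # restart at the root for the next codeword
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(decoded)


# Hypothetical code (illustrative assumption only).
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
TREE = build_tree(CODE)
text = tree_decode("010110111", TREE)
print(text)
```

Each bit costs one step down the tree, so decoding time grows with codeword length rather than with the number of codewords.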
An encoder-based decoder includes a copy of the encoder, a character generator, and comparison circuitry. The character generator supplies successive fixed-length characters to the encoder, and the encoder produces the codeword appropriate for each successive character. Each codeword thus produced is compared with the bits of the codeword to be decoded. When a match occurs, the codeword to be decoded is known to represent the last character supplied to the encoder. This encoder-based type of decoder is quite slow, since it may need to generate and test many codewords.
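The generate-and-compare loop can be sketched as follows. The names `encoder_based_decode`, `CODE`, and `ALPHABET` are hypothetical, and a dictionary lookup stands in for the encoder; the trial counter is included only to make the cost of the search visible.

```python
def encoder_based_decode(bits, encode, alphabet):
    """Decode by encoding candidate characters until one matches the input.

    Returns the decoded text and the number of codewords generated.
    """
    decoded = []
    position = 0
    trials = 0
    while position < len(bits):
        for ch in alphabet:  # character generator
            candidate = encode(ch)  # copy of the encoder
            trials += 1
            if bits.startswith(candidate, position):  # comparison step
                decoded.append(ch)
                position += len(candidate)
                break
        else:
            raise ValueError(f"no codeword matches at bit {position}")
    return "".join(decoded), trials


# Hypothetical code and character order (illustrative assumptions only).
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
ALPHABET = "ABCD"

text, trials = encoder_based_decode("010110111", CODE.__getitem__, ALPHABET)
print(text, trials)
```

Under these assumptions, decoding four characters takes ten encoder trials; with a 256-character alphabet the worst case is 256 trials per decoded character.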
As described above, previous decoders for Huffman codes have proved to be expensive, time-consuming, or incapable of decoding more than a single code. Therefore, a number of special variable-length codes have been developed that admit fast and inexpensive decoders. For example, reference may be made to Cocke et al., U.S. Pat. No. 3,701,111, "Method and Apparatus for Decoding Variable-Length Codes Having Length-Indicating Prefixes," and Raviv, U.S. Pat. No. 3,675,211, "Data Compaction Using Modified Variable-Length Coding." However, these special codes are not minimally redundant. That is, their average codeword lengths exceed those of Huffman codes.