The invention relates to the encoding and decoding of strings of symbols from finite source alphabets, which encoding takes advantage of the relative frequency of occurrence of the symbols in order to achieve compaction. More particularly, the invention relates to an improvement over both the enumerative encoding technique and the Huffman compaction technique. As may be recalled, Huffman taught a method for compaction by assigning a variable-length code word to each character in the source alphabet, in which the length of the code word was inversely related to the relative frequency of occurrence of the character. The statistics of occurrence were assumed to be either stationary or semi-stationary; that is, the method did not contemplate continual adaptive reordering and reassignment of code words.
Since Huffman's original paper in September 1952, Proceedings of the IRE, pages 1098-1101, "A Method for the Construction of Minimum Redundancy Codes," modifications have been made as represented by a series of U.S. patents; namely, U.S. Pat. Nos. 3,675,211; 3,675,212; 3,701,111; 3,694,813; 3,717,851; and T925,002. A useful source of background information is the book by Norman Abramson, "Information Theory and Coding," McGraw Hill, 1963, chapters 3 and 4. The Huffman code remains, nevertheless, a classical code possessing certain attributes. These attributes include code words formed from a fixed sequence of a priori symbols, all of the code words being distinct (nonsingularity), and no code word being a prefix of another (unique decodability).
Enumerative coding is the encoding of, say, each permutation of a binary string of n symbols by first ordering the 2.sup.n terms and then representing each term by an ordinal number. Reference may be made to T. M. Cover, "Enumerative Source Encoding," IEEE Transactions on Information Theory, January 1973, pages 73-77. More specifically, given a binary string of n binary digits, m of which are 1's, let T(n,m) denote the set of all strings of n binary digits having m ones. It is necessary to "order" (arrange) the terms in a consistent manner, i.e., by size (lexicographically) with a predetermined priority (left-to-right). As an example, if n=3 and m=2, then 011<101<110 would be an ordering. If one assigns a position number to each term in the ordering, then the position number of the last term is the binomial coefficient C(n,m)=n!/[m!(n-m)!]. All other position numbers (code words) are smaller.
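Assuming a 1-based position numbering, the lexicographic position of a string within T(n,m) can be sketched in Python as follows (the function name `rank` and the list-of-bits representation are illustrative, not from the patent):

```python
from math import comb

def rank(bits):
    """1-based lexicographic position of a binary string among all
    strings of the same length and the same number of ones.
    At each 1 bit, count the strings that have a 0 there instead
    (with all remaining ones pushed into the later positions)."""
    n = len(bits)
    ones_left = bits.count(1)
    r = 0
    for i, b in enumerate(bits):
        if b == 1:
            # strings smaller than this one at position i
            r += comb(n - i - 1, ones_left)
            ones_left -= 1
    return r + 1
```

For n=3, m=2 this reproduces the ordering in the text: 011, 101, 110 receive positions 1, 2, 3, and the last position equals C(3,2)=3.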
In both Huffman and enumerative coding, the attainable average per-symbol code length has a lower bound determined by the entropy function of the source symbol frequencies.
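This lower bound is the Shannon entropy of the symbol frequencies; a minimal sketch in Python (the function name `entropy` is illustrative):

```python
from math import log2

def entropy(freqs):
    """Shannon entropy in bits per symbol: the lower bound on the
    attainable average code-word length for a memoryless source."""
    return -sum(p * log2(p) for p in freqs if p > 0)
```

For an equiprobable binary source the bound is 1 bit per symbol, while for a skewed source such as frequencies (0.9, 0.1) it falls to about 0.47 bits per symbol.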
Given a source symbol string s, where s=a.sub.i a.sub.j . . . grows to the right, let sa.sub.k denote the new string obtained by appending the new symbol a.sub.k, where k runs from 1 to N, to the end of the old string s. In Huffman coding, the encoded representation of sa.sub.k is obtained by appending, or concatenating, the code word A.sub.k of the symbol a.sub.k to the right end of the code of s. This means that the code word length grows by a whole number of bits for each symbol, independent of the symbol frequencies. It further means that for binary source strings Huffman coding produces no compaction, and for small alphabets it may produce insufficient compaction.
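The concatenation property can be illustrated with a small prefix code; the four-symbol alphabet and the particular code-word assignment below are hypothetical, chosen only to show the mechanics:

```python
# Hypothetical prefix code for a 4-symbol alphabet.
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def encode(s):
    """Concatenate per-symbol code words: each appended symbol a_k
    lengthens the output by a whole len(code[a_k]) bits."""
    return ''.join(code[ch] for ch in s)

def decode(bits):
    """The prefix property permits left-to-right decoding: emit a
    symbol as soon as the accumulated bits match a code word."""
    inv = {v: k for k, v in code.items()}
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ''
    return ''.join(out)
```

Here encode('abacd') yields '0100110111', and decoding recovers the original string; note that every symbol costs at least one whole bit, which is why a binary source alphabet gains nothing.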
In order to obtain improved Huffman compaction for small source alphabets, a technique known as blocking has been employed. In blocking, the symbols of the new source alphabet are juxtapositions of two or more of the original source symbols. Taking k symbols at a time increases the source alphabet size to N.sup.k.
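A sketch of the effect of blocking, assuming a skewed binary source with symbol probabilities 0.9 and 0.1 (the probabilities and the helper `huffman_lengths` are illustrative, not the patent's method):

```python
import heapq
from itertools import product
from math import prod

def huffman_lengths(freqs):
    """Code-word lengths of a Huffman code for {symbol: frequency},
    built by repeatedly merging the two least-frequent subtrees."""
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

p = {'0': 0.9, '1': 0.1}   # skewed binary source (illustrative)
k = 3                      # block length: alphabet grows to 2**k
blocks = {''.join(t): prod(p[ch] for ch in t)
          for t in product('01', repeat=k)}
lengths = huffman_lengths(blocks)
bits_per_symbol = sum(blocks[b] * lengths[b] for b in blocks) / k
# Per-symbol Huffman coding of a binary alphabet needs 1 bit/symbol;
# blocking with k=3 brings the average below 1 bit/symbol.
```

With these assumed probabilities the average drops to roughly 0.53 bits per original symbol, against an entropy bound of about 0.47, illustrating why blocking helps small alphabets.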
As is apparent, enumerative coding does not require blocking. However, the calculation of each new code word C(sa.sub.k) requires knowledge of all of the bits of the old code word C(s). This, in turn, requires an indefinitely growing memory.