This invention relates to arithmetic compression coding of multi-alphabet sources, and more particularly to arithmetic coding wherein each symbol of a multi-alphabet symbol stream is encoded according to a probability interval variable whose value is cyclically readjusted with the coding of each symbol to reflect an estimate of the probability of the portion of the symbol stream already encoded.
Arithmetic codes of the FIFO-Form (first-in-first-out) are established in prior art and may be understood by reference to Langdon and Rissanen, "A Simple General Binary Source Code," IEEE Transactions On Information Theory, Volume IT-28, No. 5, September 1982. In the Langdon and Rissanen 1982 reference, and in other references cited therein, arithmetic coding is revealed as a cyclic process that generates a binary code string by augmenting a present code string resulting from the encoding of previous source string symbols. The power of arithmetic codes is in the compression they achieve, and their flexibility is in their ability to encode strings modeled by stationary or non-stationary sources with equal ease. Arithmetic codes permit encoding of binary strings without the need for alphabet extension, and also encode strings with symbols drawn from alphabets containing more than two characters.
An arithmetic code basically updates the probability P(s) of a so-far processed source string s, by the multiplication P(s)P(i/s) where P(i/s) is a conditional probability of the symbol i given s. The multiplication required for augmenting the probability is relatively expensive and slow, even in the case where the probabilities are represented by binary numbers having at most a fixed number of significant digits.
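As a minimal illustration of the update described above, the following Python sketch narrows an interval with one multiplication per symbol, so that the final width equals P(s). The static three-symbol model, the function name `narrow`, and the use of exact fractions are assumptions made for illustration; they are not taken from the cited references.

```python
from fractions import Fraction

def narrow(symbols, probs):
    """Return (low, width) of the interval for `symbols` under a static model.

    `probs` maps each symbol to its probability. A static model stands in
    for the conditional probability P(i/s) (an assumption for illustration).
    """
    # Cumulative probability of all symbols ordered before each symbol.
    cum = {}
    running = Fraction(0)
    for sym in sorted(probs):
        cum[sym] = running
        running += probs[sym]

    low, width = Fraction(0), Fraction(1)
    for sym in symbols:
        low += width * cum[sym]   # one multiplication to place the subinterval
        width *= probs[sym]       # one multiplication to update P(s)
    return low, width

model = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, width = narrow("ab", model)
```

Each coded symbol costs two multiplications here, which is the expense the multiplication-free recursions are meant to avoid.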
A signal advance in simplifying the arithmetic encoding operation in the particular case of a binary alphabet is described in U.S. Pat. No. 4,467,317. According to that patent, the encoded binary string is a number in the semi-open interval [0, 1) and the encoding method provides for recursively augmenting the code string in response to each symbol in the unencoded string. The coding process is described as the successive subdivision of the semi-open interval into a subinterval positioned between a lower bound C(s) contained in the interval, and an upper limit C(s)+T(s), also contained in the interval, where T(s) is an internal coding variable expressed as a function of numeral $2^{-k}$.
The prior art U.S. Pat. No. 4,467,317 teaches that a binary arithmetically-encoded stream is generated by the recursions (1a)-(1d).
For each MPS:

$$C(s\,\mathrm{MPS}) = C(s) + 2^{-k} \tag{1a}$$

$$T(s\,\mathrm{MPS}) = T(s) - 2^{-k} \tag{1b}$$

For each LPS:

$$C(s\,\mathrm{LPS}) = C(s) \tag{1c}$$

$$T(s\,\mathrm{LPS}) = 2^{-k} \tag{1d}$$
According to these equations, the magnitude of the binary arithmetically-coded stream C(s) is altered only on the occurrence of the more probable symbol (MPS), instead of on the occurrence of the less probable symbol (LPS). Accordingly, an arithmetically-encoded binary stream C(s) is recursively generated in high-to-low position magnitude order in response to successive symbols b(i) of a binary symbol string s=b(1), b(2) . . . b(i), . . . b(n). The steps of generation are compounded of entering the q-least-significant bits of the encoded stream C(s) and an internal coding variable T(s) into respective registers, and determining whether the next symbol in the stream s constitutes an MPS, while receiving a binary integer parameter k which approximates the probability $2^{-k}$ of an LPS; upon the occurrence of an MPS, simultaneously adding $2^{-k}$ to the C register and subtracting it from the T register, and left-shift normalizing both the C and T register contents if the T register contains a leading zero; or, if the next symbol of s is an LPS, left-shifting the C register contents by k positions and entering the binary value 1.00 . . . 00 into the T register. This recursive technique maintains the working end of the code string C(s) in the C register and in alignment with the significant bits of the subinterval size T so that the code string is augmented in the proper place. Further, the tracking of the C(s) working end and the updating of T(s) are done concurrently, which has the very desirable effect of increasing the speed of the coding operation.
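The recursions and the normalization described above can be sketched in Python as follows. This is an exact-arithmetic illustration, not the register-level procedure of U.S. Pat. No. 4,467,317: the function name `encode_binary`, the use of a single fixed k (with k at least 1) for the whole string, and the convention that the normalized T lies in [1, 2) (inferred from the reset of the T register to 1.00 . . . 00) are assumptions. The exponent `e` counts the left shifts that the registers would have performed, so the true interval width is T times 2 to the power -e.

```python
from fractions import Fraction

def encode_binary(bits, k, mps=0):
    """Return (low, width) of the final interval, using additions and shifts only."""
    low = Fraction(0)     # C(s): lower bound of the current interval
    T = Fraction(1)       # normalized interval size, kept in [1, 2)
    e = 0                 # number of left shifts applied so far
    step = Fraction(1, 2 ** k)

    for b in bits:
        # True width of the LPS subinterval: 2^-k at the current scale.
        lps_width = Fraction(1, 2 ** (k + e))
        if b == mps:
            low += lps_width          # (1a): C grows only on an MPS
            T -= step                 # (1b): T shrinks by 2^-k
            while T < 1:              # renormalize when T has a leading zero
                T *= 2
                e += 1
        else:
            # (1c): C is unchanged. (1d): T becomes 2^-k, which a k-position
            # left shift turns back into 1.00...00.
            T = Fraction(1)
            e += k
    return low, T / 2 ** e

low, width = encode_binary([0, 0, 1], k=2)
```

For the string MPS, MPS, LPS with k=2, the interval goes from [0, 1) to [1/4, 1), then [3/8, 1), then [3/8, 1/2); no multiplication by a probability is needed at any step.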
While the recursive technique of U.S. Pat. No. 4,467,317 greatly simplified the encoding operation of a binary symbol stream by approximating the probability of the LPS with an integral power of 1/2, the same idea is not generalized easily to non-binary alphabets for the reason that the n-ary alphabet symbol probabilities cannot be approximated well enough as powers of 1/2. This limitation is significant in view of the growing parallelism in the data structures and operations of present-day processors. A method and means for compressive arithmetic encoding of symbol streams embracing more than two distinct alphabet characters would be useful to encode, for example, a symbol stream consisting of a succession of data bytes. In this case, the alphabet would consist of 256 distinct characters (byte formats) and the stream would be encoded byte-by-byte. At present, compressive arithmetic encoding encodes and decodes on a bit-by-bit basis and requires serialization of parallel data structures, which introduces complexity and delay into the coding process.
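One way to get a feel for the cost of forcing every symbol probability to a power of 1/2 is to compare the resulting average code length with the source entropy. The sketch below assumes the simplest such approximation, in which each probability p is replaced by the largest power of 1/2 not exceeding it; the function name `excess_bits` and the example distributions are hypothetical and serve only as illustration.

```python
import math

def excess_bits(probs):
    """Average bits per symbol lost when each p is rounded down to a power of 1/2."""
    entropy = -sum(p * math.log2(p) for p in probs)
    # Rounding p down to 2^-k means spending k = ceil(-log2 p) bits on the symbol.
    avg_len = sum(p * math.ceil(-math.log2(p)) for p in probs)
    return avg_len - entropy

loss_skewed = excess_bits([0.4, 0.3, 0.3])     # probabilities far from powers of 1/2
loss_dyadic = excess_bits([0.5, 0.25, 0.25])   # probabilities already powers of 1/2
```

Under this assumption the three-symbol source with probabilities 0.4, 0.3, 0.3 loses more than 0.4 bit per symbol, while the dyadic source loses nothing.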
Further, the algorithmic approach to prior art arithmetic encoding depends upon the existence of a sophisticated modeling unit capable of calculating and providing the source statistical characteristics k. As taught in Langdon and Rissanen, 1982, implicit in the calculation of k is the default of a skew number calculation operation to the modeling unit, an operation which adds to the complexity of the overall data compression problem that comprehends modeling and encoding the data source.
At this point, then, arithmetic compression encoding awaits an advance of the art to accommodate the arithmetic encoding of multi-character-alphabet data sources by a method and means operating without multiplication or division to generate a binary code stream in response to simplified data source statistical characteristics that preserve the very desirable prior art property of concurrent updating of both the code stream and an internal coding variable.