1. Field of the Invention
The present invention relates to data compression, and, in particular, to variable-length coding schemes.
2. Description of the Related Art
Variable-length (VL) codes are an important part of several data compression algorithms. For example, some conventional video compression algorithms, such as those based on an MPEG (Moving Picture Experts Group) standard, apply variable-length coding to run-length-encoded data. That data is generated by (1) applying a transform, such as a discrete cosine transform (DCT), to blocks of either raw pixel data or motion-compensated interframe pixel difference data, (2) quantizing the resulting blocks of transform coefficients, and (3) run-length encoding the resulting blocks of quantized coefficients. The run-length-encoded data is then variable-length encoded.
In conventional non-variable-length coding, symbols are represented by fixed-length data (i.e., data having the same number of bits for all symbols). For example, symbols corresponding to the decimal integers 0 through 255 may be represented by the 8-bit binary values (00000000) through (11111111), respectively, where each 8-bit binary value is a fixed-length (i.e., 8-bit) code word representing a different integer.
In variable-length coding, a set of symbols is represented by a set of VL code words having differing numbers of bits. To achieve data compression, VL code words having fewer bits are preferably assigned to symbols that occur more frequently. For example, Table I shows a codebook of Huffman-type VL codes that may be used to efficiently represent integer data in which the frequency of occurrence of the integers decreases as the magnitude of the integer increases. In general, a codebook is a table representing a mapping between symbols and their corresponding code words. In this specification, the terms "code" and "code word" are used interchangeably.
Each VL code in Table I comprises a prefix and one or more free bits. A prefix is a set of one or more bits (in Table I, a "1" preceded by zero, one, or more "0"s) that identifies how many total bits are in that code word, while the free bits distinguish between the different code words having the same total number of bits. In Table I, "X" represents a free bit in a VL code. A free bit corresponds to either a 0 or a 1. Thus, for example, the four VL codes represented by (1XX) in Table I correspond to:
the VL code (100) representing the integer 0,
the VL code (101) representing the integer 1,
the VL code (110) representing the integer 2, and
the VL code (111) representing the integer 3;
and analogously for the other codebook entries.
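The prefix-plus-free-bits structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Table I is not reproduced here, so the `ROWS` list below is hypothetical except for its first entry, which matches the (1XX) codes for the integers 0 through 3 given in the text. The function names `encode` and `decode_one` are likewise invented for this sketch.

```python
# Each row: (number of leading "0"s in the prefix, number of free bits).
# ASSUMPTION: only the first row, (1XX) -> integers 0..3, comes from the text.
# The remaining rows are hypothetical and chosen to cover the integers 0..255.
ROWS = [(0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]


def encode(value):
    """Return the VL code word for value as a string of '0' and '1'."""
    base = 0  # first integer covered by the current row
    for zeros, free in ROWS:
        count = 1 << free  # number of integers covered by this row
        if base <= value < base + count:
            prefix = "0" * zeros + "1"
            free_bits = format(value - base, "0{}b".format(free))
            return prefix + free_bits
        base += count
    raise ValueError("value outside codebook range")


def decode_one(bits, pos=0):
    """Decode one code word starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":  # the prefix ends at the first "1"
        zeros += 1
    base = 0
    for row_zeros, free in ROWS:
        if row_zeros == zeros:
            start = pos + zeros + 1  # first free bit
            value = base + int(bits[start:start + free], 2)
            return value, start + free
        base += 1 << free
    raise ValueError("unknown prefix")


print(encode(0), encode(1), encode(2), encode(3))
```

The prefix alone tells the decoder how many bits belong to the code word, and the free bits then select one integer within that row's range.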
When the codebook of Table I is used to encode a set of integers 0-255 having at least the general tendency that larger integers occur less frequently than smaller integers, the average number of bits used to represent each integer will be smaller than 8, the number of bits in the fixed-length binary representation. The entire set of integer data is therefore represented with fewer bits overall than with the fixed-length 8-bit binary codes.
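The dependence of the average on the symbol statistics can be checked numerically. The sketch below is illustrative only: the code lengths in `RANGES` are assumed (only the 3-bit length for the integers 0 through 3 follows from the text), and both frequency distributions are invented for the example.

```python
# (first integer of the range, assumed code length in bits).
# ASSUMPTION: only the 3-bit entry for integers 0..3 comes from the text.
RANGES = [(0, 3), (4, 4), (8, 6), (16, 8), (32, 10), (64, 12), (128, 14)]


def code_length(value):
    """Return the assumed VL code length in bits for an integer 0..255."""
    length = None
    for first, bits in RANGES:
        if value >= first:
            length = bits  # keep the last range whose start is <= value
    return length


def average_bits(weights):
    """Average code length over integers 0..255, weighted by frequency."""
    total = sum(weights)
    return sum(w * code_length(v) for v, w in enumerate(weights)) / total


skewed = [0.9 ** v for v in range(256)]  # hypothetical: small integers dominate
uniform = [1.0] * 256                    # hypothetical: no tendency at all

print(average_bits(skewed))
print(average_bits(uniform))
```

With these assumed lengths, the skewed set averages fewer than 8 bits per integer, while the uniform set averages more than 8. That is why the tendency toward smaller integers matters for compression.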
Because the number of bits can vary from one VL code to the next, data encoded with conventional VL coding schemes must be decoded sequentially. The dependence of the bit position of any given VL code word on the previous VL code words increases the complexity and latency of the process of decoding VL encoded data. These disadvantages of VL coding do not occur with fixed-length coding schemes, where the bit positions of all fixed-length code words are known a priori. As a result, parallel processing techniques can be easily applied to the decoding of fixed-length encoded data.
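The contrast between the two cases can be made concrete with a short sketch. It assumes a hypothetical mapping from prefix to total code length (`LENGTH_BY_ZEROS`), of which only the 3-bit entry follows from the (1XX) example in the text. Both function names are invented for illustration.

```python
# Number of leading "0"s in the prefix -> total code word length in bits.
# ASSUMPTION: only the entry 0 -> 3 (the 1XX codes) follows from the text.
LENGTH_BY_ZEROS = {0: 3, 1: 4, 2: 6, 3: 8, 4: 10, 5: 12, 6: 14}


def vl_start_positions(bits):
    """Find where each VL code word starts by walking the stream in order."""
    starts = []
    pos = 0
    while pos < len(bits):
        starts.append(pos)
        zeros = 0
        while bits[pos + zeros] == "0":  # read the prefix up to its "1"
            zeros += 1
        pos += LENGTH_BY_ZEROS[zeros]  # next start depends on this length
    return starts


def fixed_start_positions(count, width=8):
    """Start of every fixed-length code word, computed without reading data."""
    return [i * width for i in range(count)]


stream = "100" + "0100" + "111"  # three VL code words back to back
print(vl_start_positions(stream))
print(fixed_start_positions(3))
```

In the variable-length case, each start position is known only after the preceding code word's prefix has been examined. In the fixed-length case, every position is a simple multiple of the width, so separate workers could begin decoding at any of them independently.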
The present invention is directed to a coding technique that addresses the problems of complexity and latency in prior-art variable-length coding that result from the dependence of the bit position of each VL code word in a sequence of VL encoded data on the previous VL code words. The present invention provides a simple and efficient way of merging the advantage of compression efficiency of variable-length coding with the advantage of known bit position of fixed-length coding. The present invention enables significant improvement in the implementation of decoders for encoded data generated according to the present invention, including the use of parallel decode processing.
In one embodiment, the present invention is a method for compressing an original stream of symbols into an encoded stream, comprising the steps of:
(a) encoding m symbols of the original stream of symbols using a codebook into m code words of the encoded stream, wherein:
the m symbols comprise one or more symbol values of a complete set of symbol values;
the codebook represents a mapping of the complete set of symbol values into a set of fixed-length code words, wherein at least one symbol value in the complete set of symbol values corresponds to two or more different code words in the codebook, each of the two or more different code words comprising at least one redundant bit; and
the m code words appear at regular intervals in the encoded stream based on the fixed length of the code words; and
(b) encoding at least one additional symbol into the bits of the encoded stream corresponding to the m code words.
In another embodiment, the present invention is a method for decoding an encoded stream into a decoded stream of symbols, comprising the steps of:
(1) decoding m code words in the encoded stream, wherein the m code words were generated by:
(a) encoding m symbols of an original stream of symbols using a codebook into the m code words of the encoded stream, wherein:
the m symbols comprise one or more symbol values of a complete set of symbol values;
the codebook represents a mapping of the complete set of symbol values into a set of fixed-length code words, wherein at least one symbol value in the complete set of symbol values corresponds to two or more different code words in the codebook, each of the two or more different code words comprising at least one redundant bit and a set of bits corresponding to a VL code word of a variable-length (VL) codebook, the VL code word being identical for each of the two or more different code words corresponding to the at least one symbol value; and
the m code words appear at regular intervals in the encoded stream based on the fixed length of the code words; and
(b) encoding at least one additional symbol into the bits of the encoded stream corresponding to the m code words, if the m code words comprise a sufficient number of redundant bits to encode the at least one additional symbol; and
(2) decoding the at least one additional symbol from the bits of the encoded stream corresponding to the m code words.
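The encoding and decoding methods summarized above can be illustrated with a toy sketch. It is not the implementation of the invention. Every concrete choice below is an assumption made for this example: the 4-bit slot width, the group size m = 4, the two-row VL codebook (1XX for 0 through 3, 01XX for 4 through 7), the use of all-zero filler to signal that no additional symbol is present, and all function names.

```python
WIDTH = 4  # ASSUMPTION: fixed code word length in bits
M = 4      # ASSUMPTION: number of code words per group (m)


def vl_encode(value):
    """Hypothetical two-row VL codebook: 1XX -> 0..3, 01XX -> 4..7."""
    if 0 <= value <= 3:
        return "1" + format(value, "02b")
    if 4 <= value <= 7:
        return "01" + format(value - 4, "02b")
    raise ValueError("value outside codebook range")


def vl_decode(bits):
    """Return (value, length) of the VL code word at the start, or None."""
    if len(bits) >= 3 and bits[0] == "1":
        return int(bits[1:3], 2), 3
    if len(bits) >= 4 and bits[:2] == "01":
        return 4 + int(bits[2:4], 2), 4
    return None


def encode_group(symbols, extra=None):
    """Encode M symbols into M fixed-width slots.

    The bits left over in each slot are redundant. If together they can
    hold the VL code word of `extra`, it is written into them; otherwise
    they are filled with zeros. Returns (bit string, embedded flag).
    """
    codes = [vl_encode(s) for s in symbols]
    spare = sum(WIDTH - len(c) for c in codes)
    payload = vl_encode(extra) if extra is not None else ""
    embedded = bool(payload) and len(payload) <= spare
    filler = (payload if embedded else "").ljust(spare, "0")
    out, used = [], 0
    for c in codes:
        n = WIDTH - len(c)  # redundant bits in this slot
        out.append(c + filler[used:used + n])
        used += n
    return "".join(out), embedded


def decode_group(bits):
    """Decode M slots at known positions, then look for an extra symbol."""
    symbols, spare = [], ""
    for i in range(M):
        slot = bits[i * WIDTH:(i + 1) * WIDTH]  # position known in advance
        value, length = vl_decode(slot)
        symbols.append(value)
        spare += slot[length:]  # collect this slot's redundant bits
    hit = vl_decode(spare)
    return symbols, (hit[0] if hit else None)


stream, embedded = encode_group([0, 1, 2, 3], extra=2)
print(stream, embedded, decode_group(stream))
```

In this sketch, each slot can be decoded without reference to any other slot because its position is fixed. Only the additional symbol requires gathering bits across the group. An all-zero filler never begins with a valid prefix in the assumed codebook, which is how the toy decoder tells an empty group from one carrying an additional symbol.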