Data compression is the reduction of redundancy in data to decrease data communication costs and data storage requirements. Data compression is especially important in digital video processing systems, where both the bandwidth required to transmit video images and the memory required to store them must be minimized.
Run length coding (RLC) and variable length coding (VLC) are two widely adopted techniques for lossless data compression.
One example of a variable length code is the Huffman code. The operation of the Huffman code may be understood in connection with FIG. 1. In particular, FIG. 1 illustrates a Huffman tree. The tree is constructed for a set of seven characters A, B, C, D, E, F and G with the corresponding probabilities of occurrence of these characters being 0.1, 0.1, 0.1, 0.3, 0.1, 0.1, and 0.2, respectively. These probabilities are written in circles corresponding to the leaf nodes of FIG. 1. A circle denoting an internal node in the tree of FIG. 1 contains the sum of the probabilities of its child nodes.
The Huffman code word for a character is the sequence of 0's and 1's in the unique path from the root of the tree to the leaf node representing the character. For example, the code word for A is 000 and the code word for D is 01. To compress the character string DAF, a Huffman encoder concatenates the code words for the three characters to produce the binary string 01000100.
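The concatenation step can be illustrated with a minimal Python sketch. It assumes only the three code words implied by the example above (A is 000, D is 01 and, from the encoded string, F is 100); the code words for B, C, E and G depend on FIG. 1 and are deliberately left out. The names CODE_WORDS and encode are hypothetical.

```python
# Partial code table: only the code words that can be read from the text.
# The remaining characters (B, C, E, G) depend on FIG. 1 and are omitted.
CODE_WORDS = {"A": "000", "D": "01", "F": "100"}


def encode(text, table=CODE_WORDS):
    """Concatenate the code word of each character in `text`."""
    return "".join(table[ch] for ch in text)


print(encode("DAF"))  # 01000100
```

Because the encoder input is a sequence of fixed length character symbols, each character can be looked up independently of its neighbors.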
In general, in a Huffman code, characters with a higher probability of occurrence have shorter code words and characters with a lower probability of occurrence have longer code words.
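This relationship between probability and code word length can be checked with a short Python sketch of the standard Huffman construction, applied to the seven probabilities given above. It is an illustration under stated assumptions: the probabilities are expressed in tenths to keep the arithmetic exact, ties are broken arbitrarily (so the resulting tree need not be the tree of FIG. 1), and only the code word lengths are computed. The names WEIGHTS and huffman_code_lengths are hypothetical.

```python
import heapq

# Probabilities from the text, expressed in tenths to keep the arithmetic exact.
WEIGHTS = {"A": 1, "B": 1, "C": 1, "D": 3, "E": 1, "F": 1, "G": 2}


def huffman_code_lengths(weights):
    """Return {symbol: code word length} for a Huffman code over `weights`."""
    # Heap entries: (subtree weight, tie-break counter, symbols in the subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(weights, 0)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol below the new
        # internal node moves one level further from the root.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths


lengths = huffman_code_lengths(WEIGHTS)
average_tenths = sum(WEIGHTS[s] * lengths[s] for s in WEIGHTS)
print(lengths)
print(average_tenths / 10)  # 2.7 bits per character on average
```

In this sketch the most probable character, D, receives the shortest code word, which agrees with the two-bit code word 01 quoted above.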
In digital video systems, the data rate is usually very high. The primary requirement for a variable length encoder or decoder used in such a system is high speed. To achieve high speed coding and decoding, it is desirable to use parallel processing. Because the inputs of a variable length encoder are fixed length data words representing character symbols, it is easy to partition them for parallel encoding.
For a variable length decoder, the situation is quite different. Because the input data are variable length coded, it is not easy to recognize the boundaries between successive input code words. Thus, the input data cannot be easily partitioned for the utilization of parallel processing. Because of this constraint, the easiest way to decode the input data is bit by bit, using what is known as a bit serial decoder.
An example of a bit serial decoder is disclosed in U.S. Pat. No. 4,853,696, which issued Aug. 1, 1989 to A. Mukherjee, and in A. Mukherjee et al, "Efficient VLSI Designs for Data Transformation of Tree-Based Codes", IEEE Transactions on Circuits and Systems, Vol. 38, No. 3, March 1991, pp. 306-314. This bit serial decoder is also implemented through use of the Huffman tree. For example, to decode the binary string stated above, the decoder moves down the tree while processing the binary string from left to right. Thus, the first 0 causes the decoder to branch to the right child of the root. The following 1 causes the decoder to branch to the external node representing the particular character D. The decoding speed of a bit serial decoder is one bit/cycle. In the case of a bit serial decoder, the input bit rate is fixed, while the output bit rate is variable.
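The tree walk performed by a bit serial decoder can be sketched in Python as follows. This is a software illustration only, not the circuit of the cited references. It assumes the partial code table implied by the earlier example (A is 000, D is 01, F is 100) and models the tree as nested dictionaries keyed by the bit value, so it does not commit to which child is drawn on the left or right in FIG. 1. The names build_tree and decode_bit_serial are hypothetical.

```python
# Partial code table read from the text; B, C, E and G are omitted.
CODE_WORDS = {"A": "000", "D": "01", "F": "100"}


def build_tree(table):
    """Build a binary tree as nested dicts; leaves are character symbols."""
    root = {}
    for symbol, word in table.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol
    return root


def decode_bit_serial(bits, root):
    """Consume one input bit per cycle, emitting a symbol at each leaf."""
    symbols, node, cycles = [], root, 0
    for bit in bits:
        cycles += 1
        node = node[bit]           # branch on the current bit
        if isinstance(node, str):  # reached a leaf: emit and restart at root
            symbols.append(node)
            node = root
    return "".join(symbols), cycles


print(decode_bit_serial("01000100", build_tree(CODE_WORDS)))  # ('DAF', 8)
```

The eight input bits take eight cycles to yield three characters, which shows why the output rate varies while the input rate stays at one bit per cycle.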
The bit serial decoder cannot meet the high data rate requirement of video systems because its decoding speed is only one bit/cycle. Therefore some kind of parallel processing technique is necessary for a hardware implementation of a variable length decoder. A simple technique is to partition the data to be encoded by the encoder into segments and insert a unique word between adjacent segments. A unique word means a word that cannot be generated by any combination of the code words. At the decoder, the unique word is utilized to detect the boundary between adjacent segments. Then, several variable length decoders can be utilized to decode several segments at the same time.
Of course, high parallelism can be achieved via this architecture. However, there are several significant disadvantages. First, a great deal of extra hardware is required. Second, a decoder of this type can only be utilized with an encoder which transmits the unique word. In addition, the insertion of the unique word will degrade the compression efficiency.
The bottleneck of a variable length decoder is that the length of an incoming code word is not known in advance. However, once the length of an incoming code word is known, the decoder can begin to decode the next code word.
Based on the above concept, a parallel structure for a variable length decoder was proposed by J. W. Peake in "Decompaction", IBM Technical Disclosure Bulletin, Vol. 26, No. 9, pp. 4794-4797, February, 1984. A block diagram of this variable length decoder is illustrated in FIG. 2.
The decoder 10 of FIG. 2 includes an input buffer 14 for storing incoming compressed data 12. The input buffer 14 comprises two latches L1 and L2. The number of bit positions in each of the two latches is equal to the length of the longest code word utilized in the system (e.g. P bits) so the two latches together have a total number of bit positions equal to twice the length of the longest code word in the system (e.g. 2P bits).
The decoder 10 of FIG. 2 also includes a barrel shifter 16. The barrel shifter defines a window of length P in the two latches, which window has a variable location. The two latches of the input buffer are filled with incoming compressed data bits. Initially, the barrel shifter 16 defines a window which is coextensive with the first latch of the input buffer. The P bits outputted by the barrel shifter 16 are applied to a length programmable logic array (PLA) 22 via lines 19. The bits outputted by the barrel shifter 16 are also applied to a decoder PLA 20 via lines 21. The length PLA 22 outputs via lines 23 the length of the first code word to be decoded. The decoder PLA 20 outputs the corresponding decoded character symbol.
The length of the first code word is fed back via lines 23 to the barrel shifter 16. The window defined by the barrel shifter is then shifted a number of positions equal to the length of the first code word. If the first code word contains Q bits, the window defined by the barrel shifter is shifted so that the first Q bits from the first latch L1 are eliminated from the window and the first Q bits from the second latch L2 are included in the window. Thus, after a decode operation, the number of undecoded bits in the window defined by the barrel shifter is always equal to the length of the longest possible code word in the system, which is P bits. The next code word is then decoded by applying the present bits in the barrel shifter to both the length PLA 22 and the decoder PLA 20. When enough of the compressed data bits have been decoded so that the number of decoded bits equals or exceeds P, the contents of the second latch L2 are moved into the first latch L1 and new data bits are written into the second latch L2. The barrel shifter then defines a window which includes the bits now in the first latch which have not been decoded and enough bits from the second latch so that the window is equal to P bits. Again, the next code word is decoded by applying the present bits in the barrel shifter to both the length PLA and the decoder PLA.
The speed of the parallel decoder 10 of FIG. 2 is one code word per cycle. This is much faster than the one bit per cycle of the bit serial decoder discussed above. Moreover, the parallel decoder 10 of FIG. 2 does not require use of a unique word to separate the coded bit stream into segments.
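The behavior of the decoder 10 can be approximated in software by the following Python sketch. It is a behavioral model under stated assumptions, not a description of the hardware: the latches L1 and L2 are modeled as strings of P bits, the barrel shifter as a slice offset into their concatenation, and the length PLA and decoder PLA as a single hypothetical function lookup. The code table is the partial Huffman table implied by the earlier example (A is 000, D is 01, F is 100), so P is 3, and the zero padding appended to the input is an assumption made only so the latches can always be refilled.

```python
# Partial code table read from the text; B, C, E and G are omitted.
CODE_WORDS = {"A": "000", "D": "01", "F": "100"}
P = max(len(word) for word in CODE_WORDS.values())  # longest code word


def lookup(window):
    """Stand-in for the length PLA and decoder PLA: (length, symbol)."""
    for symbol, word in CODE_WORDS.items():
        if window.startswith(word):
            return len(word), symbol
    raise ValueError("no code word matches window " + window)


def decode_parallel(bits):
    """Decode one code word per cycle using a P-bit window over L1 and L2."""
    padded = bits + "0" * (2 * P)        # filler bits (assumption)
    l1, l2 = padded[:P], padded[P:2 * P]
    next_in = 2 * P                      # next unread input position
    offset = 0                           # barrel shifter position within L1
    consumed, cycles, symbols = 0, 0, []
    while consumed < len(bits):
        window = (l1 + l2)[offset:offset + P]
        length, symbol = lookup(window)  # both results in the same cycle
        symbols.append(symbol)
        offset += length                 # length is fed back to the shifter
        consumed += length
        cycles += 1
        if offset >= P:                  # L1 exhausted: move L2 into L1
            l1, l2 = l2, padded[next_in:next_in + P]
            next_in += P
            offset -= P
    return "".join(symbols), cycles


print(decode_parallel("01000100"))  # ('DAF', 3)
```

The same eight-bit string that occupies a bit serial decoder for eight cycles is decoded here in three cycles, one per code word.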
A VLSI implementation of the parallel decoder 10 was presented at the 3rd International Workshop on HDTV, Italy, August 1989, in a paper entitled "A Parallel Variable Length Code Decoder for Advanced Television Applications".
In the decoder 10 of FIG. 2, both the length PLA 22 and the decoder PLA 20 have the same AND-plane. Therefore, the two PLAs can be merged into one. FIG. 3 shows an example of such a single PLA. FIG. 4 is a table which sets forth the variable length code which can be decoded using the PLA of FIG. 3. As shown in FIG. 4, this code has six symbols with different probabilities of occurrence. The corresponding code words are variable length depending on the probability of occurrence of the symbol.
The PLA 30 of FIG. 3 comprises an AND-plane 32, a length OR-plane 34 and a decoder OR-plane 36. The AND-plane 32 serves to detect the presence of a particular input code word on the input lines 38. The length OR-plane 34 stores a table which contains the length of each of the code words in FIG. 4. The decoder OR-plane 36 stores a table which contains the data symbol corresponding to each code word.
There is one product line 40 in the PLA 30 for each code word in the code of FIG. 4. The input bits are received on the input lines 38. The number of input lines 38 is equal to the number of bits in the longest code word in the system. There is one input line 38 for each position in the barrel shifter 16 of FIG. 2. The complement of each input bit on a line 38 is obtained by a corresponding inverter 39 so that each input bit and its complement appear on lines 41a and 41b, respectively.
The AND-plane 32 of the PLA 30 utilizes the AND-plane transistors 42 to perform a parallel matching on the input data present on the input lines 38. When a code word contained in the data on the input lines 38 is matched, the specific corresponding product line 40 is held high. This allows the OR-plane transistors 44 associated with the product line that is held high to output, via the inverters 45, the code word length on the output lines 47. Similarly, when a product line 40 is held high, the OR-plane transistors 54 associated with that product line output the decoded symbol via the inverters 56 on the output lines 57.
The bottleneck of the parallel decoder 10 of FIG. 2 is the loop formed by the length PLA 22 and the barrel shifter 16. For every cycle, only one code word can be decoded. If higher decoding speed is desired, two or more code words must be decoded in each cycle.
Accordingly, it is an object of the present invention to provide a variable length decoder which can decode more than one code word in a cycle. Specifically, it is an object of the present invention to modify the variable length decoder of FIG. 2 to decode more than one code word in a cycle without substantially increasing the hardware complexity.
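The matching performed by a merged PLA of this kind can be modeled with a short Python sketch. Because the code of FIG. 4 is not reproduced in the text, the sketch uses a hypothetical three-word code (000, 01 and 100, taken from the Huffman example above) rather than the six-symbol code of FIG. 4. Each product line is modeled as a pattern over the P input lines in which "-" marks an input position with no AND-plane transistor, and the two OR-planes are modeled as lists indexed by product line. All names are hypothetical.

```python
P = 3  # number of input lines, equal to the longest code word length

# One AND-plane pattern per product line; "-" is a don't-care position.
# Hypothetical code, not the code of FIG. 4.
AND_PLANE = ["000", "01-", "100"]
LENGTH_OR_PLANE = [3, 2, 3]         # code word length for each product line
DECODER_OR_PLANE = ["A", "D", "F"]  # decoded symbol for each product line


def product_lines(inputs):
    """Return one boolean per product line: True when the line is held high."""
    return [
        all(p in ("-", bit) for p, bit in zip(pattern, inputs))
        for pattern in AND_PLANE
    ]


def pla(inputs):
    """Read both OR-planes through the single product line that is high."""
    high = product_lines(inputs)
    if sum(high) != 1:
        raise ValueError("expected exactly one matching product line")
    line = high.index(True)
    return LENGTH_OR_PLANE[line], DECODER_OR_PLANE[line]


print(product_lines("011"))  # [False, True, False]
print(pla("011"))            # (2, 'D')
```

Sharing one list of patterns between the two output tables mirrors the observation that the length PLA and the decoder PLA have the same AND-plane.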