Variable Length Coding (VLC) is the final lossless stage of the MPEG (Moving Picture Experts Group) video compression algorithm. In video compression, VLC is employed to further compress the quantized image. As shown in FIG. 1, VLC consists of three steps: zig-zag scanning in block 101; Run Length Encoding (RLE) in block 102; and Huffman coding in block 103. At the decoder, VLC decoding is the first step in the decoding process.
FIG. 2 shows the zig-zag scanning step. The quantized coefficients are read out in zig-zag order, starting from the DC component and ending at the highest frequency component. RLE is then used to code the string of data from the zig-zag scanner. Run length encoding codes the coefficients in the quantized block into a run length, or number of occurrences, and a level, or amplitude. For example, without RLE four consecutive coefficients of value “10” are transmitted as {10,10,10,10}. With RLE the level is 10 and the run length is four, so only {4, 10} is transmitted, thereby reducing the amount of data. Typically, RLE encodes a run of symbols into two bytes: a count and a symbol. An end-of-block or last code symbol denotes the last data point.
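The scan and run-length steps described above can be sketched as follows. This is a minimal illustration rather than the layout of any particular codec: the function names `zigzag_order`, `zigzag_scan` and `run_length_encode` are hypothetical, and the run-length step follows the simple {count, symbol} form used in the example above.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                      # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block (list of rows) from DC to highest frequency."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_encode(symbols):
    """Collapse consecutive equal symbols into (count, symbol) pairs."""
    pairs = []
    for value in symbols:
        if pairs and pairs[-1][1] == value:
            pairs[-1][0] += 1               # extend the current run
        else:
            pairs.append([1, value])        # start a new run
    return [tuple(p) for p in pairs]
```

With these definitions, `run_length_encode([10, 10, 10, 10])` yields the single pair `(4, 10)` from the example above.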
At the final stage of compression, the run-level encoded data is Huffman encoded (Variable Length Coding). Symbols that occur more frequently are encoded with shorter codes than symbols that occur less frequently, so fewer coded bits are generally required. In video encoding millions of such codes are generated per second, so Huffman encoding can greatly reduce the number of coded data bits.
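The principle that frequent symbols receive shorter codes can be illustrated with a small Huffman code builder. This is only a sketch of the general technique: video codecs typically use fixed codeword tables defined by the standard rather than building a code per stream, and the function name `build_huffman_codes` is hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_codes(symbols):
    """Return {symbol: bit string} for a Huffman code over the given symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    # The unique tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)  # the two least frequent subtrees
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For the input `"aaaabbc"`, the most frequent symbol `a` receives a one-bit code while `b` and `c` receive two-bit codes, so the seven symbols take 10 bits in total.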
Conventional video codecs generally employ a two-mode VLC. The first mode is the table mode and the second mode is the escape mode. In the table mode, a look-up table maps the most commonly occurring run-level-last triplets to their variable length codewords. All other triplets are coded in the escape mode. In the escape mode, an escape code is followed by the actual run, level and last values. Advanced video codecs such as MPEG-4 and WMV9 use multi-mode VLC. Typically there are four modes: a default table mode; a level escape mode; a run escape mode; and a full escape mode.
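A minimal sketch of the two-mode scheme follows. The table contents, the escape code and the field widths are all hypothetical values chosen for illustration, not those of any standard, and the level is assumed to be non-negative to keep the example short.

```python
# Hypothetical codeword table: (run, level, last) -> bit string.
# Every table codeword starts with "1" so that it cannot be confused
# with the hypothetical escape code, which starts with "0".
VLC_TABLE = {
    (0, 1, 0): "10",
    (0, 2, 0): "110",
    (1, 1, 0): "1110",
    (0, 1, 1): "11110",
}
ESCAPE_CODE = "0000011"      # hypothetical escape prefix
RUN_BITS = 6                 # hypothetical fixed field widths
LEVEL_BITS = 8


def encode_triplet(run, level, last):
    """Table mode if the triplet is in the table, otherwise escape mode."""
    codeword = VLC_TABLE.get((run, level, last))
    if codeword is not None:
        return codeword                      # table mode
    # Escape mode: escape code, then run, level and last as fixed-width fields.
    return (ESCAPE_CODE
            + format(run, f"0{RUN_BITS}b")
            + format(level, f"0{LEVEL_BITS}b")
            + format(last, "01b"))
```

Here a common triplet such as (0, 1, 0) costs two bits, while a triplet outside the table, such as (5, 20, 0), costs the full 22 bits of the escape form.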
The default table mode is used when (level≦LMAX) and (run≦RMAX) where: LMAX is the maximum level corresponding to the given run in the VLC table; and RMAX is the maximum run at the given level in the VLC table. The codeword is obtained by indexing into the codeword table, using the level and run values.
The level escape mode is used when (LMAX<level≦2*LMAX) and (run≦RMAX). The level escape mode calculates new_level as level - LMAX.
The run escape mode is used when (RMAX<run≦(2*RMAX+1)) and (level≦LMAX). The run escape mode calculates new_run as run - (RMAX+1). In the level escape mode and the run escape mode, the corresponding modified value of level or run is used to obtain the codeword from the table. If the run-level pair does not satisfy either of the above conditions, then the full escape mode is used.
In the full escape mode the codeword is a predefined number of bits used to send the run, level and last values without any encoding.
When coding in any of the three escape modes, the generated codeword is prefixed by an escape code.
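The four-mode decision described above can be sketched as a chain of tests. The toy table is hypothetical, and the last flag and the escape prefixes are omitted for brevity. The text does not state at which run or level the second bound in each escape condition is evaluated; this sketch assumes the modified pair must itself be a table entry, which should be read as an interpretation rather than the definitive rule.

```python
# Hypothetical toy table: (run, level) -> bit string.
TABLE = {
    (0, 1): "10", (0, 2): "110", (0, 3): "1110",
    (1, 1): "11110", (1, 2): "111110",
    (2, 1): "1111110",
}


def lmax(run):
    """Maximum level present in the table for the given run (0 if none)."""
    return max((l for (r, l) in TABLE if r == run), default=0)


def rmax(level):
    """Maximum run present in the table for the given level (-1 if none)."""
    return max((r for (r, l) in TABLE if l == level), default=-1)


def select_mode(run, level):
    """Return (mode, run, level) with the values used for the table look-up."""
    if (run, level) in TABLE:
        return ("table", run, level)
    lm = lmax(run)
    if lm < level <= 2 * lm and (run, level - lm) in TABLE:
        return ("level_escape", run, level - lm)       # new_level = level - LMAX
    rm = rmax(level)
    if rm < run <= 2 * rm + 1 and (run - (rm + 1), level) in TABLE:
        return ("run_escape", run - (rm + 1), level)   # new_run = run - (RMAX+1)
    return ("full_escape", run, level)                 # sent without table coding
```

The nested, order-dependent tests in `select_mode` reflect the sequential structure of the mode decision: each test can only be evaluated after the previous one has failed.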
Current methods must first determine the appropriate mode before generating the codeword to be inserted into the bit-stream. The operation is sequential, yielding very few opportunities for parallel operation.
Other methods known in the prior art have attempted to reduce the sequential nature of the operations by extending the tables to encompass one of the escape modes. The codewords for the run-level pairs which need to be encoded in the selected escape mode are inserted into the table. Hence there is no need for explicit coding of that escape mode, because the codewords will be directly picked from the table.
This type of algorithm is highly conditional and has a multiple-level nested if-else structure. Such algorithms are inefficient on Very Long Instruction Word (VLIW) architectures and cannot be software pipelined. VLIW architectures perform best on highly parallel code without conditionals. In VLC the bit stream cannot be written asynchronously, which imposes a large loop-carried dependency bound. This wastes the power of VLIW architectures such as the Texas Instruments C6400 digital signal processor family, which can perform up to 8 operations per cycle.
Conditional execution statements present an additional difficulty. In VLIW architectures, conditional jumps are avoided in favor of conditional instructions. These conditional instructions are executed or not executed based on the contents of a special conditional register. These registers are generally limited in number. The Texas Instruments C6400 family of digital signal processors has six such predicate registers. In VLC a large number of conditions must be evaluated, and all further processing depends on the results of these conditions. The process therefore ties up the conditional registers for an excessive number of cycles. This causes a “register live too long” problem, which further degrades the ability to schedule and optimize the code.
The third problem is memory load delay. In a typical VLC implementation, the run-level combination is loaded and then used to load the “last level at run” and the “last run at level” values for that combination. The domain of the given run-level pair is then determined based on these values. The variable length codeword is then loaded from another table. Typically each load has a delay of 4 cycles. Since these loads are sequential, the length of the operation is greatly increased. Hiding this latency requires a larger number of iterations executing in parallel, which may not be possible because of the limited number of CPU registers.