Many types of variable length codes have been designed to efficiently represent compressed video prediction residual coefficient values. In most compressed video bitstreams, many of the coefficients in a block are zero-valued. Most compression methods therefore use a syntax that separately represents the non-zero coefficient values and a count of zero coefficients (typically, but not always, the number of zeros that precede the non-zero coefficient in coefficient scan order).
One type of compression uses separate one-dimensional variable length codes (1D-VLCs) that can be individually tuned to represent the expected statistics of coefficient values and/or zero-run-lengths.
Another type of compression uses two-dimensional variable length codes (2D-VLCs) that can jointly represent a non-zero coefficient value and a count for a number of zero coefficients. 2D-VLCs can be an efficient method to exploit the correlation between adjacent zero-run-lengths and coefficient values. 2D-VLCs may also make use of a specific symbol to signal an ‘end-of-block’ (i.e., when the last VLC for a block of coefficients has been transmitted) in order to specify all of the coefficient data in a block.
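The joint (run, level) representation described above can be illustrated with a minimal sketch. The function name run_level_symbols, the EOB marker, and the sample block are hypothetical and chosen only for illustration; the sketch produces the symbols that a 2D-VLC table would then map to codewords, and does not model any particular standard's tables.

```python
EOB = "EOB"  # hypothetical end-of-block marker, not taken from any standard


def run_level_symbols(coeffs):
    """Convert scan-ordered coefficients into (run, level) pairs plus EOB.

    'run' is the number of zeros preceding a non-zero coefficient in scan
    order and 'level' is the value of that coefficient.
    """
    symbols = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    # Trailing zeros are not coded; the end-of-block symbol implies them.
    symbols.append(EOB)
    return symbols


# A 4x4 block already arranged in scan order (illustrative values only).
block = [7, 0, 0, -2, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
print(run_level_symbols(block))
```

Each tuple would be looked up in a single 2D table, which is why such tables grow with the product of the run range and the level range.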
Another type of compression uses three-dimensional variable length codes (3D-VLCs) that can jointly encode, with a single code, three pieces of information: (i) a non-zero coefficient value, (ii) a count value for a number of zero coefficients, and (iii) an end-of-block indicator.
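A comparable sketch for the three-part symbol is given below. The name last_run_level_symbols and the (last, run, level) tuple layout are assumptions made for illustration; the point is only that the end-of-block indication is folded into the final symbol rather than sent as a separate code.

```python
def last_run_level_symbols(coeffs):
    """Convert scan-ordered coefficients into (last, run, level) triples.

    'last' is True only for the final non-zero coefficient of the block,
    so no separate end-of-block symbol is needed. An all-zero block yields
    no symbols and would have to be signalled by other means (an
    assumption of this sketch).
    """
    nonzero_positions = [i for i, c in enumerate(coeffs) if c != 0]
    symbols = []
    previous = -1
    for n, pos in enumerate(nonzero_positions):
        run = pos - previous - 1            # zeros since the previous non-zero
        last = n == len(nonzero_positions) - 1
        symbols.append((last, run, coeffs[pos]))
        previous = pos
    return symbols


# Illustrative 4x4 block in scan order.
block = [7, 0, 0, -2, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
print(last_run_level_symbols(block))
```

Compared with the 2D case, one fewer symbol is needed per block, at the cost of a table indexed by three quantities instead of two.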
The difficulty with 2D and 3D VLCs is that the tables/codes are often much larger and/or less regular than those of 1D VLCs. In order to use non-fixed (i.e., adaptive) VLC coding, multiple different VLC tables/codes are needed. If the adaptivity is at the level of a picture or a slice of macroblocks, then switching between tables may be practical. However, if the adaptivity is at the coefficient/pixel level, where switching between tables is based on previously encoded coefficient values and/or run-lengths (i.e., the ‘context’ information for the adaptive codes), then the size/irregularity of the tables/codes being switched may make such switching impractical due to high complexity/cost.
Furthermore, adaptive codes are often found to be more efficient than fixed codes due to their ability to adapt to the underlying statistics of the source. However, implementing 2D or 3D VLCs that are also coefficient/run-level adaptive has the disadvantage that the number and size of tables/codes often become unwieldy. In order to efficiently implement an encoder/decoder (CODEC), conventional approaches choose between obtaining the benefits of 2D/3D codes that explicitly exploit the correlation between zero-run-lengths and coefficient values, and the benefits of coefficient/run-level adaptive codes. The benefits of coefficient-level adaptive codes have been found to be significant enough that the recent video coding standard H.264/MPEG4-AVC uses adaptive 1D coefficient-level codes for VLC-based residual coefficient coding. H.264/MPEG4-AVC refers to such coding as context-adaptive variable length coding (CAVLC).
A typical VLC CODEC unit will contain an interface to a block of residual coefficients. For example, a VLC encoder unit will take a block of residual coefficients and output to a bitstream the syntax that represents them. Conversely, a VLC decoder unit will input (i.e., parse) sufficient syntax from a bitstream to output a block of residual coefficients. A hierarchy of interfaces may exist. A typical upwards hierarchy would include interfaces for parsing/encoding (i) a block, (ii) an entire macroblock (several blocks), (iii) a slice (several macroblocks), and (iv) a picture (several slices). Since the bitstream syntax is hierarchical, it is natural to design a VLC CODEC having a similar hierarchy.
A typical downward hierarchy (from the block level) could be (i) a block, (ii) individual coefficient values, and (iii) the individual syntax elements that compose a block. A significant problem is choosing an interface for the lowest level of the hierarchy.
Typically video coding standards demand that compliant devices be capable of processing a specified number of macroblocks per second (or other appropriate time interval). Each unit of a parallel or pipelined device (e.g., the VLC CODEC unit) must be capable of also processing a specified number of macroblocks per second. Synchronous devices have specified clock rates, so this may alternatively be specified as a requirement that the VLC CODEC unit process a certain number of macroblocks in a certain number of cycles.
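The conversion from a macroblocks-per-second requirement to a per-macroblock cycle budget is simple arithmetic, sketched below. The 200 MHz clock rate and the 1920×1088, 30 pictures-per-second format are hypothetical figures chosen for illustration and are not taken from any standard's level limits.

```python
def cycles_per_macroblock(clock_hz, macroblocks_per_second):
    """Whole clock cycles available to a unit for each macroblock."""
    return clock_hz // macroblocks_per_second


# Illustrative workload: 1920x1088 pictures at 30 pictures per second,
# with 16x16 macroblocks (all values are assumptions of this sketch).
mbs_per_picture = (1920 // 16) * (1088 // 16)
mbs_per_second = mbs_per_picture * 30

# Hypothetical 200 MHz synchronous device.
budget = cycles_per_macroblock(200_000_000, mbs_per_second)
print(mbs_per_second, budget)
```

Every unit in the pipeline, including the VLC CODEC unit, would have to fit its worst-case macroblock within this budget.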
High-performance VLC CODEC units often operate within a pipelined architecture where each unit processes a small integer number of macroblocks in a specified number of cycles. For example, the limit on the number of motion vectors per macroblock pair in the H.264/MPEG4-AVC standard is specifically intended to limit the complexity/cost of a pipelined memory architecture designed for macroblock pairs.
When possible, it is desirable to extend the hierarchy of design to a lower level. For example, specific units in a pipelined design should be able to process a single macroblock in a specified number of cycles, a single block in a specified number of cycles, or individual syntax elements in a specified number of cycles.
With a 2D or 3D-VLC CODEC unit, if each VLC code is parsed/encoded in a fixed number (N) of cycles (e.g., N=1 cycle per code) by a device, then a small upper limit is naturally imposed on the maximum number of cycles needed by the unit to process the coefficient data contained in the next level in the hierarchy (i.e., a 4×4 or 8×8 block). Typically a 4×4 block of coefficients would have an upper limit (imposed by the syntax) of 16 3D-VLC codes or 17 2D-VLC codes (16 coefficients plus an end-of-block symbol) per block, i.e., 16*N or 17*N cycles for processing the coefficients. In the same manner, a 16×16 macroblock would then have an upper limit of not significantly more than 256*N cycles for parsing/encoding just the coefficient data.
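The bounds above follow directly from the block dimensions, as the following sketch shows. The function names and the uses_eob_symbol flag are hypothetical; the sketch counts only the codes for a 16×16 array of coefficients and ignores any other macroblock data.

```python
def max_cycles_per_block(block_size, cycles_per_code, uses_eob_symbol):
    """Worst-case cycles to parse one block at a fixed cost per VLC code.

    A 2D-VLC scheme may need one extra end-of-block code; a 3D-VLC scheme
    folds that indication into the last coefficient's code.
    """
    max_codes = block_size * block_size + (1 if uses_eob_symbol else 0)
    return max_codes * cycles_per_code


def max_cycles_per_macroblock(mb_size, block_size, cycles_per_code,
                              uses_eob_symbol):
    """Worst-case cycles for all blocks tiling one macroblock."""
    blocks = (mb_size // block_size) ** 2
    return blocks * max_cycles_per_block(block_size, cycles_per_code,
                                         uses_eob_symbol)


N = 1  # cycles per code
print(max_cycles_per_block(4, N, uses_eob_symbol=False))           # 3D-VLC
print(max_cycles_per_block(4, N, uses_eob_symbol=True))            # 2D-VLC
print(max_cycles_per_macroblock(16, 4, N, uses_eob_symbol=False))
print(max_cycles_per_macroblock(16, 4, N, uses_eob_symbol=True))
```

With N=1 the macroblock bound is 256 cycles for the 3D case and 272 for the 2D case, which is the sense in which the limit is "not significantly more than" 256*N.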
In many implementations of a VLC encoding or decoding module, each syntax element (a bitstream code that represents a quantity such as a zero-run-length or a coefficient value) will be parsed in a single cycle. The problem with such a conventional solution is that, while existing widely-deployed standards (such as MPEG-2/H.262) do not typically use significantly more than a single VLC per coefficient to represent a block of residual coefficients, H.264/MPEG4-AVC uses multiple codes per coefficient and, for each individual block, separates the VLCs representing the coefficient values from the VLCs representing the zero-run-lengths in the bitstream.
Conventional solutions use at most N cycles per VLC code for coefficient data, and would typically parse one VLC and/or one coefficient per cycle. With H.264/MPEG4-AVC CAVLC, all of the VLCs/syntax-elements representing non-zero coefficient values for a 4×4 block precede, in the bitstream, all of the VLCs/syntax-elements representing the run-length encoded zero-valued coefficient values (and the positions in scan order of all of the coefficients).
In this way, with H.264/MPEG4-AVC, all of the non-zero coefficient values are parsed/encoded before any of the coefficient positioning information can be parsed/encoded. This is in contrast to previous/legacy standards, for which a non-zero coefficient value and its position (e.g., the preceding zero-coefficient run-length) are coded immediately adjacent to each other in the bitstream.
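The difference between the two symbol orderings can be sketched as follows. This is a deliberately simplified model: it does not reproduce the actual CAVLC syntax elements or their exact ordering, and the function names and the ("run", n) / ("level", v) tuples are hypothetical. It shows only that, when values and runs are separated, a parser cannot place the first coefficient until every value has been read.

```python
def interleaved_order(coeffs):
    """Legacy-style order: each zero-run is adjacent to its coefficient value."""
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            out.append(("run", run))
            out.append(("level", c))
            run = 0
    return out


def separated_order(coeffs):
    """Simplified separated order: all values first, then all zero-runs."""
    symbols = interleaved_order(coeffs)
    levels = [s for s in symbols if s[0] == "level"]
    runs = [s for s in symbols if s[0] == "run"]
    return levels + runs


def symbols_until_first_placement(stream):
    """Symbols parsed before one value AND one run are both known."""
    seen = set()
    for count, (kind, _) in enumerate(stream, start=1):
        seen.add(kind)
        if seen == {"run", "level"}:
            return count
    return None  # no coefficient can be placed (e.g., all-zero block)


block = [7, 0, 0, -2, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]
print(symbols_until_first_placement(interleaved_order(block)))
print(symbols_until_first_placement(separated_order(block)))
```

For the sample block, the interleaved stream allows the first coefficient to be placed after two symbols, whereas the separated stream requires all four values plus the first run.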
For this reason, a conventional interface would need a fixed number of cycles (e.g., 1) for parsing each coefficient value and a second fixed number of cycles for parsing each zero-run-length.
The disadvantage of such a solution is that even if the interface runs in the smallest possible number of cycles (e.g., 1 per syntax element), the maximum number of cycles to parse a block becomes twice that needed for traditional/legacy bitstreams. In those bitstreams, the coefficient values and positions were either jointly encoded or at least adjacent (if encoded with separate 1D VLCs), such that both the value and the position of a coefficient could be processed at the same time.
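The factor of two can be checked with a short calculation. The function below is a hypothetical sketch that follows the simplified bound used in the text: it assumes one value element and one run element per coefficient, and that a joint or adjacent pair is processed together at the cost of the slower of the two elements.

```python
def worst_case_cycles(num_coeffs, cycles_per_value, cycles_per_run,
                      processed_together):
    """Upper bound on cycles to parse the coefficient data of one block.

    processed_together=True models legacy bitstreams, where a value and
    its position are handled in the same step; False models an interface
    that spends separate cycles on each value and each zero-run.
    """
    if processed_together:
        return num_coeffs * max(cycles_per_value, cycles_per_run)
    return num_coeffs * (cycles_per_value + cycles_per_run)


legacy = worst_case_cycles(16, 1, 1, processed_together=True)
separate = worst_case_cycles(16, 1, 1, processed_together=False)
print(legacy, separate, separate // legacy)
```

For a 4×4 block at one cycle per element, the bound rises from 16 to 32 cycles.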
It would be desirable to implement a method and/or apparatus for parsing compressed video bitstreams that does not increase processing overhead compared with a 1D VLC.