There has recently been a great deal of interest in supporting a wide array of standards for video encoding and decoding in consumer products. Digital video standards of commercial interest include: the International Standards Organization (ISO) MPEG-2 and MPEG-4 standards; the Microsoft® VC-1 draft standard; the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 and H.264 standards; the On2 VP6 standard; and the digital videotape (DV) standard.
It is likely that multi-standard video encoders and decoders will become more prevalent in coming years in a wide array of products. Such products may include: set-top boxes for receiving video over cable, digital subscriber line (DSL), satellite link and/or the Internet; digital TVs; personal video recorders; handheld devices (including personal digital assistants, dedicated personal video players and mobile phones); and wireless devices. High performance processors will be required for running applications on these widely varied products.
Fixed function variable length coder/decoder units are available, where “fixed function” refers to the fact that such units are dedicated to a particular standard, for example, the H.264 standard. Fixed function variable length coder/decoder units may also be found in typical MPEG-2 video encoder and video decoder chips including the Xilleon™ 200 family of chips from ATI Technologies Inc. of Markham, Ontario, Canada. Unfortunately, fixed function variable length coder/decoder units lack the flexibility to work across a variety of video compression methods.
It is also known that variable length coding/decoding may also be performed by a general purpose reduced instruction set computer (RISC) processor or a complex instruction set computer (CISC) processor, such as those found in personal computers. However, such high-performance general purpose processors are hard to characterize in terms of worst case performance, due to the dependence of general purpose processors on the statistical behavior of instruction and data caches. Additionally, general purpose processors are expensive to implement, as they require large die area. General purpose processors are also relatively inefficient at handling variable-length data, since they are implicitly designed for processing byte-aligned data.
Many of the digital video standards transform eight pixel by eight pixel arrays that are representative of a portion of a frame in a digital video sequence. The result of this transformation may be called a block of coefficients. The block of coefficients may be encoded using variable length codes as a form of efficient compression. A macroblock may be defined to include four eight-by-eight luminance blocks of coefficients and two eight-by-eight chrominance blocks of coefficients.
The output of a video encoder in some standards is known as an Elementary Stream (ES). The lowest-level entity in the ES is a encoded block of coefficients. Each encoded block is terminated by an end-of-block code. A macroblock may be formed by concatenating the four luminance blocks and the two chrominance blocks. The six encoded blocks may be preceded by a macroblock header that contains control information belonging to the macroblock: spatial address, motion vectors, prediction modes, field/frame DCT mode, quantizer step size. The result is a coded macroblock.
Variable length codes often arise out of attempts to compress an amount of data to be transmitted. One type of variable length code is the “run-level” type of code. Run-level codes recognize situations in which a sequence of values are to be transmitted, where many of the values are null (0). The code replaces a long series of null values with an indication of the value (level) that follows the series and an indication of the length (run) of the series. Through the use of such a code, a series of 28 0-valued bits that precede a value of 17 may be reduced to a indication of the 17 (a level value) and five bits (a run value) indicating that there is a series of 28 null values ahead of the 17.
Compression codes also include so-called entropy encoding schemes wherein the most common symbols are mapped to the shortest code strings. An example of an entropy encoding scheme is Huffman coding, which is used in the MPEG-2 standard. In the MPEG-2 standard, for example, blocks of coefficients are first run-level encoded and then each run-level combination (symbol) is Huffman encoded.
Run-level encoding, in particular, requires that a block of coefficients be written into locations in a coefficient buffer. A run-level encoder may then read the coefficient buffer, location by location, to determine run-level combinations representative of the block of coefficients. In the reverse, decoding, case, a run-level decoder receives run-level combinations and uses the run-level combinations to formulate a block of coefficients in a coefficient buffer. As there are typically many null entries in a block of coefficients, it may be considered that the writing, by the run-level decoder, of null entries to the coefficient buffer is inefficient.
Huffman encoding, in particular, requires that a code be determined to correspond to each received run-level combination. For the sake of efficiency, a single code may map to more than one run-level combination. For instance, a maximum run value may be defined and a given run-level combination may include a given run value and a given level value. When the given run value exceeds the maximum run value, the maximum run value may be subtracted from the given run value to provide an intermediate run value. The code generated then corresponds to a combination of the intermediate run value and the given level value. To distinguish this code from the same code generated when the received run value is equivalent to the intermediate run value just determined, the former code may be preceded by an “escape code”. In particular, the escape code may identify the following code to be a “delta-run” code. A “delta-level” code may be similarly determined and identified.
Determining an escape code to generate based on received run and level values can be a complex and processor-time consuming exercise. Existing encoders are known to either implement a hardwired approach, which is inherently inflexible, or implement a programmed approach on a general-purpose RISC processor or a purpose-specific processor. Unfortunately, such existing approaches do not specifically accelerate the determining and handling of escape codes.
Clearly, then, there is a need for methods and apparatus for efficiently encoding and decoding data that manipulates variable length coded data efficiently at very high processing rates.