The present invention relates to compression coding of video signals, and more particularly to a method for improving the efficiency of predicting the number of bits required to encode a matrix of transform coefficients for the International Telecommunication Union—Telecommunication Standardization Sector (ITU-T) H.264 video compression standard (H.264). As in previous video coding standards, the Video Coding Layer (VCL) of H.264 uses a hybrid of temporal and spatial prediction as well as transform coding to compress video sequences.
In the coding process, a picture is divided into slices, each slice representing a portion of the picture that can be decoded independently of the rest of the picture. The slices are divided into macroblocks. A macroblock consisting of a 16×16 block of luma samples and two corresponding 8×8 blocks of chroma samples is used as the basic H.264 processing unit. An H.264 compliant encoder will calculate a predicted value for each sample in a macroblock. The predicted values are then subtracted from the actual values to form prediction residuals. Once the prediction residuals have been calculated, they are transformed, generating 4×4 blocks of transform coefficients that are then scaled and quantized. The quantized transform coefficients are entropy encoded and transmitted. Two methods for encoding the quantized transform coefficients are supported by H.264: Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). CAVLC maps syntax elements to various Variable Length Coding (VLC) tables using information from already transmitted syntax elements. This mapping improves the entropy encoding performance compared to using a single VLC table. The encoder can use a variety of methods, called coding modes, to encode a macroblock.
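The residual and transform steps described above can be illustrated with a minimal sketch. It assumes the commonly documented 4×4 integer core transform matrix of H.264 and omits the scaling and quantization stages entirely; the function names are hypothetical and chosen only for illustration.

```python
# Core 4x4 forward transform matrix commonly given for H.264.
# Scaling and quantization are deliberately left out of this sketch.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def residual_block(actual, predicted):
    """Prediction residual: actual sample minus predicted sample."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


def core_transform(residual):
    """Unscaled transform coefficients Y = CF * X * CF^T for one 4x4 block."""
    return matmul4(matmul4(CF, residual), transpose4(CF))
```

For a block whose residual is constant, all of the energy lands in the top-left coefficient, which is consistent with the clustering of non-zero coefficients toward the upper left of the matrix.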
The selection of coding modes available to the encoder for a given macroblock depends on the type of slice the macroblock belongs to. In the main profile, three slice types are supported by the H.264 standard: intra-coded (I) slices, predictive-coded (P) slices, and bi-predictive (B) slices. The specific mode selected from the modes available for coding a macroblock from a particular type of slice depends on the image content of that macroblock. There are a total of 13 intra-coding modes available to all macroblocks regardless of slice type. Additionally, 5 inter-coding modes are available for P slices (plus skipped) and 23 inter-coding modes are available for B slices (plus skipped). P-slice and B-slice macroblocks are treated similarly in terms of their division into sub-blocks and the coding modes used on the sub-blocks. However, unlike P-slice macroblocks, macroblocks from B slices can use two distinct reference picture buffers, called the first and second reference picture buffers, respectively.
Almost any encoder for the H.264 standard performs some variation of a rate-distortion (RD) optimization algorithm to determine the best coding mode for each macroblock. This process is known as mode decision. The RD optimization algorithm is applied on a macroblock-by-macroblock basis and attempts to find the best trade-off between the number of bits needed to encode a given macroblock and the magnitude of the prediction residuals of that macroblock. The trade-off function is represented by the equation:
J(MB, mode, Q) = SSD(MB) + λ(Q) * R(MB, mode, Q)  (equation 1)
where MB represents the given macroblock, including the original picture elements (pels), reconstructed pels and slice type; SSD is a difference term, typically a sum of squared differences; λ(Q) is a multiplier dependent on the quantizer Q and the slice type; and R is the number of bits needed to encode the macroblock. For each MB to be encoded, a transform needs to be calculated for each coding mode available for the MB's slice type. This, as well as the calculation of the SSD terms, must be done before the calculation of R(MB, mode, Q). The best trade-off is obtained by varying the mode for a given Q and MB in order to minimize the function J. In any practical implementation, Q is fixed for each macroblock.
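The minimization of equation 1 over the available modes can be sketched as follows. This is only an illustration of the selection step: the mode labels, SSD values, bit counts and λ values are made-up placeholders, and the helper names are hypothetical. A real encoder would obtain SSD and R by actually transforming, quantizing and entropy coding each candidate.

```python
def rd_cost(ssd, rate_bits, lam):
    """Equation 1: J = SSD + lambda * R for one candidate mode."""
    return ssd + lam * rate_bits


def choose_mode(candidates, lam):
    """Return (best_mode, best_cost) for a fixed Q.

    candidates maps a mode label to a (ssd, rate_bits) pair that is
    assumed to have been computed already for the current macroblock.
    """
    best_mode = min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
    return best_mode, rd_cost(*candidates[best_mode], lam)


# Illustrative numbers only, not measured data.
example = {
    "intra16x16": (1200, 40),
    "intra4x4": (800, 95),
    "inter16x16": (900, 60),
}
```

With these placeholder numbers, a larger λ favors the mode that spends fewer bits, while a smaller λ favors the mode with the lower SSD. Because R must be known for every candidate before the minimum can be taken, the cost of computing R dominates this loop.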
Due to the entropic nature of the encoding, a large number of transform coefficients tend to equal positive one, negative one or zero. Non-zero coefficients tend to be found in the upper left hand portion of a given matrix and the coefficients in the bottom right hand corner of the matrix tend to equal zero, as shown in FIG. 1 for a generic transform coefficient matrix. That is why the H.264 standard specifies two different “zig-zag scans” (one for frame encoding and a different one for field encoding) to re-order the matrix coefficients in the sequence c0-c15 such that the coefficients more likely to be non-zero are ordered first, followed by the coefficients more likely to equal zero. This prevents the encoder from having to take up bandwidth transmitting a string of essentially null information. Much of the compute time of an H.264 video encoder is spent minimizing J(MB, mode, Q). This necessitates calculating R(MB, mode, Q) for all applicable coding modes for every single macroblock. Calculation of R(MB, mode, Q) is a very computationally intensive process. Therefore, in order to have a good real-time encoder, R(MB, mode, Q) needs to be calculated in as efficient a manner as possible.
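The re-ordering performed by the zig-zag scan can be sketched as below. The index table is the commonly documented 4×4 frame scan order, expressed as raster indices (row * 4 + column); it is stated here as an assumption and should be checked against the standard. The field scan is not shown, and the function names are hypothetical.

```python
# Assumed 4x4 frame zig-zag order as raster indices (row * 4 + column).
ZIGZAG_FRAME_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Re-order a 4x4 coefficient block into the sequence c0..c15."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_FRAME_4X4]


def last_nonzero_index(scanned):
    """Index of the last non-zero coefficient, or -1 if all are zero."""
    for i in range(len(scanned) - 1, -1, -1):
        if scanned[i] != 0:
            return i
    return -1


# A block with the typical shape: small values clustered at the upper left.
block = [
    [ 7, -1, 0, 0],
    [ 1,  1, 0, 0],
    [-1,  0, 0, 0],
    [ 0,  0, 0, 0],
]
scanned = zigzag_scan(block)
```

For this block the five non-zero coefficients occupy c0 through c4 and everything after them is zero, so the trailing run of zeros does not have to be coded coefficient by coefficient.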
What is needed is a method for quickly determining R(MB, mode, Q) that works on most H.264 transform matrices.