Digital television and DVD-video have been made possible by the standardization of video compression technology. A recent standard, ITU-T H.264 (hereinafter H.264), is enabling a new generation of applications. The H.264 standard does not explicitly define a codec. Rather, the standard defines the syntax of an encoded video bitstream together with a method of decoding that bitstream.
As part of the process to create an encoded video bitstream that can be decoded according to the method set forth in the H.264 standard, an encoder performs a transform and quantization. More specifically, the encoder divides data into macroblocks, and each macroblock is transformed, quantized and coded. Previous standards used the 8×8 Discrete Cosine Transform (DCT) as the basic transform that operates on floating-point coefficients. In contrast, a draft version of H.264 (T. Wiegand, ed., “Editor's Proposed Draft Text Modifications for Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), Draft 7,” section 12.4.3) uses a DCT-like 4×4 integer transform but can apply the transform on a number of different block sizes (4×4, 4×8, 8×4, and 8×8). Coefficients from the transform stage undergo quantization. After quantization, the quantized coefficients are entropy coded.
The decoding method of H.264 is the reverse of the encoding process described above. More specifically, encoded data undergoes entropy decoding, followed by the application of inverse quantization and an inverse transform. As set forth in an early draft of the H.264 standard (reference to JFCD), during decoding the quantized coefficients are first arranged into a 2-dimensional array (of size either 4×4, 4×8, 8×4 or 8×8), and inverse quantization is then applied. After performing inverse quantization, inverse transforms are applied to the coefficients, typically first in the horizontal direction and then in the vertical direction. Finally, the resulting values are scaled. In the case of 4×8, 8×4 or 8×8 block sizes, an additional scaling operation is performed between application of the horizontal and the vertical inverse transforms.
Irrespective of the block size (e.g., 4×4 block, 4×8 or 8×4 block, and 8×8 block), the same quantization parameter (QP) is used to indicate how finely or coarsely the quantization was performed in the encoder. QP is usually an integer value between 0 and 51. In one prior art implementation, the QP for 4×8, 8×4 and 8×8 blocks is restricted to values 12 or larger, for example, as described in T. Wiegand, Ed., “Joint Final Committee Draft (JFCD) of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)” (hereinafter “Wiegand”), hereby incorporated by reference herein. As discussed therein, the inverse quantization was applied using an array, referred to as array V, that was different for each of three different block size types, namely 4×4 block, 4×8 or 8×4 block, and 8×8 block.
For a 4×4 block, array V is shown at FIG. 1A. For each quantized coefficient c_ij, the coefficient w_ij is obtained as a result of applying the array V to a block of quantized coefficients, in standard C-language notation, as:
w_ij = (c_ij * R_ij(QP % 6)) << (QP / 6)
where R_ij(m) is equal to V_m0 if ij is in {00, 02, 20, 22}, equal to V_m1 if ij is in {11, 13, 31, 33} and equal to V_m2 otherwise, and where V_mn is the entry in the m-th row and n-th column of the array V.
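The 4×4 rule above can be sketched in C as follows. This is only an illustrative sketch: the function names `v_column_4x4` and `dequant_4x4` are hypothetical, and the array V is taken as a caller-supplied parameter because its values come from FIG. 1A. The sketch multiplies by a power of two instead of left-shifting the product, since left-shifting a negative value is undefined in C.

```c
#include <assert.h>

/* Pick the column n of V used at position (i, j) of a 4x4 block:
 * column 0 for ij in {00, 02, 20, 22}, column 1 for ij in {11, 13, 31, 33},
 * column 2 otherwise. (Hypothetical helper name.) */
static int v_column_4x4(int i, int j)
{
    if (i % 2 == 0 && j % 2 == 0) return 0;
    if (i % 2 == 1 && j % 2 == 1) return 1;
    return 2;
}

/* Inverse quantization of a 4x4 block, following
 *   w_ij = (c_ij * R_ij(QP % 6)) << (QP / 6).
 * v is the 6x3 array V; the caller supplies its values. */
static void dequant_4x4(const int c[4][4], int w[4][4],
                        const int v[6][3], int qp)
{
    assert(qp >= 0 && qp <= 51);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int r = v[qp % 6][v_column_4x4(i, j)];
            /* Multiply by 2^(QP/6) rather than shifting, so that a
             * negative coefficient never becomes the operand of <<. */
            w[i][j] = c[i][j] * r * (1 << (qp / 6));
        }
    }
}
```

QP % 6 selects the row of V and QP / 6 sets the power-of-two gain, so increasing QP by 6 doubles every reconstructed coefficient.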
For an 8×4 or a 4×8 block, the array V is shown in FIG. 1B. For each quantized coefficient c_ij, the coefficient w_ij is obtained as a result of applying the array V to a block of quantized coefficients as:
w_ij = (c_ij * R_ij(QP % 6)) << ((QP / 6) - 2)
where R_ij(m) is equal to V_m0 if i (for 4×8 blocks) or j (for 8×4 blocks) is in {0, 2}, and is equal to V_m1 otherwise.
For an 8×8 block, the array V is shown in FIG. 1C. For each quantized coefficient c_ij, the coefficient w_ij is obtained as a result of applying the array V to a block of quantized coefficients as:
w_ij = (c_ij * R_ij(QP % 6)) << ((QP / 6) - 2)
where R_ij(m) is equal to V_m.
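A minimal C sketch of the 8×8 case is given below. The function name `dequant_coeff_8x8` is hypothetical, and the six-entry array V is passed in by the caller because its values come from FIG. 1C. The sketch makes one consequence of the formula visible: the shift count (QP / 6) - 2 is negative for QP below 12, which is presumably why QP is restricted to 12 or larger for these block sizes.

```c
#include <assert.h>

/* Inverse quantization of one coefficient of an 8x8 block, following
 *   w_ij = (c_ij * V[QP % 6]) << ((QP / 6) - 2).
 * v is the six-entry array V; the caller supplies its values. */
static int dequant_coeff_8x8(int c, const int v[6], int qp)
{
    /* (QP / 6) - 2 must be a valid, non-negative shift count. */
    assert(qp >= 12 && qp <= 51);
    /* Multiply by a power of two instead of shifting, because
     * left-shifting a negative value is undefined in C. */
    return c * v[qp % 6] * (1 << (qp / 6 - 2));
}
```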
After the inverse quantization is performed, the inverse transformation is performed on the coefficients. As part of one implementation of the H.264 draft standard, applying an inverse transformation to the coefficients includes applying a horizontal transform, performing intermediate scaling, applying a vertical transform, and performing final scaling. Typically, the inverse transforms that are used are separable transforms, and thus typically two 1-dimensional transforms of sizes 4 and 8 respectively have been used.
Basis vectors define the inverse transformation. The basis vectors of one prior art transform of size 4 are defined by the matrix M4, shown in FIG. 2A, while the basis vectors of one prior art transform of size 8 may be defined by the matrix M8 as shown in FIG. 2B.
A horizontal transform in one prior art implementation is applied by performing a matrix multiplication between an array W of coefficients and the transpose of the corresponding transform matrix that includes the basis vectors (i.e. the transform matrix M4 for 4×4 and 4×8 blocks, and the transform matrix M8 for 8×4 and 8×8 blocks). The array Z′ containing the result of the horizontal transform is determined as:
Z′ = W * transpose(M4), for 4×8 and 4×4 blocks, and
Z′ = W * transpose(M8), for 8×8 and 8×4 blocks,
where “*” represents a matrix multiplication.
Intermediate scaling is then carried out by scaling the array Z′ resulting from the horizontal transform according to:
Z_ij = sign(Z′_ij) * ((abs(Z′_ij) + (1 << (B - 1))) >> B),
where Z_ij is a coefficient of the scaled array Z, Z′_ij is the corresponding coefficient of the array Z′, and B is 0 for 4×4 blocks, 2 for 4×8 and 8×4 blocks, and 7 for 8×8 blocks.
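The intermediate scaling rule can be sketched in C as follows. The function name `intermediate_scale` is hypothetical. The sketch treats B = 0 as a pass-through, which is an assumption: taken literally, the rounding term 1 << (B - 1) would require a negative shift count for 4×4 blocks, and that is undefined in C.

```c
#include <assert.h>
#include <stdlib.h>

/* Sign-symmetric rounding right shift of one coefficient:
 *   Z = sign(Z') * ((abs(Z') + (1 << (B - 1))) >> B).
 * b is 0 for 4x4 blocks, 2 for 4x8 and 8x4 blocks, 7 for 8x8 blocks. */
static int intermediate_scale(int zp, int b)
{
    assert(b >= 0 && b < 31);
    if (b == 0) {
        return zp;  /* assumed: no scaling when B is 0 */
    }
    int sign = (zp < 0) ? -1 : 1;
    /* The shift acts on a non-negative magnitude, so it is well defined. */
    return sign * ((abs(zp) + (1 << (b - 1))) >> b);
}
```

Because the shift is applied to the magnitude and the sign is restored afterwards, positive and negative inputs round identically in absolute value.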
Next, the vertical transform is applied. Given the array Z, the vertical inverse transform is applied by performing a matrix multiplication between the array Z and the corresponding matrix that includes the basis vectors (i.e. M4 for 8×4 and 4×4 blocks, and M8 for 8×8 and 4×8 blocks). The array X′ containing the result of the vertical transform is determined as:
X′ = M4 * Z, for 8×4 and 4×4 blocks, and
X′ = M8 * Z, for 8×8 and 4×8 blocks.
After the vertical transform is applied, the final scaling is accomplished by scaling the results of the vertical transform according to:
X_ij = (X′_ij + 32) >> 6.
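The final scaling amounts to a rounded division by 64. A small C sketch follows, with hypothetical function names `final_scale` and `final_scale_block`. Right-shifting a negative int is implementation-defined in C, so the sketch handles negative sums explicitly and returns the floor of (X′ + 32) / 64, which matches what an arithmetic shift would give.

```c
#include <assert.h>

/* Final scaling of one value: X = (X' + 32) >> 6. */
static int final_scale(int xp)
{
    int t = xp + 32;
    /* For negative t, compute floor(t / 64) without shifting a
     * negative value. */
    return (t >= 0) ? (t >> 6) : -((-t + 63) >> 6);
}

/* Apply the final scaling in place to n values of a block. */
static void final_scale_block(int *x, int n)
{
    assert(x && n >= 0);
    for (int k = 0; k < n; k++) {
        x[k] = final_scale(x[k]);
    }
}
```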
The matrix multiplications using M4 or transpose(M4) are typically implemented as follows. Given an input vector w[0..3], the output vector x[0..3] is obtained by:
z[0] = w[0] + w[2]
z[1] = w[0] - w[2]
z[2] = (w[1] >> 1) - w[3]
z[3] = w[1] + (w[3] >> 1)
x[0] = z[0] + z[3]
x[1] = z[1] + z[2]
x[2] = z[1] - z[2]
x[3] = z[0] - z[3]
The above procedure is applied four times to complete a matrix multiplication, once for each row or column of the input array.
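The butterfly above can be written as a small C routine and applied to the rows and then the columns of a 4×4 block. This is a sketch with hypothetical function names, not a reference implementation. It computes the last output as z[0] - z[3], the form that keeps the butterfly symmetric, and it assumes that >> on a negative int is an arithmetic shift, which is implementation-defined in C. Intermediate and final scaling are omitted.

```c
#include <assert.h>

/* One 4-point inverse transform using only additions, subtractions
 * and shifts (no multiplications). */
static void inverse_transform_4(const int w[4], int x[4])
{
    assert(w && x);
    int z0 = w[0] + w[2];
    int z1 = w[0] - w[2];
    int z2 = (w[1] >> 1) - w[3];  /* assumes arithmetic shift */
    int z3 = w[1] + (w[3] >> 1);
    x[0] = z0 + z3;
    x[1] = z1 + z2;
    x[2] = z1 - z2;
    x[3] = z0 - z3;
}

/* Apply the 4-point transform to each row (horizontal pass), then to
 * each column (vertical pass), of a 4x4 block, in place. */
static void inverse_transform_4x4(int blk[4][4])
{
    int in[4], out[4];
    for (int i = 0; i < 4; i++) {          /* rows */
        for (int j = 0; j < 4; j++) in[j] = blk[i][j];
        inverse_transform_4(in, out);
        for (int j = 0; j < 4; j++) blk[i][j] = out[j];
    }
    for (int j = 0; j < 4; j++) {          /* columns */
        for (int i = 0; i < 4; i++) in[i] = blk[i][j];
        inverse_transform_4(in, out);
        for (int i = 0; i < 4; i++) blk[i][j] = out[i];
    }
}
```

A block whose only nonzero coefficient is the DC term at position (0, 0) comes out as a constant block, which is a quick sanity check on the row and column passes.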
Performing the inverse quantization in this fashion restricts a value for the quantization parameter QP to 0-51 for 4×4 blocks of information, and to a value of 12-51 for 4×8, 8×4 and 8×8 blocks of information, thereby limiting the highest quality achievable with transforms other than 4×4.
Furthermore, performing the inverse transformation as described above requires multiplication operations during the inverse horizontal and vertical transforms and during intermediate scaling, at least for operations on 8×4, 4×8 and 8×8 blocks of information. Such multiplications consume significant processing resources, and the basis vectors of the transform may therefore require modifications that enable fast implementations, as in the 4×4 case.
Moreover, to perform the scaling described above, a different table is used based on which inverse transform is being performed. That is, since there are multiple transforms based on the different block sizes, there are a number of tables that must be used when performing the scaling. Requiring the use of multiple tables may not be the most efficient implementation.
Additionally, inverse transformation often requires registers of more than 16 bits in size, consuming additional processor resources by limiting the number of operations that can be executed in parallel on a SIMD architecture (SIMD=Single Instruction Multiple Data, e.g. MMX on Intel processors). One or more of the above disadvantages may exist in encoders as well.