Moving Picture Experts Group (MPEG) video compression is currently used in many video products such as digital television set-top boxes, DSS, HDTV decoders, DVD players, video conferencing, Internet video, and other applications. These products benefit from MPEG video compression since compressed video requires less storage space for video information and less bandwidth for the transmission of the video information.
An MPEG video is a sequence of video frames composed of intra-coded I-frames and/or inter-coded P- and B-frames, as is well known in the art. Each video frame is typically divided into macroblocks of 16×16 pixels. A macroblock typically includes four 8×8 luminance blocks and two 8×8 chrominance blocks. Each luminance block specifies brightness information (e.g., luminance image coefficients) about the pixels in that block, while the two chrominance blocks specify Cr and Cb color information (e.g., Cr and Cb image coefficients) about the pixels in the macroblock.
MPEG video encoding and decoding processes typically use the discrete cosine transform ("DCT") and the inverse DCT ("iDCT") to encode and decode blocks. A DCT operation takes image values defined in a spatial domain and transforms them into a frequency domain. The DCT operation expresses the input image values as a linear combination of weighted basis functions. These basis functions are the frequency components of the input image values. As such, when a DCT operation is applied to a block of image values, it yields a block of weights indicating how much of each basis function is present in the original image to be encoded.
For most images, most of the image information lies at low frequencies, which appear in the upper-left corner of the DCT-encoded block. The lower-right values of the DCT-encoded block represent higher frequencies and are often small enough to be neglected with little visible distortion. The top-left value in the DCT-encoded block is the DC (zero-frequency) component, and entries farther down and farther to the right represent progressively higher vertical and horizontal spatial frequencies, respectively.
The DCT is a separable transform: the matrix that defines it can be decomposed into two matrices, one corresponding to a column transform and the other to a row transform. It can therefore be implemented as two one-dimensional (1D) transforms. In other words, a two-dimensional (2D) DCT is a 1D DCT applied twice, once in the column direction and once in the row direction. In a 1D 8-point DCT, the first coefficient (the DC coefficient) is proportional to the average of the image values, and the eighth coefficient represents the highest-frequency component. An iDCT operation converts the frequency coefficients back into image information.
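The relationship between the DC coefficient and the average can be checked numerically. The following Python sketch assumes the same C(u)/2 scaling that appears in this section's 2D DCT formula; the function name `dct_1d` and the sample row are illustrative, not taken from the source.

```python
import math

N = 8  # 8-point DCT

def dct_1d(f):
    """1D 8-point DCT using the C(u)/2 scaling of the 2D formula (assumed)."""
    out = []
    for u in range(N):
        cu = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(N))
        out.append(cu / 2 * s)
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical image values
coeffs = dct_1d(row)
mean = sum(row) / N
# With this scaling F(0) = sum(f) / (2 * sqrt(2)) = 2 * sqrt(2) * mean
print(coeffs[0], 2 * math.sqrt(2) * mean)

# A flat row has no variation, so every coefficient except DC is (numerically) zero
print(max(abs(c) for c in dct_1d([10] * N)[1:]))
```

The constant factor 2√2 depends only on the chosen scaling, which is why the DC term is described as proportional to, rather than equal to, the average.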
DCT encoding of a block is a 2D transformation operation that can be expressed by the following formula:
$$
F(u,v) = \frac{C_u}{2}\,\frac{C_v}{2}\sum_{y=0}^{7}\sum_{x=0}^{7} f(x,y)\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]
$$

with

$$
C_u = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } u = 0 \\ 1 & \text{if } u > 0 \end{cases}; \qquad C_v = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } v = 0 \\ 1 & \text{if } v > 0 \end{cases}
$$

In the formula above, a column dimension of the block is represented by x values and a row dimension of the block is represented by y values, so that f(x,y) is the image information at position [x,y] of the block. As such, F(u,v) is the 2D-encoded image information at position [u,v] of the 2D-encoded block.
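The formula can be evaluated literally, which is useful as a reference when checking faster implementations. The Python sketch below assumes a row-major layout in which `block[y][x]` holds f(x, y) and `F[v][u]` holds F(u, v); that layout and the function names are illustrative choices, not part of the source. Each of the 64 outputs sums 64 products, so this direct form evaluates 64 × 64 = 4,096 inner-loop terms per block.

```python
import math

N = 8

def c(k):
    # C_u (or C_v) = 1/sqrt(2) for index 0, otherwise 1
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_2d_direct(block):
    """Evaluate F(u, v) term by term from the 2D DCT formula.

    Layout assumption: block[y][x] holds f(x, y) (x = column, y = row),
    and the result F[v][u] holds F(u, v).
    """
    F = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            F[v][u] = c(u) / 2 * c(v) / 2 * s
    return F

flat = [[100] * N for _ in range(N)]
F = dct_2d_direct(flat)
print(round(F[0][0], 6))  # 800.0: all energy of a flat block lands in the DC term
```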
DCT encoding is a separable 2D transform operation. Its separability can be exploited by (1) performing a first 1D DCT operation in the column direction of the image block to produce a 1D-encoded block, and then (2) performing a second 1D DCT operation in the row direction of the 1D-encoded block to produce a 2D-encoded block. Alternatively, the first 1D DCT operation can be performed in the row direction of the block and the second 1D DCT operation in the column direction. A scaled version of the Chen method can be used to perform the two 1D DCT operations. This scaled version is described in the paper "2D Discrete Cosine Transform," which can be found on the Internet, incorporated herein by reference.
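The separable approach can be sketched in Python, assuming the same layout as before (`block[y][x]` holds f(x, y), `F[v][u]` holds F(u, v)). This is a plain direct-evaluation illustration, not the scaled Chen method cited above; it only checks that column-then-row and row-then-column passes both reproduce the direct formula. Each pass performs eight 1D transforms of 64 terms each, so the two passes total 2 × 8 × 64 = 1,024 inner-loop terms instead of 4,096.

```python
import math

N = 8

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_1d(f):
    # 1D DCT with C(u)/2 scaling, so two passes give the 2D formula's C_u/2 * C_v/2
    return [c(u) / 2 * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                           for x in range(N))
            for u in range(N)]

def dct_columns(block):
    # Apply the 1D DCT down every column; the row index becomes a frequency index
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        col = dct_1d([block[y][x] for y in range(N)])
        for k in range(N):
            out[k][x] = col[k]
    return out

def dct_rows(block):
    # Apply the 1D DCT along every row
    return [dct_1d(row) for row in block]

def dct_2d_separable(block, columns_first=True):
    if columns_first:
        return dct_rows(dct_columns(block))
    return dct_columns(dct_rows(block))

def dct_2d_direct(block):
    # Reference: the 2D formula evaluated term by term (F[v][u] = F(u, v))
    return [[c(u) / 2 * c(v) / 2 * sum(
                block[y][x] * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]

block = [[(7 * x + 13 * y + x * y) % 64 for x in range(N)] for y in range(N)]
ref = dct_2d_direct(block)
for order in (True, False):
    out = dct_2d_separable(block, columns_first=order)
    print(max(abs(out[v][u] - ref[v][u]) for v in range(N) for u in range(N)))
```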
The Chen algorithm is an efficient implementation of the DCT that requires fewer computations than a straightforward implementation. Whereas a straightforward implementation of an N-point DCT requires a number of computations proportional to N² (N = 8 for an 8-point DCT), the Chen algorithm exploits the symmetry and periodicity inherent in the DCT calculation to reduce the number of computations to an amount proportional to N log N.
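The kind of symmetry that fast DCT algorithms exploit is visible in a single butterfly stage. The Python sketch below implements only that first even/odd stage, not the full Chen algorithm (which keeps factoring the even and odd halves to reach a cost proportional to N log N); the function names are hypothetical. Because cos((2(7 - x) + 1)uπ/16) = (-1)^u cos((2x + 1)uπ/16), the eight inputs can be folded into four sums and four differences, halving the cosine multiplications from 64 to 32.

```python
import math

N = 8

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / 16)

def dct_1d_direct(f):
    # 8 outputs x 8 inputs = 64 multiplications by cosine terms
    return [c(u) / 2 * sum(f[x] * basis(x, u) for x in range(N)) for u in range(N)]

def dct_1d_even_odd(f):
    # Symmetry: basis(7 - x, u) == (-1)**u * basis(x, u).
    # Fold the input once with a butterfly of additions and subtractions.
    s = [f[x] + f[N - 1 - x] for x in range(N // 2)]  # used by even u
    d = [f[x] - f[N - 1 - x] for x in range(N // 2)]  # used by odd u
    out = []
    for u in range(N):
        half = s if u % 2 == 0 else d
        # 4 multiplications per output instead of 8: 32 in total
        out.append(c(u) / 2 * sum(half[x] * basis(x, u) for x in range(N // 2)))
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]
print(max(abs(a - b) for a, b in zip(dct_1d_direct(row), dct_1d_even_odd(row))))
```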
FIG. 1 presents a flowchart of a conventional process 100 that DCT encodes a block and outputs the DCT-encoded block. This process uses two separate 1D DCT operations. The process initially performs (at 105) a 1D DCT operation on the block in the block's column direction to produce a 1D-encoded block. The process then performs (at 110) a transpose operation on the 1D-encoded block to produce a transposed 1D-encoded block. A transpose operation interchanges the rows and columns of an array. In other words, the transpose A^T of an array A is the array obtained by reflecting A about its main diagonal, such that row i of A^T is column i of A, and column j of A^T is row j of A (equivalently, A^T[i][j] = A[j][i]).
The process then performs (at 115) a 1D DCT operation on the transposed 1D-encoded block to produce a transposed 2D-encoded block. The process 100 performs the 1D DCT operation at 115 in the column direction of the transposed 1D-encoded block. Therefore, the 1D DCT operation is actually being performed in the row direction of the block since the result of the initial transformation operation at 105 was transposed at 110. The process then performs (at 120) a transpose operation on the transposed 2D-encoded block to produce a 2D-encoded block.
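Steps 105 through 120 can be sketched in Python using a single column-direction 1D DCT plus two transposes. The layout (`block[y][x]` holds f(x, y)) and the function names are assumptions for illustration; the sketch mirrors the flowchart rather than any particular product's code. Each transpose reads and writes all 64 values of the block, and the flowchart performs two of them per block.

```python
import math

N = 8

def dct_1d(f):
    # 1D 8-point DCT with C(u)/2 scaling (assumed, matching the 2D formula)
    return [(1 / math.sqrt(2) if u == 0 else 1.0) / 2
            * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(N))
            for u in range(N)]

def transpose(a):
    # A^T[i][j] = A[j][i]
    return [[a[j][i] for j in range(len(a))] for i in range(len(a[0]))]

def dct_columns(block):
    # The only 1D routine process 100 uses: a DCT down each column
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        col = dct_1d([block[y][x] for y in range(N)])
        for k in range(N):
            out[k][x] = col[k]
    return out

def process_100_transform(block):
    step_105 = dct_columns(block)     # 1D-encoded block
    step_110 = transpose(step_105)    # transposed 1D-encoded block
    step_115 = dct_columns(step_110)  # transposed 2D-encoded block
    step_120 = transpose(step_115)    # 2D-encoded block, F[v][u] = F(u, v)
    return step_120

flat = [[100] * N for _ in range(N)]
print(round(process_100_transform(flat)[0][0], 6))  # 800.0
```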
After the process performs (at 120) the second transpose operation, the process quantizes (at 125) the 2D-encoded block to produce a quantized 2D-encoded block. For MPEG encoding, quantization entails dividing each value of the 2D DCT-encoded block by the corresponding value of a quantization matrix. Because the entries of the quantization matrix are typically greater than one, this division reduces the magnitudes of the values of the DCT-encoded block, and many small high-frequency values become zero once the results are rounded.
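A simplified version of step 125 in Python. The quantization matrix here is hypothetical (its step sizes simply grow toward the lower-right corner) and is not an MPEG default matrix; real MPEG quantization also applies a quantizer scale and standard-specific rounding rules, which this sketch omits.

```python
N = 8

# Hypothetical quantization matrix (not an MPEG default): step sizes
# grow from 8 at the DC position to 64 at the highest frequency.
QUANT = [[8 + 4 * (u + v) for u in range(N)] for v in range(N)]

def quantize(F, q=QUANT):
    # Divide each coefficient by the matching matrix entry and round to an integer.
    # Python's round() uses round-half-to-even; a simplification here.
    return [[round(F[v][u] / q[v][u]) for u in range(N)] for v in range(N)]

F = [[0.0] * N for _ in range(N)]
F[0][0] = 800.0   # DC coefficient
F[0][1] = -37.0   # low-frequency coefficient
F[7][7] = 5.0     # small high-frequency coefficient
Fq = quantize(F)
print(Fq[0][0], Fq[0][1], Fq[7][7])  # 100 -3 0
```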
To produce a bit stream of values (i.e., data stream), the process then rasterizes (at 130) values of the quantized 2D-encoded block according to a zig zag scan order. A zig zag scan order is commonly used to arrange DCT-coded image coefficients of an image block into a bit stream. FIG. 2 illustrates a conventional zig zag scan order 205 of an 8×8 image block 210. The block 210 contains image coefficients C0, C1, C2, etc. that are numbered from left to right and top to bottom. The zig zag scan order specifies the following sequence for outputting image coefficients to produce the bit stream: C0, C1, C8, C16, C9, C2, . . . After 130, the process 100 ends.
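The zig zag order need not be stored as a table; it can be generated by walking the anti-diagonals of the block and alternating direction. A Python sketch, with hypothetical function names, that reproduces the sequence C0, C1, C8, C16, C9, C2 given in the text:

```python
N = 8

def zigzag_order(n=N):
    """Raster indices (row * n + col) of an n x n block in zig zag scan order."""
    order = []
    for d in range(2 * n - 1):  # d = row + col indexes one anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]  # top-right to bottom-left
        if d % 2 == 0:
            cells.reverse()  # even diagonals are scanned bottom-left to top-right
        order.extend(r * n + col for r, col in cells)
    return order

def rasterize(block):
    """Step 130: emit the block's values in zig zag order."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in zigzag_order(len(block))]

print(zigzag_order()[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Because low-frequency coefficients come first, the zeros produced by quantization tend to cluster at the end of the stream.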
A DCT decoder performs an inverse DCT transformation on a DCT encoded block to reconstruct the block. DCT decoding of a block is also a two-dimensional (2D) transformation operation, which can be expressed by the following formula:
$$
f(x,y) = \sum_{u=0}^{7}\sum_{v=0}^{7} F(u,v)\,\frac{C_u}{2}\,\frac{C_v}{2}\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]
$$

In the formula above, the columns of the 2D-encoded block are represented by u values and the rows of the 2D-encoded block are represented by v values, so that F(u,v) is the encoded image data at position [u,v] of the block. As such, f(x,y) is the image data at position [x,y] of the block.
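As a check on the decoding formula, the Python sketch below evaluates both the encoding and decoding formulas term by term and confirms that decoding an encoded block returns the original values up to floating-point error. This holds because the C_u/2 and C_v/2 factors make the 8×8 transform orthonormal. The layout (`block[y][x]` for f(x, y), `F[v][u]` for F(u, v)) and the function names are assumptions.

```python
import math

N = 8

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def cosine(i, k):
    return math.cos((2 * i + 1) * k * math.pi / 16)

def dct_2d_direct(block):
    # block[y][x] = f(x, y); returns F with F[v][u] = F(u, v)
    return [[c(u) / 2 * c(v) / 2 * sum(block[y][x] * cosine(x, u) * cosine(y, v)
                                       for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]

def idct_2d_direct(F):
    # F[v][u] = F(u, v); returns f with f[y][x] = f(x, y)
    return [[sum(F[v][u] * c(u) / 2 * c(v) / 2 * cosine(x, u) * cosine(y, v)
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]

block = [[(7 * x + 13 * y + x * y) % 64 for x in range(N)] for y in range(N)]
restored = idct_2d_direct(dct_2d_direct(block))
print(max(abs(restored[y][x] - block[y][x]) for y in range(N) for x in range(N)))
```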
Like DCT encoding, DCT decoding is a separable 2D transform operation. Its separability can be exploited by (1) performing a first 1D iDCT operation in the column direction of the 2D-encoded block to produce a 1D-encoded block, and then (2) performing a second 1D iDCT operation in the row direction of the 1D-encoded block to produce the decoded block. Alternatively, the first 1D iDCT operation can be performed in the row direction of the 2D-encoded block and the second 1D iDCT operation in the column direction. A scaled version of the Chen method can be used to perform the two 1D iDCT operations. This scaled version is described in the paper "2D Inverse Discrete Cosine Transform," which can be found on the Internet, incorporated herein by reference.
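A separable iDCT can be sketched in Python the same way as the forward transform. This is a direct 1D evaluation rather than the scaled Chen method; the layout and function names are assumptions. The example checks that a block holding only a DC value of 800 decodes to a flat block of 100s, and the column-first and row-first orders can be compared in the same way.

```python
import math

N = 8

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def idct_1d(F):
    # f(x) = sum over u of C(u)/2 * F(u) * cos((2x + 1) u pi / 16)
    return [sum(c(u) / 2 * F[u] * math.cos((2 * x + 1) * u * math.pi / 16)
                for u in range(N))
            for x in range(N)]

def idct_columns(F):
    # 1D iDCT down every column: the row index goes from frequency back to position
    out = [[0.0] * N for _ in range(N)]
    for col in range(N):
        values = idct_1d([F[k][col] for k in range(N)])
        for k in range(N):
            out[k][col] = values[k]
    return out

def idct_rows(F):
    # 1D iDCT along every row
    return [idct_1d(row) for row in F]

def idct_2d_separable(F, columns_first=True):
    if columns_first:
        return idct_rows(idct_columns(F))
    return idct_columns(idct_rows(F))

F = [[0.0] * N for _ in range(N)]
F[0][0] = 800.0
print(round(idct_2d_separable(F)[3][5], 6))  # 100.0
```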
FIG. 3 presents a flowchart of a conventional process 300 that receives a DCT-encoded bit stream, generates a DCT-encoded block, and decodes the block. The process starts when it receives a bit stream of values. The process parses out and derasterizes (at 305) the values of the bit stream and stores the values in a block according to the zig zag scan order illustrated in FIG. 2. The block at this stage is referred to as a quantized 2D-encoded block.
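Step 305 is the inverse of the zig zag rasterization. A Python sketch with hypothetical names; it regenerates the scan order rather than storing it, and if the stream holds fewer than 64 values the remaining positions are left at zero (an assumption of this sketch, not a statement about MPEG bit-stream syntax).

```python
N = 8

def zigzag_order(n=N):
    """Raster indices (row * n + col) of an n x n block in zig zag scan order."""
    order = []
    for d in range(2 * n - 1):
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            cells.reverse()
        order.extend(r * n + col for r, col in cells)
    return order

def derasterize(stream, n=N):
    """Write the k-th stream value to the k-th zig zag position of an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, index in zip(stream, zigzag_order(n)):  # stops at the shorter input
        block[index // n][index % n] = value
    return block

block = derasterize(list(range(64)))
print(block[0][:3], block[1][:2], block[2][0])  # [0, 1, 5] [2, 4] 3
```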
The process then performs (at 307) an inverse quantization operation on the quantized 2D-encoded block to produce a 2D-encoded block. For MPEG decoding, inverse quantization entails multiplying each value of the quantized 2D-encoded block by the corresponding value of the quantization matrix. Because the entries of the quantization matrix are typically greater than one, this multiplication increases the magnitudes of the values, approximately restoring the scale of the DCT-encoded block; the precision discarded by rounding during quantization is not recovered.
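Step 307 can be sketched in Python alongside a matching simplified quantizer. The matrix is hypothetical and the sketch omits MPEG's quantizer scale and other standard-specific details; it shows that multiplication restores the scale of each coefficient but not the precision lost to rounding.

```python
N = 8

# Hypothetical quantization matrix (not an MPEG default)
QUANT = [[8 + 4 * (u + v) for u in range(N)] for v in range(N)]

def quantize(F, q=QUANT):
    return [[round(F[v][u] / q[v][u]) for u in range(N)] for v in range(N)]

def dequantize(Fq, q=QUANT):
    # Step 307: multiply each quantized value by the matching matrix entry
    return [[Fq[v][u] * q[v][u] for u in range(N)] for v in range(N)]

F = [[0.0] * N for _ in range(N)]
F[0][0], F[0][1], F[7][7] = 800.0, -37.0, 5.0
Fr = dequantize(quantize(F))
print(Fr[0][0], Fr[0][1], Fr[7][7])  # 800 -36 0
```

The DC value survives exactly because 800 is a multiple of its step size, while -37 comes back as -36 and the small high-frequency value is lost entirely.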
The process then performs (at 310) a 1D iDCT operation on the 2D-encoded block in the block's column direction. This operation results in a 1D DCT-encoded block. The process then performs (at 315) a transpose operation on the 1D-encoded block to produce a transposed 1D-encoded block. The process then performs (at 320) a 1D iDCT operation on the transposed 1D-encoded block to produce a transposed block. This 1D iDCT operation is performed in the column direction of the transposed 1D-encoded block. Therefore, the 1D iDCT operation at 320 is actually performed in the row direction of the block, since the result of the initial transformation operation at 310 was transposed at 315. The process then performs (at 325) a transpose operation on the transposed block to produce a DCT-decoded block. After 325, the process ends.
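Steps 310 through 325 mirror the encoder: one column-direction 1D iDCT routine plus two transposes. A Python sketch under the same assumed layout and hypothetical names:

```python
import math

N = 8

def idct_1d(F):
    # 1D 8-point iDCT with C(u)/2 scaling (assumed, matching the 2D formula)
    return [sum((1 / math.sqrt(2) if u == 0 else 1.0) / 2 * F[u]
                * math.cos((2 * x + 1) * u * math.pi / 16) for u in range(N))
            for x in range(N)]

def transpose(a):
    # A^T[i][j] = A[j][i]
    return [[a[j][i] for j in range(len(a))] for i in range(len(a[0]))]

def idct_columns(block):
    # The only 1D routine process 300 uses: an iDCT down each column
    out = [[0.0] * N for _ in range(N)]
    for col in range(N):
        values = idct_1d([block[k][col] for k in range(N)])
        for k in range(N):
            out[k][col] = values[k]
    return out

def process_300_transform(F):
    step_310 = idct_columns(F)         # 1D-encoded block
    step_315 = transpose(step_310)     # transposed 1D-encoded block
    step_320 = idct_columns(step_315)  # transposed (decoded) block
    step_325 = transpose(step_320)     # DCT-decoded block, f[y][x] = f(x, y)
    return step_325

F = [[0.0] * N for _ in range(N)]
F[0][0] = 800.0
print(round(process_300_transform(F)[2][6], 6))  # 100.0
```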
A conventional MPEG encoder often includes a feedback decoding loop that decodes DCT-encoded blocks. MPEG encoders have such feedback loops in order to reconstruct previous frames that they use in encoding subsequent frames. FIG. 4 presents a conventional encoding process 400 that employs such a feedback loop. The operations of the process 400 are similar to the operations of the processes 100 and 300 of FIGS. 1 and 3. Hence, similar numbers are used to describe similar operations in these figures.
The process 400 starts when it receives a block. The process initially performs (at 105) a 1D DCT operation on the received block to produce a 1D DCT-encoded block. The process then performs (at 110) a transpose operation on the 1D DCT-encoded block to produce a transposed 1D DCT-encoded block. The process next performs (at 115) a 1D DCT operation on the transposed 1D-encoded block to produce a transposed 2D-encoded block. The process then performs (at 120) a transpose operation on the transposed 2D-encoded block to produce a 2D-encoded block. The process produces (at 125) a quantized 2D-encoded block by performing a quantization operation on the 2D-encoded block.
The feedback operations then commence at 307, when the process performs an inverse quantization operation on the quantized 2D-encoded block to produce a 2D-encoded block. The process then performs (at 310) a 1D iDCT operation on the 2D-encoded block. This operation produces a 1D DCT-encoded block. The process performs (at 315) a transpose operation on the 1D DCT-encoded block to produce a transposed 1D DCT-encoded block. The process then performs (at 320) another 1D iDCT operation on the transposed 1D-encoded block. This operation produces a transposed block that has been fully decoded. The process then performs (at 325) a transpose operation on the transposed block. This transposition results in a block that, but for the lossy quantization operation, would have been identical to the block received by the process 400.
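The feedback loop of process 400 can be sketched end to end in Python. With quantization disabled, the reconstruction matches the input up to floating-point error; with the hypothetical quantization matrix it does not, which is the lossy behavior described above. The matrix, the block values, and all function names are assumptions for illustration.

```python
import math

N = 8
QUANT = [[8 + 4 * (u + v) for u in range(N)] for v in range(N)]  # hypothetical matrix

def scale(k):
    # C_k / 2, with C_0 = 1/sqrt(2) and C_k = 1 otherwise
    return (1 / math.sqrt(2) if k == 0 else 1.0) / 2

def dct_1d(f):
    return [scale(u) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(N))
            for u in range(N)]

def idct_1d(F):
    return [sum(scale(u) * F[u] * math.cos((2 * x + 1) * u * math.pi / 16) for u in range(N))
            for x in range(N)]

def transpose(a):
    return [[a[j][i] for j in range(len(a))] for i in range(len(a[0]))]

def apply_columns(fn, block):
    # Run a 1D transform down every column of the block
    out = [[0.0] * N for _ in range(N)]
    for col in range(N):
        values = fn([block[k][col] for k in range(N)])
        for k in range(N):
            out[k][col] = values[k]
    return out

def encode_with_feedback(block, q=QUANT):
    """Return (quantized block, reconstructed block); q=None skips quantization."""
    F = transpose(apply_columns(dct_1d, transpose(apply_columns(dct_1d, block))))  # 105-120
    if q is None:
        quantized, restored = F, F
    else:
        quantized = [[round(F[v][u] / q[v][u]) for u in range(N)] for v in range(N)]  # 125
        restored = [[quantized[v][u] * q[v][u] for u in range(N)] for v in range(N)]  # 307
    t = transpose(apply_columns(idct_1d, restored))  # 310, 315
    return quantized, transpose(apply_columns(idct_1d, t))  # 320, 325

block = [[x * x + 3 * y for x in range(N)] for y in range(N)]

def max_error(rec):
    return max(abs(rec[y][x] - block[y][x]) for y in range(N) for x in range(N))

_, exact = encode_with_feedback(block, q=None)
_, lossy = encode_with_feedback(block)
print(max_error(exact), max_error(lossy))  # first is ~0, second reflects quantization loss
```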
Traditional video encoders and decoders require substantial computational resources. For instance, the transposition operations of conventional encoders and decoders are computationally intensive. Therefore, there is a need in the art for video encoders and decoders that require fewer computational resources. In particular, there is a need for encoders and decoders that perform transposition operations more efficiently. Ideally, such encoders and decoders would adaptively perform their transposition operations based on the data they receive.