Many existing image and video coding standards employ compression techniques in order to allow high-resolution images and video to be stored or transmitted as relatively compact files or data streams. Such coding standards include Joint Photographic Experts Group (“JPEG”), Moving Picture Experts Group (“MPEG”)-1, MPEG-2, MPEG-4 Part 2, H.261, H.263, H.264/Advanced Video Coding (“H.264/AVC”) and other image or video coding standards.
In accordance with many of these standards, video frames are compressed using “spatial” encoding. These frames may be original frames (i.e., I-frames) or may be residual frames generated by a temporal encoding process that uses motion compensation. During spatial encoding, frames are broken into equal-sized blocks of pixels. For example, an uncompressed frame may be broken into a set of 8×8 blocks of pixels. For each block of pixels, pixel components are separated into matrixes of pixel component values. For example, each block of pixels may be divided into a matrix of Y pixel component values, a matrix of U pixel component values, and a matrix of V pixel component values. In this example, Y pixel component values represent luminance values and U and V pixel component values represent chrominance values.
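The block splitting and component separation described above can be sketched as follows. This is only an illustration under stated assumptions: the frame is held as a list of rows of (Y, U, V) tuples, its dimensions are multiples of the block size, and the function names `split_into_blocks` and `separate_components` are hypothetical.

```python
def split_into_blocks(plane, block_size=8):
    """Split a 2-D list of samples into block_size x block_size blocks.

    Assumes the height and width are multiples of block_size.
    Returns a dict mapping (block_row, block_col) to a block (list of rows).
    """
    height, width = len(plane), len(plane[0])
    if height % block_size or width % block_size:
        raise ValueError("plane dimensions must be multiples of block_size")
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top // block_size, left // block_size)] = [
                row[left:left + block_size]
                for row in plane[top:top + block_size]
            ]
    return blocks


def separate_components(pixel_block):
    """Turn a block of (Y, U, V) tuples into three matrixes, one per component."""
    y = [[px[0] for px in row] for row in pixel_block]
    u = [[px[1] for px in row] for row in pixel_block]
    v = [[px[2] for px in row] for row in pixel_block]
    return y, u, v


# A hypothetical 16x16 frame whose pixels are (Y, U, V) tuples.
frame = [[(r * 16 + c, 128, 128) for c in range(16)] for r in range(16)]
blocks = split_into_blocks(frame)          # four 8x8 blocks
y, u, v = separate_components(blocks[(1, 0)])
print(len(blocks), len(y), len(y[0]))
```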
Furthermore, during spatial encoding, a forward discrete cosine transform (“FDCT”) is applied to each matrix of pixel component values in a frame that is being encoded. An ideal one-dimensional FDCT is defined by:
$$t(k) = c(k) \sum_{n=0}^{N-1} s(n) \cos\frac{\pi (2n+1) k}{2N}$$
where s is the array of N original values, t is the array of N transformed values, and the coefficients c are given by:
$$c(0) = \sqrt{1/N}, \qquad c(k) = \sqrt{2/N} \quad \text{for } 1 \le k \le N-1.$$
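As a minimal sketch, the one-dimensional FDCT can be evaluated directly from its definition. The function names `c` and `fdct_1d` are illustrative, and the direct O(N²) summation is chosen for clarity rather than speed; practical encoders typically use fast factorizations instead.

```python
import math


def c(k, n_points):
    """Normalization coefficient c(k) for an N-point transform."""
    return math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)


def fdct_1d(s):
    """Direct evaluation of t(k) = c(k) * sum_n s(n) cos(pi (2n+1) k / 2N)."""
    n_points = len(s)
    return [
        c(k, n_points) * sum(
            s[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


# A constant input concentrates all of its energy in t(0).
t = fdct_1d([10.0] * 8)
print([round(value, 6) for value in t])
```

For a constant input, every term with k ≥ 1 cancels, so only t(0) is nonzero. This is why smooth image regions compress well after the transform.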
An ideal two-dimensional FDCT is defined by the formula:
$$t(i,j) = c(i,j) \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} s(m,n) \cos\frac{\pi (2m+1) i}{2N} \cos\frac{\pi (2n+1) j}{2N}$$
where s is the N×N array of original values, t is the N×N array of transformed values, c(i,j) is given by c(i,j) = c(i)c(j), and c(k) is defined as in the one-dimensional case.
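The two-dimensional definition can be sketched in the same direct way. The names `c` and `fdct_2d` are illustrative, and the quadruple loop mirrors the formula term by term; it is not meant as an efficient implementation.

```python
import math


def c(k, size):
    """Normalization coefficient c(k) from the one-dimensional case."""
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)


def fdct_2d(s):
    """Direct evaluation of the two-dimensional FDCT of an N x N matrix s."""
    size = len(s)
    t = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            total = 0.0
            for n in range(size):
                for m in range(size):
                    total += (
                        s[m][n]
                        * math.cos(math.pi * (2 * m + 1) * i / (2 * size))
                        * math.cos(math.pi * (2 * n + 1) * j / (2 * size))
                    )
            # c(i, j) = c(i) * c(j)
            t[i][j] = c(i, size) * c(j, size) * total
    return t


# A flat 8x8 block of value 128 yields a single nonzero coefficient t(0,0).
flat_block = [[128.0] * 8 for _ in range(8)]
coefficients = fdct_2d(flat_block)
print(round(coefficients[0][0], 6))
```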
A matrix of coefficients is produced when a block of pixel component values is transformed using the FDCT. This matrix of coefficients may then be quantized and encoded using, for example, Huffman or arithmetic codes. A video bitstream represents the combined result of performing this process on all blocks of pixel component values in an uncompressed series of video frames.
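The quantization step can be illustrated with a simple uniform quantizer. This is a sketch under assumptions that are not taken from the text: a single step size of 16 for every coefficient, rounding to the nearest integer, and made-up coefficient values. Actual standards usually define per-coefficient quantization matrices, and the entropy coding stage (Huffman or arithmetic) is omitted here.

```python
def quantize(coefficients, step=16):
    """Uniform quantization: divide each coefficient by step and round.

    The single step size is an assumption made for illustration.
    """
    return [[int(round(value / step)) for value in row] for row in coefficients]


# Hypothetical transform coefficients (not taken from a real frame).
coefficients = [
    [1024.0, -35.2, 3.1],
    [20.0, -7.9, 0.4],
]
quantized = quantize(coefficients)
print(quantized)
```

Small coefficients collapse to zero, which is what makes the subsequent entropy coding effective.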
An uncompressed video frame may be derived from a video bitstream by reversing this process. In particular, each matrix of coefficients in the video bitstream is decompressed and the decompressed values are inverse quantized in order to derive matrixes of inverse quantized coefficients. An inverse discrete cosine transform (“IDCT”) is then applied to each matrix of inverse quantized coefficients in order to derive matrixes of pixel component values. An ideal one-dimensional IDCT is defined by:
$$s(n) = \sum_{k=0}^{N-1} c(k)\, t(k) \cos\frac{\pi (2n+1) k}{2N}$$
where s is the array of N original values, t is the array of N transformed values, and the coefficients c are given by:
$$c(0) = \sqrt{1/N}, \qquad c(k) = \sqrt{2/N} \quad \text{for } 1 \le k \le N-1.$$
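To illustrate that the ideal IDCT inverts the ideal FDCT, the following sketch applies both one-dimensional transforms in floating point and checks the round trip. The function names and the sample values are illustrative assumptions. With no quantization between the two transforms, the original samples are recovered up to floating-point rounding error.

```python
import math


def c(k, n_points):
    """Normalization coefficient c(k) for an N-point transform."""
    return math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)


def fdct_1d(s):
    """Ideal forward transform: t(k) = c(k) * sum_n s(n) cos(...)."""
    n_points = len(s)
    return [
        c(k, n_points) * sum(
            s[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


def idct_1d(t):
    """Ideal inverse transform: s(n) = sum_k c(k) t(k) cos(...)."""
    n_points = len(t)
    return [
        sum(
            c(k, n_points) * t[k]
            * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            for k in range(n_points)
        )
        for n in range(n_points)
    ]


# Hypothetical row of eight sample values.
original = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
recovered = idct_1d(fdct_1d(original))
print([round(value, 6) for value in recovered])
```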
An ideal two-dimensional IDCT is defined by the formula:
$$s(m,n) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} c(i,j)\, t(i,j) \cos\frac{\pi (2m+1) i}{2N} \cos\frac{\pi (2n+1) j}{2N}$$
The resulting matrixes of pixel component values are then reassembled into blocks of pixels and these blocks of pixels are reassembled to form a decoded video frame. If the decoded video frame is an intra-coded frame, the video frame is now completely decoded. However, if the decoded video frame is an inter-coded frame, the decoded video frame is merely a decoded residual frame.
A completed frame is generated by constructing a predicted frame using motion vectors associated with the decoded video frame and then adding the predicted frame to the decoded residual frame.
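The reconstruction of an inter-coded frame can be sketched as follows. Several simplifications here are assumptions rather than details from the text: a single reference frame, whole-pixel motion vectors given per block as (dy, dx) offsets into the reference frame, vectors that stay inside the frame, a 2×2 block size chosen to keep the example small, and clipping of results to the 8-bit range. The function names `predict_frame` and `reconstruct` are hypothetical.

```python
def predict_frame(reference, motion_vectors, block_size):
    """Build a predicted frame by copying displaced blocks from the reference.

    motion_vectors maps (block_row, block_col) to a (dy, dx) offset.
    Assumes every displaced block lies entirely inside the reference frame.
    """
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for (block_row, block_col), (dy, dx) in motion_vectors.items():
        top, left = block_row * block_size, block_col * block_size
        for r in range(block_size):
            for col in range(block_size):
                predicted[top + r][left + col] = (
                    reference[top + r + dy][left + col + dx]
                )
    return predicted


def reconstruct(predicted, residual):
    """Add the decoded residual to the prediction, clipping to 0..255."""
    return [
        [max(0, min(255, p + e)) for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(predicted, residual)
    ]


# Hypothetical 4x4 reference frame, motion vectors, and decoded residual.
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
motion_vectors = {(0, 0): (0, 0), (0, 1): (0, -1), (1, 0): (-1, 0), (1, 1): (-1, -1)}
residual = [
    [1, -1, 0, 2],
    [0, 0, 3, -3],
    [5, 0, 0, 0],
    [0, 0, 0, 200],
]
predicted = predict_frame(reference, motion_vectors, block_size=2)
completed = reconstruct(predicted, residual)
print(completed)
```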