Many existing image and video coding standards employ compression techniques in order to allow high-resolution images and video to be stored or transmitted as relatively compact files or data streams. Such coding standards include Joint Photographic Experts Group (JPEG), Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-4 part 2, H.261, H.263, and other image or video coding standards.
In accordance with many of these standards, video frames are compressed using “spatial” encoding. These frames may be original frames (i.e., i-frames) or may be residual frames generated by a temporal encoding process that uses motion compensation. During spatial encoding, frames are broken into equal-sized blocks of pixels. For example, an uncompressed frame may be broken into a set of 8×8 blocks of pixels. For each block of pixels, pixel components are separated into matrixes of pixel component values. For example, each block of pixels may be divided into a matrix of Y pixel component values, a matrix of U pixel component values, and a matrix of V pixel component values. In this example, Y pixel component values indicate luminance values and U and V pixel component values indicate chrominance values.
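By way of illustration only, the division of one matrix of pixel component values into 8×8 blocks might be sketched as follows. The function name `split_into_blocks`, the 16×16 test plane, and the requirement that the plane dimensions be multiples of the block size are assumptions made for this sketch and are not taken from any particular standard.

```python
def split_into_blocks(plane, block_size=8):
    """Split a 2-D list of component values into block_size x block_size blocks.

    Assumes the plane height and width are multiples of block_size.
    Returns the blocks in raster order; each block is a list of rows.
    """
    height = len(plane)
    width = len(plane[0])
    if height % block_size or width % block_size:
        raise ValueError("plane dimensions must be multiples of block_size")
    blocks = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in plane[top:top + block_size]]
            blocks.append(block)
    return blocks


# Hypothetical 16x16 luminance (Y) plane; each value encodes its own position.
y_plane = [[r * 16 + c for c in range(16)] for r in range(16)]
y_blocks = split_into_blocks(y_plane)
```

The same routine could be applied separately to the U and V matrixes of a frame.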
Furthermore, during spatial encoding, a forward discrete cosine transform (FDCT) is applied to each matrix of pixel component values in a frame that is being encoded. An ideal one-dimensional FDCT is defined by:
$$t(k) = c(k) \sum_{n=0}^{N-1} s(n) \cos\frac{\pi (2n+1) k}{2N}$$

where s is the array of N original values, t is the array of N transformed values, and the coefficients c are given by

$$c(0) = \sqrt{1/N}, \qquad c(k) = \sqrt{2/N} \quad \text{for } 1 \le k \le N-1.$$
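The one-dimensional definition above may be evaluated directly, as in the following minimal sketch. The names `c`, `fdct_1d`, and `samples` are chosen for this example only, and double-precision floating point stands in for the real numbers of the ideal transform.

```python
import math


def c(k, size):
    """Normalization coefficient c(k) for a transform of the given size."""
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)


def fdct_1d(s):
    """Direct evaluation of the one-dimensional FDCT of the list s."""
    size = len(s)
    return [
        c(k, size) * sum(
            s[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
            for n in range(size)
        )
        for k in range(size)
    ]


# A constant input places all of its energy in t(0).
samples = [10.0] * 8
coeffs = fdct_1d(samples)
```

For a constant input of value v, only t(0) is non-zero and equals v·√N, which gives a convenient sanity check on an implementation.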
An ideal two-dimensional FDCT is defined by the formula:
$$t(i,j) = c(i,j) \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} s(m,n) \cos\frac{\pi (2m+1) i}{2N} \cos\frac{\pi (2n+1) j}{2N}$$

where s is the N×N matrix of original values, t is the N×N matrix of transformed values, and c(i,j) is given by c(i,j)=c(i)c(j), with c(k) defined as in the one-dimensional case.
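A direct, unoptimized evaluation of the two-dimensional definition is sketched below. The names `fdct_2d` and `flat_block` are illustrative assumptions; practical encoders typically use much faster factorizations than this four-deep loop.

```python
import math


def c(k, size):
    """Normalization coefficient c(k) from the one-dimensional definition."""
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)


def fdct_2d(s):
    """Direct evaluation of the two-dimensional FDCT of an N x N matrix s."""
    size = len(s)
    t = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            total = 0.0
            for m in range(size):
                for n in range(size):
                    total += (
                        s[m][n]
                        * math.cos(math.pi * (2 * m + 1) * i / (2 * size))
                        * math.cos(math.pi * (2 * n + 1) * j / (2 * size))
                    )
            # c(i, j) = c(i) * c(j)
            t[i][j] = c(i, size) * c(j, size) * total
    return t


# Hypothetical flat 8x8 block: only the DC coefficient t(0, 0) is non-zero.
flat_block = [[128.0] * 8 for _ in range(8)]
coeffs = fdct_2d(flat_block)
```

For this flat block, t(0,0) equals 128·64/8 = 1024 and every other coefficient is zero to within floating-point error.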
A matrix of coefficients is produced when the block of pixel component values is transformed using the FDCT. This matrix of coefficients may then be quantized and encoded using, for example, Huffman or arithmetic codes. A video bitstream represents the combined result of performing this process on all blocks of pixel component values in an uncompressed series of video frames.
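The quantization step is not specified in detail here. Purely as an illustration, the sketch below assumes a uniform quantizer with a single step size shared by all coefficients; actual standards commonly use per-coefficient step sizes. The names `quantize` and `dequantize`, the step size of 16, and the sample coefficients are all assumptions of this example.

```python
def quantize(coeffs, step):
    """Map each coefficient to the nearest integer multiple of step (assumed uniform quantizer)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def dequantize(levels, step):
    """Recover approximate coefficient values from the quantized levels."""
    return [[level * step for level in row] for row in levels]


# Hypothetical 2x2 corner of a coefficient matrix.
coeffs = [[1024.0, -37.4],
          [12.9, 3.1]]
levels = quantize(coeffs, 16.0)
restored = dequantize(levels, 16.0)
```

Here the restored values differ from the original coefficients (for example, -37.4 becomes -32.0), which shows that quantization discards information even when the transform itself is exact.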
An uncompressed video frame may be derived from a video bitstream by reversing this process. In particular, each matrix of coefficients in the bitstream is decompressed and the decompressed values are de-quantized in order to derive matrixes of transformed coefficients. An inverse discrete cosine transform (“IDCT”) is then applied to each matrix of transformed coefficients in order to derive matrixes of pixel component values. An ideal one-dimensional IDCT is defined by:
$$s(n) = \sum_{k=0}^{N-1} c(k)\, t(k) \cos\frac{\pi (2n+1) k}{2N}$$

where s is the array of N original values, t is the array of N transformed values, and the coefficients c are given by

$$c(0) = \sqrt{1/N}, \qquad c(k) = \sqrt{2/N} \quad \text{for } 1 \le k \le N-1.$$

An ideal two-dimensional IDCT is defined by the formula:
$$s(m,n) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} c(i,j)\, t(i,j) \cos\frac{\pi (2m+1) i}{2N} \cos\frac{\pi (2n+1) j}{2N}$$

The resulting matrixes of pixel component values are then reassembled into blocks of pixels and these blocks of pixels are reassembled to form a decoded frame. If the decoded frame is an i-frame, the frame is now completely decoded. However, if the uncompressed frame is a predictive or a bi-predictive frame, the decoded frame is merely a decoded residual frame. A completed frame is generated by constructing a reconstructed frame using motion vectors associated with the decoded frame and then adding the reconstructed frame to the decoded residual frame.
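The two-dimensional IDCT and the final addition of a residual block to a reconstructed block might be sketched as follows. The motion-compensation step itself is not modeled: the block `predicted` is simply assumed to be the output of that step, and the names `idct_2d`, `add_blocks`, `predicted`, and `completed` are hypothetical.

```python
import math


def c(k, size):
    """Normalization coefficient c(k) from the one-dimensional definition."""
    return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)


def idct_2d(t):
    """Direct evaluation of the two-dimensional IDCT of an N x N matrix t."""
    size = len(t)
    s = [[0.0] * size for _ in range(size)]
    for m in range(size):
        for n in range(size):
            total = 0.0
            for i in range(size):
                for j in range(size):
                    total += (
                        c(i, size) * c(j, size) * t[i][j]
                        * math.cos(math.pi * (2 * m + 1) * i / (2 * size))
                        * math.cos(math.pi * (2 * n + 1) * j / (2 * size))
                    )
            s[m][n] = total
    return s


def add_blocks(predicted, residual):
    """Add a decoded residual block to a motion-compensated block, sample by sample."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


# Hypothetical residual coefficients: only the DC term is non-zero.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 16.0
residual = idct_2d(coeffs)  # every sample is 16 / 8 = 2.0

# Assumed output of motion compensation for the same block position.
predicted = [[100.0] * 8 for _ in range(8)]
completed = add_blocks(predicted, residual)
```

For an i-frame block the output of `idct_2d` would be used directly, without the addition step.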
Under ideal circumstances, no information is lost by using an FDCT to encode or an IDCT to decode a block of pixel component values. Consequently, under these ideal circumstances, a decoded version of a video frame is identical to the original version of the video frame. However, computing an FDCT or an IDCT may be computationally expensive because the computation involves real numbers and a significant number of multiplication operations. For this reason, real numbers used in FDCTs and IDCTs are frequently approximated using limited precision numbers. Rounding errors result from using limited precision numbers to represent real number values. Furthermore, quantization and dequantization may contribute additional errors.
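The effect of limited precision can be illustrated with a small experiment, sketched below under stated assumptions. The transform factors c(k)·cos(π(2n+1)k/2N) are rounded to 8 fractional bits to imitate a fixed-point table; that bit width, the sample values, and all names in the code are choices made for this example rather than properties of any particular implementation.

```python
import math

N = 8


def basis(k, n):
    """Exact transform factor c(k) * cos(pi * (2n + 1) * k / (2N))."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def limited(value, fraction_bits):
    """Round value to the nearest multiple of 2**-fraction_bits."""
    scale = 1 << fraction_bits
    return round(value * scale) / scale


def fdct(s, table):
    return [sum(table[k][n] * s[n] for n in range(N)) for k in range(N)]


def idct(t, table):
    return [sum(table[k][n] * t[k] for k in range(N)) for n in range(N)]


exact_table = [[basis(k, n) for n in range(N)] for k in range(N)]
coarse_table = [[limited(v, 8) for v in row] for row in exact_table]

# Hypothetical row of pixel component values.
samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]


def round_trip_error(table):
    """Largest absolute difference after a forward and inverse transform."""
    rebuilt = idct(fdct(samples, table), table)
    return max(abs(a - b) for a, b in zip(samples, rebuilt))


exact_error = round_trip_error(exact_table)    # floating-point noise only
coarse_error = round_trip_error(coarse_table)  # visible rounding error
```

With the full-precision table the round trip reproduces the input to within floating-point noise, whereas the rounded table leaves a measurably larger error even though no quantization is applied.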
Errors in the compression and decompression process may result in significant differences between the original uncompressed frame and the final uncompressed frame. For example, colors in the final uncompressed frame may differ from colors in the original uncompressed frame. Furthermore, errors caused by a mismatch between the encoder's implementation of the IDCT and the decoder's implementation of the IDCT may accumulate during the encoding and decoding of sequences of predicted frames. These accumulated errors are commonly referred to as “IDCT drift”.
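The accumulation described above can be imitated with a toy one-dimensional model, offered only as a sketch. It assumes that the encoder reconstructs its reference with a full-precision IDCT, that the decoder uses an IDCT whose factors are rounded to 8 fractional bits, that both start from an identical reference, and that no quantization is applied. The frame contents and all names are hypothetical.

```python
import math

N = 8


def basis(k, n):
    """Exact transform factor c(k) * cos(pi * (2n + 1) * k / (2N))."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


EXACT = [[basis(k, n) for n in range(N)] for k in range(N)]
# Assumed decoder table: factors rounded to 8 fractional bits.
COARSE = [[round(v * 256) / 256 for v in row] for row in EXACT]


def fdct(s):
    return [sum(EXACT[k][n] * s[n] for n in range(N)) for k in range(N)]


def idct(t, table):
    return [sum(table[k][n] * t[k] for k in range(N)) for n in range(N)]


# Hypothetical sequence: each "frame" is one row that brightens steadily.
frames = [[100.0 + step * (n + 1) for n in range(N)] for step in range(6)]

encoder_ref = list(frames[0])  # both sides assumed to share the first frame
decoder_ref = list(frames[0])
drift = []
for frame in frames[1:]:
    # The encoder codes the difference from its own reconstructed reference.
    residual = [x - r for x, r in zip(frame, encoder_ref)]
    coeffs = fdct(residual)
    encoder_ref = [r + d for r, d in zip(encoder_ref, idct(coeffs, EXACT))]
    decoder_ref = [r + d for r, d in zip(decoder_ref, idct(coeffs, COARSE))]
    drift.append(max(abs(a - b) for a, b in zip(encoder_ref, decoder_ref)))
```

In this model the list `drift` grows from one predicted frame to the next, because each frame adds a new mismatch on top of the error already present in the decoder's reference.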