1. Field of the Invention
The present invention relates to video encoders and, more particularly, to a method and apparatus for selecting a reference frame for motion estimation in video encoding.
2. Description of the Background Art
Video compression technology plays an important role in video storage and in video transmission over wired and wireless channels. It shrinks the representation of the original video data by applying mathematical transformation, quantization, and encoding to remove redundancies within a video sequence. Two important processes in video compression are motion estimation and motion compensation.
A video sequence is composed of a sequence of frames. When a video sequence is compressed, a frame is coded based on its difference from another frame. Each frame is divided into blocks. A block is coded based on its difference from a matching block in another frame, which is referred to as a “reference frame.” The process of identifying such a matching block is known as motion estimation. Motion estimation also determines the displacement of the best matching block relative to the reference block; this displacement is referred to as the “motion vector.”
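To make block-based coding concrete, the following minimal Python sketch computes the residual that would be coded for one square block, given a motion vector that points to the matching block in the reference frame. It assumes frames stored as 2-D lists of luma samples indexed as frame[row][col], a motion vector expressed as a (row, column) offset, and a matching block lying entirely inside the reference frame; the function name `block_residual` and the example frames are hypothetical.

```python
def block_residual(cur, ref, top, left, size, mv):
    """Difference between a size x size block of `cur` and its match in `ref`.

    (top, left) is the block's top-left corner in the current frame and
    mv = (dy, dx) is the offset of the matching block in the reference frame.
    Assumes the matching block lies entirely inside `ref` (no padding).
    """
    dy, dx = mv
    return [[cur[top + y][left + x] - ref[top + y + dy][left + x + dx]
             for x in range(size)]
            for y in range(size)]


# Hypothetical example: the current frame is the reference frame shifted one
# column to the right, so the block at (1, 2) is matched exactly by (0, -1).
ref = [[(3 * y + 5 * x) % 17 for x in range(6)] for y in range(6)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(6)] for y in range(6)]
print(block_residual(cur, ref, 1, 2, 3, (0, -1)))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

With the correct motion vector the residual is all zeros, which is what makes a good match cheap to code; with the vector (0, 0) the same block leaves a non-zero residual.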
While there are many criteria and techniques for finding a motion vector, the most commonly used approach is to identify the displacement vector (i, j) that minimizes the distortion between two blocks:
$$D(i,j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left| r_{m,n} - S_{m+i,\,n+j} \right|, \qquad i, j \in [-p,\ p-1] \qquad \text{(Eq. 1)}$$

The quantity $r_{m,n}$ is a pixel (m, n) on a reference block r; $S_{m+i,n+j}$ is a pixel (m+i, n+j) on a candidate block S; [−p, p−1] is the search window for the motion vector; M×N is the dimension of a coding block; and D(i, j) is known as the sum of absolute differences (SAD).
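Equation 1 and the full search it implies can be sketched in Python as follows. This is a minimal illustration, not a production motion estimator. It assumes integer-pixel displacements, frames stored as 2-D lists indexed frame[row][col], a block r taken from the current frame at a hypothetical position (top, left), and a candidate pixel S(m+i, n+j) read from the reference frame at (top + m + i, left + n + j); candidates falling outside the reference frame are simply skipped rather than padded. The names `sad` and `full_search` are illustrative.

```python
def sad(cur, ref, top, left, M, N, i, j):
    """Eq. 1: sum of |r[m][n] - S[m+i][n+j]| over an M x N block.

    r is the block of `cur` whose top-left corner is (top, left); the
    candidate pixel S[m+i][n+j] is read from `ref` at (top+m+i, left+n+j).
    """
    return sum(abs(cur[top + m][left + n] - ref[top + m + i][left + n + j])
               for m in range(M) for n in range(N))


def full_search(cur, ref, top, left, M, N, p):
    """Test every displacement (i, j) in [-p, p-1]^2 and keep the lowest SAD."""
    H, W = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for i in range(-p, p):
        for j in range(-p, p):
            # Skip candidates whose block would extend outside the frame.
            if not (0 <= top + i <= H - M and 0 <= left + j <= W - N):
                continue
            d = sad(cur, ref, top, left, M, N, i, j)
            if best_sad is None or d < best_sad:
                best_mv, best_sad = (i, j), d
    return best_mv, best_sad


# Hypothetical 12 x 12 frames: the current frame is the reference shifted so
# that the 4 x 4 block at (4, 4) matches the reference at displacement (2, -1).
ref = [[12 * y + x for x in range(12)] for y in range(12)]
cur = [[ref[y + 2][x - 1] if y + 2 < 12 and x >= 1 else 0 for x in range(12)]
       for y in range(12)]
print(full_search(cur, ref, 4, 4, 4, 4, 4))  # ((2, -1), 0)
```

The search evaluates Eq. 1 at all (2p)² displacements, which is the brute-force baseline that faster motion estimation methods try to avoid.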
Video encoding standards, such as MPEG-2, H.263, MPEG-4 (part 2), and H.264 (also known as MPEG-4 part 10 or Advanced Video Coding (AVC)), provide a syntax for compressing a video sequence. In MPEG-2, H.263, and MPEG-4, motion estimation is computed using a single reference frame. This reference frame is the most recently encoded, and hence reconstructed, frame prior to coding the current frame. Newer standards, such as H.264, allow multiple reference frames in which the candidate block can be searched. Multiple reference frames help because some video sequences contain sudden scene changes or cuts, for which the best reference frame is not the most recently reconstructed frame. Allowing the encoder to use the best-matching reference frame, rather than always the most recently reconstructed frame, improves coding efficiency.
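Extending the single-reference search to several reference frames can be sketched as below. This is a minimal, hypothetical illustration assuming that `refs[0]` is the most recently reconstructed frame, that every frame in `refs` is searched with the same window [−p, p−1], and that the selection criterion is the plain SAD of Eq. 1; the function name `best_reference` and the scene-cut example are assumptions for illustration.

```python
def sad(cur, ref, top, left, M, N, i, j):
    """Eq. 1 for a single displacement (i, j)."""
    return sum(abs(cur[top + m][left + n] - ref[top + m + i][left + n + j])
               for m in range(M) for n in range(N))


def best_reference(cur, refs, top, left, M, N, p):
    """Search every displacement in [-p, p-1]^2 on every frame in `refs`.

    refs[0] is assumed to be the most recently reconstructed frame.
    Returns (ref_index, (i, j), sad) for the lowest SAD found; ties keep the
    earlier (more recent) reference and the earlier displacement.
    """
    best = None
    for k, ref in enumerate(refs):
        H, W = len(ref), len(ref[0])
        for i in range(-p, p):
            for j in range(-p, p):
                if not (0 <= top + i <= H - M and 0 <= left + j <= W - N):
                    continue  # candidate block would leave the frame
                d = sad(cur, ref, top, left, M, N, i, j)
                if best is None or d < best[2]:
                    best = (k, (i, j), d)
    return best


# Hypothetical scene cut: the most recent frame (flat) is unrelated to the
# current block, while an older frame contains an exact match.
old = [[12 * y + x for x in range(12)] for y in range(12)]
cur = [[old[y + 2][x - 1] if y + 2 < 12 and x >= 1 else 0 for x in range(12)]
       for y in range(12)]
flat = [[200] * 12 for _ in range(12)]
print(best_reference(cur, [flat, old], 4, 4, 4, 4, 4))  # (1, (2, -1), 0)
```

In this example the older frame (index 1) wins, which is the situation a single-reference encoder cannot exploit.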
One technique for selecting a candidate block for motion estimation among multiple reference frames is to test every block position within the search window of every reference frame using Equation 1. Finding a satisfactory result in this way is computationally costly and time-consuming, which makes the technique impractical for many real-time applications. Simplifying the decision of which reference frame to search for the candidate block would therefore reduce encoding cost.
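A rough operation count shows why testing every block on every reference frame is expensive. The sketch below assumes CIF frames (352 × 288), 16 × 16 blocks, a search range p = 16, and one or five reference frames; these parameters and the helper name `abs_diff_count` are illustrative and not taken from any particular encoder.

```python
def abs_diff_count(width, height, block, p, num_refs):
    """Absolute differences needed for a full search of one frame.

    Assumes the frame is tiled exactly by block x block blocks, every
    displacement in [-p, p-1]^2 is evaluated (no boundary clipping, no early
    termination), and each evaluation of Eq. 1 costs block * block
    absolute differences.
    """
    blocks = (width // block) * (height // block)  # 22 * 18 = 396 for CIF
    positions = (2 * p) ** 2                       # candidate displacements
    return blocks * positions * block * block * num_refs


one = abs_diff_count(352, 288, 16, 16, 1)
five = abs_diff_count(352, 288, 16, 16, 5)
print(one, five)  # 103809024 519045120
```

Under these assumptions a single reference frame already needs about 104 million absolute differences per frame; with five reference frames the count rises to about 519 million per frame, or roughly 15.6 billion per second at 30 frames per second, because the cost grows linearly with the number of reference frames searched.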
The H.264 reference implementation uses an exhaustive approach to determine the reference frame. It examines all possible reference frames together with their corresponding bit-rate cost and distortion. The goal of this technique, referred to as rate-distortion optimization, is to achieve the best encoding result in terms of visual quality and coding rate. To perform rate-distortion optimization, the encoder searches exhaustively for the best reference frame, in the rate-distortion sense, among all possible reference frames. As a result, the computational complexity of an encoder using rate-distortion optimization increases dramatically, which limits or prohibits its use in practical applications such as real-time video communications.
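The rate-distortion idea can be sketched by replacing the pure SAD criterion with a Lagrangian cost J = D + λ·R, where D is the distortion (here the SAD of Eq. 1) and R is an estimate of the bits needed to signal the choice. H.264 encoders commonly use a cost of this form, but the rate model below is a deliberate simplification and an assumption: integer-pixel motion vectors coded with signed Exp-Golomb se(v) codes against a zero predictor, the reference index coded with ue(v), and a hypothetical multiplier `lam`. The function names are illustrative. Note that the loop still visits every displacement on every reference frame, which is exactly the cost described above.

```python
def ue_bits(k):
    """Length of the unsigned Exp-Golomb code ue(v) for code number k >= 0."""
    return 2 * (k + 1).bit_length() - 1


def se_bits(v):
    """Length of the signed Exp-Golomb code se(v): v > 0 -> 2v-1, else -2v."""
    return ue_bits(2 * v - 1 if v > 0 else -2 * v)


def sad(cur, ref, top, left, M, N, i, j):
    """Eq. 1 for a single displacement (i, j)."""
    return sum(abs(cur[top + m][left + n] - ref[top + m + i][left + n + j])
               for m in range(M) for n in range(N))


def rd_search(cur, refs, top, left, M, N, p, lam):
    """Exhaustive search minimizing J = SAD + lam * R over all refs and MVs.

    Simplified rate model (an assumption): R = ue_bits(ref_index) +
    se_bits(i) + se_bits(j). Returns (J, ref_index, (i, j)).
    """
    best = None
    for k, ref in enumerate(refs):
        H, W = len(ref), len(ref[0])
        for i in range(-p, p):
            for j in range(-p, p):
                if not (0 <= top + i <= H - M and 0 <= left + j <= W - N):
                    continue  # candidate block would leave the frame
                rate = ue_bits(k) + se_bits(i) + se_bits(j)
                cost = sad(cur, ref, top, left, M, N, i, j) + lam * rate
                if best is None or cost < best[0]:
                    best = (cost, k, (i, j))
    return best


# Same hypothetical scene-cut frames as a plain multi-reference search would use.
old = [[12 * y + x for x in range(12)] for y in range(12)]
cur = [[old[y + 2][x - 1] if y + 2 < 12 and x >= 1 else 0 for x in range(12)]
       for y in range(12)]
flat = [[200] * 12 for _ in range(12)]
print(rd_search(cur, [flat, old], 4, 4, 4, 4, 4, lam=4.0))  # (44.0, 1, (2, -1))
```

Here the exact match costs 11 bits (3 for the reference index, 5 and 3 for the two vector components), so J = 0 + 4.0 × 11 = 44.0; every other candidate has a larger SAD that outweighs any rate saving.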
Accordingly, there exists a need in the art for an improved method and apparatus for selecting a reference frame for motion estimation in a video encoder.