When video data is transmitted in real time, it is desirable to send as little data as possible. Data-reducing coding and decoding of digital video signals is, in many cases, based on a motion-compensated interpolation of picture element values (interframe coding). For this purpose, movement vectors, or displacement vectors, are required for picture element (pixel) blocks. These movement vectors are normally generated in the encoder by means of movement estimation. A system for processing motion video information generally employs a video encoder, which estimates motion within a video signal in order to process that signal.
Motion estimation is a very important process in a standard video encoder, such as an H.263 or MPEG-4 encoder, for obtaining a high video-compression rate by removing elements that repeat between adjacent frames. A motion-compensation technique uses motion estimation to predict, from a previous frame, the video signal most similar to an input video signal, and then converts and encodes the difference between the predicted video signal and the input video signal.
A video sequence is divided into groups of frames, and each group can be composed of a series of single frames. Each frame is roughly equivalent to a still picture, with the still pictures being updated often enough to simulate a presentation of continuous motion. A frame is further divided into macroblocks. In H.26P and MPEG-X standards, a macroblock is made up of 16 by 16 luma pixels and a corresponding set of chroma pixels, depending on the video format. A macroblock (MB) has an integer number of blocks, with the 8 by 8 pixel matrix being the smallest coding unit.
Video compression is a critical component for any application that requires transmission or storage of video data. Compression techniques compensate for motion by reusing stored information in previous frames (temporal redundancy). Compression also occurs by transforming data in the spatial domain to the frequency domain. Hybrid digital video compression, which exploits temporal redundancy by motion compensation and spatial redundancy by a transformation such as the Discrete Cosine Transform (DCT), has been adopted in H.26P and MPEG-X international standards.
Motion estimation is used to reduce the amount of transmitted data. It is performed over two frames, the current frame to be encoded and the previously coded frame (also called the reference frame), to find matching video data between the two frames. In practice, video compression, including motion estimation, is carried out macroblock-wise (a whole macroblock at a time) to facilitate hardware and software implementations. Motion estimation is performed for each macroblock using a 16 by 16 matrix of luma pixels. (Handling just luma pixels simplifies procedures, and the human visual system is more sensitive to luminance changes than to color changes.) The goal of motion estimation, for each macroblock, is to find the 16 by 16 data area in the previous frame that best represents the current macroblock. For a macroblock in the current frame, the best matching area in the previous frame is used as the prediction data for the current macroblock, while the prediction error, the residue after subtracting the prediction from the macroblock data, is free of temporal data redundancy. Temporal redundancy refers to the part of the current frame data that can be predicted from the previous frame. The removal of redundancy, that is, the subtraction of prediction values, eliminates the need to encode the repeated part of the data.
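The prediction and residue described above can be illustrated with a minimal Python sketch. It assumes frames are stored as lists of rows of luma values and that a motion vector (vx, vy) has already been found; the function names `prediction_residual` and `reconstruct` are hypothetical and are used only for illustration.

```python
def prediction_residual(cur, ref, x, y, mv, size=16):
    """Residue left after subtracting the motion-compensated prediction.

    cur, ref: frames as lists of rows of luma values (assumed layout).
    (x, y):   top-left corner of the macroblock in the current frame.
    mv:       motion vector (vx, vy) pointing into the reference frame.
    """
    vx, vy = mv
    return [[cur[y + j][x + i] - ref[y + j + vy][x + i + vx]
             for i in range(size)]
            for j in range(size)]


def reconstruct(ref, residual, x, y, mv):
    """Rebuild the macroblock from the prediction plus the residue."""
    vx, vy = mv
    return [[ref[y + j + vy][x + i + vx] + residual[j][i]
             for i in range(len(residual[0]))]
            for j in range(len(residual))]
```

When the prediction matches the macroblock well, most residue values are zero or close to zero, which is why the residue is cheaper to encode than the macroblock data itself.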
Among the several algorithms for motion estimation, the block matching algorithm (BMA) is most frequently used because it is comparatively simple to compute. The BMA searches a search region of a previous frame for the block most similar to a current block. The full search block matching algorithm (FSBMA), the basic method, is optimal in terms of performance, but it is highly computation-intensive and requires special-purpose architectures to obtain real-time performance. Therefore, a high-speed algorithm such as the hierarchical search block matching algorithm (HSBMA) is used, in which motion estimation is performed by representing the input video frame and the previous video frame at several resolutions. In the HSBMA, a large-scale motion vector candidate is obtained from a video frame at a low resolution, and an optimum motion vector is then searched for within a video frame of a higher resolution. The multi-resolution search using multiple candidates and spatial correlation of the motion field (MRMCS) is a high-speed hierarchical search block matching algorithm that provides efficient motion estimation while retaining the HSBMA's suitability for hardware implementation.
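The coarse-to-fine idea behind a hierarchical search can be sketched in Python as follows. This is a simplified two-level illustration, not the HSBMA or MRMCS as specified in any cited reference: the 2-to-1 downsampling by averaging, the sum-of-absolute-differences cost, the single candidate passed between levels, the refinement radius of 1, and all function names are assumptions made for this example.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 neighbourhood."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * r][2 * c] + frame[2 * r][2 * c + 1]
              + frame[2 * r + 1][2 * c] + frame[2 * r + 1][2 * c + 1]) // 4
             for c in range(w)]
            for r in range(h)]


def block_cost(cur, ref, x, y, vx, vy, size):
    """Sum of absolute differences between two blocks (assumed cost)."""
    return sum(abs(cur[y + j][x + i] - ref[y + j + vy][x + i + vx])
               for j in range(size) for i in range(size))


def search(cur, ref, x, y, size, centre, radius):
    """Best ((vx, vy), cost) within `radius` of `centre`.

    Candidates whose block would fall outside the reference frame
    are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for vy in range(centre[1] - radius, centre[1] + radius + 1):
        for vx in range(centre[0] - radius, centre[0] + radius + 1):
            inside = (0 <= x + vx and x + vx + size <= w
                      and 0 <= y + vy and y + vy + size <= h)
            if not inside:
                continue
            cost = block_cost(cur, ref, x, y, vx, vy, size)
            if best is None or cost < best[1]:
                best = ((vx, vy), cost)
    return best


def hierarchical_search(cur, ref, x, y, size=16, radius=8):
    """Two-level search: coarse candidate first, then local refinement."""
    # Level 1: full search at half resolution, with half the block
    # size and half the search radius.
    coarse_mv, _ = search(downsample(cur), downsample(ref),
                          x // 2, y // 2, size // 2, (0, 0), radius // 2)
    # Level 2: scale the candidate up and refine it by +/-1 pixel
    # at full resolution.
    centre = (2 * coarse_mv[0], 2 * coarse_mv[1])
    return search(cur, ref, x, y, size, centre, 1)
```

With these illustrative defaults, the sketch evaluates 81 coarse candidates on 8 by 8 blocks plus 9 fine candidates on 16 by 16 blocks, instead of the 289 full-resolution candidates that an exhaustive search over the same range of 8 would require.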
A technique for the MRMCS algorithm is classified into upper, medium and lower steps based on the understanding that the resolution of the video is lowered in each step. About 90% of the calculation amount for the motion estimation is used in the medium and lower steps.
To estimate such motion, a method of recovering damaged data within one frame of a motion video is disclosed in U.S. Pat. No. 5,598,226, in which motion is estimated by searching for a block of a previous frame corresponding to a block of a current frame. In that method, the HSBMA is provided as a method of calculating a mean absolute error (MAE) between one block of a current frame and the peripheral blocks of its corresponding previous frame, and of comparing the MAE with a predetermined threshold value. That is, MAE0 is first calculated for blocks at the same position in a lower-resolution video, and this MAE0 is compared with the threshold value. If MAE0 is smaller than the threshold value, it is decided that there is no motion; otherwise, the MAE is calculated for the peripheral blocks to obtain a minimum MAE (MAEmin), and if MAEmin is greater than MAE0, it is likewise decided that there is no motion. Then, the motion vector corresponding to MAEmin is taken as the candidate for the next step, and a final motion vector is searched for by the same procedure at a higher video resolution.
Further, a method of estimating motion by using a pixel difference classification (PDC) is disclosed in U.S. Pat. No. 5,200,820, in which a threshold value is predetermined and, for each block within a search region of a previous frame, the difference between each of its pixels and the corresponding pixel of the block of a current frame is compared with the threshold value so as to classify the pixel as matching or mismatching. The classification results are then summed over all pixels of the block, and the search point with the largest sum among all search points is selected to estimate the motion.
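A minimal Python sketch of a pixel difference classification search is given below. It follows only the general description above and is not taken from the cited patent: the use of "less than or equal to" for the threshold test, the exhaustive scan of the search points, the tie-breaking rule, and the function names are assumptions.

```python
def pdc_count(cur, ref, x, y, vx, vy, threshold, size=16):
    """Number of pixels classified as matching for one candidate block.

    A pixel is counted as matching when its absolute difference is
    at most `threshold` (the exact comparison is an assumption).
    """
    return sum(1
               for j in range(size) for i in range(size)
               if abs(cur[y + j][x + i]
                      - ref[y + j + vy][x + i + vx]) <= threshold)


def pdc_search(cur, ref, x, y, m, n, threshold, size=16):
    """Return ((vx, vy), count) for the search point with most matches.

    The first search point reaching the largest count is kept;
    candidates outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_count = None, -1
    for vy in range(-n, n + 1):
        for vx in range(-m, m + 1):
            inside = (0 <= x + vx and x + vx + size <= w
                      and 0 <= y + vy and y + vy + size <= h)
            if not inside:
                continue
            count = pdc_count(cur, ref, x, y, vx, vy, threshold, size)
            if count > best_count:
                best_mv, best_count = (vx, vy), count
    return best_mv, best_count
```

Unlike an error-sum criterion, this criterion is maximized rather than minimized: a perfect match of a 16 by 16 block yields a count of 256.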
An adaptive step size motion estimation algorithm based on a statistical SAD (Sum of Absolute Differences) is disclosed in U.S. Pat. No. 6,014,181, in which a step size is varied by using a statistical distribution of an SAD of previous frames, instead of a fixed step size used in a TSS (Three Step Search) algorithm to improve a motion estimating speed.
The sum of absolute differences (SAD) is an effective and widely adopted criterion that accurately relates motion estimation to coding efficiency. For the macroblock at position (x, y), the SAD value between the current macroblock and a 16 by 16 block in the previous frame offset by (vx, vy) is
$$\mathrm{SAD}(vx, vy) = \sum_{j=0}^{15} \sum_{i=0}^{15} \left| p(x+i,\; y+j) - q(x+i+vx,\; y+j+vy) \right| \qquad \text{[SAD equation]}$$

where p(x+i, y+j) is a pixel value in the current macroblock of the current frame, and q(x+i+vx, y+j+vy) is a pixel value in the previous frame, in a 16 by 16 (i.e., 16×16) block that is offset by (vx, vy) from the current macroblock.
The summation indices i and j cover the area of the macroblock. If SAD(vx, vy) is the minimum in the pre-specified search range, then (vx, vy) is the motion vector for the macroblock. The motion estimation search range (M, N) is the maximum of (vx, vy), defining a window of data in the previous frame containing macroblock-sized matrices to be compared with the current macroblock. To be accurate, the search window must be large enough to represent the motion. On the other hand, the search range must be limited for practical purposes because of the high complexity involved in the computation of motion estimation.
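The SAD criterion and the selection of the minimizing offset can be sketched in Python as follows. The sketch assumes frames are stored as lists of rows of luma values, so that the pixel at horizontal position x and vertical position y is `frame[y][x]`; the function names, the skipping of out-of-frame candidates, and the tie-breaking rule are illustrative assumptions.

```python
def sad(cur, ref, x, y, vx, vy, size=16):
    """SAD between the macroblock at (x, y) in `cur` and the block
    offset by (vx, vy) in `ref`."""
    total = 0
    for j in range(size):
        for i in range(size):
            p = cur[y + j][x + i]            # p(x+i, y+j)
            q = ref[y + j + vy][x + i + vx]  # q(x+i+vx, y+j+vy)
            total += abs(p - q)
    return total


def full_search(cur, ref, x, y, m, n, size=16):
    """Return ((vx, vy), SAD) minimizing the SAD over the search range
    -m <= vx <= m, -n <= vy <= n.

    Candidates whose block would fall outside the reference frame are
    skipped; the first minimum found is kept.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for vy in range(-n, n + 1):
        for vx in range(-m, m + 1):
            inside = (0 <= x + vx and x + vx + size <= w
                      and 0 <= y + vy and y + vy + size <= h)
            if not inside:
                continue
            cost = sad(cur, ref, x, y, vx, vy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (vx, vy), cost
    return best_mv, best_cost
```

A call such as `full_search(cur, ref, 16, 16, 8, 8)` would return the motion vector and its SAD for the macroblock whose top-left corner is at (16, 16), for an illustrative search range of (8, 8).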
FIG. 2 is a drawing illustrating the spatial relationship between the macroblock in the current frame and the search window in the previous frame (prior art). If the motion vector range is defined to be (M, N), then the search window size is (16+2M, 16+2N). For TV or movie sequences, the motion vector range needs to be large enough to accommodate various types of motion content. For video conferencing and videophone applications, the search range can be smaller. Therefore, the choice of search range depends on both the application and the availability of deliverable technology. Given a motion estimation search range, the computational requirement is greatly affected by the exact method of covering the search window to obtain motion vectors. An exhaustive search technique, the full motion estimation search, covers all the candidate blocks in the search window to find the best match. In this case, (2M+1) × (2N+1) calculations of the cost function are required to obtain the motion vector for each macroblock. This computation cost is prohibitive for software implementations.
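The window size and the cost of a full search can be worked out with a few lines of Python. The function names are hypothetical, and the count of absolute differences assumes that the cost function is an SAD over a 16 by 16 macroblock.

```python
def search_window_size(m, n, block=16):
    """Width and height of the search window for motion vector range (m, n)."""
    return (block + 2 * m, block + 2 * n)


def full_search_candidates(m, n):
    """Number of cost-function evaluations per macroblock in a full search."""
    return (2 * m + 1) * (2 * n + 1)


def abs_diffs_per_macroblock(m, n, block=16):
    """Absolute-difference operations per macroblock, assuming an SAD cost."""
    return full_search_candidates(m, n) * block * block
```

For an illustrative range of M = N = 16, the search window is 48 by 48 pixels, the full search evaluates 33 × 33 = 1089 candidate blocks, and an SAD cost requires 1089 × 256 = 278,784 absolute differences for every macroblock.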
Such a motion estimation method may repair the damaged data to increase the resolution of video blocks, and may increase overall video encoding speed as compared with the fixed step size. However, motion estimation for all macroblocks requires a large amount of calculation, which prolongs the operating time of the motion estimator and increases power consumption.