Moving Picture Experts Group (MPEG) is an International Organization for Standardization (ISO) standard for compressing video data. Video compression is important in making video data files, such as full-length movies, more manageable for storage (e.g., in optical storage media), processing, and transmission. In general, MPEG compression is achieved by eliminating redundant and irrelevant information. Because video images typically consist of smooth regions of color across the screen, video information generally varies little in space and time. As such, a significant part of the video information in an image is predictable and therefore redundant. Hence, a first objective in MPEG compression is to remove the redundant information, leaving only the true or unpredictable information. Irrelevant video image information, on the other hand, is information that cannot be seen by the human eye under certain reasonable viewing conditions. For example, the human eye is less sensitive to noise at high spatial frequencies than to noise at low spatial frequencies, and less sensitive to loss of detail immediately before and after a scene change. Accordingly, the second objective in MPEG compression is to remove irrelevant information. The combination of redundant information removal and irrelevant information removal allows for highly compressed video data files.
MPEG compression incorporates various well-known techniques to achieve the above objectives including: motion-compensated prediction/estimation, Discrete Cosine Transform (DCT), quantization, and Variable-Length Coding (VLC). In general, prediction/estimation is a process in which past information is used to predict/estimate current information. There is typically a difference/error between the past information used and the actual/current information. As part of the compression scheme, this difference (instead of the actual/current video information) is transmitted for use in reconstructing/decoding a compressed video frame by essentially adding it to existing past information that may be referred to as a reference frame. How well the decompression process performs depends largely on the estimate of this difference. When successive video frames involve moving objects, the estimate must also include motion compensation. This is done through the use of motion vectors which are the displacement measurements of objects between successive video frames. These motion vectors are then additionally transmitted as part of the compression scheme to be used in reconstructing/decoding the compressed video frame.
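The prediction scheme described above can be illustrated with a minimal Python sketch. It assumes frames stored as lists of rows of integer intensities, and it deliberately ignores the DCT, quantization, and VLC stages, so the reconstruction here is exact, whereas in an actual codec it is generally only approximate. The function names `predict`, `encode_block`, and `decode_block` are hypothetical and chosen only for illustration.

```python
def predict(ref, bx, by, mv, n):
    """Return the n x n block of the reference frame displaced by motion vector mv."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]


def encode_block(cur, ref, bx, by, mv, n):
    """Encoder side: the difference between the current block and its prediction."""
    pred = predict(ref, bx, by, mv, n)
    return [[cur[by + y][bx + x] - pred[y][x] for x in range(n)] for y in range(n)]


def decode_block(ref, bx, by, mv, residual, n):
    """Decoder side: add the transmitted difference to the displaced reference block."""
    pred = predict(ref, bx, by, mv, n)
    return [[pred[y][x] + residual[y][x] for x in range(n)] for y in range(n)]


# Toy 6 x 6 frames (assumed data, for illustration only).
ref = [[x + 10 * y for x in range(6)] for y in range(6)]
cur = [[(3 * x + 7 * y) % 50 for x in range(6)] for y in range(6)]

mv = (1, -1)  # motion vector assumed to have been found by some search
residual = encode_block(cur, ref, 2, 2, mv, 2)
rebuilt = decode_block(ref, 2, 2, mv, residual, 2)
```

Only `mv` and `residual` need to be transmitted; the decoder already holds `ref`. The better the motion estimate, the smaller the values in `residual` and the fewer bits they cost to encode.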
One of the motion-compensated estimation techniques that is most suitable for hardware implementation, due to its consistency and simplicity, is block matching. In block matching, motion is estimated on the basis of blocks, and a motion vector is generated for each block under the assumption that all the pixels within a block have the same motion activity. In short, a block from a search area in the reference video frame (i.e., a frame that has been received and/or processed previously) is identified through a search based on a match selection criterion relative to a block from a present frame. Such a selection criterion is typically designed to ensure a minimized estimation difference. The most effective search, but also the most processing and computing intensive, is a full exhaustive search in which every block within the search area is examined and the corresponding computation made. If a search area is limited to ±16 pixels displacement in the X and Y directions, then the total number of matches that need to be made is approximately (2×16+1)² = 1089. The match selection criterion used for the full search may be the Sum of Absolute Differences (SAD) (other match selection criteria include mean absolute difference, mean square difference, etc.). The SAD for a block A of size N×N inside the current frame compared to a block B at a displacement (Δx, Δy) from A in the previous (or reference) frame is defined as:
$$\mathrm{SAD}(\Delta x, \Delta y) = \sum_{x,y=1}^{N} \left| I_A(x, y) - I_B(x + \Delta x,\; y + \Delta y) \right|$$
where I is the intensity level of a pixel.
As shown in the SAD equation above, an addition and a subtraction operation are required for each pixel match. Hence, with 1089 candidate positions, an approximate total of 2 × 1089 = 2178 operations are required for each pixel in a full search. Consequently, each macroblock (16×16 = 256 pixels) requires approximately 2178 × 256, or about 557K, operations, which is processor intensive and therefore undesirable. The corresponding blocks from the current frame and the reference frame with the smallest SAD value are then selected as the best match (i.e., having the least difference/error) for transmission as compression information. The associated motion (displacement) vector is computed from the selected pair of blocks for use as motion compensation information.
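The SAD criterion and the full search described above can be sketched in Python as follows. This is a minimal illustration, not a hardware-oriented implementation: frames are assumed to be lists of rows of integer intensities, candidate blocks falling outside the reference frame are simply skipped, and the names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced from it by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Examine every displacement within +/- r pixels; return (sad, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates that do not lie fully inside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Operation count from the text: +/-16 pixel search area, 16 x 16 macroblock.
candidates = (2 * 16 + 1) ** 2           # 1089 candidate positions
ops_per_pixel = 2 * candidates           # one subtraction and one addition each
ops_per_macroblock = ops_per_pixel * 16 * 16

# Toy frames (assumed data): cur is ref shifted right by 2 and down by 1.
W, H = 16, 16
ref = [[(7 * x + 13 * y + x * y) % 251 for x in range(W)] for y in range(H)]
cur = [[ref[(y - 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]

best = full_search(cur, ref, 6, 6, 4, 3)
```

Because the content of the current block sits two pixels to the left and one pixel up in the reference frame, the search returns a zero SAD at displacement (-2, -1), following the sign convention of the SAD equation.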
To reduce the processing needed while minimizing estimation difference/error, other search techniques have been developed. One such search technique is the Diamond Search (DS). In a DS, which is based on the assumption that motion vectors are in general center biased, a search area (in a block in the reference frame) includes nine checking points as shown for example in FIG. 1A. The search begins with an examination of the center checking point of the search area. This portion of the search (e.g., involving nine checking points) is known as a Large Diamond Search (LDS). If the minimum SAD is found at the center, then four additional checking points representing a smaller diamond, as shown in FIG. 1B, are examined and the search stops. This portion of the search (e.g., involving four additional checking points) is known as a Small Diamond Search (SDS). Otherwise, depending on the position of the current minimum, additional checking points will have to be examined as shown for example in FIGS. 1C and 1D. By treating the present minimum as the center of a new large diamond, the process continues until the minimum is found at a center point, at which point a smaller diamond with four additional checking points is examined. A discussion of the DS is presented for example in “A New Predictive Diamond Search Algorithm for Block Based Motion Estimation” by A. Tourapis, G. Shen, M. Liou, O. Au, and I. Ahmad, Proc. of SPIE Conf. on Visual Communication and Image Processing, Vol. 3, pp. 1365-1373, 20-23 Jun. 2000. This material is incorporated herein by reference in its entirety.
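The LDS/SDS procedure described above can be sketched in Python as follows. The figures are not reproduced here, so the exact checking-point offsets are an assumption: the sketch uses the commonly described large diamond (center, four points at distance 2 along the axes, four diagonal points) and small diamond (four points at distance 1 along the axes). The `cost` argument is a hypothetical callable returning the SAD for a candidate displacement; a practical implementation would also avoid re-evaluating checking points already visited.

```python
# Assumed checking-point offsets (dx, dy); the center is listed first so that
# ties are resolved in favor of the center and the search terminates.
LDS = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SDS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def diamond_search(cost, start=(0, 0), max_steps=64):
    """Return the displacement chosen by a large/small diamond search.

    cost(dx, dy) is assumed to return the matching error (e.g., SAD) for a
    candidate displacement; max_steps is a safety bound on LDS iterations.
    """
    cx, cy = start
    for _ in range(max_steps):
        # Large diamond: find the best of the nine checking points.
        ox, oy = min(LDS, key=lambda o: cost(cx + o[0], cy + o[1]))
        if (ox, oy) == (0, 0):
            break  # minimum is at the center: switch to the small diamond
        cx, cy = cx + ox, cy + oy  # re-center the large diamond on the minimum
    # Small diamond: refine with the four additional checking points.
    ox, oy = min(SDS, key=lambda o: cost(cx + o[0], cy + o[1]))
    return cx + ox, cy + oy


# Toy error surface (assumed) with a single minimum at displacement (3, -1).
def toy_cost(dx, dy):
    return abs(dx - 3) + abs(dy + 1)


result = diamond_search(toy_cost)
```

On this single-minimum surface the search reaches (3, -1) after evaluating only a few large diamonds, far fewer points than a full search over the same area.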
While a DS typically requires only a fraction of the processing required in a full search, a DS is susceptible to getting caught in local minima, which are not desirable because they may not represent the best matched macroblock. In other words, while the DS relies on the inherent center-biased nature of motion vectors and allows for iterative searches to examine additional checking points, it has no mechanism to ensure that the minimum SAD can be quickly determined.
Moreover, the paper “A New Predictive Diamond Search Algorithm for Block Based Motion Estimation” cited above takes advantage of the high correlation of neighboring macroblocks (and therefore their associated motion vectors) and uses as the starting point of the DS the median value of the motion vectors of three neighboring blocks, each defined relative to the current macroblock that is designated the center of the search area: the left macroblock LMB, the up macroblock UMB, and the up-right macroblock URMB. In other words, instead of using the center pixel of the diamond as a starting point, the median of the motion vectors of these three neighboring blocks is used. By taking into consideration the correlation between adjacent macroblocks (and their associated motion vectors), an improved prediction can be made, thereby shortening the search.
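The median predictor described above can be sketched in Python as follows. The text does not state how the median of three vectors is taken; this sketch assumes a component-wise median (the X and Y components are handled independently), and the name `median_mv` is hypothetical.

```python
def median_mv(lmb, umb, urmb):
    """Component-wise median of the motion vectors of the left (LMB),
    up (UMB), and up-right (URMB) neighboring macroblocks.

    Each argument is a (dx, dy) tuple. The component-wise definition is an
    assumption made for this sketch.
    """
    xs = sorted(v[0] for v in (lmb, umb, urmb))
    ys = sorted(v[1] for v in (lmb, umb, urmb))
    return xs[1], ys[1]  # middle element of each sorted triple


# Example neighbor vectors (assumed data).
start = median_mv((2, -1), (4, 0), (3, 5))
```

The resulting vector would then serve as the starting center of the diamond search in place of the zero displacement.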
Furthermore, in MPEG-4, in addition to a stage involving the aforementioned DS, which is an integer pixel motion estimation, a half-pixel motion estimation stage may be implemented. As its name suggests, a half-pixel motion estimation involves a search of checking points that lie halfway between two checking points of the integer pixel motion estimation search. The values at these half-pixel positions can easily be interpolated from the checking points of the integer pixel motion estimation search. The half-pixel motion estimation stage is designed to improve the accuracy of motion vectors. See “A New Predictive Diamond Search Algorithm for Block Based Motion Estimation” by W. Zheng, I. Ahmad, and M. Liou, International Conf. on Information Systems, Analysis and Synthesis, SCI 2001/ISAS 2001 Vol. 13, 2001. It is desirable to reduce even further the processing required for motion-compensated estimation, which translates to less power and a smaller die size.
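The interpolation of half-pixel positions mentioned above can be sketched in Python as follows. The sketch assumes simple bilinear averaging of the neighboring integer-position pixels with round-half-up integer arithmetic; the exact rounding rule used by a given codec may differ. Coordinates are expressed in half-pixel units and assumed non-negative, and the name `half_pel` is hypothetical.

```python
def half_pel(ref, x2, y2):
    """Sample the reference frame at position (x2 / 2, y2 / 2).

    x2 and y2 are non-negative coordinates in half-pixel units. Integer
    positions return the pixel itself; half positions return the rounded
    average of the two or four surrounding integer-position pixels.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)  # equal to x0, y0 at integer positions
    s = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
    return (s + 2) // 4  # divide by 4, rounding half up


# Toy 2 x 2 reference patch (assumed data).
patch = [[10, 20],
         [30, 44]]

center = half_pel(patch, 1, 1)  # halfway between all four pixels
```

A half-pixel refinement stage would evaluate the matching criterion at the eight half-pixel positions surrounding the best integer-pixel match and keep the position with the smallest error.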
Thus, a need exists for a more efficient, less complex, and effective motion-compensated estimation technique that can be easily implemented in hardware.