1. Field of the Invention
The current invention relates to video compression and, in particular, to a motion estimation search method for video coding.
2. Background Information
A video is a sequence of still images, called frames, representing a scene in motion. Often the only difference between successive frames in a video results from an object in the scene moving or from the camera moving. Most of the image information in a frame is also present in the previous frame, but has moved to a different location within the frame. This characteristic can be used to compress the amount of data needed to represent a video clip. Full image information is stored only for certain reference frames within the video clip. The reference frames are divided into a plurality of non-overlapping image blocks. The only information stored for non-reference frames is the motion vectors, which describe the movement of image blocks relative to a reference frame. If, for example, every fourth frame in a video clip is a reference frame, the amount of image information stored for the clip could theoretically be reduced by 75%. This technique is widely adopted by video compression standards such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263 and H.264/AVC.
Motion estimation is used to find the motion vectors that describe the movement of image blocks between frames. In motion estimation, the current video frame is divided into non-overlapping blocks of equal size and, for each block, the best-matched block in a reference frame is determined within a predefined search window. A search method is used to move through the search window, evaluating candidate blocks within it. The best match is normally found by minimizing a block distortion measure (BDM) between the blocks, using well-known measures such as the Mean Squared Error (MSE), the Sum of Squared Errors (SSE), the Mean Absolute Difference (MAD) or the Sum of Absolute Differences (SAD). SAD, for example, takes the absolute value of the difference between each pair of corresponding pixels in the two blocks and sums these absolute differences to give a simple metric of block similarity. SAD takes every pixel in the blocks into account and, because of its simplicity, is extremely fast to calculate.
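To make the SAD measure concrete, the following is a minimal Python sketch. It assumes that a block is given as a list of rows of integer pixel values and that both blocks have the same dimensions; the function names `sad` and `mad` and this data layout are illustrative assumptions rather than part of the described method. MAD is included only to show how it relates to SAD (SAD divided by the number of pixels).

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences (SAD) between two equally sized pixel blocks.

    Each block is a list of rows; each row is a list of integer pixel values.
    """
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have identical dimensions")
    # Absolute difference of every pair of corresponding pixels, summed.
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def mad(block_a, block_b):
    """Mean Absolute Difference: SAD divided by the number of pixels."""
    return sad(block_a, block_b) / (len(block_a) * len(block_a[0]))


current = [[10, 12], [11, 13]]
candidate = [[9, 12], [14, 13]]
print(sad(current, candidate))  # 1 + 0 + 3 + 0 = 4
print(mad(current, candidate))  # 4 / 4 = 1.0
```

Only subtraction, absolute value and addition are needed per pixel, which is why SAD is cheap enough to be evaluated at many candidate positions.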
The most straightforward search method is the full search (FS), which exhaustively evaluates every possible candidate block within the search window. It has been estimated that FS can consume up to 70% of the total computation of the video encoding process. To overcome this problem, a number of fixed search patterns have been proposed to speed up motion estimation. Well-known examples include the one-at-a-time search (OTS), the three-step search (3SS) [2], the new three-step search (N3SS) [3], the four-step search (4SS) [4], the block-based gradient descent search (BBGDS) [5], the diamond search (DS) [6], the hexagon-based search (HEXBS) [7], the cross-diamond search (CDS) [8] and the cross-diamond-hexagon search (CDHS) [9]. The 3SS, N3SS, 4SS and BBGDS all employ square search point patterns of different sizes to search for the best-matched block. The DS, HEXBS, CDS and CDHS employ one or more cross-shaped, diamond-shaped or hexagon-shaped search point patterns to fit the center-biased characteristic of the motion vector distribution, achieving fewer search points and better prediction accuracy than search methods using square-shaped patterns. These search methods can, however, easily be trapped in local minima, as they rely primarily on the unimodal error surface assumption, i.e. that the matching error decreases monotonically towards the global minimum. In most real-world sequences, local minima may be spread over the search window, especially in sequences with complex motion content. Thus, the performance of these fixed search patterns depends heavily on the motion content of a video sequence. The BBGDS is suitable for estimating slow motion blocks, while the 3SS is relatively more suitable for estimating fast motion blocks. The 4SS, DS and HEXBS can achieve better prediction accuracy for moderate motion blocks.
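The cost of FS follows directly from the window size: with a search range of ±w pixels it evaluates (2w + 1)² candidates, e.g. 225 candidates for w = 7. The sketch below assumes a hypothetical callable `bdm(x, y)` that returns the block distortion (for example SAD) of the candidate displaced by (x, y) from the current block; the names `full_search`, `bdm`, `w` and the toy error surface are illustrative assumptions, not an optimized encoder routine.

```python
def full_search(bdm, w):
    """Exhaustively evaluate every displacement in the [-w, w] x [-w, w] window.

    bdm(x, y) returns the block distortion of the candidate at displacement (x, y).
    Returns (best_motion_vector, number_of_points_evaluated).
    """
    best_mv, best_cost, evaluated = (0, 0), float("inf"), 0
    for y in range(-w, w + 1):
        for x in range(-w, w + 1):
            cost = bdm(x, y)
            evaluated += 1
            if cost < best_cost:  # keep the first minimum encountered
                best_mv, best_cost = (x, y), cost
    return best_mv, evaluated


def bowl(x, y):
    # Hypothetical unimodal error surface with its global minimum at (3, -2).
    return (x - 3) ** 2 + (y + 2) ** 2


print(full_search(bowl, 7))  # ((3, -2), 225)
```

FS is guaranteed to find the global minimum of the BDM within the window, but the number of evaluations grows quadratically with the search range, which is the cost the fast search patterns try to avoid.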
The more advanced CDS and CDHS methods can provide good prediction accuracy for slow to moderate motion blocks. Unfortunately, real video sequences usually contain a wide range of motion content, and these fixed search patterns cannot provide satisfactory motion estimation results for such sequences. Search pattern switching has been proposed to solve this problem by adaptively selecting among the 3SS, 4SS, DS and BBGDS patterns to achieve higher prediction accuracy. However, the performance of these methods depends largely on the accuracy of their motion content estimators, and some of these estimators are also quite complex to implement in practice.
Besides the initial search point pattern, it has also been found that the initial minimum search point is very important. A hybrid unsymmetrical-cross multi-hexagon-grid search (UMHexagonS) [13], which takes advantage of several search patterns, has been proposed for H.264/AVC [14] and adopted in the JM reference software [15]. In UMHexagonS, an unsymmetrical-cross search and an uneven multi-hexagon-grid search are first performed over a wide search range to find a good initial minimum point, which then leads the search path in the following steps using an extended hexagon-based search. UMHexagonS reduces computation by up to 90% compared with FS while still maintaining good rate-distortion performance. However, the uneven multi-hexagon-grid search step is computationally expensive compared with the other search steps in UMHexagonS. A multi-path search (MPS) method [16] has also been proposed, which uses more than one search path to avoid following a wrong path misled by the initial minimum search point. Basically, MPS is a multi-path BBGDS that follows multiple descending gradient paths. For each candidate path, the compact square-shaped pattern of BBGDS is used. Although MPS can provide robust motion estimation accuracy, its computational requirement increases significantly, especially for sequences with complex motion.
By way of background, a brief explanation of OTS, BBGDS and MPS will now be given.
One-at-a-Time Search (OTS)
The strategy of OTS is to keep searching along a particular direction until the minimum point along that direction is found. If, for example, the current minimum BDM point is at position (0, 1) and the upward OTS is performed, then the point immediately above it, i.e. point (0, 2), is searched. If point (0, 2) has a lower distortion than point (0, 1), point (0, 2) is set as the current minimum point. Point (0, 3), which is above point (0, 2), is then searched. The search continues until the current minimum point lies between two points with higher values, or until the search window boundary is reached. As OTS follows the descending gradient path in a particular direction, it can be considered a 1-D gradient descent search in that direction. This is an efficient search strategy because it does not waste effort probing unknown terrain of the error surface. Moreover, it is easily implemented in hardware, and data access is very efficient because each search point is either horizontally or vertically adjacent to the previous one.
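The 1-D descent performed by a single OTS pass can be sketched as follows. As in the example above, y is assumed to increase upwards, so the step (0, 1) means "up". The callable `bdm(x, y)`, the search range `w`, the function name `ots_direction` and the toy error surface are hypothetical names chosen for illustration.

```python
def ots_direction(bdm, start, step, w):
    """One-at-a-time search along a single direction.

    Starting from `start`, keep moving by `step` (e.g. (0, 1) for upwards) while
    the BDM keeps decreasing. Stop when the next point is not lower, so the
    current point is bracketed by higher values, or when the [-w, w] window
    boundary is reached.
    Returns (minimum_point, its_bdm, number_of_points_evaluated).
    """
    x, y = start
    cost = bdm(x, y)
    evaluated = 1
    while True:
        nx, ny = x + step[0], y + step[1]
        if abs(nx) > w or abs(ny) > w:  # search window boundary reached
            break
        next_cost = bdm(nx, ny)
        evaluated += 1
        if next_cost >= cost:  # no further descent in this direction
            break
        x, y, cost = nx, ny, next_cost
    return (x, y), cost, evaluated


def surface(x, y):
    # Hypothetical error surface whose minimum along x = 0 lies at y = 4.
    return abs(x) + abs(y - 4)


print(ots_direction(surface, (0, 1), (0, 1), 7))  # ((0, 4), 0, 5)
```

Each new point differs from the previous one by a single unit step, which is what makes the access pattern so regular.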
The first OTS-based block motion estimation method was proposed in 1985 by Srinivasan et al. [1] and applies the OTS strategy first in the horizontal and then in the vertical direction. An example of an OTS search path is shown in FIG. 1. The search window center and its two horizontally adjacent points, i.e. the points at (0, 0), (−1, 0) and (1, 0), are evaluated first. If point (−1, 0), i.e. the left neighbor of the search window center, has the lowest BDM, OTS is performed in the left direction from the search window center. If point (1, 0), i.e. the right neighbor of the search window center, has the lowest BDM, OTS is performed in the right direction. OTS stops when the minimum BDM point lies between two points with higher BDMs. This point is regarded as the lowest-distortion point in the horizontal direction of the search window center and is denoted (s, 0); if the center itself has the lowest BDM, s = 0. OTS in the vertical direction is then performed using point (s, 0) as the center. The BDMs of points (s, 1) and (s, −1) are calculated. If point (s, 1) has the lower distortion, OTS is performed in the upward direction from (s, 1). If point (s, −1) has the lower distortion, OTS is performed in the downward direction. When the minimum BDM point lies between two points with higher distortions, the motion vector (MV) pointing to that point is returned. The OTS method thus performs a 1-D gradient descent search on the error surface twice. Although it uses fewer search points than other fast block motion estimation methods, its prediction quality is low, because two 1-D gradient descent searches are insufficient to estimate the position of the global minimum correctly.
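A compact sketch of the two-pass OTS method described above follows, under stated assumptions: a hypothetical callable `bdm(x, y)` returning the distortion at displacement (x, y), a square window of range ±w, and y increasing upwards. A cache ensures that each search point is evaluated only once, so the returned count is the number of distinct points searched. The helper names and the toy surface are illustrative only.

```python
def ots_search(bdm, w):
    """Two-pass OTS: horizontal 1-D descent from the window centre, then vertical.

    bdm(x, y) returns the distortion of the candidate at displacement (x, y);
    the search window is [-w, w] in both axes.
    Returns (motion_vector, number_of_distinct_points_evaluated).
    """
    cache = {}

    def cost(p):
        # Evaluate each search point at most once.
        if p not in cache:
            cache[p] = bdm(*p)
        return cache[p]

    def inside(p):
        return abs(p[0]) <= w and abs(p[1]) <= w

    def one_pass(center, axis):
        # axis is (1, 0) for the horizontal pass and (0, 1) for the vertical pass.
        before = (center[0] - axis[0], center[1] - axis[1])
        after = (center[0] + axis[0], center[1] + axis[1])
        # The centre is listed first, so it wins ties and the pass stays put.
        best = min([p for p in (center, before, after) if inside(p)], key=cost)
        if best == center:
            return center
        step = (best[0] - center[0], best[1] - center[1])
        while True:
            nxt = (best[0] + step[0], best[1] + step[1])
            if not inside(nxt) or cost(nxt) >= cost(best):
                return best  # bracketed by a higher BDM, or at the boundary
            best = nxt

    horizontal_min = one_pass((0, 0), (1, 0))  # the point (s, 0)
    mv = one_pass(horizontal_min, (0, 1))      # vertical pass from (s, 0)
    return mv, len(cache)


def separable(x, y):
    # Hypothetical separable error surface with its minimum at (3, -2).
    return abs(x - 3) + abs(y + 2)


print(ots_search(separable, 7))  # ((3, -2), 10)
```

On a separable surface like this one the two 1-D passes reach the true minimum with only 10 evaluations; on surfaces whose valleys are not aligned with the axes, the horizontal pass can stop at a point from which the vertical pass cannot reach the global minimum, which is the weakness noted above.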
Block-Based Gradient Descent Search (BBGDS)
BBGDS [5] was proposed by Liu et al. in 1996. It is a 2-D gradient descent search. An example of a BBGDS search path is shown in FIG. 2. The BDMs of the search window center point and its eight adjacent points are determined. If a lower-distortion point is found among the eight points, that point becomes the next search center. The BDMs of any new adjacent points around the new center are determined, and the center moves to the point with the lowest distortion. The procedure is repeated until the center point is surrounded by eight adjacent points that all have higher BDMs. The motion vector pointing to this final center point is returned. The eight adjacent points searched by BBGDS correspond to eight directions from the search center, i.e. the upper, lower, left, right, upper-left, upper-right, lower-left and lower-right directions, and so cover all possible directions from the search center. In other words, BBGDS performs a small-scale 2-D gradient descent search and moves one step at a time towards the global minimum along a descending gradient path. BBGDS has much better prediction quality, in terms of PSNR, than the OTS method. However, it is also easily trapped in a local minimum near the initial search window center, because it selects only one of the eight points adjacent to the current lowest-distortion point as the center of the next 2-D gradient descent step. When there is more than one descending gradient path, BBGDS follows only one of them and may therefore be led to a local minimum instead of the global minimum.
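The BBGDS procedure can be sketched in a few lines of Python. This is a minimal illustration, not Liu et al.'s implementation: it assumes a hypothetical callable `bdm(x, y)` giving the distortion at displacement (x, y) and a square window of range ±w, and it uses a cache so that, as described above, only the new adjacent points around each new center are actually evaluated.

```python
def bbgds(bdm, w):
    """Block-based gradient descent search on the displacement grid [-w, w]^2.

    Returns (motion_vector, number_of_distinct_points_evaluated).
    """
    cache = {}

    def cost(p):
        # Points already evaluated around a previous centre are not recomputed.
        if p not in cache:
            cache[p] = bdm(*p)
        return cache[p]

    # The compact 3x3 square pattern: the eight neighbours of the search centre.
    offsets = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
    center = (0, 0)
    while True:
        neighbours = [(center[0] + dx, center[1] + dy) for dx, dy in offsets]
        neighbours = [p for p in neighbours if abs(p[0]) <= w and abs(p[1]) <= w]
        best = min(neighbours, key=cost)  # only ONE descending path is followed
        if cost(best) >= cost(center):
            # Centre is enclosed by neighbours with higher (or equal) BDMs.
            return center, len(cache)
        center = best


def bowl(x, y):
    # Hypothetical unimodal error surface with its minimum at (3, -2).
    return (x - 3) ** 2 + (y + 2) ** 2


print(bbgds(bowl, 7))  # ((3, -2), 22)
```

The `min` over the eight neighbours is where the single-path behaviour comes from: whenever two neighbours both lie on descending paths, only the steeper one is followed, so a local minimum on that path ends the search.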
Multi-Path Search (MPS)
The MPS method was proposed by Goel et al. in 2006 [16]. Basically, MPS is a BBGDS that searches multiple descending gradient paths. For each candidate path, the compact square-shaped search point pattern of BBGDS is used. The method converges when no new descending gradient path is found. The MPS method is illustrated in FIG. 3 and comprises the following steps.
Step 1—Calculate the BDM of the search window center and set its distortion value as MIN.
Step 2—Calculate the BDMs of the eight points adjacent to the search window center. Mark each search point with a distortion value lower than MIN as an ANCHOR point. If no ANCHOR point is found, go to Step 5 with the zero motion vector (ZMV); otherwise go to Step 3.
Step 3—Set MIN to the lowest distortion value amongst the ANCHOR points and go to Step 4.
Step 4—Around each of the ANCHOR points, calculate the BDMs of 3 or 5 neighboring points, depending on the position of the ANCHOR point. Any search point with a distortion lower than MIN is marked as an ANCHOR point. If no new ANCHOR point is found, go to Step 5; otherwise go back to Step 3.
Step 5—The search is completed. Return with the final motion vector.
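Steps 1 to 5 can be sketched in Python as follows. This is a sketch under an interpretation assumed here, not Goel et al.'s reference code: in Step 4, only the adjacent points of each ANCHOR point that have not been evaluated before are computed, which typically gives 3 new points for an anchor reached by a horizontal or vertical move and 5 for one reached by a diagonal move (fewer where paths overlap). The callable `bdm(x, y)`, the window range `w` and the helper names are hypothetical.

```python
def mps(bdm, w):
    """Multi-path search following Steps 1-5; returns (motion_vector, points_evaluated)."""
    cache = {}

    def cost(p):
        # Each search point is evaluated at most once.
        if p not in cache:
            cache[p] = bdm(*p)
        return cache[p]

    def unvisited_neighbours(p):
        # Adjacent points inside the [-w, w] window that have not been evaluated yet.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                q = (p[0] + dx, p[1] + dy)
                if q != p and abs(q[0]) <= w and abs(q[1]) <= w and q not in cache:
                    yield q

    # Step 1: BDM of the search window centre becomes MIN.
    best = (0, 0)
    min_cost = cost(best)
    # Step 2: neighbours of the centre lower than MIN become ANCHOR points.
    anchors = [q for q in list(unvisited_neighbours(best)) if cost(q) < min_cost]
    while anchors:  # if no ANCHOR point exists, fall through to Step 5 with the ZMV
        # Step 3: MIN is the lowest distortion among the ANCHOR points.
        best = min(anchors, key=cost)
        min_cost = cost(best)
        # Step 4: expand around EVERY anchor, so several paths are followed at once.
        new_anchors = []
        for anchor in anchors:
            for q in list(unvisited_neighbours(anchor)):
                if cost(q) < min_cost:
                    new_anchors.append(q)
        anchors = new_anchors
    # Step 5: return the motion vector of the lowest-distortion point.
    return best, len(cache)


def bowl(x, y):
    # Hypothetical unimodal error surface with its minimum at (3, -2).
    return (x - 3) ** 2 + (y + 2) ** 2


print(mps(bowl, 7))                          # ((3, -2), 26)
print(mps(lambda x, y: abs(x) + abs(y), 7))  # ((0, 0), 9): zero motion vector
```

Because every ANCHOR point is expanded rather than only the best one, the number of evaluated points grows with the number of simultaneously descending paths; on this toy surface the sketch evaluates 26 points to reach the same minimum.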
The drawback of MPS is its inefficiency: it evaluates many search points in order to follow all candidate descending gradient paths.