In hierarchical motion estimation (ME), video frames are progressively reduced in resolution. Several layers are constructed. Each layer contains the same frame as the layer below it, with both dimensions reduced by a scaling factor. The scaling factor is usually 2, a small power of 2, or a power of another integer. The result is a pyramid: the lowest layer holds the original frame, and each higher layer holds the same frame at a progressively lower resolution.
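The following is a minimal sketch of pyramid construction. It assumes a single-channel (luma) frame stored as a NumPy array, the same integer scaling factor in both dimensions, and a simple box filter: each factor x factor block is averaged into one pixel. The function name `build_pyramid` is hypothetical. Real encoders may use other low-pass filters before decimation.

```python
import numpy as np

def build_pyramid(frame, levels, factor=2):
    """Return [layer 0 (original), layer 1, ...] where each layer is the
    previous one reduced by `factor` in both dimensions (box-filter average)."""
    pyramid = [frame.astype(np.float64)]
    for _ in range(1, levels):
        prev = pyramid[-1]
        # Crop so both dimensions divide evenly by the scaling factor.
        h = (prev.shape[0] // factor) * factor
        w = (prev.shape[1] // factor) * factor
        # Group pixels into factor x factor blocks and average each block.
        blocks = prev[:h, :w].reshape(h // factor, factor, w // factor, factor)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid
```

For example, `build_pyramid(frame, levels=3)` on a 64x48 frame yields layers of 64x48, 32x24 and 16x12. Layer 2 is the top of the pyramid, where the initial search described next would take place.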
After pyramids have been created for both the target frame (the frame being motion estimated) and a reference frame, a full search is performed for the target regions in the highest layer. A search window is defined in the reduced reference frame, and every candidate motion vector in that window is evaluated for each reduced block in the target frame. Each candidate block in the search window is compared with the reduced target block using a comparison metric (e.g., the sum of absolute differences (SAD) or the sum of squared differences (SSD)). The candidate with the best (lowest) metric value gives the motion vector.
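A minimal sketch of the top-layer full search follows. It uses SAD as the comparison metric and assumes square, block-aligned target regions of `block` pixels and a square search window of +/- `search_range` pixels. Candidates whose block would fall outside the reference are skipped. The names `sad` and `full_search` are hypothetical, chosen for illustration.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return float(np.abs(a - b).sum())

def full_search(target, reference, block=8, search_range=4):
    """Exhaustive block matching at one pyramid layer.

    Returns {(row, col): (dy, dx)}: for each block-aligned target region, the
    displacement into `reference` with the lowest SAD.
    """
    th, tw = target.shape
    rh, rw = reference.shape
    vectors = {}
    for by in range(0, th - block + 1, block):
        for bx in range(0, tw - block + 1, block):
            tgt = target[by:by + block, bx:bx + block]
            best, best_cost = (0, 0), float("inf")
            # Evaluate every candidate vector in the search window.
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    y, x = by + dy, bx + dx
                    # Skip candidates whose block would leave the reference frame.
                    if 0 <= y <= rh - block and 0 <= x <= rw - block:
                        cost = sad(tgt, reference[y:y + block, x:x + block])
                        if cost < best_cost:
                            best, best_cost = (dy, dx), cost
            vectors[(by // block, bx // block)] = best
    return vectors
```

The number of candidates per block is (2 * search_range + 1) squared. This is affordable at the top layer only because that layer is small.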
The resulting motion vectors are then propagated to the next layer down by multiplying each motion vector coordinate by the scaling factor. The scaled vectors become the centers of new searches in the next layer. Because resolution increases at the next layer, each scaled motion vector becomes the search center for several target regions in that layer. The exact number depends on the scaling factor and on the relative region sizes between layers. For example, with a scaling factor of 2×2 and equal region sizes, a single motion vector for a region in the higher layer seeds four regions in the layer below. Once seeded, the motion vectors in the next layer are refined by searching against the reference frame at that layer's resolution. Propagation and refinement are repeated layer by layer until the bottom layer is reached, where the process ends.
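The propagation and refinement steps can be sketched as follows. The sketch assumes equal region sizes in both layers, so each coarse region seeds factor x factor child regions. It also assumes a small +/- `refine_range` SAD search around each seeded vector, and it leaves a child region that falls partly outside the layer at its unrefined predictor. The names `propagate` and `refine` are hypothetical.

```python
import numpy as np

def propagate(coarse_vectors, factor=2):
    """Scale each coarse vector by `factor` and copy it to the factor x factor
    child regions it covers in the next (finer) layer."""
    seeds = {}
    for (r, c), (dy, dx) in coarse_vectors.items():
        for i in range(factor):
            for j in range(factor):
                seeds[(factor * r + i, factor * c + j)] = (dy * factor, dx * factor)
    return seeds

def refine(target, reference, seeds, block=8, refine_range=1):
    """Small SAD search of +/- refine_range around each seeded vector."""
    rh, rw = reference.shape
    refined = {}
    for (r, c), (py, px) in seeds.items():
        by, bx = r * block, c * block
        tgt = target[by:by + block, bx:bx + block]
        if tgt.shape != (block, block):
            refined[(r, c)] = (py, px)  # region not fully inside layer: keep predictor
            continue
        best, best_cost = (py, px), float("inf")
        for dy in range(py - refine_range, py + refine_range + 1):
            for dx in range(px - refine_range, px + refine_range + 1):
                y, x = by + dy, bx + dx
                if 0 <= y <= rh - block and 0 <= x <= rw - block:
                    cost = float(np.abs(tgt - reference[y:y + block, x:x + block]).sum())
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
        refined[(r, c)] = best
    return refined
```

For example, a coarse vector (1, -2) becomes the predictor (2, -4) for four child regions. Each child then searches only the 3x3 neighborhood of that predictor, rather than a full window.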
A common problem in hierarchical searches is that an erroneous match in a higher layer usually propagates to the lower layers, often producing an erroneous final motion vector. Such errors are not rare, because the low resolution of the higher layers often makes the motion estimation ambiguous. Several candidate blocks may have similar metric values (e.g., SADs). The motion vector initially selected may be slightly better in the higher layer, yet it may not be the best choice in the bottom layer.
Another common problem arises in hierarchical searches when a moving object boundary falls in the middle of a target region at the higher layers. In such a case, the motion estimation can lock onto either side of the object boundary, producing a wrong motion vector predictor for the other side. At the next layer down, that predictor is applied to the entire target region, even though part of the region contains the moving object and another part contains stationary background. If the search range at the next layer is not large enough to compensate, the same motion vector propagates into both regions, and nothing is available to correct the motion vector of the other side. Even when the search range is sufficient, many motion vector field smoothing techniques, such as rate-distortion optimization, can prevent the motion estimation from correcting the situation. As a result, dragging artifacts are commonly produced in the video.
One existing solution to these hierarchical search problems is to perform a conventional (non-hierarchical) search instead. However, a conventional search must use a very large search range to capture high motion and/or large temporal separation between frames. Both effects are compounded at higher resolutions (e.g., high-definition frames). The cost of an exhaustive search grows with the square of the search range. With a smaller search range, however, a conventional search suffers degraded compression efficiency for large frames.
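A rough operation count shows why a large conventional search range is expensive. The sketch counts per-pixel absolute differences for an exhaustive full-resolution search with range R, which gives (2R + 1)^2 candidates per pixel position. It compares that with a pyramid that runs a full search with the range scaled down at the top layer, followed by a +/- 1 refinement at every lower layer. The layer count, refinement range and function names are illustrative assumptions. The count also ignores boundary clipping and the fast-search shortcuts used by real encoders.

```python
def full_search_ops(width, height, search_range):
    """Absolute-difference operations for an exhaustive full-resolution search:
    each pixel is compared once per candidate, with (2R + 1)^2 candidates."""
    return width * height * (2 * search_range + 1) ** 2

def hierarchical_ops(width, height, search_range, layers=3, factor=2, refine_range=1):
    """Same count for a pyramid: full search with a scaled-down range at the top
    layer, then a +/- refine_range search at every lower layer."""
    total = 0
    for k in range(layers):
        scale = factor ** k
        pixels = (width // scale) * (height // scale)
        if k == layers - 1:
            r = -(-search_range // scale)  # ceiling division: range at top layer
        else:
            r = refine_range
        total += pixels * (2 * r + 1) ** 2
    return total

print(full_search_ops(1920, 1080, 64), hierarchical_ops(1920, 1080, 64))
```

Under these assumptions, a 1920x1080 frame with R = 64 needs roughly 200 times more operations for the conventional search than for the three-layer pyramid.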
Another existing solution to these hierarchical search problems is to propagate more than one motion vector predictor from each target region down to the next layer. However, more predictors mean more searches in the next layer, thereby increasing computational complexity.
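The sketch below shows multi-predictor propagation and its cost. Each coarse region keeps its k lowest-cost candidate vectors. Every one of its factor x factor child regions then runs a separate +/- `refine_range` search around each of the k predictors, so next-layer search work grows linearly with k. The function names are hypothetical. The sketch also assumes that duplicate predictors are not merged, although an implementation could de-duplicate them.

```python
import heapq

def top_k_vectors(costs, k):
    """costs: {(dy, dx): metric value} for one coarse region.
    Returns the k candidate vectors with the lowest metric values."""
    return [mv for mv, _ in heapq.nsmallest(k, costs.items(), key=lambda item: item[1])]

def refinement_searches(num_coarse_regions, k, factor=2, refine_range=1):
    """Candidate evaluations in the next layer when each coarse region passes k
    predictors to each of its factor*factor children."""
    children = num_coarse_regions * factor * factor
    return children * k * (2 * refine_range + 1) ** 2
```

For example, passing three predictors instead of one triples the refinement work: `refinement_searches(n, 3)` equals `3 * refinement_searches(n, 1)` for any region count `n`.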