Video codecs typically employ motion estimation (ME) to improve video compression performance by removing or reducing the temporal redundancy among the video frames. For encoding an input block, traditional ME is performed at an encoder module using a specified search window in at least one reference frame to find a motion vector that minimizes some difference metric such as the Sum of Absolute Differences (SAD) between an input source block and the reference block pointed to by the motion vector. The motion vector information may then be transmitted to a decoder module for motion compensation.
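As an illustration of the traditional ME described above, the following is a minimal sketch of an exhaustive (full) search that minimizes SAD within a square search window. The function names `sad` and `full_search`, the list-of-rows frame layout, and the `search_range` parameter are assumptions made for this example only; practical encoders typically use faster search patterns and additional cost terms.

```python
def sad(src, ref, bx, by, rx, ry, size):
    """Sum of Absolute Differences between the size x size source block
    at (bx, by) in src and the reference block at (rx, ry) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(src[by + y][bx + x] - ref[ry + y][rx + x])
    return total


def full_search(src, ref, bx, by, size, search_range):
    """Test every displacement within +/- search_range pixels and return
    (motion_vector, cost) for the candidate with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates whose reference block leaves the frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(src, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The number of candidates grows with the square of `search_range`, which is the complexity cost of larger search windows.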
Generally, higher coding gains may be achieved during ME by employing larger search windows. However, using larger search windows increases the encoding complexity. Further, when employing hardware acceleration, the ME search window size may be limited by on-chip memory size constraints. To address this problem, various advanced video codecs, such as advanced video coding (AVC), scalable video coding (SVC), VP8 and so forth, employ hierarchical motion estimation (HME) techniques to extend the search range while still using a relatively small search window. In typical HME, a full resolution video frame is successively downsampled by factors of two into multiple lower resolution downsampled image layers, and motion vector predictors obtained via ME at the lowest resolution layer are propagated up through the image layers and refined to identify a motion vector for a block of the full resolution video frame or base layer.
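The coarse-to-fine flow of typical HME can be sketched as follows. This is an illustrative sketch under stated assumptions, not the scheme of any particular codec: the 2x2 averaging filter in `downsample2`, the helper names `search_around` and `hme`, and the `levels` and `radius` parameters are all hypothetical choices for this example.

```python
def downsample2(frame):
    """Halve width and height by averaging each 2x2 pixel group
    (one possible downsampling filter, assumed for this sketch)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def sad(src, ref, bx, by, rx, ry, size):
    """Sum of Absolute Differences between two size x size blocks."""
    return sum(abs(src[by + y][bx + x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def search_around(src, ref, bx, by, size, center, radius):
    """Search a small window of +/- radius around the predictor `center`
    and return the displacement with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = center, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(src, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def hme(src, ref, bx, by, size, levels, radius):
    """Estimate a full resolution motion vector for the block at (bx, by)
    by searching the coarsest layer first and refining upward."""
    src_pyr, ref_pyr = [src], [ref]
    for _ in range(levels):
        src_pyr.append(downsample2(src_pyr[-1]))
        ref_pyr.append(downsample2(ref_pyr[-1]))
    mv = (0, 0)
    for level in range(levels, -1, -1):
        scale = 1 << level
        # Block position and source size are scaled with the layer.
        mv = search_around(src_pyr[level], ref_pyr[level],
                           bx // scale, by // scale,
                           max(size // scale, 1), mv, radius)
        if level > 0:
            # Propagate the predictor to the next finer layer.
            mv = (mv[0] * 2, mv[1] * 2)
    return mv
```

With `levels=2` and `radius=2`, each layer searches only a 5×5 window, yet the refined vector can reach up to 2·4 + 2·2 + 2 = 14 pixels in each direction at full resolution.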
In addition to scaling the image hierarchy by factors of two, typical HME schemes also employ a correspondingly scaled source block of fixed shape and size when performing ME at the lower resolution downsample layers. For instance, for a 16×16 full resolution source size, a conventional HME scheme may employ an 8×8 source size at the first downsample layer, a 4×4 source size at the second downsample layer, and so forth. However, particularly with regard to low energy or flat image content, such approaches may generate suboptimal predictors by erroneously identifying local minima further away from details in the image content.
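The fixed scaling of the source size, and the ambiguity it can cause on flat content, can be illustrated with a small sketch. The helpers `scaled_source_sizes` and `count_tied_minima` are hypothetical names introduced only for this example, and the synthetic frame (uniform except for a single detail pixel) is an assumption chosen to make the effect visible.

```python
def scaled_source_sizes(base_size, num_layers):
    """Source block size at the base layer and at each downsample layer
    when the size is halved together with the image."""
    return [max(base_size >> layer, 1) for layer in range(num_layers + 1)]


def sad(src, ref, bx, by, rx, ry, size):
    """Sum of Absolute Differences between two size x size blocks."""
    return sum(abs(src[by + y][bx + x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def count_tied_minima(src, ref, bx, by, size, radius):
    """Count how many in-frame candidates within +/- radius share the
    minimum SAD. A count above 1 means the best match is ambiguous."""
    height, width = len(ref), len(ref[0])
    costs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            costs.append(sad(src, ref, bx, by, rx, ry, size))
    return costs.count(min(costs))


# Flat 16x16 frame with one detail pixel; source and reference are identical.
frame = [[100] * 16 for _ in range(16)]
frame[10][10] = 200

sizes = scaled_source_sizes(16, 2)                      # 16, 8, 4 per layer
small = count_tied_minima(frame, frame, 4, 4, 4, 2)     # 4x4 block misses the detail
large = count_tied_minima(frame, frame, 4, 4, 8, 2)     # 8x8 block covers the detail
```

In this synthetic case the 4×4 block lies entirely in flat content, so every candidate ties at the minimum SAD and the selected vector is essentially arbitrary, whereas the 8×8 block that overlaps the detail has a single minimum.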