Increasing video resolutions and frame rates, along with the large number of search and matching operations involved in motion estimation, demand very high performance. While higher performance can be achieved by increasing hardware throughput and clock frequency, it is important to identify and exploit the parallelism present in the algorithm in order to use the available hardware resources efficiently.
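To give a sense of scale, the sketch below counts the absolute differences that an exhaustive (full-search) block matcher would compute each second. The resolution (1920×1080), frame rate (30 fps) and ±16-pixel search range are assumptions chosen for illustration, not figures from this text. Edge effects are ignored, and the additions needed to accumulate each SAD are not counted.

```python
# Illustrative operation count for exhaustive (full-search) motion estimation.
# All parameters are assumptions chosen for the example.
WIDTH, HEIGHT = 1920, 1080   # luma resolution (1080p)
FPS = 30                     # frames per second
SEARCH_RANGE = 16            # candidate offsets from -16..+16 in x and in y


def full_search_abs_diffs_per_second(width, height, fps, search_range):
    """Absolute differences per second when every pixel of every frame is
    compared at all (2R+1)^2 candidate positions (edge effects ignored)."""
    candidates = (2 * search_range + 1) ** 2
    return width * height * fps * candidates


ops = full_search_abs_diffs_per_second(WIDTH, HEIGHT, FPS, SEARCH_RANGE)
print(f"{ops:,} absolute differences per second")  # 67,744,512,000 ...
```

Under these assumptions, each pixel is compared at 33 × 33 = 1089 candidate positions, which is why exhaustive search is rarely used without heavy parallel hardware.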
The motion estimation process involves search operations that require reading large amounts of reference picture data from memory. Memory bandwidth is an expensive resource that often limits the computational parallelism that can be built into hardware. Furthermore, this heavy memory traffic leads to high power dissipation.
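A rough, hypothetical estimate of reference-data traffic follows. It assumes 1080p at 30 fps, 8-bit luma only, 16×16 blocks and a ±16-pixel full search. It compares fetching every candidate block from external memory with fetching each block's 48×48 search window once and reusing it on chip. The window-reuse scheme appears only as one common mitigation for comparison, and padding of the frame height to a multiple of 16 is ignored.

```python
# Hypothetical reference-data traffic for a 16x16 full search, 8-bit luma only.
# All parameters are assumptions for the example.
WIDTH, HEIGHT, FPS = 1920, 1080, 30
BLOCK = 16
SEARCH_RANGE = 16

blocks_per_second = (WIDTH * HEIGHT // (BLOCK * BLOCK)) * FPS  # 8100 blocks x 30

# Naive: every candidate block (BLOCK*BLOCK bytes) is read from external memory.
candidates = (2 * SEARCH_RANGE + 1) ** 2
naive_bytes = blocks_per_second * candidates * BLOCK * BLOCK

# With reuse: the whole search window is read once per block and kept on chip.
window = BLOCK + 2 * SEARCH_RANGE  # 48 pixels per side
window_bytes = blocks_per_second * window * window

print(f"naive:  {naive_bytes / 1e9:.1f} GB/s")   # naive:  67.7 GB/s
print(f"window: {window_bytes / 1e9:.2f} GB/s")  # window: 0.56 GB/s
```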
Motion estimation finds the best match for each block in the current video frame among blocks from previously coded frame(s), called reference frames. The block size is typically 16×16 pixels.
A widely used matching metric is the SAD (sum of absolute differences) between the pixel values of the current block and those of a candidate reference block; the lower the SAD, the better the match.
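A minimal sketch of the SAD metric follows. It assumes frames are stored as lists of rows of integer pixel values, indexed `frame[y][x]`, with each block identified by its top-left corner. The function name `sad` and its parameters are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n=16):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        cur_row, ref_row = cur[cy + dy], ref[ry + dy]
        for dx in range(n):
            total += abs(cur_row[cx + dx] - ref_row[rx + dx])
    return total


cur = [[10, 20], [30, 40]]
ref = [[12, 18], [30, 45]]
print(sad(cur, ref, 0, 0, 0, 0, n=2))  # 9 = 2 + 2 + 0 + 5
```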
The position of the best match is given by the motion vector: if the current block is at position (16,16), a motion vector of (4,1) means the best match lies at position (20,17) in the reference frame.
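The following sketch reproduces the (16,16) → (20,17) example with exhaustive integer-pixel block matching. It assumes a ±8-pixel search range and skips candidates that fall outside the reference frame. It uses a synthetic test pattern and includes its own SAD helper so it runs on its own. The names `full_search` and `search_range` are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks (top-left corners)."""
    total = 0
    for dy in range(n):
        cur_row, ref_row = cur[cy + dy], ref[ry + dy]
        for dx in range(n):
            total += abs(cur_row[cx + dx] - ref_row[rx + dx])
    return total


def full_search(cur, ref, bx, by, n=16, search_range=8):
    """Try every integer offset (mvx, mvy) with |mvx|, |mvy| <= search_range
    and return (motion_vector, sad) of the lowest-SAD candidate."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


SIZE = 48
# Synthetic reference frame; no two nearby 16x16 blocks are identical.
ref = [[(7 * x + 13 * y) % 256 for x in range(SIZE)] for y in range(SIZE)]
# Current frame whose block at (16,16) equals the reference block at (20,17).
cur = [[ref[y + 1][x + 4] if y + 1 < SIZE and x + 4 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]

mv, cost = full_search(cur, ref, 16, 16)
print(mv, cost)  # (4, 1) 0
```

Adding the motion vector (4,1) to the block position (16,16) gives the matching position (20,17), as in the text.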
Motion vectors can also have fractional-pixel precision, such as half-pixel or quarter-pixel precision.
Pixel values at fractional positions are calculated by interpolating neighboring integer-position pixels.
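The sketch below builds a half-pixel grid by interpolation. It assumes simple bilinear averaging with rounding, as used in some older codecs. This is a swapped-in illustrative filter: H.264, for example, computes luma half-sample positions with a six-tap filter instead. The function name `half_pel_grid` is hypothetical.

```python
def half_pel_grid(frame):
    """Return a (2H-1) x (2W-1) grid. Even (x, y) hold the original integer
    pixels; odd coordinates hold rounded bilinear averages of the 2 or 4
    surrounding integer pixels (assumed filter, for illustration only)."""
    h, w = len(frame), len(frame[0])
    grid = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            x0, y0 = x // 2, y // 2
            x1, y1 = x0 + x % 2, y0 + y % 2  # step right/down only on odd axes
            grid[y][x] = (frame[y0][x0] + frame[y0][x1]
                          + frame[y1][x0] + frame[y1][x1] + 2) // 4
    return grid


for row in half_pel_grid([[10, 20], [30, 40]]):
    print(row)
# [10, 15, 20]
# [20, 25, 30]
# [30, 35, 40]
```

Summing four samples with a rounding offset gives the integer pixel when all four are the same sample, the two-tap average when only one axis is fractional, and the four-tap average at diagonal half-pel positions.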
A motion estimation algorithm typically includes these stages:
Stage 1: choose the best among a few predictor motion vectors;
Stage 2: search around the winner of Stage 1;
Stages 3 and 4: search around the winners of Stages 2 and 3, respectively;
Stage 5: sub-pixel search at interpolated positions.
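The five stages can be sketched as a greedy search. This is a hypothetical illustration, not a particular encoder's method. The 3×3 search pattern, the 4/2/1-pixel step sizes for Stages 2 to 4, the always-tried zero vector and the bilinear half-pixel filter are all assumptions. Motion vectors are kept in half-pixel units so that Stage 5 can reuse the same cost function.

```python
# Hypothetical five-stage motion search; patterns, step sizes, the zero-vector
# fallback and bilinear half-pel interpolation are illustrative assumptions.

def half_pel_grid(frame):
    """(2H-1) x (2W-1) grid: even coordinates hold integer pixels, odd ones
    hold rounded bilinear averages of their integer neighbours."""
    h, w = len(frame), len(frame[0])
    grid = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            x0, y0 = x // 2, y // 2
            x1, y1 = x0 + x % 2, y0 + y % 2
            grid[y][x] = (frame[y0][x0] + frame[y0][x1]
                          + frame[y1][x0] + frame[y1][x1] + 2) // 4
    return grid


def block_cost(cur, grid, bx, by, mv, n):
    """SAD of the n x n block at (bx, by) in `cur` against the reference at
    motion vector `mv` (half-pel units); None if it leaves the frame."""
    x0, y0 = 2 * bx + mv[0], 2 * by + mv[1]
    last = 2 * (n - 1)
    if x0 < 0 or y0 < 0 or x0 + last >= len(grid[0]) or y0 + last >= len(grid):
        return None
    total = 0
    for dy in range(n):
        ref_row, cur_row = grid[y0 + 2 * dy], cur[by + dy]
        for dx in range(n):
            total += abs(cur_row[bx + dx] - ref_row[x0 + 2 * dx])
    return total


def best_of(cur, grid, bx, by, n, candidates):
    """Lowest-SAD in-bounds candidate as (mv, cost); the first one wins ties."""
    best_mv, best_cost = None, float("inf")
    for mv in candidates:
        cost = block_cost(cur, grid, bx, by, mv, n)
        if cost is not None and cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost


def around(center, step):
    """The center and its 8 neighbours at distance `step` (half-pel units)."""
    cx, cy = center
    return [(cx + sx * step, cy + sy * step)
            for sy in (-1, 0, 1) for sx in (-1, 0, 1)]


def staged_search(cur, ref, bx, by, predictors, n=16):
    """Return (mv, cost); mv is in half-pel units, predictors in whole pixels."""
    grid = half_pel_grid(ref)
    # Stage 1: best among the predictors (plus the zero vector as a fallback).
    stage1 = [(0, 0)] + [(2 * px, 2 * py) for px, py in predictors]
    mv, cost = best_of(cur, grid, bx, by, n, stage1)
    # Stages 2-4: integer-pixel searches around the previous winner.
    for step_px in (4, 2, 1):
        mv, cost = best_of(cur, grid, bx, by, n, around(mv, 2 * step_px))
    # Stage 5: half-pixel search at interpolated positions.
    mv, cost = best_of(cur, grid, bx, by, n, around(mv, 1))
    return mv, cost


SIZE = 48
ref = [[(7 * x + 13 * y) % 256 for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y + 1][x + 4] if y + 1 < SIZE and x + 4 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]
mv, cost = staged_search(cur, ref, 16, 16, predictors=[(-3, 2), (4, 1)])
print(mv, cost)  # (8, 2) 0, i.e. (4, 1) in whole pixels
```

Each stage includes the previous winner as a candidate, so the cost can never increase from one stage to the next. The price of this greedy refinement is that, unlike full search, it can settle in a local minimum when the predictors are poor.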