Video encoders apply motion-compensated prediction in order to reduce the amount of image data that must be encoded. This is done by exploiting temporal correlation between successive frames. For example, if a video shows an object moving against a stationary background, only the information representing the moving object needs to be encoded once the information representing the background has been obtained. Motion of the object between a reference frame and a frame currently being encoded is described by motion vectors.
Motion-compensated prediction, or motion estimation (ME), involves finding, for each possible pixel block size of a current frame, the “best-possible” match among blocks within a previously encoded frame, called a reference frame. Most encoders measure the distortion induced by choosing a certain block as a predictor. The “best-possible” match is chosen by minimizing a distortion value subject to a bitrate budget. Since distortion tends to decrease as bitrate increases, the two quantities must be traded off against each other, and finding the “best-possible” match subject to a bitrate budget is referred to as rate-distortion (RD) optimization.
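The search described above can be illustrated with a minimal sketch. It assumes an exhaustive (full) search, the sum of absolute differences (SAD) as the distortion measure, and a Lagrangian cost D + λ·R as a common way of folding the bitrate budget into a single value to minimize. The function names, the parameter `lam` and the rate model `mv_bits` are hypothetical and are not taken from any particular encoder.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the size x size block of `cur` at (bx, by) and the
    block of `ref` displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def mv_bits(dx, dy):
    """Hypothetical rate model: longer vectors are assumed to cost more bits."""
    return 2 + abs(dx) + abs(dy)


def full_search(cur, ref, bx, by, size, search_range, lam):
    """Return (cost, (dx, dy), distortion) of the candidate that minimizes
    distortion + lam * rate within +/- search_range pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            distortion = sad(cur, ref, bx, by, dx, dy, size)
            cost = distortion + lam * mv_bits(dx, dy)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy), distortion)
    return best
```

Calling `full_search(cur, ref, bx, by, size, search_range, lam)` on two frames stored as lists of pixel rows returns the lowest-cost motion vector for one block. A larger `lam` biases the choice toward vectors that are cheaper to encode, at the expense of higher distortion.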
Highly accurate ME algorithms are prohibitively expensive in terms of computational complexity and memory bandwidth. The complexity of ME has increased further with the recent High Efficiency Video Coding (HEVC) standard, which allows prediction block sizes from 4×8 pixels up to 64×64 pixels, whereas previous commonly used standards often used blocks of 8×8 pixels. Since searching for the best match for every possible block size involves redundant computations, practical implementations of software and/or hardware video encoders store the distortion values of smaller blocks (e.g., 4×8, 8×8 and 16×16) and re-use them when evaluating the RD costs of bigger blocks (e.g., 32×32 and 64×64).
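The re-use of smaller-block distortion values can be sketched as follows. The sketch assumes SAD as the distortion measure, since SAD is additive: the SAD of a large block for a given motion vector equals the sum of the SADs of the smaller blocks that tile it, evaluated for the same vector. The table layout and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the size x size block of `cur` at (bx, by) and the
    block of `ref` displaced by the motion vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def small_block_sads(cur, ref, x0, y0, small, big, mvs):
    """Compute and store the SAD of every small block tiling the big block
    at (x0, y0), for every candidate motion vector in `mvs`.
    Keys are (block_x, block_y, motion_vector)."""
    table = {}
    for by in range(y0, y0 + big, small):
        for bx in range(x0, x0 + big, small):
            for mv in mvs:
                table[(bx, by, mv)] = sad(cur, ref, bx, by, mv[0], mv[1], small)
    return table


def merged_sad(table, x0, y0, small, big, mv):
    """SAD of the big block for `mv`, obtained by summing stored small-block
    SADs instead of revisiting the pixels."""
    return sum(
        table[(bx, by, mv)]
        for by in range(y0, y0 + big, small)
        for bx in range(x0, x0 + big, small)
    )
```

Each pixel difference is computed once, when the small blocks are evaluated. The price is that the table must hold one entry per small block and candidate vector for as long as larger blocks still need it.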
Such merge-based strategies offer the advantage of providing accurate motion estimation at low computational complexity and low memory bandwidth cost. However, this advantage is obtained at the cost of high storage requirements, since a distortion value needs to be stored for every possible combination of block size and motion vector within the search area.
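A rough count shows how quickly this storage grows. The sketch below assumes a square region that is tiled by each stored block size, a square search window of ±`search_range` pixels at integer-pixel precision, and one stored value per block and candidate vector. All names and the example figures are illustrative assumptions only.

```python
def stored_values(block_sizes, region, search_range):
    """Number of distortion values kept for one region x region area.

    block_sizes  -- list of (width, height) pairs whose distortions are stored
    region       -- side length of the area, in pixels
    search_range -- candidates span [-search_range, +search_range] in x and y
    """
    candidates = (2 * search_range + 1) ** 2
    total = 0
    for width, height in block_sizes:
        # Number of blocks of this size needed to tile the region.
        blocks = (region // width) * (region // height)
        total += blocks * candidates
    return total
```

For instance, `stored_values([(8, 8), (16, 16)], 64, 32)` counts 80 blocks and 4225 candidate vectors per block, or 338000 values for a single 64×64 area, before sub-pixel positions or additional reference frames are considered.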