Most video coding algorithms, including the H.264/AVC video coding standard, exploit the temporal redundancy between images in consecutive frames to reduce the size of the coded bit stream. A reference frame is a previously coded frame, and the target frame is the frame currently being coded. Only the difference (i.e., the residual) between the reference frame and the target frame is coded. Since the reference frame is often very similar to the target frame, substantial bandwidth savings may thus be achieved.
A video scene often contains moving objects. To minimize the residual between reference images and target images, a motion estimation (ME) process is used to find a better match between them. This process is typically performed at block granularity, yielding a motion vector (MV) for every block of the target image that describes the movement between the target image and the reference image. These motion vectors are also coded into the bit stream.
Motion compensation (MC) uses the motion vectors described above to create an improved reference image, block by block, by taking parts of the original reference image and assembling them into a motion compensated reference image. In cases where the motion vectors have sub-pixel resolution, pixel values may be interpolated. The improved reference image then yields a smaller residual when subtracted from the target image. Thus, by compensating for image differences due to motion, further bandwidth savings may be realized.
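The block-by-block construction described above can be sketched as follows. This is a minimal illustration only, assuming integer-pixel motion vectors (no sub-pixel interpolation), frame dimensions that are multiples of the block size, and clamping of out-of-frame coordinates to the nearest border pixel. The function names `motion_compensate` and `residual` and the layout of `motion_vectors` are hypothetical and are not taken from any standard or encoder.

```python
def motion_compensate(reference, motion_vectors, block_size):
    """Build a motion compensated reference image, block by block.

    reference      : 2-D list of pixel values (rows of equal length).
    motion_vectors : 2-D list holding one (mvx, mvy) pair per block
                     (hypothetical layout, integer-pixel precision).
    block_size     : side length of the square blocks.
    """
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            mvx, mvy = motion_vectors[by // block_size][bx // block_size]
            for dy in range(block_size):
                for dx in range(block_size):
                    # Assumption: coordinates outside the frame are clamped
                    # to the nearest border pixel.
                    ry = min(max(by + dy + mvy, 0), height - 1)
                    rx = min(max(bx + dx + mvx, 0), width - 1)
                    predicted[by + dy][bx + dx] = reference[ry][rx]
    return predicted


def residual(target, predicted):
    """Pixel-wise difference between the target and its prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, predicted)]
```

When the motion vectors describe the true motion exactly, the residual returned by `residual` is zero everywhere; in practice it is merely smaller than the residual obtained without compensation.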
Motion estimation algorithms often try to find a motion vector that minimizes the sum of absolute differences (SAD) or the sum of squared differences (SSD) between a target block and a reference block. Such algorithms are, however, sensitive to global illumination changes, which may occur, for example, with fade-in and fade-out video effects or changes in lighting conditions.
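As an illustration of SAD-based matching, the following minimal sketch performs an exhaustive search over a small window. It assumes integer-pixel motion vectors and skips candidates that fall outside the reference frame; practical encoders typically use faster search strategies. The names `sad` and `full_search` are hypothetical.

```python
def sad(target, reference, tx, ty, rx, ry, size):
    """Sum of absolute differences between two size x size blocks.

    (tx, ty) is the top-left corner of the block in the target image,
    (rx, ry) is the top-left corner of the block in the reference image.
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(target[ty + dy][tx + dx]
                         - reference[ry + dy][rx + dx])
    return total


def full_search(target, reference, bx, by, size, search_range):
    """Return ((mvx, mvy), cost) minimizing SAD for the target block at (bx, by)."""
    height, width = len(reference), len(reference[0])
    # Start from the zero motion vector.
    best_mv = (0, 0)
    best_cost = sad(target, reference, bx, by, bx, by, size)
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidate blocks that do not lie fully inside the frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(target, reference, bx, by, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

Because the cost compares raw pixel levels, a uniform brightness change between the two frames raises the SAD of the correct match as well, which is the sensitivity to illumination changes noted above.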
The H.264/AVC video coding standard offers a weighted prediction (WP) tool that enables the encoder to scale and offset the reference frame in order to make it more similar to the target frame, and thus reduce the residual. This process involves multiplying the reference image pixel values by a weighting coefficient (a multiplicative term) and adding an offset (an additive term). Several algorithms exist that find linear prediction coefficients by minimizing a global distance measure between target image levels and linearly transformed reference image levels. Usually the L1 norm (least absolute deviation) or the L2 norm (least squares) is used as the distance measure.
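For the L2 case, the weight and offset that minimize the sum of squared differences between the target levels and the linearly transformed reference levels have a closed-form solution (ordinary linear regression). The following minimal sketch assumes the pixel values of both images are supplied as flat lists of equal length and works in floating point; it does not model the integer representation in which the standard signals these coefficients. The name `fit_weight_offset` is hypothetical.

```python
def fit_weight_offset(reference, target):
    """Least-squares (L2) fit of target ~ weight * reference + offset.

    reference, target : flat lists of pixel levels, same length.
    Returns (weight, offset) as floats.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(target) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_r == 0:
        # Flat reference image: only an offset can be estimated.
        return 1.0, mean_t - mean_r
    cov_rt = sum((r - mean_r) * (t - mean_t)
                 for r, t in zip(reference, target))
    weight = cov_rt / var_r
    offset = mean_t - weight * mean_r
    return weight, offset
```

For example, if every target level equals half the corresponding reference level plus 8, as might occur during a fade, the function recovers a weight of approximately 0.5 and an offset of approximately 8.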
Often, some pixels in the target picture or reference picture reach their maximum or minimum possible values; this is called saturation. For example, since an imaging device has a limited dynamic range, overly lit pixels may hit the maximum luma value that can be assigned, and overly dark pixels may hit the minimum luma value that can be assigned. Existing weighted prediction coefficient estimation techniques do not consider the saturation phenomenon, and therefore yield a suboptimal estimate of the coefficients. Suboptimal coefficient estimation eventually results in lower motion estimation accuracy and a larger video bit stream.
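The effect of saturation on a plain least-squares estimate can be seen in a small numerical sketch. It assumes 8-bit levels (0 to 255) and a hypothetical true gain of 1.2 with zero offset, so that the brightest pixel is clipped in the target. For comparison only, the fit is repeated on the pixels that are not saturated; this comparison merely exposes the bias and is not presented as the estimation method of the embodiments. All names are hypothetical.

```python
def fit_weight_offset(reference, target):
    """Least-squares fit of target ~ weight * reference + offset."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(target) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    cov_rt = sum((r - mean_r) * (t - mean_t)
                 for r, t in zip(reference, target))
    weight = cov_rt / var_r
    return weight, mean_t - weight * mean_r


def clip8(value):
    """Clip a level to the assumed 8-bit range [0, 255]."""
    return min(max(value, 0), 255)


TRUE_WEIGHT = 1.2  # assumed illumination gain, for illustration only
reference = [50, 100, 150, 200, 250]
# The last pixel would become 300 and therefore saturates at 255.
target = [clip8(round(TRUE_WEIGHT * r)) for r in reference]

# Fit over all pixels, ignoring saturation.
naive_weight, naive_offset = fit_weight_offset(reference, target)

# Fit over pixels that are strictly inside the range in both images.
pairs = [(r, t) for r, t in zip(reference, target)
         if 0 < r < 255 and 0 < t < 255]
unsat_weight, unsat_offset = fit_weight_offset(
    [r for r, _ in pairs], [t for _, t in pairs])

print("all pixels:        ", naive_weight, naive_offset)
print("unsaturated pixels:", unsat_weight, unsat_offset)
```

In this toy data a single clipped pixel pulls the estimated weight well below the assumed gain and introduces a spurious positive offset, whereas the fit restricted to unsaturated pixels stays close to the assumed values.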
The process of weighted prediction coefficient estimation, i.e., estimating the weight and offset applied to the reference frame, under conditions where saturation occurs therefore needs to be addressed. Embodiments of the present invention enable improved weighted prediction coefficient estimation by accounting for the saturation phenomenon.