Scalable Video Coding (SVC) is an extension of the H.264/AVC video coding standard. SVC allows a multi-layered video stream to be encoded in a single bit stream composed of a base layer and optional enhancement layers that can improve resolution, frame rate, and quality. Inter-layer residual prediction is a key compression technique in SVC. Because the motion vectors in the base layer and an enhancement layer tend to be similar, the up-sampled residual of a base layer block tends to resemble the residual of the corresponding macroblock (MB) in the enhancement layer. Thus, base layer residuals may be used as predictors for enhancement layer residuals. An encoder may decide whether or not to use residual prediction by comparing the energy of the residuals with and without inter-layer prediction.
In regular motion estimation, an encoder seeks a motion vector that minimizes the distortion between the current MB and a given reference MB. When Sum of Absolute Differences (SAD) is used as the distortion measure, the difference between a current pixel c and the corresponding reference pixel p is given by |c−p|. This difference is summed over a given block size to obtain the block distortion. Assuming an 8-bit pixel depth, both c and p are 8-bit values.
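The block distortion described above can be sketched in a few lines of Python. This is a minimal illustration rather than an encoder implementation: the function name `sad` and the representation of a block as a list of rows of integers are assumptions made for the example.

```python
def sad(current, reference):
    """Sum of absolute differences |c - p| over two equally sized blocks.

    Blocks are lists of rows of 8-bit pixel values (0..255); this layout
    is an assumption made for the sketch.
    """
    if len(current) != len(reference):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for cur_row, ref_row in zip(current, reference):
        if len(cur_row) != len(ref_row):
            raise ValueError("rows must have the same length")
        for c, p in zip(cur_row, ref_row):
            total += abs(c - p)
    return total


current = [[10, 20], [30, 40]]
reference = [[12, 18], [30, 45]]
print(sad(current, reference))  # 2 + 2 + 0 + 5 = 9
```

A motion search would evaluate this cost for every candidate reference block in the search range and keep the candidate with the smallest value.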
For residual prediction, the encoder seeks a motion vector that minimizes |c−p−r| over the search range, where r is the residual value from the base layer. This expression may be rewritten as |(c−r)−p|. Because c and r are predetermined values for a given MB and only p varies over the search range, (c−r) can be computed once per MB. Typically, c and p have 8-bit values and r has a 9-bit value within the range of −255 to 255, and thus (c−r) is a 10-bit value within the range of −255 to 510.
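The rewriting |(c−r)−p| suggests computing (c−r) once per MB and reusing it for every candidate in the search range. The following Python sketch illustrates that idea under stated assumptions: the helper names (`precompute_target`, `sad_against_target`, `best_candidate`) and the list-of-rows block layout are hypothetical, and the exhaustive loop over candidates stands in for whatever search strategy an encoder actually uses.

```python
def precompute_target(current, residual):
    """Return the block of (c - r) values; each lies in -255..510."""
    return [[c - r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current, residual)]


def sad_against_target(target, reference):
    """Sum of |(c - r) - p| over the block."""
    return sum(abs(t - p)
               for t_row, p_row in zip(target, reference)
               for t, p in zip(t_row, p_row))


def best_candidate(current, residual, candidates):
    """Return (index, cost) of the candidate reference block with least cost."""
    target = precompute_target(current, residual)  # done once per MB
    costs = [sad_against_target(target, ref) for ref in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


# Extreme inputs show the full -255..510 range of (c - r).
current = [[255, 0]]
residual = [[-255, 255]]
print(precompute_target(current, residual))  # [[510, -255]]

candidates = [[[255, 0]], [[200, 10]]]
print(best_candidate(current, residual, candidates))  # (0, 510)
```

Only the reference pixels p change inside the loop, which is why the subtraction of r does not need to be repeated per candidate.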
The challenge for residual prediction is that typical motion estimation engines cannot be used directly because of the increased bit depth of (c−r). Using 10-bit instead of 8-bit values may incur significant cost due to the wider datapath and the higher-precision arithmetic operations required. One way to convert (c−r) to 8-bit values is clipping. However, clipping errors may impact motion estimation quality, especially when c is very bright (close to 255) or very dark (close to 0) and/or when the magnitude of r is large.
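To make the clipping error concrete, the Python sketch below compares the per-pixel cost computed with the exact value of (c−r) against the cost computed after clipping (c−r) to the 8-bit range. The function names and the sample pixel values are illustrative assumptions. The sketch only quantifies the error in the cost value; how that error affects the encoder's final decision depends on how the costs are used.

```python
def clip8(value):
    """Clip an integer to the unsigned 8-bit range 0..255."""
    return max(0, min(255, value))


def cost_exact(c, r, p):
    """Per-pixel cost using the full-range value of (c - r)."""
    return abs((c - r) - p)


def cost_clipped(c, r, p):
    """Per-pixel cost after clipping (c - r) to 8 bits."""
    return abs(clip8(c - r) - p)


# Bright pixel with a negative residual: (c - r) = 290 is clipped to 255.
print(cost_exact(250, -40, 240), cost_clipped(250, -40, 240))  # 50 15

# Dark pixel with a positive residual: (c - r) = -55 is clipped to 0.
print(cost_exact(5, 60, 10), cost_clipped(5, 60, 10))  # 65 10

# Mid-range pixel with a small residual: no clipping, costs agree.
print(cost_exact(128, 10, 120), cost_clipped(128, 10, 120))  # 2 2
```

In the first two cases the clipped cost understates the exact cost by the amount that (c−r) exceeded the 8-bit range, which matches the bright, dark, and large-residual conditions named above.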