Hybrid video compression consists of encoding an anchor video frame and then predictively encoding a set of predicted frames. Predictive encoding uses motion compensated prediction with respect to previously decoded frames to obtain a prediction error frame, and then encodes this prediction error frame. Both anchor frames and prediction error frames are encoded using transform coders.
FIG. 1 is a block diagram of a video encoder. Referring to FIG. 1, a motion compensated (MC) prediction module generates a motion compensated prediction from a previously decoded frame. A first adder subtracts the motion compensated prediction from the current frame to obtain a residual frame. A transform coder converts the residual frame to a coded differential, for example by using a combination of a transform, a quantizer, and an entropy encoder. During decoding, a transform decoder converts the coded differential to a reconstructed residual frame, for example by using a combination of an entropy decoder, an inverse quantizer, and an inverse transform. A second adder adds the reconstructed residual frame to the motion compensated prediction to obtain a reconstructed frame. A delay element “Z−1” stores the reconstructed frame for future reference by the MC prediction module.
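The encoder loop of FIG. 1 can be sketched as follows. This is a minimal illustrative model, not the actual encoder: scalar quantization of the residual stands in for the full transform/quantize/entropy-code chain, and the MC prediction module is reduced to reusing the previous reconstruction directly (zero motion). The function names are my own.

```python
import numpy as np

def transform_code(residual, step=8):
    """Stand-in transform coder: quantize the residual with step size `step`."""
    return np.round(residual / step).astype(np.int32)  # coded differential

def transform_decode(coded, step=8):
    """Stand-in transform decoder: dequantize the coded differential."""
    return coded.astype(np.float64) * step  # reconstructed residual frame

def encode_sequence(frames):
    """Run the FIG. 1 loop: predict, subtract, code, reconstruct, store."""
    reference = np.zeros_like(frames[0], dtype=np.float64)  # delay element Z^-1
    coded_stream, reconstructions = [], []
    for frame in frames:
        prediction = reference                        # MC prediction (trivial here)
        residual = frame - prediction                 # first adder
        coded = transform_code(residual)              # transform coder
        recon_residual = transform_decode(coded)      # transform decoder
        reconstruction = prediction + recon_residual  # second adder
        reference = reconstruction                    # update delay element Z^-1
        coded_stream.append(coded)
        reconstructions.append(reconstruction)
    return coded_stream, reconstructions
```

Note that the encoder reconstructs each frame exactly as the decoder will, so that predictions on both sides are formed from identical references.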
There are a number of drawbacks in related art solutions. For example, some prior solutions can take advantage of only very specific types of temporal dependencies among video frames. That is, the generic motion compensated prediction operation is limited to forming predictors for the current frame either by using blocks from previously decoded frames directly or by using low-pass filter based interpolations of these blocks. Once motion estimation is done and candidate blocks in the previously decoded frame(s) are found, it is assumed that these blocks, or their various low-pass filtered forms, are the best predictors for the blocks in the predicted frame. Many temporal variations, such as temporally independent variations on otherwise temporally correlated, frequency-rich blocks, are not accounted for by related art solutions. For example, low-pass filtered versions of blocks undergoing such variations can remove relevant high-frequency signal components from the prediction and actually hurt performance. These unaccounted-for variations cause serious performance penalties because they produce motion compensated differentials that are very difficult to code with the transform coders employed in hybrid video coders. Some specific problematic temporal variations, such as specific types of brightness variations, have been considered by researchers. However, these solutions are not valid beyond the specific problems for which they are designed and hence do not provide general and robust solutions. Some researchers have also devised frame-adaptive motion interpolation filters, but these too are limited to very specific temporal evolution models. Furthermore, because only a limited number of filtering possibilities are available, the effectiveness of such designs is very limited over video sequences that show scenes rich with spatial frequencies.
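The point that low-pass filtered predictors can hurt frequency-rich blocks can be illustrated with a toy example (my own construction, not from the text): a one-dimensional block with content at the maximum spatial frequency that is perfectly stable in time. A bilinear half-pel interpolation of the reference, a common low-pass interpolation, attenuates that frequency entirely and produces a far worse predictor than the unfiltered reference block.

```python
import numpy as np

# Frequency-rich block: alternating pattern at the maximum spatial frequency.
block = np.tile([100.0, 0.0], 8)

reference = block.copy()  # previously decoded frame: temporally correlated
current = block.copy()    # current frame: no actual change over time

# Bilinear half-pel interpolation (a low-pass filter over adjacent samples).
half_pel = 0.5 * (reference + np.roll(reference, -1))

err_direct = np.sum((current - reference) ** 2)   # unfiltered predictor
err_filtered = np.sum((current - half_pel) ** 2)  # low-pass filtered predictor

# The interpolation collapses the alternating pattern to its mean (50
# everywhere), so the filtered predictor leaves a large, hard-to-code
# differential while the direct predictor leaves none.
```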
Related art typically proceeds with a piecewise smooth frame model under uniform translational motion assumptions and runs into problems whenever actual coded frames deviate from these assumptions.