Hybrid video compression consists of encoding an anchor video frame and then predictively encoding a set of predicted frames. Predictive encoding uses motion compensated prediction with respect to previously coded frames to obtain a prediction error frame (i.e., the residual frame), which is then encoded. Anchor frames and prediction errors are encoded using transform coders.
FIG. 1 is a block diagram of a video encoder. Referring to FIG. 1, a motion compensated (MC) prediction module generates a motion compensated prediction from a previously decoded frame. A first adder subtracts the motion compensated prediction from a current frame to obtain a residual frame. A transform coder converts the residual frame to a coded differential, for example by using a combination of a transform, a quantizer, and an entropy encoder. During decoding, a transform decoder converts the coded differential to a reconstructed residual frame, for example by using a combination of an entropy decoder, an inverse quantizer, and an inverse transform. A second adder adds the reconstructed residual frame to the motion compensated prediction to obtain a reconstructed frame. A delay element “Z−1” stores the reconstructed frame for future reference by the MC prediction module.
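The signal flow of FIG. 1 can be illustrated with a minimal sketch. It is not the encoder of any particular standard: the function names (mc_predict, transform_code, transform_decode, encode_frame) are hypothetical, the MC prediction is assumed to be a single whole-frame integer shift, and a uniform scalar quantizer stands in for the full transform, quantizer, and entropy encoder chain.

```python
def mc_predict(reference, motion=(0, 0)):
    """Assumed MC prediction: shift the reference by an integer (dy, dx), clamping at borders."""
    h, w = len(reference), len(reference[0])
    dy, dx = motion
    return [[reference[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def transform_code(residual, step):
    """Stand-in for transform + quantizer + entropy encoder: uniform quantization only."""
    return [[round(r / step) for r in row] for row in residual]


def transform_decode(coded, step):
    """Stand-in for entropy decoder + inverse quantizer + inverse transform."""
    return [[q * step for q in row] for row in coded]


def encode_frame(current, reference, step=4, motion=(0, 0)):
    """One pass through the loop of FIG. 1; returns the coded differential and the
    reconstructed frame that the delay element would hold for the next frame."""
    prediction = mc_predict(reference, motion)
    # First adder: current frame minus motion compensated prediction.
    residual = [[c - p for c, p in zip(crow, prow)]
                for crow, prow in zip(current, prediction)]
    coded = transform_code(residual, step)
    recon_residual = transform_decode(coded, step)
    # Second adder: reconstructed residual plus motion compensated prediction.
    reconstructed = [[p + r for p, r in zip(prow, rrow)]
                     for prow, rrow in zip(prediction, recon_residual)]
    return coded, reconstructed


# Illustrative data only: a previously decoded frame and a current frame.
reference = [[10, 20, 30], [40, 50, 60]]
current = [[12, 21, 35], [41, 57, 60]]
coded, reconstructed = encode_frame(current, reference, step=4)
```

Note that the reconstructed frame, not the original current frame, is what would be stored for future prediction, so that encoder and decoder predict from the same data.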
The generic motion compensated prediction operation is limited to forming predictors for the current frame by using blocks from previously coded frames directly, or by using low-pass filter based interpolations of these blocks. This process forms a good mechanism for exploiting temporal correlations.
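As an illustration of forming predictors directly from blocks of a previously coded frame, the following sketch performs an exhaustive block-matching search with integer motion vectors. The sum of absolute differences (SAD) cost, the block size, the search range, and all function names are assumptions made for the example; low-pass filter based interpolation is omitted, and the frame dimensions are assumed to be multiples of the block size.

```python
def sad(cur, ref, by, bx, ry, rx, n):
    """Sum of absolute differences between an n x n block of cur and one of ref."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def best_motion_vector(cur, ref, by, bx, n, search):
    """Full search over integer displacements within +/- search pixels."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidate blocks that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue
            cost = sad(cur, ref, by, bx, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def predict_frame(cur, ref, n=2, search=1):
    """Build the prediction by copying the best-matching reference block for each block."""
    h, w = len(cur), len(cur[0])
    pred = [[0] * w for _ in range(h)]
    for by in range(0, h, n):
        for bx in range(0, w, n):
            dy, dx = best_motion_vector(cur, ref, by, bx, n, search)
            for i in range(n):
                for j in range(n):
                    pred[by + i][bx + j] = ref[by + dy + i][bx + dx + j]
    return pred


# Illustrative data only: the textured patch moves one pixel down and to the right.
ref = [[0, 0, 0, 0],
       [0, 9, 5, 0],
       [0, 3, 7, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 5],
       [0, 0, 3, 7]]
mv = best_motion_vector(cur, ref, 2, 2, 2, 1)
pred = predict_frame(cur, ref)
```

In this example the lower-right block is predicted exactly, while the remaining blocks find no exact match within the search range and leave a nonzero residual for the transform coder.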
There are well-known techniques that improve upon block based motion compensation by utilizing parametric motion models, overlapped motion compensation, various low-pass filter based interpolators, intensity compensation algorithms, etc. However, these solutions are restricted to very specific types of temporal evolutions in video frames and do not provide general solutions. See, for example,
1) Joint Video Team of ITU-T and ISO/IEC JTC 1, “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC),” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, March 2003; and
2) MPEG4 Verification Model, VM 14.2, pp. 260-264, 1999.
There are a number of drawbacks in related art solutions. For example, prior solutions are limited to taking advantage of very specific types of temporal dependencies among video frames. Once motion estimation is done and candidate blocks in the anchor frame(s) are found, it is assumed that these blocks or their various low-pass filtered forms are the best predictors for the blocks in the predicted frame. Many temporal variations, such as small warps of blocks containing edges, temporally independent variations on otherwise temporally correlated, high frequency rich blocks, etc., are not accounted for by related art solutions. These unaccounted variations cause serious performance penalties as they produce motion compensated differentials that are very difficult to code with the transform coders employed in hybrid video coders.
Furthermore, solutions for some specific problematic temporal variations, such as specific types of brightness variations, have been incorporated into recent video compression standards. However, these solutions are not valid beyond the specific problems for which they are designed. Hence, they do not provide general and robust solutions.
Moreover, some related art proceeds with a piecewise smooth frame model under uniform translational motion assumptions and runs into problems whenever the actual coded frames deviate from these assumptions.