Temporal prediction is a commonly used technique for the efficient coding of video information. Rather than code and transmit the actual luminance or chrominance values of an image in a video sequence, a prediction for this image is formed by using previously coded, reconstructed, and stored image data as reference, followed by coding the differences between the image being coded and its prediction. The better the prediction formation process is, the more efficient the video coding becomes, since the prediction error that is coded is decreased. Thus, conventional predictive video coding includes a process of predicting data in a frame of a video sequence by using information from already-coded and reconstructed frames of that sequence, called reference frames. This process routinely operates at the block level in the images. In predictive block coding, rather than code the data of the block itself, a corresponding prediction block is subtracted from the block being coded and the resulting prediction error is coded instead.
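The subtraction described above can be illustrated with a minimal sketch. The function names and the 2x2 sample values are hypothetical and chosen only for illustration; an actual coder would additionally transform, quantize, and entropy-code the prediction error, which is omitted here.

```python
def prediction_error(block, prediction):
    """Return the element-wise difference between a block and its prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(block, prediction)]


def reconstruct(prediction, error):
    """Recover the block by adding the prediction error back to the prediction."""
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prediction, error)]


# Hypothetical 2x2 luminance samples (illustrative values only).
current_block = [[52, 55], [61, 59]]
prediction_block = [[50, 54], [60, 60]]

residual = prediction_error(current_block, prediction_block)
restored = reconstruct(prediction_block, residual)
```

In this sketch the residual values are much smaller in magnitude than the original samples, which is the property that makes the prediction error cheaper to code than the block itself.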
Temporal prediction consists of two main components: a motion estimation process and a motion compensation process. Given a current block being coded in a current image from a video sequence, motion estimation attempts to find the best matching block in one or more reference frames of that video sequence. The outcome of the motion estimation process consists of motion vectors indicating the displacements of blocks in the current frame with respect to their best matching blocks (predictors) in one or more reference frames.
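As a minimal sketch of block-matching motion estimation, the following performs an exhaustive search over a small window in a single reference frame. The sum of absolute differences (SAD) as the matching criterion, the full-search strategy, and all function and variable names are assumptions made for illustration; the text above does not prescribe a particular criterion or search method.

```python
def sad(current, cx, cy, reference, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(current[cy + dy][cx + dx] - reference[ry + dy][rx + dx])
    return total


def estimate_motion(current, reference, bx, by, size, search_range):
    """Full search: return ((mx, my), cost) for the best match in the window."""
    height, width = len(reference), len(reference[0])
    best_cost, best_mv = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = bx + mx, by + my
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, bx, by, reference, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (mx, my)
    return best_mv, best_cost


def blank_frame(width, height):
    return [[0] * width for _ in range(height)]


# Hypothetical 8x8 frames: a 2x2 pattern sits at (x=4, y=3) in the reference
# frame and at (x=2, y=2) in the current frame.
reference_frame = blank_frame(8, 8)
current_frame = blank_frame(8, 8)
pattern = [[10, 20], [30, 40]]
for dy in range(2):
    for dx in range(2):
        reference_frame[3 + dy][4 + dx] = pattern[dy][dx]
        current_frame[2 + dy][2 + dx] = pattern[dy][dx]

motion_vector, cost = estimate_motion(current_frame, reference_frame,
                                      bx=2, by=2, size=2, search_range=2)
```

Here the motion vector is expressed as the displacement from the position of the current block to the position of its best match in the reference frame.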
The motion estimation process may perform better if reference frame data consisting of luminance or chrominance information is transformed in some way prior to motion estimation, to make it more closely correlated with the current frame data. This transformation reflects a model that describes the differences between the current data block and a reference data block, and it can be seen as a way to compensate for those differences. The potential benefit of such a compensating transformation working in conjunction with the motion estimation process is an increase in coding efficiency, due to better motion estimation and the formation of a better prediction for the current block by the motion compensation process.
A motion compensation process takes the motion vectors produced by the motion estimation process and compensates for motion existing between the current frame being coded and a reference frame. The outcome of this conventional motion compensation consists of one or more prediction blocks taken from one or more reference frames and corresponding to the current block being coded. To form a good prediction, the reference image data may be used as is, or it may be transformed according to a particular model prior to forming the prediction. Any transformation that was determined to be suitable for a reference block during the motion estimation process executed for the current block is applied to the corresponding motion-compensated block prior to computing the prediction error. A frame-level transformation of reference image data generally does not, by itself, improve the prediction sufficiently. Block-level transformations of reference data are capable of producing better block-wise predictions for blocks in the image being coded, at the cost of higher overhead.
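The following sketch illustrates motion compensation followed by a block-level transformation of the motion-compensated block before the prediction error is computed. The 2-parameter linear model (scale and offset), the clipping to an 8-bit sample range, and all names and sample values are assumptions made for illustration only.

```python
def motion_compensate(reference, bx, by, motion_vector, size):
    """Fetch the size x size reference block displaced by the motion vector."""
    mx, my = motion_vector
    rows = reference[by + my: by + my + size]
    return [row[bx + mx: bx + mx + size] for row in rows]


def apply_linear_model(block, scale=1.0, offset=0.0):
    """Apply scale * sample + offset, rounded and clipped to 0..255 (assumed 8-bit)."""
    return [[min(255, max(0, round(scale * p + offset))) for p in row]
            for row in block]


def prediction_error(block, prediction):
    """Element-wise difference between the block being coded and its prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(block, prediction)]


# Hypothetical 4x4 reference frame and a brighter 2x2 current block at (0, 0).
reference_frame = [
    [10, 10, 10, 10],
    [10, 40, 50, 10],
    [10, 60, 70, 10],
    [10, 10, 10, 10],
]
current_block = [[62, 78], [92, 106]]

fetched = motion_compensate(reference_frame, bx=0, by=0,
                            motion_vector=(1, 1), size=2)
plain_error = prediction_error(current_block, fetched)

transformed = apply_linear_model(fetched, scale=1.5, offset=2.0)
compensated_error = prediction_error(current_block, transformed)
```

With these illustrative values, the prediction error obtained after the transformation is far smaller than the error obtained from the untransformed reference block, at the price of having to signal the two model parameters for the block.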
Using this joint motion and transformation-compensated prediction process, a prediction for the current block is formed. A better block prediction results in a smaller prediction error that needs to be coded in the video coder, which results in coding gains. However, conventional methods that jointly apply transformation and motion compensation to data in reference frames to generate a prediction for the frame being coded are limited in the rate-distortion coding gains they achieve, because they make insufficient use of the prediction data available in reference frames. Also, conventional methods may further accentuate the blocking artifacts associated with the use of independent block-level processing in conventional video coding.
The determination of parameters of compensating transformations that are applied to reference image data in order to improve the quality of the prediction signal in video coding has been presented in K. Kamikura, et al., “Brightness-Variation Compensation Method and Coding/Decoding Apparatus for Moving Pictures,” NTT(JP), U.S. Pat. No. 6,456,658, issued September 2002; S. J. Golin, et al., “Methods and Apparatus for Improving Motion Analysis of Fades,” Intel Corporation, U.S. Pat. No. 5,544,239, issued August 1996; J. M. Boyce, “Adaptive Weighting of Reference Pictures in Video Encoding,” U.S. Patent Application Publication No. US2004/0008786, filed January 2004; N. M. M. Rodrigues, et al., “Hierarchical Motion Compensation with Spatial Luminance Transformations,” 2001; K. Kamikura, et al., “Global Brightness-Variation Compensation for Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 8, December 1998; and S. H. Kim, et al., “Fast Local Motion Compensation Algorithm for Video Sequences with Brightness Variations,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 4, April 2003. The transformation can be determined for, and applied to, data in a reference frame at the frame level, in which case the motion estimation process operates with respect to the transformed reference frame. Alternatively, block-level transformations can be determined during the motion estimation process. These transformations are applied block-wise to form the prediction blocks used in the prediction error computation in conventional video coders. In some approaches, the block-wise transformation parameters are determined in a first stage, and only a subset of them is actually retained for use in a given frame, for example, the most often used block-level models in the frame.
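The retention of the most often used block-level models can be sketched as follows. The representation of a model as a (scale, offset) pair, the function names, and the fallback to an identity model for blocks whose model was not retained are all assumptions made for illustration; the cited approaches may differ in each of these respects.

```python
from collections import Counter


def retain_most_used_models(block_models, keep):
    """Return the `keep` most frequently used models among the blocks of a frame."""
    counts = Counter(block_models)
    return [model for model, _ in counts.most_common(keep)]


def restrict_to_retained(block_models, retained, fallback=(1.0, 0.0)):
    """Replace any non-retained model with a fallback (identity model assumed)."""
    return [m if m in retained else fallback for m in block_models]


# Hypothetical (scale, offset) models found block-wise in a first stage.
first_stage_models = [
    (1.0, 2.0), (1.0, 2.0), (1.5, 0.0),
    (1.0, 2.0), (1.5, 0.0), (0.9, -3.0),
]

retained_models = retain_most_used_models(first_stage_models, keep=2)
final_models = restrict_to_retained(first_stage_models, retained_models)
```

Restricting each frame to a short list of models lets a coder signal a small index per block instead of a full parameter set, which reduces the overhead of block-level transformations.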
The transformation models used in the related art include single-parameter (offsetting or scaling) models, 2-parameter linear models (scaling and offsetting), and 6-parameter affine models. In another approach, a hierarchical compensation is applied by combining an initial frame-level compensation with a subsequent block-level refinement of the transformation.
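As an illustration of the single-parameter and 2-parameter models, the sketch below estimates their parameters from the samples of a reference block and a current block. The least-squares criterion and all function names are assumptions made for illustration and are not attributed to any of the cited works; the 6-parameter affine model is omitted.

```python
def fit_offset(current, reference):
    """Offset-only model: current ~ reference + offset."""
    n = len(reference)
    return sum(current) / n - sum(reference) / n


def fit_scale(current, reference):
    """Scale-only model: current ~ scale * reference."""
    denominator = sum(r * r for r in reference)
    if denominator == 0:
        return 1.0  # Degenerate all-zero reference: fall back to identity.
    return sum(c * r for c, r in zip(current, reference)) / denominator


def fit_linear(current, reference):
    """2-parameter model: current ~ scale * reference + offset."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(current) / n
    variance = sum((r - mean_r) ** 2 for r in reference)
    if variance == 0:
        return 1.0, mean_c - mean_r  # Flat reference: offset-only fallback.
    covariance = sum((r - mean_r) * (c - mean_c)
                     for r, c in zip(reference, current))
    scale = covariance / variance
    return scale, mean_c - scale * mean_r


# Hypothetical flattened block samples, related exactly by scale 1.5, offset 2.
reference_samples = [40, 50, 60, 70]
current_samples = [62, 77, 92, 107]

offset_only = fit_offset(current_samples, reference_samples)
scale_only = fit_scale(current_samples, reference_samples)
scale, offset = fit_linear(current_samples, reference_samples)
```

In this example only the 2-parameter model reproduces the current samples exactly; each single-parameter model leaves a nonzero prediction error, which illustrates the trade-off between model complexity and prediction quality.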
Determining only a single transformation model, or a small number of such models, for compensating a frame in predictive video encoding limits the achievable coding efficiency gain. The limitation is caused by the restricted capacity of so few models to describe the differences that exist between a current frame being encoded and a frame used for reference. The approaches that determine and use block-wise transformation models somewhat alleviate this limitation; however, they remain limited in how far they can improve the quality of the prediction and thus the objective coding efficiency. Additionally, because they operate on individual blocks independently, these approaches preserve or even accentuate the blocking artifacts, which are detrimental to the subjective quality of decoded images.