Motion compensated (inter-frame) prediction is a vital component of video coding schemes and standards such as MPEG-1/2 and H.264 (or JVT or MPEG AVC) because of the considerable benefit it can provide in terms of coding efficiency. However, until recently, in most of these standards, motion compensation was performed by primarily, if not only, considering temporal displacement. More specifically, standards such as MPEG-1, MPEG-2, and MPEG-4 consider two different picture types for inter-frame prediction: predictive (P) and bi-directionally predicted (B) pictures. Such pictures are partitioned into a set of non-overlapping blocks, each one associated with a set of motion parameters. For P pictures, motion parameters were limited to a horizontal and a vertical displacement of the current block I(x,y) relative to a reference, which indicated the position of a second block I(x′,y′) that would be used for predicting I(x,y). FIG. 1A illustrates an example of motion compensation in P pictures. For B pictures, however, this could alternatively or additionally involve the consideration of a horizontal and vertical displacement from a second reference. FIG. 1B illustrates an example of motion compensation in B pictures. In the latter case, the prediction is essentially formed by averaging the predictions from these two references, using equal weighting factors of (½, ½).
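As a rough illustration of the translational model just described, the following Python sketch forms a P-style prediction from one displaced block and a B-style prediction by averaging two displaced blocks with equal weights. The function names, the list-of-rows frame layout, and the round-half-up averaging are assumptions made for this example and are not taken from any of the cited standards.

```python
def fetch_block(ref, x, y, dx, dy, size):
    """Copy a size x size block from frame `ref`, displaced by (dx, dy)
    from the current block position (x, y). Frames are lists of rows."""
    return [[ref[y + dy + j][x + dx + i] for i in range(size)]
            for j in range(size)]


def predict_p(ref, x, y, mv, size):
    """P-style prediction: a single displaced block from one reference."""
    dx, dy = mv
    return fetch_block(ref, x, y, dx, dy, size)


def predict_b(ref0, mv0, ref1, mv1, x, y, size):
    """B-style prediction: average two displaced blocks with weights (1/2, 1/2).
    The +1 before the shift rounds half up (an assumption of this sketch)."""
    p0 = fetch_block(ref0, x, y, mv0[0], mv0[1], size)
    p1 = fetch_block(ref1, x, y, mv1[0], mv1[1], size)
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(p0, p1)]


# Tiny 4x4 frames: ref0 holds 0..15 in raster order, ref1 is flat at 10.
ref0 = [[r * 4 + c for c in range(4)] for r in range(4)]
ref1 = [[10] * 4 for _ in range(4)]
p_block = predict_p(ref0, 0, 0, (1, 1), 2)
b_block = predict_b(ref0, (1, 1), ref1, (0, 0), 0, 0, 2)
```

Because both references contribute equally, a frame that is uniformly brighter or darker than the references cannot be matched by this model alone.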
Unfortunately, the above model was not sufficient for video scenes containing temporal brightness variations, such as fades, crossfades, flashes, camera-iris adjustments etc. Thus, the simple translational only inter-frame technique cannot sufficiently improve coding efficiency when temporal brightness variation is involved. For this purpose, several methods were previously proposed that also consider luminance variations during motion compensation. See Kamikura, et al., “Global Brightness-Variation Compensation for Video Coding,” IEEE Trans on CSVT, vol. 8, pp. 988-1000, December 1998; Rodrigues, et al., “Hierarchical Motion Compensation with Spatial and Luminance Transformations”, ICIP, pp. 518-521, October 2001; Rodrigues, et al., “Low-Bit Rate Video Coding with Spatial and Luminance Transformations”, ConfTele2001, Figueira da Foz, 22-24 Apr., 2001; and J. Boyce, “Weighted Prediction in the H.264/MPEG4 AVC Video Coding Standard”, ISCAS, pp. 789-792, May 2004.
More specifically, instead of considering only geometric transformations during motion compensation, the predicted signal is also scaled and/or adjusted using two new parameters, w and o. The sample with brightness value I(x,y,t) at position (x,y) at time t is essentially constructed as w*I(x+dx, y+dy, t′)+o, where dx and dy are the spatial displacement parameters (motion vectors). However, these new weighting parameters could also considerably increase the overhead bits required for representing motion information, thereby potentially reducing the benefit of such strategies. For this reason, and by making the assumption that most luminance transformations happen globally, Kamikura, et al., in “Global Brightness-Variation Compensation for Video Coding,” IEEE Trans on CSVT, vol. 8, pp. 988-1000, December 1998, suggest the use of only a single set of global parameters (w, o) for every frame. Whether these parameters are used is also signaled at the block level, thereby providing some additional benefit in the presence of local brightness variations.
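The brightness-compensated model w*I(x+dx, y+dy, t′)+o can be sketched in a few lines of Python. The function name, the floating-point weight, the rounding, and the clipping to an 8-bit range are assumptions made for illustration; the cited papers define their own parameter precision and signaling.

```python
def weighted_sample(ref, x, y, dx, dy, w, o, max_value=255):
    """Predict the sample at (x, y) as w * ref[y+dy][x+dx] + o.
    The result is rounded and clipped to [0, max_value] (sketch assumption)."""
    value = w * ref[y + dy][x + dx] + o
    return max(0, min(max_value, int(round(value))))


# Hypothetical 2x2 reference frame from time t'.
ref = [[100, 200],
       [50, 80]]

# A fade to half brightness plus a small offset, using motion vector (1, 0).
faded = weighted_sample(ref, 0, 0, 1, 0, w=0.5, o=10)

# A strong brightening that saturates at the top of the sample range.
saturated = weighted_sample(ref, 0, 0, 0, 0, w=2.0, o=100)
```

With w = 1 and o = 0 the function reduces to plain translational prediction, which shows that the two extra parameters are the only added cost per parameter set.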
A somewhat similar strategy is also employed in H.264. This standard, however, also supports several other features that could exploit brightness variations even further. In particular, the codec allows for the consideration of multiple references and also for reordering, which enables multiple weights to be associated with each reference to better handle local brightness variations. Weights for bi-predicted partitions could also optionally be implicitly derived through the consideration of a temporal distance, or more precisely a picture order count distance, associated with each picture. On the other hand, in Rodrigues, et al., “Hierarchical Motion Compensation with Spatial and Luminance Transformations”, ICIP, pp. 518-521, October 2001 and Rodrigues, et al., “Low-Bit Rate Video Coding with Spatial and Luminance Transformations”, ConfTele2001, Figueira da Foz, 22-24 Apr., 2001, weights are optionally transmitted, after quantization, using a hierarchical structure to somewhat constrain the bit overhead. However, this method appears to only consider weighted prediction within P frames, and no consideration of bi-prediction is discussed.
With these techniques, weighting parameters are considered at the block level, which better handles local brightness variations, but no special consideration is given to bi-predictive motion compensation or to how these parameters should be efficiently coded. The focus of these papers is mainly on the potential benefit, in terms of improved mean square error of the prediction, of additionally using luminance transformations versus translational-only prediction, and less on the coding of the parameters.
The ITU-T H.264 (or JVT or ISO MPEG4 AVC) video compression standard has adopted certain weighted prediction tools that could take advantage of temporal brightness variations and could improve performance. The H.264 video coding standard is the first video compression standard to adopt weighted prediction (WP) for motion compensated prediction.
Motion compensated prediction may consider multiple reference pictures, with a reference picture index coded to indicate which of the multiple reference pictures is used. In P pictures (or P slices), only uni-prediction is used, and the allowable reference pictures are managed in list 0. In B pictures (or B slices), two separate reference picture lists are managed, list 0 and list 1, and either uni-prediction using list 0 or list 1, or bi-prediction using both list 0 and list 1, is allowed. When bi-prediction is used, the list 0 and the list 1 predictors are averaged together to form a final predictor. The H.264 weighted prediction tool allows arbitrary multiplicative weighting factors and additive offsets to be applied to reference picture predictions in both P and B pictures. Use of weighted prediction is indicated in the picture parameter set for P and SP slices. There are two weighted prediction modes: explicit mode, which is supported in P, SP, and B slices, and implicit mode, which is supported in B slices only.
In the explicit mode, weighted prediction parameters are coded in the slice header. A multiplicative weighting factor and an additive offset for each color component may be coded for each of the allowable reference pictures in list 0 for P slices and list 0 and list 1 for B slices. The syntax, however, also allows different blocks in the same picture to make use of different weighting factors even when predicted from the same reference picture store. This can be made possible by using reordering commands to associate more than one reference picture index with a particular reference picture store.
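The aliasing idea in the preceding paragraph, in which reordering lets several reference indices point at the same stored picture while carrying different weighting parameters, can be pictured with a small Python sketch. The `WeightEntry` type, its field names, the picture identifiers, and the parameter values are all hypothetical and do not correspond to H.264 syntax elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightEntry:
    picture_id: int  # which stored reference picture this index refers to
    weight: int      # multiplicative weighting factor for that index
    offset: int      # additive offset for that index


# Hypothetical list 0 after reordering: indices 0 and 1 both refer to
# stored picture 7 but carry different (weight, offset) pairs.
list0 = [
    WeightEntry(picture_id=7, weight=32, offset=0),
    WeightEntry(picture_id=7, weight=48, offset=-4),
    WeightEntry(picture_id=6, weight=32, offset=0),
]


def params_for_block(ref_idx, ref_list):
    """Return (picture_id, weight, offset) selected by a block's reference index."""
    entry = ref_list[ref_idx]
    return entry.picture_id, entry.weight, entry.offset


# Two blocks in the same picture predict from the same stored picture
# but with different weighting parameters.
block_a = params_for_block(0, list0)
block_b = params_for_block(1, list0)
```

In this picture, each extra set of weights for a given stored picture consumes one more reference index, so the number of distinct weight sets is bounded by the size of the list.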
The same weighting parameters that are used for single prediction are used in combination for bi-prediction. The final inter prediction is formed for the samples of each macroblock or macroblock partition, based on the prediction type used. For a single directional prediction from list 0,

$$\mathrm{SampleP} = \mathrm{Clip1}\Big(\big((\mathrm{SampleP}_0 \cdot W_0 + 2^{\mathrm{LWD}-1}) \gg \mathrm{LWD}\big) + O_0\Big) \qquad (1)$$

and for a single directional prediction from list 1,

$$\mathrm{SampleP} = \mathrm{Clip1}\Big(\big((\mathrm{SampleP}_1 \cdot W_1 + 2^{\mathrm{LWD}-1}) \gg \mathrm{LWD}\big) + O_1\Big) \qquad (2)$$

and for bi-prediction,

$$\mathrm{SampleP} = \mathrm{Clip1}\Big(\big((\mathrm{SampleP}_0 \cdot W_0 + \mathrm{SampleP}_1 \cdot W_1 + 2^{\mathrm{LWD}}) \gg (\mathrm{LWD}+1)\big) + \big((O_0 + O_1 + 1) \gg 1\big)\Big) \qquad (3)$$

where Clip1( ) is an operator that clips the sample value within the range [0, (1<<SampleBitDepth)−1], with SampleBitDepth being the number of bits associated with the current sample, W0 and O0 are the weighting factor and offset associated with the current reference in list 0, W1 and O1 are the weighting factor and offset associated with the current reference in list 1, and LWD is the log weight denominator rounding factor, which essentially plays the role of a weighting factor quantizer. SampleP0 and SampleP1 are the list 0 and list 1 initial predictor samples, while SampleP is the final weight predicted sample.
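Equations (1) through (3) translate fairly directly into integer arithmetic. The Python sketch below is one possible reading of them; the function names and the default 8-bit sample depth are assumptions, and the handling of LWD = 0 (no rounding term, since 2^(LWD−1) is not an integer there) is a choice made for this sketch.

```python
def clip1(value, bit_depth=8):
    """Clip to the sample range [0, (1 << bit_depth) - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))


def weighted_uni(sample, w, o, lwd, bit_depth=8):
    """Equations (1)/(2): single-list weighted prediction.
    For lwd == 0 the rounding term is dropped (assumption of this sketch)."""
    rounding = (1 << (lwd - 1)) if lwd >= 1 else 0
    return clip1(((sample * w + rounding) >> lwd) + o, bit_depth)


def weighted_bi(s0, s1, w0, w1, o0, o1, lwd, bit_depth=8):
    """Equation (3): bi-predictive weighted prediction."""
    scaled = (s0 * w0 + s1 * w1 + (1 << lwd)) >> (lwd + 1)
    offset = (o0 + o1 + 1) >> 1
    return clip1(scaled + offset, bit_depth)


# With LWD = 5, a weight of 32 corresponds to a scale factor of 1.0.
unity = weighted_uni(100, w=32, o=0, lwd=5)
brighter = weighted_uni(100, w=48, o=10, lwd=5)   # 1.5 * 100 + 10
clipped = weighted_uni(200, w=64, o=0, lwd=5)     # 2.0 * 200, clipped
average = weighted_bi(100, 200, 32, 32, 0, 0, lwd=5)
```

Note that in the bi-predictive case the two weights together are divided by 2^(LWD+1), so equal weights of 2^LWD reproduce the plain average of the two predictors.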
In the implicit weighted prediction mode, weighting factors are not explicitly transmitted in the slice header, but instead are derived based on relative distances between the current picture and its reference pictures. This mode is used only for bi-predictively coded macroblocks and macroblock partitions in B slices, including those using direct mode. The same formula for bi-prediction as given in the preceding explicit mode section is used, except that the offset values O0 and O1 are equal to zero, and the weighting factors W0 and W1 are derived using the formulas:

$$X = \big(16384 + (\mathrm{TDD} \gg 1)\big) / \mathrm{TDD}$$
$$Z = \mathrm{clip3}\big(-1024,\ 1023,\ (\mathrm{TDB} \cdot X + 32) \gg 6\big)$$
$$W_1 = Z \gg 2, \qquad W_0 = 64 - W_1 \qquad (4)$$

This is a division-free, 16-bit safe implementation of W1 = (64·TDB)/TDD, where TDD and TDB are the temporal distances, each clipped to the range [−128, 127], of the list 1 reference picture and of the current picture, respectively, from the list 0 reference picture.
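The implicit weight derivation can be checked numerically with a short Python sketch. The function names are assumptions, and the sketch only accepts a positive TDD so that Python's floor division agrees with the truncating integer division the formulas presumably intend; the behavior for zero or negative distances is deliberately left out.

```python
def clip3(low, high, value):
    """Clip value to the range [low, high]."""
    return max(low, min(high, value))


def implicit_weights(tdb, tdd):
    """Derive (W0, W1) from the temporal distances per equation (4).
    tdb: distance of the current picture from the list 0 reference.
    tdd: distance of the list 1 reference from the list 0 reference.
    Only tdd > 0 is handled here (restriction of this sketch)."""
    if tdd <= 0:
        raise ValueError("this sketch only handles tdd > 0")
    x = (16384 + (tdd >> 1)) // tdd
    z = clip3(-1024, 1023, (tdb * x + 32) >> 6)
    w1 = z >> 2
    w0 = 64 - w1
    return w0, w1


# Current picture midway between its two references: equal weights.
midway = implicit_weights(tdb=1, tdd=2)

# Current picture one third of the way from the list 0 reference:
# the nearer list 0 reference receives the larger weight.
near_list0 = implicit_weights(tdb=1, tdd=3)

# Current picture two thirds of the way: the list 1 reference dominates.
near_list1 = implicit_weights(tdb=2, tdd=3)
```

Since W0 + W1 is always 64, the bi-prediction formula normalizes these weights correctly when LWD + 1 = 6, that is, when LWD = 5.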
Although the H.264 video coding standard enables the use of multiple weights for motion compensation, such use is considerably limited by the fact that the standard only allows signaling of up to 16 possible references for each list at the slice level. This implies that only a limited number of weighting factors could be considered. Even if this restriction did not apply, it could be potentially inefficient if not difficult to signal all possible weighting parameters that may be necessary for encoding a picture. Note that in H.264, weighting parameters for each reference are independently coded without the consideration of a prediction mechanism, while the additional overhead for signaling the reference indices may also be significant. Therefore H.264 and the method disclosed in Kamikura, et al., “Global Brightness-Variation Compensation for Video Coding,” IEEE Trans on CSVT, vol. 8, pp. 988-1000, December 1998 are more appropriate for global than local brightness variations. In other words, these tools tend to work well for global brightness variations, but due to certain limitations little gain can be achieved in the presence of significant local brightness variation.