Video compression encoders and/or decoders gain much of their compression efficiency by forming a reference picture prediction of a picture to be encoded, and only encoding the difference between the current picture and the prediction. The more closely correlated the prediction is to the current picture, the fewer the bits needed to compress that picture. This prediction can be generated by using either spatial or temporal samples within previously available pictures or blocks. Temporal prediction is essentially performed through the consideration of motion parameters that may be available within the bitstream and, optionally, weighting/offsetting parameters which are either explicitly encoded or implicitly derived from the bitstream. Weighting and offsetting parameters can be rather useful in the presence of certain transitions such as fades and cross-fades, and could lead to considerably improved performance compared to traditional motion compensated schemes.
Proper selection of weights can greatly affect the compression efficiency of a system that uses weighted prediction. The International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”) provides a weighted prediction tool with two modes, an explicit mode and an implicit mode. In the explicit mode, the encoder may select and properly assign the weights and offsets used in encoding and decoding.
The MPEG-4 AVC standard does not suggest or require any particular method for selecting these weights and offsets. For the implicit mode, on the other hand, weighting parameters are computed based on “temporal” distances between pictures. For determining such distances, each picture/slice is associated with a counter field referred to as the Picture Order Count (POC), which can also be used for display purposes. Implicit mode is available only for B slices. An important difference between the two modes is that, for B slices, explicit weights are applied to both single prediction and bi-prediction, whereas implicit weights are applied only to bi-prediction.
Several methods have been proposed for weight estimation. These include statistical approaches such as linear regression, estimating the weighting parameter as the ratio of the average pixel value in the current picture to the average pixel value in the reference picture, histogram methods, and weighted parameter estimation in the presence of cross-fades using displaced differences. In an iterative variant of any of the above methods, the weights are refined by considering the current source picture and the motion-predicted non-weighted reference picture, and the process is repeated until it converges or satisfies a stopping criterion.
In the MPEG-4 AVC standard, multiple reference pictures can be used for inter-prediction, with a reference picture index coded to indicate which of the multiple reference pictures is used. In P slices, only single prediction is used, and the allowable reference pictures are managed in list 0. In B slices, two reference picture lists are considered, list 0 and list 1. In B slices, prediction can be performed using single prediction by considering either list 0 or list 1, or bi-prediction using both list 0 and list 1. When bi-prediction is used, the list 0 and the list 1 predictors are averaged together to form a final predictor. Unlike in previous standards, B pictures may be stored and used as reference pictures when coding other pictures.
The MPEG-4 AVC standard uses tree-structured hierarchical macroblock partitions. Inter-coded 16×16 pixel macroblocks can be broken down into macroblock partitions of sizes 16×16, 16×8, 8×16, or 8×8. 8×8 macroblock partitions are also known as sub-macroblocks, and may be further broken into sub-macroblock partitions of sizes 8×4, 4×8, and 4×4. For each macroblock partition, a reference picture index, a prediction type (list 0, list 1, bipred), and a motion vector may be independently selected and coded. For each sub-macroblock partition, a motion vector may be independently selected and coded, but the reference picture index and prediction type of the sub-macroblock are used for all of its sub-macroblock partitions.
The MPEG-4 AVC standard does not use a temporal reference in the Video Coding Layer (VCL), but instead uses Picture Order Count (POC) to indicate relative distances between coded pictures. Several methods are provided for coding the picture order count of each slice, including coding of a delta_pic_order_cnt field in the slice header. POC is used for scaling of motion vectors in direct mode, and for weighting factor derivation in weighted prediction (WP) implicit mode.
Weighted prediction is supported in the Main and Extended profiles of the MPEG-4 AVC standard. Use of weighted prediction is indicated in the picture parameter set for P and SP slices using the weighted_pred_flag field, and for B slices using the weighted_bipred_idc field. There are two WP modes: an explicit mode, which is supported in P, SP, and B slices, and an implicit mode, which is supported in B slices only.
In WP, the weighting factor used is based on the reference picture index (or indices in the case of bi-prediction) for the current macroblock or macroblock partition. The reference picture indices are either coded in the bitstream or may be derived, e.g., for skipped or direct mode macroblocks. In explicit mode, these parameters are coded in the slice header. In implicit mode, these parameters are derived. The weighting factors and offset parameter values are constrained to allow for 16 bit arithmetic operations in the inter prediction process.
Explicit mode is indicated by weighted_pred_flag equal to 1 in P or SP slices, or by weighted_bipred_idc equal to 1 in B slices. In explicit mode, the WP parameters are coded in the slice header. A multiplicative weighting factor and an additive offset for each color component may be coded for each of the allowable reference pictures in list 0 for P slices, and in list 0 and list 1 for B slices. The number of allowable reference pictures in list 0 is indicated by num_ref_idx_l0_active_minus1, and the number in list 1, for B slices, is indicated by num_ref_idx_l1_active_minus1.
The dynamic range and precision of the weighting factors can be adjusted using the luma_log2_weight_denom and chroma_log2_weight_denom fields, which are the binary logarithm of the denominator of the luma and chroma weighting factors, respectively. Higher values of the log weight denominator allow more fine-grained weighting factors but require additional bits for coding the weighting factors and limit the range of the effective scaling. For each allowable reference picture index in list 0, and for B slices also in list 1, flags are coded to indicate whether or not weighting parameters are present in the slice header for that reference picture index, separately for the luma and chroma components. If the weighting parameters are not present in the slice header for a given reference picture index and color component, a default weighting factor equivalent to a scaling factor of 1 and a zero offset are used. The multiplicative weighting factors are coded as luma_weight_l0, luma_weight_l1, chroma_weight_l0, and chroma_weight_l1. The additive offsets are coded as luma_offset_l0, luma_offset_l1, chroma_offset_l0, and chroma_offset_l1.
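The trade-off between precision and range described above can be illustrated with a small sketch. It assumes, as an illustration only, that the coded integer weight lies in [−128, 127]; that range and the helper name `weight_precision` are assumptions and are not stated in the text above.

```python
def weight_precision(log_weight_denom, w_min=-128, w_max=127):
    """Summarize what a given log weight denominator implies.

    w_min and w_max are an ASSUMED range for the coded integer weight;
    they are illustrative and not taken from the text.
    """
    denom = 1 << log_weight_denom
    return {
        # Integer weight equivalent to a scaling factor of 1.
        "default_weight": denom,
        # Smallest change in effective scaling that one weight step gives.
        "step": 1.0 / denom,
        # Range of effective scaling factors that can be expressed.
        "min_scale": w_min / denom,
        "max_scale": w_max / denom,
    }


if __name__ == "__main__":
    for lwd in (0, 5, 7):
        print(lwd, weight_precision(lwd))
```

With a larger denominator the step shrinks (finer weights) while the largest expressible scaling factor shrinks as well, which is the limitation the paragraph refers to.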
For fades that are uniformly applied across the entire picture, a single weighting factor and offset are sufficient to efficiently code all macroblocks in a picture that are predicted from the same reference picture. However, for fades that are non-uniformly applied, e.g., for lighting changes or camera flashes, more than one reference picture index can be associated with a particular reference picture store by using memory management control operation (MMCO) commands and/or reference picture list reordering (RPLR). This allows different macroblocks in the same picture to use different weighting factors even when predicted from the same reference picture store.
The same weighting parameters that are used for single prediction are used in combination for bi-prediction. The final inter prediction is formed for the pixels of each macroblock or macroblock partition, based on the prediction type used. For single prediction from list 0, SampleP, which denotes the weighted predictor, is calculated as follows:

$$\text{SampleP} = \text{Clip1}\big(((\text{SampleP0} \cdot W_0 + 2^{\text{LWD}-1}) \gg \text{LWD}) + O_0\big),$$

and for single prediction from list 1,

$$\text{SampleP} = \text{Clip1}\big(((\text{SampleP1} \cdot W_1 + 2^{\text{LWD}-1}) \gg \text{LWD}) + O_1\big),$$

and for bi-prediction,

$$\text{SampleP} = \text{Clip1}\big(((\text{SampleP0} \cdot W_0 + \text{SampleP1} \cdot W_1 + 2^{\text{LWD}}) \gg (\text{LWD}+1)) + ((O_0 + O_1 + 1) \gg 1)\big),$$

where Clip1( ) is an operator that clips to the range [0, 255], $W_0$ and $O_0$ are the list 0 reference picture weighting factor and offset, $W_1$ and $O_1$ are the list 1 reference picture weighting factor and offset, and LWD is the log weight denominator rounding factor. SampleP0 and SampleP1 are the list 0 and list 1 initial predictors.
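The three prediction formulas above can be sketched per sample as follows. This is a minimal illustration, not the normative process: the function names are hypothetical, samples are assumed to be 8-bit, and the handling of LWD equal to 0 in single prediction (where the rounding term is dropped) is an assumption added to keep the integer arithmetic well defined.

```python
def clip1(x, max_val=255):
    """Clip to [0, max_val]; 8-bit samples are assumed."""
    return max(0, min(max_val, x))


def weighted_single(sample, w, o, lwd):
    """Single prediction from one list: Clip1(((s*w + 2^(lwd-1)) >> lwd) + o)."""
    if lwd >= 1:
        scaled = (sample * w + (1 << (lwd - 1))) >> lwd
    else:
        # ASSUMPTION: with lwd == 0 there is nothing to round or shift.
        scaled = sample * w
    return clip1(scaled + o)


def weighted_bi(s0, s1, w0, w1, o0, o1, lwd):
    """Bi-prediction: weighted sum of both predictors plus averaged offsets."""
    scaled = (s0 * w0 + s1 * w1 + (1 << lwd)) >> (lwd + 1)
    return clip1(scaled + ((o0 + o1 + 1) >> 1))


if __name__ == "__main__":
    lwd = 5  # a weight of 32 then corresponds to a scaling factor of 1
    print(weighted_single(100, 32, 0, lwd))          # unit weight, no offset
    print(weighted_single(100, 16, 10, lwd))         # half weight plus offset
    print(weighted_bi(100, 200, 32, 32, 0, 0, lwd))  # plain average
```

Python's `>>` on negative integers is an arithmetic shift, which matches the behavior usually assumed for these formulas when weights or offsets are negative.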
The determination of appropriate explicit WP parameters in an encoder is outside the scope of the MPEG-4 AVC standard.
The Joint Video Team (JVT) JM reference software includes a method of selecting weights and always assigns a value of zero to the offsets. In the JM software method, while coding a picture, the mean values, $M_i$, of the Y, U, and V color components of all pixels in the current picture are calculated, where i is the color component index. In addition, the mean values, $MR_{ij}$, of the Y, U, and V components of all pixels in each of the allowable reference pictures are calculated, where j is the reference picture index. An estimated multiplicative weighting factor, $W_{ij}$, for each color component of each reference picture is computed as the ratio of the mean of the current picture to the mean of the reference picture, scaled by a left shift of the log weight denominator, as follows:

$$W_{ij} = (\text{int})\big(M_i \cdot (1 \ll \text{LWD}) / MR_{ij} + 0.5\big)$$
After the weighting factor is determined, a scaling of the reference picture by the weighting factor is performed, and the scaled reference picture is stored. The scaled reference picture is rounded to 8-bit precision, so that it may be used in the motion estimation and mode decision processes, which use 8-bit pixel operations.
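The mean-ratio estimate and the scaling of the reference picture described above can be sketched for one color component as follows. This is an illustration under stated assumptions, not the JM code itself: pictures are modeled as flat lists of 8-bit samples, the function names are hypothetical, and the fallback to the default weight when the reference mean is zero is an assumption.

```python
def estimate_weight(cur_pixels, ref_pixels, lwd):
    """Weight = (int)(M_cur * (1 << lwd) / M_ref + 0.5) for one component."""
    m_cur = sum(cur_pixels) / len(cur_pixels)
    m_ref = sum(ref_pixels) / len(ref_pixels)
    if m_ref == 0:
        # ASSUMPTION: avoid division by zero by using a scaling factor of 1.
        return 1 << lwd
    return int(m_cur * (1 << lwd) / m_ref + 0.5)


def scale_reference(ref_pixels, weight, lwd):
    """Apply the weight (offset is zero) and round back to 8-bit samples."""
    rounding = (1 << (lwd - 1)) if lwd >= 1 else 0
    return [max(0, min(255, (p * weight + rounding) >> lwd)) for p in ref_pixels]


if __name__ == "__main__":
    lwd = 5
    ref = [100, 120, 140]   # reference picture samples
    cur = [50, 60, 70]      # current picture, faded to half brightness
    w = estimate_weight(cur, ref, lwd)
    print(w, scale_reference(ref, w, lwd))
```

In this toy fade the estimated weight is half of the unit weight, and the scaled reference reproduces the current samples, which is why the scaled picture is a useful input to motion estimation.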
If implicit WP is used, as was previously described, then weighting factors are not explicitly transmitted in the slice header, but instead are derived based on relative distances between the current picture and the reference pictures. Implicit mode is used only for bi-predictively coded macroblocks and macroblock partitions in B slices, including those using direct mode. The same formula for bi-prediction is used, except that the offset values O0 and O1 are equal to zero, and the weighting factors W0 and W1 are derived using the formulas below.

$$X = (16384 + (\text{TDD} \gg 1)) / \text{TDD}$$

$$Z = \text{clip3}(-1024,\ 1023,\ (\text{TDB} \cdot X + 32) \gg 6)$$

$$W_1 = Z \gg 2$$

$$W_0 = 64 - W_1$$

This is a division-free, 16-bit safe operation implementation of the following:

$$W_1 = (64 \cdot \text{TDB}) / \text{TDD},$$

where TDD is the temporal difference between the list 1 reference picture and the list 0 reference picture, clipped to the range [−128, 127], and TDB is the difference between the current picture and the list 0 reference picture, clipped to the range [−128, 127]. In this case, since single prediction uses the original references, no additional picture needs to be stored for motion estimation.
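A minimal sketch of the implicit weight derivation follows. It reads TDB as the POC difference between the current picture and the list 0 reference, and TDD as the POC difference between the list 1 and list 0 references, so that W1 approximates 64·TDB/TDD. The function names, the C-style truncating division, and the fallback to equal weights when TDD is zero are assumptions made for illustration.

```python
def clip3(lo, hi, x):
    """Clip x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def div_trunc(a, b):
    """Integer division truncating toward zero (C-style), assumed here."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def implicit_weights(poc_cur, poc_ref0, poc_ref1):
    """Return (W0, W1) derived from picture order counts."""
    tdb = clip3(-128, 127, poc_cur - poc_ref0)   # current vs. list 0 reference
    tdd = clip3(-128, 127, poc_ref1 - poc_ref0)  # list 1 vs. list 0 reference
    if tdd == 0:
        # ASSUMPTION: with no temporal distance, fall back to equal weights.
        return 32, 32
    x = div_trunc(16384 + (tdd >> 1), tdd)
    z = clip3(-1024, 1023, (tdb * x + 32) >> 6)
    w1 = z >> 2
    w0 = 64 - w1
    return w0, w1


if __name__ == "__main__":
    # Current picture at POC 1, 2, 3 between references at POC 0 and POC 4.
    for poc in (1, 2, 3):
        print(poc, implicit_weights(poc, 0, 4))
```

The reference that is temporally closer to the current picture receives the larger weight, and a picture midway between its two references gets equal weights of 32.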
Several other methods for estimating the explicit WP parameters were previously proposed, such as methods that consider statistical approaches like linear regression, histogram methods, weighted parameter estimation in the presence of cross-fades using displaced differences, and so forth. Schemes to take motion into consideration have also been proposed. For example, an iterative approach was proposed in which a set of preliminary weights is first computed and then motion vectors are estimated based on the current source picture and the weighted reference picture. Finally, weights are refined by considering the current source picture and the motion predicted non-weighted reference picture, with any of the above methods. This process is repeated until it converges or satisfies a stopping criterion.
Unfortunately, all of the above prior art methods primarily aim to find the best weights for explicit weighting, and never consider which weighting method (implicit versus explicit) should be used, if any. This can partly be resolved by applying various well-known transition detection techniques. Such methods use various correlation metrics to characterize transitions within a sequence, which could also be useful in determining whether weighted prediction should be used. Nevertheless, given that two different WP methods exist, it is also desirable to be able to select efficiently between the two modes, since either one could potentially provide different benefits.
For simplicity, we write weighted prediction for list 0 prediction as

$$\text{SampleP} = \text{SampleP0} \cdot w_0 + o_0,$$

we write weighted prediction for list 1 prediction as

$$\text{SampleP} = \text{SampleP1} \cdot w_1 + o_1,$$

and for bi-prediction as

$$\text{SampleP} = (\text{SampleP0} \cdot w_0 + \text{SampleP1} \cdot w_1 + o_0 + o_1) / 2,$$

where $w_i$ is the weighting factor and $o_i$ is the weighting offset.