The International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 standard (hereinafter the “MPEG4/H.264 standard” or simply the “H.264 standard”) is the first international video coding standard to include a Weighted Prediction (WP) tool. The scalable video coding (SVC) standard, which is currently being developed as an amendment of the H.264 standard (and is thus also interchangeably referred to herein as the “H.264 standard”), also adopts weighted prediction. However, the H.264 standard does not specify the relationship between the weights of the base layer and those of the enhancement layer.
Weighted Prediction is supported in the Main, Extended, and High profiles of the H.264 standard. The use of WP is indicated in the picture parameter set for P and SP slices using the weighted_pred_flag field, and for B slices using the weighted_bipred_idc field. There are two WP modes, an explicit mode and an implicit mode. The explicit mode is supported in P, SP, and B slices. The implicit mode is supported only in B slices.
weighted_pred_flag equal to 0 specifies that weighted prediction shall not be applied to P and SP slices. weighted_pred_flag equal to 1 specifies that weighted prediction shall be applied to P and SP slices.
weighted_bipred_idc equal to 0 specifies that the default weighted prediction shall be applied to B slices. weighted_bipred_idc equal to 1 specifies that explicit weighted prediction shall be applied to B slices. weighted_bipred_idc equal to 2 specifies that implicit weighted prediction shall be applied to B slices. The value of weighted_bipred_idc shall be in the range of 0 to 2, inclusive.
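By way of illustration only, the semantics of the two fields described above can be sketched as a small selection routine. The function name wp_mode and the string labels it returns are hypothetical and are not part of the H.264 syntax.

```python
def wp_mode(slice_type: str, weighted_pred_flag: int, weighted_bipred_idc: int) -> str:
    """Return 'default', 'explicit', or 'implicit' for an inter-coded slice.

    Illustrative sketch only; the name and return labels are assumptions.
    """
    if slice_type in ("P", "SP"):
        # P and SP slices support only the explicit mode.
        return "explicit" if weighted_pred_flag == 1 else "default"
    if slice_type == "B":
        if weighted_bipred_idc not in (0, 1, 2):
            raise ValueError("weighted_bipred_idc shall be in the range 0 to 2")
        # 0: default prediction, 1: explicit WP, 2: implicit WP.
        return ("default", "explicit", "implicit")[weighted_bipred_idc]
    raise ValueError(f"weighted prediction does not apply to slice type {slice_type!r}")
```

For example, a B slice with weighted_bipred_idc equal to 2 maps to the implicit mode regardless of the value of weighted_pred_flag.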
A single weighting factor and offset are associated with each reference index for each color component in each slice. In explicit mode, these WP parameters may be coded in the slice header. In implicit mode, these WP parameters are derived based only on the relative distances between the current picture and its reference pictures.
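As a minimal sketch of the implicit mode, the following routine derives a pair of bi-prediction weights from picture order count (POC) distances. It is modeled on the implicit derivation of the H.264 standard (weights summing to 64, zero offsets, fallback to equal weights), but it should be checked against the standard text before being relied upon. The function names and the use of plain POC integers are assumptions, and the handling of long-term reference pictures is omitted.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))

def div_trunc(a, b):
    """Integer division that truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def implicit_weights(poc_cur, poc_ref0, poc_ref1):
    """Return (w0, w1) derived only from temporal distances (sketch)."""
    tb = clip3(-128, 127, poc_cur - poc_ref0)   # current picture to reference 0
    td = clip3(-128, 127, poc_ref1 - poc_ref0)  # reference 1 to reference 0
    if td == 0:
        return 32, 32                           # equal weights as a fallback
    tx = div_trunc(16384 + abs(div_trunc(td, 2)), td)
    dist_scale_factor = clip3(-1024, 1023, (tb * tx + 32) >> 6)
    w1 = dist_scale_factor >> 2
    if w1 < -64 or w1 > 128:
        return 32, 32                           # out-of-range weights fall back
    return 64 - w1, w1

def implicit_bipred_sample(p0, p1, w0, w1, bit_depth=8):
    """Combine two prediction samples with weights that sum to 64."""
    val = (p0 * w0 + p1 * w1 + 32) >> 6
    return clip3(0, (1 << bit_depth) - 1, val)
```

With these assumptions, a current picture midway between its two references receives equal weights, while a picture closer to reference 0 gives that reference the larger weight, so that no weights need to be transmitted.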
For each macroblock or macroblock partition, the weighting parameters applied are based on a reference picture index (or indices in the case of bi-prediction) of the current macroblock or macroblock partition. The reference picture indices are either coded in the bitstream or may be derived, e.g., for skipped or direct mode macroblocks. The use of the reference picture index to signal which weighting parameters to apply is bitrate efficient, as compared to requiring a weighting parameter index in the bitstream, since the reference picture index is already available based on the other required bitstream fields.
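The lookup described above can be illustrated as follows. In this sketch, the reference picture index of a partition selects an entry from a per-slice table, and the selected weight and offset are applied to each uni-directionally predicted sample. The WpParams type, the table contents, and the function names are hypothetical; the rounding follows the general form of explicit weighted prediction for a single color component and 8-bit samples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WpParams:
    weight: int   # multiplicative weighting factor
    offset: int   # additive offset

def weighted_sample(pred, params, log_wd, bit_depth=8):
    """Apply one weight/offset pair to one prediction sample (sketch)."""
    if log_wd >= 1:
        val = ((pred * params.weight + (1 << (log_wd - 1))) >> log_wd) + params.offset
    else:
        val = pred * params.weight + params.offset
    return max(0, min((1 << bit_depth) - 1, val))

def predict_partition(pred_samples, ref_idx, table, log_wd):
    """Select the WP parameters via the reference index; no extra index is signaled."""
    params = table[ref_idx]
    return [weighted_sample(s, params, log_wd) for s in pred_samples]

# Hypothetical luma table for one slice, with a weight denominator of 2**5.
luma_table = {0: WpParams(32, 0), 1: WpParams(48, -4)}
```

Here predict_partition([100, 200], 1, luma_table, 5) scales each sample by 48/32 and subtracts 4, clipping the result to the 8-bit range. Because ref_idx is already needed for motion compensation, the table lookup adds no bits to the bitstream.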
Many different methods of scalability have been widely studied and standardized, including SNR scalability, spatial scalability, temporal scalability, and fine grain scalability. These methods appear in the scalability profiles of the MPEG-2 and H.264 standards, or are currently being developed as an amendment of the H.264 standard.
For spatial, temporal and SNR scalability, a large degree of inter-layer prediction is incorporated. Intra and inter macroblocks can be predicted using the corresponding signals of previous layers. Moreover, the motion description of each layer can be used for a prediction of the motion description for following enhancement layers. These techniques fall into three categories: inter-layer intra texture prediction, inter-layer motion prediction and inter-layer residue prediction (via residual_prediction_flag).
In the Joint Scalable Video Model (JSVM), which is currently being developed as an extension/amendment to the H.264 standard, an enhancement layer macroblock can exploit inter-layer motion prediction using scaled base layer motion data, using either “BASE_LAYER_MODE” or “QPEL_REFINEMENT_MODE”, as in the case of dyadic (two-layer) spatial scalability. In addition, in macroblock (or sub-macroblock) prediction mode, the motion vector predictor can be chosen from a base layer motion vector or an enhancement layer motion vector from a spatial neighbor, via motion_prediction_flag_Ix[ ]. motion_prediction_flag_Ix[ ] equal to 1 specifies that the (scaled) base layer motion vectors are used as motion vector predictors. motion_prediction_flag_Ix[ ] equal to 0 specifies that enhancement layer motion vectors from spatial neighbors are used as motion vector predictors.
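The choice of motion vector predictor described above can be sketched as follows. This is a simplified illustration and not the JSVM derivation: the function name, the tuple representation of motion vectors, the fixed scale factor of 2 for the dyadic case, and the component-wise median over three spatial neighbors are all assumptions.

```python
def mv_predictor(motion_prediction_flag, base_mv, neighbor_mvs, scale=2):
    """Pick a motion vector predictor for an enhancement layer block (sketch).

    base_mv and each entry of neighbor_mvs are (x, y) integer tuples.
    """
    if motion_prediction_flag == 1:
        # Flag equal to 1: use the scaled base layer motion vector.
        return (base_mv[0] * scale, base_mv[1] * scale)
    # Flag equal to 0: use enhancement layer spatial neighbors,
    # here reduced to a component-wise median (an assumption).
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    return (xs[len(xs) // 2], ys[len(ys) // 2])
```

In this sketch, only the flag determines which layer supplies the predictor; the motion vector difference coded in the bitstream would then be added to the returned value.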
In first and second prior art approaches relating to weighted prediction for scalable video coding, it was proposed to always inherit the base layer weights for the enhancement layer. This is efficient since the weights in the enhancement layer do not have to be transmitted when the same algorithm is used to calculate the weighting parameters in the base and enhancement layers. This inheritance is indicated in the first prior art approach by adding a flag (base_pred_weight_table_flag) to the slice header, and in the second prior art approach by changes to the syntax and semantics of weighted_pred_flag, weighted_bipred_idc, motion_prediction_flag_Ix[ ] and residual_prediction_flag. In the first prior art approach, when base_pred_weight_table_flag is equal to 1, the enhancement layer always inherits the base layer weights. The H.264 standard does not specify which set of weights should be used for the enhancement layer when inter-layer prediction modes (mentioned above) are used and base_pred_weight_table_flag is 0.
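The inheritance signaled by base_pred_weight_table_flag in the first prior art approach can be sketched as below. The function name and the dictionary representation of a weight table (reference index mapped to a weight and offset pair) are hypothetical. The behavior when the flag is 0 is an assumption made only for illustration, namely that the enhancement layer carries its own table; as noted above, the case of inter-layer prediction with the flag equal to 0 is left unspecified.

```python
def enhancement_weights(base_pred_weight_table_flag, base_table, enh_table=None):
    """Return the weight table used by the enhancement layer (sketch)."""
    if base_pred_weight_table_flag == 1:
        # Inherit the base layer weights; nothing is transmitted for this layer.
        return dict(base_table)
    if enh_table is None:
        raise ValueError("flag is 0 but no enhancement layer table was provided")
    # Assumption: with the flag equal to 0, the layer uses its own coded table.
    return dict(enh_table)

# Hypothetical tables: ref_idx -> (weight, offset)
base_table = {0: (32, 0), 1: (40, 2)}
enh_table = {0: (30, 1), 1: (36, 0)}
```

With the flag equal to 1, enhancement_weights(1, base_table) returns a copy of the base layer table, which reflects the bit saving that motivates inheritance.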