Video coding and decoding using inter-picture prediction with motion compensation has been known for decades. Uncompressed digital video can consist of a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luminance samples and associated chrominance samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed video has significant bitrate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920×1080 luminance sample resolution at 60 Hz frame rate) requires close to 1.5 Gbit/s of bandwidth. An hour of such video requires more than 600 GByte of storage space.
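The figures above can be checked with a short calculation. The following is a minimal sketch; the function name and its parameters are illustrative assumptions, and 4:2:0 sampling is modeled as two chrominance planes that each hold one quarter as many samples as the luminance plane.

```python
def raw_video_bitrate(width, height, fps, bit_depth=8, chroma_samples_per_luma=0.5):
    """Return the uncompressed bitrate in bit/s.

    chroma_samples_per_luma is 0.5 for 4:2:0: two chroma planes,
    each with a quarter of the luma sample count.
    """
    samples_per_picture = width * height * (1 + chroma_samples_per_luma)
    return samples_per_picture * bit_depth * fps


bitrate = raw_video_bitrate(1920, 1080, 60)   # bit/s
bytes_per_hour = bitrate * 3600 / 8           # bytes needed for one hour

print(f"{bitrate / 1e9:.2f} Gbit/s")              # 1.49 Gbit/s
print(f"{bytes_per_hour / 1e9:.0f} GByte per hour")  # 672 GByte per hour
```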
One purpose of video coding and decoding can be the reduction of redundancy in the input video signal through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more. Both lossless and lossy compression, as well as a combination thereof, can be employed. Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed signal. When using lossy compression, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television contribution applications. The achievable compression ratio can reflect that: higher allowable/tolerable distortion can yield higher compression ratios.
A video encoder and decoder can utilize techniques from several broad categories, including, for example, motion compensation, transform, quantization, and entropy coding, some of which will be introduced below.
Bi-prediction can relate to techniques where a prediction unit (PU), such as a block of samples, can be predicted from two motion-compensated blocks of samples of two or more reference pictures. Bi-prediction was first introduced into video coding standards in MPEG-1 (formally: ISO/IEC 11172 part 2) and has been included in other video coding technologies and standards such as MPEG-2 part 2, H.264, and H.265 as well.
During the reconstruction of a sample of a bi-predicted PU, motion-compensated and interpolated input samples from each reference block can be multiplied by a weighting factor that can be different for each reference block, and the weighted sample values of the two reference blocks can be added to generate the sample under reconstruction. Such a sample can be processed further by mechanisms such as loop filtering.
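The weighted combination described above can be sketched as follows. This is a simplified illustration and not the normative process of any standard: the function names are hypothetical, the weights are floating-point values rather than the fixed-point integers a real codec would use, and rounding to the nearest integer followed by clipping to the sample range is an assumption.

```python
import math


def bi_predict_sample(p0, p1, w0=0.5, w1=0.5, bit_depth=8):
    """Combine two motion-compensated reference samples into one predicted sample."""
    value = w0 * p0 + w1 * p1
    max_value = (1 << bit_depth) - 1
    rounded = math.floor(value + 0.5)          # round half up
    return min(max(rounded, 0), max_value)     # clip to the valid sample range


def bi_predict_block(block0, block1, w0=0.5, w1=0.5, bit_depth=8):
    """Apply bi_predict_sample to every co-located sample pair of two blocks."""
    return [
        [bi_predict_sample(a, b, w0, w1, bit_depth) for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]


ref0 = [[100, 104], [108, 112]]
ref1 = [[120, 124], [128, 132]]
print(bi_predict_block(ref0, ref1))               # equal weights
print(bi_predict_block(ref0, ref1, 0.25, 0.75))   # second reference weighted more
```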
In MPEG-1 and MPEG-2, the weighting factors can be determined based on the relative temporal distance between the picture to which the PU under reconstruction belongs and the two reference pictures. This was possible because, in MPEG-1 and MPEG-2, one of the two reference I or P pictures was in the “past” and the other in the “future” (in terms of presentation order) of the B-picture under reconstruction, and because there was a well-defined timing relationship established for any picture under reconstruction in relation to its reference pictures.
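One way to derive weighting factors from relative temporal distance is linear interpolation between the two reference pictures. The sketch below is an illustration only: the linear rule, the function name, and the use of abstract presentation times are assumptions and are not taken from the MPEG-1 or MPEG-2 specifications.

```python
def temporal_weights(t_current, t_past, t_future):
    """Return (w_past, w_future) from presentation times.

    The reference picture that is closer in time receives the larger weight,
    and the two weights always sum to 1.
    """
    if not t_past < t_current < t_future:
        raise ValueError("expected t_past < t_current < t_future")
    span = t_future - t_past
    w_past = (t_future - t_current) / span
    w_future = (t_current - t_past) / span
    return w_past, w_future


print(temporal_weights(2, 0, 4))  # picture midway between its references
print(temporal_weights(1, 0, 4))  # picture closer to the past reference
```

The guard at the top of the function reflects the condition stated above: the rule only makes sense when one reference lies in the past and the other in the future of the picture under reconstruction.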
Starting with H.264, the reference picture selection concepts for bi-predicted pictures were relaxed such that the reference pictures only needed to be earlier in decoding order, but not in presentation order. Further, the notion of time was also relaxed in that neither H.264 nor H.265 requires a constrained/fixed picture interval in the time domain. Therefore, a decoder can no longer calculate weighting factors based on the timing information available in the bitstream. Instead, H.264 and H.265 include a “default” of 0.5 as the weighting factor for the reference samples of a bi-predicted picture. This default can be overridden by a syntax available in the slice header known as pred_weight_table( ). The default of 0.5 or the information in the pred_weight_table( ) applies to all bi-predicted PUs in a given slice.
Document JVET-C0047, available from http://phenix.it-sudparis.eu/jvet/doc_end_user/documents/3_Geneva/wg11/JVET-C0047-v2.zip, includes a mechanism where the weighting factors of a bi-predicted PU can be signaled in the bitstream at PU granularity. The authors of that document demonstrate a coding efficiency gain relative to the default 0.5 weighting. Seven different weighting factors can be indicated using variable-length codewords. The weighting factor can be determined by the encoder based, for example, on rate-distortion optimization considerations.
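An encoder-side selection of a per-PU weighting factor based on rate-distortion considerations can be sketched as follows. The seven candidate weights, the codeword lengths, the use of the sum of squared errors as the distortion measure, and all names below are hypothetical placeholders chosen for illustration; they are not taken from the cited document.

```python
# Hypothetical candidate weights for the second reference block;
# the first reference block receives (1 - weight).
CANDIDATE_WEIGHTS = [0.5, 0.375, 0.625, 0.25, 0.75, -0.25, 1.25]
# Hypothetical variable-length codeword sizes in bits (truncated-unary style),
# with the shortest codeword assigned to the default weight of 0.5.
CODEWORD_BITS = [1, 2, 3, 4, 5, 6, 6]


def sse(original, predicted):
    """Sum of squared errors between two equally long sample lists."""
    return sum((o - p) ** 2 for o, p in zip(original, predicted))


def choose_weight(original, ref0, ref1, lagrange_multiplier):
    """Return (index, weight) minimizing distortion + lambda * rate."""
    best = None
    for index, w1 in enumerate(CANDIDATE_WEIGHTS):
        predicted = [(1 - w1) * a + w1 * b for a, b in zip(ref0, ref1)]
        cost = sse(original, predicted) + lagrange_multiplier * CODEWORD_BITS[index]
        if best is None or cost < best[0]:
            best = (cost, index, w1)
    return best[1], best[2]


ref0 = [8, 16, 24]
ref1 = [16, 32, 48]
original = [14, 28, 42]  # equals 0.25 * ref0 + 0.75 * ref1

print(choose_weight(original, ref0, ref1, lagrange_multiplier=1.0))
print(choose_weight(original, ref0, ref1, lagrange_multiplier=100.0))
```

With a small Lagrange multiplier the search picks the weight that matches the original samples exactly, whereas a large multiplier makes the cheaper codeword of the default weight win despite its higher distortion.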