In video coding systems, spatial and temporal redundancy is exploited using spatial and temporal prediction to reduce the information to be transmitted. Spatial prediction and temporal prediction use decoded pixels from the same picture and from reference pictures, respectively, to form a prediction for the current pixels to be coded. In a conventional coding system, side information associated with spatial and temporal prediction may have to be transmitted, which takes up some of the bandwidth of the compressed video data. The transmission of motion vectors for temporal prediction may require a noticeable portion of the compressed video data. To further reduce the bitrate associated with motion vectors, a technique called Motion Vector Prediction (MVP) has been used in the field of video coding. The MVP technique exploits the statistical redundancy among spatially and temporally neighbouring motion vectors. In the rest of this document, MVP may denote either “motion vector prediction” or “motion vector predictor”, depending on the context.
In High-Efficiency Video Coding (HEVC) development, a technique named Advanced Motion Vector Prediction (AMVP) is used to derive a motion vector predictor for a current motion vector. The AMVP technique uses explicit predictor signalling to indicate the MVP selected from an MVP candidate set. In HEVC, the MVP candidate set of AMVP includes spatial MVPs as well as a temporal MVP (TMVP). The TMVP is derived based on motion vectors from a corresponding area (i.e., a collocated block) of a collocated picture. FIG. 1 illustrates an example of TMVP derivation, where the motion vector 112 from a collocated block 114 in a collocated picture 110 is used as a TMVP candidate. The TMVP is used as one of the AMVP candidates for predicting the current motion vector 122 of the current block 124 in the current picture 120. The collocated picture is a reference picture of the current picture. The collocated block is a block corresponding to the current block; usually, it is the block at the same relative position in the collocated picture as the current block in the current picture.
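The explicit predictor signalling described above can be illustrated with a minimal sketch. It assumes that motion vectors are integer (x, y) pairs, that the candidate set has already been built, and that the encoder picks the candidate giving the smallest motion vector difference (MVD) by sum of absolute components. The function names `encode_mv` and `decode_mv` and the cost measure are hypothetical and are not taken from HEVC.

```python
def encode_mv(mv, candidates):
    """Pick a predictor and return (candidate index, motion vector difference).

    Assumption: the cost of a candidate is |mvd_x| + |mvd_y|; a real encoder
    would typically use a rate-distortion measure instead.
    """
    def cost(i):
        px, py = candidates[i]
        return abs(mv[0] - px) + abs(mv[1] - py)

    idx = min(range(len(candidates)), key=cost)  # first minimum wins on ties
    px, py = candidates[idx]
    return idx, (mv[0] - px, mv[1] - py)


def decode_mv(idx, mvd, candidates):
    """Rebuild the motion vector from the signalled index and difference."""
    px, py = candidates[idx]
    return (px + mvd[0], py + mvd[1])
```

Only the index and the difference are transmitted in this sketch; the decoder is assumed to build the same candidate set as the encoder, so the original motion vector is recovered exactly.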
When a TMVP is used to predict the current motion vector (MV), it should be scaled based on the temporal distance between pictures. FIG. 2 illustrates an example of TMVP scaling, where the motion vector 212 from a collocated block 214 in a collocated picture 210 is scaled before it is used as a TMVP candidate for predicting the current motion vector 222 of the current block 224 in the current picture 220. The scaling can be based on the temporal distance as measured by picture order count (POC). In FIG. 2, the TMVP 212 points from the collocated picture 210 to the reference picture 230, whereas the current motion vector 222 points from the current picture 220 to the reference picture 240. Because these two temporal distances may differ, the TMVP has to be scaled before it is used as one of the AMVP candidates for predicting the current motion vector.
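The POC-based scaling can be sketched as a simple linear rescaling. The sketch below assumes that the temporal distance is the POC of a picture minus the POC of the reference picture its motion vector points to, and that the scaled vector is rounded to the nearest integer. The name `scale_tmvp` and its parameters are hypothetical; practical codecs use a clipped fixed-point approximation rather than this exact division.

```python
def _div_round(num, den):
    """Integer division rounded to nearest, with halves rounded away from zero."""
    sign = -1 if (num < 0) != (den < 0) else 1
    return sign * ((2 * abs(num) + abs(den)) // (2 * abs(den)))


def scale_tmvp(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
    """Scale the collocated MV by the ratio of the two POC distances.

    tb: distance from the current picture to its reference picture.
    td: distance from the collocated picture to its reference picture.
    """
    tb = poc_cur - poc_cur_ref
    td = poc_col - poc_col_ref
    if td == 0:
        raise ValueError("collocated MV spans zero POC distance")
    return tuple(_div_round(c * tb, td) for c in mv_col)
```

For example, under these assumptions a collocated motion vector spanning four POC units is halved when the current motion vector spans two POC units, and its sign is flipped when the two reference pictures lie on opposite temporal sides.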
There is also a Merge mode used in various advanced video coding systems, where a current block can share the motion information of a previously coded block. In this case, information identifying the merging candidate needs to be signalled. However, there is no need to transmit the motion information for the current block. Accordingly, the Merge mode can achieve a higher degree of coding efficiency. The merging candidates can be derived in a fashion similar to that used for the AMVP candidates.
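The contrast with AMVP can be sketched as follows: in Merge mode only a candidate index is signalled, and the current block inherits the complete motion information of the selected candidate. The `MotionInfo` type and the `decode_merge` function are hypothetical names introduced for illustration, and the merging candidate list is assumed to have been derived already.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MotionInfo:
    """Illustrative motion information: a motion vector and a reference index."""
    mv: Tuple[int, int]
    ref_idx: int


def decode_merge(merge_idx: int, candidates: Sequence[MotionInfo]) -> MotionInfo:
    """Return the motion information inherited by the current block.

    No motion vector difference is involved: the signalled index alone
    determines the motion information.
    """
    if not 0 <= merge_idx < len(candidates):
        raise ValueError("merge index outside the candidate list")
    return candidates[merge_idx]
```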
In H.264/AVC, the collocated picture is a fixed reference picture in the reference list. In H.265/HEVC, the encoder can choose any reference picture in the reference list as the collocated picture for the current picture. The information related to the reference picture selection is signalled from the encoder to the decoder in the slice header.
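How a decoder might resolve the collocated picture from slice-header information can be sketched as below. The parameter names `collocated_from_l0` and `collocated_ref_idx` are modelled on HEVC slice-header syntax elements, but their use here is an illustrative assumption, as is the representation of each reference picture by its POC value.

```python
def select_collocated_picture(ref_list0, ref_list1,
                              collocated_from_l0, collocated_ref_idx):
    """Pick the collocated picture from one of the two reference lists.

    Assumption: each list holds picture identifiers (here, POC values), and
    the two signalled values choose the list and the entry within it.
    """
    ref_list = ref_list0 if collocated_from_l0 else ref_list1
    if not 0 <= collocated_ref_idx < len(ref_list):
        raise ValueError("collocated_ref_idx outside the reference list")
    return ref_list[collocated_ref_idx]
```

With a fixed collocated picture, as described for H.264/AVC, neither value would need to be chosen by the encoder; the sketch reflects only the more flexible H.265/HEVC case.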
It is desirable to explore techniques to improve the efficiency of temporal motion vector prediction.