The H.264/MPEG4-AVC video standard allows multiple reference pictures to be used for inter-prediction. The choice of reference picture can be signaled at a granularity as fine as an 8×8 partition, with each partition indicating which picture it uses for inter-prediction. The standard also allows flexible selection of which reference pictures are used, and of the order in which those reference pictures are available, for any given slice (i.e., a group of macroblocks) of video.
Such flexibility leaves the direct (i.e., spatial and temporal) block prediction modes open to a wide variety of different implementations. A direct-mode block is a bi-predicted block in a B-frame that signals neither reference indices nor motion vectors. Rather, the references and motion vectors are derived from a co-located block in a previously decoded picture. Because no reference or motion information is signaled, the overhead of the direct mode is very low, which makes it an important prediction mode that is often used to significantly reduce the bit rate of B-frames.
The reference pictures for each slice of video are arranged into two ordered lists (i.e., List0 and List1). For bi-predictive and direct-mode predicted blocks, one picture from each list is used for inter-prediction. The two pictures are identified by two reference indices (one into each list), each giving the ordinal position of a reference picture within its list.
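The relationship between the two lists and the two reference indices can be illustrated with a minimal sketch. The `Picture` type, its `poc` field (a picture order count used here only as an identifier), the function name, and the list contents are all assumptions made for illustration and are not taken from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    poc: int  # hypothetical identifier: picture order count


def select_references(list0, list1, ref_idx_l0, ref_idx_l1):
    """Return the (List0, List1) picture pair named by two reference indices."""
    return list0[ref_idx_l0], list1[ref_idx_l1]


# Illustrative slice of a picture at POC 4 with two past and two future references.
list0 = [Picture(2), Picture(0)]  # List0, in the order chosen for this slice
list1 = [Picture(6), Picture(8)]  # List1, in the order chosen for this slice

# A bi-predicted block signals one index into each list.
ref_a, ref_b = select_references(list0, list1, ref_idx_l0=1, ref_idx_l1=0)
```

Note that an index only has meaning relative to the list ordering of the slice in which it is used; the same index in a differently ordered list names a different picture.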
Previous H.264 implementations of the direct modes use the following sequence to determine which two reference pictures should be used for inter-prediction of each direct-mode block. First, previous H.264 implementations find the co-located picture (i.e., reference 0, the first reference picture from List1) and the co-located block for the current block. This co-located picture is the first reference picture used for direct-mode prediction. Next, the co-located block is used to derive the reference indices and motion vectors for the current block. Specifically, previous H.264 implementations determine the List0 reference picture that is used by the co-located block, referred to as the ‘direct-mode reference’. The reference index of this direct-mode reference in the co-located picture is called the direct-mode reference index. The direct-mode reference index is used by the current block to determine the second reference picture to use for inter-prediction. Specifically, the direct-mode reference index is applied directly, without modification, to the reference picture list of the current slice. Finally, the motion vectors for the current block are interpolated from the motion vectors used in the co-located block according to the temporal distances between the current picture and the two reference pictures.
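The sequence described above can be sketched as follows. This is a simplified illustration only: pictures are represented by integer picture order counts, the names `ColocatedBlock` and `legacy_temporal_direct` are hypothetical, and the motion vector scaling uses plain linear interpolation over temporal distance rather than the fixed-point arithmetic of an actual decoder. The sketch also assumes the two reference pictures are at different temporal positions, so that the distance between them is non-zero.

```python
from dataclasses import dataclass


@dataclass
class ColocatedBlock:
    ref_idx_l0: int  # index into List0 of the co-located slice
    mv: tuple        # (x, y) motion vector toward that reference


def legacy_temporal_direct(cur_poc, cur_list0, cur_list1, col_block):
    """Derive references and motion vectors the way the prior approach does."""
    # Step 1: the co-located picture is reference 0 of the current List1.
    ref1 = cur_list1[0]
    # Step 2: take the direct-mode reference index from the co-located block.
    direct_ref_idx = col_block.ref_idx_l0
    # Step 3: reuse that index, unchanged, in List0 of the current slice.
    ref0 = cur_list0[direct_ref_idx]
    # Step 4: interpolate the co-located motion vector by temporal distance.
    tb = cur_poc - ref0  # current picture to the List0 reference
    td = ref1 - ref0     # co-located picture to the List0 reference (assumed != 0)
    mv_l0 = tuple(round(c * tb / td) for c in col_block.mv)
    mv_l1 = tuple(a - c for a, c in zip(mv_l0, col_block.mv))
    return ref0, ref1, mv_l0, mv_l1


col = ColocatedBlock(ref_idx_l0=0, mv=(8, -4))
result = legacy_temporal_direct(cur_poc=4, cur_list0=[2, 0], cur_list1=[6, 8], col_block=col)
```

Step 3 is the point of interest: the index is taken from one slice's List0 and applied to another slice's List0 without checking that both lists are ordered the same way.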
Such an implementation has the disadvantage that the second reference picture used for direct-mode prediction is not necessarily the same physical reference picture that was used for inter-prediction by the co-located block. The reference picture used by the co-located block and the second picture used in the direct-mode prediction of the current block are the same physical picture only if the direct-mode reference picture occupies the same position in both (i) List0 of the current slice of the current picture being decoded and (ii) List0 of the co-located slice of the co-located picture.
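The condition stated above can be expressed as a short check. The sketch below is illustrative: pictures are represented by integer picture order counts, and the function name and list contents are assumptions.

```python
def same_physical_reference(col_list0, cur_list0, direct_ref_idx):
    """True if the index names the same picture in both List0 orderings."""
    if direct_ref_idx >= len(cur_list0):
        return False  # the index does not exist in the current slice's List0
    return col_list0[direct_ref_idx] == cur_list0[direct_ref_idx]


col_list0 = [0, 2]        # List0 of the co-located slice
cur_reordered = [2, 0]    # List0 of the current slice, re-ordered
cur_same_order = [0, 2]   # List0 of the current slice, same ordering

mismatch = same_physical_reference(col_list0, cur_reordered, 0)
match = same_physical_reference(col_list0, cur_same_order, 0)
```

In the re-ordered case, index 0 names picture 0 in the co-located slice but picture 2 in the current slice, so reusing the index selects a different physical picture.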
The intent of direct-mode prediction is that the physical reference picture used by the co-located block also serves as a reference picture for the current block. However, since H.264 supports reference picture re-ordering, this condition is not necessarily met. Reference picture re-ordering is the ability to order the reference lists flexibly for each slice, since different pictures are best inter-predicted from different previously encoded/decoded pictures. If the encoder is able to specify which pictures are best for the current picture, then prediction residuals may be reduced.
One example where the ability to re-order reference pictures is useful is the adaptive choice of whether to code an I-picture or P-picture as two fields (the second of which is inter-predicted from the first) or as a single picture without inter-prediction between fields. As a result of re-ordering, the reference pictures may be ordered differently for the current picture and for the co-located picture, such that the same reference picture does not occupy the same position in the respective List0 lists. Direct-mode prediction could then be seriously compromised under the existing solution, since the direct mode is intended to use the same reference picture as the co-located block.
It would be desirable to identify the reference index that the spatial and temporal direct-mode prediction modes should use in order to reference the picture that was the primary reference of the co-located macroblock.
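One way to read this goal is as a lookup by picture identity rather than by list position. The following sketch is offered only as an illustration of that reading, not as the method of any particular implementation; the function name is hypothetical, pictures are represented by integer picture order counts, and the handling of a picture that is absent from the current list (returning `None`) is an assumption.

```python
def remap_direct_ref_idx(col_list0, cur_list0, direct_ref_idx):
    """Find the current-slice List0 index of the co-located block's reference."""
    target = col_list0[direct_ref_idx]  # physical picture used by the co-located block
    try:
        return cur_list0.index(target)  # position of that same picture in the current List0
    except ValueError:
        return None                     # picture is not available in the current slice


remapped = remap_direct_ref_idx(col_list0=[0, 2], cur_list0=[2, 0], direct_ref_idx=0)
missing = remap_direct_ref_idx(col_list0=[0, 2], cur_list0=[2, 4], direct_ref_idx=0)
```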