The standard H.264/AVC (ITU-T and ISO/IEC JTC 1, “Advanced video coding for generic audiovisual services”, ITU-T Rec. H.264 and ISO/IEC 14496-10 (MPEG-4 AVC), available in version 7 as of April 2007) is one of the best known video-encoding standards. For a general treatment of the characteristics of this standard, reference may be made, for example, to an article by D. Marpe et al., “The H.264/MPEG-4 Advanced Video Coding standard and its applications”, IEEE Communications Magazine, Vol. 44, No. 8, pp. 134-144, August 2006.
In said standard, each image of the original video sequence to be encoded (represented schematically and designated by S in FIG. 1, where W and H identify, respectively, the width and the height of the image) is divided into one or more slices and subsequently encoded in a video encoder E as an image of type I, P or B in the output video sequence VO. A slice is a set of macroblocks (of, for example, 16×16 pixels) belonging to one and the same image.
The slices of type I (Intra) are encoded in an independent way, while the slices of type P (Predictive) and B (Bi-predictive) are encoded by resorting to a motion-compensated prediction (MCP) with respect to one (in the case of P slices) or two (in the case of B slices) reference images.
B-type slices achieve a higher compression factor than I and P slices, not only on account of the motion-compensated prediction, but also thanks to a dedicated macroblock encoding mode, referred to as Direct Prediction.
As described, for example, in A. M. Tourapis et al., “Direct mode coding for Bi-predictive slices in the H.264 standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, Issue 1, January 2005, pp. 119-126, in said encoding mode the vectors used for MCP are not encoded within the bitstream, in so far as they are obtained according to a fixed rule specified by the H.264/AVC standard, and are hence known unambiguously both to the decoder and to the encoder.
The standard specifies two different direct prediction modes, of a spatial type (Direct Spatial) and of a temporal type (Direct Temporal), respectively.
In the first mode, the motion vectors (MVs) of an entire macroblock of, for example, 16×16 pixels or 8×8 pixels are obtained from the vectors of macroblocks already encoded within the same image. In the direct temporal mode, instead, the motion vectors of an entire macroblock are obtained from the motion vectors of macroblocks belonging to a previously encoded image.
The direct spatial mode hence defines a rule of spatial prediction of the motion vectors, whilst the direct temporal mode defines a rule of temporal prediction of the motion vectors.
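By way of non-limiting illustration, the two prediction rules described above may be sketched as follows. The sketch is a simplification and not the normative derivation: the spatial rule is reduced to a component-wise median of three neighbouring motion vectors, and the temporal rule scales a co-located motion vector by a ratio of temporal distances using fixed-point arithmetic patterned on that of H.264. The names `spatial_direct_mv`, `temporal_direct_mvs`, `tb` and `td` are assumptions introduced here for illustration; reference-index selection, the co-located zero-motion check and unavailable neighbours are omitted, and `td` is assumed to be positive.

```python
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components; quarter-pel units assumed


def median3(a: int, b: int, c: int) -> int:
    """Return the middle value of three integers."""
    return sorted((a, b, c))[1]


def spatial_direct_mv(mv_a: MV, mv_b: MV, mv_c: MV) -> MV:
    """Spatial rule (simplified): component-wise median of the motion
    vectors of three already-encoded neighbours in the same image."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def temporal_direct_mvs(mv_col: MV, tb: int, td: int) -> Tuple[MV, MV]:
    """Temporal rule (simplified): scale the co-located motion vector
    taken from a previously encoded image.

    tb: temporal distance from the current image to its list-0 reference.
    td: temporal distance between the two reference images (assumed > 0).
    """
    if td <= 0:
        raise ValueError("this sketch assumes td > 0")
    tx = (16384 + td // 2) // td
    # Fixed-point scale factor, roughly 256 * tb / td, clipped to 11 bits.
    scale = max(-1024, min(1023, (tb * tx + 32) >> 6))
    mv_l0 = ((scale * mv_col[0] + 128) >> 8,
             (scale * mv_col[1] + 128) >> 8)
    # The list-1 vector is the remainder of the co-located motion.
    mv_l1 = (mv_l0[0] - mv_col[0], mv_l0[1] - mv_col[1])
    return mv_l0, mv_l1
```

In neither case is any motion vector written to the bitstream: a decoder applying the same rule to the same already-decoded data obtains the same vectors.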
The H.264/AVC standard allows a different direct prediction mode to be used for each B-type slice being encoded. The mode chosen is signalled within the header of each B-type slice by the syntax element “direct_spatial_mv_pred_flag”.
The encoding efficiency of the two possible direct prediction modes depends markedly upon the characteristics of the input signal. For some sequences, the two modes provide equivalent performance. For other sequences the spatial mode is decidedly more efficient than the temporal mode, whereas for others still the reverse is true.
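Since the more efficient mode varies from one sequence to another, an encoder may select the mode slice by slice. The following is a minimal sketch of such a selection, assuming that the encoder has some cost estimate available for each mode (for example, an estimated bit count); the function name and the cost inputs are hypothetical and are not taken from the standard or from any of the documents cited. In H.264/AVC, a value of 1 for “direct_spatial_mv_pred_flag” selects the spatial mode and a value of 0 selects the temporal mode.

```python
def choose_direct_spatial_mv_pred_flag(spatial_cost: float,
                                       temporal_cost: float) -> int:
    """Return the slice-header flag value for one B-type slice.

    spatial_cost / temporal_cost: hypothetical per-slice cost estimates
    for the direct spatial and direct temporal modes (lower is better).
    Ties are resolved in favour of the spatial mode (an arbitrary choice).
    """
    return 1 if spatial_cost <= temporal_cost else 0
```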
Documents such as, for example, U.S. Pat. Nos. 6,192,081, 6,654,420, 7,031,381 and 7,177,360 take generically into account different possible encoding modes, including the direct mode; they do not, however, treat procedures for choosing the direct encoding mode of a macroblock.
H. Kimato et al., “Spatial temporal adaptive direct prediction for bi-directional prediction coding on H.264”, Proc. of Picture Coding Symposium 2003, Saint-Malo, France, Apr. 23-25, 2003, proposed a dynamic choice of the optimal direct prediction mode in the H.264 standard, using metrics of various kinds.