The motivation for increased coding efficiency in video coding has led the Joint Video Team (JVT), a standards body, to adopt more refined and complicated models and modes describing motion information for a given macroblock. These models and modes tend to take better advantage of the temporal redundancies that may exist within a video sequence. See, for example, ITU-T, Video Coding Experts Group (VCEG), “JVT Coding—(ITU-T H.26L & ISO/IEC JTC1 Standard)—Working Draft Number 2 (WD-2)”, ITU-T JVT-B118, March 2002; and/or Heiko Schwarz and Thomas Wiegand, “Tree-structured macroblock partition”, Doc. VCEG-N17, December 2001.
The recent models include, for example, multi-frame indexing of the motion vectors, increased sub-pixel accuracy, multi-referencing, and tree-structured macroblock and motion assignment, according to which different sub-areas of a macroblock are assigned different motion information. Unfortunately, these models also tend to significantly increase the percentage of bits required for the encoding of motion information within a sequence. Thus, in some cases the models tend to reduce the efficacy of such coding methods.
Even though, in some cases, motion vectors are differentially encoded relative to a spatial predictor, or even skipped when there is zero motion and no residue image to transmit, this does not appear to be sufficient for improved efficiency.
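By way of illustration only, differential encoding of a motion vector against a spatial predictor, together with the skip condition, may be sketched as follows. The component-wise median of three neighboring motion vectors is assumed here as the predictor, which is one common choice and is not stated in the text above. The function names, the neighbor positions, and the sample values are all hypothetical.

```python
from typing import Tuple

# A motion vector as (x, y); the units (e.g. quarter-pixel) are an assumption.
MV = Tuple[int, int]


def median_predictor(left: MV, top: MV, top_right: MV) -> MV:
    """Component-wise median of three neighboring motion vectors (assumed predictor)."""
    xs = sorted([left[0], top[0], top_right[0]])
    ys = sorted([left[1], top[1], top_right[1]])
    return (xs[1], ys[1])


def mv_difference(mv: MV, predictor: MV) -> MV:
    """Difference that would be entropy coded instead of the full vector."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def can_skip(mv: MV, has_residue: bool) -> bool:
    """Skip is possible only for zero motion with no residue image to transmit."""
    return mv == (0, 0) and not has_residue


# Hypothetical neighbors and current motion vector.
pred = median_predictor((4, -2), (6, 0), (5, -1))  # (5, -1)
diff = mv_difference((5, 0), pred)                 # (0, 1)
```

When neighboring motion is similar, the difference is small and cheap to code, but each additional sub-area of a macroblock still contributes its own difference, which is why the bit cost of motion information grows with the richer models.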
It would, therefore, be advantageous to further reduce the bits required for the encoding of motion information, and thus of the entire sequence, while at the same time not significantly affecting quality.
Another problem introduced by the adoption of such models and modes is that of determining the best mode among all possible choices, given, for example, a target bitrate, encoding/quantization parameters, etc.
Currently, this problem can be partially solved by the use of cost measures/penalties that depend on the mode and/or the quantization to be used, or even by employing Rate-Distortion Optimization techniques with the goal of minimizing a Lagrangian function.
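As a minimal sketch of the Lagrangian approach mentioned above, a mode may be selected by minimizing a cost of the form J = D + λ·R, where D is the distortion, R is the rate in bits, and λ is the Lagrangian multiplier. This form of the cost is the conventional one and is assumed here. The mode names, the distortion and rate figures, and the values of λ below are hypothetical and chosen only to show how the decision shifts with λ.

```python
from typing import Dict, Tuple


def lagrangian_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode whose (distortion, rate) pair minimizes the Lagrangian cost."""
    return min(candidates, key=lambda mode: lagrangian_cost(*candidates[mode], lam))


# Hypothetical (distortion, rate in bits) pairs for three candidate modes.
candidates = {
    "16x16": (1200.0, 20.0),
    "8x8": (900.0, 60.0),
    "skip": (2000.0, 1.0),
}

low = choose_mode(candidates, lam=5.0)     # "8x8": lowest distortion wins
mid = choose_mode(candidates, lam=20.0)    # "16x16"
high = choose_mode(candidates, lam=100.0)  # "skip": rate dominates the cost
```

A larger λ penalizes rate more heavily, so modes that spend many bits on motion information are chosen less often as the bit budget tightens.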
Such problems and others become even more significant, however, in the case of Bidirectionally Predictive (B) frames where a macroblock may be predicted from both future and past frames. This essentially means that an even larger percentage of bits may be required for the encoding of motion vectors.
Hence, there is a need for improved methods and apparatuses for use in coding (e.g., encoding and/or decoding) video data.