Highly efficient video coding and decoding technology is the key to storing and transmitting multimedia data at high quality and low cost. The currently popular international standards for image coding are based on a coding theory that combines motion compensation based on block matching, the discrete cosine transform and quantization. Typically, Joint Technical Committee 1 of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC JTC1) has proposed the Moving Picture Experts Group (MPEG) standards MPEG-1, MPEG-2 and MPEG-4, and the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has proposed the H.26x series. These video coding standards are widely used in industry.
All of these video coding standards adopt a hybrid video coding strategy, which normally includes four main modules: prediction, transform, quantization and entropy coding. The main function of the prediction module is to predict the current image to be coded from previously coded and reconstructed images (inter prediction), or to predict the current image block (or macro block) to be coded from previously coded and reconstructed image blocks (or macro blocks) in the same image (intra prediction). The function of the transform module is to convert the input image block into another space so that the energy of the input signal is concentrated in the low-frequency transform coefficients, which reduces the correlation among the elements of the image block and facilitates compression. The main function of the quantization module is to map the transform coefficients onto a finite set of values that is convenient to code. The main function of the entropy coding module is to represent the quantized transform coefficients with variable-length codes according to their statistical distribution. A video decoding system has similar modules and reconstructs the decoded image from the input code stream mainly through entropy decoding, inverse quantization, inverse transform and so on.
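By way of illustration only, the prediction, transform and quantization stages described above, together with their inverses in the decoder, might be sketched as follows. The function names, the 4x4 block size, the orthonormal DCT-II and the single uniform quantization step are assumptions made for this sketch and are not taken from any particular standard; entropy coding is omitted.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed transform)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def encode_block(block, prediction, step):
    """Prediction residual -> 2-D DCT -> uniform quantization."""
    n = len(block)
    c = dct_matrix(n)
    residual = [[block[y][x] - prediction[y][x] for x in range(n)]
                for y in range(n)]
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(v / step) for v in row] for row in coeffs]

def decode_block(levels, prediction, step):
    """Inverse quantization -> inverse 2-D DCT -> add the prediction back."""
    n = len(levels)
    c = dct_matrix(n)
    coeffs = [[lv * step for lv in row] for row in levels]
    residual = matmul(matmul(transpose(c), coeffs), c)
    return [[round(prediction[y][x] + residual[y][x]) for x in range(n)]
            for y in range(n)]
```

For a flat residual, all of the energy falls into the single lowest-frequency coefficient, which is the energy concentration that makes the subsequent quantization and entropy coding effective.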
The main function of prediction based on motion compensation is to reduce the temporal redundancy of a video sequence. Most of the coding efficiency for video comes from the prediction module. Video coding proceeds by coding each frame of the video sequence. A conventional video coding system codes each frame using the macro block as the basic unit. When the current macro block is encoded, a motion vector links the current macro block to its reference block. Each frame may be coded as an intra-coded frame (I frame), a predictive-coded frame (P frame) or a bi-directionally predictive-coded frame (B frame). Generally, I-frame, P-frame and B-frame coding are interleaved, for example in an IBBPBBP sequence.
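Because a B frame is predicted from a later frame as well as an earlier one, its backward reference frame has to be coded before the B frame itself, so the coding order differs from the display order. A minimal sketch of that reordering is given below; the function name and the rule that each run of B frames is emitted directly after the next I or P frame are assumptions made for illustration.

```python
def coding_order(display_types):
    """Return display indices in coding order for a string such as "IBBPBBP".

    Each I or P frame is emitted first, followed by the B frames that
    precede it in display order and use it as their backward reference.
    """
    order = []
    pending_b = []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)      # wait for the backward reference
        else:
            order.append(idx)          # I or P frame is coded first
            order.extend(pending_b)    # then the B frames that depend on it
            pending_b = []
    # Trailing B frames have no later reference in this sketch; they are
    # appended as-is.
    order.extend(pending_b)
    return order

print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```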
With B-frame coding, the compression ratio may exceed 200:1. When a macro block of a B frame is encoded, four modes are available: Direct, Forward Prediction, Backward Prediction and Bi-directional Prediction. Because the B-frame technique requires both forward and backward motion estimation, its computational complexity is high, and additional identification information must be introduced to distinguish the forward motion vector from the backward motion vector.
In conventional video coding standards (such as the MPEG-x and H.26x series), a B frame has only one forward reference frame and one backward reference frame, while a P frame has only one forward reference frame. To make fuller use of the temporal correlation between images, P frames and B frames have been allowed to have multiple forward reference frames. However, this greatly increases the space and time overhead, so a compromise is adopted in which the number of reference frames is fixed in order to limit that overhead. In practice, the greater the temporal distance between images, the weaker their correlation, so such a limitation is reasonable.
Direct mode is a coding mode that performs both forward prediction and backward prediction. The forward and backward motion vectors of the B frame are derived from the motion vector of the backward reference picture, so the motion vector information may not need to be encoded. The bits that would otherwise be occupied by motion vector information are thereby saved, which effectively improves coding efficiency. Direct mode is therefore widely used.
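The derivation is not spelled out in this passage. As a sketch under a stated assumption, the example below uses linear scaling by temporal distance, as in the temporal direct modes of common standards. Here `mv_col` is the motion vector of the co-located block in the backward reference picture, `tr_b` is the temporal distance from the forward reference frame to the B frame, and `tr_d` is the temporal distance from the forward reference frame to the backward reference frame. All three names are hypothetical.

```python
def direct_mode_vectors(mv_col, tr_b, tr_d):
    """Derive forward and backward motion vectors for a Direct-mode block.

    Assumes linear motion: the co-located vector is scaled by the ratio
    of temporal distances. No motion vector needs to be transmitted.
    """
    mvx, mvy = mv_col
    forward = (mvx * tr_b / tr_d, mvy * tr_b / tr_d)
    backward = (mvx * (tr_b - tr_d) / tr_d, mvy * (tr_b - tr_d) / tr_d)
    return forward, backward

# B frame one frame after the forward reference, references three frames apart.
print(direct_mode_vectors((6, -3), tr_b=1, tr_d=3))
# ((2.0, -1.0), (-4.0, 2.0))
```

The backward vector has the opposite sign because it points from the B frame to a later picture, while the co-located vector points to an earlier one.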
When P frames and B frames are encoded with a fixed number of reference frames, a P frame has more forward reference frames than a B frame, because a B frame must include a backward reference frame while a P frame does not. This leads to a motion vector mismatch problem. When the motion vectors for a B frame are derived, the motion vectors of the blocks in its backward reference frame are used. Because the backward reference frame is a P frame, the reference frame pointed to by such a motion vector may lie beyond the furthest forward reference frame that the B frame can point to. For example, as shown in FIG. 1, when the fixed number of reference frames is 2, the reference frames that the B frame can point to are its two adjacent P frames, while the reference frames of the P frame serving as the backward reference frame of the B frame are the two P frames preceding that P frame, namely P_REF_1 and P_REF_0. When the motion vector of the P frame points to the earliest P frame, P_REF_1, that frame lies beyond the furthest forward reference frame that the B frame can point to, and the motion vector derived for the B frame cannot reach P_REF_1. The B frame therefore cannot obtain the actual reference block for encoding, so an encoding deviation may appear and serious image distortion will result.
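The mismatch condition in the FIG. 1 example can be expressed as a simple check. The sketch below is illustrative only. It assumes that the reference frames of the backward P frame are indexed from the nearest one (index 0, P_REF_0) outward (index 1, P_REF_1), and that a B frame with a fixed reference frame number N can reach N - 1 forward reference frames because one slot is taken by the backward reference frame. The function name and this generalization beyond N = 2 are assumptions.

```python
def direct_mode_mismatch(p_ref_index, num_ref_frames):
    """Return True if the frame referenced by the backward P frame's
    motion vector lies beyond the forward reference frames that the
    B frame can point to.

    p_ref_index: 0 for the nearest earlier P frame (P_REF_0),
                 1 for the next one further back (P_REF_1), and so on.
    num_ref_frames: the fixed number of reference frames.
    """
    b_forward_refs = num_ref_frames - 1   # one slot holds the backward reference
    return p_ref_index >= b_forward_refs

# FIG. 1 case: two reference frames.
print(direct_mode_mismatch(0, 2))  # False: P_REF_0 is reachable by the B frame
print(direct_mode_mismatch(1, 2))  # True: P_REF_1 is out of reach
```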