Generally, in moving picture coding, the volume of information is compressed by reducing redundancy in the temporal and spatial directions. In inter picture prediction coding, which aims at reducing temporal redundancy, motion estimation and motion compensation are therefore performed on a block-by-block basis referring to a preceding or a following picture, and the difference between the obtained predictive picture and the current picture is coded.
In the moving picture coding method H.26L, which is currently under standardization, three picture types are proposed: a picture coded using only intra picture prediction coding (I picture); a picture for which inter picture prediction coding is performed referring to one picture (hereinafter P picture); and a picture for which inter picture prediction coding is performed referring to two pictures preceding in display order, two pictures following in display order, or one picture preceding and one picture following in display order (hereinafter B picture).
FIG. 1 is an illustration showing an example of the reference relation between each picture and its reference pictures according to the above-mentioned moving picture coding method.
In picture I1, intra picture prediction coding is performed without a reference picture, and in picture P10, inter picture prediction coding is performed referring to P7, a picture preceding it in display order. In picture B6, inter picture prediction coding is performed referring to two pictures preceding in display order; in picture B12, it is performed referring to two pictures following in display order; and in picture B18, it is performed referring to one picture preceding and one picture following in display order.
The direct mode is one of the prediction modes of bi-prediction, in which inter picture prediction coding is performed referring to one picture preceding and one picture following in display order. In the direct mode, motion vectors for a block to be coded are not coded in the bit stream directly. Instead, the two motion vectors used for actual motion compensation are calculated referring to a motion vector of a co-located block in a coded picture close, in display order, to the picture including the block to be coded, and a predictive block is generated.
FIG. 2 shows an example in which the coded picture referred to in order to determine a motion vector in the direct mode contains a motion vector that refers to a preceding picture in display order. “P” indicated by a vertical line in FIG. 2 has nothing to do with a picture type; it simply denotes a picture. In FIG. 2, for example, the picture P83, for which bi-prediction is performed referring to the pictures P82 and P84, is the current picture to be coded. Assuming that the block to be coded in the picture P83 is the block MB81, a motion vector of the block MB81 is determined using the motion vector of the co-located block MB82 in the picture P84, which is a coded backward reference picture. Since the block MB82 contains only one motion vector, MV81, the two motion vectors MV82 and MV83 to be obtained are calculated directly by scaling the motion vector MV81 using the time interval TR81, based on Equation 1 (a) and Equation 1 (b).
MV82 = MV81 / TR81 × TR82    Equation 1 (a)
MV83 = −MV81 / TR81 × TR83    Equation 1 (b)
In these equations, the time interval TR81 shows an interval between the picture P84 and the picture P82, that is, a time interval between the picture P84 and a reference picture indicated by the motion vector MV81. The time interval TR82 shows a time interval between the picture P83 and a reference picture indicated by the motion vector MV82. The time interval TR83 shows a time interval between the picture P83 and a reference picture indicated by the motion vector MV83.
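The scaling in Equation 1 (a) and Equation 1 (b) can be illustrated with a minimal Python sketch. The function name, the representation of a motion vector as an (x, y) tuple, and the sample values are assumptions made only for illustration and do not appear in the description above; exact fractions are used so that no rounding rule has to be assumed.

```python
from fractions import Fraction


def temporal_direct_vectors(mv81, tr81, tr82, tr83):
    """Scale the co-located block's vector MV81 into MV82 and MV83.

    mv81 is an (x, y) tuple of integers (hypothetical representation);
    tr81, tr82 and tr83 are the time intervals named in the text.
    """
    if tr81 == 0:
        raise ValueError("TR81 must be non-zero")
    # Equation 1 (a): MV82 = MV81 / TR81 * TR82
    mv82 = tuple(Fraction(c, tr81) * tr82 for c in mv81)
    # Equation 1 (b): MV83 = -MV81 / TR81 * TR83
    mv83 = tuple(-Fraction(c, tr81) * tr83 for c in mv81)
    return mv82, mv83


# Illustrative values only: P83 lies midway between P82 and P84.
mv82, mv83 = temporal_direct_vectors((6, -3), tr81=2, tr82=1, tr83=1)
```

With these sample values, MV82 points half as far as MV81 in the same direction, and MV83 points the same distance in the opposite direction.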
The direct mode includes two methods: the temporal prediction already explained, and the spatial prediction, which is explained below. In the spatial prediction in the direct mode, coding is performed, for example, on the basis of a macroblock of 16×16 pixels. From the motion vectors of three macroblocks neighboring the current macroblock to be coded, a motion vector obtained referring to the picture closest to the current picture to be coded in display order is selected, and the selected motion vector is used as the motion vector of the current macroblock to be coded. If all three motion vectors refer to the same picture, their median value is selected. If two of the three motion vectors refer to the picture closest to the current picture to be coded in display order, the remaining one is considered as a “0” vector, and the median value of these three values is selected. If only one motion vector refers to the picture closest to the current picture to be coded in display order, that motion vector is selected. Thus, in the direct mode, a motion vector is not coded for the current macroblock to be coded, and motion prediction is performed using a motion vector contained in another macroblock.
FIG. 3A is an illustration showing an example of a motion vector predicting method in the case that a picture preceding a B picture in display order is referred to using a conventional spatial predicting method in the direct mode. In FIG. 3A, P indicates a P picture, B indicates a B picture, and the numbers attached to the picture types of the four pictures on the right indicate the order in which each picture is coded. Assume that the diagonally shaded macroblock in the picture B4 is the current macroblock to be coded. When a motion vector of the current macroblock to be coded is calculated using the spatial predicting method in the direct mode, first, three coded macroblocks (the area shaded with broken lines) are selected from the macroblocks neighboring the current macroblock to be coded. Explanation of the method for selecting the three neighboring macroblocks is omitted here. The motion vectors of the three coded macroblocks have already been calculated and stored. These motion vectors may have been obtained referring to a different picture for each macroblock, even though the macroblocks are in the same picture. The reference indices of the reference pictures used for coding each macroblock show which picture each of the three neighboring macroblocks refers to. Details of the reference indices will be explained later.
Now, for example, assume that three neighboring macroblocks are selected for the current macroblock to be coded shown in FIG. 3A, and that the motion vectors of these coded macroblocks are the motion vectors a, b and c respectively. Here, assume that the motion vector a and the motion vector b are obtained referring to a P picture with a picture number 11 of “11”, and that the motion vector c is obtained referring to a P picture with the picture number 11 of “8”. In this case, among the motion vectors a, b and c, the motion vectors a and b, which refer to the picture closest to the current picture to be coded in order of display time, are candidates for the motion vector of the current macroblock to be coded. The motion vector c is then considered as “0”, and the median value of the three motion vectors a, b and c is selected and determined as the motion vector of the current macroblock to be coded.
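The selection rules of the conventional spatial prediction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the (x, y) tuple representation, the use of a display-order distance to identify the closest reference picture, and the component-wise median are all hypothetical choices that the description does not specify.

```python
from statistics import median


def spatial_direct_vector(neighbors):
    """Pick a motion vector from three neighboring macroblocks.

    neighbors is a list of three (mv, distance) pairs, where mv is an
    (x, y) tuple and distance is the display-order distance from the
    current picture to the picture that mv refers to (assumed input).
    """
    if len(neighbors) != 3:
        raise ValueError("exactly three neighboring macroblocks are expected")
    closest = min(distance for _, distance in neighbors)
    candidates = [mv for mv, distance in neighbors if distance == closest]
    # Only one vector refers to the closest picture: select it as is.
    if len(candidates) == 1:
        return candidates[0]
    # Otherwise treat vectors to other pictures as "0" and take the median
    # (component-wise median is an assumption of this sketch).
    padded = [mv if distance == closest else (0, 0) for mv, distance in neighbors]
    return tuple(median(component) for component in zip(*padded))


# Vectors a and b refer to the closest picture, c refers to a farther one.
a, b, c = (4, 2), (6, -2), (10, 10)
selected = spatial_direct_vector([(a, 1), (b, 1), (c, 4)])
```

In this usage example the vector c is replaced by (0, 0) before the median is taken, matching the case described for FIG. 3A; the numeric values themselves are invented for illustration.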
However, a coding method such as MPEG-4 can code each macroblock in a picture using either a field structure or a frame structure. Therefore, in a coding method such as MPEG-4, a macroblock coded in the field structure and a macroblock coded in the frame structure may be mixed in one reference frame. Even in such a case, if the three macroblocks neighboring the current macroblock to be coded are coded in the same structure as the current macroblock to be coded, a motion vector of the current macroblock to be coded can be derived without any problem using the above-mentioned spatial predicting method in the direct mode. This covers the case in which the three neighboring macroblocks are coded in the frame structure for a current macroblock to be coded in the frame structure, and the case in which the three neighboring macroblocks are coded in the field structure for a current macroblock to be coded in the field structure. The former case is as already explained. In the latter case, the three motion vectors corresponding to the top fields of the three neighboring macroblocks are used for the top field of the current macroblock to be coded, and the three motion vectors corresponding to the bottom fields of the three neighboring macroblocks are used for the bottom field of the current macroblock to be coded, so that a motion vector of the current macroblock to be coded can be derived for the top field and the bottom field respectively using the above-mentioned method.
However, in the temporal prediction method in the direct mode, a problem occurs when motion compensation in the direct mode is performed and the block whose motion vector is referred to belongs to a B picture such as B6 shown in FIG. 1. Since such a block contains plural motion vectors, the calculation of a motion vector by scaling based on Equation 1 cannot be applied directly.
Furthermore, there are cases in which the precision of a motion vector value (half pixel precision or quarter pixel precision, for example) does not meet the predetermined precision, since a dividing operation is performed after the calculation of the motion vector.
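One possible reading of this precision problem can be shown with a small Python sketch: when a motion vector component is stored as an integer number of precision units (quarter pixels, for example), the division in the scaling may produce a value that is not a whole number of those units. The function names and sample values below are hypothetical and serve only to illustrate this reading.

```python
from fractions import Fraction


def scaled_component(mv, tr_ref, tr_target):
    """Scale one motion vector component, given in integer precision units."""
    return Fraction(mv, tr_ref) * tr_target


def meets_precision(value):
    """True if the scaled value is still a whole number of precision units."""
    return value.denominator == 1


# A component of 6 quarter-pixel units scales to exactly 3 units.
exact = scaled_component(6, 2, 1)
# A component of 5 quarter-pixel units scales to 5/2 units, which cannot
# be represented at quarter-pixel precision without some rounding rule.
inexact = scaled_component(5, 2, 1)
```

Here `meets_precision(exact)` holds while `meets_precision(inexact)` does not, which is the kind of case the paragraph above refers to under this interpretation.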
In the spatial prediction, when the current macroblock to be coded and one of the neighboring macroblocks are coded in different structures, it is not specified which of the field structure and the frame structure is to be used for coding the current macroblock to be coded, nor is a method specified for selecting a motion vector of the current macroblock to be coded from the motion vectors of neighboring macroblocks coded in both the field structure and the frame structure.
The first object of the present invention is to offer a motion vector prediction method in the temporal direction with high precision in the direct mode, even if the block whose motion vector is referred to belongs to a B picture.
The second object of the present invention is to offer a motion vector prediction method in the spatial direction with high precision in the direct mode, even if the block whose motion vector is referred to belongs to a B picture.