In general, moving pictures are encoded by reducing the amount of information, which is done by exploiting the redundancy that exists in them in the spatial and temporal directions. Inter picture prediction is the method used to exploit redundancy in the temporal direction. In inter picture prediction, a picture is encoded with reference to another picture, called a reference picture, that lies near it in the forward or backward direction in display time order.
More specifically, to reduce the amount of information, the motion relative to the reference picture is detected, and the spatial redundancy remaining in the difference between the motion-compensated picture and the encoding target picture is then reduced.
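As a rough illustration of detecting motion and forming the difference that remains, the following is a minimal full-search block-matching sketch that uses the sum of absolute differences (SAD) as its cost. It is a generic textbook technique, not the method of any particular encoder; the names `motion_search` and `sad`, the integer-sample search, and the frame representation (lists of rows of integer samples) are all assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of integer sample values (hypothetical representation)


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between n x n blocks at (cx, cy) in cur and (rx, ry) in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def motion_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                  search_range: int) -> Tuple[Tuple[int, int], Frame]:
    """Full search: return the motion vector (dx, dy) and the residual block."""
    height, width = len(ref), len(ref[0])
    best = (sad(cur, ref, bx, by, bx, by, n), 0, 0)  # start from zero motion
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would fall outside the reference picture
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    # Difference between the target block and its motion-compensated prediction.
    residual = [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
                for j in range(n)]
    return (dx, dy), residual


# Hypothetical 8x8 pictures: the current picture is the reference shifted one sample left.
ref = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(8)]
cur = [[ref[y][x + 1] if x + 1 < 8 else 0 for x in range(8)] for y in range(8)]
mv, residual = motion_search(cur, ref, bx=2, by=2, n=4, search_range=2)
print(mv)  # (1, 0)
```

Practical encoders usually add sub-sample refinement and faster search patterns; the exhaustive search here simply keeps the logic easy to follow. In this example the residual block is all zeros, which is exactly the case where little information remains to be coded.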
According to the H.264 standard, established in recent years, encoding is performed in units of slices.
A slice is a unit smaller than a picture and is composed of a plurality of macroblocks. Each picture is composed of one or more slices.
An I-slice is a slice consisting of blocks that are encoded by intra picture prediction using only the encoding target picture itself; an I-slice has no reference picture. A P-slice is a slice that may include (i) blocks encoded by inter picture prediction with reference to one already-decoded picture and (ii) blocks encoded by intra picture prediction. A B-slice is a slice that may include (i) blocks encoded by inter picture prediction with reference to at most two already-decoded pictures at the same time and (ii) blocks encoded by intra picture prediction.
A picture can contain slices of more than one type. A picture containing only I-slices is called an I-picture; a picture containing only I-slices and P-slices is called a P-picture; and a picture containing I-slices, P-slices, and B-slices is called a B-picture.
The following description is given in units of pictures, but it applies equally to units of slices.
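The naming rule above can be expressed as a small classification function. This is a sketch under one assumption: the text names as a B-picture one that contains I-, P-, and B-slices, and here any picture containing at least one B-slice is treated as a B-picture (and any picture with a P-slice but no B-slice as a P-picture). The function name `picture_type` is hypothetical.

```python
from typing import Iterable


def picture_type(slice_types: Iterable[str]) -> str:
    """Classify a picture from the types of the slices it contains."""
    kinds = set(slice_types)
    if not kinds or not kinds <= {"I", "P", "B"}:
        raise ValueError(f"unexpected slice types: {sorted(kinds)}")
    if "B" in kinds:
        return "B"  # assumption: any B-slice makes the picture a B-picture
    if "P" in kinds:
        return "P"  # only I- and P-slices
    return "I"      # only I-slices


print(picture_type(["I", "P", "P"]))  # P
```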
Compared with the MPEG-2 and MPEG-4 standards, the H.264 standard significantly relaxes the restrictions on reference pictures. The reference picture for a block in a P-picture may lie either forward or backward of the target in display time order, as long as it has already been decoded. Likewise, the (at most two) pictures referred to by a block in a B-picture may lie either forward or backward of the target in display time order, as long as they have already been decoded. Furthermore, a reference picture may be of any type: an I-picture, a P-picture, or a B-picture.
FIG. 40 illustrates the prediction relationships among pictures in the above-described moving picture encoding methods.
In FIG. 40, each vertical line represents a picture, and the label at the bottom right of each picture indicates its picture type (I, P, or B). Each arrow indicates that the picture at the start of the arrow is decoded by inter picture prediction using the picture at the end of the arrow as its reference picture.
A B-picture can refer to at most two pictures. One of the two references is called the forward reference (L0), and the other is called the backward reference (L1).
Note that the forward reference gives priority to a picture lying in the forward direction in display time order, but the referenced picture need not lie in that direction. Similarly, the backward reference gives priority to a picture lying in the backward direction in display time order, but the referenced picture need not lie in that direction.
In a P-picture, each block can refer to at most one picture, and only the forward reference (L0) is available. As with the B-picture, the referenced picture need not lie in the forward direction in display time order.
For example, B-picture B9, the 9th picture counting picture I1 as the first, uses as its forward reference P-picture P10, the 10th picture, which comes after B9 in display time order, and uses as its backward reference P-picture P7, the 7th picture, which comes before B9 in display time order.
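The reference rules described above (no reference for an I-picture, at most one L0 reference for a P-picture, at most one L0 and one L1 reference for a B-picture, and only already-decoded pictures regardless of display direction) can be sketched as a per-block check. The names `BlockRefs` and `refs_allowed`, and the use of display-order numbers such as 7 and 10 to identify pictures, are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class BlockRefs:
    """Reference pictures used by one block, identified by display-order number (hypothetical)."""
    l0: Optional[int] = None  # forward reference (L0)
    l1: Optional[int] = None  # backward reference (L1)


def refs_allowed(picture_type: str, refs: BlockRefs, decoded: Set[int]) -> bool:
    """Return True if a block in a picture of `picture_type` may use `refs`."""
    used = [r for r in (refs.l0, refs.l1) if r is not None]
    if picture_type == "I":
        return not used  # intra only: no reference picture at all
    if picture_type == "P":
        if refs.l1 is not None:
            return False  # a P-picture block may use only the forward reference (L0)
    elif picture_type != "B":
        raise ValueError(f"unknown picture type: {picture_type!r}")
    # Display direction does not matter; every reference must already be decoded.
    return all(r in decoded for r in used)


# B9 refers to P10 (after it in display order) via L0 and to P7 (before it) via L1.
print(refs_allowed("B", BlockRefs(l0=10, l1=7), decoded={1, 4, 7, 10}))  # True
```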
Compared with the MPEG-2 and MPEG-4 standards, the H.264 standard also significantly relaxes the restrictions on display order. The display order can be determined independently of the decoding order, as long as the picture memory that stores decoded pictures does not overflow.
FIG. 41 illustrates the relationships between the decoding order and the display order with respect to the pictures in the above-described moving picture encoding methods.
In FIG. 41, the numbers in the upper row indicate the decoding order of the pictures, and the numbers in the lower row indicate the display order of the pictures. In FIG. 41, the arrows in the middle part indicate the relationships between the decoding order and the display order. The display order is encoded as an attribute of each picture.
For example, P-picture P10 in FIG. 41 is displayed after B-picture B11 and P-picture P13 that are decoded after P-picture P10.
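To illustrate how the display order can be decoupled from the decoding order while the picture memory does not overflow, the following sketch stores each picture as it is decoded and outputs pictures as soon as the next one in display order is available. It is a simplified model, not the buffer-management procedure defined by H.264: it assumes the display-order numbers are consecutive integers, treats the picture memory as a plain count of stored pictures (`capacity`, a hypothetical parameter), and ignores pictures kept only for reference.

```python
from typing import List, Set, Tuple


def reorder_for_display(display_orders: List[int], capacity: int) -> Tuple[List[int], int]:
    """Simulate output of decoded pictures in display order.

    `display_orders` lists each picture's display-order number in decoding order
    (assumed to be consecutive integers). Returns the output sequence and the
    peak number of pictures held in picture memory.
    """
    next_out = min(display_orders)  # first picture to display
    stored: Set[int] = set()
    shown: List[int] = []
    peak = 0
    for d in display_orders:
        stored.add(d)  # the decoded picture enters picture memory
        peak = max(peak, len(stored))
        if len(stored) > capacity:
            raise OverflowError(f"picture memory overflow after decoding picture {d}")
        # Output every picture whose turn in display order has come.
        while next_out in stored:
            stored.remove(next_out)
            shown.append(next_out)
            next_out += 1
    return shown, peak


# Hypothetical sequence: the picture displayed 4th is decoded 2nd.
print(reorder_for_display([1, 4, 2, 3, 7, 5, 6], capacity=3))
# ([1, 2, 3, 4, 5, 6, 7], 2)
```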
In the H.264 standard, an encoding mode called direct mode can also be selected for a B-picture. In direct mode, the encoding target block itself carries no motion vector. There are two types of direct mode: temporal direct mode and spatial direct mode.
In temporal direct mode, the motion vector for the encoding target block is generated by prediction: a motion vector of another, already-encoded picture is used as a reference motion vector and is scaled according to the positional relationships among the pictures in display time (see, for example, Patent Document 1).
FIG. 42 illustrates a method of generating a motion vector by prediction in the temporal direct mode.
In FIG. 42, each vertical line represents a picture, and the label at the top right of each picture indicates its picture type (P for a P-picture, B for a B-picture). The number following each picture type indicates the decoding order of the picture. Throughout the rest of this description, picture numbers in the drawings follow this same convention.
Pictures P1, B3, B4, B5, and P2 have display time information T1, T2, T3, T4, and T5, respectively. The following describes the case where block BL0 of picture B5 is decoded in temporal direct mode.
This case uses motion vector MV1 of block BL1 (the anchor block), which is located at the same coordinates as block BL0 in picture P2 (the anchor picture), an already-decoded picture near picture B5 in display time. Motion vector MV1 is the motion vector that was used when block BL1 was decoded, and it refers to picture P1. Block BL0 is then decoded using two motion vectors: motion vector MV_F, which refers to picture P1, and motion vector MV_B, which refers to picture P2.
The magnitudes of motion vectors MV_F and MV_B are obtained by Equation 1, where MV denotes the magnitude of motion vector MV1, MVf the magnitude of motion vector MV_F, and MVb the magnitude of motion vector MV_B:

$$MVf = \frac{T5 - T1 - (T5 - T4)}{T5 - T1} \times MV = \frac{T4 - T1}{T5 - T1} \times MV, \qquad MVb = \frac{T5 - T4}{T5 - T1} \times MV \qquad \text{(Equation 1)}$$

Deriving MVf and MVb from MV1 in this way is called the scaling process. Block BL0 is motion-compensated from reference pictures P1 and P2 using motion vectors MV_F and MV_B obtained through the scaling process.
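The scaling process of Equation 1 can be checked with a short sketch. It assumes integer display times and a two-component motion vector, and uses exact fractions so the ratios are not distorted by rounding; H.264 itself performs this scaling in fixed-point integer arithmetic, so real decoders can differ by rounding. The function name `temporal_direct_scale` is hypothetical. Following Equation 1, both returned vectors keep the orientation of MV1 and carry the magnitudes MVf and MVb; in the signed convention of H.264 the backward vector instead points the other way (MV_B = MV_F − MV1).

```python
from fractions import Fraction
from typing import Tuple

Vector = Tuple[Fraction, Fraction]


def temporal_direct_scale(mv1: Tuple[int, int], t1: int, t4: int, t5: int) -> Tuple[Vector, Vector]:
    """Scale anchor vector MV1 (picture P2 at T5 -> picture P1 at T1) for a block at T4."""
    td = t5 - t1  # display-time distance spanned by MV1
    if td == 0:
        raise ValueError("anchor and reference pictures share a display time")
    ratio_f = Fraction(t4 - t1, td)  # (T4 - T1) / (T5 - T1)
    ratio_b = Fraction(t5 - t4, td)  # (T5 - T4) / (T5 - T1)
    mv_f = (ratio_f * mv1[0], ratio_f * mv1[1])
    mv_b = (ratio_b * mv1[0], ratio_b * mv1[1])
    return mv_f, mv_b


# Hypothetical display times T1 = 1, T4 = 4, T5 = 5 and MV1 = (8, -4).
mv_f, mv_b = temporal_direct_scale((8, -4), t1=1, t4=4, t5=5)
print([int(c) for c in mv_f], [int(c) for c in mv_b])  # [6, -3] [2, -1]
```

Because B5 lies three quarters of the way from P1 to P2 in this hypothetical timing, MV_F receives 3/4 of MV1 and MV_B the remaining 1/4.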
In spatial direct mode, on the other hand, the decoding target block likewise carries no motion vector of its own; instead, decoding uses a motion vector owned by an already-decoded block spatially adjacent to the decoding target block (see, for example, Patent Document 2).
FIG. 43 illustrates a method of generating a motion vector by prediction in spatial direct mode. Consider the case where block BL0 of picture B5 in FIG. 43 is decoded in spatial direct mode. Motion vectors MVA1, MVB1, and MVC1 belong to the already-decoded blocks containing pixels A, B, and C, respectively, which are adjacent to block BL0, the decoding target block. From among these, the motion vectors that refer to the decoded picture closest to the decoding target picture in display time order are selected as candidates for the motion vector of the decoding target block.
When this process yields three candidates, their median is selected as the motion vector of the decoding target block. When it yields two candidates, their mean is calculated and selected as the motion vector of the decoding target block.
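The candidate selection and combination rule can be sketched as follows for one reference direction. Several details are hypothetical assumptions: each neighboring block (A, B, C) is given as its motion vector plus the display time of the picture it refers to, or `None` if it has no usable vector; the median of three vectors is taken component by component; when two reference pictures are equally close, the first one encountered wins; and a single candidate is used as-is, a case the text does not address. The function name `spatial_direct_mv` is also hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple

Vec = Tuple[float, float]
# (motion vector, display time of the picture it refers to); None = no usable vector
Neighbor = Optional[Tuple[Vec, int]]


def spatial_direct_mv(neighbors: List[Neighbor], target_time: int) -> Optional[Vec]:
    """Derive one motion vector of the target block from neighbors A, B, C."""
    available = [n for n in neighbors if n is not None]
    if not available:
        return None
    # Reference picture closest to the target picture in display time order.
    closest = min((t for _, t in available), key=lambda t: abs(t - target_time))
    cands = [mv for mv, t in available if t == closest]
    if len(cands) == 3:
        # Median, taken separately for the x and y components.
        return (median(v[0] for v in cands), median(v[1] for v in cands))
    if len(cands) == 2:
        (ax, ay), (bx, by) = cands
        return ((ax + bx) / 2, (ay + by) / 2)  # mean of the two candidates
    return cands[0]  # single candidate: assumed to be used directly


# Hypothetical times: target B5 at 4, P2 at 5, P1 at 1. A and C refer to P2, B to P1.
a, b, c = ((4, 2), 5), ((10, 10), 1), ((6, -2), 5)
print(spatial_direct_mv([a, b, c], target_time=4))  # (5.0, 0.0)
```

In this hypothetical setup only A and C refer to the closest picture, so their mean is returned, mirroring the selection of MVA1 and MVC1 in the FIG. 43 example.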
In the example shown in FIG. 43, motion vectors MVA1 and MVC1 refer to picture P2, and motion vector MVB1 refers to picture P1. Because picture P2 is the decoded picture closest to the decoding target picture in display time order, the mean of MVA1 and MVC1 is calculated, and the result becomes MV_F, the first motion vector of the decoding target block. The second motion vector, MV_B, is obtained in the same manner; in this example, motion vector MVB2, which refers to picture P3, the decoded picture closest in display time order, is selected as MV_B.

Patent Document 1: Japanese Patent Application Publication No. 11-75191
Patent Document 2: International Publication No. 2004/008775