For example, international standard video encoding systems such as MPEG (Moving Picture Experts Group) and “ITU-T H.26x” define, as one unit, block data (referred to as a “macroblock” from here on) that combines 16×16 pixels of a luminance signal with the 8×8 pixels of each of the color difference signals corresponding to those 16×16 luminance pixels, and compress image data on the basis of a motion compensation technology and an orthogonal transformation/transform coefficient quantization technology. In the motion compensation processes carried out by a moving image encoding device and a moving image decoding device, a forward picture or a backward picture is referred to, and detection of a motion vector and generation of a prediction image are carried out for each macroblock. A picture on which inter-frame prediction encoding is carried out by referring to only one picture is referred to as a P picture, and a picture on which inter-frame prediction encoding is carried out by simultaneously referring to two pictures is referred to as a B picture.
In AVC/H.264, which is an international standard system (ISO/IEC 14496-10 | ITU-T H.264), an encoding mode called a direct mode can be selected when encoding a B picture (for example, refer to nonpatent reference 1). More specifically, an encoding mode can be selected in which the macroblock to be encoded has no encoded motion vector data, and its motion vector is instead generated by a predetermined arithmetic process using a motion vector of a macroblock of another already-encoded picture and a motion vector of an adjacent macroblock.
This direct mode includes the following two types of modes: a temporal direct mode and a spatial direct mode. In the temporal direct mode, a motion vector of the macroblock to be encoded is generated by referring to the motion vector of another already-encoded picture and then scaling that motion vector according to the time difference between the other already-encoded picture and the picture which is the target to be encoded. In the spatial direct mode, a motion vector of the macroblock to be encoded is generated from the motion vector of at least one already-encoded macroblock located in the vicinity of the macroblock to be encoded. Either the temporal direct mode or the spatial direct mode can be selected for each slice by using “direct_spatial_mv_pred_flag”, which is a flag disposed in each slice header. Among the direct modes, a mode in which transform coefficients are not encoded is referred to as a skip mode. In the description below, the direct mode also includes the skip mode.
FIG. 11 is a schematic diagram showing a method of generating a motion vector in the temporal direct mode. In FIG. 11, “P” denotes a P picture and “B” denotes a B picture. Further, the numerals 0 to 3 denote the order in which the pictures designated by those numerals are displayed, and indicate that the pictures are displayed at times T0, T1, T2, and T3, respectively. It is assumed that the encoding process on the pictures is carried out in the order P0, P3, B1, and B2.
For example, a case in which a macroblock MB1 in the picture B2 is encoded in the temporal direct mode will be considered hereafter. In this case, the motion vector MV of a macroblock MB2 is used, where the macroblock MB2 belongs to the picture P3, which is the closest to the picture B2 among the already-encoded pictures located backward with respect to the picture B2 on the time axis, and is spatially located at the same position as the macroblock MB1. This motion vector MV refers to the picture P0, and the motion vectors MVL0 and MVL1 which are used when encoding the macroblock MB1 are calculated according to the following equation (1).
$$\mathrm{MVL0} = \frac{T2 - T0}{T3 - T0} \times \mathrm{MV}, \qquad \mathrm{MVL1} = \frac{T2 - T3}{T3 - T0} \times \mathrm{MV} \tag{1}$$
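The scaling in equation (1) can be illustrated with a minimal sketch. The function name temporal_direct_mvs, the representation of a motion vector as an (x, y) tuple, and the use of exact fractions are assumptions made only for this illustration; a practical codec would typically use fixed-point integer arithmetic with rounding instead.

```python
from fractions import Fraction


def temporal_direct_mvs(mv, t0, t2, t3):
    """Scale the co-located motion vector MV as in equation (1).

    mv is the (x, y) motion vector of the co-located macroblock MB2 in P3,
    which refers to P0. t0, t2 and t3 are the integer display times of
    P0, B2 and P3. Returns (MVL0, MVL1) as tuples of exact fractions.
    """
    span = t3 - t0
    if span == 0:
        raise ValueError("T3 and T0 must differ")
    scale_l0 = Fraction(t2 - t0, span)  # (T2 - T0) / (T3 - T0)
    scale_l1 = Fraction(t2 - t3, span)  # (T2 - T3) / (T3 - T0)
    mvl0 = tuple(scale_l0 * c for c in mv)
    mvl1 = tuple(scale_l1 * c for c in mv)
    return mvl0, mvl1
```

With T0 = 0, T2 = 2, T3 = 3 and MV = (6, -3), the sketch gives MVL0 = (4, -2), which points toward P0, and MVL1 = (-2, 1), which points in the opposite direction toward P3.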
FIG. 12 is a schematic diagram showing a method of generating a motion vector in the spatial direct mode. In FIG. 12, currentMB denotes the macroblock to be encoded. At this time, when the motion vector of an already-encoded macroblock A on a left side of the macroblock to be encoded is expressed as MVa, the motion vector of an already-encoded macroblock B on an upper side of the macroblock to be encoded is expressed as MVb, and the motion vector of an already-encoded macroblock C on an upper right side of the macroblock to be encoded is expressed as MVc, the motion vector MV of the macroblock to be encoded can be calculated by determining the median of these motion vectors MVa, MVb, and MVc, as shown in the following equation (2).
$$\mathrm{MV} = \mathrm{median}(\mathrm{MVa}, \mathrm{MVb}, \mathrm{MVc}) \tag{2}$$
In the spatial direct mode, a motion vector is determined for each of the forward and backward pictures, and both motion vectors can be determined by using the above-mentioned method.
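The median of equation (2) can be illustrated with a minimal sketch. It assumes that the median is taken separately for the horizontal and vertical components, and that each motion vector is an (x, y) tuple; the function names median3 and spatial_direct_mv are hypothetical and used only for this illustration.

```python
def median3(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]


def spatial_direct_mv(mva, mvb, mvc):
    """Component-wise median of the neighbouring motion vectors.

    mva, mvb and mvc are the (x, y) motion vectors of the macroblocks
    A (left), B (upper) and C (upper right) of the macroblock to be
    encoded, as in equation (2).
    """
    return tuple(median3(a, b, c) for a, b, c in zip(mva, mvb, mvc))
```

For MVa = (1, 5), MVb = (3, -2), and MVc = (2, 0), the sketch gives MV = (2, 0). The same function would be applied once to the forward motion vectors and once to the backward motion vectors of the neighbouring macroblocks.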
A reference image which is used for the generation of a prediction image is managed as a reference image list for each vector which is used for reference. When two vectors are used, reference image lists are referred to as a list 0 and a list 1, respectively. Reference images are stored in the reference image lists in reverse chronological order, respectively, and, in a general case, the list 0 shows a forward reference image and the list 1 shows a backward reference image. As an alternative, the list 1 can show a forward reference image and the list 0 can show a backward reference image, or each of the lists 0 and 1 can show a forward reference image and a backward reference image. Further, the reference image lists do not have to be aligned in reverse chronological order. For example, the following nonpatent reference 1 describes that the reference image lists can be ordered for each slice.