1. Field of the Invention
The present invention relates to a video coding method, a video coding apparatus and a video coding program, and, for example, to a technique that can be effectively applied to video coding that refers to a plurality of pictures.
2. Description of the Related Art
With the widespread digitalization of video images, demands for reducing the data volume and improving the image quality in the transmission and storage of video image information continue to grow.
In response to such demands, H.264/Advanced Video Coding (AVC), for example, is in the spotlight as a so-called next generation video coding technique.
In the next generation video coding, there are broadly five prediction methods for each Macro Block (MB) of a Bi-predictive Picture (B-Picture): an intra-image prediction, a forward prediction, a backward prediction, a bi-directional prediction and a direct mode.
Among them, the direct mode is a method that focuses on the continuity of a video and determines the motion vector of the current Macro Block from the motion vector of a Macro Block that is adjacent in time and in space.
The direct mode exploits the temporal and spatial correlation of motion vectors and, because motion vector information for the current Macro Block need not be transmitted, contributes to an improved motion prediction and a higher information compression ratio.
Meanwhile, in a conventional video coding such as MPEG2, the coding of a Bi-directional prediction Picture (B-Picture) basically uses one picture from the forward direction and one picture from the backward direction as reference pictures.
In contrast, H.264/AVC can use three or more reference pictures and, even when there are only two reference pictures, can select both from the forward direction or both from the backward direction. It also allows a B-Picture itself to be used as a reference picture, whereas in the conventional method no other picture can refer to a B-Picture.
The following describes the temporal direct mode, which is one of the direct modes. As already described, the direct mode does not require a transmission of motion vector information of the current Macro Block. In the temporal direct mode, the motion vector of the Macro Block at the same position (a “co-location Macro Block” hereinafter) in the picture processed in the immediate past is selected as a reference vector (a “mvCol” hereinafter), and the direct mode vector is determined from this vector according to a ratio of time distances. That is, in a common sequence of a video coding, a given B-Picture is processed after its reference picture in the forward direction (i.e., in the time past direction) and its reference picture in the backward direction (i.e., in the time future direction), and therefore the picture processed in the immediate past is a future picture.
FIG. 8 exemplifies a motion vector of the temporal direct mode. As shown in FIG. 8, if the mvCol between two pictures (Pic) is (−5, −10), the direct mode vectors of the B-Pic drawn on the left are motion vectors of the temporal direct mode, each half the size of the mvCol: the forward vector is (−2.5, −5), in the same direction as the mvCol, and the backward vector is (2.5, 5), in the opposite direction.
Note that, if there are two B-Pics, or if a B-Pic has a Field structure comprising a plurality of Fields corresponding to a jump scan (i.e., an interlaced scan), the weighting changes according to the time allocation.
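The numerical example of FIG. 8 can be reproduced with a minimal sketch. The function name temporal_direct and the parameters tb (the time distance from the forward reference picture to the current B-Pic) and td (the time distance spanned by the mvCol) are assumptions introduced here for illustration only, and exact fractions are used in place of whatever fixed-point rounding an actual encoder would apply.

```python
from fractions import Fraction


def temporal_direct(mv_col, tb, td):
    """Split mvCol into forward and backward direct vectors.

    mv_col: (x, y) components of the reference vector mvCol
    tb:     assumed time distance, forward reference -> current B-Pic
    td:     assumed time distance, forward reference -> Col-Pic
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    ratio = Fraction(tb, td)
    # Forward vector: same direction as mvCol, scaled by tb/td.
    forward = tuple(ratio * c for c in mv_col)
    # Backward vector: the remainder of mvCol, pointing the opposite way.
    backward = tuple((ratio - 1) * c for c in mv_col)
    return forward, backward


# B-Pic midway between the two pictures, as in the FIG. 8 example.
fwd, bwd = temporal_direct((-5, -10), tb=1, td=2)
print([float(c) for c in fwd], [float(c) for c in bwd])
```

Under the same assumptions, calling the function with tb=1 and td=3 (the first of two B-Pics between the reference pictures) weights the forward and backward vectors by 1/3 and 2/3 of the mvCol instead of one half each.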
A reference picture in the past direction is generally called List0, and one in the future direction is called List1.
A direct vector is generated by taking, as the mvCol, the motion vector of the co-location Macro Block of the picture of Reference_Index=0 of the List1 (which is called a Co-located picture; simply “Col-Pic”), and scaling it according to a time distance.
FIG. 9 exemplifies a direct vector in the case of a Frame structure corresponding to a sequential scan (i.e., a progressive scan); and FIG. 10 exemplifies a direct vector in the case of a Field structure corresponding to a jump scan (i.e., an interlaced scan).
In the case where the reference pictures List0 and List1 lie on opposite sides of the current frame in time, one in the forward direction and one in the backward direction, and the motion vector of each Macro Block of the List1 indicates the List0 as its reference picture, the motion vector of the direct mode is generated by taking the motion vector of the co-location Macro Block of the List1 as the mvCol and scaling it according to the temporal-direction distance (i.e., by an internal division of the mvCol).
However, in the case where both the List0 and List1 are in the same direction as seen from the current picture, where a motion vector indicates an opposite parity (i.e., a different Field) within the same frame in a field structure coding, or where a B-Picture becomes a reference frame, the mvCol may be a motion vector which does not straddle the current frame. In this case, the motion vector of the direct mode is calculated by an external division of the mvCol.
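The distinction between an internal division and an external division of the mvCol can be illustrated as follows. This is a sketch only: the function name direct_vectors, the integer picture times t_cur, t_ref and t_col, and the exact rational arithmetic are all assumptions made for illustration and are not taken from the figures or from the H.264/AVC text.

```python
from fractions import Fraction


def direct_vectors(mv_col, t_cur, t_ref, t_col):
    """Scale mvCol to the current picture and classify the division.

    mv_col: (x, y) vector of the co-location Macro Block
    t_cur:  assumed display time of the current picture
    t_ref:  assumed display time of the picture that mvCol points to
    t_col:  assumed display time of the Col-Pic
    """
    tb = t_cur - t_ref  # current picture relative to mvCol's target
    td = t_col - t_ref  # time span covered by mvCol
    if td == 0:
        raise ValueError("mvCol spans no time distance")
    ratio = Fraction(tb, td)
    l0_mv = tuple(ratio * c for c in mv_col)
    l1_mv = tuple((ratio - 1) * c for c in mv_col)
    # mvCol straddles the current picture only when 0 < tb/td < 1.
    kind = "internal" if 0 < ratio < 1 else "external"
    return l0_mv, l1_mv, kind


# Current picture between the two pictures: internal division.
print(direct_vectors((4, -8), t_cur=1, t_ref=0, t_col=2))
# Both pictures earlier than the current picture: external division.
print(direct_vectors((4, -8), t_cur=3, t_ref=1, t_col=2))
```

In the second call the ratio is 2, so the first direct vector is twice as large as the mvCol, which is the situation in which the mvCol does not straddle the current picture.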
FIG. 11 shows a concrete example of calculating a direct vector by an external division of the mvCol.
The example shown in FIG. 11 is for a Field structure, but it can also apply to a Frame structure in a case such as a B-Picture becoming a reference picture. In this case, a direct vector 0 (L0MV) and a direct vector 1 (L1MV) are external divisions of the mvCol, and therefore their vector accuracy is degraded. In the example shown in FIG. 11, the L1MV can only be expressed with an accuracy four times coarser than that of the MV.
Here, when a motion vector of the direct mode is calculated by an external division, it may have a larger component than the mvCol. Even if the mvCol is a motion vector of a quarter-pel (i.e., a quarter of a pixel) accuracy, for example, a vector scaled to two times can only have a half-pel (i.e., one half of a pixel) accuracy, and a vector scaled to four times can only have an integer (i.e., a unit pixel) accuracy. Accordingly, the degraded accuracy of the motion vector may cause a degraded prediction efficiency.
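The loss of accuracy can be checked with a short sketch. It assumes, for illustration only, that mvCol components are stored as integer counts of quarter pixels and that the external division multiplies them by a positive integer factor; the function name scaled_granularity and the sampled range are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

QUARTER_PEL = 4  # assumed storage unit: integer counts of 1/4 pixel


def scaled_granularity(scale, mv_range=range(-8, 9)):
    """Smallest step, in pixels, between values reachable as scale * mvCol."""
    if scale <= 0:
        raise ValueError("scale must be a positive integer")
    # Every value the scaled component can take, in quarter-pel counts.
    reachable = sorted({scale * q for q in mv_range})
    # The common spacing of those values is the effective accuracy.
    gaps = (b - a for a, b in zip(reachable, reachable[1:]))
    step_in_quarter_pels = reduce(gcd, gaps)
    return Fraction(step_in_quarter_pels, QUARTER_PEL)


for factor in (1, 2, 4):
    print(factor, scaled_granularity(factor))
```

Under these assumptions a factor of 1 keeps the quarter-pel step, a factor of 2 leaves only half-pel steps, and a factor of 4 leaves only integer-pel steps, in line with the description above.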
Meanwhile, in the case of a Field structure, in which coding is carried out in units of fields of an input video image, the common method of taking two reference pictures each in the forward and backward directions causes a shortage in the number of ref_idx, with the result that a motion vector of the temporal direct mode cannot be generated.
Patent document 1 discloses a technique comprising a judgment unit for judging whether or not a scaling process for obtaining a motion vector can be carried out in the case of a temporal direct mode for coding a video, and, if the scaling process is judged to be impossible, carrying out a motion compensation either by using another coding mode or without performing the scaling process.
The technique disclosed in patent document 1, however, discards the temporal direct mode when the scaling process cannot be carried out, and therefore risks being unable to effectively utilize the improvement of compression efficiency offered by the temporal direct mode.
[Patent document 1] Laid-Open Japanese Patent Application Publication No. 2004-215229