Video data is generally large and is thus often encoded into a compressed form for transmission from a sending device to a receiving device or for storage in a data storage device. Typical compression coding methods for video data are MPEG-1, MPEG-2, MPEG-4, and MPEG-4 AVC/H.264 (hereafter, simply called “H.264”). These are standards developed by the Moving Picture Experts Group (MPEG), H.264 having been developed jointly with the ITU-T Video Coding Experts Group (VCEG).
Those compression coding methods use interframe motion prediction techniques based on the correlation between frames. When a portion of a frame is found to have a high correlation with a portion of another frame, the coder encodes that portion as its spatial displacement (motion vector) and its pixel value differences (prediction error). Generally, video data has a high degree of frame-to-frame correlation, so the pixel differences between such correlated frames are substantially smaller than the pixel values themselves. For this reason, a high compression ratio is achieved by the use of interframe motion prediction coding.
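The relationship between a motion vector and a prediction error can be illustrated with a minimal sketch of block matching. This is not the procedure prescribed by any of the above standards; the exhaustive search, the sum-of-absolute-differences cost, and all function names (`get_block`, `sad`, `estimate_motion`) are assumptions chosen for illustration only.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def estimate_motion(ref, cur, top, left, size, search):
    """Exhaustively search ref for the block that best predicts a block of cur.

    Returns the motion vector (dy, dx), i.e. the displacement from the block
    position in cur to the matched block in ref, and the prediction error.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the reference
            cost = sad(target, get_block(ref, y, x, size))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    dy, dx = best_mv
    prediction = get_block(ref, top + dy, left + dx, size)
    error = [[t - p for t, p in zip(t_row, p_row)]
             for t_row, p_row in zip(target, prediction)]
    return best_mv, error


# Hypothetical 8x8 pictures: cur is ref shifted one pixel to the right.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]
mv, err = estimate_motion(ref, cur, top=2, left=3, size=2, search=2)
print(mv, err)
```

In this example the block is found one pixel to the left in the reference picture, so the motion vector is (0, -1) and every entry of the prediction error is zero, which is far cheaper to encode than the original pixel values.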
To implement the interframe motion prediction coding, the above-noted compression coding methods define three types of pictures, i.e., I-pictures, P-pictures, and B-pictures. I-pictures, or Intra pictures, are encoded not with interframe predictive coding techniques, but with intraframe predictive coding techniques. P-pictures, or forward predicted pictures, are encoded with reference to, in general, a previous picture. B-pictures, or bidirectional predicted pictures, are encoded with reference to, in general, both previous and later pictures.
The above-noted compression coding methods also define a “Group of Pictures” (GOP) for ease of random access. H.264 does not define the GOP per se, but uses similar structures. In a GOP, the coding is supposed to begin with an I-picture. The remaining part of the GOP includes P-pictures and B-pictures. Typically, one to four B-pictures are inserted between two P-pictures. While the arrangement of picture types (I, P, B) usually follows a fixed pattern, there is also a coding method that allows any frame to be encoded as an I-picture (see, for example, Japanese Laid-open Patent Publication No. 8-60956).
There are two types of GOP, called “open GOP” and “closed GOP.” FIG. 15 illustrates the structure of an open GOP. FIG. 16 illustrates the structure of a closed GOP.
The upper halves of FIGS. 15 and 16 depict a sequence of pictures constituting source video data, each picture being labeled with a number that indicates its position in playback order. The lower halves of FIGS. 15 and 16 depict a sequence of pictures constituting coded video data, each picture being designated by a picture type and a frame number. The frame number indicates the picture in the source video data from which the coded picture has been produced. For example, picture “I2” in FIG. 15 is an I-picture corresponding to picture “2” in the source video data.
In the example of FIG. 15, twelve pictures numbered “0” to “11” constitute a GOP. In the case of open GOP, the pictures in a GOP may be encoded by using reference pictures not only in the same GOP but also in another GOP (normally, the immediately preceding GOP). In the example of FIG. 15, two coded pictures “B0” and “B1” make reference to picture “P-1” in the immediately preceding GOP. As can be seen from this example, open GOP allows referencing across GOP boundaries.
In the example of FIG. 16, on the other hand, ten pictures numbered “2” to “11” constitute a GOP. In the case of closed GOP, the pictures in a GOP are encoded by using reference pictures only in the same GOP. In the example of FIG. 16, picture “B3” is the first B-picture appearing in its GOP and makes reference to picture “I2” in the same GOP.
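The distinction between the two GOP types can be expressed as a simple check on reference lists. The following is a minimal sketch: the `Picture` class, the helper names, and the specific reference lists are hypothetical and only loosely modeled on the pictures named above (“B0” and “B1” referencing “P-1”, “B3” referencing “I2”). They are not a reproduction of FIGS. 15 and 16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    kind: str          # "I", "P", or "B"
    number: int        # playback-order number in the source video
    refs: tuple = ()   # playback-order numbers of the reference pictures


def external_refs(gop):
    """Numbers of reference pictures that lie outside the given GOP."""
    own = {pic.number for pic in gop}
    return {ref for pic in gop for ref in pic.refs if ref not in own}


def is_closed(gop):
    """A GOP is closed when every reference stays inside the GOP."""
    return not external_refs(gop)


# Assumed excerpt of an open GOP: B0 and B1 reference P-1 of the previous GOP.
open_gop = [
    Picture("I", 2),
    Picture("B", 0, (-1, 2)),
    Picture("B", 1, (-1, 2)),
    Picture("P", 5, (2,)),
    Picture("B", 3, (2, 5)),
    Picture("B", 4, (2, 5)),
]

# Assumed excerpt of a closed GOP: every reference is to a picture in the GOP.
closed_gop = [
    Picture("I", 2),
    Picture("P", 5, (2,)),
    Picture("B", 3, (2, 5)),
    Picture("B", 4, (2, 5)),
]

print(is_closed(open_gop), external_refs(open_gop))
print(is_closed(closed_gop), external_refs(closed_gop))
```

With these assumed reference lists, the open GOP reports picture -1 as an external reference, while the closed GOP reports none.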
Open GOP offers a higher coding efficiency than closed GOP. For example, the number of B-pictures contained in a coded video is relatively small in the case of closed GOP, compared with the case of open GOP.
On the other hand, closed GOP advantageously ensures that, when coded pictures are combined on a GOP basis, the resulting edited video can be decoded properly. This is because the decoding of a closed GOP does not depend on any other GOP. In the case of open GOP, simply combining separately coded GOPs results in an incorrect video stream that cannot be decoded properly, because a coded picture may reference a picture in another GOP that is missing from the stream.
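The splicing problem can be sketched as a walk over the combined stream in coded order, checking that every reference has already been decoded. This is a simplified model under stated assumptions: pictures are identified by hypothetical labels such as "a:I2" (encoder "a", picture “I2”), and a picture coded by one encoder is treated as unavailable to pictures coded by another. The function name and the sample GOPs are illustrative only.

```python
def first_undecodable(stream):
    """Return the first picture whose reference was never decoded, else None.

    stream is a list of GOPs in coded order; each GOP is a list of
    (picture_id, reference_ids) pairs, also in coded order.
    """
    decoded = set()
    for gop in stream:
        for picture_id, reference_ids in gop:
            if any(ref not in decoded for ref in reference_ids):
                return picture_id
            decoded.add(picture_id)
    return None


# GOP produced by hypothetical encoder "a".
a_gop = [("a:I2", ()), ("a:P5", ("a:I2",))]

# Following GOP produced by hypothetical encoder "b" as an open GOP:
# its leading B-pictures reference b:P5, which is absent after splicing.
b_open_gop = [
    ("b:I8", ()),
    ("b:B6", ("b:P5", "b:I8")),
    ("b:B7", ("b:P5", "b:I8")),
]

# The same position coded by encoder "b" as a closed GOP.
b_closed_gop = [
    ("b:I8", ()),
    ("b:P11", ("b:I8",)),
    ("b:B9", ("b:I8", "b:P11")),
]

print(first_undecodable([a_gop, b_open_gop]))
print(first_undecodable([a_gop, b_closed_gop]))
```

In this model the spliced stream that ends with the open GOP fails at "b:B6", whereas the stream that ends with the closed GOP decodes completely.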
The above-noted compression coding methods provide many degrees of freedom in terms of how to encode each frame in a GOP. For example, the coding process may be configured with parameters that specify the interval M of P-pictures, frame structure and field structure, direct mode motion compensation (temporal direct mode and spatial direct mode), and the like. Optimal values of those parameters may vary from scene to scene. It is therefore desirable to actually execute coding with different parameters and evaluate the result for more accurate determination of optimal parameter values.
As an example of the above approach, an apparatus has been proposed that has a plurality of coding units configured with different coding parameters to encode the same given video in parallel. Depending on the scene, an optimally coded video output is selected from among those produced by the coding units. This technique involves switching video outputs on a GOP basis so as to selectively output the optimal ones (see, for example, Japanese Laid-open Patent Publication No. 2000-341690, Japanese Laid-open Patent Publication No. 2000-23154, and Japanese Laid-open Patent Publication No. 2006-295492).
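The GOP-by-GOP selection among parallel coding units can be sketched as follows. The cited publications are not reproduced here, so the selection criterion is an assumption: a caller-supplied cost function (in the example, simply the coded size in bytes) stands in for whatever measure of optimality an actual apparatus would use. The function name and the sample data are hypothetical.

```python
def select_per_gop(outputs, cost):
    """Pick, for each GOP position, the coding unit with the lowest cost.

    outputs maps a coding-unit name to its list of coded GOPs; all lists
    must have the same length. Returns a list of (name, coded_gop) pairs.
    """
    names = list(outputs)
    gop_count = len(outputs[names[0]])
    if any(len(outputs[name]) != gop_count for name in names):
        raise ValueError("all coding units must produce the same number of GOPs")
    selected = []
    for index in range(gop_count):
        best = min(names, key=lambda name: cost(outputs[name][index]))
        selected.append((best, outputs[best][index]))
    return selected


# Hypothetical coded GOPs from two coding units that differ in the
# P-picture interval M; the byte strings are placeholders, not real data.
outputs = {
    "M=1": [b"\x00" * 10, b"\x00" * 7],
    "M=3": [b"\x00" * 8, b"\x00" * 9],
}
chosen = select_per_gop(outputs, cost=len)
print([name for name, _ in chosen])
```

Here the unit configured with M=3 is selected for the first GOP and the unit configured with M=1 for the second, because each produced the smaller output at that position.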
As described above, open GOP enables a higher coding efficiency than closed GOP. In other words, a higher picture quality can be achieved with the same amount of coded data.
The use of closed GOP, on the other hand, makes it easier to switch coded video data outputs on a GOP basis in the case where the coding is performed with a plurality of coding units as in the aforementioned apparatus. This is because the decoding of a closed GOP does not rely on any other GOPs. In the case of open GOP, however, the decoder may not be able to decode the video data produced by switching video data outputs on a GOP basis.