H.261 and H.263 standardized by ITU (International Telecommunication Union) are technologies for encoding a moving image signal at low bit rate, high compression rate, and high quality to generate coded data, and decoding a coded moving image. ISO (International Organization for Standardization) MPEG-1, MPEG-2, MPEG-4, etc. are also widely used as international standards.
In addition, H.264 has recently been standardized through collaboration between ITU and ISO (NPL 1). Compared with conventional moving image coding technologies, H.264 is known to provide further improvements in compression efficiency and image quality.
In such moving image coding technologies, inter-frame prediction coding techniques, which utilize temporal correlation between frames, are widely used to compress the moving image signal efficiently. Inter-frame prediction coding involves predicting the image signal of the current frame from that of a previously-coded frame or frames, and encoding a prediction error signal between the predicted signal and the current signal. In typical moving images, the image signals of temporally neighboring frames are highly correlated, so the prediction error is small. The techniques are thus effective in improving the compression efficiency.
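The prediction error described above can be illustrated with a minimal sketch. It assumes the prediction is simply the co-located sample of the previous frame (no motion compensation), and the function names and sample values are hypothetical, chosen only for illustration.

```python
def encode_residual(current, reference):
    """Return the prediction error between the current frame and its prediction.

    Assumption: the predicted signal is the co-located sample of the
    reference frame; real codecs refine this with motion compensation.
    """
    return [c - r for c, r in zip(current, reference)]


def decode_from_residual(residual, reference):
    """Reconstruct the current frame from the reference frame and the residual."""
    return [r + e for r, e in zip(reference, residual)]


previous_frame = [100, 102, 104, 106]  # previously-coded frame (hypothetical samples)
current_frame = [101, 102, 105, 106]   # frame to be coded (hypothetical samples)

# Highly correlated frames give a residual of small values, which is cheap to code.
residual = encode_residual(current_frame, previous_frame)
reconstructed = decode_from_residual(residual, previous_frame)
```

The sketch also shows why such a frame is not decodable by itself: reconstruction needs the reference frame in addition to the residual.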
The moving image coding technologies such as MPEG-1, MPEG-2, MPEG-4, and H.264 encode a moving image by the combination of I-pictures (intra coded pictures) which use no inter-frame prediction coding, P-pictures (unidirectionally predictive coded pictures) which use inter-frame prediction coding from a previously-coded frame, and B-pictures (bidirectionally predictive coded pictures) which use inter-frame prediction coding from two previously-coded frames.
In decoding, a single frame of an I-picture can be decoded by itself. A single frame of a P- or B-picture is not decodable by itself since it needs image data intended for inter-frame prediction before decoding.
FIG. 1 shows an example of picture configuration in a moving image coding system. Each individual rectangle represents a frame with the picture type and order of display indicated below (for example, B5 indicates that the frame is the fifth to be displayed and is encoded as a B-picture). Such I-, P-, and B-pictures of different characteristics are appropriately combined to encode the moving image.
When a coded moving image bit stream having a picture configuration such as that of FIG. 1 is subjected to special reproduction such as fast play and fast reverse play, only the bit streams of I-pictures, which are decodable by themselves, are extracted from the bit stream for reproduction.
FIG. 2 is a diagram showing an example of operation for obtaining a fast play bit stream and a fast reverse play bit stream. FIG. 3 is a diagram showing the configuration of an apparatus that performs fast play and fast reverse play. As shown in FIG. 3, a bit stream is input to a stream extraction unit 101. The stream extraction unit 101 extracts only the bit streams of I-pictures from the input bit stream, and supplies the extracted bit streams to a stream rearrangement unit 102. The stream rearrangement unit 102 rearranges the supplied I-picture bit streams if needed, and outputs the bit streams to outside.
Description will now be given in conjunction with the example of FIG. 2. For fast play, only the bit streams of I-pictures are extracted by the stream extraction unit 101 in order from the bit stream shown in the top of FIG. 2. The extracted bit streams are arranged to constitute a bit stream, which results in the fast play bit stream shown in the lower left of FIG. 2. Fast play involves only the extraction of I-pictures without the rearrangement processing of the stream rearrangement unit 102.
For fast reverse play, the stream extraction unit 101 similarly extracts only the I-pictures from the bit stream. The stream rearrangement unit 102 rearranges and outputs the I-pictures in reverse order to that of display. This provides the fast reverse play bit stream shown in the lower right of FIG. 2.
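The extraction and rearrangement described above can be sketched as follows. This is a simplified model, not the apparatus of FIG. 3 itself: the `Picture` type, the function names, and the sample stream (listed in display order, with an I-picture every six frames) are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Picture(NamedTuple):
    picture_type: str   # "I", "P", or "B"
    display_order: int  # position in display order


def extract_i_pictures(stream: List[Picture]) -> List[Picture]:
    """Model of the stream extraction unit 101: keep only I-pictures."""
    return [p for p in stream if p.picture_type == "I"]


def rearrange(pictures: List[Picture], reverse: bool) -> List[Picture]:
    """Model of the stream rearrangement unit 102: reverse only when needed."""
    return list(reversed(pictures)) if reverse else list(pictures)


def fast_play(stream: List[Picture]) -> List[Picture]:
    # Fast play: extraction only, no rearrangement.
    return rearrange(extract_i_pictures(stream), reverse=False)


def fast_reverse_play(stream: List[Picture]) -> List[Picture]:
    # Fast reverse play: extraction, then output in reverse display order.
    return rearrange(extract_i_pictures(stream), reverse=True)


# Hypothetical 18-frame stream with an I-picture every six frames.
pattern = "IBBPBB"
stream = [Picture(pattern[i % 6], i) for i in range(18)]

forward = [p.display_order for p in fast_play(stream)]
backward = [p.display_order for p in fast_reverse_play(stream)]
```

With this sample stream, `forward` holds the display positions 0, 6, 12 and `backward` holds the same positions in reverse.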
For example, PTL 1 discloses a development of the foregoing method, in which only the minimum I-pictures necessary for display are extracted to generate a fast play stream. The method of PTL 1 can also be used for the special reproduction of a bit stream encoded by the recently standardized H.264 moving image coding technology. H.264, however, has a higher degree of freedom of coding than the coding standards of MPEG-1, MPEG-2, MPEG-4, etc. A method designed for the coding standard of MPEG-1, MPEG-2, or MPEG-4 therefore cannot always be applied as is. The details will be given below.
Unlike the coding standards of MPEG-1, MPEG-2, and MPEG-4, H.264 has two types of pictures that are composed only of I-slices and are decodable by themselves, namely, an IDR (Instantaneous Decoding Refresh) picture and an I-picture. The IDR-picture entails a reset operation on the internal state of the decoder, and is fully decodable by itself like the I-picture according to the coding standards of MPEG-2 etc.
On the other hand, the H.264 I-picture contains image data that is decodable by itself, but its header part needs information on past pictures for decoding. A simple way to implement special reproduction is thus to use IDR-pictures alone. In the following, an IDR-picture and an I-picture refer to the H.264 IDR-picture and I-picture, respectively, unless otherwise specified.
Depending on the operation of the encoder, a bit stream with a small number of IDR-pictures and a large number of I-pictures can be generated for the sake of improving the coding efficiency etc. In such a case, the pictures available for special reproduction using IDR-pictures alone are small in number, and smooth motion cannot be provided.
FIG. 4 shows an example of operation for obtaining a fast play bit stream with IDR-pictures alone. The original bit stream is shown in the top of FIG. 4, and the fast play bit stream in the bottom of FIG. 4. In the example, the original bit stream includes, for every six frames, a picture that contains image data reproducible by itself (IDR or I). Of such pictures, IDR-pictures occur at intervals of 18 frames. The rest are I-pictures.
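The effect of the picture spacing in this example can be checked with a short sketch. The layout function below is an assumption that mirrors the intervals stated in the text (a self-reproducible picture every six frames, an IDR-picture every 18 frames); the 72-frame length and all names are hypothetical.

```python
def picture_type(frame_index):
    """Hypothetical layout following the intervals stated for FIG. 4."""
    if frame_index % 18 == 0:
        return "IDR"
    if frame_index % 6 == 0:
        return "I"
    return "other"  # P- or B-picture, not usable for special reproduction


def usable_frames(num_frames, allowed_types):
    """Frame indices whose picture type may be used for special reproduction."""
    return [i for i in range(num_frames) if picture_type(i) in allowed_types]


NUM_FRAMES = 72  # assumed length of the original bit stream, in frames

idr_only = usable_frames(NUM_FRAMES, {"IDR"})
idr_and_i = usable_frames(NUM_FRAMES, {"IDR", "I"})
```

Under these assumptions, using IDR-pictures alone leaves one third as many pictures for fast play as using both IDR- and I-pictures, which is the loss of motion smoothness the text describes.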
As shown in FIG. 4, the exclusive use of IDR-pictures, without I-pictures whose image data is decodable by itself, degrades the motion smoothness of the fast play bit stream because only one in 18 frames is available for special reproduction. When I-pictures (other than IDR) are also used for special reproduction, past picture information is needed in order to decode the information in the header part (picture number, order of output, frame buffer management information, etc.) as described above. When IDR- and I-pictures are simply extracted and arranged, the header part cannot be decoded normally. This causes problems such as the pictures being output in improper order, or the decoding apparatus judging the stream to be in error and failing to provide a proper result of decoding.
For example, “frame_num” included in “slice_header( )” is defined to be incremented by one for each reference picture. When IDR- and I-pictures are extracted and arranged, “frame_num” can increase by more than 1 in value between adjoining reference pictures, which may be judged as an error by some decoding apparatuses.
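The "frame_num" discontinuity can be illustrated with a simplified check. This sketch models only the rule stated above (an increment of one per reference picture, with an IDR-picture restarting the count) and is not a conformant H.264 parser; `MAX_FRAME_NUM`, the function name, and the sample values are assumptions.

```python
MAX_FRAME_NUM = 16  # assumed wrap-around modulus; the real value is signalled in the stream


def find_frame_num_gaps(pictures):
    """Return indices where frame_num does not follow its predecessor by one.

    pictures: list of (is_idr, frame_num) for consecutive reference pictures.
    An IDR-picture restarts the count, so it is never reported as a gap.
    """
    gaps = []
    for idx in range(1, len(pictures)):
        is_idr, frame_num = pictures[idx]
        if is_idr:
            continue
        previous_frame_num = pictures[idx - 1][1]
        if frame_num != (previous_frame_num + 1) % MAX_FRAME_NUM:
            gaps.append(idx)
    return gaps


# Hypothetical original stream: seven reference pictures, IDR at both ends.
original = [(True, 0), (False, 1), (False, 2), (False, 3),
            (False, 4), (False, 5), (True, 0)]

# Keep only the self-decodable pictures (here, every second reference picture).
extracted = [original[0], original[2], original[4], original[6]]

gaps_in_original = find_frame_num_gaps(original)
gaps_in_extracted = find_frame_num_gaps(extracted)
```

In this sample, the original sequence has no gaps, while the extracted sequence jumps from 0 to 2 and from 2 to 4, which a strict decoding apparatus may treat as an error.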
To avoid such problems, the decoding apparatus may be provided with an operation mode for special reproduction. In the special reproduction mode, the decoding apparatus performs the decoding operation of simply outputting decoded image data in order of decoding, ignoring the header part's information on the order of output and any decoding error.
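The difference between the normal mode and such a special reproduction mode can be sketched as below. This is only an illustrative model under stated assumptions: each decoded picture is represented by a hypothetical dictionary carrying a label, the output order read from its header, and a flag saying whether the header decoded normally.

```python
class HeaderError(Exception):
    """Raised in normal mode when a header part cannot be decoded normally."""


def output_pictures(decoded, special_mode):
    """Return picture labels in the order they would be output.

    decoded: list of dicts in decoding order, each with the hypothetical keys
    "label", "output_order" (from the header part), and "header_ok".
    """
    if special_mode:
        # Special reproduction mode: output in decoding order, ignoring the
        # header's output-order information and any header decoding error.
        return [p["label"] for p in decoded]
    for p in decoded:
        if not p["header_ok"]:
            raise HeaderError("cannot decode header of " + p["label"])
    return [p["label"] for p in sorted(decoded, key=lambda p: p["output_order"])]


# Hypothetical fast reverse play stream: pictures arrive in reverse display order.
reverse_stream = [
    {"label": "I12", "output_order": 12, "header_ok": False},
    {"label": "I6", "output_order": 6, "header_ok": False},
    {"label": "IDR0", "output_order": 0, "header_ok": True},
]

special_output = output_pictures(reverse_stream, special_mode=True)
```

In normal mode the same input raises `HeaderError` in this model, whereas the special mode yields the pictures in the intended reverse order.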