Scalable coding is a technique in which an image is encoded hierarchically, from coarse information through fine information. When only the encoded data of a basic hierarchical layer, which consists of the coarsest information, is decoded, a decoded video having the lowest quality is obtained; when the encoded data of the basic hierarchical layer and a first hierarchical layer is decoded, a decoded video having a middle quality is obtained; and when the encoded data of a second hierarchical layer is also decoded, a decoded video having a high quality is obtained. Scalable coding is thus an encoding method in which the quality of the decoded video increases as the number of decoded hierarchical layers increases.
SVC (see Non-Patent Document 1) is a scalable coding method that has been standardized as an extension of the MPEG-4 AVC/H.264 coding method. It supports temporal scalability (hereinafter referred to as “time hierarchical encoding”), spatial scalability, and SNR scalability.
FIG. 24 shows an example of video data which is time hierarchically encoded. In FIG. 24, the arrows indicate the frames that are referred to in inter-frame prediction encoding. The frame to be decoded first (I0) is predicted by using only pixel values within that frame and does not refer to other frames. For the frame to be decoded next (P1), a prediction image is generated by referring to the already decoded I0 frame, and the difference image with respect to the generated prediction image is encoded. For the frame to be decoded after that (B2), a prediction image is generated by referring to the two already decoded frames, i.e. the I0 frame and the P1 frame, and the difference image with respect to the generated prediction image is encoded. The subsequent frames are handled in the same manner.
In FIG. 24, let the frames I0 and P1 be called basic hierarchical layer frames (T0), the frame B2 a first hierarchical layer frame, the frames B3 and B4 second hierarchical layer frames, and the frames B5, B6, B7, and B8 third hierarchical layer frames. A basic hierarchical layer frame is then decoded by referring only to frames belonging to its own hierarchical layer; a first hierarchical layer frame is decoded by referring only to frames belonging to its own hierarchical layer and to basic hierarchical layer frames; and so on. That is, each frame is decoded by referring only to frames belonging to its own hierarchical layer and to lower hierarchical layers.
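The referencing rule above can be illustrated with a minimal Python sketch. The layer assignment follows the text, and the reference lists for I0, P1, and B2 are those described for FIG. 24. The references assumed for B3 and the helper names `violations` and `decodable` are hypothetical and serve only as illustration.

```python
# Temporal layer of each frame, as named in the text (decoding order).
LAYER = {
    "I0": 0, "P1": 0,                      # basic hierarchical layer (T0)
    "B2": 1,                               # first hierarchical layer
    "B3": 2, "B4": 2,                      # second hierarchical layer
    "B5": 3, "B6": 3, "B7": 3, "B8": 3,    # third hierarchical layer
}

# References for I0, P1 and B2 are taken from the text.
# The references for B3 are an ASSUMPTION made only for this example.
REFS = {
    "I0": [],
    "P1": ["I0"],
    "B2": ["I0", "P1"],
    "B3": ["I0", "B2"],
}


def violations(layer, refs):
    """Return (frame, reference) pairs where the reference lies in a higher layer."""
    return [
        (frame, ref)
        for frame, ref_list in refs.items()
        for ref in ref_list
        if layer[ref] > layer[frame]
    ]


def decodable(layer, max_layer):
    """Frames that remain when only layers 0..max_layer are extracted."""
    return [frame for frame, lyr in layer.items() if lyr <= max_layer]
```

With the rule satisfied, `violations(LAYER, REFS)` is empty, whereas a hypothetical reference from B2 to B3 would be reported, since B3 belongs to a higher layer than B2.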
While all the frames in the video data can be decoded by decoding the frames of all the hierarchical layers, if only the frames belonging to the basic hierarchical layer are decoded, one-eighth of all the frames are decoded, and, if the frames belonging to the basic hierarchical layer and the first hierarchical layer are decoded, one-fourth of all the frames are decoded. That is, encoding is carried out in such a manner that the decoded video moves more smoothly as the number of hierarchical layers to be decoded increases.
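The fractions quoted above can be reproduced with a short sketch. It assumes a dyadic hierarchy with a period of eight frames in display order, in which every eighth frame belongs to the basic hierarchical layer. This arrangement is an assumption consistent with the stated fractions, not something read from FIG. 24, and the function names are hypothetical.

```python
from fractions import Fraction


def temporal_layer(pos, gop_size=8):
    """Temporal layer of the frame at display position `pos`.

    Assumes a dyadic hierarchy whose period `gop_size` is a power of two:
    positions divisible by gop_size are layer 0, those divisible by
    gop_size/2 (but not gop_size) are layer 1, and so on.
    """
    top = gop_size.bit_length() - 1          # highest layer index (3 for 8)
    offset = pos % gop_size
    if offset == 0:
        return 0
    # Number of trailing zero bits of the offset within the period.
    trailing_zeros = (offset & -offset).bit_length() - 1
    return top - trailing_zeros


def decoded_fraction(max_layer, num_frames=64, gop_size=8):
    """Fraction of frames decoded when only layers 0..max_layer are decoded."""
    kept = sum(
        1 for pos in range(num_frames)
        if temporal_layer(pos, gop_size) <= max_layer
    )
    return Fraction(kept, num_frames)


for layer in range(4):
    print(layer, decoded_fraction(layer))
# 0 1/8
# 1 1/4
# 2 1/2
# 3 1
```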
Since video data that is time hierarchically encoded as described above is configured so that a part of its frames can be extracted and decoded, the video data can be correctly decoded even by a decoding device that is not compatible with scalable coding.
The maximum number of hierarchical layers used when the video data is hierarchically encoded, and a flag showing whether or not a frame belonging to each hierarchical layer uses, as a reference image, a frame belonging to an upper hierarchical layer, are encoded in a parameter set of an upper header added to the top of the video data. From this information, a decoding device compatible with scalable coding can determine whether or not the video data is configured in a scalable manner and, if so, how coarsely the video data can be decoded.
For example, in Non-Patent Document 2, a parameter set (video parameter set) for encoding the maximum number of hierarchical layers of video data which is time hierarchically encoded, and a flag which shows the reference relationship among the hierarchical layers, is encoded at a level above the sequence-level parameter set.
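How a decoding device might use these two pieces of information can be sketched as follows. The class and field names are hypothetical and do not reproduce the syntax of Non-Patent Document 2, and the decision rule is a deliberately conservative assumption: a subset of layers is treated as extractable only if none of the layers in the subset is flagged as referring to an upper layer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoParameterSet:
    # Maximum number of hierarchical layers in the video data.
    max_layers: int
    # One flag per layer: True if frames of that layer use a frame of an
    # upper hierarchical layer as a reference image.
    refers_to_upper: List[bool]


def can_decode_up_to(vps, layer):
    """Conservative check: may layers 0..layer be decoded on their own?"""
    if not 0 <= layer < vps.max_layers:
        raise ValueError("layer out of range")
    if layer == vps.max_layers - 1:
        return True  # decoding every layer is always possible
    # Safe only if no retained layer is flagged as referring to an upper layer.
    return not any(vps.refers_to_upper[: layer + 1])


# Hypothetical example: four layers, layer 2 refers to an upper layer.
vps = VideoParameterSet(max_layers=4, refers_to_upper=[False, False, True, False])
```

In this example layers 0 to 1 can be extracted and decoded alone, while stopping at layer 2 is rejected because that layer is flagged as depending on an upper layer.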
FIG. 25 is a block diagram showing a configuration of a conventional video encoding device for generating video data which is time hierarchically encoded.
A video parameter set encoding unit 101 encodes the maximum number of hierarchical layers of video data and a flag which shows whether or not a frame belonging to each hierarchical layer uses, as a reference image, a frame belonging to upper hierarchical layers.
A sequence parameter set encoding unit 102 encodes an identification number showing which video parameter set the sequence refers to, and parameters (resolution of the video data, etc.) concerning the whole sequence of the video data.
A basic hierarchical layer frame encoding unit 103 encodes an identification number of a sequence parameter set to be referred to and a frame belonging to a basic hierarchical layer.
Similar to the basic hierarchical layer frame encoding unit 103, an upper hierarchical layer frame encoding unit 104 encodes frames belonging to upper hierarchical layers.
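The chain of references among the units 101 to 104 can be pictured with the following sketch. It is purely symbolic: no actual bitstream is produced, the unit labels and field names are hypothetical, and it is an assumption that the frames of the upper hierarchical layers carry a sequence parameter set identification number in the same way as the basic hierarchical layer frames.

```python
def encode_stream(frames, max_layers, refers_to_upper, width, height):
    """Return a symbolic list of encoded units in output order.

    `frames` is a list of (name, layer) pairs in decoding order.
    """
    vps_id, sps_id = 0, 0
    units = [
        # Unit 101: video parameter set.
        ("VPS", {"vps_id": vps_id,
                 "max_layers": max_layers,
                 "refers_to_upper": list(refers_to_upper)}),
        # Unit 102: sequence parameter set, pointing at the VPS it refers to.
        ("SPS", {"sps_id": sps_id, "vps_id": vps_id,
                 "width": width, "height": height}),
    ]
    for name, layer in frames:
        # Unit 103 handles the basic layer, unit 104 the upper layers.
        kind = "BASE_FRAME" if layer == 0 else "UPPER_FRAME"
        units.append((kind, {"sps_id": sps_id, "layer": layer, "name": name}))
    return units


units = encode_stream(
    frames=[("I0", 0), ("P1", 0), ("B2", 1)],
    max_layers=4,
    refers_to_upper=[False, False, False, False],
    width=1920,
    height=1080,
)
```

Each frame unit names the sequence parameter set it refers to, and that parameter set in turn names its video parameter set, so a decoder can resolve the parameters of any frame by following the two identification numbers.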