Information transmitted by a communication system or recorded in a storage apparatus includes images and video. Techniques for coding images (hereinafter including video) in order to transmit and store them are conventionally known.
Video coding schemes such as AVC (H.264/MPEG-4 Advanced Video Coding) and its successor HEVC (High-Efficiency Video Coding) (Non-Patent Literature 1) are known.
According to these video coding schemes, typically, a predictive image is generated on the basis of a local decoded image obtained by coding/decoding an input image, and a predictive residue (also referred to as a “difference image” or a “residual image”), which is obtained by subtracting the predictive image from the input image (original image), is coded. Methods of generating a predictive image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction).
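The residue computation described above can be illustrated with a minimal sketch. It assumes small sample arrays held as nested lists and a hypothetical predictor output; the function names are illustrative only, and transform, quantization and entropy coding are omitted, so the residue here is carried losslessly.

```python
def residual(original, predicted):
    """Predictive residue: input image minus predictive image, sample by sample."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def reconstruct(predicted, resid):
    """Decoder side: add the residue back onto the same predictive image."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, resid)]


original = [[52, 55], [61, 59]]
predicted = [[50, 56], [60, 60]]  # hypothetical output of inter or intra prediction
resid = residual(original, predicted)
```

Because the decoder forms the same predictive image from its own decoded pictures, only the residue needs to be coded.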
In HEVC, reproduction at a temporally decimated frame rate, such as reproducing 60 fps content at 30 fps, is assumed, and a technique for achieving temporal scalability is adopted. More specifically, each picture is assigned a numerical value called a temporal identifier (Temporal ID, sub-layer identifier), and a constraint is imposed that a picture does not refer to a picture with a temporal identifier larger than its own. Consequently, in the case of decimating pictures down to a specific temporal identifier for reproduction, pictures assigned larger temporal identifiers are not required to be decoded.
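The effect of the temporal identifier can be sketched as follows. This is a simplified model, not decoder code: the Picture class, its fields and the helper names are assumptions made for illustration, and the constraint modelled is that no picture references a picture with a larger temporal identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    poc: int                                   # display order index (illustrative)
    temporal_id: int                           # sub-layer the picture belongs to
    refs: list = field(default_factory=list)   # POCs of its reference pictures


def obeys_temporal_constraint(pictures):
    """True if no picture references a picture with a larger temporal identifier."""
    by_poc = {p.poc: p for p in pictures}
    return all(by_poc[r].temporal_id <= p.temporal_id
               for p in pictures for r in p.refs)


def extract_sub_layers(pictures, max_tid):
    """Keep only pictures whose temporal identifier does not exceed max_tid."""
    return [p for p in pictures if p.temporal_id <= max_tid]


def is_decodable(pictures):
    """True if every reference picture is still present in the list."""
    available = {p.poc for p in pictures}
    return all(r in available for p in pictures for r in p.refs)


# Hypothetical 60 fps sequence in decoding order: even pictures form
# sub-layer 0 (30 fps), odd pictures form sub-layer 1.
gop = [
    Picture(0, 0),
    Picture(2, 0, [0]),
    Picture(1, 1, [0, 2]),
    Picture(4, 0, [2]),
    Picture(3, 1, [2, 4]),
]
half_rate = extract_sub_layers(gop, max_tid=0)
```

Dropping sub-layer 1 leaves pictures 0, 2 and 4, all of whose references remain available, which is why the decimated stream can be decoded without the discarded pictures.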
In recent years, scalable coding techniques (hierarchical coding techniques), which code images hierarchically according to a required data rate, have been proposed. SHVC (Scalable HEVC) and MV-HEVC (MultiView HEVC) are known as typical scalable coding schemes (hierarchical coding methods).
SHVC supports spatial scalability, temporal scalability, and SNR scalability. For example, in the case of spatial scalability, an image downsampled from an original image to a desired resolution is coded as a lower layer. Next, on a higher layer, inter-layer prediction is performed in order to remove inter-layer redundancy (Non-Patent Literature 2).
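The two-layer structure of spatial scalability can be sketched as follows. The 2x2 averaging and nearest-neighbour upsampling used here are simplifying assumptions chosen for brevity and are not the resampling filters of SHVC; the function names are likewise illustrative, and both layers are treated as losslessly coded.

```python
def downsample2x(img):
    """Average each 2x2 block (assumes even width and height)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample2x(img):
    """Nearest-neighbour upsampling: repeat every sample 2x horizontally and vertically."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 30, 40, 44],
            [30, 30, 40, 44]]

base = downsample2x(original)        # lower layer picture
prediction = upsample2x(base)        # inter-layer prediction for the higher layer
enhancement = [[o - p for o, p in zip(orow, prow)]
               for orow, prow in zip(original, prediction)]
restored = [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, enhancement)]
```

The higher layer only has to carry the difference from the upsampled lower layer picture, which is the inter-layer redundancy being removed.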
MV-HEVC supports viewpoint scalability (view scalability). For example, in the case of coding three viewpoint images that are a viewpoint image 0 (layer 0), a viewpoint image 1 (layer 1) and a viewpoint image 2 (layer 2), inter-layer redundancy can be removed by predicting the viewpoint images 1 and 2 on higher layers from the viewpoint image 0 on a lower layer (layer 0) through inter-layer prediction (Non-Patent Literature 3).
Inter-layer predictions used in scalable coding schemes, such as SHVC and MV-HEVC, include inter-layer image prediction and inter-layer motion prediction. The inter-layer image prediction generates a predictive image on a target layer using texture information (image) of a decoded picture on a lower layer (or a layer different from the target layer). The inter-layer motion prediction generates a predictive value of motion information on the target layer using the motion information of a decoded picture on a lower layer (or a layer different from the target layer). That is, inter-layer prediction is performed using a decoded picture on a lower layer (or a layer different from the target layer) as a reference picture on the target layer.
Besides the inter-layer prediction that removes redundancy in image information or motion information between layers, there is also prediction between parameter sets. A parameter set (e.g., sequence parameter set SPS, picture parameter set PPS, etc.) defines a set of coding parameters required to decode/code coded data. In order to remove the redundancy of coding parameters common to layers, the prediction between parameter sets predicts a part of the coding parameters in the parameter set used for decoding/coding on an upper layer from the corresponding coding parameters in the parameter set used for decoding/coding on a lower layer (also called reference or inheritance), and omits decoding/coding that part of the coding parameters. For example, there is a technique that predicts scaling list information (quantization matrix) on a target layer, which is notified in SPS and PPS, from scaling list information on a lower layer (also called syntax prediction between parameter sets).
In the cases of view scalability and SNR scalability, the parameter sets used for decoding/coding on each layer contain many common coding parameters. Accordingly, there is a technique called shared parameter set, which removes the redundancy of side information (parameter sets) between layers by using a parameter set common to different layers. For example, in Non-Patent Literatures 2 and 3, an SPS or PPS used for decoding/coding on a lower layer having a layer identifier value of nuhLayerIdA (the layer identifier of the parameter set itself also has the value nuhLayerIdA) is allowed to be used for decoding/coding on a higher layer having a layer identifier value (nuhLayerIdB) larger than nuhLayerIdA. The NAL unit header of an NAL unit that stores coded data, such as coded data on an image or coded data of a parameter set (coding parameters), notifies a layer identifier for identifying a layer (also called nuh_layer_id, layerId, or lId), a temporal identifier for identifying a sub-layer associated with the layer (also called nuh_temporal_id_plus1, temporalId, or tId), and an NAL unit type (nal_unit_type) representing the kind of coded data stored in the NAL unit.
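The header fields named above can be read with a short sketch. It assumes the two-byte HEVC NAL unit header layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1); the function names are hypothetical, and may_use_parameter_set only restates the sharing rule from the paragraph above, with use on the parameter set's own layer added as an assumption.

```python
def parse_nal_unit_header(data: bytes) -> dict:
    """Split the first two bytes of an HEVC NAL unit into its header fields."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    header = int.from_bytes(data[:2], "big")
    forbidden_zero_bit = header >> 15
    nal_unit_type = (header >> 9) & 0x3F
    nuh_layer_id = (header >> 3) & 0x3F
    nuh_temporal_id_plus1 = header & 0x07
    if forbidden_zero_bit != 0 or nuh_temporal_id_plus1 == 0:
        raise ValueError("malformed NAL unit header")
    return {
        "nal_unit_type": nal_unit_type,
        "nuh_layer_id": nuh_layer_id,
        "temporal_id": nuh_temporal_id_plus1 - 1,
    }


def may_use_parameter_set(ps_layer_id: int, target_layer_id: int) -> bool:
    """Sharing rule as described in the text: a parameter set of layer A may
    serve a layer B with a larger identifier (same-layer use is assumed)."""
    return ps_layer_id <= target_layer_id


vps_header = parse_nal_unit_header(bytes([0x40, 0x01]))   # type 32, layer 0
sps_header = parse_nal_unit_header(bytes([0x42, 0x09]))   # type 33, layer 1
```

In HEVC, nal_unit_type values 32, 33 and 34 identify VPS, SPS and PPS NAL units respectively, so the first example is a VPS on layer 0 and the second an SPS carried with nuh_layer_id equal to 1.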
In Non-Patent Literatures 2 and 3, as to a video parameter set VPS that defines a set of coding parameters to be referred to for decoding coded data made up of at least one layer, there is a bit stream constraint “VPS layer identifier is set to zero (nuh_layer_id=0)”.
In Non-Patent Literature 4, as to a sequence parameter set SPS that defines a set of coding parameters to be referred to for decoding the target sequence, and a picture parameter set PPS that defines a set of coding parameters to be referred to for decoding each picture in the target sequence, a bit stream constraint “layer identifiers of SPS and PPS are set to zero (nuh_layer_id=0)” is proposed.