As multiple-viewpoint image coding technologies, a disparity predictive coding method and a decoding method associated with this coding method have been proposed. In the disparity predictive coding method, the amount of information is reduced by predicting a disparity between viewpoint images when coding multiple viewpoint images. A vector representing a disparity between viewpoint images is called a displacement vector. A displacement vector is a two-dimensional vector having an element in the horizontal direction (x component) and an element in the vertical direction (y component), and is calculated for each block, which is one of the regions into which one image is divided. Multiple viewpoint images are usually obtained by using cameras disposed at the individual viewpoints. In multiple-viewpoint coding, each viewpoint image is coded as an individual layer of multiple layers. A coding method for a video image constituted by multiple layers is generally called scalable coding or hierarchical coding. In scalable coding, high-efficiency coding is implemented by performing inter-layer prediction. A layer which is not subjected to inter-layer prediction but serves as a base is called a base layer, and the other layers are called enhancement layers. Scalable coding in which layers are constituted by viewpoint images is called view scalable coding. In view scalable coding, a base layer is also called a base view, while an enhancement layer is also called a non-base view. Within view scalable coding, scalable coding in which layers are constituted by texture layers (image layers) and depth layers (distance image layers) is called three-dimensional scalable coding.
Apart from view scalable coding, other examples of scalable coding are spatial scalable coding (processing a low-resolution picture as a base layer and a high-resolution picture as an enhancement layer) and SNR scalable coding (processing a low image-quality picture as a base layer and a high image-quality picture as an enhancement layer). In scalable coding, a base layer picture, for example, may be used as a reference picture when coding an enhancement layer picture.
In HEVC, a technique for reusing prediction information concerning already processed blocks, which is called a merge mode, is known. In the merge mode, a merge candidate list whose elements are merge candidates is constructed, and the element specified by a merge index (merge_index) is selected from the list as a prediction parameter, thereby deriving a prediction parameter of a prediction unit.
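The selection step of the merge mode can be illustrated by the following minimal sketch. It is not the HEVC derivation process itself: the type `PredictionParams` and the function `derive_merge_params` are hypothetical names introduced only for this illustration, and the construction of the merge candidate list is assumed to have been completed beforehand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PredictionParams:
    """Hypothetical container for the prediction parameter of one candidate."""
    mv: Tuple[int, int]  # motion vector (x component, y component)
    ref_idx: int         # reference picture index


def derive_merge_params(merge_cand_list: List[PredictionParams],
                        merge_index: int) -> PredictionParams:
    """Select the merge candidate specified by merge_index.

    The candidate list is assumed to be already constructed from
    processed blocks; only the selection step is shown here.
    """
    if not 0 <= merge_index < len(merge_cand_list):
        raise ValueError("merge_index is outside the merge candidate list")
    return merge_cand_list[merge_index]


# Illustrative candidates only; the values are not taken from any bitstream.
candidates = [
    PredictionParams(mv=(4, 0), ref_idx=0),
    PredictionParams(mv=(-2, 1), ref_idx=1),
]
selected = derive_merge_params(candidates, merge_index=1)
```

Because only the index is signaled, the prediction unit reuses the parameters of the selected candidate without coding a motion vector of its own.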
As a technology for using a motion vector of a layer (view) different from a target layer for predicting a motion vector of the target layer, inter-layer motion prediction (inter-view motion prediction) is known. In inter-layer motion prediction, motion prediction is performed by referring to a motion vector of a picture having a viewpoint different from that of a target picture. NPL 1 discloses inter-view prediction (IV prediction) and inter-view shift prediction (IVShift prediction) for determining a reference position for inter-layer motion prediction. In inter-view prediction (IV prediction), reference is made to a motion vector at a position determined by adding a displacement equal to a disparity vector to the center position of a target layer. In inter-view shift prediction (IVShift prediction), reference is made to a motion vector at a position determined by adding, to the center position of a target layer, a displacement equal to a disparity vector which has been adjusted by the size of a target block.
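The two reference positions can be contrasted with the following rough sketch. Several points are assumptions made only for illustration and are not taken from NPL 1: the center position is taken to be the center of the target block, the disparity vector is expressed in integer pixels, and the size-dependent adjustment for IVShift prediction is modeled as adding half the block width and half the block height to the disparity vector. The function names are likewise hypothetical.

```python
from typing import Tuple

Vec = Tuple[int, int]


def block_center(x: int, y: int, width: int, height: int) -> Vec:
    """Center of a block whose top-left corner is (x, y)."""
    return (x + width // 2, y + height // 2)


def iv_reference_position(x: int, y: int, width: int, height: int,
                          disparity: Vec) -> Vec:
    """IV prediction: center position plus the disparity vector."""
    cx, cy = block_center(x, y, width, height)
    return (cx + disparity[0], cy + disparity[1])


def ivshift_reference_position(x: int, y: int, width: int, height: int,
                               disparity: Vec) -> Vec:
    """IVShift prediction: center position plus a size-adjusted disparity.

    Assumption: the adjustment adds half the block size to the disparity
    vector. The actual adjustment is defined by the codec specification.
    """
    adjusted = (disparity[0] + width // 2, disparity[1] + height // 2)
    cx, cy = block_center(x, y, width, height)
    return (cx + adjusted[0], cy + adjusted[1])


# An 8x8 block at (16, 8) with a purely horizontal disparity of 5 pixels.
iv_pos = iv_reference_position(16, 8, 8, 8, (5, 0))
ivshift_pos = ivshift_reference_position(16, 8, 8, 8, (5, 0))
```

Under these assumptions the two positions differ only by the size-dependent offset, so the IVShift candidate refers to a motion vector of a neighboring region of the reference view rather than the same region as the IV candidate.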
NPL 1 also discloses the following technology. In a sequence parameter set (SPS), an ON/OFF flag of a texture extension tool, such as residual prediction, and an ON/OFF flag of a depth extension tool, such as wedgelet segmentation prediction and contour segmentation prediction, are defined, and the ON/OFF flags are sequentially coded and decoded by using a loop variable.
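Sequential decoding of the ON/OFF flags with a loop variable might look like the sketch below. The flag names, their order, and the one-bit-per-flag layout are hypothetical and chosen only for illustration; they do not reproduce the syntax of NPL 1. The loop variable `d` is assumed to take the value 0 for texture extension tools and 1 for depth extension tools.

```python
from typing import Dict, Iterable, List

# Hypothetical tool names, grouped by the value of the loop variable d.
TOOLS_BY_D: List[List[str]] = [
    ["residual_prediction"],                # d == 0: texture extension tools
    ["wedgelet_segmentation_prediction",    # d == 1: depth extension tools
     "contour_segmentation_prediction"],
]


def decode_tool_flags(bits: Iterable[int]) -> Dict[str, bool]:
    """Read one ON/OFF bit per tool, texture tools first, then depth tools."""
    reader = iter(bits)
    flags: Dict[str, bool] = {}
    for d in range(len(TOOLS_BY_D)):        # loop variable selecting the group
        for name in TOOLS_BY_D[d]:
            flags[name] = bool(next(reader))
    return flags


# Illustrative bit sequence only: residual prediction ON, wedgelet OFF, contour ON.
flags = decode_tool_flags([1, 0, 1])
```

A coder would follow the same loop in the same order and write each flag instead of reading it, so that the coder and the decoder stay synchronized.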