In multiple-viewpoint image coding technologies, disparity prediction coding has been proposed that reduces the amount of information by predicting the disparity between images when multiple-viewpoint images are coded, and a decoding method corresponding to this coding method has also been proposed. A vector that represents a disparity between viewpoint images is referred to as a disparity vector. The disparity vector is a two-dimensional vector having a horizontal element (x component) and a vertical element (y component) and is calculated for each block, a block being a region obtained by splitting one image. Multiple-viewpoint images are generally obtained using cameras arranged at the respective viewpoints. In multiple-viewpoint image coding, each viewpoint image is coded as a different layer among a plurality of layers. A coding method for a moving image configured of a plurality of layers is generally referred to as scalable coding or hierarchical coding. In scalable coding, high coding efficiency is realized by performing prediction between layers. A layer that serves as a reference and for which prediction between layers is not performed is referred to as a base layer, and a layer other than the base layer is referred to as an enhancement layer. Scalable coding in which the layers are configured of viewpoint images is referred to as view scalable coding. In this case, the base layer is referred to as a base view, and the enhancement layer is referred to as a non-base view. Scalable coding in which, in addition to view scalable coding, the layers are configured of a texture layer (image layer) and a depth layer (distance image layer) is referred to as three-dimensional scalable coding.
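The block-wise disparity vector described above can be illustrated with a minimal sketch. It assumes integer-pixel vectors, images stored as lists of rows, and clamping at the picture border; the function names `predict_block` and `block_residual` are hypothetical and are not taken from any cited document or codec.

```python
def predict_block(ref_view, x0, y0, width, height, dv):
    """Predict a block of the non-base view from the reference (base) view.

    dv is an (x, y) integer disparity vector. Samples that fall outside
    the reference picture are clamped to the nearest border sample
    (an assumption made for this sketch).
    """
    dvx, dvy = dv
    rows, cols = len(ref_view), len(ref_view[0])
    pred = []
    for y in range(y0, y0 + height):
        ry = min(max(y + dvy, 0), rows - 1)
        row = []
        for x in range(x0, x0 + width):
            rx = min(max(x + dvx, 0), cols - 1)
            row.append(ref_view[ry][rx])
        pred.append(row)
    return pred


def block_residual(target_view, pred, x0, y0):
    """Difference between the target block and its prediction."""
    return [
        [target_view[y0 + j][x0 + i] - pred[j][i] for i in range(len(pred[0]))]
        for j in range(len(pred))
    ]


# Toy data: the non-base view is the base view shifted by two pixels.
base_view = [[10 * y + x for x in range(8)] for y in range(4)]
non_base_view = [[base_view[y][min(x + 2, 7)] for x in range(8)] for y in range(4)]

pred = predict_block(base_view, 0, 0, 2, 2, (2, 0))
res = block_residual(non_base_view, pred, 0, 0)
```

When the disparity vector matches the actual shift between the views, the residual is zero everywhere in this toy example, which is the sense in which predicting the disparity reduces the amount of information to be coded.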
Types of scalable coding include, in addition to view scalable coding, spatial scalable coding (processing a low-resolution picture as the base layer and a high-resolution picture as the enhancement layer), SNR scalable coding (processing a low-quality picture as the base layer and a high-quality picture as the enhancement layer), and the like. In scalable coding, a picture of the base layer, for example, may be used as a reference picture in coding of a picture of the enhancement layer.
NPL 1 describes a technology referred to as view synthesis prediction that obtains a high-accuracy predicted image by splitting a prediction unit into small sub-blocks and performing prediction using a disparity vector for each sub-block. NPL 1 also describes a technology referred to as residual prediction that estimates a residual using an image of a view different from a target view and adds the estimated residual. In addition, NPL 1 describes a technology that derives enhancement merge candidates such as an inter-view merge candidate.
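The idea of predicting with a separate disparity vector for each sub-block can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of NPL 1: the sub-block size, the border clamping, and the callback `dv_for` that supplies a vector per sub-block are all hypothetical, and how the per-sub-block vectors are actually derived is outside the scope of this sketch.

```python
def split_into_sub_blocks(x0, y0, width, height, sub_w, sub_h):
    """Return (x, y, w, h) for each sub-block of a prediction unit."""
    return [
        (x, y, min(sub_w, x0 + width - x), min(sub_h, y0 + height - y))
        for y in range(y0, y0 + height, sub_h)
        for x in range(x0, x0 + width, sub_w)
    ]


def predict_with_sub_block_vectors(ref_view, x0, y0, width, height,
                                   sub_w, sub_h, dv_for):
    """Predict a prediction unit using one disparity vector per sub-block.

    dv_for(sx, sy) is a hypothetical callback returning the integer
    (x, y) disparity vector for the sub-block whose top-left is (sx, sy).
    """
    rows, cols = len(ref_view), len(ref_view[0])
    pred = [[0] * width for _ in range(height)]
    for sx, sy, sw, sh in split_into_sub_blocks(x0, y0, width, height,
                                                sub_w, sub_h):
        dvx, dvy = dv_for(sx, sy)
        for y in range(sy, sy + sh):
            ry = min(max(y + dvy, 0), rows - 1)  # clamp to picture border
            for x in range(sx, sx + sw):
                rx = min(max(x + dvx, 0), cols - 1)
                pred[y - y0][x - x0] = ref_view[ry][rx]
    return pred


# Toy reference view and a 4x2 prediction unit split into two 2x2 sub-blocks,
# where the left sub-block uses a smaller disparity than the right one.
ref_view = [[10 * y + x for x in range(8)] for y in range(4)]
sub_blocks = split_into_sub_blocks(0, 0, 4, 2, 2, 2)
vsp_pred = predict_with_sub_block_vectors(
    ref_view, 0, 0, 4, 2, 2, 2,
    lambda sx, sy: (1, 0) if sx < 2 else (3, 0),
)
```

Using a vector per sub-block lets regions at different depths within one prediction unit be fetched from different positions in the reference view, which a single vector for the whole unit cannot do.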
NPL 2 describes a technology that further splits a residual prediction block into a plurality of sub-blocks. NPL 2 also describes a technology that improves the accuracy of the disparity vector used in residual prediction performed in the time direction by using a vector of a block in a picture that has the same view ID as the target block but differs from the picture to which the target block belongs (target picture).