MPEG-2, MPEG-4, and MPEG-4 Advanced Video Coding (AVC)/H.264, standardized with the participation of the Moving Picture Experts Group (MPEG), are known moving image coding methods (see, for example, NPL 1).
These moving image coding methods perform motion-compensated inter predictive coding, which exploits the temporal correlation of a moving image to reduce the code amount. Motion-compensated inter predictive coding divides a coding target image into blocks, obtains a motion vector for each block, and uses the pixel values of the block in a reference image indicated by that motion vector as the prediction, thereby achieving efficient coding.
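The block-based motion search described above can be illustrated with a minimal full-search block-matching sketch. It assumes grayscale frames stored as lists of rows, a sum-of-absolute-differences (SAD) matching cost, and a square search window; the function names and parameters are hypothetical illustrations and are not taken from any standard, which leaves the motion search method to the encoder.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the bs x bs block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, bs, search_range):
    """Full-search block matching: return the motion vector (dx, dy) with minimum SAD and its cost."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would fall outside the picture.
            if not (0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def predict_block(ref, bx, by, bs, mv):
    """Motion-compensated prediction: copy the reference block indicated by the motion vector."""
    dx, dy = mv
    return [row[bx + dx: bx + dx + bs] for row in ref[by + dy: by + dy + bs]]


def residual(cur, pred, bx, by, bs):
    """Prediction residual that would be transformed and coded instead of the raw block."""
    return [[cur[by + y][bx + x] - pred[y][x] for x in range(bs)] for y in range(bs)]


if __name__ == "__main__":
    # Reference frame is a ramp; the current frame is the same content moved 1 right and 2 down.
    ref = [[8 * y + x for x in range(8)] for y in range(8)]
    cur = [[8 * (y - 2) + (x - 1) for x in range(8)] for y in range(8)]
    print(motion_search(cur, ref, 4, 4, 2, 2))  # → ((-1, -2), 0)
```

Only the motion vector and the (here all-zero) residual need to be coded for such a block, which is the source of the code amount reduction.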
Recently, multiview video coding (MVC) has been established as Annex H of the H.264 standard. MVC is an extension for coding a multiview moving image, that is, a plurality of moving images obtained when a plurality of cameras capture the same object or background. This coding method uses disparity-compensated predictive coding, which reduces the code amount by using a disparity vector that represents the correlation between cameras.
Furthermore, MPEG-3DV, an ad hoc group of MPEG, has established a new standard in which video captured by an imaging device is treated as a texture image and a depth image is transmitted together with the texture image.
A depth image is information that indicates the distance from a camera to a captured object. For example, the depth image may be generated from distances obtained by a ranging device located near the imaging device. Alternatively, the depth image may be generated by analyzing images captured by imaging devices at a plurality of viewpoints.
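For the multi-viewpoint case, a common way to obtain distance is stereo triangulation. The minimal sketch below assumes two rectified, parallel cameras with focal length f (in pixels) and baseline B, so that a per-pixel disparity d (in pixels, found for example by horizontal block matching between the two views) gives the distance Z = f * B / d. The function names are hypothetical, and real depth estimators add matching, occlusion handling, and filtering that are omitted here.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Distance Z = f * B / d for rectified, parallel cameras (Z has the units of baseline)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline / disparity_px


def depth_map_from_disparity_map(disp, focal_px, baseline):
    """Convert a disparity map (list of rows) into a depth map of the same size."""
    return [[depth_from_disparity(d, focal_px, baseline) for d in row] for row in disp]


if __name__ == "__main__":
    # f = 1000 px, B = 100 mm: a 20 px disparity corresponds to an object 5 m away.
    print(depth_from_disparity(20, 1000, 100))  # → 5000.0
```

Because Z is inversely proportional to d, nearer objects show larger disparities between viewpoints, which is also why depth and disparity can be converted into each other when the camera arrangement is known.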
H.264/AVC and MVC also specify that sequence information applying to an entire coded sequence, such as the color format and the display order of pictures, may be created and coded. Specifically, such sequence information is coded as a parameter set called a sequence parameter set (SPS).
Against this background, coding a texture image by using information based on a depth image has been proposed. For example, when coding a texture image from a viewpoint other than the base viewpoint, the video coding device of PTL 1 generates a disparity-compensated image by converting the base viewpoint image into an image at the viewpoint of the coding target texture image, based on the depth image corresponding to the base viewpoint image and the positional relationship between the cameras. The video coding device of PTL 1 then generates a disparity difference image from the difference between the coding target texture image and the disparity-compensated image, and codes this disparity difference image as the texture image from the viewpoint other than the base viewpoint.
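The conversion and difference steps described for PTL 1 can be illustrated with a simplified depth-image-based warping sketch. It is not the method of PTL 1; it assumes rectified, horizontally aligned cameras in which a base-view pixel at depth Z moves left by d = f * B / Z pixels in the target view, keeps the nearest pixel when several land on the same position, and fills disoccluded holes with the nearest valid pixel to the left. All function names and the hole-filling rule are hypothetical.

```python
def warp_to_target_view(base, depth, focal_px, baseline):
    """Forward-warp base-view texture to the target viewpoint using per-pixel depth.

    Assumption: rectified cameras, target view shifted so that a point at
    depth Z appears d = round(f * B / Z) pixels to the left.
    """
    h, w = len(base), len(base[0])
    out = [[None] * w for _ in range(h)]
    best_disp = [[-1] * w for _ in range(h)]  # z-buffer: larger disparity means nearer
    for y in range(h):
        for x in range(w):
            d = round(focal_px * baseline / depth[y][x])
            tx = x - d
            # Keep only the nearest surface when pixels collide.
            if 0 <= tx < w and d > best_disp[y][tx]:
                best_disp[y][tx] = d
                out[y][tx] = base[y][x]
    # Hypothetical hole filling: copy the nearest valid pixel to the left (0 at the row start).
    for y in range(h):
        last = 0
        for x in range(w):
            if out[y][x] is None:
                out[y][x] = last
            else:
                last = out[y][x]
    return out


def disparity_difference_image(target, compensated):
    """Per-pixel difference between the coding target texture and the compensated image."""
    return [[t - c for t, c in zip(tr, cr)] for tr, cr in zip(target, compensated)]


if __name__ == "__main__":
    base = [[1, 2, 3, 4]]
    depth = [[1000, 1000, 500, 1000]]  # the third pixel is nearer, so it shifts further
    print(warp_to_target_view(base, depth, 100, 10))  # → [[3, 3, 4, 4]]
```

When the depth image is accurate, most of the difference image is close to zero, so it typically requires fewer bits than the texture image itself.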