The term ‘multi-view images’ refers to a plurality of images obtained by photographing the same object and background using a plurality of cameras, while the term ‘multi-view moving images’ (i.e., ‘multi-view video’) refers to moving images obtained in this way.
Motion compensated prediction and disparity compensated prediction have been proposed as technologies for use in general moving image coding and multi-view moving image coding.
Motion compensated prediction is a method which is employed in recent international standards for moving image coding, as typified by H.264. In this method, the motion of an object is compensated between a frame targeted for coding and a reference frame that has already been coded so as to obtain an inter-frame difference for the image signal, and only this difference signal is coded (see Non-patent document 1).
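As an illustration only, the following Python sketch shows one simple form of motion compensated prediction: an exhaustive block search that minimizes the sum of absolute differences (SAD), followed by calculation of the difference signal. The function names, the block size, the search range, and the choice of SAD as the matching cost are assumptions made for this example and are not taken from H.264 or from Non-patent document 1.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def motion_search(cur, ref, bx, by, size, search):
    """Exhaustive search over displacements within +/- `search` pixels.
    Returns the vector (dx, dy) with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def residual_block(cur, ref, bx, by, size, mv):
    """Difference signal between the current block and its
    motion compensated prediction taken from the reference frame."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx] for x in range(size)]
        for y in range(size)
    ]
```

In this sketch, only the vector returned by `motion_search` and the block returned by `residual_block` would need to be passed on for coding. When the match is exact, the residual block contains only zeros.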
In contrast, in disparity compensated prediction, a frame photographed by a different camera is used as the reference frame, the disparity of the object between the two frames is compensated, and coding is performed while the inter-frame difference between the image signals is obtained (see Non-patent document 2).
The term ‘disparity’ as used here refers to the difference between the positions at which the same point on an object is projected onto the image planes of cameras placed at different positions. In disparity compensated prediction, this disparity is represented by a two-dimensional vector and then coded. As is shown in FIG. 9, because disparity is information that arises depending on the camera positions and on the distance from the camera to the object (i.e., the depth), there exists a method known as view synthesis prediction (view interpolation prediction) which utilizes this principle.
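The dependence of disparity on camera position and depth can be illustrated for one simple special case. The sketch below assumes two parallel, rectified cameras with the same focal length, in which case the disparity is purely horizontal and equals the focal length multiplied by the camera spacing and divided by the depth. This camera arrangement, the function names, and the parameter names are assumptions made for this example and are not taken from FIG. 9 or from the cited documents.

```python
def disparity_from_depth(focal_px, baseline, depth):
    """Horizontal disparity in pixels for two parallel, rectified cameras.

    focal_px: focal length expressed in pixels (assumed equal for both cameras)
    baseline: distance between the two camera centers
    depth:    distance from the cameras to the object point, same unit as baseline
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def depth_from_disparity(focal_px, baseline, disparity):
    """Inverse relation: recover the depth from a measured disparity (triangulation)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity
```

Under these assumptions, a nearer object yields a larger disparity than a more distant one, and a wider camera spacing enlarges the disparity of every object.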
In view synthesis prediction (view interpolation prediction), the depth of an object is estimated from multi-view video obtained on the coding side or the decoding side by using camera position information and the principle of triangulation, and the frame targeted for coding is synthesized (i.e., interpolated) using this estimated depth information so as to create a prediction image (see Patent document 1 and Non-patent document 3). Note that if the depth is estimated on the coding side, it is necessary to encode the depth which is used, because the decoding side requires the same depth in order to create the same prediction image.
In prediction which uses images photographed by separate cameras in this way, the coding efficiency deteriorates if individual differences exist between the responses of the camera imaging elements, if gain control or gamma correction is performed in each camera, if settings such as the depth of field or the aperture differ between the cameras, or if there is a direction-dependent illumination effect in the scene. The reason for this is that the prediction is made on the assumption that the illumination and color of the object are the same in both the frame targeted for coding and the reference frame.
Methods such as illumination compensation and color correction are being investigated as ways of dealing with changes in the illumination and color of an object. In these methods, a reference frame whose illumination and color have been corrected is used as the frame for making a prediction, so that the amount of prediction residual to be encoded can be kept small.
In H.264, weighted prediction, in which a linear function is used as the correction model, is adopted (see Non-patent document 1), while Non-patent document 3 proposes a method in which corrections are made using a color table.
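The effect of a linear correction model can be illustrated with the following Python sketch, in which each reference sample is multiplied by a gain and shifted by an offset before being used as the prediction. This is a simplified floating-point illustration and not the integer arithmetic specified in H.264. The least-squares fit, the function names, and the 8-bit sample range are assumptions made for this example.

```python
def fit_gain_offset(ref, cur):
    """Least-squares fit of cur ~ w * ref + o over two equal-length sample lists."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_c = sum(cur) / n
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_r == 0:
        # Flat reference: only an offset can be estimated.
        return 1.0, mean_c - mean_r
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(ref, cur))
    w = cov / var_r
    return w, mean_c - w * mean_r


def weighted_prediction(ref, w, o, max_val=255):
    """Apply the linear model to each reference sample and clip to [0, max_val]."""
    return [min(max_val, max(0, round(w * r + o))) for r in ref]


def residual_energy(cur, pred):
    """Sum of squared differences between the samples to be coded and the prediction."""
    return sum((c - p) ** 2 for c, p in zip(cur, pred))
```

When the samples to be coded differ from the reference samples only by a gain and an offset, the corrected prediction in this sketch leaves a smaller residual than the uncorrected reference does.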