Multi-viewpoint images are a plurality of images obtained by photographing the same object and its background using a plurality of cameras, and multi-viewpoint video images are video images for the multi-viewpoint images. Below, a video image obtained by a single camera is called a “two-dimensional video image”, and a set of multiple two-dimensional video images obtained by photographing the same object and its background is called a “multi-viewpoint video image”.
There is a strong temporal correlation in a two-dimensional video image, and encoding efficiency thereof is improved by using the temporal correlation. In addition, for a multi-viewpoint video image, when the cameras are synchronized with each other, the images (taken by the cameras) at the same time capture the object and background thereof in entirely the same state from different positions, so that there is a strong correlation between the cameras. The encoding efficiency for encoding the multi-viewpoint video image can be improved using this correlation.
First, conventional techniques relating to the encoding of two-dimensional video images will be shown.
In many known methods of encoding two-dimensional video images, such as H.264, MPEG-2, and MPEG-4 (which are international encoding standards), highly efficient encoding is performed by means of motion compensation, orthogonal transformation, quantization, entropy encoding, and the like. The technique called “motion compensation” uses the temporal correlation between frames.
Non-Patent Document 1 discloses detailed techniques of motion compensation used in H.264. A general explanation follows.
In the motion compensation of H.264, an encoding target frame is divided into blocks of various sizes, and each block can have its own motion vector, thereby achieving a high level of encoding efficiency even for a local change in a video image. In addition, already-encoded past or future frames (with respect to the present frame) may be prepared as reference frame candidates for an encoding target frame, so that each block can use its own reference frame, thereby achieving a high level of encoding efficiency even for a video image in which an occlusion occurs due to a temporal change.
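The per-block choice of motion vector and reference frame can be illustrated with a minimal block-matching sketch. It is not the H.264 procedure itself: the exhaustive search, the sum-of-absolute-differences (SAD) cost, and the names `sad` and `best_prediction` are assumptions made for illustration, and frames are plain lists of pixel rows of equal size.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a shifted block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_prediction(cur, refs, bx, by, size, search):
    """Return (cost, reference index, (dx, dy)) with the lowest SAD for one block."""
    height, width = len(cur), len(cur[0])
    best = None
    for idx, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                # Skip candidate blocks that fall outside the reference frame.
                if by + dy < 0 or by + dy + size > height:
                    continue
                if bx + dx < 0 or bx + dx + size > width:
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, idx, (dx, dy))
    return best


# Hypothetical 8x8 test frames: ref_b holds a pattern, ref_a is flat.
ref_a = [[0] * 8 for _ in range(8)]
ref_b = [[x * x + 3 * y for x in range(8)] for y in range(8)]
# The current frame is ref_b with its content moved one pixel to the right.
cur = [[ref_b[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]

result = best_prediction(cur, [ref_a, ref_b], bx=4, by=4, size=2, search=2)
print(result)
```

Because every block is searched independently, one block may select a different reference frame and vector from its neighbour, which is the property the paragraph above relies on for local changes and occlusions.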
Next, conventional encoding methods of multi-viewpoint images or multi-viewpoint video images will be explained.
Because the encoding of multi-viewpoint video images exploits the correlation between cameras, a known method encodes them with high efficiency by using “disparity compensation”, in which motion compensation is applied to images taken at the same time by cameras at different viewpoints. Here, disparity is the difference between the positions at which the same point on an imaged object is projected onto the image planes of cameras disposed at different positions.
FIG. 12 is a schematic view showing the concept of disparity generated between such cameras. That is, FIG. 12 shows a state in which an observer looks down on image planes of cameras, whose optical axes are parallel to each other, from the upper side of the image planes. Generally, such points, to which the same point on an imaged object is projected, on image planes of different cameras, are called “corresponding points”. In disparity compensation, based on the above corresponding relationship, each pixel value of an encoding target frame is predicted using a reference frame, and the relevant prediction residual and disparity information which indicates the corresponding relationship are encoded.
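For cameras with parallel optical axes, the disparity between corresponding points can be worked out with a simple projection model. The sketch below assumes ideal pinhole cameras of equal focal length placed along one horizontal line; the function names, the focal length, and the camera spacing are hypothetical values chosen for illustration and do not come from FIG. 12.

```python
def project(x_world, z_world, camera_x, focal_length):
    """Horizontal image coordinate of a point seen by a pinhole camera at camera_x."""
    return focal_length * (x_world - camera_x) / z_world


def disparity(x_world, z_world, baseline, focal_length):
    """Difference between the corresponding points on the two image planes."""
    left = project(x_world, z_world, 0.0, focal_length)
    right = project(x_world, z_world, baseline, focal_length)
    return left - right


# Hypothetical setup: focal length of 1000 pixels, cameras 0.1 m apart.
near = disparity(x_world=0.5, z_world=2.0, baseline=0.1, focal_length=1000.0)
far = disparity(x_world=0.5, z_world=4.0, baseline=0.1, focal_length=1000.0)
print(near, far)
```

Under this model the disparity reduces to `focal_length * baseline / z_world`, so a point nearer to the cameras is displaced more between the two views than a distant one.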
For each frame in a multi-viewpoint video image, temporal redundancy and redundancy between cameras are present at the same time. Patent Document 1 discloses a method for removing both redundancies simultaneously.
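One way to picture the removal of both redundancies is a two-stage prediction: subtract a disparity-compensated prediction from each input image, then predict the resulting differential image from the differential image at the previous time. The sketch below is a toy illustration only and does not reproduce the procedure of Patent Document 1; the pixel values are invented, the function names are hypothetical, and the temporal stage uses a zero motion vector for brevity.

```python
def differential_image(target, disparity_compensated):
    """Input image minus its disparity-compensated prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, disparity_compensated)]


def temporal_residual(diff_now, diff_prev):
    """Residual left after predicting diff_now from diff_prev (zero motion assumed)."""
    return [[a - b for a, b in zip(a_row, b_row)]
            for a_row, b_row in zip(diff_now, diff_prev)]


def energy(image):
    """Sum of absolute values, used here as a rough measure of what remains to encode."""
    return sum(abs(v) for row in image for v in row)


# Invented 2x2 frames in which the disparity-compensation error persists over time.
target_t0 = [[10, 12], [14, 16]]
pred_t0 = [[9, 12], [14, 13]]
target_t1 = [[11, 13], [15, 17]]
pred_t1 = [[10, 13], [15, 14]]

diff_t0 = differential_image(target_t0, pred_t0)
diff_t1 = differential_image(target_t1, pred_t1)
final = temporal_residual(diff_t1, diff_t0)
print(energy(diff_t1), energy(final))
```

In this toy case the error left by disparity compensation is the same at both times, so the temporal stage removes it; real sequences would need a motion search rather than a zero vector.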
In the relevant method, a differential image between an input image and the corresponding disparity-compensated image is generated at each specific time, and the generated images are regarded as a two-dimensional video image and encoded using motion compensation. In accordance with such a method, temporal redundancy, which cannot be removed by the disparity compensation used for removing the inter-camera redundancy, can be removed using the motion compensation. Therefore, the prediction residual that is finally encoded is reduced, so that a high level of encoding efficiency can be achieved.
Non-Patent Document 1: ITU-T Rec. H.264/ISO/IEC 14496-10, “Editor's Proposed Draft Text Modifications for Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), Draft 7”, Document JVT-E022d7, pp. 10-13 and 62-73, September 2002.
Patent Document 1: Japanese Unexamined Patent Application, First Publication No. 2007-036800.