Multi-viewpoint images are images obtained by photographing the same object and background thereof by using a plurality of cameras, and multi-viewpoint video images are video images of the multi-viewpoint images. Below, a video image obtained by a single camera is called a “two-dimensional video image”, and a set of multiple two-dimensional video images obtained by photographing the same object and background thereof is called a “multi-viewpoint video image”.
As there is a strong temporal correlation within a two-dimensional video image, the encoding efficiency thereof is improved by using that correlation. On the other hand, when the cameras for obtaining multi-viewpoint images or multi-viewpoint video images are synchronized with each other, the images (of the cameras) corresponding to the same time capture the object and background thereof in exactly the same state from different positions, so that there is a strong correlation between the cameras. The encoding efficiency of the multi-viewpoint images or the multi-viewpoint video images can be improved using this correlation.
First, conventional techniques relating to the encoding of two-dimensional video images will be shown.
In many known methods of encoding two-dimensional video images, such as H.264, MPEG-2, and MPEG-4 (which are international encoding standards), highly efficient encoding is performed by means of motion compensation, orthogonal transformation, quantization, entropy encoding, and the like. The technique called “motion compensation” is a method which uses the temporal correlation between frames.
Non-Patent Document 1 discloses detailed techniques of motion compensation used in H.264. General explanations thereof follow.
In accordance with the motion compensation in H.264, a target frame for encoding is divided into blocks of various sizes. For each block, an already-encoded frame called a “reference frame” is selected, and the image of the block is predicted using vector data (called a “motion vector”) which indicates a corresponding point. The block division has seven possible forms, namely 16×16 (pixels), 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, so that image prediction can be performed in fine units that take the position and size of the imaged object into account. As a result, the residual to be encoded, which is the difference between the predicted image and the original image, is reduced, thereby achieving a high level of encoding efficiency.
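As an illustration only, block-based motion compensation of this kind can be sketched as a full search that, for one block of the target frame, finds the displacement into the reference frame with the smallest sum of absolute differences (SAD). The function names, the SAD criterion, the integer-pixel search, and the small synthetic frames are assumptions made for this sketch; they are not taken from H.264 or from Non-Patent Document 1.

```python
def sad(cur, ref, bx, by, bw, bh, dx, dy):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(bh)
        for x in range(bw)
    )


def motion_search(cur, ref, bx, by, bw, bh, search_range):
    """Full search over integer displacements; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block leaves the frame.
            inside = (0 <= bx + dx and bx + dx + bw <= width
                      and 0 <= by + dy and by + dy + bh <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, bw, bh, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost


# Synthetic 8x8 frames (hypothetical data): the target frame is the
# reference frame with its content moved one pixel to the right.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

# Search for the 4x4 block whose top-left corner is at (2, 2).
vector, cost = motion_search(cur, ref, 2, 2, 4, 4, 2)
```

Because the content moved one pixel to the right, the best match in the reference frame lies one pixel to the left of the block, and the residual for that block is zero. A real encoder would repeat the search for each candidate block division and pick the division with the best trade-off.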
Next, a conventional encoding method of multi-viewpoint images or multi-viewpoint video images will be explained.
The difference between the encoding of multi-viewpoint images and the encoding of multi-viewpoint video images is that multi-viewpoint video images have not only a correlation between cameras but also a temporal correlation. However, the same method using the correlation between cameras can be applied to both the multi-viewpoint images and the multi-viewpoint video images. Therefore, methods used in the encoding of multi-viewpoint video images will be explained below.
As the encoding of multi-viewpoint video images uses a correlation between cameras, the multi-viewpoint video images are highly efficiently encoded by a known method which uses “parallax (or disparity) compensation”, in which motion compensation is applied to images obtained by different cameras at the same time. Here, “parallax” (or disparity) is the difference between the positions, on the image planes of cameras disposed at different positions, to which the same point on an imaged object is projected.
FIG. 21 is a schematic view showing the concept of parallax generated between such cameras. In the schematic view of FIG. 21, the image planes of cameras whose optical axes are parallel to each other are viewed from directly above. Generally, the points on the image planes of different cameras to which the same point on an imaged object is projected are called “corresponding points”.
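For cameras with parallel optical axes, the parallax of a point can be illustrated with a simple pinhole model. The sketch below assumes two identical pinhole cameras displaced only along the horizontal axis; the function names, the focal length in pixels, and the sample coordinates are hypothetical values chosen for illustration. Under these assumptions the horizontal parallax equals the focal length times the camera spacing divided by the depth of the point, and the vertical parallax is zero.

```python
def project(point, cam_x, focal):
    """Pinhole projection of a 3-D point (X, Y, Z) onto the image plane of
    a camera located at (cam_x, 0, 0) and looking along the +Z axis."""
    X, Y, Z = point
    return (focal * (X - cam_x) / Z, focal * Y / Z)


def parallax(point, cam_a_x, cam_b_x, focal):
    """Difference between the projected positions of the same point on the
    image planes of camera A and camera B (the corresponding points)."""
    ua, va = project(point, cam_a_x, focal)
    ub, vb = project(point, cam_b_x, focal)
    return (ua - ub, va - vb)


# Hypothetical setup: cameras 0.1 apart, focal length of 800 pixels,
# and one point of the imaged object at depth 4.0.
focal = 800.0
baseline = 0.1
point = (0.5, 0.2, 4.0)

dx, dy = parallax(point, 0.0, baseline, focal)
expected_dx = focal * baseline / point[2]  # focal * baseline / depth
```

In this model a nearer point produces a larger parallax than a distant one, which is why parallax carries information about the distance to the imaged object.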
In parallax compensation, based on the above corresponding relationship, each pixel value of a target frame for encoding is predicted using a reference frame, and the relevant prediction residual and parallax data which indicates the corresponding relationship are encoded.
In many methods, parallax is represented by a vector on an image plane. For example, Non-Patent Document 2 discloses a method of performing parallax compensation for each block, where parallax for each block is represented by a two-dimensional vector, that is, two parameters (x and y components). In this method, parallax data having two parameters and a prediction residual are encoded.
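A minimal sketch of block-wise parallax compensation with a two-parameter vector follows. It assumes a lossless residual (no transformation, quantization, or entropy encoding), and the function names and the dictionary layout are hypothetical; it is not the method of Non-Patent Document 2. The encoder side outputs the parallax vector and the prediction residual for one block, and the decoder side rebuilds the block from the same reference frame.

```python
def predict_block(ref, bx, by, bw, bh, dx, dy):
    """Predicted block: the reference-frame block displaced by (dx, dy)."""
    return [[ref[by + dy + y][bx + dx + x] for x in range(bw)]
            for y in range(bh)]


def encode_block(cur, ref, bx, by, bw, bh, vector):
    """Return the data to be encoded for one block: the two-parameter
    parallax vector and the prediction residual."""
    dx, dy = vector
    pred = predict_block(ref, bx, by, bw, bh, dx, dy)
    residual = [[cur[by + y][bx + x] - pred[y][x] for x in range(bw)]
                for y in range(bh)]
    return {"vector": (dx, dy), "residual": residual}


def decode_block(ref, bx, by, bw, bh, data):
    """Rebuild the block from the reference frame, vector, and residual."""
    dx, dy = data["vector"]
    pred = predict_block(ref, bx, by, bw, bh, dx, dy)
    return [[pred[y][x] + data["residual"][y][x] for x in range(bw)]
            for y in range(bh)]


# Hypothetical frames from two cameras at the same time: the view of the
# target camera equals the reference view shifted one pixel to the right.
ref_view = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur_view = [[ref_view[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

data = encode_block(cur_view, ref_view, 2, 2, 4, 4, (-1, 0))
decoded = decode_block(ref_view, 2, 2, 4, 4, data)
```

With the vector that matches the true parallax, the residual is all zeros; with any other vector the block is still reconstructed exactly in this lossless sketch, but the residual is larger and would cost more to encode.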
In Non-Patent Document 3, camera parameters are used for encoding, and the parallax vector is represented by one-dimensional data based on the Epipolar geometry constraint, thereby efficiently encoding predicted data. FIG. 22 is a schematic view showing the concept of the Epipolar geometry constraint.
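The idea of a one-parameter representation can be sketched as follows. When the camera parameters are known, a pixel of the reference frame together with a single depth value determines the corresponding point in the other camera, and varying only the depth moves that point along one straight line. The sketch assumes pinhole cameras that share the same focal length and principal point, and the rotation, translation, and pixel values are hypothetical; it illustrates the geometry only and is not the method of Non-Patent Document 3.

```python
import math


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def corresponding_point(pixel, depth, focal, center, rotation, translation):
    """Map a reference-camera pixel at a given depth to the other camera.

    `rotation` and `translation` convert reference-camera coordinates into
    the coordinates of the other camera (assumed known camera parameters).
    """
    u, v = pixel
    cx, cy = center
    # Back-project the pixel to a 3-D point at the given depth.
    p_ref = [depth * (u - cx) / focal, depth * (v - cy) / focal, depth]
    # Change to the coordinate system of the other camera.
    rotated = mat_vec(rotation, p_ref)
    p_other = [rotated[i] + translation[i] for i in range(3)]
    # Project onto the image plane of the other camera.
    return (focal * p_other[0] / p_other[2] + cx,
            focal * p_other[1] / p_other[2] + cy)


# Hypothetical camera parameters: 5 degree rotation about the vertical axis.
angle = math.radians(5.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [-0.2, 0.05, 0.0]

# The same reference pixel at three candidate depths.
points = [corresponding_point((400.0, 300.0), z, 800.0, (320.0, 240.0), R, t)
          for z in (2.0, 4.0, 8.0)]
```

The three resulting points are distinct but collinear, so a single scalar per pixel or block is enough to select the corresponding point, instead of the two components of a general parallax vector.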
In accordance with the Epipolar geometry constraint, for two cameras (camera A and camera B), the point in one image that corresponds to a given point in the other image is constrained to lie on a straight line called an “Epipolar line”. In the method disclosed in Non-Patent Document 3, in order to indicate the position on the Epipolar line, parallax for all target frames for encoding is represented by one parameter, such as the distance from the camera by which the reference frame is obtained to the imaged object.
Non-Patent Document 1: ITU-T Rec. H.264/ISO/IEC 14496-10, “Editor's Proposed Draft Text Modifications for Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), Draft 7”, Final Committee Draft, Document JVT-E022, pp. 10-13 and 62-68, September 2002.
Non-Patent Document 2: Hideaki Kimata and Masaki Kitahara, “Preliminary results on multiple view video coding (3DAV)”, document M10976, MPEG Redmond Meeting, July 2004.
Non-Patent Document 3: Shinya SHIMIZU, Masaki KITAHARA, Kazuto KAMIKURA and Yoshiyuki YASHIMA, “Multi-view Video Coding based on 3-D Warping with Depth Map”, In Proceedings of Picture Coding Symposium 2006, SS3-6, April 2006.