Multi-viewpoint video images are a plurality of video images obtained by photographing the same object and background thereof using a plurality of cameras. Below, a video image obtained by a single camera is called a “two-dimensional video image”, and a set of multiple two-dimensional video images obtained by photographing the same object and background thereof is called a “multi-viewpoint video image”.
There is a strong temporal correlation in the two-dimensional video image of each camera, which is included in a multi-viewpoint video image. In addition, when the cameras are synchronized with each other, the images (taken by the cameras) at the same time capture the object and background thereof in entirely the same state from different positions, so that there is a strong correlation between the cameras. The encoding efficiency of video encoding can be improved using this correlation.
First, conventional techniques relating to the encoding of two-dimensional video images will be shown.
In many known methods of encoding two-dimensional video images, such as H.264, MPEG-2, and MPEG-4 (which are international encoding standards), highly efficient encoding is performed by means of motion compensation, orthogonal transformation, quantization, entropy encoding, and the like. For example, in H.264, it is possible to perform encoding using temporal correlation between the present frame and past or future frames.
Non-Patent Document 1 discloses detailed techniques of motion compensation used in H.264. General explanations thereof follow.
In accordance with the motion compensation in H.264, an encoding target frame is divided into blocks of variable size, and each block can have an individual motion vector, thereby achieving a high level of encoding efficiency even for a local change in a video image.
In addition, as candidates for a reference image, past or future frames (with respect to the present frame), which have already been encoded, may be prepared so that each block can have an individual reference frame, thereby implementing a high level of encoding efficiency even for a video image in which an occlusion occurs due to a temporal change.
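The block-based motion compensation described above can be illustrated with a minimal sketch. It is not the H.264 procedure itself: it assumes a fixed block size, integer-pixel motion vectors, an exhaustive search, and the sum of absolute differences (SAD) as the matching cost. The function names `sad` and `best_prediction` and the sample frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_prediction(cur, refs, bx, by, n, search):
    """Search every candidate reference frame and every displacement within
    +/- `search` pixels. Returns (cost, reference index, (dx, dy))."""
    h, w = len(cur), len(cur[0])
    best = None
    for ri, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                # Skip candidates that fall outside the reference frame.
                if bx + dx < 0 or bx + dx + n > w:
                    continue
                if by + dy < 0 or by + dy + n > h:
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, ri, (dx, dy))
    return best


# Illustrative data (assumed): ref0 is blank, ref1 is textured, and the
# current frame is ref1 shifted one pixel to the right.
ref0 = [[0] * 8 for _ in range(8)]
ref1 = [[(7 * x + 13 * y) % 31 + 1 for x in range(8)] for y in range(8)]
cur = [[ref1[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]

result = best_prediction(cur, [ref0, ref1], bx=2, by=2, n=4, search=2)
print(result)
```

Because the reference index is chosen per block, a block whose content is hidden in one already-encoded frame can still be predicted from another frame in which it is visible.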
Next, a conventional encoding method of multi-viewpoint video images will be explained. Because multi-viewpoint video images exhibit a correlation between cameras, a known method encodes them with high efficiency using “disparity compensation”, in which motion compensation is applied to images obtained by different cameras at the same time. Here, disparity is the difference between the positions to which the same point on an imaged object is projected on the image planes of cameras disposed at different positions.
FIG. 7 is a schematic view showing the concept of disparity generated between such cameras. That is, FIG. 7 shows a state in which an observer looks down on image planes of cameras A and B, whose optical axes are parallel to each other, from the upper side thereof. Generally, such points, to which the same point on an imaged object is projected, on image planes of different cameras, are called “corresponding points”. In encoding based on disparity compensation, based on the above corresponding relationship, each pixel value of an encoding target frame is predicted using a reference frame, and the relevant prediction residual and disparity information which indicates the corresponding relationship are encoded.
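For cameras with parallel optical axes, as in FIG. 7, the disparity of a point can be sketched with a simple pinhole model. The model, the coordinate convention (cameras placed along the x axis and looking along +z), and the names `project_x` and `disparity` are assumptions made for illustration and are not taken from the cited documents.

```python
def project_x(point_x, point_z, cam_x, focal):
    """Horizontal image coordinate of a point at (point_x, point_z) seen by a
    pinhole camera located at x = cam_x and looking along +z."""
    return focal * (point_x - cam_x) / point_z


def disparity(point_x, point_z, cam_a_x, cam_b_x, focal):
    """Difference between the positions of the corresponding points on the
    image planes of camera A and camera B."""
    xa = project_x(point_x, point_z, cam_a_x, focal)
    xb = project_x(point_x, point_z, cam_b_x, focal)
    return xa - xb


# Illustrative values (assumed): baseline 0.25, focal length 1000 pixels.
far = disparity(point_x=0.5, point_z=2.0, cam_a_x=0.0, cam_b_x=0.25, focal=1000.0)
near = disparity(point_x=0.5, point_z=1.0, cam_a_x=0.0, cam_b_x=0.25, focal=1000.0)
print(far, near)
```

Under this model the disparity equals the focal length times the camera spacing divided by the depth, so it does not depend on the horizontal position of the point and grows as the point approaches the cameras.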
For each frame in a multi-viewpoint video image, temporal redundancy and redundancy between cameras are present at the same time. Non-Patent Document 2 and Patent Document 1 (disclosing a multi-viewpoint image encoding apparatus) each disclose a method for removing both redundancies simultaneously.
In the relevant methods, temporal prediction of a differential image between an original image and a disparity-compensated image is performed, and a residual of motion compensation in the differential image is encoded.
In accordance with such methods, temporal redundancy, which cannot be removed by a disparity compensation for removing the inter-camera redundancy, can be removed using the motion compensation. Therefore, a prediction residual, which is finally encoded, is reduced, so that a high level of encoding efficiency can be achieved.
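The idea of temporally predicting the differential image can be sketched as follows. This is a simplified illustration rather than the method of the cited documents: it assumes zero motion, so the prediction of the current differential image is simply the previous differential image, and it omits transformation, quantization, and entropy encoding. All function names and sample values are hypothetical.

```python
def subtract(a, b):
    """Pixel-wise a - b for two images stored as lists of rows."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Pixel-wise a + b for two images stored as lists of rows."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def energy(img):
    """Sum of absolute pixel values, used here as a rough measure of how
    much signal remains to be encoded."""
    return sum(abs(p) for row in img for p in row)


def encode_residual(original, disp_comp, predicted_diff):
    """Residual left after disparity compensation followed by temporal
    prediction of the differential image."""
    diff = subtract(original, disp_comp)
    return subtract(diff, predicted_diff)


def decode(residual, disp_comp, predicted_diff):
    """Inverse of encode_residual."""
    return add(add(residual, predicted_diff), disp_comp)


# Illustrative frames (assumed): the disparity-compensated image misses the
# same pixels at both time instants.
orig_t0, dc_t0 = [[10, 12], [14, 16]], [[9, 12], [13, 16]]
orig_t1, dc_t1 = [[11, 13], [15, 17]], [[10, 13], [14, 17]]

diff_t0 = subtract(orig_t0, dc_t0)   # differential image at the earlier time
diff_t1 = subtract(orig_t1, dc_t1)   # disparity-only residual at the current time
residual = encode_residual(orig_t1, dc_t1, predicted_diff=diff_t0)
print(energy(diff_t1), energy(residual))
```

In this example the disparity compensation error repeats over time, so predicting it from the earlier differential image leaves less to encode than disparity compensation alone.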
Non-Patent Document 1: ITU-T Rec. H.264/ISO/IEC 14496-10, “Editor's Proposed Draft Text Modifications for Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), Draft 7”, Final Committee Draft, Document JVT-E022, pp. 10-13, and 62-68, September 2002.
Non-Patent Document 2: Shinya SHIMIZU, Masaki KITAHARA, Kazuto KAMIKURA and Yoshiyuki YASHIMA, “Multi-view Video Coding based on 3-D Warping with Depth Map”, In Proceedings of Picture Coding Symposium 2006, SS3-6, April, 2006.
Patent Document 1: Japanese Unexamined Patent Application, First Publication No. H10-191393.