In recent years, applications that use images from multiple viewpoints have come into wide use. One example is a binocular three-dimensional (3D) television system. In a binocular 3D television system, an image for the left eye and an image for the right eye, captured by two cameras from two different directions, are generated and displayed on a common screen to present a 3D image to a viewer. The image for the left eye and the image for the right eye are transmitted or recorded separately as independent images, so the amount of information needed is approximately twice that of a single two-dimensional (2D) image.
Accordingly, a technique has been proposed in which one of the images for the left eye and the right eye is treated as a main image and the other as a sub-image, and the information of the sub-image is compressed by a general compression encoding method to suppress the amount of information (see, for example, Patent Document 1). In the proposed 3D TV image transmission method, for every small area of the sub-image, a relative position having high correlation with the main image is determined, and the positional deviation amount of that relative position (hereinafter referred to as a disparity vector) and the differential signal at that relative position (hereinafter referred to as a prediction residual signal) are transmitted or recorded. An image close to the sub-image can be restored by using the main image and the disparity vector; however, information of the sub-image that the main image does not include, such as information of an area hidden by an object, cannot be restored in this way, and so the prediction residual signal is also transmitted or recorded.
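The disparity vector and prediction residual signal described above can be illustrated with a minimal sketch. It is not the method of Patent Document 1: the exhaustive search, the sum of absolute differences (SAD) as the correlation measure, and the names `block`, `sad`, `encode_block` and `decode_block` are all assumptions made for illustration, and images are represented as plain lists of pixel rows.

```python
def block(img, top, left, size):
    """Return the size x size block of a 2D list image starting at (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks (assumed measure)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def encode_block(main, sub, top, left, size, search):
    """Find the disparity vector (dy, dx) with the smallest SAD, plus the residual."""
    target = block(sub, top, left, size)
    height, width = len(main), len(main[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidate positions that fall outside the main image.
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue
            cost = sad(target, block(main, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    dy, dx = best[1]
    pred = block(main, top + dy, left + dx, size)
    # The residual carries whatever the main image cannot predict.
    residual = [[s - p for s, p in zip(rs, rp)] for rs, rp in zip(target, pred)]
    return (dy, dx), residual


def decode_block(main, top, left, size, disparity, residual):
    """Restore the sub-image block from the main image, disparity and residual."""
    dy, dx = disparity
    pred = block(main, top + dy, left + dx, size)
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, residual)]


# Hypothetical test data: the sub-image is the main image shifted by two
# columns, with one pixel changed to mimic an area the main image cannot see.
main = [[10 * r + c for c in range(8)] for r in range(4)]
sub = [[main[r][c + 2] if c + 2 < 8 else 0 for c in range(8)] for r in range(4)]
sub[2][3] = 99

disparity, residual = encode_block(main, sub, top=1, left=2, size=2, search=2)
restored = decode_block(main, 1, 2, 2, disparity, residual)
```

With this data the search settles on the disparity vector `(0, 2)`, and the residual is zero everywhere except at the altered pixel. That pixel is the part of the sub-image that the main image and the disparity vector alone could not restore.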
In 1996, a 3D image encoding method called the Multi-view Profile (ISO/IEC 13818-2/AMD3) was added to MPEG-2 Video (ISO/IEC 13818-2), which is the international standard for encoding a single-view image. The MPEG-2 Video Multi-view Profile is a two-layer encoding method in which the image for the left eye is encoded in the base layer and the image for the right eye is encoded in the enhancement layer. An image is compression-encoded by using disparity-compensated prediction, which exploits inter-view redundancy, in addition to motion-compensated prediction, which exploits temporal redundancy, and the discrete cosine transform, which exploits spatial redundancy.
A technique has also been proposed in which the amount of information of multi-view images taken by three or more cameras is suppressed by using motion-compensated prediction and disparity-compensated prediction (see, for example, Patent Document 2). In the proposed high-efficiency image encoding method, the encoding efficiency is improved by performing pattern matching against reference pictures from multiple viewpoints and selecting the motion-compensated prediction image or disparity-compensated prediction image having the smallest error.
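The selection between a motion-compensated and a disparity-compensated prediction image can be sketched as follows. This is only an illustration under stated assumptions, not the method of Patent Document 2: the error measure (SAD), the exhaustive search, the labels `"motion"` and `"disparity"`, and the function names `sad_at`, `best_match` and `select_prediction` are all hypothetical.

```python
def sad_at(target, ref, top, left):
    """SAD between the target block and the same-sized block of ref at (top, left)."""
    return sum(
        abs(value - ref[top + r][left + c])
        for r, row in enumerate(target)
        for c, value in enumerate(row)
    )


def best_match(target, ref, top, left, search):
    """Pattern-match the target block against one reference picture."""
    h, w = len(target), len(target[0])
    best_cost, best_vec = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidate positions that fall outside the reference picture.
            if t < 0 or l < 0 or t + h > len(ref) or l + w > len(ref[0]):
                continue
            cost = sad_at(target, ref, t, l)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_cost, best_vec


def select_prediction(target, top, left, references, search=2):
    """Pick the reference picture and vector giving the smallest error.

    references is a list of (kind, picture) pairs, where kind is "motion" for
    a picture of the same viewpoint at another time and "disparity" for a
    picture of another viewpoint at the same time (labels are assumptions).
    """
    candidates = []
    for index, (kind, picture) in enumerate(references):
        cost, vec = best_match(target, picture, top, left, search)
        candidates.append((cost, index, kind, vec))
    cost, index, kind, vec = min(candidates)  # ties go to the earlier reference
    return {"kind": kind, "ref_index": index, "vector": vec, "cost": cost}


# Hypothetical data: the block to encode sits at row 1, column 2.
target = [[14, 15], [24, 25]]
previous_picture = [[10 * r + 2 * c for c in range(8)] for r in range(4)]
other_view = [[10 * r + c for c in range(8)] for r in range(4)]

choice = select_prediction(
    target, top=1, left=2,
    references=[("motion", previous_picture), ("disparity", other_view)],
)
```

In this example the picture from the other viewpoint contains an exact match two columns to the right, so `choice["kind"]` is `"disparity"`, with vector `(0, 2)` and an error of 0. The best match in the earlier picture of the same viewpoint has an error of 2.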
Work is also underway in the JVT (Joint Video Team) to standardize Multiview Video Coding (hereinafter referred to as MVC), in which AVC/H.264 (see Non-Patent Document 1) is extended to multi-view images (see Non-Patent Document 2). As with the aforementioned MPEG-2 Video Multi-view Profile, MVC improves encoding efficiency by adopting prediction between viewpoints.