1. Field of the Invention
The present invention relates to a multi-viewpoint image transmission method and a multi-viewpoint image display method. The invention also relates to an intermediate-viewpoint image generation method and parallax estimation method for multi-viewpoint images, and an apparatus for implementing the same.
2. Related Art of the Invention
Heretofore, a variety of stereovision systems have been proposed; among others, multi-ocular stereovision systems using multi-viewpoint images offer great potential as systems that enable stereoscopic moving images to be viewed simultaneously by a plurality of viewers without requiring special glasses. In the multi-ocular stereovision system, the more cameras and display apparatus used, the more natural becomes the motion parallax that the viewers can perceive, and the easier it becomes for a large number of viewers to view the images simultaneously. In actuality, however, because of such limiting factors as the size of the imaging system and the setting of camera optical axes, there is a limit to the number of cameras that can be used in practical situations. Furthermore, in the transmission and storage processes, it is desired to reduce the amount of information, which tends to increase in proportion to the number of cameras used.
If, at the display side, multi-ocular stereoscopic images can be displayed by generating middle-viewpoint images from binocular stereoscopic images, this would alleviate the load of the imaging system and achieve a reduction in the amount of information for transmission and storage. If one is to generate from a plurality of images with different viewpoints a middle-viewpoint image that should be visible from an arbitrary viewpoint intermediate between the different viewpoints, one needs to estimate depth by obtaining corresponding relationships of pixels between the images.
MPEG-1 and MPEG-2 have been proposed as image compression schemes for digital transmission of moving pictures. Work is also under way to transmit multi-viewpoint images by extending MPEG-2 (ISO/IEC13818-2/PDAM3). FIG. 28 is a diagram showing an outline of the MPEG-2 syntax. Transmission by MPEG-2 involves encoding and decoding image data having a hierarchical structure of Sequence, Group of Pictures (GOP), and Picture. According to ISO/IEC13818-2/PDAM3, it appears that the transmission of multi-viewpoint images is achieved by extending the GOP layer (though this is not clear, as it is not specifically stated).
FIG. 29 shows the temporal and spatial relationships of multi-viewpoint images to be transmitted. It is attempted here to increase the coding efficiency by using parallax compensation in addition to the motion compensation used in the conventional MPEG-2 scheme. Information on each camera (camera parameters such as camera position, orientation of camera optical axis, etc.) must be appended to the multi-viewpoint images for transmission. ISO/IEC13818-2/PDAM3 states that camera parameters are included in Pic.Extension (extension of the Picture layer) shown in FIG. 28 for transmission, but no specific descriptions of the camera parameters are given.
As for camera parameter descriptions, the position of the camera, the orientation of the camera optical axis, and the distance between the camera position and image plane are defined as the camera parameters in the OpenGL, a CG language (OpenGL Programming Guide, The Official Guide to Learning OpenGL, Release 1, Addison-Wesley Publishing Company, 1993).
FIG. 30 is a diagram for explaining the definitions of the camera parameters according to the OpenGL. In FIG. 30, A is the lens center, B is the center of the image plane (the imaging surface), and C is the intersection of the image's upper edge and the perpendicular dropped from B to the upper edge. The coordinates of A, B, and C are defined as (optical_center_X, optical_center_Y, optical_center_Z), (image_plane_center_X, image_plane_center_Y, image_plane_center_Z), and (image_plane_vertical_X, image_plane_vertical_Y, image_plane_vertical_Z), respectively.
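As an illustration of what the three points A, B, and C encode, the following minimal sketch derives the direction of the optical axis, the upward direction of the image, and the distance between the lens center and the image plane. The function name camera_geometry and the dictionary keys are hypothetical and are not part of the OpenGL or of ISO/IEC13818-2/PDAM3; the sketch assumes only that the three points are given in a common coordinate system.

```python
import math

def camera_geometry(lens_center, image_plane_center, image_plane_vertical):
    """Derive simple quantities from points A, B, C as described for FIG. 30.

    All names here are illustrative assumptions, not a standardized API.
    """
    # Vector from A (lens center) to B (image plane center): lies along the optical axis.
    ab = [b - a for a, b in zip(lens_center, image_plane_center)]
    # Vector from B to C: points from the image center toward the upper edge.
    bc = [c - b for b, c in zip(image_plane_center, image_plane_vertical)]
    ab_len = math.sqrt(sum(v * v for v in ab))
    bc_len = math.sqrt(sum(v * v for v in bc))
    if ab_len == 0 or bc_len == 0:
        raise ValueError("A, B, and C must be distinct points")
    return {
        "axis_unit": [v / ab_len for v in ab],   # unit vector along the optical axis
        "up_unit": [v / bc_len for v in bc],     # unit vector toward the upper edge
        "lens_to_image_plane": ab_len,           # distance between A and B
    }
```

For example, with A at the origin, B at (0, 0, 2), and C at (0, 1, 2), the sketch reports an optical axis along Z, an upward direction along Y, and a lens-to-image-plane distance of 2.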
Here, one can easily think of transmitting multi-viewpoint images by appending the camera parameter information defined by the OpenGL to Pic.Extension.
However, with the above-described prior art method, since there is no information concerning the nearest point and farthest point in the image (the nearest point and farthest point of the subject), the problem has been that it is not possible to determine the range of depth within which to work when producing a display that reduces eye strain (for example, by controlling parallax).
Furthermore, when displaying multi-viewpoint images, the viewing distance must be determined appropriately on the basis of such conditions as the view angle at the time of shooting, the size of the imaging surface, and the distance between the lens center and the imaging surface (this is done to prevent the displayed image from becoming unnatural because of too large a parallax, or conversely, from appearing flat because of a lack of stereoscopic effect). However, in the OpenGL, no definition is given of the size of the imaging surface (the physical size of the CCD), and moreover, the distance between the lens center and the image forming surface is assumed to be equal to the focal length of the lens. This has led to the problem that the display side has no way of knowing the view angle used at the time of image capturing, making it impossible to determine the appropriate view angle, i.e. the viewing distance at the time of display, and thus giving rise to the possibility that the resulting stereoscopic display image may look unnatural.
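The dependence of the view angle on the size of the imaging surface and on the lens-to-imaging-surface distance can be made concrete with a small sketch. It assumes a simple pinhole camera model, and it assumes that an appropriate viewing distance is one at which the display subtends the same angle as the view angle at capture; both are illustrative assumptions rather than a method stated in the text, and the function names are hypothetical.

```python
import math

def view_angle(imaging_surface_width, lens_to_imaging_surface):
    """Horizontal view angle in radians under a pinhole model (assumption)."""
    return 2.0 * math.atan(imaging_surface_width / (2.0 * lens_to_imaging_surface))

def viewing_distance(display_width, angle):
    """Distance at which a display of the given width subtends `angle` radians."""
    return display_width / (2.0 * math.tan(angle / 2.0))
```

Both inputs of view_angle must use the same unit, and viewing_distance returns a value in the unit of display_width. Without the size of the imaging surface, view_angle cannot be evaluated, which is the difficulty described above.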