In the fields of computer graphics (CG) and virtual reality (VR), techniques have long been actively studied for generating, by a computer, an image of an object viewed not only from the viewpoint position at which a camera is placed but also from a viewpoint position desired by a user.
For example, there are methods for displaying a three-dimensional image of an object, or for generating an image of an object viewed from a virtual viewpoint (hereinafter referred to as a virtual viewpoint image), using plural images of the object taken under different conditions.
As a method for displaying a three-dimensional image of an object, there is, for example, a method using a display having plural image display planes, such as a DFD (Depth-Fused 3-D) display. The DFD is a display in which plural image display planes are layered at intervals (for example, refer to document 1: Japanese Patent No. 3022558). DFDs can be roughly classified into a brightness modulation type and a transmission type.
To display an image of the object on the DFD, a two-dimensional image of the object is displayed on each image display plane. If the DFD is of the brightness modulation type, the brightness of each of the pixels that overlap when viewed from a predetermined viewpoint of an observer (reference viewpoint) is set in a ratio according to the shape of the object in the depth direction, and the pixels are displayed with that brightness. Accordingly, for a point on the object that is near the observer, the brightness of the pixel on the image display plane nearer to the observer becomes large, and for a point that is farther away, the brightness of the pixel on the display plane farther from the observer becomes large. As a result, the observer who observes the images displayed on the image display planes of the DFD sees a stereoscopic image (three-dimensional image) of the object.
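The brightness distribution described above can be illustrated with a minimal sketch for a two-plane DFD. The linear weighting by depth, the function name dfd_brightness_split, and the parameters z_front and z_rear are assumptions made for illustration only; the text above states only that the brightness is set "in a ratio according to the shape of the object in the depth direction".

```python
def dfd_brightness_split(brightness, depth, z_front, z_rear):
    """Split one pixel's brightness between a front and a rear display plane.

    Assumption (not from the source): the ratio varies linearly with the
    depth of the object point between the two planes.
    """
    if z_rear <= z_front:
        raise ValueError("z_rear must be greater than z_front")
    # Clamp the depth into the interval spanned by the two planes.
    d = min(max(depth, z_front), z_rear)
    # Weight of the rear plane grows as the point moves away from the observer.
    w_rear = (d - z_front) / (z_rear - z_front)
    w_front = 1.0 - w_rear
    return brightness * w_front, brightness * w_rear


# A point a quarter of the way from the front plane to the rear plane:
# most of the brightness goes to the front plane.
front, rear = dfd_brightness_split(200.0, 1.25, z_front=1.0, z_rear=2.0)
print(front, rear)  # 150.0 50.0
```

Under this assumed weighting, a point lying exactly on one plane is displayed on that plane only, and the two brightness values always sum to the original brightness of the pixel.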
When the DFD is of the transmission type, the transmittance of each of the pixels on each image display plane that overlap when viewed from a predetermined viewpoint of the observer (reference viewpoint) is set according to the shape of the object in the depth direction, and the pixels are displayed accordingly.
In addition to the display method using the DFD, another method for displaying a three-dimensional image of the object is to display, on a screen such as a liquid crystal display, two images having parallax corresponding to the interval between the right and left eyes of the observer.
When generating the images for displaying the three-dimensional image of the object, or the image of the object viewed from an arbitrary viewpoint, each image can be generated using the model if the three-dimensional shape of the object is known, for example because the object was generated by computer graphics. On the other hand, when the three-dimensional shape of the object is not known, it is necessary to obtain the three-dimensional shape of the object, namely, a geometrical model of the object, before generating each image.
Likewise, when generating the virtual viewpoint image using the plural images, it is necessary first to obtain the geometrical model of the object based on the plural images. The geometrical model of the object is represented as a set of basic figures called polygons or voxels, for example.
There are various methods for obtaining the geometrical model of the object based on the plural images, and many studies are being conducted under the name Shape from X in the field of computer vision. Among the Shape from X methods, the stereo method is a representative method for obtaining the model (for example, refer to document 2: Takeo Kanade et al.: “Virtualized Reality: Constructing Virtual Worlds from Real Scenes,” IEEE MultiMedia, Vol. 4, No. 1, pp. 34-37, 1997).
In the stereo method, the geometrical model of the object is obtained based on plural images of the object taken from different viewpoints.
In this method, the distance from the reference viewpoint for obtaining the model to each point on the object is calculated by triangulation after performing corresponding point matching, that is, after associating points (pixels) on the images with one another. However, the stereo method does not immediately yield the geometrical model of the object; it yields only a group of points on the surface of the object. Therefore, in order to obtain the geometrical model of the object, it is necessary to determine structural information indicating how the points included in the point group are connected and what surface they form (for example, refer to document 3: Katusi Ikeuchi “Model generation of real object using images”, Journal of the Robotics Society of Japan, Vol. 16, No. 6, pp. 763-766, 1998).
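As an illustration of the triangulation step, the following minimal sketch computes the distance to one matched point. It assumes the simplest configuration, a rectified pair of parallel cameras, for which the distance is Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity between the matched pixel positions. This configuration and all names in the code are assumptions for illustration; the text above does not specify the camera arrangement.

```python
def depth_from_disparity(focal_length_px, baseline, x_left, x_right):
    """Distance to a point seen at column x_left in the left image and
    column x_right in the right image of a rectified, parallel stereo pair.

    Assumption (not from the source): rectified parallel cameras, so that
    corresponding points lie on the same image row.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a visible point must have positive disparity")
    # Similar triangles: Z / baseline = focal_length / disparity.
    return focal_length_px * baseline / disparity


# One corresponding point pair found by matching (hypothetical values).
z = depth_from_disparity(focal_length_px=800.0, baseline=0.25,
                         x_left=420.0, x_right=400.0)
print(z)  # 10.0
```

Repeating this calculation for every matched pixel yields only a set of distances, that is, a point group; it does not by itself provide the connectivity between the points that a geometrical model requires.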
That is, in the method for obtaining the geometrical model of the object using the stereo method, the apparatus (computer) that generates the image must perform complicated processing such as fitting of the shape of the object and statistical processing. Therefore, high computing power is necessary.
As another method for obtaining the geometrical model of the object based on the plural images, there is a method that determines the region the object occupies in space based on the outline of the object in each image taken from plural viewpoints (hereinafter referred to as the Shape from Silhouette method) (for example, refer to document 4: Potmesil, M: “Generating Octree Models of 3D Objects from their Silhouettes in a Sequence of Images,” CVGIP 40, pp. 1-29, 1987).
In many cases, the geometrical model of the object obtained by the Shape from Silhouette method is represented as a set of small cubes called voxels. However, when the geometrical model of the object is represented by voxels, a large amount of data is required to represent the three-dimensional shape of the object. Therefore, high computing power is required to obtain the geometrical model of the object using the Shape from Silhouette method.
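The idea of determining the occupied region from outlines can be sketched as follows. The sketch assumes three orthographic views along the coordinate axes and silhouettes given as sets of pixel coordinates; these simplifications, and the names carve, sil_xy, sil_xz and sil_yz, are assumptions for illustration and are not the octree-based procedure of document 4.

```python
from itertools import product


def carve(n, sil_xy, sil_xz, sil_yz):
    """Return the voxels (x, y, z) of an n*n*n grid whose projections fall
    inside all three silhouettes.

    Assumption (not from the source): orthographic views along the z, y and
    x axes, with each silhouette given as a set of 2D pixel coordinates.
    """
    kept = set()
    # Every voxel of the grid is examined, so the work grows as n ** 3.
    for x, y, z in product(range(n), repeat=3):
        if (x, y) in sil_xy and (x, z) in sil_xz and (y, z) in sil_yz:
            kept.add((x, y, z))
    return kept


# A 2x2 square outline seen in each of the three views (hypothetical data).
square = {(a, b) for a in (1, 2) for b in (1, 2)}
voxels = carve(4, square, square, square)
print(len(voxels), "of", 4 ** 3, "voxels kept")  # 8 of 64 voxels kept
```

Because the number of voxels grows with the cube of the grid resolution, doubling the resolution along each axis multiplies the data to be examined and stored by eight, which illustrates why the voxel representation demands a large amount of data.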
Therefore, in recent years, instead of representing the geometrical model of the object using polygons or voxels as in the stereo method and the Shape from Silhouette method, a method has been proposed in which partial images of the object are texture-mapped to projection planes having a multi-layered structure so as to represent the three-dimensional shape of the object on the multi-layered planes (for example, refer to document 5: Jonathan Shade et al.: “Layered Depth Images,” SIGGRAPH98 Conference Proceedings, pp. 231-242, 1998, and document 6: Tago, Nitta, Inamura, Harashima, “Video-Based Rendering using dynamic layer representation”, Three-dimensional image conference 2001, pp. 33-36, 2001).
In this texture mapping, projection planes of a multi-layered structure are set, and each partial image (texture image) cut out from a taken image is mapped to the projection plane corresponding to the distance of the object appearing in that texture image, so as to obtain a stereoscopic visual effect. This method has the advantages that sufficiently high-speed processing can be performed even by the graphics hardware in a widely available personal computer, and that the data is easy to handle.
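The assignment of texture images to projection planes can be sketched as below. The nearest-plane rule, the function name assign_to_planes, and the example segment names and distances are all assumptions for illustration; the text above states only that each texture image is mapped to the projection plane corresponding to the distance of the object appearing in it.

```python
def assign_to_planes(segments, plane_depths):
    """Group texture segments by the projection plane nearest to them.

    segments: list of (name, distance) pairs, one per texture image.
    plane_depths: distances at which the projection planes are set.
    Assumption (not from the source): each segment goes to the plane whose
    distance is closest to the distance of the object in the segment.
    """
    layers = {depth: [] for depth in plane_depths}
    for name, distance in segments:
        nearest = min(plane_depths, key=lambda depth: abs(depth - distance))
        layers[nearest].append(name)
    return layers


# Hypothetical texture images cut out from a taken image.
segments = [("person", 1.2), ("tree", 2.6), ("building", 4.5)]
layers = assign_to_planes(segments, plane_depths=[1.0, 2.0, 4.0])
print(layers)  # {1.0: ['person'], 2.0: ['tree'], 4.0: ['building']}
```

In this sketch the tree at distance 2.6 is drawn on the plane at 2.0, so its true distance is lost; the wider the interval between the planes, the larger this error becomes.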
On the other hand, when the geometrical model of the object is represented using multi-layered planes based on texture mapping, the detailed shape of the object cannot be represented if the intervals at which the projection planes are set are too wide. Therefore, a technique has been proposed in which the rough shape is represented by the projection planes while the detailed shape is represented by adding, for each pixel of the texture image, a further value (depth value) to the color information of R (red), G (green) and B (blue). Document 5 proposes a method in which the positions of the pixels of each texture image are changed according to the depth value so as to represent detailed depths that cannot be fully represented by the multi-layered planes alone. Document 6 proposes a method in which the transmittance of each pixel is set according to the depth value so as to represent detailed depths that cannot be fully represented by the multi-layered planes alone.