1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a program, and more particularly to an image processing apparatus, an image processing method, and a program for generating a binocular parallax image corresponding to stereoscopic vision by performing image conversion on a two-dimensional image.
2. Description of the Related Art
Various apparatuses and methods for converting a two-dimensional image into a binocular parallax image corresponding to stereoscopic vision have been proposed in the related art. The binocular parallax image generated from the two-dimensional image includes a pair of images: a left eye image observed by the left eye and a right eye image observed by the right eye. When the binocular parallax image including this pair is displayed on a display apparatus capable of separating it into the left eye image and the right eye image and presenting them to the left eye and the right eye of an observer, respectively, the observer can perceive the images as a stereoscopic image.
Related art regarding the image generation and display processes described above includes the following.
For example, Japanese Unexamined Patent Application Publication No 8-30806 discloses an apparatus that shifts a left eye image and a right eye image in the horizontal direction by a predetermined amount with respect to a still image or an image with small motion, so that the image is recognized as if it floats up.
Furthermore, Japanese Unexamined Patent Application Publication No 10-51812 discloses a method that divides an image into a plurality of parallax calculation regions, calculates a pseudo-depth from a feature value of the image in each region, and horizontally shifts a left eye image and a right eye image in opposite directions based on the depth.
In addition, Japanese Unexamined Patent Application Publication No 2005-151534 discloses a method that calculates the feature value of the upper and lower portions of an image and adjusts a synthesis ratio of a plurality of scene structures representing depth information prepared in advance, thereby displaying an image through a combination of simple structures.
However, the above related art has the following problems.
According to the image conversion apparatus disclosed in Japanese Unexamined Patent Application Publication No 8-30806, the entire screen is simply shifted for a still image or an image with small motion, so the front-to-back relationship of subjects in the image cannot be represented.
According to the image conversion apparatus disclosed in Japanese Unexamined Patent Application Publication No 10-51812, the pseudo-depth is estimated from the feature value of the image. However, since the estimation is based on the assumption that the sharpness, luminance and saturation of a subject located at the front of the screen are high, the estimation is not always correct. Since erroneous retinal disparity is applied to a subject for which depth estimation has been erroneously performed, the subject may be perceived at an erroneous position.
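The failure mode described above can be illustrated with a minimal sketch. It is not the method of the cited publication; it assumes, purely for illustration, that the feature value is the mean luminance of each region and that a higher value is treated as nearer. The function name `pseudo_depth_by_region` and the sample values are hypothetical.

```python
def pseudo_depth_by_region(luma, region_size):
    """Return one pseudo-depth value per region of a scanline.

    Assumption for illustration only: the feature value is the mean
    luminance of the region, and a higher value is treated as nearer.
    """
    depths = []
    for start in range(0, len(luma), region_size):
        region = luma[start:start + region_size]
        depths.append(sum(region) / len(region))
    return depths


# Hypothetical scanline: a bright sky (actually far) followed by a
# dark-clothed person (actually near).
luma = [240, 250, 245, 235, 40, 50, 45, 35]
depths = pseudo_depth_by_region(luma, region_size=4)

# The bright sky receives the larger pseudo-depth, so it would be
# placed in front of the person, which is the erroneous case.
sky_depth, person_depth = depths
```

In this sketch the far subject is estimated as near simply because it is bright, so any disparity derived from these values would place it at an erroneous position.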
According to the image conversion apparatus disclosed in Japanese Unexamined Patent Application Publication No 2005-151534, since the structure of the image is fitted to a relatively simple finite structure, unnatural depth is prevented from occurring.
However, a problem common to all of the above-described related methods remains. That is, relatively large retinal disparity occurs in the generated binocular parallax image. The binocular parallax image is three-dimensionally displayed using a stereoscopic display apparatus. Generally, a stereoscopic display apparatus is used which allows a user to observe an image through special stereoscopic vision glasses. Such stereoscopic display apparatuses are classified into a passive glasses type, which separates the images observed by the two eyes through polarizing filters or color filters, an active glasses type, which temporally separates an image into left and right images through a liquid crystal shutter, and the like.
When a binocular parallax image with a large retinal disparity is viewed, a user wearing such stereoscopic vision glasses can perceive a stereoscopic effect due to the retinal disparity. However, when a user views the image after taking off the glasses, the image appears as a double image in which the left and right images are largely displaced from each other while overlapping, and generally cannot be observed as a normal two-dimensional image. That is, the image converted by the existing image conversion apparatus can be properly appreciated only when a user wears the glasses.
Furthermore, large retinal disparity is considered to contribute to observer fatigue. For example, according to Japanese Unexamined Patent Application Publication No 6-194602, when a left eye image and a right eye image are significantly shifted from each other, the control of the angle of convergence and the adjustment of the eye lens contradict each other, unlike when viewing the real world, and this contradiction results in fatigue in stereoscopic vision using binocular parallax.
In addition, as a factor common to all the above related methods, pixel shifting, which is the most widely used technique, is employed to generate the binocular parallax image corresponding to stereoscopic vision. However, when the binocular parallax image is generated through pixel shift, an area with no pixel information (i.e., an occlusion area) may occur.
The generation of an occlusion area when a left eye image and a right eye image are generated by pixel shifting will be described with reference to FIGS. 1A to 1D. FIGS. 1A to 1D illustrate an input image, depth information (a distance image), a right eye image and a left eye image, respectively.
The depth information (the distance image) of FIG. 1B is an image obtained by representing distance information of the input image of FIG. 1A as luminance. A high luminance area is a pixel part corresponding to a subject near the camera, and a low luminance area is a pixel part corresponding to a subject far from the camera.
The right eye image of FIG. 1C is generated by shifting a pixel part (a body area) at a close range of the input image of FIG. 1A in the left direction based on the depth information (the distance image) of FIG. 1B.
The left eye image of FIG. 1D is generated by shifting the pixel part (the body area) at a close range of the input image of FIG. 1A in the right direction based on the depth information (the distance image) of FIG. 1B.
As illustrated in FIGS. 1A to 1D, the area (i.e., the occlusion area) with no pixel information occurs in the right eye image of FIG. 1C and the left eye image of FIG. 1D which are generated through the above pixel shift process.
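The pixel shift process described above can be sketched on a single scanline as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited apparatus: the shift amount is assumed to be proportional to the depth luminance (0 for far, 255 for near), and the names `shift_scanline`, `max_shift` and `HOLE` as well as the sample values are hypothetical.

```python
HOLE = None  # marker for a position with no pixel information (occlusion area)


def shift_scanline(pixels, depth, max_shift, direction):
    """Shift each pixel horizontally in proportion to its depth value.

    pixels:    one scanline of the input image.
    depth:     per-pixel distance values, 0 (far) to 255 (near).
    max_shift: shift in pixels applied to the nearest subject (assumed).
    direction: -1 shifts near pixels left (right eye image),
               +1 shifts near pixels right (left eye image).
    """
    width = len(pixels)
    out = [HOLE] * width
    out_depth = [-1] * width  # depth of the pixel written at each position
    for x in range(width):
        shift = round(max_shift * depth[x] / 255)
        tx = x + direction * shift
        # A nearer pixel overwrites a farther one landing on the same position.
        if 0 <= tx < width and depth[x] > out_depth[tx]:
            out[tx] = pixels[x]
            out_depth[tx] = depth[x]
    return out


# Hypothetical scanline: background values 10..17, near subject 90 and 91.
pixels = [10, 11, 12, 90, 91, 15, 16, 17]
depth = [0, 0, 0, 255, 255, 0, 0, 0]

right_eye = shift_scanline(pixels, depth, max_shift=2, direction=-1)
left_eye = shift_scanline(pixels, depth, max_shift=2, direction=+1)
```

In both results the positions vacated by the near subject remain `HOLE`, because the input image holds no background pixels for them. These positions correspond to the occlusion area.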
For an occlusion area generated in either or both of the two images of the binocular parallax image, the corresponding pixel information does not exist in the input image, so it is necessary to perform a filling process using pixels existing in a (spatially) peripheral area. Japanese Unexamined Patent Application Publication No 2005-151534 discloses an example of an interpolation process using pixel information of a part corresponding to an input image. Furthermore, “Disocclusion Based On The Texture Statistics Of The Image Segmented By The Region Competition Algorithm”, coauthored by Yamada Kunio, Mochiduchi Kenji, Aizawa Kiyoharu and Saito Takahiro, The Journal Of The Institute of Image Information And Television Engineers, Vol. 56, No. 5, pp. 863 to 866 (2002.5), also discloses an example of an interpolation process. However, even when these interpolation processes are used, unnaturalness such as stretching of an image may occur in at least one of the two images of the binocular parallax image.
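The stretching mentioned above can be illustrated with a deliberately naive filling process. The sketch below is not the interpolation process of either cited reference; it assumes that each missing position is simply filled with the nearest valid pixel on its left (or on its right when none exists on the left). The function name `fill_holes` and the sample scanline are hypothetical.

```python
def fill_holes(row):
    """Fill positions marked None using neighboring pixels on the same line.

    Each hole takes the nearest valid pixel to its left; holes at the
    start of the line take the nearest valid pixel to their right.
    """
    out = list(row)
    last = None
    for x in range(len(out)):
        if out[x] is None:
            out[x] = last  # stays None if no valid pixel was seen yet
        else:
            last = out[x]
    nxt = None
    for x in range(len(out) - 1, -1, -1):
        if out[x] is None:
            out[x] = nxt
        else:
            nxt = out[x]
    return out


# Hypothetical right eye scanline with a two-pixel occlusion area
# next to a near subject (values 90 and 91).
right_eye = [10, 90, 91, None, None, 15, 16, 17]
filled = fill_holes(right_eye)
```

Here the edge pixel of the near subject (91) is repeated across the occlusion area, so the subject appears stretched. More elaborate interpolation reduces this effect but, as noted above, does not necessarily remove it.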
Moreover, according to the image conversion apparatuses disclosed in Japanese Unexamined Patent Application Publication No 10-51812 and 2005-151534, the pseudo-depth is estimated from an image. However, it is difficult to estimate detailed depth from a single image. For example, it is not easy to perform depth estimation for fine structures such as tree branches, electric wires or hair.
When the binocular parallax image is generated through pixel shift by using such depth information, such fine subjects receive parallax equivalent to that of their (spatially) peripheral area. Consequently, the fine subjects and the background cannot be given different depth effects, and it may not be possible to give the binocular parallax image a stereoscopic effect that matches the actual subject distance.
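As a rough illustration of this limitation, the following sketch models coarse depth estimation by averaging depth over fixed-size regions, which is an assumption made only for this example and not a description of the cited apparatuses. The function name `region_disparity` and the sample values are hypothetical.

```python
def region_disparity(depth, region_size, max_shift):
    """Assign every pixel the disparity derived from its region's mean depth."""
    disparity = []
    for start in range(0, len(depth), region_size):
        region = depth[start:start + region_size]
        mean_depth = sum(region) / len(region)
        shift = round(max_shift * mean_depth / 255)
        disparity.extend([shift] * len(region))
    return disparity


# Hypothetical scanline: a one-pixel-wide near wire (255) in front of
# a far background (0).
true_depth = [0, 0, 0, 255, 0, 0, 0, 0]

# Disparity that an exact per-pixel depth would give.
per_pixel = [round(4 * d / 255) for d in true_depth]

# Disparity obtained when depth is only known per region of 4 pixels.
per_region = region_disparity(true_depth, region_size=4, max_shift=4)
```

With per-pixel depth the wire would be shifted by 4 pixels and the background by 0, whereas with the coarse estimate the wire and its surrounding background share the same shift, so no depth difference between them is reproduced.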