1. Field of the Invention
The present invention relates to a stereoscopic image generation method and a stereoscopic image generation system for generating a stereoscopic image that allows the viewer of the image to perceive a stereoscopic effect due to parallax.
2. Description of the Related Art
In recent years, binocular parallax stereoscopic images, which allow viewers to perceive a stereoscopic effect by presenting different images to the left and right eyes, have come into wide use in movies, television, and other fields. A multi-view stereoscopic image technique, in which the image observable by the viewer changes according to the viewing angle so that the viewer perceives a stereoscopic effect, is also used in, for example, naked-eye stereoscopic devices. In addition, multi-view parallax stereoscopic images, in which the binocular parallax method and the multi-view method are combined, are being put to practical use. A parallax stereoscopic image is composed of a right-eye image presented to the right eye and a left-eye image presented to the left eye. The positions of the subjects in these images are shifted in a horizontal direction according to the binocular parallax of the human eyes to allow the viewer (observer) of the images to perceive a stereoscopic effect.
A conventional parallax stereoscopic image is generally generated by taking a right-eye image and a left-eye image simultaneously using two cameras arranged in a left-right direction. With this method, a right-eye image and a left-eye image with a parallax substantially similar to the binocular parallax of the human eyes can be directly obtained. Therefore, a natural stereoscopic image that does not cause the viewer to have an uncomfortable feeling can be generated.
However, with the method in which two cameras are used to take a right-eye image and a left-eye image, the two cameras must have the same specifications and be aligned correctly, and the images must be taken with the cameras perfectly synchronized with each other. Therefore, when the images are taken, specialized staff and a large number of specialized devices are required. This not only increases the photographing cost but also means that a large amount of time is required to set up and adjust the cameras and other devices.
A conventional multi-view stereoscopic image is generally generated by taking multi-view images simultaneously using a large number of cameras arranged at different viewpoints. However, the method in which a plurality of cameras are used to take multi-view images has a problem in that the plurality of cameras must have the same specifications and be aligned correctly and that the images must be taken with all the cameras synchronized with each other.
In particular, to generate a multi-view parallax stereoscopic image, two cameras must be provided for each of the different viewpoints so that images with parallax are taken. Therefore, such multi-view parallax stereoscopic images are far from widespread use, except for very specific purposes.
One technique proposed to address the above issues is to subject an image normally taken using a single camera to image processing to generate binocular parallax right-eye and left-eye images (see, for example, Japanese Patent Application Laid-Open No. 2002-123842). In this technique, first, depth information (a depth value) is set for each of pixels constituting an original image, and the horizontal positions of the pixels are changed according to the depth information to generate right-eye and left-eye images in which the positions of subjects in these images have been shifted according to binocular parallax.
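The pixel-shifting step described above can be illustrated with a minimal sketch. It is not the implementation of the cited publication. The function names (shift_row, make_stereo_pair), the max_shift parameter, the depth range of 0 to 1 (1 being nearest), and the convention that nearer pixels move to the right in the left-eye image and to the left in the right-eye image are all assumptions made for illustration.

```python
def shift_row(row, depth, max_shift, direction):
    """Shift each pixel of one image row horizontally according to its depth.

    row       -- list of pixel values
    depth     -- list of depth values in [0, 1]; 1 is assumed to be nearest
    max_shift -- hypothetical maximum displacement in pixels
    direction -- +1 to shift right, -1 to shift left
    Positions that receive no pixel are left as None.
    """
    width = len(row)
    out = [None] * width
    out_depth = [-1.0] * width  # depth of the pixel currently stored at each position
    for x in range(width):
        nx = x + direction * round(depth[x] * max_shift)
        # When two pixels land on the same position, keep the nearer one.
        if 0 <= nx < width and depth[x] > out_depth[nx]:
            out[nx] = row[x]
            out_depth[nx] = depth[x]
    return out


def make_stereo_pair(image, depth_map, max_shift=2):
    """Build left-eye and right-eye images from one original image (list of rows)."""
    left = [shift_row(r, d, max_shift, +1) for r, d in zip(image, depth_map)]
    right = [shift_row(r, d, max_shift, -1) for r, d in zip(image, depth_map)]
    return left, right


# One-row example: "F" marks a near subject, lowercase letters mark the background.
row = list("abFFcde")
depth = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
left, right = make_stereo_pair([row], [depth], max_shift=2)
```

In this example the subject pixels move by two positions in opposite directions in the two images, while the background pixels stay in place.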
With this technique, a stereoscopic image can be generated from a normal original image taken using a commonly used camera, and therefore photographing cost and photographing time can be reduced. In addition, stereoscopic images can be generated from existing movie and other contents, and general television programs can be converted to stereoscopic images and displayed on a television screen.
However, in the conventional method of generating a stereoscopic image from a normal original image, the value of the depth information varies across the boundary between a subject, for example a human, and a background, and this causes a problem in that a depth discontinuity occurs.
If such a depth discontinuity occurs, an unnatural stereoscopic effect is perceived, such as a so-called cardboard effect, in which only the distance between a human or the like and a background is emphasized so that the human itself appears flat. In addition, when the positions of the pixels in the right-eye and left-eye images are changed, the amounts of movement of the pixels contained in the human or the like differ greatly from those of the pixels contained in the background. Therefore, a large gap (loss) is formed in a part of the background that, in the original image, is covered by the human or the like.
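The relation between the size of the depth step and the size of the resulting gap can be shown with a small sketch. The function name count_gaps, the max_shift parameter, and the depth range of 0 to 1 are hypothetical and chosen only for illustration. The sketch counts the positions in one row that receive no pixel after the horizontal shift.

```python
def count_gaps(depth_row, max_shift, direction=1):
    """Count positions in one row that receive no pixel after shifting.

    depth_row -- list of depth values in [0, 1]; 1 is assumed to be nearest
    max_shift -- hypothetical maximum displacement in pixels
    direction -- +1 to shift right, -1 to shift left
    """
    width = len(depth_row)
    covered = set()
    for x, d in enumerate(depth_row):
        nx = x + direction * round(d * max_shift)
        if 0 <= nx < width:
            covered.add(nx)
    return width - len(covered)


def step_row(subject_depth):
    """Background (depth 0), a four-pixel subject, then background again."""
    return [0.0] * 4 + [subject_depth] * 4 + [0.0] * 4


gaps_flat = count_gaps(step_row(0.0), max_shift=4)   # no depth step
gaps_small = count_gaps(step_row(0.5), max_shift=4)  # moderate depth step
gaps_large = count_gaps(step_row(1.0), max_shift=4)  # large depth step
```

Under these assumptions the number of unfilled positions grows with the depth difference between the subject and the background, which corresponds to the gap described above.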
In some conventional methods, to avoid such a gap, blurring processing is performed on boundary portions, or the image of a human or the like or of a background is enlarged or deformed. However, such processing may instead cause the viewer to have an uncomfortable feeling due to the boundary portions where parallax is not given. Also, with such boundary processing, the quality of the stereoscopic image deteriorates disadvantageously. In addition, the blurring processing and the enlarging-deforming processing increase the operational load on the operator who performs such processing on the stereoscopic image using software. As a result, the amount of processing work of the operator becomes enormous when a multi-view or multi-view parallax stereoscopic image is generated from an original image.
In the conventional method, the original value of the hue, chroma, or brightness of each of the pixels constituting the original image (the chroma in Japanese Patent Application Laid-Open No. 2002-123842 above) is generally used as the depth information for each of the pixels. Therefore, the value of the depth information varies significantly across the boundary between a subject, such as a human, and a background. This causes a problem in that the depth discontinuity is likely to be emphasized.
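The following sketch illustrates why using a raw color attribute as depth information tends to produce a sharp depth step at a subject boundary. It is an illustration only, not the method of the cited publication: the function names are hypothetical, and the HSV hue, saturation, and value from Python's standard colorsys module are used here as stand-ins for hue, chroma, and brightness.

```python
import colorsys


def depth_from_color(row_rgb, channel="v"):
    """Use one raw HSV component of each pixel directly as its depth value.

    row_rgb -- list of (R, G, B) tuples with components in 0..255
    channel -- "h", "s", or "v" (stand-ins for hue, chroma, brightness)
    """
    index = {"h": 0, "s": 1, "v": 2}[channel]
    return [
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[index]
        for r, g, b in row_rgb
    ]


def max_depth_jump(depth_row):
    """Largest depth difference between horizontally adjacent pixels."""
    return max(abs(b - a) for a, b in zip(depth_row, depth_row[1:]))


# Hypothetical row: dark gray background with a bright red subject in the middle.
background = (40, 40, 40)
subject = (230, 60, 60)
row_rgb = [background] * 3 + [subject] * 3 + [background] * 3

depth_v = depth_from_color(row_rgb, "v")
jump = max_depth_jump(depth_v)                # step at the subject boundary
inside = max_depth_jump(depth_v[3:6])         # variation within the subject
```

In this example the depth value is constant inside the subject and changes abruptly at its boundary, so the subject is assigned a single flat depth that is separated from the background by one large step.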
Original images contain various elements such as the intention of the producer and a story. In such a case, it is important to emphasize an important subject to which the producer wants the viewers to pay close attention and to emphasize a focused region in an original image. Conversely, it is important to make adjustments such that unimportant regions and blurred regions are not emphasized. However, in the conventional method, depth information is routinely computed over the entire area of an original image, and the computed depth information is used as is. Therefore, one problem with the conventional method is that it is difficult to reflect the intention of the producer in the stereoscopic effect.