In recent years, digital still cameras and digital camcorders using a solid-state imaging device (hereinafter, referred to also simply as an “imaging device”) such as a Charge Coupled Device (CCD) image sensor and a Complementary Metal Oxide Semiconductor (CMOS) image sensor have achieved remarkably higher functions and higher performance. In particular, with the advance of the semiconductor manufacturing technologies, pixel structures in such solid-state imaging devices have been further miniaturized.
As a result, higher integration of pixels and driving circuits in solid-state imaging devices has been pursued. Consequently, within just a few years, the number of pixels in an imaging device has increased enormously, from about one million pixels to ten million pixels or more. Furthermore, the quality of captured images has also improved dramatically.
Meanwhile, flat-screen display apparatuses such as Liquid Crystal Displays (LCDs) and plasma displays save space and can display high-definition, high-contrast images. This trend toward higher image quality is expanding from two-dimensional (2D) images to 3D images. Recently, 3D display apparatuses, which display high-quality 3D images by using polarization eyeglasses or eyeglasses with a high-speed shutter, have been developed.
3D imaging apparatuses for generating high-quality 3D images or high-quality 3D video to be displayed by 3D display apparatuses have also been developed. One simple method of generating 3D images and displaying them on a 3D display apparatus is to capture an image or video with an imaging apparatus having two optical systems (two sets of a lens and an imaging device) located at two different positions. The images captured by the respective optical systems are provided as a left-eye image and a right-eye image to the 3D display apparatus. The 3D display apparatus displays the captured left-eye image and right-eye image by switching between them at high speed, so that a user wearing eyeglasses can perceive the images as a 3D image.
There is another method for generating a left-eye image and a right-eye image, by calculating depth information of a scene by an imaging system including a plurality of cameras, and using the depth information and texture information for the left-eye/right-eye image generation. There is still another method for generating a left-eye image and a right-eye image, by which depth information is calculated from a plurality of images captured by a single camera by varying geometric or optical conditions of a scene (such as a way of light exposure) or conditions of an optical system in an imaging apparatus (such as a diaphragm size).
One example of the above-described method using a plurality of cameras is a multi-baseline stereo method disclosed in Non-Patent Literature 1 by which a depth of each pixel is calculated by simultaneously using images captured by a plurality of cameras. It is known that this multi-baseline stereo method can estimate a scene depth with a higher accuracy than that of a general twin-lens stereo.
The following describes one example of the multi-baseline stereo method in the case where a left-eye image and a right-eye image (parallax images) are generated by using two cameras (a twin-lens stereo). A twin-lens stereo captures two images of a subject from different viewpoints by using two cameras, extracts feature points from the respective captured images, and determines a correspondence relationship between the feature points to find corresponding points. The distance between the found corresponding points is called a parallax. For example, regarding two images captured by the two cameras, if the coordinates (x, y) of the corresponding feature points are (5, 10) and (10, 10), respectively, the parallax is 5. Here, assuming that the two cameras are arranged in parallel to each other, and “d” represents the parallax, “f” represents the focal length of the two cameras, and “B” represents the distance (baseline) between the cameras, the distance Z from the cameras to the subject is calculated by the following Equation 1.
[Math. 1]
$$Z = -\frac{Bf}{d} \qquad \text{(Equation 1)}$$
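As a worked illustration of Equation 1, the following is a minimal Python sketch. The function name `depth_from_parallax`, the focal length of 1000 pixels, and the baseline of 0.1 m are hypothetical values chosen for illustration. The sketch also assumes a signed parallax d = x1 - x2, so that the example points (5, 10) and (10, 10) give d = -5 and the minus sign in Equation 1 yields a positive depth; the source does not state its sign convention.

```python
def depth_from_parallax(d, f, B):
    """Return the depth Z = -B*f/d of Equation 1.

    d: signed parallax in pixels (assumed here to be x1 - x2)
    f: focal length in pixels (hypothetical units)
    B: baseline between the two cameras, e.g. in metres
    """
    if d == 0:
        # Zero parallax corresponds to a point at infinity.
        raise ValueError("parallax must be non-zero")
    return -B * f / d


# Corresponding feature points from the text: (5, 10) and (10, 10).
x1, x2 = 5.0, 10.0
d = x1 - x2          # signed parallax under the assumed convention: -5.0
f = 1000.0           # hypothetical focal length in pixels
B = 0.1              # hypothetical baseline in metres

Z = depth_from_parallax(d, f, B)
print(f"parallax = {d}, depth Z = {Z} m")
```

Because Z is inversely proportional to d, a small error in the parallax of a distant point (small |d|) produces a large error in the estimated depth, which is one reason accurate corresponding-point search matters.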
If the distance between the two cameras is large, a feature point observed by one of the cameras may not be observed by the other camera. Even in such a case, the multi-baseline stereo method can use three or more cameras to reduce ambiguity in the corresponding point search, thereby reducing errors in parallax estimation.
Once a depth is determined, it is possible to generate a left-eye image and a right-eye image by using the depth information and a scene texture as disclosed in Non-Patent Literature 2, for example. According to the method disclosed in Non-Patent Literature 2, based on the estimated depth and the scene texture obtained by the imaging apparatus, it is possible to generate images which are virtually captured from virtual camera positions (a virtual left-eye camera position and a virtual right-eye camera position) as new viewpoints. Thereby, it is possible to generate images having viewpoints different from those in actual capturing.
The images having the new viewpoints can be generated by the following Equations 2. Here, the symbols are the same as those in Equation 1. “xc” represents the x-coordinate of the camera for which the depth is calculated, and “xl” and “xr” represent the x-coordinates of the respective cameras at the newly generated viewpoints. “xl” is the x-coordinate of the (virtual) left-eye camera, and “xr” is the x-coordinate of the (virtual) right-eye camera. “tx” represents the distance (baseline) between the virtual cameras.
[Math. 2]
$$x_l = x_c + \frac{t_x f}{2Z}, \qquad x_r = x_c - \frac{t_x f}{2Z} \qquad \text{(Equations 2)}$$
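Equations 2 can be illustrated with a short Python sketch. The function name `virtual_view_x` and all numeric values (focal length, virtual baseline, depth) are hypothetical and chosen only for illustration; this is not the procedure of Non-Patent Literature 2, only a direct evaluation of the two formulas for a single x-coordinate.

```python
def virtual_view_x(xc, Z, f, tx):
    """Evaluate Equations 2 for one x-coordinate.

    xc: x-coordinate in the camera for which the depth Z was calculated
    Z:  depth at that coordinate (same length unit as tx)
    f:  focal length in pixels (hypothetical units)
    tx: baseline between the virtual left-eye and right-eye cameras
    Returns (xl, xr), the x-coordinates in the two virtual views.
    """
    if Z == 0:
        raise ValueError("depth must be non-zero")
    shift = tx * f / (2.0 * Z)   # half of the parallax between the virtual views
    return xc + shift, xc - shift


# Hypothetical values: a point at x = 100 px, 20 m away,
# with f = 1000 px and a virtual baseline of 0.06 m.
xl, xr = virtual_view_x(xc=100.0, Z=20.0, f=1000.0, tx=0.06)
print(f"xl = {xl:.3f}, xr = {xr:.3f}, parallax = {xl - xr:.3f}")
```

The two virtual coordinates are placed symmetrically about xc, and their separation tx*f/Z shrinks as the depth Z grows, so nearby subjects receive a larger left/right shift than distant ones.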
As described above, if a depth is calculated by using a plurality of cameras, it is possible to generate a left-eye image and a right-eye image.
On the other hand, one example of the method in which conditions regarding a scene are varied to calculate a depth is the photometric stereo method disclosed in Non-Patent Literature 3. When a plurality of images generated by capturing a subject while varying the position of illumination are input, a 3D position of the subject is determined based on a 3D relationship between pixel values of the subject and the illumination positions. Furthermore, an example of the method of varying optical conditions of an imaging apparatus is the depth-from-defocus method disclosed in Non-Patent Literature 4. With this method, a distance (depth) from a camera to a subject can be calculated by using (a) the amount of change in blur at each pixel across a plurality of images captured while varying the focal distance of the camera, (b) the focal distance of the camera, and (c) the diaphragm size (opening size) of the camera. As described above, various methods for determining scene depth information have been researched. In particular, the depth-from-defocus method has the advantages of reducing the size and weight of an imaging apparatus and of not requiring other apparatuses such as an illumination apparatus.