Recent years have seen remarkable improvements in the functionality and performance of digital still cameras and digital video cameras that use solid-state imaging devices (hereinafter also referred to simply as “imaging devices”), such as charge coupled device (CCD) image sensors and complementary metal oxide semiconductor (CMOS) image sensors. In particular, with advances in semiconductor manufacturing technology, the pixel structures of solid-state imaging devices have become increasingly fine.
This has driven efforts to further integrate pixels and drive circuits in solid-state imaging devices. As a result, the number of pixels in an imaging device has increased significantly, from approximately one million to ten million or more, within a few years. Furthermore, the quality of captured images has improved dramatically.
Meanwhile, thin display apparatuses such as liquid crystal displays and plasma displays can now display high-definition, high-contrast images while taking up little space. This trend toward higher image quality is now extending from two-dimensional images to three-dimensional images, and three-dimensional display apparatuses that display high-quality three-dimensional images viewed through polarized glasses or high-speed shutter glasses have begun to be developed.
Development of three-dimensional imaging apparatuses for obtaining high-quality three-dimensional images or video to be displayed on such three-dimensional display apparatuses has also been advancing. A simple conceivable method of obtaining three-dimensional images for display on a three-dimensional display apparatus is to capture images or video with an imaging apparatus that includes two optical systems (each consisting of a lens and an imaging device) at different positions. The images captured through the respective optical systems are input to the three-dimensional display apparatus as a left-eye image and a right-eye image. When the three-dimensional display apparatus rapidly alternates between displaying the captured left-eye and right-eye images, a user wearing the glasses perceives the three-dimensional images stereoscopically.
One method generates a left-eye image and a right-eye image from texture information and depth information of a scene, calculated using an imaging system that includes a plurality of cameras. Another method generates a left-eye image and a right-eye image by calculating depth information from a plurality of images captured by a single camera under different scene conditions or different conditions of the optical system in the imaging apparatus.
The former approach includes the multi-baseline stereo method disclosed in Non Patent Literature (NPL) 1, in which images captured simultaneously by a number of cameras are used to determine the depth of each pixel. The multi-baseline stereo method is known to estimate scene depth more accurately than commonly used twin-lens stereo.
As an example, consider a method of generating a left-eye image and a right-eye image (disparity images) using two cameras (twin-lens stereo). In twin-lens stereo, two images are captured from different viewpoints using two cameras, feature points are extracted from each captured image, and correspondences between the features are determined to identify corresponding points. The distance between corresponding points identified in this way is referred to as the disparity. For example, when the coordinates (x, y) of corresponding feature points in two images captured with two cameras are (5, 10) and (10, 10), the disparity is 5. Assuming that the cameras are placed in parallel, the distance Z from the cameras to an object is given by (Expression 1), where d is the disparity, f is the focal length of the two cameras, and B is the distance between the cameras (i.e., the baseline).
[MATH. 1]

$$Z = -\frac{Bf}{d} \qquad \text{(Expression 1)}$$
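Expression 1 can be checked numerically with a small Python sketch. The function names, the sign convention d = x_first - x_second (assumed here so that the example's points give d = -5, whose magnitude is the disparity of 5 in the text, and Expression 1 yields a positive Z), and the values B = 0.1 m and f = 1000 px are illustrative assumptions, not taken from the source.

```python
def disparity(p_first, p_second):
    """Signed horizontal disparity between corresponding points (x, y).
    With parallel cameras, corresponding points lie on the same row, so only
    x matters. Convention (assumed): d = x_first - x_second."""
    (x1, _y1), (x2, _y2) = p_first, p_second
    return x1 - x2

def depth_from_disparity(d, baseline, focal_length):
    """Expression 1: Z = -B f / d. With the focal length in pixels and the
    baseline in metres, Z is returned in metres."""
    if d == 0:
        raise ValueError("zero disparity corresponds to a point at infinity")
    return -baseline * focal_length / d

# The example from the text, with assumed B = 0.1 m and f = 1000 px.
d = disparity((5, 10), (10, 10))  # -5; its magnitude 5 matches the text
z = depth_from_disparity(d, baseline=0.1, focal_length=1000.0)  # 20.0 m
```

Because Z is inversely proportional to d, a one-pixel error in the disparity costs far more depth accuracy for distant objects (small d) than for near ones.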
Increasing the distance between the two cameras can make a feature point observed by one camera unobservable by the other. Even in such a case, the multi-baseline stereo method reduces disparity estimation errors because it uses three or more cameras, which reduces ambiguity in the search for corresponding points.
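The core idea of multi-baseline stereo can be illustrated with a one-dimensional Python sketch: for each candidate inverse depth 1/Z, the disparity expected in each camera is proportional to that camera's baseline, and the matching costs (sums of squared differences, SSD) from all cameras are added before the minimum is taken. This is a simplified illustration under assumed conditions (synthetic random scanlines, integer disparities, a fronto-parallel scene, and a positive sign convention d = B f / Z whose magnitude agrees with Expression 1); the function names and parameters are hypothetical and this is not presented as the exact procedure of NPL 1.

```python
import random

def shifted(signal, shift):
    """Copy of `signal` moved right by an integer `shift`, clamping at the edges."""
    n = len(signal)
    return [signal[min(max(x - shift, 0), n - 1)] for x in range(n)]

def sssd(ref, others, baselines, f, x, zeta, half_window=3):
    """Sum over cameras of the SSD between a window around x in `ref` and the
    window displaced by d_i = round(B_i * f * zeta) in camera i (zeta = 1/Z)."""
    total = 0.0
    for img, b in zip(others, baselines):
        d = round(b * f * zeta)
        for k in range(-half_window, half_window + 1):
            xr, xi = x + k, x + k + d
            if not (0 <= xr < len(ref) and 0 <= xi < len(img)):
                return float("inf")  # window leaves the image: reject candidate
            total += (ref[xr] - img[xi]) ** 2
    return total

def estimate_inverse_depth(ref, others, baselines, f, x, candidates):
    """Return the candidate inverse depth that minimises the summed SSD."""
    return min(candidates, key=lambda z: sssd(ref, others, baselines, f, x, z))

# Synthetic scanlines (assumed): a random texture seen by three cameras with
# baselines 0.1, 0.2 and 0.3 m, f = 100 px, and a scene at Z = 5 m.
random.seed(0)
ref = [random.random() for _ in range(200)]
baselines = [0.1, 0.2, 0.3]
f, true_z = 100.0, 5.0
others = [shifted(ref, round(b * f / true_z)) for b in baselines]
candidates = [0.05 * i for i in range(11)]  # inverse depths 0 to 0.5 per metre
zeta = estimate_inverse_depth(ref, others, baselines, f, 100, candidates)
est_z = 1.0 / zeta  # estimated depth in metres
```

Searching over inverse depth rather than per-camera disparity is what lets the costs from different baselines be added: every camera is evaluated at the same hypothesised scene depth, so a spurious match in one pair is unlikely to coincide with spurious matches in the others.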
Once the depth is determined, a left-eye image and a right-eye image can be generated from the depth information and the scene texture, for example by the method disclosed in Non Patent Literature (NPL) 2. With that method, an image viewed from a new position, that is, from a virtual camera position (including a left-eye camera position and a right-eye camera position), can be generated using the estimated depth and the scene texture obtained from the imaging apparatus. This makes it possible to obtain an image from a viewpoint different from the one at which the image was captured.
The image from the new viewpoint can be generated using (Expression 2), which uses the same notation as (Expression 1). Here, xc is the x-coordinate of a point in the image of the camera from which the depth was determined, and xl and xr are the x-coordinates of that point in the cameras located at the newly generated viewpoint positions: xl in the left-eye camera (virtual camera) and xr in the right-eye camera (virtual camera). The distance between the virtual cameras (i.e., their baseline) is denoted by tx.
[MATH. 2]

$$x_l = x_c + \frac{t_x f}{2Z}, \qquad x_r = x_c - \frac{t_x f}{2Z} \qquad \text{(Expression 2)}$$
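Expression 2 can be applied pixel by pixel to warp a scanline of scene texture into the two virtual cameras. The Python sketch below is a minimal one-dimensional forward warp under stated assumptions: the function names are hypothetical, target positions are rounded to whole pixels, overlapping pixels are resolved with a simple depth buffer (the nearer point wins), and disocclusion holes are left as None rather than filled. It illustrates depth-based view synthesis in general and is not claimed to be the procedure of NPL 2.

```python
def virtual_views(xc, tx, f, z):
    """Expression 2: x-coordinates of a point in the left and right virtual
    cameras, given its coordinate xc in the source camera and its depth z."""
    shift = tx * f / (2.0 * z)
    return xc + shift, xc - shift

def warp_scanline(texture, depth, tx, f, eye):
    """Forward-warp one scanline to the 'left' or 'right' virtual camera.
    When two source pixels land on the same target, the nearer one (smaller z)
    is kept; targets that receive nothing stay None (disocclusion holes)."""
    n = len(texture)
    out = [None] * n
    zbuf = [float("inf")] * n
    for xc in range(n):
        xl, xr = virtual_views(xc, tx, f, depth[xc])
        xt = round(xl if eye == "left" else xr)
        if 0 <= xt < n and depth[xc] < zbuf[xt]:
            zbuf[xt] = depth[xc]
            out[xt] = texture[xc]
    return out

# Assumed values: tx = 0.2 m, f = 100 px, and a flat scene at Z = 10 m,
# which gives a shift of tx * f / (2Z) = 1 pixel in each direction.
texture = [10, 20, 30, 40, 50]
left = warp_scanline(texture, [10.0] * 5, 0.2, 100.0, "left")
right = warp_scanline(texture, [10.0] * 5, 0.2, 100.0, "right")
```

In practice the holes left at depth discontinuities have to be filled (for example by interpolation from neighbouring background pixels); that step is omitted here.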
Thus, it is possible to generate a left-eye image and a right-eye image by calculating the depth using a plurality of cameras.
Among the latter depth calculation methods, photometric stereo, disclosed in Non Patent Literature (NPL) 3, is a method that changes a scene-related condition. When a plurality of images of an object captured under lighting at different positions are input, the three-dimensional position of the object is determined from the three-dimensional relationship between the pixel values of the object and the positions of the lighting. As a method that changes an optical condition of the imaging apparatus, there is the depth from defocus method disclosed in Non Patent Literature (NPL) 4. In this method, the distance (depth) from the camera to the object can be determined from the amount of change in blur at each pixel across a plurality of images captured by a camera at different focal lengths, together with those focal lengths and the aperture size (opening size). Thus, various methods of obtaining three-dimensional information about a scene have long been studied.
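Both approaches in the paragraph above can be sketched in Python under textbook assumptions that go beyond the source. For photometric stereo, the sketch assumes a Lambertian surface, three distant light sources with known unit directions, and no shadows; in this classic formulation each pixel yields a surface normal and albedo, and depth would then be obtained by integrating the normals (not shown). For depth from defocus, it assumes an ideal thin lens, a fixed sensor distance, and blur-circle diameters that have already been measured from the images (the hard part, not shown), then picks the distance that best explains the blur under two focal lengths. All function names and numerical settings are hypothetical, and neither function is claimed to be the method of NPL 3 or NPL 4.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]  # replace column `col` with the right-hand side
        x.append(det3(m) / d)
    return x

def photometric_stereo_pixel(lights, intensities):
    """Lambertian model I_k = albedo * dot(L_k, n) (assumed): solving for
    g = albedo * n gives albedo = |g| and the unit normal n = g / |g|."""
    g = solve3(lights, intensities)
    albedo = math.sqrt(sum(v * v for v in g))
    return albedo, [v / albedo for v in g]

def blur_diameter(u, f, s, aperture):
    """Thin-lens blur-circle diameter for a point at distance u (u > f) when
    the sensor sits at distance s behind a lens of focal length f."""
    v = f * u / (u - f)  # in-focus image distance, from 1/f = 1/u + 1/v
    return aperture * abs(v - s) / v

def depth_from_defocus(b1, b2, settings, candidates):
    """Return the candidate distance whose predicted blur under two
    (f, s, aperture) settings best matches the measured diameters b1, b2."""
    (f1, s1, a1), (f2, s2, a2) = settings
    def mismatch(u):
        return ((blur_diameter(u, f1, s1, a1) - b1) ** 2
                + (blur_diameter(u, f2, s2, a2) - b2) ** 2)
    return min(candidates, key=mismatch)

# Photometric stereo: synthesise intensities for a known normal and albedo.
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
norm = math.sqrt(0.1 ** 2 + 0.2 ** 2 + 1.0)
n_true = [0.1 / norm, -0.2 / norm, 1.0 / norm]
I = [0.7 * sum(l * n for l, n in zip(L, n_true)) for L in lights]
albedo, normal = photometric_stereo_pixel(lights, I)  # about 0.7 and n_true

# Depth from defocus: focal lengths 50 mm and 49.5 mm, sensor at 50.5 mm,
# 25 mm aperture (all in metres), and a point at 3 m.
settings = [(0.050, 0.0505, 0.025), (0.0495, 0.0505, 0.025)]
b1, b2 = (blur_diameter(3.0, *st) for st in settings)
candidates = [1.0 + 0.01 * i for i in range(901)]  # 1 m to 10 m
u = depth_from_defocus(b1, b2, settings, candidates)  # about 3.0
```

A single blur measurement is ambiguous, because a point in front of the in-focus plane and one behind it can produce the same blur circle; combining measurements under a second focal length removes that ambiguity, which is why the sketch uses two settings.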
Left-eye and right-eye images generated from the depth information obtained by the above-described methods can be used to display three-dimensional images. In particular, three-dimensional display has recently become possible even on home-use liquid crystal displays and plasma displays. Furthermore, consumer products have also made it possible to capture and display three-dimensional images.
Capturing images or video with a three-dimensional imaging apparatus makes it possible to obtain depth information of an object. Thus, when images or video captured with a three-dimensional imaging apparatus are displayed on a three-dimensional display apparatus, it is possible to present images or video that look realistic and achieve a stereoscopic effect. However, photographers using consumer three-dimensional imaging apparatuses in particular lack the skill and know-how to capture images or video that achieve a stereoscopic effect. Images or video captured by such photographers barely achieve a stereoscopic effect and therefore offer little benefit as three-dimensional images.
Meanwhile, as processing related to the distance (depth) of a scene, methods of changing the resolution allocated to distance have been proposed (see NPLs 1 to 5, for example). The techniques disclosed in NPLs 1 to 5 reduce the amount of three-dimensional data by changing how resolution is allocated with respect to distance.
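As an illustration of allocating resolution non-uniformly over distance, the Python sketch below quantises depth into 8-bit codes spaced uniformly in inverse depth, so that near distances receive fine depth steps and far distances coarse ones. This allocation rule, the function names, and the 1 m to 100 m range are assumptions chosen for illustration; the source does not describe the specific allocation used in NPLs 1 to 5.

```python
def quantize_inverse_depth(z, z_near, z_far, levels=256):
    """Map a depth z in [z_near, z_far] to an integer code in [0, levels - 1].
    Codes are spaced uniformly in 1/z (an illustrative choice, not necessarily
    the allocation used by the cited works), so near distances get finer steps."""
    z = min(max(z, z_near), z_far)  # clamp to the representable range
    span = 1.0 / z_near - 1.0 / z_far
    t = (1.0 / z - 1.0 / z_far) / span  # 1.0 at z_near, 0.0 at z_far
    return round(t * (levels - 1))

def dequantize_inverse_depth(code, z_near, z_far, levels=256):
    """Inverse mapping: integer code back to a representative depth."""
    span = 1.0 / z_near - 1.0 / z_far
    t = code / (levels - 1)
    return 1.0 / (t * span + 1.0 / z_far)

# With 8-bit codes between 1 m and 100 m, adjacent codes are about 4 mm apart
# near 1 m but about 28 m apart near 100 m.
near_gap = dequantize_inverse_depth(254, 1.0, 100.0) - dequantize_inverse_depth(255, 1.0, 100.0)
far_gap = dequantize_inverse_depth(0, 1.0, 100.0) - dequantize_inverse_depth(1, 1.0, 100.0)
```

Spacing codes in inverse depth mirrors Expression 1: disparity is proportional to 1/Z, so equal code steps correspond roughly to equal disparity steps, which is where a stereo system's depth precision is actually available.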