1. Field of the Invention
The present invention relates to an image-capturing apparatus that synthesizes a virtual viewpoint image using images captured by a plurality of image-capturing units.
2. Description of the Related Art
Conventionally, the ranges of resolution, bokeh (blur), angle of view, viewpoint, and the like obtainable with a digital camera have been limited by the characteristics and arrangement of the optical system and image-capturing element of the camera used for capturing an image. It is more convenient for a user if images satisfying a variety of conditions can be acquired with a single apparatus, and there is a continuing demand for an image-capturing apparatus having such characteristics.
To meet this demand, there is a technique in which images captured by a plurality of cameras having different viewpoints are used to synthesize an image that appears to have been captured by a single virtual camera, so that an image overcoming the above limits of the individual cameras can be acquired. Such a synthesized image is called a virtual viewpoint image. Steven J. Gortler et al., “The lumigraph”, SIGGRAPH 96, pp. 43-52 (1996) (hereinafter referred to as “Steven”) discloses a method in which images captured by a plurality of cameras are used to synthesize an image that appears to have been captured by a camera having a virtual viewpoint. Further, A. Isaksen et al., “Dynamically Reparameterized Light Fields”, ACM SIGGRAPH, pp. 297-306 (2000) (hereinafter referred to as “Isaksen”) discloses a method in which images captured by a plurality of cameras whose focus is close to pan focus are used to synthesize an image that appears to have been captured by a camera having an aperture of a certain size.
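As a rough illustration of how images from several pan-focus cameras can emulate a camera with a finite aperture, the following Python sketch uses a simplified one-dimensional shift-and-add model. It is not the method of Isaksen itself. The pinhole model, the camera layout, and all names (`project`, `refocused_positions`, `blur_width`, `focal_px`, `focus_depth`) are assumptions introduced only for this example.

```python
def project(point_x, point_z, cam_x, focal_px):
    """Pinhole projection of a scene point onto a 1D image line.

    The camera sits at lateral position cam_x and looks along +z.
    """
    return focal_px * (point_x - cam_x) / point_z


def refocused_positions(point_x, point_z, cam_xs, focal_px, focus_depth):
    """Image position of one scene point in each camera after the
    per-camera shift that aligns the plane at focus_depth."""
    return [
        project(point_x, point_z, c, focal_px) + focal_px * c / focus_depth
        for c in cam_xs
    ]


def blur_width(positions):
    """Spread of the shifted positions; averaging the shifted images
    smears the point over roughly this many pixels."""
    return max(positions) - min(positions)


# Hypothetical setup: five cameras spanning a 0.4-unit baseline.
cam_xs = [-0.2, -0.1, 0.0, 0.1, 0.2]
focal_px = 500.0
focus_depth = 2.0

# A point on the chosen focal plane lands at the same pixel in every
# shifted image, so the averaged image shows it sharply.
in_focus = refocused_positions(0.3, 2.0, cam_xs, focal_px, focus_depth)

# A point at another depth lands at a different pixel in each shifted
# image, so the averaged image shows it blurred.
out_of_focus = refocused_positions(0.3, 4.0, cam_xs, focal_px, focus_depth)

print("in-focus spread (px):", blur_width(in_focus))
print("out-of-focus spread (px):", blur_width(out_of_focus))
```

In this model the spread of the out-of-focus point grows with the camera baseline, which plays the role of the aperture size of the virtual camera.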
Meanwhile, in a motion picture capturing apparatus such as a video camera, the frame rate of the obtainable video is also an important factor characterizing the image-capturing apparatus. Wilburn et al., “High-Speed Videography Using a Dense Camera Array”, CVPR'04 (hereinafter referred to as “Wilburn”) discloses a technique in which image-capturing is performed by a plurality of cameras with shifted timing and, on the assumption that the subject lies on a plane at a fixed distance from the cameras, a geometric transform is applied to synthesize a video having a frame rate exceeding the performance of each individual camera.
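The effect of shifting the capture timing can be sketched as follows. This is a minimal illustration of the timing arithmetic only, assuming evenly spaced trigger offsets; it omits the geometric transform entirely, and the function names and the example values (four cameras at 30 frames per second) are hypothetical rather than taken from Wilburn.

```python
from fractions import Fraction


def staggered_timestamps(num_cameras, camera_fps, frames_per_camera):
    """Return (capture_time, camera_index) pairs in time order.

    Camera k starts k / (num_cameras * camera_fps) seconds late, so the
    exposures of the cameras interleave evenly.
    """
    period = Fraction(1, camera_fps)
    offset = period / num_cameras
    stamps = [
        (k * offset + n * period, k)
        for k in range(num_cameras)
        for n in range(frames_per_camera)
    ]
    return sorted(stamps)


def effective_fps(stamps):
    """Frame rate of the merged sequence, assuming uniform spacing."""
    intervals = {b[0] - a[0] for a, b in zip(stamps, stamps[1:])}
    if len(intervals) != 1:
        raise ValueError("merged frames are not uniformly spaced")
    return 1 / intervals.pop()


# Hypothetical example: four cameras, each limited to 30 frames per second.
stamps = staggered_timestamps(num_cameras=4, camera_fps=30, frames_per_camera=3)
print("camera order:", [cam for _, cam in stamps])
print("effective frame rate:", effective_fps(stamps))
```

Exact fractions are used instead of floating-point numbers so that the uniform-spacing check is not disturbed by rounding.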
It is conceivable that, in a motion picture capturing system that generates a virtual viewpoint image using a plurality of cameras based on techniques such as those of Steven and Isaksen, image-capturing is performed with shifted timing as in the technique of Wilburn, thereby improving frame rate performance. However, techniques such as those of Steven and Isaksen assume that the position and orientation of each camera are known. For this reason, when image-capturing is performed with shifted timing as in the technique of Wilburn using handheld cameras or the like whose positions and orientations vary, the relationship in position and orientation between the cameras needs to be estimated. This relationship can be estimated by calculating a fundamental matrix in projective geometry; however, the estimation accuracy deteriorates due to errors in corresponding point detection between images or due to the presence of a moving subject.
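The reason a moving subject degrades the estimation can be illustrated with the epipolar constraint, under which a pair of corresponding points x1 and x2 satisfies x2^T F x1 = 0. The following Python sketch is only an illustration under stated assumptions: the intrinsic parameters are taken to be the identity (so the fundamental matrix equals the essential matrix [t]x R), and the relative pose, the scene points, and all helper names are hypothetical.

```python
import math


def cross_matrix(t):
    """Skew-symmetric matrix [t]x such that [t]x v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(point):
    """Normalized homogeneous image coordinates of a 3D point."""
    x, y, z = point
    return [x / z, y / z, 1.0]


def epipolar_residual(F, x1, x2):
    """Value of x2^T F x1, which is zero for a consistent static point."""
    Fx1 = matvec(F, x1)
    return sum(x2[i] * Fx1[i] for i in range(3))


# Assumed relative pose: a point X1 in camera 1 maps to R X1 + t in camera 2.
R = rotation_y(0.1)
t = [0.5, 0.0, 0.1]
F = matmul(cross_matrix(t), R)  # identity intrinsics assumed


def to_camera2(point):
    return [a + b for a, b in zip(matvec(R, point), t)]


# Static subject: same 3D position at both exposure times.
P = [0.4, -0.2, 3.0]
x1 = project(P)
static_residual = epipolar_residual(F, x1, project(to_camera2(P)))

# Moving subject: it has shifted before camera 2 exposes (shifted timing).
P_moved = [0.4, 0.1, 3.0]
moving_residual = epipolar_residual(F, x1, project(to_camera2(P_moved)))

print("static residual:", static_residual)
print("moving residual:", moving_residual)
```

The static point satisfies the constraint up to rounding error, whereas the moved point does not. When the fundamental matrix is estimated from such correspondences, points on a moving subject therefore act as outliers and pull the estimate away from the true camera geometry.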