Digital cameras are becoming increasingly popular and, as a result, there is a demand for image processing software that allows photographers to edit digital images. In many instances, it is difficult or impossible for a photographer to capture an entire desired scene within a single digital image while retaining the desired quality and zoom. As a result, photographers are often required to take a series of overlapping images of a scene and then stitch the overlapping images together to form a panorama.
Many techniques for creating a panorama from a series of overlapping images have been considered. For example, U.S. Pat. No. 6,549,651 to Xiong et al. discloses a method of aligning images through projective registration and subsequent calibration. Overlapping images are registered projectively using a gradient-based optimization method in combination with a correlation-based linear search. Registration is performed by comparing overlapping areas between overlapping images at certain predetermined resolution levels on a Gaussian pyramid representing the overlapping images. Different combinations of overlap are tried to achieve an optimal overlap that generally minimizes the average brightness and contrast difference with respect to certain transformation parameters. After local registration, a global optimization is used to remove inconsistencies. During the global optimization phase, errors determined from cyclically overlapping sets of images are minimized across all image pairs. Prior to registration, internal and external camera parameters are input into the transformation matrices either by a computer assuming default values, or manually with user input.
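The coarse-to-fine idea behind pyramid-based registration can be illustrated with a minimal sketch. It is not the method of the cited patent: it works on a single row of brightness values, uses 2x averaging as a stand-in for Gaussian filtering, and replaces the gradient-based optimization with a plain linear search. All function names and parameters are hypothetical.

```python
import math


def bump_row(center, length=128, width=6.0):
    """Synthetic brightness row with one smooth feature (illustrative data only)."""
    return [math.exp(-((i - center) ** 2) / (2.0 * width ** 2)) for i in range(length)]


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs (box filter)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def mean_sq_diff(ref, moving, shift):
    """Mean squared brightness difference where moving[i] overlaps ref[i + shift]."""
    pairs = [(ref[i + shift], moving[i])
             for i in range(len(moving)) if 0 <= i + shift < len(ref)]
    if not pairs:
        return float("inf")
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def coarse_to_fine_shift(ref, moving, levels=3, coarse_range=4):
    """Estimate the shift that aligns `moving` with `ref`, coarsest level first."""
    pyramid = [(ref, moving)]
    for _ in range(levels - 1):
        r, m = pyramid[-1]
        pyramid.append((downsample(r), downsample(m)))

    # Exhaustive linear search at the coarsest level, where it is cheap.
    r, m = pyramid[-1]
    shift = min(range(-coarse_range, coarse_range + 1),
                key=lambda s: mean_sq_diff(r, m, s))

    # At each finer level, double the estimate and search one sample either side.
    for r, m in reversed(pyramid[:-1]):
        shift = min(range(2 * shift - 1, 2 * shift + 2),
                    key=lambda s: mean_sq_diff(r, m, s))
    return shift
```

Searching the full range only at the lowest resolution keeps the number of comparisons small; each finer level only corrects the doubled estimate by at most one sample.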
It is often desirable to obtain a three-dimensional model from a set of two-dimensional images that form a panorama. Such a model is useful where a 360-degree panorama must be mapped onto a single two-dimensional image space. Simple image stitching in this instance is not possible because some of the images will have been captured behind the chosen point of view of the two-dimensional image. As such, rotational and image focal length information is desirable. However, such three-dimensional information is lost when a single image is captured in two dimensions.
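To see why rotation and focal length are both needed, consider the standard pinhole relationship for a camera that only rotates: a pixel is back-projected to a ray using the focal length, the ray is rotated, and the result is projected again. The sketch below assumes a pure pan about the vertical axis, pixel coordinates measured from the image centre, and the same focal length for both views; the function name is hypothetical.

```python
import math


def reproject_pan(x, y, focal, pan_rad):
    """Map pixel (x, y) into a view rotated by pan_rad about the vertical axis.

    Returns None when the rotated ray points behind the target view, in which
    case the pixel has no position in that view's image plane.
    """
    c, s = math.cos(pan_rad), math.sin(pan_rad)
    # Back-project to the ray (x, y, focal) and rotate it about the y axis.
    rx = c * x + s * focal
    ry = y
    rz = -s * x + c * focal
    if rz <= 0.0:
        return None
    # Perspective projection back onto the image plane at distance `focal`.
    return (focal * rx / rz, focal * ry / rz)
```

With a focal length of 500 pixels, the image centre panned by 45 degrees lands about 500 pixels from the centre of the target view, while a pan of 180 degrees yields no valid position at all.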
Approaches to retaining or extracting three-dimensional information for such purposes have been proposed. For example, U.S. Pat. No. 6,157,747 to Szeliski et al. discloses a method of aligning a set of overlapping images to create a mosaic. A difference error between a first image and an adjacent image is determined, and an incremental rotation of the adjacent image relative to a three-dimensional coordinate system is computed through an incremental angle which tends to reduce the difference error. The focal length for each image is first estimated by deducing the value from one or more perspective transforms computed using an eight-parameter method. A three-parameter rotational model is employed in order to directly estimate the three-dimensional rotation matrix. Once the three-parameter rotational model has been used to estimate the 3D rotation matrices, a global optimization step is performed. The global optimization uses a patch-based alignment whereby patches of each image in a pair are determined to be matches and the distances between all identified pairs in the set are simultaneously minimized.
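One well-known way to deduce a focal length from an eight-parameter perspective transform relies on the fact that, for a purely rotating camera, the transform has the form M = V1 R V0^-1 with V = diag(f, f, 1). Because the first two rows of the rotation R have equal length and are orthogonal, the focal length of the first image can be solved for directly. The following is a sketch under those assumptions (square pixels, principal point at the image centre) and is not claimed to be the exact procedure of the cited patent; the function name is hypothetical.

```python
import math


def focal_from_homography(m, eps=1e-12):
    """Estimate the first image's focal length from a 3x3 transform M = V1 R V0^-1.

    `m` is a list of three rows. Returns None when the transform does not
    constrain the focal length (for example, a pure translation in the image).
    """
    (m0, m1, m2), (m3, m4, m5) = m[0], m[1]

    # Equal row norms of R: f^2 (m0^2 + m1^2) + m2^2 = f^2 (m3^2 + m4^2) + m5^2
    denom = m0 * m0 + m1 * m1 - m3 * m3 - m4 * m4
    if abs(denom) > eps:
        f_sq = (m5 * m5 - m2 * m2) / denom
    else:
        # Fall back to row orthogonality: f^2 (m0 m3 + m1 m4) + m2 m5 = 0
        denom = m0 * m3 + m1 * m4
        if abs(denom) <= eps:
            return None
        f_sq = -m2 * m5 / denom

    return math.sqrt(f_sq) if f_sq > 0.0 else None
```

Both expressions are ratios of quadratic terms in the entries of M, so the estimate does not depend on the overall scale of the transform.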
U.S. Pat. No. 5,920,657 to Bender et al. discloses a method of creating a high resolution still image using a number of images of a scene having different focal lengths. Camera motion is recovered by modeling the change between successive image frames due to camera zoom as a velocity of portions of the image in the horizontal, vertical and scale directions. Once the velocity between image frames for matching image portions is determined in each of these three directions, the value for any pixel in any image frame can be warped to a corresponding location in an image frame of a different focal length. The method is directed largely to warping images of the same scene to different scales, but is said to be applicable for creating a single panorama still from a series of “pan and jib” shots. In this situation, it is assumed that each image frame is produced at the same focal length.
U.S. Pat. No. 6,249,616 to Hashimoto discloses a method for combining images. Three-dimensional relationships between images are determined and an output image is created by combining the input images in accordance with the determined 3D relationships. Overlap is estimated by cross-correlation in the spatial or frequency domain, or by corresponding Laplacian pyramid levels of the input images. The translation parameters found during determination of overlap are converted into rotational parameters using an initial estimate of focal length. The rotational parameter equations are solved at each level of a Gaussian pyramid for each pair of overlapping images by determining the least squared difference between the intensity data of pixels of the overlapping portions in a remapped input image and the target image. The transformations are made globally consistent by treating the pairwise transformations as estimates and characterizing the distribution of the estimates as a covariance matrix. The covariance matrix is then used as a measure of the relative confidence in the estimate of each parameter value, and is used to adjust the estimates so that high-confidence estimates change less than low-confidence estimates.
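The effect of confidence-weighted adjustment can be shown with a deliberately reduced example. Instead of a full covariance matrix over all transformation parameters, the sketch below assumes a single pan angle per image pair with an independent variance, and a closed loop of images whose pan angles must sum to a known total. It illustrates the general least-squares principle only, not the cited patent's procedure; the function name and parameters are hypothetical.

```python
def close_loop(angles, variances, target=360.0):
    """Adjust pairwise pan estimates so that they sum to `target` degrees.

    The closure error is shared out in proportion to each estimate's variance,
    which is the least-squares solution for independent estimates: values with
    low variance (high confidence) are changed the least.
    """
    if len(angles) != len(variances):
        raise ValueError("angles and variances must have the same length")
    total_var = sum(variances)
    if total_var <= 0.0:
        raise ValueError("at least one variance must be positive")
    residual = sum(angles) - target
    return [a - residual * v / total_var for a, v in zip(angles, variances)]
```

For example, pairwise estimates of 120, 121 and 122 degrees overshoot a full circle by 3 degrees; if the third estimate has twice the variance of the other two, it absorbs half of the correction and each of the others absorbs a quarter.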
Although the above references disclose techniques for estimating camera position information from two-dimensional images, improvements are desired. It is therefore an object of the present invention to provide a novel method, apparatus and computer program for estimating a three-dimensional model and thereby camera position from a set of two-dimensional images that combine to form a panorama.