1. Field of the Invention
This invention relates to computer systems, specifically to computer-aided image processing, and more specifically to the merging of images to form a composite image.
2. Description of the Related Art
Image capture devices, such as cameras, may be used to capture an image of a section of a view or scene, such as a section of the front of a house. The section of the view or scene whose image is captured by a camera is known as the field of view of the camera. Adjusting a lens associated with a camera may increase the field of view. However, there is a limit beyond which the field of view of the camera cannot be increased without compromising the quality, or “resolution”, of the captured image. Further, some scenes or views may be too large to capture as one image with a given camera at any setting. Thus, it is sometimes necessary to capture an image of a view that is larger than can be captured within the field of view of a camera. In these instances, multiple overlapping images of segments of the view or scene may be taken, and then these component images may be joined together, or merged, to form a composite image.
One type of composite image is known as a panoramic image. A panoramic image may have a leftmost and a rightmost image that each overlap only one other image; alternatively, the images may span a full 360°, in which case every image overlaps at least two other images. In the simplest type of panoramic image, there is one row of images, with each image overlapping at most two other images. However, more complex composite images may be captured that have two or more rows of images; in these composite images, each image may potentially overlap more than two other images. For example, a motorized camera may be configured to scan a scene according to an M×N grid, capturing an image at each position in the grid. Other geometries of composite images may be captured.
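The overlap structures described above can be made concrete by listing which image pairs overlap. This is a minimal sketch that assumes only horizontally or vertically adjacent grid positions overlap (real captures may also overlap diagonally). The function names `grid_overlap_pairs`, `ring_overlap_pairs`, and `overlap_counts` are hypothetical and exist only for illustration.

```python
def grid_overlap_pairs(m, n):
    """Return index pairs of adjacent cells in an m-by-n capture grid.

    Assumption: only 4-connected neighbors (left/right, up/down) overlap.
    Images are numbered row by row, so cell (r, c) has index r * n + c.
    """
    pairs = []
    for r in range(m):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                pairs.append((i, i + 1))  # neighbor to the right
            if r + 1 < m:
                pairs.append((i, i + n))  # neighbor in the next row
    return pairs


def ring_overlap_pairs(k):
    """Return adjacent pairs for a single-row 360-degree panorama of k >= 3 images."""
    return [(i, (i + 1) % k) for i in range(k)]  # last image wraps to the first


def overlap_counts(num_images, pairs):
    """Count how many other images each image overlaps."""
    counts = [0] * num_images
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return counts
```

For a single row of three images (`grid_overlap_pairs(1, 3)`), the end images overlap one neighbor each; in a 360° ring every image overlaps exactly two; and interior cells of a 3×4 grid overlap four.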
Computer programs and algorithms exist for assembling a single composite image from multiple potentially overlapping component images. A general paradigm for automatic image stitching techniques is to first detect features in individual images; second, to establish feature correspondences and geometric relationships between pairs of images (pairwise stage); and third, to use the feature correspondences and geometric relationships between pairs of images found at the pairwise stage to infer the geometric relationship among all the images (multi-image stage).
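The second step, establishing feature correspondences between a pair of images, can be sketched with a brute-force nearest-neighbor search over feature descriptors. The acceptance rule is a swapped-in, commonly used technique (the nearest-neighbor distance-ratio test popularized with SIFT), not something the text above specifies. The name `match_features` and the `ratio=0.8` default are assumptions.

```python
import numpy as np


def match_features(desc_a, desc_b, ratio=0.8):
    """Match descriptors of image A to image B (hypothetical helper).

    desc_a: (Na, D) array, desc_b: (Nb, D) array with Nb >= 2.
    Returns a list of (i, j) pairs: descriptor i in A matches j in B.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Euclidean distance between every descriptor in A and every one in B.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = row[order[0]], row[order[1]]
        # Keep the nearest neighbor only if it is clearly closer than the
        # runner-up; ambiguous matches are likely to be outliers.
        if best < ratio * second:
            matches.append((i, int(order[0])))
    return matches
```

The ratio test discards descriptors that are equally close to several candidates. Matches that survive it can still be geometrically inconsistent, which is why a robust estimator is applied afterwards.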
Image stitching is thus a technique for combining images to create an image with a large field of view. Feature-based stitching techniques use point-correspondences, instead of image pixels directly, to estimate the geometric transformations between images. Intensity-based stitching techniques, an alternative, use image pixels to infer the geometric transformations. Many image stitching implementations assume that images are related either by 2D projective transformations or by 3D rotations. However, some deformations in images, for instance lens distortions, are not captured by either of these models.
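The two common motion models mentioned above can be written as 3×3 matrices acting on homogeneous image coordinates. A 2D projective transformation is a general homography H. A pure 3D rotation R with focal lengths f yields the constrained homography H = K_dst R K_src⁻¹, where K = diag(f, f, 1). This sketch assumes the principal point is at the image origin, square pixels, and no skew. The helper names are hypothetical.

```python
import numpy as np


def apply_homography(H, pts):
    """Map (N, 2) points through a 3x3 projective transformation H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w to return to 2D


def rotation_focal_homography(R, f_src, f_dst):
    """Homography induced by a pure camera rotation R ("rotation+focal lengths").

    Assumes principal point at the origin, square pixels, and no skew.
    """
    K_src = np.diag([f_src, f_src, 1.0])
    K_dst = np.diag([f_dst, f_dst, 1.0])
    return K_dst @ R @ np.linalg.inv(K_src)
```

A rotation of θ about the vertical axis moves the optical center to x = f·tan θ. Neither matrix can represent radially varying lens distortion, which is the limitation noted above.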
Panoramic image alignment is the problem of computing geometric relationships among a set of component images for the purpose of stitching the component images into a composite image. Feature-based techniques have been shown to be capable of handling large scene motions without initialization. Most feature-based methods proceed in two stages: pairwise alignment and multi-image alignment. The pairwise stage starts from feature (point) correspondences, which are obtained through a separate feature extraction and feature matching stage, and returns an estimate of the alignment parameters together with a set of point-correspondences that are consistent with those parameters. Various robust estimators or hypothesis testing frameworks may be used to handle outliers in the point-correspondences.
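To illustrate how a pairwise stage can turn point-correspondences into alignment parameters, here is a minimal sketch of the standard direct linear transform (DLT), which estimates a 3×3 homography from four or more correspondences. It is a common textbook estimator and not necessarily the method used here. The name `estimate_homography_dlt` is hypothetical. The sketch omits coordinate normalization, which is advisable for noisy real data.

```python
import numpy as np


def estimate_homography_dlt(src, dst):
    """Estimate H such that dst ~ H @ src from >= 4 point pairs (DLT).

    src, dst: (N, 2) arrays of corresponding points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 4 or len(src) != len(dst):
        raise ValueError("need at least 4 matching point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # H is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary projective scale
```

Least-squares estimators like this one are sensitive to outlying correspondences. In practice they are therefore wrapped in a robust estimator that also yields the set of consistent correspondences.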
The multi-image stage may use various techniques to further refine the alignment parameters, jointly over all the images, based on the consistent point-correspondences retained in the pairwise stage. It is known that the convergence of the multi-image stage depends on how good the initial guesses are. However, an equally important fact that is often overlooked is that the quality of the final result from the multi-image stage depends on the number of consistent point-correspondences retained in the pairwise stage. When the number of consistent point-correspondences is low, the multi-image alignment will still succeed, but the quality of the final result may be poor.
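One common way to obtain initial guesses for the multi-image stage is to chain pairwise results into a common reference frame. This minimal sketch composes homographies along a single row of images. The convention that `pairwise[i]` maps image i+1 into image i, and the name `chain_to_reference`, are assumptions for illustration.

```python
import numpy as np


def chain_to_reference(pairwise):
    """Compose pairwise homographies into per-image initial guesses.

    pairwise[i] maps points of image i+1 into image i (assumed convention).
    Returns a list whose entry k maps image k into the frame of image 0.
    """
    global_transforms = [np.eye(3)]  # image 0 is the reference frame
    for H in pairwise:
        G = global_transforms[-1] @ H  # image k -> image k-1 -> ... -> image 0
        global_transforms.append(G / G[2, 2])
    return global_transforms
```

Errors in each pairwise estimate compound along the chain, so these composed transforms serve only as a starting point that joint refinement over all images then improves.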
In the pairwise stage, it is commonly assumed that an imaging system satisfies an ideal pinhole model. As a result, many conventional methods estimate only 3×3 homographies or “rotation+focal lengths”. However, real imaging systems have some amount of lens distortion, and the wide-angle lenses commonly used for shooting panoramic images may introduce larger distortions than regular lenses. Modeling lens distortion is therefore critical for obtaining high-quality alignment. It may appear sufficient to model lens distortion only at the multi-image alignment stage. This strategy may work if all of the correct correspondences are retained at the pairwise stage. However, without modeling lens distortion at the pairwise stage, it may not be possible to retain all of them. Many of the correct correspondences that a model without lens distortion may reject lie close to image borders, because lens distortion effects are more pronounced for points close to image borders than for points close to image centers. For the same reason, correspondences with points close to image borders are more important for estimating lens distortion. Losing them at the pairwise stage makes it difficult for the multi-image stage to correctly estimate lens distortion. As a result, misalignment may show up when images are stitched together, particularly along the image borders. Therefore, it is important to estimate the lens distortion jointly with other alignment parameters at the pairwise stage.
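Why border points matter can be checked numerically with a common polynomial radial distortion model, x_d = x_u (1 + k1 r² + k2 r⁴), where r is the distance from the image center in normalized coordinates. This model and the name `apply_radial_distortion` are assumptions for illustration. The text above does not specify a particular distortion model.

```python
import numpy as np


def apply_radial_distortion(pts, k1, k2=0.0):
    """Distort normalized image points with a polynomial radial model.

    x_d = x_u * (1 + k1 * r^2 + k2 * r^4), with r measured from the center.
    A negative k1 pulls points inward (barrel distortion).
    """
    pts = np.asarray(pts, dtype=float)
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)  # squared radius per point
    return pts * (1.0 + k1 * r2 + k2 * r2 ** 2)
```

With k1 = -0.1, a point at radius 0.1 moves by 0.0001 while a point at radius 1.0 (near the border) moves by 0.1, a 1000-fold difference. A distortion-free model fit mostly to central points would therefore see large residuals at the borders and may reject those correct correspondences as outliers.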
RANSAC
RANSAC is an exemplary robust estimator or hypothesis testing framework. RANSAC is an abbreviation for “RANdom SAmple Consensus”. RANSAC provides a hypothesis testing framework that may be used, for example, to estimate parameters of a mathematical model from a set of observed data which contains outliers.
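The RANSAC hypothesize-and-test loop can be sketched on the simplest model, a 2D line y = a·x + b. It works as follows: repeatedly fit a model to a random minimal sample, count the observations consistent with it (the consensus set), keep the largest consensus set, and refit on it. In image alignment, the same loop would typically draw 4 correspondences and fit a homography instead. The name `ransac_line` and its parameter defaults are hypothetical.

```python
import numpy as np


def ransac_line(points, threshold=0.5, iterations=200, seed=0):
    """Fit y = a*x + b with RANSAC; return ((a, b), inlier_mask)."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(pts), dtype=bool)
    for _ in range(iterations):
        # 1. Hypothesize: fit the model to a minimal random sample (2 points).
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2. Test: points whose vertical residual is below the threshold agree.
        mask = np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 2:
        raise ValueError("no consensus set found")
    # 3. Refit by least squares on the largest consensus set.
    a, b = np.polyfit(pts[best_mask, 0], pts[best_mask, 1], 1)
    return (a, b), best_mask
```

For example, given 20 points on y = 2x + 1 plus a few gross outliers, the sketch recovers a ≈ 2 and b ≈ 1 and flags only the 20 line points as inliers. An ordinary least-squares fit over all points would be pulled toward the outliers.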
EXIF
EXIF stands for Exchangeable Image File Format, and is a standard for storing interchange information in image files, especially those using Joint Photographic Experts Group (JPEG) compression. Most digital cameras now use the EXIF format. The format is part of the Design rule for Camera File system (DCF) standard created by Japan Electronics and Information Technology Industries Association (JEITA) to encourage interoperability between imaging devices.