Description of the Related Art
Image capture devices, such as cameras, may be used to capture an image of a section of a view or scene, such as a section of the front of a house. The section of the view or scene whose image is captured by a camera is known as the field of view of the camera. Adjusting a lens associated with a camera may increase the field of view. However, there is a limit beyond which the field of view of the camera cannot be increased without compromising the quality, or “resolution”, of the captured image. Further, some scenes or views may be too large to capture as one image with a given camera at any setting. Thus, it is sometimes necessary to capture an image of a view that is larger than can be captured within the field of view of a camera. In these instances, multiple overlapping images of segments of the view or scene may be taken, and then these component images may be joined together, or merged, to form a composite image.
One type of composite image is known as a panoramic image. A panoramic image may have a rightmost and leftmost image that each overlap only one other image, or alternatively the images may complete 360°, where all images overlap at least two other images. In the simplest type of panoramic image, there is one row of images, with each image at most overlapping two other images. However, more complex composite images may be captured that have two or more rows of images; in these composite images, each image may potentially overlap more than two other images. For example, a motorized camera may be configured to scan a scene according to an M×N grid, capturing an image at each position in the grid. Other geometries of composite images may be captured.
Computer programs and algorithms exist for assembling a single composite image from multiple potentially overlapping component images. A general paradigm for automatic image stitching techniques is to first detect features in individual images; second, to establish feature correspondences and geometric relationships between pairs of images (pair-wise stage); and third, to use the feature correspondences and geometric relationships between pairs of images found at the pair-wise stage to infer the geometric relationship among all the images (multi-image stage).
Panoramic image stitching is thus a technique for combining images to create images with large fields of view. Feature-based image stitching techniques are image stitching techniques that use point-correspondences, instead of image pixels directly, to estimate the geometric transformations between images. An alternative to feature-based image stitching techniques is intensity-based stitching techniques, which use image pixels to infer the geometric transformations. Many image stitching implementations assume that images are related either by 2D projective transformations or by 3D rotations. However, images may exhibit other types of deformation, for instance lens distortion, that neither of these two models captures.
Panoramic image alignment is the problem of computing geometric relationships among a set of component images for the purpose of stitching the component images into a composite image. Feature-based techniques have been shown to be capable of handling large scene motions without initialization. Feature-based methods are typically performed in two stages: pair-wise alignment and multi-image alignment. The pair-wise stage starts from feature (point) correspondences, which are obtained through a separate feature extraction and feature matching process or stage, and returns an estimate of the alignment parameters and a set of point-correspondences that are consistent with the parameters. Various robust estimators or hypothesis testing frameworks may be used to handle outliers in point-correspondences.
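By way of illustration only, the pair-wise stage described above may be sketched with a random-sample-consensus style estimator. The sketch assumes a pure 2-D translation between the two images, which is a deliberate simplification of the homography or rotation models used in practice; the function name `ransac_translation` and its parameters are hypothetical and are not taken from any particular implementation.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=200, seed=0):
    """Estimate a 2-D translation from putative matches, tolerating outliers.

    matches: non-empty list of ((x1, y1), (x2, y2)) point-correspondences.
    Returns (translation, inliers): the estimated (tx, ty) and the matches
    that agree with it to within `threshold` pixels.
    Assumption: a translation-only model, chosen to keep the sketch short.
    """
    if not matches:
        raise ValueError("at least one correspondence is required")
    rng = random.Random(seed)

    def residual(match, tx, ty):
        (x1, y1), (x2, y2) = match
        return ((x1 + tx - x2) ** 2 + (y1 + ty - y2) ** 2) ** 0.5

    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single correspondence determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        inliers = [m for m in matches if residual(m, tx, ty) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # Refit on the consensus set by averaging the inlier displacements.
    n = len(best_inliers)
    tx = sum(q[0] - p[0] for p, q in best_inliers) / n
    ty = sum(q[1] - p[1] for p, q in best_inliers) / n
    return (tx, ty), best_inliers
```

The function returns both the alignment parameters and the consistent point-correspondences, mirroring the two outputs of the pair-wise stage; the retained correspondences are what a subsequent multi-image stage would consume.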
The multi-image stage may use various techniques to further refine the alignment parameters, jointly over all the images, based on the consistent point-correspondences retained in the pair-wise stage. It is known that the convergence of the multi-image stage depends on how good the initial guesses are. However, an equally important fact that is often overlooked is that the quality of the final result from the multi-image stage depends on the number of consistent point-correspondences retained in the pair-wise stage. When the number of consistent point-correspondences is low, the multi-image alignment will still succeed, but the quality of the final result may be poor.
In the pair-wise stage, it is commonly assumed that an imaging system satisfies an ideal pinhole model. As a result, many conventional methods only estimate either 3×3 homographies or “rotation+focal lengths”. However, real imaging systems have some amount of lens distortion. Moreover, wide-angle and “fisheye” lenses that are commonly used for shooting panoramic images tend to introduce larger distortions than regular lenses. Modeling lens distortion is critical for obtaining high-quality image alignment.
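For illustration, the following sketch shows how a 3×3 homography maps a point from one image to another under the ideal pinhole assumption discussed above. The function name `apply_homography` and the row-major nested-list matrix layout are assumptions made for this example only.

```python
def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography H (row-major nested lists).

    The point is lifted to homogeneous coordinates (x, y, 1), multiplied
    by H, and divided by the resulting third coordinate.
    """
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)
```

Because a homography maps straight lines to straight lines, a model of this kind cannot by itself represent the curvature that lens distortion introduces.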
Radially symmetric distortion, or simply radial distortion, is a particular type of image distortion that may be seen in captured images, for example as a result of the optical characteristics of lenses in conventional film and digital cameras. In addition to radial distortion being introduced into images by lenses during image capture, radial distortion may be applied as an effect to either natural images (images of the “real world” captured with a conventional or digital camera) or synthetic images (e.g., computer-generated, or digitally synthesized, images). Radial distortion may be classified into two types: barrel distortion and pincushion distortion. FIG. 1A illustrates barrel distortion, and FIG. 1B illustrates pincushion distortion. Note that barrel distortion is typically associated with wide-angle and fisheye lenses, and pincushion distortion is typically associated with long-range or telescopic lenses.
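As a minimal sketch of the two types of radial distortion, the following example uses a one-parameter polynomial model in which an undistorted point at radius r from the distortion center is moved to radius r(1 + k1·r²). This particular model, the coefficient name `k1`, and the function name `distort` are assumptions chosen for illustration; the text above does not prescribe a specific distortion equation.

```python
def distort(point, k1, center=(0.0, 0.0)):
    """Apply one-parameter radial distortion to an undistorted point.

    Assumed model: distorted offset = undistorted offset * (1 + k1 * r^2),
    where r is the undistorted distance from the distortion center.
    Under this convention, k1 < 0 pulls points toward the center (barrel)
    and k1 > 0 pushes them outward (pincushion).
    """
    cx, cy = center
    dx, dy = point[0] - cx, point[1] - cy
    r2 = dx * dx + dy * dy
    scale = 1.0 + k1 * r2
    return (cx + dx * scale, cy + dy * scale)
```

With coordinates normalized so that the image edge lies at radius 1, a point on the edge moves inward when `k1` is negative and outward when `k1` is positive, while the distortion center itself is left unchanged in both cases.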
In digital image processing, an unwarping process renders an image with little or no radial distortion from an image with radial distortion. FIG. 2A illustrates an unwarping process 202 rendering an image with little or no distortion 200B from an input image with barrel distortion 200A. FIG. 2B illustrates an unwarping process 202 rendering an image with little or no distortion 200D from an input image with pincushion distortion 200C. Note that the images in FIGS. 2A and 2B may be images digitized from photographs or negatives captured with a conventional camera, images captured with a digital camera, digitally synthesized images, composite images from two or more sources, or in general images from any source.
Conventionally, in digital image processing, unwarping 202 of radially distorted images has been performed using a two-dimensional (2-D) sampling process. For example, in a conventional unwarping process, a grid may be set in the output image (the image without radial distortion). For each point in the grid, a corresponding location is found in the input image (the image with radial distortion) by applying a distortion equation. Since this location may not have integral coordinates, 2-D interpolation may be used to obtain the color/intensity value for the corresponding pixel.
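The conventional 2-D sampling process described above may be sketched as follows. The sketch assumes a grayscale image stored as a list of rows, a one-parameter polynomial distortion equation with coefficient `k1`, a distortion center at the image center, and bilinear interpolation with edge clamping; all of these, together with the function names, are illustrative assumptions rather than a description of any specific product.

```python
import math


def bilinear(image, x, y):
    """Sample a grayscale image (list of rows) at a real-valued (x, y).

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def unwarp(image, k1):
    """Render an output image by sampling a radially distorted input.

    For each grid point in the output, the assumed distortion equation
    gives the corresponding input location, which is then interpolated
    in 2-D because it generally has non-integral coordinates.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    norm = max(cx, cy) or 1.0  # normalize offsets by the larger half-extent
    out = []
    for row in range(h):
        out_row = []
        for col in range(w):
            dx, dy = (col - cx) / norm, (row - cy) / norm
            scale = 1.0 + k1 * (dx * dx + dy * dy)  # distortion equation
            src_x = cx + dx * scale * norm
            src_y = cy + dy * scale * norm
            out_row.append(bilinear(image, src_x, src_y))
        out.append(out_row)
    return out
```

Note that the distortion equation is evaluated once per output pixel and that every output value requires a 2-D interpolation over neighboring input pixels.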
As mentioned above, panoramic image alignment is the process of computing geometric relationships among a set of component images for the purpose of stitching the component images into a composite image. A problem in panoramic image stitching is how to register or align images with excessive distortion, such as images taken with wide-angle or fisheye lenses. Because of the large amount of distortion, conventional alignment workflows, including those modeling lens distortion, do not work well on such images. Another problem is how to efficiently unwarp the distorted images so that they can be stitched together to form a new image, such as a panorama.
A conventional method for aligning and unwarping images with excessive distortion is to unwarp the images with a pre-determined function onto a flat plane and then register the unwarped rectilinear version of the image using regular plane-projection based alignment algorithms. There are problems with this approach. For example, for images with a large amount of distortion such as images captured with fisheye lenses, the unwarped images tend to be excessively large. In addition, for images captured with some fisheye lenses, it is not even possible to unwarp an entire image to a flat plane because the field-of-view is larger than 180 degrees, and thus some sacrifices may have to be made.
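The growth in size of the unwarped image can be illustrated numerically. Under an ideal perspective (rectilinear) projection, a ray at angle θ from the optical axis lands at radius f·tan θ, which is unbounded as θ approaches 90 degrees. The sketch below compares this with an equidistant fisheye model, in which the radius is f·θ; the equidistant model is an assumption made for illustration, since actual fisheye lenses follow a variety of mapping functions, and the function names are hypothetical.

```python
import math


def rectilinear_radius(theta_deg, focal=1.0):
    """Image radius under ideal perspective projection: r = f * tan(theta)."""
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("a flat plane cannot image rays at 90 degrees or more")
    return focal * math.tan(math.radians(theta_deg))


def equidistant_radius(theta_deg, focal=1.0):
    """Image radius under the assumed equidistant fisheye model: r = f * theta."""
    return focal * math.radians(theta_deg)


def growth_factor(theta_deg):
    """Ratio of rectilinear radius to fisheye radius at the same ray angle."""
    if theta_deg <= 0.0:
        raise ValueError("theta_deg must be positive")
    return rectilinear_radius(theta_deg) / equidistant_radius(theta_deg)
```

Under these assumptions, the ratio is modest at a half-angle of 45 degrees but grows without bound as the half-angle approaches 90 degrees, and the rectilinear radius is undefined at 90 degrees and beyond, which is why a field of view of 180 degrees or more cannot be unwarped in its entirety onto a flat plane.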
As another example of problems with conventional methods for aligning and unwarping images with excessive distortion, the pre-determined unwarping functions may only do a visually acceptable job of unwarping images. Visually, the unwarped images may appear rectilinear. However, the images may not in fact be 100% rectilinear. The reason is that the pre-determined unwarping functions are conventionally obtained based on some standard configurations and are not adapted to the particular combination of camera and lens used to capture the image. As a result, conventional unwarping functions are not exact and may introduce error in alignment and stitching.
Furthermore, rectilinear images generated by conventional unwarping algorithms may suffer from aliasing. Aliasing refers to a distortion or artifact that is caused by a signal being sampled and reconstructed as an alias of the original signal. An example of image aliasing is the Moiré pattern that may be observed in a poorly pixelized image of a brick wall. Conventional unwarping algorithms, which perform interpolation in 2-D space, may thereby introduce aliasing artifacts into the output images. The aliasing artifacts may be another source of error in alignment and stitching.
In addition to the above, conventional unwarping algorithms are not very efficient. The distortion equation has to be solved for each point in the image. In addition, interpolation is done in two-dimensional (2-D) space, which is inefficient when sophisticated interpolation algorithms such as cubic interpolation are used.
Another conventional method for aligning and unwarping images with excessive distortion is to compute the unwarping function and the alignment model in a single step. This may yield better results. However, a problem with this method is that it is hard to optimize both the unwarping function and the alignment model because of the excessive distortion in the images. A custom version of the code may also be needed for each different combination of an unwarping function and an alignment model.
“Adobe”, “Camera RAW”, “Photoshop”, and “XMP” are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries.