1. Technical Field
The invention relates to the construction of mosaiced panoramic images and, in particular, to a technique for building a “match” structure of image correspondences that is optimized for aligning, or “registering,” large numbers of images into an underlying panoramic representation having static and/or dynamic components.
2. Related Art
In general, a mosaic panoramic image is an image that has been expanded in one or more dimensions by combining multiple overlapping image frames to generate the panorama. As a result, the panoramic image is typically larger than the size that a conventional image capture device (such as a digital still or video camera) is capable of capturing in a single image frame.
A number of conventional techniques have been developed to generate such panoramic images. For example, the basic approach is to simply take a plurality of regular photographic or video images in order to cover the entirety of the desired viewing space. These images are then aligned and composited into complete panoramic images using any of a number of conventional image “mosaicing” or “stitching” algorithms.
For example, one conventional mosaicing scheme has demonstrated how panoramas contained in a set of unordered images can be automatically identified and stitched. These techniques allow a user to take a number of images with a still camera, identify clusters of images which came from the same panorama, and then stitch each cluster into a panorama. One such scheme first compares “point features” within pairs of images to generate a mesh of matches. A connected component approach is then used to identify separate panoramas. Images in each cluster are then aligned by minimizing measurement errors. Lastly, images in each cluster are warped onto a compositing surface and blended to generate the output panorama.
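The connected component step described above can be illustrated with a minimal sketch. It assumes the mesh of matches has already been reduced to a list of image index pairs; the function name cluster_panoramas, the union-find data structure, and the example data are hypothetical choices made for illustration and are not taken from any particular conventional scheme.

```python
from collections import defaultdict


def cluster_panoramas(num_images, matches):
    """Group image indices into clusters, one cluster per candidate panorama.

    matches is a list of (i, j) pairs, each meaning that images i and j
    share enough point-feature matches to be considered overlapping.
    """
    parent = list(range(num_images))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every matched pair.
    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for i in range(num_images):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Hypothetical data: six images, three pairwise matches.
example_clusters = cluster_panoramas(6, [(0, 1), (1, 2), (3, 4)])
# example_clusters == [[0, 1, 2], [3, 4], [5]]
```

In this example, images 0 through 2 form one candidate panorama, images 3 and 4 form a second, and image 5 matches nothing and is left on its own.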
When applied to relatively small data sets, such as a cluster of images captured by a digital still camera, many conventional mosaicing schemes are capable of rapidly stitching related image frames to generate mosaic images. However, when data sets become very large, the generation of mosaics becomes a more complicated process. For example, with a still camera, users typically take only a few, or at most a few dozen, images to create a panorama. However, with a video camera, it is easy to generate thousands of images over a very short period. For example, at a typical video speed of only 30 frames per second, a total of 1800 image frames will be generated in only one minute.
Unfortunately, the matching techniques performed by many mosaicing schemes are often impractical for matching such large numbers of images for generating a single mosaic panorama for those images. Consequently, work has been done to provide efficient techniques for stitching or mosaicing larger data sets such as video recordings. While the idea of stitching panoramas from videos is not new, the way in which such mosaics are constructed, the underlying camera motion models employed, and the details of the algorithms vary considerably.
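The scale of the problem can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that a matching scheme compares every image against every other image; the text above does not state that any particular scheme does so, and the function names are hypothetical.

```python
def frame_count(frames_per_second, seconds):
    """Number of frames produced by a video camera in the given time."""
    return frames_per_second * seconds


def exhaustive_pair_count(num_images):
    """Number of unordered image pairs if every image is compared with every other."""
    return num_images * (num_images - 1) // 2


one_minute = frame_count(30, 60)                  # 1800 frames
still_pairs = exhaustive_pair_count(24)           # 276 pairs for two dozen stills
video_pairs = exhaustive_pair_count(one_minute)   # 1619100 pairs for one minute of video
```

Under that assumption, two dozen still images require a few hundred pairwise comparisons, while one minute of video requires more than 1.6 million.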
For example, one conventional scheme for generating mosaic images from video sequences uses an affine motion model for an image resolution enhancement application as a function of received video frames. Another conventional video mosaicing scheme uses an eight parameter perspective transformation model. Yet another conventional video mosaicing scheme uses a “manifold projection” approach in combination with a simple translation only camera motion model. This approach results in a fast algorithm for video stitching in which narrow strips of pixels from the underlying scene are used to form a composite panoramic image. Further, the use of a translation only camera model avoids the necessity of computing more complex 3D camera motions as do a number of conventional mosaicing schemes.
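The three motion models mentioned above differ mainly in the number of free parameters. The following sketch shows one common way of writing each model as a mapping of a 2D image point. The parameter layouts, including the convention of fixing the ninth element of the perspective matrix to 1, are assumptions made for illustration and are not taken from the schemes themselves.

```python
def apply_translation(point, tx, ty):
    """Translation-only model: 2 parameters."""
    x, y = point
    return (x + tx, y + ty)


def apply_affine(point, a, b, c, d, tx, ty):
    """Affine model: 6 parameters (linear part a, b, c, d plus translation)."""
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def apply_perspective(point, h):
    """Eight-parameter perspective model.

    h holds the first eight entries of a 3x3 matrix in row-major order;
    the ninth entry is assumed to be fixed at 1.
    """
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Each simpler model is a special case of the more general one.
p = (1.0, 2.0)
shifted = apply_translation(p, 3.0, 4.0)
same_via_affine = apply_affine(p, 1.0, 0.0, 0.0, 1.0, 3.0, 4.0)
same_via_perspective = apply_perspective(p, (1.0, 0.0, 3.0, 0.0, 1.0, 4.0, 0.0, 0.0))
```

All three calls in the example map the point (1, 2) to (4, 6), which shows that a translation can be expressed in either of the richer models at the cost of estimating more parameters.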
Other conventional video stitching schemes operate by initially stitching together only adjacent frames of the video sequence, thereby making the matching problem linear in the number of images. However, such techniques ignore matches due to the camera crossing back over its path. Without these matches, components of the panorama can drift due to error accumulation. Some conventional mosaicing schemes partially compensate for this problem by interleaving the matching process and alignment process. Specifically, after each new image is aligned to its temporal neighbor, spatial neighbors are identified and used to refine the orientation estimate of the new image.
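The drift described above can be illustrated with a deliberately simplified sketch. It assumes a camera that rotates about a single axis, so that each frame has one orientation angle, and it assumes a small constant bias in every frame-to-frame estimate. Both assumptions, along with the function names and numbers, are hypothetical and are chosen only to make the accumulation visible.

```python
def chain_orientations(pairwise_deltas, start=0.0):
    """Estimate each frame's orientation by summing frame-to-frame rotations."""
    angles = [start]
    for delta in pairwise_deltas:
        angles.append(angles[-1] + delta)
    return angles


def simulate_pan_and_return(steps_each_way, step_deg, bias_deg):
    """Pan in one direction, then pan back to the starting orientation.

    Returns the estimated orientation of the final frame, whose true
    orientation is 0 degrees because the camera returned to its start.
    """
    true_deltas = [step_deg] * steps_each_way + [-step_deg] * steps_each_way
    measured_deltas = [d + bias_deg for d in true_deltas]
    return chain_orientations(measured_deltas)[-1]


# 200 adjacent-frame estimates, each off by a hypothetical 0.01 degrees.
drift_deg = simulate_pan_and_return(100, 1.0, 0.01)
```

With these numbers the final frame is placed about 2 degrees away from the first frame even though the two frames view the same part of the scene. A direct match between the first and last frames would expose that error, which is the information lost when only adjacent frames are matched.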
A similar interleaved matching strategy was employed by another conventional mosaicing scheme which generates image structure from motion analysis. In other conventional structure from motion work, a strategy for frame “decimation” is presented for the extraction of structure and motion from hand held video sequences. This strategy identifies “unnecessary” frames for removal from subsequent computation by using a sharpness measure to order frames for consideration for removal as a function of a global motion model. A threshold based on a motion correlation coefficient is then employed. This strategy is employed as a preprocessing step within a larger system for structure from motion in which a tree of trifocal tensors is used. However, one problem with such schemes is that data is lost whenever “unnecessary” image frames are discarded.
Another conventional video mosaicing scheme generally operates by interleaving image matching and orientation estimation steps by making the assumption that temporally adjacent images are spatially adjacent. This scheme also makes the assumption that any loops in the camera path are small enough that accumulated error, or drift, can be ignored. However, such assumptions can be considered overly restrictive in that they constrain the ability of users to capture video recordings. Further, this scheme does not directly handle breaks in the matching, as would occur with multiple videos of the same panorama. In addition, such interleaving of the matching and alignment requires that the images be aligned in the same order as the video.
Another problem with the application of automatic high-quality stitching to video sequences is the relatively high computational costs associated with stitching large numbers of images, and the resulting simplifications in motion models or restrictive assumptions required to make such algorithms run in a reasonable time. Existing methods for constructing large panoramas in a “batch” fashion from static images can be quite robust. However, they are typically not sufficiently efficient for aligning and stitching all the frames of a high quality video sequence in a reasonable amount of time. While fast techniques do exist for stitching video, such methods typically use more restricted motion models and produce final panoramic representations that are less accurate than static image-based batch processing approaches.
At least one conventional video mosaicing scheme partially addresses some of these issues. For example, one conventional scheme, referred to as “VideoBrush™,” provides a near-real-time preview of a panoramic image constructed from images captured by a video camera. In general, the VideoBrush™ system provides techniques for 1D and 2D video mosaicing using parametric alignment which includes videos captured with an approximately fixed camera location, or an arbitrarily moving camera capturing an approximately planar scene. As a result, users are constrained in how they capture videos for use by this conventional technique.
Another problem with many conventional mosaicing schemes is that they operate using post-hoc processing of recorded images after the entire set of images has been captured. As a result, the user cannot be sure that sufficient coverage of the scene has been achieved to ensure that the desired panorama can be constructed from the set of captured image frames until the panorama has actually been generated at some later time. As a consequence, it can be hard for users to see “the big picture.”
In particular, using conventional image stitching schemes, it is not until the images are uploaded from the camera to the computing device (such as a PC-type computer, or the like) that users find that the resulting panorama is flawed. For example, gaps will occur in the panorama if users miss one or more spots. Gaps will also occur if the stitching program is unable to insert one or more of the captured images into the panorama, e.g., due to too little overlap between pictures, due to a lack of texture, due to problems with image focus (image too blurry), etc. Further, while gaps at the edge of a panorama can be removed by cropping the panorama, this may cause other desired image elements to be lost outside the cropping boundaries. In the worst case, what was intended to be a single panorama can end up as multiple unconnected (and odd-shaped) pieces of the overall image.
Yet another problem with conventional post-processing approaches involves the issue of “ghosting,” where objects have moved from one frame to the next while the images for the panorama were being taken. While users may be able to identify flaws within a picture (e.g., out of focus) using the view finder, and retake it, flaws between photos, such as ghosting or differences in lighting, are less likely to be discovered while shooting. Unfortunately, by the time users notice such flaws, it is typically too late to retake the missing or flawed image frames.
Users can reduce the risk of flaws such as gaps and ghosting by using a larger overlap between images and by taking multiple copies of areas suspected of being subject to ghosting. This redundancy-based approach, however, is costly in terms of time and especially in terms of storage space (as more pictures are taken), and it still cannot guarantee successful generation of the resulting panoramic images.