1. Technical Field
The invention is related to image stitching, and in particular, to a technique for constructing a photorealistic mosaiced image from a series of images of a scene.
2. Related Art
A number of conventional techniques have been developed to generate composite or stitched images that form an overall mosaic of a scene. The basic idea in generating such mosaic images is to capture a plurality of partially overlapping photographic or video images that together cover the entirety of the desired viewing space. These images are then aligned, warped if necessary, and composited into complete panoramic mosaic images.
In general, construction of these mosaic images is accomplished using any of a number of conventional image “mosaicing” or “stitching” algorithms that typically operate by identifying correspondences between the various images.
To identify correspondences between two or more images, conventional image stitching schemes operate to determine which points in one image correspond to the same physical points in another image of the same scene. The images are then warped so that the matching points coincide, and the images are merged or blended to construct the mosaic. Solving the correspondence problem to generate mosaic images is relatively simple with respect to top-down or “orthographic” images (such as satellite imagery). In such cases, the resulting mosaic image can be easily constructed without causing objectionable artifacts or warping.
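As a minimal sketch of the align-and-blend steps described above, the following Python example assumes that the correspondences are already known and that the two top-down images differ only by a whole-pixel translation. The function names estimate_translation and composite are hypothetical and are used only for illustration; practical stitching schemes generally estimate richer warps and use more careful blending.

```python
def estimate_translation(matches):
    """Estimate the shift that maps image B coordinates into image A's frame.

    matches is a list of ((x_a, y_a), (x_b, y_b)) pairs, each pair naming the
    same physical point in the two images. The mean difference is the
    least-squares solution for a pure translation (an assumed, simplified warp).
    """
    n = len(matches)
    dx = sum(a[0] - b[0] for a, b in matches) / n
    dy = sum(a[1] - b[1] for a, b in matches) / n
    return round(dx), round(dy)


def composite(img_a, img_b, offset):
    """Paste both images onto one canvas and average them where they overlap.

    Images are lists of rows of grayscale values. Canvas cells covered by
    neither image are set to None.
    """
    dx, dy = offset
    h_a, w_a = len(img_a), len(img_a[0])
    h_b, w_b = len(img_b), len(img_b[0])
    # Bounding box of both images in image A's coordinate frame.
    min_x, min_y = min(0, dx), min(0, dy)
    width = max(w_a, dx + w_b) - min_x
    height = max(h_a, dy + h_b) - min_y
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for img, (ox, oy) in ((img_a, (0, 0)), (img_b, (dx, dy))):
        for y, row in enumerate(img):
            for x, value in enumerate(row):
                total[y + oy - min_y][x + ox - min_x] += value
                count[y + oy - min_y][x + ox - min_x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else None
         for x in range(width)]
        for y in range(height)
    ]


# Two tiny overlapping images: column 0 of B shows the same points as column 2 of A.
image_a = [[1, 2, 3],
           [4, 5, 6]]
image_b = [[3, 9],
           [6, 9]]
point_matches = [((2, 0), (0, 0)), ((2, 1), (0, 1))]

shift = estimate_translation(point_matches)
mosaic = composite(image_a, image_b, shift)
```

Averaging in the overlap is the simplest possible blend and is chosen here only to keep the sketch short.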
Recently, satellite-based 2D imagery has become increasingly popular for browsing via the Internet. Some such browsing schemes provide mosaiced satellite imagery in combination with overlaid street maps for route planning. Related schemes use topographical information to provide a simulated 3D viewing experience by projecting the 2D mosaiced imagery onto a coarse ground elevation model. However, buildings are not modeled by this approach, leading to confusing artifacts (especially with tall buildings) when viewing images from a direction other than the one from which the original images were captured.
For some select cities, extruded polygonal building models are available. Further, detailed 3D models can also be obtained automatically using conventional light detection and ranging (LiDAR)-based techniques. Unfortunately, models produced by LiDAR-based techniques tend to be noisy and to provide insufficient resolution for accurate texture mapping applications. Smoothing LiDAR models tends to be a difficult and computationally expensive problem. In either case, such techniques generally operate to first create an overall mosaic image from the set of input images, and then map the resulting mosaic image to the 3D model. Unfortunately, regardless of the type of 3D model that is used, the problem of correctly aligning occlusion boundaries of buildings is not adequately addressed. As a result, the images resulting from such schemes tend to include misaligned textures and mismatched shadows from one surface to another. Further, mapping the mosaiced images to such models typically results in very unnatural angles and warping of objects such as buildings. In other words, the resulting mosaic images do not provide photorealistic views of the scene.
An alternative to the use of full 3D models is to provide oblique views of a scene (such as a set of low-altitude aerial views of the scene) which are then stitched to create an oblique mosaic image for a fixed number of orientations. Then, for a given oblique viewing direction, an orthographic view allows navigation of a large area by panning, much in the same way as with 2D top-down imagery such as stitched satellite imagery.
However, the problem of photorealistic image stitching is more complicated in the case of oblique imagery, especially in the case of images of tall buildings captured at an angle from a moving airplane. In this case, the correspondences between images can be identified to create a relatively seamless mosaic image. Unfortunately, the resulting mosaic image will tend to include a confusing jumble of tall buildings or other structures that all lean at different angles relative to each other because of the different oblique angles at which the various images were captured. In other words, the resulting mosaic images do not provide photorealistic views of the scene.
Some conventional stereo imaging applications attempt to address the problem of different image angles by labeling the pixels of input images to obtain a depth map. This depth information is then used in constructing mosaic images. However, as the depth information in such techniques corresponds to a set of view-parallel planes, slanted surfaces cannot be represented well and appear discretized. Again, the result tends to be a mosaic image that fails to provide a photorealistic view of the scene.
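The discretization effect noted above can be illustrated with a short Python sketch. It assumes a single scanline across a slanted surface and three hypothetical view-parallel depth planes; the function name nearest_plane_labels and all numeric values are illustrative assumptions rather than details of any particular conventional technique.

```python
def nearest_plane_labels(depths, plane_depths):
    """Label each pixel with the index of the closest view-parallel plane."""
    return [
        min(range(len(plane_depths)), key=lambda k: abs(plane_depths[k] - d))
        for d in depths
    ]


# Assumed true depth along one scanline of a slanted surface: d(x) = 10 + 0.4 * x.
true_depths = [10 + 0.4 * x for x in range(9)]

# Assumed set of view-parallel planes, each at one constant depth.
plane_depths = [10.0, 12.0, 14.0]

labels = nearest_plane_labels(true_depths, plane_depths)

# Depth that the labeling actually represents at each pixel.
represented = [plane_depths[k] for k in labels]

# Per-pixel error introduced by forcing a slanted surface onto constant-depth planes.
errors = [abs(r - d) for r, d in zip(represented, true_depths)]
```

In this sketch the smoothly varying depth is replaced by three flat steps, which is the staircase appearance that the paragraph above describes as discretized.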
Several conventional mosaicing schemes that have attempted to address such problems involve the use of “graph cuts” in combination with conventional stereo imaging applications. With such techniques, the typical goal is to assign a label to each pixel, where each label represents a certain depth. However, rather than assuming view-parallel planes, one such technique uses an approach where each pixel label represents an arbitrary plane, with the affine parameters of each plane being estimated in an expectation-maximization (EM) fashion. The orientations of various surfaces (such as building surfaces) are then estimated and used to warp the various images prior to image stitching. However, the focus of such schemes is on obtaining an accurate depth map of image pixels for use in constructing the resulting mosaic images. Consequently, such schemes tend to fare poorly when addressing the problem of constructing a photorealistic view of the scene.
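The idea of labels that represent arbitrary planes, with parameters refined in an EM-like fashion, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the conventional technique itself: it works on a single scanline, so each "plane" reduces to a line d(x) = a * x + c, it uses hard assignments in place of a probabilistic expectation step, and it omits graph cuts entirely. The names assign_labels, refit_plane and fit_planes are hypothetical.

```python
def assign_labels(xs, depths, planes):
    """Assignment step: give each pixel the plane (a, c) with the smallest residual."""
    return [
        min(range(len(planes)),
            key=lambda k: abs(planes[k][0] * x + planes[k][1] - d))
        for x, d in zip(xs, depths)
    ]


def refit_plane(xs, depths, old_plane):
    """Update step: least-squares fit of d = a * x + c to the assigned pixels."""
    if len(xs) < 2:
        return old_plane  # too few pixels to estimate a slope; keep old parameters
    mean_x = sum(xs) / len(xs)
    mean_d = sum(depths) / len(depths)
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        return old_plane
    a = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, depths)) / denom
    return (a, mean_d - a * mean_x)


def fit_planes(xs, depths, planes, iterations=5):
    """Alternate the assignment and update steps a fixed number of times."""
    labels = assign_labels(xs, depths, planes)
    for _ in range(iterations):
        planes = [
            refit_plane(
                [x for x, l in zip(xs, labels) if l == k],
                [d for d, l in zip(depths, labels) if l == k],
                planes[k],
            )
            for k in range(len(planes))
        ]
        labels = assign_labels(xs, depths, planes)
    return planes, labels


# Assumed scanline: a slanted surface (x < 5) next to a flat surface (x >= 5).
xs = list(range(10))
depths = [2 + 0.5 * x if x < 5 else 10.0 for x in xs]

# Assumed initial guesses: two constant-depth (view-parallel) planes.
initial_planes = [(0.0, 3.0), (0.0, 10.0)]

planes, labels = fit_planes(xs, depths, initial_planes)
```

Unlike a constant-depth label, the first fitted label in this sketch carries a nonzero slope, so the slanted surface is represented without a staircase.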