Many mosaicing applications involve long image sequences taken by translating cameras scanning a long scene. Known applications include a video camera mounted on a vehicle scanning city streets [14,1,17,21,19], and a video camera mounted on a low-altitude aircraft scanning terrain [22]. Earlier versions of our work on ego-motion computation for sideways-moving cameras were proposed in [16,15]. They had initialization and robustness problems that are addressed in this patent. In addition, they did not address the computation of dense depth maps or the creation of undistorted mosaics.
In [1,17] methods are described for creating a multi-perspective panorama. These methods recover camera motion using structure-from-motion [8], matching features between pairs of input images. The matched points are used to recover the camera parameters as well as a sparse cloud of 3D scene points, a recovery that is much easier when fisheye lenses are used as in [1]. Feature points as used in the above-described approaches are preferable for clean, high-contrast, and unambiguous imagery. However, direct methods may be preferred when feature points are rare, ambiguous, or noisy.
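The distinction between feature-based and direct alignment can be illustrated with a minimal sketch that is not taken from any of the referenced works: a direct method estimates motion by comparing all overlapping pixel intensities rather than sparse matched features, so it degrades gracefully when distinct feature points are rare. The function name, sign convention, and image layout below are assumptions for illustration only.

```python
import numpy as np

def direct_shift(img_a, img_b, max_shift=20):
    """Estimate the horizontal translation between two grayscale images
    by exhaustively minimizing the sum of squared differences (SSD)
    over candidate shifts.  A 'direct' method in the sense above: it
    uses all overlapping pixels, not sparse feature points.  Returns
    the shift s such that img_b[:, j + s] best matches img_a[:, j]."""
    h, w = img_a.shape
    best_shift, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # Take the columns of img_a and img_b that overlap under shift s.
        if s >= 0:
            a, b = img_a[:, :w - s], img_b[:, s:]
        else:
            a, b = img_a[:, -s:], img_b[:, :w + s]
        cost = np.mean((a - b) ** 2)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift
```

A real implementation would replace the exhaustive search with a coarse-to-fine gradient-based solver, but the exhaustive form makes the direct-matching principle explicit.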
Image mosaicing can be regarded as a special case of creating a model of the observed scene. Having multiple images of the scene theoretically enables the computation of both the camera parameters and the geometric and photometric structure of the scene. As the mosaicing process is much simpler than the creation of a full scene model, it is likely to work in more cases. Mosaicing works especially well for long scenes, with camera motion in only one direction. Even when a scene model has successfully been constructed, generating a very long panoramic image of the entire scene with minimum distortion remains a challenging problem.
Also known in this field is X-Slits mosaicing [23], one of whose declared benefits is reduced distortion compared to the pushbroom projection. But for mosaics that are spatially very long, the X-Slits images become very close to a pushbroom projection, with its significant distortions. Attempts to reduce the distortion of spatially long mosaics were presented in [17,18], using different X-Slits projections for different scene segments. Also relevant is [20], where a mosaic image is generated by minimizing a stitching cost using dynamic programming. Other papers on mosaicing of long scenes include [19,21], where long mosaics are generated from a narrow slit scanning a scene. In these papers the camera is assumed to move slowly at a roughly constant velocity, and the scene depth can be estimated from stationary blur. In [2] a long panorama is stitched from a sparse set of still images, mainly addressing stitching errors.
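As a rough illustration of stitching-cost minimization by dynamic programming of the kind mentioned for [20] (the sketch below is not the method of [20]; the cost map and function name are hypothetical), a minimum-cost vertical seam through a 2-D cost map can be found as follows.

```python
import numpy as np

def min_cost_seam(cost):
    """Find a vertical seam (one column index per row) of minimum total
    cost through a 2-D cost map using dynamic programming.  Each row's
    seam column may move at most one column from the row above."""
    h, w = cost.shape
    acc = cost.astype(float).copy()        # accumulated cost table
    back = np.zeros((h, w), dtype=int)     # backpointers for backtracking
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(0, j - 1), min(w, j + 2)
            k = lo + int(np.argmin(acc[i - 1, lo:hi]))
            back[i, j] = k
            acc[i, j] += acc[i - 1, k]
    # Backtrack from the cheapest endpoint in the last row.
    seam = [int(np.argmin(acc[-1]))]
    for i in range(h - 1, 0, -1):
        seam.append(back[i, seam[-1]])
    return seam[::-1]
```

In a stitching context the cost map would typically encode the per-pixel discrepancy between overlapping images, so the recovered seam passes through regions where the images agree.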
Panoramic images of long scenes, generated from images taken by a translating camera, are normally distorted compared to perspective images. When large image segments are used for stitching a panoramic image, each segment is perspective, but the seams between segments are apparent due to depth parallax. When narrow strips are used, the panoramic image is seamless, but its projection is normally pushbroom, with aspect distortions. These distortions become very significant when the variations in scene depth are large compared to the distance from the camera.
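The narrow-strip case above can be sketched minimally as follows, assuming grayscale frames from a sideways-translating camera stored as NumPy arrays; the function name and the fixed strip width are illustrative assumptions only.

```python
import numpy as np

def pushbroom_mosaic(frames, strip_width=4):
    """Build a simple pushbroom-style mosaic by pasting the central
    vertical strip of each frame side by side.  The result is seamless,
    but objects at depths other than the one matched by the fixed strip
    width acquire the aspect (stretch/compression) distortion described
    above."""
    h, w = frames[0].shape[:2]
    c = w // 2  # central column of every frame
    strips = [f[:, c - strip_width // 2 : c + strip_width // 2]
              for f in frames]
    return np.concatenate(strips, axis=1)
```

A fixed strip width is only correct for a single scene depth; varying the width with estimated depth or camera speed is what the more elaborate methods discussed above attempt.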
US 2007/003034 (Wilburn et al.) [26] discloses an apparatus and method for video capture of a three-dimensional region of interest in a scene using an array of video cameras positioned for viewing the three-dimensional region of interest in the scene from their respective viewpoints. A triggering mechanism is provided for staggering the capture of a set of frames by the video cameras of the array. A processing unit combines and operates on the set of frames captured by the array of cameras to generate a new visual output, such as high-speed video or spatio-temporal structure and motion models, that has a synthetic viewpoint of the three-dimensional region of interest. The processing involves spatio-temporal interpolation for determining the synthetic viewpoint space-time trajectory. Wilburn et al. do not generate panoramic images, but only new perspective images. Also, all cameras in the array are synchronized, and combination is done only on a set of frames captured simultaneously.