Until recently, image processing systems have generally processed images, such as frames of video, still photographs, and the like, on an individual, image-by-image basis. Each individual frame or photograph is typically processed by filtering, warping, and applying various parametric transformations. In order to form a panoramic view of the scene, the individual images are combined to form a two-dimensional mosaic, i.e., an image that contains a plurality of individual images. Additional image processing is performed on the mosaic to ensure that the seams between the images are invisible such that the mosaic looks like a single large image.
The alignment of the images and the additional processing to remove seams are typically accomplished manually by a technician using a computer workstation, i.e., the image alignment and combination processes are computer aided. In such computer-aided image processing systems, the technician manually selects processed images and manually aligns those images, and a computer applies various image combining processes to the images to remove any seams or gaps between the images. Manipulation of the images is typically accomplished using various computer input devices such as a mouse, trackball, keyboard, and the like. Since manual mosaic generation is costly, those skilled in the art have developed automated systems for generating image mosaics.
In automated systems for constructing mosaics, the information within a mosaic is generally expressed as two-dimensional motion fields. The motion is represented as a planar motion field, e.g., an affine or projective motion field. Such a system is disclosed in U.S. patent application Ser. No. 08/339,491, entitled "Mosaic Based Image Processing System", filed Nov. 14, 1994, now U.S. Pat. No. 5,649,032, and herein incorporated by reference. The image processing approach disclosed in the '491 application automatically combines multiple image frames into one or more two-dimensional mosaics. However, that system does not account for parallax motion that may cause errors in the displacement fields representing motion in the mosaic.
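By way of illustration only, the following minimal sketch shows what a planar motion field of the kind mentioned above may look like in code. It is not taken from the '491 application. The function names, the six-parameter ordering of the affine model, and the row-major 3x3 matrix layout of the projective model are all assumptions chosen for this example.

```python
def affine_motion(x, y, a):
    """Displacement (u, v) at pixel (x, y) under a six-parameter affine model.

    The parameter ordering (a1..a6) is an assumption made for this sketch.
    """
    a1, a2, a3, a4, a5, a6 = a
    u = a1 + a2 * x + a3 * y
    v = a4 + a5 * x + a6 * y
    return u, v


def projective_motion(x, y, h):
    """Displacement (u, v) at pixel (x, y) under a 3x3 projective transform.

    h is a row-major nested list; the pixel is mapped to (xp, yp) and the
    displacement is the difference between the mapped and original positions.
    """
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    xp = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    yp = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return xp - x, yp - y


def motion_field(width, height, model, params):
    """Evaluate a motion model at every pixel; result is indexed [y][x]."""
    return [[model(x, y, params) for x in range(width)] for y in range(height)]


# Example: a pure translation of (+5, -2) pixels, expressed in both models.
translation_affine = (5, 0, 0, -2, 0, 0)
translation_projective = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
field = motion_field(4, 3, affine_motion, translation_affine)
```

Because each model assigns one displacement to every pixel from a single small parameter set, it describes a single planar surface. Points that lie off that plane exhibit residual parallax displacement that such a field cannot represent.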
In other types of image processing systems, multiple images are analyzed in order to recover photogrammetric information such as relative orientation estimation, range map recovery and the like without generating a mosaic. These image analysis techniques assume that the internal camera parameters (e.g., focal length, pixel resolution, aspect ratio, and image center) are known. In automated image processing systems that use alignment and photogrammetry, the alignment and photogrammetric process involves two steps: (1) establishing correspondence between pixels within various images via some form of area- or feature-based matching scheme, and (2) analyzing pixel displacement in order to recover three-dimensional scene information.
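As a simple illustration of step (2) above, the sketch below recovers range from pixel displacement in the most restricted case: a rectified stereo pair with a known focal length and a known baseline between the two camera positions. This special case and the relation Z = f * B / d are assumptions chosen for the example, as are the function and parameter names; the general techniques referred to above are not limited to this configuration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Range (in meters) of a scene point from its horizontal disparity.

    Assumes a rectified stereo pair: disparity_px is the pixel displacement
    of the point between the two images, focal_length_px is the known focal
    length in pixels, and baseline_m is the camera separation in meters.
    A non-positive disparity is treated as a point at infinite range.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def range_map(disparities, focal_length_px, baseline_m):
    """Apply depth_from_disparity to a 2-D list of per-pixel disparities."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparities
    ]
```

The sketch makes the dependence on internal camera parameters concrete: without a known focal length, the pixel displacement alone fixes range only up to an unknown scale.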
Other image processing systems have analyzed image motion within a three-dimensional scene that is imaged from multiple viewpoints to determine the range or depth of objects within the scene. Such an approach is disclosed in K. J. Hanna, "Direct Multi-Resolution Estimation of Ego-Motion and Structure From Motion", Proceedings of the IEEE Workshop on Visual Motion, Princeton, N.J., Oct. 7-9, 1991, pp. 156-162, and K. J. Hanna et al., "Combining Stereo and Motion Analysis for Direct Estimation of Scene Structure", Proceedings of the Fourth International Conference on Computer Vision (ICCV'93), Berlin, Germany, May, 1993. The disclosures within both these papers are incorporated herein by reference. The prior art methods of generating three-dimensional representations have heretofore not been used in conjunction with systems that generate two-dimensional mosaics. Consequently, these approaches are used to analyze the three-dimensional geometry of a scene, but do not form useful representations of combinations of images such as mosaics.
Therefore, a need exists in the art for a system that automatically generates, from a plurality of images, a three-dimensional mosaic that accurately represents both the two-dimensional image information and the three-dimensional geometry within a scene.