The need to combine pictures into panoramic mosaics has existed since the beginning of photography, because a camera's field of view is always smaller than the human field of view. In addition, large objects often cannot be captured in a single picture, and only photo-mosaicing enables a more complete view. Digital photography created new applications for mosaicing [14, 15, 16, 4, 24, 23], which were first implemented for aerial and satellite images.
Three major issues are important in traditional image mosaicing:
(i) Image alignment, which determines the transformation that aligns the images to be combined into a mosaic. Paper photo-mosaicing uses rigid transformations for alignment: picture translations (shifts) and rotations. Digital processing enables more general transformations, such as affine or planar-projective transformations.
(ii) Image cut and paste, which is necessary because most regions in the panoramic mosaic overlap and are covered by more than one picture. The cut and paste process involves either selecting a single image for each overlapping region or combining all the overlapping images in some way.
(iii) Image blending, which is necessary to overcome intensity differences between images, differences that are present even when the images are perfectly aligned. Such differences are created by a dynamically changing camera gain.
The simplest mosaics are created from a set of images whose mutual displacements are pure image-plane translations. This is approximately the case with some satellite images. Such translations can be computed either by manually pointing to corresponding points or by image correlation methods. Other simple mosaics are created by rotating the camera around its optical center using a special device, and creating a panoramic image which represents the projection of the scene onto a cylinder [7, 11, 12, 13] or a sphere. Since it is not simple to ensure a pure rotation around the optical center, such mosaics can be used only in limited cases.
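As an illustration of recovering a pure image-plane translation automatically, the following minimal sketch searches exhaustively over integer shifts. It is not taken from the cited methods: the function name `estimate_shift` and the parameter `max_shift` are hypothetical, images are assumed to be equally sized grayscale arrays stored as nested lists, and the mean squared difference over the overlap is used as a simple stand-in for a correlation score.

```python
def estimate_shift(ref, img, max_shift):
    """Return the integer (dy, dx) for which img[y + dy][x + dx] best matches ref[y][x].

    Assumptions (illustrative only): ref and img are equally sized grayscale
    images given as lists of rows, and the true displacement is a pure
    translation of at most max_shift pixels along each axis.
    """
    height, width = len(ref), len(ref[0])
    best_score, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(height):
                yy = y + dy
                if not 0 <= yy < height:
                    continue
                for x in range(width):
                    xx = x + dx
                    if 0 <= xx < width:
                        diff = ref[y][x] - img[yy][xx]
                        total += diff * diff
                        count += 1
            if count == 0:
                continue  # the two images do not overlap at this shift
            score = total / count  # mean squared difference over the overlap
            if best_score is None or score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def _scene(y, x):
    # Synthetic, non-periodic intensity pattern used only for this example.
    return 100 * y * y + x * x


reference = [[_scene(y, x) for x in range(8)] for y in range(8)]
# Second image: the same scene displaced by one row and two columns.
shifted = [[_scene(y - 1, x - 2) for x in range(8)] for y in range(8)]
print(estimate_shift(reference, shifted, 2))  # prints (1, 2)
```

An exhaustive search of this kind is only practical for small displacements; it is shown here because it makes the matching criterion explicit.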
For more general camera motions, which may include both camera translations and camera rotations, more general transformations for image alignment are used [5, 8, 9, 10, 18]. In most cases images are aligned pairwise, using a parametric transformation such as an affine transformation or a planar-projective transformation (see, for example, [26]). These transformations include an intrinsic assumption regarding the structure of the scene, such as the scene being planar. A reference frame is selected, and all images are aligned with this reference frame and combined to create the panoramic mosaic. These methods are therefore referred to as reference frame based methods.
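The relationship between the two parametric transformations named above can be made concrete with a short sketch. A planar-projective transformation is represented here as a 3x3 matrix applied in homogeneous coordinates, and an affine transformation is the special case whose last row is (0, 0, 1). The helper names `apply_projective` and `affine_matrix` are hypothetical and do not come from the cited references.

```python
def apply_projective(H, x, y):
    """Map the image point (x, y) through a 3x3 planar-projective matrix H.

    H is a row-major nested list. The point is lifted to homogeneous
    coordinates (x, y, 1), multiplied by H, and divided by the third component.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transformation")
    return u / w, v / w


def affine_matrix(a, b, c, d, tx, ty):
    """Build the 3x3 matrix of the affine map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)."""
    return [[a, b, tx], [c, d, ty], [0.0, 0.0, 1.0]]


# A pure shift by (3, -2), written as an affine matrix.
shift = affine_matrix(1.0, 0.0, 0.0, 1.0, 3.0, -2.0)
print(apply_projective(shift, 1.0, 1.0))  # prints (4.0, -1.0)

# A projective matrix with a non-trivial last row: the divisor depends on x.
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
print(apply_projective(H, 2.0, 4.0))  # prints (1.0, 2.0)
```

Because the divisor varies across the image in the projective case, points far from the reference view can be stretched strongly, which is one way to see why large rotations distort reference frame based mosaics.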
Aligning all frames to a single reference frame is reasonable when the camera is far away and its motion is mainly a sideways translation and a rotation around the optical axis. Significant distortions are created when camera motions include other rotations. FIG. 1 shows the effects of large rotations on reference frame based methods. The objects a, b, x, y, c, d, w, z are viewed from two cameras C1 and C2. The image I1 is selected to be the reference frame, and image I2 is projected onto that reference frame. Large rotations generate distortions when projecting onto the reference frame, and the information derived from frames with such rotations is blurred and almost useless. Moreover, in long sequences in which the camera travels along a complex path, a single frame cannot serve as the reference frame for long, and projection of the entire sequence onto that frame becomes impractical.
The manifold projection method was introduced in [25], where a mosaic is constructed by scanning a scene with a one-dimensional, straight array.
However, none of the above methods can handle cases where images cannot be aligned due to parallax, or cases of zoom and forward motion.
Manifold projection simulates the sweeping of a scene using a linear one-dimensional sensor array (see FIG. 2). Such a one-dimensional sensor can scan the scene by arbitrary combinations of rotations and translations, and in all cases the scanning will result in a sensible panoramic image, provided that the incoming one-dimensional image strips can be aligned. Some satellite images are created by scanning the earth with a one-dimensional sensor array using a rotating mirror. Since in this case the alignment of the sensors can be done using the location of the satellite and the position of the mirror, panoramic two-dimensional images are easily obtained. FIG. 2 shows aerial photography with a linear one-dimensional scan system.
In more general cases the motion of the sweeping plane may not be known. It may seem impossible to align the one-dimensional image strips coming from an arbitrary plane sweep, but the problem becomes easier when the input is a video sequence. A two-dimensional frame in a video sequence can be regarded as having a one-dimensional strip somewhere in the center of the image (“center strip”), embedded in the two-dimensional image to facilitate alignment. The motion of the sweeping plane can then be computed from the entire image, and applied to the center-strip for alignment and mosaicing.
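The center-strip idea can be sketched for the simplest case. The following is an illustrative simplification, not the method of [25]: the function name `strip_mosaic` is hypothetical, the inter-frame motion is assumed to be an already known horizontal translation of a whole number of pixels, and rotation is ignored. Each frame after the first contributes only the columns immediately left of its center, with the strip width equal to that frame's displacement.

```python
def strip_mosaic(frames, shifts):
    """Concatenate center strips of frames related by known horizontal shifts.

    frames: equally sized images, each a list of rows.
    shifts: shifts[i] is the displacement in pixels between frame i and
            frame i + 1 (assumed known, positive, and at most half a frame).
    """
    width = len(frames[0][0])
    center = width // 2
    # The first frame supplies everything to the left of its center column.
    mosaic = [list(row[:center]) for row in frames[0]]
    for frame, shift in zip(frames[1:], shifts):
        if not 0 < shift <= center:
            raise ValueError("shift must be between 1 and half the frame width")
        # Strip of 'shift' columns ending at the center column of this frame.
        for out_row, row in zip(mosaic, frame):
            out_row.extend(row[center - shift:center])
    # The last frame supplies everything from its center column rightwards.
    for out_row, row in zip(mosaic, frames[-1]):
        out_row.extend(row[center:])
    return mosaic


# Synthetic scene, 3 rows by 12 columns, viewed through a 6-column window.
scene = [[10 * y + x for x in range(12)] for y in range(3)]
offsets = [0, 2, 5]
frames = [[row[o:o + 6] for row in scene] for o in offsets]
mosaic = strip_mosaic(frames, [2, 3])
print(mosaic == [row[:11] for row in scene])  # prints True
```

In this toy setting the strips tile the scene exactly. With real video the displacement would first have to be estimated from the whole frame, as the paragraph above describes.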
The image transformations of the one-dimensional strips generated by the sweeping plane are only rigid transformations: image-plane translations and rotations. Therefore, rigid transformations are also the transformations used in manifold projection. It should be noted that general camera motions usually induce non-rigid image-plane transformations. However, to simulate the plane sweep, only rigid transformations are used for the center-strip.
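For reference, a rigid image-plane transformation is a rotation followed by a translation, and it preserves distances between points. The sketch below illustrates that property; the function name `rigid_transform` and its parameters are hypothetical, and angles are assumed to be in radians.

```python
import math


def rigid_transform(points, angle, tx, ty):
    """Rotate points by 'angle' radians about the origin, then translate by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Two points on a strip, 5 pixels apart, before and after a rigid motion.
strip_ends = [(0.0, 0.0), (3.0, 4.0)]
moved = rigid_transform(strip_ends, 0.7, 5.0, -2.0)
print(math.isclose(distance(*strip_ends), distance(*moved)))  # prints True
```

An affine or planar-projective transformation would in general change this distance, which is the practical difference between the rigid model used for the center-strip and the more general models used by reference frame based methods.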
The panoramic mosaic generated by combining the aligned one-dimensional center-strips forms the manifold projection. This is a projection of the scene onto a general manifold, which is a smooth manifold passing through the centers of all the image planes that make up the mosaic. In the case of pure camera translations (FIG. 3a), the manifold projection turns out to be a parallel projection onto a plane. In the case of pure camera rotations (FIG. 3b), it is a projection onto a cylinder whose principal axis is the rotation axis. But when both camera translations and rotations are involved, as in FIG. 3c, the manifold is no longer a simple manifold. In FIGS. 3a, 3b and 3c the camera is located at the tip of the “field-of-view” cone, and the image plane is marked by a solid segment. The ability to handle such arbitrary combinations of camera rotations and translations is the major distinction between manifold projection and all previous mosaicing approaches.
In view of the foregoing, it should be apparent that there exists a need to provide a method for the creation of panoramic image mosaics in cases not treated in the prior art. Such cases involve camera translations with image parallax; forward motion; camera motions that are combinations of translations and rotations; and camera zoom.