This invention relates to video image mosaicing for obtaining panoramic mosaics of a scene.
Prior art references considered to be relevant as a background to the invention are listed below. Acknowledgement of the references herein shall not be inferred as meaning that these are in any way relevant to the patentability of the invention disclosed herein. Each reference is identified by a number enclosed in square brackets, and accordingly the prior art will be referred to throughout the specification by numbers enclosed in square brackets.
[1] ARPA Image Understanding Workshop, Monterey, Calif., November 1994, Morgan Kaufmann.
[2] Fifth International Conference on Computer Vision, Cambridge, Mass., June 1995, IEEE-CS.
[3] IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, Calif., June 1996.
[4] P. J. Burt and E. H. Adelson. A multiresolution spline with application to image mosaics. ACM Trans. on Graphics, 2(4), pages 217-236, October 1983.
[5] P. J. Burt and P. Anandan. Image stabilization by registration to a reference mosaic. In ARPA Image Understanding Workshop [1], pages 425-434.
[6] S. E. Chen and L. Williams. View interpolation for image synthesis. In SIGGRAPH, pages 279-288, Anaheim, Calif., August 1993, ACM.
[7] T. R. Halfhill. See you around. Byte Magazine, pages 85-90, May 1995.
[8] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. J. Burt. Real-time scene stabilization and mosaic construction. In ARPA Image Understanding Workshop [1], pages 457-465.
[9] M. Irani, P. Anandan, and S. Hsu. Mosaic based representations of video sequences and their applications. In Fifth International Conference on Computer Vision [2], pages 605-611.
[10] P. Jaillon and A. Montanvert. Image mosaicing applied to three-dimensional surfaces. In 12th International Conference on Pattern Recognition, pages 253-257, Jerusalem, Israel, October 1994, IEEE-CS.
[11] A. Krishnan and N. Ahuja. Panoramic image acquisition. In IEEE Conference on Computer Vision and Pattern Recognition [3], pages 379-384.
[12] S. Mann and R. Picard. Virtual bellows: Constructing high quality stills from video. In First IEEE International Conference on Image Processing, Austin, Tex., November 1994.
[13] L. McMillan and G. Bishop. Plenoptic modeling: An image-based rendering system. In SIGGRAPH, Los Angeles, Calif., August 1995, ACM.
[14] D. L. Milgram. Computer methods for creating photomosaics. IEEE Trans. on Computers, C-24, pages 1113-1119, 1975.
[15] D. L. Milgram. Adaptive techniques for photomosaicing. IEEE Trans. on Computers, C-26, pages 1175-1180, 1977.
[16] S. Peleg. Elimination of seams from photomosaics. Computer Graphics and Image Processing, 16, pages 90-94, May 1981.
[17] B. Rousso, S. Avidan, A. Shashua, and S. Peleg. Robust recovery of camera rotation from three frames. In IEEE Conference on Computer Vision and Pattern Recognition [3], pages 796-802.
[18] H. S. Sawhney, S. Ayer, and M. Gorkani. Model-based 2D and 3D dominant motion estimation for mosaicing and video representation. In Fifth International Conference on Computer Vision [2], pages 583-590.
[19] S. Seitz and C. Dyer. Physically valid view synthesis by image interpolation. In Proc. IEEE Workshop on Representation of Visual Scenes, Cambridge, Mass., June 1995, IEEE-CS.
[20] R. Hartley and R. Gupta. Linear pushbroom cameras. In J. O. Eklundh, editor, Third European Conference on Computer Vision, pages 555-566, Stockholm, Sweden, May 1994, Springer.
[21] M. Irani, B. Rousso, and S. Peleg. Detecting and tracking multiple moving objects using temporal integration. In G. Sandini, editor, Second European Conference on Computer Vision, pages 282-287, Santa Margherita, Italy, May 1992, Springer.
[22] S. Peleg, J. Herman, D. Dixon, P. J. Burt, and J. R. Bergen. U.S. Patent Application: Improved methods for mosaic image construction.
[23] R. Szeliski. Video mosaics for virtual environments. IEEE Computer Graphics and Applications, pages 22-30, March 1996.
[24] R. Szeliski and S. B. Kang. Direct methods for visual scene reconstruction. In Proc. IEEE Workshop on Representation of Visual Scenes, Cambridge, Mass., June 1995, IEEE-CS, pages 26-33.
[25] J. Y. Zheng and S. Tsuji. Panoramic representation for route recognition by a mobile robot. International Journal of Computer Vision, Vol. 9, pages 55-76, 1992.
The need to combine pictures into panoramic mosaics has existed since the beginning of photography, since the camera's field of view is always smaller than the human field of view. Also, very often large objects cannot be captured in a single picture, and only photo-mosaicing enables a more complete view. Digital photography created new applications for mosaicing [14, 15, 16, 4, 24, 23], which were first implemented for aerial and satellite images.
Three major issues are important in traditional image mosaicing:
(i) Image alignment, which determines the transformation that aligns the images to be combined into a mosaic. Paper photo-mosaicing uses rigid transformations for alignment: picture translations (shifts) and rotations. Digital processing enables more general transformations, like affine or planar-projective.
(ii) Image cut and paste is necessary since most regions in the panoramic mosaic are overlapping, and are covered by more than one picture. The cut and paste process involves either a selection of a single image for each overlapping region, or some kind of a combination of all overlapping images.
(iii) Image blending is necessary to overcome the intensity difference between images, differences that are present even when images are perfectly aligned. Such differences are created by a dynamically changing camera gain.
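The blending issue in item (iii) can be illustrated with a minimal sketch. It assumes two already-aligned grayscale images that overlap by a known number of columns, and it uses simple linear feathering, which is a stand-in chosen for brevity and not the multiresolution spline of [4]. The function name `feather_blend` and its parameters are hypothetical.

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Join two aligned images that share `overlap` columns (overlap >= 1).

    Assumption: `left` and `right` have the same height, and the last
    `overlap` columns of `left` show the same scene region as the first
    `overlap` columns of `right`, possibly at a different camera gain.
    """
    # Weight of the left image falls linearly from 1 to 0 across the overlap.
    w = np.linspace(1.0, 0.0, overlap)
    blended = left[:, -overlap:] * w + right[:, :overlap] * (1.0 - w)
    return np.hstack([left[:, :-overlap], blended, right[:, overlap:]])

# Two flat images with different gains; the seam becomes a gradual ramp.
left = np.full((4, 8), 10.0)
right = np.full((4, 8), 20.0)
mosaic = feather_blend(left, right, overlap=4)
```

The linear ramp hides an intensity step only when the overlap is reasonably wide; narrower overlaps leave a visible gradient.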
The simplest mosaics are created from a set of images whose mutual displacements are pure image-plane translations. This is approximately the case with some satellite images. Such translations can either be computed by manually pointing to corresponding points, or by image correlation methods. Other simple mosaics are created by rotating the camera around its optical center using a special device, and creating a panoramic image which represents the projection of the scene onto a cylinder [7, 11, 12, 13] or a sphere. Since it is not simple to ensure a pure rotation around the optical center, such mosaics can be used only in limited cases.
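As an illustration of computing a pure image-plane translation by correlation, the following sketch uses phase correlation, one specific correlation method selected here for brevity. It assumes integer shifts and circular (wrap-around) image boundaries; the function name `estimate_translation` is hypothetical.

```python
import numpy as np

def estimate_translation(ref, moved):
    """Return the integer (dy, dx) with moved ~= np.roll(ref, (dy, dx), axis=(0, 1)).

    Assumption: the displacement is a pure translation and the images
    wrap around at the borders (real images would need windowing).
    """
    f_ref = np.fft.fft2(ref)
    f_mov = np.fft.fft2(moved)
    # Normalized cross-power spectrum keeps only the phase difference.
    cross = f_mov * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moved = np.roll(ref, shift=(3, -5), axis=(0, 1))
shift = estimate_translation(ref, moved)
```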
In more general camera motions, which may include both camera translations and camera rotations, more general transformations for image alignment are used [5, 8, 9, 10, 18]. In most cases images are aligned pairwise, using a parametric transformation like an affine transformation or planar-projective transformation. These transformations include an intrinsic assumption regarding the structure of the scene, such as being planar. A reference frame is selected, and all images are aligned with this reference frame and combined to create the panoramic mosaic. These methods are therefore referred to as reference frame based methods.
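The parametric transformations mentioned above can be written as 3x3 matrices acting on homogeneous image coordinates. The sketch below is illustrative only; `apply_homography` is a hypothetical helper, and the matrices are arbitrary examples. An affine transformation is the special case whose last row is (0, 0, 1).

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of (x, y) points through the 3x3 matrix H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the third homogeneous coordinate (equal to 1 for affine maps).
    return mapped[:, :2] / mapped[:, 2:3]

# Affine example: a pure translation by (2, 3).
H_affine = np.array([[1.0, 0.0, 2.0],
                     [0.0, 1.0, 3.0],
                     [0.0, 0.0, 1.0]])
# Planar-projective example: the last row is no longer (0, 0, 1).
H_proj = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.5, 0.0, 1.0]])

affine_pts = apply_homography(H_affine, [[0, 0], [1, 1]])
proj_pts = apply_homography(H_proj, [[2, 4]])
```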
Aligning all frames to a single reference frame is reasonable when the camera is far away and its motion is mainly a sideways translation and a rotation around the optical axis. Significant distortions are created when camera motions include other rotations. FIG. 1 shows the effects of large rotations on reference frame based methods. The objects a, b, x, y, c, d, w, z are viewed from two cameras C1 and C2. The image I1 is selected to be a reference frame and image I2 is projected onto that reference frame. Large rotations generate distortions when projecting on the reference frame, and the information derived from frames with such rotations is blurred, and almost useless. Moreover, in long sequences in which the camera is traveling in a complex path, one frame cannot be used for long as a reference frame, and projection of the entire sequence onto that frame becomes impractical.
Recently, a method called "Manifold Projection" [22] has been proposed to create mosaics in more general cases. This method performs alignments using only image-plane translations and rotations, constructs the mosaic from the center-most parts of the images, and merges the images into a seamless panorama. The manifold projection method is very similar to the one in [25], where a mosaic is constructed by scanning a scene with a one-dimensional, straight array. However, while in [25] camera motion is measured by an external device, in [22] the camera motion is measured from the images in the sequence.
However, none of the above methods can handle cases where images cannot be aligned due to parallax, or cases of zoom and forward motion.
Manifold Projection simulates the sweeping of a scene using a linear one-dimensional sensor array, see FIG. 2. Such a one-dimensional sensor can scan the scene by arbitrary combinations of rotations and translations, and in all cases the scanning will result in a sensible panoramic image, provided that the incoming one-dimensional image strips can be aligned. Some satellite images are created by scanning the earth with a one-dimensional sensor array using a rotating mirror. Since in this case the alignment of the sensors can be done using the location of the satellite and the position of the mirror, panoramic two-dimensional images are easily obtained. FIG. 2 shows aerial photography with a linear one-dimensional scan system.
In more general cases the motion of the sweeping plane may not be known. It seems impossible to align the one-dimensional image strips coming from an arbitrary plane sweep, but the problem becomes easier when the input is a video sequence. A two-dimensional frame in a video sequence can be regarded as having a one-dimensional strip somewhere in the center of the image ("center strip"), embedded in the two-dimensional image to facilitate alignment. The motion of the sweeping plane can then be computed from the entire image, and applied on the center-strip for alignment and mosaicing.
The image transformations of the one-dimensional strips generated by the sweeping plane are only rigid transformations: image plane translations and rotations. Therefore, rigid transformations are also the transformations used in manifold projection. It should be noted that general camera motions induce, in general, non-rigid image-plane transformations. However, to simulate the plane sweep only rigid transformations are used for the center-strip.
The panoramic mosaic generated by combining the aligned one-dimensional center-strips forms the manifold projection. This is a projection of the scene into a general manifold, which is a smooth manifold passing through the centers of all image planes constructing the mosaic. In the case of pure camera translations (FIG. 3a), manifold projections turn out to be a parallel projection onto a plane. In the case of pure camera rotations (FIG. 3b), it is a projection onto a cylinder, whose principal axis is the rotation axis. But when both camera translations and rotations are involved, as in FIG. 3c, the manifold is not a simple manifold any more. In FIGS. 3a, 3b and 3c the camera is located at the tip of the "field-of-view" cone, and the image plane is marked by a solid segment. The ability to handle such arbitrary combinations of camera rotations and translations is the major distinction between manifold projection and all previous mosaicing approaches.
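For the simplest case, a pure sideways camera translation, pasting center strips can be sketched as follows. This is an illustrative simplification, not the method of [22]: it assumes the inter-frame displacements are already known positive integers no larger than half the frame width, that no rotation is present, and that the scene is flat so there is no parallax. The function name `mosaic_from_center_strips` is hypothetical.

```python
import numpy as np

def mosaic_from_center_strips(frames, shifts):
    """Paste one vertical strip per frame, taken just left of the frame center.

    frames: list of equally sized 2-D arrays.
    shifts: shifts[i] is the horizontal displacement in pixels between
            frame i and frame i + 1 (assumed positive integers, each at
            most half the frame width).
    The strip width equals the displacement, so consecutive strips abut
    without gaps or overlap.
    """
    center = frames[0].shape[1] // 2
    strips = [frame[:, center - d:center]
              for frame, d in zip(frames[1:], shifts)]
    return np.hstack(strips)

# Synthetic test: frames are 20-pixel-wide windows sliding over a wide scene.
scene = np.arange(10 * 100).reshape(10, 100)
starts = [0, 3, 7, 12, 14]
frames = [scene[:, x:x + 20] for x in starts]
shifts = list(np.diff(starts))
mosaic = mosaic_from_center_strips(frames, shifts)
```

In this synthetic setting the mosaic reproduces the scene columns swept by the frame centers, which is a discrete analogue of the parallel projection described for FIG. 3a.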
In view of the foregoing, it should be apparent that there exists a need to provide a method for the creation of panoramic image mosaics in cases not treated in the prior art. Such cases involve camera translations with image parallax; forward motion; camera motions that are combinations of translations and rotations; and camera zoom.
It is important to note that whenever the terms "video", "movie", "frame", "picture", or "image" are used, they refer to any representation of a picture or a movie (motion picture). A still picture can be recorded on film by a traditional camera, by a digital camera, by a scanner, or any other device that records still images. A video (or a motion picture) can be recorded by a film camera, an analog or a digital videotape, or any other device that records motion pictures. The area of image mosaicing in general, and this invention in particular, is applicable to all forms of images which can be manipulated by appropriate devices, whether mechanical, optical, digital, or any other technology.
Panoramic mosaics are constructed by combining strips from the image sequence. In accordance with the present invention, the shape, size and position of the strips are determined for each image in accordance with the type of camera motion. The strips are cut from the images, and pasted into the panoramic mosaic after being transformed, such that the resulting mosaic remains continuous.
In accordance with the present invention, the following constraints are preferably (but not necessarily) used in order to deal with general image plane transformations:
(a) the strips should be approximately perpendicular to the optical flow.
(b) the strips collected for pasting should be warped before pasting into the panoramic image so that, after warping, their original optical flow becomes approximately parallel to the direction in which the panoramic image is constructed.
Under these conditions, cases of zoom and forward motion can be handled as well as the other simple cases. For example, in the case of zoom or forward motion, these properties enable cutting circular strips, and proper bending of them before pasting into the panoramic image.
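The zoom case can be illustrated with a rough sketch. Under a pure zoom the optical flow is radial, so a strip perpendicular to the flow is a ring around the zoom center, and straightening the ring makes the flow run along one axis. The sketch assumes the zoom center is known and uses nearest-neighbour polar resampling; the function name `unroll_circular_strip` and its parameters are hypothetical.

```python
import numpy as np

def unroll_circular_strip(image, center, r_inner, r_outer, n_angles=360):
    """Resample the ring r_inner <= r < r_outer into a rectangular strip.

    Rows of the result index the angle and columns index the radius, so
    radial (zoom) flow in the image becomes horizontal in the strip.
    Assumption: `center` = (row, col) is the zoom center.
    """
    cy, cx = center
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    radii = np.arange(r_inner, r_outer)
    aa, rr = np.meshgrid(angles, radii, indexing="ij")
    # Nearest-neighbour sampling, clipped to the image bounds.
    ys = np.clip(np.rint(cy + rr * np.sin(aa)).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.rint(cx + rr * np.cos(aa)).astype(int), 0, image.shape[1] - 1)
    return image[ys, xs]

# Test image whose value is the distance from the center, so each column
# of the unrolled strip should be nearly constant.
yy, xx = np.mgrid[0:101, 0:101]
image = np.hypot(yy - 50, xx - 50)
strip = unroll_circular_strip(image, (50, 50), 10, 20)
```

A practical implementation would interpolate rather than round, and would choose the ring width from the measured zoom between frames.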
This invention also describes how to determine the width of the strips. For example, in order to handle image parallax properly, the size of the strips can be determined from the camera's three-dimensional motion, as can be computed from the sequence itself, or as can be measured by external devices.
To enable smooth mosaics even when frames to be combined are taken from different viewpoints, and have substantial parallax, views can be synthesized for in-between camera positions. For smoothest mosaics the number of in-between camera positions is selected such that the strip is narrow, e.g. having a width of a single pixel.
The present invention provides for a method for combining a sequence of two-dimensional images of a scene to obtain a panoramic mosaic of said scene, said sequence of two-dimensional images being acquired by a moving camera in relative motion with respect to said scene, said camera having an optical center, the camera motion giving rise to optical flow between the images, the method comprising the steps of:
warping the images; and
pasting the images into the panoramic image,
such that the optical flow becomes substantially parallel to the direction in which the mosaic is constructed.
The invention still further provides a method for combining a sequence of two-dimensional images of a scene to obtain a panoramic mosaic of said scene, said sequence of two-dimensional images being acquired by a moving camera in relative motion with respect to said scene, said camera having an optical center, the camera motion giving rise to optical flow between the images, the method comprising the steps of:
(a) selecting for each image of said sequence at least one strip such that each strip is substantially perpendicular to said optical flow; said strips having a front edge and a back edge with the optical flow entering a strip through the front edge and exiting the strip through the back edge; and
(b) pasting together said strips from adjacent images to construct a panoramic mosaic.
By one embodiment the method further comprises the step of:
(a′) warping the front edge of a strip defined on a two-dimensional image so that it is substantially aligned with the back edge of a strip defined on an adjacent two-dimensional image.
By another embodiment the strips are transformed by warping into strips having edges of arbitrary shape before the strips are pasted together.
By yet another embodiment the strips are transformed by warping into strips having straight edges before the strips are combined together.
According to yet another embodiment the two-dimensional images are related by an affine transformation or by a planar-projective transformation.
According to another embodiment said images are projected onto a three-dimensional cylinder whose major axis approximates the path of the camera centers of said images, and the combination of the strips is achieved by translating the projected two-dimensional images substantially along the cylindrical surface of the three-dimensional cylinder.
According to yet another embodiment every two subsequent images define their own cylinder whose major axis substantially passes through the centers of the cameras of said images, and the cylinders are concatenated substantially along the image sequence.
According to still another embodiment a transformation is applied to the panoramic mosaic depending on a desired viewpoint.
According to a further embodiment the sequence of images is augmented by sets of interpolated images intermediate to the images of the sequence, and the strips are augmented with strips defined on the interpolated images.
The invention further provides a system for combining a sequence of two-dimensional images of a scene to obtain a panoramic mosaic of said scene, said sequence of two-dimensional images being acquired by a moving camera in relative motion with respect to said scene, said camera having an optical center, the camera motion giving rise to optical flow between the images, the system comprising:
a warper for warping the images; and
a paster for pasting the images into the panoramic image, such that the optical flow becomes substantially parallel to the direction in which the mosaic is constructed.
Still further, the invention provides a system for combining a sequence of two-dimensional images of a scene to obtain a panoramic mosaic of said scene, said sequence of two-dimensional images being acquired by a moving camera in relative motion with respect to said scene, said camera having an optical center, the camera motion giving rise to optical flow between the images, the system comprising:
(a) a selector for selecting for each image of said sequence at least one strip such that each strip is substantially perpendicular to said optical flow; said strips having a front edge and a back edge with the optical flow entering a strip through the front edge and exiting the strip through the back edge; and
(b) a paster for pasting together said strips from adjacent images in such a way that the front edge of a strip defined on an image is substantially aligned with the back edge of a strip defined on an adjacent image.
Still yet further the invention provides a memory containing a file representing a panoramic mosaic of a scene.
The process described herein can alternatively be interpreted using three-dimensional projections of the images onto cylinders ("pipes") whose principal axis is the direction of camera motion. Such projections create warpings of the images such that the optical flow becomes parallel.