The invention relates generally to the field of panoramic imaging technology, and in particular to the field of forming a complete three-dimensional panoramic scene.
Panoramic imaging technology has been used for merging multiple photographs or digital images to produce a single seamless 360° panoramic view of a particular scene. A single photographic camera is usually employed in such a way that a sequence of image inputs is obtained as the camera is rotated around the focal point of the camera lens causing every two neighboring images to slightly overlap each other. The intensity values from the two neighboring images in the overlap region are weighted and then summed to form a smooth transition. The resultant panorama provides a 2D (two-dimensional) description of the environment.
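The weighted summation in the overlap region can be sketched as follows. This is a minimal illustration only: the linear weight ramp and the function name `blend_overlap` are assumptions made for the example, since the text above says only that the intensity values are weighted and then summed.

```python
def blend_overlap(left, right):
    """Blend one row of intensity values taken from the overlap region.

    `left` and `right` hold the same pixels as seen in the two neighboring
    images. The weight of the left image ramps linearly from 1 to 0 across
    the overlap (an assumed weighting scheme), and the right image gets the
    complementary weight, so the two weights always sum to 1.
    """
    if len(left) != len(right):
        raise ValueError("overlap rows must have the same length")
    n = len(left)
    if n == 0:
        return []
    if n == 1:
        # A single-pixel overlap: simply average the two values.
        return [0.5 * (left[0] + right[0])]
    blended = []
    for i in range(n):
        w_left = 1.0 - i / (n - 1)   # 1 at the left edge, 0 at the right edge
        w_right = 1.0 - w_left
        blended.append(w_left * left[i] + w_right * right[i])
    return blended


if __name__ == "__main__":
    # The left image is darker than the right one in the overlap.
    print(blend_overlap([100, 100, 100], [200, 200, 200]))
```

With this weighting, the blended row starts at the left image's value and ends at the right image's value, so no visible seam remains at either edge of the overlap.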
There is a wide range of potential applications that require not only an intensity panorama but also panoramic three-dimensional (3D) maps associated with the intensity images, that is, a 3D description of the environment. VR technology and e-commerce are example applications where the 3D panorama plays a crucial role. Virtual worlds and virtual objects can be built using the 3D panorama and displayed with the help of VRML (Virtual Reality Modeling Language); see Ames et al., VRML 2.0 Sourcebook, Second Edition, Positioning Shapes, Chapter 5, pp. 63-75.
In order to obtain both intensity and 3D panoramas, multiple cameras are usually utilized in constructing a panoramic 3D imaging system. There have been systems producing depth panoramic images; see Huang et al., Panoramic Stereo Imaging System with Automatic Disparity Warping and Seaming, Graphical Models and Image Processing, Vol. 60, No. 3, May 1998, pp. 196-208. This system utilizes a side-by-side camera system to imitate a human viewer. It is known that panoramic images are best captured when the axis of rotation is at the rear-nodal point of the camera. However, this is not possible with the side-by-side configuration. One solution displaces the cameras vertically such that the line between the rear-nodal points of the cameras is aligned with the rotation axis. The details of the vertical stereo camera setup are described in U.S. Pat. No. 6,023,588 issued Feb. 8, 2000 to Ray et al. entitled Method and Apparatus for Capturing Panoramic Images with Range Data.
The camera set swivels at the nodal point at a constant angular interval and produces intensity images that are used to generate 3D images. Like the conventional two-dimensional panorama formed by stitching two neighboring intensity images together, the three-dimensional panorama is constructed by stitching neighboring 3D images. However, problems arise when two adjacent 3D images in a sequence are merged. The 3D values of an object point measured by the camera system are defined with respect to the local three-dimensional coordinate system that is fixed relative to the stereo camera optical system; see Ohta et al., Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 2, March, 1985, pp. 139-154, and Cox et al., A Maximum Likelihood Stereo Algorithm, Computer Vision and Image Understanding, Vol. 63, No. 3, May 1996, pp. 542-567. The computed 3D values of an object point in the real world space are therefore a function of the orientation of the camera optical axis. Consequently, distortion appears when a sequence of 3D images is used to describe the shape of an object. For instance, a smooth surface object in the three-dimensional space appears as a fragmented smooth surface object after reconstruction using the untreated 3D images. No systematic methods have been shown to address this problem in panoramic 3D map formation. A similar situation exists in estimating depth from motion. For that problem, Szeliski et al. propose a projective structure approach to estimate depth from frame to frame in Direct Methods for Visual Scene Reconstruction, Proceedings, IEEE Workshop on Representation of Visual Scenes, Jun. 24, 1995, Cambridge, Mass., pp. 26-33. The approach performs the depth computation for successive images in the projective space rather than the Euclidean space.
The obtained depth is thus the projective depth and has to be transformed to the Euclidean coordinate system for practical use. Reliability of this approach is also subject to examination, as noted by the authors.
The need is met according to the present invention by providing a method, a system, and a computer program product for deriving a three-dimensional panorama from a plurality of stereo image pairs of a scene generated from a plurality of cameras, that includes acquiring a plurality of stereo image pairs of the scene, wherein there is an intra-overlap region between vertically aligned stereo image pairs; providing disparity data for recovering scene spatial information; generating (X,Y,Z) values for each of the stereo image pairs with respect to a local three-dimensional coordinate system wherein the intra-stereo image pair is taken; acquiring a plurality of stereo image pairs of the scene by rotating the plurality of cameras about a Y-axis (vertical axis), wherein there is an inter-overlap region between adjacent stereo image pairs; selecting a reference three-dimensional world coordinate system against which the overall spatial information of the scene can be correctly presented; transforming the generated (X,Y,Z) values from each of the local three-dimensional coordinate systems to the selected reference three-dimensional world coordinate system; warping the transformed (X,Y,Z) images onto a cylindrical surface, and forming a plurality of warped (X,Y,Z) images; registering adjacent warped (X,Y,Z) images; and forming a three-dimensional panorama, i.e., an (X,Y,Z) panorama, using the warped (X,Y,Z) images.
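The transforming and warping steps recited above can be illustrated with a short sketch. It is not the claimed implementation. It assumes that each local coordinate system shares its origin with the reference system (the rotation axis passes through the nodal point), that the local frame differs from the reference frame only by a rotation of `theta` radians about the Y-axis, and that the cylinder is centered on that axis. The function names, the sign convention of the rotation, and the `radius` parameter are hypothetical.

```python
import math


def local_to_world(point, theta):
    """Rotate a local (X, Y, Z) value about the Y-axis into the reference frame.

    `theta` is the assumed rotation angle (radians) of the camera set
    relative to the reference orientation. Y is unchanged because the
    rotation axis is vertical.
    """
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def to_cylinder(point, radius=1.0):
    """Project a reference-frame point onto a cylinder around the Y-axis.

    Returns (azimuth, height): the azimuth angle in radians measured from
    the reference Z-axis, and the height at which the ray from the origin
    through the point meets a cylinder of the given radius.
    """
    x, y, z = point
    rho = math.hypot(x, z)            # horizontal distance from the axis
    if rho == 0.0:
        raise ValueError("point lies on the rotation axis")
    azimuth = math.atan2(x, z)
    height = radius * y / rho
    return azimuth, height


if __name__ == "__main__":
    # A point 5 units along the local optical axis, 1 unit up, seen after
    # the camera set has rotated a quarter turn from the reference pose.
    world = local_to_world((0.0, 1.0, 5.0), math.pi / 2)
    print(world)
    print(to_cylinder(world))
```

Under these assumptions, the same object point measured from two different camera orientations maps to the same reference-frame (X,Y,Z) value, so adjacent warped (X,Y,Z) images agree in their inter-overlap region and can be registered.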