The invention relates generally to the field of panoramic image formation, and in particular to generating environment maps containing information extracted from pairs of stereo images.
There is a growing number of imaging applications where the viewer has the perception of being able to move about a virtual environment. One method of developing such a virtual environment is to capture a plurality of images, which can be combined into a 360° panoramic view. Panoramic images can be considered as capturing an image as if the capture medium, e.g., film, were wrapped in a cylinder. For applications such as virtual reality, a portion of the image is transformed to appear as if the image were captured with a standard photographic system. However, since optical systems are not easily built which capture images on cylindrical photoreceptors, a variety of methods have been developed to provide the functionality. Panoramic stitching is a method that can be used to generate a 360° (or less) panoramic image from a series of overlapping images acquired from an ordinary camera. S. E. Chen describes the process of panoramic stitching in QuickTime VR - An Image-based Approach to Virtual Environment Navigation, Proc. SIGGRAPH '95, 1995, pp. 29-38.
Because conventional panoramic images do not have range associated with the objects in the scene, there are many potential applications of virtual reality which are not accomplished easily. One such application is the ability to introduce objects synthetically into a panoramic image and interact with the image as one might like. For instance, if the objects are a distance d from the camera and a synthetic object is to be placed midway between them and the camera, the range must be known for the zooming property of virtual images to operate in a manner that appears normal. Also, if synthetic objects are to interact with real objects in the image, the range information is critical. However, since panoramic image capture systems do not acquire images from different vantage points, it is unreasonable to expect such a system to estimate the range of the objects from the image capture point. Estimating range information can be accomplished in many ways, though a common and well-known method involves the use of stereo image pairs.
Range estimation by stereo correspondence is a topic that has been heavily researched. By capturing images from two different locations that are a known distance apart, orthographic range to any point in the scene can be estimated if that point can be found in both images. (Orthographic range is the orthogonal distance from a real world point to a plane parallel to an image plane passing through the rear nodal point of the capture device.) The difficulty in this method of range estimation is finding correspondence points. A variety of stereo correspondence algorithms exists; see, for example, Y. Ohta and T. Kanade, Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 2, March 1985, pp. 139-154, and I. J. Cox, S. O. Hingorani, S. B. Rao, and B. M. Maggs, A Maximum Likelihood Stereo Algorithm, Computer Vision and Image Understanding, Vol. 63, No. 3, May 1996, pp. 542-567.
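As an illustration of the range estimate described above, the following is a minimal sketch that assumes an ideal pinhole model with parallel optical axes, for which the standard relation is Z = f * b / d. The function name and parameters (focal length in pixels, baseline, disparity in pixels) are hypothetical and are not taken from the cited references.

```python
def orthographic_range(focal_length_px, baseline, disparity_px):
    """Orthographic range Z = f * b / d for an ideal parallel-axis stereo pair.

    Assumptions (not from the source text): pinhole cameras, focal length
    expressed in pixels, disparity measured in pixels along the baseline
    direction. The result has the same units as the baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to give a finite range")
    return focal_length_px * baseline / disparity_px


# Example: f = 800 px, baseline = 0.5 m, disparity = 16 px
z = orthographic_range(800.0, 0.5, 16.0)
```

A point with a smaller disparity lies farther away, which is why distant objects are estimated less precisely for a fixed baseline.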
Conventionally, stereo images are captured using two optical systems having parallel optical axes, with the plane formed by the optical axis and vector between image centers being parallel to the ground. The reason for this arrangement is that stereo images are typically used to give the viewer a perception of three-dimensionality, and the human visual system has this arrangement. There have been systems producing stereo panoramic images (see Huang and Hung, Panoramic Stereo Imaging System with Automatic Disparity Warping and Seaming, Graphical Models and Image Processing, Vol. 60, No. 3, May 1998, pp. 196-208); however, these systems use a classical side-by-side stereo system, as their intent is to utilize the stereo images for a human viewer and not to estimate the depth of objects. One problem of the side-by-side approach is that panoramic images are best captured when the axis of rotation is at the rear nodal point of the optical system. In a conventional side-by-side configuration this is geometrically impossible for both cameras at once. As a result, at least one of the panoramic images is sub-optimal.
In the conventional camera arrangement for stereo imaging, the horizontal distance between the two cameras is commonly referred to as the baseline distance. With this arrangement the corresponding points in the two images can be found on the same horizontal line in the image. For a digital system this implies that a corresponding point exists in the same scan line for each image, though the position within the scan line of each image will differ depending upon the distance of the associated object from the cameras. According to the aforementioned Ser. No. 09/162,310, which is incorporated herein by reference, stereo imaging is achieved by capturing stereo image pairs that are displaced from each other along a vertical axis through the rear nodal point of a taking lens, rather than being displaced along a horizontal axis as in the conventional stereo image capture systems. This configuration retains the advantage of the conventional stereo capture configuration, in that points in one image of the stereo pair have their corresponding point in the other image of the pair along a line; however, the line is now a vertical line, which simplifies the formation of depth data for panoramic images.
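Because corresponding points lie on the same vertical line in this configuration, the correspondence search reduces to a one-dimensional search along an image column. The sketch below illustrates this with simple sum-of-absolute-differences block matching. It is not the algorithm of the referenced application or of the cited papers, and the function name, the window parameters, and the assumed direction of the shift are all hypothetical.

```python
def match_along_column(upper_col, lower_col, row, half_window, max_shift):
    """Find the vertical shift that best aligns a patch between two columns.

    upper_col and lower_col are lists of intensities from the same column of
    the upper and lower images. Returns the shift s in [0, max_shift] that
    minimises the sum of absolute differences, or None if no window fits.
    The shift direction (downward in the lower image) is an assumption.
    """
    lo, hi = row - half_window, row + half_window + 1
    if lo < 0 or hi > len(upper_col):
        return None
    best_shift, best_cost = None, float("inf")
    for s in range(max_shift + 1):
        if hi + s > len(lower_col):
            break  # window would run off the end of the lower column
        cost = sum(abs(upper_col[r] - lower_col[r + s]) for r in range(lo, hi))
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


# Synthetic column: the lower image shows the same feature 3 rows further down.
upper = [0, 0, 10, 50, 90, 20, 0, 0, 0, 0, 0, 0]
lower = [0, 0, 0] + upper[:9]
shift = match_along_column(upper, lower, row=3, half_window=1, max_shift=5)
```

The recovered shift plays the role of the disparity; in a real image pair the search would be repeated for every column and for each feature of interest.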
Referring now to FIG. 1, the apparatus described in copending Ser. No. 09/162,310 for capturing a collection of stereo images to produce a 360° panoramic image with range data includes a pair of 360° stereoscopic cameras 110 and 110′ that are mounted on a rotating camera support for rotation about the respective rear nodal points of the cameras. The rotating camera support includes a support rod 118 that is mounted for rotation in a base 119. The operator rotates the camera assembly after each photograph in the series. The rotation angle is sufficiently small to permit adequate overlap between successive images for the subsequent "image stitching." An angle of 30° is generally adequate, but the amount of rotation is dependent upon the field of view of the camera system. The cameras 110 and 110′ are vertically displaced from each other along the rotating camera support rod 118 to provide a baseline distance d.
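The relationship between the rotation step, the field of view, and the resulting overlap can be checked with simple arithmetic. The following sketch assumes a uniform rotation step and measures overlap as a fraction of the horizontal field of view; the function names and the 40-degree field of view in the example are hypothetical values chosen for illustration.

```python
import math


def shots_for_full_circle(step_deg):
    """Number of stereo pairs needed to cover 360 degrees at a fixed step."""
    return math.ceil(360.0 / step_deg)


def overlap_fraction(step_deg, horizontal_fov_deg):
    """Fraction of each image's angular width shared with its neighbour.

    Returns 0.0 when the step is at least as wide as the field of view,
    in which case adjacent images do not overlap at all.
    """
    return max(0.0, 1.0 - step_deg / horizontal_fov_deg)


# A 30-degree step with an assumed 40-degree horizontal field of view.
n_shots = shots_for_full_circle(30.0)
overlap = overlap_fraction(30.0, 40.0)
```

With these example values, twelve stereo pairs cover the full circle and one quarter of each image is shared with its neighbor.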
Each camera 110, 110′ has an optical axis, and alignment between the two cameras must assure that the two optical axes are as parallel as possible. The cameras are mounted such that they share the same rotational angle. In order to keep the cameras in vertical alignment, a set of level indicators 128 is mounted on the support mechanism. The camera mounts 123 lock a camera to eliminate any side-to-side rotation that would create misalignment between the two cameras. The base 127 is like a common camera tripod and the lengths of the three legs are adjustable, each with a locking mechanism 129. By using the locking mechanism 129 and the level indicators 128, the operator can align the cameras to be displaced solely in the vertical direction. In order to activate the cameras, a remote shutter 125 is used which triggers both cameras simultaneously. The cameras are connected to the shutter control by a pair of wires 122 or by an RF control signal.
The cameras 110, 110′ are assumed to be identical and to share the same optical specifications, e.g., focal length and field of view. The baseline distance d of the two cameras 110, 110′ directly influences the resolution of depth estimates for each point in the 360° panoramic image. The amount of baseline is a function of the expected distance from the camera to the objects of interest. To permit this adjustment, the camera system has an adjustable vertical distance mechanism 124, such as an adjustable rack and pinion on the rotating camera support rod 118. The amount of vertical baseline is displayed on a vernier gauge 126 on the support rod 118. The displacement distance d must be noted and employed in accurately estimating the distance from the cameras to the objects. Creating a panoramic environment map from the resulting series of overlapping images can be divided into three tasks: cylindrical warping, image registration, and image blending.
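The influence of the baseline distance on depth resolution can be made concrete under the usual pinhole relation Z = f * b / disparity, which is an assumption here rather than a formula stated in the text. Differentiating that relation gives an approximate range uncertainty of Z^2 * e / (f * b) for a disparity error of e pixels. The function name and the numerical values below are hypothetical.

```python
def range_uncertainty(range_z, focal_length_px, baseline, disparity_error_px=1.0):
    """Approximate range error caused by a given disparity error.

    First-order estimate dZ = Z**2 * e / (f * b), derived from Z = f * b / d
    for an ideal parallel-axis stereo pair (an assumed model). The result has
    the same units as range_z and baseline.
    """
    return (range_z ** 2) * disparity_error_px / (focal_length_px * baseline)


# Object at 5 m, focal length 1000 px, one pixel of disparity error.
err_short = range_uncertainty(5.0, 1000.0, 0.25)  # 0.25 m baseline
err_long = range_uncertainty(5.0, 1000.0, 0.50)   # 0.50 m baseline
```

Under this model, doubling the baseline halves the range uncertainty at a given distance, which is consistent with choosing the baseline according to the expected distance to the objects of interest.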
What is needed is a more effective way of processing stereo pairs of images in order to accomplish these tasks and synthesize a panoramic environment map containing intensity (not precluding color) and range information present in a scene. Two problems are introduced when applying panoramic stitching techniques to range estimates: first, range estimates are an orthographic distance from objects to a plane parallel to the image plane, and second, range estimates are available only at a sparse sampling of points. These problems prohibit current panoramic stitching techniques from being adequate for the task of stitching together range estimates. If these problems could be solved, it would be possible to create a much improved panoramic environment map containing both intensity and range information.
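The first problem can be seen in a small plan-view example: because the reference plane turns with the camera, the same scene point receives a different orthographic range in each overlapping image, whereas its distance from the rotation axis does not change. The sketch below assumes a pinhole camera rotating about a vertical axis through the origin and ignores height. The function names are hypothetical, and the conversion shown is one plausible way to reach a common coordinate system, not necessarily the transformation used by the invention.

```python
import math


def orthographic_range_of_point(px, pz, pan_deg):
    """Distance of plan-view point (px, pz) along the panned optical axis."""
    a = math.radians(pan_deg)
    return px * math.sin(a) + pz * math.cos(a)


def image_column_of_point(px, pz, pan_deg, focal_length_px):
    """Horizontal image coordinate (pixels from centre) of the point."""
    a = math.radians(pan_deg)
    lateral = px * math.cos(a) - pz * math.sin(a)
    return focal_length_px * lateral / orthographic_range_of_point(px, pz, pan_deg)


def radial_distance(orthographic_range, column_px, focal_length_px):
    """Convert orthographic range to horizontal distance from the rotation axis."""
    return orthographic_range * math.hypot(1.0, column_px / focal_length_px)


# The same point seen with the camera panned to 0 and to 30 degrees.
f = 1000.0
point = (1.0, 4.0)
ranges = [orthographic_range_of_point(*point, pan) for pan in (0.0, 30.0)]
radials = [
    radial_distance(r, image_column_of_point(*point, pan, f), f)
    for r, pan in zip(ranges, (0.0, 30.0))
]
```

In this example the two orthographic ranges differ while the two radial distances agree, so only the latter can be compared directly across overlapping images.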
The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, a method for generating a panoramic environment map from a plurality of stereo image pairs begins by acquiring a plurality of stereo image pairs of the scene, wherein there is an overlap region between adjacent stereo image pairs. Then orthographic range values are generated from each of the stereo image pairs corresponding to the orthogonal distance of image features in the stereo pairs from an image capture point. The range values are transformed into a coordinate system that reflects a common projection of the features from each stereo image pair and the transformed values are then warped onto a cylindrical surface, forming therefrom a plurality of adjacent warped range images. The adjacent warped range images are registered and then the overlap regions of the registered warped range images are blended to generate a panoramic environment map containing range information. The range information is then concatenated with intensity information to generate an environment map containing both range and intensity information.
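The blending step in the summary above can be illustrated for a single row of an overlap region. The sketch below assumes the two warped range images are already registered and uses a linear weight ramp, with None marking pixels where no range estimate is available. The linear ramp, the handling of missing values, and the function name are assumptions made for illustration and are not taken from the claimed method.

```python
def blend_overlap(left, right):
    """Blend one row of an overlap region from two registered range images.

    left and right are equal-length lists of range values covering the same
    overlap columns; None marks a pixel with no range estimate. The weight of
    the right image rises linearly from the left edge to the right edge.
    """
    if len(left) != len(right):
        raise ValueError("rows must cover the same overlap columns")
    n = len(left)
    blended = []
    for i, (a, b) in enumerate(zip(left, right)):
        w = (i + 0.5) / n  # weight given to the right image at this column
        if a is None and b is None:
            blended.append(None)  # no estimate available in either image
        elif a is None:
            blended.append(b)
        elif b is None:
            blended.append(a)
        else:
            blended.append((1.0 - w) * a + w * b)
    return blended


row = blend_overlap([2.0, 2.0, None, None], [4.0, None, 4.0, None])
```

Where only one image supplies a range value, that value is passed through unchanged, so sparse estimates are preserved rather than diluted.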
While current methods of generating composite panoramic scenes are used in applications such as virtual reality, the advantage of the invention is that augmenting these scenes with range information can extend the degree of interaction experienced by the user in a photorealistic virtual world.
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims, and by reference to the accompanying drawings.