1. Technical Field
The invention is related to stereo reconstruction approaches, and more particularly to a system and process for computing a dense 3D reconstruction associated with a panoramic view of a surrounding scene using multiperspective panoramas derived from a collection of single-perspective images.
2. Background Art
Traditional stereo reconstruction begins with two calibrated perspective images taken with pinhole cameras. To reconstruct the 3D position of a point in the first image, its corresponding point in the second image has to be found before applying triangulation. Perspective cameras have the property that corresponding points lie on straight lines, which are called epipolar lines. In order to simplify the search for correspondences, the two images can optionally be rectified so that epipolar lines become horizontal.
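As an illustration of the triangulation step for a rectified pair, the following minimal sketch uses the standard relation between depth and horizontal disparity, Z = f * B / d. The function name and the parameter names (focal length in pixels, baseline in scene units, disparity in pixels) are assumptions made for this example and do not come from the specification.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth of a point seen in a rectified stereo pair.

    focal_px:     focal length expressed in pixels (assumed known from calibration)
    baseline:     distance between the two camera centers, in scene units
    disparity_px: horizontal offset between the corresponding points, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px
```

The returned depth is in the same units as the baseline.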
Recently, there has been a lot of work on 3D reconstruction from large collections of images. Multi-baseline stereo using several images can produce better depth maps by averaging out noise and reducing ambiguities [12]. Space sweeping approaches, which project multiple images onto a series of imaging surfaces (usually planes), also use significant data redundancy for better reconstruction [3, 18, 22, 9].
Consider the problem of building a 3D environment model from thousands of images captured on video. Many modeling approaches to date have concentrated on coarse reconstruction using structure from motion with a small number (typically hundreds) of tracked feature points. What is really desired, however, are truly photorealistic reconstructions, and these require dense 3D reconstruction. One attempt at generating dense 3D reconstructions involved computing a depth map for each input image [21]. However, this method is computationally expensive. Granted, the expense could be lowered by sub-sampling the input frames (e.g., by simply dropping neighboring frames). However, this risks not having enough overlapping frames to build good correspondences for accurate 3D reconstruction.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, "reference [1]" or simply "[1]". Multiple references will be identified by a pair of brackets containing more than one designator, for example, [3, 18, 22, 9]. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present invention involves a new approach to computing a 3D reconstruction of a scene and associated depth maps from two or more multiperspective panoramas. These reconstructions can be used in a variety of ways. For example, they can be used to support a "look around and move a little" viewing scenario or to extrapolate novel views from original panoramas and a recovered depth map.
The key to this new approach is the construction and use of multiperspective panoramas that efficiently capture the parallax available in the scene. Each multiperspective panorama is essentially constructed by constraining camera motion to a radial path around a fixed rotation center and taking a single-perspective image of the scene at a series of consecutive rotation angles around the center of rotation. A particular columnar portion of each of the single-perspective images captured at each consecutive rotation angle is then rebinned to form the multiperspective panorama. This columnar portion of the single-perspective image can be of any width; however, it is preferred that it be one pixel column wide. The column may also have any available height desired, but it is preferred that the height correspond to the vertical field of view of the associated single-perspective image. Note that each of these columns would have been captured at a different viewpoint, thus the name "multiperspective" panorama. In addition, it is important to note that the aforementioned "particular" column selected from each of the single-perspective images to form the multiperspective panorama refers to the fact that each of the selected columns must have been captured at the same angle relative to the camera lens.
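The rebinning step can be sketched as follows. This is a minimal illustration only: it assumes the single-perspective images are grayscale, ordered by rotation angle, and stacked in a NumPy array of shape (num_angles, height, width). The function name rebin_columns and that array layout are assumptions made for the example.

```python
import numpy as np

def rebin_columns(frames, column):
    """Build one multiperspective panorama from a rotation sequence.

    frames: array of shape (num_angles, height, width), one grayscale
            single-perspective image per consecutive rotation angle
            (assumed layout).
    column: index of the one-pixel-wide column taken from every frame.
    Returns an array of shape (height, num_angles).
    """
    frames = np.asarray(frames)
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (num_angles, height, width)")
    # Take the same column from every frame, then place rotation angle
    # along the horizontal axis of the panorama.
    return frames[:, :, column].T
```

Calling the function once per column index of interest yields one panorama per index, each spanning the full rotation.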
Single-perspective images having the attributes discussed above are preferably captured in one of two ways. In a first technique, slit cameras (or "regular" perspective cameras where only the center "column" is used) are mounted on a rotating bar at various radial distances from the center of rotation. The cameras are directed so as to face perpendicular to the bar, and so tangent to the circle formed by the camera when the bar is rotated. Each slit image is captured at a different angle of rotation with respect to the rotation center, and each radial position on the bar is used to capture images that will be used to construct a separate multiperspective panorama. It is noted that a single camera could also be employed and repositioned at a different radial position prior to each complete rotation of the bar. As the images captured by this method are all concentric, the multiperspective panoramas constructed are referred to as concentric panoramas or mosaics.
The other preferred technique for capturing the desired single-perspective images employs a "regular" perspective camera mounted on a rotating bar or table looking outwards. Here again, each image is captured at a different angle of rotation with respect to the rotation center. However, in this case, the images are captured at the same radial distance from the center of rotation, and each image column having the same off-axis angle from each image is used to construct a different one of the multiperspective panoramas. The off-axis angle is the angle formed between a line extending from the viewpoint of a column towards the lateral centerline of the portion of the scene depicted by the column and a swing line defined by the center of rotation and the viewpoint. For example, the 20th image column taken from each image may be used to form a particular multiperspective panorama. This type of multiperspective panorama has been dubbed a swing panorama.
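To make the relationship between a column index and its off-axis angle concrete, the sketch below assumes an ideal pinhole camera whose optical axis lies along the swing line (that is, the camera looks radially outward) and whose focal length is known in pixels. Neither assumption is stated in the text above, and the function name is hypothetical.

```python
import math

def off_axis_angle(column, width, focal_px):
    """Off-axis angle, in radians, of one image column.

    Assumes an ideal pinhole camera looking along the swing line, with the
    principal point at the horizontal center of the image.
    column:   zero-based column index
    width:    image width in pixels
    focal_px: focal length in pixels
    """
    center = (width - 1) / 2.0
    return math.atan2(column - center, focal_px)
```

Under these assumptions, columns at equal distances on either side of the image center have off-axis angles of equal magnitude and opposite sign.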
Multiperspective panoramas make it simple to compute 3D reconstructions associated with a panoramic image of the scene. Specifically, these reconstructions can be generated using a novel cylindrical sweeping approach, or if conditions are right, traditional stereo matching algorithms.
The cylindrical sweep process involves first projecting each pixel of the multiperspective panoramas being used to compute the depth map onto each of a series of cylindrical surfaces of progressively increasing radii. The radius of each of these cylindrical surfaces also exceeds the radius of the outermost multiperspective panorama employed in the process. It is noted that the change in the radius for each consecutive cylindrical surface should be made as small as possible, in light of computational limitations, so as to maximize the precision of the resulting reconstruction. The projection of the pixels of each multiperspective panorama onto a particular cylindrical surface simply consists of a horizontal translation and a vertical scaling. Next, for each pixel location on each cylindrical surface, a fitness metric is computed for all the pixels projected from each multiperspective panorama onto the pixel location. This fitness metric provides an indication as to how closely a prescribed characteristic of the projected pixels matches across the panoramas. Then, for each respective group of corresponding pixel locations of the cylindrical surfaces, it is determined which particular location of the group has a computed fitness metric that indicates the prescribed characteristic of the projected pixels matches more closely than the rest. This will be referred to as the winning pixel location. Specifically, the winning pixel location can be determined using any appropriate correlation-based or global optimization technique. If correlation-based methods are employed, it is preferred that this entail spatially aggregating the computed fitness metric of each pixel location on each respective cylindrical surface using the computed metrics of its neighboring pixel locations.
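The projection and fitness-metric steps can be sketched as follows, under several stated assumptions. The horizontal translation is derived here from the law of sines for a ray leaving radius R at off-axis angle phi and meeting a cylinder of radius r, giving an azimuth shift of phi - arcsin((R / r) sin(phi)). The variance of intensities across panoramas is used as the fitness metric, which is one possible choice and is not prescribed by the text. The vertical scaling is omitted for brevity, shifts are rounded to whole pixels, each panorama is assumed to span a full 360 degrees, and azimuth is assumed to increase with column index. All function names are hypothetical.

```python
import numpy as np

def azimuth_offset(R, phi, r):
    """Azimuth shift (radians) of a ray that leaves radius R at off-axis
    angle phi when it reaches a cylinder of radius r > R (law of sines)."""
    return phi - np.arcsin((R / r) * np.sin(phi))

def sweep_costs(panoramas, radii, angles, cylinder_radii):
    """Cost volume of shape (num_cylinders, height, width).

    panoramas:      list of (height, width) grayscale arrays, each spanning
                    360 degrees horizontally
    radii, angles:  per-panorama viewpoint radius and off-axis angle (radians)
    cylinder_radii: candidate cylinder radii, each larger than max(radii)
    Vertical scaling is deliberately omitted in this sketch.
    """
    height, width = panoramas[0].shape
    costs = np.empty((len(cylinder_radii), height, width))
    for d, r in enumerate(cylinder_radii):
        warped = []
        for pano, R, phi in zip(panoramas, radii, angles):
            # Convert the azimuth shift to a whole number of columns.
            shift = int(round(azimuth_offset(R, phi, r) * width / (2 * np.pi)))
            # np.roll wraps around, matching the 360-degree panorama.
            warped.append(np.roll(pano, shift, axis=1))
        # Low variance across panoramas means the projected pixels agree.
        costs[d] = np.var(np.stack(warped), axis=0)
    return costs
```

With this metric, a lower cost indicates a closer match, so the winning cylinder at a pixel location is the one with the smallest (optionally aggregated) cost.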
The theory behind this cylindrical sweep approach is that the pixels in each multiperspective panorama, which depict the same portion of the scene, will converge to the same location on one of the candidate cylindrical surfaces at the depth associated with that portion of the scene. For each winning pixel location, its panoramic coordinates are identified and these coordinates are designated to be the position of the portion of the scene depicted by the pixels projected from the multiperspective panoramas to that location. Finally, a depth map can be generated via conventional methods from the panoramic coordinates. It is also noted that if the cylinders are processed in front to back order, occlusion relationships can also be determined via conventional methods.
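A correlation-style winner selection can be sketched as below. The example assumes a precomputed cost volume of shape (num_cylinders, height, width) in which a lower value means a closer match, aggregates each cost over a small square window, and reports the radius of the winning cylinder at each panoramic pixel as a simple depth map. The box window, the wrap-around handling at the horizontal seam, and the function names are assumptions made for illustration; occlusion handling is not attempted.

```python
import numpy as np

def box_aggregate(cost, half=1):
    """Mean of a (height, width) cost over a (2*half+1)^2 window.
    Rows are edge-replicated; columns wrap around the panorama seam."""
    padded = np.pad(cost, ((half, half), (0, 0)), mode="edge")
    padded = np.pad(padded, ((0, 0), (half, half)), mode="wrap")
    h, w = cost.shape
    total = np.zeros((h, w), dtype=float)
    for dy in range(2 * half + 1):
        for dx in range(2 * half + 1):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (2 * half + 1) ** 2

def winner_depth_map(costs, cylinder_radii, half=1):
    """Pick, per pixel, the cylinder with the lowest aggregated cost and
    return its radius as the depth value."""
    aggregated = np.stack([box_aggregate(c, half) for c in costs])
    winners = np.argmin(aggregated, axis=0)
    return np.asarray(cylinder_radii)[winners]
```

A global optimization technique could replace the per-pixel argmin without changing the shape of the inputs or the output.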
It was mentioned earlier that if conditions are right, traditional stereo matching algorithms can also be employed to compute the desired depth map. One example of the right conditions involves the use of a pair of symmetric multiperspective panoramas. A symmetric pair of multiperspective panoramas in the case of swing panoramas is one where the off-axis angles are symmetric with respect to the swing line. In the case of concentric panoramas, a symmetric pair is one where each panorama was generated from images taken by cameras respectively facing in opposite directions at the same radius from the center of rotation. A symmetric pair of multiperspective panoramas has the characteristic that the epipolar geometry consists of horizontal lines. As such, any traditional stereo algorithm requiring this horizontal epipolar geometry (e.g., a hierarchical warp algorithm) can be employed to compute the desired reconstruction and depth maps.
However, it is desirable that more than two multiperspective panoramas be used to compute the depth map in order to obtain a more accurate and robust correspondence, while still employing the less computationally intense stereo algorithms. Such multi-image stereo matching algorithms do exist, but require a horizontal epipolar geometry between the input images. Fortunately, it has been discovered that under certain circumstances the epipolar geometry between the multiperspective panoramas produced in accordance with the methods of the present invention approximates the required horizontal lines closely enough that the aforementioned traditional multi-image stereo matching algorithms can be employed to compute the desired reconstruction. The circumstances that ensure this approximately horizontal epipolar geometry are as follows. First, the distance from the center of rotation to the viewpoints used to capture the images employed to construct the panorama should be no more than about 0.7 of the distance from the center of rotation of each multiperspective panorama to the nearest scene point depicted in the images. Second, in the case of a swing panorama, the off-axis angle should be less than about 15 degrees. If either of these conditions is met, current multi-image stereo matching algorithms can be applied without modification to compute the depth maps (or if only two multiperspective panoramas are available, the less accurate two-image based stereo matching algorithms can be used). It is noted that in the case of a swing panorama, it may be necessary to compensate for a global vertical scaling difference caused by a non-zero off-axis angle before applying the aforementioned traditional algorithms.
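The two conditions can be checked with a short helper such as the one below. The thresholds of 0.7 and 15 degrees are the approximate values given above, so they are exposed as adjustable parameters. The function name and argument names are hypothetical, and the two results are returned separately so the caller can decide how to combine them.

```python
def horizontal_epipolar_checks(camera_radius, nearest_scene_distance,
                               off_axis_angle_deg=0.0,
                               max_ratio=0.7, max_angle_deg=15.0):
    """Evaluate the two approximate conditions for near-horizontal epipolar lines.

    camera_radius:          distance from the rotation center to the viewpoints
    nearest_scene_distance: distance from the rotation center to the nearest
                            scene point
    off_axis_angle_deg:     off-axis angle of a swing panorama, in degrees
    Returns (ratio_ok, angle_ok).
    """
    if nearest_scene_distance <= 0:
        raise ValueError("nearest_scene_distance must be positive")
    ratio_ok = camera_radius / nearest_scene_distance <= max_ratio
    angle_ok = abs(off_axis_angle_deg) < max_angle_deg
    return ratio_ok, angle_ok
```

For example, a viewpoint radius of 0.5 with the nearest scene point at distance 1.0 and an off-axis angle of 10 degrees satisfies both checks.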
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.