1. Field of the Invention
The present invention is directed to a method for recovering 3D structure and camera motion, and more particularly to a linear algorithm for recovering the structure and motion data from points, lines, and/or directly from the image intensities.
2. Prior Art
The science of rendering a 3D model from information derived from a 2D image predates computer graphics, having its roots in the fields of photogrammetry and computer vision.
Photogrammetry rests on the observation that taking a picture projects the 3D world in perspective onto a flat 2D image plane. As a result, a feature seen at a particular point in the 2D image actually lies somewhere along a particular ray that begins at the camera and extends out to infinity. By viewing the same feature in two different photographs, its actual location can be resolved by constraining the feature to lie at the intersection of the two rays. This process is known as triangulation. Using this process, any point seen in at least two images can be located in 3D. With a sufficient number of points, it is also possible to solve for unknown camera positions. The techniques of photogrammetry and triangulation were used in such applications as creating topographic maps from aerial images. However, the photogrammetry process is time intensive and inefficient.
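The triangulation step described above can be sketched as follows. This is a minimal illustration, not part of the invention: it assumes each image feature has already been converted into a ray in world coordinates (a camera position plus a direction), and it uses the common midpoint method, which returns the point halfway between the closest points of the two rays, because noisy rays rarely intersect exactly. The function name `triangulate_midpoint` is hypothetical.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to the two rays c_k + t_k * d_k (midpoint method).

    c1, c2: camera positions; d1, d2: ray directions (need not be unit length).
    """
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    # Least-squares solve of t1*d1 - t2*d2 = c2 - c1, which minimises the
    # distance between the point on ray 1 and the point on ray 2.
    A = np.column_stack([d1, -d2])
    t, *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    p1 = c1 + t[0] * d1  # closest point on ray 1
    p2 = c2 + t[1] * d2  # closest point on ray 2
    # For noise-free rays p1 == p2; otherwise take the midpoint of the gap.
    return 0.5 * (p1 + p2)
```

For two noise-free rays aimed at the same scene point, the function returns that point. If the rays are parallel (no baseline between the cameras), the depth along the rays is undetermined and the result is meaningless.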
Computer vision techniques include recovering 3D scene structure from stereo images, where correspondence between the two images is established automatically by an iterative algorithm that searches for matching points in order to reconstruct the 3D scene. It is also possible to solve for the camera position and motion using the 3D scene structure recovered from stereo images.
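As a hedged illustration of searching for matches between two stereo images, the sketch below assumes a rectified pair (corresponding points lie on the same scanline) and uses a simple exhaustive sum-of-squared-differences (SSD) search over a fixed disparity range. It is not the iterative algorithm referred to above, only the basic matching idea. The function `match_scanline` and its parameters `half_window` and `max_disparity` are hypothetical.

```python
import numpy as np

def match_scanline(left_row, right_row, x, half_window=2, max_disparity=16):
    """Return the disparity of pixel x of left_row by SSD search along right_row.

    Assumes a rectified pair, so the match of left pixel x lies at x - d in the
    right row for some d in [0, max_disparity], and that x is at least
    half_window pixels away from both ends of the row.
    """
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)
    patch = left_row[x - half_window : x + half_window + 1]
    best_d, best_cost = None, np.inf
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:  # candidate window would run off the row
            break
        candidate = right_row[xr - half_window : xr + half_window + 1]
        cost = np.sum((patch - candidate) ** 2)  # sum of squared differences
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Exhaustive search is the simplest design choice; it fails where the patch has little texture, which is the low-quality data problem discussed below.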
Current computer techniques focus on motion-based reconstruction, a natural application of computer technology to the problem of inferring 3D structure (geometry) from 2D images. This is known as structure from motion (SFM): the problem of reconstructing an unknown 3D scene from multiple 2D images, which is one of the most studied problems in computer vision.
Most currently known SFM algorithms reconstruct the scene from previously computed feature correspondences, usually tracked points. Other algorithms are direct methods, which reconstruct from the image intensities without a separate stage of correspondence computation. Previous direct methods were limited to a small number of images, required strong assumptions about the scene (usually planarity), or employed iterative optimization and required a starting estimate.
These approaches have complementary advantages and disadvantages. Usually some fraction of the image data is of such low quality that it cannot be used to determine correspondence. Feature-based methods address this problem by pre-selecting a few distinctive point or line features that are relatively easy to track, while direct methods attempt to compensate for the low quality of some of the data by exploiting the redundancy of the total data. Feature-based methods have the advantage that their input data is relatively reliable, but they neglect most of the available image information and give only sparse reconstructions of the 3D scene. Direct methods have the potential to give dense and accurate 3D reconstructions, due to the redundancy of their input data, but they can be unduly affected by large errors in a fraction of the data.
A method based on tracked lines is described in M. Spetsakis, “A Linear Algorithm for Point and Line Based Structure from Motion,” CVGIP 56:2, 230-241, 1992, which presented the original linear algorithm for 13 lines in 3 images. An optimization approach is disclosed in C. J. Taylor and D. Kriegman, “Structure and Motion from Line Segments in Multiple Images,” PAMI 17:11, 1021-1032, 1995. Additionally, D. Morris and T. Kanade, “A Unified Factorization Algorithm for Points, Line Segments and Planes with Uncertainty Models,” ICCV 696-702, 1998, describes work on lines in an affine framework. A projective method for lines and points is described in B. Triggs, “Factorization Methods for Projective Structure and Motion,” CVPR 845-851, 1996, which involves computing the projective depths from a small number of frames. In “In Defense of the Eight-Point Algorithm,” PAMI 19, 580-593, 1995, Hartley presented a full perspective approach that reconstructs from points and lines tracked over three images.
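The factorization idea behind projective methods such as Triggs's can be sketched under a strong simplifying assumption: the projective depths λ_ij are already known. In the cited work they are computed from a small number of frames; this sketch omits that step entirely. Once each homogeneous image point x_ij is rescaled by its depth, the stacked measurement matrix equals the product of the stacked 3×4 camera matrices and the homogeneous 3D points, so it has rank 4, and a truncated SVD splits it into cameras and structure. The function name `projective_factorization` is hypothetical.

```python
import numpy as np

def projective_factorization(points, depths):
    """Rank-4 factorization of the rescaled measurement matrix, given projective depths.

    points: array (m, n, 3) of homogeneous image points x_ij (camera i, point j).
    depths: array (m, n) of projective depths lambda_ij (assumed known here).
    Returns (P, X): P has shape (3m, 4) (stacked camera matrices), X has shape (4, n).
    """
    points = np.asarray(points, dtype=float)
    depths = np.asarray(depths, dtype=float)
    m, n, _ = points.shape
    # Rescaled measurement matrix: rows 3i..3i+2, column j hold lambda_ij * x_ij.
    W = (depths[:, :, None] * points).transpose(0, 2, 1).reshape(3 * m, n)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the four dominant singular values and split them evenly between P and X.
    root = np.sqrt(s[:4])
    P = U[:, :4] * root
    X = root[:, None] * Vt[:4]
    return P, X
```

The recovered P and X are defined only up to an arbitrary invertible 4×4 matrix H, since (P H)(H⁻¹ X) gives the same product; this is the projective ambiguity inherent in uncalibrated reconstruction.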
The approach described in M. Irani, “Multi-Frame Optical Flow Estimation Using Subspace Constraints,” ICCV 626-633, 1999, reconstructs directly from the image intensities. The essential step in Irani's method for recovering correspondence is a multi-frame generalization of the optical-flow approach described in B. Lucas and T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision,” IJCAI 674-679, 1981, which relies on a smoothness constraint rather than on the rigidity constraint. Irani uses the factorization of D only to fill in the entries of D that could not be computed initially.
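The Lucas-Kanade smoothing constraint assumes a single shift d for all pixels in a small window. A minimal sketch of the per-window solve, assuming precomputed spatial derivatives and temporal differences and the sign convention ∇I·d = −ΔI (consistent with Δ = −DI), is shown below. The function name `lucas_kanade_window` is hypothetical; the 2×2 matrix it builds contains the summed gradient products I_a I_b, and its right-hand side contains sums of the form I_a ΔI, the same quantities that enter Irani's multi-frame form.

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, dI):
    """Estimate one 2D shift d shared by every pixel of a smoothing window.

    Ix, Iy: spatial intensity derivatives at the window's pixels.
    dI:     temporal intensity differences (Delta I) at the same pixels.
    Minimises sum over the window of (Ix*dx + Iy*dy + dI)**2, whose normal
    equations are G d = -b with G = sum grad I grad I^T, b = sum grad I * dI.
    """
    Ix, Iy, dI = (np.ravel(np.asarray(a, dtype=float)) for a in (Ix, Iy, dI))
    # Summed products of gradient components (I_a I_b) over the window.
    G = np.array([[Ix @ Ix, Ix @ Iy],
                  [Ix @ Iy, Iy @ Iy]])
    # Summed products I_a * Delta I.
    b = np.array([Ix @ dI, Iy @ dI])
    # pinv handles a singular G (intensity constant along one direction):
    # only the shift component along the gradient is then recovered.
    return -np.linalg.pinv(G) @ b
```

When the intensity varies in only one direction, G is singular and the window determines only part of the shift (the aperture problem); this is exactly the case in which Irani falls back on the rank constraint.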
Irani writes the brightness constancy equation (7) in matrix form as $\Delta = -DI$, where $D$ tabulates the shifts $d_i$ and $I$ contains the intensity gradients $\nabla I(p_n)$. Irani notes that $D$ has rank 6 (for a camera with known calibration), which implies that $\Delta$ has rank at most 6. To reduce the effects of noise, Irani projects the observed $\Delta$ onto a matrix of rank 6. Irani then applies a multi-image form of the Lucas-Kanade approach to recovering optical flow, which yields a matrix equation $D I_2 = -\Delta_2$, where the entries of $I_2$ are products of intensity-gradient components $I_a I_b$ summed over the “smoothing” windows, and the entries of $\Delta_2$ have the form $I_a \Delta I$. Due to the added Lucas-Kanade smoothing constraint, the shifts $D$ (or $d_{ni}$) can be computed as $D = -\Delta_2 [I_2]^{+}$, where $[I_2]^{+}$ denotes the pseudo-inverse of $I_2$, except in smoothing windows where the image intensity is constant in at least one direction. Using the rank constraint on $D$, Irani determines additional entries of $D$ for the windows where the intensity is constant in one direction.
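The two linear-algebra operations in this description, projecting an observed matrix onto one of rank 6 and solving D I_2 = −Δ_2 with a pseudo-inverse, can be sketched with NumPy as below. The sketch assumes the matrices are already assembled. The names `project_to_rank` and `solve_shifts` are hypothetical, and the truncated SVD is the standard way to obtain the closest rank-r matrix in the Frobenius norm (the Eckart-Young theorem), not necessarily Irani's exact procedure.

```python
import numpy as np

def project_to_rank(M, r=6):
    """Closest matrix of rank <= r to M in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    s[r:] = 0.0  # discard the singular values attributed to noise
    return (U * s) @ Vt

def solve_shifts(Delta2, I2):
    """Least-squares solution of D @ I2 = -Delta2, i.e. D = -Delta2 @ pinv(I2)."""
    return -np.asarray(Delta2, dtype=float) @ np.linalg.pinv(np.asarray(I2, dtype=float))
```

If I_2 is singular because some windows have intensity constant in one direction, the pseudo-inverse returns only the minimum-norm solution for those windows, which is why an additional constraint such as the rank of D is needed to fill in the missing entries.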