Estimation of the structure of a scene and the motion of a camera rig using only visual input data has been an important problem in computer vision research. Structure and motion (also called structure from motion) is the problem of reconstructing the geometry of a scene from a stream of images based on the estimation of camera motion. Conventional systems rely on finite points or line segments for structure and motion estimation. However, structure and motion estimation based on finite line segments has the drawback of limiting the quality of three-dimensional (3D) structure reconstruction, because the end points of the line segments cannot be determined reliably. In the worst case, such as a cluttered environment where lines are often occluded, the end points may be false corners or unstable T-junctions. The unreliable determination of the end points of occluded line segments often leads to poor-quality 3D structure reconstruction from line correspondences across multiple views.
A variation of structure and motion estimation based on finite line segments tries to minimize reprojection error in the image plane while allowing the end points to vary across views. However, the cost function associated with this method can be optimized only by iterative local minimization, which requires initialization and is not amenable to a real-time implementation. Similarly, existing line-based systems for structure and motion estimation are not optimized for real-time implementation.
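For illustration only, the following minimal sketch shows one common way to write a line reprojection error that tolerates variable end points: the 3D line is projected into the image as an infinite line, and the cost is the sum of squared perpendicular distances from the observed segment end points to that line. This is an assumed formulation, not necessarily the cost function of any particular prior system; the pinhole camera model, the function names, and the numeric values are all hypothetical.

```python
import math

def project_point(K, R, t, X):
    """Project 3D point X to pixel coordinates with an assumed pinhole camera K [R | t]."""
    # Camera-frame coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous pixel coordinates: x = K Xc
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])

def line_through(p, q):
    """Homogeneous image line (a, b, c) through pixels p and q (cross product of the two points)."""
    return (p[1] - q[1], q[0] - p[0], p[0] * q[1] - q[0] * p[1])

def point_line_distance(line, p):
    """Perpendicular distance in pixels from pixel p to the infinite line (a, b, c)."""
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / math.hypot(a, b)

def line_reprojection_error(K, R, t, P, Q, obs_start, obs_end):
    """Sum of squared distances from the observed end points to the projected 3D line through P and Q."""
    projected = line_through(project_point(K, R, t, P), project_point(K, R, t, Q))
    return (point_line_distance(projected, obs_start) ** 2
            + point_line_distance(projected, obs_end) ** 2)

# Hypothetical camera: focal length 500 px, principal point (320, 240), identity pose.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

# Hypothetical 3D line through P and Q; it projects onto the image row y = 240.
P, Q = (-1.0, 0.0, 5.0), (1.0, 0.0, 5.0)

# End points that slide along the projected line contribute no error.
err_on_line = line_reprojection_error(K, R, t, P, Q, (250.0, 240.0), (300.0, 240.0))

# End points 2 px and 1 px away from the line contribute 2**2 + 1**2.
err_off_line = line_reprojection_error(K, R, t, P, Q, (250.0, 242.0), (300.0, 239.0))
```

Because such a cost depends non-linearly on the camera rotation and translation and on the 3D line parameters, minimizing it over several views generally calls for an iterative solver started from an initial estimate, which is the limitation noted above.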
Multifocal tensors are another commonly used approach for structure and motion estimation from several line correspondences. However, this approach involves extensive bookkeeping to enforce non-linear dependencies within tensor indices. Further, this approach uses a large number of line correspondences to produce a linear estimate, which can be arbitrarily far from a rigid-body motion in the presence of noise. Thus, the multifocal tensor approach is too cumbersome for estimating even a 6-degree-of-freedom (dof) motion between two calibrated stereo pairs.