Visual odometry is a process of determining the position and orientation of a moving object (e.g., a robot) by analyzing the associated camera images. Real-time estimation of camera motion using only sparse sets of features from visual input in visual odometry is an active and challenging research topic in the computer vision and robotics communities. The number of features (e.g., points and lines) observed, noise-level (in feature detection as well as tracking) and their distribution, all have a major impact on the final camera motion estimate. For a real-time and robust implementation, it is preferable to have a unified framework that, independent of feature type, computes the camera motion from minimal sets over the available data.
One existing method for performing visual odometry from feature points with stereo cameras uses three feature points extracted from visual input data (e.g., video frames). Since the polynomial constraint for deriving camera motion parameters is configured to use the triangle law of cosines, this approach works only for a configuration of three points in general position and is therefore constrained in a random sample consensus (RANSAC) framework for establishing support.
Other existing methods (e.g., camera motion from four or five known three-dimensional (3D) points) solve polynomial equation systems that are established from geometrical constraints by enforcing algebraic conditions (e.g., rotation matrix orthonormality). The existing methods have tried to develop minimal solver based systems for point feature odometry, but did not make use of line features of visual input data to develop a robust, real-time camera motion estimation system.
Traditionally, line features have been employed in structure from motion algorithms using a multifocal tensor framework. A trifocal tensor is a 3×3×3 cube operator that expresses the projective geometric constraints between three views independent of scene structure. In general, the trifocal tensor has 27 parameters, but only 18 degrees of freedom (up to projective ambiguity). The remaining 9 constraints must be enforced to obtain a consistent solution. Existing four-view extension of multifocal tensor framework exploits the known orientation between a stereo pair of cameras in a quadrifocal tensor to enforce constraints between image intensities of adjacent stereo pairs. However, the existing multifocal tensor framework faces challenges in camera motion estimation due to lack of an efficient unified, closed-form formulation that makes full use of point and line features in a multifocal tensor framework.