Visual odometry is the process of incrementally estimating the current state of an object (defined as position and heading) using only camera images. Typical monocular visual odometry comprises feature extraction, feature matching between images, motion estimation, and local optimization. The features, typically extracted at corner points (e.g., corners of buildings and/or the like), are used to establish correspondences between two temporally spaced monocular images. A feature matching framework is employed to filter out incorrect correspondences and return a list of pixel-wise matches. The motion estimation step uses the pairwise matches and the camera matrix to recover the rotation and translation of the camera between the two images. The local optimization step constrains the space of rotation and translation so that the visible triangulated feature points remain consistent across all the frames.
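The incremental nature of the estimate can be illustrated with a minimal sketch, assuming a planar state of position and heading and assuming that each motion estimation step has already produced a relative motion expressed in the frame of the previous pose. The names `compose` and `integrate` are hypothetical and are used only for illustration; the sketch does not perform feature extraction, matching, or local optimization.

```python
import math


def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), expressed in the frame of
    the current pose, to a global pose (x, y, heading in radians)."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    # Rotate the relative translation into the global frame, then add it.
    x_new = x + math.cos(theta) * dx - math.sin(theta) * dy
    y_new = y + math.sin(theta) * dx + math.cos(theta) * dy
    # Keep the heading wrapped into [-pi, pi].
    heading = theta + dtheta
    theta_new = math.atan2(math.sin(heading), math.cos(heading))
    return (x_new, y_new, theta_new)


def integrate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain frame-to-frame motions into a trajectory of global poses."""
    poses = [start]
    for delta in deltas:
        poses.append(compose(poses[-1], delta))
    return poses


if __name__ == "__main__":
    # Assumed input: four identical steps, each one unit forward followed by
    # a quarter turn to the left, which traces a closed square.
    steps = [(1.0, 0.0, math.pi / 2)] * 4
    for pose in integrate(steps):
        print(tuple(round(v, 3) for v in pose))
```

Because each pose is built from the previous one, an error in any single relative motion carries into every later pose, which is one reason a local optimization step is commonly included.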
The feature extraction and matching steps in typical visual odometry are highly sensitive to illumination conditions. Typical feature locations, such as those associated with corner points, are unstable across changes in scale and orientation. Additionally, standard monocular or stereo visual odometry requires highly textured regions to perform image matching. As a result, in imagery captured in low illumination conditions, such as at dusk and during the night, the lack of ambient illumination leaves insufficient discernible features and/or texture to effectively match features between images and to perform the local optimization, which may cause traditional visual odometry systems to fail or to incorrectly estimate the state of the object.
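The effect of low illumination on feature availability can be illustrated with a simplified sketch. It assumes a synthetic checkerboard image, models reduced illumination as a uniform gain on pixel intensity, and uses a fixed threshold on the image gradient as a stand-in for a corner detector's response threshold. All function names, the gain values, and the threshold are hypothetical choices made for illustration and do not represent any particular detector.

```python
def make_checkerboard(size=32, cell=8, dark=40, bright=200):
    """Build a synthetic 8-bit checkerboard as a list of rows."""
    return [
        [bright if ((r // cell) + (c // cell)) % 2 else dark for c in range(size)]
        for r in range(size)
    ]


def scale_brightness(image, gain):
    """Model a change in illumination as a uniform gain, quantized to 8 bits."""
    return [[min(255, int(p * gain)) for p in row] for row in image]


def count_strong_gradients(image, threshold=50):
    """Count interior pixels whose central-difference gradient meets a fixed
    threshold (a simplified proxy for detectable features)."""
    count = 0
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            if abs(gx) + abs(gy) >= threshold:
                count += 1
    return count


if __name__ == "__main__":
    scene = make_checkerboard()
    # Assumed gains: 1.0 for well-lit imagery, 0.1 for low illumination.
    for label, gain in (("well lit", 1.0), ("low light", 0.1)):
        n = count_strong_gradients(scale_brightness(scene, gain))
        print(label, n)
```

In this sketch the scene content is identical in both cases; only the contrast changes, and with it the number of locations that pass the fixed threshold.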