Finding feature correspondences between two 2D images is an important task in many computer vision applications. Examples of such applications, illustrated in FIG. 1, include stereo vision, motion tracking, structure from motion (SfM), feature recognition, odometry, and simultaneous localization and mapping (SLAM).
Traditionally, features have been tracked using conventional visual tracking (VT) methods such as optical flow, which depend on image derivatives, patch matching, correlation, or optimization. The Lucas-Kanade tracker is one implementation of an optical flow method. These methods can provide high accuracy, but at the cost of speed: the number of features tracked is normally limited because of the computational cost incurred by tracking each feature.
Recently, with advances in machine learning and deep learning, much work on this topic has focused on using a neural network pre-trained on mostly synthetic natural imagery with known ground truth motion. The advantage of such machine learning methods is their speed, which can allow for real-time tracking; their disadvantage is their lower accuracy.
Some efforts have been made to combine the two approaches by using machine learning-based tracking (MLT) to detect multi-pixel features, then feeding the results to a VT module that tracks the detected features. These efforts have the disadvantage that the initial detection process requires specific targets, so they cannot be applied to track individual pixels without any target in mind.
There is, therefore, a need for a detection-less (meaning that every pixel is a potential feature) method of tracking features that can provide the speed and efficiency characteristic of machine learning-based tracking techniques, as well as the high accuracy and reliability characteristic of conventional visual tracking techniques.