Feature extraction and matching are common low-level building blocks in computer vision pipelines. By tracking feature points temporally, ego-motion of the capturing platform or a motion model of the observed objects in a scene may be estimated. In order to track the feature points, a matching algorithm is used to find the most probable correspondences between feature points in a reference frame and feature points in a target frame. In order to match pairs of feature points, each feature point is represented by a descriptor. The matching algorithm uses a distance function to compare two descriptors. The pair with the minimal distance is taken as the best correspondence.
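By way of illustration only, the distance-based selection described above may be sketched as follows. The sketch assumes binary descriptors stored as Python integers and a Hamming distance function; neither the descriptor type nor the distance function is specified above, and the names `hamming_distance` and `best_match` are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two binary descriptors (assumed format)."""
    return bin(a ^ b).count("1")


def best_match(ref_desc, target_descs):
    """Return (index, distance) of the target descriptor closest to ref_desc.

    Returns (None, None) when the target set is empty.
    """
    best_idx, best_dist = None, None
    for j, t in enumerate(target_descs):
        d = hamming_distance(ref_desc, t)
        # Strict comparison keeps the first candidate when distances tie.
        if best_dist is None or d < best_dist:
            best_idx, best_dist = j, d
    return best_idx, best_dist


# Distances to the three targets are 4, 1 and 2, so target 1 is selected.
print(best_match(0b1010, [0b0101, 0b1011, 0b0000]))  # (1, 1)
```

Other descriptor types would use a different distance function (for example, a Euclidean distance for floating-point descriptors), but the minimal-distance selection remains the same.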
Conventional techniques apply a brute-force approach to matching feature points, comparing each reference feature against every target feature. The result is a list of correspondences, with respective matching distances, that gives the best match (if one exists) in the target feature set for each reference feature. The matching process is prone to errors in some cases. In one example, a specific reference feature point may not have an actual target feature point match. As a result, the next best target feature point is wrongly matched with the specific reference feature point. In another example, a specific reference feature point may have an actual target feature point match, but the best match score wrongly points to another target feature point.
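A minimal sketch of such a brute-force matcher, again assuming integer-encoded binary descriptors compared by Hamming distance, is given below. The function name `brute_force_match` and the sample descriptor values are hypothetical and chosen only to reproduce the first error case described above.

```python
def brute_force_match(ref_descs, target_descs):
    """For every reference descriptor, find the closest target descriptor.

    Returns a list of (ref_index, target_index, distance) tuples.
    """
    matches = []
    for i, r in enumerate(ref_descs):
        best_j, best_d = None, None
        for j, t in enumerate(target_descs):
            d = bin(r ^ t).count("1")  # Hamming distance (assumed metric)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j, best_d))
    return matches


# Reference 2 has no true counterpart in the target set.
ref = [0b11110000, 0b00001111, 0b10101010]
tgt = [0b11110001, 0b00001111]

for match in brute_force_match(ref, tgt):
    print(match)
# (0, 0, 1)
# (1, 1, 0)
# (2, 1, 4)   <- spurious: the next best target is reported anyway
```

Because the matcher always reports the nearest target, reference feature 2 is paired with target feature 1 even though target feature 1 is a far better match for reference feature 1.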
In order to minimize such errors, a cross-check process is applied. The conventional cross-check process is computationally expensive because it either runs the matching process twice or uses large memory buffers to store intermediate descriptor matching results. A simpler approach runs the second, cross-check matching pass only on target features that were identified as possible pairs in the first-pass matching process. However, in a worst-case scenario, the full cross-check process still has to be performed, which is costly.
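The two conventional variants described above may be sketched as follows, under the same assumptions as before (integer-encoded binary descriptors, Hamming distance). The names `nearest`, `cross_check_full` and `cross_check_on_demand` are hypothetical, and the sketch is one possible reading of the conventional process rather than a definitive implementation.

```python
def nearest(query, candidates):
    """Return (index, distance) of the candidate closest to query."""
    best_idx, best_dist = None, None
    for k, c in enumerate(candidates):
        d = bin(query ^ c).count("1")  # Hamming distance (assumed metric)
        if best_dist is None or d < best_dist:
            best_idx, best_dist = k, d
    return best_idx, best_dist


def cross_check_full(ref_descs, target_descs):
    """Run the matching twice and keep only mutual best matches."""
    forward = [nearest(r, target_descs) for r in ref_descs]
    backward = [nearest(t, ref_descs) for t in target_descs]
    return [(i, j, d) for i, (j, d) in enumerate(forward)
            if j is not None and backward[j][0] == i]


def cross_check_on_demand(ref_descs, target_descs):
    """Run the reverse search only for targets selected in the first pass.

    Returns the mutual matches and the number of reverse searches performed.
    """
    forward = [nearest(r, target_descs) for r in ref_descs]
    backward = {}  # target index -> index of its best reference
    matches = []
    for i, (j, d) in enumerate(forward):
        if j is None:
            continue
        if j not in backward:
            backward[j] = nearest(target_descs[j], ref_descs)[0]
        if backward[j] == i:
            matches.append((i, j, d))
    return matches, len(backward)


ref = [0b11110000, 0b00001111, 0b10101010]
tgt = [0b11110001, 0b00001111, 0b01010101]

print(cross_check_full(ref, tgt))       # [(0, 0, 1), (1, 1, 0)]
print(cross_check_on_demand(ref, tgt))  # ([(0, 0, 1), (1, 1, 0)], 2)
```

In this example the on-demand variant performs two reverse searches instead of three, because target feature 2 is never selected in the first pass. When every target feature is selected by at least one reference feature, the on-demand variant performs as many reverse searches as the full cross-check, which corresponds to the worst-case scenario noted above.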
It would be desirable to implement an approximate cross-check for real-time feature matching.