Various computer vision tasks, including camera pose estimation, tracking, multi-view stereo estimation, structure-from-motion determination, co-segmentation, retrieval, and compression, call for identifying parts of a first image that correspond to parts of a second image. Determining the corresponding parts of a first image and a second image is referred to as correspondence estimation: the task of estimating how parts of visual signals (e.g., images or volumes) correspond to each other.
Current techniques for determining correspondence between visual signals include modelling photometric and geometric transformations (e.g., occlusions, large displacements, viewpoint changes, shading, and illumination changes) and leveraging the resulting models to infer correspondence. For example, a popular approach for determining correspondence between visual signals includes detecting interest or salient points in a visual signal (e.g., an image or volume) and matching them to interest or salient points in another visual signal by measuring the Euclidean distance between descriptors that are designed to be invariant to certain classes of transformations. While these techniques can generate accurate matches, the computational complexity of matching candidate interest points restricts their applicability to a small number of key points. That is, the computational complexity limits the scalability of current techniques.
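The scalability limit described above can be illustrated with a minimal sketch of brute-force descriptor matching. The function below is an assumption for illustration, not a specific system described in this document: it treats each interest point's descriptor as a fixed-length vector and pairs points across two images by smallest Euclidean distance. The pairwise distance computation costs O(N x M x D) for N and M descriptors of dimension D, which is why the approach becomes expensive as the number of key points grows.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, max_distance=0.5):
    """Brute-force matching of descriptor vectors by Euclidean distance.

    desc_a: (N, D) array of descriptors from the first image.
    desc_b: (M, D) array of descriptors from the second image.
    Returns a list of (i, j) index pairs where desc_b[j] is the
    nearest neighbour of desc_a[i] and lies within max_distance.
    Cost is O(N * M * D), which limits scalability to large key-point sets.
    """
    # Pairwise Euclidean distances via broadcasting: shape (N, M).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    matches = []
    for i in range(dist.shape[0]):
        j = int(dist[i].argmin())       # nearest descriptor in the second image
        if dist[i, j] <= max_distance:  # reject matches that are too far apart
            matches.append((i, j))
    return matches

# Two toy descriptor sets: point 0 of image A matches point 1 of image B,
# and point 1 of image A matches point 0 of image B.
a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[1.0, 1.05], [0.0, 0.1]])
pairs = match_descriptors(a, b)  # → [(0, 1), (1, 0)]
```

Practical systems typically refine this with a ratio test or cross-check and replace the exhaustive search with approximate nearest-neighbour indexing, but the quadratic matching cost sketched here is the bottleneck the passage refers to.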