1. Field
One feature relates to computer vision and, more particularly, to methods and techniques for improving the performance and efficiency, and reducing the computational complexity, of image recognition techniques.
2. Background
Various applications may benefit from having a machine or processor that is capable of identifying objects in a visual representation (e.g., an image or picture). The field of computer vision attempts to provide techniques and/or algorithms that permit identifying objects or features in an image, where an object or feature may be characterized by descriptors identifying one or more keypoints. These techniques and/or algorithms are often also applied to face recognition, object detection, image matching, 3-dimensional structure construction, stereo correspondence, and/or motion tracking, among other applications. Generally, object or feature recognition may involve identifying points of interest (also called keypoints) in an image for the purpose of feature identification, image retrieval, and/or object recognition. Preferably, the keypoints may be selected and the patch(es) around them processed such that they are invariant to image scale changes and/or rotation and provide robust matching across a substantial range of distortions, changes in point of view, and/or noise and change in illumination. Further, in order to be well suited for tasks such as image retrieval and object recognition, the feature descriptors may preferably be distinctive in the sense that a single feature can be correctly matched with high probability against a large database of features from a plurality of target images.
After the keypoints in an image are detected and located, they may be identified or described by using various descriptors. For example, descriptors may represent the visual features of the content in images, such as shape, color, texture, and/or rotation, among other image characteristics. The individual features corresponding to the keypoints and represented by the descriptors are then matched to a database of features from known objects. Therefore, a correspondence searching system can be separated into three modules: keypoint detector, feature descriptor, and correspondence locator. Among these three logical modules, the descriptor's construction complexity and dimensionality have a direct and significant impact on the performance of the feature matching system.
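By way of illustration only, the correspondence locator module described above may be sketched as a brute-force nearest-neighbor search. The function name `match_descriptors`, the `max_distance` parameter, and the use of Euclidean distance are assumptions made for this sketch and are not taken from the description; the point is that the matching cost grows with the number of descriptors and with their dimensionality.

```python
import math


def match_descriptors(query, database, max_distance):
    """Hypothetical brute-force correspondence locator.

    For each query descriptor, find the closest database descriptor by
    Euclidean distance and keep the pair if the distance is small enough.
    Cost is roughly len(query) * len(database) * descriptor dimensionality.
    """
    matches = []
    for query_index, q in enumerate(query):
        best_index, best_dist = -1, math.inf
        for db_index, d in enumerate(database):
            dist = math.dist(q, d)  # Euclidean distance (assumed metric)
            if dist < best_dist:
                best_index, best_dist = db_index, dist
        if best_dist <= max_distance:
            matches.append((query_index, best_index, best_dist))
    return matches


# Toy 2-dimensional descriptors; practical descriptors have far more dimensions.
query = [(0.0, 0.0), (10.0, 10.0)]
database = [(0.1, 0.0), (5.0, 5.0)]
matches = match_descriptors(query, database, max_distance=1.0)
```

In this toy example only the first query descriptor has a database descriptor within the assumed distance limit, so only one match is retained.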
Such feature descriptors are increasingly finding applications in real-time object recognition, 3D reconstruction, panorama stitching, robotic mapping, video tracking, and similar tasks. Depending on the application, transmission and/or storage of feature descriptors (or equivalent) can limit the speed of computation of object detection and/or the size of image databases. In the context of mobile devices (e.g., camera phones, mobile phones, etc.) or distributed camera networks, significant communication and power resources may be spent in transmitting information (e.g., including an image and/or image descriptors) between nodes. Feature descriptor compression is hence important for reduction in storage, latency, and transmission.
Computer vision and/or image capture implementations tend to be processing intensive. Object recognition is often hampered by an imprecise feature matching process that is exacerbated by affine transformations and other distortions, leading to fewer true positives (reduced recognition) and more false positives (reduced precision). In areas of computer vision such as the classifier stage of object recognition systems, wide baseline stereo matching, and pose estimation, an important step is the fitting of a correct model using contaminated data. A basic assumption is that the data consists of “inliers”, i.e., data (or points) whose distribution can be explained by some set of model parameters, and “outliers”, which are data that do not fit the model. Geometric consistency or verification is often imposed to reject outliers after the matching process in an object recognition system, but the computational cost is high and often prevents real-time operation of object recognition systems. The parameters of a data fitting model might be used, for example, for the estimation of a fundamental matrix in stereo matching or of a projective transformation for outlier rejection in object recognition and image stitching. For example, RANdom SAmple Consensus (RANSAC) is a data fitting technique widely used with contaminated data; it works by randomly sampling a set of points from the data to estimate model parameters and then iteratively verifying those parameters against all of the data to determine the fit. However, as the ratio of inliers to outliers drops, a RANSAC algorithm becomes exponentially slower (i.e., its convergence rate decreases).
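By way of illustration only, the RANSAC procedure and its sensitivity to the inlier ratio may be sketched as follows. The sketch fits a 2D line rather than a fundamental matrix or projective transformation, purely to keep it short. The function names, the distance threshold, and the default confidence value are assumptions made for this sketch. The iteration estimate uses the commonly cited relation N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s is the sample size, and p is the desired confidence.

```python
import math
import random


def required_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed so that, with probability `confidence`,
    at least one random sample consists only of inliers."""
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample <= 0.0:
        raise ValueError("inlier_ratio must be greater than zero")
    if p_good_sample >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


def ransac_line(points, threshold, iterations, seed=0):
    """Hypothetical RANSAC sketch: return the largest inlier set found
    for a line through two randomly sampled points."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two samples, written as a*x + b*y + c = 0.
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue  # degenerate sample: the two points coincide
        c = -(a * x1 + b * y1)
        # Verify the candidate model against all of the data.
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Twenty points on y = 2x + 1 plus five arbitrary outliers.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 50.0), (7.0, -20.0), (12.0, 100.0), (15.0, -40.0), (1.0, 30.0)]
inliers = ransac_line(line_points + outliers, threshold=0.5, iterations=100)

# Iteration estimates for a four-point sample at two inlier ratios.
iterations_half_inliers = required_iterations(0.5, 4)
iterations_tenth_inliers = required_iterations(0.1, 4)
```

Under these assumptions, the estimated iteration count rises steeply as the inlier ratio falls, which is consistent with the slowdown noted above.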
Therefore, there is a need to improve the slow convergence rate of geometric verification techniques and/or eliminate the need for geometric verification.