People are increasingly interacting with computers and other electronic devices in new and interesting ways. For example, mobile devices increasingly offer multiple high quality cameras that enable additional types of functionality. In some of these devices, one or more pairs of these high quality cameras can be used to provide three-dimensional (“3D”) image capture, such as stereoscopic image capture, for both still and video imaging. Additionally, the availability of these high quality cameras allows for a growing number of large digital image collections, where applications operating on these devices can use a camera to initiate search queries about objects in visual proximity to the user. Such applications can be used for identifying products, comparison shopping, finding information about movies, etc.

Conventional systems have utilized feature-based algorithms, such as the scale-invariant feature transform (SIFT) or speeded up robust features (SURF) algorithms, to identify distinguishing feature points and calculate a descriptor (a unique fingerprint) for each feature point. In order to match the feature points identified by these algorithms to real-world objects, a computing device, or a system in communication therewith, must compare the feature points to images stored for these real-world objects. Unfortunately, because the number of objects and feature points is so large, image databases often lack images of each object from all possible angles and under the various possible lighting conditions. Further, the feature points can be subject to geometric and photometric distortions when the user captures the query photo from an arbitrary viewpoint, which often leads to unrecognized or misrecognized information.
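The descriptor-matching step described above can be sketched in a few lines. The following is a minimal illustration (not any particular system's implementation), assuming descriptors are fixed-length NumPy vectors and using Lowe's nearest-neighbor distance-ratio test, which is commonly paired with SIFT/SURF descriptors; the function name `match_descriptors` is hypothetical.

```python
import numpy as np

def match_descriptors(query_desc, db_desc, ratio=0.75):
    """Match query descriptors to database descriptors with Lowe's
    nearest-neighbor distance-ratio test.

    Each descriptor is a fixed-length vector (the "fingerprint" of one
    feature point). A query descriptor is accepted only when its best
    database match is clearly closer than its second-best match, which
    rejects ambiguous matches of the kind caused by viewpoint and
    lighting distortion. Returns a list of (query_index, db_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        # Euclidean distance from this query descriptor to every
        # database descriptor.
        dists = np.linalg.norm(db_desc - q, axis=1)
        order = np.argsort(dists)
        best, second = int(order[0]), int(order[1])
        # Ratio test: best match must be much closer than the runner-up.
        if dists[best] < ratio * dists[second]:
            matches.append((qi, best))
    return matches
```

A database image whose descriptor is nearly identical to a query descriptor is matched, while a query descriptor that is equally close to two database entries is discarded as ambiguous; this is one simple way such systems trade recall for precision.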