Computer-based systems have been developed to locate or recognize certain objects in digital images. For instance, applications have been developed that receive an image captured from a hand-held device, recognize one or more objects in the image, and then output information pertaining to the recognized objects. In a specific example, an individual can utilize a camera on a mobile telephone to capture an image of a landmark and can upload the image to a server that includes an object recognition system, which recognizes the landmark in the image. The server can then transmit information pertaining to the landmark back to the mobile telephone of the individual.
Generally, object recognition systems recognize objects by extracting image patches from a first image, generating descriptors for the image patches, and then comparing those descriptors with descriptors from a second image that includes a known object. If there is sufficient similarity between the descriptors of the first image and those of the second image, then it can be ascertained (with a certain probability) that the first image and the second image include a substantially similar object.
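The extract, describe, and compare steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular recognition system: the function names (`extract_patches`, `describe`, `similarity`, `match_fraction`), the grid sampling of patches, the mean-subtracted intensity descriptor, and the 0.9 threshold are all hypothetical choices made for the example.

```python
import math

def extract_patches(image, size, stride):
    """Return square patches (as flat lists) sampled on a regular grid."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patches.append(
                [image[r + i][c + j] for i in range(size) for j in range(size)]
            )
    return patches

def describe(patch):
    """Toy descriptor (assumption): mean-subtracted, unit-length intensities."""
    mean = sum(patch) / len(patch)
    centered = [v - mean for v in patch]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        return None  # a flat patch has no structure to describe
    return [v / norm for v in centered]

def similarity(d1, d2):
    """Cosine similarity of two unit-length descriptors."""
    return sum(a * b for a, b in zip(d1, d2))

def match_fraction(image_a, image_b, size=2, stride=1, threshold=0.9):
    """Fraction of descriptors in image_a with a close match in image_b."""
    desc_a = [d for d in map(describe, extract_patches(image_a, size, stride)) if d]
    desc_b = [d for d in map(describe, extract_patches(image_b, size, stride)) if d]
    if not desc_a or not desc_b:
        return 0.0
    hits = sum(
        1 for da in desc_a
        if max(similarity(da, db) for db in desc_b) >= threshold
    )
    return hits / len(desc_a)
```

A caller would compare the returned fraction against a decision threshold of its own to conclude, with some probability, that both images show a similar object. Because this toy descriptor subtracts the patch mean and normalizes its length, a uniform brightness gain or offset leaves it unchanged; changes in viewpoint or scale would not be absorbed this way.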
If the first image and the second image were identical, such that the images were taken in identical lighting from identical angles and at identical distances from an object, then object recognition would be relatively trivial. In real-world applications, however, no two images are the same. That is, between images of a substantially similar object, many factors pertaining to the object can change, including the position of the object in the image, the distance from the camera to the object, the ambient light at the time each image was generated, and/or the appearance of the object itself, which can change over time. Accordingly, it is desirable to generate descriptors that are robust with respect to these variables across images.
Many approaches exist for selecting image patches and generating descriptors for image patches. These approaches include the use of Histograms of Gradients (HoG), where a HoG is defined as the histogram of image gradients over a combination of positions, orientations, and scales. While HoG has been shown to be an effective mechanism for describing image patches, descriptors generated by such an approach are somewhat variant with respect to image illumination and the position of an object in an image.
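As a rough illustration of the idea behind a gradient histogram, the sketch below bins gradient orientations within a single patch, weighted by gradient magnitude. It is a simplified assumption-laden example: the function name `gradient_histogram`, the central-difference gradients, and the choice of 8 orientation bins are hypothetical, and the sketch covers orientation only, at a single position and scale, rather than the full combination of positions, orientations, and scales described above.

```python
import math

def gradient_histogram(patch, bins=8):
    """Histogram of gradient orientations in a 2D patch, weighted by magnitude."""
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * bins
    # Central differences need a neighbor on each side, so skip border pixels.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # no gradient, nothing to vote with
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = min(int(angle / (2 * math.pi) * bins), bins - 1)
            hist[index] += magnitude
    return hist
```

For a 4x4 patch whose intensity increases left to right by one per column, all of the weight falls into the first bin. Doubling the contrast of that patch doubles the bin weight, which is one simple way to see why an unnormalized histogram of this kind varies with illumination.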