People have produced images of subjects they deem memorable for millennia, using a variety of progressively less labor-intensive techniques that yield increasingly detailed depictions of those subjects. As a result of the reduction in labor intensity, larger numbers of renditions tend to be produced for any given subject of interest to the perceiver.
With film photographic or paper imaging techniques, a relatively small number of images result from any one session in which such images are produced. Accordingly, sorting techniques relying on manual effort are sufficient for most purposes to identify similar images from a collection of images.
However, in some situations, comparisons of images to identify those having similarity to a given exemplary image or set of query characteristics can require intense or prolonged effort when performed using manual techniques. Examples of such comparisons include sorting of fingerprint and photograph images spanning a relatively large geopolitical area or a relatively large population to find a match to a specific image or set of characteristics.
Development of digital techniques for storing and comparing data can facilitate sorting of a large collection of data, for example by use of keywords to sort textual databases, to find matches to specific characteristics or datasets. However, this technology has also led to evolution of increasingly large databases, exacerbating the difficulties involved in selection or matching of bodies of data to locate those having specific characteristics.
One approach to sorting images to find those exemplifying particular characteristics involves associating each image with one or more keywords and then storing the keywords in a database together with at least information identifying the image associated with the keywords. Sorting of textual databases or groups to select examples associated with a particular Hollerith string, keyword or keywords, or textual content can yield matches having desired characteristics, but this approach also has limitations.
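The keyword approach described above can be illustrated with a minimal sketch. The function names `build_keyword_index` and `search`, the file names, and the keywords are hypothetical and chosen only for illustration; the sketch assumes an in-memory mapping rather than any particular database product.

```python
from collections import defaultdict


def build_keyword_index(image_keywords):
    """Map each keyword to the set of image identifiers tagged with it."""
    index = defaultdict(set)
    for image_id, keywords in image_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(image_id)
    return index


def search(index, *keywords):
    """Return identifiers of images tagged with every requested keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


# Hypothetical catalog: image identifier -> manually assigned keywords.
catalog = {
    "img001.jpg": ["beach", "sunset"],
    "img002.jpg": ["beach", "family"],
    "img003.jpg": ["mountain", "sunset"],
}
index = build_keyword_index(catalog)
matches = search(index, "beach", "sunset")
```

Such a search can only retrieve images by the keywords someone has assigned to them; it carries no information about the visual content itself, which is one source of the limitations noted above.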
Image sorting tasks need to account for differences in scale or perspective, rotation, translation, inclusion of a desired set of information within a larger body of data, and other aspects not generally involved in textual data searching tasks. As a result, techniques have been developed to extract content-based information from digital images and to employ the content-based information to search for specific characteristics, which may be exemplified by a sample image.
Typically, bodies of data representing images are constructed by indexing each image to generate a feature vector capturing some key properties of the image and then storing the resultant feature vector in a database or feature-base. A sample feature vector or “query” vector, which may result from indexing a sample image, is then compared to the stored feature vectors to identify those stored feature vectors most likely to correspond to images similar to a desired image. The feature vectors may be augmented using one or more keywords, as described above, to provide capability for low-level feature matching using the feature vectors coupled with matching of a limited amount of higher-level data in the form of keywords.
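The index-then-compare flow described above can be sketched as follows. This is an illustrative example only: the coarse color histogram stands in for whatever low-level feature a given system uses, images are assumed to be lists of `(r, g, b)` tuples with 8-bit channels, and the names `color_histogram`, `distance`, and `rank` are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Index an image: return a normalized coarse RGB histogram (feature vector)."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each 8-bit channel into bins_per_channel levels.
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        hist[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1.0
    total = len(pixels)
    return [h / total for h in hist]


def distance(u, v):
    """Normalized L1 difference in [0, 1]; 0 means identical histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v))


def rank(feature_base, query_vector):
    """Return stored image identifiers ordered from most to least similar."""
    return sorted(
        feature_base,
        key=lambda image_id: distance(feature_base[image_id], query_vector),
    )


# Hypothetical four-pixel "images" used only to exercise the functions.
feature_base = {
    "sunset": color_histogram([(250, 10, 10)] * 4),
    "ocean": color_histogram([(10, 10, 240)] * 4),
}
query = color_histogram([(200, 30, 20)] * 3 + [(20, 20, 200)])
ordering = rank(feature_base, query)
```

Only the compact feature vectors are stored and compared, not the images themselves, which is what keeps the comparison fast and the feature-base small.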
Feature-vector based image recognition algorithms need to meet conflicting criteria. A difference, such as a normalized scalar obtained by subtracting one of the stored feature vectors from the sample or query feature vector, should be large if and only if the images are not “similar”. The feature vectors should lend themselves to rapid formulation from an image. Smaller feature vectors can increase computational “efficiency” and decrease the need for data storage capacity. Such efficiency may be manifested by facilitating rapid manipulation of the feature vectors. As a result, the types of feature vectors most frequently employed to date are based on relatively low-level features, in part to restrict the size of the resultant feature vectors and data bodies, and thus such feature vectors capture only some aspects of the image content.
Unfortunately, low-level information, such as that employed by such digital image search algorithms presently available, often does not provide a good match to human perceptions of the images. As a result, matching procedures based on these kinds of low-level data often do not provide satisfactory results. They may result, for example, in a relatively large number of false positives, particularly with large databases, or they may not manifest robust matching in view of appearance changes between images of a common subject (e.g., modified viewpoints from which otherwise similar images are generated). Additionally, some of the image sorting techniques developed to date require considerable user sophistication in order to employ them and to interpret the results.
The development of digital photography and increasingly user-friendly digital cameras and scanners capable of digitizing images permits a larger portion of the public to participate in digital image manipulation. Wider availability and distribution of increasingly powerful microprocessors facilitate sorting of digital images by automated apparatus, such as a personal computer, even by persons having little or no formal training in computer techniques.
An exemplary product providing, among other things, a capability for sorting of digital images, is known as the “Adobe Photoshop Album”, available at www.adobe.com/products/photoshopalbum/main.html from Adobe Systems Incorporated, 345 Park Avenue, San Jose, Calif. A review of this product, entitled Adobe Photoshop Album Review, is available at www.dpreview.com/reviews/products/adobephotoshopalbum/page2.
The sorting performed by this product is based on a color similarity search across a given body of images using one or more user-provided sample images employed to form a query. However, color similarity is only one limited measure of match between images. Simple examples where color similarity would be expected to fail include comparison of a colored image to a sepia image or a gray scale image of highly similar subject matter.
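The gray scale failure case can be made concrete with a small sketch. It does not reproduce the algorithm of the product named above, which is not described here; it assumes a deliberately simple color descriptor (the mean RGB color) and hypothetical eight-pixel images, with the gray conversion using the ITU-R BT.601 luma weights.

```python
def to_gray(pixels):
    """Convert RGB pixels to gray using the ITU-R BT.601 luma weights."""
    gray = []
    for r, g, b in pixels:
        y = round(0.299 * r + 0.587 * g + 0.114 * b)
        gray.append((y, y, y))
    return gray


def mean_color(pixels):
    """Return the average (r, g, b) of an image as a crude color descriptor."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_distance(a, b):
    """Euclidean distance between the mean colors of two images."""
    return sum((x - y) ** 2 for x, y in zip(mean_color(a), mean_color(b))) ** 0.5


# Hypothetical images: a red subject, a different subject with similar
# colors, and a gray scale rendition of the first subject.
red_barn = [(200, 30, 30)] * 6 + [(40, 120, 40)] * 2
red_car = [(210, 25, 35)] * 7 + [(50, 50, 50)]
gray_barn = to_gray(red_barn)

same_subject = color_distance(red_barn, gray_barn)
different_subject = color_distance(red_barn, red_car)
```

Under this descriptor the gray scale rendition of the same subject lies farther from the original than an unrelated image with similar colors, so a purely color-based query would rank the unrelated image first.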
Much research has been devoted to rendering digital photography user-friendly and to enabling those having relatively little training or sophistication to provide and manipulate high quality images. However, improved automatic image sorting and matching continues to be desirable, particularly for photographic instruments intended to foster broad market acceptance across a user base reflecting different degrees of user sophistication.