One function of image recognition devices is generally to search a database for similar images, either pairwise, for example to suppress duplicates in the database, or starting from a request-image, for example to find images illustrating the same subject as the request-image.
“Similar images”, in the context of this application, means images illustrating the same object or the same scene under potentially different capture conditions. This definition notably covers images that have been modified synthetically, for example by a compression operation or a malicious filtering attack.
Multiple applications may be contemplated, such as the identification of stolen objects on online auction sites, the sorting of batches of photographs, and the identification of counterfeits of models or images.
Another function of these devices is to evaluate the degree of likeness between two similar images, in particular in order to rank the resulting images according to their relevance.
Multiple technologies exist for image recognition. Most of them, at least the most recent ones, rely on local descriptors, which characterize areas of particular interest in an image.
More extensive information on the detection of these areas of interest and on the generation of the descriptors relating to them may be found in the following article: Lowe D, Distinctive image features from scale-invariant keypoints, IJCV, 60 (2004) 91-110 (“the Lowe article”), the contents of which are hereby incorporated by reference in their entirety. Comparing and/or searching for images then amounts to comparing local descriptors with each other, which is finer than a direct comparison of computer files or a comparison of images on the basis of global descriptors.
Over time, databases store increasingly large numbers of images. Their use has also become widespread, in particular through the Internet.
In other words, the search for similar images involves an ever-increasing number of comparison operations. In practice, this number is so large that devices based on direct comparison of descriptors with each other become unusable.
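To give an order of magnitude, the following minimal sketch counts the descriptor-to-descriptor distance computations needed to compare one request-image exhaustively against every image in a database. The function name and the figures used (one million images, one thousand descriptors per image) are illustrative assumptions and do not come from the cited articles.

```python
def direct_comparisons(num_images, descriptors_per_image):
    """Distance computations needed to compare every descriptor of one
    request-image with every descriptor of every image in the database."""
    per_image_pair = descriptors_per_image * descriptors_per_image
    return num_images * per_image_pair


# Hypothetical figures, chosen only for illustration.
count = direct_comparisons(num_images=1_000_000, descriptors_per_image=1_000)
print(f"{count:.1e} distance computations for a single request")
```

Under these assumed figures a single request already requires on the order of a trillion distance computations, and suppressing duplicates pairwise would multiply this by the number of images again.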
Another article (the contents of which are incorporated herein by reference in their entirety), Sivic, J. and Zisserman, A., Video Google: A Text Retrieval Approach to Object Matching in Videos, in ICCV (2003) (“the Sivic article”), proposes that a match be made between the descriptors and an index. An integer value selected from a finite set of integer values is associated with each descriptor. Comparing images with each other then amounts to comparing sets of integer values with each other, which requires few computational resources and accordingly accelerates the search.
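The principle of associating an integer value with each descriptor can be sketched as follows. This is a simplified illustration rather than the method of the Sivic article: the four-entry `vocabulary`, the two-dimensional descriptors and the histogram-intersection `similarity` measure are all assumptions made for the example.

```python
from collections import Counter


def quantize(descriptor, vocabulary):
    """Return the integer index of the vocabulary entry closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bag_of_features(descriptors, vocabulary):
    """Replace each descriptor by its integer value and count the occurrences."""
    return Counter(quantize(d, vocabulary) for d in descriptors)


def similarity(bag_a, bag_b):
    """Share of integer values in common, relative to the smaller bag (assumed measure)."""
    shared = sum((bag_a & bag_b).values())
    smallest = min(sum(bag_a.values()), sum(bag_b.values()))
    return shared / smallest if smallest else 0.0


# Hypothetical vocabulary of four entries and toy 2-D descriptors.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 0.1), (0.9, 0.9)]
image_b = [(0.0, 0.1), (1.0, 0.1), (0.1, 0.9)]

bag_a = bag_of_features(image_a, vocabulary)
bag_b = bag_of_features(image_b, vocabulary)
print(sorted(bag_a), sorted(bag_b), similarity(bag_a, bag_b))
```

Once the integer values have been computed, comparing two images involves only integer counts and no further distance computations between descriptors.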
However, devices based on this latter technique, the so-called “bag-of-features” technique, are not fully satisfactory. The number of images estimated to be similar is often too large, with very different images being considered similar by the device, and/or it is not possible to distinguish the most relevant images.
Further, if the number of values of the index is increased, the opposite effect is obtained: in practice, the devices no longer return any images.
In order to overcome this difficulty, another article, entitled Object retrieval with large vocabularies and fast spatial matching, J. Philbin, O. Chum, M. Isard, J. Sivic and A. Zisserman, CVPR 2007 (the contents of which are incorporated by reference in their entirety), proposes to supplement the “bag-of-features” technique by re-evaluating the results obtained with it, integrating spatial information on the position of the points.
For example, the method disclosed in the Lowe article performs a so-called “Hough” transform in order to determine, each time, the parameters of an affine transformation mapping the request-image onto one of the resulting images. A score is assigned to each of the images, depending on the number of descriptors that verify the respective affine transformation. Calculating a Hough transform is so resource-intensive that it cannot be applied to large batches of images.
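The scoring step alone can be illustrated by the sketch below, which assumes that the affine transformation has already been estimated and that pairs of matched points are available. It does not implement the Hough transform itself; the parameter layout `(a, b, c, d, tx, ty)`, the pixel `tolerance` and the sample matches are hypothetical.

```python
def apply_affine(params, point):
    """Map a point of the request-image through x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, c, d, tx, ty = params
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def count_inliers(params, matches, tolerance=2.0):
    """Count the matched points whose mapped position lies within `tolerance`
    of the position observed in the resulting image."""
    count = 0
    for src, dst in matches:
        px, py = apply_affine(params, src)
        if (px - dst[0]) ** 2 + (py - dst[1]) ** 2 <= tolerance ** 2:
            count += 1
    return count


# Assumed transformation: scale by 2, then translate by (10, 5).
params = (2.0, 0.0, 0.0, 2.0, 10.0, 5.0)
# Each match pairs a point of the request-image with a point of the resulting image.
matches = [
    ((0, 0), (10, 5)),
    ((1, 1), (12, 7)),
    ((2, 3), (14, 11.5)),  # slightly off, still within the tolerance
    ((4, 4), (50, 50)),    # inconsistent with the transformation
]
score = count_inliers(params, matches)
print(score)
```

The cost lies mainly in estimating the transformation for every candidate image, which this sketch takes as given.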
However, because of its very high computing cost, the reclassifying phase is applied only to a limited number of images. For very large image databases, similar images are thereby missed because the bag-of-features technique has not ranked them highly enough.
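This limitation can be illustrated by a minimal sketch in which only the best-ranked images of the first pass are reclassified. The function name, the image identifiers and both sets of scores are hypothetical values chosen for the example.

```python
def rerank_shortlist(coarse_scores, fine_score, shortlist_size):
    """Keep the `shortlist_size` best images of the first pass, then reorder
    only those images with the costly second-pass score."""
    ranked = sorted(coarse_scores, key=coarse_scores.get, reverse=True)
    shortlist = ranked[:shortlist_size]
    return sorted(shortlist, key=fine_score, reverse=True)


# Hypothetical first-pass (bag-of-features) scores.
coarse_scores = {"img_a": 0.9, "img_b": 0.8, "img_c": 0.4, "img_d": 0.3}
# Hypothetical second-pass scores, e.g. numbers of geometrically consistent descriptors.
fine_scores = {"img_a": 5, "img_b": 40, "img_c": 2, "img_d": 60}

result = rerank_shortlist(coarse_scores, fine_scores.get, shortlist_size=2)
print(result)
```

Here `img_d` would obtain the best second-pass score, yet it is never examined because the first pass placed it outside the shortlist.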