1. Field of the Invention
The present invention relates to the field of image analysis.
2. Description of the Related Art
In the field of image analysis, a common operation is the comparison of two images in order to find the relation between them when both images include at least a portion of the same scene or of the same object.
Among its many applications, image comparison is of the utmost importance for calibrating video cameras belonging to a multi-camera system, for assessing the motion occurring between two frames of a video shoot, and for recognizing an object within an image (e.g., a picture). The latter application is becoming increasingly important due to the recent development of object recognition algorithms specifically designed to be employed in so-called visual search engines, i.e., automated services that, starting from a picture, are capable of identifying the object(s) pictured therein and offering information related to the identified object(s). Examples of known services of this type include Google Goggles, Nokia Point&Find, and kooaba Smart Visuals. An object recognition application compares a first image (in jargon, referred to as the “query image”) depicting an object to be recognized with a plurality of reference images, each one depicting a respective known object; this allows the object depicted in the query image to be compared with the objects depicted in the reference images.
The reference images are typically arranged in a suitable reference database. The higher the number of reference images included in the database, the higher the number of comparison operations to be performed. In some cases the reference database may become very large, negatively affecting the efficiency of the object recognition process. For example, when object recognition is exploited in an online shopping scenario, wherein each reference image corresponds to an item offered by an online store (e.g., the picture of a book cover, a DVD cover and/or a CD cover), the number of reference images may exceed a few million. Moreover, in order to efficiently manage such a huge amount of data, the comparison operations should be performed by a processing unit provided with sufficient processing power.
In the last decade, different algorithms have been proposed for reducing the time required to perform object recognition. These algorithms heavily reduce the number of reference images that are candidates for including the object depicted in the query image.
A very efficient way of comparing two images is to select a set of points (in jargon, referred to as keypoints) in the first image and then to match each keypoint of the set to a corresponding keypoint in the second image. The selection of which points of the first image are to become keypoints is carried out by taking into consideration local features of the area of the image surrounding each point. In this regard, see “Distinctive image features from scale-invariant keypoints” by David G. Lowe, International Journal of Computer Vision, 2004.
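The matching step described above can be illustrated with a minimal sketch. It assumes that each keypoint is already represented by a numeric descriptor of its surrounding area, and it keeps a match only when the nearest descriptor is clearly closer than the second nearest. The function name `match_keypoints`, the threshold `max_ratio`, and the toy descriptors are hypothetical and serve only as an illustration.

```python
import math

def match_keypoints(desc_a, desc_b, max_ratio=0.8):
    """Match each descriptor of the first image to its nearest neighbour
    among the descriptors of the second image.

    Returns a list of (index_in_a, index_in_b) pairs. A match is kept only
    if the nearest distance is at most max_ratio times the second-nearest
    distance (max_ratio is an illustrative, assumed threshold).
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i of image A to every descriptor of image B.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 <= max_ratio * d2:
            matches.append((i, j1))
    return matches

# Toy descriptors (assumed values, three components each).
desc_first = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
desc_second = [(1.0, 0.1, 0.0), (0.0, 0.1, 1.0), (5.0, 5.0, 5.0)]
keypoint_matches = match_keypoints(desc_first, desc_second)
```

The matches produced in this way are only tentative: nothing in the sketch guarantees that two matched keypoints correspond to the same physical point.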
If a match between a keypoint of the first image and a corresponding keypoint of the second image is correct, in the sense that both keypoints correspond to the same point of the same object (depicted in both images), such a keypoint match is referred to as an “inlier”.
Conversely, if a match between a keypoint of the first image and a corresponding keypoint of the second image is incorrect, in the sense that the two keypoints do not correspond to the same point of the same object, such a keypoint match is referred to as an “outlier”.
Therefore, in order to obtain a reliable result, a procedure capable of distinguishing the inliers from the outliers is advantageously performed after the keypoint matches have been determined.
Several examples of procedures of this type are already known in the art.
The most widely used procedure makes use of the RANSAC algorithm disclosed in “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography” by Martin A. Fischler and Robert C. Bolles, Communications of the ACM, 24(6):381-395, June 1981. However, this algorithm is time consuming, because it is based on an iterative approach.
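The iterative nature of a RANSAC-type procedure can be seen in the following minimal sketch. It assumes, purely for illustration, that the two images are related by a similarity transformation (rotation, scaling and translation), written with complex numbers as q = a·p + b so that two matches suffice to estimate a model. The function name, the tolerance `tol` and the iteration count are hypothetical choices and are not taken from the cited paper.

```python
import random

def ransac_similarity(matches, iterations=200, tol=1.0, seed=0):
    """matches: list of ((x, y), (x2, y2)) keypoint matches.

    Returns (inlier_indices, (a, b)) for the model q = a * p + b that is
    supported by the largest number of matches.
    """
    rng = random.Random(seed)
    pts = [(complex(*p), complex(*q)) for p, q in matches]
    best_inliers, best_model = [], None
    for _ in range(iterations):
        # 1. Draw a minimal random sample: two matches define a similarity.
        (p1, q1), (p2, q2) = rng.sample(pts, 2)
        if p1 == p2:
            continue  # degenerate sample, no model can be estimated
        # 2. Fit the model to the sample.
        a = (q1 - q2) / (p1 - p2)
        b = q1 - a * p1
        # 3. Count the matches that agree with the model within tol.
        inliers = [i for i, (p, q) in enumerate(pts) if abs(a * p + b - q) <= tol]
        # 4. Keep the model with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (a, b)
    return best_inliers, best_model
```

Each iteration fits a model and re-evaluates every match, so the cost grows with both the number of iterations and the number of matches; the number of iterations needed increases as the fraction of outliers grows.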
The algorithms disclosed in “Fast geometric re-ranking for image-based retrieval” by Sam S. Tsai, David Chen, Gabriel Takacs, Vijay Chandrasekhar, Ramakrishna Vedantham, Radek Grzeszczuk, Bernd Girod, International Conference on Image Processing, October 2010, and in the international patent application WO2009/130451 are based on the fact that the ratio between the distances of keypoints is invariant under translation, rotation, and scaling. Further algorithms of this type are also disclosed in “Adding Affine Invariant Geometric Constraint for Partial-Duplicate Image Retrieval” by Zhipeng Wu, Qianqian Xu, Shuqiang Jiang, Qingming Huang, Peng Cui, Liang Li, International Conference on Pattern Recognition, August 2010, pages 842-845, and in “Using Local Affine Invariants to Improve Image Matching” by Daniel Fleck, Zoran Duric, 20th International Conference on Pattern Recognition, 2010, pages 1844-1847.
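The invariance mentioned above can be checked numerically. The following sketch takes one reading of the property: when the keypoints of the second image are obtained from those of the first image by a rotation, a scaling and a translation, the ratio between the distance of any two keypoints in the first image and the distance of the corresponding keypoints in the second image is the same for every pair, namely the reciprocal of the scale factor. All names and numeric values are assumptions chosen for illustration.

```python
import math
from itertools import combinations

def apply_similarity(point, scale, angle, tx, ty):
    """Rotate by `angle`, scale by `scale`, then translate by (tx, ty)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)

def pairwise_distance_ratios(first_pts, second_pts):
    """Ratio d(first_i, first_j) / d(second_i, second_j) for every pair i < j."""
    return [
        math.dist(first_pts[i], first_pts[j]) / math.dist(second_pts[i], second_pts[j])
        for i, j in combinations(range(len(first_pts)), 2)
    ]

first = [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0), (7.0, 3.0)]
second = [apply_similarity(p, scale=2.5, angle=0.7, tx=10.0, ty=-3.0) for p in first]
ratios = pairwise_distance_ratios(first, second)
# Every ratio equals 1 / scale, up to floating-point rounding.
```

Pairs that involve an incorrect match do not, in general, share this common ratio, which is what makes the quantity usable for a geometric consistency check.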
Further, US 2010/0135527 A1 discloses an image recognition algorithm including a keypoints-based comparison and a region-based color comparison. A method of identifying a target image using the algorithm includes: receiving an input at a processing device, the input including data related to the target image; performing a retrieving step including retrieving an image from an image database, and, until the image is either accepted or rejected, designating the image as a candidate image; performing an image recognition step including using the processing device to perform an image recognition algorithm on the target and candidate images in order to obtain an image recognition algorithm output; and performing a comparison step including: if the image recognition algorithm output is within a pre-selected range, accepting the candidate image as the target image; and if the image recognition algorithm output is not within the pre-selected range, rejecting the candidate image and repeating the retrieving, image recognition, and comparison steps.
US 2010/0183229 A1 refers to a method, system and computer program product for matching images. The images to be matched are represented by feature points and by feature vectors and orientations associated with the feature points. First, putative correspondences are determined by using the feature vectors. A subset of putative correspondences is selected and the topological equivalence of the subset is determined. The topologically equivalent subset of putative correspondences is used to establish a motion estimation model. An orientation consistency test is performed on the putative correspondences and on the corresponding motion estimation transformation that is determined, to avoid an infeasible transformation. A coverage test is performed on the matches that satisfy the orientation consistency test. The candidate matches that do not cover a significant portion of one of the images are rejected. In case multiple images satisfy all the test requirements, the final matching images are provided in decreasing order of matching.
“An Evaluation of Affine Invariant-Based Classification for Image Matching” by Daniel Fleck et al., 30 Nov. 2009, ADVANCES IN VISUAL COMPUTING, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, page(s) 417-429, discloses a detailed evaluation of an approach that uses affine invariants for wide baseline image matching. Specifically, the approach uses the affine invariant property that ratios of areas of shapes are constant under an affine transformation. Thus, by randomly sampling corresponding shapes in the image pair, a histogram of ratios of areas can be generated. The matches that contribute to the maximum histogram value are then candidate inliers.
“Affine Invariant-Based Classification of Inliers and Outliers for Image Matching” by Daniel Fleck et al., 6 Jul. 2009, IMAGE ANALYSIS AND RECOGNITION, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, page(s) 268-277, discloses an approach to classify tentative feature matches as inliers or outliers during wide baseline image matching. Specifically, the approach uses the affine invariant property that ratios of areas of shapes are constant under an affine transformation. Thus, by randomly sampling corresponding shapes in the image pair, a histogram of ratios of areas can be generated. The matches that contribute to the maximum histogram value are then candidate inliers. The candidate inliers are then filtered to remove any with a frequency below the noise level in the histogram. The resulting set of inliers is used to generate a very accurate transformation model between the images.
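The histogram-of-area-ratios idea can be sketched loosely as follows; this is an illustration under stated assumptions and not the procedure of the cited papers. Triangles are used as the sampled shapes, the logarithm of the area ratio is quantized into bins of assumed width `bin_width`, and each match is credited with one vote whenever a triangle it belongs to falls into the most populated bin. An affine transformation with matrix A multiplies every area by |det A|, so triangles formed by correct matches concentrate in a single bin.

```python
import math
import random
from collections import Counter

def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (2-D points)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def peak_bin_votes(matches, samples=2000, bin_width=0.1, seed=0):
    """matches: list of ((x, y), (x2, y2)).

    Returns a Counter mapping match index -> number of sampled triangles
    containing that match whose log area ratio fell in the peak bin.
    """
    rng = random.Random(seed)
    histogram = Counter()   # bin id -> number of sampled triangles
    members = {}            # bin id -> Counter of contributing match indices
    for _ in range(samples):
        idx = rng.sample(range(len(matches)), 3)
        area_1 = triangle_area(*(matches[i][0] for i in idx))
        area_2 = triangle_area(*(matches[i][1] for i in idx))
        if area_1 < 1e-9 or area_2 < 1e-9:
            continue  # skip (nearly) collinear triples
        bin_id = math.floor(math.log(area_2 / area_1) / bin_width)
        histogram[bin_id] += 1
        members.setdefault(bin_id, Counter()).update(idx)
    peak = max(histogram, key=histogram.get)
    return members[peak]
```

Matches with many votes would be treated as candidate inliers; the further filtering against the noise level of the histogram described above is not reproduced here.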
Further, “Statistical modelling of outliers for fast visual search”, by S. Lepsoy, G. Francini, G. Cordara, P. P. B. de Gusmao, IEEE International Conference on Multimedia and Expo (ICME), 2011, discloses that the matching of keypoints present in two images is an uncertain process in which many matches may be incorrect. The statistical properties of the log distance ratio for pairs of incorrect matches are distinctly different from those for pairs of correct matches. Based on a statistical model, a goodness-of-fit test is proposed in order to establish whether two images contain views of the same object. This technique can be used as a fast geometric consistency check for visual search.
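A minimal sketch of such a check is given below, under explicit assumptions. The log distance ratio of a pair of matches is taken to be the logarithm of the ratio between the distance of the two keypoints in the first image and the distance of the corresponding keypoints in the second image. Pearson's chi-square statistic is used as one common goodness-of-fit measure, and the per-bin probabilities of the outlier model are supplied by the caller, since the statistical model of the cited paper is not reproduced here. The uniform probabilities in the example are placeholders only.

```python
import math
from itertools import combinations

def log_distance_ratios(matches):
    """Log distance ratio for every pair of matches ((x, y), (x2, y2))."""
    ldrs = []
    for (p1, q1), (p2, q2) in combinations(matches, 2):
        d_first, d_second = math.dist(p1, p2), math.dist(q1, q2)
        if d_first > 0 and d_second > 0:
            ldrs.append(math.log(d_first / d_second))
    return ldrs

def histogram(values, edges):
    """Count the values falling in each half-open bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts

def pearson_statistic(counts, model_probs):
    """Pearson's chi-square statistic of observed counts against model_probs."""
    n = sum(counts)
    return sum((h - n * p) ** 2 / (n * p) for h, p in zip(counts, model_probs) if p > 0)

# Five matches related by a pure scaling by 2 plus a translation (assumed data).
first = [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0), (7.0, 3.0), (1.0, 8.0)]
example_matches = [((x, y), (2.0 * x + 1.0, 2.0 * y - 3.0)) for x, y in first]
edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
counts = histogram(log_distance_ratios(example_matches), edges)
statistic = pearson_statistic(counts, [0.25, 0.25, 0.25, 0.25])  # placeholder model
```

A large statistic indicates that the observed histogram is poorly explained by the outlier model, which would suggest that the two images contain views of the same object; the decision threshold is left unspecified in this sketch.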