Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a storage medium.
Description of the Related Art
Methods for searching for a similar image using regional feature amounts (local feature amounts) of images have been proposed. Local feature amounts can be calculated in the following manner, for example. First, characteristic points (feature points) are extracted from an image (C. Harris and M. J. Stephens, “A combined corner and edge detector” in Alvey Vision Conference, pages 147-152, 1988). Then, local feature amounts are calculated based on the feature points and image information of regions surrounding the feature points (David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, 60, 2 (2004), pages 91-110). In general, local feature amounts are expressed as vectors. When local feature amounts with rotational invariance or enlargement/reduction invariance are used, a similar image can be found by calculation even if an image is rotated, enlarged, or reduced. For example, Lowe describes calculation of local feature amounts that have rotational invariance by calculating a dominant direction from pixel patterns in local regions surrounding feature points, and performing directional normalization by way of rotation of the local regions on the basis of the dominant direction at the time of calculation of the local feature amounts. Also, local feature amounts that have enlargement/reduction invariance can be calculated by internally generating images of different scales, and extracting feature points and calculating local feature amounts from each of the images of different scales.
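The directional normalization described above can be illustrated with a minimal sketch. It is not Lowe's actual procedure, which also uses Gaussian weighting and a grid of sub-histograms. The sketch assumes a grayscale patch given as a list of rows, uses central-difference gradients, and takes a 36-bin orientation histogram. The function names and the bin count are hypothetical choices made for illustration.

```python
import math

def gradients(patch):
    """Yield (magnitude, angle in degrees) for each interior pixel of the patch."""
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # central difference in x
            dy = patch[y + 1][x] - patch[y - 1][x]  # central difference in y
            mag = math.hypot(dx, dy)
            if mag > 0.0:
                yield mag, math.degrees(math.atan2(dy, dx)) % 360.0

def orientation_histogram(patch, offset=0.0, bins=36):
    """Magnitude-weighted histogram of gradient angles measured relative to offset."""
    width = 360.0 / bins
    hist = [0.0] * bins
    for mag, angle in gradients(patch):
        hist[round(((angle - offset) % 360.0) / width) % bins] += mag
    return hist

def dominant_direction(patch, bins=36):
    """Direction (degrees) of the strongest histogram bin."""
    hist = orientation_histogram(patch, bins=bins)
    peak = max(range(bins), key=hist.__getitem__)
    return peak * (360.0 / bins)

def normalized_descriptor(patch, bins=36):
    """Histogram taken relative to the dominant direction, scaled to sum to 1."""
    offset = dominant_direction(patch, bins)
    hist = orientation_histogram(patch, offset=offset, bins=bins)
    total = sum(hist)
    return [v / total for v in hist] if total else hist

# Two synthetic patches that differ only by a 90-degree rotation of the gradient.
horizontal_ramp = [[float(x) for x in range(5)] for _ in range(5)]
vertical_ramp = [[float(y) for _ in range(5)] for y in range(5)]
```

In this sketch the two ramps have different dominant directions, yet `normalized_descriptor` returns the same vector for both, because every gradient angle is measured relative to the dominant direction before it is binned.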
Local feature amounts of a plurality of feature points are calculated from one image using various methods. Matching between similar images is performed by comparing local feature amounts calculated for different images. The following describes a case in which an image similar to a search query image is retrieved from a group of candidate images. In a voting method described in Japanese Patent Laid-Open No. 2009-284084, a candidate image is voted for when it contains feature points having local feature amounts that are similar to local feature amounts of feature points extracted from a search query image. The larger the number of votes, the more similar the candidate image is determined to be to the search query image.
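A voting scheme of this general kind can be sketched as follows. This is a generic illustration and not the specific method of Japanese Patent Laid-Open No. 2009-284084. It assumes that local feature amounts are short numeric vectors, that similarity is judged by Euclidean distance, and that each query feature point casts at most one vote per candidate image. The function name `vote`, the threshold `max_distance`, and the sample data are hypothetical.

```python
import math

def vote(query_features, candidates, max_distance=0.5):
    """Return {candidate_id: votes}.

    A query feature votes for a candidate image when that image holds at
    least one feature within max_distance of it (assumed similarity test).
    """
    votes = {}
    for candidate_id, features in candidates.items():
        votes[candidate_id] = sum(
            1
            for q in query_features
            if any(math.dist(q, f) <= max_distance for f in features)
        )
    return votes

# Hypothetical two-dimensional local feature amounts.
query = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
candidates = {
    "a": [(0.1, 0.0), (1.0, 0.9), (2.0, 0.2)],
    "b": [(0.0, 0.1), (5.0, 5.0)],
}

votes = vote(query, candidates)
best = max(votes, key=votes.get)  # candidate with the most votes
```

With this sample data, candidate "a" receives a vote from all three query features and candidate "b" from only one, so "a" is ranked as the more similar image.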
One example of other methods is RANSAC processing described in M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography”, Commun. ACM, vol. 24, no. 6, pages 381-395, June 1981. The following describes an example of RANSAC processing. In RANSAC processing, a pair of a feature point in a search query image and a feature point in a candidate image (a feature points pair) is set such that a similarity between local feature amounts of the two feature points is equal to or larger than a threshold. Next, some (e.g., two) feature points pairs are randomly selected from among a plurality of feature points pairs. Furthermore, with reference to the coordinates of the selected feature points pairs (e.g., in the case of two feature points pairs, the coordinates of four feature points), a function, such as an affine transformation, for transforming the coordinates of a feature point in the search query image into the coordinates of a feature point in the candidate image composing the same pair is derived. In one example, a transformation matrix for transforming the coordinates of a feature point in the search query image into the coordinates of a feature point in the candidate image is derived. Then, whether the remaining feature points pairs satisfy the transformation is determined, that is to say, whether transformation of the coordinates of feature points in the search query image yields the coordinates of the paired feature points in the candidate image is determined. If the number of feature points pairs satisfying the transformation is equal to or larger than a preset threshold, it is determined that the search query image matches the candidate image. On the other hand, if the number of feature points pairs satisfying the transformation is smaller than the threshold, some new feature points pairs are randomly selected, and similar processing is repeated. If the number of iterations has reached an upper limit, it is determined that the search query image does not match the candidate image.
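The RANSAC loop described above can be sketched as follows, under stated assumptions. Two feature points pairs are sampled per iteration, and the transformation derived from them is a similarity transformation (rotation, uniform scaling, and translation, a restricted form of affine transformation), written as w = a*z + b over complex numbers. The function name, the pixel `tolerance`, the fixed `seed`, and the sample data are hypothetical and chosen only for illustration.

```python
import random

def ransac_match(pairs, min_inliers, max_iterations=100, tolerance=1.0, seed=0):
    """Return True if the query image is judged to match the candidate image.

    pairs: list of ((qx, qy), (cx, cy)) feature points pairs whose local
    feature amounts were already judged similar.
    """
    rng = random.Random(seed)
    points = [(complex(*q), complex(*c)) for q, c in pairs]
    if len(points) < 2:
        return False
    for _ in range(max_iterations):
        i, j = rng.sample(range(len(points)), 2)  # randomly select two pairs
        (z1, w1), (z2, w2) = points[i], points[j]
        if z1 == z2:
            continue  # degenerate sample, no transformation can be derived
        a = (w1 - w2) / (z1 - z2)  # rotation and scale
        b = w1 - a * z1            # translation
        # Count the remaining pairs that satisfy the derived transformation.
        inliers = sum(
            1
            for k, (z, w) in enumerate(points)
            if k not in (i, j) and abs(a * z + b - w) <= tolerance
        )
        if inliers >= min_inliers:
            return True
    return False  # upper limit of iterations reached

def transform(p):
    """Hypothetical ground truth: rotate 90 degrees, scale by 2, shift by (10, 5)."""
    x, y = p
    return (-2.0 * y + 10.0, 2.0 * x + 5.0)

query_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (4.0, 1.0), (3.0, 3.0)]
pairs = [(p, transform(p)) for p in query_points]
pairs += [((5.0, 5.0), (40.0, -7.0)), ((1.0, 4.0), (-30.0, 22.0))]  # two outlier pairs
```

Calling `ransac_match(pairs, min_inliers=4)` on this data is expected to report a match, since any sample of two consistent pairs leaves four other pairs that satisfy the same transformation. A three-pair sample would be needed to derive a general affine transformation; two pairs are used here only to mirror the example in the text.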