The present technology relates to an image processing device, an image processing method, and a program. More specifically, the present technology is directed to thinning out the feature points detected from an image in a spatially uniform manner.
Conventionally, in various circumstances, such as when an object is searched for in an image, when a moving object is detected from an image sequence, or when a plurality of images are aligned, it is necessary to match identical objects between the plurality of images.
As a method of matching identical objects, either block matching or a feature point-based method is used.
In block matching, a given image is split into block regions, and a similarity measure such as SAD (Sum of Absolute Differences) or NCC (Normalized Cross-Correlation) is computed between each block and candidate regions of another image. Then, on the basis of the computed SAD or NCC, the region having the highest similarity to each block is searched for in the other image. This method has a high computational cost because the similarity between block regions must be computed while the block center coordinates are shifted step by step within the search range. Further, because a corresponding position must be searched for even in a region that is difficult to match, the processing efficiency is low.
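The search described above can be illustrated with a minimal sketch. It assumes grayscale images held as NumPy arrays and uses SAD as the similarity measure. The function names `sad` and `match_block`, the block size, and the search range are hypothetical choices made for illustration only.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def match_block(ref, target, top, left, block=8, search=4):
    """Return the shift (dy, dx) within +/-search pixels that minimizes SAD."""
    patch = ref[top:top + block, left:left + block]
    h, w = target.shape
    best_cost, best_shift = None, (0, 0)
    # Exhaustive scan of the search range: this double loop is the source of
    # the high computational cost mentioned in the text.
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block would fall outside the image
            cost = sad(patch, target[y:y + block, x:x + block])
            if best_cost is None or cost < best_cost:
                best_cost, best_shift = cost, (dy, dx)
    return best_shift, best_cost

# Synthetic example: the second image is the first one shifted by (2, -3).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
target = np.roll(ref, shift=(2, -3), axis=(0, 1))
shift, cost = match_block(ref, target, top=12, left=12)
```

With a search range of ±4 pixels, each block requires 81 SAD evaluations, and this count grows quadratically with the search range.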
In the feature point-based method, a position that is easily matched, such as a corner of an object or of a pattern in an image, is first detected as a feature point. There are various methods of detecting feature points. Representative methods include the Harris corner detector (see C. Harris, M. J. Stephens, “A combined corner and edge detector”, In Alvey Vision Conference, pp. 147-152, 1988), FAST (see Edward Rosten, Tom Drummond, “Machine learning for high-speed corner detection”, European Conference on Computer Vision (ECCV), Vol. 1, pp. 430-443, 2006), and DoG (Difference of Gaussians) maxima (see David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision (IJCV), Vol. 60, No. 2, pp. 91-110, 2004).
When feature points detected as described above are matched between two images, the images can be aligned. For example, an optimum image transformation matrix, such as an affine transformation matrix or a projective transformation matrix (homography), which describes the relationship between the coordinate systems of the two images, is determined from the feature point coordinates and their correspondences using a robust estimation method. Using such an image transformation matrix allows the images to be aligned.
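As an illustration of such robust estimation, the following minimal sketch fits an affine transformation matrix to matched feature point coordinates. The text does not name a particular robust estimation method, so RANSAC is used here as an assumption. The function names, iteration count, and tolerance are likewise hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~= A @ [x, y, 1]."""
    X = np.hstack([src, np.ones((len(src), 1))])   # shape (n, 3)
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (3, 2)
    return sol.T                                   # shape (2, 3)

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(src, dst, iters=200, tol=1.0, seed=0):
    """Fit an affine matrix while ignoring mismatched correspondences."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # Three correspondences are the minimum needed for an affine model.
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("not enough consistent correspondences")
    # Refit on all correspondences consistent with the best model.
    return fit_affine(src[best], dst[best]), best

# Synthetic example: 40 correspondences, the first 10 of which are mismatches.
rng = np.random.default_rng(1)
true_A = np.array([[0.9, -0.2, 5.0], [0.2, 0.9, -3.0]])
src = rng.uniform(0, 100, size=(40, 2))
dst = apply_affine(true_A, src)
dst[:10] += rng.uniform(20, 40, size=(10, 2))
A, inliers = ransac_affine(src, dst)
```

Each iteration evaluates every correspondence, so the cost of this step grows with the number of feature points.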
Meanwhile, when feature points are detected from an image containing fine patterns (e.g., an image containing many trees or much grass), a large number of feature points may be detected. When a large number of feature points are detected, the time required to search for a matching point for each feature point or to compute an image transformation matrix becomes long. With a first method, which adjusts the threshold above which a point is regarded as a feature point, the number of feature points can be controlled. However, when the number of feature points is controlled by adjusting a threshold, the feature points may be distributed in a spatially non-uniform manner. Meanwhile, in a second method called “Non Maximum Suppression,” the reliability (score) as a feature point is determined for each feature point, and only the feature points having higher scores than their neighboring feature points are left, whereby the number of feature points is reduced. In the second method, all the feature points having higher scores than their neighbors are left. Thus, the number of feature points depends on the pattern in the image, and it is impossible to control the number so that a desired number of feature points are left. In addition, it is impossible to remove feature points in a spatially uniform manner. Therefore, a method called “ANMS (Adaptive Non Maximal Suppression),” which improves on the first and second methods, has been proposed (see M. Brown, R. Szeliski, S. Winder, “Multi-Image Matching Using Multi-Scale Oriented Patches”, Computer Vision and Pattern Recognition (CVPR), Vol. 1, pp. 510-517, 2005).
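The behavior of the first and second methods can be contrasted with a minimal sketch. It assumes that feature points are given as an array of coordinates with one score per point. The function names, the neighborhood radius, and the sample values are hypothetical and serve only as an illustration.

```python
import numpy as np

def threshold_filter(points, scores, threshold):
    """First method: keep points whose score reaches the threshold."""
    keep = scores >= threshold
    return points[keep], scores[keep]

def non_maximum_suppression(points, scores, radius):
    """Second method: keep points that outscore every neighbor within radius."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    neighbor = (dist <= radius) & ~np.eye(len(points), dtype=bool)
    # stronger[i, j] is True when point j scores at least as high as point i.
    stronger = scores[None, :] >= scores[:, None]
    suppressed = (neighbor & stronger).any(axis=1)
    return points[~suppressed], scores[~suppressed]

# Three strong points clustered in a textured area, two weak isolated points.
points = np.array([[10, 10], [11, 10], [12, 11], [50, 50], [80, 20]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2])

thr_pts, _ = threshold_filter(points, scores, threshold=0.5)
nms_pts, _ = non_maximum_suppression(points, scores, radius=5.0)
```

In this example, thresholding keeps only the three clustered points, so the result is spatially non-uniform. Non-maximum suppression keeps one point per neighborhood, but the number of points kept is determined by the image content rather than by a target count.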