A method has been proposed that detects many distinctive points (feature points) in an image and extracts, for each feature point, feature quantities in the local area around it (local feature quantities). The aim is to identify a subject in the image robustly against changes in the scale and angle at which the image is captured, and against occlusion. As a typical example, PTL 1 and NPL 1 disclose a local feature quantity extraction device that uses SIFT (Scale Invariant Feature Transform) feature quantities.
First, the local feature quantity extraction device extracts brightness information from each pixel of an image and detects many distinctive points (feature points) from the extracted brightness information. The device then outputs feature point information, which is information about each detected feature point; for example, the feature point information indicates the coordinate position, the scale, and the orientation of the feature point. Next, from the feature point information of each detected feature point (its coordinate value, scale, orientation, and the like), the device determines the local area in which feature quantity extraction is to be performed, and generates (describes) the local feature quantities for that area.
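The detect-then-describe flow above can be illustrated with a minimal, self-contained Python sketch. It is not SIFT: the detector (a pixel brighter than a threshold and than all 8 neighbours), the orientation rule (direction of the strongest gradient in the local area), the 8-bin gradient-orientation histogram used as the descriptor, and all names and parameters (`FeaturePoint`, `brightness`, `detect_feature_points`, `describe`, `threshold`, `scale`, `bins`) are hypothetical simplifications chosen only to show how feature point information (coordinate, scale, orientation) determines the local area that is described. Brightness is computed with the ITU-R BT.601 luma weights.

```python
import math
from dataclasses import dataclass


@dataclass
class FeaturePoint:
    """Feature point information: coordinate position, scale and orientation."""
    x: int
    y: int
    scale: float
    orientation: float  # radians


def brightness(rgb_image):
    """Extract per-pixel brightness (ITU-R BT.601 luma) from rows of (r, g, b) tuples."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def local_gradients(img, cx, cy, radius):
    """(magnitude, angle) of the central-difference gradient at each pixel of the
    square local area around (cx, cy); pixels lacking both neighbours are skipped."""
    h, w = len(img), len(img[0])
    grads = []
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if 1 <= x < w - 1 and 1 <= y < h - 1:
                gx = img[y][x + 1] - img[y][x - 1]
                gy = img[y + 1][x] - img[y - 1][x]
                grads.append((math.hypot(gx, gy), math.atan2(gy, gx)))
    return grads


def detect_feature_points(img, threshold=10.0, scale=1.0):
    """Hypothetical toy detector: a pixel is a feature point if its brightness exceeds
    `threshold` and all 8 neighbours. Orientation = angle of the strongest nearby gradient."""
    h, w = len(img), len(img[0])
    radius = max(1, round(2 * scale))  # local area size derived from the scale
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = img[y][x]
            neighbours = [img[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
            if v > threshold and all(v > n for n in neighbours):
                _, angle = max(local_gradients(img, x, y, radius))
                points.append(FeaturePoint(x, y, scale, angle))
    return points


def describe(img, fp, bins=8):
    """Describe the local area as a magnitude-weighted histogram of gradient angles,
    measured relative to fp.orientation (rotation invariance) and L2-normalised."""
    radius = max(1, round(2 * fp.scale))
    hist = [0.0] * bins
    for mag, angle in local_gradients(img, fp.x, fp.y, radius):
        rel = (angle - fp.orientation) % (2 * math.pi)
        hist[int(rel / (2 * math.pi) * bins) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

Typical use would be `g = brightness(image)`, `pts = detect_feature_points(g)`, then `[describe(g, p) for p in pts]` to obtain one local feature quantity per feature point. Measuring angles relative to the feature point's orientation is what lets the same local area produce a similar descriptor after the image is rotated.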
An image showing the same subject as the captured image (i.e., the input image) is then identified, as described in NPL 1, by comparing a local feature quantity 1 extracted from the input image with a local feature quantity 2 generated from a reference image. More specifically, distances in the feature space are first calculated for every combination of a feature quantity constituting the local feature quantity 1 (each feature quantity describes one feature point) and a feature quantity constituting the local feature quantity 2. A feature quantity of the local feature quantity 1 and the feature quantity of the local feature quantity 2 at the smallest calculated distance from it are determined to be corresponding feature quantities, and the feature points from which these corresponding feature quantities were generated are likewise determined to be corresponding. Next, it is determined whether each pair of corresponding feature points moves, from the coordinate position of the feature point in the input image to the coordinate position of the feature point in the reference image, in accordance with particular geometric transformation information. On the basis of this determination, each correspondence is judged to be correct or incorrect. When the number of feature points determined to be correctly corresponding is equal to or greater than a preset value, the input image and the reference image are determined to show the same subject.
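The matching and verification steps can be sketched in Python as follows, assuming descriptors are lists of floats compared by Euclidean distance and feature point coordinates are stored as complex numbers `x + yj`. The "particular geometric transformation" is assumed here to be a 2D similarity transform (rotation, uniform scale and translation, written `z -> a*z + b`), fitted from every pair of correspondences in turn; this exhaustive search is a deterministic stand-in for the random sampling (RANSAC) commonly used in practice. The names `match_nearest`, `fit_similarity`, `verify`, `tol` and `min_inliers` (the preset value) are hypothetical.

```python
import math
from itertools import combinations


def match_nearest(desc1, desc2):
    """Compute distances for all combinations and pair each descriptor in desc1
    with its nearest descriptor in desc2; returns (i, j) index pairs."""
    matches = []
    for i, d1 in enumerate(desc1):
        dists = [math.dist(d1, d2) for d2 in desc2]
        j = min(range(len(desc2)), key=dists.__getitem__)
        matches.append((i, j))
    return matches


def fit_similarity(p1, p2, q1, q2):
    """Similarity transform z -> a*z + b (points as complex numbers) mapping
    p1 -> q1 and p2 -> q2; returns None for degenerate pairs."""
    if p1 == p2 or q1 == q2:
        return None
    a = (q2 - q1) / (p2 - p1)
    return a, q1 - a * p1


def verify(pts1, pts2, matches, tol=1.0, min_inliers=3):
    """Keep the hypothesis under which the most correspondences move correctly
    (inliers, within `tol` pixels); same subject if inliers >= min_inliers."""
    best = []
    for (i1, j1), (i2, j2) in combinations(matches, 2):
        model = fit_similarity(pts1[i1], pts1[i2], pts2[j1], pts2[j2])
        if model is None:
            continue
        a, b = model
        inliers = [(i, j) for i, j in matches if abs(a * pts1[i] + b - pts2[j]) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return len(best) >= min_inliers, best
```

Trying every pair of correspondences costs O(n^2) hypotheses, each checked against all n matches, which is acceptable for a sketch; RANSAC instead samples a bounded number of random pairs, trading exhaustiveness for speed on large match sets.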