The present invention relates to an apparatus, a method, and a program for accurately identifying an object in an image.
To enable identification of a subject in an image that is robust with respect to variations in photographed size and angle and to occlusion, systems have been proposed which detect a large number of characteristic points (feature points) in the image and which extract a descriptor of a local area (a local descriptor) around each feature point. As representative systems thereof, Patent Document 1 and Non-Patent Document 1 disclose local descriptor extraction apparatuses that use a SIFT (Scale Invariant Feature Transform) descriptor.
Conventionally, with a local descriptor extraction apparatus, only information related to brightness is first extracted from each pixel in an image, a large number of characteristic points (feature points) are detected from the extracted brightness information, and feature point information, that is, information related to each feature point, is outputted. In this case, feature point information indicates, for example, a coordinate position, a scale, or an orientation of a detected feature point. Subsequently, a local area from which descriptor extraction is to be performed is acquired based on the feature point information, such as the coordinate position, the scale, and the orientation of each detected feature point, and a local descriptor is generated (described) from the local area.
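The data flow described above can be illustrated with a minimal Python sketch. It is not the method of Patent Document 1 or Non-Patent Document 1: the feature point detector itself is omitted, and the names `FeaturePoint`, `brightness`, `to_brightness_image`, and `local_area`, the luma weights, and the area-size factor `k` are all assumptions introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FeaturePoint:
    """Feature point information: coordinate position, scale, orientation."""
    x: float
    y: float
    scale: float
    orientation: float  # radians (assumed unit)


def brightness(rgb):
    """Brightness of one RGB pixel (assumption: ITU-R BT.601 luma weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_brightness_image(image):
    """Keep only brightness; `image` is a list of rows of (r, g, b) tuples."""
    return [[brightness(pixel) for pixel in row] for row in image]


def local_area(fp, k=3.0):
    """Corners of a square local area centred on the feature point.

    The half side length is k * scale and the square is rotated by the
    orientation of the feature point. The factor k is hypothetical.
    """
    half = k * fp.scale
    c, s = math.cos(fp.orientation), math.sin(fp.orientation)
    offsets = ((-half, -half), (half, -half), (half, half), (-half, half))
    return [(fp.x + c * dx - s * dy, fp.y + s * dx + c * dy)
            for dx, dy in offsets]
```

In this sketch a descriptor would then be computed from the brightness values inside the area returned by `local_area`, so that the result depends on the scale and orientation of the feature point rather than on the photographed size and angle.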
For example, as described in Non-Patent Document 1, in order to identify an image showing a same subject as a subject in a photographed image, a local descriptor 1 extracted from the photographed image or, in other words, an input image is compared with a local descriptor 2 generated from a reference image. Specifically, distance calculations on a feature space are performed on all combinations of respective descriptors of areas in a vicinity of feature points constituting the local descriptor 1 and respective descriptors of areas in a vicinity of feature points constituting the local descriptor 2. For each descriptor, the nearest descriptor is determined to be the corresponding descriptor, and the feature points from which the two corresponding descriptors were generated are likewise determined to be corresponding feature points. Subsequently, regarding each combination of feature points determined to be corresponding feature points, whether the correspondence is correct or erroneous is determined based on whether or not the coordinate position obtained by moving the coordinate position of the feature point in the input image in accordance with a specific geometric transformation is consistent with the coordinate position of the feature point in the reference image. When the number of feature points determined to be correctly corresponding feature points is equal to or larger than a prescribed value, it is determined that a same subject is shown (in other words, the subject in the input image and the subject in the reference image are consistent with each other).

Patent Document 1: U.S. Pat. No. 6,711,293
Patent Document 2: Patent Publication JP2010-79545A
Non-Patent Document 1: David G. Lowe, "Distinctive image features from scale-invariant keypoints", USA, International Journal of Computer Vision, 60 (2), 2004, pp. 91-110
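The matching and verification procedure described above for Non-Patent Document 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited documents: the geometric transformation is assumed to be already known and is passed in as a function, Euclidean distance is assumed as the distance on the feature space, and the names `match_nearest`, `count_consistent`, `shows_same_subject`, and the tolerance `tol` are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two descriptors on the feature space (assumed Euclidean)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_nearest(input_feats, ref_feats):
    """Pair each input feature point with the reference feature point whose
    descriptor is nearest, comparing against every reference descriptor.

    Each feature is ((x, y), descriptor). Returns a list of (input_xy, ref_xy).
    """
    pairs = []
    for xy_in, d_in in input_feats:
        xy_ref, _ = min(ref_feats, key=lambda f: euclidean(d_in, f[1]))
        pairs.append((xy_in, xy_ref))
    return pairs


def count_consistent(pairs, transform, tol=2.0):
    """Count pairs whose transformed input position lands within `tol`
    of the paired reference position (tol is a hypothetical tolerance)."""
    count = 0
    for (xi, yi), (xr, yr) in pairs:
        xt, yt = transform(xi, yi)
        if math.hypot(xt - xr, yt - yr) <= tol:
            count += 1
    return count


def shows_same_subject(input_feats, ref_feats, transform, min_correct, tol=2.0):
    """True when the number of correctly corresponding feature points is
    equal to or larger than the prescribed value `min_correct`."""
    if not ref_feats:
        return False
    pairs = match_nearest(input_feats, ref_feats)
    return count_consistent(pairs, transform, tol) >= min_correct
```

Because the decision in this sketch rests only on the count of geometrically consistent correspondences, two objects that differ only in a small region would still yield a large count.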
Conventional object identification systems that utilize local descriptors identify an object based on a correspondence relationship between a local descriptor extracted from brightness information of an input image and a local descriptor extracted from brightness information of a reference image. With such an identification method, when an object shown in the input image and an object shown in the reference image differ from each other but the difference between the two objects is minute, there is a problem that the images are erroneously identified to show a same object due to the existence of a large number of corresponding feature points.