The present invention relates to a local feature descriptor extracting apparatus, a method for extracting a local feature descriptor, and a program.
Schemes have been proposed for robustly identifying an object in an image despite changes in photographing size or angle, or despite occlusion. These schemes detect a large number of interest points (feature points) in an image and extract a feature descriptor from a local region around each of the feature points (a local feature descriptor). As a typical scheme, Patent Literature 1 and Non Patent Literature 1 disclose a local feature descriptor extracting apparatus that uses SIFT (Scale Invariant Feature Transform) feature descriptors.
FIG. 15 is a diagram showing an example of a general configuration of a local feature descriptor extracting apparatus that uses SIFT feature descriptors. Furthermore, FIG. 16 is a diagram illustrating how SIFT feature descriptors are extracted in the local feature descriptor extracting apparatus shown in FIG. 15.
As shown in FIG. 15, the local feature descriptor extracting apparatus includes a feature point detecting unit 200, a local region acquiring unit 210, a subregion dividing unit 220, and a subregion feature vector generating unit 230. The feature point detecting unit 200 detects a large number of interest points (feature points) in an image and outputs the coordinate position, scale (size), and orientation of each of the feature points. The local region acquiring unit 210 acquires a local region from which the feature descriptor is to be extracted, based on the coordinate position, scale, and orientation of each of the detected feature points. The subregion dividing unit 220 divides the local region into subregions. In the example illustrated in FIG. 16, the subregion dividing unit 220 divides the local region into 16 blocks (4×4 blocks). The subregion feature vector generating unit 230 generates a gradient orientation histogram for each of the subregions of the local region. Specifically, the subregion feature vector generating unit 230 calculates a gradient orientation for each pixel in each subregion and quantizes the gradient orientation into eight orientations. Then, the subregion feature vector generating unit 230 aggregates the frequencies of the eight quantized orientations for each subregion to generate a gradient orientation histogram. Thus, the gradient orientation histograms of 16 blocks × eight orientations generated for each feature point are output as a 128-dimensional local feature descriptor.
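The processing of the subregion dividing unit 220 and the subregion feature vector generating unit 230 described above can be illustrated by the following minimal sketch. It is not the implementation of Patent Literature 1 or Non Patent Literature 1. The function name subregion_histograms, the central-difference gradient, the skipping of border pixels and of pixels with zero gradient, and the assumption that the local region has already been resampled to a square patch aligned with the feature point orientation are all choices made for illustration only.

```python
import math


def subregion_histograms(patch, grid=4, bins=8):
    """Build a (grid * grid * bins)-dimensional descriptor from a square patch.

    patch: square 2-D list of grayscale values whose side length is
    divisible by `grid` (hypothetical input format, assumed here).
    """
    size = len(patch)
    cell = size // grid  # side length of one subregion (block)
    descriptor = [0] * (grid * grid * bins)

    # Border pixels are skipped because central differences need neighbours
    # on both sides (an assumption of this sketch).
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            if dx == 0 and dy == 0:
                continue  # no gradient orientation is defined for this pixel

            # Gradient orientation in [0, 2*pi), quantized into `bins` orientations.
            angle = math.atan2(dy, dx) % (2 * math.pi)
            orientation = int(angle / (2 * math.pi) * bins) % bins

            # Index of the subregion that contains the pixel.
            block = (y // cell) * grid + (x // cell)

            # Aggregate the frequency of the quantized orientation.
            descriptor[block * bins + orientation] += 1

    return descriptor
```

With the default 4×4 blocks and eight orientations, the returned list has 16 × 8 = 128 elements, which corresponds to the 128-dimensional local feature descriptor described above.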
Furthermore, Patent Literature 2 discloses a technique for improving the accuracy of searches and recognitions using local feature descriptors. The technique limits targets for calculation of local feature descriptors to feature points with high reproducibility sufficient to allow the feature points to be extracted even when the corresponding image has been, for example, rotated, enlarged, or reduced.
Patent Literature 1: U.S. Pat. No. 6,711,293
Patent Literature 2: Patent Publication JP-A-2010-79545
Non Patent Literature 1: David G. Lowe, “Distinctive image features from scale-invariant keypoints”, (U.S.), International Journal of Computer Vision, 60(2), 2004, p. 91-110
However, the techniques disclosed in Patent Literature 1 and Non Patent Literature 1 generate a local feature descriptor for every feature point extracted from an input image. Thus, the total size of the generated local feature descriptors increases in proportion to the number of detected feature points. This increased size may pose a problem when the local feature descriptors are used to match images with each other. For example, when a user terminal (for example, a portable terminal with a camera) extracts local feature descriptors from an image and transmits them to a server in order to search for a similar image, a large descriptor size lengthens the communication time. This increases the amount of time until the result of the image search is obtained. Furthermore, a large descriptor size lengthens the processing time when images are matched with each other based on the local feature descriptors. In addition, in an image search using local feature descriptors, the local feature descriptors of the images are stored in a memory. A large descriptor size therefore reduces the number of images whose local feature descriptors can be stored in the memory. Hence, such local feature descriptors are not suitable for large-scale image searches covering a large number of images.
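The relationship between the number of feature points and the descriptor size can be made concrete with the following rough sketch. The function names, the storage of each dimension in one byte, and the link speed and memory capacity used in the example are assumptions for illustration and do not appear in the cited literature.

```python
def descriptor_payload_bytes(num_points, dims=128, bytes_per_dim=1):
    """Total size of the local feature descriptors of one image.

    bytes_per_dim=1 is an assumed encoding, not a value from the source.
    """
    return num_points * dims * bytes_per_dim


def transfer_seconds(payload_bytes, bits_per_second):
    """Idealized transmission time, ignoring protocol overhead and latency."""
    return payload_bytes * 8 / bits_per_second


def images_in_memory(memory_bytes, payload_bytes):
    """Number of images whose descriptors fit in the given memory."""
    return memory_bytes // payload_bytes


# Illustrative values only: 1000 feature points, a 1 Mbit/s link, 1 GiB of memory.
payload = descriptor_payload_bytes(1000)
seconds = transfer_seconds(payload, 1_000_000)
capacity = images_in_memory(2**30, payload)
```

Under these assumed values, doubling the number of feature points doubles both the payload and the idealized transmission time and roughly halves the number of images that fit in the memory.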
Furthermore, the technique disclosed in Patent Literature 2 can limit the targets for calculation of the local feature descriptors to the feature points with high reproducibility. However, when a large number of feature points have high reproducibility, this technique suffers from a problem similar to that of the techniques disclosed in Patent Literature 1 and Non Patent Literature 1.