The present invention relates to a local feature descriptor extracting apparatus, a method for extracting a local feature descriptor, and a program.
Schemes have been proposed for robustly identifying an object in an image despite changes in photographing size or angle, or despite occlusion. These schemes involve detecting a large number of interest points (feature points) in an image and extracting a feature descriptor from a local region around each of the feature points (a local feature descriptor). As a typical scheme, Patent Literature 1 and Non Patent Literature 1 disclose local feature descriptor extracting apparatuses that use SIFT (Scale Invariant Feature Transform) feature descriptors.
FIG. 31 is a diagram showing an example of a general configuration of a local feature descriptor extracting apparatus that uses SIFT feature descriptors. FIG. 32 is a diagram illustrating how SIFT feature descriptors are extracted in the local feature descriptor extracting apparatus shown in FIG. 31.
As shown in FIG. 31, the local feature descriptor extracting apparatus includes a feature point detecting unit 200, a local region acquiring unit 210, a subregion dividing unit 220, and a subregion feature vector generating unit 230. The feature point detecting unit 200 detects a large number of interest points (feature points) in an image and outputs the coordinate position, scale (size), and orientation of each of the feature points. The local region acquiring unit 210 acquires a local region from which the feature descriptor is to be extracted, based on the coordinate position, scale, and orientation of each of the detected feature points. The subregion dividing unit 220 divides the local region into subregions. In the example illustrated in FIG. 32, the subregion dividing unit 220 divides the local region into 16 blocks (4×4 blocks). The subregion feature vector generating unit 230 generates a gradient orientation histogram for each of the subregions of the local region. Specifically, the subregion feature vector generating unit 230 calculates a gradient orientation for each pixel in each subregion and quantizes it into one of eight orientations. The orientations determined in this way are relative to the orientation of the feature point output by the feature point detecting unit 200. That is, they are normalized with respect to the orientation output by the feature point detecting unit 200. Then, the subregion feature vector generating unit 230 aggregates the frequencies of the eight quantized orientations for each subregion to generate a gradient orientation histogram. Thus, the gradient orientation histogram of 16 blocks×eight orientations generated for each feature point is output as a 128-dimensional local feature descriptor.
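The histogram generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Patent Literature 1 or Non Patent Literature 1: the function name `gradient_orientation_descriptor` and its parameters are hypothetical, the local region is assumed to be already acquired as a square grayscale patch, gradients are taken by simple central differences, border pixels are skipped, and each pixel contributes a plain count of 1 (no magnitude or Gaussian weighting, no interpolation between bins).

```python
import math


def gradient_orientation_descriptor(patch, keypoint_orientation=0.0, grid=4, bins=8):
    """Sketch: build a grid*grid*bins gradient orientation histogram.

    patch: square list of lists of grayscale values (side divisible by grid).
    keypoint_orientation: orientation of the feature point, in radians.
    All names and defaults here are illustrative assumptions.
    """
    size = len(patch)
    cell = size // grid  # side length of one subregion, in pixels
    hist = [0] * (grid * grid * bins)

    # Skip the outermost pixels so central differences stay inside the patch.
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            if dx == 0 and dy == 0:
                continue  # flat pixel: gradient direction is undefined

            # Normalize the direction relative to the feature point orientation.
            angle = (math.atan2(dy, dx) - keypoint_orientation) % (2 * math.pi)

            # Quantize into one of `bins` orientations.
            b = int(angle / (2 * math.pi) * bins) % bins

            # Accumulate the frequency in the histogram of the enclosing subregion.
            block = (y // cell) * grid + (x // cell)
            hist[block * bins + b] += 1

    return hist


# Usage: a 16x16 patch whose brightness increases from left to right.
ramp = [[x for x in range(16)] for _ in range(16)]
descriptor = gradient_orientation_descriptor(ramp)
print(len(descriptor))  # 16 blocks x 8 orientations
```

With the default 4×4 grid and eight orientations, the returned list has 16×8 = 128 entries, which corresponds to the 128-dimensional local feature descriptor described above.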
Patent Literature 1: U.S. Pat. No. 6,711,293
Non Patent Literature 1: David G. Lowe, “Distinctive image features from scale-invariant keypoints”, (U.S.), International Journal of Computer Vision, 60(2), 2004, pp. 91-110
The above-described local feature descriptor has the disadvantage of being large. For example, when the histogram value in each dimension is expressed in 1 byte, the SIFT feature descriptor requires 128 dimensions×1 byte, that is, 128 bytes per feature point. Such a large local feature descriptor may pose a problem when local feature descriptors are used to match images against each other (matching). For example, when a user terminal (for example, a portable terminal with a camera) extracts local feature descriptors from an image and transmits them to a server in order to search for an image similar to that image, a large descriptor size increases the communication time. This increases the amount of time until the result of the image search is obtained. Furthermore, a large descriptor size increases the processing time when images are matched against each other based on local feature descriptors. In addition, in an image search using local feature descriptors, the local feature descriptors of the images are stored in a memory. A large descriptor size reduces the number of images whose local feature descriptors can be stored in the memory. Hence, such local feature descriptors are not suitable for large-scale image searches covering a large number of images.
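The size argument above can be made concrete with a short calculation. The following is only an illustrative sketch: the function names are hypothetical, and the number of feature points per image and the memory size used in the usage lines are assumed values chosen for illustration, not figures from the cited literature.

```python
def descriptor_bytes(dimensions=128, bytes_per_dimension=1):
    """Size of one local feature descriptor in bytes."""
    return dimensions * bytes_per_dimension


def image_payload_bytes(num_feature_points, dimensions=128, bytes_per_dimension=1):
    """Total descriptor size for one image with the given number of feature points."""
    return num_feature_points * descriptor_bytes(dimensions, bytes_per_dimension)


def images_that_fit(memory_bytes, num_feature_points, dimensions=128, bytes_per_dimension=1):
    """Number of images whose descriptors fit entirely in the given memory."""
    per_image = image_payload_bytes(num_feature_points, dimensions, bytes_per_dimension)
    return memory_bytes // per_image


# Usage with assumed values: 1000 feature points per image, 1 GiB of memory.
print(descriptor_bytes())                # bytes per descriptor
print(image_payload_bytes(1000))         # bytes to transmit or store per image
print(images_that_fit(1024 ** 3, 1000))  # images that fit in memory
```

Because both the per-image payload and the memory footprint scale linearly with the descriptor size, any reduction in bytes per descriptor translates directly into shorter transmission and a proportionally larger number of storable images.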