Nearest neighbor search is an important technique used in various application fields. For example, it is used to detect data having similar features in an enormous database, based on the similarity of data such as images and Web pages. In the simplest method, the feature of each data item is represented as one point in a space in which a distance is defined (for example, a Euclidean space), and the distances from a query point to all the data are calculated. The data nearest to the query can thus be detected. However, directly calculating all the distances is very expensive, and the calculation load increases as the number of data items increases. Thus, various methods have been proposed.
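The simple method described above can be sketched as follows. This is a minimal illustration only, assuming feature points are stored as tuples of floats and that the distance is Euclidean; the function name `nearest_neighbor` and the sample data are hypothetical and do not come from any of the cited literature.

```python
import math

def nearest_neighbor(query, points):
    """Exhaustive scan: compute the distance from the query to every point."""
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(query, p)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

# Hypothetical feature points in a two-dimensional Euclidean space.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
index, dist = nearest_neighbor((0.9, 1.2), points)
print(index, dist)
```

Every query visits every stored point, so the work per query grows with both the number of data items and the dimension of the feature space, which is the cost that the methods discussed below try to avoid.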
Patent Literature 1 (JP 2004-021430A) discloses an image searching apparatus that can accurately retrieve, as a similar image, an image that includes a particular photographic subject, without any cumbersome operation by a user. The image searching apparatus extracts an image similar to a reference image serving as a search key from a search target image group that includes a plurality of images targeted for a search. The image searching apparatus includes: a section for dividing the reference image and each of the images included in the search target image group into a plurality of regions; a section for extracting at least one feature amount from each of the regions of the reference image and of each of the images included in the search target image group; a section for selecting partial regions for each image included in the search target image group, by sequentially selecting the images from the search target image group and selecting a predetermined number of regions of the selected image based on a similarity between a feature amount extracted from each of the regions of the reference image and a feature amount extracted from each region of the selected image; and a section for extracting the image similar to the reference image from the search target image group, based on the feature amounts of the partial regions selected for each image included in the search target image group.
Patent Literature 2 (JP 2005-070927A) discloses an image feature acquiring method that acquires an image feature by which an image similar to a drawn shape can be detected, irrespective of the degree of coincidence between images. The image feature acquiring method acquires an image feature of a two-dimensional image. The two-dimensional image is changed to a predetermined size, and if the image is a color image, the image is converted into a scale image in grayscale. Then, a one-dimensional raster image in a horizontal direction is generated by sequentially connecting an end point of a sequence of pixels connected in the horizontal direction from a start point which is set in one of a right side and a left side of the two-dimensional image, and a start point of a next sequence of pixels in a vertical direction in each scale image, and a one-dimensional raster image in the vertical direction is generated by sequentially connecting an end point of a sequence of pixels connected in the vertical direction from a start point which is set in one of a right side and a left side of the two-dimensional image, and a start point of a next sequence of pixels in a horizontal direction in each scale image. Appropriate conversion processing is then applied to these one-dimensional raster images.
Patent Literature 3 (JP H10-326286A) discloses a similarity searching apparatus in which the precision of a similarity search can be improved and in which there is a low possibility that important similar data is removed from a search result. The similarity searching apparatus is characterized by including: a vector database for accumulating a plurality of vector data which are generated for a plurality of targets, and in which a plurality of attributes indicating the features of the targets serve as vector configuration elements; a target vector data generating section for generating target vector data for a specified similarity search target; a search condition set generating section for generating a plurality of search conditions; a similarity searching engine for searching, for each individual search condition generated by the search condition set generating section, the plurality of vector data accumulated in the vector database for vector data that satisfies the search condition and is similar to the target vector data; and a search result display for displaying the result found by the similarity searching engine for each individual search condition.
Typically, nearest neighbor search is more difficult in a higher-dimensional space than in a lower-dimensional space. For this reason, approximate nearest neighbor search methods have been proposed in which, for enormous higher-dimensional data, the nearest neighbor is not determined through strict distance calculation; instead, data at a near distance is determined approximately or probabilistically. One typical example is LSH (Locality Sensitive Hashing) (refer to Non-Patent Literature 1). LSH uses a hash function under which the closer any two points are, the higher the probability that they collide (have the same hash value), which makes it possible to reduce the time required to detect the nearest neighbors of an input query. Here, when LSH is used, the probability that data p collides with a query q depends only on the distance d(p,q). Thus, all data located on the circumference of a circle centered on the query q collide with the query q with the same probability.
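The distance-only behavior described above can be illustrated with a small experiment. The sketch below assumes one common LSH family for Euclidean distance, a random Gaussian projection followed by quantization into buckets of width w; this family is chosen for illustration and is not necessarily the one described in Non-Patent Literature 1. The names `make_hash` and `collision_rate`, the bucket width, and the sample points are all hypothetical.

```python
import math
import random

def make_hash(dim, w, rng):
    """Draw one random hash function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, w)                        # random bucket offset
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

def collision_rate(p, q, w=4.0, trials=10000, seed=0):
    """Estimate the probability that p and q fall into the same bucket."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(len(q), w, rng)
        if h(p) == h(q):
            hits += 1
    return hits / trials

q = (0.0, 0.0)
near_east = (1.0, 0.0)   # distance 1 from q
near_north = (0.0, 1.0)  # distance 1 from q, different direction
far = (6.0, 0.0)         # distance 6 from q

rate_east = collision_rate(near_east, q, seed=1)
rate_north = collision_rate(near_north, q, seed=2)
rate_far = collision_rate(far, q, seed=3)
print(rate_east, rate_north, rate_far)
```

Under these assumptions, the two points at distance 1 should collide with q at roughly the same estimated rate regardless of direction, while the point at distance 6 should collide noticeably less often.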