Conventionally, techniques are known for searching a plurality of items of data registered in a database for data whose similarity to, or degree of association with, input query data meets predetermined conditions.
One such technique is neighborhood search, which represents the similarity or degree of association between items of data as a distance in a multi-dimensional space and selects data whose distance to the query data meets predetermined conditions. For example, an information processing device that executes neighborhood search calculates the distance between each item of data registered in a database and the input query data, and returns the data whose calculated distance falls within a predetermined range as neighborhood data of the query data.
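The neighborhood search described above can be sketched as a linear scan. This is a minimal illustration only: the Euclidean distance, the function name `range_search`, and the sample data are assumptions chosen for the example, not details taken from the techniques referred to here.

```python
import math

def range_search(database, query, radius):
    """Return (item_id, distance) pairs for every registered item whose
    distance to the query is within `radius`, nearest first.

    The distance to every registered item is calculated (linear search).
    """
    hits = []
    for item_id, vector in database.items():
        distance = math.dist(vector, query)  # Euclidean distance (assumed metric)
        if distance <= radius:
            hits.append((item_id, distance))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical registered data: three points in a two-dimensional space.
database = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0)}
result = range_search(database, query=(0.5, 0.5), radius=1.0)
neighbor_ids = {item_id for item_id, _ in result}
```

Here `neighbor_ids` contains "a" and "b" but not "c", whose distance to the query exceeds the predetermined range.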
Meanwhile, when many items of data are registered in the database, a linear search that calculates the distance between the input query data and every registered item incurs a high calculation cost. To address this, a technique is known of reducing the calculation cost of neighborhood search by creating an index in advance from the data registered in the database, thereby limiting the number of items of data for which distances to the query data are calculated.
For example, an information processing device that adopts this technique hierarchically divides a multi-dimensional space using a structure such as a KD (K-Dimensional)-Tree, an SR (Sphere/Rectangle)-Tree, or an R-Tree. By restricting the region of the space that is searched, the information processing device limits the number of items of data for which distances to the query data are calculated, and reduces calculation cost.
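As a rough illustration of how a hierarchical division of the space limits the search target, the following sketch builds a simple KD-Tree and performs a range search on it. The dictionary-based node layout, the function names, and the sample points are assumptions made for this example; SR-Trees and R-Trees divide the space differently and are not shown.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points on one axis per level (median split)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),       # coordinates <= split
        "right": build_kdtree(points[mid + 1:], depth + 1),  # coordinates >= split
    }

def kd_range_search(node, query, radius, hits, stats):
    """Collect points within `radius` of `query`, skipping subtrees whose
    half-space cannot intersect the search range."""
    if node is None:
        return
    stats["visited"] += 1
    point, axis = node["point"], node["axis"]
    if math.dist(point, query) <= radius:
        hits.append(point)
    diff = query[axis] - point[axis]
    if diff - radius <= 0:   # search range reaches the left half-space
        kd_range_search(node["left"], query, radius, hits, stats)
    if diff + radius >= 0:   # search range reaches the right half-space
        kd_range_search(node["right"], query, radius, hits, stats)

# Hypothetical registered data.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
hits, stats = [], {"visited": 0}
kd_range_search(tree, query=(8, 1.5), radius=0.9, hits=hits, stats=stats)
```

In this run the distance is calculated for only three of the six registered points, because the whole left half of the space is excluded at the root.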
Further, a technique is known of pruning data that can clearly be excluded from the neighborhood candidates of query data, by setting a reference point in the multi-dimensional space. For example, an information processing device that adopts this technique calculates in advance the distance between the reference point and each item of data registered in the database. When executing neighborhood search, the information processing device then calculates the distance between the query data and the reference point.
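Further, the information processing device prunes data that can clearly be excluded from the neighborhood candidates according to the triangle inequality, based on the distance between the query data and the reference point and the distance between the reference point and each item of data. Specifically, the distance between the query data and an item of data is never less than the absolute difference between their respective distances to the reference point, so any item for which this difference exceeds the search range can be excluded without calculating its distance to the query data. In this way, the information processing device limits the number of items of data for which distances to the query data are calculated, and reduces calculation cost.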
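The pruning by a reference point can be sketched as follows. The sketch assumes a single reference point (named `pivot` here), the Euclidean distance, and hypothetical function names and data; it relies on the bound d(q, x) >= |d(q, p) - d(p, x)| that follows from the triangle inequality.

```python
import math

def build_pivot_table(database, pivot):
    """Calculate in advance the distance from the reference point to each item."""
    return {item_id: math.dist(vec, pivot) for item_id, vec in database.items()}

def pruned_range_search(database, pivot, pivot_table, query, radius):
    """Range search that skips items excluded by the triangle inequality.

    Returns the matching item ids and the number of query-to-item
    distances that actually had to be calculated.
    """
    d_query_pivot = math.dist(query, pivot)
    hits, computed = [], 0
    for item_id, vec in database.items():
        # Lower bound on d(query, item): |d(query, pivot) - d(pivot, item)|.
        if abs(d_query_pivot - pivot_table[item_id]) > radius:
            continue  # cannot be within the search range; no distance calculated
        computed += 1
        if math.dist(vec, query) <= radius:
            hits.append(item_id)
    return hits, computed

# Hypothetical registered data and reference point.
database = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (4.0, 0.0), "d": (0.0, 6.0)}
pivot = (0.0, 0.0)
table = build_pivot_table(database, pivot)
hits, computed = pruned_range_search(database, pivot, table, (1.2, 0.0), 0.5)
```

Items "c" and "d" are excluded by the bound alone. Item "b" passes the bound but is rejected by the actual distance calculation, which shows that the bound only removes candidates and never decides a match by itself.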
Further, a technique is known of generating a B+Tree index that uses the distance between a reference point and each item of data registered in a database as an index key. For example, an information processing device that adopts this technique generates an index key based on the distance from the reference point when registering data in the database, and registers the data in the index together with the generated key. When executing neighborhood search, the information processing device uses the index to retrieve, as neighborhood candidates, the registered data whose keys fall within the neighborhood range of the query data. This limits the number of items of data for which distances to the query data are calculated, and reduces calculation cost.
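The distance-keyed index can be illustrated with the sketch below. As a simplifying assumption, a sorted Python list searched with `bisect` stands in for the B+Tree; it offers the same ordered key-range lookup but is not a B+Tree implementation. The class name `DistanceKeyIndex` and the sample data are hypothetical.

```python
import bisect
import math

class DistanceKeyIndex:
    """Ordered index keyed by the distance from a reference point.

    A sorted list stands in for the B+Tree (assumption for illustration).
    """

    def __init__(self, pivot):
        self.pivot = pivot
        self.keys = []   # distances to the reference point, kept sorted
        self.items = []  # (item_id, vector), parallel to self.keys

    def register(self, item_id, vector):
        # The index key is generated at registration time.
        key = math.dist(vector, self.pivot)
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.items.insert(pos, (item_id, vector))

    def range_search(self, query, radius):
        # Only keys in [d(q, p) - r, d(q, p) + r] can belong to neighbors.
        d_query_pivot = math.dist(query, self.pivot)
        lo = bisect.bisect_left(self.keys, d_query_pivot - radius)
        hi = bisect.bisect_right(self.keys, d_query_pivot + radius)
        candidates = self.items[lo:hi]
        hits = [i for i, v in candidates if math.dist(v, query) <= radius]
        return hits, len(candidates)

# Hypothetical registered data.
index = DistanceKeyIndex(pivot=(0.0, 0.0))
for item_id, vector in [("a", (1.0, 0.0)), ("b", (0.0, 1.0)),
                        ("c", (4.0, 0.0)), ("d", (0.0, 6.0))]:
    index.register(item_id, vector)
hits, n_candidates = index.range_search(query=(1.2, 0.0), radius=0.5)
```

Only the two candidates whose keys lie in the key range are compared with the query; the remaining registered data is never touched.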
Further, a technique is known of increasing the speed of searching for data positioned in the neighborhood range of query data by tolerating an error in that range. An information processing device that adopts this technique searches approximately for data positioned in the neighborhood of the query data, thereby reducing the calculation cost of neighborhood search and increasing its speed.
In some cases, an information processing device that executes neighborhood search as described above performs biometric authentication, for example, by acquiring biological data such as fingerprints and vein shapes as query data and determining whether or not biological data of the same person as the person indicated by the acquired biological data is stored in a database. Because matching processing of biological data requires high calculation cost, an information processing device that executes this biometric authentication first narrows down the data similar to the query data by neighborhood search, thereby limiting the number of items of data subjected to matching processing and reducing calculation cost.
Patent Document 1: Japanese Laid-open Patent Application No. 2009-199151
Patent Document 2: Japanese Laid-open Patent Application No. 2007-073063
Patent Document 3: Japanese Laid-open Patent Application No. 2004-046612
Patent Document 4: International Publication Pamphlet No. WO 2008/026414
Patent Document 5: Japanese Laid-open Patent Application No. 2005-322161
Patent Document 6: Japanese Laid-open Patent Application No. 2010-224903
Non-Patent Document 1: B. Bustos, G. Navarro, and E. Chávez: Pivot selection techniques for proximity searching in metric spaces, Pattern Recognition Letters, Vol. 24, No. 14, pp. 2357-2366 (2003)
Non-Patent Document 2: H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, and Rui Zhang: iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search, ACM Transactions on Database Systems (ACM TODS), Vol. 30, No. 2, pp. 364-397 (2005)
However, the above techniques perform neighborhood search based only on the distance between the query data and each item of data registered in a database, treating each registered item as a single point in the multi-dimensional space. Hence, when the magnitude of error differs from one registered item to another, there is a problem that appropriate neighborhood search may not be performed.
For example, biological data such as fingerprints and veins changes depending on the conditions at the time of acquisition; therefore, even for biological data of the same person, query data rarely matches the data registered in the database completely. Further, the data registered in the database and the query data vary depending on individual differences and on the environment in which the biological data is acquired.
Hence, when biological data with significant variation is used as query data and neighborhood search is executed based only on the distance between the data and the query data, the information processing device in some cases omits from the search result data that needs to be included in it.
FIG. 22 is a view for describing conventional neighborhood search. In the example illustrated in FIG. 22, an information processing device treats data included in the range indicated by (B) in FIG. 22 around the query data indicated by (A) in FIG. 22 as data in the neighborhood range. Meanwhile, even when the data indicated by (C) in FIG. 22 needs to be included in the search result, if the data has significant variation and the input query data is far from the data indicated by (C) in FIG. 22, the information processing device omits that data from the search result.
FIG. 23 is also a view for describing conventional neighborhood search. As illustrated in FIG. 23, when a wide range is set as the neighborhood of the query data indicated by (A) in FIG. 23, as indicated by (B) in FIG. 23, in order to prevent search omission, too much data with little variation is included in the search result. As a result, the number of items of data for which distances to the query data are calculated increases, which increases the calculation cost of neighborhood search.
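The trade-off described with reference to FIG. 22 and FIG. 23 can be reproduced with a small numerical sketch. All values below are hypothetical and are not taken from the figures: the item named "variable" stands for data of the same person as the query that was acquired under different conditions, and the other items stand for data with little variation.

```python
import math

# Hypothetical registered data in a two-dimensional feature space.
registered = {
    "stable_1": (0.0, 0.0),
    "stable_2": (0.6, 0.0),
    "stable_3": (0.0, 0.7),
    "variable": (2.0, 0.0),  # should be found, but lies far from the query
}
query = (0.2, 0.0)

def neighbors(radius):
    """Ids of registered items within `radius` of the query (fixed range)."""
    return sorted(item_id for item_id, vec in registered.items()
                  if math.dist(vec, query) <= radius)

narrow = neighbors(0.5)  # narrow range: "variable" is omitted
wide = neighbors(2.0)    # wide range: "variable" is found, with more candidates
```

With the narrow range the item "variable" is omitted from the result, whereas with the wide range it is found but every item with little variation also becomes a candidate. A single fixed range applied to all registered data cannot avoid both outcomes in this example.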