A technique is known in which the feature quantity of each of a plurality of pieces of content is placed in a feature quantity space as a feature point, the feature points are classified into at least two groups, and a line or hyperplane that lies at an equal perpendicular distance from all the feature points in each group is obtained. In this technique, a hash function that converts the content to a binary value is created based on which of the regions divided by the obtained line or hyperplane the feature point is positioned in.
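As an illustration only, the conversion of a feature point to a binary value by hyperplanes can be sketched as follows. This is a minimal sketch and not the method of the cited publication: the names `hyperplane_bit`, `hash_code`, and `HYPERPLANES`, and the two example hyperplanes, are assumptions introduced here, and the step that obtains the hyperplanes from the grouped feature points is omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Hyperplane = Tuple[Vector, float]  # (normal vector, offset); hypothetical representation


def hyperplane_bit(x: Vector, normal: Vector, offset: float) -> int:
    """Return 1 if x lies on the non-negative side of normal . x + offset = 0, else 0."""
    score = sum(n * xi for n, xi in zip(normal, x)) + offset
    return 1 if score >= 0 else 0


def hash_code(x: Vector, hyperplanes: Sequence[Hyperplane]) -> List[int]:
    """Concatenate one bit per hyperplane into a binary code for the feature point x."""
    return [hyperplane_bit(x, normal, offset) for normal, offset in hyperplanes]


# Assumed example: two lines in a 2-D feature quantity space, x0 = 0.5 and x1 = 0.5.
HYPERPLANES: List[Hyperplane] = [((1.0, 0.0), -0.5), ((0.0, 1.0), -0.5)]

code_a = hash_code((0.9, 0.1), HYPERPLANES)  # [1, 0]
code_b = hash_code((0.2, 0.8), HYPERPLANES)  # [0, 1]
```

Feature points that fall in the same region receive the same code, so codes can be compared instead of the full feature quantities.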
Also, a technique is known in which, for each of a plurality of learning data, similarity degree data is created that indicates the similarity between unprocessed data and the learned data used for creating that learning data. In this technique, a part of the plurality of learning data is selected based on the created similarity degree data, and machine learning processing is performed on the processed data using the selected learning data.
Also, a technique is known in which a data pair is created from each feature quantity vector included in a learning data set, and a hyperplane that divides a feature quantity vector space is learned using the created data pair. Related-art techniques are disclosed in Japanese Laid-open Patent Publication Nos. 2013-109479 and 2006-252333, and International Publication Pamphlet No. WO 2014/118976.
The problem described below arises when similarity is calculated in the following case. A plurality of record data, each including a plurality of feature quantities, is obtained and stored in a database in advance. Query data including a plurality of feature quantities is then obtained. The similarity between each record data and the query data is calculated, and the record data having the maximum similarity is extracted. In general, the dissimilarity between two pieces of data corresponds roughly to the distance between them in a feature quantity space, so the similarity between them can be roughly determined from that distance. In this specification, data that is stored in the database in advance as a matching target and that includes a plurality of feature quantities is referred to as “record data”, and data that is obtained and input in order to be matched against the record data and that includes a plurality of feature quantities is referred to as “query data”.
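The matching described above can be sketched as a nearest-neighbor search. The sketch below is illustrative only and assumes that Euclidean distance serves as the dissimilarity; the names `dissimilarity`, `best_match`, and `RECORDS`, and the example feature quantities, are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def dissimilarity(a: Vector, b: Vector) -> float:
    """Assumed dissimilarity: Euclidean distance in the feature quantity space."""
    return math.dist(a, b)


def best_match(records: Dict[str, Vector], query: Vector) -> str:
    """Return the ID of the record data closest to, and so most similar to, the query data."""
    return min(records, key=lambda rid: dissimilarity(records[rid], query))


# Assumed example database of record data, stored in advance.
RECORDS: Dict[str, Vector] = {
    "rec_a": (0.0, 0.0),
    "rec_b": (3.0, 0.0),
    "rec_c": (0.0, 5.0),
}

query = (1.0, 0.5)
match = best_match(RECORDS, query)  # "rec_a"
```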
However, the acquisition conditions of the query data sometimes change from those in effect when the record data was obtained. In this case, the position of the query data in the feature quantity space changes, and thus the distances between the query data and the individual record data change unequally. Accordingly, the determination precision of the similarity deteriorates. For example, because of the change in the position of the query data, record data that would not originally have had the shortest distance from the query data in the feature quantity space might be erroneously extracted as the record data having the maximum similarity.
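A small numerical sketch shows how such a shift can change the extracted record data. It assumes, purely for illustration, that the change in acquisition conditions adds a fixed offset to the feature quantities of the query data and that Euclidean distance is used; the names `nearest_record`, `RECORDS`, and `condition_offset` are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def nearest_record(records: Dict[str, Vector], query: Vector) -> str:
    """Return the ID of the record data with the shortest Euclidean distance to the query."""
    return min(records, key=lambda rid: math.dist(records[rid], query))


RECORDS: Dict[str, Vector] = {"rec_a": (0.0, 0.0), "rec_b": (3.0, 0.0)}

# Query data as it would appear under the conditions in which the record data was obtained.
query_original = (1.0, 0.0)

# Assumed effect of changed acquisition conditions: a fixed shift of the feature quantities.
condition_offset = (1.0, 0.0)
query_shifted = tuple(q + d for q, d in zip(query_original, condition_offset))

before = nearest_record(RECORDS, query_original)  # "rec_a" (distance 1 versus 2)
after = nearest_record(RECORDS, query_shifted)    # "rec_b" (distance 2 versus 1)
```

The same shift increases the distance to `rec_a` and decreases the distance to `rec_b`, which is the unequal change in distance described above.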
According to an aspect of the disclosed technique, it is desirable to reduce deterioration of the determination precision of the similarity when the acquisition conditions of data change.