Data such as images and documents is often evaluated based on the similarity between features of the data rather than on exact agreement of the data. When classifying or summarizing such data, it is effective to group the data so that the similarity between the data is equal to or greater than a predetermined value. This processing is generally called grouping based on similarity (e.g., see PTL 1).
An information search device described in PTL 1 provides a function of further grouping the results of a similarity search requested by a user (similarity search from a user) into groups of similar results. In PTL 1, when grouping the results of the similarity search from the user, the information search device further executes a similarity search on those results (similarity search for search results). Then, among the search results, the information search device puts data whose similarities are equal to or greater than a predetermined threshold into one group. The information search device executes grouping based on such an operation. At this time, the information search device executes the similarity search for search results starting from a search result with a high similarity among the results of the similarity search from the user. Then, the information search device groups the search results so that the similarity is equal to or greater than the predetermined threshold. However, the information search device does not execute grouping on search results on which the similarity search has already been executed.
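The grouping operation described for PTL 1 can be illustrated with a minimal sketch. This is not the implementation of PTL 1. The function name `group_search_results`, the use of cosine similarity, and the representation of each search result as a pair of a feature vector and its similarity to the user's query are all assumptions made for illustration. The sketch also interprets "search results on which the similarity search has already been executed" as results that already belong to a group.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_search_results(
    results: List[Tuple[Vector, float]],
    threshold: float,
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> List[List[int]]:
    """Group search results, given as (feature vector, similarity to query).

    Returns groups as lists of indices into `results`.
    """
    # Visit results starting from the one most similar to the user's query.
    order = sorted(range(len(results)), key=lambda i: results[i][1], reverse=True)
    grouped = set()  # results that already belong to a group
    groups: List[List[int]] = []
    for seed in order:
        if seed in grouped:
            continue  # already handled by an earlier similarity search
        group = [seed]
        grouped.add(seed)
        # Similarity search for search results: compare the seed with every
        # result that has not been grouped yet.
        for other in order:
            if other in grouped:
                continue
            if similarity(results[seed][0], results[other][0]) >= threshold:
                group.append(other)
                grouped.add(other)
        groups.append(group)
    return groups


example_results = [
    ((1.0, 0.0), 0.9),
    ((0.9, 0.1), 0.8),
    ((0.0, 1.0), 0.7),
    ((0.1, 0.9), 0.6),
]
example_groups = group_search_results(example_results, threshold=0.9)
```

In this sketch every ungrouped result is compared against the current seed, so the number of similarity computations grows quadratically with the number of search results in the worst case.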
A data structure for speeding up a similarity search has been proposed (e.g., see NPL 1). A technique described in NPL 1 constructs a tree structure of data (hereinafter referred to simply as a “tree structure”) in consideration of a hierarchy of similarities between data, and speeds up a similarity search by using this tree structure. The tree structure described in NPL 1 is roughly configured as follows. Each node constituting the tree structure stores data. When the amount of data included in a certain node exceeds the capacity of the node, the technique described in NPL 1 selects, from among the data included in the node, data that serves as a representative (representative data), and places the representative data in the parent node of the node. Further, the technique described in NPL 1 associates an upper limit of the similarity between the representative data and the data in the node with the edge between the parent node and the node. Then, the technique described in NPL 1 maintains the tree structure so that, across the entire tree structure, the value of the similarity associated with an edge increases from the root node toward the leaf nodes. In this way, the technique described in NPL 1 provides, as a tree structure, a data structure focusing on the hierarchy of similarities. However, NPL 1 fails to disclose a grouping method.
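The rough configuration of the tree structure described above can be sketched as follows. This is only an illustration under stated assumptions, not the data structure of NPL 1. The names `Node`, `Edge`, `promote_representative`, and `bounds_increase_toward_leaves` are hypothetical. The rule for selecting the representative data (the item with the largest total similarity to the data in the node) and the use of cosine similarity are also assumptions, because the description above does not specify them. How an overflowing node is subsequently split is likewise not described, so the sketch leaves the contents of the node unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Node:
    capacity: int
    data: List[Vector] = field(default_factory=list)
    children: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    representative: Vector   # representative data placed in the parent node
    similarity_bound: float  # similarity value associated with the edge
    child: Node


def promote_representative(
    parent: Node,
    node: Node,
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> Edge:
    """Handle an overflowing node by placing a representative in its parent."""
    if len(node.data) <= node.capacity:
        raise ValueError("node does not exceed its capacity")
    # Assumed selection rule: the item most similar to the node's data overall.
    rep = max(node.data, key=lambda c: sum(similarity(c, o) for o in node.data))
    # Upper limit of the similarity between the representative and the
    # other data in the node.
    bound = max(similarity(rep, o) for o in node.data if o is not rep)
    parent.data.append(rep)
    edge = Edge(representative=rep, similarity_bound=bound, child=node)
    parent.children.append(edge)
    return edge


def bounds_increase_toward_leaves(node: Node, parent_bound: float = float("-inf")) -> bool:
    """Check that edge similarity values do not decrease from root to leaf."""
    for edge in node.children:
        if edge.similarity_bound < parent_bound:
            return False
        if not bounds_increase_toward_leaves(edge.child, edge.similarity_bound):
            return False
    return True


root = Node(capacity=2)
leaf = Node(capacity=2, data=[(1.0, 0.0), (0.9, 0.1), (0.8, 0.2)])
leaf_edge = promote_representative(root, leaf)
```

The checker `bounds_increase_toward_leaves` only verifies the ordering of the values associated with the edges. Keeping that ordering valid while data is inserted is the part that an actual implementation would have to maintain.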