The present invention relates to an apparatus and a method of calculating a correlation between annotations.
Information attached to data such as text by pattern matching, natural language processing, or the like is referred to as an “annotation.” Examples include an annotation such as a product name or a price, extracted and attached by string pattern matching, and an annotation “heat problem,” attached through interpretation of expressions such as “smoke came out” and “it smelled.”
It may be desirable to know a correlation between such annotations. For example, where Annotation a identifies a product and Annotation b identifies a problem, the correlation between Annotations a and b can be used to check whether Problem b is more likely to occur in Product a than in products in general.
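The comparison described above can be illustrated with a simple ratio of relative frequencies. The following is a minimal sketch, not the method of the invention: it assumes each document is represented as a set of annotation strings, and the function name `annotation_lift` and the sample documents are hypothetical. A ratio above 1 suggests that Annotation b co-occurs with Annotation a more often than it occurs overall.

```python
def annotation_lift(docs, a, b):
    """Return P(b | a) / P(b), estimated from a list of annotation sets.

    `docs` is assumed to be a list of sets, one set of annotation
    strings per document (a hypothetical representation).
    """
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)              # documents with a
    n_b = sum(1 for d in docs if b in d)              # documents with b
    n_ab = sum(1 for d in docs if a in d and b in d)  # documents with both
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("ratio is undefined when a or b never occurs")
    return (n_ab / n_a) / (n_b / n)


# Hypothetical sample data: eight documents and their annotations.
docs = [
    {"Product a", "heat problem"},
    {"Product a", "heat problem"},
    {"Product a"},
    {"Product c", "heat problem"},
    {"Product c"},
    {"Product d"},
    {"Product d"},
    {"Product e"},
]

lift = annotation_lift(docs, "Product a", "heat problem")
print(f"lift = {lift:.3f}")
```

In this sample, the problem appears in two of the three documents for Product a but in only three of the eight documents overall, so the ratio exceeds 1.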
Techniques related to attaching annotations to text are known. In particular, WO2010/119615 discloses a learning-data generating device provided with a learning-data candidate clustering unit and a learning-data generating unit. The learning-data candidate clustering unit clusters multiple learning-data candidates, each of which has been given a label indicating an annotation class, based on feature quantities that include context information of the candidates. The learning-data generating unit refers to each of the clusters obtained as a result of the clustering, obtains a distribution of the labels of the learning-data candidates within each cluster, specifies the learning-data candidates that satisfy configured conditions based on the obtained distribution, and generates learning data using the specified learning-data candidates.
Moreover, a technique of calculating a correlation between two fuzzy sets is also known. B. B. Chaudhuri and A. Bhattacharya, “On correlation between two fuzzy sets,” Fuzzy Sets and Systems 118 (2001) 447-456, discloses calculation of a correlation between two fuzzy sets using Spearman's rank correlation coefficient.
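For reference, Spearman's rank correlation coefficient itself can be sketched as follows. This is the textbook definition (the Pearson correlation of the ranks, with tied values given their average rank) and does not reproduce the specific formulation of the cited publication. Representing each fuzzy set as a list of membership degrees over the same ordered elements is an assumption made for illustration, and the names `average_ranks`, `spearman`, `mu_a`, and `mu_b` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical membership degrees of two fuzzy sets over the same four elements.
mu_a = [0.1, 0.4, 0.7, 0.9]
mu_b = [0.2, 0.3, 0.8, 1.0]
rho = spearman(mu_a, mu_b)
print(f"rho = {rho:.3f}")
```

Because the two lists order the four elements identically, the coefficient here equals 1; reversing one list would give -1.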