In recent years, there has been growing demand for the effective use of large amounts of information. To use large amounts of information effectively, it is extremely important to classify and manage that information. For example, classifying newspaper articles by topic, such as politics and the economy, or classifying technical documents by technical field helps narrow down the material to be examined when investigating a specific subject. Classification of information is therefore useful for using information effectively.
There is not just one but many methods for classifying any given information, and since each method has its own advantages and disadvantages, selecting the optimal classification method can be difficult. Methods for selecting an optimal classification method when a single classification criterion is provided have therefore been proposed (e.g., see Patent Literature 1).
Specifically, the information classification method disclosed in Patent Literature 1 executes the following processing. First, feature elements are extracted from the classification sample data for each classification category. Next, the classification method with the highest classification precision is determined from among a plurality of classification methods on the basis of the classification sample data. Subsequently, in accordance with the determined classification method, classification learning information representing the features of each classification category is generated from the extracted feature elements. Finally, a new group of texts to be classified is classified into the classification categories in accordance with the determined classification method and the classification learning information.
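The four steps attributed to Patent Literature 1 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption for illustration, not the cited method: the feature elements are lower-cased word counts, the two candidate methods (`train_centroid`, a word-count profile per category, and `train_majority`, a most-frequent-category baseline) are hypothetical, and "classification precision" is approximated as the fraction of held-out sample texts classified correctly.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]                      # (text, category)
Predictor = Callable[[str], str]
Trainer = Callable[[List[str], List[str]], Predictor]


def extract_features(text: str) -> Counter:
    # Hypothetical "feature elements": lower-cased word counts.
    return Counter(text.lower().split())


def train_centroid(texts: List[str], labels: List[str]) -> Predictor:
    # Hypothetical method A. Learning information: one summed
    # word-count profile per category.
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for text, label in zip(texts, labels):
        profiles[label].update(extract_features(text))

    def predict(text: str) -> str:
        f = extract_features(text)
        # Pick the category whose profile overlaps most with the text.
        return max(profiles, key=lambda c: sum(n * profiles[c][w] for w, n in f.items()))
    return predict


def train_majority(texts: List[str], labels: List[str]) -> Predictor:
    # Hypothetical method B: always answer the most frequent category.
    top = Counter(labels).most_common(1)[0][0]
    return lambda text: top


def accuracy(predict: Predictor, samples: List[Sample]) -> float:
    # Stand-in for "classification precision": share of correct answers.
    return sum(predict(t) == y for t, y in samples) / len(samples)


def select_and_classify(trainers: Dict[str, Trainer], train: List[Sample],
                        held_out: List[Sample], new_texts: List[str]):
    # Step 1: score every candidate method on held-out sample data.
    scores = {name: accuracy(trainer([t for t, _ in train], [y for _, y in train]), held_out)
              for name, trainer in trainers.items()}
    best = max(scores, key=scores.get)
    # Step 2: build learning information with the chosen method on all samples.
    all_samples = train + held_out
    predict = trainers[best]([t for t, _ in all_samples], [y for _, y in all_samples])
    # Step 3: classify the new group of texts.
    return best, [predict(t) for t in new_texts]


train = [("tax budget election", "politics"), ("parliament vote election", "politics"),
         ("stock market prices", "economy"), ("bank interest market", "economy")]
held_out = [("election campaign", "politics"), ("market crash", "economy")]
method, labels = select_and_classify({"centroid": train_centroid, "majority": train_majority},
                                     train, held_out, ["budget election", "interest prices"])
print(method, labels)  # centroid ['politics', 'economy']
```

Refitting the selected method on all sample data before classifying new texts is a choice made for this sketch only; the cited literature may split the sample data differently.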
Incidentally, even if an optimal information classification method is determined, when people's sentiment affects the classification criterion, the criterion differs from person to person, which makes it difficult to find an intrinsically correct solution to problems regarding classification (classification problems). Consider, for example, classification problems such as whether a certain sentence expresses an opinion, or whether a certain feature of a product is a factor in that product selling well. Because such classification problems depend on people's sentiment, the classification criterion differs from person to person. To resolve such differences in the classification criterion, a technique has been proposed that first determines the classification criterion by a poll involving a plurality of people and then executes information classification.
The following two classification methods are known examples of conventional polling-type information classification techniques. In the following description, it is assumed that each of a plurality of people (n people) has classified a number of pieces of information into categories in advance, and that the results of each person's category classification are used as sample data. Also, in the following methods, category classification is executed such that information viewed by m (≤ n) or more people as belonging to a certain category is classified into that category. Hereinafter, the case where target information is classified into the category of interest is referred to as a “positive example”, and the case where target information is not classified into the category of interest is referred to as a “negative example”.
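Writing y_i(x) ∈ {0, 1} for whether person i (i = 1, ..., n) judged item x to belong to the category of interest (notation introduced here for illustration only), the m-or-more polling rule described above can be stated as:

```latex
\hat{y}(x) =
\begin{cases}
1 \ (\text{positive example}) & \text{if } \sum_{i=1}^{n} y_i(x) \ge m,\\
0 \ (\text{negative example}) & \text{otherwise.}
\end{cases}
```

For instance, with n = 5 and m = 3, an item placed in the category by three of the five people is a positive example, while an item placed there by only two is a negative example.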
With the first classification method, sample data obtained from the category classification performed by the n people is acquired first. Next, an information classifier having a specific information classification rule is constructed on the basis of the sample data (e.g., see Non-Patent Literature 1). An example of such a specific information classification rule is one according to which information judged by m or more people to belong to the category of interest is taken as a positive example of that category, and all other information is taken as a negative example of that category.
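The first classification method applies the m-or-more rule to the sample data before learning, so a single classifier is trained on the aggregated labels. The Python sketch below illustrates that order of operations under stated assumptions: the learner `train_word_vote` (positive vs negative word counts) is a hypothetical stand-in, not the classifier of Non-Patent Literature 1, and the opinion-detection texts and judgments are invented toy data.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[str], int]   # returns 1 (positive) or 0 (negative)


def aggregate_labels(judgments: Sequence[Sequence[int]], m: int) -> List[int]:
    # judgments[j][i] == 1 if person i put item j in the category of interest.
    # m-or-more rule: positive example if at least m people agreed.
    return [1 if sum(row) >= m else 0 for row in judgments]


def train_word_vote(texts: Sequence[str], labels: Sequence[int]) -> Classifier:
    # Hypothetical learner: compare word counts seen in positive vs negative examples.
    pos, neg = Counter(), Counter()
    for text, y in zip(texts, labels):
        (pos if y else neg).update(text.lower().split())

    def classify(text: str) -> int:
        words = text.lower().split()
        return 1 if sum(pos[w] for w in words) > sum(neg[w] for w in words) else 0
    return classify


def first_method(texts: Sequence[str], judgments: Sequence[Sequence[int]], m: int) -> Classifier:
    # Apply the voting rule to the sample data first, then build ONE classifier.
    return train_word_vote(texts, aggregate_labels(judgments, m))


texts = ["I think this is great", "the price is 10 dollars",
         "I believe it is too expensive", "it ships in a box"]
judgments = [[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]   # n = 3 people
clf = first_method(texts, judgments, m=2)
print(aggregate_labels(judgments, 2))                        # [1, 0, 1, 0]
print([clf("I think it is cheap"), clf("a box of parts")])   # [1, 0]
```

Because the votes are collapsed before learning, the classifier never sees how individual people disagreed; only the aggregated positive and negative examples reach it.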
With the second classification method, sample data obtained from the category classification performed by the n people is likewise acquired first. Next, the sample data is analyzed per person, and n information classifiers, one for each person, are constructed. Classification into positive and negative examples is then executed: information judged by m or more of the information classifiers to belong to the category of interest is taken as a positive example of that category, and all other information is taken as a negative example of that category. The second classification method differs from the first in that a separate information classifier is constructed according to each person's criterion.
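The second classification method reverses that order: one classifier is learned per person from that person's own labels, and the m-or-more rule is applied to the classifiers' outputs at classification time. The sketch below reuses the same hypothetical learner and toy data as an illustration only; neither comes from the cited literature.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], int]   # returns 1 (positive) or 0 (negative)


def train_word_vote(texts: Sequence[str], labels: Sequence[int]) -> Classifier:
    # Hypothetical learner: compare word counts seen in positive vs negative examples.
    pos, neg = Counter(), Counter()
    for text, y in zip(texts, labels):
        (pos if y else neg).update(text.lower().split())

    def classify(text: str) -> int:
        words = text.lower().split()
        return 1 if sum(pos[w] for w in words) > sum(neg[w] for w in words) else 0
    return classify


def second_method(texts: Sequence[str], judgments: Sequence[Sequence[int]], m: int) -> Classifier:
    n = len(judgments[0])
    # One classifier per person, trained only on that person's own labels.
    per_person = [train_word_vote(texts, [row[i] for row in judgments]) for i in range(n)]

    def classify(text: str) -> int:
        # Positive example only if at least m of the n classifiers say so.
        votes = sum(c(text) for c in per_person)
        return 1 if votes >= m else 0
    return classify


texts = ["I think this is great", "the price is 10 dollars",
         "I believe it is too expensive", "it ships in a box"]
judgments = [[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]   # n = 3 people
clf = second_method(texts, judgments, m=2)
print([clf("I think it is cheap"), clf("a box of parts")])   # [1, 0]
```

With this toy data, the third person's classifier returns 0 for "I think it is cheap" while the other two return 1, so the 2-of-3 vote still makes it a positive example.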