Conventionally, classifying data items in accordance with classification rules created by a user has involved the following problems:
(1) Addition and/or Modification of Rules When Data is Added
Generally, data items to be classified are added one after another. Because newly added data items cannot always be classified using only the classification rules that were created in advance, new rules must be added and/or existing rules must be modified whenever such data items arrive. However, it is not easy to create effective classification rules.
(2) Consistency Between the Classification Rules and Classified Data Items
When, because of the addition and/or modification of a rule, a data item is classified into a category different from its former category, the rule is said to be inconsistent with the classified data item and is called an inconsistent rule. In a directory type search service, which classifies data items according to a predefined category tree, a means is desired to guarantee that the classification results for already classified data items are identical before and after the rule base is updated. To address this problem, it is verified whether each already classified data item is still classified into its formerly identified category under the added and/or modified rule, that is, whether there is no inconsistency. If there is an inconsistency, the added and/or modified rule is revised repeatedly until the inconsistency disappears. This verification is costly, and a technique for automatically generating rules free of such inconsistency is therefore desired.
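The verification described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a data item is a set of satisfied conditions, that a rule is a pair of required conditions and a category, and that the rules are evaluated in order with the first match deciding the category. The names `classify` and `find_inconsistencies` are hypothetical and do not come from any cited system.

```python
def classify(rules, item):
    """Return the category of the first rule whose conditions the item satisfies."""
    for conditions, category in rules:
        if conditions <= item:  # every required condition is present in the item
            return category
    return None  # no rule matched


def find_inconsistencies(rules, classified):
    """Return (item, former_category, new_category) for each item whose category changed."""
    inconsistencies = []
    for item, former in classified.items():
        current = classify(rules, item)
        if current != former:
            inconsistencies.append((item, former, current))
    return inconsistencies


# Hypothetical rule base and one already classified data item.
old_rules = [(frozenset({"P", "Q"}), "C1")]
classified = {frozenset({"P", "Q", "R"}): "C1"}

# A rule added ahead of the existing one changes the category of the item.
new_rules = [(frozenset({"P", "R"}), "C2")] + old_rules

unchanged = find_inconsistencies(old_rules, classified)
changed = find_inconsistencies(new_rules, classified)
```

With the original rule base no inconsistency is reported, whereas the updated rule base moves the item from C1 to C2. Every already classified item must be re-evaluated after each rule change, which is the source of the verification cost noted above.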
(3) Consistency Between Classification Rules
There are cases where an already-known data item is classified into different categories by the added and/or modified rule and by other rules, that is, cases where conflicting rules are created. For example, given a first rule that “if P and Q are satisfied, the item is classified into C1” and a second rule that “if P and R are satisfied, the item is classified into C2”, a data item satisfying “P, Q, and R” is classified into C1 by the first rule and into C2 by the second rule. If C1 is different from C2, the first rule and the second rule are conflicting rules. Because a rule base should ultimately classify each data item into a single category, a means to resolve the conflict is needed when one occurs. A well-known method (first matching method) determines the evaluation order of the rules in advance and takes the category of the first matching rule as the classifying destination. However, if plural conflicting rules exist, the classifying destination is strongly influenced by the order in which the rules are applied, so that it is difficult to judge the validity of each individual rule. Therefore, it is important to prevent the creation of conflicting rules, but this is generally difficult to do.
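The conflict in the example above, and the order dependence of the first matching method, can be sketched as follows. The representation is an assumption made for illustration: a data item is a set of satisfied conditions and a rule is a pair of required conditions and a category. The function names are hypothetical.

```python
def matching_categories(rules, item):
    """Return the set of categories assigned by every rule the item satisfies."""
    return {category for conditions, category in rules if conditions <= item}


def first_match(rules, item):
    """First matching method: the first satisfied rule decides the category."""
    for conditions, category in rules:
        if conditions <= item:
            return category
    return None  # no rule matched


rule1 = (frozenset({"P", "Q"}), "C1")  # if P and Q are satisfied, classify into C1
rule2 = (frozenset({"P", "R"}), "C2")  # if P and R are satisfied, classify into C2
item = frozenset({"P", "Q", "R"})      # a data item satisfying P, Q, and R

# More than one category means the two rules conflict for this item.
conflict = matching_categories([rule1, rule2], item)

# The result of the first matching method depends on the evaluation order.
result_rule1_first = first_match([rule1, rule2], item)
result_rule2_first = first_match([rule2, rule1], item)
```

Here the item matches both rules, and swapping the evaluation order changes the classifying destination from C1 to C2, which is why the validity of an individual rule is hard to judge once conflicting rules exist.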
Incidentally, for example, JP-A-2002-157262 discloses a technique for presenting a user with information for evaluating the validity of a classification rule that the user would like to register, in a case where the classification system is objective and complicated. More specifically, in a method of supporting the definition of classification rules in a document classification system that classifies electronic documents into categories based on the classification rules, a classification rule input by the user through an input device is applied to plural classified electronic documents, a reliability degree of the applied classification rule and its contribution degree to the improvement or maintenance of classification accuracy are calculated, and the calculation results are reported to the user through an output device. However, the system has no function for generating candidate classification rules. Accordingly, the user has to master the classification system and the features of the electronic documents, and it is difficult for unskilled users to create classification rules. In addition, taking into account only newly created rules may cause conflicts with the existing rules. However, this publication does not address this problem.
As described above, according to the background art, it is difficult to generate an appropriate classification rule for new data items, and to resolve the conflict with the existing classification rules.