The present invention relates to a technique for classifying data. In particular, the present invention relates to a technique for appropriately classifying data by learning a model on the basis of previously-given training data.
A classification problem in machine learning is known as a major problem applicable to various fields. For example, by solving a classification problem, a particular condition of a patient can be predicted based on his/her test result, or whether or not to give credit approval to a credit applicant can be judged based on an attribute of the applicant. A classification problem is: to learn correspondence relationships between data and classes by using training data that are classified into a plurality of classes; and to then appropriately classify data yet to be classified, on the basis of the learnt relationships. Relationships between data and classes are learned in order to improve the accuracy of classification. The accuracy of classification is often evaluated on a correct rate of classification.
For descriptions of sampling of training data for machine learning, the following documents can be referred to, for example.
[Patent document 1] Japanese Patent Application Publication No. 2005-92253
[Patent document 2] U.S. Pat. No. 6,938,049
[Non-patent document 1] Leo Breiman (1996), Bagging Predictor, Machine Learning, 24(2):123-140
[Non-patent document 2] Dacheng Tao et al. (2006), Asymmetric Bagging and Random Subspace for Support Vector Machines-Based Relevance Feedback in Image Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7):1088-1099
In some problems, however, it is inappropriate to use a correct rate of classification as the accuracy of classification. Assume that extremely few data are to be classified into a certain class, for example. In this case, if relationships between data and classes are learnt such that all the data would be classified into classes other than the certain class, a high correct rate can be obtained in the classification. However, such learning may sometimes obstruct acquisition of useful information. For example, through such learning, it is difficult to find, from a large number of patients, a small number of patients who have a particular disease, or to find, from a large number of credit applicants, a small number of applicants who are not eligible for credit approval.