Automatic classification using machine learning is one method for automatically classifying a document into one of a plurality of categories. In such classification, a document classification device learns the features of each classification category from learning sample documents that have been divided in advance into a plurality of classification categories, and classifies a classification target document based on the result of that learning.
Accordingly, the classification accuracy of a document classification device using machine learning depends on the learning sample documents. However, manually collecting a large volume of correctly classified learning sample documents takes a great deal of work, which has been an obstacle to practical application. To address this problem, Patent Document 1 discloses a technique of generating learning sample documents classified into categories by performing rule-based filtering using string matching on unclassified sample documents.
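As an illustration only, rule-based filtering by string matching of the general kind described above might be sketched as follows. The details of Patent Document 1 are not reproduced here. The rule table `RULES`, the function names `label_by_rules` and `build_learning_samples`, the keywords, and the policy of discarding documents that match zero or multiple categories are all assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: each category is associated with keyword strings.
# These categories and keywords are illustrative assumptions only.
RULES: Dict[str, List[str]] = {
    "sports": ["soccer", "baseball"],
    "finance": ["stock", "interest rate"],
}


def label_by_rules(text: str, rules: Dict[str, List[str]]) -> Optional[str]:
    """Return the single category whose keywords occur in the text.

    Returns None when no category matches or when more than one matches
    (an assumed policy for keeping ambiguous documents out of the samples).
    """
    lowered = text.lower()
    matched = [
        category
        for category, keywords in rules.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matched[0] if len(matched) == 1 else None


def build_learning_samples(
    documents: List[str], rules: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Filter unclassified documents into (document, category) pairs."""
    samples: List[Tuple[str, str]] = []
    for document in documents:
        category = label_by_rules(document, rules)
        if category is not None:
            samples.append((document, category))
    return samples


if __name__ == "__main__":
    unclassified = [
        "The soccer match ended in a draw.",
        "Stock prices fell.",
        "Baseball players buy stock.",  # matches two categories, discarded
        "Weather is nice.",             # matches no category, discarded
    ]
    for document, category in build_learning_samples(unclassified, RULES):
        print(category, "|", document)
```

In this sketch, only documents that match exactly one category are kept as learning sample documents. A document matching several categories or none is left out, which trades sample volume for label reliability.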