(i) Technical Field
The present invention relates to a document classification device, a document classification method, and a computer readable medium.
(ii) Related Art
Techniques for classifying document data into specific categories are known.
When the boundaries between categories are ambiguous, document data cannot always be clearly classified into a single category. For example, a piece of document data may belong to the “History” category as well as the “Summary” category, or to the “Configuration” category as well as the “Summary” category. Furthermore, when people classify document data, the way in which classification is performed varies from person to person, so the categories assigned to the same document data may differ depending on who assigns them. It is therefore difficult to guarantee that categories are independent of one another, and category assignments fluctuate between people. As a result, for learning data that has been classified into a specific category in advance, the combination of the learning data and the category may not be accurate.
Categories could be defined more finely in an attempt to prevent such fluctuations. However, a finer category definition may require a higher category setting cost, and fluctuations may still occur depending on the person who assigns the categories.
Problems also arise when categories are determined automatically. For example, when the term “Summary” does not appear in a specific data set, the data set cannot be classified into a “Summary” category even if it in fact represents a summary. Moreover, a person may not be able to understand the meaning of categories that have been determined automatically.