1. Field of the Invention
The present invention relates to a system that classifies documents stored in a database in order to provide a user with information about the stored documents, and to a document management method.
2. Description of the Related Art
Recently, owing to the rapid expansion and spread of Internet services, the volume of documents and knowledge that an organization obtains over the Internet, among all the documents it needs, has been increasing steadily. Therefore, document structuring techniques, which must precede information retrieval tasks such as content-based retrieval, filtering and routing in a large-scale document information system, have become highly significant.
When document domain experts provide the structure of a basic class taxonomy tree made up of categories, document classifiers extract attributes from documents already stored in or newly input to the system, and then assign each document, on the basis of those attributes, to one of the categories formed in the class taxonomy tree.
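By way of illustration only, the assignment step described above may be sketched as follows. The sketch is not taken from the related art itself: the `Category` structure, the use of word tokens as the extracted attributes, and the keyword-overlap scoring are all assumptions introduced for this example, standing in for the judgment that a document classifier would apply.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of a class taxonomy tree (hypothetical structure)."""
    name: str
    keywords: set = field(default_factory=set)     # attributes chosen by a domain expert (assumed)
    children: list = field(default_factory=list)   # sub-categories
    documents: list = field(default_factory=list)  # documents assigned to this category


def extract_attributes(text):
    """Assumed attribute extraction: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def leaves(category):
    """Yield every category in the tree that has no sub-categories."""
    if not category.children:
        yield category
    else:
        for child in category.children:
            yield from leaves(child)


def assign(root, text):
    """Assign the document to the leaf category whose keywords overlap most
    with the document's attributes (ties go to the first leaf encountered)."""
    attrs = extract_attributes(text)
    best = max(leaves(root), key=lambda c: len(c.keywords & attrs))
    best.documents.append(text)
    return best


root = Category("root", children=[
    Category("law", {"patent", "court"}),
    Category("sports", {"match", "goal"}),
])
chosen = assign(root, "The court issued a patent ruling")
print(chosen.name)
```

In this sketch a document is always placed in some existing leaf category, which mirrors the limitation discussed next: a document that fits no category still lands in the tree unless the tree itself is changed.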
The structure of the class taxonomy tree initially set by the domain experts has to be changed as documents continue to be assigned. The domain experts therefore have to change the structure through a close examination of the contents of the documents assigned to the categories. That is, when a new set of documents not covered by the existing class taxonomy tree is input, a new category capable of containing that set has to be generated and attached at a predetermined position of the class taxonomy tree. Likewise, when the heterogeneity among the contents of the documents included in a category becomes high enough that some of those documents could be bound into a new category, the existing category should be divided into two or more categories.
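The second of these cases, dividing a category whose contents have become too heterogeneous, may be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure the domain experts follow: heterogeneity is assumed to be the mean pairwise Jaccard distance between the word sets of the documents, the threshold of 0.8 is arbitrary, the taxonomy is flattened to a dictionary of category names, and the two-way split seeded on the most dissimilar pair is a stand-in for the experts' close examination.

```python
import re
from itertools import combinations


def tokens(text):
    """Assumed attribute extraction: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def heterogeneity(docs):
    """Mean pairwise Jaccard distance; 0.0 for fewer than two documents."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(tokens(a), tokens(b)) for a, b in pairs) / len(pairs)


def split_category(docs):
    """Divide docs into two groups seeded by the two most dissimilar documents."""
    seed_a, seed_b = max(
        combinations(docs, 2),
        key=lambda p: jaccard_distance(tokens(p[0]), tokens(p[1])),
    )
    group_a, group_b = [], []
    for doc in docs:
        d_a = jaccard_distance(tokens(doc), tokens(seed_a))
        d_b = jaccard_distance(tokens(doc), tokens(seed_b))
        (group_a if d_a <= d_b else group_b).append(doc)
    return group_a, group_b


def maintain(tree, threshold=0.8):
    """Return a new mapping in which each category whose heterogeneity
    exceeds the (assumed) threshold is replaced by two sub-categories."""
    updated = {}
    for name, docs in tree.items():
        if len(docs) >= 2 and heterogeneity(docs) > threshold:
            group_a, group_b = split_category(docs)
            updated[name + "/1"] = group_a
            updated[name + "/2"] = group_b
        else:
            updated[name] = docs
    return updated


docs = [
    "patent court ruling",
    "court patent appeal",
    "football match goal",
    "goal match score",
]
print(maintain({"misc": docs}))
```

Here the two legal documents and the two sports documents share no words across groups, so the category exceeds the assumed threshold and is divided along that boundary. The first case, attaching a newly generated category at a position in the tree, is not covered by this sketch.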
However, the related art document management method, which depends on human effort both for document classification and for management of the class taxonomy tree, is of limited applicability in recent working environments, where the sets of documents change continually and the number of documents increases rapidly.
Also, each classifier has different experience and knowledge. It is therefore difficult to maintain consistency in the document classification over time.