1. Field of the Invention
The present invention relates to a method and system for document classification, and particularly to a method and system for document classification that employs multiple algorithms to classify documents in multiple dimensions.
2. Description of the Related Art
In current document classification mechanisms, classification is always performed in a single dimension. That is, a document is classified into one or more detailed catalogues by employing a single classification algorithm. Since only one algorithm is employed in the classification procedure, the document is classified according to its most noticeable feature, such as the keyword that appears most often or document similarity.
However, features that are important but not dominant may not be extracted and classified. For example, the author of the document cannot be classified, since the author's name appears only on the cover page. Likewise, the technique used in a system analysis document cannot be classified, since the analysis carries more weight in the document than the technique.
FIG. 1 is a schematic diagram showing an example of a classification structure 100 of the documents in an enterprise. The classification structure 100 includes four categories: "Author" 110, "Classification" 120, "Analysis Method" 130, and "Application Area" 140. Category "Author" 110 includes the detailed catalogues "Employee A" 111 and "Employee B" 112; category "Classification" 120 includes the detailed catalogues "Requirement Specification" 121 and "Design Specification" 122; category "Analysis Method" 130 includes the detailed catalogues "SDG2 Analysis" 131 and "Use Case Analysis" 132; and category "Application Area" 140 includes the detailed catalogues "Catalog Service" 141 and "Supply Chain Management" 142.
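For illustration only, the structure of FIG. 1 can be written down as a nested mapping. The sketch below is an assumption about one convenient representation, not something the specification prescribes; the names `CLASSIFICATION_STRUCTURE` and `catalogues_of` are hypothetical, and the integer keys simply reuse the reference numerals of FIG. 1.

```python
# Hypothetical representation of classification structure 100:
# category name -> {reference numeral: detailed catalogue name}.
CLASSIFICATION_STRUCTURE = {
    "Author": {111: "Employee A", 112: "Employee B"},
    "Classification": {
        121: "Requirement Specification",
        122: "Design Specification",
    },
    "Analysis Method": {131: "SDG2 Analysis", 132: "Use Case Analysis"},
    "Application Area": {
        141: "Catalog Service",
        142: "Supply Chain Management",
    },
}


def catalogues_of(category):
    """Return the detailed catalogue names of one category (one dimension)."""
    return list(CLASSIFICATION_STRUCTURE[category].values())
```

Each top-level key corresponds to one dimension in which a document may be classified.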
As an example, suppose the requirements of a catalog service are described in a specification. The term "Catalog Service" is mentioned repeatedly in this specification; the author of the specification, "Employee A", and the term "Requirement Specification" appear only on the cover page; and the term "Analysis Method" appears only once, in one section of the specification. In conventional methods, since the feature "Catalog Service" is stronger than the features "Employee A", "Requirement Specification" and/or "Analysis Method", the specification is classified only into the detailed catalogue "Catalog Service" 141, as shown in FIG. 2 (denoted by the black circle). The features "Employee A", "Requirement Specification" and/or "Analysis Method" are not taken into consideration.
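The limitation described above can be reproduced with a minimal sketch. It assumes that the single conventional algorithm is a plain keyword-frequency count over all detailed catalogue names; the function `classify_single_dimension` and the sample text are hypothetical and serve only to show why the weaker features are lost.

```python
# All detailed catalogues of FIG. 1, treated as one flat list, as a
# single-dimension classifier would see them.
CATALOGUES = [
    "Employee A", "Employee B",
    "Requirement Specification", "Design Specification",
    "SDG2 Analysis", "Use Case Analysis",
    "Catalog Service", "Supply Chain Management",
]


def classify_single_dimension(text):
    """Assumed conventional method: keep only the most frequent catalogue name."""
    counts = {name: text.count(name) for name in CATALOGUES}
    return max(counts, key=counts.get)


# Made-up sample document: a cover page followed by a body in which
# "Catalog Service" dominates.
document = (
    "Requirement Specification\n"
    "Author: Employee A\n\n"
    + "The Catalog Service shall list products. " * 5
    + "Use Case Analysis is applied in one section."
)

strongest = classify_single_dimension(document)
```

Here `strongest` is "Catalog Service"; the author, the document type, and the analysis method each occur once and are therefore discarded.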
It is therefore an object of the present invention to provide a method and system for document classification with multiple dimensions and multiple algorithms. Users can set categories (dimensions) and the corresponding algorithms according to the characteristics of documents, so as to employ these algorithms to classify documents in respective dimensions.
To achieve the above object, the present invention provides a method for document classification with multiple dimensions and multiple algorithms. According to one aspect of the invention, a classification preference is first set. The classification preference includes a plurality of categories, and each of the categories has a corresponding algorithm. Then, a document is classified according to the classification preference, so that one or more detailed catalogues corresponding to each of the categories are acquired.
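The method can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the three algorithms (`cover_page_match`, `most_frequent`, `any_mention`), the shape of `PREFERENCE`, and the sample document are all hypothetical. The specification only requires that each category has its own corresponding algorithm.

```python
from typing import Callable, Dict, List, Tuple

# An algorithm takes the document text and the detailed catalogues of one
# category, and returns the catalogues the document belongs to.
Algorithm = Callable[[str, List[str]], List[str]]


def cover_page_match(text: str, catalogues: List[str]) -> List[str]:
    """Hypothetical algorithm: search only the first two lines (the cover page)."""
    cover = "\n".join(text.splitlines()[:2])
    return [c for c in catalogues if c in cover]


def most_frequent(text: str, catalogues: List[str]) -> List[str]:
    """Hypothetical algorithm: keep the catalogue name(s) that occur most often."""
    counts = {c: text.count(c) for c in catalogues}
    best = max(counts.values(), default=0)
    return [c for c, n in counts.items() if n == best and n > 0]


def any_mention(text: str, catalogues: List[str]) -> List[str]:
    """Hypothetical algorithm: a single mention anywhere is enough."""
    return [c for c in catalogues if c in text]


# Classification preference: category -> (detailed catalogues, algorithm).
PREFERENCE: Dict[str, Tuple[List[str], Algorithm]] = {
    "Author": (["Employee A", "Employee B"], cover_page_match),
    "Classification": (
        ["Requirement Specification", "Design Specification"],
        cover_page_match,
    ),
    "Analysis Method": (["SDG2 Analysis", "Use Case Analysis"], any_mention),
    "Application Area": (
        ["Catalog Service", "Supply Chain Management"],
        most_frequent,
    ),
}


def classify(text: str, preference) -> Dict[str, List[str]]:
    """Classify the document in every category with that category's algorithm."""
    return {cat: algo(text, cats) for cat, (cats, algo) in preference.items()}


# Made-up sample document: cover page, blank line, then the body.
document = (
    "Requirement Specification\n"
    "Author: Employee A\n\n"
    + "The Catalog Service shall list products. " * 5
    + "Use Case Analysis is applied in one section."
)

result = classify(document, PREFERENCE)
```

With this sample text, `result` holds one detailed catalogue per category: "Employee A", "Requirement Specification", "Use Case Analysis", and "Catalog Service". Because each dimension is evaluated independently, a strong feature in one category no longer hides weaker features in the others.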
According to another aspect of the invention, a document is first received, and a classification code is determined. The classification code contains a classification preference. The classification preference includes a plurality of categories, and each of the categories has a corresponding algorithm. Then, the classification code is executed to classify the document, so that one or more detailed catalogues corresponding to each of the categories are acquired.
According to an embodiment of the present invention, a system for document classification with multiple dimensions and multiple algorithms is also provided. The system includes a preference database, a generator, and a classification unit. The preference database stores at least one classification preference. The classification preference includes a plurality of categories, and each of the categories has a corresponding algorithm. The generator transforms the classification preference into a classification code. The classification unit executes the classification code to classify the document, so that one or more detailed catalogues corresponding to each of the categories are acquired.
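One possible arrangement of the three components is sketched below. It assumes that a classification preference is stored declaratively (algorithms referenced by name) and that the "classification code" is an executable callable produced from it; the specification does not state either point. All class, function, and key names are hypothetical, and the two algorithms are simple stand-ins.

```python
from typing import Callable, Dict, List


def cover_page_match(text: str, catalogues: List[str]) -> List[str]:
    """Stand-in algorithm: search only the first two lines (the cover page)."""
    cover = "\n".join(text.splitlines()[:2])
    return [c for c in catalogues if c in cover]


def most_frequent(text: str, catalogues: List[str]) -> List[str]:
    """Stand-in algorithm: keep the catalogue name(s) that occur most often."""
    counts = {c: text.count(c) for c in catalogues}
    best = max(counts.values(), default=0)
    return [c for c, n in counts.items() if n == best and n > 0]


# Assumed registry that lets a stored preference refer to algorithms by name.
ALGORITHMS = {"cover_page_match": cover_page_match, "most_frequent": most_frequent}


class PreferenceDatabase:
    """Stores at least one classification preference under a name."""

    def __init__(self) -> None:
        self._preferences: Dict[str, dict] = {}

    def store(self, name: str, preference: dict) -> None:
        self._preferences[name] = preference

    def load(self, name: str) -> dict:
        return self._preferences[name]


def generate_classification_code(preference: dict) -> Callable[[str], dict]:
    """Generator: transform a preference into an executable classification code."""
    steps = [
        (category, spec["catalogues"], ALGORITHMS[spec["algorithm"]])
        for category, spec in preference.items()
    ]

    def classification_code(text: str) -> Dict[str, List[str]]:
        return {cat: algo(text, cats) for cat, cats, algo in steps}

    return classification_code


class ClassificationUnit:
    """Executes a classification code against a received document."""

    def execute(self, classification_code, text: str) -> Dict[str, List[str]]:
        return classification_code(text)


database = PreferenceDatabase()
database.store("enterprise", {
    "Author": {"catalogues": ["Employee A", "Employee B"],
               "algorithm": "cover_page_match"},
    "Application Area": {"catalogues": ["Catalog Service", "Supply Chain Management"],
                         "algorithm": "most_frequent"},
})

code = generate_classification_code(database.load("enterprise"))
document = (
    "Requirement Specification\nAuthor: Employee A\n\n"
    + "The Catalog Service shall list products. " * 5
)
catalogues_found = ClassificationUnit().execute(code, document)
```

Keeping the stored preference declarative means the same preference can be regenerated into code whenever a user changes the categories or swaps the algorithm assigned to one of them.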
It should be noted that the document is classified in each of the categories by employing the algorithms corresponding to the categories respectively.