Field
The present specification generally relates to electronic documents and, more specifically, to methods of organizing electronic documents by reclassifying and clustering documents using a similarity value indicating a similarity between documents of document pairs.
Technical Background
Electronic documents of a document corpus are often classified according to a classification system. For example, journal articles may be classified according to the All Science Journal Codes or the Field of Science and Technology classification systems, both of which have a plurality of hierarchical classification codes associated therewith. Journal articles are often classified according to the journal in which they have been published. However, in many cases, journal articles (or other types of documents) are misclassified because a journal article may actually be more relevant to articles associated with a classification code other than the classification code associated with the journal. Additionally, journals having a broad focus may have a broad classification code associated therewith; however, individual articles within the journal often have a narrower focus and could be classified more accurately. Misclassification of documents in the classification system may cause problems for researchers. For example, the misclassification of documents may prevent a researcher from finding documents relevant to his or her search.
Further, classification systems may not provide sufficient granularity to be beneficial to a researcher. For example, there may be a large number of sub-groups of related documents below the lowest level of the classification system. These sub-groups are not represented by the classification system, particularly where the classification system is only two or three levels deep. Additionally, manual creation and management of multiple sub-levels in a hierarchical classification system may be too burdensome.
Accordingly, a need exists for alternative methods of reclassifying and clustering documents in a classification system to provide for additional levels of the classification system.