Indexing documents is useful for categorizing their content for later searching, retrieval, and possibly other operations. Indexing can involve applying to a document a code that relates to its subject matter. The codes are in turn linked with categories for various types of subject matter. To search for all documents containing content in a particular category, a computer system can simply retrieve all documents containing the code linked to that category.
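By way of illustration only, code-based retrieval of this kind can be sketched as follows. The sketch is a minimal example under stated assumptions, not a description of any particular system: the category names, the code values such as "C01", and the function names `build_index` and `search_category` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical table linking each category to its code (illustrative values only).
CATEGORY_CODES = {"sports": "C01", "finance": "C02"}


def build_index(documents):
    """Map each code to the set of ids of the documents that carry it."""
    index = defaultdict(set)
    for doc_id, codes in documents.items():
        for code in codes:
            index[code].add(doc_id)
    return index


def search_category(index, category):
    """Return the ids of all documents coded with the given category's code."""
    code = CATEGORY_CODES[category]
    return sorted(index.get(code, set()))


# Hypothetical documents, each listed with the codes a person applied to it.
documents = {
    "doc1": ["C01"],
    "doc2": ["C02"],
    "doc3": ["C01", "C02"],
}

index = build_index(documents)
print(search_category(index, "sports"))
```

In this sketch a search never examines document content. It returns only those documents to which the category's code was applied, so the result is only as good as the coding.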
The accuracy of traditional indexing of documents relies heavily on the person applying the codes to the documents. Each code must be individually and manually entered in each document. Consequently, if a document relates to one particular category “A” but was coded under a different category “B,” a search within category “A” will most likely fail to identify the document.
The granularity of coding is likewise determined manually. A person must enter codes relating to broad categories for a document as well as any codes for narrower categories. If different persons are coding documents, as is often the case due to high volume, they may apply broad and narrow categories in different and potentially inconsistent ways, resulting in less effective indexing.
Accordingly, a need exists for improved methods and taxonomies for indexing data.