The present disclosure relates to content management including document classification. Document classification involves assigning a document to one or more categories based on contents of the document. Various taxonomies can be used in any conventional document classification system. Document classification can be manual or automated. Automated document classification typically operates by processing electronic documents. The electronic documents can include files originally created by a computerized device, electronic copies of paper documents scanned and processed into computer recognizable text or images, etc. Automated document classification operates by identifying keywords within a given document, and then assigning the given document to a category based on the identified keywords.
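By way of illustration only, the keyword-based approach described above can be sketched as follows. This is a minimal sketch rather than any particular system's implementation: the taxonomy in `CATEGORY_KEYWORDS`, the function name `classify`, and the `min_hits` threshold are all hypothetical and chosen only for the example.

```python
import re
from collections import Counter

# Hypothetical taxonomy mapping each category to its keywords (illustrative only).
CATEGORY_KEYWORDS = {
    "finance": {"invoice", "payment", "balance"},
    "legal": {"contract", "liability", "clause"},
}


def classify(text, category_keywords=CATEGORY_KEYWORDS, min_hits=1):
    """Assign a document to every category whose keywords occur at least min_hits times."""
    # Identify keywords: lowercase the text and count each alphabetic token.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category by the total number of occurrences of its keywords.
    scores = {
        category: sum(tokens[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    # A document may fall into one or more categories, or none.
    return sorted(category for category, hits in scores.items() if hits >= min_hits)
```

For example, `classify("The invoice lists a payment due.")` assigns the text to the hypothetical "finance" category only, because two of that category's keywords appear and none of the "legal" keywords do.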
Document classification systems usually operate prior to, or in conjunction with, a search and retrieval system, or with other systems for performing further actions on the classified documents. A search and retrieval system uses one or more keywords or phrases to find matching resources within a data repository. With a set of documents classified, a search query can be restricted to a selected class of documents, which yields more accurate search results and helps identify documents of interest.
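As a further illustration, restricting a search query to a selected class of documents can be sketched as below. The record layout (dictionaries with `id`, `classes`, and `text` fields) and the function name `search_in_class` are assumptions made for the example, and the simple substring match stands in for whatever matching a real search and retrieval system would use.

```python
def search_in_class(documents, selected_class, query):
    """Return ids of documents in selected_class whose text contains every query term."""
    terms = query.lower().split()
    matches = []
    for doc in documents:
        # Skip documents outside the selected class before matching any terms.
        if selected_class not in doc["classes"]:
            continue
        text = doc["text"].lower()
        # Simple substring match per term; an assumption, not a real ranking method.
        if all(term in text for term in terms):
            matches.append(doc["id"])
    return matches


# Hypothetical repository of already-classified documents.
DOCUMENTS = [
    {"id": 1, "classes": {"finance"}, "text": "Invoice for annual payment"},
    {"id": 2, "classes": {"legal"}, "text": "Payment clause in the contract"},
    {"id": 3, "classes": {"finance", "legal"}, "text": "Contract payment schedule"},
]
```

Here the query "payment" matches all three documents in the repository, but `search_in_class(DOCUMENTS, "legal", "payment")` considers only the documents classified as "legal".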