To find a particular document of interest, a computer user may conduct an electronic search of a collection of documents through a query engine. However, a search of some collections, such as web pages on the Internet and document databases, may return numerous documents, generally based on the query terms supplied by the user. To address the dispersion of retrieved documents, the results, or links to documents, may be further sorted or filtered by date, popularity, or similarity to the search terms, and/or categorized according to a manually derived hierarchical taxonomy. Additionally or alternatively, the user may select a particular category to restrict the search to those documents in that category.
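As an illustration only, the sorting and category restriction described above might be sketched as follows. The `Result` record, its field names, and the `refine` function are hypothetical and are not drawn from any particular query engine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """Hypothetical record for one retrieved document."""
    title: str
    published: date
    popularity: int    # e.g. a view or link count (assumed measure)
    similarity: float  # similarity of the document to the query terms
    category: str      # category from a manually derived taxonomy


def refine(results, category=None, sort_key="similarity"):
    """Optionally restrict results to one category, then sort descending.

    sort_key names a Result field: "published", "popularity", or "similarity".
    """
    if category is not None:
        results = [r for r in results if r.category == category]
    return sorted(results, key=lambda r: getattr(r, sort_key), reverse=True)
```

For example, calling `refine(results, category="news", sort_key="published")` would keep only the documents categorized as "news" and list the most recent first.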
Generally, a hierarchical taxonomy (or text categorization) is generated by manually defining a set of rules that encode expert knowledge on how to classify documents into a predetermined set of categories. Machine-augmented taxonomy generation has generally depended on manually maintaining a controlled dictionary and sorting the documents based on assigned key words or metadata that are associated with a document and found in the controlled dictionary.
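A minimal sketch of sorting documents against a controlled dictionary is given below for illustration. The dictionary entries, the category paths, and the majority-vote tie-breaking are assumptions made for this example; a manually maintained dictionary would be considerably larger and its matching rules more elaborate.

```python
import re

# Hypothetical controlled dictionary: key word -> category path in the taxonomy.
CONTROLLED_DICTIONARY = {
    "python": ("Computers", "Programming"),
    "compiler": ("Computers", "Programming"),
    "router": ("Computers", "Networking"),
    "mortgage": ("Finance", "Loans"),
}


def categorize(text, keywords=()):
    """Return the category path matched by the most dictionary terms, or None.

    Terms come from the document text and from any assigned key words or
    metadata passed in `keywords`. Ties are broken alphabetically by path.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    tokens |= {k.lower() for k in keywords}

    counts = {}
    for token in tokens:
        path = CONTROLLED_DICTIONARY.get(token)
        if path is not None:
            counts[path] = counts.get(path, 0) + 1

    if not counts:
        return None  # no term of the document appears in the dictionary
    return max(sorted(counts), key=counts.get)
```

A document whose terms do not appear in the dictionary is left uncategorized, which reflects the dependence of this approach on manual upkeep of the dictionary.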