Search engines, such as those used in conjunction with the World Wide Web, are typically expected to search through vast amounts of data, yet return a manageable number of quality, relevant results. Web directories and clustering engines both attempt to provide context to user queries. Web directories typically rely on humans (often volunteers) to hand-select pages that are relevant to a given topic. Overinclusion and underinclusion are two significant problems that frequently occur with web directories: humans include in the directory documents that do not belong, or that are not the best documents on a given topic, while simultaneously failing to include better, more significant documents about that topic. Clustering engines attempt to remove human error by grouping results together based on textual cues in the search results. The groupings created by clustering engines are often arbitrary, for example when a group of documents is named after the word occurring most frequently in those documents (such as “fur” instead of “cat”), and are thus difficult for humans to use. Additionally, classification algorithms are typically slow and designed to work on small, clean corpora, such as a library collection, rather than on documents on the World Wide Web, which is a very large and noisy environment. As with human classification, false positives and false negatives frequently result.
Therefore, there exists a continuing need to be able to provide relevant documents to users.