The Internet can be viewed as a large collection of documents, for example, text files, web-pages, newsgroup postings, or pictures. Internet search engines provide a means of searching through this vast collection of documents to produce a results list of the documents that match the terms of a search query. Typically, the results list is presented as a list of document summaries that include hyperlinks (“links”) connecting each entry to the corresponding Internet document. The results list is generally ranked by relevance to the query, with each entry placed higher or lower on the list according to the relevance ranking determined by the search engine being used. The way in which these relevance rankings are determined is constantly evolving as the Internet itself continues to evolve.
Search engines apply different algorithms to “filter” the available documents and assign relevance rankings to the documents reviewed. The relevance rankings are generally stored in a search index that associates each search term (or set of related search terms) with its corresponding documents.
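The index structure described above can be sketched as a mapping from search terms to ranked document lists. The terms, document identifiers, and scores below are purely illustrative, not taken from any particular search engine:

```python
# A minimal sketch of a search index: each term maps to a list of
# (document_id, relevance_score) pairs, kept sorted by score.
# All identifiers and scores here are hypothetical examples.

index = {
    "flights": [("doc3", 0.92), ("doc1", 0.75)],
    "hotels":  [("doc1", 0.88), ("doc2", 0.40)],
}

def results_for(term):
    """Return document ids for a term, highest relevance first."""
    return [doc_id for doc_id, score in index.get(term, [])]

print(results_for("hotels"))  # the highest-ranked document comes first
```

Answering a query then reduces to looking up the term and emitting the pre-ranked list, which is why the quality of the stored rankings matters so much.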
Initially, Internet search engines applied “content-based” filtering, which simply examines the number of times a query search term appears within a document: the more often the search term appears, the more relevant the document is considered and the higher it is ranked. However, content-based filtering produces rankings that are easily manipulated by the authors of the documents reviewed; that is, an author can fill their web-page with multiple copies of the words they believe will be searched upon and thereby inflate the apparent relevance of their web-page.
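A minimal sketch of content-based filtering, under the simplifying assumption that relevance is just the raw count of a term in a document (the document names and text are invented for illustration):

```python
# Illustrative content-based filtering: rank documents by how many
# times the query term appears. Documents here are hypothetical.

docs = {
    "page_a": "cheap flights cheap hotels cheap deals",
    "page_b": "a travel guide with flight and hotel advice",
}

def rank_by_term_count(documents, term):
    """Return document ids sorted by term frequency, highest first."""
    counts = {doc_id: text.split().count(term)
              for doc_id, text in documents.items()}
    return sorted(counts, key=counts.get, reverse=True)

# "page_a" repeats the term three times and so outranks "page_b",
# illustrating how an author can game content-based ranking.
print(rank_by_term_count(docs, "cheap"))
```

The example also shows the weakness noted above: stuffing a page with repetitions of a term is enough to push it to the top of this ranking.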
More advanced ranking methods apply “link analysis” algorithms, i.e., examining the links a document contains to other documents with relatively high relevance rankings. However, just as with the manipulation of terms in a web-page discussed above, an author can increase the number of links to other documents with high relevance rankings in order to inflate the apparent relevance of their web-page.
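Link analysis can be sketched with a toy score in the spirit of PageRank, where a page's score depends on the scores of the pages linking to it. The link graph, damping factor, and iteration count below are illustrative assumptions, not the parameters of any specific engine:

```python
# Toy link-analysis scoring: iterate until each page's score reflects
# the scores of its in-linking pages. Graph and constants are hypothetical.

links = {            # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def link_scores(graph, damping=0.85, iters=50):
    """Return a dict of page -> link-based relevance score."""
    pages = list(graph)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            # Each in-linking page q shares its score among its out-links.
            inbound = sum(score[q] / len(graph[q])
                          for q in pages if p in graph[q])
            new[p] = (1 - damping) / len(pages) + damping * inbound
        score = new
    return score

scores = link_scores(links)
# "c" receives links from both "a" and "b", so it scores highest.
```

The manipulation described above corresponds to adding extra outbound edges toward highly scored pages, which perturbs the graph this computation runs over.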
Editor-controlled search engines use a “staff” of editors (paid or volunteer) to manually select and rank the individual web-page documents contained in a results list for a specific search term from a ranked search index. Documents may have their rankings changed, or documents may be added to or removed from the index over time as the editors perform their reviews. Since there are billions of web-page documents available on the Internet, and the number of documents continues to grow at a tremendous pace, the amount of labor needed to maintain a current and complete editor-controlled search index is very high. The Open Directory Project (ODP) is an example of a co-operative editing process that uses a large number of volunteer editors to assess and modify the relevance rankings of documents related to a search term or within a specific category. ODP applies rules to the editor selection process. The end result of ODP is an editor-controlled ranking index that can be searched directly; however, the editorial feedback is not used to improve the efficiency of an automated search engine algorithm.