Development of a search engine that can index a large and diverse collection of documents, yet return to a user a short, relevant list of result documents in response to a query, has long been recognized as a difficult problem. The Internet, currently containing billions of documents stored on host computers around the world, represents a particularly large and diverse collection of documents. A user of a search engine typically supplies a short query containing only a few terms, such as “hazardous waste” or “country music”, and expects the search engine to return a list of relevant documents. In practice, although the search engine may return a list of tens or hundreds of documents, most users are likely to view only the top two or three documents on that list. Thus, to be useful to a user, a search engine must be able to determine, from among billions of documents, the two or three documents that a human user would be most interested in, given the query that the user has submitted. To serve this need, search engine designers have in the past attempted to construct relevance functions that take a query and a document as their input and return a relevance value. The relevance value may be used, for example, to create a ranked list of the documents indexed by the search engine, ordered by relevance to the query. For the top two or three documents on this list to be useful to a user, the underlying relevance function must be able to determine the relevance of a given document to a query both accurately and quickly.
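The general pattern of a relevance function feeding a ranked result list can be sketched as follows. The scoring rule used here (the fraction of query terms that appear in the document) is a hypothetical stand-in chosen only for illustration; it is not the relevance function contemplated by this disclosure, and the names `term_overlap_relevance` and `rank_documents` are likewise assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: a relevance function maps (query, document) -> score.
RelevanceFn = Callable[[str, str], float]


def term_overlap_relevance(query: str, document: str) -> float:
    """Toy relevance function (an illustrative assumption): the fraction of
    distinct query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rank_documents(query: str,
                   documents: Dict[str, str],
                   relevance: RelevanceFn,
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every indexed document against the query and return the top_k
    (doc_id, score) pairs in descending order of relevance."""
    scored = [(doc_id, relevance(query, text)) for doc_id, text in documents.items()]
    # Highest score first; ties broken by doc_id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    docs = {
        "d1": "Disposal rules for hazardous waste",
        "d2": "Country music festival schedule",
        "d3": "Recycling household waste",
    }
    for doc_id, score in rank_documents("hazardous waste", docs, term_overlap_relevance):
        print(doc_id, score)
```

A fixed, hand-written rule like this is easy to compute, but it has no way to reflect the subjective and time-varying factors that shape a user's perception of relevance.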
A user's perception of true relevance is influenced by a number of factors, many of which are highly subjective. These subjective factors are generally difficult to capture in an algorithmic set of rules defining a relevance function. Furthermore, they may change over time, for example, when current events become associated with a particular query term. As another example, changes over time in the aggregate content of the documents available on the Internet may also alter a user's perception of the relative relevance of a given document to a particular query. A user who receives a result list from a search engine containing documents that the user does not perceive to be highly relevant may quickly become frustrated and abandon the search engine.
Given the above background, it is desirable to devise a method for determining a document ranking function that reflects one or more human users' perceptions of document relevance to a query, yet can still readily be implemented as an algorithm on a computer. Additionally, it is desirable to devise a method that can rapidly adapt to changes over time, both in the underlying documents in the database and in users' interests.