Document retrieval systems, such as search engines, are designed to receive a search query and output a set of search results that are most relevant to the search query, ranking the documents according to their relevance.
Accordingly, a document retrieval system must be able to compute the relevance of a given search result item (hereinafter “document”) to a given search query. In some systems, such a relevance score is computed as a combination of a set of feature values, each expressing how strongly a given document embodies a particular document feature. Examples include a “number of viewings” feature expressing how many times a given document has been viewed, and a “token occurrence” feature expressing how many times a token of the search query appears in content of the document.
In practice, different features will contribute in different degrees to the overall relevance of documents, and hence such systems will need to weight the different features appropriately when computing the overall relevance score for a document. For example, the “token occurrence” feature might influence overall document relevance more strongly than the “number of viewings” feature, in which case the “token occurrence” feature should be weighted more heavily than the “number of viewings” feature. Thus, the appropriate weights to use for different features will need to be determined.
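By way of illustration only, the weighted combination of feature values described above might be sketched as follows. The sketch assumes a simple linear weighted sum and feature values already normalized to the range 0 to 1; neither assumption is stated in the text. The weight values, the feature keys, and the names `relevance_score` and `rank_documents` are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: "token occurrence" is assumed to matter more
# than "number of viewings". The actual values would need to be determined.
FEATURE_WEIGHTS: Dict[str, float] = {
    "token_occurrence": 0.8,
    "number_of_viewings": 0.2,
}


def relevance_score(
    features: Dict[str, float],
    weights: Dict[str, float] = FEATURE_WEIGHTS,
) -> float:
    """Combine feature values into one score as a weighted sum.

    A feature missing from `features` contributes 0 to the score.
    """
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_documents(
    docs: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = FEATURE_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Return (document id, score) pairs, most relevant first."""
    scored = [(doc_id, relevance_score(f, weights)) for doc_id, f in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative feature values (assumed normalized to [0, 1]).
docs = {
    "doc_a": {"token_occurrence": 0.9, "number_of_viewings": 0.1},
    "doc_b": {"token_occurrence": 0.2, "number_of_viewings": 1.0},
}
ranking = rank_documents(docs)
```

With these illustrative weights, the document with many query-token occurrences but few viewings ranks above the heavily viewed document with few token occurrences. Choosing different weights could reverse that order, which is why the weights must be determined appropriately.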
One approach to determining the appropriate feature weights might be to employ some users of the system to manually specify, for each document in a set of search results, how relevant they consider the document to be (e.g., “highly relevant”, “somewhat relevant”, etc.). However, such manual document relevance specification would take an enormous amount of human effort, thus requiring considerable time and expense.