In a typical search engine service, a user enters a query and the engine responds by selecting the topmost relevant documents out of an indexed collection of URLs (uniform resource locators) that match the query. To serve queries quickly, the search engine utilizes one or more methods (e.g., an inverted index data structure) that map keywords to documents. For example, a first step performed by the engine can be to identify the set of candidate documents that contain the keywords specified by the user query. These keywords can be located in the document body, in the document's own metadata, or in additional metadata about the document that is stored in other documents or datastores (such as anchor text).
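The candidate-identification step can be illustrated with a minimal sketch of an inverted index. The function names, the whitespace tokenization, and the requirement that a candidate contain every query keyword are assumptions made for illustration, as is the idea that each URL's searchable text already combines body, metadata, and anchor text.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to the set of URLs whose text contains it.

    `docs` is a hypothetical mapping of URL -> searchable text (assumed to
    already combine body, metadata, and anchor text).
    """
    index = defaultdict(set)
    for url, text in docs.items():
        # Assumption: lowercase whitespace tokenization stands in for a
        # real tokenizer.
        for term in text.lower().split():
            index[term].add(url)
    return index


def candidate_documents(index, query):
    """Return the URLs that contain every keyword in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up the posting set for each term; unknown terms yield an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

With such an index, a query is answered by intersecting a few posting sets rather than scanning every document, which is why a structure of this kind lets queries be served quickly.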
In a large index collection, the cardinality of the candidate document set can be large (e.g., potentially millions of documents), depending on the commonality of the query terms. Instead of returning the entire set of candidate documents, the search engine performs a second step of ranking the candidate documents with respect to relevance. Typically, the search engine utilizes a ranking function to predict the degree of relevance of a document to a particular query. The ranking function takes multiple features from the document as inputs and computes a number that allows the search engine to sort the documents by predicted relevance.
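The ranking step can be sketched as follows. The linear weighted sum is only one possible form of ranking function, chosen here for brevity; the feature names, weights, and function names are hypothetical.

```python
def ranking_score(features, weights):
    """Combine a document's features into a single relevance score.

    Assumption: a linear weighted sum; features missing from a document
    count as 0.0.
    """
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


def rank_candidates(candidates, weights, top_k=10):
    """Sort candidate URLs by predicted relevance and keep the top_k.

    `candidates` is a hypothetical mapping of URL -> feature dict.
    """
    scored = [(ranking_score(feats, weights), url)
              for url, feats in candidates.items()]
    # Highest score first; ties are broken by URL so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:top_k]]
```

Because each document is reduced to one number, the engine can sort even a very large candidate set and return only the first few results.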
The quality of the ranking function, in terms of how accurately it predicts the relevance of a document, is ultimately determined by user satisfaction with the search results, or by how often on average the user finds the answer to the question posed. The overall user satisfaction with the system can be approximated by a single number (or metric), which can then be optimized by varying the ranking function. Usually, the metrics are computed over a representative set of queries that are selected up front by random sampling of the query logs, and involve assigning relevance labels to each result returned by the engine for each of the evaluation queries. However, these processes for document ranking and relevance are still inefficient in providing the desired results.
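The evaluation procedure described above can be sketched as follows. Mean precision at k is used here only as one example of a single-number metric; the text does not name a specific metric, and the function names, the seed parameter, and the convention that a label greater than zero means "relevant" are all assumptions.

```python
import random


def sample_evaluation_queries(query_log, sample_size, seed=0):
    """Pick a set of evaluation queries by random sampling of the query log."""
    rng = random.Random(seed)
    return rng.sample(query_log, min(sample_size, len(query_log)))


def precision_at_k(labels, k):
    """Fraction of the top-k returned results labeled relevant.

    `labels` lists the relevance labels of the results in ranked order.
    Assumption: a label greater than 0 means the result is relevant.
    """
    top = labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label > 0) / len(top)


def mean_precision_at_k(labels_by_query, k):
    """Average precision-at-k over all evaluation queries: one number."""
    if not labels_by_query:
        return 0.0
    total = sum(precision_at_k(labels, k)
                for labels in labels_by_query.values())
    return total / len(labels_by_query)
```

Recomputing such a metric after each change to the ranking function gives a direct way to compare variants of that function on the same evaluation queries.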