The Internet, which allows access to billions of content items stored on host computers around the world, represents a particularly large and diverse collection of content items. Development of a search engine that can index such a large and diverse collection of content items, yet provide the user a short, relevant result set of content items in response to a query, has long been recognized as a problem in information retrieval. For example, a user of a search engine typically supplies a query that contains only a few terms and expects the search engine to return a result set comprising relevant content items. Although a search engine may return a result set comprising tens, hundreds, or more content items, most users are likely to view only the top several content items in the result set. Thus, to be useful to a user, a search engine should determine those content items in a given result set that are most relevant to the user, or that the user would be most interested in, on the basis of the query that the user submits.
A user's perception of the relevance of a content item to a query is influenced by a number of factors, many of which are highly subjective. These factors are generally difficult to capture in an algorithmic set of rules represented by a relevance function. Furthermore, these subjective factors may change over time, for example when current events become associated with a particular query term. As another example, changes over time in the aggregate content of the content items available through the Internet may also alter a user's perception of the relative relevance of a given content item to a given query. Users who receive search result sets containing results that they do not perceive to be highly relevant become frustrated and may abandon the search engine. Designing effective and efficient retrieval functions is therefore of high importance to information retrieval.
In the past, search engine designers have attempted to construct relevance functions that take a query and a content item as a set of inputs and return a relevance value, which indicates the relevance of the content item to the query. The relevance value may be used, for example, to order by relevance a set of content items that are responsive to a given query. For the ordering to be useful, however, the underlying relevance function should accurately and quickly determine the relevance of a given content item to a given query. Many retrieval systems and methods are known to those of skill in the art, including vector space models, probabilistic models, and language modeling methods. In constructing their relevance functions, however, existing retrieval systems do not effectively use information regarding user-made judgments of the relevance of a content item to a given query, expressed as clickthrough information. Use of such information enables formulation of relevance functions with improved accuracy and effectiveness over existing systems and techniques.
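By way of illustration only, the general shape of a relevance function and its use in ordering a result set can be sketched as follows. The sketch assumes a simple cosine similarity over raw term counts, in the spirit of the vector space models mentioned above; it is not the relevance function of any particular system and makes no use of clickthrough information. The names tokenize, relevance, and rank are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def relevance(query, content_item):
    """Return a relevance value in [0, 1] for a (query, content item) pair.

    Assumption: relevance is approximated by the cosine similarity of
    raw term-count vectors. Real systems use far richer signals.
    """
    q = Counter(tokenize(query))
    d = Counter(tokenize(content_item))
    dot = sum(q[term] * d[term] for term in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    norm = q_norm * d_norm
    return dot / norm if norm else 0.0


def rank(query, content_items, top_k=3):
    """Order content items by descending relevance and keep the top_k."""
    scored = [(relevance(query, item), item) for item in content_items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    items = ["jaguar top speed", "car dealership", "speed limits"]
    for score, item in rank("jaguar speed", items):
        print(f"{score:.3f}  {item}")
```

A function of this kind depends only on the text of the query and the content item, which is why it cannot reflect the subjective and time-varying factors described above.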