This specification relates to evaluating automated resource selection processes for use by search engines.
Search engines, e.g., Internet search engines, provide information about resources (e.g., Web pages, images, text documents, multimedia content) that are responsive to a user's search query. Search engines return a set of search results (e.g., as a ranked list of results) in response to a user-submitted query. A search result includes, for example, a link (e.g., a URL) to, and a snippet of information from, a corresponding resource.
To identify the resources most responsive to a user's query, search engines build indexes that map words and phrases to resources determined to be relevant to those words and phrases. To build an index, a search engine crawls available resources, e.g., by crawling the Internet. Because index space is finite, the search engine determines whether to include each crawled resource in the index. In some search engines, this determination is made by an automated resource selection process, which analyzes the values of one or more index selection signals for a resource to decide whether the resource should be included in the index. Each index selection signal is a metric of a quality of the resource, and its value is a quantity (generally scalar) derived by combining one or more attributes of the resource. Resource attributes can be internal to a resource, e.g., the number of words in the resource or the length of its title. Resource attributes can also be external to the resource, e.g., attributes derived from resources that link to the resource or from user behavior toward the resource.
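As a rough illustration of such a selection step, the following Python sketch derives a few scalar index selection signals from internal and external resource attributes and fills a fixed-capacity index with the highest-scoring resources. The attribute names, signal formulas, weights, and the weighted-sum-plus-top-k rule are all hypothetical assumptions chosen for illustration; they are not a selection process defined by this specification.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    """A crawled resource with hypothetical internal and external attributes."""
    url: str
    word_count: int        # internal: number of words in the resource
    title_length: int      # internal: length of the title, in characters
    inbound_links: int     # external: number of resources linking to this one
    click_through: float   # external: user behavior toward the resource, in [0, 1]


def index_selection_signals(r):
    """Derive scalar index selection signals from attributes (illustrative formulas)."""
    return {
        "content": math.log1p(r.word_count),
        # 1.0 for a non-empty title of at most 60 characters (arbitrary cutoff).
        "title": 1.0 if 0 < r.title_length <= 60 else 0.0,
        "links": math.log1p(r.inbound_links),
        "engagement": r.click_through,
    }


def select_for_index(resources, weights, capacity):
    """Hypothetical selection process: weighted sum of signals, keep the top `capacity` URLs."""
    def score(r):
        signals = index_selection_signals(r)
        # Signals without a weight contribute nothing to the score.
        return sum(weights.get(name, 0.0) * value for name, value in signals.items())

    ranked = sorted(resources, key=score, reverse=True)
    return {r.url for r in ranked[:capacity]}


crawled = [
    Resource("a.example/1", word_count=1200, title_length=40, inbound_links=35, click_through=0.12),
    Resource("a.example/2", word_count=80, title_length=0, inbound_links=2, click_through=0.01),
    Resource("a.example/3", word_count=600, title_length=55, inbound_links=10, click_through=0.30),
]

# Two selection processes that differ only in how they weight the same signals.
balanced = {"content": 1.0, "title": 0.5, "links": 1.0, "engagement": 2.0}
engagement_only = {"engagement": 1.0}

print(sorted(select_for_index(crawled, balanced, capacity=2)))         # ['a.example/1', 'a.example/3']
print(sorted(select_for_index(crawled, engagement_only, capacity=1)))  # ['a.example/3']
```

Changing only the weights changes which resources enter the finite index, which is why two otherwise similar selection processes can produce noticeably different indexes.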
To evaluate different resource selection processes, a system can build separate indexes and consider the indexes side by side, e.g., by comparing the resources identified by each index in response to various queries. However, this requires the overhead of building and maintaining two separate indexes, which can be costly.
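A side-by-side comparison of two indexes can be sketched as follows: two toy inverted indexes built from different selections of the same crawled documents are queried with the same queries, and the Jaccard overlap of their result sets is reported per query. The conjunctive retrieval rule, the overlap metric, and all document and query data are illustrative assumptions; a real comparison would more likely examine ranked results than unordered sets. Note that even this toy version must materialize both indexes in full.

```python
from collections import defaultdict


def build_index(docs, selected):
    """Toy inverted index: map each term to the selected URLs whose text contains it."""
    index = defaultdict(set)
    for url in selected:
        for term in docs[url].lower().split():
            index[term].add(url)
    return index


def retrieve(index, query):
    """Return the URLs that contain every query term (conjunctive match)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def compare_indexes(index_a, index_b, queries):
    """Per query, the Jaccard overlap of the two indexes' result sets (1.0 = identical)."""
    report = {}
    for query in queries:
        a, b = retrieve(index_a, query), retrieve(index_b, query)
        union = a | b
        report[query] = len(a & b) / len(union) if union else 1.0
    return report


# Hypothetical crawl; each index keeps a different subset chosen by a different selection process.
docs = {
    "u1": "fresh pasta recipe",
    "u2": "pasta sauce recipe",
    "u3": "car repair guide",
}
index_a = build_index(docs, selected={"u1", "u3"})
index_b = build_index(docs, selected={"u1", "u2"})
print(compare_indexes(index_a, index_b, ["pasta recipe", "car repair"]))
# -> {'pasta recipe': 0.5, 'car repair': 0.0}
```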
Alternatively, a system can build a single index in which some resources are selected according to a first resource selection process and other resources are selected according to a second, different resource selection process. User behavior toward the resources selected by each of the two processes can then be observed. However, the resulting user behavior data is incomplete, because it does not show how users would interact with the resources if only the resources selected by one of the processes were presented to them.
As yet another alternative, a system can observe user behavior toward resources selected according to a single resource selection process. However, observing behavior with respect to only one index can give an incomplete picture of the quality of the selection process. User behavior data is not available for resources that the process being evaluated did not select, so one cannot determine whether the process could have made better selections.