Currently available search technologies attempt to identify and return references to resources that are relevant to a particular search query. For example, a user may enter the web search query “baseball scores” to find documents (e.g., webpages and/or websites) that provide information on current baseball scores. The search engine may examine a database of “crawled” webpages to identify webpages that contain the terms “baseball” and “scores”. References to a subset of all webpages that contain the terms “baseball” and “scores” are returned to the user to be displayed in a search engine results page (SERP). Additionally, references to webpages that have been labeled by the search engine (or web crawler associated with the search engine) with the terms “baseball” and “scores” may be returned to the user.
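The term-matching step described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection of crawled pages; the names `build_index`, `search`, and the sample page identifiers are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase term to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the pages matching the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical crawled pages, for illustration only.
pages = {
    "p1": "Live baseball scores and standings",
    "p2": "Baseball equipment for sale",
    "p3": "Football scores from last night",
}
index = build_index(pages)
matches = search(index, "baseball scores")
```

In this sketch only page `p1` contains both query terms, so it is the only page returned; a production engine would additionally rank the matches and return only a subset of them.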
Other references that may be returned to a user in a SERP are advertisements. Such advertisements may be related to the query terms. For example, a company whose website offers baseball equipment may pay for advertisement space on a SERP whenever a search query includes the term “baseball”. Thus, given the above search query, the SERP would include a reference to that website. Under one advertising arrangement, the company pays only when the user clicks on the reference to that website.
Goals of search engines include increasing the quality of search results to maximize value to both the user and the advertisers. If ads that are displayed in a SERP are tailored to the interests of a user, then the user is more likely to click on the ads, thus generating ad revenue for the search engine.
There are many situations in which knowledge about a user may assist in increasing the quality of search results and ads. One situation occurs when a query is inherently ambiguous. For example, suppose a user enters the query “jaguar price”. Based on that query alone, it is not clear whether the user wants price information about the Jaguar operating system or a Jaguar car. If the search engine knew about, for example, the recent Web activity of the user, then that activity could be used to search for appropriate results. If the user was recently answering questions about the best foreign-manufactured cars via the social networking website Yahoo! Answers™, then it is more likely that the user is interested in discovering the price of a Jaguar car.
However, a user's perception of true relevance is influenced by a number of factors, many of which are highly subjective. Such preferences are generally difficult to capture in an algorithmic set of rules defining a relevance function. Furthermore, these subjective factors may change over time, for example, when current events become associated with a particular query term. As another example, changes over time in the aggregate content of the documents available on the Internet may also alter a user's perception of the relative relevance of a given document to a particular query. A user who receives, from a search engine, a SERP that refers to documents that the user does not perceive to be highly relevant will quickly become frustrated and abandon the use of the search engine.
A recent innovation for increasing the quality of search results is to use machine-learning methods to generate a document relevance function. A document relevance function takes a document and a query as input and returns a relevance value. The relevance value for each document in a set of documents is used to rank the documents. The relevance value may dictate where in a SERP a reference to a document is to be displayed.
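As a minimal sketch of how a relevance value can drive ranking, the following assumes a hand-written toy scoring function standing in for a machine-learned one. The names `relevance` and `rank`, the term-overlap scoring rule, and the sample documents are all assumptions made for illustration.

```python
def relevance(document, query):
    """Toy relevance value: the fraction of query terms found in the document.

    This is a stand-in for a machine-learned relevance function.
    """
    doc_terms = set(document.lower().split())
    query_terms = query.lower().split()
    if not query_terms:
        return 0.0
    return sum(term in doc_terms for term in query_terms) / len(query_terms)


def rank(documents, query, relevance_fn=relevance):
    """Order document ids by descending relevance value (ties broken by id)."""
    scored = [(relevance_fn(text, query), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical documents, for illustration only.
documents = {
    "a": "baseball scores today",
    "b": "baseball equipment",
    "c": "cooking recipes",
}
ranking = rank(documents, "baseball scores")
```

The position of each document id in `ranking` corresponds to where its reference might be displayed in a SERP. Any function with the same signature, including a learned model, could be passed as `relevance_fn`.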
Thus, a context-independent document relevance function is used to predict, based on a particular query, what webpages and/or ads may be helpful to a generic user. The generic user represents the interests of all users for whom applicable data may be collected. For example, a search engine may maintain a database of all queries that have ever been submitted, along with data identifying all click-throughs (i.e., references in a SERP that have been selected). A relevance function may be generated using various techniques, one of which is described in U.S. Pat. No. 7,197,497, entitled METHOD AND APPARATUS FOR MACHINE LEARNING A DOCUMENT RELEVANCE FUNCTION.
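One simple way such query and click-through data might be aggregated across all users is sketched below. This is not the technique of the cited patent; the log format, the function name `click_through_rates`, and the sample entries are assumptions made for illustration.

```python
from collections import Counter


def click_through_rates(log):
    """Compute, per (query, document) pair, clicks divided by impressions.

    Each log entry is assumed to be a tuple:
    (query, list of document ids shown, clicked document id or None).
    """
    shown = Counter()
    clicked = Counter()
    for query, shown_ids, clicked_id in log:
        for doc_id in shown_ids:
            shown[(query, doc_id)] += 1
        if clicked_id is not None:
            clicked[(query, clicked_id)] += 1
    return {key: clicked[key] / count for key, count in shown.items()}


# Hypothetical log pooled over all users, for illustration only.
log = [
    ("jaguar price", ["cars", "os"], "cars"),
    ("jaguar price", ["cars", "os"], "os"),
    ("jaguar price", ["cars", "os"], "cars"),
    ("jaguar price", ["cars", "os"], None),
]
rates = click_through_rates(log)
```

Because the log pools every user's behavior, the resulting rates describe the generic user: they record which result was selected more often overall, but not which result a given individual would prefer.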
However, a context-independent relevance function may not yield the most accurate search results for certain users. Given (a) a user with specific usage patterns and preferences, (b) a group of users with relatively specific usage patterns and preferences, or (c) a query that targets files/documents that include a particular type of content, a generic relevance function is not capable of leveraging such information to improve the quality of the search results returned to a user.
An approach for increasing the quality of search results for a particular user is to generate a user-dependent (UD) relevance function, which is a type of context-dependent relevance function. Such a relevance function is used to predict, based on a particular query, webpages and/or ads that may be relevant to the particular user. Knowledge of a specific user is gathered to generate the UD relevance function. The UD relevance function may be used to provide high-quality query results to the specific user and to target particular advertisements to the specific user. Thus, with the help of a UD relevance function, the click-through rate of sponsored advertisements may significantly increase. However, a considerable amount of information needs to be known about a user in order for the corresponding UD relevance function to be accurate and, therefore, useful. Also, a considerable amount of information is known about only relatively few users. As a result, UD relevance functions are not widely used. Thus, a majority of users are only able to take advantage of generic relevance functions.
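The difference between a generic and a user-dependent relevance function can be sketched as follows. The additive scoring rule, the per-user topic profile, the `user_weight` parameter, and all names and sample values are assumptions made for illustration; the text above does not prescribe how a UD relevance function is computed.

```python
def ud_relevance(generic_score, doc_topic, user_profile, user_weight=0.5):
    """Toy UD relevance: generic score plus a weighted user-interest term.

    user_profile maps a topic to an interest level in [0, 1]; topics that
    are missing from the profile contribute nothing.
    """
    return generic_score + user_weight * user_profile.get(doc_topic, 0.0)


def rank_for_user(candidates, user_profile):
    """Order candidate ids by descending UD relevance (ties broken by id).

    candidates maps a document id to a (generic_score, topic) pair.
    """
    def sort_key(item):
        doc_id, (generic_score, topic) = item
        return (-ud_relevance(generic_score, topic, user_profile), doc_id)

    return [doc_id for doc_id, _ in sorted(candidates.items(), key=sort_key)]


# Hypothetical candidates for the ambiguous query "jaguar price".
candidates = {
    "jaguar-cars.example": (0.6, "cars"),
    "jaguar-os.example": (0.6, "software"),
}
car_enthusiast = {"cars": 0.9}
developer = {"software": 0.8}
unknown_user = {}

car_ranking = rank_for_user(candidates, car_enthusiast)
dev_ranking = rank_for_user(candidates, developer)
generic_ranking = rank_for_user(candidates, unknown_user)
```

With an empty profile the sketch falls back to the generic scores, which mirrors the limitation noted above: when little is known about a user, the UD function has nothing to add beyond the generic ranking.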
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.