Many search engine services, such as Google and Overture, support searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to them. After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service identifies web pages that may be related to the search request based on how well the keywords of a web page match the words of the query. The search engine service then displays to the user links to the identified web pages in an order that is based on a ranking that may be determined by their relevance to the query, popularity, importance, and/or some other measure.
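By way of illustration only, the mapping of keywords to web pages described above may be sketched as a simple inverted index. The function names, the example URLs, and the use of a matched-keyword count as the ordering are assumptions made for this sketch and do not describe how any particular search engine service is implemented.

```python
from collections import defaultdict


def build_keyword_index(pages):
    """Map each keyword to the set of pages on which it was identified.

    `pages` is a hypothetical dict of page URL -> list of keywords that a
    crawler has already extracted (headline words, metadata words, etc.).
    """
    index = defaultdict(set)
    for url, keywords in pages.items():
        for keyword in keywords:
            index[keyword.lower()].add(url)
    return index


def match_pages(index, query):
    """Return pages sharing at least one keyword with the query.

    Pages are ordered by the number of distinct query terms they match
    (an assumed, deliberately naive relevance measure), then by URL.
    """
    terms = set(query.lower().split())
    counts = defaultdict(int)
    for term in terms:
        for url in index.get(term, ()):
            counts[url] += 1
    return sorted(counts, key=lambda url: (-counts[url], url))


if __name__ == "__main__":
    pages = {
        "a.example": ["ranking", "search"],
        "b.example": ["search", "cooking"],
    }
    index = build_keyword_index(pages)
    print(match_pages(index, "search ranking"))
```

A count of matched keywords is only a stand-in for the ranking discussed above; an actual service may combine relevance, popularity, importance, and/or other measures.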
The success of the search engine service may depend in large part on its ability to rank web pages in an order that is most relevant to the user who submitted the query. Search engine services have used many machine learning techniques in an attempt to learn a good ranking function. The learning of a ranking function for a web-based search is quite different from traditional statistical learning problems such as classification, regression, and density estimation. The basic assumption in traditional statistical learning is that all instances are independently and identically distributed. This assumption, however, is not correct for web-based searching. In web-based searching, the rank of a web page of a search result is not independent of the other web pages of the search result, but rather the ranks of the web pages are dependent on one another.
Several machine learning techniques have been developed to learn a more accurate ranking function that factors in the dependence of the rank of one web page on the rank of another web page. For example, a RankSVM algorithm, which is a variation of a generalized Support Vector Machine (“SVM”), attempts to learn a ranking function that preserves the pairwise partial ordering of the web pages of training data. A RankSVM algorithm is an ordinal regression technique that minimizes the number of incorrectly ranked pairs. A RankSVM algorithm is described in Joachims, T., “Optimizing Search Engines Using Clickthrough Data,” Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (“KDD”), ACM, 2002. Another example of a technique for learning a ranking function is a RankBoost algorithm. A RankBoost algorithm is an adaptive boosting algorithm that, like a RankSVM algorithm, operates to preserve the ordering of pairs of web pages. A RankBoost algorithm attempts to directly solve a preference learning problem. A RankBoost algorithm is described in Freund, Y., Iyer, R., Schapire, R., and Singer, Y., “An Efficient Boosting Algorithm for Combining Preferences,” Journal of Machine Learning Research, 2003(4). As another example, a neural network algorithm, referred to as RankNet, has been used to rank web pages. A RankNet algorithm also operates to preserve the ordering of pairs of web pages and models the ordinal relationship between two documents using a probability. A RankNet algorithm is described in Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G., “Learning to Rank Using Gradient Descent,” 22nd International Conference on Machine Learning, Bonn, Germany, 2005.
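The quantity that these pairwise techniques seek to keep small, the number of incorrectly ranked pairs, may be illustrated as follows. This is a minimal sketch under stated assumptions: the function name, the numeric relevance labels, and the choice to treat tied scores as errors are all hypothetical, and the code only counts misordered pairs rather than implementing RankSVM, RankBoost, or RankNet.

```python
from itertools import combinations


def count_misordered_pairs(scores, labels):
    """Count document pairs whose score order contradicts their label order.

    `scores` are the outputs of some ranking function and `labels` are
    assumed numeric relevance judgments (higher means more relevant).
    Pairs with equal labels carry no preference and are skipped. A pair
    with equal scores but different labels is counted as an error
    (an assumption of this sketch).
    """
    errors = 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # no preferred order for this pair
        # A non-positive product means the scores tie or disagree in sign
        # with the label difference.
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) <= 0:
            errors += 1
    return errors


if __name__ == "__main__":
    # The second and third documents are scored in the wrong order.
    print(count_misordered_pairs([3.0, 1.0, 2.0], [2, 1, 0]))
```

Because the count depends on pairs of documents rather than on each document alone, it reflects the dependence among ranks that distinguishes ranking from classification or regression.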
These machine learning techniques attempt to learn a ranking function by operating on document (e.g., web page) pairs to minimize an error function between these pairs. A RankNet algorithm uses cross entropy to measure the distance between two probability distributions. Although RankNet may provide efficient retrieval performance, the use of a cross entropy loss function has several disadvantages. First, a cross entropy loss function cannot achieve a minimal loss of 0 except when the target probability is 0 or 1, which results in corresponding inaccuracies in the ranking function. Second, a cross entropy loss function has no upper bound for the loss of a pair of documents. Because the loss for a pair that is incorrectly ranked may be too high, the ranking function based on a cross entropy loss may be biased by some pairs that are difficult to correctly rank.
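The two disadvantages noted above may be checked numerically. The following sketch assumes, following the cited RankNet paper, that the modeled probability that a first document ranks above a second is the logistic function of the difference between their scores; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pair_cross_entropy(score_diff, target_prob):
    """Cross entropy loss for one document pair.

    The modeled probability is assumed to be p = 1 / (1 + exp(-score_diff)),
    so that -log(p) = softplus(-score_diff) and
    -log(1 - p) = softplus(score_diff).
    """
    return (target_prob * softplus(-score_diff)
            + (1.0 - target_prob) * softplus(score_diff))


if __name__ == "__main__":
    # First disadvantage: with a target probability of 0.5, the loss is
    # smallest at a score difference of 0 and is still log(2), not 0.
    print(pair_cross_entropy(0.0, 0.5), math.log(2))

    # Second disadvantage: with a target probability of 1, the loss keeps
    # growing as the pair is ranked more and more wrongly.
    for diff in (-1.0, -10.0, -100.0):
        print(diff, pair_cross_entropy(diff, 1.0))
```

Under this assumption, the smallest achievable loss for a target probability strictly between 0 and 1 is the entropy of that target, which for 0.5 is log 2 (about 0.693), while the loss of a badly misordered pair grows roughly linearly with the score difference and so has no upper bound.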