1. Field of the Invention
The present invention generally relates to techniques for analyzing data. More specifically, the present invention relates to methods for assigning ranks to data items in a collection of data items, such as a database of documents, the World Wide Web, or any other collection of data items that can be compared by a person.
2. Related Art
The relentless growth of the Internet has been largely fueled by the development of sophisticated search engines, which enable users to comb through billions of web pages looking for specific pages of interest. Because a given query can return millions of search results, it is important to be able to rank these results so that the highest-quality ones are presented to the user.
It is impossible to measure the quality of results directly. Early techniques for ranking results considered intrinsic properties of each candidate result document, including “(a) how recently the document was updated, and/or (b) how close the search terms are to the beginning of the document” (U.S. Pat. No. 6,799,176, col. 1, lines 56-58). U.S. Pat. No. 6,799,176 introduced a new technique which considers an extrinsic property: the back-links pointing to a document. This provides a better approximation of quality because it recursively infers a document's popularity from the popularity of the documents that link to it. However, even this property is somewhat under authorial control and is thus vulnerable to “spamming” techniques, especially when page authors collude to set up elaborate link structures to inflate their documents' rankings. Furthermore, these properties reflect the collective opinion of page authors, which may not match the opinion of the much larger pool of web searchers (users).
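For illustration only, the recursive back-link idea described above can be sketched as a simple fixed-point iteration, in which each document's score is derived from the scores of the documents linking to it. The function name, damping factor, and iteration count below are assumptions chosen for the sketch, not details taken from the cited patent.

```python
# Illustrative sketch (not the patented method): score documents by
# back-links, where each document's score is accumulated from the
# scores of the documents that link to it, iterated with damping.

def backlink_rank(links, damping=0.85, iterations=50):
    """links maps each document to the list of documents it links to."""
    docs = set(links)
    for targets in links.values():
        docs.update(targets)
    rank = {d: 1.0 / len(docs) for d in docs}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / len(docs) for d in docs}
        for source, targets in links.items():
            if targets:
                # A document passes its score evenly to the pages it links to.
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Documents with no outgoing links spread their score evenly.
                for d in docs:
                    new_rank[d] += damping * rank[source] / len(docs)
        rank = new_rank
    return rank

# Example: B and C both link to A, so A accumulates the most score.
ranks = backlink_rank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

Note that the document with the most (and best-scored) back-links rises to the top, which is exactly the property that colluding authors can exploit by manufacturing link structures.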
Hence, it is desirable to obtain a more direct measurement of quality from users themselves. If one can measure quality accurately by interacting with or observing users, one can produce a ranking that more closely correlates with the “ideal” ranking desired by users.
However, it is difficult to infer quality from users directly. Passive observation techniques, such as click-tracking, are noisy and difficult to interpret. For example, a user cannot judge a page without first clicking on it, so a click indicates only interest in the result listing, not satisfaction with the page itself; a click therefore does not necessarily correspond to a good result. Direct solicitation techniques, such as satisfaction surveys, have a number of problems. For example, they generally have low response rates, may be difficult for users to understand, and may be difficult to aggregate into a ranking because users vary greatly in the degree and direction of opinion.
Pairwise comparisons and preference orderings provide better candidates for aggregation, but still present some difficulties: users differ in opinion (some prefer item A while others prefer item B); users are inconsistent (a user might prefer A to B, and B to C, while preferring C to A); and user preferences are relatively difficult and expensive to obtain, which limits the set of data points from which to derive the ranking.
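As a minimal sketch (not the claimed method) of why pairwise comparisons aggregate well despite the difficulties above, consider tallying each item's wins across all reported comparisons: individual disagreements and even intransitive cycles average out in the totals. The function name and sample data are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative sketch (not the claimed method): aggregate pairwise
# preferences from many users by tallying wins per item. A single
# user may be intransitive (A > B, B > C, C > A), but the aggregate
# tally can still yield a coherent ordering.

def rank_by_wins(comparisons):
    """comparisons: list of (winner, loser) pairs collected from users."""
    wins = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        items.update((winner, loser))
    # Sort items by total wins, most-preferred first.
    return sorted(items, key=lambda item: wins[item], reverse=True)

# Three users' comparisons; the third user is intransitive (C > A),
# yet the aggregate ordering remains A, B, C.
votes = [("A", "B"), ("B", "C"), ("A", "C"),
         ("A", "B"), ("B", "C"),
         ("A", "B"), ("B", "C"), ("C", "A")]
ordering = rank_by_wins(votes)
```

Simple win-counting tolerates inconsistency, but it does not by itself address the expense of collecting comparisons, which is the remaining difficulty the section identifies.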
Hence, what is needed is a method and an apparatus for accurately ranking search results without the problems of the above-described techniques.