The following relates to the informational arts, computer arts, and related arts. Some illustrative applications of the following include meta-search engines, meta-classifiers, meta-prioritizers, and so forth.
In the informational arts, a common operation relates to scoring items based on a scoring criterion. For example, an Internet search engine receives a scoring criterion in the form of a search query, and outputs a list of relevant web sites coupled with relevance rankings or scores, where the relevance ranking or score assigned to each web site reflects the search engine's assessment of the relevance of that web site to the search query. In a simple embodiment, the relevance ranking or score may be a count of the number of occurrences of terms of the search query in the content of the web site. More complex relevance scoring or ranking may take into account other factors such as where the search terms occur (for example, search terms in the title of the web site may be weighted more strongly than search terms in body text), the popularity of the web site (measured, for example, based on a count of “hits” or downloads of the web site), or whether the web site is a “favored” web site (for example, a web site that has contracted with the search engine operator to obtain more favorable relevance rankings). In general, different search engines may use different bases for computing the relevance.
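By way of illustrative example only, the simple term-count embodiment described above may be sketched as follows. The function name `term_count_score`, the tokenization rule, and the sample pages are hypothetical and are chosen solely for illustration; an actual search engine would typically use a more elaborate relevance basis.

```python
import re
from collections import Counter


def term_count_score(query: str, content: str) -> int:
    """Return how many times the distinct query terms occur in the content.

    Assumption: terms are case-insensitive runs of word characters.
    """
    content_counts = Counter(re.findall(r"\w+", content.lower()))
    query_terms = set(re.findall(r"\w+", query.lower()))
    return sum(content_counts[term] for term in query_terms)


# Hypothetical web site contents, keyed by an illustrative site name.
pages = {
    "site_a": "Solar panels and solar inverters for home use",
    "site_b": "Home gardening tips",
}
query = "solar home"

# Score every site against the query, then rank from most to least relevant.
scores = {name: term_count_score(query, text) for name, text in pages.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

In this sketch every occurrence carries equal weight; weighting title terms more strongly, or factoring in popularity, would be a modification of the summation step.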
Other examples of scoring items based on a scoring criterion include: soft classification in which a classifier assigns an object a probability or other measure of membership in a particular class; or service job prioritization in which a service center receives service jobs and a scheduler ranks the jobs in order of importance based on a suitable prioritization criterion.
Each such application can be generally described as having a judge (e.g., the search engine, or the classifier, or the scheduler) that assigns a score (relevance ranking, or soft classification probability, or job priority) to items (web sites, or classes of a set of classes, or service jobs) based on a scoring criterion or algorithm (a search query processed by the search engine's relevance ranking basis, or an input object processed by a classifier algorithm, or a service job input to a scheduling algorithm). The obtained result is dependent upon the scoring criterion used by the judge, and in general different judges may use different scoring criteria and hence generate different rankings.
It is known to aggregate the rankings from different judges to produce a consensus aggregation. For example, meta-search engines have been deployed which input a received query to different Internet search engines, collect the rankings from the different Internet search engines, and apply an aggregation algorithm to generate a consensus ranking. Some Internet users find that meta-search engines provide better results than any individual search engine operating alone. Similarly, “meta-classifiers” combine classification results from different classifiers to generate a consensus classification.
In such consensus aggregation, an aggregation function is applied to generate consensus scores for the items based on the scores assigned by the constituent judges. The consensus ranking is dependent upon the choice of aggregation function. For example, a simple aggregation function is an average, in which the scores generated by the different judges for each item are averaged together to generate the aggregated score for that item. A disadvantage of this approach is that it can produce a high aggregated score for an item on which the judges have little consensus. For example, an item that half the judges rank near the top of the ranking and half the judges rank near the bottom of the ranking will end up near the middle of the aggregated ranking. However, that placement is not reflective of the consensus of the judges, since no judge actually ranked the item near the middle.
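The averaging aggregation function and its disadvantage may be illustrated by the following minimal sketch. The score values, the item names `split` and `steady`, and the use of a population standard deviation as a rough measure of disagreement are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def average_aggregate(scores_by_judge):
    """Average each item's scores across all judges."""
    items = scores_by_judge[0].keys()
    return {item: mean(judge[item] for judge in scores_by_judge) for item in items}


# Four hypothetical judges scoring two items on a 0..1 scale.
# "split": half the judges score it at the top, half at the bottom.
# "steady": every judge scores it in the middle.
judges = [
    {"split": 1.0, "steady": 0.5},
    {"split": 1.0, "steady": 0.5},
    {"split": 0.0, "steady": 0.5},
    {"split": 0.0, "steady": 0.5},
]

aggregated = average_aggregate(judges)

# Illustrative disagreement measure: spread of each item's scores across judges.
spread = {item: pstdev(judge[item] for judge in judges) for item in aggregated}
print(aggregated)
print(spread)
```

Both items receive the same aggregated score, although the judges agree fully on one and disagree sharply on the other; the average alone does not distinguish the two cases.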
Some more complex aggregation functions weight the average score of an item by the number of judges that rank the item above a threshold (for example, the number of judges that rank the item in the “top ten”). These approaches are deemed to improve the aggregated rankings. However, the basis for this modification of the aggregation function is not readily apparent, and the approach can overemphasize a generally high ranking (e.g., in the “top ten”) compared with the actual scores assigned by the various judges. These aggregation functions also do not recognize or take into account relationships which may or may not exist between different judges, and are not tunable to accommodate different relationships between judges. For example, consensus between two judges that use wholly independent ranking criteria may be of substantial significance, whereas consensus between two judges that use highly correlated ranking criteria may be of little significance.
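One possible reading of such a threshold-weighted aggregation function is sketched below, assuming that the weight is simply the count of judges placing the item among their top `k` items. The function names, the multiplicative form, and the sample scores are hypothetical; known meta-search engines may combine the count and the average differently.

```python
from statistics import mean


def top_k_items(scores, k):
    """Return the set of the k highest-scored items for one judge."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def threshold_weighted_aggregate(scores_by_judge, k):
    """Multiply each item's average score by the number of judges
    that place the item in their top k (an assumed weighting form)."""
    tops = [top_k_items(scores, k) for scores in scores_by_judge]
    items = scores_by_judge[0].keys()
    result = {}
    for item in items:
        votes = sum(item in top for top in tops)
        result[item] = votes * mean(s[item] for s in scores_by_judge)
    return result


# Three hypothetical judges; items "a" and "b" have the same average score.
judges = [
    {"a": 0.8, "b": 0.6, "c": 0.1},
    {"a": 0.8, "b": 0.6, "c": 0.1},
    {"a": 0.2, "b": 0.6, "c": 0.7},
]

consensus = threshold_weighted_aggregate(judges, k=1)
print(consensus)
```

With `k=1`, item "b" drops to a consensus score of zero even though its average score equals that of item "a", which illustrates how the threshold count can dominate the actual scores assigned by the judges. The sketch also treats all judges as interchangeable and has no parameter reflecting whether their ranking criteria are independent or correlated.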