The internet holds vast amounts of information distributed over a multitude of computers, giving users access to material on a wide variety of topics. The same is true of a number of other communication networks, such as intranets and extranets. Finding specific information within such large amounts of data can be difficult.
Search engines have been developed to address the problem of finding information on a network. A user enters one or more search terms into a search engine, and the search engine returns a list of network locations (e.g., uniform resource locators (URLs)) that it has determined contain relevant information. Search engines often rely upon human judges to decide on the relevancy of search results. This generally involves a group of relevancy experts, employed or otherwise engaged by a search engine entity, who hand-label a number of query/URL pairs. These labels are used for training ranking algorithms, evaluating relevance, and a variety of other search engine tasks.
Human labeling is an expensive and labor-intensive task. Financial and logistical constraints therefore allow only a small fraction of query/web page pairs to be labeled by experts. Furthermore, the quality of the labels is of great importance, as the labels are also used as “ground truth” when evaluating the relevancy performance of search engines. Unfortunately, the quality of some of the human expert labels used in search engines may be less than desirable. Further, the quality of labels varies among judges based on their experience and quality of work. For any given query, a significant number of relevancy labels may be inconsistent or incorrect.