With the ever-increasing amount of data stored on various servers, efficient searching becomes an ever more important task. Taking the Internet as an example, millions of resources are available on it, and several search engines (such as GOOGLE™, YAHOO!™, YANDEX™, BAIDU™ and the like) aim to provide users with a convenient tool for finding relevant information that is responsive to the user's search intent.
A typical search engine server executes a crawling function. More specifically, the search engine executes a robot that “visits” various resources available on the Internet and indexes their content. Specific algorithms and schedules for the crawling robots vary, but at a high level, the main goal of the crawling operation is to (i) identify a particular resource on the Internet, (ii) identify key themes associated with the particular resource (the themes being represented by key words and the like), and (iii) index the key themes to the particular resource.
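For illustration only, steps (ii) and (iii) can be sketched as building an inverted index. This is a minimal sketch under stated assumptions: "key themes" are approximated by lowercase word tokens minus a small hypothetical stop-word list, and the crawled resources are given as a hypothetical in-memory mapping from URL to text. A production crawler's theme extraction and index structures would be far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def extract_key_terms(text: str) -> set[str]:
    """Approximate a resource's key themes as its non-stop-word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def build_index(resources: dict[str, str]) -> dict[str, set[str]]:
    """Index each key term to the set of resource URLs in which it occurs."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in resources.items():
        for term in extract_key_terms(text):
            index[term].add(url)
    return dict(index)


# Hypothetical crawled resources (URL -> extracted text).
crawled = {
    "https://example.com/a": "Crowdsourcing labels for search ranking",
    "https://example.com/b": "Search engines crawl the web",
}
index = build_index(crawled)
print(sorted(index["search"]))  # ['https://example.com/a', 'https://example.com/b']
```

At query time, the same term extraction applied to the query yields the keys whose URL sets form the candidate resources that are later passed to the ranker.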
Once the search engine receives a search query from a user, it identifies all the crawled resources that are potentially related to the user's search query. The search engine then executes a search ranker to rank the so-identified potentially relevant resources. The key goal of the search ranker is to organize the identified search results by placing the potentially most relevant search results at the top of the search engine results list. Search rankers are implemented in different ways, some of them employing Machine Learning Algorithms (MLAs) for ranking search results.
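A typical MLA used by the search rankers is trained using training datasets. Normally, each training example comprises a given document (such as a web resource) that is potentially relevant (or responsive) to a training search query, together with a label, assigned by an assessor, that indicates the relevance of the document to that query.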
Crowdsourcing platforms, such as Amazon Mechanical Turk™, make it possible to label large datasets in a shorter time and at a lower cost than professional assessors would require. However, as assessors on crowdsourcing platforms are generally non-professional and vary in their levels of expertise, the obtained labels can be “noisy”, in the sense that the labels assigned to a given object by different assessors can differ markedly. For example, some assessors tend to be very conservative (i.e., they assign good scores only to highly relevant objects), while other assessors can be more lenient in their choice of labels.
A conventional way to obtain consensus labels is to take the majority vote among the noisy labels for each object. However, this solution ignores any differences between workers, which may lead to poor results when low-qualified assessors dominate the task.
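The majority-vote baseline can be sketched in a few lines. The input format (object identifier mapped to a plain list of labels) is an assumption made for illustration. It makes the weakness explicit: worker identities never enter the computation, so every vote counts equally regardless of the assessor's quality. Ties are broken by `Counter.most_common`, which orders equal counts by first occurrence. This tie-breaking rule is an arbitrary choice of this sketch.

```python
from collections import Counter


def majority_vote(noisy_labels: dict[str, list[int]]) -> dict[str, int]:
    """Return the most frequent label for each object.

    Worker identities are deliberately absent from the input: every vote
    carries the same weight, which is exactly the limitation noted above.
    """
    consensus: dict[str, int] = {}
    for obj, labels in noisy_labels.items():
        # most_common(1) returns [(label, count)]; ties go to the label seen first.
        consensus[obj] = Counter(labels).most_common(1)[0][0]
    return consensus


print(majority_vote({"doc1": [1, 1, 0], "doc2": [0, 2, 2, 2]}))  # {'doc1': 1, 'doc2': 2}
```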
Another conventional approach is based on the latent label assumption: all assessors perceive the same value of a latent true label, and each assessor then corrupts this value according to a chosen labelling model. As a consequence, labelling models designed under this assumption treat any disagreement among the noisy labels for an object as mistakes made by workers.
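As one concrete example of such a labelling model, the sketch below uses a binary "one-coin" model. This model is a swapped-in illustration, not necessarily the one the text has in mind. Each worker is assumed to report the latent true label correctly with a known, worker-specific probability and to flip it otherwise, independently of the other workers. In practice these accuracies would have to be estimated (for example, with an EM procedure in the style of Dawid and Skene); here they are simply given as hypothetical inputs. The posterior shows how disagreement is explained away as mistakes by the less reliable workers.

```python
import math


def posterior_positive(
    labels: dict[str, int], accuracy: dict[str, float], prior: float = 0.5
) -> float:
    """P(latent true label = 1 | noisy binary labels) under a one-coin model.

    Assumes each worker w reports the true label with probability
    accuracy[w] (strictly between 0 and 1) and flips it otherwise,
    independently of all other workers.
    """
    log_odds = math.log(prior / (1 - prior))
    for worker, label in labels.items():
        p = accuracy[worker]
        # Log-likelihood ratio contributed by this worker's vote.
        weight = math.log(p / (1 - p))
        log_odds += weight if label == 1 else -weight
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical example: one reliable worker outvoted by two weak ones.
labels = {"w1": 1, "w2": 0, "w3": 0}
accuracy = {"w1": 0.95, "w2": 0.6, "w3": 0.6}
print(round(posterior_positive(labels, accuracy), 3))
```

Majority vote would output 0 for this object. The one-coin posterior instead favours 1, attributing the two dissenting labels to errors by the low-accuracy workers.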
Common approaches to noise reduction include cleansing and weighting techniques. Briefly, noise cleansing techniques are similar to “outlier detection” and amount to filtering out labels that, for some reason, “look” mislabeled. With the weighting approach, no label is completely discarded; instead, each label's impact on a machine learning algorithm is controlled by a weight representing the confidence in that label. Both the noise cleansing techniques and the weighting approach are premised on the assumption that a “single true label” exists for each digital training document.
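The two approaches can be contrasted with a minimal sketch. Two crude choices are assumptions of this illustration rather than a prescribed method. Cleansing uses a median-deviation outlier test with a hypothetical `max_deviation` threshold. Weighting uses a confidence-weighted squared loss in which the weights are hypothetical inputs. In both cases the labels are pulled toward a single value per document, reflecting the “single true label” premise.

```python
from statistics import median


def cleanse(labels: list[float], max_deviation: float) -> list[float]:
    """Noise cleansing: discard labels far from the median (crude outlier test)."""
    m = median(labels)
    return [y for y in labels if abs(y - m) <= max_deviation]


def weighted_squared_loss(
    prediction: float, labels: list[float], weights: list[float]
) -> float:
    """Weighting: keep every label, scaling its influence by a confidence weight."""
    total = sum(weights)
    return sum(w * (prediction - y) ** 2 for y, w in zip(labels, weights)) / total


def weighted_mean(labels: list[float], weights: list[float]) -> float:
    """The prediction that minimizes weighted_squared_loss."""
    return sum(w * y for y, w in zip(labels, weights)) / sum(weights)


labels = [4.0, 4.0, 5.0, 1.0]   # hypothetical scores from four assessors
weights = [1.0, 1.0, 1.0, 0.25]  # hypothetical confidence in each label

print(cleanse(labels, max_deviation=1.0))  # [4.0, 4.0, 5.0]
print(round(weighted_mean(labels, weights), 3))
```

Cleansing removes the outlying label 1.0 entirely. Weighting keeps it but lets it pull the optimum only slightly below 4.1.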