A classification problem is the task of identifying the class of each item in a set of items. Solving a classification problem involves making a judgment about the class of each item. A judge provides a label for each item that identifies the item's class. A machine learning model can be trained on these judgments to perform the classification. The appropriate label for an item is selected from the provided labels, based on specific criteria or algorithms. For example, in an email service, users report spam that the email server misclassified as regular email. Reporting spam is a way of labeling specific emails as spam. This labeling is a form of crowdsourcing for the classification problem of distinguishing regular email from spam. A classifier can then be trained from the labeled spam.
Crowdsourcing is a process in which the labeling task is outsourced to a distributed group of judges who provide labels at little to no cost. Each judge classifies, i.e., labels, a set of items. The labels provided by the crowd are analyzed in an attempt to identify the correct labels. Crowdsourcing can provide a large number of labels at a relatively low cost. However, in crowdsourcing, the judges are typically non-experts, so the labels from a crowd are usually noisy. Because each item is labeled by multiple judges who typically do not agree unanimously, the judgments are combined to produce a single label for the item. This can be done in a variety of ways; a typical one is majority voting, in which the label chosen by the most judges is assigned to the item. Labels provided by an expert could be more accurate, but the cost would be higher. Because labels inferred from crowds in this way can still be highly noisy, there has been increasing interest in developing techniques for computing higher-quality labels from crowdsourced judgments.
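As an illustration of the majority voting mentioned above, the following is a minimal sketch in Python, not a definitive implementation. The function name `majority_vote`, the `(item_id, judge_id, label)` tuple layout, the sample data, and the alphabetical tie-breaking rule are all assumptions made for this example; the text does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(judgments):
    """Combine crowdsourced judgments into one label per item.

    `judgments` is an iterable of (item_id, judge_id, label) tuples.
    This tuple layout is an assumption made for this sketch.
    """
    votes = {}
    for item_id, _judge_id, label in judgments:
        # Count how many judges chose each label for this item.
        votes.setdefault(item_id, Counter())[label] += 1

    combined = {}
    for item_id, counts in votes.items():
        top = max(counts.values())
        # Tie-breaking is an assumption: pick the alphabetically first
        # label among those with the highest vote count.
        combined[item_id] = min(l for l, c in counts.items() if c == top)
    return combined


# Hypothetical judgments from three judges on three emails.
judgments = [
    ("email_1", "judge_a", "spam"),
    ("email_1", "judge_b", "spam"),
    ("email_1", "judge_c", "regular"),
    ("email_2", "judge_a", "regular"),
    ("email_2", "judge_b", "regular"),
    ("email_2", "judge_c", "spam"),
    ("email_3", "judge_a", "spam"),
    ("email_3", "judge_b", "regular"),
]

labels = majority_vote(judgments)
print(labels)
```

In this sketch every judge's vote carries equal weight, which is why a few unreliable judges can produce a wrong combined label; the even split on `email_3` also shows that majority voting needs some tie-breaking rule.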