As the amount of digital content continues to grow, the complexity of tasks that rely on that content can increase rapidly. One such task is identifying the class of each item in a set of items. In some examples, workers can provide labels that identify an item's class, and a machine learning model can be trained on those labels to perform the classification task. In some examples, the machine learning model can select the appropriate label for an item from among the labels provided by the workers, based on specific criteria or algorithms. For example, in email services, users can provide labels that identify unsolicited messages (also referred to as spam). Allowing users to label spam is one crowdsourcing technique for distinguishing solicited email from spam. In some examples, a classifier can then be trained on the labeled spam.
Crowdsourcing is a process in which the labeling task is outsourced to a distributed group of workers. Each worker classifies, i.e., labels, a set of items. The labels provided by the crowd are then analyzed in an attempt to identify the correct labels. Crowdsourcing can provide a large number of labels at a relatively low cost. However, the workers in a crowd may be non-experts, so the labels from a crowd may include both correct and incorrect labels. Because each item is labeled by multiple workers who typically do not agree unanimously, their judgments are combined to produce a single label for the item. In some examples, judgments can be combined in a variety of ways, such as majority voting, to increase the accuracy of the crowdsourced labels.
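As an illustration of combining judgments by majority voting, the following is a minimal Python sketch rather than a definitive implementation. The function name `majority_vote`, the `(worker, item, label)` tuple layout, the sample worker and email identifiers, and the rule of breaking ties by choosing the alphabetically smallest label are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Combine worker judgments into one label per item.

    judgments: iterable of (worker_id, item_id, label) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Count how many workers assigned each label to each item.
    votes = defaultdict(Counter)
    for _worker, item, label in judgments:
        votes[item][label] += 1

    consensus = {}
    for item, counts in votes.items():
        top = max(counts.values())
        # Assumed tie-break: pick the alphabetically smallest label
        # among those that received the highest vote count.
        consensus[item] = min(lbl for lbl, n in counts.items() if n == top)
    return consensus


# Hypothetical judgments from three workers on two emails.
judgments = [
    ("w1", "email_1", "spam"),
    ("w2", "email_1", "spam"),
    ("w3", "email_1", "not_spam"),
    ("w1", "email_2", "not_spam"),
    ("w2", "email_2", "not_spam"),
    ("w3", "email_2", "spam"),
]
consensus = majority_vote(judgments)
```

In this sample, two of the three workers label `email_1` as spam and two of the three label `email_2` as not spam, so those labels are selected. Majority voting weights every worker equally; other combination schemes may instead account for differences in worker reliability.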