Crowdsourcing systems (also referred to as crowdsourcing platforms) are increasingly available and are used to offer tasks to individuals or teams over the internet or using other communications methods. The tasks being offered may be labeling or ranking tasks whereby individuals classify items into classes or ranges. Examples include classifying an email as being spam or not, rating a movie, classifying a document as belonging to one of several possible categories, and rating the relevance of a search result.
For a given task, a crowdsourcing platform may receive answers completing the task from a number of different individuals in a crowd of potential participants. The individuals may be referred to as workers. The received answers differ from one another because of variation between the workers (for example, some workers are experts, some novices, some unmotivated). The crowdsourcing platform then has the problem of how to convert all the received answers for a given task into a single answer. This is difficult because of the uncertainty in the trustworthiness of individual workers and the quality of the crowdsourced answers overall. For example, workers might be unreliable and may provide incorrect answers depending on their skills, expertise and motivations. In addition, they may be unintentionally biased towards particular answers. For example, in tasks where a rating is required, certain workers may be overly conservative and always give medium scores, whilst others may be overly opinionated and always give extreme scores. These problems make the task of aggregating answers and obtaining a consistent answer challenging, particularly when there are too few answers.
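The aggregation problem described above can be illustrated with a minimal sketch. It uses simple majority voting, a common baseline that is named here only for illustration and is not taken from this document; the function name `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return (label, share) for the most frequent label among the answers.

    `share` is the fraction of workers who gave that label. Every worker is
    weighted equally, so worker reliability and bias are ignored.
    """
    if not answers:
        raise ValueError("no answers to aggregate")
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


# Hypothetical answers from three workers for one "spam or not" task.
few_answers = ["spam", "not_spam", "spam"]
label, share = majority_vote(few_answers)
print(label, round(share, 2))
```

With only three answers, a single unreliable or biased worker is enough to change the aggregated label, which is the difficulty the paragraph above describes when there are too few answers.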
Existing machine learning approaches for aggregating crowdsourced answers may give poor results where only a small amount of observed data is available about the behavior of individual crowd workers. This is a significant problem because in many real applications it is expensive to acquire enough data to learn about each worker and to estimate the true label, taking into account the estimated trustworthiness or quality of each worker. For example, there is a cost per worker per task. In addition, most crowdsourcing workers typically complete only a small number of tasks, resulting in high uncertainty about their work quality.
Scalability of machine learning systems is another issue that typically arises in many real-world applications. Training needs to be carried out, and this is typically a time-consuming and computationally resource-intensive task. There is an ongoing need to improve the training process.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known crowdsourcing systems.