1. Technical Field
The present invention relates to modeling data that has been processed by multiple labelers, and more particularly, to producing labeler error/accuracy estimates and simultaneously building a classifier from multi-labeler data.
2. Discussion of the Related Art
In many real-life settings, something has to be learned from data. For example, the “thing” to be learned may be “What structures in a set of medical images (the data) are indicative of cancer and thus candidates for biopsy?” This learning may be accomplished with a supervised learning algorithm that analyzes the data to produce a classifier that identifies the biopsy candidates.
In general, the data to be learned from is labeled by several experts. One reason why several labelers are needed is the lack of a gold-standard ground truth in many real-life settings. However, depending on the setting, there may be a large variance in the experts' scores, which can lead to low overall agreement. For example, radiologists specialized in heart images are better at labeling lesions of the heart than radiologists with lung expertise, who, on the other hand, label instances of lung disease better.
Several machine learning-based algorithms have been developed that can learn concepts in the presence of simultaneous labels from a group of experts. It has been shown that, by taking all the labels into account at the same time, this class of algorithms can learn the concept better than traditional methods. However, this class of algorithms assumes that the reliability of each labeler is the same across all data.
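The assumption that a labeler's reliability is the same across all data can be illustrated with a minimal sketch. The sketch is not any particular published algorithm: it uses simple majority voting as a stand-in for the unknown ground truth, and the function names and toy labels are hypothetical. Each labeler receives a single accuracy value computed over every item, regardless of what kind of data the item is.

```python
from collections import Counter


def majority_vote(labels_per_item):
    """Return the most common label for each item.

    labels_per_item is a list of rows, one row per item, where each row
    holds one label per labeler. Ties are resolved by first occurrence.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in labels_per_item]


def labeler_accuracies(labels_per_item):
    """Estimate one accuracy per labeler as agreement with the majority vote.

    The estimate is a single number per labeler, averaged over all items,
    which encodes the assumption that reliability does not depend on the data.
    """
    consensus = majority_vote(labels_per_item)
    n_items = len(labels_per_item)
    n_labelers = len(labels_per_item[0])
    return [
        sum(labels[j] == c for labels, c in zip(labels_per_item, consensus)) / n_items
        for j in range(n_labelers)
    ]


# Hypothetical toy data: 4 items (rows) labeled by 3 labelers (columns).
toy_labels = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

consensus = majority_vote(toy_labels)
accuracies = labeler_accuracies(toy_labels)
print(consensus)
print(accuracies)
```

Because each labeler is summarized by one number, a labeler who is strong on one kind of data (for example, heart images) and weak on another (for example, lung images) would receive only an averaged score under this scheme.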