A known approach in creating classification models is to collect and label data manually. The data is typically organized as belonging to a particular class, and entities within the data are typically labeled in a predetermined fashion. Models can then be trained to classify incoming data as belonging to one or more of the classes and used to extract entities from incoming data.
Unfortunately, this approach has several shortcomings. Models often require large amounts of data to become accurate above an acceptable error rate, and collecting and labeling data manually (i.e. by individuals) is expensive and time consuming. In addition, individuals may differ in how they label data leading to data that is labeled inconsistently and even incorrectly.