Building Natural Language Understanding (“NLU”) models requires a large number of text utterances. To collect large quantities of annotations cost-effectively and with fast turnaround, we leveraged crowdsourcing, specifically unmanaged crowds. Crowds can generate creative input for open-ended questions, which can then be used as bootstrapping data for NLU models. It is difficult, however, to prevent spam when collecting open text. Spam is produced intentionally and can often be identified by the regular pattern it shows (e.g., the same string copied and pasted across all the units of a task). Low-quality responses are less obvious and not necessarily intentional (e.g., an utterance about sports in response to a scenario regarding weather).
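As an illustration of the copy-and-paste pattern described above, the following is a minimal sketch of how such spam might be flagged. It is not the method used in this work. The `(worker_id, unit_id, text)` record layout, the function names, and the `min_units` and `min_share` thresholds are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def flag_copy_paste_workers(responses, min_units=3, min_share=0.8):
    """Return the ids of workers who submit nearly the same string everywhere.

    responses: iterable of (worker_id, unit_id, text) tuples (hypothetical layout).
    min_units: workers with fewer responses than this are not judged (assumed value).
    min_share: fraction of a worker's responses that must be the single most
               common string for the worker to be flagged (assumed value).
    """
    by_worker = defaultdict(list)
    for worker_id, _unit_id, text in responses:
        by_worker[worker_id].append(normalize(text))

    flagged = set()
    for worker_id, texts in by_worker.items():
        if len(texts) < min_units:
            continue  # too few responses to judge this worker
        # Count of the most frequent normalized response for this worker.
        top_count = Counter(texts).most_common(1)[0][1]
        if top_count / len(texts) >= min_share:
            flagged.add(worker_id)
    return flagged


# Made-up sample data: w1 pastes one string into every unit, w2 does not.
sample = [
    ("w1", "u1", "what's the weather"),
    ("w1", "u2", "What's  the weather"),
    ("w1", "u3", "what's the weather"),
    ("w2", "u1", "will it rain today"),
    ("w2", "u2", "how hot is it in Boston"),
    ("w2", "u3", "is it snowing"),
]
flagged = flag_copy_paste_workers(sample)
```

A check of this kind only catches the regular pattern of intentional spam. It does nothing for the less obvious low-quality responses, such as an off-topic utterance, which require a different form of validation.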
In most crowdsourcing tasks, gold test questions are interspersed throughout the task to measure worker accuracy and ensure data quality. This approach is not applicable when collecting open text responses because there is no single correct response.
In addition to the difficulties with open text collection, unmanaged crowds are difficult to train for complicated or specialized tasks. Workers have limited attention spans and often neglect to read the instructions for their tasks. Most crowdsourcing tasks therefore tend to be simple and intuitive. The task of labeling named entities is more difficult than typical crowdsourcing tasks because it requires an intimate understanding of many different entity labels across domains.
These and other problems arise when using crowds to annotate utterances with named entity recognition (NER) labels.