Natural language spoken dialog systems enable customers to express what they want in spoken natural language. Such systems automatically extract the meaning from speech input and act upon what people actually say, in contrast to what one would like them to say, shifting the burden from users to the machine user interface. In a natural language spoken dialog system, identifying the customer's intent can be seen as a general intent classification problem.
When statistical classifiers are employed to identify customer intent, they are typically trained using large amounts of task data that is transcribed and labeled by humans, a very expensive and laborious process. Here, labeling generally refers to the assignment of one or more predefined classification labels (e.g., calltypes) to each utterance.
The bottleneck in building a decent statistical system is the time spent on high-quality labeling. Because the labeling process is naturally prone to errors, each label is usually verified by an independent party to achieve an acceptable level of quality.
An utterance can be mislabeled for many reasons, including simple labeler error and an imperfect description of classification types. It should also be noted that for multi-label tasks, where an utterance may get more than one label, it is necessary to label the utterance with all appropriate labels. If any of the labels is missing, it is considered a labeling error.
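The multi-label error criterion described above can be illustrated with a minimal Python sketch. The function names and the calltype strings ("Billing", "Cancel_Service") are hypothetical and chosen only for illustration; the sketch assumes a verified reference label set is available for each utterance.

```python
def label_errors(assigned, reference):
    """Compare the labels assigned to one utterance against a reference set.

    Returns a pair (missing, spurious): labels that should have been
    assigned but were not, and labels that were assigned but should not be.
    """
    assigned, reference = set(assigned), set(reference)
    missing = reference - assigned
    spurious = assigned - reference
    return missing, spurious


def is_mislabeled(assigned, reference):
    """An utterance counts as mislabeled if any label is missing or spurious."""
    missing, spurious = label_errors(assigned, reference)
    return bool(missing or spurious)


# Hypothetical example: the labeler caught only one of two applicable calltypes.
assigned = ["Billing"]
reference = ["Billing", "Cancel_Service"]
print(label_errors(assigned, reference))
print(is_mislabeled(assigned, reference))
```

Under this definition, the utterance in the example is a labeling error even though the one label it carries is correct, because an applicable label is missing.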
For these reasons, a second pass of labeling (and sometimes more) is usually required to check and fix the labeling errors and inconsistencies of the earlier passes. The motto “There is no data like more data” generally holds only if the additional data is not too “noisy”, i.e., it contains fewer mislabeled utterances than can be tolerated. Most state-of-the-art classifiers can tolerate a few percentage points of noisy data, but more significant error levels can ruin classification performance no matter how robust the classifiers are.
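As a rough illustration of what "noisy" means here, the following Python sketch estimates the fraction of utterances whose first-pass labels disagree with a verified second pass. The function name, the sample data, and the 3% threshold are assumptions made for illustration only; the text says only that a few percentage points of noise are typically tolerable.

```python
# Illustrative threshold only; the tolerable level depends on the classifier and task.
MAX_TOLERABLE_NOISE = 0.03


def noise_rate(first_pass, verified):
    """Fraction of utterances whose first-pass label set differs from the verified set."""
    if len(first_pass) != len(verified):
        raise ValueError("both passes must cover the same utterances")
    if not first_pass:
        return 0.0
    mismatches = sum(set(a) != set(b) for a, b in zip(first_pass, verified))
    return mismatches / len(first_pass)


# Hypothetical labels for four utterances; the third is missing a label.
first_pass = [["Billing"], ["Cancel"], ["Billing"], ["Other"]]
verified = [["Billing"], ["Cancel"], ["Billing", "Cancel"], ["Other"]]

rate = noise_rate(first_pass, verified)
print(rate)                          # 0.25
print(rate <= MAX_TOLERABLE_NOISE)   # False
```

A check of this kind requires that at least a sample of the data has gone through a verification pass, which is the same labeling cost the passage describes.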