Natural language business systems typically require transcribed speech and labels identifying the correct concepts or classes. Moreover, large amounts of transcribed and/or labeled data are typically required to train statistical models such as understanding models or speech recognition models, including acoustic and language models. Human transcription of speech data tends to be costly, slow and unreliable. Correct classification is also challenging. For example, in the case of a natural language call routing system, knowledge of the domain and business logic may be spread across many different departments and representatives, each of whom may have knowledge of only a limited area. Accordingly, to cover an entire enterprise, data labeling and checking may need to be done by several different experts in order to ensure high accuracy, which tends to be a very costly procedure. Furthermore, statistical models have typically been trained in isolation, each optimizing a criterion associated only with that particular model. However, such model-specific criteria may not produce the best overall performance of a natural language business system.
In view of the foregoing, there is a need in the art for training statistical models in a manner that produces desirable results for the natural language business system as a whole, and for automated transcription and labeling of appropriate operational data.