1. Field of the Invention
The present disclosure relates to automatic speech recognition (ASR) and more specifically to an iterative method of active learning for reducing the transcription effort for training in ASR.
2. Discussion of Related Art
State-of-the-art speech recognition systems require transcribed utterances for training, and transcription is a labor-intensive and time-consuming process. The search for effective training-data sampling algorithms, which aim to build better systems with less annotated data by giving the system some control over the inputs on which it trains, has been studied under the title of “active learning.” Previous work in active learning has concentrated on two approaches: certainty-based methods and committee-based methods. In the certainty-based methods, an initial system is trained using a small set of annotated examples. The system then examines and labels the un-annotated examples and determines the certainty of each of its predictions. The “k” examples with the lowest certainties are then presented to the labelers for annotation. In the committee-based methods, a set of distinct classifiers is likewise created using the small set of annotated examples. The un-annotated instances whose annotations differ most across the classifiers are presented to the labelers for annotation. In both paradigms, a new system is trained using the new set of annotated examples, and this process is repeated until the system performance converges to a limit.
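By way of illustration only, the two selection rules described above may be sketched as follows. This is a minimal sketch under stated assumptions and not an implementation drawn from any prior work: the function names, the `certainty` scoring callable, and the use of majority-vote disagreement as the committee measure are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def select_least_certain(
    examples: Sequence[T], certainty: Callable[[T], float], k: int
) -> List[T]:
    """Certainty-based selection: return the k examples scored least certain.

    `certainty` is a hypothetical callable mapping an un-annotated example
    to the system's certainty in its own prediction for that example.
    """
    return sorted(examples, key=certainty)[:k]


def vote_disagreement(labels: Sequence[Hashable]) -> float:
    """Fraction of committee votes that differ from the majority label.

    Assumed disagreement measure; other measures (e.g. vote entropy)
    could be substituted.
    """
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)


def select_most_disputed(
    examples: Sequence[T], committee: Sequence[Callable[[T], Hashable]], k: int
) -> List[T]:
    """Committee-based selection: return the k examples on which the
    committee members' predicted labels differ most."""

    def disagreement(example: T) -> float:
        return vote_disagreement([classifier(example) for classifier in committee])

    return sorted(examples, key=disagreement, reverse=True)[:k]
```

In either case, the selected examples would be passed to the labelers, added to the annotated set, and the system retrained, with the cycle repeated until performance stops improving.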
In the language-processing framework, certainty-based methods have been used for natural language parsing and information extraction. Similar sampling strategies have been examined for text categorization, not to reduce the transcription cost but to reduce the training time by using less training data. While there is an extensive literature on confidence score computation in ASR, few if any of these works address the active learning question for speech recognition.