1. Field of the Invention
The present invention relates generally to speech recognition systems and, more particularly, to an unsupervised and active learning process for automatic speech recognition systems.
2. Introduction
State-of-the-art speech recognition systems require transcribed utterances for training of various knowledge sources, such as language and acoustic models, used in the speech recognition system. In general, acoustic models can include the representation of knowledge about acoustics, phonetics, microphone and environment variability, gender and dialect differences among speakers, etc. Language models, on the other hand, can refer to a system's knowledge of what constitutes a possible word, what words are likely to co-occur, and in what sequence. The semantics and functions related to an operation a user may wish to perform may also be necessary for the language model.
Many uncertainties exist in automatic speech recognition (ASR). For example, uncertainties may relate to speaker characteristics, speech style and rate, recognition of basic speech segments, possible words, likely words, unknown words, grammatical variation, noise interference, and nonnative accents. Each of these uncertainties can reduce recognition success in an ASR system. A successful speech recognition system must therefore contend with all of these issues.
A speech recognition system generally seeks to minimize uncertainties through the effective training of the system. As noted, this training is based on the generation of transcribed utterances, a process that is labor-intensive and time-consuming. As would be appreciated, if issues of cost were ignored, more effective speech recognition systems could be created through the use of greater amounts of transcribed data. Given the cost of transcription, however, this does not represent a practical solution. What is needed, therefore, is an efficient mechanism for creating a quality speech recognition system using all available sources of training data, whether existing in transcribed or un-transcribed form.