1. Field of the Invention
The present invention relates generally to improvements in the field of speech recognition, and more particularly to advantageous aspects of methods and an apparatus for discriminative training and adaptation of pronunciation networks.
2. Description of the Prior Art
A wide variety of techniques are used to perform speech recognition. Typically, speech recognition starts with the digital sampling of speech. The next stage is acoustic signal processing, which in most techniques includes spectral analysis.
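The sampling and spectral-analysis stages can be illustrated with a minimal sketch. It assumes a common short-time analysis scheme: the sampled signal is split into overlapping frames, each frame is tapered with a Hamming window, and a magnitude spectrum is computed with a direct discrete Fourier transform. The function names, the 8 kHz sampling rate, and the 25 ms frame and 10 ms hop are hypothetical choices for illustration. They are not taken from any particular prior-art system, and a practical front end would use an FFT and further processing such as filter banks or cepstral coefficients.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sampled signal into overlapping frames of frame_len samples."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n (tapers frame edges to reduce leakage)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Windowed magnitude spectrum via a direct DFT (bins 0..n/2)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


# Hypothetical input: 0.1 s of a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
frames = frame_signal(signal, frame_len=200, hop=80)  # 25 ms frames, 10 ms hop
spectrum = magnitude_spectrum(frames[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin * sample_rate / len(frames[0]), "Hz")  # bin spacing is 40 Hz
```

With 200-sample frames at 8 kHz each bin spans 40 Hz, so the 1000 Hz tone peaks at bin 25. Sequences of such per-frame spectra (or features derived from them) are what the later recognition stage consumes.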
The next stage is the recognition of phonemes, groups of phonemes, and words. This stage can be accomplished by various processes, including dynamic time warping, hidden Markov modeling, neural networks, expert systems, and combinations of these techniques. Systems based on hidden Markov models (HMMs) are currently the most widely used and most successful approach for many applications.
A hidden Markov model (HMM) is a stochastic finite state machine that moves through a sequence of hidden states, with transitions among the states governed by a set of transition probabilities, and emits an observable symbol at each step according to state-dependent output probabilities. In speech recognition applications, the observed strings are speech signals (in practice, sequences of acoustic feature vectors), and one HMM is trained for each word in the vocabulary. Once a stochastic model has been fitted to a collection of objects, that model can be applied to classify new objects. Thus, given a new speech signal, it can be determined which HMM is most likely to have generated it, and thereby a guess can be made as to which word in the vocabulary has been spoken.
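The classification step described above can be sketched with the standard forward algorithm, which computes the probability that a given HMM generated an observation sequence. Each word model is then scored and the highest-scoring word is chosen. This is a minimal sketch under simplifying assumptions: observations are discrete symbols rather than continuous acoustic feature vectors, the class `DiscreteHMM` and the function `recognize` are hypothetical names, and the two toy word models use invented parameters purely for illustration.

```python
import math


class DiscreteHMM:
    """Hypothetical discrete-observation HMM (states 0..N-1, symbols 0..M-1)."""

    def __init__(self, initial, transition, emission):
        self.initial = initial        # initial[i] = P(first state = i)
        self.transition = transition  # transition[i][j] = P(next state j | state i)
        self.emission = emission      # emission[i][k] = P(symbol k | state i)

    def log_likelihood(self, observations):
        """Forward algorithm: log P(observations | model), scaled per step."""
        n = len(self.initial)
        alpha = [self.initial[i] * self.emission[i][observations[0]] for i in range(n)]
        log_prob = 0.0
        for t in range(len(observations)):
            if t > 0:
                symbol = observations[t]
                # Sum over all predecessor states, then emit the current symbol.
                alpha = [sum(alpha[i] * self.transition[i][j] for i in range(n))
                         * self.emission[j][symbol] for j in range(n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            log_prob += math.log(scale)  # product of scales equals P(observations)
            alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
        return log_prob


def recognize(word_models, observations):
    """Return the word whose HMM most likely generated the observations."""
    return max(word_models, key=lambda w: word_models[w].log_likelihood(observations))


# Toy left-to-right word models (invented parameters): "yes" favors symbol 0,
# "no" favors symbol 1.
yes = DiscreteHMM([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.8, 0.2]])
no = DiscreteHMM([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.2, 0.8]])
models = {"yes": yes, "no": no}
print(recognize(models, [0, 0, 0, 1]))
print(recognize(models, [1, 1, 1]))
```

Working in the log domain with per-step rescaling keeps the computation numerically stable for long utterances, where the raw forward probabilities would otherwise underflow.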
Typical prior-art speech recognition methods suffer from certain disadvantages. For example, prior-art methods may place inordinate demands on memory and computational resources. Further, these methods may not achieve acceptable results when only a small amount of training data is available for each word.