The invention relates to the production of acoustic models of words for use in automatic speech recognition.
The acoustic modeling of words using hidden Markov models has been described in, for example, U.S. Pat. No. 4,759,068. In the speech recognition system described in that patent, and in other speech recognition systems, an acoustic model for each word in the recognizer vocabulary is constructed by concatenating one or more elemental models selected from a finite alphabet of elemental models. Because each elemental model represents only a portion of a word, it is possible to construct models for each word in a large vocabulary of words from a relatively small alphabet of elemental models.
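The concatenation scheme can be illustrated with a minimal sketch. It assumes that each elemental model is a left-to-right hidden Markov model with a fixed number of states (three here) and that each word is spelled as a sequence of elemental-model labels (a "baseform"). The class `ElementalModel`, the function `build_word_model`, the labels, and the example words are all hypothetical and are not taken from U.S. Pat. No. 4,759,068.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementalModel:
    """One entry in the finite alphabet (hypothetical left-to-right HMM)."""
    name: str
    n_states: int = 3


def build_word_model(baseform, alphabet):
    """Concatenate the elemental models named in `baseform` into one state sequence.

    Each state is labeled (elemental model name, state index). In a left-to-right
    HMM, the last state of one elemental model leads into the first state of the next.
    """
    states = []
    for label in baseform:
        model = alphabet[label]
        states.extend((model.name, i) for i in range(model.n_states))
    return states


# Hypothetical alphabet and baseforms; the labels are illustrative only.
alphabet = {name: ElementalModel(name) for name in ["K", "AE", "T", "S", "AA", "P"]}
baseforms = {
    "cat": ["K", "AE", "T"],
    "cats": ["K", "AE", "T", "S"],
    "top": ["T", "AA", "P"],
}
word_models = {w: build_word_model(b, alphabet) for w, b in baseforms.items()}
print(len(word_models["cats"]))  # 12 states: four elemental models of three states each
```

All three word models draw on the same six-entry alphabet. For example, the model for "T" appears in every word but needs to be stored and trained only once.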
The use of a relatively small alphabet of elemental models in constructing an acoustic model for each word in a relatively large vocabulary of words has at least two advantages. First, the amount of electronic memory required to store the structure and parameters of the entire alphabet of elemental models and the information necessary to construct each word model from the alphabet of elemental models is significantly less than the amount of electronic memory required to store the structure and parameters of a whole acoustic model for each word in the vocabulary. Second, since the alphabet of elemental models is much smaller than the vocabulary of words, a new speaker can train the entire alphabet of elemental models to his voice by uttering a relatively small number of words.
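The memory argument can be made concrete with a rough count. This sketch assumes that every elemental model has the same number of parameters, that a whole-word model is about as large as the elemental models it would otherwise be built from, and that each baseform entry costs one unit of storage as an index into the alphabet. The sizes are hypothetical and were chosen only to show the shape of the comparison. They are not measurements from any actual recognizer.

```python
def elemental_storage(alphabet_size, params_per_elemental, vocab_size, avg_baseform_len):
    """Parameters for the shared alphabet plus one index per baseform entry."""
    return alphabet_size * params_per_elemental + vocab_size * avg_baseform_len


def whole_word_storage(vocab_size, params_per_word):
    """Parameters for a separate, unshared model of every vocabulary word."""
    return vocab_size * params_per_word


# Hypothetical sizes: 5000 words, 200 elemental models of 1000 parameters each,
# and baseforms averaging 6 elemental models per word.
V, K, P_ELEM, AVG_LEN = 5000, 200, 1000, 6
shared = elemental_storage(K, P_ELEM, V, AVG_LEN)
unshared = whole_word_storage(V, P_ELEM * AVG_LEN)
print(shared, unshared)  # 230000 30000000
```

The shared-alphabet cost grows with the alphabet size plus a small per-word index term. The whole-word cost grows with the full parameter count of every word in the vocabulary.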
Despite the advantages described above, in the known methods of constructing acoustic models of words from a finite alphabet of elemental models, it has been found that there are portions of words whose pronunciation cannot adequately be represented by a single elemental model.