1. Field of the Invention
The present invention relates to speech recognition and more specifically to pronunciation modeling.
2. Introduction
Pronunciation modeling is a way to model speech having different accents or dialects. One problem with current pronunciation modeling approaches is that dialectal variations are difficult to separate from other differences in speech, such as gender differences, age differences, and so forth. Two standard pronunciation modeling techniques are known in the art. The first, a manual approach, involves human linguists creating pronunciation dictionaries by hand. The second, an automatic approach, creates acoustic clusters that are only marginally tied to dialectal variation, if at all. Instead, this automatic approach partitions data along acoustic dimensions unrelated to dialect, such as male versus female speakers. Traditional pronunciation modeling techniques are rarely able to address dialectal variation, because other acoustic variations dominate and are more easily recognized. When traditional pronunciation modeling techniques do address dialectal variations, the process is expensive and slow. These techniques produce dictionaries that use an alternative phoneme symbol to allow for an alternative dialectal pronunciation. So, for example, dictionaries describing southern accents that diphthongize some lax vowels include “ey” in parallel to “ae”. The problem with this solution is that the diphthongized “ae” is different from the conventional “ae”, from the conventional “ey”, and from the “ey” within the dialect that diphthongizes “ae”. These related but separately stored phonemes cause confusion and disparity when modeling various speech dialects. Accordingly, what is needed in the art is an improved way to model pronunciations.
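By way of illustration only, the dictionary-based approach described above can be sketched as a lexicon that lists an alternative phoneme sequence in parallel to the conventional one. The words, the ARPAbet-style transcriptions, and the helper name colliding_pronunciations are assumptions chosen for this sketch and are not taken from any particular dictionary. The sketch shows how reusing the symbol “ey” for a diphthongized “ae” can make two distinct words share a single stored pronunciation.

```python
from collections import defaultdict

# Hypothetical lexicon: word -> alternative phoneme sequences.
# Entries are illustrative assumptions, not data from a real dictionary.
LEXICON = {
    # Conventional "ae" plus a dialectal alternative written with "ey".
    "bat": [("b", "ae", "t"), ("b", "ey", "t")],
    # A word whose conventional pronunciation already uses "ey".
    "bait": [("b", "ey", "t")],
}


def colliding_pronunciations(lexicon):
    """Return phoneme sequences that are listed for more than one word."""
    words_by_pron = defaultdict(set)
    for word, prons in lexicon.items():
        for pron in prons:
            words_by_pron[pron].add(word)
    return {
        pron: sorted(words)
        for pron, words in words_by_pron.items()
        if len(words) > 1
    }


collisions = colliding_pronunciations(LEXICON)
print(collisions)  # {('b', 'ey', 't'): ['bait', 'bat']}
```

In this sketch the single symbol “ey” stands for both the conventional vowel of one word and the dialectal variant of another, even though the two sounds differ acoustically, which is the kind of confusion noted above.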