The present invention relates to models of speech. In particular, the present invention relates to formant models of fluent speech.
Human speech contains spectral prominences, or formants. These formants carry a significant amount of the information contained in human speech.
In the past, attempts have been made to model the formants associated with particular phonetic units, such as phonemes, using discrete-state models such as Hidden Markov Models (HMMs). Such models have been less than ideal, however, because they do not perform well when the speaking rate increases or the articulation effort of the speaker decreases.
Research into the behavior of formants during speech indicates one possible reason for the failure of HMM-based formant systems in handling fluent speech: the formant values for different classes of phonetic units become very similar as the speaking rate increases or the articulation effort decreases.
Although this phenomenon, known as reduction, has been observed in human speech, an adequate model for predicting such behavior in formant tracks has not been developed. As such, a model is needed that predicts the observed dynamic patterns of the formants based on the interaction between phonetic context, speaking rate, and speaking style.