The present invention relates to models of speech. In particular, the present invention relates to vocal tract resonance (VTR) models of structured speech and integrating the VTR models into cepstra prediction.
Human speech contains spectral prominences, known as VTRs. These VTRs carry a significant amount of the information contained in human speech.
In the past, attempts have been made to model the VTRs associated with particular phonetic units, such as phonemes, using discrete state models such as a Hidden Markov Model. Such models have been less than ideal, however, because they do not perform well when the speaking rate increases or the articulation effort of the speaker decreases.
Research into the behavior of VTRs during speech indicates one possible reason why conventional Hidden Markov Model based systems have difficulty handling fluent speech: as the speaking rate increases or the articulation effort decreases, the static VTR values, and hence the static acoustic information, for different classes of phonetic units become very similar.
Although this phenomenon, known as reduction, has been observed in human speech, an adequate quantitative model for predicting such behavior in the VTRs has not been developed. As such, a model is needed that predicts the observed dynamic patterns of the VTRs based on the interaction between phonetic context, speaking rate, and speaking style.