Timing in speech synthesis is an important factor in the sound of the spoken words.
The timing of synthesized phonemes in spoken words of different languages requires rules which are consistent with the particular language. Languages are classified by timing into at least two groups: languages with stress timed rhythm, including English and German, and languages with syllable timed rhythm, including Spanish and French.
In some languages, such as English and German, native speakers produce approximately equal time intervals between stressed syllables, and these languages are referred to as having stress timed rhythm.
However, in other languages such as Spanish and French, native speakers generate a strong component of syllable timing, referred to as syllable timed rhythm. In syllable timed rhythm, the speaker places substantially equal time duration on each syllable as the words are spoken.
Both languages with stress timed rhythm and languages with syllable timed rhythm are often synthesized by simply assigning an inherent time duration to each phoneme and modifying that duration with timing rules. As a result, the time duration between rhythm elements is simply the sum of the time durations of the intervening phonemes.
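The conventional approach described above can be illustrated with a minimal sketch. The phoneme symbols, the inherent durations, the two timing rules, and the names `INHERENT_MS`, `Phoneme`, `apply_timing_rules`, and `interval_duration` are all hypothetical and chosen only for illustration; they are not taken from any particular synthesizer.

```python
from dataclasses import dataclass

# Hypothetical inherent durations in milliseconds (illustrative values only).
INHERENT_MS = {"s": 100, "t": 70, "eh": 120, "p": 80}


@dataclass
class Phoneme:
    symbol: str
    stressed: bool = False
    phrase_final: bool = False


def apply_timing_rules(p: Phoneme) -> float:
    """Start from the inherent duration and modify it with timing rules.

    Both rules below are assumed examples, not rules from the source text.
    """
    duration = float(INHERENT_MS[p.symbol])
    if p.stressed:
        duration *= 1.2   # assumed rule: lengthen stressed phonemes
    if p.phrase_final:
        duration *= 1.5   # assumed rule: lengthen phrase-final phonemes
    return duration


def interval_duration(phonemes: list[Phoneme]) -> float:
    """Duration between rhythm elements: the sum over intervening phonemes."""
    return sum(apply_timing_rules(p) for p in phonemes)


# The phonemes of "step", with a stressed vowel and a phrase-final consonant.
step = [
    Phoneme("s"),
    Phoneme("t"),
    Phoneme("eh", stressed=True),
    Phoneme("p", phrase_final=True),
]
total_ms = interval_duration(step)
```

In this sketch the interval duration is only ever a by-product of the per-phoneme durations; nothing in it lets the caller request a particular total for the sequence.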
Although intelligible speech may be produced by simply assigning durations to the phonemes, an improvement is needed so that the rhythm of the synthesized speech is timed more accurately to correspond with the rhythm elements of the language.
What is needed is a way to produce synthesized speech in which the time duration of a sequence of phonemes can be adjusted to a desired value.