1. Field of the Invention
The present invention relates to a method for generating a statistic for phone lengths, and to a method for determining the length of individual phones for speech synthesis.
2. Description of the Related Art
In the present application, a phoneme is taken to mean the smallest linguistic unit which distinguishes meaning, but does not bear meaning in itself (for example “b” in “beg” which can be distinguished from “p” in “peg”). On the other hand, a phone is the uttered sound of a phoneme.
Methods are known for generating a statistic for phone lengths, on the basis of which the phone lengths can be controlled during synthetic speech generation. In such methods, a text spoken by a speaker is recorded, and the recorded text is segmented into individual phones. The sound length of each individual phone is determined. This phone length is registered in a statistic having a list of triphones. A triphone is a cluster of one or more phonemes with the respective context to the right and to the left.
In the known methods, an average phone length or sound length is assigned in each case to a phoneme of a triphone in its left-right context. This phone length is determined from all the phones of the spoken text which occur in the same context as in the respective triphone, that is to say, all the phones whose adjacent phones correspond to the adjacent phonemes in the triphone.
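The averaging described above can be illustrated with a minimal sketch. It assumes that the segmented recording is available as a list of (phoneme label, length in milliseconds) pairs in spoken order, and that a triphone consists of exactly one left neighbour, one centre phoneme and one right neighbour. The function name, the data layout and the sample lengths are hypothetical and serve illustration only.

```python
from collections import defaultdict


def build_triphone_statistic(segments):
    """Average the length of each centre phone per left-right context.

    `segments` is a hypothetical list of (phoneme, length_ms) pairs in
    spoken order. The first and last segments have only one neighbour
    and are therefore skipped in this sketch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # triphone -> [sum of lengths, count]
    for i in range(1, len(segments) - 1):
        left = segments[i - 1][0]
        center, length_ms = segments[i]
        right = segments[i + 1][0]
        entry = totals[(left, center, right)]
        entry[0] += length_ms
        entry[1] += 1
    # One average phone length per triphone.
    return {tri: total / count for tri, (total, count) in totals.items()}


# Made-up lengths for two utterances of the same phoneme sequence.
segments = [
    ("a", 80), ("b", 60), ("ou", 120), ("t", 70),
    ("a", 90), ("b", 40), ("ou", 110), ("t", 75),
]
statistic = build_triphone_statistic(segments)
print(statistic[("a", "b", "ou")])  # 50.0, the mean of 60 and 40
```

In this sketch every occurrence of a phone in the same context contributes equally to the average of the corresponding triphone.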
In the known method for determining the length of individual phones for speech synthesis, each phoneme of the text to be synthesized has assigned to it the average sound length of that phoneme in the statistic whose context in the triphone corresponds to the context of the phoneme in the text to be synthesized. If, for example, the phone length of the phoneme “b” in the word “about” is to be determined, in the known method the phoneme “b” has assigned to it that phone length which is assigned in the statistic to the phoneme “b” in the triphone “abou”. The context in the triphone and the context in the text to be synthesized are identical here.
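The assignment described above amounts to a lookup by context, which can be sketched as follows. The sketch assumes that the statistic is held as a mapping from (left, centre, right) phoneme triples to average lengths in milliseconds. The function name, the data layout and the sample value are hypothetical. Returning `None` for a phoneme without a matching triphone is a choice made for this sketch only, not a statement about how the known method treats such cases.

```python
def assign_phone_lengths(phonemes, statistic):
    """Look up an average length for each phoneme by its left-right context.

    `phonemes` is the phoneme sequence of the text to be synthesized;
    `statistic` maps (left, center, right) triples to average lengths.
    Phonemes without a matching triphone get None (sketch assumption).
    """
    lengths = []
    for i, center in enumerate(phonemes):
        if 0 < i < len(phonemes) - 1:
            key = (phonemes[i - 1], center, phonemes[i + 1])
            lengths.append(statistic.get(key))
        else:
            # No left or right neighbour is available at the edges.
            lengths.append(None)
    return lengths


# Made-up statistic with a single entry for "b" between "a" and "ou".
statistic = {("a", "b", "ou"): 50.0}
print(assign_phone_lengths(["a", "b", "ou", "t"], statistic))
# [None, 50.0, None, None]
```

Only the phoneme “b” receives a length here, because the made-up statistic contains only the triphone whose context matches that of “b” in “about”.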