The present invention relates to a speech information processing method and apparatus for setting the duration of a phoneme upon speech synthesis, and a computer-readable storage medium holding a program for execution of a speech information processing method.
Recently, a speech synthesis apparatus has been developed so as to convert an arbitrary character string into a phonological series and convert the phonological series into synthesized speech in accordance with a predetermined speech synthesis by rule.
However, the synthesized speech outputted from the conventional speech synthesis apparatus sounds unnatural and mechanical in comparison with natural speech sounded by human being.
For example, in a phonological series xe2x80x9co, X, s, e, ixe2x80x9d of a character series xe2x80x9conseixe2x80x9d, the accuracy of a rule for controlling the duration of generating each phoneme is considered as one of the factors of the awkward-sounding result. If the accuracy is low, as appropriate duration cannot be assigned to each phoneme, the synthesized speech becomes unnatural and mechanical.
The present invention has been made in consideration of the above prior art, and has as its object to provide a speech information processing method and apparatus for setting the duration of phonological series with high accuracy and setting natural phonological duration in accordance with phonemic/linguistic environment.
To attain the foregoing objects, the present invention provides a speech information processing apparatus comprising: means for obtaining a duration of a predetermined unit of phonological series based on a duration model for an entire segment; means for obtaining a duration of each of phonemes constructing the phonological series based on a duration model for a partial segment; setting means for setting a duration of each of the phonemes based on the duration of the phonological series and the duration of each of the phonemes; and speech synthesis means for synthesizing speech based on the duration of each of the phonemes set by the setting means.
Further, the present invention provides a speech information processing method comprising: a step of obtaining a duration of a predetermined unit of phonological series based on a duration model for an entire segment; a step of obtaining a duration of each of phonemes constructing the phonological series based on a duration model for a partial segment; a setting step of setting a duration of each of the phonemes based on the duration of the phonological series and the duration of each of the phonemes; and a speech synthesis step of synthesizing speech based on the duration of each of the phonemes set at the setting step.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same name or similar parts throughout the figures thereof.