As a speech synthesizing method of obtaining desired synthesized speech, a method of generating synthesized speech by editing and concatenating speech segments in units of phonemes or CV/VC, VCV, and the like is known. Note that CV/VC is a unit with a speech segment boundary set in each phoneme, and VCV is a unit with a speech segment boundary set in a vowel.
FIGS. 9A to 9C are views schematically showing an example of a method of changing the duration length and fundamental frequency of one speech segment. The speech waveform of one speech segment shown in FIG. 9A is divided into a plurality of small speech segments by a plurality of window functions in FIG. 9B. In this case, for a voiced sound portion (a voiced sound region in the second half of a speech waveform), a window function having a time width synchronous with the pitch of the original speech is used. For an unvoiced sound portion (an unvoiced sound region in the first half of the speech waveform), a window function having an appropriate time width (longer than that for a voiced sound portion in general) is used.
By repeating a plurality of small speech segments obtained in this manner, thinning out some of them, and changing the intervals, the duration length and fundamental frequency of synthesized speech can be changed. For example, the duration length of synthesized speech can be reduced by thinning out small speech segments, and can be increased by repeating small speech segments. The fundamental frequency of synthesized speech can be increased by reducing the intervals between small speech segments of a voiced sound portion, and can be decreased by increasing the intervals between the small speech segments of the voiced sound portion. By overlapping a plurality of small speech segments obtained by such repetition, thinning out, and interval changes, synthesized speech having a desired duration length and fundamental frequency can be obtained.
Speech, however, has steady and unsteady portions. If the above waveform editing operation (i.e., repeating small speech segments, thinning out small speech segments, and changing the intervals between them) is performed for an unsteady portion (especially, a portion near the boundary between a voiced sound portion and an unvoiced sound portion at which the shape of a waveform greatly changes), synthesized speech may have a rounded waveform or abnormal sounds may be produced, resulting in a deterioration in synthesized speech.