1. Field of the Invention
This invention relates to a method and an apparatus for speech synthesis in which the speech is synthesized from a string of letters or characters or from a string of phoneme symbols. More particularly, it relates to a method and an apparatus for speech synthesis in which the speech is synthesized by overlapping plural pitch waveforms.
This application claims priority of Japanese Patent Application No. 2003-169988, filed in Japan on Jun. 13, 2003, the entirety of which is incorporated by reference herein.
2. Description of Related Art
In a parameter-type speech synthesis apparatus, it is known that the quality of the synthesized speech depends significantly on how closely the spectral envelope characteristics of the synthesized speech approximate those of natural speech. Several parameter-type speech synthesis systems have been proposed to date. For example, Non-Patent Cited Document 1 below proposes a formant synthesis system in which each formant of the speech is represented by a second-order all-pole filter, these filters being interconnected in series or in parallel to represent the envelope characteristics of the entire spectrum.
There is also known a parameter synthesis system based on linear predictive coding (LPC), which uses parameters derived from a linear prediction model, or any of a variety of linear prediction filter representations, such as LSP (line spectrum pair) or PARCOR (partial autocorrelation) coefficients. A system employing the LSP parameters is described in, for example, Non-Patent Cited Document 2.
Non-Patent Cited Document 1
    Klatt, D. H., “Software for a Cascade/Parallel Formant Synthesizer”, Journal of the Acoustical Society of America, March 1980, Vol. 67, No. 3, pp. 971 to 995.
Non-Patent Cited Document 2
    Sadaoki Furui, “Digital Speech Processing”, Tokai University Publishing Section, pp. 89 to 98.
However, formant synthesis and linear-prediction-based synthesis are both basically all-pole models, and, when viewed on the Z-plane, each formant is expressed merely by a single pole. FIGS. 9A and 9B are graphs showing the characteristics of a second-order all-pole filter, with the amplitude on the ordinate and the frequency on the abscissa. The frequency characteristics of the all-pole filter represented by Y_i = aX_i + bY_{i−1} + cY_{i−2}, where X and Y are the input and output signals, respectively, are such that the bandwidth w and the center frequency fc of the formant, shown in FIG. 9A, cannot be controlled independently. That is, if either the bandwidth w or the center frequency fc is changed on its own, the shape of the spectral characteristics itself changes significantly. For example, if the bandwidth is narrowed, as shown in FIG. 9B, the graph becomes sharply peaked in the vicinity of the center frequency. The resulting sound is one in which only a limited portion of the formant's frequency range is emphasized. In other words, the method employing the all-pole filter suffers from the problem that parameter adjustment is highly critical, so that it is difficult to obtain the desired frequency characteristics.
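The coupling described above can be illustrated numerically. The following is a minimal sketch, not taken from the cited documents, of the second-order all-pole filter Y_i = aX_i + bY_{i−1} + cY_{i−2}. It assumes one common mapping from center frequency and bandwidth to the coefficients a, b and c (pole radius set by the bandwidth, pole angle set by the center frequency, unity gain at 0 Hz) and a sampling rate of 16 kHz. The function names resonator_coeffs and magnitude are hypothetical.

```python
import cmath
import math


def resonator_coeffs(fc, bw, fs):
    """Return (a, b, c) for Y_i = a*X_i + b*Y_{i-1} + c*Y_{i-2}.

    Assumed mapping: the pole radius follows from the bandwidth bw (Hz),
    the pole angle from the center frequency fc (Hz); fs is the sampling rate.
    """
    r = math.exp(-math.pi * bw / fs)      # pole radius, closer to 1 for narrow bw
    theta = 2.0 * math.pi * fc / fs       # pole angle in radians
    b = 2.0 * r * math.cos(theta)         # depends on BOTH fc and bw
    c = -r * r                            # depends on bw only
    a = 1.0 - b - c                       # normalizes the gain at 0 Hz to 1
    return a, b, c


def magnitude(coeffs, f, fs):
    """Amplitude response of the filter at frequency f (Hz)."""
    a, b, c = coeffs
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    return abs(a / (1.0 - b * z1 - c * z1 * z1))


if __name__ == "__main__":
    fs = 16000.0                           # assumed sampling rate
    for bw in (200.0, 50.0):               # wide, then narrow bandwidth
        coeffs = resonator_coeffs(1000.0, bw, fs)
        peak = magnitude(coeffs, 1000.0, fs)
        print(f"bw={bw:6.1f} Hz  coeffs={coeffs}  gain at fc={peak:.2f}")
```

Under these assumptions, narrowing the bandwidth alone changes all three coefficients and raises the gain at the center frequency several times over, which corresponds to the sharpened peak of FIG. 9B.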
Moreover, since the side lobes of each formant fall off only gradually, changing a parameter that represents one formant also affects the spectral shape in the frequency ranges of the neighboring formants below and above it, so that individual formants cannot be controlled by individual parameters.
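This interaction between formants can also be sketched numerically. The example below is an illustration under stated assumptions and does not reproduce any implementation from the cited documents. It cascades two second-order all-pole sections, each with the assumed coefficient mapping of unity gain at 0 Hz, and reports how much the overall gain at the second formant's center frequency changes when only the first formant is modified. The sampling rate of 16 kHz, the formant values, and the names section_gain, cascade_gain and change_db are all hypothetical.

```python
import cmath
import math

FS = 16000.0  # assumed sampling rate in Hz


def section_gain(fc, bw, f, fs=FS):
    """Amplitude response at f of one second-order all-pole section."""
    r = math.exp(-math.pi * bw / fs)
    b = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    c = -r * r
    a = 1.0 - b - c                           # unity gain at 0 Hz
    z1 = cmath.exp(-2j * math.pi * f / fs)    # z^-1 on the unit circle
    return abs(a / (1.0 - b * z1 - c * z1 * z1))


def cascade_gain(formants, f, fs=FS):
    """Amplitude response at f of sections connected in series."""
    gain = 1.0
    for fc, bw in formants:
        gain *= section_gain(fc, bw, f, fs)
    return gain


def change_db(before, after, f, fs=FS):
    """Gain change in dB at frequency f between two formant settings."""
    return 20.0 * math.log10(cascade_gain(after, f, fs) / cascade_gain(before, f, fs))


if __name__ == "__main__":
    base = [(500.0, 60.0), (1500.0, 90.0)]       # (center Hz, bandwidth Hz)
    shifted = [(700.0, 60.0), (1500.0, 90.0)]    # only the first formant moved
    widened = [(500.0, 200.0), (1500.0, 90.0)]   # only the first bandwidth changed
    print("shift F1: change at 1500 Hz =", change_db(base, shifted, 1500.0), "dB")
    print("widen F1: change at 1500 Hz =", change_db(base, widened, 1500.0), "dB")
```

In this sketch the second formant's own parameters are never touched, yet the level at its center frequency changes, because the gently sloping skirt of the first section extends into that frequency range.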