The present invention relates to a speech synthesizer, and more particularly to a speech synthesizer suitable for producing synthesized speech of high quality.
The basic construction of a speech synthesis system is described in detail in, for example, an article "PROCESSING OF DIGITAL SIGNAL OF SPEECH" by Rabiner (translated by Suzuki), April 1983, and an article "DIGITAL PROCESSING OF VOICE" by Furui, The Tokai University Publishing Society, September 1985.
In those articles, the "vocoder" is introduced as one kind of speech synthesizer. A vocoder increases the degree to which speech information can be compressed for transmission and synthesis. In a vocoder, the spectrum envelope is extracted from the speech, and the speech to be reconstructed is synthesized on the basis of that spectrum envelope. Various kinds of vocoders have been developed to improve the sound quality. Typical examples are the channel vocoder and the homomorphic vocoder.
In systems employing those vocoders, however, the spectrum envelope information is not extracted with sufficient accuracy, so the quality of the synthesized speech is unsatisfactory. On the other hand, a PSE (Power Spectrum Envelope) method has recently been proposed as a new method of extracting the spectrum envelope information. In this method, the Fourier power spectrum of the speech is sampled at intervals of the pitch frequency. The synthesized speech obtained by this method is considered to be of higher quality than that of the prior art systems. For details, see the article "POWER SPECTRUM ENVELOP (PSE) SPEECH ANALYSIS/SYNTHESIS SYSTEM" by Nakajima et al. (JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN, Vol. 44, No. 11, 1988-11).
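The sampling step described above can be illustrated with a minimal Python sketch. It is not the implementation of the cited article: the function name, the Hann window, the FFT length and the nearest-bin lookup are all assumptions made for illustration, and the pitch frequency is assumed to be already known. The sketch only shows what sampling the Fourier power spectrum at multiples of the pitch frequency means.

```python
import numpy as np

def sample_power_spectrum_at_harmonics(frame, sample_rate, pitch_hz, n_fft=4096):
    """Sample the power spectrum of one speech frame at multiples of pitch_hz.

    Hypothetical helper: the window, n_fft and nearest-bin lookup are
    illustrative choices, not taken from the cited article.
    The frame is assumed to be no longer than n_fft samples.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))   # reduce spectral leakage
    spectrum = np.fft.rfft(windowed, n=n_fft)   # zero-padded one-sided FFT
    power = np.abs(spectrum) ** 2               # Fourier power spectrum

    # Harmonic frequencies pitch_hz, 2*pitch_hz, ... up to the Nyquist frequency.
    n_harmonics = int((sample_rate / 2) // pitch_hz)
    harmonic_freqs = pitch_hz * np.arange(1, n_harmonics + 1)

    # Pick the FFT bin nearest to each harmonic frequency.
    bins = np.round(harmonic_freqs * n_fft / sample_rate).astype(int)
    bins = np.clip(bins, 0, len(power) - 1)
    return harmonic_freqs, power[bins]
```

The returned samples are the points through which a spectrum envelope would then be fitted; the fitting itself is outside the scope of this sketch.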
In a system that synthesizes speech using the above-mentioned PSE analysis/synthesis method, the synthesized speech is produced, in the same manner as in the homomorphic vocoder, by superposing the impulse response at intervals of the pitch period. According to the above article by Nakajima et al., the impulse response is obtained by assuming zero phase. This is based on the knowledge that human hearing is relatively insensitive to phase. Moreover, according to the above article "PROCESSING OF DIGITAL SIGNAL OF SPEECH" by Rabiner, impulse responses are obtained with the minimum phase and the maximum phase in addition to the zero phase, and the qualities of the resulting synthesized speech are compared with one another. It is concluded there that the best quality of synthesized speech is obtained by the minimum phase method.
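The difference between the three phase choices can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the procedure of either cited article: the function names are hypothetical, the magnitude spectrum is assumed to be a full-length, symmetric array of even length, and the minimum phase response is computed by the common cepstrum-folding technique. The last function shows the superposition of a response at a fixed pitch period.

```python
import numpy as np

def zero_phase_response(magnitude):
    """Zero phase: inverse FFT of the magnitude alone (circularly symmetric)."""
    return np.fft.ifft(magnitude).real

def minimum_phase_response(magnitude, floor=1e-8):
    """Minimum phase via cepstrum folding (assumes even len(magnitude))."""
    n_fft = len(magnitude)
    log_mag = np.log(np.maximum(magnitude, floor))  # floor avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real            # real cepstrum
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0                        # double the causal part
    fold[n_fft // 2] = 1.0
    spectrum = np.exp(np.fft.fft(cepstrum * fold))  # same magnitude, minimum phase
    return np.fft.ifft(spectrum).real

def maximum_phase_response(magnitude):
    """Maximum phase: circular time reversal of the minimum phase response."""
    h_min = minimum_phase_response(magnitude)
    return np.roll(h_min[::-1], 1)

def superpose_at_pitch_period(response, period, n_pulses):
    """Add one copy of the response every `period` samples.

    A zero phase response is symmetric about n = 0, so in practice it would
    first be shifted (e.g. with np.fft.fftshift) before being superposed.
    """
    out = np.zeros(period * (n_pulses - 1) + len(response))
    for i in range(n_pulses):
        start = i * period
        out[start:start + len(response)] += response
    return out
```

All three responses share the same magnitude spectrum and differ only in how their energy is distributed in time: the minimum phase response concentrates it at the start, the maximum phase response at the end, and the zero phase response symmetrically around the origin.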