1. Field of the Invention
The present invention relates to a speech analyzing and synthesizing method for analyzing speech into parameters and synthesizing speech again from those parameters.
2. Related Background Art
As a method for speech analysis and synthesis, the mel cepstrum method is already known.
In this method, speech analysis for obtaining spectrum envelope information is conducted by determining a spectrum envelope by the improved cepstrum method, and converting it into cepstrum coefficients on a non-linear frequency scale similar to the mel scale. Speech synthesis is conducted using a mel logarithmic spectrum approximation (MLSA) filter as the synthesizing filter; the speech is synthesized by entering the cepstrum coefficients, obtained by the speech analysis, as the filter coefficients.
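The non-linear frequency scale mentioned above can be illustrated with a short sketch. It assumes that the mel-like scale is approximated by the phase response of a first-order all-pass filter, an approximation commonly paired with the MLSA filter but not stated in the text above. The function name warp_frequency and the value alpha = 0.35 are hypothetical and chosen only for illustration.

```python
import math


def warp_frequency(omega, alpha):
    """Map a normalized angular frequency omega (0..pi, rad/sample) onto a
    warped scale using the phase response of the first-order all-pass
    filter (z^-1 - alpha) / (1 - alpha * z^-1).

    ASSUMPTION: the mel-like scale is approximated by this all-pass warp;
    alpha is an illustrative parameter, not a value taken from the text.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


# Illustrative use: with alpha > 0 the low-frequency region is stretched,
# while the end points 0 and pi are left in place.
alpha = 0.35  # hypothetical value
grid = [math.pi * k / 8 for k in range(9)]
warped = [warp_frequency(w, alpha) for w in grid]
```

With alpha set to zero the mapping reduces to the ordinary linear frequency scale, so a single parameter controls how strongly the low frequencies are emphasized.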
The power spectrum envelope (PSE) method is also known in this field.
In the speech analysis using this method, the spectrum envelope is determined by sampling a power spectrum, obtained from the speech wave by FFT, at positions of multiples of a basic frequency, and smoothly connecting the obtained sample points with cosine polynomials. Speech synthesis is conducted by determining zero-phase impulse response waves from the thus obtained spectrum envelope and superposing the waves at the basic period (reciprocal of the basic frequency).
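The analysis step of the PSE method described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cosine polynomial is fitted to logarithmic power values by least squares, whereas a curve passing exactly through every sample point would need as many terms as there are sample points. All function names, the sampling rate, the basic frequency and the coefficient values are hypothetical.

```python
import math


def harmonic_frequencies(f0, fs):
    """Normalized angular frequencies (rad/sample) of the multiples of the
    basic frequency f0 that lie at or below half the sampling rate fs."""
    count = int((fs / 2) // f0)
    return [2.0 * math.pi * k * f0 / fs for k in range(1, count + 1)]


def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_cosine_series(omegas, values, order):
    """Least-squares fit of Y(w) = sum_m a[m] * cos(m * w), m = 0..order.
    ASSUMPTION: least squares stands in for the curve-connecting step."""
    basis = [[math.cos(m * w) for m in range(order + 1)] for w in omegas]
    size = order + 1
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(size)]
           for i in range(size)]
    aty = [sum(row[i] * y for row, y in zip(basis, values))
           for i in range(size)]
    return solve_linear(ata, aty)


def evaluate_envelope(coeffs, omega):
    """Evaluate the fitted cosine series at one angular frequency."""
    return sum(a * math.cos(m * omega) for m, a in enumerate(coeffs))


# Illustrative use with synthetic data: the "measured" log power at each
# harmonic is generated from a known cosine series, then recovered.
true_coeffs = [1.0, 0.5, -0.25]                       # hypothetical values
omegas = harmonic_frequencies(f0=200.0, fs=8000.0)    # hypothetical rates
samples = [evaluate_envelope(true_coeffs, w) for w in omegas]
fitted = fit_cosine_series(omegas, samples, order=2)
```

In a real analysis the sample values would come from the FFT power spectrum of a speech frame rather than from a known series; the synthetic data is used here only so that the fit can be checked.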
Such conventional methods, however, have been associated with the following drawbacks.
(1) In the mel cepstrum method, when the spectrum envelope is determined by the improved cepstrum method, the envelope tends to oscillate depending on the relation between the order of the cepstrum coefficients and the basic frequency of the speech. Consequently, the order of the cepstrum coefficients has to be regulated according to the basic frequency of the speech. Also, this method is unable to follow a rapid change in the spectrum if the spectrum has a wide dynamic range between the peak and the zero level. For these reasons, speech analysis in the mel cepstrum method is unsuitable for precise determination of the spectrum envelope, and gives rise to a deterioration in the tone quality. On the other hand, speech analysis in the PSE method is not associated with such drawbacks, since the spectrum is sampled at multiples of the basic frequency and the envelope is determined by an approximating curve (cosine polynomials) passing through the sample points.
(2) However, in the PSE method, speech synthesis by the superposition of zero-phase impulse response waves requires a buffer memory for storing the synthesized wave, in order to superpose the impulse response waves symmetrically about time zero. Also, since the superposition of impulse response waves takes place even in the synthesis of an unvoiced speech period, a cycle period of superposition inevitably exists in the synthesized sound of such an unvoiced speech period. Thus the resulting spectrum is not a continuous spectrum, such as that of white noise, but becomes a line spectrum having energy only at multiples of the superposing frequency. Such a property is quite different from that of actual speech. For these reasons, speech synthesis using the PSE method is unsuitable for real-time processing, and the characteristics of the synthesized speech are not satisfactory. On the other hand, speech synthesis in the mel cepstrum method is easily capable of real-time processing, for example with a DSP, because of the use of a filter (MLSA filter), and can also prevent the drawback of the PSE method by changing the sound source between a voiced speech period and an unvoiced speech period, employing white noise as the source for the unvoiced speech period.
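The line-spectrum property described in (2) can be checked numerically with a small sketch. It is an illustration under stated assumptions only: the envelope function is a hypothetical smooth curve, the wave is written into a circular buffer so that the steady state is exactly periodic, and a direct DFT is used in place of an FFT to keep the example free of external libraries. All names and numeric values are hypothetical.

```python
import cmath
import math


def zero_phase_response(amplitude, half_len, grid=64):
    """Symmetric impulse response h[-half_len..half_len] of a zero-phase
    filter, h[n] ~ (1/pi) * integral over 0..pi of A(w) * cos(w * n) dw,
    approximated by the midpoint rule on `grid` points."""
    h = []
    for n in range(-half_len, half_len + 1):
        acc = 0.0
        for k in range(grid):
            w = math.pi * (k + 0.5) / grid
            acc += amplitude(w) * math.cos(w * n)
        h.append(acc / grid)
    return h


def superpose(h, period, length):
    """Superpose the response every `period` samples. Each response is
    centred on its pulse instant, so samples before that instant must
    also be written, which is why a buffer is needed.
    ASSUMPTION: a circular buffer models the periodic steady state."""
    half_len = len(h) // 2
    out = [0.0] * length
    for start in range(0, length, period):
        for i, v in enumerate(h):
            out[(start + i - half_len) % length] += v
    return out


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..len(x)/2, computed directly."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def envelope(w):
    """Hypothetical smooth amplitude envelope used only for this sketch."""
    return 1.0 + 0.5 * math.cos(w)


h = zero_phase_response(envelope, half_len=6)
wave = superpose(h, period=8, length=64)
mags = dft_magnitudes(wave)
# Bins that are not multiples of length / period = 8 carry essentially
# no energy, i.e. the spectrum consists of discrete lines.
```

Replacing the periodic superposition by white-noise excitation of a filter, as in the mel cepstrum method, removes this periodicity for the unvoiced period.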