The present invention relates to a speech analysis method used in a speech processing apparatus. More particularly, it relates to a speech analysis method that reduces variations in the analytical results caused by changes in the pitch of a speech signal, and that can accurately analyze even a quasi-stationary speech signal.
In a speech processing apparatus, speech analysis is usually carried out to extract the features of speech, and this analysis usually includes multiplying the speech signal by a window function. Window functions suitable for speech analysis have been widely studied and are described in detail, for example, on pages 250 to 260 of the book "Digital Processing of Speech Signals" by L. R. Rabiner et al. (Prentice-Hall Inc.). A Hamming window with a duration of 10 to 30 msec is commonly applied to a speech signal.
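As an illustration of the window multiplication mentioned above, the following sketch builds a symmetric Hamming window, w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), and multiplies it sample by sample into one analysis frame. The 8 kHz sampling rate, the 25 msec frame length (chosen from within the 10 to 30 msec range) and the 290 Hz test tone are assumptions for illustration, and the function names are hypothetical.

```python
import math

def hamming(n_samples: int) -> list[float]:
    """Symmetric Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def frame_length(duration_ms: float, sample_rate_hz: int) -> int:
    """Number of samples in a frame of the given duration."""
    return round(duration_ms * sample_rate_hz / 1000.0)

def window_frame(frame: list[float]) -> list[float]:
    """Multiply one analysis frame by a Hamming window of the same length."""
    w = hamming(len(frame))
    return [x * wn for x, wn in zip(frame, w)]

# Assumed parameters: 8 kHz sampling, 25 ms frame (inside the 10-30 ms range).
SAMPLE_RATE_HZ = 8000
N = frame_length(25.0, SAMPLE_RATE_HZ)  # 200 samples
frame = [math.sin(2 * math.pi * 290 * n / SAMPLE_RATE_HZ) for n in range(N)]
windowed = window_frame(frame)
```

The window tapers the frame toward 0.08 at both ends, which reduces spectral leakage from the abrupt frame boundaries; it does not, by itself, change which frequencies the DFT samples.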
Speech waveforms (a) and (b) of FIG. 2 are examples of the vowel [i] spoken by adult men. The two waveforms differ in pitch period, but their one-pitch waveform portions are substantially identical in shape. Accordingly, a listener perceives no difference in tone quality between the speech waveforms (a) and (b).
Speech analysis is required to yield spectral information that is independent of the pitch period; that is, the analytical results for the speech waveforms (a) and (b) should be identical. With a conventional speech analysis method, however, the analytical results for the waveforms (a) and (b) differ greatly. FIG. 3 shows the spectra obtained by extracting a one-pitch waveform from each of the speech waveforms (a) and (b) of FIG. 2 and applying the discrete Fourier transform (DFT) to each extracted waveform. The DFT yields values only at the harmonics of the pitch frequency (that is, the reciprocal of the pitch period); the curves in FIG. 3 are obtained by linear interpolation between these harmonics. The formant frequency with the highest level in FIG. 3 is the reciprocal of the period of the first formant component shown in FIG. 2. In both speech waveforms (a) and (b) of FIG. 2, the first formant component has the same period, 3.45 msec, and hence a formant frequency of 290 Hz. The pitch frequency, however, is 130 Hz for the speech waveform (a) and 115 Hz for the speech waveform (b). As FIG. 3 shows, the spectrum of a speech signal changes when its pitch frequency varies. The change is pronounced when the formant frequency lies far from the nearest harmonic of the pitch frequency.
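The arithmetic behind FIG. 3 can be checked with a short sketch. A DFT over one pitch period samples the spectrum only at multiples of the pitch frequency, so the relevant quantity is how far the 290 Hz formant lies from the nearest such harmonic. The helper names are hypothetical; the input values are those given in the text.

```python
def formant_frequency_hz(formant_period_ms: float) -> float:
    """Formant frequency as the reciprocal of the formant period."""
    return 1000.0 / formant_period_ms

def nearest_harmonic(pitch_hz: float, formant_hz: float) -> tuple[int, float, float]:
    """Return (harmonic number, harmonic frequency, distance to the formant).

    A one-pitch DFT yields spectral values only at k * pitch_hz (k = 1, 2, ...),
    so this is the closest frequency at which the formant can be observed.
    """
    k = max(1, round(formant_hz / pitch_hz))
    f_k = k * pitch_hz
    return k, f_k, abs(formant_hz - f_k)

F1 = formant_frequency_hz(3.45)  # about 289.9 Hz, i.e. roughly 290 Hz
for label, pitch in (("(a)", 130.0), ("(b)", 115.0)):
    k, f_k, gap = nearest_harmonic(pitch, 290.0)
    print(f"{label}: pitch {pitch:.0f} Hz, harmonic {k} at {f_k:.0f} Hz, "
          f"{gap:.0f} Hz from the formant")
```

With these figures the formant lies 30 Hz from the second harmonic (260 Hz) for waveform (a) and 55 Hz from the third harmonic (345 Hz) for waveform (b), so the two one-pitch spectra sample the formant region at quite different points.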
Even when the analytical region is enlarged to enhance the frequency resolution, the first formant component still cannot be detected accurately. FIG. 4 shows the spectrum obtained by extracting a double-pitch waveform from the speech waveform (b) of FIG. 2 and applying the DFT to the extracted waveform. Because the analytical region is doubled, the spectrum of FIG. 4 has a frequency resolution of 57.5 Hz (namely, 115/2 Hz), so a Fourier component at 287.5 Hz is obtained. This spectral line (287.5 Hz) is nearly equal in frequency to the formant having the highest spectral level (290 Hz), yet its level is very low. This is because adjacent one-pitch waveforms differ in the phase of their first formant component. The degree of this phase shift is given by the fractional part of the quotient obtained by dividing the pitch period of the speech signal by the period of the first formant component. When the fractional part is zero, adjacent one-pitch waveforms have the same phase of the first formant component; when it is 0.5, they have opposite phases. For example, in the speech waveform (b) of FIG. 2, the pitch period is 8.7 msec and the period of the first formant component is 3.45 msec. The quotient of the former divided by the latter is therefore 2.52, and its fractional part is 0.52. Thus, adjacent one-pitch waveforms are substantially opposite in phase with respect to the first formant component.
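The phase-shift argument above can be expressed as a small calculation. The sketch computes the fractional (decimal) part of the pitch period divided by the formant period and converts it to a phase shift. Under a simplifying assumption that is not stated in the text (the spectral line sums two equal-amplitude first-formant oscillations offset by that phase), it estimates the relative level as |cos(pi * frac)|, where 1 means fully in phase and 0 means complete cancellation. It also computes the DFT line nearest the formant when the analytical region spans a given number of pitch periods. All function names are hypothetical.

```python
import math

def phase_shift(pitch_period_ms: float, formant_period_ms: float) -> tuple[float, float]:
    """Fractional part of T0/TF and the corresponding phase shift in degrees."""
    q = pitch_period_ms / formant_period_ms
    frac = q - math.floor(q)
    return frac, 360.0 * frac

def two_period_level(frac: float) -> float:
    """Relative amplitude of A*cos(wt) + A*cos(wt + 2*pi*frac), normalized so
    that an in-phase sum is 1.0 (idealized equal-amplitude model)."""
    return abs(math.cos(math.pi * frac))

def nearest_dft_line(pitch_hz: float, n_periods: int, formant_hz: float) -> tuple[float, float]:
    """Frequency resolution and nearest DFT line for an analytical region
    spanning n_periods pitch periods."""
    resolution = pitch_hz / n_periods
    return resolution, round(formant_hz / resolution) * resolution

frac_b, deg_b = phase_shift(8.7, 3.45)           # waveform (b)
res, line = nearest_dft_line(115.0, 2, 290.0)    # double-pitch region
print(f"fractional part {frac_b:.2f}, phase shift {deg_b:.0f} degrees")
print(f"resolution {res} Hz, nearest line {line} Hz")
print(f"relative level {two_period_level(frac_b):.3f}")
```

In this idealized model a fractional part of about 0.52 leaves only a few percent of the in-phase amplitude, which is in line with the very low level of the 287.5 Hz line described for FIG. 4.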
As described above, the variations in the spectrum of a speech signal caused by a change in its pitch period arise because adjacent one-pitch waveforms of the signal differ in the phase of their first formant component. Such variations cannot be eliminated either by increasing the number of one-pitch waveforms included in the analytical region or by applying window multiplication to the speech signal.
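The statement that adding pitch periods to the analytical region does not help can be illustrated with a synthetic, strictly periodic signal. In this sketch each one-pitch waveform is modeled as an exponentially damped 290 Hz sinusoid that restarts at every pitch pulse; the 10 kHz sampling rate (giving exactly 87 samples per 8.7 msec pitch period) and the damping constant are assumptions, not values from the text. A plain (rectangular-window) DFT over N whole periods then has non-zero lines only at bins that are multiples of N, i.e. at the pitch harmonics, so the finer frequency grid adds no information about the formant between harmonics. The window-multiplication part of the statement is not modeled here.

```python
import cmath
import math

FS_HZ = 10_000      # assumed sampling rate
PERIOD = 87         # samples per pitch period (8.7 ms at 10 kHz)
FORMANT_HZ = 290.0
DECAY_S = 0.002     # assumed damping time constant of the formant

def one_pitch_waveform() -> list[float]:
    """Hypothetical one-pitch waveform: damped sinusoid at the formant frequency."""
    return [math.exp(-n / FS_HZ / DECAY_S) * math.sin(2 * math.pi * FORMANT_HZ * n / FS_HZ)
            for n in range(PERIOD)]

def dft_magnitudes(x: list[float]) -> list[float]:
    """Naive DFT magnitudes |X[k]| for k = 0 .. len(x)//2."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)))
            for k in range(size // 2 + 1)]

def off_harmonic_ratio(n_periods: int) -> float:
    """Largest |X[k]| at bins that are not pitch harmonics, relative to the
    largest |X[k]| overall, for a region of n_periods pitch periods."""
    signal = one_pitch_waveform() * n_periods   # list repetition: N identical periods
    mags = dft_magnitudes(signal)
    off = [m for k, m in enumerate(mags) if k % n_periods != 0]
    return max(off) / max(mags)

for n_periods in (2, 3, 4):
    # Bin spacing is (FS_HZ / PERIOD) / n_periods, i.e. the pitch frequency / N.
    print(n_periods, f"{off_harmonic_ratio(n_periods):.1e}")
```

The ratios come out at rounding-error level because, for N identical periods, each DFT bin contains the factor sum over m of exp(-j 2 pi k m / N), which vanishes unless k is a multiple of N.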