1. Field of the Invention
The present invention relates to a speech analysis apparatus for estimating precise values of glottal parameters (source parameters) and vocal tract parameters from an input voice signal.
2. Description of the Prior Art
In the prior art, characteristic parameters are extracted from the voice signal when it is analyzed. These characteristic parameters are, for example, vocal tract parameters in the form of formant frequencies and voiced sound source parameters. The peaks in the spectrum envelope that characterize vowels in the voice signal are called formants; they are numbered No. 1, No. 2, . . . , in order of increasing frequency. A sound source is the energy source of a voice waveform; both voiced and voiceless sound sources are known. In a voice synthesizer that simulates the human speech organs, vocal tract characteristics and radiation characteristics are added to the voiced sound wave output from the voiced sound source. The voiced sound waveform emitted from the glottis is called the glottal wave, the effect added to the glottal wave between the glottis and the mouth is called the vocal tract characteristic, and the effect added when the voice is emitted from the lips is called the radiation characteristic.
A known speech analysis method for extracting the characteristic parameters of the sound source from a voice signal to which these characteristics have been added is the Adaptive Inverse Filtering (AIF) method. This method is described in, for example, "A Comparison of EGG and a New Automatic Inverse Filtering Method in Phonation Change from Breathy to Normal," ICSLP-90 (1990 International Conference on Spoken Language Processing) Proceedings, 6.9.1, pp. 197-202 (1990). The AIF method is explained below; a circuit arrangement of a speech analysis apparatus using it is shown in FIG. 1. A signal s(n) to be analyzed is processed through a Hamming window and then filtered by high-pass filter means 201 to produce a signal sh(n).
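The source-filter chain defined above (glottal wave, then vocal tract characteristic, then radiation characteristic) can be illustrated with a minimal synthesis sketch. It is not part of the AIF method and makes simplifying assumptions: an impulse train stands in for a real glottal pulse, the vocal tract is a cascade of two-pole resonators (one per formant), and lip radiation is approximated by a first difference 1 - z^-1. All function names, the sampling rate and the formant values are hypothetical.

```python
import math

def impulse_train(n_samples, period):
    """Simplified glottal source (assumption): one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole resonator for one formant: 1 / (1 + a1 z^-1 + a2 z^-2)."""
    r = math.exp(-math.pi * bw_hz / fs)       # pole radius set by the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs      # pole angle set by the frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn - a1 * y1 - a2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

def radiation(x):
    """Lip radiation approximated by a first difference, 1 - z^-1 (assumption)."""
    return [xn - (x[n - 1] if n > 0 else 0.0) for n, xn in enumerate(x)]

def synthesize(formants, fs=8000, f0=100, duration_s=0.1):
    """Glottal source -> vocal tract (formant cascade) -> radiation."""
    s = impulse_train(int(fs * duration_s), fs // f0)
    for freq_hz, bw_hz in formants:   # formants No. 1, No. 2, ... (increasing frequency)
        s = resonator(s, freq_hz, bw_hz, fs)
    return radiation(s)
```

With the assumed 8 kHz sampling rate, `synthesize([(700, 80), (1200, 90), (2600, 120)])` (illustrative, not measured, formant frequencies and bandwidths in Hz) returns 800 samples. A synthetic signal of this kind, whose source and vocal tract parameters are known, is a convenient test input for an inverse-filtering analysis.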
The high-pass filter means 201 eliminates the low-frequency components that cause gradual rising or falling (drift) of the signal. First-order LPC (Linear Predictive Coding) analysis means 202 receives the signal sh(n) and performs LPC analysis with a prediction order of 1. Using the result of this analysis, an inverse filter 203 inverse filters sh(n) to produce a signal sv(n) that carries only the effect of the vocal tract. The signal sv(n) is analyzed by high-order LPC analysis means 204, which uses a high LPC prediction order, and an inverse filter 205 then inverse filters sh(n) using the result of that analysis to produce a signal sgr(n) that carries only the effects of the sound source and the radiation characteristic. If necessary, sgr(n) is integrated by an integrator 206 to obtain the source waveform alone. Source parameters are also extracted from sgr(n) by extracting means 207.
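The FIG. 1 chain can be sketched in a few functions. This is a minimal, dependency-free illustration under stated assumptions, not the circuit of FIG. 1: the cutoff and design of high-pass filter means 201 and the form of integrator 206 are not specified above, so a first-order DC blocker and a leaky integrator (both with a hypothetical coefficient `rho = 0.99`) stand in for them; LPC is computed by the autocorrelation method with the Levinson-Durbin recursion; the default high prediction order of 10 is an assumed value; and extracting means 207 is omitted. All function names are hypothetical.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def high_pass(x, rho=0.99):
    """Stand-in for high-pass filter means 201 (assumed first-order DC blocker)."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for xn in x:
        yn = xn - prev_x + rho * prev_y
        y.append(yn)
        prev_x, prev_y = xn, yn
    return y

def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., a_order]."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                      # silent or degenerate frame
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # prediction error after order i
    return a

def inverse_filter(x, a):
    """FIR inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p applied to x."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def integrate(x, rho=0.99):
    """Stand-in for integrator 206 (assumed leaky integrator)."""
    y, prev = [], 0.0
    for xn in x:
        prev = xn + rho * prev
        y.append(prev)
    return y

def aif(s, high_order=10):
    """Prior-art AIF chain of FIG. 1; returns (sgr, integrated source estimate)."""
    sh = high_pass([x * w for x, w in zip(s, hamming(len(s)))])  # window, then 201
    sv = inverse_filter(sh, lpc(sh, 1))             # 202 + 203: vocal-tract-only signal
    sgr = inverse_filter(sh, lpc(sv, high_order))   # 204 + 205: source + radiation
    return sgr, integrate(sgr)                      # 206
```

For instance, `sgr, source = aif(frame)` on a frame of voiced speech returns the source-plus-radiation signal from inverse filter 205 and its integrated version. As in the description above, the first-order result `lpc(sh, 1)` is the only model of the source characteristic in this chain.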
In prior art apparatus of this type, the signal is inverse filtered as it is, regardless of the number of formants, so the result of the analysis is inaccurate. This is explained in more detail below.
If the high prediction order of the high-order LPC analysis means 204 is twice the expected number of poles, the inverse filter 205 mainly eliminates the vocal tract components. However, because the source characteristic is expressed by only a first-order LPC prediction (the processing of the first-order LPC analysis means 202), the extracted source waveform is inaccurate. For example, comparing the source characteristic estimated by first-order LPC prediction (FIG. 2) with that estimated by fifth-order LPC prediction (FIG. 3) shows that the first-order estimate is inaccurate, particularly at the lower frequencies. On the other hand, if the high prediction order is larger than twice the expected number of poles, the inverse filter 205 eliminates not only the vocal tract components but also glottal components, because the result of the LPC analysis then includes both vocal tract and glottal components. Furthermore, when LPC analysis is used in the AIF method, the pole frequencies and bandwidths are estimated less accurately than by other methods such as AbS (Analysis-by-Synthesis).
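Each formant corresponds to a complex-conjugate pole pair of the all-pole (LPC) vocal tract model, so a polynomial A(z) of order 2K has room for K formants; this is the usual rationale for tying the prediction order to twice the expected number of resonances. The sketch below illustrates how a pole frequency and bandwidth are read from the roots of A(z), using the standard mappings F = θ·fs/(2π) and B = -ln(r)·fs/π for a pole at r·e^{jθ}. It builds A(z) from known, hypothetical formant values and recovers them; it does not reproduce the AIF method or AbS. The function names, the Durand-Kerner root finder (chosen only to avoid external dependencies) and the numbers are assumptions.

```python
import cmath
import math

def formant_polynomial(formants, fs):
    """A(z) coefficients [1, a1, ..., a_2K] whose conjugate pole pairs are the K formants."""
    a = [1.0]
    for freq_hz, bw_hz in formants:
        r = math.exp(-math.pi * bw_hz / fs)
        theta = 2.0 * math.pi * freq_hz / fs
        section = [1.0, -2.0 * r * math.cos(theta), r * r]  # one conjugate pole pair
        a = [sum(a[i] * section[k - i] for i in range(len(a)) if 0 <= k - i < 3)
             for k in range(len(a) + 2)]
    return a

def polynomial_roots(a, iterations=500):
    """Roots of z^p + a1 z^(p-1) + ... + a_p by Durand-Kerner iteration (a[0] must be 1)."""
    p = len(a) - 1
    z = [(0.4 + 0.9j) ** k for k in range(p)]   # distinct, non-symmetric starting points
    for _ in range(iterations):
        updated = []
        for i, zi in enumerate(z):
            value = sum(c * zi ** (p - k) for k, c in enumerate(a))
            denom = 1.0
            for j, zj in enumerate(z):
                if j != i:
                    denom *= zi - zj
            updated.append(zi - value / denom)
        z = updated
    return z

def poles_to_formants(a, fs):
    """(frequency, bandwidth) in Hz for each pole in the upper half-plane, sorted by frequency."""
    result = []
    for z in polynomial_roots(a):
        if z.imag > 1e-9:                        # keep one pole of each conjugate pair
            freq_hz = cmath.phase(z) * fs / (2.0 * math.pi)
            bw_hz = -math.log(abs(z)) * fs / math.pi
            result.append((freq_hz, bw_hz))
    return sorted(result)
```

For three hypothetical formants at an assumed 8 kHz rate, `formant_polynomial([(700, 80), (1200, 90), (2600, 120)], 8000)` has seven coefficients (order 6), and `poles_to_formants` applied to it recovers the three frequency and bandwidth pairs. With real speech, the coefficients would come from an LPC analysis instead, and any poles beyond those attributable to formants would show up as additional roots.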