This invention relates to a system for the extraction of pole parameter values in the voice output frequency characteristic pattern to be used for the analysis-synthesis or the recognition of voices.
It is known that the frequency spectrum of the voice waveform has frequency components called formants, at which energy is concentrated, corresponding to the resonant frequencies of the vocal tract. It is also known that the formants substantially correspond to the pole parameters obtained by approximating the frequency spectrum of the voice waveform on the basis of the all-pole model. As a typical way of extracting the pole parameter (formant parameter) from the voice waveform, there is known the so-called AbS (analysis by synthesis) method, in which frequency spectra for various formant patterns are synthesized on the basis of a voice forming model so that the synthesized frequency spectrum approximates the spectrum of natural voice. Further, as a way of extracting formants by use of the AbS type technique, there is known a method entitled "Automatic Formant Tracking by a Newton-Raphson Technique" by J. P. Olive, The Journal of the Acoustical Society of America, Vol. 50, No. 2 (Part 2), 1971, pp. 661-670, which rather closely resembles the system of the present invention.
This proposal accomplishes the formant extraction by use of a least-squares fit (equivalent to inverse filtering in the frequency domain). This method, however, has the disadvantage that it entails a huge volume of arithmetic operations and, therefore, prevents real-time processing with a practical circuit of a small scale.
As is well known, there is also available a method in which a multiplicity of pole parameter values are prepared, a voice signal is applied to an inverse filter using linear prediction coefficients derived from the various pole parameter values, and the pole parameter is determined that minimizes the error power obtained by accumulating the squares of the output values from the inverse filter. More particularly, since the transfer function $A(z)$ ($z = e^{j\omega T}$, $T$: sampling period) obtained by approximating the frequency spectrum envelope of the voice waveform on the basis of the all-pole model is expressed by the following formula: ##EQU1## where
$\alpha_{1m} = -b_m^2$
$\alpha_{2m} = 2 b_m \cos 2\pi f_m T$
$M$: number of poles
$f_m$: frequency of pole
$b_m$: bandwidth of pole
$H_m(z)$: transfer function at the m-th pole,
this method selects the pole parameter that minimizes the energy (error power) of the output waveform obtained by passing the actual voice signal through the inverse filter $A^{-1}(z)$, which is the reciprocal of the filter of formula (1).
The inverse filter $H_m^{-1}(z)$ corresponding to one formant, when two linear prediction coefficients $\alpha_{1m}$ and $\alpha_{2m}$ are given, delivers an output signal $e_n$ corresponding to an input signal $S_n$, which is expressed as:
$$e_n = S_n - \alpha_{1m} S_{n-1} - \alpha_{2m} S_{n-2} \qquad (2)$$
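As a minimal sketch of formula (2), the following Python function applies the second-order inverse filter for one formant to a sampled signal. The function name `inverse_filter`, the argument names, the example values, and the choice to treat samples before the start of the signal as zero are assumptions made only for illustration and are not part of the described system.

```python
def inverse_filter(s, alpha_1m, alpha_2m):
    """Apply e_n = S_n - alpha_1m*S_{n-1} - alpha_2m*S_{n-2} (formula (2)).

    Samples before the start of `s` are taken as zero (an assumption).
    """
    e = []
    for n in range(len(s)):
        s1 = s[n - 1] if n >= 1 else 0.0  # S_{n-1}
        s2 = s[n - 2] if n >= 2 else 0.0  # S_{n-2}
        e.append(s[n] - alpha_1m * s1 - alpha_2m * s2)
    return e


# Illustrative signal obeying S_n = 0.5*S_{n-1} + 0.25*S_{n-2} after an initial impulse.
signal = [1.0, 0.5, 0.5, 0.375]
residual = inverse_filter(signal, 0.5, 0.25)
# Only the initial impulse remains in the residual when the coefficients match.
```

When the two coefficients match the recursion that produced the signal, the output is zero everywhere except at the initial excitation, which is why a small output energy indicates a well-chosen coefficient pair.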
The error power $E$, therefore, is given by the following formula:
$$E = \sum_{n=n_A}^{n_B} e_n^2 \qquad (3)$$
where $n_A$ and $n_B$ are the first and last sampling numbers in the analysis window. It is known that the time width of the analysis window for the voice is required to be about 30 msec. If the voice waveform is sampled at 10 kHz, for example, then the length of the accumulation area ($n_B - n_A$) of the formula (3) is about 300. The calculation of the error power of the formula (3) for the linear prediction coefficients corresponding to the various pole parameter values, therefore, entails a huge volume of arithmetic operations. Evaluating the combinations of the relevant prediction coefficients for a total of four formants, for example, proves highly laborious.
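The cost argument above can be made concrete with a small Python sketch. It accumulates the squared inverse-filter outputs over a 300-sample window for every candidate coefficient pair and counts the multiplications involved. The names `error_power` and `search_coefficients`, the 9 x 9 candidate grid, and the synthetic test signal are hypothetical and chosen only for illustration; they do not reproduce any particular circuit.

```python
def error_power(s, alpha_1m, alpha_2m, n_a, n_b):
    """Accumulate squared inverse-filter outputs e_n for n = n_a .. n_b."""
    total = 0.0
    for n in range(n_a, n_b + 1):
        e_n = s[n] - alpha_1m * s[n - 1] - alpha_2m * s[n - 2]  # formula (2)
        total += e_n * e_n
    return total


def search_coefficients(s, candidates, n_a, n_b):
    """Return the candidate pair with minimum error power and a multiplication count."""
    best_pair, best_power = None, float("inf")
    for a1, a2 in candidates:
        power = error_power(s, a1, a2, n_a, n_b)
        if power < best_power:
            best_pair, best_power = (a1, a2), power
    # Three multiplications per sample: two in the filter, one for the square.
    multiplications = 3 * (n_b - n_a + 1) * len(candidates)
    return best_pair, best_power, multiplications


# Synthetic signal generated by S_n = 0.5*S_{n-1} + 0.25*S_{n-2} (illustrative only).
signal = [0.0, 0.0, 1.0]
for _ in range(300):
    signal.append(0.5 * signal[-1] + 0.25 * signal[-2])

# Hypothetical candidate grid: 9 x 9 coefficient pairs in steps of 0.125.
grid = [k * 0.125 for k in range(9)]
candidates = [(a1, a2) for a1 in grid for a2 in grid]

# Analysis window of 300 samples, starting after the initial impulse at index 2.
best_pair, best_power, multiplications = search_coefficients(signal, candidates, 3, 302)
# 3 * 300 * 81 = 72900 multiplications for this single-formant grid.
```

Even this coarse grid for a single formant needs tens of thousands of multiplications per analysis window. If candidates for several formants are searched jointly, the number of combinations is the product of the per-formant candidate counts, so the operation count grows accordingly.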