This invention relates to a method of determining formant frequencies from a part of a speech signal located within a given time interval, in which
for successive instants located within the time interval a parameter value is derived from the part of the speech signal located within the time interval,
a polynomial of a given order is determined from the parameter values,
the formant frequencies are derived from the given polynomial. The invention also relates to a device for performing the method.
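The three steps listed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method claimed here: it assumes that the parameter values are autocorrelation values of a Hamming-windowed frame, that the polynomial is a linear-prediction polynomial obtained by the Levinson-Durbin recursion, and that each formant corresponds to one complex-conjugate root pair. The names `lpc_polynomial` and `formants_from_polynomial`, the sampling rate and the test frame are hypothetical.

```python
import numpy as np

def lpc_polynomial(frame, order):
    """Assumed step 1 and 2: autocorrelation values, then Levinson-Durbin.

    Returns the coefficients [1, a1, ..., a_order] of the prediction polynomial.
    """
    w = frame * np.hamming(len(frame))
    # Autocorrelation values r[0..order] of the windowed frame.
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def formants_from_polynomial(a, fs):
    """Assumed step 3: read frequencies and bandwidths (Hz) off the roots."""
    roots = np.roots(a)
    # Keep one root of each complex-conjugate pair; real roots are discarded.
    roots = roots[np.imag(roots) > 1e-9]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bandwidths[idx]

# Hypothetical frame: one damped resonance near 1000 Hz, sampled at 8 kHz.
fs = 8000.0
n = np.arange(256)
frame = np.exp(-np.pi * 100.0 * n / fs) * np.sin(2.0 * np.pi * 1000.0 * (n + 1) / fs)
a = lpc_polynomial(frame, order=2)
freqs, bandwidths = formants_from_polynomial(a, fs)
print(freqs, bandwidths)
```

The window broadens the spectral peak, so the bandwidth obtained in this way is generally larger than that of the underlying resonance, while the frequency estimate stays close to it.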
A method of, and a device for, deriving the formant frequencies in a speech signal are described in U.S. Pat. No. 4,346,262 (8/24/82), which is hereby incorporated by reference.
Formants are actually the resonances of the vocal tract and are characterized by much energy in the spectrum. During speaking the vocal tract constantly changes its shape, and hence the formants also change as far as their location on the frequency axis and their bandwidth are concerned. In a source-filter model for speech production, a description of the filter in terms of formant frequencies and bandwidths is frequently used. The speech analysis for the Philips speech synthesis chips MEA 8000 and PCF 8200 also uses a formant description of the speech signal, see list of literature (1) and (2). The reasons for using a formant description are:
economical coding is possible,
the data can be interpreted physically, so that manipulations provide insight, for example concatenation of diphone segments and editing for the speech synthesis chip.
The description above gives the impression that the speech signal could always be described by means of a number of formants (=resonances). In that case the filter in the source-filter model only comprises resonances (all-pole filter). In running speech the speech production system does not always comply with this model: there are sounds for which the model should comprise fewer formants, and there are sounds for which the model, besides comprising formants, should also comprise zeros (that is, antiresonances: frequency ranges in which a phenomenon contrasting with resonance occurs, so that the signal is not subjected to a resonant rise but is notched, and in which there is locally little energy in the spectrum). However, in a practical system the structure of the source-filter model, and hence the number of formants, is laid down. Because the model used is not adapted to all actually occurring situations, the formants are given an operational definition in the case of speech synthesis. The speech synthesis filter only comprises a fixed number of formants (and no zeros), and the associated speech analysis is tasked with finding the model parameters irrespective of the suitability of the model for speech production.
A formant analysis is extensively described in (3). Two problems occur in this formant analysis:
the prescribed number of formants is not always found,
occasionally the analysis fails for numerical reasons: the algorithm used does not converge.
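One way in which the first problem can arise may be illustrated as follows. The sketch assumes, purely for illustration, that formants are read off as complex-conjugate root pairs of a polynomial of order 2N; whether the analysis of (3) proceeds in this way is not stated here. If some roots turn out to be real, fewer than N pairs remain. The helper `count_formant_pairs` and the example polynomial are hypothetical.

```python
import numpy as np

def count_formant_pairs(a, tol=1e-9):
    """Count complex-conjugate root pairs of polynomial a (highest power first)."""
    roots = np.roots(a)
    # Each pair contributes exactly one root with positive imaginary part.
    return int(np.sum(np.imag(roots) > tol))

# Hypothetical 4th-order polynomial: one resonance plus two real roots.
pair = 0.95 * np.exp(1j * np.array([0.6, -0.6]))
a_deficient = np.real(np.poly(np.concatenate([pair, [0.8, -0.5]])))

prescribed = 2                      # a 4th-order model prescribes two formants
found = count_formant_pairs(a_deficient)
print(prescribed, found)
```

In this constructed case the polynomial has the prescribed order, yet it yields only one root pair that can be interpreted as a formant.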