Methods for extracting information from information-rich signals, such as audio signals (speech and music in particular), are increasingly important. They are used in a growing number of applications, for example speech recognition, analysis of musical signals, and detection of service signaling (DTMF) in telephony in the presence of audio signals.
More precisely, these applications comprise a step of extracting frequency information (typically the amplitude, and the evolution of the amplitude over time, of narrow frequency bands extracted from the signal), followed by recognition or identification steps that rely, often predominantly, on this frequency information.
However, it is generally considered that a trade-off must be made between, on the one hand, accurately determining the frequencies contained in the signal and, on the other hand, accurately determining the instants at which the various frequencies contained in the signal under study appear and disappear.
Other signals of higher frequency, such as ultrasound signals or wideband radio signals, may also be rich in information and may benefit from the same technical principles for frequency recognition as speech-type signals.
Within this context, the nature and richness of the information gathered when extracting the frequency content of the signal under study play a very significant role in the subsequent signal processing steps, which are often steps of recognizing or identifying signals characterized by their frequency profile and their time profile (for example, phonemes in speech recognition). They therefore play an equally significant role in the performance of the whole signal processing chain that may exist in such applications.
For example, signal processing for speech recognition is currently done mainly by means of audio filter banks operating in parallel (according to the "vocoder" principle) or, equivalently, by sliding windowed Fourier Transforms (that is, Fourier Transforms operating on signals that have previously been multiplied by a window). The audio signal may have been filtered beforehand to suppress or enhance certain frequencies, so that the analysis operates on the audio frequency band containing most of the information that allows speech recognition or source identification, that is, a frequency band that includes the band extending from 300 Hz to 3,200 Hz (the telephony band).
Typically, the frequency information is obtained with time windows having a duration on the order of 10 to 20 milliseconds, a duration over which audio signals are assumed to be stationary (or quasi-stationary).
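By way of illustration only, the sliding windowed Fourier Transform described above can be sketched as follows. The function names (`hann`, `stft_magnitudes`), the choice of a Hann window, the 10 ms hop between successive windows and the direct (non-fast) evaluation of the transform are assumptions made for this sketch and are not taken from the present description; only the 20 ms window duration falls within the range stated above.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_magnitudes(signal, sample_rate, window_s=0.020, hop_s=0.010):
    """Sliding windowed Fourier Transform, evaluated directly.

    Returns (freqs, frames), where freqs[b] is the centre frequency in Hz
    of bin b and frames[i][b] is the magnitude of bin b in the i-th window.
    """
    n = int(round(window_s * sample_rate))    # samples per window
    hop = int(round(hop_s * sample_rate))     # samples between window starts
    window = hann(n)
    n_bins = n // 2 + 1                       # bins from 0 Hz to sample_rate / 2
    # exp(-2j*pi*m/n) for m = 0..n-1, reused for every bin and every window
    twiddles = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]

    frames = []
    for start in range(0, len(signal) - n + 1, hop):
        # Multiply the current segment by the window before transforming it
        seg = [signal[start + k] * window[k] for k in range(n)]
        mags = []
        for b in range(n_bins):
            acc = sum(seg[k] * twiddles[(b * k) % n] for k in range(n))
            mags.append(abs(acc))
        frames.append(mags)

    freqs = [b * sample_rate / n for b in range(n_bins)]
    return freqs, frames
```

With an assumed sampling rate of 8,000 Hz, a 20 ms window holds 160 samples and the bins are spaced 50 Hz apart; each returned frame summarizes the signal over the whole window, which is where the stationarity assumption enters.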
This stationarity hypothesis is broadly satisfied, but it makes it difficult to observe the transitions between the periods during which the signal is stationary (or quasi-stationary).
In the case of signal analysis by Sliding Fourier Transform (TFG), and also with other techniques such as wavelet analysis, it is well known that good resolution in time and good resolution in frequency cannot be obtained simultaneously. Furthermore, good noise rejection is associated with an analysis that is as accurate as possible in frequency.
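The trade-off can be made concrete for a windowed Fourier analysis: a window of duration T yields frequency bins spaced 1/T apart, so shortening the window to localize events in time coarsens the frequency spacing by the same factor. The following minimal sketch tabulates this relation; the function name `resolution_tradeoff` and the 8,000 Hz sampling rate used in the example are assumptions made for illustration.

```python
def resolution_tradeoff(sample_rate, window_durations_s):
    """For each window duration, return (duration_s, bin_spacing_hz).

    The bin spacing of an n-sample discrete Fourier Transform is
    sample_rate / n, i.e. the inverse of the window duration.
    """
    rows = []
    for duration in window_durations_s:
        n = int(round(duration * sample_rate))    # samples per window
        rows.append((n / sample_rate, sample_rate / n))
    return rows


# Example with an assumed 8,000 Hz sampling rate
for duration, spacing in resolution_tradeoff(8000, [0.010, 0.020, 0.100]):
    print(f"window {duration * 1000:.0f} ms: bin spacing {spacing:.0f} Hz")
```

Under these assumptions, a 10 ms window gives bins 100 Hz apart, whereas reaching a 10 Hz spacing requires a 100 ms window, over which the onset or disappearance of a frequency can no longer be located precisely.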
It would thus be particularly advantageous to obtain information that is precise in both time and frequency, and that also allows noise to be rejected as much as possible.