Methods for extracting information from audio signals, in particular musical and vocal signals, are increasingly important. They are used in a growing number of applications, for example: voice recognition, analysis of musical signals, and detection of service signals for telephony applications (DTMF signals) in the presence of voice signals.
More precisely, these applications include a step of extracting frequency domain information (typically the amplitude, and the evolution of the amplitude, for narrow frequency bands extracted from the signal), followed by steps of recognition or identification that often use this frequency domain information as their main input. These two steps use different techniques: signal processing for the first, pattern recognition for the second. The step of extracting frequency domain information is often implemented by means of a Sliding Fourier Transform (or Short-Term Fourier Transform, STFT).
In this context, the nature and wealth of the information gathered during the frequency domain information extraction step play a very significant role in the subsequent step of recognition or identification, and thus in the performance of such applications as a whole.
Audio signal processing is currently done mainly by means of audio filter banks operating in parallel (according to the principle of a vocoder) or, equivalently, by means of windowed Fourier transforms (that is, transforms operating on signals previously multiplied by a window). The audio signal may first have been subjected to filtering operations intended to eliminate or enhance certain frequencies, for example to enhance high frequencies and/or to limit the bandwidth of the signal to be processed. However, these processing operations act on the usual audio signal without that signal undergoing any frequency modification.
As a consequence, these processing operations act on the part of the audio frequency band that contains most of the information allowing voice recognition or identification of the source, that is, a frequency band that includes the telephony band from 300 Hz to 3200 Hz. The frequency domain information is obtained with time domain windows with a duration on the order of 10 to 20 milliseconds, a duration during which audio signals are assumed to be stationary (or quasi-stationary).
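As a minimal sketch of the conventional short-time analysis described above (not of the invention itself), the following Python code splits a signal into 20 ms Hann-windowed frames advancing in 10 ms steps and computes one magnitude spectrum per frame. The 8000 Hz sample rate, the 1000 Hz test tone, the Hann window and all function names are illustrative assumptions.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def frame_magnitudes(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Return (bin frequencies in Hz, list of magnitude spectra, one per frame)."""
    n = int(round(sample_rate * frame_ms / 1000.0))   # samples per frame
    hop = int(round(sample_rate * hop_ms / 1000.0))   # samples between frame starts
    w = hann(n)
    freqs = [k * sample_rate / n for k in range(n // 2 + 1)]
    spectra = []
    for start in range(0, len(signal) - n + 1, hop):
        # Multiply the frame by the window, then take a direct DFT.
        frame = [signal[start + i] * w[i] for i in range(n)]
        spectra.append([
            abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2 + 1)
        ])
    return freqs, spectra


# Assumed test signal: 100 ms of a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000.0
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(800)]

freqs, spectra = frame_magnitudes(signal, sample_rate)
peak_bin = max(range(len(freqs)), key=lambda k: spectra[0][k])
peak_hz = freqs[peak_bin]
```

With 160-sample frames at 8000 Hz the analysis bins are spaced 50 Hz apart, which illustrates how a 20 ms window bounds the frequency resolution available to the subsequent recognition step.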
This assumption of stationarity or quasi-stationarity generally holds well, but it makes it difficult to observe the transitions between the periods during which the signal is stationary (or quasi-stationary).
The information associated with each frequency gathered during the extraction of frequencies is frequency and amplitude information regarding:
a) A set of frequencies defined in advance (in practice, frequency bands of predefined width centered on these frequencies)
b) A given time window, which typically advances in discrete steps
The recognition or identification steps generally rely on the fact that, at a given instant or over successive instants, a set of well-defined frequencies is present simultaneously.
From this perspective, the following factors are thus particularly important:
a) the accuracy with which the frequencies are detected (that is, the width of the frequency band centered on each predefined frequency): at least for low frequencies, in particular those below approximately 800 Hz, it is important that these frequencies be known with the best possible precision in both amplitude and phase
b) the wealth of information associated with each frequency so detected (for example: amplitude, instantaneous frequency, variations in time of that information)
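The kind of per-frequency information listed above can be illustrated with a short sketch. It assumes that a complex signal for one frequency band (a pair of real signals, in-phase and quadrature) is already available; how that signal is obtained is outside the sketch. The function names, the 8000 Hz sample rate and the 440 Hz test tone are hypothetical.

```python
import cmath
import math


def instantaneous_amplitude(z):
    """Magnitude of each complex sample."""
    return [abs(v) for v in z]


def instantaneous_frequency(z, sample_rate):
    """Frequency in Hz from the phase advance between consecutive samples."""
    return [
        cmath.phase(z[n] * z[n - 1].conjugate()) * sample_rate / (2 * math.pi)
        for n in range(1, len(z))
    ]


# Assumed band signal: a complex exponential of amplitude 0.5 at 440 Hz.
sample_rate = 8000.0
tone_hz = 440.0
z = [0.5 * cmath.exp(2j * math.pi * tone_hz * n / sample_rate) for n in range(32)]

amplitudes = instantaneous_amplitude(z)
frequencies = instantaneous_frequency(z, sample_rate)
```

Because both quantities are computed sample by sample, they can be tracked continuously rather than once per analysis window. The phase-difference estimate is unambiguous only while the frequency stays below half the sample rate.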
The object of the present invention is to obtain such frequency-related information with simple and economical means, and to obtain it continuously.
Such frequency domain information can improve the performance of applications that include such a frequency extraction step.
It may also be advantageous to have information related to the Sliding Fourier Transform (STFT) available in parallel with the instantaneous frequency and amplitude information. Indeed, in some applications it is advantageous to be able to synthesize the analyzed signal, in particular after it has been subjected to transformations in the frequency domain, and the conditions for invertibility of the STFT are known.
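One commonly cited sufficient condition for invertibility is that the shifted copies of the analysis window sum to a constant (constant overlap-add). The sketch below illustrates this with a periodic Hann window at 50% overlap, for which the overlapping windows sum to one, so that summing the inverse-transformed frames recovers the signal. It is a generic illustration under these assumptions, not the method of the invention; the frame length, hop size, test signal and function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n)
                for k in range(n)) / n
            for i in range(n)]


def stft(signal, window, hop):
    """One DFT per windowed frame; frames start every `hop` samples."""
    n = len(window)
    return [dft([signal[s + i] * window[i] for i in range(n)])
            for s in range(0, len(signal) - n + 1, hop)]


def istft(frames, hop, length):
    """Overlap-add of the inverse-transformed frames (no synthesis window)."""
    out = [0.0] * length
    for m, spectrum in enumerate(frames):
        for i, v in enumerate(idft(spectrum)):
            out[m * hop + i] += v.real
    return out


frame_len, hop = 16, 8
# Periodic Hann window: w[i] + w[i + hop] == 1 at 50% overlap.
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
          for i in range(frame_len)]
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]

reconstructed = istft(stft(signal, window, hop), hop, len(signal))
# The first and last `hop` samples are covered by a single frame only,
# so the comparison is restricted to the fully overlapped interior.
max_error = max(abs(reconstructed[t] - signal[t])
                for t in range(hop, len(signal) - hop))
```

Any modification applied to the frames between `stft` and `istft` would then be carried into the resynthesized signal, which is the use case mentioned above.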
It is known that one way of interpreting the result of an STFT applied to a signal is to note that executing an STFT is equivalent to passing the signal through a bank of band-pass filters, which the invention permits. One then notes that the invention makes it possible to satisfy the conditions under which the STFT is invertible.
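The equivalence between an STFT and a bank of band-pass filters can be checked numerically. In the sketch below, the STFT coefficient for bin k of the frame starting at sample m is compared with the output, at sample m + N - 1, of a causal FIR filter whose impulse response is the time-reversed window modulated to the center frequency of bin k. The window, signal, bin index and function names are illustrative assumptions.

```python
import cmath
import math


def stft_coefficient(x, window, start, k):
    """DFT bin k of the windowed frame that begins at sample `start`."""
    n = len(window)
    return sum(x[start + i] * window[i] * cmath.exp(-2j * math.pi * k * i / n)
               for i in range(n))


def bandpass_impulse_response(window, k):
    """Causal complex FIR: reversed window modulated to bin k."""
    n = len(window)
    return [window[n - 1 - j] * cmath.exp(-2j * math.pi * k * (n - 1 - j) / n)
            for j in range(n)]


def fir_output(x, h, p):
    """Output sample p of FIR filter h applied to x (needs p >= len(h) - 1)."""
    return sum(h[j] * x[p - j] for j in range(len(h)))


n, k = 16, 3
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]
h = bandpass_impulse_response(window, k)

# Compare both computations for every possible frame position.
max_difference = max(
    abs(stft_coefficient(x, window, m, k) - fir_output(x, h, m + n - 1))
    for m in range(len(x) - n + 1)
)
```

Seen this way, each STFT bin is one filter of the bank, and a frame-based STFT simply samples the filter outputs once per hop instead of at every input sample.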
Finally, it follows from the preceding observations, according to which the invention makes it possible to perform an invertible STFT, that the invention can be used to perform a Hilbert transform.
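For reference, the result expected from a Hilbert transform can be illustrated with the standard DFT-based construction of the analytic signal, in which negative-frequency bins are set to zero and positive-frequency bins are doubled. This is a generic textbook sketch, not the method of the invention; the function names and the test signal are assumptions.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n)
                for k in range(n)) / n
            for i in range(n)]


def analytic_signal(x):
    """Complex signal whose real part is x and whose imaginary part is
    the Hilbert transform of x (periodic, DFT-based construction)."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n                  # negative frequencies are removed
    gain[0] = 1.0                     # DC is kept unchanged
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0                 # positive frequencies are doubled
    if n % 2 == 0:
        gain[n // 2] = 1.0            # Nyquist bin is kept unchanged
    return idft([spectrum[k] * gain[k] for k in range(n)])


# Assumed test signal: exactly 5 cycles of a cosine over 64 samples.
n = 64
x = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
z = analytic_signal(x)
hilbert_of_x = [v.imag for v in z]

# The Hilbert transform of a cosine is the sine of the same frequency.
max_error = max(abs(hilbert_of_x[t] - math.sin(2 * math.pi * 5 * t / n))
                for t in range(n))
```

The output is a pair of real signals (the original and its Hilbert transform), which together form a complex signal from which amplitude and phase can be read directly.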
Concerning the electric signals that the invention is able to process, one notes that audio signals are a particular case of electric signals generated by a sensor (S) and representative of physical waves that propagate in a physical medium. Examples of such waves include: acoustic waves, electromagnetic waves, seismic waves, ultrasound waves, and sound waves in a medium other than air (water, human or animal body).
Within the framework of the present invention, we shall be particularly interested in signals generated by sensors (S) that are electrical signals designated as “real,” as opposed to signals designated as “complex,” that is, pairs of real signals.