In acoustics, and more particularly in the field of speech, the noise present in an audio signal may be defined, for example, as anything that can hinder a human listener's comprehension of the audio signal, or as anything that can degrade the result of a recognition task or of a task of discriminating between various types of signals (speech/music, for example). The presence of noise in an audio signal can be disruptive because the noise can mask the specific features of the payload signal (speech or music, for example). In certain cases, however, the background noise may be an element of the communication that provides useful information to the listeners (the context of mobility, geographic location, sharing of ambiance).
In the field of voice communication, the noise included in a speech signal, termed “background noise”, may include various noises: sounds originating from engines (motor vehicles, motorcycles), aircraft passing overhead, noises of conversation or murmurs (for example in a restaurant or cafe environment), music, and many other audible noises.
Since the arrival of mobile telephony, the possibility of communicating from any location has contributed to increasing the presence of background noise in the speech signals transmitted and consequently has made it necessary to process the background noise in order to preserve an acceptable level of communication quality.
Moreover, in addition to the noises coming from the environment in which the sound pick-up takes place, spurious noises, produced notably during the encoding and transmission of the audio signal over the network (packet loss in voice over IP, for example), may also interact with the background noise. In this context, it can therefore be assumed that the perceived quality of the transmitted speech depends on the interaction between the various types of noise making up the background noise.
Thus, document [1], “Influence of informational content of background noise on speech quality evaluation for VoIP application”, by A. Leman, J. Faure and E. Parizet, an article presented at the “Acoustics'08” conference held in Paris from Jun. 29 to Jul. 4, 2008, describes subjective tests. These tests show not only that the sound level of the background noise plays a dominant role in the evaluation of voice quality in the context of a voice over IP (VoIP) application, but also that the type of background noise (environmental noise, line noise, etc.) superposed on the voice signal (the payload signal) plays an important role in the evaluation of the voice quality of the communication.
The classification of noise in audio signals has already been the subject of known work. For example, document [2], “Context awareness using environmental noise classification”, by L. Ma, D. J. Smith and B. P. Milner, an article presented at the “Eurospeech” conference, 2003, describes a noise classification method based on a hidden Markov model (HMM). According to the method described, ten environmental noises (bar, beach, street, office, etc.) can be classified by using Mel-frequency cepstral coefficients (MFCC) and derivative indicators (some thirty indicators in total) to characterize an audio signal. The indicators obtained are then applied to the hidden Markov model. The indicators used thus make it possible to classify 90% of the noises present in a signal. However, this method is extremely costly in terms of processing time, given the high number of indicators used.
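The way derivative indicators multiply the size of a characterization vector can be illustrated with a minimal sketch. It is not the implementation of document [2]: the static coefficients below are synthetic placeholders rather than real MFCCs, the regression window width is an assumption, and the split of 10 static values plus first and second derivatives is chosen only to show how a total of about thirty indicators per frame can arise. The names `delta` and `stack_with_derivatives` are hypothetical.

```python
import math


def delta(frames, width=2):
    """First-order regression derivative over +/- `width` frames.

    Edges are handled by repeating the first and last frame (an assumption).
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for j in range(len(frames[0])):
            num = sum(
                n * (frames[min(t + n, last)][j] - frames[max(t - n, 0)][j])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def stack_with_derivatives(static):
    """Concatenate static values with their first and second derivatives."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Synthetic placeholder for 8 frames of 10 static coefficients (not real MFCCs).
static = [[math.sin(0.1 * t * (j + 1)) for j in range(10)] for t in range(8)]
vectors = stack_with_derivatives(static)
print(len(vectors[0]))  # number of indicators per frame
```

Every frame then carries three times as many values as the static coefficients alone, and each of them has to be evaluated by the classifier, which is where the processing cost noted above comes from.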
Document [3], “Robust speech recognition using PCA-based noise classification”, by N. Thaphithakkul et al., an article presented at the “SPECOM” conference, 2005, describes a method for classifying environmental noise using principal component analysis (PCA), intended for voice recognition. According to the method described, four types of noise are classified (white noise, pink noise, vehicle noise, and confused murmuring, or babble) using feature vectors consisting of normalized logarithmic spectra (NLS), which are then projected onto the principal components of a space learned by PCA. The classification is finally performed by support vector machines (SVM). However, on the one hand, the classified noise types are too specific, with easily identifiable frequency characteristics, and, on the other hand, this technique is also quite costly in terms of processing resources.
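The first two stages of such a chain (a normalized logarithmic spectrum followed by a projection onto a principal component) can be sketched as follows. This is an illustration under stated assumptions, not the method of document [3]: the normalization used here (removing the mean of the log spectrum so that the overall level cancels out) is an assumption, only the first principal component is extracted, by power iteration, and the SVM stage is omitted. The frames are seeded random noise, and all function names are hypothetical.

```python
import cmath
import math
import random


def log_spectrum(frame, eps=1e-12):
    """Log-magnitude spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(math.log(abs(acc) + eps))
    return out


def normalize(vec):
    """Assumed normalization: subtract the mean so the signal level cancels out."""
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]


def first_principal_component(rows, iters=200):
    """Mean and top eigenvector of the sample covariance (power iteration)."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (len(rows) - 1)
            for j in range(dim)] for i in range(dim)]
    # Do not start from the all-ones vector: mean-removed features sum to
    # zero, so that direction lies in the null space of the covariance.
    v = [float(j + 1) for j in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data with zero variance
        v = [x / norm for x in w]
    return mean, v


def project(vec, mean, component):
    """Coordinate of a feature vector along one principal component."""
    return sum((x - m) * c for x, m, c in zip(vec, mean, component))


# Seeded random frames stand in for real noise recordings.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(20)]
features = [normalize(log_spectrum(f)) for f in frames]
mean, pc1 = first_principal_component(features)
scores = [project(f, mean, pc1) for f in features]
```

Even in this reduced form, each frame requires a spectrum, a logarithm per bin and a projection before any classifier is applied, which gives an idea of the processing resources involved.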
Document [4], “Frame-level noise classification in mobile environments”, by K. El-Maleh et al., which appeared in Proc. IEEE Conf. Acoustics, Speech, Signal Proc. (Phoenix, Ariz.), pp. 237-240, March 1999, describes a technique for classifying background noise in the context of mobile telephony. In particular, four types of noise are classified: street, confused murmuring (babble), factory, and bus. The characteristics used for the classification are the line spectral frequencies (LSF). Various types of classifiers using these characteristics are then compared, in particular a decision tree classifier (DTC) and a quadratic Gaussian classifier (QGC). However, this technique also uses indicators (LSF) that are costly to compute unless one operates in the encoded domain rather than the audio domain.
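The principle of a quadratic Gaussian classifier can be illustrated with a short sketch. It is not the classifier of document [4]: the covariance of each class is assumed diagonal to keep the example short (a general quadratic Gaussian classifier uses a full covariance matrix per class), and toy two-dimensional vectors with made-up values stand in for LSF vectors. The function names and the training data are hypothetical.

```python
import math


def fit(samples_by_class, floor=1e-6):
    """Estimate a mean, a per-dimension variance and a log prior per class."""
    total = sum(len(rows) for rows in samples_by_class.values())
    model = {}
    for label, rows in samples_by_class.items():
        dim = len(rows[0])
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
        # The variance floor avoids a division by zero on degenerate data.
        var = [sum((r[j] - mean[j]) ** 2 for r in rows) / len(rows) + floor
               for j in range(dim)]
        model[label] = (mean, var, math.log(len(rows) / total))
    return model


def score(x, mean, var, log_prior):
    """Gaussian log-likelihood plus log prior, up to a constant.

    The expression is quadratic in x, hence the name of the classifier.
    """
    return log_prior - 0.5 * sum(
        math.log(v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def classify(x, model):
    """Return the label whose discriminant score is highest."""
    return max(model, key=lambda label: score(x, *model[label]))


# Toy 2-D training vectors with made-up values, standing in for LSF vectors.
training = {
    "street": [[0.0, 0.1], [0.2, 0.3], [0.1, 0.0], [0.3, 0.2]],
    "babble": [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [1.1, 1.2]],
}
model = fit(training)
print(classify([0.15, 0.15], model))
print(classify([1.05, 1.0], model))
```

The decision itself is inexpensive; in such a scheme the cost lies mainly in obtaining the input vectors, which is the point made above about LSF computed in the audio domain.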
The above-mentioned noise classification techniques are therefore complex and require considerable computing time, notably because of the type and the large number of parameters required to carry out the classification.