In the field of voice communications, the background noise present in a speech signal can be of various types: sounds from engines (automobiles, motorcycles), aircraft passing overhead, conversation or background chatter (for example, in a restaurant or cafe environment), music, and many other audible noises. In some cases, the background noise may be an additional element of the communication, able to provide information useful to the listeners (mobility context, geographic location, sharing of atmosphere).
Since the advent of mobile telephones, the possibility of communicating from any given location has increased the presence of background noise in transmitted speech signals, and has consequently made it necessary to process the background noise in order to preserve an acceptable level of communication quality. Furthermore, aside from the noise coming from the environment where the sound capture takes place, electronic noise, notably produced during the coding and the transmission of the audio signal over the network (packet loss in voice-over-IP, for example), may also interact with the background noise.
In this context, it may therefore be assumed that the perceived quality of the transmitted speech depends on the interaction between the various types of noise composing the background noise. Thus, the document “Influence of informational content of background noise on speech quality evaluation for VoIP application” (hereafter denoted as “Document [1]”), by A. Leman, J. Faure and E. Parizet, an article presented at the conference “Acoustics '08”, which was held in Paris from Jun. 29 to Jul. 4, 2008, describes subjective tests which not only show that the sound level of the background noise plays a dominant role in the evaluation of the voice quality in the framework of a voice-over-IP (VoIP) application, but also demonstrate that the type of background noise (environmental noise, line noise, etc.) superimposed onto the voice signal (the useful signal) plays an important role in the evaluation of the voice quality of the communication.
FIG. 1, appended to the present description, is taken from the aforementioned Document [1] (see section 3.5, FIG. 2 of that document) and represents the mean opinion scores (MOS LQSN), with the associated confidence intervals, calculated from scores given by test listeners to audio messages containing six different types of background noise, according to the ACR (Absolute Category Rating) method. The types of noise are as follows: pink noise, stationary speech noise (SSN), electrical noise, city noise, restaurant noise, and television or voice noise, each noise being considered at three different levels of perceived loudness.
The horizontal line situated above the other curves represents the score corresponding to an audio signal that contains no background noise. The scores given, “MOS LQSN” (for “Mean Opinion Score of Listening Quality obtained with Subjective method for Narrow band signals”), are in accordance with ITU-T Recommendations P.800 and P.800.1, respectively entitled “Methods for subjective determination of transmission quality” and “Mean Opinion Score (MOS) terminology”. As can be seen from FIG. 1, the scores given for the same useful signal (in other words, the speech signal contained in the audio signal tested) vary not only according to the type of background noise contained in the audio signal, but also according to the perceived sound level (loudness) of the background noise in question.
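By way of illustration only, the manner in which a mean opinion score and its confidence interval may be derived from ACR scores can be sketched as follows. This is a minimal sketch and not the procedure of Document [1]: the function name `mos_with_ci`, the example scores, and the use of a normal approximation (z = 1.96 for a 95% interval) are assumptions introduced here; the only element taken from the ACR method is the five-point scale (1 = bad to 5 = excellent).

```python
import math
import statistics

# Five-point ACR listening-quality scale: 1 = bad ... 5 = excellent.
ACR_SCALE = (1, 2, 3, 4, 5)


def mos_with_ci(scores, z=1.96):
    """Return (MOS, half-width of the confidence interval).

    `scores` holds the ACR scores given by the listeners to one test
    condition (one noise type at one loudness level).
    `z` is an assumed normal-approximation factor (1.96 ~ 95%).
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are needed")
    if any(s not in ACR_SCALE for s in scores):
        raise ValueError("ACR scores must be integers from 1 to 5")
    mos = statistics.mean(scores)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, z * std_err


# Hypothetical scores for one condition; not data from Document [1].
example_scores = [4, 4, 3, 5, 4, 3, 4, 4]
mos, half_width = mos_with_ci(example_scores)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

With a small listener panel, a Student's t factor would be a more conservative choice than the fixed z value assumed here.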
However, the type of background noise present in the audio signal under consideration is not currently taken into account in the known methods for objective evaluation of the voice quality of a speech signal, whether this be, for example, the PESQ model (cf. ITU-T Rec. P.862), the E-model (described for example in ITU-T Rec. G.107, “The E-model, a computational model for use in transmission planning”, 2003), or non-intrusive methods such as that described in the document “P.563-The ITU-T Standard for Single-Ended Speech Quality Assessment”, by L. Malfait, J. Berger, and M. Kastner, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14(6), pp. 1924-1934, 2006.