Telephones are used in many different environments. There is always some background noise around the speaker (far end) as well as around the listener (near end). The type and level of the background noise can vary from stationary noise, such as office and car noise, to more non-stationary noise, such as street and cafeteria noise. Many speech processing algorithms try to emphasize the actual speech signal while reducing the unwanted masking effect of background noise, in order to improve the perceived audio quality and intelligibility. For these speech enhancement algorithms it is useful to know what kind of noise is present at either end of the transmission link, because different noise situations require different behavior from the algorithms. It is difficult to classify noise exactly, but it is usually enough to classify it according to its level and degree of mobility.
Telephones are often used in noisy environments, and some background noise is always summed with the speech signal. Many speech enhancement algorithms try to improve the quality and intelligibility of the transmitted speech signal by amplifying the actual speech and attenuating the background noise. Voice activity detection (VAD) algorithms have been developed to detect the time slots of the signal that actually contain speech. These algorithms often interpret speech-like noise, such as the hum of voices, as speech as well, which leads to undesired situations where background noise is amplified. To prevent these situations, a babble noise detection procedure is needed that determines whether the speech detected by VAD is actual speech or just background babble.
In addition to algorithms that use VAD information, some other speech enhancement algorithms, such as artificial bandwidth expansion (ABE), benefit from background noise classification information. This information allows the algorithm to perform optimally in different noise situations. Babble noise situations often contain other non-stationary noise as well, for example the tinkle of dishes in a cafeteria or the rustling of papers. Depending on the case, these sounds can also be included in the concept of babble noise, and in such situations it is desirable that the babble noise detector detect these sounds as well.
In “Noise Suppression with Synthesis Windowing and Pseudo Noise Injection,” A. Sugiyama, T. P. Hua, M. Kato, M. Serizawa, IEEE Proceedings of Acoustics, Speech, and Signal Processing, Volume: 1, 13-17 May 2002, babble noise was detected using zero-crossing information. The noise was considered babble noise if the average number of zero-crossings of a time domain signal exceeded a certain threshold.
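The zero-crossing criterion described above can be illustrated with a minimal sketch. It shows only the general idea of thresholding an average zero-crossing count and is not the implementation from the cited paper. The function names, the 160-sample frame length (20 ms at an assumed 8 kHz sampling rate), and the threshold of 40 crossings per frame are all hypothetical values chosen for illustration.

```python
import math


def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        # Samples equal to zero are treated as non-negative.
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def is_babble(signal, frame_len=160, zc_threshold=40.0):
    """Flag the signal as babble when the average number of zero
    crossings per frame exceeds a threshold.

    frame_len and zc_threshold are illustrative assumptions, not
    values taken from the cited reference.
    """
    # Split into non-overlapping frames, dropping any incomplete tail.
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not frames:
        return False
    average = sum(zero_crossings(f) for f in frames) / len(frames)
    return average > zc_threshold


def tone(freq_hz, n_samples=1600, rate_hz=8000, phase=0.1):
    """Synthetic test tone used only to exercise the detector."""
    return [
        math.sin(2 * math.pi * freq_hz * n / rate_hz + phase)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    print(is_babble(tone(100)))   # few crossings per frame
    print(is_babble(tone(3000)))  # many crossings per frame
```

The synthetic tones only demonstrate how the threshold behaves. A pure tone is of course not babble, and a single zero-crossing threshold cannot by itself separate babble from other signals with a high zero-crossing rate.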
Thus, there is a need for an improved technique for detecting babble noise. Further, there is a need to distinguish between speech and background noise. Even further, there is a need to combine results from separate detection algorithms for babble noise detection.