In audio communication systems that allow audio signals to be exchanged between two or more endpoints (also referred to as terminals), it is desirable to control the level of the exchanged audio signals. In particular, leveling of an input audio signal is desirable in voice conferencing systems, in order to provide listeners at the one or more endpoints with equalized audio signals from each of the participants (i.e., from each of the endpoints). Typically, such voice conferencing systems comprise a system component (referred to as a leveling unit) which is responsible for determining an amount of leveling gain to apply to the input audio signal, based on some characteristics of the audio signal, thereby yielding a leveled audio signal.
A challenge in the leveling process is ensuring that only segments of the input audio signal which contain voice data are used to determine the leveling gain. Automatically differentiating between voice and non-voice data is a difficult problem that has yet to be completely solved. Given a short time frame of audio data, some background noises may be erroneously classified as voice, notably when the background noises exhibit the properties that the classifying algorithm uses to characterize voice data.
When the desired voice within the input audio signal is inactive, the leveling unit may erroneously identify background noises or voice-like sounds within the input audio signal as voice and start to level to them. As a result, the leveling unit may determine a leveling gain which brings the background noises or voice-like sounds up to a target level, thereby injecting undesirable noise into the communication system. Furthermore, the leveled background noises may lead to additional data being transmitted within the communication system, thereby increasing the required bandwidth of the communication system.
Possible solutions to the above-mentioned technical problem are directed at restricting the range of the leveling gain or at reducing the sensitivity of the leveling unit to low-level voice. However, these solutions involve a tradeoff: limiting the amplification of background noises also limits the ability to track and level desired low-level voice signals within the input audio signal.
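The failure mode and the tradeoff described above can be illustrated with a minimal sketch. It is not the method of the present document. All names (`TARGET_DB`, `frame_level_db`, `leveling_gain_db`, `max_gain_db`), the target level of -20 dB, and the use of a plain average of per-frame dB levels are assumptions made for illustration only; the voice flags stand in for the output of an unspecified voice classifier.

```python
import math

TARGET_DB = -20.0  # hypothetical target level, in dB relative to full scale


def frame_level_db(frame):
    """RMS level of one frame in dB, with full scale taken as 1.0."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))  # floor avoids log(0)


def leveling_gain_db(frames, is_voice, max_gain_db=None):
    """Gain (dB) that moves the average level of voice-flagged frames to TARGET_DB.

    Simplification: the per-frame dB levels are averaged directly.
    If max_gain_db is given, the gain is restricted to [-max_gain_db, +max_gain_db].
    """
    levels = [frame_level_db(f) for f, v in zip(frames, is_voice) if v]
    if not levels:
        return 0.0  # no voice detected: leave the signal unchanged
    gain = TARGET_DB - sum(levels) / len(levels)
    if max_gain_db is not None:
        gain = max(-max_gain_db, min(max_gain_db, gain))
    return gain


# Constant-amplitude frames standing in for real audio (illustrative only).
loud_voice = [0.1] * 160    # about -20 dB
quiet_voice = [0.01] * 160  # about -40 dB
noise = [0.001] * 160       # about -60 dB

# Noise wrongly flagged as voice is pulled up by roughly 40 dB.
print(leveling_gain_db([noise], [True]))

# Restricting the gain range limits the noise boost, but a quiet talker who
# needs roughly 20 dB is then held to the same 15 dB cap.
print(leveling_gain_db([noise], [True], max_gain_db=15.0))
print(leveling_gain_db([quiet_voice], [True], max_gain_db=15.0))
```

In this sketch the gain cap cannot distinguish misclassified noise from a genuinely quiet talker, which is the tradeoff noted above.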
The present document addresses the above-mentioned technical problem of leveling an input audio signal to a target level. In particular, the present document describes methods and systems for leveling which prevent an overemphasis of undesirable background voice or noise segments within the input audio signal, while at the same time ensuring an appropriate leveling of the desired voice segments within the input audio signal.