In audio communication, where a speech source is captured at a venue through a microphone, the level (amplitude) of the obtained signal can vary significantly. The variation may be related to several factors, including the distance between the speech source and the microphone, variations in the loudness and pitch of the voice, and the impact of the surrounding environment. When the captured audio signal is digitized, significant variations or fluctuations in signal level can result in signal overload and clipping. Such deficiencies may make adequate post-processing of the captured audio signal unattainable and, in addition, spurious data overloads can result in an unpleasant listening experience at the audio rendering venue.
A common way to reduce these drawbacks is to compress the captured signal, which reduces its dynamic range so that a more compact amplitude representation of the signal of interest is obtained. A typical compressor uses a pre-defined threshold to select which signal amplitudes require attention. In the case of downward compression considered here, signal levels above the pre-defined threshold are reduced by a pre-set damping factor or ratio.
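As an illustration of the threshold-and-ratio principle, the following is a minimal sketch of a static downward compressor operating sample by sample in the decibel domain. The function name `downward_compress`, the parameter names, and the default values (a threshold of -20 dB and a ratio of 4:1) are assumptions chosen for the example and are not taken from any particular implementation.

```python
import numpy as np

def downward_compress(x, threshold_db=-20.0, ratio=4.0):
    """Static downward compression (illustrative sketch, assumed parameters).

    Levels above threshold_db are reduced so that an overshoot of D dB
    becomes D / ratio dB; levels below the threshold are left unchanged.
    """
    x = np.asarray(x, dtype=float)
    eps = 1e-12  # avoids log10(0) for silent samples
    level_db = 20.0 * np.log10(np.abs(x) + eps)
    # Amount by which each sample exceeds the threshold (0 dB if below it).
    over_db = np.maximum(level_db - threshold_db, 0.0)
    # Gain reduction in dB needed to shrink the overshoot by the ratio.
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    # Apply the gain as a linear factor; the sign of each sample is preserved.
    return x * 10.0 ** (gain_db / 20.0)

# A full-scale sample (0 dB) lies 20 dB above the threshold, so at 4:1
# it is brought down to -15 dB; the quiet sample is passed through.
y = downward_compress(np.array([0.05, 1.0, -1.0]))
```

With these assumed settings, only the samples above the threshold are attenuated, which is what reduces the dynamic range of the signal.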
Dynamic Range Compression (DRC) can be performed in several ways of varying mathematical complexity. The damping factor is usually a fixed value, but its effect is generally smoothed by “fade in” (attack) and “fade out” (release) time intervals, which can be seen as a time variation of the damping. The level of compression can be frequency independent, and hence fixed for all frequencies present in the signal, or it can be dynamically computed for different frequency bands.
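The smoothing by attack and release times can be sketched, under stated assumptions, as a one-pole filter applied to a per-sample gain reduction given in dB. This is one common way of realizing such smoothing, not necessarily the one used in any given compressor. The function name `smooth_gain`, the 48 kHz sample rate, and the 5 ms and 100 ms time constants are hypothetical values chosen for the example.

```python
import numpy as np

def smooth_gain(gain_db, sample_rate=48000, attack_ms=5.0, release_ms=100.0):
    """Smooth a gain-reduction curve (dB, values <= 0) with attack/release.

    Illustrative one-pole smoother; all parameter values are assumptions.
    """
    # Filter coefficients derived from the attack and release time constants.
    a_att = np.exp(-1.0 / (sample_rate * attack_ms * 1e-3))
    a_rel = np.exp(-1.0 / (sample_rate * release_ms * 1e-3))
    out = np.empty(len(gain_db))
    state = 0.0  # start with no gain reduction (0 dB)
    for n, target in enumerate(gain_db):
        # More reduction requested than currently applied: attack phase.
        # Otherwise the gain recovers towards the target: release phase.
        coeff = a_att if target < state else a_rel
        state = coeff * state + (1.0 - coeff) * target
        out[n] = state
    return out

# A 12 dB reduction requested for 100 ms, then released for 100 ms.
target = np.concatenate([np.full(4800, -12.0), np.zeros(4800)])
smoothed = smooth_gain(target)
```

In this sketch the applied gain approaches the requested reduction gradually during the attack phase and recovers gradually during the release phase, which corresponds to the time variation of the damping described above.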
Considering the most advanced method of downward DRC, with time-varying and frequency-dependent damping, the computational effort can be significant. In real-time applications, multi-band analysis can be infeasible if additional speech processing algorithms, such as Acoustic Echo Cancelling (AEC) or noise removal, are to be performed in conjunction with the compression for full-band signals (24 kHz bandwidth) over short time windows (typically 10 ms), which are common in communications.
Moreover, conventional compression of the amplitude in the time domain introduces artifacts, since the signal is modulated at every instance where the amplitude exceeds the pre-defined threshold. Although the audibility of these effects can be limited by careful selection of the attack and release times, the wave characteristics of the sound are still altered. Furthermore, the selection of the user parameters, such as compression ratio, threshold, and attack and release times, is ambiguous and thus no trivial task.