In teleconferencing, particularly in voice conferencing, the participating audio streams must be mixed at some point prior to playing out (reproducing) the audio streams at a client node (e.g., an end user's PC or a mobile device).
In conventional systems, the digital audio samples are represented by a certain fixed-width (fixed-length) data type, and all processing tends to be based on that type. For example, it is common practice to represent audio samples with 16-bit signed integers.
However, in conventional systems, problems arise from using this fixed-width representation when signals are combined. More specifically, the combined signal, which is the sum of the individual signals, may be larger than can be represented in the fixed-width type. For this reason, during the actual summation of the individual signals, it is typical to temporarily upcast to a sufficiently wide type to avoid integer overflow.
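The upcast-then-sum step can be sketched as follows. This is a minimal illustration, not the source's implementation; the stream contents and function names are made up, and Python's arbitrary-precision integers stand in for the "sufficiently wide" accumulator type:

```python
# 16-bit signed integer range, as in the common fixed-width representation.
INT16_MIN, INT16_MAX = -32768, 32767

def mix_raw(stream_a, stream_b):
    """Sum two streams of 16-bit samples in a wider accumulator.

    Python ints are unbounded, so the summation itself cannot overflow,
    but the result may still exceed the 16-bit range and so cannot be
    stored back into the original type without further processing.
    """
    return [a + b for a, b in zip(stream_a, stream_b)]

# Two loud streams whose sum exceeds the representable range.
stream_a = [30000, -30000, 20000]
stream_b = [10000, -10000, 5000]
mixed = mix_raw(stream_a, stream_b)   # [40000, -40000, 25000]
```

Here the first two mixed samples fall outside [intmin, intmax], which is exactly the situation that forces a clip or limit before downcasting.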
Further, to downcast back to the original width, the signal is truncated or “clipped” to the available range (usually denoted [intmin, intmax], where intmin and intmax are the minimum and maximum integers representable in the fixed-width domain). In audio processing, this clipping results in audible and potentially severe distortion. Thus, the combined signal must be reduced or softly limited in some way to avoid the harsh distortion of clipping.
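The contrast between hard clipping and a soft limiter can be sketched as below. The tanh-based limiter is only an illustrative assumption (the source does not specify a limiting method); it is one common way to compress an over-range sum smoothly back toward the fixed-width range:

```python
import math

# 16-bit signed integer range.
INT16_MIN, INT16_MAX = -32768, 32767

def hard_clip(x):
    """Truncate to [intmin, intmax]; over-range samples are flattened,
    which produces the harsh, audible distortion described above."""
    return max(INT16_MIN, min(INT16_MAX, x))

def soft_limit(x):
    """Illustrative soft limiter (an assumption, not the source's method).

    tanh is nearly linear for small inputs and compresses large inputs
    smoothly, so the result stays inside the 16-bit range without the
    abrupt flattening of a hard clip."""
    return int(INT16_MAX * math.tanh(x / INT16_MAX))

clipped = hard_clip(40000)   # pinned exactly at INT16_MAX
limited = soft_limit(40000)  # inside the range, approached smoothly
quiet = soft_limit(1000)     # small samples pass through nearly unchanged
```

A hard clip maps every over-range sample to the same boundary value, whereas the soft limiter preserves some of the difference between loud samples, trading a small amount of gain reduction for much less audible distortion.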