Audio signals, such as speech, are encoded, for example, to enable an efficient transmission or storage of the audio signals.
Speech encoders and decoders (codecs) are usually optimized for speech signals, and quite often, they operate with a fixed bit rate.
An audio codec can also be configured to operate with varying bit rates, though. At the lowest bit rates, such an audio codec may work with speech signals as well as a pure speech codec at similar rates. At the highest bit rates, the performance may be good with any signal, including music and background noises, which may be considered as a part of the audio signal instead of just noise.
A further audio coding option is embedded variable rate speech coding, which is also referred to as layered coding. Embedded variable rate speech coding denotes a speech coding in which the produced bit stream comprises primary coded data generated by a core encoder and additional enhancement data, which refines the primary coded data. A subset or subsets of the bit stream can then be decoded with good quality. ITU-T standardization aims at a wideband codec covering 50 to 7000 Hz with bit rates from 8 to 32 kbps. The codec core will operate at 8 kbps, and additional layers with quite small granularity will increase the observed speech and audio quality. The minimum target is to have at least five bit rates of 8, 12, 16, 24 and 32 kbps available from the same embedded bit stream.
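The layered structure described above may be illustrated by a minimal sketch in which a frame of an embedded bit stream is truncated to one of the listed bit rates. The 20 ms frame length, the function names and the representation of a frame as a list of bits are assumptions made for illustration only; they are not taken from any particular codec.

```python
# Cumulative bit rates (kbps) named in the text: core at 8, then enhancement layers.
LAYER_RATES_KBPS = [8, 12, 16, 24, 32]
FRAME_MS = 20  # assumed frame length, for illustration only


def layer_sizes_bits(rates=LAYER_RATES_KBPS, frame_ms=FRAME_MS):
    """Bits per frame contributed by the core (first entry) and by each layer."""
    sizes = []
    previous = 0
    for rate in rates:
        # kbps multiplied by milliseconds yields bits.
        sizes.append((rate - previous) * frame_ms)
        previous = rate
    return sizes


def truncate_frame(frame_bits, target_kbps, rates=LAYER_RATES_KBPS, frame_ms=FRAME_MS):
    """Keep the core plus every whole enhancement layer that fits the target rate."""
    if target_kbps < rates[0]:
        raise ValueError("target rate is below the core rate")
    keep = 0
    for rate, size in zip(rates, layer_sizes_bits(rates, frame_ms)):
        if rate > target_kbps:
            break
        keep += size
    return frame_bits[:keep]


if __name__ == "__main__":
    full_frame = [0] * sum(layer_sizes_bits())
    for rate in LAYER_RATES_KBPS:
        print(rate, "kbps ->", len(truncate_frame(full_frame, rate)), "bits per frame")
```

Because every lower-rate frame is a prefix of the full frame in this sketch, a receiver or a network node could drop enhancement layers without re-encoding the signal.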
When encoding audio signals, noise suppression may be used in some cases as a processing step preceding the actual encoding in order to improve the sound quality. Lower bit rates in particular may benefit from noise suppression, as it may allow obtaining a reasonably good output quality in a noisy environment.
The low bit rate performance of a codec operating without noise suppression suffers, because the codec tries to reproduce the whole signal, which includes the noise component. As a result, there are not enough bits to preserve the waveform and key speech characteristics. This problem decreases with an increasing bit rate.
Higher bit rates may thus result in a high audio quality without any pre-processing. In the case of music signals, noise suppression may even introduce additional distortions into the signal. In order to achieve a high quality coding with variable bit rates, it is thus possible to use more noise suppression in low bit rate speech encoding, but no noise suppression in higher bit rate audio/speech encoding.
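The dependency of the noise suppression on the bit rate may be sketched as a simple mapping from the bit rate to a suppression amount. The threshold values of 12 and 24 kbps, the linear transition between them and the function name are assumptions chosen for illustration; the text only states that more suppression is used at low bit rates and none at higher bit rates.

```python
def suppression_amount(bit_rate_kbps, low_rate_kbps=12.0, high_rate_kbps=24.0,
                       max_amount=1.0):
    """Return a noise suppression amount between 0.0 and max_amount.

    Full suppression at or below low_rate_kbps, none at or above
    high_rate_kbps, and an assumed linear transition in between.
    """
    if bit_rate_kbps <= low_rate_kbps:
        return max_amount
    if bit_rate_kbps >= high_rate_kbps:
        return 0.0
    span = high_rate_kbps - low_rate_kbps
    return max_amount * (high_rate_kbps - bit_rate_kbps) / span


if __name__ == "__main__":
    for rate in (8, 12, 16, 24, 32):
        print(rate, "kbps ->", suppression_amount(rate))
```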
Also with embedded variable bit rate coding, the lower bit rates, in this case mainly 8 and 12 kbps, would benefit from noise suppression, while higher bit rates would result in the highest speech and audio quality without any pre-processing. In this case, it would be possible to employ an adaptive noise suppression approach. That is, a first amount of noise suppression could be applied to an audio signal and the resulting signal could be encoded with a core encoder. In addition, a second amount of noise suppression or no noise suppression could be applied to the same audio signal, and the resulting signal could be used for generating enhancement data.
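The adaptive approach outlined above may be sketched as follows. The noise suppressor and the quantizers are toy stand-ins, and deriving the enhancement data as a quantized residual against the decoded core is one possible choice assumed here for illustration; all function names, step sizes and suppression amounts are hypothetical.

```python
def suppress(samples, amount, floor=0.1):
    """Toy stand-in for noise suppression: attenuate samples below an assumed floor."""
    return [s * (1.0 - amount) if abs(s) < floor else s for s in samples]


def quantize(samples, step):
    """Uniform quantizer returning integer indices."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    return [i * step for i in indices]


def encode_embedded(samples, core_amount=0.8, enh_amount=0.0,
                    core_step=0.25, enh_step=0.05):
    """Encode a core layer from a strongly suppressed signal and an
    enhancement layer from a lightly suppressed (here: unprocessed) signal."""
    core_input = suppress(samples, core_amount)      # first amount of suppression
    core = quantize(core_input, core_step)

    enh_input = suppress(samples, enh_amount)        # second amount, or none
    core_decoded = dequantize(core, core_step)
    residual = [e - c for e, c in zip(enh_input, core_decoded)]
    enhancement = quantize(residual, enh_step)
    return core, enhancement


def decode_embedded(core, enhancement=None, core_step=0.25, enh_step=0.05):
    """Decode the core alone, or the core refined by the enhancement data."""
    out = dequantize(core, core_step)
    if enhancement is not None:
        refinement = dequantize(enhancement, enh_step)
        out = [o + r for o, r in zip(out, refinement)]
    return out


if __name__ == "__main__":
    signal = [0.05, -0.6, 0.3, 0.02]
    core, enhancement = encode_embedded(signal)
    print("core only:", decode_embedded(core))
    print("core + enhancement:", decode_embedded(core, enhancement))
```

In this sketch, a decoder receiving only the core data reproduces the noise-suppressed signal coarsely, whereas a decoder receiving the enhancement data as well approaches the signal that was subjected to less or no noise suppression.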
In addition to different bit rates, an audio coder may also select between different coding modes for encoding an audio signal. A first coding mode may be optimized, for instance, for speech, a second for music and a third for mixed signals, etc. A respective coding mode may be selected, for example, based on determined parameters of a signal that is to be encoded.
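A parameter-based mode selection of this kind may be sketched as follows. The two parameters, a speech score and a music score between 0 and 1, as well as the decision margin, are hypothetical; the text does not specify which signal parameters are determined or how they are evaluated.

```python
from enum import Enum


class CodingMode(Enum):
    SPEECH = "speech"
    MUSIC = "music"
    MIXED = "mixed"


def select_coding_mode(speech_score, music_score, margin=0.2):
    """Pick a coding mode from two hypothetical signal parameters.

    A mode is chosen outright only if its score exceeds the other score
    by more than the assumed margin; otherwise the mixed mode is used.
    """
    if speech_score - music_score > margin:
        return CodingMode.SPEECH
    if music_score - speech_score > margin:
        return CodingMode.MUSIC
    return CodingMode.MIXED


if __name__ == "__main__":
    print(select_coding_mode(0.9, 0.1))
    print(select_coding_mode(0.2, 0.8))
    print(select_coding_mode(0.5, 0.55))
```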