Traditionally, mobile networks have been designed to handle speech signals at low bitrates. This has been realised by using designated speech codecs, which perform well for speech signals at low bitrates but perform poorly for music and mixed content. There is an increasing demand that the networks should also handle such signals, e.g. for music-on-hold and ringback tones. Mobile internet applications further drive the need for low-bitrate audio coding for streaming. Audio codecs normally operate at a higher bitrate than speech codecs. When the bit budget for the audio codec is constrained, certain spectral regions of the signal may be coded with a low number of bits, and the desired target quality of the reconstructed signal can therefore not be guaranteed. The spectral regions refer to frequency domain regions, e.g., certain subbands of the frequency-transformed signal block. For simplicity, “spectral regions” will be used throughout the specification with the meaning of “part of short-time signal spectra”.
Moreover, at low and moderate bitrates there will be spectral regions with no bits assigned. Such spectral regions have to be reconstructed at the decoder by reusing information from the available coded spectral regions (e.g., noise-fill or bandwidth extension). In all these cases, some attenuation of the energy of the regions reconstructed with low accuracy is desirable to avoid loud signal distortions.
The signal regions coded with either an insufficient number of bits or with no bits assigned will be reconstructed with low accuracy, and accordingly it is desired to attenuate these spectral regions. Here, an insufficient number of bits is defined as a number of bits that is too low to represent the spectral region with perceptually plausible quality. Note that this number depends on the sensitivity of the audio perception for that region as well as on the complexity of the signal region at hand.
However, attenuation of low-accuracy coded spectral regions is not a trivial problem. On the one hand, strong attenuation is desired to mask unwanted distortion. On the other hand, such attenuation might be perceived by listeners as a loss of loudness in the reconstructed signal, a change of frequency characteristics, or a change in signal dynamics, e.g. because the coding algorithm can select different signal regions to noise-fill over time. For these reasons, conventional audio coding systems apply very conservative, i.e. limited, attenuation, which on average achieves a certain balance between the types of distortion listed above.
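For illustration only, the conventional approach described above can be sketched as a fixed, limited gain applied to every spectral region that received fewer bits than it needs. This sketch is not taken from any particular codec. The function names, the per-region thresholds in `min_bits`, and the gain value are all hypothetical and chosen purely for the example.

```python
from typing import List, Sequence


def attenuate_low_accuracy_bands(
    bands: Sequence[Sequence[float]],
    bits_per_band: Sequence[int],
    min_bits: Sequence[int],
    gain: float = 0.8,
) -> List[List[float]]:
    """Scale each low-accuracy band by one fixed gain (hypothetical helper).

    bands         -- decoded spectral coefficients, one list per spectral region
    bits_per_band -- bits the encoder assigned to each region (0 = not coded)
    min_bits      -- assumed per-region threshold below which the bit count is
                     treated as insufficient; per-region because the threshold
                     depends on perceptual sensitivity and signal complexity
    gain          -- assumed fixed attenuation factor in [0, 1]
    """
    if not (len(bands) == len(bits_per_band) == len(min_bits)):
        raise ValueError("bands, bits_per_band and min_bits must have equal length")
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must lie in [0, 1]")

    attenuated = []
    for coeffs, bits, needed in zip(bands, bits_per_band, min_bits):
        # Regions with no bits or too few bits are reconstructed with low
        # accuracy, so they are scaled down; well-coded regions pass through.
        scale = gain if bits < needed else 1.0
        attenuated.append([scale * c for c in coeffs])
    return attenuated


def band_energy(coeffs: Sequence[float]) -> float:
    """Energy of one spectral region (sum of squared coefficients)."""
    return sum(c * c for c in coeffs)
```

Because the gain is applied to the coefficients, the energy of an attenuated region falls by the square of the gain. A single fixed value therefore trades distortion masking against audible loudness loss in every frame alike, which is the compromise the paragraph above describes.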