It is favorable to mix different coding modes in order to code general audio signals that combine audio of different types, such as speech and music. The individual coding modes may be adapted to particular audio types, and thus a multi-mode audio encoder may take advantage of changing the coding mode over time in accordance with changes in the audio content type. In other words, the multi-mode audio encoder may decide, for example, to encode portions of the audio signal having speech content using a coding mode especially dedicated to coding speech, and to use one or more other coding modes to encode portions of the audio content representing non-speech content such as music. Linear prediction coding modes tend to be more suitable for coding speech content, whereas frequency-domain coding modes tend to outperform linear prediction coding modes as far as the coding of music is concerned.
However, using different coding modes makes it difficult to globally adjust the gain within an encoded bitstream or, to be more precise, the gain of the decoded representation of the audio content of an encoded bitstream, without actually decoding the encoded bitstream and then re-encoding the gain-adjusted decoded representation. Such a detour would inevitably decrease the quality of the gain-adjusted bitstream owing to the requantization performed when re-encoding the decoded and gain-adjusted representation.
For example, in AAC, the output level can easily be adjusted at the bitstream level by changing the value of the 8-bit field “global gain”. This bitstream element can simply be parsed and edited, without the need for full decoding and re-encoding. Thus, this process does not introduce any quality degradation and can be undone losslessly. There are applications which actually make use of this option. For example, there is a free software tool called “AAC gain” [AAC gain] which applies exactly the approach just described. This software is a derivative of the free software “MP3 gain”, which applies the same technique to MPEG-1/2 Layer 3.
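The bitstream-level adjustment described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the “AAC gain” tool. It assumes that the 8-bit “global gain” values have already been extracted from the frames into a list, and the names shift_global_gain and db_to_steps are hypothetical. The step size of 2^(1/4) in amplitude (about 1.5 dB) per unit of the field is an assumption that the text above does not state.

```python
import math

GLOBAL_GAIN_BITS = 8                            # field width stated in the text
GLOBAL_GAIN_MAX = (1 << GLOBAL_GAIN_BITS) - 1   # largest value the field can hold

# Assumption: one step of the field scales the amplitude by 2**0.25,
# which corresponds to roughly 1.5 dB.
DB_PER_STEP = 20.0 * math.log10(2.0 ** 0.25)


def db_to_steps(db):
    """Nearest whole number of field steps for a requested level change in dB."""
    return round(db / DB_PER_STEP)


def shift_global_gain(gains, steps):
    """Return a copy of `gains` with `steps` added to every frame's field value.

    Raises ValueError instead of clamping, so that every successful shift
    can be undone exactly by applying `-steps`.
    """
    shifted = [g + steps for g in gains]
    if any(g < 0 or g > GLOBAL_GAIN_MAX for g in shifted):
        raise ValueError("shift would leave the 8-bit range of the field")
    return shifted


# Hypothetical per-frame field values, for illustration only.
frames = [100, 120, 90]
louder = shift_global_gain(frames, db_to_steps(6.0))
restored = shift_global_gain(louder, -db_to_steps(6.0))
```

Refusing an out-of-range shift rather than clamping is a design choice of this sketch: clamping would map several original values to the same field value, and the adjustment could then no longer be undone losslessly.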
In the just-emerging USAC codec, the FD coding mode has inherited the 8-bit global gain from AAC. Thus, if USAC runs in FD-only mode, such as for higher bitrates, the level adjustment functionality is fully preserved compared to AAC. However, as soon as mode transitions are admitted, this possibility is no longer present. In the TCX mode, for example, there is a bitstream element with the same functionality, also called “global gain”, which has a length of merely 7 bits. In other words, the number of bits for encoding the individual gain elements of the individual modes is primarily adapted to the respective coding mode in order to achieve the best tradeoff between spending fewer bits on gain control on the one hand and avoiding a quality degradation due to too coarse a quantization of the gain on the other hand. Evidently, this tradeoff resulted in a different number of bits for the TCX mode than for the FD mode. In the ACELP mode of the currently emerging USAC standard, the level can be controlled via a bitstream element “mean energy”, which has a length of 2 bits. Again, the tradeoff between spending too many and too few bits on mean energy evidently resulted in a number of bits different from that of the other coding modes, namely the TCX and FD coding modes.
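The consequence of the differing field widths can be illustrated with a short sketch. It uses only the bit widths given above (8, 7 and 2 bits) and checks whether one and the same integer offset is still representable in each field. The dictionary keys, the function names and the example field values are hypothetical. The sketch deliberately ignores that one step may correspond to a different level change in each mode, because the text above does not give those step sizes.

```python
# Field widths in bits, as stated in the text. The keys are illustrative
# labels for the coding modes, not bitstream syntax names.
GAIN_FIELD_BITS = {"FD": 8, "TCX": 7, "ACELP": 2}


def levels(bits):
    """Number of distinct values a field of the given width can represent."""
    return 1 << bits


def offset_fits(mode, value, offset):
    """True if value + offset is still representable in that mode's field."""
    return 0 <= value + offset < levels(GAIN_FIELD_BITS[mode])


def modes_accepting(values, offset):
    """Modes whose current field value can absorb the offset without overflow."""
    return [mode for mode, v in values.items() if offset_fits(mode, v, offset)]


# Hypothetical current field values, one per mode, for illustration only.
current = {"FD": 120, "TCX": 60, "ACELP": 2}
ok = modes_accepting(current, 4)   # the 2-bit field cannot absorb an offset of 4
```

With only four representable levels, the 2-bit field runs out of range long before the 8-bit and 7-bit fields do, so a single offset applied to all three elements cannot in general be carried by every mode.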
Thus, until now, globally adjusting the gain of a decoded representation of a bitstream encoded by multi-mode coding has been cumbersome and has tended to decrease the quality. Either decoding followed by gain adjustment and re-encoding has to be performed, or the loudness level has to be adjusted heuristically, merely by adapting the respective bitstream elements of the different modes that influence the gain of the respective coding mode portions of the bitstream. However, the latter possibility is very likely to introduce artifacts into the gain-adjusted decoded representation.