Embodiments according to the present invention are related to a multi-mode audio signal decoder for providing a decoded representation of an audio content on the basis of an encoded representation of the audio content.
Further embodiments according to the invention are related to a multi-mode audio signal encoder for providing an encoded representation of an audio content on the basis of an input representation of the audio content.
Further embodiments according to the invention are related to a method for providing a decoded representation of an audio content on the basis of an encoded representation of the audio content.
Further embodiments according to the invention are related to a method for providing an encoded representation of an audio content on the basis of an input representation of the audio content.
Further embodiments according to the invention are related to computer programs implementing said methods.
In the following, some background of the invention will be explained in order to facilitate the understanding of the invention and the advantages thereof.
During the past decade, considerable effort has been devoted to making it possible to digitally store and distribute audio contents. One important achievement along this way is the definition of the international standard ISO/IEC 14496. Part 3 of this standard (ISO/IEC 14496-3) is related to the encoding and decoding of audio contents, and sub-part 4 of part 3 defines a concept for encoding and decoding general audio content. In addition, further improvements have been proposed in order to increase the quality and/or reduce the required bit rate.
Moreover, it has been found that the performance of frequency-domain-based audio coders is not optimal for audio contents comprising speech. Recently, a unified speech-and-audio codec has been proposed which efficiently combines techniques from both worlds, namely speech coding and audio coding (see, for example, Reference [1]).
In such an audio coder, some audio frames are encoded in the frequency domain and some audio frames are encoded in the linear-prediction domain.
However, it has been found that it is difficult to transition between frames encoded in different domains without sacrificing a significant amount of bit rate.
In view of this situation, there is a desire to create a concept for encoding and decoding an audio content comprising both speech and general audio, which allows for an efficient realization of transitions between portions encoded using different modes.