Embodiments according to the invention are related to an audio signal encoder for providing an encoded representation of an audio content on the basis of an input representation of the audio content.
Embodiments according to the invention are related to an audio signal decoder for providing a decoded representation of an audio content on the basis of an encoded representation of the audio content.
Embodiments according to the invention are related to a method for providing an encoded representation of an audio content on the basis of an input representation of the audio content.
Embodiments according to the invention are related to a method for providing a decoded representation of an audio content on the basis of an encoded representation of the audio content.
Embodiments according to the invention are related to computer programs for performing said methods.
Embodiments according to the invention are related to a new coding scheme for unified speech and audio coding with low delay.
In the following, the background of the invention will be briefly explained in order to facilitate the understanding of the invention and the advantages thereof.
During the past decade, considerable effort has been put into creating the possibility to digitally store and distribute audio contents with good bitrate efficiency. One important achievement along this way is the definition of the International Standard ISO/IEC 14496-3. Part 3 of the Standard is related to encoding and decoding of audio contents, and subpart 4 of part 3 is related to general audio coding. ISO/IEC 14496 part 3, subpart 4 defines a concept for encoding and decoding of general audio content. In addition, further improvements have been proposed in order to improve the quality and/or to reduce the necessitated bitrate.
Moreover, audio coders and audio decoders have been developed which are specifically adapted for encoding and decoding speech signals. Such speech-optimized audio coders are described, for example, in the technical specifications “3GPP TS 26.090”, “3GPP TS 26.190” and “3GPP TS 26.290” of the Third Generation Partnership Project.
It has been found that there are a number of applications in which a low encoding and decoding delay is desirable. For example, low delay is desired in real-time multimedia applications, because noticeable delays result in an unpleasant user impression in such applications.
However, it has also been found that a good tradeoff between quality and bitrate sometimes necessitates switching between different coding modes, depending on the audio content. It has been found that variations of the audio content bring along the desire to change between coding modes, for example, between a transform-coded-excitation-linear-prediction-domain mode and a code-excitation-linear-prediction-domain mode (like, for example, an algebraic-code-excitation-linear-prediction-domain mode), or between a frequency domain mode and a code-excitation-linear-prediction-domain mode. This is due to the fact that some audio contents (or some portions of a contiguous audio content) can be encoded with a higher coding efficiency in one of the modes, while other audio contents (or other portions of the same contiguous audio content) can be encoded with a better coding efficiency in a different one of the modes.
In view of this situation, it has been found that it is desirable to switch between different ones of the modes without necessitating a large bitrate overhead for the switching and also without significantly compromising the audio quality (for example, in the form of a switching “click”). In addition, it has been found that the switching between different ones of the modes should be compatible with the objective to have a low encoding and decoding delay.
In view of this situation, it is an objective of the invention to create a concept for multimode audio coding which brings along a good tradeoff between bitrate efficiency, audio quality and delay when switching between different ones of the coding modes.