Embodiments according to the invention are related to an encoder for providing an audio stream on the basis of a transform-domain representation of an input audio signal. Further embodiments according to the invention are related to a decoder for providing a decoded representation of an audio signal on the basis of an encoded audio stream. Further embodiments according to the invention provide methods for encoding an audio signal and for decoding an audio signal. Further embodiments according to the invention provide an audio stream. Further embodiments according to the invention provide computer programs for encoding an audio signal and for decoding an audio signal.
Generally speaking, embodiments according to the invention are related to noise filling.
Audio coding concepts often encode an audio signal in the frequency domain. For example, the so-called “advanced audio coding” (AAC) concept encodes the contents of different spectral bins (or frequency bins), taking into consideration a psychoacoustic model. For this purpose, intensity information for different spectral bins is encoded. The resolution used for encoding intensities in different spectral bins is adapted in accordance with the psychoacoustic relevance of the respective spectral bins. Thus, spectral bins which are considered to be of low psychoacoustic relevance are encoded with a very low intensity resolution, such that some of these spectral bins, or even a majority thereof, are quantized to zero. Quantizing the intensity of a spectral bin to zero has the advantage that the quantized zero value can be encoded in a very bit-saving manner, which helps to keep the bit rate as small as possible. Nevertheless, spectral bins quantized to zero sometimes result in audible artifacts, even if the psychoacoustic model indicates that the spectral bins are of low psychoacoustic relevance.
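For illustration only, the effect of a coarse intensity resolution can be sketched as follows. The sketch assumes a plain uniform quantizer with rounding to the nearest index; the function name `quantize`, the step sizes and the sample values are hypothetical and are not taken from AAC or from any standard.

```python
def quantize(values, step):
    """Uniform quantization: map each value to the nearest integer index."""
    return [int(round(v / step)) for v in values]

# Hypothetical intensities of six spectral bins (illustrative values only).
spectrum = [0.9, -0.3, 0.2, 4.1, -0.4, 0.1]

# A small step keeps most bins non-zero; a large step, as might be chosen for
# bins of low psychoacoustic relevance, drives most of them to zero.
fine = quantize(spectrum, 0.25)
coarse = quantize(spectrum, 2.0)

zero_bins = sum(1 for q in coarse if q == 0)
print(fine, coarse, zero_bins)
```

With the coarse step, only the strongest bin survives quantization, while the remaining bins become zero and may therefore be encoded at very low cost.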
Therefore, there is a desire to deal with spectral bins quantized to zero, both in an audio encoder and an audio decoder.
Different approaches are known for dealing with spectral bins encoded to zero in transform-domain audio coding systems and also in speech coders.
For example, the MPEG-4 “AAC” (advanced audio coding) uses the concept of perceptual noise substitution (PNS). The perceptual noise substitution fills complete scale factor bands with noise only. Details regarding the MPEG-4 AAC may, for example, be found in the International Standard ISO/IEC 14496-3 (Information Technology—Coding of Audio-Visual Objects—Part 3: Audio). Furthermore, the AMR-WB+ speech coder replaces vector quantization vectors (VQ vectors) quantized to zero with a random noise vector, in which each complex spectral value has a constant amplitude but a random phase. The amplitude is controlled by one noise value transmitted with the bitstream. Details regarding the AMR-WB+ speech coder may, for example, be found in the technical specification entitled “Third Generation Partnership Project; Technical Specification Group Services and System Aspects; Audio Codec Processing Functions; Extended Adaptive Multi-Rate-Wide Band (AMR-WB+) Codec; Transcoding Functions (Release Six)”, which is also known as “3GPP TS 26.290 V6.3.0 (2005-06)—Technical Specification”.
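The replacement of a zero-quantized vector by a noise vector of constant amplitude and random phase may be sketched as follows. This is a simplified illustration of the property described above and not the AMR-WB+ algorithm itself; the helper `random_phase_vector`, the vector length and the amplitude value are assumptions made for the example.

```python
import cmath
import math
import random

def random_phase_vector(length, amplitude, rng):
    """Complex values with a fixed magnitude and uniformly random phase."""
    return [
        cmath.rect(amplitude, rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(length)
    ]

rng = random.Random(0)           # seeded only to make the sketch repeatable
noise_amplitude = 0.5            # stands in for the single transmitted noise value
decoded_vector = [0j, 0j, 0j, 0j]  # a vector that was quantized to zero

# Substitute noise only when the whole vector was quantized to zero.
if all(value == 0 for value in decoded_vector):
    decoded_vector = random_phase_vector(len(decoded_vector), noise_amplitude, rng)

print([round(abs(value), 6) for value in decoded_vector])
```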
Further, EP 1 395 980 B1 describes an audio coding concept. The publication describes a means by which selected frequency bands of information from an original audio signal, which are audible but perceptually less relevant, need not be encoded, but may be replaced by a noise filling parameter. Those signal bands having content which is perceptually more relevant are, in contrast, fully encoded. Encoding bits are saved in this manner without leaving voids in the frequency spectrum of the received signal. The noise filling parameter is a measure of the RMS signal value within the band in question and is used at the reception end by a decoding algorithm to indicate the amount of noise to inject in the frequency band in question.
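A noise filling parameter of this kind may be sketched as follows. The sketch only assumes what is stated above, namely that the parameter measures the RMS value of the band; the use of Gaussian noise, the rescaling step and the names `band_rms` and `fill_band` are assumptions made for the example and are not taken from EP 1 395 980 B1.

```python
import math
import random

def band_rms(band):
    """Root-mean-square value of the spectral values in one band."""
    return math.sqrt(sum(v * v for v in band) / len(band))

def fill_band(length, rms, rng):
    """Generate noise for one band and rescale it to the requested RMS."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    scale = rms / band_rms(noise)
    return [scale * n for n in noise]

# Encoder side: the band is not encoded; only its RMS value is kept.
band = [0.2, -0.1, 0.05, -0.15]   # hypothetical spectral values
noise_param = band_rms(band)

# Decoder side: inject noise whose level follows the transmitted parameter.
rng = random.Random(1)            # seeded only to make the sketch repeatable
filled = fill_band(len(band), noise_param, rng)
print(noise_param, band_rms(filled))
```

In this sketch, a single value per band replaces all spectral values of that band, which illustrates why the bit rate decreases while the band does not remain empty at the decoder.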
Further approaches provide for a non-guided noise insertion in the decoder, taking into account the tonality of the transmitted spectrum.
However, the conventional concepts typically bring along the problem that they either provide only a coarse granularity of the noise filling, which typically degrades the hearing impression, or use a comparatively large amount of noise filling side information, which entails extra bit rate.
In view of the above, there is a need for an improved concept of noise filling which provides for an improved trade-off between the achievable hearing impression and the bit rate that is used.