The present invention relates to an apparatus and a method for decoding an encoded audio signal, an apparatus for encoding, a method for encoding and an audio signal.
In the art, frequency domain coding schemes such as MP3 or AAC are known. These frequency-domain encoders are based on a time-domain/frequency-domain conversion, a subsequent quantization stage, in which the quantization error is controlled using information from a psychoacoustic module, and an encoding stage, in which the quantized spectral coefficients and corresponding side information are entropy-encoded using code tables.
On the other hand, there are encoders that are very well suited to speech processing, such as AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform a Linear Predictive (LP) filtering of a time-domain signal. Such LP filtering is derived from a Linear Prediction analysis of the input time-domain signal. The resulting LP filter coefficients are then coded and transmitted as side information. The process is known as Linear Prediction Coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal, which is also known as the excitation signal, is encoded using the analysis-by-synthesis stages of the ACELP encoder or, alternatively, is encoded using a transform encoder which uses a Fourier transform with an overlap. The decision between the ACELP coding and the Transform Coded eXcitation coding, which is also called TCX coding, is made using a closed-loop or an open-loop algorithm.
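The LP analysis and the derivation of the prediction residual described above can be illustrated by the following minimal sketch. It is not the AMR-WB+ procedure. It assumes the autocorrelation method with a Levinson-Durbin recursion, omits windowing, coefficient quantization and the ACELP/TCX stages, and all function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[k] = sum over n of x[n] * x[n - k]."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    The predictor is x_hat[n] = sum_{k=1..order} a[k] * x[n - k].
    Returns (a, err); a[0] is unused and err is the remaining error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # nothing left to predict (e.g. all-zero input)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def prediction_residual(x, a):
    """Return the prediction error e[n] = x[n] - x_hat[n] (the excitation)."""
    order = len(a) - 1
    residual = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
        residual.append(x[n] - pred)
    return residual


if __name__ == "__main__":
    # Illustrative input: an exponentially decaying sequence.
    signal = [0.9 ** n for n in range(200)]
    coeffs, _ = levinson_durbin(autocorrelation(signal, 2), 2)
    excitation = prediction_residual(signal, coeffs)
    print(coeffs[1:], sum(e * e for e in excitation))
```

In this sketch, the coefficients correspond to the side information and the residual corresponds to the excitation signal that a subsequent ACELP or TCX stage would encode.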
Frequency-domain audio coding schemes such as the High Efficiency AAC (HE-AAC) encoding scheme, which combines an AAC coding scheme and a spectral band replication technique, can also be combined with a joint stereo or a multi-channel coding tool which is known under the term “MPEG Surround”. On the other hand, speech encoders such as AMR-WB+ also have a high frequency enhancement stage and a stereo functionality.
Said spectral band replication (SBR) is a technique that gained popularity as an add-on to popular perceptual audio codecs such as MP3 and Advanced Audio Coding (AAC). SBR comprises a method of bandwidth extension (BWE) in which the low band (base band or core band) of the spectrum is encoded using an existing codec, whereas the upper band (or high band) is coarsely parameterized using fewer parameters. SBR makes use of a correlation between the low band and the high band in order to predict the high band signal from extracted low band features.
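The principle of such a bandwidth extension can be illustrated by the following simplified sketch. It is not the SBR algorithm of any standard. It assumes that a frame is given as a list of spectral magnitudes, that the high band is described only by one RMS value per band, and that the decoder copies the low band upward and rescales it. Actual SBR operates on a filter bank representation and uses further parameters. All function and parameter names are hypothetical.

```python
import math


def band_rms(values):
    """Root-mean-square level of one band of spectral values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def encode_high_band(spectrum, crossover, num_bands):
    """Coarsely parameterize the bins at or above `crossover`.

    Returns one RMS value per band; these few values stand in for the
    high band instead of its individual spectral coefficients.
    Assumption: the high band length is a multiple of `num_bands`.
    """
    high = spectrum[crossover:]
    if len(high) % num_bands != 0:
        raise ValueError("high band length must be a multiple of num_bands")
    size = len(high) // num_bands
    return [band_rms(high[b * size:(b + 1) * size]) for b in range(num_bands)]


def decode_high_band(low_band, envelope, high_len):
    """Reconstruct a high band from the decoded low band and the envelope.

    The low band is copied upward (repeated if necessary) and each band is
    scaled so that its RMS level matches the transmitted envelope value.
    """
    num_bands = len(envelope)
    if high_len % num_bands != 0:
        raise ValueError("high_len must be a multiple of the envelope length")
    size = high_len // num_bands
    patched = [low_band[i % len(low_band)] for i in range(high_len)]
    out = []
    for b in range(num_bands):
        band = patched[b * size:(b + 1) * size]
        level = band_rms(band)
        gain = envelope[b] / level if level > 0.0 else 0.0
        out.extend(v * gain for v in band)
    return out


if __name__ == "__main__":
    # Illustrative frame of 16 spectral magnitudes; bins 8..15 form the high band.
    frame = [8.0, 6.0, 4.0, 2.0, 8.0, 6.0, 4.0, 2.0,
             1.0, 0.5, 0.5, 1.0, 0.2, 0.1, 0.1, 0.2]
    env = encode_high_band(frame, crossover=8, num_bands=2)
    print(env, decode_high_band(frame[:8], env, high_len=8))
```

The reconstructed high band follows the transmitted envelope but takes its fine structure from the low band, which is why the approach relies on a correlation between the two bands.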
SBR is, for example, used in HE-AAC or AAC+SBR. In SBR it is possible to dynamically change the crossover frequency (BWE start frequency) as well as the temporal resolution, meaning the number of parameter sets (envelopes) per frame. AMR-WB+ implements a time domain bandwidth extension in combination with a switched time/frequency domain core coder, giving good audio quality especially for speech signals. A limiting factor to AMR-WB+ audio quality is the audio bandwidth common to both core codecs and the BWE start frequency, which is one quarter of the system's internal sampling frequency. While the ACELP speech model is capable of modeling speech signals quite well over the full bandwidth, the frequency domain audio coder fails to deliver decent quality for some general audio signals. Thus, speech coding schemes show a high quality for speech signals even at low bit rates, but show a poor quality for music signals at low bit rates.
Frequency-domain coding schemes such as HE-AAC are advantageous in that they show a high quality at low bit rates for music signals. Problematic, however, is the quality of speech signals at low bit rates.
Therefore, different classes of audio signals demand different characteristics of the bandwidth extension tool.