The present invention is related to audio coding and, particularly, to low bit rate audio coding schemes.
In the art, frequency domain coding schemes such as MP3 or AAC are known. These frequency-domain encoders are based on a time-domain/frequency-domain conversion, a subsequent quantization stage, in which the quantization error is controlled using information from a psychoacoustic module, and an encoding stage, in which the quantized spectral coefficients and corresponding side information are entropy-encoded using code tables.
On the other hand, there are encoders that are very well suited to speech processing, such as the AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform a Linear Predictive (LP) filtering of a time-domain signal. The LP filter is derived from a Linear Prediction analysis of the input time-domain signal. The resulting LP filter coefficients are then quantized/coded and transmitted as side information. The process is known as Linear Prediction Coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal, which is also known as the excitation signal, is encoded using the analysis-by-synthesis stages of the ACELP encoder or, alternatively, is encoded using a transform encoder, which uses a Fourier transform with an overlap. The decision between the ACELP coding and the Transform Coded eXcitation coding, which is also called TCX coding, is made using a closed loop or an open loop algorithm.
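By way of illustration only, the relationship between the LP analysis, the LP filter and the excitation signal can be sketched as follows. The sketch assumes the textbook autocorrelation method for deriving the filter coefficients; it is not the procedure specified in 3GPP TS 26.290, it omits the quantization of the coefficients and the ACELP or TCX encoding of the excitation, and the function names lp_coefficients, prediction_residual and synthesize are hypothetical.

```python
import numpy as np

def lp_coefficients(frame, order):
    """LP analysis (autocorrelation method, an assumption of this sketch):
    solve the normal equations R a = r for the predictor coefficients a."""
    L = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[L - 1:L + order]
    # Symmetric Toeplitz autocorrelation matrix
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def prediction_residual(frame, a):
    """LP filtering: e[n] = x[n] - sum_k a[k] * x[n-1-k].
    Samples before the start of the frame are taken as zero."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), frame])
    predicted = np.array([np.dot(a, padded[n:n + order][::-1])
                          for n in range(len(frame))])
    return frame - predicted

def synthesize(residual, a):
    """Inverse (synthesis) filter: x[n] = e[n] + sum_k a[k] * x[n-1-k]."""
    order = len(a)
    out = np.zeros(order + len(residual))
    for n in range(len(residual)):
        out[order + n] = residual[n] + np.dot(a, out[n:n + order][::-1])
    return out[order:]

# Synthetic test signal: a second-order autoregressive process
# (hypothetical stand-in for a frame of speech).
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + noise[n]

a = lp_coefficients(x, order=2)
e = prediction_residual(x, a)
print("estimated coefficients:", a)
print("signal energy / residual energy:", np.sum(x ** 2) / np.sum(e ** 2))
```

In this sketch the excitation signal e carries less energy than the input signal, and feeding the unquantized excitation through the synthesis filter returns the input signal, which is why only the filter coefficients and a coded excitation need to be transmitted.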
Frequency-domain audio coding schemes such as the high efficiency-AAC encoding scheme, which combines an AAC coding scheme and a spectral band replication technique, can also be combined with a joint stereo or a multi-channel coding tool which is known under the term “MPEG surround”.
On the other hand, speech encoders such as the AMR-WB+ also have a high frequency enhancement stage and a stereo functionality.
Frequency-domain coding schemes are advantageous in that they show a high quality at low bitrates for music signals. The quality of speech signals at low bitrates, however, is problematic.
Speech coding schemes show a high quality for speech signals even at low bitrates, but show a poor quality for music signals at low bitrates.
Frequency-domain coding schemes often make use of the so-called MDCT (modified discrete cosine transform). The MDCT was initially described in J. Princen, A. Bradley, “Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation”, IEEE Trans. ASSP, ASSP-34(5):1153-1161, 1986. The MDCT or MDCT filter bank is widely used in modern and efficient audio coders. This kind of signal processing provides the following advantages:
Smooth cross-fade between processing blocks: Even if the signal in each processing block is altered differently (e.g. due to quantization of spectral coefficients), no blocking artifacts due to abrupt transitions from block to block occur because of the windowed overlap/add operation.
Critical sampling: The number of spectral values at the output of the filterbank is equal to the number of time domain input values at its input, and no additional overhead values have to be transmitted.
The MDCT filterbank provides a high frequency selectivity and coding gain.
These properties are achieved by utilizing the technique of time domain aliasing cancellation. The time domain aliasing cancellation is performed at the synthesis by overlap-adding two adjacent windowed signals. If no quantization is applied between the analysis and the synthesis stages of the MDCT, a perfect reconstruction of the original signal is obtained. However, the MDCT is used in coding schemes which are specifically adapted for music signals. Such frequency-domain coding schemes have, as stated before, reduced quality at low bit rates for speech signals, while specifically adapted speech coders have a higher quality at comparable bit rates or even have significantly lower bit rates for the same quality compared to frequency-domain coding schemes.
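The time domain aliasing cancellation and the critical sampling described above can be illustrated by the following minimal sketch. It assumes a sine window, which satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1, applied at both analysis and synthesis, a block length of 2N with a hop of N samples, and a direct matrix evaluation of the transform instead of a fast algorithm. The function names mdct, imdct and analyze_synthesize are hypothetical and do not correspond to any particular codec.

```python
import numpy as np

def mdct(frame, window):
    """Forward MDCT: 2N windowed time samples -> N spectral coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * frame)

def imdct(coeffs, window):
    """Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples."""
    N = len(coeffs)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * window * (basis @ coeffs)

def analyze_synthesize(signal, N, quantize=None):
    """Block-wise MDCT analysis and overlap-add synthesis with hop size N.
    len(signal) is assumed to be a multiple of N."""
    window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))
    # Pad with N zeros on each side so every sample lies in two blocks.
    padded = np.concatenate([np.zeros(N), signal, np.zeros(N)])
    out = np.zeros_like(padded)
    num_coeffs = 0
    for start in range(0, len(padded) - 2 * N + 1, N):
        coeffs = mdct(padded[start:start + 2 * N], window)
        if quantize is not None:
            coeffs = quantize(coeffs)  # optional stand-in for a quantizer
        num_coeffs += len(coeffs)
        # Overlap-add: the aliasing of adjacent blocks cancels here.
        out[start:start + 2 * N] += imdct(coeffs, window)
    return out[N:-N], num_coeffs

N = 16
x = np.random.default_rng(0).standard_normal(8 * N)
y, num_coeffs = analyze_synthesize(x, N)
print("max reconstruction error:", np.max(np.abs(x - y)))
print("input samples:", len(x), "transmitted coefficients:", num_coeffs)
```

In this sketch each block of 2N samples yields only N coefficients, and each hop of N new input samples adds N coefficients, so that apart from the single extra block caused by the zero padding at the signal boundaries the number of spectral values equals the number of input samples. Without a quantizer the overlap-added output matches the input up to floating-point rounding.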
Speech coding techniques such as the so-called AMR-WB+ codec as defined in “Extended Adaptive Multi-Rate-Wideband (AMR-WB+) codec”, 3GPP TS 26.290 V6.3.0, 2005-06, Technical Specification, do not apply the MDCT and, therefore, cannot take any advantage of the excellent properties of the MDCT, which specifically lie in a critically sampled processing on the one hand and a crossover from one block to the other on the other hand. Therefore, the crossover from one block to the other, which the MDCT provides without any penalty with respect to bit rate, i.e., while retaining the critical sampling property, has not yet been obtained in speech coders.
When speech coders and audio coders are combined within a single hybrid coding scheme, there is still the problem of how to obtain a switch from one coding mode to the other coding mode at a low bit rate and a high quality.