The present invention is related to audio coding and, particularly, to low bit rate audio coding schemes.
In the art, frequency domain coding schemes such as MP3 or AAC are known. These frequency-domain encoders are based on a time-domain/frequency-domain conversion, a subsequent quantization stage, in which the quantization error is controlled using information from a perceptual module, and an encoding stage, in which the quantized spectral coefficients and corresponding side information are entropy-encoded using code tables.
On the other hand there are encoders that are very well suited to speech processing such as the AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform a Linear Predictive filtering of a time-domain signal. Such a LP filtering is derived from a Linear Prediction analysis of the input time-domain signal. The resulting LP filter coefficients are then quantized/coded and transmitted as side information. The process is known as Linear Prediction Coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal which is also known as the excitation signal is encoded using the analysis-by-synthesis stages of the ACELP encoder or, alternatively, is encoded using a transform encoder, which uses a Fourier transform with an overlap. The decision between the ACELP coding and the Transform Coded eXcitation coding which is also called TCX coding is done using a closed loop or an open loop algorithm.
Frequency-domain audio coding schemes such as the High Efficiency AAC (HE-ACC) encoding scheme, which combines an AAC coding scheme and a spectral band replication (SBR) technique can also be combined with a joint stereo or a multi-channel coding tool which is known under the term “MPEG surround”.
On the other hand, speech encoders such as the AMR-WB+ also have a high frequency extension stage and a stereo functionality.
Frequency-domain coding schemes are advantageous in that they show a high quality at low bitrates for music signals. Problematic, however, is the quality of speech signals at low bitrates.
Speech coding schemes show a high quality for speech signals even at low bitrates, but show a poor quality for other signals at low bitrates.