Embodiments of the present invention refer to an encoder for encoding an audio signal to obtain a data stream and to a decoder for decoding a data stream to obtain an audio signal. Further embodiments refer to the corresponding methods for encoding an audio signal and for decoding a data stream. A further embodiment refers to a computer program performing the steps of the methods for encoding and/or decoding.
The audio signal to be encoded may, for example, be a speech signal; i.e. the encoder corresponds to a speech encoder and the decoder corresponds to a speech decoder. The most frequently used paradigm in speech coding is algebraic code excited linear prediction (ACELP), which is used in standards such as the AMR family, G.718 and MPEG USAC. It is based on modeling speech using a source model consisting of a linear predictor (LP) to model the spectral envelope, a long-term predictor (LTP) to model the fundamental frequency and an algebraic codebook for the residual. The codebook parameters are optimized in a perceptually weighted synthesis domain. The perceptual model is based on a weighting filter, whereby the mapping from the residual to the weighted output is described by a combination of the linear predictor and the weighting filter.
The largest portion of the computational complexity in ACELP codecs is spent on choosing the algebraic codebook entry, that is, on quantization of the residual. The mapping from the residual domain to the weighted synthesis domain is essentially a multiplication by a matrix of size N×N, wherein N is the vector length. Due to this mapping, in terms of weighted output SNR (signal-to-noise ratio), the residual samples are correlated and cannot be quantized independently. It follows that every potential codebook vector has to be evaluated explicitly in the weighted synthesis domain to determine the best entry. This approach is known as the analysis-by-synthesis algorithm. Optimal performance is possible only with a brute-force search of the codebook. The codebook size depends on the bit-rate; given a bit-rate of B, there are 2^B entries to evaluate for a total complexity of O(2^B N^2), which is clearly unrealistic when B is greater than or equal to 11. In practice, codecs therefore employ non-optimal quantizations that balance complexity against quality. Several iterative algorithms for finding the best quantization, which limit complexity at the cost of accuracy, have been presented. To overcome this limitation, a new approach is needed.
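The cost of the brute-force analysis-by-synthesis search described above may be illustrated by the following minimal sketch. It is not taken from any of the cited standards. The function names (convolution_matrix, brute_force_search), the toy impulse response, the fixed unit gain and the sign-only codebook with N = B are assumptions chosen for illustration only; a practical ACELP codec uses sparse pulse codebooks and also optimizes a gain.

```python
from itertools import product


def convolution_matrix(h, n):
    """Build the N x N lower-triangular matrix with H[i][j] = h[i - j].

    h is a (hypothetical) impulse response of the combined linear
    predictor and weighting filter, truncated to the vector length n.
    """
    return [
        [h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matvec(H, v):
    """Map a residual-domain vector to the weighted synthesis domain."""
    return [sum(a * b for a, b in zip(row, v)) for row in H]


def brute_force_search(H, target, codebook):
    """Evaluate every codebook vector explicitly in the weighted domain.

    Each candidate costs about N^2 multiply-adds for the matrix product,
    so a codebook of 2^B entries costs on the order of 2^B * N^2.
    Assumption: the gain is fixed to 1 to keep the sketch short.
    """
    best_index, best_error = None, float("inf")
    for index, candidate in enumerate(codebook):
        synthesized = matvec(H, candidate)
        error = sum((t - y) ** 2 for t, y in zip(target, synthesized))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Toy usage: N = 4 samples, B = 4 bits, one sign bit per sample.
N = 4
H = convolution_matrix([1.0, 0.5, 0.25], N)
codebook = list(product((-1.0, 1.0), repeat=N))  # 2^B = 16 entries
target = matvec(H, codebook[5])  # a target that the codebook can match
index, error = brute_force_search(H, target, codebook)
```

Because every entry is visited, the loop count doubles with each additional bit, which is why the exhaustive search becomes impractical at the bit-rates mentioned above.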