The Code-Excited Linear Prediction (CELP) model is widely used to encode sound signals, for example speech, at low bit rates.
In CELP coding, the speech signal is sampled and processed in successive blocks of a predetermined number of samples usually called frames, each corresponding typically to 10-30 ms of speech. The frames are in turn divided into smaller blocks called sub-frames.
In CELP, the signal is modelled as an excitation processed through a time-varying synthesis filter 1/A(z). The time-varying synthesis filter may take many forms, but very often a linear recursive all-pole filter is used. The inverse of the time-varying synthesis filter is thus a linear all-zero non-recursive filter A(z). It is defined as a short-term predictor (STP) because its coefficients are calculated so as to minimize a prediction error between a sample s(n) of the input sound signal and a weighted sum of the previous samples s(n−1), s(n−2), ..., s(n−m), where m is the order of the filter and n is a discrete time-domain index, n=0, ..., L−1, L being the length of an analysis window. Another name frequently used for the STP is Linear Predictor (LP).
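By way of a non-limiting illustration, the calculation of the LP coefficients may be sketched as follows. The sketch assumes the autocorrelation method solved with the Levinson-Durbin recursion, and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[m]z^-m, so that the prediction error is e(n) = s(n) + a[1]s(n−1) + ... + a[m]s(n−m). Neither the method nor the sign convention is specified in the text above, and the function names are hypothetical.

```python
def autocorrelation(s, m):
    """Return r[0..m], the autocorrelation of the windowed signal s at lags 0..m."""
    L = len(s)  # length of the analysis window
    return [sum(s[n] * s[n - k] for n in range(k, L)) for k in range(m + 1)]


def levinson_durbin(r, m):
    """Compute LP coefficients of order m from autocorrelations r[0..m].

    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[m] z^-m.
    Returns (a, err), where a[0] == 1.0 and err is the energy of the
    minimized prediction error.
    """
    a = [1.0] + [0.0] * m
    err = r[0]
    for i in range(1, m + 1):
        # Correlation between the current predictor error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at each stage
    return a, err
```

For a signal that follows s(n) = 0.9·s(n−1), a first-order analysis with this sketch yields a[1] close to −0.9, which is the weight that cancels the predictable part of each sample.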
If the prediction error from the LP filter is applied as the input of the time-varying synthesis filter with the proper initial state, the output of the synthesis filter is the original sound signal, for example speech. At low bit rates, it is not possible to transmit the exact error residual (the minimized prediction error from the LP filter). Accordingly, the error residual is encoded to form an approximation referred to as the excitation. In CELP coders, the excitation is encoded as the sum of two contributions, the first taken from a so-called adaptive codebook and the second from a so-called innovative or fixed codebook. The adaptive codebook is essentially a block of samples v(n) from the past excitation signal (delayed by a delay parameter t) and scaled with a proper gain gp. The innovative or fixed codebook is populated with vectors whose task is to encode the prediction residual left by the STP and the adaptive codebook. The innovative or fixed codebook vector c(n) is also scaled with a proper gain gc. The innovative or fixed codebook can be designed using many structures and constraints; in modern speech coding systems, however, the Algebraic Code-Excited Linear Prediction (ACELP) model is commonly used. An example of an ACELP implementation is described in [3GPP TS 26.190 “Adaptive Multi-Rate-Wideband (AMR-WB) speech codec; Transcoding functions”] and, accordingly, ACELP will only be briefly described in the present disclosure. Also, the full content of this reference is herein incorporated by reference.
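As a non-limiting illustration of the two points above, the following sketch first passes a signal through A(z) and then through 1/A(z), both with zero initial state, to show that the exact residual reproduces the signal. It then forms an excitation as gp·v(n) + gc·c(n). The sign convention A(z) = 1 + a[1]z^-1 + ... + a[m]z^-m, the integer delay t, the repetition rule used when t is shorter than the sub-frame, and all function names are assumptions made for the example only.

```python
def analysis_filter(s, a):
    """Residual e(n) = s(n) + sum_i a[i] * s(n - i), zero initial state."""
    m = len(a) - 1
    return [s[n] + sum(a[i] * s[n - i] for i in range(1, m + 1) if n - i >= 0)
            for n in range(len(s))]


def synthesis_filter(u, a):
    """Output y(n) = u(n) - sum_i a[i] * y(n - i), i.e. u filtered by 1/A(z)."""
    m = len(a) - 1
    y = []
    for n in range(len(u)):
        y.append(u[n] - sum(a[i] * y[n - i] for i in range(1, m + 1) if n - i >= 0))
    return y


def adaptive_codebook_vector(past_exc, t, n_sub):
    """v(n): the past excitation delayed by t samples (1 <= t <= len(past_exc)).

    When t < n_sub, the part of v already built is repeated
    (a simplification assumed for this sketch).
    """
    v = []
    for n in range(n_sub):
        idx = n - t  # negative: sample from the past; otherwise: sample of v itself
        v.append(past_exc[idx] if idx < 0 else v[idx])
    return v


def total_excitation(v, c, gp, gc):
    """Excitation as the sum of the scaled adaptive and fixed contributions."""
    return [gp * vn + gc * cn for vn, cn in zip(v, c)]
```

In an actual coder the residual is not available at the decoder, so `synthesis_filter` would be driven by the output of `total_excitation` rather than by the output of `analysis_filter`.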
Although very efficient for encoding speech at low bit rates, ACELP codebooks cannot gain in quality as quickly as other approaches (for example transform coding and vector quantization) when the ACELP codebook size is increased. When measured in dB/bit/sample, the gain in quality at higher bit rates (for example bit rates higher than 16 kbit/s) obtained by using more non-zero pulses per track in an ACELP codebook is not as large as the gain in quality obtained at those bit rates with transform coding and vector quantization. This can be seen by considering that ACELP essentially encodes the sound signal as a sum of delayed and scaled impulse responses of the time-varying synthesis filter. At lower bit rates (for example bit rates lower than 12 kbit/s), the ACELP model quickly captures the essential components of the excitation. At higher bit rates, however, higher granularity and, in particular, better control over how the additional bits are spent across the different frequency components of the signal are useful.
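The statement that ACELP encodes the signal as a sum of delayed and scaled impulse responses may be illustrated by the following non-limiting sketch. It assumes a code vector made of a few signed unit pulses, a zero initial filter state, and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[m]z^-m. The pulse positions, the filter coefficients used to exercise it, and the function names are hypothetical.

```python
def synthesis_filter(u, a):
    """Output y(n) = u(n) - sum_i a[i] * y(n - i), i.e. u filtered by 1/A(z)."""
    m = len(a) - 1
    y = []
    for n in range(len(u)):
        y.append(u[n] - sum(a[i] * y[n - i] for i in range(1, m + 1) if n - i >= 0))
    return y


def impulse_response(a, n_sub):
    """h(n): response of 1/A(z) to a unit impulse, truncated to n_sub samples."""
    return synthesis_filter([1.0] + [0.0] * (n_sub - 1), a)


def code_vector(pulses, n_sub):
    """Sparse code vector c(n) built from (position, sign) pairs."""
    c = [0.0] * n_sub
    for pos, sign in pulses:
        c[pos] += sign
    return c


def sum_of_impulse_responses(pulses, h, n_sub):
    """Add one delayed, sign-scaled copy of h per pulse."""
    y = [0.0] * n_sub
    for pos, sign in pulses:
        for n in range(pos, n_sub):
            y[n] += sign * h[n - pos]
    return y
```

Filtering `code_vector(pulses, n_sub)` through `synthesis_filter` gives the same samples as `sum_of_impulse_responses`, up to rounding. Each added pulse therefore contributes one more copy of the same impulse response, which spreads across all frequencies according to the filter rather than being directed at a chosen frequency component.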