The CELP model is widely used to encode sound signals, for example speech, at low bit rates. In CELP, the sound signal is modelled as an excitation processed through a time-varying synthesis filter. Although the time-varying synthesis filter may take many forms, a linear recursive all-pole filter is often used. The inverse of this time-varying synthesis filter, which is thus a linear all-zero non-recursive filter, is called the “Short-Term Prediction” (STP) filter since it comprises coefficients calculated in such a manner as to minimize a prediction error between a sample s[i] of the sound signal and a weighted sum of previous samples s[i-1], s[i-2], . . . , s[i-m] of the sound signal, where m is the order of the filter. Another denomination frequently used for the STP filter is “Linear Prediction” (LP) filter.
If a residual of the prediction error from the LP filter is applied as the input of the time-varying synthesis filter with proper initial state, the output of the synthesis filter is the original sound signal, such as speech. At low bit rates, it is not possible to transmit an exact prediction error residual. Accordingly, the prediction error residual is encoded to form an approximation referred to as the excitation. In traditional CELP coders, the excitation is encoded as the sum of two contributions; the first contribution is produced from a so-called adaptive codebook and the second contribution is produced from a so-called innovation or fixed codebook. The adaptive codebook is essentially a block of samples from the past excitation with proper gain. The innovation or fixed codebook is populated with codevectors having the task of encoding the prediction error residual from the LP filter and adaptive codebook.
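By way of illustration only, the relationship between the LP filter and the synthesis filter can be sketched as follows. The sketch assumes a fixed (not time-varying) set of predictor coefficients, a zero initial filter state, and a convention in which the prediction is the plain weighted sum of previous samples; the function names lp_residual and lp_synthesis are hypothetical and are not taken from any particular coder.

```python
def lp_residual(signal, coeffs):
    """All-zero LP filter: e[i] = s[i] - sum_k coeffs[k-1] * s[i-k], k = 1..m.

    Samples before the start of `signal` are assumed to be zero.
    """
    m = len(coeffs)
    residual = []
    for i, sample in enumerate(signal):
        prediction = sum(
            coeffs[k] * signal[i - 1 - k] for k in range(m) if i - 1 - k >= 0
        )
        residual.append(sample - prediction)
    return residual


def lp_synthesis(excitation, coeffs):
    """All-pole synthesis filter: y[i] = u[i] + sum_k coeffs[k-1] * y[i-k].

    This is the inverse of lp_residual under the same zero initial state.
    """
    m = len(coeffs)
    output = []
    for i, u in enumerate(excitation):
        prediction = sum(
            coeffs[k] * output[i - 1 - k] for k in range(m) if i - 1 - k >= 0
        )
        output.append(u + prediction)
    return output


# Illustrative values only (order m = 2).
signal = [1.0, 2.0, 0.5, -1.0, 3.0]
coeffs = [0.5, -0.25]
residual = lp_residual(signal, coeffs)
reconstructed = lp_synthesis(residual, coeffs)
```

Feeding the exact residual through the synthesis filter returns the original samples, up to floating-point rounding. A CELP coder replaces that exact residual with the encoded excitation, so the output of the synthesis filter becomes an approximation of the sound signal.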
The innovation or fixed codebook can be designed using many structures and constraints. However, in modern speech coding systems, the Algebraic Code-Excited Linear Prediction (ACELP) model is often used. ACELP is well known to those of ordinary skill in the art of speech coding and, accordingly, will not be described in detail in the present specification. In summary, the codevectors in an ACELP innovation codebook each contain few non-zero pulses which can be seen as belonging to different interleaved tracks of pulse positions. The number of tracks and non-zero pulses per track usually depend on the bit rate of the ACELP innovation codebook. The task of an ACELP coder is to search the pulse positions and signs to minimize an error criterion. In ACELP, this search is performed using an analysis-by-synthesis procedure in which the error criterion is calculated not in the excitation domain but rather in the synthesis domain, i.e. after a given ACELP codevector has been filtered through the time-varying synthesis filter. Efficient ACELP search algorithms have been proposed to allow fast search even with very large ACELP innovation codebooks.
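As a minimal sketch of the interleaved-track structure summarized above, the following example builds one sparse codevector from a list of signed pulses. The layout of 64 samples per subframe split into 4 interleaved tracks is an assumption chosen for illustration, and the names track_positions and build_codevector are hypothetical. The search for the best pulse positions and signs is not shown.

```python
SUBFRAME_LEN = 64   # assumed subframe length, for illustration only
NUM_TRACKS = 4      # assumed number of interleaved tracks


def track_positions(track):
    """Sample positions belonging to one track: track, track + 4, track + 8, ..."""
    return list(range(track, SUBFRAME_LEN, NUM_TRACKS))


def build_codevector(pulses):
    """Build a codevector from (track, index_in_track, sign) triples.

    `sign` is +1 or -1. Pulses that land on the same position are summed.
    """
    code = [0] * SUBFRAME_LEN
    for track, index_in_track, sign in pulses:
        position = track_positions(track)[index_in_track]
        code[position] += sign
    return code


# One pulse per track, illustrative values only.
code = build_codevector([(0, 0, +1), (1, 2, -1), (2, 5, +1), (3, 15, -1)])
```

With this layout, each track holds 16 candidate positions, so a pulse on a track can be described by a 4-bit position index plus a sign bit. Adding more pulses per track increases the bit rate of the codebook.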
FIG. 1 is a schematic block diagram showing the main components and the principle of operation of an ACELP decoder 100. Referring to FIG. 1, the ACELP decoder 100 receives decoded pitch parameters 101 and decoded ACELP parameters 102. The decoded pitch parameters 101 include a pitch delay applied to the adaptive codebook 103 to produce an adaptive codevector. As indicated hereinabove, the adaptive codebook 103 is essentially a block of samples from the past excitation and the adaptive codevector is found by interpolating the past excitation at the pitch delay using an equation including the past excitation. The decoded pitch parameters also include a pitch gain applied to the adaptive codevector from the adaptive codebook 103 using an amplifier 112 to form the first, adaptive codebook contribution 113. The adaptive codebook 103 and the amplifier 112 form an adaptive codebook structure. The decoded ACELP parameters comprise ACELP innovation-codebook parameters including a codebook index applied to the innovation codebook 104 to output a corresponding innovation codevector. The decoded ACELP parameters also comprise an innovation codebook gain applied to the innovation codevector from the codebook 104 by means of an amplifier 105 to form the second, innovation codebook contribution 114. The innovation codebook 104 and the amplifier 105 form an innovation codebook structure 110. The total excitation 115 is then formed through summation in an adder 106 of the first, adaptive codebook contribution 113 and the second, innovation codebook contribution 114. The total excitation 115 is then processed through an LP synthesis filter 107 to produce a synthesis 111 of the original sound signal, for example speech. The memory of the adaptive codebook 103 is updated for a next frame using the excitation of the current frame (arrow 108); the adaptive codebook 103 then shifts to processing the decoded pitch parameters of the next subframe (arrow 109).
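The signal flow of FIG. 1 can be sketched, under simplifying assumptions, as follows. The sketch uses an integer pitch delay (the interpolation of the past excitation is replaced by a plain copy), processes one block of samples at a time, and uses a predictor-style coefficient convention for the LP synthesis filter. All function and parameter names are hypothetical; the reference numerals in the comments refer to FIG. 1.

```python
def adaptive_codevector(past_excitation, pitch_delay, length):
    """Copy past excitation starting `pitch_delay` samples back (codebook 103).

    Integer-delay simplification. If the delay is shorter than `length`,
    samples just produced are reused. Requires
    1 <= pitch_delay <= len(past_excitation).
    """
    buffer = list(past_excitation)
    start = len(buffer)
    for n in range(length):
        buffer.append(buffer[start + n - pitch_delay])
    return buffer[start:]


def decode_block(past_excitation, pitch_delay, pitch_gain,
                 innovation, innovation_gain, lp_coeffs, synth_memory):
    """Decode one block; returns (synthesis, new_past_excitation, new_memory).

    `synth_memory` holds the last len(lp_coeffs) output samples, newest last.
    """
    adaptive = adaptive_codevector(past_excitation, pitch_delay, len(innovation))
    # Amplifiers 112 and 105, then adder 106: total excitation 115.
    excitation = [pitch_gain * a + innovation_gain * c
                  for a, c in zip(adaptive, innovation)]
    # LP synthesis filter 107: y[n] = u[n] + sum_k lp_coeffs[k-1] * y[n-k].
    history = list(synth_memory)
    synthesis = []
    for u in excitation:
        y = u + sum(a * history[-1 - k] for k, a in enumerate(lp_coeffs))
        history.append(y)
        synthesis.append(y)
    # Arrow 108: update the adaptive codebook memory with the new excitation.
    keep = len(past_excitation)
    new_past = (list(past_excitation) + excitation)[-keep:]
    new_memory = history[len(history) - len(lp_coeffs):]
    return synthesis, new_past, new_memory


# Illustrative values only.
synthesis, new_past, new_memory = decode_block(
    past_excitation=[0, 0, 0, 1, 0, 0, 0, 0], pitch_delay=5, pitch_gain=0.5,
    innovation=[0, 1, 0, 0], innovation_gain=2.0,
    lp_coeffs=[0.5], synth_memory=[0.0])
```

Returning the updated excitation buffer and filter memory lets the caller pass them into the next call, which mirrors the update of the adaptive codebook memory before the next set of decoded pitch parameters is processed.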
Several modifications can be made to the basic CELP model previously described. For example, the excitation signal at the input of the synthesis filter can be processed to enhance the signal. Also, postprocessing can be applied at the output of the synthesis filter. Further, the gains of the adaptive and algebraic codebooks can be jointly quantized.
Although very efficient at encoding speech at low bit rates, ACELP codebooks may not gain in quality as quickly as other approaches such as transform coding and vector quantization when the ACELP codebook size is increased. When measured in dB/bit/sample, the gain at higher bit rates (e.g. bit rates higher than 16 kbit/s) obtained by using more non-zero pulses per track in an ACELP innovation codebook is not as large as the gain (in dB/bit/sample) of transform coding and vector quantization. This can be seen when considering that ACELP essentially encodes the sound signal as a sum of delayed and scaled impulse responses of the synthesis filter. At lower bit rates (e.g. bit rates lower than 12 kbit/s), the ACELP technique quickly captures the essential components of the excitation. At higher bit rates, however, higher granularity and, in particular, better control over how the additional bits are spent across the different frequency components of the signal are useful.
Therefore, there is a need for an innovation codebook structure better adapted for use at higher bit rates.