Most modern speech coders are based on some form of model for generation of the coded speech signal. The parameters and signals of the model are quantized and information describing them is transmitted on the channel. The dominant coder model in cellular telephony applications is the Code Excited Linear Prediction (CELP) technology.
A conventional CELP decoder is depicted in FIG. 1. The coded speech is generated by an excitation signal fed through an all-pole synthesis filter with a typical order of 10. The excitation signal is formed as a sum of two signals ca and cf, which are picked from respective codebooks (one fixed and one adaptive) and subsequently multiplied by suitable gain factors ga and gf. The codebook signals are typically of length 5 ms (a subframe) whereas the synthesis filter is typically updated every 20 ms (a frame). The parameters associated with the CELP model are the synthesis filter coefficients, the codebook entries and the gain factors.
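The decoder structure described above can be sketched in a few lines. This is a minimal illustration only, not the implementation of any particular coder: the function name `synthesize`, the filter-state handling, and the sign convention A(z) = 1 + sum of a[k]·z^-k are assumptions introduced here, and codebook lookup and parameter decoding are omitted.

```python
def synthesize(ca, cf, ga, gf, a, state=None):
    """Decode one subframe with an all-pole synthesis filter 1/A(z).

    ca, cf : adaptive and fixed code vectors (equal length)
    ga, gf : the corresponding gain factors
    a      : predictor coefficients a[1..p], assuming
             A(z) = 1 + sum_k a[k] z^-k (sign convention is an assumption)
    state  : last p output samples, most recent first (zeros if None)
    """
    p = len(a)
    mem = list(state) if state is not None else [0.0] * p
    out = []
    for n in range(len(ca)):
        # Excitation sample: gain-scaled sum of the two codebook signals.
        x = ga * ca[n] + gf * cf[n]
        # All-pole recursion: y[n] = x[n] - sum_k a[k] * y[n-k].
        y = x - sum(a[k] * mem[k] for k in range(p))
        out.append(y)
        mem = ([y] + mem)[:p]
    return out, mem
```

With a typical order of 10, `a` would hold ten coefficients that stay fixed for a frame, while `ca`, `cf`, `ga` and `gf` change every subframe. The returned `mem` would be passed back in as `state` for the next subframe.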
In FIG. 2, a conventional CELP encoder is depicted. A replica of the CELP decoder (FIG. 1) is used to generate candidate coded signals for each subframe. The coded signal is compared to the uncoded (digitized) signal at 21 and a weighted error signal is used to control the encoding process. The synthesis filter is determined using linear prediction (LP). This conventional encoding procedure is referred to as linear prediction analysis-by-synthesis (LPAS).
As understood from the description above, LPAS coders employ waveform matching in a weighted speech domain, i.e., the error signal is filtered with a weighting filter. This can be expressed as minimizing the following squared error criterion:

$$D_W = \| S_W - \mathit{CS}_W \|^2 = \| W \cdot S - W \cdot H \cdot (\mathit{ga} \cdot \mathit{ca} + \mathit{gf} \cdot \mathit{cf}) \|^2 \qquad \text{(Eq. 1)}$$
where S is the vector containing one subframe of uncoded speech samples, $S_W$ represents S multiplied by the weighting filter W, ca and cf are the code vectors from the adaptive and fixed codebooks respectively, W is a matrix performing the weighting filter operation, H is a matrix performing the synthesis filter operation, and $\mathit{CS}_W$ is the coded signal multiplied by the weighting filter W. Conventionally, the encoding operation for minimizing the criterion of Equation 1 is performed according to the following steps:
Step 1. Compute the synthesis filter by linear prediction and quantize the filter coefficients. The weighting filter is computed from the linear prediction filter coefficients.
Step 2. The code vector ca is found by searching the adaptive codebook to minimize $D_W$ of Equation 1 assuming that gf is zero and that ga is equal to the optimal value. Because each code vector ca has conventionally associated therewith an optimal value of ga, the search is done by inserting each code vector ca into Equation 1 along with its associated optimal ga value.
Step 3. The code vector cf is found by searching the fixed codebook to minimize $D_W$, using the code vector ca and gain ga found in step 2. The fixed gain gf is assumed equal to the optimal value.
Step 4. The gain factors ga and gf are quantized. Note that ga can be quantized after step 2 if scalar quantizers are used.
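Steps 2 and 3 above can be illustrated with the following sketch. It is a simplified illustration under stated assumptions, not the procedure of any specific coder. The combined operation W·H is represented by a hypothetical impulse response `h` applied as a zero-state convolution. The target `s_w` is assumed to already be in the weighted domain with any filter memory contribution removed. Quantization (steps 1 and 4) and any constraints on the gains are left out. All function names are introduced here for illustration.

```python
def filt(h, c):
    """Zero-state convolution of c with impulse response h, truncated to len(c)."""
    return [sum(h[k] * c[i - k] for k in range(min(i + 1, len(h))))
            for i in range(len(c))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def best_entry(target, codebook, h):
    """Return (index, optimal gain, filtered vector) minimizing ||target - g*y||^2.

    For a filtered code vector y, the optimal gain is g = <t,y>/<y,y> and the
    resulting error is ||t||^2 - <t,y>^2/<y,y>, so the search maximizes
    <t,y>^2/<y,y>.
    """
    best = None
    for i, c in enumerate(codebook):
        y = filt(h, c)
        eyy = dot(y, y)
        if eyy == 0.0:
            continue  # an all-zero filtered vector cannot reduce the error
        cross = dot(target, y)
        score = cross * cross / eyy
        if best is None or score > best[0]:
            best = (score, i, cross / eyy, y)
    if best is None:
        raise ValueError("codebook has no usable entries")
    return best[1], best[2], best[3]

def lpas_search(s_w, adaptive_cb, fixed_cb, h):
    # Step 2: adaptive codebook search with gf = 0 and optimal ga.
    ia, ga, ya = best_entry(s_w, adaptive_cb, h)
    # Step 3: fixed codebook search on what the adaptive contribution left over.
    target2 = [s - ga * y for s, y in zip(s_w, ya)]
    i_f, gf, yf = best_entry(target2, fixed_cb, h)
    # Remaining weighted error D_W for the chosen entries and gains.
    d_w = sum((t - gf * y) ** 2 for t, y in zip(target2, yf))
    return ia, ga, i_f, gf, d_w
```

Searching the two codebooks one after the other, rather than jointly, keeps the search cost proportional to the sum of the codebook sizes instead of their product. The price is that the selected pair is not guaranteed to be the joint minimizer of Equation 1.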
The waveform matching procedure described above is known to work well, at least for bit rates of about 8 kb/s or more. However, when the bit rate is lowered, the ability to do waveform matching of non-periodic, noise-like signals such as unvoiced speech and background noise suffers. For voiced speech segments, the waveform matching criterion still performs well, but the poor waveform matching ability for noise-like signals leads to a coded signal whose level is often too low and which has an annoying, fluctuating character (known as swirling).
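One way to see why the level tends to drop is sketched below for the simplified case of a single code vector with its optimal gain. The symbols t (weighted target), y (filtered code vector), g* (optimal gain) and ρ (normalized correlation) are introduced here for illustration and do not appear in the description above.

```latex
% Simplified single-vector case: minimize \| t - g\,y \|^2 over g.
g^{*} = \frac{t^{T} y}{y^{T} y},
\qquad
\| g^{*} y \|^{2} = \frac{(t^{T} y)^{2}}{y^{T} y} = \rho^{2}\, \| t \|^{2},
\qquad
\rho = \frac{t^{T} y}{\| t \| \, \| y \|}, \quad |\rho| \le 1 .
```

Under these assumptions the coded energy is the fraction ρ² of the target energy. When no code vector correlates well with a noise-like target, ρ² is small and the coded signal comes out at a lower level than the original.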
For noise-like signals, it is well known in the art that it is better to match the spectral character of the signal and have a good signal level (gain) matching. Since the linear prediction synthesis filter provides the spectral character of the signal, an alternative criterion to Equation 1 above can be used for noise-like signals:

$$D_E = \left( \sqrt{E_S} - \sqrt{E_{\mathit{CS}}} \right)^2 \qquad \text{(Eq. 2)}$$
where $E_S$ is the energy of the uncoded speech signal and $E_{\mathit{CS}}$ is the energy of the coded signal $\mathit{CS} = H \cdot (\mathit{ga} \cdot \mathit{ca} + \mathit{gf} \cdot \mathit{cf})$. Equation 2 implies energy matching as opposed to waveform matching in Equation 1. This criterion can also be used in the weighted speech domain by including the weighting filter W. Note that the square root operations are included in Equation 2 only to have a criterion in the same domain as Equation 1; this is not necessary and is not a restriction. There are also other possible energy-matching criteria such as $D_E = | E_S - E_{\mathit{CS}} |$.
The criterion can also be formulated in the residual domain as follows:

$$D_E = \left( \sqrt{E_r} - \sqrt{E_x} \right)^2 \qquad \text{(Eq. 3)}$$
where $E_r$ is the energy of the residual signal r obtained by filtering S through the inverse ($H^{-1}$) of the synthesis filter, and $E_x$ is the energy of the excitation signal given by $x = \mathit{ga} \cdot \mathit{ca} + \mathit{gf} \cdot \mathit{cf}$.
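The difference between the waveform matching and energy matching criteria can be made concrete with a small sketch. The function names are hypothetical and the weighting filter is omitted (unweighted domain assumed). The same `d_energy` function covers Equation 3 when it is called with the residual r and the excitation x instead of the speech and coded signals.

```python
import math

def energy(v):
    """Sum of squared samples of a signal vector."""
    return sum(x * x for x in v)

def d_waveform(s, cs):
    """Squared-error criterion in the style of Equation 1, without weighting."""
    return sum((a - b) ** 2 for a, b in zip(s, cs))

def d_energy(s, cs):
    """Energy-matching criterion in the style of Equations 2 and 3."""
    return (math.sqrt(energy(s)) - math.sqrt(energy(cs))) ** 2

def d_energy_abs(s, cs):
    """Alternative energy criterion |E_S - E_CS| mentioned in the text."""
    return abs(energy(s) - energy(cs))

# A sign-flipped copy has the same energy but a completely different waveform.
s = [1.0, -1.0, 1.0, -1.0]
flipped = [-x for x in s]
print(d_waveform(s, flipped), d_energy(s, flipped))  # prints "16.0 0.0"
```

The sign-flipped signal is the worst case for waveform matching yet a perfect match in energy. This is why an energy criterion can tolerate the poor waveform fit that is typical for noise-like signals.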
The different criteria above have been employed in conventional multi-mode coding where different coding modes (e.g., energy matching) have been used for unvoiced speech and background noise. In these modes, energy matching criteria as in Equations 2 and 3 have been used. A drawback with this approach is the need for mode decision, for example, choosing waveform matching mode (Equation 1) for voiced speech and choosing energy matching mode (Equations 2 or 3) for noise-like signals like unvoiced speech and background noise. The mode decision is sensitive and causes annoying artifacts when wrong. Also, the drastic change of coding strategy between modes can cause unwanted sounds.
It is therefore desirable to provide improved coding of noise-like signals at lowered bit rates without the aforementioned disadvantages of multi-mode coding.
The present invention advantageously combines waveform matching and energy matching criteria to improve the coding of noise-like signals at lowered bit rates without the disadvantages of multi-mode coding.