A general diagram of a CELP encoder 100 is shown in FIG. 1A. A CELP encoder uses a model of the human vocal tract to reproduce a speech input signal. The parameters for the model are actually extracted from the speech signal being reproduced, and it is these parameters that are sent to a decoder 114, which is illustrated in FIG. 1B. Decoder 114 uses the parameters to reproduce the speech signal. Referring to FIG. 1A, synthesis filter 104 is a linear predictive filter and serves as the vocal tract model for CELP encoder 100. Synthesis filter 114 takes an input excitation signal μ(n) and synthesizes a speech signal s′(n) by modeling the correlations introduced into speech by the vocal tract and applying them to the excitation signal μ(n).
In CELP encoder 100 speech is broken up into frames, usually 20 ms each, and parameters for synthesis filter 104 are determined for each frame. Once the parameters are determined, an excitation signal μ(n) is chosen for that frame. The excitation signal is then synthesized, producing a synthesized speech signal s′(n). The synthesized frame s′(n) is then compared to the actual speech input frame s(n) and a difference or error signal e(n) is generated by subtractor 106. The subtraction function is typically accomplished via an adder or similar functional component as those skilled in the art will be aware. Actually, excitation signal μ(n) is generated from a predetermined set of possible signals by excitation generator 102. In CELP encoder 100, all possible signals in the predetermined set are tried in order to find the one that produces the smallest error signal e(n). Once this particular excitation signal μ(n) is found, the signal and the corresponding filter parameters are sent to decoder 112, which reproduces the synthesized speech signal s′(n). Signal s′(n) is reproduced in decoder 112 using an excitation signal μ(n), as generated by decoder excitation generator 114, and synthesizing it using decoder synthesis filter 116.
By choosing the excitation signal that produces the smallest error signal e(n), a very good approximation of speech input s(n) can be reproduced in decoder 112. The spectrum of error signal e(n), however, will be very flat, as illustrated by curve 204 in FIG. 2. The flatness can create problems in that the signal-to-noise ratio (SNR), with regard to synthesized speech signal s′(n) (curve 202), may become too small for effective reproduction of speech signal s(n). This problem is especially prevalent in the higher frequencies where, as illustrated in FIG. 2, there is typically less energy in the spectrum of s′(n). In order to combat this problem, CELP encoder 100 includes a feedback path that incorporates error weighting filter 108. The function of error weighting filter 108 is to shape the spectrum of error signal e(n) so that the noise spectrum is concentrated in areas of high voice content. In effect, the shape of the noise spectrum associated with the weighted error signal ew(n) tracks the spectrum of the synthesized speech signal s(n), as illustrated in FIG. 2 by curve 206. In this manner, the SNR is improved and the quality of the reproduced speech is increased.
The weighted error signal ew(n) is also used to minimize the error signal by controlling the generation of excitation signal μ(n). In fact, signal ew(n) actually controls the selection of signal μ(n) and the gain associated with signal μ(n). In general, it is desirable that the energy associated with s′(n) be as stable or constant as possible. Energy stability is controlled by the gain associated with μ(n) and requires a less aggressive weighting filter 108. At the same time, however, it is desirable that the excitation spectrum (curve 202) of signal s′(n) be as flat as possible. Maintaining this flatness requires an aggressive weighting filter 108. These two requirements are directly at odds with each other, because the generation of excitation signal μ(n) is controlled by one weighting filter 108. Therefore, a trade-off must be made that results in lower performance with regard to one aspect or the other.