The present invention relates to speech processing in general, and more particularly to a speech encoding method and system based on code excited linear prediction (CELP).
FIG. 6 shows the conventional model for human speech production. The vocal cords are modeled by an impulse generator that produces an impulse train 602, which models the voiced excitation component of speech. A noise generator produces white noise 604, which models the unvoiced excitation component of speech. In practice, all sounds have a mixed excitation, meaning that the excitation consists of both voiced and unvoiced portions. This mixing is represented by a switch 608 that selects between voiced and unvoiced excitation. An LPC filter 610 models the vocal tract, in which the speech is formed as air is forced through it by the vocal cords. The LPC filter is a recursive digital filter whose resonance behavior (frequency response) is defined by a set of filter coefficients. The computation of the coefficients is based on a mathematical optimization procedure referred to as linear predictive coding, hence “LPC filter.”
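The source-filter model described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the function names, the pitch period of 80 samples, the frame length of 160 samples, the single filter coefficient 0.9, and the sign convention of the recursion are all assumptions chosen only for illustration.

```python
import random

def excitation(n, voiced, pitch_period=80, seed=0):
    """Return n excitation samples (hypothetical helper).

    voiced=True  -> impulse train, one unit impulse every pitch_period samples
    voiced=False -> white Gaussian noise
    The boolean plays the role of the voiced/unvoiced switch.
    """
    if voiced:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def lpc_synthesis(exc, coeffs, gain=1.0):
    """Recursive (all-pole) filter: s[n] = gain*e[n] + sum_k a[k]*s[n-k].

    The sign convention for a[k] is an assumption; some texts use the
    opposite sign.
    """
    out = []
    for n, e in enumerate(exc):
        s = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out

# One 160-sample frame of each excitation type through a toy one-pole filter.
voiced_frame = lpc_synthesis(excitation(160, voiced=True), [0.9])
unvoiced_frame = lpc_synthesis(excitation(160, voiced=False), [0.9])
```

With a single coefficient, each impulse in the voiced frame produces a decaying exponential, which is the simplest possible resonance. A practical LPC filter would typically use on the order of ten coefficients derived from the speech signal.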
Code-excited linear prediction (CELP) is a speech coding technique commonly used for producing high quality synthesized speech at low bit rates, i.e., 4.8 to 9.6 kilobits per second (kbps). This class of speech coding, also known as vector-excited linear prediction, utilizes a codebook of excitation vectors to excite the LPC filter 610 in a feedback loop to determine the best coefficients for modeling a sample of speech. A difficulty of the CELP speech coding technique lies in the high computational cost of performing an exhaustive search of all the excitation code vectors in the codebook. The codebook search consumes roughly 60% of the total processing time of a speech codec (compression encoder-decoder).
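To make the cost of the exhaustive search concrete, the following is a minimal sketch of a generic analysis-by-synthesis codebook search. It is an illustration under stated assumptions, not the method of any particular codec: the function names, the codebook size of 64, the vector length of 40, and the one-coefficient filter are hypothetical, and perceptual weighting and the adaptive (pitch) codebook are omitted.

```python
import random

def synthesize(exc, coeffs):
    """All-pole synthesis filter: s[n] = e[n] + sum_k a[k]*s[n-k] (assumed sign)."""
    out = []
    for n, e in enumerate(exc):
        s = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def exhaustive_search(target, codebook, coeffs):
    """Try every code vector; return (index, gain, squared error) of the best.

    For each candidate, the vector is filtered, the gain that minimizes the
    squared error against the target is computed in closed form, and the
    resulting error is compared with the best found so far.
    """
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    target_energy = dot(target, target)
    for index, vector in enumerate(codebook):
        y = synthesize(vector, coeffs)   # one full filtering pass per vector
        energy = dot(y, y)
        if energy == 0.0:
            continue                     # an all-zero output cannot match
        corr = dot(target, y)
        gain = corr / energy             # optimal gain for this vector
        error = target_energy - corr * corr / energy
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain, best_error

# Toy data: 64 random code vectors of 40 samples each (assumed sizes).
rng = random.Random(1)
coeffs = [0.9]
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
# Build a target that is exactly code vector 17 scaled by 2 and filtered.
target = [2.0 * s for s in synthesize(codebook[17], coeffs)]
index, gain, error = exhaustive_search(target, codebook, coeffs)
```

Every code vector requires a filtering pass plus two inner products, so the work grows linearly with the codebook size and is repeated for every subframe of speech. That repetition is why the search dominates the processing time.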
The ability to reduce computational complexity without sacrificing voice quality is important in the digital communications environment. Thus, a need exists for improved CELP processing.