1. Field of the Invention
The present invention relates to the coding of speech signals. Specifically, the present invention relates to classifying speech signals and employing one of a plurality of coding modes based on the classification.
2. Description of the Related Art
Many communication systems today transmit voice as a digital signal, particularly in long-distance and digital radio telephone applications. The performance of these systems depends, in part, on accurately representing the voice signal with a minimum number of bits. Transmitting speech simply by sampling and digitizing requires a data rate on the order of 64 kilobits per second (kbps) to achieve the speech quality of a conventional analog telephone. However, coding techniques are available that significantly reduce the data rate required for satisfactory speech reproduction.
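By way of illustration only, the 64 kbps figure can be reproduced with simple arithmetic. The sketch below assumes the conventional telephone-quality parameters of 8 kHz sampling and 8 bits per sample, which are not stated in this description, and compares the result with a 4.8 kbps coder of the kind cited in this section. The function names are hypothetical.

```python
# Assumed parameters (not stated in the text): 8 kHz sampling, 8 bits per sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8


def pcm_bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed speech: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(uncoded_bps, coded_bps):
    """How many times smaller the coded stream is than the uncoded stream."""
    return uncoded_bps / coded_bps


pcm_bps = pcm_bit_rate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(pcm_bps)                           # 64000 bits per second, i.e. 64 kbps
print(compression_ratio(pcm_bps, 4800))  # ratio relative to a 4.8 kbps coder
```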
The term “vocoder” typically refers to devices that compress voiced speech by extracting parameters based on a model of human speech generation. Vocoders include an encoder and a decoder. The encoder analyzes the incoming speech and extracts the relevant parameters. The decoder synthesizes the speech using the parameters that it receives from the encoder via a transmission channel. The speech signal is often divided into frames of data and block processed by the vocoder.
Vocoders based on linear-prediction time domain coding schemes far outnumber all other types of coders. These techniques extract correlated elements from the speech signal and encode only the uncorrelated elements. The basic linear predictive filter predicts the current sample as a linear combination of past samples. An example of a coding algorithm of this class is described in the paper “A 4.8 kbps Code Excited Linear Predictive Coder,” by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
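The prediction of the current sample from past samples can be sketched as follows. This is a minimal illustration, not the method of the cited paper: it assumes the autocorrelation method with the Levinson-Durbin recursion, a second-order predictor, and a synthetic sinusoidal test frame, and all function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    The prediction is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def prediction_residual(x, coeffs):
    """e[n] = x[n] - sum_k a[k] * x[n - k], with zeros before the frame."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out


# Synthetic, strongly correlated test frame (an assumption for the demo).
frame = [math.cos(0.3 * n) for n in range(400)]
r = autocorrelation(frame, 2)
coeffs, err = levinson_durbin(r, 2)
residual = prediction_residual(frame, coeffs)
ratio = sum(e * e for e in residual) / sum(s * s for s in frame)
print(coeffs, ratio)  # residual energy is a small fraction of the signal energy
```

The residual carries far less energy than the input frame, which is the sense in which only the uncorrelated elements remain to be encoded.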
These coding schemes compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies (i.e., correlated elements) inherent in speech. Speech typically exhibits short-term redundancies resulting from the mechanical action of the lips and tongue, and long-term redundancies resulting from the vibration of the vocal cords. Linear predictive schemes model these operations as filters, remove the redundancies, and then model the resulting residual signal as white Gaussian noise. Linear predictive coders therefore achieve a reduced bit rate by transmitting filter coefficients and quantized noise rather than a full-bandwidth speech signal.
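The removal of the long-term redundancy can be illustrated with a single-tap long-term predictor. The sketch below is an assumption for illustration, not a scheme described above: it searches a range of candidate lags for the one with the highest normalized correlation, then subtracts the gain-scaled delayed signal. The lag range, the synthetic periodic frame, and all function names are hypothetical.

```python
import math


def normalized_correlation(x, lag, start):
    """Correlation between x[n] and x[n - lag] over n = start..len(x)-1."""
    idx = range(start, len(x))
    num = sum(x[n] * x[n - lag] for n in idx)
    e_cur = sum(x[n] ** 2 for n in idx)
    e_del = sum(x[n - lag] ** 2 for n in idx)
    if e_cur == 0.0 or e_del == 0.0:
        return 0.0
    return num / math.sqrt(e_cur * e_del)


def find_pitch_lag(x, min_lag, max_lag):
    """Candidate lag in [min_lag, max_lag] with the highest normalized correlation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_correlation(x, lag, max_lag))


def long_term_gain(x, lag):
    """Least-squares gain b minimizing sum of (x[n] - b * x[n - lag])^2."""
    idx = range(lag, len(x))
    den = sum(x[n - lag] ** 2 for n in idx)
    return sum(x[n] * x[n - lag] for n in idx) / den if den else 0.0


def long_term_residual(x, lag, gain):
    """Residual after subtracting the delayed, scaled signal (n >= lag only)."""
    return [x[n] - gain * x[n - lag] for n in range(lag, len(x))]


# Synthetic frame with a 40-sample period (an assumption for the demo).
frame = [0.9 ** (n % 40) for n in range(240)]
lag = find_pitch_lag(frame, 20, 60)
gain = long_term_gain(frame, lag)
residual = long_term_residual(frame, lag, gain)
print(lag, gain, sum(e * e for e in residual))
```

For a perfectly periodic input the residual is essentially zero; real speech is only approximately periodic, so a nonzero residual remains to be modeled and quantized.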
However, even these reduced bit rates often exceed the available bandwidth where the speech signal must either propagate a long distance (e.g., ground to satellite) or coexist with many other signals in a crowded channel. A need therefore exists for an improved coding scheme which achieves a lower bit rate than linear predictive schemes.