I. Field of the Invention
The present invention relates to the coding of speech signals. Specifically, the present invention relates to classifying speech signals and employing one of a plurality of coding modes based on the classification.
II. Description of the Related Art
Many communication systems today transmit voice as a digital signal, particularly in long distance and digital radio telephone applications. The performance of these systems depends, in part, on accurately representing the voice signal with a minimum number of bits. Transmitting speech simply by sampling and digitizing requires a data rate on the order of 64 kilobits per second (kbps) to achieve the speech quality of a conventional analog telephone. However, coding techniques are available that significantly reduce the data rate required for satisfactory speech reproduction.
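The 64 kbps figure above can be checked with a short calculation. The sketch below assumes the conventional telephony values of 8,000 samples per second and 8 bits per sample; neither value is stated in the text, and the function name is illustrative only.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate, in bits per second, of uncompressed single-channel sampled speech."""
    return sample_rate_hz * bits_per_sample


# Assumed values (not given in the text): 8 kHz sampling, 8 bits per sample.
rate_bps = pcm_bit_rate(8000, 8)
print(rate_bps / 1000, "kbps")
```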
The term "vocoder" typically refers to devices that compress voiced speech by extracting parameters based on a model of human speech generation. Vocoders include an encoder and a decoder. The encoder analyzes the incoming speech and extracts the relevant parameters. The decoder synthesizes the speech using the parameters that it receives from the encoder via a transmission channel. The speech signal is often divided into frames of data and block processed by the vocoder.
Vocoders built around linear-prediction-based time domain coding schemes far outnumber all other types of coders. These techniques extract correlated elements from the speech signal and encode only the uncorrelated elements. The basic linear predictive filter predicts the current sample as a linear combination of past samples. An example of a coding algorithm of this particular class is described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder," by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
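The prediction step described above can be illustrated with a minimal sketch. It computes the prediction error e[n] = s[n] - (a[0]·s[n-1] + a[1]·s[n-2] + ...), which is the uncorrelated part left to encode. The function name, the coefficient value, and the test signal are assumptions chosen for illustration; the sketch does not show how a practical coder derives its coefficients.

```python
def lp_residual(samples, coeffs):
    """Return the prediction error e[n] = s[n] - sum_k coeffs[k] * s[n-1-k].

    Samples before the start of the signal are treated as zero.
    """
    residual = []
    for n, s in enumerate(samples):
        prediction = 0.0
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                prediction += a * samples[n - 1 - k]
        residual.append(s - prediction)
    return residual


# Illustrative signal: each sample is 0.9 times the previous one,
# so a first-order predictor with coefficient 0.9 models it exactly.
signal = [0.9 ** n for n in range(8)]
error = lp_residual(signal, [0.9])
print(error)
```

Only the first sample of the error is significant here; every later sample is predicted from its predecessor, which is the sense in which the correlated elements are removed.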
These coding schemes compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies (i.e., correlated elements) inherent in speech. Speech typically exhibits short term redundancies resulting from the mechanical action of the lips and tongue, and long term redundancies resulting from the vibration of the vocal cords. Linear predictive schemes model these operations as filters, remove the redundancies, and then model the resulting residual signal as white Gaussian noise. Linear predictive coders therefore achieve a reduced bit rate by transmitting filter coefficients and quantized noise rather than a full bandwidth speech signal.
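The long term redundancy mentioned above can be illustrated by a simple one-tap pitch predictor, a common way to remove periodicity, though not necessarily the filter used by any particular coder. The sketch searches for the lag at which the signal best matches a delayed copy of itself and then subtracts that copy. The function names, the lag search range, and the pulse-train input are assumptions for illustration.

```python
def best_pitch_lag(signal, min_lag, max_lag):
    """Return the lag whose delayed copy best matches the signal.

    The score is the squared correlation divided by the delayed copy's energy.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        den = sum(signal[n - lag] ** 2 for n in range(lag, len(signal)))
        score = num * num / den if den > 0 and num > 0 else float("-inf")
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def remove_long_term(signal, lag, gain):
    """Subtract a scaled copy of the signal delayed by `lag` samples."""
    return [
        s - gain * signal[n - lag] if n >= lag else s
        for n, s in enumerate(signal)
    ]


# Illustrative input: a pulse train with a period of 5 samples.
pulses = [1.0 if n % 5 == 0 else 0.0 for n in range(40)]
lag = best_pitch_lag(pulses, 2, 12)
flattened = remove_long_term(pulses, lag, 1.0)
print(lag, flattened[:10])
```

After the periodic component is removed, only the first pulse remains, so the decoder would need the lag and gain plus a much simpler residual.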
However, even these reduced bit rates often exceed the available bandwidth where the speech signal must either propagate a long distance (e.g., ground to satellite) or coexist with many other signals in a crowded channel. A need therefore exists for an improved coding scheme which achieves a lower bit rate than linear predictive schemes.
The present invention is a novel and improved method and apparatus for the variable rate coding of a speech signal. The present invention classifies the input speech signal and selects an appropriate coding mode based on this classification. For each classification, the present invention selects the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction. The present invention achieves low average bit rates by only employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) during portions of the speech where this fidelity is required for acceptable output. The present invention switches to lower bit rate modes during portions of speech where these modes produce acceptable output.
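The selection rule described above, choosing the lowest bit rate mode that still gives acceptable output for a given classification, can be sketched as follows. The mode names, the bit rates, and the table of which modes are acceptable for which classes are all hypothetical values chosen for illustration; none of them come from the text.

```python
# Hypothetical modes: (name, bits per second, classes reproduced acceptably).
MODES = [
    ("high_fidelity", 8000, {"voiced", "unvoiced", "transient", "inactive"}),
    ("voiced_only", 4000, {"voiced"}),
    ("noise_model", 1000, {"unvoiced", "inactive"}),
]


def select_mode(frame_class):
    """Return (name, rate) of the lowest-rate mode acceptable for this class."""
    candidates = [(rate, name) for name, rate, ok in MODES if frame_class in ok]
    rate, name = min(candidates)
    return name, rate


def average_rate(frame_classes):
    """Average bit rate over a sequence of classified frames."""
    return sum(select_mode(c)[1] for c in frame_classes) / len(frame_classes)


# Illustrative sequence of frame classifications.
frames = ["inactive", "voiced", "voiced", "transient", "unvoiced"]
for c in frames:
    print(c, select_mode(c))
print("average bps:", average_rate(frames))
```

In this illustrative sequence the high fidelity mode is used only for the transient frame, which is how the average rate stays well below the rate of the most capable mode.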
An advantage of the present invention is that speech is coded at a low bit rate. Low bit rates translate into higher capacity, greater range, and lower power requirements.
A feature of the present invention is that the input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. The present invention therefore can apply various coding modes to different types of active speech, depending upon the required level of fidelity.
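The two-level classification described above can be sketched as a small decision procedure. The features used here (frame energy, zero-crossing rate, and the energy jump from the previous frame) and every threshold value are assumptions chosen for illustration; the text does not specify how the classification is performed.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def classify(frame, prev_energy, energy_floor=1e-4, zcr_split=0.3, jump_ratio=4.0):
    """Classify a frame as inactive, transient, unvoiced, or voiced.

    All thresholds are hypothetical defaults, not values from the text.
    """
    energy = frame_energy(frame)
    if energy < energy_floor:
        return "inactive"            # first level: active vs. inactive
    if energy > jump_ratio * prev_energy:
        return "transient"           # sharp rise in energy since the last frame
    if zero_crossing_rate(frame) > zcr_split:
        return "unvoiced"            # noise-like, many sign changes
    return "voiced"                  # periodic, few sign changes


# Illustrative frames of 80 samples each.
silence = [0.0] * 80
tone = [0.5 * math.sin(2 * math.pi * n / 40) for n in range(80)]
hiss = [0.1, -0.1] * 40

print(classify(silence, 0.0))
print(classify(tone, frame_energy(tone)))
print(classify(hiss, frame_energy(hiss)))
print(classify(tone, 0.001))
```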
Another feature of the present invention is that coding modes may be utilized according to the strengths and weaknesses of each particular mode. The present invention dynamically switches between these modes as properties of the speech signal vary with time.
A further feature of the present invention is that, where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. The present invention uses this coding in a dynamic fashion whenever unvoiced speech or background noise is detected.
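The bit rate saving from modeling a region as pseudo-random noise can be illustrated with a minimal sketch in which the encoder keeps only the level of a frame and the decoder regenerates noise at that level. That only a single level value is sent, and the function names and seed handling, are assumptions for illustration; the text does not state which parameters are transmitted.

```python
import math
import random


def encode_noise_frame(frame):
    """Encoder side: keep only the frame's root-mean-square level."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def decode_noise_frame(rms, length, seed=0):
    """Decoder side: generate pseudo-random noise scaled to the received level."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    noise_rms = math.sqrt(sum(x * x for x in noise) / length)
    return [x * rms / noise_rms for x in noise]


# Illustrative noise-like input frame of 160 samples.
source = random.Random(1)
frame = [source.uniform(-0.2, 0.2) for _ in range(160)]

level = encode_noise_frame(frame)
rebuilt = decode_noise_frame(level, len(frame))
print(level, encode_noise_frame(rebuilt))
```

The rebuilt frame differs sample by sample from the input but has the same level, which is the trade this kind of coding makes: one value per frame instead of a description of every sample.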
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of a reference number identifies the drawing in which the reference number first appears.