I. Field of the Invention
The present invention relates to speech processing. More particularly, the present invention relates to a novel and improved method and apparatus for performing linear predictive speech coding using burst excitation vectors.
II. Description of the Related Art
Transmission of voice by digital techniques has become widespread, particularly in long-distance and digital radio telephone applications. This in turn has created interest in determining methods which minimize the amount of information sent over the transmission channel while maintaining high quality in the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.
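The 64 kbps figure can be reproduced with a short calculation. The sketch below assumes the conventional telephony parameters of 8,000 samples per second and 8 bits per sample; those two values are assumptions for illustration and are not stated in the text above.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate, in bits per second, of speech that is only sampled and digitized."""
    return sample_rate_hz * bits_per_sample


# Assumed telephony parameters (not taken from the text above).
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8

rate_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(rate_bps / 1000, "kbps")
```

Under these assumptions the product is 64,000 bits per second, which matches the order of magnitude quoted above.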
Devices which employ techniques to compress voiced speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters which it receives over the transmission channel. The model is constantly changing to accurately model the time-varying speech signal. Thus the speech is divided into blocks of time, or analysis frames, during which the parameters are calculated. The parameters are then updated for each new frame.
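The division of speech into analysis frames can be sketched as follows. The frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate) and the function name `split_into_frames` are hypothetical choices for illustration; the text above does not specify a frame size.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive, non-overlapping analysis frames.

    Trailing samples that do not fill a whole frame are dropped in this sketch.
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# Assumed frame length: 160 samples, i.e. 20 ms at 8 kHz (illustrative only).
FRAME_LEN = 160
speech = [0.0] * 400          # placeholder signal
frames = split_into_frames(speech, FRAME_LEN)

# The encoder would compute one parameter set per frame.
for index, frame in enumerate(frames):
    print("frame", index, "has", len(frame), "samples")
```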
Among the various classes of speech coders, the coders known as Code Excited Linear Predictive Coding (CELP), Stochastic Coding, or Vector Excited Speech Coding coders form one class. An example of a coding algorithm of this particular class is described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. Similarly, examples of other vocoders of this type are detailed in patent application Ser. No. 08/004,484, filed Jan. 14, 1993, now U.S. Pat. No. 5,414,796 entitled "Variable Rate Vocoder" and assigned to the assignee of the present invention, and U.S. Pat. No. 4,797,925, entitled "Method For Coding Speech At Low Bit Rates". The material in the aforementioned patent application and the aforementioned U.S. patent is incorporated by reference herein.
The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies inherent in speech. Speech typically has short term redundancies due primarily to the filtering operation of the vocal tract, and long term redundancies due to the excitation of the vocal tract by the vocal cords. In a CELP coder, these operations are modeled by two filters, a short term formant (LPC) filter and a long term pitch filter. Once these redundancies are removed, the resulting residual signal can be modeled as white Gaussian noise, which also must be encoded.
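The two-filter model can be illustrated at the decoder side, where an excitation signal is passed through a pitch synthesis filter and then an LPC synthesis filter. The sketch below assumes a single-tap pitch filter p[n] = e[n] + b·p[n-L] and an all-pole LPC filter s[n] = p[n] + Σ a_k·s[n-k]; the filter forms, sign convention, and all numeric values are assumptions for illustration rather than details given in the text above.

```python
def pitch_synthesis(excitation, gain, lag):
    """Single-tap long-term (pitch) synthesis: p[n] = e[n] + gain * p[n - lag]."""
    out = []
    for n, x in enumerate(excitation):
        past = out[n - lag] if n >= lag else 0.0
        out.append(x + gain * past)
    return out


def lpc_synthesis(signal, coeffs):
    """All-pole short-term (formant) synthesis: s[n] = x[n] + sum_k a_k * s[n - k]."""
    out = []
    for n, x in enumerate(signal):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


# A single impulse as excitation, with illustrative (assumed) filter parameters.
excitation = [1.0, 0.0, 0.0, 0.0, 0.0]
pitched = pitch_synthesis(excitation, gain=0.5, lag=2)   # adds long-term periodicity
speech = lpc_synthesis(pitched, coeffs=[0.5])            # adds short-term correlation
print(pitched)
print(speech)
```

The encoder applies the inverse of these operations to remove the redundancies, leaving the residual that the excitation is meant to represent.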
The process of determining the coding parameters for a given frame of speech is as follows. First, the parameters of the LPC filter are determined by finding the filter coefficients which remove the short term redundancy, due to the vocal tract filtering, in the speech. Second, the parameters of the pitch filter are determined by finding the filter coefficients which remove the long term redundancy, due to the vocal cords, in the speech. Finally, an excitation signal, which is input to the pitch and LPC filters at the decoder, is chosen by driving the pitch and LPC filters with a number of random excitation waveforms in a codebook, and selecting the particular excitation waveform which causes the output of the two filters to be the closest approximation to the original speech. Thus the transmitted parameters relate to three items: (1) the LPC filter, (2) the pitch filter, and (3) the codebook excitation.
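The final step, selecting the codebook waveform whose filtered output best approximates the original speech, can be sketched as an exhaustive search. This is a minimal sketch under stated assumptions: the filter forms, the Gaussian random codebook, the per-vector optimal gain, the squared-error criterion, and every name and numeric value below are hypothetical. Practical coders typically add refinements such as perceptual weighting, which are omitted here.

```python
import random


def synthesize(excitation, pitch_gain, pitch_lag, lpc_coeffs):
    """Drive an assumed single-tap pitch filter, then an assumed all-pole LPC filter."""
    pitched = []
    for n, x in enumerate(excitation):
        past = pitched[n - pitch_lag] if n >= pitch_lag else 0.0
        pitched.append(x + pitch_gain * past)
    out = []
    for n, x in enumerate(pitched):
        acc = x
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def make_codebook(size, length, seed=0):
    """Codebook of random Gaussian excitation vectors (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_codebook(target, codebook, pitch_gain, pitch_lag, lpc_coeffs):
    """Return (index, gain, error) of the vector whose scaled, filtered output
    is closest to the target in the squared-error sense."""
    best = (None, 0.0, float("inf"))
    target_energy = sum(t * t for t in target)
    for index, vector in enumerate(codebook):
        y = synthesize(vector, pitch_gain, pitch_lag, lpc_coeffs)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                          # least-squares gain for this vector
        error = target_energy - corr * corr / energy  # residual error at that gain
        if error < best[2]:
            best = (index, gain, error)
    return best


# Usage: build a target from a known codebook entry and confirm it is recovered.
codebook = make_codebook(size=16, length=40, seed=1)
params = dict(pitch_gain=0.5, pitch_lag=20, lpc_coeffs=[0.9, -0.2])
target = [2.0 * v for v in synthesize(codebook[3], **params)]
index, gain, error = search_codebook(target, codebook, **params)
print(index, gain, error)
```

Only the winning index and gain, together with the filter parameters, would need to be transmitted, since the decoder is assumed to hold the same codebook.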
One shortcoming of CELP coders is the use of random excitation vectors. The use of random excitation vectors fails to take into account the burst-like nature of the ideal excitation waveform, which remains after the short-term and long-term redundancies have been removed from the speech signal. Unstructured random vectors are not particularly well suited for encoding the burst-like residual excitation signal, and result in an inefficient method for coding the residual excitation signal. Thus, there is a need for an improved method for coding the target signals which incorporates the burst-like nature of the residual excitation signal, resulting in higher quality speech at lower encoded data rates.