I. Field of the Invention
The present invention relates to the coding of speech signals. Specifically, the present invention relates to coding quasi-periodic speech signals by quantizing only a prototypical portion of the signal.
II. Description of the Related Art
Many communication systems today transmit voice as a digital signal, particularly in long distance and digital radio telephone applications. The performance of these systems depends, in part, on accurately representing the voice signal with a minimum number of bits. Transmitting speech simply by sampling and digitizing requires a data rate on the order of 64 kilobits per second (kbps) to achieve the speech quality of a conventional analog telephone. However, coding techniques are available that significantly reduce the data rate required for satisfactory speech reproduction.
The term “vocoder” typically refers to devices that compress voiced speech by extracting parameters based on a model of human speech generation. Vocoders include an encoder and a decoder. The encoder analyzes the incoming speech and extracts the relevant parameters. The decoder synthesizes the speech using the parameters that it receives from the encoder via a transmission channel. The speech signal is often divided into frames of data and block processed by the vocoder.
Vocoders built around linear-prediction-based time domain coding schemes far outnumber all other types of coders. These techniques extract correlated elements from the speech signal and encode only the uncorrelated elements. The basic linear predictive filter predicts the current sample as a linear combination of past samples. An example of a coding algorithm of this particular class is described in the paper “A 4.8 kbps Code Excited Linear Predictive Coder,” by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
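As an illustration of the basic linear predictive filter, the following minimal sketch predicts each sample as a linear combination of past samples and keeps only the prediction error (the residual). The function names, the zero initial history, and the fixed coefficient list are assumptions made for this example; a practical coder would estimate the coefficients for each frame.

```python
def lpc_residual(samples, coeffs):
    """Analysis filter: e[n] = s[n] - sum_k coeffs[k] * s[n-1-k].

    Samples before n = 0 are treated as zero (an assumption of this sketch).
    """
    residual = []
    for n, s in enumerate(samples):
        prediction = sum(
            a * samples[n - 1 - k]
            for k, a in enumerate(coeffs)
            if n - 1 - k >= 0
        )
        residual.append(s - prediction)
    return residual


def lpc_synthesize(residual, coeffs):
    """Synthesis filter: rebuilds s[n] from the residual using the same coefficients."""
    out = []
    for n, e in enumerate(residual):
        prediction = sum(
            a * out[n - 1 - k]
            for k, a in enumerate(coeffs)
            if n - 1 - k >= 0
        )
        out.append(e + prediction)
    return out
```

For a strongly correlated input such as a decaying exponential, a single well-chosen coefficient leaves a residual that is essentially zero after the first sample, which is the sense in which only the uncorrelated elements remain to be encoded.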
These coding schemes compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies (i.e., correlated elements) inherent in speech. Speech typically exhibits short term redundancies resulting from the mechanical action of the lips and tongue, and long term redundancies resulting from the vibration of the vocal cords. Linear predictive schemes model these operations as filters, remove the redundancies, and then model the resulting residual signal as white Gaussian noise. Linear predictive coders therefore achieve a reduced bit rate by transmitting filter coefficients and quantized noise rather than a full bandwidth speech signal.
However, even these reduced bit rates often exceed the available bandwidth where the speech signal must either propagate a long distance (e.g., ground to satellite) or coexist with many other signals in a crowded channel. A need therefore exists for an improved coding scheme which achieves a lower bit rate than linear predictive schemes.
The present invention is a novel and improved method and apparatus for coding a quasi-periodic speech signal. The speech signal is represented by a residual signal generated by filtering the speech signal with a Linear Predictive Coding (LPC) analysis filter. The residual signal is encoded by extracting a prototype period from a current frame of the residual signal. A first set of parameters is calculated which describes how to modify a previous prototype period to approximate the current prototype period. One or more codevectors are selected which, when summed, approximate the difference between the current prototype period and the modified previous prototype period. A second set of parameters describes these selected codevectors. The decoder synthesizes an output speech signal by reconstructing a current prototype period based on the first and second set of parameters. The residual signal is then interpolated over the region between the current reconstructed prototype period and a previous reconstructed prototype period. The decoder synthesizes output speech based on the interpolated residual signal.
A feature of the present invention is that prototype periods are used to represent and reconstruct the speech signal. Coding the prototype period rather than the entire speech signal reduces the required bit rate, which translates into higher capacity, greater range, and lower power requirements.
Another feature of the present invention is that a past prototype period is used as a predictor of the current prototype period. The difference between the current prototype period and an optimally rotated and scaled previous prototype period is encoded and transmitted, further reducing the required bit rate.
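The prediction step described above can be sketched as an exhaustive search over circular rotations of the previous prototype, with a least-squares gain computed for each rotation. This is only an illustrative sketch: the function names are hypothetical, both prototypes are assumed to already have the same length, and the search strategy is an assumption rather than the procedure of the detailed description.

```python
def rotate(vec, shift):
    """Circularly rotate vec to the left by shift samples."""
    shift %= len(vec)
    return vec[shift:] + vec[:shift]


def best_rotation_and_gain(prev_proto, cur_proto):
    """Return (rotation, gain) minimizing ||cur - gain * rotate(prev, rotation)||^2."""
    if len(prev_proto) != len(cur_proto):
        raise ValueError("prototypes must have equal length in this sketch")
    energy = sum(x * x for x in prev_proto)  # unchanged by rotation
    if energy == 0:
        return 0, 0.0
    best_rotation, best_gain, best_score = 0, 0.0, -1.0
    for r in range(len(prev_proto)):
        rotated = rotate(prev_proto, r)
        corr = sum(a * b for a, b in zip(rotated, cur_proto))
        score = corr * corr / energy  # error reduction achieved by this rotation
        if score > best_score:
            best_rotation, best_gain, best_score = r, corr / energy, score
    return best_rotation, best_gain


def prediction_error(prev_proto, cur_proto, rotation, gain):
    """Difference vector that remains to be encoded after prediction."""
    rotated = rotate(prev_proto, rotation)
    return [c - gain * p for c, p in zip(cur_proto, rotated)]
```

When the current prototype is exactly a rotated and scaled copy of the previous one, the error vector is zero and only the rotation and gain need to be conveyed.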
Still another feature of the present invention is that the residual signal is reconstructed at the decoder by interpolating between successive reconstructed prototype periods, based on a weighted average of the successive prototype periods and an average lag.
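A simplified sketch of this interpolation is shown below. It assumes that both reconstructed prototype periods have already been brought to a common length (standing in for the average lag) and that the weight moves linearly from the previous prototype to the current one across the region; the function name and the linear weighting are assumptions of this example.

```python
def interpolate_residual(prev_proto, cur_proto, num_samples):
    """Fill num_samples of residual by cross-fading two periodically extended prototypes."""
    period = len(prev_proto)
    if period == 0 or period != len(cur_proto):
        raise ValueError("prototypes must be non-empty and of equal length")
    out = []
    for n in range(num_samples):
        weight = (n + 1) / num_samples  # reaches 1.0 at the end of the region
        phase = n % period              # position within the repeated period
        out.append((1.0 - weight) * prev_proto[phase] + weight * cur_proto[phase])
    return out
```

Early samples in the region resemble the previous prototype and late samples resemble the current one, so the waveform evolves smoothly between the two.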
Another feature of the present invention is that a multi-stage codebook is used to encode the transmitted error vector. This codebook provides for the efficient storage and searching of code data. Additional stages may be added to achieve a desired level of accuracy.
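The idea of a multi-stage codebook can be sketched as a greedy search in which each stage quantizes whatever error the earlier stages left behind, and the decoder sums the selected codevectors. The function names, the squared-error criterion, and the greedy stage-by-stage search are assumptions of this sketch, not the search procedure of the detailed description.

```python
def nearest_index(codebook, target):
    """Index of the codevector closest to target in squared error."""
    def distance(codevector):
        return sum((t - c) ** 2 for t, c in zip(target, codevector))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def multistage_encode(stages, target):
    """Pick one codevector per stage; each stage codes the remaining error."""
    remaining = list(target)
    indices = []
    for codebook in stages:
        i = nearest_index(codebook, remaining)
        indices.append(i)
        remaining = [r - c for r, c in zip(remaining, codebook[i])]
    return indices


def multistage_decode(stages, indices):
    """Sum the selected codevectors to approximate the original vector."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out
```

With S stages of K codevectors each, the encoder stores and searches S × K vectors while being able to represent up to K^S distinct sums, which is why adding a stage is an inexpensive way to improve accuracy.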
Another feature of the present invention is that a warping filter is used to efficiently change the length of a first signal to match that of a second signal, where the coding operations require that the two signals be of the same length.
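The effect of such a length change can be illustrated with a simple resampler. The sketch below uses circular linear interpolation as a stand-in for the warping filter, treating the input as one period of a periodic signal; the function name and the choice of linear interpolation are assumptions, and a practical warping filter may use a higher-quality interpolation kernel.

```python
def warp_to_length(signal, new_len):
    """Resample one period of a signal to new_len samples (circular linear interpolation)."""
    old_len = len(signal)
    if old_len == 0 or new_len <= 0:
        raise ValueError("signal must be non-empty and new_len positive")
    out = []
    for i in range(new_len):
        pos = i * old_len / new_len       # fractional position in the input
        k = int(pos)
        frac = pos - k
        left = signal[k % old_len]
        right = signal[(k + 1) % old_len]  # wraps around at the period boundary
        out.append((1.0 - frac) * left + frac * right)
    return out
```

After warping, the two signals can be compared or combined sample by sample, as required by operations such as correlation and weighted averaging.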
Yet another feature of the present invention is that prototype periods are extracted subject to a “cut-free” region, thereby avoiding discontinuities in the output due to splitting high energy regions along frame boundaries.
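One way to picture this constraint is a search that slides the prototype window back from the end of the frame until its two cut points fall in a low energy part of the signal. The following sketch is a hypothetical illustration only: the function name, the guard parameter, and the minimum-energy criterion are assumptions and are not taken from the detailed description.

```python
def extract_prototype(frame, lag, guard=1):
    """Return (prototype, end_index) for a window of length lag whose cuts avoid high energy.

    Candidate windows end within the last lag samples of the frame. The cost of
    a candidate is the energy within guard samples of its start and end cuts.
    Ties are resolved in favor of the window closest to the frame end.
    """
    n = len(frame)
    if not 0 < lag <= n:
        raise ValueError("lag must be between 1 and the frame length")

    def cut_cost(end):
        start = end - lag
        near_cuts = set(range(start - guard, start + guard))
        near_cuts |= set(range(end - guard, end + guard))
        return sum(frame[i] ** 2 for i in near_cuts if 0 <= i < n)

    best_end, best_cost = n, cut_cost(n)
    first_end = max(lag, n - lag + 1)
    for end in range(n - 1, first_end - 1, -1):
        cost = cut_cost(end)
        if cost < best_cost:  # strict comparison keeps the latest window on ties
            best_end, best_cost = end, cost
    return frame[best_end - lag:best_end], best_end
```

If a pitch pulse sits on the last sample of the frame, a window ending at the frame boundary would cut right beside it; the search instead moves the window back so that the pulse lies wholly inside the extracted period.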
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of a reference number identifies the drawing in which the reference number first appears.