The present invention relates to a method and apparatus for encoding speech signals.
It is highly desirable to be able to store and transmit speech signals using a reduced bandwidth. For example, if a speech signal with an 8000 Hz bandwidth is sampled at the Nyquist rate with 12-bit accuracy, the required data rate is almost 200 kilobits per second of speech. Since the actual information content of speech is far smaller than this, it is extremely desirable to reduce the data rate required to encode speech to something closer to the actual information content as perceived by a human listener. Such compressed speech coding has three principal areas of application, each of major importance: synthetic speech, transmission of spoken messages, and speech recognition.
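The data-rate figure follows directly from the Nyquist criterion, which requires a sampling rate of twice the signal bandwidth:

$$f_s = 2 \times 8000\ \text{Hz} = 16{,}000\ \text{samples/s}, \qquad 16{,}000\ \text{samples/s} \times 12\ \text{bits/sample} = 192{,}000\ \text{bits/s} \approx 200\ \text{kbit/s}$$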
A principal line of effort toward this end has been linear predictive coding (LPC) of speech. In the general linear prediction model, a signal $s_n$ is considered to be the output of a system with an input $u_n$ such that the following relation holds:
$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G \sum_{m=0}^{q} b_m u_{n-m}$$
where $b_0$ is defined as one, and $a_k$ ($1 \le k \le p$), $b_m$ ($1 \le m \le q$), and the gain $G$ are the parameters of the hypothesized system. Since the signal $s_n$ is modeled as a linear function of past outputs and of present and past inputs, its value can be predicted from a linear combination of those outputs and inputs, hence the name linear prediction.
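The general pole-zero relation can be evaluated directly, sample by sample. The sketch below is illustrative only: the function name `pole_zero_output` is hypothetical, samples before $n = 0$ are assumed to be zero, and the relation is assumed to take the form $s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G \sum_{m=0}^{q} b_m u_{n-m}$ (the sign convention of Makhoul's tutorial).

```python
from typing import Sequence


def pole_zero_output(u: Sequence[float], a: Sequence[float],
                     b: Sequence[float], G: float) -> list[float]:
    """Evaluate s_n = -sum_{k=1}^{p} a_k s_{n-k} + G * sum_{m=0}^{q} b_m u_{n-m}.

    a holds a_1..a_p and b holds b_1..b_q; b_0 = 1 is implied.
    Samples before n = 0 are treated as zero.
    """
    b_full = [1.0, *b]  # b_0 is defined as one
    s: list[float] = []
    for n in range(len(u)):
        # Zero (moving-average) part: present and past inputs
        ma = sum(b_full[m] * u[n - m] for m in range(len(b_full)) if n - m >= 0)
        # Pole (autoregressive) part: past outputs
        ar = sum(a[k - 1] * s[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        s.append(-ar + G * ma)
    return s


# Impulse response of a first-order pole-zero system
print(pole_zero_output([1.0, 0.0, 0.0, 0.0], a=[-0.5], b=[0.25], G=2.0))
```

Passing an empty `b` reduces the function to the all-pole case, in which each output depends only on past outputs and the current input.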
A slightly simplified version of this model, which is much more tractable, is the autoregressive or all-pole model. In this model, the signal $s_n$ is assumed to be a linear combination of past values and of a single input value $u_n$:
$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G u_n$$
where $G$ is a gain factor. By taking the z transform of both sides of this equation, the system transfer function $H(z)$ is
$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}}$$
Given a particular signal sequence $s_n$, analysis according to this model requires that the predictor coefficients $a_k$ and the gain $G$ be determined in some manner.
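The model leaves open how $a_k$ and $G$ are found. One standard choice, described in the Makhoul tutorial cited in this specification, is the autocorrelation method solved by the Levinson-Durbin recursion. A minimal sketch follows, assuming an already windowed, non-silent frame ($R(0) > 0$) and the convention $s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G u_n$; the function names are hypothetical. As a by-product, the recursion yields the reflection coefficients $k_i$.

```python
import math
from typing import Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> list[float]:
    """R(i) = sum_n x_n x_{n+i} over a finite, already windowed frame."""
    N = len(x)
    return [sum(x[n] * x[n + i] for n in range(N - i)) for i in range(max_lag + 1)]


def levinson_durbin(R: Sequence[float], p: int) -> tuple[list[float], list[float], float]:
    """Solve the normal equations for a_1..a_p by the Levinson-Durbin recursion.

    Returns (a, k, G): predictor coefficients, reflection coefficients, gain.
    Assumes R[0] > 0 and a non-degenerate frame (no zero error energy).
    """
    a: list[float] = []  # a_1..a_i of the current order i
    k: list[float] = []  # reflection coefficients k_1..k_i
    E = R[0]             # prediction error energy at order 0
    for i in range(1, p + 1):
        acc = R[i] + sum(a[j - 1] * R[i - j] for j in range(1, i))
        k_i = -acc / E
        # a_j^(i) = a_j^(i-1) + k_i * a_{i-j}^(i-1), then a_i^(i) = k_i
        a = [a[j - 1] + k_i * a[i - j - 1] for j in range(1, i)] + [k_i]
        k.append(k_i)
        E *= 1.0 - k_i * k_i
    # In the autocorrelation method, G^2 equals the final error energy E_p.
    return a, k, math.sqrt(E)


# Autocorrelation of a first-order process with R(i) = 0.5**i
a, k, G = levinson_durbin([1.0, 0.5, 0.25], p=2)
print(a, k, G)
```

For this input the second predictor coefficient and the second reflection coefficient come out as zero, as expected when the data are exactly first-order.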
In the model of human speech upon which the present invention is based, the human voice is modeled as an excitation function applied to a linear predictive filter. Once the system has been analyzed in this fashion, the excitation function can normally be transmitted at quite a low bit rate. However, the present invention is not directed to excitation function modeling; conventional modeling, analysis, and encoding methods are used for this aspect. See generally Rabiner & Schafer, Digital Processing of Speech Signals (1978); Markel & Gray, Linear Prediction of Speech (1976); Atal et al., "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave", 50 Journal of the Acoustical Society of America 637 (1971); Makhoul, "Linear Prediction: A Tutorial Review", 63 Proceedings IEEE 561 (1975); all of which are hereby incorporated by reference. Pitch and gain (energy) are commonly used as a minimum set of excitation parameters.
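The excitation-plus-filter model can be sketched as follows. This is a minimal illustration of a conventional excitation, not the method of the invention: the function name `synthesize_frame`, the voiced/unvoiced switch, the unit-amplitude impulse train at an integer pitch period, and white Gaussian noise for unvoiced frames are all assumptions. The filter is the all-pole recursion $s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G u_n$.

```python
import random
from typing import Optional, Sequence


def synthesize_frame(a: Sequence[float], G: float, n_samples: int,
                     pitch_period: Optional[int], seed: int = 0) -> list[float]:
    """Drive the all-pole filter s_n = -sum a_k s_{n-k} + G u_n with a simple excitation.

    Assumed excitation: a unit impulse every `pitch_period` samples (voiced),
    or unit-variance white Gaussian noise when `pitch_period` is None (unvoiced).
    Filter memory starts at zero; nothing is carried over between frames.
    """
    if pitch_period is not None:
        u = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        rng = random.Random(seed)
        u = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    s: list[float] = []
    for n in range(n_samples):
        # Feedback from past outputs: sum_{k=1}^{p} a_k s_{n-k}
        ar = sum(a[k - 1] * s[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        s.append(-ar + G * u[n])
    return s


voiced = synthesize_frame(a=[-0.5], G=2.0, n_samples=6, pitch_period=3)
unvoiced = synthesize_frame(a=[-0.5], G=1.0, n_samples=6, pitch_period=None)
```

A practical vocoder would also carry filter state across frames and scale pulse energy with the pitch period; both are omitted to keep the sketch short.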
To represent speech in accordance with the LPC model, the predictor coefficients $a_k$, or some equivalent set of parameters, must be transmitted so that the linear predictive model can be used to resynthesize the speech signal at the receiver. In the prior art, reflection coefficients have often been used as the transmitted parameters. Desirable properties for the set of parameters transmitted to permit resynthesis of speech according to the LPC model include:
1. The synthesis filter should be guaranteed stable.
2. The transmitted parameters should preferably correspond fairly closely to perceptual parameters, to permit perceptually efficient use of bandwidth.
3. A minimum computational load should be imposed at both the transmitting and (especially) the receiving end.
4. The parameters should preferably have a natural ordering, so that an efficiently reduced set of parameters can be obtained by truncation.
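For the reflection coefficients used in the prior art, features 1 and 4 can be illustrated concretely. The sketch below converts reflection coefficients back to predictor coefficients with the standard step-up recursion and checks the well-known stability condition $|k_i| < 1$. It assumes the convention $A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}$, the function names are hypothetical, and it is not the parameter set proposed by the present invention.

```python
from typing import Sequence


def reflection_to_predictor(k: Sequence[float]) -> list[float]:
    """Convert reflection coefficients k_1..k_p to predictor coefficients a_1..a_p.

    Step-up recursion for A(z) = 1 + sum_k a_k z^{-k}:
      a_i^(i) = k_i,  a_j^(i) = a_j^(i-1) + k_i * a_{i-j}^(i-1)  for 1 <= j < i
    """
    a: list[float] = []
    for i, k_i in enumerate(k, start=1):
        # Update the order-(i-1) coefficients, then append a_i = k_i.
        a = [a[j - 1] + k_i * a[i - j - 1] for j in range(1, i)] + [k_i]
    return a


def guarantees_stability(k: Sequence[float]) -> bool:
    """1/A(z) is stable if and only if every reflection coefficient has |k_i| < 1."""
    return all(abs(k_i) < 1.0 for k_i in k)


k = [0.5, 0.5]
print(reflection_to_predictor(k))      # [0.75, 0.5]
print(reflection_to_predictor(k[:1]))  # [0.5]
print(guarantees_stability(k))         # True
```

Truncating the list, as in `k[:1]`, still yields a valid lower-order predictor (in the autocorrelation method, the optimal one of that order), and a receiver can confirm stability simply by checking the magnitude of each transmitted coefficient.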
It is therefore an object of the present invention to provide a method for encoding speech according to the linear predictive coding model such that stability of the LPC filter is guaranteed at minimum bit rate.
It is a further object of the present invention to provide a method for encoding speech parameters in accordance with the linear predictive coding model, such that the encoded parameters correspond closely to perceptual parameters and require minimum bit rate.
It is a further object of the present invention to provide a method for encoding speech for synthesis according to the linear predictive coding model at minimum bit rate, such that a minimum computational load is required to regenerate the encoded speech.