The present invention generally relates to a method for encoding speech, and more particularly to the coding of the linear predictive (LPC) residual signal by using either its Fourier Transform magnitude or phase.
Encoding digital speech data derived from analog speech signals, so that the speech information can be stored and transmitted in compressed form using a reduced bandwidth, has long been recognized as a desirable goal. Speech encoding significantly compresses the speech signal relative to the original analog speech signal, and this compression can be used to advantage in the general synthesis of speech, in speech recognition and in the transmission of speech.
A technique known as linear predictive coding is commonly employed in the analysis of speech as a means of compressing the speech signal without sacrificing much of the actual information content thereof in its audible form. This technique is based upon the following relation:

$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G \sum_{l=0}^{q} b_l u_{n-l}, \qquad b_0 = 1 \tag{1}$$

where $s_n$ is a signal considered to be the output of some system with some unknown input $u_n$, with $a_k$, $1 \le k \le p$, $b_l$, $1 \le l \le q$, and the gain $G$ being the parameters of the hypothesized system. In equation (1), the "output" $s_n$ is a linear function of past outputs and present and past inputs. Thus, the signal $s_n$ is predictable from linear combinations of past outputs and inputs, whereby the technique is referred to as linear prediction. A typical implementation of linear predictive coding (LPC) of digital speech data as derived from human speech is disclosed in U.S. Pat. No. 4,209,836 Wiggins, Jr. et al issued June 24, 1980 which is hereby incorporated by reference. As noted therein, linear predictive coding systems generally employ a multi-stage digital filter in processing the encoded digital speech data for generating an analog speech signal in a speech synthesis system from which audible speech is produced.
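As an illustration only, the difference equation (1) can be sketched in a few lines of Python. The sketch assumes the sign convention of the Makhoul tutorial cited in this section (past outputs enter with a minus sign, and the present input has the implicit coefficient b_0 = 1). The function name `pole_zero_output` and its argument layout are hypothetical and are not taken from the referenced patent.

```python
def pole_zero_output(u, a, b, gain):
    """Evaluate s[n] = -sum_{k=1..p} a_k s[n-k] + G * sum_{l=0..q} b_l u[n-l].

    a holds a_1..a_p and b holds b_1..b_q; b_0 = 1 is implied (assumed convention).
    Samples before n = 0 are treated as zero.
    """
    b_full = [1.0] + list(b)
    s = []
    for n in range(len(u)):
        # Contribution of past outputs s[n-1] .. s[n-p]
        past_outputs = sum(a[k - 1] * s[n - k]
                           for k in range(1, len(a) + 1) if n - k >= 0)
        # Contribution of the present and past inputs u[n] .. u[n-q]
        inputs = sum(b_full[l] * u[n - l]
                     for l in range(len(b_full)) if n - l >= 0)
        s.append(-past_outputs + gain * inputs)
    return s
```

For example, driving a one-pole system (`a = [-0.5]`, no zeros, `gain = 2.0`) with a unit impulse yields a geometrically decaying output, each sample being half the previous one.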
By taking the z transform on both sides of equation (1), where $H(z)$ is the transfer function of the system, the following relationship is obtained:

$$H(z) = \frac{S(z)}{U(z)} = G\,\frac{1 + \sum_{l=1}^{q} b_l z^{-l}}{1 + \sum_{k=1}^{p} a_k z^{-k}} \tag{2}$$

where

$$S(z) = \sum_{n=-\infty}^{\infty} s_n z^{-n} \tag{3}$$

is the z transform of $s_n$, and $U(z)$ is the z transform of $u_n$. In equation (2), $H(z)$ is the general pole-zero model, with the roots of the numerator and denominator polynomials being the zeros and poles of the model, respectively. Linear predictive modeling generally has been accomplished by using a special form of the general pole-zero model of equation (2), namely the autoregressive or all-pole model, where it is assumed that the signal $s_n$ is a linear combination of past values and some input $u_n$, as in the following relationship:

$$s_n = -\sum_{k=1}^{p} a_k s_{n-k} + G u_n \tag{4}$$

where $G$ is a gain factor. The transfer function $H(z)$ in equation (2) now reduces to an all-pole transfer function

$$H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}} \tag{5}$$

Given a particular signal sequence $s_n$, speech analysis according to the all-pole transfer function of equation (5) produces the predictor coefficients $a_k$ and the gain $G$ as speech parameters. To represent speech in accordance with the LPC model, the predictor coefficients $a_k$, or some equivalent set of parameters, such as the reflection coefficients $k_k$, must be transmitted so that the linear predictive model can be used to re-synthesize the speech signal for producing audible speech at the output of the system. A detailed discussion of linear prediction as it pertains to the analysis of discrete signals is given in the article "Linear Prediction: A Tutorial Review" by John Makhoul, Proceedings of the IEEE, Vol. 63, No. 4, pp. 561-580 (April 1975), which is hereby incorporated by reference.
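The text above does not say how the predictor coefficients and gain are computed. One widely used approach, described in the cited Makhoul tutorial, is the autocorrelation method solved with the Levinson-Durbin recursion. The following is a minimal sketch under that assumption. The function names are hypothetical, and the sign convention assumed is that the prediction error is e_n = s_n + sum_k a_k s_{n-k}.

```python
import math


def autocorrelation(s, p):
    """Short-time autocorrelation R(0)..R(p) of one frame of samples."""
    return [sum(s[n] * s[n + i] for n in range(len(s) - i)) for i in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations for an order-p all-pole model.

    Returns (a, k, err): predictor coefficients a_1..a_p, reflection
    coefficients k_1..k_p, and the final prediction error energy.
    """
    a = [1.0] + [0.0] * p          # a[0] is the implicit leading 1
    err = r[0]
    reflection = []
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)       # error energy shrinks at each order
    return a[1:], reflection, err


def model_gain(err):
    """Gain G of the all-pole model, taking G squared equal to the error energy."""
    return math.sqrt(err)
```

With autocorrelation values `[1.0, 0.5, 0.25]`, which correspond to a first-order process, a second-order analysis returns a first coefficient of -0.5 and a second coefficient of zero, so the extra pole is not used.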
In linear predictive coding, a residual error signal (i.e., the LPC residual signal) is created. In order to encode speech using the linear predictive coding technique at medium to high bit rates (e.g., a medium rate of 8,000 to 16,000 bits per second, and a high bit rate in excess of 16,000 bits per second) while maintaining very high speech quality, an encoding technique including the coding of the LPC residual signal would be desirable. In general, the LPC residual signal may be considered a non-minimum phase signal, ordinarily requiring knowledge of both the Fourier Transform magnitude and phase in order to fully correspond to the time domain waveform. In the time domain, the energy density of a minimum phase signal is higher around the origin and tends to decrease with increasing distance from the origin. During periods of voiced speech, the energy in the LPC residual is relatively low except in the vicinity of a pitch pulse, where it is generally significantly higher. Based upon these observations, it has been determined in accordance with the present invention that the LPC residual of a speech signal may be transformed in a manner permitting its encoding at medium to high bit rates while maintaining very high speech quality.
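To make the roles of magnitude and phase concrete, the sketch below computes an LPC residual by inverse filtering and then splits a frame into its Fourier Transform magnitude and phase. It is an illustration under stated assumptions, not the encoding method of the invention: the residual is taken as e_n = s_n + sum_k a_k s_{n-k} (an assumed sign convention), a direct discrete Fourier transform stands in for whatever transform an implementation would use, and all function names are hypothetical.

```python
import cmath
import math


def lpc_residual(s, a):
    """Inverse-filter s with A(z) = 1 + sum_k a_k z^-k (assumed sign convention)."""
    p = len(a)
    return [
        s[n] + sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        for n in range(len(s))
    ]


def dft(x):
    """Direct O(N^2) discrete Fourier transform, adequate for short frames."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * m * n / N) for n in range(N))
        for m in range(N)
    ]


def magnitude_and_phase(x):
    """Return the DFT magnitude and phase (radians) of a real frame."""
    X = dft(x)
    return [abs(v) for v in X], [cmath.phase(v) for v in X]


def from_magnitude_and_phase(mag, phase):
    """Rebuild the time-domain frame from its DFT magnitude and phase."""
    N = len(mag)
    X = [cmath.rect(m, ph) for m, ph in zip(mag, phase)]
    return [
        (sum(X[m] * cmath.exp(2j * math.pi * m * n / N) for m in range(N)) / N).real
        for n in range(N)
    ]
```

Two frames that each hold a single pulse, one at the origin and one delayed by two samples, have identical magnitudes and differ only in phase. This is why magnitude alone does not, in general, determine the time domain waveform of a non-minimum phase signal.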