1. Field of the Invention
The present invention relates to a method and system for coding low bit rate speech for a communications system. More particularly, the present invention relates to a method and apparatus for encoding perceptually important information about the phase components of a prototype waveform.
2. Background of the Invention
Currently, various speech encoding techniques are used to process speech. However, these techniques do not adequately address the need for a speech encoding technique that improves the modeling and quantization of a speech signal, specifically the spectral characteristics of a speech prediction residual signal, which include a prototype waveform (PW) gain vector, a PW magnitude vector, and PW phase information.
Representative prior art techniques include, but are not limited to, those described in the following: L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice-Hall, 1978 (hereinafter reference 1); W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995 (hereinafter reference 2); F. Itakura, “Line Spectral Representation of Linear Predictive Coefficients of Speech Signals”, Journal of the Acoustical Society of America, vol. 57, no. 1, 1975 (hereinafter reference 3); P. Kabal and R. P. Ramachandran, “The Computation of Line Spectral Frequencies Using Chebyshev Polynomials”, IEEE Trans. on ASSP, vol. 34, no. 6, pp. 1419-1426, December 1986 (hereinafter reference 4); W. B. Kleijn, “Encoding Speech Using Prototype Waveforms”, IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 386-399, 1993 (hereinafter reference 5); and W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, “A Low Complexity Waveform Interpolation Coder”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1996 (hereinafter reference 6). References 1 through 6 are incorporated herein by reference in their entirety.
The prototype waveforms are a sequence of complex Fourier transforms, evaluated at the pitch harmonic frequencies, of pitch-period-wide segments of the residual taken at a series of points along the time axis. The PW sequence therefore contains information about the spectral characteristics of the residual signal as well as the temporal evolution of these characteristics. High speech quality can be achieved at low coding rates by efficiently quantizing the important aspects of the PW sequence.
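As a minimal sketch of this definition, the Python below computes one PW as the complex Fourier coefficients of a pitch-period-wide residual segment at the pitch harmonic frequencies, then repeats this at several update points. It assumes an integer pitch period held fixed across the sequence and analysis windows that end at each update point; the names `prototype_waveform` and `pw_sequence` are hypothetical, not taken from the source.

```python
import cmath
import math


def prototype_waveform(segment):
    """Return the PW of one pitch-period-wide residual segment.

    Assumes len(segment) equals an integer pitch period P, so the k-th pitch
    harmonic sits at 2*pi*k/P radians per sample. Coefficients are returned
    for harmonics k = 0 .. P // 2 and are normalized by P.
    """
    period = len(segment)
    pw = []
    for k in range(period // 2 + 1):
        omega = 2.0 * math.pi * k / period  # k-th pitch harmonic frequency
        coeff = sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(segment))
        pw.append(coeff / period)
    return pw


def pw_sequence(residual, period, update_points):
    """Extract one PW per update point from the pitch period ending there.

    Hypothetical helper: real coders track a time-varying pitch period,
    whereas this sketch holds it fixed.
    """
    return [prototype_waveform(residual[t - period:t]) for t in update_points]


# A residual made of a single cosine at the 3rd harmonic of a 40-sample pitch period.
period = 40
residual = [math.cos(2.0 * math.pi * 3 * n / period) for n in range(200)]
pws = pw_sequence(residual, period, [40, 80, 120])
print(len(pws[0]))               # 21
print(round(abs(pws[0][3]), 6))  # 0.5
```

Under this sketch's convention (harmonics 0 through P // 2), the PW dimension grows with the pitch period; for example, periods of 20 and 120 samples give 11 and 61 coefficients.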
In PW-based coders, the PW is separated into a shape component and a level component by computing the RMS (or gain) value of the PW and normalizing the PW to unit RMS value. Because the number of pitch harmonics changes as the pitch frequency varies, the dimension of the PW vectors also varies, typically in the range of 11 to 61.
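The gain/shape split can be sketched in a few lines of Python. This is an illustration under the assumption that the RMS is taken over all complex PW coefficients with equal weight; `split_gain_shape` is a hypothetical name.

```python
import math


def split_gain_shape(pw):
    """Split a complex PW vector into an RMS gain (level) and a unit-RMS shape.

    Works for any vector length, so it handles the pitch-dependent PW dimension.
    """
    if not pw:
        raise ValueError("PW vector must be non-empty")
    gain = math.sqrt(sum(abs(c) ** 2 for c in pw) / len(pw))
    if gain == 0.0:
        return 0.0, [0j] * len(pw)  # silent PW: no meaningful shape
    return gain, [c / gain for c in pw]


gain, shape = split_gain_shape([3 + 4j, 0j, 0j, 0j])
print(gain)  # 2.5
rms = math.sqrt(sum(abs(c) ** 2 for c in shape) / len(shape))
print(round(rms, 12))  # 1.0
```

Multiplying the shape by the gain restores the original PW, so the two components can be quantized separately without losing information.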
The PW magnitude vector sequence contains the evolving spectral characteristics of the linear predictive (LP) excitation signal and is therefore important in signal compression. Prior art techniques separate the PW sequence into a slowly evolving waveform (SEW) component and a rapidly evolving waveform (REW) component. This separation has three disadvantages.
First, separating the SEW and REW components requires linear low-pass and high-pass filtering of the PW sequence, which significantly increases the algorithmic delay of prior art coding schemes. This delay can be noticeable in telephone conversations.
Second, the filters involved make the signal processing in the prior art complicated, which increases the cost and time required to process the signal.
Third, the performance of the prior art is poor at low coding rates, because only the SEW and REW magnitudes are coded. Specifically, phase models are used at the decoder to obtain the SEW and REW phases. Therefore, even if the SEW and REW magnitude spectra were encoded accurately, the magnitude of the sum of the complex SEW and REW vectors would not closely match the original PW magnitude spectrum, because the phases are only estimated.
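The filtering step can be sketched in Python to show where the extra delay comes from. A simple symmetric moving average stands in for whatever low-pass filter a given prior art coder actually uses (an assumption), and the PWs are assumed to be time-aligned with a common dimension; `separate_sew_rew` is a hypothetical name.

```python
def separate_sew_rew(pw_seq, half_len):
    """Split a PW sequence into SEW and REW by linear filtering along time.

    pw_seq: list of equal-length complex vectors, one per update point; assumes
    the PWs are already time-aligned and share a common dimension.
    half_len: L. A symmetric (linear-phase) moving average of length 2L + 1,
    applied to each harmonic's track, stands in for the low-pass filter.
    REW is the complementary high-pass output, PW - SEW.

    Returns (sew, rew, delay). Outputs exist only for frames L .. N-L-1, and
    delay = L update intervals because SEW at frame m needs PWs up to m + L.
    """
    window = 2 * half_len + 1
    sew, rew = [], []
    for m in range(half_len, len(pw_seq) - half_len):
        frames = pw_seq[m - half_len:m + half_len + 1]  # includes L future PWs
        smooth = [sum(f[k] for f in frames) / window for k in range(len(pw_seq[m]))]
        sew.append(smooth)
        rew.append([p - s for p, s in zip(pw_seq[m], smooth)])
    return sew, rew, half_len


# A steady PW track is entirely "slowly evolving": REW comes out as zero.
steady = [[1 + 0j, 2 + 0j]] * 5
sew, rew, delay = separate_sew_rew(steady, half_len=1)
print(len(sew), delay)  # 3 1
```

Because the filter is symmetric, the SEW for a given update point cannot be computed until `half_len` later PWs have arrived; a longer filter gives a smoother SEW but a proportionally longer delay.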
In addition, some prior art methods (references 2-6) employ a binary model, based on either a periodic phase or a random phase, to encode the SEW and REW phases. This performs poorly because it relies on a binary voicing decision with only two states.
In some prior art coders, the SEW phase is obtained at the receiver from a fixed phase model and the REW phase from a random phase model. Because of the approximations these models make, the reconstructed speech is either excessively rough or excessively periodic.
In the prior art, the PW phase at the receiver is determined by vector addition of the SEW and REW vectors. Consequently, even if the SEW and REW magnitudes are preserved exactly, the PW magnitude cannot be accurately reproduced at the receiver.
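For illustration only, the two phase models described above can be sketched in Python. The zero phase used for the fixed model and the uniform distribution used for the random model are assumptions, and `apply_phase_model` is a hypothetical name; the point is that the decoder receives magnitudes only and must synthesize the phases.

```python
import cmath
import math
import random


def apply_phase_model(magnitudes, model, rng=None):
    """Attach modeled phases to decoded magnitudes, giving a complex vector.

    model="fixed":  the same phase every frame (zero phase here, a placeholder
                    for a coder's actual fixed phase vector); used for the SEW.
    model="random": independent phases drawn uniformly from [0, 2*pi); used
                    for the REW.
    Only the magnitudes come from the bitstream; the phases are synthesized.
    """
    if model == "fixed":
        phases = [0.0] * len(magnitudes)
    elif model == "random":
        rng = rng or random.Random()
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in magnitudes]
    else:
        raise ValueError("model must be 'fixed' or 'random'")
    return [cmath.rect(a, ph) for a, ph in zip(magnitudes, phases)]


sew = apply_phase_model([1.0, 0.5], "fixed")
rew = apply_phase_model([0.2, 0.4], "random", rng=random.Random(0))
print(sew)  # [(1+0j), (0.5+0j)]
```

Neither branch uses any information about the original phases, which is why the output tends toward either a rigidly periodic or a noisy character.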
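The effect can be checked numerically. At each harmonic, |S + R|^2 = |S|^2 + |R|^2 + 2|S||R|cos(phi_S - phi_R), so with fixed SEW and REW magnitudes the PW magnitude can still lie anywhere from ||S| - |R|| to |S| + |R|, depending on the phase difference. The hypothetical helper `pw_magnitude` below computes the PW magnitude from given SEW and REW magnitudes and phases.

```python
import cmath
import math


def pw_magnitude(sew_mag, sew_phase, rew_mag, rew_phase):
    """Per-harmonic |SEW + REW| for given magnitudes and phases (radians)."""
    return [abs(cmath.rect(a, p) + cmath.rect(b, q))
            for a, p, b, q in zip(sew_mag, sew_phase, rew_mag, rew_phase)]


# Identical magnitudes, different phases, very different PW magnitudes.
aligned = pw_magnitude([1.0], [0.0], [1.0], [0.0])
opposed = pw_magnitude([1.0], [0.0], [1.0], [math.pi])
quadrature = pw_magnitude([1.0], [0.0], [1.0], [math.pi / 2])
print(aligned[0], round(opposed[0], 12), round(quadrature[0], 6))  # 2.0 0.0 1.414214
```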
Thus, a need exists for a system and method that provide information about the PW phase so that its characteristics can be reproduced at the decoder. Furthermore, a need exists for a system and method that reproduce the phase characteristics of the PW without compromising the accuracy with which the PW magnitude information is reproduced.