1. Field of the Invention
The present invention relates to a method and system for coding low bit rate speech for communication systems. More particularly, the present invention relates to a method and apparatus for performing prototype waveform magnitude quantization using vector quantization.
2. Background of the Invention
Currently, various speech encoding techniques are used to process speech. These techniques do not adequately address the need for a speech encoding technique that improves the modeling and quantization of a speech signal, specifically, the evolving spectral characteristics of a speech prediction residual signal, which includes a prototype waveform (PW) gain vector, a PW magnitude vector, and PW phase information.
In particular, representative but non-limiting examples of prior art techniques include: L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice-Hall, 1978 (hereinafter known as reference 1); W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995 (hereinafter known as reference 2); F. Itakura, “Line Spectral Representation of Linear Predictive Coefficients of Speech Signals”, Journal of Acoustical Society of America, vol. 57, no. 1, 1975 (hereinafter known as reference 3); P. Kabal and R. P. Ramachandran, “The Computation of Line Spectral Frequencies Using Chebyshev Polynomials”, IEEE Trans. on ASSP, vol. 34, no. 6, pp. 1419–1426, December 1986 (hereinafter known as reference 4); W. B. Kleijn, “Encoding Speech Using Prototype Waveforms”, IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 386–399, 1993 (hereinafter known as reference 5); and W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, “A Low Complexity Waveform Interpolation Coder”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1996 (hereinafter known as reference 6). All of the references 1 through 6 are herein incorporated in their entirety by reference.
The prototype waveforms are a sequence of complex Fourier transforms evaluated at pitch harmonic frequencies, for pitch period wide segments of the residual, at a series of points along the time axis. Thus, the PW sequence contains information about the spectral characteristics of the residual signal as well as the temporal evolution of these characteristics. A high quality of speech can be achieved at low coding rates by efficiently quantizing the important aspects of the PW sequence.
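By way of illustration only, the PW sequence described above can be sketched as follows. This is a minimal sketch, not the method of any cited reference: the function names `prototype_waveform` and `pw_sequence`, the fixed pitch period, the fixed hop between extraction points, and the choice of harmonic indices 0 through floor(P/2) are all assumptions made for the example.

```python
import cmath
import math

def prototype_waveform(residual, start, pitch_period):
    """Complex Fourier transform of one pitch-period-wide residual segment,
    evaluated at the pitch harmonics k = 0 .. floor(P/2) (assumed range)."""
    segment = residual[start:start + pitch_period]
    if len(segment) != pitch_period:
        raise ValueError("segment runs past the end of the residual")
    pw = []
    for k in range(pitch_period // 2 + 1):
        acc = 0j
        for n, x in enumerate(segment):
            # Harmonic k of the pitch frequency corresponds to 2*pi*k/P radians per sample.
            acc += x * cmath.exp(-2j * math.pi * k * n / pitch_period)
        pw.append(acc / pitch_period)
    return pw

def pw_sequence(residual, pitch_period, hop):
    """One PW per extraction point along the time axis (hypothetical fixed hop)."""
    starts = range(0, len(residual) - pitch_period + 1, hop)
    return [prototype_waveform(residual, s, pitch_period) for s in starts]
```

Each element of the returned sequence describes the spectrum of the residual at one instant, and the change from one element to the next describes its temporal evolution. A practical coder would track a time-varying pitch period, so successive vectors would generally differ in length.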
In PW-based coders, the PW is separated into a shape component and a level component by computing the RMS (or gain) value of the PW and normalizing the PW to a unity RMS value. As the pitch frequency varies, the dimensions of the PW vectors also vary, typically in the range of 11–61. Existing vector quantization (VQ) techniques, such as direct VQ, split VQ, and multi-stage VQ, are not well suited for variable dimension vectors. Adaptation of these techniques for variable dimension is neither practical from an implementation viewpoint nor satisfactory from a performance viewpoint. It is not practical because the worst case high dimensionality results in a high computational cost and a high storage cost.
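The separation into a level component and a shape component can be illustrated with the following minimal sketch. The function name `split_gain_shape` and the handling of an all-zero vector are assumptions made for the example, and the RMS is taken here over the complex PW coefficients.

```python
import math

def split_gain_shape(pw):
    """Return (gain, shape) where gain is the RMS value of the PW
    and shape is the PW normalized to unity RMS."""
    power = sum(abs(c) ** 2 for c in pw) / len(pw)
    gain = math.sqrt(power)
    if gain == 0.0:
        # Assumed convention: an all-zero PW has zero gain and is returned unchanged.
        return 0.0, list(pw)
    shape = [c / gain for c in pw]
    return gain, shape
```

The same routine applies regardless of vector length, so the gain is always a scalar while the shape inherits the pitch-dependent dimension of the PW. It is that variable-length shape vector that fixed-dimension VQ codebooks handle poorly.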
To address the variable dimensionality problem, prior art in reference 4 uses analytical functions of a fixed order to approximate the variable dimension vectors. The coefficients of the analytical function that provide the best fit to the vectors are used to represent the vectors for quantization. This approach suffers from three disadvantages. First, a modeling error is added to the quantization error, leading to a loss in performance. Second, analytical function approximation for reasonable orders, in the range of 5–10, deteriorates with increasing frequency. Third, if spectrally weighted distortion metrics are used during VQ, the complexity of these methods becomes formidable.
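The fixed-order approximation approach, and the modeling error it introduces, can be illustrated with the following sketch. A least-squares polynomial over normalized frequency is used purely as a stand-in for the analytical function; the cited prior art may use a different function family, and the names `fit_fixed_order`, `evaluate` and `modeling_error` are hypothetical.

```python
def fit_fixed_order(magnitudes, order):
    """Least-squares polynomial coefficients (order + 1 of them) for a
    magnitude vector of any dimension >= 2, on normalized frequency x in [0, 1]."""
    dim = len(magnitudes)
    xs = [k / (dim - 1) for k in range(dim)]
    n = order + 1
    # Normal equations A c = b for the monomial basis 1, x, ..., x^order.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(m * x ** i for x, m in zip(xs, magnitudes)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def evaluate(coeffs, dim):
    """Evaluate the fitted polynomial at the dim harmonic positions."""
    xs = [k / (dim - 1) for k in range(dim)]
    return [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]

def modeling_error(magnitudes, order):
    """Mean squared error between the vector and its fixed-order approximation."""
    approx = evaluate(fit_fixed_order(magnitudes, order), len(magnitudes))
    return sum((m - p) ** 2 for m, p in zip(magnitudes, approx)) / len(magnitudes)
```

Vectors of dimension 11 and dimension 61 both map to the same number of coefficients, which is what makes a fixed-dimension quantizer applicable. The value returned by `modeling_error` is incurred before any quantization takes place, which corresponds to the first disadvantage noted above.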
A PW magnitude vector sequence determines the evolving spectral characteristics of a linear predictive (LP) excitation signal and therefore is important in signal characterization. Prior art techniques separate the PW sequence into slowly evolving (SEW) and rapidly evolving (REW) components. This results in two disadvantages.
First, the algorithmic delay of the prior art coding scheme is significantly increased because linear low pass and high pass filtering is required to separate the SEW and REW components. This delay can be noticeable in telephone conversations.
Second, the signal processing needed for this purpose in the prior art is complicated by the filtering that is necessary. This increases the computational complexity of processing the signal, resulting in higher cost.
Additionally, prior art techniques use a non-hierarchical approach in quantizing the PW vectors (see references 2–6). This results in lower CODEC performance and less robustness to channel errors.
Thus, a need exists for a system and method that can accurately recreate perceptually important spectral features of the PW magnitude while maintaining computational and storage efficiency. Specifically, this permits the evolving spectral features of the LP residual signal to be reproduced accurately at the decoder.