The present invention is directed to low bit rate (4.8 kb/s and below) speech coding, and particularly to a robust and efficient quantization scheme for use in such coding.
The number of harmonic magnitudes that must be quantized and transmitted for a given speech frame is a function of the estimated pitch period. This number can vary from 8 harmonics for a high-pitched speaker to as many as 80 for an extremely low-pitched speaker. For the ITU 4 kb/s toll quality speech coding algorithm, only 80 bits are available to quantize all of the speech model parameters (LSF coefficients, pitch, voicing information, and spectral amplitudes or harmonic magnitudes). Of these, only 21 bits are available to quantize 2 sets of spectral amplitudes (2 frames). Straightforward quantization schemes do not provide sufficient transmission efficiency at the desired level of performance. Efficient quantization of the variable dimension spectral vectors is therefore a crucial issue in low bit rate harmonic speech coders.
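The dependence of the vector dimension on the pitch period can be illustrated with a minimal sketch. It assumes that harmonic k lies at k divided by the pitch period (in cycles per sample) and that all harmonics up to half the sampling rate are kept; the function name and the range of pitch periods (16 to 160 samples) are illustrative assumptions, chosen only because they reproduce the 8 to 80 range stated above.

```python
def harmonic_count(pitch_period_samples: float) -> int:
    """Number of pitch harmonics at or below half the sampling rate.

    Assumption (not from the source): harmonic k sits at k / P cycles per
    sample for a pitch period of P samples, so floor(P / 2) harmonics fit.
    """
    return int(pitch_period_samples // 2)


# Hypothetical pitch periods, in samples, from high-pitched to low-pitched.
for period in (16, 40, 80, 160):
    print(period, harmonic_count(period))
```

Under these assumptions the dimension of the spectral amplitude vector changes by a factor of ten across speakers, which is why a single fixed dimension quantizer cannot be applied directly.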
Recently, several techniques have been developed for the quantization of variable dimension spectral vectors. In R. J. McAulay and T. F. Quatieri, "Sinusoidal Coding", in Speech Coding and Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), Amsterdam, Elsevier Science Publishers, 1995, and S. Yeldener, A. M. Kondoz, B. G. Evans, "Multi-Band Linear Predictive Speech Coding at Very Low Bit Rates", IEEE Proc. Vis. Image and Signal Processing, October 1994, Vol. 141, No. 5, pp. 289-295, an all-pole (LP) model is used to approximate the spectral envelope using a fixed number of parameters. These parameters can be quantized using fixed dimension Vector Quantization (VQ). In Band Limited Interpolation (BLI), described, e.g., by M. Nishiguchi, J. Matsumoto, R. Wakatsuki and S. Ono, "Vector Quantized MBE with simplified V/UV decision at 3 Kb/s", Proc. of ICASSP-93, pp. II-151-154, the variable dimension vectors are converted into fixed dimension vectors by a sampling rate conversion that preserves the shape of the spectral envelope. The concept of spectral bins for the dimension conversion is employed in variable dimension vector quantization (VDVQ), described by A. Das, A. V. Rao, A. Gersho, "Variable Dimension Vector Quantization of Speech Spectra for Low Rate Vocoders", Proc. of Data Compression Conf., pp. 421-429, 1994. In VDVQ, the spectral axis is divided into segments, or bins, and each spectral sample is mapped onto the closest spectral bin to form a fixed dimension vector for quantization. A truncation method (P. Hedelin, "A tone oriented voice excited vocoder", Proc. of ICASSP-81, pp. 205-208) and a zero padding method (E. Shlomot, V. Cuperman and A. Gersho, "Combined Harmonic and Waveform Coding of Speech at Low Bit Rates", Proc. ICASSP-98, pp. 585-588) convert the variable dimension vector to a fixed dimension vector by simply truncating or zero padding, respectively.
Another method for the quantization of the spectral amplitudes is linear dimension conversion, called non-square transform VQ (NSTVQ), described by P. Lupini, V. Cuperman, "Vector Quantization of harmonic magnitudes for low rate speech coders", Proc. IEEE Globecom, 1994.
None of the schemes mentioned above quantizes the spectral amplitudes efficiently, with minimal distortion, when only a few bits are available.
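Two of the simpler dimension conversion ideas surveyed above can be sketched as follows. This is an illustration only, not the method of any cited reference: the function names are hypothetical, and plain linear interpolation is used as a simplified stand-in for band limited interpolation.

```python
def truncate_or_pad(amps, dim, pad_value=0.0):
    """Force a variable-length amplitude vector to `dim` entries.

    Longer vectors are truncated; shorter ones are zero padded.
    """
    kept = list(amps[:dim])
    return kept + [pad_value] * (dim - len(kept))


def resample_linear(amps, dim):
    """Resample `amps` to `dim` points by linear interpolation.

    Assumption: linear interpolation stands in for the sampling rate
    conversion used by BLI; the end points of the envelope are preserved.
    """
    n = len(amps)
    if n == 1 or dim == 1:
        return [amps[0]] * dim
    out = []
    for j in range(dim):
        pos = j * (n - 1) / (dim - 1)   # fractional index into amps
        i = min(int(pos), n - 2)        # left neighbour
        frac = pos - i
        out.append((1.0 - frac) * amps[i] + frac * amps[i + 1])
    return out
```

Truncation discards the upper harmonics of low-pitched speech, and zero padding spends codebook resolution on entries that carry no information, which is consistent with the inefficiency noted above.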
It is an object of the invention to provide an improved method of quantizing spectral amplitudes that achieves a higher degree of transmission efficiency and performance.
In accordance with this invention, two consecutive frames are grouped and quantized together. The spectral amplitude gain for the second sub-frame is quantized using a 5-bit non-uniform scalar quantizer. Next, the shape of the spectral harmonic amplitudes is split into odd and even harmonic amplitude vectors. The odd vector is converted to the LOG domain and then to the DCT domain, and is then quantized using 8 bits. The even vector is converted to the LOG domain and then used to generate a difference vector relative to the quantized odd LOG vector, and this difference vector is then quantized using 5 bits. Since the vector quantizations for spectral amplitudes can be done in the DCT domain, a weighting can be used that gives more emphasis to the low order DCT coefficients than to the higher order ones. In the end, a total of 18 bits (5 + 8 + 5) are used for the spectral amplitudes of the second frame.
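The odd/even split and the weighted search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the natural logarithm, the unnormalized DCT-II, the element-wise pairing of each even harmonic with the preceding odd harmonic, the function names, and the toy codebooks are all assumptions. An 8-bit quantizer would use a trained codebook of 256 entries and a 5-bit quantizer one of 32 entries.

```python
import math


def split_odd_even(amps):
    """amps[0] is harmonic 1; return (odd harmonics, even harmonics)."""
    return list(amps[0::2]), list(amps[1::2])


def to_log(vec):
    """LOG-domain conversion (natural log is an assumption)."""
    return [math.log(a) for a in vec]


def dct2(x):
    """Unnormalized DCT-II of x (scaling convention is an assumption)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def even_difference(even_log, odd_log_q):
    """Difference of each even LOG amplitude from the quantized odd LOG
    amplitude at the same index (this pairing is an assumption)."""
    return [e - o for e, o in zip(even_log, odd_log_q)]


def weighted_vq_search(vec, codebook, weights):
    """Index of the codebook entry with the smallest weighted squared error."""
    def dist(entry):
        return sum(w * (v - c) ** 2 for w, v, c in zip(weights, vec, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


# Toy example with hypothetical shape values for five harmonics.
odd, even = split_odd_even([8.0, 4.0, 2.0, 1.0, 0.5])
odd_dct = dct2(to_log(odd))
# Heavier weights on the low order DCT coefficients, as the text suggests.
weights = [1.0 / (k + 1) for k in range(len(odd_dct))]
toy_codebook = [[0.0] * len(odd_dct), odd_dct]   # stand-in for 256 entries
index = weighted_vq_search(odd_dct, toy_codebook, weights)
```

In a complete coder the quantized odd LOG vector would be recovered from the selected codebook entry by an inverse DCT before the even difference vector is formed; that step is omitted here for brevity.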
The spectral amplitudes for the first frame are quantized based on optimal linear interpolation techniques using the spectral amplitudes of the previous and next frames. Since the spectral amplitudes have variable dimension from one frame to the next, an interpolation algorithm is used to convert variable dimension spectral amplitudes into a fixed dimension. Further interpolation between the spectral amplitude values of the previous and next frames yields multiple sets of interpolated values, and comparison of these to the original interpolated (i.e., fixed dimension) spectral amplitude values for the current frame yields an error signal. The best interpolated spectral amplitudes are then chosen in accordance with a mean squared error (MSE) approach, and the chosen amplitude values (or an index representing the same) are quantized using three bits.
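The selection step for the first frame can be sketched as follows. The sketch assumes that the three vectors have already been converted to the same fixed dimension, and that the eight candidates addressed by the three-bit index are formed with uniformly spaced interpolation weights; the actual candidate weights are not given in the text, and the function name is hypothetical.

```python
def select_interpolation(prev, curr, nxt, num_candidates=8):
    """Pick the interpolation candidate closest to `curr` in the MSE sense.

    prev, curr, nxt: fixed dimension spectral amplitude vectors for the
    previous, current and next frames.
    Assumption: candidate k uses weight alpha = k / (num_candidates - 1),
    i.e. (1 - alpha) * prev + alpha * nxt.
    Returns (index, mse); the index fits in 3 bits when num_candidates is 8.
    """
    best_index, best_mse = 0, float("inf")
    for k in range(num_candidates):
        alpha = k / (num_candidates - 1)
        candidate = [(1.0 - alpha) * p + alpha * n for p, n in zip(prev, nxt)]
        mse = sum((c - x) ** 2 for c, x in zip(candidate, curr)) / len(curr)
        if mse < best_mse:
            best_index, best_mse = k, mse
    return best_index, best_mse


# Hypothetical fixed dimension vectors for three consecutive frames.
index, mse = select_interpolation([0.0, 0.0], [3.0, 6.0], [7.0, 14.0])
```

Only the index is transmitted; the decoder, which already holds the previous and next frame amplitudes, can rebuild the same candidate from it.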