The present invention relates to enhanced speech coding techniques for low-rate speech coders, and particularly to improved speech frame analysis and vector quantization methods.
A low-bit-rate speech coder is disclosed in U.S. Pat. No. 4,975,956, issued to Y. J. Liu and J. H. Rothweiler, entitled “Low-Bit-Rate Speech Coder Using LPC Data Reduction Processing”, which is incorporated herein by reference. This speech coder employs linear predictive coding (LPC) analysis to generate reflection coefficients for the input speech frames and pitch and gain parameters. To obtain a low bit rate of 400 bps, these parameters are further compressed. The reflection coefficients are first converted to line spectrum frequencies (LSFs) and formants. For even frames, these spectral parameters are vector-quantized into clean codeword indices. Odd frames are omitted, and are regenerated by interpolation at the decoder end. The vector quantization module compares the spectral parameters for an input word against a vocabulary of codewords for which vector indices have been generated and stored during a training sequence, and the optimally matching codeword is selected for transmission. Pitch and gain bits are quantized using trellis coding. Output speech is reconstructed from the regenerated vector-quantization indices using a matching codebook at the decoder end.
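By way of illustration only, the even-frame vector quantization and odd-frame interpolation described above may be sketched as follows. The sketch assumes a squared Euclidean distortion measure and a simple averaging of the two neighboring even frames; neither choice is stated in the referenced patent, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to vec.

    ASSUMPTION: squared Euclidean distance is used as the distortion measure.
    """
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def encode_even_frames(frames: Sequence[Vector],
                       codebook: Sequence[Vector]) -> List[int]:
    """Quantize only the even-numbered frames; odd frames are not transmitted."""
    return [nearest_codeword(frame, codebook) for frame in frames[::2]]


def decode_with_interpolation(indices: Sequence[int],
                              codebook: Sequence[Vector],
                              n_frames: int) -> List[List[float]]:
    """Rebuild all frames: look up even frames, interpolate odd frames.

    ASSUMPTION: an odd frame is the average of its two even neighbors;
    a trailing odd frame simply repeats the preceding even frame.
    """
    out: List[List[float]] = []
    for k in range(n_frames):
        prev = codebook[indices[k // 2]]
        if k % 2 == 0:
            out.append(list(prev))
        else:
            nxt_pos = k // 2 + 1
            nxt = codebook[indices[nxt_pos]] if nxt_pos < len(indices) else prev
            out.append([(a + b) / 2 for a, b in zip(prev, nxt)])
    return out
```

In this sketch only the indices returned by `encode_even_frames` would be transmitted, and the decoder would hold an identical copy of the codebook.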
In a quiet background, this 400-bps speech coder achieves high intelligibility for a low-bit-rate transmission. However, in a background of high noise, such as in a helicopter or jet, the encoded speech becomes unintelligible. A detailed study has shown that conversion of voicing and spectral parameters in the high-noise environment is the key to the loss of intelligibility. The LPC conversion causes a majority of voiced frames to become unvoiced. The result is whispery LPC speech and an almost inaudible low-rate voice. Even if the voicing is correct, spectral distortion causes the low-rate voice to be significantly muffled and buzzy. Although the pitch has no audible errors, the gain has a predominantly annoying effect.
It is therefore a principal object of the invention to provide an improved low-bit-rate speech coder capable of high quality speech coding in a high-noise environment. In accordance with the invention, a two-step approach to conversion of voicing and spectral parameters is taken. In the first step, robust speech frame features whose distributions are not strongly affected by noise levels are generated. In the second step, linear programming is used to determine an optimum combination of these features. A technique of adaptive vector quantization is also used in which a clean codebook is updated based upon an estimate of the background noise levels, and the “noisy” codebook is then searched for the best match with an input speech vector. The corresponding clean codeword is then selected for transmission and for synthesis at the receiver end. The results are better spectral reproduction and significant intelligibility enhancement over the previous coding approach.
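The adaptive vector quantization described above may be illustrated by the following minimal sketch. The rule used here to update the codebook (adding a per-component noise estimate to each clean codeword, as would apply to power-spectrum values under additive noise) is an assumption made for illustration and is not taken from the text; the function names are likewise hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def make_noisy_codebook(clean_codebook: Sequence[Vector],
                        noise_estimate: Vector) -> List[List[float]]:
    """Derive a 'noisy' codebook from the clean one.

    ASSUMPTION: components behave like power-spectrum values, so the
    background noise estimate is simply added to every clean codeword.
    """
    return [[c + n for c, n in zip(codeword, noise_estimate)]
            for codeword in clean_codebook]


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(vec, codeword))
                 for codeword in codebook]
    return distances.index(min(distances))


def adaptive_vq_index(noisy_input: Vector,
                      clean_codebook: Sequence[Vector],
                      noise_estimate: Vector) -> int:
    """Search the noisy codebook, but return the index of the clean codeword.

    The noisy and clean codebooks share indices, so the index found in the
    noisy search identifies the clean codeword to transmit and synthesize.
    """
    noisy_codebook = make_noisy_codebook(clean_codebook, noise_estimate)
    return nearest_index(noisy_input, noisy_codebook)
```

Because the two codebooks share indices, the receiver in this sketch needs only the clean codebook; the noise estimate is used solely at the encoder.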
In a preferred implementation of the system for a helicopter environment, the following features have been found to be sufficiently well distributed to allow good discrimination between voiced and unvoiced speech: (1) low-band energy; (2) zero-crossing counts adapted for noise level; (3) AMDF ratio (speech periodicity) measure; (4) low-pass filtered, backward correlation; (5) low-pass filtered, forward correlation; (6) inverse-filtered backward correlation; and (7) inverse-filtered pitch prediction gain measure. By linear programming analysis, five of these robust features are determined to significantly improve voicing decisions in the speech coder system. Adaptive vector quantization, using estimates of the average noise amplitude and average noise reflection coefficients to update codebook vectors, significantly improves input vector matching.
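Two of the listed features, and their linear combination into a voicing decision, may be sketched as follows. The precise feature definitions are not given in this passage, so the dead band used to adapt the zero-crossing count to the noise level, the minimum-to-maximum form of the AMDF ratio, and all function names are assumptions. The weights and threshold would be obtained by linear programming as described above; that step is not shown, and the values passed to `is_voiced` are to be supplied by the caller.

```python
from typing import Sequence


def zero_crossings(frame: Sequence[float], dead_band: float = 0.0) -> int:
    """Count sign changes, ignoring samples whose magnitude is within dead_band.

    ASSUMPTION: 'adapted for noise level' is modeled as a dead band that a
    caller would scale with the estimated noise amplitude.
    """
    count, prev_sign = 0, 0
    for x in frame:
        if x > dead_band:
            sign = 1
        elif x < -dead_band:
            sign = -1
        else:
            continue  # sample lies inside the dead band, so it is ignored
        if prev_sign != 0 and sign != prev_sign:
            count += 1
        prev_sign = sign
    return count


def amdf(frame: Sequence[float], lag: int) -> float:
    """Average magnitude difference function at one lag (lag < len(frame))."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def amdf_ratio(frame: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Ratio of the smallest to the largest AMDF value over a lag range.

    ASSUMPTION: a ratio near 0 indicates strong periodicity (voiced speech),
    while a ratio near 1 indicates little periodicity.
    """
    values = [amdf(frame, lag) for lag in range(min_lag, max_lag + 1)]
    peak = max(values)
    return min(values) / peak if peak > 0 else 1.0


def is_voiced(features: Sequence[float],
              weights: Sequence[float],
              threshold: float) -> bool:
    """Linear voicing decision: voiced when the weighted feature sum exceeds threshold."""
    score = sum(w * f for w, f in zip(weights, features))
    return score > threshold
```

A frame would be passed through each feature function, and the resulting feature vector would be passed to `is_voiced` together with the trained weights.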