Various methods have been developed for digital encoding of speech signals. The encoding enables the speech signal to be stored or transmitted and subsequently decoded, thereby reproducing the original speech signal.
Model-based speech encoding permits the speech signal to be compressed, which reduces the number of bits required to represent it and thereby reduces data transmission rates. These lower data rates are possible because speech is highly redundant and because the encoder mathematically simulates the human speech-production system. The vocal tract is simulated by a series of "pipes" of differing diameter, and the excitation is represented either by a pulse stream at the vocal-cord vibration rate for voiced sounds or by a random noise source for the unvoiced parts of speech. The reflection coefficients at the junctions between the pipes are obtained from linear predictive coding (LPC) analysis of the speech waveform.
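LPC analysis of this kind is commonly carried out by computing the autocorrelation of a speech frame and solving the normal equations with the Levinson-Durbin recursion, which produces the reflection coefficients as a by-product of each order update. The sketch below assumes that approach. The function names, the prediction-error convention e[n] = x[n] + sum_j a[j] x[n-j], and therefore the sign of the reflection coefficients are illustrative assumptions; texts differ on the sign convention, and mapping the coefficients to pipe-area ratios depends on which one is used.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame x (hypothetical helper)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, so the
    prediction error is e[n] = x[n] + sum_j a[j] x[n-j].
    Returns (a, k, err): predictor polynomial, reflection coefficients,
    and the final prediction-error energy.
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive (non-silent frame)")
    a = [1.0] + [0.0] * order
    err = r[0]
    k = []
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err  # reflection coefficient for stage i
        k.append(ki)
        # Order update: a_new[j] = a[j] + ki * a[i-j], a_new[i] = ki.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki  # error energy shrinks at each stage
    return a, k, err


# Example: the autocorrelation of a first-order process, r[m] = 0.9**m.
r = [0.9 ** m for m in range(3)]
a, k, err = levinson_durbin(r, order=2)
print(k)  # approximately [-0.9, 0.0]
```

For a valid (positive-definite) autocorrelation every reflection coefficient satisfies |k| < 1, which is one reason coders often quantize reflection coefficients rather than the predictor coefficients directly.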
In harmonic speech coding systems, the pitch period and the harmonic spectral amplitudes play an important role in synthesizing high-quality speech. The vocal-cord vibration rate is represented by an estimated pitch period, and this pitch period determines the number of harmonic amplitudes. Because pitch varies from one frame of the speech signal to the next, the number of harmonics also varies. For example, there may be as few as 8 harmonics for high-pitched speech or as many as 80 for low-pitched speech.
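The 8-to-80 range can be reproduced with simple arithmetic. The sketch below assumes a sampling rate of 8 kHz (narrowband speech) and counts the harmonics of the fundamental f0 = fs / P that lie at or below the Nyquist frequency fs / 2; for a pitch period of P samples this gives floor(P / 2) harmonics. The sampling rate, the Nyquist cutoff, and the function names are illustrative assumptions rather than details from the text; practical coders often stop at a somewhat lower cutoff.

```python
import math

SAMPLE_RATE_HZ = 8000.0  # assumed narrowband sampling rate


def num_harmonics(pitch_period_samples):
    """Harmonics of f0 = fs / P at or below fs / 2.

    (fs / 2) / (fs / P) = P / 2, so the count is floor(P / 2)
    and does not depend on fs once P is expressed in samples.
    """
    return math.floor(pitch_period_samples / 2)


def harmonic_frequencies(pitch_period_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Frequencies (Hz) of harmonics 1..L for the given pitch period."""
    f0 = sample_rate_hz / pitch_period_samples
    return [h * f0 for h in range(1, num_harmonics(pitch_period_samples) + 1)]


for pitch_hz in (500.0, 50.0):
    period = SAMPLE_RATE_HZ / pitch_hz  # pitch period in samples
    print(f"{pitch_hz:g} Hz pitch -> {num_harmonics(period)} harmonics")
# 500 Hz pitch -> 8 harmonics
# 50 Hz pitch -> 80 harmonics
```

A 500 Hz (high) pitch at 8 kHz corresponds to a 16-sample period and 8 harmonics, while a 50 Hz (low) pitch corresponds to a 160-sample period and 80 harmonics, so the length of the amplitude vector changes tenfold across that range.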
One problem in speech encoding is that the varying number of harmonic amplitudes makes the amplitudes difficult to quantize. A quantization scheme that is efficient for high-pitched speech may be unsuitable for low-pitched speakers; conversely, a quantization method designed to accommodate low-pitched speakers may be inefficient for high-pitched speech. Conventional vector quantization methods also lose efficiency when the vector dimension is increased to improve the quality of the reproduced speech.
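Both difficulties can be seen in a minimal full-search vector quantizer. This is a generic textbook sketch, not a method described in this document: a codebook built for one dimension cannot directly accept a harmonic-amplitude vector of a different length, and at a fixed rate of b bits per dimension an unstructured k-dimensional codebook needs 2^(b*k) codewords, so its storage and search cost grow exponentially with k. Reading "efficiency" as storage and search cost is one interpretation; the function names and the bits-per-dimension parameter are illustrative assumptions.

```python
def nearest_codeword(vector, codebook):
    """Full-search VQ: index of the codeword with the smallest squared error."""
    if not codebook:
        raise ValueError("codebook is empty")
    dim = len(vector)
    if any(len(c) != dim for c in codebook):
        # A fixed-dimension codebook cannot take a vector of another length,
        # which is what a varying number of harmonics produces.
        raise ValueError("vector dimension does not match codebook dimension")
    best_index, best_dist = -1, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def full_search_cost(dim, bits_per_dim):
    """Return (codebook size, squared-difference terms evaluated per search)."""
    size = 2 ** (bits_per_dim * dim)
    return size, size * dim


codebook = [[0.0, 0.0], [1.0, 2.0], [3.0, 3.0]]
print(nearest_codeword([0.9, 2.1], codebook))  # 1
for dim in (8, 80):
    size, terms = full_search_cost(dim, bits_per_dim=1)
    print(dim, size, terms)
# dim 8 -> 256 codewords; dim 80 -> 2**80 codewords
```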