Low-bandwidth speech communication techniques, i.e., those that require only a small number of bits to represent a sample of audio data, are used in a variety of applications, such as mobile telephony, voice over Internet Protocol (VoIP), recording, audio data storage, and multimedia. In such applications, it is desirable to minimize the required bandwidth while maintaining acceptable quality in the reconstructed (decoded) sound.
Phoneme-based speech communication techniques have been used to achieve low-data-rate speech communication. Such techniques meet the need for low-bandwidth speech coding, but they generally do not produce output speech that can be recognized as the voice of a particular speaker. Accordingly, the output of such systems has typically sounded machine-like, conveying little of the emphasis, inflection, accent, and similar cues that a speaker might use to convey more information than the words themselves carry.
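A rough estimate shows why transmitting only phoneme identities needs so little bandwidth, and why nothing about the speaker survives. The sketch below assumes a fixed-length code for each phoneme index and sends no pitch, duration, or loudness information; the inventory size of 40 phonemes and the speaking rate of 12 phonemes per second are illustrative assumptions, not figures from this document, and `phoneme_bit_rate` is a hypothetical helper.

```python
import math


def phoneme_bit_rate(inventory_size: int, phonemes_per_second: float) -> float:
    """Estimate bits per second needed to send bare phoneme indices.

    Assumption: each phoneme is sent as a fixed-length index of
    ceil(log2(inventory_size)) bits, with no prosody or speaker data.
    """
    bits_per_phoneme = math.ceil(math.log2(inventory_size))
    return bits_per_phoneme * phonemes_per_second


# Illustrative (assumed) values: ~40 phonemes, ~12 phonemes per second.
rate = phoneme_bit_rate(40, 12)
print(f"{rate:.0f} bit/s")  # 6 bits per phoneme * 12 per second
```

Because the stream carries only which phoneme was spoken, the decoder must synthesize every acoustic detail itself, which is consistent with the machine-like output described above.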
HVXC (Harmonic Vector eXcitation Coding) and CELP (Code Excited Linear Prediction) are defined as part of the MPEG-4 (Moving Picture Experts Group) audio standard and enable bit rates on the order of 1,500 to 12,000 bits per second, depending on the quality of the voice recording. As with voice coder (vocoder) based methods such as those defined in the G.722 standard, the HVXC and CELP methods use a set of tabulated, indexed human voice samples and identify the index number of the sample that best matches the current audio waveform. The HVXC and CELP methods, however, separate the spectral portion of the sample from the stochastic portion, which varies with the speaker and the environment. Although they achieve higher compression than traditional vocoding, the HVXC and CELP methods require bit rates 5 to 60 times higher than those of phoneme-based methods for voice transmission.
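The codebook step described above (finding the index of the stored sample that best matches the current waveform) can be sketched as a nearest-neighbor search. This is a simplified illustration, not the MPEG-4 procedure: it assumes plain squared error on raw sample vectors, whereas real CELP and HVXC encoders search through a synthesis filter with gain and perceptual weighting. The helpers `best_codebook_index` and `index_bits`, the toy codebook, and the 20 ms frame length are hypothetical.

```python
import math
from typing import Sequence


def best_codebook_index(frame: Sequence[float],
                        codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry with the lowest squared error.

    Simplified assumption: raw squared error, no filter, gain, or weighting.
    """
    best_index, best_error = -1, math.inf
    for index, entry in enumerate(codebook):
        if len(entry) != len(frame):
            raise ValueError("codebook entries must match the frame length")
        error = sum((x - y) ** 2 for x, y in zip(frame, entry))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def index_bits(codebook_size: int) -> int:
    """Bits needed to send one index with a fixed-length code."""
    return max(1, math.ceil(math.log2(codebook_size)))


# Toy 4-entry codebook of 4-sample vectors (hypothetical values).
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0, -1.0],
]
frame = [0.9, -0.8, 1.1, -1.0]
print(best_codebook_index(frame, codebook))  # entry 2 is the closest match

# Assumed 20 ms frames (50 per second) with a 1,024-entry codebook:
frames_per_second = 50
print(index_bits(1024) * frames_per_second, "bit/s for the index alone")
```

Only the index travels over the channel, so the bit cost grows with the logarithm of the codebook size rather than with the number of samples per frame; the additional parameters these coders send for the spectral and stochastic portions account for their higher rates relative to phoneme-based methods.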