A digital speech coder is part of a speech communication system that typically contains an analog-to-digital converter ("ADC"), a digital speech encoder, a data storage or transmission mechanism, a digital speech decoder, and a digital-to-analog converter ("DAC"). The ADC samples an analog input speech waveform and converts the (analog) samples into a corresponding datastream of digital input speech samples. The encoder applies a coding to the digital input datastream in order to compress it into a smaller datastream that approximates the digital input speech samples. The compressed digital speech datastream is stored in the storage mechanism or transmitted by way of the transmission mechanism to a remote location.
The decoder, situated at the site of the storage mechanism or at the remote location, decompresses the compressed digital datastream to produce a datastream of digital output speech samples. The DAC then converts the decompressed digital output datastream into a corresponding analog output speech waveform that approximates the analog input speech waveform. The encoder and decoder form a speech coder commonly referred to as a coder/decoder or codec.
Speech is produced as a result of acoustical excitation of the human vocal tract. In the well-known linear predictive coding ("LPC") model, the vocal tract function is approximated by a time-varying recursive linear filter, commonly termed the formant synthesis filter, obtained by directly analyzing speech waveform samples using the LPC technique. Glottal excitation of the vocal tract occurs when air passes the vocal cords. The glottal excitation signals, although not representable as easily as the vocal tract function, can generally be represented by a weighted sum of two types of excitation signals: a quasi-periodic excitation signal and a noise-like excitation signal. The quasi-periodic excitation signal is typically approximated by a concatenation of many short waveform segments where, within each segment, the waveform is periodic with a constant period termed the average pitch period. The noise-like signal is approximated by a series of non-periodic pulses or white noise.
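The LPC model described above can be illustrated with a minimal sketch. It is not taken from any particular coder: the function names, the unit-impulse pulse train, the Gaussian noise source, and the sign convention of the recursive filter are all assumptions made for illustration.

```python
import random


def make_excitation(num_samples, pitch_period, periodic_gain, noise_gain, seed=0):
    """Weighted sum of a quasi-periodic pulse train and a noise-like signal.

    Hypothetical helper: one unit pulse per pitch period stands in for the
    quasi-periodic component, and Gaussian noise for the noise-like one.
    """
    rng = random.Random(seed)
    excitation = []
    for n in range(num_samples):
        pulse = 1.0 if n % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        excitation.append(periodic_gain * pulse + noise_gain * noise)
    return excitation


def formant_synthesis_filter(excitation, lpc_coeffs):
    """Recursive (all-pole) linear filter.

    Assumed convention: s[n] = e[n] + sum over k of lpc_coeffs[k] * s[n-1-k].
    """
    output = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                acc += a * output[n - 1 - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    # Purely periodic excitation (noise_gain = 0) through a one-pole filter.
    e = make_excitation(num_samples=8, pitch_period=4,
                        periodic_gain=1.0, noise_gain=0.0)
    print(formant_synthesis_filter(e, lpc_coeffs=[0.5]))
```

In an actual coder the filter coefficients and the pitch period would vary from one segment to the next; here they are held fixed to keep the sketch short.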
The pitch period and the characteristics of the formant synthesis filter change continuously with time. To reduce the data rate required to transmit the compressed speech information, the pitch data and the formant filter characteristics are updated only periodically, typically at intervals of 10 to 30 milliseconds.
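To give a sense of scale for these update intervals, the following sketch converts an interval into a number of speech samples. The 8 kHz sampling rate is an assumption made for illustration, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumed sampling rate, for illustration only


def samples_per_update(sample_rate_hz, interval_ms):
    """Number of speech samples covered by one parameter update."""
    return sample_rate_hz * interval_ms // 1000


def updates_per_second(interval_ms):
    """How often the pitch and filter parameters are refreshed."""
    return 1000 / interval_ms


if __name__ == "__main__":
    for interval_ms in (10, 20, 30):
        print(interval_ms, "ms:",
              samples_per_update(SAMPLE_RATE_HZ, interval_ms), "samples,",
              updates_per_second(interval_ms), "updates per second")
```

Under this assumption, a longer interval means fewer parameter updates per second and hence a lower data rate, at the cost of tracking changes in the speech signal less closely.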
The Telecommunication Standardization Sector of the International Telecommunication Union ("ITU") is in the process of standardizing a dual-rate digital speech coder for multi-media communications. "Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 & 6.3 kbits/s," Draft G.723, Telecommunication Standardization Sector of ITU, 7 Jul. 1995, 37 pages (hereafter referred to as the "July 1995 G.723 specification"), presents a description of this standardized ITU speech coder (hereafter the "G.723 coder"). Using linear predictive coding in combination with an analysis-by-synthesis technique, the digital speech encoder in the G.723 coder generates a compressed digital speech datastream at a data rate of 5.3 or 6.3 kilobits/second ("kbps") starting from an uncompressed input digital speech datastream at a data rate of 128 kbps. The user selects whether the compressed data rate is 5.3 kbps or 6.3 kbps.
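The data rates quoted above imply a compression ratio of roughly 20:1 to 24:1. The arithmetic can be sketched as follows. The 30-millisecond frame length used for the bits-per-frame figure is an assumption (the upper end of the 10 to 30 millisecond update range mentioned earlier), and the function names are hypothetical.

```python
UNCOMPRESSED_BPS = 128_000            # uncompressed input rate, from the text
COMPRESSED_RATES_BPS = (5_300, 6_300) # the two selectable compressed rates


def compression_ratio(input_bps, output_bps):
    """Ratio of uncompressed to compressed data rate."""
    return input_bps / output_bps


def bits_per_frame(rate_bps, frame_ms):
    """Bit budget for one frame at a given rate (frame length is assumed)."""
    return rate_bps * frame_ms / 1000


if __name__ == "__main__":
    for rate in COMPRESSED_RATES_BPS:
        print(rate, "bps:",
              round(compression_ratio(UNCOMPRESSED_BPS, rate), 2), "to 1,",
              bits_per_frame(rate, frame_ms=30), "bits per assumed 30 ms frame")
```

The small per-frame bit budget that results suggests why the encoder has to search carefully for excitation parameters rather than transmit waveform samples directly.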
After decompression of the compressed datastream, the digital speech signal produced by the G.723 coder is of excellent communication quality. However, a high computation capability is needed to implement the G.723 coder. In particular, the G.723 coder typically requires approximately twenty million instructions per second of processing power furnished by a dedicated digital signal processor. A large portion of the G.723 coder's processing capability is utilized in performing energy error minimization during the generation of codebook excitation information.
In software running on a general-purpose computer such as a personal computer, it is difficult to attain the data processing capability needed for the G.723 coder. A digital speech coder that provides communication quality comparable to that of the G.723 coder but requires considerably less computation power is desirable.