A desirable objective in the operation of a digital telecommunication network is to reduce the bit rate required to transmit speech signals. In a typical telephone network, speech signals are limited to a band of frequencies that is about 4 kHz wide. In order to digitally encode such speech signals, a sampling rate of 8 kHz is required by the Nyquist criterion. For acceptable fidelity, a resolution of about 16 bits per sample is required. Thus, a bit rate of about 128 kb/s would be needed to digitize telephonic speech.
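The bit-rate figure above follows directly from the sampling rate and the sample resolution. As a minimal sketch of that arithmetic (the function name `pcm_bit_rate` is hypothetical, chosen only for illustration):

```python
def pcm_bit_rate(bandwidth_hz: float, bits_per_sample: int) -> float:
    """Return the bit rate, in bits per second, of uniformly quantized PCM."""
    # Nyquist criterion: sample at no less than twice the signal bandwidth.
    sampling_rate_hz = 2 * bandwidth_hz
    return sampling_rate_hz * bits_per_sample


# A 4 kHz band at 16 bits per sample, as in the paragraph above.
rate = pcm_bit_rate(4_000, 16)
print(f"{rate / 1000:.0f} kb/s")  # prints "128 kb/s"
```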
In order to maximize the number of speech channels that can be transmitted through a band-limited medium, considerable efforts have been made to reduce the bit rate allocated to each channel. For example, by using a logarithmic quantization scale, such as in μ-law PCM encoding, high quality speech can be encoded and transmitted at 64 kb/s. A related encoding method, adaptive differential PCM (ADPCM) encoding, can reduce the required bit rate to 32 kb/s.
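The logarithmic quantization scale can be illustrated with the continuous μ-law companding curve. The sketch below is an assumption-laden illustration, not the segmented table used in deployed telephone codecs: it uses μ = 255, a uniform 8-bit quantizer applied after compression, and hypothetical function names. At 8 bits per sample and 8 kHz sampling, this corresponds to the 64 kb/s rate mentioned above.

```python
import math

MU = 255  # companding constant commonly associated with mu-law telephony


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to one of 256 codes (0..255)."""
    return round((mu_law_compress(x) + 1) / 2 * 255)


def decode_8bit(code: int) -> float:
    """Map an 8-bit code back to an approximate sample value."""
    return mu_law_expand(code / 255 * 2 - 1)


# Small amplitudes are reproduced with a much finer step than large ones.
for sample in (0.01, 0.5):
    restored = decode_8bit(encode_8bit(sample))
    print(sample, abs(restored - sample))
```

The reconstruction error grows with amplitude, which is the intended trade: quantization noise stays roughly proportional to signal level, so fewer bits suffice for comparable perceived quality.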
Further advances in speech coding have exploited characteristic properties of speech signals and of human auditory perception to reduce the quantity of data that must be transmitted in order to acceptably reproduce an input speech signal at a remote location for perception by a human listener. For example, a voiced speech signal such as a vowel sound is characterized by a highly regular short-term waveform (having a period of about 10 ms) which changes its shape relatively slowly. Such speech can be viewed as consisting of an excitation signal (i.e., the vibratory action of the vocal cords) that is modified by a combination of time-varying filters (i.e., the changing shape of the vocal tract and mouth of the speaker). Hence, coding schemes have been developed wherein an encoder transmits data identifying one of several predetermined excitation signals and one or more modifying filter coefficients, rather than a direct digital representation of the speech signal. At the receiving end, a decoder interprets the transmitted data in order to synthesize a speech signal for the remote listener. In general, such speech coding systems are referred to as parametric coders, since the transmitted data represents a parametric description of the original speech signal.
Parametric speech coders can achieve bit rates of approximately 8-16 kb/s, which is a considerable improvement over PCM or ADPCM. In one class of speech coders, code-excited linear predictive (CELP) coders, the parameters describing the speech are established by an analysis-by-synthesis process. In essence, one or more excitation signals are selected from among a finite number of excitation signals; a synthetic speech signal is generated by combining the excitation signals; the synthetic speech is compared to the actual speech; and the selection of excitation signals is iteratively updated on the basis of the comparison to achieve a "best match" to the original speech on a continuous basis. Such coders are also known as stochastic coders or vector-excited speech coders.
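The analysis-by-synthesis process described above can be sketched as an exhaustive codebook search. This is a simplified illustration under stated assumptions, not the search of any particular CELP coder: it uses a single fixed codebook of random vectors, a hypothetical one-tap all-pole synthesis filter, a least-squares gain, and a plain squared-error comparison, and it omits the perceptual weighting and adaptive (pitch) codebook that practical coders typically include. All names are hypothetical.

```python
import random


def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the best-matching excitation vector."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc)          # synthesize a candidate
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Least-squares gain that best scales the candidate to the target.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        # Compare the synthetic speech with the actual speech.
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)        # keep the best match so far
    return best


random.seed(0)
FRAME = 40                                     # samples per frame (assumed)
codebook = [[random.gauss(0, 1) for _ in range(FRAME)] for _ in range(64)]
lpc = [-0.9]                                   # hypothetical filter coefficient

# Build a target frame from a known entry so the search result can be checked.
target = [2.0 * s for s in synthesize(codebook[17], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
print(index, gain, error)
```

Only the selected index and gain (together with the filter coefficients) would need to be transmitted, which is where the bit-rate saving comes from.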
Telecommunication signals are typically subjected to other signal processing functions in addition to speech coding. One such function is echo cancellation. In an echo canceler, an adaptive transversal filter is provided for estimating the impulse response of an echo path between a received signal and a transmitted signal. The received signal is convolved with the estimated impulse response to provide an estimated echo signal. The estimated echo signal is then subtracted from the transmitted signal to remove the echo component of the original transmitted signal.
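The echo-cancellation steps above (estimate the impulse response, convolve, subtract) can be sketched as follows. The source does not specify how the transversal filter is adapted; the normalized LMS (NLMS) update used here is an assumption, as are the tap count, step size, simulated echo path, and all names.

```python
import random


def nlms_echo_cancel(received, transmitted, taps=8, step=0.5, eps=1e-8):
    """Return (echo-cancelled signal, estimated echo-path impulse response)."""
    weights = [0.0] * taps        # current impulse-response estimate
    history = [0.0] * taps        # recent received samples, newest first
    output = []
    for x, d in zip(received, transmitted):
        history = [x] + history[:-1]
        # Convolve the received signal with the estimated impulse response.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Subtract the estimated echo from the transmitted signal.
        err = d - echo_estimate
        # NLMS update (assumed adaptation rule): step scaled by input power.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * err * h / norm for w, h in zip(weights, history)]
        output.append(err)
    return output, weights


random.seed(1)
echo_path = [0.0, 0.5, -0.3, 0.1]  # hypothetical echo-path impulse response
received = [random.uniform(-1, 1) for _ in range(2000)]
# Simulated transmitted signal containing only echo (no near-end speech).
transmitted = [
    sum(echo_path[k] * received[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(received))
]

cleaned, estimate = nlms_echo_cancel(received, transmitted)
print([round(w, 3) for w in estimate])
```

In this noise-free simulation the estimate converges to the simulated echo path and the residual decays toward zero; with near-end speech present, adaptation would normally be slowed or frozen, which this sketch does not attempt.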
When echo cancellation is performed in conjunction with speech coding, the performance of echo cancellation is impaired by the mismatch, at any given moment, between the excitation signals characterizing the encoded near-end speech and the excitation signals characterizing the far-end speech. While PCM-based echo cancelers can achieve an echo return loss enhancement of 30 dB or more, the use of CELP coding can reduce the performance of the canceler to an echo return loss enhancement of about 20 dB or less. One reason for such reduction in performance is that the estimated echo signal is determined as a function of the received signal, which is expressed in terms of the far-end excitation signal selected by the far-end CELP coder. The estimated echo signal is then subtracted from the transmitted signal, which, in turn, is based upon the current near-end excitation signal selected by the near-end CELP coder. Hence, the resulting echo-canceled signal will include a noise component attributable to differences between the near-end and far-end excitation signals.
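For reference, echo return loss enhancement (ERLE) is conventionally expressed as the ratio, in decibels, of the echo power before cancellation to the residual echo power after cancellation. The symbols below are introduced here for illustration and do not appear in the source: $y(n)$ denotes the transmitted signal containing the echo and $e(n)$ the echo-canceled signal, both measured in the absence of near-end speech.

```latex
\mathrm{ERLE} = 10 \log_{10} \frac{E\left[\, y^{2}(n) \,\right]}{E\left[\, e^{2}(n) \,\right]} \ \text{dB}
```

On this scale, 30 dB corresponds to a thousandfold reduction in echo power and 20 dB to a hundredfold reduction, so the degradation described above leaves roughly ten times more residual echo power.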