This invention relates to echo cancellation and, more particularly, to an improved method for carrying out acoustic echo cancellation in voice communication networks.
In voice communication networks, the digital speech signal is ultimately transmitted from a source to a destination. A primary objective in devising speech encoders is to minimize the number of bits required to represent the speech signal while maintaining speech intelligibility. This objective has led to the development of a class of low-bit-rate vocoders (i.e. speech encoders), which are based on constructing a model of the speech source and transmitting the model parameters.
In the area of mobile communications, most speech coding methods are based on some variant of Linear Predictive Coding (LPC), the main purpose of which is to reduce the number of bits sent across a channel. A linear predictive coder is a popular vocoder that extracts perceptually significant features of speech directly from a time waveform rather than from frequency spectra, as channel and formant vocoders do.
Fundamentally, an LPC encoder analyses a speech waveform to produce a time-varying model of the vocal tract excitation and transfer function. A synthesizer in the receiving terminal recreates the speech by passing the specified excitation through a mathematical model of the vocal tract. By periodically updating the parameters of the model and the specification of the excitation, the synthesizer adapts to changes in either. During any one specification interval, however, the vocal tract is assumed to represent a linear time-invariant process. Because only a handful of parameters are transmitted, the voice data rate is low. This type of speech coding may be used in limited-bandwidth applications where other techniques cannot. In addition, LPC provides more natural-sounding speech than purely frequency-domain-based vocoders.
Generally, the LPC encoder at the speaker's end generates various pieces of information which are transmitted to the listener's end, where they are used to reconstruct the original speech signal. This information consists of (a) the nature of the excitation, i.e. voiced or unvoiced, (b) the pitch period (for voiced excitation), (c) a gain factor and (d) the predictor coefficients (parameters of the vocal tract model).
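As a rough illustration of the analysis and synthesis steps described above, the following Python sketch estimates predictor coefficients and a gain factor from one frame using the autocorrelation method with the Levinson-Durbin recursion, then rebuilds the frame by driving an all-pole filter with the prediction residual. The function names, the frame length and the second-order test signal are assumptions made for illustration only. Practical coders add windowing, quantization and the voiced/unvoiced and pitch decisions, all of which are omitted here.

```python
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[0..order-1],
    where the prediction is x_hat[n] = sum_k a[k] * x[n-1-k].
    Returns the coefficients and the final prediction error energy."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        k_i = (r[i + 1] - sum(a[k] * r[i - k] for k in range(i))) / err
        prev = a[:]
        a[i] = k_i
        for k in range(i):
            a[k] = prev[k] - k_i * prev[i - 1 - k]
        err *= 1.0 - k_i * k_i
    return a, err

def predict(a, history, n):
    """Linear prediction of sample n from the samples before it."""
    return sum(a[k] * history[n - 1 - k]
               for k in range(len(a)) if n - 1 - k >= 0)

def residual(a, x):
    """Analysis: the excitation left after removing the predictable part."""
    return [x[n] - predict(a, x, n) for n in range(len(x))]

def synthesize(a, excitation):
    """Synthesis: pass the excitation through the all-pole model."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + predict(a, y, n))
    return y

# Hypothetical test frame: white noise shaped by a known second-order model.
rng = random.Random(1)
true_a = [1.3, -0.6]
frame = []
for n in range(4000):
    frame.append(rng.gauss(0.0, 1.0) + predict(true_a, frame, n))

r = autocorrelation(frame, 2)
a_est, pred_err = levinson_durbin(r, 2)
gain = (pred_err / len(frame)) ** 0.5   # RMS level of the excitation
excitation = residual(a_est, frame)
rebuilt = synthesize(a_est, excitation)
```

In this sketch the estimated coefficients should land close to the model that generated the frame, and feeding the unquantized residual back through the synthesis filter recovers the frame. A real vocoder would instead transmit a compact description of the excitation (voiced or unvoiced, pitch period, gain), which is what keeps the bit rate low.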
In the field of modern telecommunications, handsfree telephony continues to be an increasingly desirable feature. Handsfree telephones are desirable in a variety of applications, from teleconferencing systems to mobile cellular phones and multimedia terminals. High-quality full-duplex handsfree communication is difficult to achieve, however. In these systems, the loudspeaker and microphone are typically located away from the users, thereby requiring large signal gains to maintain comfortable volume levels. These large fixed gains may lead to electro-acoustic instability. In some handsfree systems, the microphone and loudspeaker are placed within the same acoustic enclosure in order to market the handsfree terminal as a single desktop unit. In this case, the large amount of gain plus the close loudspeaker-microphone coupling provides a large echo path back to the talker conversing with the handsfree terminal. Currently, there is a strong emphasis on communications based on Voice over Internet Protocol (VoIP), and in this environment the packet networks can introduce substantial delay into the echo path (e.g. much greater than 40 ms). The delayed echo can seriously impair conversations.
A number of solutions have been proposed and implemented to make handsfree telephony a feasible technology. Traditionally, it has been assumed that two talkers will not converse at the same time and, as such, initial handsfree terminals achieved echo-free operation by introducing manual or automatic switched-loss functions in the unused voice path. This method requires some sort of switching decision mechanism to determine which talker is the more deserving, and requires a finite amount of switching time. This switching can cause some impairment of its own, most noticeably clipping and chopping of words or sentences. The fact that only one voice path is available at a time defines this type of system as half-duplex. True full-duplex handsfree telephony may be possible, however, with 'echo cancellation' technology. Echo cancellers model the impulse response of the acoustic echo path and synthesize a replica of the actual echo signal for cancellation.
Echo cancellers come in two varieties. Line or hybrid echo cancellers cancel the echoes which leak through imperfect hybrid devices on the line. Acoustic echo cancellers (AECs), however, cancel the acoustic echo received by the microphone from the loudspeaker. Acoustic echo cancellation is a more involved and complex problem than electrical hybrid echo cancellation for various reasons: (a) the acoustic echo path is affected by any movement within its acoustic surroundings, (b) the length of cancellation required is very long, (c) background acoustic noise is present in the room, and (d) the acoustic echo path often has non-linear components, an example of which may be the loudspeaker. These non-linearities can be significant to the point that they limit the performance of most current echo cancellation schemes.
AECs generally employ adaptive filters to mathematically model and remove the loudspeaker-coupled component from the microphone signal. An adaptive filter is used to provide a linear model that represents the best fit to the unknown impulse response of the acoustic echo path. Throughout the history of AEC implementation, the Least Mean Square (LMS) algorithm or the Normalized Least Mean Square (NLMS) algorithm has often prevailed as the method of choice, due to its simplicity and low computational requirements. In recent years, as available processing power has increased, algorithms which offer better performance, albeit at a higher computational cost, have become desirable.
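The adaptive-filter approach described above can be sketched with a time-domain NLMS canceller in Python. This is a minimal illustration under stated assumptions, not an implementation taken from this document: the four-tap echo path, the step size `mu`, the regularization constant `eps`, the white-noise far-end signal and the absence of near-end speech and background noise are all hypothetical choices.

```python
import random

def convolve(x, h):
    """Echo picked up by the microphone: far-end signal through echo path h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Adapt a num_taps FIR model of the echo path with NLMS.
    Returns the final tap weights and the echo-cancelled output signal."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps            # recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate         # microphone signal minus echo replica
        step = mu * e / (sum(b * b for b in buf) + eps)
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return w, out

# Hypothetical scenario: far-end talker only, short noiseless echo path.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
far_end = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic = convolve(far_end, echo_path)

weights, cancelled = nlms_echo_canceller(far_end, mic, num_taps=4)
early_energy = sum(e * e for e in cancelled[:100])
late_energy = sum(e * e for e in cancelled[-100:])
```

Dividing the update by the input energy in the filter buffer is what distinguishes NLMS from plain LMS. It keeps the effective step size independent of the far-end signal level, at the cost of one extra energy computation per sample.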
One such algorithm which offers better performance is the Generalized Multi-Delay Frequency (GMDF) domain adaptive filter. Since the algorithm operates in the frequency domain, a separate domain transformation stage is required. Therefore, some block processing is always necessary before filtering can take place. This introduces throughput delay, which is undesirable, especially in situations where the communications link is already introducing delay. Delay during conversations decreases the amount of perceptibly tolerable echo, which in turn increases the performance requirements of the acoustic echo canceller.
Fundamental to the user acceptability of handsfree systems is the performance of algorithms for acoustic echo cancellation and noise reduction. For these and other reasons, acoustic echo cancellers continue to be an area of great interest. In particular, issues pertaining to the stability and convergence rate of these algorithms are the subjects of on-going research. The convergence speed is the time required to reach a steady-state mean-squared error variance from algorithm initialization. The convergence depth and the convergence rate of the echo canceller are two contributing factors; increasing either will increase the maximum achievable cancellation.
The present invention is an innovative way of performing acoustic echo cancellation in telephone terminals, particularly in handsfree mode, that results in improved performance and reduced processing load. Most speech coding algorithms are based on some variant of linear predictive coding (LPC), and data which has undergone this transformation is in a form more amenable to echo cancellation. Instead of performing echo cancellation in the time domain, the echo canceller is operated in the LPC domain, resulting in a process better matched to speech characteristics.
Specifically, a far-end speech signal and the LPC parameters from which it is constructed are used in conjunction with an adaptive model of the acoustic echo path between the loudspeaker and microphone to generate estimates of the corresponding echo LPC parameters. The echo LPC parameters are then fed into a standard LPC decoder, which synthesizes a real-time estimate of the echo signal. This estimate of the echo signal is subtracted from the microphone signal to isolate the local (near-end) speech. In this manner, the acoustic echo path is not unnecessarily modelled in areas that are not relevant to the speech and that would, therefore, not contribute to the speech quality.
Operating an acoustic echo canceller (AEC) on the LPC parameters at the receiver, before the decoding stage, offers some important advantages. Firstly, the speech coding process produces a noise-like 'excitation sequence' which, if used as an input to an NLMS algorithm, will speed up the convergence rate. Secondly, the AEC and the LPC encoder may share some of the computational load, since the domain transformation (from time to LPC parameters) is already part of the encoding stage. In addition, an echo code book may be used to store the necessary excitation sequence for the echo cancellation process, reducing the adaptive filtering process to a simple table lookup procedure. Also, the LPC transform data has fewer parameters and, hence, fewer taps, and can therefore be more efficient, due directly to the reduction in bit rate. As well, LPC space coordinates are based on speech characteristics. Speech input to the LPC transform is, therefore, spectrally broad, stimulating the LPC coordinates with a density much more uniform than in a Fourier transform or direct temporal filter models. This leads to faster and more uniform convergence of the LPC echo model. Lastly, the performance available today from noise and echo cancellers operating in the time domain is the result of many years of research and optimization. If such efforts are applied to the present invention, further performance gains may be realized in the future.
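The first advantage rests on a general property of NLMS: a spectrally flat, noise-like input converges faster than a strongly correlated one. The Python sketch below illustrates only that general property and does not reproduce the LPC-domain arrangement described here. The 16-tap echo path, the first-order recursive colouring with coefficient 0.95 (a crude stand-in for the correlation found in speech), the step size and the signal length are all assumptions chosen for the demonstration.

```python
import random

def nlms_misalignment(x, h, mu=0.5, eps=1e-8):
    """Identify echo path h from input x with NLMS (noiseless echo).
    Returns the final normalized misalignment ||w - h||^2 / ||h||^2."""
    n_taps = len(h)
    w = [0.0] * n_taps
    buf = [0.0] * n_taps              # recent input samples, newest first
    for sample in x:
        buf = [sample] + buf[:-1]
        d = sum(hi * bi for hi, bi in zip(h, buf))       # true echo
        e = d - sum(wi * bi for wi, bi in zip(w, buf))   # residual echo
        step = mu * e / (sum(b * b for b in buf) + eps)
        w = [wi + step * bi for wi, bi in zip(w, buf)]
    return (sum((wi - hi) ** 2 for wi, hi in zip(w, h))
            / sum(hi * hi for hi in h))

rng = random.Random(0)
# Hypothetical decaying echo path with 16 taps.
h = [rng.uniform(-1.0, 1.0) * 0.8 ** k for k in range(16)]

# Noise-like input, standing in for an excitation sequence.
white = [rng.gauss(0.0, 1.0) for _ in range(600)]

# Correlated input: the same noise passed through a one-pole filter.
colored = []
prev = 0.0
for e in white:
    prev = 0.95 * prev + e
    colored.append(prev)

mis_white = nlms_misalignment(white, h)
mis_colored = nlms_misalignment(colored, h)
```

After the same number of samples, the filter driven by the noise-like input should be much closer to the true echo path than the one driven by the correlated input, which is the behaviour the excitation sequence is expected to exploit.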