The present invention relates generally to signal processing, and more specifically to techniques for canceling acoustic echo and suppressing noise using an array microphone.
Full-duplex hands-free communication systems are commonly used for many applications, such as speakerphones, hands-free car kits, teleconferencing systems, cellular phones, hands-free voice recognition devices, and so on. For each of these systems, one or more microphones in the system are used to pick up an acoustic signal emitted by a speaking user, which is then processed and transmitted to a remote user. However, the microphones may also pick up undesirable reflections of the acoustic signal from the borders of an enclosure, such as a room or a car compartment. The propagation paths for the reflections may change due to various factors such as, for example, movement of the microphones, loudspeaker, and/or speaking user, volume changes on the loudspeaker, and environment changes. As a result, the electro-acoustic circuit in the system may become unstable and produce howling, which is highly undesirable.
In the case of a telecommunication system, a speech signal from a remote speaking user is outputted from a loudspeaker, and portions of this speech signal may be reflected to the microphones and transmitted back to the remote user. This acoustic disturbance is referred to as echo. Users are generally annoyed by hearing their own voice returned with a delay, for example, the delay introduced by the path through the system.
Echo cancellation is often required in many communication systems to suppress echo as well as to avoid howling effects. For example, echo cancellation is typically used in full-duplex communication environments where the speaker and microphone may be located some distance away from a user. Examples of such environments include hands-free speakerphone (e.g., in a vehicle or a room), Internet/Intranet Protocol phone, and so on.
Conventionally, echo cancellation is achieved by a circuit that employs an adaptive filter. The adaptive filter performs echo cancellation by deriving an estimate of the echo based on a reference signal, which may be a line output from a communication or telematics device such as a cellular phone or some other device. The adaptive filter is typically able to remove the portion of the echo that is correlated to the reference signal.
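By way of illustration only, the adaptive-filter approach described above may be sketched as follows. The sketch assumes a normalized least-mean-squares (NLMS) update, which is one common choice and is not named in the text; the function name `nlms_echo_canceller`, the tap count, the step size, and the three-tap echo path are all hypothetical values chosen for the example.

```python
import random

def nlms_echo_canceller(reference, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the microphone signal.

    reference: samples sent to the loudspeaker (the reference signal).
    mic:       samples picked up by the microphone (contains the echo).
    Returns (residual, weights); residual is mic minus the estimated echo.
    NLMS is an assumed update rule, not one taken from the text.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalize the step by the reference power in the filter window.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights

# Synthetic demonstration: a linear, time-invariant echo path (hypothetical).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3, 0.2]
mic = [
    sum(echo_path[k] * reference[t - k] for k in range(len(echo_path)) if t - k >= 0)
    for t in range(len(reference))
]
residual, weights = nlms_echo_canceller(reference, mic)
```

In this idealized case the echo is a purely linear function of the reference signal, so the residual decays toward zero as the weights approach the echo path. Echo components that are not correlated with the reference signal would remain in the residual.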
However, conventional echo cancellation techniques are not able to remove certain portions of the echo. For example, nonlinearity of the circuitry in the system (e.g., the speaker, analog-to-digital (A/D) converter, digital-to-analog (D/A) converter, and so on) generates echo that is not correlated to the reference signal. This type of echo cannot be canceled by conventional echo cancellation techniques that employ only an adaptive filter. Moreover, user movement, position changes in the microphones and loudspeakers, and volume changes can cause the echo path to vary. This results in time-varying echo that typically cannot be canceled very well, particularly if the echo path changes faster than the convergence rate of the adaptive filter.
Nonlinear echo cancellation techniques may be used to attempt to cancel the residual echo that is not canceled by the adaptive filter in the echo canceller. However, these techniques typically cannot cancel echo due to serious nonlinearity. Nonlinear echo may be caused by various conditions such as an overdriven loudspeaker, a microphone in saturation, mechanical vibration, and so on. These techniques also cannot handle high-volume echo. Moreover, some conventional nonlinear echo cancellation techniques, such as a center clipper, can cause voice distortion by cutting off low-power voice signals. Other conventional nonlinear echo cancellation techniques, such as conventional post filters, also cannot deal with large echo and serious nonlinearity.
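As a minimal sketch of why a center clipper can distort voice, the example below zeroes every sample whose magnitude does not exceed a threshold. The function name `center_clip`, the threshold, and the sample values are hypothetical and chosen only for illustration.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude does not exceed the threshold."""
    return [s if abs(s) > threshold else 0.0 for s in samples]

# Hypothetical frame: small residual echo (0.01, -0.02), loud speech
# (0.5, -0.7), and a quiet speech sample (0.03).
frame = [0.01, -0.02, 0.5, -0.7, 0.03]
clipped = center_clip(frame, threshold=0.05)
```

The small residual echo samples are removed, but so is the quiet speech sample, since the clipper cannot distinguish low-power voice from low-power echo. Echo whose magnitude exceeds the threshold would pass through unchanged.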
Many communication systems and voice recognition devices are designed for use in noisy environments. Examples of such applications include communication and/or voice recognition in cars or mobile environments (e.g., on street). For these applications, the microphones in the system pick up not only the desired voice but noise as well. The noise can degrade the quality of voice communication and speech recognition performance if it is not dealt with in an effective manner.
Noise suppression is often required in many communication systems and voice recognition devices to suppress noise and to improve communication quality and voice recognition performance. Noise suppression may be achieved using various techniques, which may be classified as single microphone techniques and array microphone techniques.
Single microphone noise reduction techniques typically use spectral subtraction to reduce the amount of noise in a noisy speech signal. With spectral subtraction based techniques, the power spectrum of the noise is estimated and then subtracted from the power spectrum of the noisy speech signal. The phase of the resultant enhanced speech signal is maintained equal to the phase of the noisy speech signal so that the speech signal is minimally distorted. The spectral subtraction based techniques are effective in reducing stationary noise but are not very effective in reducing non-stationary noise. Moreover, even for stationary noise reduction, these techniques can cause distortion in the speech signal at low signal-to-noise ratio (SNR).
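The spectral subtraction steps described above may be illustrated, under simplifying assumptions, by the sketch below. It processes a single frame, uses a direct discrete Fourier transform written out for self-containment, and takes the noise power spectrum from a separate noise-only frame; the function names, the frame length, and the two test tones are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def spectral_subtract(noisy_frame, noise_power, floor=0.0):
    """Subtract an estimated noise power spectrum, keeping the noisy phase."""
    enhanced = []
    for bin_value, n_pow in zip(dft(noisy_frame), noise_power):
        # Power-domain subtraction, clamped so the result is never negative.
        clean_power = max(abs(bin_value) ** 2 - n_pow, floor)
        # The phase of the noisy signal is reused unchanged.
        enhanced.append(cmath.rect(math.sqrt(clean_power), cmath.phase(bin_value)))
    return idft(enhanced)

# Hypothetical frame: a "speech" tone in bin 2 plus a stationary "noise" tone in bin 5.
n = 16
speech = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noise = [0.3 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [s + v for s, v in zip(speech, noise)]
noise_power = [abs(b) ** 2 for b in dft(noise)]  # estimated from a noise-only frame
enhanced = spectral_subtract(noisy, noise_power)
```

Here the noise is perfectly stationary and occupies different frequency bins from the speech, so the speech tone is recovered almost exactly. When the noise changes between the estimate and the frame being processed, or when noise and speech share bins at low SNR, the subtraction leaves residual noise or distorts the speech.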
Array microphone noise reduction techniques use multiple microphones that are placed at different locations and are separated from each other by some minimum distance to form a beam. Conventionally, the beam is used to pick up speech that is then used to reduce the amount of noise picked up outside of the beam. The array microphone techniques can suppress non-stationary noise but are not effective in reducing noise in a reverberant environment (i.e., diffuse noise).
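As one simple illustration of beam forming, the sketch below implements a delay-and-sum beamformer with integer sample delays. This is a basic stand-in and not necessarily the conventional technique referred to above; the function name `delay_and_sum`, the two-microphone geometry, and the pulse positions are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its steering delay (in samples) and average.

    channels: equal-length lists of samples, one list per microphone.
    delays:   integer lag of the desired source on each microphone,
              relative to the first microphone (an assumed input).
    """
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay
            if 0 <= src < n:
                out[t] += channel[src]
    return [v / len(channels) for v in out]

# Hypothetical two-microphone capture, 16 samples long.
# Desired source: pulse at t=3 on mic 0 and t=5 on mic 1 (lag of 2 samples).
# Interferer from another direction: pulse at t=10 on mic 0 and t=9 on mic 1.
mic0 = [0.0] * 16
mic1 = [0.0] * 16
mic0[3], mic1[5] = 1.0, 1.0
mic0[10], mic1[9] = 1.0, 1.0
beam = delay_and_sum([mic0, mic1], delays=[0, 2])
```

With the beam steered at the desired source, its pulses add coherently and appear at full amplitude, while the interferer's pulses arrive misaligned and are each halved. Diffuse noise arrives from all directions at once, so steering of this kind provides little attenuation for it.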
For many applications, noise may continually vary and may further change dramatically due to changes in the environment. Moreover, different applications may be associated with different types and amounts of noise. For example, the noise in a car at high speed will likely be different from, and higher than, the noise in a conference room. Since different noise reduction techniques are effective at dealing with different types of noise and since different applications may be associated with different types and levels of noise, it is normally difficult to obtain good performance for a wide range of environments and noise conditions based on a single specific noise suppression technique and a single set of parameter values.
As can be seen, techniques that can effectively cancel echo and suppress noise in communication systems and voice recognition devices are highly desirable.