Many audio applications use both an audio transducer, such as a loudspeaker, and a microphone in the same audio environment. For example, telephone or teleconferencing applications typically employ speakers and microphones in close proximity.
However, acoustic coupling between the loudspeaker and the microphone causes the microphone signal to include elements of the sound rendered from the loudspeaker, which is often disadvantageous.
For example, for telephone and teleconference devices, the acoustic coupling between the device's loudspeaker and microphone causes a portion of the produced loudspeaker signal to be captured by the microphone and transmitted back to the far-end user, resulting in a disturbance known as an acoustic echo. It is usually assumed that this echo path can be sufficiently modeled using a linear filter, since the microphone picks up reflections of the loudspeaker signal with different delays and intensities, depending on the acoustic environment where the device is being used. Therefore, linear echo cancellers are widely used to reduce the acoustic echo.
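As an illustration of the linear echo canceller described above, the following is a minimal sketch of a normalized least-mean-squares (NLMS) adaptive filter, one common way of implementing such a canceller. The function names, the four-tap `echo_path`, the step size and the white-noise far-end signal are all assumptions chosen for the example and do not come from the text.

```python
import random


def simulate_linear_echo(far_end, echo_path):
    """Convolve the far-end signal with a (hypothetical) linear echo path."""
    out = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(echo_path):
            if n - k >= 0:
                acc += h * far_end[n - k]
        out.append(acc)
    return out


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Return the residual signal and the final filter weights."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # microphone signal minus estimated echo
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.2, 0.1]  # assumed echo path, illustration only
mic = simulate_linear_echo(far_end, echo_path)  # echo only, no near-end speech
residual, weights = nlms_echo_canceller(far_end, mic)
```

In this idealized setting (a purely linear path, no noise and no near-end talker), the adapted weights approach the assumed echo path and the residual decays towards zero. The nonlinear effects discussed next are exactly what breaks this assumption.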
In practice, however, and depending on the device, the acoustic echo path also includes components such as the audio amplifier and the loudspeaker, which often exhibit nonlinear characteristics. Therefore, purely linear echo cancellation tends to be suboptimal and is often unable to remove the acoustic echo completely.
The primary causes of nonlinearities in loudspeakers are the non-uniform magnetic flux density and nonlinear suspension system. The latter mainly contributes to distortion at low frequencies, while the former is exacerbated by high-amplitude signals. In effect, large cone excursions, particularly those outside of the loudspeaker's linear operating range, result in nonlinear distortions.
In more detail, the behavior of a loudspeaker system may be considered for different frequency ranges of an input signal. For frequencies above the resonance frequency, the loudspeaker can be characterized by its voice coil's resistance and inductance. Therefore, as the input power to the voice coil increases and the excursions become large enough for the coil to fall outside of the magnetic field, the driving force decreases, resulting in a form of compression or clipping effect.
For low frequencies, the loudspeaker is predominantly characterized by its moving system impedance which is proportional to the power of the magnetic flux. This means that as the voice coil moves outside of the magnetic field, this impedance decreases and therefore instead of clipping, the amplitude of the current in the voice coil actually increases before the loudspeaker's suspension system limits the excursion.
Proper modeling of the nonlinear behavior of loudspeakers remains a challenging topic in the field of acoustic echo cancellation. This is especially true for hands-free communication applications where low-cost audio components such as amplifiers and loudspeakers are used. These components are often driven into their nonlinear range of operation in order to achieve a high sound output level required for such applications. The resulting nonlinear distortion not only limits the performance of acoustic echo cancellers, which usually assume a linear impulse response between the loudspeaker and microphone, but also affects the perceived quality of the loudspeaker signal.
Therefore, systems for managing nonlinear acoustic echo play a significant role in improving the audio quality for two-way communication systems.
In the prior art, three main classes of systems exist for cancelling or suppressing nonlinear acoustic echoes:
1. Nonlinear acoustic echo cancellation.
2. Loudspeaker linearization for linear acoustic echo cancellation.
3. Nonlinear acoustic echo suppression.
In the first type of system, the acoustic echo path nonlinearity is modeled by the acoustic echo canceller. For example, saturation of the audio amplifier can be modeled using a clipping function with a clipping level that matches that of the audio amplifier. If this clipping function is applied to the digital loudspeaker signal, then a standard linear acoustic echo canceller can be used to model the linear acoustic path between the loudspeaker and microphone. As mentioned previously, the loudspeaker is also a source of nonlinearities. Unlike a clipping function, which is memoryless, loudspeaker nonlinearities usually contain some form of memory and are most commonly modeled by a Volterra series expansion, which is computationally quite expensive. While low-cost versions of Volterra-based algorithms exist, such as the power series expansion, these often still require signal orthogonalization methods, which can themselves be computationally intensive.
A major drawback of the first type of system is that the model is required to closely match the underlying physical system. This typically cannot be achieved with a high degree of accuracy. Furthermore, such systems tend to be computationally very intensive.
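The clipping-based model of amplifier saturation can be sketched as follows. The example applies a hard clipping function to the digital loudspeaker signal before a linear FIR echo path, and compares the modeling error when the modeled clipping level matches the amplifier, when it is mismatched, and when clipping is ignored altogether. The signal values, clipping levels and two-tap echo path are hypothetical and chosen purely for illustration.

```python
def hard_clip(samples, level):
    """Memoryless clipping: limit every sample to [-level, +level]."""
    return [max(-level, min(level, s)) for s in samples]


def fir_filter(samples, taps):
    """Apply a linear FIR filter (the assumed linear acoustic path)."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def error_energy(reference, estimate):
    """Sum of squared differences between two equal-length signals."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate))


far_end = [0.2, 0.9, -1.4, 0.5, 1.2, -0.3]  # digital loudspeaker signal
echo_path = [0.5, 0.25]                     # assumed linear acoustic path
amp_clip_level = 1.0                        # assumed amplifier saturation level

# Echo as it would appear at the microphone with a saturating amplifier.
mic_echo = fir_filter(hard_clip(far_end, amp_clip_level), echo_path)

# Three candidate models of that echo.
matched = fir_filter(hard_clip(far_end, 1.0), echo_path)
mismatched = fir_filter(hard_clip(far_end, 0.8), echo_path)
linear_only = fir_filter(far_end, echo_path)

err_matched = error_energy(mic_echo, matched)
err_mismatched = error_energy(mic_echo, mismatched)
err_linear_only = error_energy(mic_echo, linear_only)
```

Only the model with the matching clipping level reproduces the echo without error in this toy setup. Both the mismatched and the purely linear model leave a residual, which reflects the sensitivity to model accuracy noted above.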
The second type of system applies a nonlinear function to the loudspeaker signal so that the concatenation of this function with the loudspeaker's response approximates a linear function, and thus the loudspeaker signal captured by the device's microphone is approximately a linear function of the original loudspeaker signal. Accordingly, standard linear adaptive filters can be used to model this linear function and perform acoustic echo cancellation.
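The linearization idea can be illustrated with a deliberately simplified sketch. It assumes the loudspeaker behaves as a memoryless soft saturation, `tanh(drive * x) / drive`, which is an assumption made for the example only, since real loudspeaker nonlinearities generally have memory. The pre-distortion function is then the inverse of that assumed curve, limited to the range in which the inverse exists.

```python
import math


def loudspeaker(sample, drive=1.5):
    """Hypothetical memoryless loudspeaker model with soft saturation."""
    return math.tanh(drive * sample) / drive


def predistort(sample, drive=1.5, limit=0.999):
    """Inverse of the assumed loudspeaker curve, clamped to its valid range."""
    u = max(-limit, min(limit, drive * sample))
    return math.atanh(u) / drive


inputs = [-0.6, -0.3, 0.0, 0.3, 0.6]

# Without pre-distortion the output is compressed at higher amplitudes.
raw = [loudspeaker(x) for x in inputs]

# With pre-distortion the cascade is approximately the identity.
linearized = [loudspeaker(predistort(x)) for x in inputs]

# An input beyond the invertible range cannot be linearized.
out_of_range = loudspeaker(predistort(0.9))
```

Within the invertible range the cascade of `predistort` and `loudspeaker` is close to linear, so a linear adaptive filter could model what follows. For inputs that would require more output than the saturating model can deliver, as with the last value, the linearization fails.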
A drawback of such an approach is that it can only approximately linearize the loudspeaker's output signal, and the performance usually degrades when amplifier saturation also occurs, since such a transformation is not easy to linearize.
The third type of system is often used as a post-processing step to acoustic echo cancellation, where residual nonlinear acoustic echoes which could not be suppressed in the echo cancellation stage are suppressed. Usually this suppression is performed in the spectral amplitude domain using a spectral model of echo nonlinearity.
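A minimal sketch of suppression in the spectral amplitude domain is given below. It uses a spectral-subtraction style gain per frequency bin, which is one common form of such a suppressor and not necessarily the one used in any particular prior-art system. The residual echo magnitudes are assumed to be supplied by some spectral model of the echo nonlinearity. The bin values, the `over_subtraction` factor and the gain `floor` are hypothetical.

```python
def suppression_gains(mic_mag, echo_mag, over_subtraction=1.0, floor=0.1):
    """Per-bin real-valued gains computed from magnitudes only (no phase)."""
    gains = []
    for y, e in zip(mic_mag, echo_mag):
        if y <= 0.0:
            gains.append(floor)
            continue
        g = (y - over_subtraction * e) / y
        gains.append(max(floor, g))  # never attenuate below the floor
    return gains


def apply_gains(spectrum, gains):
    """Scale each complex bin; the phase of the input is left unchanged."""
    return [g * b for g, b in zip(gains, spectrum)]


# One frame of the echo-cancelled microphone spectrum (three bins).
mic_spectrum = [1.0 + 0.0j, 0.0 + 2.0j, 3.0 + 4.0j]
# Estimated residual echo magnitude per bin (assumed to be given).
residual_echo_mag = [0.5, 2.0, 0.0]

mic_mag = [abs(b) for b in mic_spectrum]
gains = suppression_gains(mic_mag, residual_echo_mag)
output_spectrum = apply_gains(mic_spectrum, gains)
```

Because the gain depends only on magnitudes, any near-end speech that falls in a bin dominated by the estimated residual echo is attenuated together with the echo.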
The main drawback of this approach is that, due to over-suppression and the lack of echo phase information in the spectral amplitude domain, near-end audio (and specifically speech) originating from the local environment can be severely attenuated, which may make, for example, full-duplex communication with the far-end party difficult.
In general, prior art approaches to echo cancellation tend to be complex and to result in suboptimal performance and/or high computational resource usage.
Hence, an improved approach would be advantageous, and in particular an approach allowing increased flexibility, reduced complexity, facilitated implementation, reduced resource usage and/or improved performance.