Many portable electronic devices, such as interactive video game controllers, are capable of handling two-way audio signals. Such a device typically includes a microphone that receives a local speech signal s(t) from a user of the device and a speaker that emits a speaker signal x(t) that is audible to the user. To make the video game controller more compact, it is often desirable to place the microphone and speaker relatively close to each other, e.g., within about 10 centimeters of each other. The user, by contrast, may be much farther from the microphone, e.g., about 3 to 5 meters away. The microphone produces a signal d(t) that includes both the local speech signal s(t) and a speaker echo signal x1(t). In addition, the microphone may pick up background noise n(t), so that the overall microphone signal is d(t)=s(t)+x1(t)+n(t). Due to the relative proximity of the speaker, the microphone signal d(t) may be dominated by the speaker echo signal x1(t).
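The signal model above can be illustrated with a short sketch. It assumes, purely for illustration, that the echo x1 is a delayed and scaled copy of the speaker signal x; the names mic_signal, echo_gain and echo_delay are hypothetical and do not come from the text.

```python
def mic_signal(speech, speaker, noise, echo_gain=0.8, echo_delay=2):
    """Build d[n] = s[n] + x1[n] + n[n] for sampled signals.

    Assumption: the echo x1 is modeled as the speaker signal delayed by
    echo_delay samples and scaled by echo_gain. A real acoustic path also
    adds reverberation, which this sketch ignores.
    """
    d = []
    for n in range(len(speech)):
        x1 = echo_gain * speaker[n - echo_delay] if n >= echo_delay else 0.0
        d.append(speech[n] + x1 + noise[n])
    return d


def power(signal):
    """Mean squared value of a sampled signal."""
    return sum(v * v for v in signal) / len(signal)
```

With a quiet, distant talker (small s) and a nearby speaker (large x1), power(d) is close to the power of the echo alone, which is the sense in which the echo dominates the microphone signal.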
Speaker echo is a commonly observed phenomenon in telecommunications applications, and echo suppression and echo cancellation are relatively mature technologies. Echo suppressors work by detecting whether there is a voice signal going in one direction on a circuit, and then inserting a great deal of loss in the other direction. Usually the echo suppressor at the far end of the circuit adds this loss when it detects voice coming from the near end of the circuit. This added loss prevents the speaker signal x(t) from being retransmitted in the microphone signal d(t). Echo cancellation may be implemented in software based on a room impulse model that includes the effects of the acoustics of the room in which the microphone and speaker are located. Such echo cancellation often uses the speaker signal x(t) as a reference.
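A minimal sketch of the echo suppression idea described above follows. It assumes a frame-based energy detector; the function name suppress_echo and the parameters frame, threshold and loss are hypothetical choices for illustration, not values taken from the text.

```python
def suppress_echo(far_end, mic, frame=4, threshold=0.01, loss=0.05):
    """Attenuate the microphone path whenever the other direction is active.

    Assumption: voice activity is detected by comparing the mean energy of
    each far-end frame against a fixed threshold. Practical suppressors use
    more robust detectors and smooth the gain between frames.
    """
    out = []
    for start in range(0, len(mic), frame):
        far = far_end[start:start + frame]
        energy = sum(v * v for v in far) / max(len(far), 1)
        # Insert loss in the return direction while far-end voice is detected.
        gain = loss if energy > threshold else 1.0
        out.extend(gain * v for v in mic[start:start + frame])
    return out
```

Because the loss is applied to the whole microphone signal, local speech is attenuated along with the echo whenever both directions are active, which is one reason cancellation is often preferred over suppression.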
The Acoustic Echo Cancellation (AEC) process works as follows. The received speaker signal x(t) is digitally sampled to form a reference signal r(t). The reference signal r(t) is then used to produce sound with the speaker. The microphone picks up the resulting direct-path sound and the consequent reverberant sound, which are converted to the microphone signal d(t). The microphone signal d(t) is digitally sampled and filtered to extract the echo signal x1(t). The reference signal and echo signal are compared. The reference signal is then summed with the echo signal 180° out of phase, which in an ideal system results in perfect cancellation of the echo. This process continues for every sample.
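The ideal case can be sketched as follows. Summing 180° out of phase is equivalent to subtraction, so the sketch subtracts an echo estimate from the microphone samples. It assumes the acoustic path is known exactly and is represented by a short impulse response; the names echo_estimate, cancel and path are hypothetical.

```python
def echo_estimate(reference, path):
    """Convolve the reference with an echo-path impulse response.

    Assumption: `path` holds the impulse response of the speaker-to-microphone
    path. The output is truncated to the length of the reference.
    """
    estimate = []
    for n in range(len(reference)):
        acc = 0.0
        for k, h in enumerate(path):
            if n - k >= 0:
                acc += h * reference[n - k]
        estimate.append(acc)
    return estimate


def cancel(mic, reference, path):
    """Subtract the echo estimate from the microphone signal, sample by sample."""
    estimate = echo_estimate(reference, path)
    return [d - y for d, y in zip(mic, estimate)]
```

When the assumed path matches the real one, what remains after subtraction is the local speech plus any background noise. In practice the path is not known in advance and has to be estimated.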
There are two main issues that echo cancellers must deal with. The first is the changes and additions to the original signal caused by imperfections of the loudspeaker, microphone, reverberant space and physical coupling. The second, discussed below, is that these changes themselves vary over time. The first problem is dealt with by the room model, which models the acoustic space in the time and frequency domains. AEC algorithms approximate the result of the next sample by comparing the difference between the current and previous samples. Specifically, a sound may be sampled pre-speaker and post-microphone, and the two samples then compared for initial differences in frequency content and for frequencies that persist longer than they did in the original sample. These differences may be visualized with a Fourier transform. The resulting information may then be used to predict how the next sound will be altered by the acoustic path. The model of the acoustic space (sometimes referred to as the room model) is therefore continually updated.
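One common way to keep such a model continually updated is a normalized least-mean-squares (NLMS) adaptive filter. The text does not name a specific algorithm, so the sketch below is only an illustration of the general idea; the function name nlms and the parameters taps, mu and eps are assumptions.

```python
def nlms(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Adapt a finite impulse response model of the echo path with NLMS.

    Assumption: the echo path is approximated by `taps` coefficients that
    are updated on every sample from the cancellation error.
    Returns the final coefficients and the per-sample error signal.
    """
    w = [0.0] * taps      # current model of the acoustic path
    buf = [0.0] * taps    # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo
        e = d - y                                    # residual after cancellation
        norm = sum(b * b for b in buf) + eps         # normalizes the step size
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors
```

Each update nudges the coefficients in the direction that would have reduced the current error, so the model tracks the acoustic path as long as the path changes slowly relative to the adaptation rate.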
The changing nature of the sampled signal is mainly due to changes in the acoustic environment, not to the characteristics of the loudspeaker, microphone or physical coupling. These changes result from objects moving in the environment and from movement of the microphone within that environment. For this reason, the cancellation algorithm also has a degree of aggressive adaptation called Non-Linear Processing. This allows the algorithm to make changes to the model of the acoustic path that are suggested, but not yet confirmed, by comparison of the two signals.
Echo cancellation algorithms can tolerate a certain amount of latency between the speaker signal x(t) and the microphone signal d(t). However, this latency may sometimes be larger than anticipated.
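The latency between the two signals can be measured in several ways. The sketch below uses a brute-force cross-correlation search, which is a common general-purpose technique and is not a method stated in the text; the function name estimate_delay and the parameter max_lag are hypothetical.

```python
def estimate_delay(reference, mic, max_lag):
    """Return the lag, in samples, at which the reference best aligns with mic.

    Assumption: the echo in `mic` is roughly a delayed copy of `reference`,
    and the true delay does not exceed `max_lag`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        stop = min(len(mic), len(reference) + lag)
        # Correlate mic[n] with the reference sample emitted `lag` samples earlier.
        score = sum(reference[n - lag] * mic[n] for n in range(lag, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

If the estimated lag exceeds the span covered by the canceller's echo-path model, the model cannot represent the echo, which illustrates why unexpectedly large latency is a problem.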
It is within this context that embodiments of the present invention arise.