Many portable electronic devices, such as interactive video game controllers, are capable of handling two-way audio signals. Such a device typically includes a microphone that receives a local speech signal s(t) from a user of the device and a speaker that emits a speaker signal x(t) that is audible to the user. To make the video game controller more compact, it is often desirable to place the microphone and speaker relatively close to each other, e.g., within about 10 centimeters of each other. The user, by contrast, may be much farther from the microphone, e.g., about 3 to 5 meters away. The microphone produces a signal d(t) that includes both the local speech signal s(t) and a speaker echo signal x1(t). In addition, the microphone may pick up background noise n(t), so that the overall microphone signal is d(t)=s(t)+x1(t)+n(t). Due to the relative proximity of the speaker, the microphone signal d(t) may be dominated by the speaker echo signal x1(t).
Speaker echo is a commonly observed phenomenon in telecommunications applications, and echo suppression and echo cancellation are relatively mature technologies. Echo suppressors work by detecting whether there is a voice signal going in one direction on a circuit, and then inserting a great deal of loss in the other direction. Usually the echo suppressor at the far-end of the circuit adds this loss when it detects voice coming from the near-end of the circuit. This added loss prevents the speaker signal x(t) from being retransmitted in the microphone signal d(t).
While effective, echo suppression often leads to several problems. For example, it is common for the local speech signal s(t) and the remote speaker signal x(t) to occur at the same time, at least briefly. This situation is sometimes referred to as double-talk; by contrast, the situation where only the remote speaker signal is present is sometimes referred to as remote single talk. During double-talk, each echo suppressor detects voice energy coming from the far-end of the circuit, so the effect would ordinarily be for loss to be inserted in both directions at once, effectively blocking both parties. To prevent this, echo suppressors can be set to detect voice activity from the near-end speaker and to refrain from inserting loss (or to insert a smaller loss) when both the near-end speaker and far-end speaker are talking. Unfortunately, this temporarily defeats the primary purpose of having an echo suppressor at all.
In addition, since the echo suppressor is alternately inserting and removing loss, there is frequently a small delay when a new speaker begins talking that causes clipping of the first syllable from that speaker's speech. Furthermore, if the far-end party on a call is in a noisy environment, the near-end speaker will hear that background noise while the far-end speaker is talking, but the echo suppressor will suppress this background noise when the near-end speaker starts talking. The sudden absence of the background noise gives the near-end user the impression that the line has gone dead.
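The suppressor behavior described above, including the double-talk exception, can be sketched as a simple per-frame gain decision. This is only an illustrative sketch: the function name `suppressor_gain`, the boolean voice-activity inputs, and the loss values of 40 dB and 6 dB are assumptions chosen for the example and do not come from any particular suppressor design.

```python
def suppressor_gain(far_active, near_active,
                    loss_db=40.0, double_talk_loss_db=6.0):
    """Return the linear gain applied to the outgoing (near-end) path.

    far_active / near_active are hypothetical voice-activity flags for one
    frame. The loss values are illustrative assumptions only.
    """
    if far_active and near_active:
        # Double-talk: insert a smaller loss so neither party is blocked.
        loss = double_talk_loss_db
    elif far_active:
        # Remote single talk: insert the full loss to suppress the echo.
        loss = loss_db
    else:
        # No far-end voice detected: pass the near-end signal unchanged.
        loss = 0.0
    return 10.0 ** (-loss / 20.0)


# Gains for remote single talk, double-talk, and near-end-only speech.
single_talk_gain = suppressor_gain(True, False)
double_talk_gain = suppressor_gain(True, True)
near_only_gain = suppressor_gain(False, True)
```

Because the gain switches between these values from frame to frame, any delay in the voice-activity decision shows up as the clipped first syllable and the abrupt loss of background noise noted above.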
To address the above problems, echo cancellation techniques were developed. Echo cancellation may use some form of analog or digital filter to remove unwanted noise or echoes from an input signal and produce a filtered signal e(t). In echo cancellation, complex algorithmic procedures are used to compute speech models. This involves feeding the microphone signal d(t) and some of the remote signal x(t) to an echo cancellation processor, predicting the speaker echo signal x1(t), and then subtracting this prediction from the microphone signal d(t). The format of the echo prediction must be learned by the echo cancellation processor in a process known as adaptation.
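As a minimal sketch of the predict-and-subtract structure and of adaptation, the example below uses a normalized least-mean-squares (NLMS) adaptive FIR filter. NLMS is a common adaptive filtering technique chosen here purely for illustration; the text above does not name a specific algorithm. The function name `nlms_echo_cancel`, the parameters `taps`, `mu`, and `eps`, and the synthetic echo path `true_path` are all assumptions.

```python
import random


def nlms_echo_cancel(x, d, taps=8, mu=0.5, eps=1e-8):
    """Predict the echo of x contained in d and subtract it.

    x: speaker (remote) samples; d: microphone samples.
    Returns (e, w): the filtered signal and the learned filter weights.
    """
    w = [0.0] * taps    # current estimate of the echo path
    buf = [0.0] * taps  # most recent speaker samples, newest first
    e = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # predicted echo x1
        err = dn - y                                # filtered output e
        norm = sum(b * b for b in buf) + eps        # input energy in buffer
        # Adaptation: move the weights in the direction that reduces err.
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        e.append(err)
    return e, w


# Synthetic remote single talk: the microphone hears only the echo
# (no local speech s and no noise n), through a hypothetical echo path.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
true_path = [0.6, -0.3, 0.1]
d = [sum(h * x[n - k] for k, h in enumerate(true_path) if n - k >= 0)
     for n in range(len(x))]

e, w = nlms_echo_cancel(x, d)
```

In this idealized setting the learned weights approach the synthetic echo path and the residual in `e` becomes very small. With local speech or background noise present in d(t), the error signal no longer reflects the echo alone, which is one reason adaptation is harder during double-talk.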
The effectiveness of such techniques is measured by an echo suppression ratio (ESR), which is the ratio (typically expressed in decibels) of the true echo energy received at the microphone to the residual echo energy left in the filtered signal e(t). According to standards defined by the International Telecommunication Union (ITU), the echo requires an attenuation (ESR) of at least 45 dB in the case of remote single talk. During double-talk (or during strong background noise), this attenuation can be lowered to 30 dB. However, these recommendations were developed for systems where the user generating the local speech signal is much closer to the microphone, so that the recorded SNR (ratio of target voice energy to echo noise energy) is mostly better than 5 dB. For applications such as video game controllers, where the user may be 3 to 5 meters away and a loudspeaker plays loud echoes less than 0.5 meter from an open microphone, the resulting SNR may be as low as −15 dB to −30 dB. In such cases, an ESR greater than about 60 dB may be required for remote single talk, and 35 dB for double-talk. Existing echo cancellation techniques cannot achieve such a high level of ESR.
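The ESR defined above can be computed directly from sampled signals. The sketch below assumes that the true echo and the residual echo are available as separate sample sequences, which is generally only possible in simulation or controlled measurement; the function name `esr_db` and the sample values are hypothetical.

```python
import math


def esr_db(echo, residual):
    """Echo suppression ratio in dB: echo energy over residual echo energy."""
    echo_energy = sum(v * v for v in echo)
    residual_energy = sum(v * v for v in residual)
    if echo_energy == 0.0 or residual_energy == 0.0:
        raise ValueError("ESR is undefined when either energy is zero")
    return 10.0 * math.log10(echo_energy / residual_energy)


# An amplitude reduction of 1000:1 is an energy ratio of 10**6, i.e. 60 dB.
echo = [1.0] * 100
residual = [0.001] * 100
ratio = esr_db(echo, residual)
```

Each additional 20 dB of ESR corresponds to a further tenfold reduction in residual echo amplitude, so moving from a 45 dB requirement to about 60 dB means the residual must be roughly 5.6 times smaller in amplitude.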
Thus, there is a need in the art for an echo cancellation system and method that overcomes the above disadvantages.