Growing interest in communication media, such as the Internet, electronic presentations, voice mail, and audio-conference communication systems, is increasing the demand for high-fidelity audio and communication technologies. Individuals and businesses currently use these communication media to increase efficiency and productivity while decreasing cost and complexity. For example, audio-conference communication systems allow one or more individuals at a first location to simultaneously converse with one or more individuals at other locations through full-duplex communication lines in nearly real time, without wearing headsets or using handheld communication devices.
In many audio-conference communication systems, audio signals carry a large amount of data and span a broad range of frequencies. Modern audio-conference communication systems attempt to provide clear transmission of audio signals over a single channel, also called a “monochannel,” free from perceivable distortion, background noise, and other undesired audio artifacts. One common type of undesired audio artifact is an acoustic echo. Acoustic echoes can occur when a transmitted audio signal loops through an audio-conference communication system due to the coupling of a microphone and a speaker at a location. FIG. 1 shows a schematic diagram of an exemplary, two-location, monochannel audio-conference communication system 100. The audio-conference communication system 100 includes a near room 102 and a far room 104. Sounds, such as voices, produced in the near room 102 are detected by a microphone 106, and sounds produced in the far room 104 are detected by a microphone 108. Microphones 106 and 108 convert sounds into signals represented by y(t) and x(t), respectively, where t represents time.
The microphone 106 can detect many different sounds produced in the near room 102, including sounds output by a loudspeaker 114. An analog signal produced by the microphone 106 is represented by:
y(t) = s(t) + e(x(t)) + v(t)
where
s(t) is an analog signal representing sounds produced in the near room 102,
v(t) is an analog signal representing noise, or extraneous signals created by disturbances in the microphone or communication channel 110, that, for example, may produce an annoying buzzing sound output from the loudspeaker 116, and
e(x(t)) is an analog signal representing an acoustic echo.
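As a rough illustration of the signal model above, the following sketch builds a microphone signal from a near-room signal, an echo, and noise. It makes two assumptions that the text does not state: the signals are sampled (discrete-time) rather than analog, and the echo e(x(t)) is modeled as a convolution of x with a short echo-path impulse response `h`. All function names and sample values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x_i in enumerate(signal):
        for j, h_j in enumerate(impulse_response):
            out[i + j] += x_i * h_j
    return out


def microphone_signal(s, x, v, h):
    """Return y[n] = s[n] + e(x)[n] + v[n].

    Assumption: the echo e(x) is the convolution of x with the echo path h,
    truncated to the length of s.
    """
    echo = convolve(x, h)[:len(s)]
    return [s_n + e_n + v_n for s_n, e_n, v_n in zip(s, echo, v)]


# Hypothetical example values (not taken from the source).
x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # far-room signal: a single unit impulse
h = [0.0, 0.0, 0.5, 0.25]            # assumed echo path: delayed, attenuated copies
s = [0.0] * 6                        # no one speaking in the near room
v = [0.0] * 6                        # no noise

y = microphone_signal(s, x, v, h)
print(y)  # [0.0, 0.0, 0.5, 0.25, 0.0, 0.0]
```

With silence and no noise in the near room, the microphone signal consists only of the delayed, attenuated copy of x, which is the component an echo-reduction method tries to remove.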
The acoustic echo e(x(t)) is due to both acoustic propagation delay in the near room 102 and a round-trip transmission delay of the analog signal x(t) over the communication channels 110 and 112. Sounds generated by the analog signal y(t) are output from a loudspeaker 116 in the far room 104. Depending on the amplification, or gain, in the amplitude of the signal y(t) and the magnitude of the acoustic echo e(x(t)), a person speaking into the microphone 108 in the far room 104 may hear, in addition to the sounds carried by the signal s(t), an echo or an annoying, high-pitched, howling sound emanating from the loudspeaker 116 as a result of the sound generated by the acoustic echo e(x(t)). Designers and manufacturers of audio-conference communication systems have attempted to compensate for acoustic echoes in various ways. One compensation technique employs a filtering system that reduces the acoustic echo. Typically, filtering systems employ adaptive filters that adapt to changing conditions at an audio-signal-receiving location.
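The adaptive-filter idea mentioned above can be sketched with a normalized least-mean-squares (NLMS) filter. NLMS is only one common choice of adaptive filter and is used here as an assumption; the text does not name a specific algorithm. The sketch also assumes sampled signals, a short hypothetical echo path, and a microphone that hears only the echo.

```python
import random


def nlms_echo_canceller(x, y, num_taps, step=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path with NLMS.

    x: loudspeaker (reference) samples; y: microphone samples.
    Returns (residual, weights), where residual[n] is y[n] minus the
    estimated echo at sample n.
    """
    w = [0.0] * num_taps
    history = [0.0] * num_taps       # recent reference samples, newest first
    residual = []
    for x_n, y_n in zip(x, y):
        history = [x_n] + history[:-1]
        echo_estimate = sum(w_k * u_k for w_k, u_k in zip(w, history))
        err = y_n - echo_estimate
        energy = sum(u_k * u_k for u_k in history) + eps
        # Normalized update: step size scaled by the reference energy.
        w = [w_k + step * err * u_k / energy for w_k, u_k in zip(w, history)]
        residual.append(err)
    return residual, w


random.seed(0)
true_path = [0.0, 0.6, -0.3, 0.1]    # hypothetical echo path
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Microphone signal containing only the echo (no near-end speech or noise).
y = [sum(true_path[k] * x[n - k] for k in range(len(true_path)) if n - k >= 0)
     for n in range(len(x))]

residual, w = nlms_echo_canceller(x, y, num_taps=4)
early = sum(e * e for e in residual[:100])
late = sum(e * e for e in residual[-100:])
print(late < early)
```

In this noise-free, single-channel setting, the residual energy falls as the filter weights approach the assumed echo path. Near-end speech and noise, which a practical system must handle, are deliberately left out of the sketch.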
In recent years, there has been increasing interest in developing multichannel audio communication systems in an effort to enhance the audio-conference experience. Multichannel systems employ a plurality of microphones and loudspeakers in the near and far rooms, creating a plurality of acoustic echoes that are each separated by several hundred milliseconds of communication delay, which can be a significant obstacle to effectively deploying multichannel audio-conference communication systems. Methods for reducing these echoes typically approximate the plurality of echo paths by sending excitation signals to the loudspeakers, which produce impulse responses characterizing each of the echo paths in the room. These approximate impulse responses are convolved with the sent signals to produce approximate acoustic echoes that are subtracted from the return signals. A significant challenge in these multichannel systems is the spatial correlation of the excitation signals sent to the loudspeakers to approximate the echo paths.
Multichannel echo control has been a challenging problem due to the inherent instability in approximating echo-path impulse responses. In particular, at any time, an acoustic echo cancellation system faces a situation where an infinite collection of candidate approximate impulse responses can be used to remove the echo. Among all these candidates, there is just one true impulse response and a small percentage of good approximate impulse responses. The remaining approximate impulse responses result in an unstable, impractical system. This phenomenon is known as the “non-uniqueness problem,” and the challenge is how to identify good approximate impulse responses.
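The non-uniqueness problem can be illustrated with a small two-loudspeaker sketch. It assumes sampled signals and assumes that both loudspeaker signals are scaled copies of one talker's signal, which is the simplest form of channel correlation. The echo paths, gains, and function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x_i in enumerate(signal):
        for j, h_j in enumerate(impulse_response):
            out[i + j] += x_i * h_j
    return out


def total_echo(channels, paths):
    """Sum of each loudspeaker signal convolved with its echo path."""
    total = [0.0] * (len(channels[0]) + len(paths[0]) - 1)
    for x, h in zip(channels, paths):
        total = [a + b for a, b in zip(total, convolve(x, h))]
    return total


def residual_energy(source, gains, true_paths, estimated_paths):
    """Energy left after subtracting the estimated echo from the true echo."""
    # Assumption: each loudspeaker channel is a scaled copy of one source.
    channels = [[g * s for s in source] for g in gains]
    mic = total_echo(channels, true_paths)
    estimate = total_echo(channels, estimated_paths)
    return sum((m - e) ** 2 for m, e in zip(mic, estimate))


source = [1.0, -2.0, 3.0, 0.5]
true_paths = [[0.5, 0.25], [0.25, 0.125]]       # hypothetical true echo paths
wrong_paths = [[1.0, -0.25], [-0.75, 1.125]]    # a different pair of estimates

# With gains (1.0, 0.5), the wrong pair removes the echo just as well.
same_position = residual_energy(source, (1.0, 0.5), true_paths, wrong_paths)
# When the talker moves and the gains change, the wrong pair fails.
moved_talker = residual_energy(source, (0.5, 1.0), true_paths, wrong_paths)
print(same_position, moved_talker)
```

The wrong pair was chosen so that its gain-weighted sum matches that of the true pair for the first set of gains, which is why it removes the echo exactly in that case. Once the gains change, only estimates close to the true echo paths keep the residual small.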
A variety of algorithms have been developed to address the non-uniqueness problem. For example, designers and manufacturers have developed methods that employ nonlinear or time-variant functions to decorrelate excitation signals prior to exciting the loudspeakers. However, these methods often lead to distortions of spatial and temporal attributes of the audio signals that ultimately diminish the spatial audio experience. Other methods attempt to approximate the space of echo paths by a finite number of set-theoretic constraints. In general, these methods do not distort the excitation signals, but they do not resolve the non-uniqueness problem, and as a result, these methods are slower to converge and have higher levels of residual echoes.
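One well-known example of a nonlinear function of this kind adds a half-wave-rectified copy of each channel to itself, using the positive half-wave on one channel and the negative half-wave on the other. The sketch below is an illustration under that assumption, not a method described in this text. The parameter `alpha`, the sample values, and the function names are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def half_wave_decorrelate(left, right, alpha=0.5):
    """Add the positive half-wave to one channel and the negative to the other.

    alpha controls how much nonlinear distortion is introduced.
    """
    new_left = [x + alpha * max(x, 0.0) for x in left]
    new_right = [x + alpha * min(x, 0.0) for x in right]
    return new_left, new_right


# Worst case for echo-path identification: both channels carry the same signal.
source = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]
left, right = list(source), list(source)

before = correlation(left, right)
new_left, new_right = half_wave_decorrelate(left, right)
after = correlation(new_left, new_right)
print(before, after)
```

The correlation drops only slightly, and the reduction comes from deliberately distorting both channels, which reflects the trade-off noted above: a larger `alpha` decorrelates more but alters the audio more.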
Although there have been a number of advances in multichannel communications in recent years, designers, manufacturers, and users of multichannel audio-conference communication systems continue to seek enhancements that reliably remove acoustic echoes from audio signals in real time and rapidly adapt to the changing conditions at audio-signal-receiving locations.