Teleconferencing plays an extremely important role in communications today. The teleconference, particularly the telephone conference call, has become routine in business, in part because it provides a convenient and inexpensive forum through which distant business interests can communicate. Internet conferencing, which provides a personal forum in which the speakers can see one another, is enormously popular on the home front, in part because it brings together distant family and friends without the need for expensive travel.
In a teleconferencing system, the sounds present in a room, hereinafter referred to as the "near-end room", such as the voice of a near-end speaker, are picked up by a microphone, transmitted to a "far-end system", and broadcast by a far-end loudspeaker. Similarly, the voice of the far-end speaker is picked up by the far-end microphones, transmitted to the near-end system, and broadcast by the near-end loudspeaker. The near-end microphone picks up the broadcast sound along with its reverberations and transmits them back to the far end, together with the desired signals generated by, for example, the speakers at the near end. The far-end speaker therefore hears himself after the sound has traveled to the near-end system and back, a delayed echo that annoys and confuses him. The problem is compounded in video and Internet conferencing systems, where the delay is even more pronounced.
The simplest way to overcome the problem of echo is to block the near-end microphone while the far-end signal is broadcast by the near-end loudspeaker. This technique, sometimes referred to as "ducking", effectively turns the link into a half-duplex channel. Problematically, if the microphone is blocked for a prolonged period to avoid transmitting the reverberations, the half-duplex behavior becomes a significant drawback because the far-end speaker will miss too much of what the near-end speaker says. In video or Internet conferencing systems, where the delay introduced by the communication lines is extreme, ducking becomes quite annoying.
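A minimal sketch of frame-based ducking, assuming the signals have already been split into frames. The energy `threshold` and the `hangover` count (frames the microphone stays muted after the far end falls silent, to cover the reverberation tail) are hypothetical parameters chosen for illustration, not values from the source.

```python
def duck(mic_frames, far_end_frames, threshold=0.01, hangover=2):
    """Half-duplex "ducking": mute near-end microphone frames while the
    far-end signal is active, plus `hangover` frames afterwards."""
    out = []
    remaining = 0  # frames left to keep muted after the far end goes quiet
    for mic, far in zip(mic_frames, far_end_frames):
        far_energy = sum(x * x for x in far) / len(far)  # mean power of the frame
        if far_energy > threshold:
            remaining = hangover           # far end active: (re)arm the hangover
            out.append([0.0] * len(mic))   # block the microphone
        elif remaining > 0:
            remaining -= 1                 # reverberation tail: still blocked
            out.append([0.0] * len(mic))
        else:
            out.append(list(mic))          # far end silent: pass the microphone through
    return out

# One active far-end frame mutes that frame plus the next two.
mic = [[1.0]] * 5
far = [[1.0], [0.0], [0.0], [0.0], [0.0]]
print(duck(mic, far))  # [[0.0], [0.0], [0.0], [1.0], [1.0]]
```

A longer hangover suppresses more of the reverberation but also discards more near-end speech, which is the half-duplex trade-off described above.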
A more complex method of avoiding echo is to employ an echo-canceling system, which measures the signals sent from the far end and broadcast by the near-end loudspeaker, estimates the resulting signal present at the near-end microphone (including the reverberations), and subtracts the signals representing the echo from the near-end microphone signals. The echo-free signals are then transmitted back to the far-end system.
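The canceller described above can be sketched as follows, assuming the echo path is modeled as a finite impulse response (FIR) filter and that an estimate of it (`h_est`) is already available. The helper names `fir` and `cancel_echo` and the toy room response are hypothetical.

```python
def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def cancel_echo(mic, loudspeaker, h_est):
    """Subtract the estimated echo (the loudspeaker signal filtered by the
    echo-path estimate h_est) from the near-end microphone signal."""
    echo_estimate = fir(loudspeaker, h_est)
    return [m - e for m, e in zip(mic, echo_estimate)]

# Toy room: direct path plus one reflection two samples later (made-up values).
h_room = [0.5, 0.0, 0.25]
far_end = [1.0, -1.0, 0.5, 0.0, 0.0]
near_speech = [0.0, 0.0, 0.0, 0.3, 0.0]
mic = [e + s for e, s in zip(fir(far_end, h_room), near_speech)]
print(cancel_echo(mic, far_end, h_room))  # approximately near_speech when h_est is exact
```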
To remove the echo from the near-end microphone signal, the system must obtain the transfer function that expresses the relationship between the near-end loudspeaker signal and the reverberations as they actually appear at the near-end microphone. This transfer function depends on the position of the near-end loudspeaker relative to the near-end microphone, the room structure, the position of the system, and even the presence of people in the room. Since these parameters cannot be predicted a priori, the echo-canceling system should preferably update the transfer function continuously in real time.
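To illustrate why the transfer function must be tracked rather than measured once, the sketch below uses the same hypothetical FIR model: it cancels with an estimate that was correct for one room condition and then applies that estimate after the echo path has changed. The impulse responses are made-up toy values.

```python
def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def residual_echo_energy(far_end, h_true, h_est):
    """Energy of the echo left after cancelling with h_est when the room's
    actual echo path is h_true (no near-end speech)."""
    echo = fir(far_end, h_true)
    estimate = fir(far_end, h_est)
    return sum((a - b) ** 2 for a, b in zip(echo, estimate))

far_end = [1.0, -0.5, 0.25, 0.8, -0.3, 0.0, 0.0]
h_before = [0.6, 0.2, 0.1]  # echo path when the estimate was taken
h_after = [0.4, 0.3, 0.2]   # echo path after, e.g., the loudspeaker is moved
print(residual_echo_energy(far_end, h_before, h_before))  # 0.0: echo removed
print(residual_echo_energy(far_end, h_after, h_before))   # > 0: echo leaks through
```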
The adaptation process by which the echo-canceling system is updated in real time may be an LMS (least mean squares) adaptive filter (Widrow et al., Proc. IEEE, vol. 63, pp. 1692-1716; Widrow et al., Proc. IEEE, vol. 55, No. 12, December 1967), with the far-end signal used as the reference signal. The LMS filter estimates the interference elements (echoes) present in the interfered channel by passing the reference channel through an adaptive filter, and subtracts the estimated elements from the interfered signal. The resulting output is then used to update the filter coefficients. The adaptation process converges when the output energy reaches a minimum, leaving an echo-free signal.
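The LMS loop can be sketched as follows. It uses a textbook form of the update, `w = w + mu * e * x`, with a hypothetical three-tap echo path and a noise-free microphone signal; a practical canceller would use far more taps and would also face near-end speech and noise.

```python
import random

def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def lms_echo_canceller(reference, mic, num_taps, mu):
    """Standard LMS adaptive echo canceller.

    reference: far-end signal fed to the loudspeaker (the reference channel)
    mic: near-end microphone signal (the interfered channel)
    Returns the output (error) signal and the final filter coefficients.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    out = []
    for x_n, d_n in zip(reference, mic):
        buf = [x_n] + buf[:-1]
        y_n = sum(wk * xk for wk, xk in zip(w, buf))  # estimated echo
        e_n = d_n - y_n                               # output: mic minus estimate
        # LMS update: the step depends on mu, the output e_n and the reference samples
        w = [wk + mu * e_n * xk for wk, xk in zip(w, buf)]
        out.append(e_n)
    return out, w

random.seed(0)
h_room = [0.6, -0.3, 0.1]  # hypothetical echo path
far_end = [random.uniform(-1.0, 1.0) for _ in range(5000)]
mic = fir(far_end, h_room)  # echo only, no near-end speech or noise
out, w = lms_echo_canceller(far_end, mic, num_taps=3, mu=0.05)
print([round(c, 4) for c in w])  # close to h_room after convergence
```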
The size of the adaptation step of the filter coefficients is important to the adaptation process. In the standard LMS algorithm, the step size is controlled by a predetermined adaptation coefficient, the level of the reference channel, and the output level. In other words, the adaptation process takes bigger steps for strong signals and smaller steps for weak signals.
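The level dependence can be made concrete by computing the size of a single LMS coefficient update at several signal levels. In this hypothetical two-tap setup the echo scales with the reference, so the output and the reference samples both grow with the level and the update grows with its square: ten times the level yields a step one hundred times larger.

```python
def lms_increment(w, buf, d_n, mu):
    """Coefficient change that one standard LMS step would apply."""
    y_n = sum(wk * xk for wk, xk in zip(w, buf))  # estimated echo
    e_n = d_n - y_n                               # output level
    return [mu * e_n * xk for xk in buf]

def norm(v):
    return sum(c * c for c in v) ** 0.5

h_room = [0.5, 0.25]  # hypothetical echo path

def increment_norm(level, mu=0.1):
    """Size of the first update (filter starts at zero) for a given signal level."""
    buf = [level * 1.0, level * -0.5]                # reference samples at this level
    d_n = sum(h * x for h, x in zip(h_room, buf))   # echo scales with the reference
    return norm(lms_increment([0.0, 0.0], buf, d_n, mu))

for level in (0.1, 1.0, 10.0):
    print(level, increment_norm(level))
```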
A better-behaved system is one whose adaptation steps are independent of the reference channel level. This is accomplished by normalizing the adaptation coefficient by the reference channel energy, a method called Normalized Least Mean Squares (NLMS), described, for example, in "A Family of Normalized LMS Algorithms", Scott C. Douglas, IEEE Signal Processing Letters, Vol. 1, No. 3, March 1994. It should be noted that the energy estimator, if not designed properly, may fail to track large and fast changes in the level of the reference channel. The normalized coefficient may then be too big during the transition period, and the filter coefficients may diverge.
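A minimal NLMS sketch under the same toy assumptions (three taps, noise-free echo). Here the "energy estimator" is simply the energy of the reference samples currently in the filter window, plus a small hypothetical regularizer `eps` that avoids dividing by zero. Because this estimator follows level changes instantly, the sketch sidesteps the tracking problem mentioned above rather than illustrating it.

```python
import random

def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_echo_canceller(reference, mic, num_taps, mu, eps=1e-8):
    """NLMS: the LMS step is divided by the reference energy in the filter window."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    out = []
    for x_n, d_n in zip(reference, mic):
        buf = [x_n] + buf[:-1]
        e_n = d_n - sum(wk * xk for wk, xk in zip(w, buf))
        energy = sum(xk * xk for xk in buf)  # simple reference-energy estimate
        step = mu / (eps + energy)           # normalized adaptation coefficient
        w = [wk + step * e_n * xk for wk, xk in zip(w, buf)]
        out.append(e_n)
    return out, w

random.seed(0)
h_room = [0.6, -0.3, 0.1]  # hypothetical echo path
base = [random.uniform(-1.0, 1.0) for _ in range(2000)]
for level in (0.01, 1.0, 100.0):
    reference = [level * x for x in base]
    _, w = nlms_echo_canceller(reference, fir(reference, h_room), num_taps=3, mu=0.5)
    print(level, [round(c, 4) for c in w])  # converges to h_room at every level
```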
Another problem arises because the adaptive process feeds the output back to determine the new filter coefficients. When the interfering (echo) elements in the signal are weaker than the non-interfering signal, there is little to cancel, and the filter may diverge or converge to a wrong value, which results in signal distortion.
When properly converged, the adaptive filter effectively estimates the transfer function between the near-end loudspeaker signal and the echo elements in the main channel. However, changes in the room alter the transfer function, and the adaptive process must adapt itself to the new conditions. Sudden or quick changes in particular take the adaptive filter time to follow, and echo will be present until the filter has adapted to the new conditions.
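The transient echo can be reproduced with an NLMS update like the one sketched for the previous paragraph. In this self-contained toy, the hypothetical echo path switches abruptly halfway through, and the output energy is measured just before the switch, just after it, and at the end.

```python
import random

def energy(seg):
    return sum(e * e for e in seg)

random.seed(1)
num_taps, mu, eps = 3, 0.5, 1e-8
h_before = [0.6, -0.3, 0.1]  # hypothetical echo path before the change
h_after = [0.2, 0.5, -0.4]   # hypothetical echo path after the room changes
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]

w = [0.0] * num_taps
buf = [0.0] * num_taps  # most recent reference samples, newest first
err = []
for n, x_n in enumerate(x):
    buf = [x_n] + buf[:-1]
    h = h_before if n < 1000 else h_after          # abrupt change at sample 1000
    d_n = sum(hk * xk for hk, xk in zip(h, buf))    # echo at the microphone
    e_n = d_n - sum(wk * xk for wk, xk in zip(w, buf))
    step = mu / (eps + sum(xk * xk for xk in buf))  # NLMS normalization
    w = [wk + step * e_n * xk for wk, xk in zip(w, buf)]
    err.append(e_n)

print(energy(err[990:1000]))   # converged on the old room: near zero
print(energy(err[1000:1010]))  # right after the change: echo leaks through
print(energy(err[1990:2000]))  # re-converged: near zero again
```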
To improve audio quality, a number of microphones are sometimes used instead of a single one. Such a system either selects a different microphone each time someone speaks in the room or creates a directional beam from a linear combination of the microphone signals. Multiplexing the microphones or steering the directional audio beam changes the relationship between the loudspeaker signal and the audio signal obtained by the microphones. Problematically, each time such a transition takes place, echo "leaks" into the system until the adaptive filter has learned the new condition. To allow the use of a steerable directional beam while preventing the transient echo, one could perform continuous echo canceling either on each microphone separately or on each microphone combination (the number of possible combinations is effectively unlimited). However, the computational load of running numerous echo-canceling systems concurrently, one for each microphone or allowable beam, is impractical.
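The effect of switching or steering follows from linearity: if the beam output is a weighted sum of the microphone signals, the loudspeaker-to-beam echo path is the same weighted sum of the per-microphone echo paths, so every new set of weights presents the adaptive filter with a new transfer function. The per-microphone responses and weights below are hypothetical.

```python
def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def beam_output(mic_signals, weights):
    """Fixed beam: a weighted linear combination of the microphone signals."""
    return [sum(wt * m[n] for wt, m in zip(weights, mic_signals))
            for n in range(len(mic_signals[0]))]

def effective_echo_path(paths, weights):
    """Loudspeaker-to-beam impulse response; all paths must have equal length."""
    return [sum(wt * h[k] for wt, h in zip(weights, paths))
            for k in range(len(paths[0]))]

paths = [[0.5, 0.2, 0.0], [0.1, 0.4, 0.3]]  # hypothetical per-microphone echo paths
far_end = [1.0, -0.5, 0.25, 0.0, 0.8]
mics = [fir(far_end, h) for h in paths]     # echo picked up by each microphone

for weights in ([1.0, 0.0], [0.5, 0.5]):    # select microphone 1, then an equal-weight beam
    print(weights, effective_echo_path(paths, weights))
```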
An efficient echo-canceling system is needed that drastically reduces the echo. However, because of the large dynamic range the microphone needs in order to pick up very low voices, it will most likely pick up some residual echo as well. The residual echo is most disturbing when no other signal is present and less noticeable when a full-duplex discussion is taking place.
Another problem typical of multi-user conferencing systems is that the background noise from several systems is transmitted to all the participating systems, so this noise should preferably be reduced to a minimum. The beamforming process reduces the background noise, but not enough to compensate for the accumulated noise of the plurality of systems.