Meetings conducted between two separate locations, with at least one of the locations involving two or more individuals, can be facilitated using an audio conferencing system. Audio conferencing systems typically include some number of microphones, at least one loudspeaker, and a base station which is connected to a public communication network. In such a system, the microphones operate to pick up acoustic audio signals (speech) from a near side talker and transmit the signals to the base station, which generally operates to provide session control and to process the audio signals in a number of ways before sending them to a far side communication device to be played by a loudspeaker. Among other things, the base station can be configured with functionality to amplify audio signals, to regulate microphone signal gain (automatic gain control or AGC) and microphone sensitivity, to suppress noise, and to automatically remove acoustic echo present in the system.
FIG. 1 is a diagram showing the functional elements of a commercially available audio conferencing system 100. The system 100 can comprise a number of wireless microphones 11 and wired microphones 12, one or more loudspeakers 13, and an audio control and processing device 15. Typically, in such room audio systems, the loudspeakers 13 are wired to the device 15, and the device 15 includes complex digital signal processing and audio signal control functionality. The audio signal control can include functionality to automatically control near side audio signal gain, functionality to control microphone sensitivity, and system mode control (duplex/half duplex modes), to name only a few. The digital signal processing can include acoustic echo cancellation (AEC) functionality, residual echo suppression functionality or other non-linear processing, noise cancellation functionality, and double talk detection and mitigation.
AEC functionality is an essential element in audio conferencing systems, and it generally operates to remove acoustic echo from a near side audio signal prior to the signal being transmitted to a far side system. Specifically, acoustic echo occurs when a far side audio signal that is received and played by a near side system is picked up by a near side microphone. The audio signal generated by the near side microphone, which includes the acoustic echo, is then sent to the far end system, where the far end talker can hear the echo. This acoustic echo is distracting and can severely degrade the quality of an audio conferencing session if it is not effectively cancelled at the near end audio conferencing system. FIG. 2 is a diagram showing typical prior art AEC functionality that can be implemented in an audio conferencing system 200. The system 200 includes an adaptive filter 210, a summation function 220, a loudspeaker and a microphone. In operation, a far end (F.E.) audio signal is received at the system 200 and sent to both the loudspeaker and the adaptive filter 210, which operates to, among other things, derive an estimated echo signal that is sent to the summation function 220. The loudspeaker plays the F.E. audio signal, and the microphone proximate to the loudspeaker can pick up the acoustic audio signal played by the loudspeaker and send it (as the microphone signal) to the summation function 220, which operates to subtract the estimated echo from the microphone signal. The output of the summation function 220 is an error signal 230, and this error signal is an input to an adaptive algorithm that operates to update the coefficients of the adaptive filter 210. The resultant filter coefficients approximate a transfer function that models the acoustic environment between the loudspeaker and the microphone. The coefficients are updated so as to minimize the error signal (which, in the absence of any near end (N.E.) audio, is ideally zero).
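The adaptive loop of FIG. 2 can be illustrated with a minimal sketch. The description above does not name the adaptive algorithm, so the normalized least mean squares (NLMS) update used here is an assumption chosen for illustration, as are the function names `nlms_echo_canceller` and `simulate_echo` and the parameters `num_taps`, `step` and `eps`.

```python
def simulate_echo(far_end, echo_path):
    """Hypothetical helper: microphone signal containing only echo.

    Each microphone sample is the far end signal convolved with an
    assumed loudspeaker-to-microphone impulse response (echo_path).
    """
    mic = []
    for n in range(len(far_end)):
        mic.append(sum(h * far_end[n - k]
                       for k, h in enumerate(echo_path) if n - k >= 0))
    return mic


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Return (error_signal, coefficients) for an assumed NLMS filter."""
    coeffs = [0.0] * num_taps      # adaptive filter coefficients
    history = [0.0] * num_taps     # recent far end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Adaptive filter output: the estimated echo.
        echo_estimate = sum(c * s for c, s in zip(coeffs, history))
        # Summation function: microphone signal minus estimated echo.
        e = d - echo_estimate
        # Normalized coefficient update driven by the error signal.
        power = sum(s * s for s in history) + eps
        coeffs = [c + step * e * s / power for c, s in zip(coeffs, history)]
        errors.append(e)
    return errors, coeffs
```

With a noise-like far end signal and no near end audio, the error signal in this sketch decays toward zero and the coefficients approach the assumed echo path, which mirrors the behavior described for the adaptive filter 210.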
As long as most of the audio energy in the microphone signal is F.E. audio, the adaptive filter is able to converge to a solution, which is the minimization of the error signal. However, the adaptive filter 210 may not converge within a reasonable period of time, or may never be able to converge to a solution, if N.E. audio (from a talker proximate to the microphone) is present in the microphone signal, with or without F.E. audio also being present. In the case that only N.E. audio is present in a microphone signal, the coefficients associated with the adaptive filter can be frozen, or the rate at which the coefficients are calculated can be slowed, which prevents the filter from diverging from a previous solution. Further, in the event that both N.E. and F.E. audio are present in a microphone signal, the filter must be able to adapt to cancel any acoustic echo present in the microphone signal without attempting to adapt to the N.E. audio component of the signal. In this case, the N.E. audio can be suppressed in some manner, such as by the system 100 switching to a half duplex mode of operation in which only the F.E. audio is processed by the adaptive filter. The presence of both N.E. and F.E. audio in a microphone signal is referred to as double talk.
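The adaptation behavior described above can be summarized as a small policy that selects a filter step size from the current talk state. This is a sketch only: the names `TalkState`, `classify` and `adaptation_step`, the parameters `nominal_step` and `slow_factor`, and the choice to freeze adaptation during silence are assumptions made for illustration.

```python
from enum import Enum


class TalkState(Enum):
    SILENCE = "silence"
    FAR_END_ONLY = "far_end_only"
    NEAR_END_ONLY = "near_end_only"
    DOUBLE_TALK = "double_talk"


def classify(far_end_active, near_end_active):
    """Map two activity flags onto a talk state."""
    if far_end_active and near_end_active:
        return TalkState.DOUBLE_TALK
    if far_end_active:
        return TalkState.FAR_END_ONLY
    if near_end_active:
        return TalkState.NEAR_END_ONLY
    return TalkState.SILENCE


def adaptation_step(state, nominal_step=0.5, slow_factor=0.0):
    """Step size for the adaptive filter update in the given state.

    slow_factor=0.0 freezes the coefficients whenever N.E. audio is
    present; a value between 0 and 1 slows adaptation instead.
    """
    if state is TalkState.FAR_END_ONLY:
        return nominal_step          # echo dominates: adapt normally
    if state is TalkState.SILENCE:
        return 0.0                   # assumption: no F.E. signal to learn from
    return nominal_step * slow_factor
```

For example, `adaptation_step(classify(True, True), slow_factor=0.1)` slows adaptation to one tenth of the nominal rate during double talk, while the default `slow_factor` freezes the coefficients.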
As described above, double talk (DT) occurs when a far side talker and a near side talker speak at the same time. If a DT condition is not correctly detected by the audio conferencing system, the AEC functionality may not be able to converge to a solution, and acoustic echo can be transmitted back to the far end. Typically, conferencing systems handle double talk by detecting when both an F.E. talker and an N.E. talker are speaking at the same time, and reacting either by preventing the filter from adapting (slowing or freezing the filter coefficients) or by transitioning to a half duplex mode of operation in which near side speech is suppressed. FIG. 3 is a diagram showing an audio conferencing system 300 that includes acoustic echo cancellation functionality 310, a double talk detector (DTD) 320, a loudspeaker 330 and a wireless microphone 340. The DTD 320 generally operates to detect audio signal energy in an F.E. signal received from an F.E. audio source and in an N.E. audio signal received from the microphone 340, and it uses the detected energies to determine whether or not the system should enter into the double talk mode of operation.
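An energy-based detector of the general kind described for the DTD 320 can be sketched as follows. The decision rule, the function names `frame_energy` and `detect_double_talk`, and the parameters `activity_threshold` and `echo_gain` are assumptions made for illustration; the description above does not specify how the detected energies are compared.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def detect_double_talk(far_frame, mic_frame,
                       activity_threshold=1e-4, echo_gain=0.5):
    """Return True when both F.E. and N.E. audio appear to be present.

    echo_gain is an assumed upper bound on the ratio of echo energy at
    the microphone to F.E. energy at the loudspeaker.
    """
    far_energy = frame_energy(far_frame)
    mic_energy = frame_energy(mic_frame)
    far_active = far_energy > activity_threshold
    # Microphone energy beyond what echo alone could plausibly explain
    # is attributed to a near end talker.
    expected_echo_energy = echo_gain * far_energy
    near_active = mic_energy > max(activity_threshold, expected_echo_energy)
    return far_active and near_active
```

A simple ratio test of this kind is sensitive to the assumed `echo_gain`: if the value is set too low, echo alone is flagged as double talk, and if it is set too high, quiet near end speech is missed.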