Audio systems having wireless microphones, some number of loudspeakers, and a base station are typically designed according to the application for which they are intended. One application for such a system is in a meeting room environment, where the audio system microphones capture acoustic energy and send the resulting audio signals to a base station that controls how and where the audio signals are played. When the audio system is operating locally, the audio signals can be played in the same room or in another room that is local to the audio system. Another application for an audio system is in an audio conferencing environment, where a local audio system (audio conferencing system) receives and processes a local audio signal and transmits this signal over a network (LAN or WAN) to a remote audio system (audio conferencing system).
Audio conferencing systems typically include some number of microphones, at least one loudspeaker, and a base station that is connected to a communication network. In such a system, the microphones pick up acoustic audio signals (speech) from a near side speaker and transmit the signals to the base station, which generally provides session control and processes the audio signals in a number of ways before sending them to a far side communication device to be played by a loudspeaker. Among other things, the base station can be configured to amplify audio signals, regulate microphone signal gain (automatic gain control or AGC), suppress noise, and automatically remove acoustic echo present in the system.
FIG. 1 is a diagram showing components comprising an audio conference system 100. The system 100 can have a number of wireless or wired microphones 120A-120D, one or more loudspeakers 110, and an audio control and signal processing device (base station/server) 105. Typically, the device 105 comprises complex digital signal processing and audio signal control functionality. The audio signal control can include functionality to automatically control near side audio signal gain, functionality to control microphone sensitivity, and system mode control (duplex/half duplex modes), to name only a few. The digital signal processing can include acoustic echo cancellation (AEC) functionality, residual echo suppression functionality or other non-linear processing, noise cancellation functionality, and double talk detection and mitigation.
The AEC functionality can be an essential element in both an audio conferencing system and a room audio system, and it generally operates to remove acoustic echo from a near side audio signal before the signal is transmitted to a far side system to be played. Specifically, acoustic echo occurs when a far side audio signal that is received and played by a near side system is picked up by a near side microphone. The audio signal generated by the near side microphone, which includes the acoustic echo, is then sent to the far end system, where the far end talker can hear the echo. This acoustic echo is distracting and can severely degrade the quality of an audio conferencing session if it is not effectively cancelled at the near end audio conferencing system.
FIG. 2 is a diagram showing typical AEC functionality that can be implemented in an audio system 200 that is substantially similar to the audio system 100 described earlier with reference to FIG. 1. The system 200 includes a base station comprising an adaptive filter 210 and a summation function 220, a loudspeaker, and a microphone. In operation, a far end (F.E.) audio signal is received at the system 200 and sent to both the loudspeaker and the adaptive filter 210, which operates to, among other things, derive an estimated echo signal that is sent to the summation function 220. The loudspeaker plays the F.E. audio signal, and the microphone proximate to the loudspeaker can pick up the acoustic energy played by the loudspeaker and send it as an audio signal (microphone signal) to the summation function 220, which subtracts the estimated echo from the microphone signal. The output of the summation function 220 is an error signal 230, and this error signal is used as an input to an adaptive algorithm that updates the coefficients comprising the adaptive filter. The resultant filter coefficients approximate a transfer function that models the acoustic environment between the loudspeaker and the microphone. The updated filter coefficients are used to minimize the error signal (which in the absence of any near end (N.E.) audio is ideally zero). As long as most of the audio energy in the microphone signal is F.E. audio, the adaptive filter is able to converge to a solution, which is the minimization of the error signal.
However, the adaptive filter 210 may not converge within a reasonable period of time, or may never converge to a solution, if N.E. audio (from a talker proximate to the microphone) is present in the microphone signal, with or without F.E. audio also being present. When only N.E. audio is present in a microphone signal, this audio should not be cancelled, so the coefficients associated with the adaptive filter can be frozen or the rate at which the coefficients are updated can be slowed. This prevents the filter from diverging from a previous solution. Further, when both N.E. and F.E. audio are present in a microphone signal, the filter must be able to adapt to cancel any acoustic echo present in the microphone signal without attempting to adapt to the N.E. audio component of the signal. In this case, the N.E. audio can be suppressed in some manner, such as by the system 100 switching to a half duplex mode of operation in which only the F.E. audio is processed by the adaptive filter. The condition in which both N.E. audio and F.E. audio are present in a microphone signal is referred to as double talk.
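The adaptive filter loop described with reference to FIG. 2 can be illustrated with a minimal sketch. The text does not name a particular adaptive algorithm, so the normalized least mean squares (NLMS) update used here is an assumption, as are the function names, the tap count, the step size, and the simulated echo path. The optional `adapt` argument is a hypothetical per-sample flag that freezes the coefficients, as described above for the case in which N.E. audio is present.

```python
import random


def simulate_echo(far_end, echo_path):
    """Convolve the far end signal with a (hypothetical) room echo path."""
    return [
        sum(echo_path[k] * far_end[n - k]
            for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(far_end))
    ]


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8, adapt=None):
    """Return (error_signal, weights) for an NLMS adaptive filter.

    far_end: samples sent to the loudspeaker and to the adaptive filter
    mic:     samples picked up by the microphone
    adapt:   optional list of booleans; False freezes the coefficients
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far end samples, newest first
    errors = []
    for n, (x, d) in enumerate(zip(far_end, mic)):
        history = [x] + history[:-1]
        # Estimated echo produced by the adaptive filter.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Summation function: microphone signal minus estimated echo.
        e = d - echo_estimate
        errors.append(e)
        # The error signal drives the coefficient update unless frozen.
        if adapt is None or adapt[n]:
            power = sum(h * h for h in history) + eps
            weights = [w + step * e * h / power
                       for w, h in zip(weights, history)]
    return errors, weights


# Illustrative run with far end audio only (no near end talker).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, -0.1]  # assumed loudspeaker-to-microphone response
mic = simulate_echo(far_end, echo_path)

errors, weights = nlms_echo_canceller(far_end, mic)

# With adaptation frozen for every sample, the coefficients never change.
frozen_errors, frozen_weights = nlms_echo_canceller(
    far_end, mic, adapt=[False] * len(mic))
```

In this sketch, the converged `weights` play the role of the filter coefficients that approximate the loudspeaker-to-microphone transfer function, and `errors` corresponds to the error signal 230.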
FIG. 3 is a diagram showing an audio conference system 300 that includes a base station comprising acoustic echo cancellation functionality 310 and a double talk detector (DTD) 320, a loudspeaker 330, and a wireless microphone 340. The DTD 320 generally operates to detect audio signal energy in a F.E. signal and in a N.E. audio signal received from the microphone, and it uses the detected energies to determine whether or not the system should enter the double talk mode of operation.
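One way to picture a level-comparing double talk detector is the sketch below. It is not the method of the DTD 320, whose internals are not described here. It is a Geigel-style comparison, a commonly used textbook technique, in which the microphone level is compared against the recent peak of the F.E. signal. The function name, window length, threshold, and hangover count are all assumed values chosen for illustration.

```python
from collections import deque


def geigel_double_talk(far_end, mic, window=64, threshold=0.5, hangover=32):
    """Return a per-sample list of booleans; True means near end activity.

    A sample is flagged when the microphone magnitude exceeds `threshold`
    times the largest far end magnitude seen in the last `window` samples.
    The flag is then held for `hangover` samples to avoid rapid toggling.
    """
    recent = deque([0.0] * window, maxlen=window)  # recent |far end| values
    flags = []
    hold = 0
    for x, d in zip(far_end, mic):
        recent.append(abs(x))
        if abs(d) > threshold * max(recent):
            hold = hangover
        flags.append(hold > 0)
        if hold > 0:
            hold -= 1
    return flags


# Illustrative signals: echo only for 50 samples, then echo plus near end audio.
far_end = [1.0, -1.0] * 50
mic = [0.3 * x for x in far_end[:50]] + [0.3 * x + 0.9 for x in far_end[50:]]
flags = geigel_double_talk(far_end, mic)
```

The threshold of 0.5 assumes the echo arrives at the microphone at no more than half the loudspeaker drive level; a real system would need to choose this value for its own acoustic environment. Note that this simple comparison also flags N.E. audio that occurs while the far end is silent, which is acceptable if the flag is used only to freeze adaptive filter updates.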