1. Field of the Invention
The present invention relates to speakerphone technologies and, more particularly, to a speakerphone system provided with a scheme for minimizing the effects of echo and gains introduced to allow hands-free telephone operation.
2. Description of Related Art
Speakerphones are widely used today. The ability to use a telephone set without occupying the user's hands, and to enable multiple parties to participate in a telephone call without requiring conference-type calling, has been found to be a significant advantage over conventional hand-held telephone systems.
A speakerphone is a speaker telephone which includes a microphone and a loudspeaker to enable telephone communications without a conventional telephone handset. Typically, the microphone and loudspeaker in the speakerphone are contained in the same physical structure and are thus in close proximity to each other. In many instances, however, such as in desk-top PC applications, the speakerphone electronics cannot assume that the placement of the microphone and the loudspeaker, and thus the acoustic echo path, is fixed. Thus, if the microphone is physically separable from and independently moveable relative to the loudspeaker assembly, the user may want to move the microphone relative to the loudspeaker during the conversation.
As a consequence of such close and variable proximity, speakerphones are plagued with certain problems that are inherent in the simultaneous use of the microphone and loudspeaker which comprise the speakerphone. Significant problems with speakerphone clarity and stability, e.g., disruption of the conversation due to howling, in full-duplex speakerphones are associated with acoustic coupling between the microphone and the loudspeaker. In a full-duplex system, simultaneous two-way communication is enabled where the local user can speak and listen to received speech simultaneously with the remote user. Such simultaneous conversation, however, creates acoustic feedback problems which occur when the received speech emitted by the loudspeaker at the local end is picked up by the local microphone and directed back to the remote end. As a result, the remote party may hear a strong echo of his or her own voice. If this acoustic cycling continues, the voice quality and conversation will be distorted and significantly degraded, causing the system to become unstable, i.e., howling will occur. Of course, if the remote end user is also using a speakerphone in a back-to-back arrangement, the acoustic feedback problems are magnified.
Thus, some of the problems associated with acoustic and electrical feedback are echo, as mentioned above, howling, and gain switching, among others. For example, with regard to problems with echo, although conventional echo cancellers are generally used to reduce echo in speakerphones, echo cancellation alone is not always adequate to hold the total loop gain below unity, which is required to suppress the positive feedback loop and, therefore, maintain system stability.
The loop gain refers to the total resultant gain of the voice signal as it passes through the various components of the speakerphone. The gain loop typically includes every speakerphone component from the microphone, through the transmit channel to the remote telephone system, back through the receive channel to the loudspeaker, and the acoustic coupling from the loudspeaker back to the microphone. Some of the major internal components of the speakerphone include echo cancellers, such as an acoustic echo canceller (AEC) and a line echo canceller (LEC), and a voice control processor.
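By way of illustration only, the unity loop gain condition can be sketched as follows. The stage names and decibel values are hypothetical and are not taken from any particular speakerphone; the sketch merely assumes that the stages are connected in series so that their gains in decibels add.

```python
def db_to_linear(gain_db):
    """Convert an amplitude gain in decibels to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)

def loop_gain(stage_gains_db):
    """Total linear gain of stages connected in series around the loop."""
    return db_to_linear(sum(stage_gains_db))

def is_stable(stage_gains_db):
    """The loop cannot sustain howling while its total gain is below unity."""
    return loop_gain(stage_gains_db) < 1.0

# Hypothetical stage gains in dB (assumed values, for illustration only).
stages_db = {
    "microphone_amplifier": 12.0,
    "transmit_channel": -3.0,
    "remote_end_return": -10.0,
    "receive_channel": -3.0,
    "loudspeaker_amplifier": 15.0,
    "acoustic_coupling": -6.0,
}

without_canceller = list(stages_db.values())   # sums to +5 dB
with_canceller = without_canceller + [-20.0]   # assumed 20 dB of echo cancellation

print(is_stable(without_canceller))  # loop gain above unity
print(is_stable(with_canceller))     # loop gain below unity
```

In this assumed example the loop is unstable until the echo canceller contributes its attenuation, which is why a loss of cancellation performance must be offset elsewhere in the loop.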
One type of echo canceller, e.g., an acoustic echo canceller (AEC), typically comprises a plurality of adaptive filters associated with the microphone and loudspeaker which estimate the impulse response between the microphone and loudspeaker. Another echo canceller, e.g., a line echo canceller (LEC), may be implemented across the transmit and receive channels to cancel the electrical reflection of signals generated by an impedance mismatch in the telephone network interface circuitry.
For each impulse response of the echo paths, an estimate of the echo is determined and subtracted from the incoming speech signal. The adaptive filters are generally included in a digital signal processing (DSP) device or other programmable processor, and are defined by a variety of algorithms that affect and determine their real-time performance, i.e., the speed of convergence to the echo path impulse response and the accuracy of the estimation process. The algorithm coefficients are continuously adapted to represent the impulse response between the loudspeaker and microphone or the impulse response between the transmit channel and the receive channel of the network interface.
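The estimate-and-subtract operation described above may be sketched, for illustration only, with a normalized least-mean-squares (NLMS) update. NLMS is chosen here merely as one well-known adaptive algorithm; the text does not specify which algorithm a given echo canceller uses. The echo path, tap count, and step size below are assumed values.

```python
import random

def nlms_echo_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive echo estimate from mic; return (residual, coefficients)."""
    w = [0.0] * num_taps          # adaptive filter coefficients
    history = [0.0] * num_taps    # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate     # signal after echo subtraction
        norm = sum(xi * xi for xi in history) + eps
        # NLMS coefficient update, normalized by the input energy.
        w = [wi + step * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual, w

# Hypothetical echo path impulse response (assumed, for illustration only).
true_path = [0.6, -0.3, 0.1]
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Microphone signal containing only the echo of the far-end signal.
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(true_path) if n - k >= 0)
    for n in range(len(far_end))
]

residual, w = nlms_echo_cancel(far_end, mic)
print([round(c, 3) for c in w[:3]])
```

With no local speech and no noise in this assumed setup, the coefficients approach the echo path and the residual decays toward zero; a real room response is far longer and time-varying.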
If the echo canceller impulse response accurately matches that of the echo path, the echo will be canceled. However, due to conventional device limitations, e.g., for a finite-bit resolution device, inaccuracy in coefficients exists, such that 100% cancellation can rarely, if ever, be achieved. In addition, any change in the echo path will cause the current estimated impulse response to deviate from the actual impulse response. Until the echo canceller can recognize and compensate for the change, and thus reconverge, a larger residual echo will be present in the system.
Moreover, in full-duplex speakerphones, the AEC and the LEC are typically situated adjacent to each other to cancel acoustic and electrical echoes. Consequently, the AEC and the LEC must be precisely controlled so that their coefficients are adapted only in receive and transmit modes, respectively. The coefficient adaptation process is limited to receive and transmit modes because (1) the room impulse response is modeled only with the receive signal, and the local talker signal can disturb the process, and (2) the network interface impulse response is modeled only with the transmit signal, and the remote talker signal can disturb the process. Therefore, it is crucial to maintain awareness of the continuous changes in the voice signal and the system parameters.
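The adaptation rule stated above can be expressed, as a minimal sketch, by two enable flags derived from speech activity at each end. The function name and its boolean inputs are hypothetical and assume that reliable speech activity decisions are already available.

```python
def adaptation_enables(local_active, remote_active):
    """Return (adapt_aec, adapt_lec) for the current speech activity."""
    # AEC adapts only in receive mode: remote speech present, local talker silent.
    adapt_aec = remote_active and not local_active
    # LEC adapts only in transmit mode: local speech present, remote talker silent.
    adapt_lec = local_active and not remote_active
    return adapt_aec, adapt_lec

print(adaptation_enables(local_active=False, remote_active=True))
print(adaptation_enables(local_active=True, remote_active=True))
```

Under this rule both cancellers freeze their coefficients during simultaneous speech and during silence, so neither filter is disturbed by the signal it is not meant to model.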
As speakerphone use becomes more commonplace, back-to-back speakerphone system performance is of greater concern. Thus, the voice control processor must consider such arrangements to maintain complete, end-to-end speakerphone performance. Some speakerphone designs, however, address only a local speakerphone which communicates with a remote handset, rather than with a remote speakerphone. Accordingly, the voice control processor must be able to handle both situations.
The voice control processor is the central control of the complete speakerphone system. It should include speech detectors and loop gain control. The speech detectors determine the communication mode which, in turn, controls the adaptation process of the echo cancellers, as mentioned above. Current speech detectors, however, are not sufficiently sensitive to low-level voice signals, are too slow to detect double talk, or falsely detect noise as speech. An example of such a speech detector uses a simple comparator to compare the transmit and receive signal levels and asserts a detection signal if the receive level is greater than the transmit level. It is known that, depending on the strength of acoustic coupling and the telephone line loss characteristics, the transmit level can be many times greater than the receive level during much of the conversation. In such cases, speech detection would be too slow or too late to detect the receive signal for correct channel gain adjustment. Moreover, this late decision would cause the echo canceller to drift away from its converged impulse response and lose some of its cancellation performance when it is most needed. This kind of speech detector also falsely detects echo path changes and thus delays the echo canceller's convergence process.
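The shortcoming of the simple comparator can be illustrated with the following sketch. The level measure (mean absolute amplitude of a frame) and the sample values are assumptions made only for this example.

```python
def frame_level(samples):
    """Short-term level: mean absolute amplitude of one frame."""
    return sum(abs(s) for s in samples) / len(samples)

def comparator_detects_receive(tx_frame, rx_frame):
    """Assert receive detection only when receive level exceeds transmit level."""
    return frame_level(rx_frame) > frame_level(tx_frame)

# Hypothetical frames: the remote party is talking, but strong acoustic
# coupling makes the echo at the microphone larger than the receive signal.
rx_frame = [0.10, -0.12, 0.08, -0.09]
tx_frame = [0.30, -0.35, 0.28, -0.31]

missed = not comparator_detects_receive(tx_frame, rx_frame)
print(missed)
```

In this assumed case the comparator fails to flag receive speech even though only the remote party is talking, which is the late or missed detection described above.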
As mentioned earlier, echo cancellers cannot always maintain optimum cancellation performance. Consequently, some gain switching must be applied to the system to maintain system stability. Furthermore, in speakerphone systems which utilize automatic gain control, the total loop gain can change abruptly due to sharp changes in the input signal. The loop gain scheme must be capable of adequately compensating for sudden changes in the signal as well as in the echo strength. Conventional speakerphone systems, however, greatly simplify the loop gain scheme, resulting in unnatural voice conversation, degradation of voice level, or temporary system instability.
As described above, accuracy, speed, and smoothness of gain switching are also necessary to system performance. Depending upon the desired transmit and receive signal levels, the gain can be adjusted to increase the signal level by applying a multiplier greater than one to the signal. Likewise, the signal can be decreased, or attenuated, by multiplying the signal by a gain value of less than one. The speakerphone determines the optimum gain to apply to both the transmit channel and the receive channel via a variety of gain calculation algorithms.
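As a minimal sketch of the multiplication just described, a channel gain may be applied sample by sample. The function name and sample values are hypothetical.

```python
def apply_gain(samples, gain):
    """Scale every sample: gain > 1 amplifies, gain < 1 attenuates."""
    return [gain * s for s in samples]

frame = [0.1, -0.2, 0.4]            # assumed sample values
boosted = apply_gain(frame, 2.0)    # multiplier greater than one
attenuated = apply_gain(frame, 0.5) # multiplier less than one
print(boosted, attenuated)
```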
In a full-duplex speakerphone system, typically four different conversation modes can exist. These modes may include (1) silence mode (no conversation at local or remote ends); (2) transmit mode (local user is active, remote user is silent); (3) receive mode (local user silent, remote user active); and (4) double-talk mode (simultaneous two-way local and remote communication). Due to the above-described problems of echo and howling, when switching from one communication mode to another, smooth gain switching must be applied to ensure good voice quality, as well as system stability. Without understanding the relationship between the communication mode switching and the corresponding gain switching requirement, speech clicking, syllable chopping, or transient echo may be heard as in many conventional speakerphones.
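The four modes and a gradual gain transition between them may be sketched as follows. The per-mode target gains and the smoothing factor are assumed values for illustration only, and the one-pole ramp is merely one simple way to avoid an abrupt gain step.

```python
from enum import Enum

class Mode(Enum):
    SILENCE = "silence"
    TRANSMIT = "transmit"
    RECEIVE = "receive"
    DOUBLE_TALK = "double-talk"

def classify_mode(local_active, remote_active):
    """Map speech activity at each end to one of the four conversation modes."""
    if local_active and remote_active:
        return Mode.DOUBLE_TALK
    if local_active:
        return Mode.TRANSMIT
    if remote_active:
        return Mode.RECEIVE
    return Mode.SILENCE

# Hypothetical (transmit_gain, receive_gain) targets per mode.
TARGET_GAINS = {
    Mode.SILENCE: (0.5, 0.5),
    Mode.TRANSMIT: (1.0, 0.25),
    Mode.RECEIVE: (0.25, 1.0),
    Mode.DOUBLE_TALK: (0.7, 0.7),
}

def ramp_gain(current, target, smoothing=0.2):
    """Move the gain a fixed fraction of the way toward its target each frame."""
    return current + smoothing * (target - current)

# Switch from receive mode to transmit mode and ramp the receive gain down.
rx_gain = TARGET_GAINS[Mode.RECEIVE][1]
new_mode = classify_mode(local_active=True, remote_active=False)
trajectory = []
for _ in range(50):
    rx_gain = ramp_gain(rx_gain, TARGET_GAINS[new_mode][1])
    trajectory.append(rx_gain)
print(new_mode, round(trajectory[0], 3), round(trajectory[-1], 3))
```

Because the gain approaches its target over several frames rather than in one step, the transition avoids the sudden level change that is heard as a click or a chopped syllable; the choice of smoothing factor trades transition speed against smoothness.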
In summary, sensitive and accurate speech detection, well-designed echo cancellers, sophisticated loop gain processing, and a smooth gain switching process are some of the key factors in making a fully working full-duplex speakerphone.