1. Field of the Invention
This invention relates to improvements in communication systems, and more particularly to improvements in speaker-phone systems, and still more particularly to improvements in methods and apparatuses for detecting the presence of speech in a speaker-phone system, or the like.
2. Relevant Background
Speaker-phone systems, which are in widespread use, allow individuals to communicate over telephone, intercom, radio, or other transmission media, essentially "hands free." In a typical speaker-phone system, each communication end has a transmitter to translate a voice or other sound to be transmitted into electrical signals for transmission, normally using a microphone, and a receiver to translate the received signals into sound for listening, normally using a speaker. As a matter of convention, a remote end of the system is referred to herein as the "far-end," and a close end is referred to as the "near-end."
Although most speaker-phone systems use a half duplex mode of operation, in which only one party can talk at any time, some speaker-phone systems use either a full duplex or a "pseudo-full" duplex mode of operation, in which both communication ends transmit and receive simultaneously. However, in full and pseudo-full duplex systems, if the speaker and microphone are positioned too close together or too close to common reflecting surfaces, such as walls, or if the speaker volume is set too high, a portion of the received signal is fed back into the transmitting path, often causing unwanted echoes of varying magnitude. In some cases, the systems may oscillate or squeal, particularly if similar conditions exist at both the near and far ends of the system.
This problem has been addressed in various ways. For example, most speaker-phone systems use an echo cancellation circuit at each end of the system. A typical echo cancellation unit, for instance at the near-end of the system, processes a signal that has been received from the far-end in a far-end speech module, which computes the power of the received far-end signal before passing the received signal to the speaker or audio circuitry. Usually, the far-end speech module calculates the power over both short and long-term time windows. The echo canceler unit also receives the signal that is being transmitted from the near end, and processes it in a near-end speech module, which computes the power of the near-end signal before passing it to the transmitter circuitry. Usually, the near-end speech module also calculates the power over both short and long-term time windows.
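The short-term and long-term power computations described above can be illustrated with a minimal sketch. It assumes a plain mean-square average over the most recent samples; the function names `windowed_power` and `short_and_long_power` and the window lengths are hypothetical, chosen only for illustration, and a practical speech module may use a different estimator.

```python
def windowed_power(samples, window):
    """Mean-square power of the most recent `window` samples.

    Assumption: a simple rectangular window; the window lengths used
    by an actual speech module are not specified in the text.
    """
    if window <= 0 or not samples:
        raise ValueError("need a positive window and at least one sample")
    recent = samples[-window:]
    return sum(x * x for x in recent) / len(recent)


def short_and_long_power(samples, short_window=4, long_window=16):
    """Return (short-term power, long-term power) for one signal."""
    return (windowed_power(samples, short_window),
            windowed_power(samples, long_window))


# Twelve silent samples followed by four samples of unit amplitude:
# the short window sees only the burst, the long window averages it down.
signal = [0.0] * 12 + [1.0] * 4
short_power, long_power = short_and_long_power(signal)
```

In this example the short-term power rises as soon as the burst begins, while the long-term power responds more slowly, which is what makes the ratio of the two usable as an onset indicator.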
The power computations of the far-end and near-end speech modules are then processed in a control logic module, which, depending upon the relative power ratios that are calculated, modifies the characteristics of line and acoustic echo cancellation circuits. For example, if the ratio of the short-term power to the long-term power of the far-end signal exceeds a predetermined threshold, the control logic module determines that far-end speech exists. If far-end speech is determined to be present, a further comparison is made between the short-term powers computed for the near-end and far-end signals to determine if near-end speech also is present.
Proper analysis of the far-end and near-end speech signals enables the control logic to accurately discriminate between four possible modes of speaker-phone operation: idle, far-end speech only, near-end speech only, and double talk, in which both near-end and far-end speech occur simultaneously. The control logic module then uses the operating mode information to control the echo canceler circuit adaptation process and to switch losses into the loop, for example by modifying the operation of the near-end and far-end speech modules to maintain an overall loop gain of less than 0 dB. Additionally, the determinations generally are used to adjust the thresholds of the received and transmitted signals, so that a particular volume of near-end audio is required to initiate a particular transmission and attenuate the received signal.
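The threshold comparisons and the four operating modes can be sketched as follows. This is an illustration under stated assumptions, not the logic of any particular control module: the threshold values `speech_ratio` and `double_talk_ratio` are hypothetical, and the use of the same short-term to long-term ratio test on the near-end signal when no far-end speech is present is an assumption, since that case is not spelled out above.

```python
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    FAR_END_ONLY = "far-end speech only"
    NEAR_END_ONLY = "near-end speech only"
    DOUBLE_TALK = "double talk"


def classify_mode(far_short, far_long, near_short, near_long,
                  speech_ratio=2.0, double_talk_ratio=0.5):
    """Pick an operating mode from short- and long-term power estimates.

    speech_ratio and double_talk_ratio are hypothetical, empirically
    chosen thresholds of the kind the text describes.
    """
    eps = 1e-12  # guards against division by zero during silence
    far_active = far_short / (far_long + eps) > speech_ratio
    if far_active:
        # Far-end speech present: compare the two short-term powers.
        near_active = near_short / (far_short + eps) > double_talk_ratio
    else:
        # Assumption: reuse the short/long ratio test on the near end.
        near_active = near_short / (near_long + eps) > speech_ratio

    if far_active and near_active:
        return Mode.DOUBLE_TALK
    if far_active:
        return Mode.FAR_END_ONLY
    if near_active:
        return Mode.NEAR_END_ONLY
    return Mode.IDLE


# A far-end burst with a quiet near end is classified as far-end only.
mode = classify_mode(far_short=4.0, far_long=1.0,
                     near_short=0.1, near_long=0.1)
```

Because both thresholds are fixed constants, a change in background noise shifts the power estimates without shifting the decision boundaries, so the same inputs that were classified correctly before the change can be misclassified after it.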
Echo cancellation circuits generally operate in one of three modes. In the first mode, when near-end speech that exceeds a predetermined threshold is detected, the circuit merely continues to use the same transmitter and receiver parameter values that existed just before the time at which the near-end speech was detected. In the second mode, the circuit selectively switches the transmitted and received signals off and on, so that, for instance, when near-end speech is detected, the transmitter is switched on and the receiver is switched off. This is probably the most widely used technique. In the third mode, which is used in more sophisticated full duplex systems, the circuit modifies or apportions attenuation between the received and transmitted signals to control the overall loop gain to a constant value, less than 0 dB, thereby avoiding the undesired oscillations or squeals described above.
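The third mode, in which attenuation is apportioned between the two paths, can be illustrated with a short sketch. The function name `apportion_loss`, the target of -6 dB, and the `receive_share` parameter are hypothetical; the sketch also assumes that an estimate of the unattenuated loop gain in dB is already available, and it only inserts loss, never gain.

```python
def apportion_loss(loop_gain_db, target_db=-6.0, receive_share=0.5):
    """Split the loss needed to reach target_db between the two paths.

    Returns (transmit_loss_db, receive_loss_db). receive_share is the
    fraction of the total loss placed in the receive path; a control
    module might raise it while near-end speech is detected.
    """
    if target_db >= 0.0:
        raise ValueError("target loop gain must be below 0 dB")
    if not 0.0 <= receive_share <= 1.0:
        raise ValueError("receive_share must lie between 0 and 1")
    # Loss is only inserted when the loop gain exceeds the target.
    total_loss_db = max(0.0, loop_gain_db - target_db)
    receive_loss_db = total_loss_db * receive_share
    transmit_loss_db = total_loss_db - receive_loss_db
    return transmit_loss_db, receive_loss_db


# A loop gain of +4 dB needs 10 dB of loss to reach -6 dB; with
# near-end speech active, most of it is placed in the receive path.
tx_loss, rx_loss = apportion_loss(4.0, target_db=-6.0, receive_share=0.75)
```

Here 7.5 dB of loss goes into the receive path and 2.5 dB into the transmit path, so the near-end talker is attenuated less than the incoming signal while the loop as a whole stays below 0 dB.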
Thus, in operation, when a near-end and far-end connection is established, an automatic initialization and calibration of the near-end and far-end units is generally performed. In the initialization and calibration procedure, the loop attenuation is calibrated and the operating parameters of the echo cancellation circuits are determined and adjusted, depending upon the particular environmental and background noise conditions existing at the time of calibration. However, even when such circuits, processing techniques, and echo cancellation procedures are used, the problems described above are not totally eliminated. For example, if one user suddenly changes the speaker volume, or if a noise source, such as a fan, suddenly starts, most speaker-phone systems still have a tendency to squeal, at least until the automatic initialization and calibration procedure can be performed again.
It can be seen that this approach and its variants may lead to incorrect decisions regarding the particular operating mode in which the speaker-phone system should be operating. The thresholds described above are commonly determined empirically, and are not robust to changes in environmental conditions, often resulting in a poorly performing speaker-phone system.
What is needed is a method and apparatus to accurately detect the presence of near-end speech to assure quality of transmission and to control feedback of the received far-end signal.