This invention relates to a telephone employing sub-band analysis and synthesis for echo cancellation and noise reduction and, in particular, to a control circuit that utilizes a plurality of voice activity detector (VAD) circuits in the sub-bands for controlling the operation of the telephone.
As used herein, “telephone” is a generic term for a communication device that utilizes, directly or indirectly, a dial tone from a licensed service provider. As such, “telephone” includes desk telephones (see FIG. 1), cordless telephones (see FIG. 2), speaker phones (see FIG. 3), hands free kits (see FIG. 4), and cellular telephones (see FIG. 5), among others. For the sake of simplicity, the invention is described in the context of telephones but has broader utility; e.g. communication devices that do not utilize a dial tone, such as radio frequency transceivers, and audio processing circuits that perform complex wave analysis, such as geophones and electronic stethoscopes.
As understood by those of ordinary skill in the relevant art, a voice activity detector (VAD) is an algorithm or circuit that distinguishes between speech (often accompanied by noise) and noise only. The output from a VAD is typically a single binary bit that indicates whether or not the input signal contains speech; see, for example, “Voice Activity Detection in Noisy Environments” by Stadermann, Stahl, and Rose, Eurospeech 2001 Scandinavia or U.S. Patent Application Publication 2003/0093268 (Zinser, Jr. et al.) paragraph [0303].
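By way of illustration only, the single-bit output of a VAD can be sketched as a comparison of frame energy with a noise estimate. The function names, the fixed noise floor, and the margin of 4.0 are assumptions made for this sketch and are not taken from the cited references.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def vad_bit(samples, noise_floor, margin=4.0):
    """Return 1 if the frame appears to contain speech, else 0.

    noise_floor is an estimate of noise-only energy; margin is an
    assumed multiplier, not a value from the cited references.
    """
    return 1 if frame_energy(samples) > noise_floor * margin else 0


# Hypothetical frames: one noise-only, one containing louder speech.
quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.5, -0.4, 0.45, -0.55]
noise_floor = frame_energy(quiet)

print(vad_bit(quiet, noise_floor), vad_bit(loud, noise_floor))
```

A practical detector would track the noise floor over time rather than fix it from a single frame.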
The detector described herein is referred to as a voice activity detector but is not limited to just that function. As will be apparent from a complete understanding of the invention, the detector can be adjusted to sense intelligence or patterns of various kinds, e.g. fax signals, not just voice signals. Calling the detector a “message” activity detector or a “communication” activity detector is no clearer than the more familiar term of voice activity detector and, therefore, these other terms are not used.
As well known to those of ordinary skill in the art, a double talk detector requires at least two signals for inputs and distinguishes one voice from another voice (as opposed to distinguishing a voice from noise); see for example Benesty et al. Advances in Network and Acoustic Echo Cancellation, Springer-Verlag©, 2001, Chapter 6 “A Fast Normalized Cross-Correlation DTD Combined with A Robust Multichannel Fast Recursive Least-Squares Algorithm,” or Gay and Benesty, Ed. Acoustic Signal Processing for Telecommunication, Kluwer Academic Publishers© 2000, Chapter 5 “Double Talk Detection Schemes for Acoustic Echo Cancellation.”
Virtually since the invention of the telephone, techniques have been developed to improve the clarity of the sound reproduced at each station. There are a number of techniques but two are of particular interest. A first technique uses what is known as sub-band analysis and synthesis, of which complementary comb filters, i.e. a plurality of filters wherein band pass filters alternate with band stop filters, are an example. Comb filters with complementary pass bands and stop bands are coupled in the two audio channels connecting the two stations of a telephone call. That is, the pass bands in one channel are the stop bands in the other channel. As a result, a signal traveling in one direction will be slightly attenuated but a signal traveling in a loop, i.e. an echo, will encounter both sets of stop bands and be highly attenuated.
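The loop attenuation described above can be illustrated with an idealized per-band gain model. The band count of eight and the pass and stop gains are assumed values chosen for the sketch; real comb filters have gradual band transitions that this model ignores.

```python
PASS_GAIN = 1.0   # assumed gain in a pass band
STOP_GAIN = 0.1   # assumed gain in a stop band


def comb_gains(num_bands, offset):
    """Alternate pass and stop bands; offset selects which bands pass."""
    return [PASS_GAIN if (k + offset) % 2 == 0 else STOP_GAIN
            for k in range(num_bands)]


# Complementary combs: the pass bands of one are the stop bands of the other.
transmit = comb_gains(8, 0)
receive = comb_gains(8, 1)

# An echo traverses both channels, so its gain per band is the product.
loop = [t * r for t, r in zip(transmit, receive)]

print(transmit)
print(receive)
print(loop)
```

In this model a one-way signal keeps every other band at full gain, while an echo is reduced to the stop-band gain in every band.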
The use of the complementary comb filters reduces the acoustic coupling between the speaker and microphone at each station as well as inter-station or line echo. Echo canceling circuits, which try to recognize a delayed signal as an echo, are much more complicated than complementary comb filter circuits and the two are often used together to eliminate echoes and other noises. However, comb filters degrade the quality of speech and do not always provide a sufficient margin of acoustic stability. One reason for the degradation is that the frequency response of a room in which the microphone and speaker of a station are located is characterized by a large number of resonant peaks. The band transitions in the comb filter transfer functions are often not sharp enough to suppress the resonant peaks, because if the transitions are too sharp the quality of the transmitted audio signal is adversely affected.
Complementary comb filter circuits are disclosed in U.S. Pat. No. 5,386,465 (Addeo et al.). This patent includes complementary comb filters in combination with other apparatus for processing audio signals to reduce noise. U.S. Pat. No. 4,991,167 (Petri et al.) discloses a slightly different system, illustrated in FIG. 6. Signals in the transmit direction are separated by filter block 11 into a set of bands, each including an attenuator, such as attenuator 12. Similarly, signals in the receive direction are separated by filter block 13 into the same set of bands, each including an attenuator, such as attenuator 14. The signals in the corresponding transmit band and receive band are compared, such as in comparator 15. The band with the smaller signal is attenuated by control circuit 16. Thus, the transmit and receive bands are paired and there is no logic interconnecting the control circuits for each pair.
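The paired-band comparison attributed above to the Petri et al. patent can be sketched as follows. The attenuation value of 0.25 and the handling of equal levels are assumptions for illustration; the patent itself is not quoted for either.

```python
def paired_band_gains(tx_levels, rx_levels, atten=0.25):
    """For each band pair, attenuate whichever direction is weaker.

    Each pair is decided on its own; no logic looks across bands.
    atten is an assumed gain for the attenuated band.
    """
    tx_gains, rx_gains = [], []
    for tx, rx in zip(tx_levels, rx_levels):
        if tx < rx:
            tx_gains.append(atten)
            rx_gains.append(1.0)
        elif rx < tx:
            tx_gains.append(1.0)
            rx_gains.append(atten)
        else:
            # Equal levels: leave both unattenuated (an assumption).
            tx_gains.append(1.0)
            rx_gains.append(1.0)
    return tx_gains, rx_gains


print(paired_band_gains([0.9, 0.1, 0.5], [0.2, 0.8, 0.5]))
```

Because each pair is decided independently, nothing prevents several adjacent bands in one channel from being attenuated at once.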
Another variation on the comb filter is disclosed in U.S. Pat. No. 3,567,873 (Peroni), illustrated in FIG. 7. In this patent, the receive signal is passed through a filter bank, represented by filters 21, 22, 23, 24, and 25. The signal in each sub-band is compared with a threshold in level detectors 26, 27, 28, 29, and 30, respectively. Relays 31, 32, 33, 34, and 35 close their respective contacts for each band of the received signal that exceeds its threshold. In an alternative embodiment, a second set of contacts is included in the receive channel and operated oppositely from the first set. As with the Petri patent, there is no control logic looking at all the sub-bands in both channels.
A problem with these approaches is that, unlike complementary comb filters, one can attenuate the signals in adjacent bands, thereby noticeably degrading the quality of the voice transmission. If the signal in one channel is particularly loud, the telephone is reduced to “half duplex” or simplex operation, i.e. single direction at a time because sounds from the other station are inaudible. The person speaking must stop and the circuits must re-settle before a person at the other station can be heard.
U.S. Pat. No. 6,798,881 discloses the system illustrated in FIG. 8. Transmitting channel 41 and receiving channel 42 operate independently except for control logic 40, which controls each variable gain amplifier, to which it is connected by a dashed line. When the circuit is first turned on, each variable gain amplifier is set to unity gain. At unity gain, a signal on input 43 is divided into a plurality of bands by the band pass filters and then recombined, unaffected, in summing circuit 44. Similarly, a signal on input 46 is divided into a plurality of bands by the band pass filters and then recombined, unaffected, in summing circuit 47.
The output of each band pass filter is also coupled to a detector, such as detector 51 at the output of filter 52. Detector 51 senses when the power of the signal from filter 52 briefly exceeds a threshold and provides a suitable signal to control logic 40. Detectors, such as detector 53, sense when the power of the signal exceeds a threshold for a longer period and provide a suitable signal to control logic 40. Control logic 40 analyzes the information from all inputs and controls the attenuators accordingly. In particular, echoes are reduced by controlling the attenuators in one channel in accordance with the amplitude of the signal in a corresponding band in the other channel. Background noise is reduced by attenuating the signals in a channel in accordance with the amplitude of the signals in each band of that channel. Two restrictions apply. First, adjacent bands in a channel may not be attenuated fully, i.e. set to minimum gain/maximum attenuation. Second, maximum attenuation does not take place in the same band in both channels. In general, control logic 40 operates to minimize background noise and echo. It is desired to improve the control of the signal level in each channel to prevent, to the extent possible, half duplex operation.
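The two restrictions on attenuation noted above can be sketched as a post-processing step on a set of requested band gains. This sketch reads the first restriction as forbidding two adjacent bands from both being at minimum gain, which is an interpretation. The gain values, the function name, and the choice of which band to relax are likewise assumptions; the sketch does not purport to reproduce control logic 40.

```python
MIN_GAIN = 0.1      # assumed minimum gain (maximum attenuation)
RELAXED_GAIN = 0.3  # assumed gain used when a restriction is violated


def enforce_restrictions(tx_gains, rx_gains):
    """Adjust requested gains so that both restrictions hold."""
    tx, rx = list(tx_gains), list(rx_gains)

    # The same band may not be at maximum attenuation in both channels.
    # Relaxing the receive side is an arbitrary choice for this sketch.
    for k in range(len(tx)):
        if tx[k] <= MIN_GAIN and rx[k] <= MIN_GAIN:
            rx[k] = RELAXED_GAIN

    # Adjacent bands in a channel may not both be fully attenuated.
    # Relaxing the higher band of the pair is likewise arbitrary.
    for gains in (tx, rx):
        for k in range(1, len(gains)):
            if gains[k - 1] <= MIN_GAIN and gains[k] <= MIN_GAIN:
                gains[k] = RELAXED_GAIN

    return tx, rx


print(enforce_restrictions([0.1, 0.1, 1.0, 0.1], [0.1, 1.0, 0.1, 0.1]))
```

Both passes only raise gains, so the second pass cannot reintroduce a violation of the restriction enforced by the first.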
Anyone who has used a typical speaker telephone is well aware of the cut off speech and the silent periods during a conversation caused by echo canceling circuitry. Such telephones operate in what is known as half-duplex mode, which means that only one person can speak at a time. While such silent periods assure that the sound from the speaker is not coupled directly into the microphone within a speaker telephone, the quality of the call is poor.
Whether to receive (listen) or transmit (talk) is not easily resolved in the particular application of telephone communication. Voices may overlap, so-called “double talk,” particularly if there are more than two parties to a call. Background noise may cause problems if the noise level is a significant percentage of the voice level. Pauses in a conversation do not necessarily mean that a person is finished speaking and that it is time for someone else to speak. A voice signal is a complex wave that is discontinuous because not all speech sounds use the vocal cords. Analyzing a voice signal in real time and deciding whether or not a person has finished speaking is a complex problem despite the ordinary human experience of doing it unconsciously or subconsciously. A variety of electronic systems have been proposed in the prior art for arbitrating send or receive but the problem remains.
U.S. Pat. No. 4,796,287 (Reesor et al.) discloses a speaker telephone in which a decremented counter provides a delay to channel switching by the remainder of the circuit. The magnitudes of the line signal and the microphone signal are used in determining whether or not to switch channels.
U.S. Pat. No. 4,879,745 (Arbel) discloses a half-duplex speaker telephone that controls the selection of either a transmit or a receive audio path based upon a present state of the speaker telephone and the magnitudes of three variables associated with each path. The three variables for each path include signal power, noise power, and worst-case echo.
U.S. Pat. No. 5,418,848 (Armbrüster) discloses a double talk detector wherein an evaluation circuit monitors voice signals upstream and downstream of echo canceling apparatus for detecting double talk. An up-down counter is incremented and decremented at different rates and a predetermined count is required before further signal processing takes place.
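An up-down counter that counts at different rates in each direction can be sketched as below. The step sizes, threshold, and ceiling are assumed values, and the input is taken to be one detection flag per interval; none of these details is drawn from the Armbrüster patent.

```python
def run_counter(flags, up=3, down=1, threshold=6, ceiling=10):
    """Return, per interval, whether the count has reached the threshold.

    The counter rises by `up` on a detection and falls by `down`
    otherwise, clamped to the range 0..ceiling.
    """
    count = 0
    decisions = []
    for flag in flags:
        if flag:
            count = min(ceiling, count + up)
        else:
            count = max(0, count - down)
        decisions.append(count >= threshold)
    return decisions


# Counts are 3, 6, 9, 8, 7, 6: a fast rise followed by a slow decay.
print(run_counter([1, 1, 1, 0, 0, 0]))
```

With these values a single isolated detection does not trigger further processing, and the decision persists for several intervals after detections stop.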
U.S. Pat. No. 5,598,466 (Graumann) discloses a voice activity detector including an algorithm for distinguishing voice from background noise based upon an analysis of average peak value of a voice signal compared to the current number of the audio signal.
U.S. Pat. No. 5,692,042 (Sacca) discloses a speaker telephone including non-linear amplifiers to compress transmitted and received signals, and level detectors to determine the levels of the compressed transmitted and received signals. The compressed signals are compared in a comparator having hysteresis to enable either transmit mode or receive mode.
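A comparator with hysteresis of the general kind described can be sketched in software, although the Sacca patent concerns an analog circuit. The hysteresis width of 0.2, the initial mode, and the level values are assumptions for illustration.

```python
def select_mode(tx_levels, rx_levels, hysteresis=0.2, mode="receive"):
    """Choose transmit or receive per interval, with hysteresis.

    The mode changes only when the other channel leads by more than
    the hysteresis width, which suppresses dithering near equality.
    """
    modes = []
    for tx, rx in zip(tx_levels, rx_levels):
        diff = tx - rx
        if mode == "receive" and diff > hysteresis:
            mode = "transmit"
        elif mode == "transmit" and diff < -hysteresis:
            mode = "receive"
        modes.append(mode)
    return modes


tx = [0.5, 0.55, 0.45, 0.9, 0.5, 0.1]
rx = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(select_mode(tx, rx))
```

Small differences between the channels, such as those in the second and third intervals, leave the mode unchanged.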
U.S. Pat. No. 5,764,753 (McCaslin et al.) discloses a double talk detector that compares the send and receive signals to determine “Return Echo Loss Enhancement,” which is stored as a digital value in a register. The digital value is adjusted over time and is used to provide a variable, rather than fixed, parameter to which new data is compared in determining whether to send or receive.
U.S. Pat. No. 5,867,574 (Eryilmaz) discloses a voice activity detection system that uses a voice energy term defined as the sum of the differences between consecutive values of a speech signal. Comparison of the voice energy term with threshold values and comparing the voice energy terms of the transmit and receive channels determines which channel will be active.
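A voice energy term based on differences between consecutive samples can be sketched as follows. The sketch sums absolute differences, which is an assumption: signed differences would telescope to the difference between the last and first samples. The threshold and the tie-breaking rule are also assumed.

```python
def voice_energy_term(samples):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def active_channel(tx_samples, rx_samples, threshold=1.0):
    """Pick the channel with the larger term, if it exceeds threshold."""
    tx_term = voice_energy_term(tx_samples)
    rx_term = voice_energy_term(rx_samples)
    if max(tx_term, rx_term) <= threshold:
        return None  # neither channel judged to contain voice
    return "transmit" if tx_term >= rx_term else "receive"


print(voice_energy_term([0, 1, -1, 2]))
print(active_channel([0, 1, -1, 2], [0, 0.1, 0, 0.1]))
```

The term requires only subtractions and additions per sample, which is consistent with the aim of simplifying computation, though data for every sample must still be processed.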
U.S. Pat. No. 6,138,040 (Nicholls et al.) discloses comparing the energy in each “frame” (thirty millisecond interval) of speech with background energy to determine whether or not speech is present in a channel. A timer is disclosed for bridging gaps between voiced portions of speech.
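Frame-by-frame comparison with background energy, combined with a timer that bridges gaps, can be sketched as below. The energy ratio of 2.0 and the hold time of two frames are assumed values, and the background energy is treated as a given constant rather than estimated as a practical system would require.

```python
def detect_with_hangover(frame_energies, background, ratio=2.0, hang_frames=2):
    """Flag speech per frame, holding the flag briefly after speech ends.

    A frame is speech if its energy exceeds ratio * background. The
    hangover counter keeps the flag set for hang_frames further frames.
    """
    decisions = []
    hang = 0
    for energy in frame_energies:
        if energy > ratio * background:
            hang = hang_frames
            decisions.append(True)
        elif hang > 0:
            hang -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# Hypothetical energies, one value per thirty millisecond frame.
print(detect_with_hangover([1, 5, 1, 1, 1, 6, 1], background=1))
```

The hold time trades clipped word endings against a longer period during which noise is treated as speech.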
Typically, these systems are implemented in digital form and manipulate large amounts of data in analyzing the input signals. The Sacca patent discloses an analog system using an amplifier with hysteresis to avoid dithering, which, to a large extent, is unavoidable with a simple amplitude comparison. On the other hand, an extensive computational analysis to determine relative power takes too long. The Eryilmaz patent attempts to simplify the amount of computation but still requires manipulation of significant amounts of data. All these systems manipulate amplitude data, or data derived from amplitude, up to the point of making a binary value signal indicating voice.
One can increase the speed of a system by reducing the amount of data being processed. Unfortunately, this typically reduces the resolution of the system. For example, all other parameters being equal, eight bit data is more quickly processed than sixteen bit data, but at lower resolution. In an acoustic environment, the quality or fidelity of the audio signal requires a minimum amount of data. Thus, the problem remains of speeding up a system other than by simply increasing the clock frequency.
Some of the prior art systems use historical data, e.g. three occurrences of what is interpreted as a voice signal. Such systems require large amounts of memory to handle the historical voice data and the current voice data.
Voice detection is not just used to determine transmit or receive. A reliable voice detection circuit is necessary in order to properly control echo canceling circuitry, which, if activated at the wrong time, can severely distort a desired voice signal. In the prior art, this problem has not been solved satisfactorily.
In view of the foregoing, it is therefore an object of the invention to provide an improved method and apparatus for controlling echo cancellation and noise reduction in a telephone.
Another object of the invention is to provide a method and apparatus for controlling a telephone to minimize half duplex operation during a call.
A further object of the invention is to provide a circuit having dynamically adjustable thresholds for analyzing energy content of a speech signal.
Another object of the invention is to provide a voice activity detector that does not require large amounts of data for reliable detection of a voice signal.