This invention relates to a voice activity detector and, in particular, to a circuit that provides a stable indication of voice activity for use in communication systems, such as speaker phones and other applications.
The detector described herein is referred to as a voice activity detector but is not so limited in function. As will be apparent from a complete understanding of the invention, the detector can be adjusted to detect messages of various kinds, e.g. fax signals, not just voice signals. Calling the detector a “message” activity detector or a “communication” activity detector is no clearer than using the more familiar term “voice activity detector” and, therefore, those terms are not used.
Anyone who has used current models of speaker phones is well aware of the cut off speech and the silent periods during a conversation caused by echo canceling circuitry within the speaker phone. Such phones operate in what is known as half-duplex mode, which means that only one person can speak at a time. While such silent periods assure that the sound from the speaker is not coupled directly into the microphone within a speaker phone, the quality of the call is poor.
Whether to receive (listen) or transmit (talk) is not easily resolved in the particular application of telephone communication. Voices may overlap, so-called “double talk,” particularly if there are more than two parties to a call. Background noise may cause problems if the noise level is a significant percentage of the voice level. Pauses in a conversation do not necessarily mean that a person is finished speaking and that it is time for someone else to speak. A voice signal is a complex wave that is discontinuous because not all speech sounds use the vocal cords. Analyzing a voice signal in real time and deciding whether or not a person has finished speaking is a complex problem despite the ordinary human experience of doing it unconsciously or subconsciously. A variety of electronic systems have been proposed in the prior art for arbitrating send or receive but the problem remains.
U.S. Pat. No. 4,796,287 (Reesor et al.) discloses a speaker phone in which a decremented counter provides a delay to channel switching by the remainder of the circuit. The magnitudes of the line signal and the microphone signal are used in determining whether or not to switch channels.
U.S. Pat. No. 4,879,745 (Arbel) discloses a half-duplex speaker phone that controls the selection of either a transmit or a receive audio path based upon a present state of the speaker phone and the magnitudes of three variables associated with each path. The three variables for each path include signal power, noise power, and worst-case echo.
U.S. Pat. No. 5,418,848 (Armbrüster) discloses a double talk detector wherein an evaluation circuit monitors voice signals upstream and downstream of echo canceling apparatus for detecting double talk. An up-down counter is incremented and decremented at different rates and a predetermined count is required before further signal processing takes place.
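The general idea of an up-down counter that counts at different rates and must reach a predetermined count can be sketched as follows. This is only an illustration of the technique, not the circuit of the cited patent; the function name `up_down_detect`, the step sizes, the threshold, and the saturation limit are all assumptions chosen for the example.

```python
def up_down_detect(flags, up=3, down=1, threshold=6, max_count=10):
    """Asymmetric up-down counter (hypothetical parameters).

    flags: iterable of booleans, True when a raw per-sample test
           suggests the condition of interest (e.g. double talk).
    Returns a list of booleans, True once the count has reached
    the predetermined threshold.
    """
    count = 0
    decisions = []
    for flag in flags:
        if flag:
            # Count up quickly, saturating at max_count.
            count = min(count + up, max_count)
        else:
            # Count down slowly, never below zero.
            count = max(count - down, 0)
        decisions.append(count >= threshold)
    return decisions


if __name__ == "__main__":
    print(up_down_detect([True, False, True, True, False]))
```

Because the counter rises faster than it falls, a single contrary sample does not immediately undo a decision, but an isolated positive sample is not enough to trigger one either.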
U.S. Pat. No. 5,598,466 (Graumann) discloses a voice activity detector including an algorithm for distinguishing voice from background noise based upon an analysis of average peak value of a voice signal compared to the current number of the audio signal.
U.S. Pat. No. 5,692,042 (Sacca) discloses a speaker phone including non-linear amplifiers to compress transmitted and received signals, and level detectors to determine the levels of the compressed transmitted and received signals. The compressed signals are compared in a comparator having hysteresis to enable either transmit mode or receive mode.
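A comparison with hysteresis can be sketched in software as shown below. This is a minimal illustration under stated assumptions, not the analog circuit of the cited patent: the function name `hysteresis_select`, the symmetric `margin`, and the initial mode are hypothetical.

```python
def hysteresis_select(tx_levels, rx_levels, margin, initial="receive"):
    """Select transmit or receive with hysteresis (illustrative only).

    The mode changes only when the other channel exceeds the
    current one by more than `margin`, an assumed symmetric band.
    """
    mode = initial
    modes = []
    for tx, rx in zip(tx_levels, rx_levels):
        diff = tx - rx
        if mode == "receive" and diff > margin:
            mode = "transmit"
        elif mode == "transmit" and diff < -margin:
            mode = "receive"
        # Differences inside the band leave the mode unchanged.
        modes.append(mode)
    return modes


if __name__ == "__main__":
    tx = [1.0, 1.2, 2.0, 1.1, 0.2]
    rx = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(hysteresis_select(tx, rx, margin=0.5))
```

With a margin of zero the function reduces to a simple amplitude comparison, which switches whenever the two levels cross.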
U.S. Pat. No. 5,764,753 (McCaslin et al.) discloses a double talk detector that compares the send and receive signals to determine “Return Echo Loss Enhancement,” which is stored as a digital value in a register. The digital value is adjusted over time and is used to provide a variable, rather than fixed, parameter to which new data is compared in determining whether to send or receive.
U.S. Pat. No. 5,867,574 (Eryilmaz) discloses a voice activity detection system that uses a voice energy term defined as the sum of the differences between consecutive values of a speech signal. Comparison of the voice energy term with threshold values and comparing the voice energy terms of the transmit and receive channels determines which channel will be active.
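An energy term of this kind might be computed as in the sketch below. The text above says only "sum of the differences"; the use of absolute differences here is an assumption (a plain signed sum would collapse to the last sample minus the first), and the names `voice_energy` and `active_channel` and the single threshold are hypothetical.

```python
def voice_energy(samples):
    """Sum of absolute differences between consecutive samples.

    Taking the absolute value is an assumption made for this sketch.
    """
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def active_channel(tx, rx, threshold):
    """Pick a channel from two energy terms (illustrative logic only)."""
    e_tx = voice_energy(tx)
    e_rx = voice_energy(rx)
    if max(e_tx, e_rx) < threshold:
        return "idle"  # neither channel exceeds the assumed threshold
    return "transmit" if e_tx >= e_rx else "receive"


if __name__ == "__main__":
    print(voice_energy([0, 3, 1, 4]))
    print(active_channel([0, 3, 1, 4], [0, 1, 0, 1], threshold=5))
```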
U.S. Pat. No. 6,138,040 (Nicholls et al.) discloses comparing the energy in each “frame” (thirty millisecond interval) of speech with background energy to determine whether or not speech is present in a channel. A timer is disclosed for bridging gaps between voiced portions of speech.
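Frame-based detection with a gap-bridging timer can be illustrated roughly as follows. This is a sketch under assumptions, not the method of the cited patent: the function name `frame_vad`, the mean-square energy measure, the fixed `background` estimate, the `ratio` test, and the hangover length in frames are all hypothetical.

```python
def frame_vad(samples, frame_len, background, ratio=2.0, hang_frames=2):
    """Per-frame speech decision with a hangover timer (illustrative).

    A frame is marked active when its mean-square energy exceeds
    ratio * background. After an active frame, up to `hang_frames`
    quiet frames are still marked active to bridge short gaps.
    """
    decisions = []
    hang = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > ratio * background:
            hang = hang_frames      # restart the gap-bridging timer
            decisions.append(True)
        elif hang > 0:
            hang -= 1               # quiet frame inside the hangover
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


if __name__ == "__main__":
    signal = [4, 4, 0, 0, 0, 0, 0, 0, 4, 4]
    print(frame_vad(signal, frame_len=2, background=1.0, hang_frames=1))
```

At a sampling rate of 8 kHz, for example, a thirty millisecond frame would correspond to `frame_len=240`.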
Typically, these systems are implemented in digital form and manipulate large amounts of data in analyzing the input signals. The Sacca patent discloses an analog system using an amplifier with hysteresis to avoid dithering, which, to a large extent, is unavoidable with a simple amplitude comparison. On the other hand, an extensive computational analysis to determine relative power takes too long. The Eryilmaz patent attempts to simplify the amount of computation but still requires manipulation of significant amounts of data. All these systems manipulate amplitude data, or data derived from amplitude, up to the point of producing a binary-valued signal indicating voice.
One can increase the speed of a system by reducing the amount of data being processed. Unfortunately, this typically reduces the resolution of the system. For example, all other parameters being equal, eight-bit data is processed more quickly than sixteen-bit data, but at lower resolution. In an acoustic environment, the quality or fidelity of the audio signal requires a minimum amount of data. Thus, the problem remains of speeding up a system other than by simply increasing the clock frequency.
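The loss of resolution with shorter words can be made concrete with a small numerical sketch. The signed fixed-point format assumed here, with samples in the range -1 to just under +1, and the names `step_size` and `quantize` are choices made for illustration only.

```python
def step_size(bits):
    """Smallest amplitude step for signed samples spanning [-1, 1)."""
    return 1.0 / 2 ** (bits - 1)


def quantize(x, bits):
    """Round x to the nearest representable value (assumed format)."""
    levels = 2 ** (bits - 1)
    code = max(-levels, min(levels - 1, round(x * levels)))
    return code / levels


if __name__ == "__main__":
    for bits in (8, 16):
        q = quantize(0.3, bits)
        print(bits, step_size(bits), q, abs(q - 0.3))
```

Under these assumptions, the eight-bit step is 256 times coarser than the sixteen-bit step.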
Some of the prior art systems use historical data, e.g. three occurrences of what is interpreted as a voice signal. Such systems require large amounts of memory to handle the historical data and the current data.
Voice detection is not just used to determine transmit or receive. A reliable voice detection circuit is necessary in order to properly control echo cancelling circuitry, which, if activated at the wrong time, can severely distort a desired voice signal. In the prior art, this problem has not been solved satisfactorily.
In view of the foregoing, it is therefore an object of the invention to provide an improved method for analyzing the energy content of an incoming signal.
Another object of the invention is to provide a simple but effective circuit for detecting voice.
A further object of the invention is to provide a circuit having dynamically adjustable thresholds for analyzing energy content of a speech signal.
Another object of the invention is to provide a voice activity detector that does not require large amounts of data for reliable detection of a voice signal.
A further object of the invention is to provide an apparatus and a method for analyzing the envelope of a signal with minimal computation.
Another object of the invention is to provide an apparatus and a method for analyzing a signal that is less hardware intensive than in the prior art.
A further object of the invention is to provide an apparatus and a method for analyzing a signal that is faster than in the prior art.
Another object of the invention is to reduce the amount of data being processed without reducing the resolution of the system.
A further object of the invention is to provide reliable activation of echo cancelling circuitry.