Because of the popularity of the Internet, a growing need for remote access, and the increase in data traffic volume that has exceeded the voice traffic volume through the voice and data communication networks, the transmission of voice as data rather than circuit switched voice is becoming more important. The problem that exists when voice is transmitted as data such as voice-over-packet technology or voice-over-the-Internet is to guarantee the quality of service. To reduce the bandwidth required to carry voice, voice-over-packet systems employ a voice activity detection to suppress the packetization of voice signals between individual speech utterances such as the silent periods in a voice conversation. Such techniques adapt to varying levels of noise and converge on appropriate thresholds for a given voice conversation. Use of voice activity detection reduces the required bandwidth of an aggregation of channels 50% to 60% for conversations that are essentially half-duplex, only one person speaks at a time in a half-duplex conversation.
When silence suppression is being used, a noise generator at the receiving end compliments the suppression of silence at the transmitting end by generating a local noise signal during the silent periods rather than muting the channel or playing nothing. Muting the channel gives the listener the unpleasant impression of a dead line. The match between the generated noise and the true background noise determines the quality of the noise generator.
Within the prior art, it is welt known that voice activity detection to determine silence and the removal of those silent periods can cause speech utterances to sound choppy and unconnected when cutting in or out of the speech. Two terms are utilized to express this problem. First, front-end clipping refers to clipping the beginning of an utterance. Second, holdover time refers to the time the activity detector continues to packetize speech after the voice signal level falls below the speech threshold. The holdover time is normally set to the period between words as has been determined for a particular conversation so as to avoid front-end clipping at the beginning of each word. However, excessive holdover times reduce network efficiency and too little causes speech to sound choppy.