This invention relates in general to packetized voice communication systems, and more particularly to a method of detecting silence in a stream of voice packets that is robust to low-energy fricatives at the end of speech bursts. The method requires very little computation and can easily be implemented in hardware.
A packetized voice transmission system comprises a transmitter and a receiver. The transmitter collects voice samples and groups them into packets for transmission across a network to the receiver. The transmitter performs no operations upon the data. The data is companded according to μ-law or A-law, as defined in ITU-T specification G.711, and is transmitted continuously at a constant Time Division Multiplexing (TDM) data rate.
In order to save network bandwidth, packets of samples are transmitted only if voice activity is detected in the packet (i.e. voice data is not transmitted if the packet contains silence). It is known in the art for transmitters to test each packet for silence prior to transmission and, after a sequence of packets has been detected as containing silence, to inhibit transmission of subsequent silence packets until the next "non-silent" packet is detected.
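By way of illustration only, the known gating behaviour described above may be sketched as follows. The class name `SilenceGate`, the parameter `hangover_packets` and its default value are hypothetical and are not taken from any cited reference; the sketch assumes that each packet has already been classified as silent or non-silent by some other means.

```python
class SilenceGate:
    """Illustrative transmit gate (hypothetical names, not from the source).

    Transmission continues for `hangover_packets` consecutive silent packets,
    is then inhibited, and resumes at the next non-silent packet.
    """

    def __init__(self, hangover_packets=3):
        # Assumed value: number of silent packets still sent before inhibiting.
        self.hangover_packets = hangover_packets
        self.silent_run = 0  # count of consecutive silent packets seen so far

    def should_transmit(self, packet_is_silent):
        if not packet_is_silent:
            # A non-silent packet always re-enables transmission.
            self.silent_run = 0
            return True
        self.silent_run += 1
        # Keep sending until the run of silence exceeds the hangover.
        return self.silent_run <= self.hangover_packets


gate = SilenceGate(hangover_packets=2)
# False = non-silent packet, True = silent packet
flags = [False, True, True, True, True, False]
decisions = [gate.should_transmit(f) for f in flags]
```

In this sketch the first two silent packets are still sent, the following silent packets are suppressed, and the final non-silent packet is sent again.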
In the event of silence detection, it is known to generate comfort noise for the listening party, as set forth in commonly-assigned UK Patent Application No. 9927595.0 filed Nov. 22, 1999.
One example of a prior art system utilises complex digital signal processing (DSP) to detect voice, rather than silence, as set forth in U.S. Pat. No. 5,276,765 and Appendix A of ITU-T specification G.728.1.
Another approach is based on determining the energy level of a signal and comparing it with a silence threshold energy level. This approach is less effective than the previously mentioned DSP approach but is considerably less expensive to implement in hardware. Examples of this latter approach are set forth in U.S. Pat. Nos. 4,028,496; 4,167,653; 4,277,645; 5,737,695 and 5,867,574.
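A minimal sketch of such an energy comparison is given below for illustration. It assumes that the samples of a packet have already been expanded from their companded form to linear values, and it uses mean squared amplitude as the energy measure; both are assumptions, since the cited patents differ in their details. The function names and the threshold value are hypothetical.

```python
def packet_energy(samples):
    """Mean squared amplitude of one packet of linear samples (assumed measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_silent(samples, threshold):
    """Classify a packet as silent when its energy is below a fixed threshold."""
    return packet_energy(samples) < threshold


SILENCE_THRESHOLD = 4.0  # hypothetical fixed threshold, in squared sample units

quiet_packet = [1, -1, 1, -1]        # energy 1.0
loud_packet = [100, -100, 100, -100]  # energy 10000.0

quiet_result = is_silent(quiet_packet, SILENCE_THRESHOLD)
loud_result = is_silent(loud_packet, SILENCE_THRESHOLD)
```

The computation reduces to one multiply-accumulate per sample and one comparison per packet, which is consistent with the low hardware cost noted above.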
According to the present invention, a system is provided for detecting silence in a voice packet by comparing the voice energy with an adaptive silence threshold which allows for varying levels of background noise in the transmitter. In response to detecting silence, the transmitter is halted in order to preserve channel bandwidth. Inhibition of the transmitter is delayed after detecting silence so as not to clip the beginning or ending of talk spurts and so as to pass fricatives.
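One possible way of adapting a silence threshold to the background noise is sketched below for illustration only. It is not the method of the invention, whose details are not given in this passage. The sketch assumes that the noise floor is tracked as an exponential moving average of the energy of packets classified as silent, and that the threshold is a fixed multiple of that floor. The class name `AdaptiveThreshold` and the parameters `initial_floor`, `margin`, `alpha` and `min_floor` are all hypothetical.

```python
class AdaptiveThreshold:
    """Illustrative adaptive silence threshold (assumed scheme, hypothetical names)."""

    def __init__(self, initial_floor=100.0, margin=4.0, alpha=0.25, min_floor=1.0):
        self.floor = initial_floor  # current estimate of background noise energy
        self.margin = margin        # threshold = floor * margin (assumed rule)
        self.alpha = alpha          # smoothing factor of the moving average
        self.min_floor = min_floor  # keeps the threshold above zero

    @property
    def threshold(self):
        return self.floor * self.margin

    def classify(self, energy):
        """Return True if the packet energy counts as silence, updating the floor."""
        silent = energy < self.threshold
        if silent:
            # Only silent packets update the noise floor, so speech does not
            # drag the threshold upwards.
            updated = self.floor + self.alpha * (energy - self.floor)
            self.floor = max(self.min_floor, updated)
        return silent


detector = AdaptiveThreshold(initial_floor=100.0, margin=4.0, alpha=0.5)
first = detector.classify(200.0)    # below 400: silent, floor rises to 150
second = detector.classify(500.0)   # below 600: silent, floor rises to 325
third = detector.classify(5000.0)   # above 1300: not silent, floor unchanged
```

A scheme of this kind cannot follow an abrupt rise in background noise above the current threshold, because such packets are never classified as silent; a practical design would need some further means of recovery.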