A high percentage of a conversation between two or more people is silence, during which no voice activity takes place. In telephone networks providing voice services, any transmission of voice payload during these periods of silence wastes bandwidth. Telecommunications service providers have recognized this and generally strive to apply silence suppression when no voice activity is taking place as a way to realize bandwidth savings. When silence suppression is applied in networks transmitting voice over packets (e.g., voice over internet protocol (VoIP) networks, or voice over asynchronous transfer mode (VoATM) networks), no packets are transmitted during periods of silence. Voice activity detection (VAD) is used to determine whether or not to transmit packets, i.e., whether to suppress silence. The associated feature (voice activity detection and directed silence suppression) is often referred to simply as VAD, which is somewhat of a simplification of terms, as VAD is what dynamically controls, i.e., turns on and off, silence suppression.
Generally, VAD engages silence suppression only after a certain integration period during which no voice activity takes place, typically 250 ms. This allows the system to distinguish real periods of voice inactivity from mere temporary drops in the wave pattern generated by speech. Likewise, when voice activity resumes after a period of silence, a certain period of time is required to determine that voice activity is in fact resuming (as opposed to, e.g., a spike caused by static), and only after that period is silence suppression turned off again.
This leads to the problem of clipping, i.e., the initial period of voice activity before silence suppression is turned off, perhaps a few tens of milliseconds, is not transmitted and is therefore lost. Although the loss is brief, the result is a noticeable degradation of the quality of voice service to the end users, as, e.g., the initial syllable of a word is cut off after each brief period of voice inactivity, as observed on VISM. As a result, some customers may ask their voice service providers to turn VAD off, which prevents the service providers from realizing the substantial bandwidth savings associated with VAD.
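The interplay of the integration period, the onset confirmation period, and the resulting clipping can be illustrated with a minimal sketch. The function name `gate_frames`, the 10 ms frame size, the 30 ms onset period, and the per-frame boolean activity flags are assumptions made purely for illustration; only the 250 ms integration period comes from the description above, and no particular product's detector is being reproduced.

```python
def gate_frames(active_flags, frame_ms=10, hangover_ms=250, onset_ms=30):
    """Return, per frame, True if the frame would be transmitted.

    active_flags: one boolean per frame from a (hypothetical) activity detector.
    hangover_ms:  integration period of inactivity before suppression starts.
    onset_ms:     period of sustained activity needed before suppression ends
                  (assumed value, chosen only for illustration).
    """
    hangover_frames = hangover_ms // frame_ms
    onset_frames = onset_ms // frame_ms
    suppressing = False   # start in the transmitting state
    silent_run = 0        # consecutive inactive frames while transmitting
    active_run = 0        # consecutive active frames while suppressing
    transmitted = []
    for active in active_flags:
        if suppressing:
            active_run = active_run + 1 if active else 0
            if active_run >= onset_frames:
                suppressing = False   # voice confirmed, resume sending
                silent_run = 0
        else:
            silent_run = 0 if active else silent_run + 1
            if silent_run >= hangover_frames:
                suppressing = True    # silence confirmed, stop sending
                active_run = 0
        transmitted.append(not suppressing)
    return transmitted


# 50 ms of speech, 400 ms of silence, then 100 ms of speech.
flags = [True] * 5 + [False] * 40 + [True] * 10
sent = gate_frames(flags)
# Active frames that were never transmitted, i.e. the clipped onset.
clipped = [i for i, (a, s) in enumerate(zip(flags, sent)) if a and not s]
```

In this sketch the first two frames of the resumed speech (20 ms under the assumed parameters) fall inside the onset confirmation period and are never sent, which is the clipping described above. A short dip in activity that is shorter than the integration period does not trigger suppression at all.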
Another conventional solution is to buffer the voice signals. An incoming voice signal is forwarded into a buffer. After voice activity is detected, the buffer starts to be played out. This way, no voice activity is lost, because the buffer holds the signal for the period of time necessary to turn off silence suppression after voice activity initially occurs. However, this solution introduces a significant delay in voice transmission, which in itself constitutes another degradation of the quality of voice service, severe enough to be generally unacceptable.
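The trade-off of the buffering approach can be sketched in the same spirit. The function name `buffered_playout`, the frame-index representation, and the three-frame onset period are hypothetical choices for illustration, and the sketch models only the onset after a period of silence (it does not model the return to suppression).

```python
from collections import deque


def buffered_playout(active_flags, onset_frames=3):
    """Return (send_tick, frame_index) pairs for the frames that get played out.

    Every incoming frame enters a buffer. Playout begins once `onset_frames`
    consecutive active frames have confirmed voice activity (assumed value).
    """
    buffer = deque()
    playing = False
    active_run = 0
    sent = []
    for tick, active in enumerate(active_flags):
        buffer.append(tick)
        if not playing:
            active_run = active_run + 1 if active else 0
            if active_run >= onset_frames:
                playing = True   # voice confirmed, start draining the buffer
            else:
                # Keep only frames that may belong to a pending onset.
                while len(buffer) > active_run:
                    buffer.popleft()
        if playing:
            # One frame leaves per tick, so the backlog never shrinks.
            sent.append((tick, buffer.popleft()))
    return sent


# 4 silent frames followed by 6 active frames.
playout = buffered_playout([False] * 4 + [True] * 6)
delays = [tick - frame for tick, frame in playout]
```

Here the first active frame (index 4) is played out rather than clipped, but every frame leaves the buffer two ticks after it arrived. Under these assumptions the delay equals the onset confirmation time and persists for as long as the speech continues, which is the drawback noted above.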