Voice activity detection (VAD) is a technique used in voice processing to decide whether human voice is present or absent in an audio signal. VAD may be employed in voice communication applications to deactivate some processes during the non-voice sections of an audio session. For example, VAD can avoid unnecessary coding and transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving both computation and network bandwidth. VAD is an integral part of many voice communication systems, such as audio conferencing, echo cancellation, speech recognition, speech encoding, and hands-free telephony.
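As an illustration only, the silence-suppression use of VAD described above can be sketched with a simple per-frame energy threshold. The energy criterion, the function names `frame_energy_db` and `frames_to_send`, and the -40 dB threshold are assumptions made for this sketch; practical detectors typically rely on additional features and smoothing.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB, for samples scaled to [-1, 1]."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    # An all-zero frame has no defined log energy; treat it as minus infinity.
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def frames_to_send(frames, threshold_db=-40.0):
    """Return the indices of frames judged to contain voice.

    Frames at or below the (assumed) threshold are treated as non-voice
    and would be skipped by the encoder and the transmitter.
    """
    return [i for i, frame in enumerate(frames)
            if frame_energy_db(frame) > threshold_db]

# Three 20 ms frames at an assumed 8 kHz sampling rate (160 samples each).
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
faint_noise = [0.001] * 160

voiced = frames_to_send([silence, tone, faint_noise])
```

In this sketch only the frame with the tone exceeds the threshold, so the other two frames would be neither coded nor transmitted.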
In a typical networking environment, there is some uncertainty about whether and when packets sent from one endpoint to another will be received. For applications where the transmitted data is mediated or is not consumed in real time, this uncertainty can be overcome by monitoring the transmission and correcting for network impairments, either by retransmitting data or by introducing suitable delays. For applications where parties interact with each other in real time, such as voice communication, this approach is not feasible, because the added delay may significantly degrade utility and perceived quality.
A conventional approach to this problem is to introduce a queue, or jitter buffer, that is long enough to provide resilience to some level of network jitter, without necessarily ensuring that all data packets of the voice stream arrive. In some approaches, the length of the jitter buffer is set from statistical or historical information, so that the resulting delay permits a certain percentage of packets to be received in time for processing, forwarding, or decoding and use.
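A minimal sketch of sizing a jitter buffer from historical delay information follows. It assumes the buffer delay is chosen as the nearest-rank percentile of previously observed packet delays; the function names `jitter_buffer_delay` and `fraction_in_time` and the sample delay values are hypothetical and serve only to illustrate the idea.

```python
import math

def jitter_buffer_delay(delays_ms, target_fraction=0.95):
    """Smallest observed delay that covers target_fraction of past packets.

    Uses the nearest-rank percentile of the historical delays.
    """
    if not delays_ms:
        raise ValueError("need at least one observed delay")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    ordered = sorted(delays_ms)
    rank = math.ceil(target_fraction * len(ordered))  # 1-based rank
    return ordered[rank - 1]

def fraction_in_time(delays_ms, buffer_ms):
    """Fraction of packets whose delay does not exceed the buffer delay."""
    return sum(d <= buffer_ms for d in delays_ms) / len(delays_ms)

# Hypothetical one-way delays (ms) with two late outliers.
history = [20, 22, 25, 21, 80, 23, 24, 26, 22, 150]

buffer_90 = jitter_buffer_delay(history, 0.9)    # 80 ms covers 9 of 10 packets
buffer_100 = jitter_buffer_delay(history, 1.0)   # 150 ms covers every packet
covered = fraction_in_time(history, buffer_90)   # 0.9
```

With these sample values, admitting the last packet requires raising the buffer delay from 80 ms to 150 ms, which shows why such buffers usually target a percentage of packets rather than all of them.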
Such jitter buffers necessarily introduce latency into the overall communications path. Many approaches have been proposed to manage this trade-off between quality and latency, and to improve encoding, decoding, or processing so as to reduce the impact of missing data packets on the output voice stream.