This invention relates generally to a system for communicating real-time audio, video, and data signals over a packet-switched data network and more particularly to mapping Voice Activity Detection (VAD) to a scheduled access media.
A voice or other type of data stream is transmitted over a packet network by first formatting the data stream into multiple discrete packets. For example, in a Voice over Internet Protocol (VoIP) application, a digitized audio stream is packetized, and the packets are placed onto a packet network and routed to a packet telephony receiver. The receiver converts the packets back into a continuous digital audio stream that resembles the input audio stream. A CODEC (a compression/decompression algorithm) is used to reduce the communication bandwidth required for transmitting the audio packets over the network.
Voice Activity Detection (VAD), also known as Silence Suppression, is a voice processing technique used in packet-switched networks to reduce bandwidth usage. With VAD, a transmitting CODEC sends audio samples only when audio signals are above a set audio energy threshold. For example, audio packets are not generated and transmitted over the packet network when the speaker is not currently talking. Without VAD, audio packets would be generated that contain only background noise.
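By way of illustration only, the energy-threshold decision described above might be sketched as follows. The function names, the mean-squared energy measure, and the threshold value are assumptions made for this example; an actual CODEC may use a different energy measure and additional smoothing.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples (assumed measure)."""
    return sum(s * s for s in samples) / len(samples)


def vad_filter(frames, threshold):
    """Keep only the frames whose energy is above the threshold.

    Frames at or below the threshold are treated as silence and are
    never turned into packets.
    """
    return [frame for frame in frames if frame_energy(frame) > threshold]


# Hypothetical sample data: two talk frames around one silent frame.
speech = [1000, -1200, 900, -1100]
silence = [3, -2, 1, -4]
sent = vad_filter([speech, silence, speech], threshold=100.0)
```

In this sketch only the two speech frames remain in `sent`; the silent frame is suppressed and consumes no network bandwidth.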
The receiving CODEC compensates for the silence intervals by inserting silence or comfort noise equal to the perceived background noise of the conversation. VAD reduces the network bandwidth required for conducting a phone conversation and accommodates roughly twice as many voice conversations on the packet-switched network.
One problem with VAD is that additional packet latency is created from the starting and stopping of packet generation and transmission. VAD is also not currently incorporated into scheduled access media, such as cable modem networks. In a cable modem network, packets from multiple cable modems are scheduled for transmission during allocated grants. This grant scheduling adds to the latency already created by VAD.
Accordingly, a need remains for incorporating VAD into a scheduled access media while also reducing VAD-induced latency.
A network processing node allocates unsolicited grants at a selected time interval for scheduling transmission of audio packets. The network processing node switches from allocating unsolicited grants to providing a polling request when Voice Activity Detection (VAD) at a transmitting endpoint stops generating and transmitting audio packets. The network processing node switches back to allocating unsolicited grants when the endpoint starts generating more audio packets.
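The switching behavior described above can be pictured as a two-state machine. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the boolean `endpoint_has_audio` stands in for whatever mechanism the network processing node uses to learn that the endpoint has stopped or resumed sending audio packets.

```python
from enum import Enum


class GrantMode(Enum):
    UNSOLICITED = "unsolicited_grants"
    POLLING = "polling"


class GrantScheduler:
    """Hypothetical scheduler for one endpoint at the network processing node."""

    def __init__(self):
        self.mode = GrantMode.UNSOLICITED

    def on_interval(self, endpoint_has_audio):
        """Choose the allocation mode for the next time interval."""
        if self.mode is GrantMode.UNSOLICITED and not endpoint_has_audio:
            # VAD has stopped packet generation: fall back to polling.
            self.mode = GrantMode.POLLING
        elif self.mode is GrantMode.POLLING and endpoint_has_audio:
            # Audio has resumed: return to unsolicited grants.
            self.mode = GrantMode.UNSOLICITED
        return self.mode


scheduler = GrantScheduler()
activity = [True, True, False, False, True]  # talk, talk, silence, silence, talk
modes = [scheduler.on_interval(active) for active in activity]
```

For the activity pattern shown, the scheduler stays in the unsolicited grant mode for the first two intervals, polls during the two silent intervals, and returns to unsolicited grants when audio resumes.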
The unsolicited grants include one or more additional grants, within the selected time interval at grant startup, that flush out one or more audio packets that may already be queued for transmission. These additional grants reduce the latency caused by VAD stopping and then restarting audio packet transmission at the endpoint.
The network endpoint transmits the audio packets from a transmit queue. The audio packets arrive in the transmit queue when VAD detects audio signals above a predefined energy threshold. The additional grants sent by the network processing node allow transmission of multiple audio packets from the transmit queue during the same grant time interval. This eliminates delays in audio packet playout caused while a receiving jitter buffer waits for a minimum number of audio packets.
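The effect of the additional grants on the transmit queue can be illustrated with a small simulation. This is a sketch only: it assumes one grant carries one audio packet, that the CODEC adds one new packet to the queue per time interval, and that the additional grants are issued only in the first interval after restart. The function name and packet labels are hypothetical.

```python
from collections import deque


def simulate(backlog, intervals, extra_grants_first_interval):
    """Return the packets sent in each grant interval after VAD restarts."""
    queue = deque(f"pkt{i}" for i in range(backlog))
    next_id = backlog
    sent_per_interval = []
    for t in range(intervals):
        # One regular grant per interval, plus extra grants at startup only.
        grants = 1 + (extra_grants_first_interval if t == 0 else 0)
        sent = [queue.popleft() for _ in range(min(grants, len(queue)))]
        sent_per_interval.append(sent)
        # Assumed: the codec produces one new packet per interval.
        queue.append(f"pkt{next_id}")
        next_id += 1
    return sent_per_interval


with_extra = simulate(backlog=2, intervals=3, extra_grants_first_interval=1)
without_extra = simulate(backlog=2, intervals=3, extra_grants_first_interval=0)
```

With one additional grant, both queued packets leave in the first interval and every later packet is sent in the interval after it is produced. Without it, the second queued packet waits a full interval, and that extra delay is carried by every subsequent packet.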
In one embodiment of the invention, the network endpoint comprises a Cable Modem (CM) and the network processing node comprises a Cable Modem Termination System (CMTS). VAD is enabled between the CM and the CMTS according to a Data Over Cable System Interface Specification (DOCSIS). However, the invention can be implemented in any scheduled access media or access media protocol.
In another aspect of the invention, a notification is sent to the CMTS immediately after audio activity is detected. The CM then encodes the audio signals while waiting for the CMTS to reallocate grants. Because audio signal encoding is overlapped with CM notification and CMTS grant reallocation, the initial encoding latency is eliminated.
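The latency benefit of overlapping the two operations can be expressed with simple arithmetic. The sketch below is illustrative only; the function name and the millisecond values are assumptions and are not taken from any measurement.

```python
def startup_latency_ms(encode_ms, realloc_ms, overlapped):
    """Delay before the first audio packet can be transmitted.

    Sequential: encode first, then notify and wait for reallocated grants.
    Overlapped: notify immediately and encode while waiting.
    """
    if overlapped:
        return max(encode_ms, realloc_ms)
    return encode_ms + realloc_ms


# Hypothetical figures: 10 ms to encode, 15 ms for notification and reallocation.
sequential = startup_latency_ms(10, 15, overlapped=False)
overlapped = startup_latency_ms(10, 15, overlapped=True)
saved = sequential - overlapped
```

Under these assumed figures the overlapped case saves exactly the encoding time. The encoding latency is fully hidden only when notification and grant reallocation take at least as long as encoding.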
The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention which proceeds with reference to the accompanying drawings.