1. Field of the Invention
The present invention relates to the field of telecommunications and, more specifically, to managing real-time data packet receipt and playout in the presence of variable packet delays.
2. Description of the Related Art
Real-time digital audio, such as Internet telephony and audio playback in World Wide Web browsers, employs packetized audio data that is transferred over a network. Each packet contains information that allows the data network to route it to the appropriate destination. Packets from many different transmitters travel sequentially over single connections between routing points (nodes), and packets from the same transmitter (source) may take different paths through the nodes of the network. Consequently, each packet in a sequence of packets from a specific source to a specific receiver (destination) may experience a different delay as it travels along its path through the network. Delay variation also occurs as packets encounter different levels of competing traffic at nodes along their paths. This variation in delay is termed “jitter.”
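To make the notion of jitter concrete, the following Python sketch computes each packet's one-way transit delay and a running jitter estimate. It borrows the interarrival-jitter estimator defined in RFC 3550 (the RTP specification), which this document does not itself describe; the function names and input format (parallel lists of send and receive times in milliseconds, in arrival order) are illustrative assumptions.

```python
from typing import List, Sequence


def transit_delays(send_ms: Sequence[float], recv_ms: Sequence[float]) -> List[float]:
    """One-way delay of each packet: receive time minus send time."""
    return [r - s for s, r in zip(send_ms, recv_ms)]


def interarrival_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Running jitter estimate following the RFC 3550 (section 6.4.1) formula.

    Packets are assumed to be listed in arrival order. Only differences
    between successive transit delays are used, so a constant offset
    between the sender and receiver clocks cancels out.
    """
    delays = transit_delays(send_ms, recv_ms)
    jitter = 0.0
    for prev, cur in zip(delays, delays[1:]):
        # J += (|D| - J) / 16: an exponentially smoothed mean deviation.
        jitter += (abs(cur - prev) - jitter) / 16.0
    return jitter
```

For packets sent at 0, 20, 40, and 60 ms and received at 50, 75, 90, and 130 ms, the transit delays are 50, 55, 50, and 70 ms: the delay varies even though the sending rate is constant. Because of the 1/16 gain, the smoothed estimate reacts slowly and is only about 1.8 ms after these three differences.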
In addition to the uneven arrival of packets, jitter may also cause out-of-sequence packets. An out-of-sequence packet occurs when the order of the sequence of packets arriving at the destination differs from the order in which the sequence of packets was transmitted by the source. For good perceived playback quality at the destination, voice packets should preferably be played out in the correct order, at a constant rate, and without excessive delay. Uncompensated network jitter can therefore significantly degrade the quality of voice service (e.g., in a two-way conversation). One way to compensate for network jitter is to introduce a jitter buffer at the destination receiver.
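As a rough illustration of out-of-sequence arrival, the sketch below assumes that each packet carries an increasing sequence number (as RTP packets do; the text above does not say how transmission order is tracked). It counts packets that arrive after a higher-numbered packet and releases packets in transmission order once the next expected number is present. The function names are hypothetical.

```python
from typing import List


def count_out_of_sequence(arrival_seq: List[int]) -> int:
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = None
    count = 0
    for seq in arrival_seq:
        if highest is not None and seq < highest:
            count += 1
        else:
            highest = seq
    return count


def release_in_order(arrival_seq: List[int], first_seq: int) -> List[int]:
    """Hold early packets and release them in sequence-number order.

    Simplification: a missing packet stalls release indefinitely; a real
    receiver would eventually skip it once its playout time has passed.
    """
    pending = set()
    expected = first_seq
    released = []
    for seq in arrival_seq:
        if seq < expected or seq in pending:
            continue  # duplicate, or already released
        pending.add(seq)
        # Release every consecutive packet that is now available.
        while expected in pending:
            pending.remove(expected)
            released.append(expected)
            expected += 1
    return released
```

For the arrival order 1, 2, 4, 3, 5, 7, 6, two packets (3 and 6) are out of sequence, and the reordering step still releases 1 through 7 in order.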
FIG. 1 is a block diagram of a prior art jitter-buffering system for audio delivery, e.g., voice over Internet Protocol (VoIP), and continuous playback at an audio receiver. When an initial (first) packet arrives at the receiver, it is enqueued into a jitter buffer 102 and is not played out immediately. Instead, the initial packet is held in buffer 102 for a predetermined amount of time (referred to as the release threshold) before being forwarded to a decoder 104 for playout. After the first packet is played out, subsequent packets are played out at uniform time intervals.
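The playout rule described for FIG. 1 can be sketched as follows, assuming (hypothetically) that arrival times are listed in transmission order and that each packet carries a fixed-duration audio frame played every `interval_ms`. Packet k is scheduled at the first packet's arrival time plus the release threshold plus k intervals, and it misses its slot if it arrives after that time.

```python
from typing import List, Tuple


def playout_schedule(arrival_ms: List[float], release_threshold_ms: float,
                     interval_ms: float) -> List[Tuple[float, bool]]:
    """Return (playout_time_ms, arrived_in_time) for each packet.

    arrival_ms[k] is the arrival time of the k-th packet in transmission
    order (hypothetical input format). Only the first packet waits for the
    release threshold; later packets follow at a uniform interval.
    """
    first_playout = arrival_ms[0] + release_threshold_ms
    schedule = []
    for k, arrival in enumerate(arrival_ms):
        playout = first_playout + k * interval_ms
        # A packet that arrives after its slot cannot be played (underflow).
        schedule.append((playout, arrival <= playout))
    return schedule
```

With arrivals at 100, 118, 165, 158, and 185 ms, a 20 ms interval, and a 40 ms release threshold, every packet arrives before its slot (140, 160, 180, 200, and 220 ms). Lowering the threshold to 10 ms moves the third packet's slot to 150 ms, which it misses.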
It is preferable to keep the release threshold at a minimum for two reasons. First, the jitter buffer at the receiver, such as buffer 102, is of finite length (i.e., it can hold only a fixed number of packets). Because packets that arrive while earlier packets await playout accumulate in the buffer, a longer release threshold increases the number of packets held at once and therefore the risk of buffer overflow, in which incoming packets are lost or dropped. Second, the release threshold adds directly to the total “end-to-end” delay, which may be perceivable by network users. If the total delay of the voice path exceeds approximately 200 msec, the conversation may be perceived as lagging (having low quality). Longer delays can noticeably disrupt interactive communications and significantly impair human conversations. Thus, the total end-to-end delay should preferably be less than 200 msec. However, if the release threshold is too low, then “slower” packets will not arrive before their designated playout time, causing buffer underflow and degrading the quality of voice transmission.
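The trade-off described above can be explored with a small simulation. This hypothetical sketch, not the method of the invention, assumes that packet k is sent at k * interval_ms on the same clock as the arrival times, so every packet's end-to-end delay equals the first packet's network delay plus the release threshold. For each candidate threshold it reports late packets (underflow), the peak number of packets held at once (to compare against an assumed buffer capacity), and the end-to-end delay (to compare against the roughly 200 msec target).

```python
from typing import List, NamedTuple, Optional


class ThresholdResult(NamedTuple):
    threshold_ms: float
    late_packets: int     # arrived after their playout slot (buffer underflow)
    peak_occupancy: int   # most packets held at once (compare with capacity)
    end_to_end_ms: float  # first packet's network delay + release threshold


def evaluate_threshold(arrival_ms: List[float], threshold_ms: float,
                       interval_ms: float) -> ThresholdResult:
    # Assumption: packet k is sent at k * interval_ms, measured on the same
    # clock as arrival_ms, and arrival_ms is listed in transmission order.
    first_playout = arrival_ms[0] + threshold_ms
    playout = [first_playout + k * interval_ms for k in range(len(arrival_ms))]
    on_time = [a <= p for a, p in zip(arrival_ms, playout)]
    # Late packets are assumed discarded on arrival. Occupancy only rises
    # when a packet arrives, so checking each arrival instant finds the peak.
    peak = 0
    for t in arrival_ms:
        held = sum(1 for a, p, ok in zip(arrival_ms, playout, on_time)
                   if ok and a <= t < p)
        peak = max(peak, held)
    # Send and playout times are both spaced interval_ms apart, so every
    # packet sees the same end-to-end delay: first_playout minus send time 0.
    return ThresholdResult(threshold_ms, on_time.count(False), peak, first_playout)


def smallest_acceptable(arrival_ms: List[float], interval_ms: float,
                        candidates: List[float], capacity: int,
                        budget_ms: float = 200.0) -> Optional[ThresholdResult]:
    """Return the smallest candidate threshold with no late packets, no
    overflow, and end-to-end delay below budget_ms, or None."""
    for th in sorted(candidates):
        r = evaluate_threshold(arrival_ms, th, interval_ms)
        if (r.late_packets == 0 and r.peak_occupancy <= capacity
                and r.end_to_end_ms < budget_ms):
            return r
    return None
```

With arrivals at 100, 118, 165, 158, and 185 ms and a 20 ms interval, a 10 ms threshold leaves one packet late; 40 ms plays every packet on time with at most two buffered and a 140 ms end-to-end delay; 120 ms also avoids underflow but holds all five packets at once and pushes the end-to-end delay to 220 ms, past the 200 msec target.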