Packet networks transmit media content whose playout is time sensitive. It is well known that packets carrying portions of a real-time conversation or a similar data stream can, while traveling through computer networks, experience delays or other signal interference sufficient for a human receiving the data stream to perceive a break in playout.
Packet delay has two effects. First, delay in an absolute sense can interfere with the rhythm of interaction, whether between humans in conversation or between a human and a machine. Second, delay variation, also known as jitter, can create unexpected pauses that may impair the intelligibility of the data stream. In a specific example, packetized voice delivered to a client computer or network destination could be perceived as “jerky” or discontinuous at sense-critical moments.
Jitter, the more serious of these problems, is the difference between when a packet is expected to arrive and when it actually is received. Jitter is due primarily to queuing delays and congestion in the packet network, which cause discontinuity in delivery of packets of the real-time data stream.
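The definition above can be illustrated with a short sketch. It assumes, purely for illustration, that the sender emits one packet every fixed interval and that the first packet arrives on schedule; the function name `arrival_deviation_ms` and the sample arrival times are hypothetical and not taken from any particular system.

```python
from typing import List


def arrival_deviation_ms(arrivals_ms: List[float], interval_ms: float) -> List[float]:
    """Return, for each packet, actual arrival time minus expected arrival time.

    Assumption: packet i is expected at first_arrival + i * interval_ms.
    """
    if not arrivals_ms:
        return []
    first = arrivals_ms[0]
    return [t - (first + i * interval_ms) for i, t in enumerate(arrivals_ms)]


# Hypothetical arrival times for packets sent every 20 ms.
arrivals = [0.0, 20.0, 47.0, 60.0, 95.0]
deviations = arrival_deviation_ms(arrivals, interval_ms=20.0)
print(deviations)  # third packet is 7 ms late, fifth packet is 15 ms late
```

A positive value means the packet arrived later than expected; a receiver that played packets out exactly on arrival would reproduce these deviations as audible gaps.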
Time-sensitive data streams need a steady, even stream of packets to reproduce human or machine input from the other end for optimal human perception and interaction. A human listener may still grasp the logical sense of a broken-up playout, yet become so frustrated that attention to the playout diminishes and the value of the intelligible content is effectively lost. Delivery of voice packets is often irregular because conditions in the network are always changing. During congested periods, buffers in the network can fill very quickly, delaying some packets until there is room for them on the network. Other packets in the same data stream may not be delayed, because there was no congestion when they passed over the network. Thus, various packets in the same data stream can experience different amounts of inter-arrival variance, or jitter, which is a variable component of the total end-to-end network delay.
Some packet networks compensate for jitter by setting up a buffer, called the jitter buffer, on a gateway router at the receiving end of the voice transmission and as close as possible, at the physical layer, to the playout devices of the receiving human. It is well known that an IP network can use a jitter buffer to receive packets that arrive at irregular intervals and sometimes out of sequence. The jitter buffer holds the packets briefly, reorders them if necessary, and then plays them out at evenly spaced intervals to a decoder in a Digital Signal Processor (DSP) on the gateway. Algorithms in the DSP determine the size and behavior of the jitter buffer, based on user configuration and current network jitter conditions, to maximize the number of correctly delivered packets and minimize the amount of delay. Adaptive jitter buffers are well known in the art and include simple or complex algorithms that manage playout to optimize human comprehension and enjoyment.
It is also well known in the prior art that packet length is unaffected by jitter buffer manipulation, i.e., the jitter buffer organizes the packet population retained in the buffer prior to release to playout devices, but the packet length is scrupulously maintained to preserve playout integrity. There is a need for a system which alters the relationship between prior art jitter buffers and ultimate playout to further improve delivery of real time data streams to a human recipient.
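The hold, reorder, and play-out behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any gateway's actual DSP algorithm: the class name `JitterBuffer`, the fixed `depth` parameter, and the use of a sequence number to order packets are all hypothetical, and the sketch does not adapt its size or handle lost packets or sequence-number wraparound.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Fixed-depth reordering buffer (illustrative only, not adaptive)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth                        # packets to hold before playout starts
        self._heap: List[Tuple[int, bytes]] = []  # min-heap keyed by sequence number
        self._started = False

    def push(self, seq: int, payload: bytes) -> None:
        """Store a packet as it arrives, in whatever order the network delivers it."""
        heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[Tuple[int, bytes]]:
        """Called once per playout tick; returns the lowest-numbered packet or None."""
        if not self._started:
            if len(self._heap) < self.depth:
                return None                       # still pre-filling the buffer
            self._started = True
        if not self._heap:
            return None                           # underflow: nothing to play
        return heapq.heappop(self._heap)


buf = JitterBuffer(depth=3)
buf.push(1, b"a")
early = buf.pop()                  # buffer not yet filled to depth, nothing released
buf.push(3, b"c")                  # packet 3 arrives before packet 2
buf.push(2, b"b")
played = [buf.pop(), buf.pop(), buf.pop()]
print(early, [seq for seq, _ in played])
```

Calling `pop` on a regular timer is what turns irregular arrivals into evenly spaced playout; a larger `depth` tolerates more jitter at the cost of added delay.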
Even in the absence of network jitter, buffer overflows or underflows, known as slips, occur at the receiver if its clock is not synchronized to the transmitter clock, because the read and write rates at the receive buffer will not be identical. A slip distorts the played-out speech. Assuming a circular buffer design, a slip causes a speech segment, equal in duration to that of the buffer, to be deleted if the read clock is slower than the write clock, or to be repeated if the read clock is faster. A further consequence of this clock skew is that the buffer delay in the receiver varies from zero to the maximum capacity of the buffer even though the network has a constant propagation delay. This implies that the playout delay, which should be constant in a jitter-free network, will also have an identical variation.
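As a rough worked example of how often slips occur, the sketch below assumes a constant relative skew between the write and read clocks, expressed in parts per million, and a circular buffer whose fill level drifts across its full duration between slips. The function name `seconds_between_slips` and the numbers used (a 40 ms buffer, 50 ppm skew) are illustrative assumptions, not values from any particular system.

```python
def seconds_between_slips(buffer_ms: float, skew_ppm: float) -> float:
    """Estimate the time between slips for a constant clock skew.

    Assumption: with a skew of skew_ppm parts per million, the buffer fill
    drifts by skew_ppm / 1000 milliseconds per second, so a slip occurs each
    time the accumulated drift equals the buffer duration.
    """
    if skew_ppm == 0:
        return float("inf")                  # perfectly matched clocks never slip
    return buffer_ms * 1000.0 / abs(skew_ppm)


# Hypothetical case: 40 ms buffer, read clock 50 ppm slower than write clock.
interval_s = seconds_between_slips(buffer_ms=40.0, skew_ppm=50.0)
print(interval_s)  # about 800 seconds between slips under these assumptions
```

Under these assumptions a full buffer's worth of speech would be deleted roughly every thirteen minutes, and the buffer delay would sweep across its whole range in between, which is the variation in playout delay described above.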
It is generally difficult to estimate this clock skew and convert the sampling rate of the received stream to a new rate to account for the skew. Hence there is a need for an integrated adaptive jitter buffer in modern Voice-over-Internet-Protocol (VoIP) systems, where both the network jitter and the clock skew problems are simultaneously solved, thereby providing a better subjective voice quality for the communicating parties.