The present invention relates, generally, to voice over Internet and, more particularly, to a system and method for using packet statistics to control de-jitter delay in voice over packet data networks to optimize voice playback quality.
Packets traveling over a packet data network encounter a propagation delay, which is the interval between the time a packet is transmitted and the time the packet is received. A problem, referred to as jitter, occurs when the propagation delay of successively transmitted packets is not constant. Jitter can be described as the difference between the actual propagation delay of a specific packet and the average propagation delay of some predetermined number of packets.
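The definition of jitter given above can be illustrated with a minimal sketch. The function name `jitter_per_packet` and the `window` parameter are hypothetical; `window` stands in for the "predetermined number of packets", and the choice to include the current packet in the average is an assumption.

```python
from statistics import mean

def jitter_per_packet(delays_ms, window=4):
    """Return each packet's propagation delay minus a running average delay.

    delays_ms: propagation delay of each packet, in milliseconds.
    window:    assumed size of the averaging window (includes the packet itself).
    """
    result = []
    for i, delay in enumerate(delays_ms):
        # Average over the most recent `window` packets, up to and including packet i.
        recent = delays_ms[max(0, i - window + 1): i + 1]
        result.append(delay - mean(recent))
    return result

# Three packets with a steady 40 ms delay, then one delayed to 60 ms.
print(jitter_per_packet([40, 40, 40, 60]))
```

A packet whose delay matches the running average has zero jitter; only the fourth packet in the example deviates from it.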
A packet, such as a voice packet or a data packet, comprises a group of binary digits which are transmitted and switched as a logical unit. When voice is transmitted over a packet data network, the transmitter interposes a fixed time interval between the transmission of each successive packet. These same intervals are required between voice packets at the time the packets are played back in order to ensure smooth playback quality. Traditional telephone networks are circuit switched, and thus avoid problems associated with timely arrival of packets. However, when voice packets are to be transferred over data networks, such as voice over the Internet, there is no guarantee of consistent time delays between the voice packets as is the case with telephone networks. Jitter, if not compensated for, degrades the playback quality of real time voice signals carried by the voice packets.
Because packet data networks, such as the Internet, cannot guarantee the delivery time of data packets (or their order, for that matter), the packets arrive at an inconsistent rate. Therefore, the packets are received with variable delays between them rather than the fixed delay (interval) originally interposed between each packet. The variability in the arrival rate of data causes jitter in the received packets. In order to alleviate problems due to jitter, it is well known to use a buffer (called a ‘jitter buffer’) at the receiver end of a system to provide a delay, called ‘de-jitter’ delay, to compensate for these variable delays.
Most systems use a jitter buffer to store at least one packet of data from the network before passing it to a playback device. These buffers can significantly reduce the occurrence of data starvation and ensure the timing is correct when sending data to the playback device. Without jitter buffers, gaps in the data would cause the voice playback to sound choppy or distorted. The jitter buffer provides an adjustable length time window which can be expanded as necessary to allow for varying delays between received packets, particularly packets whose propagation time is longer than the average. These ‘late’ packets can thus be re-assembled in slightly-delayed real time into a voice stream to be played back with the original fixed delay between them.
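The role of the de-jitter delay can be sketched as a playout schedule. This is only an illustration under stated assumptions: the function `playout_schedule`, its parameters, and the convention that playback is timed from the first packet's arrival are all hypothetical and are not taken from any particular system.

```python
def playout_schedule(arrivals_ms, interval_ms, dejitter_ms):
    """Return a (playout_time, arrived_in_time) pair for each packet.

    arrivals_ms: arrival time of each packet, indexed by sequence number.
    interval_ms: fixed interval the transmitter placed between packets.
    dejitter_ms: de-jitter delay applied before playback begins.
    """
    first_arrival = arrivals_ms[0]
    schedule = []
    for seq, arrival in enumerate(arrivals_ms):
        # Playback restores the original fixed spacing, shifted by the de-jitter delay.
        playout = first_arrival + dejitter_ms + seq * interval_ms
        schedule.append((playout, arrival <= playout))
    return schedule

arrivals = [0, 25, 70, 62]  # the third packet is badly delayed
print(playout_schedule(arrivals, interval_ms=20, dejitter_ms=10))
print(playout_schedule(arrivals, interval_ms=20, dejitter_ms=30))
```

With the shorter delay the third packet misses its playout time; expanding the window lets it arrive in time, at the cost of delaying all playback.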
Some presently known methods for receiving voice transmitted over packet data networks use immediate decision schemes to adjust the size of the window for receiving a voice packet (the de-jitter delay). Immediate decision schemes determine whether a given packet arrives within a predetermined time relative to a preceding packet. If a given packet does not arrive within the predetermined time, then the packet is considered ‘late’. In other words, a voice packet is late if it does not arrive within the existing window for receiving a voice packet. In order to reduce the number of ‘late’ packets, the de-jitter delay is increased to expand the window for receiving the voice packet. This increases the probability that subsequent late packets will have time to arrive.
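An immediate decision scheme of the kind described above might be sketched as follows. The function name, the fixed `step_ms` increment, and the representation of the input as gaps between successive arrivals are assumptions made for illustration.

```python
def immediate_decision(arrival_gaps_ms, delay_ms, step_ms=10):
    """Expand the de-jitter delay each time a packet misses the current window.

    arrival_gaps_ms: time between each packet and the preceding packet.
    delay_ms:        current window for receiving a packet (the de-jitter delay).
    step_ms:         assumed fixed increment applied on every late packet.
    Returns the final delay and the number of packets judged late.
    """
    late = 0
    for gap in arrival_gaps_ms:
        if gap > delay_ms:       # packet did not arrive within the existing window
            late += 1
            delay_ms += step_ms  # widen the window for subsequent packets
    return delay_ms, late

print(immediate_decision([20, 35, 20, 50], delay_ms=30))
```

Each decision is made on a single packet as it arrives, with no statistics gathered over an interval.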
Such immediate decision schemes typically increase the de-jitter delay until a required quality of service (QOS) is achieved or a maximum de-jitter delay is reached. This QOS is a predetermined range, number, percentage, or the like defining a parameter related to the number of packets which are considered late in a given period of time. For example, the QOS may require a certain percentage of transmitted voice packets to arrive ‘on-time’ within this time period. The QOS may also limit the de-jitter delay to a predetermined maximum time. However, immediate decision schemes make no distinction between late packets and lost packets. ‘Late’ voice packets are those packets which do not arrive at the receiver within the time during which reconstruction of the voice stream must occur. ‘Lost’ packets are those which never arrive at the receiver. Increasing the de-jitter delay, however far, therefore cannot recover lost packets. Immediate decision schemes treat both late and lost packets as ‘missing’ packets. Because no distinction is made between the two, existing systems attempting to capture ‘missing’ packets may increase the de-jitter delay to an unnecessarily long period of time even when all of the missing packets are in fact ‘lost’. This lengthy de-jitter delay degrades system performance without improving voice playback quality.
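The failure mode described above can be reproduced in a small sketch. All names and values here are hypothetical, a gap of `None` is an assumed way of modeling a lost packet, and the QOS stopping condition is left out because it is never satisfied when the missing packets are lost.

```python
def immediate_scheme_with_cap(arrival_gaps_ms, delay_ms, step_ms=10, max_delay_ms=100):
    """Raise the de-jitter delay on every missing packet, up to max_delay_ms.

    A gap of None models a lost packet that never arrives. The scheme cannot
    tell it apart from a late packet and treats both as missing.
    """
    for gap in arrival_gaps_ms:
        missing = gap is None or gap > delay_ms
        if missing:
            delay_ms = min(delay_ms + step_ms, max_delay_ms)
    return delay_ms

# Every packet that does arrive is on time; the rest are lost outright.
gaps = [20, None, 20, None, None, 20] * 3
print(immediate_scheme_with_cap(gaps, delay_ms=30))
```

No packet in this stream is late, yet the delay is driven to its maximum, which adds latency without recovering a single packet.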
Another known method, Digital Simultaneous Voice and Data (DSVD), uses an error correction scheme which recognizes when a packet is late and subsequently tries to correct the error by adjusting the de-jitter delay. However, this method also fails to distinguish between late and lost voice packets, so that the de-jitter delay may be unnecessarily increased. Alternatively, voice reconstruction software may use forward error correction to reconstruct the lost voice packets. However, forward error correction requires transfer of redundant information in each subsequent voice packet, which degrades overall system performance.
Voice playback quality is degraded when jitter is not compensated for. In addition, the problem of unnecessarily increasing the de-jitter delay without providing a corresponding improvement in voice playback quality remains unresolved by the prior art. A method is needed which overcomes the shortcomings of the prior art in determining how to effectively adjust the de-jitter delay in order to achieve smooth playback quality.
The present invention provides a method for improving voice playback quality by intelligently compensating for jitter in the transfer of voice data over packet data networks. A predetermined quality of service factor is used to determine the jitter delay for received voice packets in order to optimize the number of voice packets received. The invention uses packet sequence information in the voice packet protocol to determine which voice packets are missing and which voice packets are late within a predetermined statistically significant interval. The jitter delay is decreased when the number of missing packets is significantly less than that specified by the quality of service factor. In most cases, the jitter delay is increased when the number of missing packets is greater than that specified by the quality of service factor. However, the present method does not increase the jitter delay at all when no late packets are observed during the predetermined interval.
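The adjustment rule summarized above might look like the following sketch. It is not the claimed implementation: the function names, the step size, the `decrease_margin` used to interpret "significantly less", and the delay bounds are all assumptions, and "missing" is taken here to mean any packet not available in time for playback, whether late or lost.

```python
def classify(expected_seqs, on_time_seqs, after_deadline_seqs):
    """Count missing and late packets for one interval using sequence numbers.

    missing: expected packets not available in time (late or lost).
    late:    expected packets that did arrive, but after their deadline.
    """
    expected = set(expected_seqs)
    missing = len(expected - set(on_time_seqs))
    late = len(expected & set(after_deadline_seqs))
    return missing, late

def adjust_delay(delay_ms, missing, late, allowed_missing,
                 step_ms=10, decrease_margin=0.5,
                 min_delay_ms=0, max_delay_ms=200):
    """Return the jitter delay to use for the next interval."""
    if missing > allowed_missing:
        if late == 0:
            # Every missing packet was lost; a longer delay cannot recover it.
            return delay_ms
        return min(delay_ms + step_ms, max_delay_ms)
    if missing < allowed_missing * decrease_margin:
        # Well inside the quality of service target: reclaim latency.
        return max(delay_ms - step_ms, min_delay_ms)
    return delay_ms

missing, late = classify(range(10), {0, 1, 2, 3, 4, 5, 6}, set())
print(adjust_delay(60, missing, late, allowed_missing=1))
```

In the example, three packets are missing but none arrived late, so the delay is left unchanged; had any of them arrived after the deadline, the delay would have been raised by one step.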