1. Field of the Invention
The present invention relates to network data transmission, and more specifically, to optimizing a playout delay for packets transmitted in a network, said packets comprising data for playout in a stream and compressed according to a codec (compressor/decompressor).
2. Description of the Prior Art
The popularity of the Internet has led to the development of technologies that allow real-time streaming of voice, audio, and video transmissions. Nearly everyone who has used the Internet has at one time or another listened to streaming audio or watched streaming video. More recently, other methods of communication through the Internet have been developed such as voice over Internet protocol (VoIP). Using software that implements VoIP is becoming a popular and economical way for people to communicate with each other through the Internet and other computer networks.
One of the major obstacles in the communication of packets belonging to a streaming transmission, such as VoIP packets, is variance in network delay, known as jitter. The effects of jitter are typically mitigated by delaying the playout of packets according to a playout delay. Because network delay is not constant, compensating for jitter requires reasonably accurate measurements of network delay and accurate estimates of the playout delay. However, the playout delay cannot be made arbitrarily long: the transmission is intended to be real-time streaming, and long playout delays defeat that purpose.
FIG. 1 is a schematic diagram showing packets of voice data 20 being sent across a network 10. The data 20 includes audible ranges 20a, 20c, and 20e, where there is discernible audio information, and silent ranges 20b and 20d, where discernible audio information is absent. A sender 12, such as a PC or other device, sends packets P1–P15 in order at regular intervals. Because the network 10 delays the packets P1–P15 by varying amounts, some of the packets arriving at a receiver 14, likewise a PC or similar device, must be further delayed by different amounts to form cohesive voice data 22. The voice data 22 includes audible ranges 22a, 22c, and 22e and silent ranges 22b and 22d corresponding to the ranges 20a–20e of the sent data 20.
The packet P1 is sent by the sender 12 at a given time. The packet P1 is delayed by the network 10 for any number of reasons, this delay and further delays being indicated in FIG. 1 by a shaded block labeled “D”. The packet P1 is further delayed by the receiver 14 so that it can be played contiguously with the packet P2, which is also delayed by the network 10. If the packet P1 were not further delayed by the receiver 14, the packets P1 and P2 would not be played contiguously, and an audible break would occur in the data 22. A listener at the receiver 14 would hear this break as poor audio quality of the playout data 22.
The packets P2–P5 are all delayed by the network 10 by the same amount of time and do not have to be further delayed by the receiver 14 to be played in sequence with proper timing. However, the packet P7 arrives before the packet P6, so the receiver 14 must delay the playout of the packet P7 until the packet P6 is received. This delay is added to the silent range 22b of the data 22 so that the audible range 22c is not affected. Because of network delay and packet bursting, the packets P8 and P9 arrive simultaneously, as do the packets P10 and P11. Playout of the packets P9 and P11 is accordingly delayed; however, no further delay of the data 22 results. The packets P13 and P14 suffer a disorder similar to that of the packets P6 and P7. The packets P12 and P15 arrive at the receiver 14 normally.
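The reordering behaviour described for P6 and P7 can be illustrated with a small sequence-ordered buffer. This is a minimal sketch, not the receiver 14 itself: it models only ordering (not playout timing), assumes arrivals are listed in arrival-time order with no lost or duplicated packets, and the function name `playout_order` is hypothetical.

```python
import heapq
from typing import List, Tuple


def playout_order(arrivals: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Release packets strictly in sequence-number order.

    arrivals: (arrival_time, seq) tuples, already sorted by arrival time.
    Returns (release_time, seq) tuples: a packet is held until every packet
    with a lower sequence number has arrived. Assumes no losses or duplicates.
    """
    heap: List[int] = []  # min-heap of buffered sequence numbers
    next_seq = min(seq for _, seq in arrivals)
    released = []
    for t, seq in arrivals:
        heapq.heappush(heap, seq)
        # Release every packet that now continues the in-order run.
        while heap and heap[0] == next_seq:
            released.append((t, heapq.heappop(heap)))
            next_seq += 1
    return released


# P7 arrives before P6: P7 is held until P6 arrives, then both are released.
print(playout_order([(10, 5), (11, 7), (12, 6), (13, 8)]))
```

The heap keeps the lowest buffered sequence number at the front, so the held packet is released as soon as the gap in the sequence is filled.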
The above description with reference to FIG. 1 is a simplification. The packets P1–P15 were assumed to arrive at the receiver 14 delayed by an integer multiple of their packet length. In reality, a substantial fraction of the packets in a given transmission must be delayed, because network delay and jitter vary essentially continuously in time while packet lengths are discrete.
FIG. 1 shows that the entire received data 22 is delayed by three blocks by a combination of network delay and additional playout delay added by the receiver 14. If this additional delay were not added by the receiver 14, some packets would be played out of order and others would not be played at all. The prior art teaches a number of ways to estimate the delay required to be added by the receiver 14.
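The trade-off between added delay and missed packets can be sketched numerically. This minimal example assumes synchronized sender and receiver clocks and a single fixed playout delay measured from each packet's send time; the function names `late_packets` and `min_lossless_delay` are hypothetical and are not taken from the prior-art methods discussed below.

```python
from typing import List, Sequence


def late_packets(send_times: Sequence[float], arrival_times: Sequence[float],
                 playout_delay: float) -> List[int]:
    """Return indices of packets that miss their playout deadline.

    Packet i is scheduled for playout at send_times[i] + playout_delay
    (clocks assumed synchronized); it is late if it arrives after that instant.
    """
    return [i for i, (s, a) in enumerate(zip(send_times, arrival_times))
            if a > s + playout_delay]


def min_lossless_delay(send_times: Sequence[float],
                       arrival_times: Sequence[float]) -> float:
    """Smallest fixed playout delay for which no packet is late."""
    return max(a - s for s, a in zip(send_times, arrival_times))


send = [0, 1, 2, 3]
arrive = [2, 3, 5, 4]
print(min_lossless_delay(send, arrive))  # largest one-way delay: 3
print(late_packets(send, arrive, 2))     # packet 2 misses a delay of 2
```

Choosing the largest observed delay avoids all late packets but maximizes latency; the estimation methods below try to track a smaller value adaptively.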
A fundamental and arguably the most useful method of estimating playout delay is the mean delay and variance (MDV) method described in R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks”, Proceedings of IEEE INFOCOM, Toronto, Canada, pp. 680–686, June 1994, which is incorporated herein by reference. The MDV method is further described in Marco Roccetti, Vittorio Ghini, Giovanni Pau, Paola Salomoni, and Maria Elena Bonfigli, “Design and Experimental Evaluation of an Adaptive Playout Delay Control Mechanism for Packetized Audio for use over the Internet”, November 1998, which is also incorporated herein by reference. Briefly, the MDV method maintains running estimates of the mean network delay and of its variation, both updated with a smoothing factor, and derives the playout delay from these two estimates. This simple adaptive approach offers a significant improvement over non-adaptive approaches.
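The MDV idea can be sketched as two exponentially weighted running averages, one of the delay and one of its absolute deviation, combined as mean plus a multiple of the deviation. This is a sketch under stated assumptions: the defaults alpha = 0.998002 and beta = 4 are the values commonly quoted for the first algorithm of Ramjee et al. but are assumptions here, and the class name `MdvEstimator` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MdvEstimator:
    """Mean-delay-and-variation playout delay estimator (sketch).

    alpha: smoothing factor; beta: safety multiplier on the variation.
    Default values are assumptions, not taken from this document.
    """
    alpha: float = 0.998002
    beta: float = 4.0
    mean: Optional[float] = None  # running mean network delay
    var: float = 0.0              # running mean absolute deviation

    def update(self, network_delay: float) -> None:
        if self.mean is None:  # seed the mean with the first sample
            self.mean = network_delay
            return
        a = self.alpha
        self.mean = a * self.mean + (1 - a) * network_delay
        self.var = a * self.var + (1 - a) * abs(self.mean - network_delay)

    def playout_delay(self) -> float:
        if self.mean is None:
            raise ValueError("no delay samples yet")
        return self.mean + self.beta * self.var


est = MdvEstimator(alpha=0.5)  # small alpha only to make the arithmetic visible
est.update(100.0)
est.update(120.0)
print(est.playout_delay())  # mean 110 + 4 * deviation 5
```

In a receiver, `update()` would typically be called for every packet while the value from `playout_delay()` is applied only at the start of a talkspurt, so that adjustments fall in silent ranges as in FIG. 1.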
Another method of estimating playout delay is described in the real-time transport protocol (RTP) standard. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, January 1996, details the RTP standard and is incorporated herein by reference. The RTP method of estimating delay is essentially the MDV method applied with a fixed smoothing factor. While simpler than the MDV method, the RTP method offers a less accurate estimate of network delay.
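The fixed smoothing factor can be seen in the interarrival jitter estimator of RFC 1889 (section 6.3.1), which updates its estimate with a constant gain of 1/16. The sketch below follows that update rule; the function name `rtp_jitter` and the use of plain lists of timestamps are assumptions for illustration.

```python
from typing import Sequence


def rtp_jitter(send_ts: Sequence[float], recv_ts: Sequence[float]) -> float:
    """Interarrival jitter estimate following the RFC 1889 update rule.

    send_ts, recv_ts: per-packet timestamps in the same units.
    J is updated with a fixed gain of 1/16 for each packet after the first.
    """
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D(i-1, i): change in one-way transit time between consecutive packets.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


print(rtp_jitter([0, 20, 40], [100, 136, 140]))
```

Compared with the MDV sketch, there is no tunable smoothing parameter: the 1/16 gain is fixed by the standard, which is what makes the method simple but less adaptable.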
Other prior art methods of estimating playout delay include a spike detection method described in “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks”, and a related gap-based method described in Jesus Pinto and Kenneth J. Christensen, “An Algorithm for Playout of Packet Voice based on Adaptive Adjustment of Talkspurt Silence Periods”, 1999, http://citeseer.nj.nec.com/pinto99algorithm.html, which is incorporated herein by reference. Both the spike detection method and the gap-based method offer little improvement over the MDV method while adding complexity.
Finally, the prior art offers a normalized least mean square (NLMS) method that is described in Phillip DeLeon and Cormac J. Sreenan, “An Adaptive Predictor for Media Playout buffering”, Stanford, March 2001, http://citeseer.nj.nec.com/deleon99adaptive.html, which is incorporated herein by reference. The NLMS method is complex and offers no readily apparent advantages over other methods.
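For context, a generic one-step-ahead NLMS predictor of network delay looks roughly as follows. This is a textbook NLMS sketch, not the formulation of DeLeon and Sreenan; the filter order, step size `mu`, regularizer `eps`, and the function name `nlms_predict` are all assumptions.

```python
from typing import List, Sequence


def nlms_predict(delays: Sequence[float], order: int = 4,
                 mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """One-step-ahead delay prediction with a normalized LMS filter (sketch).

    The next delay is predicted as a weighted sum of the previous `order`
    delays; the weights are adapted with the normalized LMS update.
    Returns the predictions for delays[order:].
    """
    w = [0.0] * order
    preds = []
    for n in range(order, len(delays)):
        x = list(delays[n - order:n])[::-1]  # most recent delay first
        y = sum(wi * xi for wi, xi in zip(w, x))
        preds.append(y)
        e = delays[n] - y  # prediction error
        norm = eps + sum(xi * xi for xi in x)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return preds


preds = nlms_predict([10.0] * 50)
print(round(preds[-1], 6))  # converges toward the constant delay
```

Even this stripped-down version needs per-packet vector arithmetic and two tuning parameters, which illustrates the added complexity relative to the MDV estimator.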
In addition, the prior art includes numerous patents relating to the playout of digital information and to performance monitoring of the playout. For instance, Daum et al. teach stream synchronization for MPEG playback in the comprehensive U.S. Pat. No. 5,815,634, and Jain describes a real-time receiver and method for receiving and playing out real-time packetized data in U.S. Pat. No. 6,259,677, both of which are incorporated herein by reference. Additionally, Schulman in U.S. Pat. No. 5,600,632 teaches performance monitoring in a network using synchronized network analyzers relating to packet delay, and Agrawal et al. provide a predictive approach to synchronization using a method for maintaining and updating statistical trends of network delay in U.S. Pat. No. 6,072,809, both of which are incorporated herein by reference.
The prior art methods mentioned and described above share a common characteristic: they optimize the playout delay from network statistics alone. They do not adequately consider the codec used to compress the data for playout or the resulting actual playout quality.