1. Field of the Invention
The present invention relates to network data transmission, and more specifically, to optimizing a playout delay for packets transmitted in a network based on a duplex mode of real-time interactive communication.
2. Description of the Prior Art
The popularity of the Internet and wireless communications has led to the development of technologies that allow real-time digital communication between people. Recently, communication through the Internet and wireless communications networks has improved to support new technologies such as voice over Internet protocol (VoIP) and other real-time interactive communication systems.
One of the major obstacles in the communication of packets belonging to a network transmission, such as VoIP packets, is variance in network delay, known as jitter. Jitter is typically reduced by delaying the playout of packets according to a playout delay. As network delay is not constant, reducing the amount of jitter in a transmission requires reasonable measurements of network delay and accurate estimations of playout delay. However, the playout delay cannot be too long, as the transmission is intended to be real-time and long playout delays defeat this intention. Minimizing playout delay is particularly important in two-way communications, such as through VoIP, wireless phones, videophones, or on-line games, to avoid delay-related inconvenience to users when communicating with each other.
FIG. 1 is a schematic diagram that shows packets of voice data 20 being sent across a network 10. For example, the data 20 can be media output from the Internet such as a VoIP transmission, a person's voice being transmitted across a wireless phone network, or similar communications data. The data 20 includes audible ranges 20a, 20c, and 20e where there is discernible audio information and silent ranges 20b and 20d where there is an absence of discernible audio information. A sender 12, being a PC, wireless telephone, or other device, sends packets P1-P15 in order at regular intervals. Because network delay postpones the arrival of the packets P1-P15, some of the packets P1-P15 arriving at a receiver 14, a similar PC or device, must be further delayed by different amounts to form a cohesive voice data 22. The voice data 22 includes audible ranges 22a, 22c, and 22e and silent ranges 22b and 22d corresponding to the ranges 20a-20e of the sent data 20.
The packet P1 is sent by the sender 12 at a given time. The packet P1 is delayed by the network 10 for any number of reasons, said delay and further delays being indicated in FIG. 1 by a shaded block. The packet P1 is further delayed by the receiver 14 so it can be played contiguously with the packet P2 that is also delayed by the network 10. If the packet P1 is not further delayed by the receiver 14, packets P1 and P2 would not be played contiguously, and an audible break in the data 22 would occur. The audible break in the data 22 would be heard by a listener at the receiver 14, which translates to poor audio quality of the playout data 22.
The packets P2-P5 are all delayed by the network 10 by the same amount of time and do not have to be further delayed by the receiver 14 to be played in sequence with proper timing. However, the packet P7 arrives before the packet P6. The receiver 14 must delay the playout of the packet P7 until the packet P6 is received. This delay is added to the silent range 22b of the data 22 so that the audible range 22c is not affected. The packets P8 and P9 arrive simultaneously, as do the packets P10 and P11, because of network delay and packet bursting. Playout of the packets P9 and P11 is accordingly delayed; however, no further delay of the data 22 results. The packets P13 and P14 arrive out of order in the same way as the packets P6 and P7. The packets P12 and P15 arrive at the receiver 14 normally.
The above description with reference to FIG. 1 is a simplification. The packets P1-P15 were assumed to arrive at the receiver delayed by an integer multiple of their packet length. In reality, a substantial number of packets in a given transmission must be delayed, as network delay and jitter are essentially continuous in time whereas packet length is discrete.
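The receiver-side buffering described with reference to FIG. 1 can be illustrated by a minimal sketch. It assumes a single fixed playout offset for the whole stream rather than the per-silent-range adjustment shown in the figure, and the function name `schedule_playout`, the 0-based sequence numbers, and the arrival times used below are hypothetical and are not taken from FIG. 1.

```python
def schedule_playout(arrivals, interval):
    """Compute a gap-free, in-order playout schedule.

    arrivals: dict mapping a 0-based sequence number to the arrival time of
              that packet at the receiver (hypothetical input format).
    interval: packet length; the sender is assumed to emit packet `seq`
              at time seq * interval.

    Returns (offset, schedule), where offset is the smallest fixed delay
    that lets every packet play on time, and schedule maps each sequence
    number to (playout_time, extra_hold_at_receiver).
    """
    # The slowest packet determines how far the whole stream must be shifted.
    offset = max(t - seq * interval for seq, t in arrivals.items())
    schedule = {}
    for seq in sorted(arrivals):
        playout_time = offset + seq * interval
        # Extra hold is the time the receiver buffers the packet after arrival.
        schedule[seq] = (playout_time, playout_time - arrivals[seq])
    return offset, schedule


if __name__ == "__main__":
    # Packets 1 and 2 arrive together (bursting); packet 3 is delayed the most.
    offset, plan = schedule_playout({0: 1, 1: 3, 2: 3, 3: 6}, interval=1)
    print(offset, plan)
```

In this sketch the packet with the largest network delay is played immediately on arrival, and every other packet is held by the receiver for the difference, which is the trade-off between continuity and added delay discussed above.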
FIG. 1 shows that the entire received data 22 is delayed by three blocks by a combination of network delay and additional playout delay added by the receiver 14. If this additional delay were not added by the receiver 14, some packets would be played out of order and others would not be played at all. The prior art teaches a number of ways to estimate the delay required to be added by the receiver 14. However, too much playout delay can result in inconvenience and even misunderstanding between parties in two-way schemes. A fundamental, and arguably the most useful, method of estimating playout delay is the mean delay and variance (MDV) method described in R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks”, Proceedings of IEEE INFOCOM, Toronto, Canada, pp. 680-686, June 1994, which is incorporated herein by reference. The MDV method is further described in Marco Roccetti, Vittorio Ghini, Giovanni Pau, Paola Salomoni, and Maria Elena Bonfigli, “Design and Experimental Evaluation of an Adaptive Playout Delay Control Mechanism for Packetized Audio for use over the Internet”, November 1998, which is also incorporated herein by reference. Briefly, the MDV method estimates playout delay from a smoothed mean of the network delay and a smoothed measure of its variation, both computed with a smoothing factor. This simple adaptive approach offers significant improvement over non-adaptive approaches.
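For illustration only, the MDV estimate can be sketched as two exponentially smoothed quantities. The class name `MdvEstimator`, the default smoothing factor of 0.998002, and the multiplier of 4 on the variation term are assumptions based on the commonly quoted form of the cited algorithm, not values stated in this description.

```python
from dataclasses import dataclass


@dataclass
class MdvEstimator:
    """Sketch of a mean-delay-and-variation playout estimator."""
    alpha: float = 0.998002   # smoothing factor (assumed value)
    safety: float = 4.0       # multiplier on the variation term (assumed value)
    mean_delay: float = 0.0
    variation: float = 0.0
    primed: bool = False

    def update(self, network_delay: float) -> float:
        """Fold one measured network delay in and return the playout delay."""
        if not self.primed:
            # Seed the running mean with the first observation.
            self.mean_delay = network_delay
            self.variation = 0.0
            self.primed = True
        else:
            a = self.alpha
            self.mean_delay = a * self.mean_delay + (1 - a) * network_delay
            self.variation = (a * self.variation
                              + (1 - a) * abs(self.mean_delay - network_delay))
        return self.mean_delay + self.safety * self.variation


if __name__ == "__main__":
    # A small alpha is used here only to make the adaptation visible.
    est = MdvEstimator(alpha=0.5)
    for delay_ms in (100.0, 140.0, 90.0):
        print(est.update(delay_ms))
```

A smoothing factor close to 1 makes the estimate react slowly to individual packets, which is why such estimators are typically applied at the start of each audible range rather than packet by packet.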
Another method of estimating playout delay is described in the real-time transport protocol (RTP) standard. H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, January 1996 details the RTP standard and is incorporated herein by reference. The RTP method of estimating delay is essentially the MDV method applied with a fixed smoothing factor. While simpler than the MDV method, the RTP method offers a less accurate estimation of network delay.
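As a hedged illustration, the interarrival jitter estimate given in the RTP standard can be sketched as follows. The fixed gain of 1/16 follows the RTP specification; the function name `rtp_jitter` and the list-based inputs are hypothetical, and timestamps are assumed to be in the same units at the sender and the receiver.

```python
def rtp_jitter(send_times, recv_times, gain=1 / 16):
    """Running interarrival jitter with a fixed smoothing gain.

    send_times: sender timestamps of consecutive packets.
    recv_times: arrival times of the same packets at the receiver.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between packet i and packet i-1.
        d = ((recv_times[i] - recv_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        # Move the estimate a fixed fraction of the way toward |d|.
        jitter += (abs(d) - jitter) * gain
    return jitter


if __name__ == "__main__":
    # Second packet spends 16 time units longer in the network than the first.
    print(rtp_jitter([0, 20], [50, 86]))
```

Because the gain is fixed, the estimate cannot be tuned to how quickly network conditions change, which is consistent with the lower accuracy noted above.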
Other prior art methods of estimating playout delay include a spike detection method described in “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks”, and a related gap-based method described in Jesus Pinto and Kenneth J. Christensen, “An Algorithm for Playout of Packet Voice based on Adaptive Adjustment of Talkspurt Silence Periods”, 1999. Both the spike detection method and the gap-based method offer little improvement over the MDV method while adding complexity.
Finally, the prior art offers a normalized least mean square (NLMS) method that is described in Phillip DeLeon and Cormac J. Sreenan, “An Adaptive Predictor for Media Playout buffering”, Stanford, March 2001. The NLMS method is complicated and offers no readily apparent advantages over other methods.
In addition, the prior art has numerous patents relating to the playout of digital information and performance monitoring of the playout. For instance, Daum et al. teach stream synchronization for MPEG playback in the comprehensive U.S. Pat. No. 5,815,634, and Jain describes a real-time receiver and method for receiving and playing out real-time packetized data in U.S. Pat. No. 6,259,677, both of which are included herein by reference. Additionally, Schulman in U.S. Pat. No. 5,600,632 teaches performance monitoring in a network using synchronized network analyzers relating to packet delay, and Agrawal et al. provide a predictive approach to synchronization using a method for maintaining and updating statistical trends of network delay in U.S. Pat. No. 6,072,809, both of which are included herein by reference.
The prior art methods mentioned and described above optimize the playout delay from network statistics and do not adequately consider the duplex mode of communications. That is, playout delay is not adjusted according to whether communication is in a half-duplex or full-duplex mode. Thus, the advantages of considering these modes are not realized, and actual playout quality, and therefore overall communications quality, deteriorates.