1. Field of the Invention
The present invention relates to media playout of a network transmission of packets, and more specifically to determining an objective quality of media playout for a playout mechanism.
2. Description of the Prior Art
The popularity of the Internet and wireless communications has led to the development of technologies that allow real-time digital communication between people. Recently, communication through the Internet and wireless communications networks has been improved for new technologies such as voice over Internet protocol (VoIP) and other real-time interactive communication systems.
One of the major obstacles in the communication of packets belonging to a network transmission, such as VoIP packets, is variation in network delay, known as jitter. The effect of jitter is typically reduced by delaying the playout of packets according to a playout delay. As network delay is not constant, compensating for jitter in a transmission requires reasonable measurements of network delay and accurate estimations of playout delay. However, the playout delay cannot be too long, as the transmission is intended to be real-time and long playout delays defeat this intention. Minimizing playout delay is particularly important in two-way communications, such as through VoIP, wireless phones, videophones, or on-line games, to avoid delay-related inconvenience to users when communicating with each other.
FIG. 1 is a schematic diagram that shows packets of voice data 20 being sent across a network 10. For example, the data 20 can be media output from the Internet such as a VoIP transmission, a person's voice being transmitted across a wireless phone network, or similar communications data. The data 20 includes audible ranges 20a, 20c, and 20e where there is discernible audio information and silent ranges 20b and 20d where there is an absence of discernible audio information. A sender 12, being a PC, wireless telephone, or other device, sends packets P1–P15 in order at regular intervals. Because network delay affects the packets P1–P15 unequally, some of the packets P1–P15 arriving at a receiver 14, a similar PC or device, must be further delayed by different amounts to form a cohesive voice data 22. The voice data 22 includes audible ranges 22a, 22c, and 22e and silent ranges 22b and 22d corresponding to the ranges 20a–20e of the sent data 20.
The packet P1 is sent by the sender 12 at a given time. The packet P1 is delayed by the network 10 for any number of reasons, said delay and further delays being indicated in FIG. 1 by a shaded block. The packet P1 is further delayed by the receiver 14 so it can be played contiguously with the packet P2 that is also delayed by the network 10. If the packet P1 were not further delayed by the receiver 14, packets P1 and P2 would not be played contiguously, and an audible break in the data 22 would occur. The audible break in the data 22 would be heard by a listener at the receiver 14, resulting in poor audio quality of the playout data 22.
The packets P2–P5 are all delayed by the network 10 by the same amount of time and do not have to be further delayed by the receiver 14 to be played in sequence with proper timing. However, the packet P7 arrives before the packet P6. The receiver 14 must delay the playout of the packet P7 until the packet P6 is received. This delay is added to the silent range 22b of the data 22 so that the audible range 22c is not affected. The packets P8 and P9 arrive simultaneously, as do the packets P10 and P11, because of network delay and packet bursting. Playout of the packets P9 and P11 is accordingly delayed; however, no further delay of the data 22 results. The packets P13 and P14 arrive out of order in the same way as the packets P6 and P7. The packets P12 and P15 arrive at the receiver 14 normally.
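The receiver-side buffering described above can be illustrated with a minimal sketch. It assumes a single fixed playout delay applied to every packet, and the function name `schedule_playout`, its parameters, and the time values in the usage note are hypothetical; none of them are taken from FIG. 1.

```python
def schedule_playout(send_times, arrival_times, playout_delay):
    """Schedule each packet for playout at send time + playout_delay.

    Returns a list of (sequence_number, playout_time, buffer_wait) tuples.
    A packet that arrives after its playout time is treated as lost and is
    reported as (sequence_number, None, None).
    All times are in the same arbitrary unit (for example, packet lengths).
    """
    schedule = []
    pairs = zip(send_times, arrival_times)
    for seq, (sent, arrived) in enumerate(pairs, start=1):
        deadline = sent + playout_delay  # when this packet must be played
        if arrived <= deadline:
            # Packet is held in the buffer until its playout time.
            schedule.append((seq, deadline, deadline - arrived))
        else:
            # Packet arrived too late to be played in sequence.
            schedule.append((seq, None, None))
    return schedule
```

With hypothetical send times `[0, 1, 2, 3]` and arrival times `[1, 2, 5, 4]`, the fourth packet arrives before the third. A playout delay of 3 units lets all four packets play in order, whereas a delay of 2 units causes the third packet to miss its playout time, which shows the trade-off between added delay and lost packets.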
The above description with reference to FIG. 1 is a simplification. The packets P1–P15 were assumed to arrive at the receiver delayed by an integer multiple of their packet length. In reality, a substantially large number of packets in a given transmission must be delayed, as network delay and jitter are essentially continuous in time whereas packet length is discrete.
FIG. 1 shows that the entire received data 22 is delayed by three blocks by a combination of network delay and additional playout delay added by the receiver 14. If this additional delay were not added by the receiver 14, some packets would be played out of order and others would not be played at all. The prior art teaches a number of ways to estimate the delay required to be added by the receiver 14. However, too much playout delay can result in inconvenience and even misunderstanding between parties in two-way communication. A fundamental and arguably the most useful method of estimating playout delay is the mean delay and variance (MDV) method described in R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, “Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks”, Proceedings of IEEE INFOCOM, Toronto, Canada, pp. 680–686, June 1994, which is incorporated herein by reference. The MDV method is further described in Marco Roccetti, Vittorio Ghini, Giovanni Pau, Paola Salomoni, and Maria Elena Bonfigli, “Design and Experimental Evaluation of an Adaptive Playout Delay Control Mechanism for Packetized Audio for use over the Internet”, November 1998, which is also incorporated herein by reference. Briefly, the MDV method estimates playout delay from a running mean of the network delay and the variation about that mean, both of which are updated using a smoothing factor. This simple adaptive approach offers significant improvement over non-adaptive approaches.
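As an illustration only, the mean-and-variation estimate can be sketched as follows. The sketch assumes the commonly cited form of the estimator from the Ramjee et al. reference: an exponentially smoothed mean delay, an exponentially smoothed deviation, and a playout delay equal to the mean plus four times the deviation. The function name `mdv_playout_delay`, the parameter names, and the default values are assumptions made for this example.

```python
def mdv_playout_delay(network_delays, alpha=0.998002, safety=4.0):
    """Return a playout delay estimate after each observed network delay.

    network_delays: per-packet one-way delays (any consistent time unit).
    alpha:  smoothing factor in [0, 1); values near 1 adapt slowly.
    safety: multiplier applied to the smoothed deviation.
    """
    delays = iter(network_delays)
    mean = next(delays)   # initialise the mean with the first observation
    variation = 0.0       # no deviation is known after a single packet
    estimates = [mean + safety * variation]
    for n in delays:
        # Exponentially weighted running mean of the network delay.
        mean = alpha * mean + (1 - alpha) * n
        # Exponentially weighted mean absolute deviation from that mean.
        variation = alpha * variation + (1 - alpha) * abs(mean - n)
        estimates.append(mean + safety * variation)
    return estimates
```

In the cited reference the estimate is applied at the start of each talkspurt, so that the delay adjustment falls in a silent range rather than in an audible range. This sketch simply returns the estimate after every packet and leaves that choice to the caller.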
Another method of estimating playout delay is described in the real-time transport protocol (RTP) standard. The RTP standard is detailed in H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, January 1996, which is incorporated herein by reference. The RTP method of estimating delay is essentially the MDV method applied with a fixed smoothing factor. While simpler than the MDV method, the RTP method offers a less accurate estimation of network delay.
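For comparison, the interarrival jitter estimator defined in the RTP standard can be sketched as below. It smooths the absolute change in packet transit time with a fixed gain of 1/16. Note that this quantity is a jitter estimate rather than a complete playout delay; the function name `rtp_interarrival_jitter` and the list-based interface are assumptions made for this example.

```python
def rtp_interarrival_jitter(send_times, recv_times):
    """Return the running interarrival jitter after each packet.

    send_times and recv_times are per-packet timestamps in the same unit.
    The sender and receiver clocks need not be synchronised, because only
    differences between successive transit times are used.
    """
    jitter = 0.0
    prev_transit = None
    history = []
    for sent, received in zip(send_times, recv_times):
        transit = received - sent
        if prev_transit is not None:
            # Change in transit time relative to the previous packet.
            d = abs(transit - prev_transit)
            # Fixed-gain (1/16) smoothing, as specified in the RTP standard.
            jitter += (d - jitter) / 16.0
        history.append(jitter)
        prev_transit = transit
    return history
```

Because the gain is fixed, the estimator needs no tuning, but it cannot be made to adapt faster or slower to suit different network conditions.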
Once playout delay has been estimated, packets of a network transmission can be played out by the receiver 14. The quality of the playout as perceived by a user is directly related to the method used to determine the playout delay, of which the methods described above are examples. To determine which method yields the best playout quality, it is of interest to develop a quality measurement system.
Several prior art methods of determining playout quality are taught in ITU-T P.861 (1996), “Objective quality measurement of telephone-band (300–3400 Hz) speech codecs”, 02.1998; ETSI EG 201 377-1 V1.1.1, “Specification and measurement of speech transmission quality”, 02.2001; and ITU-T G.862, “Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow band telephone networks and speech codecs”, 02.2001, all of which are incorporated herein by reference. These methods, relating to speech quality, are not well suited for two-way communication, and cannot readily measure the effects of echo and comfortable interruption. Furthermore, the prior art methods of measuring playout quality are not well suited to comparing different playout mechanisms.
In addition, the prior art includes numerous patents relating to objective playout quality measurement. Regardless of how they are applied, whether to measuring signal, speech, or video quality, the methods of the prior art patents are only suitable for static data or one-way communication.