I. Field
The present invention relates generally to delivery of information over a wireless communication system, and more specifically to synchronization of audio and video data transmitted over a wireless communication system.
II. Background
Various techniques for transmitting multimedia or real-time data, such as audio or video data, over various communication networks have been developed. One such technique is the real-time transport protocol (RTP). RTP provides end-to-end network transport functions suitable for applications transmitting real-time data over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers. Further details about RTP can be found in “RTP: A Transport Protocol for Real-Time Applications”, H. Schulzrinne [Columbia University], S. Casner [Packet Design], R. Frederick [Blue Coat Systems Inc.], V. Jacobson [Packet Design], RFC-3550 draft standard, Internet Engineering Steering Group, July 2003, incorporated by reference herein in its entirety.
An example illustrating aspects of RTP is an audio conference in which RTP is carried on top of Internet Protocol (IP) services of the Internet for voice communications. Through an allocation mechanism, an originator of the conference obtains a multicast group address and a pair of ports. One port is used for audio data, and the other is used for control (RTCP) packets. This address and port information is distributed to the intended participants. The audio conferencing application used by each conference participant sends audio data in small partitions, for example partitions of 20 ms duration. Each partition of audio data is preceded by an RTP header, and the combined RTP header and data are encapsulated into a UDP packet. The RTP header includes information about the data. For example, it indicates what type of audio encoding, such as PCM, ADPCM or LPC, is contained in each packet; a Time Stamp (TS), the time at which the RTP packet is to be rendered; and a Sequence Number (SN), a sequential number of the packet that can be used to detect lost or duplicate packets. Identifying the encoding in each packet allows senders to change the type of encoding used during a conference, for example, to accommodate a new participant that is connected through a low-bandwidth link or to react to indications of network congestion.
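By way of illustration only, the header fields described above can be sketched in Python as follows. The sketch assumes the 12-byte fixed header layout of RFC 3550 with no padding, no header extension and no contributing-source list. The function names `pack_rtp_header` and `unpack_rtp_header`, the SSRC value and the choice of payload type 0 (PCMU at 8 kHz) are hypothetical choices made for this example and are not part of any particular implementation.

```python
import struct

RTP_VERSION = 2  # version field value defined by RFC 3550


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",                                 # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,                             # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                   # 32-bit time stamp
        ssrc & 0xFFFFFFFF,                        # 32-bit source identifier
    )


def unpack_rtp_header(data):
    """Parse the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }


# Hypothetical 20 ms partition of 8 kHz audio: 160 one-byte samples.
payload = bytes(160)
packet = pack_rtp_header(payload_type=0, seq=1, timestamp=160,
                         ssrc=0x1234ABCD) + payload
fields = unpack_rtp_header(packet)
```

In this sketch the resulting `packet` would then be handed to a UDP socket, so that the RTP header and the audio partition travel together in one datagram as described above.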
In accordance with the RTP standard, if both audio and video media are used in an RTP conference, they are transmitted as separate RTP sessions. That is, separate RTP and RTCP packets are transmitted for each medium using two different UDP port pairs and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions should use the same name in the RTCP packets for both so that the sessions can be associated.
A motivation for transmitting audio and video as separate RTP sessions is to allow some participants in the conference to receive only one medium if they choose. Despite the separation, synchronized playback of a source's audio and video can be achieved using timing information carried in the RTP/RTCP packets for both sessions.
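As a minimal sketch of how such timing information may be used, the following Python fragment assumes that each session's RTCP sender report pairs an RTP time stamp with a wall-clock (NTP) time, as provided for in RFC 3550. The function name `rtp_to_wallclock`, the numeric values, and the clock rates of 8 kHz for audio and 90 kHz for video are assumptions chosen for the example.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP time stamp to sender wall-clock time.

    sr_rtp_ts and sr_ntp_seconds are the RTP time stamp and the wall-clock
    time (in seconds) taken from the same RTCP sender report.
    """
    # Signed 32-bit difference so that time stamp wraparound is tolerated.
    delta = ((rtp_ts - sr_rtp_ts + 0x80000000) & 0xFFFFFFFF) - 0x80000000
    return sr_ntp_seconds + delta / clock_rate


# Hypothetical sender reports for the two sessions of one source.
audio_time = rtp_to_wallclock(rtp_ts=24000, sr_rtp_ts=16000,
                              sr_ntp_seconds=1000.0, clock_rate=8000)
video_time = rtp_to_wallclock(rtp_ts=945000, sr_rtp_ts=900000,
                              sr_ntp_seconds=1000.5, clock_rate=90000)

# A positive skew means the video frame belongs later than the audio packet.
skew_seconds = video_time - audio_time
```

With these example values both units map to the same wall-clock instant, so a receiver following this approach would render them together. A nonzero skew would indicate how far one stream should be delayed relative to the other.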
Packet networks, like the Internet, may occasionally lose or reorder packets. In addition, individual packets may experience variable amounts of delay in their respective transmission times. To cope with these impairments, the RTP header contains timing information and a sequence number that allow a receiver to reconstruct the timing produced by the source. This timing reconstruction is performed separately for each source of RTP packets in a session.
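For illustration, one way a receiver might use the sequence number to notice lost or reordered packets from a single source is sketched below in Python. The 16-bit width of the sequence number follows RFC 3550; the helper names `seq_delta` and `classify` and the event labels are hypothetical, and a practical receiver would typically combine this with a jitter buffer.

```python
def seq_delta(new, old):
    """Signed distance from old to new in 16-bit modular arithmetic."""
    return ((new - old + 0x8000) & 0xFFFF) - 0x8000


def classify(seqs):
    """Label each arriving sequence number relative to the next expected one."""
    expected = None
    events = []
    for s in seqs:
        if expected is None:
            events.append((s, "first"))
            expected = (s + 1) & 0xFFFF
            continue
        d = seq_delta(s, expected)
        if d == 0:
            events.append((s, "in_order"))
            expected = (s + 1) & 0xFFFF
        elif d > 0:
            # d packets were skipped; they are lost or still in flight.
            events.append((s, "gap", d))
            expected = (s + 1) & 0xFFFF
        else:
            # Older than expected: a reordered or duplicate packet.
            events.append((s, "late_or_duplicate"))
    return events


# Arrival order that wraps past 65535 and delivers packet 1 after packet 2.
events = classify([65534, 65535, 0, 2, 1])
```

In this example the arrival of packet 2 is reported as a gap of one packet, and the subsequent arrival of packet 1 is reported as late, which shows how the same field can reveal both loss and reordering.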
Even though the RTP header includes timing information and a sequence number, because the audio and video are delivered in separate RTP streams, there is a potential for time slip, also referred to as lip-synch or AV-synch, between the streams. An application at a receiver will have to re-synchronize these streams prior to rendering audio and video. In addition, in applications where RTP streams, such as audio and video, are transmitted over wireless networks, there is an increased likelihood that packets may be lost, thereby making re-synchronization of the streams more difficult.
There is therefore a need in the art for improving the synchronization of audio and video RTP streams that are transmitted over networks.