When audio and video are transmitted in a broadcast environment, the program being broadcast is said to be continuous: the broadcast of content on a given channel is assumed to have started at a point far in the past and to continue far into the future. In a broadcast environment that uses the Real-time Transport Protocol (RTP) over the Internet Protocol (IP), such as DVB-H (where H stands for handheld), the audio and video timestamp clocks use different multipliers. This can cause problems for a receiver that is trying to synchronize audio and video during playback. Audio and video timestamps are derived from the same clock, but the packetization process applies a different multiplier to each, so the video and audio timestamps roll over (return to zero) at different points in time. In particular, the video timestamp clock is always 90 kHz, while the audio timestamp clock is always the audio sampling frequency, which can range from 8 kHz to 48 kHz. The rollover point is determined by the size of the storage unit used to represent the timestamp. RTP timestamps are 32 bits wide, so a timestamp rolls over when its value reaches 2^32 (4,294,967,296). Because audio and video roll over at different times, a receiver that joins the broadcast at any point after the first rollover does not have enough information to relate audio timestamps to video timestamps. It therefore cannot effectively synchronize audio and video.
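To put these rollover points in perspective, the short Python sketch below divides 2^32 by each clock rate. The function name `rollover_period_seconds` is illustrative only. A 90 kHz video counter wraps roughly every 13.3 hours and a 48 kHz audio counter roughly every 24.9 hours. An 8 kHz audio counter wraps only after about six days, so the two streams wrap at unrelated times.

```python
# Sketch: how long a 32-bit RTP timestamp counter runs before it wraps,
# for the clock rates mentioned in the text. Names are illustrative.
RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit unsigned integers


def rollover_period_seconds(clock_hz: int) -> float:
    """Seconds until a counter starting at zero reaches 2**32 and wraps."""
    return RTP_TS_MODULUS / clock_hz


if __name__ == "__main__":
    rates = {
        "video (90 kHz)": 90_000,
        "audio (48 kHz)": 48_000,
        "audio (8 kHz)": 8_000,
    }
    for label, hz in rates.items():
        secs = rollover_period_seconds(hz)
        print(f"{label}: wraps every {secs:,.0f} s (~{secs / 3600:.2f} h)")
```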
Turning to FIG. 1, an exemplary audio and video timestamp rollover scenario is indicated generally by the reference numeral 100. If the end client receiver joins the broadcast before the first video timestamp rollover occurs, it has enough information to synchronize audio and video, because both timestamps are assumed to have started at zero. However, if the end client receiver joins the broadcast at any time after the first video rollover, no mechanism is available to give it adequate information to synchronize audio and video. The receiver knows neither when the broadcast session started nor when, relative to that start, it joined the session. It therefore cannot determine which audio timestamps correspond to which video timestamps.
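The join-time problem can be illustrated with a minimal Python sketch. It assumes, for illustration only, 48 kHz audio and two counters that both started at zero at the unknown session start. The receiver naively converts each timestamp to seconds as if neither counter had wrapped. All names here are hypothetical.

```python
# Sketch: what a receiver sees when it joins a continuous broadcast.
# Assumptions (illustrative): 48 kHz audio, 90 kHz video, both counters
# started at zero at a session start the receiver does not know.
RTP_TS_MODULUS = 2 ** 32
VIDEO_HZ = 90_000
AUDIO_HZ = 48_000


def observed_timestamps(seconds_since_start: float) -> tuple[int, int]:
    """(audio_ts, video_ts) carried in packets sent at this instant."""
    audio_ts = int(seconds_since_start * AUDIO_HZ) % RTP_TS_MODULUS
    video_ts = int(seconds_since_start * VIDEO_HZ) % RTP_TS_MODULUS
    return audio_ts, video_ts


def naive_skew_seconds(audio_ts: int, video_ts: int) -> float:
    """Audio-minus-video time, assuming neither counter has wrapped yet."""
    return audio_ts / AUDIO_HZ - video_ts / VIDEO_HZ


if __name__ == "__main__":
    for hours in (1, 14):  # before and after the first video wrap (~13.3 h)
        a, v = observed_timestamps(hours * 3600.0)
        print(f"join at {hours} h: naive skew {naive_skew_seconds(a, v):.1f} s")
```

Before the first video wrap, the naive estimate is zero. After it, the estimate is off by a whole video wrap period. The receiver does not know how many times either counter has wrapped, so it has no basis for correcting this.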
MPEG-2 transport solves this problem by using the same clock multiplier for both audio and video. Specifically, MPEG-2 transport streams use a 27 MHz master system clock and derive from it a 90 kHz clock that both the audio and the video presentation timestamps (PTSs) use. As a result, audio and video remain synchronized at the receiver, since their timestamps roll over at the same time.
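A shared clock means the difference between an audio and a video PTS, taken modulo the counter size, equals their true time offset no matter how many wraps have occurred. The Python sketch below shows this. MPEG-2 PTS values are 33-bit counts of the 90 kHz clock. The helper names are illustrative. The sketch also assumes the real audio/video offset is far smaller than half the wrap period.

```python
# Sketch: why a shared timestamp clock avoids the join-time ambiguity.
# MPEG-2 PTS values are 33-bit counts of a 90 kHz clock for both audio
# and video, so both counters wrap at the same instant.
PTS_MODULUS = 2 ** 33
PTS_HZ = 90_000


def pts_at(seconds_since_start: float) -> int:
    """PTS value for a presentation instant (illustrative helper)."""
    return round(seconds_since_start * PTS_HZ) % PTS_MODULUS


def av_offset_seconds(audio_pts: int, video_pts: int) -> float:
    """Signed audio-minus-video offset, valid for any number of wraps.

    Assumes the true offset is much smaller than half the wrap period.
    """
    diff = (audio_pts - video_pts) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS  # interpret large values as a negative offset
    return diff / PTS_HZ


if __name__ == "__main__":
    # Join ~27.8 h in, after the first 33-bit wrap (~26.5 h).
    print(av_offset_seconds(pts_at(100_000.5), pts_at(100_000.0)))
```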
In other RTP-based streaming systems, audio and video are synchronized by assuming that the first received audio and video packets are related. This technique is applicable to systems that request content in a video on demand (VOD) fashion. It is not an effective lip-sync method for continuous broadcast programs, however, because there is no guarantee that the first audio packet received corresponds to the first video packet received.
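First-packet anchoring can be sketched in Python as follows. This is a hypothetical illustration, not a particular product's implementation. The receiver records the first audio and first video timestamps it sees and treats them as simultaneous. It then maps each later audio timestamp to the video timestamp that should be displayed with it. The class and method names are illustrative, and the mapping holds only while less than one full 32-bit wrap period has elapsed since the anchor.

```python
# Sketch of first-packet anchoring (hypothetical names). The receiver
# pairs the first audio and first video timestamps it receives and
# treats them as captured at the same instant.
RTP_TS_MODULUS = 2 ** 32


def elapsed_ticks(ts: int, anchor_ts: int) -> int:
    """Ticks since the anchor; correct across a 32-bit wrap as long as
    less than one full wrap period has elapsed."""
    return (ts - anchor_ts) % RTP_TS_MODULUS


class FirstPacketSync:
    def __init__(self, first_audio_ts: int, first_video_ts: int,
                 audio_hz: int, video_hz: int = 90_000) -> None:
        self.audio_anchor = first_audio_ts
        self.video_anchor = first_video_ts
        self.audio_hz = audio_hz
        self.video_hz = video_hz

    def audio_to_video_ts(self, audio_ts: int) -> int:
        """Video timestamp to present alongside this audio timestamp."""
        seconds = elapsed_ticks(audio_ts, self.audio_anchor) / self.audio_hz
        ticks = round(seconds * self.video_hz)
        return (self.video_anchor + ticks) % RTP_TS_MODULUS


if __name__ == "__main__":
    sync = FirstPacketSync(first_audio_ts=1_000, first_video_ts=5_000,
                           audio_hz=48_000)
    print(sync.audio_to_video_ts(1_000 + 48_000))  # one second later
```

The mapping is only as good as the initial pairing. If the first audio and first video packets were not actually captured together, that mismatch becomes a constant lip-sync error for the rest of the session.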
Another alternative approach to audio and video synchronization is to force the audio timestamp counter to roll over when the video timestamp counter rolls over. This technique produces results similar to those of systems that use the same clock frequency for audio and video, such as MPEG-2 transport. Unfortunately, it causes backward-compatibility problems with existing end client receiver systems. Specifically, existing receivers may have problems handling the timestamp discontinuity that results from a forced rollover.
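The discontinuity can be made concrete with a small Python sketch. It assumes, for illustration only, 48 kHz audio with elapsed time measured in 90 kHz ticks. A "legacy" receiver is modeled as one that computes elapsed time as the 32-bit modular difference between consecutive timestamps. The sketch compares a free-running audio counter with one that is reset to zero whenever the video counter wraps. All names are hypothetical.

```python
# Sketch of the forced-rollover idea and the discontinuity it creates.
# Assumptions (illustrative): 48 kHz audio, elapsed time measured in
# 90 kHz ticks, and a legacy receiver that expects ordinary 32-bit
# wraparound between consecutive audio timestamps.
RTP_TS_MODULUS = 2 ** 32
VIDEO_HZ = 90_000
AUDIO_HZ = 48_000


def continuous_audio_ts(elapsed_90k: int) -> int:
    """Free-running audio counter that wraps only at 2**32."""
    return (elapsed_90k * AUDIO_HZ // VIDEO_HZ) % RTP_TS_MODULUS


def forced_audio_ts(elapsed_90k: int) -> int:
    """Audio counter restarted at zero each time the video counter wraps."""
    return (elapsed_90k % RTP_TS_MODULUS) * AUDIO_HZ // VIDEO_HZ


def legacy_delta(prev_ts: int, ts: int) -> int:
    """Elapsed audio ticks as a wraparound-aware legacy receiver sees them."""
    return (ts - prev_ts) % RTP_TS_MODULUS


if __name__ == "__main__":
    # 20 ms before and 20 ms after the video counter wraps.
    before, after = 2 ** 32 - 1_800, 2 ** 32 + 1_800
    print(legacy_delta(continuous_audio_ts(before), continuous_audio_ts(after)))
    print(legacy_delta(forced_audio_ts(before), forced_audio_ts(after)))
```

Across the same 40 ms window, the free-running counter advances by 1,920 ticks. The forced reset looks to such a receiver like a jump of about two billion ticks (roughly 11.6 hours at 48 kHz), which is the kind of discontinuity that existing receivers may mishandle.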