The following relates generally to media synchronization, and more specifically to audiovisual synchronization in wireless communications systems. Wireless communications systems are widely deployed to provide various types of communication content such as voice, video, video telephony, packet data, messaging, broadcast, and so on. These systems may be multiple-access systems capable of supporting communication with multiple users by sharing the available system resources (e.g., time, frequency, and power). Examples of such multiple-access systems include code-division multiple access (CDMA) systems, time-division multiple access (TDMA) systems, frequency-division multiple access (FDMA) systems, and orthogonal frequency-division multiple access (OFDMA) systems.
Media synchronization, including audiovisual (also referred to as audio-video or AV) synchronization, is central to a positive user experience for services that have both audio and video components (e.g., video telephony (VT)). AV synchronization, which is also known as “lip sync,” is generally defined as the process of ensuring that the relative delay between the audio and video streams at capture is maintained when they are presented at an AV receiver. The relative delay is the time difference between the capture of an audio frame at a microphone and the capture of the corresponding video frame at a camera.
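The definition above can be illustrated with a minimal sketch. It assumes that capture and playout times for one audio frame and its corresponding video frame are already known on a common timeline in milliseconds. The function names and the sign convention (video time minus audio time) are hypothetical and chosen only for illustration. Under these assumptions, synchronization is preserved when the relative delay at playout equals the relative delay at capture, so their difference is the synchronization error.

```python
def relative_delay_ms(audio_time_ms: float, video_time_ms: float) -> float:
    """Hypothetical helper: relative delay as video time minus audio time, in ms."""
    return video_time_ms - audio_time_ms


def av_sync_error_ms(audio_capture_ms: float, video_capture_ms: float,
                     audio_render_ms: float, video_render_ms: float) -> float:
    """Change in relative delay between capture and playout (hypothetical helper).

    Zero means the capture-side relationship is preserved at the receiver.
    Positive means the video is presented later, relative to the audio,
    than it was captured; negative means it is presented earlier.
    """
    captured = relative_delay_ms(audio_capture_ms, video_capture_ms)
    rendered = relative_delay_ms(audio_render_ms, video_render_ms)
    return rendered - captured


# Audio captured at 1000 ms and video at 1010 ms (10 ms apart at capture).
# At the receiver the audio plays at 1150 ms and the video at 1240 ms
# (90 ms apart at playout), so the video lags by an extra 80 ms.
print(av_sync_error_ms(1000, 1010, 1150, 1240))
```

In this example the output is 80, which indicates that the receiver has introduced an 80 ms lip-sync error even though both streams were delivered.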
Audio and video components are often transmitted in independent Real-time Transport Protocol (RTP) streams, which may lead to poor synchronization of audio and video packets at the receiver. Poor AV synchronization is familiar from low-quality movie productions; for example, the words of the soundtrack may not match the actor's apparent speech patterns. In some cases, it may appear as though the speaker is not actually saying the words attributed to her. Similarly poor AV synchronization may occur during real-time communication, such as VT calls.
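Each independent RTP stream carries timestamps on its own media clock, so a receiver cannot compare audio and video RTP timestamps directly. One common approach, specified in RFC 3550, uses the pair of NTP wallclock and RTP timestamps carried in each stream's RTCP sender report (SR) to map media timestamps onto a shared wallclock. The sketch below assumes that both SRs come from the same sender wallclock, that the NTP timestamp has already been converted to seconds, and that the clock rates (16 kHz audio and 90 kHz video here) are illustrative. The `SenderReport` class and `rtp_to_wallclock` function are hypothetical names, not part of any library.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit unsigned values


@dataclass
class SenderReport:
    """Hypothetical holder for the two RTCP SR fields used here."""
    wallclock_s: float   # SR NTP timestamp, assumed already converted to seconds
    rtp_timestamp: int   # RTP timestamp corresponding to that same instant


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate_hz: int) -> float:
    """Map an RTP timestamp to sender wallclock seconds using the latest SR."""
    # Signed 32-bit difference, so timestamps that wrapped around still work.
    diff = (rtp_ts - sr.rtp_timestamp) % RTP_TS_MODULUS
    if diff >= RTP_TS_MODULUS // 2:
        diff -= RTP_TS_MODULUS
    return sr.wallclock_s + diff / clock_rate_hz


# Illustrative values: both SRs were sent at wallclock 100.0 s.
audio_sr = SenderReport(wallclock_s=100.0, rtp_timestamp=8000)    # 16 kHz clock
video_sr = SenderReport(wallclock_s=100.0, rtp_timestamp=45000)   # 90 kHz clock

audio_t = rtp_to_wallclock(24000, audio_sr, 16000)    # 1.00 s after the SR
video_t = rtp_to_wallclock(139500, video_sr, 90000)   # 1.05 s after the SR

# Capture-time offset between these two packets, in whole milliseconds.
print(round((video_t - audio_t) * 1000))
```

Here the printed offset is 50, meaning the video packet was captured 50 ms after the audio packet. A receiver can compare this capture-side offset with the offset at which it actually plays the two packets. Without the SR mapping, the raw timestamps 24000 and 139500 carry no common time reference.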