Audio-visual (A/V) synchronization errors as small as plus or minus one-half film frame can be detected by most film editors. Because film is projected at 24 frames per second (fps) in the U.S. and 25 fps in Europe, one-half film frame equates to approximately +/−20 msec. Similarly, plus or minus one video frame corresponds to +/−33-40 msec (about 33 msec at the 29.97 fps NTSC rate and 40 msec at the 25 fps PAL rate).
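The frame-duration arithmetic above can be checked with a short sketch. It simply takes the reciprocal of the frame rate; the helper name `frame_period_ms` is hypothetical, and 30000/1001 is used as the exact NTSC rate.

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps

rates = [
    ("film, 24 fps (U.S.)", 24.0),
    ("film or PAL video, 25 fps (Europe)", 25.0),
    ("NTSC video, 29.97 fps", 30000 / 1001),
]

for label, fps in rates:
    period = frame_period_ms(fps)
    print(f"{label}: one frame = {period:.1f} ms, half frame = {period / 2:.1f} ms")
```

This prints a half frame of 20.8 ms at 24 fps and 20.0 ms at 25 fps, and a full video frame of 33.4 ms (NTSC) to 40.0 ms (PAL), matching the approximate figures in the text.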
The acceptable range over which audio-video synchronization can vary is asymmetric because human perception of audio-video synchronization is more tolerant of error in one direction than in the other. Because light travels much faster than sound, events are usually seen before the accompanying sound is heard. For example, in a large sports venue, spectators in the first few rows see and hear a basketball hit the court at what appears to be the same moment. However, the farther back a viewer is located, the more the sound of the ball hitting the floor lags behind the sight of it. Even though this lag grows with the viewer's distance, it is perceived as natural.
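To put rough numbers on the basketball example, the sketch below estimates how far the sound lags the image at various seat distances. It assumes a speed of sound of about 343 m/s (dry air near 20 °C) and ignores light's travel time, which is negligible at arena scale; the function name `sound_lag_ms` is hypothetical.

```python
# Assumed value: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0

def sound_lag_ms(distance_m: float) -> float:
    """Approximate delay (ms) between seeing and hearing an event distance_m away.

    Light's travel time over these distances is on the order of nanoseconds,
    so the lag is effectively the acoustic travel time alone.
    """
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0

for distance in (5, 20, 50, 100):
    print(f"{distance:>3} m from the court: sound lags sight by about {sound_lag_ms(distance):.0f} ms")
```

Under this assumption the lag grows by roughly 3 ms per metre, so spectators in distant seats routinely experience audio lagging video by well over 100 ms.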
However, if the audio-video timing were reversed, a spectator watching a basketball game would hear the sound of the ball hitting the floor before seeing the ball make contact with the floor. That would be a very unnatural experience. Because the error is in the “wrong” direction, the discrepancy would seem incorrect even to spectators in the first few rows, where the magnitude of the synchronization error would be small. In short, human perception is much more forgiving of sound lagging behind sight than of sound leading it.
International Telecommunication Union (ITU) Recommendation ITU-R BT.1359-1 (1998) was based on research showing that audio-video synchronization errors became reliably detectable at 45 msec for audio leading video and at 125 msec for audio lagging behind video. The recommendation states that the tolerance from the point of capture to the viewer and/or listener should be no more than 90 msec of audio leading video and no more than 185 msec of audio lagging behind video. The Advanced Television Systems Committee (ATSC) Implementation Subcommittee (IS) issued a finding (Doc. IS-191 (Jun. 23, 2003)) recommending a tolerance of +/−15 msec.
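The asymmetric windows described above can be expressed as a small check. This is a sketch only: the class `SyncWindow` and the constant names are hypothetical, the numeric limits are the BT.1359-1 figures quoted in the preceding paragraph, and the sign convention (positive offset means audio leads video) is an assumption of the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SyncWindow:
    """Asymmetric A/V timing window, in milliseconds.

    Assumed sign convention: offset_ms > 0 means audio leads video,
    offset_ms < 0 means audio lags behind video.
    """
    max_audio_lead_ms: float
    max_audio_lag_ms: float

    def contains(self, offset_ms: float) -> bool:
        return -self.max_audio_lag_ms <= offset_ms <= self.max_audio_lead_ms

# Offsets inside this window fall below the reliable-detection thresholds.
DETECTION_THRESHOLDS = SyncWindow(max_audio_lead_ms=45, max_audio_lag_ms=125)
# End-to-end tolerance from capture to viewer/listener.
CAPTURE_TO_VIEWER_TOLERANCE = SyncWindow(max_audio_lead_ms=90, max_audio_lag_ms=185)

for offset in (+60, -60):
    detectable = not DETECTION_THRESHOLDS.contains(offset)
    in_tolerance = CAPTURE_TO_VIEWER_TOLERANCE.contains(offset)
    print(f"{offset:+d} ms: detectable={detectable}, within tolerance={in_tolerance}")
```

The same 60 msec error is reliably detectable when audio leads video but not when audio lags behind it, which is the asymmetry described in the basketball example.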
Conventional solutions synchronize audio and video by treating the audio as the master and dropping or repeating video frames to keep the two signals aligned. However, dropping and repeating video frames can degrade the quality of the presented video image.
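The conventional audio-master approach can be sketched as a per-frame decision. This is a generic illustration rather than any particular player's logic: the names `Action` and `schedule_frame` are hypothetical, and the half-frame threshold for dropping or repeating is an assumed choice.

```python
from enum import Enum

class Action(Enum):
    SHOW = "show"      # frame is on time: present it
    DROP = "drop"      # frame is already late: discard it so video catches up
    REPEAT = "repeat"  # frame is early: keep showing the previous frame for now

def schedule_frame(video_pts_ms: float, audio_clock_ms: float,
                   frame_period_ms: float) -> Action:
    """Decide what to do with one video frame when audio is the master clock.

    The audio clock is never adjusted; only the video is dropped or repeated.
    Assumed threshold: half a frame period in either direction.
    """
    diff = video_pts_ms - audio_clock_ms  # > 0: video ahead of audio; < 0: video behind
    if diff < -frame_period_ms / 2:
        return Action.DROP
    if diff > frame_period_ms / 2:
        return Action.REPEAT
    return Action.SHOW

# 25 fps video (40 ms frames) checked against the audio clock.
print(schedule_frame(1000, 1050, 40))  # video 50 ms late
print(schedule_frame(1000, 950, 40))   # video 50 ms early
print(schedule_frame(1000, 1010, 40))  # within half a frame
```

Every `DROP` or `REPEAT` produces a visible discontinuity in motion, which is the image-quality cost of this approach.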
A method of synchronizing audio and video without degrading the quality of the presented video would be desirable. It would also be desirable to switch seamlessly between trick-play modes (e.g., x1.5 playback) and normal (e.g., x1) playback without stopping playback and/or going through a full handshake procedure, thereby avoiding a gap in the audio, the video, or both.