Audiovisual (e.g. television and motion picture) programming involves the combination of an audio stream with a video stream. The creation and distribution of audiovisual programming typically involves a chain of capture, storage, processing, and transmission steps. At various points in this chain the audio and/or video streams may be represented, for purposes of storage, processing or transmission, as analog or digital signals. An audio stream may include one channel (mono), two channels (stereo) or more than two channels (e.g. surround sound) of signals. A video stream includes a series of pictures, where each picture represents the appearance of a visual scene at an instant, or during a short interval, of time.
For proper presentation of the audio and video streams to a viewer, it is desirable for the audio and video streams to be in proper temporal alignment. When a program's audio and video streams are presented in proper alignment, the sound of a speaker's voice occurs at very nearly the same time that the speaker's lips move, and other visual actions are likewise appropriately synchronized with their associated sounds. When a program's audio and video streams are not properly aligned, the lack of synchronization between visual events and audible events may be disturbing to the viewer.
In typical systems for transmission and/or distribution of audiovisual programming, the audio and video streams are often captured, stored, processed and transmitted independently of one another. Consequently, maintaining proper synchronization between the audio and video streams becomes a problem. Increasingly, both broadcasting engineers and the viewing audience are becoming aware that audio-video synchronization errors in broadcasting (usually perceived as lip-sync problems) are occurring more frequently and often with greater magnitude.
There is a desire in the industry to identify sources of differential audio-video delay in television production, post-production, and distribution; audio-video delay issues through professional MPEG encoding and decoding systems; and differential audio-video delay arising in consumer receiver, decoding, and display devices. There is also a need for out-of-service methods of measuring differential audio-video delay; for in-service (during program) methods of measuring differential audio-video delay; and for devices for correcting differential audio-video delay at different points in the broadcast chain.
Two industry references relating to synchronization issues are the ATSC Implementation Subcommittee Finding: “Relative Timing of Sound and Vision for Broadcast Operations” (see www.atsc.org/standards/is_191.pdf) and the ITU recommendation: ITU-R BT.1359-1, “Relative Timing of Sound and Vision for Broadcasting” (available from www.itu.int/publications/bookshop/index.html). Some broadcasters have adopted target tolerances for synchronization errors that are more restrictive than those set forth by the ATSC and ITU.
A variety of methods exist for ensuring proper audio and video synchronization at some points in the aforementioned audiovisual programming chain. Some examples of existing methods are the clapperboard, used in post-production of motion pictures, and Presentation Time Stamps, which are included in the MPEG-2 Systems Specification.
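By way of illustration only, the following minimal sketch shows how Presentation Time Stamps might be compared to express a differential audio-video delay in milliseconds. It relies on two properties of the MPEG-2 Systems Specification: a Presentation Time Stamp is a 33-bit value, and it counts ticks of a 90 kHz clock. The function names, the sign convention (a positive result meaning the audio unit is presented after the video unit), and the premise that the two time stamps belong to an audio unit and a video picture captured at the same instant are assumptions made for this example and are not taken from any particular system.

```python
PTS_CLOCK_HZ = 90_000   # PTS values count ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33   # PTS is a 33-bit field, so values wrap at 2**33


def pts_difference(audio_pts: int, video_pts: int) -> int:
    """Return the signed audio-minus-video difference in 90 kHz ticks.

    The subtraction is done modulo 2**33 and folded into the range
    [-2**32, 2**32) so that a counter wrap between the two time stamps
    does not appear as a delay of many hours.
    """
    diff = (audio_pts - video_pts) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS
    return diff


def av_offset_ms(audio_pts: int, video_pts: int) -> float:
    """Return the differential delay in milliseconds.

    Assumed sign convention: positive means the audio unit is presented
    later than the video unit; negative means it is presented earlier.
    """
    return pts_difference(audio_pts, video_pts) * 1000.0 / PTS_CLOCK_HZ


if __name__ == "__main__":
    # Hypothetical time stamps for a sound and a picture captured together.
    print(av_offset_ms(94_500, 90_000))
    print(av_offset_ms(0, 9_000))
```

A comparison of this kind reflects only the timing carried in the time stamps themselves. Delay introduced before the time stamps are applied, or after they are consumed by a decoder, is not visible to it, which is consistent with such methods being effective only at particular points in the chain.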
Although each of these known methods is effective at maintaining audio and video synchronization at one or more points in the chain, at the present time there is a scarcity of effective means for ensuring audio and video synchronization in a comprehensive manner. Consequently, it is not uncommon for viewers of audiovisual programming to be subjected to a poor viewing experience caused by a lack of synchronization between the audio and video streams.