In many entertainment and other communication systems, it is desirable to synchronize media signals such as audio and video signals. Typically, such signals are initially generated or provided by an audio/video source (e.g. a video tape player, a DVD player or a set-top television decoder) as a pair (or more) of time-synchronized signals. The audio and video signals may then be processed and transmitted to an audio/video destination (e.g. a television) via different transmission paths that typically include signal processing equipment and transmission links. As a result of different delays in the different transmission paths, the signals may become out of sync with one another. This is often referred to as the “lip sync” problem. The word “sync” is commonly used in this context as a short form for “synchronization”.
There are several existing methods for measuring and correcting “lip sync” errors, including side information embedding, test signal generation and detection, and end of path checking. In some implementations of the side information embedding method, a system embeds timing data in each of the media signals. The timing data may consist of time codes or sequence numbers that are embedded in the coded media signals and/or in the media stream encapsulating the coded media signals. In other implementations, side information consisting of random or pseudo-random data is embedded in a side field in the media signal (e.g. the vertical blanking interval (VBI) of uncompressed video) or as a watermark in the media content of the signal. Synchronization errors can then be detected by correlating the embedded data at the destination.
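As a minimal sketch of the timing-data variant described above, the following Python assumes that the destination has already extracted the embedded sequence numbers from each signal and recorded when each one arrived. The function name, the dictionary layout and the use of seconds are illustrative assumptions, not details of any particular system.

```python
def sync_error_from_timing_data(audio_arrivals, video_arrivals):
    """Estimate the audio/video synchronization error in seconds.

    Each argument is a hypothetical mapping from an embedded sequence
    number to the time (in seconds) at which that unit arrived at the
    destination. A positive result means the video arrives later than
    the audio carrying the same sequence number.
    """
    # Only sequence numbers that survived in both paths can be compared.
    common = sorted(set(audio_arrivals) & set(video_arrivals))
    if not common:
        return None  # side information was dropped or corrupted in the path

    # Average the per-unit differences to smooth out arrival jitter.
    diffs = [video_arrivals[seq] - audio_arrivals[seq] for seq in common]
    return sum(diffs) / len(diffs)


audio = {0: 0.10, 1: 0.14, 2: 0.18}
video = {1: 0.20, 2: 0.24, 3: 0.28}
error = sync_error_from_timing_data(audio, video)
```

With these sample values the video trails the audio by about 60 ms. Returning `None` when no sequence number is common to both signals reflects the case where the side information did not survive the path.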
In the test signal generation and detection method, test signals are generated at the source. These signals usually contain pulses or transitions of certain patterns that are synchronized to each other. Then at the destination, the amount of time skew between the pulses or transitions from different media signals is detected to determine the amount of synchronization error.
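The skew measurement at the destination can be illustrated with a short Python sketch. It assumes, purely for illustration, that the test pattern is a single pulse in each signal, that the audio is available as normalized samples and that the video has been reduced to one brightness value per frame. The function names, threshold and sample rates are hypothetical.

```python
def first_transition_time(samples, sample_rate_hz, threshold=0.5):
    """Return the time in seconds of the first sample at or above threshold."""
    for index, value in enumerate(samples):
        if value >= threshold:
            return index / sample_rate_hz
    return None  # no pulse found in this signal


def pulse_skew_seconds(audio, audio_rate_hz, video, video_rate_hz, threshold=0.5):
    """Return video pulse time minus audio pulse time, in seconds.

    A positive value means the video pulse arrived after the audio pulse.
    """
    audio_time = first_transition_time(audio, audio_rate_hz, threshold)
    video_time = first_transition_time(video, video_rate_hz, threshold)
    if audio_time is None or video_time is None:
        return None
    return video_time - audio_time


# Audio pulse at sample 4800 of a 48 kHz signal (0.1 s).
audio_samples = [0.0] * 4800 + [1.0] * 100
# Video flash at frame 6 of a 30 frame-per-second signal (0.2 s).
frame_brightness = [0.0] * 6 + [1.0] * 2
skew = pulse_skew_seconds(audio_samples, 48000, frame_brightness, 30)
```

In this example the measured skew is about 0.1 s, with the resolution limited by the frame period of the video signal.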
In the end of path checking method, the two or more media signals are compared at the destination and the synchronization error is estimated and corrected. For example, by correlating spoken audio with moving lips, one may calculate the amount of delay between audio and video.
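One way to picture this correlation is the following Python sketch. It assumes that a hypothetical upstream analysis has already produced, for each video frame, a mouth-opening measurement and an audio loudness value; how those features are extracted is outside the scope of the sketch. The lag search is a plain cross-correlation and is not presented as the method used by any specific product.

```python
def estimate_delay_frames(mouth_opening, audio_envelope, max_lag):
    """Return the lag in frames at which audio best matches lip movement.

    Both inputs are hypothetical per-frame feature sequences of equal
    length. A positive result means the audio lags the video.
    """
    mouth_mean = sum(mouth_opening) / len(mouth_opening)
    audio_mean = sum(audio_envelope) / len(audio_envelope)
    mouth = [m - mouth_mean for m in mouth_opening]
    audio = [a - audio_mean for a in audio_envelope]

    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total, overlap = 0.0, 0
        for n in range(len(audio)):
            m = n - lag  # frame of lip movement that audio frame n would match
            if 0 <= m < len(mouth):
                total += audio[n] * mouth[m]
                overlap += 1
        if overlap == 0:
            continue
        score = total / overlap  # normalize by the number of overlapping frames
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


mouth = [0, 0, 1, 3, 1, 0, 0, 2, 5, 2, 0, 0, 0, 1, 0, 4, 1, 0, 0, 0]
# Simulate audio that arrives three frames after the matching lip movement.
audio = [0, 0, 0] + mouth[:-3]
lag_frames = estimate_delay_frames(mouth, audio, max_lag=5)
delay_seconds = lag_frames / 30  # assuming 30 frames per second
```

Here the search recovers the three-frame delay, which is 0.1 s at the assumed frame rate. The approach only works when the content actually contains visible speech, since otherwise there is nothing for the two feature sequences to share.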
Each of these existing methods has limitations. For example, in the side information embedding method, the side information may be dropped or corrupted in the path, and if the side information is embedded as a watermark, it may corrupt the content of the media signal. The test signal generation and detection method can only be used when the system is not in service. Accordingly, if the delay in the path changes while the system is in service, the synchronization error will go uncorrected until the next out-of-service check. The end of path checking method is limited to certain types of content, such as video containing facial movements and audio containing the corresponding speech.
Accordingly, there is a need for a method for measuring and correcting “lip sync” errors that can be performed while the system is in service, is non-intrusive and works with many types of media signals and transmission paths. Similarly, there is a need for a system that implements such a method.