1. Field of the Invention
The invention relates to the creation, manipulation, transmission, storage, and especially synchronization of multi-media entertainment, educational and other programming having at least video and associated information. The invention will find particular use with respect to the creation and distribution of television programs.
2. Background Art
The creation, manipulation, transmission, and storage of multi-media content, whether entertainment, educational, scientific, business, or other programming having at least video and associated information, require synchronization. Typical examples of such programming are television and movie programs, motion medical images, and various engineering and scientific content. These are collectively referred to as “programs.”
Often these programs include a visual or video portion, an audible or audio portion, and may also include one or more various data type portions. Typical data type portions include closed captioning, narrative descriptions for the blind, additional program information data such as web sites and further information directives, and various metadata included in compressed systems (for example, MPEG and JPEG).
Often the video and associated signal programs are produced, operated on, stored or conveyed in a manner such that the synchronization of various ones of the aforementioned audio, video and/or data is affected. For example, the synchronization of audio and video, commonly known as lip sync, may be askew when the program is produced. Even if the program is produced with correct lip sync, that timing may be upset by subsequent operations such as processing, storing or transmission of the program.
One aspect of multi-media programming is maintaining audio and video synchronization in audio-visual presentations, such as television programs, for example to prevent annoyances to the viewers, to facilitate further operations with the program or to facilitate analysis of the program.
The video and audio signals in a television system are being subjected to more and more steps of digital processing. Each step has the potential to add a different amount of delay to the video and audio, thereby introducing a lip sync error. Incorrect lip sync is a major concern to newscasters, advertisers, politicians and others who are trying to convey a sense of trust, accuracy and sincerity to their audience. Studies have demonstrated that when lip sync errors are present, viewers perceive a message as less interesting, more unpleasant, less influential and less successful than the same message with proper lip sync.
Because light travels faster than sound, we are used to seeing events before we hear them—lightning before thunder, a puff of smoke before a cannon shot and so on. Therefore, to some extent, we can tolerate “late” audio. Unfortunately, as shown in FIG. 1, even in a simple television system, the video is almost always delayed more than the audio, creating the unnatural situation of “early” audio. Any one contributor to the lip sync error may or may not be noticeable. But the cumulative error from the original acquisition point to the viewer can easily become both noticeable and objectionable. The potential for lip sync errors increases even further when MPEG compressed links are added to one or more stages of the overall system.
As shown in FIG. 1 for a typical television system, as video moves from video pickup devices, typically CCD cameras, 101 and 111, to frame synchronizers, 103, production switchers, 121, digital video effects, 121 and 131, noise reducers and intermediate transmitters, 135, and receivers, 141, including MPEG encoders and decoders, more frame synchronizers, 143, local transmitters, 151, tuners and demodulators, 161, and TVs with digital processing, 171, and the like, and as the audio goes from remote and studio pickup, 101 and 111, to an audio board, 123, further audio processing, 133, intermediate transmitters, 135, and receivers, 141, through audio limiters, 145, and local transmitters, 151, to a tuner-demodulator, 161, and an audio amplifier and speaker, 173, the video is delayed more than the audio. The cumulative delay of the video with respect to the audio can be 6 or more frames. With the inclusion of video and audio compression in any part(s) of the system, the video delay with respect to the audio can be much greater. Worse yet, the amount of video delay frequently jumps by a frame or more as the operating mode changes, or as frames of video are dropped or repeated to achieve synchronization of the video to studio and other references. Using a fixed audio delay to “mop up” the audio to video timing errors is rarely a satisfactory solution because of the constantly changing video delay.
While not shown in the typical system of FIG. 1, data is frequently carried along with the video signals through much of the system via separate paths. Thus, when the video is delayed as described above, the timing of the data relative to the video is disrupted. Using a fixed data delay to “mop up” the data to video timing errors is rarely a satisfactory solution because of the constantly changing video delay.
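The accumulation described above can be illustrated with a minimal Python sketch. The per-stage delay values below are hypothetical, chosen only so that the total matches the 6-frame figure mentioned in the text; they are not measurements of any equipment in FIG. 1, and the function names are assumptions made for illustration. A positive offset means the audio is early relative to the video.

```python
# Hypothetical per-stage delays in frames: (stage, video_delay, audio_delay).
# Illustrative values only; not taken from FIG. 1 or any real equipment.
STAGES = [
    ("frame synchronizer", 1.0, 0.0),
    ("digital video effects", 1.0, 0.0),
    ("noise reducer", 1.0, 0.0),
    ("second frame synchronizer", 1.0, 0.0),
    ("TV digital processing", 2.0, 0.0),
]


def cumulative_offsets(stages):
    """Return the running video-minus-audio delay (in frames) after each stage."""
    offsets = []
    total = 0.0
    for _name, video_delay, audio_delay in stages:
        total += video_delay - audio_delay
        offsets.append(total)
    return offsets


def residual_after_fixed_delay(stages, fixed_audio_delay):
    """Return the remaining offset once a fixed audio delay (in frames) is applied.

    Positive: audio still early. Negative: audio now late.
    """
    return cumulative_offsets(stages)[-1] - fixed_audio_delay


# Same chain with the digital video effects stage bypassed (video delay 0).
STAGES_DVE_OFF = [
    (name, 0.0 if name == "digital video effects" else video, audio)
    for name, video, audio in STAGES
]

if __name__ == "__main__":
    print(cumulative_offsets(STAGES))
    print(residual_after_fixed_delay(STAGES, 6.0))
    print(residual_after_fixed_delay(STAGES_DVE_OFF, 6.0))
```

In this sketch a fixed 6-frame audio delay cancels the error only while every stage keeps its assumed delay. When one stage changes mode, the same fixed delay leaves the audio one frame late, which is the weakness of a fixed "mop up" delay noted above.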
Standards committees in various countries have studied the lip sync problem and have set guidelines for the maximum allowable errors. For the most part, these studies have determined that lip sync errors become noticeable to most viewers if the audio is early by more than 25-35 milliseconds (about 1 NTSC frame) or late by more than 80-90 milliseconds (2.5-3.0 NTSC frames). In June of 2003, the Advanced Television Systems Committee (ATSC) issued a finding that stated “ . . . at the inputs to the DTV encoding device . . . the sound program should never lead the video program by more than 15 milliseconds, and should never lag the video program by more than 45 milliseconds.” The finding continued “Pending [a finding on tolerances for system design], designers should strive for zero differential offset throughout the system.” In other words, it is important to eliminate or minimize the errors at each stage where they occur, instead of allowing them to accumulate.
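The tolerances quoted above can be expressed as a simple check. The following Python sketch is illustrative only: the function names and the sign convention (negative offset means the sound leads the video, positive means it lags) are assumptions, while the 15 millisecond and 45 millisecond limits are those of the ATSC finding quoted above. The NTSC frame duration is taken as 1001/30 milliseconds, corresponding to the nominal 30000/1001 frames per second.

```python
# One NTSC frame lasts 1001/30000 s, i.e. 1001/30 ms (about 33.37 ms).
NTSC_FRAME_MS = 1001 / 30

# Limits from the ATSC finding quoted in the text.
ATSC_MAX_LEAD_MS = 15.0  # sound may lead video by at most this much
ATSC_MAX_LAG_MS = 45.0   # sound may lag video by at most this much


def ms_to_ntsc_frames(offset_ms):
    """Convert a time offset in milliseconds to NTSC frames."""
    return offset_ms / NTSC_FRAME_MS


def within_atsc_limits(audio_offset_ms):
    """Check an audio offset against the quoted ATSC limits.

    Assumed sign convention: negative means the sound leads (is early),
    positive means the sound lags (is late).
    """
    return -ATSC_MAX_LEAD_MS <= audio_offset_ms <= ATSC_MAX_LAG_MS


if __name__ == "__main__":
    for offset in (-30.0, -15.0, 0.0, 45.0, 90.0):
        print(offset, round(ms_to_ntsc_frames(offset), 2), within_atsc_limits(offset))
```

Under this convention, an audio lead of one NTSC frame (about 33 milliseconds) already falls outside the quoted ATSC limit, although it sits near the 25-35 millisecond range at which the studies cited above found early audio becomes noticeable.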
Fortunately, the “worst case” condition in FIG. 1 is now less likely to present itself than was the case a few years ago. Firstly, it is now quite common to install audio tracking delays, exemplified by the Pixel Instruments AD-3000 or AD-3100, alongside each video frame synchronizer or other video delay devices having delay signal outputs, thereby eliminating at least one common source of variable lip sync errors. The AD-3000 and AD-3100 variable audio delays are available from Pixel Instruments Corp. of Los Gatos, Calif.
Secondly, due to the continuing cost effectiveness of digital electronics, newer master control switchers have an internal DVE for squeezeback operation rather than an external DVE. This allows the use of a constant insertion delay of 1 frame for both the video and the audio paths in all modes of operation.
Unfortunately, again due to the continuing cost effectiveness of digital electronics, newer master control switchers are now incorporating built in video frame synchronizers, scan converters and other video delaying circuitry.
Since the 1970s, digital video effects processors (DVEs or transform engines) have been used to produce “over the shoulder”, “double box” and other multiple source composited effects. The video being transformed is delayed (usually by one or more frames) relative to the background video in the switcher. So, any time one or more DVE processors are on-air, the associated video sources will be delayed, resulting in a lip sync error. In the past, when the DVE processor was external to the switcher, a tally signal from the switcher could be used to trigger the insertion of a compensating audio delay when the DVE is on-air. However, today's production switchers are usually equipped with internal DVEs and a tally output is no longer available.
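The tally-triggered compensation used with external DVEs can be modeled, under simplifying assumptions, as an audio delay line that is switched in only while the tally is active. The Python sketch below is a hypothetical model and not a description of any particular product: the class name, the per-sample interface, and the abrupt switching are all assumptions, and a practical device would likely need to change delay without an audible discontinuity.

```python
from collections import deque


class SwitchedAudioDelay:
    """Hypothetical audio delay line that is engaged only while a tally is active."""

    def __init__(self, delay_samples):
        if delay_samples < 1:
            raise ValueError("delay_samples must be at least 1")
        self.delay_samples = delay_samples
        # Pre-fill with silence so the first outputs are defined.
        self.history = deque([0.0] * delay_samples, maxlen=delay_samples)

    def process(self, sample, tally_on_air):
        """Return the delayed sample when the tally is active, else the input sample."""
        delayed = self.history[0]    # sample from delay_samples calls ago
        self.history.append(sample)  # oldest entry is dropped automatically
        return delayed if tally_on_air else sample


if __name__ == "__main__":
    delay = SwitchedAudioDelay(delay_samples=2)
    print([delay.process(x, tally_on_air=True) for x in (1.0, 2.0, 3.0, 4.0)])
```

The delay line keeps recording samples even while bypassed, so the delayed signal is available as soon as the tally becomes active. Without a tally output, as with internal DVEs, nothing drives the `tally_on_air` argument in this model.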
Thus, a need exists for a method, system, and program product for producing time synchronized multi-media signals.