The invention relates to the automated measurement of relative audio-to-video timing in audiovisual communications systems. The invention is of particular use in film and television type systems, as well as in any other system where the audio and vision portions of a program are carried separately and delays to one or the other, or both, cause mis-synchronization of an associated audio and vision signal. This mis-synchronization is commonly referred to as lip sync error. Other uses and purposes for the present invention will also become known to one skilled in the art from the teachings herein.
1. Field of the Invention
The field of the invention includes the use of film, video and other storage and distribution systems where audio and video are carried, in at least part of the system, in separate transmission, processing or storage paths. Delays in one path or the other, or both, can cause mis-synchronization of the visual and associated audio signal, creating a need to delay the earlier of the two or more signals to place the signals back into synchronization. The present invention is for automatically measuring the relative delay or timing difference between a visual signal and one or more associated audio signals by comparing particular types of movements and other cues in the visual signal to particular types of sounds and silences in the audio signal. The system is automatic and will operate where the delays change automatically and continuously or are adjusted by an operator from time to time.
2. Description of the Prior Art
It is well known in television systems to provide for timing adjustment of audio to video signals by various means of measuring video delays, coupling the amount of video delay or advance to an appropriate audio or video delay, and delaying the earlier arriving of the two to place them back in synchronism. Such systems are shown for example in U.S. Pat. No. 4,313,135 to Cooper; RE 33,535 (U.S. Pat. No. 4,703,355) to Cooper; U.S. Pat. No. 5,202,761 to Cooper; and U.S. Pat. No. 5,387,943 to Silver.
The prior art systems by Cooper mentioned above all need access to the video signal before the delay, either to encode a timing signal or to serve as a reference for measuring video delay, and are not intended to operate with audio and video signals which are received out of sync.
The Cooper '135 patent shows measuring relative video delay by inspecting relatively delayed and undelayed video and delaying audio by a corresponding amount. The Cooper '535 patent shows measuring the relative audio to video delay by use of timing signals embedded in the video, and the use of delays or variable speed controls on the playback mechanisms to maintain or achieve proper synchronization. The Cooper '761 patent shows a different approach to measuring the relative audio to video delay by use of timing signals embedded in the video, with direct control of video and audio delays, as well as the use of delays or variable speed controls on the memory playback mechanisms to maintain or achieve proper synchronization. For the sake of brevity, the term AV sync will be used to refer to all the various types of systems and signals in which the timing of audio and visual program information is involved.
The prior art Silver system is intended to operate with audio and video signals which are received out of sync without access to the video signal before delay is added, but is not automatic. The Silver system only shows how to semiautomatically measure the relative timing but does not show an automatic system in that the operator must always locate the lips in the video frame. A disadvantage of Silver's invention is that it is easily fooled since his suggested method of correlating motion vectors with audio characteristics is not very precise.
One reason for the correlation being fooled is that there is a one to one correspondence between opening and closing of the mouth: there will be one closing for every opening, and the opening and closing occur at fairly regular intervals, so that one opening or closing is easily mistaken for another. Silver did not recognize these problems, since he states that "the processor matches peaks in the audio data with open mouths in the video data and valleys in the audio data with closed mouths in the video data" (Col. 3, lines 25-32). In addition, Silver requires an operator to locate the mouth.
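The ambiguity that regular mouth movement introduces can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from Silver or from the present disclosure: the mouth is idealized as opening and closing with a strict period of 8 samples, sound is assumed present exactly when the mouth is open, the video is assumed to be late by 2 samples, and the hypothetical helper `circular_scores` uses a simple wrap-around product sum as the correlation measure.

```python
def circular_scores(audio, video, lags):
    """Hypothetical helper: wrap-around correlation score for each candidate lag."""
    n = len(audio)
    return {lag: sum(audio[i] * video[(i + lag) % n] for i in range(n))
            for lag in lags}

PERIOD = 8       # assumed samples per open/close cycle
LENGTH = 64      # whole number of cycles, so the wrap-around is seamless
TRUE_DELAY = 2   # assumed: video is late by 2 samples

# Idealized mouth: open (1) for half of each cycle, closed (0) for the rest.
mouth = [1 if (i % PERIOD) < PERIOD // 2 else 0 for i in range(LENGTH)]
audio = mouth[:]                                   # sound exactly when open
video = [mouth[(i - TRUE_DELAY) % LENGTH] for i in range(LENGTH)]

scores = circular_scores(audio, video, range(-PERIOD, PERIOD + 1))
best = max(scores.values())
candidates = [lag for lag, s in scores.items() if s == best]
print(candidates)  # [-6, 2]
```

Under these assumptions the score is identical at a lag of 2 (video late by 2 samples) and at a lag of -6 (video early by 6 samples), so the correlation alone cannot tell which offset is the true one. Real speech is less regular than this idealization, which is why the regularity is only "fairly" regular in practice.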
Silver does not show how to automatically keep the AV timing in sync. The Silver '943 patent shows a "semiautomatic" system for detecting relative audio to video timing errors (Col. 1, lines 52-55) which uses "Motion vectors [which] are computed by the motion detector from the extracted video data on a frame-by-frame, or field-by-field, basis to determine the direction and relative amount of motion between the lips. Also the size of the mouth opening is determined. Any one or more of common motion detection/estimation algorithms may be used, such as sub-pixel block matching, to generate the motion vectors" (Col. 2, lines 41-49). The video data which is used for these computations comes from "The portion of the video signal from the frame buffer corresponding to the lips as outlined by the operator is input to a motion detector to produce motion vectors".
Regarding the operation of his system, Silver says "The processor 16 correlates zero motion vectors with open and closed mouths. Then, using a time correlation technique of sliding the data records in time relative to each other, the processor matches peaks in the audio data with open mouths in the video data and valleys in the audio data with closed mouths in the video data".
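A time correlation of the general kind quoted above, in which two data records are slid in time relative to each other, might be sketched as follows. This is an illustration only and not Silver's implementation: the function name `best_lag`, the per-frame audio level and mouth opening records, and the choice of a mean-removed product sum as the matching score are all assumptions.

```python
def best_lag(audio, video, max_lag):
    """Return the lag, in samples, at which `video` best matches `audio`.

    A positive result means the video record is late relative to the audio.
    The score is a mean-removed product sum over the overlapping samples
    (an assumed, simple choice of correlation measure).
    """
    a_mean = sum(audio) / len(audio)
    v_mean = sum(video) / len(video)
    a = [x - a_mean for x in audio]
    v = [y - v_mean for y in video]

    def score(lag):
        # Pair audio sample i with video sample i + lag where both exist.
        return sum(a[i] * v[i + lag]
                   for i in range(len(a))
                   if 0 <= i + lag < len(v))

    return max(range(-max_lag, max_lag + 1), key=score)


# Hypothetical per-frame records: 1 = loud audio / open mouth, 0 = quiet / closed.
audio_level = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
# The same irregular pattern, arriving 3 frames late in the video.
mouth_open = [0, 0, 0] + audio_level[:-3]

print(best_lag(audio_level, mouth_open, max_lag=5))  # 3
```

The example records are deliberately irregular, so the score has a single clear maximum. With regularly spaced mouth openings the same procedure can produce several equally good lags.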