The present invention relates to data transmission, and more particularly to a method and apparatus for transparently embedding data within a video signal in order to indicate authentication, program ownership and/or reception-verification of the video signal.
Television signals are usually copyrighted or otherwise proprietary to the originator and, in the case of television network distribution, are distributed to affiliate stations for re-broadcast. Unauthorized re-broadcast is difficult to detect since it is often difficult to determine the creator or originator of the material from the material itself. This is particularly true for short sequences or when the video image has been cropped or modified to mask any identification logos that may have been explicitly burned into the video image.
Another common problem with television and video distribution, which is becoming more severe, is the synchronization of an aural or audio component that is to be distributed with the video signal. Modern digital signal processing techniques requiring large buffering, such as MPEG compression, add latency to the video distribution and, since the audio may be distributed or processed separately, an error in audio to video "lip-sync" or in sound to action often occurs. Sometimes this latency is variable, requiring continual re-synchronization. In addition to audio to video synchronization problems, the audio signal may be distorted or missing entirely. By embedding, for example, the audio envelope as data within the video signal, it is also possible to compare the received audio with the original envelope that was coded and embedded within the video signal, providing both a quality measure for the audio and a measure of the video to audio delay.
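By way of illustration only, the comparison of a received audio envelope with an embedded reference envelope may be sketched as a search for the lag that maximizes their cross-correlation. The function names `envelope` and `estimate_delay`, the block-peak definition of the envelope, the lag search range and the synthetic data are all assumptions made for this sketch and are not taken from any cited reference.

```python
import random


def envelope(samples, window):
    """Coarse envelope: peak absolute value of each block of `window` samples."""
    return [max(abs(s) for s in samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


def estimate_delay(reference_env, received_env, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest mean-removed
    correlation, normalized by the number of overlapping samples.
    A positive lag means the received envelope is late."""
    ref_mean = sum(reference_env) / len(reference_env)
    rec_mean = sum(received_env) / len(received_env)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total, count = 0.0, 0
        for i, ref in enumerate(reference_env):
            j = i + lag
            if 0 <= j < len(received_env):
                total += (ref - ref_mean) * (received_env[j] - rec_mean)
                count += 1
        if count and total / count > best_score:
            best_lag, best_score = lag, total / count
    return best_lag


# Synthetic demonstration: the received envelope is the reference
# delayed by three envelope samples.
rng = random.Random(7)
reference = [rng.random() for _ in range(200)]
received = [0.0] * 3 + reference[:-3]
delay = estimate_delay(reference, received, max_lag=10)
```

In such a sketch the height of the correlation peak could serve as a rough quality measure, while the lag at the peak gives the audio to video offset in units of envelope samples.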
Another problem is with authentication of video signals representing visual images, possibly created by a computer, where it is desirable to detect a forgery or imposter signal. In this case the video signal may be replicated to such a degree that it is difficult to distinguish it from the authentic video sequence. Also it is sometimes desirable to detect the beginning and/or ending of a particular sub-sequence of a video signal, such as a motion test sequence for in-service video quality assessment of video processed by MPEG compression. It is possible to devise a remote receiver to capture a special test sequence segment of a distributed video signal and compare that segment to an undistorted, stored version to assess the quality of the video. Although it is possible to uniquely and reliably detect when the sequence occurs so that it, and only it, is captured, this generally requires the advance processing of a signature vector, transmission of that vector through a separate channel and a correlation means at the receiver. By transparently embedding data into the video, a predetermined start and end code may be detected by the receiver without any preloading of signature vectors and signature vector preprocessing.
Published Canadian Patent Application No. 2,174,413 (A1) by Geoffrey B. Rhoads entitled "Identification/Authentication Coding Method and Apparatus" discusses techniques for providing authentication of image signals. In Rhoads an imperceptible N-bit identification code is embedded throughout an image with a small noise pattern in a coded fashion. In particular, bits of a binary identification code are referenced sequentially to add up to N independent, noise-like patterns to the original image signal. The detection of these patterns is done by N sequential correlations with stored replicas of each pattern. This may also be done simultaneously by N correlators, as is well described in the public domain as a "correlation receiver." Rhoads further discloses adding or subtracting exactly N independent, noise-like images to improve detection and/or encoded image quality. This latter modification, referred to by Rhoads as "true polarity", is well described in the public domain as "bi-orthogonal signaling" for a correlation receiver, allowing 2^N composite symbols (patterns) to be created by adding or subtracting N bi-orthogonal symbols (patterns). A disadvantage of both of Rhoads' methods is that each of the N noise-like patterns, which are added or subtracted to form one of the 2^N possible composite patterns, needs to be properly scaled and designed to minimize image degradation by the addition of the composite pattern to the image. A further disadvantage is that the original unencoded image, as well as the N patterns, needs to be stored in the receiver so that the unencoded image may be subtracted from the encoded image for detection.
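By way of illustration only, bi-orthogonal signaling with a correlation receiver of the general kind described above may be sketched as follows. This is a simplified model and not the Rhoads implementation: the pseudo-random +/-1 patterns, the function names `make_patterns`, `embed` and `detect`, the gain value and the one-dimensional "image" are all assumptions made for this sketch.

```python
import random


def make_patterns(n_bits, size, seed=0):
    """Generate n_bits pseudo-random +/-1 patterns (assumed stand-ins
    for the noise-like patterns)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(size)]
            for _ in range(n_bits)]


def embed(image, patterns, bits, gain=1.0):
    """Add each scaled pattern for a 1 bit and subtract it for a 0 bit,
    giving one of 2^N composite patterns summed onto the image."""
    out = list(image)
    for pattern, bit in zip(patterns, bits):
        sign = 1.0 if bit else -1.0
        for i, p in enumerate(pattern):
            out[i] += sign * gain * p
    return out


def detect(encoded, original, patterns):
    """Subtract the stored original image, then correlate the residual
    with each stored pattern; the sign of each correlation gives one bit."""
    residual = [e - o for e, o in zip(encoded, original)]
    bits = []
    for pattern in patterns:
        corr = sum(r * p for r, p in zip(residual, pattern))
        bits.append(1 if corr > 0 else 0)
    return bits


# Synthetic demonstration with an 8-bit code and a 4096-sample "image".
rng = random.Random(42)
original = [rng.uniform(0.0, 255.0) for _ in range(4096)]
patterns = make_patterns(n_bits=8, size=4096)
code = [1, 0, 1, 1, 0, 0, 1, 0]
encoded = embed(original, patterns, code)
recovered = detect(encoded, original, patterns)
```

The sketch makes both noted disadvantages visible: `detect` cannot run without the stored `original`, and the single `gain` parameter must be chosen to trade detectability against visible degradation of the image.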
U.S. Pat. No. 4,969,041 issued Nov. 6, 1990 to William J. O'Grady and Robert J. Dubner entitled "Embedment of Data in a Video Signal" discloses adding one of a plurality of low-level waveforms to the video signal, with the level of the low-level waveform being below the noise level of the video signal, for authentication purposes or for transmitting information. Each low-level waveform corresponds to a particular data word being embedded. At the receiving end the video signal is correlated with an identical set of low-level waveforms to produce a set of correlation coefficients, with the highest correlation coefficient indicating the presence of a particular one of the low-level waveforms, which is then converted into the corresponding data word. The Rhoads composite pattern consists of the summation of up to N independent patterns, which would be the same as O'Grady/Dubner when N=1 since only one pattern is sent at a time. For an N-bit data word O'Grady/Dubner implies the need for storage of 2^N patterns rather than the N patterns of Rhoads. But since 2^N patterns may be generated by summing N bi-orthogonal sub-patterns, the effect is the same and only N patterns need to be stored in a manner identical to Rhoads. In both of these patents the symbol rate is one pattern per picture or field.
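By way of illustration only, the selection of a data word by the highest correlation coefficient may be sketched as follows. This is a simplified model and not the O'Grady/Dubner implementation: the +/-1 waveforms, the function names `make_waveforms`, `embed_word` and `decode_word`, the mean removal before correlation, and the modelling of the video as noise about a constant level are all assumptions made for this sketch.

```python
import random


def make_waveforms(count, length, seed=1):
    """Generate `count` pseudo-random +/-1 waveforms, one per data word
    (assumed stand-ins for the low-level waveforms)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)]
            for _ in range(count)]


def embed_word(video, waveforms, word, level=0.5):
    """Add the single waveform selected by `word` at a low level."""
    return [v + level * w for v, w in zip(video, waveforms[word])]


def decode_word(received, waveforms):
    """Correlate the mean-removed signal with every stored waveform and
    return the index of the highest correlation coefficient."""
    mean = sum(received) / len(received)
    centered = [r - mean for r in received]
    scores = [sum(c * w for c, w in zip(centered, wf)) for wf in waveforms]
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic demonstration: 16 waveforms carry a 4-bit word, embedded at a
# level (0.5) below the assumed noise level of the video (sigma = 2.0).
rng = random.Random(3)
video = [rng.gauss(128.0, 2.0) for _ in range(20000)]
waveforms = make_waveforms(count=16, length=20000)
sent = 11
received = embed_word(video, waveforms, sent)
decoded = decode_word(received, waveforms)
```

Because only one waveform is present at a time and the decision is a maximum over non-negative-going peaks, the correlation output in this sketch is unipolar, and 2^N waveforms are stored for an N-bit word.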
Some other drawbacks of O'Grady/Dubner are that it limits the degree to which a pattern or sequence of patterns can be hidden by not fully exploiting the limitations of the human psychovisual process, such as spatial masking; it does not fully exploit the available signal to noise ratio since the correlation output is unipolar; and it does not include coding of the data represented by the embedded symbols so as to provide temporal redundancy for error correction over missing video frames.
What is desired is a method of embedding an unobtrusive data pattern into the video signal that is useful in addressing the above-noted problems.