1. Field of the Invention
The present invention relates generally to devices and methods used in detecting unauthorized copying, and more particularly such devices and methods which permit detecting unauthorized copying of compressed digital video data.
2. Description of the Prior Art
Recent developments in digital video technology permit transmitting video programs by various means, including broadcasting, that have sufficient quality at a remote receiver to permit recording commercially marketable copies. It is readily apparent that owners of programming content, e.g. movie studios, broadcasting networks, independent producers, etc., are unwilling to distribute commercially valuable properties, even on a pay-per-view basis, using this improved video technology if every receiver can become a recorder for a commercially marketable copy of their property. Accordingly, various proposals have been made for embedding a humanly unobservable but automatically detectable code into video that permits identifying an unauthorized copy, and preferably permits unambiguously determining the process and equipment used in recording the unauthorized copy. Proposals for systems that are capable of embedding such unobservable but detectable codes into video are presently being identified by the word “watermarking.”
An article entitled “Digital Watermarking: New Techniques for Image Ownership Branding” by Chris Okron published in the October 1996 issue of Advanced Imaging at pages 93-94 (“the Okron article”) discloses embedding a bit string in a digital image which introduces minute changes into the image, but the changes are typically below the ability of the human eye to detect. The article further reports that the embedded watermark can survive common image processing operations such as rotation, scaling, scanning, compression, transcoding and clipping as well as outright attacks. One specific technique reported in the article is embedding a small amount of random noise into perceptually significant components of an original digital image. Another technique reported in the article is placing an imitation of naturally occurring random image variations throughout a digital image, automatically varying the intensity of the watermark so it remains invisible in both flat and detailed areas of an image.
A technical paper entitled “A Low Cost Perceptive Digital Picture Watermarking Method” by F. Goffin, et al. published at pages 264-277 of SPIE Vol. 3022, Storage and Retrieval for Image and Video Databases V, Feb. 13-14, 1997, Copyright 1997, The Society of Photo-Optical Instrumentation Engineers (“the Goffin article”), describes embedding a watermark line-by-line going from the top to the bottom of a digital video frame. Bits of the watermark are encoded through the phase of Maximal Length Sequences (“MLS”) which have good correlation properties. Underlying the embedding of the MLSs into lines of the digital video frame is a masking criterion, deduced from physiological and psychophysical studies, that guarantees the invisibility of the watermark. The retrieval of the watermark copyright information does not require using the original picture, thus no human intervention is needed for decoding the watermark. The Goffin article states that Joint Photographic Experts Group (“JPEG”) digital compression does not remove an embedded MLS watermark.
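The idea of encoding data through the phase of a maximal length sequence may be illustrated by the following minimal sketch. It is not taken from the Goffin article: the 4-stage shift register (feedback polynomial x^4 + x^3 + 1, period 15), the ±1 mapping, and the function names `mls`, `rotate`, `circular_correlation` and `decode_phase` are all assumptions chosen for brevity. The sketch shows only that a cyclic shift of such a sequence can be recovered from the peak of its circular correlation with the unshifted reference.

```python
def mls(length=15):
    """Generate one period of a +/-1 maximal length sequence (assumed 4-stage LFSR)."""
    state = [1, 0, 0, 0]
    out = []
    for _ in range(length):
        out.append(1 if state[3] else -1)
        feedback = state[3] ^ state[2]      # taps for x^4 + x^3 + 1
        state = [feedback] + state[:3]
    return out

def rotate(seq, phase):
    """Cyclically shift seq so that element `phase` comes first."""
    phase %= len(seq)
    return seq[phase:] + seq[:phase]

def circular_correlation(a, b):
    """Correlation of a with every cyclic shift of b."""
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]

def decode_phase(received, reference):
    """Return the cyclic shift at which the correlation peaks."""
    corr = circular_correlation(received, reference)
    return max(range(len(corr)), key=corr.__getitem__)

reference = mls()
embedded = rotate(reference, 5)             # the "data" is the phase 5
recovered = decode_phase(embedded, reference)
```

The sharp single peak of the autocorrelation, with a uniformly low value at every other shift, is the good correlation property on which such a scheme relies.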
Copyrighted works for which watermarking appears more difficult are digital video programs that have been compressed in accordance with the Moving Picture Experts Group (“MPEG”) standards, e.g. MPEG I and MPEG II standards. MPEG I is the popular name applied to an International Organization for Standardisation (“ISO”) and International Electrotechnical Commission (“IEC”) standard ISO/IEC 11172. ISO/IEC has adopted a corresponding standard, ISO/IEC 13818, for MPEG II. The MPEG I and MPEG II standards respectively define serial system streams that are well suited for quality:
1. video playback from digital storage media such as a hard disk, CD-ROM, or digital video disk (“DVD”); and
2. transmission such as over a cable antenna television (“CATV”) system or high bit rate digital telephone system, e.g. a T1, ISDN Primary Rate, or ATM digital telecommunications network.
A MPEG I or MPEG II system stream includes a compressed video bitstream that may be decompressed to present a succession of frames of digital video data. As illustrated in FIG. 1, a MPEG compressed video bitstream consists of successive groups of pictures (“GOPs”) 20. Each GOP 20 includes intra (“I”) frames 22, predicted (“P”) frames 24, and bidirectional (“B”) frames 26. An I frame 22 of MPEG compressed digital video data is both encoded and decoded without direct reference to video data in other frames. Therefore, MPEG compressed video data for an I frame 22 represents an entire uncompressed frame of digital video data. A MPEG P frame 24 is both encoded and decoded with reference to a prior frame of video data, either reference to a prior I frame 22 or reference to a prior P frame 24. A B frame 26 of MPEG encoded digital video data is both encoded and decoded with reference both to a prior and to a successive reference frame, i.e. reference to decoded I or P frames 22 or 24. The MPEG I and MPEG II specifications define a GOP 20 to be one or more I frames 22 together with all of the P frames 24 and B frames 26 for which the one or more I frames 22 are a reference. MPEG II operates in a manner analogous to MPEG I with an additional feature that the I frames 22, P frames 24, and B frames 26 of the MPEG I GOP 20 could be fields of the I frames 22, P frames 24, and B frames 26, thus permitting field-to-field motion compensation in addition to frame-to-frame motion compensation.
Regardless of whether an I frame 22, a P frame 24, or a B frame 26 is being compressed, in performing MPEG compression each successive frame 32 of uncompressed digital video data is divided into slices 34 representing, for example, sixteen immediately vertically-adjacent, non-interlaced television scan lines 36. MPEG compression further divides each slice 34 into macroblocks 38, each of which stores data for a matrix of picture elements (“pels”) 40 of digital video data, e.g. a 16×16 matrix of pels 40.
MPEG compression processes the digital video data for each macroblock 38 in a YCbCr color space. The Y component of this color space represents the brightness, i.e. luminance, at each pel 40 in the macroblock 38. The Cb and Cr components of the color space represent subsampled color differences, i.e. chrominance, for 2×2 groups of immediately adjacent pels 40 within the macroblock 38. Thus, each macroblock 38 consists of six 8×8 blocks of digital video data that in the illustration of FIG. 1 are enclosed within a dashed line 42. The six 8×8 blocks of digital video data making up each macroblock 38 include:
1. four 8×8 luminance blocks 44 that contain brightness data for each of the 16×16 pels 40 of the macroblock 38; and
2. two 8×8 chrominance blocks 46 that respectively contain subsampled Cb and Cr color difference data also for the pels 40 of the macroblock 38.
In compressing all the macroblocks 38 of each I frame 22 and certain macroblocks 38 of P frames 24 and B frames 26, MPEG digital video compression separately compresses data of the luminance blocks 44 and of the chrominance blocks 46, and then combines the separately compressed blocks 44 and 46 into the compressed video bitstream.
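The division of a macroblock into six blocks may be illustrated by the following minimal sketch. It assumes that the Y, Cb and Cr planes of one macroblock are supplied as 16×16 lists of lists and that chrominance is subsampled by averaging each 2×2 group of pels; the averaging filter and the name `split_macroblock` are assumptions made for illustration, since the text above states only that the color differences are subsampled.

```python
def split_macroblock(y, cb, cr):
    """Split 16x16 Y, Cb, Cr planes into four luminance and two chrominance 8x8 blocks."""
    def block(plane, r0, c0):
        # Copy the 8x8 sub-matrix whose top-left corner is (r0, c0).
        return [row[c0:c0 + 8] for row in plane[r0:r0 + 8]]

    def subsample(plane):
        # Assumed filter: average each 2x2 group of adjacent pels.
        return [[(plane[2 * r][2 * c] + plane[2 * r][2 * c + 1]
                  + plane[2 * r + 1][2 * c] + plane[2 * r + 1][2 * c + 1]) / 4.0
                 for c in range(8)] for r in range(8)]

    luminance = [block(y, r0, c0) for r0 in (0, 8) for c0 in (0, 8)]
    chrominance = [subsample(cb), subsample(cr)]
    return luminance, chrominance
```

Each of the six returned blocks is an 8×8 matrix, which is the unit on which the subsequent transform operates.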
Mathematically, the 4 luminance blocks 44 and 2 chrominance blocks 46 of each macroblock 38 respectively constitute 8×8 matrices. Referring now to FIG. 2, compressing each macroblock 38 includes independently computing an 8×8 Discrete Cosine Transform (“DCT”) 52 for each of the six 8×8 blocks 44 and 46 making up the macroblock 38. The six 8×8 DCTs 52, only one of which is depicted in FIG. 2, respectively map the data of the six blocks 44 and 46 into 64 frequency coefficients. Each frequency coefficient in the DCT 52 represents a weighting factor that is applied to a corresponding basis cosine curve. The 64 basis cosine curves vary in frequency. Low cosine frequencies encode coarse luminance or chrominance structure in the macroblock 38. High cosine frequencies encode detail luminance or chrominance features in the macroblock 38. Adding together the basis cosine curves weighted by the 64 DCT coefficients reproduces exactly the 8×8 matrix of an encoded block 44 or 46.
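The transform described above may be sketched as follows. This is an illustrative implementation only, assuming the orthonormal two-dimensional DCT-II and plain Python lists; the names `dct2` and `idct2` are hypothetical. The inverse function sums the basis cosine curves weighted by the 64 coefficients, which reproduces the original 8×8 block to within floating point rounding.

```python
import math

N = 8

def _c(k):
    # Normalization factor of the orthonormal DCT-II.
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

def _basis(i, k):
    # Value at sample i of the basis cosine curve with frequency index k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Map an 8x8 block of pel data into 64 frequency coefficients."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Add together the basis cosine curves weighted by the coefficients."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]
```

For a block lacking detail, such as a uniformly gray block, only the lowest-frequency coefficient is non-zero, which is the property that the later quantization and run-length steps exploit.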
By themselves, the coefficients of the DCT 52 for a block 44 or 46 provide no compression. However, because video data for most macroblocks 38 lack detail luminance or chrominance features, most high-frequency coefficients for the DCTs 52 are typically zero (0) or near zero (0). To further increase the number of zero coefficients in each DCT 52, MPEG encoding divides each coefficient by a quantization value which generally increases with the frequency of the basis cosine curve for which the coefficient is a weight. Dividing the coefficients of the DCT 52 by their corresponding MPEG quantization values reduces image detail. Large numeric values for quantization reduce detail more, but also provide greater data compression for reasons described in greater detail below.
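The effect of quantization may be illustrated by the following minimal sketch. The quantization matrix used here, which simply grows with the sum of the horizontal and vertical frequency indices, is a hypothetical example and is not the default matrix of the MPEG specifications; the function names are likewise assumptions.

```python
def make_quant_matrix(base=8, step=2):
    """Hypothetical quantization values that increase with frequency."""
    return [[base + step * (u + v) for v in range(8)] for u in range(8)]

def quantize(coeffs, q):
    """Divide each DCT coefficient by its quantization value and round."""
    return [[int(round(coeffs[u][v] / q[u][v])) for v in range(8)] for u in range(8)]

def dequantize(levels, q):
    """Approximate reconstruction performed by a decoder."""
    return [[levels[u][v] * q[u][v] for v in range(8)] for u in range(8)]
```

With this matrix, a coefficient of 10 at the lowest frequency quantizes to 1, while a coefficient of 10 at the highest frequency quantizes to 0; the discarded remainder is the image detail that quantization removes.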
After quantizing the DCT 52, the quantized frequency coefficients are processed in a zigzag order as indicated by arrows 54a-54i in FIG. 2. Applying a zigzag order to the quantized frequency coefficients tends to produce long sequences of DCT frequency coefficients having zero (0) value. Run-length encoding, indicated by an arrow 56 in FIG. 2, is then applied to the zigzag order of the quantized DCT coefficients. For those quantized DCT coefficients that differ from the immediately preceding and succeeding DCT coefficient along the zigzag path, run-length encoding specifies a run-length of zero (0), i.e. a single occurrence of the quantized DCT coefficient. Long sequences of zero (0) coefficients along the zigzag path depicted in FIG. 2, are efficiently encoded using a lesser amount of data. MPEG run-length encoding represents each such sequence of consecutive identical valued quantized frequency coefficients by a token 58, depicted in FIG. 2, which specifies how many consecutive quantized frequency coefficients have the same value together with the numerical value for that set of quantized frequency coefficients.
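The zigzag ordering and run-length encoding may be sketched as follows. The sketch assumes the conventional pairing in which each token holds a run, counting the zero-valued coefficients that precede a non-zero coefficient along the zigzag path, together with the value of that non-zero coefficient, followed by an end-of-block marker. Separate coding of the lowest-frequency coefficient is ignored, and the names `zigzag_order` and `run_length_tokens` are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n matrix in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals are walked upward, odd ones downward.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def run_length_tokens(levels):
    """Convert quantized coefficients into (run, value) tokens plus an end marker."""
    tokens = []
    run = 0
    for r, c in zigzag_order(len(levels)):
        value = levels[r][c]
        if value == 0:
            run += 1                      # extend the current run of zeros
        else:
            tokens.append((run, value))   # a run of 0 means no zeros preceded it
            run = 0
    tokens.append("EOB")                  # trailing zeros collapse into one marker
    return tokens
```

A block holding only three non-zero coefficients near its top-left corner therefore reduces from 64 values to three tokens and an end marker.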
The tokens 58 extracted from the sequence of quantized frequency coefficients are then further compressed through Huffman coding, indicated by an arrow 62 in FIG. 2. Huffman coding converts each token 58 into a variable length code (“VLC”) 64. MPEG assigns values that are only 2-3 binary digits (“bits”) long for the VLCs 64 representing the most common tokens 58. Conversely, MPEG video compression assigns values that are up to 28 bits long for the VLCs 64 representing rare tokens 58. The Huffman coded VLCs 64 thus determined are then appropriately merged into a MPEG compressed video bitstream 66 depicted in FIG. 3.
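The principle that common tokens receive short codes and rare tokens receive long codes may be illustrated by the following sketch, which constructs a Huffman code from token frequencies. It should be read as an illustration of the principle only: MPEG uses fixed, predefined code tables rather than a code built for each stream, and the frequencies and the name `huffman_code` below are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix-free code {symbol: bit string} from {symbol: frequency}."""
    tie = count()  # unique tie-breaker so heap entries never compare the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical relative frequencies of four (run, value) tokens.
example = huffman_code({(0, 1): 50, (1, 1): 25, (0, 2): 15, (5, 3): 10})
```

In this example the most frequent token is assigned a one-bit code and the two rarest tokens are assigned three-bit codes.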
As illustrated in FIG. 3, a serial MPEG system stream 68 is assembled by concatenating packs 72 of compressed data selected respectively from a MPEG compressed audio bitstream 74 and from the compressed video bitstream 66. The compressed video bitstream 66 and the compressed audio bitstream 74 are both prepared and merged into the system stream 68 by a MPEG encoder 76 depicted in FIG. 4. In the illustration of FIG. 4, the MPEG encoder 76 receives either an analog or a digital video signal from any one of various video sources such as from a video camera 78, from a video tape player 82, from a video disk player 84 or from some other type of video-data storage-device 86. As indicated by an arrow 92 in FIG. 4, the system stream 68 thus assembled by the MPEG encoder 76 may be supplied directly in real-time to a broadcast transmitter 94 located near to the MPEG encoder 76. Alternatively, the system stream 68 may be supplied in real-time to a remotely located broadcast transmitter 94 via some communication channel such as a T1, ISDN Primary Rate, or ATM digital telecommunications network 96. For non-real-time applications, the MPEG encoder 76 may record the system stream 68 onto a general purpose digital video-data storage-device 98 or onto a special purpose digital video-disk storage-device 102 such as a CD-ROM or DVD from which it is subsequently reproduced and supplied to the broadcast transmitter 94.
The broadcast transmitter 94 itself then distributes the system stream 68 in various ways, such as to receivers located in some geographic area as a high frequency (“HF”) or ultra-high frequency (“UHF”) signal that is broadcast conventionally from an antenna 104, or to a satellite 106 via a conventional microwave dish 108. As depicted in FIG. 4, a set-top box (“STB”) 112 that is coupled to a conventional television set 114 may receive the HF or UHF broadcast system stream 68 with an antenna 116, or receive the system stream 68 from the satellite 106 with a conventional microwave dish 118. Yet another way in which the STB 112 may receive the system stream 68 is a coaxial-cable feed 122 provided by a cable antenna television (“CATV”) service 124. As illustrated in FIG. 4, the CATV service 124 may itself receive the broadcast system stream 68 indirectly with an antenna 126 from the antenna 104 or with a microwave dish 128 from the satellite 106, or directly from the MPEG encoder 76 via a coaxial-cable feed 132 or other real-time communication channel such as a T1, ISDN Primary Rate, or ATM digital telecommunications network.
As described above, regardless of how the STB 112 receives the system stream 68, the video and audio signals at the STB 112 have sufficient quality to permit recording commercially marketable copies of works that have been decoded from the MPEG encoded system stream 68. Thus, in addition to or instead of providing an analog video signal decoded from the MPEG encoded system stream 68 to the television set 114, unauthorized copies may be made at the STB 112 by supplying the high-quality decoded analog video signal to a video cassette recorder (“VCR”) 134, to a video-disk recorder 136, or to a video-data storage-device 86.
A technical paper entitled “Watermarking of MPEG-2 Encoded Video Without Decoding and Re-Encoding” by F. Hartung, et al. published at pages 264-273 of SPIE Vol. 3020, Multimedia Computing and Networking 1997, Feb. 10-11, 1997, Copyright 1997, The Society of Photo-Optical Instrumentation Engineers (“the Hartung article”), describes a technique, similar to that proposed in the Okron article, for adding a noise-like signal to video pels. To add a noise-like watermark to a MPEG compressed video bitstream the Hartung article proposes:
1. decoding Huffman encoded non-zero DCT coefficients of the compressed video data stream to obtain the DCT coefficient;
2. adding the corresponding DCT coefficient from a DCT processed watermark signal to the decoded DCT coefficient;
3. re-quantizing and re-Huffman encoding the watermarked DCT coefficient; and
4. if substituting the watermarked DCT coefficient into the compressed video data stream will not increase the bit rate, replacing the un-watermarked DCT coefficient with the watermarked DCT coefficient.
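The four steps above may be sketched in simplified form as follows. The sketch is not the Hartung implementation: it operates directly on already quantized (run, value) tokens, omits the decoding and re-quantization arithmetic, and takes the code length from a caller-supplied function `vlc_length`, which is a hypothetical stand-in for a lookup in the actual code tables. It is intended only to show the rule that a watermarked coefficient replaces the original only when doing so does not lengthen the code.

```python
def watermark_coefficients(tokens, watermark, vlc_length):
    """Add watermark values to token values where the bit rate does not increase.

    tokens:     list of (run, value) pairs for non-zero DCT coefficients
    watermark:  list of integer watermark values, one per token
    vlc_length: hypothetical callable (run, value) -> code length in bits
    """
    result = []
    for (run, value), w in zip(tokens, watermark):
        candidate = value + w
        # A zero result would change the run structure, so it is rejected here.
        if candidate != 0 and vlc_length(run, candidate) <= vlc_length(run, value):
            result.append((run, candidate))   # step 4: substitute
        else:
            result.append((run, value))       # keep the un-watermarked coefficient
    return result

def toy_vlc_length(run, value):
    # Toy length model in which larger runs and magnitudes cost more bits.
    return 2 + run + 2 * abs(value)
```

Under the toy length model, a watermark value that reduces a coefficient's magnitude is accepted, whereas one that increases the magnitude is rejected because the resulting code would be longer.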
Altering an existing MPEG compressed video bitstream can result in image drift, by which an alteration made to a prior frame of MPEG compressed video data may continue to appear during presentation of subsequent frames of decompressed video data. Accordingly, the Hartung article further explains that in watermarking a MPEG compressed video bitstream, drift compensation data must also be added which encodes the difference between digital video data predicted using the un-watermarked compressed digital video data and that predicted using the watermarked compressed digital video data.
An object of the present invention is to apply a watermark to compressed digital video data that appears imperceptibly but detectably in digital video data decompressed therefrom.
Another object of the present invention is to apply a watermark to compressed digital video data that it is impractical, preferably impossible, to remove even if the basic watermarking technique were publicly known.
Another object of the present invention is to apply a watermark to compressed digital video data that is distributed throughout a sequence of video frames.
Another object of the present invention is to apply a watermark to compressed digital video data that cannot be removed by an unauthorized party without unacceptably degrading image quality.
Another object of the present invention is to apply a watermark to compressed digital video data that is capable of unambiguously identifying the process and equipment used in recording an unauthorized copy.
Another object of the present invention is to apply a watermark to compressed digital video data easily.
Another object of the present invention is to apply a watermark to compressed digital video data without decompressing the compressed digital video data.
Another object of the present invention is to apply a watermark to compressed digital video data knowing only locations within the compressed digital video data at which the watermark is to be applied.
Another object of the present invention is to apply a watermark to compressed digital video data that does not increase the compressed video data bit-rate.
Another object of the present invention is to apply to compressed digital video data a watermark that can be augmented at each step in transmitting the compressed digital video data.
Another object of the present invention is to apply a watermark to compressed digital video data that persists through recompression of decompressed digital video data.
Briefly, the present invention in one aspect includes a method for adding a watermark to a compressed video bitstream. The method selects a plurality of sites within the compressed video bitstream that encode a DCT coefficient which is apt for modification to embed a watermark into the compressed video bitstream. While various criteria described in greater detail below affect the selection of apt sites for watermarking, such sites are preferably located in MPEG B frames 26, and the DCT coefficient to be modified preferably has a run-length of zero (0). Having determined a plurality of watermarking sites, the method then modifies the DCT coefficient for at least some of the selected plurality of sites thereby embedding the watermark into the compressed video bitstream. The watermarked compressed video bitstream containing the modified DCT coefficients is then transmitted by any of various different methods well known to those skilled in the art for reception and presentation of the watermarked video.
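The two preferences stated above for site selection may be illustrated by the following minimal sketch. The `Site` record, the function names, and in particular the modification rule (increasing the magnitude of the coefficient by one where a watermark bit is set) are assumptions introduced solely for illustration; this summary does not specify how a coefficient is modified, and the further selection criteria are described elsewhere.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Site:
    frame_type: str   # "I", "P" or "B"
    run: int          # run-length of the token encoding the coefficient
    value: int        # quantized DCT coefficient value

def select_sites(candidates):
    """Indices of candidates lying in B frames and having a run-length of zero."""
    return [i for i, s in enumerate(candidates)
            if s.frame_type == "B" and s.run == 0 and s.value != 0]

def embed(candidates, chosen, bits):
    """Modify the coefficient at each chosen site whose watermark bit is set.

    Assumed rule: move the value one step away from zero, so that a modified
    coefficient never becomes zero and the run structure is unchanged.
    """
    out = list(candidates)
    for i, bit in zip(chosen, bits):
        if bit:
            step = 1 if out[i].value > 0 else -1
            out[i] = replace(out[i], value=out[i].value + step)
    return out
```

Only the selected sites are touched; every other coefficient in the bitstream is left exactly as encoded.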
In another aspect the present invention includes a system for detecting an unauthorized copy of a video by identifying a watermark embedded into a compressed video bitstream. The watermarked compressed video bitstream is produced from an un-watermarked video by changing values of selected DCT coefficients at watermarking sites in the compressed video bitstream. The system detecting an unauthorized copy includes an original video input for receiving a video signal of the un-watermarked video, and a copy video input for receiving a video signal of the copy of the video which may possibly include the watermark. A frame differencer also included in the system receives from the original and copy video input the received video signals, and produces a synchronized, frame-by-frame difference between the video signals.
The system includes means for identifying the watermark when embedded into the video signal received by the copy video input. This watermark identification means may be implemented in two (2) different ways, or in both ways. Either of the two (2) watermark identification means receives the frame-by-frame difference produced by the frame differencer and also receives a site list containing data which specifies characteristics of sites at which the watermark may be embedded into the video signal received by the copy video input. One watermark identification means includes a digital-to-analog converter that receives the frame-by-frame difference produced by the frame differencer and converts the difference into an analog video signal. This particular watermark identification means also includes a video monitor for receiving the analog video signal produced by the digital-to-analog converter and for visually displaying watermarking sites. An alternative watermark identification means includes a frame analyzer that receives the frame-by-frame difference produced by the frame differencer, and automatically determines if a watermark occurs at a watermarking site. A particularly preferred embodiment of the frame analyzer computes a signal-to-noise ratio (“SNR”) between the DCT coefficient at the watermarking site in a DCT computed from the frame-by-frame difference, and other coefficients of the DCT computed from the frame-by-frame difference.
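The operation of the frame differencer and frame analyzer may be illustrated, for a single 8×8 block, by the following sketch. The exact SNR formula is not given above, so the sketch assumes one plausible definition: the squared DCT coefficient at the watermarking site divided by the mean square of all other coefficients of the same DCT, expressed in decibels. The function names and the small constant `eps`, added to avoid division by zero, are likewise assumptions.

```python
import math

N = 8

def dct2(block):
    """Orthonormal 8x8 two-dimensional DCT-II of a list-of-lists block."""
    def c(k):
        return math.sqrt((1.0 if k == 0 else 2.0) / N)
    def basis(i, k):
        return math.cos((2 * i + 1) * k * math.pi / (2 * N))
    return [[c(u) * c(v) * sum(block[x][y] * basis(x, u) * basis(y, v)
                               for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def frame_difference(original, copy):
    """Pel-by-pel difference between synchronized original and copy blocks."""
    return [[c - o for o, c in zip(orow, crow)]
            for orow, crow in zip(original, copy)]

def site_snr_db(diff_block, site, eps=1e-12):
    """Assumed SNR: site coefficient power over mean power of other coefficients."""
    coeffs = dct2(diff_block)
    u0, v0 = site
    signal = coeffs[u0][v0] ** 2
    others = [coeffs[u][v] ** 2 for u in range(N) for v in range(N)
              if (u, v) != (u0, v0)]
    noise = sum(others) / len(others)
    return 10.0 * math.log10((signal + eps) / (noise + eps))
```

A frame analyzer built along these lines might compare the value returned for each site in the site list against a threshold; a high value at a listed site suggests that the coefficient there was altered, while a low value suggests that it was not.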
These and other features, objects and advantages will be understood or apparent to those of ordinary skill in the art from the following detailed description of the preferred embodiment as illustrated in the various drawing figures.