The extensive use of video content on the Internet has led to a need for a system that allows statistics on the use and distribution of video material to be gathered from computers and computer networks, etc. A key function of such a system is the reliable and efficient identification of video content which might involve the separate or combined use of metadata, digital watermarking and digital fingerprinting.
Metadata can be added to digital files of video content in order to be able to easily identify the video content (as well as for other reasons). However, this metadata can be easily removed by (malicious) parties who do not wish the video content to be easily identified.
Digital watermarking of video content is the process of embedding auxiliary information into a digital representation of the video content, which can later be used to identify the content. Ideally, such watermarking of video content should be imperceptible to a viewer and robust to any editing or transcoding that the video might undergo, and the watermark should be difficult for a user simply to remove. The design of robust and imperceptible digital watermarking techniques is challenging and has attracted much research effort.
The metadata and watermarking approaches have a number of drawbacks. First, a watermark or metadata needs to be inserted in all possible versions of the signal. Second, even if every media file were to originally include such identification data, the techniques are vulnerable to tampering and files could be “unlocked”. Once a file is unlocked, metadata and watermarking techniques cannot be used to re-identify the data, so that the content could be distributed and used without risk of detection.
Digital fingerprinting refers to a method of identifying and matching digital files based on digital properties of those files. Representing a large data item, such as a piece of digital video content, by a relatively small digital fingerprint allows for the efficient identification of copies. The fingerprinting process first requires the analysis of digital video content of interest to build an indexed database of digital fingerprints. A query signal may then be analysed by extracting a query fingerprint from the query signal and comparing that fingerprint with the database of known digital fingerprints. Digital fingerprinting may be performed on properties of the raw data file, but such techniques are not robust to the effects of processes such as transcoding, resampling and re-editing.
However, more robust performance may be achieved by basing the digital fingerprints on properties of the underlying video, such as trends in luminance, color, pixel positioning and visual attention.
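By way of illustration only, the process outlined above (building an indexed database of fingerprints, extracting a query fingerprint and comparing it against the database) might be sketched as follows. The sketch assumes fully decoded frames given as rows of luminance values, a fingerprint made of one bit per frame transition (whether mean luminance rises), and matching by Hamming distance. All function names and the distance threshold are hypothetical and are not taken from any of the systems discussed here.

```python
from typing import Dict, Optional, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # rows of per-pixel luminance values
Fingerprint = Tuple[int, ...]


def mean_luminance(frame: Frame) -> float:
    """Average luminance over all pixels of one frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def extract_fingerprint(frames: Sequence[Frame]) -> Fingerprint:
    """One bit per frame transition: 1 if mean luminance rises, else 0."""
    means = [mean_luminance(frame) for frame in frames]
    return tuple(1 if later > earlier else 0
                 for earlier, later in zip(means, means[1:]))


def hamming_distance(a: Fingerprint, b: Fingerprint) -> int:
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def identify(query: Fingerprint,
             database: Dict[str, Fingerprint],
             max_distance: int) -> Optional[str]:
    """Return the name of the closest indexed fingerprint, or None if
    no entry lies within max_distance bits of the query."""
    best_name: Optional[str] = None
    best_distance = max_distance + 1
    for name, reference in database.items():
        if len(reference) != len(query):
            continue  # this sketch only compares equal-length fingerprints
        distance = hamming_distance(query, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name
```

Because only the direction of the luminance trend is retained, a uniform change in brightness or contrast leaves such a fingerprint unchanged, which is the kind of robustness that fingerprints computed on the raw data file lack. A linear scan of the database is used here for brevity; a practical system would need an indexed search.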
Digital fingerprinting on properties of the video content has the advantage over metadata and watermarking that no auxiliary data needs to be added to the digital files. These files cannot then be “unlocked” without significantly changing properties of the underlying video content. Even then, such changes might be rendered ineffective by suitably upgraded fingerprint profiles. Such “future proofing” is a significant advantage of the use of digital fingerprinting for video content identification.
Digital fingerprinting techniques should be reliable, robust and efficient. Reliable means that the process used to generate and compare fingerprints should be such that a fingerprint extracted from an unknown signal is reliably associated with the correct indexed fingerprint if it is present in the indexed database of digital fingerprints. Robust means that reliability should not be seriously affected by transcoding, re-sampling, re-editing, etc. of the video signal. Finally, efficient means that the computational complexity of both the calculation of the query fingerprint and of performing the database comparison must be kept to practical limits (in addition, the size of the database should be kept to a practical size, although, since the size of the database is likely to affect the complexity of performing a comparison, this may be a corollary of the database comparison constraint).
Most research to date of which the Applicant is aware has focused on the reliability and robustness aspects of digital fingerprinting, with analysis performed in the pixel domain. Such analysis requires the decoding of the video content, which, most especially for the latest compression techniques such as H.264, has significant processing implications. The complexity of decoding the video content can restrict the practical application of pixel-based fingerprinting for video identification, particularly where storage and processing limitations apply.
US 2006/0187358 describes a digital video content fingerprinting system which has improved efficiency compared to systems in which analysis is performed in the pixel domain. In this system only a very crude decoding is performed to obtain “DC images”. These approximately comprise frames of macroblock resolution only, in which the luminosity assigned to each macroblock corresponds approximately to the average luminosity of the “actual” pixels within that macroblock (by “actual” is meant the pixels that would result from a full and proper decoding of the compressed video content). Although the DC image frames which result from this crude decoding are indeed very crude, they are sufficient to obtain a useful fingerprint, and the amount of processing required to perform the crude decoding is much less than would be required to perform a full decoding to get to the pixel level.
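To make concrete what a DC image approximates, the following sketch computes the per-macroblock average luminance from a frame that has already been fully decoded. This is for illustration only: the system described above obtains its DC images from the compressed data precisely so that this full decoding is avoided. The function name, the default 16x16 block size and the handling of partial blocks at the frame edges are assumptions of the sketch.

```python
from typing import List, Sequence


def dc_image(frame: Sequence[Sequence[float]], block: int = 16) -> List[List[float]]:
    """Return one value per macroblock: the mean luminance of its pixels.

    Partial blocks at the right and bottom edges are ignored in this sketch.
    """
    height, width = len(frame), len(frame[0])
    result: List[List[float]] = []
    for top in range(0, height - height % block, block):
        row: List[float] = []
        for left in range(0, width - width % block, block):
            total = sum(frame[y][x]
                        for y in range(top, top + block)
                        for x in range(left, left + block))
            row.append(total / (block * block))
        result.append(row)
    return result
```

For a frame of 1920x1080 pixels and 16x16 macroblocks, the resulting image has 120x67 values, which indicates how much less data a fingerprint based on DC images has to process.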
Ramaswamy and Rao “Video authentication for H.264/AVC using digital signature standard and secure hash algorithm”, Proceedings of the 16th Annual International Workshop on Network and Operating Systems Support for Digital Audio and Video, NOSSDAV 2006, XP002620466 describes a method of generating a digital signature for a piece of digital video. The method is computationally efficient, detects even small tampering with the video, including various spatial and temporal manipulations of the video, and can also point out the reason for an authentication failure if the video has been tampered with (including the group of pictures within which the tampering has been detected). It operates by taking certain coefficients (e.g. the DC coefficient and the first two AC coefficients) of every coded macroblock in every frame, i.e. without selecting a set of identified macroblocks satisfying a threshold criterion (with some macroblocks failing the threshold criterion and thus not being selected). In this way, any tampering of the video should be detected.
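The general idea of hashing a few low-order coefficients of every macroblock can be sketched as below. This is a loose illustration and not the method of the cited paper: it assumes each macroblock is supplied as a list of integer coefficients with the DC coefficient first, it uses SHA-256 from the Python standard library as a stand-in hash, and it omits the signing step entirely. The function name and data layout are hypothetical.

```python
import hashlib
import struct
from typing import Sequence


def gop_digest(macroblocks: Sequence[Sequence[int]], kept: int = 3) -> str:
    """Hash the first `kept` coefficients (DC plus leading AC terms) of
    every macroblock in a group of pictures, in order, with no selection.

    Each macroblock must supply at least `kept` coefficients, each of
    which must fit in a signed 16-bit integer.
    """
    hasher = hashlib.sha256()
    for coefficients in macroblocks:
        # Pack as little-endian signed 16-bit values so the digest is
        # independent of the platform's native integer layout.
        hasher.update(struct.pack(f"<{kept}h", *coefficients[:kept]))
    return hasher.hexdigest()
```

Because every macroblock contributes to the digest, a change to any retained coefficient alters the result, which suits tamper detection. The same property means that such a digest would not be expected to survive transcoding, unlike a fingerprint intended for content identification.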
Shahabuddin et al “Compressed-domain temporal adaptation-resilient watermarking for H.264 video authentication” Multimedia and Expo, 2009, ICME 2009, IEEE Int. Conference on, IEEE, Piscataway, N.J., USA 28 Jun. 2009 pages 1752-1755, XP031511116 describes a watermarking system in which a robust watermark is inserted into a digitally encoded piece of video so that the video can later be identified by recovering the watermark from the watermarked piece of digitally encoded content.
Saadi et al “Combined fragile watermark and digital signature for H.264/AVC video authentication” Proceedings of the 2009 European Signal Processing Conference (EUSIPCO-2009) pages 1799-1803, XP002620467 describes a method of generating a digital signature and then inserting it into the video as a fragile watermark. The method of generating the digital signature is the same as in the Ramaswamy and Rao paper identified above.