1. Field of the Invention
The present invention relates generally to digital video technology and more particularly to matching techniques for detecting copies of a video using spatial and temporal factors. The techniques may be realized as methods, various steps/aspects of which may be performed by an appropriately configured apparatus, or may be embodied as a program of instructions, e.g., in the form of software on a device-readable medium.
2. Description of the Related Art
The ubiquitous nature of the Internet and the widespread availability of cost-effective digital storage have made copying, transmitting, and storing digital media almost effortless. As these tasks have become easier, protecting the Intellectual Property Rights (IPR) of such media has become more important. Detecting copies of digital media (images, audio and video) has become a crucial component in the effort to protect the IPR of digital content. Indeed, IPR is one of the main driving forces behind newly proposed standards regarding the copying of digital media, such as the proposed MPEG-21 standards. There are generally two approaches to digital media copy detection: watermarking and content-based copy detection.
Watermarking is a process that embeds information into the media prior to distribution. Thus, all legitimate copies of the content contain the identifying watermark, which can later be extracted to establish ownership.
Content-based copy detection, on the other hand, does not require additional information beyond the media itself. Generally, an image or video contains enough unique information to be used for detecting copies, especially illegal copies. Content-based copy detection schemes extract a signature from the test media, which is then compared to the signature extracted from the original media to determine whether the test media is a copy of the original. The primary advantage of content-based copy detection over watermarking is that no embedding is required before the media is distributed. Nevertheless, content-based copy detection schemes must also be sufficiently robust to properly handle media that has been modified by a third party for the purpose of avoiding copy detection.
Content-based copy detection algorithms have numerous uses. Such an algorithm can be employed in connection with a multimedia search engine to improve its retrieval efficiency by detecting and removing copies from the retrieval results before the search results are displayed. Content-based copy detection is also useful for media tracking, which involves keeping track of when and where a particular known piece of media has been used.
Color histogram-based methods, such as the histogram intersection method, have been used in content-based image/video retrieval systems. However, they are not suitable for copy detection systems since the color histogram does not preserve information about the spatial distribution of colors. The partition approach, which involves choosing a set of colors that describe all of the image colors and partitioning the image into sub-images, has been proposed. Here, the color information of each partition is obtained by a local color histogram. The similarity of two images is measured by comparing their local color histograms, and by considering the similarity of all the sub-images. However, the partition method comes with a high computational cost and requires a long search time. Additionally, this method will not detect images that have had their spatial layout modified.
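The loss of spatial information in a global color histogram can be illustrated with a minimal sketch. The two 2×2 grayscale images, the helper names, and the normalization by pixel count are assumptions chosen for illustration; they are not taken from any particular retrieval system.

```python
from collections import Counter

def histogram(pixels):
    """Count how many times each intensity value occurs in a flattened image."""
    return Counter(pixels)

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima, normalized by the pixel count of h2 (assumed convention)."""
    total = sum(h2.values())
    return sum(min(h1[value], h2[value]) for value in h2) / total

# Two hypothetical 2x2 grayscale images, flattened row by row.
original = [0, 0, 255, 255]      # dark top row, bright bottom row
rearranged = [255, 0, 0, 255]    # same pixel values in a different layout

score = histogram_intersection(histogram(original), histogram(rearranged))
print(score)  # maximal similarity although the two layouts differ
```

Because both images contain the same pixel values, the global histograms are identical and the intersection score is maximal, even though the images are visibly different.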
A sequence matching method, based on a set of key frames (or sub-sampled frames), has also been proposed. Although motion information is included with the key frames, it is not yet clear if the selected frames are appropriate to fully reflect the “action” within the video sequence. To match video clips, a variation of the method involving the intersection of linearized histograms of the DCT frames from the MPEG video was used. However, this technique did not address the variations between copies, such as signal modifications as well as display format conversions.
Another approach to matching video sequences is a correlation-based method, which is based on the sum of pixel differences between two image frames. Let $I_1$ and $I_2$ represent intensities in two image frames. There exist $N$ tuples $(I_1^1, I_2^1), \ldots, (I_1^n, I_2^n), \ldots, (I_1^N, I_2^N)$, where the superscript indexes the pixel (or block) and $N$ denotes the number of pixels (or blocks) in an image. The quantity $\frac{1}{N}\sum_{i=1}^{N} |I_1^i - I_2^i|$ measures the distance between $I_1$ and $I_2$. However, this distance measure is not robust, in that outlying pixels (or blocks) can distort the distance measure arbitrarily. It is also not robust to nonlinear intensity variations at corresponding pixels.
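The sensitivity of this distance to outliers can be seen in a small sketch. The function name and the four-element sample frames are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def mean_abs_difference(frame1, frame2):
    """Average absolute intensity difference over corresponding pixels (or blocks)."""
    if len(frame1) != len(frame2):
        raise ValueError("frames must contain the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame1, frame2)) / len(frame1)

# Hypothetical flattened frames with four pixels each.
reference = [10, 20, 30, 40]
brightened = [20, 30, 40, 50]    # every pixel raised by 10
one_outlier = [10, 20, 30, 240]  # identical except for a single corrupted pixel

d_bright = mean_abs_difference(reference, brightened)
d_outlier = mean_abs_difference(reference, one_outlier)
print(d_bright, d_outlier)
```

In this example a single outlying pixel moves the distance further than a uniform brightness change applied to the whole frame, and the effect grows without bound as the outlier grows.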
To avoid these problems, the use of ordinal measures for stereo image matching was proposed. In such use, the ordinal variable is drawn from a discrete ordered set, such as school grades. The ratio between two measurements is not of importance; only their relative ordering is relevant. The relative ordering between measurements is expressed by their ranks. A rank permutation is obtained by sorting the measurements in ascending order and labeling them using integers [1, 2, 3, . . . , N], N denoting the number of measurements. An example of using ordinal measures is as follows: an image is partitioned into 3×3 equal-sized blocks, as shown in FIG. 1(a), which makes the system independent of input image sizes, and the 3×3 sub-image is calculated by taking the average intensity value of each block. The average values for the blocks are shown in FIG. 1(b). This array is then converted to a rank matrix as shown in FIG. 1(c). Suppose that the average intensity values in FIG. 1(b) are increased by 10 in the copied image so that its sub-image has values: {{74, 71, 56}, {145, 156, 126}, {195, 184, 155}}. Because a uniform increase does not change the relative ordering of the blocks, the rank matrix is unchanged, and thus perfect matching with the original image can be achieved.
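The rank conversion in this example can be sketched as follows. The function name is hypothetical, ties are broken by block position (an assumption, since the text does not discuss ties), and the original block averages are derived here by subtracting 10 from the copied values quoted above rather than read from FIG. 1(b).

```python
def rank_matrix(block_means):
    """Replace each block average with its rank (1 = smallest) in ascending order."""
    flat = [value for row in block_means for value in row]
    # Indices of the blocks, sorted by their average intensity.
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    ranks = [0] * len(flat)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    cols = len(block_means[0])
    return [ranks[r * cols:(r + 1) * cols] for r in range(len(block_means))]

# Block averages of the copied image, as quoted in the text.
copied = [[74, 71, 56], [145, 156, 126], [195, 184, 155]]
# Assumed original: the text states the copy is the original increased by 10.
original = [[value - 10 for value in row] for row in copied]

print(rank_matrix(copied))
print(rank_matrix(original) == rank_matrix(copied))
```

Both images yield the same rank matrix, which is why a uniform intensity change leaves the ordinal signature untouched.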
Since it was first proposed for stereo image matching, the ordinal measure of pixel (or block) values has shown promising results on image/video matching. In one such matching method, each image frame is partitioned into 3×3 blocks, and the ordinal measure for each block is computed. This ordinal measure is referred to as a fingerprint. Then the sequences of fingerprints are compared for video sequence matching. Comparing this technique with techniques using motion signature and color signature, it was shown that matching by ordinal signature had the best performance, followed by the motion signature. Matching on the basis of color signature had the worst performance. An adaptation of this measure has been successfully used for image copy detection, and it was shown that the ordinal measures were very robust to various signal modifications.
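A rough sketch of fingerprint-based sequence matching is given below. The text does not specify how two fingerprint sequences are compared, so the mean absolute rank difference used here is an assumption, as are the function names and the synthetic 6×6 frames. Frames are assumed to be at least 3×3 pixels.

```python
def fingerprint(frame, grid=3):
    """Rank permutation of the average intensities of a grid x grid partition."""
    height, width = len(frame), len(frame[0])
    means = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            cell = [frame[y][x] for y in rows for x in cols]
            means.append(sum(cell) / len(cell))
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def sequence_distance(seq_a, seq_b, grid=3):
    """Assumed metric: mean absolute rank difference, averaged over frame pairs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of frames")
    per_frame = []
    for frame_a, frame_b in zip(seq_a, seq_b):
        ranks_a, ranks_b = fingerprint(frame_a, grid), fingerprint(frame_b, grid)
        per_frame.append(sum(abs(a - b) for a, b in zip(ranks_a, ranks_b)) / len(ranks_a))
    return sum(per_frame) / len(per_frame)

# Synthetic 6x6 frame whose intensity grows from top-left to bottom-right.
base = [[y * 6 + x for x in range(6)] for y in range(6)]
clip = [base, base]
brightened_clip = [[[v + 10 for v in row] for row in frame] for frame in clip]
flipped_clip = [frame[::-1] for frame in clip]  # vertically flipped, different content

d_copy = sequence_distance(clip, brightened_clip)
d_other = sequence_distance(clip, flipped_clip)
print(d_copy, d_other)
```

Under these assumptions the brightened copy is at distance zero from the original clip, while the flipped clip is not.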
However, there are two issues concerning the performance of this adaptation: its robustness and discriminability. Robustness determines the amount of data inconsistency that can be tolerated by the system before mismatches begin to occur, while discriminability concerns the system's ability to reject irrelevant data such that false detections do not occur. A critical factor in balancing these conflicting issues is the number of partitions. As might be expected, the system becomes more robust as the number of partitions is reduced. Conversely, the discriminability becomes higher as the number of partitions increases.
While much work has been done in the field of video copy detection, further work is required, in particular further consideration of the issues of discriminability and partitioning, in designing a more robust video copy detection scheme.