1. Field of the Invention
The present invention relates to an apparatus and method for video fingerprinting. More particularly, the present invention relates to an apparatus and method for robust low-complexity video fingerprinting.
2. Description of the Related Art
In the last decade, digital videos have proliferated as a result of advancements in video camera technology and the Internet. Copyright infringement and data piracy have recently become serious concerns for ever-growing video repositories. Videos on commercial sites are usually textually tagged, but these tags provide little information that helps prevent copyright infringement.
Video content is distributed widely through various transport streams. During this distribution process, a video sequence may be altered, intentionally or otherwise, through various processes such as encoding, logo insertion, resizing, etc. When the video sequence arrives at a playback device for viewing, a mechanism for correct identification of the altered video is desirable for at least four reasons. First, content creators often invest large amounts of resources to create video sequences, including, for example, movies and television programs. Correct identification of altered videos can deter piracy, thus protecting the content creator's investment. Second, correct identification of altered videos enables improved parental control of viewed content by automatically blocking videos identified as unsuitable for viewing by children. Third, correct identification of altered videos allows automatic audience measurement for the identified video sequence. Fourth, correct identification of altered videos is a requirement of the Advanced Television Systems Committee (ATSC) 2.0 Standard for Internet Enhanced TV.
Several related art methods exist to allow video identification at a playback device. However, none of these related art methods provides correct identification robust to alterations. For example, textual tagging of video content is a simple method for video identification. A movie, for example, may have text tags attached which indicate the movie's title, director, writer, producer, studio, cast members, genre, etc. Unfortunately, the tags are often destroyed during the distribution process or by unscrupulous pirates, and most of the time they have to be placed manually. This is not unexpected; pirates, for example, will take active steps to avoid their piracy being detected, and therefore will remove identifying tags when able to do so. Steganography is another video-identification method in which the identity is embedded obscurely within the video. For example, identification information may be hidden by using a least significant bit of each hundredth pixel of a key frame. Such a method of embedding information in a video would be essentially undetectable by the human eye. But this method is thwarted by alterations, particularly noise insertion.
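The least-significant-bit example above can be illustrated with a minimal sketch. The function names, the flat list-of-integers frame representation, and the noise model (each least significant bit flipped at random) are assumptions made for illustration only and are not taken from any particular related art system.

```python
import random

def embed_bits(pixels, bits, stride=100):
    """Hide one bit in the least significant bit of every `stride`-th pixel."""
    if bits and (len(bits) - 1) * stride >= len(pixels):
        raise ValueError("frame too small for the payload")
    out = list(pixels)
    for i, bit in enumerate(bits):
        pos = i * stride
        out[pos] = (out[pos] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return out

def extract_bits(pixels, count, stride=100):
    """Read back the LSB of every `stride`-th pixel."""
    return [pixels[i * stride] & 1 for i in range(count)]

def add_lsb_noise(pixels, rng):
    """Hypothetical noise model: flip each pixel's LSB with probability 1/2."""
    return [p ^ rng.getrandbits(1) for p in pixels]

payload = [1, 0, 1, 1, 0, 0, 1, 0]
frame = [128] * 800                      # flat gray frame, 8-bit intensities
stego = embed_bits(frame, payload)
noisy = add_lsb_noise(stego, random.Random())
recovered = extract_bits(noisy, len(payload))
errors = sum(a != b for a, b in zip(payload, recovered))
print("bit errors after noise:", errors)
```

In this sketch, both the embedding and the noise change each pixel by at most one intensity level, which is why neither is visible to a viewer, yet on average half of the hidden bits are expected to be corrupted by the noise.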
Video fingerprinting is an identification method that survives noise attacks readily. This method consists of two stages. The first is the feature extraction stage, where compact fingerprints (also called signatures) are extracted from the video. This is followed by the matching stage, where these signatures are matched against a database of copyrighted videos and the status of the query videos is determined. Below is a brief survey of the known related art in common feature extraction and matching algorithms, and their disadvantages.
In several video fingerprinting applications, the first step is to identify key frames in a video. Key frames usually correspond to extrema in the global intensity of motion. However, key-frame selection algorithms are computationally intensive. Further, key-frame selection can be affected significantly by heavy artifacts such as severe compression or camera capture. Therefore, using the entire video sequence for video fingerprinting is preferred. The extracted features can be global in the image domain, global in the transform domain, or local in the image domain.
Global features such as the Scalable Color descriptor, the Color Layout descriptor, and the Edge Histogram descriptor have been used in video-clip matching. But in general, local image features are more robust to localized artifacts (video tampering/modification), and hence are preferred to global features.
The Compact Fourier Mellin Transform (CFMT) descriptor provides a concise and descriptive fingerprint for matching. However, transforming the image frames to a different domain incurs significant computational complexity.
Local interest-point-based features such as Scale-Invariant Feature Transform (SIFT) and its compact version, Principal Component Analysis (PCA)-SIFT, have yielded promising results for the video fingerprinting problem. However, interest point features are expensive to generate. In addition, the matching algorithm involves comparing a large number of interest point pairs without ordering, which requires significant processing resources.
Low-complexity, local-feature-based algorithms for video fingerprinting, such as Centroid of Gradient Orientations and Centroid of Gradient Magnitudes, are popular, but gradient-based features are noise sensitive and are not robust to artifacts which affect the high-frequency content of the video.
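To make the gradient-based features mentioned above concrete, the following is a minimal sketch of a centroid of gradient orientations for a single image block, as that feature is commonly described: the gradient orientation at each pixel is averaged using the gradient magnitude as the weight. The function name, the central-difference gradient, and the handling of a flat block are assumptions made for illustration, and the sketch does not address wrap-around of angles.

```python
import math

def centroid_of_gradient_orientations(block):
    """Magnitude-weighted mean gradient orientation (radians) of one block.

    `block` is a 2D list of pixel intensities; only interior pixels are used
    so that central differences are defined everywhere.
    """
    rows, cols = len(block), len(block[0])
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = block[y][x + 1] - block[y][x - 1]   # horizontal gradient
            gy = block[y + 1][x] - block[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            weighted_sum += magnitude * math.atan2(gy, gx)
            magnitude_sum += magnitude
    # A flat block has no gradient; returning 0.0 here is an arbitrary choice.
    return weighted_sum / magnitude_sum if magnitude_sum else 0.0

vertical_ramp = [[10 * y] * 4 for y in range(4)]     # intensity grows downward
print(centroid_of_gradient_orientations(vertical_ramp))
```

Because the feature is built from pixel differences, a small amount of noise or blurring changes both the weights and the orientations, which is consistent with the noise sensitivity noted above.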
“Ordinal” features have also been used to obtain concise binary signatures for videos, but are again computationally intensive.
The Moving Picture Experts Group (MPEG)-7 video signature method has a simple feature extraction process, but its performance is primarily dependent on the pre-processing steps. This approach takes pre-determined pairs of blocks specifically trained to a video database, and may not work with other video databases.
There are several ways to compute the “distance” or difference between two fingerprints. Simple Euclidean distance is popular, but fails when the artifact is heavy and localized. More sophisticated distance measures like Hausdorff distance, partial Hausdorff distance, and its proposed variant outperform Euclidean distance when the query length is short. The final match is usually determined by comparing the distance obtained with a standard threshold. The Hausdorff-based distance measures are computationally expensive, because they are designed to work well even in impractical cases where the frames are permuted. This matching technique is overkill for video fingerprinting, and its computational overhead is not justified. Therefore, there is a requirement for a fingerprint distance measure that can be computed efficiently and is robust to heavy artifacts which are localized in nature.
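The difference in cost between the two families of distance measures can be seen in a minimal sketch. It assumes, for illustration only, that a fingerprint is a list of per-frame feature vectors; the function names are hypothetical.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def euclidean_distance(fp_a, fp_b):
    """Frame-aligned distance: one comparison per frame, O(n)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same number of frames")
    return math.sqrt(sum(frame_distance(a, b) ** 2 for a, b in zip(fp_a, fp_b)))

def directed_hausdorff(fp_a, fp_b):
    """Largest distance from a frame of fp_a to its nearest frame in fp_b."""
    return max(min(frame_distance(a, b) for b in fp_b) for a in fp_a)

def hausdorff_distance(fp_a, fp_b):
    """Symmetric Hausdorff distance: every frame pair is compared, O(n * m)."""
    return max(directed_hausdorff(fp_a, fp_b), directed_hausdorff(fp_b, fp_a))

reference = [[0, 0], [3, 4], [6, 8]]
permuted = reference[::-1]               # same frames, reversed order
print(euclidean_distance(reference, permuted))
print(hausdorff_distance(reference, permuted))
```

In this example the Hausdorff distance is zero because every frame has an exact counterpart somewhere in the other sequence, whereas the frame-aligned Euclidean distance is not. That tolerance to frame permutation is what the all-pairs comparison pays for.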
Further, in the feature extraction process, having low computational complexity is of paramount importance for practical applications. When a video fingerprinting algorithm has to be implemented in a portable device, even multipliers could impose a heavy computational penalty.
Accordingly, there is a need for an apparatus and method for robust, low-complexity video fingerprinting that can correctly identify a video, even after the video has experienced severe alterations.