In a video hosting website, such as YouTube, Google Video or Yahoo! Video, video content may be uploaded by users to the site and made available to others via search engines. It is believed that current web video search engines provide a list of search results ranked according to their relevance scores based on a particular text query entered by a user. The user must then consider the results to find the video or videos of interest.
Since it is easy for users to upload videos to a host, obtain videos and distribute them again with some modifications, there are potentially numerous duplicate, or near-duplicate, items in video search results. Such duplicates would be considered by a user to be “essentially the same”, based on their overall content and subjective impression. For example, duplicate video content may include video sequences with identical or approximately identical content but which are in different file formats, have different encoding parameters, and/or are of different lengths. Other differences may be photometric variations, such as color and/or lighting changes, and/or minor editing operations in the spatial and/or temporal domain, such as the addition or alteration of captions, logos and/or borders. These examples are not intended to be an exhaustive list and other types of differences may also occur in duplicate videos.
The proliferation of duplicate videos can make it difficult or inconvenient for a user to find the content he or she actually wants. As an example, based on sample queries from YouTube, Google Video and Yahoo! Video, it was found that on average more than 27% of the videos listed in search results are near-duplicates, with popular videos being those that are most duplicated in the results. Given a high percentage of duplicate videos in search results, users must spend significant time sifting through them to find the videos they need and must repeatedly watch similar copies of videos which have already been viewed. The duplicate results degrade users' experience of video search, retrieval and browsing. In addition, such duplicated video content increases storage and network overhead, since duplicated video data must be stored and transferred across networks.
One type of video copy detection technique is sequence matching. In sequence matching, an interval of time with multiple frames provides a basis for comparing the similarity of a query video and a target video. Typically, this involves extracting a sequence of features, which may be, for example, ordinal, motion, color and centroid-based features, from both the query video frames and the target video frames. The extracted feature sequences are then compared to determine the similarity distance between the videos. For example, where ordinal signatures are used, each video frame is first partitioned into N1×N2 blocks and the average intensity of each block is calculated. Then, for each frame, the blocks are ranked according to their average intensities. The ranking order is considered to be that frame's ordinal measure. The sequence of ordinal measures for one video is compared with that of the other to assess their similarity.
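By way of illustration only, the ordinal signature computation described above may be sketched as follows. This is a minimal sketch and not a definitive implementation: the function names `ordinal_measure` and `sequence_distance`, the representation of a frame as rows of grayscale intensities, and the use of a mean absolute rank difference as the similarity distance are all assumptions made for the example. The sketch also assumes each frame has at least N1 rows and N2 columns and that the two sequences are already aligned and of equal length.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def ordinal_measure(frame: Frame, n1: int, n2: int) -> List[int]:
    """Partition a frame into n1 x n2 blocks and rank the blocks by average intensity.

    Returns one rank per block in row-major order (0 = darkest block).
    """
    height, width = len(frame), len(frame[0])
    means = []
    for i in range(n1):
        r0, r1 = i * height // n1, (i + 1) * height // n1
        for j in range(n2):
            c0, c1 = j * width // n2, (j + 1) * width // n2
            total = sum(frame[r][c] for r in range(r0, r1) for c in range(c0, c1))
            means.append(total / ((r1 - r0) * (c1 - c0)))
    # Convert average intensities to ranks; ties are broken by block position.
    order = sorted(range(len(means)), key=lambda k: means[k])
    ranks = [0] * len(means)
    for rank, block in enumerate(order):
        ranks[block] = rank
    return ranks


def sequence_distance(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Mean absolute rank difference between two aligned ordinal sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of frames")
    per_frame = (sum(abs(x - y) for x, y in zip(fa, fb)) for fa, fb in zip(a, b))
    return sum(per_frame) / len(a)


# A 4x4 frame and a uniformly brightened copy of it.
frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 50, 120, 120],
    [50, 50, 120, 120],
]
brighter = [[p + 20 for p in row] for row in frame]

signature = ordinal_measure(frame, 2, 2)
signature_brighter = ordinal_measure(brighter, 2, 2)
```

Because only the ranking of the blocks is retained, a uniform brightness change leaves the ordinal measure unchanged in this example, which is consistent with the robustness to photometric variations attributed to ordinal signatures.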
Sequence matching enables the start of the overlapping position between duplicate videos to be determined. Sequence matching approaches are suitable for identifying almost identical videos and copies of videos with format modifications, such as coding and frame resolution changes, and those with minor editing in the spatial and temporal domains. In particular, using spatial and temporal ordinal signatures allows detection of video distortions introduced by video digitization/encoding processes (for example, changes in color and brightness, histogram equalization and changes in encoding parameters), by display format conversions (for example, converting to letter-box or pillar-box) and by modification of partial content (for example, cropping and zooming in).
Sequence matching techniques involve a relatively easy calculation and provide a compact representation of a frame, particularly when using ordinal measures. Sequence matching tends to be computationally efficient, and real-time computations may be carried out for processing live video. For example, an ordinal measure with 2×2 partitions of a frame needs only four dimensions to represent each frame, requiring fewer comparison points between two frames.
However, existing sequence matching based techniques are unable to detect duplicate video clips where there are changes in frame sequences, such as insertions, deletions or substitutions of frames. Changes of frame sequences are introduced by user editing, or by video hosting websites inserting commercials into a video, for example. Since it is not feasible to assume the type of user modification beforehand, the inability to detect frame sequence changes limits the applicability of sequence matching techniques to real-life problems.
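The effect of a frame sequence change on a simple frame-by-frame comparison may be illustrated as follows. This is a sketch under stated assumptions and does not represent any particular existing system: the function name `aligned_distance`, the example ordinal measures and the inserted frame are hypothetical, and the comparison simply pairs frames at the same position in each sequence.

```python
from typing import List, Sequence


def aligned_distance(query: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Compare frames at the same index and average the absolute rank differences.

    Assumption: frames are compared position by position, with no attempt to
    realign the sequences after an insertion or deletion.
    """
    n = min(len(query), len(target))
    total = sum(
        sum(abs(x - y) for x, y in zip(q, t))
        for q, t in zip(query[:n], target[:n])
    )
    return total / n


# Hypothetical 2x2 ordinal measures for four consecutive frames of a video.
original: List[List[int]] = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [3, 2, 1, 0],
    [2, 3, 0, 1],
]

# The same video with one unrelated frame (for example, part of a commercial)
# inserted after the first frame.
inserted_frame = [3, 0, 1, 2]
edited = original[:1] + [inserted_frame] + original[1:]

distance_to_self = aligned_distance(original, original)
distance_to_edited = aligned_distance(original, edited)
```

In this example the distance between the original and itself is zero, whereas the distance to the edited copy is non-zero even though every original frame is still present, because every frame after the insertion point is compared against the wrong counterpart.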
Existing solutions for detecting duplicate videos with frame sequence alterations, such as insertions, deletions or substitutions of frames, are based on keyframe matching techniques.
Keyframe matching techniques usually segment videos into a series of keyframes to represent the videos. Each keyframe is then partitioned into regions and features are extracted from salient local regions. The features may be, for example, color, texture, corner or shape features for each region. Keyframe matching is capable of detecting approximate copies that have undergone a substantial degree of editing, such as changes in temporal order or insertion/deletion of frames. However, since there are simply too many local features in a keyframe, it is computationally expensive to identify keyframes, extract local features from each keyframe and conduct metric distance comparisons between them in order to match a video clip against a large number of videos in a database.
Recent research has been aimed at improving the speed of keyframe matching methods by fast indexing of the feature vectors or by using statistical information to reduce the dimension of the feature vectors. However, for online analysis, both the cost of segmenting videos into keyframes and the cost of extracting local features from a query video are still unavoidable. It therefore remains a real challenge to provide online real-time video duplication detection in a Web 2.0 video hosting environment. Keyframe matching approaches are more suitable for offline video redundancy detection with fine-grain analysis to aggregate and classify database videos.