In a video hosting website, such as YouTube, Google Video or Yahoo! Video, video content may be uploaded by users to the site and made available to others via search engines. Statistics show that on YouTube, for example, which is one of the most popular video sharing sites, there are currently about two billion views a day, with 24 hours of video being uploaded every minute. The increasing popularity of social networking sites makes it even easier for users to obtain videos, edit them, for example by adding logos or annotations, and upload the modified videos to the same video sharing website or elsewhere. This results in potentially many similar or identical copies of a video being shared on the same site, making it inconvenient for users to find the content they actually want and increasing the resources needed to store and serve the videos.
Videos may be considered by a user to be “essentially the same” or duplicates based on their overall content and subjective impressions. For example, duplicate video content may include video sequences with identical or approximately identical content but which are in different file formats, have different encoding parameters, and/or are of different lengths. Other differences may be photometric variations, such as color and/or lighting changes, and/or minor editing operations in the spatial and/or temporal domain, such as the addition or alteration of captions, logos and/or borders and/or resizing and cropping of frames. These examples are not intended to be an exhaustive list and other types of difference may also occur. Accordingly, a video copy can contain various distortions, modifications and format conversions from an original video and still be considered a duplicate of the original video.
The proliferation of duplicate videos can make it difficult or inconvenient for a user to find the content he or she actually wants. As an example, based on sample queries from YouTube, Google Video and Yahoo! Video, it was found that on average more than 27% of the videos listed in search results were near-duplicates, with popular videos being those that are most duplicated in the results. Given such a high percentage of duplicate videos in search results, users must spend significant time sifting through them to find the videos they need and must repeatedly watch similar copies of videos which have already been viewed. The duplicate results degrade the user's experience of video search, retrieval and browsing. In addition, such duplicated video content increases network overhead, because duplicated video data must be stored and transferred across networks.
Content Based Copy Detection (CBCD) techniques facilitate video content based retrieval by searching a database of videos for copies (either exact or approximate) of a query video. Application of CBCD techniques may be beneficial in a number of ways for users, content producers or owners and network operators. For example, by detecting duplicate videos, video sharing sites may reduce the number of stored redundant video copies; users may have a better video search experience if they no longer need to browse through near-duplicate results; copyright holders may more easily discover reuse of their video clips; and content distribution networks may direct users to access a nearby copy of a video, the presence of which may not be otherwise known.
Video copy detection is the problem of determining whether a given video contains a subsequence that is perceptually similar to a subsequence in a target video. More precisely, given a query video Q and a target video T, each represented as a sequence of video frames, Q contains a subsequence of frames Qs that is a copy or near-copy of a subsequence of frames Ts in the target video T if the dissimilarity between Qs and Ts is less than a noise threshold. There is no limitation on the length of either video. The query video Q could be either longer or shorter than the target video T.
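The definition above can be illustrated with a minimal brute-force sketch. It rests on several assumptions that are not part of the definition itself: each frame is already reduced to a short numeric descriptor, the compared subsequences Qs and Ts have the same fixed length (`window`), and dissimilarity is taken as the mean absolute difference between aligned frame descriptors. The names `frame_distance` and `find_copy` are hypothetical, and a practical system would not compare every pair of windows exhaustively.

```python
from typing import Optional, Sequence, Tuple

Frame = Sequence[float]  # assumed per-frame descriptor (hypothetical representation)


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equal-length frame descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_copy(
    query: Sequence[Frame],
    target: Sequence[Frame],
    window: int,
    threshold: float,
) -> Optional[Tuple[int, int, float]]:
    """Return (query_start, target_start, dissimilarity) of the best-matching
    pair of length-`window` subsequences whose dissimilarity is below
    `threshold`, or None if no such pair exists."""
    best = None
    for qi in range(len(query) - window + 1):
        for ti in range(len(target) - window + 1):
            # Dissimilarity of Qs = query[qi:qi+window] and Ts = target[ti:ti+window]
            d = sum(
                frame_distance(query[qi + k], target[ti + k]) for k in range(window)
            ) / window
            if d < threshold and (best is None or d < best[2]):
                best = (qi, ti, d)
    return best


# Toy data: frames 1..2 of the query are a slightly noisy copy of frames 2..3
# of the target; the surrounding query frames are unrelated.
target = [[0.0], [1.0], [2.0], [3.0], [4.0]]
query = [[9.0], [2.1], [3.1], [9.0]]
match = find_copy(query, target, window=2, threshold=0.2)
```

In this sketch neither video needs to be the longer one; only the window must fit inside both, otherwise no match is reported.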
Various CBCD techniques have been proposed to find video copies based on identifying different video features and applying matching schemes to them. Since video sharing websites often contain a large database, it is challenging to find a similar or exact copy of a video from such a large video database and provide real-time response to Internet users.
Video copy detection may be divided into two steps: generation of video feature descriptors and descriptor matching.
Video features may be represented by global descriptors and local descriptors. Global descriptors, including ordinal measures, are obtained from the entire region of a frame. Local descriptors, such as the Harris descriptor or a scale-invariant feature transform (SIFT) descriptor, can be obtained by partitioning each frame into regions and extracting features from salient local regions. The Harris descriptor is also more specifically known as the “Harris corner detector” and detects corners in an image based on the eigenvalues of a matrix formed from the image gradients in a local neighborhood.
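As an illustration of a global descriptor, the following is a minimal sketch of one common form of ordinal measure: the frame is divided into a grid of blocks, the average intensity of each block is computed, and the blocks are ranked by that average. The grid size, the grayscale nested-list frame representation and the function name `ordinal_signature` are assumptions made for this example only; the frame is assumed to have at least as many rows and columns as the grid.

```python
from typing import List, Sequence


def ordinal_signature(
    frame: Sequence[Sequence[float]], rows: int = 2, cols: int = 2
) -> List[int]:
    """Rank the blocks of a grayscale frame by mean intensity.

    Returns one rank per block in row-major order (0 = darkest block).
    """
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(block) / len(block))
    # Convert block means to ranks: the darkest block gets rank 0.
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [90, 90, 50, 50],
    [90, 90, 50, 50],
]
# A uniformly brightened copy of the same frame (a simple photometric change).
brighter = [[p + 30 for p in row] for row in frame]

signature = ordinal_signature(frame)
signature_brighter = ordinal_signature(brighter)
```

Because only the relative order of the block averages is kept, the signature is unchanged by a uniform brightness shift, while a logo or caption that alters one block enough to change the ordering would change the signature of the whole frame.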
Local features may include, for example, color, texture, corners, and/or shape features from each region, or other features, this being a non-exhaustive list. In general, global feature descriptors are efficient to compute and compact in storage, but less robust with respect to local changes. Local feature descriptors are more robust but computationally more expensive and require more storage space.