1. Field of Art
The invention generally relates to video processing, and more specifically to video fingerprinting.
2. Description of the Related Art
Electronic video libraries may contain thousands or millions of video files, making management of these libraries an extremely challenging task. The challenges become particularly significant in the case of online video sharing sites where many users can freely upload video content. In some cases, users upload unauthorized copies of copyrighted video content, and as such, video hosting sites need a mechanism for identifying and removing these unauthorized copies. While some files may be identified by file name or other information provided by the user, this identification information may be incorrect or insufficient to correctly identify the video. An alternate approach of using humans to manually identify video content is expensive and time-consuming.
Another problem faced by video sharing sites is that users may upload multiple copies of video content to the site. For example, popular items such as music videos may be uploaded many times by multiple users. This wastes storage space and becomes a significant expense to the host. A third problem is that, due to the large number of files, it is very difficult to organize the video library based on video content. Thus, search results may contain multiple copies of the same or very similar videos, making the results difficult for a user to navigate.
Various methods have been used to automatically detect similarities between video files based on their video content. In the past, various identification techniques (such as an MD5 hash on the video file) have been used to identify exact copies of video files. Generally, a digital “fingerprint” is generated by applying a hash-based fingerprint function to a bit sequence of the video file; this generates a fixed-length monolithic bit pattern (the fingerprint) that uniquely identifies the file based on the input bit sequence. Then, fingerprints for files are compared in order to detect exact bit-for-bit matches between files. Alternatively, instead of computing a fingerprint for the whole video file, a fingerprint can be computed for only the first frame of video, or for a subset of video frames. However, each of these methods often fails to identify videos uploaded by different users with small variations that change the exact bit sequences of the video files. For example, videos may be uploaded from different sources and may vary slightly in how they are compressed and decompressed. Further, different videos may have different source resolutions, start and stop times, frame rates, and so on, any of which will change the exact bit sequence of the file, and thereby prevent the file from being identified as a copy of an existing file.
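The limitation of hash-based fingerprints can be illustrated with a minimal sketch. It assumes Python's standard `hashlib` module and uses short synthetic byte strings as hypothetical stand-ins for video files; the helper name `file_fingerprint` is illustrative only and is not part of any described system.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return the MD5 hex digest of a byte sequence (an exact-match fingerprint)."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for the bit sequence of an uploaded video file.
original = bytes(range(256)) * 64

# A bit-for-bit copy of the same file.
exact_copy = bytes(original)

# Assumption: a single flipped bit stands in for any small variation
# (re-compression, different start time, and so on) that changes the bits.
altered_buf = bytearray(original)
altered_buf[100] ^= 0x01
altered = bytes(altered_buf)

# Exact copies produce identical fingerprints.
copy_matches = file_fingerprint(original) == file_fingerprint(exact_copy)

# The slightly altered file produces a different fingerprint,
# so it would not be detected as a duplicate.
altered_matches = file_fingerprint(original) == file_fingerprint(altered)

print("exact copy matches:", copy_matches)
print("altered copy matches:", altered_matches)
```

In this sketch the fingerprint detects the exact copy but not the altered one, even though the two inputs differ by only one bit, which is the failure mode described above for videos that differ slightly in encoding.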
Other attempts to solve the described problems have involved applying techniques related to finding duplicate images. In these techniques, individual frames of the video are treated as separate and independent images. Image transforms are performed to extract information representing spatial characteristics of the images, and this information is then compared. However, there are two main weaknesses in this technique when trying to handle video. First, video typically contains an enormous number of image frames. A library may easily contain thousands or millions of videos, each having a frame rate of 15 to 30 frames per second or more, and each averaging several minutes in length. Second, directly applying image matching techniques to video ignores important sequential information present in video. This temporal information is extremely valuable both in improving detection of duplicates and in reducing the amount of data that needs to be processed to a manageable quantity, but is presently ignored by most techniques.
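The scale of the first weakness can be made concrete with a rough calculation. The sketch below takes the library sizes and frame rates from the ranges given above; the average length of 3 minutes is an assumption chosen to represent "several minutes", and the function name `total_frames` is hypothetical.

```python
def total_frames(num_videos: int, frames_per_second: int, avg_minutes: int) -> int:
    """Estimate the number of image frames in a video library."""
    seconds_per_video = avg_minutes * 60
    return num_videos * frames_per_second * seconds_per_video


# Assumption: 3 minutes stands in for "several minutes" of average length.
AVG_MINUTES = 3

# Low end of the stated ranges: thousands of videos at 15 frames per second.
small_library = total_frames(1_000, 15, AVG_MINUTES)

# High end of the stated ranges: millions of videos at 30 frames per second.
large_library = total_frames(1_000_000, 30, AVG_MINUTES)

print(f"small library: {small_library:,} frames")
print(f"large library: {large_library:,} frames")
```

Under these assumptions, the frame count ranges from a few million to several billion, which indicates why treating every frame as an independent image quickly becomes impractical.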
In view of the problems described above, an improved technique is needed for finding similarities between videos and detecting duplicate content based on the perceived visual content of the video. In addition, a technique is needed for comparing videos that is unaffected by small differences in compression factors, source resolutions, start and stop times, frame rates, and so on. Furthermore, the technique should be able to compare and match videos automatically without relying on manual classification by humans.