1. Field of Art
The disclosure generally relates to comparing digital media data, and, more particularly, relates to comparing digital media data using robust hashing techniques.
2. Description of the Related Art
As the number of media files (files comprising media data such as image, video, and/or audio data) included in typical collections has increased, so too has the importance of efficiently and reliably detecting near-duplicate media files. For example, online video hosting services that allow users to upload videos for viewing by other users can, over time, acquire a very large video database. Typically, many videos in such a database are exact duplicates or near-duplicates of other videos in the database. Accurately detecting near-duplicate videos within the database improves system performance by, for example, improving the ability of the online video hosting service to manage its video inventory, provide better searches, and offer faster overall response times.
However, conventional near-duplicate detection and hashing schemes are not acceptably reliable when dealing with near-duplicate media files that are spatially or temporally cropped versions of one another. As used herein, a cropped version of a first media file is a second media file that includes media data representing only a portion of the spatial or temporal extent of the media content represented by the media data of the first media file. For example, a first image file may have a size of 800×600 pixels, and a second image file may contain the same image cropped on one side to a final size of 600×600 pixels. Such spatial crops make the position information associated with the transform coefficients and feature descriptors of the two image files incongruous, because information for the cropped area is no longer included and/or the relative offsets of features are altered. The outputs of conventional hashing schemes are unacceptably sensitive to these variations, which hinders the effectiveness of conventional near-duplicate detection techniques. Similar cropping-based incongruities can arise when near-duplicate videos have different aspect ratios or surrounding margins. Temporal crops arise when audio or video files include similar content but have different durations, which makes the temporal information associated with transform coefficients and feature descriptors incongruous. For example, a temporal crop of an audio file (e.g., eliminating the first or last ten seconds of content) can result in an altered distribution of frequency-domain coefficients relative to the original audio file.
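To make the sensitivity to spatial cropping concrete, the following Python sketch uses a deliberately naive, hypothetical `average_hash` (nearest-neighbour sampling onto a fixed 4×4 grid, then thresholding each sample at the grid mean). It is an illustrative toy, not a scheme from this disclosure or from any particular library; the 8×6 test image with a bright vertical bar and its 6×6 crop are assumptions standing in for the 800×600 and 600×600 example above.

```python
"""Toy illustration: a naive grid-sampling "average hash" is sensitive to spatial cropping.

average_hash() is a hypothetical example, not the method of this disclosure.
"""
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def average_hash(img: Image, n: int = 4) -> List[int]:
    """Sample an n x n grid by nearest neighbour, then threshold each sample at the grid mean."""
    h, w = len(img), len(img[0])
    samples = [
        img[int((r + 0.5) * h / n)][int((c + 0.5) * w / n)]
        for r in range(n)
        for c in range(n)
    ]
    mean = sum(samples) / len(samples)
    return [1 if s > mean else 0 for s in samples]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def make_image(width: int, height: int) -> Image:
    """Dark background with a bright vertical bar in columns 4 and 5 (hypothetical content)."""
    return [[255.0 if x in (4, 5) else 0.0 for x in range(width)] for _ in range(height)]


original = make_image(8, 6)              # width 8, height 6: stands in for 800x600
cropped = [row[:6] for row in original]  # right side cropped to 6x6; the bar is still present

h_orig = average_hash(original)
h_crop = average_hash(cropped)
print(h_orig[:4], h_crop[:4])  # first grid row of each hash: [0, 0, 1, 0] [0, 0, 0, 1]
print(hamming(h_orig, h_crop), "of", len(h_orig), "bits differ")  # 8 of 16 bits differ
```

Because both images are resampled onto the same grid, the bar falls in a different grid column after the crop (its relative offset changes), so half of the hash bits flip even though the bar itself is untouched. This is the kind of position-dependent incongruity described above.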