This invention relates generally to the identification and comparison of multimedia materials containing a series of visual images.
Audio and video fingerprinting techniques are often used for identification of multimedia content. A digital fingerprint is a compact representation of the characteristic features of multimedia content that can be used to categorize the content and distinguish it from perceptually different materials. The characteristic features of audio and video fingerprints should be robust against typical content distortions, noise, digital compression, and filtering. At the same time, these distinguishing characteristics should keep false positive and false negative results, which lead to incorrect identification, to a minimum.
Unlike audio content, video content is three-dimensional, consisting of a two-dimensional image plane and a time axis. Because of its spatial nature, video content is subject to 2D transformations and distortions. During content production, the editing cycle may produce multiple versions of the same material with various spatial representations. Some video content may appear perceptually similar to the human eye, yet have a significantly different spatial composition, which results in varying image pixel values. Typical examples of these variations include widescreen and full screen editions of the same video content. However, variations can also arise from cropping, rotation, and affine transformations introduced by compression or copying, for example, when video content projected on a movie screen is recorded from varying angles.
During the production cycle of video content, multiple methods of recording are currently used. Common recording methods are shown in FIG. 2. The first category is based on recording a scene using apparatus with different zoom and aspect ratio, placing emphasis on a certain portion of a larger image. Typically, this produces two distinct versions of the video content. The first format is the full-screen format, in which only a confined part of the larger image is displayed on the screen. The visual image is produced by zooming in on a chosen region of the larger image, then expanding the image to fit the typical television screen, usually with an aspect ratio of 4:3. A second possible format is the wide-screen format, in which the camera records the entire wider scene. While this format displays the entire scene, the produced image may be compressed horizontally or zoomed to fit the video frame commonly used for video capture, such as a film. One sub-category of this format is the anamorphic wide-screen display. This format fits a wide-screen image into a standard full-screen frame by compressing the visual content horizontally while leaving the vertical resolution unchanged. Another sub-category is the masked wide-screen format, in which the whole image is resized proportionally and padded on the top and bottom with black bars. The visual display may vary greatly between the full-screen and wide-screen versions of the same visual content, with the wide-screen format displaying a greater range of horizontal visual content while maintaining an identical or similar vertical range.
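The geometry of these formats can be illustrated with a short sketch. It is not part of the described method; the function names, the 1920x1080 source, and the 640x480 frame are assumptions chosen only to make the arithmetic concrete.

```python
from fractions import Fraction

# Hypothetical helpers; names and example dimensions are illustrative assumptions.

def full_screen_crop(src_w, src_h, target_aspect=Fraction(4, 3)):
    """Size of the region kept when a wider image is cropped to
    target_aspect while keeping its full height (full-screen format)."""
    crop_w = min(src_w, int(src_h * target_aspect))
    return crop_w, src_h

def anamorphic_squeeze(src_aspect, frame_aspect=Fraction(4, 3)):
    """Horizontal compression factor needed to fit the whole wide image
    into the frame without changing its height (anamorphic wide-screen)."""
    return frame_aspect / src_aspect

def masked_wide_screen(src_w, src_h, frame_w, frame_h):
    """Proportional resize to the frame width; returns the scaled image
    height and the height of each black bar (masked wide-screen)."""
    scaled_h = round(frame_w * src_h / src_w)
    bar_h = (frame_h - scaled_h) // 2
    return scaled_h, bar_h

if __name__ == "__main__":
    # A 16:9 source shown on a 4:3 frame.
    print(full_screen_crop(1920, 1080))               # region kept by the crop
    print(anamorphic_squeeze(Fraction(16, 9)))        # horizontal squeeze factor
    print(masked_wide_screen(1920, 1080, 640, 480))   # scaled height, bar height
```

For a 16:9 source, the full-screen crop keeps a 1440x1080 region, the anamorphic version squeezes the image horizontally by a factor of 3/4, and the masked version in a 640x480 frame is 360 lines tall with a 60-line bar above and below. The same scene therefore yields three different pixel layouts.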
In addition to the spatial distortions and changes, the edited movie may also contain overlays, logos, banners, closed captions, and fragments of other movies embedded as picture-in-picture. Human viewers usually ignore these irrelevant parts of the visual content and concentrate on the perceptually significant elements regardless of the video format and its aspect ratio. Existing video fingerprinting techniques are unable to differentiate between perceptually relevant content and insignificant elements and insets during extraction of the characteristic features of a series of visual images. Due to this lack of perceptual versatility, existing fingerprinting algorithms must analyze the whole visual image to determine the regions of interest for identification purposes, regardless of whether those regions are perceptually relevant.