Video fingerprinting techniques derive a fingerprint, or characteristic signature, from the underlying video signal. Signatures derived from the spatial, temporal and spatiotemporal domains are among the most widely used. Spatial signatures characterize one frame of the video sequence, temporal signatures characterize the video signal over time, and spatiotemporal signatures characterize a combination of spatial and temporal information. One approach for characterizing the video signal is ordinal ranking of the subblock mean luminances. See, e.g., Bhat, D. N. and Nayar, S. K., “Ordinal measures for image correspondence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 4, pp. 415-423, April 1998; Mohan, R., “Video sequence matching,” Proc. Int. Conf. Acoust., Speech and Signal Processing (ICASSP), vol. 6, pp. 3697-3700, January 1998. Another approach uses differential signatures that denote binarized differences between mean luminances of neighboring subblocks. See Oostveen, J., Kalker, T. and Haitsma, J., “Feature extraction and a database strategy for video fingerprinting,” Proc. 5th Int. Conf. Recent Advances in Visual Information Systems, pp. 117-128, 2002. The ordinal measures and differences can be computed spatially, temporally or spatiotemporally. The video frame is usually divided into subblocks as shown in FIG. 1A, with their mean luminances shown in FIG. 1B. The ordinal ranking of subblock luminances is shown in FIG. 1C. Ordinal measures are robust against global changes (such as changes in brightness or contrast) and against common operations such as compression. See Kim, C. and Vasudev B., “Spatiotemporal sequence matching for efficient video copy detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 1, pp. 127-132, January 2005; Lu, J., “Video fingerprinting for copy identification: from research to industry applications,” Proceedings of SPIE, Media Forensics and Security, Vol. 7254, February 2009.
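By way of illustration only, the ordinal and differential signatures described above can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the cited references: a frame is a plain list of rows of luminance values, rank 1 denotes the darkest subblock, ties are broken in raster order, and the differential signature is simplified to a purely spatial comparison of horizontally adjacent subblocks. The function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # luminance values, row-major (assumed layout)


def subblock_means(frame: Frame, rows: int, cols: int) -> List[List[float]]:
    """Mean luminance of each subblock in a rows x cols grid."""
    h, w = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        row = []
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        means.append(row)
    return means


def ordinal_signature(means: List[List[float]]) -> List[int]:
    """Rank of each subblock mean in raster order; 1 = darkest (assumed)."""
    flat = [m for row in means for m in row]
    # sorted() is stable, so equal means keep their raster order.
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    ranks = [0] * len(flat)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def differential_signature(means: List[List[float]]) -> List[int]:
    """One bit per horizontally adjacent pair: 1 if the right block is brighter."""
    return [int(row[c + 1] > row[c]) for row in means for c in range(len(row) - 1)]


frame = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [30, 30, 20, 20],
    [30, 30, 20, 20],
]
means = subblock_means(frame, 2, 2)
print(ordinal_signature(means))       # [1, 4, 3, 2]
print(differential_signature(means))  # [1, 0]
```

Because the ranking depends only on the relative order of the subblock means, a global brightness or contrast change of the form a*v + b with a > 0 leaves the ordinal signature of this example unchanged.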
Ordinal signatures are susceptible to any change in the image that alters the global ranking of the subblock mean luminances. Examples include horizontal or vertical cropping, as well as local luminance alterations such as insertion of a logo or subtitles. In addition, both ordinal signatures and differential signatures suffer from sensitivity to geometric transformations such as rotation, translation (horizontal or vertical shifts), cropping, scaling, and aspect ratio change.
We propose a new set of signatures based on local nonlinear filtering operations, such as multi-axis comparison filters. One set of embodiments for images and video calculates signatures based on mean subblock luminances. Signature recognition systems based on multi-axis filters show greater robustness to local changes introduced by operations such as logo and subtitle insertion, cropping and shifts. In addition, we show how these signatures can be adapted to deal with some amount of rotation.
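The following Python sketch illustrates one possible form of a multi-axis comparison filter applied to a grid of mean subblock luminances. It is an assumption made for illustration, not a definition taken from this summary: each subblock is compared with its immediate neighbors along the horizontal and vertical axes, each comparison contributes +1, 0 or -1, and the contributions are summed. The names `multi_axis_filter` and `to_bits`, and the choice of a zero threshold for binarization, are hypothetical.

```python
from typing import List


def sign(x: float) -> int:
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def multi_axis_filter(means: List[List[float]]) -> List[List[int]]:
    """Sum of sign comparisons with the horizontal and vertical neighbors.

    Neighbors that fall outside the grid are skipped (assumed border rule).
    """
    rows, cols = len(means), len(means[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            center = means[r][c]
            total = 0
            # Horizontal axis: left, right. Vertical axis: up, down.
            for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += sign(center - means[rr][cc])
            out_row.append(total)
        out.append(out_row)
    return out


def to_bits(filtered: List[List[int]]) -> List[int]:
    """Binarize the filter output in raster order (threshold 0 is assumed)."""
    return [int(v > 0) for row in filtered for v in row]


means = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
original = multi_axis_filter(means)

# Simulate a bright logo covering the top-left subblock.
altered_means = [row[:] for row in means]
altered_means[0][0] = 255
altered = multi_axis_filter(altered_means)

changed = {(r, c) for r in range(4) for c in range(4)
           if original[r][c] != altered[r][c]}
print(sorted(changed))  # [(0, 0), (0, 1), (1, 0)]
```

In this sketch, altering one subblock changes the filter output only at that subblock and its immediate neighbors, whereas a global ranking of all subblocks can shift at every position. This locality is the property the sketch is meant to convey.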
One aspect of the invention is a method for audio signal recognition comprising: receiving an electronic audio signal; transforming the electronic audio signal into signatures based on a multi-axis filtering of the electronic audio signal; and submitting the signatures to a database for matching to recognize the electronic audio signal.
Another aspect of the invention is a method of constructing a content identification system. The method comprises receiving an electronic audio signal; transforming the electronic audio signal into signatures based on a multi-axis filtering of the electronic audio signal; forming signature data structures from output of the filtering; storing the signature data structures in a database on one or more computer readable media; and transforming the signature data structures in the one or more computer readable media into a different structure.
Additional aspects of the invention include applying the multi-axis filtering to two-dimensional representations of audio, such as a time-frequency representation. For example, one axis is frequency and the other is time. The nonlinear filter is applied to the neighboring samples of each of several samples in this time-frequency domain, in which each sample is a magnitude value in the frequency domain.
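As a hedged illustration of the time-frequency case, the Python sketch below builds a small magnitude spectrogram with a naive discrete Fourier transform and then derives two bits per interior sample, one per axis. The particular comparison rule (a sample against the mean of its two neighbors along each axis), the frame length and hop parameters, and the function names are all assumptions chosen for illustration.

```python
import cmath
import math
from typing import List


def magnitude_spectrogram(signal: List[float], frame_len: int,
                          hop: int) -> List[List[float]]:
    """Rows are time frames, columns are frequency bins (naive DFT magnitudes)."""
    spec = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            row.append(abs(acc))
        spec.append(row)
    return spec


def time_frequency_bits(spec: List[List[float]]) -> List[int]:
    """Two bits per interior sample, one per axis (assumed comparison rule).

    First bit:  1 if the sample exceeds the mean of its two time neighbors.
    Second bit: 1 if the sample exceeds the mean of its two frequency neighbors.
    """
    bits = []
    for t in range(1, len(spec) - 1):
        for f in range(1, len(spec[0]) - 1):
            v = spec[t][f]
            bits.append(int(v > (spec[t - 1][f] + spec[t + 1][f]) / 2))
            bits.append(int(v > (spec[t][f - 1] + spec[t][f + 1]) / 2))
    return bits


# A tone at bin 2 of an 8-sample frame, 16 samples long.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = magnitude_spectrogram(tone, frame_len=8, hop=4)
bits = time_frequency_bits(spec)
```

A practical system would likely use a windowed fast Fourier transform instead of the naive transform shown here; the naive form is used only to keep the sketch self-contained.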
Additional aspects of the invention include applying multi-axis filtering to plural color values per sample location.
Further features will become apparent with reference to the following detailed description and accompanying drawings.