Hash functions are generally known in the field of cryptography, where they are used, inter alia, to identify large amounts of data. For instance, in order to verify correct reception of a large file, it suffices to send the hash value (also referred to as signature) of that file. If the returned hash value matches the hash value of the original file, there is almost complete certainty that the file has been correctly received by the receiving party. The remaining uncertainty arises because a collision might occur, i.e. two different files may have the same hash value. A carefully designed hash function minimizes the probability of collision.
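By way of illustration only, the verification described above may be sketched as follows. The choice of SHA-256 and the names file_digest and received_correctly are assumptions made for this sketch and are not taken from any particular system.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hash value of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def received_correctly(received: bytes, expected_digest: str) -> bool:
    """Compare the hash value of the received data with the expected hash value."""
    return file_digest(received) == expected_digest


# Hypothetical file contents; only the short hash value needs to be exchanged.
original = b"large file contents " * 1000
digest_sent = file_digest(original)

# An intact copy matches; a copy with its last byte altered does not.
intact = received_correctly(original, digest_sent)
corrupted = received_correctly(original[:-1] + b"!", digest_sent)
```

In this sketch only the 64-character hexadecimal hash value is compared, rather than the file itself.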
A particular property of a cryptographic hash is its extreme fragility. Flipping a single bit in the source data will generally result in a completely different hash value. This makes cryptographic hashing unsuitable for identifying multimedia content where different quality versions of the same content should yield the same signature. Signatures of multimedia content that are to a certain extent invariant to data processing (as long as the processing retains an acceptable quality of the content) are referred to as robust signatures or, which is our preferred naming convention, robust hashes. By using a database of robust hashes and content identifiers, unknown content can be identified, even if it is degraded (e.g. by compression or AD/DA conversion). Robust hashes capture the perceptually essential parts of audio-visual content.
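The fragility of a cryptographic hash may be illustrated by the following minimal sketch, which flips one bit of the source data and counts how many bits of the hash value change. SHA-256 and the helper names sha256_bits and flip_bit are assumptions chosen for illustration.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the 256-bit SHA-256 hash value of the data as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of the data with the bit at bit_index inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Hypothetical source data and a version differing in exactly one bit.
source = b"the same content at a slightly different quality"
altered = flip_bit(source, 0)

# Count the bit positions in which the two hash values differ.
differing_bits = bin(sha256_bits(source) ^ sha256_bits(altered)).count("1")
print(differing_bits, "of 256 hash bits differ")
```

For a well-designed cryptographic hash, roughly half of the hash bits are expected to change, so the two hash values give no indication that the underlying data are nearly identical.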
Using a robust hash to identify multimedia content is an alternative to using watermarking technology for the same purpose. There is, however, an important difference. Whereas watermarking requires action on the original content (viz. watermark embedding) before it is released, with its potential impact on content quality and logistical problems, robust hashing requires no action before release. The drawback of hashing technology is that access to a database is needed (i.e. hashing is only viable in a connected context), whereas watermark detectors can operate locally (for example in non-connected DVD players).
U.S. Pat. No. 4,677,466 discloses a known method of deriving a signature from a television signal for broadcast monitoring. In this prior art method, the signature is derived from a short video or audio sequence after the occurrence of a specified event such as a blank frame.