Media fingerprints are compact, unique bit stream identifiers that are derived from, or comprise components that may be extracted from, underlying media content. Media fingerprints are robust to modifications of the content such as transcoding, geometric distortion, and various attacks. Media fingerprints can be efficiently stored in a database and searched to enable content identification applications. Example applications of media fingerprinting technology include the detection of copyrighted material streamed over the Internet, broadcast monitoring, retrieval of enhancement metadata during content playback, synchronizing audio and video portions of multimedia content, and metadata propagation in broadcast studios.
Extracting media fingerprints from underlying media content typically includes extraction of robust features of the media content, and extracting a robust hash signature from the extracted features. Extracting the robust features allows those features to compactly represent the underlying perceptual content with invariance under various processing operations. Robust hash extraction has two effectively competing requirements.
First, robust hash extraction allows fingerprint bits that are extracted from modified instances of the content (e.g., off speed playout, in which the modified content instance has essentially been re-recorded from an original instance of the content at a slightly different speed) to be similar to fingerprints that are extracted from the original content instance. Thus, relatively small changes in feature values do not result in drastic changes in the extracted hash bits, which imparts robustness to the fingerprints. Second, robust hash extraction allows the extracted fingerprint bits to be unique, which affects search time, e.g., the time taken to find a match in a database of media fingerprints.
For example, a database of media fingerprints is searchable for identifying content. At the time the database is constructed, each fingerprint codeword is used for indexing, e.g., in a hash table. Each fingerprint codeword in the hash table links to the location in a fingerprint file or to media where that fingerprint codeword is present. The number of links per fingerprint index in the hash table may be referred to herein as a number of collisions.
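The indexing scheme described above can be sketched as follows. This is a minimal illustration, assuming 8-bit integer codewords and a toy database; the names `build_index`, `average_collisions`, `clip_a`, and `clip_b` are hypothetical and do not come from any particular fingerprinting system.

```python
from collections import defaultdict

def build_index(fingerprints):
    """Map each codeword to every (media_id, offset) where it occurs."""
    index = defaultdict(list)
    for media_id, codewords in fingerprints.items():
        for offset, codeword in enumerate(codewords):
            index[codeword].append((media_id, offset))
    return index

def average_collisions(index):
    """Average number of links per distinct codeword in the hash table."""
    if not index:
        return 0.0
    return sum(len(links) for links in index.values()) / len(index)

# Hypothetical toy database: each media item is a sequence of 8-bit codewords.
fingerprints = {
    "clip_a": [0b10110010, 0b01100111, 0b10110010],
    "clip_b": [0b10110010, 0b11110000],
}

index = build_index(fingerprints)
for codeword, links in index.items():
    print(f"{codeword:08b} -> {links}")
print("average collisions:", average_collisions(index))
```

In this toy database the codeword `10110010` links to three locations while the other two codewords link to one each, so a query on `10110010` would have three candidates to examine.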
The more unique a fingerprint codeword is, the more quickly its match may be found in the database, e.g., as a return on a query over the database. As a fingerprint's uniqueness is reduced, however, database queries may demand more look-ups and more computation to select a best pick, e.g., the best match in terms of smallest distance from the query fingerprint. Thus, fingerprints that have a small number of average collisions per fingerprint codeword have shorter search durations. Fingerprints with a smaller number of average collisions per fingerprint codeword are more scalable for searching through a large database of fingerprints than other fingerprints, for which the average number of collisions is higher.
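The relationship between collisions and search effort can be illustrated with a small query routine. This is a sketch under stated assumptions: the query is a short sequence of 8-bit codewords, candidates are found by looking up the first query codeword, and the distance is Hamming distance. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of differing bits between two integer codewords."""
    return bin(a ^ b).count("1")

def best_match(query, database, index):
    """Return ((distance, media_id, offset), candidates_checked)."""
    best = None
    checked = 0
    # Every link under the first query codeword is a candidate to verify.
    for media_id, offset in index.get(query[0], []):
        stored = database[media_id][offset:offset + len(query)]
        if len(stored) < len(query):
            continue  # not enough codewords left at this location
        checked += 1
        dist = sum(hamming(q, s) for q, s in zip(query, stored))
        if best is None or dist < best[0]:
            best = (dist, media_id, offset)
    return best, checked

# Hypothetical toy database and its hash-table index.
database = {
    "clip_a": [0xB2, 0x67, 0xB2, 0x0F],
    "clip_b": [0xB2, 0xF0, 0x3C],
}
index = defaultdict(list)
for media_id, codewords in database.items():
    for offset, codeword in enumerate(codewords):
        index[codeword].append((media_id, offset))

# Query taken from clip_b with one bit flipped in the second codeword.
query = [0xB2, 0xF1]
best, checked = best_match(query, database, index)
print(best, checked)
```

Because the codeword `0xB2` collides at three locations, three candidates must be compared before the closest one is selected; a codeword with a single link would need only one comparison.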
Robust hash functions have been proposed that are based on projection of a feature matrix Q onto a set of pseudo-random matrices. For example, proposed pseudo-random matrices had elements that are uniformly distributed in the range [−0.5, 0.5]. The projected values are compared against a threshold of 0 to derive the hash bits. However, the average number of collisions for fingerprints extracted according to this approach is usually large. Imposing certain conditions on the projection matrices may improve the average number of collisions.
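A projection-based hash of this kind can be sketched as follows. The sketch assumes that "projection" means the element-wise inner product of Q with each pseudo-random matrix, which is one common reading and is not confirmed by the text; the matrix sizes, the seed, the toy feature values, and all function names are hypothetical.

```python
import random

def make_projection_matrices(k, rows, cols, seed=0):
    """K pseudo-random matrices with elements uniform in [-0.5, 0.5]."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        for _ in range(k)
    ]

def project(q, p):
    """Assumed projection: sum of element-wise products of Q and P."""
    return sum(
        q_val * p_val
        for q_row, p_row in zip(q, p)
        for q_val, p_val in zip(q_row, p_row)
    )

def hash_bits(q, matrices, threshold=0.0):
    """One bit per projection matrix: 1 if the projected value exceeds the threshold."""
    return [1 if project(q, p) > threshold else 0 for p in matrices]

K = 16
# Hypothetical 3x4 feature matrix Q.
Q = [
    [0.8, -1.2, 0.3, 0.5],
    [-0.4, 0.9, -0.7, 1.1],
    [0.2, -0.6, 1.4, -0.9],
]
matrices = make_projection_matrices(K, rows=3, cols=4)
bits = hash_bits(Q, matrices)

# A slightly perturbed copy of Q, standing in for a modified content instance.
Q_mod = [[value + 0.01 for value in row] for row in Q]
bits_mod = hash_bits(Q_mod, matrices)
differing = sum(a != b for a, b in zip(bits, bits_mod))
print(bits, "bits differing after perturbation:", differing)
```

With a threshold of 0, each bit depends only on the sign of the projected value, so scaling Q by a positive constant leaves the bits unchanged. Nothing in this construction controls the correlation between the K projected values.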
For example, conditions have been imposed on the projection matrices Pi (i = 1, 2, ..., K), in which K represents the number of signature bits derived from a feature matrix Q. An offline training set is used to improve the collision property. Projecting the feature matrix Q onto a set of pseudo-random matrices, and imposing conditions on the projection matrices, both strive to select the matrices that minimize cross-correlation among the projected features. Projecting the feature matrix Q onto a set of pseudo-random matrices uses an iterative procedure to select the matrices that satisfy a cross-correlation threshold. However, that iterative approach does not consider optimizing the selected projection matrices in terms of achieving optimal fingerprinting system performance. Imposing conditions on the projection matrices may use singular value decomposition (SVD) of the feature covariance matrix to achieve a zero cross-correlation of the projected values, which optimizes the projections and reduces search latency.
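The reason an SVD of the feature covariance matrix can yield zero cross-correlation may be seen from a short derivation. The notation here is assumed for illustration and does not appear in the text above: q denotes the feature matrix Q stacked into a vector, μ its mean over the training set, C the feature covariance matrix, u_i the i-th column of U, and y_i the projected value for the i-th signature bit, so that each projection matrix Pi would correspond to u_i reshaped to the dimensions of Q.

```latex
\begin{aligned}
C &= \mathbb{E}\big[(q-\mu)(q-\mu)^{\mathsf T}\big] = U \Sigma U^{\mathsf T},
  \qquad U^{\mathsf T} U = I, \\
y_i &= u_i^{\mathsf T}(q-\mu), \qquad i = 1, \dots, K, \\
\mathbb{E}[y_i\, y_j] &= u_i^{\mathsf T} C\, u_j
  = u_i^{\mathsf T} U \Sigma U^{\mathsf T} u_j
  = \sigma_i\, \delta_{ij}.
\end{aligned}
```

Because C is symmetric and positive semidefinite, its SVD coincides with its eigendecomposition, so the columns of U are orthonormal and the projected values y_i and y_j are uncorrelated whenever i ≠ j. In practice C would be estimated from the offline training set, so the cross-correlation is zero on that set and only approximately zero on unseen content.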
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.