Media fingerprints are compact, unique bit stream identifiers that are derived from, or comprise components that may be extracted from, underlying media content. Media fingerprints are robust to modifications of the content, such as transcoding, geometric distortion, and various attacks. Media fingerprints can be efficiently stored in a database and searched to enable content identification applications. Example applications of media fingerprinting technology include the detection of copyrighted material streamed over the Internet, broadcast monitoring, retrieval of enhancement metadata during content playback, synchronization of the audio and video portions of multimedia content, and metadata propagation in broadcast studios.
Media fingerprinting systems typically function with a database of reference fingerprints, which are extracted from a set of reference media content. Queries may thus be conducted over a fingerprint database to identify an instance of media content. In this context, the media instance to be identified may be referred to herein as “query content.” When query content that is to be identified is presented to the fingerprint system, the system extracts (e.g., derives, computes, samples components of) fingerprints from the query content and matches the extracted fingerprints against the reference fingerprints that are stored in the database.
Media fingerprinting systems typically function with a database of fingerprints, which are extracted from a set of reference media content. The fingerprint database may be queried when an instance of media content is to be identified. The media content that undergoes identification may be referred to herein as “query content.” When query content is presented to the fingerprint system for identification or another purpose, the system derives (e.g., computes, extracts) fingerprints from components of the query content. The fingerprints that are extracted from the query content are matched against the reference fingerprints, which are stored in the database.
Hash values generally represent any number or set of numbers that are computed using a well-defined procedure or mathematical function (which may be referred to as a hash function) that is applied to possibly larger or variable-sized data. Hash values may be used for indexing content (e.g., storing and querying content based on hash values). For example, a hash value used for indexing a fingerprint (or sub-fingerprint) of media content may be derived based on one or more features in the media content. Hash values used for indexing a fingerprint may be referred to as fingerprint codewords. Furthermore, fingerprints themselves may be used as hash values for indexing data. The terms hash values, fingerprint codewords, and fingerprints may be used interchangeably herein.
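The indexing role of fingerprint codewords described above may be illustrated with a minimal sketch. The hash function shown here (one bit per adjacent pair of feature values), the function names, and the sample feature vectors are all hypothetical choices made for illustration; they are not taken from any particular fingerprinting system.

```python
from collections import defaultdict

def codeword(features):
    """Hypothetical hash function: emit one bit per adjacent pair of
    feature values (1 if the value increases, 0 otherwise)."""
    bits = 0
    for a, b in zip(features, features[1:]):
        bits = (bits << 1) | (1 if b > a else 0)
    return bits

def build_index(reference):
    """Map each codeword to the (content_id, position) pairs that produced it."""
    index = defaultdict(list)
    for content_id, frames in reference.items():
        for position, features in enumerate(frames):
            index[codeword(features)].append((content_id, position))
    return index

# Illustrative reference content: each item is a list of per-frame feature vectors.
reference = {
    "clip_a": [[1, 3, 2, 5], [4, 4, 6, 1]],
    "clip_b": [[9, 2, 3, 4]],
}
index = build_index(reference)
```

In this sketch the codeword serves directly as the dictionary key, so retrieving the candidate reference entries for a given codeword is a single lookup rather than a scan of the whole database.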
Matching the query fingerprint with the stored reference fingerprints returns an identity in relation to the query fingerprint, e.g., based on its similarity with a corresponding reference fingerprint. Hash-based lookups are typically used to match stored reference content with query content. However, noise and attacks that modify the query content in relation to the reference content can both reduce the recall rate (e.g., accuracy) and increase the search times of fingerprint queries.
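The effect of noise on hash-based lookups may be illustrated with a minimal sketch. It assumes, purely for illustration, that each fingerprint is a single 16-bit integer and that similarity is measured by Hamming distance; the function names, fingerprint values, and distance threshold are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two integer fingerprints differ."""
    return bin(a ^ b).count("1")

def match(query_fp, references, max_distance):
    """Return (content_id, distance) for the closest reference fingerprint
    within max_distance bits of the query, or None if there is none."""
    best = None
    for content_id, ref_fp in references.items():
        d = hamming(query_fp, ref_fp)
        if d <= max_distance and (best is None or d < best[1]):
            best = (content_id, d)
    return best

# Illustrative 16-bit reference fingerprints.
references = {
    "clip_a": 0b1011001110001111,
    "clip_b": 0b0100110001110000,
}

clean_query = 0b1011001110001111
noisy_query = clean_query ^ 0b0000000000010100  # two bits flipped by noise

# Exact hash lookup: fast, but a single flipped bit causes a miss.
exact = {fp: cid for cid, fp in references.items()}
exact_clean = exact.get(clean_query)
exact_noisy = exact.get(noisy_query)

# Similarity search: tolerates the flipped bits but scans every reference.
tolerant_noisy = match(noisy_query, references, max_distance=3)
```

Under these assumptions the exact lookup finds the clean query but misses the noisy one, which lowers recall, while the distance-based search recovers the match at the cost of comparing the query against every stored fingerprint, which increases search time.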
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.