Two media objects (e.g., music files) that sound the same to a human listener may not have identical digital content. For example, two audio files of the same song may be in two different digital formats (e.g., MP3™ and WMA™). In another example, two copies of the same song (e.g., both ripped from the same CD-ROM disc) may be digitally different due to bit errors incurred when the two copies were independently ripped from the disc.
Fingerprinting refers to generating a digital identifier, or “fingerprint,” from a media object, such that identical or closely matching fingerprints are generated from two or more digital media objects whose content is perceptually equivalent (e.g., sounds the same to a human listener). Typically, the fingerprint is much smaller than the original media object.
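The property described above can be illustrated with a deliberately simplified sketch. It is not the fingerprinting method of any particular system: the function name `toy_fingerprint`, the frame size, the synthetic signal, and the choice of "did the frame energy rise or fall" as the feature are all assumptions made for illustration. The sketch shows two sample sequences that differ digitally (one is rescaled and slightly perturbed, loosely standing in for a re-encoded copy) yet yield the same compact fingerprint.

```python
def toy_fingerprint(samples, frame_size=64):
    """Return an int whose bits record whether frame energy rose (1) or not (0).

    Illustrative only: a real system would use perceptually motivated features.
    """
    # Energy of each non-overlapping frame.
    energies = [
        sum(x * x for x in samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    bits = 0
    # One bit per pair of consecutive frames.
    for prev, cur in zip(energies, energies[1:]):
        bits = (bits << 1) | int(cur > prev)
    return bits


# Hypothetical "song": a square wave whose loudness changes from frame to frame.
envelope = [1, 3, 2, 5, 4, 6, 1, 2, 7]
original = [e * (1 if n % 2 == 0 else -1) for e in envelope for n in range(64)]

# Hypothetical second copy: half the volume plus a small deterministic error,
# so the stored sample values differ from the original.
re_encoded = [round(0.5 * x + 0.01 * ((n % 3) - 1), 2)
              for n, x in enumerate(original)]

fp_original = toy_fingerprint(original)
fp_re_encoded = toy_fingerprint(re_encoded)
```

Here 576 samples are reduced to an 8-bit value, and because only the direction of energy change is kept, the rescaling and small perturbation do not alter the result in this example.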
A common task in fingerprinting systems is to search through a database or catalog of fingerprints to find matches for a query fingerprint. One approach to the search is to compare every fingerprint in the database to the query fingerprint. Because the number of comparisons grows with the size of the database, this approach is inefficient when the database is large. Another approach is to use an indexing scheme to reduce the number of comparisons.
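The two search approaches described above can be contrasted in a short sketch. The indexing scheme used here is only one possible choice, not one taken from the text: it assumes 32-bit fingerprints compared by Hamming distance, and it indexes each fingerprint under its four 8-bit bands. If two fingerprints differ in at most 3 bits, at least one of the four bands must be identical, so only catalog entries sharing a band with the query need a full comparison. The catalog contents, names, and threshold are hypothetical.

```python
from collections import defaultdict

BANDS = 4      # assumed: a 32-bit fingerprint split into four 8-bit bands
MAX_DIST = 3   # assumed match threshold; must be smaller than BANDS


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def bands(fp):
    """Return (position, byte value) keys for each 8-bit band of fp."""
    return [(i, (fp >> (8 * i)) & 0xFF) for i in range(BANDS)]


def linear_search(catalog, query):
    """Exhaustive approach: compare the query against every fingerprint."""
    return sorted(name for name, fp in catalog.items()
                  if hamming(fp, query) <= MAX_DIST)


def build_index(catalog):
    """Map each band key to the set of catalog entries containing it."""
    index = defaultdict(set)
    for name, fp in catalog.items():
        for key in bands(fp):
            index[key].add(name)
    return index


def indexed_search(catalog, index, query):
    """Indexed approach: fully compare only entries sharing a band with the query."""
    candidates = set()
    for key in bands(query):
        candidates |= index.get(key, set())
    matches = sorted(name for name in candidates
                     if hamming(catalog[name], query) <= MAX_DIST)
    return matches, len(candidates)


# Hypothetical catalog and a query equal to song_a with two bits flipped.
catalog = {"song_a": 0xDEADBEEF, "song_b": 0x12345678, "song_c": 0x0F0F0F0F}
index = build_index(catalog)
query = 0xDEADBEEF ^ 0b101

all_matches = linear_search(catalog, query)
fast_matches, compared = indexed_search(catalog, index, query)
```

Both functions return the same matches, but `linear_search` performs one comparison per catalog entry, whereas `indexed_search` performs one per candidate returned by the index. The trade-off is the extra memory and build time for the index, and the band scheme only guarantees no missed matches while the threshold stays below the number of bands.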
Generally speaking, fingerprinting and indexing media objects to enable fast, accurate searches may be very complex.