Deciphering unknown data has become an increasingly difficult, but necessary, task. As data and storage systems become more complex, programmers are finding that their old matching techniques are inefficient. One common method to identify unknown data is to compare it to known data stored in a database. If the unknown data point matches a point contained in the database, then the unknown data can be identified. However, this process requires checking the unknown data against every piece of known data in the database. Today, known data is stored in large multi-dimensional structures that hold enormous numbers of records. As a result, comparing against everything in a database has become quite time-consuming and CPU-intensive. Furthermore, each piece of data can itself be rather large, requiring many complicated computations to match.
For example, this problem arises in the process of audio fingerprinting. Audio fingerprinting is the task of identifying an audio track that is missing metadata or has incorrect metadata. A media player that plays audio files may wish to display the title of the song playing and its artist. Generally, the player will look to a file's metadata in order to determine title and artist. However, such information may be inaccurate or missing from the metadata. In order to determine the needed information, then, the player may try to match a signature or other electronic representation of the song to known signatures or other electronic representations of pre-computed songs. Essentially, the player tries to match a representation of the unknown song to known representations, or “fingerprints,” that are stored in a database.
There are inherent performance challenges with matching audio fingerprints, though. For instance, once a song is processed, a set of 64 floating-point numbers is produced and stored in a large multi-dimensional structure. Each of the 64 values may correspond to a different dimension of the database. For audio fingerprinting, it is typical to have a 64-dimensional database holding nearly 1,000,000 known fingerprints. Furthermore, finding a match requires comparing the unknown song to every pre-calculated fingerprint in the database. As can be imagined, comparing against 1,000,000 fingerprints of 64 floating-point values each, housed in a 64-dimensional database, is time-consuming.
Moreover, comparing the unknown song to a known fingerprint is itself a complex calculation. It is performed by first mapping the unknown song and the known fingerprints into multi-dimensional space. In order to compare songs, each known fingerprint's Euclidean distance from the unknown song is calculated. If the unknown song is close enough to a known fingerprint, it is considered a match. Euclidean distance measurements in multi-dimensional space are quite CPU-intensive, since every dimension contributes a term to each distance. Consequently, comparing an unknown song to every fingerprint in a multi-dimensional database is slow.
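The exhaustive comparison described above can be illustrated with a minimal Python sketch. It is not taken from any particular fingerprinting system: the function names, the dictionary layout of the database, and the match threshold are all hypothetical, chosen only to show why the cost grows with both the number of fingerprints and the number of dimensions.

```python
import math

DIMENSIONS = 64  # from the text: each fingerprint holds 64 floating-point values


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_match(unknown, known_fingerprints, threshold):
    """Compare `unknown` against every known fingerprint (hypothetical helper).

    `known_fingerprints` is assumed to map a track identifier to its vector.
    `threshold` is an assumed cutoff for "close enough"; the text does not
    specify a value.

    Returns (track_id, distance) for the nearest fingerprint if it lies
    within the threshold, otherwise (None, nearest_distance).
    """
    best_id, best_dist = None, float("inf")
    # Every fingerprint is visited, so the work is proportional to
    # (number of fingerprints) x (number of dimensions).
    for track_id, fingerprint in known_fingerprints.items():
        dist = euclidean_distance(unknown, fingerprint)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    if best_dist <= threshold:
        return best_id, best_dist
    return None, best_dist


# Tiny illustrative database; a realistic one would hold on the order of
# 1,000,000 entries, as noted in the text.
example_db = {
    "track_a": [0.0] * DIMENSIONS,
    "track_b": [1.0] * DIMENSIONS,
}
example_query = [0.1] * DIMENSIONS
match_id, match_dist = brute_force_match(example_query, example_db, threshold=1.0)
```

With 1,000,000 fingerprints of 64 values each, a single lookup in this sketch evaluates roughly 64,000,000 squared differences, which is the cost that a narrowed search region would avoid.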
Therefore, a need arises for a method to accurately locate the section of a multi-dimensional database where a match may exist. If such an area can be pinpointed, only the known data within that area would need to be searched. Finding and searching such an area would greatly reduce the amount of time and processor power needed to identify unknown data. Audio fingerprinting is merely one illustration of the problem at hand. Similar difficulties arise in many applications performing comparisons in multi-dimensional databases. The ability to search such structures more efficiently would greatly reduce the time and CPU usage needed to perform data matching tasks.