Audio matching provides for identification of a recorded audio sample (e.g., an audio track of a video) by comparing the audio sample to a set of reference samples. To make the comparison, an audio sample can be transformed to a time-frequency representation (e.g., by employing a short-time Fourier transform). Using the time-frequency representation, interest points that characterize time and frequency locations of peaks or other distinct patterns of a spectrogram can be extracted from the audio sample. Fingerprints can be computed as functions of sets of interest points. Fingerprints of the audio sample can then be compared to fingerprints of reference samples to determine the identity of the audio sample. Fingerprints of reference samples can be stored within a reference index in a format that provides for efficient matching of audio sample fingerprints with reference index data.
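The transform and interest-point steps described above can be illustrated with a minimal sketch. It is not the method of any particular matching system: the function names `stft_magnitudes` and `interest_points`, the Hann window, the frame and hop sizes, and the rule that an interest point is a strict local maximum among its eight spectrogram neighbors are all assumptions chosen for illustration. Fingerprint computation from the resulting points is not shown.

```python
import cmath
import math


def stft_magnitudes(signal, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame.

    Uses a direct DFT for clarity; frame_size and hop are assumed values.
    """
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        # Apply a Hann window to reduce spectral leakage.
        frame = [
            signal[start + n]
            * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n in range(frame_size)
        ]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def interest_points(spec, threshold=1.0):
    """Return (time_index, freq_bin) pairs that are strict local maxima.

    A cell qualifies if it is at least `threshold` and strictly greater
    than every neighbor in the surrounding 3x3 window (assumed rule).
    """
    points = []
    for t in range(len(spec)):
        for f in range(len(spec[t])):
            value = spec[t][f]
            if value < threshold:
                continue
            neighbors = [
                spec[tt][ff]
                for tt in range(max(0, t - 1), min(len(spec), t + 2))
                for ff in range(max(0, f - 1), min(len(spec[t]), f + 2))
                if (tt, ff) != (t, f)
            ]
            if all(value > nb for nb in neighbors):
                points.append((t, f))
    return points
```

For a pure tone, the largest magnitude in each frame falls in the frequency bin of the tone. A stationary tone produces nearly identical frames, so the strict-maximum rule is better suited to signals whose spectral content changes over time.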
In a typical descriptor audio matching system, a reference index maps locality sensitive hash (“LSH”) bands to the reference content in which those bands occur. For example, a fingerprint represented as a vector of strings can be divided into subfingerprints, i.e., single strings of the vector. LSH bands can then be built by concatenating several low-entropy locality sensitive hashes from an individual subfingerprint into an LSH band. A list of offsets can be generated for each LSH band in the reference index, where the list of offsets delineates the reference samples, or the multiple parts of the same reference sample, that share the same LSH band. Some LSH bands within a reference index are more common than others, and the lists of offsets for those bands are correspondingly longer.
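The band construction and offset lists described above can be sketched as follows. This is an illustrative model only: representing a subfingerprint as a tuple of small integers, grouping four consecutive hashes per band, keying the index by (band position, band value), and the names `lsh_bands`, `build_index`, and `lookup` are all assumptions and do not reflect a specific system.

```python
from collections import defaultdict


def lsh_bands(subfingerprint, hashes_per_band=4):
    """Concatenate consecutive low-entropy hashes into bands.

    Each band is keyed by its position so that equal values at different
    positions do not collide (an assumed design choice).
    """
    bands = []
    last_start = len(subfingerprint) - hashes_per_band
    for i in range(0, last_start + 1, hashes_per_band):
        chunk = tuple(subfingerprint[i:i + hashes_per_band])
        bands.append((i // hashes_per_band, chunk))
    return bands


def build_index(references, hashes_per_band=4):
    """Map each LSH band to its list of (reference_id, offset) entries.

    `references` maps a reference id to its sequence of subfingerprints;
    the offset is the position of the subfingerprint in that sequence.
    """
    index = defaultdict(list)
    for ref_id, subfingerprints in references.items():
        for offset, sub in enumerate(subfingerprints):
            for band in lsh_bands(sub, hashes_per_band):
                index[band].append((ref_id, offset))
    return index


def lookup(index, subfingerprint, hashes_per_band=4):
    """Return every (reference_id, offset) sharing a band with the probe."""
    hits = []
    for band in lsh_bands(subfingerprint, hashes_per_band):
        hits.extend(index.get(band, []))
    return hits
```

In this model, a band that appears in two parts of the same reference sample yields two entries with the same reference id and different offsets, while a band shared across reference samples yields entries with different reference ids.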
When an LSH band in an audio sample matches an LSH band in a reference index, additional matching processing can occur that compares the audio sample and the matching reference content in greater depth. However, if a large number of reference LSH bands are identical, e.g., the list of offsets is highly populated, the computing and memory resources necessary to process the additional potential matches can decrease matching efficiency. In addition, common LSH bands with a large list of offsets may be too indiscriminate to be useful in successfully matching an audio sample.
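The cost of a highly populated list of offsets can be made concrete with a small sketch. It assumes the reference index is a plain dictionary from an LSH band to its list of (reference id, offset) entries; the names `candidate_load` and `most_common_bands` and the example band labels are hypothetical.

```python
def candidate_load(index, query_bands):
    """Count the candidates each query band would send to deeper matching."""
    return {band: len(index.get(band, [])) for band in query_bands}


def most_common_bands(index, top_n=3):
    """Return the bands with the longest offset lists, longest first."""
    ranked = sorted(index, key=lambda band: len(index[band]), reverse=True)
    return ranked[:top_n]


# Hypothetical index: one band shared by many references, two rarer ones.
example_index = {
    "silence": [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0)],
    "chorus": [("a", 7)],
    "intro": [("b", 3), ("c", 3)],
}
load = candidate_load(example_index, ["silence", "chorus", "unseen"])
```

Here the common band contributes five candidates spread over three reference samples, so a match on it says little about which sample is correct, while the rare band points to a single location.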