Audio samples can be recorded by many commercially available electronic devices, such as smartphones, tablets, e-readers, computers, personal digital assistants, and personal media players. Audio matching identifies a recorded audio sample by comparing it against a set of reference samples. To make the comparison, the audio sample can be transformed into a time-frequency representation (a spectrogram) using, for example, a short-time Fourier transform (STFT). Interest points can then be extracted from this representation. These points mark the time and frequency locations of peaks or other distinctive patterns in the spectrogram. Fingerprints, or descriptors, can be computed as functions of sets of interest points. The fingerprints of the audio sample can then be compared to the fingerprints of the reference samples to determine the identity of the audio sample.
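The pipeline described above (STFT, interest-point extraction, fingerprinting, comparison) can be sketched in Python. This is a minimal, dependency-free sketch under stated assumptions, not the method of this document. The naive DFT (used in place of an FFT library), the local-maximum peak picker, the frame and hop sizes, and the helper names `stft_magnitude`, `interest_points`, `fingerprints`, and `match_score` are all hypothetical, illustrative choices. The fingerprints follow a common landmark-pairing scheme, assumed here for illustration. Each anchor peak is paired with nearby later peaks and keyed by (anchor frequency, target frequency, time difference). A reference is scored by how many shared keys agree on a single time offset.

```python
# Illustrative sketch only: frame sizes, peak picking, and the landmark-pairing
# fingerprint scheme are assumptions, not the method described in this document.
import cmath
import math
from collections import Counter


def stft_magnitude(signal, frame_size=64, hop=32):
    """Magnitude spectrogram: Hann-window each frame, then take a naive DFT.

    Returns one list of frame_size // 2 + 1 bin magnitudes per frame.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    spec = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        spec.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return spec


def interest_points(spec, radius=2, min_mag=1.0):
    """Return (time, freq) cells that are the maximum of their local neighbourhood."""
    n_t = len(spec)
    n_f = len(spec[0]) if spec else 0
    points = []
    for t in range(n_t):
        for f in range(n_f):
            mag = spec[t][f]
            if mag < min_mag:
                continue  # ignore quiet cells
            neighbours = (
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
            )
            if all(mag >= other for other in neighbours):
                points.append((t, f))
    return points


def fingerprints(points, fan_out=3, max_dt=10):
    """Pair each anchor point with up to fan_out later points.

    Each fingerprint is ((f_anchor, f_target, dt), t_anchor). The key part does
    not depend on absolute time, so it survives cropping the recording.
    """
    pts = sorted(points)
    out = []
    for i, (t1, f1) in enumerate(pts):
        paired = 0
        for t2, f2 in pts[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # same frame: no time difference to encode
            if dt > max_dt:
                break  # points are sorted by time, so later ones are farther away
            out.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return out


def match_score(query_fps, ref_fps):
    """Number of shared keys that agree on one query-to-reference time offset."""
    index = {}
    for key, t_ref in ref_fps:
        index.setdefault(key, []).append(t_ref)
    offsets = Counter()
    for key, t_query in query_fps:
        for t_ref in index.get(key, []):
            offsets[t_ref - t_query] += 1
    return max(offsets.values()) if offsets else 0
```

In use, `fingerprints(interest_points(stft_magnitude(samples)))` is computed once per reference and once per recorded sample, and each reference is ranked by `match_score`. A recording cropped from a reference produces many keys that share one time offset. A reference that shares only a few coincidental keys at scattered offsets scores low.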
When the fingerprint of an audio sample is compared to fingerprints of reference samples, the reference database should contain enough samples to cover the content the audio sample might come from. The audio sample could originate from any of a large number of sources and could be of any of a variety of content types (e.g., music, a movie, or a television show). Consequently, a database with enough music, movie, and television samples to support accurate matching could contain millions of reference samples.
When an audio matching system uses millions of reference samples, matching a sample fingerprint against millions of reference fingerprints is likely to return too many references as potential matches. These false positive matches make it difficult to determine the actual identity of the recorded audio sample. It is therefore desirable to filter out false positive matches effectively when attempting to identify the recorded audio sample.