Audio samples can be recorded by many commercially available electronic devices, such as smartphones, tablets, e-readers, computers, personal digital assistants, and personal media players. Audio matching identifies a recorded audio sample by comparing it to a set of reference samples. To make the comparison, the audio sample can be transformed into a time-frequency representation (a spectrogram) using, for example, a short-time Fourier transform (STFT). Interest points that characterize the time and frequency locations of peaks or other distinct patterns of the spectrogram can then be extracted from this representation. Fingerprints or descriptors can then be computed as functions of sets of interest points.
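The three stages described above (time-frequency transform, interest point extraction, descriptor computation) can be illustrated with a minimal sketch. It is one possible reading under stated assumptions, not the method of any particular system. The function names, the local-maximum rule, the thresholds, and the pairwise (f1, f2, dt) descriptor are all hypothetical choices made for illustration, and the naive DFT stands in for the FFT routine a practical implementation would use.

```python
import cmath
import math


def synth_bursts(length=512, frame_len=64):
    """Hypothetical test signal: two Gaussian-shaped tone bursts."""
    bursts = [(128, 8), (320, 20)]  # (center sample, frequency bin)
    return [
        sum(math.exp(-((n - c) ** 2) / 3200.0)
            * math.sin(2 * math.pi * k * n / frame_len)
            for c, k in bursts)
        for n in range(length)
    ]


def stft_magnitude(samples, frame_len=64, hop=32):
    """Return a list of frames; each holds magnitudes for bins 0..frame_len//2."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spec = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for clarity only (assumption: small inputs).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        spec.append(mags)
    return spec


def interest_points(spec, radius=2, min_mag=5.0):
    """Return (frame, bin) pairs that are strict local maxima above min_mag."""
    points = []
    for t, frame in enumerate(spec):
        for f, mag in enumerate(frame):
            if mag < min_mag:
                continue
            neighbors = [
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(len(spec), t + radius + 1))
                for ff in range(max(0, f - radius), min(len(frame), f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            if all(mag > v for v in neighbors):
                points.append((t, f))
    return points


def fingerprints(points, max_dt=10):
    """Describe each pair of nearby points as (anchor_frame, (f1, f2, dt))."""
    pts = sorted(points)
    out = []
    for i, (t1, f1) in enumerate(pts):
        for t2, f2 in pts[i + 1:]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                out.append((t1, (f1, f2, dt)))
    return out
```

Calling fingerprints(interest_points(stft_magnitude(synth_bursts()))) on the synthetic signal should give one interest point per tone burst and a single descriptor linking the two. The neighborhood radius and magnitude threshold are exactly the kind of parameters that decide which spectrogram locations count as interest points.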
There are a number of possible interest point detection methods, which differ in how the time-frequency representation is constructed or in the parameters that define what constitutes an interest point in the spectrogram. The effectiveness of each method depends on the nature of the underlying audio signal. For example, some methods may perform better on samples with heavy percussion, whereas others may perform better on samples of classical music. Likewise, some methods may perform better on speech and others on music.
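One way in which the construction of the time-frequency representation can differ is the STFT frame length, which trades frequency resolution against time resolution. The sketch below compares two hypothetical configurations. The class name, the 16 kHz sample rate, and the frame and hop sizes are assumptions chosen for illustration, not values taken from any particular detection method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical STFT settings for one interest point detection method."""
    name: str
    sample_rate_hz: int
    frame_len: int  # samples per STFT frame
    hop: int        # samples between consecutive frame starts

    @property
    def freq_resolution_hz(self) -> float:
        # Spacing between adjacent frequency bins.
        return self.sample_rate_hz / self.frame_len

    @property
    def frame_duration_ms(self) -> float:
        # Time span covered by a single frame.
        return 1000.0 * self.frame_len / self.sample_rate_hz

    @property
    def hop_ms(self) -> float:
        # Time step between consecutive frames.
        return 1000.0 * self.hop / self.sample_rate_hz


# Short frames localize events finely in time but coarsely in frequency.
SHORT = DetectorConfig("short-window", 16000, 256, 128)
# Long frames resolve closely spaced frequencies but smear events in time.
LONG = DetectorConfig("long-window", 16000, 2048, 1024)

for cfg in (SHORT, LONG):
    print(cfg.name, cfg.freq_resolution_hz, cfg.frame_duration_ms, cfg.hop_ms)
```

Under these assumed settings, the short-window configuration has 62.5 Hz bins and 16 ms frames, while the long-window configuration has 7.8125 Hz bins and 128 ms frames. A short frame of this kind may localize sharp percussive onsets more precisely, whereas a long frame may better separate sustained, closely spaced tones. Which setting actually performs better for a given signal is an empirical question.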