Audio samples can be recorded by many commercially available electronic devices such as smart phones, tablets, e-readers, computers, personal digital assistants, personal media players, etc. Audio matching provides for the identification of a recorded audio sample by comparing the audio sample to a set of reference samples. To make the comparison, an audio sample can be transformed to a time-frequency representation of the sample, such as a spectrogram, by using, for example, a short-time Fourier transform (STFT). Using the time-frequency representation, interest points that characterize time and/or frequency locations of peaks or other distinct patterns of the spectrogram can then be extracted from the audio sample. Fingerprints or descriptors can then be computed as functions of sets of interest points. Fingerprints of the audio sample can then be compared to fingerprints of reference samples to determine the identity of the audio sample.
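By way of illustration only, the matching steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (`stft_magnitude`, `interest_points`, `fingerprints`, `match_score`), the frame size, the peak threshold, and the choice of a fingerprint as a pair of peak frequency bins plus their time difference are all hypothetical choices made for the example.

```python
import cmath
import math


def tones(freqs_hz, n_samples, sample_rate=8000):
    """Hypothetical test signal: a sum of unit-amplitude sine tones."""
    return [sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs_hz)
            for n in range(n_samples)]


def stft_magnitude(samples, frame_size=64, hop=32):
    """Short-time Fourier transform magnitudes, one list of bins per frame."""
    # Hann window applied to each frame before the transform.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def interest_points(spectrogram, threshold_ratio=0.5):
    """(time_index, freq_bin) local maxima along frequency above a threshold."""
    global_peak = max(max(mags) for mags in spectrogram)
    points = []
    for t, mags in enumerate(spectrogram):
        for k in range(1, len(mags) - 1):
            is_local_max = mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
            if is_local_max and mags[k] >= threshold_ratio * global_peak:
                points.append((t, k))
    return points


def fingerprints(points, fan_out=3):
    """Assumed fingerprint: (bin_1, bin_2, time_delta) for nearby point pairs."""
    ordered = sorted(points)
    prints = set()
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1:i + 1 + fan_out]:
            prints.add((f1, f2, t2 - t1))
    return prints


def match_score(sample_prints, reference_prints):
    """Fraction of the sample's fingerprints also found in the reference."""
    if not sample_prints:
        return 0.0
    return len(sample_prints & reference_prints) / len(sample_prints)
```

In such a sketch, a recorded excerpt would be identified by computing `match_score` against each reference sample and selecting the reference with the highest score. Because the fingerprints store time differences rather than absolute times, the excerpt need not begin at the same position as the reference.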
Pitch-shifting can affect an audio sample by shifting the frequency of interest points. For example, when trying to match audio played on the radio, television, or in a remix of a song, the speed of the audio sample may be slightly changed from the original. Samples that have altered speed will also likely have an altered pitch. Even a small pitch shift that is hard for listeners to notice may present difficult challenges in matching the signal. Therefore, characterizing interest points within a fingerprint in a manner that is robust to pitch shifting is desirable.
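The effect of a small speed change on interest points can be illustrated with simple arithmetic. The sketch below assumes a linear-frequency STFT in which a speed factor r multiplies every frequency by r, so that a peak in bin k moves to approximately bin r·k. The function names and the fingerprint layout (two frequency bins and a time difference in frames) are hypothetical and chosen only for the example.

```python
import math


def semitones(speed_factor):
    """Pitch change, in semitones, produced by a playback speed factor."""
    return 12 * math.log2(speed_factor)


def shifted_bin(bin_index, speed_factor):
    """Nearest STFT bin a peak moves to when all frequencies scale by the factor."""
    return round(bin_index * speed_factor)


def shift_fingerprint(fingerprint, speed_factor):
    """Assumed fingerprint layout: (bin_1, bin_2, time_delta_in_frames)."""
    bin_1, bin_2, dt = fingerprint
    # Frequencies scale up by the factor while durations scale down by it.
    return (shifted_bin(bin_1, speed_factor),
            shifted_bin(bin_2, speed_factor),
            round(dt / speed_factor))


original = (50, 200, 3)
sped_up = shift_fingerprint(original, 1.02)  # 2% faster playback
print(round(semitones(1.02), 2), original, sped_up, original == sped_up)
```

Under these assumptions, a 2% speed increase corresponds to roughly a third of a semitone, yet it moves the peak in bin 200 by four bins while moving the peak in bin 50 by only one. A fingerprint that is compared by exact bin values therefore no longer matches its reference, and the size of the displacement grows with frequency.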