Audio samples can be recorded by many commercially available electronic devices such as smartphones, tablets, e-readers, computers, personal digital assistants, personal media players, etc. Audio matching provides for identification of a recorded audio sample by comparing the audio sample to a set of reference samples. To make the comparison, an audio sample can be transformed to a time-frequency representation (a spectrogram) by using, for example, a short-time Fourier transform (STFT). Using the time-frequency representation, interest points that characterize the time and frequency locations of peaks or other distinct patterns of the spectrogram can then be extracted. Fingerprints or descriptors can be computed as functions of sets of interest points. Fingerprints of the audio sample can then be compared to fingerprints of reference samples to determine the identity of the audio sample.
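The steps described above can be sketched, for illustration only, as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the frame length and hop size, the local-maximum rule for choosing interest points, the pairing of neighboring points into (frequency, frequency, time offset) descriptors, and the overlap score are all hypothetical choices that the description above does not specify.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (time, frequency)

def interest_points(spec, n_points=20):
    """Assumed rule: keep the strongest strict local maxima of the spectrogram."""
    n_t, n_f = spec.shape
    padded = np.pad(spec, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones(spec.shape, dtype=bool)
    for dt in (-1, 0, 1):
        for df in (-1, 0, 1):
            if dt == 0 and df == 0:
                continue
            neighbor = padded[1 + dt:1 + dt + n_t, 1 + df:1 + df + n_f]
            is_peak &= spec > neighbor
    t, f = np.nonzero(is_peak)
    order = np.argsort(spec[t, f])[::-1][:n_points]
    # Return (time bin, frequency bin) pairs sorted by time.
    return sorted(zip(t[order].tolist(), f[order].tolist()))

def fingerprint(points, fan_out=3):
    """Assumed descriptor: (f1, f2, time offset) for each pair of nearby points."""
    prints = set()
    for i, (t1, f1) in enumerate(points):
        for t2, f2 in points[i + 1:i + 1 + fan_out]:
            prints.add((f1, f2, t2 - t1))
    return prints

def similarity(fp_a, fp_b):
    """Jaccard overlap between two fingerprint sets (0.0 when both are empty)."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0

def tone_sequence(freqs, sample_rate=8000, burst_len=2000):
    """Hypothetical test signal: Hann-shaped tone bursts played back to back."""
    n = np.arange(burst_len)
    envelope = np.hanning(burst_len)
    return np.concatenate([envelope * np.sin(2 * np.pi * f * n / sample_rate)
                           for f in freqs])

# Compare a "recorded" sample against two hypothetical reference samples.
sample = tone_sequence([440, 880, 1320, 660])
reference_same = tone_sequence([440, 880, 1320, 660])
reference_other = tone_sequence([500, 1000, 1500, 750])

fp_sample = fingerprint(interest_points(stft_magnitude(sample)))
fp_same = fingerprint(interest_points(stft_magnitude(reference_same)))
fp_other = fingerprint(interest_points(stft_magnitude(reference_other)))

print("same reference: ", similarity(fp_sample, fp_same))
print("other reference:", similarity(fp_sample, fp_other))
```

In this sketch the sample would be identified as the reference whose fingerprint set overlaps most with its own. Descriptors that store only frequency bins and a relative time offset do not depend on where in a recording the sample begins, which is one reason such pairings might be chosen.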
Recorded audio signals can suffer from many types of distortion, such as noise distortion, pitch shift distortion, compression algorithm distortion, etc. These distortions affect which interest points are selected. A fingerprint computed from distorted interest points may fail to match a fingerprint computed from the clean, undistorted interest points of the same audio signal.
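The effect of distortion on interest point selection can be illustrated with a small experiment. The following is a sketch under assumptions, not a measurement of any real system: interest points are simplified to the strongest frequency bin in each frame, the test signal is a hypothetical sequence of tone bursts, and the distortion is additive Gaussian noise with an arbitrarily chosen level.

```python
import numpy as np

def strongest_bins(signal, frame_len=256, hop=128):
    """Simplified interest points: the strongest frequency bin in each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return {(t, int(f)) for t, f in enumerate(spec.argmax(axis=1))}

def retained_fraction(clean_points, distorted_points):
    """Fraction of the clean interest points that survive the distortion."""
    return len(clean_points & distorted_points) / len(clean_points)

# Hypothetical clean signal: four Hann-shaped tone bursts at 8 kHz sampling.
sample_rate = 8000
n = np.arange(2000)
clean = np.concatenate([np.hanning(2000) * np.sin(2 * np.pi * f * n / sample_rate)
                        for f in (440, 880, 1320, 660)])

# Assumed distortion: additive Gaussian noise, seeded for repeatability.
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 1.0, size=clean.shape)

clean_points = strongest_bins(clean)
noisy_points = strongest_bins(noisy)
print("interest points retained under noise:",
      retained_fraction(clean_points, noisy_points))
```

Frames in which the tone is quiet relative to the noise tend to yield a different strongest bin, so only part of the clean interest point set is recovered from the noisy signal. Any fingerprint computed from the altered points would differ accordingly.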
An audio matching system that is robust to distortion is desirable. By generating fingerprints from interest points that are robust to distortion, the accuracy of an audio matching system relying on the generated fingerprints can be improved. Thus, effectively selecting interest points that are robust to distortion and generating fingerprints based on those interest points is desirable.