Over the past few years, several methods for audio feature extraction and signature generation have been proposed. In general, mapping high-dimensional audio input data into lower-dimensional feature vectors that retain sufficient relevant information is a difficult problem that lies at the core of audio fingerprinting and identification systems.
One class of methods reduces high-dimensional audio information by distortion discriminant analysis (DDA), in which each DDA layer projects its input onto directions chosen to maximize a signal-to-noise ratio for a given set of distortions.
Toward a similar goal, high input dimensionality may be reduced by projecting the audio spectrogram onto random basis vectors. Such a projection permits generation of a compact bit vector representing each overlapping time window of the audio signal.
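By way of illustration only, the random projection approach may be sketched as follows. The sketch assumes that each time window has already been reduced to a list of spectral magnitudes, that the random basis is drawn from a zero-mean Gaussian distribution, and that each bit is the sign of one projection. The function names make_basis and random_projection_bits, the seed, and the dimensions are hypothetical and are not taken from any particular prior system.

```python
import random


def make_basis(num_bits, frame_len, seed=0):
    """Draw num_bits random basis vectors of length frame_len.

    Assumption: zero-mean, unit-variance Gaussian entries and a fixed seed,
    so the same basis can be regenerated at enrollment and query time.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(frame_len)]
            for _ in range(num_bits)]


def random_projection_bits(frame, basis):
    """Project one spectrogram frame onto each basis vector.

    Each projection contributes one bit: 1 if the dot product is
    positive, 0 otherwise.
    """
    bits = []
    for vec in basis:
        dot = sum(x * w for x, w in zip(frame, vec))
        bits.append(1 if dot > 0 else 0)
    return bits


# Hypothetical usage: an 8-bin frame reduced to a 16-bit vector.
basis = make_basis(num_bits=16, frame_len=8)
frame = [0.9, 0.1, 0.4, 0.7, 0.2, 0.05, 0.3, 0.6]
signature = random_projection_bits(frame, basis)
```

Because only the sign of each projection is retained, the resulting bit vector is unchanged when the frame is scaled by a positive gain, which is one reason sign-based quantization is commonly chosen in sketches of this kind.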
Also, compact signatures representing audio content may be based on Mel Frequency Cepstral Coefficients (MFCC), with various schemes used for descriptor coefficient quantization, selection, and signature generation. The MFCC parametric spectral representation facilitates generation of descriptor parameters independent of the number of filter bands, while quantization of the descriptor coefficients and subsequent selection provide precise bitwise signature generation.
Other methods are based on one-dimensional wavelet transform coefficients, or on the sign of energy differences taken simultaneously along the time and frequency axes of a spectrogram computed over overlapping audio sample time windows.
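By way of illustration only, one common reading of the energy difference approach may be sketched as follows. The sketch assumes that energies[n][m] holds the energy of frequency band m in time window n, and that each bit is the sign of the difference between adjacent bands in the current window minus the same difference in the previous window. The function name energy_difference_bits and the sample values are hypothetical.

```python
def energy_difference_bits(energies):
    """Derive bits from the sign of energy differences in time and frequency.

    energies[n][m] is assumed to be the energy of band m in window n.
    For each window n >= 1 and each adjacent band pair (m, m + 1):
        diff = (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1])
    The bit is 1 if diff is positive, 0 otherwise.
    """
    rows = []
    for n in range(1, len(energies)):
        prev, cur = energies[n - 1], energies[n]
        row = []
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if diff > 0 else 0)
        rows.append(row)
    return rows


# Hypothetical usage: two windows of three bands yield one row of two bits.
example = [[1.0, 2.0, 3.0],
           [3.0, 1.0, 4.0]]
bits = energy_difference_bits(example)  # [[1, 0]]
```

A spectrogram of N windows and M bands yields N - 1 rows of M - 1 bits under this formulation, since each bit requires a neighboring band and a preceding window.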
A number of applications treat the whole audio spectrogram as a pattern and extract signatures based on the temporal evolution of the audio spectrum, analyzed with Haar wavelet type filters at multiple scales.
Providing robust audio identification in the presence of significant ambient and interfering sounds, and tracking the identified content, in order to bring various applications directly to smart phones and mobile devices such as tablets, is a difficult problem.