This disclosure generally relates to audio signal identification, and more specifically to noise-insensitive indexing of audio signals using audio fingerprints derived from the audio signals' spectrograms.
Real-time identification of audio signals is increasingly used in various applications. For example, a common application uses audio signal identification methods to identify the name, artist, and/or album of an unknown song. Many audio signal identification methods generate an audio fingerprint for an audio signal, which includes features of the audio signal usable to identify the audio signal. These features may be based on acoustical and perceptual properties of the audio signal. To identify the audio signal, the audio fingerprint generated from the audio signal is compared to reference audio fingerprints associated with identifying information.
However, conventional audio signal identification techniques based on audio fingerprinting do not effectively manage noise and distortion in an audio signal. Many audio signals contain noise or signal distortions that have unique features themselves, thereby masking the underlying audio signal and making it difficult, or often impossible, to identify the signal. In particular, if the signal-to-noise ratio is very low (e.g., less than −6 dB), the noise completely masks the signal. Thus, conventional audio identification techniques that treat noise features as an identifying part of the audio signal's fingerprint often incorrectly match the signal to reference audio fingerprints, resulting in false positives or no identification at all. These false positive identifications can occur because many conventional techniques incorrectly identify a match between the different signals' noisy portions. Additionally, tempo shifting that occurs when an audio signal is played faster or slower than its original speed shifts a signal's spectral content along the time axis, so that noise increasingly masks the original signal. Many existing identification techniques using spectral analysis are therefore unable to accurately identify noisy or distorted versions of the audio signal.
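For a sense of scale, the −6 dB figure mentioned above can be converted to a power ratio. The following is a minimal sketch that assumes the conventional power-based definition of signal-to-noise ratio in decibels; the function names `snr_db` and `power_ratio_from_db` are hypothetical and are introduced here only for illustration.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB, assuming both arguments are powers."""
    return 10.0 * math.log10(signal_power / noise_power)


def power_ratio_from_db(snr):
    """Inverse of snr_db: the signal/noise power ratio for a given SNR in dB."""
    return 10.0 ** (snr / 10.0)


# At -6 dB the signal carries roughly one quarter of the noise power.
ratio_at_minus_6_db = power_ratio_from_db(-6.0)
```

Under this assumption, an SNR below −6 dB means the noise carries roughly four or more times the power of the underlying signal.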
Furthermore, when audio fingerprints include a signal's noisy or distorted portions, current audio identification techniques often identify noisy signals at a reduced rate, with failures that include false negatives. In particular, index-based selection of reference fingerprints for matching against a “test” audio fingerprint also suffers from noise and distortion contained in the index of each reference fingerprint.
Many conventional techniques use an index structure to improve the speed of searching and matching fingerprints against a database of reference fingerprints. In the presence of noise and distortions, such techniques often produce index values that fail to match against the indexes contained in the database. By not accounting for noise or distortions, these techniques too often fail to identify proper candidates among the database's reference signals for further matching against the signal's fingerprint, which prevents proper identification of the signal.
To address this noise problem, conventional techniques repeatedly modify the calculated index values of the signal's fingerprint and then search among indexes of the reference fingerprints until identifying a match between the test and reference fingerprints. Such a repetitive permutation process requires a large amount of computational resources, including, for example, excessive memory space to store all possible permutations of the fingerprint indexes. To reduce the number of index permutations, some techniques calculate the robustness of different index bits and permute only “weak bits” (i.e., bits that are more sensitive to noise or distortions and thus require more processing, or permutations, before identifying a match). But in practice this approach often fails because the noisy or distorted portions of a signal cannot be reliably predicted; predicting them would require extracting the noisy or distorted portions from the test signal before comparing the signal's fingerprint to any reference fingerprint.
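The weak-bit permutation approach described above can be illustrated with a minimal sketch. It assumes that an index value is a small integer, that the reference index is a dictionary mapping index values to lists of reference identifiers, and that the weak bit positions are already known. The names `candidate_indexes`, `lookup`, and `index_table` are hypothetical and do not correspond to any particular conventional system.

```python
from itertools import combinations


def candidate_indexes(index_value, weak_bits):
    """Yield index_value plus every variant made by flipping a subset of weak bits.

    For n weak bits this produces 2**n candidates, which is why the
    permutation cost grows quickly with the number of unreliable bits.
    """
    for flip_count in range(len(weak_bits) + 1):
        for positions in combinations(weak_bits, flip_count):
            candidate = index_value
            for position in positions:
                candidate ^= 1 << position  # flip one weak bit
            yield candidate


def lookup(index_table, index_value, weak_bits):
    """Collect reference identifiers stored under any candidate index value."""
    matches = []
    for candidate in candidate_indexes(index_value, weak_bits):
        matches.extend(index_table.get(candidate, []))
    return matches


# Hypothetical reference index: one reference stored under index 0b1010.
index_table = {0b1010: ["ref_a"]}

# Noise flipped bit 1 of the test index, so an exact lookup finds nothing.
noisy_index = 0b1000
exact_hit = index_table.get(noisy_index, [])

# Flipping the correct weak bit recovers the reference ...
recovered = lookup(index_table, noisy_index, weak_bits=[1])

# ... but a wrong guess about which bit is weak still misses it.
missed = lookup(index_table, noisy_index, weak_bits=[0])
```

In this sketch the lookup succeeds only when the set of weak bits happens to include the bit that noise actually flipped, which mirrors the limitation noted above: the candidate count doubles with each additional weak bit, and a wrong prediction of the weak bits still yields no match.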