Audio and audio-video recordings and electronically generated audio and audio-video files are ubiquitous in the digital age. Such pieces of audio and audio-video can be captured with a variety of electronic devices including tape recorders, MP3 player/recorders, video recorders, mobile phones, digital cameras, personal computers, digital audio recorders, and the like. These pieces of audio and audio-video can easily be stored, transported, and distributed through digital storage devices, email, Web sites, etc.
There are many examples of sounds which may be heard multiple times in the same recording, or across different recordings. These are easily identifiable to a listener as instances of the same sound, although they may not be exact repetitions at the waveform level. The ability to identify recurrences of perceptually similar sounds has applications in a number of audio and/or audio-video recognition and classification tasks.
With the proliferation of audio and audio-video recording devices and public sharing of audio and audio-video footage, there is an increasing likelihood of having access to multiple recordings of the same event. Manually discovering these alternate recordings, however, can be difficult and time consuming. Automatically discovering these alternate recordings using visual information (when available) can be very difficult because different recordings are likely to be taken from entirely different viewpoints and thus have different video content.