In current audio recognition systems, audio samples are typically recorded live from an environment and processed to extract useful information. For example, a fifteen-second audio sample from a song can be captured using a microphone. The sample can subsequently be processed so that the song's title, artist, and album can be identified.
Current audio recognition systems are unable to perform recognition for more than a single domain or content type. Illustratively, many audio recognition systems can recognize only that a captured audio sample is from a song. These same systems cannot, for instance, recognize that an audio sample is from a television show episode, is a sample of a speech, or is an environmental recording (e.g., birdsong). As a result, users often must switch between different systems to properly identify their audio samples. Furthermore, current audio recognition systems provide results only after an audio sample has been completely captured; they are unable to provide results while the sample is still being recorded. As such, users frequently must wait relatively lengthy periods before receiving results.