As industries move toward multimedia-rich working environments, all forms of audio and visual content representation (radio broadcast transmissions, streaming video, audio canvas, visual summarization, etc.) are used more frequently. Users and content providers alike search for ways to make the best use of such content. One method with considerable potential for creative uses is content identification. Enabling a user to identify content that the user is listening to or watching offers a content provider new possibilities for success.
As a specific example, suppose a user hears a song or piece of music broadcast over the radio and would like to purchase it, but cannot identify the song. A content provider could enable a fingerprint of the song to be captured via a telephone handset and then identify the content. After recognition, the content provider could send identifying information (e.g., title, artist(s), and record label) to the user, along with e-commerce options, such as ordering the music or a corresponding ring tone.
Furthermore, if the user could identify a broadcast source of desired content, more commerce possibilities would become available to the content provider, such as advertisement and promotional plans.
Existing methods for identifying the broadcast source of desired content may use watermarks embedded into an audio stream to identify the respective station. Each broadcast station would therefore need to actively embed a watermark into its audio stream, which increases data processing complexity, and each station would also need to use a watermarking technique that follows an agreed-upon standard used by the source identification system. Any station that does not follow such a standard would not be identified by these means. Furthermore, a watermark signal needs to be robust enough to withstand distortion, which can occur if the audio is sampled within a noisy room with reverberation or if the audio is subject to lossy compression.
Another method for identifying the broadcast source of desired content includes performing a cross-correlation analysis between an audio sample and audio feeds captured from broadcast stations (e.g., from a monitoring station). A matching station would show a strong spike in the cross-correlation. A difficulty with cross-correlation analysis, however, is that where lossy compression is employed, signals are weak and strong correlations may be difficult to achieve. In many voice codecs, for example, phase information can be destroyed, so a cross-correlation analysis would not yield a peak even when the audio sample is cross-correlated with the correct matching broadcast feed.
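By way of illustration only, the cross-correlation approach described above might be sketched as follows. The function name normalized_cross_correlation, the synthetic station feeds, the noise level, and the offset of 120 samples are all assumptions made for this sketch and are not taken from any particular system. The sketch models only mild additive noise; it does not model codec distortion or loss of phase information.

```python
import math
import random


def normalized_cross_correlation(sample, feed):
    """Slide `sample` along `feed`; return (best_lag, best_score).

    The score is the normalized correlation coefficient in [-1, 1].
    Hypothetical helper written for this illustration.
    """
    n = len(sample)
    if n == 0 or len(feed) < n:
        raise ValueError("feed must be at least as long as a non-empty sample")
    s_mean = sum(sample) / n
    s_centered = [x - s_mean for x in sample]
    s_norm = math.sqrt(sum(x * x for x in s_centered))
    best_lag, best_score = 0, -1.0
    for lag in range(len(feed) - n + 1):
        window = feed[lag:lag + n]
        w_mean = sum(window) / n
        w_centered = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_centered))
        if s_norm == 0.0 or w_norm == 0.0:
            continue  # a constant signal has no defined correlation
        dot = sum(a * b for a, b in zip(s_centered, w_centered))
        score = dot / (s_norm * w_norm)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic stand-ins for two monitored broadcast feeds (assumed data).
rng = random.Random(0)
station_a = [rng.uniform(-1, 1) for _ in range(400)]
station_b = [rng.uniform(-1, 1) for _ in range(400)]

# The user's sample: 64 values of station A starting at offset 120,
# with mild additive noise standing in for a noisy capture.
sample = [x + rng.gauss(0, 0.1) for x in station_a[120:184]]

lag_a, score_a = normalized_cross_correlation(sample, station_a)
lag_b, score_b = normalized_cross_correlation(sample, station_b)
print("station A: lag", lag_a, "score", round(score_a, 3))
print("station B: lag", lag_b, "score", round(score_b, 3))
```

In this sketch the matching feed would be expected to produce a pronounced peak at the true offset, while the non-matching feed yields only a weak best score. A real audio sample passed through a lossy voice codec may not preserve the waveform closely enough for such a peak to appear, which is the difficulty noted above.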
Using existing methods for broadcast source identification, it may be difficult to distinguish between sources in real time. For instance, in real-time source identification where many highly popular songs are being played, there is a chance that two sources are playing the same song at the same time. Alternatively, a single source may be broadcast through multiple channels (over-the-air broadcast, internet streaming, etc.), so that when the channels are not synchronized, source identification can be difficult.