1. Technical Field
The invention relates to the identification of repeating objects or music in an audio stream, and in particular, to a system and method for jointly segmenting and identifying repeating objects from one or more broadcast audio streams.
2. Related Art
There are many existing schemes for identifying audio objects such as particular advertisements, station jingles, or songs embedded in an audio stream. For example, several such audio identification schemes are referred to as “audio fingerprinting” schemes. Typically, audio fingerprinting schemes take a known audio object, and reduce that object to a set of parameters, such as, for example, frequency content, energy level, etc. These parameters, or “fingerprints,” are then stored in a database of known objects. Sampled portions of the streaming audio are then compared to the fingerprints in the database for identification purposes.
Consequently, such schemes typically rely on a comparison of the audio stream to a large database of previously identified audio objects. In operation, such schemes often sample the audio stream over a desired period using some sort of sliding window arrangement, and compare the sampled data to the database in order to identify potential matches. In this manner, individual objects in the audio stream can be identified. This identification information is typically used for any of a number of purposes, including segmenting the audio stream into discrete objects, generating playlists or the like for cataloging the audio stream, or gathering statistics on the stream. However, as the size of the fingerprint database increases, it becomes increasingly computationally expensive to identify matching audio objects in the audio stream, because each sampled window must be compared against a growing number of stored fingerprints.
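By way of illustration only, the conventional fingerprint lookup described above might be sketched as follows. This is a minimal sketch under stated assumptions, not any particular existing scheme: the fingerprint is assumed to be a list of per-frame RMS energy levels, and the names `fingerprint`, `distance`, `scan`, and `tone`, the frame size, and the match threshold are all hypothetical choices made for the example.

```python
import math

FRAME = 64  # samples per frame (hypothetical value chosen for the example)


def fingerprint(samples, frame=FRAME):
    """Reduce audio samples to a list of per-frame RMS energy levels."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def distance(a, b):
    """Mean squared difference between two equal-length fingerprints."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def scan(stream, database, threshold=1e-3, hop=FRAME):
    """Slide a window over the stream and compare it with every stored fingerprint.

    The work grows with (window positions) x (database entries), which is
    the scaling cost noted in the text.
    """
    matches = []
    for name, stored in database.items():
        width = len(stored) * FRAME  # window spans the whole known object
        for start in range(0, len(stream) - width + 1, hop):
            window = fingerprint(stream[start:start + width])
            if distance(window, stored) < threshold:
                matches.append((name, start))
    return matches


def tone(envelope, frame=FRAME, cycles=8):
    """Synthetic test signal: a sine tone whose amplitude changes once per frame."""
    return [
        amp * math.sin(2 * math.pi * cycles * n / frame)
        for amp in envelope
        for n in range(frame)
    ]


# Database of known objects, stored only as fingerprints.
jingle = tone([0.2, 0.9, 0.5, 0.7])
database = {
    "jingle": fingerprint(jingle),
    "other": fingerprint(tone([0.9, 0.9, 0.1, 0.1])),
}

# A stream with the jingle embedded between two stretches of silence.
stream = [0.0] * 128 + jingle + [0.0] * 128
matches = scan(stream, database)
print(matches)  # [('jingle', 128)]
```

In this sketch the match is found only because the window happens to line up exactly with the start and end of the embedded object, and every additional database entry adds another full pass over the stream.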
Further, many existing audio identification schemes require accurate registration at the beginning and end of the audio object being identified. Consequently, such schemes tend to work poorly in real-world environments such as broadcast radio stations, where songs are often truncated, cross-faded with other songs at their beginnings and ends, or corrupted by voiceovers, station identifiers, or other noise.
Therefore, what is needed is a computationally efficient system and method for identifying and segmenting audio objects, such as songs, station identifiers, jingles, advertisements, emergency broadcast signals, etc., from an audio stream such as a broadcast radio or television signal. Further, such a system and method should be robust in real-world environments where the endpoints of audio objects are unknown, corrupted, or otherwise noisy.