With the growing ubiquity of internet connectivity and mobile devices, together with advances in media technology, the amount of multimedia data, both freely available and private, has exploded. Acquiring useful and desired information from this mass of multimedia data has become an essential issue. In other words, efficient management and retrieval of information resources from a multimedia database is becoming increasingly important.
Audio retrieval refers to searching for an audio sample in an audio database. Most existing approaches are either example-based or text-based. In an example-based approach, retrieval is performed with the help of an audio example similar to the target audio sample to be retrieved. In a text-based approach, by contrast, retrieval is performed based on text information describing the audio content one is looking for. Specifically, the provided text information is compared and matched with the text information associated with the audio database.
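As an illustration only, the comparison and matching step of a text-based approach can be sketched as follows. This is a minimal sketch under stated assumptions, not a method described in this document: the database contents, the file names, the helper names (tokenize, jaccard, retrieve) and the choice of word-overlap (Jaccard) scoring are all hypothetical.

```python
import re

# Hypothetical database: each audio file is associated with a free-text description.
AUDIO_DB = {
    "clip_001.wav": "bird singing in a forest with wind",
    "clip_002.wav": "city traffic and car horns",
    "clip_003.wav": "wind blowing over a field",
}

def tokenize(text):
    """Lowercase a description and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve(query, database, top_k=3):
    """Return up to top_k file names whose descriptions overlap the query,
    ordered by decreasing similarity (ties broken by file name)."""
    query_tokens = tokenize(query)
    scored = [
        (jaccard(query_tokens, tokenize(description)), name)
        for name, description in database.items()
    ]
    # Keep only entries that share at least one word with the query.
    matches = [(score, name) for score, name in scored if score > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches[:top_k]]

if __name__ == "__main__":
    print(retrieve("bird singing", AUDIO_DB))
    print(retrieve("wind", AUDIO_DB))
```

Word overlap is used here only because it is simple to follow; a practical system might instead use stemming, synonyms or learned text embeddings. An entry without any associated description can never be matched by such a scheme.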
The text-based approach is generally easier and more efficient than the example-based approach. Text information, whether associated with the audio database or provided as a query, is easier for a user to create than a precise audio example. The requirements placed on an audio retrieval system for text-based retrieval are also comparatively low. In addition, example-based approaches are often cumbersome and do not facilitate user interaction with the retrieval system. However, the text-based approach is inapplicable when the audio database lacks associated textual information.
In addition to audio retrieval, audio source separation is another important technique for the efficient use of an audio database. Since a retrieved audio sample, whether obtained by a text-based or an example-based approach, is commonly mixed with other unwanted audio data, an additional step of audio source separation is required to separate the audio data into simpler and more precise audio sources. For example, a mixed audio recording can contain both birdsong and wind sounds, of which the former is the main audio source of interest and the latter is unwanted background sound.
For audio source separation, most existing approaches require some preliminary source examples to achieve an acceptable separation quality; that is, the example-based approach is the more common one. The simpler and more efficient text-based approach has not yet been addressed in the field of audio source separation.