The present invention relates to the field of multimedia information retrieval, and more specifically, to a pattern-based audio searching method and system.
The widespread popularity of the Internet has promoted the rapid development of multimedia information techniques, and the amount of multimedia data available on the Internet has increased sharply. For example, approximately 48 hours of audio/video content is uploaded to YouTube every minute. Such a massive amount of data makes it impossible to preview the data sequentially, and thus data indexing and retrieval have become more challenging tasks.
How to accurately find data files on a desired subject from a data corpus is an active research topic in the field of multimedia information retrieval. For example, a wedding company may want to find a large amount of material based on a few wedding samples in order to compose a final wedding file. A radio producer or a team from a video website may want to search for programs of a type of interest within a mass of data, based on limited materials, to assist in rapid program production. Moreover, users may want to perform automatic label-based archiving of their own multimedia databases for more efficient management.
Compared to video-based retrieval, audio-based retrieval has wider applications, for example, in situations where only audio data is available, such as radio broadcasting. Audio data contains significant information that helps in understanding content, and an audio file is generally smaller than a video file. Therefore, even when a video file must be compressed to the point of visual degradation due to, for example, network upload capacity restrictions, the audio content can remain clear.
However, audio indexing and retrieval methods in the prior art have many defects. First, existing audio indexing and retrieval methods require a large amount of manual labeling. For example, an audio website in general has a large number of unlabeled or roughly labeled files, which lack well-defined descriptions and effective recommended links to related data. Operators have to manually label well-known programs or files with high access counts and recommend related links. Thus, such audio indexing and retrieval methods can only be used in specialized fields and on limited datasets.
Secondly, existing audio indexing and retrieval methods model only the audio labels themselves, resulting in inaccurate indexing and retrieval results. For example, the sound of water splashing has distinct meanings in the context of a natural stream and in the context of a home kitchen. Likewise, the sound of clapping differs across entertainment, talk-show, and sports programs. If a user inputs a stream-splashing sound as a sample and wishes to find similar materials in a multimedia database, existing audio retrieval methods cannot distinguish data files containing the sound of water splashing in a natural-stream pattern from those containing it in a home-kitchen pattern. Clearly, many audio retrieval results are inaccurate when such context information is not taken into account.
Thirdly, existing audio retrieval methods commonly employ a single-round sequential retrieval strategy in which the audio data is first segmented and each segment is then classified. As a result, errors in an earlier step affect the execution of subsequent steps and accumulate in the final retrieval result, yielding an inaccurate result or a result that deviates completely from the retrieval target.
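The error accumulation described above can be illustrated with a minimal sketch (not part of the described invention; the function name and accuracy figures are hypothetical): if each stage of a sequential segment-then-classify pipeline succeeds independently with some probability, and an error at any stage corrupts the final result, the end-to-end accuracy is the product of the per-stage accuracies, and is therefore lower than that of any individual stage.

```python
# Illustrative sketch of error accumulation in a single-round sequential
# "segment then classify" retrieval pipeline. Assumes stage errors are
# independent and any stage error corrupts the final result.

def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy of a sequential pipeline: the product of the
    per-stage accuracies."""
    acc = 1.0
    for a in stage_accuracies:
        acc *= a
    return acc

# Hypothetical per-stage accuracies for segmentation and classification.
segmentation_acc = 0.90
classification_acc = 0.90

end_to_end = pipeline_accuracy([segmentation_acc, classification_acc])
print(round(end_to_end, 4))  # 0.81: worse than either stage alone
```

Even two fairly accurate stages (90% each) yield only 81% end-to-end accuracy, and adding further sequential stages compounds the loss, which is why a single-round strategy can deviate badly from the retrieval target.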