The invention relates to automatically performing content-based indexing on structured multimedia data.
The amount of information generated in today's society is growing exponentially. Moreover, the data is made available in more than one dimension across different media, such as video, audio, and text. This mass of multimedia information poses serious technological challenges in terms of how multimedia data can be integrated, processed, organized, and indexed in a semantically meaningful manner to facilitate effective retrieval.
When the amount of data is small, a user can retrieve desired content in a linear fashion by simply browsing the data sequentially. With the large amounts of data now available, and expected to keep growing, such linear searching is no longer feasible; some abstraction of the content is needed to guide retrieval. One everyday example of such an abstraction is the table of contents of a book. The larger the amount of information, the more levels of abstraction the table of contents needs. For instance, while dividing an article into a few sections may suffice, a book may need subsections or even sub-subsections for lower-level details and chapters for higher-level abstraction. Furthermore, as the number of published books grows rapidly, books are grouped into categories such as physics, mathematics, and computer hardware, or into even higher levels of abstraction such as literature, science, travel, or cooking, to help people choose which books to buy.
Usually, a content structure is designed by the producer before the data is generated and recorded. To enable future content-based retrieval, this intended semantic structure (metadata) should be conveyed to users together with the content (data) as it is delivered. In this way, users can choose what they want based on the description in the metadata. For example, every book or magazine is published together with its table of contents, in which users can look up the page number (index) where the desired information is printed and jump directly to that page.
There are different methods of generating the abstraction or metadata described above. The most intuitive is to do it manually, as in the case of books (table of contents) or broadcast news (closed captions) delivered by major American national broadcast news companies. Because manual generation of an index is very labor intensive, and thus expensive, most types of digital data are in practice still delivered without metadata attached.
The invention provides a system and method for automating the indexing and retrieval of multimedia data. The system and method provide the ability to segment multimedia data, such as news broadcasts, into retrievable units that are directly related to what users perceive as meaningful.
The method may include separating a multimedia data stream into audio, visual and text components, segmenting the audio, visual and text components based on semantic differences, identifying at least one target speaker using the audio and visual components, identifying a topic of the multimedia event using the segmented text and topic category models, generating a summary of the multimedia event based on the audio, visual and text components, the identified topic and the identified target speaker, and generating a multimedia description of the multimedia event based on the identified target speaker, the identified topic, and the generated summary.
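The order of the later steps in this method can be illustrated with a minimal Python sketch. It is not the implementation described here: it assumes the stream has already been separated and segmented, and every name in it (`Segment`, `Description`, `identify_target_speaker`, `identify_topic`, `summarize`, `describe`) is hypothetical. Each stage is a deliberately simple stand-in (speaking time for speaker identification, keyword counts for the topic category models, a first sentence for the summary) so that the data flow is visible.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One already-segmented unit of the stream (hypothetical structure)."""
    start: float   # seconds from the start of the broadcast
    end: float
    speaker: str   # label assumed to come from an audio/visual speaker stage
    text: str      # transcript or closed-caption text for this span


@dataclass
class Description:
    """The multimedia description produced at the end of the pipeline."""
    speaker: str
    topic: str
    summary: str


def identify_target_speaker(segments):
    # Stand-in: the target speaker is whoever speaks for the longest total time.
    totals = Counter()
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return totals.most_common(1)[0][0]


def identify_topic(segments, topic_models):
    # Stand-in for topic category models: count exact keyword matches per topic.
    words = " ".join(seg.text for seg in segments).lower().split()
    scores = {
        topic: sum(words.count(keyword) for keyword in keywords)
        for topic, keywords in topic_models.items()
    }
    return max(scores, key=scores.get)


def summarize(segments, target_speaker):
    # Stand-in: use the first sentence spoken by the target speaker.
    for seg in segments:
        if seg.speaker == target_speaker:
            return seg.text.split(".")[0].strip() + "."
    return ""


def describe(segments, topic_models):
    # Chain the stages: speaker, then topic, then summary, then description.
    speaker = identify_target_speaker(segments)
    topic = identify_topic(segments, topic_models)
    summary = summarize(segments, speaker)
    return Description(speaker=speaker, topic=topic, summary=summary)
```

Calling `describe(segments, topic_models)` with a list of `Segment` objects and a dictionary mapping topic names to keyword lists returns one `Description` for the event. In a real system each stand-in would be replaced by a trained model operating on the audio, visual, and text components.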
In this regard, the method may include automatically identifying a hierarchy of different types of content. Examples of such content include different speakers (e.g., an anchor), news reporting (correspondences or interviews), general news stories, topical news stories, news summaries, or commercials. From such extracted semantics, an indexed table can be constructed that provides a compact yet meaningful abstraction of the data. Compared with conventional linear information browsing or keyword-based search over a flat layer, the indexed table enables non-linear browsing, which is especially desirable when the amount of information is large.
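One way to picture such an indexed table is as a tree whose nodes carry a content type and a time span, so that a user can jump straight to, say, every anchor segment instead of scanning the broadcast linearly. The sketch below is only an illustration under that assumption; the `IndexNode` class, the `find` and `outline` helpers, and the sample titles and times are all hypothetical and are not taken from the method described here.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """A node of a hypothetical hierarchical index over a broadcast."""
    kind: str      # content type, e.g. "broadcast", "story", "anchor"
    start: float   # seconds from the start of the broadcast
    end: float
    title: str = ""
    children: list = field(default_factory=list)

    def add(self, child):
        # Attach a child node and return it so that calls can be nested.
        self.children.append(child)
        return child


def find(node, kind):
    # Depth-first search for every node of the requested content type.
    hits = [node] if node.kind == kind else []
    for child in node.children:
        hits.extend(find(child, kind))
    return hits


def outline(node, depth=0):
    # Render the index as an indented table of contents, one line per node.
    line = f"{'  ' * depth}{node.kind}: {node.title} [{node.start:.0f}-{node.end:.0f}s]"
    lines = [line]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Illustrative data only: a half-hour broadcast with one story and one break.
root = IndexNode("broadcast", 0, 1800, "Evening news")
story = root.add(IndexNode("story", 60, 300, "Budget vote"))
story.add(IndexNode("anchor", 60, 90, "Introduction"))
story.add(IndexNode("interview", 90, 300, "Senator interview"))
root.add(IndexNode("commercial", 300, 420, "Break"))

if __name__ == "__main__":
    print("\n".join(outline(root)))
```

Here `find(root, "anchor")` returns the anchor segments with their time spans, which is the non-linear access a flat keyword list cannot offer, while `outline(root)` plays the role of the table of contents.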