The present invention relates to content-based processing of multimedia data, and more particularly to creation and use of attributes of multimedia data that are descriptive of the content thereof.
Multimedia information typically exists in various inhomogeneous forms, including, for example, digital, analogue (e.g., VCR magnetic tape and audio magnetic tape), optical (e.g., conventional film), image (e.g., pictures and drawings on paper), and so forth. The ability to locate this multimedia information is important in modern society, and is particularly important in various professional and consumer applications such as, for example, education, journalism (e.g., searching speeches of a certain politician using his name, his voice or his face), tourist information, cultural services (e.g., history museums, art galleries, and so forth), entertainment (e.g., searching for a game or for karaoke titles), investigation services (e.g., human characteristics recognition and forensics), geographical information systems, remote sensing (e.g., cartography, ecology, natural resources management, and so forth), surveillance (e.g., traffic control, surface transportation, nondestructive testing in hostile environments, and so forth), biomedical applications, shopping (e.g., searching for clothes that you like), architecture, real estate, interior design, social (e.g., dating services), and film, video and radio archives. Unfortunately, present systems are not thorough, quick or efficient in searching multimedia information; see, e.g., International Organisation for Standardisation ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, MPEG-7 Applications Document V.8, No. N2728, March 1999, which is hereby incorporated herein by reference in its entirety.
An important step in support of searching multimedia information is to represent it in a form that is searchable using modern computer systems. Much interest has been expressed in developing forms of audio-visual information representation that go beyond the simple waveform or sample-based representations, the compression-based representations such as MPEG-1 and MPEG-2, and the object-based representations such as MPEG-4, and that can be passed on to, or accessed by, a device or computer code. Numerous proprietary solutions have been developed for describing multimedia content, for extracting the representations, and for querying the resulting collections of representations, but these have only caused heterogeneous multimedia information to proliferate further and have exacerbated the difficulties of conducting quick and efficient searches of multimedia information.
A “descriptor” is a representation of a feature, a “feature” being a distinctive characteristic of multimedia information regardless of the media or technology of the multimedia information and regardless of how the multimedia information is stored, coded, displayed, or transmitted. Since descriptors used in different proprietary multimedia information retrieval systems are not necessarily compatible, interest has been expressed in creating a standard for describing multimedia content data that will support the operational requirements of computational systems that create, exchange, retrieve, and/or reuse multimedia information. Examples include computational systems designed for image understanding (e.g., surveillance, intelligent vision, smart cameras), media conversion (e.g., speech to text, picture to speech, speech to picture), information retrieval (quickly and efficiently searching for various types of multimedia documents of interest to the user), and filtering in a stream of audio-visual content description (to receive only those multimedia data items which satisfy the user's preferences).
Accordingly, a need exists for a standard for describing multimedia content data that will support these operational requirements as well as other operational requirements yet to be developed.