The present invention relates to content-based processing of multimedia data, and more particularly to creation and use of attributes of multimedia data that are descriptive of the content thereof.
Multimedia information typically exists in various inhomogeneous forms, including, for example, digital, analogue (e.g., VCR magnetic tape and audio magnetic tape), optical (e.g., conventional film), image (e.g., pictures and drawings on paper), and so forth. The ability to locate this multimedia information is important in modern society, and is particularly important in various professional and consumer applications such as, for example, education, journalism (e.g., searching speeches of a certain politician using his name, his voice or his face), tourist information, cultural services (e.g., history museums, art galleries, and so forth), entertainment (e.g., searching for a game or for karaoke titles), investigation services (e.g., human characteristics recognition and forensics), geographical information systems, remote sensing (e.g., cartography, ecology, natural resources management, and so forth), surveillance (e.g., traffic control, surface transportation, non-destructive testing in hostile environments, and so forth), biomedical applications, shopping (e.g., searching for clothes that you like), architecture, real estate, interior design, social (e.g., dating services), and film, video and radio archives. Unfortunately, present systems are not thorough, quick or efficient in searching multimedia information; see, e.g., International Organisation for Standardisation ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, MPEG-7 Applications Document V.8, No. N2728, March 1999, which is hereby incorporated herein by reference in its entirety.
An important step in support of searching multimedia information is to represent it in a form that is searchable using modern computer systems. Much interest has been expressed in developing forms of audio-visual information representation that go beyond the simple waveform or sample-based representations, the compression-based representations such as MPEG-1 and MPEG-2, and the object-based representations such as MPEG-4, and that can be passed on to, or accessed by, a device or a computer code. Numerous proprietary solutions have been developed for describing multimedia content and for extracting the representations and querying the resulting collections of representations, but these have only proliferated yet more heterogeneous multimedia information and exacerbated the difficulties of conducting quick and efficient searches of multimedia information.
A “descriptor” is a representation of a feature, a “feature” being a distinctive characteristic of multimedia information regardless of the media or technology of the multimedia information and regardless of how the multimedia information is stored, coded, displayed, and transmitted. Since descriptors used in different proprietary multimedia information retrieval systems are not necessarily compatible, interest has been expressed in creating a standard for describing multimedia content data that will support the operational requirements of computational systems that create, exchange, retrieve, and/or reuse multimedia information. Examples include computational systems designed for image understanding (e.g., surveillance, intelligent vision, smart cameras), media conversion (e.g., speech to text, picture to speech, speech to picture), and information retrieval (quickly and efficiently searching for various types of multimedia documents of interest to the user) and filtering (to receive only those multimedia data items which satisfy the user's preferences) in a stream of audio-visual content description.
Accordingly, a need exists for a standard for describing multimedia content data that will support these operational requirements as well as other operational requirements yet to be developed.
Accordingly, an object of the present invention as realized in particular embodiments is to improve the efficiency of retrieval of multimedia information from a repository.
Another object of the present invention as realized in particular embodiments is to improve the speed of retrieval of multimedia information from a repository.
Yet another object of the present invention as realized in particular embodiments is to provide a standard representation of a feature of multimedia information.
These and other objects are achieved in the various embodiments of the present invention. For example, one embodiment of the present invention is a method of representing a plurality of multimedia information, comprising acquiring descriptors for the multimedia information, generating at least one meta-descriptor for the descriptors, and attaching the at least one meta-descriptor to the multimedia information.
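By way of illustration only, the three steps of this embodiment (acquiring descriptors, generating a meta-descriptor, and attaching it) might be sketched as follows. The sketch is not the claimed implementation. The `Item` class, the descriptor names `color` and `texture`, and the use of variance across the collection as a relevance weight are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from statistics import pvariance


@dataclass
class Item:
    """Hypothetical multimedia item carrying named descriptor vectors."""
    name: str
    descriptors: dict[str, list[float]]
    meta_descriptor: dict[str, float] = field(default_factory=dict)


def generate_meta_descriptor(items: list[Item]) -> dict[str, float]:
    """Weight each descriptor type by how much it varies across the items.

    Assumption (not from the source): a descriptor that varies more across
    the collection discriminates better, so it receives a larger weight.
    Weights are normalised to sum to 1.
    """
    names = sorted({n for it in items for n in it.descriptors})
    spread = {}
    for n in names:
        vectors = [it.descriptors[n] for it in items if n in it.descriptors]
        # Sum of the per-component population variances.
        spread[n] = sum(pvariance(col) for col in zip(*vectors))
    total = sum(spread.values())
    if total == 0:
        # No descriptor varies: fall back to equal weights.
        return {n: 1.0 / len(names) for n in names}
    return {n: s / total for n, s in spread.items()}


def attach_meta_descriptor(items: list[Item], meta: dict[str, float]) -> None:
    """Attach a copy of the meta-descriptor to every item."""
    for it in items:
        it.meta_descriptor = dict(meta)


# Step 1: descriptors are assumed to have been acquired already.
items = [
    Item("a", {"color": [0.0, 1.0], "texture": [0.5]}),
    Item("b", {"color": [1.0, 0.0], "texture": [0.5]}),
]
meta = generate_meta_descriptor(items)   # step 2
attach_meta_descriptor(items, meta)      # step 3
```

In this toy collection the `texture` descriptor is identical for both items, so under the stated assumption all of the weight falls on `color`.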
Another embodiment of the present invention is a method of representing a plurality of multimedia information which collectively is of various content types, comprising acquiring descriptors for the multimedia information, generating clusters of the descriptors, generating meta-descriptors for the clusters, and respectively attaching the meta-descriptors for the clusters to items of the multimedia information described by the descriptors in the clusters.
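A minimal sketch of this clustered variant is given below under stated assumptions. The text above does not prescribe a clustering algorithm, so the greedy "leader" clustering, the Euclidean distance, the `radius` parameter, and the contents of each cluster meta-descriptor (cluster index, centroid, size) are all hypothetical choices.

```python
import math


def leader_clusters(vectors, radius):
    """Greedy 'leader' clustering (an assumed technique).

    A vector joins the first cluster whose leader (first member) lies
    within `radius`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of item keys, leader first
    for key, vec in vectors.items():
        for members in clusters:
            if math.dist(vec, vectors[members[0]]) <= radius:
                members.append(key)
                break
        else:
            clusters.append([key])
    return clusters


def cluster_meta_descriptors(vectors, clusters):
    """Build one hypothetical meta-descriptor per cluster."""
    metas = []
    for idx, members in enumerate(clusters):
        columns = zip(*(vectors[k] for k in members))
        centroid = [sum(col) / len(members) for col in columns]
        metas.append({"cluster": idx, "centroid": centroid, "size": len(members)})
    return metas


def attach_to_items(clusters, metas):
    """Map each item key to the meta-descriptor of its own cluster."""
    return {key: meta for members, meta in zip(clusters, metas) for key in members}


# Hypothetical descriptor vectors for items of different content types.
vectors = {
    "sunset.jpg": [0.9, 0.1],
    "beach.jpg": [0.8, 0.2],
    "speech.wav": [0.1, 0.9],
}
clusters = leader_clusters(vectors, radius=0.3)
metas = cluster_meta_descriptors(vectors, clusters)
attached = attach_to_items(clusters, metas)
```

Here the two image items fall into one cluster and the audio item into another, so each item ends up carrying only the meta-descriptor of the cluster that contains its descriptor.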
A further embodiment of the present invention is a method of searching multimedia information in a repository described by descriptors using a query multimedia information item, comprising acquiring meta-descriptors of the repository descriptors, selecting query multimedia information, extracting at least one descriptor from the query multimedia information based on the meta-descriptors to obtain at least one query descriptor, comparing the query descriptor with the repository descriptors, and ranking at least some of the multimedia information in the repository in accordance with the comparing step.
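The search steps above might be sketched as follows. This is an illustration under assumptions, not the claimed method: the meta-descriptor is modelled as a mapping from descriptor name to relevance weight, the `extractors` table and its lambda functions are hypothetical, the comparison is a weighted Euclidean distance, and every repository item is assumed to carry every descriptor named in the meta-descriptor.

```python
import math


def extract_query_descriptors(query, extractors, meta, min_weight=0.0):
    """Extract only the descriptors the meta-descriptor marks as relevant."""
    return {
        name: extractors[name](query)
        for name, weight in meta.items()
        if weight > min_weight and name in extractors
    }


def weighted_distance(query_desc, item_desc, meta):
    """Sum of per-descriptor Euclidean distances, scaled by relevance."""
    return sum(
        meta[name] * math.dist(qvec, item_desc[name])
        for name, qvec in query_desc.items()
    )


def rank(query_desc, repository, meta):
    """Return repository keys ordered from closest to farthest."""
    scored = [
        (weighted_distance(query_desc, desc, meta), key)
        for key, desc in repository.items()
    ]
    return [key for _, key in sorted(scored)]


# Hypothetical repository descriptors and their meta-descriptor.
repository = {
    "sunset": {"color": [1.0, 0.0, 0.0], "texture": [0.2]},
    "forest": {"color": [0.0, 1.0, 0.0], "texture": [0.9]},
    "rose": {"color": [0.7, 0.3, 0.0], "texture": [0.5]},
}
meta = {"color": 1.0, "texture": 0.0}

# Hypothetical extractors that read precomputed values from the query item.
extractors = {
    "color": lambda q: q["rgb"],
    "texture": lambda q: q["edges"],
}
query = {"rgb": [0.9, 0.1, 0.0], "edges": [0.3]}

query_desc = extract_query_descriptors(query, extractors, meta)
ranking = rank(query_desc, repository, meta)
```

Because the meta-descriptor gives `texture` zero weight, no texture descriptor is extracted from the query at all, which is one way the meta-descriptor can save work at query time.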
Another embodiment of the present invention is a method of retrieving multimedia information from a repository, comprising extracting repository descriptors from the multimedia information in the repository, generating clusters of the repository descriptors, indexing the repository descriptors to the multimedia information in the repository, generating meta-descriptors for the clusters, attaching the meta-descriptors for the clusters to the respective multimedia information in the clusters, selecting query multimedia information, extracting at least one descriptor from the query multimedia information based on the meta-descriptors to obtain at least one query descriptor, comparing the query descriptor with the repository descriptors, and ranking at least some of the multimedia information in the repository in accordance with the comparing step.
A further embodiment of the present invention is a data structure for representing information about a plurality of descriptors that are representations of features of an item of multimedia information belonging to a particular category of multimedia content, comprising a plurality of data elements indicating relevancy of the descriptors in describing the item of multimedia information.
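One possible in-memory shape for such a data structure is sketched below. The class names, field names, and the 0-to-1 relevance scale are assumptions made for illustration. The text above requires only a plurality of data elements indicating the relevancy of the descriptors for a given content category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorRelevance:
    """One data element: a descriptor name and its assumed 0..1 relevance."""
    descriptor: str
    relevance: float


@dataclass(frozen=True)
class MetaDescriptor:
    """Relevance of several descriptors for one content category."""
    category: str
    elements: tuple[DescriptorRelevance, ...]

    def relevance_of(self, descriptor: str) -> float:
        """Return the stored relevance, or 0.0 if the descriptor is absent."""
        for element in self.elements:
            if element.descriptor == descriptor:
                return element.relevance
        return 0.0

    def most_relevant(self, k: int = 1) -> list[str]:
        """Names of the k descriptors with the highest relevance."""
        ordered = sorted(self.elements, key=lambda e: e.relevance, reverse=True)
        return [e.descriptor for e in ordered[:k]]


# Hypothetical meta-descriptor for a category of outdoor photographs.
outdoor = MetaDescriptor(
    category="outdoor photographs",
    elements=(
        DescriptorRelevance("color", 0.7),
        DescriptorRelevance("texture", 0.2),
        DescriptorRelevance("shape", 0.1),
    ),
)
top_two = outdoor.most_relevant(2)
```

The structure is frozen (immutable) in this sketch so that one instance can safely be shared by every item in the category. That is a design choice of the example rather than a requirement of the embodiment.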