Digital multimedia information is becoming widely distributed through broadcast transmission, such as digital television signals, and interactive transmission, such as the Internet. The information may be in still images, audio feeds, or video data streams. However, the availability of such a large volume of information has led to difficulties in identifying content that is of particular interest to a user. Various organizations have attempted to deal with the problem by providing a description of the information that can be used to search, filter and/or browse to locate the particular content. The Moving Picture Experts Group (MPEG) has promulgated a Multimedia Content Description Interface standard, commonly referred to as MPEG-7, to standardize the content descriptions for multimedia information. In contrast to preceding MPEG standards such as MPEG-1 and MPEG-2, which define coded representations of audio-visual content, an MPEG-7 content description describes the structure and semantics of the content and not the content itself.
Using a movie as an example, a corresponding MPEG-7 content description would contain “descriptors” (D), which are components that describe the features of the movie, such as scenes, titles for scenes, shots within scenes, time, color, shape, motion, and audio information for the shots. The content description would also contain one or more “description schemes” (DS), which are components that describe relationships among two or more descriptors and/or description schemes, such as a shot description scheme that relates together the features of a shot. A description scheme can also describe the relationship among other description schemes, and between description schemes and descriptors, such as a scene description scheme that relates the different shots in a scene, and relates the title feature of the scene to the shots.
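The containment relationship described above can be sketched as a small data structure. The following is a minimal illustration only: the class names `Descriptor` and `DescriptionScheme`, the helper `count_schemes`, and the sample feature values are hypothetical and are not taken from the MPEG-7 standard.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Descriptor:
    """A single feature of the content, such as a title or a color."""
    name: str
    value: str


@dataclass
class DescriptionScheme:
    """Relates descriptors and/or nested description schemes."""
    name: str
    children: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)


def count_schemes(ds: DescriptionScheme, name: str) -> int:
    """Count description schemes with the given name nested anywhere below ds."""
    total = 0
    for child in ds.children:
        if isinstance(child, DescriptionScheme):
            if child.name == name:
                total += 1
            total += count_schemes(child, name)
    return total


def make_shot(index: int) -> DescriptionScheme:
    # A shot description scheme relates together the features of one shot.
    # The feature values here are placeholders, not real descriptor values.
    return DescriptionScheme(
        "Shot",
        [Descriptor("Time", f"shot-{index}"), Descriptor("Color", "unspecified")],
    )


# A scene description scheme relates its shots and ties the scene title to them.
scene = DescriptionScheme(
    "Scene", [Descriptor("Title", "Opening"), make_shot(1), make_shot(2)]
)
movie = DescriptionScheme("Movie", [scene])
```

In this sketch a scene holds both a descriptor (its title) and nested description schemes (its shots), mirroring the scene description scheme example in the text.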
MPEG-7 uses a Data Definition Language (DDL) that specifies the language for defining the standard set of description tools (DS, D) and for defining new description tools, and that provides a core set of descriptors and description schemes. The DDL definitions for a set of descriptors and description schemes are organized into “schemas” for different classes of content. The DDL definition for each descriptor in a schema specifies the syntax and semantics of the corresponding feature. The DDL definition for each description scheme in a schema specifies the structure and semantics of the relationships among its child components, namely the descriptors and description schemes. The DDL may be used to modify and extend the existing description schemes and to create new description schemes and descriptors.
The MPEG-7 DDL is based on the XML (Extensible Markup Language) and XML Schema standards. The descriptors, description schemes, semantics, syntax, and structures are represented with XML elements and XML attributes. Some of the XML elements and attributes may be optional.
The MPEG-7 content description for a particular piece of content is defined as an instance of an MPEG-7 schema; that is, it contains data that adheres to the syntax and semantics defined in the schema. The content description is encoded in an “instance document” that references the appropriate schema. The instance document contains a set of “descriptor values” for the required elements and attributes defined in the schema, and for any necessary optional elements and/or attributes. For example, some of the descriptor values for a particular movie might specify that the movie has three scenes, with scene one having six shots, scene two having five shots, and scene three having ten shots. The instance document may be encoded in a textual format using XML, or in a binary format, such as the binary format specified for MPEG-7 data, known as “BiM,” or a mixture of the two formats.
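The movie example above can be illustrated with a short sketch that builds a textual instance document. The element names `Movie`, `Scene`, and `Shot`, the `id` and `schema` attributes, and the schema identifier `urn:example:movie-schema` are assumptions made for illustration; they are not the element names defined by any actual MPEG-7 schema.

```python
import xml.etree.ElementTree as ET

# Shot counts per scene, taken from the example in the text.
SHOTS_PER_SCENE = [6, 5, 10]


def build_instance(shots_per_scene):
    """Build a toy instance document with one Scene element per entry."""
    # The "schema" attribute stands in for the reference to the governing schema.
    root = ET.Element("Movie", {"schema": "urn:example:movie-schema"})
    for scene_no, shot_count in enumerate(shots_per_scene, start=1):
        scene = ET.SubElement(root, "Scene", {"id": str(scene_no)})
        for shot_no in range(1, shot_count + 1):
            ET.SubElement(scene, "Shot", {"id": str(shot_no)})
    return root


instance = build_instance(SHOTS_PER_SCENE)

# Textual (XML) encoding of the instance document.
xml_text = ET.tostring(instance, encoding="unicode")

# Read the descriptor values back out of the document.
shot_counts = [len(scene.findall("Shot")) for scene in instance.findall("Scene")]
```

Here `shot_counts` recovers the three-scene structure from the document itself, which is the kind of information a receiving system would use to search or browse the content.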
The instance document is transmitted through a communication channel, such as a computer network, to another system that uses the content description data contained in the instance document to search, filter and/or browse the corresponding content data stream. Typically, the instance document is compressed for faster transmission. An encoder component may both encode and compress the instance document or the functions may be performed by different components. Furthermore, the instance document may be generated by one system and subsequently transmitted by a different system. A corresponding decoder component at the receiving system uses the referenced schema to decode the instance document. The schema may be transmitted to the decoder separately from the instance document, as part of the same transmission, or obtained by the receiving system from another source. Alternatively, certain schemas may be incorporated into the decoder.
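The encode, compress, transmit, and decode sequence described above might be sketched as follows. This is an illustration under stated assumptions: it uses general-purpose zlib compression rather than the BiM binary format, and the `REQUIRED_ATTRS` table is a hypothetical stand-in for a schema that has been incorporated into the decoder. The function names `encode` and `decode` are likewise illustrative.

```python
import xml.etree.ElementTree as ET
import zlib

# Hypothetical stand-in for a schema built into the decoder:
# the required attributes for each permitted element name.
REQUIRED_ATTRS = {"Movie": [], "Scene": ["id"], "Shot": ["id"]}


def encode(xml_text: str) -> bytes:
    """Encode the textual instance document as UTF-8 and compress it."""
    return zlib.compress(xml_text.encode("utf-8"))


def decode(payload: bytes, required_attrs) -> ET.Element:
    """Decompress and parse the document, then check it against the schema stand-in."""
    root = ET.fromstring(zlib.decompress(payload).decode("utf-8"))
    for elem in root.iter():
        if elem.tag not in required_attrs:
            raise ValueError(f"element not defined in schema: {elem.tag}")
        for attr in required_attrs[elem.tag]:
            if attr not in elem.attrib:
                raise ValueError(f"{elem.tag} is missing required attribute {attr}")
    return root


document = '<Movie><Scene id="1"><Shot id="1"/><Shot id="2"/></Scene></Movie>'
payload = encode(document)                 # sending side
decoded = decode(payload, REQUIRED_ATTRS)  # receiving side
```

Keeping `encode` and `decode` as separate functions reflects the point in the text that the encoding and decoding components may reside on different systems, with the schema available to the decoder independently of the instance document.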
The content description may be transmitted prior to, or subsequent to, the content that it describes, or may be transmitted along with the content. For example, MPEG-2 provides mechanisms for the inclusion of a content metadata stream with the content video stream. An MPEG-7 content description may be transmitted in this additional data stream. However, some standards may not allow for such an additional stream, and no current standards allow for the synchronization of the descriptive data with its associated content. That is, with current standards, the content description's descriptor values and the multimedia content they describe (e.g., scene, shot, frame) are not synchronized for delivery and presentation. For example, the current MPEG-7 standard lacks the necessary tools to map the timed transport of MPEG-7 data onto arbitrary delivery layers, such as MPEG-2 and MPEG-4, to achieve synchronization with the multimedia content.