Digital multimedia information is becoming widely distributed through broadcast transmission, such as digital television signals, and interactive transmission, such as the Internet. The information may take the form of still images, audio feeds, or video data streams. However, the availability of such a large volume of information has made it difficult to identify content that is of particular interest to a user. Various organizations have attempted to address the problem by providing descriptions of the information that can be used to search, filter and/or browse for particular content. The Moving Picture Experts Group (MPEG) has promulgated a Multimedia Content Description Interface standard, commonly referred to as MPEG-7, to standardize content descriptions for multimedia information. In contrast to preceding MPEG standards such as MPEG-1 and MPEG-2, which define coded representations of audio-visual content, an MPEG-7 content description describes the structure and semantics of the content, not the content itself.
Using a movie as an example, a corresponding MPEG-7 content description would contain "descriptors," which are components that describe the features of the movie, such as scenes, titles for scenes, shots within scenes, and time, color, shape, motion, and audio information for the shots. The content description would also contain one or more "description schemes," which are components that describe relationships among two or more descriptors, such as a shot description scheme that relates together the features of a shot. A description scheme can also describe relationships among other description schemes, and between description schemes and descriptors. For example, a scene description scheme relates the different shots in a scene to one another and relates the title feature of the scene to those shots.
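The nesting of descriptors inside description schemes can be pictured as a small tree. The sketch below is illustrative only. The class names `Descriptor` and `DescriptionScheme` and the feature values are hypothetical and are not MPEG-7 DDL types. A shot scheme groups a shot's features, and a scene scheme relates a title descriptor to two shot schemes.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical classes for illustration; these are not MPEG-7 DDL types.

@dataclass
class Descriptor:
    """A single feature of the content, e.g. a shot's dominant color."""
    name: str
    value: object

@dataclass
class DescriptionScheme:
    """Relates child components: descriptors and/or other description schemes."""
    name: str
    children: List[Union["DescriptionScheme", Descriptor]] = field(default_factory=list)

    def descriptors(self) -> List[Descriptor]:
        """Collect every descriptor reachable from this scheme, depth first."""
        found = []
        for child in self.children:
            if isinstance(child, Descriptor):
                found.append(child)
            else:
                found.extend(child.descriptors())
        return found

# A shot description scheme relates together the features of one shot.
shot1 = DescriptionScheme("Shot", [
    Descriptor("Time", "00:00:00-00:00:04"),
    Descriptor("Color", "brown"),
    Descriptor("Motion", "pan left"),
])
shot2 = DescriptionScheme("Shot", [
    Descriptor("Time", "00:00:04-00:00:09"),
    Descriptor("Color", "blue"),
])
# A scene description scheme relates the scene title (a descriptor)
# to its shots (other description schemes).
scene = DescriptionScheme("Scene", [Descriptor("Title", "The Park"), shot1, shot2])

print(len(scene.descriptors()))  # 6
```

Because a `DescriptionScheme` may contain other schemes, the same traversal works at any depth. The traversal could, for example, extend to a movie scheme that groups several scene schemes.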
MPEG-7 uses a Data Definition Language (DDL) to define descriptors and description schemes, and provides a core set of descriptors and description schemes. The DDL definitions for a set of descriptors and description schemes are organized into "schemas" for different classes of content. The DDL definition for each descriptor in a schema specifies the syntax and semantics of the corresponding feature. The DDL definition for each description scheme in a schema specifies the structure and semantics of the relationships among its child components, which may be descriptors and/or other description schemes. The DDL may be used to modify and extend the existing description schemes and to create new description schemes and descriptors.
The MPEG-7 DDL is based on the Extensible Markup Language (XML) and XML Schema standards. The descriptors, description schemes, semantics, syntax, and structures are represented with XML elements and XML attributes, some of which may be optional.
The MPEG-7 content description for a particular piece of content is an instance of an MPEG-7 schema; that is, it contains data that adheres to the syntax and semantics defined in the schema. The content description is encoded in an “instance document” that references the appropriate schema. The instance document contains a set of “descriptor values” for the required elements and attributes defined in the schema, and for any necessary optional elements and/or attributes. For example, some of the descriptor values for a particular movie might specify that the movie has three scenes, with scene one having six shots, scene two having five shots, and scene three having ten shots. The instance document may be encoded in a textual format using XML, or in a binary format, such as the binary format specified for MPEG-7 data, known as “BiM,” or a mixture of the two formats.
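The movie example above can be sketched as a small textual XML instance document built with Python's standard `xml.etree.ElementTree`. The element names `Movie`, `Scene`, and `Shot` and the `id` scheme are hypothetical placeholders. A real MPEG-7 instance document would use the element and type names from the referenced MPEG-7 schema and would carry the schema reference, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; a real MPEG-7 instance document uses the
# names defined in the MPEG-7 schema it references.
SHOTS_PER_SCENE = [6, 5, 10]  # scene one: 6 shots, scene two: 5, scene three: 10

def build_instance(shots_per_scene):
    """Build a textual XML instance document with one Scene element per entry,
    each containing the given number of Shot elements."""
    movie = ET.Element("Movie")
    for scene_no, shot_count in enumerate(shots_per_scene, start=1):
        scene = ET.SubElement(movie, "Scene", id=f"scene{scene_no}")
        for shot_no in range(1, shot_count + 1):
            ET.SubElement(scene, "Shot", id=f"scene{scene_no}-shot{shot_no}")
    return movie

root = build_instance(SHOTS_PER_SCENE)
xml_text = ET.tostring(root, encoding="unicode")

# The descriptor values can be read back from the instance document.
for scene in root.findall("Scene"):
    print(scene.get("id"), len(scene.findall("Shot")))
# scene1 6
# scene2 5
# scene3 10
```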
The instance document is transmitted through a communication channel, such as a computer network, to another system that uses the content description data contained in the instance document to search, filter and/or browse the corresponding content data stream. Typically, the instance document is compressed for faster transmission. A single encoder component may both encode and compress the instance document, or the two functions may be performed by different components. Furthermore, the instance document may be generated by one system and subsequently transmitted by a different system. A corresponding decoder component at the receiving system uses the referenced schema to decode the instance document. The schema may be transmitted to the decoder separately from the instance document, sent as part of the same transmission, or obtained by the receiving system from another source. Alternatively, certain schemas may be incorporated into the decoder.
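The following is a minimal sketch of the encode, transmit, and decode round trip for a textual instance document. It assumes generic `zlib` compression as a stand-in. `zlib` is not the MPEG-7 BiM binary format, and the functions `encode` and `decode` are hypothetical. The sketch also skips schema-driven decoding: a real decoder would consult the referenced schema, while this one only parses well-formed XML.

```python
import zlib
import xml.etree.ElementTree as ET

# Sketch only: zlib stands in for the compression step; it is not BiM.

def encode(instance_xml: str) -> bytes:
    """Sender side: serialize the textual instance document and compress it."""
    return zlib.compress(instance_xml.encode("utf-8"), level=9)

def decode(payload: bytes) -> ET.Element:
    """Receiver side: decompress and parse the instance document.
    A real MPEG-7 decoder would also use the referenced schema here."""
    return ET.fromstring(zlib.decompress(payload).decode("utf-8"))

# Hypothetical instance document: three scenes with 6, 5, and 10 shots.
doc = "<Movie>" + "".join(
    f"<Scene id='s{i}'>" + "<Shot/>" * n + "</Scene>"
    for i, n in enumerate([6, 5, 10], start=1)
) + "</Movie>"

payload = encode(doc)
root = decode(payload)
print(len(doc.encode("utf-8")), "bytes ->", len(payload), "bytes")
print([len(scene) for scene in root])  # [6, 5, 10]
```

Keeping `encode` and `decode` as separate functions mirrors the point that encoding and compression, and generation and transmission, may be handled by different components.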
Description schemes directed to describing content generally relate to either the structure or the semantics of the content. Structure-based description schemes are typically defined in terms of segments that represent physical, spatial and/or temporal features of the content, such as regions, scenes, and shots, and the relationships among them. The details of the segments are typically described in terms of signal-level features such as color, texture, shape, and motion. Semantic-based description schemes, by contrast, describe the content in terms of what it depicts, such as objects, people, events, and their relationships. Depending on the user domain and application, the content can be described using different types of features tuned to the area of application. For example, the content can be described at a low abstraction level using descriptions of such content features as objects' shapes, sizes, textures, colors, movements, and positions. At a higher abstraction level, a description scheme may provide conceptual information about the reality captured by the content, such as information about objects, events, and interactions among objects. For example, a high abstraction level description may provide the following semantic information: "This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background."
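To make the contrast between abstraction levels concrete, the sketch below encodes the dog-and-ball scene twice. The first encoding holds low-level, signal-derived region features. The second holds high-level semantic statements written as subject, predicate, and object triples. All names (`RegionFeatures`, `Relation`, the predicates such as `hasColor`) are hypothetical simplifications and are not MPEG-7 semantic description schemes. Notice that nothing in the low-level records says "dog" or "ball"; bridging that gap is the part that still calls for human input.

```python
from dataclasses import dataclass

# Hypothetical, simplified representations for illustration only.

@dataclass
class RegionFeatures:
    """Low abstraction level: signal-derived features of one region."""
    region_id: str
    dominant_color: str
    position: str  # e.g. "left" or "right" within the frame
    motion: str

@dataclass(frozen=True)
class Relation:
    """High abstraction level: a (subject, predicate, object) statement."""
    subject: str
    predicate: str
    obj: str

low_level = [
    RegionFeatures("r1", "brown", "left", "static"),
    RegionFeatures("r2", "blue", "right", "falling"),
]

high_level = {
    Relation("dog", "hasColor", "brown"),
    Relation("dog", "performs", "barking"),
    Relation("dog", "locatedAt", "left"),
    Relation("ball", "hasColor", "blue"),
    Relation("ball", "performs", "falling"),
    Relation("ball", "locatedAt", "right"),
    Relation("passing cars", "produce", "background sound"),
}

def about(entity: str):
    """Return every semantic statement whose subject is the given entity."""
    return sorted((r.predicate, r.obj) for r in high_level if r.subject == entity)

print(about("ball"))
# [('hasColor', 'blue'), ('locatedAt', 'right'), ('performs', 'falling')]
```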
Current methods for constructing semantic descriptions allow many low-level descriptions to be created automatically. However, constructing high-level descriptions still requires significant human interaction. One reason is that MPEG-7 semantic descriptions lack a formal specification of logic that would allow computer science specialists to develop software for automatically constructing semantic descriptions at any abstraction level.