Digital multimedia information is becoming widely distributed through broadcast transmission, such as digital television signals, and interactive transmission, such as the Internet. The information may be in still images, audio feeds, or video data streams. The Moving Picture Experts Group (MPEG) has promulgated a Multimedia Content Description Interface, commonly referred to as MPEG-7, to standardize the description of multimedia information when it is transmitted from a system that generates the information to a system that uses the information.
MPEG-7 defines a generic structure for describing multimedia content. For example, the structure for a standard movie could include scenes, shots within scenes, titles for scenes, and time, color, shape, motion, and audio feature information for shots. The corresponding description would contain a “descriptor” component that describes features of the content, such as color, shape, motion, frequency, title, etc., and a “description scheme” component that describes relationships among two or more descriptors, e.g., a shot description scheme that relates together the features of a shot. For example, Fourier descriptors and polygon vertices are representations, or descriptors, for a shape feature. A description scheme can also describe the relationship among other description schemes and between description schemes and descriptors, e.g., a scene description scheme that relates the different shots in a scene and relates the title of the scene to the shots.
The structure and format of an MPEG-7 content description are defined by a schema written in a Description Definition Language (DDL), which is designed to define descriptors and description schemes. For each descriptor, the schema specifies the syntax and semantics of the corresponding feature. For each description scheme, the schema specifies the structure and semantics of the relationships among its child components, which are descriptors and description schemes. The DDL for MPEG-7 multimedia content is based on the XML (Extensible Markup Language) and XML Schema standards. The descriptors, description schemes, semantics, syntax, and structures of the content description are represented with XML elements and XML attributes. Some of the XML elements and attributes may be optional.
A multimedia content description is encoded in an XML “instance document” that references the appropriate schema and is an instance of the schema defined in DDL; that is, it contains data that adheres to the syntax and semantics defined in the DDL schema. The instance document contains a set of “descriptor values” for the required elements and attributes in the schema and for any necessary optional elements and/or attributes. An instance document is transmitted from a system having or generating the description data, through a communication channel such as a computer network, to another system that will consume the multimedia content description data contained in the instance document.
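By way of illustration only, the following Python sketch builds a small instance document of the kind described above, with a scene description scheme that relates a title descriptor to two shot description schemes. The element and attribute names (Scene, Title, Shot, DominantColor, Motion, id) are hypothetical and are not taken from the MPEG-7 schema; no schema validation is performed.

```python
import xml.etree.ElementTree as ET


def build_scene_description(title, shots):
    """Build a hypothetical scene description.

    The Scene element plays the role of a description scheme; Title,
    DominantColor, and Motion play the role of descriptors whose text
    content is the descriptor value.
    """
    scene = ET.Element("Scene")
    ET.SubElement(scene, "Title").text = title
    for shot in shots:
        shot_el = ET.SubElement(scene, "Shot", id=shot["id"])
        ET.SubElement(shot_el, "DominantColor").text = shot["color"]
        # Motion is treated as an optional element: it is emitted only
        # when a value is available for the shot.
        if "motion" in shot:
            ET.SubElement(shot_el, "Motion").text = shot["motion"]
    return scene


scene = build_scene_description(
    "Opening",
    [
        {"id": "s1", "color": "blue", "motion": "pan-left"},
        {"id": "s2", "color": "red"},
    ],
)
# Textual form of the instance document.
xml_text = ET.tostring(scene, encoding="unicode")
```

In this sketch the second shot omits the optional Motion element, so the instance document carries descriptor values only for the elements that are required or actually needed.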
When transmitting a multimedia description represented in XML, it is possible either to send the data in textual form or to encode the XML data into a binary form, such as the binary format specified for MPEG-7 data, known as “BiM”.
FIG. 1 illustrates one example of transmitting a multimedia document. In system 110, content description 112 is a multimedia description, for example a description of a movie. Encoder 114 encodes content description 112 in a format suitable for transmission. After encoding, encoded data stream 116 represents the encoded form used for transmitting content description 112 to decoder 118 over some communication channel (not shown). Decoder 118 receives encoded data stream 116 and uses it to reconstruct content description 120, which is the same as content description 112. Encoder 114 may use various methods to form the encoded data used to transmit content description 112.
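The round trip of FIG. 1 can be sketched as follows, for illustration only. The sketch assumes a generic zlib compression of the textual XML as a stand-in for the encoding step; it is not the BiM format, and the function names encode and decode are hypothetical.

```python
import xml.etree.ElementTree as ET
import zlib


def encode(description):
    """Serialize a description to XML text and compress it.

    zlib is used only as a stand-in for whatever encoding the encoder
    applies; it is not the MPEG-7 binary format.
    """
    text = ET.tostring(description, encoding="utf-8")
    return zlib.compress(text)


def decode(stream):
    """Reconstruct the description from the encoded data stream."""
    return ET.fromstring(zlib.decompress(stream))


# Hypothetical content description (element names are illustrative).
original = ET.fromstring(
    "<Scene><Title>Opening</Title><Shot id='s1'/></Scene>"
)
encoded_stream = encode(original)
reconstructed = decode(encoded_stream)
```

The reconstructed description serializes to the same XML as the original, which corresponds to content description 120 being the same as content description 112.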
While various means may be used by encoder 114 to transmit encoded data stream 116 to decoder 118, normally the description data must be sent as an encoded data stream over a communication channel, such as a network communication channel using a communications protocol like TCP/IP. Although compression can reduce transmission time by decreasing the size of the encoded stream, if the description is large, transmitting the entire description over a network can still take too much time. Rather than sending the entire description, the encoder may send parts of the description to the decoder. The method for determining which parts to send is not the subject of this invention and is application dependent. Herein, the term “description fragment” means a part of a description.
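As a minimal sketch of the notion of a description fragment, the following Python code treats each child subtree of a description as one fragment and reassembles a partial description from a chosen subset. The granularity (one fragment per child of the root), the function names, and the element names are assumptions made for illustration; how fragments are selected remains application dependent.

```python
import xml.etree.ElementTree as ET


def make_fragments(root):
    """Return (position, xml_text) pairs, one per child of the root."""
    return [
        (index, ET.tostring(child, encoding="unicode"))
        for index, child in enumerate(root)
    ]


def assemble(root_tag, fragments):
    """Rebuild a (possibly partial) description from received fragments."""
    root = ET.Element(root_tag)
    # Fragments are re-ordered by their original position.
    for _, text in sorted(fragments):
        root.append(ET.fromstring(text))
    return root


description = ET.fromstring(
    "<Scene><Title>Opening</Title><Shot id='s1'/><Shot id='s2'/></Scene>"
)
fragments = make_fragments(description)
# The sender chooses to transmit only the title and the second shot.
partial = assemble("Scene", [fragments[2], fragments[0]])
```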
The content description 112 may be either static or dynamic. A static description is one that does not change during the communication between encoder 114 and decoder 118. A dynamic description is one in which the description data 112 is changed during the communication between encoder 114 and decoder 118. This change must be reflected in the description data 120. Description data 112 is changed when data pertaining to the multimedia content is added, deleted, or changed. For example, if the description pertains to a scene being captured by a camera, the description may change if an object appears in or disappears from the scene. Additionally, if the description is about a television program, portions of the data may be updated, e.g., the broadcast time of the program may change, and such changes are sent from the encoder to the decoder. When the description is dynamic, it is more efficient to send only updates to the decoder rather than resending the entire changed description.
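The three kinds of change named above (data added, deleted, or changed) can be sketched as update commands applied to the decoder's copy of the description. The command format (an operation name, a path, and an optional payload), the function name apply_update, and the element names are hypothetical and serve only to illustrate sending updates in place of the whole description.

```python
import xml.etree.ElementTree as ET


def apply_update(root, op, path, payload=None):
    """Apply one hypothetical update command to a description.

    op is "add", "replace", or "delete"; path is an ElementTree path
    relative to root; payload is XML text (add) or a new value (replace).
    """
    if op == "add":
        # path names the parent element that receives the new fragment.
        parent = root if path == "." else root.find(path)
        parent.append(ET.fromstring(payload))
        return
    target = root.find(path)
    if target is None:
        raise KeyError(f"no element matches {path!r}")
    if op == "replace":
        target.text = payload
    elif op == "delete":
        # ElementTree keeps no parent pointers, so build a parent map.
        parents = {child: parent for parent in root.iter() for child in parent}
        parents[target].remove(target)
    else:
        raise ValueError(f"unknown operation {op!r}")


decoder_copy = ET.fromstring(
    "<Program><Title>News</Title>"
    "<BroadcastTime>18:00</BroadcastTime>"
    "<Object id='o1'/></Program>"
)
# The broadcast time changes, one object appears, and another disappears.
apply_update(decoder_copy, "replace", "BroadcastTime", "18:30")
apply_update(decoder_copy, "add", ".", "<Object id='o2'/>")
apply_update(decoder_copy, "delete", "Object[@id='o1']")
```

Each command carries only the changed part of the description, so the data sent grows with the size of the change and not with the size of the description.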
Furthermore, in a distributed environment, description data is spread across multiple hosts and each host contains only a part of a complete description. If all description fragments (i.e., parts of the description) are streamed from a single encoder, then the encoder spends a significant amount of time and resources in gathering all the description fragments.