More and more digital broadcast services are now available, and it therefore appears useful to enable an effective exploitation of multimedia information resources by users who, in general, are not information technology experts. Said multimedia information generally consists of natural and synthetic audio, visual and object data, intended to be manipulated in view of operations such as streaming, compression and user interactivity, and the MPEG-4 standard is one of the most widely agreed solutions for providing the many functionalities that allow said operations to be carried out. The most important aspect of MPEG-4 is the support of interactivity through the concept of object, which designates any element of an audio-visual scene: the objects of said scene are encoded independently and stored or transmitted simultaneously in compressed form as several bitstreams, the so-called elementary streams. The architecture of a typical MPEG-4 terminal, shown in FIG. 1, comprises the following elements (listed starting from the bottom of the figure; because of the interactivity functionality, said components may also be operated in the reverse direction, from the terminal to the server or any other type of transmitter):
    (a) a delivery or transport layer 11, also called “TransMux layer”, which is media independent (MPEG-4 data can be transported, for instance, over RTP for the Internet, MPEG-2 transport streams, H.323 or ATM) and which receives multiplexed streams of compressed data from a transmission (or storage) medium;
    (b) a synchronization or elementary stream layer 12, also called “FlexMux layer”, which receives FlexMux streams from the layer 11 and is in charge of the synchronization and buffering of the compressed data: this layer receives the packetized streams delivered by the transport layer 11 and outputs elementary streams that respectively correspond to different multimedia objects and are composed of access units;
    (c) a media layer (or compression layer) 13, which receives the elementary streams from the layer 12 and decodes the data extracted from said layer 12;
    (d) a composition and rendering stage 14, intended to build the final scene arrangement, and a display 15 of the obtained audiovisual scene.
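As an illustration only, the bottom-up data flow through the layers 11 to 14 can be sketched in Python. The sketch assumes a simplified, hypothetical packet model (an elementary-stream identifier, a timestamp and a payload standing in for compressed data). The names Packet, sync_layer, media_layer and compose are illustrative and are not defined by MPEG-4. Decoding is replaced by a placeholder, and the reverse (interactive) direction is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet delivered by the transport (TransMux) layer 11."""
    es_id: int        # elementary stream (object) the packet belongs to
    timestamp: int    # timing information used by the sync layer
    payload: bytes    # stands in for compressed data


def sync_layer(packets):
    """Layer 12: group buffered packets by elementary stream and order them.

    Returns a dict mapping es_id -> list of access units sorted by timestamp.
    """
    streams = defaultdict(list)
    for pkt in packets:
        streams[pkt.es_id].append(pkt)
    return {
        es_id: sorted(units, key=lambda p: p.timestamp)
        for es_id, units in streams.items()
    }


def media_layer(streams):
    """Layer 13: decode each elementary stream independently.

    Placeholder decoding (bytes -> str); a real terminal would call an
    audio, video or graphics decoder for each object.
    """
    return {
        es_id: [(au.timestamp, au.payload.decode("ascii")) for au in units]
        for es_id, units in streams.items()
    }


def compose(decoded, scene_objects):
    """Stage 14: place each decoded object in the scene.

    scene_objects is a hypothetical scene description mapping es_id -> name.
    """
    return {scene_objects[es_id]: frames for es_id, frames in decoded.items()}


# Usage: three packets for two objects, arriving out of order.
packets = [
    Packet(es_id=2, timestamp=0, payload=b"hello"),
    Packet(es_id=1, timestamp=1, payload=b"frame1"),
    Packet(es_id=1, timestamp=0, payload=b"frame0"),
]
scene = compose(media_layer(sync_layer(packets)), {1: "video", 2: "audio"})
print(scene)
```

Keeping each stage a separate function mirrors the layer separation of FIG. 1: the synchronization layer only groups and orders access units per object, so that the media layer can decode each elementary stream independently of the others.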
The specification of MPEG-4 includes an object description framework intended to identify and describe the elementary streams (audio, video, etc.) and to associate them appropriately in order to obtain the scene description and to construct and present a meaningful multimedia scene to the end user: MPEG-4 models multimedia data as a composition of objects. However, the great success of this standard contributes to the fact that more and more information is now made available in digital form. Finding and selecting the right information therefore becomes harder, both for human users and for automated systems processing audio-visual data for a specific purpose, since both need information about the content of said information, for instance in order to make decisions related to that content.
The objective of the MPEG-7 standard, not yet frozen, will be to describe said content, i.e. to find a standardized way of describing multimedia material as diverse as speech, audio, video, still pictures, 3D models and other types, and also a way of describing how these elements are combined in a multimedia document. MPEG-7 is therefore intended to define a number of normative elements: descriptors D (each descriptor characterizes a specific feature of the content, e.g. the color of an image, the motion of an object or the title of a movie), description schemes DS (which define the structure of and the relationships between the descriptors), a description definition language DDL (intended to specify the descriptors and description schemes), and coding schemes for these descriptions (FIG. 2 gives a graphical overview of these MPEG-7 normative elements and their relations). Whether it is necessary to standardize descriptors and description schemes is still under discussion within MPEG. It nevertheless seems likely that at least a set of the most widely used ones will be standardized.
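By way of a hedged illustration, the relation between descriptors D and description schemes DS can be sketched as a small tree data structure, in which a description scheme groups descriptors and nested sub-schemes. The class names, the feature names (Title, DominantColor, Motion) and the XML element names below are hypothetical; the XML output merely stands in for a coding scheme and does not follow any normative MPEG-7 syntax or DDL.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A descriptor D: characterizes one feature of the content."""
    feature: str  # hypothetical feature name, e.g. "Title" or "DominantColor"
    value: str


@dataclass
class DescriptionScheme:
    """A description scheme DS: structures descriptors and sub-schemes."""
    name: str
    descriptors: list = field(default_factory=list)  # list of Descriptor
    children: list = field(default_factory=list)     # nested DescriptionScheme

    def to_xml(self):
        """Encode this scheme and its children as an XML tree (illustrative only)."""
        elem = ET.Element("DS", name=self.name)
        for d in self.descriptors:
            ET.SubElement(elem, "D", feature=d.feature).text = d.value
        for child in self.children:
            elem.append(child.to_xml())
        return elem


# Usage: a movie description containing one shot with two low-level descriptors.
shot = DescriptionScheme(
    "Shot", [Descriptor("DominantColor", "blue"), Descriptor("Motion", "pan-left")]
)
movie = DescriptionScheme("Movie", [Descriptor("Title", "Example")], [shot])
print(ET.tostring(movie.to_xml(), encoding="unicode"))
```

Separating the descriptors (individual features) from the schemes (structure and relationships) lets the same descriptor types be reused at different levels of a description, here at both the movie and the shot level.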