Schemes for multiplexing and synchronizing coded bit streams of multimedia data that include plural objects, such as a moving image, a sound, a text and CG (computer graphics), are now standardized as ISO/IEC 14496 Part 1 (MPEG-4 Systems). MPEG-4 Systems defines the operation of an ideal terminal model called a system decoder model.
Unlike a conventional general multimedia stream, the above MPEG-4 data stream can carry plural video scenes and video objects that are transmitted and received independently on a single stream. Likewise, as for audio data, plural objects can be reproduced from a single stream. In addition to conventional video and audio data, the MPEG-4 data stream includes BIFS (Binary Format for Scenes), which is extended from VRML (Virtual Reality Modeling Language) so as to handle natural moving images and sounds, as information defining the spatial and temporal placement of the respective objects. BIFS describes an MPEG-4 scene in binary representation.
Since the respective objects necessary for scene synthesis are each subjected to optimum coding independently before transmission, they are also decoded independently on the decoding side. In accordance with the above BIFS description, the time axes of the respective data are synchronized with the time axis inside the reproduction device, whereby the scene is synthesized and reproduced.
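The synchronization of independently decoded objects onto the time axis of the reproduction device can be illustrated with a minimal sketch. This is not the clock recovery procedure of the MPEG-4 system decoder model; the names `DecodedUnit`, `to_terminal_time` and `units_due`, and the assumption that each object carries a known offset to the terminal clock, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DecodedUnit:
    """One decoded unit of an object (hypothetical structure)."""
    object_id: str
    timestamp: int         # presentation time in ticks of the object's own time base
    ticks_per_second: int  # resolution of the object's own time base
    base_offset_s: float   # assumed: where the object's time zero falls on the terminal clock


def to_terminal_time(unit: DecodedUnit) -> float:
    """Map a timestamp on the object's time axis to the terminal's time axis (seconds)."""
    return unit.base_offset_s + unit.timestamp / unit.ticks_per_second


def units_due(units: List[DecodedUnit], terminal_now_s: float) -> List[str]:
    """Return ids of objects whose unit is due at or before the given terminal time,
    ordered by presentation time on the terminal's time axis."""
    due = [u for u in units if to_terminal_time(u) <= terminal_now_s]
    return [u.object_id for u in sorted(due, key=to_terminal_time)]
```

In this sketch each object keeps its own time base, and only the final mapping step refers to the terminal clock, which mirrors the idea that the objects are decoded independently and brought together at scene synthesis.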
When a bit stream of such multimedia data is transmitted, it is necessary to generate and transmit data having an optimum amount of information in accordance with the capability and type of the reception side terminal and the status of the communication line. That is, if the reception side terminal is a mobile information terminal with low processing capability, such as a cellular phone or a PDA (Personal Digital Assistant), or if the communication line is crowded, the transmission side must either compress the transmission data in advance using a coding format with a high compression rate, or reduce the image size, the transmission rate or the frame rate before encoding the data.
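The selection described above can be sketched as a simple decision function on the transmission side. The terminal type names, the source settings, the congestion threshold and the halving factors below are all hypothetical values chosen for illustration; they do not come from any standard or from a particular implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeParams:
    width: int
    height: int
    frame_rate: float   # frames per second
    bit_rate_kbps: int  # target transmission rate


# Hypothetical source settings and thresholds, for illustration only.
SOURCE = EncodeParams(width=640, height=480, frame_rate=30.0, bit_rate_kbps=1500)
LOW_CAPABILITY_TERMINALS = {"cellular_phone", "pda"}
CONGESTED_BELOW_KBPS = 500


def select_encode_params(terminal_type: str, available_kbps: int) -> EncodeParams:
    """Choose encoding parameters from the terminal type and the line status."""
    width, height = SOURCE.width, SOURCE.height
    frame_rate = SOURCE.frame_rate
    # Never target more than the line can currently carry.
    bit_rate = min(SOURCE.bit_rate_kbps, available_kbps)

    if terminal_type in LOW_CAPABILITY_TERMINALS:
        # Low-capability terminal: halve each image dimension and the frame rate.
        width, height = width // 2, height // 2
        frame_rate /= 2

    if available_kbps < CONGESTED_BELOW_KBPS:
        # Crowded line: halve the frame rate once more.
        frame_rate /= 2

    return EncodeParams(width, height, frame_rate, bit_rate)
```

For example, under these assumed values a terminal of type `"pda"` on a line offering 300 kbps would be served 320x240 video at 7.5 frames per second with a 300 kbps target, while a capable terminal on an uncongested line would receive the source settings unchanged.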
Plural schemes have been proposed for optimizing the amount of information before encoding and transmitting data, by controlling the moving image/audio rate, selecting temporal/spatial scalability, converting the image size, and/or controlling error resilience in accordance with the capability of the reception side terminal and the status of the communication line.
However, regarding transmission of a bit stream having plural objects such as a still image, a moving image, a CG, a text and the like, no scheme has been proposed for optimizing the display positions and placement of the respective objects before encoding and transmitting the data.
The present invention has been made in consideration of the above problem, and has as its object to provide a data processing apparatus and method which, upon coding and delivery of multimedia data having plural objects such as a moving image, a still image, a text and a CG, change the multimedia data such that the respective objects and their layout are changed in accordance with the type and capability of the reception side terminal.