A block diagram of a conventional distributed multimedia presentation environment is shown in FIG. 1. A multimedia system in such an environment can produce a composed multimedia document (not shown), which is displayed on a presentation device 11, such as a television or computer monitor. First, various presentation documents 16, 17 are pre-composed by authors. This task of generating the presentation documents is called authoring. Many different types of documents can be authored, e.g., a video presentation 16 or a set of Web pages 17. Once these documents are prepared, they are stored in various data storage devices 14, 15, such as hard disks, digital video disks or the storage devices of a satellite broadcasting company. Upon a user's request for presentation of a document, the presentation documents are delivered through the network 13 to the presentation device 11 for presentation. This is referred to as the “pull” mode. The presentation documents can also be sent proactively from the storage devices 14, 15 (e.g., by broadcasting companies) to users, via the network 13 or otherwise, without any explicit request. This is referred to as the “push” mode.
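By way of illustration only, the difference between the pull mode and the push mode may be sketched as follows. The class names `StorageDevice` and `PresentationDevice`, the method names, and the document identifiers are hypothetical and are not taken from any actual system; the sketch assumes that a document can be represented as a simple string.

```python
from typing import Callable, Dict, List


class PresentationDevice:
    """Stands in for presentation device 11; records what it presents."""

    def __init__(self) -> None:
        self.shown: List[str] = []

    def present(self, document: str) -> None:
        self.shown.append(document)


class StorageDevice:
    """Stands in for a data storage device 14, 15 holding pre-composed documents."""

    def __init__(self, documents: Dict[str, str]) -> None:
        self.documents = documents
        self.receivers: List[Callable[[str], None]] = []

    def pull(self, doc_id: str) -> str:
        # Pull mode: the document is delivered in response to an explicit request.
        return self.documents[doc_id]

    def register(self, deliver: Callable[[str], None]) -> None:
        # A receiver is registered once; it makes no per-document request afterwards.
        self.receivers.append(deliver)

    def push(self, doc_id: str) -> None:
        # Push mode: the storage side initiates delivery to every registered receiver.
        for deliver in self.receivers:
            deliver(self.documents[doc_id])


storage = StorageDevice({"video16": "pre-composed video", "web17": "pre-composed Web page"})
device = PresentationDevice()

device.present(storage.pull("web17"))  # pull: the user asks for the document
storage.register(device.present)
storage.push("video16")                # push: sent without an explicit request
```

In both modes the content that reaches the presentation device is exactly the document that was stored at authoring time; only the party that initiates delivery differs.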
Thus, conventionally, multimedia presentation materials are generated before the presentation, i.e., at authoring time. Once generated, the materials are presented exactly as they were pre-composed. The user's capability to interact with the presentation is limited to the interactions offered by the control panels 12 on the presentation device 11. Typical examples of the interactions provided by these control panels 12 include selection of different materials (channels or URLs), fast forward, fast reverse, pause, etc. However, prior art systems do not support dynamic changes to the presentation materials.
A somewhat enhanced interaction capability is provided by the 3D object model, in which pre-specified interaction semantics are built into the object. For example, PanoramIX, image-based rendering software from IBM (www.software.ibm.com/net.media), uses environment maps (e.g., cylinders with tops/bottoms or polyhedral approximations of spheres) to render the background from a fixed viewpoint that does not translate. It supports smooth rotation and continuous zoom of the view. PanoramIX also allows pre-composition of a complex scene with embedded sprites, audio segments, video clips and 3D objects. It uses a control file that is pre-defined during scene authoring by means of a special authoring tool.
Another example of a prior art system that uses pre-composed complex scenes is found in the specifications of the MPEG-4 standard. MPEG-4 employs a BIFS (Binary Format for Scenes) composite file that describes a scene hierarchy, along the lines of VRML (Virtual Reality Modeling Language) scene graphs. Leaf nodes of the graph are AVOs (Audio-Visual Objects), each of which may be a video, an audio segment, a 3D object or any other component media type. In fact, PanoramIX may be viewed as one instance of the MPEG-4 specification.
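The scene hierarchy described above may be pictured, purely as a simplified sketch, as a tree whose interior nodes group other nodes and whose leaf nodes are AVOs. The `Group` and `AVO` classes, their fields, and the node names below are hypothetical illustrations; they do not reproduce the actual BIFS node types or its binary encoding.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class AVO:
    """Leaf node: a single audio-visual object of some media type."""
    name: str
    media_type: str  # e.g. "image", "video", "audio", "3d"


@dataclass
class Group:
    """Interior node: groups child nodes to form the scene hierarchy."""
    name: str
    children: List[Union["Group", AVO]] = field(default_factory=list)


def leaves(node: Union[Group, AVO]) -> Iterator[AVO]:
    """Yield the AVOs of a scene in depth-first order."""
    if isinstance(node, AVO):
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


# A pre-composed scene: the hierarchy is fixed when the scene is authored.
scene = Group("scene", [
    AVO("background", "image"),
    Group("foreground", [
        AVO("clip", "video"),
        AVO("narration", "audio"),
        AVO("statue", "3d"),
    ]),
])

leaf_names = [avo.name for avo in leaves(scene)]
```

In this sketch the tree is built once, at authoring time, and is only traversed at presentation time, which mirrors the pre-composed nature of such scenes.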
While systems such as PanoramIX and MPEG-4 do provide some added interaction capability, they still fall short of providing the full ability to dynamically alter presentations after authoring.