The present invention relates to computer network-based multimedia applications in general, and more particularly to PROTO implementation in MPEG-4.
ISO/IEC 14496, commonly referred to as "MPEG-4," is an international standard for coding of multimedia. Part 1 of the standard includes specifications for the description of a scene graph comprising one or more multimedia objects. Part 5 of the standard includes a software implementation of the specifications in the form of an MPEG-4 player. An MPEG-4 player parses a bitstream containing a scene description, constructs the scene graph, and renders the scene.
ISO/IEC 14496 specifies a system for the communication of interactive audio-visual scenes. This specification includes the following elements:
The coded representation of natural or synthetic, two-dimensional (2D) or three-dimensional (3D) objects that can be manifested audibly and/or visually (audio-visual objects);
The coded representation of the spatio-temporal positioning of audiovisual objects as well as their behavior in response to interaction;
The coded representation of information related to the management of data streams, including synchronization, identification, description and association of stream content; and
A generic interface to the data stream delivery layer functionality.
The overall operation of a system communicating audiovisual scenes may be summarized as follows. At the sending terminal, the audio-visual scene information is compressed, supplemented with synchronization information, and passed to a delivery layer that multiplexes it into one or more coded binary streams that are transmitted or stored. At the receiving terminal, these streams are demultiplexed and decompressed. The audio-visual objects are composed according to the scene description and synchronization information, and are presented to the end user. The end user may have the option to interact with this presentation. Interaction information can be processed locally or transmitted back to the sending terminal. ISO/IEC 14496 defines the syntax and semantics of the bitstreams that convey such scene information, as well as the details of their decoding processes.
An audiovisual scene in MPEG-4 is composed from one or more individual objects or "nodes" arranged in an object tree, including primitive media objects that correspond to leaves in the tree and compound media objects that group primitive media objects together, encompassing sub-trees. For example, the primitive media objects corresponding to a visual representation of a talking person and his corresponding voice may be tied together to form a compound media object, containing both the aural and visual components of the talking person. Each node is of a specific node type, representing elements such as lines, squares, circles, video clips, audio clips, etc.
Scene description in MPEG-4 addresses the organization of audio-visual objects in a scene, in terms of both spatial and temporal attributes. This information enables the composition and rendering of individual audio-visual objects after the respective decoders have reconstructed the streaming data for them. The scene description is represented using a parametric approach, such as is defined by the Binary Format for Scenes (BIFS). The description consists of an encoded hierarchy/tree 100 of scene description nodes with attributes and other information, including event sources and targets, as illustrated in prior art FIG. 1. Leaf nodes 102 of tree 100 correspond to elementary audio-visual data, whereas intermediate nodes 104 group this material to form compound audio-visual objects, and perform grouping, transformation, and other such operations on audio-visual objects. The scene description can evolve over time by using scene description updates. (Dashed lines indicate links to leaf nodes not shown).
In order to facilitate active user involvement with the presented audio-visual information, ISO/IEC 14496 provides support for user and object interactions. Interactivity mechanisms are integrated with the scene description information in the form of "routes," which are linked event sources and targets, and "sensors," which are special nodes that can trigger events based on specific conditions. These event sources and targets are part of scene description nodes, and thus allow close coupling of dynamic and interactive behavior with the specific scene at hand. Scene description node fields may be connected through routes, which cause a change in one node to affect the properties and behavior of another node. Constructing a scene involves parsing the scene hierarchy, node types, node field values, and routes, and linking them all in a scene graph.
Part 5 of the MPEG-4 standard provides reference software that implements different aspects of the MPEG-4 specification. One portion of the reference software is referred to as IM1, which includes several modules. The IM1 core module parses scene description bitstreams, constructs the memory structure of the scene graph, and prepares the scene graph for the Renderer module, which traverses the scene graph and renders it on the terminal hardware (e.g., screen, speaker). IM1 implements a two-plane approach in its handling of the scene graph. The two planes are referred to as the Scene Manipulation Plane and the Renderer Plane. The Scene Manipulation Plane parses the scene description stream, constructs the scene graph, manipulates it, and handles route activation without knowledge of the semantics of the nodes. The Renderer Plane traverses the scene graph and renders it without knowledge of the syntax of the scene description bitstream. The two planes are implemented as two separate modules that communicate via the scene graph.
The IM1 player architecture is illustrated in FIG. 2. The core, generally designated 200, includes sub-modules that parse the scene description and construct the scene tree, collectively implementing the Scene Manipulation Plane. Modules 202 and 204 interact with the core module 200 through APIs. The Renderer 206 traverses the scene graph and renders the scene, implementing the Renderer Plane. Core 200 and Renderer 206 interface through a Scene Graph Interface 208.
IM1 scene graph elements are represented by several main classes, including:
Specific node types are represented by classes derived from MediaObject. There are two layers of inheritance. For each node type, a class is derived from MediaObject implementing the specific node syntax. Each node can then be overloaded by specific implementations of MPEG players in the Renderer Plane to implement the rendering of the scene graph. For example, the Appearance node is declared as:
class Appearance : public MediaObject
BIFS_DECLARE_NODE is a macro that declares virtual functions, described below, that overload the pure declarations in MediaObject.
The implementation of Appearance is as follows:
BIFS_IMPLEMENT_NODE_START(Appearance)
BIFS_FIELD(Appearance, material, 0, 0, 0, -1, 0, 0)
BIFS_FIELD(Appearance, texture, 1, 1, 1, -1, 0, 0)
BIFS_FIELD(Appearance, textureTransform, 2, 2, 2, -1, 0, 0)
BIFS_IMPLEMENT_NODE_END(Appearance)
BIFS_IMPLEMENT_NODE_START, BIFS_IMPLEMENT_NODE_END and BIFS_FIELD are macros that are used together to define the field table for each node. MediaObject uses these tables to access the fields of its derived objects in a generic way, using virtual methods that are declared as "pure" in MediaObject and overloaded by each node. Examples of these methods include:
This mechanism ensures that once nodes are instantiated and inserted into the scene graph, their fields can be accessed by their serial number from the base MediaObject class. While processing scene update commands or routes, the Scene Manipulation Plane uses this technique to manipulate the scene graph.
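The field-table mechanism described above can be illustrated with a minimal sketch. The names and macro expansions here are illustrative, not the actual IM1 source: the sketch only shows how a per-node static table plus overloaded pure-virtual accessors let the base class reach any derived node's fields by serial number.

```cpp
#include <cassert>
#include <cstring>

// Base class for field values (stands in for IM1's NodeField).
struct NodeField { virtual ~NodeField() {} };
// A node-valued field, as used by the Appearance node.
struct SFNodeField : NodeField { void* node = nullptr; };

// One row per field, roughly what each BIFS_FIELD line contributes.
struct BifsFieldTable {
    const char* name;          // field name, e.g. "material"
    int inId, defId, outId;    // coding-table indices, as in the macros
};

// Pure virtual accessors, declared by BIFS_DECLARE_NODE in each node.
struct MediaObject {
    virtual int GetFieldCount() const = 0;
    virtual const BifsFieldTable* GetFieldTable() const = 0;
    virtual NodeField* GetField(int index) = 0;
    virtual ~MediaObject() {}
};

// Roughly what the BIFS_IMPLEMENT_NODE_* block for Appearance expands to.
struct Appearance : MediaObject {
    SFNodeField material, texture, textureTransform;

    int GetFieldCount() const override { return 3; }
    const BifsFieldTable* GetFieldTable() const override {
        static const BifsFieldTable table[] = {
            {"material",         0, 0, 0},
            {"texture",          1, 1, 1},
            {"textureTransform", 2, 2, 2},
        };
        return table;
    }
    NodeField* GetField(int index) override {
        SFNodeField* f[] = {&material, &texture, &textureTransform};
        return f[index];   // access a field by its serial number
    }
};
```

With this layout, code holding only a MediaObject* can enumerate and fetch fields generically, which is exactly what the Scene Manipulation Plane relies on.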
There are several types of node fields, including the integer field (SFInt), the character string field (SFString), and the node field that points to child nodes (SFNode). In addition, node fields are either scalar (e.g., SFInt) or vector-based (e.g., MFInt). NodeField is the base class for node fields. Specific field types are implemented by classes that extend NodeField and overload its virtual methods to implement functionality that is specific to a certain field type. For example, the virtual method Parse is used to parse a field value from the scene description bitstream.
ROUTE is the mechanism to mirror a change in a field of one node in the scene graph into a field in another node. ROUTEs are implemented by the Route class. Among the member variables of this class are a pointer to the source node, a pointer to the target node, and serial numbers of the fields in the source and target nodes. The Route class also has an Activate method. Each NodeField object contains a list of all the routes that originate from this field. Whenever the field value is changed, the Activate method is called on all these routes. This method locates the target node of the route, calls its GetField method to get the target field, and copies the value of the source field to the target field. Copying is performed by the assignment operator, which is overloaded by each instance of NodeField.
In the Renderer Plane, Render is defined as a virtual method in MediaObject. Each node overloads this method with a specific Render method, which renders the node and all its child nodes. The Render methods are implemented so that each instance accesses the node's fields by their names, having knowledge of their types and semantics. This differs from the generic behavior of the Scene Manipulation Plane, which uses the GetField method to access node fields and has no knowledge about the specifics of each node.
Version 2 of the MPEG-4 standard introduces PROTOs. A PROTO is a parameterized sub-tree that can be defined once and be called into the scene graph at many places without repeating information. PROTO was introduced as a way to save bandwidth and authoring effort. A PROTO is a sub-graph of nodes and routes that can be defined once and inserted anywhere in the scene graph. This sub-graph represents the PROTO code. A PROTO exposes an interface, which is a set of parameters whose syntax is similar to node fields. The parameters are linked to node fields in the PROTO code. These fields are called "ISed" fields. A PROTO is instantiated by using its name in the scene graph and assigning values to its interface parameters. The scene graph parser handles PROTO instances as regular nodes, viewing only the interface, while the Renderer processes the PROTO code.
PROTOs may be better understood with reference to FIGS. 3 and 4, which represent a scene having two Person objects. In FIG. 3 both Person objects 300 and 302 have the same structure, each comprising a Voice object 304 and a Sprite object 306. The Voice and Sprite objects 304 and 306 may have different attributes as expressed by different field values.
FIG. 4 shows the same scene when PROTO is used. A Person PROTO 400 is defined once, and instantiated twice, as Person1 402 and Person2 404. Thus, the Voice and Sprite objects are not themselves duplicated as they are in FIG. 3.
PROTOs provide a way to define new nodes by combining the functionality of predefined nodes. Scenes that use PROTOs contain a section of PROTO definitions, which can then be instantiated in the scene and used like normal nodes. While PROTOs are beneficial in theory, an efficient implementation of PROTOs remains elusive. PROTOs should be implemented in such a way that provides more benefit than cost. Specifically, PROTO implementation should require minimal changes to the IM1 code, in order not to compromise the efficiency and stability of the code. Furthermore, changes should be made in the Scene Manipulation Plane so that no modifications will be required at the Renderer Plane. The complex and divergent tasks of PROTO definition and PROTO instantiation must be clearly defined. Although a PROTO can be used as a node, its structure differs from that of other MPEG-4 nodes: other nodes have fixed fields that are described by hard-coded node tables, while PROTO fields are defined at run time. PROTOs should also be kept hidden from the Renderer. Without PROTOs, the Renderer traverses the scene graph from fields that point to nodes down to the nodes themselves. With PROTOs, fields might point to a PROTO instance instead of to a node of a known type, and the Renderer is incapable of directly rendering a PROTO node.
The present invention seeks to provide an efficient PROTO implementation in MPEG-4.
In accordance with the present invention, a method of PROTO implementation in MPEG-4 is provided, including the steps of: defining a PROTO object class, instantiating a PROTO object, calling the PROTO object into an MPEG-4 scene graph, and rendering the PROTO object.
In accordance with another embodiment of the present invention, the defining step of the method includes: defining the class by inheriting the class from MediaObject, defining in the class a variable representing an array of NodeField* objects, inserting PROTO fields into the array of NodeField* objects, defining in the class a variable representing an array of BifsFieldTable structures, inserting descriptions of the PROTO fields into the array of BifsFieldTable structures, overloading GetFieldCount, GetFieldTable and GetField methods of the PROTO class, locating PROTO field objects, defining in the class a variable representing an array of pointers to the MediaObject, inserting at least one PROTO code node into the array of pointers to the MediaObject, defining in the class an array of pointers to routes, inserting at least one PROTO code route into the array of pointers to routes, linking at least one PROTO code ISed node field to a corresponding PROTO interface field by a route object, linking at least one IN parameter to a node field by a route object, linking at least one OUT parameter to a node field by a route object, linking at least one IN/OUT parameter by two routes, one for each direction, and adding any of the routes to a field of the PROTO object.
In accordance with another embodiment of the present invention, the instantiating step of the method includes: cloning an original PROTO object, cloning each node field of the original PROTO object, returning a pointer to the clone object, copying the value of each of the node fields to a NodeField object, cloning a route that connects two of the node fields between a source node and a target node, cloning at least one interface field object of the original PROTO object, storing the cloned interface field objects in the clone object, cloning at least one PROTO object node, cloning at least one PROTO object route, and returning a pointer to the clone PROTO.
In accordance with another embodiment of the present invention, the calling step includes: overloading either of the -> operator of SFGenericNode and the [] operator of MFGenericNode of the PROTO object, and, if the node that is pointed to is a PROTO instance, returning the address of the first node of the PROTO object's PROTO code.
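The calling step can be sketched as follows. The member and method names are illustrative; the sketch shows only the dereference trick by which the Renderer is kept unaware of PROTOs: when a node-valued field points at a PROTO instance, dereferencing it yields the first node of the PROTO code rather than the PROTO object itself.

```cpp
#include <cassert>

struct MediaObject {
    virtual bool IsProto() const { return false; }
    virtual MediaObject* FirstCodeNode() { return this; }
    virtual ~MediaObject() {}
};

struct Proto : MediaObject {
    MediaObject* firstCodeNode = nullptr;  // first node of the PROTO code
    bool IsProto() const override { return true; }
    MediaObject* FirstCodeNode() override { return firstCodeNode; }
};

// Scalar node-valued field with an overloaded -> operator.
struct SFGenericNode {
    MediaObject* node = nullptr;
    MediaObject* operator->() const {
        // If the pointee is a PROTO instance, return the address of the
        // first node of its PROTO code; otherwise the node itself.
        return node->IsProto() ? node->FirstCodeNode() : node;
    }
};
```

A vector-valued MFGenericNode would apply the same substitution in its element-access operator, so the Renderer's traversal code needs no change at all.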