Conventional modeling languages for real-time 3D scene rendering have traditionally focused on scene structure, geometry, appearance and, to some degree, animation and interactivity. This focus has been driven by two factors. First, 3D computer graphics applications have been geared toward user-driven experiences and thus tend to be structured around a rendered response to events. Second, the majority of these applications take a “render it as fast as you can” approach to scene updates, with little regard for the fidelity of the time base. As a result, conventional modeling languages fail to provide an accurate temporal relationship between two media assets. For example, if a video asset and an audio asset are to start at the same time, this is conventionally achieved by prescribing a start time for each asset independently of the other assets, which allows the start times to differ slightly. It is desirable that the start time for each asset be controlled by the same field, thereby resulting in accurate synchronization of the assets. Media assets include audio media, video media, animations, audio-visual media, images, or events.
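The difference between independently prescribed start times and a single shared start-time field can be illustrated with a minimal sketch. The names StartTime and MediaAsset and the numeric values are hypothetical and chosen only for illustration; they are not taken from any particular modeling language.

```python
from dataclasses import dataclass


@dataclass
class StartTime:
    """Hypothetical field holding a start time in seconds."""
    seconds: float


@dataclass
class MediaAsset:
    """Hypothetical media asset that reads its start time from a field."""
    name: str
    start: StartTime  # a reference to a field, not a private copy of its value

    def start_time(self) -> float:
        return self.start.seconds


# Independent prescription: each asset owns its own field, so the values can differ.
video_a = MediaAsset("video", StartTime(10.000))
audio_a = MediaAsset("audio", StartTime(10.004))
independent_skew = abs(video_a.start_time() - audio_a.start_time())

# Shared field: both assets reference the same StartTime object.
shared = StartTime(10.0)
video_b = MediaAsset("video", shared)
audio_b = MediaAsset("audio", shared)
shared.seconds = 12.5  # a single update moves both assets together
shared_skew = abs(video_b.start_time() - audio_b.start_time())

print(independent_skew > 0.0)  # True: independently set values have drifted apart
print(shared_skew)             # 0.0: one field, one start time
```

In this sketch the skew in the shared case is zero by construction, because only one value exists to be read; in the independent case, equality depends on every field being set to exactly the same value.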
As full motion video and high fidelity audio are integrated into the scene rendering mix, it is desirable to deliver a high-quality, television-like viewing experience while supporting viewer interactivity. It is desirable to provide a passive viewing experience that resembles television rather than a web page.
In a declarative markup language, the semantics required to attain the desired outcome are implicit in the description of the outcome. It is not necessary to provide a separate procedure (i.e., write a script) to get the desired outcome. One example of a declarative language is HyperText Markup Language (HTML).
Various approaches to scoring animation and playback have previously been developed in other computer-based media, including Macromedia Director and the W3C's Synchronized Multimedia Integration Language (SMIL). However, these existing scoring systems do not allow for declarative composition of a real-time scene wherein independent scores are dynamically composed and decomposed hierarchically, structuring time in a manner akin to the spatial scene graph, for example by structuring blocks of time to be next to each other or to be parallel (synchronized) with each other. Nor do the conventional scoring systems allow variable rate and direction of score evaluation to be specified declaratively, or allow declarative implementation of a modular computation strategy based upon a normalized “fraction done” output, suitable for rapid assembly and reuse of behavioral animation.
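The notions of hierarchically structured time, variable rate and direction of evaluation, and a normalized “fraction done” output can be made concrete with a short sketch. It is written procedurally for illustration only and is not the declarative language discussed here; the names Clip, Sequence, Parallel, fraction_done, child_fractions, and score_time are all hypothetical, and a negative rate is assumed to mean evaluation backward from the end of the score.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Clip:
    """Leaf block of time with a fixed duration in seconds."""
    duration: float


@dataclass
class Sequence:
    """Children occupy adjacent blocks of time, one after another."""
    children: List["Node"]

    @property
    def duration(self) -> float:
        return sum(child.duration for child in self.children)


@dataclass
class Parallel:
    """Children share one block of time and all start together."""
    children: List["Node"]

    @property
    def duration(self) -> float:
        return max((child.duration for child in self.children), default=0.0)


Node = Union[Clip, Sequence, Parallel]


def fraction_done(node: Node, t: float) -> float:
    """Normalized progress of a node at local time t, clamped to [0, 1]."""
    if node.duration <= 0:
        return 1.0
    return max(0.0, min(1.0, t / node.duration))


def child_fractions(node: Union[Sequence, Parallel], t: float) -> List[float]:
    """Pass local time down one level and report each child's progress."""
    if isinstance(node, Sequence):
        fractions, offset = [], 0.0
        for child in node.children:
            fractions.append(fraction_done(child, t - offset))
            offset += child.duration  # the next child starts where this one ends
        return fractions
    return [fraction_done(child, t) for child in node.children]


def score_time(node: Node, elapsed: float, rate: float = 1.0) -> float:
    """Map elapsed time to score time; a negative rate runs backward from the end."""
    t = rate * elapsed if rate >= 0 else node.duration + rate * elapsed
    return max(0.0, min(node.duration, t))


# A 2 s fade followed by video and audio clips that play in parallel for 4 s.
fade, video, audio = Clip(2.0), Clip(4.0), Clip(4.0)
score = Sequence([fade, Parallel([video, audio])])

print(score.duration)                       # 6.0
print(fraction_done(score, 3.0))            # 0.5
print(child_fractions(score, 3.0))          # [1.0, 0.25]
print(score_time(score, 1.5, rate=-2.0))    # 3.0
```

Because every node reports progress on the same 0-to-1 scale, an animation driven by the fraction-done value of one block could be attached to a block of a different duration without modification, which is the kind of reuse the paragraph above refers to.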