1. Field of the Invention
The present invention relates to a system for developing and executing a multimedia presentation. More particularly, the present invention concerns an extensible markup language (XML)-based system which allows development and execution of a synchronized multimedia presentation, wherein a sequence of the presentation and/or the behavior of a multimedia object therein is alterable in response to a user event, a system event, a timed event, and/or a detected state of another multimedia object.
2. Description of the Related Art
The hypertext markup language (“HTML”) is an SGML-based markup language commonly used to create pages to be displayed by a World Wide Web browser application. More particularly, a developer creates a web page by developing an HTML source file which includes data specifying particular items to be included in the page and spatial relationships among the items. Upon receiving an HTML source file, a web browser parses the file to obtain instructions for formatting and displaying the items of the web page to a user.
Like many high-level programming languages, HTML utilizes fairly intuitive syntax and text-based commands. In addition, a previously-created HTML source file can easily be used as a model for a new source file and resulting web page, or can be edited so as to reformat its resulting web page using simple text-based editing. HTML has proved popular due to its reusability, its editability, the simplicity of its required syntax and the functionality provided by its defined commands.
However, traditional HTML can be used only to define spatial, rather than temporal, relationships of objects within a web page. One proposal for providing control over temporal relationships between objects is video streaming, in which frames of a video presentation are sequentially delivered by an Internet server directly to a web browser. However, such a video presentation can be edited only by using sophisticated video-editing equipment, and is not easily editable using a text-based markup language. Moreover, the video approach requires a very high bandwidth to produce a presentation of reasonable quality.
Another proposed approach requires animated GIF-formatted images to be merged into a single file, which is then downloaded to a web browser for rapid sequential display. Although such animation can be edited by adding, deleting, or substituting images of the animation, editing is significantly more difficult and less flexible than web page editing using a traditional markup language, particularly in the typical case in which a user has access only to the merged file. This approach also requires constant refresh of the web browser display, and therefore unacceptably consumes transmission bandwidth.
In addition to the above shortcomings, the content of a video presentation or an animated GIF presentation cannot be modified during execution of the presentation based on an occurrence of an event. It is possible to display, through a web browser, multimedia presentations which are responsive to events and/or user input by executing a suitable Java-language program using the browser's Java Virtual Machine. Development of these programs, however, requires a significant computer programming background. Therefore, this Java-based approach fails to provide the simplicity and editability of languages such as HTML.
In response to the foregoing, the World Wide Web Consortium (W3C) has circulated a proposed standard, entitled “Synchronized Multimedia Integration Language (SMIL) 1.0 Specification”, the contents of which are incorporated herein by reference. SMIL is text-based, XML-based, editable and reusable. In this regard, the W3C standard “Extensible Markup Language (XML) 1.0” is also incorporated herein by reference.
SMIL is intended to provide control over temporal relationships between web page objects. Specifically, SMIL allows a user to define whether multimedia objects of a page are executed in parallel or sequentially, and further provides nested control of parallel/sequential object execution.
For example, the SMIL specification defines the <par> element, which is used to specify elements to be executed in parallel. FIG. 1 shows an illustrative example of a SMIL source file portion utilizing the <par> element. The source file portion shown in FIG. 1 begins with <par> element 100 and ends with corresponding </par> notation 101. It should be noted that the <element> </element> beginning/end syntax is an XML grammar requirement.
According to the SMIL specification, all child elements nested one level below elements 100 and 101 are to be executed in parallel. Therefore, since two child elements <seq> 110 and <seq> 120 exist at a first level below <par> element 100, objects corresponding to elements 110 and 120 are executed in parallel with each other.
In the case of <seq> 110, two media object elements exist between <seq> 110 and end notation 111. In this regard, and also according to the SMIL specification, elements existing as children to a <seq> element are executed sequentially. Therefore, video element 130 is processed, followed by video element 140. It should be noted that statements 130 and 140 each utilize the XML shorthand syntax which allows end notation “/” to be located within an element declaration.
As described above, child elements to <seq> element 120 are executed in parallel with the child elements of <seq> element 110 by virtue of <par> element 100. Therefore, all elements between <seq> element 120 and notation 121 are processed in parallel with elements 130 and 140. In this regard, nested within element 120 and notation 121 are <par> element 150, corresponding notation 151, <seq> element 160 and notation 161. According to <seq> element 120, the video sources indicated by video elements 170 and 180 are first played in parallel, followed by the video sources of video elements 190 and 200, which are played sequentially.
FIG. 2a and FIG. 2b illustrate a portion of a multimedia presentation governed by the SMIL source file shown in FIG. 1. As shown in FIG. 2a, video V1 begins executing at the same time that video V31 and video V32 begin executing in parallel. In this example, while either one of video V31 or video V32 continues to play, video V1 finishes and, by virtue of <seq> element 110, video V2 begins to play. At time t1, whichever of video V31 and video V32 has the longer duration terminates.
Next, as shown in FIG. 2b, video V2 continues to play and video V4 begins to play in parallel by virtue of <par> element 100. It should be noted that video V4 begins to play upon termination of the longer of video V31 and V32 due to <seq> element 120. After termination of video V4, video V5 is played.
FIG. 2c shows a timeline describing the presentation illustrated in FIG. 2a and FIG. 2b. It also should be noted that FIG. 2c represents only one possible timeline resulting from the FIG. 1 SMIL source file, and that the timeline of FIG. 2c depends heavily upon relative durations of the video objects used therein. As described with respect to FIG. 1, FIG. 2a and FIG. 2b, the timeline of FIG. 2c shows that video V1 and video V2 are played sequentially while video V31 and video V32 are played in parallel. After termination of V32, and while video V2 is playing in parallel, video V4 and video V5 are sequentially played.
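The nesting rules walked through above can be sketched in a few lines of Python. This is only an illustration, not part of any SMIL player: the class names Video, Par and Seq, the schedule function and all durations are hypothetical, and a <par> group is assumed to end when its longest child ends, consistent with the description of time t1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Video:
    name: str
    duration: float  # seconds; assumed values, not taken from the figures


@dataclass
class Par:
    children: List["Node"] = field(default_factory=list)


@dataclass
class Seq:
    children: List["Node"] = field(default_factory=list)


Node = Union[Video, Par, Seq]


def schedule(node: Node, start: float, out: Dict[str, Tuple[float, float]]) -> float:
    """Record (begin, end) for every video under node; return the node's end time."""
    if isinstance(node, Video):
        end = start + node.duration
        out[node.name] = (start, end)
        return end
    if isinstance(node, Par):
        # All children begin together; the group ends when the last child ends.
        return max((schedule(c, start, out) for c in node.children), default=start)
    # Seq: each child begins when the previous one ends.
    t = start
    for child in node.children:
        t = schedule(child, t, out)
    return t


# Same nesting as described for FIG. 1, with made-up durations.
fig1 = Par([
    Seq([Video("V1", 10), Video("V2", 30)]),
    Seq([
        Par([Video("V31", 15), Video("V32", 20)]),
        Seq([Video("V4", 5), Video("V5", 5)]),
    ]),
])

timeline: Dict[str, Tuple[float, float]] = {}
total = schedule(fig1, 0.0, timeline)
```

With these assumed durations, V4 begins at 20 seconds, when V32 (the longer of V31 and V32) ends, and V2 is still playing when V5 finishes, in line with the ordering described for FIG. 2c. Different durations would produce a different timeline from the same source file.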
The SMIL specification provides for presentation control mechanisms in addition to those shown in FIG. 1. For example, attributes can be added to the <par> and <seq> statements to alter their above-described functionality. The SMIL specification also describes several media object elements besides the <video> element used in FIG. 1. The full set of media object elements, which are described in detail in the SMIL specification, comprises <ref>, <animation>, <audio>, <img>, <video>, <text> and <textstream>. Each of the listed media object elements can also be used with specified attributes which influence their respective functioning.
Notably, the <par>, <seq>, and media object elements are each controllable based on system attributes, such as bit rate and screen size. As described in the SMIL specification, when an attribute specified for an element evaluates to “false”, the element carrying this attribute is ignored. For example, if statement 170 of FIG. 1 were replaced by
<video src=“v31.mpg” system-bitrate=“56000”/>
then, upon encountering line 170, a SMIL-enabled browser would evaluate the approximate bandwidth, in bits-per-second, available to the currently-executing system. In a case that the bandwidth is smaller than 56000, the replaced line 170 would be ignored. As a result, video V31 would be absent from the display shown in FIG. 2a.
Other system attributes defined in detail in the SMIL specification include “system-captions”, “system-language”, “system-overdub-or-caption”, “system-required” and “system-screen-depth”. Each of these attributes can be included in a <par>, <seq>, or media object statement as described above to change the functioning thereof.
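The way a test attribute gates an element can be sketched as follows. This is a minimal illustration that handles only system-bitrate; the function name passes_bitrate_test and the two sample bandwidth figures are assumptions, and a real browser would estimate the available bandwidth itself rather than receive it as an argument.

```python
import xml.etree.ElementTree as ET


def passes_bitrate_test(element: ET.Element, available_bps: int) -> bool:
    """True unless a system-bitrate attribute asks for more than is available."""
    required = element.get("system-bitrate")
    return required is None or available_bps >= int(required)


# The replacement statement discussed for line 170 of FIG. 1.
video = ET.fromstring('<video src="v31.mpg" system-bitrate="56000"/>')

shown_on_fast_link = passes_bitrate_test(video, 64000)  # assumed bandwidth
shown_on_slow_link = passes_bitrate_test(video, 28800)  # assumed bandwidth
```

On the assumed 64000 bits-per-second link the element is kept, while on the assumed 28800 bits-per-second link it is ignored, which corresponds to video V31 being absent from FIG. 2a.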
SMIL 1.0 also provides a <switch> element which can be helpful in controlling a multimedia presentation based on the above-described system attributes. The <switch> element allows an author to specify a set of alternative elements from which only one acceptable element should be chosen.
In this regard, upon encountering a <switch> element in a SMIL document, a browser evaluates the children of the <switch> element in the order in which they are listed. The first accepted child is selected to the exclusion of all other children of the <switch> element.
For example, in a case that lines 160, 190, 200 and 161 of FIG. 1 were replaced with
<switch>
  <video src=“v4.mpg” system-bitrate=“56000”/>
  <video src=“v5.mpg” system-bitrate=“28800”/>
</switch>
the presentation sequence governed by FIG. 1 would depend upon the approximated bandwidth. Specifically, in a case that the bandwidth is greater than or equal to 56000, only video V4 would be played in FIG. 2b upon termination of the parallel-executing video V31 and video V32. Conversely, if the bandwidth is less than 56000 but greater than or equal to 28800, only video V5 would be played in FIG. 2b.
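The first-acceptable-child rule for <switch> can be sketched in the same spirit. The function name choose and the three sample bandwidths are hypothetical, and only the system-bitrate test is modeled; the other system attributes would need their own checks.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SWITCH = """
<switch>
  <video src="v4.mpg" system-bitrate="56000"/>
  <video src="v5.mpg" system-bitrate="28800"/>
</switch>
"""


def choose(switch: ET.Element, available_bps: int) -> Optional[str]:
    """Return the src of the first child whose bitrate test passes, if any."""
    for child in switch:  # children are visited in document order
        required = child.get("system-bitrate")
        if required is None or available_bps >= int(required):
            return child.get("src")
    return None  # no child is acceptable, so nothing is played


switch = ET.fromstring(SWITCH.strip())
picks = [choose(switch, bps) for bps in (64000, 33600, 14400)]
```

For the three assumed bandwidths, picks is v4.mpg, then v5.mpg, then no element at all. Because evaluation stops at the first acceptable child, the order in which the alternatives are listed matters: listing the 28800 entry first would cause it to be chosen even on a fast link.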
One shortcoming of the above-described elements is that the testable attributes defined by SMIL 1.0 are all system-based attributes, in that they concern characteristics of the computing system upon which the browser is executing, or the system to which the browser is connected. Thus, although the <par> and <seq> synchronization elements and the testable attributes described above are useful in creating a multimedia presentation, these features do not provide control of multimedia presentations based on anything non-system based, such as a user interaction, a timed event, or a state change of an object.
In addition, SMIL 1.0 offers no ability to change an attribute of an object during execution of a multimedia presentation. As such, the functionality and flexibility of SMIL 1.0 are somewhat limited.
Accordingly, what is needed is a system for producing and displaying pages to be transmitted over the World Wide Web which provides control of multimedia objects on a web page based on user events, system events, timed events, or object state, and control over object attributes. Such a system also preferably allows simple and efficient creation and editing of such a web page and the control mechanisms therein.