1. Field of the Invention
This invention relates generally to real-time, interactive multimedia and more specifically relates to rendering and blending techniques and systems used in the creation and presentation of real-time, interactive, three-dimensional (3-D) multimedia content on computer systems.
2. Discussion of Related Art
There exist standard rendering techniques referred to as “image composition.” Image composition generally involves taking multiple static images and blending or mixing them together to form a more visually complex and appealing image.
There are several key steps to traditional image composition methods. First, several images are created independently through any of a variety of standard techniques. For example, a digital camera can record a picture in digital memory, or a three-dimensional (3-D) modeling and animation software package can render a digital, two-dimensional (2-D) image projection of a 3-D scene and save it as a computer file.
Second, a plurality of such static images is selected and loaded into a composition software application. The images are typically created in a digital format or may be converted to a digital format from their original form.
Next, “layers” of images are combined or rendered together. Each image may be associated with a layer. A layer may be envisioned as a sheet of clear plastic imprinted with a 2-D image. Composition may then be envisioned as the stacking of multiple such layers one atop another and taking a picture of the final result.
Merely stacking such layers of images is insufficient to create a complex composite image. Rather, each layer must be “blended” with other layers, so that layers underneath a higher layer show through as desired in the final resulting image. As known in the art, such blending often uses extra image information known as “alpha channel” or “alpha” information that, for example, may be used to define the level of transparency for each object or pixel in the particular layer. Referring back to the metaphor of a clear plastic sheet imprinted with an image, one can further imagine that the alpha channel information may be used to define where different areas of the plastic sheet (different portions of the imprinted image) are more or less transparent than other areas of the plastic sheet.
Traditional image composition takes two distinct forms. A first type, considered more primitive, is often referred to as “masking”, wherein particular portions of the combined images are either fully transparent or completely opaque. A second type, generally considered more advanced, is herein referred to as “blending”, wherein each portion of an image may be combined mathematically with another image, allowing, for example, each portion of the image to have any level of transparency in a spectrum ranging from fully transparent through completely opaque.
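By way of illustration only, the distinction between masking and blending may be sketched for a single color channel of a single pixel as follows. The function names `blend`, `mask`, and `composite`, the 0.0 to 1.0 value range, and the 0.5 masking threshold are assumptions made for this sketch and are not drawn from any particular composition application.

```python
from typing import List, Tuple

# A layer pixel is (value, alpha), both in the range 0.0..1.0 (an assumption
# of this sketch). alpha = 0.0 is fully transparent, 1.0 is completely opaque.
Pixel = Tuple[float, float]


def blend(top: Pixel, below: float) -> float:
    """Blending: mix the top layer with what lies below, weighted by alpha."""
    value, alpha = top
    return alpha * value + (1.0 - alpha) * below


def mask(top: Pixel, below: float, threshold: float = 0.5) -> float:
    """Masking: the top layer is either fully shown or fully hidden.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    value, alpha = top
    return value if alpha >= threshold else below


def composite(layers: List[Pixel], background: float = 0.0) -> float:
    """Stack layers from bottom to top, blending each onto the result so far."""
    result = background
    for layer in layers:
        result = blend(layer, result)
    return result
```

In this sketch, masking is simply the special case in which every alpha value is forced to either 0.0 or 1.0, which is why it cannot express partial transparency.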
The techniques of advanced image composition can be applied to traditional video presentations, as videos are merely a series of static images presented to the viewer in quick succession. Postproduction video image composition generally involves taking multiple video clips from independent sources, such as video cameras and 3-D graphics software, and blending or mixing their frame images together to form a more visually complex and appealing video. This step is typically called “postproduction” because it is a time-intensive step that occurs after the primary video sources are created. Video presentations created with advanced image composition techniques using blending during postproduction can be more visually captivating and information-dense than video presentations that do not use these techniques.
There exists a common media form known as “real-time interactive multimedia”. Real-time interactive multimedia generally involves the real-time construction of a graphical presentation on an end-user's computing device, and the subsequent display of the constructed presentation on a video monitor or viewing device. Such presentations typically consist of a variety of media objects such as 2-D graphics, 3-D graphics, video, and text, all brought together in a single presentation. Real-time interactive multimedia presentations usually animate or modify these media objects through time for visual or functional effect. The modifications are often in response to user interaction with an input device such as a computer mouse or keyboard.
Computing devices include end-user computers such as personal computers (“PCs”), set-top devices, personal digital assistants (“PDAs”) and workstations (all referred to herein synonymously as “computers”, “personal computers”, “user systems” or “PCs”).
The term “real-time” as used herein refers to the fact that a computer system is constructing, or dynamically rendering, a presentation image in time for it to be displayed without the viewer losing a sense of visual continuity. The term “visual continuity” refers to the ability to cause the human visual cortex to see a continuous progression of visual events from a time sequence of discrete frames or images that are displayed in quick succession. Movie theaters use this technique, displaying a time sequence of pictures at a rate of 24 frames per second. Experts in human vision and signal processing observe that visual continuity decreases as the rate at which a series of pictures is displayed, also known as the “frame rate”, decreases. Many dependent factors affect visual continuity at a given frame rate, including the type of multimedia presentation and the activity of the media objects within the presentation. Generally speaking, 6 to 7 frames per second may be considered low quality, 8 to 19 frames per second may be considered good quality, and 20 frames per second and above may be considered high quality for multimedia presentations. Visual continuity may be achieved for special purposes in special sequences of images at rates of 5 frames per second or lower. In general, for most common multimedia presentations, visual continuity requires a frame rate of at least 5 frames per second.
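By way of illustration only, the relationship between frame rate and the time available to construct each frame may be sketched as follows. The function names `frame_budget_ms` and `quality_band` are hypothetical. The band boundaries follow the figures given above; how rates falling between the stated bands are classified, and the labels "minimal" and "insufficient", are assumptions of this sketch.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to construct one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def quality_band(frames_per_second: float) -> str:
    """Classify a frame rate using the rough bands described in the text."""
    if frames_per_second >= 20:
        return "high"
    if frames_per_second >= 8:
        return "good"
    if frames_per_second >= 6:
        return "low"
    if frames_per_second >= 5:
        # Assumption: the text gives no label for rates between 5 and 6.
        return "minimal"
    # Below the general threshold for visual continuity.
    return "insufficient"
```

For example, at 20 frames per second each frame must be constructed within 50 milliseconds, whereas at 5 frames per second the system has 200 milliseconds per frame.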
Because each frame, or visual image, of a real-time interactive multimedia presentation is usually constructed after the last frame was presented to the viewer, but before the time at which visual continuity would be suspended, input to the computer by a user can affect the course of events in the presentation. Such interaction by the user allows the personal computer to produce a visual image, or frame, that differs from what would have been constructed and presented had the user not interacted with the presentation. This differs significantly from traditional video, where a series of static, pre-created images are displayed to a viewer in quick succession.
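By way of illustration only, the manner in which user input may alter subsequently constructed frames can be sketched as follows. The names `render_frame` and `run_presentation`, the single `x` coordinate, and the representation of a frame as a text string are all assumptions made to keep the sketch small; an actual system would poll an input device and produce image data.

```python
from typing import Dict, List


def render_frame(state: Dict[str, int]) -> str:
    """Construct one frame from the current presentation state."""
    return f"object at x={state['x']}"


def run_presentation(inputs: List[int], step: int = 1) -> List[str]:
    """Render one frame per entry in `inputs`.

    Each entry is the user's input received since the previous frame
    (0 means no interaction). It is applied before the next frame is built,
    so it changes that frame and every frame after it.
    """
    state = {"x": 0}
    frames = []
    for user_delta in inputs:
        state["x"] += step + user_delta  # scripted motion plus user input
        frames.append(render_frame(state))
    return frames
```

With no interaction, `run_presentation([0, 0, 0])` yields the same three frames every time, much like pre-created video. An input of 5 before the second frame, as in `run_presentation([0, 5, 0])`, yields different second and third frames.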
Real-time interactive multimedia presentations are usually stored as descriptions that tell the computer how to use various media objects to construct, or render, frames of images through time. Additionally, such descriptions instruct the computer as to how it should respond to user input during the presentation, allowing for increased utility for the user. Consequently, real-time interactive multimedia presentations can produce large quantities of visual information from relatively small mathematical and algorithmic descriptions, by combining and rendering media objects in real-time on a viewer's computer. Such a description for constructing real-time interactive multimedia imagery is also known herein as “presentation data.”
More specifically, the presentation data used for the creation or rendering of a real-time interactive multimedia presentation typically includes scenes and scene views. As used herein, a “scene” is an algorithmic and mathematical description of media objects and their behavior through time, existing within a common coordinate system. As known in the art, a scene may have one or more associated virtual “cameras”, also known herein as “scene views”, or simply “views”. A scene view is a description of how image data should be calculated or rendered from an associated scene. A scene view is described in relation to a coordinate system in which media objects belonging to that scene are situated, enabling imagery to be derived from the scene. The properties of a view, such as where the view is spatially situated within the scene coordinate system and how the view is rotated in relation to the scene coordinate system, affect the imagery that is derived from the scene. Additionally, the scene view may specify additional properties that affect how the image data is rendered or calculated from the scene.
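By way of illustration only, the relationship between a scene, its media objects, and its scene views may be sketched as the following data structures. The class names, the field names, and the helper `relative_position` are hypothetical. The helper ignores view rotation for brevity, which is a simplification of this sketch rather than a property of any real rendering system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class MediaObject:
    name: str
    position: Vec3  # location in the scene coordinate system


@dataclass
class SceneView:
    name: str
    position: Vec3  # where the view is situated within the scene
    rotation_degrees: Vec3 = (0.0, 0.0, 0.0)  # how the view is rotated


@dataclass
class Scene:
    objects: List[MediaObject] = field(default_factory=list)
    views: List[SceneView] = field(default_factory=list)


def relative_position(view: SceneView, obj: MediaObject) -> Vec3:
    """Object position as seen from the view's location (rotation ignored)."""
    return (
        obj.position[0] - view.position[0],
        obj.position[1] - view.position[1],
        obj.position[2] - view.position[2],
    )


# One scene, one object, two views: each view derives different imagery.
cube = MediaObject("cube", (1.0, 2.0, 3.0))
front = SceneView("front", (0.0, 0.0, 5.0))
side = SceneView("side", (1.0, 0.0, 0.0))
scene = Scene(objects=[cube], views=[front, side])
```

Here the same object lies at (1.0, 2.0, -2.0) relative to the `front` view and at (0.0, 2.0, 3.0) relative to the `side` view, which illustrates how the properties of a view affect what is derived from a shared scene.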
Real-time interactive multimedia presentation files tend to be much smaller, measured in bytes, than comparable-quality digital video files that display the same visual information. Additionally, traditional video is not generally interactive, and therefore does not allow a user to change the course of events in a presentation while it is being viewed. Consequently, real-time interactive multimedia is very desirable for many applications where traditional video is too large, such as delivery over a bandwidth-constrained computer network, or where traditional video does not provide the required interactivity, such as educational applications.
However, real-time interactive multimedia systems typically use forms of image composition that are inferior in quality and style to advanced postproduction image composition techniques used with static images and traditional video. Specifically, real-time interactive multimedia systems allow for limited forms of image composition using real-time 2-D and 3-D scenes, but do not allow image composition involving the blending of imagery derived from multiple real-time 3-D scene views. The blending of imagery derived from multiple real-time views in relation to one or more 3-D scenes yields a graphical style similar to the advanced postproduction-oriented image composition used in the creation of static images and traditional video. Consequently, the real-time image composition methods operable in current real-time interactive multimedia systems handicap the visual quality of their presentations, rendering them less visually captivating and less information dense.
In sum, present media systems confront the user with a choice: the user can use visually appealing video with advanced image composition but sacrifice dynamic interactivity and smaller file sizes, or can use size-efficient, real-time interactive multimedia but sacrifice the visually appealing features of blended, layered 3-D image composition.
It is evident from the above discussion that a need exists for an improved method of rendering real-time interactive multimedia presentations.