1. Field of Invention
This invention relates to systems and methods for decoding and presenting encoded audio and visual data. More specifically, this invention relates to systems and methods for playing, browsing and interacting with MPEG-4 coded scenes including one or more audio and/or visual objects.
2. Description of Related Art
MPEG-1 and MPEG-2 encoding and decoding standards are frame-based encoding and decoding techniques. That is, in MPEG-1 and MPEG-2, audio-visual data, such as a video recording, is organized into separate frames, where each frame is a complete image. In MPEG-1 and MPEG-2, the human-recognizable objects within each image are not distinguished from each other in encoding and decoding the data defining the image. Thus, while each frame can be treated independently from any other frame, each frame is itself a unitary element of the audio-visual data. FIG. 1 is an exemplary embodiment of an MPEG-2 playback system.
The Virtual Reality Modeling Language, or VRML, is a computer language that is used to create text descriptions defining three-dimensional synthetic images. That is, VRML is used to define the three-dimensional objects that appear in a synthetic, e.g., computer-generated, image, including the shapes and sizes of the objects, the appearance of each object, including material, color, shading and texture, and the location of each object, including position and orientation. The objects are generally synthetic, e.g., computer-generated, objects. VRML is also used to define the lighting in the synthetic image, including the type and position of one or more light sources.
MPEG-4 is a new audio-visual data encoding and decoding standard. In particular, MPEG-4, in contrast to MPEG-1 and MPEG-2, is not a frame-based encoding and decoding technique. MPEG-4 is an object-based encoding and decoding technique. Objects can be synthetic or natural objects, and further, can be audio, video or graphics objects. In MPEG-4, each frame is decomposed into a plurality of different objects and a scene description graph that indicates where each object appears in that frame. The object-based nature of MPEG-4, along with requirements of flexible composition and user interactivity, requires using some scene description mechanism.
Each object resides in its own video object plane that defines at least that object's shape, motion, opaqueness and color, including surface texture. The scene description graph defines the spatial location of each object within the bounds of the frame. The scene description graph also defines the position of each object within the depth of the frame, i.e., which objects are "in front of" which other objects.
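By way of illustration only, the role of the scene description in placing objects within the frame and ordering them in depth can be sketched as follows. This is a minimal sketch and not the MPEG-4 scene description format itself: the `SceneObject` class, the `paint_order` and `frontmost_at` helpers, the rectangular object bounds, the frame dimensions, and the convention that a smaller `depth` value is nearer the viewer are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SceneObject:
    """Hypothetical stand-in for one object placed by a scene description."""
    name: str
    x: int       # left edge within the frame
    y: int       # top edge within the frame
    width: int
    height: int
    depth: int   # assumption: smaller value = nearer the viewer

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this object's bounds."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def paint_order(objects: List[SceneObject]) -> List[SceneObject]:
    """Order objects back to front, so nearer objects are drawn last."""
    return sorted(objects, key=lambda o: o.depth, reverse=True)


def frontmost_at(objects: List[SceneObject], px: int, py: int) -> Optional[SceneObject]:
    """Return the nearest object covering (px, py), or None if none does."""
    hits = [o for o in objects if o.contains(px, py)]
    return min(hits, key=lambda o: o.depth) if hits else None


# Illustrative scene: three objects at different depths in one frame.
scene = [
    SceneObject("background", 0, 0, 352, 288, depth=2),
    SceneObject("presenter", 100, 60, 120, 200, depth=1),
    SceneObject("logo", 10, 10, 40, 20, depth=0),
]

if __name__ == "__main__":
    print([o.name for o in paint_order(scene)])
    hit = frontmost_at(scene, 150, 100)
    print(hit.name if hit else None)
```

In this sketch, composing the frame amounts to drawing the objects in the order returned by `paint_order`, and a lookup such as `frontmost_at` is the kind of query a player could use to decide which object a user has selected.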
These features allow new kinds of flexibility not offered by simply decoding and presenting a video frame, as in MPEG-2. MPEG-4 players can be flexible, and the systems and methods for playing, browsing and interacting with MPEG-4 coded scenes of this invention allow users to browse two-dimensional (2D) or three-dimensional (3D) MPEG-4 scenes, which are typically composed from synthetic and natural media elements. Furthermore, the systems and methods for playing, browsing and interacting with MPEG-4 coded scenes of this invention allow users to interact with and customize such scenes. This invention further describes systems and methods for constructing MPEG-4 based multimedia players and browsers that facilitate these flexibilities, such as programmatic control via JavaScript and Java, and that enhance the user's experience, while, at the same time, remaining compatible with the MPEG-4 standards.
These and other features and advantages of this invention are described in or are apparent from the following detailed description of the systems and methods according to this invention.