1. Field of the Invention
This invention is directed to multimedia data storage, transmission and compression systems and methods. In particular, this invention is directed to systems and methods that implement the MPEG-J multimedia data storage, transmission and compression standards. This invention is also directed to control systems and methods that allow for graceful degradation and enhanced functionality and user interactivity of MPEG-4 systems.
2. Related Art
The need for interoperability, guaranteed quality and performance, and economies of scale in chip design, as well as the cost involved in content generation for a multiplicity of formats, has led to advances in standardization in the areas of multimedia coding, packetization and robust delivery. In particular, the International Organization for Standardization Moving Picture Experts Group (ISO MPEG) has created a number of standards, such as MPEG-1, MPEG-2, MPEG-4 and MPEG-J, to standardize bitstream syntax and decoding semantics for coded multimedia.
In MPEG-1 systems and MPEG-2 systems, the audio-video model was very simple: a given elementary stream covered the entire scene. In particular, MPEG-1 systems and MPEG-2 systems were only concerned with representing temporal attributes. Thus, there was no need to represent spatial attributes in a scene in MPEG-1 systems and MPEG-2 systems.
The success of MPEG-1 and MPEG-2, the bandwidth limitations of the Internet and other distributed networks and of mobile channels, the flexibility of distributed network-based data access using browsers, and the increasing need for interactive personal communication have opened up new paradigms for multimedia usage and control. The MPEG-4 standard addresses coding of audio-visual information in the form of individual objects and a system for combining and synchronizing playback of these objects.
MPEG-4 systems introduced audio-video objects, requiring that the spatial attributes in the scene also be correctly represented. Including synthetic audio-video content in MPEG-4 systems is a departure from the model of MPEG-1 systems and MPEG-2 systems, where only natural audio-video content representation was addressed. MPEG-4 systems thus provide the required methods and structures for representing synthetic and natural audio-video information. In particular, MPEG-4 audio-video content has temporal and spatial attributes that need to be correctly represented at the point of content generation, i.e., during encoding, and that also need to be correctly presented at the player/decoder. Because the MPEG-4 player/decoder also allows for limited user interactivity, it should more properly be referred to as an MPEG-4 browser.
Correctly representing temporal attributes in MPEG-4 systems is essentially no different than in MPEG-1 systems and MPEG-2 systems. For these earlier standards, the temporal attributes were used to synchronize the audio portions of the data with the video portions of the data, i.e., audio-video synchronization such as lip-synchronization, and to provide system clock information to the decoder to help buffer management. Because significantly more diverse types of elementary streams can be included in MPEG-4 systems, representing temporal attributes is more complex. But, as mentioned earlier, the fundamental methods for representing temporal attributes in MPEG-4 systems are essentially the same as for MPEG-1 systems and MPEG-2 systems.
In the MPEG-1 systems and MPEG-2 systems standards, the specifications extend monolithically from the packetization layer all the way to the transport layer. For example, the MPEG-2 systems Transport Stream specification defines the packetization of elementary streams (i.e., the PES layer) as well as the transport layer. With MPEG-4 systems, this restriction has been relaxed. The transport layer is not defined normatively, as the transport layer is very application-specific. It is left to other standards-setting bodies to define the transport layer for their respective application areas. One such body is the Internet Engineering Task Force (IETF), which will define standards for transporting MPEG-4 streams over the Internet.
Representing spatial information in MPEG-4 systems is carried out using a parametric approach to scene description. This parametric approach uses the Virtual Reality Modeling Language (VRML). The Virtual Reality Modeling Language allows spatial and temporal relationships between objects to be specified, and allows description of a scene using a scene graph approach.
The scene description defines one or more dynamic properties of one or more audio and video objects. However, in MPEG-4 systems, the Virtual Reality Modeling Language has been extended to provide features otherwise missing from the Virtual Reality Modeling Language.
MPEG-4 uses a binary representation, BInary Format for Scene (BIFS), of the constructs central to VRML and extends VRML in many ways to handle real-time audio/video data and facial/body animation. The key extensions to the Virtual Reality Modeling Language for MPEG-4 systems involve streaming, timing and integrating 2D and 3D objects. These extensions are all included in the BInary Format for Scene (BIFS) specification.
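As a rough illustration of the scene graph approach described above, the following Java sketch models a scene as a tree of nodes, each carrying a spatial attribute (an offset relative to its parent) and a temporal attribute (a start time). All class, field and method names are hypothetical and are not taken from VRML, BIFS or any MPEG-4 specification. The rule that a node that has not yet started hides its subtree is likewise an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scene graph sketch; names are illustrative, not from VRML or BIFS. */
final class SceneGraphSketch {

    /** A node with a position relative to its parent and a presentation start time. */
    static final class Node {
        final String name;
        final double x, y;          // spatial attribute: offset from the parent node
        final double startSeconds;  // temporal attribute: when the object appears
        final List<Node> children = new ArrayList<>();

        Node(String name, double x, double y, double startSeconds) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.startSeconds = startSeconds;
        }

        /** Attaches a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Lists every node visible at time t, together with its absolute position. */
    static List<String> visibleAt(Node root, double t) {
        List<String> out = new ArrayList<>();
        collect(root, 0.0, 0.0, t, out);
        return out;
    }

    private static void collect(Node n, double px, double py, double t, List<String> out) {
        if (n.startSeconds > t) {
            return; // assumption: a node that has not started hides its whole subtree
        }
        double ax = px + n.x; // absolute position = parent position + local offset
        double ay = py + n.y;
        out.add(n.name + "@(" + ax + "," + ay + ")");
        for (Node c : n.children) {
            collect(c, ax, ay, t, out);
        }
    }

    public static void main(String[] args) {
        Node scene = new Node("scene", 0, 0, 0)
            .add(new Node("background", 0, 0, 0))
            .add(new Node("presenter", 100, 50, 0)
                .add(new Node("caption", 10, 120, 5)));
        System.out.println(visibleAt(scene, 2.0));
        System.out.println(visibleAt(scene, 6.0));
    }
}
```

In this sketch, the caption is positioned relative to the presenter, so moving the presenter node moves the caption with it, which is the main convenience of describing a scene as a graph rather than as a flat list of objects.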
FIG. 1 outlines one exemplary embodiment of an MPEG-4 systems player, which is also referred to as a “Presentation Engine” or an “MPEG-4 browser”. The main components on the main data path are the demultiplexer layer, the media decoders, and the compositor/renderer. Between these three sets of components there are decoder buffers and composition buffers, respectively. The MPEG-4 systems decoder model has been developed to provide guidelines for platform developers. The binary format for scene data is extracted from the demultiplexer layer, and it is used to construct the scene graph.
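The main data path just described can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the MPEG-4 systems decoder model itself: the class and method names are hypothetical, payloads are plain strings, and "decoding" is a placeholder string transformation. It only shows the ordering of the stages: the demultiplexer fills per-stream decoder buffers, the decoders move data into composition buffers, and the compositor reads from the composition buffers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

/** Hypothetical sketch of the demultiplexer -> decoder -> compositor data path. */
final class PlayerPipelineSketch {

    /** One multiplexed packet: the elementary stream it belongs to, plus its payload. */
    static final class Packet {
        final int streamId;
        final String payload;

        Packet(int streamId, String payload) {
            this.streamId = streamId;
            this.payload = payload;
        }
    }

    // One decoder buffer and one composition buffer per elementary stream,
    // kept in stream-id order so that composition is deterministic.
    private final Map<Integer, Queue<String>> decoderBuffers = new TreeMap<>();
    private final Map<Integer, Queue<String>> compositionBuffers = new TreeMap<>();

    /** Demultiplexer layer: routes each packet to the decoder buffer of its stream. */
    void demultiplex(List<Packet> multiplexed) {
        for (Packet p : multiplexed) {
            decoderBuffers.computeIfAbsent(p.streamId, id -> new ArrayDeque<>()).add(p.payload);
        }
    }

    /** Media decoders: drain each decoder buffer into the matching composition buffer. */
    void decodeAll() {
        for (Map.Entry<Integer, Queue<String>> e : decoderBuffers.entrySet()) {
            Queue<String> out =
                compositionBuffers.computeIfAbsent(e.getKey(), id -> new ArrayDeque<>());
            Queue<String> in = e.getValue();
            while (!in.isEmpty()) {
                out.add("decoded(" + in.poll() + ")"); // placeholder for real media decoding
            }
        }
    }

    /** Compositor/renderer: takes at most one decoded unit from every stream. */
    List<String> composeFrame() {
        List<String> frame = new ArrayList<>();
        for (Queue<String> q : compositionBuffers.values()) {
            String unit = q.poll();
            if (unit != null) {
                frame.add(unit);
            }
        }
        return frame;
    }

    public static void main(String[] args) {
        PlayerPipelineSketch player = new PlayerPipelineSketch();
        List<Packet> input = new ArrayList<>();
        input.add(new Packet(1, "video-0"));
        input.add(new Packet(2, "audio-0"));
        input.add(new Packet(1, "video-1"));
        player.demultiplex(input);
        player.decodeAll();
        System.out.println(player.composeFrame());
        System.out.println(player.composeFrame());
    }
}
```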
Using application programming interfaces (APIs) has long been recognized in the software industry as a way to achieve standardized operations and functions over a number of different types of computer platforms. Typically, although operations can be standardized via definition of the API, the performance of these operations may still differ on various platforms, as specific vendors with interest in a specific platform may provide implementations optimized for that platform.
To enhance the features of VRML and to allow programmatic control, DimensionX has released a set of APIs known as Liquid Reality. Recently, Sun Microsystems has announced an early version of Java3D, an API specification that supports representing synthetic audiovisual objects as a scene graph. Sun Microsystems has also released the Java Media Framework Player API, a framework for multimedia playback.
As noted above, when coded multimedia is used for distributed networked and local networked applications on a multimedia data processing system, such as a personal computer, a number of situations may arise. First, the bandwidth for networked access of multimedia may be either limited or time-varying, requiring transmission of only the most significant information, followed by transmitting additional information as more bandwidth becomes available.
Second, regardless of the bandwidth available, the client, i.e., the multimedia data processing system, decoding the transmitted information may be limited in processing and/or memory resources. Furthermore, these resources may be time-varying. Third, a multimedia user may require highly interactive nonlinear browsing and playback. This is not unusual, because significant amounts of textual content on distributed networks, such as the Internet, are capable of being browsed using hyperlinked features and because this is also expected to be true for presentations employing coded audio-visual objects. The parametric MPEG-4 system may only be able to deal with these situations in a very limited way. For example, when the parametric MPEG-4 system is incapable of decoding or presenting all of the coded audio-visual objects, the parametric MPEG-4 system may respond by dropping those objects or temporal occurrences of those objects. However, this results in choppy audio-visual presentations. Further, MPEG-4 may not offer any sophisticated control to the user to allow the user to deal with these situations.
To get around the limitations of this known parametric MPEG-4 system, another known implementation of the MPEG-4 standard is a programmatic MPEG-4 system. U.S. patent application Ser. No. 09/055,934, incorporated herein by reference, discloses such a programmatic MPEG-4 system. This programmatic MPEG-4 system includes a set of defined application programming interfaces (APIs) for media decoding, user functionalities and authoring. These application programming interfaces can be invoked by client applications. This programmatic MPEG-4 system allows a number of enhanced real-time and other functions in response to user inputs, as well as graceful degradation in the face of limited system resources available to MPEG-4 clients.
The incorporated '934 application discloses standardized interfaces for MPEG-4 playback and browsing under user control, as well as one type of response to time-varying local and networked resources. These interfaces facilitate adaptation of coded media data to immediately available terminal resources. These interfaces also facilitate interactivity expected to be sought by users, either directly as a functionality or indirectly embedded in audiovisual applications and services expected to be important in the future.
The incorporated '934 application also discloses an interfacing method in the form of a robust application programming interface specification including a visual decoding interface, a progressive interface, a hot object interface, a directional interface, a trick mode, a transparency interface, and a stream editing interface. These interfaces facilitate a substantial degree of adaptivity.
This invention provides systems and methods that use a combination of MPEG-4 media and safe executable code so that content creators can embed complex control mechanisms within their media data to intelligently manage the operation of the audio-visual session.
This invention separately provides systems and methods for implementing the MPEG-J video data storage, compression and decoding standards.
This invention separately provides an improved MPEG-J architecture.
This invention additionally provides an improved MPEG-J architecture having improved structure, modularity and organization.
This invention separately provides an MPEG-J application engine that allows for graceful degradation of MPEG-4 content in view of limited processing, memory or bandwidth resources.
This invention separately provides an MPEG-J application engine that allows for enhanced functionality of and user interactivity with MPEG-4 content.
This invention separately provides application programming interfaces for MPEG-J.
New paradigms in communication, software and networking, such as that offered by the Java™ language, offer new opportunities for flexibility, adaptivity and user interaction. For instance, the advent of the Java™ language offers networking and platform independence critical to downloading and executing of applets, such as, for example, Java classes, on a client system from a server system storing the applets. Depending on the design of the applet, either a single access to the data stored on the server may be needed and all the necessary data may be stored on the client, or several partial accesses may be needed. This partial access design is used to reduce storage space and time needed for startup. This partial access design is referred to as streamed playback.
This invention provides a collection of Java APIs with which applications can be developed to interact with a data processor and content. According to this invention, MPEG-J is a Java™-enabled set of standards that define the file organization, storage and compression of video data streams. In the context of MPEG-J according to this invention, the data processor can be implemented as a set-top box or a PC with Java packages conforming to a well-defined Java platform. The Java-based application includes Java byte code, which may be available from a local source, like a hard disk, or which may be loaded from a remote site over a network. As indicated above, the term “MPEG-4 browser” refers to the MPEG-4 system. MPEG-J adds programmatic control to the MPEG-4 system, through an “Application Engine”. The Application Engine enhances the Presentation Engine by providing added interactive capability. The MPEG-J Java byte code will be available to the MPEG-J Application Engine as a separate elementary stream.
The improved architecture and application programming interfaces (APIs) of MPEG-J according to this invention allow selective media decoding facilitating graceful degradation to varying resources of a client, as well as improved functionalities as required in interactive user applications.
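One way to picture selective media decoding for graceful degradation is the Java sketch below. It is an illustration under stated assumptions and not the method of this invention: the class and method names are hypothetical, each object is assumed to carry a priority and an estimated decoding cost in arbitrary units, and a simple greedy rule (highest priority first, skipping anything that no longer fits) stands in for whatever policy an actual application would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: choose which coded objects to decode under a resource budget. */
final class DegradationSketch {

    static final class MediaObject {
        final String name;
        final int priority; // higher means more important to the presentation
        final int cost;     // assumed decoding cost in arbitrary units

        MediaObject(String name, int priority, int cost) {
            this.name = name;
            this.priority = priority;
            this.cost = cost;
        }
    }

    /** Visits objects in descending priority order and keeps each one that still fits. */
    static List<String> selectForDecoding(List<MediaObject> objects, int budget) {
        List<MediaObject> sorted = new ArrayList<>(objects);
        sorted.sort(Comparator.comparingInt((MediaObject o) -> o.priority).reversed());
        List<String> chosen = new ArrayList<>();
        int remaining = budget;
        for (MediaObject o : sorted) {
            if (o.cost <= remaining) {
                chosen.add(o.name);
                remaining -= o.cost;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<MediaObject> objects = Arrays.asList(
            new MediaObject("background", 1, 3),
            new MediaObject("speech", 3, 2),
            new MediaObject("video", 2, 5));
        System.out.println(selectForDecoding(objects, 10)); // ample resources
        System.out.println(selectForDecoding(objects, 6));  // constrained resources
    }
}
```

Because the selection is recomputed from the current budget, calling it again as resources change lets the set of decoded objects shrink or grow in a controlled order instead of dropping objects arbitrarily.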
In one potential use of MPEG-J, a content provider designs all of the MPEG-J content, i.e., the MPEG-J data stream. This use of the MPEG-J standard is desirable for content providers, and requires only incremental updates to the MPEG-J data stream. In this case, any changes to the MPEG-J data stream can be done using binary format for scene (BIFS) updates. In another potential use of MPEG-J, the client dynamically controls the displayed video scene generated from the MPEG-J data stream. This use of the MPEG-J standard is desirable for set-top manufacturers. However, change in non-updatable nodes of the MPEG-J data stream may not be possible.
MPEG-J will eventually need to serve both of the potential uses. Since the first potential use is more deterministic than the second potential use, the systems and methods of this invention fully implement the first potential use. The systems and methods of this invention provide the hooks to partially implement the second potential use. In particular, in one exemplary embodiment of the systems and methods of this invention, the MPEG-J scene graph capabilities are always based on the tightly-integrated, i.e., content-provider-oriented, model.
In one exemplary embodiment of the application programming interfaces of this invention, the MPEG-J application programming interface (API) is not a single application programming interface, but rather is a collection of application programming interfaces (APIs) that address various interfaces for a flexible MPEG-4 system. In one exemplary embodiment, the MPEG-J application programming interfaces are implemented using the Java language. The application programming interfaces include one or more of an Application Manager API; a SceneGraph Manager API; a Resource Manager API; an Interactivity Manager API; a Media Decoders API; a Functionality API; a Networking API; and a Device API. In one exemplary embodiment of the application programming interfaces of this invention, one or more of these application programming interfaces are implemented as object-oriented-programming object classes. The object classes are organized into various packages. In particular, various objects of one or more of the application programming interfaces are collected into a single package.
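To make the idea of a collection of cooperating APIs more concrete, the following Java sketch shows two of the named APIs as small interfaces and a registry through which an application looks them up. Only the API names (Resource Manager, Media Decoders) come from the description above. Every interface, method and class shown here, including the `Terminal` registry and the `adapt` helper, is hypothetical and chosen for illustration only. A single file is used for brevity, so nested types stand in for what would be separate packages.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an application using a collection of separate APIs. */
final class MpegJApiSketch {

    /** Stand-in for a Resource Manager API: reports currently available resources. */
    interface ResourceManager {
        int availableDecodingUnits();
    }

    /** Stand-in for a Media Decoders API: lets an application start or stop a decoder. */
    interface MediaDecoder {
        void start();
        void stop();
        boolean isRunning();
    }

    /** Simple registry standing in for the terminal that exposes the APIs. */
    static final class Terminal {
        private final Map<Class<?>, Object> apis = new HashMap<>();

        <T> void register(Class<T> type, T implementation) {
            apis.put(type, implementation);
        }

        <T> T getApi(Class<T> type) {
            Object implementation = apis.get(type);
            if (implementation == null) {
                throw new IllegalStateException("API not available: " + type.getSimpleName());
            }
            return type.cast(implementation);
        }
    }

    /** Trivial decoder implementation that only tracks whether it is running. */
    static final class SimpleDecoder implements MediaDecoder {
        private boolean running;

        public void start() { running = true; }
        public void stop() { running = false; }
        public boolean isRunning() { return running; }
    }

    /** Application logic: run the decoder only while enough resources are available. */
    static void adapt(Terminal terminal, int requiredUnits) {
        ResourceManager resources = terminal.getApi(ResourceManager.class);
        MediaDecoder decoder = terminal.getApi(MediaDecoder.class);
        if (resources.availableDecodingUnits() >= requiredUnits) {
            decoder.start();
        } else {
            decoder.stop();
        }
    }

    public static void main(String[] args) {
        Terminal terminal = new Terminal();
        SimpleDecoder decoder = new SimpleDecoder();
        ResourceManager plenty = () -> 8;
        terminal.register(ResourceManager.class, plenty);
        terminal.register(MediaDecoder.class, decoder);
        adapt(terminal, 5);
        System.out.println("decoder running: " + decoder.isRunning());
    }
}
```

Keeping each concern behind its own interface means the application code in `adapt` depends only on the interfaces, so a platform vendor could supply a different implementation of either one without changing the application.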
These and other features and advantages of this invention are described in or are apparent from the following detailed description of various exemplary embodiments of the systems and methods according to this invention.