1. Field of Invention
The invention relates to the field of coded multimedia and its storage and delivery to users, and more particularly to such coding when either the channel or decoding resources may be limited and time-varying, or when user applications require advanced interaction with coded multimedia objects.
2. Description of Related Art
Digital multimedia offers advantages including ease of manipulation, multigeneration processing, error robustness and others, but incurs constraints due to the storage capacity or transmission bandwidth required, and thus frequently requires compression or coding for practical applications. Further, in the wake of rapid increases in demand for digital multimedia over the Internet and other networks, and the resulting need for efficient storage, networked access, search and retrieval, a number of coding schemes, storage formats, retrieval techniques and transmission protocols have evolved. For instance, for image and graphics files, GIF, TIF and other formats have been used. Similarly, audio files have been coded and stored in RealAudio, WAV, MIDI and other formats. Animations and video files have often been stored using GIF89a, Cinepak, Indeo and others.
To play back the plethora of existing formats, decoders and interpreters are often needed; these may offer varying degrees of speed and quality depending on whether they are implemented in hardware or in software and, particularly in the case of software, on the capabilities of the host computer. If such content is embedded in web pages accessed via a computer (e.g., a PC), the web browser needs to be set up correctly for all the anticipated content: it must recognize each type of content and support a mechanism of content handlers (software plug-ins or hardware) to deal with that content.
The need for interoperability, guaranteed quality and performance, and economies of scale in chip design, as well as the cost involved in content generation for a multiplicity of formats, has led to advances in standardization in the areas of multimedia coding, packetization and robust delivery. In particular, ISO MPEG (International Organization for Standardization Moving Picture Experts Group) has standardized bitstream syntax and decoding semantics for coded multimedia in the form of two standards referred to as MPEG-1 and MPEG-2. MPEG-1 was primarily intended for use on digital storage media (DSM) such as compact disks (CDs), whereas MPEG-2 was primarily intended for use in a broadcast environment (transport stream), although it also supports an MPEG-1-like mechanism for use on DSM (program stream). MPEG-2 also included additional features such as Digital Storage Media Command and Control (DSM-CC) for the basic user interaction that may be needed for standardized playback of MPEG-2, either standalone or networked.
With the advent of inexpensive decoder boards/PCMCIA cards and the availability of sufficiently fast Central Processing Units (CPUs), the MPEG-1 standard is becoming commonly available for playback of movies and games on PCs. The MPEG-2 standard, on the other hand, since it addresses relatively higher-quality applications, is becoming common for entertainment applications via digital satellite TV, digital cable and Digital Versatile Disk (DVD). Besides the applications and platforms noted, MPEG-1 and MPEG-2 are expected to be utilized in various other configurations, in streams communicated over networks and streams stored on hard disks/CDs, as well as in combinations of networked and local access.
The success of MPEG-1 and MPEG-2, the bandwidth limitations of Internet and mobile channels, the flexibility of web-based data access using browsers, and the increasing need for interactive personal communication have opened up new paradigms for multimedia usage and control. In response, ISO MPEG started work on a new standard, MPEG-4. The MPEG-4 standard addresses coding of audio-visual information in the form of individual objects and a system for composition and synchronized playback of these objects. While the MPEG-4 development of such a fixed parametric system continues, new paradigms in communication, software and networking, such as those offered by the Java language, have presented new opportunities for flexibility, adaptivity and user interaction.
For instance, the advent of the Java language offers the networking and platform independence critical to downloading and executing applets (Java classes) on a client PC from the web server that hosts the web pages visited by the user. Depending on the design of the applet, either a single access to the data stored on the server may be needed, with all the necessary data then held on the client PC, or several partial accesses may be needed (to reduce the storage space and startup time required). The latter scenario is referred to as streamed playback.
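The streamed-playback approach described above can be sketched as follows. This is a minimal illustration only: the class and method names, the chunk size, and the simulated in-memory byte source (standing in for a server connection an applet would actually open) are assumptions for the sketch, not part of any standard.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamedPlayback {
    // Process a coded stream in fixed-size chunks as data arrives,
    // rather than downloading the whole presentation before playback begins.
    static int playStreamed(InputStream in, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int chunks = 0;
        while (in.read(buffer) != -1) {
            // A real player would hand each chunk to a decoder here.
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Simulated coded media data; an applet would instead stream
        // from a connection back to the web server hosting the page.
        byte[] media = new byte[10_000];
        int chunks = playStreamed(new ByteArrayInputStream(media), 4096);
        System.out.println("Decoded " + chunks + " chunks");  // prints "Decoded 3 chunks"
    }
}
```

Because decoding begins as soon as the first chunk arrives, startup latency and client storage requirements are reduced relative to a single full download.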
As noted, when coded multimedia is used for Internet and local networked applications on a computer such as a PC, a number of situations may arise. First, the bandwidth for networked access of multimedia may be either limited or time-varying, necessitating transmission of only the most significant information first, followed by other information as more bandwidth becomes available.
Second, regardless of the bandwidth available, the client-side PC on which decoding may have to take place may be limited in CPU and/or memory resources, and furthermore, these resources may be time-varying. Third, a multimedia user (consumer) may require highly interactive nonlinear browsing and playback; this is not unusual, since much textual content on web pages can already be browsed via hyperlinks, and the same paradigm is expected for presentations employing coded audio-visual objects. The parametric MPEG-4 system may be able to deal with the aforementioned situations only in a very limited way, such as by dropping objects, or temporal occurrences of objects, that it is incapable of decoding or presenting, resulting in choppy audio-visual presentations. Further, MPEG-4 may not offer the user any sophisticated control over such situations. To get around these limitations of the parametric system, one potential option for MPEG-4 development is a programmatic system.
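The coarse degradation strategy just described, dropping whole objects when decoding resources are insufficient, can be sketched as a simple priority selection under a resource budget. The object names, priority values and cost units below are hypothetical, chosen only to illustrate the idea; a parametric system is essentially limited to this kind of all-or-nothing selection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ResourceAdaptation {
    // Hypothetical coded media object with a presentation priority
    // and a decoding cost (e.g., in abstract CPU units).
    record MediaObject(String name, int priority, int cost) {}

    // Keep the highest-priority objects that fit within the available
    // decoding budget; lower-priority objects are simply dropped.
    static List<String> selectObjects(List<MediaObject> objects, int budget) {
        List<MediaObject> sorted = new ArrayList<>(objects);
        sorted.sort(Comparator.comparingInt(MediaObject::priority).reversed());
        List<String> selected = new ArrayList<>();
        int used = 0;
        for (MediaObject o : sorted) {
            if (used + o.cost() <= budget) {
                selected.add(o.name());
                used += o.cost();
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<MediaObject> scene = List.of(
                new MediaObject("speech", 3, 10),
                new MediaObject("video", 2, 60),
                new MediaObject("graphics-overlay", 1, 20));
        // With a limited budget only the most significant objects survive.
        System.out.println(selectObjects(scene, 75));  // prints "[speech, video]"
    }
}
```

A programmatic system could instead let an application refine this policy at runtime, for example reacting to a time-varying budget or to user preferences, rather than silently dropping objects.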
The use of application programming interfaces (APIs) has long been recognized in the software industry as a means to achieve standardized operations and functions over a number of different types of computer platforms. Typically, although operations can be standardized via the definition of an API, the performance of these operations may still differ on various platforms, as vendors with an interest in a specific platform may provide implementations optimized for that platform. In the field of graphics, the Virtual Reality Modeling Language (VRML) provides a means of specifying the spatial and temporal relationships between objects and of describing a scene by use of a scene graph approach. MPEG-4 has adopted a binary representation (BIFS) of the constructs central to VRML and has extended VRML in many ways to handle real-time audio/video data and facial/body animation. To enhance the features of VRML and to allow programmatic control, DimensionX has released a set of APIs known as Liquid Reality. Recently, Sun Microsystems has announced an early version of Java3D, an API specification which, among other things, supports representation of synthetic audiovisual objects as a scene graph. Sun Microsystems has also released the Java Media Framework Player API, a framework for multimedia playback. However, none of the currently available API packages offers a comprehensive and robust feature set tailored to the demands of MPEG-4 coding and other advanced multimedia content.
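The scene graph approach mentioned above can be illustrated with a minimal sketch: a tree of nodes in which grouping expresses spatial composition and a per-node start time expresses temporal placement, traversed depth-first as a compositor would visit objects. The node structure and names are assumptions for illustration and correspond to no particular API (neither VRML, BIFS, nor Java3D).

```java
import java.util.ArrayList;
import java.util.List;

public class SceneGraphSketch {
    // Minimal scene-graph node: named, with a start time (temporal
    // placement) and child nodes (spatial grouping).
    static class Node {
        final String name;
        final double startTime;
        final List<Node> children = new ArrayList<>();
        Node(String name, double startTime) { this.name = name; this.startTime = startTime; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Depth-first traversal, as a compositor would visit objects.
    static int countNodes(Node root) {
        int count = 1;
        for (Node c : root.children) count += countNodes(c);
        return count;
    }

    public static void main(String[] args) {
        Node scene = new Node("scene", 0.0)
                .add(new Node("background", 0.0))
                .add(new Node("video-object", 0.0)
                        .add(new Node("audio-track", 2.5)));
        System.out.println("Scene contains " + countNodes(scene) + " nodes");  // prints "Scene contains 4 nodes"
    }
}
```

BIFS serves the same structural role in binary form, while extensions for real-time audio/video attach streamed media to such nodes rather than static geometry.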