1. Field of Invention
The invention relates to the field of coded multimedia and its storage, transmission and delivery to users, and more particularly to such coding when a flexible means for generating, editing or interpreting bitstreams representing multimedia objects is necessary.
2. Description of Related Art
Digital multimedia offers advantages such as ease of manipulation, multigeneration processing and error robustness, but incurs constraints due to the storage capacity or transmission bandwidth required. Multimedia content thus frequently needs to be compressed or coded. Further, in the wake of rapid increases in demand for digital multimedia over the Internet and other networks, the need for efficient storage, networked access, and search and retrieval has increased, and a number of coding schemes, storage formats, retrieval techniques and transmission protocols have evolved. For instance, for image and graphics files, GIF, TIFF and other formats have been used. Similarly, audio files have been coded and stored in RealAudio, WAV, MIDI and other formats. Animations and video files have often been stored using GIF89a, Cinepak, Indeo and others. To play back the plethora of existing formats, decoders and interpreters are often needed. These may offer various degrees of speed, quality and performance depending on whether they are implemented in hardware or in software and, particularly in the case of software, on the capabilities of the host computer. If multimedia content is embedded in web pages accessed via a computer (e.g. a PC), the web browser needs to be set up correctly for all the anticipated content, must recognize each type of content, and must support a mechanism of content handlers (software plugins or hardware) to deal with such content.
The need for interoperability, guaranteed quality and performance, and economies of scale in chip design, as well as the cost involved in content generation for a multiplicity of formats, has led to advances in standardization in the areas of multimedia coding, packetization and robust delivery. In particular, the Moving Picture Experts Group of the International Organization for Standardization (ISO MPEG) has standardized bitstream syntax and decoding semantics for coded multimedia in the form of two standards, referred to as MPEG-1 and MPEG-2. MPEG-1 was primarily intended for use on digital storage media (DSM) such as compact disks (CDs), whereas MPEG-2 was primarily intended for use in broadcast environments (transport stream), although it also supports an MPEG-1-like mechanism for use on DSM (program stream). MPEG-2 also included additional features such as DSM Command and Control (DSM-CC) for basic user interaction, as may be needed for standardized playback of MPEG-2, either standalone or networked. With the advent of inexpensive boards and PCMCIA cards and the availability of fast Central Processing Units (CPUs), the MPEG-1 standard is becoming commonly available for playback of movies and games on PCs. The MPEG-2 standard, on the other hand, since it addresses relatively higher quality applications, is becoming common for entertainment applications via digital satellite TV, digital cable and Digital Versatile Disk (DVD). Besides the applications and platforms noted, MPEG-1 and MPEG-2 are expected to be utilized in various other configurations: in streams communicated over networks, in streams stored on hard disks or CDs, and in combinations of networked and local access.
The success of MPEG-1 and MPEG-2, the bandwidth limitations of Internet and mobile channels, the flexibility of web-based data access using browsers, and the increasing need for interactive personal communication have opened up new paradigms for multimedia usage and control. In response, ISO MPEG has developed a new standard, called MPEG-4. The MPEG-4 standard has addressed coding of audio-visual information in the form of individual objects, and a system for composition and synchronized playback of these objects. While development of MPEG-4 for such fixed systems continues, in the meantime new paradigms in communication, software and networking, such as those offered by the Java language, have offered new opportunities for flexibility, adaptivity and user interaction. For instance, the Java language offers the networking and platform independence critical to downloading and executing applets (Java classes) on a client PC from a web server which hosts the web pages visited by the user. Depending on the design of the applet, either a single access to the data stored on the server may be needed, with all the necessary data then stored on the client PC, or several partial accesses (to reduce the storage space and the time needed for startup) may be needed. The latter scenario is referred to as streamed playback.
As noted, when coded multimedia is used for Internet and local networked applications on a computer, say a PC, a number of situations may arise. First, the bandwidth for networked access of multimedia may be either limited or time-varying, necessitating transmission of the most significant information first, followed by other information as more bandwidth becomes available. Second, regardless of the bandwidth available, the client-side PC on which decoding may have to take place may be limited in CPU and/or memory resources, and furthermore, these resources may also be time-varying. Third, a multimedia user (consumer) may require highly interactive nonlinear browsing and playback. This is not unusual, since much of the textual content on web pages can be browsed via hyperlinked features, and the same paradigm is expected for presentations employing coded audio-visual objects. The MPEG-4 system without enhanced capabilities may only be able to deal with the aforementioned situations in a very limited way.
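The first situation, sending the most significant information first and deferring the rest until more bandwidth is available, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the MPEG-4 standard or from the invention: the `MediaUnit` type, the numeric `priority` field (lower meaning more significant), the byte budget, and the greedy selection rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MediaUnit:
    """A hypothetical unit of coded data with an assumed priority."""
    name: str
    size_bytes: int
    priority: int  # lower value = more significant (an assumption of this sketch)


def schedule(units, budget_bytes):
    """Pick the units to send now, most significant first, within a byte budget.

    Returns (send_now, deferred). Units that do not fit in the remaining
    budget are deferred until more bandwidth becomes available.
    """
    send_now, deferred = [], []
    remaining = budget_bytes
    for unit in sorted(units, key=lambda u: u.priority):
        if unit.size_bytes <= remaining:
            send_now.append(unit)
            remaining -= unit.size_bytes
        else:
            deferred.append(unit)
    return send_now, deferred


units = [
    MediaUnit("video_enhancement", 6000, priority=3),
    MediaUnit("audio_base", 1000, priority=1),
    MediaUnit("video_base", 3000, priority=2),
]
send_now, deferred = schedule(units, budget_bytes=5000)
```

With a 5000-byte budget, the two base units are selected and the enhancement unit is deferred; calling `schedule(deferred, ...)` again with a larger budget would release it. A real system would likely also account for timing and decoder resources, which this sketch ignores.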
The use of application programming interfaces (APIs) has long been recognized in the software industry as a means to achieve standardized operations and functions over a number of different types of computer platforms. Typically, although operations can be standardized via the definition of an API, the performance of these operations may differ on various platforms, as vendors with an interest in a specific platform may provide implementations optimized for that platform. In the field of graphics, the Virtual Reality Modeling Language (VRML) provides a means of specifying spatial and temporal relationships between objects and of describing a scene by use of a scene graph approach. MPEG-4 has used a binary representation, the Binary Format for Scenes (BIFS), of the constructs central to VRML and has extended VRML in many ways to handle real-time audio/video data and effects such as facial or body animation. Since the MPEG-4 standard offers many tools for coding of various types of media as well as scene graph representation, and further, each media coding may involve separate coding of individual objects, an organized yet flexible mechanism for bitstream generation, editing and interpretation is highly desirable.
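The scene graph approach mentioned above can be illustrated with a minimal sketch in which each node carries a position and a start time relative to its parent. The `SceneNode` type, its fields, and the `flatten` helper are hypothetical names chosen for illustration; they do not correspond to the actual VRML or BIFS node sets.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneNode:
    """A hypothetical scene graph node; not the VRML or BIFS node set."""
    name: str
    offset: Tuple[float, float] = (0.0, 0.0)  # position relative to the parent
    start_time: float = 0.0                    # seconds relative to the parent
    children: List["SceneNode"] = field(default_factory=list)


def flatten(node, origin=(0.0, 0.0), t0=0.0):
    """Walk the graph and resolve each node's absolute position and start time."""
    pos = (origin[0] + node.offset[0], origin[1] + node.offset[1])
    start = t0 + node.start_time
    resolved = [(node.name, pos, start)]
    for child in node.children:
        resolved.extend(flatten(child, pos, start))
    return resolved


scene = SceneNode("scene", children=[
    SceneNode("background"),
    SceneNode("presenter", offset=(100.0, 50.0), start_time=2.0, children=[
        SceneNode("caption", offset=(0.0, 120.0), start_time=1.0),
    ]),
])

for name, pos, start in flatten(scene):
    print(name, pos, start)
```

Because the caption is a child of the presenter, moving or delaying the presenter moves or delays the caption with it, which is the kind of spatial and temporal relationship a scene graph is meant to capture.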