1. Field of Invention
The invention relates to information processing, and more particularly to advanced storage and retrieval of audiovisual data objects according to the MPEG-4 standard, including use of an expanded physical object table that contains a list of local object identifiers.
2. Description of Related Art
In the wake of rapidly increasing demand for network, multimedia, database and other digital capacity, many multimedia coding and storage schemes have evolved. Graphics files have long been encoded and stored in commonly available file formats such as TIF, GIF, JPG and others, as has motion video in Cinepak, Indeo, MPEG-1 and MPEG-2, and other file formats. Audio files have been encoded and stored in RealAudio, WAV, MIDI and other file formats. These standard technologies have advantages for certain applications, but with the advent of large networks including the Internet the requirements for efficient coding, storage and transmission of audiovisual (AV) information have only increased.
Motion video in particular often taxes available Internet and other system bandwidth when running under conventional coding techniques, yielding choppy video output having frame drops and other artifacts. This is in part because those techniques rely upon the frame-by-frame encoding of entire monolithic scenes, which results in data streams of many megabits per second representing those frames. This makes it harder to reach the goal of delivering video or audio content in real-time or streaming form, and to allow editing of the resulting audiovisual scenes.
In contrast with data streams communicated across a network, content made available in random access mass storage facilities (such as AV files stored on local hard drives) provides additional functionality and sometimes increased speed, but still faces increasing needs for capacity. In particular, taking advantage of the random access characteristics of the physical storage medium, it is possible to allow direct access to, and editing of, arbitrary points within a graphical scene description or other audiovisual object information. Besides random access for direct playback purposes, such functionality is useful in editing operations in which one wishes to extract, modify, reinsert or otherwise process a particular elementary stream from a file.
In conjunction with the development of MPEG-4 coding and storage techniques, it is desirable to provide an improved ability to perform random access of audiovisual objects within video sequences. The opportunity to streamline random access would highlight and strengthen the potential of advanced capabilities provided by MPEG-4, and relieve the demands that those capabilities may impose on resources.
Part of the approach underlying MPEG-4 formatting is that a video sequence consists of a sequence of related scenes separated in time. Each picture is composed of a set of audiovisual objects that may undergo a series of changes such as translations, rotations, scaling, brightness and color variations, etc., from one scene to the next. New objects can enter a scene and existing objects can depart, leaving certain objects present only in certain pictures. When scene changes occur, the entire scene and all the objects comprising the picture may be reorganized or initialized.
One of the identified functionalities of MPEG-4 is improved temporal random access: the ability to efficiently perform random access of data within an audiovisual sequence in a limited time and at fine resolution (e.g., individual frames or objects). Improved temporal random access techniques compatible with MPEG-4 involve content-based interactivity, which requires not only the ability to perform conventional random access to individual pictures, but also the ability to access regions or objects within a scene.
While the MPEG-4 file format described in the incorporated application Ser. No. 09/055,933 realizes such advantages, that approach has at least two disadvantages stemming in part from that file format's reliance on a standard physical object table (POT) and segment object table (SOT) structure.
A fundamental limitation in the exchange of audiovisual information today is that its representation is extremely low level. Conventionally, audiovisual information is composed of coded video or audio samples, often organized into blocks, arranged in a commercial format. In the future, by contrast, multimedia will require flexible formats that allow quick adaptation of the audiovisual information to various requirements in terms of access, bandwidth scalability and streaming, as well as general data reorganization.