1. Field of Invention
The invention relates to information processing, and more particularly to advanced storage and retrieval of audiovisual data objects according to the MPEG-4 standard, including utilization of an expanded physical object table including a list of local object identifiers.
2. Description of Related Art
In the wake of rapidly increasing demand for network, multimedia, database and other digital capacity, many multimedia coding and storage schemes have evolved. Graphics files have long been encoded and stored in commonly available file formats such as TIF, GIF, JPG and others, as has motion video in Cinepak, Indeo, MPEG-1 and MPEG-2, and other file formats. Audio files have been encoded and stored in RealAudio, WAV, MIDI and other file formats. These standard technologies have advantages for certain applications, but with the advent of large networks including the Internet the requirements for efficient coding, storage and transmission of audiovisual (AV) information have only increased.
Motion video in particular often taxes available Internet and other system bandwidth when running under conventional coding techniques, yielding choppy video output with frame drops and other artifacts. This is in part because those techniques rely upon frame-by-frame encoding of entire monolithic scenes, which produces data streams of many megabits per second. This makes it harder to deliver video or audio content in real-time or streaming form, and to allow editing of the resulting audiovisual scenes.
In contrast with data streams communicated across a network, content made available in random access mass storage facilities (such as AV files stored on local hard drives) provides additional functionality and sometimes increased speed, but still faces increasing demands for capacity. In particular, by taking advantage of the random access characteristics of the physical storage medium, it is possible to allow direct access to, and editing of, arbitrary points within a graphical scene description or other audiovisual object information. Besides random access for direct playback, such functionality is useful in editing operations in which one wishes to extract, modify, reinsert or otherwise process a particular elementary stream from a file.
In conjunction with the development of MPEG-4 coding and storage techniques, it is desirable to provide an improved ability to perform random access of audiovisual objects within video sequences. The opportunity to streamline random access would highlight and strengthen the potential of advanced capabilities provided by MPEG-4, and relieve the demands that those capabilities may impose on resources.
Part of the approach underlying MPEG-4 formatting is that a video sequence consists of a sequence of related scenes separated in time. Each picture is composed of a set of audiovisual objects that may undergo a series of changes from one scene to the next, such as translations, rotations, scaling, and brightness and color variations. New objects can enter a scene and existing objects can depart, so that certain objects are present only in certain pictures. When scene changes occur, the entire scene and all the objects composing the picture may be reorganized or initialized.
One of the identified functionalities of MPEG-4 is improved temporal random access: the ability to efficiently access data within an audiovisual sequence in a limited time and at fine resolution (e.g., individual frames or objects). Improved temporal random access techniques compatible with MPEG-4 involve content-based interactivity, which requires not only the ability to perform conventional random access to individual pictures, but also the ability to access regions or objects within a scene.
While the MPEG-4 file format described in the incorporated 933 application realizes such advantages, that approach has at least two disadvantages, arising in part from that file format's reliance on a standard physical object table (POT) and segment object table (SOT) structure.
A fundamental limitation in the exchange of audio-visual information today is that its representation is extremely low level. Audio-visual information is conventionally composed of coded video or audio samples, often organized into blocks and arranged in a commercial format. In the future, by contrast, multimedia will require flexible formats that allow quick adaptation of the audio-visual information to various requirements in terms of access, bandwidth scalability and streaming, as well as general data reorganization.
The data structures, file formats, systems and methods of this invention provide enhanced audiovisual coding and storage techniques, related to MPEG-4, by introducing enhanced formatting that includes an expanded physical object table which utilizes, for a particular object, an “ordered” list of unique identifiers, one for every object instance. Therefore, using the invention, two instances of the same object in the same segment can be separately identified, so that, among other advantages, different instances of an identical object may be differentiated from one another.
The term “ordered” herein denotes that all access layer data (AL PDUs) of the same object instance are placed in the file in their natural order of occurrence, or coding order.
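As an illustration only, the expanded physical object table and the “ordered” property might be modeled in memory as follows. This is a minimal sketch under stated assumptions: the class names (`ObjectInstance`, `ExpandedPOT`), the integer identifiers, and the use of file byte offsets to locate AL PDUs are hypothetical and are not the file syntax defined by the invention. It shows two instances of the same object in one segment receiving distinct identifiers, with each instance's AL PDU offsets kept in coding order.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectInstance:
    """One instance of an audiovisual object (hypothetical layout)."""
    local_id: int  # unique identifier for this instance of the object
    pdu_offsets: List[int] = field(default_factory=list)  # AL PDU file offsets, coding order

    def append_pdu(self, offset: int) -> None:
        # "Ordered": AL PDUs of one instance appear in the file in coding order,
        # so each new PDU must lie after the previous one.
        if self.pdu_offsets and offset <= self.pdu_offsets[-1]:
            raise ValueError("AL PDUs must be appended in coding order")
        self.pdu_offsets.append(offset)


@dataclass
class ExpandedPOT:
    """Hypothetical table mapping an object identifier to its ordered instances."""
    entries: Dict[int, List[ObjectInstance]] = field(default_factory=dict)

    def add_instance(self, object_id: int, local_id: int) -> ObjectInstance:
        instances = self.entries.setdefault(object_id, [])
        # Each instance of a given object needs its own identifier.
        if any(inst.local_id == local_id for inst in instances):
            raise ValueError(f"local id {local_id} already used for object {object_id}")
        inst = ObjectInstance(local_id)
        instances.append(inst)
        return inst

    def local_ids(self, object_id: int) -> List[int]:
        return [inst.local_id for inst in self.entries.get(object_id, [])]


# Two instances of the same object (id 7) in one segment remain distinguishable.
pot = ExpandedPOT()
a = pot.add_instance(object_id=7, local_id=1)
b = pot.add_instance(object_id=7, local_id=2)
a.append_pdu(100)
a.append_pdu(420)
b.append_pdu(260)
print(pot.local_ids(7))  # [1, 2]
```

Keeping the offsets per instance, rather than per object, is what lets the interleaved PDUs of instances 1 and 2 be separated again on read.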
An additional benefit of the invention is that a given object instance can change its local identifier over time and still be randomly accessed by means of an improved physical object table/segment object table (POT/SOT) mechanism.
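One way to picture how an instance can change its local identifier over time and still be randomly accessed is to record, per instance, the segment at which each identifier takes effect, and then resolve the identifier for any target segment by binary search instead of scanning earlier segments. The sketch below assumes segments are numbered from 0, and the `LocalIdHistory` structure is hypothetical; it illustrates the idea only and is not the POT/SOT syntax of the invention.

```python
from bisect import bisect_right
from typing import List


class LocalIdHistory:
    """Hypothetical record of the local identifiers one object instance uses over time."""

    def __init__(self) -> None:
        self._start_segments: List[int] = []  # segment index where each id takes effect
        self._local_ids: List[int] = []

    def set_id(self, start_segment: int, local_id: int) -> None:
        # Identifier changes must be recorded in increasing segment order.
        if self._start_segments and start_segment <= self._start_segments[-1]:
            raise ValueError("changes must be recorded in segment order")
        self._start_segments.append(start_segment)
        self._local_ids.append(local_id)

    def id_at(self, segment: int) -> int:
        """Return the local id in force at `segment` using binary search."""
        i = bisect_right(self._start_segments, segment) - 1
        if i < 0:
            raise KeyError(f"no local id recorded at segment {segment}")
        return self._local_ids[i]


history = LocalIdHistory()
history.set_id(0, 5)   # instance uses id 5 from segment 0
history.set_id(12, 9)  # id changes to 9 at segment 12
print(history.id_at(3), history.id_at(12), history.id_at(40))  # 5 9 9
```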
The invention in one aspect relates to a method of composing data in a file, and a medium for storing that file, the file including a file header containing physical object information and logical object information, and the method including generating a sequence of audiovisual segments, each including a plurality of audiovisual objects. The physical object information contains pointers to access the audiovisual segments.
In another aspect the invention provides a corresponding method of extracting data from a file, including accessing a file having a header that contains physical object information and logical object information, and accessing the audiovisual segments contained therein.
In another aspect the invention provides a system for processing a data file including a processor unit and a storage unit connected to the processor unit, the storage unit storing a file including a file header and a sequence of audiovisual segments. The file header contains physical object information and logical object information, and the physical object information contains pointers to access the audiovisual segments.
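The composing and extracting aspects above can be sketched together on a toy file. This uses a hypothetical byte layout chosen only for illustration: a 4-byte segment count, then one 8-byte absolute offset per segment standing in for the pointers in the physical object information, followed by the segment payloads. Logical object information is omitted, and none of the field sizes or function names (`compose_file`, `read_segment`) come from the invention or from the MPEG-4 specification.

```python
import io
import struct
from typing import BinaryIO, List

HEADER_COUNT = struct.Struct(">I")  # number of segments (hypothetical field)
POINTER = struct.Struct(">Q")       # absolute byte offset of one segment (hypothetical)


def compose_file(out: BinaryIO, segments: List[bytes]) -> None:
    """Write a header whose pointers locate each audiovisual segment, then the segments."""
    header_size = HEADER_COUNT.size + POINTER.size * len(segments)
    out.write(HEADER_COUNT.pack(len(segments)))
    offset = header_size
    for seg in segments:
        out.write(POINTER.pack(offset))
        offset += len(seg)
    for seg in segments:
        out.write(seg)


def read_segment(f: BinaryIO, index: int) -> bytes:
    """Random access: read the header pointers, then seek straight to one segment."""
    f.seek(0)
    (count,) = HEADER_COUNT.unpack(f.read(HEADER_COUNT.size))
    if not 0 <= index < count:
        raise IndexError(index)
    pointers = [POINTER.unpack(f.read(POINTER.size))[0] for _ in range(count)]
    start = pointers[index]
    # A segment ends where the next one starts, or at end of file for the last one.
    end = pointers[index + 1] if index + 1 < count else f.seek(0, io.SEEK_END)
    f.seek(start)
    return f.read(end - start)


buf = io.BytesIO()
compose_file(buf, [b"scene-0", b"scene-1-longer", b"s2"])
print(read_segment(buf, 1))  # b'scene-1-longer'
```

Because the pointers sit in the header, a reader never has to parse earlier segments to reach a later one, which is the random access property the header-with-pointers arrangement is meant to provide.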
This invention proposes a framework that integrates advanced concepts such as object-based audio-visual representation, meta-data and object-oriented programming to achieve a flexible and generic representation of audiovisual information and of the associated methods that operate on it.
A multimedia file to be streamed over a given packet network should be quickly ready for streaming. Additionally, once transferred to the user terminal, the multimedia file should allow easy editing and manipulation. This capability needs to be extended to the interchange of audiovisual information among different systems and terminals, bridging the large gap between the way users think about multimedia and the way current tools operate on it. By using an object-based framework and meta-data information, the data structures, file formats, systems and methods of this invention allow the actual structure of the content to survive the processes of acquisition, editing and distribution.
Meta-data is critical to allow further editing, indexing and searching, as well as streaming over a given network. To reach the required level of flexibility, it is essential that the meta-data include object relationships.
The data structures, file formats, systems and methods of this invention provide a conceptual framework for Intermedia format development in MPEG-4, called the Flexible-Integrated Intermedia Format (Flexible-IIF or F-IIF). The F-IIF is an advanced extension of the Integrated Intermedia Format (IIF) disclosed in the incorporated 015 application and set forth below in FIGS. 1-12. The F-IIF can be viewed as a natural umbrella and unification tool for other Intermedia formats proposed in MPEG-4, and possibly as a basis for the forthcoming MPEG-7.
Some of the current characteristics of the F-IIF include enhanced flexibility, easy reprogrammability, versatile support for user and local terminal interaction, support for “packaged formats,” and extension to the MPEG-4 Intermedia requirements specified in the incorporated 015 application. The F-IIF is a very flexible and extensible meta-data representation and manipulation tool, similar to what is done in the context of computer music in the Xlisp-based Stella, which is discussed in “http://ccrmawww.stanford.edu/CCRMA/Software/cm/tutorials/stella/toc.html”, and in Common Music and Common Lisp Music, which are discussed at “http://ccrmawww.stanford.edu/CCRMA/Software/clm/clm.html”.
These and other features and advantages of this invention are described in or are apparent from the following detailed description of the data structures, file formats, systems and methods according to this invention.