The invention relates to a method of encapsulation of data into network transport packets of constant size, said data being organized in successive, individually accessible portions of coded representations of audio-visual objects, and each of said portions being sub-divided into segments. This invention is particularly useful with networks such as MPEG-2 Transport Stream (MPEG-2 TS) and the Asynchronous Transfer Mode (ATM), for the encapsulation of MPEG-4 data into the transport packets of these networks.
The future MPEG-4 standard, which will be in force in January 1999, proposes standardized ways to represent audio-visual objects (called AVOs) of natural or synthetic origin, to compose them together to create compound AVOs that form audio-visual scenes, to multiplex and synchronize the data associated with these AVOs, and to interact with the audio-visual scenes generated at the receiver's end.
As shown in FIG. 1, described later in a more detailed manner, an MPEG-4 audio-visual scene, received by a system such as described for instance in the document "MPEG-4: Context and objectives", R. Koenen et al., Signal Processing: Image Communication 9 (1997), May 1997, no. 4, pp. 295-304, is generally composed of several AVOs organized in a hierarchical fashion. The leaves of this hierarchical organization are primitive AVOs such as a background, the picture of a talking person, the voice associated with that person, and so on. These primitive AVOs may be of any type (text, graphics, etc.) and may be either bi- or tridimensional (2D, 3D).
The data associated with these AVOs are conveyed in one or more Elementary Streams (ESs), characterized by the quality of service (QoS) they require for transmission and some other parameters. The data streams, coming from a transmission network or a storage medium in the form of TransMux Streams, must be properly demultiplexed to recover the Elementary Streams. These Elementary Streams are then passed to the appropriate decoders for decompression, in order to reconstruct the original AVOs (Primitive AV Objects). Decoded AVOs, along with scene description indications giving information on the composition of the concerned scene, are then used to compose and render the scene as described by its author (in a given hierarchical form). Also, to the extent allowed by the author, upstream data are sent back to the Network Layer in order to interact with the scene.
The Systems part of the MPEG-4 standard describes a system for communicating audiovisual information in the form of a coded representation of natural or synthetic objects (the media objects called AVOs above). In such a system, at the sending side, this audiovisual information is compressed, composed, and multiplexed in binary streams, and after the transmission, at the receiving side, these streams are demultiplexed, decompressed, composed, and presented to the terminal of the end user (who generally can interact with the presentation). The Elementary Streams conveying the data associated with the AVOs contain the coded representation of these data: scene description information, audiovisual information, content-related information, and other additional data. After transmission, the ESs are decoded, composed according to the scene description information (the composition being defined as the process of applying scene description information in order to identify the spatio-temporal attributes of the media objects) and presented to the terminal, all these processes being synchronized according to the terminal decoding model (the Systems Decoder Model, or SDM) and the synchronization information.
The purpose of said SDM is to provide a view of the behavior of a terminal complying with the MPEG-4 standard: it is used by the sender to predict how the receiver will behave in terms of buffer management and synchronization when reconstructing the audiovisual information that composes the session. More precisely, an MPEG-4 terminal (such as depicted in FIG. 1) comprises a multi-layer structure consisting of a TransMux layer, a FlexMux layer and an Access Unit layer (this Layer Model provides a common model on which all implementations of MPEG-4 terminals can be based). The TransMux layer, which designates any existing or future underlying multiplex functionality that is suitable to transport MPEG-4 data streams (thus allowing MPEG-4 to be used in a wide variety of operation environments), is not defined in the context of MPEG-4: it is in fact an interface to the transmission network (for example, MPEG-2 TS or ATM) or the storage medium, which makes it possible to offer transport services matching the requested quality of service. The FlexMux layer, completely specified by MPEG-4, consists of a flexible tool for interleaving data (one or more Elementary Streams into one FlexMux stream) and makes it possible to identify the different channels for the data that have been multiplexed.
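The interleaving role of the FlexMux layer can be pictured with a minimal sketch. It is only an illustration of the idea of merging several streams into one while keeping the channel of each packet identifiable: the function names, the round-robin order and the (channel, payload) tuple are assumptions made for this example and do not reproduce the FlexMux syntax specified by MPEG-4.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def interleave(streams: Dict[int, List[bytes]]) -> List[Tuple[int, bytes]]:
    """Merge several packet lists into one list of (channel, payload) pairs.

    Round-robin order is an assumption of this sketch, not a rule of the standard.
    """
    queues = {channel: list(packets) for channel, packets in streams.items()}
    muxed: List[Tuple[int, bytes]] = []
    while any(queues.values()):
        for channel in sorted(queues):
            if queues[channel]:
                muxed.append((channel, queues[channel].pop(0)))
    return muxed


def demultiplex(muxed: List[Tuple[int, bytes]]) -> Dict[int, List[bytes]]:
    """Recover the per-channel packet lists from the channel tag of each packet."""
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for channel, payload in muxed:
        streams[channel].append(payload)
    return dict(streams)
```

Because every payload carries its channel tag, the receiving side can rebuild each stream in order without any knowledge of how the sender scheduled the packets.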
The Access Unit layer conveys both time base information and time stamped Access Units of the Elementary Streams and therefore allows for an identification of Access Units (video or audio frames, scene description commands, etc.) in the Elementary Streams and a recovery of the time base (an Access Unit, or AU, is the smallest individually accessible portion of the coded representation of an AVO within an Elementary Stream, to which timing information can be attributed). A compression layer processes the data (object descriptor, scene description information, primitive AV objects) that make it possible to carry out the composition and rendering steps of the concerned audiovisual interactive scene, and the data corresponding to the interactive actions allowed by the return channel.
Moreover, the Elementary Streams are conveyed according to a packetized representation: the ES data, encapsulated into so-called SL-packetized streams, are sent and/or received through a stream multiplex interface intended to encapsulate the demultiplexer of the SDM, to provide access to streaming data and to fill up decoding buffers with these data. An SL-packetized stream consists of a sequence of packets (according to the syntax and semantics defined in the standard) that encapsulate a single ES. The packets contain elementary stream data partitioned into the above-mentioned Access Units, as well as side information for timing and Access Unit labeling.
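As a rough illustration of such a packetized representation, the sketch below cuts one Access Unit into packets that carry a payload together with simple side information. The class name StreamPacket, its fields and the max_payload parameter are hypothetical and chosen for this example only; the actual packet syntax is the one defined in the standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamPacket:
    """Hypothetical packet: payload plus side information (not the standard syntax)."""
    au_start: bool             # True on the first packet of an Access Unit
    au_end: bool               # True on the last packet of an Access Unit
    timestamp: Optional[int]   # timing information, carried on the first packet only
    payload: bytes


def packetize_access_unit(au: bytes, timestamp: int, max_payload: int) -> List[StreamPacket]:
    """Split one Access Unit into consecutive packets of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not au:
        raise ValueError("an Access Unit must contain data")
    chunks = [au[i:i + max_payload] for i in range(0, len(au), max_payload)]
    last = len(chunks) - 1
    return [
        StreamPacket(
            au_start=(i == 0),
            au_end=(i == last),
            timestamp=timestamp if i == 0 else None,
            payload=chunk,
        )
        for i, chunk in enumerate(chunks)
    ]
```

With a 450-byte Access Unit and max_payload set to 184, this yields payloads of 184, 184 and 82 bytes: the last segment is generally shorter than the others.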
For a transmission of data, and especially of multimedia data of the MPEG-4 type over various networks, the format of the data has to be adapted to the format the network is able to work with. When adapting multiplexed packets to networks working with packets of constant size (such as MPEG-2 TS or ATM, as described hereinafter) in order to interoperate with these networks, it could unfortunately happen, even if the data is segmented to fit the size of these packets, that some segments are too small to fill a packet of this size. An object of the invention is therefore to propose a general method of adaptation of multiplexed data to networks working with packets of constant size. To this end the invention relates to a method such as described in the preamble of the description and which is moreover characterized in that it comprises, for matching the last segment of each portion to the constant size of the transport network, a padding step provided for adding a specific padding packet to each of said last segments. This technical solution is of particular interest when said data is multimedia data of the MPEG-4 type and each of said portions, called Access Unit or AU, is sub-divided into segments called Access Unit layer-Packet Data Units, or AL-PDUs. More particularly, said method is characterized in that the size of the padding packet of each successive portion is computed according to the following sub-steps:
the number of segments of each portion is detected and examined;
if said number is greater than 1, each successive network packet of constant size is built by adding to each segment except the last one appropriate headers corresponding to the concerned transport network, and the size of the padding packet is then computed by difference between the size of the last segment and the size of the network packets and taking into account the values of said headers;
if said number is not greater than 1, the size of the padding packet is computed by difference between the size of the single segment and the size of the network packets and taking into account the values of said headers;
based on that size of the padding packet, the last complete network packet corresponding to said last or single segment is built.
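The sub-steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the network header is modeled as a fixed byte string, the padding packet is modeled as plain filler bytes (0xFF chosen arbitrarily), and the function names are hypothetical. The usage values (188-byte packets with a 4-byte header) correspond to MPEG-2 TS packets; the header content itself is a placeholder.

```python
from typing import List


def padding_size(segment_len: int, packet_size: int, header_len: int) -> int:
    """Padding needed so that header + segment + padding equals packet_size."""
    pad = packet_size - header_len - segment_len
    if pad < 0:
        raise ValueError("segment does not fit in one network packet")
    return pad


def encapsulate_portion(segments: List[bytes], packet_size: int, header: bytes,
                        pad_byte: bytes = b"\xff") -> List[bytes]:
    """Build constant-size network packets for one portion (e.g. one Access Unit)."""
    if not segments:
        raise ValueError("a portion must contain at least one segment")
    packets: List[bytes] = []
    # More than one segment: every segment except the last is assumed to fill
    # a packet exactly once the network header has been added.
    if len(segments) > 1:
        for segment in segments[:-1]:
            packet = header + segment
            if len(packet) != packet_size:
                raise ValueError("non-final segment does not fill a packet")
            packets.append(packet)
    # Last (or single) segment: compute the padding size, then build the packet.
    last = segments[-1]
    pad = padding_size(len(last), packet_size, len(header))
    packets.append(header + last + pad_byte * pad)
    return packets


# Usage: three segments of 184, 184 and 100 bytes, 188-byte packets, 4-byte header.
ts_header = b"\x47\x00\x00\x10"  # placeholder header bytes (0x47 is the TS sync byte)
ts_packets = encapsulate_portion([b"a" * 184, b"b" * 184, b"c" * 100], 188, ts_header)
```

In this usage example the last segment needs 188 - 4 - 100 = 84 bytes of padding, so all three resulting packets have the constant size of 188 bytes.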