In the MPEG-4 standard ISO/IEC 14496:2001, in particular in part 3 (Audio) and part 1 (Systems), several audio objects, which may be coded with different MPEG-4 coding types, can together form a composed audio system in which a single soundtrack is produced from the several audio substreams. User interaction, terminal capability, and speaker configuration may be taken into account when determining how to produce the single soundtrack from the component objects. Audio composition means mixing multiple individual audio objects to create a single soundtrack, e.g. a single channel or a single stereo pair. A set of mixdown instructions is transmitted or transferred in the bitstream. In a receiver the multiple audio objects are decoded separately but are not played back directly to a listener. Instead, the transmitted mixdown instructions are used to prepare a single soundtrack from the decoded audio objects. This final soundtrack is then played for the listener.
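The mixdown step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each decoded audio object is a mono list of samples and that the mixdown instructions are one left gain and one right gain per object. The names `mix_to_stereo`, `decoded` and `instructions` are hypothetical and do not reflect the MPEG-4 bitstream syntax.

```python
from typing import Dict, List, Tuple

Samples = List[float]


def mix_to_stereo(objects: Dict[str, Samples],
                  gains: Dict[str, Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Mix separately decoded mono objects into one stereo soundtrack.

    `gains` stands in for the transmitted mixdown instructions
    (assumption: one (left, right) gain pair per object).
    """
    length = max((len(s) for s in objects.values()), default=0)
    soundtrack = []
    for n in range(length):
        left = right = 0.0
        for name, samples in objects.items():
            if n < len(samples):  # shorter objects contribute silence afterwards
                g_left, g_right = gains[name]
                left += g_left * samples[n]
                right += g_right * samples[n]
        soundtrack.append((left, right))
    return soundtrack


# Two decoded objects that are never played back on their own.
decoded = {"speech": [1.0, 0.5], "music": [0.2, 0.2, 0.2]}
instructions = {"speech": (0.5, 0.5), "music": (1.0, 0.0)}
soundtrack = mix_to_stereo(decoded, instructions)
```

Only `soundtrack` would be passed on for playback; the individual decoded objects remain internal to the receiver.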
ISO/IEC 14496:2001 is the second version of the MPEG-4 Audio standard, whereas ISO/IEC 14496 is the first version. The above MPEG-4 Audio standard describes nodes for presenting audio. Header streams containing the configuration information necessary for decoding the audio substreams are transported via MPEG-4 Systems. In a simple audio scene the channel configuration of the audio decoder, for example 5.1 multichannel, can be fed inside the Compositor from one node to the following node so that the channel configuration information reaches the presenter, which is responsible for the correct loudspeaker mapping. The presenter represents the final part of the audio chain that is no longer under the control of the broadcaster or content provider, e.g. an audio amplifier having volume control and the attached loudspeakers.
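The forwarding of the channel configuration in a simple audio scene can be sketched as follows. All class and function names (`ChannelConfig`, `PassThroughNode`, `Presenter`, `run_chain`) are hypothetical, and the 5.1 channel labels and their order are assumptions chosen for illustration rather than taken from the standard.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChannelConfig:
    """Channel configuration reported by the audio decoder (hypothetical type)."""
    layout: str                 # e.g. "5.1"
    channels: Tuple[str, ...]   # channel labels in decoder output order


class PassThroughNode:
    """Hypothetical compositor node that leaves the channel layout unchanged."""

    def __init__(self, name: str) -> None:
        self.name = name

    def process(self, config: ChannelConfig) -> ChannelConfig:
        # A real node would also process samples; here only the
        # configuration is handed on to the following node.
        return config


class Presenter:
    """Last stage of the chain; maps channels to the attached loudspeakers."""

    def __init__(self, loudspeakers: List[str]) -> None:
        self.loudspeakers = loudspeakers

    def map_channels(self, config: ChannelConfig) -> Dict[str, Optional[str]]:
        # A channel with no matching loudspeaker is mapped to None.
        return {ch: (ch if ch in self.loudspeakers else None)
                for ch in config.channels}


def run_chain(config: ChannelConfig,
              nodes: List[PassThroughNode],
              presenter: Presenter) -> Dict[str, Optional[str]]:
    for node in nodes:
        config = node.process(config)   # fed from one node to the next
    return presenter.map_channels(config)


# Illustrative 5.1 labels; the order is an assumption, not the standard's.
decoder_config = ChannelConfig("5.1", ("L", "R", "C", "LFE", "Ls", "Rs"))
chain = [PassThroughNode("sync"), PassThroughNode("gain")]
mapping = run_chain(decoder_config, chain,
                    Presenter(["L", "R", "C", "LFE", "Ls", "Rs"]))
```

Because every node in this sketch returns the configuration unchanged, the presenter receives exactly what the decoder reported and can map each channel to a loudspeaker.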
‘Node’ means a processing step or unit used in the above MPEG-4 standard, e.g. an interface carrying out time synchronisation between a decoder and subsequent processing units, or a corresponding interface between the presenter and an upstream processing unit. In general, in ISO/IEC 14496-1:2001 the scene description is represented using a parametric approach. The description consists of an encoded hierarchy or tree of nodes with attributes and other information including event sources and targets. Leaf nodes in this tree correspond to elementary audio-visual data, whereas intermediate nodes group this material to form audio-visual objects, and perform e.g. grouping and transformation on such audio-visual objects (scene description nodes).
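The hierarchy of nodes described above can be pictured with a small tree sketch. The `SceneNode` class, its fields and the node names used below are hypothetical illustrations and are not the node types defined in ISO/IEC 14496-1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneNode:
    """Hypothetical scene description node with attributes and children."""
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["SceneNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf_names(node: SceneNode) -> List[str]:
    """Collect the leaf nodes, i.e. the elementary audio-visual data."""
    if node.is_leaf():
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


# Intermediate nodes group the leaves into audio-visual objects.
scene = SceneNode("scene", children=[
    SceneNode("audio_group", {"gain": 0.8}, [
        SceneNode("speech_stream"),
        SceneNode("music_stream"),
    ]),
    SceneNode("video_stream"),
])
names = leaf_names(scene)
```

Here `audio_group` plays the role of an intermediate node that groups two elementary streams, while the three stream nodes are leaves.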
Audio decoders either have a predetermined channel configuration by definition or receive, e.g., configuration information items for setting their channel configuration.