Spatial audio coding tools are well-known in the art and are standardized, for example, in the MPEG-surround standard. Spatial audio coding starts from a plurality of original input, e.g., five or seven input channels, which are identified by their placement in a reproduction setup, e.g., as a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low frequency enhancement (LFE) channel. A spatial audio encoder may derive one or more downmix channels from the original channels and, additionally, may derive parametric data relating to spatial cues such as interchannel level differences in the channel coherence values, interchannel phase differences, interchannel time differences, etc. The one or more downmix channels are transmitted together with the parametric side information indicating the spatial cues to a spatial audio decoder for decoding the downmix channels and the associated parametric data in order to finally obtain output channels which are an approximated version of the original input channels. The placement of the channels in the output setup may be fixed, e.g., a 5.1 format, a 7.1 format, etc.
Also, spatial audio object coding tools are well-known in the art and are standardized, for example, in the MPEG SAOC standard (SAOC=spatial audio object coding). In contrast to spatial audio coding starting from original channels, spatial audio object coding starts from audio objects which are not automatically dedicated for a certain rendering reproduction setup. Rather, the placement of the audio objects in the reproduction scene is flexible and may be set by a user, e.g., by inputting certain rendering information into a spatial audio object coding decoder. Alternatively or additionally, rendering information may be transmitted as additional side information or metadata; rendering information may include information at which position in the reproduction setup a certain audio object is to be placed (e.g. over time). In order to obtain a certain data compression, a number of audio objects is encoded using an SAOC encoder which calculates, from the input objects, one or more transport channels by downmixing the objects in accordance with certain downmixing information. Furthermore, the SAOC encoder calculates parametric side information representing inter-object cues such as object level differences (OLD), object coherence values, etc. As in SAC (SAC=Spatial Audio Coding), the inter object parametric data is calculated for individual time/frequency tiles. For a certain frame (for example, 1024 or 2048 samples) of the audio signal a plurality of frequency bands (for example 24, 32, or 64 bands) are considered so that parametric data is provided for each frame and each frequency band. For example, when an audio piece has 20 frames and when each frame is subdivided into 32 frequency bands, the number of time/frequency tiles is 640.
A desired reproduction format, i.e. an output channel configuration (output loudspeaker configuration) may differ from an input channel configuration, wherein the number of output channels is generally different from the number of input channels. Thus, a format conversion may be used for mapping the input channels of the input channel configuration to the output channels of the output channel configuration.