An audio scene may generally comprise audio objects and audio channels. An audio object is an audio signal which has an associated spatial position which may vary with time. An audio channel is an audio signal which corresponds directly to a channel of a multichannel speaker configuration, such as a so-called 5.1 speaker configuration with three front speakers, two surround speakers, and a low frequency effects speaker.
Since the number of audio objects typically may be very large, for instance in the order of hundreds of audio objects, there is a need for coding methods which allow the audio objects to be efficiently reconstructed at the decoder side. There have been suggestions to combine the audio objects into a multichannel downmix (i.e. into a plurality of audio channels which corresponds to the channels of a certain multichannel speaker configuration such as a 5.1 configuration) on an encoder side, and to reconstruct the audio objects parametrically from the multichannel downmix on a decoder side.
An advantage of such an approach is that a legacy decoder which does not support audio object reconstruction may use the multichannel downmix directly for playback on the multichannel speaker configuration. By way of example, a 5.1 downmix may directly be played on the loudspeakers of a 5.1 configuration.
A disadvantage with this approach is however that the multichannel downmix may not give a sufficiently good reconstruction of the audio objects at the decoder side. For example, consider two audio objects that have the same horizontal position as the left front speaker of a 5.1. configuration but a different vertical position. These audio objects would typically be combined into the same channel of a 5.1 downmix. This would constitute a challenging situation for the audio object reconstruction at the decoder side which would have to reconstruct approximations of the two audio objects from the same downmix channel, a process that cannot ensure perfect reconstruction and that sometimes even lead to audible artifacts.
There is thus a need for encoding/decoding methods which provide an efficient and improved reconstruction of audio objects.
Side information or metadata is often employed during reconstruction of audio objects from e.g. a downmix. The form and content of such side information may for example affect the fidelity of the reconstructed audio objects and/or the computational complexity of performing the reconstruction. It would therefore be desirable to provide encoding/decoding methods with a new and alternative side information format which allows for increasing the fidelity of reconstructed audio objects, and/or which allows for reducing the computational complexity of the reconstruction.
All the figures are schematic and generally only show parts which are necessary in order to elucidate the disclosure, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.