The present invention relates to a transformation of multi-channel parameters, and in particular to the generation of coherence parameters and level parameters, which indicate spatial properties between two audio signals, based on an object-parameter based representation of a spatial audio scene.
There are several approaches for parametric coding of multi-channel audio signals, such as ‘Parametric Stereo (PS)’, ‘Binaural Cue Coding (BCC) for Natural Rendering’ and ‘MPEG Surround’, which aim at representing a multi-channel audio signal by means of a down-mix signal (which could be either monophonic or comprise several channels) and parametric side information (‘spatial cues’) characterizing its perceived spatial sound stage.
Those techniques could be called channel-based, i.e. they aim to transmit, in a bitrate-efficient manner, a multi-channel signal that is already present or has been generated. That is, a spatial audio scene is mixed to a predetermined number of channels before transmission, to match a predetermined loudspeaker set-up, and those techniques aim at compressing the audio channels associated with the individual loudspeakers.
The parametric coding techniques rely on a down-mix channel carrying audio content together with parameters, which describe the spatial properties of the original spatial audio scene and which are used on the receiving side to reconstruct the multi-channel signal or the spatial audio scene.
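As a rough illustration of this principle, the following minimal sketch derives a monophonic down-mix and two spatial cues (a level difference and a coherence value) from a pair of channels. The function name `spatial_cues`, the full-band time-domain processing and the exact cue definitions are assumptions made for illustration only; they are not taken from any of the codecs named above, which typically work per frequency band and time frame.

```python
import math


def spatial_cues(left, right, eps=1e-12):
    """Return (downmix, level_difference_db, coherence) for two channels.

    Illustrative assumption: the cues are computed over the whole signal
    in the time domain; eps only guards against division by zero.
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")

    # Monophonic down-mix carrying the audio content.
    downmix = [0.5 * (l + r) for l, r in zip(left, right)]

    # Channel energies and cross term.
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))

    # Level parameter: energy ratio of the two channels in dB.
    level_db = 10.0 * math.log10((energy_left + eps) / (energy_right + eps))

    # Coherence parameter: normalized cross-correlation at lag zero.
    coherence = cross / math.sqrt((energy_left + eps) * (energy_right + eps))

    return downmix, level_db, coherence
```

On the receiving side, a decoder in this simplified picture would redistribute the down-mix to the output channels so that their level difference and coherence approximate the transmitted values.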
A closely related group of techniques, e.g. ‘BCC for Flexible Rendering’, is designed for efficient coding of individual audio objects rather than channels of the same multi-channel signal, for the sake of interactively rendering them to arbitrary spatial positions and independently amplifying or suppressing single objects without any a priori encoder knowledge thereof. In contrast to common parametric multi-channel audio coding techniques (which convey a given set of audio channel signals from an encoder to a decoder), such object coding techniques allow rendering of the decoded objects to any reproduction setup, i.e. the user on the decoding side is free to choose a reproduction setup (e.g. stereo, 5.1 surround) according to their preference.
Following the object coding concept, parameters can be defined which identify the position of an audio object in space, to allow for flexible rendering on the receiving side. Rendering at the receiving side has the advantage that even non-ideal or arbitrary loudspeaker set-ups can be used to reproduce the spatial audio scene with high quality. In addition, an audio signal, such as, for example, a down-mix of the audio channels associated with the individual objects, has to be transmitted, which is the basis for the reproduction on the receiving side.
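To make the idea of receiver-side rendering more concrete, the following minimal sketch places objects, each described by an azimuth angle, on an arbitrary horizontal loudspeaker set-up. The helper names `pan_gains` and `render`, the restriction to azimuth only, and the sine/cosine constant-power panning law are illustrative assumptions; the text above does not prescribe a particular position parameter or panning rule.

```python
import math


def pan_gains(object_az, speaker_az):
    """Constant-power gains for one object over loudspeakers on a circle.

    Angles are in degrees. Assumption for illustration: the object is
    panned between the two loudspeakers adjacent to its azimuth.
    """
    n = len(speaker_az)
    if n < 2:
        raise ValueError("need at least two loudspeakers")
    if len({a % 360.0 for a in speaker_az}) != n:
        raise ValueError("loudspeaker azimuths must be distinct")

    # Loudspeaker indices sorted by azimuth around the circle.
    order = sorted(range(n), key=lambda i: speaker_az[i] % 360.0)
    gains = [0.0] * n
    az = object_az % 360.0

    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_az[a] % 360.0
        span = (speaker_az[b] - speaker_az[a]) % 360.0
        offset = (az - start) % 360.0
        if offset <= span:
            # Map the position inside the pair to a quarter circle.
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2.0)
            gains[b] = math.sin(frac * math.pi / 2.0)
            return gains
    raise RuntimeError("no loudspeaker pair found")


def render(objects, speaker_az):
    """Mix (samples, azimuth) objects into one sample list per loudspeaker."""
    length = max(len(samples) for samples, _ in objects)
    out = [[0.0] * length for _ in speaker_az]
    for samples, az in objects:
        for ch, gain in enumerate(pan_gains(az, speaker_az)):
            for t, x in enumerate(samples):
                out[ch][t] += gain * x
    return out
```

Because the loudspeaker azimuths are an input to the renderer rather than a property of the transmitted data, the same object description can be played back on a stereo pair, a 5.1 layout or an irregular set-up simply by passing a different `speaker_az` list.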
Both discussed approaches rely on a multi-channel speaker set-up at the receiving side, to allow for a high-quality reproduction of the spatial impression of the original spatial audio scene.
As previously outlined, there are several state-of-the-art techniques for parametric coding of multi-channel audio signals which are capable of reproducing a spatial sound image that is, depending on the available data rate, more or less similar to that of the original multi-channel audio content.
However, given some pre-coded audio material (i.e. spatial sound described by a given number of reproduction channel signals), such a codec does not offer any means for a posteriori and interactive rendering of single audio objects according to the liking of the listener. On the other hand, there are spatial audio object coding techniques which are specially designed for the latter purpose, but since the parametric representations used in such systems are different from those for multi-channel audio signals, separate decoders are needed in case one wants to benefit from both techniques in parallel. The drawback that results from this situation is that, although the back-ends of both systems fulfill the same task, namely rendering of spatial audio scenes on a given loudspeaker setup, they have to be implemented redundantly, i.e. two separate decoders are required to provide both functionalities.
Another limitation of the prior-art object coding technology is the lack of a means for storing and/or transmitting pre-rendered spatial audio object scenes in a backwards compatible way. The feature of enabling interactive positioning of single audio objects provided by the spatial audio object coding paradigm turns out to be a drawback when it comes to identical reproduction of an already rendered audio scene.
In summary, one is confronted with the unfortunate situation that, although a multi-channel playback environment implementing one of the above approaches may be present, a further playback environment may be required to also implement the second approach. It may be noted that, owing to their longer history, channel-based coding schemes are much more common, such as, for example, the well-known 5.1 or 7.1/7.2 multi-channel signals stored on DVD or the like.
That is, even if a multi-channel audio decoder and associated playback equipment (amplifier stages and loudspeakers) are present, a user needs an additional complete set-up, i.e. at least an audio decoder, when he wants to play back object-based coded audio data. Normally, the multi-channel audio decoders are directly associated with the amplifier stages and a user does not have direct access to the amplifier stages used for driving the loudspeakers. This is, for example, the case in most of the commonly available multi-channel audio or multimedia receivers. Based on existing consumer electronics, a user desiring to be able to listen to audio content encoded with both approaches would even need a complete second set of amplifiers, which is, of course, an unsatisfactory situation.