The present invention is related to audio processing and, particularly, to multi-channel audio processing of a multi-channel signal having two or more channel signals.
It is known in the field of multi-channel or stereo processing to apply so-called mid/side stereo coding. In this concept, a combination of the left or first audio channel signal and the right or second audio channel signal is formed to obtain a mid or mono signal M. Additionally, a difference between the left or first channel signal and the right or second channel signal is formed to obtain the side signal S. This mid/side coding method results in a significant coding gain when the left signal and the right signal are quite similar to each other, since the side signal then becomes quite small. Typically, the coding gain of a quantizer/entropy encoder stage becomes higher when the range of values to be quantized/entropy-encoded becomes smaller. Hence, for a PCM or a Huffman-based or arithmetic entropy encoder, the coding gain increases when the side signal becomes smaller. There are, however, situations in which mid/side coding does not result in a coding gain. Such a situation can occur when the signals in both channels are phase-shifted relative to each other, for example, by 90°. Then, the mid signal and the side signal can lie in a quite similar range and, therefore, coding the mid signal and the side signal using the entropy encoder does not result in a coding gain and can even result in an increased bit rate. Therefore, frequency-selective mid/side coding can be applied in order to deactivate mid/side coding in bands where the side signal does not become smaller to a certain degree with respect to, for example, the original left signal.
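The behaviour described above can be illustrated with a minimal sketch. The text only states that a combination and a difference of the channels are formed; the 1/2 scaling, the function names `ms_encode`, `ms_decode` and `energy`, and the test signals are assumptions chosen for illustration. The sketch compares the side-to-mid energy ratio for two nearly identical channels and for two channels phase-shifted by 90°.

```python
import math

def ms_encode(left, right):
    """Form mid and side signals (the 1/2 scaling is an assumption)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

N = 64  # hypothetical block length covering whole periods of the test tone
left = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]

# Case 1: right channel is nearly identical to the left channel.
right_similar = [0.9 * v for v in left]
# Case 2: right channel is the same tone shifted by 90 degrees.
right_shifted = [math.sin(2 * math.pi * 4 * n / N + math.pi / 2) for n in range(N)]

mid_a, side_a = ms_encode(left, right_similar)
mid_b, side_b = ms_encode(left, right_shifted)

# Side-to-mid energy ratio: small means the side signal is cheap to code.
ratio_similar = energy(side_a) / energy(mid_a)
ratio_shifted = energy(side_b) / energy(mid_b)

# Round trip to confirm the transform is invertible.
left_rec, right_rec = ms_decode(mid_b, side_b)
```

In this sketch `ratio_similar` is far below one, whereas `ratio_shifted` is close to one, i.e. the mid and side signals occupy a similar value range and the transform offers no obvious advantage over coding left and right directly.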
Although the side signal becomes zero when the left and right signals are identical, resulting in a maximum coding gain due to the elimination of the side signal, the situation changes again when the mid signal and the side signal are identical with respect to the shape of the waveform and differ only in their overall amplitudes. In this case, when it is additionally assumed that the side signal has no phase shift relative to the mid signal, the side signal increases significantly, while the mid signal does not decrease much with respect to its value range. When such a situation occurs in a certain frequency band, one would again deactivate mid/side coding due to the lack of coding gain. Mid/side coding can be applied frequency-selectively or, alternatively, in the time domain.
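One way to read the amplitude-only case is to assume that the right channel is a scaled copy of the left channel, right = a * left with 0 <= a <= 1, and that mid and side are formed with a 1/2 scaling. Both are assumptions for illustration, as is the helper name `ms_amplitude_ratio`. Mid and side then share the waveform shape of the left channel and differ only in gain, and the following sketch evaluates the side-to-mid amplitude ratio for a few values of a.

```python
def ms_amplitude_ratio(a):
    """Side-to-mid amplitude ratio when right = a * left (assumed model)."""
    mid_gain = (1.0 + a) / 2.0   # mid  = (left + right) / 2 = mid_gain * left
    side_gain = (1.0 - a) / 2.0  # side = (left - right) / 2 = side_gain * left
    return side_gain / mid_gain

# a = 1.0: identical channels; a = 0.0: signal present in the left channel only.
ratios = {a: ms_amplitude_ratio(a) for a in (1.0, 0.5, 0.0)}
```

Under these assumptions the ratio is zero for identical channels and grows towards one as the level difference increases, so the side signal ends up as large as the mid signal when only one channel carries signal.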
There exist alternative multi-channel coding techniques which do not rely on a waveform approach such as mid/side coding, but which instead rely on parametric processing based on certain binaural cues. Such techniques are known under the terms “binaural cue coding”, “parametric stereo coding” or “MPEG Surround coding”. Here, certain cues are calculated for a plurality of frequency bands. These cues include inter-channel level differences, inter-channel coherence measures, inter-channel time differences and/or inter-channel phase differences. These approaches start from the assumption that the multi-channel impression perceived by the listener does not necessarily rely on the detailed waveforms of the two channels, but rather on accurate, frequency-selectively provided cues or inter-channel information. This means that, in a rendering machine, care has to be taken to render multi-channel signals which accurately reflect the cues, whereas the waveforms are not of decisive importance.
This approach can be complex, particularly when the decoder has to apply decorrelation processing in order to artificially create stereo signals which are decorrelated from each other, although all these channels are derived from one and the same downmix channel. Decorrelators for this purpose are, depending on their implementation, complex and may introduce artifacts, particularly in the case of transient signal portions. Additionally, in contrast to waveform coding, the parametric coding approach is a lossy coding approach which inevitably results in a loss of information, introduced not only by the typical quantization but also by considering the binaural cues rather than the particular waveforms. This approach results in very low bit rates but may involve quality compromises.
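As a rough illustration of how such cues might be computed for one frequency band, the following sketch uses commonly used textbook definitions of the inter-channel level difference (energy ratio in dB), the inter-channel coherence (normalized magnitude of the cross-correlation) and the inter-channel phase difference (angle of the cross-correlation). These definitions, the function name `band_cues`, the `eps` guard and the synthetic complex subband samples are assumptions for illustration and are not taken from any particular standard.

```python
import cmath
import math

def band_cues(left_band, right_band, eps=1e-12):
    """Return (ILD in dB, ICC, IPD in radians) for one band of complex samples."""
    e_left = sum(abs(x) ** 2 for x in left_band)
    e_right = sum(abs(x) ** 2 for x in right_band)
    # Complex cross-correlation between the two channels in this band.
    cross = sum(l * r.conjugate() for l, r in zip(left_band, right_band))
    ild_db = 10.0 * math.log10((e_left + eps) / (e_right + eps))
    icc = abs(cross) / math.sqrt((e_left + eps) * (e_right + eps))
    ipd = cmath.phase(cross)
    return ild_db, icc, ipd

# Synthetic band: right is the left signal at half amplitude, delayed in phase by 90 degrees.
left_band = [cmath.exp(1j * 0.3 * n) for n in range(32)]
right_band = [0.5 * x * cmath.exp(-1j * math.pi / 2) for x in left_band]

ild_db, icc, ipd = band_cues(left_band, right_band)
```

For this synthetic band the sketch yields a level difference of about 6 dB, a coherence of essentially one and a phase difference of 90°, which shows that the three cues capture the relation between the channels without retaining the waveforms themselves.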
There exist recent developments for unified speech and audio coding (USAC) illustrated in FIG. 7a. A core decoder 700 performs a decoding operation of the encoded stereo signal at input 701, which can be mid/side encoded. The core decoder outputs a mid signal at line 702 and a side or residual signal at line 703. Both signals are transformed into a QMF domain by QMF filter banks 704 and 705. Then, an MPEG Surround decoder 706 is applied to generate a left channel signal 707 and a right channel signal 708. These low-band signals are subsequently introduced into a spectral band replication (SBR) decoder 709, which produces broad-band left and right signals on the lines 710 and 711, which are then transformed into a time domain by the QMF synthesis filter banks 712, 713 so that broad-band left and right signals L, R are obtained.
FIG. 7b illustrates the situation in which the MPEG Surround decoder 706 performs mid/side decoding. Alternatively, the MPEG Surround decoder block 706 could perform a binaural cue based parametric decoding for generating stereo signals from a single mono core decoder signal. Naturally, the MPEG Surround decoder 706 could also generate a plurality of low-band output signals to be input into the SBR decoder block 709 using parametric information such as inter-channel level differences, inter-channel coherence measures or other such inter-channel information parameters.
When the MPEG Surround decoder block 706 performs the mid/side decoding illustrated in FIG. 7b, a real-valued gain factor g can be applied, and DMX/RES and L/R denote the downmix/residual and left/right signals, respectively, represented in the complex hybrid QMF domain.
Using a combination of block 706 and block 709 causes only a small increase in computational complexity compared to a stereo decoder used as a basis, because the complex QMF representation of the signal is already available as part of the SBR decoder. In a non-SBR configuration, however, QMF-based stereo coding, as proposed in the context of USAC, would result in a significant increase in computational complexity because of the required QMF banks, which in this example would be 64-band analysis banks and 64-band synthesis banks. These filter banks would have to be added only for the purpose of stereo coding.
In the MPEG USAC system under development, however, there also exist coding modes at high bit rates where SBR typically is not used.