Joint coding of the left (L) and right (R) channels of a stereo signal enables more efficient coding compared to independent coding of L and R. A common approach for joint stereo coding is mid/side (M/S) coding. Here, a mid (M) signal is formed by adding the L and R signals, e.g. the M signal may have the form
$$M = \frac{1}{2}\,(L + R).$$
Also, a side (S) signal is formed by subtracting the two channels L and R, e.g. the S signal may have the form
$$S = \frac{1}{2}\,(L - R).$$
In case of M/S coding, the M and S signals are coded instead of the L and R signals.
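The M/S transform with the factor 1/2 used above is exactly invertible (L = M + S, R = M − S). The following minimal sketch works on plain Python lists of samples (or transform coefficients). The function names `ms_encode` and `ms_decode` are hypothetical and do not come from any standard or library.

```python
def ms_encode(left, right):
    """Return (mid, side) using M = (L + R) / 2 and S = (L - R) / 2."""
    mid = [(xl + xr) / 2 for xl, xr in zip(left, right)]
    side = [(xl - xr) / 2 for xl, xr in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, -0.25, 1.0]
right = [0.5, 0.25, -1.0]
mid, side = ms_encode(left, right)
print(mid, side)  # [0.5, 0.0, 0.0] [0.0, -0.25, 1.0]
assert ms_decode(mid, side) == (left, right)  # lossless round trip
```

Wherever L and R are similar, the side signal is small, which is what makes M/S coding attractive in that case.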
In the MPEG (Moving Picture Experts Group) AAC (Advanced Audio Coding) standard (see standard document ISO/IEC 13818-7), L/R stereo coding and M/S stereo coding can be chosen in a time-variant and frequency-variant manner. Thus, the stereo encoder can apply L/R coding to some frequency bands of the stereo signal while using M/S coding for other frequency bands (frequency-variant). Moreover, the encoder can switch between L/R and M/S coding over time (time-variant). In MPEG AAC, stereo encoding is carried out in the frequency domain, more particularly in the MDCT (modified discrete cosine transform) domain. This allows either L/R or M/S coding to be chosen adaptively in a frequency-variant and time-variant manner. The decision between L/R and M/S stereo encoding may be based on evaluating the side signal: when the energy of the side signal is low, M/S stereo encoding is more efficient and should be used. Alternatively, both coding schemes may be tried out and the selection may be based on the resulting quantization effort, i.e., the observed perceptual entropy.
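The side-energy criterion can be illustrated per frequency band. The sketch below assumes that each band is already available as a list of MDCT coefficients for L and R. The relative threshold `ratio_threshold` and all function names are hypothetical. The rule is only one possible instance of the criterion described above and is not a normative part of AAC.

```python
def band_energy(values):
    """Sum of squares of the coefficients in one band."""
    return sum(v * v for v in values)


def choose_stereo_mode(left_band, right_band, ratio_threshold=0.1):
    """Return "MS" if the side signal carries little of the band energy, else "LR".

    ratio_threshold is a hypothetical tuning value, not taken from any standard.
    """
    mid = [(xl + xr) / 2 for xl, xr in zip(left_band, right_band)]
    side = [(xl - xr) / 2 for xl, xr in zip(left_band, right_band)]
    e_mid, e_side = band_energy(mid), band_energy(side)
    total = e_mid + e_side
    if total == 0.0:
        return "LR"  # silent band: both modes are equivalent, keep L/R
    return "MS" if e_side < ratio_threshold * total else "LR"


def choose_modes(left_bands, right_bands, ratio_threshold=0.1):
    """Frequency-variant decision: one mode per band.

    Calling this once per frame makes the decision time-variant as well.
    """
    return [choose_stereo_mode(lb, rb, ratio_threshold)
            for lb, rb in zip(left_bands, right_bands)]


# Band 0: nearly identical channels; band 1: unrelated channels.
left_bands = [[1.0, 0.5], [1.0, 0.0]]
right_bands = [[0.9, 0.5], [0.0, 1.0]]
print(choose_modes(left_bands, right_bands))  # ['MS', 'LR']
```

The alternative mentioned above, which tries both schemes and compares the resulting perceptual entropy, would replace the energy test with a call into the encoder's quantization loop. That loop is not sketched here.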
An alternative approach to joint stereo coding is parametric stereo (PS) coding. Here, the stereo signal is conveyed as a mono downmix signal, which is encoded with a conventional audio encoder such as an AAC encoder. The downmix signal is a superposition of the L and R channels. The mono downmix signal is conveyed in combination with additional time-variant and frequency-variant PS parameters, such as the inter-channel (i.e. between L and R) intensity difference (IID) and the inter-channel cross-correlation (ICC). In the decoder, a stereo signal that approximates the perceptual stereo image of the original stereo signal is reconstructed from the decoded downmix signal and the parametric stereo parameters. For this reconstruction, a decorrelated version of the downmix signal is generated by a decorrelator, which may be realized by an appropriate all-pass filter. PS encoding and decoding are described in the paper "Low Complexity Parametric Stereo Coding in MPEG-4", H. Purnhagen, Proc. of the 7th Int. Conference on Digital Audio Effects (DAFx'04), Naples, Italy, Oct. 5-8, 2004, pages 163-168. The disclosure of this document is hereby incorporated by reference.
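The encoder-side quantities named above can be computed per band. The following is a minimal sketch under several simplifying assumptions. It uses real-valued band samples, whereas the MPEG-4 PS tool operates on complex-valued QMF subband samples. The downmix is taken as (L + R)/2, which is an assumed simple superposition. The function name `ps_parameters` and the guard constant `eps` are hypothetical. IID is expressed as an energy ratio in dB, and ICC as the normalized cross-correlation.

```python
import math


def ps_parameters(left_band, right_band, eps=1e-12):
    """Return (downmix, iid_db, icc) for one band of real-valued samples.

    downmix: (L + R) / 2, an assumed simple superposition of L and R
    iid_db:  10 * log10(E_L / E_R), inter-channel intensity difference in dB
    icc:     normalized cross-correlation of L and R, in [-1, 1]
    eps:     small constant (assumed) guarding against division by zero
    """
    e_l = sum(xl * xl for xl in left_band)
    e_r = sum(xr * xr for xr in right_band)
    cross = sum(xl * xr for xl, xr in zip(left_band, right_band))
    iid_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))
    icc = cross / math.sqrt((e_l + eps) * (e_r + eps))
    downmix = [(xl + xr) / 2 for xl, xr in zip(left_band, right_band)]
    return downmix, iid_db, icc


# A mono source panned about 6 dB towards the left: L = 2x, R = x.
x = [1.0, -0.5, 0.25, 0.75]
_, iid_db, icc = ps_parameters([2.0 * v for v in x], x)
print(round(iid_db, 2), round(icc, 3))  # 6.02 1.0
```

In an actual PS encoder, these parameters would additionally be quantized and computed on a time/frequency grid before transmission. That step is omitted here.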
The MPEG Surround standard (see document ISO/IEC 23003-1) makes use of the concept of PS coding. In an MPEG Surround decoder, a plurality of output channels is created based on fewer input channels and control parameters. MPEG Surround decoders and encoders are constructed by cascading parametric stereo modules, which in MPEG Surround are referred to as OTT modules (One-To-Two modules) for the decoder and R-OTT modules (Reverse-One-To-Two modules) for the encoder. An OTT module determines two output channels from a single input channel (downmix signal) and accompanying PS parameters. An OTT module corresponds to a PS decoder and an R-OTT module corresponds to a PS encoder. Parametric stereo can be realized by using MPEG Surround with a single OTT module at the decoder side and a single R-OTT module at the encoder side; this is also referred to as "MPEG Surround 2-1-2" mode. Although the bitstream syntax may differ, the underlying theory and signal processing are the same. Therefore, in the following, all references to PS also include "MPEG Surround 2-1-2" or MPEG Surround based parametric stereo.
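The one-to-two operation of an OTT module (or PS decoder) can be illustrated by a simplified upmix. This upmix mixes the downmix s with a decorrelated signal d so that the two outputs have a prescribed IID and ICC. This is a hypothetical sketch rather than the standardized upmix matrix. The standardized PS/OTT matrices are more elaborate, and this sketch only reproduces the target IID and ICC. It assumes that d has the same energy as s and is uncorrelated with it. The cosine/sine test signals stand in for a real all-pass decorrelator. The gain normalization c_l² + c_r² = 2 is also an assumption.

```python
import math


def ott_upmix(downmix, decorrelated, iid_db, icc):
    """Simplified one-to-two upmix of one band (hypothetical sketch).

    L = c_l * (cos(a) * s + sin(a) * d)
    R = c_r * (cos(a) * s - sin(a) * d),   a = acos(icc) / 2

    If d is uncorrelated with s and has the same energy, the outputs have
    energy ratio (c_l / c_r)^2 = 10^(iid_db / 10) and normalized
    cross-correlation cos(2a) = icc.
    """
    c = 10.0 ** (iid_db / 20.0)                    # target amplitude ratio L/R
    c_l = math.sqrt(2.0 * c * c / (1.0 + c * c))   # c_l^2 + c_r^2 = 2 (assumed)
    c_r = math.sqrt(2.0 / (1.0 + c * c))
    a = 0.5 * math.acos(max(-1.0, min(1.0, icc)))  # clamp against rounding
    cos_a, sin_a = math.cos(a), math.sin(a)
    left = [c_l * (cos_a * s + sin_a * d) for s, d in zip(downmix, decorrelated)]
    right = [c_r * (cos_a * s - sin_a * d) for s, d in zip(downmix, decorrelated)]
    return left, right


# Stand-in decorrelator output: a sine that is orthogonal to the cosine
# "downmix" over a whole period and has the same energy (a real decoder
# would use an all-pass filter instead).
N = 16
s = [math.cos(2 * math.pi * n / N) for n in range(N)]
d = [math.sin(2 * math.pi * n / N) for n in range(N)]
left, right = ott_upmix(s, d, iid_db=6.0, icc=0.3)

e_l = sum(v * v for v in left)
e_r = sum(v * v for v in right)
cross = sum(u * v for u, v in zip(left, right))
print(round(10 * math.log10(e_l / e_r), 3), round(cross / math.sqrt(e_l * e_r), 3))  # 6.0 0.3
```

For icc = 1, the angle a is zero and the decorrelated signal is not used. The two outputs are then scaled copies of the downmix.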
In a PS encoder (e.g. in an MPEG Surround PS encoder), a residual signal (RES) may be determined and transmitted in addition to the downmix signal. Such a residual signal indicates the error associated with representing the original channels by their downmix and PS parameters. In the decoder, the residual signal may be used instead of the decorrelated version of the downmix signal. This allows the waveforms of the original channels L and R to be reconstructed more accurately. The use of an additional residual signal is described, e.g., in the MPEG Surround standard (see document ISO/IEC 23003-1) and in the paper "MPEG Surround—The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding", J. Herre et al., Audio Engineering Society Convention Paper 7084, 122nd Convention, May 5-8, 2007. The disclosure of both documents, in particular the remarks on the residual signal therein, is herewith incorporated by reference.
PS coding with residual is a more general approach to joint stereo coding than M/S coding. M/S coding performs a fixed signal rotation when transforming the L/R signals into M/S signals. PS coding with residual also performs a signal rotation when transforming the L/R signals into downmix and residual signals; however, in this case the rotation is variable and depends on the PS parameters.
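The difference between a fixed and a variable rotation can be made concrete with a 2×2 rotation of the (L, R) pair into a downmix D and a residual RES. In the sketch below, the angle is estimated directly from L and R as the principal axis of their covariance. This is a hypothetical choice for illustration only. In a PS coder, the rotation would follow from the PS parameters. All function names are hypothetical. Choosing the fixed angle θ = 45° gives D = (L + R)/√2 and RES = −(L − R)/√2, which are M and S up to a factor √2 and a sign flip of S.

```python
import math


def rotation_angle(left, right):
    """Principal-axis angle of the (L, R) covariance in one band.

    Hypothetical choice: in a PS coder the angle would follow from the
    PS parameters; here it is estimated directly from L and R.
    """
    e_l = sum(v * v for v in left)
    e_r = sum(v * v for v in right)
    cross = sum(u * v for u, v in zip(left, right))
    return 0.5 * math.atan2(2.0 * cross, e_l - e_r)


def rotate(left, right, theta):
    """Downmix D and residual RES: [D, RES] = [[c, s], [-s, c]] [L, R]."""
    c, s = math.cos(theta), math.sin(theta)
    downmix = [c * u + s * v for u, v in zip(left, right)]
    residual = [-s * u + c * v for u, v in zip(left, right)]
    return downmix, residual


def unrotate(downmix, residual, theta):
    """Inverse (transposed) rotation back to L and R."""
    c, s = math.cos(theta), math.sin(theta)
    left = [c * u - s * v for u, v in zip(downmix, residual)]
    right = [s * u + c * v for u, v in zip(downmix, residual)]
    return left, right


# Panned mono source: L = 2x, R = x. The adapted angle puts all energy in D.
x = [1.0, -0.5, 0.25]
left, right = [2.0 * v for v in x], x
theta = rotation_angle(left, right)
dmx, res = rotate(left, right, theta)
print(round(max(abs(v) for v in res), 12))  # 0.0
```

Because the rotation is orthogonal, transmitting both D and RES allows L and R to be recovered exactly with `unrotate`. Dropping RES corresponds to the purely parametric case.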
Owing to this more general approach, PS coding with residual can code certain types of signals, such as a panned mono signal, more efficiently than M/S coding. Thus, the proposed coder allows parametric stereo coding techniques to be combined efficiently with waveform-based stereo coding techniques.
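As a worked example of the panned mono case, let L = a x and R = b x for a mono signal x and panning gains a, b ≥ 0 with a ≠ b and a² + b² > 0. Consider a rotation of the (L, R) pair into a downmix D and a residual RES by an angle θ chosen as θ = atan2(b, a), i.e. cos θ = a/√(a² + b²) and sin θ = b/√(a² + b²). This rotation is an illustrative choice under the stated assumptions, not taken from a specific standard. Compared with M/S coding using the factor 1/2 from above, it gives:

$$
\begin{aligned}
M &= \tfrac{1}{2}(L+R) = \tfrac{a+b}{2}\,x, \\
S &= \tfrac{1}{2}(L-R) = \tfrac{a-b}{2}\,x \neq 0, \\
D &= \cos\theta\,L + \sin\theta\,R = \sqrt{a^2+b^2}\,x, \\
\mathrm{RES} &= -\sin\theta\,L + \cos\theta\,R = \frac{-ba + ab}{\sqrt{a^2+b^2}}\,x = 0.
\end{aligned}
$$

M/S coding leaves a non-zero side signal whenever the source is not centered. The rotation adapted to the panning direction instead concentrates all energy in the downmix and leaves a zero residual.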
Often, perceptual stereo encoders, such as an MPEG AAC perceptual stereo encoder, can decide between L/R stereo encoding and M/S stereo encoding, where in the latter case a mid/side signal is generated based on the stereo signal. Such selection may be frequency-variant, i.e. for some frequency bands L/R stereo encoding may be used, whereas for other frequency bands M/S stereo encoding may be used.
In a situation where the L and R channels are basically independent signals, such a perceptual stereo encoder would typically not use M/S stereo encoding, since in this situation M/S encoding offers no coding gain over L/R stereo encoding. The encoder would fall back to plain L/R stereo encoding, basically processing L and R independently.
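The lack of coding gain can be seen from the signal energies. Assume zero-mean channels L and R with energies σ_L² and σ_R² that are uncorrelated, i.e. E{LR} = 0, which is an idealization of "basically independent". With the factor 1/2 used above:

$$
\begin{aligned}
E\{M^2\} &= \tfrac{1}{4}\left(\sigma_L^2 + \sigma_R^2 + 2E\{LR\}\right) = \tfrac{1}{4}\left(\sigma_L^2 + \sigma_R^2\right), \\
E\{S^2\} &= \tfrac{1}{4}\left(\sigma_L^2 + \sigma_R^2 - 2E\{LR\}\right) = \tfrac{1}{4}\left(\sigma_L^2 + \sigma_R^2\right).
\end{aligned}
$$

M and S then carry equal energy, so the transform compacts nothing into M. If one channel is much weaker than the other (for example R = 0), M/S coding even spreads the energy of L over two non-zero signals, whereas L/R coding only needs to spend bits on L.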
In the same situation, a PS encoder system would create a downmix signal that contains both the L and R channels, which prevents independent processing of the L and R channels. For PS coding with a residual signal, this can result in less efficient coding than with a stereo encoder in which L/R or M/S stereo encoding is adaptively selectable.
Thus, there are situations where a PS coder outperforms a perceptual stereo coder with adaptive selection between L/R stereo encoding and M/S stereo encoding, whereas in other situations the latter coder outperforms the PS coder.