Joint coding of the left (L) and right (R) channels of a stereo signal enables more efficient coding compared to independent coding of L and R. A common approach for joint stereo coding is mid/side (M/S) coding. Here, a mid (M) signal is formed by adding the L and R signals, e.g., the M signal may have the form

$$M = \frac{L + R}{2}.$$

Also, a side (S) signal is formed by subtracting the two channels L and R, e.g., the S signal may have the form

$$S = \frac{L - R}{2}.$$

In the case of M/S coding, the M and S signals are coded instead of the L and R signals.
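As a minimal illustration of the M/S formulas above, the Python sketch below forms M = (L + R)/2 and S = (L − R)/2 sample by sample and inverts them with L = M + S and R = M − S. The function names are hypothetical; the point is only that the transform is exactly invertible, so any coding gain comes from S being small when L and R are similar.

```python
from typing import List, Sequence, Tuple


def ms_encode(left: Sequence[float], right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Form mid M = (L + R)/2 and side S = (L - R)/2 per sample."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: Sequence[float], side: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Invert the M/S transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Usage: nearly identical channels give a small side signal.
left = [1.0, 0.5, -0.25]
right = [0.75, 0.5, 0.25]
mid, side = ms_encode(left, right)
assert ms_decode(mid, side) == (left, right)
```

When L and R are strongly correlated, S carries little energy and can be coded with few bits; when they are not, M/S offers no advantage, which motivates switching between the two representations.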
In the MPEG (Moving Picture Experts Group) AAC (Advanced Audio Coding) standard (see standard document ISO/IEC 13818-7), L/R stereo coding and M/S stereo coding can be chosen in a time-variant and frequency-variant manner. Thus, the stereo encoder can apply L/R coding for some frequency bands of the stereo signal, whereas M/S coding is used for encoding other frequency bands of the stereo signal (frequency-variant). Moreover, the encoder can switch over time between L/R and M/S coding (time-variant). In MPEG AAC, the stereo encoding is carried out in the frequency domain, more particularly the MDCT (modified discrete cosine transform) domain, which allows L/R or M/S coding to be chosen adaptively per frequency band and over time.
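The AAC bitstream signals the per-band L/R or M/S choice, but the encoder's decision rule is not normative. As a hedged illustration of band-wise switching, the Python sketch below decides per band on one frame of real-valued spectral (e.g., MDCT) coefficients using a crude, hypothetical cost: the sum of the log2 energies of the two signals that would be coded, a rough proxy for bit demand. The band edges, the cost function, and all names are assumptions, not part of ISO/IEC 13818-7; the decision uses an orthonormal M/S rotation so that total energy is comparable between the two options.

```python
import math
from typing import List, Sequence


def _log_energy(values: Sequence[float], eps: float) -> float:
    """log2 of band energy, with eps guarding against log(0)."""
    return math.log2(sum(v * v for v in values) + eps)


def choose_ms_per_band(left: Sequence[float], right: Sequence[float],
                       band_edges: Sequence[int], eps: float = 1e-12) -> List[bool]:
    """Return one flag per band: True -> code as M/S, False -> code as L/R.

    Hypothetical heuristic for illustration only, not the AAC encoder's method.
    """
    flags = []
    scale = 1.0 / math.sqrt(2.0)  # orthonormal M/S, used only for the decision
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        l_band, r_band = left[lo:hi], right[lo:hi]
        m_band = [(l + r) * scale for l, r in zip(l_band, r_band)]
        s_band = [(l - r) * scale for l, r in zip(l_band, r_band)]
        cost_lr = _log_energy(l_band, eps) + _log_energy(r_band, eps)
        cost_ms = _log_energy(m_band, eps) + _log_energy(s_band, eps)
        flags.append(cost_ms < cost_lr)
    return flags


# Usage: band 0 has identical channels, band 1 has a silent right channel.
spec_left = [1.0, 2.0, 3.0, 4.0, 1.0, 0.0, 1.0, 0.0]
spec_right = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(choose_ms_per_band(spec_left, spec_right, [0, 4, 8]))
```

Identical channels favor M/S because the side band is empty; a hard-panned band favors L/R because one channel is already empty and M/S would spread the energy over two signals.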
Parametric stereo coding is a technique for efficiently coding a stereo audio signal as a monaural signal plus a small amount of side information for stereo parameters. It is part of the MPEG-4 Audio standard (see standard document ISO/IEC 14496-3). The monaural signal can be encoded using any audio coder. The stereo parameters can be embedded in the auxiliary part of the mono bit stream, thus achieving full forward and backward compatibility. In the decoder, the monaural signal is decoded first, after which the stereo signal is reconstructed with the aid of the stereo parameters. A decorrelated version of the decoded mono signal, which ideally has zero cross-correlation with the mono signal, is generated by means of a decorrelator, e.g., an appropriate all-pass filter, which may include one or more delay lines.
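To make the decorrelator concrete, the Python sketch below chains a pure delay line with one Schroeder all-pass section, y[n] = −g·d[n] + d[n − D] + g·y[n − D]. The pre-delay, the all-pass delay D, and the gain g are hypothetical values chosen for illustration; this is a time-domain toy, not the MPEG-4 parametric stereo decorrelator. The pre-delay makes the impulse response zero at lag 0, so for a white input the zero-lag cross-correlation with the input is close to zero, while the all-pass section preserves the spectral energy distribution.

```python
import math
import random
from typing import List, Sequence


def allpass_decorrelator(x: Sequence[float], pre_delay: int = 7,
                         ap_delay: int = 13, g: float = 0.5) -> List[float]:
    """Pre-delay followed by one Schroeder all-pass (hypothetical parameters)."""
    n = len(x)
    # Delay line: d[k] = x[k - pre_delay]
    d = [x[k - pre_delay] if k >= pre_delay else 0.0 for k in range(n)]
    y = [0.0] * n
    for k in range(n):
        d_old = d[k - ap_delay] if k >= ap_delay else 0.0
        y_old = y[k - ap_delay] if k >= ap_delay else 0.0
        # All-pass section: H(z) = (-g + z^-D) / (1 - g z^-D)
        y[k] = -g * d[k] + d_old + g * y_old
    return y


def normalized_xcorr(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


# Usage: white noise stands in for the decoded mono signal.
rng = random.Random(0)
mono = [rng.gauss(0.0, 1.0) for _ in range(20000)]
decorrelated = allpass_decorrelator(mono)
energy_ratio = sum(v * v for v in decorrelated) / sum(v * v for v in mono)
print(normalized_xcorr(mono, decorrelated), energy_ratio)
```

For real audio, which is not white, a single section leaves some residual correlation; practical decorrelators therefore use several, frequency-dependent delays and all-pass stages.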
Essentially, the decorrelated signal has the same spectral and temporal energy distribution as the mono signal. The monaural signal and the decorrelated signal are input to the upmix process, which is controlled by the stereo parameters and reconstructs the stereo signal. For further information, see the paper “Low Complexity Parametric Stereo Coding in MPEG-4”, H. Purnhagen, Proc. of the 7th Int. Conference on Digital Audio Effects (DAFx'04), Naples, Italy, Oct. 5-8, 2004, pages 163-168.
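The upmix step can be sketched as a 2×2 mix of the mono signal M and the decorrelated signal D per band. The Python example below assumes the stereo parameters are an inter-channel intensity difference (IID, in dB) and an inter-channel coherence (ICC), as used in MPEG-4 parametric stereo, and uses a simplified, hypothetical parameterization rather than the standard's mixing matrices: L = c_l(cos a·M + sin a·D) and R = c_r(cos a·M − sin a·D) with a = ½·arccos(ICC). If M and D are uncorrelated with equal power, this gives an L/R energy ratio of c_l²/c_r² (the IID) and a normalized correlation of cos 2a (the ICC).

```python
import math
import random
from typing import List, Sequence, Tuple


def ps_upmix(mono: Sequence[float], decorr: Sequence[float],
             iid_db: float, icc: float) -> Tuple[List[float], List[float]]:
    """Simplified parametric-stereo upmix for one band (hypothetical sketch)."""
    c = 10.0 ** (iid_db / 20.0)                   # target L/R amplitude ratio
    c_l = math.sqrt(2.0 * c * c / (1.0 + c * c))
    c_r = math.sqrt(2.0 / (1.0 + c * c))          # c_l**2 + c_r**2 == 2
    a = 0.5 * math.acos(max(-1.0, min(1.0, icc)))
    ca, sa = math.cos(a), math.sin(a)
    left = [c_l * (ca * m + sa * d) for m, d in zip(mono, decorr)]
    right = [c_r * (ca * m - sa * d) for m, d in zip(mono, decorr)]
    return left, right


def level_difference_db(a: Sequence[float], b: Sequence[float]) -> float:
    """Energy ratio of a to b in dB."""
    return 10.0 * math.log10(sum(x * x for x in a) / sum(x * x for x in b))


def normalized_xcorr(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Usage: independent noise stands in for the mono and decorrelated signals.
rng = random.Random(1)
n = 50000
mono = [rng.gauss(0.0, 1.0) for _ in range(n)]
decorr = [rng.gauss(0.0, 1.0) for _ in range(n)]
left, right = ps_upmix(mono, decorr, iid_db=6.0, icc=0.4)
print(level_difference_db(left, right), normalized_xcorr(left, right))
```

The measured level difference and correlation approach the requested IID and ICC only to the extent that the decorrelated signal is in fact uncorrelated with the mono signal and matches its energy, which is why the decorrelator's properties matter.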
MPEG Surround (MPS; see ISO/IEC 23003-1 and the paper “MPEG Surround - The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding”, J. Herre et al., Audio Engineering Society Convention Paper 7084, 122nd Convention, May 5-8, 2007) allows combining the principles of parametric stereo coding with residual coding, substituting the decorrelated signal with a transmitted residual and hence improving the perceptual quality. Residual coding may be achieved by downmixing a multi-channel signal and, optionally, by extracting spatial cues. During the downmixing process, residual signals representing the error are computed, encoded, and transmitted. In the decoder, they may take the place of the decorrelated signals. In a hybrid approach, they may replace the decorrelated signals only in certain frequency bands, preferably in relatively low bands.
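In the hybrid approach just described, the decoder chooses per band which signal accompanies the downmix into the upmix: the transmitted residual in the low bands, the decorrelator output elsewhere. The minimal Python sketch below shows only that selection; it assumes the residual is transmitted for the lowest `len(residual)` bands, and the function name and per-band list layout are hypothetical.

```python
from typing import List, Sequence


def hybrid_upmix_inputs(residual: Sequence[Sequence[float]],
                        decorrelated: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Per band, return the residual if one was transmitted, else the decorrelated signal.

    Hypothetical layout: one list of samples per band, with residuals
    available only for the lowest len(residual) bands.
    """
    n_res = len(residual)
    if n_res > len(decorrelated):
        raise ValueError("more residual bands than total bands")
    return [residual[b] if b < n_res else decorrelated[b]
            for b in range(len(decorrelated))]


# Usage: residuals for bands 0 and 1, decorrelator output for band 2.
chosen = hybrid_upmix_inputs([[1.0], [2.0]], [[9.0], [9.0], [9.0]])
print(chosen)
```

Restricting the residual to low bands keeps the extra bit rate small while improving the bands where waveform accuracy tends to matter most perceptually.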
According to the current MPEG Unified Speech and Audio Coding (USAC) system, of which two examples are shown in FIG. 1, the decoder comprises a complex-valued quadrature mirror filter (QMF) bank located downstream of the core decoder. The QMF representation obtained as the output of the filter bank is complex-valued and thus oversampled by a factor of two. It can be arranged as a downmix signal (or, equivalently, mid signal) M and a residual signal D, to which an upmix matrix with complex entries is applied. The L and R signals (in the QMF domain) are obtained as:
$$\begin{bmatrix} L \\ R \end{bmatrix} = g \begin{bmatrix} 1-\alpha & 1 \\ 1+\alpha & -1 \end{bmatrix} \begin{bmatrix} M \\ D \end{bmatrix}$$

where g is a real-valued gain factor and α is a complex-valued prediction coefficient. Preferably, α is chosen such that the energy of the residual signal D is minimized. The gain factor may be determined by normalization, that is, so that the power of the sum signal equals the sum of the powers of the left and right signals. The real and imaginary parts of each of the L and R signals are mutually redundant (in principle, each can be computed from the other), but they are beneficial in that they enable the subsequent application of a spectral band replication (SBR) decoder without audible aliasing artifacts. For similar reasons, an oversampled signal representation may also be chosen with the aim of preventing artifacts connected with other time- or frequency-adaptive signal processing (not shown), such as the mono-to-stereo upmix. Inverse QMF filtering is the last processing step in the decoder. It is noted that the band-limited QMF representation of the signal allows for band-limited residual techniques and “residual fill” techniques, which may be integrated into decoders of this type.
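The upmix equation can be exercised end to end for one frequency band of complex-valued (e.g., QMF-domain) samples. Inverting the matrix gives M = (L + R)/(2g) and D = (L − R)/(2g) + αM, so the Python sketch below assumes an encoder that forms exactly these signals and picks α by least squares, α = −Σ conj(M)·S′ / Σ|M|² with S′ = (L − R)/(2g), which minimizes the energy of D. The per-band estimation and the function names are illustrative assumptions, not the USAC specification.

```python
from typing import List, Sequence, Tuple


def predict_alpha(mid: Sequence[complex], side: Sequence[complex]) -> complex:
    """Complex alpha minimizing sum |side + alpha * mid|^2 (least squares)."""
    num = sum(m.conjugate() * s for m, s in zip(mid, side))
    den = sum(abs(m) ** 2 for m in mid)
    return -num / den if den > 0 else 0j


def encode_band(left: Sequence[complex], right: Sequence[complex],
                g: float = 1.0) -> Tuple[List[complex], List[complex], complex]:
    """Form M, the residual D and alpha by inverting the upmix matrix."""
    mid = [(l + r) / (2 * g) for l, r in zip(left, right)]
    side = [(l - r) / (2 * g) for l, r in zip(left, right)]
    alpha = predict_alpha(mid, side)
    resid = [s + alpha * m for s, m in zip(side, mid)]
    return mid, resid, alpha


def decode_band(mid: Sequence[complex], resid: Sequence[complex], alpha: complex,
                g: float = 1.0) -> Tuple[List[complex], List[complex]]:
    """Apply [L; R] = g [[1 - alpha, 1], [1 + alpha, -1]] [M; D]."""
    left = [g * ((1 - alpha) * m + d) for m, d in zip(mid, resid)]
    right = [g * ((1 + alpha) * m - d) for m, d in zip(mid, resid)]
    return left, right


# Usage: R is a complex-scaled copy of L, so D should vanish.
band_left = [1 + 2j, -0.5 + 0.1j, 0.3 - 0.7j, 2.0 + 0j]
band_right = [(0.6 + 0.3j) * x for x in band_left]
mid, resid, alpha = encode_band(band_left, band_right)
rec_left, rec_right = decode_band(mid, resid, alpha)
```

Because α = 0 is always a candidate, the residual energy never exceeds that of the plain side signal; with a complex α, inter-channel phase differences are also absorbed into the prediction.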
The above coding structure is well suited for low bit rates, typically below 80 kb/s, but is not optimal for higher bit rates with respect to computational complexity. More precisely, at higher bit rates, the SBR tool is typically not utilized (as it would not improve coding efficiency). In a decoder without an SBR stage, only the complex-valued upmix matrix then motivates the presence of the QMF filter bank, which is computationally demanding and introduces a delay (at a frame length of 1024 samples, the QMF analysis/synthesis filter bank introduces a delay of 961 samples). This indicates a need for a more efficient coding structure.
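To put the 961-sample QMF delay in perspective, the short Python sketch below converts it to milliseconds for a few common sampling rates. It assumes the 961 samples are counted at the rate at which the filter bank operates, which the text does not state explicitly; the helper name is hypothetical.

```python
def delay_ms(delay_samples: int, sample_rate_hz: float) -> float:
    """Convert a delay in samples to milliseconds at the given sampling rate."""
    return 1000.0 * delay_samples / sample_rate_hz


QMF_DELAY_SAMPLES = 961  # QMF analysis/synthesis delay quoted for a 1024-sample frame

for fs in (32000, 44100, 48000):
    print(f"{fs} Hz: {delay_ms(QMF_DELAY_SAMPLES, fs):.2f} ms")
```

At 48 kHz this amounts to roughly 20 ms, incurred solely to support the complex-valued upmix when no SBR stage is used.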