The present disclosure relates to a method for performing adaptive down-mixing and subsequent up-mixing of a multi-channel audio signal. In particular, the method relates to down-mixing and up-mixing operations that are commonly used in multi-channel audio coding or spatial audio coding.
Conventional adaptive down-mixing methods use a signal-dependent down-mixing transformation: depending on the particular realization of the signal, the most efficient down-mixing transformation is selected from a set of available down-mixing transformations. For example, in stereo coding the down-mixing transformation can be selected from a set of two transformations: an identity transformation (so-called LR coding) and a transformation yielding a sum of the input channels (the so-called M or Mid channel) and a difference of the input channels (the so-called S or Side channel).
Such a conventional coding scheme is typically referred to as M/S coding or Mid/Side coding. Conventional M/S coding provides only a limited rate-distortion gain because the set of available transformations is limited. Moreover, because closed-loop coding is used, the associated complexity can be large.
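As an illustration only, the selection between LR coding and M/S coding described above can be sketched as follows. The function names, the 1/2 scaling of the sum and difference, and the cost proxy (sum of absolute sample values) are assumptions made for this sketch; an actual closed-loop coder would encode the frame with each candidate transformation and compare the resulting rate and distortion.

```python
def ms_transform(left, right):
    """Map (L, R) to (M, S) as a scaled sum and difference (scaling is an assumption)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_inverse(mid, side):
    """Up-mix: recover (L, R) from (M, S)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def cost(ch_a, ch_b):
    """Hypothetical cost proxy standing in for a real rate-distortion measure."""
    return sum(abs(x) for x in ch_a) + sum(abs(x) for x in ch_b)


def select_transform(left, right):
    """Pick the cheaper of the two candidate transformations for one frame."""
    mid, side = ms_transform(left, right)
    if cost(mid, side) < cost(left, right):
        return "MS", mid, side
    return "LR", list(left), list(right)


# Strongly correlated channels: the side channel vanishes, so M/S is selected.
mode, ch0, ch1 = select_transform([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(mode)
```

Because only two candidate transformations exist, the achievable gain is bounded by whichever of the two fits the frame better, which is the limitation noted above.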
These drawbacks of M/S coding have been addressed by down-mixing methods in which the down-mixing transformation is computed from an inter-channel covariance matrix, as described in M. Briand, D. Virette and N. Martin, “Parametric Coding of Stereo Audio Based on Principal Component Analysis,” Proc. of the 9th International Conference on Digital Audio Effects, Montreal, Canada, Sep. 28, 2006. However, this approach is limited to stereo signals and cannot be adapted to a larger number of input channels. An extension of this approach to a higher number of channels is described in D. Yang, H. Ai, C. Kyriakakis, and C.-C. J. Kuo, “Progressive Syntax-Rich Coding of Multichannel Audio Sources,” EURASIP Journal on Applied Signal Processing, vol. 2003, pp. 980-992, January 2003. That extension, however, does not allow a backward-compatible down-mix to be generated.
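To make the covariance-based idea concrete, the following is a minimal sketch of a generic principal component rotation for one stereo frame. It is not taken from the cited papers: the function names are hypothetical, the frame is assumed to be non-empty, and the rotation angle uses the standard closed form for the principal axis of a symmetric 2x2 matrix.

```python
import math


def covariance_2x2(left, right):
    """Inter-channel covariance entries (c_ll, c_lr, c_rr) of one non-empty frame."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    c_ll = sum((l - mean_l) ** 2 for l in left) / n
    c_rr = sum((r - mean_r) ** 2 for r in right) / n
    c_lr = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right)) / n
    return c_ll, c_lr, c_rr


def pca_angle(c_ll, c_lr, c_rr):
    """Angle of the principal axis: tan(2*theta) = 2*c_lr / (c_ll - c_rr)."""
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)


def rotate(left, right, theta):
    """Down-mix: project (L, R) onto the principal and secondary axes."""
    c, s = math.cos(theta), math.sin(theta)
    principal = [c * l + s * r for l, r in zip(left, right)]
    secondary = [-s * l + c * r for l, r in zip(left, right)]
    return principal, secondary


def inverse_rotate(principal, secondary, theta):
    """Up-mix: the rotation is orthogonal, so its inverse is its transpose."""
    c, s = math.cos(theta), math.sin(theta)
    left = [c * p - s * q for p, q in zip(principal, secondary)]
    right = [s * p + c * q for p, q in zip(principal, secondary)]
    return left, right


left = [1.0, -1.0, 2.0, -2.0]
right = [1.0, -1.0, 2.0, -2.0]
theta = pca_angle(*covariance_2x2(left, right))
principal, secondary = rotate(left, right, theta)
```

Unlike the two-element set used in M/S coding, the angle here varies continuously with the signal, and the decoder needs it (or an equivalent parameter) to invert the rotation. The closed-form angle exists only for two channels, which is consistent with the stereo limitation noted above.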
Another disadvantage of using a fixed set of down-mixing transformations is the difficulty of finding a suitable set for the general case. A further conventional down-mixing transformation has been proposed in G. Hotho, L. F. Villemoes and J. Breebaart, “A Backward-Compatible Multichannel Audio Codec,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 1, pp. 83-93, January 2008. This conventional method achieves backward compatibility by combining a matrix down-mixing transformation with prediction of the secondary channels from the primary channels. This results in a parametric coding scheme in which the parameters are prediction parameters. However, the approach described by Hotho et al. is efficient only when the number of channels is low. In addition, the coding performance of this conventional down-mixing approach is suboptimal in terms of rate-distortion performance.
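The idea of predicting a secondary channel from a primary channel can be illustrated with a generic one-coefficient least-squares predictor. This is a simplified sketch under stated assumptions, not the scheme of Hotho et al.: the function names are hypothetical, a single primary channel is assumed, and one coefficient per frame is assumed as the transmitted prediction parameter.

```python
def prediction_coefficient(primary, secondary):
    """Least-squares coefficient a minimizing sum((secondary - a * primary)^2)."""
    energy = sum(p * p for p in primary)
    if energy == 0.0:
        return 0.0  # silent primary channel: nothing to predict from
    return sum(p * s for p, s in zip(primary, secondary)) / energy


def predict(primary, coeff):
    """Decoder side: approximate the secondary channel from the primary channel."""
    return [coeff * p for p in primary]


def residual(primary, secondary, coeff):
    """Part of the secondary channel that the prediction does not capture."""
    return [s - coeff * p for p, s in zip(primary, secondary)]


primary = [1.0, 2.0, 3.0]
secondary = [0.5, 1.0, 1.5]
a = prediction_coefficient(primary, secondary)
approx = predict(primary, a)
```

In this sketch only the coefficient needs to be conveyed alongside the primary channel, which is what makes such a scheme parametric; any component of the secondary channel that is uncorrelated with the primary channel remains in the residual.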
Conventional adaptive down-mixing methods thus fall into two groups. Some support an arbitrary number of channels but do not preserve the spatial characteristics of the original multi-channel audio signal, so that backward compatibility cannot be achieved. Others preserve the spatial characteristics of the original multi-channel audio signal in the generated down-mix but can be used only for multi-channel audio signals with a limited number of audio channels. Consequently, there is a need for a method and an apparatus for performing adaptive down-mixing of a multi-channel audio signal that preserve the spatial characteristics of the original multi-channel audio signal and at the same time offer backward compatibility.