Recent developments in audio coding have made methods available to recreate a multi-channel representation of an audio signal based on a stereo (or mono) signal and corresponding control data. These methods differ substantially from older matrix-based solutions such as Dolby Pro Logic, since additional control data is transmitted to control the re-creation, also referred to as up-mix, of the surround channels based on the transmitted mono or stereo channels.
Hence, such a parametric multi-channel audio decoder, e.g. MPEG Surround, reconstructs N channels based on M transmitted channels, where N > M, and the additional control data. The additional control data requires a significantly lower data rate than transmission of all N channels, making the coding very efficient while at the same time ensuring compatibility with both M-channel devices and N-channel devices [J. Breebaart et al. “MPEG spatial audio coding/MPEG Surround: overview and current status”, Proc. 119th AES convention, New York, USA, October 2005, Preprint 6447].
These parametric surround coding methods usually comprise a parameterization of the surround signal based on Channel Level Difference (CLD) and Inter-Channel Coherence/cross-correlation (ICC). These parameters describe power ratios and correlation between channel pairs in the up-mix process. Furthermore, Channel Prediction Coefficients (CPC) are used in the prior art to predict intermediate or output channels during the up-mix procedure.
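As a minimal illustration of what the two parameters express, the following sketch computes a level difference and a coherence value for one channel pair. It is a simplification under stated assumptions: the values are computed over a whole broadband block of real-valued samples, whereas a codec such as MPEG Surround would compute and quantize them per time/frequency tile. The function names `cld_db` and `icc` are hypothetical and chosen only for this example.

```python
import math


def cld_db(x, y):
    """Level difference of a channel pair: power ratio of x to y in dB."""
    power_x = sum(v * v for v in x)
    power_y = sum(v * v for v in y)
    return 10.0 * math.log10(power_x / power_y)


def icc(x, y):
    """Coherence of a channel pair: normalized cross-correlation at lag zero."""
    power_x = sum(v * v for v in x)
    power_y = sum(v * v for v in y)
    cross = sum(a * b for a, b in zip(x, y))
    return cross / math.sqrt(power_x * power_y)


# Toy channel pair (assumed data): same waveform, right channel at half amplitude.
left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]

level_difference = cld_db(left, right)  # about 6.02 dB (power ratio of 4)
coherence = icc(left, right)            # 1.0, the channels are fully correlated
```

In this toy case the level difference captures that one channel is louder, while the coherence captures that the two channels are otherwise identical in waveform.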
Other developments in audio coding have provided means to obtain a multi-channel signal impression over stereo headphones. This is commonly done by downmixing a multi-channel signal to stereo using the original multi-channel signal and HRTF (Head-Related Transfer Function) filters.
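The downmix described above can be sketched as follows: each loudspeaker channel is filtered once for the left ear and once for the right ear, and the filtered signals are summed per ear. This is only a sketch under assumptions. The filters are represented here as short time-domain impulse responses, the helper names `convolve` and `binaural_downmix` are hypothetical, and the two-channel toy data does not correspond to any measured HRTF set.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binaural_downmix(channels, filters):
    """Sum every channel, filtered per ear, into a left and a right output.

    channels: dict mapping channel name -> list of samples
    filters:  dict mapping channel name -> (left_ear_ir, right_ear_ir)
    """
    length = max(
        len(channels[name]) + max(len(filters[name][0]), len(filters[name][1])) - 1
        for name in channels
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        left_ir, right_ir = filters[name]
        for i, v in enumerate(convolve(samples, left_ir)):
            left[i] += v
        for i, v in enumerate(convolve(samples, right_ir)):
            right[i] += v
    return left, right


# Toy data (assumed): the far-ear path is delayed by one sample and attenuated.
channels = {"L": [1.0, 0.0], "R": [0.0, 1.0]}
filters = {"L": ([1.0], [0.0, 0.5]), "R": ([0.0, 0.5], [1.0])}
binaural_left, binaural_right = binaural_downmix(channels, filters)
```

The cost of this direct approach grows with the number of channels, since every channel needs two filter operations.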
Alternatively, it would be useful, for reasons of both computational efficiency and audio quality, to short-cut the generation of the binaural signal having the left binaural channel and the right binaural channel.
However, the question is how the original HRTF filters can be combined. A further problem arises in the context of an energy-loss-affected upmixing rule, i.e., when the multi-channel decoder input signal includes a downmix signal having, for example, a first downmix channel and a second downmix channel, and further includes spatial parameters that are used for upmixing in a non-energy-conserving way. Such parameters are also known as prediction parameters or CPC parameters. In contrast to channel level difference parameters, they are not calculated to reflect the energy distribution between two channels. Instead, they are calculated to achieve the best possible waveform match, which automatically results in an energy error (e.g. a loss): when the prediction parameters are generated, the energy-conserving properties of the upmix are not considered; what matters is the best possible time-domain or subband-domain waveform match between the reconstructed signal and the original signal.
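The energy loss inherent in waveform-matching prediction can be made concrete with a deliberately reduced example. The sketch below predicts one original channel from a single downmix channel using one least-squares coefficient. This is an assumption made for brevity: the situation described above involves two downmix channels and transmitted CPC parameters. The function name `predict_channel` and the toy signals are hypothetical.

```python
def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def predict_channel(target, downmix):
    """Least-squares prediction of target from downmix with one coefficient.

    The coefficient minimizes the waveform error energy(target - c * downmix);
    nothing in this criterion preserves the energy of the target.
    """
    numerator = sum(t * d for t, d in zip(target, downmix))
    coefficient = numerator / energy(downmix)
    return coefficient, [coefficient * d for d in downmix]


# Toy signals (assumed): the downmix is only partly correlated with the target.
original = [1.0, 0.0, 1.0, 0.0]
downmix = [1.0, 1.0, 1.0, 1.0]

coefficient, predicted = predict_channel(original, downmix)
energy_ratio = energy(predicted) / energy(original)  # 0.5: half the energy is lost
```

By the Cauchy-Schwarz inequality the predicted signal can never have more energy than the original in this setup, and the less correlated the target is with the downmix, the larger the loss becomes.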
If one simply combines HRTF filters linearly based on such transmitted spatial prediction parameters, artifacts result that are especially serious when the prediction of the channels performs poorly. In that situation, even subtle linear dependencies lead to undesired spectral coloring of the binaural output. It has been found that this artifact occurs most frequently when the original channels carry signals that are pairwise uncorrelated and have comparable magnitudes.