Recent developments in audio coding have made it possible to recreate a multi-channel representation of an audio signal from a stereo (or mono) signal and corresponding control data. These methods differ substantially from older matrix-based solutions such as Dolby Pro Logic, since additional control data is transmitted to control the re-creation, also referred to as up-mix, of the surround channels based on the transmitted mono or stereo channels.
Hence, such a parametric multi-channel audio decoder, e.g. MPEG Surround, reconstructs N channels based on M transmitted channels, where N>M, and the additional control data. The additional control data requires a significantly lower data rate than transmitting all N channels, making the coding very efficient while at the same time ensuring compatibility with both M-channel devices and N-channel devices.
These parametric surround coding methods usually comprise a parameterization of the surround signal based on IID (Inter-channel Intensity Difference) or CLD (Channel Level Difference) and ICC (Inter-Channel Coherence). These parameters describe power ratios and correlations between channel pairs in the up-mix process. Further parameters used in the prior art comprise prediction parameters used to predict intermediate or output channels during the up-mix procedure.
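As a minimal sketch of what such parameters express, the following Python fragment computes a level difference (as a power ratio in dB) and a normalized correlation for one channel pair. It is an illustration under simplifying assumptions, not the procedure of any particular codec: the function name `cld_icc`, the full-band real-valued input and the small constant `eps` are hypothetical choices made here, whereas practical coders typically estimate such parameters per time and frequency tile.

```python
import math


def cld_icc(x, y, eps=1e-12):
    """Return (level difference in dB, normalized correlation) for two
    equal-length sequences of real samples.

    Illustrative only: a full-band estimate over the whole input, with
    eps (an assumed constant) guarding against division by zero.
    """
    if len(x) != len(y):
        raise ValueError("channels must have the same number of samples")
    power_x = sum(a * a for a in x)             # energy of first channel
    power_y = sum(b * b for b in y)             # energy of second channel
    cross = sum(a * b for a, b in zip(x, y))    # cross term between channels
    cld_db = 10.0 * math.log10((power_x + eps) / (power_y + eps))
    icc = cross / math.sqrt((power_x + eps) * (power_y + eps))
    return cld_db, icc


if __name__ == "__main__":
    # Second channel is the first one attenuated by a factor of two:
    # a power ratio of 4 (about 6 dB) and a correlation close to 1.
    print(cld_icc([1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5]))
```

A pair of parameters of this kind per channel pair is far more compact than the channel samples themselves, which is the source of the data-rate saving described above.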
Other developments in the reproduction of multi-channel audio content have provided means to obtain a spatial listening impression using stereo headphones. To achieve a spatial listening experience using only the two speakers of the headphones, multi-channel signals are down-mixed to stereo signals using HRTFs (head-related transfer functions), which are intended to take into account the extremely complex transmission characteristics of a human head in order to provide the spatial listening experience.
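The down-mix described above can be sketched as follows, assuming that each HRTF is available as a time-domain impulse response (a list of filter taps) for the left and the right ear. The helper names `convolve` and `binaural_downmix` and the data layout are hypothetical and chosen only for this illustration; direct time-domain convolution is used for clarity rather than efficiency.

```python
def convolve(signal, taps):
    """Direct-form FIR filtering: full linear convolution of two lists."""
    out = [0.0] * max(len(signal) + len(taps) - 1, 0)
    for n, sample in enumerate(signal):
        for k, tap in enumerate(taps):
            out[n + k] += sample * tap
    return out


def binaural_downmix(channels, hrirs):
    """Down-mix N channels to a (left, right) pair.

    channels: list of N sample lists.
    hrirs:    list of N (left_taps, right_taps) pairs, one per channel
              (assumed layout for this sketch).
    """
    if len(channels) != len(hrirs):
        raise ValueError("one impulse-response pair is needed per channel")
    length = max(
        len(ch) + max(len(hl), len(hr)) - 1
        for ch, (hl, hr) in zip(channels, hrirs)
    )
    left = [0.0] * length
    right = [0.0] * length
    for ch, (hl, hr) in zip(channels, hrirs):
        # Every input channel contributes to both ears.
        for i, v in enumerate(convolve(ch, hl)):
            left[i] += v
        for i, v in enumerate(convolve(ch, hr)):
            right[i] += v
    return left, right


if __name__ == "__main__":
    chans = [[1.0, 0.0], [0.0, 1.0]]
    filters = [([1.0, 0.5], [0.25, 0.0]), ([0.0, 0.25], [1.0, 0.5])]
    print(binaural_downmix(chans, filters))
```

Note that every one of the N input channels has to be available as a full signal and is filtered twice, once per ear.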
Another related approach is to use a conventional 2-channel playback environment and to filter the channels of a multi-channel audio signal with appropriate filters to achieve a listening experience close to that of playback with the original number of speakers. The processing of the signals is similar to that in the case of headphone playback and creates an appropriate “spatial stereo down mix” having the desired properties. Contrary to the headphone case, the signal of each speaker directly reaches both ears of a listener, causing undesired “crosstalk effects”. As this has to be taken into account for optimal reproduction quality, the filters used for signal processing are commonly called crosstalk-cancellation filters. Generally, the aim of this technique is to extend the possible range of sound sources beyond the stereo speaker base by cancelling the inherent crosstalk using complex crosstalk-cancellation filters.
Because of the complex filtering, HRTF filters are very long, i.e. they may comprise several hundred filter taps each. For the same reason, it is hardly possible to find a parameterization of the filters that works well enough not to degrade the perceptual quality when used in place of the actual filter.
Thus, on the one hand, bit-saving parametric representations of multi-channel signals do exist that allow for an efficient transport of an encoded multi-channel signal. On the other hand, elegant ways to create a spatial listening experience for a multi-channel signal when using only stereo headphones or stereo speakers are known. However, these require the full number of channels of the multi-channel signal as input for the application of the head-related transfer functions that create the headphone down mix signal. Thus, either the full set of multi-channel signals has to be transmitted or a parametric representation has to be fully reconstructed before applying the head-related transfer functions or the crosstalk-cancellation filters, and thus either the transmission bandwidth or the computational complexity is unacceptably high.
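The computational burden of the filtering stage alone can be illustrated with a rough estimate, assuming direct time-domain filtering in which each filter tap costs one multiply-accumulate operation per output sample. The channel count, tap count and sampling rate used below are hypothetical example values, not figures taken from any specific system, and FFT-based fast convolution would lower the count.

```python
def filter_macs_per_second(num_channels, taps_per_filter, sample_rate_hz):
    """Multiply-accumulate operations per second for direct FIR filtering
    of every channel with one filter per ear (two filters per channel)."""
    filters = num_channels * 2              # left-ear and right-ear filter
    macs_per_sample = filters * taps_per_filter
    return macs_per_sample * sample_rate_hz


if __name__ == "__main__":
    # Assumed example: 5 channels, 512 taps per filter, 48 kHz sampling.
    print(filter_macs_per_second(5, 512, 48000))  # prints 245760000
```

This figure comes in addition to the cost of first reconstructing all N channels from the parametric representation.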