1. Field of the Invention
The present invention relates to multi-channel reconstruction of audio signals based on an available stereo signal and additional control data.
2. Description of Prior Art
Recent developments in audio coding have made it possible to recreate a multi-channel representation of an audio signal based on a stereo (or mono) signal and corresponding control data. These methods differ substantially from older matrix-based solutions such as Dolby Pro Logic, since additional control data is transmitted to control the re-creation, also referred to as up-mix, of the surround channels based on the transmitted mono or stereo channels.
Hence, parametric multi-channel audio decoders reconstruct N channels based on M transmitted channels, where N>M, and the additional control data. The additional control data requires a significantly lower data rate than transmitting the additional N-M channels, making the coding very efficient while at the same time ensuring compatibility with both M channel devices and N channel devices.
These parametric surround coding methods usually comprise a parameterisation of the surround signal based on IID (Inter channel Intensity Difference) and ICC (Inter Channel Coherence). These parameters describe power ratios and correlation between channel pairs in the up-mix process. Further parameters also used in prior art comprise prediction parameters used to predict intermediate or output channels during the up-mix procedure.
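As an illustration of what such parameters measure, the following minimal sketch computes a power ratio and a normalised correlation for one channel pair. The function name `iid_icc`, the dB scaling of the IID, and the use of a single block of samples in place of a per-band time/frequency tile are assumptions made for the example; they are not taken from any particular codec.

```python
import math

def iid_icc(x, y):
    """Return (IID in dB, ICC) for two equal-length sample lists.

    IID is taken here as the power ratio of x to y in dB, and ICC as the
    normalised cross-correlation at lag zero (both are assumed definitions).
    """
    eps = 1e-12  # guards against division by zero for silent input
    px = sum(a * a for a in x)
    py = sum(b * b for b in y)
    cross = sum(a * b for a, b in zip(x, y))
    iid_db = 10.0 * math.log10((px + eps) / (py + eps))
    icc = cross / math.sqrt((px + eps) * (py + eps))
    return iid_db, icc

# Toy data: the second channel is the first one scaled by 0.5,
# so the pair is fully correlated and differs only in level.
left = [1.0, -2.0, 3.0, -4.0]
right = [0.5, -1.0, 1.5, -2.0]
iid_db, icc = iid_icc(left, right)
```

In this toy case the power ratio is 4, so the IID is about 6 dB and the ICC is about 1. Both values depend only on signal powers and the cross term, not on the exact waveform.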
One of the most appealing uses of the prediction-based methods described in the prior art is in a system that re-creates 5.1 channels from two transmitted channels. In this configuration a stereo transmission, which is a downmix of the original 5.1 multichannel signal, is available at the decoder side. In this context it is particularly interesting to extract the center channel from the stereo signal as accurately as possible, since the center channel is usually downmixed to both the left and the right downmix channel. This is done by estimating two prediction coefficients describing the amount of each of the two transmitted channels used to build the center channel. These parameters are estimated for different frequency regions, similarly to the IID and ICC parameters above.
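The estimation of two prediction coefficients can be sketched as a small least-squares problem. The example below assumes a single frequency region represented by a short block of samples, a downmix in which the center is added to both channels with a gain of about 0.707, and hypothetical names (`predict_center`, `l0`, `r0`); it illustrates the principle and is not the estimator of any specific prior art system.

```python
def predict_center(l0, r0, c):
    """Least-squares (c1, c2) such that c is approximated by c1*l0 + c2*r0.

    Solves the 2x2 normal equations with Cramer's rule.
    """
    a11 = sum(x * x for x in l0)
    a22 = sum(y * y for y in r0)
    a12 = sum(x * y for x, y in zip(l0, r0))
    b1 = sum(x * z for x, z in zip(l0, c))
    b2 = sum(y * z for y, z in zip(r0, c))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("downmix channels are linearly dependent")
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (b2 * a11 - b1 * a12) / det
    return c1, c2

# Toy signals for one frequency region (assumed values).
left = [1.0, 0.0, -1.0, 0.0, 0.5, -0.5]
right = [0.0, 1.0, 0.0, -1.0, -0.5, 0.5]
center = [0.3, -0.2, 0.4, 0.1, -0.6, 0.2]

g = 0.7071  # assumed center gain in the downmix
l0 = [l + g * c for l, c in zip(left, center)]
r0 = [r + g * c for r, c in zip(right, center)]

c1, c2 = predict_center(l0, r0, center)
residual = [z - c1 * x - c2 * y for x, y, z in zip(l0, r0, center)]
```

At the least-squares solution the residual is orthogonal to both downmix channels, which is what ties the coefficients to the exact waveforms of `l0` and `r0`.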
However, since the prediction parameters do not describe a power ratio of two signals, but are based on wave-form matching in a least-squares error sense, the method becomes inherently sensitive to any modification of the stereo waveform after the calculation of the prediction parameters.
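This sensitivity can be illustrated with a small numerical sketch. It assumes three sinusoids standing in for left, center and right content, a center downmix gain of about 0.707, and a circular shift of four samples as the waveform modification; all names and values are hypothetical. The shift leaves the power ratio of the two downmix channels unchanged, while the error of the previously estimated prediction grows.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit(l0, r0, c):
    """Least-squares (c1, c2) minimising sum((c - c1*l0 - c2*r0)**2)."""
    a11, a22, a12 = dot(l0, l0), dot(r0, r0), dot(l0, r0)
    b1, b2 = dot(l0, c), dot(r0, c)
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (b2 * a11 - b1 * a12) / det

def error_energy(l0, r0, c, c1, c2):
    return sum((z - c1 * x - c2 * y) ** 2 for x, y, z in zip(l0, r0, c))

def tone(k, n_samples):
    """Sinusoid with k full periods over n_samples samples."""
    return [math.sin(2 * math.pi * k * n / n_samples) for n in range(n_samples)]

N = 64
left, right, center = tone(5, N), tone(7, N), tone(3, N)
g = math.sqrt(0.5)  # assumed center gain in the downmix
l0 = [l + g * c for l, c in zip(left, center)]
r0 = [r + g * c for r, c in zip(right, center)]

# Coefficients are estimated on the unmodified downmix.
c1, c2 = fit(l0, r0, center)
err_orig = error_energy(l0, r0, center, c1, c2)

# Modify the stereo waveform afterwards: circular shift by d samples.
d = 4
l0_mod = l0[-d:] + l0[:-d]
r0_mod = r0[-d:] + r0[:-d]
err_mod = error_energy(l0_mod, r0_mod, center, c1, c2)

# The channel power ratio is unaffected by the shift.
ratio_orig = dot(l0, l0) / dot(r0, r0)
ratio_mod = dot(l0_mod, l0_mod) / dot(r0_mod, r0_mod)
```

Here `err_mod` exceeds `err_orig` although `ratio_orig` and `ratio_mod` agree, which is the contrast between wave-form matching parameters and power-based parameters described above.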
Further developments in audio coding in recent years have introduced High Frequency Reconstruction methods as a very useful tool in audio codecs at low bitrates. One example is SBR (Spectral Band Replication) [WO 98/57436], which is used in MPEG standardized codecs such as MPEG-4 High Efficiency AAC. Common to these methods is that they re-create the high frequencies on the decoder side from a narrow-band signal coded by the underlying core-codec and a small amount of additional guidance information. Similar to the case of the parametric reconstruction of multi-channel signals based on one or two channels, the amount of control data required to re-create the missing signal components (in the case of SBR, the high frequencies) is significantly smaller than the amount of data that would be required to code the entire signal with a wave-form codec.
It should be understood, however, that the re-created highband signal is perceptually equal to the original highband signal, while the actual wave-form differs significantly. Furthermore, wave-form coders coding stereo signals at low bitrates commonly use stereo pre-processing, which means that the side signal of the mid/side representation of the stereo signal is limited.
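A limitation of the side signal can be sketched as follows. The fixed broadband `side_gain` and the function name `limit_side` are assumptions for illustration; an actual encoder would typically choose the limitation adaptively, and the source does not specify how.

```python
def limit_side(left, right, side_gain=0.5):
    """Attenuate the side signal of a stereo pair and return new (left, right).

    side_gain=1.0 leaves the signal unchanged; side_gain=0.0 yields mono.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)    # mid/side analysis
        side = 0.5 * (l - r)
        side *= side_gain      # limit the side signal
        out_left.append(mid + side)   # back to left/right
        out_right.append(mid - side)
    return out_left, out_right

# Toy input: fully left-panned sample, then fully right-panned sample.
narrow_left, narrow_right = limit_side([1.0, 0.0], [0.0, 1.0], side_gain=0.5)
```

The output waveforms differ from the input in both channels, so parameters that were matched to the original stereo waveform no longer fit the decoded one exactly.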
When a multi-channel representation is desired based on a stereo codec signal using MPEG-4 High Efficiency AAC or any other codec utilising high frequency reconstruction techniques, these and other aspects of the codec used to code the down-mixed stereo signal must be considered.
Even further, it is common that for a recording available as a multi-channel audio signal there is a dedicated stereo mix available, that is not an automated down-mix version of the multi-channel signal. This is commonly referred to as “artistic down-mix”. This down-mix cannot be expressed as a linear combination of the multi-channel signals.