Recent developments in audio coding have made it possible to recreate a multi-channel representation of an audio signal based on a stereo (or mono) signal and corresponding control data. These methods differ substantially from older matrix-based solutions such as Dolby Pro Logic, since additional control data is transmitted to control the re-creation, also referred to as upmix, of the surround channels based on the transmitted mono or stereo channels.
Hence, such a parametric multi-channel audio decoder, e.g. MPEG Surround, reconstructs N channels based on M transmitted channels, where N>M, and the additional control data. The additional control data requires a significantly lower data rate than transmitting all N channels, making the coding very efficient while at the same time ensuring compatibility with both M-channel devices and N-channel devices.
These parametric surround coding methods usually comprise a parameterization of the surround signal based on IID (Inter-channel Intensity Difference) and ICC (Inter-Channel Coherence). These parameters describe power ratios and correlations between channel pairs in the upmix process. Further parameters used in the prior art comprise prediction parameters used to predict intermediate or output channels during the upmix procedure.
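As an illustration of what such parameters express, the following minimal sketch computes a power ratio and a normalized correlation for one channel pair. It assumes real-valued time-domain samples, a zero-lag correlation, and a single analysis block; practical codecs evaluate these quantities per time/frequency tile and may define the coherence differently. The function names `iid_db` and `icc` are chosen here for illustration only.

```python
import math


def iid_db(x, y):
    """Intensity difference in dB: power of channel x relative to channel y."""
    power_x = sum(s * s for s in x)
    power_y = sum(s * s for s in y)
    return 10.0 * math.log10(power_x / power_y)


def icc(x, y):
    """Normalized zero-lag cross-correlation of the two channels (range -1..1)."""
    power_x = sum(s * s for s in x)
    power_y = sum(s * s for s in y)
    cross = sum(a * b for a, b in zip(x, y))
    return cross / math.sqrt(power_x * power_y)


# Usage: y is a scaled copy of x (fully coherent), z is orthogonal to x.
x = [1.0, 0.0, -1.0, 0.0]
y = [2.0, 0.0, -2.0, 0.0]
z = [0.0, 1.0, 0.0, -1.0]
print(iid_db(x, y), icc(x, y), icc(x, z))
```

In this sketch the power ratio captures only the level relation of the pair, while the correlation value captures how similar the two waveforms are independent of their levels.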
Two well-known examples of such multi-channel coding are BCC (Binaural Cue Coding) and MPEG Surround. In BCC encoding, a number of audio input channels are converted to a spectral representation using a DFT (Discrete Fourier Transform) based transform with overlapping windows. The resulting uniform spectrum is then divided into non-overlapping partitions. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB). Then, spatial parameters called ICLD (Inter-Channel Level Difference) and ICTD (Inter-Channel Time Difference) are estimated for each partition. The ICLD parameter describes the level difference between two channels, and the ICTD parameter describes the time difference (phase shift) between the signals of two different channels. The level differences and the time differences are given for each channel with respect to a common reference channel. After their derivation, the parameters are quantized and encoded for transmission.
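The per-partition level estimation described above can be sketched as follows. This is a simplified illustration, not the BCC algorithm itself: it uses a naive DFT of a single block without windowing or overlap, omits the ICTD estimation, and takes the partition boundaries (`edges`) as a hypothetical input rather than deriving them from the ERB scale. The small constant `eps` is an assumption added only to avoid division by zero in empty partitions.

```python
import cmath
import math


def dft(x):
    """Naive DFT of a real block, returning bins 0..N/2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def partition_power(spectrum, edges):
    """Sum the bin powers inside each non-overlapping partition [lo, hi)."""
    return [
        sum(abs(c) ** 2 for c in spectrum[lo:hi])
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


def icld_db(channel, reference, edges, eps=1e-12):
    """Level difference in dB per partition, relative to the reference channel."""
    p_ch = partition_power(dft(channel), edges)
    p_ref = partition_power(dft(reference), edges)
    return [10.0 * math.log10((a + eps) / (b + eps)) for a, b in zip(p_ch, p_ref)]


# Usage: two tones, one per partition, with different gains in the channel.
n = 16
low = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
high = [math.cos(2 * math.pi * 6 * t / n) for t in range(n)]
reference = [a + b for a, b in zip(low, high)]
channel = [0.5 * a + 2.0 * b for a, b in zip(low, high)]
print(icld_db(channel, reference, edges=[0, 4, 9]))
```

The channel is quieter than the reference in the lower partition and louder in the upper one, so the two resulting values have opposite signs.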
In BCC coding, the individual parameters are estimated with respect to one single reference channel. In other parametric surround coding systems, e.g. in MPEG Surround, a tree-structured parameterization is used. This means that the parameters are no longer estimated with respect to one single common reference channel but with respect to different reference channels, which may even be combinations of channels of the original multi-channel signal. For example, for a 5.1 channel signal, parameters may be estimated between a combination of the front channels and a combination of the back channels.
Of course, backward compatibility with already established audio standards is also highly desirable for parametric coding schemes. For example, given a mono downmix signal, it is desirable to also provide the possibility of creating a stereo playback signal with high fidelity. This means that a monophonic downmix signal has to be upmixed into a stereo signal, making use of the additionally transmitted parameters in the best possible way.
One common problem in multi-channel coding is energy preservation in the upmix, as the human perception of the spatial position of a sound source is dominated by the loudness of the signal, i.e. by the energy contained within the signal. Therefore, utmost care must be taken in the reproduction of the signal to attribute the right loudness to each reconstructed channel, so as to avoid introducing artifacts that strongly decrease the perceptual quality of the reconstructed signal. Since the amplitudes of the signals are commonly summed during the downmix, interference between the channels may occur; the extent of this interference is described by the correlation or coherence parameter.
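The effect of interference on the downmix energy can be illustrated with a short sketch. It assumes real-valued signals, a downmix formed as the plain sample-wise sum of two channels, and a coherence defined as the normalized zero-lag cross-correlation; under these assumptions the energy of the sum equals E1 + E2 + 2·c·sqrt(E1·E2). The function names are illustrative only.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(s * s for s in x)


def coherence(x, y):
    """Normalized zero-lag cross-correlation (assumed definition)."""
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(energy(x) * energy(y))


def downmix_energy(e1, e2, c):
    """Energy of the sum of two channels with energies e1, e2 and coherence c."""
    return e1 + e2 + 2.0 * c * math.sqrt(e1 * e2)


# Usage: the same channel x combined with in-phase, anti-phase and
# orthogonal partners gives three different downmix energies.
x = [1.0, -1.0, 1.0, -1.0]
partners = {
    "in-phase": [1.0, -1.0, 1.0, -1.0],
    "anti-phase": [-1.0, 1.0, -1.0, 1.0],
    "orthogonal": [1.0, 1.0, -1.0, -1.0],
}
for name, y in partners.items():
    measured = energy([a + b for a, b in zip(x, y)])
    predicted = downmix_energy(energy(x), energy(y), coherence(x, y))
    print(name, measured, predicted)
```

Although all partner signals carry the same energy, the energy of the summed signal ranges from zero (full cancellation) to twice the sum of the individual energies, which is why the level parameters alone do not suffice to preserve energy.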
When it comes to the reconstruction of a reduced number of channels (a number of channels smaller than the original number of channels of the multi-channel signal), schemes like BCC are simple to handle, since every parameter is transmitted with respect to the same single reference channel. Therefore, with knowledge of the reference channel, the most relevant level information (an absolute energy measure) can easily be derived for every channel needed for the upmix. Thus, a reduced number of channels can be reconstructed without the need to reconstruct the full multi-channel signal first. The computation of the channel energies of the multi-channel signal is therefore easier in BCC, since it uses single variables rather than products of variables, but this is only a first step. When it comes to deriving energies and correlations of a reduced number of channels that should come as close as possible to partial downmixes of the original multi-channel signal, the level of difficulty in MPEG Surround and BCC is comparable.
In contrast thereto, a tree-based structure such as that of MPEG Surround uses a parameterization in which the relevant information for each individual channel is not contained in a single parameter. Therefore, in the prior art, reconstructing a reduced number of channels requires the reconstruction of the full multi-channel signal followed by a downmix into the reduced number of channels, in order not to violate the energy preservation requirement. This has the obvious disadvantage of extremely high computational complexity.
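Why a tree-structured parameterization spreads the level information of one channel over several parameters can be seen in the following sketch. The two-level tree used here (total, then front/back, then left/right) is a hypothetical example and not the actual MPEG Surround tree; the sketch further assumes that each node splits its energy according to a power ratio given in dB and that coherence between the branches is ignored.

```python
def split(energy, iid_db):
    """Split an energy into two parts whose power ratio is iid_db (in dB)."""
    ratio = 10.0 ** (iid_db / 10.0)
    first = energy * ratio / (1.0 + ratio)
    second = energy / (1.0 + ratio)
    return first, second


def tree_energies(total, iid_front_back, iid_front_lr, iid_back_lr):
    """Channel energies of a hypothetical two-level tree.

    Each leaf energy is the total multiplied by one split factor per
    tree level, i.e. a product of terms from different parameters.
    """
    front, back = split(total, iid_front_back)
    front_left, front_right = split(front, iid_front_lr)
    back_left, back_right = split(back, iid_back_lr)
    return {
        "FL": front_left,
        "FR": front_right,
        "BL": back_left,
        "BR": back_right,
    }


# Usage: all level differences at 0 dB distribute the energy evenly.
print(tree_energies(1.0, 0.0, 0.0, 0.0))
```

In this sketch, obtaining the energy of a single output channel requires every parameter on the path from the root to that channel, whereas a single-reference scheme would need only the reference energy and one level difference per channel.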