1. Field of the Invention
The present invention relates to coding of multi-channel audio signals using spatial parameters and in particular to new and improved concepts for generating and using de-correlated signals.
2. Description of the Related Art
Multi-channel audio reproduction techniques have recently become increasingly important. With a view to the efficient transmission of multi-channel audio signals having five or more separate audio channels, several ways of compressing a stereo or multi-channel signal have been developed. Recent approaches for the parametric coding of multi-channel audio signals (parametric stereo (PS), “Binaural Cue Coding” (BCC), etc.) represent a multi-channel audio signal by means of a down-mix signal (which may be monophonic or comprise several channels) and parametric side information, also referred to as “spatial cues”, characterizing its perceived spatial sound stage.
A multi-channel encoding device generally receives at least two channels as input, and outputs one or more carrier channels and parametric data. The parametric data is derived such that, in a decoder, an approximation of the original multi-channel signal can be calculated. Normally, the carrier channel (or channels) will include sub-band samples, spectral coefficients, time domain samples, etc., which provide a comparatively fine representation of the underlying signal, while the parametric data does not include such samples or spectral coefficients but instead includes control parameters for controlling a certain reconstruction algorithm. Such a reconstruction could comprise weighting by multiplication, time shifting, frequency shifting, phase shifting, etc. Thus, the parametric data includes only a comparatively coarse representation of the signal or the associated channel.
The binaural cue coding (BCC) technique is described in a number of publications, such as “Binaural Cue Coding applied to Stereo and Multi-Channel Audio Compression”, C. Faller, F. Baumgarte, AES convention paper 5574, May 2002, Munich, and the two ICASSP publications “Estimation of auditory spatial cues for binaural cue coding” and “Binaural cue coding: a novel and efficient representation of spatial audio”, both authored by C. Faller and F. Baumgarte, Orlando, Fla., May 2002.
In BCC encoding, a number of audio input channels are converted to a spectral representation using a DFT (Discrete Fourier Transform) based transform with overlapping windows. The resulting uniform spectrum is then divided into non-overlapping partitions. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB). Then, spatial parameters called ICLD (Inter-Channel Level Difference) and ICTD (Inter-Channel Time Difference) are estimated for each partition. The ICLD parameter describes a level difference between two channels and the ICTD parameter describes the time difference (phase shift) between two signals of different channels. The level differences and the time differences are normally given for each channel with respect to a reference channel. After the derivation of these parameters, the parameters are quantized and finally encoded for transmission.
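The level-difference estimation described above can be illustrated with a minimal sketch. It is not the BCC reference implementation: it assumes a single frame, omits the overlapping windows, and uses a hypothetical list of partition edges (`edges`) in place of ERB-proportional bandwidths. The function names `dft` and `icld_per_partition` are illustrative only.

```python
import cmath
import math
import random


def dft(frame):
    """Naive DFT of a real-valued frame; returns bins 0..N/2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def icld_per_partition(ref, other, edges):
    """ICLD in dB of `other` relative to `ref`, one value per partition.

    Partition i covers DFT bins edges[i] .. edges[i + 1] - 1, so the
    partitions are non-overlapping. Assumes every partition of `ref`
    carries non-zero energy.
    """
    spec_ref = dft(ref)
    spec_other = dft(other)
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_ref = sum(abs(x) ** 2 for x in spec_ref[lo:hi])
        p_other = sum(abs(x) ** 2 for x in spec_other[lo:hi])
        result.append(10.0 * math.log10(p_other / p_ref))
    return result


# Reference channel: one frame of noise. Second channel: same signal at half amplitude.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(64)]
other = [0.5 * s for s in ref]

# Hypothetical partition edges that widen with frequency (bins 0..32).
edges = [0, 2, 4, 8, 16, 33]
icld = icld_per_partition(ref, other, edges)
```

Because the second channel is the reference channel scaled by 0.5, every partition yields the same level difference, 10·log10(0.25), roughly −6 dB. A real encoder would additionally estimate the ICTD per partition and quantize both parameters before transmission.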
Although ICLD and ICTD parameters represent the most important sound source localization parameters, a spatial representation using these parameters can be enhanced by introducing additional parameters.
A related technique, called “parametric stereo”, describes the parametric coding of a two-channel stereo signal based on a transmitted mono signal plus parameter side information. In this context, three types of spatial parameters, referred to as inter-channel intensity differences (IIDs), inter-channel phase differences (IPDs), and inter-channel coherence (ICC), are introduced. The extension of the spatial parameter set with a coherence parameter (correlation parameter) enables a parametrization of the perceived spatial “diffuseness” or spatial “compactness” of the sound stage. Parametric stereo is described in more detail in “Parametric Coding of stereo audio”, J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers (2005), EURASIP J. Applied Signal Proc. 9, pages 1305-1322, in “High-Quality Parametric Spatial Audio Coding at Low Bitrates”, J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, AES 116th Convention, Preprint 6072, Berlin, May 2004, and in “Low Complexity Parametric Stereo Coding”, E. Schuijers, J. Breebaart, H. Purnhagen, J. Engdegard, AES 116th Convention, Preprint 6073, Berlin, May 2004.
The present invention relates to parametric coding of the spatial properties of an audio signal. Parametric multi-channel audio decoders reconstruct N channels based on M transmitted channels, where N>M, and additional control data. The additional control data requires a significantly lower data rate than transmitting all N channels, making the coding very efficient while at the same time ensuring compatibility with both M channel devices and N channel devices. Typical parameters used for describing spatial properties are inter-channel intensity differences (IID), inter-channel time differences (ITD), and inter-channel coherences (ICC). In order to reconstruct the spatial properties based on these parameters, a method is required that can reconstruct the correct level of correlation between two or more channels, according to the ICC parameters. This is accomplished by means of a de-correlation method, i.e. a method that derives decorrelated signals from transmitted signals and combines the decorrelated signals with the transmitted signals within some upmixing process. Methods for upmixing based on a transmitted signal, a decorrelated signal, and IID/ICC parameters are described in the references given above.
Several methods are available for the creation of decorrelated signals. Preferably, the decorrelated signals have temporal and spectral envelopes similar or equal to those of the original input signals. Ideally, a linear time invariant (LTI) function with an all-pass frequency response is desired. One obvious method for achieving this is to use a constant delay. However, using a delay, or any other LTI all-pass function, will result in a non-all-pass response once the processed signal is added to the non-processed signal. In the case of a delay, the result will be a typical comb-filter. The comb-filter often gives an undesirable “metallic” sound that, even if the stereo widening effect can be efficient, considerably reduces the naturalness of the original. The constant delay method and other prior art methods suffer from the inability to create more than one de-correlated signal while preserving quality and mutual de-correlation.
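The comb-filter effect can be checked numerically. The following sketch evaluates the frequency response of a pure delay of D samples, exp(−jωD), and of the sum of the delayed and the non-processed signal, 1 + exp(−jωD). The delay of 8 samples and the function names are arbitrary choices for illustration.

```python
import cmath
import math


def delay_response(omega, delay):
    """Frequency response of a pure delay of `delay` samples (all-pass)."""
    return cmath.exp(-1j * omega * delay)


def sum_response(omega, delay):
    """Response when the delayed signal is added to the non-processed signal."""
    return 1.0 + delay_response(omega, delay)


delay = 8  # arbitrary illustrative delay in samples

# The delay alone has unit magnitude at every frequency.
allpass_magnitudes = [
    abs(delay_response(math.pi * k / 16, delay)) for k in range(17)
]

# After addition the magnitude is 2*|cos(omega*delay/2)|: a comb filter.
peak = abs(sum_response(0.0, delay))               # constructive: 2
notch = abs(sum_response(math.pi / delay, delay))  # destructive: about 0
```

The delay by itself leaves all magnitudes at 1, but after addition the response swings between 2 and nearly 0, with notches at odd multiples of π/D. These regularly spaced notches are what is heard as the “metallic” coloration.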
The perceptual quality of a reconstructed multi-channel audio signal therefore depends strongly on an efficient concept that allows for the generation of a de-correlated signal from a transmitted signal, wherein ideally the de-correlated signal is orthogonal to the signal from which it is derived, i.e. perfectly de-correlated. Even if a perfectly de-correlated signal is available, a multi-channel upmix in which the individual channels are mutually de-correlated cannot be derived using a single de-correlated signal. During the upmixing, a reconstructed audio channel is generated by combining a transmitted signal with the generated de-correlated signal, where the extent to which the de-correlated signal is mixed with the transmitted signal is typically controlled by a transmitted spatial audio parameter (ICC). Mutually perfectly de-correlated signals therefore cannot be achieved, since every reconstructed audio channel contains a fraction of the same de-correlated signal.
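This limitation can be illustrated with a small numerical sketch under simplifying assumptions. The transmitted signal is a sinusoid, and a 90-degree shifted copy stands in for an ideal, perfectly de-correlated signal. Each output channel is a weighted mix of the two, with the mixing weights expressed as a hypothetical angle that plays the role of the ICC-controlled mixing ratio. None of these names or values come from the references above.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation at lag zero."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


n = 256
# Transmitted signal and an idealized de-correlated signal of equal energy
# (orthogonal over the frame because it spans an integer number of periods).
transmitted = [math.sin(2.0 * math.pi * 5 * t / n) for t in range(n)]
decorrelated = [math.cos(2.0 * math.pi * 5 * t / n) for t in range(n)]


def upmix(angle_deg):
    """Mix the two signals; the angle sets the share of de-correlated signal."""
    a = math.radians(angle_deg)
    return [
        math.cos(a) * m + math.sin(a) * d
        for m, d in zip(transmitted, decorrelated)
    ]


# Three output channels built from the same pair of signals.
channels = [upmix(0.0), upmix(90.0), upmix(45.0)]
pairs = {
    (i, j): normalized_correlation(channels[i], channels[j])
    for i in range(3)
    for j in range(i + 1, 3)
}
```

The first two channels are mutually de-correlated, but the third has a correlation of about 0.707 with each of them. All channels lie in the two-dimensional space spanned by the transmitted signal and the single de-correlated signal, so no choice of mixing weights can make three or more of them mutually orthogonal.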