The present invention relates to parametric coding of the stereo image of an audio signal. Typical parameters used to describe stereo image properties are the inter-channel intensity difference (IID), the inter-channel time difference (ITD) and the inter-channel coherence (IC). To reconstruct the stereo image from these parameters, a method is required that can restore the correct level of correlation between the two channels, as specified by the IC parameter. This is accomplished by a decorrelation method.
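To make the three parameters concrete, the following is a minimal broadband sketch of how IID, ITD and IC might be estimated from a pair of time-domain channels. It assumes that IID is the channel energy ratio in dB, that ITD is the lag (in samples) maximizing the normalized cross-correlation within a hypothetical search range `max_lag`, and that IC is the value of that maximum. Practical parametric coders typically estimate these quantities per frequency band and per frame, which this sketch omits; the function name `stereo_parameters` is illustrative only.

```python
import numpy as np


def stereo_parameters(left, right, max_lag):
    """Estimate broadband IID (dB), ITD (samples) and IC for two channels.

    A positive ITD means the right channel lags the left channel.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    eps = 1e-12  # guards against division by zero for silent input

    e_left = np.dot(left, left)
    e_right = np.dot(right, right)
    iid_db = 10.0 * np.log10((e_left + eps) / (e_right + eps))

    norm = np.sqrt(e_left * e_right) + eps
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation sum_n left[n] * right[n + lag] over the overlapping part
        if lag >= 0:
            corr = np.dot(left[:n - lag], right[lag:])
        else:
            corr = np.dot(left[-lag:], right[:n + lag])
        corr /= norm
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    return iid_db, best_lag, best_corr
```

For example, if `right` is `left` attenuated by a factor of 0.5 and delayed by 3 samples, the sketch reports an IID of roughly 6 dB, an ITD of 3 samples and an IC close to 1.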
Several methods are available for creating decorrelated signals. Ideally, a linear time-invariant (LTI) function with an all-pass frequency response is desired. An obvious way to achieve this is to use a constant delay. However, a delay, or any other LTI all-pass function, results in a response that is no longer all-pass once the unprocessed signal is added. In the case of a delay, the result is a typical comb filter. The comb filter often gives an undesirable “metallic” sound that, even if the stereo-widening effect is efficient, considerably reduces the naturalness of the original.
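The comb-filter effect can be checked numerically. The sketch below assumes the simplest case, in which the output is the unprocessed signal plus a copy delayed by `delay` samples and scaled by a hypothetical gain `g`. The delayed branch alone has constant magnitude `g` at every frequency, but the sum has magnitude |1 + g exp(-jωD)|, which for g = 1 has notches at odd multiples of fs/(2D).

```python
import numpy as np


def delay_and_sum_magnitude(freqs_hz, fs, delay, g=1.0):
    """Return |H(f)| for y[n] = x[n] + g * x[n - delay] (a feed-forward comb filter)."""
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    delayed = g * np.exp(-1j * omega * delay)  # delay branch alone: |delayed| == g for all f
    return np.abs(1.0 + delayed)


fs = 48000
delay = 48  # 1 ms at 48 kHz
freqs = np.array([500.0, 1000.0, 1500.0, 2000.0])
print(delay_and_sum_magnitude(freqs, fs, delay))
```

With a 1 ms delay the magnitude drops to (numerically) zero at 500 Hz, 1500 Hz, ... and peaks at 2 at 1000 Hz, 2000 Hz, ...; this regularly spaced notch pattern is the source of the "metallic" coloration.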
Frequency-domain methods that generate a decorrelated signal by adding a random sequence to the IID values along the frequency axis, using different sequences for the different audio channels, are also known from the prior art. One problem with frequency-domain decorrelation by random-sequence modification is the introduction of pre-echoes. Subjective tests have shown that, for non-stationary signals, pre-echoes are far more annoying than post-echoes, which is also well supported by established psychoacoustic principles. This problem could be reduced by dynamically adapting the transform size to the transient content of the signal. However, switching transform sizes is always a hard (i.e., binary) decision that affects the full signal bandwidth and can be difficult to accomplish robustly.
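The origin of the pre-echoes can be illustrated with a single transform block. The sketch below assumes, purely for illustration, a block that is silent until a transient at sample 200, a 256-point real FFT, and random per-bin level offsets drawn uniformly from ±`max_db` dB (the distribution, range and function name are hypothetical, not taken from the prior art described). Real per-bin gains amount to a circular convolution with a symmetric (zero-phase) impulse response, so after the inverse transform part of the transient's energy appears in the silent samples before it.

```python
import numpy as np


def random_level_modification(block, max_db, rng):
    """Apply independent random gains (in dB) to every frequency bin of one block."""
    spectrum = np.fft.rfft(block)
    offsets_db = rng.uniform(-max_db, max_db, size=spectrum.shape)
    gains = 10.0 ** (offsets_db / 20.0)
    # Real-valued gains leave the phase untouched (zero-phase filtering of the block).
    return np.fft.irfft(spectrum * gains, n=len(block))


rng = np.random.default_rng(1)
block = np.zeros(256)
block[200] = 1.0  # transient preceded by silence
out = random_level_modification(block, max_db=6.0, rng=rng)
print("energy before transient, original:", np.sum(block[:200] ** 2))
print("energy before transient, modified:", np.sum(out[:200] ** 2))
```

Using a different generator (or seed) for each channel would mimic the use of different sequences for the two channels; the pre-echo appears in each channel regardless.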
United States patent application publication US 2003/0219130 A1 discloses coherence-based audio coding and synthesis. In particular, an auditory scene is synthesized from a mono audio signal by modifying, for each critical band, an auditory scene parameter such as an inter-aural level difference (ILD) and/or an inter-aural time difference (ITD) for each subband within the critical band, where the modification is based on an average estimated coherence for the critical band. The coherence-based modification produces auditory scenes having object widths that more accurately match the widths of the objects in the original input auditory scene. The stereo parameters are the well-known binaural cue coding (BCC) parameters. When two different decorrelated output channels are generated, the frequency coefficients obtained by a discrete Fourier transform (DFT) that fall within a single critical band are grouped together. Based on the inter-channel coherence measure, weighting factors are multiplied by a pseudo-random sequence that is preferably chosen such that its variance is approximately constant for all critical bands and its average is “0” within each critical band. The same sequence is applied to the spectral coefficients of every frame.
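One possible reading of the per-band pseudo-random sequence described in that publication can be sketched as follows. The sketch assumes the DFT bins are already partitioned by hypothetical band edges, draws a Gaussian sequence per band (the distribution is an assumption), forces it to zero mean and unit variance within each band, and scales it by a per-band weighting factor. The coherence-to-weight mapping `max_offset * (1 - coherence)` is a hypothetical placeholder, not the weighting actually used in the cited publication. The sequence is generated once and reused for every frame.

```python
import numpy as np


def band_random_sequence(band_edges, weights, rng):
    """Zero-mean, unit-variance random values per band, scaled by a per-band weight.

    Bins band_edges[b] .. band_edges[b + 1] - 1 belong to band b;
    every band must contain at least two bins.
    """
    seq = np.zeros(band_edges[-1])
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        r = rng.standard_normal(hi - lo)
        r -= r.mean()  # average is 0 within the band
        r /= r.std()   # variance is the same (1) for every band before weighting
        seq[lo:hi] = weights[b] * r
    return seq


def coherence_to_weight(coherence, max_offset):
    """Hypothetical mapping: lower coherence gives a larger random modification."""
    return max_offset * (1.0 - np.asarray(coherence, dtype=float))


band_edges = [0, 4, 8, 16, 32]    # hypothetical critical-band partition of DFT bins
coherence = [0.9, 0.7, 0.5, 0.3]  # hypothetical per-band coherence estimates
rng = np.random.default_rng(2)
sequence = band_random_sequence(band_edges, coherence_to_weight(coherence, 6.0), rng)
# The same `sequence` would then be reused for the spectral coefficients of every frame.
```

Normalizing the raw sequence before weighting keeps the two requirements (zero mean per band, equal variance across bands) independent of the coherence-dependent scaling.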