Within the field of audio coding it is generally desired to encode an audio signal in order to reduce the bit rate without unduly compromising the perceptual quality of the audio signal. The reduced bit rate is advantageous for limiting the bandwidth when communicating the audio signal or the amount of storage required for storing the audio signal.
Parametric descriptions of audio signals have gained interest during the last years, especially in the field of audio coding. It has been shown that transmitting (quantized) parameters which describe audio signals require only a limited transmission capacity to enable to synthesize perceptually substantially equal audio signals at the receiving end.
US2003/0026441 discloses the synthesizing of an auditory scene by applying two or more different sets of one or more spatial parameters (e.g. an inter-ear level difference ILD, or an inter-ear time difference ITD) to two or more different frequency bands of a combined audio signal, wherein each different frequency band is treated as if it corresponds to a single audio source in the auditory scene. In one embodiment, the combined audio signal corresponds to the combination of the left and right audio signals of a binaural signal corresponding to an input auditory scene. The different sets of spatial parameters are applied to reconstruct the input auditory scene. The transmission bandwidth requirements are reduced by reducing to one the number of different audio signals that need to be transmitted to a receiver configured to synthesize/reconstruct the auditory scene.
In the transmitter, a TF transform is applied to corresponding parts of each of the left and right audio signals of the input binaural signal to convert the signals to the frequency domain. An auditory scene analyzer processes the converted left and right audio signals in the frequency domain to generate a set of auditory scene parameters for each one of a plurality of different frequency bands in those converted signals. For each corresponding pair of frequency bands, the analyzer compares the converted left and right audio signals to generate one or more spatial parameters. In particular, for each frequency band, the cross-correlation function between the converted left and right audio signals is estimated. The maximum value of the cross-correlation indicates how much the two signals are correlated. The location in time of the maximum of the cross-correlation corresponds to the ITD. The ILD can be obtained by computing the level difference of the power values of the left and right audio signals.