Certain recently introduced limited bit rate coding techniques analyze an input multi-channel signal to derive a downmix composite signal (a signal containing fewer channels than the input signal) and side information containing a parametric model of the original sound field. The side information and composite signal are transmitted to a decoder, which applies the parametric model to the composite signal in order to recreate an approximation of the original sound field. The primary goal of such “spatial coding” systems is to recreate a multi-channel sound field with a very limited amount of data, which in turn constrains the parametric model used to simulate the original sound field. Details of such spatial coding systems are contained in various documents, including those cited below under the heading “Incorporation by Reference.”
Such spatial coding systems typically model the original sound field with parameters such as interchannel amplitude differences, interchannel time or phase differences, and interchannel cross-correlation. Typically, such parameters are estimated in multiple spectral bands for each channel being coded and are re-estimated dynamically over time.
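As an illustration of the three parameter types named above, the following minimal sketch estimates them for one spectral band of a two-channel signal. The function name `band_parameters`, the decibel level measure, and the normalized cross-spectrum used for the correlation are assumptions chosen for illustration; they follow a common textbook formulation and are not taken from any of the cited documents.

```python
import cmath
import math

def band_parameters(x1, x2):
    """Estimate spatial parameters for one band (hypothetical helper).

    x1, x2: lists of complex DFT bin values of two channels, same band.
    Both channels are assumed to have nonzero energy in the band.
    """
    e1 = sum(abs(a) ** 2 for a in x1)   # band energy, channel 1
    e2 = sum(abs(b) ** 2 for b in x2)   # band energy, channel 2
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2))  # cross-spectrum

    level_db = 10.0 * math.log10(e1 / e2)        # interchannel amplitude difference
    phase = cmath.phase(cross)                   # interchannel phase difference (radians)
    coherence = abs(cross) / math.sqrt(e1 * e2)  # interchannel correlation, 0..1
    return level_db, phase, coherence
```

In a complete coder these values would be computed for every band and recomputed block by block, which is the dynamic estimation over time described above.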
A typical prior art spatial coding system is shown in FIGS. 1a (encoder) and 1b (decoder). Multiple input signals are converted to the frequency domain using an overlapped DFT (discrete Fourier transform). The DFT spectrum is then subdivided into bands approximating the ear's critical bands. An estimate of the interchannel amplitude differences, interchannel time or phase differences, and interchannel correlation is computed for each of the bands. These estimates are utilized to downmix the original input signals into a monophonic composite signal. The composite signal and the estimated spatial parameters are sent to a decoder, where the composite signal is converted to the frequency domain using the same overlapped DFT and critical band spacing. The spatial parameters are then applied to their corresponding bands to create an approximation of the original multichannel signal.
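The subdivision of a DFT spectrum into bands approximating critical bands can be sketched as follows. This is only one possible approach: it assumes the Glasberg and Moore ERB-rate scale as a stand-in for critical bands and spaces the bands uniformly on that scale. The function names, the band count, and the choice of scale are assumptions made for illustration, not details of the system shown in FIGS. 1a and 1b.

```python
import math

def erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore), used here as an assumed
    approximation of the ear's critical-band scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def band_of_bins(n_fft, sample_rate, n_bands):
    """Map each DFT bin 0..n_fft/2 to a band index (hypothetical helper).

    Bands are uniform on the ERB-rate scale, so they are narrow at low
    frequencies and wide at high frequencies.
    """
    top = erb_rate(sample_rate / 2.0)      # scale value at the Nyquist frequency
    bands = []
    for k in range(n_fft // 2 + 1):
        f = k * sample_rate / n_fft        # center frequency of bin k
        b = int(erb_rate(f) / top * n_bands)
        bands.append(min(b, n_bands - 1))  # keep the Nyquist bin in the last band
    return bands
```

Because the encoder and decoder must agree on the band layout, both sides would call such a routine with identical arguments. With short transforms, some low-frequency bands may contain no bins, so a practical design would merge or widen them.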
In the decoder, application of the interchannel amplitude and time or phase differences is relatively straightforward, but modifying the upmixed channels so that their interchannel correlation matches that of the original multi-channel signal is more challenging. Typically, when only amplitude and time or phase differences are applied at the decoder, the resulting interchannel correlation of the upmixed channels is greater than that of the original signal, and the resulting audio sounds more “collapsed” spatially, or less ambient, than the original. This is often attributable to averaging values across frequency and/or time in order to limit the side information transmission cost. In order to restore a perception of the original interchannel correlation, some type of decorrelation must be performed on at least some of the upmixed channels. In the Breebaart et al. AES Convention Paper 6072 and WO 03/090206 international application, cited below, a technique is proposed for imposing a desired interchannel correlation between two channels that have been upmixed from a single downmixed channel. The downmixed channel is first passed through a decorrelation filter to produce a second, decorrelated signal. The two upmixed channels are then each computed as linear combinations of the original downmixed signal and the decorrelated signal. The decorrelation filter is designed as a frequency-dependent delay, in which the delay decreases as frequency increases. Such a filter has the desirable property of providing noticeable audible decorrelation while reducing temporal dispersion of transients. Also, adding the decorrelated signal to the original signal may not result in the comb filter effects associated with a fixed delay decorrelation filter.
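The idea of forming two upmixed channels as linear combinations of a downmixed signal and its decorrelated version can be illustrated with a minimal sketch. It assumes that the two signals have equal energy and are mutually uncorrelated, and it uses a symmetric rotation whose angle is chosen so that the outputs have a target correlation `rho`. This particular mixing rule and the names `mix_for_correlation` and `correlation` are illustrative assumptions, not the exact method of the cited paper or application, and the decorrelation filter itself is not modeled.

```python
import math

def mix_for_correlation(x, d, rho):
    """Mix downmix x and decorrelated signal d into two channels whose
    normalized correlation is rho (hypothetical helper).

    Assumes x and d have equal energy and zero mutual correlation.
    With y1 = c*x + s*d and y2 = c*x - s*d, the correlation of y1 and y2
    is c^2 - s^2 = cos(2a), so a = acos(rho) / 2.
    """
    a = 0.5 * math.acos(rho)
    c, s = math.cos(a), math.sin(a)
    y1 = [c * xi + s * di for xi, di in zip(x, d)]
    y2 = [c * xi - s * di for xi, di in zip(x, d)]
    return y1, y2

def correlation(u, v):
    """Normalized correlation of two real sequences, with no mean removal."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return num / den
```

Setting `rho` to 1 reproduces the fully correlated case in which no decorrelated signal is added, which corresponds to the spatially “collapsed” result described above; smaller values mix in progressively more of the decorrelated signal.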
The technique in the Breebaart et al. paper and application is designed for only two upmix channels, but such a technique is desirable for an arbitrary number of upmix channels. Aspects of the present invention provide not only a solution to this more general multichannel decorrelation problem but also an efficient implementation in the frequency domain.