The present invention relates to digital audio signal processing, and more particularly to loudspeaker cross-talk cancellation devices and methods.
Cross-talk cancellation is an essential component of loudspeaker-based three-dimensional audio systems. For the case of stereo reproduction (two loudspeakers), cross-talk denotes the signal from the right speaker that is heard at the left ear and vice versa. Without cross-talk, it is theoretically possible to generate virtual sound sources located at any angle from the listener by processing the signal using the head-related transfer functions (HRTFs) corresponding to the desired position of the virtual sound source. In a typical listening situation, however, cross-talk is present and the intended effect cannot be achieved properly.
The basic solution for eliminating cross-talk was proposed in B. Atal et al., U.S. Pat. No. 3,236,949 (1966). This solution consists of inverting the 2×2 matrix of the HRTFs from the two loudspeakers to the two ears. By applying the inverse matrix to the signals before reproduction at the loudspeakers, it is in principle possible to reproduce the original acoustic signals at the ears. The classical cross-talk cancellation method has received a few refinements, but remains essentially the same as in 1966. These refinements include (a) a matrix diagonalization method that dramatically reduces computational cost, as described in D. Cooper et al., Prospects for Transaural Recording, 37 J. Audio Eng. Society 3-19 (1989), and (b) a solution that widens the area in which the effect can be achieved (the sweet spot) through a convenient choice of speaker angles, as described in O. Kirkeby et al., The Stereo Dipole: A Virtual Source Imaging System Using Two Closely Spaced Loudspeakers, 46 J. Audio Eng. Society 387-395 (1998).
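The matrix inversion described above can be sketched for a single frequency as follows. This is a minimal illustration only, not the method of any cited reference: the transfer-function values in `H` are hypothetical numbers chosen for the example, and the helper names `invert_2x2` and `apply_2x2` are assumptions introduced here. A practical system would repeat the operation for every frequency of interest using measured HRTFs.

```python
def invert_2x2(h):
    """Invert a 2x2 complex matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("HRTF matrix is singular at this frequency")
    return ((d / det, -b / det), (-c / det, a / det))


def apply_2x2(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


# Hypothetical transfer functions at one frequency (illustrative values only).
# Rows are ears (left, right); columns are loudspeakers (left, right).
# The off-diagonal entries are the cross-talk paths.
H = ((1.0 + 0.0j, 0.4 - 0.3j),
     (0.4 - 0.3j, 1.0 + 0.0j))

C = invert_2x2(H)                       # cross-talk canceller for this frequency
desired = (1.0 + 0.0j, 0.0 + 0.0j)      # signal intended for the left ear only
speaker_feeds = apply_2x2(C, desired)   # pre-processed loudspeaker signals
at_ears = apply_2x2(H, speaker_feeds)   # what the modeled ears receive
```

Under this idealized model, `at_ears` equals `desired` up to rounding error, so the right ear receives essentially nothing of the left-ear signal.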
Nevertheless, cross-talk cancellation faces a number of limitations that persist despite the considerable research effort dedicated to solving them. These limitations include: (1) room reflections that occur in real-world listening situations; (2) imprecision of available HRTF data based on dummy-head measurements; (3) head movement; and (4) ill-conditioned HRTF matrices, whose inverses exhibit peaks in the magnitude spectrum. The approach proposed in the Kirkeby et al. article for problems (3) and (4) is to enforce a convenient speaker angle, while other approaches make use of least-squares optimization that requires feedback from microphones, as for example in P. Nelson et al., Adaptive Inverse Filters for Stereophonic Sound Reproduction, 40 IEEE Trans. Signal Proc. 1621-1632 (1992).
However, limitations (1)-(4) remain without robust solutions.
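Limitation (4) can be illustrated with a deliberately simplified model. The sketch below assumes a symmetric free-field arrangement in which the direct path has unit gain and the cross-talk path is a pure attenuation `g` and delay `tau_s`. Both values, and the function name `inverse_gain_db`, are hypothetical choices made for this example and do not come from the cited references. Real HRTFs are more complicated, but the same mechanism applies: wherever the determinant of the 2×2 matrix is small, the inverse filters require a large gain.

```python
import cmath
import math


def inverse_gain_db(freq_hz, g=0.9, tau_s=0.00025):
    """Gain in dB of the 1/det factor shared by all entries of the inverse
    of [[1, x], [x, 1]], where x = g * exp(-j*2*pi*f*tau_s) is an assumed
    model of the cross-talk path relative to the direct path."""
    x = g * cmath.exp(-2j * math.pi * freq_hz * tau_s)
    det = 1 - x * x
    return -20 * math.log10(abs(det))


# Print the inverse-filter gain factor at a few example frequencies.
for f in (0.0, 500.0, 1000.0, 2000.0):
    print(f"{f:7.1f} Hz: {inverse_gain_db(f):6.2f} dB")
```

In this model the gain factor peaks at frequencies where the cross-talk delay is a whole number of half periods, including very low frequencies, and dips in between. The closer `g` is to 1, the higher the peaks.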