Audio spatialization refers to techniques that synthesize a virtual sound image so that a listener perceives the synthesized sound as originating from an actual source located at a certain position. Spatial audio differs from ordinary stereo in that spatialized audio may be perceived to come from a particular location relative to the listener.
Spatialized audio can be rendered by headphones or loudspeakers. Loudspeakers lack the practical inconveniences of headphones and are therefore preferred for certain applications, for example desktop environments and telepresence applications. However, the quality of loudspeaker-based audio spatialization is generally lower, because it suffers from crosstalk caused by the contralateral audio paths (e.g., right speaker to left ear, left speaker to right ear) from the loudspeakers to the listener's ears. Such crosstalk often degrades the 3D cues of the spatialized audio (i.e., the attributes of the sound that cause perception of space may be affected).
To address this problem, crosstalk cancellation techniques have been studied with the goal of eliminating or minimizing crosstalk by equalizing the acoustic transfer functions between the loudspeakers and the listener's eardrums. To effectively cancel crosstalk, it is helpful to model the acoustic paths from the loudspeakers to the listener's position. Such an acoustic path model is often represented as a matrix of transfer functions, with one entry for each loudspeaker-to-ear path. Several methods to model the transfer functions have been proposed. A simple approach is to use a free-field model, in which the sound field radiated from a monopole in a free field is computed based on the distances from the sources to the observation points. Under the assumption that the human head can be modeled as a sphere, the expression for the sound field produced by a sound wave impinging on a rigid sphere has also been formulated. An improvement over the spherical head model has been to adopt a head-related transfer function (HRTF). An HRTF is often measured in an anechoic chamber with a dummy head to provide an acoustically realistic model of a human listener. By adding the direct-path delay and attenuation of the sound wave, one can calculate accurate transfer functions between the loudspeakers and the listener and use these models for crosstalk cancellation.
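As an illustration of the free-field model and the matrix of transfer functions described above, the following is a minimal sketch, not a definitive implementation. It assumes a two-loudspeaker, two-ear layout in a 2D plane, a single frequency, a speed of sound of 343 m/s, and point-like ears with no head model. The function names, positions, and frequency are hypothetical choices made for this example only. The canceller is taken to be the plain matrix inverse of the transfer matrix at that frequency, which is one simple way to equalize the paths.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value for this example)

def free_field_matrix(speakers, ears, freq_hz):
    """Return H where H[i, j] is the free-field monopole transfer
    function from loudspeaker j to ear i at a single frequency."""
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    # r[i, j] = distance between ear i and loudspeaker j
    r = np.linalg.norm(ears[:, None, :] - speakers[None, :, :], axis=-1)
    # Monopole in a free field: propagation delay (phase) and 1/r attenuation
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def crosstalk_canceller(H):
    """Return C such that H @ C is the identity at this frequency."""
    return np.linalg.inv(H)

# Hypothetical layout in meters: rows are (x, y) positions
speakers = np.array([[-0.3, 1.0], [0.3, 1.0]])  # left, right loudspeaker
ears = np.array([[-0.09, 0.0], [0.09, 0.0]])    # left, right ear

H = free_field_matrix(speakers, ears, freq_hz=1000.0)
C = crosstalk_canceller(H)

# Off-diagonal entries of H are the contralateral (crosstalk) paths.
# Filtering the binaural signals with C before playback makes the
# end-to-end response H @ C diagonal, so each ear receives only its
# intended signal under this idealized model.
end_to_end = H @ C
```

In this sketch the off-diagonal entries of `H` are the contralateral paths, and `end_to_end` is the identity matrix up to rounding error. A spherical head model or a measured HRTF would replace `free_field_matrix` while leaving the inversion step unchanged.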
Even with an HRTF or the like, crosstalk can be significant. Real-world environments with walls are often reverberant, which creates additional challenges for crosstalk cancellation. The performance of conventional crosstalk cancellation generally degrades in a realistic listening room, where reverberation is present. Solutions such as careful layout (to improve direct-path dominance) and designing transfer functions that take room reverberation into account have been ineffective or impractical. Note that techniques that place a microphone at the center of the user's location (or sweet spot) will help with general room equalization, but they will not provide enough precision to help with crosstalk cancellation, because the room impulse response (RIR) changes significantly even when the user's position changes by a few inches. As of yet, there has been no practical approach to crosstalk cancellation that takes room reverberation into consideration.
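The degradation described above can be illustrated with a small numerical sketch. This is a toy example under stated assumptions, not a model of any real room: a canceller is designed from direct free-field paths only and then applied to a "room" that adds a single wall reflection, modeled as mirror-image sources with a frequency-independent reflection gain. The wall position, the gain of 0.7, the layout, and all function names are hypothetical values chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value for this example)

def monopole_matrix(sources, ears, freq_hz, gain=1.0):
    """M[i, j]: free-field monopole transfer function from source j to ear i."""
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND
    r = np.linalg.norm(ears[:, None, :] - sources[None, :, :], axis=-1)
    return gain * np.exp(-1j * k * r) / (4.0 * np.pi * r)

def leakage(M):
    """Largest off-diagonal magnitude relative to the smallest diagonal one."""
    off = max(abs(M[0, 1]), abs(M[1, 0]))
    diag = min(abs(M[0, 0]), abs(M[1, 1]))
    return off / diag

# Hypothetical layout in meters: rows are (x, y) positions
speakers = np.array([[-0.3, 1.0], [0.3, 1.0]])
ears = np.array([[-0.09, 0.0], [0.09, 0.0]])
freq_hz = 1000.0

# Canceller designed from the direct paths only
H_direct = monopole_matrix(speakers, ears, freq_hz)
C = np.linalg.inv(H_direct)

# Toy room: one side wall at x = WALL_X, modeled by mirroring each
# loudspeaker across the wall (both values below are assumptions)
WALL_X = -1.5
REFLECTION_GAIN = 0.7
images = speakers.copy()
images[:, 0] = 2.0 * WALL_X - images[:, 0]
H_room = H_direct + monopole_matrix(images, ears, freq_hz, gain=REFLECTION_GAIN)

leak_free = leakage(H_direct @ C)  # essentially zero: the model matches
leak_room = leakage(H_room @ C)    # nonzero: the reflection is not modeled
```

Here `leak_free` is at the level of rounding error, whereas `leak_room` is not, because the reflected paths are absent from the model that the canceller inverts. A real room contributes many such reflections, and their arrival times shift as the listener moves.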
Techniques related to audio crosstalk cancellation that involve practicable room modeling are discussed below.