Three-dimensional audio systems create an "immersive" auditory environment, where sounds can appear to originate from any direction with respect to the listener. Using "binaural synthesis" techniques, it is currently possible to deliver three-dimensional audio scenes through a pair of loudspeakers or headphones. Loudspeaker delivery is more complex, however, because the acoustic outputs of the two loudspeakers interfere at the listener's ears, which does not happen with headphones. Consequently, a loudspeaker implementation requires not only synthesis of appropriate directional cues, but also further processing of the signals so that, in the acoustic output, sounds that would interfere with the spatial illusion provided by these cues are canceled. Existing systems require the listener to remain at a fixed position with respect to the loudspeakers, because the cancellation works correctly only for that position and orientation. If the listener moves outside a narrow equalization zone or "sweet spot," the illusion is lost.
It is well known that directional cues are embodied in the transformation of sound pressure from the free field to the ears of a listener; see Jens Blauert, Spatial Hearing (1983). A "head-related transfer function" (HRTF) represents a measurement of this transformation for a specific sound location relative to the listener's head, and describes the diffraction of sound by the torso, head, and external ear (pinna). Consequently, a pair of HRTFs chosen for a known or assumed spatial location of the sound source can process a sound signal so that it appears to the listener to emanate from that location; that is, the HRTFs produce a "binaural" signal.
It is straightforward to synthesize directional cues by convolving a sound with the appropriate HRTFs, thereby creating a synthetic binaural signal. When this is done using HRTFs designed for a particular listener, localization performance essentially matches free-field listening; see Wightman et al., J. Acoust. Soc. Am. 85(2):858-867 and 868-878 (1989). The use of non-individualized HRTFs (that is, HRTFs designed generically rather than for a particular listener) results in poorer localization performance, particularly regarding front-back confusion and elevation judgments; see Wenzel et al., J. Acoust. Soc. Am. 94(1):111-123 (1993).
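As a minimal sketch of the convolution step described above, assuming NumPy and a pair of head-related impulse responses (HRIRs, the time-domain form of an HRTF pair), the following convolves a mono signal with a left-ear and a right-ear response. The function name `binaural_synthesis` and the toy impulse responses are hypothetical: they only mimic an interaural time and level difference and are not measured HRTFs.

```python
import numpy as np


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Convolve a mono source with a left/right HRIR pair.

    Returns a (2, N) array: row 0 is the left-ear signal, row 1 the right-ear.
    """
    mono = np.asarray(mono, dtype=float)
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    # Pad both ears to a common length in case the HRIRs differ in length.
    n = max(len(left), len(right))
    out = np.zeros((2, n))
    out[0, :len(left)] = left
    out[1, :len(right)] = right
    return out


# Toy HRIR pair (hypothetical, not measured): a source on the listener's left
# reaches the left ear first at full level and the right ear about 0.6 ms
# later at half amplitude.
fs = 48000
itd_samples = int(round(0.0006 * fs))
hrir_l = np.zeros(64)
hrir_l[0] = 1.0
hrir_r = np.zeros(64)
hrir_r[itd_samples] = 0.5

rng = np.random.default_rng(0)
source = rng.standard_normal(fs // 10)  # 100 ms of noise
binaural = binaural_synthesis(source, hrir_l, hrir_r)
```

With real data, `hrir_l` and `hrir_r` would be replaced by measured responses for the desired source direction; the convolution itself is unchanged.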
The sound traveling from a loudspeaker to the listener's opposite ear is called "crosstalk," and it interferes with the directional cues encoded in the loudspeaker signals. That is, for each ear, sound from the contralateral speaker will interfere with the binaural signal from the ipsilateral speaker unless corrective steps are taken. Loudspeaker-based binaural systems therefore require crosstalk cancellation. Such systems typically model the sound that travels from the speakers to the ears using transfer functions; in particular, the transfer functions from the two speakers to the two ears form a 2×2 system transfer matrix. Crosstalk cancellation involves pre-filtering the signals with the inverse of this matrix before sending them to the speakers; in this way, the contralateral output is effectively canceled at each of the listener's ears.
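The matrix inversion described above can be sketched per frequency bin, assuming NumPy and a frequency-domain model in which `H[k, i, j]` is the transfer function from speaker `j` to ear `i`. The function name, the toy head model (unit ipsilateral paths, contralateral paths with a hypothetical gain of 0.7 and extra delay of 0.25 ms), and the optional regularization term `beta` are illustrative assumptions, not part of the systems described here.

```python
import numpy as np


def crosstalk_canceller(H, beta=0.0):
    """Per-bin 2x2 crosstalk-cancellation filters.

    H has shape (K, 2, 2): H[k, i, j] is the transfer function from speaker j
    to ear i at frequency bin k (index 0 = left, index 1 = right).
    Returns C of the same shape with H[k] @ C[k] close to the identity.
    beta > 0 adds Tikhonov regularization (an assumption, not part of the
    basic method) to bound the filter gain where H[k] is nearly singular.
    """
    H = np.asarray(H, dtype=complex)
    Hh = np.conj(np.swapaxes(H, -1, -2))  # Hermitian transpose of each bin
    # C = (H^H H + beta I)^-1 H^H; with beta = 0 this is exactly H^-1.
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)


# Toy symmetric head model (hypothetical values, not measured data).
fs = 48000
f = np.fft.rfftfreq(512, d=1 / fs)  # bin frequencies, Hz
g, tau = 0.7, 0.00025               # crosstalk gain, extra delay (s)
H = np.empty((len(f), 2, 2), dtype=complex)
H[:, 0, 0] = H[:, 1, 1] = 1.0                                 # ipsilateral
H[:, 0, 1] = H[:, 1, 0] = g * np.exp(-2j * np.pi * f * tau)  # contralateral

C = crosstalk_canceller(H)

# Binaural spectra B[k] = [left, right]; speaker feeds are S[k] = C[k] @ B[k].
rng = np.random.default_rng(1)
B = rng.standard_normal((len(f), 2)) + 0j
S = np.einsum("kij,kj->ki", C, B)
ears = np.einsum("kij,kj->ki", H, S)  # what actually reaches the ears
```

With `beta = 0` the filters are the exact matrix inverse, so `ears` reproduces `B`; a small positive `beta` trades some cancellation accuracy for bounded filter gain at frequencies where the matrix is nearly singular.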
Crosstalk cancellation using non-individualized head models (i.e., HRTFs) is effective only at low frequencies, where the head responses of different individuals are very similar (because at low frequencies the wavelength of sound approaches or exceeds the size of a listener's head). Despite this limitation, existing crosstalk-cancellation systems are quite effective at producing realistic three-dimensional sound images, particularly for laterally located sources. This is because the low-frequency interaural phase cues are of paramount importance to sound localization; when conflicting high- and low-frequency localization cues are presented to a subject, the sound will usually be perceived at the position indicated by the low-frequency cues (see Wightman et al., J. Acoust. Soc. Am. 91(3):1648-1661 (1992)). Accordingly, the cues most critical to sound localization are the ones most effectively treated by crosstalk cancellation.
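A rough worked example of the wavelength argument: the wavelength is λ = c / f, so the frequency at which the wavelength equals a head width d is c / d. The sketch below assumes c = 343 m/s (air at about 20 °C) and a head width of 0.175 m; the head width is an illustrative assumption, and real heads vary.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at about 20 °C


def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in meters of a sound at freq_hz."""
    return c / freq_hz


def freq_for_wavelength(length_m, c=SPEED_OF_SOUND):
    """Frequency in Hz whose wavelength equals length_m."""
    return c / length_m


HEAD_WIDTH_M = 0.175  # assumed, illustrative adult head width

f_cross = freq_for_wavelength(HEAD_WIDTH_M)  # about 1960 Hz
for f in (250, 500, 1000, 2000, 4000, 8000):
    ratio = wavelength_m(f) / HEAD_WIDTH_M
    print(f"{f:5d} Hz: wavelength / head width = {ratio:.2f}")
```

Under these assumptions the crossover lies near 2 kHz: below it the wavelength exceeds the head width, which is the regime in which generic head models remain reasonably accurate.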
Existing crosstalk-cancellation systems usually assume a symmetric listening situation, with the listener located directly between the speakers and facing forward. The assumption of symmetry leads to simplified implementations, such as the shuffler topology described in Cooper et al., J. Audio Eng. Soc. 37(1/2):3-19 (1989). One can compensate for a laterally displaced listener by delaying and attenuating one of the output channels (see U.S. Pat. Nos. 4,355,203 and 4,893,342). It is also possible to reformat the loudspeaker signals for different loudspeaker spread angles, as described, for example, in the '342 patent. It has not, however, been possible to maintain a binaural signal for a moving listener, or even for one whose head rotates.
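In the symmetric case the 2×2 matrix has equal ipsilateral terms and equal contralateral terms, so it can be inverted using only a sum channel and a difference channel; this is the idea behind the shuffler topology. The sketch below assumes NumPy and frequency-domain spectra, and writes the shuffler in one common form: form L + R and L - R, filter them by 1/(ipsi + contra) and 1/(ipsi - contra), then recombine with a factor of 1/2. It also includes a hypothetical helper for a laterally displaced listener that delays and attenuates the nearer loudspeaker under a free-field 1/r assumption; it illustrates the idea and is not taken from the cited patents. All function names and the toy head model are illustrative.

```python
import numpy as np


def shuffler_canceller(L, R, ipsi, contra):
    """Symmetric crosstalk canceller written in shuffler (sum/difference) form.

    L, R: binaural spectra over frequency bins.
    ipsi, contra: ipsilateral and contralateral head transfer functions,
    assumed identical for both sides (the symmetric listening situation).
    Returns the (left, right) loudspeaker spectra.
    """
    sum_ch = (L + R) / (ipsi + contra)   # sum channel, filtered by 1/(I + C)
    diff_ch = (L - R) / (ipsi - contra)  # difference channel, 1/(I - C)
    return 0.5 * (sum_ch + diff_ch), 0.5 * (sum_ch - diff_ch)


def displaced_listener_compensation(d_left, d_right, fs, c=343.0):
    """Hypothetical delay/gain for a laterally displaced listener.

    d_left, d_right: distances (m) from each loudspeaker to the listener.
    The nearer loudspeaker is delayed by the path-length difference and
    attenuated by the distance ratio (free-field 1/r assumption).
    Returns (channel_to_adjust, delay_in_samples, linear_gain).
    """
    near, far = sorted((d_left, d_right))
    channel = "left" if d_left < d_right else "right"
    return channel, (far - near) / c * fs, near / far


# Toy symmetric head model (hypothetical values, not measured data).
fs = 48000
f = np.fft.rfftfreq(512, d=1 / fs)
ipsi = np.ones_like(f, dtype=complex)
contra = 0.7 * np.exp(-2j * np.pi * f * 0.00025)

rng = np.random.default_rng(2)
L = rng.standard_normal(len(f)) + 0j
R = rng.standard_normal(len(f)) + 0j
sL, sR = shuffler_canceller(L, R, ipsi, contra)

# Pass the speaker feeds through the same head model: the ears should
# receive the original binaural spectra, with the crosstalk canceled.
ear_L = ipsi * sL + contra * sR
ear_R = contra * sL + ipsi * sR

# Listener about 34 cm closer to the left loudspeaker (hypothetical).
channel, delay, gain = displaced_listener_compensation(2.0, 2.343, fs)
```

Symmetry makes the sum and difference signals independent, so the shuffler needs only two filters, compared with four for a general 2×2 inverse; the price is that it is exact only when the listener really is centered and facing forward.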